fix(scrape): capture screenshot outside the nav-budget race #164
Merged
A full-page screenshot of a heavy/tall page was being silently dropped:
Page.captureScreenshot ran inside the post_navigate_phase nav-budget race, so
on a slow page the budget elapsed mid-capture, the in-flight CDP request died
("WS closed"), and the response came back with no screenshot.
Move the capture out of the budget race: post_navigate_phase returns the HTML
only, and the screenshot is captured afterwards with its own page_timeout,
using the still-live session (the partial-snapshot path uses it too). The
page-load budget no longer cancels the capture.
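The before/after shape of the change can be sketched with asyncio timeouts. This is a minimal illustration, not the project's code: `load_html`, `capture_screenshot`, `scrape_old`, `scrape_new`, and the placeholder HTML strings are hypothetical stand-ins, and `asyncio.sleep` simulates the slow page and the slow CDP capture.

```python
import asyncio

# Hypothetical stand-ins for the real navigation and CDP calls.
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"


async def load_html(delay: float) -> str:
    await asyncio.sleep(delay)  # simulated page load
    return "<html>full</html>"


async def capture_screenshot(delay: float) -> bytes:
    await asyncio.sleep(delay)  # simulated Page.captureScreenshot round trip
    return PNG_MAGIC


async def scrape_old(nav_budget: float, load_delay: float, shot_delay: float):
    """Old shape: the capture shares the nav budget and dies with it."""

    async def phase():
        html = await load_html(load_delay)
        shot = await capture_screenshot(shot_delay)
        return html, shot

    try:
        return await asyncio.wait_for(phase(), timeout=nav_budget)
    except asyncio.TimeoutError:
        # Budget hit mid-capture: partial HTML, screenshot silently lost.
        return "<html>partial</html>", None


async def scrape_new(nav_budget: float, page_timeout: float,
                     load_delay: float, shot_delay: float):
    """New shape: only the HTML races the budget; the capture has its own timeout."""
    try:
        html = await asyncio.wait_for(load_html(load_delay), timeout=nav_budget)
    except asyncio.TimeoutError:
        html = "<html>partial</html>"  # partial-snapshot path
    try:
        shot = await asyncio.wait_for(capture_screenshot(shot_delay),
                                      timeout=page_timeout)
    except asyncio.TimeoutError:
        shot = None
    return html, shot
```

With a short budget and a slow capture, `scrape_old` returns no screenshot while `scrape_new` still returns one, even when the budget is hit during the page load itself.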
Verified locally: with chrome_nav_budget_ms=2000 (forced budget hit) a
full-page capture of firecrawl.dev still returns a 756x20562 PNG.
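One way to confirm the dimensions of a returned capture is to read them from the PNG header, where the IHDR chunk stores width and height as big-endian 32-bit integers at byte offsets 16 to 24. The helper name `png_dimensions` is hypothetical; this is a sketch of such a check, not necessarily how the local verification was done.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(data: bytes) -> tuple[int, int]:
    """Return (width, height) read from the IHDR chunk of a PNG byte string."""
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG with a leading IHDR chunk")
    # Bytes 16..24 hold width and height as big-endian unsigned 32-bit ints.
    width, height = struct.unpack(">II", data[16:24])
    return width, height
```

Calling `png_dimensions` on the bytes of the captured screenshot would be expected to give the width and height reported above.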
Full-page screenshots of heavy/tall pages were silently dropped: Page.captureScreenshot ran inside the post_navigate_phase nav-budget race, so a slow page exhausted the budget mid-capture ("WS closed") and the response returned with no screenshot.
Fix: capture runs AFTER the budget race with its own page_timeout, using the still-live session. Verified locally with chrome_nav_budget_ms=2000 (forced budget hit): full-page capture of firecrawl.dev returns a 756x20562 PNG. Found via prod testing of fastcrw.com.
Follow-up to #163.