Make image capture deterministic: stepwise scroll + explicit image wait #49

Merged
jeremyfelt merged 2 commits into trunk from improve-image-capture
Jun 10, 2026
@jeremyfelt

Member

Stacked on #48 — merge that first; this diff will then show only the image-capture commit.

Why

Captures of image-heavy pages were unreliable: every run could include a different set of images, making diffs noisy. Two compounding causes:

  1. autoScroll never actually scrolled through the page. Each iteration jumped straight to the bottom (window.scrollTo(0, scrollHeight)). Lazy loaders, both IntersectionObserver-based and native loading="lazy", only trigger for content near the viewport, so whether anything in between loaded was a timing race.
  2. The post-scroll networkidle settle fails silently. It only covers requests that already started, says nothing about decode state, and when it times out on a busy server the capture proceeds with whatever happened to arrive — no signal that anything is missing.

What

  • autoScroll steps one viewport at a time (re-reading the page height as content grows, capped for infinite feeds), using behavior: 'instant' so a site's scroll-behavior: smooth can't outpace the loop.
  • After the settle, capture waits — bounded by timeouts.settle — for every visible image to load and decode (img.decode()), and prints a per-capture warning naming the slug when images were still loading at screenshot time.
  • Hidden images are excluded from the wait: they can't paint into the screenshot, and a hidden native-lazy image (e.g. a desktop-only sidebar image at a mobile viewport width) never loads at all by design — waiting for it burned the full timeout and warned about nothing. Verified against a real page where exactly those images were the stragglers.
  • README: documents the image-wait behavior under "Trustworthy baselines" and clarifies what settle bounds.
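
A minimal sketch of the stepwise scroll described above, not the code from this PR. The function name `stepwiseScroll`, the `env` object, and the `maxSteps` and `pauseMs` options are all hypothetical; `env` stands in for the page so the loop can also run outside a browser.

```javascript
// Hypothetical sketch: scroll one viewport at a time instead of jumping to the bottom.
// `env` is an assumed abstraction with { viewportHeight, getHeight(), scrollTo(y) }.
async function stepwiseScroll(env, { maxSteps = 50, pauseMs = 0 } = {}) {
  const visited = [];
  let y = 0;
  // maxSteps caps the loop so an infinite feed cannot scroll forever.
  for (let step = 0; step < maxSteps; step++) {
    env.scrollTo(y);
    visited.push(y);
    if (pauseMs > 0) {
      // Optional pause so lazy loaders get a chance to react at this position.
      await new Promise((resolve) => setTimeout(resolve, pauseMs));
    }
    // Re-read the height on every step: lazy content may have grown the page.
    const maxY = Math.max(0, env.getHeight() - env.viewportHeight);
    if (y >= maxY) break;
    y = Math.min(y + env.viewportHeight, maxY);
  }
  return visited;
}

// In a browser, a caller might pass something like:
// {
//   viewportHeight: window.innerHeight,
//   getHeight: () => document.documentElement.scrollHeight,
//   scrollTo: (top) => window.scrollTo({ top, behavior: 'instant' }),
// }
```

Passing `behavior: 'instant'` in the browser variant mirrors the point above: each step should land immediately rather than animate under a site's `scroll-behavior: smooth`.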

Verification

  • npm test: 105 passing.
  • Back-to-back captures of two image-heavy pages (4 screenshots up to 13,000px tall): pixel-identical across runs, no warnings, ~6.5s total.
  • Old code vs new code on the same pages: the stepwise scroll captured 4.7% more changed pixels (731k) on one desktop page — content the jump-scroll was skipping.

🤖 Generated with Claude Code

jeremyfelt and others added 2 commits June 10, 2026 11:00
A page embedding a widget that never lets the network go quiet — a
CAPTCHA like Cloudflare Turnstile that polls and retries indefinitely
under automation, ad tech, analytics — times out the networkidle wait on
every viewport, burning the full retry budget per capture.

blockHosts in reglance.json takes bare hostnames and aborts every
request to them (and their subdomains) at the browser, so the page goes
idle and captures stay deterministic. Blocked requests are excluded from
the critical-resource retry check, since a deliberately blocked script
firing requestfailed is not a load failure.
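
The matching rule described above (a bare hostname blocks itself and its subdomains) could look roughly like this. The helper names `isBlockedHost` and `isBlockedUrl` are hypothetical and are not taken from this repository.

```javascript
// Hypothetical helper: true when `hostname` equals a blocked host or is a subdomain of one.
function isBlockedHost(hostname, blockHosts) {
  const host = hostname.toLowerCase();
  return blockHosts.some((blocked) => {
    const bare = blocked.toLowerCase();
    // The leading dot keeps "notexample.com" from matching "example.com".
    return host === bare || host.endsWith('.' + bare);
  });
}

// Hypothetical wrapper: applies the hostname check to a full request URL.
function isBlockedUrl(url, blockHosts) {
  try {
    return isBlockedHost(new URL(url).hostname, blockHosts);
  } catch {
    return false; // not a parseable absolute URL, so nothing to block
  }
}
```

If the browser driver is Playwright (an assumption; this PR does not name it), a check like this could sit inside a `page.route('**/*', ...)` handler that calls `route.abort()` for blocked URLs and `route.continue()` for everything else.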

Co-Authored-By: Claude Fable 5 <[email protected]>

autoScroll jumped straight to the bottom of the page, so lazy loaders
(IntersectionObserver, native loading="lazy") never fired for anything
in between — which images made it into a capture was a timing race,
producing noisy diffs. It now steps one viewport at a time so every
lazy image is triggered, re-reading the height as content grows.

After the network-idle settle, capture now also waits (bounded by
timeouts.settle) for every visible image to load and decode, and warns
per capture when any image was still loading instead of silently
shipping a partial screenshot. Hidden images are excluded: they cannot
paint, and a hidden native-lazy image (e.g. a desktop-only image at a
mobile width) never loads by design.
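
A rough sketch of a bounded image wait along these lines, under stated assumptions: the name `waitForVisibleImages`, the `isVisible` predicate, and the choice to treat a failed decode as settled are illustrative and are not confirmed by this commit.

```javascript
// Hypothetical sketch: wait, up to timeoutMs, for every visible image to decode.
// Resolves with the number of visible images still pending at the deadline.
async function waitForVisibleImages(images, isVisible, timeoutMs) {
  // Hidden images are skipped entirely: they cannot paint into the screenshot.
  const pending = new Set(images.filter(isVisible));
  const waits = [...pending].map((img) =>
    img.decode().then(
      () => pending.delete(img),
      // Assumption: a broken image is treated as settled rather than waited on.
      () => pending.delete(img)
    )
  );
  let timer;
  const deadline = new Promise((resolve) => {
    timer = setTimeout(resolve, timeoutMs);
  });
  // Whichever finishes first wins: all images decoded, or the timeout.
  await Promise.race([Promise.all(waits), deadline]);
  clearTimeout(timer);
  return pending.size;
}
```

In a page, `images` could be `Array.from(document.images)` and `isVisible` something like `(img) => img.getClientRects().length > 0`, which is only one possible visibility check. A non-zero return value is the signal a caller would turn into the per-capture warning.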

Also generalizes hostname examples in docs and comments.

Co-Authored-By: Claude Fable 5 <[email protected]>
@jeremyfelt jeremyfelt merged commit 546197f into trunk Jun 10, 2026
2 checks passed
@jeremyfelt jeremyfelt deleted the improve-image-capture branch June 10, 2026 18:22