fix(ci): fix the monorepo#299
Conversation
ruocco-l
left a comment
There was a problem hiding this comment.
🚀 Thank you for figuring this out
57d00a2 to
c1046bd
Compare
… quote $GITHUB_ENV
| RUN rm -rf \ | ||
| /deploy/node_modules/playwright \ | ||
| /deploy/node_modules/playwright-core \ | ||
| /deploy/node_modules/camoufox-js \ | ||
| /deploy/node_modules/better-sqlite3 \ | ||
| /deploy/node_modules/.bin/playwright \ | ||
| /deploy/node_modules/.bin/playwright-core \ | ||
| /deploy/node_modules/.bin/camoufox-js |
There was a problem hiding this comment.
same pattern for playwright, puppeteer and web scrapers
We install the browser driver and then delete it and run the app from a sub-dir (COPY /deploy ./app + WORKDIR /home/myuser/app). Done like that because the browser binary lives in the base image and the JS driver must match it exactly. Rather than pin our own driver to match (fragile, they're versioned independently and drift, which caused the runtime crashes), we drop the driver from the bundle so the runtime resolves it from the base image's node_modules (resolution walks up from /home/myuser/app). Driver and binary then come from the same image and can't drift. We still install it in the build stage because tsc needs the types.
| # digest. The actor does NOT ship its own browser drivers — they're dropped from the deploy bundle | ||
| # and resolved from the base at runtime, so driver and binary can never drift. Tag 24-1.59.1 is the | ||
| # newest Camoufox base that survives page JS errors (24-1.60.0 crashes the process on them). | ||
| FROM apify/actor-node-playwright-camoufox:24-1.59.1@sha256:eadc96fa9492284eb45ef70b6b91c841fae7f142d25a22ae2887a21bb78b3469 AS builder |
There was a problem hiding this comment.
Same for each browser. Base is pinned by sha256 digest, not just a tag, on purpose: tags are mutable, so a bare tag wouldn't give a reproducible browser binary. Since the whole runtime browser stack comes from the base (see the driver note), the digest is what actually makes the browser deterministic. Renovate bumps the digest.
Also, not the newest base. I've tested the latest camoufox base (24-1.60.0, Camoufox 150): it crashes the whole process on any page that emits a JS error (playwright's Firefox pageError handler). I also tried the latest Playwright (1.61.x), which won't even launch this Camoufox binary (Browser.setDefaultViewport protocol error). 24-1.59.1 is the newest camoufox base that survives page errors.
| "@crawlee/browser-pool": "^3.8.2 || ^4.0.0-beta.0", | ||
| "@crawlee/core": "^3.8.2 || ^4.0.0-beta.0", | ||
| "@crawlee/types": "^3.8.2 || ^4.0.0-beta.0", | ||
| "@crawlee/utils": "^3.8.2 || ^4.0.0-beta.0", | ||
| "apify": "^3.1.8 || ^4.0.0-beta.0" |
There was a problem hiding this comment.
This package is shared by the crawlee-v3 generic actors and the v4-beta sitemap-extractor. We will make it 1 ver when we move all actors to crawlee v4
Also, after moving to v4 we will delete version-neutral slice: packages/actor-scraper/sitemap-scraper/src/scraper-tools-compat.d.ts
| @@ -0,0 +1 @@ | |||
| inject-workspace-packages=false | |||
There was a problem hiding this comment.
it's an explicit guard: a normal pnpm install must symlink workspace packages (fast local dev); only the Docker builds opt in per-command with --config.inject-workspace-packages=true on pnpm deploy, so /deploy is self-contained.
There was a problem hiding this comment.
Can you please expand on this? It seems this option is false by default in pnpm anyway https://pnpm.io/workspaces#injectworkspacepackages
(also, it seems this option should be camelCased and in pnpm-workspace.yaml instead)
| overrides: | ||
| "[email protected]>@crawlee/core": "4.0.0-beta.25" | ||
| "[email protected]>@crawlee/types": "4.0.0-beta.25" | ||
| "[email protected]>@crawlee/utils": "4.0.0-beta.25" |
There was a problem hiding this comment.
Remove once on stable crawlee v4
| "dockerContextDir": "../../../..", | ||
| "dockerfile": "../Dockerfile", |
There was a problem hiding this comment.
same in every actor
points the Docker build context at the monorepo root so the build can use the root pnpm-lock.yaml for a frozen, deterministic install. apify push of a single folder can't do this (no root lockfile in the upload). that's why builds run from the Git source.
inspired by actor-monorepo-example
| @@ -0,0 +1,80 @@ | |||
| import { ApifyClient } from 'apify-client'; | |||
There was a problem hiding this comment.
It triggers a build of the Actor's configured Git source and waits for the result. For exact-commit reproducibility, point the source at a release tag.
barjin
left a comment
There was a problem hiding this comment.
Thank you @nikitachapovskii-dev !
The changes look mostly fine to me. 👍 I wrote down few ideas I had looking at the code. Please look into these at your own discretion and feel free to merge (no need to wait for me, @ruocco-l 's opinion has more weight here :)).
| build_and_test: | ||
| name: Build & Test | ||
| runs-on: ubuntu-22.04 | ||
| timeout-minutes: 60 |
There was a problem hiding this comment.
I believe this won't fix the timeouting e2e tests. Likely caused by microsoft/playwright#40724 (bumping Playwright or reverting Node version might help)
There was a problem hiding this comment.
it's not the fix 😄, it's just for the future (if something happens then fail is better than getting stuck) The actual fix is browser-install step
| // camoufox-js launch options intentionally diverge from Playwright's, and the pinned | ||
| // playwright-core type identity differs from @crawlee/playwright's — cast through unknown. |
There was a problem hiding this comment.
Huh, this sounds curious - can you please investigate this and make an issue in crawlee or camoufox-js?
I don't think we changed anything recently, at least not intentionally.
| .vscode | ||
| yarn.lock | ||
| .yarn | ||
| # npm locks are generated locally in actor dirs (the monorepo uses pnpm-lock.yaml) |
There was a problem hiding this comment.
But this addition is just a comment, right? What is it commenting on?
There was a problem hiding this comment.
Aha, that's a leftover from when I tried to make it via apify push. Thanks for noticing
| @@ -0,0 +1 @@ | |||
| inject-workspace-packages=false | |||
There was a problem hiding this comment.
Can you please expand on this? It seems this option is false by default in pnpm anyway https://pnpm.io/workspaces#injectworkspacepackages
(also, it seems this option should be camelCased and in pnpm-workspace.yaml instead)
| const { SESSION_MAX_USAGE_COUNTS, DEFAULT_VIEWPORT, DEVTOOLS_TIMEOUT_SECS, META_KEY } = scraperToolsConstants; | ||
| const SCHEMA = JSON.parse(await readFile(new URL('../../INPUT_SCHEMA.json', import.meta.url), 'utf8')); | ||
|
|
||
| type CrawleePuppeteerPage = PuppeteerCrawlingContext['page']; |
There was a problem hiding this comment.
Why is this change necessary? PuppeteerCrawlingContext['page'] should be the exactly the Page type from puppeteer. A mismatch here might point to some issues with package version resolution / dual installs etc.
There was a problem hiding this comment.
You are right! I was lost fighting with browsers and didn't even check if it was a dual install here :/
| ENV APIFY_DISABLE_OUTDATED_WARNING=1 | ||
|
|
||
| CMD npm run start:prod --silent | ||
| CMD ["node", "dist/main.js"] |
There was a problem hiding this comment.
nit: we should either call (p)npm run start:prod here or drop the start:prod script from the package.json files. Keeping the script but not using it will eventually lead to a frustrating debugging experience 😄
Reviving the generic scrapers after the split + pnpm migration
After thee migration to pnpm and SDK split, CI went red and the actor builds stopped matching the platform. This PR gets everything green again and, more importantly, makes the platform builds reproducible.
CI
The e2e workflow simply never installed the browsers, on a warm pnpm store the side-effects cache skips the puppeteer/playwright postinstall, so Chrome wasn't there and every browser suite died with
Could not find Chrome. Added the install step the PR workflow already had. Both workflows now run on Node 24 withtimeout-minutes, so if something hangs it fails fast instead of eating six hours in silence.Deterministic builds from the monorepo root
We're a pnpm monorepo: one root
pnpm-lock.yamland a shared@apify/scraper-tools. For reproducible images the build has to install from that lockfile with--frozen-lockfile, and it has to see the local workspace package, not the published one. So every actor builds from the repo root via a Git source (dockerContextDir: ../../../..) and runspnpm install --frozen-lockfile+pnpm --filter <actor>... deploy.apify pushcan't do this, it uploads only the actor folder, without the root lockfile or the localscraper-tools.The release workflow drops
push-actor-actionand instead triggers each Actor's Git-source build throughscripts/trigger-apify-build.mjs(checks the version is aGIT_REPOsource with the right tag, fires the build, waits, fails on anything but success). This assumes each Actor on the platform has its source pointed at this repo (branch +:packages/actor-scraper/<actor>).Browsers come from the base image, not from us
The browser binary lives in the Apify base image, and the JS driver has to match it. Pinning our own driver to chase that match is brittle, they version separately and drift, which is exactly what kept crashing. So the browser actors just use the base image's driver: we drop it from the deploy bundle and let it resolve from the base at runtime (the app sits in
/home/myuser/app). Binary and driver now come from the same image and can't get out of sync, and the bases are pinned by digest.camoufox is the one exception worth flagging: it's stuck on
24-1.59.1. The newer24-1.60.0(Camoufox 150) crashes the whole process on any page that throws a JS error, that's upstream, it reproduces on the untouched base image.24-1.59.1is the newest one that survives.sitemap-extractor (Crawlee/Apify v4 beta)
Added to the release matrix. It rides the v4 beta while the other six stay on v3, so
@apify/scraper-toolsaccepts^3 || ^4, a tiny type shim keeps its v3 types from clashing in the v4 build, and a couple of pnpmoverridespin the beta's transitive@crawlee/*. All of that comes back out once we're fully on stable v4.Cleanup
Manifests still pointed at
apify-ts/apify-sdk-js, CONTRIBUTING said npm, a couple of READMEs were stale — fixed. Everything builds on Node 24.Follow-ups (separate)
Unpin camoufox once a newer base survives page errors. And drop the v4 bridge bits (the
^3peer half, the shim, the overrides) when everything's on stable v4.P.S. sorry, the description is huge, I know 😬