perf(core): memoize step return value hydration across inline replays#2472
perf(core): memoize step return value hydration across inline replays#2472pranaygp wants to merge 3 commits into
Conversation
The inline replay loop re-executes the workflow body and re-consumes the full event log on every iteration. For each already-completed step, the step consumer re-decrypted and re-devalue-parsed the serialized result on every replay — O(N^2) decrypt+parse operations across a single invocation of a sequential N-step workflow. Add a per-run memoization cache, owned by the inline loop in runtime.ts (alongside cachedEvents) so it survives across replay iterations of the same run but never leaks across runs. It is threaded into runWorkflow and stored on the orchestrator context, and consulted in the step_completed path keyed by the persisted event id. This makes a completed step's hydrated result O(1) on subsequent replays, turning the aggregate cost into O(N). Determinism is preserved: the cache lookup happens inside the existing ctx.promiseQueue slot and still resolves via the same resolve(), so a cache hit occupies the identical position in the ordered delivery chain a re-hydrate would have — pendingDeliveries accounting, delivery barriers, and Promise.race/all replay are untouched. Identity safety: hydrateStepReturnValue returns a fresh object graph each call and each replay runs in a fresh VM, so sharing an object reference across replays could let one replay's mutation leak into the next. Only primitive results are memoized (immutable, reference-share == re-parse); non-primitives re-hydrate fresh every replay, exactly as before. Hook, wait, and abort hydration paths are intentionally left uncached. Co-Authored-By: Claude Opus 4.8 <[email protected]>
🦋 Changeset detectedLatest commit: 3ca022f The changes in this PR will be included in the next version bump. This PR includes changesets to release 16 packages
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
🧪 E2E Test Results❌ Some tests failed Summary
❌ Failed Tests▲ Vercel Production (1 failed)nitro (1 failed):
🐘 Local Postgres (1 failed)nextjs-turbopack-stable-lazy-discovery-enabled (1 failed):
Details by Category❌ ▲ Vercel Production
✅ 💻 Local Development
✅ 📦 Local Production
❌ 🐘 Local Postgres
✅ 🪟 Windows
✅ 📋 Other
❌ Some E2E test jobs failed:
Check the workflow run for details. |
📊 Benchmark Results
workflow with no steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Express | Nitro | Next.js (Turbopack) workflow with 1 step💻 Local Development
▲ Production (Vercel)
🔍 Observability: Express | Next.js (Turbopack) | Nitro workflow with 10 sequential steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Express | Nitro | Next.js (Turbopack) workflow with 25 sequential steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Express | Next.js (Turbopack) | Nitro workflow with 50 sequential steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Nitro | Next.js (Turbopack) | Express Promise.all with 10 concurrent steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Express | Nitro | Next.js (Turbopack) Promise.all with 25 concurrent steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Express | Nitro | Next.js (Turbopack) Promise.all with 50 concurrent steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Nitro | Next.js (Turbopack) | Express Promise.race with 10 concurrent steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Nitro | Next.js (Turbopack) | Express Promise.race with 25 concurrent steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Express | Nitro | Next.js (Turbopack) Promise.race with 50 concurrent steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Express | Nitro | Next.js (Turbopack) workflow with 10 sequential data payload steps (10KB)💻 Local Development
▲ Production (Vercel)
🔍 Observability: Nitro | Next.js (Turbopack) | Express workflow with 25 sequential data payload steps (10KB)💻 Local Development
▲ Production (Vercel)
🔍 Observability: Express | Nitro | Next.js (Turbopack) workflow with 50 sequential data payload steps (10KB)💻 Local Development
▲ Production (Vercel)
🔍 Observability: Express | Nitro | Next.js (Turbopack) workflow with 10 concurrent data payload steps (10KB)💻 Local Development
▲ Production (Vercel)
🔍 Observability: Express | Nitro | Next.js (Turbopack) workflow with 25 concurrent data payload steps (10KB)💻 Local Development
▲ Production (Vercel)
🔍 Observability: Express | Nitro | Next.js (Turbopack) workflow with 50 concurrent data payload steps (10KB)💻 Local Development
▲ Production (Vercel)
🔍 Observability: Nitro | Express | Next.js (Turbopack) Stream Benchmarks (includes TTFB metrics)workflow with stream💻 Local Development
▲ Production (Vercel)
🔍 Observability: Express | Next.js (Turbopack) | Nitro stream pipeline with 5 transform steps (1MB)💻 Local Development
▲ Production (Vercel)
🔍 Observability: Next.js (Turbopack) | Nitro | Express 10 parallel streams (1MB each)💻 Local Development
▲ Production (Vercel)
🔍 Observability: Nitro | Express | Next.js (Turbopack) fan-out fan-in 10 streams (1MB each)💻 Local Development
▲ Production (Vercel)
🔍 Observability: Express | Next.js (Turbopack) | Nitro SummaryFastest Framework by WorldWinner determined by most benchmark wins
Fastest World by FrameworkWinner determined by most benchmark wins
Column Definitions
Worlds:
|
CI failure triage — pre-existing Vercel-prod e2e flake (not a regression)The two red checks ( Scope is wrong for a hydration regression. This PR only memoizes primitive step-result hydration. A determinism/stale-value bug there would surface across all worlds — yet every local suite is 100% green:
Only 2 failures, only on Vercel Production. The two failing tests are unrelated to result hydration, and are abort/hook timing races:
The same test is red on plain
And on main run 27657696161 ( Local verification of this branch (rebuilt Re-running the e2e jobs should clear them. No code change is warranted. |
TooTallNate
left a comment
There was a problem hiding this comment.
Approve — the O(N²)→O(N) hydration memoization, done with the right safety bias
This is the most safety-sensitive of the four (it's the only one that caches a value across replays), and the design lands on the conservative side of every judgment call, which is exactly right for replay determinism.
The primitives-only decision is the crux and it's correct. hydrateStepReturnValue (devalue.parse) returns a fresh object graph each call, and each replay runs in a fresh VM, so today the workflow gets a brand-new value every replay. Caching and returning the same object reference would let const r = await step(); r.count++ observe a prior replay's mutation — silent divergence. The alternatives are both worse: structuredClone is lossy for reviver-produced specials (stream handles, step-fn proxies, Request/Response, AbortController/Signal) and still O(size). Restricting the cache to primitives (immutable, compared by value) makes "share the reference" provably indistinguishable from re-parsing, and non-primitives fall through to a full re-hydrate every replay — preserving current behavior exactly. Trading the object-case optimization for airtight determinism is the right call.
What I verified in the integration:
- Surgical wrap: only the
await hydrateStepReturnValue(...)call is replaced;pendingDeliveries++/--, thectx.promiseQueue.then(...)slot, andresolve(...)are byte-for-byte unchanged. The lookup helper always returns a Promise and awaits even on the miss path, so a cache hit occupies the identical position in the ordered delivery chain a re-hydrate would have — preserving thependingDeliveries-gated suspension/barrier/Promise.racedeterminism. - Lifetime: cache is created once per invocation in
runtime.ts(outside the per-iteration context), threaded throughrunWorkflow, never shared across runs. The optional param/context field degrades to re-hydrating-every-replay for harnesses that omit it. - Keying by
step_completedeventId(stable, world-assigned, same immutable bytes every replay) is sound, andhas()rather thanget() !== undefinedcorrectly treats a memoizedundefinedresult as a hit. - Errors never cached — a rejected hydrate re-attempts next replay, no parked rejected promise.
Built @workflow/core; full suite green (1249) including the new step-hydration-cache.test.ts / step-hydration-memoization.test.ts, async-deserialization-ordering.test.ts, and workflow.test.ts.
One forward-looking note (non-blocking): the object case is left on the table by design. If it's ever revisited, the safe path is deep-freeze + share frozen graphs — but only after handling the reviver-produced specials, so I agree with deferring it. LGTM.
VaguelySerious
left a comment
There was a problem hiding this comment.
AI review: no blocking issues
| // Only memoize values that are safe to return by reference across replays. | ||
| // Non-primitives fall through and are re-hydrated fresh on every replay. | ||
| if (isMemoizablePrimitive(value)) { | ||
| cache.set(eventId, value); |
There was a problem hiding this comment.
AI Review: Note
The per-run stepHydrationCache is never size-bounded or evicted: it grows one entry per primitive-returning completed step and lives for the entire invocation. The cost worth calling out (the PR doesn't) is the new residency — the decrypted/devalue-parsed plaintext of each cached primitive is now held for the whole invocation, on top of the serialized bytes already retained in cachedEvents. For a long sequential workflow whose steps return large strings, that roughly doubles peak retained memory for those results.
The dominant residency (the full event log in cachedEvents) already exists, so this isn't blocking, but a byte-size threshold would be a cheap safeguard — large primitives are exactly the cheap-to-re-hydrate case relative to their footprint, so letting them fall through to the existing re-hydrate path costs little and bounds the worst case. Ideally with a test asserting the bound.
There was a problem hiding this comment.
Good call — capped it, with a test. Pushed in a60baad.
What changed (step-hydration-cache.ts):
- Added
MAX_MEMOIZED_PRIMITIVE_LENGTH = 4096and extendedisMemoizablePrimitiveso astring/bigintlonger than 4 KiB is treated as non-memoizable. Those are the only primitive types that can carry a large payload — number/boolean/null/undefined/symbol are inherently small, so they're never length-checked. Oversized values now fall through to the existing per-replay re-hydrate path, exactly as you suggested: large primitives are cheap to re-hydrate relative to their footprint, so this caps the doubled-residency worst case at negligible cost. - Documented the memory characteristic on the cache module: per-invocation lifetime (fresh
Mapper run inruntime.ts, GC'd when the invocation returns), bounded by the number of primitive-returning completed steps, primitives-only, now byte-bounded.
Tests (step-hydration-cache.test.ts, +4): isMemoizablePrimitive true at the bound / false beyond it (string and bigint), and an end-to-end assertion that an oversized string re-hydrates on every replay and cache.size === 0 (the bound assertion you asked for); plus an at-bound string is a cache hit.
The cap only ever reduces what gets cached, so determinism is untouched — oversized values just take the already-correct re-hydrate path. Full core suite green (1253, incl. the ordering/determinism + memoization suites); biome + tsc clean.
On consistency with #2471 (the sibling scriptCache): noting the distinction since they're bounded for different reasons. #2471's cache is process-wide and monotonic across the whole process — in dev/watch it pins every historical bundle string (hundreds of MB over a session), which is a genuine regression vs. the prior keep-only-latest behavior, hence the Blocking bound there. This cache is per-invocation and freed wholesale when the run returns, so it can never accumulate across runs; the only real cost is the doubled residency for large primitives during one run, which the size cap here now bounds. Different scope, different severity, but both bounded now.
Address the review note that the per-run step hydration cache was never size-bounded: cached entries hold the decrypted/parsed plaintext of a primitive step result for the whole invocation, on top of the serialized bytes already retained in cachedEvents, so a long run returning large strings could roughly double peak retained memory for those results. Document the cache's memory characteristic (per-invocation, freed when the invocation ends, bounded by primitive-returning step count) and cap the only primitive types that can carry a large payload: string/bigint results longer than MAX_MEMOIZED_PRIMITIVE_LENGTH (4 KiB) fall through to the existing per-replay re-hydrate path instead of being memoized. Large payloads are cheap to re-hydrate relative to their footprint, so this caps the worst case at negligible cost. Other primitives are inherently small and always memoized. The cap only ever reduces what is cached, so deterministic replay is unaffected: oversized values take the already-correct re-hydrate path. Co-Authored-By: Claude Opus 4.8 <[email protected]>
…-hydration * origin/main: perf(core): lazy inline step start (save one world round-trip per step) (#2478) perf(core): skip per-step events.list via inline event-log delta (#2475) Version Packages (beta) (#2491) [world-vercel] Honor hasMore flag from v4 list pagination endpoint (#2486) Version Packages (beta) (#2451) Fix Next workflow module specifier root (#2455) [world-vercel] Send remoteRefBehavior=lazy on v4 metadata-only event listings (#2415) [swc-plugin] Fix eager discovery for object property steps (#2484) fix(web-shared): align attributes panel styling (#2483) [web-shared] Auto-scroll trace viewer on J/K span navigation (#2366) fix(web): render restarted step segment as solid gray, not running stripes (#2480) fix(web-shared): use solid gray for queued trace segment (#2474) Add trace viewer span markers for hooks and attributes (#2452) test: support Vercel protection bypass secret in e2e headers (#2458) fix(core): bump payload-compression cutoff to 5.0.0-beta.18 (#2470) Co-Authored-By: Claude Opus 4.8 <[email protected]> # Conflicts: # packages/core/src/runtime.ts
Summary
The inline replay loop (
runtime.ts→runWorkflow,workflow.ts) re-executes the workflow body and re-consumes the full event log on every iteration. For each already-completed step, the step consumer (step.ts,step_completedpath) re-ranhydrateStepReturnValue— AES-GCM decrypt + devalue-parse of the serialized result — on every replay, even though that exact result was already hydrated on every prior replay.For a sequential workflow of N steps, replay K hydrates K results, so the aggregate cost across a single invocation is O(N²) decrypt+parse operations.
This PR adds a per-run memoization cache so a completed step's hydrated result is returned in O(1) on subsequent replays within the same invocation, making the aggregate cost O(N).
Before / after
Cache scope & keying
runtime.ts(created once per run invocation, alongsidecachedEvents), threaded intorunWorkflow(..., stepHydrationCache?)and stored onWorkflowOrchestratorContext.stepHydrationCache. A fresh context is created each loop iteration, so the cache deliberately lives outside the per-iteration context to survive across iterations of the same run. It is never shared across unrelated runs or process-level invocations.step_completedevent'seventId— a stable, world-assigned id. The same event carries the same immutable serialized bytes across every replay, so a hit is guaranteed to correspond to identical input.runWorkflow(...)unit tests) degrade to re-hydrating every replay — identical to previous behavior.Memory characteristic
A cached entry holds the decrypted/devalue-parsed plaintext of a step result, retained for the rest of the invocation on top of the serialized bytes already held in
cachedEvents— so for large primitive results it roughly doubles peak retained memory for those results during the run. This residual is:Mapis created per run and GC'd when the invocation returns; nothing accumulates across runs or process-level invocations (a much weaker concern than a process-wide cache, where the dominant residency — the full event log incachedEvents— already exists for the same lifetime).string/bigintresult longer thanMAX_MEMOIZED_PRIMITIVE_LENGTH(4 KiB) is not memoized — it falls through to the existing per-replay re-hydrate path. Large payloads are cheap to re-hydrate relative to their footprint, so this caps the worst case at negligible cost. The cap only ever reduces what is cached, so deterministic replay is unaffected.Ordering safety analysis
The cache lookup replaces only the
await hydrateStepReturnValue(...)call inside the existingctx.promiseQueue.then(async () => { ... })slot. Everything else is byte-for-byte unchanged:ctx.pendingDeliveries++/--accounting is untouched.promiseQueueslot, at the same log position, and still resolves via the sameresolve(...).Promiseandawaits even on the miss path, so a cache hit occupies the exact position in the ordered delivery chain a re-hydrate would have.So delivery order,
pendingDeliveries-gated suspensions, thependingDeliveryBarriers/awaitEarlierDeliveriesmachinery, andPromise.race/Promise.allreplay determinism are all unaffected. Hook, wait, and abort hydration paths are intentionally not cached (they're the ordering-sensitive paths and not the O(N²) hotspot).Identity / immutability safety
hydrateStepReturnValue(devalue.parse) returns a fresh object graph on every call, and each replay iteration runs in a fresh workflow VM. Today the workflow therefore receives a brand-new value on every replay. If we cached and returned the same object reference across replays, workflow code that mutates a step result (const r = await step(); r.count++) would observe a previous replay's mutation on the next replay — a non-deterministic divergence. (structuredCloneon each hit is both lossy — revivers reconstruct stream handles, step-function proxies, Request/Response, and AbortController/AbortSignal class instances — and still O(size).)Decision: only primitives are memoized (string, number, boolean, bigint, symbol, null, undefined). Primitives are immutable and compared by value, so sharing the reference is provably indistinguishable from re-parsing. Any non-primitive result falls through to a full re-hydrate every replay, preserving current behavior exactly. Errors are never cached, so a rejected hydrate re-attempts on the next replay (no parked rejected promise). This trades away the optimization in the object-returning case to keep deterministic replay airtight — correctness over speed.
What I verified
step-hydration-cache.test.ts(14 tests: primitive detection, memoization, non-primitive eviction/fresh-object, falsy primitives, keying, error non-caching, no-cache passthrough, plus the size-bound — at-bound string is a hit, oversized string/bigint are not memoized andcache.sizestays 0) andstep-hydration-memoization.test.ts(3 tests through the realcreateUseStepconsumer: hydrate-skipped-on-replay-2 via spy, event-log ordering preserved on cache hits, fresh object per replay for object results).cd packages/core && pnpm test→ 1253 passed / 56 files, includingasync-deserialization-ordering.test.ts,workflow.test.ts(79 tests),runtime.test.ts,hook-sleep-interaction,abort-consistency. No regressions.pnpm build(full repo, 27/27),@workflow/corebuild +tsc --noEmitclean; Biome format applied; new files Biome-clean (the only lint errors were import-ordering, auto-fixed; remaining warnings are pre-existingnoExcessiveCognitiveComplexityon functions I only edited).nextjs-turbopackdev server, the determinism-sensitive subset):promiseAllWorkflow,promiseRaceWorkflow,promiseAnyWorkflow,sleepWinsRaceWorkflow,stepWinsRaceWorkflow,promiseRaceStressTestWorkflow,hookWorkflow,webhookWorkflow, parallel-steps-then-webhook replay race,sleepingWorkflow,parallelSleepWorkflow, retry/error/catchability suite,fetchWorkflow— all passed.Risks / deferred
string/bigintresults are intentionally not memoized to bound peak retained memory (see Memory characteristic); they re-hydrate each replay.🤖 Generated with Claude Code