perf(cache): passthrough uncached source files instead of tar-packing #127
Merged
An uncached `@heph/fs:file` target (one per source file) tar-packed its
single source file into the in-memory tmp cache on the "cache write" hot
path. The pack does a synchronous file read INLINE on the tokio worker
pool (`block_or_inline` is inline on Linux), so at CI scale thousands of
these saturate the disk and starve the runtime — a tiny go.mod "cache
write" was observed taking 14s. The work is also pure waste: the artifact
just re-exposes an immutable workspace file.
Carry produced outputs through the result pipeline as a new `ResultArtifact`
{ content: Arc<dyn Content>, group, r#type } instead of `Vec<CacheArtifact>`.
A producer sets `ContentFile.passthrough` when `source_path` is a durable
workspace file; `execute_and_cache_inner` then partitions such outputs out of
`cache_locally` entirely and carries them as their raw `OutputArtifact` (which
already implements `Content`, walking to the single source file at its
`out_path`). No file read, no tar, no copy, no manifest, no `CacheArtifact` —
and no LocalCacheWrite span, so it no longer shows up as "cache write" at all.
`seekable_reader`/`file_path` stay `None`, so the FUSE tar-index path is
bypassed and consumers materialize via the generic unpack-from-`walk()` path.
Gated two ways: the producer flag (other drivers' `Content::File` points into
sandboxes cleaned after caching, which would dangle) and `tmp` (a cacheable
revision must own a durable copy of its bytes, since `source_path` may change
across runs). The flag is Rust-only — it does not cross the plugin ABI yet, so
out-of-process plugins always pack.
Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
A passthrough source artifact (`@heph/fs:file`/`fs:glob`) is referenced by path and read live on consume, never snapshotted into the cache. If the workspace file is modified between when it was hashed (the value folded into the target's `hashin` cache key) and when a consumer reads it, the live bytes silently diverge from the cache key, poisoning every downstream entry.

Wrap the passthrough content in `PassthroughContent`, whose reader/walk tee the bytes through a `VerifyingReader` that re-hashes as they stream (no extra I/O, since the consumer reads them anyway) and, at EOF, compares the digest against the recorded `hashout`. A mismatch returns an explicit `InvalidData` error naming the file, turning silent corruption into a hard failure. The hash is byte-for-byte identical to `hwalk::file_hashout` (xxh3 over content + exec-bit marker); a guard test pins them together.

`seekable_reader`/`file_path` stay `None`, so FUSE bails the slot to the unpack-into-upper fallback (`io::copy` over `walk()`), routing FUSE through the same verified copy as OS mode.

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
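For illustration, the verify-on-read idea described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the PR's code: FNV-1a stands in for xxh3 so the example needs no external crate, the exec-bit marker is omitted, and the field names and the `digest`/`read_verified` helpers are hypothetical. Only the name `VerifyingReader` and the `InvalidData` error kind come from the commit message.

```rust
use std::io::{self, Read};

// FNV-1a 64-bit constants. ASSUMPTION: a dependency-free stand-in for xxh3.
const FNV_OFFSET: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// Folds `bytes` into a running FNV-1a state, one byte at a time.
fn fnv1a_update(state: u64, bytes: &[u8]) -> u64 {
    bytes
        .iter()
        .fold(state, |h, b| (h ^ u64::from(*b)).wrapping_mul(FNV_PRIME))
}

/// Hypothetical helper: the digest recorded when the file was first hashed.
fn digest(bytes: &[u8]) -> u64 {
    fnv1a_update(FNV_OFFSET, bytes)
}

/// Re-hashes bytes as they stream through and checks the digest at EOF.
struct VerifyingReader<R> {
    inner: R,
    state: u64,
    expected: u64,
    name: String,
    checked: bool,
}

impl<R: Read> VerifyingReader<R> {
    fn new(inner: R, expected: u64, name: impl Into<String>) -> Self {
        Self { inner, state: FNV_OFFSET, expected, name: name.into(), checked: false }
    }
}

impl<R: Read> Read for VerifyingReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let n = self.inner.read(buf)?;
        if n > 0 {
            // Hash exactly the bytes handed to the consumer: no second read.
            self.state = fnv1a_update(self.state, &buf[..n]);
        } else if !buf.is_empty() && !self.checked {
            // Zero bytes into a non-empty buffer means EOF: compare once.
            self.checked = true;
            if self.state != self.expected {
                return Err(io::Error::new(
                    io::ErrorKind::InvalidData,
                    format!("{}: content changed since it was hashed", self.name),
                ));
            }
        }
        Ok(n)
    }
}

/// Drains `src` through the verifier; returns the bytes only if the digest matched.
fn read_verified<R: Read>(src: R, expected: u64, name: &str) -> io::Result<Vec<u8>> {
    let mut out = Vec::new();
    VerifyingReader::new(src, expected, name).read_to_end(&mut out)?;
    Ok(out)
}
```

Because the check runs only at EOF, a consumer that stops reading early never triggers it; a generic copy loop such as `io::copy` reads to the end and therefore does.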
Problem
CI stat: `//@heph/fs:file@f=mgmt/go/go.mod cache write 14s`. That is a single tiny source file, uncached, taking 14s in the "cache write" phase.

Root cause: an uncached `@heph/fs:file` target (one per source file) tar-packs its single source file into the in-memory tmp cache during cache write. The pack does a synchronous file read inline on the tokio worker pool (`block_or_inline` is inline on Linux). At CI scale, thousands of these saturate the disk and starve the async runtime, so each tiny read stalls for seconds. The work is also pure waste: the artifact just re-exposes an immutable workspace file.

Fix
Produced outputs now travel the result pipeline as a new `ResultArtifact { content: Arc<dyn Content>, group, r#type }` instead of `Vec<CacheArtifact>`. A producer sets `ContentFile.passthrough = true` when `source_path` is a durable workspace file; `execute_and_cache_inner` partitions those outputs out of `cache_locally` entirely and carries them as their raw `OutputArtifact`, which already implements `Content`, walking to the single source file at its `out_path`.

Result: no file read, no tar, no copy, no manifest, no `CacheArtifact`, and no `LocalCacheWrite` span, so an `@heph/fs:file` no longer shows up as "cache write" at all. `seekable_reader`/`file_path` stay `None`, so the FUSE tar-index path is bypassed and consumers materialize via the generic unpack-from-`walk()` fallback.

Safety: gated two ways
- Producer flag: other drivers' `Content::File` outputs (pluginexec log, etc.) point into sandboxes cleaned after caching; a path ref would dangle. Only `fs:file`/`fs:glob` source artifacts set it.
- `tmp` only: a cacheable revision must own a durable copy of its bytes, since `source_path` may change across runs.

The flag is Rust-only (no proto field yet), so out-of-process plugins always pack.
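The two gates can be sketched as a single predicate that drives the partition. This is an illustrative sketch only: the `Content` enum, its fields, and `partition_outputs` are hypothetical simplifications, and the signature of `is_passthrough` is assumed (only its name is suggested by the test listed under Tests).

```rust
use std::path::PathBuf;

/// Hypothetical, simplified stand-in for the engine's output content variants.
#[derive(Debug, Clone, PartialEq)]
enum Content {
    /// A file on disk; `passthrough` is the producer flag.
    File { source_path: PathBuf, passthrough: bool },
    /// Any non-file output (assumed variant, for illustration).
    Raw(Vec<u8>),
}

/// Both gates must hold: a `tmp` (non-cacheable) revision AND a producer-flagged file.
fn is_passthrough(tmp: bool, content: &Content) -> bool {
    tmp && matches!(content, Content::File { passthrough: true, .. })
}

/// Splits outputs into (carried by reference, packed into the cache).
fn partition_outputs(tmp: bool, outputs: Vec<Content>) -> (Vec<Content>, Vec<Content>) {
    outputs.into_iter().partition(|c| is_passthrough(tmp, c))
}
```

With `tmp = false` every output lands in the second (packed) list, which matches the rule that a cacheable revision owns a durable copy of its bytes.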
Tests
- `is_passthrough_gates_on_tmp_and_producer_flag`: gating. Only tmp + flagged `Content::File` passes through; cacheable / unflagged / non-file all pack.
- `passthrough_result_artifact_reads_source_without_cache`: a passthrough `ResultArtifact` is never a `CacheArtifact`, carries no seekable/`file_path`, and `walk()` yields the source content at `out_path`.

Local: `lint` clean (clippy `-D warnings` + fmt); engine + builtins + plugin-abi suites green (322 tests incl. 2 new). Full `tst` was disk-bound on the dev box (`No space left on device` in plugingo-e2e codegen tests that build the whole Go stdlib). That failure is unrelated to this change, which reduces cache writes.

🤖 Generated with Claude Code