
Isolate-class workers: wasmtime runtime host + fold-wasm #5

Open
nishu-builder wants to merge 1 commit into main from isolate-runtime-spike


@nishu-builder

Contributor

Fold over an N-node tree issues ~2N serial /run calls at a full container lifecycle each (~324ms p50, traced: ~100% container dispatch). This adds an isolate-class worker mechanism so orchestration-heavy recursive workers run as ~µs wasm isolates instead, with pre/post visitors staying ordinary docker images.

Results (OrbStack aarch64 VM, docker backend)

| fixture | container fold cold | fold-wasm cold | warm (memoized) |
|---|---|---|---|
| synthetic 160 nodes / 120 files (tests/fold-bench) | 61.2s (quiet) – 106.7s (loaded) | 10.9s | 4–5ms |
| real tree: crates/ (60 nodes), via caos-cli | 37.4s | 10.3s | |
  • Isolate instantiation: 3–26µs per fold frame (vs ~324ms+ container dispatch); module compiled once per host.
  • A 24-level-deep chain completes (tests/fold-wasm) — parent frames suspend in run_many holding no slot/thread/container, so the warm-pool deadlock class is impossible by construction.
  • fold-wasm's remaining wall-clock is entirely its container visitors (file-count), fanned out 16-way per frame — orchestration itself is free. Wasm visitors are the obvious next win.
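
As a rough consistency check, the ~2N serial calls × ~324ms cost model from the description can be multiplied out for the two fixture sizes. This is a sketch only: the function name is hypothetical, the per-call figure is the quoted p50, and nothing here is a new measurement.

```rust
use std::time::Duration;

/// Estimated wall-clock for a container fold over `nodes` tree nodes,
/// assuming about 2 serial /run calls per node (the "~2N" in the description)
/// and a fixed per-call cost. Hypothetical helper, not part of the PR.
fn estimated_container_fold(nodes: u32, per_call: Duration) -> Duration {
    per_call * (2 * nodes)
}

/// The two fixture sizes from the results table, with the quoted ~324ms p50.
fn fixture_estimates() -> [(u32, Duration); 2] {
    let per_call = Duration::from_millis(324);
    [
        (160, estimated_container_fold(160, per_call)),
        (60, estimated_container_fold(60, per_call)),
    ]
}
```

The model gives about 103.7s for 160 nodes and about 38.9s for 60 nodes, close to the loaded synthetic run (106.7s) and the real-tree run (37.4s). The quiet synthetic run (61.2s) beats the model, so its per-call cost was presumably below the p50.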

What's in here

  • Runtime-object mechanism (general, not fold-specific): a worker image may be a git tree {".caos-runtime": <blob: host image ref>, "module": <blob>}. compute.rs detects the marker before resolve_image and POSTs the job to a warm caos-isolate-host-{key} container (no SLOT — unlimited concurrent isolates). Memoization, cycle detection, and result pinning are shared with container workers; the marker carries the host image hash, so cache keys cover the runtime version. Future runtimes (TS/V8 host, runner-as-runtime) are additive.
  • crates/isolate-host: musl-static wasmtime 46 host (pooling allocator). Guest ABI caos_abi_v1: job/tree/get/put_blob/put_tree/run/run_many/out/log over two wasm imports (JSON call/read). Deterministic WASI stubs (clock=0, fixed random, no fs/net/env). run_many fans nested runs out on OS threads (16-way, order-preserving). Tree encoding is pinned byte-identical to git's (unit test).
  • crates/isolate-common + crates/fold-wasm (wasm32-wasip1, 140KB): fold ported to the ABI; children fold via one run_many batch; blob children with no pre short-circuit straight to the post request with the canonical empty children tree. Request construction is byte-identical to the container client's — traces confirm container fold's leaf requests cache-hit entries created by fold-wasm (cross-implementation aliasing works).
  • Tracing: caos-trace per /run (parent edge, depth, per-phase ms) + tests/trace-report.py (summary/waterfall); isolate-trace per host job.
  • Packaging/publish: flake builds caos-isolate-host (service image) and .#fold-wasm; build-builtins.sh publishes std entries isolate-host (pins the host image objects) and fold-wasm (the runtime node). Toolchain gains wasm32-wasip1.
  • Tests: tests/fold-wasm (parity with container fold incl. symlink leaf = 31, deep chain, memoized rerun), tests/fold-bench (three-way benchmark). Includes two drive-by fixes for nix flake check under the refreshed toolchain lock (clippy type_complexity in the fly path via a BaseLayers alias; fmt reflow).
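
For readers unfamiliar with the "16-way, order-preserving" fan-out that run_many is described as doing, here is a minimal sketch of one way to get that shape with std only. It is not the isolate-host code: `run_many_sketch`, the shared-index scheduling, and the generic `job` closure are all assumptions made for illustration.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;
use std::thread;

/// Runs `job` over every request on at most `width` OS threads and returns
/// the results in the same order as `requests`. Hypothetical helper.
fn run_many_sketch<Req, Res, F>(requests: &[Req], width: usize, job: F) -> Vec<Res>
where
    Req: Sync,
    Res: Send,
    F: Fn(&Req) -> Res + Sync,
{
    // Shared cursor: each worker claims the next unprocessed index.
    let next = AtomicUsize::new(0);
    // One slot per request, so a result lands at its request's position
    // no matter which worker finishes first.
    let slots: Vec<Mutex<Option<Res>>> =
        requests.iter().map(|_| Mutex::new(None)).collect();
    let workers = width.max(1).min(requests.len());

    thread::scope(|scope| {
        for _ in 0..workers {
            scope.spawn(|| loop {
                let i = next.fetch_add(1, Ordering::Relaxed);
                if i >= requests.len() {
                    break;
                }
                let result = job(&requests[i]);
                *slots[i].lock().unwrap() = Some(result);
            });
        }
    });

    // All workers have joined here, so every slot has been filled.
    slots
        .into_iter()
        .map(|slot| slot.into_inner().unwrap().expect("every slot is filled"))
        .collect()
}
```

Calling `run_many_sketch(&children, 16, |c| fold_child(c))` (with a hypothetical `fold_child`) would fold up to 16 children at a time per frame while keeping the child order stable, which matters when the results feed a canonical children tree.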

Semantics notes / open questions

See design/isolate-runtime-spike.md. Notable: container fold follows symlink-to-dir children, fold-wasm treats them as leaf blobs (parity pinned only for symlink-to-file, where both agree — arguably the container behavior is the bug). Also open: visitor strategy (wasm visitors vs elastic container pool), run_many bound tuning, host-container reaping.
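
To make the symlink difference concrete, this sketch classifies a child purely by its git tree-entry mode, which is one way to end up with the fold-wasm behavior described above (a symlink is a leaf regardless of what it points at). The enum and function names are hypothetical and the actual fold-wasm logic may differ. The mode values are git's standard ones.

```rust
/// How a fold would treat a child entry. Hypothetical type for illustration.
#[derive(Debug, PartialEq, Eq)]
enum ChildKind {
    /// Descend into the child and fold it recursively.
    Recurse,
    /// Treat the child as a leaf blob.
    Leaf,
}

/// Classifies a git tree entry by its octal mode alone, without ever
/// resolving a symlink target. Returns None for modes this sketch does not
/// cover (for example 0o160000 gitlinks, which the PR text does not discuss).
fn classify_by_mode(mode: u32) -> Option<ChildKind> {
    match mode {
        0o040000 => Some(ChildKind::Recurse), // subtree
        0o100644 | 0o100755 => Some(ChildKind::Leaf), // regular or executable file
        0o120000 => Some(ChildKind::Leaf), // symlink: the blob holds the target path
        _ => None,
    }
}
```

Under this rule a symlink to a directory and a symlink to a file are indistinguishable, so both are leaves. A fold that follows symlink-to-dir children would instead need to resolve the target first, which is where the two implementations diverge.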

Verification

  • nix flake check green; 3 isolate-host unit tests; tests/fold-wasm, tests/fold-bench pass.
  • Existing tests/{deep-deps,file-count,dirs-only,symlinks,untracked} all pass — the docker path is untouched.

Related Asana: Make workers fast, Parallelization for fold, Add v8isolate workers?

🤖 Generated with Claude Code

@nishu-builder nishu-builder force-pushed the isolate-runtime-spike branch from 4327419 to c759510 Compare July 3, 2026 19:59