
mutual TCO: trampoline tail-edge SCCs (lifts broker's 175-frame cliff)#22

Merged
samuelSavanovic merged 1 commit into main from mutual-tco-scc-trampoline
May 10, 2026


@samuelSavanovic
Owner

Summary

  • Compiler now finds tail-edge SCCs of size ≥ 2 via Tarjan and emits each viable member as a gem_fn_<name>_body plus a thin trampoline-loop wrapper. Intra-SCC tail calls set a global TLB (gem_tail_fn, gem_tail_args, gem_tail_argc, gem_tail_env) and return instead of doing a real C call, so a 7-fn cycle iterates at constant stack depth.
  • Direct self-tail recursion still uses the existing while(1)+continue shape inside the body — only cross-fn intra-SCC tail calls take the trampoline.
  • Lifts the STOMP broker's 175-frame writer-cycle ceiling (see examples/stomp_broker/NOTES.md "Milestone 6, second pass" for the new sweep numbers — cells previously dead at delivered ≈ N×175 now complete; cells that hit a different cliff survive 3× longer).
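A minimal C sketch of the trampoline shape described above, using a two-function even/odd cycle. The global names (gem_tail_fn, gem_tail_args, gem_tail_argc, GEM_MAX_TAIL_ARGS, the gem_fn_<name>_body naming) come from this PR; the `gem_val` type, the body signature, the `tail_call` and `trampoline` helpers, and the `GEM_NIL` stand-in are assumptions made for illustration, and gem_tail_env, the arena reset, and frame popping are left out. It is not the code the compiler actually emits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GEM_MAX_TAIL_ARGS 16
#define GEM_NIL ((gem_val)0)          /* stand-in for the runtime's nil */

typedef int64_t gem_val;              /* assumed value type */
typedef gem_val (*gem_body_fn)(gem_val *args, int argc);

/* Global TLB: describes the pending intra-SCC tail call, if any. */
static gem_body_fn gem_tail_fn = NULL;
static gem_val     gem_tail_args[GEM_MAX_TAIL_ARGS];
static int         gem_tail_argc = 0;

static gem_val gem_fn_is_even_body(gem_val *args, int argc);
static gem_val gem_fn_is_odd_body(gem_val *args, int argc);

/* Record a tail call in the TLB and return instead of making a C call. */
static gem_val tail_call(gem_body_fn fn, gem_val arg) {
    gem_tail_fn = fn;
    gem_tail_args[0] = arg;
    gem_tail_argc = 1;
    return GEM_NIL;                   /* ignored while gem_tail_fn is set */
}

static gem_val gem_fn_is_even_body(gem_val *args, int argc) {
    (void)argc;
    if (args[0] == 0) return 1;
    return tail_call(gem_fn_is_odd_body, args[0] - 1);
}

static gem_val gem_fn_is_odd_body(gem_val *args, int argc) {
    (void)argc;
    if (args[0] == 0) return 0;
    return tail_call(gem_fn_is_even_body, args[0] - 1);
}

/* Trampoline loop: run bodies until one returns without a pending call. */
static gem_val trampoline(gem_body_fn fn, gem_val arg) {
    gem_val args[GEM_MAX_TAIL_ARGS];
    int argc = 1;
    args[0] = arg;
    for (;;) {
        gem_tail_fn = NULL;
        gem_val result = fn(args, argc);
        if (gem_tail_fn == NULL) return result;   /* real return value */
        assert(gem_tail_argc <= GEM_MAX_TAIL_ARGS);
        fn = gem_tail_fn;                         /* next SCC member */
        argc = gem_tail_argc;
        for (int i = 0; i < argc; i++) args[i] = gem_tail_args[i];
    }
}

/* Thin wrappers: what callers outside the SCC would see. */
gem_val gem_fn_is_even(gem_val n) { return trampoline(gem_fn_is_even_body, n); }
gem_val gem_fn_is_odd(gem_val n)  { return trampoline(gem_fn_is_odd_body, n); }
```

In this sketch, `gem_fn_is_even(1000000)` returns 1 after a million hops between the two bodies, and the C stack never holds more than the wrapper plus one body frame.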

What's in the diff

  • runtime/gem.h + runtime/gem_error.c: GEM_MAX_TAIL_ARGS = 16, plus the four globals for the TLB.
  • compiler/codegen.gem:
    • find_tail_call_sccs (Tarjan + viability filter; silently skips the whole SCC if any member has rest params, defaults, boxed params, name-shadow, or > 16 params).
    • scc_wrapper_for emits the trampoline loop with per-iteration arena reset (gated like emit_tco_continue).
    • emit_scc_tail_call writes the TLB, pops the body's frame, returns GEM_NIL.
    • compile_fn and compile_stmt_return wired to detect SCC membership and intra-SCC tail-call sites.
    • SCC body forward decl added next to the existing wrapper forward decl.
  • examples/96_mutual_tco.gem: 2-cycle (even/odd, 20k iters), 3-cycle (broker shape, 50k), and a mixed direct-self-TCO + mutual case (10k). Output appended to expected_output.txt.
  • bootstrap/stage0.c regenerated; make bootstrap roundtrip clean on first pass.
  • Docs: examples/stomp_broker/NOTES.md second-pass writeup; docs/ROADMAP.md "Deep non-tail recursion ceiling" downgraded P1 → P3 with what's left (catchable stack-overflow, growable stacks); docs/OPTIMIZATIONS_LOG.md shipped entry under Codegen Output; CLAUDE.md Key Decisions updated.
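For readers unfamiliar with the Tarjan step behind find_tail_call_sccs, here is a rough standalone sketch in C. The real pass lives in compiler/codegen.gem and also applies the viability filter; everything named here (`tail_graph`, `tail_graph_init`, `add_tail_edge`, `find_sccs`, `needs_trampoline`, the fixed `MAX_FNS` bound, the adjacency-matrix layout) is hypothetical. It only shows how SCCs of size ≥ 2 separate from self-loops and acyclic callers.

```c
#include <assert.h>
#include <string.h>

#define MAX_FNS 32

typedef struct {
    int n;                                  /* number of fn_defs */
    unsigned char edge[MAX_FNS][MAX_FNS];   /* edge[a][b]: a tail-calls b */
    int index[MAX_FNS], low[MAX_FNS], on_stack[MAX_FNS];
    int stack[MAX_FNS], sp, next_index;
    int comp[MAX_FNS];                      /* output: SCC id per fn */
    int comp_size[MAX_FNS];                 /* output: member count per SCC */
    int ncomp;
} tail_graph;

void tail_graph_init(tail_graph *g, int n) {
    assert(n >= 0 && n <= MAX_FNS);
    memset(g, 0, sizeof *g);
    g->n = n;
}

void add_tail_edge(tail_graph *g, int from, int to) {
    assert(from >= 0 && from < g->n && to >= 0 && to < g->n);
    g->edge[from][to] = 1;
}

/* Classic recursive Tarjan visit. */
static void strongconnect(tail_graph *g, int v) {
    g->index[v] = g->low[v] = g->next_index++;
    g->stack[g->sp++] = v;
    g->on_stack[v] = 1;
    for (int w = 0; w < g->n; w++) {
        if (!g->edge[v][w]) continue;
        if (g->index[w] < 0) {              /* tree edge: recurse */
            strongconnect(g, w);
            if (g->low[w] < g->low[v]) g->low[v] = g->low[w];
        } else if (g->on_stack[w]) {        /* back edge into current SCC */
            if (g->index[w] < g->low[v]) g->low[v] = g->index[w];
        }
    }
    if (g->low[v] == g->index[v]) {         /* v is the root of an SCC */
        int id = g->ncomp++, w;
        do {
            w = g->stack[--g->sp];
            g->on_stack[w] = 0;
            g->comp[w] = id;
            g->comp_size[id]++;
        } while (w != v);
    }
}

void find_sccs(tail_graph *g) {
    g->sp = g->next_index = g->ncomp = 0;
    for (int v = 0; v < g->n; v++) {
        g->index[v] = -1;
        g->on_stack[v] = 0;
        g->comp_size[v] = 0;
    }
    for (int v = 0; v < g->n; v++)
        if (g->index[v] < 0) strongconnect(g, v);
}

/* Only SCCs with two or more members are trampoline candidates;
   a self-loop alone stays on the while(1)+continue path. */
int needs_trampoline(const tail_graph *g, int fn) {
    return g->comp_size[g->comp[fn]] >= 2;
}
```

With edges 0→1, 1→2, 2→0, 3→3 and 4→0, functions 0 to 2 land in one SCC of size 3 and qualify. Function 3 (self-tail only) and function 4 (an outside caller) do not.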

Test plan

  • make test — all 124 examples + LSP smokes green.
  • make bootstrap — clean roundtrip on first pass; clean rebuild from new stage0.c passes the full suite.
  • bash examples/stomp_broker/smoke_test.sh — 4/4 pass.
  • bash benchmarks/stomp/sweep.sh — sweep numbers in NOTES.md "Milestone 6, second pass".
  • Sanity: examples/96_mutual_tco.gem exercises 2-/3-/5-fn SCCs plus a member with both direct-self TCO and cross-fn cycles in the same body.

Not in this PR

  • 1k-fanout characterization (needs GEM_MAX_PROCS=4096 build).
  • Slow-consumer to OOM and 100-publisher queue test.
  • Backpressure design.

These are the next-tier issues the second-pass cliff (mailbox / process-table) surfaces; per the resume prompt, they're for a later session.

…cliff

Direct self-tail recursion was already collapsed into while(1)+continue;
mutual cycles weren't, so the STOMP broker's 7-fn writer_loop ↔
handle_frame ↔ handle_<command> SCC overflowed minicoro's 256 KB stack at
exactly 175 frames per connection (see examples/stomp_broker/NOTES.md
"Milestone 6: lived experience"). The compiler now finds these cycles via
Tarjan's SCC over the fn_def→fn_def tail-edge graph it already builds for
mark_process_tail. Every viable SCC member emits as gem_fn_<name>_body
plus a thin trampoline wrapper; an intra-SCC tail call writes a global
TLB (gem_tail_fn / gem_tail_args / gem_tail_argc / gem_tail_env) and
returns instead of doing a real C call, so the wrapper iterates at
constant depth.
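
For contrast with the trampoline, the direct self-tail shape mentioned at the start of this message looks roughly like the following in C. This is an illustrative sketch only: `gem_fn_sum_to` is a hypothetical accumulator function that tail-calls itself with (n - 1, acc + n), and the plain int64_t parameters stand in for whatever value type the runtime really uses.

```c
#include <stdint.h>

/* A self-tail call becomes "rebind the parameters, then continue",
   so no new C frame is pushed per iteration. */
int64_t gem_fn_sum_to(int64_t n, int64_t acc) {
    while (1) {
        if (n == 0) return acc;

        /* Evaluate all new arguments before overwriting any parameter,
           since later arguments may read earlier parameters. */
        int64_t next_n   = n - 1;
        int64_t next_acc = acc + n;
        n   = next_n;
        acc = next_acc;
        continue;
    }
}
```

This works because the call target is the enclosing function itself. A tail call to a different SCC member has no loop to continue into, which is the gap the trampoline wrapper fills.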

Sweep cells previously at delivered ≈ N×175 (100×200, 50×500, 200×100)
now complete cleanly. Cells that hit a different cliff (500×100,
10×1000) survive 3× longer but still die — those are mailbox or
process-table issues, the next-tier bottleneck the original M6 prompt
anticipated would surface once the recursion ceiling was lifted. Full
numbers in NOTES.md "Milestone 6, second pass".
@samuelSavanovic samuelSavanovic enabled auto-merge (squash) May 10, 2026 14:05
@samuelSavanovic samuelSavanovic merged commit 4c9d7b8 into main May 10, 2026
2 checks passed