fix: name registry must own keys — strdup mode in stb_ds#23

Merged
samuelSavanovic merged 1 commit into main from stomp-second-pass-cliff on May 10, 2026
Conversation

@samuelSavanovic
Owner

Summary

  • The second-pass STOMP sweep cliff (s500x100x256 and s10x1000x256 still DIED after mutual TCO) was a dangling-pointer SEGV in the global name registry. gem_register_name stored args[0].sval verbatim in stb_ds. Destination names live in the registry gen_server's arena, so the per-iteration arena reset (triggered once more than ~1 MB is allocated, easily reached past 25k messages routed through registry.lookup_or_create) turned every key into a dangling pointer. The next whereis then ran strcmp against freed memory and crashed with SIGSEGV. The fix is sh_new_strdup(gem_name_registry) at scheduler init.
  • Diagnostics added to disambiguate the silent-death failure mode: RSS sampler in sweep.sh, gem_diag atexit line counting gem_spawn_fn overflows (gated on GEM_DIAG=1), and the harness now runs the broker binary directly so wait $bpid captures its exit status / signal cleanly.
  • Lived-experience writeup appended to examples/stomp_broker/NOTES.md under "Third pass".

Test plan

  • make test (125 examples, including the new 97_register_arena_reset.gem, which exits with SIGSEGV without the fix).
  • make bootstrap (clean roundtrip).
  • bash benchmarks/stomp/sweep.sh — every cell delivers 100% post-fix:
    • s500x100x256: 50,000/50,000 @ 62,524 msg/s (was 26,999/50,000 DIED)
    • s10x1000x256: 10,000/10,000 @ 18,463 msg/s (was 5,439/10,000 DIED)

…stry

The second-pass STOMP sweep cliff (s500x100x256, s10x1000x256 still
DIED after mutual-TCO landed) was a dangling-pointer SEGV in the
global name registry, not mailbox bloat or process-table exhaustion.

`gem_register_name` was storing the caller's `args[0].sval` pointer
verbatim in stb_ds (default mode, no strdup). Destination names like
"/topic/foo" arrive at the registry gen_server as deep-copied
`lookup_or_create` messages, so the keys live in the registry's arena.
Once the registry's per-iteration arena reset fires (>1MB allocated at
the back-edge — easily reached at ~25k+ messages routed through
`registry.lookup_or_create` from `connection.gem`'s SEND path), every
key in stb_ds becomes a dangling pointer. The next `whereis` walks
them and segfaults in strcmp. The bug was always there; mutual TCO
lifted the 175-frame writer cliff that masked it.

Diagnostics added to surface this kind of failure in the future:

  - benchmarks/stomp/sweep.sh now compiles the broker once and runs
    the binary directly so `wait $bpid` captures the broker's exit
    status (signal vs clean exit) instead of the gem launcher's.
  - RSS sampler ported from run.sh — peak/start/end summary inline
    per cell.
  - `gem_diag` atexit line in gem_scheduler.c counts spawn-overflow
    returns (`gem_spawn_fn` -> -1 on full proc table) and prints
    proc_hwm. Gated on GEM_DIAG=1 to keep default test runs quiet.

Verified: examples/97_register_arena_reset.gem registers names built
from string concat, then forces an arena reset by buf_pushing ~2MB
between tail iterations. With the fix removed, it exits with signal 11;
with the fix in, it prints `nil nil ok` and `make test` passes (125
examples).

Sweep, post-fix — every cell delivers 100%:

  s500x100x256   delivered=50000/50000   fanout=62524 msg/s
  s10x1000x256   delivered=10000/10000   fanout=18463 msg/s

The same pattern in tables (`t->str_index`) is fine — keys there are
the same GemVal entries stored in `t->keys[]`, so they share an arena
with the table and die together. Only the global name registry takes
strings from foreign arenas.

Full lived-experience writeup appended to examples/stomp_broker/NOTES.md
under "Third pass".
samuelSavanovic enabled auto-merge (squash) on May 10, 2026 15:07
samuelSavanovic merged commit 015624b into main on May 10, 2026
2 checks passed