fix: name registry must own keys — strdup mode in stb_ds #23
Merged
The second-pass STOMP sweep cliff (s500x100x256, s10x1000x256 still
DIED after mutual-TCO landed) was a dangling-pointer SEGV in the
global name registry, not mailbox bloat or process-table exhaustion.
`gem_register_name` was storing the caller's `args[0].sval` pointer
verbatim in stb_ds (default mode, no strdup). Destination names like
"/topic/foo" arrive at the registry gen_server as deep-copied
`lookup_or_create` messages, so the keys live in the registry's arena.
Once the registry's per-iteration arena reset fires (more than 1 MB
allocated at the back-edge, easily reached once ~25k or more messages
have been routed through `registry.lookup_or_create` from
`connection.gem`'s SEND path), every key in stb_ds becomes a dangling
pointer. The next `whereis` walks them and segfaults in strcmp. The bug
was always there; mutual TCO lifted the 175-frame writer cliff that
masked it. The fix is `sh_new_strdup(gem_name_registry)` at scheduler
init, so stb_ds copies each key and the registry owns it.
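The lifetime problem can be reproduced in isolation. The sketch below does not use stb_ds or any Gem runtime code: `Arena`, `Registry`, and every function name are hypothetical stand-ins. It stores one key either by borrowing the arena pointer or by copying it, then resets and reuses the arena. Because the toy arena is a fixed buffer, the borrowed key reads stale bytes rather than crashing; in the real runtime the same read can hit freed memory.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical bump arena standing in for a process's per-iteration arena. */
typedef struct {
    char   buf[256];
    size_t used;
} Arena;

/* Copy s into the arena and return the arena-backed pointer. */
static char *arena_strdup(Arena *a, const char *s) {
    size_t n = strlen(s) + 1;
    assert(a->used + n <= sizeof a->buf);
    char *p = a->buf + a->used;
    memcpy(p, s, n);
    a->used += n;
    return p;
}

/* Reset rewinds the bump pointer; old pointers now alias whatever is written next. */
static void arena_reset(Arena *a) { a->used = 0; }

/* Hypothetical one-slot registry; own_key selects the copying behaviour. */
typedef struct {
    char *key;
    int   pid;
    int   own_key;
} Registry;

static void registry_put(Registry *r, char *name, int pid) {
    if (r->own_key) {
        size_t n = strlen(name) + 1;
        char *copy = malloc(n);          /* registry-owned copy of the key */
        assert(copy != NULL);
        memcpy(copy, name, n);
        r->key = copy;
    } else {
        r->key = name;                   /* borrowed: lifetime tied to the arena */
    }
    r->pid = pid;
}

static int registry_whereis(const Registry *r, const char *name) {
    return (r->key && strcmp(r->key, name) == 0) ? r->pid : -1;
}

static void registry_free(Registry *r) {
    if (r->own_key) free(r->key);
    r->key = NULL;
}

/* Returns the pid found for name after the arena has been reset and reused. */
int lookup_after_reset(int own_key, const char *name) {
    Arena a = { .used = 0 };
    Registry r = { .key = NULL, .pid = -1, .own_key = own_key };
    registry_put(&r, arena_strdup(&a, name), 7);
    arena_reset(&a);
    (void)arena_strdup(&a, "/queue/other");  /* overwrites the old key bytes */
    int pid = registry_whereis(&r, name);
    registry_free(&r);
    return pid;
}
```

Calling `lookup_after_reset(0, "/topic/foo")` exercises the borrowed path and no longer finds the name; `lookup_after_reset(1, "/topic/foo")` exercises the owned path and still does.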
Diagnostics added to surface this kind of failure in the future:
- benchmarks/stomp/sweep.sh now compiles the broker once and runs
the binary directly so `wait $bpid` captures the broker's exit
status (signal vs clean exit) instead of the gem launcher's.
- RSS sampler ported from run.sh — peak/start/end summary inline
per cell.
- `gem_diag` atexit line in gem_scheduler.c counts spawn-overflow
returns (`gem_spawn_fn` -> -1 on full proc table) and prints
proc_hwm. Gated on GEM_DIAG=1 to keep default test runs quiet.
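An env-gated atexit diagnostic of the kind described in the last bullet might look roughly like the sketch below. The counter names, function names, and the output format are assumptions made for illustration; the actual code in gem_scheduler.c is not shown in this PR text. Only `getenv`, `atexit`, `snprintf`, and `fprintf` are real library calls.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical counters; the real names in gem_scheduler.c may differ. */
static unsigned long diag_spawn_overflows;
static unsigned long diag_proc_hwm;

/* Record one spawn that failed because the process table was full. */
void diag_note_spawn_overflow(void) { diag_spawn_overflows++; }

/* Track the high-water mark of live processes. */
void diag_note_proc_count(unsigned long live) {
    if (live > diag_proc_hwm) diag_proc_hwm = live;
}

/* Format the summary line into out; returns snprintf's result. */
int diag_format(char *out, size_t cap) {
    assert(out != NULL && cap > 0);
    return snprintf(out, cap, "gem_diag: spawn_overflow=%lu proc_hwm=%lu",
                    diag_spawn_overflows, diag_proc_hwm);
}

/* atexit hook: print the summary line to stderr. */
static void diag_print_at_exit(void) {
    char line[128];
    diag_format(line, sizeof line);
    fprintf(stderr, "%s\n", line);
}

/* Install the hook only when GEM_DIAG=1; returns 1 if installed, else 0. */
int diag_init(void) {
    const char *flag = getenv("GEM_DIAG");
    if (flag == NULL || strcmp(flag, "1") != 0) return 0;
    return atexit(diag_print_at_exit) == 0;
}
```

Keeping the formatting in `diag_format` separate from the hook makes the line testable without exiting the process, and the `getenv` check keeps default runs silent.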
Verified: examples/97_register_arena_reset.gem registers names built
from string concat, forces an arena reset by buf_pushing ~2MB between
tail iterations. With the fix removed, exits with signal 11; with the
fix in, prints `nil nil ok` and `make test` passes (125 examples).
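Telling "exits with signal 11" apart from a clean exit comes down to decoding the wait status. The sketch below shows that decoding with plain POSIX calls; it is not the harness from sweep.sh, and `describe_status`, `run_and_wait`, and the two child functions are hypothetical names. The demo child is killed with SIGKILL rather than SIGSEGV only so the example behaves the same regardless of installed signal handlers.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Render a waitpid() status as "exit N" or "signal N". */
int describe_status(int status, char *out, size_t cap) {
    assert(out != NULL && cap > 0);
    if (WIFEXITED(status))
        return snprintf(out, cap, "exit %d", WEXITSTATUS(status));
    if (WIFSIGNALED(status))
        return snprintf(out, cap, "signal %d", WTERMSIG(status));
    return snprintf(out, cap, "other");
}

/* Fork, run child_fn in the child, and return the raw wait status (-1 on error). */
int run_and_wait(void (*child_fn)(void)) {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        child_fn();
        _exit(0);                 /* reached only if child_fn returns */
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return -1;
    return status;
}

/* Demo children: one exits cleanly with code 3, one dies from a signal. */
void child_clean_exit(void) { _exit(3); }
void child_killed(void)     { kill(getpid(), SIGKILL); }
```

A child that segfaults would be reported by `describe_status` as `signal 11` on platforms where SIGSEGV is 11, which is the distinction the regression example relies on.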
Sweep, post-fix — every cell delivers 100%:
s500x100x256 delivered=50000/50000 fanout=62524 msg/s
s10x1000x256 delivered=10000/10000 fanout=18463 msg/s
The same pattern in tables (`t->str_index`) is fine — keys there are
the same GemVal entries stored in `t->keys[]`, so they share an arena
with the table and die together. Only the global name registry takes
strings from foreign arenas.
Full lived-experience writeup appended to examples/stomp_broker/NOTES.md
under "Third pass".
Summary
- The second-pass STOMP sweep cliff (`s500x100x256` and `s10x1000x256` still DIED after mutual-TCO) was a dangling-pointer SEGV in the global name registry. `gem_register_name` stored `args[0].sval` verbatim in stb_ds; destination names live in the registry gen_server's arena, so the per-iteration arena reset (~1 MB, easily reached past 25k messages routed through `registry.lookup_or_create`) turned every key into a dangling pointer. The next `whereis` strcmps freed memory → SIGSEGV. Fix is `sh_new_strdup(gem_name_registry)` at scheduler init.
- Diagnostics: RSS sampler in `sweep.sh`, `gem_diag` atexit line counting `gem_spawn_fn` overflows (gated on `GEM_DIAG=1`), and the harness now runs the broker binary directly so `wait $bpid` captures its exit status / signal cleanly.
- Full writeup appended to `examples/stomp_broker/NOTES.md` under "Third pass".

Test plan
- `make test` (125 examples, including new `97_register_arena_reset.gem`, which exits SIGSEGV without the fix).
- `make bootstrap` (clean roundtrip).
- `bash benchmarks/stomp/sweep.sh`: every cell delivers 100% post-fix:
  - `s500x100x256`: 50,000/50,000 @ 62,524 msg/s (was 26,999/50,000 DIED)
  - `s10x1000x256`: 10,000/10,000 @ 18,463 msg/s (was 5,439/10,000 DIED)