Summary
The root-cause fix for #427 (fb17573, bounding type_args access in
cbm_type_substitute) closes the producer your repro exercised, but a second
producer of the same garbage-CBMType* crash survives on current main
(tested at 2712756). Indexing ns-3's src/core/model/ptr.h (a single stock
header, taken as the raw file from GitLab master)
SEGVs deterministically on Linux x86_64. Repro: drop that one file into an
empty directory and run index_repository on it:
    curl -LO https://gitlab.com/nsnam/ns-3-dev/-/raw/master/src/core/model/ptr.h

    Invalid read of size 4
      at c_eval_expr_type_inner (c_lsp.c:1659)   ret->kind with ret == 0x9
      by c_eval_expr_type (c_lsp.c:1240)
      by c_resolve_calls_in_node (c_lsp.c:3437)
      ...
      by c_lsp_process_file (c_lsp.c:4325)   pass=definitions
This is the crash I described in the PR body as "corruption in the
template-function registry path", which the defensive guard deliberately did
not fix. It turns out not to be registry corruption but an uninitialized stack
array at two cbm_type_substitute call sites.
Root cause
fb17573 bounds the walk with:
    int args_len = 0;
    while (args_len < nparams && type_args[args_len]) args_len++;
and documents the contract: type_args is either parallel-length or
shorter-and-NULL-terminated. Two call sites in c_lsp.c violate that contract:
the template_function branch of c_eval_expr_type_inner (~line 1582) and
the qualified-identifier-with-template-args branch (~line 1438):
    const CBMType* targs[16];   // uninitialized stack storage
    int targ_count = 0;
    /* fills targs[0..targ_count-1]; never writes a NULL terminator */
    ...
    const char* fallback_names[] = {"T", "U", "V", "W", NULL};   // nparams = 4
    base_ret = cbm_type_substitute(ctx->arena, base_ret, fallback_names, targs);
When the call site supplies fewer explicit template args than the function has
type params (e.g. one explicit arg against the 4-name fallback list, or a
registered type_param_names list longer than the explicit args), the
args_len walk reads targs[targ_count..], which are uninitialized stack slots.
Any slot that holds non-NULL garbage extends args_len, and the
matched-param return (i < args_len && type_args[i]) ? type_args[i] : t
hands the stack junk back as a const CBMType*. It then gets wrapped via
cbm_type_func() (c_lsp.c:1616) and dereferenced downstream in the
call_expression branch (ret->kind, c_lsp.c:1659), which SEGVs.
ns-3's ptr.h hits this through its templated free functions / operators
(Ptr<T>, Create<T>(...), templated comparison operators), which resolve
through the fallback-positional path with partial explicit args.
Why the earlier reports looked different: the read is of uninitialized stack
memory, so the faulting value varies with build and stack layout. UBSan
instrumentation alone shifts the frame enough to mask it entirely: compiling
lsp_all.c with -fsanitize=undefined at -O1 or -O2 makes the crash
vanish, while plain -O2 crashes every time. MSan or
-ftrivial-auto-var-init=pattern would have caught it directly;
ASan/valgrind don't flag stack-uninit reads.
The other two targs[16] declarations in c_lsp.c (~881, ~971) are fine —
they feed cbm_type_template(..., targ_count), which copies by count.
Fix
Zero-initialize both arrays so the fill is always NULL-terminated, matching
the documented cbm_type_substitute contract:
    const CBMType* targs[16] = {NULL};
Two lines total. Verified on ns-3 (ptr.h alone, and a full ~40k-node index
of an ns-3-based codebase): no SEGV, and a temporary trap asserting
pointer-validity of every return_types[] entry at cbm_type_func() creation
never fires after the change.
A regression fixture for tests/ would be a templated C++ header whose
templated function is called with fewer explicit template args than declared
params, e.g.:
    template <typename T, typename U, typename V>
    T Convert(U u) { return T(u); }
    void Use() { auto x = Convert<int>(1.5); }   // 1 explicit arg, 3 params
(needs the function registered first, then the call-site type-eval — the
existing typerep_substitute_short_args_no_oob_issue427 covers the
substitute-side bound but cannot see this caller-side contract break, since
the corruption enters through the array the caller hands in.)
Happy to open a PR with the two-line fix + fixture if useful.