c_lsp: SEGV on templated C++ persists after #427 fix — uninitialized targs[16] passed to cbm_type_substitute #432

@mattall

Summary

The root-cause fix for #427 (fb17573, bounding type_args access in
cbm_type_substitute) closes the producer your repro exercised, but a second
producer of the same garbage-CBMType* crash survives on current main
(tested at 2712756). Indexing ns-3's src/core/model/ptr.h (a single stock
header; raw file on GitLab master) SEGVs deterministically on Linux x86_64.
Repro: drop that one file into an empty directory and run index_repository
on it:

curl -LO https://gitlab.com/nsnam/ns-3-dev/-/raw/master/src/core/model/ptr.h

Resulting trace:

Invalid read of size 4
   at c_eval_expr_type_inner (c_lsp.c:1659)    ret->kind with ret == 0x9
   by c_eval_expr_type (c_lsp.c:1240)
   by c_resolve_calls_in_node (c_lsp.c:3437)
   ...
   by c_lsp_process_file (c_lsp.c:4325)        pass=definitions

This is the crash I described in the PR body as "corruption in the
template-function registry path" that the defensive guard deliberately did not
fix — it turns out not to be registry corruption but an uninitialized stack
array at two cbm_type_substitute call sites.

Root cause

fb17573 bounds the walk with:

int args_len = 0;
while (args_len < nparams && type_args[args_len]) args_len++;

and documents the contract: type_args is either parallel-length or
shorter-and-NULL-terminated. Two call sites in c_lsp.c violate that contract
— the template_function branch of c_eval_expr_type_inner (~line 1582) and
the qualified-identifier-with-template-args branch (~line 1438):

const CBMType* targs[16];          // uninitialized stack storage
int targ_count = 0;
/* fills targs[0..targ_count-1]; never writes a NULL terminator */
...
const char* fallback_names[] = {"T", "U", "V", "W", NULL};   // nparams = 4
base_ret = cbm_type_substitute(ctx->arena, base_ret, fallback_names, targs);

When the call site supplies fewer explicit template args than the function has
type params (e.g. one explicit arg against the 4-name fallback list, or a
registered type_param_names list longer than the explicit args), the
args_len walk reads targs[targ_count..], which are uninitialized stack slots.
Whatever garbage is there is almost always non-NULL, so it extends args_len, and the
matched-param return (i < args_len && type_args[i]) ? type_args[i] : t
hands the stack junk back as a const CBMType*. It then gets wrapped via
cbm_type_func() (c_lsp.c:1616) and dereferenced downstream in the
call_expression branch (ret->kind, c_lsp.c:1659) → SEGV.
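
The mechanism can be reproduced in isolation. The following is a simplified sketch, not the code from c_lsp.c: `Type`, `count_args`, `pick_arg` and `fill_one_arg` are hypothetical names, and the stale stack contents are simulated with an explicit `stale` pointer, because reading genuinely uninitialized storage is undefined behavior and would not reproduce reliably.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for CBMType; only pointer identity matters here. */
typedef struct Type { int kind; } Type;

/* Mirrors the bounded walk quoted above: count entries until NULL or nparams. */
int count_args(const Type **type_args, int nparams) {
    int args_len = 0;
    assert(type_args != NULL);
    while (args_len < nparams && type_args[args_len]) args_len++;
    return args_len;
}

/* Mirrors the matched-param return: slot i if in range and non-NULL, else t. */
const Type *pick_arg(const Type **type_args, int nparams, int i, const Type *t) {
    int args_len = count_args(type_args, nparams);
    return (i < args_len && type_args[i]) ? type_args[i] : t;
}

/* Fills slot 0 only, as a call site with one explicit template arg would.
 * `stale` simulates what was left on the stack: a non-NULL pointer models an
 * uninitialized array, NULL models an array declared with `= {NULL}`. */
int fill_one_arg(const Type *targs[16], const Type *arg, const Type *stale) {
    for (int i = 0; i < 16; i++) targs[i] = stale;
    targs[0] = arg;
    return 1; /* targ_count */
}
```

With a non-NULL `stale` value and `nparams = 4`, `count_args` reports 4 and `pick_arg(..., 1, ...)` returns the stale pointer; with `stale = NULL` the walk stops at 1 and the fallback type comes back instead.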

ns-3's ptr.h hits this through its templated free functions / operators
(Ptr<T>, Create<T>(...), templated comparison operators), which resolve
through the fallback-positional path with partial explicit args.

Why the earlier reports looked different: the read is of uninitialized stack
memory, so the faulting value varies with build and stack layout — UBSan
instrumentation alone shifts the frame enough to mask it entirely (compiling
lsp_all.c with -fsanitize=undefined at -O1 or -O2 makes the crash
vanish; plain -O2 crashes every time). MSan or -ftrivial-auto-var-init=pattern
would have caught it directly; ASan does not track uninitialized reads, and the
valgrind trace above shows only the downstream invalid read.

The other two targs[16] declarations in c_lsp.c (~881, ~971) are fine —
they feed cbm_type_template(..., targ_count), which copies by count.
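
To illustrate why a count-based consumer is immune, here is a minimal sketch. `TemplateArgs` and `copy_args_by_count` are hypothetical; the real cbm_type_template signature and storage are not shown in this report and may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for CBMType. */
typedef struct Type { int kind; } Type;

enum { MAX_TARGS = 16 };

/* Hypothetical result container; not the actual template type layout. */
typedef struct TemplateArgs {
    const Type *args[MAX_TARGS];
    int count;
} TemplateArgs;

/* Copies exactly `count` entries. Slots src[count..] are never read, so it
 * does not matter whether the caller initialized them. */
TemplateArgs copy_args_by_count(const Type **src, int count) {
    TemplateArgs out;
    assert(src != NULL && count >= 0 && count <= MAX_TARGS);
    for (int i = 0; i < MAX_TARGS; i++)
        out.args[i] = (i < count) ? src[i] : NULL;
    out.count = count;
    return out;
}
```

The length travels with the pointer, so nothing depends on a terminator being present in the caller's array.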

Fix

Zero-initialize both arrays so the fill is always NULL-terminated, matching
the documented cbm_type_substitute contract:

const CBMType* targs[16] = {NULL};

Two lines total. Verified on ns-3 (ptr.h alone, and a full ~40k-node index
of an ns-3-based codebase): no SEGV, and a temporary trap asserting
pointer-validity of every return_types[] entry at cbm_type_func() creation
never fires after the change.
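
A minimal sketch of why the initializer is sufficient, assuming a fill loop shaped like the one described above (`terminated_len` and `walk_after_zero_init` are hypothetical helpers, not functions in c_lsp.c). In C, elements of an array that are not covered by the initializer list are initialized as if they had static storage duration, so every pointer slot after the first starts out as a null pointer as well.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for CBMType. */
typedef struct Type { int kind; } Type;

enum { MAX_TARGS = 16 };

/* Same shape as the bounded walk in cbm_type_substitute quoted earlier. */
int terminated_len(const Type **type_args, int nparams) {
    int n = 0;
    while (n < nparams && type_args[n]) n++;
    return n;
}

/* Models a fixed call site: the array starts zeroed, the explicit args are
 * filled in, and the walk then stops at the first untouched slot. */
int walk_after_zero_init(const Type **explicit_args, int n_explicit, int nparams) {
    const Type *targs[MAX_TARGS] = {NULL}; /* all 16 slots are null pointers */
    int targ_count = 0;
    assert(n_explicit >= 0 && n_explicit <= MAX_TARGS);
    assert(nparams >= 0 && nparams <= MAX_TARGS); /* assumption of this sketch */
    for (int i = 0; i < n_explicit; i++) targs[targ_count++] = explicit_args[i];
    return terminated_len(targs, nparams);
}
```

One explicit arg against four params now yields a length of 1, so the remaining params fall through to the unsubstituted type rather than to stack contents.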

A regression fixture for tests/ would be a templated C++ header whose
templated function is called with fewer explicit template args than declared
params, e.g.:

template <typename T, typename U, typename V>
T Convert(U u) { return T(u); }
void Use() { auto x = Convert<int>(1.5); }   // 1 explicit arg, 3 params

(needs the function registered first, then the call-site type-eval — the
existing typerep_substitute_short_args_no_oob_issue427 covers the
substitute-side bound but cannot see this caller-side contract break, since
the corruption enters through the array the caller hands in.)

Happy to open a PR with the two-line fix + fixture if useful.
