Skip to content

Add UB and address sanitizers to CI#252

Merged
caitlinross merged 5 commits into
masterfrom
ci-sanitizers
Jun 30, 2026
Merged

Add UB and address sanitizers to CI#252
caitlinross merged 5 commits into
masterfrom
ci-sanitizers

Conversation

@caitlinross

@caitlinross caitlinross commented Jun 30, 2026

Copy link
Copy Markdown
Member

Adding UB and address sanitizers to CI. UB sanitizer already uncovered some issues that are also fixed here. I didn't turn on the memory leak check because we'll likely need a supression file.

packet_arrive_rc applied qhash_entry() (a container_of) directly to the
result of qhash_search(), which returns NULL on a miss. container_of on
NULL forms an out-of-bounds pointer (NULL - offsetof) — undefined behavior
even before any dereference. UBSan flags it during optimistic rollback:

  dragonfly-dally.cxx:6123: runtime error: applying non-zero offset
  18446744073709551576 to null pointer  (in packet_arrive_rc)

Guard the qhash_entry on a non-NULL hash_link, matching the forward handler
(packet_arrive) and the sibling reverse handler, which already do. tmp stays
NULL on a miss, which the downstream branches already handle.
Add a `sanitizers` job (ubuntu-24.04 / gcc / MPICH) that runs CODES under
ASan and UBSan as separate matrix legs, plus `asan`/`ubsan` CMake presets and
a CODES_SANITIZER knob that instruments every CODES target (the library,
executables, and tests). ROSS is a prebuilt imported target and is not
reinstrumented, so this is CODES-only for now.

Separate legs so each reports and gates independently. UBSan gates now:
-fno-sanitize-recover (baked into the preset) makes any UB abort, so a real
finding fails the job. ASan is informational to start (continue-on-error)
with LeakSanitizer disabled (detect_leaks=0) so the lane delivers the
high-value detections (heap-buffer-overflow, use-after-free, stack overflow)
without MPICH and teardown leak noise; enabling leaks with a suppression file
and flipping ASan to gating is a follow-up.

log_path captures rank-level sanitizer reports past the test harnesses'
per-rank stderr redirection (e.g. the ping-pong determinism test's
`2> model-output-error.txt`) and uploads them as artifacts on failure.
dragonfly_sample_fin called fwrite(s->sample_stat, ..., s->op_arr_size)
unconditionally at finalize. A terminal that recorded no compute-node samples
has op_arr_size == 0 and a NULL sample_stat (never allocated). fwrite declares
its buffer nonnull, so passing NULL is undefined behavior even for a
zero-element write; gcc's UBSan flags it:

  dragonfly.c:2379: runtime error: null pointer passed as argument 1,
  which is declared to never be null  (in dragonfly_sample_fin)

Guard the write on op_arr_size > 0. The router fin (dragonfly_rsample_fin) and
the mid-run flush already avoid this — the former writes inside a
for (i < op_arr_size) loop, the latter runs after sample_stat is malloc'd.
@codecov

codecov Bot commented Jun 30, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 50.00000% with 2 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
src/networks/model-net/dragonfly-dally.cxx 0.00% 2 Missing ⚠️

📢 Thoughts on this report? Let us know!

ASan at Debug's -O0 made the suite ~59 min on CI — the six ping-pong tests ran
10-15 min each. ASan docs recommend -O1, which brings the full suite to ~5 min,
fast enough to run on every PR. Scoped to the asan preset; UBSan stays -O0.

With the lane green and fast, drop continue-on-error so an ASan finding blocks
merge like UBSan. LeakSanitizer stays off (detect_leaks=0) pending a
suppression-file follow-up.
doc/dev/ci.md predated the asan/ubsan lanes. Document how they work (separate
gating legs, CODES-only instrumentation, ASan at -O1, leaks off, log_path
capture, local usage) and record the deferred
follow-ups (enable LeakSanitizer + suppressions) so they're
tracked in-repo rather than forgotten.
@caitlinross caitlinross merged commit 9d67ec8 into master Jun 30, 2026
13 of 14 checks passed
@caitlinross caitlinross deleted the ci-sanitizers branch June 30, 2026 22:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant