add concurrent-code seams and hermetic tests for statsd, WaitLoop, Monitor #53

Merged
dolph merged 1 commit into main from claude/fix-issue-49-concurrency-seams
May 22, 2026


@dolph (Owner) commented May 16, 2026

Summary

go test -race was running in CI, but no test exercised the package's concurrent surfaces, so the race detector had nothing to observe and a green run was a false signal. This PR adds the minimum testability seams needed to drive each concurrent path from a hermetic test, plus four tests that actually run that code under -race.

Seams added (existing public signatures unchanged; new helpers are unexported lowercase variants):

  • statsd.go: count / timer / gauge / increment / statsdSender now take a chan<- string (or <-chan string) parameter. Public Count / Timer / Gauge / Increment / StatsdSender route through the helpers with the package-level queue as the default. Tests inject their own buffered channel.
  • destinations.go: Monitor's loop body is factored into monitorWithCheck(confidence, check, sleep) int (one iteration, returns the next confidence). WaitFor's loop is factored into waitForWithCheck(check, sleep). The infinite for {} in Monitor stays in the public method.
  • connectivity.go: waitLoop(destinations, checks, sleep) is the testable form of WaitLoop, letting tests drive the goroutine fan-out with stub checks and a zero sleep.

Tests added (concurrency_test.go):

  • TestStatsdCountBlocksWhenQueueFull: pins the #11 back-pressure hazard (#11: the statsd emitter can stall checks when statsd is down and opens a new connection per metric) by saturating an injected queue and asserting that the next count send blocks. Marked Refs #11 -- flip when fixed so that a future non-blocking-send PR knows to invert the assertion.
  • TestStatsdSenderDeliversToUDP: stands up a real UDP listener on 127.0.0.1:0, runs the sender goroutine, fans 4 producers × 5 metrics through the injected queue, and asserts that all 20 datagrams arrive in the right wire format. This is real producer/consumer execution under -race.
  • TestWaitLoopCompletesWhenChecksSucceed: drives the goroutine fan-out with stub checks that succeed on the second attempt, then asserts that every destination's check was invoked at least twice and that runtime.NumGoroutine() returns to its baseline (catching a leaked waitForWithCheck goroutine).
  • TestMonitorWithCheckResetsConfidenceOnFailure: pins the post-#16 confidence reset (#16: monitor cadence delays outage detection by tens of minutes after a healthy period). After 12 successes saturate confidence at 10 (sleep = 10m), one failure snaps the next sleep back to 1m.

Scope

This is testability-only; there are no user-visible behavior changes.

Per AGENTS.md TDD

Each new test was verified by mutation testing: temporarily breaking the production code (making count non-blocking; dropping bytes in statsdSender; skipping the retry in waitForWithCheck; dropping the confidence reset in monitorWithCheck) caused the matching assertion to fail, confirming the test actually exercises the failure mode rather than passing trivially.

Test Plan

  • go vet ./...
  • gofmt -l . (no output)
  • go build ./...
  • go test -race ./...: the 4 new tests pass. The pre-existing TestRouteToLoopback{1,2,3} failures caused by sandbox routing (#24: tests aren't hermetic and major modules are untested; production code is hard to test) also occur on origin/main and are not introduced by this PR.
  • go test -race -count=10 -run "TestStatsd|TestWaitLoop|TestMonitor": no flakiness.
  • Mutation: each test fails when the production code path it claims to cover is broken.

Fixes #49

https://claude.ai/code/session_01WjHPSobuzrRkjwUgjAJWMk


Generated by Claude Code


CI runs `go test -race` but until now no test exercised the package's
concurrent code paths, so the race detector had nothing to observe. This
adds the minimum testability seams to drive each concurrent surface from
a hermetic test:

- statsd.go: extract a `chan<- string` parameter on `count`, `timer`,
  `gauge`, `increment`, and `statsdSender`. Public `Count`/`Timer`/
  `Gauge`/`Increment`/`StatsdSender` route through the new helpers with
  the package-level `queue` as the default. Tests inject their own
  buffered channel.
- destinations.go: factor `Monitor`'s loop body into `monitorWithCheck`
  (one iteration, returns next confidence) and `WaitFor`'s loop into
  `waitForWithCheck(check, sleep)`. The infinite `for {}` in `Monitor`
  stays in the public method.
- connectivity.go: add `waitLoop(destinations, checks, sleep)` so the
  goroutine fan-out is driveable with stub checks and a zero sleep.

New tests in concurrency_test.go:

- TestStatsdCountBlocksWhenQueueFull pins the #11 back-pressure hazard
  by saturating an injected queue and asserting the next send blocks.
  Marked `Refs #11 -- flip when fixed` for when non-blocking sends land.
- TestStatsdSenderDeliversToUDP runs the sender goroutine against a
  local UDP listener with M concurrent producers — real producer/consumer
  exercise under -race.
- TestWaitLoopCompletesWhenChecksSucceed drives the goroutine fan-out
  with a stub check that succeeds on the second attempt, asserts
  wg.Wait() returns and brackets NumGoroutine to catch leaks.
- TestMonitorWithCheckResetsConfidenceOnFailure pins the post-#16
  confidence reset: after saturating at 10, one failure snaps the next
  sleep back to 1 minute.

Scope is testability only: this does NOT fix the #11 back-pressure bug
(producers still block), the #17 lifecycle gaps (no signal handling,
no statsd drain, no panic recovery, no context.Context), or the #18
wait-timeout flag. `see #17` / `see #18` doc comments mark the spots
where the substrate work needs to land.

Each new test was verified by mutation: temporarily breaking the
production code (e.g., making `count` non-blocking, dropping bytes in
`statsdSender`, skipping the retry in `waitForWithCheck`, dropping the
confidence reset) causes the matching assertion to fail. The tests are
not checklist theater.

Fixes #49

https://claude.ai/code/session_01WjHPSobuzrRkjwUgjAJWMk
dolph added the release:skip label May 16, 2026, with Claude
dolph merged commit e80ccb5 into main May 22, 2026
2 checks passed
dolph deleted the claude/fix-issue-49-concurrency-seams branch May 22, 2026 11:43