add concurrent-code seams and hermetic tests for statsd, WaitLoop, Monitor#53
Merged
Merged
Conversation
…nitor
CI runs `go test -race` but until now no test exercised the package's
concurrent code paths, so the race detector had nothing to observe. This
adds the minimum testability seams to drive each concurrent surface from
a hermetic test:
- statsd.go: extract a `chan<- string` parameter on `count`, `timer`,
`gauge`, `increment`, and `statsdSender`. Public `Count`/`Timer`/
`Gauge`/`Increment`/`StatsdSender` route through the new helpers with
the package-level `queue` as the default. Tests inject their own
buffered channel.
- destinations.go: factor `Monitor`'s loop body into `monitorWithCheck`
(one iteration, returns next confidence) and `WaitFor`'s loop into
`waitForWithCheck(check, sleep)`. The infinite `for {}` in `Monitor`
stays in the public method.
- connectivity.go: add `waitLoop(destinations, checks, sleep)` so the
goroutine fan-out is driveable with stub checks and a zero sleep.
New tests in concurrency_test.go:
- TestStatsdCountBlocksWhenQueueFull pins the #11 back-pressure hazard
by saturating an injected queue and asserting the next send blocks.
Marked `Refs #11 -- flip when fixed` for when non-blocking sends land.
- TestStatsdSenderDeliversToUDP runs the sender goroutine against a
local UDP listener with M concurrent producers — real producer/consumer
exercise under -race.
- TestWaitLoopCompletesWhenChecksSucceed drives the goroutine fan-out
with a stub check that succeeds on the second attempt, asserts
wg.Wait() returns and brackets NumGoroutine to catch leaks.
- TestMonitorWithCheckResetsConfidenceOnFailure pins the post-#16
confidence reset: after saturating at 10, one failure snaps the next
sleep back to 1 minute.
Scope is testability only: this does NOT fix the #11 back-pressure bug
(producers still block), the #17 lifecycle gaps (no signal handling,
no statsd drain, no panic recovery, no context.Context), or the #18
wait-timeout flag. `see #17` / `see #18` doc comments mark the spots
where the substrate work needs to land.
Each new test was verified by mutation: temporarily breaking the
production code (e.g., making `count` non-blocking, dropping bytes in
`statsdSender`, skipping the retry in `waitForWithCheck`, dropping the
confidence reset) causes the matching assertion to fail. The tests are
not checklist theater.
Fixes #49
https://claude.ai/code/session_01WjHPSobuzrRkjwUgjAJWMk
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
go test -racewas running in CI, but no test exercised the package's concurrent surfaces, so the race detector had nothing to observe — a false signal. This PR adds the minimum testability seams to drive each concurrent path from a hermetic test, plus four tests that actually run that code under-race.Seams added (existing public signatures unchanged; new helpers are unexported lowercase variants):
statsd.go—count/timer/gauge/increment/statsdSendernow take achan<- string(or<-chan string) parameter. PublicCount/Timer/Gauge/Increment/StatsdSenderroute through the helpers with the package-levelqueueas the default. Tests inject their own buffered channel.destinations.go—Monitor's loop body is factored intomonitorWithCheck(confidence, check, sleep) int(one iteration, returns the next confidence).WaitFor's loop is factored intowaitForWithCheck(check, sleep). The infinitefor {}inMonitorstays in the public method.connectivity.go—waitLoop(destinations, checks, sleep)is the testable form ofWaitLoop, letting tests drive the goroutine fan-out with stub checks and a zero sleep.Tests added (
concurrency_test.go):TestStatsdCountBlocksWhenQueueFull— pins the Statsd emitter can stall checks when statsd is down; opens a new connection per metric #11 back-pressure hazard by saturating an injected queue and asserting the nextcountsend blocks. MarkedRefs #11 -- flip when fixedso a future non-blocking-send PR knows to invert the assertion.TestStatsdSenderDeliversToUDP— stands up a real UDP listener on127.0.0.1:0, runs the sender goroutine, fans 4 producers x 5 metrics through the injected queue, and asserts all 20 datagrams arrive with the right wire format. Real producer/consumer execution under-race.TestWaitLoopCompletesWhenChecksSucceed— drives the goroutine fan-out with stub checks that succeed on the second attempt; asserts every destination's check was invoked at least twice and thatruntime.NumGoroutine()returns to the baseline (catches a leakedwaitForWithCheckgoroutine).TestMonitorWithCheckResetsConfidenceOnFailure— pins the post-monitorcadence delays outage detection by tens of minutes after a healthy period #16 confidence reset: after 12 successes saturate confidence at 10 (sleep = 10m), one failure snaps the next sleep back to 1m.Scope
This is testability-only — no user-visible behavior changes:
context.Context).// see #17doc comments mark the spots where that work needs to land.waithas no overall timeout — orchestrators hang indefinitely on permanently-broken dependencies #18 (no--timeoutflag, no exponential backoff inWaitFor).// see #18doc comment marks the15 * time.Secondsleep.Per AGENTS.md TDD
Each new test was verified by mutation testing: temporarily breaking the production code (making
countnon-blocking; dropping bytes instatsdSender; skipping the retry inwaitForWithCheck; dropping the confidence reset inmonitorWithCheck) caused the matching assertion to fail, confirming the test actually exercises the failure mode rather than passing trivially.Test Plan
go vet ./...gofmt -l .(no output)go build ./...go test -race ./...— 4 new tests pass; pre-existingTestRouteToLoopback{1,2,3}failures from sandbox routing (Tests aren't hermetic and major modules are untested; production code is hard to test #24) remain onorigin/mainand are not introduced by this PR.go test -race -count=10 -run "TestStatsd|TestWaitLoop|TestMonitor"— no flakiness.Fixes #49
https://claude.ai/code/session_01WjHPSobuzrRkjwUgjAJWMk
Generated by Claude Code