fix: QPS schedule drift, handle reaping, stats window drift, sampling bias by brayniac · Pull Request #146 · iopsystems/llm-perf

brayniac · 2026-06-12T15:23:36Z

Summary

A batch of smaller correctness cleanups from the original review that weren't in the earlier groups.

QPS schedule drift (release: v0.1.5 #10). The QPS main loop slept a fresh inter-arrival delay each iteration, so per-iteration work accumulated as drift and the achieved rate fell below target over a long run. Now it advances an absolute next_send schedule and sleeps until that instant with sleep_until (self-correcting: if the loop runs late, the next sleep is shorter). It also uses the scheduled time as the request's arrival, so the schedule-slip metric from the coordinated-omission work now captures scheduling drift too.
JoinHandle accumulation. The QPS and concurrent fixed-count loops pushed one JoinHandle per request into a Vec for the whole run (memory growth on long runs). Now finished handles are reaped periodically (retain(!is_finished) past a concurrency + 256 threshold) so the Vec stays bounded in steady state. The end-of-run grace drain is unchanged. Note: under sustained open-loop overload, queued-but-unfinished tasks still accumulate — that's inherent to QPS mode and would need a separate outstanding-request cap (flagged as a follow-up, not regressed by this change).
Windowed-stats interval drift. stats.rs divided window deltas by the nominal interval, so a delayed tick distorted the reported rate. Now it divides by the actual elapsed time since the previous window, sets MissedTickBehavior::Skip (no catch-up bursts), and uses saturating_sub for the deltas (the counters are read non-atomically across many statics).
Dataset sampling bias. Sampling did take(sample_size) before shuffling, biasing the sample to the first N entries of a possibly-sorted file. Now a tested sample_workloads helper shuffles the full set (seeded) then truncates, and the loader warns when sample_size exceeds the dataset.
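
For illustration, the shuffle-then-truncate order can be sketched as below. This is a minimal stand-in, not the code in this PR: the name `sample_workloads_sketch`, its signature, and the small SplitMix64 generator are assumptions chosen so the example needs no external crates. The real helper may use a different RNG and different types.

```rust
/// Tiny SplitMix64 generator, used only so this sketch has no dependencies.
/// (Assumption: the real loader may use a different seeded RNG.)
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Hypothetical stand-in for `sample_workloads`: shuffle the FULL set with a
/// seeded RNG, then truncate, so the sample is not biased toward the head of
/// a possibly-sorted file.
fn sample_workloads_sketch<T>(mut items: Vec<T>, sample_size: usize, seed: u64) -> Vec<T> {
    let mut rng = SplitMix64(seed);
    // Fisher-Yates shuffle over every entry. The modulo introduces a tiny
    // bias that is acceptable for a sketch.
    for i in (1..items.len()).rev() {
        let j = (rng.next_u64() % (i as u64 + 1)) as usize;
        items.swap(i, j);
    }
    // Truncating after the shuffle also caps the sample at the dataset length.
    items.truncate(sample_size);
    items
}
```

Calling `sample_workloads_sketch((0..1000).collect::<Vec<u32>>(), 10, 42)` twice returns the same ten entries, and a `sample_size` larger than the input simply returns the whole shuffled set.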

Test plan

cargo test — 96 lib + 16 integration tests pass, 0 failures. New tests: sample_workloads (shuffle-before-truncate proven by asserting the sample isn't the first N; caps at len; deterministic for a seed).
cargo clippy --all-targets — clean
cargo fmt --check — clean
Independent review of the async-path changes (QPS schedule self-correction, handle reaping, stats elapsed tracking). It confirmed that the schedule and stats tracking are correct and that the burst-after-stall behavior is unchanged from the old code. Its one substantive point, that the handle reap doesn't bound the Vec under sustained overload, is inherent to open-loop QPS and is now documented in-code and above.
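
To make the schedule self-correction concrete, here is a small simulation on a virtual clock. It is a sketch under assumptions, not the benchmark loop itself: the function names and the `work` slice (per-iteration overhead) are hypothetical, and it uses plain `Duration` arithmetic instead of an async timer.

```rust
use std::time::Duration;

/// Old behavior: sleep a fresh inter-arrival delay every iteration.
/// Returns the virtual time at which the loop completes.
fn relative_sleep_elapsed(interval: Duration, work: &[Duration]) -> Duration {
    let mut now = Duration::ZERO;
    for w in work {
        now += *w; // per-iteration work
        now += interval; // full delay on top, so the overhead is never recovered
    }
    now
}

/// New behavior: advance an absolute `next_send` and sleep until it.
/// Returns the virtual time at which the loop completes.
fn absolute_schedule_elapsed(interval: Duration, work: &[Duration]) -> Duration {
    let mut now = Duration::ZERO;
    let mut next_send = Duration::ZERO;
    for w in work {
        now += *w; // per-iteration work
        next_send += interval; // the schedule advances independently of `now`
        // "Sleep until" next_send: if we are already late, do not sleep at all.
        now = now.max(next_send);
    }
    now
}
```

With a 10 ms interval and 1 ms of work per iteration, 100 iterations take 1100 ms under the relative sleep and 1000 ms under the absolute schedule. After a single 25 ms stall the absolute schedule sends the next request immediately and is back on the 10 ms grid by the third iteration.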

… bias

A batch of smaller correctness cleanups from the original review that weren't in the earlier groups.

- benchmark (QPS): the main loop slept a fresh inter-arrival delay each iteration, letting per-iteration work accumulate as drift so the achieved rate fell below target over a long run. Now it advances an absolute `next_send` schedule and sleeps until it (self-correcting), and uses the scheduled time as the request's arrival, so the schedule-slip metric also captures scheduling drift.
- benchmark: the QPS and concurrent fixed-count loops pushed one JoinHandle per request into a Vec for the whole run. Now finished handles are reaped periodically so the Vec stays bounded in steady state. (Sustained open-loop overload still accumulates offered-but-unserved requests; this is inherent to QPS mode.)
- stats: windowed rates divided counter deltas by the nominal interval, so a delayed tick distorted the reported rate. Now they divide by the actual elapsed time since the previous window, the interval timer uses MissedTickBehavior::Skip (no catch-up bursts), and the deltas use saturating_sub (the counters are read non-atomically across many statics).
- benchmark (dataset): sampling did `take(sample_size)` BEFORE shuffling, biasing the sample to the first N entries of a possibly-sorted file. Now a tested `sample_workloads` helper shuffles the full set (seeded) then truncates, and the loader warns when sample_size exceeds the dataset.

New unit tests cover sample_workloads (shuffle-before-truncate, caps at len). The async loop changes (QPS schedule, handle reaping, stats elapsed) are build + reasoning-verified.

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
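
As a rough illustration of the windowed-stats fix, the rate for one window can be computed from the counter delta and the measured elapsed time. The function name `window_rate` and its signature are assumptions for this sketch; the actual code in stats.rs reads its counters from statics and is driven by an interval timer, which is not reproduced here.

```rust
use std::time::Duration;

/// Rate for one stats window: counter delta divided by the ACTUAL time since
/// the previous window, not the nominal interval.
fn window_rate(prev: u64, curr: u64, elapsed: Duration) -> f64 {
    // saturating_sub: if an inconsistent snapshot makes the counter appear to
    // go backwards, report 0 instead of underflowing.
    let delta = curr.saturating_sub(prev);
    let secs = elapsed.as_secs_f64();
    if secs > 0.0 {
        delta as f64 / secs
    } else {
        0.0 // guard against a zero-length window
    }
}
```

For example, a tick that fires 2 s after the previous one with 200 new requests yields `window_rate(100, 300, Duration::from_secs(2)) == 100.0`, whereas dividing by a nominal 1 s interval would have reported 200.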

brayniac merged commit a91bf05 into iopsystems:main Jun 12, 2026
7 checks passed

brayniac mentioned this pull request Jun 12, 2026

refactor(benchmark): concurrent fixed-count mode as a worker pool #147

Merged

4 tasks

brayniac merged 1 commit into iopsystems:main from brayniac:fix/review-cleanup-batch
