feat(saturation): bisect + drain search, plateau detection, transition flags by brayniac · Pull Request #145 · iopsystems/llm-perf

brayniac · 2026-06-12T15:09:34Z

Summary

Rewrites the concurrency saturation search (#17, the last of the review groups). The old search climbed multiplicatively only, judged each rung on a single window, stopped after N consecutive failures, and gated throughput against a linear extrapolation from a moving baseline. As a result it undershot the true ceiling by up to (step_multiplier − 1) × the reported concurrency, was noise-sensitive, and mismodeled plateaus.
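
To make the undershoot bound concrete, the Rust sketch below models a multiplicative-only climb. It is a hypothetical illustration, not the removed code: `multiplicative_climb` and its `passes` callback (standing in for one measurement window) are invented names, and it stops at the first failure, as the old search would with N = 1. With multiplier m, the last passing rung c and the first failing rung c·m bracket the true ceiling, so the reported value can fall short by nearly (m − 1)·c.

```rust
/// Hypothetical model of a multiplicative-only climb (not the removed code).
/// Measures `start`, `start * step_multiplier`, ... and returns the last rung
/// that passed, stopping at the first failure or above `max_concurrency`.
pub fn multiplicative_climb(
    start: u32,
    step_multiplier: u32,
    max_concurrency: u32,
    passes: impl Fn(u32) -> bool,
) -> Option<u32> {
    assert!(start >= 1 && step_multiplier >= 2, "the climb must make progress");
    let mut last_good = None;
    let mut c = start;
    while c <= max_concurrency && passes(c) {
        last_good = Some(c);
        // Stop rather than overflow u32 on very large limits.
        match c.checked_mul(step_multiplier) {
            Some(next) => c = next,
            None => break,
        }
    }
    last_good
}
```

With a knee at 50, starting at 10 and doubling (illustrative parameters), the climb passes 10, 20 and 40, fails at 80, and reports 40.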

New design — a pure, unit-tested SearchPlanner state machine separated from a thin async driver:

- Climb → drain → re-probe → bisect → confirm. Climb multiplicatively until an SLO breaks, then drop to the last-good concurrency and drain to a clean slate (no measurement) before re-probing the failed rung; this distinguishes a transient/metastable failure from a genuine one. On a genuine failure, binary-search the exact ceiling, measuring each rung from a drained state, then confirm the boundary over several windows (M-of-N) so a single noisy window can't decide it.
- Marginal-gain throughput gate (replaces linear extrapolation): a rung trips it when tokens/s falls below min_throughput_ratio of the throughput projected from the previous rung (a fixed pre-plateau baseline during bisection), which detects the plateau directly.
- Transition flags: the knee (saturation onset) and any transient recoveries are reported in the console summary and JSON. Each step is labeled with its phase (climb/bisect/confirm) and whether it was drained.
- Driver: grows via add_permits, shrinks via forget_permits, and on a drain waits for the in-flight gauge to fall to target (bounded by the sample window) plus a short settle before measuring.
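
The sequence above can be sketched as a pure, I/O-free planner. This is a hypothetical Rust sketch, not the merged code: only `SearchPlanner` and `Action::Drain` come from this PR, while `Phase`, `Action::Measure`, `Action::Done`, `next_action`, `record`, `drive`, `drain_first`, `recoveries` and `confirm_n`/`confirm_m` are invented for illustration. Several behaviors are assumptions: every drain targets the last-good concurrency, the re-probe is labeled as a drained climb step, a confirm ends as soon as M of N windows pass, and a failed confirm steps the boundary down by one. The pass/fail verdict fed to `record` would combine the SLOs and the throughput gate.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Phase { Climb, Bisect, Confirm } // label on each measured step

/// Hypothetical sketch; only `SearchPlanner` and `Action::Drain` are named in the PR.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Action {
    Measure { concurrency: u32, phase: Phase, drained: bool }, // measure one window
    Drain { concurrency: u32 },    // shrink and wait for in-flight work; no measurement
    Done { ceiling: Option<u32> }, // None: not even the first rung complied
}

#[derive(Debug, Clone, Copy)]
enum State { Climb(u32), Reprobe(u32), Bisect(u32), Confirm { passes: u32, windows: u32 }, Done(Option<u32>) }

pub struct SearchPlanner {
    multiplier: u32, max: u32,
    confirm_n: u32, confirm_m: u32, // confirm needs M passing windows out of N
    lo: u32,        // highest concurrency believed compliant (0 = none yet)
    hi: u32,        // lowest concurrency known to fail
    state: State,
    drain_first: bool,        // emit a Drain before the next measurement
    pub recoveries: Vec<u32>, // rungs that failed, then passed after a drain
}

impl SearchPlanner {
    pub fn new(start: u32, multiplier: u32, max: u32, confirm_n: u32, confirm_m: u32) -> Self {
        assert!(start >= 1 && multiplier >= 2 && (1..=confirm_n).contains(&confirm_m));
        let (lo, hi, state) = (0, max.saturating_add(1), State::Climb(start.min(max)));
        Self { multiplier, max, confirm_n, confirm_m, lo, hi, state, drain_first: false, recoveries: Vec::new() }
    }

    pub fn next_action(&mut self) -> Action {
        if self.drain_first {
            self.drain_first = false; // drop to last-good and drain; nothing is measured
            return Action::Drain { concurrency: self.lo };
        }
        let (concurrency, phase, drained) = match self.state {
            State::Climb(c) => (c, Phase::Climb, false),
            State::Reprobe(c) => (c, Phase::Climb, true), // re-probe: a drained climb step
            State::Bisect(c) => (c, Phase::Bisect, true),
            State::Confirm { .. } => (self.lo, Phase::Confirm, false),
            State::Done(ceiling) => return Action::Done { ceiling },
        };
        Action::Measure { concurrency, phase, drained }
    }

    /// Feed back the verdict (SLOs plus throughput gate) for the last `Measure`.
    pub fn record(&mut self, passed: bool) {
        self.state = match self.state {
            State::Climb(c) if passed => self.advance(c),
            State::Climb(_) if self.lo == 0 => State::Done(None), // nothing compliant at start
            State::Climb(c) => State::Reprobe(c),
            State::Reprobe(c) if passed => { self.recoveries.push(c); self.advance(c) } // transient
            State::Reprobe(c) => self.bisect(self.lo, c), // genuine failure: bracket it
            State::Bisect(c) if passed => self.bisect(c, self.hi),
            State::Bisect(c) => self.bisect(self.lo, c),
            State::Confirm { passes, windows } => {
                let (passes, windows) = (passes + u32::from(passed), windows + 1);
                if passes >= self.confirm_m {
                    State::Done(Some(self.lo))
                } else if windows - passes <= self.confirm_n - self.confirm_m {
                    State::Confirm { passes, windows }
                } else {
                    // M-of-N is now impossible: step the boundary down and re-confirm.
                    self.hi = self.lo;
                    self.lo -= 1;
                    if self.lo == 0 { State::Done(None) } else { State::Confirm { passes: 0, windows: 0 } }
                }
            }
            done => done,
        };
        self.drain_first = matches!(self.state, State::Reprobe(_) | State::Bisect(_));
    }

    fn advance(&mut self, c: u32) -> State {
        self.lo = c; // c complied: climb on, or confirm once the limit is reached
        let next = c.saturating_mul(self.multiplier).min(self.max);
        if next == c { State::Confirm { passes: 0, windows: 0 } } else { State::Climb(next) }
    }
    fn bisect(&mut self, lo: u32, hi: u32) -> State {
        (self.lo, self.hi) = (lo, hi); // narrow the bracket; confirm once lo and hi are adjacent
        if hi - lo <= 1 { State::Confirm { passes: 0, windows: 0 } } else { State::Bisect(lo + (hi - lo) / 2) }
    }
}

/// Stand-in for the async driver: run the planner against a modeled server.
pub fn drive(p: &mut SearchPlanner, mut passes: impl FnMut(u32) -> bool) -> (Option<u32>, Vec<Action>) {
    let mut log = Vec::new();
    loop {
        let action = p.next_action();
        log.push(action);
        match action {
            Action::Measure { concurrency, .. } => p.record(passes(concurrency)),
            Action::Drain { .. } => {} // the real driver shrinks permits and waits here
            Action::Done { ceiling } => return (ceiling, log),
        }
    }
}
```

Against a modeled server that complies up to concurrency 50, starting at 10 and doubling, `drive` climbs to 40, fails at 80, drains to 40, re-fails 80, bisects down to 50 and confirms it. Keeping the planner free of I/O is what lets modeled-server tests like this run synchronously.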

Config

- min_throughput_ratio keeps its name, now with the marginal-gain semantics (documented).
- stop_after_failures is now unused (still accepted for backward compatibility).
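
A minimal sketch of the marginal-gain check, assuming the projection scales the baseline rung's tokens/s in proportion to concurrency (the PR does not spell out the projection formula). `Rung` and `passes_throughput_gate` are hypothetical names; during the climb the baseline would be the previous rung, and during bisection a fixed pre-plateau rung.

```rust
/// One measured rung: concurrency and observed output tokens per second.
#[derive(Debug, Clone, Copy)]
pub struct Rung {
    pub concurrency: u32,
    pub tokens_per_s: f64,
}

/// Marginal-gain gate (hypothetical sketch): passes when the rung's throughput is at
/// least `min_throughput_ratio` of what proportional scaling from `baseline` projects.
pub fn passes_throughput_gate(baseline: Rung, rung: Rung, min_throughput_ratio: f64) -> bool {
    // Assumed projection: tokens/s grows in proportion to concurrency from the baseline.
    let projected =
        baseline.tokens_per_s * f64::from(rung.concurrency) / f64::from(baseline.concurrency);
    rung.tokens_per_s >= min_throughput_ratio * projected
}
```

Under a modeled server whose throughput stops growing at concurrency 100, the step from 40 to 80 keeps pace with the projection and passes, while the step from 80 to 160 delivers far less than projected and trips the gate at a ratio of 0.8.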

Test plan

- cargo test: 94 lib + 16 integration tests pass, 0 failures. New planner_tests drive the planner against modeled servers: exact-knee bisection (converges to 50 where a multiplicative climb would report 40), no-compliant-at-start, transient recovery after drain, throughput-plateau detection (111), and confirm-window step-down (49).
- cargo clippy --all-targets: clean
- cargo fmt --check: clean
- Independent review of planner termination/convergence and its interaction with draining. It caught a real issue: the back-off drop was being measured as a full window (a spurious results row plus ~sample_window of wasted time per transient). Fixed by making the drop a drain-only Action::Drain with no measurement. Its suggestion to skip the throughput gate during re-probe was declined, because that would break plateau detection: a pure-latency transient still has scaling throughput and passes the re-probe, while only a genuine plateau fails it again.
- The async driver's drain/forget_permits shrink is verified by build and reasoning; running the full climb→bisect→confirm against a live server is the recommended smoke test (low max_concurrency, watch the printed table).
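
Because the drain wait is the piece checked only by build and reasoning, here is a minimal synchronous sketch of its timing logic using only `std`. The real driver is async and shrinks its permits with forget_permits first; this hypothetical `wait_for_drain` assumes the in-flight gauge is an `AtomicUsize`, polls it until it falls to the target or `bound` (the sample window) elapses, then sleeps for `settle`.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical sketch: wait for `in_flight` to fall to `target`, giving up after
/// `bound`, then sleep for `settle`. Returns true if the target was reached in time.
pub fn wait_for_drain(
    in_flight: &AtomicUsize,
    target: usize,
    bound: Duration,
    settle: Duration,
    poll: Duration,
) -> bool {
    let deadline = Instant::now() + bound;
    let drained = loop {
        if in_flight.load(Ordering::Acquire) <= target {
            break true;
        }
        if Instant::now() >= deadline {
            break false; // bounded: never wait longer than `bound`
        }
        thread::sleep(poll);
    };
    // Short settle so the next measurement window starts from a quiet state.
    thread::sleep(settle);
    drained
}
```

Returning whether the target was actually reached could let a caller flag a measurement that started from a partially drained state instead of treating it as clean.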

Generated with Claude Code

…ition flags

Rewrite the concurrency saturation search (iopsystems#17). The old search climbed multiplicatively only, judged each rung on a single window, terminated after N consecutive failures, and gated throughput against a linear extrapolation from a moving baseline, so it undershot the true ceiling by up to (step_multiplier-1), was noise-sensitive, and mismodeled how servers plateau.

The new search separates a pure, unit-tested `SearchPlanner` state machine from a thin async driver:

- Climb multiplicatively until an SLO breaks, then drop to the last-good concurrency and DRAIN to a clean slate (no measurement) before re-probing the failed rung, distinguishing a transient/metastable failure from a genuine one.
- On a genuine failure, binary-search the exact ceiling, measuring each rung from a drained state, then confirm the boundary over several windows (M-of-N) so a single noisy window can't decide it.
- Throughput is a marginal-gain gate: a rung trips it when tokens/s falls below min_throughput_ratio of the throughput projected from the previous rung (a fixed pre-plateau baseline during bisection), detecting the plateau directly instead of extrapolating linearly.
- Emits transition flags: the knee (saturation onset) and any transient recoveries, surfaced in the console summary and JSON results. Each step is labeled with its phase (climb/bisect/confirm) and whether it was drained.

The driver grows via add_permits and shrinks via forget_permits, and on a drain waits for the in-flight gauge to fall to the target (bounded) plus a short settle before measuring.

`min_throughput_ratio` keeps its name with the new marginal semantics; `stop_after_failures` is now unused (accepted for backward compatibility).

New unit tests drive the planner against modeled servers: exact-knee bisection, no-compliant-at-start, transient recovery after drain, throughput-plateau detection, and confirm-window step-down.

An independent review flagged that the back-off drop was being measured as a full window (spurious step + wasted time); fixed by making it a drain-only action.

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>

brayniac merged commit 4afab3e into iopsystems:main Jun 12, 2026
7 checks passed

brayniac mentioned this pull request Jun 13, 2026

feat(saturation): configurable tuning knobs; deprecate stop_after_failures #151 (merged)

brayniac merged 1 commit into iopsystems:main from brayniac:fix/saturation-probe-retreat
