feat: add timer-overhead correction, saturation warning, and resolution diagnostic#571
Conversation
…on diagnostic Three cooperating diagnostics for sub-microsecond benchmarking, integrated in a single coherent code path so that each one can rely on the others: - BenchOptions.subtractTimerOverhead (default false): when enabled, the cost of one timestamp provider call is calibrated once at construction time via the new exported calibrateTimerOverhead helper, then subtracted from each raw latency sample (clamped to zero) before statistics are computed. Samples returned by the task function via overriddenDuration are intentional user values and are skipped by the correction. - 'warning' event on BenchEvents and TaskEvents, dispatched on both the Bench and the Task instances when the latency samples of a task are dominated by the timer resolution. Detection uses three OR'd criteria computed by detectTimerSaturation: more than half zero samples, fewer than max(3, min(10, n/1000)) distinct values, or zero MAD with n > 100. An n < 10 guard prevents false positives on unit-style benchmarks. - Task.detectedResolution getter, populated after each run with the smallest strictly-positive sample value that appears at least twice (smallest reproducibly observed increment). Falls back to the strict minimum when no positive value repeats. A new estimateResolution helper is exported. calibrateTimerOverhead (utils.ts): - Subtracts in the provider's native type before converting to milliseconds (toMs(b - a) rather than toMs(b) - toMs(a)), preserving bigint precision on long-uptime hosts. - Discards a configurable warmup phase (default 64 pairs) so the JIT reaches its steady-state tier before measurements begin. - Returns 0 when fewer than half the back-to-back pairs produce a positive delta — in that regime the timer resolution exceeds the call cost and the positive deltas measure a tick boundary, not the call cost. - Configurable estimator: 'median' (default), 'min', or 'p05'. Task#processRunResult orders the diagnostics so they always reflect the raw, uncorrected measurements: 1. sortSamples on raw latencies 2. estimateResolution on raw sorted samples 3. when overhead correction is active, compute raw statistics, evaluate detectTimerSaturation against the raw distribution, then apply the correction in-place (skipping overridden samples) and re-sort only when overridden samples were skipped 4. compute the final (possibly corrected) statistics 5. when no correction was applied, evaluate detectTimerSaturation against the final samples (raw == final in this path) This consolidates three previously separate proposals (PRs tinylibs#568/tinylibs#569/tinylibs#570) into a single coherent change: composing them naively would have caused the diagnostics to operate on the corrected-and-clamped sample set, producing artificially small detected-resolution values and false-positive saturation warnings on benchmarks that activate overhead subtraction.
commit: |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6a71cb2d80
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Apply audit-driven fixes to PR tinylibs#571: * fix(task): correct overhead before sort to keep latencySamples aligned with isOverridden (collection order). Previous logic indexed isOverridden after sortSamples, corrupting both overriddenDuration preservation and measured-sample skip in mixed-mode tasks. * fix(task): run timer-saturation detection on a measured-only subset so constant overriddenDuration values cannot trigger a spurious low-distinct-count warning. * fix(bench): assert subtractTimerOverhead is incompatible with concurrency: 'task' (sequential calibration would not reflect per-iteration cost under concurrency). * fix(bench): normalize subtractTimerOverhead with ?? false instead of === true to accept any falsy default consistently. * fix(index): export detectTimerSaturation alongside the other timer diagnostics helpers. * docs(types): rewrite subtractTimerOverhead JSDoc with an honest treatment of the max(0, x) clamp and the two caveats (concurrency, overriddenDuration). Remove an orphan /** block. * test: rewrite the overriddenDuration warning test to assert warningCount === 0, matching the measured-only saturation behavior.
jerome-benoit
left a comment
There was a problem hiding this comment.
Cross-validated audit (8 oracle agents, 4 paired dimensions)
Findings classified by inter-pair convergence:
Convergent (≥2 pairs flagged the same item)
task.ts:648—detectedResolutionincludesoverriddenDurationvaluestask.ts:662— Phase 6computeStatisticsdiscards 14 of 15 fieldsbench.ts:215—?? falseaccepts truthy non-boolean from JS callers, inconsistent with sibling line 214bench.ts:219— assert is one-shot at construction; runtimeconcurrencymutation bypasses itbench.ts:218— assert message lacks remediationtask.ts:699—'warning'event payload carries no reasontypes.ts:128—BenchLike.timerOverheadrequired + non-readonly: breaking + unsafetypes.ts:204— JSDoc enumeration omitsaad/mad/min/maxtests— mixed overridden+measured run not exercised (the alignment invariant)tests—concurrency:'task'+subtractTimerOverheadassert untestedtests—'p05'estimator branch untestedutils.ts:488— bigint cast misleads the type system
Single-agent / high-signal
task.ts:631— clamp injects synthetic zeros affectingmin, throughput, criterion Atypes.ts:212—rmedeterministically inflates byM / (M − Ĉ)in the clean-shift regime, not "potentially"types.ts:211—p50(and downstreammad/aad) can collapse to 0 when ≥ 50% samples clamptypes.ts:26—'warning'widening across three unions breaks exhaustiveswitchconsumersutils.ts:285—detectTimerSaturationaccepts unsorted input but correctness requires sortedutils.ts:301— distinct-count loop has no short-circuit at thresholdutils.ts:454—TimerOverheadEstimatoris a string literal union; suffix suggests a strategytest/calibrate-timer-overhead.test.ts:57—min ≤ median * 2is mathematically trivialtest/detected-resolution.test.ts:59— conditionalif (resolution !== undefined)skips all assertions on regressionREADME.md— public API additions undocumented
Verified correct (sample)
- Phase 1 alignment of
latencySamples[i]withisOverridden[i]in collection order measuredOnly === latencySamplesreference-equality optimization- Aborted-run paths (no samples, partial samples)
estimateResolutionedge cases ([0,0,0]→undefined, all-equal positives)'warning'widening consistent acrossBenchEvents/BenchEventsWithTask/TaskEvents- Bigint precision preserved at runtime
- Style and Biome conformance on new test files
Detail in the inline comments. The mixed-overridden test gap is the only blocker.
* Add `classifyTimerSaturation` returning a `TimerSaturationReason` (`'zero-dominated' | 'low-distinct' | 'zero-mad'`) and re-implement `detectTimerSaturation` as a boolean wrapper. * Tighten `detectTimerSaturation`/`classifyTimerSaturation` parameter type from `Samples` to `SortedSamples`. * Short-circuit the distinct-value loop once the threshold is reached. * Add `medianAbsoluteDeviation(SortedSamples)` helper. * Rename `TimerOverheadEstimator` to `TimerOverheadEstimatorKind`. * Replace `as unknown as number` with `as bigint` in `calibrateTimerOverhead`; document operator polymorphism. * Re-export `classifyTimerSaturation`, `medianAbsoluteDeviation`, `TimerSaturationReason`, `TimerOverheadEstimatorKind` from the package entry point.
Extend `BenchEvent` with an optional `reason` payload symmetrical to `error`. The `reason` getter is typed as `TimerSaturationReason | undefined` for `'warning'` events and `undefined` for every other event type. * Move `TimerSaturationReason` from `utils.ts` to `types.ts` to align with the `Statistics`/`Samples` convention (types in `types.ts`, helpers in `utils.ts`). * Add a `'warning'` constructor overload accepting an optional reason. * Re-export `TimerSaturationReason` from the `./types` block in the package entry point.
…only samples * Compute `detectedResolution` from the measured-only subset (excluding `overriddenDuration` samples). A constant override value is no longer reported as the timer grain. * Allocate `isOverridden` unconditionally so the measured-only filter is also active when `subtractTimerOverhead` is disabled. * Replace Phase 6 `computeStatistics` recomputation with the dedicated `medianAbsoluteDeviation` helper. * Use `classifyTimerSaturation` and propagate the `TimerSaturationReason` onto the `'warning'` event payload. * Update `Task.detectedResolution` JSDoc to reflect the measured-only semantics; update the `#processRunResult` ordering description.
…ten options coercion * Coerce `subtractTimerOverhead` with `=== true`, matching the sibling `retainSamples` form. Truthy non-boolean values from JS callers are now rejected. * Re-state the constructor assert message in remediation form (action the user can take, not the internal cause). * Add the same assert at the start of `run()`. `concurrency` is documented as a post-construction-mutable field, so the constructor check alone leaves the mutation path uncovered. * Note the constraint and the dual enforcement in the `subtractTimerOverhead` field JSDoc.
Third-party `BenchLike` implementers can omit the field (semantically equivalent to the existing `undefined` sentinel that `Task` already handles). The `readonly` modifier matches the concrete `Bench.timerOverhead` declaration and forbids mutation through the interface, which `Task` reads on every cycle.
Rewrite the `subtractTimerOverhead` JSDoc with a mathematically grounded treatment: * Statistics list refers to all fields of `Statistics`; previously enumerated only seven of eighteen fields. * The `rme` inflation factor `M / (M − Ĉ)` is stated deterministically in the clean-shift regime, not hedged with 'potentially'. * The collapse of `p50`, `mad`, and `aad` to zero in the sub-overhead regime is named explicitly with the threshold. * Three observable consequences of the `max(0, …)` clamp are listed (`latency.min` may be 0; throughput substitutes the mean for clamped samples; criterion `'zero-dominated'` cannot distinguish clamped samples from genuine zeros).
… classifier
* New `test/subtract-timer-overhead-alignment.test.ts` — exercises the
Phase 1/2 alignment invariant on a heterogeneous run (alternating
overridden + measured iterations) using a deterministic timestamp
provider. Pins exact multiset counts so an off-by-one in the
`isOverridden`/`latencySamples` index alignment fails the test.
* `test/calibrate-timer-overhead.test.ts`:
- Replace the loose `min ≤ median * 2` assertion with a deterministic
estimator-ordering test using a scripted ascending-pair provider.
- Add a deterministic `'p05'` test pinning the
`max(0, ⌈n·0.05⌉ − 1)` index math at three sample sizes.
- Add tests for the `subtractTimerOverhead` + `concurrency: 'task'`
constructor assert and the equivalent `run()` runtime check.
* `test/detected-resolution.test.ts`: replace the conditional
`if (resolution !== undefined)` block with unconditional assertions.
* `test/utils-detect-timer-saturation.test.ts`: add `classifyTimerSaturation`
parallel coverage for each criterion (returning the precise reason
string) plus the n<10 and healthy-spread negative cases.
* New `test/warning-event-reason.test.ts` — verifies
`BenchEvent.reason` carries the saturation reason for `'warning'`
events and is `undefined` for other event types.
…, and timer diagnostics * New 'Timer Overhead Correction' section covers `subtractTimerOverhead`, the calibration helper, and the `concurrency: 'task'` and sub-overhead caveats. * New 'Per-Sample Override' section documents `overriddenDuration` (previously absent from the README despite being supported in code). * New 'Timer Diagnostics' section covers `Task.detectedResolution` and the `'warning'` event with its `TimerSaturationReason` payload, plus pointers to the standalone helpers. * Extend the `BenchEvents` listener example with a `'warning'` listener that reads `evt.reason`.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2f8232ac5b
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| configurable estimator strategy (`'median'` default, or `'min'` / `'p05'`): | ||
|
|
||
| ```ts | ||
| import { calibrateTimerOverhead, hrtimeNowTimestampProvider } from 'tinybench' |
There was a problem hiding this comment.
Export the provider used by the README example
When a user copies this new direct-calibration example, import { hrtimeNowTimestampProvider } from 'tinybench' fails because the package entry point only exports hrtimeNow and not hrtimeNowTimestampProvider from src/index.ts. Either export the provider constant or change the example to use a public TimestampProvider so the documented API is actually importable.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f9a0a6aff6
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if (hasOverhead) { | ||
| for (let i = 0; i < latencySamples.length; i++) { | ||
| if (isOverridden?.[i] !== true) { | ||
| // eslint-disable-next-line @typescript-eslint/no-non-null-assertion | ||
| latencySamples[i] = Math.max(0, latencySamples[i]! - overhead) | ||
| } |
There was a problem hiding this comment.
Compute detectedResolution before subtracting overhead
When subtractTimerOverhead is enabled, this mutates latencySamples before measuredOnly is built and passed to estimateResolution, so Task.detectedResolution is computed from corrected/clamped durations rather than the raw timer deltas. For a timer that really advances in 1 µs ticks with a 0.4 µs calibrated overhead, repeated raw 1 µs samples will be reported as 0.6 µs (or disappear entirely if the overhead clamps them to zero), making the resolution diagnostic misleading exactly when users enable this option for very small benchmarks. Preserve a raw measured-only copy for the diagnostic and only apply this correction to the statistics path.
Useful? React with 👍 / 👎.
| * @returns the tasks array | ||
| */ | ||
| async run (): Promise<Task[]> { | ||
| assert( | ||
| !(this.subtractTimerOverhead && this.concurrency === 'task'), | ||
| '`subtractTimerOverhead` cannot be used with `concurrency: "task"` — set `concurrency` to `null` or `"bench"`, or disable `subtractTimerOverhead`' |
There was a problem hiding this comment.
Guard direct Task.run against task concurrency
This invariant is only checked in Bench.run(), so a bench created with subtractTimerOverhead: true, then changed via the documented mutable bench.concurrency = 'task' pattern, can still enter the unsupported mode by calling the public bench.getTask(name)?.run() API directly. That path uses #benchmark's concurrency === 'task' branch and then applies the sequentially calibrated overhead to concurrent samples, which is the combination this guard is trying to reject; enforce the same invariant before task-level runs too.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1376205e25
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| const warningEv = new BenchEvent('warning', this, saturationReason) | ||
| this.dispatchEvent(warningEv) | ||
| this.#bench.dispatchEvent(warningEv) |
There was a problem hiding this comment.
Dispatch a fresh warning event to the bench
When a task-level warning listener calls stopImmediatePropagation(), reusing that same BenchEvent for the bench dispatch can prevent bench-level warning listeners from running in Node's EventTarget, even though the new API documents the warning as being emitted on both targets. Create a separate BenchEvent('warning', this, saturationReason) for the bench dispatch so task listener propagation state cannot suppress the bench notification.
Useful? React with 👍 / 👎.
Summary
Three cooperating timer diagnostics for sub-microsecond benchmarking, integrated in a single coherent code path:
subtractTimerOverhead— opt-in calibration of one timestamp provider call costĈ, applied asmax(0, raw - Ĉ)to non-overridden samples before statistics.'warning'event — dispatched on bothBenchandTaskwhen aTask's measured samples are dominated by the timer resolution. Carries aTimerSaturationReasonpayload ('zero-dominated' | 'low-distinct' | 'zero-mad').Task.detectedResolution— populated after each run with the smallest reproducibly observed positive sample among timer-measured samples.Replaces #568, #569 and #570: their naïve composition would have the diagnostics operate on the corrected-and-clamped sample set, producing artificially small resolution estimates and false-positive saturation warnings.
Public API
Task#processRunResultordering (cross-cutting fix)isOverridden[]is collected in lockstep withlatencySamples[]during measurement. The phase ordering preserves the alignment invariant and isolates user-suppliedoverriddenDurationvalues from timer diagnostics:measuredOnlyview by filtering out overridden indices. Captured before any sort.estimateResolution(measuredOnly)— sort-invariant, runs on the filtered view; an all-overridden run yieldsundefined.latencySamplesarray.computeStatisticson the sorted (possibly corrected) samples.classifyTimerSaturationon the measured-only subset; the reason (when fired) threads onto the'warning'event payload.isOverriddenis allocated unconditionally so the measured-only filter is also active whensubtractTimerOverhead: false(coversoverriddenDurationusers without overhead correction).Constraints
subtractTimerOverhead+concurrency: 'task'is rejected. Asserted at construction and at the start ofBench.run()to cover the README's documentedbench.concurrency = 'task'mutation pattern.0and the option becomes a no-op.Risk
BenchEventswidened with'warning'; exhaustiveswitchconsumers undernoFallthroughCasesInSwitchneed acase 'warning':arm.BenchLike.timerOverheadis nowreadonlyand optional. New typesTimerSaturationReason,TimerOverheadEstimatorKind,CalibrateTimerOverheadOptions. New runtime exportscalibrateTimerOverhead,classifyTimerSaturation,detectTimerSaturation,estimateResolution,medianAbsoluteDeviation.+0.7 kBgzipped vs main. The pre-existingsize-limitbudget of 12 kB is already exceeded on main (14.15 kB) — this PR does not change that situation.JSDoc honesty
The
subtractTimerOverheadJSDoc describes themax(0, x)clamp explicitly:X >> Ĉvs sub-overheadX ≈ Ĉ)rmedeterministically inflates by factorM / (M − Ĉ)wheneverĈ > 0p50,mad,aadcollapse to0once cumulative mass at≤ Ĉreaches the relevant quantilelatency.min == 0, throughput-mean substitution,'zero-dominated'criterion ambiguity)Verification
pnpm typecheckcleanpnpm lintcleanpnpm test— 232 tests pass across 62 test filespnpm build—distgenerated cleanly