Skip to content

feat: add timer-overhead correction, saturation warning, and resolution diagnostic#571

Open
jerome-benoit wants to merge 21 commits into
tinylibs:mainfrom
jerome-benoit:feat/timer-diagnostics-and-overhead-correction
Open

feat: add timer-overhead correction, saturation warning, and resolution diagnostic#571
jerome-benoit wants to merge 21 commits into
tinylibs:mainfrom
jerome-benoit:feat/timer-diagnostics-and-overhead-correction

Conversation

@jerome-benoit

@jerome-benoit jerome-benoit commented May 30, 2026

Copy link
Copy Markdown
Collaborator

Summary

Three cooperating timer diagnostics for sub-microsecond benchmarking, integrated in a single coherent code path:

  1. subtractTimerOverhead — opt-in calibration of one timestamp provider call cost Ĉ, applied as max(0, raw - Ĉ) to non-overridden samples before statistics.
  2. 'warning' event — dispatched on both Bench and Task when a Task's measured samples are dominated by the timer resolution. Carries a TimerSaturationReason payload ('zero-dominated' | 'low-distinct' | 'zero-mad').
  3. Task.detectedResolution — populated after each run with the smallest reproducibly observed positive sample among timer-measured samples.

Replaces #568, #569 and #570: their naïve composition would have the diagnostics operate on the corrected-and-clamped sample set, producing artificially small resolution estimates and false-positive saturation warnings.

Public API

import { Bench } from 'tinybench'

// Per-Bench (opt-in, default false)
const bench = new Bench({ subtractTimerOverhead: true })
bench.subtractTimerOverhead  // readonly boolean
bench.timerOverhead          // number | undefined — calibrated Ĉ in ms

// Per-Task diagnostic
const task = bench.getTask('foo')
task.detectedResolution      // number | undefined

// Saturation warning event with typed reason
bench.addEventListener('warning', evt => {
  evt.task    // Task
  evt.reason  // TimerSaturationReason | undefined
})

// Standalone helpers
import {
  calibrateTimerOverhead,
  classifyTimerSaturation,    // → TimerSaturationReason | undefined
  detectTimerSaturation,      // boolean wrapper around classifyTimerSaturation
  estimateResolution,
  medianAbsoluteDeviation,    // mad statistic from a sorted sample
} from 'tinybench'

import type {
  CalibrateTimerOverheadOptions,
  TimerOverheadEstimatorKind,  // 'median' | 'min' | 'p05'
  TimerSaturationReason,
} from 'tinybench'

Task#processRunResult ordering (cross-cutting fix)

isOverridden[] is collected in lockstep with latencySamples[] during measurement. The phase ordering preserves the alignment invariant and isolates user-supplied overriddenDuration values from timer diagnostics:

  1. Apply overhead correction in-place on the collection-order array (skipping overridden indices). Alignment intact.
  2. Build a measuredOnly view by filtering out overridden indices. Captured before any sort.
  3. estimateResolution(measuredOnly) — sort-invariant, runs on the filtered view; an all-overridden run yields undefined.
  4. Sort the working latencySamples array.
  5. computeStatistics on the sorted (possibly corrected) samples.
  6. classifyTimerSaturation on the measured-only subset; the reason (when fired) threads onto the 'warning' event payload.

isOverridden is allocated unconditionally so the measured-only filter is also active when subtractTimerOverhead: false (covers overriddenDuration users without overhead correction).

Constraints

  • subtractTimerOverhead + concurrency: 'task' is rejected. Asserted at construction and at the start of Bench.run() to cover the README's documented bench.concurrency = 'task' mutation pattern.
  • On runtimes with a coarse timer (resolution >= 1 ms), calibration returns 0 and the option becomes a no-op.

Risk

  • Runtime: opt-in for behaviour-changing pieces. Default-options benchmarks behave identically.
  • TypeScript: BenchEvents widened with 'warning'; exhaustive switch consumers under noFallthroughCasesInSwitch need a case 'warning': arm. BenchLike.timerOverhead is now readonly and optional. New types TimerSaturationReason, TimerOverheadEstimatorKind, CalibrateTimerOverheadOptions. New runtime exports calibrateTimerOverhead, classifyTimerSaturation, detectTimerSaturation, estimateResolution, medianAbsoluteDeviation.
  • Bundle: +0.7 kB gzipped vs main. The pre-existing size-limit budget of 12 kB is already exceeded on main (14.15 kB) — this PR does not change that situation.
  • No new runtime dependency.

JSDoc honesty

The subtractTimerOverhead JSDoc describes the max(0, x) clamp explicitly:

  • Two regimes (clean-shift X >> Ĉ vs sub-overhead X ≈ Ĉ)
  • rme deterministically inflates by factor M / (M − Ĉ) whenever Ĉ > 0
  • In the sub-overhead regime, p50, mad, aad collapse to 0 once cumulative mass at ≤ Ĉ reaches the relevant quantile
  • Three observable consequences of the clamp (latency.min == 0, throughput-mean substitution, 'zero-dominated' criterion ambiguity)

Verification

  • pnpm typecheck clean
  • pnpm lint clean
  • pnpm test232 tests pass across 62 test files
  • pnpm builddist generated cleanly

…on diagnostic

Three cooperating diagnostics for sub-microsecond benchmarking, integrated in
a single coherent code path so that each one can rely on the others:

- BenchOptions.subtractTimerOverhead (default false): when enabled, the cost
  of one timestamp provider call is calibrated once at construction time via
  the new exported calibrateTimerOverhead helper, then subtracted from each
  raw latency sample (clamped to zero) before statistics are computed.
  Samples returned by the task function via overriddenDuration are
  intentional user values and are skipped by the correction.
- 'warning' event on BenchEvents and TaskEvents, dispatched on both the
  Bench and the Task instances when the latency samples of a task are
  dominated by the timer resolution. Detection uses three OR'd criteria
  computed by detectTimerSaturation: more than half zero samples, fewer
  than max(3, min(10, n/1000)) distinct values, or zero MAD with n > 100.
  An n < 10 guard prevents false positives on unit-style benchmarks.
- Task.detectedResolution getter, populated after each run with the smallest
  strictly-positive sample value that appears at least twice (smallest
  reproducibly observed increment). Falls back to the strict minimum when
  no positive value repeats. A new estimateResolution helper is exported.

calibrateTimerOverhead (utils.ts):
- Subtracts in the provider's native type before converting to milliseconds
  (toMs(b - a) rather than toMs(b) - toMs(a)), preserving bigint precision
  on long-uptime hosts.
- Discards a configurable warmup phase (default 64 pairs) so the JIT
  reaches its steady-state tier before measurements begin.
- Returns 0 when fewer than half the back-to-back pairs produce a positive
  delta — in that regime the timer resolution exceeds the call cost and
  the positive deltas measure a tick boundary, not the call cost.
- Configurable estimator: 'median' (default), 'min', or 'p05'.

Task#processRunResult orders the diagnostics so they always reflect the raw,
uncorrected measurements:
1. sortSamples on raw latencies
2. estimateResolution on raw sorted samples
3. when overhead correction is active, compute raw statistics, evaluate
   detectTimerSaturation against the raw distribution, then apply the
   correction in-place (skipping overridden samples) and re-sort only when
   overridden samples were skipped
4. compute the final (possibly corrected) statistics
5. when no correction was applied, evaluate detectTimerSaturation against
   the final samples (raw == final in this path)

This consolidates three previously separate proposals (PRs tinylibs#568/tinylibs#569/tinylibs#570)
into a single coherent change: composing them naively would have caused the
diagnostics to operate on the corrected-and-clamped sample set, producing
artificially small detected-resolution values and false-positive saturation
warnings on benchmarks that activate overhead subtraction.
@pkg-pr-new

pkg-pr-new Bot commented May 30, 2026

Copy link
Copy Markdown

Open in StackBlitz

npm i https://pkg.pr.new/tinylibs/tinybench@571

commit: 32e0ee5

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6a71cb2d80

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/task.ts Outdated
Comment thread src/task.ts Outdated
Apply audit-driven fixes to PR tinylibs#571:

* fix(task): correct overhead before sort to keep latencySamples aligned
  with isOverridden (collection order). Previous logic indexed
  isOverridden after sortSamples, corrupting both overriddenDuration
  preservation and measured-sample skip in mixed-mode tasks.
* fix(task): run timer-saturation detection on a measured-only subset so
  constant overriddenDuration values cannot trigger a spurious
  low-distinct-count warning.
* fix(bench): assert subtractTimerOverhead is incompatible with
  concurrency: 'task' (sequential calibration would not reflect
  per-iteration cost under concurrency).
* fix(bench): normalize subtractTimerOverhead with ?? false instead of
  === true to accept any falsy default consistently.
* fix(index): export detectTimerSaturation alongside the other timer
  diagnostics helpers.
* docs(types): rewrite subtractTimerOverhead JSDoc with an honest
  treatment of the max(0, x) clamp and the two caveats (concurrency,
  overriddenDuration). Remove an orphan /** block.
* test: rewrite the overriddenDuration warning test to assert
  warningCount === 0, matching the measured-only saturation behavior.

@jerome-benoit jerome-benoit left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cross-validated audit (8 oracle agents, 4 paired dimensions)

Findings classified by inter-pair convergence:

Convergent (≥2 pairs flagged the same item)

  • task.ts:648detectedResolution includes overriddenDuration values
  • task.ts:662 — Phase 6 computeStatistics discards 14 of 15 fields
  • bench.ts:215?? false accepts truthy non-boolean from JS callers, inconsistent with sibling line 214
  • bench.ts:219 — assert is one-shot at construction; runtime concurrency mutation bypasses it
  • bench.ts:218 — assert message lacks remediation
  • task.ts:699'warning' event payload carries no reason
  • types.ts:128BenchLike.timerOverhead required + non-readonly: breaking + unsafe
  • types.ts:204 — JSDoc enumeration omits aad/mad/min/max
  • tests — mixed overridden+measured run not exercised (the alignment invariant)
  • testsconcurrency:'task' + subtractTimerOverhead assert untested
  • tests'p05' estimator branch untested
  • utils.ts:488 — bigint cast misleads the type system

Single-agent / high-signal

  • task.ts:631 — clamp injects synthetic zeros affecting min, throughput, criterion A
  • types.ts:212rme deterministically inflates by M / (M − Ĉ) in the clean-shift regime, not "potentially"
  • types.ts:211p50 (and downstream mad/aad) can collapse to 0 when ≥ 50% samples clamp
  • types.ts:26'warning' widening across three unions breaks exhaustive switch consumers
  • utils.ts:285detectTimerSaturation accepts unsorted input but correctness requires sorted
  • utils.ts:301 — distinct-count loop has no short-circuit at threshold
  • utils.ts:454TimerOverheadEstimator is a string literal union; suffix suggests a strategy
  • test/calibrate-timer-overhead.test.ts:57min ≤ median * 2 is mathematically trivial
  • test/detected-resolution.test.ts:59 — conditional if (resolution !== undefined) skips all assertions on regression
  • README.md — public API additions undocumented

Verified correct (sample)

  • Phase 1 alignment of latencySamples[i] with isOverridden[i] in collection order
  • measuredOnly === latencySamples reference-equality optimization
  • Aborted-run paths (no samples, partial samples)
  • estimateResolution edge cases ([0,0,0]undefined, all-equal positives)
  • 'warning' widening consistent across BenchEvents / BenchEventsWithTask / TaskEvents
  • Bigint precision preserved at runtime
  • Style and Biome conformance on new test files

Detail in the inline comments. The mixed-overridden test gap is the only blocker.

Comment thread src/task.ts Outdated
Comment thread src/task.ts
Comment thread src/task.ts Outdated
Comment thread src/task.ts Outdated
Comment thread src/bench.ts Outdated
Comment thread test/subtract-timer-overhead-overridden.test.ts
Comment thread test/calibrate-timer-overhead.test.ts Outdated
Comment thread test/calibrate-timer-overhead.test.ts
Comment thread test/detected-resolution.test.ts Outdated
Comment thread src/index.ts
* Add `classifyTimerSaturation` returning a `TimerSaturationReason`
  (`'zero-dominated' | 'low-distinct' | 'zero-mad'`) and re-implement
  `detectTimerSaturation` as a boolean wrapper.
* Tighten `detectTimerSaturation`/`classifyTimerSaturation` parameter
  type from `Samples` to `SortedSamples`.
* Short-circuit the distinct-value loop once the threshold is reached.
* Add `medianAbsoluteDeviation(SortedSamples)` helper.
* Rename `TimerOverheadEstimator` to `TimerOverheadEstimatorKind`.
* Replace `as unknown as number` with `as bigint` in
  `calibrateTimerOverhead`; document operator polymorphism.
* Re-export `classifyTimerSaturation`, `medianAbsoluteDeviation`,
  `TimerSaturationReason`, `TimerOverheadEstimatorKind` from the
  package entry point.
Extend `BenchEvent` with an optional `reason` payload symmetrical to
`error`. The `reason` getter is typed as
`TimerSaturationReason | undefined` for `'warning'` events and
`undefined` for every other event type.

* Move `TimerSaturationReason` from `utils.ts` to `types.ts` to align
  with the `Statistics`/`Samples` convention (types in `types.ts`,
  helpers in `utils.ts`).
* Add a `'warning'` constructor overload accepting an optional reason.
* Re-export `TimerSaturationReason` from the `./types` block in the
  package entry point.
…only samples

* Compute `detectedResolution` from the measured-only subset (excluding
  `overriddenDuration` samples). A constant override value is no longer
  reported as the timer grain.
* Allocate `isOverridden` unconditionally so the measured-only filter is
  also active when `subtractTimerOverhead` is disabled.
* Replace Phase 6 `computeStatistics` recomputation with the dedicated
  `medianAbsoluteDeviation` helper.
* Use `classifyTimerSaturation` and propagate the
  `TimerSaturationReason` onto the `'warning'` event payload.
* Update `Task.detectedResolution` JSDoc to reflect the measured-only
  semantics; update the `#processRunResult` ordering description.
…ten options coercion

* Coerce `subtractTimerOverhead` with `=== true`, matching the sibling
  `retainSamples` form. Truthy non-boolean values from JS callers are
  now rejected.
* Re-state the constructor assert message in remediation form (action
  the user can take, not the internal cause).
* Add the same assert at the start of `run()`. `concurrency` is
  documented as a post-construction-mutable field, so the constructor
  check alone leaves the mutation path uncovered.
* Note the constraint and the dual enforcement in the
  `subtractTimerOverhead` field JSDoc.
Third-party `BenchLike` implementers can omit the field (semantically
equivalent to the existing `undefined` sentinel that `Task` already
handles). The `readonly` modifier matches the concrete
`Bench.timerOverhead` declaration and forbids mutation through the
interface, which `Task` reads on every cycle.
Rewrite the `subtractTimerOverhead` JSDoc with a mathematically grounded
treatment:

* Statistics list refers to all fields of `Statistics`; previously
  enumerated only seven of eighteen fields.
* The `rme` inflation factor `M / (M − Ĉ)` is stated deterministically
  in the clean-shift regime, not hedged with 'potentially'.
* The collapse of `p50`, `mad`, and `aad` to zero in the
  sub-overhead regime is named explicitly with the threshold.
* Three observable consequences of the `max(0, …)` clamp are listed
  (`latency.min` may be 0; throughput substitutes the mean for clamped
  samples; criterion `'zero-dominated'` cannot distinguish clamped
  samples from genuine zeros).
… classifier

* New `test/subtract-timer-overhead-alignment.test.ts` — exercises the
  Phase 1/2 alignment invariant on a heterogeneous run (alternating
  overridden + measured iterations) using a deterministic timestamp
  provider. Pins exact multiset counts so an off-by-one in the
  `isOverridden`/`latencySamples` index alignment fails the test.
* `test/calibrate-timer-overhead.test.ts`:
  - Replace the loose `min ≤ median * 2` assertion with a deterministic
    estimator-ordering test using a scripted ascending-pair provider.
  - Add a deterministic `'p05'` test pinning the
    `max(0, ⌈n·0.05⌉ − 1)` index math at three sample sizes.
  - Add tests for the `subtractTimerOverhead` + `concurrency: 'task'`
    constructor assert and the equivalent `run()` runtime check.
* `test/detected-resolution.test.ts`: replace the conditional
  `if (resolution !== undefined)` block with unconditional assertions.
* `test/utils-detect-timer-saturation.test.ts`: add `classifyTimerSaturation`
  parallel coverage for each criterion (returning the precise reason
  string) plus the n<10 and healthy-spread negative cases.
* New `test/warning-event-reason.test.ts` — verifies
  `BenchEvent.reason` carries the saturation reason for `'warning'`
  events and is `undefined` for other event types.
…, and timer diagnostics

* New 'Timer Overhead Correction' section covers
  `subtractTimerOverhead`, the calibration helper, and the
  `concurrency: 'task'` and sub-overhead caveats.
* New 'Per-Sample Override' section documents `overriddenDuration`
  (previously absent from the README despite being supported in code).
* New 'Timer Diagnostics' section covers `Task.detectedResolution`
  and the `'warning'` event with its `TimerSaturationReason` payload,
  plus pointers to the standalone helpers.
* Extend the `BenchEvents` listener example with a `'warning'`
  listener that reads `evt.reason`.
`computeStatistics` and `absoluteDeviationMedian` are not re-exported
from the package entry point, so `{@link …}` references to them
trigger `typedoc --treatWarningsAsErrors`. Switch them to plain
backticked code references; `{@link}` is preserved only for symbols
listed in the public exports.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2f8232ac5b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread README.md
configurable estimator strategy (`'median'` default, or `'min'` / `'p05'`):

```ts
import { calibrateTimerOverhead, hrtimeNowTimestampProvider } from 'tinybench'

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Export the provider used by the README example

When a user copies this new direct-calibration example, import { hrtimeNowTimestampProvider } from 'tinybench' fails because the package entry point only exports hrtimeNow and not hrtimeNowTimestampProvider from src/index.ts. Either export the provider constant or change the example to use a public TimestampProvider so the documented API is actually importable.

Useful? React with 👍 / 👎.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f9a0a6aff6

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/task.ts
Comment on lines +630 to +635
if (hasOverhead) {
for (let i = 0; i < latencySamples.length; i++) {
if (isOverridden?.[i] !== true) {
// eslint-disable-next-line @typescript-eslint/no-non-null-assertion
latencySamples[i] = Math.max(0, latencySamples[i]! - overhead)
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Compute detectedResolution before subtracting overhead

When subtractTimerOverhead is enabled, this mutates latencySamples before measuredOnly is built and passed to estimateResolution, so Task.detectedResolution is computed from corrected/clamped durations rather than the raw timer deltas. For a timer that really advances in 1 µs ticks with a 0.4 µs calibrated overhead, repeated raw 1 µs samples will be reported as 0.6 µs (or disappear entirely if the overhead clamps them to zero), making the resolution diagnostic misleading exactly when users enable this option for very small benchmarks. Preserve a raw measured-only copy for the diagnostic and only apply this correction to the statistics path.

Useful? React with 👍 / 👎.

Comment thread src/bench.ts
Comment on lines 292 to +297
* @returns the tasks array
*/
async run (): Promise<Task[]> {
assert(
!(this.subtractTimerOverhead && this.concurrency === 'task'),
'`subtractTimerOverhead` cannot be used with `concurrency: "task"` — set `concurrency` to `null` or `"bench"`, or disable `subtractTimerOverhead`'

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Guard direct Task.run against task concurrency

This invariant is only checked in Bench.run(), so a bench created with subtractTimerOverhead: true, then changed via the documented mutable bench.concurrency = 'task' pattern, can still enter the unsupported mode by calling the public bench.getTask(name)?.run() API directly. That path uses #benchmark's concurrency === 'task' branch and then applies the sequentially calibrated overhead to concurrent samples, which is the combination this guard is trying to reject; enforce the same invariant before task-level runs too.

Useful? React with 👍 / 👎.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1376205e25

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/task.ts
Comment on lines +712 to +714
const warningEv = new BenchEvent('warning', this, saturationReason)
this.dispatchEvent(warningEv)
this.#bench.dispatchEvent(warningEv)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Dispatch a fresh warning event to the bench

When a task-level warning listener calls stopImmediatePropagation(), reusing that same BenchEvent for the bench dispatch can prevent bench-level warning listeners from running in Node's EventTarget, even though the new API documents the warning as being emitted on both targets. Create a separate BenchEvent('warning', this, saturationReason) for the bench dispatch so task listener propagation state cannot suppress the bench notification.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant