feat: add timer-overhead correction, saturation warning, and resolution diagnostic by jerome-benoit · Pull Request #571 · tinylibs/tinybench

jerome-benoit · 2026-05-30T01:22:03Z

Summary

Three cooperating timer diagnostics for sub-microsecond benchmarking, integrated in a single coherent code path:

subtractTimerOverhead — opt-in calibration of one timestamp provider call cost Ĉ, applied as max(0, raw - Ĉ) to non-overridden samples before statistics.
'warning' event — dispatched on both Bench and Task when a Task's measured samples are dominated by the timer resolution. Carries a TimerSaturationReason payload ('zero-dominated' | 'low-distinct' | 'zero-mad').
Task.detectedResolution — populated after each run with the smallest reproducibly observed positive sample among timer-measured samples.

Replaces #568, #569 and #570: their naïve composition would have the diagnostics operate on the corrected-and-clamped sample set, producing artificially small resolution estimates and false-positive saturation warnings.

Public API

import { Bench } from 'tinybench'

// Per-Bench (opt-in, default false)
const bench = new Bench({ subtractTimerOverhead: true })
bench.subtractTimerOverhead  // readonly boolean
bench.timerOverhead          // number | undefined — calibrated Ĉ in ms

// Per-Task diagnostic (getTask returns Task | undefined, hence the optional chaining)
const task = bench.getTask('foo')
task?.detectedResolution     // number | undefined

// Saturation warning event with typed reason
bench.addEventListener('warning', evt => {
  evt.task    // Task
  evt.reason  // TimerSaturationReason | undefined
})

// Standalone helpers
import {
  calibrateTimerOverhead,
  classifyTimerSaturation,    // → TimerSaturationReason | undefined
  detectTimerSaturation,      // boolean wrapper around classifyTimerSaturation
  estimateResolution,
  medianAbsoluteDeviation,    // mad statistic from a sorted sample
} from 'tinybench'

import type {
  CalibrateTimerOverheadOptions,
  TimerOverheadEstimatorKind,  // 'median' | 'min' | 'p05'
  TimerSaturationReason,
} from 'tinybench'

`Task#processRunResult` ordering (cross-cutting fix)

isOverridden[] is collected in lockstep with latencySamples[] during measurement. The phase ordering preserves the alignment invariant and isolates user-supplied overriddenDuration values from timer diagnostics:

Apply overhead correction in-place on the collection-order array (skipping overridden indices). Alignment intact.
Build a measuredOnly view by filtering out overridden indices. Captured before any sort.
estimateResolution(measuredOnly) — sort-invariant, runs on the filtered view; an all-overridden run yields undefined.
Sort the working latencySamples array.
computeStatistics on the sorted (possibly corrected) samples.
classifyTimerSaturation on the measured-only subset; the reason (when fired) threads onto the 'warning' event payload.

isOverridden is allocated unconditionally so the measured-only filter is also active when subtractTimerOverhead: false (covers overriddenDuration users without overhead correction).
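The first four phases listed above can be sketched as follows. This is a minimal illustration rather than tinybench's code: `processRunSketch` and `smallestRepeatedPositive` are hypothetical names, the statistics and saturation phases are left out, and the resolution rule (smallest positive value seen at least twice, otherwise the smallest positive value) is assumed from the commit description.

```typescript
// Hypothetical helper: smallest strictly-positive value that repeats,
// falling back to the smallest positive value; undefined when none exist.
function smallestRepeatedPositive (samples: readonly number[]): number | undefined {
  const positives = samples.filter(s => s > 0).sort((a, b) => a - b)
  for (let i = 1; i < positives.length; i++) {
    if (positives[i] === positives[i - 1]) return positives[i]
  }
  return positives[0]
}

interface ProcessedRunSketch {
  detectedResolution: number | undefined
  sorted: number[]
}

function processRunSketch (
  latencySamples: number[], // collection order, mutated in place
  isOverridden: readonly boolean[], // same length and order as latencySamples
  overhead: number | undefined
): ProcessedRunSketch {
  // Phase 1: correct measured samples in place; overridden indices are skipped,
  // so index i still refers to the same iteration in both arrays.
  if (overhead !== undefined && overhead > 0) {
    for (let i = 0; i < latencySamples.length; i++) {
      if (isOverridden[i] !== true) {
        latencySamples[i] = Math.max(0, latencySamples[i]! - overhead)
      }
    }
  }
  // Phase 2: measured-only view, captured before any sort breaks the alignment.
  const measuredOnly = latencySamples.filter((_, i) => isOverridden[i] !== true)
  // Phase 3: resolution estimate on the filtered view.
  const detectedResolution = smallestRepeatedPositive(measuredOnly)
  // Phase 4: sort the working array for the statistics that would follow.
  latencySamples.sort((a, b) => a - b)
  return { detectedResolution, sorted: latencySamples }
}

const mixedRun = processRunSketch(
  [2, 100, 2, 100, 3],
  [false, true, false, true, false],
  1
)
const allOverriddenRun = processRunSketch([100, 100], [true, true], 1)
```

In the mixed run the two overridden values of 100 pass through untouched, while the measured samples are shifted by the overhead before the resolution is estimated. An all-overridden run leaves the measured-only view empty, so the estimate is `undefined`.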

Constraints

subtractTimerOverhead + concurrency: 'task' is rejected. Asserted at construction and at the start of Bench.run() to cover the README's documented bench.concurrency = 'task' mutation pattern.
On runtimes with a coarse timer (resolution >= 1 ms), calibration returns 0 and the option becomes a no-op.
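A rough sketch of the calibration idea, including the coarse-timer fallback to 0. It rests on several assumptions that are not confirmed by the PR text: `calibrateSketch` is a hypothetical name, the provider is modelled as a plain `() => number` returning milliseconds, the default of 1024 pairs is invented for illustration, and only the median estimator is shown. The 64-pair warmup and the "fewer than half positive deltas" rule come from the commit description.

```typescript
function calibrateSketch (
  now: () => number, // hypothetical provider: returns a timestamp in ms
  pairs = 1024, // assumed default, not taken from the PR
  warmup = 64
): number {
  if (pairs <= 0) return 0
  // Warmup pairs are discarded so the measured pairs run on warmed-up code.
  for (let i = 0; i < warmup; i++) {
    now()
    now()
  }
  const deltas: number[] = []
  let positive = 0
  for (let i = 0; i < pairs; i++) {
    const a = now()
    const b = now()
    const delta = b - a
    if (delta > 0) positive++
    deltas.push(delta)
  }
  // Fewer than half positive deltas: the timer is coarser than the call cost,
  // so the positive deltas measure tick boundaries. Report no overhead.
  if (positive * 2 < pairs) return 0
  deltas.sort((x, y) => x - y)
  const mid = Math.floor(pairs / 2)
  return pairs % 2 === 0
    ? (deltas[mid - 1]! + deltas[mid]!) / 2
    : deltas[mid]!
}

// Deterministic stand-ins for real clocks.
let fineTicks = 0
const fineClock = (): number => fineTicks++ // advances on every call
let coarseCalls = 0
const coarseClock = (): number => Math.floor(coarseCalls++ / 1000) // advances every 1000 calls

const fineOverhead = calibrateSketch(fineClock)
const coarseOverhead = calibrateSketch(coarseClock)
```

With the fine clock every pair yields a delta of one tick, so the median is one tick. With the coarse clock no pair straddles a tick boundary, so calibration returns 0 and the correction would become a no-op.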

Risk

Runtime: opt-in for behaviour-changing pieces. Default-options benchmarks behave identically.
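TypeScript: BenchEvents is widened with 'warning'; consumers that switch exhaustively over event names (for example with a never-typed default arm or the switch-exhaustiveness-check lint rule) need a case 'warning': arm. Note that noFallthroughCasesInSwitch only reports fallthrough between non-empty cases and does not enforce exhaustiveness on its own. BenchLike.timerOverhead is now readonly and optional. New types: TimerSaturationReason, TimerOverheadEstimatorKind, CalibrateTimerOverheadOptions. New runtime exports: calibrateTimerOverhead, classifyTimerSaturation, detectTimerSaturation, estimateResolution, medianAbsoluteDeviation.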
Bundle: +0.7 kB gzipped vs main. The pre-existing size-limit budget of 12 kB is already exceeded on main (14.15 kB) — this PR does not change that situation.
No new runtime dependency.
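To illustrate the TypeScript risk, here is a sketch of a consumer that switches exhaustively over event names using a `never`-typed default arm, one common idiom. `EventNameSketch` is an illustrative subset, not the library's actual union, and `describeEvent` is a hypothetical consumer function.

```typescript
// Illustrative subset of event names, with the new 'warning' member included.
type EventNameSketch = 'start' | 'cycle' | 'complete' | 'warning'

function describeEvent (kind: EventNameSketch): string {
  switch (kind) {
    case 'start':
      return 'bench started'
    case 'cycle':
      return 'task finished a cycle'
    case 'complete':
      return 'bench finished'
    // Without this arm, the assignment to `never` below stops compiling
    // as soon as the union gains the 'warning' member.
    case 'warning':
      return 'timer saturation suspected'
    default: {
      const unreachable: never = kind
      throw new Error(`unhandled event: ${String(unreachable)}`)
    }
  }
}

const warningDescription = describeEvent('warning')
```

Consumers that do not use such an idiom are unaffected by the widening at compile time.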

JSDoc honesty

The subtractTimerOverhead JSDoc describes the max(0, x) clamp explicitly:

Two regimes (clean-shift X >> Ĉ vs sub-overhead X ≈ Ĉ)
rme deterministically inflates by factor M / (M − Ĉ) whenever Ĉ > 0
In the sub-overhead regime, p50, mad, aad collapse to 0 once cumulative mass at ≤ Ĉ reaches the relevant quantile
Three observable consequences of the clamp (latency.min == 0, throughput-mean substitution, 'zero-dominated' criterion ambiguity)
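The `M / (M − Ĉ)` factor follows from the fact that a constant shift leaves the spread unchanged while lowering the mean. A small numeric check, assuming `rme` is the margin of error divided by the mean and using a fixed critical value of 1.96 for simplicity (`relativeMarginOfError` is a hypothetical helper, not the library's function):

```typescript
function relativeMarginOfError (samples: readonly number[], critical = 1.96): number {
  const n = samples.length
  const mean = samples.reduce((sum, x) => sum + x, 0) / n
  const variance =
    samples.reduce((sum, x) => sum + (x - mean) * (x - mean), 0) / (n - 1)
  const marginOfError = critical * Math.sqrt(variance / n)
  return (marginOfError / mean) * 100
}

// Clean-shift regime: every sample is well above the overhead, so nothing clamps.
const rawSamples = [10, 11, 12, 13, 14] // mean M = 12
const overheadEstimate = 2 // Ĉ
const correctedSamples = rawSamples.map(x => Math.max(0, x - overheadEstimate))

const observedInflation =
  relativeMarginOfError(correctedSamples) / relativeMarginOfError(rawSamples)
const predictedInflation = 12 / (12 - overheadEstimate) // M / (M − Ĉ)
```

Here the observed ratio matches the predicted 1.2 up to floating-point rounding. Once samples start clamping to 0 the shift is no longer constant, and this relation stops holding.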

Verification

pnpm typecheck clean
pnpm lint clean
pnpm test — 232 tests pass across 62 test files
pnpm build — dist generated cleanly

…on diagnostic

Three cooperating diagnostics for sub-microsecond benchmarking, integrated in a single coherent code path so that each one can rely on the others:

- BenchOptions.subtractTimerOverhead (default false): when enabled, the cost of one timestamp provider call is calibrated once at construction time via the new exported calibrateTimerOverhead helper, then subtracted from each raw latency sample (clamped to zero) before statistics are computed. Samples returned by the task function via overriddenDuration are intentional user values and are skipped by the correction.
- 'warning' event on BenchEvents and TaskEvents, dispatched on both the Bench and the Task instances when the latency samples of a task are dominated by the timer resolution. Detection uses three OR'd criteria computed by detectTimerSaturation: more than half zero samples, fewer than max(3, min(10, n/1000)) distinct values, or zero MAD with n > 100. An n < 10 guard prevents false positives on unit-style benchmarks.
- Task.detectedResolution getter, populated after each run with the smallest strictly-positive sample value that appears at least twice (smallest reproducibly observed increment). Falls back to the strict minimum when no positive value repeats. A new estimateResolution helper is exported.

calibrateTimerOverhead (utils.ts):

- Subtracts in the provider's native type before converting to milliseconds (toMs(b - a) rather than toMs(b) - toMs(a)), preserving bigint precision on long-uptime hosts.
- Discards a configurable warmup phase (default 64 pairs) so the JIT reaches its steady-state tier before measurements begin.
- Returns 0 when fewer than half the back-to-back pairs produce a positive delta; in that regime the timer resolution exceeds the call cost and the positive deltas measure a tick boundary, not the call cost.
- Configurable estimator: 'median' (default), 'min', or 'p05'.

Task#processRunResult orders the diagnostics so they always reflect the raw, uncorrected measurements:

1. sortSamples on raw latencies
2. estimateResolution on raw sorted samples
3. when overhead correction is active, compute raw statistics, evaluate detectTimerSaturation against the raw distribution, then apply the correction in-place (skipping overridden samples) and re-sort only when overridden samples were skipped
4. compute the final (possibly corrected) statistics
5. when no correction was applied, evaluate detectTimerSaturation against the final samples (raw == final in this path)

This consolidates three previously separate proposals (PRs tinylibs#568/tinylibs#569/tinylibs#570) into a single coherent change: composing them naively would have caused the diagnostics to operate on the corrected-and-clamped sample set, producing artificially small detected-resolution values and false-positive saturation warnings on benchmarks that activate overhead subtraction.
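The three saturation criteria named in this commit message can be sketched as a single classifier. This is an illustration under assumptions: `classifySketch` and `medianOfSorted` are hypothetical names, the input is assumed to be sorted ascending, the criteria are evaluated in the order they are listed, and the `n/1000` term is used without rounding.

```typescript
type SaturationReasonSketch = 'zero-dominated' | 'low-distinct' | 'zero-mad'

function medianOfSorted (sorted: readonly number[]): number {
  const mid = Math.floor(sorted.length / 2)
  return sorted.length % 2 === 0
    ? (sorted[mid - 1]! + sorted[mid]!) / 2
    : sorted[mid]!
}

function classifySketch (sorted: readonly number[]): SaturationReasonSketch | undefined {
  const n = sorted.length
  // Guard: too few samples to say anything about the timer.
  if (n < 10) return undefined

  // Criterion 1: more than half of the samples are exactly zero.
  let zeros = 0
  for (const s of sorted) if (s === 0) zeros++
  if (zeros * 2 > n) return 'zero-dominated'

  // Criterion 2: too few distinct values. The loop stops as soon as the
  // threshold is reached, since further counting cannot change the verdict.
  const threshold = Math.max(3, Math.min(10, n / 1000))
  let distinct = 1
  for (let i = 1; i < n && distinct < threshold; i++) {
    if (sorted[i] !== sorted[i - 1]) distinct++
  }
  if (distinct < threshold) return 'low-distinct'

  // Criterion 3: zero median absolute deviation on a large sample.
  if (n > 100) {
    const median = medianOfSorted(sorted)
    const deviations = sorted.map(s => Math.abs(s - median)).sort((a, b) => a - b)
    if (medianOfSorted(deviations) === 0) return 'zero-mad'
  }
  return undefined
}

const mostlyZero = [...new Array<number>(15).fill(0), ...new Array<number>(5).fill(1)]
const twoValues = [...new Array<number>(10).fill(1), ...new Array<number>(10).fill(2)]
const spikeAtFive = [
  ...new Array<number>(150).fill(5),
  ...Array.from({ length: 50 }, (_, i) => i + 6)
]
const healthySpread = Array.from({ length: 20 }, (_, i) => i + 1)
```

The `spikeAtFive` sample has plenty of distinct values, yet three quarters of it sits on the median, so only the MAD criterion catches it.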

pkg-pr-new · 2026-05-30T01:22:33Z

Open in StackBlitz

npm i https://pkg.pr.new/tinylibs/tinybench@571

commit: 32e0ee5

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6a71cb2d80

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Apply audit-driven fixes to PR tinylibs#571:

* fix(task): correct overhead before sort to keep latencySamples aligned with isOverridden (collection order). Previous logic indexed isOverridden after sortSamples, corrupting both overriddenDuration preservation and measured-sample skip in mixed-mode tasks.
* fix(task): run timer-saturation detection on a measured-only subset so constant overriddenDuration values cannot trigger a spurious low-distinct-count warning.
* fix(bench): assert subtractTimerOverhead is incompatible with concurrency: 'task' (sequential calibration would not reflect per-iteration cost under concurrency).
* fix(bench): normalize subtractTimerOverhead with ?? false instead of === true to accept any falsy default consistently.
* fix(index): export detectTimerSaturation alongside the other timer diagnostics helpers.
* docs(types): rewrite subtractTimerOverhead JSDoc with an honest treatment of the max(0, x) clamp and the two caveats (concurrency, overriddenDuration). Remove an orphan /** block.
* test: rewrite the overriddenDuration warning test to assert warningCount === 0, matching the measured-only saturation behavior.

jerome-benoit

Cross-validated audit (8 oracle agents, 4 paired dimensions)

Findings classified by inter-pair convergence:

Convergent (≥2 pairs flagged the same item)

task.ts:648 — detectedResolution includes overriddenDuration values
task.ts:662 — Phase 6 computeStatistics discards 14 of 15 fields
bench.ts:215 — ?? false accepts truthy non-boolean from JS callers, inconsistent with sibling line 214
bench.ts:219 — assert is one-shot at construction; runtime concurrency mutation bypasses it
bench.ts:218 — assert message lacks remediation
task.ts:699 — 'warning' event payload carries no reason
types.ts:128 — BenchLike.timerOverhead required + non-readonly: breaking + unsafe
types.ts:204 — JSDoc enumeration omits aad/mad/min/max
tests — mixed overridden+measured run not exercised (the alignment invariant)
tests — concurrency:'task' + subtractTimerOverhead assert untested
tests — 'p05' estimator branch untested
utils.ts:488 — bigint cast misleads the type system

Single-agent / high-signal

task.ts:631 — clamp injects synthetic zeros affecting min, throughput, criterion A
types.ts:212 — rme deterministically inflates by M / (M − Ĉ) in the clean-shift regime, not "potentially"
types.ts:211 — p50 (and downstream mad/aad) can collapse to 0 when ≥ 50% samples clamp
types.ts:26 — 'warning' widening across three unions breaks exhaustive switch consumers
utils.ts:285 — detectTimerSaturation accepts unsorted input but correctness requires sorted
utils.ts:301 — distinct-count loop has no short-circuit at threshold
utils.ts:454 — TimerOverheadEstimator is a string literal union; suffix suggests a strategy
test/calibrate-timer-overhead.test.ts:57 — min ≤ median * 2 is mathematically trivial
test/detected-resolution.test.ts:59 — conditional if (resolution !== undefined) skips all assertions on regression
README.md — public API additions undocumented

Verified correct (sample)

Phase 1 alignment of latencySamples[i] with isOverridden[i] in collection order
measuredOnly === latencySamples reference-equality optimization
Aborted-run paths (no samples, partial samples)
estimateResolution edge cases ([0,0,0] → undefined, all-equal positives)
'warning' widening consistent across BenchEvents / BenchEventsWithTask / TaskEvents
Bigint precision preserved at runtime
Style and Biome conformance on new test files

Detail in the inline comments. The mixed-overridden test gap is the only blocker.

* Add `classifyTimerSaturation` returning a `TimerSaturationReason` (`'zero-dominated' | 'low-distinct' | 'zero-mad'`) and re-implement `detectTimerSaturation` as a boolean wrapper.
* Tighten `detectTimerSaturation`/`classifyTimerSaturation` parameter type from `Samples` to `SortedSamples`.
* Short-circuit the distinct-value loop once the threshold is reached.
* Add `medianAbsoluteDeviation(SortedSamples)` helper.
* Rename `TimerOverheadEstimator` to `TimerOverheadEstimatorKind`.
* Replace `as unknown as number` with `as bigint` in `calibrateTimerOverhead`; document operator polymorphism.
* Re-export `classifyTimerSaturation`, `medianAbsoluteDeviation`, `TimerSaturationReason`, `TimerOverheadEstimatorKind` from the package entry point.

Extend `BenchEvent` with an optional `reason` payload symmetrical to `error`. The `reason` getter is typed as `TimerSaturationReason | undefined` for `'warning'` events and `undefined` for every other event type.

* Move `TimerSaturationReason` from `utils.ts` to `types.ts` to align with the `Statistics`/`Samples` convention (types in `types.ts`, helpers in `utils.ts`).
* Add a `'warning'` constructor overload accepting an optional reason.
* Re-export `TimerSaturationReason` from the `./types` block in the package entry point.

…only samples

* Compute `detectedResolution` from the measured-only subset (excluding `overriddenDuration` samples). A constant override value is no longer reported as the timer grain.
* Allocate `isOverridden` unconditionally so the measured-only filter is also active when `subtractTimerOverhead` is disabled.
* Replace Phase 6 `computeStatistics` recomputation with the dedicated `medianAbsoluteDeviation` helper.
* Use `classifyTimerSaturation` and propagate the `TimerSaturationReason` onto the `'warning'` event payload.
* Update `Task.detectedResolution` JSDoc to reflect the measured-only semantics; update the `#processRunResult` ordering description.

…ten options coercion

* Coerce `subtractTimerOverhead` with `=== true`, matching the sibling `retainSamples` form. Truthy non-boolean values from JS callers are now rejected.
* Re-state the constructor assert message in remediation form (action the user can take, not the internal cause).
* Add the same assert at the start of `run()`. `concurrency` is documented as a post-construction-mutable field, so the constructor check alone leaves the mutation path uncovered.
* Note the constraint and the dual enforcement in the `subtractTimerOverhead` field JSDoc.

Third-party `BenchLike` implementers can omit the field (semantically equivalent to the existing `undefined` sentinel that `Task` already handles). The `readonly` modifier matches the concrete `Bench.timerOverhead` declaration and forbids mutation through the interface, which `Task` reads on every cycle.
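A small sketch of what the optional, readonly field means for implementers and readers. `BenchLikeSketch` and `correctSample` are hypothetical stand-ins that model only the one field discussed here, not the real `BenchLike` interface.

```typescript
interface BenchLikeSketch {
  // Optional: an implementer may omit it. Readonly: consumers cannot reassign it.
  readonly timerOverhead?: number
}

// A third-party implementer written before the field existed still type-checks.
const thirdPartyBench: BenchLikeSketch = {}
const calibratedBench: BenchLikeSketch = { timerOverhead: 0.00004 }

// A reader treats a missing field like the existing `undefined` sentinel:
// no calibrated overhead means no correction.
function correctSample (bench: BenchLikeSketch, raw: number): number {
  const overhead = bench.timerOverhead
  return overhead === undefined ? raw : Math.max(0, raw - overhead)
}

const untouched = correctSample(thirdPartyBench, 0.001)
const clamped = correctSample(calibratedBench, 0.00001)
```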

Rewrite the `subtractTimerOverhead` JSDoc with a mathematically grounded treatment:

* Statistics list refers to all fields of `Statistics`; previously enumerated only seven of eighteen fields.
* The `rme` inflation factor `M / (M − Ĉ)` is stated deterministically in the clean-shift regime, not hedged with 'potentially'.
* The collapse of `p50`, `mad`, and `aad` to zero in the sub-overhead regime is named explicitly with the threshold.
* Three observable consequences of the `max(0, …)` clamp are listed (`latency.min` may be 0; throughput substitutes the mean for clamped samples; criterion `'zero-dominated'` cannot distinguish clamped samples from genuine zeros).

… classifier

* New `test/subtract-timer-overhead-alignment.test.ts`: exercises the Phase 1/2 alignment invariant on a heterogeneous run (alternating overridden + measured iterations) using a deterministic timestamp provider. Pins exact multiset counts so an off-by-one in the `isOverridden`/`latencySamples` index alignment fails the test.
* `test/calibrate-timer-overhead.test.ts`:
  - Replace the loose `min ≤ median * 2` assertion with a deterministic estimator-ordering test using a scripted ascending-pair provider.
  - Add a deterministic `'p05'` test pinning the `max(0, ⌈n·0.05⌉ − 1)` index math at three sample sizes.
  - Add tests for the `subtractTimerOverhead` + `concurrency: 'task'` constructor assert and the equivalent `run()` runtime check.
* `test/detected-resolution.test.ts`: replace the conditional `if (resolution !== undefined)` block with unconditional assertions.
* `test/utils-detect-timer-saturation.test.ts`: add `classifyTimerSaturation` parallel coverage for each criterion (returning the precise reason string) plus the n<10 and healthy-spread negative cases.
* New `test/warning-event-reason.test.ts`: verifies `BenchEvent.reason` carries the saturation reason for `'warning'` events and is `undefined` for other event types.
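The `'p05'` index rule mentioned above, `max(0, ⌈n·0.05⌉ − 1)`, can be illustrated in isolation. `p05Index` and `pickP05` are hypothetical helpers written for this example; the sample sizes are chosen here and are not necessarily the ones used in the PR's tests.

```typescript
// Zero-based index of the 5th-percentile element in an ascending array of n values.
function p05Index (n: number): number {
  return Math.max(0, Math.ceil(n * 0.05) - 1)
}

// Picks the 5th-percentile value; undefined for an empty array.
function pickP05 (sortedDeltas: readonly number[]): number | undefined {
  return sortedDeltas[p05Index(sortedDeltas.length)]
}

const ascendingDeltas = Array.from({ length: 41 }, (_, i) => i + 1) // 1, 2, ..., 41
const fifthPercentile = pickP05(ascendingDeltas)
```

For n = 1 the ceiling is 1 and the index is 0; for n = 21 the ceiling of 1.05 is 2, giving index 1; for n = 41 the ceiling of 2.05 is 3, giving index 2. The outer `max(0, …)` only matters for n = 0, where the ceiling is 0.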

…, and timer diagnostics * New 'Timer Overhead Correction' section covers `subtractTimerOverhead`, the calibration helper, and the `concurrency: 'task'` and sub-overhead caveats. * New 'Per-Sample Override' section documents `overriddenDuration` (previously absent from the README despite being supported in code). * New 'Timer Diagnostics' section covers `Task.detectedResolution` and the `'warning'` event with its `TimerSaturationReason` payload, plus pointers to the standalone helpers. * Extend the `BenchEvents` listener example with a `'warning'` listener that reads `evt.reason`.

`computeStatistics` and `absoluteDeviationMedian` are not re-exported from the package entry point, so `{@link …}` references to them trigger `typedoc --treatWarningsAsErrors`. Switch them to plain backticked code references; `{@link}` is preserved only for symbols listed in the public exports.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2f8232ac5b


chatgpt-codex-connector · 2026-06-05T13:14:50Z

+configurable estimator strategy (`'median'` default, or `'min'` / `'p05'`):
+
+```ts
+import { calibrateTimerOverhead, hrtimeNowTimestampProvider } from 'tinybench'


Export the provider used by the README example

When a user copies this new direct-calibration example, import { hrtimeNowTimestampProvider } from 'tinybench' fails because the package entry point only exports hrtimeNow and not hrtimeNowTimestampProvider from src/index.ts. Either export the provider constant or change the example to use a public TimestampProvider so the documented API is actually importable.

Useful? React with 👍 / 👎.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f9a0a6aff6


chatgpt-codex-connector · 2026-06-09T12:51:33Z

+      if (hasOverhead) {
+        for (let i = 0; i < latencySamples.length; i++) {
+          if (isOverridden?.[i] !== true) {
+            // eslint-disable-next-line @typescript-eslint/no-non-null-assertion
+            latencySamples[i] = Math.max(0, latencySamples[i]! - overhead)
+          }


Compute detectedResolution before subtracting overhead

When subtractTimerOverhead is enabled, this mutates latencySamples before measuredOnly is built and passed to estimateResolution, so Task.detectedResolution is computed from corrected/clamped durations rather than the raw timer deltas. For a timer that really advances in 1 µs ticks with a 0.4 µs calibrated overhead, repeated raw 1 µs samples will be reported as 0.6 µs (or disappear entirely if the overhead clamps them to zero), making the resolution diagnostic misleading exactly when users enable this option for very small benchmarks. Preserve a raw measured-only copy for the diagnostic and only apply this correction to the statistics path.
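The numbers in this comment can be reproduced with a short sketch. `smallestRepeatedPositive` is a hypothetical stand-in for the resolution estimate (smallest positive value seen at least twice, otherwise the smallest positive value), and the samples are expressed in microseconds for readability.

```typescript
function smallestRepeatedPositive (samples: readonly number[]): number | undefined {
  const positives = samples.filter(s => s > 0).sort((a, b) => a - b)
  for (let i = 1; i < positives.length; i++) {
    if (positives[i] === positives[i - 1]) return positives[i]
  }
  return positives[0]
}

// A timer that advances in 1 µs ticks, with a 0.4 µs calibrated overhead.
const rawDeltas = [1, 1, 1, 2, 2]
const calibratedOverhead = 0.4

// Keep a raw copy for the diagnostic; correct a separate array for statistics.
const correctedDeltas = rawDeltas.map(d => Math.max(0, d - calibratedOverhead))

const resolutionFromRaw = smallestRepeatedPositive(rawDeltas)
const resolutionFromCorrected = smallestRepeatedPositive(correctedDeltas)
```

Estimating from the raw copy reports the 1 µs tick, while estimating from the corrected values reports roughly 0.6 µs, which no longer corresponds to any increment the timer can produce.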

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-09T12:51:33Z

   * @returns the tasks array
   */
  async run (): Promise<Task[]> {
+    assert(
+      !(this.subtractTimerOverhead && this.concurrency === 'task'),
+      '`subtractTimerOverhead` cannot be used with `concurrency: "task"` — set `concurrency` to `null` or `"bench"`, or disable `subtractTimerOverhead`'


Guard direct Task.run against task concurrency

This invariant is only checked in Bench.run(), so a bench created with subtractTimerOverhead: true, then changed via the documented mutable bench.concurrency = 'task' pattern, can still enter the unsupported mode by calling the public bench.getTask(name)?.run() API directly. That path uses #benchmark's concurrency === 'task' branch and then applies the sequentially calibrated overhead to concurrent samples, which is the combination this guard is trying to reject; enforce the same invariant before task-level runs too.
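One way to close this gap is to route every entry point through a single shared check. The sketch below is hypothetical throughout: `assertOverheadSupported`, `BenchSketch`, and `TaskSketch` are simplified stand-ins that model only the two relevant fields, not the library's classes.

```typescript
type ConcurrencySketch = 'bench' | 'task' | null

interface GuardedBench {
  readonly subtractTimerOverhead: boolean
  concurrency: ConcurrencySketch
}

// Single source of truth for the invariant, reused by every entry point.
function assertOverheadSupported (bench: GuardedBench): void {
  if (bench.subtractTimerOverhead && bench.concurrency === 'task') {
    throw new Error(
      '`subtractTimerOverhead` cannot be used with `concurrency: "task"`'
    )
  }
}

class TaskSketch {
  private readonly bench: GuardedBench

  constructor (bench: GuardedBench) {
    this.bench = bench
  }

  run (): string {
    // Checked here too, so a direct task-level run cannot bypass the guard.
    assertOverheadSupported(this.bench)
    return 'ran'
  }
}

class BenchSketch implements GuardedBench {
  readonly subtractTimerOverhead: boolean
  concurrency: ConcurrencySketch
  readonly task: TaskSketch

  constructor (subtractTimerOverhead: boolean, concurrency: ConcurrencySketch = null) {
    this.subtractTimerOverhead = subtractTimerOverhead
    this.concurrency = concurrency
    assertOverheadSupported(this) // construction-time check
    this.task = new TaskSketch(this)
  }

  run (): string {
    assertOverheadSupported(this) // covers post-construction mutation
    return this.task.run()
  }
}

const guardedBench = new BenchSketch(true)
const firstRun = guardedBench.task.run()
guardedBench.concurrency = 'task' // the mutation pattern discussed above
```

After the mutation, both `guardedBench.run()` and `guardedBench.task.run()` throw, because they consult the same helper at call time.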

Useful? React with 👍 / 👎.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1376205e25


chatgpt-codex-connector · 2026-06-14T23:06:20Z

+        const warningEv = new BenchEvent('warning', this, saturationReason)
+        this.dispatchEvent(warningEv)
+        this.#bench.dispatchEvent(warningEv)


Dispatch a fresh warning event to the bench

When a task-level warning listener calls stopImmediatePropagation(), reusing that same BenchEvent for the bench dispatch can prevent bench-level warning listeners from running in Node's EventTarget, even though the new API documents the warning as being emitted on both targets. Create a separate BenchEvent('warning', this, saturationReason) for the bench dispatch so task listener propagation state cannot suppress the bench notification.
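A minimal sketch of the suggested fix, using the standard `Event` and `EventTarget` globals rather than tinybench's classes. `WarningEventSketch` and `dispatchWarning` are hypothetical; the point is only that each target receives its own event object, so propagation state set by a task listener cannot carry over.

```typescript
class WarningEventSketch extends Event {
  readonly reason: string | undefined

  constructor (reason?: string) {
    super('warning')
    this.reason = reason
  }
}

const taskTarget = new EventTarget()
const benchTarget = new EventTarget()

let benchListenerCalls = 0
let reasonSeenByBench: string | undefined

// A task-level listener that stops propagation on the event it receives.
taskTarget.addEventListener('warning', evt => {
  evt.stopImmediatePropagation()
})

benchTarget.addEventListener('warning', evt => {
  benchListenerCalls++
  reasonSeenByBench = (evt as WarningEventSketch).reason
})

function dispatchWarning (reason: string): void {
  // One fresh event per target: nothing done to the first can affect the second.
  taskTarget.dispatchEvent(new WarningEventSketch(reason))
  benchTarget.dispatchEvent(new WarningEventSketch(reason))
}

dispatchWarning('zero-mad')
```

With separate event objects the bench-level listener runs regardless of what the task-level listener did, independent of how a given runtime resets propagation flags between dispatches.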

Useful? React with 👍 / 👎.

jerome-benoit requested a review from Uzlopak as a code owner May 30, 2026 01:22

This was referenced May 30, 2026

feat: add opt-in subtractTimerOverhead option #568

Closed

feat: dispatch warning event on timer saturation #569

Closed

feat: expose detected timer resolution per task #570

Closed

chatgpt-codex-connector Bot reviewed May 30, 2026

View reviewed changes

Comment thread src/task.ts Outdated

Comment thread src/task.ts Outdated

jerome-benoit commented May 30, 2026

View reviewed changes

jerome-benoit added 14 commits May 30, 2026 15:32

Merge branch 'main' into feat/timer-diagnostics-and-overhead-correction

411c1d7

Merge branch 'main' into feat/timer-diagnostics-and-overhead-correction

2a5a701

Merge branch 'main' into feat/timer-diagnostics-and-overhead-correction

b7e7e42

Merge branch 'main' into feat/timer-diagnostics-and-overhead-correction

ce3c2c4

Merge branch 'main' into feat/timer-diagnostics-and-overhead-correction

2f8232a

chatgpt-codex-connector Bot reviewed Jun 5, 2026

View reviewed changes

jerome-benoit added 2 commits June 7, 2026 19:31

Merge branch 'main' into feat/timer-diagnostics-and-overhead-correction

a3b9a37

Merge branch 'main' into feat/timer-diagnostics-and-overhead-correction

f9a0a6a

chatgpt-codex-connector Bot reviewed Jun 9, 2026

View reviewed changes

Merge branch 'main' into feat/timer-diagnostics-and-overhead-correction

1376205

chatgpt-codex-connector Bot reviewed Jun 14, 2026

View reviewed changes

Merge branch 'main' into feat/timer-diagnostics-and-overhead-correction

c53307d

Merge branch 'main' into feat/timer-diagnostics-and-overhead-correction

32e0ee5
