
[DRAFT] Agent-based rewrite of a new REST API and UI#206

Draft
ldionne wants to merge 115 commits into main from v5

Member

@ldionne ldionne commented Mar 31, 2026

THIS IS A DRAFT. IT WILL BE THE TARGET OF AN RFC BEFORE ANYTHING HAPPENS.

@ldionne ldionne force-pushed the v5 branch 6 times, most recently from dcece23 to 36d36eb Compare April 4, 2026 18:50
ldionne added 2 commits April 6, 2026 12:28
Assisted-by: Claude Code
Assisted-by: Claude Code
ldionne and others added 18 commits April 6, 2026 17:53
Add mount/unmount tests for all v5 UI pages that were missing coverage:
dashboard, machine-list, machine-detail, run-detail, order-detail, and
graph (mount tests added alongside existing pure function tests).

Tests follow the established pattern: mock API functions, call
page.mount(container, params), assert on rendered DOM and API call
arguments. Each test file includes afterEach cleanup to prevent
module-level controller state leakage between tests.

Coverage: 93 new page-level tests across 6 files (511 total frontend
tests, all passing).

Assisted-by: Claude Code
Switch the Compare page bar chart from linear percentage to log₂(ratio)
y-axis. This makes equal multiplicative changes visually symmetric (e.g.
2× faster and 2× slower produce equal-height bars).

Tick labels show percentage change at "nice" values (±1%, ±5%, ±50%,
etc.), auto-adapting to the visible range. On zoom, ticks recompute
dynamically via Plotly.relayout() with a guard flag to prevent infinite
loops. Noise bands are converted to log₂ space.

Assisted-by: Claude Code
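
The log₂(ratio) axis described in the commit above can be illustrated with a small sketch. It is written in Python for brevity (the PR's chart code is TypeScript), and the function names are hypothetical, not taken from the PR.

```python
import math

def log2_ratio(old: float, new: float) -> float:
    """Return log2(new / old); equal multiplicative changes get equal magnitude."""
    if old <= 0 or new <= 0:
        raise ValueError("both values must be positive")
    return math.log2(new / old)

def percent_to_log2(pct: float) -> float:
    """Map a percentage change (e.g. 50 for +50%) onto the log2 axis."""
    return math.log2(1.0 + pct / 100.0)

def log2_to_percent(y: float) -> float:
    """Inverse mapping, usable for labelling ticks with percentages."""
    return (2.0 ** y - 1.0) * 100.0

# Doubling and halving produce bars of equal height and opposite sign.
doubled = log2_ratio(10.0, 20.0)
halved = log2_ratio(20.0, 10.0)
```

Under this mapping a ±5% noise band is no longer symmetric around zero: `percent_to_log2(5)` and `percent_to_log2(-5)` differ slightly in magnitude, which is presumably why the commit converts noise bands to log₂ space explicitly.
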
The link was buried inside the test suite dropdown menu. Move it to a
standalone top-level link in the nav bar's right section for visibility.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
SPA navigation links unconditionally called e.preventDefault(), blocking
Cmd+Click / Ctrl+Click from opening pages in new tabs. Add an
isModifiedClick() helper that lets modified clicks fall through to the
browser. Set real href values on all nav links (not href="#") so the
browser knows where to navigate.

Also fix the LNT brand link on the admin page: it called navigate('/')
but only the /admin route was registered, causing a 404. In admin
context, the brand now uses full-page navigation to the selected suite's
dashboard.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Graph and Compare are now "analysis tools" at /v5/graph and /v5/compare
(suite-agnostic), while browsing pages stay suite-scoped at /v5/{ts}/...
This enables cross-suite comparisons (e.g. libc++ vs libstdc++) where
each has its own test suite with incompatible order spaces.

Key changes:
- Flask: consolidated v5_global() serves /v5/admin, /v5/graph, /v5/compare
- Router: initRouter() takes context {testsuite, testsuites}, exports
  getTestsuites(); fixes testsuite derivation for agnostic context
- Nav bar: three link categories (suite-scoped SPA, analysis full-page
  with ?suite= prefill, admin)
- Graph page: suite <select> dropdown, currentSuite replaces closure ts,
  suiteGeneration counter guards async callbacks, full state reset on
  suite change
- Compare page: per-side suite selector, initSelection() replaces
  setCachedData(), per-side order/field fetching via fetchSideData(),
  per-side sample fetching (side A runs use side A's suite)
- Combobox: ComboboxContext uses getSuiteName(side)/getOrderData(side)
- Types: SideSelection.suite field, state encodes suite_a/suite_b
- Design and implementation plan docs updated

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Pinned orders were same-suite reference lines. Baselines are cross-suite
(suite, machine, order) tuples that allow comparing against any test suite.

Key changes:
- PinnedOrder → PinnedBaseline in time-series-chart.ts (label instead of
  orderValue, updated hover template)
- Graph page: baseline selector panel with cascading Suite → Machine →
  Order dropdowns, "+ Add baseline" expand/collapse UX
- Baseline data fetched independently from each baseline's suite via
  queryDataPoints(), cached per suite::machine::order::metric
- buildRefsFromCache → buildBaselinesFromData (reads from baseline cache
  instead of main trace data)
- URL encoding: baseline={suite}::{machine}::{order} (repeated param)
- Removed scaffold-based pinned order suggestions (rebuildSuggestions,
  cachedOrders, cachedSuggestions) — baseline order search uses per-suite
  API-based search

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
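
As an illustration of the `baseline={suite}::{machine}::{order}` URL encoding from the commit above, here is a minimal Python sketch using only the standard library. The helper names are hypothetical; the PR's actual encoding lives in the TypeScript frontend and may differ in detail.

```python
from urllib.parse import urlencode, parse_qs

SEP = "::"

def encode_baselines(baselines):
    """Encode (suite, machine, order) tuples as a repeated `baseline` parameter."""
    return urlencode([("baseline", SEP.join(b)) for b in baselines])

def decode_baselines(query):
    """Decode the repeated parameter back into tuples, skipping malformed values."""
    decoded = []
    for raw in parse_qs(query).get("baseline", []):
        parts = raw.split(SEP)
        if len(parts) != 3:
            # A name that itself contains "::" cannot round-trip.
            continue
        decoded.append(tuple(parts))
    return decoded

query = encode_baselines([("libcxx", "m1", "abc123"), ("libstdcxx", "m2", "def456")])
```

The `len(parts) != 3` branch shows the separator limitation: any suite, machine, or order name containing `::` makes the value ambiguous, so this sketch drops it rather than guessing.
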
… comments

- Promote blMachineCleanup/blOrderCleanup to module scope and clean up
  in unmount() to prevent listener leaks when navigating away with the
  baseline form open
- Call fetchAllBaselineData() in doPlot() so baselines are re-fetched
  when the metric changes (previously silently disappeared)
- Add comment documenting :: separator limitation in baseline URL encoding
- Update stale "pinned order" references in comments, test descriptions,
  and CSS
- Remove dead getOrders mock from graph tests

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Place the suite selector in the same flex row as the metric, filter, and
aggregation controls instead of its own full-width block.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
- Baseline form fields (Suite, Machine, Order) now display horizontally
  instead of stacking vertically, preventing misalignment with Machines
- Auto-add baseline when order is selected (no separate Add button needed)
- Removed blSelectedTag (unused) and blAddBtn (no longer needed)
- Normalize agg-select padding/font-size to match other controls, fixing
  vertical label alignment in the controls row
- Add CSS for baseline form horizontal layout

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Use flex-start alignment on the second controls row so Machines and
Baselines labels stay top-aligned regardless of input height differences.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
…age regressions

Extract createOrderPicker from createOrderCombobox so Graph baseline and
Compare page share the same order combobox UX with machine-order filtering
and tag display. Use lazy getOrderData getter so async-fetched order data
is available when the dropdown opens (fixes empty suggestions regression).

Also fix Compare page "Select a test suite first" hint: render Machine,
Order, Runs, and Run aggregation sections unconditionally and move the
hint into the Runs panel where it belongs.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
The query endpoint only supported after_order/before_order range filters
with strict inequality (> and <), making it impossible to query data at
a single order. Add an order parameter for exact matching (=), mutually
exclusive with the range filters (returns 400 if combined).

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
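
The filter semantics in the commit above can be sketched as follows. This is a simplified in-memory model, not the PR's SQL: orders are plain integers here, and the function names are hypothetical.

```python
def validate_filters(order=None, after_order=None, before_order=None):
    """Return an HTTP-style status: 400 if `order` is combined with a range filter."""
    has_range = after_order is not None or before_order is not None
    return 400 if (order is not None and has_range) else 200

def select_points(points, order=None, after_order=None, before_order=None):
    """Apply exact-match or strictly exclusive range filtering to data points."""
    selected = []
    for p in points:
        o = p["order"]
        if order is not None and o != order:
            continue
        if after_order is not None and not (o > after_order):
            continue
        if before_order is not None and not (o < before_order):
            continue
        selected.append(p)
    return selected

points = [{"order": n} for n in range(1, 6)]
```

With strict bounds, `after_order=3, before_order=3` can never match anything, so a single order is reachable only through the exact-match parameter.
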
Baselines were never rendering because fetchAllBaselineData used
afterOrder=X & beforeOrder=X (strict exclusive range = empty result).
Switch to the new order param for exact-match queries.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
- Add red halo (.combobox-invalid) to machine and order comboboxes when
  no suggestions match; block acceptance via Enter/blur while invalid.
- Enforce suite → machine → order dependency chain on Compare page by
  disabling each input until its predecessor is selected.
- Require exact-match for order picker Enter/blur acceptance — partial
  substring matches are rejected.
- Machine comboboxes fetch the full machine list once on creation and
  filter locally by case-insensitive substring (no per-keystroke API
  calls). Input is disabled with "Select a suite first" when no suite
  is selected.
- Fix arrow-key navigation: blur handlers check FocusEvent.relatedTarget
  to keep dropdown open during keyboard navigation. Add :focus style to
  dropdown items.
- Remove 20-item cap on machine suggestions (machines are always < 100).
- On Graph baseline form, onClear callback destroys the order picker
  when the machine is cleared. Clearing an order clears downstream runs.
- Run UUIDs in Compare runs panel are now links to Run Detail page.
- Metric area shows "Select a suite to load metrics..." before any suite
  is loaded.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
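
Two of the combobox rules above (local case-insensitive substring filtering for machines, exact-match acceptance for orders) are simple enough to sketch. The Python below is illustrative only; the PR implements this in TypeScript and the function names are assumptions.

```python
def filter_machines(machines, query):
    """Filter a machine list fetched once, by case-insensitive substring."""
    needle = query.casefold()
    return [m for m in machines if needle in m.casefold()]

def accept_order(orders, typed):
    """Accept the typed value on Enter/blur only if it matches an order exactly."""
    return typed if typed in orders else None

def is_invalid(suggestions):
    """The input is flagged invalid (red halo) when nothing matches."""
    return len(suggestions) == 0

machines = ["arm64-Builder", "x86-builder", "riscv-bot"]
```
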
…to /tests

Change the /query endpoint from GET to POST with a JSON body to
eliminate URL length limits when querying many tests with long names.
The test field accepts a list for disjunction queries. Unknown test
names are silently skipped. The schema uses marshmallow's unknown=RAISE
to reject unknown fields (returning 422 instead of the previous 400).

Add optional machine= and metric= query parameters to GET /tests so
clients can discover which tests have actual data for a given machine
and/or metric combination, joining through Sample -> Run with DISTINCT.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
…uery

Restructure the Graph page data loading to fetch only the tests being
displayed instead of all data for a machine+metric:

- Discover matching test names server-side via GET /tests with machine,
  metric, and name_contains filters
- Fetch data only for discovered tests via POST /query with multi-value
  test in the JSON body (eliminates URL length limits)
- Cache data per (machine, metric, test) with LRU eviction at 500
  entries -- filter changes only fetch uncached tests (the delta)
- Enforce a hard cap of 50 displayed tests (replaces soft cap of 20)
- Case-sensitive test filtering (matches server-side SQL LIKE behavior)
- Baseline data fetching scoped to discovered tests only

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
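
The caching strategy above (an LRU cache keyed by (machine, metric, test), with only the uncached delta fetched) might look roughly like this. It is a Python sketch with hypothetical names; the PR's cache is part of the TypeScript Graph page.

```python
from collections import OrderedDict

class LRUCache:
    """Least-recently-used cache with a fixed capacity (500 in the commit above)."""

    def __init__(self, capacity=500):
        self.capacity = capacity
        self._data = OrderedDict()

    def __contains__(self, key):
        return key in self._data

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        while len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used entry

def uncached_tests(cache, machine, metric, tests):
    """Return only the tests whose data still has to be fetched (the delta)."""
    return [t for t in tests if (machine, metric, t) not in cache]

cache = LRUCache(capacity=2)
cache.put(("m1", "exec_time", "a"), [1.0])
cache.put(("m1", "exec_time", "b"), [2.0])
cache.get(("m1", "exec_time", "a"))         # "a" becomes most recent
cache.put(("m1", "exec_time", "c"), [3.0])  # evicts "b"
```
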
… Suites

Replace the three-category navbar (suite-scoped, analysis, admin) and suite
selector dropdown with a simpler flat layout where all links are suite-agnostic:

  [LNT] [Test Suites] [Graph] [Compare] [API]  <-->  [v4 UI] [Admin] [Settings]

- Remove suite selector dropdown from navbar entirely
- Remove suite-scoped links (Dashboard, Regressions, Machines) from navbar
- Add "Test Suites" link (suite-agnostic placeholder page at /v5/test-suites)
- Add "API" link (opens Swagger UI in new tab)
- Add suite-agnostic Dashboard placeholder at /v5/
- Move Admin and v4 UI to right side
- v4 UI link now points to v4 root page instead of suite-specific URL
- Extract buildNavLink() helper to unify link construction
- Pass abort signal to getApiKeys() refresh calls in admin page

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Serve a plain-text orientation document at GET /llms.txt following the
llms.txt convention (analogous to robots.txt). Helps AI agents quickly
understand LNT's domain concepts, API structure, and common workflows.

Content includes: what LNT is, key concepts (test suite, machine, order,
run, test, sample, regression, field change), endpoint listing, pagination
format, links to Swagger UI and OpenAPI spec, and common workflows.

- Registered as a plain Flask blueprint (not flask-smorest) to stay out
  of the OpenAPI spec
- Static content with Cache-Control (24h) and ETag headers
- Points to OpenAPI spec for full write endpoint documentation

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
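
The caching headers mentioned above can be sketched without any web framework. The following Python is a minimal illustration of serving static text with an ETag and a 24-hour Cache-Control; the hashing scheme, the `public` directive, and the function names are assumptions rather than details from the PR.

```python
import hashlib

def make_etag(body: bytes) -> str:
    """Derive a quoted ETag from the content so it changes only when the text does."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'

def respond(body: bytes, if_none_match=None):
    """Return (status, headers, payload), honouring a matching If-None-Match."""
    etag = make_etag(body)
    headers = {
        "Content-Type": "text/plain; charset=utf-8",
        "Cache-Control": "public, max-age=86400",  # 24 hours
        "ETag": etag,
    }
    if if_none_match == etag:
        return 304, headers, b""
    return 200, headers, body

status, headers, payload = respond(b"# LNT\n")
revalidated = respond(b"# LNT\n", if_none_match=headers["ETag"])
```
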
Extract all graph page caching into a GraphDataCache class that manages
test data (LRU), baseline data, test names, and scaffolds behind a clean
async/sync API. This fixes three bugs:

- Baselines now appear immediately when added (not only after filter change)
- Changing aggregation now re-plots traces immediately
- Text filter changes use cached test names instead of re-querying the API

Also: remove batched chart rendering (caused flicker on legend toggle),
remove dead setsEqual, deduplicate test-name discovery logic, parallelize
baseline fetches, fix doPlot/filter race condition.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
ldionne and others added 30 commits April 17, 2026 07:50
…>=/<=)

The query_trends() DB method used >= and <= for after_time/before_time
filters, while all other endpoints (/query, /runs, /machines/{name}/runs)
used > and <. Standardize on exclusive bounds across the API.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
…ndpoints

Every non-read-scoped API endpoint now has three auth tests: no-auth→401,
boundary-scope→403, and exact-required-scope→2xx. Existing tests that
checked a non-boundary scope (e.g. read→403 on a manage endpoint) are
updated to test the correct boundary (one level below required), since the
full hierarchy is already unit-tested in test_auth.py.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
…lick, multi-machine

UI changes (regression-detail.ts):
- Show regression title in page header (falls back to UUID)
- Notes field: display mode with Edit button, Ctrl/Cmd+Enter saves
- Enter key saves title and bug edits
- Select-all checkbox in indicator table with indeterminate state
- Shift+click range selection on indicator checkboxes (new checkbox-range.ts)
- Multi-machine selection in Add Indicators (checkbox list replaces combobox)
- Shift+click range on machine and test checkbox lists
- Preserve test selections when adding machines (prune unavailable only)
- Reset Add Indicators panel to clean state after adding
- Normalize bug URLs without protocol (ensureProtocol in utils.ts)

Other UI changes:
- Hide ordinal fallback in commit search dropdown
- Fix commit disappearing in create form (show selected value in input)

API change:
- Allow NULL regression title on create (remove auto-default)

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
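
The Shift+click range selection and the select-all checkbox state described above follow a common pattern, sketched here in Python. The PR implements it in `checkbox-range.ts`; the functions below are hypothetical stand-ins that model the checkboxes as a list of booleans.

```python
def apply_click(checked, index, last_index, shift):
    """Toggle the clicked box; with Shift, extend its new state over the range."""
    new_state = list(checked)
    new_state[index] = not checked[index]
    if shift and last_index is not None:
        lo, hi = sorted((last_index, index))
        for i in range(lo, hi + 1):
            new_state[i] = new_state[index]
    return new_state, index  # the clicked index becomes the next range anchor

def select_all_state(checked):
    """State of the select-all checkbox: checked, unchecked, or indeterminate."""
    if checked and all(checked):
        return "checked"
    if not any(checked):
        return "unchecked"
    return "indeterminate"

boxes = [False] * 5
boxes, anchor = apply_click(boxes, 1, None, shift=False)
boxes, anchor = apply_click(boxes, 4, anchor, shift=True)
```
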
Replace naive UTC datetimes with timezone-aware throughout the v5 stack:
- DB columns use DateTime(timezone=True) (TIMESTAMP WITH TIME ZONE)
- utcnow() returns aware datetime (no more .replace(tzinfo=None))
- parse_datetime() returns aware UTC (bare strings assumed UTC)
- API responses include Z suffix via new format_utc() helper
- Admin schema DateTime fields changed to String for consistency

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
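
A minimal sketch of the three helpers named above, assuming ISO 8601 input and second-level precision in responses. The function names come from the commit message, but the bodies are illustrative guesses, not the PR's code.

```python
from datetime import datetime, timezone

def utcnow() -> datetime:
    """Return the current time as a timezone-aware UTC datetime."""
    return datetime.now(timezone.utc)

def parse_datetime(text: str) -> datetime:
    """Parse an ISO 8601 string; bare strings without an offset are assumed UTC."""
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"  # older fromisoformat() versions reject "Z"
    dt = datetime.fromisoformat(text)
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc)

def format_utc(dt: datetime) -> str:
    """Format an aware datetime as UTC with a Z suffix (sub-second part dropped)."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

bare = parse_datetime("2026-04-06T12:28:00")
offset = parse_datetime("2026-04-06T14:28:00+02:00")
```

Because every value is aware, comparisons between a bare input and an offset input are well-defined instead of raising or silently comparing wall-clock times.
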
Avoid dirtying the DB session on every authenticated GET by only
updating api_key.last_used_at when the existing value is NULL or
older than 1 hour. This eliminates unnecessary UPDATE + COMMIT on
pure read requests.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
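
The throttling rule above reduces to a single predicate. A hedged Python sketch follows; the function name and constant are hypothetical, and timestamps are assumed to be timezone-aware.

```python
from datetime import datetime, timedelta, timezone

TOUCH_INTERVAL = timedelta(hours=1)

def should_touch(last_used_at, now):
    """Update last_used_at only if it is unset or older than the interval."""
    return last_used_at is None or (now - last_used_at) > TOUCH_INTERVAL

now = datetime(2026, 4, 17, 12, 0, tzinfo=timezone.utc)
```

A request handler would call this before assigning the attribute, so that read-only requests within the hour leave the session clean and trigger no UPDATE or COMMIT.
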
Replace 6 implementation plans and the submission guide (~8,800 lines)
with a concise implementation guide (~200 lines) covering only
cross-cutting conventions and patterns not already in the design docs.
Rename regressions-work-items.md to v5-todo.md.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Add dump_response() that pipes hand-built response dicts through
schema.dump() + validate(), catching serializer-schema drift at dev
time. Extra keys raise ValueError, missing required fields raise
ValidationError.

Also fixes two pre-existing mismatches:
- MachineRuns now uses a dedicated 3-field serializer instead of
  reusing serialize_run() (which produces 5 fields)
- IndicatorResponseSchema.machine/test now allow_none=True

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
…sible tests in indicators

Noise-hidden tests ("Hide noise" checkbox) are now fully removed from the
table DOM instead of being grayed out. Manually-hidden tests (click toggle)
remain grayed out. The two filters are independent.

Regression indicators from the "Add to Regression" panel now only include
tests currently visible in the table (respecting noise, manual, and text
filters).

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
The v5 SPA shell routes (/v5/...) rendered the data-testsuites attribute
server-side from the in-memory suite registry without calling ensure_fresh.
In a multi-worker gunicorn deployment, workers that did not handle the
suite creation request served stale pages with missing suites.

Fix: _setup_testsuite() now calls ensure_fresh() for V5DB instances.
v5_global() is refactored to use _setup_testsuite('') instead of
duplicating its logic.

Also fixes a latent race in _load_schemas_from_db: under READ COMMITTED,
reading schemas before the version counter could leave the cached version
ahead of the actual schemas after a concurrent modification, permanently
preventing reloads. The method now reads version first, schemas second,
builds into a new dict (atomic swap), and sets _schema_version last so
a failed rebuild leaves the version stale for retry.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
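
The read ordering in the race fix above can be sketched with a toy cache. `FakeDB` and the class layout below are assumptions made for illustration; only the names `ensure_fresh`, `_load_schemas_from_db`, and `_schema_version` come from the commit message.

```python
class FakeDB:
    """Stand-in for the database: a version counter plus schema rows."""

    def __init__(self):
        self.version = 1
        self.schemas = {"nts": {"metrics": ["exec_time"]}}
        self.fail_next = False

    def read_version(self):
        return self.version

    def read_schemas(self):
        if self.fail_next:
            self.fail_next = False
            raise RuntimeError("simulated failure")
        return dict(self.schemas)

class SchemaCache:
    def __init__(self, db):
        self._db = db
        self._schemas = {}
        self._schema_version = None

    def ensure_fresh(self):
        if self._db.read_version() != self._schema_version:
            self._load_schemas_from_db()

    def _load_schemas_from_db(self):
        version = self._db.read_version()  # 1. version first
        rows = self._db.read_schemas()     # 2. schemas second
        rebuilt = dict(rows)               # 3. build into a new dict
        self._schemas = rebuilt            #    swap in one assignment
        self._schema_version = version     # 4. version last

db = FakeDB()
cache = SchemaCache(db)
db.fail_next = True
try:
    cache.ensure_fresh()
except RuntimeError:
    pass
version_after_failure = cache._schema_version
cache.ensure_fresh()
```

If a concurrent change lands between steps 1 and 2, the cached version ends up behind the schemas actually read, so the next `ensure_fresh()` reloads. That costs one extra reload but never leaves the cache stuck, whereas the opposite ordering could.
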
The v5 UI displays commits in ~20 locations, but most endpoints (runs,
query, regressions) return bare commit strings without field values.
This makes it impossible to show display-field values (e.g. git SHA
instead of numeric revision) without N individual commit lookups.

Add a batch resolve endpoint that accepts a list of commit strings and
returns their summaries (ordinal + fields) as a dict keyed by commit
string.  The frontend can call this once per page to resolve display
values for all visible commits in a single round-trip.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
When a test suite schema defines a commit_field with display: true (e.g.
short_sha), the UI now shows that value instead of the raw commit string
wherever commits are displayed.  Falls back to the raw string when no
display field is configured or the field is not populated.

Foundation:
- api.ts: resolveCommits() calls POST /commits/resolve for batch
  lookups; getTestSuiteInfoCached() memoizes schema per suite
- utils.ts: resolveDisplayMap() wraps schema + resolve into a single
  async helper with graceful fallback (re-throws AbortError)

Pages with CommitSummary data (no extra API call needed):
- Test Suites Commits tab, commit-search callers in regression
  detail/list use commitDisplayValue() directly with cached schema

Pages with raw commit strings (batch resolve before render):
- Test Suites Recent Activity + Runs tabs, Machine Detail run table,
  Run Detail metadata, Regression List/Detail commit fields

Graph page:
- Scaffold retains CommitSummary.fields; scaffoldUnion computes display
  map lazily from current commitFields (avoids race with schema fetch)
- All Plotly coordinates (categoryarray, trace x, baseline x, regression
  overlays, scatter-on-hover) use display values so auto-tick-density
  works naturally; raw strings preserved in customdata for click handler
- Baseline chips and hover tooltips show display values

Compare page:
- selection.ts awaits schema in Promise.all (no race condition)
- combobox shows display values in dropdown items; filters match both
  raw and display strings; input.value keeps raw string for validation

Design docs updated: browsing.md, graph.md, compare.md specify display
field behavior and fallback to raw commit strings.

Tests: 22 new tests covering resolveCommits, getTestSuiteInfoCached,
resolveDisplayMap, chart displayMap, combobox displayMap, and
scaffoldUnion with commitFields.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
The Dashboard sparklines were using Run.submitted_at for both filtering
(30d/90d/1y presets) and x-axis positioning, which is incorrect since
submission date has no meaningful relationship to the commit being tested.

Replace after_time/before_time on POST /trends with a last_n parameter
that returns the most recent N commits by ordinal. Commits without
ordinals are now excluded from trends results. The frontend uses
"Last 100 / Last 500 / Last 1000" presets (default 500) and plots
sparklines with evenly-spaced sequential x-indices mapped from ordinals,
ensuring traces from different machines align at the same commits.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
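
To illustrate how ordinals could be mapped to evenly spaced x-indices so that traces from different machines line up, here is a small Python sketch. The data shapes and function names are assumptions; in the PR the `last_n` selection happens server-side and the plotting in TypeScript.

```python
def last_n_ordinals(ordinals, n):
    """Most recent n distinct ordinals in ascending order; None values are excluded."""
    distinct = sorted({o for o in ordinals if o is not None})
    return distinct[-n:] if n > 0 else []

def align_traces(traces, n):
    """Map each machine's (ordinal, value) points onto shared sequential x-indices."""
    all_ordinals = [o for points in traces.values() for o, _ in points]
    x_index = {o: i for i, o in enumerate(last_n_ordinals(all_ordinals, n))}
    return {
        machine: [(x_index[o], v) for o, v in points if o in x_index]
        for machine, points in traces.items()
    }

traces = {
    "m1": [(10, 1.0), (30, 1.2)],
    "m2": [(20, 2.0), (30, 2.1), (None, 9.9)],
}
aligned = align_traces(traces, n=2)
```

Both machines share x-index 1 for ordinal 30 even though they ran on different subsets of commits, and the point without an ordinal is dropped.
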
v5 was originally developed as a UI overlay that coexisted with v4 in the
same running instance. Now that v5 is a fully disjoint greenfield project,
remove the v5 frontend blueprint and v5 REST API registration from the v4
code path. Also remove the "v5 UI" link from the v4 layout template and
update design/implementation docs to reflect the new architecture.

Consolidate test_spa_shell.py and test_spa_shell_v5only.py into a single
test file running against a v5-only instance.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
The v4 UI button linked to the v4 interface, but since v4 and v5 are now
fully disjoint it always resolved to an empty URL. Remove the entire v4 URL
plumbing chain: _v4_url() helper, data-v4-url template attribute, v4Url in
the TypeScript nav component, and related tests. Also simplify _v5_url_base()
to just return request.script_root (the url_for fallback was dead code).

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Replace the per-test get_or_create_test loop in _parse_tests_data with a
batch approach using SELECT ... WHERE name IN (...) + INSERT ... ON CONFLICT
DO NOTHING. This reduces DB round-trips from 7,500+ per submission to 1-3,
regardless of test count.

Remove the now-unused single-test get_or_create_test method and rename the
batch method to get_or_create_tests (plural).

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
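
The batch pattern above (SELECT ... WHERE name IN (...), then INSERT ... ON CONFLICT DO NOTHING) can be demonstrated with the standard-library `sqlite3` module. This is a sketch only: the PR's database layer is not SQLite, the table layout here is invented, and a real implementation would chunk very large name lists to respect bind-parameter limits.

```python
import sqlite3

def _select_ids(conn, names):
    placeholders = ",".join("?" * len(names))
    sql = f"SELECT name, id FROM test WHERE name IN ({placeholders})"
    return dict(conn.execute(sql, names).fetchall())

def get_or_create_tests(conn, names):
    """Return {name: id} for all names using at most three statements."""
    names = list(dict.fromkeys(names))  # de-duplicate, keep order
    if not names:
        return {}
    found = _select_ids(conn, names)                 # 1. look up existing rows
    missing = [n for n in names if n not in found]
    if missing:
        values = ",".join(["(?)"] * len(missing))
        conn.execute(                                # 2. one multi-row insert
            f"INSERT INTO test(name) VALUES {values} ON CONFLICT(name) DO NOTHING",
            missing,
        )
        found.update(_select_ids(conn, missing))     # 3. fetch the new ids
    return found

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL)")
first = get_or_create_tests(conn, ["a", "b"])
second = get_or_create_tests(conn, ["b", "c", "c"])
```

The ON CONFLICT clause matters when two submissions race: a name inserted by another transaction between steps 1 and 2 is skipped instead of raising, and step 3 still picks up its id.
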
Replace ORM-based create_samples (one INSERT per row via add_all + flush)
with pg_insert().values() bulk INSERT. This reduces 7,500 individual
INSERT statements to 1-3 multi-row INSERTs per submission.

Also validates the union of all sample metric keys (not just the first
sample) and correctly handles heterogeneous metric subsets across tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
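
Validating the union of metric keys and producing uniform rows for a multi-row INSERT could look like the following. The input shape and the function name are assumptions; the PR passes its rows to SQLAlchemy's `pg_insert().values()`, which this stdlib-only sketch does not attempt to reproduce.

```python
def build_sample_rows(run_id, samples, known_metrics):
    """Validate all metric keys, then emit rows that share one column set."""
    used = set()
    for sample in samples:
        used.update(sample["metrics"])  # union across every sample, not just the first
    unknown = used - set(known_metrics)
    if unknown:
        raise ValueError(f"unknown metrics: {sorted(unknown)}")
    columns = sorted(used)
    rows = [
        {
            "run_id": run_id,
            "test_id": sample["test_id"],
            # Metrics a test did not report become NULL (None).
            **{m: sample["metrics"].get(m) for m in columns},
        }
        for sample in samples
    ]
    return columns, rows

samples = [
    {"test_id": 1, "metrics": {"exec_time": 1.5}},
    {"test_id": 2, "metrics": {"compile_time": 0.3, "exec_time": 2.0}},
]
columns, rows = build_sample_rows(7, samples, ["exec_time", "compile_time", "size"])
```
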
The existing (run_id, test_id) index cannot serve queries that filter on
test_id first — the dominant access pattern for time-series queries. Add
a complementary (test_id, run_id) index to give these queries an index
range scan instead of a full table scan on the 100M+ row Sample table.

For existing deployments, create the index manually with:
  CREATE INDEX CONCURRENTLY ix_<suite>_Sample_test_id_run_id
      ON "<suite>_Sample" (test_id, run_id);

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Replace the two-step fetch pattern (all commits + machine runs for
client-side Set filtering) with a single GET /commits?machine={name}
call. The server-side filter already existed but the frontend wasn't
using it. This eliminates fetching thousands of commits only to discard
most of them, and fixes a silent data-loss bug where machines with >500
runs had an incomplete commit set (fetchMachineCommitSet only fetched
one page of 500 runs).

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Enable gzip/brotli/zstd compression for v5 API responses. A 3 MB query
response compresses to ~450 KB, reducing bandwidth by 80-90%.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
… DB model

Complete profiles implementation for LNT v5:

- DB: Profile model with deferred LargeBinary, binary format parser (ProfileV2),
  submission integration with base64 decoding and 50 MB size limit
- API: UUID-based endpoints for profile listing, metadata, functions, and
  per-function disassembly with instruction-level counters
- UI: Profiles page with A/B picker (cascading suite/machine/commit/run/test
  selectors), stats bar, function selector sorted by hotness, and straight-line
  disassembly viewer with heat-map coloring
- UI: Profile links on Compare page (both-sides and single-side) and Run Detail
  page (per-test)
- Commit and run dropdowns filter to only show entries with profile data
- Design docs, implementation guide, and TODO list updated

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Add a built-in `tag` column (String(256), nullable, partial index) to
every suite's Commit table, restoring a v4 feature that was dropped in
the initial v5 design. Tags are human-readable labels (e.g. "release-18")
set exclusively via PATCH /commits/{value}, searchable via ?search=, and
displayed as `<display_value> (tag)` throughout the UI.

DB: tag column with partial index (WHERE tag IS NOT NULL), update_commit()
gains tag/clear_tag kwargs, list_commits() and API search include tag in
OR clause, query_time_series/query_trends include tag in results.

API: tag field on CommitSummary/Detail/Neighbor/Update schemas, query and
trends schemas. PATCH handler supports set/clear. Length(max=256) validation.

UI: commitDisplayValue() refactored to accept a CommitSummary-like object
(cleaner than 4 positional params). Tag appended as "(tag)" when present.
Inline tag editing on commit detail page. Tag column in commits table.
CSS class renamed from ordinal-display to editable-field with data-field
attributes for robust test selectors.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
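
The display rule described above (display-field value if present, otherwise the raw commit string, with the tag appended in parentheses) might be sketched like this. The dictionary shape is an assumption modelled on the CommitSummary fields mentioned in the commit messages; the PR's `commitDisplayValue()` is TypeScript.

```python
def commit_display_value(commit, display_field=None):
    """Format a commit for display, falling back to the raw commit string."""
    value = None
    if display_field:
        value = (commit.get("fields") or {}).get(display_field)
    text = value if value else commit["commit"]
    tag = commit.get("tag")
    return f"{text} ({tag})" if tag else text

tagged = {
    "commit": "0123456789abcdef",
    "fields": {"short_sha": "0123456"},
    "tag": "release-18",
}
untagged = {"commit": "fedcba9876543210", "fields": {}, "tag": None}
```
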
Replace the single %-based noise threshold with three independent noise
filtering knobs: Delta % below, P-value above (Welch's t-test), and
Absolute below (measurement floor). A test is classified as noise if it
fails any enabled knob. All knobs default to disabled.

Design:
- Collapsible "Noise filtering" panel (collapsed by default) expands
  downward as a floating overlay without shifting other controls
- Each knob has an enable checkbox, help tooltip, and validated input
- "Hide noise" checkbox remains outside the collapsible, always visible
- Noise rows distinguished by grey Status cell text (no row-level opacity)
- Tooltip on noise Status cells lists all triggered knobs with values

Implementation:
- types.ts: NoiseConfig, NoiseKnob, NoiseReason types; AppState and
  ComparisonRow updated
- comparison.ts: Welch's t-test (regularized incomplete beta via Lentz's
  continued fraction), refactored aggregation via groupSamplesByTest to
  share raw sample data, multi-knob noise classification with per-knob
  reason messages, delta=0 always-noise guard
- state.ts: noiseConfig with setNoiseConfig helper, URL encode/decode for
  6 params (all _on params absent when disabled, present as '1' when
  enabled), legacy ?noise migration
- compare.ts: cached raw samples and aggregated maps so noise-only changes
  skip re-aggregation via reclassifyFromCache fast path
- selection.ts: collapsible details/summary with three knob rows
- table.ts: noise tooltip on Status cells
- chart.ts: noise band conditional on Delta % knob being enabled

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
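
For readers unfamiliar with the p-value computation named above, here is a compact Python sketch of Welch's t-test with the regularized incomplete beta function evaluated by a Lentz-style continued fraction. It follows the standard textbook formulation and is not a port of the PR's `comparison.ts`; all function names are hypothetical.

```python
import math
from statistics import mean, variance

def _betacf(a, b, x, max_iter=200, eps=3e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (
            m * (b - m) * x / ((qam + m2) * (a + m2)),            # even step
            -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2)),   # odd step
        ):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def _betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    bt = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                  + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b

def t_two_sided_p(t, df):
    """Two-sided p-value of Student's t with df degrees of freedom."""
    return _betai(df / 2.0, 0.5, df / (df + t * t))

def welch_p_value(xs, ys):
    """Welch's unequal-variance t-test; None if either side has < 2 samples."""
    if len(xs) < 2 or len(ys) < 2:
        return None
    vx, vy = variance(xs) / len(xs), variance(ys) / len(ys)
    se2 = vx + vy
    if se2 == 0.0:  # both sides constant: identical means or a certain difference
        return 1.0 if mean(xs) == mean(ys) else 0.0
    t = (mean(xs) - mean(ys)) / math.sqrt(se2)
    df = se2 ** 2 / (vx ** 2 / (len(xs) - 1) + vy ** 2 / (len(ys) - 1))
    return t_two_sided_p(t, df)
```

The degrees of freedom use the Welch-Satterthwaite approximation, which is generally non-integer, so a closed-form t table is not enough and the incomplete beta function is needed.
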
Add a horizontal summary bar between the chart and table on the Compare
page showing count and percentage of tests in each status category:
Improved, Regressed, Noise, Unchanged, Only in A, Only in B, N/A.

The summary bar respects text filter and chart zoom but always shows
noise counts even when "Hide noise" is active. Zero-count categories
are shown with reduced opacity.

Also revises the classification of delta=0 tests: when no noise knobs
are enabled, these are now classified as "unchanged" instead of "noise",
making the distinction meaningful. When a noise knob catches delta=0
(e.g. the pct knob with any threshold >= 0), it remains "noise" with
proper noise reasons.

Extracts shared STATUS_COLORS constant into utils.ts for chart and
summary bar color consistency.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
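
The revised classification above can be sketched as a small decision function plus a counting step for the summary bar. Several details are assumptions made for illustration: lower values are treated as better, the Absolute knob is applied to the absolute delta, the knob keys (`pct`, `pval`, `abs`) are invented, and the N/A category is left out.

```python
from collections import Counter

def classify(a, b, p_value=None, knobs=None):
    """Return (status, noise_reasons) for side-A value a and side-B value b."""
    knobs = knobs or {}  # only enabled knobs appear, mapped to their thresholds
    if a is None or b is None:
        return ("only_in_b" if b is not None else "only_in_a"), []
    delta = b - a
    pct = abs(delta / a) * 100.0 if a != 0 else None
    reasons = []
    if "pct" in knobs and (delta == 0 or (pct is not None and pct < knobs["pct"])):
        reasons.append("pct")
    if "pval" in knobs and p_value is not None and p_value > knobs["pval"]:
        reasons.append("pval")
    if "abs" in knobs and abs(delta) < knobs["abs"]:
        reasons.append("abs")
    if reasons:
        return "noise", reasons  # failing any enabled knob is enough
    if delta == 0:
        return "unchanged", []   # no knob caught it, so it is not noise
    return ("improved" if delta < 0 else "regressed"), []

def summarize(statuses):
    """Count and percentage per status, as a summary bar would show."""
    total = len(statuses)
    return {s: (n, 100.0 * n / total) for s, n in Counter(statuses).items()}

summary = summarize(["noise", "noise", "improved", "unchanged"])
```
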
Remove 16 completed items (13 previously checked off + 3 newly confirmed
done: sample aggregation fix, p-value graying via noise knobs, and API
key copy button).

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
The commit picker input field was showing the raw commit string (full SHA)
after selecting from the dropdown, even though the dropdown items correctly
showed the display value (e.g. short SHA + tag). Fix by using the already-
computed displayText instead of the raw value in the click handler.

Also resolves display values when restoring state from URL or after machine
selection loads commits. Refactors module-level commit references to store
the full CommitPickerHandle instead of raw HTMLInputElement, so the machine
combobox uses picker.setValue() instead of manually resolving display values.

Fixes a race condition on page reload where fetchCommitsForMachine could
resolve before fetchSideData, leaving commitFieldsCache empty. Adds
refreshCommitDisplay() to re-resolve the display after the cache is
populated.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Compare page: chart zoom filter leak, "View on Graph" link per test row,
"Add to Existing" dropdown not collapsing on selection.

Regression detail page (new section): indicator search/filter, indicator
summary count, duplicate schema fetch.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>