Add Auto Review proof metrics and dogfood diagnostics #330

## Summary

Add proof instrumentation so the durable Auto Review concept can be evaluated with data, not vibes.

## Scope

- Emit structured counters/events for run lifecycle, duplicate reuse/skips, supersede/cancel reasons, findings surfaced/inspected/applied/dismissed, ledger tokens, detail tokens, and token-spend estimates.
- Add a diagnostic surface such as `/review-stats` if useful.
- Build deterministic scanners or fixtures for stale/superseded/duplicate/fix-train signatures from logs/rollouts where appropriate.
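
As a rough illustration of the first two bullets, the sketch below shows one way structured counters and a bounded one-line summary could fit together. Every name here (`ReviewEvent`, `ReviewCounters`, `duplicate_rate`, `summary_line`) is hypothetical and not taken from `code-core`, and the duplicate-rate definition (avoided runs divided by requested runs) is an assumption, not a definition agreed in this issue.

```rust
use std::collections::BTreeMap;

/// Hypothetical event kinds mirroring the Scope bullets (not the real code-core types).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum ReviewEvent {
    RunStarted,
    DuplicateReused,
    DuplicateSkipped,
    Superseded,
    Cancelled,
    FindingSurfaced,
    FindingInspected,
    FindingApplied,
    FindingDismissed,
}

/// In-memory counters; a real implementation would persist these in the review store.
#[derive(Debug, Default)]
pub struct ReviewCounters {
    counts: BTreeMap<ReviewEvent, u64>,
    ledger_tokens: u64,
    detail_tokens: u64,
}

impl ReviewCounters {
    /// Increment the counter for one event.
    pub fn record(&mut self, event: ReviewEvent) {
        *self.counts.entry(event).or_insert(0) += 1;
    }

    /// Accumulate token usage, saturating instead of overflowing.
    pub fn add_tokens(&mut self, ledger: u64, detail: u64) {
        self.ledger_tokens = self.ledger_tokens.saturating_add(ledger);
        self.detail_tokens = self.detail_tokens.saturating_add(detail);
    }

    /// Current count for an event (0 if never recorded).
    pub fn count(&self, event: ReviewEvent) -> u64 {
        self.counts.get(&event).copied().unwrap_or(0)
    }

    /// Assumed definition: avoided runs / (started runs + avoided runs).
    /// Returns None when no run was requested at all.
    pub fn duplicate_rate(&self) -> Option<f64> {
        let avoided =
            self.count(ReviewEvent::DuplicateReused) + self.count(ReviewEvent::DuplicateSkipped);
        let requested = self.count(ReviewEvent::RunStarted) + avoided;
        if requested == 0 {
            None
        } else {
            Some(avoided as f64 / requested as f64)
        }
    }

    /// Fixed-shape single line, suitable for a diagnostic surface such as `/review-stats`.
    pub fn summary_line(&self) -> String {
        format!(
            "runs={} dup_reused={} dup_skipped={} superseded={} cancelled={} ledger_tokens={} detail_tokens={}",
            self.count(ReviewEvent::RunStarted),
            self.count(ReviewEvent::DuplicateReused),
            self.count(ReviewEvent::DuplicateSkipped),
            self.count(ReviewEvent::Superseded),
            self.count(ReviewEvent::Cancelled),
            self.ledger_tokens,
            self.detail_tokens,
        )
    }
}
```

Keeping the summary to a fixed set of fields means its size does not grow with the number of runs, which fits the requirement that diagnostics stay out of normal assistant context.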

## Acceptance Criteria

- [ ] Metrics include duplicate review rate, skipped/adopted/superseded/cancelled counts, unsurfaced terminal findings, ledger overhead, avoided token estimate, time to surface findings, and finding usefulness/disposition.
- [ ] Each Auto Review run records enough proof data to explain latency: model, reasoning effort, resolve model/effort, phase timing, follow-up count, token count when available, prompt token estimate, and terminal reason.
- [ ] Restart recovery and duplicate avoidance are testable without a live TUI where possible.
- [ ] Dogfood diagnostics can compare before/after behavior across real sessions and identify whether slowness came from first review pass, follow-up loops, worktree/lock contention, retries, or prompt bloat.
- [ ] Metrics do not inject bulky telemetry into normal assistant context; ordinary turns receive only bounded actionable review state.
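
The per-run proof data and the slowness attribution described above could be modelled roughly as follows. This is a minimal sketch under stated assumptions: `RunProof`, `SlownessSource`, the field names, and the "largest non-zero phase wins" rule are all hypothetical, and prompt bloat is treated as a separate token-budget check rather than a timed phase.

```rust
use std::time::Duration;

/// Hypothetical slowness buckets, named after the acceptance criteria.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SlownessSource {
    FirstReviewPass,
    FollowUpLoops,
    LockContention,
    Retries,
}

/// Hypothetical per-run proof record; field names are illustrative only.
#[derive(Debug, Clone)]
pub struct RunProof {
    pub model: String,
    pub reasoning_effort: String,
    pub first_pass: Duration,
    pub follow_ups: Duration,
    pub lock_wait: Duration,
    pub retries: Duration,
    pub follow_up_count: u32,
    /// Actual token count, present only when the provider reports it.
    pub total_tokens: Option<u64>,
    pub prompt_token_estimate: u64,
    pub terminal_reason: String,
}

impl RunProof {
    /// Return the phase that consumed the most wall-clock time, if any phase ran.
    pub fn dominant_phase(&self) -> Option<(SlownessSource, Duration)> {
        let phases = [
            (SlownessSource::FirstReviewPass, self.first_pass),
            (SlownessSource::FollowUpLoops, self.follow_ups),
            (SlownessSource::LockContention, self.lock_wait),
            (SlownessSource::Retries, self.retries),
        ];
        phases
            .iter()
            .copied()
            .filter(|(_, elapsed)| !elapsed.is_zero())
            .max_by_key(|(_, elapsed)| *elapsed)
    }

    /// Flag prompt bloat against a caller-supplied budget (the budget value is an assumption).
    pub fn prompt_over_budget(&self, budget: u64) -> bool {
        self.prompt_token_estimate > budget
    }
}
```

Because the record is plain data, before/after comparisons across sessions can run in ordinary unit tests without a live TUI.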

## Relationships

Parent: #324
Depends on: #325, #327, #329
Related: #43, #50

## Finish Line

Every Code emits enough Auto Review metrics and diagnostics to prove duplicate review reduction, avoided token spend, surfaced findings, ledger overhead, restart recovery, and finding usefulness during dogfooding.

## Current Status

State: Scoped proof-metrics implementation merged.
Merged PR: #381 feat(auto-review): add proof metrics to compact ledger
Merge commit: `5bb9fbf704aa2968bac1e44f328e8f4b3d0c458c`
Branch: `fix/auto-review-proof-metrics` (remote branch deleted after merge).
Next action: #331 Auto Review lifecycle docs can proceed against the merged durable Auto Review behavior. Broader prompt/context/token-budget/request-shape accounting remains gated by #92 and should not be started while the token-count refactor is active.
Blocked by: None for #331 docs. Broader prompt/context/token metrics remain blocked by #92.
Last verified: 2026-06-05 after focused tests, required build, PR CI, Claude review, and merge.

Completed in #381:
- Compact Auto Review diagnostics now count duplicate-skipped runs, skipped runs without saved tokens, superseded clean duplicates, failed/cancelled/lost terminal proof outcomes, saved token estimates, and existing prompt/token/timing signals.
- Duplicate/superseded/cancelled/lost proof stays in compact diagnostics without surfacing bulky run details.
- Focused review-store tests cover duplicate proof, superseded proof, terminal outcome counts, old proof-run omission, and active-run plus dedupe proof combinations.

Review and validation:
- Claude Sonnet final review found no correctness bugs and marked the PR green for merge.
- Focused tests: `cargo test -p code-core compact_ledger --lib`
- Required build: `./build-fast.sh`
- PR #381 GitHub check passed before merge.

Residual scope:
- This PR does not implement broader prompt/context budget enforcement or request-shape accounting; those remain behind #92.
- Future polish noted by review: add symmetric tests for clean Failed/Lost omission and Cancelled/Lost with error detail if those paths are touched again.

