From dad184e455e7b7b43f9cf8d025e22dd76283eae9 Mon Sep 17 00:00:00 2001 From: Claude Date: Tue, 16 Jun 2026 03:10:55 +0000 Subject: [PATCH] docs(scaling-demo): seed delta log with recon-pass deltas Records the verification pass against live main (60b727f): NEXT-STEPS.md is present (dependency satisfied), the gold schema is expectAnswerMode not must-* tags, the keyless headline needs committed query vectors, the int8 path reuses retrieve/no-leak/store without forking, and README fix 2.1 (the "eight lines" claim) has not landed. Reconnaissance only; no demo code, corpus, harness, or NEXT-STEPS edits. https://claude.ai/code/session_01CQQe5VjjDgpCj7hYcoVv8Y --- docs/scaling-demo/scaling-demo-delta-log.md | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) diff --git a/docs/scaling-demo/scaling-demo-delta-log.md b/docs/scaling-demo/scaling-demo-delta-log.md index 53dbffb..0e40d56 100644 --- a/docs/scaling-demo/scaling-demo-delta-log.md +++ b/docs/scaling-demo/scaling-demo-delta-log.md @@ -27,6 +27,22 @@ For each assumption the spec makes, record what the build actually did and what | 7 | FP vectors commit cleanly and the default run reproduces with no key | _fill: yes / issue_ | `spec` | Confirms the no-key headline claim | | 8 | Demo is a thin module: reuses `src/retrieve.ts` + `src/no-leak.ts` untouched, no second pipeline | _fill: stayed thin / needed more_ | `spec`; **halt if it needs its own pipeline** | If it can't stay thin, propose a sibling repo per the budget rule — do not bloat | +## Recon-pass deltas (verification against `60b727f`, 2026-06-16) + +Seeded before any build code, from the reconnaissance pass. The spec's "Confirmed against the live repo" block was reconciled 2026-06-15; this records where the live repo at `60b727f` (now `origin/main`, identical to the build branch) already diverged from the spec's assumptions, or confirmed a doubted point. + +| # | Spec assumption / open question | What the repo actually shows | Touches | Downstream action | +|---|---|---|---|---| +| R1 | `NEXT-STEPS.md` "not yet on `main`; this ticket waits on it" (spec header) | Present on `origin/main` at `60b727f`, alongside `CONTRIBUTING.md` and `docs/production-scaling.md`. The dependency is satisfied; nothing waits on it | `NEXT-STEPS` | None to wait on; the demo links into it at merge (see R2) | +| R2 | "Linked from `NEXT-STEPS.md` §C1 ('a runnable miniature ships at `scaling/`')" | §C1 is titled "More aggressive quantization" (int8 to int4/PQ/binary). No "runnable miniature" sentence or `scaling/` link exists yet. §C1 is the right home (its int4 paragraph is the demo's deliberate-failure lever); the link does not pre-exist | `NEXT-STEPS` | Demo ADDS the §C1 link and the C-intro core-vs-miniature carve-out at merge. Not a precondition | +| R3 | Brief's fix 2.1 "is correcting the README's 'eight lines'; don't reintroduce a number" | `README.md:91-92` still reads "`toRoutingHint` is eight lines", wrapped in em-dashes. Fix 2.1 has NOT landed (fix 2.4, em-dash thinning of `production-scaling.md`, HAS: 0 em-dashes there) | `nothing` (core-doc seam owned by the brief) | Do not restate any line count in demo prose. Flag to author that 2.1 is still pending; do not fix it here (additive-only mandate) | +| R4 | "Same three-mode shape ... `must-answer` / `must-refuse` / `must-route`" (spec §3) | Gold schema has no `must-*` tags. It is `expectAnswerMode` (one of supported/partial/related-material/not-found, required) plus `expectSources`/`forbidSources`/`forbidRecordCitations`/`forbidAnswerPatterns` (`src/evaluate.ts` `loadGold`). The three "modes" are a conceptual grouping | `spec` | `scaling/gold.yaml` uses the real schema; keep the spec's must-* language as conceptual only | +| R5 | "Default run reproducible without a key ... runs the full gold suite (all three modes) on the quantized index" (spec §5) | `src/cli/eval.ts` embeds gold queries LIVE (`OPENAI_API_KEY` required, lines 169-171). Keyless = `judgeRetrieval` (`expectSources`/`forbidSources`, pure). Answer-mode label needs `--full` -> `judgeAnswer` -> the answer model (key). The structural `not-found` short-circuit (`answer.ts:174`) is the one keyless mode verdict | `spec` (§5) | Demo commits GOLD QUERY vectors too, not only corpus FP vectors. Headline (keyless) = rank-corr + `judgeRetrieval` over committed vectors; `--full` answer-mode pass is the optional key-gated tier. State the two-tier gate precisely. Refuse cases must use `forbidSources` (a near-floor source named) to bite keyless | +| R6 | "Reuse `src/retrieve.ts` + `src/no-leak.ts` untouched, no second pipeline" (budget rule) | Reusable as parameterized pure functions: `readIndexFile(path)` / `writeIndexFile(entries, path)` accept a path (`src/store.ts:40,67`); `buildCorpus(config)` / `buildPrivateNotes(config)` take a config; `retrieve()`, `assembleEvidence`/`toRoutingHint`, `judgeRetrieval`/`judgeAnswer`/`loadGold`, `answerQuestion` are pure/parameterized. The demo points them at `scaling/` data with no fork | `nothing` (confirms pre-seeded row 8) | Build the thin module; no sibling repo needed. Budget rule satisfied at recon | +| R7 | "`production-scaling.md` did not appear in the top-level listing; confirm where it lives" (spec §7) | Lives at `docs/production-scaling.md` (already cross-linked from `README.md:273`). §2 is the int8 prose this demo runs. Em-dashes: 0 (fix 2.4 landed) | `spec` (§7) | Cross-link `docs/production-scaling.md` <-> `scaling/README.md`; do not duplicate its argument | +| R8 | Boosts/floor/wire-format constants | Confirmed verbatim in `src/retrieve.ts` AND `NEXT-STEPS.md` B1: `EXACT_MATCH_BOOST = 0.3`, `THEME_BOOST = 0.15`, `SCORE_FLOOR = 0.2`. `INDEX_SCHEMA_VERSION = 2` (`src/store.ts:18`): the FP wire format is already versioned | `nothing` | Reuse the constants. The demo's int8 wire format needs its OWN version stamp (spec §5: version the wire so a code/data mismatch fails loudly) | +| R9 | Pipeline chain (spec confirmed block) | `validateAnswer` -> `repairCitationsToEvidence` (re-derives mode via `deriveMode`, `answer.ts:123`) -> `assertCitationsGroundedInEvidence` is exact (`answer.ts:196-198`). Four modes live (`types.ts:83`). `no-leak.ts` boundary intact (`RoutingHint` has no text field). Default models `text-embedding-3-large` / `gpt-4o-mini`. Index `artifacts/index.json` gitignored. All frontmatter fields confirmed in `corpus.ts` | `nothing` | None; confirmed as assumed | + ## Open-ended rows (add as testing surfaces them) | # | Spec assumption | What the build actually did | Touches | Downstream action |