16 changes: 16 additions & 0 deletions docs/scaling-demo/scaling-demo-delta-log.md
@@ -27,6 +27,22 @@ For each assumption the spec makes, record what the build actually did and what
| 7 | FP vectors commit cleanly and the default run reproduces with no key | _fill: yes / issue_ | `spec` | Confirms the no-key headline claim |
| 8 | Demo is a thin module: reuses `src/retrieve.ts` + `src/no-leak.ts` untouched, no second pipeline | _fill: stayed thin / needed more_ | `spec`; **halt if it needs its own pipeline** | If it can't stay thin, propose a sibling repo per the budget rule — do not bloat |

## Recon-pass deltas (verification against `60b727f`, 2026-06-16)

Seeded before any build code, from the reconnaissance pass. The spec's "Confirmed against the live repo" block was reconciled 2026-06-15; this section records where the live repo at `60b727f` (now `origin/main`, identical to the build branch) had already diverged from the spec's assumptions, or where it confirmed a doubted point.

| # | Spec assumption / open question | What the repo actually shows | Touches | Downstream action |
|---|---|---|---|---|
| R1 | `NEXT-STEPS.md` "not yet on `main`; this ticket waits on it" (spec header) | Present on `origin/main` at `60b727f`, alongside `CONTRIBUTING.md` and `docs/production-scaling.md`. The dependency is satisfied; nothing waits on it | `NEXT-STEPS` | None to wait on; the demo links into it at merge (see R2) |
| R2 | "Linked from `NEXT-STEPS.md` §C1 ('a runnable miniature ships at `scaling/`')" | §C1 is titled "More aggressive quantization" (int8 to int4/PQ/binary). No "runnable miniature" sentence or `scaling/` link exists yet. §C1 is the right home (its int4 paragraph is the demo's deliberate-failure lever); the link does not pre-exist | `NEXT-STEPS` | Demo ADDS the §C1 link and the C-intro core-vs-miniature carve-out at merge. Not a precondition |
| R3 | Brief's fix 2.1 "is correcting the README's 'eight lines'; don't reintroduce a number" | `README.md:91-92` still reads "`toRoutingHint` is eight lines", wrapped in em-dashes. Fix 2.1 has NOT landed (fix 2.4, em-dash thinning of `production-scaling.md`, HAS: 0 em-dashes there) | `nothing` (core-doc seam owned by the brief) | Do not restate any line count in demo prose. Flag to author that 2.1 is still pending; do not fix it here (additive-only mandate) |
| R4 | "Same three-mode shape ... `must-answer` / `must-refuse` / `must-route`" (spec §3) | Gold schema has no `must-*` tags. It is `expectAnswerMode` (one of supported/partial/related-material/not-found, required) plus `expectSources`/`forbidSources`/`forbidRecordCitations`/`forbidAnswerPatterns` (`src/evaluate.ts` `loadGold`). The three "modes" are a conceptual grouping | `spec` | `scaling/gold.yaml` uses the real schema; keep the spec's must-* language as conceptual only |
| R5 | "Default run reproducible without a key ... runs the full gold suite (all three modes) on the quantized index" (spec §5) | `src/cli/eval.ts` embeds gold queries LIVE (`OPENAI_API_KEY` required, lines 169-171). Keyless = `judgeRetrieval` (`expectSources`/`forbidSources`, pure). Answer-mode label needs `--full` -> `judgeAnswer` -> the answer model (key). The structural `not-found` short-circuit (`answer.ts:174`) is the one keyless mode verdict | `spec` (§5) | Demo commits GOLD QUERY vectors too, not only corpus FP vectors. Headline (keyless) = rank-corr + `judgeRetrieval` over committed vectors; `--full` answer-mode pass is the optional key-gated tier. State the two-tier gate precisely. Refuse cases must name a near-floor source in `forbidSources` so they can fail in the keyless tier |
| R6 | "Reuse `src/retrieve.ts` + `src/no-leak.ts` untouched, no second pipeline" (budget rule) | Reusable as parameterized pure functions: `readIndexFile(path)` / `writeIndexFile(entries, path)` accept a path (`src/store.ts:40,67`); `buildCorpus(config)` / `buildPrivateNotes(config)` take a config; `retrieve()`, `assembleEvidence`/`toRoutingHint`, `judgeRetrieval`/`judgeAnswer`/`loadGold`, `answerQuestion` are pure/parameterized. The demo points them at `scaling/` data with no fork | `nothing` (confirms pre-seeded row 8) | Build the thin module; no sibling repo needed. Budget rule satisfied at recon |
| R7 | "`production-scaling.md` did not appear in the top-level listing; confirm where it lives" (spec §7) | Lives at `docs/production-scaling.md` (already cross-linked from `README.md:273`). §2 is the int8 prose this demo runs. Em-dashes: 0 (fix 2.4 landed) | `spec` (§7) | Cross-link `docs/production-scaling.md` <-> `scaling/README.md`; do not duplicate its argument |
| R8 | Boosts/floor/wire-format constants | Confirmed verbatim in `src/retrieve.ts` AND `NEXT-STEPS.md` B1: `EXACT_MATCH_BOOST = 0.3`, `THEME_BOOST = 0.15`, `SCORE_FLOOR = 0.2`. `INDEX_SCHEMA_VERSION = 2` (`src/store.ts:18`): the FP wire format is already versioned | `nothing` | Reuse the constants. The demo's int8 wire format needs its OWN version stamp (spec §5: version the wire so a code/data mismatch fails loudly) |
| R9 | Pipeline chain (spec confirmed block) | `validateAnswer` -> `repairCitationsToEvidence` (re-derives mode via `deriveMode`, `answer.ts:123`) -> `assertCitationsGroundedInEvidence` is exact (`answer.ts:196-198`). Four modes live (`types.ts:83`). `no-leak.ts` boundary intact (`RoutingHint` has no text field). Default models `text-embedding-3-large` / `gpt-4o-mini`. Index `artifacts/index.json` gitignored. All frontmatter fields confirmed in `corpus.ts` | `nothing` | None; confirmed as assumed |
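
Row R8 asks for the int8 wire format to carry its own version stamp so that a code/data mismatch fails loudly. A minimal sketch of that idea follows. Every name here (`INT8_WIRE_VERSION`, `Int8Entry`, `Int8IndexFile`, `quantize`, `dequantize`, `parseInt8Index`) is hypothetical, and the per-vector symmetric scaling is an assumption for illustration, not the scheme the spec or `src/store.ts` defines.

```typescript
// Hypothetical int8 wire format, versioned separately from INDEX_SCHEMA_VERSION.
// The value 1 and all names below are assumptions for illustration.
const INT8_WIRE_VERSION = 1;

interface Int8Entry {
  id: string;
  scale: number;    // per-vector scale: max |x| / 127 (assumed scheme)
  values: number[]; // integers in [-127, 127]
}

interface Int8IndexFile {
  wireVersion: number;
  entries: Int8Entry[];
}

// Symmetric per-vector quantization: the largest-magnitude component maps to +/-127.
function quantize(id: string, vector: number[]): Int8Entry {
  const maxAbs = vector.reduce((m, x) => Math.max(m, Math.abs(x)), 0);
  const scale = maxAbs === 0 ? 1 : maxAbs / 127;
  return { id, scale, values: vector.map((x) => Math.round(x / scale)) };
}

// Approximate reconstruction of the original floats.
function dequantize(entry: Int8Entry): number[] {
  return entry.values.map((q) => q * entry.scale);
}

// Refuse to read a file whose stamp does not match the code's expectation.
function parseInt8Index(json: string): Int8IndexFile {
  const parsed = JSON.parse(json) as Partial<Int8IndexFile>;
  if (parsed.wireVersion !== INT8_WIRE_VERSION) {
    throw new Error(
      `int8 wire version mismatch: file has ${String(parsed.wireVersion)}, ` +
        `code expects ${INT8_WIRE_VERSION}`,
    );
  }
  if (!Array.isArray(parsed.entries)) {
    throw new Error("int8 index has no entries array");
  }
  return { wireVersion: INT8_WIRE_VERSION, entries: parsed.entries };
}
```

Checking the stamp before touching `entries` means a stale or foreign file is rejected at load time instead of producing silently wrong scores.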

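Row R5 makes rank correlation over committed vectors part of the keyless headline. The sketch below shows one way to compare a query's full-precision scores against its int8 scores without any API key. It assumes Spearman's coefficient with average ranks for ties; the recon notes say only "rank-corr", so the choice of statistic and the function names (`averageRanks`, `pearson`, `spearman`) are assumptions.

```typescript
// Assign 1-based ranks by descending score; tied scores share their average rank.
function averageRanks(scores: number[]): number[] {
  const order = scores.map((s, i) => ({ s, i })).sort((a, b) => b.s - a.s);
  const ranks = new Array<number>(scores.length).fill(0);
  let start = 0;
  while (start < order.length) {
    let end = start;
    while (end + 1 < order.length && order[end + 1].s === order[start].s) end++;
    const avg = (start + end) / 2 + 1; // average of positions start..end, 1-based
    for (let k = start; k <= end; k++) ranks[order[k].i] = avg;
    start = end + 1;
  }
  return ranks;
}

// Plain Pearson correlation; returns NaN if either input is constant.
function pearson(x: number[], y: number[]): number {
  const n = x.length;
  const mx = x.reduce((a, b) => a + b, 0) / n;
  const my = y.reduce((a, b) => a + b, 0) / n;
  let sxy = 0;
  let sxx = 0;
  let syy = 0;
  for (let i = 0; i < n; i++) {
    const dx = x[i] - mx;
    const dy = y[i] - my;
    sxy += dx * dy;
    sxx += dx * dx;
    syy += dy * dy;
  }
  return sxy / Math.sqrt(sxx * syy);
}

// Spearman's rho: Pearson correlation of the two rank vectors.
function spearman(fpScores: number[], int8Scores: number[]): number {
  if (fpScores.length !== int8Scores.length || fpScores.length < 2) {
    throw new Error("need two equal-length score lists with at least 2 items");
  }
  return pearson(averageRanks(fpScores), averageRanks(int8Scores));
}
```

Both inputs are score lists over the same documents in the same order, so the function is pure and can run against committed vectors alone.
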
## Open-ended rows (add as testing surfaces them)

| # | Spec assumption | What the build actually did | Touches | Downstream action |