Add recon-pass delta log for scaling demo spec verification by lukefwalton · Pull Request #13 · lukefwalton/answer-engine

lukefwalton · 2026-06-16T03:24:43Z

Summary

Documents the reconnaissance pass verification of the scaling demo specification against the live repository state at commit 60b727f. This delta log reconciles the spec's assumptions with actual implementation details discovered during pre-build analysis.

Changes

Added "Recon-pass deltas" section to docs/scaling-demo/scaling-demo-delta-log.md with 9 verification rows (R1–R9)
Records divergences between spec assumptions and live repo state, including:
- R1–R2: NEXT-STEPS.md dependency status and linking requirements
- R3: README line-count documentation state (fix 2.1 pending)
- R4: Gold schema structure (actual vs. conceptual must-* modes)
- R5: Keyless reproducibility constraints and two-tier evaluation gate
- R6: Budget rule confirmation (thin module reusability verified)
- R7: production-scaling.md location and cross-linking
- R8: Boosts/floor/wire-format constants and versioning strategy
- R9: Pipeline chain and model defaults confirmation

Notable Details

Establishes the two-tier evaluation model: keyless tier (rank correlation + judgeRetrieval on committed vectors) and optional key-gated answer-mode tier
Confirms all core functions (retrieve, assembleEvidence, judgeRetrieval/judgeAnswer, answerQuestion) are reusable without forking
Identifies that demo must commit gold query vectors in addition to corpus FP vectors
Clarifies that forbidSources (near-floor source) is the mechanism for keyless refuse-case verdicts
Notes int8 wire format requires its own version stamp separate from existing INDEX_SCHEMA_VERSION = 2
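As a rough illustration of the keyless tier described above, the sketch below computes a rank correlation between two orderings of the same document IDs and gates the answer-mode tier on the presence of a key. It assumes Spearman's coefficient with no ties, since the PR does not say which rank correlation is used. `spearman`, `selectTiers`, and the `apiKey` parameter are hypothetical names, not taken from the repository.

```typescript
// Spearman rank correlation between two orderings of the same IDs.
// Assumption: no ties and identical ID sets in both rankings.
function spearman(baseline: string[], candidate: string[]): number {
  const n = baseline.length;
  if (n !== candidate.length) throw new Error("rankings must have equal length");
  if (n < 2) throw new Error("need at least two ranked items");

  // Position of each ID in the candidate ranking.
  const candidateRank: { [id: string]: number } = {};
  candidate.forEach((id, i) => {
    candidateRank[id] = i;
  });

  let sumSquaredDiff = 0;
  baseline.forEach((id, i) => {
    if (!Object.prototype.hasOwnProperty.call(candidateRank, id)) {
      throw new Error("id missing from candidate ranking: " + id);
    }
    const d = i - candidateRank[id];
    sumSquaredDiff += d * d;
  });

  // rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1))
  return 1 - (6 * sumSquaredDiff) / (n * (n * n - 1));
}

// Hypothetical tier gate: the keyless tier always runs; the answer-mode
// tier runs only when a key is supplied.
type Tier = "keyless" | "answer-mode";

function selectTiers(apiKey: string | undefined): Tier[] {
  return apiKey ? ["keyless", "answer-mode"] : ["keyless"];
}
```

Under these assumptions, identical rankings give 1 and fully reversed rankings give -1, so a threshold on the coefficient could serve as the keyless headline check without any model call.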

https://claude.ai/code/session_01CQQe5VjjDgpCj7hYcoVv8Y

Records the verification pass against live main (60b727f): NEXT-STEPS.md is present (dependency satisfied), the gold schema uses expectAnswerMode rather than must-* tags, the keyless headline needs committed query vectors, the int8 path reuses retrieve/no-leak/store without forking, and README fix 2.1 (the "eight lines" claim) has not landed. Reconnaissance only; no demo code, corpus, harness, or NEXT-STEPS edits.
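To make the gold-schema point concrete, here is a minimal sketch of how a keyless refuse-case check based on forbidSources might look. Only the field names expectAnswerMode, forbidRecordCitations, and forbidSources appear in this PR discussion. The surrounding `GoldEntry` and `RetrievedHit` shapes, the `query` field, and the `forbiddenSourcesHit` helper are assumptions for illustration and may not match the repository.

```typescript
// Hypothetical gold entry shape; only the three field names below the
// query come from the PR discussion, their types are assumed.
interface GoldEntry {
  query: string;
  expectAnswerMode: string;
  forbidRecordCitations?: string[];
  forbidSources?: string[];
}

// Hypothetical retrieval hit: the source a record came from plus its score.
interface RetrievedHit {
  source: string;
  score: number;
}

// Keyless check: return every forbidden source that shows up among the hits.
// An empty result means the entry passes this check.
function forbiddenSourcesHit(entry: GoldEntry, hits: RetrievedHit[]): string[] {
  const forbidden = entry.forbidSources ?? [];
  return hits
    .filter((hit) => forbidden.indexOf(hit.source) !== -1)
    .map((hit) => hit.source);
}
```

Because the check only inspects retrieved sources, it needs no model call, which is what would let a refuse case be judged in the keyless tier.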

surmado-code-review · 2026-06-16T03:25:14Z

Automated Checks (advisory, non-blocking)

✅ All checks passed.

Standards Compliance

This looks docs-only (docs/scaling-demo/scaling-demo-delta-log.md), so I don’t see a direct code-path change touching the repo’s non-negotiables around assembleEvidence, citation grounding, mode derivation, or not-found behavior.

That said, this doc now appears to make normative claims about boundary/eval semantics, so the main standards risk is documentation drift rather than runtime behavior. In particular, the rows called out in the PR description around:

R4: conceptual must-* modes vs actual gold schema fields
R5: keyless vs key-gated evaluation tiers
R8: boosts/floor/wire-format constants and versioning
R9: pipeline chain and model defaults

are worth spot-checking against the live code/tests before merge, because future work could incorrectly implement the doc instead of the code if any of those assertions are off. That’s especially relevant to the standards on mode semantics staying aligned with gold eval and not loosening grounding.

Summary

This PR adds a recon/delta section to the scaling demo documentation, reconciling spec assumptions with the repository state at commit 60b727f. It doesn’t appear to change runtime behavior, but it does document several implementation-sensitive assumptions about eval gates, schema semantics, and pipeline reuse that could influence follow-on work.

Reviewer: most of the risk is in whether R4/R5/R8/R9 accurately describe current code and eval behavior — the rest looks like documentation bookkeeping.

What to pay attention to

Schema/eval terminology alignment (R4, R5)
The doc seems to translate conceptual spec language into actual repo fields like expectAnswerMode, forbidRecordCitations, and forbidSources. That mapping deserves a careful read because a subtle mismatch here could mislead future implementation around the repo’s grounding and refusal semantics.
Versioning claims for new wire format (R8)
If the doc says int8 needs its own version stamp separate from INDEX_SCHEMA_VERSION = 2, that’s a useful constraint — but it should match what the codebase currently treats as schema vs transport/wire compatibility.
“Thin module / no second pipeline” claim (R6, R9)
This is probably right, but it’s an architectural assertion with downstream consequences. Worth checking that the documented reusable units really are the ones future demo work will rely on.
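The versioning point can be illustrated with a small sketch in which an int8 payload carries its own version stamp, independent of INDEX_SCHEMA_VERSION. It assumes simple symmetric max-abs quantization. `INT8_WIRE_VERSION`, `Int8Payload`, `quantizeInt8`, and `dequantizeInt8` are hypothetical names, and the repository's actual wire format may differ.

```typescript
// Value quoted in the PR description.
const INDEX_SCHEMA_VERSION = 2;
// Hypothetical: a separate stamp so the wire format can change without
// bumping the index schema version.
const INT8_WIRE_VERSION = 1;

interface Int8Payload {
  wireVersion: number;
  scale: number;
  values: Int8Array;
}

// Symmetric quantization: map [-maxAbs, maxAbs] onto [-127, 127].
function quantizeInt8(vec: number[]): Int8Payload {
  let maxAbs = 0;
  for (let i = 0; i < vec.length; i++) {
    maxAbs = Math.max(maxAbs, Math.abs(vec[i]));
  }
  const scale = maxAbs === 0 ? 1 : maxAbs / 127;
  const values = new Int8Array(vec.length);
  for (let i = 0; i < vec.length; i++) {
    values[i] = Math.round(vec[i] / scale);
  }
  return { wireVersion: INT8_WIRE_VERSION, scale, values };
}

// Reject payloads written with a different wire version before decoding.
function dequantizeInt8(payload: Int8Payload): number[] {
  if (payload.wireVersion !== INT8_WIRE_VERSION) {
    throw new Error("unsupported int8 wire version: " + payload.wireVersion);
  }
  const out: number[] = [];
  for (let i = 0; i < payload.values.length; i++) {
    out.push(payload.values[i] * payload.scale);
  }
  return out;
}
```

Keeping the two constants separate means a reader can tell an index-layout change from a transport-encoding change, which is the distinction the review asks to be checked against the codebase.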

Things I noticed

🟡 Yellow flags — consider for this PR or a follow-up:

This PR appears to add behavioral/documentation assertions without codifying any of them in tests. That’s fine for a recon log, but if any of R4/R5/R8 are intended to become implementation constraints, they could use a follow-up issue or test coverage so they don’t become stale prose.
Because the diff itself wasn’t included here, I can’t verify whether each recon row cites exact file/line evidence. If it doesn’t, the log is more likely to drift from reality over time.

Good patterns

Good call to capture these as deltas against a specific commit instead of silently updating the spec assumptions.
I also like that the PR description keeps the emphasis on reusing existing pipeline pieces rather than proposing a special-case/demo-only path.

Suggested improvements

For the rows that touch repo invariants (R4, R5, R8, R9), make sure each one points to the exact source-of-truth file/function/test so readers can re-verify quickly.
If not already present in the new section, mark which rows are descriptive vs prescriptive. That would help separate “what the repo does today” from “what the demo should implement next.”
Consider adding a short follow-up checklist or issue reference for the documented-but-pending items (for example the README fix and any version-stamp work), so the delta log stays historical instead of becoming a second backlog.
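One way to act on these suggestions would be to keep the recon rows in a structured form so that missing evidence and pending work can be listed mechanically. The sketch below is purely illustrative: the `ReconRow` shape, the helper names, and the sample classifications are assumptions, and the delta log in this PR is plain Markdown rather than data.

```typescript
type RowKind = "descriptive" | "prescriptive";

// Pointer to a source of truth; `symbol` is optional.
interface Evidence {
  file: string;
  symbol?: string;
}

interface ReconRow {
  id: string;
  kind: RowKind;
  claim: string;
  evidence: Evidence[];
}

// Rows with no source-of-truth anchor, which are most likely to drift.
function rowsMissingEvidence(rows: ReconRow[]): string[] {
  return rows.filter((row) => row.evidence.length === 0).map((row) => row.id);
}

// Prescriptive rows form the follow-up checklist.
function followUpRows(rows: ReconRow[]): string[] {
  return rows.filter((row) => row.kind === "prescriptive").map((row) => row.id);
}

// Illustrative sample only; kinds and evidence are placeholders, with
// eval/gold.yaml taken from the reviewer's question rather than the log.
const sampleRows: ReconRow[] = [
  {
    id: "R4",
    kind: "descriptive",
    claim: "Gold schema uses expectAnswerMode rather than must-* tags",
    evidence: [{ file: "eval/gold.yaml" }],
  },
  {
    id: "R8",
    kind: "prescriptive",
    claim: "int8 wire format needs its own version stamp",
    evidence: [],
  },
];
```

With the sample above, `rowsMissingEvidence(sampleRows)` lists only R8, and `followUpRows(sampleRows)` gives the same single row as the candidate for a follow-up issue.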

Questions for the author

For the R4/R5 rows, are those mappings taken directly from current eval/gold.yaml semantics and judge logic, or are any of them inferred from intended behavior?
Does the new section include code/test references for each recon row, or is some of it based on manual inspection without line-level anchors?

Surmado Code Review (v1.2-mt) is an automated review, designed to work alongside human judgment.


lukefwalton closed this Jun 16, 2026

lukefwalton deleted the claude/pensive-mccarthy-z937tl branch June 16, 2026 03:32

lukefwalton wants to merge 1 commit into main from claude/pensive-mccarthy-z937tl
