Add recon-pass delta log for scaling demo spec verification #13

Closed
lukefwalton wants to merge 1 commit into main from claude/pensive-mccarthy-z937tl

@lukefwalton

Owner

Summary

Documents the reconnaissance pass verification of the scaling demo specification against the live repository state at commit 60b727f. This delta log reconciles the spec's assumptions with actual implementation details discovered during pre-build analysis.

Changes

  • Added "Recon-pass deltas" section to docs/scaling-demo/scaling-demo-delta-log.md with 9 verification rows (R1–R9)
  • Records divergences between spec assumptions and live repo state, including:
    • R1–R2: NEXT-STEPS.md dependency status and linking requirements
    • R3: README line-count documentation state (fix 2.1 pending)
    • R4: Gold schema structure (actual vs. conceptual must-* modes)
    • R5: Keyless reproducibility constraints and two-tier evaluation gate
    • R6: Budget rule confirmation (thin module reusability verified)
    • R7: production-scaling.md location and cross-linking
    • R8: Boosts/floor/wire-format constants and versioning strategy
    • R9: Pipeline chain and model defaults confirmation
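
The two-tier gate named in R5 can be pictured roughly as follows. This is only a sketch under assumptions: `Tier`, `GatePlan`, and `planEvaluation` are hypothetical names that do not come from the repository, and how the real harness detects a key is not shown in this PR.

```typescript
// Hypothetical sketch: none of these names are taken from the repository.
type Tier = "keyless" | "key-gated";

interface GatePlan {
  tiers: Tier[];
  reason: string;
}

// The keyless tier always runs. The key-gated tier is added only when a
// non-empty API key is supplied by the caller.
function planEvaluation(apiKey: string | undefined): GatePlan {
  const hasKey = typeof apiKey === "string" && apiKey.trim().length > 0;
  if (hasKey) {
    return { tiers: ["keyless", "key-gated"], reason: "API key present" };
  }
  return { tiers: ["keyless"], reason: "no API key; key-gated tier skipped" };
}
```

Passing the key as a parameter, rather than reading it from the environment inside the function, keeps the keyless path testable without any secrets.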

Notable Details

  • Establishes the two-tier evaluation model: keyless tier (rank correlation + judgeRetrieval on committed vectors) and optional key-gated answer-mode tier
  • Confirms all core functions (retrieve, assembleEvidence, judgeRetrieval/judgeAnswer, answerQuestion) are reusable without forking
  • Identifies that the demo must commit gold query vectors in addition to corpus FP vectors
  • Clarifies that forbidSources (near-floor source) is the mechanism for keyless refuse-case verdicts
  • Notes int8 wire format requires its own version stamp separate from existing INDEX_SCHEMA_VERSION = 2
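
To illustrate the last point, here is a minimal sketch of an int8 payload that carries its own version stamp alongside the index schema version. Only `INDEX_SCHEMA_VERSION = 2` comes from this PR; `INT8_WIRE_VERSION`, its value, the `Int8Payload` shape, and the symmetric max-abs quantization scheme are all assumptions made for illustration.

```typescript
// From the PR description: the existing index schema version is 2.
const INDEX_SCHEMA_VERSION = 2;
// Hypothetical: a separate stamp for the int8 wire format (name and value assumed).
const INT8_WIRE_VERSION = 1;

// Hypothetical payload shape, not the repository's actual wire format.
interface Int8Payload {
  indexSchemaVersion: number;
  wireVersion: number;
  scale: number; // multiply each stored integer by this to approximate the float
  values: number[]; // integers in [-127, 127]
}

// Symmetric quantization: the largest absolute component maps to +/-127.
function encodeInt8(vector: number[]): Int8Payload {
  const maxAbs = vector.reduce((m, x) => Math.max(m, Math.abs(x)), 0);
  const scale = maxAbs === 0 ? 1 : maxAbs / 127;
  const values = vector.map((x) => Math.round(x / scale));
  return {
    indexSchemaVersion: INDEX_SCHEMA_VERSION,
    wireVersion: INT8_WIRE_VERSION,
    scale,
    values,
  };
}

// Decoding rejects payloads whose wire version it does not understand,
// independently of the index schema version.
function decodeInt8(payload: Int8Payload): number[] {
  if (payload.wireVersion !== INT8_WIRE_VERSION) {
    throw new Error(`unsupported int8 wire version: ${payload.wireVersion}`);
  }
  return payload.values.map((q) => q * payload.scale);
}
```

Keeping the two stamps separate means the quantization scheme can change without forcing a bump of the index schema, and vice versa.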

https://claude.ai/code/session_01CQQe5VjjDgpCj7hYcoVv8Y

Records the verification pass against live main (60b727f): NEXT-STEPS.md
is present (dependency satisfied), the gold schema is expectAnswerMode
not must-* tags, the keyless headline needs committed query vectors, the
int8 path reuses retrieve/no-leak/store without forking, and README fix
2.1 (the "eight lines" claim) has not landed. Reconnaissance only; no
demo code, corpus, harness, or NEXT-STEPS edits.

https://claude.ai/code/session_01CQQe5VjjDgpCj7hYcoVv8Y
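
As a rough illustration of why the keyless headline needs committed query vectors: with both query and corpus vectors on disk, two rankings can be compared with no embedding API call. The sketch below assumes the comparison is a Spearman rank correlation between two orderings of the same documents (for example a floating-point ranking and an int8 ranking). The PR only says "rank correlation", so that choice, and the helper names `rankByDot` and `spearman`, are assumptions.

```typescript
// Hypothetical document shape for the sketch.
interface Doc {
  id: string;
  vector: number[];
}

// Rank document ids by descending dot product with the query vector.
function rankByDot(query: number[], docs: Doc[]): string[] {
  const dot = (a: number[], b: number[]): number =>
    a.reduce((sum, x, i) => sum + x * (b[i] ?? 0), 0);
  return docs
    .map((d) => ({ id: d.id, score: dot(query, d.vector) }))
    .sort((p, q) => q.score - p.score)
    .map((p) => p.id);
}

// Spearman's rho for two rankings of the same ids, assuming no ties:
// rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)).
function spearman(rankingA: string[], rankingB: string[]): number {
  const n = rankingA.length;
  if (n !== rankingB.length || n < 2) {
    throw new Error("rankings must have equal length of at least 2");
  }
  let sumSquaredDiff = 0;
  rankingA.forEach((id, i) => {
    const j = rankingB.indexOf(id);
    if (j === -1) throw new Error(`id missing from second ranking: ${id}`);
    const d = i - j;
    sumSquaredDiff += d * d;
  });
  return 1 - (6 * sumSquaredDiff) / (n * (n * n - 1));
}
```

Identical rankings give a correlation of 1 and fully reversed rankings give -1, so a threshold close to 1 would indicate that a compressed path preserves the ordering.
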
@surmado-code-review

Contributor

Automated Checks (advisory, non-blocking)

✅ All checks passed.


Standards Compliance

This looks docs-only (docs/scaling-demo/scaling-demo-delta-log.md), so I don’t see a direct code-path change touching the repo’s non-negotiables around assembleEvidence, citation grounding, mode derivation, or not-found behavior.

That said, this doc now appears to make normative claims about boundary/eval semantics, so the main standards risk is documentation drift rather than runtime behavior. In particular, the rows called out in the PR description around:

  • R4: conceptual must-* modes vs actual gold schema fields
  • R5: keyless vs key-gated evaluation tiers
  • R8: boosts/floor/wire-format constants and versioning
  • R9: pipeline chain and model defaults

are worth spot-checking against the live code/tests before merge, because future work could incorrectly implement the doc instead of the code if any of those assertions are off. That’s especially relevant to the standards on mode semantics staying aligned with gold eval and not loosening grounding.

Summary

This PR adds a recon/delta section to the scaling demo documentation, reconciling spec assumptions with the repository state at commit 60b727f. It doesn’t appear to change runtime behavior, but it does document several implementation-sensitive assumptions about eval gates, schema semantics, and pipeline reuse that could influence follow-on work.

Reviewer: most of the risk is in whether R4/R5/R8/R9 accurately describe current code and eval behavior — the rest looks like documentation bookkeeping.

What to pay attention to

  • Schema/eval terminology alignment (R4, R5)
    The doc seems to translate conceptual spec language into actual repo fields like expectAnswerMode, forbidRecordCitations, and forbidSources. That mapping deserves a careful read because a subtle mismatch here could mislead future implementation around the repo’s grounding and refusal semantics.

  • Versioning claims for new wire format (R8)
    If the doc says int8 needs its own version stamp separate from INDEX_SCHEMA_VERSION = 2, that’s a useful constraint — but it should match what the codebase currently treats as schema vs transport/wire compatibility.

  • “Thin module / no second pipeline” claim (R6, R9)
    This is probably right, but it’s an architectural assertion with downstream consequences. Worth checking that the documented reusable units really are the ones future demo work will rely on.
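
For the schema/eval mapping in the first item above, a keyless refuse-case check built on `forbidSources` might look something like this. It is a sketch under assumptions: only the field name `forbidSources` appears in this PR, while the `GoldCase`, `RetrievedChunk`, and `Verdict` shapes and the pass rule (no forbidden source among the retrieved chunks) are hypothetical and would need checking against the actual gold schema and judge logic.

```typescript
// Hypothetical shapes: the real gold schema and retrieval result types are not
// shown in this PR. Only the field name forbidSources is taken from it.
interface GoldCase {
  query: string;
  forbidSources: string[];
}

interface RetrievedChunk {
  source: string;
  score: number;
}

interface Verdict {
  pass: boolean;
  violations: string[];
}

// Assumed rule: the case passes when none of the forbidden sources appears
// among the retrieved chunks. Each violating source is reported once.
function checkForbiddenSources(gold: GoldCase, retrieved: RetrievedChunk[]): Verdict {
  const violations: string[] = [];
  for (const chunk of retrieved) {
    const forbidden = gold.forbidSources.indexOf(chunk.source) !== -1;
    const alreadyReported = violations.indexOf(chunk.source) !== -1;
    if (forbidden && !alreadyReported) {
      violations.push(chunk.source);
    }
  }
  return { pass: violations.length === 0, violations };
}
```

Because the check needs only source identifiers, it can run on committed vectors without calling a model, which is what makes it usable in a keyless tier.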

Things I noticed

🟡 Yellow flags — consider for this PR or a follow-up:

  • This PR appears to add behavioral/documentation assertions without codifying any of them in tests. That’s fine for a recon log, but if any of R4/R5/R8 are intended to become implementation constraints, they could use a follow-up issue or test coverage so they don’t become stale prose.
  • Because the diff itself wasn’t included here, I can’t verify whether each recon row cites exact file/line evidence. If the rows lack those citations, the log will drift from reality more easily over time.

Good patterns

  • Good call to capture these as deltas against a specific commit instead of silently updating the spec assumptions.
  • I also like that the PR description keeps the emphasis on reusing existing pipeline pieces rather than proposing a special-case/demo-only path.

Suggested improvements

  1. For the rows that touch repo invariants (R4, R5, R8, R9), make sure each one points to the exact source-of-truth file/function/test so readers can re-verify quickly.
  2. If not already present in the new section, mark which rows are descriptive vs prescriptive. That would help separate “what the repo does today” from “what the demo should implement next.”
  3. Consider adding a short follow-up checklist or issue reference for the documented-but-pending items (for example the README fix and any version-stamp work), so the delta log stays historical instead of becoming a second backlog.
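
Suggestions 1 and 2 could be made mechanical with a small typed record per recon row. The sketch below is purely illustrative: `ReconRow`, `RowKind`, and `rowsMissingEvidence` are hypothetical names, and the example rows use placeholder values rather than the contents of the actual delta log.

```typescript
// Hypothetical record for one recon row; not the format used in the delta log.
type RowKind = "descriptive" | "prescriptive";

interface ReconRow {
  id: string; // e.g. "R4"
  kind: RowKind; // what the repo does today vs. what the demo should do next
  claim: string;
  evidence: string[]; // source-of-truth anchors such as file paths or test names
}

// Returns the ids of rows that cite no evidence, so they can be fixed before merge.
function rowsMissingEvidence(rows: ReconRow[]): string[] {
  return rows.filter((r) => r.evidence.length === 0).map((r) => r.id);
}

// Placeholder rows for illustration only. The path eval/gold.yaml is the one
// mentioned in this review's questions and has not been verified here.
const exampleRows: ReconRow[] = [
  {
    id: "R4",
    kind: "descriptive",
    claim: "gold schema uses expectAnswerMode",
    evidence: ["eval/gold.yaml"],
  },
  {
    id: "R8",
    kind: "prescriptive",
    claim: "int8 wire format gets its own version stamp",
    evidence: [],
  },
];

const missing = rowsMissingEvidence(exampleRows); // ["R8"]
```

A check like this could run in CI against a structured copy of the log, flagging any row that makes a claim without an anchor.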

Questions for the author

  • For the R4/R5 rows, are those mappings taken directly from current eval/gold.yaml semantics and judge logic, or are any of them inferred from intended behavior?
  • Does the new section include code/test references for each recon row, or is some of it based on manual inspection without line-level anchors?

Surmado Code Review (v1.2-mt) is an automated review, designed to work alongside human judgment.

@lukefwalton lukefwalton deleted the claude/pensive-mccarthy-z937tl branch June 16, 2026 03:32