Skip to content

Add reproducible int8 scaling demo#16

Merged
lukefwalton merged 9 commits into
mainfrom
demo/int8-gate-calibration
Jun 16, 2026
Merged

Add reproducible int8 scaling demo#16
lukefwalton merged 9 commits into
mainfrom
demo/int8-gate-calibration

Conversation

@lukefwalton

Copy link
Copy Markdown
Owner

Summary

  • Adds the public-domain demo/ corpus, gold suites, committed vectors, and keyless int8/int4 harness for the scaling-demo artifact.
  • Calibrates the synthetic spire so int8 holds while int4 flips exactly the two engineered route near-ties, with the same suite catching both failures.
  • Documents provenance, boost findings, keyless reproduction, --full scope, and the demo/ vs historical scaling/ path bridge.

Test plan

  • npm test
  • npm run typecheck
  • env -u OPENAI_API_KEY npm run demo:run -- --natural
  • env -u OPENAI_API_KEY npm run demo:run -- --natural+synthetic
  • env -u OPENAI_API_KEY npm run demo:run -- --natural+synthetic --bits 4 (expected nonzero rejection: 7/9 held, two synthetic route flips caught)
  • Detached cold worktree at 56cd946, no .env: npm run demo:run -- --natural reproduced 7/7 keylessly

@surmado-code-review

surmado-code-review Bot commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

Automated Checks (advisory, non-blocking)

  • Hallucinated packagenode:test used in demo/artifacts.test.ts does not exist on npm. This package appears to be invented by AI.
    ✅ No other issues detected.

⚠️ Partial review: The diff exceeded size limits. Feedback below focuses on early files and high-level risks only.

Standards Compliance

The previously flagged node:test package issue appears fixed in this version.

I don’t see a clear must-fix standards violation in the visible diff, but this PR is making a very standards-sensitive exception that the reviewer should verify carefully:

  • demo/corpus/index.json and other committed demo/ artifacts

    • The PR commits full record text plus vectors under demo/.
    • That would normally conflict with Security & Performance §5: “Don't leak private embeddings/text into committed artifacts.”
    • From the PR description and the earlier docs/test summary, the intent is a narrow public-domain demo exception for keyless reproducibility. That can be okay, but only if it is mechanically fenced to:
      1. demo/ only,
      2. an exact allowlisted public-domain source set,
      3. explicitly flagged synthetic entries only.
  • [UNVERIFIED] .github/STANDARDS.md + demo/artifacts.test.ts

    • The summary says the standards text and new artifact tests now codify that exception.
    • Reviewer should confirm the wording does not relax the general no-leak boundary outside demo/, and that the tests fail on any corpus expansion beyond the intended demo records/synthetic fixtures.
    • This matters because the repo standards are explicit that committed artifact leakage is normally forbidden, and exceptions here should not become a precedent for non-demo data.

Summary

This PR adds a reproducible demo/ package: committed demo corpus/index/query vectors, gold suites, offline artifact tests, and documentation for keyless int8/int4 reproduction. The main review question is less the quantization demo itself and more whether the new committed-artifact exception is tightly constrained and regression-tested.
Reviewer: most of the risk is in the demo/ artifact guardrails (demo/artifacts.test.ts, committed corpus/vector files, and the standards wording) — the rest looks like supporting demo/docs work.

What to pay attention to

  • demo/artifacts.test.ts
    This is the critical safeguard. It should make accidental scope creep obvious by rejecting any new source IDs/types outside the intended demo set.

  • .github/STANDARDS.md
    Check that the exception is phrased as a special case for public-domain demo material, not a broader softening of the “don’t commit embeddings/text” rule.

  • Build/tooling paths that write demo artifacts
    The highest practical risk is not the current committed JSON, but whether rebuild scripts could accidentally serialize non-demo corpus/index data into those tracked files.

Things I noticed

🟡 Yellow flags — consider for this PR or a follow-up:

  • Because the committed artifacts are huge, reviewers will depend almost entirely on demo/artifacts.test.ts; if that test is not extremely explicit about unexpected sources/IDs, drift could be easy to miss.
  • [UNVERIFIED] Given the repo’s normal artifact-leak rule, it would be valuable to ensure rebuild tooling refuses to write anything except the intended demo/ public-domain/synthetic set into tracked demo files.

Good patterns

  • Adding offline artifact integrity tests aligns well with the repo’s “tests/eval as regression contract” standard.
  • The keyless reproduction path and the explicit “int8 holds / int4 flips” expectation make the demo behavior reviewable without requiring live API access.

Suggested improvements

  1. Make demo/artifacts.test.ts failure output enumerate any unexpected source IDs/types so reviewers don’t have to inspect giant JSON blobs manually.
  2. If not already present, add a guard in the demo build path that hard-fails if any non-demo/ or non-allowlisted records are about to be written to committed artifacts.
  3. Keep one small reviewer-readable manifest of the approved demo source set, rather than forcing policy review through large generated files.

Questions for the author

  • Can you confirm the new artifact tests reject any committed corpus/vector expansion beyond the exact demo allowlist?
  • Is there any rebuild path that could accidentally write non-demo corpus/index content into these committed demo/ artifacts?

Surmado Code Review (v1.2-mt) is an automated review, designed to work alongside human judgment.

Want to change your STANDARDS.md or YML? Edit it directly, or tune it with our AI agent Scout.

Comment /rerun-review on this PR to refresh the review — costs 1 additional PR credit.

@lukefwalton lukefwalton merged commit 4e690d5 into main Jun 16, 2026
2 of 3 checks passed
@lukefwalton lukefwalton deleted the demo/int8-gate-calibration branch June 16, 2026 06:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant