feat(oracle): unified tier-driven corpus runner + oracle.yml (C3) #177

Merged: skakri merged 2 commits into main from feat/oracle-unified-runner on Jun 15, 2026

skakri (Member) commented Jun 15, 2026

Part of the multi-language SCIP-oracle runner epic (#164). Replaces the two per-language demo workflows with one declarative, tier-driven runner over tools/oracle-corpora.toml — the C3 piece that ties the corpus profiles (#171), the report command (#175), and the scip-python backend (#176) into a CI-runnable whole.

What

  • tools/oracle-run.sh — run the oracle for one corpus end to end: read its profile → shallow-clone the repo at its pinned rev → run its prepare steps → render the checkout's rag-rat.toml from its bindings → rag-rat index --full → rag-rat oracle report. The health gate lives in oracle report; the script propagates its exit code (non-zero on a violation) and always leaves the report JSON behind so a Δ glue script can consume even a failing run.
  • tools/oracle-corpus.py — stdlib (tomllib) reader the bash runner shells out to for the pre-index fields (repo/rev/prepare/bindings) and --list-tier for the CI matrix. oracle report reads tool/bindings/health from the same file itself — the helper is just bash's TOML eyes.
  • tools/oracle-report-bmf.py — report JSON → Bencher BMF glue for the heavy tier (rag-rat emits JSON only; Bencher/markdown shaping is a glue concern).
  • .github/workflows/oracle.yml — tier-driven:
    • small (PRs + push to main): GitHub-hosted matrix from --list-tier small, installs each corpus's SCIP tool by its tool field, runs the runner, uploads the report artifact. The health gate is the PR gate.
    • heavy (release / manual dispatch only): self-hosted big-memory box, serial (max-parallel: 1), bench image, pushed to Bencher as the headline resolution series.
  • Deletes the superseded oracle-rust.yml / oracle-kernel.yml workflows and their rust-scip-oracle.sh / kernel-c-oracle.sh scripts (full migration, no shims); updates docs/benchmarks.md.

Verification

End-to-end locally on the small tier (with the #175 oracle report binary):

| corpus | tool | edges | resolved before→after | result |
|---|---|---|---|---|
| rust-semver | rust-analyzer | 1056 | 412 → 936 | healthy, exit 0 |
| c-cjson | cmake compdb + scip-clang | 3941 | 2742 → 3408 | healthy, exit 0 |

Both clone → prepare → index → report → gate clean; the corpus_profile_hash matched the golden. The health gate was separately confirmed to fail non-zero on a violated threshold (in #175). bash -n, py_compile, and YAML parse all pass.
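
For a sense of scale, the counts in the table can be turned into resolution percentages. The sketch assumes the "resolved" figures are counted out of the "edges" column, which the table implies but does not state; the `pct` helper follows the vacuous-100% convention for an empty denominator that oracle-report-bmf.py uses.

```python
def pct(numerator: int, total: int) -> float:
    # An empty denominator counts as fully resolved (vacuous 100%).
    return 100.0 if total == 0 else 100.0 * numerator / total


# (edges, resolved before, resolved after), copied from the table above.
runs = {
    "rust-semver": (1056, 412, 936),
    "c-cjson": (3941, 2742, 3408),
}

for corpus, (edges, before, after) in runs.items():
    print(f"{corpus}: {pct(before, edges):.1f}% -> {pct(after, edges):.1f}%")
```

Under that assumption rust-semver moves from about 39.0% to 88.6% resolved and c-cjson from about 69.6% to 86.5%.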

Stacking

oracle.yml's small tier needs the oracle report command (#175) and, for py-requests, the scip-python backend (#176). The script + helpers are independent of those at the file level (disjoint paths), but the workflow's py-requests leg goes green only once #176 lands and rust-*/c-* legs only once #175 lands. Merge order: #175 + #176, then this.

codecov Bot commented Jun 15, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ All tests successful. No failed tests found.

github-actions Bot commented Jun 15, 2026

🐰 Bencher Report

Branch: feat/oracle-unified-runner
Testbed: ubuntu-latest

⚠️ WARNING: No Threshold found!

Without a Threshold, no Alerts will ever be generated. To only post results if a Threshold exists, set the --ci-only-thresholds flag.

🚨 2 Alerts

| Benchmark | Measure (units) | Benchmark Result (Result Δ%) | Baseline | Upper Boundary (Limit %) |
|---|---|---|---|---|
| rag_pipeline::pipeline::index cargo_resolver:resolver_config() | Instructions (instructions x 1e6) | 800.17 x 1e6 (+39.29%) | 574.45 x 1e6 | 585.94 x 1e6 (136.56%) |
| rag_pipeline::pipeline::query_cold cargo_resolver:resolver_built_config() | Instructions (instructions x 1e6) | 154.66 x 1e6 (+15.59%) | 133.80 x 1e6 | 136.47 x 1e6 (113.33%) |

All benchmark results (only Instructions has a threshold; every other measure reports NO THRESHOLD):

| Benchmark | Estimated Cycles (cycles x 1e6) | Instructions (instructions x 1e6) (Result Δ%) | Instructions Baseline (x 1e6) | Instructions Upper Boundary (x 1e6) (Limit %) | L1 Hits (hits x 1e6) | LL Hits (hits x 1e6) | RAM Hits (hits x 1e3) | Total read+write (reads/writes x 1e6) |
|---|---|---|---|---|---|---|---|---|
| rag_pipeline::pipeline::index cargo_resolver:resolver_config() | 1,213.35 | 🚨 800.17 (+39.29%) | 574.45 | 585.94 (136.56%) | 1,115.38 | 17.40 | 312.91 | 1,133.10 |
| rag_pipeline::pipeline::query_cold cargo_resolver:resolver_built_config() | 241.18 | 🚨 154.66 (+15.59%) | 133.80 | 136.47 (113.33%) | 230.99 | 1.96 | 11.51 | 232.96 |
| rag_pipeline::pipeline::query_warm cargo_resolver:resolver_index() | 232.82 | 149.15 (-8.98%) | 163.86 | 167.13 (89.24%) | 223.20 | 1.85 | 10.11 | 225.06 |

github-actions Bot commented Jun 15, 2026

🐰 Bencher Report

Branch: feat/oracle-unified-runner
Testbed: ubuntu-latest

| Benchmark | Latency (s) (Result Δ%) | Baseline (s) | Upper Boundary (s) (Limit %) |
|---|---|---|---|
| index_time/full_rebuild_cargo | 4.96 (+33.39%) | 3.72 | 5.77 (85.99%) |

skakri added a commit that referenced this pull request Jun 15, 2026
- Roll back an unhealthy run (P2): a corpus that fails its health gate had
  already committed edge_oracle/monikers/oracle_runs, so it became the
  authoritative latest run and surfaced untrustworthy Compiler verdicts in
  later status/query despite the non-zero exit. `oracle report` now rolls the
  run back (delete verdicts + monikers + the oracle_runs row) atomically inside
  the same write lock when the gate fails — new oracle::rollback_run +
  store::delete_oracle_run + IndexDatabase::rollback_oracle_run. Verified e2e:
  after a failed gate, edge_oracle/oracle_runs/monikers are empty and status
  reports no verdicts.
- Validate the checkout matches the corpus before stamping its profile (P2):
  fail closed unless the active checkout's target bindings (language -> dirs)
  exactly equal the corpus profile's bindings, so `oracle report --corpus X`
  can't stamp X's corpus_profile_hash onto a different population. New
  ensure_checkout_matches_corpus + unit test.

The third comment (honor timeout_minutes in the report run) is already handled
one layer up: timeout_minutes is the corpus wall-clock budget the runner owns
(documented on CorpusHealth), and tools/oracle-run.sh (C3, #177) wraps the
`oracle report` invocation in `timeout ${timeout_minutes}m`.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7d99df400c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread tools/oracle-run.sh Outdated
Comment on lines +94 to +95
timeout "${TIMEOUT_MINUTES}m" \
"$RAG_RAT_BIN" --json oracle report --corpus "$CORPUS" --corpora "$CORPORA" ) > "$REPORT_OUT"

P2: Apply the corpus timeout to the whole run

The profile timeout is documented as the wall-clock budget for the whole corpus run, but this timeout only starts after cloning, all prepare steps, and rag-rat index --full have already completed. If cargo fetch, the kernel make -j, or indexing hangs or runs far past the corpus budget, the small PR jobs can sit until the default Actions timeout and the heavy job until its 360-minute job timeout instead of failing at the configured 8/45/120 minutes.

Comment thread tools/oracle-report-bmf.py Outdated
# Mirror the engine's vacuous-1.0 convention for an empty denominator.
return 100.0 if total == 0 else 100.0 * numerator / total

name = f"{report['corpus_id']}/oracle"

P2: Include profile identity in the Bencher series

This benchmark name is only keyed by corpus_id, so when a profile changes while keeping the same id (for example bumping rust-cargo/linux-kernel rev, bindings, or prepare steps), Bencher will append the new measurements to the old series even though tools/oracle-corpora.toml explicitly treats those reports as incomparable via corpus_profile_hash. The deleted scripts at least carried the tag in the benchmark name; this should include the rev/profile hash (and ideally tool version) or otherwise force a new series on corpus changes.

Comment thread tools/oracle-run.sh
while IFS= read -r prepare_cmd; do
  [ -n "$prepare_cmd" ] || continue
  echo "oracle-run: prepare> $prepare_cmd" >&2
  ( cd "$CHECKOUT" && bash -c "$prepare_cmd" )

P2: Bound kernel make parallelism

For the linux-kernel profile this executes the declared make -j literally; I checked GNU make --help, and -j [N] means infinite jobs when no argument is supplied. The old kernel oracle script bounded the build with -j$(nproc), so the heavy self-hosted run can now oversubscribe CPU/RAM with unbounded kernel compile jobs before it ever reaches the oracle report.

Comment thread tools/oracle-run.sh
while IFS= read -r prepare_cmd; do
  [ -n "$prepare_cmd" ] || continue
  echo "oracle-run: prepare> $prepare_cmd" >&2
  ( cd "$CHECKOUT" && bash -c "$prepare_cmd" )

P2: Preserve Python virtualenv for scip-python

The py-requests profile installs the package into .venv, but each prepare command runs in a child shell and the later oracle subprocess inherits none of that environment. Sourcegraph's scip-python usage notes say to activate the virtualenv before running scip-python index (https://github.com/sourcegraph/scip-python#usage), so this small-tier leg runs against the global Python environment and can emit few or no dependency monikers despite the install step.

on:
  pull_request:
    paths:
      - 'crates/**'

P2: Trigger the oracle on workspace dependency changes

This PR gate is meant to run when the rag-rat binary can change, but the path filter only covers crates/** and the oracle tool files. Root Cargo.toml defines workspace dependencies and Cargo.lock pins the actual dependency versions, so a dependency/profile update that touches only those root files skips the small oracle matrix entirely on pull requests and can merge parser/oracle behavior changes without this health gate.

Comment thread .github/workflows/oracle.yml Outdated
# produced by the same indexer build the heavy/Bencher tier uses.
SCIP_CLANG_VERSION: v0.4.0
SCIP_PYTHON_VERSION: 0.6.6
RUST_ANALYZER_URL: https://github.com/rust-lang/rust-analyzer/releases/latest/download/rust-analyzer-x86_64-unknown-linux-gnu.gz

P2: Pin the rust-analyzer download

The small tier is described as a pinned toolchain, but this downloads releases/latest on every run. When rust-analyzer publishes a new build, the rust-semver PR leg can start using a different SCIP emitter than the thresholds and heavy image were validated with, producing failures or report changes unrelated to the PR while the other tools remain version-pinned.

skakri added a commit that referenced this pull request Jun 15, 2026
Six P2s:
- Whole-run timeout (oracle-run.sh): the corpus wall-clock budget wrapped only
  `oracle report`, so a hung clone/prepare/index sat until the Actions/job
  timeout. The runner now re-execs itself once under `timeout -k 60s
  <budget>m`, covering clone + prepare + index + report; an EXIT trap still
  removes the checkout on a timeout/gate-fail.
- Preserve the virtualenv (oracle-run.sh): activate a prepare-created `.venv`
  (VIRTUAL_ENV + PATH) before index/report so scip-python (pyright) resolves
  against the project's installed deps, not the global interpreter.
- Bencher series identity (oracle-report-bmf.py): the benchmark name was keyed
  only by corpus_id, so a profile/tool-version change (which makes reports
  incomparable) appended to the old series. Now keyed by
  corpus_id@<profile_hash12>+<tool_version> so an incomparable change starts a
  fresh series.
- Bound kernel make (oracle-corpora.toml): `make -j` is unlimited jobs; pinned
  to `make -j$(nproc)` so the heavy run doesn't oversubscribe the box.
  Recomputed the golden linux-kernel profile hash.
- Trigger on dep changes (oracle.yml): added root Cargo.toml/Cargo.lock to the
  PR + push path filters so a workspace-dep bump can't skip the gate.
- Pin rust-analyzer (oracle.yml): install it as a rustup component (pinned to
  the stable toolchain) instead of downloading releases/latest each run.

skakri (Member, Author) commented Jun 15, 2026

Addressed in 202cbf0 — all six.

P2 — Apply the corpus timeout to the whole run ✅ The runner now re-execs itself once under timeout -k 60s ${timeout_minutes}m, so the budget covers clone + every prepare step + index + report, not just the report. An EXIT trap still removes the checkout on a timeout/gate-fail. Verified e2e (rust-semver: single re-exec, "budget 8m", completes healthy).
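
The idea of one budget covering every phase can be sketched in Python. Note the swap: the actual runner is a bash script that re-execs itself under coreutils `timeout -k 60s`, whereas this illustration shares a single monotonic deadline across `subprocess.run` calls. The step names and commands are stand-ins.

```python
import subprocess
import sys
import time


def run_with_budget(steps, budget_seconds):
    """Run (name, argv) steps in order under ONE shared wall-clock budget."""
    deadline = time.monotonic() + budget_seconds
    for name, argv in steps:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return ("timeout", name)
        try:
            # Each step only gets whatever is left of the overall budget.
            subprocess.run(argv, check=True, timeout=remaining)
        except subprocess.TimeoutExpired:
            return ("timeout", name)
        except subprocess.CalledProcessError:
            return ("failed", name)
    return ("ok", None)


# Stand-in steps; the real phases are clone, prepare, index, and report.
steps = [
    ("prepare", [sys.executable, "-c", "pass"]),
    ("index", [sys.executable, "-c", "import time; time.sleep(30)"]),
]
outcome = run_with_budget(steps, budget_seconds=2)
print(outcome)
```

The point of the shared deadline is that a slow early phase eats into the time left for later ones, so the total can never exceed the corpus budget.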

P2 — Preserve Python virtualenv for scip-python ✅ After the prepare steps, the runner activates a .venv it finds in the checkout (VIRTUAL_ENV + PATH) before indexing/reporting, so scip-python (pyright) resolves against the project's installed deps rather than the global interpreter.
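
Activating a virtualenv for a child process amounts to setting two environment variables. The sketch below shows that idea in Python under the assumption of a POSIX `.venv/bin` layout; the function name `venv_env` is hypothetical, and the real runner does this in bash.

```python
import os
import tempfile
from pathlib import Path


def venv_env(checkout, base_env=None):
    """Return an environment that prefers the checkout's .venv, if one exists."""
    env = dict(os.environ if base_env is None else base_env)
    venv = Path(checkout) / ".venv"
    bin_dir = venv / "bin"  # POSIX layout; Windows uses "Scripts" instead
    if bin_dir.is_dir():
        env["VIRTUAL_ENV"] = str(venv)
        env["PATH"] = str(bin_dir) + os.pathsep + env.get("PATH", "")
        env.pop("PYTHONHOME", None)  # the stock activate script unsets this too
    return env


with tempfile.TemporaryDirectory() as checkout:
    (Path(checkout) / ".venv" / "bin").mkdir(parents=True)
    env = venv_env(checkout, base_env={"PATH": "/usr/bin"})
    print(env["VIRTUAL_ENV"])
    print(env["PATH"])
```

The returned mapping can be passed as `env=` to `subprocess.run`, so the index and report subprocesses see the project's interpreter first on `PATH`.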

P2 — Include profile identity in the Bencher series ✅ oracle-report-bmf.py now names the benchmark {corpus_id}@{profile_hash[:12]}+{tool_version}/oracle (was {corpus_id}/oracle), so a profile change (rev/bindings/prepare/threshold/tool) or a SCIP-indexer bump — both of which make reports incomparable — starts a fresh series instead of appending to the old one.
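
A minimal sketch of that naming scheme, shaped as Bencher Metric Format (a JSON object keyed by benchmark name, then by measure). Only `corpus_id` and `corpus_profile_hash` are names taken from this thread; `tool_version`, `edges_total`, `edges_resolved`, and the `resolved-pct` measure slug are assumptions for illustration.

```python
import json


def pct(numerator, total):
    # Vacuous 100% for an empty denominator.
    return 100.0 if total == 0 else 100.0 * numerator / total


def series_name(report):
    """Key the series by corpus id, 12 hex chars of the profile hash, and tool version."""
    return (
        f"{report['corpus_id']}"
        f"@{report['corpus_profile_hash'][:12]}"
        f"+{report['tool_version']}/oracle"
    )


def to_bmf(report):
    value = pct(report["edges_resolved"], report["edges_total"])
    return {series_name(report): {"resolved-pct": {"value": value}}}


# Hypothetical report; field names other than corpus_id are assumed.
report = {
    "corpus_id": "rust-semver",
    "corpus_profile_hash": "0123456789abcdef0123456789abcdef",
    "tool_version": "1.2.3",
    "edges_total": 1056,
    "edges_resolved": 936,
}
bmf = to_bmf(report)
print(json.dumps(bmf, indent=2))
```

Because the hash prefix and tool version are part of the key, any change to either yields a different benchmark name and therefore a separate series.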

P2 — Bound kernel make parallelism ✅ tools/oracle-corpora.toml now uses make -j$(nproc) (was make -j, i.e. unlimited jobs). Recomputed the golden linux-kernel profile hash to match.

P2 — Trigger the oracle on workspace dependency changes ✅ Added root Cargo.toml and Cargo.lock to the pull_request + push path filters, so a workspace-dep/lock bump that changes the built binary can't skip the small-tier gate.

P2 — Pin the rust-analyzer download ✅ rust-analyzer is now installed as a rustup component (dtolnay/rust-toolchain@stable with components: rust-analyzer, resolved via rustup which), pinning the SCIP emitter to the stable toolchain (~6-week cadence) instead of releases/latest (weekly).

bash -n, py_compile, YAML parse, cargo +nightly fmt --check, clippy, and the core-oracle suite (incl. the recomputed golden) are all green.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 202cbf0a73

Comment thread tools/oracle-run.sh
echo "oracle-run: oracle report --corpus $CORPUS" >&2
set +e
( cd "$CHECKOUT" && RAG_RAT_COMMIT="$RAG_RAT_COMMIT" \
"$RAG_RAT_BIN" --json oracle report --corpus "$CORPUS" --corpora "$CORPORA" ) > "$REPORT_OUT"

P1: Add the missing oracle report CLI before invoking it

In this tree the oracle CLI only exposes run and status (crates/rag-rat-cli/src/cli.rs), so every small and heavy corpus reaches this line and then exits with an unknown report subcommand after cloning/indexing. The new workflow depends on this runner for all corpora, so the PR gate and Bencher path cannot produce a valid report until the report subcommand is included or this calls an existing command.

Comment on lines +127 to +129
scip-python)
npm install -g "@sourcegraph/scip-python@${SCIP_PYTHON_VERSION}"
scip-python --version ;;

P1: Add scip-python support before running py-requests

For the py-requests small-matrix leg, this branch installs scip-python, but the checked-in oracle registry still only accepts rust-analyzer and scip-clang (OracleTool::ALL/from_db_str), so once the report command exists that leg cannot map the profile's tool = "scip-python" to a runnable backend and will fail the required PR matrix. Either exclude the Python corpus here or land the scip-python backend with the workflow that starts scheduling it.

skakri (Member, Author) commented Jun 15, 2026

The two new P1s ("add the oracle report CLI before invoking it" and "add scip-python support before running py-requests") are both the same PR-stack ordering point, and they're correct: this PR is the top of a 3-PR stack and isn't meant to merge standalone.

Merge order is #175 → #176 → #177: once the first two land on main, I rebase #177 onto main and its tree has both the report subcommand and the scip-python backend, so the small matrix (incl. the py-requests leg) and the heavy/Bencher path resolve. Until then #177's own CI is red purely because its dependencies aren't in this branch — the stacking artifact, not a defect in the runner. I'd rather not duplicate #175/#176 into this PR or temporarily exclude the Python corpus (scaffolding I'd just have to revert).

The earlier P2 round (pin rust-analyzer, trigger on Cargo.toml/Cargo.lock, preserve the venv, bound kernel make -j$(nproc), profile-identity in the Bencher series) is addressed in 202cbf0 — those threads can be resolved.

skakri added a commit that referenced this pull request Jun 15, 2026
… C2 resolution report (C2-CLI) (#175)

* feat(oracle): `oracle report --corpus <id>` — run a corpus + emit its C2 resolution report (C2-CLI)

Adds the CLI surface over the C2 report contract: load a corpus profile from
tools/oracle-corpora.toml, run the oracle (produce a .scip with the corpus's
tool, or consume a pre-built --scip), assemble the typed OracleResolutionReport,
and emit it as JSON/TOON. Then apply the per-corpus health gate — a violated
threshold exits non-zero even when the oracle command itself succeeded.

- The report is printed to stdout unconditionally, before the gate, so a Δ glue
  script can consume it even for a failing run; violations go to stderr.
- Unlike `oracle run`, a missing/unrunnable tool is a hard error here (not the
  exit-0 Blocked UX): this is a measurement runner over a corpus whose tool CI
  is expected to have installed, so a silent skip must not pass green.
- run + resolution_report run under one write lock; .scip production stays
  outside it (the #82 P3 lock-free-production posture).
- rag_rat_commit provenance reads $RAG_RAT_COMMIT (CI's git SHA), falling back
  to the crate version off CI.
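
The "report first, gate second" ordering from the list above can be sketched as follows. This is not the Rust implementation; `emit_and_gate`, the `resolved_ratio` field, and the threshold mapping are hypothetical names chosen for the example.

```python
import io
import json
import sys


def emit_and_gate(report, minimums, out=None, err=None):
    """Write the report to stdout unconditionally, then gate; return the exit code."""
    out = sys.stdout if out is None else out
    err = sys.stderr if err is None else err
    # Emit before checking anything, so a failing run still leaves a report.
    out.write(json.dumps(report) + "\n")
    violations = [
        f"{key}: {report[key]} is below the minimum {minimum}"
        for key, minimum in minimums.items()
        if report[key] < minimum
    ]
    for line in violations:
        err.write(line + "\n")  # violations go to stderr, not into the JSON stream
    return 1 if violations else 0


out, err = io.StringIO(), io.StringIO()
code = emit_and_gate(
    {"corpus_id": "c-cjson", "resolved_ratio": 0.4},
    {"resolved_ratio": 0.8},
    out,
    err,
)
print(code)
```

Keeping violations off stdout means a downstream script can parse the captured output as JSON regardless of the exit code.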

* fix(oracle): address Codex review on the report command (#175)

- Roll back an unhealthy run (P2): a corpus that fails its health gate had
  already committed edge_oracle/monikers/oracle_runs, so it became the
  authoritative latest run and surfaced untrustworthy Compiler verdicts in
  later status/query despite the non-zero exit. `oracle report` now rolls the
  run back (delete verdicts + monikers + the oracle_runs row) atomically inside
  the same write lock when the gate fails — new oracle::rollback_run +
  store::delete_oracle_run + IndexDatabase::rollback_oracle_run. Verified e2e:
  after a failed gate, edge_oracle/oracle_runs/monikers are empty and status
  reports no verdicts.
- Validate the checkout matches the corpus before stamping its profile (P2):
  fail closed unless the active checkout's target bindings (language -> dirs)
  exactly equal the corpus profile's bindings, so `oracle report --corpus X`
  can't stamp X's corpus_profile_hash onto a different population. New
  ensure_checkout_matches_corpus + unit test.

The third comment (honor timeout_minutes in the report run) is already handled
one layer up: timeout_minutes is the corpus wall-clock budget the runner owns
(documented on CorpusHealth), and tools/oracle-run.sh (C3, #177) wraps the
`oracle report` invocation in `timeout ${timeout_minutes}m`.

* fix(oracle): reject custom target filters in oracle report's corpus check (#175)

Codex follow-up: ensure_checkout_matches_corpus compared only language ->
directory set, so a [[target]] with the same language+dirs but custom
include/exclude filters slipped through — the indexer applies those filters, so
the report could stamp the corpus_profile_hash onto a filtered subset/superset.
Now also require each target to carry the simple [target_bindings] form's
default filters (include = ["**/*.<ext>"], empty exclude); any custom filter
fails closed. Extended the unit test with custom-exclude and narrowed-include
cases.
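
A rough Python rendering of that fail-closed comparison, to make the rule concrete. The real check is `ensure_checkout_matches_corpus` in Rust; the data shapes here (a list of target dicts, a language-to-extension map) are assumptions for illustration.

```python
def checkout_matches_corpus(targets, corpus_bindings, default_ext):
    """Fail closed unless targets cover exactly the corpus bindings with default filters."""
    seen = {}
    for target in targets:
        language = target["language"]
        ext = default_ext.get(language)
        if ext is None:
            return False  # unknown language: nothing to compare against
        default_include = [f"**/*.{ext}"]
        if target.get("include", default_include) != default_include:
            return False  # narrowed or widened include
        if target.get("exclude", []):
            return False  # any custom exclude changes the indexed population
        seen.setdefault(language, set()).update(target["dirs"])
    expected = {lang: set(dirs) for lang, dirs in corpus_bindings.items()}
    return seen == expected


bindings = {"rust": ["src"]}
plain = [{"language": "rust", "dirs": ["src"]}]
filtered = [{"language": "rust", "dirs": ["src"], "exclude": ["src/gen/**"]}]
print(checkout_matches_corpus(plain, bindings, {"rust": "rs"}))
print(checkout_matches_corpus(filtered, bindings, {"rust": "rs"}))
```

The second call is rejected even though language and directories match, which is the case the follow-up fix closes.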

* fix(oracle): make oracle report's run provisional, not rollback-after-commit (#175)

Codex round 2 found my post-hoc rollback couldn't restore state: run::run's
authoritative clear destroys the prior (tool,version) verdicts + the tool's
monikers at the run's START, so deleting the failed run afterward left a prior
healthy run with no verdicts/monikers (NoData), and the version-keyed delete
could erase a prior healthy same-version run.

Fix: the report path no longer commits-then-maybe-deletes. run::run is split
into run() (commit-on-success wrapper) + run_in_tx() (the body). New
oracle::run_oracle_report runs run_in_tx + report assembly + the health gate
inside ONE transaction and commits ONLY if healthy; an unhealthy run drops the
transaction, rolling back the whole pass INCLUDING the authoritative clear — so
the previous healthy run's verdicts/monikers/run-row are fully preserved.
Removes rollback_run / delete_oracle_run / rollback_oracle_run / the
finalize_corpus_report helper.

Verified e2e: a healthy run then a gate-failing run on the same checkout leaves
the healthy run's verdicts intact (oracle status still reports them).
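
The commit-only-if-healthy shape can be illustrated with SQLite from Python. This is a sketch of the pattern, not the project's Rust code: the `edge_oracle` table name comes from the commit message, but its columns, the `run_oracle_report` signature, and the ratio-based gate are assumptions.

```python
import sqlite3


def run_oracle_report(conn, tool, verdicts, min_resolved_ratio):
    """Replace the tool's verdicts and keep the result only if the gate passes."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock up front
    try:
        # Authoritative clear: the new run replaces the tool's prior verdicts.
        conn.execute("DELETE FROM edge_oracle WHERE tool = ?", (tool,))
        conn.executemany(
            "INSERT INTO edge_oracle (tool, edge_id, resolved) VALUES (?, ?, ?)",
            [(tool, edge_id, int(ok)) for edge_id, ok in verdicts],
        )
        total = len(verdicts)
        resolved = sum(1 for _, ok in verdicts if ok)
        ratio = 1.0 if total == 0 else resolved / total
        healthy = ratio >= min_resolved_ratio
    except Exception:
        conn.execute("ROLLBACK")
        raise
    # Dropping the transaction also undoes the clear, restoring the prior run.
    conn.execute("COMMIT" if healthy else "ROLLBACK")
    return healthy


conn = sqlite3.connect(":memory:", isolation_level=None)  # manual transactions
conn.execute("CREATE TABLE edge_oracle (tool TEXT, edge_id INTEGER, resolved INTEGER)")

first = run_oracle_report(conn, "rust-analyzer", [(1, True), (2, True), (3, False)], 0.5)
second = run_oracle_report(conn, "rust-analyzer", [(1, False), (2, False), (3, False)], 0.5)
kept = conn.execute("SELECT COUNT(*) FROM edge_oracle WHERE resolved = 1").fetchone()[0]
print(first, second, kept)
```

The second, unhealthy run leaves the two resolved verdicts from the first run in place, because its delete and inserts were never committed.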
skakri added 2 commits June 16, 2026 00:36
Replaces the per-language oracle-rust.yml / oracle-kernel.yml demos with one
declarative, tier-driven runner over tools/oracle-corpora.toml (#164, C3).

- tools/oracle-run.sh: run the oracle for ONE corpus end to end — read its
  profile, shallow-clone the repo at its pinned rev, run its prepare steps,
  index it with rag-rat, then `rag-rat oracle report` (which runs the oracle +
  applies the per-corpus health gate). Exits non-zero on a health violation
  while still writing the report JSON, so a Δ glue script can consume it.
- tools/oracle-corpus.py: stdlib (tomllib) reader the bash runner shells out to
  for the pre-index fields (repo/rev/prepare/bindings) + tier corpus listing.
- tools/oracle-report-bmf.py: report JSON -> Bencher BMF glue for the heavy tier
  (rag-rat emits JSON only; presentation/Bencher shaping is a glue concern).
- .github/workflows/oracle.yml: small tier on PRs + main (GitHub-hosted matrix,
  per-corpus tool install, report artifact, health gate as the PR gate); heavy
  tier on release/dispatch (self-hosted bigmem, serial, pushed to Bencher).
- Deletes the superseded oracle-rust.yml/oracle-kernel.yml workflows and their
  rust-scip-oracle.sh/kernel-c-oracle.sh scripts; updates docs/benchmarks.md.

Verified end to end locally on the small tier: rust-semver (rust-analyzer,
1056 edges resolved 412->936) and c-cjson (cmake compdb + scip-clang, 3941
edges resolved 2742->3408) both run clean through the runner and pass the gate.

Six P2s:
- Whole-run timeout (oracle-run.sh): the corpus wall-clock budget wrapped only
  `oracle report`, so a hung clone/prepare/index sat until the Actions/job
  timeout. The runner now re-execs itself once under `timeout -k 60s
  <budget>m`, covering clone + prepare + index + report; an EXIT trap still
  removes the checkout on a timeout/gate-fail.
- Preserve the virtualenv (oracle-run.sh): activate a prepare-created `.venv`
  (VIRTUAL_ENV + PATH) before index/report so scip-python (pyright) resolves
  against the project's installed deps, not the global interpreter.
- Bencher series identity (oracle-report-bmf.py): the benchmark name was keyed
  only by corpus_id, so a profile/tool-version change (which makes reports
  incomparable) appended to the old series. Now keyed by
  corpus_id@<profile_hash12>+<tool_version> so an incomparable change starts a
  fresh series.
- Bound kernel make (oracle-corpora.toml): `make -j` is unlimited jobs; pinned
  to `make -j$(nproc)` so the heavy run doesn't oversubscribe the box.
  Recomputed the golden linux-kernel profile hash.
- Trigger on dep changes (oracle.yml): added root Cargo.toml/Cargo.lock to the
  PR + push path filters so a workspace-dep bump can't skip the gate.
- Pin rust-analyzer (oracle.yml): install it as a rustup component (pinned to
  the stable toolchain) instead of downloading releases/latest each run.
@skakri skakri force-pushed the feat/oracle-unified-runner branch from 202cbf0 to e9226e3 Compare June 15, 2026 21:38
@skakri skakri merged commit a69f06f into main Jun 15, 2026
15 checks passed
@skakri skakri deleted the feat/oracle-unified-runner branch June 15, 2026 21:41