feat(oracle): scip-python backend — Python compiler-grade resolution (B6)#176

Merged: skakri merged 3 commits into main from feat/scip-python-oracle on Jun 15, 2026

Conversation

@skakri

@skakri skakri commented Jun 15, 2026

Member

Part of the multi-language SCIP-oracle runner epic (#164). Adds the B6 scip-python backend so the SCIP oracle resolves Python edges to the Compiler tier — unblocking oracle report --corpus py-requests (the Python corpus was already declared in tools/oracle-corpora.toml from #171, and #175 added the report command; this makes its tool runnable).

What

OracleTool::ScipPython added to the oracle tool registry:

  • Manifest entry: program scip-python, language python, install hint.
  • Capability probe: scip-python index --help exits 0 (the analog of rust-analyzer's scip --help check); --version reports the tool version.
  • Invocation: scip-python index --project-name <root-basename> --cwd <root> --output <abs>. The project name gives in-corpus symbols a non-empty moniker package (what lets count_symbols_with_moniker see them); --cwd is where scip-python resolves the project + its installed deps.
  • No compile_commands.json-style prerequisite: installing deps is the corpus prepare step's job, and a failed environment shows up as a near-zero moniker count the report health gate already catches — so prerequisite_blocked is None.
  • CLI --tool scip-python + exhaustive enum coverage; argv unit test.

The SCIP occurrence→edge join is language-agnostic (keyed on document paths + positions), so no join changes were needed.
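
As a rough illustration of the invocation bullet above, the argument list could be assembled along these lines. This is a minimal sketch, not the PR's code: `scip_python_index_args` and the `"project"` fallback name are hypothetical, and only the flags listed in the description are used.

```rust
use std::ffi::OsString;
use std::path::Path;

/// Builds the arguments for `scip-python index` as described in the PR text.
/// Hypothetical helper for illustration; the real manifest code may differ.
fn scip_python_index_args(root: &Path, output: &Path) -> Vec<OsString> {
    // The project name is the basename of the checkout root. The "project"
    // fallback (for a root with no final component, such as `/`) is an
    // assumption made for this sketch.
    let project_name = root
        .file_name()
        .map(|name| name.to_os_string())
        .unwrap_or_else(|| OsString::from("project"));

    vec![
        OsString::from("index"),
        OsString::from("--project-name"),
        project_name,
        // `--cwd` is where scip-python resolves the project and its deps.
        OsString::from("--cwd"),
        root.as_os_str().to_os_string(),
        // The output path is expected to be absolute.
        OsString::from("--output"),
        output.as_os_str().to_os_string(),
    ]
}
```

The resulting vector can be handed to `std::process::Command::new("scip-python").args(...)`; keeping the argv in a pure function is what makes an argv unit test cheap to write.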

Operational notes (documented in docs/oracle.md)

  • scip-python derives --project-version from the git revision, so it expects a git checkout — real corpora are cloned at a pinned rev, so this is fine; a non-git dir makes it crash on an undefined version.
  • Imports resolve against installed dependencies (a venv), so cross-package edges need the deps importable.

Verification

  • New manifest argv test scip_python_indexes_a_cwd_with_a_project_name; the existing exhaustive every_tool_has_a_manifest_entry now covers the third variant.
  • Live probe: oracle status --tool scip-python → available 0.6.6.
  • End-to-end on a small git-initialized Python project (3 files, 8 edges): 5 edges examined → 4 confirmed, 1 resolved-external, 4 monikers written, verdicts persisted.
  • cargo +nightly fmt --check, cargo clippy --all-targets, and the oracle core + cli suites are green.

Stacking

Independent of #175 (C2-CLI) at the code level — both branch off main and touch disjoint code. Either can merge first; once both land, oracle report --corpus py-requests is fully runnable.

…(B6)

Adds OracleTool::ScipPython to the oracle tool registry so the SCIP oracle
resolves Python edges (#164 B6, unblocks `oracle report --corpus py-requests`).

- Manifest entry: program `scip-python`, language `python`, install hint.
- Capability probe: `scip-python index --help` exits 0 (analog of rust-analyzer's
  `scip --help`); `--version` reports the tool version.
- Invocation: `scip-python index --project-name <root-basename> --cwd <root>
  --output <abs>`. The project name gives in-corpus symbols a non-empty moniker
  package; `--cwd` is where it resolves the project + installed deps.
- No compile_commands.json-style prerequisite: the deps install is the corpus
  `prepare` step's job, and a failed environment shows up as a near-zero moniker
  count the report health gate already catches.
- CLI `--tool scip-python` + exhaustive enum coverage; argv unit test.

The SCIP occurrence->edge join is language-agnostic, so no join changes were
needed. Verified end-to-end on a small Python project: 5 edges examined, 4
confirmed, 1 resolved-external, 4 monikers written.
@codecov

codecov Bot commented Jun 15, 2026

Codecov Report

❌ Patch coverage is 88.52459% with 14 lines in your changes missing coverage. Please review.
✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| crates/rag-rat-cli/src/main.rs | 0.00% | 6 Missing ⚠️ |
| crates/rag-rat-core/src/index/oracle/manifest.rs | 88.63% | 5 Missing ⚠️ |
| crates/rag-rat-core/src/index/query_api/graph.rs | 96.36% | 2 Missing ⚠️ |
| crates/rag-rat-cli/src/cli.rs | 0.00% | 1 Missing ⚠️ |

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ffc151677a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

/// `scip-python index` — Python (#164 B6). Resolves imports against the project's INSTALLED
/// dependencies, so the corpus must install them (a venv) first; an unresolved environment
/// shows up as a near-zero moniker count the report's health gate catches.
ScipPython,

P2 Badge Route scip-python through graph surfacing

When users run rag-rat oracle run --tool scip-python on a Python repo, the verdicts are persisted under the new scip-python tool, but the graph read paths still only query OracleTool::RustAnalyzer in enrich_hops_with_oracle and compare_graph_to_scip (crates/rag-rat-core/src/index/query_api/graph.rs). That means the new backend can complete and show up in status while trace_callees/find_callers/SCIP comparisons never see its rows, so Python edges do not get the advertised compiler tier. Please route surfacing/comparison through the tool registry or select the oracle tool for the edge language before exposing this variant.

Comment on lines +183 to +185
cmd.arg("index")
.arg("--project-name")
.arg(project_name)

P2 Badge Pin scip-python project version

This invocation never passes --project-version, so scip-python falls back to the checkout's git revision (the new docs note this behavior). Because rag-rat stores the full SCIP symbol string in logical_symbol_monikers and later resolves memories by exact moniker for the same tool, every commit changes all Python monikers even when symbols did not move, breaking the advertised stable memory relocation for scip-python runs. Pass a stable project version (for example the conventional _) when invoking the indexer.

/// Every known oracle tool, for "report on all tools" surfaces (`oracle status` with no
/// `--tool`). Later language backends (#72 Kotlin) extend this alongside the enum.
- pub const ALL: &[OracleTool] = &[Self::RustAnalyzer, Self::ScipClang];
+ pub const ALL: &[OracleTool] = &[Self::RustAnalyzer, Self::ScipClang, Self::ScipPython];

P2 Badge Filter auto-run tools by configured language

Adding ScipPython to OracleTool::ALL also feeds the MCP background auto-run loop, which iterates OracleTool::ALL without checking config.targets. In an auto-run repo where scip-python is installed but Python is not indexed, the server will still invoke the Python indexer against the checkout; if it fails, the error is swallowed and no oracle_runs row is recorded, so the loop retries on every poll. Keep the status registry broad, but gate background runs to tools whose languages are present in the config.

@github-actions

github-actions Bot commented Jun 15, 2026

🐰 Bencher Report

Branch: feat/scip-python-oracle
Testbed: ubuntu-latest

⚠️ WARNING: No Threshold found!

Without a Threshold, no Alerts will ever be generated.
For more information, see the Threshold documentation.
To only post results if a Threshold exists, set the --ci-only-thresholds flag.

🚨 2 Alerts

| Benchmark | Measure (units) | Benchmark Result (Result Δ%) | Baseline | Upper Boundary (Limit %) |
|---|---|---|---|---|
| rag_pipeline::pipeline::index cargo_resolver:resolver_config() | Instructions (instructions x 1e6) | 800.96 x 1e6 (+40.05%) | 571.89 x 1e6 | 583.33 x 1e6 (137.31%) |
| rag_pipeline::pipeline::query_cold cargo_resolver:resolver_built_config() | Instructions (instructions x 1e6) | 154.52 x 1e6 (+15.92%) | 133.30 x 1e6 | 135.96 x 1e6 (113.65%) |

All benchmark results

| Benchmark | Estimated Cycles (cycles x 1e6) | Instructions (instructions x 1e6) (Result Δ%) | Instructions Baseline (x 1e6) | Upper Boundary (instructions x 1e6) (Limit %) | L1 Hits (hits x 1e6) | LL Hits (hits x 1e6) | RAM Hits (hits x 1e3) | Total read+write (reads/writes x 1e6) |
|---|---|---|---|---|---|---|---|---|
| rag_pipeline::pipeline::index cargo_resolver:resolver_config() | 1,214.51 | 🚨 800.96 (+40.05%) | 571.89 | 583.33 (137.31%) | 1,116.31 | 17.47 | 309.53 | 1,134.09 |
| rag_pipeline::pipeline::query_cold cargo_resolver:resolver_built_config() | 240.97 | 🚨 154.52 (+15.92%) | 133.30 | 135.96 (113.65%) | 230.81 | 1.95 | 11.74 | 232.77 |
| rag_pipeline::pipeline::query_warm cargo_resolver:resolver_index() | 232.96 | 149.14 (-9.07%) | 164.02 | 167.30 (89.14%) | 223.15 | 1.87 | 13.20 | 225.03 |

Only the Instructions measure has a threshold; Estimated Cycles, L1 Hits, LL Hits, RAM Hits, and Total read+write are reported with ⚠️ NO THRESHOLD.

@github-actions

github-actions Bot commented Jun 15, 2026

🐰 Bencher Report

Branch: feat/scip-python-oracle
Testbed: ubuntu-latest

All benchmark results

| Benchmark | Latency, seconds (s) (Result Δ%) | Baseline | Upper Boundary, seconds (s) (Limit %) |
|---|---|---|---|
| index_time/full_rebuild_cargo | 4.84 s (+30.31%) | 3.72 s | 5.77 s (84.00%) |

- Route surfacing through ALL backends, not just rust-analyzer (P2): graph read
  paths hardcoded OracleTool::RustAnalyzer, so scip-python (and scip-clang)
  verdicts never surfaced as the Compiler tier in find_callers/trace_callees or
  compare_graph_to_scip. New oracle::latest_runs_in_scope enumerates every tool
  with a run in the checkout; enrich_hops_with_oracle merges their verdicts and
  compare_graph_to_scip aggregates across them (verdict sets are disjoint by edge
  language). Regression test: a scip-clang verdict now surfaces + the report
  names the contributing tool.
- Pin scip-python --project-version _ (P2): it otherwise defaults to the git
  revision, which is embedded in every SCIP symbol, so every commit churned all
  Python monikers and broke moniker-anchored memory relocation. The constant
  keeps monikers stable across commits (and fixes the crash on a non-git
  checkout, verified e2e).
- Gate background auto-run by configured language (P2): the MCP auto-run loop
  iterated OracleTool::ALL without checking config.targets, so scip-python
  installed + Python not indexed would invoke the Python indexer every poll,
  fail, swallow the error, record no run row, and retry forever. Now skips a
  backend whose language the checkout doesn't index (status registry stays
  broad).
@skakri

skakri commented Jun 15, 2026

Member Author

Addressed in 84cc401.

P2 — Route scip-python through graph surfacing ✅ Fixed (and it covered the pre-existing scip-clang gap too). enrich_hops_with_oracle and compare_graph_to_scip no longer hardcode OracleTool::RustAnalyzer — new oracle::latest_runs_in_scope enumerates every backend with a run in the checkout, enrich_hops_with_oracle merges their verdicts, and compare_graph_to_scip aggregates across them (verdict sets are disjoint by edge language, so merging can't collide; the report's tool/tool_version now name all contributing backends). Added a regression test (compare_graph_to_scip_surfaces_non_rust_analyzer_tools) asserting a scip-clang verdict surfaces and the report names the tool.
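
To make the merge step concrete, here is a minimal sketch of combining verdicts from several backends. It is not the code from 84cc401: the `Tool`, `Verdict`, and `Run` types and the `merge_verdicts` function are hypothetical stand-ins, and edges are identified by a plain integer for brevity.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the oracle tool enum.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Tool {
    RustAnalyzer,
    ScipClang,
    ScipPython,
}

/// Hypothetical verdict kinds, named after the counts quoted in the PR.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Verdict {
    Confirmed,
    ResolvedExternal,
}

/// The latest run of one backend: the tool plus its verdicts by edge id.
struct Run {
    tool: Tool,
    verdicts: Vec<(u64, Verdict)>,
}

/// Merges the verdicts of every run, remembering the contributing tool.
fn merge_verdicts(runs: &[Run]) -> BTreeMap<u64, (Tool, Verdict)> {
    let mut merged = BTreeMap::new();
    for run in runs {
        for &(edge, verdict) in &run.verdicts {
            // Verdict sets are expected to be disjoint by edge language, so
            // a collision should not occur; if one does, the first entry is
            // kept rather than silently overwritten.
            merged.entry(edge).or_insert((run.tool, verdict));
        }
    }
    merged
}
```

Carrying the tool alongside each verdict is what would let a report name every contributing backend instead of a single hardcoded one.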

P2 — Pin scip-python project version ✅ Fixed. scip_command now passes --project-version _, so the SCIP symbol strings (and the monikers persisted from them) stay stable across commits, preserving moniker-anchored memory relocation. Bonus: this also fixes scip-python's crash on a non-git checkout (the git-rev default was undefined there) — verified end-to-end on a non-git Python project (4 edges examined, 3 confirmed, 3 monikers, exit 0). Manifest argv test + docs updated.
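
The effect of the pin can be illustrated with a simplified symbol formatter. This is only a sketch: it follows the general SCIP symbol layout (scheme, package manager, package name, package version, descriptors), while the `symbol` helper, the package name, the revisions, and the descriptor string are all made up for the example.

```rust
/// Formats a simplified SCIP-style symbol string:
/// `<scheme> <manager> <package> <version> <descriptors>`.
/// Hypothetical helper; real scip-python descriptors are more involved.
fn symbol(package: &str, version: &str, descriptors: &str) -> String {
    format!("scip-python python {} {} {}", package, version, descriptors)
}

/// Returns true when two symbol strings would match as exact monikers.
fn same_moniker(a: &str, b: &str) -> bool {
    a == b
}
```

With a git revision in the version slot, `symbol("demo", "1a2b3c", ...)` and `symbol("demo", "4d5e6f", ...)` differ even though the descriptor is unchanged, so an exact-moniker lookup fails after every commit. With the constant `_` in that slot, both commits produce the same string.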

P2 — Filter auto-run tools by configured language ✅ Fixed. The MCP background auto-run loop now skips any backend whose languages aren't in config.targets before probing/deciding — so scip-python installed without a Python target no longer spawns a doomed indexer (and silently retries) every poll. The status registry (oracle status, which lists all tools) stays broad; only background runs are gated.
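
A minimal sketch of that gate, under stated assumptions: the `Tool` enum, its `language` method, and `auto_run_candidates` are hypothetical, each backend is mapped to a single language for simplicity, and only the `python` mapping for scip-python is taken from the PR text.

```rust
/// Hypothetical stand-in for the oracle tool enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Tool {
    RustAnalyzer,
    ScipClang,
    ScipPython,
}

impl Tool {
    /// The broad registry, as a status listing would enumerate it.
    const ALL: &'static [Tool] = &[Tool::RustAnalyzer, Tool::ScipClang, Tool::ScipPython];

    /// Language each backend indexes. The `rust` and `c` labels are
    /// assumptions for this sketch.
    fn language(self) -> &'static str {
        match self {
            Tool::RustAnalyzer => "rust",
            Tool::ScipClang => "c",
            Tool::ScipPython => "python",
        }
    }
}

/// Keeps only the backends whose language appears in the configured targets.
fn auto_run_candidates(targets: &[&str]) -> Vec<Tool> {
    Tool::ALL
        .iter()
        .copied()
        .filter(|tool| targets.contains(&tool.language()))
        .collect()
}
```

The status listing would still iterate `Tool::ALL` directly; only the background loop would go through the filtered list.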

cargo +nightly fmt --check, clippy --all-targets, and the cli + core-oracle suites (incl. the new surfacing test) are green.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

if summary.verdicts_examined == 0 {

P2 Badge Preserve empty-run warnings per backend

In a mixed-language checkout, this aggregate check only warns when every backend has zero current verdicts. If, for example, rust-analyzer has valid verdicts but the new scip-python run produced 0 in-scope rows because the Python indexer saw the wrong paths or all Python files drifted, summary.verdicts_examined is still nonzero and compare_graph_to_scip silently reports no issue for the empty Python backend. Track the per-tool comparison count and warn for each run that contributes 0 rows so one healthy backend does not mask a broken one.
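
One way to read this suggestion is sketched below. It is an illustration only: `empty_run_warnings`, the `(tool name, rows examined)` input shape, and the warning wording are all hypothetical.

```rust
/// Returns one warning per backend whose run contributed zero compared rows,
/// so a healthy backend cannot mask an empty one.
/// Input: (tool name, number of verdicts examined for that tool).
fn empty_run_warnings(per_tool_examined: &[(&str, usize)]) -> Vec<String> {
    per_tool_examined
        .iter()
        .filter(|(_, examined)| *examined == 0)
        .map(|(tool, _)| format!("{}: 0 verdicts examined", tool))
        .collect()
}
```

Compared with a single aggregate `verdicts_examined == 0` check, the per-tool list still flags scip-python when rust-analyzer alone accounts for every examined row.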

/// `scip-python index` — Python (#164 B6). Resolves imports against the project's INSTALLED
/// dependencies, so the corpus must install them (a venv) first; an unresolved environment
/// shows up as a near-zero moniker count the report's health gate catches.
ScipPython,

P2 Badge Count Python constructor references in recall

Enabling ScipPython sends Python runs through the existing recall-gap code, but count_uncovered_calls only considers SCIP symbols that scip::symbol_is_callable accepts, that is, symbols ending in `).`. scip-python represents class constructor call references such as ClientSession() as references to the class/type symbol (...ClientSession#), while our Python extractor explicitly emits these as CallsName, so constructor calls that the heuristic misses never increment oracle_only_calls; this inflates Python recall and can make the new py-requests oracle report look healthier than it is. Please make the callable filter language/tool-aware before treating scip-python reports as compiler-grade metrics.

…176)

Codex follow-up: the recall numerator (covered_calls) counted a calls_name edge
matched to ANY SCIP occurrence, while the denominator's uncovered side requires
scip::symbol_is_callable (a `).` method/function symbol). scip-python records a
constructor call (`Foo()`, which our extractor emits as CallsName) as a
reference to the CLASS symbol (`Foo#`), so a covered constructor counted while a
MISSED one — invisible to the callable-filtered uncovered side — never offset
it, inflating Python recall.

Fix: the covered side now also requires the matched symbol be callable, so both
sides measure the same population (`).` method/function calls). Constructors are
out of recall entirely because SCIP can't distinguish a constructor call from a
type annotation of the same `Foo#` symbol. Rust/C are unaffected (their
calls_name edges already match `).` symbols). Regression test added.
@chatgpt-codex-connector

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.
To continue using code reviews, you can upgrade your account or add credits to your account and enable them for code reviews in your settings.

@skakri

skakri commented Jun 15, 2026

Member Author

P2 — Count Python constructor references in recall ✅ Addressed in 40d12d2 — by making recall's two sides use the same callable population, which fixes the inflation you identified.

Root cause confirmed: covered_calls counted a calls_name edge matched to any SCIP occurrence, while count_uncovered_calls (the denominator) requires scip::symbol_is_callable (a symbol ending in `).`). scip-python records a constructor call (Foo(), which our extractor emits as CallsName) as a reference to the class symbol (Foo#), so a covered constructor counted while a missed one, invisible to the callable-filtered uncovered side, never offset it, which inflated recall.

The covered side now also requires symbol_is_callable, so both sides measure the same population (`).` method/function calls). I went with exclude-on-both-sides rather than including constructors because accurate inclusion isn't possible from SCIP alone: a Foo# occurrence is identical whether it's a constructor call (Foo()) or a type annotation (x: Foo), so counting `#` symbols as callable on the uncovered side would over-count type refs as missed calls and deflate recall. Confining recall to `).` callables is the well-defined, symmetric metric; constructor-call recall would need a call role that SCIP doesn't carry. Rust/C are unaffected (their calls_name edges already match `).` symbols). Regression test covered_side_requires_a_callable_scip_symbol added.
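
The symmetric metric can be sketched as follows. This is an illustration, not the code from 40d12d2: the `recall` function and its slice-of-symbols inputs are hypothetical, and the formula covered / (covered + uncovered) is an assumption based on the numerator and denominator described above. Only the `).` suffix rule comes from the discussion.

```rust
/// A SCIP symbol counts as callable when it ends in `).`,
/// i.e. a method or function descriptor.
fn symbol_is_callable(symbol: &str) -> bool {
    symbol.ends_with(").")
}

/// Recall over callable symbols only. Both sides apply the same filter, so
/// class symbols such as `Foo#` are excluded from numerator and denominator.
/// Returns `None` when no callable symbol is present on either side.
fn recall(covered: &[&str], uncovered: &[&str]) -> Option<f64> {
    let covered_calls = covered.iter().filter(|s| symbol_is_callable(s)).count();
    let uncovered_calls = uncovered.iter().filter(|s| symbol_is_callable(s)).count();
    let total = covered_calls + uncovered_calls;
    if total == 0 {
        None
    } else {
        Some(covered_calls as f64 / total as f64)
    }
}
```

With one covered function call, one covered constructor reference, and one missed function call, the constructor is dropped and recall is 1 of 2. Counting the constructor on the covered side only would have reported 2 of 3.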

@skakri skakri merged commit 321c7cb into main Jun 15, 2026
10 checks passed
@skakri skakri deleted the feat/scip-python-oracle branch June 15, 2026 21:36