feat(resolve): Python from-import alias resolution scope (#174) #179

Open
skakri wants to merge 2 commits into main from feat/python-import-alias-scope

skakri commented Jun 15, 2026

Member

Fixes #174. After `from models import User as Account`, a later `Account()` now resolves the alias use to the imported `User`, instead of staying unresolved or binding to an unrelated local `Account`.

Approach

Mirrors the Rust per-module use-scope machinery (#61), but where Rust's is_external_import only suppresses a local same-named symbol, Python needs to rebind the alias to its in-corpus target. Three layers:

  1. Extraction (python_import_target, gated to from-imports): an aliased_import emits its Imports edge to the target (to_name = User — keeps the in-corpus dependency) via file_edge_scoped, carrying the alias in evidence and a whole-file ImportScopeRange (Python imports are module/file-global). The edge is the alias carrier. Module aliases (import x as m) are left alone — they're a qualified-resolution problem, out of scope.
  2. Scope build (ImportScope): a new python_aliases map (alias → target, byte-scoped) with add_python_alias / python_alias_target. The import-edge load branches on files.language, so Python imports register aliases instead of feeding add_use (which parses Rust use text).
  3. Rebind (resolve_symbol callers): before resolution, a non-import reference whose name is an in-scope alias resolves under the target name. The stored to_name stays the surface alias; only to_symbol_id becomes the target's.

Applied in both resolution drivers (incremental resolve_all_edges and the full-rebuild CompactEdge loop) per the #61 both-driver-parity rule — index --full uses the full-rebuild path.
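
A minimal sketch of layers 2 and 3 above, under stated assumptions: `AliasScope` and `resolution_name` are hypothetical stand-ins for `ImportScope` and the rebind step, and the method signatures are illustrative rather than the ones in this PR. It only shows the shape of a byte-scoped alias map and the "resolve under the target name" substitution.

```rust
use std::collections::HashMap;

/// One alias binding: the imported target plus the byte range it covers.
#[derive(Debug, Clone)]
struct AliasBinding {
    target: String,
    scope_start: usize,
    scope_end: usize,
}

/// Hypothetical stand-in for the alias part of `ImportScope`.
#[derive(Default)]
struct AliasScope {
    // (file id, alias name) -> bindings registered for that alias
    aliases: HashMap<(u32, String), Vec<AliasBinding>>,
}

impl AliasScope {
    /// Register `alias -> target` for a file over the half-open range [start, end).
    fn add_python_alias(&mut self, file_id: u32, alias: &str, target: &str, start: usize, end: usize) {
        self.aliases
            .entry((file_id, alias.to_string()))
            .or_default()
            .push(AliasBinding { target: target.to_string(), scope_start: start, scope_end: end });
    }

    /// Target of the first binding of `name` that covers `ref_byte`, if any.
    fn python_alias_target(&self, file_id: u32, name: &str, ref_byte: usize) -> Option<&str> {
        self.aliases
            .get(&(file_id, name.to_string()))?
            .iter()
            .find(|b| b.scope_start <= ref_byte && ref_byte < b.scope_end)
            .map(|b| b.target.as_str())
    }
}

/// Name to resolve under: the alias target when one is in scope, else the surface name.
/// The stored surface name is left untouched by the caller.
fn resolution_name<'a>(scope: &'a AliasScope, file_id: u32, to_name: &'a str, ref_byte: usize) -> &'a str {
    scope.python_alias_target(file_id, to_name, ref_byte).unwrap_or(to_name)
}
```

With `Account -> User` registered for file 1, a reference to `Account` in that file resolves under `User`, while other names and other files pass through unchanged.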

Verification

  • New unit test python_from_import_alias_rebinds_to_the_imported_target (resolve/tests.rs): with a target models.User and a decoy local other.Account, the Account() edge binds to User.
  • Manual e2e on a temp project confirmed the same (resolved to models.py::User, not the decoy).
  • Full rag-rat-core suite green; cargo +nightly fmt --check + clippy clean. The import target itself was already captured, so existing import/resolution tests are unaffected.

Scope

Symbol aliases via from x import T as A only; whole-file scope (a safe over-approximation). Module aliases (import x as m; m.f()) remain a follow-up — sibling of the already-closed #172.

`from m import T as A` then `A()` now resolves the alias use to the imported
`T`, instead of staying unresolved or binding to an unrelated local `A`. Mirrors
the Rust per-module use-scope machinery (#61) but REBINDS the alias to its
in-corpus target rather than only flagging it external.

- Extraction (python_import_target): a `from`-import `aliased_import` emits its
  Imports edge to the target (to_name=T, keeps the dependency) via
  file_edge_scoped, carrying the alias in `evidence` + a whole-file import scope.
  Module aliases (`import x as m`) are left alone (qualified-resolution problem).
- ImportScope gains a python_aliases map (alias -> target, byte-scoped) +
  add_python_alias / python_alias_target; the import-edge load branches on
  language so Python imports register aliases instead of feeding add_use.
- Resolution rebinds: before resolve_symbol, a non-import reference whose name is
  an in-scope alias resolves under the target name. Applied in BOTH the
  incremental and full-rebuild drivers (#61 both-driver parity).

Verified e2e (alias resolves to the import target, not a decoy local) + unit
test in resolve/tests.rs.

codecov Bot commented Jun 15, 2026

Codecov Report

❌ Patch coverage is 91.59664% with 10 lines in your changes missing coverage. Please review.
✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| crates/rag-rat-core/src/index/edges/resolve/mod.rs | 85.00% | 9 Missing ⚠️ |
| crates/rag-rat-core/src/index/edges/extract.rs | 96.55% | 1 Missing ⚠️ |

github-actions Bot commented Jun 15, 2026

SCIP oracle — resolution report

Heuristic→compiler edge resolution per corpus. Δ compares resolved-after to the main baseline (only when the corpus profile + tool version match).

| corpus | tool | edges | resolved (heuristic → compiler) | precision | recall | monikers | Δ vs main |
|---|---|---|---|---|---|---|---|
| c-cjson | scip-clang | 3941 | 69.6% → 86.5% | 88.9% | 80.8% | 912 | +0.0pp |
| py-requests | scip-python | 1060 | 32.9% → 71.7% | 89.5% | 86.8% | 292 | +0.0pp |
| rust-semver | rust-analyzer | 1056 | 39.0% → 88.6% | 79.6% | 100.0% | 78 | +0.0pp |

resolved = Exact/Syntactic + compiler upgrades + resolved-external, over edge candidates with a callee range. precision/recall are the oracle eval metrics.

chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a9ff258aed

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

if candidate.edge_kind == EdgeKind::Imports {
    let evidence = arena.get_opt(candidate.evidence).unwrap_or("");
    import_scope.add_use(*file_id, evidence, candidate.import_scope_range());
    if index.file_language.get(file_id).copied() == Some(Language::Python.as_str()) {

P2 Badge Use file language when building full-rebuild aliases

In the full-rebuild path this checks index.file_language, but that map is populated only from files that contributed at least one symbol. A common Python entrypoint with only from models import User as Account and Account() has no symbol row for its own file, so this branch is skipped under index --full; the alias carrier feeds neither add_python_alias nor add_use, and the call remains unresolved or binds an unrelated Account. The incremental driver joins files.language, so the two paths diverge here.
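
The divergence can be illustrated with a small sketch. The row types and function names here are hypothetical and only model the two ways of deriving a per-file language map: from symbol rows (which drops symbol-less files) versus from the file rows themselves.

```rust
use std::collections::HashMap;

/// Hypothetical row from the `files` table.
struct FileRow {
    id: u32,
    language: &'static str,
}

/// Hypothetical symbol row; only the owning file matters here.
struct SymbolRow {
    file_id: u32,
}

/// Language map built only from files that own at least one symbol (the lossy shape).
fn language_from_symbols(files: &[FileRow], symbols: &[SymbolRow]) -> HashMap<u32, &'static str> {
    let by_id: HashMap<u32, &'static str> = files.iter().map(|f| (f.id, f.language)).collect();
    symbols
        .iter()
        .filter_map(|s| by_id.get(&s.file_id).map(|lang| (s.file_id, *lang)))
        .collect()
}

/// Language map built from every file row, whether or not it defines a symbol.
fn language_from_files(files: &[FileRow]) -> HashMap<u32, &'static str> {
    files.iter().map(|f| (f.id, f.language)).collect()
}
```

An entrypoint file that only imports and calls has no symbol row, so it is missing from the first map and present in the second.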

Comment on lines +852 to +853
let scope =
    ImportScopeRange { scope_start: 0, scope_end: text.len(), mod_id: MOD_FILE_ROOT };

P2 Badge Don't give function-local Python imports file-wide scope

When a from ... import ... as ... appears inside a function or class body, Python binds the alias only in that local/class scope, but this records the alias as covering the entire file. In that scenario, any unrelated top-level or sibling-scope reference with the same name (including references before the import statement) is now rebound to the imported target and can override the correct local symbol. The scope needs to be derived from the containing Python scope/order, or non-module imports should be skipped.

Comment on lines +179 to +180
if let (Some(alias), Some(target)) = (evidence, to_name) {
    import_scope.add_python_alias(file_id, alias, target, scope);

P2 Badge Preserve the imported module when rebinding aliases

This registers alias -> target using only the imported symbol's short name, discarding the from module that selected the target. If app.py contains from models import User as Account and also defines its own User, a later Account() is resolved as bare User; resolve_symbol can then take the same-file fallback and bind to app.py::User instead of models.py::User. The alias binding needs enough module/qualified context to disambiguate the imported target.
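
One way to picture the suggestion is a target that carries its `from` module. This is a sketch only: `Definition`, `QualifiedTarget`, and both resolver functions are hypothetical, and `resolve_short` is a simplified model of a same-file-first fallback, not the real `resolve_symbol`.

```rust
/// A definition known to the index: which module defines which short name.
struct Definition {
    module: &'static str,
    name: &'static str,
}

/// Alias target that keeps the `from` module alongside the short name.
struct QualifiedTarget {
    module: String,
    name: String,
}

/// Resolve by module and name together; returns the position in `defs`.
fn resolve_qualified(defs: &[Definition], target: &QualifiedTarget) -> Option<usize> {
    defs.iter()
        .position(|d| d.module == target.module && d.name == target.name)
}

/// Short-name-only resolution that prefers the importing file (the risky shape).
fn resolve_short(defs: &[Definition], importing_module: &str, name: &str) -> Option<usize> {
    defs.iter()
        .position(|d| d.module == importing_module && d.name == name)
        .or_else(|| defs.iter().position(|d| d.name == name))
}
```

When both `app` and `models` define `User`, the short-name path picks the one in `app`, while the qualified path picks the one in `models`.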

Comment on lines +237 to +239
import_scope
    .python_alias_target(source_file_id, short_name(&to_name), ref_byte)
    .unwrap_or(to_name.as_str())

P2 Badge Stop rebinding aliases after same-module shadowing

This substitution is unconditional for every later edge whose short name matches the alias, so it also overrides Python's normal rebinding in the same module. For example, after from models import User as Account; class Account: ...; Account(), the class definition assigns Account in module scope, but the call is still resolved under User instead of the same-file Account. The alias range needs to end at, or be shadowed by, subsequent same-name bindings rather than applying to all references in the file.

github-actions Bot commented Jun 15, 2026

🐰 Bencher Report

Branch: feat/python-import-alias-scope
Testbed: ubuntu-latest

⚠️ WARNING: No Threshold found!

Without a Threshold, no Alerts will ever be generated.


No threshold is set for any of the measures below.

| Benchmark | Estimated Cycles (cycles x 1e6) | Instructions (instructions x 1e6) | L1 Hits (hits x 1e6) | LL Hits (hits x 1e6) | RAM Hits (hits x 1e3) | Total read+write (reads/writes x 1e6) |
|---|---|---|---|---|---|---|
| rag_pipeline::pipeline::index cargo_resolver:resolver_config() | 1,213.57 | 801.18 | 1,116.54 | 17.26 | 307.07 | 1,134.10 |
| rag_pipeline::pipeline::query_cold cargo_resolver:resolver_built_config() | 241.18 | 154.70 | 231.05 | 1.95 | 11.21 | 233.01 |
| rag_pipeline::pipeline::query_warm cargo_resolver:resolver_index() | 233.00 | 149.19 | 223.28 | 1.85 | 13.32 | 225.14 |
🐰 View full continuous benchmarking report in Bencher

github-actions Bot commented Jun 15, 2026

🐰 Bencher Report

Branch: feat/python-import-alias-scope
Testbed: ubuntu-latest

| Benchmark | Latency (s) | Result Δ% | Baseline (s) | Upper Boundary (s) | Limit % |
|---|---|---|---|---|---|
| index_time/full_rebuild_cargo | 4.86 | +29.74% | 3.75 | 5.79 | 84.03% |
🐰 View full continuous benchmarking report in Bencher

Four Codex P2s on the alias rebind, all from the blunt whole-file/short-name
approach:

- Use the DB file language, not index.file_language, when building aliases in the
  full-rebuild driver: index.file_language is populated only from files that
  contributed a symbol, so a symbol-less entrypoint (only `from m import X as A;
  A()`) was skipped under `index --full` and the alias never registered. Both
  drivers now key on the authoritative `files.language`.
- Defer to a same-file definition: rebind `alias -> target` only when the file
  defines NEITHER the alias name NOR the target's bare name locally. A local
  `class Account` now shadows the alias; a local `User` isn't grabbed by the bare
  rewrite. New SymbolIndex::file_defines + python_alias_rebind.
- Register only TOP-LEVEL from-imports: a `from … import … as …` nested in a
  function/class binds the alias locally, so giving it whole-file scope would
  rebind unrelated same-name refs. Extraction now records the alias only when the
  import's parent is the module.

Tests: same-file shadow defers to the local class; e2e confirms a symbol-less
entrypoint resolves the alias under --full and a local class shadows it. Full
core suite (483) green.

skakri commented Jun 15, 2026

Member Author

All four addressed in 447e5e2.

P2 — Use file language in the full-rebuild path ✅ The full-rebuild driver now builds a file_language map from the DB (SELECT id, language FROM files) instead of index.file_language, which is populated only from files that contributed a symbol. A symbol-less entrypoint (from m import X as A; A() with no def/class of its own) was skipped under index --full; it now registers and resolves. Verified e2e (a run.py that only imports + calls resolves Account → models.User under --full). The incremental driver already joined files.language, so the two are back in lockstep.

P2 — Stop rebinding after same-module shadowing & P2 — Preserve the imported module ✅ Both handled by deferring to a same-file definition: the rebind now fires only when the file defines neither the alias name nor the target's bare name locally (new SymbolIndex::file_defines + python_alias_rebind). So from m import User as Account; class Account: ...; Account() binds to the local class Account, and a file that also defines its own User won't have Account rewritten into the same-file User (it stays on normal resolution). Conservative — when either name is local, skip the rebind (no edge beats a wrong one). New unit test covers the shadow case + e2e confirms it.
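
The conservative guard described above can be sketched as follows. `LocalDefs` and `alias_rebind` are hypothetical stand-ins for `SymbolIndex::file_defines` and `python_alias_rebind`; the real signatures may differ.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for the per-file definition lookup.
struct LocalDefs {
    // (file id, bare name) pairs defined in that file
    defined: HashSet<(u32, String)>,
}

impl LocalDefs {
    fn file_defines(&self, file_id: u32, name: &str) -> bool {
        self.defined.contains(&(file_id, name.to_string()))
    }
}

/// Return the target to resolve under, or `None` to stay on normal resolution.
fn alias_rebind<'a>(defs: &LocalDefs, file_id: u32, alias: &str, target: &'a str) -> Option<&'a str> {
    if defs.file_defines(file_id, alias) || defs.file_defines(file_id, target) {
        // Either name is local to this file: skip the rebind rather than guess.
        return None;
    }
    Some(target)
}
```

A file that defines its own `Account` or its own `User` gets `None`; a file that defines neither gets the target name.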

P2 — Don't give function-local imports file-wide scope ✅ Extraction now records the alias only for top-level from … import … as … (the import statement's parent is the module node). An import nested in a function/class body emits its plain target edge (dependency still captured) but no file-wide alias, so unrelated same-name references aren't rebound.

Full rag-rat-core suite (483) green; fmt + clippy clean.

chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 447e5e2115

Comment on lines +858 to +859
let scope =
    ImportScopeRange { scope_start: 0, scope_end: text.len(), mod_id: MOD_FILE_ROOT };

P2 Badge Start Python alias scopes at the import statement

Even for a top-level from ... import ... as ..., the alias is not bound before that statement executes, but this scope starts at byte 0. In a file with Account() before from models import User as Account, the earlier call is treated as covered by the alias and can be rebound to User, corrupting edges for pre-import references; the alias scope should begin after the import statement rather than covering the whole file from the start.
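
A sketch of the suggested range, assuming the extractor knows the end byte of the import statement. `AliasRange` is a hypothetical type, not the PR's `ImportScopeRange`.

```rust
/// Half-open byte range in which an alias is considered bound.
#[derive(Debug, PartialEq)]
struct AliasRange {
    start: usize,
    end: usize,
}

impl AliasRange {
    /// Scope runs from the end of the import statement to the end of the file.
    fn after_import(import_end_byte: usize, file_len: usize) -> Self {
        AliasRange { start: import_end_byte.min(file_len), end: file_len }
    }

    /// True when `ref_byte` falls inside the range.
    fn covers(&self, ref_byte: usize) -> bool {
        self.start <= ref_byte && ref_byte < self.end
    }
}
```

With this range, a call that appears before the import is not covered, while a call after it is.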

Comment on lines +36 to +37
if index.file_defines(file_id, name) || index.file_defines(file_id, short_name(target)) {
    return None;

P2 Badge Respect import order when checking alias shadowing

This suppresses the alias rebind whenever the file defines the alias name anywhere, regardless of whether that definition occurs before the import. For class Account: ...; from models import User as Account; Account(), Python's import reassigns Account, but this check skips the rebind and normal resolution prefers the same-file class Account, producing the opposite edge; the shadowing check needs to consider binding order/range, not just file-wide existence.
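
An order-aware check could treat imports and local definitions as binding events for one name and pick the last event at or before the reference. The sketch below is a simplified model: `Binding` and `binding_at` are hypothetical, and it assumes straight-line module code with no control flow.

```rust
/// What bound the name at a given byte offset.
#[derive(Debug, Clone, PartialEq)]
enum Binding {
    /// `from <module> import <target> as <name>`
    ImportAlias { target: String },
    /// `class <name>`, `def <name>`, or an assignment in the same module
    LocalDef,
}

/// The binding in effect at `ref_byte`: the last one that starts at or before it.
/// `events` holds (start byte, binding) pairs for a single name in one module scope.
fn binding_at(events: &[(usize, Binding)], ref_byte: usize) -> Option<&Binding> {
    events
        .iter()
        .filter(|(start, _)| *start <= ref_byte)
        .max_by_key(|(start, _)| *start)
        .map(|(_, binding)| binding)
}
```

For `class Account` followed by the import, a later reference sees the import alias; for the import followed by `class Account`, it sees the local definition.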

Comment on lines +258 to +263
python_alias_rebind(
    &import_scope,
    &index,
    source_file_id,
    short_name(&to_name),
    ref_byte,

P2 Badge Do not rebind qualified Python attribute references

This rebinds solely from the leaf to_name, so a qualified annotation such as other.Account is also treated as the local alias Account after from models import User as Account. Because Python type-reference extraction stores only the dotted tail in to_name, that scenario now binds other.Account to User instead of the Account symbol; the alias rewrite should be limited to unqualified references, e.g. by checking the original evidence/qualified context.
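
A sketch of the suggested restriction, assuming the full dotted reference text is still available at rebind time (for example via the edge's evidence; that availability is an assumption). The `short_name` below is an illustrative reimplementation, not the helper from the PR.

```rust
/// True when the reference text is a bare name with no dotted qualifier.
fn is_unqualified(reference_text: &str) -> bool {
    !reference_text.contains('.')
}

/// Leaf of a dotted path: `other.Account` -> `Account`.
fn short_name(name: &str) -> &str {
    name.rsplit('.').next().unwrap_or(name)
}

/// Apply the alias only to bare references; qualified ones keep their own leaf name.
fn rebind_if_unqualified<'a>(reference_text: &'a str, alias: &str, target: &'a str) -> &'a str {
    if is_unqualified(reference_text) && reference_text == alias {
        target
    } else {
        short_name(reference_text)
    }
}
```

A bare `Account` is rewritten to `User`, while `other.Account` still resolves under `Account`.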

// (#174 review), so recording it with whole-file scope would rebind
// unrelated same-name references; for a nested import emit the plain target
// edge (dependency captured) without the alias.
let record_alias = node.parent().map(|p| p.kind()) == Some("module");

P2 Badge Recognize module-scope imports inside top-level blocks

This only records aliases whose direct parent is module, but top-level if/try blocks still bind imports in the module namespace. Common patterns like try: from ujson import loads as json_loads; except ImportError: ...; json_loads(...) have the import node under a block, so the alias carrier is downgraded to a plain import and the later call is left unresolved even though the binding is module-scoped.
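
Instead of checking only the direct parent, the extractor could walk the ancestor chain until it meets either the module or a scope-introducing node. The sketch below models that walk over a plain list of node kinds. The kind strings are assumed to match the tree-sitter Python grammar, and the function name is hypothetical. It does not address whether a conditional import actually executes.

```rust
/// Decide whether an import binds in the module namespace, given the kinds of its
/// ancestor nodes ordered innermost first, e.g. ["block", "try_statement", "module"].
fn binds_at_module_scope(ancestor_kinds: &[&str]) -> bool {
    for kind in ancestor_kinds {
        match *kind {
            "module" => return true,
            // These introduce a new scope, so the alias is local to them.
            "function_definition" | "class_definition" | "lambda" => return false,
            // Control-flow wrappers (if/try/with/for/while and their blocks) add no scope.
            _ => continue,
        }
    }
    // No module ancestor was found: treat the import as not module-scoped.
    false
}
```

A `try:` wrapped import at the top level is accepted, while one inside a function or class body is rejected.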

Comment on lines +495 to +497
    .iter()
    .find(|alias| alias.covers(ref_byte))
    .map(|alias| alias.target.as_str())

P2 Badge Pick the latest covering Python alias binding

When the same alias is imported more than once, every top-level alias is stored with a file-wide range and this returns the first covering binding. In from m1 import User as Account; from m2 import Customer as Account; Account(), the later import reassigns Account, but resolution still uses the first User binding; overlapping alias bindings need order-aware ranges or a latest-start/last-binding winner.
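
A latest-start winner can be sketched by swapping `find` for a filtered `max_by_key`. `Alias` here is a hypothetical struct with the same `covers` shape as the snippet above, and the sketch assumes each binding's `start` is the position of its import.

```rust
/// Hypothetical alias binding over the half-open byte range [start, end).
struct Alias {
    target: String,
    start: usize,
    end: usize,
}

impl Alias {
    fn covers(&self, ref_byte: usize) -> bool {
        self.start <= ref_byte && ref_byte < self.end
    }
}

/// Among bindings covering `ref_byte`, prefer the one that starts last.
fn latest_covering(aliases: &[Alias], ref_byte: usize) -> Option<&str> {
    aliases
        .iter()
        .filter(|a| a.covers(ref_byte))
        .max_by_key(|a| a.start)
        .map(|a| a.target.as_str())
}
```

`Iterator::max_by_key` returns the last element among equal maxima, so even if every binding shares the same start, the most recently registered one wins, provided bindings are registered in source order.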

Development

Successfully merging this pull request may close these issues.

  • Python import-alias resolution scope (alias → imported symbol)
  • Python Implements edges: resolver prefers trait/interface, not class

1 participant