fix(init): Python root-entrypoint + env-only directory selection (#173) #181

Open: skakri wants to merge 1 commit into main from fix/python-init-dir-selection



skakri (Member) commented Jun 15, 2026

Fixes #173. Two init-scanner UX refinements for Python directory selection. The safety-critical part (the walker flooring site-packages/.venv/venv/__pycache__) already shipped in #167 — these are about what init -y writes.

1. Root entrypoints alongside a package dir

A repo with root manage.py/main.py and a package dir wrote only the package binding, omitting the root files: . (the aggregate bucket every file increments) was never recognized as directly containing source. Now root-level files key under . in direct_dir_counts, and a new python_root_has_direct_source makes . a default when the root holds genuine entrypoints — so both . and the package dir are offered.
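The keying change described above can be illustrated with a minimal sketch. It is not the PR's code: the function signatures below are hypothetical simplifications (the real `python_root_has_direct_source` takes the scan and a candidate path, and checks for genuine entrypoints), and the input is assumed to be a list of relative paths to Python files that already passed the walker floor.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

/// Hypothetical helper: count source files per *direct* parent directory.
/// Files at the repo root have an empty parent, so they are keyed under "."
/// instead of being dropped.
fn direct_dir_counts(files: &[&str]) -> HashMap<PathBuf, usize> {
    let mut counts = HashMap::new();
    for file in files {
        let parent = match Path::new(file).parent() {
            // "myapp/models.py" -> "myapp"
            Some(p) if !p.as_os_str().is_empty() => p.to_path_buf(),
            // "manage.py" -> "" (empty parent), which we map to "."
            _ => PathBuf::from("."),
        };
        *counts.entry(parent).or_insert(0) += 1;
    }
    counts
}

/// Hypothetical stand-in for `python_root_has_direct_source`: true when at
/// least one file sits directly in the root (simplified; no entrypoint check).
fn root_has_direct_source(counts: &HashMap<PathBuf, usize>) -> bool {
    counts.get(Path::new(".")).copied().unwrap_or(0) > 0
}
```

With `manage.py`, `main.py`, and `myapp/models.py` as input, both `.` and `myapp` get a direct count, which is the condition under which both would be offered as defaults.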

2. env-only repos no longer promote .

When every .py lives under a dependency tree (env/, .venv/…/site-packages, …), the no-default fallback would pick . on aggregate count and write python = ["."] (over installed deps). The fallback now skips . for Python unless the root directly contains real source — so an env-only repo gets no Python binding rather than a wrong one.
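A rough sketch of the guarded fallback, under stated assumptions: the `Candidate` struct, the two-variant `Language` enum, and `fallback_default` are hypothetical names for illustration, and the "root directly contains real source" check is reduced to a boolean argument. Only the filter condition mirrors what this PR describes.

```rust
use std::path::{Path, PathBuf};

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Language {
    Python,
    Rust,
}

/// Hypothetical candidate: a directory plus its aggregate file count.
struct Candidate {
    path: PathBuf,
    aggregate_count: usize,
}

/// Hypothetical no-default fallback: pick the candidate with the highest
/// aggregate count, but never "." for Python unless the root directly
/// contains source. Returns None when no safe default exists.
fn fallback_default(
    language: Language,
    candidates: &[Candidate],
    root_has_direct_source: bool,
) -> Option<PathBuf> {
    candidates
        .iter()
        .filter(|c| {
            language != Language::Python
                || c.path != Path::new(".")
                || root_has_direct_source
        })
        .max_by_key(|c| c.aggregate_count)
        .map(|c| c.path.clone())
}
```

In this sketch an env-only Python repo whose only candidate is `.` yields `None`, while the same candidate list for another language still yields `.`, matching the claim that the change is Python-guarded.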

Tests

  • root_entrypoints_default_alongside_a_package_dir: manage.py + myapp/ → both . and myapp are defaults.
  • fallback_does_not_promote_dot_when_python_lives_only_under_a_dependency_tree: realistic env-only repo (. carries the aggregate count) → no default.
  • The existing fallback_does_not_promote_a_venv_only_python_repo stays green; full rag-rat suite + fmt + clippy clean. The count-keying change is Python-guarded in the selection logic, so other languages' dir selection is unaffected.

Last of the #167 Python follow-ups (with #172 / PR #180 and #174 / PR #179).

Two init-scanner UX refinements for Python dir selection (the walker floor
already handles the safety-critical part — these are about what `init -y`
writes):

- Root entrypoints alongside a package dir: a repo with root manage.py/main.py
  AND a package dir wrote only the package binding, omitting the root files,
  because `.` (the aggregate bucket) was never recognized as DIRECTLY containing
  source. Root files now key under `.` in direct_dir_counts, and a new
  python_root_has_direct_source makes `.` a default when the root holds real
  entrypoints — so both `.` and the package dir are offered.
- env-only repos no longer promote `.`: when every .py lives under a dependency
  tree (env/.venv/site-packages/…), the fallback would pick `.` on aggregate
  count and write python=["."] over installed deps. The fallback now skips `.`
  for Python unless the root directly contains real source.

Added tests for both (realistic env-only repo; root entrypoints + package dir);
existing venv-only test still green.
codecov (bot) commented Jun 15, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ All tests successful. No failed tests found.


@github-actions


SCIP oracle — resolution report

Heuristic→compiler edge resolution per corpus. Δ compares resolved-after to the main baseline (only when the corpus profile + tool version match).

| corpus | tool | edges | resolved (heuristic → compiler) | precision | recall | monikers | Δ vs main |
|---|---|---|---|---|---|---|---|
| c-cjson | scip-clang | 3941 | 69.6% → 86.5% | 89.1% | 80.8% | 911 | +0.0pp |
| py-requests | scip-python | 1060 | 32.5% → 71.7% | 89.9% | 86.8% | 292 | +0.0pp |
| rust-semver | rust-analyzer | 1056 | 39.0% → 88.6% | 79.6% | 100.0% | 78 | +0.0pp |

resolved = Exact/Syntactic + compiler upgrades + resolved-external, over edge candidates with a callee range. precision/recall are the oracle eval metrics.

@chatgpt-codex-connector


💡 Codex Review

    .filter(|candidate| {
        language != Language::Python
            || candidate.path != Path::new(".")
            || python_root_has_direct_source(scan, &candidate.path)
    })

P2: Avoid falling back to . for env-only Python

In an env-only repo where the scan finds Python only under a dependency component such as env/, this filter leaves all Python candidates non-default, but default_plan for rag-rat init -y immediately converts an empty default list back to vec![PathBuf::from(".")] (crates/rag-rat-cli/src/init/run.rs:167-172). That means the generated config still contains python = ["."] and indexes from the repo root, so the env-only case this change is meant to suppress is not fixed unless the no-safe-default state is carried through to non-interactive plan generation.
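To make the concern concrete, here is a small sketch contrasting the two behaviours. It is an illustration only: `dirs_with_root_fallback` models the conversion as the review describes it, not the actual contents of run.rs, and `dirs_or_no_binding` is one hypothetical way to carry the no-safe-default state through (both function names are invented here).

```rust
use std::path::PathBuf;

/// Behaviour as described in the review: an empty default list collapses
/// back to ["."], so the env-only case would still write python = ["."].
fn dirs_with_root_fallback(defaults: Vec<PathBuf>) -> Vec<PathBuf> {
    if defaults.is_empty() {
        vec![PathBuf::from(".")]
    } else {
        defaults
    }
}

/// One possible alternative (an assumption, not the PR's fix): treat an
/// empty default list as "write no binding for this language".
fn dirs_or_no_binding(defaults: Vec<PathBuf>) -> Option<Vec<PathBuf>> {
    if defaults.is_empty() {
        None
    } else {
        Some(defaults)
    }
}
```

Whether an `Option`, a dedicated enum, or a per-language flag is the right carrier depends on how `default_plan` is structured; the sketch only shows that the empty case has to stay distinguishable from `.` all the way to config generation.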


@github-actions


🐰 Bencher Report

Branch: fix/python-init-dir-selection
Testbed: ubuntu-latest

⚠️ WARNING: No Threshold found!

Without a Threshold, no Alerts will ever be generated.

To only post results if a Threshold exists, set the --ci-only-thresholds flag.

All benchmark results (no Threshold is set for any measure):

| Benchmark | Estimated Cycles (cycles x 1e6) | Instructions (instructions x 1e6) | L1 Hits (hits x 1e6) | LL Hits (hits x 1e6) | RAM Hits (hits x 1e3) | Total read+write (reads/writes x 1e6) |
|---|---|---|---|---|---|---|
| rag_pipeline::pipeline::index cargo_resolver:resolver_config() | 1,215.13 | 801.58 | 1,117.18 | 17.42 | 309.70 | 1,134.91 |
| rag_pipeline::pipeline::query_cold cargo_resolver:resolver_built_config() | 241.10 | 154.62 | 230.96 | 1.95 | 11.30 | 232.92 |
| rag_pipeline::pipeline::query_warm cargo_resolver:resolver_index() | 232.87 | 149.16 | 223.19 | 1.85 | 12.46 | 225.05 |

@github-actions


🐰 Bencher Report

Branch: fix/python-init-dir-selection
Testbed: ubuntu-latest

| Benchmark | Latency, seconds (Result Δ%) | Baseline | Upper Boundary, seconds (Limit %) |
|---|---|---|---|
| index_time/full_rebuild_cargo | 4.77 s (+27.34%) | 3.75 s | 5.79 s (82.48%) |

Development

Successfully merging this pull request may close these issues.

init scanner: Python directory-selection edge cases (root entrypoints, env-only repos)
