Modularize large graphify-extract files#25
Conversation
Split 13 oversized source files into focused submodules. The crate held several monoliths (the largest, `extractors/multi.rs`, was 2541 lines) that mixed independent concerns: cross-file resolvers, per-language collectors, multi-format extractors, and multi-pass walks. This is pure code movement; observable behaviour and the parity test suite are unchanged. By concern: - `extractors/multi.rs` becomes `multi/` (cache, python, java, js, and swift cross-file resolvers plus the `extract` orchestrator) - `generic/references.rs` and `generic/inherit.rs` become one file per language, re-exported through their `mod.rs` - `extractors/dotnet.rs`, `dm.rs`, `powershell.rs`, and `pascal.rs` become one file per target file-format - `extractors/mod.rs` sheds `groovy.rs` and `python_rationale.rs` - `extractors/go.rs`, `rust_lang.rs`, `julia.rs`, and `sql.rs` split by pass (walk / refs / calls) - `generic/walk.rs` lifts its shared graph and AST primitives into a new `generic/graph.rs` Public API and call sites are preserved: `walk.rs` re-exports the lifted primitives and the per-language modules glob-re-export their items, so no external paths changed. Context structs that cross a new module boundary expose their fields as `pub(super)` only. Verified with `cargo fmt`, `cargo clippy --all-targets --all-features --workspace`, and `cargo nextest` (549 tests pass). Glory to the Omnissiah
Address the five findings raised against the modularization diff: - `generic/graph.rs`: annotate `find_body` with `#[must_use]`, matching its sibling AST helpers (`named_children`, `first_child_kind`). - `extractors/sql/mod.rs`: rewrite `obj_name` as an idiomatic `children().find(...)` instead of a manual cursor loop that returned on the first match. Behaviour is unchanged. - `extractors/sql/walk.rs`: an UPDATE statement mutates its target, so emit a `writes_to` edge instead of `reads_from`. This diverges from graphify-py (`extract.py:5702`), whose `reads_from` label is semantically wrong for a write; the divergence is noted inline. - `extractors/multi/mod.rs`: relativise node and edge `source_file` fields through `relativise_under_root` rather than a bare `strip_prefix`. This adds the canonicalize fallback for symlinked roots (e.g. macOS `/var` to `/private/var`), matching Python's `path.resolve().relative_to(root)` and the existing ID remap pass. - `extractors/julia/walk.rs`: navigate the call-expression callee with a single tree cursor instead of two. Verified with `cargo clippy --all-targets --all-features --workspace` and `cargo nextest` (431 extract tests pass). By the will of the Machine God
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (8)
🚧 Files skipped from review as they are similar to previous changes (7)
📝 WalkthroughWalkthroughThis PR splits several extractor modules into smaller submodules, adds new extractors for DM, .NET, Groovy, Pascal, PowerShell, Python rationale, Rust, Julia, and SQL, and expands multi-file extraction with caching, cross-file resolution, and final graph serialization. ChangesExtractor refactoring, new extractors, and multi-file resolution
Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Address the second round of CodeRabbit findings on the modularization PR. Three are behavioural and two are refactors; all sit on code that the prior commits relocated verbatim. - `dotnet/csproj.rs`: a self-closing `<TargetFramework/>` (a quick-xml `Empty` event) armed the text-capture flag with no following text, so the next element's text was misread as a framework. Only a real open tag now arms it, restoring parity with graphify-py, which reads `tf.text` (None for self-closing tags) via ElementTree. - `inherit/java.rs`: class and interface inheritance now follows qualified (`scoped_type_identifier`) and generic (`generic_type`) bases through a `java_base_name` helper, not only plain `type_identifier`. Diverges from graphify-py `_extract_java`, which drops those bases. - `rust_lang/walk.rs`: `use foo::bar as baz` now imports `bar`, not `bar as baz`. Diverges from graphify-py, which keeps the alias. - `groovy.rs`: the Spock-fallback regexes and keyword set move to module-level `LazyLock` statics instead of recompiling per call. - `references/java.rs`, `python.rs`, `ts.rs`: the duplicated role-of-generic if/else now calls the shared `role_of` helper. The three reference divergences fix genuine extraction bugs per this repo's feature-parity (not bug-parity) mandate; each is noted inline. Added two coverage tests for the Java and Rust behavioural changes. Verified with `cargo clippy --all-targets --all-features --workspace` and `cargo nextest` (full workspace: 2098 pass). Ave Deus Mechanicus
4105ce8 to
0941a91
Compare
Summary
Splits 13 oversized source files in
graphify-extractinto focused submodules. The crate had accumulated several monoliths — the largest,extractors/multi.rs, was 2541 lines — that mixed independent concerns. This is pure code movement; observable behaviour and the parity suite are unchanged (one deliberate exception, noted below).What moved
Cross-file orchestrator
extractors/multi.rs(2541) →multi/{mod,cache,python,java,js,swift}.rsPer-language collectors
generic/references.rs(1402) →references/(one file per language)generic/inherit.rs(1076) →inherit/(one file per language)Multi-format extractors
extractors/dotnet.rs→dotnet/{sln,slnx,csproj,razor}extractors/dm.rs→dm/{source,dmi,dmm,dmf}extractors/powershell.rs→powershell/{mod,manifest}extractors/pascal.rs→pascal/{mod,forms,package}extractors/mod.rs→ carved outgroovy.rs+python_rationale.rsPass-split single-language extractors
go.rs,rust_lang.rs,julia.rs,sql.rs→{mod, walk, refs, calls}generic/walk.rs→ lifted shared graph/AST primitives intogeneric/graph.rsPublic API and all call sites are preserved via re-exports, so no external paths changed. Context structs that cross a new module boundary expose their fields as
pub(super)only.Review fixes (second commit)
Resolved five CodeRabbit findings:
#[must_use]onfind_bodyobj_namerewritten as an idiomaticchildren().find(...)writes_toinstead ofreads_from— a deliberate divergence fromgraphify-py(extract.py:5702), whose label is semantically wrong for a mutation; noted inlinesource_filerelativization usesrelativise_under_root(adds the symlink canonicalize fallback, closing a parity gap with Python'spath.resolve().relative_to(root))func_name_from_signatureBehavioural note
The only observable change is the SQL
UPDATErelation above:UPDATEstatements in procs/triggers now producewrites_toedges rather thanreads_from. Everything else is pure movement or refactors with identical output.Verification
cargo fmtcargo clippy --all-targets --all-features --workspace— cleancargo nextest— 549 tests pass (431 extract parity + 118 CLI integration)Summary by CodeRabbit
Release Notes
New Features
.psd1manifest dependency parsing..lfm/.dfmforms and.lpkpackage extraction.Improvements