
Modularize large graphify-extract files #25

Merged
rblaine95 merged 3 commits into master from refactor/modularize-graphify-extract on Jun 21, 2026

rblaine95 (Member) commented Jun 21, 2026

Summary

Splits 13 oversized source files in graphify-extract into focused submodules. The crate had accumulated several monoliths — the largest, extractors/multi.rs, was 2541 lines — that mixed independent concerns. This is pure code movement; observable behaviour and the parity suite are unchanged (one deliberate exception, noted below).

What moved

Cross-file orchestrator

  • extractors/multi.rs (2541) → multi/{mod,cache,python,java,js,swift}.rs

Per-language collectors

  • generic/references.rs (1402) → references/ (one file per language)
  • generic/inherit.rs (1076) → inherit/ (one file per language)

Multi-format extractors

  • extractors/dotnet.rs → dotnet/{sln,slnx,csproj,razor}
  • extractors/dm.rs → dm/{source,dmi,dmm,dmf}
  • extractors/powershell.rs → powershell/{mod,manifest}
  • extractors/pascal.rs → pascal/{mod,forms,package}
  • extractors/mod.rs → carved out groovy.rs + python_rationale.rs

Pass-split single-language extractors

  • go.rs, rust_lang.rs, julia.rs, sql.rs → {mod, walk, refs, calls}
  • generic/walk.rs → lifted shared graph/AST primitives into generic/graph.rs

Public API and all call sites are preserved via re-exports, so no external paths changed. Context structs that cross a new module boundary expose their fields as pub(super) only.
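For readers less familiar with `pub(super)`, here is a minimal, self-contained sketch of the pattern; it is not code from this PR. The module names mirror the new `multi/` layout, but `FileCtx`, `load`, and `describe` are hypothetical, and inline `mod` blocks stand in for the separate files. The struct's fields are visible to sibling submodules under `multi`, while the rest of the crate only sees the public entry point.

```rust
// Hypothetical layout: inline modules stand in for multi/mod.rs,
// multi/cache.rs, and multi/python.rs.
mod multi {
    mod cache {
        /// Context that crosses the cache/python boundary. `pub(super)`
        /// means "visible in `multi` and its children", so code outside
        /// `multi` cannot read or construct it.
        pub(super) struct FileCtx {
            pub(super) path: String,
            pub(super) node_count: usize,
        }

        pub(super) fn load(path: &str) -> FileCtx {
            FileCtx { path: path.to_owned(), node_count: 0 }
        }
    }

    mod python {
        // A sibling submodule may read the fields, because `python`
        // lives inside `multi`.
        pub(super) fn describe(ctx: &super::cache::FileCtx) -> String {
            format!("{} ({} nodes)", ctx.path, ctx.node_count)
        }
    }

    /// The only public item, so external callers keep one stable path.
    pub fn extract(path: &str) -> String {
        let ctx = cache::load(path);
        python::describe(&ctx)
    }
}

fn main() {
    println!("{}", multi::extract("app.py"));
}
```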

Review fixes (second commit)

Resolved five CodeRabbit findings:

  • #[must_use] on find_body
  • obj_name rewritten as an idiomatic children().find(...)
  • UPDATE edges now emit writes_to instead of reads_from — a deliberate divergence from graphify-py (extract.py:5702), whose label is semantically wrong for a mutation; noted inline
  • source_file relativization uses relativise_under_root (adds the symlink canonicalize fallback, closing a parity gap with Python's path.resolve().relative_to(root))
  • single tree cursor in Julia's func_name_from_signature

Behavioural note

The only observable change is the SQL UPDATE relation above: UPDATE statements in procs/triggers now produce writes_to edges rather than reads_from. Everything else is pure movement or refactors with identical output.
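The relation choice can be pictured as a small mapping. This is a sketch only: the real extractor walks tree-sitter SQL nodes, and the `StatementKind`/`Relation` enums and `relation_for` function are hypothetical. It also assumes that SELECT references keep `reads_from`, which the PR does not state explicitly.

```rust
/// Hypothetical statement kinds (the real code inspects SQL AST nodes).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum StatementKind {
    Select,
    Update,
}

/// Hypothetical edge relations, serialized with the labels used in the graph.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Relation {
    ReadsFrom,
    WritesTo,
}

impl Relation {
    fn as_str(self) -> &'static str {
        match self {
            Relation::ReadsFrom => "reads_from",
            Relation::WritesTo => "writes_to",
        }
    }
}

/// Relation for a table referenced inside a proc or trigger.
fn relation_for(kind: StatementKind) -> Relation {
    match kind {
        StatementKind::Select => Relation::ReadsFrom,
        // Deliberate divergence from graphify-py, which labels this
        // `reads_from`: an UPDATE mutates its target, so the edge is a write.
        StatementKind::Update => Relation::WritesTo,
    }
}

fn main() {
    for kind in [StatementKind::Select, StatementKind::Update] {
        println!("{kind:?} -> {}", relation_for(kind).as_str());
    }
}
```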

Verification

  • cargo fmt
  • cargo clippy --all-targets --all-features --workspace — clean
  • cargo nextest — 549 tests pass (431 extract parity + 118 CLI integration)
  • CodeRabbit review loop — converged to 0 findings

Summary by CodeRabbit

Release Notes

  • New Features

    • Added BYOND DreamMaker extraction for interface forms, icon sheets, and map files.
    • Added expanded .NET extraction for project, Razor, and solution files.
    • Added SQL extraction (including table/view relationships) and PowerShell .psd1 manifest dependency parsing.
    • Added Pascal/Lazarus .lfm/.dfm forms and .lpk package extraction.
  • Improvements

    • Introduced cross-file import and call resolution with per-file AST caching.
    • Enhanced extraction accuracy for Groovy (Spock fallback), Rust, Go, Julia, and multi-language type references.

Split 13 oversized source files into focused submodules. The crate
held several monoliths (the largest, `extractors/multi.rs`, was 2541
lines) that mixed independent concerns: cross-file resolvers,
per-language collectors, multi-format extractors, and multi-pass
walks. This is pure code movement; observable behaviour and the
parity test suite are unchanged.

By concern:

- `extractors/multi.rs` becomes `multi/` (cache, python, java, js,
  and swift cross-file resolvers plus the `extract` orchestrator)
- `generic/references.rs` and `generic/inherit.rs` become one file
  per language, re-exported through their `mod.rs`
- `extractors/dotnet.rs`, `dm.rs`, `powershell.rs`, and `pascal.rs`
  become one file per target file-format
- `extractors/mod.rs` sheds `groovy.rs` and `python_rationale.rs`
- `extractors/go.rs`, `rust_lang.rs`, `julia.rs`, and `sql.rs` split
  by pass (walk / refs / calls)
- `generic/walk.rs` lifts its shared graph and AST primitives into a
  new `generic/graph.rs`

Public API and call sites are preserved: `walk.rs` re-exports the
lifted primitives and the per-language modules glob-re-export their
items, so no external paths changed. Context structs that cross a
new module boundary expose their fields as `pub(super)` only.
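The two re-export styles described above can be sketched in one file. This is illustrative only: `node_id`, `collect_java`, and `collect_python` are hypothetical stand-ins, not the crate's real helpers. `walk` keeps an old path alive with a plain `pub use`, and `references` flattens its private per-language modules with glob re-exports.

```rust
// Hypothetical stand-in for generic/{graph,walk}.rs and generic/references/.
pub mod generic {
    pub mod graph {
        /// Stand-in for a primitive lifted out of walk.rs.
        #[must_use]
        pub fn node_id(file: &str, name: &str) -> String {
            format!("{file}::{name}")
        }
    }

    pub mod walk {
        // Callers that used `generic::walk::node_id` keep compiling.
        pub use super::graph::node_id;
    }

    pub mod references {
        // One private module per language (one file each in the real crate).
        mod java {
            pub fn collect_java() -> &'static str {
                "java"
            }
        }
        mod python {
            pub fn collect_python() -> &'static str {
                "python"
            }
        }
        // Glob re-exports put every language's items back under
        // `generic::references`, so external paths do not change.
        pub use self::java::*;
        pub use self::python::*;
    }
}

fn main() {
    println!("{}", generic::walk::node_id("src/lib.rs", "main"));
    println!("{}", generic::references::collect_python());
}
```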

Verified with `cargo fmt`, `cargo clippy --all-targets
--all-features --workspace`, and `cargo nextest` (549 tests pass).

Glory to the Omnissiah

Address the five findings raised against the modularization diff:

- `generic/graph.rs`: annotate `find_body` with `#[must_use]`,
  matching its sibling AST helpers (`named_children`,
  `first_child_kind`).
- `extractors/sql/mod.rs`: rewrite `obj_name` as an idiomatic
  `children().find(...)` instead of a manual cursor loop that
  returned on the first match. Behaviour is unchanged.
- `extractors/sql/walk.rs`: an UPDATE statement mutates its target,
  so emit a `writes_to` edge instead of `reads_from`. This diverges
  from graphify-py (`extract.py:5702`), whose `reads_from` label is
  semantically wrong for a write; the divergence is noted inline.
- `extractors/multi/mod.rs`: relativise node and edge `source_file`
  fields through `relativise_under_root` rather than a bare
  `strip_prefix`. This adds the canonicalize fallback for symlinked
  roots (e.g. macOS `/var` to `/private/var`), matching Python's
  `path.resolve().relative_to(root)` and the existing ID remap pass.
- `extractors/julia/walk.rs`: navigate the call-expression callee
  with a single tree cursor instead of two.
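The `relativise_under_root` fallback described in this commit can be sketched with the standard library alone. The signature below is an assumption, not the crate's actual one. The sketch tries a plain `strip_prefix` first, then canonicalizes both sides so that a symlinked root (such as macOS `/var` pointing to `/private/var`) still yields a relative path.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical signature: returns `path` relative to `root`, or `None`
/// when it cannot be placed under `root`.
fn relativise_under_root(path: &Path, root: &Path) -> Option<PathBuf> {
    // Fast path: the bare prefix strip used before the fix.
    if let Ok(rel) = path.strip_prefix(root) {
        return Some(rel.to_path_buf());
    }
    // Fallback: resolve symlinks on both sides and retry, in the spirit
    // of Python's `path.resolve().relative_to(root)`. Paths that do not
    // exist cannot be canonicalized and yield `None`.
    let path = path.canonicalize().ok()?;
    let root = root.canonicalize().ok()?;
    path.strip_prefix(&root).ok().map(Path::to_path_buf)
}

fn main() {
    let rel = relativise_under_root(Path::new("/repo/src/lib.rs"), Path::new("/repo"));
    println!("{rel:?}");
}
```

Keeping `strip_prefix` as the first step avoids touching the filesystem for the common case; only paths that fail the lexical check pay for `canonicalize`.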

Verified with `cargo clippy --all-targets --all-features
--workspace` and `cargo nextest` (431 extract tests pass).

By the will of the Machine God

coderabbitai Bot commented Jun 21, 2026


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 088f52e5-df71-43d3-ab0f-272bbf436888

📥 Commits

Reviewing files that changed from the base of the PR and between 96f343d and 0941a91.

📒 Files selected for processing (8)
  • crates/graphify-extract/src/extractors/dotnet/csproj.rs
  • crates/graphify-extract/src/extractors/groovy.rs
  • crates/graphify-extract/src/extractors/rust_lang/walk.rs
  • crates/graphify-extract/src/generic/inherit/java.rs
  • crates/graphify-extract/src/generic/references/java.rs
  • crates/graphify-extract/src/generic/references/python.rs
  • crates/graphify-extract/src/generic/references/ts.rs
  • crates/graphify-extract/tests/coverage_collectors.rs
🚧 Files skipped from review as they are similar to previous changes (7)
  • crates/graphify-extract/src/generic/inherit/java.rs
  • crates/graphify-extract/src/generic/references/ts.rs
  • crates/graphify-extract/src/generic/references/python.rs
  • crates/graphify-extract/src/generic/references/java.rs
  • crates/graphify-extract/src/extractors/groovy.rs
  • crates/graphify-extract/src/extractors/rust_lang/walk.rs
  • crates/graphify-extract/src/extractors/dotnet/csproj.rs

📝 Walkthrough


This PR splits several extractor modules into smaller submodules, moving the existing DM, .NET, Groovy, Pascal, PowerShell, Python rationale, Rust, Julia, and SQL extractors into dedicated modules, and reorganizes multi-file extraction (caching, cross-file resolution, and final graph serialization) into its own submodules.

Changes

Extractor refactoring, new extractors, and multi-file resolution

| Layer | File(s) | Summary |
|---|---|---|
| Generic graph and reference infrastructure | crates/graphify-extract/src/generic/graph.rs, crates/graphify-extract/src/generic/walk.rs, crates/graphify-extract/src/generic/references/..., crates/graphify-extract/src/generic/inherit/..., crates/graphify-extract/src/generic/mod.rs | Shared graph helpers move to generic/graph.rs, generic reference collectors split by language, and inheritance emitters split by language with shared base-node creation. |
| Go, Julia, Rust, and SQL extraction modules | crates/graphify-extract/src/extractors/go/..., crates/graphify-extract/src/extractors/julia/..., crates/graphify-extract/src/extractors/rust_lang/..., crates/graphify-extract/src/extractors/sql/... | Go is split into structural, reference, and call passes; Julia and Rust move call handling into dedicated modules; SQL gains dedicated entrypoints, reference walking, and an updated UPDATE edge relation. |
| DM, .NET, Groovy, Pascal, PowerShell, and Python rationale extractors | crates/graphify-extract/src/extractors/dm/..., crates/graphify-extract/src/extractors/dotnet/..., crates/graphify-extract/src/extractors/groovy.rs, crates/graphify-extract/src/extractors/pascal/..., crates/graphify-extract/src/extractors/powershell/..., crates/graphify-extract/src/extractors/python_rationale.rs, crates/graphify-extract/src/extractors/mod.rs | BYOND DM, .NET, Groovy, Pascal, PowerShell, and Python rationale extraction move into dedicated modules and re-exports, including the .dm source split and the new XML/text parsing entrypoints. |
| Multi-file extraction cache and cross-file resolvers | crates/graphify-extract/src/extractors/multi/cache.rs, crates/graphify-extract/src/extractors/multi/java.rs, crates/graphify-extract/src/extractors/multi/js.rs, crates/graphify-extract/src/extractors/multi/python.rs, crates/graphify-extract/src/extractors/multi/swift.rs | Cache-aware per-file extraction is added, along with Java, JS/TS, Python, and Swift cross-file resolution modules used by the multi-file pipeline. |
| Multi-file extraction orchestrator | crates/graphify-extract/src/extractors/multi/mod.rs | The top-level multi-file extractor dispatches per language, remaps IDs, runs cross-file passes, resolves calls, and serializes the final graph output. |
| Collector coverage updates | crates/graphify-extract/tests/coverage_collectors.rs | Adds coverage for Java inheritance resolution with qualified/generic bases and Rust use ... as ... import alias handling. |

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

  • bunkerlab-net/graphify#9: Both PRs touch the same graphify-extract reference/inheritance edge plumbing and context handling.
  • bunkerlab-net/graphify#14: Directly overlaps with the BYOND DreamMaker extractor family (extract_dmf, extract_dmi, extract_dmm) and its module wiring.
  • bunkerlab-net/graphify#15: Both PRs modify graphify-extract edge construction and type/reference resolution helpers.

Poem

🐇 Hop, hop! The monolith breaks apart,
Each extractor now has its own small cart.
Go, Rust, Julia — all neatly aligned,
With SQL and Swift in the pipeline refined.
A rabbit cheers for modules set free,
So many edges, so many trees! 🌿

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title accurately reflects the main objective: it is a large-scale refactoring that modularizes 13 oversized files in the graphify-extract crate into focused submodules organized by concern. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


coderabbitai[bot]

This comment was marked as resolved.

Address the second round of CodeRabbit findings on the modularization
PR. Three are behavioural and two are refactors; all sit on code that
the prior commits relocated verbatim.

- `dotnet/csproj.rs`: a self-closing `<TargetFramework/>` (a quick-xml
  `Empty` event) armed the text-capture flag with no following text,
  so the next element's text was misread as a framework. Only a real
  open tag now arms it, restoring parity with graphify-py, which reads
  `tf.text` (None for self-closing tags) via ElementTree.
- `inherit/java.rs`: class and interface inheritance now follows
  qualified (`scoped_type_identifier`) and generic (`generic_type`)
  bases through a `java_base_name` helper, not only plain
  `type_identifier`. Diverges from graphify-py `_extract_java`, which
  drops those bases.
- `rust_lang/walk.rs`: `use foo::bar as baz` now imports `bar`, not
  `bar as baz`. Diverges from graphify-py, which keeps the alias.
- `groovy.rs`: the Spock-fallback regexes and keyword set move to
  module-level `LazyLock` statics instead of recompiling per call.
- `references/java.rs`, `python.rs`, `ts.rs`: the duplicated
  role-of-generic if/else now calls the shared `role_of` helper.
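The `csproj.rs` fix is easiest to see on a tokenised event stream. In this sketch, `XmlEvent` is a hypothetical stand-in for quick-xml's `Event::Start`, `Event::Empty`, `Event::Text`, and `Event::End`, with element names as plain strings, and `target_frameworks` is a hypothetical helper. Only a real open tag arms capture; a self-closing `<TargetFramework/>` is ignored.

```rust
/// Stand-in for quick-xml reader events; names are plain strings here.
enum XmlEvent<'a> {
    Start(&'a str),
    Empty(&'a str),
    Text(&'a str),
    End(&'a str),
}

/// Collect `<TargetFramework>` values from a pre-tokenised event stream.
fn target_frameworks(events: &[XmlEvent<'_>]) -> Vec<String> {
    let mut capture = false;
    let mut found = Vec::new();
    for event in events {
        match event {
            // Only a real open tag arms capture; its text follows it.
            XmlEvent::Start(name) => {
                if *name == "TargetFramework" {
                    capture = true;
                }
            }
            // A self-closing tag has no text. Arming capture here was the
            // bug: the next element's text would be read as a framework.
            XmlEvent::Empty(_) => {}
            XmlEvent::Text(text) => {
                if capture {
                    let text = text.trim();
                    if !text.is_empty() {
                        found.push(text.to_owned());
                    }
                    capture = false;
                }
            }
            XmlEvent::End(_) => capture = false,
        }
    }
    found
}

fn main() {
    // `<TargetFramework/>`, an unrelated element, then a real value.
    let events = [
        XmlEvent::Empty("TargetFramework"),
        XmlEvent::Start("RootNamespace"),
        XmlEvent::Text("MyApp"),
        XmlEvent::End("RootNamespace"),
        XmlEvent::Start("TargetFramework"),
        XmlEvent::Text("net8.0"),
        XmlEvent::End("TargetFramework"),
    ];
    println!("{:?}", target_frameworks(&events));
}
```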

The three behavioural changes fix genuine extraction bugs per this
repo's feature-parity (not bug-parity) mandate; each is noted inline.
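The `rust_lang/walk.rs` alias change amounts to dropping the `as` suffix before taking the last path segment. The real walker reads a tree-sitter node, so this string-based `imported_name` function is a hypothetical illustration of the rule only: `use foo::bar as baz` imports `bar`, whereas graphify-py keeps `bar as baz`.

```rust
/// Hypothetical text-level stand-in for the AST-based walker: returns the
/// imported item's name from a single-path `use` tree.
fn imported_name(use_tree: &str) -> &str {
    // Drop any `as alias` suffix: `foo::bar as baz` imports `bar`.
    let path = match use_tree.split_once(" as ") {
        Some((path, _alias)) => path,
        None => use_tree,
    };
    // The imported item is the last `::` segment.
    path.rsplit("::").next().unwrap_or(path).trim()
}

fn main() {
    for tree in ["foo::bar as baz", "std::collections::HashMap", "serde"] {
        println!("{tree} -> {}", imported_name(tree));
    }
}
```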

Added two coverage tests for the Java and Rust behavioural changes.
Verified with `cargo clippy --all-targets --all-features --workspace`
and `cargo nextest` (full workspace: 2098 pass).

Ave Deus Mechanicus
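The `groovy.rs` change in this commit follows the standard `std::sync::LazyLock` pattern (stable since Rust 1.80): the static is built once on first access and shared afterwards, rather than rebuilt on every call. In this sketch the keyword contents are an assumption (Spock's block labels), not necessarily the exact set in `groovy.rs`, and `is_spock_label` is a hypothetical helper. The regexes would follow the same pattern with a `LazyLock<Regex>`.

```rust
use std::collections::HashSet;
use std::sync::LazyLock;

/// Built on first use and shared by every later call. The contents are
/// assumed (Spock block labels), not copied from the crate.
static SPOCK_BLOCK_LABELS: LazyLock<HashSet<&'static str>> = LazyLock::new(|| {
    ["setup", "given", "when", "then", "expect", "where", "cleanup", "and"]
        .into_iter()
        .collect()
});

/// Hypothetical helper: is `line` a bare Spock block label such as `when:`?
fn is_spock_label(line: &str) -> bool {
    line.trim()
        .strip_suffix(':')
        .is_some_and(|label| SPOCK_BLOCK_LABELS.contains(label))
}

fn main() {
    for line in ["when:", "  then: ", "def setup()"] {
        println!("{line:?} -> {}", is_spock_label(line));
    }
}
```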

rblaine95 force-pushed the refactor/modularize-graphify-extract branch from 4105ce8 to 0941a91 on June 21, 2026 14:07
rblaine95 merged commit a1ad32c into master on Jun 21, 2026
14 checks passed
rblaine95 deleted the refactor/modularize-graphify-extract branch on June 21, 2026 14:27