Commit 1db841b

feat(devkit): add dependency topology scanner script (#361)
* feat: add dependency topology scanner

  Static analysis tool for Lua codebase architectural layering.

  - scan_topology.py: CLI entry (scan / diff subcommands)
  - scan_analysis.py: core analysis, group matching, policy violation detection
  - graph_utils.py: pure graph algorithms (Tarjan SCC, back-edges, degree)
  - html_renderer.py: interactive dagre-d3 HTML visualization with cluster
    expand/collapse, violation highlighting, SCC marking
  - topology.jsonc: 5-layer group definitions with English comments explaining
    each module placement and REVIEW notes for debatable calls

  scan --json produces agent-friendly output:

  - health summary (cycles/violations/ungrouped with verdicts)
  - cycles with severity, members_by_layer, example_cycle path, back_edges
  - violations grouped by rule with full edge lists
  - group_coverage confirming 0 ungrouped modules

* fix: address copilot review + improve diff output

  - graph_utils: clear error message on invalid git ref
  - scan_analysis: remove unused imports (degree, largest_scc_size)
  - html_renderer: defensive isinstance guard in match_group; fix expanded
    cluster edges — children now show actual connections via rawEdges
    projection instead of appearing isolated
  - scan_topology: eliminate duplicate graph/SCC computation in cmd_scan;
    smart diff default (dirty worktree → HEAD vs worktree, clean → HEAD~1 vs
    HEAD); expand diff human output with SCC changes, fixed/new violations
    per edge
  - add requirements.txt declaring json5 dependency
  - AGENTS.md: document diff default behavior and ref syntax
1 parent 7e6cd2c commit 1db841b

9 files changed

Lines changed: 1871 additions & 0 deletions

.gitignore

Lines changed: 4 additions & 0 deletions
@@ -22,3 +22,7 @@ deps/
 
 # Local Claude settings (keep out of repo)
 .claude/
+
+# Dependency topology tool artifacts
+scripts/dependency-topology/__pycache__/
+*.html

AGENTS.md

Lines changed: 10 additions & 0 deletions
@@ -11,3 +11,13 @@
 - **Comments:** Avoid obvious comments that merely restate what the code does. Only add comments when necessary to explain _why_ something is done, not _what_ is being done. Prefer self-explanatory code.
 - **Config:** Centralize in `config.lua`. Use deep merge for user overrides.
 - **Types:** Use Lua annotations (`---@class`, `---@field`, etc.) for public APIs/config.
+
+## Dependency Topology Tool
+
+Use `scripts/dependency-topology/scan_topology.py` to inspect and track architectural layering.
+
+- Use `python3 scripts/dependency-topology/scan_topology.py scan` to inspect current-state vs target-policy gap
+- Use `diff` to inspect change direction (improved/regressed/neutral) between snapshots
+- Pass `--snapshot <git-ref>` for historical snapshots
+- Pass `--json` when feeding outputs into scripts or agents
+- Keep architecture cleanup discussions anchored on scanner output instead of ad-hoc grep chains
Lines changed: 71 additions & 0 deletions
@@ -0,0 +1,71 @@
+# Dependency Topology Scanner
+
+Static analysis tool for Lua codebase dependency architecture.
+
+## File Structure
+
+```
+scripts/dependency-topology/
+├── scan_topology.py    # CLI entry: scan / diff subcommands
+├── scan_analysis.py    # Core analysis: groups, edge rules, payload builders
+├── graph_utils.py      # Pure graph algorithms (Tarjan SCC, back edges, degree)
+├── html_renderer.py    # Interactive dagre-d3 + d3v5 HTML visualization
+└── topology.jsonc      # Group definitions + review comments (strategy file)
+```
+
+## Quick Start
+
+```bash
+# Scan current HEAD → generate interactive HTML
+python3 scripts/dependency-topology/scan_topology.py scan
+
+# Output to specific path
+python3 scripts/dependency-topology/scan_topology.py scan -o /tmp/deps.html
+
+# JSON output (for scripts/agents)
+python3 scripts/dependency-topology/scan_topology.py scan --json
+
+# Diff — smart default:
+# worktree has uncommitted Lua changes → HEAD vs worktree
+# worktree is clean → HEAD~1 vs HEAD (last commit)
+python3 scripts/dependency-topology/scan_topology.py diff
+
+# Compare specific refs (branch names, commit SHAs, remote refs)
+python3 scripts/dependency-topology/scan_topology.py diff --from upstream/main --to clean-code-remove-core
+python3 scripts/dependency-topology/scan_topology.py diff --from HEAD~5 --to HEAD
+```
+
+## Snapshot References
+
+- `worktree` — current working tree (uncommitted changes)
+- `HEAD` — latest commit
+- Any git ref — branch name (e.g. `upstream/main`), tag, short or full commit SHA
+- Relative refs — `HEAD~1`, `HEAD^`
+
+**diff defaults (no args):**
+- Worktree has uncommitted Lua changes → `HEAD` vs `worktree`
+- Worktree is clean → `HEAD~1` vs `HEAD`
+
+Note: ambiguous short names (e.g. `upstream` when both a local branch and remote exist)
+produce a git warning. Prefer fully-qualified refs: `upstream/main`, `refs/heads/mybranch`.
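
The no-argument `diff` defaults described above can be sketched as follows. This is a minimal illustration, not the code from `scan_topology.py` (which does not appear in this excerpt). It assumes dirtiness is detected with `git status --porcelain` restricted to `lua/opencode`, and the helper names `has_dirty_lua`, `pick_default_refs`, and `default_refs_for_repo` are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import Tuple


def has_dirty_lua(porcelain: str) -> bool:
    """Return True if `git status --porcelain` output lists any .lua path."""
    for line in porcelain.splitlines():
        # Porcelain v1 lines look like "XY path", or "XY old -> new" for renames.
        path = line[3:].split(" -> ")[-1].strip().strip('"')
        if path.endswith(".lua"):
            return True
    return False


def pick_default_refs(porcelain: str) -> Tuple[str, str]:
    """Choose (from_ref, to_ref) following the README's no-args rule."""
    if has_dirty_lua(porcelain):
        return ("HEAD", "worktree")
    return ("HEAD~1", "HEAD")


def default_refs_for_repo(repo: Path) -> Tuple[str, str]:
    """Ask git for the worktree status, then apply the rule above.

    Assumption: only changes under lua/opencode count as "Lua changes".
    """
    out = subprocess.check_output(
        ["git", "status", "--porcelain", "--", "lua/opencode"],
        cwd=repo,
        text=True,
    )
    return pick_default_refs(out)
```

Keeping the decision in a pure function (`pick_default_refs`) separates it from the git call, so the rule can be checked without a repository.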
+
+## Output
+
+**scan:** One-line summary + HTML file path
+```
+4 cycles, 20 violations, violations=20 → /path/to/dependency-graph.html
+```
+
+**diff:** Change direction summary
+```
+HEAD → worktree: +2/-1 edges, improved=1, regressed=0
+```
+
+## JSON Output Signals
+
+When using `--json`:
+
+- `health` — one-glance status for cycles / violations / ungrouped coverage
+- `cycles` — SCC details with severity, members_by_layer, example_cycle, back_edges_in_scc
+- `violations` — policy violations grouped by rule with full edge lists
+- `group_coverage` — module counts per layer (including ungrouped)
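
For scripts that consume `scan --json`, a defensive first step is to confirm that the four top-level sections listed above are present before drilling into them. The sketch below assumes the JSON document is printed to stdout and is an object at the top level. Neither is confirmed by this excerpt, and the nested shape of each section is not documented here, so the code stops at the top-level keys. `load_scan_payload`, `missing_sections`, and `run_scan_json` are hypothetical helper names.

```python
import json
import subprocess
from typing import Any, Dict, List

# Top-level keys named in the README's "JSON Output Signals" list.
DOCUMENTED_KEYS = ["health", "cycles", "violations", "group_coverage"]


def load_scan_payload(raw: str) -> Dict[str, Any]:
    """Parse scanner output, insisting on a JSON object at the top level."""
    payload = json.loads(raw)
    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object at the top level")
    return payload


def missing_sections(payload: Dict[str, Any]) -> List[str]:
    """Return the documented keys that are absent from the payload."""
    return [key for key in DOCUMENTED_KEYS if key not in payload]


def run_scan_json(repo_root: str) -> Dict[str, Any]:
    """Run the scanner from the repo root.

    Assumption: with --json the payload is written to stdout.
    """
    proc = subprocess.run(
        ["python3", "scripts/dependency-topology/scan_topology.py", "scan", "--json"],
        cwd=repo_root,
        capture_output=True,
        text=True,
        check=True,
    )
    return load_scan_payload(proc.stdout)
```

A caller might fail fast with `if missing_sections(payload): ...` so that a format change in the scanner surfaces as a clear error instead of a `KeyError` deep inside a script.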
Lines changed: 225 additions & 0 deletions
@@ -0,0 +1,225 @@
+#!/usr/bin/env python3
+"""Repository-local static Lua dependency graph helpers.
+
+Mechanism only:
+- Parse `require('opencode.*')` edges from `lua/opencode/**/*.lua`
+- Build snapshot graph from worktree or git ref
+- Provide SCC / back-edge utilities
+"""
+
+from __future__ import annotations
+
+from collections import Counter, defaultdict
+from dataclasses import dataclass
+from pathlib import Path
+import re
+import subprocess
+from typing import Dict, Iterable, List, Optional, Sequence, Set, Tuple
+
+
+REQUIRE_PATTERNS = [
+    re.compile(r"require\s*\(\s*['\"](opencode(?:\.[^'\"]+)?)['\"]\s*\)"),
+    re.compile(r"require\s+['\"](opencode(?:\.[^'\"]+)?)['\"]"),
+]
+
+
+@dataclass
+class SnapshotGraph:
+    snapshot: str
+    files: int
+    nodes: Dict[str, str]  # module -> relative file path
+    edges: Set[Tuple[str, str]]
+
+
+def module_from_relpath(relpath: str) -> Optional[str]:
+    if not relpath.startswith("lua/opencode/") or not relpath.endswith(".lua"):
+        return None
+    mod = relpath[len("lua/") : -len(".lua")]
+    if mod.endswith("/init"):
+        mod = mod[: -len("/init")]
+    return mod.replace("/", ".")
+
+
+def _worktree_files(repo: Path) -> List[Tuple[str, str]]:
+    out: List[Tuple[str, str]] = []
+    base = repo / "lua" / "opencode"
+    for fp in base.rglob("*.lua"):
+        rel = fp.relative_to(repo).as_posix()
+        text = fp.read_text(encoding="utf-8", errors="ignore")
+        out.append((rel, text))
+    return out
+
+
+def _git_files(repo: Path, ref: str) -> List[Tuple[str, str]]:
+    cmd = ["git", "ls-tree", "-r", "--name-only", ref, "lua/opencode"]
+    try:
+        ls = subprocess.check_output(cmd, cwd=repo, text=True, stderr=subprocess.PIPE)
+    except subprocess.CalledProcessError as e:
+        stderr = e.stderr.strip() if e.stderr else ""
+        raise ValueError(
+            f"Invalid snapshot ref '{ref}'. Valid values: HEAD, worktree, branch name, commit SHA.\n"
+            f"git error: {stderr}"
+        ) from None
+
+    out: List[Tuple[str, str]] = []
+    for rel in ls.splitlines():
+        if not rel.endswith(".lua"):
+            continue
+        show_cmd = ["git", "show", f"{ref}:{rel}"]
+        try:
+            text = subprocess.check_output(show_cmd, cwd=repo, text=True, stderr=subprocess.DEVNULL)
+        except subprocess.CalledProcessError:
+            continue
+        out.append((rel, text))
+    return out
+
+
+def load_snapshot_graph(repo: Path, snapshot: str) -> SnapshotGraph:
+    files = _worktree_files(repo) if snapshot == "worktree" else _git_files(repo, snapshot)
+
+    nodes: Dict[str, str] = {}
+    for rel, _ in files:
+        module = module_from_relpath(rel)
+        if module:
+            nodes[module] = rel
+
+    edges: Set[Tuple[str, str]] = set()
+    for rel, content in files:
+        src = module_from_relpath(rel)
+        if not src:
+            continue
+
+        deps: Set[str] = set()
+        for pat in REQUIRE_PATTERNS:
+            deps.update(m.group(1) for m in pat.finditer(content))
+
+        for dep in deps:
+            if dep in nodes:
+                edges.add((src, dep))
+
+    return SnapshotGraph(snapshot=snapshot, files=len(files), nodes=nodes, edges=edges)
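
To see how the two require patterns and the path-to-module mapping above behave, here is a small self-contained check. `REQUIRE_PATTERNS` and `module_from_relpath` are copied from the diff. The Lua snippet and the helper name `extract_requires` are made up for illustration.

```python
import re
from typing import Optional, Set

# Copied from the diff: call form require("...") and paren-less form require '...'.
REQUIRE_PATTERNS = [
    re.compile(r"require\s*\(\s*['\"](opencode(?:\.[^'\"]+)?)['\"]\s*\)"),
    re.compile(r"require\s+['\"](opencode(?:\.[^'\"]+)?)['\"]"),
]


def module_from_relpath(relpath: str) -> Optional[str]:
    """Copied from the diff: map lua/opencode/**.lua to a dotted module name."""
    if not relpath.startswith("lua/opencode/") or not relpath.endswith(".lua"):
        return None
    mod = relpath[len("lua/") : -len(".lua")]
    if mod.endswith("/init"):
        mod = mod[: -len("/init")]
    return mod.replace("/", ".")


def extract_requires(lua_source: str) -> Set[str]:
    """Hypothetical helper: the per-file matching loop from load_snapshot_graph."""
    deps: Set[str] = set()
    for pat in REQUIRE_PATTERNS:
        deps.update(m.group(1) for m in pat.finditer(lua_source))
    return deps


# Made-up Lua source exercising both require forms.
SAMPLE_LUA = '''
local config = require("opencode.config")
local util = require 'opencode.util'
local root = require('opencode')
local other = require("plenary.path")
-- local old = require("opencode.legacy")
'''

print(sorted(extract_requires(SAMPLE_LUA)))
print(module_from_relpath("lua/opencode/ui/init.lua"))
```

Two things stand out. Modules outside `opencode` never match, so `plenary.path` is dropped. The patterns also run over raw text, so a `require` inside a Lua comment is matched too. In `load_snapshot_graph` such a match only becomes an edge if the named module exists in the snapshot (`dep in nodes`), which filters stale references but not a commented-out require of a module that still exists.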
+
+
+def tarjan_scc(nodes: Iterable[str], edges: Iterable[Tuple[str, str]]) -> List[List[str]]:
+    graph: Dict[str, List[str]] = defaultdict(list)
+    for a, b in edges:
+        graph[a].append(b)
+
+    index = 0
+    stack: List[str] = []
+    on_stack: Set[str] = set()
+    indices: Dict[str, int] = {}
+    lowlink: Dict[str, int] = {}
+    result: List[List[str]] = []
+
+    def strongconnect(v: str) -> None:
+        nonlocal index
+        indices[v] = index
+        lowlink[v] = index
+        index += 1
+        stack.append(v)
+        on_stack.add(v)
+
+        for w in graph[v]:
+            if w not in indices:
+                strongconnect(w)
+                lowlink[v] = min(lowlink[v], lowlink[w])
+            elif w in on_stack:
+                lowlink[v] = min(lowlink[v], indices[w])
+
+        if lowlink[v] == indices[v]:
+            comp: List[str] = []
+            while True:
+                w = stack.pop()
+                on_stack.remove(w)
+                comp.append(w)
+                if w == v:
+                    break
+            result.append(comp)
+
+    for n in sorted(set(nodes)):
+        if n not in indices:
+            strongconnect(n)
+
+    return result
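
`tarjan_scc` recurses once per node along a DFS path, so a very long dependency chain could in principle exceed Python's default recursion limit of 1000 frames. That is unlikely for a plugin-sized codebase, but for reference here is an iterative variant of the same algorithm using an explicit work stack. It is a sketch, not part of the commit, and the name `tarjan_scc_iterative` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Set, Tuple


def tarjan_scc_iterative(
    nodes: Iterable[str], edges: Iterable[Tuple[str, str]]
) -> List[List[str]]:
    graph: Dict[str, List[str]] = defaultdict(list)
    for a, b in edges:
        graph[a].append(b)

    indices: Dict[str, int] = {}
    lowlink: Dict[str, int] = {}
    stack: List[str] = []
    on_stack: Set[str] = set()
    result: List[List[str]] = []
    counter = 0

    def visit(v: str) -> Tuple[str, Iterator[str]]:
        """Assign an index to v and return its work frame."""
        nonlocal counter
        indices[v] = lowlink[v] = counter
        counter += 1
        stack.append(v)
        on_stack.add(v)
        return (v, iter(graph.get(v, [])))

    for root in sorted(set(nodes)):
        if root in indices:
            continue
        # Each frame is (node, iterator over its not-yet-examined successors).
        work: List[Tuple[str, Iterator[str]]] = [visit(root)]
        while work:
            v, successors = work[-1]
            descended = False
            for w in successors:
                if w not in indices:
                    work.append(visit(w))  # "recurse" into w
                    descended = True
                    break
                if w in on_stack:
                    lowlink[v] = min(lowlink[v], indices[w])
            if descended:
                continue
            # Every successor of v has been handled: v is finished.
            work.pop()
            if work:
                parent = work[-1][0]
                lowlink[parent] = min(lowlink[parent], lowlink[v])
            if lowlink[v] == indices[v]:
                comp: List[str] = []
                while True:
                    w = stack.pop()
                    on_stack.remove(w)
                    comp.append(w)
                    if w == v:
                        break
                result.append(comp)
    return result


# Toy graph: a -> b -> c -> a forms one cycle; d hangs off c.
toy = tarjan_scc_iterative("abcd", [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")])
print(sorted(sorted(c) for c in toy))  # [['a', 'b', 'c'], ['d']]
```

The successor iterator stored in each frame resumes where it left off, which is what lets the loop mimic returning from a recursive call.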
+
+
+def back_edges(nodes: Iterable[str], edges: Iterable[Tuple[str, str]]) -> Set[Tuple[str, str]]:
+    graph: Dict[str, List[str]] = defaultdict(list)
+    for a, b in edges:
+        graph[a].append(b)
+    for n in graph:
+        graph[n] = sorted(set(graph[n]))
+
+    white, gray, black = 0, 1, 2
+    color: Dict[str, int] = {n: white for n in set(nodes)}
+    backs: Set[Tuple[str, str]] = set()
+
+    def dfs(v: str) -> None:
+        color[v] = gray
+        for w in graph[v]:
+            c = color.get(w, white)
+            if c == white:
+                dfs(w)
+            elif c == gray:
+                backs.add((v, w))
+        color[v] = black
+
+    for n in sorted(color.keys()):
+        if color[n] == white:
+            dfs(n)
+
+    return backs
+
+
+def degree(edges: Iterable[Tuple[str, str]]) -> Tuple[Counter, Counter]:
+    indeg: Counter = Counter()
+    outdeg: Counter = Counter()
+    for src, dst in edges:
+        outdeg[src] += 1
+        indeg[dst] += 1
+    return indeg, outdeg
+
+
+def find_cycle_in_scc(members: List[str], edges: Iterable[Tuple[str, str]]) -> List[str]:
+    """Return one concrete cycle path within an SCC, e.g. [a, b, c, a].
+
+    Uses DFS from the first member; backtracks until a back-edge is found.
+    Returns [] if no cycle is found (shouldn't happen for a real SCC > 1).
+    """
+    member_set = set(members)
+    graph: Dict[str, List[str]] = defaultdict(list)
+    for a, b in edges:
+        if a in member_set and b in member_set:
+            graph[a].append(b)
+    for n in graph:
+        graph[n] = sorted(set(graph[n]))
+
+    path: List[str] = []
+    on_path: Dict[str, int] = {}  # node -> index in path
+    visited: Set[str] = set()
+
+    def dfs(v: str) -> List[str]:
+        path.append(v)
+        on_path[v] = len(path) - 1
+        for w in graph[v]:
+            if w in on_path:
+                # Found cycle: extract from w's position to end, close it
+                return path[on_path[w]:] + [w]
+            if w not in visited:
+                visited.add(w)
+                result = dfs(w)
+                if result:
+                    return result
+        path.pop()
+        del on_path[v]
+        return []
+
+    start = sorted(members)[0]
+    visited.add(start)
+    return dfs(start)
+
+
+def largest_scc_size(comps: Sequence[Sequence[str]]) -> int:
+    nontrivial = [c for c in comps if len(c) > 1]
+    return max((len(c) for c in nontrivial), default=0)
