nightshift: perf-regression — 12 allocation hotspots in the scan pipeline
Summary
jarspect is a Rust security scanner for Minecraft .jar mods. Analysis of 28 source files (299KB) reveals 12 performance-relevant patterns in the scan pipeline: excessive clone() calls (134 total), unbounded Vec allocations without capacity hints, and redundant string allocations in the hot path (lib.rs → scan.rs → detectors → verdict.rs).
Findings
🟡 P1 — Excessive .clone() in hot path (134 occurrences)
The scan pipeline creates strings for paths, severities, categories, and evidence, then clones them at every stage boundary.
Worst offenders:
| File | .clone() count | Hot path context |
|---|---|---|
| src/lib.rs | ~15 | entry.path.clone(), signature.id.clone(), signature.severity.clone() in signature matching loop (lines 256-288) |
| src/scan.rs | ~12 | request.upload_id.clone(), capability_profile.capabilities.clone(), root_label.clone() in run_scan (lines 96-264) |
| src/verdict.rs | ~20 | static_findings.matches iteration with per-indicator cloning for evidence aggregation |
Impact: For a mod with 1000+ class files, the signature matching loop in lib.rs clones strings for every match on every entry. With 11 detectors, this compounds.
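One way to drop the per-match clones is to let match records borrow from the entry and the signature for the duration of the scan. The following is a minimal sketch, not the code in src/lib.rs: `Entry`, `Signature`, `Match`, and `match_signatures` are hypothetical stand-ins whose field names mirror the cloned expressions listed above, and the substring check is only a placeholder for the real matching logic.

```rust
/// Hypothetical stand-in for an archive entry in src/lib.rs.
pub struct Entry {
    pub path: String,
    pub text: String,
}

/// Hypothetical stand-in for a loaded signature.
pub struct Signature {
    pub id: String,
    pub severity: String,
    pub needle: String,
}

/// A match record that borrows its strings instead of owning clones.
#[derive(Debug, PartialEq)]
pub struct Match<'a> {
    pub path: &'a str,
    pub signature_id: &'a str,
    pub severity: &'a str,
}

/// Collects matches without cloning any String.
pub fn match_signatures<'a>(
    entries: &'a [Entry],
    signatures: &'a [Signature],
) -> Vec<Match<'a>> {
    let mut matches = Vec::new();
    for entry in entries {
        for signature in signatures {
            // Substring search is a placeholder for the real matching logic.
            if entry.text.contains(signature.needle.as_str()) {
                matches.push(Match {
                    path: &entry.path,
                    signature_id: &signature.id,
                    severity: &signature.severity,
                });
            }
        }
    }
    matches
}
```

With this shape, owned copies would only be needed where a result outlives the entries and signatures, for example at a serialization boundary.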
🟡 P1 — Vec::new() without capacity hints (87 .collect() + many Vec::new())
Key locations:
| File | Location | Issue |
|---|---|---|
| src/lib.rs:151-153 | let mut matches = Vec::new(), matched_pattern_ids = Vec::new(), matched_signature_ids = Vec::new() | Could pre-allocate based on known pattern/signature counts |
| src/lib.rs:354-355 | counts_by_category: HashMap::new(), counts_by_severity: HashMap::new() | Small fixed number of categories (≤5); could use with_capacity(8) |
| src/lib.rs:415,456,475 | Signature/rulepack loading Vectors | Size known from parsed JSON; use with_capacity(signatures.len()) |
| src/detectors/mod.rs:92 | merge_strings(target, incoming), takes mut Vec<String> | Could use extend with reserve instead of pushing one-by-one |
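The two suggestions in this table can be sketched as follows. The signature of `merge_strings` is an assumption (the real function in src/detectors/mod.rs may also deduplicate or take different argument types), and `load_ids` is a hypothetical helper standing in for the signature/rulepack loaders.

```rust
/// Appends `incoming` to `target` with a single up-front reservation.
/// Hypothetical signature; the real merge_strings may also deduplicate.
pub fn merge_strings(target: &mut Vec<String>, incoming: Vec<String>) {
    // Reserve once so the append cannot trigger repeated regrowth.
    target.reserve(incoming.len());
    target.extend(incoming);
}

/// Builds a vector whose final size is known from the parsed input.
/// `load_ids` is an illustrative stand-in for the loaders in src/lib.rs.
pub fn load_ids(parsed: &[&str]) -> Vec<String> {
    let mut ids = Vec::with_capacity(parsed.len());
    for raw in parsed {
        ids.push(raw.trim().to_string());
    }
    ids
}
```

The same idea applies to the count maps: `HashMap::with_capacity(8)` sizes the table once for the small, fixed set of categories.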
🟢 P2 — Redundant string formatting in verdict generation
src/verdict.rs (38KB, largest file) generates AI prompts by concatenating strings in loops:
| Line | Pattern | Count |
|---|---|---|
| 198-216 | Per-indicator string extraction loops | 3 loops over static_findings.matches |
| 256 | for value in strings formatting | O(indicators) allocations |
| 551 | Second full iteration over static_findings.matches | Duplicates work from line 198 |
| 636 | Profile capability iteration | Additional string building |
The verdict builder iterates over all findings multiple times to construct the AI prompt, creating intermediate String allocations each pass.
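A single-pass builder that writes straight into one buffer might look like the sketch below. The `Indicator` type, its fields, the `build_prompt` name, and the line format are all hypothetical; the real prompt layout in src/verdict.rs is not shown in this report.

```rust
use std::fmt::Write;

/// Hypothetical indicator shape; the real type in src/verdict.rs may differ.
pub struct Indicator {
    pub id: String,
    pub severity: String,
    pub evidence: String,
}

/// Builds the prompt body in one pass over the indicators.
pub fn build_prompt(indicators: &[Indicator]) -> String {
    // Rough size guess to limit regrowth; 64 bytes per line is an assumption.
    let mut prompt = String::with_capacity(indicators.len() * 64);
    for indicator in indicators {
        // writeln! appends in place, so no temporary String is built per line.
        // Writing to a String cannot fail, so the Result is ignored.
        let _ = writeln!(
            prompt,
            "- [{}] {}: {}",
            indicator.severity, indicator.id, indicator.evidence
        );
    }
    prompt
}
```

If the prompt needs several sections, each section could keep its own buffer and be filled during the same loop, then concatenated once at the end.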
🟢 P2 — Archive entry cloning in scan pipeline
src/scan.rs:150-151:
path: root_label.clone(),
bytes: bytes.clone(),
bytes.clone() on the full archive content is expensive. For large mods (10MB+), this duplicates the entire byte buffer. Consider using Arc<[u8]> or passing by reference.
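A minimal sketch of the Arc<[u8]> option, assuming a hypothetical `ArchiveEntry` struct whose field names follow the snippet at src/scan.rs:150-151. The `root_entry` helper is illustrative only.

```rust
use std::sync::Arc;

/// Hypothetical entry type; the real struct in src/scan.rs may differ.
pub struct ArchiveEntry {
    pub path: String,
    pub bytes: Arc<[u8]>,
}

/// Builds the root entry by sharing the upload buffer instead of copying it.
pub fn root_entry(root_label: &str, bytes: &Arc<[u8]>) -> ArchiveEntry {
    ArchiveEntry {
        path: root_label.to_string(),
        // Arc::clone only bumps a reference count; the bytes are not copied.
        bytes: Arc::clone(bytes),
    }
}
```

Converting the uploaded `Vec<u8>` with `Arc::from` copies the buffer once; every later `Arc::clone` is constant-time regardless of archive size.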
🟢 P2 — Metadata analysis allocations
src/analysis/metadata.rs (22KB) is the largest analysis file:
- Creates MetadataFinding structs with cloned strings for every entry
- analyze_layer() iterates all entries in a layer, cloning paths for each finding
- analyze_fabric_metadata(), analyze_forge_metadata(), analyze_spigot_metadata() all build finding vectors without capacity hints
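Where a finding mostly carries strings that already exist (the entry path, a fixed message) and only occasionally needs formatted text, Cow<str> lets it borrow by default and allocate only when required. This is a sketch under stated assumptions: the fields of `MetadataFinding`, the signature of `analyze_layer`, and both detection rules are hypothetical and chosen only to show the borrowed and owned cases.

```rust
use std::borrow::Cow;

/// Hypothetical finding shape; the real MetadataFinding fields are not shown here.
#[derive(Debug)]
pub struct MetadataFinding<'a> {
    pub path: Cow<'a, str>,
    pub detail: Cow<'a, str>,
}

/// Illustrative rules only: flags nested jars and paths containing "..".
pub fn analyze_layer<'a>(paths: &'a [String]) -> Vec<MetadataFinding<'a>> {
    let mut findings = Vec::new();
    for path in paths {
        if path.ends_with(".jar") {
            findings.push(MetadataFinding {
                // Both strings are borrowed, so this finding allocates nothing.
                path: Cow::Borrowed(path.as_str()),
                detail: Cow::Borrowed("nested jar"),
            });
        } else if path.contains("..") {
            findings.push(MetadataFinding {
                path: Cow::Borrowed(path.as_str()),
                // Only dynamically built text pays for an allocation.
                detail: Cow::Owned(format!(
                    "path traversal in {} segments",
                    path.split('/').count()
                )),
            });
        }
    }
    findings
}
```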
Benchmarks Suggested
- Add criterion benchmarks for:
  - run_capability_detectors() with a 1000-entry evidence index
  - generate_verdict() with 50+ indicators
  - analyze_metadata() on a multi-layer mod
- Track allocation count with dhat or jemalloc profiling on a real 10MB mod
- Set up CI perf regression: fail if scan time on a fixed fixture exceeds baseline by >10%
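For a quick allocation count before wiring up dhat or jemalloc, a std-only counting allocator can serve as a rough stand-in. This is not dhat and gives no call-site attribution; `CountingAllocator` and `count_allocations` are hypothetical names, and the counter is process-wide, so allocations from other threads would be included.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Wraps the system allocator and counts every allocation request.
pub struct CountingAllocator;

static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATIONS.fetch_add(1, Ordering::Relaxed);
        // Safety: forwards the caller's layout unchanged to the system allocator.
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        // Safety: `ptr` came from `System.alloc` with this same layout.
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static GLOBAL: CountingAllocator = CountingAllocator;

/// Runs `work` and returns its result plus the allocations made while it ran.
pub fn count_allocations<T>(work: impl FnOnce() -> T) -> (T, usize) {
    let before = ALLOCATIONS.load(Ordering::Relaxed);
    let value = work();
    let after = ALLOCATIONS.load(Ordering::Relaxed);
    (value, after - before)
}
```

Wrapping a scan of a fixed fixture in `count_allocations` before and after a change gives a cheap signal on whether a fix actually reduced allocations. Reallocations are counted too, because the default `realloc` goes through `alloc`.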
Low-Hanging Fruit (Estimated Impact)
| Fix | Effort | Impact |
|---|---|---|
| Replace bytes.clone() with Arc<[u8]> in scan.rs | Low | High for large mods |
| Add Vec::with_capacity() in signature matching loop | Low | Medium |
| Use &str / Cow<str> for indicator fields passed between stages | Medium | Medium |
| Single-pass verdict prompt builder instead of 3 iterations | Medium | Medium |
| HashMap::with_capacity(8) for category/severity counts | Low | Low |
Files Analyzed
- src/lib.rs (17.8KB): main analysis pipeline
- src/scan.rs (19.6KB): scan orchestrator
- src/verdict.rs (38.1KB): AI verdict generation
- src/analysis/metadata.rs (22.1KB): metadata analysis
- src/detectors/mod.rs (6.8KB): detector dispatch
- All 11 detector files (src/detectors/capability_*.rs)