faster cactus-phast on fragmented references by glennhickey · Pull Request #1946 · ComparativeGenomicsToolkit/cactus

glennhickey · 2026-06-30T13:09:19Z

Don't bother running on unaligned contigs.

Also: add general bed selection option for cactus-phast.

…ntigs

Restrict cactus-phast to the reference regions worth scoring so the chunker doesn't waste time on contigs phyloP can't score:

- --bedRanges <ranges.bed>: restrict the analysis to the reference ranges in a BED file (same option name/format as cactus-hal2maf). The BED parser is factored into a shared maf_chunk.parse_bed_ranges(), now imported by both tools. Ranges are clamped to contig length, and overlapping/touching ranges are merged (so the per-base wig has no duplicate positions); out-of-range intervals are warned about.
- Automatic exclusion of reference contigs unaligned to anything. phast_setup (the one job that already localizes the HAL -- no extra HAL copy) runs halAlignedExtract on the reference, a single scan of its top segments, to find contigs with at least one aligned base and drops the rest at planning time. Default on; --keepUnalignedContigs disables it. Only applied for a leaf reference (halAlignedExtract reports alignment to the parent only, so an internal/ancestral reference is skipped); degrades to "process all" if the scan fails or returns nothing.

Co-Authored-By: Claude Opus 4.8 <[email protected]>
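The clamping and merging behaviour described for --bedRanges can be illustrated with a small sketch. This is not the code of maf_chunk.parse_bed_ranges(); the function name `clamp_and_merge`, its arguments, and the handling of contigs missing from the length table are assumptions made for illustration. It takes BED-style half-open intervals, clamps them to contig length, merges overlapping or touching ranges, and collects a warning for each interval it had to clamp or skip.

```python
from collections import defaultdict


def clamp_and_merge(ranges, contig_lengths):
    """Clamp (contig, start, end) half-open ranges and merge them per contig.

    Hypothetical helper, not the cactus implementation.
    Returns (merged, warnings) where merged maps contig -> sorted list of
    non-overlapping, non-touching (start, end) tuples.
    """
    by_contig = defaultdict(list)
    warnings = []
    for contig, start, end in ranges:
        length = contig_lengths.get(contig)
        if length is None:
            # Assumption: unknown contigs are skipped with a warning.
            warnings.append(f"{contig}: not in reference, skipped")
            continue
        c_start, c_end = max(0, start), min(end, length)
        if (c_start, c_end) != (start, end):
            warnings.append(
                f"{contig}:{start}-{end} clamped to {c_start}-{c_end}"
            )
        if c_start >= c_end:
            continue  # nothing left after clamping
        by_contig[contig].append((c_start, c_end))

    merged = {}
    for contig, intervals in by_contig.items():
        intervals.sort()
        out = [list(intervals[0])]
        for start, end in intervals[1:]:
            if start <= out[-1][1]:
                # Overlapping or touching: extend the previous interval.
                out[-1][1] = max(out[-1][1], end)
            else:
                out.append([start, end])
        merged[contig] = [tuple(iv) for iv in out]
    return merged, warnings


if __name__ == "__main__":
    lengths = {"chr1": 100, "chr2": 50}
    bed = [("chr1", 0, 10), ("chr1", 10, 20), ("chr1", 15, 30),
           ("chr1", 90, 150), ("chr2", 5, 8)]
    merged, warnings = clamp_and_merge(bed, lengths)
    print(merged)
    print(warnings)
```

Merging touching ranges as well as overlapping ones is what keeps any reference position from being emitted twice in a per-base output.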

The chunker runs --chunkCores taffy|mafDuplicateFilter|bgzip pipelines concurrently but requested no memory, so it ran at Toil's ~2 GiB default regardless of -j. On a many-way MAF at -j 32 that starves the pipelines: a taffy child is OOM-killed and the truncated stream surfaces downstream as mafDuplicateFilter "premature end to maf file".

A 577-way run OOMs at 2 GiB but completes under 4 GiB at -j 32 (~100-130 MiB/pipeline), so request 128 MiB/core + a 2 GiB base (~6 GiB at -j 32), overridable with the new --chunkMemory; --doubleMem still covers pathologically dense regions.

Co-Authored-By: Claude Opus 4.8 <[email protected]>
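The memory request described in this commit is simple arithmetic: a 2 GiB base plus 128 MiB per chunker core, unless the user overrides it. The sketch below only reproduces that formula; the function name `chunk_memory_bytes` and its parameters are hypothetical and are not taken from the cactus source.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def chunk_memory_bytes(chunk_cores, override=None,
                       per_core=128 * MIB, base=2 * GIB):
    """Memory to request for the chunker job, in bytes.

    Hypothetical helper: 'override' stands in for a user-supplied
    --chunkMemory value; when absent, use base + per_core * cores.
    """
    if override is not None:
        return override
    return base + per_core * chunk_cores


if __name__ == "__main__":
    # 32 cores: 2 GiB + 32 * 128 MiB = 2 GiB + 4 GiB = 6 GiB
    print(chunk_memory_bytes(32) / GIB)
```

At 32 cores this gives exactly 6 GiB, which matches the "~6 GiB at -j 32" figure in the commit message and leaves headroom over the roughly 4 GiB the 577-way run was observed to need.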

glennhickey and others added 2 commits June 29, 2026 16:38

glennhickey wants to merge 2 commits into master from phast-release
