A Rust port of Bowtie 2, aiming
for drop-in compatibility on the supported subset: same .bt2 index format,
same SAM output, same default scoring and end-to-end alignment semantics.
Status: alpha. Not for production. The default-config paired-end end-to-end pipeline matches Bowtie 2 closely enough for diffing experiments (see Validation), but there are known feature gaps (see Limitations).

- `.bt2` index format: `rusttie-build` produces files byte-identical to `bowtie2-build` on the test corpora (lambda phage, multi-contig with N gaps, human chr22). Existing `bowtie2-build` indexes load and align correctly with `rusttie`.
- Paired-end end-to-end alignment with BT2-faithful semantics:
  - BT2's default scoring (`--mp 6,2 --rdg 5,3 --rfg 5,3 --score-min L,-0.6,-0.6`).
  - Quality-scaled mismatch penalties.
  - Multi-seed search with seed length 22 and BT2's `S,1,1.15` interval.
  - Descent driver with `-D` failure budget and `-R` re-seeding.
  - Mate-rescue for missing-partner alignments (BT2's `extendSeedsPaired` mate-find step).
  - BT2-faithful paired MAPQ (`BowtieMapq2` with summed pair scores + second-best concordant pair).
- SAM and BAM output (BAM via `noodles-bam`).
- Gzipped FASTQ input.
- Multi-threading via `rayon`: one parallel-map per batch, deterministic output order.

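
As a rough illustration of the default scoring listed above, the sketch below evaluates the `L,-0.6,-0.6` minimum-score function, the quality-scaled mismatch penalty for `--mp 6,2`, and the `S,1,1.15` seed interval. The formulas follow the Bowtie 2 manual. The function names are hypothetical, RustTie's internals may differ, and both the integer truncation in `min_score` and the lack of final rounding in `seed_interval` are assumptions.

```rust
// Illustrative only: hypothetical helper names, not RustTie's actual API.

/// Default `--mp MX,MN` bounds.
const MP_MAX: i64 = 6;
const MP_MIN: i64 = 2;

/// Decode one Phred+33 quality character into a Phred score.
fn phred33(c: u8) -> u8 {
    c.saturating_sub(33)
}

/// `--score-min L,-0.6,-0.6`: f(len) = -0.6 + -0.6 * len.
/// Assumption: the result is truncated toward zero, like a C integer cast.
fn min_score(read_len: usize) -> i64 {
    let f = -0.6 + -0.6 * read_len as f64;
    f as i64
}

/// Quality-scaled mismatch penalty: MN + floor((MX - MN) * min(Q, 40) / 40).
fn mismatch_penalty(phred_q: u8) -> i64 {
    let q = f64::from(phred_q.min(40));
    MP_MIN + ((MP_MAX - MP_MIN) as f64 * q / 40.0).floor() as i64
}

/// `S,1,1.15` seed interval before any rounding: f(len) = 1 + 1.15 * sqrt(len).
fn seed_interval(read_len: usize) -> f64 {
    1.0 + 1.15 * (read_len as f64).sqrt()
}
```

For a 100 bp read this gives a minimum score of -60 and a seed interval of about 12.5. A mismatch costs 6 at a Q40 base, 4 at Q20, and 2 at Q0.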
Synthetic (10k pairs of 100 bp wgsim reads from human chr22, GRCh38; 20,000 reads total):
| Metric | RustTie vs Bowtie 2 |
|---|---|
| `.bt2` files | byte-identical |
| Reads mapped | 19,983 / 20,000 (BT2: 20,000) |
| Position agreement | 97.8% |
| TLEN agreement | 98.8% |
| CIGAR agreement | 100.0% |
| AS / NM agreement | 99.6% |
| MD agreement | 99.3% |
| MAPQ agreement | 93.9% |
| Wall time (`-p 8`) | 0.8s (BT2: 0.48s) |

Real-data (NA12878 mitochondrial paired-end Illumina reads from nf-core/test-datasets, 322,856 reads, hg38 chrM reference):
| Metric | RustTie vs Bowtie 2 |
|---|---|
| `.bt2` files | byte-identical |
| Reads mapped | 1,039 (BT2: 1,047) — 99.2% of BT2's recall |
| Position agreement | 97.5% |
| MAPQ agreement | 96.8% |
| CIGAR / AS / NM / MD | 96.5 – 96.9% |

Reproducible via `scripts/chr22_validate.sh` and `scripts/real_data_validate.sh`. The chr22 script now emits a per-bin MAPQ-disagreement breakdown via `scripts/mapq_diff.py`.

The default-path MAPQ disagreement on the synthetic corpus is structural, not a tuning issue. It comes from BT2's paired-mode descent emitting pair candidates from joint bilateral seed extension: each entry in BT2's `rs1_` / `rs2_` parallel lists (`aln_sink.cpp:1413`) is one (r1, r2) tuple produced together when both mates extended concordantly from related seeds. RustTie's default path aligns each mate independently and then mate-rescues from top-K anchors, so our pair pool is missing the close alternates BT2 finds via joint extension. Those alternates often don't even appear in BT2's output SAM, but they affect the `bestUnchosenCScore` input to MAPQ (`unique.h:234`).

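
To make the role of the second-best pair concrete, here is a minimal sketch, assuming a pool of concordant pair candidates whose mate scores are summed. `PairCandidate` and `best_and_unchosen` are hypothetical names. This is not BT2's or RustTie's actual bookkeeping, only the shape of the input the MAPQ step sees.

```rust
// Illustrative only: hypothetical types, not RustTie's or BT2's actual data structures.

/// One concordant pair candidate with per-mate alignment scores
/// (0 = perfect, negative = worse).
#[derive(Debug, Clone, Copy)]
struct PairCandidate {
    score1: i64,
    score2: i64,
}

/// Returns (best summed score, best summed score among the pairs not chosen).
/// The second value is `None` when the pool holds fewer than two candidates.
fn best_and_unchosen(pool: &[PairCandidate]) -> (Option<i64>, Option<i64>) {
    let mut sums: Vec<i64> = pool.iter().map(|p| p.score1 + p.score2).collect();
    // Highest (least negative) summed score first.
    sums.sort_unstable_by(|a, b| b.cmp(a));
    (sums.first().copied(), sums.get(1).copied())
}
```

With candidates scoring (-5, -3) and (-6, -4), the function returns a best of -8 and an unchosen best of -10. If the pool never saw the second candidate, the unchosen score is absent and the pair looks unique, which is the kind of input difference that shifts MAPQ.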
`--joint-descent` is an experimental opt-in path that's a partial port of BT2's `extendSeedsPaired` (`aligner_sw_driver.cpp:1582`). It interleaves seed anchors from both mates in a single priority queue and mate-rescues during extension, emitting pair candidates to a bounded pool (default 50, matching BT2's `mhits+1`). Measured impact on chr22:

| Setting | MAPQ | Wall (-p 8) |
|---|---|---|
| default | 93.9% | 0.8s |
| `--seed-hit-cap 1000 -D 1000` | 94.2% | 1.6s |
| `--joint-descent` | 94.0% | 2.4s |
| `--joint-descent --seed-hit-cap 1000 -D 1000` | 94.1% | 6.8s |
| `--joint-descent --bt2-descent` | 94.3% | 2.9s |

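
The queue-and-pool structure of the joint path can be sketched as follows. This is a minimal sketch under stated assumptions, not the `--joint-descent` implementation: `Anchor`, `Mate`, `PairCandidate`, the integer `priority`, the `rescue` callback, and the choice to stop once the pool is full are all hypothetical.

```rust
use std::collections::BinaryHeap;

// Illustrative only: hypothetical types, not RustTie's actual descent driver.

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Mate {
    One,
    Two,
}

/// A seed anchor. Derived ordering compares `priority` first, so the
/// max-heap pops the highest-priority anchor regardless of which mate it came from.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct Anchor {
    priority: u32,
    mate: Mate,
    ref_pos: u64,
}

/// A pair candidate: the extended anchor plus the rescued position of its mate.
#[derive(Debug, Clone, Copy)]
struct PairCandidate {
    anchor: Anchor,
    mate_pos: u64,
}

/// Interleave anchors from both mates in one priority queue, try to rescue the
/// opposite mate for each, and collect results into a pool of at most `pool_cap`.
fn joint_descent<F>(m1: &[Anchor], m2: &[Anchor], pool_cap: usize, mut rescue: F) -> Vec<PairCandidate>
where
    F: FnMut(&Anchor) -> Option<u64>,
{
    let mut queue: BinaryHeap<Anchor> = m1.iter().chain(m2.iter()).copied().collect();
    let mut pool = Vec::new();
    while pool.len() < pool_cap {
        let Some(anchor) = queue.pop() else { break };
        if let Some(mate_pos) = rescue(&anchor) {
            pool.push(PairCandidate { anchor, mate_pos });
        }
    }
    pool
}
```

Because both mates feed one heap, a strong mate-2 anchor is extended before a weak mate-1 anchor, which is the interleaving the independent per-mate path lacks.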
After two precision fixes (`score_min` C-style truncation, plus bin thresholds using `(double)0.1f`-style f32→f64 casts to match BT2's `unique.h`), the headline MAPQ agreement rose from 92.3% to 93.9%. Then a partial port of BT2's seed-prioritization algorithm (`aligner_sw_driver.cpp:492-738`) landed behind `--bt2-descent`, replacing our "skip seeds with too many hits" strategy with BT2's weighted random sampling. That added another +0.3pp (94.0% → 94.3% with `--joint-descent`) and recovered 15 reads (19,983 → 19,998 mapped at default settings), for a modest wall-time increase (2.4s → 2.9s).

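
The second precision fix is easy to reproduce in isolation. The sketch below, with hypothetical function names and an illustrative 0.3 threshold (not BT2's actual bin table), shows how a threshold written as a C `float` literal and widened to `double` can flip a borderline comparison relative to a native `f64` literal.

```rust
// Illustrative only: the 0.3 threshold and function names are assumptions.

/// Threshold test using a native f64 literal.
fn passes_f64(best_over: i64, diff: i64) -> bool {
    best_over as f64 >= diff as f64 * 0.3_f64
}

/// Same test, but with the threshold stored as f32 and widened to f64,
/// mirroring a C expression such as `diff * (double)0.3f`.
fn passes_f32_widened(best_over: i64, diff: i64) -> bool {
    best_over as f64 >= diff as f64 * f64::from(0.3_f32)
}
```

With `best_over = 3` and `diff = 10`, `passes_f64` returns true while `passes_f32_widened` returns false, because `0.3_f32` widens to roughly 0.3000000119, slightly above 0.3. Reads sitting exactly on a bin boundary therefore land in different MAPQ bins unless the cast is replicated.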
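Notably, `--joint-descent --bt2-descent` at default settings now edges out brute-force hi-cap (`--seed-hit-cap 1000 -D 1000`) on MAPQ (94.3% vs 94.2%) and is virtually tied on recall (19,998 vs 20,000 reads mapped). On wall time it runs at less than half the cost of the joint-descent hi-cap configuration (2.9s vs 6.8s), though it is slower than default-path hi-cap (1.6s). Within the joint-descent path, the BT2-faithful algorithm is structurally more efficient than raising the caps.
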
The remaining ~5.7% gap to 100% needs Phase 2: `RedundantAlns` per-cell dedup plus BT2's exact `extendSeedsPaired` anchor iteration. Tracked in GitHub #1. See `rusttie.md` for the full per-phase development log.

```sh
# Requires Rust 1.85 (edition 2024). The sais-rs (https://crates.io/crates/sais-rs) dependency wants nightly;
# pin via rust-toolchain.toml in this repo.
cargo build --release
```

```sh
target/release/rusttie-build my_reference.fa my_index
# Produces my_index.{1,2,3,4,rev.1,rev.2}.bt2
```

You can also use an existing `bowtie2-build` index; they're byte-compatible.

```sh
# Paired-end
target/release/rusttie -p 8 -x my_index \
  -1 reads_R1.fq -2 reads_R2.fq \
  -S out.sam

# Single-end
target/release/rusttie -p 8 -x my_index \
  -U reads.fq \
  -S out.sam

# Output to BAM (auto-detected by extension)
target/release/rusttie -p 8 -x my_index \
  -1 reads_R1.fq -2 reads_R2.fq \
  -S out.bam
```

Apart from a few RustTie-specific tuning and experimental flags (`--mate-rescue`, `--seed-hit-cap`, `--joint-descent`, `--bt2-descent`), the flag set is a subset of Bowtie 2's. Shared flags take the same meaning and defaults as upstream.

| Flag | Meaning |
|---|---|
| `-x <BASE>` | Index basename (`<base>.1.bt2` etc.) |
| `-1` / `-2` | Paired-end mate 1 / mate 2 FASTQ |
| `-U` | Single-end FASTQ |
| `-S` | Output SAM/BAM (extension auto-detected) |
| `-p` / `--threads` | Threads (0 = all cores) |
| `-D` / `--descent-budget` | Consecutive seed-extension failures (default 15) |
| `-R` / `--descent-reseed` | Max re-seedings on repetitive seeds (default 2) |
| `--mate-rescue <K>` | Mate-rescue from top-K anchors per side (default 3, 0 disables) |
| `--seed-hit-cap` | Per-seed hit cap (default 50; tuning knob) |
| `--joint-descent` | Experimental: joint paired-mode descent; see Known MAPQ gap |
| `--bt2-descent` | Experimental: BT2-faithful candidate sampling (requires `--joint-descent`). See Known MAPQ gap |
| `--mp MX,MN` | Mismatch penalty bounds |
| `--rdg O,E` | Read-gap open/extend |
| `--rfg O,E` | Reference-gap open/extend |
| `--score-min L,A,B` | Score-min function (only `L` supported) |
| `--no-unal` | Suppress unmapped records |
| `--no-head` | Suppress SAM header |
| `--very-fast` / `--fast` / `--sensitive` / `--very-sensitive` | Accepted as no-ops (defaults already match `--sensitive`) |

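
A minimal sketch of how comma-separated flag values such as `--score-min L,A,B` and `--mp MX,MN` might be parsed, assuming hand-rolled parsing with the standard library. `LinearFn`, `parse_score_min`, and `parse_int_pair` are hypothetical names and the error messages are illustrative; RustTie's actual CLI code may look different.

```rust
// Illustrative only: hypothetical helpers, not RustTie's actual CLI parser.

/// Linear function f(x) = constant + coeff * x, as selected by `L,A,B`.
#[derive(Debug, Clone, Copy, PartialEq)]
struct LinearFn {
    constant: f64,
    coeff: f64,
}

/// Parse a `--score-min` value; only the linear (`L`) function type is accepted.
fn parse_score_min(arg: &str) -> Result<LinearFn, String> {
    let parts: Vec<&str> = arg.split(',').map(str::trim).collect();
    if parts.len() != 3 {
        return Err(format!("expected TYPE,A,B but got {arg:?}"));
    }
    if parts[0] != "L" {
        return Err(format!(
            "unsupported function type {:?}; only L is supported",
            parts[0]
        ));
    }
    let constant = parts[1].parse::<f64>().map_err(|e| e.to_string())?;
    let coeff = parts[2].parse::<f64>().map_err(|e| e.to_string())?;
    Ok(LinearFn { constant, coeff })
}

/// Parse an integer pair such as `--mp 6,2`, `--rdg 5,3`, or `--rfg 5,3`.
fn parse_int_pair(arg: &str) -> Result<(i64, i64), String> {
    let (first, second) = arg
        .split_once(',')
        .ok_or_else(|| format!("expected two comma-separated integers but got {arg:?}"))?;
    let first = first.trim().parse::<i64>().map_err(|e| e.to_string())?;
    let second = second.trim().parse::<i64>().map_err(|e| e.to_string())?;
    Ok((first, second))
}
```

Rejecting non-`L` function types up front, rather than silently falling back to the default, keeps a Bowtie 2 command line that relies on another function type (for example `G,20,8`) from producing quietly different scores.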
These are not implemented and are left as future work:

- 64-bit "large" `.bt2l` indexes: references larger than ~4 Gbp (e.g., wheat, some metagenomes). Human GRCh38 (3.1 Gbp) fits in the small index and works.
- Local alignment (`--local`). End-to-end only.
- Mate-pair orientations other than FR. `--rf` / `--ff` are ignored.
- Custom fragment range (`-I` / `-X`). Hardcoded to `[0, 500]`.
- Mismatches in seeds (`-N 1`). We always use `-N 0`.
- Read group tagging (`--rg` / `--rg-id`). No `@RG` header injection.
- Multi-alignment reporting (`-k <int>` / `-a`). Always reports best.
- Read trimming flags (`-3` / `-5` / `--trim-to`).
- Quality encodings other than Phred+33.
- Compressed input other than gzip. No bzip2 / zstd.
- Stdin input (`-U -`).
- Multi-file input (`-U f1.fq,f2.fq`).
- Color-space. BT2 deprecated this; we don't implement it either.

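
As the list notes, only FR pairs within the hardcoded `[0, 500]` fragment range are supported. The following is a simplified sketch under stated assumptions: `MateAln`, `is_concordant_fr`, the fragment-length definition (leftmost forward base to rightmost base of either mate), and the rejection of pairs where the forward mate starts after the reverse mate are all illustrative and do not capture BT2's full containment and dovetail rules.

```rust
// Illustrative only: hypothetical types and a simplified notion of concordance.

const MIN_FRAG: u64 = 0;
const MAX_FRAG: u64 = 500;

/// One aligned mate: 0-based leftmost reference position, reference span, and strand.
#[derive(Debug, Clone, Copy)]
struct MateAln {
    start: u64,
    ref_len: u64,
    reverse: bool,
}

/// FR check: one mate on each strand, forward mate upstream,
/// fragment length within [MIN_FRAG, MAX_FRAG].
fn is_concordant_fr(m1: MateAln, m2: MateAln) -> bool {
    // FR requires exactly one forward-strand and one reverse-strand mate.
    if m1.reverse == m2.reverse {
        return false;
    }
    let (fwd, rev) = if m1.reverse { (m2, m1) } else { (m1, m2) };
    // Assumption: reject pairs whose forward mate starts after the reverse mate.
    if fwd.start > rev.start {
        return false;
    }
    let frag_end = (fwd.start + fwd.ref_len).max(rev.start + rev.ref_len);
    let frag_len = frag_end - fwd.start;
    (MIN_FRAG..=MAX_FRAG).contains(&frag_len)
}
```

A forward mate at 100 and a reverse mate at 300 (both spanning 100 bp) form a 300 bp fragment and pass. Moving the reverse mate to 600 gives a 600 bp fragment, which falls outside the fixed window.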
The workspace has four crates:

- `rusttie-io`: FASTQ/SAM/BAM I/O.
- `rusttie-index`: `.bt2` reader, BWT/FM-index, SA construction (via `sais-rs`).
- `rusttie-align`: seed-and-extend, descent driver, MAPQ, paired-end logic, mate-rescue.
- `rusttie-cli`: the `rusttie` and `rusttie-build` binaries.

The full development log is in rusttie.md. Bowtie 2's
source is vendored under vendor/bowtie2/ for reference (excluded from
published crates).
Dual-licensed under either of:
- Apache License, Version 2.0 (LICENSE-APACHE)
- MIT license (LICENSE-MIT)
at your option.
Bowtie 2 (vendored under vendor/bowtie2/) is licensed separately under
GPL-3.0; see vendor/bowtie2/LICENSE. It is
included here only as reference material for porting and testing — no
GPL-3 code is incorporated into RustTie's published binary or library.