fastqrab

The multi-tool of FASTQ (pre-)processing.

It filters, samples, slices, dices, quantifies, demultiplexes and validates FASTQ reads in any way you choose.

Define your own specific read transformation pipeline out of well-tested building blocks, in a self-documenting, easily audited configuration file format.

Read processing is fast, reliable and well tested (we have 100% test coverage and more than 800 end-to-end test cases).

Supports input/output in FASTQ/FASTA/BAM.

Getting started right away

1. Define a temporary run command

ABOVE="nix run github:TyberiusPrime/fastqrab"

or

ABOVE="docker run --rm ghcr.io/tyberiusprime/fastqrab:latest"

2. Run Your First Pipeline

Generate a basic quality report configuration from our first example cookbook:

$ABOVE cookbook 01 > my-first-pipeline.toml

Edit the input section to point to your FASTQ files:

nano my-first-pipeline.toml

Run it:

$ABOVE my-first-pipeline.toml

3. View your report

xdg-open output_report.html

Documentation

We have extensive documentation following the Diátaxis framework.

Further examples can be found in the cookbook section.

Full list of FASTQ manipulations supported

Please refer to the 'step' sections of our reference documentation.

Briefly, you can extract information out of reads (into 'tags'), filter reads, modify their sequence and quality data, validate them, generate statistics on them, and split the output (demultiplex).
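
To make the idea of a 'tag' concrete, here is a conceptual sketch in Python of extracting a region into a tag and then trimming it from a single read. It does not call fastqrab and does not reproduce its internals or output format: the Read class, the tags dictionary, the "umi" label and both helper functions are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Read:
    """A single FASTQ record plus a free-form tag store (hypothetical model)."""
    name: str
    seq: str
    qual: str
    tags: dict[str, str] = field(default_factory=dict)


def extract_region(read: Read, label: str, start: int, length: int) -> None:
    """Copy seq[start:start+length] into a tag; the read itself is unchanged."""
    read.tags[label] = read.seq[start:start + length]


def cut_start(read: Read, n: int) -> None:
    """Drop the first n bases together with their quality scores."""
    read.seq = read.seq[n:]
    read.qual = read.qual[n:]


# Toy read: 8 bases standing in for a UMI, followed by 8 bases of insert.
read = Read(name="read_1", seq="ACGTACGTTTGGCCAA", qual="IIIIIIIIFFFFFFFF")
extract_region(read, label="umi", start=0, length=8)
cut_start(read, n=8)
print(read.tags["umi"], read.seq, read.qual)
```

The point of the separation is that extraction only records information, while a later step decides what to do with it (store it, filter on it, or demultiplex by it).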

Status

It's in beta until the 1.0 release, but already quite usable.

All the major functionality and testing is in place, and I don't anticipate breaking changes.

Installation

This repo is a nix flake.

There are statically linked binaries in the GitHub releases section that will run on any Linux with a recent enough glibc.

Currently not packaged by any distribution.

Windows and macOS binaries are built for each release. Be advised that these do not see much testing.

It's written in Rust, so cargo build --release should work as long as you have zstd and cmake available. The same goes for cargo install fastqrab. The nix flake additionally offers a fully reproducible build and development environment.

Shell Completions

Shell completions are available for bash, fish, zsh, powershell, and elvish. After installation, generate completions for your shell:

# Bash - add to ~/.bashrc
source <(fastqrab completions bash)

# Fish - save to completions directory
fastqrab completions fish > ~/.config/fish/completions/fastqrab.fish

# Zsh - add to ~/.zshrc
source <(fastqrab completions zsh)

See the CLI documentation for more details.

Container image

A ready-to-run OCI image is published with each tag at ghcr.io/tyberiusprime/fastqrab.

# Docker
docker pull ghcr.io/tyberiusprime/fastqrab:latest
docker run --rm ghcr.io/tyberiusprime/fastqrab:latest --help

# Podman
podman pull ghcr.io/tyberiusprime/fastqrab:latest
podman run --rm ghcr.io/tyberiusprime/fastqrab:latest --help

Mount your working directory to feed a pipeline configuration:

docker run --rm -v "$(pwd)":/work ghcr.io/tyberiusprime/fastqrab:latest process input.toml

Usage

Refer to the full documentation or the binary's help page (shown when run without arguments) for details.

CLI: fastqrab process input.toml

We use a TOML file for configuration, because command lines are too limited and prone to misunderstandings.

And you should be writing down what you are doing anyway.

Here's a brief example:

[input]
    # supports multiple input files.
    # in at least three autodetected formats.
    read1 = ['fileA_1.fastq', 'fileB_1.fastq.gz', 'fileC_1.fastq.zstd']
    read2 = ['fileA_2.fastq', 'fileB_2.fastq.gz', 'fileC_2.fastq.zstd']
    index1 = ['index1_A.fastq', 'index1_B.fastq.gz', 'index1_C.fastq.zstd']
    index2 = ['index2_A.fastq', 'index2_B.fastq.gz', 'index2_C.fastq.zstd']


[[step]]
    # we can do a flexible report at any point in the pipeline
    # filename is output.(html|json)
    action = 'Report'
    name = "initial"
    duplicate_count_per_read = true
    count = true
    base_statistics = true

[[step]]
    # take the first five thousand reads
    action = "Head"
    n = 5000

[[step]]
    # extract UMI 
    action = "ExtractRegions"
    out_label = "region"
    # the umi is the first 8 bases of read1
    regions = [{ start = 0, length = 8, anchor="Start"}]
    source = 'read1' # all parts of a region must come from the same read

[[step]]
    # and place it in the read name
    action = "StoreTagInComment"
    in_label = "region"

[[step]]
    # now remove the UMI from the read sequence
    action = "CutStart"
    segment = 'read1'
    n = 8

[[step]]
    action = "Report"
    count = true # include read counts
    name = "post_filter"

[output]
    # generates output_1.fq and output_2.fq. For index reads see below.
    prefix = "output"
[[step]]
    action = 'OutputFASTQ' 
    # uncompressed. Suffix is determined from format
    compression = "Raw"
[[step]]
    action = 'OutputReport'
    json = true
    html = true

Canonical template

The repository ships an authoritative configuration scaffold at src/template.toml. When prompting an LLM or drafting a new pipeline, point it to that file so it can reference the full set of supported sections, comments, and examples.

Cookbooks

Looking for practical examples? Check out the cookbooks/ directory for complete, runnable examples demonstrating common use cases, or visit them in the documentation:

Basic Quality Report - Generate comprehensive quality metrics from FastQ files
UMI Extraction - Extract and handle Unique Molecular Identifiers
And many more...

Each cookbook includes:

Sample input data
Fully documented configuration files
Expected output for verification
Detailed README explaining the use case

Run any cookbook with:

git clone https://github.com/tyberiusprime/fastqrab
cd fastqrab/cookbooks/[cookbook-name]
fastqrab process input.toml

Citations

A manuscript is being drafted.

Contributions

PRs welcome.

If at any point you find the tool not doing what you expected it to, please open an issue so we can discuss how to improve it!
