AFQuery

AFQuery enables fast allele frequency queries on user-defined subsets of local genomic cohorts, without rescanning VCFs.

AFQuery is a bitmap-indexed engine that efficiently recomputes AC/AN/AF for dynamically defined subcohorts (e.g., by phenotype, sex, or sequencing technology), a common requirement in ACMG/AMP variant classification. It stores per-variant genotype data as Roaring Bitmaps in Parquet files and resolves sample filters into bitmaps that can be intersected in microseconds, enabling sub-100 ms queries on large cohorts. The system accounts for ploidy in sex chromosomes, adjusts AN based on sequencing technology, supports incremental updates, and runs locally using a file-based setup (Parquet + SQLite) without requiring server or cloud infrastructure.

Full Documentation→

When to use AFQuery

You need allele frequencies for phenotype or user-defined subcohorts
You work with mixed sequencing technologies or capture kit versions (WGS, WES, targeted panels)
You require fast, repeated queries without rescanning VCFs
You want a local, reproducible workflow without cloud or cluster dependencies

Features

Dynamic subcohort queries (<100 ms) — bitmap intersections at query time; no VCF re-scan required
Technology-aware — avoids bias when mixing WGS, WES, and panels using different BED capture indexes
Ploidy-aware — correct handling of sex chromosomes (PAR/non-PAR, chrX, chrY)
ACMG-compatible allele counting — AC/AN/AF computed per standard definitions
Flexible metadata filtering — arbitrary labels (ICD-10, HPO, custom fields) with inclusion/exclusion rules
Incremental updates — add or remove samples and update metadata without rebuilding the database
VCF annotation — annotate variants using subcohort-specific frequencies
FILTER/call quality tracking — failed calls (FILTER!=PASS) tracked per variant and reported as N_FAIL
Batch and region queries — query a single locus, a genomic region, or a list of variants from a file
Bulk CSV export — export all variant frequencies with optional disaggregation by sex, technology, or phenotype
Audit changelog — all database operations logged with timestamps and operator notes
Database validation — integrity checks with scripted exit codes
Portable and serverless — file-based system, no infrastructure required

Performance

Query latency: <100 ms (tested up to 50,000 samples)
Storage: ~2 bytes/sample/variant
Scales to millions of variants per chromosome

Comparison with Alternative Tools

Feature	AFQuery	bcftools	GATK GenomicsDB	Hail
Technology-aware AN	Yes	No	No	No
Metadata filtering	Arbitrary labels	No	No	Custom code
Ploidy-aware sex chromosomes	Yes	Manual	No	Manual
Dynamic subcohort queries	Yes	No	Limited	Requires code
FILTER/call quality tracking	Per variant	Manual	No	Manual
Incremental updates	Yes	No	Yes	No
Infrastructure required	None	None	Java/server	Spark cluster
Query latency (50K samples)	<100 ms	~5 min	<1 min	1–2 min

Algorithm Overview

AFQuery pre-indexes per-variant genotype data as Roaring Bitmaps stored in Parquet files. Each variant row holds three bitmaps: heterozygous carriers, homozygous alt carriers, and samples with FILTER!=PASS. Sample metadata (sex, phenotype, technology) is pre-serialized as bitmaps in SQLite.
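The three-bitmap layout can be illustrated with a small sketch. This is not AFQuery's code or schema: plain Python sets stand in for Roaring Bitmaps, and the names `VariantRow` and `count_in_subset` are hypothetical. It assumes a diploid locus, so a heterozygous carrier contributes one alternate allele and a homozygous alt carrier contributes two.

```python
from dataclasses import dataclass, field


@dataclass
class VariantRow:
    """Hypothetical stand-in for one variant row (not AFQuery's schema)."""
    het: set[int] = field(default_factory=set)      # sample IDs with a heterozygous call
    hom_alt: set[int] = field(default_factory=set)  # sample IDs with a homozygous alt call
    fail: set[int] = field(default_factory=set)     # sample IDs with FILTER != PASS


def count_in_subset(row: VariantRow, candidates: set[int]) -> dict[str, int]:
    """Intersect the candidate samples with each genotype set and count carriers."""
    n_het = len(row.het & candidates)
    n_hom = len(row.hom_alt & candidates)
    n_fail = len(row.fail & candidates)
    # Diploid assumption: one alt allele per het carrier, two per hom-alt carrier.
    return {"AC": n_het + 2 * n_hom, "N_HET": n_het, "N_HOM_ALT": n_hom, "N_FAIL": n_fail}


row = VariantRow(het={1, 4, 7}, hom_alt={2}, fail={7, 9})
print(count_in_subset(row, candidates={1, 2, 3, 4, 5}))
```

Because only carriers are stored, the cost of each intersection depends on how many samples carry the variant, not on how many samples are homozygous reference.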

At query time, the requested sample filter is resolved to a single candidate bitmap via bitmap intersections and differences — taking microseconds regardless of cohort size. For each variant, the candidate bitmap is intersected with the genotype bitmaps to compute AC/AN/AF. AN accounts for WES capture regions (via BED-indexed interval trees) and for ploidy on sex chromosomes (males are haploid on non-PAR chrX and chrY).
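The query-time steps described above can be sketched as follows. This is an illustration under stated assumptions, not AFQuery's implementation: Python sets replace Roaring Bitmaps, the set of samples whose technology covers the locus is passed in directly instead of being looked up in a BED interval tree, every sample is assumed to be labelled either male or female, and the function names and label strings are hypothetical.

```python
def resolve_candidates(groups, include, exclude=()):
    """Intersect all included label sets, then subtract every excluded one.

    `include` must name at least one label.
    """
    candidates = set.intersection(*(groups[label] for label in include))
    for label in exclude:
        candidates -= groups[label]
    return candidates


def allele_number(candidates, males, covered, male_copies=2, female_copies=2):
    """Sum chromosome copies over candidates whose technology covers the locus."""
    eligible = candidates & covered
    n_male = len(eligible & males)
    n_female = len(eligible) - n_male  # assumes every non-male sample is female
    return male_copies * n_male + female_copies * n_female


# Hypothetical metadata: label -> set of sample IDs.
groups = {"E11.9": {2, 3, 6, 7, 8}, "panel": {8}}
males = {5, 6, 7, 8}
covered = {1, 2, 3, 5, 6}  # samples whose capture regions include this locus

candidates = resolve_candidates(groups, include=["E11.9"], exclude=["panel"])
print(sorted(candidates))                                          # [2, 3, 6, 7]
print(allele_number(candidates, males, covered))                   # autosome: 6
print(allele_number(candidates, males, covered, male_copies=1))    # non-PAR chrX: 5
print(allele_number(candidates, males, covered, male_copies=1, female_copies=0))  # chrY: 1
```

Sample 7 matches the filter but is not covered at this locus, so it is left out of AN. Counting it would inflate the denominator and bias AF downward, which is the effect technology-aware AN is meant to avoid. AF is then AC divided by AN, with AN = 0 treated as undefined.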

Input Requirements

VCF files: normalized and consistent with the selected genome build (GRCh37 or GRCh38)
Sample metadata: must include sex, sequencing technology, and any fields used for filtering (e.g., phenotype)
BED files (optional): define capture regions for each sequencing technology

Quick Start

Example workflow from raw VCFs to query, export, and annotation:

pip install afquery
# Docker: see Installation docs for docker pull / run usage

# Build the database
afquery create-db --manifest samples.tsv --output-dir ./db/ --genome-build GRCh38

# Inspect the database
afquery info --db ./db/

# Query a single position, filtered to a phenotype
# (this example also restricts the subcohort to female samples)
afquery query --db ./db/ --locus chr1:925952 --phenotype E11.9 --sex female

# Query a genomic region
afquery query --db ./db/ --region chr1:900000-1000000

# Export BRCA1 variant frequencies to CSV
afquery dump --db ./db/ --output all_variants.csv --chrom chr17 --start 43044292 --end 43170327

# Annotate a VCF with cohort frequencies
afquery annotate --db ./db/ --input patient.vcf --output annotated.vcf --threads 12

# Add new samples to an existing database
afquery update-db --db ./db/ --add-samples new_samples.tsv
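
For repeated queries over several subcohorts, the CLI can be driven from a script. The sketch below is a hypothetical wrapper, not part of AFQuery: it uses only the `query` flags shown above (`--db`, `--locus`, `--phenotype`, `--sex`), makes no assumption about the output format, and uses the ICD-10 code I10 purely as a second illustrative label.

```python
import shlex
import subprocess


def build_query_cmd(db, locus, phenotype=None, sex=None):
    """Assemble an `afquery query` command from the flags shown in Quick Start."""
    cmd = ["afquery", "query", "--db", db, "--locus", locus]
    if phenotype is not None:
        cmd += ["--phenotype", phenotype]
    if sex is not None:
        cmd += ["--sex", sex]
    return cmd


def run_query(cmd):
    """Run the command and return raw stdout (requires afquery on PATH)."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


# Print the commands that would be run for two phenotype labels.
for code in ["E11.9", "I10"]:
    cmd = build_query_cmd("./db/", "chr1:925952", phenotype=code, sex="female")
    print(shlex.join(cmd))
```

Passing each command to `run_query` executes it. Building the argument list separately keeps the subcohort definitions easy to inspect and log before anything runs.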

Documentation

Citation

If you use AFQuery, please cite:

AFQuery: fast, metadata-aware allele frequency queries on local genomic cohorts.
(manuscript in preparation)
