qndx

Fast regex search indexer for large repositories.

qndx builds a local n-gram index over source files and uses it to narrow the search space before running the actual regex. For selective queries on large codebases, this is significantly faster than scanning every file (see Performance) -- while guaranteeing no false negatives.

How it works

Index: Extract overlapping trigrams (and sparse n-grams) from every file. Store them in a sorted lookup table (ngrams.tbl) with posting lists (postings.dat) that map each n-gram to the files containing it.
Search: Decompose the regex into required literal fragments, look up their n-gram hashes in the index, intersect the posting lists to get a small candidate set, then run the full regex only against those candidates.
Freshness: Track Git working tree state. Modified, added, and untracked files are re-indexed into a lightweight overlay that merges with the baseline index at query time, giving read-your-writes semantics without a full rebuild.
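
One way to picture the overlay merge is as set arithmetic on candidate paths. The sketch below is an illustration only: `merge_candidates` and its parameters are hypothetical names, and it assumes that baseline results for dirty files are dropped and replaced by whatever the overlay reports for them.

```rust
use std::collections::BTreeSet;

/// Merge baseline and overlay candidates at query time.
/// `dirty` holds every path whose baseline entry is stale
/// (assumed here: modified, added, or untracked files).
fn merge_candidates(
    baseline: &BTreeSet<String>,
    dirty: &BTreeSet<String>,
    overlay: &BTreeSet<String>,
) -> BTreeSet<String> {
    baseline
        .difference(dirty) // stale baseline entries are ignored
        .chain(overlay.iter()) // fresh entries come from the overlay
        .cloned()
        .collect()
}
```

With this shape, a file edited after the last index build can only become a candidate through the overlay, which is what gives read-your-writes behavior in the sketch.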

Every match returned by the index path is verified against the actual file content. The index only eliminates files that provably cannot match -- it never introduces false negatives.
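
The index, intersect, and verify steps can be sketched with a toy in-memory index. This is not qndx's implementation: `ToyIndex` is a hypothetical type, it keys on raw trigrams rather than hashes, it skips sparse n-grams, and a substring check stands in for the full regex.

```rust
use std::collections::{BTreeSet, HashMap};

/// Collect the distinct overlapping 3-byte windows of `text`.
fn trigrams(text: &[u8]) -> BTreeSet<[u8; 3]> {
    text.windows(3).map(|w| [w[0], w[1], w[2]]).collect()
}

/// Toy index: trigram -> sorted set of file ids.
struct ToyIndex {
    postings: HashMap<[u8; 3], BTreeSet<usize>>,
    files: Vec<String>,
}

impl ToyIndex {
    fn build(files: &[&str]) -> Self {
        let mut postings: HashMap<[u8; 3], BTreeSet<usize>> = HashMap::new();
        for (id, content) in files.iter().enumerate() {
            for gram in trigrams(content.as_bytes()) {
                postings.entry(gram).or_default().insert(id);
            }
        }
        ToyIndex {
            postings,
            files: files.iter().map(|s| s.to_string()).collect(),
        }
    }

    /// Intersect the posting lists of every trigram in `literal`.
    /// Literals shorter than 3 bytes yield no trigrams, so every file stays a candidate.
    fn candidates(&self, literal: &str) -> BTreeSet<usize> {
        let mut result: Option<BTreeSet<usize>> = None;
        for gram in trigrams(literal.as_bytes()) {
            let list = self.postings.get(&gram).cloned().unwrap_or_default();
            result = Some(match result {
                None => list,
                Some(acc) => acc.intersection(&list).copied().collect(),
            });
        }
        result.unwrap_or_else(|| (0..self.files.len()).collect())
    }

    /// Verify each candidate against its real content before reporting it.
    fn search(&self, literal: &str) -> Vec<usize> {
        self.candidates(literal)
            .into_iter()
            .filter(|&id| self.files[id].contains(literal))
            .collect()
    }
}
```

The candidate set may contain files that do not match (the trigrams can appear in a different order), but it can never miss a file that does, which is why the verification pass is enough to make results exact.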

Quick start

# Build
cargo build --release

# Index a repository
qndx index -r /path/to/repo

# Search
qndx search -r /path/to/repo "fn main"
qndx search -r /path/to/repo "TODO|FIXME|HACK" --stats
qndx search -r /path/to/repo "impl.*Iterator" --strategy trigram --stats

# Inspect the query plan without searching
qndx plan "DatabaseConnection"

The index is stored in <root>/.qndx/index/v1/ and reused automatically on subsequent searches. If no index exists, search falls back to a full scan.

Performance

Measured on a 722-file / 8.6 MB Rust codebase (querymt):

Query	Strategy	Candidates	Scan	Indexed	Speedup
`enum AgentMode`	trigram	8 / 722	27 ms	0.008 ms	3375x
`TODO`	trigram	45 / 722	27 ms	1.4 ms	19x
`self\.\w+`	trigram	214 / 722	28 ms	6.9 ms	4x
`pub fn`	trigram	240 / 722	27 ms	6.7 ms	4x
`impl.*for`	trigram	427 / 722	28 ms	9.9 ms	3x

The index reader uses memory-mapped I/O (memmap2), so query-time resident memory is proportional to pages touched during the search, not the index size. A 2 GB index (Linux kernel) requires only ~100 KB of resident memory for a typical query.

CLI reference

`qndx index`

Build or update the search index for a directory.

Behavior:

If no index exists, qndx performs a full build.
If an index exists and contains a Git base_commit, qndx checks for changes since that commit.
If no changes are detected, qndx exits without rebuilding.
If changes are detected, qndx rebuilds the index and records the new base commit.
If more than 50% of files changed, qndx reports a forced full rebuild.
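
The rules above amount to a small decision function. The sketch below uses hypothetical names (`IndexAction`, `decide`) and makes one assumption the list does not cover: an existing index without a recorded base_commit is treated like a missing index.

```rust
#[derive(Debug, PartialEq)]
enum IndexAction {
    FullBuild,
    UpToDate,
    Rebuild,
    ForcedFullRebuild,
}

fn decide(
    index_exists: bool,
    has_base_commit: bool,
    changed_files: usize,
    total_files: usize,
) -> IndexAction {
    // Assumption: without a base commit there is nothing to diff against.
    if !index_exists || !has_base_commit {
        return IndexAction::FullBuild;
    }
    if changed_files == 0 {
        return IndexAction::UpToDate;
    }
    // "More than 50%" expressed without floating point.
    if changed_files * 2 > total_files {
        return IndexAction::ForcedFullRebuild;
    }
    IndexAction::Rebuild
}
```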

qndx index [OPTIONS]

Options:
  -r, --root <ROOT>                    Root directory to index [default: .]
  -i, --index-dir <INDEX_DIR>          Index output directory
      --max-file-size <MAX_FILE_SIZE>  Maximum file size in bytes [default: 1048576]
      --hidden                         Include hidden files
      --binary                         Include binary files
      --full                           Force a full rebuild (disable incremental update)

`qndx search`

Search using regex, with optional index acceleration.

qndx search [OPTIONS] <PATTERN>

Options:
  -r, --root <ROOT>            Root directory to search [default: .]
  -i, --index-dir <INDEX_DIR>  Index directory
  -l, --files-only             Show only file names
      --stats                  Show timing and candidate statistics
      --scan                   Force scan-only mode (ignore index)
      --strategy <STRATEGY>    N-gram strategy: auto, trigram, sparse [default: auto]

Output format: path:line:column: matched_text
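
For scripts that consume this output, a line can be split on its first three colons. The parser below is a sketch with hypothetical names (`Match`, `parse_match`); it assumes the path itself contains no colon, so it would need adjusting for paths such as Windows drive letters.

```rust
#[derive(Debug, PartialEq)]
struct Match<'a> {
    path: &'a str,
    line: u32,
    column: u32,
    text: &'a str,
}

/// Parse one `path:line:column: matched_text` line.
fn parse_match(s: &str) -> Option<Match<'_>> {
    // Limit to 4 parts so colons inside the matched text are preserved.
    let mut parts = s.splitn(4, ':');
    let path = parts.next()?;
    let line = parts.next()?.parse().ok()?;
    let column = parts.next()?.parse().ok()?;
    let rest = parts.next()?;
    // Drop the single space that follows the third colon, if present.
    let text = rest.strip_prefix(' ').unwrap_or(rest);
    Some(Match { path, line, column, text })
}
```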

When --stats is enabled for indexed search, qndx collects and prints a summary plus stage timings:

3 matches in 8 files (174185 bytes, 8 candidates / 722 total, 12 lookups, strategy: trigram) in 0.008s [indexed]
  timing: open=3.412ms, plan=0.071ms, candidates=0.204ms, verify=4.033ms

`qndx plan`

Show the query plan for a pattern without running a search.

qndx plan [OPTIONS] <PATTERN>

Options:
      --strategy <STRATEGY>  Force a specific strategy [default: auto]

Example output:

Pattern: enum AgentMode

Literals: ["enum AgentMode"]

Trigram plan:
  lookups: 12
  cost:    12.00

Sparse plan: unavailable (13 sparse grams >= 12 trigrams, no reduction)

Selected:  trigram
Lookups:   12
Cost:      12.00
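
The lookup count in this example is consistent with one lookup per distinct trigram of the literal: "enum AgentMode" is 14 bytes long, which gives 12 overlapping trigrams. The sketch below reproduces that arithmetic and the rejection rule printed for the sparse plan. Both functions are hypothetical, and the real planner may weigh costs differently.

```rust
use std::collections::HashSet;

/// Number of distinct overlapping 3-byte windows in `literal`.
fn trigram_lookups(literal: &str) -> usize {
    literal.as_bytes().windows(3).collect::<HashSet<_>>().len()
}

/// A sparse plan only helps if it needs fewer lookups than the trigram plan.
fn sparse_available(sparse_grams: usize, trigrams: usize) -> bool {
    sparse_grams < trigrams
}
```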

`qndx bench` (feature-gated)

Benchmark reporting and budget checking (see Benchmarking). This command is available only when qndx-cli is built with the bench-tools feature.

cargo run -p qndx-cli --features bench-tools -- bench report
cargo run -p qndx-cli --features bench-tools -- bench check-budgets

Architecture

crates/
  qndx-core/     Shared types, file format, hashing, file walk, scan-only search
  qndx-index/    Index builder, memory-mapped reader, postings (Vec/Roaring/hybrid)
  qndx-query/    Regex decomposition, query planner, candidate resolution, verification
  qndx-git/      Git integration via gix (dirty detection, HEAD commit)
  qndx-cli/      CLI entrypoints
  qndx-bench/    Benchmark fixtures, report generation, budget checking

Data flow

                     build                              search
                     -----                              ------

  source files ──> walk + extract trigrams ──> ngrams.tbl      pattern
                   extract sparse n-grams ──> postings.dat        |
                   collect metadata       ──> manifest.bin        v
                                                           decompose regex
                                                                  |
                                                           plan (trigram vs sparse)
                                                                  |
                                                           lookup n-gram hashes
                                                           intersect posting lists
                                                                  |
                                                           candidate files
                                                                  |
                                                           read + verify (full regex)
                                                                  |
                                                           verified matches

Index files

The index is stored in three files under <root>/.qndx/index/v1/:

File	Magic	Contents
`ngrams.tbl`	`QXNG`	Sorted n-gram hash table (20 bytes per entry: hash, offset, length, flags)
`postings.dat`	`QXPO`	Concatenated posting blocks (tagged: varint-delta for small lists, Roaring for large)
`manifest.bin`	`QXMF`	Metadata and file paths (postcard-serialized)
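
To illustrate why varint-delta suits small posting lists, the sketch below stores the gaps between sorted file ids as LEB128-style varints. The exact byte layout, the block tag, and the function names are assumptions; docs/file-format.md is the authority on the real encoding.

```rust
/// Encode ascending ids as varint-encoded gaps (7 data bits per byte,
/// high bit set on every byte except the last of each value).
fn encode(ids: &[u32]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut prev = 0u32;
    for &id in ids {
        let mut delta = id - prev; // requires ids sorted ascending
        prev = id;
        loop {
            let byte = (delta & 0x7f) as u8;
            delta >>= 7;
            if delta == 0 {
                out.push(byte);
                break;
            }
            out.push(byte | 0x80);
        }
    }
    out
}

/// Decode bytes produced by `encode`. Malformed input is not handled.
fn decode(bytes: &[u8]) -> Vec<u32> {
    let mut ids = Vec::new();
    let (mut prev, mut cur, mut shift) = (0u32, 0u32, 0u32);
    for &b in bytes {
        cur |= u32::from(b & 0x7f) << shift;
        if b & 0x80 == 0 {
            prev += cur; // gaps accumulate back into absolute ids
            ids.push(prev);
            cur = 0;
            shift = 0;
        } else {
            shift += 7;
        }
    }
    ids
}
```

Four ids such as 3, 5, 300, 70000 take 7 bytes in this form instead of 16 as raw u32 values, because small gaps fit in a single byte.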

Each file has a 24-byte header: 4-byte magic, u32 version, u64 payload length, u64 rapidhash-v3 checksum.

See docs/file-format.md for the full specification.
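
As a rough illustration of the 24-byte header layout, the sketch below decodes the four fields in the order listed. `Header` and `parse_header` are hypothetical names, and little-endian byte order is an assumption. Verifying the checksum would additionally need a rapidhash implementation and is left out.

```rust
use std::convert::TryInto;

#[derive(Debug, PartialEq)]
struct Header {
    magic: [u8; 4],
    version: u32,
    payload_len: u64,
    checksum: u64,
}

/// Decode the fixed header; returns None if fewer than 24 bytes are given.
fn parse_header(bytes: &[u8]) -> Option<Header> {
    let h = bytes.get(..24)?;
    Some(Header {
        magic: h[0..4].try_into().ok()?,
        // Assumption: integers are stored little-endian.
        version: u32::from_le_bytes(h[4..8].try_into().ok()?),
        payload_len: u64::from_le_bytes(h[8..16].try_into().ok()?),
        checksum: u64::from_le_bytes(h[16..24].try_into().ok()?),
    })
}
```

A reader would typically compare `magic` against the expected value for the file (for example `QXNG` for ngrams.tbl) before trusting the remaining fields.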

Benchmarking

Synthetic benchmarks

# Run all benchmarks
cargo bench

# Run a specific benchmark group
cargo bench -- end_to_end_search
cargo bench -- postings_choice

Benchmark targets: serializer_choice, postings_choice, ngram_extract, query_planner, end_to_end_search, git_overlay.

Real corpus benchmarks

Benchmark against an actual codebase:

# Basic
QNDX_BENCH_CORPUS=~/src/linux cargo bench --bench real_corpus

# With corpus-specific patterns
QNDX_BENCH_CORPUS=~/src/linux \
QNDX_BENCH_PATTERNS=benchmarks/patterns/linux.txt \
cargo bench --bench real_corpus

# Quick validation (no Criterion iterations)
QNDX_BENCH_CORPUS=~/myproject cargo bench --bench real_corpus -- --test

# Limit files for large repos
QNDX_BENCH_MAX_FILES=5000 \
QNDX_BENCH_CORPUS=~/src/linux cargo bench --bench real_corpus

Environment variables:

Variable	Required	Description
`QNDX_BENCH_CORPUS`	Yes	Path to the codebase
`QNDX_BENCH_PATTERNS`	No	Path to patterns file (tab-separated `name\tpattern` or one pattern per line)
`QNDX_BENCH_NAME`	No	Override corpus name in reports
`QNDX_BENCH_MAX_FILES`	No	Limit number of files
`QNDX_BENCH_MAX_FILE_SIZE`	No	Override max file size (default: 1 MB)
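
The two accepted shapes of a patterns file can be read with a single pass over its lines. The sketch below is hypothetical (`parse_patterns` is not a qndx function) and assumes that blank lines are skipped and that an unnamed pattern uses its own text as the name.

```rust
/// Parse a patterns file into (name, pattern) pairs.
fn parse_patterns(input: &str) -> Vec<(String, String)> {
    input
        .lines()
        .filter(|l| !l.is_empty())
        .map(|l| match l.split_once('\t') {
            // `name<TAB>pattern`
            Some((name, pattern)) => (name.to_string(), pattern.to_string()),
            // Bare pattern: reuse the pattern as its name (assumption).
            None => (l.to_string(), l.to_string()),
        })
        .collect()
}
```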

HTML reports are generated by Criterion at target/criterion/real_{name}/report/index.html.

Regression tracking

Performance budgets are defined in benchmarks/budgets.toml. Critical budgets (end-to-end search, postings intersection) fail CI on violation. See docs/performance-budgets.md for details.

# Save a baseline
cargo bench -- --save-baseline main

# Compare against baseline
cargo bench -- --baseline main

# Check budgets
cargo run -p qndx-cli --features bench-tools -- bench check-budgets

Development

Build

cargo build
cargo build --release

Test

# All tests (202 tests across all crates)
cargo test --all-features

# Specific crate
cargo test -p qndx-index
cargo test -p qndx-query

# Differential tests (index results == scan results)
cargo test differential

# Regex edge cases
cargo test regex_edge_cases

Lint

cargo clippy --all-features --all-targets -- -D warnings
cargo fmt --all -- --check

Documentation

Document	Description
docs/architecture.md	Crate structure, data flow, design decisions
docs/file-format.md	On-disk index format specification
docs/decision-gates.md	Benchmark-backed architecture decisions (serializer, postings, n-gram strategy)
docs/performance-budgets.md	Per-benchmark-group regression thresholds
docs/regression-triage.md	Six-step process for investigating performance regressions
docs/release-gate.md	Release criteria and MVP definition of done

License

MIT
