Skip to content

olanokhin/bitplane-ann

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

8 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

bitplane-ann — progressive prefix pruning

Project: bitplane-ann License: Apache-2.0 Paper license: CC BY 4.0 Paper PDF DOI: 10.5281/zenodo.20517605 1M dataset experiments


Resolution Capacity: Beyond Reconstruction MSE for Progressive Approximate Nearest-Neighbor Search

Production RAG systems degrade as corpora grow because the quantization objective is wrong for progressive search. Reconstruction MSE preserves vector fidelity; progressive ANN needs early rank separation, balanced bit-plane partitions, and low conditional occupancy.

bitplane-ann introduces Resolution Capacity as that objective. The artifact shows that Hadamard rotation preserves prefix capacity on isotropic and anisotropic embeddings, while PCA and reconstruction-optimized rotations can collapse early-bit partitions even when reconstruction error is low.

This is the English repo+PDF research artifact for the June 2026 paper. It contains the canonical paper, compact public result summaries, Python reference experiments, official SAQ comparison notes, and R6 end-to-end beam+rerank evidence.

DOI: 10.5281/zenodo.20517605

PCA rotation collapses prefix capacity while Hadamard rotation preserves balanced bins and high recall

The core failure mode is visible in the first few bits: PCA concentrates variance, overloads early prefix bins, and wastes Resolution Capacity; Hadamard spreads signal across dimensions, keeping prefix bins balanced enough for high-recall beam filtering.

What Is In This Artifact

  • paper/ - canonical LaTeX source, compiled PDF, generated figures, and figure source CSVs.
  • experiments/python_reference/ - original Python reference experiments for Hadamard/PCA, early exit, Voronoi-local negative results, CAQ-style baseline, and Resolution Capacity diagnostics.
  • experiments/EXPERIMENT_REPORT_PYTHON.md - English journal for the Python research phase.
  • experiments/RUST_ACCELERATION_REPORT.md - English journal for Rust-backed 25k/1M validation, official SAQ, and R6 beam+rerank.
  • experiments/results/ - compact public CSV/JSON summaries supporting the paper and README claims.
  • docs/provenance.md - release provenance and attribution notes.

Paper Claims

Claim Measured setting Result Scope
Resolution Capacity BGE-small and E5-small, N=25,000, b=8 Hadamard keeps eta_b > 0.78; PCA falls to eta_b=0.256 on E5-small Rotation/distribution diagnostic
Hadamard vs PCA BGE-small and E5-small Hadamard improves Recall@8 by 19-35 percentage points Progressive retrieval objective
OPQ/PQ comparison E5-small, equal 8-bit-per-dimension budget Hadamard 99.6% vs OPQ/PQ 51.6% Recall@10 Equal-budget diagnostic
Isotropy Paradox Voronoi-local Hadamard experiments Local k-means cells do not improve bit-1 rank signal after Hadamard Negative finding
Oracle Gap Early-exit experiments Optimal exit is 1.00-1.07 bits; practical exit is 3.70-5.08 bits Stopping-confidence finding
Rust SIMD 25k parity checks and 1M-scale scans 41-45x faster than dense prefix-score materialization Kernel implementation result
R6 beam+rerank SIFT1M, BGE@1M, GIST1M 99.80-99.94% Recall@10 at 2.9-15.1 ms/query Full end-to-end pipeline
SAQ artifact BGE/E5 25k, AVX-512, SAQ commit 2163ebc SAQ reaches higher direct recall; bitplane direct prefix and beam+rerank occupy different regimes Closest-prior benchmark
Storage 8-bit scalar codes vs float32 vectors One uint8 per dimension, 4x smaller than float32, no graph overhead Memory tradeoff

HNSW remains much faster end-to-end in the current measurements. The paper positions bitplane-ann as a compact, graph-free progressive candidate-generation and capacity-control method whose practical differentiator is memory, not current latency.

Method

  1. Normalize vectors.
  2. Pad to the next power-of-2 dimension d' >= d.
  3. Apply randomized Walsh-Hadamard rotation H_d' * diag(s) * x, where s in {-1,+1}^d'.
  4. Lloyd-Max quantize each coordinate to an 8-bit scalar code using data-oblivious per-dimension codebooks.
  5. Store one contiguous uint8 array of shape N x d.
  6. At query time, compute prefix-L1 distance from the top b bits:
prefix_dist_b(q, c) = sum_j |(q_j >> (8-b)) - (c_j >> (8-b))|

For beam mode, keep floor(N / 2^b) candidates after each depth and rerank final survivors with float32 cosine/IP. The same uint8 array serves every bit depth: no graph, no centroid routing, no duplicated index layout.

Resolution Capacity

The paper uses the current variable names and definitions below.

Let B_{j,b}(X) be the set of non-empty b-bit scalar quantization bins for dimension j of corpus X, with |B_{j,b}(X)| <= 2^b.

Marginal Bin Load

R_b(X) = (1/d) * sum_j N / |B_{j,b}(X)|

R_b is the average number of corpus vectors per occupied bin per dimension. Under a perfectly uniform distribution, R_uniform ~= N / 2^b, and R_b >= R_uniform.

Marginal Resolution Capacity

eta_b(X) = R_uniform / R_b(X) = (N / 2^b) / R_b(X), in (0, 1]

eta_b=1 means perfect uniform partition utilization. eta_b << 1 means capacity collapse: a few bins overflow with imposters while many bins sit empty. Higher eta_b is better.

For direct prefix retrieval on BGE-small and E5-small text embeddings at N=25,000, b=8, the observed thresholds are:

Regime Condition Direct-mode behavior
Safe R_b < 130 and eta_b > 0.75 Recall@8 >= 99%
Caution 130 <= R_b <= 200 and 0.49 <= eta_b <= 0.75 Recall@8 90-99%
Danger R_b > 200 and eta_b < 0.49 Recall@8 < 80%

These thresholds are empirical direct-mode diagnostics, not corpus-size limits. At 1M scale, R_b=3,906 is far beyond the direct-mode danger region, so direct prefix recall drops to 71-85%. Beam+rerank recovers to 99.80-99.94% Recall@10 because the beam schedule intentionally keeps N / 2^b candidates and reranks them with float32 vectors.

Core Results

Resolution Capacity at 25k, b=8

Dataset Rotation R_b eta_b Recall@8
BGE-small PCA 190 0.514 80.85%
BGE-small Hadamard 100 0.976 99.8%
E5-small PCA 382 0.256 64.35%
E5-small Hadamard 124 0.787 99.6%

1M beam retention at depth 8

Dataset Dimensions Beam Beam retention
SIFT1M 128 3,906 / 1M 100.00%
BGE@1M 384 3,906 / 1M 100.00%
GIST1M 960 3,906 / 1M 100.00%

This retention table is a prior R0-R3 diagnostic: it asks whether the exact top-10 neighbors appear inside the depth-8 prefix beam. R6 is the authoritative end-to-end pipeline measurement.

R6 end-to-end beam+rerank

Dataset Recall@10 Total ms/query Bytes/vector Notes
SIFT1M 99.94% 2.87 128 depth=8, beam=3,906
BGE@1M 99.80% 5.60 384 depth=8, beam=3,906
GIST1M 99.84% 15.10 960 depth=8, beam=3,906

Diagnostic latency comparison at 1M

Dataset Method Recall@10 ms/query Bytes/vector
SIFT1M HNSW 99.76% 0.15 512 + graph
SIFT1M bitplane beam+rerank 99.94% 2.87 128
BGE@1M HNSW 99.64% 0.28 1,536 + graph
BGE@1M bitplane beam+rerank 99.80% 5.60 384
GIST1M HNSW 97.00% 0.98 3,840 + graph
GIST1M bitplane beam+rerank 99.84% 15.10 960

The paper's caveat is explicit: current Rust byte-SAD prefix-L1 kernels are 4-8x slower than FAISS exact at d >= 384, and HNSW is 15-20x faster end-to-end. The practical differentiator is compact storage and no graph overhead.

Official SAQ artifact vs bitplane direct-prefix diagnostic

SAQ was run from the official C++ artifact at commit 2163ebc on AVX-512 hardware with K=4,096, B in {1,8}, nprobe in {8,32,128,512}, and 4 threads. These are direct-retrieval operating points, not R6 beam+rerank.

Dataset Method Config Recall@10 ms/query
BGE-small SAQ B=1, nprobe=512 75.4% 92.2
BGE-small SAQ B=8, nprobe=512 96.3% 95.8
BGE-small bitplane direct prefix depth=8 76.1% 13.4
E5-small SAQ B=1, nprobe=512 68.2% 91.8
E5-small SAQ B=8, nprobe=512 93.3% 96.3
E5-small bitplane direct prefix depth=8 70.9% 13.4

Honest Caveats

  • No dedicated SIMD bitplane kernels yet; production speed requires AVX-512 bitplane kernels.
  • eta_b/R_b analysis is primarily validated at 25k; systematic sweeps across modalities and scales are future work.
  • R6 validates end-to-end beam+rerank latency, but HNSW remains much faster.
  • Bimodal adaptive exit is not confirmed; compute savings are consistent across queries, not query-dependent.
  • Official SAQ was benchmarked as-is with K=4,096; results may differ at K ~= sqrt(N).

Reproduce The Public Figures

Install the lightweight Python dependencies and rebuild the figures from tracked CSV/JSON summaries:

uv sync
uv run python paper/make_figures.py

The script writes:

  • paper/figures/fig_recall_curves.pdf
  • paper/figures/fig_scale_rust.pdf
  • matching *_source.csv files

Rebuild The Paper

cd paper
tectonic main.tex

If tectonic is unavailable, use a standard LaTeX installation:

cd paper
pdflatex main.tex
pdflatex main.tex

The canonical paper files are:

  • paper/main.tex
  • paper/main.pdf

Reproduce The Research Scripts

The Python reference scripts live under experiments/python_reference/.

Smoke example:

uv run --extra experiments python experiments/python_reference/experiment0_bitplanes.py \
  --corpus-size 1000 \
  --queries 50 \
  --sample-queries 5 \
  --prune-queries 10 \
  --out-dir experiments/results/python_reference_smoke

Full 25k runs may download Hugging Face datasets/models and may create local .npy caches. Those large caches are intentionally not tracked.

The R6 large-scale beam+rerank result is archived as compact output files under experiments/results/rust_acceleration/. Re-running R6 requires the external implementation workspace bitplane-infer, its Rust/Python backend, and local 1M-scale datasets; those large implementation and data directories are intentionally not vendored here.

Results To Read First

Repository Layout

paper/                    Canonical LaTeX paper and generated figures
experiments/python_reference/
                          Python reference experiments
experiments/results/       Compact public CSV/JSON result summaries
docs/assets/               README and explanatory visual assets
docs/provenance.md         Artifact provenance and attribution

How to Cite

@misc{bitplane-ann2026,
  author       = {Anokhin, Oleksandr},
  title        = {Resolution Capacity: Beyond Reconstruction MSE for Progressive Approximate Nearest-Neighbor Search},
  year         = {2026},
  publisher    = {Zenodo},
  doi          = {10.5281/zenodo.20517605},
  url          = {https://doi.org/10.5281/zenodo.20517605}
}

License

  • Software, scripts, configuration, and compact reproducibility artifacts: Apache-2.0. See LICENSE.
  • Paper source, compiled paper PDF, and paper figures under paper/: CC BY 4.0. See LICENSE-PAPER.md.

About

Research artifact for Resolution Capacity in progressive approximate nearest-neighbor search

Topics

Resources

License

Unknown, Unknown licenses found

Licenses found

Unknown
LICENSE
Unknown
LICENSE-PAPER.md

Stars

Watchers

Forks

Packages

 
 
 

Contributors