nest is maintained by hoff research. author: brenner cruvinel.
only the latest minor on main is supported.
| version | status |
|---|---|
| 0.3.x | supported (current) |
| 0.2.x | not supported, please upgrade |
| 0.1.x | not supported, please upgrade |
do not open a public github issue for security vulnerabilities.
use one of:
- private vulnerability report: https://github.com/hoffresearch/nest/security/advisories/new
- email: [email protected]
we aim to acknowledge within 72 hours and to publish a fix or mitigation within 14 days for confirmed reports. coordinated disclosure preferred; we credit reporters who request it.
things we treat as security bugs:
- malformed `.nest` files that trigger UB / OOB / panic in the rust runtime
- a citation collision (two distinct chunks producing the same `chunk_id`)
- a `content_hash` collision under the v1 hash domain separation
- a path that bypasses `model_hash` validation in `nest search-text` without the user passing `--skip-model-hash-check`
- secrets or credentials accidentally committed to the repository
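
The `content_hash` item above mentions hash domain separation. As a generic illustration only, the sketch below shows the idea: a distinct, unambiguous label is mixed into each hash so that the same bytes hashed for two purposes cannot yield the same digest. The tag strings, the length-prefix framing, and the helper name `tagged_hash` are assumptions made for this example and are not nest's actual v1 scheme.

```python
import hashlib


def tagged_hash(domain: bytes, payload: bytes) -> str:
    """SHA-256 over a length-prefixed domain tag followed by the payload.

    Hypothetical helper: it illustrates domain separation in general,
    not the layout nest uses for content_hash or chunk_id.
    """
    h = hashlib.sha256()
    # Length-prefix the tag so the tag/payload boundary is unambiguous.
    h.update(len(domain).to_bytes(4, "little"))
    h.update(domain)
    h.update(payload)
    return h.hexdigest()


payload = b"same bytes"
content = tagged_hash(b"example:content:v1", payload)  # made-up tag
chunk = tagged_hash(b"example:chunk:v1", payload)      # made-up tag
```

Under a scheme like this, a collision between two different domains, or between two different payloads in one domain, would require a SHA-256 collision, which is why such a collision is treated as a security bug and not a tuning issue.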
things we do not treat as security bugs:
- low recall on a particular corpus
- HNSW recall under user expectation (configuration tuning, see `--ef`)
- BM25 tokenizer degrading on CJK / thai / lao (documented limitation, see `AGENTS.md` known gaps)
- compressed vs raw size differences
- vulnerabilities in upstream sentence-transformers / huggingface stack; report those upstream first
- weaknesses in the embedding model itself (false positives, biased recall)
- configuration choices made by the operator (e.g. building a corpus with the placeholder `model_hash` and using `--skip-model-hash-check`)

when reporting, please include:
- the `.nest` `file_hash` and `content_hash` (`nest stats <file>` prints both)
- the runtime `simd_backend` and platform (`nest stats`)
- the exact CLI or python invocation
- a minimal reproducer if possible (a synthetic `.nest` is fine, see `crates/nest-format/tests/fixtures/`)
- whether you have a proposed mitigation

security properties and known gaps:
- the runtime (rust) never opens a network socket. queries are answered from mmap. the default query embedders are offline too: `ask`/`retrieve` use the vendored potion table (no network by construction), and the `search-text` sentence-transformers path forces `HF_HUB_OFFLINE`/`TRANSFORMERS_OFFLINE` unless you opt in with `NEST_ALLOW_DOWNLOAD=1` (or pass `--model-path`).
- `model_hash` is a granular fingerprint over the local model snapshot (config + tokenizer + weights + pooling + dim + normalize). a mismatch fails with a typed error, never silently. the CLI (`search-text`) enforces this; the Python `NestFile.retrieve` binding accepts `expected_model_hash` and the flagship `forge/retrieve.py` passes it by default, so the honesty gate holds on the Python surface too.
- `unsafe` lives in the SIMD dispatcher (`crates/nest-runtime/src/simd/`), the mmap reader (`crates/nest-runtime/src/mmap_file.rs`), and a handful of zero-copy view casts in `crates/nest-format/src/layout/` and `encoding/int8.rs`. the SIMD and mmap sites carry `// SAFETY:` comments; documenting the remaining `nest-format` sites is a tracked hardening item (do not assume every block is annotated).
- untrusted `.nest` files: the header/section/footer checksums are unkeyed SHA-256 (corruption detection, NOT authenticity); an attacker can recompute them, so `validate()` does not prove a file is trustworthy. Safety against a hostile file rests on the parser's memory-safety (bounds-checked indices, capped decompression/allocation); opening an untrusted corpus still executes that parser, so treat unknown `.nest` files with the same care as any untrusted input.
- release provenance: commits are signed (ssh signing), but release tags and `nest-cli` binaries are NOT yet cryptographically signed, and no SBOM is published per release. Treat a downloaded artifact as unverified against source until signed releases land. `Cargo.lock` is committed so the rust dependency set is pinned and auditable. This is a tracked hardening item.
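
To see why an unkeyed checksum detects corruption but not tampering, consider this minimal sketch. It uses only Python's standard `hashlib` and `hmac`. The byte strings, the key, and the helper name `checksum_ok` are made up for illustration, and the keyed variant is shown only for contrast; nothing here describes how nest lays out or verifies its checksums.

```python
import hashlib
import hmac


def checksum_ok(data: bytes, stored_hex: str) -> bool:
    """Unkeyed check: recompute SHA-256 and compare with the stored value."""
    return hashlib.sha256(data).hexdigest() == stored_hex


original = b"section bytes"
tampered = b"hostile bytes"

# Accidental corruption is caught: the data changed, the stored checksum did not.
stored = hashlib.sha256(original).hexdigest()
corruption_detected = not checksum_ok(tampered, stored)

# Tampering is not caught: the attacker rewrites the checksum along with the data.
forged = hashlib.sha256(tampered).hexdigest()
tampering_passes = checksum_ok(tampered, forged)

# Keyed contrast (HMAC): without the secret key, a valid tag cannot be forged.
key = b"illustrative-secret"  # hypothetical key, for contrast only
expected_tag = hmac.new(key, tampered, hashlib.sha256).digest()
attacker_tag = hmac.new(b"attacker-guess", tampered, hashlib.sha256).digest()
keyed_check_passes = hmac.compare_digest(attacker_tag, expected_tag)
```

Here `corruption_detected` and `tampering_passes` are both true while `keyed_check_passes` is false, which is the distinction between corruption detection and authenticity that the note on untrusted files draws.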