huginn

Blind document intelligence — extract structure, metadata, and retrieval insights from document collections you can't read.

Setup

git clone <repo-url>
cd huginn

# Place your documents in a directory, e.g.:
cp /path/to/your/documents/* _test-docs/

# Run (CPU-only — works on any host)
DOCUMENTS_PATH=./_test-docs docker compose up

# Or run with NVIDIA GPU (recommended for any model larger than 3B)
DOCUMENTS_PATH=./_test-docs docker compose -f docker-compose.yml -f docker-compose.gpu.yml up

Open http://localhost:3000 in a browser. On first boot, the setup wizard detects your hardware (GPU/VRAM via nvidia-smi when the GPU override is applied), recommends a chat model from a curated catalog of 13 options spanning CPU-viable through 140 GB VRAM, and downloads it on demand. You can swap models later from the Model Settings page.
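The hardware-based recommendation described above can be pictured with a small sketch. This is not huginn's actual wizard code: the catalog entries, the `minVramGb` field, and the decision to sum VRAM across GPUs are all assumptions made for illustration. The only external fact relied on is that `nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits` prints one MiB value per GPU.

```typescript
// Hypothetical catalog entry. huginn's real catalog has 13 models and is not reproduced here.
interface ModelOption {
  name: string; // placeholder identifier, not a real model tag
  minVramGb: number; // 0 means CPU-viable
}

// Illustrative entries only; names and thresholds are invented.
const catalog: ModelOption[] = [
  { name: "example-small", minVramGb: 0 },
  { name: "example-medium", minVramGb: 8 },
  { name: "example-large", minVramGb: 24 },
  { name: "example-xl", minVramGb: 140 },
];

// Parses the output of:
//   nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits
// One line per GPU, each a MiB count. Returns the summed VRAM in GB,
// or 0 when nothing parses (for example on a CPU-only host).
function parseVramGb(smiOutput: string): number {
  const totalMib = smiOutput
    .split("\n")
    .map((line) => Number.parseInt(line.trim(), 10))
    .filter((n) => Number.isFinite(n))
    .reduce((sum, n) => sum + n, 0);
  return totalMib / 1024;
}

// Picks the most demanding model that still fits into the available VRAM.
function recommendModel(vramGb: number, options: ModelOption[]): ModelOption {
  const fitting = options.filter((m) => m.minVramGb <= vramGb);
  if (fitting.length === 0) {
    throw new Error("catalog has no CPU-viable model");
  }
  return fitting.reduce((best, m) => (m.minVramGb > best.minVramGb ? m : best));
}
```

With this sketch, `recommendModel(parseVramGb("24576\n"), catalog)` selects the hypothetical `example-large` entry, and an empty `nvidia-smi` output falls back to the CPU-viable one.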

Reports are written to ./reports/ as JSON, Markdown, and a narrative summary.

How it works

huginn runs an 8-phase pipeline over your document folder:

Harvest: discovers files, computes checksums, and infers project/customer from the folder structure
Parse: extracts text from .docx, .xlsx, .pptx (officeparser) and .pdf (Apache Tika); detects language and headings
Fingerprint: builds a structural fingerprint and a semantic embedding (Ollama) per document, using headings only
Cluster: scores every document pair across 6 signals (including filename similarity, structure, embeddings, directory, and date) to find duplicate/version chains
References: extracts norm references (ISO/DIN/VDA/IATF), internal IDs, and chapter cross-references; resolves them across the corpus
Requirements: classifies sentences as MUSS/SOLL/KANN/DEKLARATIV (must/should/may/declarative) requirements by section; spot-checks the results with an LLM
Validate: runs consistency checks on parse success rate, requirement density, reference resolution, and version coverage
Report: writes three output files: structured JSON, human-readable Markdown, and an LLM-generated narrative summary
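To make the Cluster phase more concrete, here is a minimal sketch of pairwise scoring followed by chain building. It is an illustration under stated assumptions, not huginn's implementation: only the five signals named above are modelled, the weights and the 0.7 threshold are invented, and union-find is simply one common way to turn linked pairs into groups.

```typescript
// Per-pair signal scores, each assumed to be normalised to [0, 1].
interface PairSignals {
  filename: number;
  structure: number;
  embedding: number;
  directory: number;
  date: number;
}

// Illustrative weights (assumption); they sum to 1 so the score stays in [0, 1].
const WEIGHTS: PairSignals = {
  filename: 0.3,
  structure: 0.25,
  embedding: 0.25,
  directory: 0.1,
  date: 0.1,
};

// Weighted sum of all signals for one document pair.
function pairScore(s: PairSignals): number {
  return (Object.keys(WEIGHTS) as (keyof PairSignals)[]).reduce(
    (sum, key) => sum + WEIGHTS[key] * s[key],
    0,
  );
}

interface ScoredPair {
  a: number; // index of the first document
  b: number; // index of the second document
  score: number;
}

// Links every pair scoring at or above the threshold and returns the
// resulting groups of document indices (union-find). Singletons are dropped.
function buildChains(
  docCount: number,
  pairs: ScoredPair[],
  threshold = 0.7,
): number[][] {
  const parent = Array.from({ length: docCount }, (_, i) => i);
  const find = (x: number): number => {
    while (parent[x] !== x) {
      parent[x] = parent[parent[x]]; // path halving
      x = parent[x];
    }
    return x;
  };
  for (const { a, b, score } of pairs) {
    if (score >= threshold) parent[find(a)] = find(b);
  }
  const groups = new Map<number, number[]>();
  for (let i = 0; i < docCount; i++) {
    const root = find(i);
    const members = groups.get(root) ?? [];
    members.push(i);
    groups.set(root, members);
  }
  return Array.from(groups.values()).filter((g) => g.length > 1);
}
```

Because linking is transitive, documents 0 and 2 end up in the same chain when 0 matches 1 and 1 matches 2, even if 0 and 2 were never scored above the threshold directly. That suits version chains, where the first and last revision may differ substantially.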

Everything runs locally. No document content leaves the machine.
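As an illustration of the References phase listed above, the following sketch pulls norm references out of plain text with a regular expression. The pattern, the `NormReference` shape, and the function name are assumptions for this example. Real norm citations vary more than this pattern covers, and huginn's own extractor may work differently.

```typescript
// Hypothetical result shape for one extracted norm reference.
interface NormReference {
  body: string; // issuing body as written, e.g. "DIN EN ISO"
  number: string; // document number, e.g. "9001" or "6.3"
  offset: number; // character position in the source text
}

// Assumed pattern: an issuing body (ISO, DIN, VDA, IATF), optional further
// prefixes such as EN or IEC, then a number with optional "-" or "." parts.
const NORM_PATTERN =
  /\b((?:ISO|DIN|VDA|IATF)(?:[ \/](?:EN|ISO|IEC|TS))*)\s+(\d+(?:[-.]\d+)*)/g;

function extractNormReferences(text: string): NormReference[] {
  const refs: NormReference[] = [];
  NORM_PATTERN.lastIndex = 0; // global regexes keep state between calls
  let match: RegExpExecArray | null;
  while ((match = NORM_PATTERN.exec(text)) !== null) {
    refs.push({ body: match[1], number: match[2], offset: match.index });
  }
  return refs;
}
```

Resolving references across the corpus could then be a lookup of each `body` and `number` pair against the set of documents found during Harvest. That step is left out here because its matching rules are not described in this README.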

