A local, privacy-first question-answering system for your internal DevOps documentation. Index your runbooks, guides, and wikis — then ask questions in plain English and get answers grounded in your actual docs, with source citations. Everything runs on your machine using Ollama for LLM inference and ChromaDB as the vector store.
Retrieval-Augmented Generation (RAG) combines a vector search engine with a large language model. When you ask a question, the system first retrieves the most relevant documentation chunks from ChromaDB by semantic similarity, then passes those chunks to the LLM as context for generating the answer. Grounding the model in retrieved text reduces hallucination and lets each answer be traced back to a source document.
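The "retrieve, then generate" step can be pictured as plain prompt assembly. The following is a minimal sketch, not the code in query.py: the `Chunk` class, the `build_prompt` helper, the instruction wording, and the chunk texts are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """A retrieved documentation chunk (hypothetical structure)."""
    source: str
    text: str


def build_prompt(question: str, chunks: list[Chunk]) -> str:
    """Assemble a grounded prompt: numbered context blocks, then the question."""
    context = "\n\n".join(
        f"[{i}] ({c.source})\n{c.text}" for i, c in enumerate(chunks, start=1)
    )
    return (
        "Answer using only the context below. "
        "Cite sources by their bracketed number. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Illustrative chunk texts; real chunks would come from the vector store.
chunks = [
    Chunk("kubernetes-runbook.md", "Use `kubectl rollout undo deployment/<name>` to roll back."),
    Chunk("deployment-guide.md", "Blue-green deployments switch traffic between two environments."),
]
prompt = build_prompt("How do I rollback a Kubernetes deployment?", chunks)
print(prompt)
```

Numbering each context block gives the model a handle for citing sources, which is one plausible way to obtain traceable answers.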
┌─────────────────────────────────────────────────────────────┐
│ INGESTION PIPELINE │
│ │
│ ./docs/**/*.md ──► RecursiveCharacterTextSplitter │
│ ./docs/**/*.txt │ (chunk_size=500, overlap=50) │
│ ./docs/**/*.pdf │ │
│ └──► Ollama (nomic-embed-text) │
│ Embeddings │
│ │ │
│ ▼ │
│ ChromaDB (./chroma_db) │
│ Collection: devops_docs │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ QUERY PIPELINE │
│ │
│ User Question ──► Ollama (nomic-embed-text) │
│ Embed question │
│ │ │
│ ▼ │
│ ChromaDB: top-K search │
│ (cosine similarity) │
│ │ │
│ ▼ │
│ Retrieved chunks + metadata │
│ │ │
│ ▼ │
│ Ollama LLM (llama3.2) ──► Streaming Answer │
│ with source citations │
└─────────────────────────────────────────────────────────────┘
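The "top-K search (cosine similarity)" step in the query pipeline can be illustrated with a brute-force version in plain Python. This is only a sketch of the ranking idea: ChromaDB uses its own index rather than a linear scan, and the three-dimensional vectors and chunk IDs below are made up for the example.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: list[float], indexed: dict[str, list[float]], k: int = 5) -> list[tuple[str, float]]:
    """Score every stored vector against the query and keep the k best."""
    scored = [(chunk_id, cosine_similarity(query, vec)) for chunk_id, vec in indexed.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy embeddings (hypothetical); real ones come from the embedding model.
index = {
    "kubernetes-runbook.md#12": [0.9, 0.1, 0.0],
    "incident-response.md#3": [0.1, 0.9, 0.0],
    "deployment-guide.md#7": [0.7, 0.3, 0.1],
}
results = top_k([1.0, 0.0, 0.0], index, k=2)
for chunk_id, score in results:
    print(f"{score:.3f}  {chunk_id}")
```

Raising `k` corresponds to the `--top-k` option: more chunks reach the LLM, at the cost of a longer prompt.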
- Python 3.10+
- Ollama installed and running locally (or via Docker)
- At least 8 GB RAM (16 GB recommended for llama3.2)
Python dependencies (see requirements.txt):
ollama>=0.3.0
chromadb>=0.5.0
langchain-text-splitters>=0.3.0
rich>=13.7.0
gradio>=4.44.0
Optional (for PDF ingestion):
pypdf>=4.0.0
cd /home/mercuryo/Proyectos/rag-devops-docs
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
# If not using Docker, make sure Ollama is running:
ollama serve &
# Pull required models
ollama pull nomic-embed-text
ollama pull llama3.2
Place your .md, .txt, or .pdf files in ./docs/. Example files are already in ./docs/example/ to get you started.
python ingest.py --verbose
# Interactive mode
python query.py
# Single question
python query.py --question "How do I rollback a Kubernetes deployment?"
# Web UI
python app.py
usage: ingest.py [-h] [--docs-dir DIR] [--reset] [--verbose]
options:
--docs-dir DIR Directory to scan for docs (default: ./docs)
--reset Delete and recreate the ChromaDB collection first
--verbose Show per-file chunk count during ingestion
Examples:
# Ingest from default ./docs directory
python ingest.py
# Ingest from a custom directory with verbose output
python ingest.py --docs-dir /mnt/wiki/runbooks --verbose
# Force re-ingest all documents (clear existing vectors)
python ingest.py --reset --verbose
# Ingest only the bundled example directory
python ingest.py --docs-dir ./docs/example
usage: query.py [-h] [--question QUESTION] [--top-k N] [--model MODEL]
[--interactive] [--json]
options:
--question, -q Ask a single question and exit
--top-k N Number of chunks to retrieve (default: 5)
--model MODEL Ollama model to use (default: llama3.2)
--interactive, -i Start interactive multi-turn mode
--json Output answer as JSON (useful for scripting)
Examples:
# Interactive mode (default when run with no arguments)
python query.py
# Single question
python query.py -q "What steps should I follow during a P0 incident?"
# Use a different model
python query.py --model mistral -q "Explain the blue-green deployment strategy"
# JSON output for scripting
python query.py --json -q "How do I scale a Kubernetes deployment?" | jq .answer
# Retrieve more context chunks
python query.py --top-k 8 -q "What are the database migration rules?"
Sample JSON output:
{
"question": "How do I rollback a Kubernetes deployment?",
"answer": "According to the Kubernetes Runbook, you can rollback...",
"sources": [
{
"source": "kubernetes-runbook.md",
"chunk_index": 12,
"relevance": 0.923
}
]
}
The Gradio web interface provides a chat UI with a sources panel.
python app.py
# Open http://localhost:7860 in your browser
Features:
- Streaming responses with real-time token display
- Source attribution panel showing which documents were used
- Configurable top-K and model selection in Settings accordion
- Clear chat button
- Runs entirely locally — no data leaves your machine
- Place .md, .txt, or .pdf files anywhere under ./docs/
- Subdirectory structure is preserved in metadata but does not affect retrieval
- Re-run ingestion:
# Add new files without removing existing vectors (safe for incremental updates)
python ingest.py
# Full re-index (use when you've significantly changed or removed files)
python ingest.py --reset
Supported formats:
- .md: Markdown (recommended for structured runbooks)
- .txt: Plain text
- .pdf: PDF (requires pip install pypdf)
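Discovering supported files anywhere under the docs directory can be sketched with `pathlib`. This is an illustration under assumptions, not the logic in ingest.py: the `find_documents` and `relative_source` helpers are hypothetical, as is the choice to match extensions case-insensitively.

```python
import tempfile
from pathlib import Path

SUPPORTED_SUFFIXES = {".md", ".txt", ".pdf"}


def find_documents(docs_dir: Path) -> list[Path]:
    """Return supported files under docs_dir, searched recursively."""
    return sorted(
        p for p in docs_dir.rglob("*")
        if p.is_file() and p.suffix.lower() in SUPPORTED_SUFFIXES
    )


def relative_source(path: Path, docs_dir: Path) -> str:
    """Path relative to docs_dir, usable as a 'source' metadata value."""
    return path.relative_to(docs_dir).as_posix()


# Demo on a throwaway directory tree so the sketch does not touch real files.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "example").mkdir()
    (root / "example" / "kubernetes-runbook.md").write_text("# Runbook\n")
    (root / "notes.txt").write_text("plain text\n")
    (root / "diagram.png").write_bytes(b"")
    for path in find_documents(root):
        print(relative_source(path, root))
```

Storing the relative path keeps the subdirectory structure available in metadata without making it part of the retrieval logic.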
Tips for best retrieval quality:
- Use clear headings and section titles in your Markdown
- Keep individual documents focused on a single topic
- Avoid very long paragraphs — the splitter handles chunking, but semantic clarity helps
- Include commands, error messages, and exact terminology your team uses
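To see why paragraph length matters, it helps to look at where chunk boundaries fall. The sketch below is a simplified fixed-window splitter using the chunk size (500) and overlap (50) from the ingestion diagram. It is not RecursiveCharacterTextSplitter, which prefers to break on separators such as paragraph and line breaks; the `split_with_overlap` helper is hypothetical.

```python
def split_with_overlap(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Cut text into windows of chunk_size characters, each sharing
    `overlap` characters with the previous window."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    step = chunk_size - overlap  # how far each window advances
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


# A 1200-character stand-in for a long, unbroken paragraph.
text = "".join(str(i % 10) for i in range(1200))
chunks = split_with_overlap(text)
print([len(c) for c in chunks])
# The tail of one chunk is repeated at the head of the next.
print(chunks[0][-50:] == chunks[1][:50])
```

A blind window like this can cut a command or sentence in half; short paragraphs and clear headings give a separator-aware splitter better places to break.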
Start Ollama via Docker (ChromaDB runs embedded — no extra container needed):
docker compose up -d
# Pull models into the running container
docker exec -it rag-devops-ollama ollama pull nomic-embed-text
docker exec -it rag-devops-ollama ollama pull llama3.2
# Set Ollama host if running in Docker
export OLLAMA_HOST=http://localhost:11434
# Run ingestion and queries on your host as normal
python ingest.py --verbose
python query.py
rag-devops-docs/
├── ingest.py # Document ingestion script
├── query.py # CLI query interface
├── app.py # Gradio web UI
├── requirements.txt # Python dependencies
├── docker-compose.yml # Ollama service
├── docs/
│ └── example/
│ ├── kubernetes-runbook.md
│ ├── incident-response.md
│ └── deployment-guide.md
└── chroma_db/ # Created automatically after first ingest
ollama: connection refused
→ Start Ollama: ollama serve or docker compose up -d
Collection 'devops_docs' not found
→ Run python ingest.py first
No relevant documents found
→ Check that ./docs/ contains files and python ingest.py completed successfully
Poor answer quality
→ Try --top-k 8 to retrieve more context, or rephrase your question with more specific terminology
PDF files skipped
→ Install pypdf: pip install pypdf
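When answers look poor, the relevance scores in the `--json` output can hint at whether retrieval or generation is at fault. The sketch below assumes the output has the shape of the sample JSON in this README. The `weak_sources` helper and the 0.5 threshold are hypothetical and would need tuning against real queries.

```python
import json

# Payload shaped like the sample JSON output shown earlier in this README.
SAMPLE = """
{
  "question": "How do I rollback a Kubernetes deployment?",
  "answer": "According to the Kubernetes Runbook, you can rollback...",
  "sources": [
    {"source": "kubernetes-runbook.md", "chunk_index": 12, "relevance": 0.923}
  ]
}
"""


def weak_sources(payload: dict, threshold: float = 0.5) -> list[dict]:
    """Return the cited sources whose relevance falls below the threshold."""
    return [
        s for s in payload.get("sources", [])
        if float(s.get("relevance", 0.0)) < threshold
    ]


payload = json.loads(SAMPLE)
weak = weak_sources(payload)

if not payload.get("sources"):
    print("No sources returned: check that ingestion completed.")
elif weak:
    for s in weak:
        print(f"Low relevance: {s['source']} (chunk {s['chunk_index']})")
else:
    print("All cited chunks are above the threshold; retrieval looks reasonable.")
```

If every source scores low, rephrasing the question or raising `--top-k` is more likely to help than switching models.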