RAG DevOps Documentation Assistant

A local, privacy-first question-answering system for your internal DevOps documentation. Index your runbooks, guides, and wikis — then ask questions in plain English and get answers grounded in your actual docs, with source citations. Everything runs on your machine using Ollama for LLM inference and ChromaDB as the vector store.

What is RAG?

Retrieval-Augmented Generation (RAG) combines a vector search engine with a large language model. When you ask a question, the system first retrieves the most relevant documentation chunks from ChromaDB (using semantic similarity), then feeds those chunks to the LLM as context, and the LLM generates an answer from them. Because the model answers from retrieved text rather than from memory alone, this reduces hallucination and lets every answer be traced back to a source document.
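As a minimal illustration of the "feed those chunks as context" step, the sketch below assembles a grounded prompt from chunks that have already been retrieved. The function name `build_grounded_prompt`, the chunk dictionary shape (`text` and `source` keys), and the prompt wording are assumptions for illustration, not the prompt that query.py actually sends.

```python
def build_grounded_prompt(question: str, chunks: list[dict]) -> str:
    """Assemble a prompt that restricts the LLM to the retrieved context.

    Hypothetical helper: each chunk is assumed to be a dict with "text"
    and "source" keys; the project's real metadata may differ.
    """
    context_blocks = []
    for number, chunk in enumerate(chunks, start=1):
        # Number each chunk so the model can cite it as [1], [2], ...
        context_blocks.append(f"[{number}] ({chunk['source']})\n{chunk['text']}")
    context = "\n\n".join(context_blocks)
    return (
        "Answer the question using only the documentation below. "
        "Cite sources by their bracketed number. "
        "If the answer is not in the documentation, say so.\n\n"
        f"Documentation:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


chunks = [
    {"source": "kubernetes-runbook.md", "text": "Use kubectl rollout undo deployment/<name>."},
]
print(build_grounded_prompt("How do I rollback a deployment?", chunks))
```

Numbering the chunks in the prompt is one simple way to make source citations possible; the model is only asked to cite, so citations still need to be checked against the returned metadata.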

Architecture

┌─────────────────────────────────────────────────────────────┐
│                    INGESTION PIPELINE                       │
│                                                             │
│  ./docs/**/*.md  ──►  RecursiveCharacterTextSplitter        │
│  ./docs/**/*.txt │    (chunk_size=500, overlap=50)          │
│  ./docs/**/*.pdf │                                          │
│                  └──►  Ollama (nomic-embed-text)            │
│                         Embeddings                          │
│                              │                              │
│                              ▼                              │
│                     ChromaDB (./chroma_db)                  │
│                     Collection: devops_docs                 │
└─────────────────────────────────────────────────────────────┘
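To show what `chunk_size=500, overlap=50` means in practice, here is a fixed-size sliding-window splitter. This is a deliberately simplified stand-in, not LangChain's `RecursiveCharacterTextSplitter` (whose parameters are `chunk_size` and `chunk_overlap`, and which first tries to break on paragraph breaks, newlines, and spaces before cutting raw characters). The function name `split_with_overlap` is hypothetical.

```python
def split_with_overlap(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Cut text into windows of chunk_size characters; consecutive windows
    share `overlap` characters so a sentence on a boundary appears in both."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


text = "".join(str(i % 10) for i in range(1200))
for chunk in split_with_overlap(text):
    print(len(chunk))
```

With 1,200 characters the windows start at offsets 0, 450, and 900, so the last chunk is shorter than the others.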

┌─────────────────────────────────────────────────────────────┐
│                     QUERY PIPELINE                          │
│                                                             │
│  User Question ──► Ollama (nomic-embed-text)                │
│                    Embed question                           │
│                         │                                   │
│                         ▼                                   │
│                  ChromaDB: top-K search                     │
│                  (cosine similarity)                        │
│                         │                                   │
│                         ▼                                   │
│           Retrieved chunks + metadata                       │
│                         │                                   │
│                         ▼                                   │
│         Ollama LLM (llama3.2) ──► Streaming Answer          │
│         with source citations                               │
└─────────────────────────────────────────────────────────────┘
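The "top-K search (cosine similarity)" step can be sketched in plain Python to show what ChromaDB computes conceptually. ChromaDB itself does this with an approximate nearest-neighbour index and returns distances rather than similarities (for a collection configured for cosine space, the distance is 1 minus this similarity). The toy three-dimensional vectors and the helper names below are illustrative assumptions; real `nomic-embed-text` embeddings are much higher-dimensional.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat zero vectors as unrelated
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], chunks: list[dict], k: int = 5) -> list[tuple[float, dict]]:
    # Score every chunk against the question embedding, best match first.
    scored = [(cosine_similarity(query_vec, c["embedding"]), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy "embeddings" for illustration only.
docs = [
    {"source": "kubernetes-runbook.md", "embedding": [0.9, 0.1, 0.0]},
    {"source": "incident-response.md", "embedding": [0.1, 0.9, 0.2]},
    {"source": "deployment-guide.md", "embedding": [0.7, 0.3, 0.1]},
]
for score, chunk in top_k([1.0, 0.0, 0.0], docs, k=2):
    print(f"{score:.3f}  {chunk['source']}")
```

A brute-force scan like this is fine for a few thousand chunks; the point of a vector store is to keep the same ranking fast as the collection grows.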

Requirements

Python 3.10+
Ollama installed and running locally (or via Docker)
At least 8 GB RAM (16 GB recommended for llama3.2)

Python dependencies (see requirements.txt):

ollama>=0.3.0
chromadb>=0.5.0
langchain-text-splitters>=0.3.0
rich>=13.7.0
gradio>=4.44.0

Optional (for PDF ingestion):

pypdf>=4.0.0

Quick Start

1. Install dependencies

cd rag-devops-docs
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

2. Start Ollama and pull models

# If not using Docker, make sure Ollama is running:
ollama serve &

# Pull required models
ollama pull nomic-embed-text
ollama pull llama3.2
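To confirm both models were pulled before ingesting, you can query Ollama's `GET /api/tags` endpoint, which lists the models available locally. The helper names below are hypothetical, and the default host assumes Ollama is listening on its standard port 11434. Pulled models are reported with a tag (for example `llama3.2:latest`), so the check accepts names with or without it.

```python
import json
import urllib.request


def fetch_local_models(host: str = "http://localhost:11434") -> dict:
    # GET /api/tags returns {"models": [{"name": "llama3.2:latest", ...}, ...]}
    with urllib.request.urlopen(f"{host}/api/tags", timeout=5) as resp:
        return json.load(resp)


def missing_models(tags_payload: dict, required: list[str]) -> list[str]:
    available = set()
    for model in tags_payload.get("models", []):
        name = model.get("name", "")
        available.add(name)                  # e.g. "llama3.2:latest"
        available.add(name.split(":", 1)[0])  # e.g. "llama3.2"
    return [m for m in required if m not in available]


# Usage (requires a running Ollama server):
# print(missing_models(fetch_local_models(), ["nomic-embed-text", "llama3.2"]))
```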

3. Add your documentation

Place your .md, .txt, or .pdf files in ./docs/. Example files are already in ./docs/example/ to get you started.

4. Ingest documentation

python ingest.py --verbose

5. Ask questions

# Interactive mode
python query.py

# Single question
python query.py --question "How do I rollback a Kubernetes deployment?"

# Web UI
python app.py

Usage — `ingest.py`

usage: ingest.py [-h] [--docs-dir DIR] [--reset] [--verbose]

options:
  --docs-dir DIR   Directory to scan for docs (default: ./docs)
  --reset          Delete and recreate the ChromaDB collection first
  --verbose        Show per-file chunk count during ingestion

Examples:

# Ingest from default ./docs directory
python ingest.py

# Ingest from a custom directory with verbose output
python ingest.py --docs-dir /mnt/wiki/runbooks --verbose

# Force re-ingest all documents (clear existing vectors)
python ingest.py --reset --verbose

# Ingest only the bundled example documents
python ingest.py --docs-dir ./docs/example

Usage — `query.py`

usage: query.py [-h] [--question QUESTION] [--top-k N] [--model MODEL]
                [--interactive] [--json]

options:
  --question, -q QUESTION  Ask a single question and exit
  --top-k N                Number of chunks to retrieve (default: 5)
  --model MODEL            Ollama model to use (default: llama3.2)
  --interactive, -i        Start interactive multi-turn mode
  --json                   Output answer as JSON (useful for scripting)
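For readers extending the CLI, the documented options map onto `argparse` roughly as follows. This is a hypothetical reconstruction from the usage text above, not the project's actual source; `wants_interactive` encodes the documented behaviour that running with no question starts interactive mode.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="query.py")
    parser.add_argument("--question", "-q", metavar="QUESTION",
                        help="Ask a single question and exit")
    parser.add_argument("--top-k", type=int, default=5, metavar="N",
                        help="Number of chunks to retrieve (default: 5)")
    parser.add_argument("--model", default="llama3.2",
                        help="Ollama model to use (default: llama3.2)")
    parser.add_argument("--interactive", "-i", action="store_true",
                        help="Start interactive multi-turn mode")
    parser.add_argument("--json", action="store_true",
                        help="Output answer as JSON (useful for scripting)")
    return parser


def wants_interactive(args: argparse.Namespace) -> bool:
    # No --question given: fall back to interactive mode.
    return args.interactive or args.question is None


args = build_parser().parse_args(["--top-k", "8", "-q", "What are the database migration rules?"])
print(args.top_k, args.model, wants_interactive(args))
```

Note that `argparse` turns `--top-k` into the attribute `top_k`.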

Examples:

# Interactive mode (default when run with no arguments)
python query.py

# Single question
python query.py -q "What steps should I follow during a P0 incident?"

# Use a different model
python query.py --model mistral -q "Explain the blue-green deployment strategy"

# JSON output for scripting
python query.py --json -q "How do I scale a Kubernetes deployment?" | jq -r .answer

# Retrieve more context chunks
python query.py --top-k 8 -q "What are the database migration rules?"

Sample JSON output:

{
  "question": "How do I rollback a Kubernetes deployment?",
  "answer": "According to the Kubernetes Runbook, you can rollback...",
  "sources": [
    {
      "source": "kubernetes-runbook.md",
      "chunk_index": 12,
      "relevance": 0.923
    }
  ]
}
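When scripting against `--json`, a small wrapper can run the CLI and turn the `sources` list into readable lines. This sketch assumes `query.py` is in the current directory and that, with `--json`, it prints only the JSON object shown above to stdout; the helper names `ask` and `format_sources` are hypothetical.

```python
import json
import subprocess
import sys


def ask(question: str, top_k: int = 5) -> dict:
    # Run the documented CLI and parse its JSON output.
    result = subprocess.run(
        [sys.executable, "query.py", "--json", "--top-k", str(top_k), "-q", question],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


def format_sources(payload: dict) -> list[str]:
    # One line per retrieved chunk, most relevant first.
    sources = sorted(payload.get("sources", []), key=lambda s: s["relevance"], reverse=True)
    return [
        f"{s['source']} (chunk {s['chunk_index']}, relevance {s['relevance']:.2f})"
        for s in sources
    ]


# Usage (requires an ingested collection and a running Ollama):
# payload = ask("How do I rollback a Kubernetes deployment?")
# print(payload["answer"])
# print("\n".join(format_sources(payload)))
```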

Web UI — `app.py`

The Gradio web interface provides a chat UI with a sources panel.

python app.py
# Open http://localhost:7860 in your browser

Features:

Streaming responses with real-time token display
Source attribution panel showing which documents were used
Configurable top-K and model selection in Settings accordion
Clear chat button
Runs entirely locally — no data leaves your machine
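Streaming in a chat UI usually means re-rendering the whole partial answer each time a token chunk arrives; Gradio chat handlers written as generators work this way, yielding the growing response. The sketch below assumes chunks shaped like the `ollama` Python client's streaming chat responses (each carrying `chunk["message"]["content"]`); the helper name `accumulate_stream` is hypothetical and this is not app.py's actual code.

```python
from typing import Iterable, Iterator


def accumulate_stream(chunks: Iterable[dict]) -> Iterator[str]:
    """Yield the answer text accumulated so far after each streamed chunk."""
    answer = ""
    for chunk in chunks:
        answer += chunk["message"]["content"]
        yield answer  # the UI replaces the displayed message with this text


# With a live server this could wrap, for example:
# stream = ollama.chat(model="llama3.2", messages=[...], stream=True)
fake_stream = [{"message": {"content": "Roll"}}, {"message": {"content": "back"}}]
for partial in accumulate_stream(fake_stream):
    print(partial)
```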

Adding Your Own Documentation

Place .md, .txt, or .pdf files anywhere under ./docs/
Subdirectory structure is preserved in metadata but does not affect retrieval
Re-run ingestion:

# Add new files without removing existing vectors (safe for incremental updates)
python ingest.py

# Full re-index (use when you've significantly changed or removed files)
python ingest.py --reset
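How ingest.py avoids duplicating vectors on a plain re-run is not described here. One common approach, assumed here for illustration, is to derive each chunk's ID from its file path and position and write with ChromaDB's `collection.upsert`, so re-ingesting an unchanged file overwrites its chunks instead of adding copies. It also shows why `--reset` is still needed: if a file shrinks or is deleted, its old higher-numbered chunks are never overwritten.

```python
import hashlib


def chunk_id(relative_path: str, chunk_index: int) -> str:
    # Same file + same position -> same ID on every run.
    digest = hashlib.sha256(relative_path.encode("utf-8")).hexdigest()[:16]
    return f"{digest}-{chunk_index}"


def chunk_ids(relative_path: str, chunks: list[str]) -> list[str]:
    return [chunk_id(relative_path, i) for i in range(len(chunks))]


# Hypothetical write step (requires chromadb and an existing collection):
# collection.upsert(
#     ids=chunk_ids(path, chunks),
#     documents=chunks,
#     metadatas=[{"source": path, "chunk_index": i} for i in range(len(chunks))],
# )
print(chunk_ids("example/kubernetes-runbook.md", ["first chunk", "second chunk"]))
```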

Supported formats:

.md — Markdown (recommended for structured runbooks)
.txt — Plain text
.pdf — PDF (requires pip install pypdf)
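File discovery along these lines can be sketched with `pathlib`: walk `./docs` recursively, keep the supported extensions, skip PDFs when `pypdf` is not installed, and record the path relative to the docs root so the subdirectory survives in metadata. The helper names and the `source` metadata key are assumptions for illustration, not ingest.py's actual implementation.

```python
import importlib.util
from pathlib import Path

SUPPORTED_SUFFIXES = {".md", ".txt", ".pdf"}


def is_supported(path: Path, pdf_enabled: bool) -> bool:
    suffix = path.suffix.lower()
    if suffix == ".pdf":
        return pdf_enabled  # PDFs need the optional pypdf package
    return suffix in SUPPORTED_SUFFIXES


def relative_source(path: Path, root: Path) -> str:
    # Keeps the subdirectory, e.g. "example/kubernetes-runbook.md".
    return path.relative_to(root).as_posix()


def discover_docs(docs_dir: str = "./docs") -> list[dict]:
    root = Path(docs_dir)
    pdf_enabled = importlib.util.find_spec("pypdf") is not None
    return [
        {"path": path, "source": relative_source(path, root)}
        for path in sorted(root.rglob("*"))
        if path.is_file() and is_supported(path, pdf_enabled)
    ]


# Usage: for doc in discover_docs(): print(doc["source"])
```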

Tips for best retrieval quality:

Use clear headings and section titles in your Markdown
Keep individual documents focused on a single topic
Avoid very long paragraphs — the splitter handles chunking, but semantic clarity helps
Include commands, error messages, and exact terminology your team uses

Docker

Start Ollama via Docker (ChromaDB runs embedded — no extra container needed):

docker compose up -d

# Pull models into the running container
docker exec -it rag-devops-ollama ollama pull nomic-embed-text
docker exec -it rag-devops-ollama ollama pull llama3.2

# Point the Python tools at Ollama (11434 is Ollama's default port;
# change this if the container publishes a different host or port)
export OLLAMA_HOST=http://localhost:11434

# Run ingestion and queries on your host as normal
python ingest.py --verbose
python query.py
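The official `ollama` Python client already reads `OLLAMA_HOST`, but scripts that call the HTTP API directly (such as a `urllib` health check) need to resolve it themselves. A minimal sketch, assuming the default `http://localhost:11434` and accepting the common `host:port` form without a scheme; the helper name `resolve_ollama_host` is hypothetical.

```python
import os
from typing import Mapping, Optional

DEFAULT_OLLAMA_HOST = "http://localhost:11434"


def resolve_ollama_host(env: Optional[Mapping[str, str]] = None) -> str:
    env = os.environ if env is None else env
    host = env.get("OLLAMA_HOST", "").strip() or DEFAULT_OLLAMA_HOST
    if "://" not in host:
        host = "http://" + host  # OLLAMA_HOST is often set as host:port
    return host.rstrip("/")


print(resolve_ollama_host())
```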

Project Structure

rag-devops-docs/
├── ingest.py            # Document ingestion script
├── query.py             # CLI query interface
├── app.py               # Gradio web UI
├── requirements.txt     # Python dependencies
├── docker-compose.yml   # Ollama service
├── README.md
├── LICENSE
├── .gitignore
├── docs/
│   └── example/
│       ├── kubernetes-runbook.md
│       ├── incident-response.md
│       └── deployment-guide.md
└── chroma_db/           # Created automatically after first ingest

Troubleshooting

ollama: connection refused → Start Ollama: ollama serve or docker compose up -d

Collection 'devops_docs' not found → Run python ingest.py first

No relevant documents found → Check that ./docs/ contains files and python ingest.py completed successfully

Poor answer quality → Try --top-k 8 to retrieve more context, or rephrase your question with more specific terminology

PDF files skipped → Install pypdf: pip install pypdf
