Merkuryo/rag-devops-docs

RAG DevOps Documentation Assistant

A local, privacy-first question-answering system for your internal DevOps documentation. Index your runbooks, guides, and wikis — then ask questions in plain English and get answers grounded in your actual docs, with source citations. Everything runs on your machine using Ollama for LLM inference and ChromaDB as the vector store.


What is RAG?

Retrieval-Augmented Generation (RAG) combines a vector search engine with a large language model. When you ask a question, the system first retrieves the most relevant documentation chunks from ChromaDB by semantic similarity, then passes those chunks to the LLM as context, and the LLM generates an answer grounded in them. This sharply reduces (but does not fully eliminate) hallucination, and every answer can be traced back to the source chunks it was based on.
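A toy, self-contained Python sketch of that retrieve-then-generate flow follows. It is not this project's code: a bag-of-words counter stands in for nomic-embed-text, an in-memory list stands in for ChromaDB, and the chunk texts, `embed`, `cosine`, `build_prompt`, and the prompt wording are all hypothetical. The returned string is what would be handed to the LLM for generation.

```python
import math
import re
from collections import Counter

# Hypothetical chunks standing in for what ingestion stores in ChromaDB.
CHUNKS = [
    {"source": "kubernetes-runbook.md", "text": "Roll back a deployment with kubectl rollout undo."},
    {"source": "incident-response.md", "text": "During a P0 incident, page the on-call lead first."},
    {"source": "deployment-guide.md", "text": "Blue-green deployment switches traffic between two environments."},
]

def embed(text: str) -> Counter:
    """Bag-of-words 'vector': a crude stand-in for nomic-embed-text embeddings."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def build_prompt(question: str, top_k: int = 2) -> str:
    """Retrieve the top_k most similar chunks and pack them, tagged by source, into a prompt."""
    q = embed(question)
    ranked = sorted(CHUNKS, key=lambda c: cosine(q, embed(c["text"])), reverse=True)
    context = "\n".join(f"[{c['source']}] {c['text']}" for c in ranked[:top_k])
    return (
        "Answer using only the context below and cite the [source] you used.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

print(build_prompt("How do I roll back a deployment?"))
```

Because each context line is tagged with its source file, the model can cite it in the answer, which is what makes the result traceable.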


Architecture

┌─────────────────────────────────────────────────────────────┐
│                    INGESTION PIPELINE                       │
│                                                             │
│  ./docs/**/*.md                                             │
│  ./docs/**/*.txt  ──►  RecursiveCharacterTextSplitter       │
│  ./docs/**/*.pdf       (chunk_size=500, overlap=50)         │
│                              │                              │
│                              ▼                              │
│                  Ollama (nomic-embed-text)                  │
│                         Embeddings                          │
│                              │                              │
│                              ▼                              │
│                     ChromaDB (./chroma_db)                  │
│                     Collection: devops_docs                 │
└─────────────────────────────────────────────────────────────┘
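What `chunk_size=500, overlap=50` means can be shown with a simplified Python sketch. RecursiveCharacterTextSplitter is smarter than this: it first tries to split on paragraph breaks, then line breaks, then spaces, and merges the pieces up to the size limit, so chunks tend to end at natural boundaries. The hypothetical `split_with_overlap` below ignores separators and uses a plain sliding window, which is enough to show the size and overlap arithmetic (both counted in characters).

```python
def split_with_overlap(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Fixed-size sliding window: each chunk starts chunk_size - overlap characters after the previous one."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks

doc = "".join(chr(ord("a") + i % 26) for i in range(1200))
print([len(c) for c in split_with_overlap(doc)])  # → [500, 500, 300]
```

For a 1,200-character document the chunks start at offsets 0, 450, and 900, and each consecutive pair shares 50 characters, so text cut at a boundary still appears intact in one neighbour. With the real library the equivalent call is `RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50).split_text(text)` from `langchain_text_splitters` (note the keyword is `chunk_overlap`).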

┌─────────────────────────────────────────────────────────────┐
│                     QUERY PIPELINE                          │
│                                                             │
│  User Question ──► Ollama (nomic-embed-text)                │
│                    Embed question                           │
│                         │                                   │
│                         ▼                                   │
│                  ChromaDB: top-K search                     │
│                  (cosine similarity)                        │
│                         │                                   │
│                         ▼                                   │
│           Retrieved chunks + metadata                       │
│                         │                                   │
│                         ▼                                   │
│         Ollama LLM (llama3.2) ──► Streaming Answer          │
│         with source citations                               │
└─────────────────────────────────────────────────────────────┘
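The top-K search step ranks stored chunk vectors by the cosine of their angle to the question vector. Below is a brute-force Python sketch with hypothetical names (`cosine_similarity`, `top_k`) and made-up 2-D vectors; ChromaDB performs the same kind of ranking with an approximate HNSW index, so it does not have to compare the question against every stored chunk.

```python
import heapq
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """cos(theta) = (a . b) / (|a| * |b|); returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def top_k(query: list[float], index: dict[str, list[float]], k: int = 5) -> list[tuple[str, float]]:
    """Return the k (chunk_id, similarity) pairs closest to the query, best first."""
    scored = ((chunk_id, cosine_similarity(query, vec)) for chunk_id, vec in index.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])

index = {"rollback": [0.9, 0.1], "scaling": [0.2, 0.8], "incident": [0.0, 1.0]}
print(top_k([1.0, 0.0], index, k=2))
```

ChromaDB returns distances rather than similarities: for a collection created with `metadata={"hnsw:space": "cosine"}` the distance is `1 - similarity`, while the default space is `l2`. How query.py turns distances into the `relevance` score is not documented here.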

Requirements

  • Python 3.10+
  • Ollama installed and running locally (or via Docker)
  • At least 8 GB RAM (16 GB recommended for llama3.2)

Python dependencies (see requirements.txt):

ollama>=0.3.0
chromadb>=0.5.0
langchain-text-splitters>=0.3.0
rich>=13.7.0
gradio>=4.44.0

Optional (for PDF ingestion):

pypdf>=4.0.0

Quick Start

1. Install dependencies

cd rag-devops-docs   # your local clone of this repository
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

2. Start Ollama and pull models

# If not using Docker, make sure Ollama is running:
ollama serve &

# Pull required models
ollama pull nomic-embed-text
ollama pull llama3.2

3. Add your documentation

Place your .md, .txt, or .pdf files in ./docs/. Example files are already in ./docs/example/ to get you started.

4. Ingest documentation

python ingest.py --verbose

5. Ask questions

# Interactive mode
python query.py

# Single question
python query.py --question "How do I rollback a Kubernetes deployment?"

# Web UI
python app.py

Usage — ingest.py

usage: ingest.py [-h] [--docs-dir DIR] [--reset] [--verbose]

options:
  --docs-dir DIR   Directory to scan for docs (default: ./docs)
  --reset          Delete and recreate the ChromaDB collection first
  --verbose        Show per-file chunk count during ingestion

Examples:

# Ingest from default ./docs directory
python ingest.py

# Ingest from a custom directory with verbose output
python ingest.py --docs-dir /mnt/wiki/runbooks --verbose

# Force re-ingest all documents (clear existing vectors)
python ingest.py --reset --verbose

# Ingest only the bundled example docs
python ingest.py --docs-dir ./docs/example
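A sketch of how a `--docs-dir` scan could pick up the supported formats recursively and skip PDFs when pypdf is not installed (the behaviour the Troubleshooting section describes). `discover_docs` is a hypothetical helper, not taken from ingest.py; it only decides which files would be read.

```python
import importlib.util
from pathlib import Path

TEXT_SUFFIXES = {".md", ".txt"}

def discover_docs(docs_dir: str = "./docs") -> tuple[list[Path], list[Path]]:
    """Recursively collect ingestible files; return (to_ingest, skipped_pdfs)."""
    pdf_ok = importlib.util.find_spec("pypdf") is not None  # PDF support is optional
    to_ingest, skipped = [], []
    for path in sorted(Path(docs_dir).rglob("*")):
        if not path.is_file():
            continue
        suffix = path.suffix.lower()
        if suffix in TEXT_SUFFIXES or (suffix == ".pdf" and pdf_ok):
            to_ingest.append(path)
        elif suffix == ".pdf":
            skipped.append(path)  # pypdf missing: skip the file rather than crash
        # any other extension is silently ignored
    return to_ingest, skipped

files, skipped = discover_docs("./docs")
print(f"{len(files)} file(s) to ingest, {len(skipped)} PDF(s) skipped")
```

Text from the accepted PDFs would then be extracted with pypdf, for example by iterating `PdfReader(path).pages` and calling `page.extract_text()` on each page.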

Usage — query.py

usage: query.py [-h] [--question QUESTION] [--top-k N] [--model MODEL]
                [--interactive] [--json]

options:
  --question, -q      Ask a single question and exit
  --top-k N           Number of chunks to retrieve (default: 5)
  --model MODEL       Ollama model to use (default: llama3.2)
  --interactive, -i   Start interactive multi-turn mode
  --json              Output answer as JSON (useful for scripting)

Examples:

# Interactive mode (default when run with no arguments)
python query.py

# Single question
python query.py -q "What steps should I follow during a P0 incident?"

# Use a different model
python query.py --model mistral -q "Explain the blue-green deployment strategy"

# JSON output for scripting
python query.py --json -q "How do I scale a Kubernetes deployment?" | jq .answer

# Retrieve more context chunks
python query.py --top-k 8 -q "What are the database migration rules?"
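The documented options map naturally onto Python's argparse. This sketch assumes query.py is built that way (the `[-h]` in the usage line suggests argparse, but the source is not shown here); `build_parser` and `wants_interactive` are hypothetical names, and only the flags and defaults listed above are used.

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented query.py options."""
    parser = argparse.ArgumentParser(prog="query.py")
    parser.add_argument("--question", "-q", help="Ask a single question and exit")
    parser.add_argument("--top-k", type=int, default=5, metavar="N",
                        help="Number of chunks to retrieve (default: 5)")
    parser.add_argument("--model", default="llama3.2", help="Ollama model to use")
    parser.add_argument("--interactive", "-i", action="store_true",
                        help="Start interactive multi-turn mode")
    parser.add_argument("--json", action="store_true", help="Output answer as JSON")
    return parser

def wants_interactive(args: argparse.Namespace) -> bool:
    """Interactive mode is the default when no single question is given."""
    return args.interactive or args.question is None

args = build_parser().parse_args(["-q", "How do I scale a deployment?", "--top-k", "8"])
print(args.question, args.top_k, wants_interactive(args))
```

argparse derives the attribute name `top_k` from `--top-k` by replacing the dash with an underscore.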

Sample JSON output:

{
  "question": "How do I rollback a Kubernetes deployment?",
  "answer": "According to the Kubernetes Runbook, you can rollback...",
  "sources": [
    {
      "source": "kubernetes-runbook.md",
      "chunk_index": 12,
      "relevance": 0.923
    }
  ]
}
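Scripts can also consume the `--json` output directly instead of piping it through jq. This Python sketch relies only on the fields shown in the sample above (`sources[].source`, `chunk_index`, `relevance`); `format_citations` and its `min_relevance` cut-off are hypothetical.

```python
import json

def format_citations(raw: str, min_relevance: float = 0.0) -> list[str]:
    """Turn query.py --json output into 'file#chunkN (score)' strings, best match first."""
    payload = json.loads(raw)
    sources = sorted(payload.get("sources", []), key=lambda s: s["relevance"], reverse=True)
    return [
        f"{s['source']}#chunk{s['chunk_index']} ({s['relevance']:.2f})"
        for s in sources
        if s["relevance"] >= min_relevance
    ]

# The sample output shown above, as a string.
sample = """{"question": "How do I rollback a Kubernetes deployment?",
 "answer": "According to the Kubernetes Runbook, you can rollback...",
 "sources": [{"source": "kubernetes-runbook.md", "chunk_index": 12, "relevance": 0.923}]}"""
print(format_citations(sample))  # → ['kubernetes-runbook.md#chunk12 (0.92)']
```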

Web UI — app.py

The Gradio web interface provides a chat UI with a sources panel.

python app.py
# Open http://localhost:7860 in your browser

Features:

  • Streaming responses with real-time token display
  • Source attribution panel showing which documents were used
  • Configurable top-K and model selection in Settings accordion
  • Clear chat button
  • Runs entirely locally — no data leaves your machine

Adding Your Own Documentation

  1. Place .md, .txt, or .pdf files anywhere under ./docs/
  2. Subdirectory structure is preserved in metadata but does not affect retrieval
  3. Re-run ingestion:
# Add new files without removing existing vectors (safe for incremental updates)
python ingest.py

# Full re-index (use when you've significantly changed or removed files)
python ingest.py --reset
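Whether a plain re-run is safe depends on how chunk IDs are assigned, which this README does not specify. One common approach, sketched here with a hypothetical `chunk_records` helper, derives each ID from the file's relative path and chunk position and writes with ChromaDB's `collection.upsert(ids=..., embeddings=..., documents=..., metadatas=...)`, so re-ingesting an unchanged file overwrites its vectors instead of duplicating them. The `source` and `chunk_index` keys mirror query.py's JSON output; `path` is an assumed extra key for the subdirectory.

```python
import hashlib
from pathlib import Path

def chunk_records(path: Path, docs_dir: Path, chunks: list[str]) -> list[dict]:
    """Build deterministic ids plus metadata for one file's chunks (hypothetical helper)."""
    rel = path.relative_to(docs_dir).as_posix()  # e.g. "example/kubernetes-runbook.md"
    records = []
    for i, text in enumerate(chunks):
        # Same file + same chunk position -> same id; with collection.upsert this overwrites, not duplicates.
        chunk_id = hashlib.sha256(f"{rel}:{i}".encode()).hexdigest()[:16]
        records.append({
            "id": chunk_id,
            "document": text,
            "metadata": {"source": path.name, "path": rel, "chunk_index": i},
        })
    return records

for r in chunk_records(Path("docs/example/kubernetes-runbook.md"), Path("docs"), ["chunk A", "chunk B"]):
    print(r["id"], r["metadata"])
```

IDs belonging to deleted files, or to trailing chunks of a file that got shorter, are never overwritten under this scheme, which is why a full `--reset` is the right tool after removing or heavily editing documents.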

Supported formats:

  • .md — Markdown (recommended for structured runbooks)
  • .txt — Plain text
  • .pdf — PDF (requires pip install pypdf)

Tips for best retrieval quality:

  • Use clear headings and section titles in your Markdown
  • Keep individual documents focused on a single topic
  • Avoid very long paragraphs — the splitter handles chunking, but semantic clarity helps
  • Include commands, error messages, and exact terminology your team uses

Docker

Start Ollama via Docker (ChromaDB runs embedded — no extra container needed):

docker compose up -d

# Pull models into the running container
docker exec -it rag-devops-ollama ollama pull nomic-embed-text
docker exec -it rag-devops-ollama ollama pull llama3.2

# Point the Python clients at Ollama (this is the default address; change it if you publish a different port)
export OLLAMA_HOST=http://localhost:11434

# Run ingestion and queries on your host as normal
python ingest.py --verbose
python query.py

Project Structure

rag-devops-docs/
├── ingest.py            # Document ingestion script
├── query.py             # CLI query interface
├── app.py               # Gradio web UI
├── requirements.txt     # Python dependencies
├── docker-compose.yml   # Ollama service
├── docs/
│   └── example/
│       ├── kubernetes-runbook.md
│       ├── incident-response.md
│       └── deployment-guide.md
└── chroma_db/           # Created automatically after first ingest

Troubleshooting

ollama: connection refused → Start Ollama: ollama serve or docker compose up -d

Collection 'devops_docs' not found → Run python ingest.py first

No relevant documents found → Check that ./docs/ contains files and python ingest.py completed successfully

Poor answer quality → Try --top-k 8 to retrieve more context, or rephrase your question with more specific terminology

PDF files skipped → Install pypdf: pip install pypdf

About

RAG pipeline that ingests your internal runbooks and Confluence docs into a vector store and answers ops questions using a local LLM.
