A demo comparing Graph-based RAG (using AtomicRAG) against Traditional vector-based RAG through a LangGraph-powered agentic system. On HotpotQA multi-hop questions, graph-structured retrieval reaches 80.9% answer correctness versus 70.0% for flat vector search.
| Metric | Traditional RAG | Graph RAG | Delta |
|---|---|---|---|
| Answer Correctness | 70.0% | 80.9% | +10.9% |
| Wins | 27 | 103 | — |
| Ties | 370 | 370 | — |
Graph RAG wins 3.8x more often than Traditional RAG (103 vs. 27 wins) on multi-hop questions that require connecting facts across multiple documents. The gain of 10.9 percentage points in answer correctness is measured over 500 bridge-type questions, 370 of which were ties.
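The headline figures can be re-derived from the win and tie counts in the table above. The snippet below is only an arithmetic check; the variable names are illustrative and not taken from the project's scoring code.

```python
# Win/tie counts copied from the HotpotQA results table above.
traditional_wins = 27
graph_wins = 103
ties = 370

total_questions = traditional_wins + graph_wins + ties
win_ratio = graph_wins / traditional_wins
tie_share = ties / total_questions

print(total_questions)        # 500
print(round(win_ratio, 1))    # 3.8
print(round(tie_share, 2))    # 0.74
```

Note that roughly three quarters of the questions are ties, so the win ratio describes only the 130 questions where the two systems differed.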
| Metric | Traditional RAG | Graph RAG | Notes |
|---|---|---|---|
| Answer Correctness | 54.2% | 52.8% | Tied — single-hop queries |
| Ctx Summarize ACC | 54.6% | 59.3% | +4.7% Graph RAG |
| Creative Gen ACC | 36.3% | 40.4% | +4.0% Graph RAG |
The medical dataset (mostly single-hop factual lookups) shows comparable answer correctness (54.2% vs. 52.8%), while Graph RAG edges ahead on contextual summarization (+4.7%) and creative generation (+4.0%). The larger gains appear on multi-hop datasets such as HotpotQA, where answers depend on graph traversal across documents.
┌─────────────────────────────────────────────────────────┐
│                     LangGraph Agent                     │
│  ┌──────────┐   ┌──────────────┐   ┌──────────────────┐ │
│  │  Query   │ → │  Traversal   │ → │    Retrieval     │ │
│  │ Analyzer │   │   Planner    │   │   (mode-based)   │ │
│  └──────────┘   └──────────────┘   └────────┬─────────┘ │
│                                             │           │
│                    ┌────────────────────────┼─────────┐ │
│                    ▼                        ▼         │ │
│          ┌──────────────────┐        ┌──────────────┐ │ │
│          │ Traditional RAG  │        │  Graph RAG   │ │ │
│          │  (Vector Store)  │        │ (AtomicRAG)  │ │ │
│          └─────────┬────────┘        └──────┬───────┘ │ │
│                    │                        │         │ │
│                    ▼                        ▼         │ │
│              ┌──────────────────────────────────┐     │ │
│              │         Answer Generator         │     │ │
│              └──────────────────────────────────┘     │ │
└─────────────────────────────────────────────────────────┘
- Entity Anchoring — Extract entities from query, find matching graph nodes
- Iterative Expansion — Hop through Entity → Knowledge Unit → Entity connections
- Query Updating — Subtract retrieved embeddings to find diverse information
- Beam Search Pruning — Keep only top-M most relevant paths at each depth
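Two of the steps above, query updating and beam pruning, can be sketched with plain NumPy. This is a minimal illustration under stated assumptions, not AtomicRAG's actual implementation: the function names (`update_query`, `prune_beam`), the use of the mean of retrieved embeddings, and the dot-product scoring are all hypothetical choices made for the example.

```python
import numpy as np

def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

def update_query(query_vec, retrieved_vecs, weight):
    """Assumed update rule: subtract the weighted mean of retrieved embeddings."""
    if len(retrieved_vecs) == 0:
        return normalize(query_vec)
    centroid = np.mean(retrieved_vecs, axis=0)
    return normalize(query_vec - weight * centroid)

def prune_beam(paths, path_vecs, query_vec, beam_size):
    """Keep the beam_size paths whose embeddings score highest against the query."""
    scores = np.array([float(np.dot(normalize(v), query_vec)) for v in path_vecs])
    top = np.argsort(-scores)[:beam_size]
    return [paths[i] for i in top], scores[top]

# Toy example: the query starts out equally close to axes 0 and 1.
query = normalize(np.array([1.0, 1.0, 0.0]))
retrieved = np.array([[1.0, 0.0, 0.0]])          # axis 0 is already covered
updated = update_query(query, retrieved, weight=0.5)

paths = ["a", "b", "c"]
path_vecs = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
kept, kept_scores = prune_beam(paths, path_vecs, updated, beam_size=2)
print(kept)  # ['b', 'a']
```

After the update, the path along the not-yet-covered axis ranks first, which is the diversity effect the query-updating step is meant to produce.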
The agent dynamically controls graph traversal parameters:
- `traversal_depth` (1–3): How many hops to take
- `beam_size` (5–15): Width of beam search
- `query_update_weight` (0–1): Diversity vs. relevance tradeoff
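One way to keep agent-chosen values inside the ranges listed above is to clamp them before retrieval. The sketch below is illustrative only: `TraversalParams`, `clamp`, and `make_params` are hypothetical names, and the project's planner node may validate its parameters differently.

```python
from dataclasses import dataclass

# Ranges taken from the parameter list above; all names here are hypothetical.
DEPTH_RANGE = (1, 3)
BEAM_RANGE = (5, 15)
WEIGHT_RANGE = (0.0, 1.0)

@dataclass(frozen=True)
class TraversalParams:
    traversal_depth: int
    beam_size: int
    query_update_weight: float

def clamp(value, bounds):
    """Restrict value to the closed interval given by bounds."""
    low, high = bounds
    return max(low, min(high, value))

def make_params(depth, beam, weight):
    """Build traversal parameters, forcing each into its documented range."""
    return TraversalParams(
        traversal_depth=int(clamp(depth, DEPTH_RANGE)),
        beam_size=int(clamp(beam, BEAM_RANGE)),
        query_update_weight=float(clamp(weight, WEIGHT_RANGE)),
    )

params = make_params(depth=5, beam=10, weight=-0.2)
print(params)
# TraversalParams(traversal_depth=3, beam_size=10, query_update_weight=0.0)
```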
├── src/
│ ├── config.py # Central configuration
│ ├── agent/ # LangGraph agentic system
│ │ ├── state.py # Agent state (TypedDict)
│ │ ├── nodes.py # Agent nodes (analyze, plan, retrieve, generate)
│ │ ├── graph.py # LangGraph graph definition
│ │ └── prompts.py # LLM prompts
│ ├── traditional_rag/ # Vector similarity retrieval
│ │ ├── chunker.py # Document chunking
│ │ ├── vector_store.py # Local vector store (NumPy cosine sim)
│ │ └── retriever.py # Retrieval interface
│ ├── graph_rag/ # AtomicRAG-based retrieval
│ │ ├── loader.py # Load pre-built knowledge graph
│ │ └── retriever.py # Graph traversal retrieval
│ └── evaluation/ # Scoring framework
│ ├── runner.py # Batch evaluation runner
│ └── scorer.py # ACC metric implementation
├── scripts/
│ ├── setup_vector_store.py # Build traditional RAG vector store
│ ├── run_evaluation.py # Run agent on dataset questions
│ ├── run_scoring.py # Score predictions (6 metrics)
│ ├── hotpotqa_eval_100.py # HotpotQA 100-question benchmark
│ ├── hotpotqa_demo.py # Quick 10-question demo
│ ├── manual_comparison.py # Side-by-side comparison tool
│ └── test_quick.py # Smoke test both pipelines
├── results/
│ ├── medical/ # Medical dataset evaluation report
│ └── hotpotqa/ # HotpotQA multi-hop results
├── .env.example # Environment template
├── pyproject.toml # Python dependencies
└── docker-compose.yml # Optional PGVector setup
- Python 3.11+
- Google Gemini API key
- `atomicrag` library (`pip install atomicrag`)
git clone https://github.com/Rahuljangs/agentic-graph-rag.git
cd agentic-graph-rag
# Install dependencies
pip install -e .
# Configure environment
cp .env.example .env
# Edit .env and add your GOOGLE_API_KEY

# 1. Build traditional vector store (Medical dataset)
python -m scripts.setup_vector_store
# 2. The Graph RAG uses pre-built AtomicRAG knowledge graphs
# (or build one for HotpotQA using scripts/hotpotqa_eval_100.py)

python -m scripts.hotpotqa_demo

Runs 10 multi-hop questions through both RAG systems and prints a comparison table.

python -m scripts.hotpotqa_eval_100

Builds the knowledge graph, runs 100 HotpotQA bridge questions, and produces scored results.

# Generate predictions
python -m scripts.run_evaluation
# Score with 6 metrics
python -m scripts.run_scoring

python -m scripts.test_quick

| Metric | Description | LLM-based? |
|---|---|---|
| Answer Correctness (ACC) | 0.75×Factuality_F1 + 0.25×SemanticSim | Yes |
| Faithfulness | Are answer claims supported by context? | Yes |
| Context Relevance | Is retrieved context relevant to question? | Yes |
| Context Recall | Does context cover gold answer facts? | Yes |
| Answer Relevance | Does answer address the question? | Yes |
| ROUGE-L | Lexical overlap (F1 of LCS) | No |
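
The two formula-based rows of the table can be sketched directly. The code below is a minimal illustration, not the contents of `scorer.py`: it assumes `factuality_f1` and `semantic_sim` have already been produced elsewhere (by an LLM judge and an embedding comparison, respectively), and it uses simple whitespace tokenization for ROUGE-L.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def rouge_l_f1(prediction, reference):
    """ROUGE-L as the F1 of LCS-based precision and recall (whitespace tokens)."""
    pred, ref = prediction.lower().split(), reference.lower().split()
    if not pred or not ref:
        return 0.0
    lcs = lcs_length(pred, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(pred), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)

def answer_correctness(factuality_f1, semantic_sim):
    """Weighted combination from the table: 0.75 x Factuality_F1 + 0.25 x SemanticSim."""
    return 0.75 * factuality_f1 + 0.25 * semantic_sim

rouge = rouge_l_f1("Shirley Temple was Chief of Protocol", "Chief of Protocol")
acc = answer_correctness(factuality_f1=0.8, semantic_sim=0.9)
print(round(rouge, 3))  # 0.667
print(round(acc, 3))    # 0.825
```

The input scores 0.8 and 0.9 are made-up values chosen only to show the weighting.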
- 7,405 multi-hop questions requiring reasoning across 2+ Wikipedia paragraphs
- Each question has 2 supporting + 8 distractor paragraphs
- Source: HotpotQA via HuggingFace
- NCCN clinical guidelines corpus
- 4 question types: Fact Retrieval, Complex Reasoning, Contextual Summarize, Creative Generation
- Source: GraphRAG-Bench
Traditional RAG retrieves chunks that are individually similar to the query. For multi-hop questions like:
"What government position was held by the woman who portrayed Corliss Archer in Kiss and Tell?"
Traditional RAG finds paragraphs about "Kiss and Tell" OR "government positions" but cannot connect them. Graph RAG:
- Anchors on entities: "Corliss Archer", "Kiss and Tell"
- Traverses: Kiss and Tell → Shirley Temple (actress) → Chief of Protocol (position)
- Returns the complete evidence chain
Result: Traditional RAG scored 12% on this question. Graph RAG scored 90%.
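The evidence chain in this example can be reproduced on a toy graph. The sketch below is a simplified stand-in for an AtomicRAG knowledge graph: the dictionaries, unit IDs, and the `evidence_chain` function are hypothetical, and a plain breadth-first search toward a known target replaces the real retriever, which does not know the answer entity in advance.

```python
from collections import deque

# Toy bipartite graph: entities link to knowledge units, which link back to entities.
# The facts mirror the example above; the structure and IDs are hypothetical.
ENTITY_TO_UNITS = {
    "Kiss and Tell": ["ku1"],
    "Shirley Temple": ["ku1", "ku2"],
    "Chief of Protocol": ["ku2"],
}
UNITS = {
    "ku1": ("Shirley Temple portrayed Corliss Archer in Kiss and Tell.",
            ["Kiss and Tell", "Shirley Temple"]),
    "ku2": ("Shirley Temple served as Chief of Protocol of the United States.",
            ["Shirley Temple", "Chief of Protocol"]),
}

def evidence_chain(start, goal, max_hops=3):
    """Breadth-first search over Entity -> Knowledge Unit -> Entity hops."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        entity, chain = queue.popleft()
        if entity == goal:
            return chain
        if len(chain) >= max_hops:
            continue
        for unit_id in ENTITY_TO_UNITS.get(entity, []):
            text, entities = UNITS[unit_id]
            for nxt in entities:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, chain + [text]))
    return None

chain = evidence_chain("Kiss and Tell", "Chief of Protocol")
for step in chain:
    print(step)
```

Each hop adds one knowledge unit to the chain, so the two supporting facts are returned together. A flat similarity search would have to rank both paragraphs highly on its own.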
- LLM: Google Gemini 2.5 Flash
- Embeddings: Gemini Embedding 001
- Agent Framework: LangGraph
- Graph RAG: AtomicRAG (Q-Iter algorithm)
- Vector Store: Local NumPy-based (cosine similarity)
- Evaluation: LLM-as-judge (Gemini) + embedding similarity
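
For reference, cosine-similarity retrieval over a local NumPy matrix can be sketched in a few lines. This is an assumed shape for such a store, not the code in `vector_store.py`; the function name `top_k_cosine` is hypothetical, and the sketch assumes no embedding is the zero vector.

```python
import numpy as np

def top_k_cosine(query_vec, doc_matrix, k):
    """Return indices and scores of the k rows most similar to query_vec."""
    # Normalizing both sides turns the dot product into cosine similarity.
    q = query_vec / np.linalg.norm(query_vec)
    docs = doc_matrix / np.linalg.norm(doc_matrix, axis=1, keepdims=True)
    scores = docs @ q
    order = np.argsort(-scores)[:k]
    return order, scores[order]

docs = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
idx, scores = top_k_cosine(np.array([1.0, 0.2]), docs, k=2)
print(idx.tolist())  # [0, 2]
```
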
- AtomicRAG: Atom–Entity Graphs for Retrieval-Augmented Generation
- HotpotQA: A Dataset for Diverse, Explainable Multi-hop QA
- GraphRAG-Bench
MIT