
Agentic Graph RAG vs Traditional RAG

A demo comparing graph-based RAG (using AtomicRAG) against traditional vector-based RAG through a LangGraph-powered agentic system. The project shows that graph-structured retrieval outperforms flat vector search on multi-hop reasoning tasks: 80.9% versus 70.0% Answer Correctness on HotpotQA.

Key Results

HotpotQA Multi-Hop Reasoning (500 questions)

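| Metric             | Traditional RAG | Graph RAG | Delta  |
|--------------------|-----------------|-----------|--------|
| Answer Correctness | 70.0%           | 80.9%     | +10.9% |
| Wins               | 27              | 103       |        |
| Ties               | 370             | 370       |        |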

Graph RAG wins 3.8x as often as Traditional RAG (103 vs 27 questions) on multi-hop questions that require connecting facts across multiple documents. The 10.9-percentage-point gain in Answer Correctness is measured over 500 bridge-type questions.

Medical Dataset (200 questions, 6 metrics)

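| Metric             | Traditional RAG | Graph RAG | Notes                     |
|--------------------|-----------------|-----------|---------------------------|
| Answer Correctness | 54.2%           | 52.8%     | Tied (single-hop queries) |
| Ctx Summarize ACC  | 54.6%           | 59.3%     | +4.7% Graph RAG           |
| Creative Gen ACC   | 36.3%           | 40.4%     | +4.0% Graph RAG           |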

The medical dataset (mostly single-hop factual lookups) shows comparable performance, while Graph RAG edges ahead on complex summarization and creative tasks. Multi-hop datasets like HotpotQA are where graph traversal truly shines.

Architecture

┌─────────────────────────────────────────────────────────┐
│                   LangGraph Agent                         │
│  ┌──────────┐  ┌──────────────┐  ┌──────────────────┐  │
│  │  Query   │→ │  Traversal   │→ │    Retrieval     │  │
│  │ Analyzer │  │   Planner    │  │  (mode-based)    │  │
│  └──────────┘  └──────────────┘  └────────┬─────────┘  │
│                                            │             │
│                          ┌─────────────────┼──────────┐ │
│                          ▼                 ▼          │ │
│               ┌──────────────────┐ ┌──────────────┐  │ │
│               │  Traditional RAG │ │  Graph RAG   │  │ │
│               │  (Vector Store)  │ │  (AtomicRAG) │  │ │
│               └──────────────────┘ └──────────────┘  │ │
│                          │                 │          │ │
│                          ▼                 ▼          │ │
│               ┌──────────────────────────────────┐   │ │
│               │       Answer Generator           │   │ │
│               └──────────────────────────────────┘   │ │
└─────────────────────────────────────────────────────────┘
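The flow in the diagram can be sketched without any framework as four plain functions passing a shared state dictionary. This is only an illustration of the control flow, not the repository's LangGraph code: the state fields, the capitalized-token entity heuristic, and the `retrievers` mapping are all assumptions made for the example.

```python
from typing import Callable, Literal, TypedDict

Mode = Literal["traditional", "graph"]
Retriever = Callable[[str, dict], list[str]]


class AgentState(TypedDict, total=False):
    # Hypothetical state fields; the real state lives in src/agent/state.py.
    question: str
    mode: Mode
    plan: dict
    context: list[str]
    answer: str


def analyze_query(state: AgentState) -> AgentState:
    # Crude stand-in for LLM analysis: treat capitalized tokens as entities.
    tokens = [w.strip('?,."') for w in state["question"].split()]
    entities = [w for w in tokens if w[:1].isupper()]
    return {**state, "plan": {"entities": entities}}


def plan_traversal(state: AgentState) -> AgentState:
    # Assumed heuristic: more entities means more hops, kept within 1..3.
    depth = min(3, max(1, len(state["plan"]["entities"])))
    return {**state, "plan": {**state["plan"], "traversal_depth": depth}}


def retrieve(state: AgentState, retrievers: dict[str, Retriever]) -> AgentState:
    # Mode-based dispatch: the same state goes to whichever retriever is selected.
    retriever = retrievers[state["mode"]]
    return {**state, "context": retriever(state["question"], state["plan"])}


def generate_answer(state: AgentState) -> AgentState:
    # Placeholder for the LLM call: join the retrieved context.
    return {**state, "answer": f"[{state['mode']}] " + " | ".join(state["context"])}


def run_agent(question: str, mode: Mode, retrievers: dict[str, Retriever]) -> AgentState:
    state: AgentState = {"question": question, "mode": mode}
    state = analyze_query(state)
    state = plan_traversal(state)
    state = retrieve(state, retrievers)
    return generate_answer(state)
```

Calling `run_agent` twice with the same question and a different `mode` gives the side-by-side comparison that the evaluation scripts rely on.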

How Graph RAG Works (AtomicRAG Q-Iter Algorithm)

  1. Entity Anchoring — Extract entities from query, find matching graph nodes
  2. Iterative Expansion — Hop through Entity → Knowledge Unit → Entity connections
  3. Query Updating — Subtract retrieved embeddings to find diverse information
  4. Beam Search Pruning — Keep only top-M most relevant paths at each depth
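A minimal sketch of the four steps above, written in plain Python on a toy graph. It is not the AtomicRAG implementation or API: the function name `q_iter`, the substring-based anchoring, the mean-embedding subtraction, and the dictionary layout of the graph are assumptions chosen to make the loop concrete.

```python
import math

Vector = tuple[float, ...]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def q_iter(query_text, query_vec, entity_to_units, unit_to_entities, unit_vecs,
           traversal_depth=2, beam_size=5, query_update_weight=0.5):
    # 1. Entity anchoring: naive substring match (a real system would use NER or embeddings).
    frontier = {e for e in entity_to_units if e.lower() in query_text.lower()}
    visited = set(frontier)
    retrieved: list[str] = []
    q = tuple(query_vec)

    for _ in range(traversal_depth):
        # 2. Iterative expansion, first half: Entity -> Knowledge Unit.
        candidates = {u for e in frontier for u in entity_to_units.get(e, ())
                      if u not in retrieved}
        if not candidates:
            break
        # 4. Beam search pruning: keep the top-M units by similarity to the current query.
        ranked = sorted(sorted(candidates), key=lambda u: cosine(q, unit_vecs[u]), reverse=True)
        kept = ranked[:beam_size]
        retrieved.extend(kept)
        # 3. Query updating: subtract the weighted mean of what was just retrieved,
        #    so the next hop favors information not yet covered.
        mean = tuple(sum(unit_vecs[u][i] for u in kept) / len(kept) for i in range(len(q)))
        q = tuple(qi - query_update_weight * mi for qi, mi in zip(q, mean))
        # 2. Iterative expansion, second half: Knowledge Unit -> Entity.
        frontier = {e for u in kept for e in unit_to_entities.get(u, ())} - visited
        visited |= frontier
    return retrieved


# Toy graph: alpha -- u1 -- beta -- u3 -- gamma, plus an off-topic unit u2 on alpha.
entity_to_units = {"alpha": ["u1", "u2"], "beta": ["u1", "u3"], "gamma": ["u3"]}
unit_to_entities = {"u1": ["alpha", "beta"], "u2": ["alpha"], "u3": ["beta", "gamma"]}
unit_vecs = {"u1": (1.0, 0.0), "u2": (0.0, 1.0), "u3": (0.6, 0.8)}

hits = q_iter("what connects to alpha?", (1.0, 0.2), entity_to_units,
              unit_to_entities, unit_vecs, traversal_depth=2, beam_size=1)
print(hits)
```

With `beam_size=1` the first hop keeps only `u1`, and the second hop reaches `u3` through the shared entity `beta`, which a single similarity search anchored on `alpha` would not surface.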

The agent dynamically controls graph traversal parameters:

  • traversal_depth (1–3): How many hops to take
  • beam_size (5–15): Width of beam search
  • query_update_weight (0–1): Diversity vs relevance tradeoff
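One way to keep agent-chosen values inside the ranges listed above is a small validated container. The class name `TraversalParams` and the default values are hypothetical; only the three parameter names and their ranges come from the list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TraversalParams:
    # Defaults are illustrative assumptions, not values taken from the project.
    traversal_depth: int = 2
    beam_size: int = 10
    query_update_weight: float = 0.5

    def __post_init__(self) -> None:
        # Reject values outside the documented ranges before they reach the retriever.
        if not 1 <= self.traversal_depth <= 3:
            raise ValueError("traversal_depth must be in 1..3")
        if not 5 <= self.beam_size <= 15:
            raise ValueError("beam_size must be in 5..15")
        if not 0.0 <= self.query_update_weight <= 1.0:
            raise ValueError("query_update_weight must be in 0..1")
```

Validating at construction time means a planner that proposes, say, a depth of 4 fails immediately instead of silently running a more expensive traversal.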

Project Structure

├── src/
│   ├── config.py                  # Central configuration
│   ├── agent/                     # LangGraph agentic system
│   │   ├── state.py              # Agent state (TypedDict)
│   │   ├── nodes.py             # Agent nodes (analyze, plan, retrieve, generate)
│   │   ├── graph.py             # LangGraph graph definition
│   │   └── prompts.py           # LLM prompts
│   ├── traditional_rag/          # Vector similarity retrieval
│   │   ├── chunker.py           # Document chunking
│   │   ├── vector_store.py      # Local vector store (NumPy cosine sim)
│   │   └── retriever.py         # Retrieval interface
│   ├── graph_rag/                # AtomicRAG-based retrieval
│   │   ├── loader.py            # Load pre-built knowledge graph
│   │   └── retriever.py         # Graph traversal retrieval
│   └── evaluation/               # Scoring framework
│       ├── runner.py             # Batch evaluation runner
│       └── scorer.py            # ACC metric implementation
├── scripts/
│   ├── setup_vector_store.py     # Build traditional RAG vector store
│   ├── run_evaluation.py         # Run agent on dataset questions
│   ├── run_scoring.py            # Score predictions (6 metrics)
│   ├── hotpotqa_eval_100.py      # HotpotQA 100-question benchmark
│   ├── hotpotqa_demo.py          # Quick 10-question demo
│   ├── manual_comparison.py      # Side-by-side comparison tool
│   └── test_quick.py            # Smoke test both pipelines
├── results/
│   ├── medical/                  # Medical dataset evaluation report
│   └── hotpotqa/                 # HotpotQA multi-hop results
├── .env.example                  # Environment template
├── pyproject.toml                # Python dependencies
└── docker-compose.yml            # Optional PGVector setup

Setup

Prerequisites

  • Python 3.11+
  • Google Gemini API key
  • atomicrag library (pip install atomicrag)

Installation

git clone https://github.com/Rahuljangs/agentic-graph-rag.git
cd agentic-graph-rag

# Install dependencies
pip install -e .

# Configure environment
cp .env.example .env
# Edit .env and add your GOOGLE_API_KEY

Build the RAG Systems

# 1. Build traditional vector store (Medical dataset)
python -m scripts.setup_vector_store

# 2. The Graph RAG uses pre-built AtomicRAG knowledge graphs
#    (or build one for HotpotQA using scripts/hotpotqa_eval_100.py)

Usage

Quick Demo (10 questions)

python -m scripts.hotpotqa_demo

Runs 10 multi-hop questions through both RAG systems and prints a comparison table.

Full Evaluation (100 questions)

python -m scripts.hotpotqa_eval_100

Builds the knowledge graph, runs 100 HotpotQA bridge questions, and produces scored results.

Medical Dataset Evaluation

# Generate predictions
python -m scripts.run_evaluation

# Score with 6 metrics
python -m scripts.run_scoring

Test Both Pipelines

python -m scripts.test_quick

Evaluation Metrics

| Metric                   | Description                                | LLM-based? |
|--------------------------|--------------------------------------------|------------|
| Answer Correctness (ACC) | 0.75×Factuality_F1 + 0.25×SemanticSim      | Yes        |
| Faithfulness             | Are answer claims supported by context?    | Yes        |
| Context Relevance        | Is retrieved context relevant to question? | Yes        |
| Context Recall           | Does context cover gold answer facts?      | Yes        |
| Answer Relevance         | Does answer address the question?          | Yes        |
| ROUGE-L                  | Lexical overlap (F1 of LCS)                | No         |
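The two formula-based metrics can be sketched directly. The ACC weights come from the table; the lowercase whitespace tokenization in `rouge_l_f1` and the function names are assumptions, and the Factuality_F1 and SemanticSim inputs are taken as given numbers here because they come from the LLM judge and the embedding model.

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    # Classic dynamic program for longest common subsequence, one row at a time.
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(prediction: str, reference: str) -> float:
    # Assumed tokenization: lowercase and split on whitespace.
    pred, ref = prediction.lower().split(), reference.lower().split()
    lcs = lcs_length(pred, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(pred), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


def answer_correctness(factuality_f1: float, semantic_sim: float) -> float:
    # Weighted combination from the metrics table.
    return 0.75 * factuality_f1 + 0.25 * semantic_sim
```

For example, a prediction of "Shirley Temple was Chief of Protocol" against the reference "Chief of Protocol" has an LCS of 3 tokens, so precision is 3/6, recall is 3/3, and the F1 is 2/3.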

Datasets

HotpotQA (Primary — Multi-Hop)

  • 7,405 multi-hop questions requiring reasoning across 2+ Wikipedia paragraphs
  • Each question has 2 supporting + 8 distractor paragraphs
  • Source: HotpotQA via Hugging Face

Medical (GraphRAG-Bench)

  • NCCN clinical guidelines corpus
  • 4 question types: Fact Retrieval, Complex Reasoning, Contextual Summarize, Creative Generation
  • Source: GraphRAG-Bench

Why Graph RAG Wins on Multi-Hop Questions

Traditional RAG retrieves chunks that are individually similar to the query. For multi-hop questions like:

"What government position was held by the woman who portrayed Corliss Archer in Kiss and Tell?"

Traditional RAG finds paragraphs about "Kiss and Tell" OR "government positions" but cannot connect them. Graph RAG:

  1. Anchors on entities: "Corliss Archer", "Kiss and Tell"
  2. Traverses: Kiss and Tell → Shirley Temple (actress) → Chief of Protocol (position)
  3. Returns the complete evidence chain

Result: Traditional RAG scored 12% on this question. Graph RAG scored 90%.
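The traversal in this example can be illustrated with a breadth-first search over a toy adjacency list. The graph below contains only the two edges named in the chain above, with relation labels taken from its parentheticals; the function `evidence_chain` and the data layout are hypothetical and much simpler than AtomicRAG's entity and knowledge-unit graph.

```python
from collections import deque

# Toy graph holding only the two hops from the example.
edges = {
    "Kiss and Tell": [("actress", "Shirley Temple")],
    "Shirley Temple": [("position", "Chief of Protocol")],
}


def evidence_chain(graph, start, target_relation, max_hops=3):
    """Breadth-first search from an anchor entity until an edge with the
    wanted relation is found; returns the node path, or None."""
    queue = deque([(start, [start])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if len(path) - 1 >= max_hops:
            continue  # hop budget used up on this branch
        for relation, neighbor in graph.get(node, []):
            if neighbor in seen:
                continue
            new_path = path + [neighbor]
            if relation == target_relation:
                return new_path
            seen.add(neighbor)
            queue.append((neighbor, new_path))
    return None


print(evidence_chain(edges, "Kiss and Tell", "position"))
```

Limiting `max_hops` to 1 returns `None` for the same query, which mirrors why a single retrieval step cannot answer a two-hop question.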

Tech Stack

  • LLM: Google Gemini 2.5 Flash
  • Embeddings: Gemini Embedding 001
  • Agent Framework: LangGraph
  • Graph RAG: AtomicRAG (Q-Iter algorithm)
  • Vector Store: Local NumPy-based (cosine similarity)
  • Evaluation: LLM-as-judge (Gemini) + embedding similarity
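For orientation, a local NumPy cosine-similarity store of the kind listed above can be very small. This is a sketch under assumptions, not the contents of `src/traditional_rag/vector_store.py`: the class name `LocalVectorStore`, its constructor, and the `search` signature are invented for the example.

```python
import numpy as np


class LocalVectorStore:
    def __init__(self, embeddings, chunks):
        matrix = np.asarray(embeddings, dtype=np.float64)
        # Normalize rows once so a dot product equals cosine similarity.
        norms = np.linalg.norm(matrix, axis=1, keepdims=True)
        self._matrix = matrix / np.clip(norms, 1e-12, None)
        self._chunks = list(chunks)

    def search(self, query_embedding, k=3):
        q = np.asarray(query_embedding, dtype=np.float64)
        q = q / max(np.linalg.norm(q), 1e-12)
        scores = self._matrix @ q
        top = np.argsort(-scores)[:k]  # indices of the k highest scores
        return [(self._chunks[i], float(scores[i])) for i in top]


store = LocalVectorStore([[1, 0], [0, 1], [1, 1]], ["a", "b", "c"])
hits = store.search([1.0, 0.1], k=2)
```

Each chunk is scored independently against the query, which is exactly the behavior that limits flat vector search on multi-hop questions.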

License

MIT
