Graphiti — temporal knowledge graph for AI agents. Local stack: Ollama (GPU) + FalkorDB + MCP.
- Docker + NVIDIA Container Toolkit
- GPU with enough VRAM (default qwen2.5:7b uses ~5 GB on an RTX 3080)
Check GPU in Docker:
docker run --rm --gpus all ollama/ollama nvidia-smi
cp .env.example .env
docker compose up -d

First start pulls models via ollama-pull (several GB, one-time). Watch progress:
docker compose logs -f ollama-pull
docker compose logs -f graphiti

Re-pull models manually:
chmod +x scripts/pull-models.sh
./scripts/pull-models.sh

| Service | URL |
|---|---|
| MCP HTTP | http://localhost:8010/mcp/ |
| Health | http://localhost:8010/health |
| FalkorDB UI | http://localhost:3001 |
| Ollama API | http://localhost:11434 |
CCT ports untouched: app 8000, redis 6379.
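
Since the models are pulled on first start, the Health URL from the table may take a while to respond. A minimal polling sketch follows; `wait_for_health` is a hypothetical helper (not part of this repo), and it assumes `curl` is installed and that the endpoint answers with HTTP 2xx once the service is ready.

```shell
#!/usr/bin/env bash
# Poll a health endpoint until it answers with HTTP 2xx or attempts run out.
# wait_for_health is a hypothetical helper, not something shipped in scripts/.
wait_for_health() {
  local url="$1" attempts="${2:-30}" delay="${3:-2}" i=1
  while [ "$i" -le "$attempts" ]; do
    # -f: fail on HTTP errors, -s: silent, -S: still report errors, -o: discard body
    if curl -fsS -o /dev/null --max-time 5 "$url" 2>/dev/null; then
      echo "healthy: $url"
      return 0
    fi
    # Sleep between attempts, but not after the last one.
    if [ "$i" -lt "$attempts" ]; then sleep "$delay"; fi
    i=$((i + 1))
  done
  echo "not healthy after $attempts attempts: $url" >&2
  return 1
}
```

Usage after `docker compose up -d`: `wait_for_health http://localhost:8010/health 60 5`.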
.cursor/mcp.json:
{
  "mcpServers": {
    "graphiti": { "url": "http://localhost:8010/mcp/" }
  }
}
Restart Cursor after docker compose up.
Index docs/**/*.md into Graphiti graph (entities + facts in FalkorDB, not in repo files):
chmod +x scripts/ingest-docs.sh
# preview
./scripts/ingest-docs.sh --dry-run
# first 3 files (smoke test)
./scripts/ingest-docs.sh --limit 3
# full sync (~94 files, slow on local LLM)
./scripts/ingest-docs.sh

Incremental: .ingest-state.json stores file hashes, so a re-run skips unchanged docs. Force re-ingest: --force.
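
The idea behind hash-based skipping can be sketched as below. This is an illustration only, not the logic of scripts/ingest-docs.sh: the function names are hypothetical, and the state is assumed to be a plain text file of `sha256sum` output lines rather than the real .ingest-state.json layout. It also assumes GNU `sha256sum` is available.

```shell
#!/usr/bin/env bash
# Sketch of hash-based incremental ingestion.
# ASSUMPTION: state is a text file of "<sha256>  <path>" lines (sha256sum
# output), not the actual .ingest-state.json format.

# Succeeds (exit 0) when the file is new or changed since it was last recorded.
needs_ingest() {
  local file="$1" state="$2" current
  [ -f "$state" ] || return 0
  current="$(sha256sum "$file")"
  # -F fixed string, -x whole line: the exact "<hash>  <path>" line must exist.
  ! grep -Fxq -- "$current" "$state"
}

# Record the current hash, replacing any older entry for the same path.
mark_ingested() {
  local file="$1" state="$2" tmp
  tmp="$(mktemp)"
  if [ -f "$state" ]; then
    # Keep every line that does not end in "  <path>".
    awk -v p="  $file" 'substr($0, length($0) - length(p) + 1) != p' "$state" > "$tmp"
  fi
  sha256sum "$file" >> "$tmp"
  mv "$tmp" "$state"
}
```

A driver loop would call `needs_ingest` per file, ingest only when it succeeds, then call `mark_ingested`; a force option would simply skip the `needs_ingest` check.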
MCP search: needs graphiti-core ≥0.29.2 in container (RediSearch fix for cct-backend group_id). Rebuild: docker compose build graphiti && docker compose up -d graphiti.
MCP empty results: the FalkorDB graph name must match GRAPHITI_GROUP_ID (default cct-backend). Ingest writes to the graph named by group_id, while MCP reads the graph named by FALKORDB_DATABASE. Both must be cct-backend, not default_db.
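
A quick way to catch this mismatch before searching is to compare the two settings. The sketch below assumes both variables are set as plain KEY=value lines in the same .env file (no quotes, no trailing spaces); `env_value` and `check_graph_name` are hypothetical helpers, not part of the repo.

```shell
#!/usr/bin/env bash
# Print the value of KEY from an env file (last matching line wins).
env_value() {
  local key="$1" file="$2"
  # Only lines that start with "KEY=" match, so commented lines are ignored.
  sed -n "s/^${key}=//p" "$file" | tail -n 1
}

# Succeeds when ingest and MCP point at the same graph.
# ASSUMPTION: an unset GRAPHITI_GROUP_ID falls back to cct-backend,
# the default stated above; an unset FALKORDB_DATABASE counts as a mismatch.
check_graph_name() {
  local file="${1:-.env}" group db
  group="$(env_value GRAPHITI_GROUP_ID "$file")"
  db="$(env_value FALKORDB_DATABASE "$file")"
  group="${group:-cct-backend}"
  if [ "$group" = "$db" ]; then
    echo "ok: both use graph '$group'"
  else
    echo "mismatch: GRAPHITI_GROUP_ID='$group' FALKORDB_DATABASE='${db:-<unset>}'" >&2
    return 1
  fi
}
```

Running `check_graph_name .env` before `docker compose up -d graphiti` reports a mismatch on stderr and returns a non-zero status.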
Search ingested knowledge via Cursor Graphiti MCP: search_memory_facts, search_nodes.
Default for 10 GB VRAM:
| Role | Model |
|---|---|
| LLM | qwen2.5:7b |
| Embeddings | nomic-embed-text |
Change in .env:
OLLAMA_LLM_MODEL=llama3.1:8b
MODEL_NAME=llama3.1:8b

Then run docker compose up ollama-pull --force-recreate and restart graphiti.
To use a cloud LLM instead of local Ollama, set in .env a real OPENAI_API_KEY, OPENAI_API_URL=https://api.openai.com/v1, and cloud model names. Remove or scale down the ollama service if it is unused.
- GPU not used: run docker compose exec ollama nvidia-smi to confirm the GPU is visible inside the container.
- Ingestion JSON errors: local models are weak at structured output; try a bigger model or reduce concurrency with SEMAPHORE_LIMIT=1.
- Slow first query: Ollama loads the model into VRAM on the first request.
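
To avoid paying the load time on the first real query, the model can be loaded ahead of time. The sketch below relies on the Ollama API behaviour that a POST to /api/generate with a model name and no prompt loads that model into memory; `warmup_payload` and `warm_model` are hypothetical helper names, and the defaults assume the model and port listed in this README.

```shell
#!/usr/bin/env bash
# Build the JSON body for a load-only request (model name, no prompt).
warmup_payload() {
  printf '{"model": "%s"}' "$1"
}

# Ask Ollama to load the model so the first real query does not wait for it.
# Defaults are assumptions taken from this README: qwen2.5:7b on port 11434.
warm_model() {
  local model="${1:-qwen2.5:7b}" base="${2:-http://localhost:11434}"
  curl -fsS -o /dev/null -X POST "$base/api/generate" \
    -H 'Content-Type: application/json' \
    -d "$(warmup_payload "$model")"
}
```

Usage once the stack is up: `warm_model qwen2.5:7b`. Ollama unloads idle models after a while, so the call may need repeating after long pauses.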
docker compose down # keep graph + models
docker compose down -v # wipe FalkorDB + Ollama cache