A Neo4j knowledge graph connecting People → Products → Customers → Workflows → Decisions, built from plain-text documents via Claude, and exposed via a REST API, a React UI, and an MCP server.
Themed around Meridian Property Group — a PropTech SaaS company for residential and commercial property management — to demonstrate AI architecture patterns relevant to the property technology domain: lease management, maintenance workflows, fair housing compliance, and owner analytics.
Every question is classified before an LLM is called: list/count queries are answered straight from Neo4j (no LLM), simple lookups go to Haiku, and multi-hop reasoning goes to Sonnet — with a daily-budget cap that forces Haiku.
flowchart TD
Q["User question<br/>POST /query"] --> SAFE{"Prompt-injection<br/>guard"}
SAFE -- blocked --> R400["HTTP 400 — rejected"]
SAFE -- ok --> DIRECT{"List / count query?<br/>(_try_direct_answer)"}
DIRECT -- yes --> D["DIRECT — no LLM<br/>answered from Neo4j<br/>~10 ms · $0"]
DIRECT -- no --> ROUTE{"route_query"}
ROUTE -- "daily budget exceeded" --> H
ROUTE -- "complexity keyword<br/>or > 12 words" --> S["SONNET<br/>multi-hop reasoning"]
ROUTE -- "otherwise" --> H["HAIKU<br/>simple lookup · fast & cheap"]
S --> CTX["Full graph in<br/>cached system prompt"]
H --> CTX
CTX --> ANS["Stream answer (SSE)"]
D --> ANS["Stream answer (SSE)"]
classDef direct fill:#d1fae5,stroke:#059669,color:#064e3b
classDef haiku fill:#e0f2fe,stroke:#0284c7,color:#075985
classDef sonnet fill:#ede9fe,stroke:#7c3aed,color:#4c1d95
class D direct
class H haiku
class S sonnet
backend/ Python source (api.py, graph.py, seed.py, eval.py …) + prompts/
docs/ Project documentation (USER_GUIDE, DEVELOPER_GUIDE, DESIGN_DECISIONS …)
tests/ pytest test suite (unit + integration)
scripts/ CI utility scripts (validate_prompts.py, check_eval_regression.py)
UI/ React + Vite frontend
.github/ GitHub Actions CI/CD workflows
Detailed docs: docs/USER_GUIDE.md · docs/DEVELOPER_GUIDE.md · docs/DESIGN_DECISIONS.md
The graph data originates from a plain-text company brief — the kind of document that already exists in any company's wiki or shared drive. A single script converts it into a queryable knowledge graph using Claude as the extraction engine.
nexus_corp_brief.md (source: internal company document)
│
│ python doc_to_graph.py
▼
Claude Opus (tool use) ← structured extraction, no hallucination guard needed
│ because tool_choice forces one exact tool call
│ save_knowledge_graph({ entities: [...], relationships: [...] })
▼
doc_to_graph.py ← validates schema, builds MERGE Cypher statements
│
▼
Neo4j (graph.py layer) ← same driver used by REST API + MCP server
│
▼
30 nodes · 71 relationships ← queryable via UI, API, or Claude Desktop
1. Source document — nexus_corp_brief.md is written in natural prose: team bios, product descriptions, customer profiles, workflows, and strategic decisions. No schema required from the author.
2. Claude extracts structure — doc_to_graph.py sends the full document to Claude Opus with a single tool definition (save_knowledge_graph) whose input schema mirrors the Neo4j data model. tool_choice: {type: "tool"} forces exactly one structured call — no parsing of free-form text.
3. Entities and relationships loaded — The tool's output (a list of typed entities + directed relationships) is written to Neo4j using MERGE statements via graph.py, the same layer used by the REST API and MCP server.
# Preview what Claude extracts (no Neo4j writes)
python doc_to_graph.py --dry-run
# Load extracted graph (clears existing data first)
python doc_to_graph.py --clear
# Use your own document
python doc_to_graph.py --file my_company.md --clearMost knowledge graph demos hand-craft seed data. This pipeline shows the realistic path: unstructured document → LLM extraction → graph database → natural language query. The same approach works for org charts, engineering RFCs, sales notes, or any prose-heavy internal document.
flowchart TB
UI["React + Vite UI<br/>Graph · Query · Observe"]:::client
MCPC["Claude Desktop / Code<br/>MCP client"]:::client
subgraph backend["Backend (Python)"]
API["FastAPI · api.py<br/>REST + SSE endpoints<br/>routing · safety · metrics · agent loops"]:::svc
MCP["MCP server · mcp_server.py<br/>FastMCP — 11 tools"]:::svc
UTIL["utils.py<br/>injection guard · PII scan · prompt load"]:::svc
end
subgraph datalayer["Data layer — one source of truth"]
GRAPH["graph.py<br/>Cypher · hybrid search (BM25 + RRF)"]:::data
EMB["embeddings.py<br/>sentence-transformers"]:::data
NEO[("Neo4j<br/>Docker")]:::db
end
ANTH["Anthropic API<br/>Claude Haiku / Sonnet"]:::ext
LS["LangSmith<br/>tracing"]:::ext
subgraph ingest["Ingestion"]
DOC["doc_to_graph.py<br/>LLM extraction"]:::ingest
end
subgraph opsg["Ops"]
EVAL["eval.py<br/>offline eval harness"]:::ops
PROMPTS["prompts/*.yaml<br/>versioned prompts"]:::ops
end
UI -->|/api proxy| API
MCPC -->|stdio| MCP
API --> UTIL
API --> GRAPH
MCP --> GRAPH
GRAPH --> NEO
GRAPH --> EMB
API -->|LLM calls| ANTH
API -.->|traces| LS
DOC -->|extract| ANTH
DOC --> GRAPH
EVAL -->|HTTP| API
PROMPTS -.->|config| API
classDef client fill:#f1f5f9,stroke:#475569,color:#0f172a
classDef svc fill:#eef2ff,stroke:#6366f1,color:#312e81
classDef data fill:#ccfbf1,stroke:#0f766e,color:#134e4a
classDef db fill:#cffafe,stroke:#0e7490,color:#164e63
classDef ext fill:#fef3c7,stroke:#d97706,color:#78350f
classDef ingest fill:#dcfce7,stroke:#16a34a,color:#14532d
classDef ops fill:#f3e8ff,stroke:#9333ea,color:#581c87
Static image / ASCII versions
┌─────────────────────────────────────────────────────────────────┐
│ Browser / Claude │
└────────────┬───────────────────────────┬────────────────────────┘
│ │
▼ ▼
┌────────────────────────┐ ┌───────────────────────────────────┐
│ React + Vite UI │ │ Claude Desktop / Code │
│ (TypeScript, :5173) │ │ MCP Client │
│ Graph · Query · Observe│ └───────────────┬───────────────────┘
│ ┌──────────────────┐ │ │ MCP protocol (stdio)
│ │ Force-graph canvas│ │ ┌───────────────▼───────────────────┐
│ │ Query chat (SSE) │ │ │ mcp_server.py (FastMCP) │
│ │ Std│Agent│Multi │ │ │ 11 tools ──► graph.py │
│ │ Observe metrics │ │ └───────────────┬───────────────────┘
└──────────┬───────────┘ │
│ /api proxy │
▼ ▼
┌─────────────────────────────────────────────────────────────────┐
│ FastAPI (api.py, :8000) │
│ │
│ Graph/CRUD : GET /graph · /nodes · /impact · /path · /metrics │
│ Search : GET /search ──► hybrid: BM25 + semantic (RRF) │
│ LLM modes (SSE): │
│ POST /query routed + full-graph cached prompt │
│ POST /query/agent single-agent tool-calling loop │
│ POST /query/orchestrate planner → ‖workers‖ → synthesizer │
│ │ │
│ route_query (direct/Haiku/Sonnet) · injection + PII guards │
│ ▼ │
│ Anthropic SDK (Haiku/Sonnet) · prompt caching · LangSmith │
│ local embeddings (sentence-transformers, hybrid search arm) │
└──────────────────────────┬──────────────────────────────────────┘
│ Python neo4j driver (bolt://7687)
▼
┌─────────────────────────────────────────────────────────────────┐
│ Neo4j 5.x (Docker) │
│ │
│ (:Person) ──[:WORKS_ON]──► (:Product) │
│ (:Customer)──[:USES]──────► (:Product) │
│ (:Decision)──[:AFFECTS]───► (:Workflow | :Product | :Customer)│
│ (:Person) ──[:OWNS]──────► (:Workflow) │
│ │
│ 30 nodes · 71 relationships │
└─────────────────────────────────────────────────────────────────┘
| Concern | Decision |
|---|---|
| Graph storage | Neo4j — relationships are first-class, Cypher queries are expressive |
| API layer | FastAPI — async, auto-docs, thin wrapper around graph.py |
| LLM integration | Full graph serialized into a cached system prompt — no RAG chunking needed at this scale |
| Streaming | SSE (StreamingResponse) so the UI renders tokens as they arrive |
| Search | Hybrid — BM25 (lexical) + local sentence-transformers embeddings (semantic), fused with RRF |
| Query modes | Routed single-turn (/query), single-agent tool loop (/query/agent), multi-agent orchestration (/query/orchestrate) |
| Multi-agent | Planner → parallel workers → synthesizer over the native SDK; per-role model routing (Sonnet planner/synth, Haiku workers) |
| Frontend proxy | Vite /api → localhost:8000 — no CORS config required in dev |
| MCP | Same graph.py functions reused — one source of truth for all clients |
How a natural language question travels through the system when a user types in the Query tab. The Query tab offers three modes — the standard routed flow is detailed below; the multi-agent flow follows in its own diagram.
| Mode | Endpoint | Shape |
|---|---|---|
| Standard | POST /query |
route → full-graph cached prompt → stream (or direct no-LLM for list/count) |
| Single-agent | POST /query/agent |
one agent iteratively calls graph tools until done |
| Multi-agent | POST /query/orchestrate |
planner → parallel workers → synthesizer |
1. User input → Frontend (UI/src/components/QueryPage.tsx)
User hits Enter. The component opens a streaming fetch:
fetch('/api/query', { method: 'POST', body: JSON.stringify({ question }) })
// response body held open as a ReadableStream2. Vite proxy (vite.config.ts)
The /api prefix is rewritten transparently — no CORS headers needed:
/api/query → http://localhost:8000/query
3. FastAPI receives POST /query (api.py)
Three things happen before the LLM is called:
| Step | Code | What it does |
|---|---|---|
| Full graph load | g.get_full_graph() |
Two Cypher queries — all nodes + all relationships |
| Serialization | _format_graph(graph) |
Converts every node and edge to a readable text string |
| Keyword search | g.search_nodes(question) |
CONTAINS match on name, description, role fields |
4. Anthropic SDK call with prompt caching
async with _anthropic.messages.stream(
model="claude-sonnet-4-6",
system=[
{ "text": SYSTEM_PROMPT, "cache_control": {"type": "ephemeral"} }, # cached
{ "text": graph_context, "cache_control": {"type": "ephemeral"} }, # cached
],
messages=[{ "role": "user", "content": f"{search_hits}\n\nQuestion: {question}" }]
)The two system blocks are marked ephemeral — Anthropic caches them for 5 minutes. The first query in a session pays full token cost; every subsequent query hits the cache and costs ~10× less for input tokens. The user message is never cached (unique per query).
5. SSE stream back to browser (api.py)
Tokens are forwarded to the client the moment they arrive:
async for text in stream.text_stream:
yield f"data: {json.dumps({'type': 'text', 'content': text})}\n\n"
yield f"data: {json.dumps({'type': 'done'})}\n\n"6. Frontend renders tokens as they arrive (QueryPage.tsx)
const event = JSON.parse(line.slice(6)) // strip "data: "
if (event.type === 'text') {
// append token → React re-renders → text appears word by word
}
if (event.type === 'done') {
// set streaming: false → blinking cursor disappears
}QueryPage.tsx
│ POST /api/query { question }
▼
Vite proxy → rewrites to http://localhost:8000/query
▼
FastAPI POST /query
├─► Neo4j: MATCH (n) RETURN n
│ MATCH (a)-[r]->(b) RETURN type, from_id, to_id (~5–20 ms)
│ MATCH (n) WHERE name CONTAINS kw (keyword search)
│
├─► _format_graph() → plain-text representation of all nodes + edges
│
└─► AsyncAnthropic.messages.stream()
system[0]: analyst instructions ← CACHED (5 min TTL)
system[1]: full graph text ← CACHED (5 min TTL)
user: search hits + question (not cached)
│
│ token stream
▼
StreamingResponse media_type="text/event-stream"
│
│ data: {"type":"text","content":"Who…"}\n\n
│ data: {"type":"text","content":" works…"}\n\n
│ data: {"type":"done"}\n\n
▼
res.body.getReader() in browser
│
▼
setMessages(prev → append token) → re-render per token
When the Query tab is in Multi-agent mode, the question fans out across sub-agents instead of a single LLM call. The standard flow above is unchanged; this path is purely additive.
QueryPage.tsx (Multi-agent toggle)
│ POST /api/query/orchestrate { question }
▼
FastAPI POST /query/orchestrate (injection guard runs first)
│
├─► [1] Planner · Sonnet
│ forced submit_plan tool ──► N independent sub-questions
│ SSE: { "type":"plan", "subtasks":[ s1, s2, s3 ] }
│
├─► [2] Workers · Haiku × N ‖ run IN PARALLEL (asyncio.as_completed) ‖
│ each = bounded tool-calling loop over AGENT_TOOLS → graph.py → Neo4j
│ SSE per worker: subagent_start → … → subagent_result (finish out of order)
│
└─► [3] Synthesizer · Sonnet (streamed)
merges findings (grounded only in worker answers) → PII scan
SSE: synthesis → text … → done
│
▼
done: { subtasks, tool_calls, latency_ms,
models:{ planner, worker, synthesizer } }
│
▼
UI renders: live fan-out panel (plan + per-worker status/tool count)
+ per-role model badges (planner Sonnet · N× worker Haiku · synth Sonnet)
All three roles reuse the same graph.py tools, prompt-injection filter, PII output scan, and LangSmith tracing as the single-agent path — per-role model routing (capable Sonnet for planning/synthesis, cheap Haiku for parallel workers) is the cost lever.
| Phase | Typical time |
|---|---|
| Neo4j queries | 5 – 20 ms |
| Graph serialization | < 1 ms |
| Keyword search | 5 – 15 ms |
| Time to first token (Anthropic) | 300 – 800 ms |
| Streaming throughput | ~50 – 80 tokens / sec |
Beyond the standard single-turn query, /query/agent runs a full tool-calling agent loop. Claude iteratively decides which graph tools to invoke, executes them, and reasons across results before producing a final answer — demonstrating the responder/thinker pattern described in modern agentic AI design.
User question
│
▼
Claude (tool_use turn)
├─► search_graph("compliance workflow") → entity list
├─► get_entity("w4") → Fair Housing Audit details + connections
├─► trace_decision_impact("d4") → GDPR/CCPA blast radius
└─► [end_turn] synthesized answer
Five tools are available to the agent: search_graph, get_entity, find_path, trace_decision_impact, run_cypher. The SSE stream emits typed events so the UI can render each step as it happens:
| Event type | What it carries |
|---|---|
thinking |
Claude's intermediate reasoning text |
tool_call |
Tool name + input chosen by Claude |
tool_result |
Truncated result returned to Claude |
text |
Final answer tokens |
done |
Tool call count + total latency |
For broad, comparative questions, /query/orchestrate runs a multi-agent pattern — an orchestrator that decomposes the question, parallel workers that research each part, and a synthesizer that merges the findings. It's purely additive; /query and /query/agent are unchanged.
No agent framework. No LangGraph, CrewAI, AutoGen, or LangChain agents — the orchestration is plain Python on the Anthropic SDK (
messages.create/messages.stream), with stdlibasyncio.as_completedfor the parallel worker fan-out. Same rationale as the single-agent loop: full control and traceability over every turn, custom SSE events, and per-role model routing — none of which a framework abstraction makes easier here.
User question
│
▼
[Planner · Sonnet] ──► decomposes into N independent sub-questions (forced submit_plan tool)
│
├─► [Worker s1 · Haiku] ─┐ each worker is a bounded tool-calling loop
├─► [Worker s2 · Haiku] ─┤ over the same AGENT_TOOLS, run in PARALLEL
└─► [Worker s3 · Haiku] ─┘ (asyncio.as_completed)
│
▼
[Synthesizer · Sonnet] ──► one grounded answer, streamed
Per-role model routing is the cost lever: the planner and synthesizer use the capable model (Sonnet) where reasoning matters; the parallel workers use the cheap model (Haiku). All roles reuse the same graph.py tools, prompt-injection filter, and PII output scan. SSE events extend the agent set:
| Event type | What it carries |
|---|---|
plan |
The decomposed sub-questions + planner model |
subagent_start |
A worker began (id, question, model) |
subagent_result |
A worker finished (id, answer preview, tool calls) |
synthesis |
The synthesizer started (model) |
text / done |
Final answer tokens / sub-agent + tool counts, latency, per-role models |
Models are env-overridable (ORCH_PLANNER_MODEL, ORCH_WORKER_MODEL, ORCH_SYNTH_MODEL, ORCH_MAX_SUBTASKS). In the UI, the Query tab → Multi-agent toggle shows the fan-out live: the plan, each sub-agent's status and tool count, then the synthesis.
Every query is classified before an LLM is called:
| Route | Trigger | Cost |
|---|---|---|
| Direct (no LLM) | List / count queries (how many workflows, list all products) |
~0 ms, $0 |
| Claude Haiku | Simple single-entity lookups (≤12 words, no multi-hop indicators) | Fast, cheap |
| Claude Sonnet | Complex reasoning: impact, trace, depend, compliance, path, etc. |
Full quality |
route_query("who owns the lease renewal workflow")
# → "claude-haiku-4-5-20251001"
route_query("trace the impact of the GDPR compliance decision on workflows")
# → "claude-sonnet-4-6"All search operations — /search, the standard /query endpoint, and the agent's search_graph tool — use a true hybrid retriever: a BM25 lexical arm and a semantic embedding arm, fused with Reciprocal Rank Fusion (RRF).
- Lexical arm (BM25): weighs terms by inverse document frequency and normalizes for document length, so "compliance audit" ranks
Fair Housing Auditabove nodes containing only one of the two words. Pure Python. - Semantic arm: embeds the query and every node with a local
sentence-transformersmodel (all-MiniLM-L6-v2, 384-dim) and ranks by cosine similarity — so "protecting user information" surfaces theGDPR and CCPA Compliance Overhauldecision even though it shares no words with the query. - Fusion (RRF): the two ranked lists are merged by
1/(k+rank)(k=60), which is score-scale agnostic and lets lexical-only and semantic-only hits both survive.
hybrid_search_nodes("lease renewal compliance")
# Lexical BM25 ranking + semantic cosine ranking, fused via RRF
# Embeds name + description + role + rationale; returns ranked resultsThe semantic arm degrades gracefully — if sentence-transformers is not installed the retriever falls back to pure BM25. A /search?mode=keyword fallback is also available for direct substring matching.
The Observe tab is a real-time production-AI dashboard that makes the model selection strategy visible and explains why every routing decision was made.
Routing Decision Badge — after every query response, a colour-coded pill appears:
| Badge | When shown | JD signal |
|---|---|---|
⚡ Direct · direct answer — no LLM · 0.1s |
List/count queries | Zero LLM cost path |
⚡ Haiku · simple lookup · 1.8s |
Short entity questions | Right-sizing to cheap model |
⚡ Sonnet · 'compliance' detected · 6.2s · 3 tool calls |
Complex reasoning | Full-capability model when needed |
Observe tab sections:
| Section | What it shows |
|---|---|
| Key Metrics | Total queries (standard / agent / direct), avg latency, session cost, safety events |
| Budget Status | Progress bar vs. daily limit, amber alert when exceeded |
| Model Selection Strategy | Horizontal bar per tier with percentage, colour code, and one-line description of why each tier exists. Token breakdown (input / cached / output) below |
| LangSmith Trace Feed | Last 15 runs with name, run type (chain/llm/tool), status, latency, token counts. Auto-refreshes every 10 s |
Backend additions:
_route_explanation(question)— returns the human-readable routing reason for each queryroute_reasonfield added to thedoneSSE event on all endpointsGET /langsmith/runs?limit=N— server-side LangSmith query (keeps API key server-side, avoids CORS)
Full distributed tracing for every LLM call, tool execution, and routing decision via LangSmith. Enabled by three env vars — zero overhead when disabled.
What is traced:
| Component | How | LangSmith run type |
|---|---|---|
| Every agent-loop LLM call | wrap_anthropic(AsyncAnthropic()) — patches messages.create automatically |
llm |
| Every tool execution | @traceable on _execute_agent_tool |
tool |
| Query routing decision | @traceable on route_query |
chain |
| Full agent run tree | _run_agent_traced() fires in background |
chain (parent with nested children) |
Enable tracing — add all three to .env, then restart the API:
LANGSMITH_API_KEY=your-key-here # free key at smith.langchain.com
LANGSMITH_PROJECT=cogni-graph # project name in LangSmith UI
LANGSMITH_TRACING_V2=true # ← master switch: must be set or nothing tracesImportant:
LANGSMITH_TRACING_V2=trueis the master switch. Setting onlyLANGSMITH_API_KEYis not enough — all decorators and wrappers remain no-ops until this flag is present. The API must also be restarted after editing.envbecausedotenvloads vars once at process start.
Verified trace output — one agent query produces these runs in LangSmith:
| # | Run type | Name | What it captured |
|---|---|---|---|
| 1 | llm |
ChatAnthropic | Turn 1 — LLM selected tool, inputs + output blocks, token counts |
| 2 | tool |
execute_graph_tool | Tool input + result (e.g. search_graph("Lease Renewal")) |
| 3 | llm |
ChatAnthropic | Turn 2 — next LLM decision |
| 4 | tool |
execute_graph_tool | Second tool call (e.g. get_entity("w1")) |
| 5 | llm |
ChatAnthropic | Final turn — stop reason end_turn, full answer text |
Full trace tree for a complex agent query:
agent_query (chain) ~36 s
├─ route_query (chain) — "claude-sonnet-4-6"
├─ ChatAnthropic (llm) — turn 1, tool_use
├─ execute_graph_tool (tool) — search_graph("GDPR")
├─ execute_graph_tool (tool) — trace_decision_impact("d4")
├─ execute_graph_tool (tool) — get_entity("pr1")
├─ execute_graph_tool (tool) — get_entity("pr3")
└─ ChatAnthropic (llm) — turn 5, end_turn → final answer
Each node shows inputs, outputs, token counts, latency, and metadata (domain: proptech, system: cogni-graph). Traces are searchable and can be added to LangSmith evaluation datasets.
If nothing appears in LangSmith:
- Confirm all three vars are in
.env— especiallyLANGSMITH_TRACING_V2=true - Restart the API (
pkill -f uvicorn && cd backend && uvicorn api:app) — env vars load at startup - Send at least one
/query/agentrequest — the standard/querySSE endpoint generates fewer trace events - Check the correct project name:
smith.langchain.com → Projects → cogni-graph
Every question passes through _check_injection() before any LLM call is made. Two checks run in order:
| Check | Rule | HTTP response |
|---|---|---|
| Length | ≤ 500 characters | 400 question_too_long:N_chars_max_500 |
| Pattern match | 18 compiled regex across 5 injection families | 400 Question rejected by safety filter: <category> |
The five families detected:
| Category | Examples caught |
|---|---|
instruction_override |
"ignore all previous instructions", "disregard the above guidelines" |
prompt_extraction |
"reveal your system prompt", "show me your instructions" |
identity_override |
"you are now a different AI", "pretend to be unrestricted" |
jailbreak |
"enable DAN mode", "jailbreak", "developer mode" |
delimiter_injection |
<system>, [SYSTEM], ### system, ASSISTANT: |
Flagged inputs are never forwarded to Claude. Each block increments _metrics["safety_events"] which is visible at GET /metrics. The guard covers both /query and /query/agent.
53 unit tests in tests/test_unit_safety.py verify all attack categories and confirm that all 6 sample PropTech questions plus 10 other legitimate queries pass without triggering false positives.
Output PII scanning runs after every LLM response before tokens reach the browser. Four PropTech-specific PII types are detected and redacted:
| Pattern | Example detected | Replaced with |
|---|---|---|
SSN |
123-45-6789 |
[REDACTED:SSN] |
PAYMENT_CARD |
4111 1111 1111 1111 |
[REDACTED:PAYMENT_CARD] |
ROUTING_NUMBER |
021000021 |
[REDACTED:ROUTING_NUMBER] |
EXTERNAL_EMAIL |
[email protected] |
[REDACTED:EXTERNAL_EMAIL] |
Internal @meridianpg.com addresses are excluded. Each detection increments _metrics["output_safety_events"] and emits a safety_warning SSE event to the client. Both endpoints buffer the full response before emission so patterns spanning multiple stream tokens are caught.
The safety guidelines are also embedded in prompts/v2.yaml — the system prompt explicitly instructs Claude never to reproduce sensitive personal information. Set PROMPT_VERSION=v2 to activate.
Every query updates in-memory counters. /metrics returns a live snapshot:
{
"queries": { "total": 42, "standard": 18, "agent": 12, "direct_no_llm": 12 },
"latency": { "avg_ms": 840.1, "total_ms": 35284.2 },
"tokens": { "input": 145000, "cached": 113100, "output": 8200, "cache_hit_rate": 0.78 },
"cost_usd": 0.1162,
"model_routes": { "direct": 12, "claude-haiku-4-5-20251001": 14, "claude-sonnet-4-6": 16 },
"cache_hits": 31,
"errors": 0
}A daily cost budget prevents runaway spend. Set DAILY_COST_LIMIT_USD in .env (default $5.00, set to 0 to disable):
DAILY_COST_LIMIT_USD=5.00How it works:
_update_metrics()recalculatesrunning_cost_usdafter every query using the token counters already in memory- When
running_cost_usd ≥ DAILY_COST_LIMIT_USD: sets_metrics["budget_exceeded"] = Trueand emits aWARNINGlog route_query()checks the flag first — if set, every query routes toclaude-haiku-4-5-20251001regardless of complexityGET /metricsexposes the full budget state under a"budget"blockPOST /admin/reset-budgetclears the flag without restarting the API
"budget": {
"limit_usd": 5.0,
"running_cost_usd": 5.0124,
"exceeded": true,
"exceeded_at_usd": 5.0011,
"note": "All queries forced to claude-haiku-4-5-20251001. Call POST /admin/reset-budget to restore normal routing."
}Two GitHub Actions workflows enforce quality on every change:
ci.yml — runs on every push and pull request to main:
syntax-and-unit ── py_compile + 48 unit tests (~0.5 s, no services)
prompt-validation ── validate all prompts/*.yaml structure and content
↓ (both must pass before integration runs)
integration ── Neo4j container → seed → 65 graph tests → start API → 63 API tests
No real Anthropic key is needed for the CI gate — all integration tests are marked -m "not llm".
eval-nightly.yml — runs daily at 06:00 UTC (and on manual dispatch):
Neo4j → seed → start API → python eval.py → check_eval_regression.py
│
pass ── upload 90-day artifact
fail ── upload artifact + auto-create GitHub issue
Regression thresholds (scripts/check_eval_regression.py):
| Check | Threshold |
|---|---|
| Average entity recall | ≥ 0.85 |
| Pass rate | ≥ 75% (6/8 cases) |
| Zero-recall cases | 0 allowed |
GitHub Secrets required (Settings → Secrets → Actions):
ANTHROPIC_API_KEY— for nightly eval (LLM calls)LANGSMITH_API_KEY— optional, for nightly tracing
Offline eval suite with 8 PropTech test cases. Each case specifies a question and a list of expected entities; the harness hits the live API, scores entity recall (pass threshold: ≥0.6), and prints a results table with latency and model routing breakdown.
python eval.py # standard /query endpoint
python eval.py --agent # agentic /query/agent endpoint
python eval.py --verbose # print full answer for each case [tc_01] workflow_ownership PASS recall=1.00 820 ms [claude-haiku-4-5-20251001]
[tc_02] compliance PASS recall=0.75 1240 ms [claude-sonnet-4-6]
[tc_03] customer_products PASS recall=1.00 390 ms [direct]
...
Pass rate : 8/8 (100%)
Avg recall : 0.91
Avg latency : 780 ms
| Layer | Tech |
|---|---|
| Graph DB | Neo4j 5.x (Docker) |
| API | FastAPI (Python) |
| LLM | Claude Sonnet / Haiku via Anthropic SDK (model-routed) |
| UI | React 18 + Vite + TypeScript + Tailwind |
| Graph viz | react-force-graph-2d (D3 canvas) |
| MCP Server | mcp Python SDK (FastMCP) |
| Graph data | Extracted from nexus_corp_brief.md via doc_to_graph.py + Claude |
| Search | Hybrid: BM25 (pure Python) + local sentence-transformers embeddings, fused via RRF |
| Tracing | LangSmith (wrap_anthropic + @traceable) |
| CI/CD | GitHub Actions — push gate + nightly eval with regression alerting |
| Observability UI | Observe tab — model routing distribution, LangSmith trace feed, budget status |
cd /Users/syefai/workspace/CompanyGraph
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txtThe first search downloads the local embedding model (
all-MiniLM-L6-v2, ~90 MB) for the semantic arm. It's cached afterwards; ifsentence-transformersis unavailable the search falls back to pure BM25.
docker compose up -d
# Wait ~15s for Neo4j to be readyNeo4j browser: http://localhost:7474 (neo4j / companygraph123)
All backend code lives in backend/. Run commands from project root:
# Seed the graph
python backend/seed.py
# Start the API (run from backend/ so uvicorn finds api.py)
cd backend && uvicorn api:app --host 0.0.0.0 --port 8000 --reloadOr extract the graph from source document using Claude:
cd backend && python doc_to_graph.py --clear # document → Claude → Neo4j
cd backend && uvicorn api:app --reloadAPI docs: http://localhost:8000/docs
# Standard query (full graph in cached system prompt, model-routed)
curl -N -X POST http://localhost:8000/query \
-H "Content-Type: application/json" \
-d '{"question": "Who owns the Lease Renewal workflow?"}'
# Agentic query (single-agent tool-calling loop, streams tool_call/tool_result/text events)
curl -N -X POST http://localhost:8000/query/agent \
-H "Content-Type: application/json" \
-d '{"question": "Trace the compliance impact of the GDPR overhaul decision."}'
# Multi-agent query (planner → parallel workers → synthesizer; streams plan/subagent_result/text)
curl -N -X POST http://localhost:8000/query/orchestrate \
-H "Content-Type: application/json" \
-d '{"question": "What is Meridian'\''s compliance posture, and which customers and products are most affected by the GDPR decision?"}'
# Hybrid search (BM25 + semantic embeddings via RRF)
curl "http://localhost:8000/search?q=fair+housing+compliance"
# Observability snapshot
curl http://localhost:8000/metricsGet a free API key at smith.langchain.com, then add to .env:
LANGSMITH_API_KEY=your-key-here
LANGSMITH_PROJECT=cogni-graph
LANGSMITH_TRACING_V2=trueRestart the API — every agent query will now appear in LangSmith with a full trace tree showing LLM calls, tool executions, routing decisions, token counts, and latency per step. Safe to omit: all decorators and wrappers are no-ops without the key.
# Standard endpoint (model-routed, fast)
python backend/eval.py
# Agentic endpoint (tool-calling loop, more thorough)
python backend/eval.py --agent
# Verbose: print full answer for each test case
python backend/eval.py --verboseResults are saved to backend/eval_results/.
The .mcp.json file in this directory auto-registers the MCP server when you open Claude Code here — no manual step needed. Its contents:
{
"mcpServers": {
"meridian-property-graph": {
"command": "/Users/syefai/workspace/CompanyGraph/.venv/bin/python",
"args": ["backend/mcp_server.py"],
"cwd": "/Users/syefai/workspace/CompanyGraph",
"env": {
"NEO4J_URI": "bolt://localhost:7687",
"NEO4J_USER": "neo4j",
"NEO4J_PASSWORD": "companygraph123"
}
}
}
}Why each field matters:
commandpoints at the venv Python (not barepython), so the server has the project's dependencies (neo4j,mcp, …).argsisbackend/mcp_server.py— the server lives inbackend/, not the repo root.cwdanchors it to the repo root somcp_server.py's flatimport graphresolves and.envis found.envsupplies the Neo4j connection (noANTHROPIC_API_KEYneeded — the MCP tools only touch the graph, no LLM calls).
Or register it from the CLI without editing the file:
claude mcp add meridian-property-graph \
-- /Users/syefai/workspace/CompanyGraph/.venv/bin/python \
/Users/syefai/workspace/CompanyGraph/backend/mcp_server.pyEither way, Claude gets these 11 tools (8 read + 3 write):
| Tool | What it does |
|---|---|
get_graph_summary |
High-level overview of the graph |
list_entities |
List all People / Products / Customers / Workflows / Decisions |
get_entity |
Get an entity's details + all connections |
search_graph |
Full-text search across the graph |
find_path |
Shortest path between any two entities |
trace_decision_impact |
What does a decision affect (up to 3 hops)? |
get_workflow_team |
Who owns / is involved in a workflow? |
get_customer_products |
What products does a customer use + who built them? |
add_entity |
Add a new entity to the graph |
connect_entities |
Create a relationship between two entities |
run_cypher |
Execute raw Cypher for advanced queries |
Claude Desktop launches the server from an arbitrary working directory, so use
absolute paths for both the interpreter and the script. Add to
~/Library/Application Support/Claude/claude_desktop_config.json:
{
"mcpServers": {
"meridian-property-graph": {
"command": "/Users/syefai/workspace/CompanyGraph/.venv/bin/python",
"args": ["/Users/syefai/workspace/CompanyGraph/backend/mcp_server.py"],
"env": {
"NEO4J_URI": "bolt://localhost:7687",
"NEO4J_USER": "neo4j",
"NEO4J_PASSWORD": "companygraph123"
}
}
}
}The absolute path to
backend/mcp_server.pyputsbackend/onsys.path, so the flatimport graphresolves without acwd. After saving, fully restart Claude Desktop and confirm the tools icon lists meridian-property-graph with its 11 tools. Neo4j must be running (docker compose up -d) for the tools to return data.
⚠️ Gotcha — don't copy the relative path from.mcp.json. Claude Desktop ignores thecwdfield and launches the server from/, so a relative"args": ["backend/mcp_server.py"]resolves against root and fails with:can't open file '//backend/mcp_server.py': [Errno 2] No such file or directoryThe
//prefix is the tell. Use the absolute script path inargs(as above) for Desktop. The relative path +cwdonly works in.mcp.jsonfor Claude Code.
(Person)-[:WORKS_ON]-------->(Product)
(Person)-[:OWNS]------------>( Workflow)
(Person)-[:MADE]------------>( Decision)
(Workflow)-[:INVOLVES]------>( Person)
(Workflow)-[:PRODUCES]------>( Product)
(Workflow)-[:DEPENDS_ON]---->( Workflow)
(Customer)-[:USES]---------->( Product)
(Decision)-[:AFFECTS]------->( Product | Customer | Workflow)
Meridian Property Group — PropTech SaaS for residential and commercial property management
- 8 people: Elena Rodriguez (CEO), James Park (VP Operations), Sofia Nguyen (Head of Product), Marcus Webb (Senior Engineer), Priya Okafor (Engineer), David Chen (Director of Compliance), Rachel Torres (Leasing Director), Andre Williams (Customer Success Lead)
- 5 products: LeaseTrack (active), MaintenanceOS (active), TenantPay (active), OwnerInsight (beta), LegacyPortal (deprecated)
- 5 customers: Sunstone Residential (3,200 units, enterprise), Harbor View Properties (1,800 units, enterprise), Metro Living Group (900 units, mid-market), Summit HOA (320 units, SMB), Apex Commercial (2M sqft, enterprise)
- 6 workflows: Lease Renewal, Move-In Inspection, Work Order Processing, Fair Housing Audit, Vendor Onboarding, Quarterly Owner Reporting
- 6 decisions: Deprecate LegacyPortal, Enter Commercial Market, Migrate to Kubernetes, GDPR and CCPA Compliance Overhaul, Launch OwnerInsight Beta, Outsource Vendor Network
The domain was chosen to demonstrate AI patterns relevant to property technology:
| PropTech concern | Graph representation |
|---|---|
| Lease compliance | Lease Renewal workflow →[:DEPENDS_ON]→ Fair Housing Audit |
| Regulatory impact | GDPR and CCPA Compliance Overhaul →[:AFFECTS]→ LeaseTrack, TenantPay, Lease Renewal |
| Vendor risk | Work Order Processing →[:DEPENDS_ON]→ Vendor Onboarding |
| Customer analytics | OwnerInsight beta co-developed with Sunstone Residential, Harbor View Properties |
| Product deprecation | Deprecate LegacyPortal decision traced to affected customers and workflows |
"Who owns the Lease Renewal workflow and who else is involved?"
"Trace the full impact of the GDPR and CCPA compliance overhaul."
"Find the connection between Elena Rodriguez and Apex Commercial."
"Which products does Sunstone Residential use and who built them?"
"What workflows would be at risk if Marcus Webb left the company?"
"Add a new compliance engineer named 'Kai Patel' and connect them to the Fair Housing Audit workflow."




