This project uses Graphify to turn any local directory (documents, code, images, and so on) into an interactive, queryable knowledge graph. This setup runs Graphify's semantic extraction and structural analysis entirely offline. Graphify scans the raw input files to detect their types, uses Tree-sitter parsers to extract structural relationships from code, and uses a fast local AI agent layer to extract concepts and entities from text. The resulting connections are saved as JSON for agentic workflows and as a user-friendly HTML/Markdown visual representation.
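As a rough illustration of the type-detection step described above, the sketch below groups files by suffix. The extension table and the function names `group_by_type` and `scan_directory` are assumptions made for this example, not Graphify's actual rules or API.

```python
from collections import defaultdict
from pathlib import Path

# Hypothetical extension-to-category table; the real detector may use other rules.
CATEGORY_BY_SUFFIX = {
    ".py": "code", ".js": "code", ".ts": "code", ".go": "code",
    ".md": "docs", ".txt": "docs", ".pdf": "docs",
    ".png": "images", ".jpg": "images", ".jpeg": "images",
}


def group_by_type(paths):
    """Group file names into categories based on their (case-insensitive) suffix."""
    groups = defaultdict(list)
    for path in map(Path, paths):
        category = CATEGORY_BY_SUFFIX.get(path.suffix.lower(), "other")
        groups[category].append(path.name)
    return dict(groups)


def scan_directory(root):
    """Collect every regular file under root and group it by type."""
    files = [p for p in Path(root).rglob("*") if p.is_file()]
    return group_by_type(files)


# Example with in-memory paths, so no directory needs to exist.
sample = ["raw/app.py", "raw/notes.md", "raw/diagram.PNG", "raw/data.bin"]
grouped = group_by_type(sample)
print(grouped)
```

Code files would then go to the structural parser and docs to the semantic extractor; anything unrecognized lands in `other` so it can be skipped or reviewed.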
- Python 3.10+ (Runtime environment)
- Graphifyy (The main knowledge graph orchestrator CLI/API plugin)
- Tree-sitter (Used for fast, AST-based deterministic parsing of code structure)
- NetworkX (Library used for creating complex network nodes and edges)
- Leiden Community Detection (Algorithm to group closely joined nodes into labeled conceptual clusters)
- Ollama (gemma4:e2b) (Local AI execution environment and language model for semantic relationship extraction)
- detect: Analyzes the input directory, groups files by type (code, docs, etc.), and checks for sensitive variables before routing.
- extract: The core parsing layer. Uses Tree-sitter for structural AST linkages in code and triggers parallel LLM subagent tasks for deep semantic connections in docs.
- build: Instantiates a NetworkX multidigraph `G` by combining the detected structural and semantic node JSONs into edge relationships.
- cluster: Runs the Leiden community detection algorithm over `G` to find densely connected communities of nodes.
- analyze: Searches for surprising bridge connections between clusters and determines "God nodes" (highly connected central nodes).
- report: Generates a unified plain-text report (`GRAPH_REPORT.md`), attaching human-language summaries and suggesting the strongest queries that traverse communities.
- export: Dumps the NetworkX object state into multiple formats: `graph.json` for persistent MCP traversal, GraphML / Cypher exports, and visually accessible graphs.
- Supporting Files:
- ingest.py: Handles fetching remote web/URL items and local filesystem chunking.
- cache.py: Maintains state persistence so previously semantically digested files don't waste LLM tokens.
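The build and analyze steps above can be sketched in plain Python. This is a stand-in that avoids the NetworkX dependency, not the project's implementation: the edge triples, the community labels (which would come from the Leiden step), and the helper names are all hypothetical.

```python
from collections import Counter

# Hypothetical extraction output: (source, target, relation) triples.
edges = [
    ("app.py", "db.py", "imports"),
    ("app.py", "auth.py", "imports"),
    ("auth.py", "db.py", "calls"),
    ("README.md", "app.py", "describes"),
    ("design.md", "README.md", "references"),
]

# Hypothetical community labels, standing in for the clustering result.
community = {
    "app.py": 0, "db.py": 0, "auth.py": 0,
    "README.md": 1, "design.md": 1,
}


def degree_counts(edge_list):
    """Count in-degree plus out-degree per node, as a multidigraph would."""
    counts = Counter()
    for source, target, _relation in edge_list:
        counts[source] += 1
        counts[target] += 1
    return counts


def god_nodes(edge_list, top_n=1):
    """Return the top_n most connected nodes."""
    return [node for node, _ in degree_counts(edge_list).most_common(top_n)]


def bridge_edges(edge_list, labels):
    """Return edges whose endpoints sit in different communities."""
    return [edge for edge in edge_list if labels[edge[0]] != labels[edge[1]]]


print(god_nodes(edges))                # most connected node(s)
print(bridge_edges(edges, community))  # cross-community connections
```

Ranking by raw degree is the simplest reading of "highly connected"; a centrality measure such as betweenness would be a reasonable alternative if degree alone proves too noisy.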
To set up and run this project from scratch via Windows terminal:
- Ensure Python 3.10+ is installed.
- Install dependencies globally:
    pip install graphifyy
    graphify install
- Optimize and Run Local AI:
  Create a file named `Modelfile_GemmaFast`:
    FROM gemma4:e2b
    SYSTEM """You are an expert, direct assistant. You must provide the final answer immediately. DO NOT use <think> tags, DO NOT output internal reasoning, and DO NOT brainstorm before answering. Output only the final response."""
  Build and start the model:
    ollama create gemma4:e2b -f Modelfile_GemmaFast
    ollama run gemma4:e2b
- Execute via AI:
  Within your AI coding assistant, run:
    /graphify ./raw
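Before running the graph build, it can help to confirm that the local model answers. The sketch below assumes Ollama is serving on its default local address (`http://localhost:11434`) and uses its `/api/generate` endpoint with streaming turned off; the helper names `build_request` and `ask` are made up for this example.

```python
import json
import urllib.request

# Ollama's default local endpoint; adjust if your server listens elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt, model="gemma4:e2b"):
    """Build a non-streaming generate request for the local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, timeout=60):
    """Send the prompt and return the model's text reply.

    Raises an OSError subclass (urllib.error.URLError) if the server is down.
    """
    with urllib.request.urlopen(build_request(prompt), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

With the server running, `ask("Reply with the single word: ready")` should return a short string; a connection error usually means Ollama has not been started.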
- Codebase Onboarding: Instantly digest a large open-source repository's architecture without needing full technical documentation. It highlights "God Nodes" immediately.
- Analyzing Undocumented Legacy Systems: Parse sprawling spaghetti code connections into cleanly decoupled visual graphs, giving refactoring insights.
- Reducing Context Window Token Costs: By extracting precise node-to-node relationships up front instead of pushing large amounts of raw text into the context window, follow-up queries are narrowed to the relevant context and become considerably cheaper.
- Brainstorming & Research: Tossing disjointed PDFs and web URLs into a single `.raw` and seeing bridging concepts natively form context mappings.
- Live Neo4j Sync Integration: Wire the exporter output to continuously mirror changes into a local Neo4j desktop instance as the code dynamically updates on save.
- Vector Embedded Retrieval: Implementing similarity searching in `query` functionality by indexing extracted attributes to local LanceDB.
- Custom Prompting Tuning: Expanding `Modelfile` to have separate deterministic and creative variant prompts depending on whether it parses logic source code or design `.md` notes.
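One possible shape for the prompt-tuning idea is to render a Modelfile per variant and choose the variant by file type. Everything here is illustrative: the variant names, temperatures, and system prompts are assumptions, and only the `FROM`, `PARAMETER temperature`, and `SYSTEM` instructions are standard Modelfile syntax.

```python
from pathlib import Path

BASE_MODEL = "gemma4:e2b"

# Hypothetical variants: names, temperatures, and prompts are illustrative only.
VARIANTS = {
    "code": {
        "temperature": 0.0,
        "system": "Extract structural relationships only. Output only the final response.",
    },
    "notes": {
        "temperature": 0.7,
        "system": "Extract concepts and the design intent behind them. Output only the final response.",
    },
}


def variant_for(path):
    """Pick the creative variant for Markdown notes, the deterministic one otherwise."""
    return "notes" if Path(path).suffix.lower() == ".md" else "code"


def render_modelfile(variant):
    """Render Modelfile text for the given variant."""
    settings = VARIANTS[variant]
    return (
        f"FROM {BASE_MODEL}\n"
        f"PARAMETER temperature {settings['temperature']}\n"
        f'SYSTEM """{settings["system"]}"""\n'
    )


print(render_modelfile(variant_for("design.md")))
```

Each rendered text could be written to its own file and registered under a distinct model name with `ollama create`, so the extractor can pick a model per input file.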