Locally Hosted Knowledge Graph Builder (Graphify Pipeline)

Project Details

This project uses Graphify to turn any local directory (documents, code, images, and so on) into an interactive, densely connected, queryable knowledge graph. In this setup, Graphify's semantic extraction and structural analysis run entirely offline. Graphify scans the raw input files to detect their types, uses Tree-sitter parsers to extract structural relationships from code, and uses a fast local AI agent layer to extract concepts and entities from text. The resulting connections are saved as JSON for agentic workflows and as a user-friendly HTML/Markdown visual representation.
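To make "structural relationships" concrete, here is a minimal sketch of deterministic, AST-based extraction. It swaps in Python's built-in `ast` module as a stand-in for Tree-sitter, so it only handles Python source; the function name `extract_call_edges` and the sample snippet are hypothetical and are not part of Graphify.

```python
import ast

def extract_call_edges(source: str) -> list[tuple[str, str]]:
    """Return (caller, callee) pairs for calls made inside top-level functions."""
    tree = ast.parse(source)
    edges = []
    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            for inner in ast.walk(node):
                # Only plain-name calls such as helper(); method calls are skipped.
                if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                    edges.append((node.name, inner.func.id))
    return edges

# Hypothetical sample input, used only to illustrate the idea.
SAMPLE = '''
def load(path):
    return open(path).read()

def main():
    text = load("notes.md")
    print(text)
'''

for caller, callee in extract_call_edges(SAMPLE):
    print(f"{caller} -> {callee}")
```

Because no model is involved, the same input always yields the same edges, which is the property that makes this kind of parsing suitable for code while the LLM layer handles free text.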

Tech Stack

Python 3.10+ (Runtime environment)
Graphifyy (The main knowledge graph orchestrator CLI/API plugin)
Tree-sitter (Used for fast, AST-based deterministic parsing of code structure)
NetworkX (Library used to build and manipulate the graph's nodes and edges)
Leiden Community Detection (Algorithm that groups densely connected nodes into labeled conceptual clusters)
Ollama (gemma4:e2b) (Local AI execution environment and language model for semantic relationship extraction)

Module Explanations

detect: Analyzes the input directory, groups files by type (code, docs, etc.), and checks for sensitive variables before routing.
extract: The core parsing layer. Uses Tree-sitter for structural AST linkages in code and triggers parallel LLM subagent tasks for deep semantic connections in docs.
build: Instantiates a NetworkX multidigraph G by merging the structural and semantic node JSON outputs into nodes and edge relationships.
cluster: Runs the Leiden community detection algorithm over G to find densely connected communities of nodes.
analyze: Searches for surprising bridge connections between clusters and determines "God nodes" (highly connected central nodes).
report: Generates a single plain-text report (GRAPH_REPORT.md) with human-readable summaries and suggestions for the strongest queries that traverse communities.
export: Dumps the NetworkX object state into multiple formats: graph.json for persistent MCP traversal, GraphML / Cypher exports, and visually accessible graphs.
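The analyze step described above can be illustrated with a small sketch. This is not Graphify's code: it uses a plain edge list instead of a NetworkX multidigraph, treats "God nodes" as simply the nodes with the highest degree, and takes community labels as given rather than computing them with Leiden. All node names and labels are hypothetical.

```python
from collections import Counter

def god_nodes(edges, top_n=3):
    """Rank nodes by degree (number of incident edges), highest first."""
    degree = Counter()
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    return degree.most_common(top_n)

def bridge_edges(edges, community):
    """Return edges whose endpoints sit in different communities."""
    return [(s, t) for s, t in edges if community[s] != community[t]]

# Hypothetical toy graph: node names and community labels are made up.
EDGES = [
    ("parser", "utils"), ("loader", "utils"), ("cli", "utils"),
    ("cli", "loader"), ("design_notes", "roadmap"), ("design_notes", "parser"),
]
COMMUNITY = {
    "parser": 0, "utils": 0, "loader": 0, "cli": 0,
    "design_notes": 1, "roadmap": 1,
}

print(god_nodes(EDGES, top_n=1))       # [('utils', 3)]
print(bridge_edges(EDGES, COMMUNITY))  # [('design_notes', 'parser')]
```

Here `utils` is the hub that most other nodes touch, and the single edge from `design_notes` to `parser` is the kind of cross-cluster link the analyze step is described as surfacing.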
Supporting Files:
- ingest.py: Handles fetching remote web/URL items and local filesystem chunking.
- cache.py: Persists state between runs so that files that were already semantically processed don't waste LLM tokens.
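One common way to implement the caching idea is to key each file by a hash of its content and skip files whose hash has not changed. The sketch below shows that approach under stated assumptions: the function names, the JSON cache layout, and the choice of SHA-256 are all hypothetical, and the source does not say how cache.py actually works.

```python
import hashlib
import json
from pathlib import Path

def file_digest(path: Path) -> str:
    """Hash the file content so renames or timestamp changes alone do not matter."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def needs_extraction(path: Path, cache: dict) -> bool:
    """True when the file is new or its content changed since the last run."""
    return cache.get(str(path)) != file_digest(path)

def mark_extracted(path: Path, cache: dict) -> None:
    """Record the current content hash after a successful extraction."""
    cache[str(path)] = file_digest(path)

def load_cache(cache_file: Path) -> dict:
    """Load the cache from disk, or start empty on the first run."""
    return json.loads(cache_file.read_text()) if cache_file.exists() else {}

def save_cache(cache_file: Path, cache: dict) -> None:
    cache_file.write_text(json.dumps(cache, indent=2))
```

A run would call `needs_extraction` before sending a document to the model and `mark_extracted` afterwards, so only new or edited files cost tokens.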

Running Steps

To set up and run this project from scratch via Windows terminal:

Ensure Python 3.10+ is installed.
Install dependencies globally:
```
pip install graphifyy
graphify install
```

Optimize and Run Local AI: Create a file named Modelfile_GemmaFast:

```
FROM gemma4:e2b
SYSTEM """You are an expert, direct assistant. You must provide the final answer immediately. DO NOT use <think> tags, DO NOT output internal reasoning, and DO NOT brainstorm before answering. Output only the final response."""
```

Build and start the model:

```
ollama create gemma4:e2b -f Modelfile_GemmaFast
ollama run gemma4:e2b
```
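Before running Graphify, it can help to confirm that the local model answers over Ollama's HTTP API. The sketch below is a standalone check, not a description of how Graphify talks to Ollama. It assumes Ollama is listening on its default port 11434; the helper names `build_request` and `ask_local_model` are hypothetical.

```python
import json
import urllib.request

# Ollama's default local endpoint; adjust if the server runs elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(prompt: str, model: str = "gemma4:e2b") -> urllib.request.Request:
    """Build a non-streaming generate request for the local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def ask_local_model(prompt: str, model: str = "gemma4:e2b") -> str:
    """Send the prompt and return the generated text (requires a running server)."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=120) as response:
        return json.loads(response.read())["response"]
```

With the server running, `print(ask_local_model("List two entities in: Tree-sitter parses code."))` should return a direct answer without reasoning text if the system prompt in the Modelfile took effect.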

Execute via AI: Within your AI coding assistant, run:
```
/graphify ./raw
```

Use Cases

Codebase Onboarding: Instantly digest a large open-source repository's architecture without needing full technical documentation. It highlights "God Nodes" immediately.
Analyzing Undocumented Legacy Systems: Parse sprawling spaghetti code connections into cleanly decoupled visual graphs, giving refactoring insights.
Reducing Context Window Token Costs: Extracting precise node-to-node relationships up front, instead of pushing large amounts of raw text into the context window, narrows follow-up queries to the relevant part of the graph and makes them substantially cheaper.
Brainstorming & Research: Drop disjointed PDFs and web URLs into a single ./raw folder and see bridging concepts form connections between them.
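The token-cost use case can be sketched as follows: rather than sending the whole graph (or the raw files) to a model, send only the neighborhood of the node in question. The layout of graph.json is not documented here, so this example assumes a node-link structure with `nodes` and `links` keys; the `neighborhood` helper and the sample data are hypothetical.

```python
import json

def neighborhood(graph: dict, node_id: str) -> dict:
    """Return the node plus its direct neighbors and the edges that touch it."""
    links = [
        link for link in graph["links"]
        if node_id in (link["source"], link["target"])
    ]
    keep = {node_id}
    for link in links:
        keep.update((link["source"], link["target"]))
    nodes = [node for node in graph["nodes"] if node["id"] in keep]
    return {"nodes": nodes, "links": links}

# Hypothetical miniature graph; the real graph.json schema may differ.
GRAPH = {
    "nodes": [{"id": "cache.py"}, {"id": "extract"}, {"id": "report"}, {"id": "export"}],
    "links": [
        {"source": "extract", "target": "cache.py"},
        {"source": "report", "target": "export"},
    ],
}

# Only this smaller slice would be placed in the model's context.
context = json.dumps(neighborhood(GRAPH, "cache.py"))
print(context)
```

The slice contains two nodes and one edge instead of the full graph, which is where the savings on follow-up queries come from.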

Future Features

Live Neo4j Sync Integration: Wire the exporter output to continuously mirror changes into a local Neo4j desktop instance as the code dynamically updates on save.
Vector Embedding Retrieval: Add similarity search to the query functionality by indexing extracted attributes in a local LanceDB instance.
Custom Prompt Tuning: Expand the Modelfile into separate deterministic and creative prompt variants, chosen according to whether the input is logic source code or design .md notes.
