47thtechcorner/RayCodes_Graphify

Locally Hosted Knowledge Graph Builder (Graphify Pipeline)

Project Details

This project uses Graphify to turn any local directory (documents, code, images, etc.) into an interactive, queryable knowledge graph. This setup runs Graphify's semantic extraction and structural analysis entirely offline on the local machine. Graphify scans the raw input files to detect their types, uses Tree-sitter parsers to extract structural relationships from code, and uses a fast local AI agent layer to extract concepts and entities from text. The resulting connections are saved as JSON for agentic workflows and as a user-friendly HTML/Markdown visual representation.
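As a rough illustration of the type-detection step described above, the sketch below buckets files by extension. The CATEGORIES map and the group_by_type helper are hypothetical names chosen for this example; Graphify's actual detection rules are not shown in this README and may differ.

```python
from collections import defaultdict
from pathlib import Path

# Hypothetical extension-to-category map (an assumption, not Graphify's rules).
CATEGORIES = {
    ".py": "code", ".js": "code", ".ts": "code",
    ".md": "docs", ".txt": "docs", ".pdf": "docs",
    ".png": "images", ".jpg": "images",
}


def group_by_type(root):
    """Walk `root` recursively and bucket every file by a coarse category."""
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            # Unknown extensions fall into a catch-all bucket.
            category = CATEGORIES.get(path.suffix.lower(), "other")
            groups[category].append(path)
    return dict(groups)
```

Calling group_by_type("./raw") would return a dictionary such as {"code": [...], "docs": [...]}, which a later stage could use to route code to a structural parser and text to the local model.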

Tech Stack

  • Python 3.10+ (Runtime environment)
  • Graphify (installed as the graphifyy package; the main knowledge graph orchestrator CLI/API plugin)
  • Tree-sitter (Used for fast, AST-based deterministic parsing of code structure)
  • NetworkX (Library used for creating complex network nodes and edges)
  • Leiden Community Detection (Algorithm to group closely joined nodes into labeled conceptual clusters)
  • Ollama (gemma4:e2b) (Local AI execution environment and language model for semantic relationship extraction)
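To make the NetworkX entry more concrete, here is a minimal sketch of building a multidigraph from extracted relationships and ranking the most connected nodes. The triples, build_graph, and god_nodes are illustrative assumptions; the real Graphify data model and its ranking logic are not documented in this README. Only standard NetworkX calls (MultiDiGraph, add_edge, degree) are used.

```python
import networkx as nx

# Hypothetical extraction output: (source, target, relation) triples.
edges = [
    ("app.py", "db.py", "imports"),
    ("app.py", "auth.py", "imports"),
    ("auth.py", "db.py", "calls"),
    ("README.md", "app.py", "describes"),
    ("app.py", "db.py", "calls"),  # parallel edge, preserved by a multidigraph
]


def build_graph(triples):
    """Create a directed multigraph with one edge per extracted relation."""
    G = nx.MultiDiGraph()
    for source, target, relation in triples:
        G.add_edge(source, target, relation=relation)
    return G


def god_nodes(G, top_n=1):
    """Rank nodes by total degree (in + out, counting parallel edges)."""
    ranked = sorted(G.degree(), key=lambda pair: (-pair[1], pair[0]))
    return [node for node, _ in ranked[:top_n]]


G = build_graph(edges)
print(god_nodes(G))  # ['app.py']
```

A multidigraph is used here because two files can be linked by more than one relation (for example, both "imports" and "calls"), and a plain DiGraph would collapse those into a single edge.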

Module Explanations

  • detect: Analyzes the input directory, groups files by type (code, docs, etc.), and checks for sensitive variables before routing.
  • extract: The core parsing layer. Uses Tree-sitter for structural AST linkages in code and triggers parallel LLM subagent tasks for deep semantic connections in docs.
  • build: Instantiates a NetworkX multidigraph G by merging the structural and semantic node JSON outputs into nodes and edge relationships.
  • cluster: Runs the Leiden community detection algorithm over G to find densely connected communities of nodes.
  • analyze: Searches for surprising bridge connections between clusters and determines "God nodes" (highly connected central nodes).
  • report: Generates a unified plain-text presentation map (GRAPH_REPORT.md), attaching human language summaries and suggesting the strongest queries that traverse communities.
  • export: Dumps the NetworkX object state into multiple formats: graph.json for persistent MCP traversal, GraphML / Cypher exports, and visually accessible graphs.
  • Supporting Files:
    • ingest.py: Handles fetching remote web/URL items and local filesystem chunking.
    • cache.py: Maintains state persistence so previously semantically digested files don't waste LLM tokens.
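The caching idea behind cache.py can be sketched with a content hash per file: a file is re-sent to the model only when its hash changes. This is an assumption about the general approach, not the actual contents of cache.py; file_digest, needs_extraction, and the JSON cache layout are hypothetical.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def needs_extraction(path, cache_file):
    """Return True if `path` is new or changed since the last recorded digest.

    As a side effect, the new digest is written to `cache_file`, so a second
    call on an unchanged file returns False.
    """
    cache_file = Path(cache_file)
    cache = json.loads(cache_file.read_text()) if cache_file.exists() else {}
    digest = file_digest(path)
    key = str(path)
    if cache.get(key) == digest:
        return False
    cache[key] = digest
    cache_file.write_text(json.dumps(cache, indent=2))
    return True
```

In a real pipeline the digest would more safely be recorded only after extraction succeeds, so that a failed run does not mark the file as processed.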

Running Steps

To set up and run this project from scratch in a Windows terminal:

  1. Ensure Python 3.10+ is installed.
  2. Install dependencies globally:
    pip install graphifyy
    graphify install
  3. Optimize and Run Local AI: Create a file named Modelfile_GemmaFast:
    FROM gemma4:e2b
    SYSTEM """You are an expert, direct assistant. You must provide the final answer immediately. DO NOT use <think> tags, DO NOT output internal reasoning, and DO NOT brainstorm before answering. Output only the final response."""
    Build and start the model:
    ollama create gemma4:e2b -f Modelfile_GemmaFast
    ollama run gemma4:e2b
  4. Execute via AI: Within your AI coding assistant, run:
    /graphify ./raw
    

Use Cases

  • Codebase Onboarding: Instantly digest a large open-source repository's architecture without needing full technical documentation. It highlights "God Nodes" immediately.
  • Analyzing Undocumented Legacy Systems: Parse sprawling spaghetti code connections into cleanly decoupled visual graphs, giving refactoring insights.
  • Reducing Context Window Token Costs: Extracting precise node-to-node relationships up front, instead of loading large amounts of raw text into the context window, narrows the context for follow-up queries and makes them substantially cheaper.
  • Brainstorming & Research: Drop disjointed PDFs and web URLs into a single ./raw folder and see bridging concepts emerge as connections in the graph.
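The token-cost point above can be illustrated by selecting only the nodes near a query target rather than the whole corpus. The sketch below is a plain breadth-first search over a hypothetical adjacency dictionary; it is not Graphify's query mechanism, and the neighborhood helper and file names are made up for the example.

```python
from collections import deque


def neighborhood(adjacency, start, max_hops=1):
    """Return the nodes reachable from `start` within `max_hops` edges."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in adjacency.get(node, ()):
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    return sorted(hops)


# Hypothetical directed relationships between files.
adjacency = {
    "auth.py": ["db.py", "session.py"],
    "db.py": ["config.py"],
    "session.py": [],
    "config.py": [],
    "ui.py": ["auth.py"],
}

print(neighborhood(adjacency, "auth.py", max_hops=1))
# ['auth.py', 'db.py', 'session.py']
```

Only the returned nodes would need to be summarized into the prompt for a question about auth.py, which is how a graph can keep follow-up queries small.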

Future Features

  • Live Neo4j Sync Integration: Wire the exporter output to continuously mirror changes into a local Neo4j desktop instance as the code dynamically updates on save.
  • Vector-Embedded Retrieval: Add similarity search to the query functionality by indexing extracted attributes in a local LanceDB instance.
  • Custom Prompt Tuning: Expand the Modelfile with separate deterministic and creative prompt variants, depending on whether it is parsing source code or design .md notes.
