A generic GitHub org knowledge graph — ingests any org's repositories into Neo4j. Documentation is the primary source of understanding; code files (go.mod, Terraform, Makefile, Helm, docker-compose) are secondary.
All classification knowledge lives in config/tech_map.yaml.
The ingestion engine has zero hardcoded org, repo, or technology names.
# 1. Copy and populate the env file
cp .env.example .env
# Set INGEST_ORG=your-github-org (and optionally GITHUB_TOKEN)
# 2. Start Neo4j and run ingestion
./run.sh up
# 3. Query the graph
./run.sh query

Neo4j Browser is available at http://localhost:7474 (credentials: neo4j / password).
# Ingest an entire org (top N by stars)
./run.sh ingest -- --org myorg --top 20
# Ingest specific repos
./run.sh ingest -- --repos owner/repo1,owner/repo2
# Filter by name glob
./run.sh ingest -- --org myorg --filter "my-service-*"

Ingestion is incremental, so it is safe to re-run. Existing nodes are merged, not duplicated.
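The merge-not-duplicate behaviour can be illustrated with a minimal sketch. This is not the project's ingestion code: `merge_node` uses a plain dictionary as a stand-in for the graph, and `build_merge_query` shows the kind of Cypher `MERGE` statement that gives this behaviour in Neo4j. The function names and the `full_name` key property are assumptions made for illustration.

```python
from typing import Dict, Tuple

# Hypothetical in-memory stand-in for the graph: nodes keyed by (label, key).
Graph = Dict[Tuple[str, str], Dict[str, str]]


def merge_node(graph: Graph, label: str, key: str, props: Dict[str, str]) -> Dict[str, str]:
    """Create the node if it is absent, otherwise update its properties in place."""
    node = graph.setdefault((label, key), {})
    node.update(props)
    return node


def build_merge_query(label: str, key_prop: str) -> str:
    """Return a parameterised Cypher MERGE that matches on one identifying property."""
    # MERGE matches an existing node or creates it; SET += updates its properties.
    return f"MERGE (n:{label} {{{key_prop}: $key}}) SET n += $props"


graph: Graph = {}
merge_node(graph, "Repository", "myorg/repo1", {"stars": "10"})
merge_node(graph, "Repository", "myorg/repo1", {"stars": "12"})  # re-run: same node
print(len(graph))
print(build_merge_query("Repository", "full_name"))
```

Because the node is looked up by its identifying key before any write, running the same ingestion twice updates properties instead of adding a second node.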
Edit config/tech_map.yaml to teach the engine about your technology stack:
| Section | Controls |
|---|---|
| `patterns` | Architectural pattern detection from documentation prose |
| `prose_keywords` | Technology mentions detected in README / docs |
| `go_modules` | Go direct-dependency classification |
| `terraform_providers` / `terraform_resources` | Terraform technology detection |
| `compose_services` | docker-compose service classification |
| `makefile_tools` | Tool usage detected in Makefiles |
| `helm_charts` | Helm chart name classification |
| `component_name_kinds` | Map repo/binary names to component kinds |
| `component_subdirs` | Map cmd/ subdirectory names to component kinds |
| `parsers` | Enable/disable optional file parsers (openapi, ocm_model_dsl) |
No code changes are needed — only edit the YAML.
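To make the config-driven idea concrete, here is a minimal sketch of keyword detection driven entirely by data. The shape of the `prose_keywords` section below (keyword mapped to `technology` and `category`) is an assumption for illustration; the real schema of config/tech_map.yaml may differ, and `detect_technologies` is a hypothetical helper, not a function from the engine.

```python
import re
from typing import Dict, List

# Hypothetical shape for a `prose_keywords` section, written as a dict so the
# example needs no YAML parser. The real config/tech_map.yaml may differ.
TECH_MAP: Dict[str, Dict[str, Dict[str, str]]] = {
    "prose_keywords": {
        "kafka": {"technology": "Apache Kafka", "category": "messaging"},
        "postgres": {"technology": "PostgreSQL", "category": "database"},
    }
}


def detect_technologies(text: str, tech_map: Dict) -> List[Dict[str, str]]:
    """Return the config entries whose keyword appears as a whole word in text."""
    found = []
    for keyword, entry in tech_map["prose_keywords"].items():
        # Whole-word, case-insensitive match; the keyword is escaped so it is
        # treated as literal text rather than as a regular expression.
        if re.search(rf"\b{re.escape(keyword)}\b", text, flags=re.IGNORECASE):
            found.append(entry)
    return found


readme = "Events are published to Kafka and stored in Postgres."
for hit in detect_technologies(readme, TECH_MAP):
    print(hit["technology"], hit["category"])
```

The function contains no technology names of its own, so supporting a new technology in this sketch means adding one entry to the mapping.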
Organisation -[:HAS_REPO]----------> Repository
Repository -[:CONTAINS]----------> Component
Repository -[:INTRODUCES_CONCEPT]-> Concept
Repository -[:HAS_DOCUMENT]-------> Document
Repository -[:REFERENCES]---------> Repository
Repository -[:DEFINES_SERVICE]----> ApiService
Component -[:USES_TECHNOLOGY]----> Technology
Component -[:FOLLOWS_PATTERN]----> Pattern
Component -[:EXPOSES_API]--------> ApiContract
Component -[:DEPENDS_ON]---------> Component
ApiContract -[:HAS_ENDPOINT]-------> HttpEndpoint
ApiService -[:HAS_TYPE|HAS_RESOURCE|HAS_ENUM]-> ApiResource
ApiResource -[:HAS_FIELD]----------> ApiField
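Queries over this schema walk one or more of the relationships above. The sketch below encodes a subset of the listed relationships as data and builds a Cypher `MATCH` pattern from a sequence of labels. The helper `cypher_path` and the variable names `n0`, `n1`, ... are assumptions for illustration, and no node property names are used because the schema above does not list them.

```python
from typing import List, Optional, Tuple

# A subset of the relationships listed above, as (source, relationship, target).
SCHEMA: List[Tuple[str, str, str]] = [
    ("Organisation", "HAS_REPO", "Repository"),
    ("Repository", "CONTAINS", "Component"),
    ("Component", "USES_TECHNOLOGY", "Technology"),
    ("Component", "FOLLOWS_PATTERN", "Pattern"),
    ("Component", "DEPENDS_ON", "Component"),
]


def find_relationship(src: str, dst: str) -> Optional[str]:
    """Return the first relationship type that connects src to dst, if any."""
    return next((rel for s, rel, t in SCHEMA if s == src and t == dst), None)


def cypher_path(labels: List[str]) -> str:
    """Build a MATCH pattern that walks the given labels along schema edges."""
    parts = [f"(n0:{labels[0]})"]
    for i, (src, dst) in enumerate(zip(labels, labels[1:]), start=1):
        rel = find_relationship(src, dst)
        if rel is None:
            raise ValueError(f"no relationship from {src} to {dst} in SCHEMA")
        parts.append(f"-[:{rel}]->(n{i}:{dst})")
    return "MATCH " + "".join(parts)


print(cypher_path(["Repository", "Component", "Technology"]))
```

Appending a clause such as `RETURN n0, n2` to the generated pattern gives a query that could be pasted into Neo4j Browser.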
Run `./run.sh query`, then type `list` to see all available queries, including:

- `overview`: repos, languages, component counts
- `tech-stack`: every technology and how widely it is used
- `shared-tech`: technologies shared across multiple components
- `patterns`: architectural patterns detected from documentation
- `concepts`: domain concepts extracted from documentation
- `communications`: protocols and message buses
- `databases`: data stores per component
- `auth`: authentication and authorisation technologies
- `platform-arch`: cross-repo component dependency graph
- `cross-references`: which repos reference each other in docs
- `endpoints`: all HTTP API endpoints
| Service | Purpose |
|---|---|
| `neo4j` | Graph database (persisted in a Docker volume) |
| `ingest` | Clones repos, extracts knowledge, writes graph |
| `query` | Interactive Cypher REPL + named queries |
- Docker + Docker Compose
- GitHub token recommended (avoids API rate limits): set `GITHUB_TOKEN` in `.env`