A lightweight, persistent knowledge graph of the OMOP Common Data Model v5.4 schema, built on Ladybug — an embeddable graph database with Cypher query support.
The graph encodes all 37 CDM tables as nodes, their 438 columns as typed property nodes, and all 186 foreign-key relationships as directed edges. Once built, the database can be queried to enumerate tables, resolve column metadata, and compute join paths between arbitrary table pairs.
Generating valid SQL over an OMOP CDM instance requires knowledge of the schema's relational structure: which tables exist, what columns they carry, and how they join. Encoding this as a property graph makes the structure machine-queryable in a way that flat documentation or ad-hoc string templates do not.
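As a rough illustration of what "machine-queryable" means here, the sketch below computes a join path with a breadth-first search over an undirected table graph. It is not the project's implementation: `JOINS` is a hand-picked subset of table pairs, and `shortest_join_path` is a hypothetical helper name used only for this example.

```python
from collections import deque

# Hypothetical, hand-picked subset of table-to-table joins (not the full 186).
JOINS = [
    ("condition_occurrence", "person"),
    ("condition_occurrence", "visit_occurrence"),
    ("visit_occurrence", "care_site"),
    ("care_site", "location"),
    ("person", "location"),
]


def shortest_join_path(joins, start, goal):
    """Return the shortest list of tables linking start to goal, or None."""
    # Treat every join as undirected: a foreign key can be followed either way.
    adjacency = {}
    for left, right in joins:
        adjacency.setdefault(left, set()).add(right)
        adjacency.setdefault(right, set()).add(left)

    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        # Sorting keeps the result deterministic when several paths tie.
        for neighbour in sorted(adjacency.get(path[-1], ())):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None


print(shortest_join_path(JOINS, "condition_occurrence", "location"))
# → ['condition_occurrence', 'person', 'location']
```

Because BFS explores paths in order of length, the two-hop route through person is found before the three-hop route through visit_occurrence and care_site.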
| Node type | Count | Properties |
|---|---|---|
| OmopDomain | 7 | name |
| OmopTable | 37 | name, domain, description |
| OmopColumn | 438 | id, table_name, column_name, data_type, is_pk, is_fk, is_nullable, description |
| Edge type | Count | Properties |
|---|---|---|
| BELONGS_TO_DOMAIN | 37 | none |
| HAS_COLUMN | 438 | none |
| FOREIGN_KEY | 186 | join_type, fk_desc |
| JOINS_TO | 186 | from_column, to_column, join_sql, join_type |
JOINS_TO edges are derived from FOREIGN_KEY edges and provide a direct table-to-table traversal surface, including self-referential joins (e.g. visit_detail.parent_visit_detail_id).
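One way such a derivation could look is sketched below: a column-level foreign key is collapsed into a table-level record carrying from_column, to_column, and a join_sql string. This is a sketch under stated assumptions, not the code in builder.py. The `ForeignKey` dataclass, the `derive_join` helper, the exact join_sql format, and the `_parent` alias for self-referential joins are all hypothetical, and join_type is omitted because its values are not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ForeignKey:
    """Hypothetical stand-in for one column-level FOREIGN_KEY edge."""
    from_table: str
    from_column: str
    to_table: str
    to_column: str


def derive_join(fk: ForeignKey) -> dict:
    """Collapse a column-level foreign key into a table-level join record."""
    self_join = fk.from_table == fk.to_table
    # A self-referential join needs an alias so both sides can be told apart.
    # The "_parent" suffix is an assumption made for this sketch.
    target = f"{fk.to_table}_parent" if self_join else fk.to_table
    clause = f"JOIN {fk.to_table} AS {target}" if self_join else f"JOIN {fk.to_table}"
    return {
        "from_table": fk.from_table,
        "to_table": fk.to_table,
        "from_column": fk.from_column,
        "to_column": fk.to_column,
        "join_sql": (
            f"{clause} ON {fk.from_table}.{fk.from_column} "
            f"= {target}.{fk.to_column}"
        ),
    }


plain = derive_join(
    ForeignKey("condition_occurrence", "person_id", "person", "person_id")
)
print(plain["join_sql"])
# → JOIN person ON condition_occurrence.person_id = person.person_id

parent = derive_join(
    ForeignKey("visit_detail", "parent_visit_detail_id", "visit_detail", "visit_detail_id")
)
print(parent["join_sql"])
```

Precomputing the SQL fragment at build time means a query only has to read a property from the edge rather than reassemble the join condition from column nodes.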
- Python ≥ 3.10
- uv
git clone <repo>
cd omop-cdm_graph
uv sync
uv run python scripts/build_graph.py
Produces omop_cdm.lbug in the working directory. Re-running overwrites the existing file. To skip a rebuild if the file already exists:
uv run python scripts/build_graph.py --no-rebuild
from omop_graph.querier import OMOPGraphQuerier
q = OMOPGraphQuerier("./omop_cdm.lbug")
# All columns for a table
q.get_columns("condition_occurrence")
# Tables directly joinable from a given table
q.get_direct_joins("condition_occurrence")
# Both inbound and outbound joins
q.get_all_joins_for_table("person")
# Shortest join path between two tables (BFS, undirected)
q.get_join_path("condition_occurrence", "location")
# → ['condition_occurrence', 'person', 'location']
# SQL JOIN snippet for a direct relationship
q.get_join_sql("condition_occurrence", "visit_occurrence")
# Tables containing a given column name
q.find_tables_with_column("person_id")
# Tables in a domain
q.get_tables_in_domain("derived")
# Raw Cypher passthrough
q.run_cypher(
"MATCH (t:OmopTable)-[:JOINS_TO]->(t2:OmopTable {name: $name}) RETURN t.name",
parameters={"name": "concept"}
)
src/omop_graph/
schema.py — Pydantic models; all 37 tables, 438 columns, 186 joins (source of truth)
builder.py — Ladybug DDL and graph construction
querier.py — Query interface over the built graph
cli.py — Entry point for omop-graph-build
scripts/
build_graph.py
tests/
test_schema.py — Schema completeness and integrity
test_builder.py — Graph construction and query correctness
uv run pytest tests/ -v
33 tests covering schema completeness, join referential integrity, graph construction, and all query methods.
All table and column definitions follow the OMOP CDM v5.4 specification. The schema is defined statically in schema.py and does not require a live CDM instance.
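Because the schema is static, integrity checks such as join referential integrity can run without any database. The sketch below shows the general idea using standard-library dataclasses as a stand-in for the Pydantic models in schema.py. The `Column` and `Join` classes, the `find_dangling_joins` helper, and the sample data are hypothetical and do not reflect the project's actual model definitions or tests.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    """Hypothetical stand-in for one column definition."""
    table_name: str
    column_name: str


@dataclass(frozen=True)
class Join:
    """Hypothetical stand-in for one join definition."""
    from_table: str
    from_column: str
    to_table: str
    to_column: str


def find_dangling_joins(columns, joins):
    """Return the joins whose source or target column is not defined."""
    known = {(c.table_name, c.column_name) for c in columns}
    return [
        j for j in joins
        if (j.from_table, j.from_column) not in known
        or (j.to_table, j.to_column) not in known
    ]


# Deliberately incomplete sample: care_site is referenced but never defined.
COLUMNS = [
    Column("person", "person_id"),
    Column("person", "location_id"),
    Column("person", "care_site_id"),
    Column("location", "location_id"),
]
JOINS = [
    Join("person", "location_id", "location", "location_id"),
    Join("person", "care_site_id", "care_site", "care_site_id"),
]

dangling = find_dangling_joins(COLUMNS, JOINS)
print([f"{j.from_table}.{j.from_column} -> {j.to_table}.{j.to_column}" for j in dangling])
# → ['person.care_site_id -> care_site.care_site_id']
```

A check of this shape would pass on a complete schema by returning an empty list; the sample above is left incomplete on purpose so that the failure case is visible.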