
fastomop/omop-cdm-graph

omop-cdm-graph

A lightweight, persistent knowledge graph of the OMOP Common Data Model v5.4 schema, built on Ladybug, an embeddable graph database with Cypher query support.

The graph encodes all 37 CDM tables as nodes, their 438 columns as typed property nodes, and all 186 foreign-key relationships as directed edges. Once built, the database can be queried to enumerate tables, resolve column metadata, and compute join paths between arbitrary table pairs.


Motivation

Generating valid SQL over an OMOP CDM instance requires knowledge of the schema's relational structure: which tables exist, what columns they carry, and how they join. Encoding this as a property graph makes the structure machine-queryable in a way that flat documentation or ad-hoc string templates do not.


Graph model

Node type    Count  Properties
OmopDomain   7      name
OmopTable    37     name, domain, description
OmopColumn   438    id, table_name, column_name, data_type, is_pk, is_fk, is_nullable, description

Edge type          Count  Properties
BELONGS_TO_DOMAIN  37     (none)
HAS_COLUMN         438    (none)
FOREIGN_KEY        186    join_type, fk_desc
JOINS_TO           186    from_column, to_column, join_sql, join_type

JOINS_TO edges are derived from FOREIGN_KEY edges and provide a direct table-to-table traversal surface, including self-referential joins (e.g. visit_detail.parent_visit_detail_id).
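
The derivation described above can be sketched in plain Python. This is a minimal illustration, not the code in builder.py: the ForeignKey dataclass, the derive_joins helper, the "LEFT" join type, the "parent" alias, and the exact join_sql format are all assumptions made for the example. Only the property names (from_column, to_column, join_sql, join_type) come from the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ForeignKey:
    """Hypothetical stand-in for one column-level foreign-key record."""
    from_table: str
    from_column: str
    to_table: str
    to_column: str
    join_type: str = "LEFT"  # assumed default; the real value may differ


def derive_joins(foreign_keys):
    """Turn column-level foreign keys into table-level join records."""
    joins = []
    for fk in foreign_keys:
        # A self-referential join needs an alias so both sides can be named.
        self_ref = fk.from_table == fk.to_table
        target = f"{fk.to_table} AS parent" if self_ref else fk.to_table
        target_ref = "parent" if self_ref else fk.to_table
        joins.append({
            "from_table": fk.from_table,
            "to_table": fk.to_table,
            "from_column": fk.from_column,
            "to_column": fk.to_column,
            "join_type": fk.join_type,
            "join_sql": (
                f"{fk.join_type} JOIN {target} "
                f"ON {fk.from_table}.{fk.from_column} = {target_ref}.{fk.to_column}"
            ),
        })
    return joins


example_joins = derive_joins([
    ForeignKey("condition_occurrence", "person_id", "person", "person_id"),
    ForeignKey("visit_detail", "parent_visit_detail_id", "visit_detail", "visit_detail_id"),
])
for join in example_joins:
    print(join["join_sql"])
```

Each foreign key yields exactly one join record, which is consistent with the matching counts of 186 for FOREIGN_KEY and JOINS_TO.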


Requirements

  • Python ≥ 3.10
  • uv

Installation

git clone <repo>
cd omop-cdm-graph
uv sync

Build

uv run python scripts/build_graph.py

Produces omop_cdm.lbug in the working directory. Re-running overwrites the existing file. To skip a rebuild if the file already exists:

uv run python scripts/build_graph.py --no-rebuild

Usage

from omop_graph.querier import OMOPGraphQuerier

q = OMOPGraphQuerier("./omop_cdm.lbug")

# All columns for a table
q.get_columns("condition_occurrence")

# Tables directly joinable from a given table
q.get_direct_joins("condition_occurrence")

# Both inbound and outbound joins
q.get_all_joins_for_table("person")

# Shortest join path between two tables (BFS, undirected)
q.get_join_path("condition_occurrence", "location")
# → ['condition_occurrence', 'person', 'location']

# SQL JOIN snippet for a direct relationship
q.get_join_sql("condition_occurrence", "visit_occurrence")

# Tables containing a given column name
q.find_tables_with_column("person_id")

# Tables in a domain
q.get_tables_in_domain("derived")

# Raw Cypher passthrough
q.run_cypher(
    "MATCH (t:OmopTable)-[:JOINS_TO]->(t2:OmopTable {name: $name}) RETURN t.name",
    parameters={"name": "concept"}
)
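
The shortest-path lookup is described above as an undirected breadth-first search. The sketch below shows that technique on a small hand-written edge list. It does not reproduce the querier's implementation: shortest_join_path and SAMPLE_EDGES are hypothetical names, and the edges are only a small subset of the CDM's foreign keys.

```python
from collections import deque

# A handful of real OMOP foreign-key relationships, as (from_table, to_table).
SAMPLE_EDGES = [
    ("condition_occurrence", "person"),
    ("condition_occurrence", "visit_occurrence"),
    ("visit_occurrence", "care_site"),
    ("care_site", "location"),
    ("person", "location"),
    ("person", "care_site"),
]


def shortest_join_path(edges, start, goal):
    """Return the shortest table path from start to goal, or None."""
    # Build an undirected adjacency map: a join can be walked either way.
    adjacency = {}
    for left, right in edges:
        adjacency.setdefault(left, set()).add(right)
        adjacency.setdefault(right, set()).add(left)
    if start not in adjacency or goal not in adjacency:
        return None

    parents = {start: None}  # doubles as the visited set
    queue = deque([start])
    while queue:
        table = queue.popleft()
        if table == goal:
            # Walk the parent links back to the start, then reverse.
            path = []
            while table is not None:
                path.append(table)
                table = parents[table]
            return path[::-1]
        for neighbour in sorted(adjacency[table]):  # sorted for determinism
            if neighbour not in parents:
                parents[neighbour] = table
                queue.append(neighbour)
    return None


print(shortest_join_path(SAMPLE_EDGES, "condition_occurrence", "location"))
# → ['condition_occurrence', 'person', 'location']
```

Because the search ignores edge direction, a path may traverse a foreign key from the referenced table back to the referencing one. That is why person can serve as a bridge between condition_occurrence and location.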

Structure

src/omop_graph/
  schema.py     — Pydantic models; all 37 tables, 438 columns, 186 joins (source of truth)
  builder.py    — Ladybug DDL and graph construction
  querier.py    — Query interface over the built graph
  cli.py        — Entry point for omop-graph-build
scripts/
  build_graph.py
tests/
  test_schema.py   — Schema completeness and integrity
  test_builder.py  — Graph construction and query correctness

Tests

uv run pytest tests/ -v

33 tests covering schema completeness, join referential integrity, graph construction, and all query methods.
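
As an illustration of what a join referential integrity check can look like, the sketch below flags foreign keys whose endpoints are not declared columns. It is an assumption about the general approach, not a copy of anything in tests/: find_dangling_foreign_keys and the tuple layout are hypothetical.

```python
def find_dangling_foreign_keys(columns, foreign_keys):
    """Return foreign keys whose source or target column is not declared.

    columns: iterable of (table_name, column_name) pairs.
    foreign_keys: iterable of (from_table, from_column, to_table, to_column).
    """
    known = set(columns)
    return [
        fk for fk in foreign_keys
        if (fk[0], fk[1]) not in known or (fk[2], fk[3]) not in known
    ]


declared_columns = [
    ("person", "person_id"),
    ("person", "location_id"),
    ("location", "location_id"),
]
declared_fks = [
    ("person", "location_id", "location", "location_id"),  # valid
    ("person", "care_site_id", "care_site", "care_site_id"),  # not declared above
]
dangling = find_dangling_foreign_keys(declared_columns, declared_fks)
print(dangling)
```

A schema passes this check when the returned list is empty.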


Schema source

All table and column definitions follow the OMOP CDM v5.4 specification. The schema is defined statically in schema.py and does not require a live CDM instance.

Contact

[email protected]
