Data Commons is an open-source semantic graph database for modeling, querying, and analyzing interconnected data.
Data Commons powers datacommons.org, Google's open knowledge graph that connects public data across domains like demographics, economics, health, and education.
This guide covers setting up Data Commons in Google Cloud Platform (GCP), defining schemas in JSON-LD, adding data via the command-line interface, and querying relationships.
Before you begin, make sure the following tools are installed and the following cloud resources exist:
- Python 3.11 or higher
- uv (Python project manager)
- A Google Cloud Platform (GCP) project with Cloud Spanner enabled
- A Cloud Spanner instance and database (using Google Standard SQL) for storing the knowledge graph
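The local part of this checklist can be verified with a short script. The following is a minimal sketch, not part of Data Commons: the function name `check_prerequisites` is hypothetical, and `terraform` is included only because the deployment steps later in this guide invoke it. It does not check the GCP project or the Cloud Spanner resources.

```python
import shutil
import sys

# Assumptions for illustration: the minimum Python version comes from the
# list above; "terraform" is added because later steps call it.
REQUIRED_PYTHON = (3, 11)
REQUIRED_TOOLS = ("uv", "terraform")


def check_prerequisites(tools=REQUIRED_TOOLS, min_python=REQUIRED_PYTHON):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    # Compare only (major, minor) against the required minimum.
    if sys.version_info[:2] < tuple(min_python):
        problems.append(
            f"Python {min_python[0]}.{min_python[1]} or higher is required, "
            f"found {sys.version_info.major}.{sys.version_info.minor}"
        )
    # shutil.which returns None when the executable is not on PATH.
    for tool in tools:
        if shutil.which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print(problem)
```

Running the script prints one line per missing prerequisite and nothing when all local checks pass.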
Use the CLI to scaffold a Terraform deployment directory:
git clone https://github.com/datacommonsorg/datacommons
cd datacommons/
uv run datacommons admin init
The command will prompt for:
- GCP project ID
- namespace
- Data Commons API key
It then creates a new folder with main.tf, terraform.tfvars, and a deployment README.md.
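Before moving on to Terraform, it can help to confirm that the scaffold step produced the three files named above. This is a small sketch under stated assumptions: the helper `missing_deployment_files` is hypothetical, and the folder path is whatever the init command created, which this guide does not specify.

```python
from pathlib import Path

# File names taken from the description of the generated folder above.
EXPECTED_FILES = ("main.tf", "terraform.tfvars", "README.md")


def missing_deployment_files(folder):
    """Return the expected files that are absent from the given folder."""
    folder = Path(folder)
    return [name for name in EXPECTED_FILES if not (folder / name).is_file()]


if __name__ == "__main__":
    import sys

    # Pass the generated folder as the first argument; defaults to the
    # current directory.
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    missing = missing_deployment_files(target)
    if missing:
        print("Missing files: " + ", ".join(missing))
    else:
        print("All expected deployment files are present.")
```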
From the generated folder, initialize Terraform, review the plan, and apply it:
terraform init
terraform plan
terraform apply
For the full infrastructure module and complete variable reference, see the detailed GCP Infrastructure Guide.
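For repeatable deployments, the three Terraform steps can be scripted. The sketch below is one possible variation, not the project's own tooling: it saves the plan to a file (the name `tfplan` is an assumption) and applies that exact plan, which differs from the plain `terraform apply` above in that Terraform does not ask for interactive approval when given a saved plan. The helper names `terraform_commands` and `deploy` are hypothetical.

```python
import subprocess


def terraform_commands(plan_file="tfplan"):
    """Build the init, plan, and apply commands as argument lists."""
    return [
        ["terraform", "init"],
        # -out writes the reviewed plan to a file.
        ["terraform", "plan", f"-out={plan_file}"],
        # Applying a saved plan file executes exactly what was planned.
        ["terraform", "apply", plan_file],
    ]


def deploy(folder, runner=subprocess.run):
    """Run each command inside the generated folder, stopping on failure."""
    for command in terraform_commands():
        # check=True raises CalledProcessError if a step exits non-zero,
        # so later steps never run after a failed one.
        runner(command, cwd=folder, check=True)
```

Calling `deploy("path/to/generated/folder")` runs the steps in order. Review the plan output before relying on this in automation, since the saved plan is applied without a confirmation prompt.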