Lumina Command API is a central backend service in the Lumina ecosystem of the ETH Library. It provides a unified API layer for executing data transformation pipelines, enrichment workflows, and other processing tasks on structured library data.
Within the broader Lumina initiative, this service enables:
- Data readiness for AI-based discovery
- Metadata enrichment and normalization
- Transformation pipelines for downstream systems
- Embedding generation and vector database upserts
- Scalable processing of large datasets via streaming and async background tasks
```text
Client (Browser / Postman / Application)
           │
           │ x-api-key (consumer key)
           ▼
┌──────────────────────────┐
│ Apigee X                 │
│ api.library.ethz.ch      │
│ • API key verification   │
│ • CORS handling          │
│ • Backend key injection  │
└──────────┬───────────────┘
           │ x-api-key (internal key)
           ▼
┌──────────────────────────┐
│ Lumina Command API       │
│ (FastAPI on Cloud Run)   │
│ • Internal key auth      │
│ • Pipeline execution     │
│ • Embedding generation   │
└──────────┬───────────────┘
           │
           ▼
   Output (JSON / CSV)
   Pinecone (vectors)
```
External consumers access the API exclusively through Apigee X. The Cloud Run backend is protected by an internal API key that Apigee injects transparently.
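The backend key check can be sketched without any framework. This is an illustration under assumptions, not the code in `app/auth.py`. The names `check_internal_key` and `UnauthorizedError` are hypothetical, and the sketch assumes the expected key comes from the `INTERNAL_API_KEY` environment variable.

```python
from __future__ import annotations

import hmac
import os


class UnauthorizedError(Exception):
    """Raised when the x-api-key value is missing or does not match (hypothetical)."""


def check_internal_key(provided_key: str | None, expected_key: str | None = None) -> None:
    """Raise unless provided_key matches the configured internal API key."""
    if expected_key is None:
        # Assumption: the server-side key lives in the INTERNAL_API_KEY env variable.
        expected_key = os.environ.get("INTERNAL_API_KEY")
    if not expected_key:
        # Fail closed: an unset server key must not let every request through.
        raise RuntimeError("INTERNAL_API_KEY is not configured")
    # compare_digest takes time independent of where the first mismatch occurs.
    if provided_key is None or not hmac.compare_digest(
        provided_key.encode(), expected_key.encode()
    ):
        raise UnauthorizedError("invalid or missing x-api-key")
```

In FastAPI, a check like this would typically live inside a dependency that reads the `x-api-key` header and turns the error into an HTTP 401 or 403 response.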
| Component | Technology |
|---|---|
| Framework | FastAPI >= 0.110.0 |
| ASGI Server (local) | Uvicorn >= 0.27.0 |
| Process manager (production) | Gunicorn >= 22.0.0 with UvicornWorker (ASGI workers) |
| Validation | Pydantic |
| Embeddings | OpenAI API (text-embedding-3-large) |
| Vector Database | Pinecone |
| Data Processing | Pandas >= 2.0.0 |
| API Gateway | Apigee X |
| Hosting | Google Cloud Run |
| Secret Management | Google Cloud Secret Manager |
General service endpoints:

| Path | Method | Description |
|---|---|---|
| `/` | GET | Service name and environment |
| `/health` | GET | Health check |
| `/version` | GET | Name, version, and environment |
Command endpoints (authenticated via `x-api-key`):

| Path | Method | Description |
|---|---|---|
| `/commands/transform-eth-udk-json` | POST | Transform ETH UDK dataset → JSON |
| `/commands/transform-eth-udk-csv` | POST | Transform ETH UDK dataset → CSV |
| `/commands/upsert-pinecone` | POST | Embed and upsert records to Pinecone |
| `/commands/upsert-pinecone-polling` | POST | Start async upsert job (returns job ID) |
| `/commands/upsert-pinecone-polling/{job_id}/status` | GET | Poll upsert job status |
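The polling endpoints split a long upsert into two parts: a start request that returns a job ID, followed by repeated status queries. A minimal client-side polling loop might look like the sketch below. It assumes, hypothetically, that the status endpoint returns JSON with a `status` field whose terminal values are `completed` and `failed`. Check the actual response schema in Swagger UI before relying on it. `fetch_status` stands in for a GET request on `/commands/upsert-pinecone-polling/{job_id}/status`.

```python
import time
from typing import Callable

# Hypothetical terminal status values; verify against the real API responses.
TERMINAL_STATES = {"completed", "failed"}


def wait_for_job(
    fetch_status: Callable[[str], dict],
    job_id: str,
    interval: float = 5.0,
    timeout: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll a job's status until it reaches a terminal state or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status(job_id)
        if status.get("status") in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} did not finish within {timeout} s")
        sleep(interval)
```

Because `fetch_status` and `sleep` are injected, the loop does not depend on any particular HTTP library and is easy to test. A fixed interval is the simplest choice. Exponential backoff would reduce the number of requests for very long jobs.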
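Form fields for `/commands/transform-eth-udk-json` and `/commands/transform-eth-udk-csv`:

| Key | Type | Description |
|---|---|---|
| `source_file` | File | ETH UDK dataset (`.json` or `.json.gz`) |
| `rootterms_file` | File | Root terms lookup (`.json` or `.json.gz`) |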
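File fields like these are sent as `multipart/form-data`. The standard-library sketch below only builds the request and does not send it. `build_multipart_request` is a hypothetical helper, and the example URL in the usage note assumes the local development server on port 8080.

```python
from __future__ import annotations

import urllib.request
import uuid
from pathlib import Path


def build_multipart_request(
    url: str, api_key: str, files: dict[str, Path], fields: dict[str, str] | None = None
) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST with an x-api-key header."""
    boundary = uuid.uuid4().hex
    parts: list[bytes] = []
    # Plain text form fields.
    for name, value in (fields or {}).items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode()
        )
    # File fields; the raw bytes are sent as-is (gzip files stay compressed).
    for name, path in files.items():
        header = (
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"; '
            f'filename="{path.name}"\r\nContent-Type: application/octet-stream\r\n\r\n'
        )
        parts.append(header.encode() + path.read_bytes() + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())  # closing delimiter
    return urllib.request.Request(
        url,
        data=b"".join(parts),
        method="POST",
        headers={
            "x-api-key": api_key,
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

For example, calling `build_multipart_request("http://127.0.0.1:8080/commands/transform-eth-udk-json", key, {"source_file": ..., "rootterms_file": ...})` and passing the result to `urllib.request.urlopen` would send the upload. With the third-party `requests` library, `requests.post(url, headers=..., files=...)` performs the same encoding.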
Form fields for the Pinecone upsert endpoints:

| Key | Type | Description |
|---|---|---|
| `file` | File | CSV file (`.csv` or `.csv.gz`) |
| `index_name` | string | Pinecone index name |
| `namespace` | string | Pinecone namespace |
| `embedding_fields` | string | JSON array of field names to embed, e.g. `'["descriptor_eng"]'` |
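Because `embedding_fields` is a JSON array serialized into a string, a client can build it with `json.dumps`. As a cheap pre-flight check, the client can also confirm that every listed field actually appears in the CSV header before uploading. The helper name `build_upsert_fields` is hypothetical, as are the index and namespace values used in the test.

```python
import csv
import gzip
import json
from pathlib import Path


def build_upsert_fields(
    csv_path: Path, index_name: str, namespace: str, embedding_fields: list[str]
) -> dict[str, str]:
    """Return the non-file form fields after checking the CSV contains the fields to embed."""
    # Read .csv.gz transparently so the same check works on compressed uploads.
    opener = gzip.open if csv_path.suffix == ".gz" else open
    with opener(csv_path, "rt", encoding="utf-8", newline="") as fh:
        header = next(csv.reader(fh), [])
    missing = [f for f in embedding_fields if f not in header]
    if missing:
        raise ValueError(f"CSV is missing embedding fields: {missing}")
    return {
        "index_name": index_name,
        "namespace": namespace,
        # Serialized as a JSON array string, e.g. '["descriptor_eng"]'.
        "embedding_fields": json.dumps(embedding_fields),
    }
```

The Pinecone index presumably needs a dimension that matches the embedding model. By default, `text-embedding-3-large` returns 3072-dimensional vectors.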
Due to Cloud Run request size limits (~32 MB), gzip compression is supported and recommended:

- `.json` / `.csv` → supported
- `.json.gz` / `.csv.gz` → recommended for large files
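Compressing a file before upload helps large datasets stay under the request limit, which applies to the request body as sent. The sketch below uses only the standard library and writes `data.json.gz` next to `data.json`. The name `gzip_file` is a hypothetical helper.

```python
import gzip
import shutil
from pathlib import Path


def gzip_file(src: Path) -> Path:
    """Compress src to src + '.gz' (data.json -> data.json.gz) and return the new path."""
    dst = src.with_name(src.name + ".gz")
    # Stream in chunks so large files never have to fit in memory.
    with src.open("rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst
```

From a shell, `gzip -k data.json` produces the same file and keeps the original.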
Prerequisites:

- Python 3.10+
- Git
- Google Cloud SDK (`gcloud`)
- VS Code (recommended)
```bash
git clone https://github.com/eth-library/lumina-command-api.git
cd lumina-command-api
python -m venv .venv
```

Activate:

```bash
# Windows
.venv\Scripts\activate
# macOS / Linux
source .venv/bin/activate
```

```bash
pip install -r requirements.txt
```

Copy the template and fill in your keys:
```bash
cp .env.example .env
```

Then edit `.env` with your actual values:

```bash
PYTHONPATH=.
OPENAI_API_KEY=sk-...
PINECONE_API_KEY=pcsk_...
INTERNAL_API_KEY=your-secret-key-here
```

The `.env` file is listed in `.gitignore` and must never be committed.
| Variable | Description |
|---|---|
| `OPENAI_API_KEY` | API key for OpenAI embedding generation |
| `PINECONE_API_KEY` | API key for the Pinecone vector database |
| `INTERNAL_API_KEY` | Internal API key for `/commands/*` endpoint authentication |
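The sketch below shows one way these variables could be loaded at startup, failing fast when one is missing. It is not the actual `app/config.py`, and `Settings` and `load_settings` are hypothetical names. On Cloud Run the values arrive as environment variables injected from Secret Manager. Passing an explicit mapping makes the function testable without touching `os.environ`.

```python
import os
from dataclasses import dataclass
from typing import Mapping

REQUIRED_VARS = ("OPENAI_API_KEY", "PINECONE_API_KEY", "INTERNAL_API_KEY")


@dataclass(frozen=True)
class Settings:
    openai_api_key: str
    pinecone_api_key: str
    internal_api_key: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the required keys, raising one error that lists every missing variable."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    return Settings(
        openai_api_key=env["OPENAI_API_KEY"],
        pinecone_api_key=env["PINECONE_API_KEY"],
        internal_api_key=env["INTERNAL_API_KEY"],
    )
```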
- Open the project folder
- `Ctrl + Shift + P` → Python: Select Interpreter
- Select the `.venv` interpreter
```bash
uvicorn app.main:app --reload --port 8080
```

- Swagger UI: http://127.0.0.1:8080/docs
- Health check: http://127.0.0.1:8080/health
```bash
gcloud auth login
gcloud projects create your-project-id
gcloud config set project your-project-id
```

Enable the required APIs:

```bash
gcloud services enable run.googleapis.com
gcloud services enable secretmanager.googleapis.com
```

Set the default Cloud Run region:

```bash
gcloud config set run/region europe-west6
```

The Cloud Run service account needs permission to read secrets:

```bash
gcloud projects add-iam-policy-binding your-project-id \
  --member="serviceAccount:[email protected]" \
  --role="roles/secretmanager.secretAccessor"
```

This is a one-time setup.
The `setup-secrets.sh` script reads keys from your local `.env` and creates or updates them in Google Cloud Secret Manager:

```bash
./setup-secrets.sh
```

The script handles `OPENAI_API_KEY`, `PINECONE_API_KEY`, and `INTERNAL_API_KEY`.
```bash
./deploy.sh
```

This script:
- Builds a container image from source via Google Cloud Buildpacks
- Pushes the image to Google Container Registry
- Creates a new Cloud Run revision and routes traffic to it
- Injects secrets from Secret Manager as environment variables
To read the service logs:

```bash
gcloud run services logs read lumina-command-api --region europe-west6
```

To update a secret:

- Update the key in your `.env` file
- Run `./setup-secrets.sh` (creates a new secret version)
- Run `./deploy.sh` (deploys a new revision with the updated secret)
```text
lumina-command-api/
├── app/
│   ├── main.py                  # FastAPI application entry point
│   ├── config.py                # Environment variable configuration
│   ├── auth.py                  # API key authentication dependency
│   ├── routers/
│   │   └── commands.py          # All /commands/* endpoint definitions
│   ├── services/
│   │   ├── transform_eth_udk.py # ETH UDK transformation pipeline orchestrator
│   │   └── pinecone_upsert.py   # Embedding generation and Pinecone upsert
│   └── transformers/
│       └── eth_udk/
│           ├── step1_check_unique_descriptors.py
│           ├── step2_check_non_dictionary_variants.py
│           ├── step3_validate_json_structure.py
│           ├── step4_merge_variants_by_language.py
│           ├── step5_add_broader_terms_names.py
│           ├── step6_add_related_terms_names.py
│           ├── step7a_simplify_json.py
│           ├── step7b_add_level.py
│           ├── step7c_clean_transaction_date.py
│           ├── step7d_add_cat_root_term.py
│           ├── step7e_propagate_root_terms.py
│           └── step8_json_to_csv.py
├── test_data/                   # Sample datasets for testing
├── deploy.sh                    # Cloud Run deployment script
├── setup-secrets.sh             # Secret Manager setup script
├── Procfile                     # Production process definition (Gunicorn)
├── requirements.txt             # Python dependencies
├── openapi.json                 # OpenAPI 3.0.3 spec (for Apigee)
└── LICENSE                      # Apache 2.0
```
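The numbered modules under `app/transformers/eth_udk/` suggest that `transform_eth_udk.py` runs the steps in a fixed order. The sketch below illustrates only that sequential orchestration pattern, under assumptions. The two step functions are toys, not the real step modules. The idea that every step takes and returns a list of records is also an assumption.

```python
from typing import Any, Callable, Iterable

Record = dict[str, Any]
Step = Callable[[list[Record]], list[Record]]


def run_pipeline(records: list[Record], steps: Iterable[tuple[str, Step]]) -> list[Record]:
    """Apply each named step in order, feeding each step's output into the next."""
    for _name, step in steps:
        records = step(records)
    return records


def check_unique_ids(records: list[Record]) -> list[Record]:
    """Toy validation step: raise on a repeated 'id', otherwise pass records through."""
    ids = [r["id"] for r in records]
    if len(ids) != len(set(ids)):
        raise ValueError("duplicate ids")
    return records


def add_label_upper(records: list[Record]) -> list[Record]:
    """Toy enrichment step: derive a new field without mutating the input records."""
    return [{**r, "label_upper": r["label"].upper()} for r in records]
```

With this design, validation steps (the `check` and `validate` modules) can stop the run early by raising, and enrichment steps build on the previous step's output.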
This project is licensed under the Apache License 2.0.