Clean Data Export API

Clean Data Export API turns incomplete FieldOps Desk job exports into files a team can review: clean CSV, matching JSON, rejected rows, run history, and a short run summary.

The project is local by design. It uses fictional data, reads only jobs and job_updates, and never connects to a real customer system.

The demo export for 2026-06-01 through 2026-06-05 returns:

source_count=9
clean_count=3
duplicate_count=1
rejected_count=6
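
The counts reconcile if rejected_count already includes the duplicate row. That reading is an assumption here, not something the run summary states. A quick check under that assumption:

```python
# Demo counts copied from the run summary above.
counts = {
    "source_count": 9,
    "clean_count": 3,
    "duplicate_count": 1,
    "rejected_count": 6,
}

# Assumption: duplicates are counted inside rejected_count.
invalid_count = counts["rejected_count"] - counts["duplicate_count"]
reconciles = (
    counts["clean_count"] + counts["rejected_count"] == counts["source_count"]
)

print(f"invalid_count={invalid_count} reconciles={reconciles}")
```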

What it does

This is one export workflow, not a broad backend platform. It:

  • pulls a limited set of source records.
  • maps source fields into clean output columns.
  • keeps invalid records out of the clean export without hiding them.
  • removes later duplicate records for the same job_id.
  • writes files that can be reviewed in a spreadsheet or handed to another system.
  • records run history in SQLite.
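
The validation and duplicate rules above can be sketched in a few lines. This is a minimal illustration, not the code in validation.py: the required field names and the reason codes are hypothetical, and the sketch folds validation and duplicate removal into one pass for brevity.

```python
# Hypothetical required fields; the real list lives in the project's rules.
REQUIRED_FIELDS = ("job_id", "customer_name", "scheduled_date")


def split_rows(rows):
    """Return (clean, rejected); the first valid row per job_id wins."""
    clean, rejected, seen_job_ids = [], [], set()
    for row in rows:
        missing = [field for field in REQUIRED_FIELDS if not row.get(field)]
        if missing:
            # Invalid rows stay visible with a reason code (codes are made up).
            rejected.append({**row, "reason": f"missing_{missing[0]}"})
        elif row["job_id"] in seen_job_ids:
            # A later row for a job_id that was already accepted.
            rejected.append({**row, "reason": "duplicate_job_id"})
        else:
            seen_job_ids.add(row["job_id"])
            clean.append(row)
    return clean, rejected


sample_rows = [
    {"job_id": "J-1", "customer_name": "Acme", "scheduled_date": "2026-06-01"},
    {"job_id": "J-2", "customer_name": "", "scheduled_date": "2026-06-02"},
    {"job_id": "J-1", "customer_name": "Acme", "scheduled_date": "2026-06-03"},
]
clean_rows, rejected_rows = split_rows(sample_rows)
print(len(clean_rows), [row["reason"] for row in rejected_rows])
```

Because an invalid row never enters seen_job_ids, it cannot shadow a later valid row with the same job_id.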

Scenario

FieldOps Desk is a fictional operations tool for field jobs. Its export is limited, and manual spreadsheet edits have left missing fields and duplicate rows.

The local workflow produces:

  • a clean job export for review.
  • a JSON copy with the same accepted records.
  • a rejected-row file that explains what failed.
  • a short run summary with counts and file paths.

Workflow

sequenceDiagram
  actor User
  participant Entry as CLI or FastAPI
  participant Service as Export service
  participant Source as FieldOps fixture
  participant Rules as Mapping and validation
  participant Reports as Output files
  participant History as SQLite run history

  User->>Entry: Request jobs export
  Entry->>Service: Submit date range
  Service->>Source: Fetch paginated jobs
  Source-->>Service: Return source jobs
  Service->>Source: Fetch related job updates
  Source-->>Service: Return job updates
  Service->>Rules: Map fields and validate rows
  Rules-->>Service: Return clean rows and rejected rows
  Service->>Rules: Remove later duplicates
  Rules-->>Service: Return final clean rows and duplicate rejects
  Service->>Reports: Write CSV, JSON, rejected rows, and summary
  Service->>History: Store run counts and output paths
  Service-->>Entry: Return export summary
  Entry-->>User: Print or return summary

The FastAPI endpoint and CLI use the same export service, so both entry points produce the same results.
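
One way to picture that shared-service arrangement is the sketch below. Plain functions stand in for the Typer command and the FastAPI handler so the example runs without either framework. ExportSummary, run_export, and the created_on field are hypothetical names, not the project's actual contracts.

```python
from dataclasses import asdict, dataclass
from datetime import date


@dataclass(frozen=True)
class ExportSummary:
    from_date: str
    to_date: str
    source_count: int


def run_export(from_date, to_date, source_jobs):
    """Shared workflow: both entry points call this and add no logic."""
    in_range = [
        job
        for job in source_jobs
        if from_date <= date.fromisoformat(job["created_on"]) <= to_date
    ]
    return ExportSummary(from_date.isoformat(), to_date.isoformat(), len(in_range))


def cli_entry(argv, source_jobs):
    # Stand-in for a CLI command: parse arguments, print-style result.
    start, end = (date.fromisoformat(value) for value in argv)
    return f"source_count={run_export(start, end, source_jobs).source_count}"


def api_entry(payload, source_jobs):
    # Stand-in for an HTTP handler: parse the JSON body, return a dict.
    start = date.fromisoformat(payload["from_date"])
    end = date.fromisoformat(payload["to_date"])
    return asdict(run_export(start, end, source_jobs))


SAMPLE_JOBS = [
    {"job_id": "J-1", "created_on": "2026-06-01"},
    {"job_id": "J-2", "created_on": "2026-06-05"},
    {"job_id": "J-3", "created_on": "2026-06-09"},
]
print(cli_entry(["2026-06-01", "2026-06-05"], SAMPLE_JOBS))
print(api_entry({"from_date": "2026-06-01", "to_date": "2026-06-05"}, SAMPLE_JOBS))
```

Keeping the entry points this thin means a change to the rules lands in one place and reaches both.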

Outputs

outputs/
  clean_jobs.csv
  clean_jobs.json
  rejected_jobs.csv
  run_summary.md

clean_jobs.csv is the spreadsheet export. clean_jobs.json contains the same accepted records in JSON form. rejected_jobs.csv keeps invalid and duplicate records visible with reason codes. run_summary.md records the run counts and file paths.
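
As a rough sketch of how those four files could be written with the standard library alone: the file names come from the list above, while write_outputs, the column names, and the summary layout are assumptions for illustration and may differ from reports.py.

```python
import csv
import json
import tempfile
from pathlib import Path

FILE_NAMES = ("clean_jobs.csv", "clean_jobs.json", "rejected_jobs.csv", "run_summary.md")


def _write_csv(path, rows):
    # Use the union of keys, in first-seen order, as the header row.
    fieldnames = list(dict.fromkeys(key for row in rows for key in row))
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


def write_outputs(output_dir, clean_rows, rejected_rows):
    """Write the four output files and return their paths by file name."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    paths = {name: output_dir / name for name in FILE_NAMES}

    _write_csv(paths["clean_jobs.csv"], clean_rows)
    paths["clean_jobs.json"].write_text(json.dumps(clean_rows, indent=2), encoding="utf-8")
    _write_csv(paths["rejected_jobs.csv"], rejected_rows)

    # Hypothetical summary layout: counts first, then one line per file.
    lines = [f"clean_count={len(clean_rows)}", f"rejected_count={len(rejected_rows)}"]
    lines += [f"{name}: {path}" for name, path in paths.items()]
    paths["run_summary.md"].write_text("\n".join(lines) + "\n", encoding="utf-8")
    return paths


with tempfile.TemporaryDirectory() as tmp:
    written = write_outputs(
        tmp,
        clean_rows=[{"job_id": "J-1", "status": "done"}],
        rejected_rows=[{"job_id": "J-2", "status": "", "reason": "missing_status"}],
    )
    print(sorted(path.name for path in written.values()))
```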

Project structure

clean-data-export-api/
├── README.md
├── pyproject.toml
├── uv.lock
├── src/clean_data_export_api/
│   ├── app.py              # FastAPI entry point
│   ├── cli.py              # Typer CLI
│   ├── config.py           # local runtime paths
│   ├── models.py           # Pydantic contracts
│   ├── source_api.py       # fixture-backed source API
│   ├── export_service.py   # shared export workflow
│   ├── mapping.py          # source-to-output mapping
│   ├── validation.py       # required-field and duplicate rules
│   ├── repository.py       # SQLite run history
│   └── reports.py          # CSV, JSON, and summary writers
├── sample_data/
│   ├── source_jobs.json
│   └── source_job_updates.json
├── outputs/
│   ├── clean_jobs.csv
│   ├── clean_jobs.json
│   ├── rejected_jobs.csv
│   └── run_summary.md
├── docs/
│   ├── PRD.md
│   ├── ARCHITECTURE.md
│   ├── DELIVERY.md
│   └── adr/
│       └── 0001-project-scope.md
└── tests/

Run it

Run the CLI export:

uv sync
uv run clean-data-export-api export jobs --from-date 2026-06-01 --to-date 2026-06-05

Serve the local API:

uv run clean-data-export-api serve

Request an export:

POST /exports/jobs
Content-Type: application/json

{
  "from_date": "2026-06-01",
  "to_date": "2026-06-05"
}
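
For illustration, the same request can be built from Python with the standard library. The host and port below are assumptions (the address the serve command binds to is not stated here), and request_export assumes the endpoint answers with a JSON body.

```python
import json
from urllib.request import Request, urlopen

# Assumption: adjust to wherever the local server actually listens.
BASE_URL = "http://127.0.0.1:8000"


def build_export_request(from_date, to_date, base_url=BASE_URL):
    """Build the POST /exports/jobs request without sending it."""
    body = json.dumps({"from_date": from_date, "to_date": to_date}).encode("utf-8")
    return Request(
        f"{base_url}/exports/jobs",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def request_export(from_date, to_date):
    # Sends the request; only works while the local API is running.
    with urlopen(build_export_request(from_date, to_date), timeout=10) as response:
        return json.load(response)


request = build_export_request("2026-06-01", "2026-06-05")
print(request.get_method(), request.full_url)
```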

The API uses the configured local output directory. The CLI also supports explicit --output-dir, --database-path, and --sample-data-dir options for local runs.
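
To show what a run-history record at a chosen database path might look like, here is a minimal sqlite3 sketch. The export_runs table, its columns, and record_run are hypothetical; the project's repository.py may use a different schema.

```python
import sqlite3
from contextlib import closing

# Hypothetical schema for one row per export run.
SCHEMA = """
CREATE TABLE IF NOT EXISTS export_runs (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    from_date TEXT NOT NULL,
    to_date TEXT NOT NULL,
    source_count INTEGER NOT NULL,
    clean_count INTEGER NOT NULL,
    rejected_count INTEGER NOT NULL,
    output_dir TEXT NOT NULL,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
)
"""


def record_run(database_path, from_date, to_date, counts, output_dir):
    """Insert one run row and return its id."""
    with closing(sqlite3.connect(database_path)) as conn:
        with conn:  # commits on success, rolls back on error
            conn.execute(SCHEMA)
            cursor = conn.execute(
                "INSERT INTO export_runs "
                "(from_date, to_date, source_count, clean_count, rejected_count, output_dir) "
                "VALUES (?, ?, ?, ?, ?, ?)",
                (
                    from_date,
                    to_date,
                    counts["source_count"],
                    counts["clean_count"],
                    counts["rejected_count"],
                    str(output_dir),
                ),
            )
            return cursor.lastrowid


# ":memory:" keeps the demo off disk; a real run would pass a file path.
run_id = record_run(
    ":memory:",
    "2026-06-01",
    "2026-06-05",
    {"source_count": 9, "clean_count": 3, "rejected_count": 6},
    "outputs",
)
print(run_id)
```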

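Documentation

Project documents are in docs/: PRD.md, ARCHITECTURE.md, DELIVERY.md, and adr/0001-project-scope.md.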

Safety and limits

This project does not use real credentials, scraping, login bypass, paid APIs, or customer data. All sample data is fictional.

It does not claim readiness for live operations, guaranteed business outcomes, or advanced security guarantees, and it does not claim support for any real vendor API until that API has been reviewed.
