
Antoine Gélin

Data Engineer · Modern Data Stack · Python · dbt · Spark · AWS · GCP

Website · LinkedIn · Email

Python · dbt · Apache Spark · AWS · Google Cloud · Terraform

I build data platforms that are easier to trust, operate, and extend.

My background is in Java/Spark production systems on high-volume banking platforms. I now focus on Modern Data Stack architectures: Python ingestion, raw storage, dbt transformations, cloud warehouses, BI layers, data quality, orchestration, and clean handover to data teams.

What I Like To Solve

| Problem | What I bring |
| --- | --- |
| Fragile Python or SQL ETL | Modular ELT design with clear responsibilities |
| Slow or risky batch pipelines | Incremental strategies, raw replay, validation, monitoring |
| Business logic spread everywhere | dbt layers, tests, documentation, governed marts |
| BI blocked by data debt | Analytics-ready models and self-service foundations |
| Cloud migration or platform rebuild | Practical architecture, delivery focus, team handover |
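
As an illustration of the "incremental strategies" and "validation" entries above, here is a minimal Python sketch of a watermark-based incremental load with a basic validation pass. It is not taken from any of the projects described here: the field names `id` and `updated_at` and all three function names are hypothetical, chosen only for the example.

```python
from __future__ import annotations

from datetime import datetime


def select_new_rows(rows: list[dict], watermark: datetime | None) -> list[dict]:
    """Keep only rows updated strictly after the last loaded watermark."""
    if watermark is None:
        # First run: nothing has been loaded yet, so take everything.
        return list(rows)
    return [row for row in rows if row["updated_at"] > watermark]


def validate_rows(rows: list[dict]) -> list[str]:
    """Return human-readable errors for missing fields and duplicate keys."""
    errors: list[str] = []
    seen: set[tuple] = set()
    for index, row in enumerate(rows):
        missing = [name for name in ("id", "updated_at") if row.get(name) is None]
        if missing:
            errors.append(f"row {index}: missing {', '.join(missing)}")
            continue
        key = (row["id"], row["updated_at"])
        if key in seen:
            errors.append(f"row {index}: duplicate key for id {row['id']}")
        seen.add(key)
    return errors


def next_watermark(rows: list[dict], watermark: datetime | None) -> datetime | None:
    """Advance the watermark to the newest loaded row, or keep the old one."""
    return max((row["updated_at"] for row in rows), default=watermark)
```

A run would call `select_new_rows`, refuse to load if `validate_rows` returns any error, and only then persist `next_watermark`. Keeping the watermark update as the last step means a failed batch is simply retried on the next run.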

Featured Build

Serverless ELT platform on Google Cloud for Velib Metropole open data.

It captures the state of the bike network every 5 minutes, stores raw JSON in GCS, loads BigQuery, transforms with dbt, and exposes analytics-ready tables for a Streamlit dashboard.

| Area | Stack |
| --- | --- |
| Ingestion | Python 3.12, Cloud Run Jobs, Cloud Scheduler, Cloud Workflows |
| Storage & warehouse | Google Cloud Storage, BigQuery |
| Transformation | dbt-core, dbt-bigquery, incremental models |
| Infrastructure | Terraform, Docker, IAM least privilege |
| Quality | pytest, ruff, SQLFluff, CI/CD |

The goal is to show a complete data platform, not just a pipeline: reproducible infrastructure, raw replay, layered modeling, tests, documentation, and cost-conscious serverless execution.
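
To make the raw-storage and raw-replay idea concrete, here is a minimal Python sketch of how 5-minute snapshots could be named and later selected for replay. It is an assumption-laden illustration, not the project's actual code: the `raw/station_status` prefix, the `dt=` partition layout, the envelope fields, and all function names are hypothetical, and the sketch deliberately makes no GCS or BigQuery calls.

```python
from __future__ import annotations

import json
from datetime import datetime, timezone

SNAPSHOT_INTERVAL_MINUTES = 5  # from the description above: one capture every 5 minutes


def snapshot_object_name(captured_at: datetime, prefix: str = "raw/station_status") -> str:
    """Build a date-partitioned object name, flooring the time to its 5-minute slot.

    Expects a timezone-aware datetime; the name is always expressed in UTC.
    """
    utc = captured_at.astimezone(timezone.utc)
    slot_minute = utc.minute - utc.minute % SNAPSHOT_INTERVAL_MINUTES
    slot = utc.replace(minute=slot_minute, second=0, microsecond=0)
    return f"{prefix}/dt={slot:%Y-%m-%d}/{slot:%H%M}.json"


def serialize_snapshot(payload: dict, captured_at: datetime) -> str:
    """Wrap the untouched API payload with capture metadata as one JSON document."""
    envelope = {
        "captured_at": captured_at.astimezone(timezone.utc).isoformat(),
        "payload": payload,  # stored as received, so it can be re-transformed later
    }
    return json.dumps(envelope, sort_keys=True)


def objects_to_replay(object_names: list[str], start_date: str, end_date: str) -> list[str]:
    """Select raw objects whose dt= partition falls within [start_date, end_date]."""
    selected = []
    for name in object_names:
        for part in name.split("/"):
            # ISO dates compare correctly as plain strings.
            if part.startswith("dt=") and start_date <= part[3:] <= end_date:
                selected.append(name)
    return sorted(selected)
```

Flooring to the slot makes the object name deterministic, so a retried job overwrites its own snapshot instead of creating a near-duplicate. Because the payload is stored unmodified, a replay only needs the list of object names for a date range to rebuild downstream tables.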

Experience Snapshot

| Context | Highlights |
| --- | --- |
| Fintech scale-up | Rebuilt a data platform in 3 months with Python, S3, Redshift Serverless, dbt, AWS Fargate, QuickSight, RBAC, and GDPR masking policies. Reduced a nightly pipeline from 3-4h to less than 1h. |
| Banking platform | Built and maintained Java/Spark distributed pipelines on production systems with Hive, SQL Server, Jenkins, and 24/7 operational constraints. |
| Banking group | Worked on Java, Spark, IBM ODM, business rules, distributed processing, TDD, and Agile delivery. |

Toolbox

| Data | Cloud | Engineering |
| --- | --- | --- |
| Python | AWS S3 | Terraform |
| SQL | Redshift Serverless | Docker |
| dbt | AWS Fargate | GitHub Actions |
| Spark | QuickSight | Forgejo Actions |
| Hadoop / Hive | BigQuery | Jenkins |
| Data quality | GCS | pytest / ruff / SQLFluff |

How I Work

  • I prefer simple, explicit data flows over clever pipelines that nobody wants to touch.
  • I separate ingestion, raw storage, transformations, marts, and BI responsibilities.
  • I care about tests, documentation, naming, and operational clarity because data platforms live longer than their first delivery.
  • I like working close to analysts, business teams, DevOps, infrastructure, and security.

Contact

Open to Data Engineering roles or freelance missions around platform modernization, ELT architecture, dbt, cloud warehouses, and reliable analytics foundations.

[email protected] · antoinegelin.alt144.net · github.com/gelinant
