Nirdosh Gandhi ngandhi369

Nirdosh Gandhi

Data Engineer · Azure Databricks · Lakehouse Architecture · MLflow · PySpark

I build data pipelines that survive production.

Currently at Axtria (Bengaluru), designing Spark-based Lakehouse systems for pharmaceutical clients — serving 300+ sales reps across 10 markets with ML-driven HCP engagement recommendations via Veeva CRM.

🏗️ Specialise in medallion Lakehouse architecture (Bronze → Silver → Gold) with Unity Catalog governance across DEV / UAT / PROD
⚡ Reduced Spark ETL execution time by 60% through partition strategy refinement and join optimisation
🤖 Build ML-integrated data workflows — from raw ingestion to model scoring pipelines consumed by downstream CRM systems
🚀 Automate everything — Databricks Asset Bundles + GitHub Actions = zero-touch, repeatable deployments

🔥 Featured Project

Databricks End-to-End Data Product

Production-grade data product: CSV ingestion → Medallion Lakehouse → ML segmentation → AI-assisted dashboard → automated report delivery → live REST API. Fully automated. No manual steps.

🌐 Live API: https://databricks-asset-bundle-deployment.onrender.com
📖 Swagger UI: https://databricks-asset-bundle-deployment.onrender.com/docs

Architected a fully automated, end-to-end Databricks data product covering CSV ingestion, Bronze → Silver → Gold medallion transformation, ML training, dashboard analytics, and live API serving — deployed with zero manual steps via GitHub Actions CI/CD.
Implemented idempotent Delta MERGE-based ETL using PySpark alongside a parallel Delta Live Tables (DLT) pipeline with @dlt.expect constraints for declarative data quality enforcement and pipeline lineage tracking.
Trained and registered a scikit-learn KMeans customer segmentation model in Unity Catalog Model Registry via MLflow, evaluating cluster quality with silhouette score and elbow method across k=2–6.
Built a Databricks SQL Dashboard ("Customer Intelligence Dashboard") with 4 visualisations (top customers, revenue by city, recency distribution, and ML segment breakdown), integrated with Databricks Genie for natural language querying. Configured automated hourly report delivery to subscribed stakeholders after the pipeline completes.
Deployed a FastAPI service on Render.com, backed by Databricks Serverless (Spark Connect) for live query execution, with API key authentication and Swagger UI. Automated the full deployment via Databricks Asset Bundles (DAB) and GitHub Actions across DEV/PROD with approval gates, ruff linting, and pytest smoke tests.
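
The idempotency that a MERGE-based load provides can be illustrated without a cluster. The sketch below is a plain-Python model of upsert semantics, not the project's actual PySpark code: the `merge_upsert` helper, the `customer_id` key, and the sample rows are all hypothetical names chosen for illustration. It shows why replaying the same batch leaves the target unchanged.

```python
from typing import Dict, Iterable, Mapping


def merge_upsert(
    target: Dict[str, dict],
    batch: Iterable[Mapping[str, object]],
    key: str = "customer_id",  # hypothetical business key
) -> Dict[str, dict]:
    """Upsert every row of `batch` into `target`, matching on `key`."""
    for row in batch:
        k = str(row[key])
        if k in target:
            # Roughly "WHEN MATCHED THEN UPDATE": overwrite matching columns.
            target[k].update(row)
        else:
            # Roughly "WHEN NOT MATCHED THEN INSERT": add a new row.
            target[k] = dict(row)
    return target


# Hypothetical sample data standing in for a Silver-layer table.
silver: Dict[str, dict] = {}
batch = [
    {"customer_id": "c1", "city": "Pune", "revenue": 120.0},
    {"customer_id": "c2", "city": "Delhi", "revenue": 80.0},
]

merge_upsert(silver, batch)
snapshot = {k: dict(v) for k, v in silver.items()}

# Replaying the same batch (for example after a retried job) changes nothing.
merge_upsert(silver, batch)
```

An append-only load would have produced four rows after the replay; keying the write on a business key is what makes a retry safe.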

| Component | Stack |
| --- | --- |
| Medallion ETL (Bronze → Silver → Gold) | PySpark · Delta Lake · Delta MERGE |
| Declarative pipeline with data quality | Delta Live Tables · `@dlt.expect` |
| KMeans customer segmentation | scikit-learn · MLflow · Unity Catalog Model Registry |
| Customer Intelligence Dashboard | Databricks SQL · Genie AI (natural language queries) |
| Automated report delivery to subscribers | Databricks Dashboard Subscriptions (hourly) |
| Live REST API with authentication | FastAPI · Spark Connect · Render.com |
| Zero-touch CI/CD across DEV & PROD | GitHub Actions · Databricks Asset Bundles |
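
The API key authentication listed for the REST API can be sketched independently of any web framework. This is a minimal illustration under stated assumptions, not the deployed service's code: the `is_authorized` helper and the `X-API-Key` header name are hypothetical, since the source does not say how the key is transmitted. The point it shows is comparing secrets in constant time rather than with `==`.

```python
import hmac
from typing import Mapping


def is_authorized(
    headers: Mapping[str, str],
    expected_key: str,
    header_name: str = "x-api-key",  # assumed header name, not from the source
) -> bool:
    """Return True only if the request carries the expected API key."""
    if not expected_key:
        # Refuse everything when no key is configured, instead of
        # accidentally accepting requests that send no key at all.
        return False

    supplied = ""
    for name, value in headers.items():
        # HTTP header names are case-insensitive.
        if name.lower() == header_name:
            supplied = value
            break

    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(supplied.encode(), expected_key.encode())
```

In a framework such as FastAPI, a check like this would typically live in a dependency that runs before each protected route.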

🛠️ Tech Stack

Data Engineering

ML & Experimentation

CI/CD & Orchestration

API & Serving

Languages

🏅 Certifications

📝 Latest Writing

🚀 From Azure DevOps to GitHub Actions: How We Modernised Our Databricks CI/CD

🤝 Let's Connect

Data Engineer · M.Tech Artificial Intelligence · Delhi Technological University · IEEE Published · Axtria Bravo Award 2025
