ngandhi369/README.md

Nirdosh Gandhi

Data Engineer · Azure Databricks · Lakehouse Architecture · MLflow · PySpark


I build data pipelines that survive production.

Currently at Axtria (Bengaluru), designing Spark-based Lakehouse systems for pharmaceutical clients, serving 300+ sales reps across 10 markets with ML-driven HCP engagement recommendations via Veeva CRM.

  • 🏗️ Specialise in medallion Lakehouse architecture (Bronze → Silver → Gold) with Unity Catalog governance across DEV / UAT / PROD
  • ⚡ Reduced Spark ETL execution time by 60% through partition strategy refinement and join optimisation
  • 🤖 Build ML-integrated data workflows, from raw ingestion to model scoring pipelines consumed by downstream CRM systems
  • 🚀 Automate everything: Databricks Asset Bundles + GitHub Actions = zero-touch, repeatable deployments

🔥 Featured Project

CI/CD Python Databricks

Production-grade data product: CSV ingestion → Medallion Lakehouse → ML segmentation → AI-assisted dashboard → automated report delivery → live REST API. Fully automated. No manual steps.

🌐 Live API: https://databricks-asset-bundle-deployment.onrender.com
📖 Swagger UI: https://databricks-asset-bundle-deployment.onrender.com/docs

  • Architected a fully automated, end-to-end Databricks data product covering CSV ingestion, Bronze → Silver → Gold medallion transformation, ML training, dashboard analytics, and live API serving, deployed with zero manual steps via GitHub Actions CI/CD.
  • Implemented idempotent Delta MERGE-based ETL using PySpark alongside a parallel Delta Live Tables (DLT) pipeline with @dlt.expect constraints for declarative data quality enforcement and pipeline lineage tracking.
  • Trained and registered a scikit-learn KMeans customer segmentation model in Unity Catalog Model Registry via MLflow, evaluating cluster quality with silhouette score and the elbow method across k=2–6.
  • Built a Databricks SQL Dashboard ("Customer Intelligence Dashboard") with 4 visualisations (top customers, revenue by city, recency distribution, and ML segment breakdown), integrated with Databricks Genie for natural language querying; configured automated hourly report delivery to subscribed stakeholders after pipeline completion.
  • Deployed a FastAPI service on Render.com backed by Databricks Serverless (Spark Connect) for live query execution, with API key authentication and Swagger UI. Automated the full deployment via Databricks Asset Bundles (DAB) and GitHub Actions across DEV/PROD with approval gates, ruff linting, and pytest smoke tests.
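To make "idempotent MERGE-based ETL" concrete, here is a minimal plain-Python sketch of the upsert semantics (update when the key matches, insert when it does not). It is not the project's code and does not use the Delta Lake API; the function `merge_upsert` and the columns `customer_id`, `city`, and `revenue` are hypothetical names chosen for illustration. In Delta Lake the same idea is normally expressed with `MERGE INTO` or `DeltaTable.merge(...)`.

```python
from typing import Dict, Iterable

Row = Dict[str, object]


def merge_upsert(target: Dict[object, Row], source: Iterable[Row],
                 key: str = "customer_id") -> Dict[object, Row]:
    """Upsert `source` rows into `target`, keyed on `key` (a hypothetical column).

    Mirrors MERGE semantics: matched keys are updated, unmatched keys are inserted.
    """
    merged = dict(target)  # copy so the input table is left untouched
    for row in source:
        existing = merged.get(row[key], {})
        merged[row[key]] = {**existing, **row}  # source values win on conflict
    return merged


# Illustrative data only, not taken from the project.
silver = {"c1": {"customer_id": "c1", "city": "Pune", "revenue": 100}}
batch = [
    {"customer_id": "c1", "city": "Pune", "revenue": 120},  # matched: update
    {"customer_id": "c2", "city": "Delhi", "revenue": 80},  # not matched: insert
]

once = merge_upsert(silver, batch)
twice = merge_upsert(once, batch)  # replaying the same batch changes nothing
```

Because the write is keyed rather than append-only, re-running a failed or retried job with the same batch leaves the table in the same state, which is what makes the load safe to repeat.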
| Component | Stack |
|---|---|
| Medallion ETL (Bronze → Silver → Gold) | PySpark · Delta Lake · Delta MERGE |
| Declarative pipeline with data quality | Delta Live Tables · @dlt.expect |
| KMeans customer segmentation | scikit-learn · MLflow · Unity Catalog Model Registry |
| Customer Intelligence Dashboard | Databricks SQL · Genie AI (natural language queries) |
| Automated report delivery to subscribers | Databricks Dashboard Subscriptions (hourly) |
| Live REST API with authentication | FastAPI · Spark Connect · Render.com |
| Zero-touch CI/CD across DEV & PROD | GitHub Actions · Databricks Asset Bundles |
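For the KMeans segmentation component, the silhouette score is what ranks the candidate values of k. The sketch below is a plain-Python version of that score for 1-D data, written only to show the calculation; it is not scikit-learn's `silhouette_score` and not the project's code. The `recency_days` values and the two candidate labellings are made up, standing in for labels a fitted KMeans model would produce.

```python
from statistics import mean
from typing import Dict, List, Sequence


def silhouette(points: Sequence[float], labels: Sequence[int]) -> float:
    """Mean silhouette coefficient for 1-D points; needs at least two clusters."""
    clusters: Dict[int, List[float]] = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # convention: a singleton cluster scores 0
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(abs(point - q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            mean(abs(point - q) for q in other)
            for other_label, other in clusters.items()
            if other_label != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return mean(scores)


# Hypothetical feature values with three visible groups.
recency_days = [1.0, 1.2, 0.8, 5.0, 5.1, 4.9, 9.0, 9.2, 8.8]
candidates = {
    2: [0, 0, 0, 0, 0, 0, 1, 1, 1],  # merges the first two groups
    3: [0, 0, 0, 1, 1, 1, 2, 2, 2],  # one label per group
}
best_k = max(candidates, key=lambda k: silhouette(recency_days, candidates[k]))
```

A score near 1 means points sit much closer to their own cluster than to the nearest other one, so the labelling with the highest mean score is the preferred k.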

🛠️ Tech Stack

Data Engineering

Databricks · PySpark · Delta Lake · Unity Catalog · Azure ADF

ML & Experimentation

MLflow · scikit-learn · Databricks Genie

CI/CD & Orchestration

GitHub Actions · Azure DevOps · Databricks Asset Bundles

API & Serving

FastAPI · Render

Languages

Python · SQL


🏅 Certifications

Databricks Certified Data Engineer Associate · Azure AI Fundamentals · ML Specialization


📝 Latest Writing


🤝 Let's Connect


Data Engineer · M.Tech Artificial Intelligence · Delhi Technological University · IEEE Published · Axtria Bravo Award 2025

Pinned

  1. databricks-e2e-data-product Public

    A practical implementation of CI/CD for Databricks where Asset Bundles handle deployment packaging and GitHub Actions orchestrates automated workflows. Enables consistent, version-controlled, and e…

    Python

  2. Gesture-Controller Public

    Developed a Python executable with a tkinter interface to fully control mouse and keyboard movements through hand and eye gestures. It works with the help of libraries and packages such as OpenCV, Mediapip…

    Python 8 1

  3. News Public

    Forked from Devansh-ah/News

    Developed a web app using NLP and Django that summarizes articles fetched from News API according to each user's preset country preference. Deployed the app on the Heroku platform.

    HTML

  4. Invisible_Cloak Public

    OpenCV-Python project. It hides objects that match a chosen HSV color range, making them appear invisible on camera.

    Python 1 1

  5. G-Home_Assistant Public

    Uses a NodeMCU (32-bit ESP8266) and a relay module as hardware, together with several platforms: Google Assistant, IFTTT, and Adafruit or Blynk. On giving a trigger to Google Assistant from a smart pho…

    C++ 1

  6. AI-Email-Classifier Public

    Flask web app built around a machine learning model. It reads mail from an authorized user's Gmail account and shows each message with a category label in the web app, based on the message text, using preprocessed machine l…

    Python 13