Built for Adobe Machine Learning Engineer (2026 Batch) Application
A production-grade MLOps pipeline that trains, deploys, monitors, and auto-retrains an image classification model, designed to mirror the kind of production infrastructure behind Adobe's Firefly and Creative Cloud AI features.
A complete end-to-end MLOps system with:
- Image Classification API: EfficientNet-B0 model trained on 14,000 real images, achieving 86.85% validation accuracy
- A/B Testing: Deterministic per-user routing (80% stable / 20% canary) with instant rollback
- MLflow Tracking: Full experiment tracking, model registry, and versioning
- Drift Detection: Automatic data drift monitoring with Evidently AI
- CI/CD Pipeline: GitHub Actions with lint, test, and Docker build stages (currently passing)
Live demo: [project website](https://ml-ops-image-classification-production.up.railway.app/docs#/default/predict_predict_post). The API is currently deployed on railway.app with a model trained for only 5 epochs; with around 10 epochs of training, prediction confidence should rise from roughly 20% to 80-90%.
┌─────────────────────────────────────────────────────────────┐
│                    GitHub Actions CI/CD                     │
│         push → lint → test → Docker build → deploy          │
└───────────────────────┬─────────────────────────────────────┘
                        │
       ┌────────────────┼───────────────────┐
       ▼                ▼                   ▼
┌─────────────┐  ┌──────────────┐  ┌──────────────────┐
│   FastAPI   │  │    MLflow    │  │    PostgreSQL    │
│  A/B Test   │  │   Registry   │  │     Database     │
│  port 8000  │  │  port 5001   │  │    port 5432     │
└─────────────┘  └──────────────┘  └──────────────────┘
| Category | Tools |
|---|---|
| ML Framework | PyTorch 2.3, EfficientNet-B0, Scikit-learn |
| MLOps | MLflow 2.13 (experiment tracking, model registry) |
| Serving | FastAPI, Uvicorn |
| Containerisation | Docker, Docker Compose |
| CI/CD | GitHub Actions |
| Drift Detection | Evidently AI |
| Database | PostgreSQL |
| Cloud Ready | Terraform (AWS ECR, S3, SageMaker) |
| Metric | Value |
|---|---|
| Model | EfficientNet-B0 (ImageNet pretrained) |
| Dataset | Intel Image Classification (14,000 real images) |
| Classes | buildings, forest, glacier, mountain, sea, street |
| Training Epochs | 10 |
| Best Val Accuracy | 86.85% |
| Model Version | v2 (registered in MLflow) |
- Docker Desktop installed
- 4GB free disk space
git clone https://github.com/sumith25-dev/ML-ops-Image-Classification.git
cd ML-ops-Image-Classification
docker-compose up -d api mlflow postgres

| Service | URL |
|---|---|
| FastAPI docs | http://localhost:8000/docs |
| MLflow UI | http://localhost:5001 |
curl -X POST http://localhost:8000/predict \
-H "Content-Type: application/json" \
-d '{
"image_url": "https://images.pexels.com/photos/1547813/pexels-photo-1547813.jpeg",
"user_id": "user_001"
}'

Response:
{
"label": "forest",
"confidence": 0.87,
"model_version": "stable",
"latency_ms": 245.32
}

User ID → MD5 Hash → Bucket (0-9999)
0-7999 → stable model (80% traffic)
8000-9999 → canary model (20% traffic)

The same user always hits the same model, giving a consistent experience.
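The bucketing scheme above can be sketched in a few lines of Python. This is a minimal illustration, not the code in `model_manager.py`: the function names `bucket_for` and `route` are hypothetical, and reducing the full MD5 digest modulo 10,000 is an assumption about how the hash is turned into a bucket.

```python
import hashlib

NUM_BUCKETS = 10_000    # buckets 0-9999, as in the scheme above
STABLE_CUTOFF = 8_000   # buckets 0-7999 -> stable, 8000-9999 -> canary


def bucket_for(user_id: str) -> int:
    """Map a user ID to a fixed bucket in [0, 9999] using MD5."""
    digest = hashlib.md5(user_id.encode("utf-8")).hexdigest()
    # Assumption: the whole 128-bit digest is reduced modulo the bucket count.
    return int(digest, 16) % NUM_BUCKETS


def route(user_id: str) -> str:
    """Return 'stable' or 'canary' for the given user."""
    return "stable" if bucket_for(user_id) < STABLE_CUTOFF else "canary"


if __name__ == "__main__":
    for uid in ("user_001", "user_002", "user_003"):
        print(uid, bucket_for(uid), route(uid))
```

Because the bucket depends only on the user ID, repeated requests from the same user resolve to the same model without any server-side session state.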
Train → MLflow Registry → Assign 'stable' alias → API loads automatically
                        → Assign 'canary' alias → 20% traffic routed
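As a rough sketch of the alias flow above, the helpers below assign a registry alias and build the URI a serving process could load from. The model name `image-classifier` and both function names are hypothetical; the `client` argument is expected to be an `mlflow.MlflowClient`, whose `set_registered_model_alias` method is part of the MLflow 2.x registry API.

```python
MODEL_NAME = "image-classifier"  # hypothetical registered model name


def model_uri(name: str, alias: str) -> str:
    """Build an MLflow model URI that resolves through a registry alias."""
    return f"models:/{name}@{alias}"


def assign_alias(client, name: str, alias: str, version: int) -> None:
    """Point `alias` at `version`.

    `client` is expected to be an mlflow.MlflowClient instance; it is passed
    in so this sketch has no import-time dependency on MLflow.
    """
    client.set_registered_model_alias(name, alias, str(version))


# Possible usage against the local stack (assumes MLflow is installed and
# the tracking server from docker-compose is listening on port 5001):
#
#   from mlflow import MlflowClient
#   client = MlflowClient(tracking_uri="http://localhost:5001")
#   assign_alias(client, MODEL_NAME, "canary", 2)
#   uri = model_uri(MODEL_NAME, "canary")  # "models:/image-classifier@canary"
```

Loading by alias rather than by version number means the serving code does not need to change when a new version is promoted; only the alias moves.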
Run drift_detector.py
  → If drift > 30% OR accuracy < 80%
  → Flag for retraining
  → Promote new model to canary
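The retraining trigger above reduces to a two-threshold check. The sketch below is illustrative only: `should_retrain` is a hypothetical name, the inputs are assumed to be fractions in [0, 1], and whether a drift share of exactly 30% counts as drifted is an assumption of this sketch.

```python
DRIFT_THRESHOLD = 0.30     # share of features reported as drifted
ACCURACY_THRESHOLD = 0.80  # minimum acceptable model accuracy


def should_retrain(drift_share: float, accuracy: float) -> bool:
    """Flag retraining when drift is too high OR accuracy is too low."""
    # Strict comparison, as in the flow above; the boundary case is an assumption.
    return drift_share > DRIFT_THRESHOLD or accuracy < ACCURACY_THRESHOLD


if __name__ == "__main__":
    # Example readings (made-up values, for illustration only).
    for drift, acc in [(0.10, 0.87), (0.45, 0.87), (0.10, 0.72)]:
        print(f"drift={drift:.2f} accuracy={acc:.2f} retrain={should_retrain(drift, acc)}")
```

Either condition alone is enough to flag the model, so a model whose inputs still look familiar but whose accuracy has dropped is still caught.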
git push → GitHub Actions
  → Lint (ruff)
  → Unit Tests
  → Docker Build
ML-ops-Image-Classification/
├── app/
│   ├── main.py               # FastAPI: predict, health, metrics, rollback
│   └── model_manager.py      # MLflow loading, A/B routing, inference
├── model/
│   └── train.py              # EfficientNet training with MLflow tracking
├── monitoring/
│   └── drift_detector.py     # Evidently drift analysis
├── pipeline/
│   └── mlops_dag.py          # Airflow DAG: drift → retrain → promote
├── tests/
│   └── test_api.py           # Unit tests
├── infra/
│   ├── main.tf               # Terraform: AWS ECR, S3, SageMaker
│   └── prometheus.yml        # Prometheus config (future deployment)
├── notebooks/
│   └── exploration.ipynb     # EDA and drift visualisation
├── .github/workflows/
│   └── ci_cd.yml             # GitHub Actions CI/CD (passing)
├── docker-compose.yml        # Full local stack
├── Dockerfile                # API container
└── requirements.txt
# Run manually or schedule via Airflow
python monitoring/drift_detector.py
# Auto-retraining triggers when:
# - drift_share >= 30% (at least 30% of features drifted)
# - model accuracy < 80%

cd infra
terraform init
terraform apply -var="environment=prod"

Provisions:
- ECR repository for Docker images
- S3 bucket for MLflow artifacts
- SageMaker endpoint with blue/green deployment
- IAM roles and policies
- 86.85% accuracy on 14,000 real images using EfficientNet-B0
- A/B testing: Deterministic per-user hash routing for a consistent user experience, with a split that is configurable at runtime
- CI/CD: Every commit is automatically linted, tested, and built into a Docker image via GitHub Actions
- Drift detection: Evidently AI detects when production data drifts from the training distribution, triggering automatic retraining
- MLflow registry: Complete model versioning with aliases, metrics tracking, and artifact storage
- Zero-downtime deployment: Blue/green deployment with instant rollback via a single API call
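One way to picture the runtime-configurable split and instant rollback listed above is a small in-memory object that the API consults on each request. This is a sketch under assumptions, not the project's implementation: the class `TrafficSplit` and its methods are hypothetical, and it takes an already computed bucket (0-9999) as input.

```python
import threading


class TrafficSplit:
    """Holds the canary share and lets it change while the API is running."""

    def __init__(self, canary_percent: int = 20, num_buckets: int = 10_000):
        self._lock = threading.Lock()
        self._num_buckets = num_buckets
        self._canary_percent = canary_percent

    def set_canary_percent(self, percent: int) -> None:
        """Change the share of traffic sent to the canary model."""
        if not 0 <= percent <= 100:
            raise ValueError("percent must be between 0 and 100")
        with self._lock:
            self._canary_percent = percent

    def rollback(self) -> None:
        """Send all traffic back to the stable model."""
        self.set_canary_percent(0)

    def choose(self, bucket: int) -> str:
        """Pick a model for a bucket in [0, num_buckets)."""
        with self._lock:
            canary_buckets = self._num_buckets * self._canary_percent // 100
        # The top `canary_buckets` buckets go to canary, the rest to stable.
        if bucket >= self._num_buckets - canary_buckets:
            return "canary"
        return "stable"
```

With the default 20% share, buckets 8000-9999 resolve to the canary model; after `rollback()` every bucket resolves to stable, so a rollback endpoint would only need to call that one method.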
Sumith B R, applying for Adobe Machine Learning Engineer (2026 Batch)
GitHub: @sumith25-dev
MIT License