    class EmmanuelLelo:
        def __init__(self):
            self.name = "Emmanuel Lelo"
            self.role = ["Data Scientist", "ML Engineer", "MLOps Engineer"]
            self.location = "Dar es Salaam, Tanzania 🇹🇿"
            self.education = "BSc Data Science, EASTC (in progress)"
            self.focus = "End-to-end ML systems: data → model → production"

        def current(self):
            return {
                "working_on": "Production-grade MLOps pipelines",
                "learning": ["GCP Cloud Run", "SHAP", "Evidently AI", "PySpark"],
                "open_to": "ML/AI collaborations & freelance projects",
            }

End-to-end machine learning solutions: designing scalable data pipelines, training and optimising models, and deploying production-ready AI systems with full MLOps infrastructure (MLflow, DVC, Prefect, Docker, FastAPI, GitHub Actions).
Advanced cloud-native ML deployment on GCP Cloud Run and AWS ECS, model explainability with SHAP, real-time drift detection using Evidently AI, and distributed data processing with Apache Spark (PySpark).
- Real-world ML/AI projects that solve meaningful problems in finance, healthcare, or business operations
- MLOps pipelines and production AI infrastructure
- Open-source data science tools and experiment tracking workflows
- Building end-to-end ML pipelines from data collection to model deployment
- Handling class imbalance (SMOTE, threshold tuning, stratified cross-validation)
- Setting up MLOps infrastructure: MLflow · DVC · Prefect · Docker · CI/CD
- Deep learning with TensorFlow/Keras for tabular and financial data
- Feature engineering strategies for real-world structured datasets
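As a rough illustration of the stratified cross-validation mentioned in the list above, here is a minimal plain-Python sketch. The function name `stratified_folds` and the toy labels are hypothetical and exist only for this example; in a real project scikit-learn's `StratifiedKFold` would normally do this job.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_splits=5, seed=0):
    """Assign sample indices to folds so each fold keeps roughly the same class ratio."""
    rng = random.Random(seed)

    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(n_splits)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's indices round-robin across the folds.
        for position, idx in enumerate(indices):
            folds[position % n_splits].append(idx)
    return folds


# Toy labels with a 10% positive class (hypothetical data).
labels = [1] * 10 + [0] * 90
folds = stratified_folds(labels, n_splits=5)
for fold in folds:
    positives = sum(labels[i] for i in fold)
    print(f"fold size={len(fold)}, positives={positives}")
```

Each fold here ends up with 20 samples, 2 of them positive, so every validation split sees the minority class at the same 10% rate as the full dataset.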
I built and shipped a full MLOps stack — experiment tracking, data versioning, orchestration, containerised API, and CI/CD — entirely from Dar es Salaam, Tanzania, as a self-initiated project while still completing my undergraduate degree.
Designed and trained a deep ANN to classify fraudulent payment transactions in real time. Applied MinMaxScaler for feature scaling and SMOTE to handle extreme class imbalance, then used precision-recall threshold optimisation to raise the F1 score above what the default 0.5 cutoff gives. Benchmarked against Logistic Regression, XGBoost, and LightGBM using RandomizedSearchCV within sklearn Pipelines. Engineered a production-ready flow with EarlyStopping callbacks, stratified K-Fold cross-validation, and joblib model serialisation for deployment readiness.
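The threshold optimisation step can be sketched in plain Python as follows. This is an illustrative sketch and not the project's actual code: the helper names `f1_at_threshold` and `best_threshold` and the toy scores are assumptions made for this example.

```python
def f1_at_threshold(y_true, scores, threshold):
    """F1 score when every score >= threshold is predicted as the positive class."""
    tp = fp = fn = 0
    for truth, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and truth == 1:
            tp += 1
        elif predicted == 1 and truth == 0:
            fp += 1
        elif predicted == 0 and truth == 1:
            fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(y_true, scores):
    """Try every distinct score as a cutoff and keep the one with the highest F1."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda t: f1_at_threshold(y_true, scores, t))


# Toy validation data (hypothetical): 3 positives out of 10 samples.
y_true = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.80]

default_f1 = f1_at_threshold(y_true, scores, 0.5)
tuned = best_threshold(y_true, scores)
tuned_f1 = f1_at_threshold(y_true, scores, tuned)
print(f"F1 at 0.5: {default_f1:.3f}; best threshold {tuned}: F1 {tuned_f1:.3f}")
```

On this toy data the default cutoff misses two of the three positives, while a lower cutoff recovers them at the cost of one false positive. The cutoff should be chosen on a validation split rather than on the test set, so that the reported F1 is not optimistically biased.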
Binary classification pipeline trained on 50,000 customer records to predict flight booking completion, achieving a recall of 0.62 on a heavily imbalanced dataset (15% positive class). Engineered 8 domain-specific features including booking lead category, group size, long-haul flag, and extras score. Identified booking origin (34% importance) and trip duration (20%) as top predictors. Delivered a stakeholder deck projecting a 5–10% conversion lift and 15–20% marketing cost reduction.
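To make the feature engineering concrete, here is a hedged sketch of how four of the named features might be derived. Only the feature names come from the description above; the raw field names (`lead_days`, `passengers`, `flight_hours`, `wants_*`) and every cut-off value are hypothetical placeholders, not the project's real definitions.

```python
def engineer_features(booking):
    """Derive illustrative features from one raw booking record (a dict)."""
    # Booking lead category: bucket the days between booking and departure.
    lead_days = booking["lead_days"]
    if lead_days <= 7:
        lead_category = "last_minute"   # hypothetical cut-off
    elif lead_days <= 60:
        lead_category = "standard"      # hypothetical cut-off
    else:
        lead_category = "early"

    # Group size: collapse passenger counts into coarse groups.
    passengers = booking["passengers"]
    if passengers == 1:
        group_size = "solo"
    elif passengers == 2:
        group_size = "couple"
    else:
        group_size = "group"

    # Long-haul flag: 1 if the flight exceeds an assumed 6-hour cut-off.
    long_haul = int(booking["flight_hours"] >= 6.0)

    # Extras score: count how many optional extras were requested.
    extras = ("wants_baggage", "wants_seat", "wants_meal")
    extras_score = sum(int(booking[key]) for key in extras)

    return {
        "lead_category": lead_category,
        "group_size": group_size,
        "long_haul": long_haul,
        "extras_score": extras_score,
    }


example = {
    "lead_days": 90,
    "passengers": 1,
    "flight_hours": 8.5,
    "wants_baggage": True,
    "wants_seat": False,
    "wants_meal": True,
}
features = engineer_features(example)
print(features)
```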
End-to-end churn prediction pipeline on 10,000+ customer records, comparing four classifiers to identify the highest performer. Engineered 4 new financial behaviour features (balance-per-product ratio, salary-balance dependency, age group, high-value flag). Tuned Random Forest across 72 combinations and XGBoost across 24 combinations via GridSearchCV. The output is directly applicable to retention campaign targeting.
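A grid search evaluates the Cartesian product of the candidate values, so the combination counts follow from the grid sizes. The sketch below shows one way 72 and 24 combinations could arise; the parameter names are real scikit-learn and XGBoost hyperparameters, but the specific grids are assumptions chosen only so that the counts match, not the grids used in the project.

```python
from itertools import product


def expand_grid(grid):
    """Return every parameter combination in a grid as a list of dicts."""
    names = list(grid)
    return [dict(zip(names, values)) for values in product(*grid.values())]


# Hypothetical Random Forest grid: 3 * 4 * 3 * 2 = 72 combinations.
rf_grid = {
    "n_estimators": [100, 200, 300],
    "max_depth": [None, 5, 10, 20],
    "min_samples_split": [2, 5, 10],
    "max_features": ["sqrt", "log2"],
}

# Hypothetical XGBoost grid: 2 * 3 * 4 = 24 combinations.
xgb_grid = {
    "learning_rate": [0.05, 0.1],
    "max_depth": [3, 5, 7],
    "n_estimators": [100, 200, 300, 400],
}

rf_combinations = expand_grid(rf_grid)
xgb_combinations = expand_grid(xgb_grid)
print(len(rf_combinations), len(xgb_combinations))  # 72 24

# With k-fold cross-validation each combination is fitted once per fold.
cv_folds = 5  # assumed value
print(len(rf_combinations) * cv_folds)  # 360 model fits for the Random Forest grid
```

The fit count grows multiplicatively with every added parameter, which is why randomised search is often preferred once grids become large.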
Architected a production-grade MLOps pipeline: DVC + DagsHub for data versioning, MLflow for experiment tracking, Prefect for orchestration, and a containerised REST API serving real-time predictions. The published Docker image serves a model that reaches 84% accuracy and 63.1% weighted F1. Full CI/CD via GitHub Actions reduced manual deployment steps to zero. Roadmap: Evidently drift detection, SHAP explainability, GCP Cloud Run / AWS ECS.
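Accuracy and weighted F1 can diverge noticeably, as the two figures above do, and one common cause is weak performance on a minority class. The sketch below computes weighted F1 by hand on toy data to show the effect; the function name `weighted_f1` and the data are hypothetical, and the definition is intended to mirror scikit-learn's `f1_score(..., average="weighted")`.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights proportional to each class's support."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, count in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = count - tp
        denominator = 2 * tp + fp + fn
        f1 = 2 * tp / denominator if denominator else 0.0
        score += f1 * count / total
    return score


# Toy case (hypothetical): a model that always predicts the majority class.
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy={accuracy:.3f}, weighted F1={weighted_f1(y_true, y_pred):.3f}")
```

Here accuracy is 0.8 while weighted F1 is about 0.711, because the minority class contributes an F1 of zero.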
Analysed a flight schedule dataset of 50,000+ entries to model passenger lounge eligibility across 3 tiers, identifying peak demand of ~1,280 users/hour at 07:00–08:00. Built a revenue model projecting £262.8M annual profit and recommended Concorde Room expansion with a 2-month payback on a £7–10M investment. Engineered a multi-sheet Excel reporting workbook and a 6-panel Matplotlib/Seaborn dashboard. Model predictions landed within 3 percentage points of actual BA data (long-haul: predicted 41% vs actual 43.9%).
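The payback and accuracy figures can be checked with simple arithmetic. The sketch below assumes a simple (undiscounted) payback model, where payback period equals investment divided by monthly net benefit; the monthly benefit it prints is implied by the quoted numbers and is not a figure stated in the project.

```python
def implied_monthly_benefit(investment, payback_months):
    """Monthly net benefit needed to recover an investment within the payback period."""
    return investment / payback_months


# Investment range and payback period as quoted in the project summary.
low = implied_monthly_benefit(7_000_000, 2)
high = implied_monthly_benefit(10_000_000, 2)
print(f"Implied monthly benefit: £{low / 1e6:.1f}M to £{high / 1e6:.1f}M")

# Gap between predicted and actual long-haul share.
predicted, actual = 41.0, 43.9
gap_points = round(abs(actual - predicted), 1)        # absolute gap in percentage points
relative_error = round(gap_points / actual * 100, 1)  # gap relative to the actual value, in %
print(f"Gap: {gap_points} percentage points ({relative_error}% relative error)")
```

A 2-month payback on £7–10M therefore implies roughly £3.5–5M of additional monthly profit from the expansion, and the long-haul prediction is off by 2.9 percentage points, or about 6.6% in relative terms.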
| 🏫 Institution | 📚 Degree | 📍 Location |
|---|---|---|
| Eastern Africa Statistical Training Centre (EASTC) | BSc Data Science | Dar es Salaam, Tanzania |
Core areas: Advanced Data Science · Machine Learning · Statistical Modelling · Data Engineering · MLOps · Model Deployment & Monitoring · Big Data Processing · Cloud-Based ML Solutions · Containerisation · CI/CD · Production-Grade AI Systems