Emmanuel Lelo (emmanuelmassawe)

👨‍💻 About Me

class EmmanuelLelo:
    def __init__(self):
        self.name       = "Emmanuel Lelo"
        self.role       = ["Data Scientist", "ML Engineer", "MLOps Engineer"]
        self.location   = "Dar es Salaam, Tanzania 🇹🇿"
        self.education  = "BSc Data Science — EASTC (in progress)"
        self.focus      = "End-to-end ML systems: data → model → production"

    def current(self):
        return {
            "working_on" : "Production-grade MLOps pipelines",
            "learning"   : ["GCP Cloud Run", "SHAP", "Evidently AI", "PySpark"],
            "open_to"    : "ML/AI collaborations & freelance projects"
        }

🔭 I'm currently working on ...

End-to-end machine learning solutions — designing scalable data pipelines, training and optimising models, and deploying production-ready AI systems with full MLOps infrastructure (MLflow, DVC, Prefect, Docker, FastAPI, GitHub Actions).

🌱 I'm currently learning ...

Advanced cloud-native ML deployment on GCP Cloud Run and AWS ECS, model explainability with SHAP, real-time drift detection using Evidently AI, and distributed data processing with Apache Spark (PySpark).

👯 I'm looking to collaborate on ...

Real-world ML/AI projects that solve meaningful problems in finance, healthcare, or business operations
MLOps pipelines and production AI infrastructure
Open-source data science tools and experiment tracking workflows

💬 Ask me about ...

Building end-to-end ML pipelines from data collection to model deployment
Handling class imbalance (SMOTE, threshold tuning, stratified cross-validation)
Setting up MLOps infrastructure: MLflow · DVC · Prefect · Docker · CI/CD
Deep learning with TensorFlow/Keras for tabular and financial data
Feature engineering strategies for real-world structured datasets

⚡ Fun fact ...

I built and shipped a full MLOps stack — experiment tracking, data versioning, orchestration, containerised API, and CI/CD — entirely from Dar es Salaam, Tanzania, as a self-initiated project while still completing my undergraduate degree.

📫 How to reach me ...

🚀 Featured Projects

🔍 Financial Fraud Detection

Designed and trained a deep ANN to classify fraudulent payment transactions in real time. Applied SMOTE to handle extreme class imbalance and MinMaxScaler for feature scaling, then tuned the decision threshold on the precision-recall curve to maximise F1 instead of relying on the default 0.5 cutoff. Benchmarked against Logistic Regression, XGBoost, and LightGBM using RandomizedSearchCV within sklearn Pipelines. Added EarlyStopping callbacks, stratified K-Fold cross-validation, and joblib model serialisation for deployment readiness.
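A minimal sketch of the threshold-tuning idea, not the project's actual code. It assumes synthetic data from `make_classification` and uses a Logistic Regression as a stand-in for the ANN. The variable names and the 3% positive rate are illustrative only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, precision_recall_curve
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic, heavily imbalanced data (assumption: about 3% positives).
X, y = make_classification(
    n_samples=5000, n_features=10, weights=[0.97, 0.03], random_state=0
)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Stand-in classifier; the original project used a deep ANN.
model = make_pipeline(MinMaxScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
proba = model.predict_proba(X_val)[:, 1]

# precision/recall have one more entry than thresholds; drop the final point.
precision, recall, thresholds = precision_recall_curve(y_val, proba)
denom = precision[:-1] + recall[:-1]
f1 = np.divide(
    2 * precision[:-1] * recall[:-1],
    denom,
    out=np.zeros_like(denom),
    where=denom > 0,
)
best_threshold = float(thresholds[np.argmax(f1)])

# Compare F1 at the default cutoff with F1 at the tuned cutoff.
f1_default = f1_score(y_val, (proba >= 0.5).astype(int))
f1_tuned = f1_score(y_val, (proba >= best_threshold).astype(int))
print(f"threshold={best_threshold:.3f} "
      f"F1@0.5={f1_default:.3f} F1@tuned={f1_tuned:.3f}")
```

Here the threshold is tuned and scored on the same validation split, so the tuned F1 is optimistic. A real workflow would confirm the chosen threshold on a separate held-out test set.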

✈️ Customer Booking Conversion — British Airways

Binary classification pipeline trained on 50,000 customer records to predict flight booking completion, achieving a recall of 0.62 on a heavily imbalanced dataset (15% positive class). Engineered 8 domain-specific features including booking lead category, group size, long-haul flag, and extras score. Identified booking origin (34% importance) and trip duration (20%) as top predictors. Delivered a stakeholder deck projecting a 5–10% conversion lift and 15–20% marketing cost reduction.
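A rough sketch of how features like these could be derived with pandas. The column names (`purchase_lead`, `num_passengers`, `flight_duration`, the `wants_*` flags), the bin edges, and the cutoffs are all assumptions for illustration, not the project's actual definitions.

```python
import pandas as pd

# Hypothetical add-on flags summed into the extras score.
EXTRAS = ["wants_extra_baggage", "wants_preferred_seat", "wants_in_flight_meals"]


def add_booking_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with four illustrative engineered features."""
    out = df.copy()
    # Booking lead category: days between booking and departure, bucketed.
    out["lead_category"] = pd.cut(
        out["purchase_lead"],
        bins=[-1, 7, 30, 90, float("inf")],
        labels=["last_minute", "short", "medium", "long"],
    )
    # Group size flag: 3 or more passengers (assumed cutoff).
    out["is_group"] = (out["num_passengers"] >= 3).astype(int)
    # Long-haul flag: flights of 6 hours or more (assumed cutoff).
    out["is_long_haul"] = (out["flight_duration"] >= 6.0).astype(int)
    # Extras score: number of add-ons requested, from 0 to 3.
    out["extras_score"] = out[EXTRAS].sum(axis=1)
    return out


sample = pd.DataFrame({
    "purchase_lead": [2, 45, 200],
    "num_passengers": [1, 4, 2],
    "flight_duration": [5.5, 8.8, 6.0],
    "wants_extra_baggage": [0, 1, 1],
    "wants_preferred_seat": [0, 1, 0],
    "wants_in_flight_meals": [0, 1, 0],
})
features = add_booking_features(sample)
print(features[["lead_category", "is_group", "is_long_haul", "extras_score"]])
```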

🏦 Bank Customer Churn Prediction — Lloyds

End-to-end churn prediction pipeline on 10,000+ customer records, comparing four classifiers to identify the highest performer. Engineered 4 new financial behaviour features (balance-per-product ratio, salary-balance dependency, age group, high-value flag). Tuned Random Forest across 72 hyperparameter combinations and XGBoost across 24 via GridSearchCV. The resulting model is directly applicable to retention campaign targeting.
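To show where counts like 72 and 24 come from, here is a small sketch using scikit-learn's `ParameterGrid`. The grids are hypothetical: they are chosen only so that the totals match, and the 5-fold setting is also an assumption.

```python
from sklearn.model_selection import ParameterGrid

# Hypothetical Random Forest grid: 3 * 3 * 2 * 2 * 2 = 72 combinations.
rf_grid = {
    "n_estimators": [100, 200, 300],
    "max_depth": [None, 10, 20],
    "min_samples_split": [2, 5],
    "min_samples_leaf": [1, 2],
    "max_features": ["sqrt", "log2"],
}

# Hypothetical XGBoost grid: 2 * 3 * 2 * 2 = 24 combinations.
xgb_grid = {
    "n_estimators": [100, 200],
    "max_depth": [3, 5, 7],
    "learning_rate": [0.05, 0.1],
    "subsample": [0.8, 1.0],
}

rf_combinations = len(ParameterGrid(rf_grid))
xgb_combinations = len(ParameterGrid(xgb_grid))

# GridSearchCV fits every combination once per fold (assumed cv=5 here).
cv_folds = 5
total_fits = (rf_combinations + xgb_combinations) * cv_folds

print(rf_combinations, xgb_combinations, total_fits)  # 72 24 480
```

Passing either dictionary as `param_grid` to `GridSearchCV` would search exactly these combinations, which is why grid size is worth checking before launching a long run.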

⚙️ Production Churn System — MLOps Pipeline

Architected a production-grade MLOps pipeline: DVC + DagsHub for data versioning, MLflow for experiment tracking, Prefect for orchestration, and a containerised REST API serving real-time predictions. Published a Docker image whose model achieves 84% accuracy and 63.1% weighted F1. Full CI/CD via GitHub Actions reduced manual deployment steps to zero. Roadmap: Evidently drift detection, SHAP explainability, GCP Cloud Run / AWS ECS.

📊 Lounge Demand & Revenue Modelling — British Airways

Analysed a 50,000+ flight schedule dataset to model passenger lounge eligibility across 3 tiers, identifying peak demand of ~1,280 users/hour at 07:00–08:00. Built a revenue model projecting £262.8M annual profit and recommended Concorde Room expansion with a 2-month payback on a £7–10M investment. Engineered a multi-sheet Excel reporting workbook and a 6-panel Matplotlib/Seaborn dashboard. Model predictions landed within 3 percentage points of actual BA data (long-haul: predicted 41% vs actual 43.9%).

🛠️ Tech Stack & Tools

💻 Languages & Data

🤖 Machine Learning & AI

⚙️ MLOps & Engineering

☁️ Cloud & Infrastructure

📊 Data & Visualisation

🗂️ Version Control

🎓 Education

🏫 Institution	📚 Degree	📍 Location
Eastern Africa Statistical Training Centre (EASTC)	BSc Data Science	Dar es Salaam, Tanzania

Core areas: Advanced Data Science · Machine Learning · Statistical Modelling · Data Engineering · MLOps · Model Deployment & Monitoring · Big Data Processing · Cloud-Based ML Solutions · Containerisation · CI/CD · Production-Grade AI Systems
