emmanuelmassawe/README.md

👨‍💻 About Me

class EmmanuelLelo:
    def __init__(self):
        self.name       = "Emmanuel Lelo"
        self.role       = ["Data Scientist", "ML Engineer", "MLOps Engineer"]
        self.location   = "Dar es Salaam, Tanzania 🇹🇿"
        self.education  = "BSc Data Science — EASTC (in progress)"
        self.focus      = "End-to-end ML systems: data → model → production"

    def current(self):
        return {
            "working_on" : "Production-grade MLOps pipelines",
            "learning"   : ["GCP Cloud Run", "SHAP", "Evidently AI", "PySpark"],
            "open_to"    : "ML/AI collaborations & freelance projects"
        }

🔭 I'm currently working on ...

End-to-end machine learning solutions — designing scalable data pipelines, training and optimising models, and deploying production-ready AI systems with full MLOps infrastructure (MLflow, DVC, Prefect, Docker, FastAPI, GitHub Actions).

🌱 I'm currently learning ...

Advanced cloud-native ML deployment on GCP Cloud Run and AWS ECS, model explainability with SHAP, real-time drift detection using Evidently AI, and distributed data processing with Apache Spark (PySpark).

👯 I'm looking to collaborate on ...

  • Real-world ML/AI projects that solve meaningful problems in finance, healthcare, or business operations
  • MLOps pipelines and production AI infrastructure
  • Open-source data science tools and experiment tracking workflows

💬 Ask me about ...

  • Building end-to-end ML pipelines from data collection to model deployment
  • Handling class imbalance (SMOTE, threshold tuning, stratified cross-validation)
  • Setting up MLOps infrastructure: MLflow · DVC · Prefect · Docker · CI/CD
  • Deep learning with TensorFlow/Keras for tabular and financial data
  • Feature engineering strategies for real-world structured datasets
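As a rough illustration of the SMOTE idea mentioned in the list above, here is a minimal pure-Python sketch. It is not the imbalanced-learn implementation and not necessarily how the projects below apply it; the function name `smote_sketch`, the toy points, and the choice of `k=2` are all assumptions made for the example.

```python
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority points by interpolating between
    a randomly chosen minority point and one of its k nearest minority
    neighbours (the core step of SMOTE)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        # Rank the remaining minority points by squared Euclidean distance.
        neighbours = sorted(
            others,
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        neighbour = rng.choice(neighbours)
        lam = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(
            tuple(a + lam * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic


# Toy minority-class points (made up for illustration).
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_sketch(minority, n_new=6)
```

Each synthetic point lies on the segment between two real minority points. Oversampling of this kind is normally applied to the training folds only, so that validation data stays untouched.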

⚡ Fun fact ...

I built and shipped a full MLOps stack — experiment tracking, data versioning, orchestration, containerised API, and CI/CD — entirely from Dar es Salaam, Tanzania, as a self-initiated project while still completing my undergraduate degree.

📫 How to reach me ...

   


🚀 Featured Projects

🔍 Financial Fraud Detection

Deep Learning · SMOTE · XGBoost

Designed and trained a deep ANN to classify fraudulent payment transactions in real time. Applied MinMaxScaler for feature scaling and SMOTE to handle extreme class imbalance, then tuned the decision threshold on the precision-recall curve to raise F1 above what the default 0.5 cutoff gives. Benchmarked against Logistic Regression, XGBoost, and LightGBM using RandomizedSearchCV within sklearn Pipelines. Prepared the model for deployment with EarlyStopping callbacks, stratified K-Fold cross-validation, and joblib model serialisation.
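The threshold step described above can be sketched in a few lines. This is a simplified stand-in, not the project's code: the helper names `f1_at` and `best_f1_threshold` and the toy labels and scores are assumptions, and a real pipeline would more likely use a library precision-recall routine on a held-out validation set.

```python
def f1_at(y_true, scores, threshold):
    """F1 score when every score >= threshold is predicted as fraud (1)."""
    tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
    fn = sum(1 for y, s in zip(y_true, scores) if s < threshold and y == 1)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_f1_threshold(y_true, scores):
    """Try each observed score as a cutoff and keep the one with highest F1."""
    best_t, best_f1 = 0.5, 0.0
    for t in sorted(set(scores)):
        f1 = f1_at(y_true, scores, t)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1


# Toy validation labels and predicted probabilities (made up for illustration).
y_true = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.05, 0.10, 0.15, 0.20, 0.30, 0.35, 0.40, 0.45, 0.60, 0.80]

default_f1 = f1_at(y_true, scores, 0.5)
threshold, tuned_f1 = best_f1_threshold(y_true, scores)
```

On this toy data the tuned cutoff sits below 0.5 and recovers a fraud case that the default cutoff misses, at the cost of one false positive. The threshold should be chosen on validation data and then fixed before evaluating on the test set.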

✈️ Customer Booking Conversion — British Airways

Random Forest · Feature Engineering

Binary classification pipeline trained on 50,000 customer records to predict flight booking completion, achieving a recall of 0.62 on a heavily imbalanced dataset (15% positive class). Engineered 8 domain-specific features including booking lead category, group size, long-haul flag, and extras score. Identified booking origin (34% importance) and trip duration (20%) as top predictors. Delivered a stakeholder deck projecting a 5–10% conversion lift and 15–20% marketing cost reduction.
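Four of the engineered features named above could look roughly like the sketch below. The input field names (`purchase_lead`, `num_passengers`, `flight_duration`, and the three `wants_*` flags) and every cutoff value are hypothetical choices for illustration, not taken from the project.

```python
def engineer_features(record):
    """Derive four illustrative features from one booking record (a dict)."""
    lead_days = record["purchase_lead"]
    if lead_days <= 7:          # hypothetical cutoff
        lead_category = "last_minute"
    elif lead_days <= 60:       # hypothetical cutoff
        lead_category = "standard"
    else:
        lead_category = "early"

    passengers = record["num_passengers"]
    if passengers == 1:
        group_size = "solo"
    elif passengers == 2:
        group_size = "couple"
    else:
        group_size = "group"

    return {
        "lead_category": lead_category,
        "group_size": group_size,
        # Long-haul flag: 1 if the flight lasts at least 6 hours (assumed cutoff).
        "is_long_haul": int(record["flight_duration"] >= 6.0),
        # Extras score: number of optional add-ons requested (0 to 3).
        "extras_score": (
            record["wants_extra_baggage"]
            + record["wants_preferred_seat"]
            + record["wants_in_flight_meals"]
        ),
    }


example = engineer_features({
    "purchase_lead": 3,
    "num_passengers": 2,
    "flight_duration": 8.5,
    "wants_extra_baggage": 1,
    "wants_preferred_seat": 0,
    "wants_in_flight_meals": 1,
})
```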

🏦 Bank Customer Churn Prediction — Lloyds

XGBoost · SVM · GridSearchCV

End-to-end churn prediction pipeline on 10,000+ customer records, comparing four classifiers to identify the highest performer. Engineered 4 new financial behaviour features (balance-per-product ratio, salary-balance dependency, age group, high-value flag). Tuned Random Forest across 72 hyperparameter combinations and XGBoost across 24 combinations via GridSearchCV. The resulting churn scores are directly applicable to retention campaign targeting.
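To make the combination counts concrete, the sketch below enumerates two parameter grids the way an exhaustive grid search does. The specific values are hypothetical, picked only so the grids multiply out to 72 and 24; the source states the counts, not the grids. The 5-fold setting is likewise an assumption.

```python
from itertools import product

# Hypothetical Random Forest grid: 3 * 4 * 3 * 2 = 72 combinations.
rf_grid = {
    "n_estimators": [100, 200, 300],
    "max_depth": [None, 10, 20, 30],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2],
}

# Hypothetical XGBoost grid: 2 * 3 * 2 * 2 = 24 combinations.
xgb_grid = {
    "n_estimators": [100, 200],
    "max_depth": [3, 5, 7],
    "learning_rate": [0.01, 0.1],
    "subsample": [0.8, 1.0],
}


def grid_size(grid):
    """Count every combination an exhaustive grid search would evaluate."""
    return sum(1 for _ in product(*grid.values()))


rf_combinations = grid_size(rf_grid)
xgb_combinations = grid_size(xgb_grid)

# With k-fold cross-validation, each combination is fitted k times.
cv_folds = 5  # assumed value
rf_fits = rf_combinations * cv_folds
xgb_fits = xgb_combinations * cv_folds
```

Under these assumptions the Random Forest search costs 360 model fits and the XGBoost search 120, which is why grid sizes are usually kept small.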

⚙️ Production Churn System — MLOps Pipeline

FastAPI · Docker · MLflow

Architected a production-grade MLOps pipeline: DVC + DagsHub for data versioning, MLflow for experiment tracking, Prefect for orchestration, and a containerised REST API serving real-time predictions. The published Docker image serves a model that reaches 84% accuracy and 63.1% weighted F1. Full CI/CD via GitHub Actions removed all manual deployment steps. Roadmap: Evidently drift detection, SHAP explainability, GCP Cloud Run / AWS ECS.

📊 Lounge Demand & Revenue Modelling — British Airways

Python · Seaborn · Statistical Modelling

Analysed a flight schedule dataset of 50,000+ records to model passenger lounge eligibility across 3 tiers, identifying peak demand of ~1,280 users/hour at 07:00–08:00. Built a revenue model projecting £262.8M annual profit and recommended Concorde Room expansion with a 2-month payback on a £7–10M investment. Engineered a multi-sheet Excel reporting workbook and a 6-panel Matplotlib/Seaborn dashboard. Model predictions landed within 3 percentage points of actual BA data (long-haul: predicted 41% vs actual 43.9%).
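The long-haul comparison above (predicted 41% vs actual 43.9%) can be checked with quick arithmetic. The two input figures come from the text; the variable names are illustrative.

```python
predicted_pct = 41.0   # predicted long-haul share, from the text
actual_pct = 43.9      # actual long-haul share, from the text

# Absolute gap, in percentage points.
abs_gap_pp = abs(predicted_pct - actual_pct)

# Relative error, as a percentage of the actual value.
rel_error_pct = abs_gap_pp / actual_pct * 100

summary = (
    f"gap = {abs_gap_pp:.1f} percentage points, "
    f"relative error = {rel_error_pct:.1f}%"
)
```

The gap is 2.9 percentage points, which corresponds to a relative error of about 6.6%, so the comparison holds in percentage-point terms.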


🛠️ Tech Stack & Tools

💻 Languages & Data

Python · SQL · PySpark · Linux

🤖 Machine Learning & AI

scikit-learn · TensorFlow · Keras · XGBoost · OpenCV

⚙️ MLOps & Engineering

MLflow · DVC · Prefect · Docker · FastAPI · GitHub Actions · Streamlit

☁️ Cloud & Infrastructure

AWS · Terraform · Prometheus · Grafana

📊 Data & Visualisation

Pandas · Matplotlib · Seaborn · Plotly

🗂️ Version Control

Git · GitHub · GitLab · DagsHub


🎓 Education

🏫 Institution | 📚 Degree | 📍 Location
Eastern Africa Statistical Training Centre (EASTC) | BSc Data Science | Dar es Salaam, Tanzania

Core areas: Advanced Data Science · Machine Learning · Statistical Modelling · Data Engineering · MLOps · Model Deployment & Monitoring · Big Data Processing · Cloud-Based ML Solutions · Containerisation · CI/CD · Production-Grade AI Systems



Pinned

  1. bank_churn_mlops (Public)

    Production-grade bank customer churn prediction system — Random Forest + scikit-learn Pipeline (84% acc · 63.1% F1). FastAPI REST API, Docker Hub deployment, full MLOps stack: DVC · MLflow · DagsHu…

    HTML

  2. credit-card-fraud-detection (Public)

    🔐 Production-grade credit card fraud detection system — XGBoost + SMOTE on 284k transactions (99.9% acc · 93.2% F1 · 98.7% AUC). FastAPI serving, Kubernetes deployment with HPA autoscaling, full ML…

    Python

  3. british-airways-customer-booking-predictions- (Public)

    Binary classification pipeline predicting flight booking completion on 50,000 customer records (0.62 recall · 15% class imbalance). 8 engineered features, SMOTE oversampling, Random Forest + XGBoos…

    Jupyter Notebook

  4. british-airways-data-analysis- (Public)

    Data science analysis of BA Terminal 3 lounge operations — modelled passenger eligibility across 3 tiers on 50k+ flight records. Revenue model projecting £262.8M annual profit with predictions with…

    Jupyter Notebook

  5. -Lloyds-Bank-Group-Customer-Churn-Prediction (Public)

    End-to-end bank customer churn prediction pipeline comparing 4 classifiers (Logistic Regression · Random Forest · XGBoost · SVM) on 10,000+ customer records. 4 engineered behavioural features, SMOT…

    Jupyter Notebook

  6. -Synthetic-Financial-Dataset-Fraud-Detection-with-Deep-Learning-ANN- (Public)

    Deep learning fraud detection on 284,807 synthetic transactions (0.17% fraud rate) using a TensorFlow/Keras ANN (128→64→1, Dropout 0.5). Precision-recall threshold optimisation over default 0.5 cut…

    Jupyter Notebook