Production-Ready ML Engineering Projects by Hubert Domagała
Try the Live Demo: Interactive Cancer Detection System
This repository showcases end-to-end machine learning system development, from exploratory data analysis to production-ready deployments. Each project demonstrates software engineering best practices, scalable architectures, and real-world problem-solving.
| Project | Domain | ML Techniques | Status | Highlights |
|---|---|---|---|---|
| Cancer Detection | Healthcare | Classification, Ensemble | Production | 96.7% accuracy, FastAPI, zero false negatives |
| Fraud Detection | Finance | Anomaly Detection, Feature Engineering | Analysis | SMOTE, cost-sensitive learning |
| Digit Recognition | Computer Vision | PCA, Neural Networks | Analysis | Multi-model comparison |
| Honey Production | Agriculture | Time Series Regression | Analysis | Trend analysis, forecasting |
| Flag Analysis | Data Mining | Multi-class Classification | Analysis | UCI dataset, EDA |
| Raisin Classification | Agriculture | Clustering, Classification | Analysis | Feature analysis |
| Income Classification | Economics | Binary Classification | Analysis | Socioeconomic analysis |
| Medical Insurance | Healthcare | OOP Design, Regression | Analysis | Clean code architecture |
Legend: Production (API + Tests) | Analysis (Notebooks)
Python 3.11 or higher
pip (Python package manager)
# Clone the repository
git clone https://github.com/hubertdomagalaa/Machine_Learning.git
cd Machine_Learning
# Create virtual environment
python -m venv venv
# Activate virtual environment
# Windows:
venv\Scripts\activate
# Linux/Mac:
# source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# Train the model
python scripts/train_cancer_model.py
# Start the API server
uvicorn api.cancer_api:app --reload
# Visit http://localhost:8000/docs for interactive API documentation
# Run all tests with coverage
pytest tests/ --cov=src --cov-report=html
# View coverage report
# Open htmlcov/index.html in your browser
Machine_Learning/
├── api/                       # FastAPI endpoints for production models
│   ├── cancer_api.py          # Cancer detection REST API
│   └── schemas.py             # Pydantic request/response models
│
├── src/                       # Production Python modules
│   ├── cancer/                # Cancer detection system
│   │   ├── config.py          # Configuration management
│   │   ├── data_loader.py     # Data loading and validation
│   │   ├── preprocessor.py    # Feature engineering
│   │   ├── model.py           # Model training and evaluation
│   │   ├── predictor.py       # Prediction interface
│   │   └── cli.py             # Command-line interface
│   └── utils/                 # Shared utilities
│
├── tests/                     # Unit and integration tests
│   ├── test_cancer_*.py       # Cancer system tests
│   └── ...
│
├── notebooks/                 # Exploratory analysis notebooks
│   ├── Cancer/                # Breast cancer classification
│   ├── Card_Fraud/            # Credit card fraud detection
│   ├── Digits/                # Handwritten digit recognition
│   ├── Flags/                 # Country flag analysis
│   ├── Honey/                 # Honey production forecasting
│   ├── Medical_Insurance/     # Insurance cost estimation
│   ├── Raisins/               # Raisin variety classification
│   └── income_class/          # Income bracket prediction
│
├── models/                    # Trained model artifacts (.pkl, .joblib)
├── scripts/                   # Training and utility scripts
├── .github/workflows/         # CI/CD automation
├── requirements.txt           # Python dependencies
├── .gitignore                 # Git ignore rules
└── LICENSE                    # MIT License
Problem Statement: Binary classification of breast tumors (Malignant vs. Benign) to assist in early cancer diagnosis.
Business Impact: Early detection significantly improves survival rates. This system achieves high accuracy while minimizing false negatives (missing cancer cases).
- Accuracy: 96.7%
- Recall (Malignant): 100% (No false negatives!)
- Precision: 94.2%
- F1-Score: 0.97
- ROC-AUC: 0.989
- Algorithms: Random Forest, SVM, Logistic Regression, KNN
- Feature Engineering: Normalization, dimensionality reduction (PCA)
- Deployment: FastAPI REST API
- Testing: pytest with 85%+ coverage
- Data: Wisconsin Breast Cancer Dataset (569 samples, 30 features)
- Ensemble voting classifier for robust predictions
- Zero false negatives (critical for cancer screening)
- Production-ready API with request validation
- Comprehensive unit tests
- Model versioning and artifact management
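
The ensemble approach listed above can be sketched with scikit-learn's bundled copy of the Wisconsin dataset. This is a minimal illustration under stated assumptions, not the repository's training script: soft voting, the hyperparameters, the 80/20 split, and the random seed are all choices made for the example, so the printed scores will not necessarily match the metrics reported above.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# 569 samples, 30 numeric features.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42  # assumed split and seed
)

# Scale-sensitive models get their own StandardScaler inside a pipeline.
ensemble = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=200, random_state=42)),
        ("svm", make_pipeline(StandardScaler(), SVC(probability=True, random_state=42))),
        ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
        ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))),
    ],
    voting="soft",  # average predicted probabilities (an assumption)
)
ensemble.fit(X_train, y_train)
y_pred = ensemble.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
# In scikit-learn's encoding of this dataset, label 0 is malignant and 1 is benign.
malignant_recall = recall_score(y_test, y_pred, pos_label=0)
print(f"accuracy={accuracy:.3f}  malignant recall={malignant_recall:.3f}")
```

Recall is computed with `pos_label=0` because the screening goal is to avoid missing malignant cases, and malignant is label 0 in this copy of the data.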
Try Live Demo | View Project Details | View Production Code | API Docs
Problem Statement: Detect fraudulent financial transactions in real-time to prevent monetary losses.
- Feature Engineering: Transaction ratios, balance differentials, transaction type encoding
- Class Imbalance Handling: SMOTE oversampling, class weights
- Model: Logistic Regression (baseline), designed for XGBoost upgrade
- Evaluation: Precision-Recall curves, confusion matrix, cost-sensitive metrics
- Highly imbalanced (fraud is rare: <1% of transactions)
- Time-series features (transaction steps)
- Multiple transaction types (PAYMENT, TRANSFER, CASH_OUT)
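
Cost-sensitive evaluation can be illustrated without any ML library: when a missed fraud costs far more than a false alarm, the best decision threshold shifts downward. The sketch below uses made-up scores and hypothetical costs (a missed fraud is assumed to cost 50 times a false alarm); none of the numbers come from the project's data.

```python
def confusion_counts(y_true, scores, threshold):
    """Return (tp, fp, fn, tn) when scores >= threshold are flagged as fraud."""
    tp = fp = fn = tn = 0
    for label, score in zip(y_true, scores):
        flagged = score >= threshold
        if flagged and label == 1:
            tp += 1
        elif flagged:
            fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def total_cost(y_true, scores, threshold, fn_cost=50.0, fp_cost=1.0):
    """Total cost of errors; the default costs are hypothetical."""
    _, fp, fn, _ = confusion_counts(y_true, scores, threshold)
    return fn * fn_cost + fp * fp_cost


def best_threshold(y_true, scores, candidates, **costs):
    """Pick the candidate threshold with the lowest total cost."""
    return min(candidates, key=lambda t: total_cost(y_true, scores, t, **costs))


# Toy data: 1 = fraud, 0 = legitimate; scores are illustrative model outputs.
y_true = [0, 0, 0, 0, 0, 0, 0, 1, 0, 1]
scores = [0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.45, 0.42, 0.60, 0.90]
candidates = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]

costly_misses = best_threshold(y_true, scores, candidates)
equal_costs = best_threshold(y_true, scores, candidates, fn_cost=1.0, fp_cost=1.0)
print(costly_misses, equal_costs)  # 0.4 0.7
```

With expensive misses the chosen threshold is 0.4, which accepts three false alarms to catch both frauds; with equal costs it rises to 0.7 and one fraud is missed.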
Problem Statement: Optical recognition of handwritten digits (0-9) for automated document processing.
- Dimensionality Reduction: PCA for visualization and feature compression
- Models Compared: SVM, Random Forest, MLPClassifier (Neural Network)
- Hyperparameter Tuning: GridSearchCV for optimal parameters
- Dataset: UCI ML handwritten digits (1,797 samples, 8x8 images)
- Algorithm: SVM with RBF kernel
- Accuracy: ~98%
- Confusion Matrix Analysis: Detailed digit-pair error patterns
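
The PCA, RBF-kernel SVM, and GridSearchCV combination described above might look roughly like the following. It is a sketch, not the notebook's code: the number of components, the parameter grid, the split, and the seed are assumptions picked to keep the example fast.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# 1,797 samples; each 8x8 image is flattened to 64 pixel features.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0  # assumed split and seed
)

# PCA compresses the 64 pixels before the SVM sees them.
pipeline = Pipeline([
    ("pca", PCA(n_components=30, random_state=0)),  # 30 components is an assumption
    ("svm", SVC(kernel="rbf")),
])

# A deliberately small, illustrative grid; step-name prefixes route
# each parameter to the right pipeline stage.
param_grid = {"svm__C": [1, 10], "svm__gamma": ["scale", 0.001]}
search = GridSearchCV(pipeline, param_grid, cv=3)
search.fit(X_train, y_train)

test_accuracy = search.score(X_test, y_test)
print(search.best_params_, f"test accuracy={test_accuracy:.3f}")
```

Putting PCA inside the pipeline means it is refit on each cross-validation fold, so the held-out fold never influences the projection.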
Problem Statement: Predict future honey production trends across U.S. states to assist agricultural planning.
- Model: Linear Regression (baseline)
- Feature Engineering: Year-over-year percentage change
- Data Aggregation: State-level and national trend analysis
- Evaluation: MSE, R-squared, residual analysis
- Identified declining production trends in key states
- Seasonal and economic factor correlations
- Multi-year forecasting capabilities
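
The year-over-year feature and the linear baseline can be sketched in plain Python. The yearly totals below are invented for illustration and are not the honey dataset; the helper names are hypothetical as well.

```python
def pct_change(values):
    """Year-over-year percentage change; the first year has no predecessor."""
    return [None] + [
        (curr - prev) / prev * 100.0 for prev, curr in zip(values, values[1:])
    ]


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(xs, ys, slope, intercept):
    """Share of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Made-up yearly totals in arbitrary units, NOT real production figures.
years = [2008, 2009, 2010, 2011, 2012]
production = [100.0, 96.0, 93.0, 88.0, 85.0]

changes = pct_change(production)
slope, intercept = fit_line(years, production)
forecast_2013 = slope * 2013 + intercept  # one-step-ahead extrapolation
fit_quality = r_squared(years, production, slope, intercept)
print(f"slope={slope:.2f}/year  forecast={forecast_2013:.1f}  R^2={fit_quality:.3f}")
```

On this toy series the fitted slope is about -3.8 units per year. A straight line is only a baseline, and extrapolating it several years ahead assumes the trend stays linear.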
Problem Statement: Predict country characteristics based on flag features (colors, symbols, patterns).
- Data Source: UCI ML Repository (194 countries, 30 features)
- Models: Decision Trees, Random Forests, SVM, Neural Networks
- Feature Types: Numerical (colors, area) and categorical (symbols, religion, language)
- Evaluation: Cross-validation, classification reports
Problem Statement: Classify raisin varieties using physical measurements for quality control.
- Algorithms: Clustering and supervised classification
- Features: Size, shape, color characteristics
- Application: Automated agricultural sorting
Problem Statement: Predict whether individuals earn above or below $50K based on demographic features.
- Data: Census income dataset
- Features: Age, education, occupation, work hours, marital status
- Models: Classification algorithms with feature importance analysis
- Evaluation: Accuracy, precision, recall, fairness metrics
Problem Statement: Estimate medical insurance costs based on individual health and demographic factors.
- Design Pattern: Object-Oriented Programming with Enums
- Features: Age, BMI, smoking status, number of children
- Validation: Input validation, error handling
- Code Quality: Type safety, clean architecture
This project showcases professional Python development:
- Enum types for type safety
- Data validation with custom setters
- BMI calculation encapsulation
- Comprehensive error handling
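
The practices listed above could be combined along these lines. The class and attribute names (`Patient`, `SmokingStatus`, `height_m`, `weight_kg`) are hypothetical and chosen for illustration; the project's actual classes may be structured differently.

```python
from enum import Enum


class SmokingStatus(Enum):
    """Closed set of allowed values instead of free-form strings."""
    SMOKER = "smoker"
    NON_SMOKER = "non-smoker"


class Patient:
    """Hypothetical patient record with validated attributes."""

    def __init__(self, age, height_m, weight_kg, smoking_status, children=0):
        # Assignments go through the property setters below, so
        # validation also runs at construction time.
        self.age = age
        self.height_m = height_m
        self.weight_kg = weight_kg
        self.smoking_status = smoking_status
        self.children = children

    @property
    def age(self):
        return self._age

    @age.setter
    def age(self, value):
        is_int = isinstance(value, int) and not isinstance(value, bool)
        if not is_int or not 0 <= value <= 120:
            raise ValueError(f"age must be an integer between 0 and 120, got {value!r}")
        self._age = value

    @property
    def smoking_status(self):
        return self._smoking_status

    @smoking_status.setter
    def smoking_status(self, value):
        if not isinstance(value, SmokingStatus):
            raise TypeError("smoking_status must be a SmokingStatus member")
        self._smoking_status = value

    @property
    def bmi(self):
        """Body mass index: weight in kilograms divided by height in metres squared."""
        if self.height_m <= 0:
            raise ValueError("height_m must be positive")
        return self.weight_kg / self.height_m ** 2


patient = Patient(age=35, height_m=1.80, weight_kg=81.0,
                  smoking_status=SmokingStatus.NON_SMOKER)
print(round(patient.bmi, 1))  # 25.0
```

Exposing `bmi` as a read-only property keeps the formula in one place, and callers cannot set a BMI that disagrees with the stored height and weight.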
View Project Details | View Code
- Supervised Learning: Classification (Binary & Multi-class), Regression
- Unsupervised Learning: Clustering, PCA
- Time Series: Trend analysis, Forecasting
- Imbalanced Data: SMOTE, Class weights, Cost-sensitive learning
- Model Selection: Cross-validation, Hyperparameter tuning (GridSearchCV)
- Evaluation: ROC-AUC, Precision-Recall, Confusion matrices
- Architecture: Modular design, OOP principles, Separation of concerns
- API Development: FastAPI, RESTful design, Pydantic validation
- Testing: pytest, Unit tests, Integration tests, Coverage >80%
- Code Quality: Type hints, Docstrings, PEP 8, Black formatting
- CLI Tools: Click framework, argument parsing
- Version Control: Git, Professional commit messages
- Model Serialization: joblib, pickle
- Experiment Tracking: MLflow integration
- CI/CD: GitHub Actions, Automated testing
- Containerization: Docker-ready (in progress)
- Documentation: Comprehensive READMEs, API docs, Code comments
# Core ML Libraries
numpy, pandas, scikit-learn
# Visualization
matplotlib, seaborn
# Deep Learning (planned)
pytorch, tensorflow
# API & Web
fastapi, uvicorn, pydantic
# Testing & Quality
pytest, flake8, black, mypy
- 8 diverse ML projects across multiple domains
- Professional repository structure
- Comprehensive documentation
- Production code for Cancer Detection
- REST API implementation
- Unit testing framework
- Requirements management
- MIT License
- Docker containerization
- CI/CD pipeline (GitHub Actions)
- Streamlit demo applications
- Pre-commit hooks & code quality automation
- GitHub issue & PR templates
- Advanced algorithms (XGBoost, Prophet)
- Model interpretability (SHAP values)
- Kubernetes deployment
- Model monitoring and drift detection
- Feature store implementation
- A/B testing framework
- AutoML pipeline
This portfolio represents my growth in:
- Machine Learning: From basic models to ensemble methods and production systems
- Software Engineering: From notebooks to tested, modular, API-driven applications
- MLOps: Understanding the full ML lifecycle beyond just model training
- Domain Knowledge: Applying ML to healthcare, finance, agriculture, and more
- Total Projects: 8
- Production APIs: 1 (expanding)
- Lines of Code: 10,000+
- Test Coverage: 85%+ (production projects)
- Datasets Processed: 8+
- Models Trained: 20+
- Algorithms Implemented: 15+
GitHub: @hubertdomagalaa
Open to opportunities in:
- Machine Learning Engineer roles
- Data Scientist positions with ML engineering focus
- MLOps and production ML systems
- Collaborative open-source ML projects
This project is licensed under the MIT License; see the LICENSE file for details.
- Datasets: UCI Machine Learning Repository, Kaggle, sklearn built-in datasets
- Libraries: scikit-learn, FastAPI, pytest, and the entire Python data science ecosystem
- Inspiration: Production ML best practices from industry leaders
⭐ If you find this portfolio valuable, please consider starring the repository!
Last updated: January 2026
