
Immanuel2004/Dataflow-Deployment


Dataflow Deployment — End-to-End MLOps Pipeline

Overview

Dataflow Deployment is a complete MLOps project that automates the machine learning workflow end to end, from data ingestion to model deployment.

It uses a modular architecture with separate components for ETL, data validation, data transformation, model training, evaluation, MLflow tracking, CI/CD automation, and AWS cloud deployment.


Project Architecture

MongoDB → Data Ingestion → Data Validation → Data Transformation
→ Model Training & Hyperparameter Tuning → MLflow Tracking → DagsHub Integration
→ Model Evaluation → Model Registry → AWS Deployment (via GitHub Actions)
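The stage-to-stage flow above can be sketched in plain Python, assuming each stage returns a small immutable artifact object that the next stage consumes. The repository's networksecurity/entity/artifact_entity.py suggests this pattern, but the class names, field names, and file paths below are hypothetical, and the transformation, evaluation, and deployment stages are omitted for brevity.

```python
from dataclasses import dataclass


# Hypothetical artifact types; the real ones in artifact_entity.py may differ.
@dataclass(frozen=True)
class DataIngestionArtifact:
    train_file_path: str
    test_file_path: str


@dataclass(frozen=True)
class DataValidationArtifact:
    validation_status: bool
    valid_train_file_path: str
    valid_test_file_path: str


@dataclass(frozen=True)
class ModelTrainerArtifact:
    trained_model_file_path: str


def start_data_ingestion() -> DataIngestionArtifact:
    # Placeholder: a real stage would export the MongoDB collection to CSV files.
    return DataIngestionArtifact("artifacts/train.csv", "artifacts/test.csv")


def start_data_validation(ingestion: DataIngestionArtifact) -> DataValidationArtifact:
    # Placeholder: a real stage would check both files against data_schema/schema.yaml.
    return DataValidationArtifact(True, ingestion.train_file_path, ingestion.test_file_path)


def start_model_trainer(validation: DataValidationArtifact) -> ModelTrainerArtifact:
    # Refuse to train if the previous stage reported invalid data.
    if not validation.validation_status:
        raise ValueError("validation failed; refusing to train on invalid data")
    # Placeholder: a real stage would fit, tune, and pickle the model.
    return ModelTrainerArtifact("final_models/model.pkl")


def run_pipeline() -> ModelTrainerArtifact:
    # Each stage consumes only the artifact produced by the stage before it.
    ingestion = start_data_ingestion()
    validation = start_data_validation(ingestion)
    return start_model_trainer(validation)


if __name__ == "__main__":
    print(run_pipeline())
```

Passing explicit artifacts between stages keeps each component testable in isolation: a stage can be exercised by constructing its input artifact by hand.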

Folder Structure

Dataflow Deployment/
│
├── data_schema/
│   └── schema.yaml
│
├── final_models/
│   ├── model.pkl
│   └── preprocessor.pkl
│
├── Network_Data/
│   └── Phishing_Legitimate_full.csv
│
├── networksecurity/
│   ├── __init__.py
│   ├── cloud/
│   │   └── __init__.py
│   ├── components/
│   │   ├── __init__.py
│   │   ├── data_ingestion.py
│   │   ├── data_transformation.py
│   │   ├── data_validation.py
│   │   └── model_trainer.py
│   ├── constants/
│   │   └── training_pipeline/
│   │       └── __init__.py
│   ├── entity/
│   │   ├── artifact_entity.py
│   │   └── config_entity.py
│   ├── exception/
│   │   └── exception.py
│   ├── logging/
│   │   └── logger.py
│   ├── pipeline/
│   │   ├── batch_prediction.py
│   │   └── training_pipeline.py
│   └── utils/
│       ├── main_utils/
│       │   └── utils.py
│       └── ml_utils/
│           ├── metric/
│           │   └── classification_metric.py
│           └── model/
│               └── estimator.py
│
├── notebooks/
│   └── __init__.py
├── prediction_output/
│   └── output.csv
├── templates/
│   ├── dashboard.html
│   ├── index.html
│   └── table.html
├── valid_data/
│   └── test.csv
├── .gitignore
├── app.py
├── main.py
├── Dockerfile
├── README.md
├── requirements.txt
└── setup.py
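final_models/ stores the fitted preprocessor and model as separate pickles, and pipeline/batch_prediction.py writes results to prediction_output/output.csv. The sketch below shows one way the two objects might be bundled so that prediction always applies the preprocessor first, followed by a batch-scoring helper. Everything in it is hypothetical: the toy MeanImputer and ThresholdModel stand in for real scikit-learn objects, and NetworkModel, batch_predict, and the predicted_column name are illustrative, not taken from estimator.py.

```python
import csv
import io
import pickle


class MeanImputer:
    """Toy preprocessor: replaces missing values (None) with the column mean seen in fit."""

    def fit(self, rows):
        self.means = []
        for col in zip(*rows):
            present = [v for v in col if v is not None]
            self.means.append(sum(present) / len(present) if present else 0.0)
        return self

    def transform(self, rows):
        return [[m if v is None else v for v, m in zip(row, self.means)] for row in rows]


class ThresholdModel:
    """Toy classifier: predicts 1 when the first feature exceeds a threshold."""

    def __init__(self, threshold):
        self.threshold = threshold

    def predict(self, rows):
        return [1 if row[0] > self.threshold else 0 for row in rows]


class NetworkModel:
    """Bundles preprocessor and model so prediction always applies both, in order."""

    def __init__(self, preprocessor, model):
        self.preprocessor = preprocessor
        self.model = model

    def predict(self, rows):
        return self.model.predict(self.preprocessor.transform(rows))


def batch_predict(model, csv_text):
    """Score every CSV row and return CSV text with a predicted_column appended."""
    reader = csv.DictReader(io.StringIO(csv_text))
    records = list(reader)
    # Empty cells become None so the preprocessor can impute them.
    features = [[float(v) if v != "" else None for v in r.values()] for r in records]
    preds = model.predict(features)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(reader.fieldnames) + ["predicted_column"])
    writer.writeheader()
    for record, pred in zip(records, preds):
        writer.writerow({**record, "predicted_column": pred})
    return out.getvalue()


if __name__ == "__main__":
    train = [[0.2, 1.0], [None, 3.0], [0.8, None]]
    bundle = NetworkModel(MeanImputer().fit(train), ThresholdModel(0.5))
    # Round-trip through pickle, as a saved .pkl file would be loaded at serving time.
    loaded = pickle.loads(pickle.dumps(bundle))
    print(batch_predict(loaded, "url_length,num_dots\n0.9,2\n,1\n"))
```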

Key Features

  • ETL Pipeline
    • Automated data extraction from MongoDB
    • Data validation and transformation
    • Schema integrity and null checks
  • Model Training & Evaluation
    • Modular training pipeline with hyperparameter tuning
    • MLflow & DagsHub tracking for experiments and models
    • Automated evaluation and metric logging
  • CI/CD Automation
    • GitHub Actions workflow (main.yml) for continuous integration and deployment
    • Dockerized build and AWS deployment pipeline
  • Cloud Deployment
    • Model deployment to AWS (EC2 / Elastic Beanstalk / ECS)
    • Integrated with MLflow for model registry management
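The schema-integrity and null checks listed above can be sketched with the standard library alone. This assumes data_schema/schema.yaml lists one {column_name: dtype} mapping per column and has already been loaded into a dict (for example with PyYAML's yaml.safe_load). The column names below are illustrative, and the real file's layout may differ.

```python
import csv
import io

# Hypothetical shape of data_schema/schema.yaml after loading; real keys may differ.
SCHEMA = {
    "columns": [{"NumDots": "int64"}, {"UrlLength": "int64"}, {"CLASS_LABEL": "int64"}],
}


def validate_csv(csv_text, schema):
    """Return a list of human-readable problems; an empty list means the data passed."""
    reader = csv.DictReader(io.StringIO(csv_text))
    expected = [name for col in schema["columns"] for name in col]
    actual = reader.fieldnames or []
    problems = []

    # Schema integrity: same column count and no missing columns.
    if len(actual) != len(expected):
        problems.append(f"expected {len(expected)} columns, found {len(actual)}")
    missing = [c for c in expected if c not in actual]
    if missing:
        problems.append(f"missing columns: {missing}")

    # Null check: count empty cells (and short rows, which DictReader fills with None).
    null_counts = {c: 0 for c in actual}
    for row in reader:
        for c in actual:
            if row[c] in ("", None):
                null_counts[c] += 1
    for c, n in null_counts.items():
        if n:
            problems.append(f"column {c!r} has {n} null value(s)")
    return problems
```

Returning a list of problems, rather than raising on the first one, lets a validation report record every issue in a single pass.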

Tech Stack

  • Programming Language: Python
  • Web Framework / API: FastAPI, Flask
  • Frontend / UI: Bootstrap, Jinja2 Templates, Chart.js
  • Data Storage / DB: MongoDB
  • Machine Learning / Analytics: Scikit-learn, Pandas, NumPy
  • Experiment Tracking: MLflow, DagsHub
  • Version Control: Git, GitHub
  • CI/CD / Automation: GitHub Actions
  • Containerization: Docker
  • Cloud Deployment: AWS (EC2 / Elastic Beanstalk / ECS)
  • Other Tools: Certifi, Python-dotenv, Uvicorn

Setup Instructions

Clone the Repository

git clone https://github.com/Immanuel2004/dataflow-deployment.git
cd dataflow-deployment

Create and Activate Virtual Environment

python3 -m venv venv
source venv/bin/activate    # macOS/Linux
venv\Scripts\activate       # Windows

Install Dependencies

pip install -r requirements.txt

Run the Project

python app.py

MLflow & DagsHub Integration

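Set up your DagsHub tracking environment. DagsHub authenticates MLflow clients with your username and an access token:

export MLFLOW_TRACKING_URI=https://dagshub.com/Immanuel2004/dataflow-deployment.mlflow
export MLFLOW_TRACKING_USERNAME=Immanuel2004
export MLFLOW_TRACKING_PASSWORD=<your-dagshub-token>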
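MLflow reads the tracking server from MLFLOW_TRACKING_URI and, for servers that require authentication such as DagsHub, the credentials from MLFLOW_TRACKING_USERNAME and MLFLOW_TRACKING_PASSWORD. When the URI is unset, runs are logged to a local store instead, which is easy to miss. A small fail-fast check run before training starts can catch a missing variable early. check_tracking_env is a hypothetical helper, not part of MLflow or this repository.

```python
import os

# Variables MLflow consults for a remote, authenticated tracking server.
REQUIRED_VARS = ("MLFLOW_TRACKING_URI", "MLFLOW_TRACKING_USERNAME", "MLFLOW_TRACKING_PASSWORD")


def check_tracking_env(env=None):
    """Raise if any tracking variable is missing or empty; otherwise return the URI."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("MLflow tracking is not configured; set: " + ", ".join(missing))
    return env["MLFLOW_TRACKING_URI"]
```

Calling check_tracking_env() at the start of a training entry point (for example main.py) would stop the run before any work is done if the exports above were skipped.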

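To browse runs recorded in the local tracking store (used when MLFLOW_TRACKING_URI is not set), start the MLflow UI:

mlflow ui

Then open http://127.0.0.1:5000. Runs logged to DagsHub are viewed through DagsHub rather than this local UI.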


Deployment

Using Docker

Build the image:

docker build -t dataflow-deployment .

Run the container:

docker run -p 8080:8080 dataflow-deployment

Deploy to AWS

The deployment process is automated via GitHub Actions (.github/workflows/main.yml).

Once you push changes to the main branch:

  • The pipeline runs tests and linting
  • Builds and pushes Docker images
  • Deploys the application to AWS

CI/CD Workflow

The GitHub Actions workflow automates:

  • Linting and static code checks
  • Unit testing
  • Model training and evaluation
  • Docker image creation
  • AWS deployment
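For the evaluation step, here is a minimal sketch of the metrics a module like utils/ml_utils/metric/classification_metric.py might compute, plus a threshold gate a workflow could use to stop before deployment. It computes precision, recall, and F1 for the positive class by hand to show the arithmetic; in practice scikit-learn's precision_score, recall_score, and f1_score do the same. ClassificationMetricArtifact, get_classification_score, passes_gate, and the 0.6 threshold are all assumptions, not the project's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClassificationMetricArtifact:
    f1_score: float
    precision_score: float
    recall_score: float


def get_classification_score(y_true, y_pred, positive=1):
    """Precision, recall, and F1 for the positive class; 0.0 where undefined."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return ClassificationMetricArtifact(f1, precision, recall)


def passes_gate(metrics, min_f1=0.6):
    """Hypothetical deployment gate: promote only models whose F1 clears a threshold."""
    return metrics.f1_score >= min_f1
```

A workflow step could exit with a non-zero status when passes_gate returns False, which would stop the later Docker and AWS steps from running.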

Future Enhancements

  • Add Airflow/Prefect for pipeline orchestration
  • Implement data drift detection
  • Integrate model monitoring (Prometheus + Grafana)
  • Automate retraining with live data

Author

Immanuel

📧 [email protected]

🌐 GitHub: Immanuel2004

🌐 DagsHub: Immanuel2004
