Dataflow Deployment — End-to-End MLOps Pipeline

Overview

Dataflow Deployment is a complete MLOps project that automates the machine learning workflow, from data ingestion to model deployment.

It uses a modular architecture with components for ETL, data validation, transformation, model training, evaluation, MLflow tracking, CI/CD automation, and AWS cloud deployment.

Project Architecture

MongoDB → Data Ingestion → Data Validation → Data Transformation
→ Model Training & Hyperparameter Tuning → MLflow Tracking → DagsHub Integration
→ Model Evaluation → Model Registry → AWS Deployment (via GitHub Actions)
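
The flow above is a chain in which each stage consumes the artifact produced by the previous one. The following is a minimal sketch of that idea, not the project's actual code: `StageResult`, `run_pipeline`, and the three placeholder stage functions are hypothetical names, and the dictionary values stand in for real data read from MongoDB.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class StageResult:
    """Hypothetical record of what one stage produced."""
    stage: str
    artifact: Dict[str, object]


def run_pipeline(
    stages: List[Callable[[Dict[str, object]], Dict[str, object]]],
    initial: Dict[str, object],
) -> List[StageResult]:
    """Run stages in order, feeding each stage's artifact to the next."""
    results = []
    artifact = initial
    for stage in stages:
        artifact = stage(artifact)
        results.append(StageResult(stage=stage.__name__, artifact=artifact))
    return results


# Placeholder stages (assumptions): real ones would read MongoDB,
# check the schema, and build features.
def data_ingestion(artifact):
    return {**artifact, "raw_rows": 100}


def data_validation(artifact):
    return {**artifact, "validated": artifact["raw_rows"] > 0}


def data_transformation(artifact):
    if not artifact["validated"]:
        raise ValueError("validation failed; refusing to transform")
    return {**artifact, "features_ready": True}


results = run_pipeline(
    [data_ingestion, data_validation, data_transformation],
    {"source": "mongodb"},
)
print([r.stage for r in results])
```

Passing artifacts explicitly keeps each stage testable on its own and makes a failed validation stop the chain before training starts.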

Folder Structure

Dataflow Deployment/
│
├── data_schema/
│   └── schema.yaml
│
├── final_models/
│   ├── model.pkl
│   └── preprocessor.pkl
│
├── Network_Data/
│   └── Phishing_Legitimate_full.csv
│
├── networksecurity/
│   ├── __init__.py
│   ├── cloud/
│   │   └── __init__.py
│   ├── components/
│   │   ├── __init__.py
│   │   ├── data_ingestion.py
│   │   ├── data_transformation.py
│   │   ├── data_validation.py
│   │   └── model_trainer.py
│   ├── constants/
│   │   └── training_pipeline/
│   │       └── __init__.py
│   ├── entity/
│   │   ├── artifact_entity.py
│   │   └── config_entity.py
│   ├── exception/
│   │   └── exception.py
│   ├── logging/
│   │   └── logger.py
│   ├── pipeline/
│   │   ├── batch_prediction.py
│   │   └── training_pipeline.py
│   └── utils/
│       ├── main_utils/
│       │   └── utils.py
│       └── ml_utils/
│           ├── metric/
│           │   └── classification_metric.py
│           └── model/
│               └── estimator.py
│
├── notebooks/
│   └── __init__.py
├── prediction_output/
│   └── output.csv
├── templates/
│   ├── dashboard.html
│   ├── index.html
│   └── table.html
├── valid_data/
│   └── test.csv
├── .gitignore
├── app.py
├── main.py
├── Dockerfile
├── README.md
├── requirements.txt
└── setup.py

Key Features

ETL Pipeline
- Automated data extraction from MongoDB
- Data validation and transformation
- Schema integrity and null checks

Model Training & Evaluation
- Modular training pipeline with hyperparameter tuning
- MLflow & DagsHub tracking for experiments and models
- Automated evaluation and metric logging

CI/CD Automation
- GitHub Actions workflow (main.yml) for continuous integration and deployment
- Dockerized build and AWS deployment pipeline

Cloud Deployment
- Model deployment to AWS (EC2 / Elastic Beanstalk / ECS)
- Integrated with MLflow for model registry management
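
To make the "schema integrity and null checks" item concrete, here is a minimal sketch using only the standard library. It is an illustration under assumptions, not the code in `data_validation.py`: the function name `validate_rows`, the three column names, and the inline sample data are all hypothetical, and the real project would presumably read the expected columns from `data_schema/schema.yaml`.

```python
import csv
import io
from typing import Dict, List, Optional


def validate_rows(
    rows: List[Dict[str, Optional[str]]],
    expected_columns: List[str],
) -> Dict[str, object]:
    """Compare column names with the schema and count empty values per column."""
    actual = list(rows[0].keys()) if rows else []
    missing = [c for c in expected_columns if c not in actual]
    unexpected = [c for c in actual if c not in expected_columns]
    null_counts = {c: 0 for c in expected_columns if c in actual}
    for row in rows:
        for col in null_counts:
            value = row[col]
            # DictReader yields None for absent trailing fields and "" for empty ones.
            if value is None or value.strip() == "":
                null_counts[col] += 1
    return {
        "schema_ok": not missing and not unexpected,
        "missing": missing,
        "unexpected": unexpected,
        "null_counts": null_counts,
    }


# Hypothetical sample with two empty cells.
sample_csv = "NumDots,UrlLength,CLASS_LABEL\n3,72,1\n,55,0\n2,,1\n"
rows = list(csv.DictReader(io.StringIO(sample_csv)))
report = validate_rows(rows, ["NumDots", "UrlLength", "CLASS_LABEL"])
print(report)
```

A report dictionary like this can be written out as a validation artifact so that later stages can refuse to run when `schema_ok` is false.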

Tech Stack

| Category | Tools / Libraries / Services |
| --- | --- |
| Programming Language | Python |
| Web Framework / API | FastAPI, Flask |
| Frontend / UI | Bootstrap, Jinja2 Templates, Chart.js |
| Data Storage / DB | MongoDB |
| Machine Learning / Analytics | Scikit-learn, Pandas, NumPy |
| Experiment Tracking | MLflow, DagsHub |
| Version Control | Git, GitHub |
| CI/CD / Automation | GitHub Actions |
| Containerization | Docker |
| Cloud Deployment | AWS (EC2 / Elastic Beanstalk / ECS) |
| Other Tools | Certifi, Python-dotenv, Uvicorn |

Setup Instructions

Clone the Repository

git clone https://github.com/Immanuel2004/dataflow-deployment.git
cd dataflow-deployment

Create and Activate Virtual Environment

python3 -m venv venv
source venv/bin/activate    # macOS/Linux
venv\Scripts\activate       # Windows

Install Dependencies

pip install -r requirements.txt

Run the Project

python app.py

MLflow & DagsHub Integration

Set up your DagsHub tracking environment:

export MLFLOW_TRACKING_URI=https://dagshub.com/Immanuel2004/dataflow-deployment.mlflow
export MLFLOW_TRACKING_USERNAME=Immanuel2004

Start MLflow UI locally:

mlflow ui

Access at http://127.0.0.1:5000.
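
On the Python side, MLflow picks up the tracking URI and credentials from the environment. The sketch below shows one way to check the variables before logging and then record metrics with `mlflow.start_run()` and `mlflow.log_metrics()`. The helper names are hypothetical, and treating `MLFLOW_TRACKING_PASSWORD` as required is an assumption that applies to an authenticated remote server; a local `mlflow ui` instance would not need it.

```python
import os
from typing import Dict, List

# Assumption: a remote, authenticated tracking server needs all three.
REQUIRED_VARS = [
    "MLFLOW_TRACKING_URI",
    "MLFLOW_TRACKING_USERNAME",
    "MLFLOW_TRACKING_PASSWORD",
]


def missing_tracking_vars(env: Dict[str, str]) -> List[str]:
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


def log_metrics_to_mlflow(metrics: Dict[str, float]) -> None:
    """Log metrics to whichever server MLFLOW_TRACKING_URI points at."""
    missing = missing_tracking_vars(dict(os.environ))
    if missing:
        raise RuntimeError(f"Set these variables first: {', '.join(missing)}")

    # Imported lazily so the environment check works without mlflow installed.
    import mlflow

    with mlflow.start_run():
        mlflow.log_metrics(metrics)


print(missing_tracking_vars({"MLFLOW_TRACKING_URI": "https://example.invalid"}))
```

Failing early on missing variables avoids a training run that finishes but silently logs to a local `mlruns` directory instead of the remote server.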

Deployment

Using Docker

Build the image:

docker build -t dataflow-deployment .

Run the container:

docker run -p 8080:8080 dataflow-deployment

Deploy to AWS

The deployment process is automated via GitHub Actions (.github/workflows/main.yml).

Once you push changes to the main branch, the pipeline:

- Runs tests and linting
- Builds and pushes Docker images
- Deploys the application to AWS

CI/CD Workflow

The GitHub Actions workflow automates:

- Linting and static code checks
- Unit testing
- Model training and evaluation
- Docker image creation
- AWS deployment
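
As an illustration of what the unit testing and evaluation steps might exercise, the sketch below computes precision, recall, and F1 for binary labels and checks them with `unittest`. It is an assumption about the kind of test the workflow could run, not a copy of `classification_metric.py`; the function and test class names are hypothetical.

```python
import unittest
from typing import Dict, Sequence


def classification_scores(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute precision, recall, and F1 for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Guard each ratio against division by zero when a class is absent.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


class ClassificationScoresTest(unittest.TestCase):
    def test_known_counts(self):
        # One true positive, one false positive, one false negative.
        scores = classification_scores([1, 1, 0, 0], [1, 0, 1, 0])
        self.assertAlmostEqual(scores["precision"], 0.5)
        self.assertAlmostEqual(scores["recall"], 0.5)
        self.assertAlmostEqual(scores["f1"], 0.5)

    def test_no_positives(self):
        scores = classification_scores([0, 0], [0, 0])
        self.assertEqual(scores["f1"], 0.0)
```

Saved as a test module, this can be run with `python -m unittest`, which is the kind of command a CI step would invoke.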

Future Enhancements

- Add Airflow/Prefect for pipeline orchestration
- Implement data drift detection
- Integrate model monitoring (Prometheus + Grafana)
- Automate retraining with live data

Author

Immanuel

🌐 GitHub: Immanuel2004

🌐 DagsHub: Immanuel2004
