Dataflow Deployment is a complete MLOps project that automates the entire machine learning workflow, from data ingestion to model deployment. It is designed around a modular architecture with components for ETL, data validation, transformation, model training, evaluation, MLflow tracking, CI/CD automation, and AWS cloud deployment.
The end-to-end pipeline flows as follows:

```
MongoDB → Data Ingestion → Data Validation → Data Transformation
→ Model Training & Hyperparameter Tuning → MLflow Tracking → DagsHub Integration
→ Model Evaluation → Model Registry → AWS Deployment (via GitHub Actions)
```
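For orientation, here is a minimal sketch of how such a stage-by-stage pipeline can be wired together. The class, method, and artifact names are assumptions based on the repository layout (`networksecurity/pipeline/training_pipeline.py` and `networksecurity/entity/artifact_entity.py`), not the project's exact API:

```python
from dataclasses import dataclass


# Hypothetical artifact type; the real definitions live in
# networksecurity/entity/artifact_entity.py and may differ.
@dataclass
class DataIngestionArtifact:
    train_file_path: str
    test_file_path: str


class TrainingPipeline:
    """Chains the pipeline stages, passing artifacts between them."""

    def start_data_ingestion(self) -> DataIngestionArtifact:
        # In the real project this stage exports a MongoDB collection
        # into train/test CSV files under an artifacts directory.
        return DataIngestionArtifact("artifacts/train.csv", "artifacts/test.csv")

    def run(self) -> None:
        ingestion = self.start_data_ingestion()
        # Validation, transformation, training, and MLflow logging
        # follow the same artifact-passing pattern.
        print(f"train={ingestion.train_file_path} test={ingestion.test_file_path}")


if __name__ == "__main__":
    TrainingPipeline().run()
```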
The repository is organized as follows:

```
Dataflow Deployment/
│
├── data_schema/
│   └── schema.yaml
│
├── final_models/
│   ├── model.pkl
│   └── preprocessor.pkl
│
├── Network_Data/
│   └── Phishing_Legitimate_full.csv
│
├── networksecurity/
│   ├── __init__.py
│   ├── cloud/
│   │   └── __init__.py
│   ├── components/
│   │   ├── __init__.py
│   │   ├── data_ingestion.py
│   │   ├── data_transformation.py
│   │   ├── data_validation.py
│   │   └── model_trainer.py
│   ├── constants/
│   │   └── training_pipeline/
│   │       └── __init__.py
│   ├── entity/
│   │   ├── artifact_entity.py
│   │   └── config_entity.py
│   ├── exception/
│   │   └── exception.py
│   ├── logging/
│   │   └── logger.py
│   ├── pipeline/
│   │   ├── batch_prediction.py
│   │   └── training_pipeline.py
│   └── utils/
│       ├── main_utils/
│       │   └── utils.py
│       └── ml_utils/
│           ├── metric/
│           │   └── classification_metric.py
│           └── model/
│               └── estimator.py
│
├── notebooks/
│   └── __init__.py
├── prediction_output/
│   └── output.csv
├── templates/
│   ├── dashboard.html
│   ├── index.html
│   └── table.html
├── valid_data/
│   └── test.csv
├── .gitignore
├── app.py
├── main.py
├── Dockerfile
├── README.md
├── requirements.txt
└── setup.py
```
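Given the `exception/` and `logging/` packages above, component code presumably routes errors through a shared custom exception. A rough sketch of that pattern follows; the `NetworkSecurityException` name and its constructor are assumptions inferred from the file layout, not the module's actual contents:

```python
import logging
import sys

# Stand-in for networksecurity.logging.logger; the real logger
# configuration may differ.
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("networksecurity")


class NetworkSecurityException(Exception):
    """Annotates an error with the file and line where it occurred."""

    def __init__(self, error: Exception, error_detail):
        _, _, tb = error_detail.exc_info()
        self.message = (f"{error} in {tb.tb_frame.f_code.co_filename} "
                        f"at line {tb.tb_lineno}")
        super().__init__(self.message)


try:
    1 / 0  # simulate a failing pipeline step
except Exception as e:
    logger.error(NetworkSecurityException(e, sys).message)
```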
Key features:

- **ETL Pipeline**
  - Automated data extraction from MongoDB (see the sketch after this list)
  - Data validation and transformation
  - Schema integrity and null checks
- **Model Training & Evaluation**
  - Modular training pipeline with hyperparameter tuning
  - MLflow & DagsHub tracking for experiments and models
  - Automated evaluation and metric logging
- **CI/CD Automation**
  - GitHub Actions workflow (`main.yml`) for continuous integration and deployment
  - Dockerized build and AWS deployment pipeline
- **Cloud Deployment**
  - Model deployment to AWS (EC2 / Elastic Beanstalk / ECS)
  - Integrated with MLflow for model registry management
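To illustrate the extraction step, the ingestion component presumably reads a MongoDB collection into a DataFrame roughly like this. The environment variable, database, and collection names are placeholders, not the project's actual values:

```python
import os

import pandas as pd
from dotenv import load_dotenv
from pymongo import MongoClient

# Connection details come from a .env file via python-dotenv,
# matching the tools listed in the tech stack below.
load_dotenv()
client = MongoClient(os.environ["MONGO_DB_URL"])  # placeholder variable name
collection = client["network_security"]["phishing_data"]  # placeholder names

# Pull every document and drop MongoDB's internal _id column so the
# frame lines up with the schema in data_schema/schema.yaml.
df = pd.DataFrame(list(collection.find()))
df = df.drop(columns=["_id"], errors="ignore")
```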
The project is built on the following stack:

| Category | Tools / Libraries / Services |
|---|---|
| Programming Language | Python |
| Web Framework / API | FastAPI, Flask |
| Frontend / UI | Bootstrap, Jinja2 Templates, Chart.js |
| Data Storage / DB | MongoDB |
| Machine Learning / Analytics | Scikit-learn, Pandas, NumPy |
| Experiment Tracking | MLflow, DagsHub |
| Version Control | Git, GitHub |
| CI/CD / Automation | GitHub Actions |
| Containerization | Docker |
| Cloud Deployment | AWS (EC2 / Elastic Beanstalk / ECS) |
| Other Tools | Certifi, Python-dotenv, Uvicorn |
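As one concrete illustration of how this stack fits together, `app.py` plausibly loads the pickled artifacts from `final_models/` and exposes a prediction endpoint. The route and response shape below are illustrative guesses, not the project's exact API:

```python
import pickle

import pandas as pd
from fastapi import FastAPI, File, UploadFile

app = FastAPI()

# Load the serialized preprocessing pipeline and trained model once
# at startup.
with open("final_models/preprocessor.pkl", "rb") as f:
    preprocessor = pickle.load(f)
with open("final_models/model.pkl", "rb") as f:
    model = pickle.load(f)


@app.post("/predict")
async def predict(file: UploadFile = File(...)):
    # Score an uploaded CSV of feature rows and return predictions.
    df = pd.read_csv(file.file)
    preds = model.predict(preprocessor.transform(df))
    return {"predictions": preds.tolist()}
```

A sketch like this would be served with `uvicorn app:app --reload`, consistent with Uvicorn appearing in the stack above.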
Clone the repository and set up a virtual environment:

```bash
git clone https://github.com/Immanuel2004/dataflow-deployment.git
cd dataflow-deployment
python3 -m venv venv
source venv/bin/activate   # macOS/Linux
venv\Scripts\activate      # Windows
```

Install dependencies and launch the app:

```bash
pip install -r requirements.txt
python app.py
```

Set up your DagsHub tracking environment:
```bash
export MLFLOW_TRACKING_URI=https://dagshub.com/Immanuel2004/dataflow-deployment.mlflow
export MLFLOW_TRACKING_USERNAME=Immanuel2004
```

Start the MLflow UI locally:

```bash
mlflow ui
```

Access it at http://127.0.0.1:5000.
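With those variables exported, a training run can log to the DagsHub-hosted server. A minimal sketch follows; the experiment, parameter, and metric names are illustrative, not the project's actual values:

```python
import os

import mlflow

# Point the client at the DagsHub tracking server configured above.
mlflow.set_tracking_uri(os.environ["MLFLOW_TRACKING_URI"])
mlflow.set_experiment("dataflow-deployment")  # illustrative name

with mlflow.start_run():
    # The real pipeline logs its tuned hyperparameters and
    # evaluation metrics; these values are placeholders.
    mlflow.log_param("n_estimators", 100)
    mlflow.log_metric("f1_score", 0.97)
```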
Build the Docker image:

```bash
docker build -t dataflow-deployment .
```

Run the container:

```bash
docker run -p 8080:8080 dataflow-deployment
```

The deployment process is automated via GitHub Actions (`.github/workflows/main.yml`).
Once you push changes to the `main` branch, the pipeline:
- Runs tests and linting
- Builds and pushes the Docker image
- Deploys the application to AWS
The GitHub Actions workflow automates:
- Linting and static code checks
- Unit testing
- Model training and evaluation
- Docker image creation
- AWS deployment
- Add Airflow/Prefect for pipeline orchestration
- Implement data drift detection (see the sketch after this list)
- Integrate model monitoring (Prometheus + Grafana)
- Automate retraining with live data
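For the drift-detection item, one common starting point is a per-feature two-sample Kolmogorov-Smirnov test. The sketch below shows that approach; it is not part of the current codebase:

```python
import pandas as pd
from scipy.stats import ks_2samp


def detect_drift(reference: pd.DataFrame, current: pd.DataFrame,
                 threshold: float = 0.05) -> dict:
    """Flag numeric columns whose distribution shifted between datasets."""
    report = {}
    for col in reference.select_dtypes("number").columns:
        if col in current.columns:
            p_value = ks_2samp(reference[col], current[col]).pvalue
            report[col] = {"p_value": float(p_value),
                           "drift_detected": p_value < threshold}
    return report
```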
Author: Immanuel