Machine learning system for detecting fraudulent healthcare claims using Random Forest, XGBoost, and neural network classifiers combined in an ensemble.
```mermaid
flowchart TD
    A[Raw Claims Data] --> B[Data Ingestion]
    B --> C[Data Cleaning]
    C --> D[Feature Engineering]
    D --> E[Feature Selection]
    E --> F[Model Training]
    F --> G[Model Evaluation]
    G --> H[Model Deployment]
    H --> I[Prediction API]
    I --> J[Dashboard]

    C --> C1[Handle Missing Values]
    C --> C2[Remove Duplicates]
    C --> C3[Data Validation]

    D --> D1[Claim Amount Features]
    D --> D2[Provider Features]
    D --> D3[Patient Features]
    D --> D4[Temporal Features]

    E --> E1[Correlation Analysis]
    E --> E2[Feature Importance]
    E --> E3[Dimensionality Reduction]

    F --> F1[Random Forest]
    F --> F2[XGBoost]
    F --> F3[Neural Network]
    F --> F4[Ensemble]
```
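
The Data Cleaning branch above (missing values, duplicates, validation) can be sketched in plain Python. This is a minimal illustration, not the code in `src/data/preprocessor.py`: the `claim_id` key, the non-negative amount rule, median imputation for `patient_age`, and the `"UNKNOWN"` specialty placeholder are all assumptions.

```python
from statistics import median


def clean_claims(rows):
    """Validate, de-duplicate, and impute a list of claim dicts (hypothetical schema)."""
    seen_ids = set()
    valid = []
    for row in rows:
        amount = row.get("claim_amount")
        # C3 Data Validation (assumed rule): require a claim_id and a non-negative number.
        if not row.get("claim_id") or not isinstance(amount, (int, float)) or amount < 0:
            continue
        # C2 Remove Duplicates: keep only the first occurrence of each claim_id.
        if row["claim_id"] in seen_ids:
            continue
        seen_ids.add(row["claim_id"])
        valid.append(dict(row))  # shallow copy so the caller's rows are not mutated

    # C1 Handle Missing Values (assumed strategy): median age, placeholder specialty.
    ages = [r["patient_age"] for r in valid if r.get("patient_age") is not None]
    age_fill = median(ages) if ages else None
    for r in valid:
        if r.get("patient_age") is None:
            r["patient_age"] = age_fill
        if not r.get("provider_specialty"):
            r["provider_specialty"] = "UNKNOWN"
    return valid


# Illustrative rows only: C1 is duplicated, C3 has a negative amount.
claims = [
    {"claim_id": "C1", "claim_amount": 5000, "patient_age": 45, "provider_specialty": "Internal Medicine"},
    {"claim_id": "C1", "claim_amount": 5000, "patient_age": 45, "provider_specialty": "Internal Medicine"},
    {"claim_id": "C2", "claim_amount": 120.0, "patient_age": None, "provider_specialty": ""},
    {"claim_id": "C3", "claim_amount": -10, "patient_age": 30},
    {"claim_id": "C4", "claim_amount": 300, "patient_age": 61, "provider_specialty": "Cardiology"},
]
cleaned = clean_claims(claims)
print([(r["claim_id"], r["patient_age"], r["provider_specialty"]) for r in cleaned])
# [('C1', 45, 'Internal Medicine'), ('C2', 53.0, 'UNKNOWN'), ('C4', 61, 'Cardiology')]
```

Validation runs before de-duplication here so that a malformed first copy of a claim cannot shadow a later valid one; the real pipeline may order these steps differently.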
```mermaid
graph TB
    subgraph "Data Sources"
        Claims[Claims Database]
        Providers[Provider Data]
        Patients[Patient Data]
    end

    subgraph "Processing Layer"
        ETL[ETL Pipeline]
        FeatureStore[Feature Store]
    end

    subgraph "ML Layer"
        Training[Model Training]
        Inference[Model Inference]
        Registry[Model Registry]
    end

    subgraph "Application Layer"
        API[REST API]
        Dashboard[Analytics Dashboard]
        Alerts[Alert System]
    end

    Claims --> ETL
    Providers --> ETL
    Patients --> ETL
    ETL --> FeatureStore
    FeatureStore --> Training
    Training --> Registry
    Registry --> Inference
    Inference --> API
    API --> Dashboard
    API --> Alerts
```
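
In this architecture the ETL pipeline merges the three data sources into one row per claim before features reach the feature store. A minimal sketch, assuming the join keys `provider_id` and `patient_id` and the provider/patient field names shown (none of these are confirmed by the repository). It uses left-join semantics, so a claim with no matching provider or patient is kept with `None` values for later imputation rather than silently dropped.

```python
def build_claim_rows(claims, providers, patients):
    """Left-join provider and patient attributes onto each claim (hypothetical keys)."""
    provider_by_id = {p["provider_id"]: p for p in providers}
    patient_by_id = {p["patient_id"]: p for p in patients}
    rows = []
    for claim in claims:
        # Missing lookups fall back to {} so unmatched claims are kept, not dropped.
        provider = provider_by_id.get(claim.get("provider_id"), {})
        patient = patient_by_id.get(claim.get("patient_id"), {})
        rows.append({
            "claim_amount": claim.get("claim_amount"),
            "procedure_code": claim.get("procedure_code"),
            "diagnosis_code": claim.get("diagnosis_code"),
            "provider_specialty": provider.get("specialty"),
            "provider_location": provider.get("location"),
            "patient_age": patient.get("age"),
            "patient_gender": patient.get("gender"),
        })
    return rows


# Illustrative records only; PR9 has no provider entry.
providers = [{"provider_id": "PR1", "specialty": "Internal Medicine", "location": "TX"}]
patients = [{"patient_id": "PA1", "age": 45, "gender": "M"}]
claims = [
    {"provider_id": "PR1", "patient_id": "PA1", "claim_amount": 5000,
     "procedure_code": "99213", "diagnosis_code": "J06.9"},
    {"provider_id": "PR9", "patient_id": "PA1", "claim_amount": 80,
     "procedure_code": "99212", "diagnosis_code": "J02.9"},
]
rows = build_claim_rows(claims, providers, patients)
print(rows[1])
```

Building dictionaries keyed by ID keeps each lookup constant-time, which matters once the claims table is much larger than the provider and patient tables.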
```mermaid
flowchart TD
    A[New Claim] --> B[Preprocessing]
    B --> C[Feature Extraction]
    C --> D{Model Ensemble}

    D --> E[Random Forest]
    D --> F[XGBoost]
    D --> G[Neural Network]

    E --> H[Vote Aggregation]
    F --> H
    G --> H

    H --> I{Fraud Probability}
    I -->|High > 0.8| J[Block Claim]
    I -->|Medium 0.5-0.8| K[Manual Review]
    I -->|Low < 0.5| L[Approve Claim]

    J --> M[Alert Investigation]
    K --> N[Review Queue]
    L --> O[Process Payment]

    M --> P[Update Database]
    N --> P
    O --> P
```
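
The vote-aggregation and thresholding steps of this decision flow can be sketched as follows. The diagram does not say how votes are combined, so this assumes unweighted soft voting (averaging each model's fraud probability); weighted or hard (majority) voting would be equally valid choices. The boundary handling at exactly 0.8 and 0.5 is also an assumption.

```python
from statistics import fmean


def soft_vote(probabilities):
    """Average the fraud probabilities from each model (unweighted soft voting)."""
    return fmean(probabilities)


def route_claim(fraud_probability):
    """Map an aggregated fraud probability to an action in the decision flow."""
    if fraud_probability > 0.8:
        return "block"          # High: block and alert investigation
    if fraud_probability >= 0.5:
        return "manual_review"  # Medium: 0.5 to 0.8 inclusive (assumed boundaries)
    return "approve"            # Low: below 0.5, process payment


# Hypothetical outputs from Random Forest, XGBoost, and the neural network.
p = soft_vote([0.91, 0.87, 0.78])
print(round(p, 3), route_claim(p))
```

Averaging probabilities lets one confident model pull a borderline claim across a threshold, which is why soft voting is a common default for ensembles of calibrated classifiers.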
```
ML-Project/
│
├── data/
│   ├── raw/
│   │   ├── claims.csv                    # Raw claims data
│   │   ├── providers.csv                 # Provider information
│   │   └── patients.csv                  # Patient data
│   ├── processed/
│   │   ├── features.csv                  # Engineered features
│   │   └── cleaned.csv                   # Cleaned data
│   └── data_dictionary.md
│
├── notebooks/
│   ├── 01_EDA.ipynb                      # Exploratory Data Analysis
│   ├── 02_Feature_Engineering.ipynb      # Feature engineering
│   ├── 03_Model_Training.ipynb           # Model training
│   ├── 04_Evaluation.ipynb               # Model evaluation
│   └── 05_Deployment.ipynb               # Deployment prep
│
├── src/
│   ├── __init__.py
│   ├── data/
│   │   ├── __init__.py
│   │   ├── data_loader.py                # Data loading
│   │   ├── preprocessor.py               # Data preprocessing
│   │   └── feature_engine.py             # Feature engineering
│   │
│   ├── models/
│   │   ├── __init__.py
│   │   ├── random_forest.py              # Random Forest model
│   │   ├── xgboost_model.py              # XGBoost model
│   │   ├── neural_network.py             # Neural network model
│   │   ├── ensemble.py                   # Ensemble model
│   │   ├── trainer.py                    # Training pipeline
│   │   └── saved/
│   │       ├── best_model.pkl
│   │       └── scaler.pkl
│   │
│   ├── evaluation/
│   │   ├── __init__.py
│   │   ├── metrics.py                    # Evaluation metrics
│   │   └── visualizer.py                 # Visualization
│   │
│   ├── api/
│   │   ├── __init__.py
│   │   ├── app.py                        # Flask API
│   │   ├── routes.py                     # API routes
│   │   └── schemas.py                    # Data schemas
│   │
│   └── utils/
│       ├── __init__.py
│       ├── logger.py                     # Logging
│       └── helpers.py                    # Utility functions
│
├── models/
│   ├── random_forest/
│   ├── xgboost/
│   ├── neural_network/
│   └── ensemble/
│
├── api/
│   ├── app.py
│   └── requirements.txt
│
├── tests/
│   ├── test_data.py
│   ├── test_models.py
│   └── test_api.py
│
├── configs/
│   ├── model_config.yaml
│   └── training_config.yaml
│
├── docs/
│   ├── METHODOLOGY.md
│   ├── FEATURES.md
│   └── API.md
│
├── requirements.txt
├── setup.py
├── Dockerfile
└── README.md
```
| Category | Feature | Description |
|---|---|---|
| Claim | claim_amount | Total claim amount |
| Claim | procedure_code | Medical procedure code |
| Claim | diagnosis_code | Diagnosis code |
| Provider | provider_specialty | Provider specialty |
| Provider | provider_location | Geographic location |
| Patient | patient_age | Patient age |
| Patient | patient_gender | Patient gender |
| Temporal | claim_date | Date of claim |
| Temporal | days_since_last | Days since last claim |
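
Unlike the other features, `days_since_last` is relative and has to be derived from `claim_date`. A minimal sketch, assuming it is computed per patient (it could equally be per provider) with a hypothetical `patient_id` key; a patient's first claim has no predecessor, so it is given `None` here rather than an arbitrary sentinel.

```python
from datetime import date


def add_days_since_last(claims):
    """Return claims sorted by patient and date, each with a days_since_last field."""
    last_date = {}
    result = []
    for claim in sorted(claims, key=lambda c: (c["patient_id"], c["claim_date"])):
        previous = last_date.get(claim["patient_id"])
        # First claim for a patient has no previous date, so the gap is undefined.
        days = (claim["claim_date"] - previous).days if previous is not None else None
        result.append({**claim, "days_since_last": days})
        last_date[claim["patient_id"]] = claim["claim_date"]
    return result


# Illustrative claims only, deliberately out of date order for P1.
claims = [
    {"patient_id": "P1", "claim_date": date(2024, 1, 10)},
    {"patient_id": "P1", "claim_date": date(2024, 1, 1)},
    {"patient_id": "P2", "claim_date": date(2024, 2, 1)},
]
for row in add_days_since_last(claims):
    print(row["patient_id"], row["claim_date"], row["days_since_last"])
```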

| Model | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|
| Random Forest | 94.2% | 92.5% | 95.1% | 93.8% |
| XGBoost | 95.8% | 94.2% | 96.5% | 95.3% |
| Neural Network | 93.5% | 91.8% | 94.8% | 93.3% |
| Ensemble | 96.2% | 95.1% | 97.0% | 96.0% |
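
The metrics in the table follow the standard definitions from a binary confusion matrix, with fraud as the positive class. Because fraudulent claims are usually a small minority, precision and recall are generally more informative than accuracy for this task. The sketch below uses made-up counts for illustration (they are not this project's results) and also re-derives the Ensemble F1 from its reported precision and recall as a consistency check.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from binary confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Illustrative counts only: 90 frauds caught, 10 false alarms,
# 5 frauds missed, 895 legitimate claims approved.
print(classification_metrics(tp=90, fp=10, fn=5, tn=895))

# Ensemble row: precision 95.1%, recall 97.0% -> F1 of about 96.0%.
print(round(f1_from(0.951, 0.970), 3))
```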
```bash
# Clone the repository
git clone https://github.com/Jashwanth33/ML-Project.git
cd ML-Project

# Install dependencies
pip install -r requirements.txt

# Train the models
python src/models/trainer.py

# Start the prediction API
python src/api/app.py
```
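```python
import requests

# Send a single claim to the locally running prediction API.
response = requests.post(
    "http://localhost:5000/predict",
    json={
        "claim_amount": 5000,
        "procedure_code": "99213",
        "diagnosis_code": "J06.9",
        "provider_specialty": "Internal Medicine",
        "patient_age": 45,
        "patient_gender": "M",
    },
    timeout=10,
)
response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body
print(response.json())
```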
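
`src/api/schemas.py` is described only as "Data schemas". As a minimal stdlib sketch of the kind of check a `/predict` handler might run before scoring, the hypothetical `validate_claim_payload` below checks the request body fields used in the example request. The field types follow that example; the non-negative amount and 0 to 120 age range are assumptions, not documented API rules.

```python
EXPECTED_TYPES = {
    "claim_amount": (int, float),
    "procedure_code": str,
    "diagnosis_code": str,
    "provider_specialty": str,
    "patient_age": int,
    "patient_gender": str,
}


def validate_claim_payload(payload):
    """Return a list of error messages; an empty list means the payload looks valid."""
    errors = []
    for field, expected in EXPECTED_TYPES.items():
        value = payload.get(field)
        if field not in payload:
            errors.append(f"missing field: {field}")
        # bool is a subclass of int in Python, so reject it explicitly.
        elif isinstance(value, bool) or not isinstance(value, expected):
            errors.append(f"wrong type for field: {field}")
    if errors:
        return errors
    # Range checks below are illustrative assumptions.
    if payload["claim_amount"] < 0:
        errors.append("claim_amount must be non-negative")
    if not 0 <= payload["patient_age"] <= 120:
        errors.append("patient_age must be between 0 and 120")
    return errors


sample = {
    "claim_amount": 5000,
    "procedure_code": "99213",
    "diagnosis_code": "J06.9",
    "provider_specialty": "Internal Medicine",
    "patient_age": 45,
    "patient_gender": "M",
}
print(validate_claim_payload(sample))  # []
print(validate_claim_payload({**sample, "patient_age": "45"}))
```

Returning every error at once, rather than failing on the first, lets the API report all problems in a single 400 response.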
- Fork the repository
- Create your feature branch
- Commit your changes
- Push to the branch
- Create a Pull Request
MIT License
Jashwanth - GitHub