Fraud Detection in Healthcare

Machine learning system for detecting fraudulent healthcare claims using classification models: Random Forest, XGBoost, a neural network, and an ensemble of the three.

ML Pipeline

```mermaid
flowchart TD
    A[Raw Claims Data] --> B[Data Ingestion]
    B --> C[Data Cleaning]
    C --> D[Feature Engineering]
    D --> E[Feature Selection]
    E --> F[Model Training]
    F --> G[Model Evaluation]
    G --> H[Model Deployment]
    H --> I[Prediction API]
    I --> J[Dashboard]

    C --> C1[Handle Missing Values]
    C --> C2[Remove Duplicates]
    C --> C3[Data Validation]

    D --> D1[Claim Amount Features]
    D --> D2[Provider Features]
    D --> D3[Patient Features]
    D --> D4[Temporal Features]

    E --> E1[Correlation Analysis]
    E --> E2[Feature Importance]
    E --> E3[Dimensionality Reduction]

    F --> F1[Random Forest]
    F --> F2[XGBoost]
    F --> F3[Neural Network]
    F --> F4[Ensemble]
```
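The three cleaning steps in the diagram (handle missing values, remove duplicates, validate) can be sketched in plain Python. This is a minimal illustration, not the code in `src/data/preprocessor.py`: it assumes claims arrive as a list of dicts keyed by names from the "Features Used" table, imputes numeric gaps with the column median, and uses made-up validation bounds.

```python
from statistics import median
from typing import Dict, List

# Hypothetical: numeric fields from the "Features Used" table and
# illustrative validation bounds, not values taken from the project.
NUMERIC_FIELDS = ("claim_amount", "patient_age")
MAX_AGE = 120


def clean_claims(rows: List[Dict]) -> List[Dict]:
    """Remove duplicates, impute missing numeric values, and validate claims."""
    # Remove exact duplicate records, keeping the first occurrence.
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)

    # Handle missing values: impute each numeric field with its median.
    medians = {}
    for name in NUMERIC_FIELDS:
        present = [r[name] for r in unique if r.get(name) is not None]
        medians[name] = median(present) if present else 0
    filled = [
        {**r, **{n: medians[n] for n in NUMERIC_FIELDS if r.get(n) is None}}
        for r in unique
    ]

    # Data validation: drop records with impossible values.
    return [
        r for r in filled
        if r["claim_amount"] >= 0 and 0 <= r["patient_age"] <= MAX_AGE
    ]
```

Deduplicating before imputing keeps repeated rows from skewing the medians. For larger datasets the same steps map onto pandas `drop_duplicates` and `fillna`.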

System Architecture

```mermaid
graph TB
    subgraph "Data Sources"
        Claims[Claims Database]
        Providers[Provider Data]
        Patients[Patient Data]
    end

    subgraph "Processing Layer"
        ETL[ETL Pipeline]
        FeatureStore[Feature Store]
    end

    subgraph "ML Layer"
        Training[Model Training]
        Inference[Model Inference]
        Registry[Model Registry]
    end

    subgraph "Application Layer"
        API[REST API]
        Dashboard[Analytics Dashboard]
        Alerts[Alert System]
    end

    Claims --> ETL
    Providers --> ETL
    Patients --> ETL
    ETL --> FeatureStore
    FeatureStore --> Training
    Training --> Registry
    Registry --> Inference
    Inference --> API
    API --> Dashboard
    API --> Alerts
```
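In this architecture, training never hands a model directly to inference; every model passes through the Model Registry. A minimal in-memory sketch of that contract follows. The `ModelRegistry` class and its methods are hypothetical; the project does not specify which registry it uses.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ModelRegistry:
    """Hypothetical in-memory stand-in for the Model Registry component."""
    _versions: Dict[str, List[Any]] = field(default_factory=dict)

    def register(self, name: str, model: Any) -> int:
        """Called by the training job; returns the new 1-based version number."""
        self._versions.setdefault(name, []).append(model)
        return len(self._versions[name])

    def load(self, name: str, version: Optional[int] = None) -> Any:
        """Called by the inference service; defaults to the latest version."""
        versions = self._versions[name]
        if version is None:
            return versions[-1]
        if not 1 <= version <= len(versions):
            raise KeyError(f"{name} has no version {version}")
        return versions[version - 1]
```

Keeping versions addressable lets inference pin a known-good model while a newly trained one is evaluated.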

Fraud Detection Flow

```mermaid
flowchart TD
    A[New Claim] --> B[Preprocessing]
    B --> C[Feature Extraction]
    C --> D{Model Ensemble}

    D --> E[Random Forest]
    D --> F[XGBoost]
    D --> G[Neural Network]

    E --> H[Vote Aggregation]
    F --> H
    G --> H

    H --> I{Fraud Probability}
    I -->|High > 0.8| J[Block Claim]
    I -->|Medium 0.5-0.8| K[Manual Review]
    I -->|Low < 0.5| L[Approve Claim]

    J --> M[Alert Investigation]
    K --> N[Review Queue]
    L --> O[Process Payment]

    M --> P[Update Database]
    N --> P
    O --> P
```
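Assuming "Vote Aggregation" means soft voting (averaging the three models' fraud probabilities), the routing step reduces to two threshold comparisons. The diagram does not say which band the boundary values 0.5 and 0.8 belong to; this sketch places both in "Manual Review". The function names are hypothetical.

```python
from statistics import mean
from typing import Sequence

# Thresholds from the flow diagram; putting exactly 0.5 and 0.8 in the
# "Medium" band is an assumption, since the diagram leaves the edges open.
HIGH_THRESHOLD = 0.8
LOW_THRESHOLD = 0.5


def aggregate_votes(probabilities: Sequence[float]) -> float:
    """Soft voting: average the fraud probabilities from each model."""
    if not probabilities:
        raise ValueError("need at least one model probability")
    return mean(probabilities)


def route_claim(fraud_probability: float) -> str:
    """Map an aggregated fraud probability to the action in the diagram."""
    if fraud_probability > HIGH_THRESHOLD:
        return "block"          # leads to Alert Investigation
    if fraud_probability >= LOW_THRESHOLD:
        return "manual_review"  # leads to Review Queue
    return "approve"            # leads to Process Payment


# Random Forest, XGBoost and Neural Network outputs for one claim.
print(route_claim(aggregate_votes([0.9, 0.85, 0.7])))  # block
```

Hard majority voting over per-model labels is the other common reading of "Vote Aggregation"; it would need a per-model cutoff before counting votes.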

Project Structure

```
ML-Project/
│
├── data/
│   ├── raw/
│   │   ├── claims.csv                   # Raw claims data
│   │   ├── providers.csv                # Provider information
│   │   └── patients.csv                 # Patient data
│   ├── processed/
│   │   ├── features.csv                 # Engineered features
│   │   └── cleaned.csv                  # Cleaned data
│   └── data_dictionary.md
│
├── notebooks/
│   ├── 01_EDA.ipynb                     # Exploratory Data Analysis
│   ├── 02_Feature_Engineering.ipynb     # Feature engineering
│   ├── 03_Model_Training.ipynb          # Model training
│   ├── 04_Evaluation.ipynb              # Model evaluation
│   └── 05_Deployment.ipynb              # Deployment prep
│
├── src/
│   ├── __init__.py
│   ├── data/
│   │   ├── __init__.py
│   │   ├── data_loader.py               # Data loading
│   │   ├── preprocessor.py              # Data preprocessing
│   │   └── feature_engine.py            # Feature engineering
│   │
│   ├── models/
│   │   ├── __init__.py
│   │   ├── random_forest.py             # Random Forest model
│   │   ├── xgboost_model.py             # XGBoost model
│   │   ├── neural_network.py            # Neural network model
│   │   ├── ensemble.py                  # Ensemble model
│   │   ├── trainer.py                   # Training pipeline
│   │   └── saved/
│   │       ├── best_model.pkl
│   │       └── scaler.pkl
│   │
│   ├── evaluation/
│   │   ├── __init__.py
│   │   ├── metrics.py                   # Evaluation metrics
│   │   └── visualizer.py                # Visualization
│   │
│   ├── api/
│   │   ├── __init__.py
│   │   ├── app.py                       # Flask API
│   │   ├── routes.py                    # API routes
│   │   └── schemas.py                   # Data schemas
│   │
│   └── utils/
│       ├── __init__.py
│       ├── logger.py                    # Logging
│       └── helpers.py                   # Utility functions
│
├── models/
│   ├── random_forest/
│   ├── xgboost/
│   ├── neural_network/
│   └── ensemble/
│
├── api/
│   ├── app.py
│   └── requirements.txt
│
├── tests/
│   ├── test_data.py
│   ├── test_models.py
│   └── test_api.py
│
├── configs/
│   ├── model_config.yaml
│   └── training_config.yaml
│
├── docs/
│   ├── METHODOLOGY.md
│   ├── FEATURES.md
│   └── API.md
│
├── requirements.txt
├── setup.py
├── Dockerfile
└── README.md
```

Features Used

| Category | Feature | Description |
|----------|---------|-------------|
| Claim | claim_amount | Total claim amount |
| Claim | procedure_code | Medical procedure code |
| Claim | diagnosis_code | Diagnosis code |
| Provider | provider_specialty | Provider specialty |
| Provider | provider_location | Geographic location |
| Patient | patient_age | Patient age |
| Patient | patient_gender | Patient gender |
| Temporal | claim_date | Date of claim |
| Temporal | days_since_last | Days since last claim |
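Of the features above, `days_since_last` is the one that depends on claim history rather than on a single record. A minimal sketch, assuming each claim also carries a hypothetical `patient_id` (not listed in the table) and a `claim_date` already parsed as `datetime.date`:

```python
from datetime import date
from typing import Dict, List, Optional


def add_days_since_last(claims: List[dict]) -> List[dict]:
    """Add days_since_last: days since the same patient's previous claim.

    Assumes a hypothetical `patient_id` key and a `claim_date` of type
    datetime.date; a patient's first claim gets None.
    """
    last_seen: Dict[str, date] = {}
    enriched = []
    # Process in chronological order so "previous" means earlier in time.
    for claim in sorted(claims, key=lambda c: c["claim_date"]):
        previous: Optional[date] = last_seen.get(claim["patient_id"])
        gap = (claim["claim_date"] - previous).days if previous is not None else None
        enriched.append({**claim, "days_since_last": gap})
        last_seen[claim["patient_id"]] = claim["claim_date"]
    return enriched
```

The output is sorted by date, so downstream code should not rely on the input order being preserved.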

Model Performance

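| Model | Accuracy | Precision | Recall | F1-Score |
|-------|----------|-----------|--------|----------|
| Random Forest | 94.2% | 92.5% | 95.1% | 93.8% |
| XGBoost | 95.8% | 94.2% | 96.5% | 95.3% |
| Neural Network | 93.5% | 91.8% | 94.8% | 93.3% |
| Ensemble | 96.2% | 95.1% | 97.0% | 96.0% |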
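F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R), so the F1 column can be cross-checked against the other two. The sketch below does that for every row of the table and also shows how the three metrics follow from confusion-matrix counts for the fraud class. The function names are illustrative, not taken from `src/evaluation/metrics.py`.

```python
from typing import Tuple


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both in the same units, e.g. %)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Compute the metrics from confusion-matrix counts for the fraud class."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_score(precision, recall)


# Cross-check the F1 column against the Precision and Recall columns.
reported = {
    "Random Forest": (92.5, 95.1, 93.8),
    "XGBoost": (94.2, 96.5, 95.3),
    "Neural Network": (91.8, 94.8, 93.3),
    "Ensemble": (95.1, 97.0, 96.0),
}
for model, (p, r, f1) in reported.items():
    assert round(f1_score(p, r), 1) == f1, model
```

All four rows agree to one decimal place.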

Installation

```bash
git clone https://github.com/Jashwanth33/ML-Project.git
cd ML-Project

pip install -r requirements.txt

# Train models
python src/models/trainer.py

# Run API
python src/api/app.py
```

API Usage

```python
import requests

# Predict fraud
response = requests.post(
    "http://localhost:5000/predict",
    json={
        "claim_amount": 5000,
        "procedure_code": "99213",
        "diagnosis_code": "J06.9",
        "provider_specialty": "Internal Medicine",
        "patient_age": 45,
        "patient_gender": "M",
    },
)

print(response.json())
# Example response body (JSON):
# {"fraud_probability": 0.12, "is_fraud": false}
```
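On the server side, the project structure lists `src/api/schemas.py` for data schemas. A stdlib-only sketch of validating the `/predict` payload and shaping the response might look like the following. The `ClaimRequest` class, its sanity bounds, and the 0.5 `is_fraud` cutoff (borrowed from the "Low < 0.5" branch of the flow diagram) are assumptions, not the project's actual code. The fields mirror the example request above, which omits `provider_location`.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict

# Hypothetical cutoff: the flow diagram approves claims below 0.5.
FRAUD_THRESHOLD = 0.5


@dataclass(frozen=True)
class ClaimRequest:
    """Hypothetical schema for the POST /predict payload."""
    claim_amount: float
    procedure_code: str
    diagnosis_code: str
    provider_specialty: str
    patient_age: int
    patient_gender: str

    def __post_init__(self) -> None:
        # Illustrative sanity bounds, not taken from the project.
        if self.claim_amount < 0:
            raise ValueError("claim_amount must be non-negative")
        if not 0 <= self.patient_age <= 120:
            raise ValueError("patient_age must be between 0 and 120")

    @classmethod
    def from_json(cls, payload: Dict[str, Any]) -> "ClaimRequest":
        """Build a request from decoded JSON, raising ValueError on bad input."""
        missing = [f.name for f in fields(cls) if f.name not in payload]
        if missing:
            raise ValueError("missing fields: " + ", ".join(missing))
        return cls(
            claim_amount=float(payload["claim_amount"]),
            procedure_code=str(payload["procedure_code"]),
            diagnosis_code=str(payload["diagnosis_code"]),
            provider_specialty=str(payload["provider_specialty"]),
            patient_age=int(payload["patient_age"]),
            patient_gender=str(payload["patient_gender"]),
        )


def build_response(fraud_probability: float) -> Dict[str, Any]:
    """Shape the model output like the example response body."""
    return {
        "fraud_probability": fraud_probability,
        "is_fraud": fraud_probability >= FRAUD_THRESHOLD,
    }
```

In a Flask route, `ClaimRequest.from_json(request.get_json())` would run first, a `ValueError` would map to a 400 response, and the model's probability would be returned via `jsonify(build_response(p))`.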

Contributing

1. Fork the repository
2. Create your feature branch
3. Commit your changes
4. Push to the branch
5. Create a Pull Request

License

MIT License

Author

Jashwanth (GitHub: Jashwanth33)
