MEDomics PoC3 — Centralized vs Federated Learning on MIMIC-IV and eICU

This repository contains code developed for a **Proof of Concept** of the MEDomics platform, as part of my Master's thesis at Université de Sherbrooke (MEDomicsLab).

The goal of this PoC is to evaluate the generalizability of a mortality prediction model trained on MIMIC-IV using a centralized approach, and to compare its performance against a federated model trained across 9 eICU hospitals using MEDfl.

For more information about the process, see the [PoC3] documentation.

Overview

| Component | Description |
| --- | --- |
| Datasets | MIMIC-IV + eICU (PhysioNet, credentialed access required) |
| Preprocessing | MED3PA protocol + SAPS-II transformation |
| Preprocessing notebook | Generate learning/holdout sets for MIMIC-IV and eICU |
| Centralized model | MLP trained on MIMIC-IV via MEDomics |
| Federated model | MLP trained across 9 eICU hospitals via MEDfl |
| Evaluation | Internal (MIMIC-IV), external global (eICU), per-hospital (9 sites) |

Repository Structure

medomics-poc3/
├── notebooks/
│   └── data_preparation.ipynb    # Generate learning/holdout sets (MIMIC-IV and eICU)
├── evaluate_models.py             # Main evaluation script (centralized vs federated)
├── data/                          # Input CSV files (not versioned — see below)
├── models/                        # Trained model pkl files
├── results/                       # Generated at runtime (metrics + figures)
├── requirements.txt
└── README.md

Data Access

The datasets used in this project require credentialed access via PhysioNet:

MIMIC-IV: https://physionet.org/content/mimiciv/
eICU Collaborative Research Database: https://physionet.org/content/eicu-crd/

The preprocessing pipeline follows the MED3PA framework, which applies SAPS-II feature transformation and hospital-level data splitting for eICU.
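
As a rough illustration of hospital-level splitting, the sketch below groups records by a hospital identifier so that each site ends up with its own dataset. It is not the MED3PA implementation: the helper `split_by_hospital`, the column names `hospitalid` and `deceased`, and the toy records are assumptions made for this example.

```python
from collections import defaultdict


def split_by_hospital(rows, hospital_key="hospitalid"):
    """Group patient records by hospital so each site gets its own dataset."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[hospital_key]].append(row)
    return dict(groups)


# Toy records standing in for preprocessed eICU rows (values are made up,
# and the column names are assumptions, not the real schema).
rows = [
    {"hospitalid": 73, "age": 64, "deceased": 0},
    {"hospitalid": 73, "age": 81, "deceased": 1},
    {"hospitalid": 264, "age": 55, "deceased": 0},
]

per_hospital = split_by_hospital(rows)
for hospital_id, records in sorted(per_hospital.items()):
    print(hospital_id, len(records))
```

Each group could then be written to its own CSV file, one per hospital, before the learning/holdout split is applied.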

Models

The trained model files (model_centralise.pkl, model_fed.pkl) are not included in this repository.

To reproduce them:

Centralized model: train the MLP pipeline in MEDomics using data/learning.csv (generated by the preprocessing notebook)
Federated model: run MEDfl on the 9 eICU hospital learning sets

Place both .pkl files in the working directory before running evaluate_models.py.

Usage

1. Install dependencies

pip install -r requirements.txt

2. Generate learning and holdout sets

Open and run notebooks/data_preparation.ipynb.

This notebook handles both datasets:

MIMIC-IV:

INPUT_PATH      = "data/MIMIC_saps.csv"
OUTPUT_LEARNING = "data/learning.csv"
OUTPUT_HOLDOUT  = "data/holdout.csv"

eICU (run once per hospital file):

INPUT_PATH      = "data/eicu_saps.csv"
OUTPUT_LEARNING = "data/Learning_eicu_hospital_*.csv"
OUTPUT_HOLDOUT  = "data/Holdout_eicu_dataset_hospital_*.csv"
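
The notebook's exact splitting logic is not described here. As a minimal sketch of what a learning/holdout split can look like, the example below performs a seeded split that preserves the outcome ratio in both sets. The function `stratified_split`, the 20% holdout fraction, the seed, and the `deceased` label column are all assumptions for illustration, not values taken from the notebook.

```python
import random


def stratified_split(rows, label_key, holdout_fraction=0.2, seed=42):
    """Split rows into (learning, holdout), preserving the label ratio."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible

    # Bucket rows by label so each class is split separately.
    by_label = {}
    for row in rows:
        by_label.setdefault(row[label_key], []).append(row)

    learning, holdout = [], []
    for label_rows in by_label.values():
        shuffled = label_rows[:]
        rng.shuffle(shuffled)
        n_holdout = round(len(shuffled) * holdout_fraction)
        holdout.extend(shuffled[:n_holdout])
        learning.extend(shuffled[n_holdout:])
    return learning, holdout


# Toy data: 100 patients, 20 of them with a positive outcome.
rows = [{"id": i, "deceased": int(i % 5 == 0)} for i in range(100)]
learning, holdout = stratified_split(rows, label_key="deceased")
print(len(learning), len(holdout))
```

Splitting each class separately keeps the mortality rate comparable between the learning and holdout sets, which matters when the positive class is rare.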

3. Run the evaluation

python evaluate_models.py

Place the following files in the working directory before running:

model_centralise.pkl
model_fed.pkl
Holdout_MIMIC.csv
Holdout_eicu_afterdividing.csv
Holdout_eicu_dataset_hospital_*.csv (9 hospitals)
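
A quick pre-flight check can catch a missing input before the evaluation starts. The sketch below is not part of evaluate_models.py; `check_inputs` is a hypothetical helper that only tests for the file names listed above and counts the per-hospital holdout files.

```python
from pathlib import Path

REQUIRED_FILES = [
    "model_centralise.pkl",
    "model_fed.pkl",
    "Holdout_MIMIC.csv",
    "Holdout_eicu_afterdividing.csv",
]
HOSPITAL_PATTERN = "Holdout_eicu_dataset_hospital_*.csv"
EXPECTED_HOSPITALS = 9


def check_inputs(workdir="."):
    """Return a list of problems; an empty list means all inputs were found."""
    workdir = Path(workdir)
    problems = [
        f"missing file: {name}"
        for name in REQUIRED_FILES
        if not (workdir / name).is_file()
    ]
    hospital_files = list(workdir.glob(HOSPITAL_PATTERN))
    if len(hospital_files) != EXPECTED_HOSPITALS:
        problems.append(
            f"expected {EXPECTED_HOSPITALS} hospital holdout files, "
            f"found {len(hospital_files)}"
        )
    return problems


# Check the current working directory and report anything that is missing.
for problem in check_inputs():
    print(problem)
```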

4. Results

Outputs are saved in results/:

resultats_comparaison.csv — full metrics table
figures/cm_*.png — confusion matrices (centralized and federated)
figures/shap_*_federe.png — SHAP feature importance (federated model)

Evaluation Protocol

Three configurations are evaluated on each dataset:

| Configuration | Model | Threshold |
| --- | --- | --- |
| Centralized | Centralized MLP | 0.2 (fixed) |
| Federated (fixed) | Federated MLP | 0.2 (fixed) |
| Federated (Youden) | Federated MLP | Youden index (optimized) |
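
For reference, the Youden index is J = sensitivity + specificity - 1, and the optimized threshold is the cut-off that maximizes J. The sketch below shows one plain-Python way to compute it. It is not taken from evaluate_models.py: the function name `youden_threshold`, the toy labels and probabilities, and the rule "predict positive when probability >= threshold" are assumptions for this example.

```python
def youden_threshold(y_true, y_prob):
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1."""
    positives = sum(1 for y in y_true if y == 1)
    negatives = len(y_true) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")

    best_threshold, best_j = None, -1.0
    # Every distinct predicted probability is a candidate cut-off.
    for threshold in sorted(set(y_prob)):
        tp = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p >= threshold)
        tn = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p < threshold)
        j = tp / positives + tn / negatives - 1
        if j > best_j:  # ties keep the lowest threshold
            best_threshold, best_j = threshold, j
    return best_threshold, best_j


# Toy labels and predicted probabilities (made up for illustration).
y_true = [0, 0, 0, 1, 0, 1, 1, 1]
y_prob = [0.05, 0.10, 0.15, 0.18, 0.30, 0.35, 0.60, 0.80]
threshold, j = youden_threshold(y_true, y_prob)
print(threshold, j)  # 0.18 0.75
```

In this toy example the optimized cut-off (0.18) falls just below the fixed 0.2 threshold, so a patient scored at 0.18 is flagged by the Youden configuration but not by the fixed one.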

Evaluation datasets:

MIMIC-IV holdout (internal validation)
eICU aggregated holdout (external validation)
9 individual eICU hospital holdouts (per-site validation)

References

Johnson, A. et al. MIMIC-IV. PhysioNet, 2023. https://doi.org/10.13026/6mm1-ek67
Pollard, T.J. et al. The eICU Collaborative Research Database. Scientific Data, 2018. https://doi.org/10.1038/sdata.2018.178
Lefebvre, O. et al. Predictive performance precision analysis in medicine: identification of low-confidence predictions at patient and profile levels (MED3pa I). Journal of the American Medical Informatics Association (JAMIA), 2026. https://doi.org/10.1093/jamia/ocag034
