This repository contains code developed for a **Proof of Concept (PoC)** of the MEDomics platform, as part of my Master's thesis at Université de Sherbrooke (MEDomicsLab).
The goal of this PoC is to evaluate the generalizability of a mortality prediction model trained on MIMIC-IV using a centralized approach, and to compare its performance against a federated model trained across 9 eICU hospitals using MEDfl.
For more information about the process, see the [PoC3] documentation.
| Component | Description |
|---|---|
| Datasets | MIMIC-IV + eICU (PhysioNet, credentialed access required) |
| Preprocessing | MED3PA protocol + SAPS-II transformation |
| Preprocessing notebook | Generate learning/holdout sets for MIMIC-IV and eICU |
| Centralized model | MLP trained on MIMIC-IV via MEDomics |
| Federated model | MLP trained across 9 eICU hospitals via MEDfl |
| Evaluation | Internal (MIMIC-IV), external global (eICU), per-hospital (9 sites) |
medomics-poc3/
├── notebooks/
│ └── data_preparation.ipynb # Generate learning/holdout sets (MIMIC-IV and eICU)
├── evaluate_models.py # Main evaluation script (centralized vs federated)
├── data/ # Input CSV files (not versioned — see below)
├── models/ # Trained model pkl files
├── results/ # Generated at runtime (metrics + figures)
├── requirements.txt
└── README.md
The datasets used in this project require credentialed access via PhysioNet:
- MIMIC-IV: https://physionet.org/content/mimiciv/
- eICU Collaborative Research Database: https://physionet.org/content/eicu-crd/
The preprocessing pipeline follows the MED3PA framework, which applies SAPS-II feature transformation and hospital-level data splitting for eICU.
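
As a rough illustration of hospital-level splitting, the sketch below groups records by hospital and then splits each group into learning and holdout sets. It is not the MED3PA implementation: the function name `split_by_hospital`, the `hospitalid` key, the 20% holdout fraction, and the seed are all assumptions made for this example.

```python
import random
from collections import defaultdict


def split_by_hospital(rows, hospital_key="hospitalid", holdout_fraction=0.2, seed=42):
    """Group rows by hospital, then split each group into learning/holdout.

    `rows` is a list of dicts (for example, rows read with csv.DictReader).
    The key name, fraction, and seed are illustrative assumptions.
    """
    by_hospital = defaultdict(list)
    for row in rows:
        by_hospital[row[hospital_key]].append(row)

    rng = random.Random(seed)  # seeded so the split is reproducible
    splits = {}
    for hospital, group in sorted(by_hospital.items()):
        shuffled = group[:]  # copy so the caller's list is not reordered
        rng.shuffle(shuffled)
        n_holdout = int(round(len(shuffled) * holdout_fraction))
        splits[hospital] = {
            "holdout": shuffled[:n_holdout],
            "learning": shuffled[n_holdout:],
        }
    return splits
```

Splitting inside each hospital keeps every site represented in both sets, which is what per-hospital learning and holdout files require.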
The trained model files (model_centralise.pkl, model_fed.pkl) are not included in this repository.
To reproduce them:
- Centralized model: train the MLP pipeline in MEDomics using data/learning.csv (generated by the preprocessing notebook)
- Federated model: run MEDfl on the 9 eICU hospital learning sets

Place both .pkl files in the working directory before running evaluate_models.py.
pip install -r requirements.txt

Open and run notebooks/data_preparation.ipynb.
This notebook handles both datasets:
MIMIC-IV:
INPUT_PATH = "data/MIMIC_saps.csv"
OUTPUT_LEARNING = "data/learning.csv"
OUTPUT_HOLDOUT = "data/holdout.csv"

eICU (run once per hospital file):
INPUT_PATH = "data/eicu_saps.csv"
OUTPUT_LEARNING = "data/Learning_eicu_hospital_*.csv"
OUTPUT_HOLDOUT = "data/Holdout_eicu_dataset_hospital_*.csv"

python evaluate_models.py

Place the following files in the working directory before running:
- model_centralise.pkl
- model_fed.pkl
- Holdout_MIMIC.csv
- Holdout_eicu_afterdividing.csv
- Holdout_eicu_dataset_hospital_*.csv (9 hospitals)
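
A quick pre-flight check can catch missing inputs before the evaluation starts. The sketch below is not part of evaluate_models.py; `check_inputs` is a hypothetical helper that uses only the file names listed above and the standard library.

```python
from pathlib import Path

# File names taken from the list above; the helper itself is hypothetical.
REQUIRED_FILES = [
    "model_centralise.pkl",
    "model_fed.pkl",
    "Holdout_MIMIC.csv",
    "Holdout_eicu_afterdividing.csv",
]
HOSPITAL_PATTERN = "Holdout_eicu_dataset_hospital_*.csv"
EXPECTED_HOSPITALS = 9


def check_inputs(workdir="."):
    """Return a list of problems; an empty list means all inputs were found."""
    root = Path(workdir)
    problems = [
        f"missing file: {name}"
        for name in REQUIRED_FILES
        if not (root / name).is_file()
    ]
    hospital_files = sorted(root.glob(HOSPITAL_PATTERN))
    if len(hospital_files) != EXPECTED_HOSPITALS:
        problems.append(
            f"expected {EXPECTED_HOSPITALS} hospital holdout files, "
            f"found {len(hospital_files)}"
        )
    return problems


if __name__ == "__main__":
    for problem in check_inputs():
        print(problem)
```
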
Outputs are saved in results/:
- resultats_comparaison.csv: full metrics table
- figures/cm_*.png: confusion matrices (centralized and federated)
- figures/shap_*_federe.png: SHAP feature importance (federated model)
Three configurations are evaluated on each dataset:
| Configuration | Model | Threshold |
|---|---|---|
| Centralized | Centralized MLP | 0.2 (fixed) |
| Federated (fixed) | Federated MLP | 0.2 (fixed) |
| Federated (Youden) | Federated MLP | Youden index (optimized) |
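
For readers unfamiliar with the Youden index, it selects the threshold that maximizes J = sensitivity + specificity - 1. The sketch below is a minimal pure-Python version, not the code used in evaluate_models.py. It assumes a prediction is positive when the score is greater than or equal to the threshold, and `youden_threshold` is a name chosen for this example. The README does not state which dataset the threshold is tuned on.

```python
def youden_threshold(y_true, y_score):
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1.

    y_true holds 0/1 labels and y_score holds predicted probabilities.
    A score >= threshold counts as a positive prediction (assumption).
    """
    positives = sum(1 for y in y_true if y == 1)
    negatives = len(y_true) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes are required to compute the Youden index")

    best_threshold, best_j = None, float("-inf")
    # Only observed scores can change the confusion matrix, so they are
    # the only candidate thresholds worth testing.
    for threshold in sorted(set(y_score)):
        tp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 0)
        j = tp / positives - fp / negatives  # sensitivity - (1 - specificity)
        if j > best_j:
            best_threshold, best_j = threshold, j
    return best_threshold, best_j


labels = [0, 0, 0, 1, 1]
scores = [0.05, 0.1, 0.3, 0.25, 0.6]
threshold, j = youden_threshold(labels, scores)
```

In this toy example the selected threshold is 0.25, where both positives are detected and one of the three negatives is misclassified.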
Evaluation datasets:
- MIMIC-IV holdout (internal validation)
- eICU aggregated holdout (external validation)
- 9 individual eICU hospital holdouts (per-site validation)
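
To make the per-dataset evaluation concrete, the sketch below computes a confusion matrix, sensitivity, and specificity for each dataset at the fixed threshold of 0.2. It is an illustration only: the function names, the structure of `sites`, and the choice of metrics are assumptions, and the metrics actually reported by evaluate_models.py may differ.

```python
def confusion_at_threshold(y_true, y_score, threshold=0.2):
    """Count TP/FP/TN/FN, treating score >= threshold as a positive prediction."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for y, s in zip(y_true, y_score):
        predicted = 1 if s >= threshold else 0
        if predicted == 1:
            counts["tp" if y == 1 else "fp"] += 1
        else:
            counts["fn" if y == 1 else "tn"] += 1
    return counts


def evaluate_sites(sites, threshold=0.2):
    """Return one metrics row per dataset.

    `sites` maps a dataset name to a (y_true, y_score) pair (hypothetical layout).
    """
    rows = []
    for name, (y_true, y_score) in sites.items():
        c = confusion_at_threshold(y_true, y_score, threshold)
        n_pos = c["tp"] + c["fn"]
        n_neg = c["tn"] + c["fp"]
        rows.append({
            "dataset": name,
            **c,
            # NaN marks a metric that is undefined for a one-class dataset.
            "sensitivity": c["tp"] / n_pos if n_pos else float("nan"),
            "specificity": c["tn"] / n_neg if n_neg else float("nan"),
        })
    return rows
```

Small hospital holdouts can contain very few positive cases, so per-site sensitivity should be read with that sample size in mind.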
- Johnson, A. et al. MIMIC-IV. PhysioNet, 2023. https://doi.org/10.13026/6mm1-ek67
- Pollard, T.J. et al. The eICU Collaborative Research Database. Scientific Data, 2018. https://doi.org/10.1038/sdata.2018.178
- Lefebvre, O. et al. Predictive performance precision analysis in medicine: identification of low-confidence predictions at patient and profile levels (MED3pa I). Journal of the American Medical Informatics Association (JAMIA), 2026. https://doi.org/10.1093/jamia/ocag034