MEDomicsLab/MEDomics_PoC3


MEDomics PoC3 — Centralized vs Federated Learning on MIMIC-IV and eICU

This repository contains code developed for a **Proof of Concept** of the MEDomics platform, as part of my Master's thesis at Université de Sherbrooke (MEDomicsLab).

The goal of this PoC is to evaluate the generalizability of a mortality prediction model trained on MIMIC-IV using a centralized approach, and to compare its performance against a federated model trained across 9 eICU hospitals using MEDfl.

For more information about the process, see the [PoC3] documentation.

Overview

| Component | Description |
|---|---|
| Datasets | MIMIC-IV + eICU (PhysioNet, credentialed access required) |
| Preprocessing | MED3PA protocol + SAPS-II transformation |
| Preprocessing notebook | Generate learning/holdout sets for MIMIC-IV and eICU |
| Centralized model | MLP trained on MIMIC-IV via MEDomics |
| Federated model | MLP trained across 9 eICU hospitals via MEDfl |
| Evaluation | Internal (MIMIC-IV), external global (eICU), per-hospital (9 sites) |

Repository Structure

medomics-poc3/
├── notebooks/
│   └── data_preparation.ipynb    # Generate learning/holdout sets (MIMIC-IV and eICU)
├── evaluate_models.py             # Main evaluation script (centralized vs federated)
├── data/                          # Input CSV files (not versioned — see below)
├── models/                        # Trained model pkl files
├── results/                       # Generated at runtime (metrics + figures)
├── requirements.txt
└── README.md

Data Access

Both datasets used in this project, MIMIC-IV and eICU, require credentialed access via PhysioNet.

The preprocessing pipeline follows the MED3PA framework, which applies SAPS-II feature transformation and hospital-level data splitting for eICU.


Models

The trained model files (model_centralise.pkl, model_fed.pkl) are not included in this repository.

To reproduce them:

  1. Centralized model: train the MLP pipeline in MEDomics using data/learning.csv (generated by the preprocessing notebook)
  2. Federated model: run MEDfl on the 9 eICU hospital learning sets

Place both .pkl files in the working directory before running evaluate_models.py.


Usage

1. Install dependencies

pip install -r requirements.txt

2. Generate learning and holdout sets

Open and run notebooks/data_preparation.ipynb.

This notebook handles both datasets:

MIMIC-IV:

INPUT_PATH      = "data/MIMIC_saps.csv"
OUTPUT_LEARNING = "data/learning.csv"
OUTPUT_HOLDOUT  = "data/holdout.csv"

eICU (run once per hospital file):

INPUT_PATH      = "data/eicu_saps.csv"
OUTPUT_LEARNING = "data/Learning_eicu_hospital_*.csv"
OUTPUT_HOLDOUT  = "data/Holdout_eicu_dataset_hospital_*.csv"
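The notebook's own splitting logic is not reproduced in this README. As a rough sketch of what producing a learning/holdout pair can look like, the following does a stratified split on a binary outcome column using only the standard library. The column name `deceased`, the 30% holdout fraction, and the seed are assumptions made for illustration, not values taken from the notebook.

```python
import csv
import random
from collections import defaultdict


def stratified_split(rows, label_key, holdout_frac=0.3, seed=42):
    """Split row dicts into (learning, holdout), preserving the label ratio."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)

    learning, holdout = [], []
    for label in sorted(by_label):
        group = by_label[label][:]
        rng.shuffle(group)  # shuffle within each class before cutting
        n_holdout = round(len(group) * holdout_frac)
        holdout.extend(group[:n_holdout])
        learning.extend(group[n_holdout:])
    return learning, holdout


def split_csv(input_path, learning_path, holdout_path, label_key="deceased"):
    """Read input_path and write the two splits as CSV files."""
    with open(input_path, newline="") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames
        rows = list(reader)
    learning, holdout = stratified_split(rows, label_key)
    for path, subset in ((learning_path, learning), (holdout_path, holdout)):
        with open(path, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(subset)
```

With the paths above, a call would look like `split_csv(INPUT_PATH, OUTPUT_LEARNING, OUTPUT_HOLDOUT)`, repeated once per eICU hospital file.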

3. Run the evaluation

python evaluate_models.py

Place the following files in the working directory before running:

  • model_centralise.pkl
  • model_fed.pkl
  • Holdout_MIMIC.csv
  • Holdout_eicu_afterdividing.csv
  • Holdout_eicu_dataset_hospital_*.csv (9 hospitals)
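Since the evaluation depends on several files being present, a small pre-flight check can catch a missing input before any model is loaded. The sketch below is not part of evaluate_models.py; `check_inputs` is a hypothetical helper that only looks for the file names listed above.

```python
from pathlib import Path

REQUIRED_FILES = [
    "model_centralise.pkl",
    "model_fed.pkl",
    "Holdout_MIMIC.csv",
    "Holdout_eicu_afterdividing.csv",
]
HOSPITAL_PATTERN = "Holdout_eicu_dataset_hospital_*.csv"
EXPECTED_HOSPITALS = 9


def check_inputs(workdir="."):
    """Return a list of problems found; an empty list means all inputs exist."""
    root = Path(workdir)
    problems = [
        f"missing file: {name}"
        for name in REQUIRED_FILES
        if not (root / name).is_file()
    ]
    n_hospitals = len(list(root.glob(HOSPITAL_PATTERN)))
    if n_hospitals != EXPECTED_HOSPITALS:
        problems.append(
            f"expected {EXPECTED_HOSPITALS} hospital holdout files, "
            f"found {n_hospitals}"
        )
    return problems
```

Calling `check_inputs()` from the working directory and printing each returned string gives a quick list of what still needs to be copied in.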

4. Results

Outputs are saved in results/:

  • resultats_comparaison.csv — full metrics table
  • figures/cm_*.png — confusion matrices (centralized and federated)
  • figures/shap_*_federe.png — SHAP feature importance (federated model)

Evaluation Protocol

Three configurations are evaluated on each dataset:

| Configuration | Model | Threshold |
|---|---|---|
| Centralised | Centralized MLP | 0.2 (fixed) |
| Federated (fixed) | Federated MLP | 0.2 (fixed) |
| Federated (Youden) | Federated MLP | Youden index (optimized) |
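The Youden index is J = sensitivity + specificity - 1, and the Youden threshold is the cut-off on the predicted probability that maximizes J. The sketch below shows the idea in plain Python; it is an illustration only, and how evaluate_models.py actually computes the threshold (and on which data it is optimized) is not stated in this README. The function names are hypothetical.

```python
def confusion_counts(y_true, y_prob, threshold):
    """Return (tp, fp, tn, fn), predicting positive when prob >= threshold."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def youden_j(y_true, y_prob, threshold):
    """Youden's J statistic at a given threshold."""
    tp, fp, tn, fn = confusion_counts(y_true, y_prob, threshold)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity + specificity - 1.0


def youden_threshold(y_true, y_prob):
    """Pick the threshold with the largest J among the distinct probabilities.

    Ties are resolved in favour of the lowest threshold.
    """
    candidates = sorted(set(y_prob))
    return max(candidates, key=lambda t: youden_j(y_true, y_prob, t))
```

The fixed configurations correspond to calling `confusion_counts(y_true, y_prob, 0.2)`, while the Youden configuration would first call `youden_threshold(y_true, y_prob)` and use the result as the cut-off.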

Evaluation datasets:

  • MIMIC-IV holdout (internal validation)
  • eICU aggregated holdout (external validation)
  • 9 individual eICU hospital holdouts (per-site validation)

