BERT-based fine-tuning for SLE information extraction from Italian clinical reports.
This repository contains the code used for the study "Lupus Alberto: A Transformer-Based Approach for SLE Information Extraction from Italian Clinical Reports" (Lilli et al., 2024). The fine-tuning experiments build on AlBERTo, the Italian BERT model introduced by Polignano et al. (2019).
- Install requirements
pip install -r requirements.txt
- Configure the experiments
Edit the JSON files in the configs folder before running the scripts:
- configs/experiment_config.json controls the fine-tuning and evaluation loop: domain, target categories, base models, number of epochs, and checkpoint number used for evaluation.
- configs/benchmark_config.json controls the benchmark aggregation: domain, target categories, model names, and metrics to extract from the result files.
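As a rough illustration of the kind of settings listed above, the sketch below parses a JSON configuration and checks that the expected fields are present. The key names (domain, categories, base_models, epochs, eval_checkpoint) and the example values are assumptions made for this sketch, not the repository's actual schema; the JSON files in the configs folder remain the reference.

```python
import json

# Hypothetical key names, chosen to mirror the settings described above.
# The real names are whatever configs/experiment_config.json uses.
REQUIRED_KEYS = {"domain", "categories", "base_models", "epochs", "eval_checkpoint"}


def load_config(text):
    """Parse a JSON config string and check that the assumed keys are present."""
    config = json.loads(text)
    missing = REQUIRED_KEYS - config.keys()
    if missing:
        raise KeyError(f"missing config keys: {sorted(missing)}")
    return config


# Placeholder values for illustration only; they are not taken from the study.
example = json.dumps({
    "domain": "example_domain",
    "categories": ["category_a", "category_b"],
    "base_models": ["example/base-model"],
    "epochs": 3,
    "eval_checkpoint": 100,
})

config = load_config(example)
```

Failing early on a missing key, as in this sketch, tends to be cheaper than discovering the problem partway through a fine-tuning run.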
- Fine-tuning and evaluation
python experiments/run_experiments.py
The run_experiments.py file reads its settings from configs/experiment_config.json and runs the following sequence of scripts: finetune.py, evaluation.py, and evaluation_results.py.
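The sequence described above can be pictured with the following minimal sketch, which runs the three scripts one after another and stops at the first failure. It assumes each script is launched as a separate process from the current directory; whether run_experiments.py actually uses subprocesses or imports the scripts directly is not stated here, and the helper names build_commands and run_pipeline are hypothetical.

```python
import subprocess
import sys

# Script order as listed in this README; the relative paths are assumed.
PIPELINE = ["finetune.py", "evaluation.py", "evaluation_results.py"]


def build_commands(scripts, python=sys.executable):
    """Return one command (argument list) per script, in the given order."""
    return [[python, script] for script in scripts]


def run_pipeline(commands, runner=subprocess.run):
    """Run each command in order; check=True raises on a non-zero exit code."""
    for command in commands:
        runner(command, check=True)
```

Calling run_pipeline(build_commands(PIPELINE)) would launch the scripts in order. The runner parameter exists only so the sequence can be exercised without starting real processes.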
- Benchmark analysis
python benchmark_analysis/compare_results.py
The compare_results.py file reads its settings from configs/benchmark_config.json.
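To give an idea of what aggregating selected metrics across models can look like, here is a small sketch that builds one row per model from already-loaded results. The structure of the results (a dictionary mapping model name to metric scores), the function name extract_metrics, and all values are assumptions for illustration; the actual result file format and the logic in compare_results.py may differ.

```python
def extract_metrics(results, models, metrics):
    """Build one row per model containing only the requested metrics.

    Models or metrics absent from `results` are reported as None.
    """
    rows = []
    for model in models:
        scores = results.get(model, {})
        row = {"model": model}
        for metric in metrics:
            row[metric] = scores.get(metric)
        rows.append(row)
    return rows


# Placeholder scores for illustration only; not results from the study.
dummy_results = {
    "model_a": {"f1": 0.5, "precision": 0.4},
    "model_b": {"f1": 0.25},
}

table = extract_metrics(dummy_results, ["model_a", "model_b"], ["f1", "precision"])
```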
If you use this repository, please cite:
@inproceedings{lilli-etal-2024-lupus,
title = "Lupus Alberto: A Transformer-Based Approach for {SLE} Information Extraction from {I}talian Clinical Reports",
author = "Lilli, Livia and Antenucci, Laura and Ortolan, Augusta and Bosello, Silvia Laura and D'agostino, Maria Antonietta and Patarnello, Stefano and Masciocchi, Carlotta and Lenkowicz, Jacopo",
booktitle = "Proceedings of the Tenth Italian Conference on Computational Linguistics (CLiC-it 2024)",
year = "2024",
address = "Pisa, Italy",
publisher = "CEUR Workshop Proceedings",
pages = "510--516",
url = "https://aclanthology.org/2024.clicit-1.60/"
}
Please also cite the original AlBERTo model:
@inproceedings{polignano-etal-2019-alberto,
title = "{A}l{BER}To: {I}talian {BERT} Language Understanding Model for {NLP} Challenging Tasks Based on Tweets",
author = "Polignano, Marco and Basile, Pierpaolo and de Gemmis, Marco and Semeraro, Giovanni and Basile, Valerio",
booktitle = "Proceedings of the Sixth Italian Conference on Computational Linguistics (CLiC-it 2019)",
year = "2019",
address = "Bari, Italy",
publisher = "CEUR Workshop Proceedings",
pages = "312--317",
url = "https://aclanthology.org/2019.clicit-1.47/"
}