A deep learning framework for automated chest X-ray report generation that implements cognitive simulation, inspired by the Hi-CliTr framework.
Hi-CliTr (Hierarchical Cross-modal Cognitive Transformer) is designed to address "reader fatigue" in radiology by acting as an intelligent "Second Reader". Unlike standard image captioning models, Hi-CliTr simulates the cognitive workflow of a radiologist:
- Perceives anatomical structures at multiple scales (Organ → Region → Pixel).
- Reasons about potential pathologies using a knowledge graph.
- Verifies its findings against the image before generating the final report.
This project implements the core components: PRO-FA (Progressive Feature Alignment), MIX-MLP (Knowledge-Enhanced Classification), and RCTA (Triangular Cognitive Attention).
PRO-FA implements hierarchical visual perception via a Swin Transformer backbone. It aligns multi-scale features with the RadLex medical ontology:
- Organ-level (4×4): Global anatomical awareness.
- Region-level (7×7): Lobe/region specific features.
- Pixel-level (7×7): Fine-grained lesion details.
MIX-MLP is a dual-path, knowledge-enhanced architecture for disease classification:
- Residual Path: Efficient feature flow for common cases.
- Expansion Path: Captures complex disease patterns and co-occurrences.
- CheXpert: High-precision classification for 14 common pathologies.
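The dual-path idea can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the project's `classifier.py`: the layer sizes, the 4× expansion factor, the way the two paths are fused (simple addition), and the names `init_params` and `dual_path_forward` are all hypothetical. Only the 14-label output comes from the text above.

```python
import math
import random
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def linear(x: Vector, weight: Matrix) -> Vector:
    """Compute weight @ x, with the weight matrix stored row by row."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


def sigmoid(x: Vector) -> Vector:
    return [1.0 / (1.0 + math.exp(-v)) for v in x]


def init_params(dim: int, expansion: int, num_labels: int, seed: int = 0) -> Dict[str, Matrix]:
    """Create small random weights (placeholders for trained parameters)."""
    rng = random.Random(seed)

    def matrix(rows: int, cols: int) -> Matrix:
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "residual": matrix(dim, dim),
        "expand": matrix(dim * expansion, dim),
        "project": matrix(dim, dim * expansion),
        "head": matrix(num_labels, dim),
    }


def dual_path_forward(x: Vector, params: Dict[str, Matrix]) -> Vector:
    # Residual path: a same-width transform added back onto the input.
    residual = [a + b for a, b in zip(x, relu(linear(x, params["residual"])))]
    # Expansion path: widen, apply a nonlinearity, project back down.
    expanded = linear(relu(linear(x, params["expand"])), params["project"])
    # Assumed fusion: element-wise sum of both paths.
    fused = [r + e for r, e in zip(residual, expanded)]
    # One independent sigmoid per label (multi-label classification).
    return sigmoid(linear(fused, params["head"]))


features = [0.5, -1.0, 0.25, 2.0, 0.0, -0.5, 1.5, 1.0]  # placeholder input
params = init_params(dim=8, expansion=4, num_labels=14)
probabilities = dual_path_forward(features, params)
predicted = [i for i, p in enumerate(probabilities) if p >= 0.5]
print(len(probabilities))  # 14
```

Using a sigmoid per label rather than a softmax lets several pathologies be predicted for the same image, which matches the co-occurrence goal of the expansion path.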
RCTA is a three-stage closed-loop verification system that mimics clinical reasoning:
- Image → Text: Creates context from visual features and clinical indication.
- Context → Labels: Formulates a diagnostic hypothesis.
- Labels → Image: Verifies the hypothesis against visual evidence.
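The three stages above can be read as three chained attention steps. The sketch below uses single-query scaled dot-product attention in plain Python. Which side acts as the query at each stage, the shared embedding size, and the names `attend` and `triangular_attention` are assumptions made for illustration; the project's `decoder.py` may wire the loop differently.

```python
import math
from typing import List, Tuple

Vector = List[float]


def attend(query: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Single-query scaled dot-product attention over a list of tokens."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    peak = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


def triangular_attention(
    image_tokens: List[Vector],
    text_query: Vector,
    label_embeddings: List[Vector],
) -> Tuple[Vector, Vector, Vector]:
    # Stage 1 (Image -> Text): the indication queries the image to build context.
    context = attend(text_query, image_tokens, image_tokens)
    # Stage 2 (Context -> Labels): the context queries label embeddings
    # to form a diagnostic hypothesis.
    hypothesis = attend(context, label_embeddings, label_embeddings)
    # Stage 3 (Labels -> Image): the hypothesis is checked against the image.
    verified = attend(hypothesis, image_tokens, image_tokens)
    return context, hypothesis, verified


# Placeholder embeddings, all of dimension 4.
image_tokens = [[1.0, 0.0, 0.0, 0.5], [0.0, 1.0, 0.0, 0.5], [0.0, 0.0, 1.0, 0.5]]
text_query = [0.2, 0.9, 0.1, 0.0]
label_embeddings = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.5, 0.5, 0.0]]

context, hypothesis, verified = triangular_attention(
    image_tokens, text_query, label_embeddings
)
print(len(context), len(hypothesis), len(verified))  # 4 4 4
```

Because the last stage returns to the image tokens, the output fed to the report generator is grounded in visual evidence rather than in the label hypothesis alone.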
The report generator uses a GPT-2 Medium backbone (355M parameters) and aims to produce structured, clinically accurate reports (Findings and Impression), conditioned on the cognitive states from RCTA.
COGNITIVE RADIOLOGY MODEL

  CXR Images (PA/LAT)               Clinical Indication Text
          |                                    |
          v                                    |
  +------------------------+                   |
  | PRO-FA                 |                   |
  |   Organ  Region  Pixel |                   |
  |    4x4    7x7     7x7  |                   |
  |   RadLex Align         |                   |
  +------------------------+                   |
          |                                    |
          v                                    v
  +------------------------+        +------------------------+
  | MIX-MLP                |        | RCTA                   |
  |   Residual Path        |        |   Image→Text           |
  |   Expansion Path       |------->|   Text→Labels          |
  |   14 CheXpert Labels   |        |   Labels→Image         |
  |   (Multi-label F1)     |        +------------------------+
  +------------------------+                   |
                                               v
                                    +------------------------+
                                    | Report Generator       |
                                    | (GPT-2)                |
                                    +------------------------+
                                               |
                                               v
                                    +------------------------+
                                    | Generated Report       |
                                    |   FINDINGS:            |
                                    |     Heart size normal  |
                                    |     Lungs are clear    |
                                    |   IMPRESSION:          |
                                    |     No acute           |
                                    |     abnormality        |
                                    +------------------------+
BrainDead-Solution/
├── data/                        # Data management
│   ├── __init__.py
│   ├── download_iu_xray.py      # Scripts for downloading & preprocessing IU-Xray
│   ├── dataset.py               # PyTorch Dataset definitions (MIMIC-CXR, IU-Xray)
│   └── sanity_check.py          # Data integrity verification script
├── models/                      # Core model components
│   ├── __init__.py
│   ├── encoder.py               # PRO-FA: Multi-scale ViT + RadLex Alignment
│   ├── classifier.py            # MIX-MLP: Knowledge-enhanced classifier
│   ├── decoder.py               # RCTA + GPT-2 Decoder
│   └── model.py                 # Unified Hi-CliTr Model assembly
├── training/                    # training logic
│   ├── __init__.py
│   └── trainer.py               # Training loop, validation, and saving
├── evaluation/                  # Metrics and evaluation
│   ├── __init__.py
│   └── metrics.py               # CheXpert F1, BLEU, CIDEr, RadGraph F1
├── notebooks/
│   └── inference_demo.ipynb     # Interactive Jupyter notebook for demo
├── static/                      # Web app static assets (CSS/JS)
├── templates/                   # Web app HTML templates
├── app.py                       # Flask Web Application entry point
├── config.py                    # Centralized configuration file
├── requirements.txt             # Python dependencies
├── problem_statement.md         # Original hackathon problem statement
└── README.md                    # Project documentation
Clone the repository and install dependencies:
# Clone repository
git clone https://github.com/your-username/braindead-solution.git
cd braindead-solution
# Create virtual environment (Recommended)
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
This project uses the IU-Xray dataset for benchmarking and public usage. A helper script is provided to download and prepare it.
# Verify integrity of existing data or download
python data/download_iu_xray.py --verify
# Preprocess dataset (create splits and metadata)
python data/download_iu_xray.py --preprocess
# Run a sanity check to ensure everything is loadable
python data/sanity_check.py
Note: For MIMIC-CXR, you must have credentialed access via PhysioNet. Place the dataset in data/mimic_cxr if available.
Train the model from scratch using the trainer.py script. You can configure hyperparameters in config.py or pass them as arguments.
# Standard training run
python training/trainer.py \
    --max_epochs 30 \
    --batch_size 8 \
    --learning_rate 1e-4
# Fast dev run (sanity check training loop)
python training/trainer.py --fast_dev_run
Launch the interactive web interface to generate reports for uploaded X-rays.
python app.py
Open http://localhost:5000 in your browser.
- Upload a Chest X-ray image.
- (Optional) Enter clinical indication (e.g., "Fever and cough").
- View the generated Findings and Impression.
You can also run inference programmatically:
from models.model import create_model
import torch
# Load Model
model = create_model(pretrained=True, device="cuda")
model.load_state_dict(torch.load("checkpoints/best.pt")["model_state_dict"])
model.eval()
# Generate Report
result = model.generate_report(
    images="path/to/xray.png",
    indication="Patient with shortness of breath"
)
print(result['reports'][0])
See notebooks/inference_demo.ipynb for a complete walkthrough.
The config.py file controls all aspects of the model and training. Key sections:
- DataConfig: Paths, image size (224x224), sequence lengths.
- EncoderConfig: Swin Transformer settings, RadLex concept count.
- ClassifierConfig: CheXpert labels, loss weights.
- DecoderConfig: GPT-2 settings, beam search parameters (k=4).
- TrainingConfig: Learning rate, batch size, mixed precision (AMP) settings.
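As a rough picture of how such a configuration might be organized, the sketch below groups settings into dataclasses and lets command-line flags override the training values. The field names, the `Config` wrapper, and `build_config` are hypothetical; only the image size, label count, and beam width come from the list above, and the training defaults are copied from the example training command rather than from the actual config.py. EncoderConfig is omitted for brevity.

```python
import argparse
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DataConfig:
    image_size: int = 224  # README: 224x224 inputs


@dataclass
class ClassifierConfig:
    num_labels: int = 14  # README: 14 CheXpert pathologies


@dataclass
class DecoderConfig:
    num_beams: int = 4  # README: beam search with k=4


@dataclass
class TrainingConfig:
    # Assumed defaults, taken from the example training command.
    max_epochs: int = 30
    batch_size: int = 8
    learning_rate: float = 1e-4


@dataclass
class Config:
    data: DataConfig = field(default_factory=DataConfig)
    classifier: ClassifierConfig = field(default_factory=ClassifierConfig)
    decoder: DecoderConfig = field(default_factory=DecoderConfig)
    training: TrainingConfig = field(default_factory=TrainingConfig)


def build_config(argv: Optional[List[str]] = None) -> Config:
    """Start from the defaults and apply any flags given on the command line."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--max_epochs", type=int)
    parser.add_argument("--batch_size", type=int)
    parser.add_argument("--learning_rate", type=float)
    args = parser.parse_args(argv)

    config = Config()
    for name, value in vars(args).items():
        if value is not None:  # only override what the user actually passed
            setattr(config.training, name, value)
    return config


config = build_config(["--batch_size", "16"])
print(config.training.batch_size, config.training.max_epochs)  # 16 30
```

Keeping defaults in one place and treating flags as overrides means a run is reproducible from the config file plus the exact command used.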
| Metric | IU-Xray Test | Target | Description |
|---|---|---|---|
| CheXpert Micro F1 | TBD | > 0.500 | Clinical accuracy of disease detection |
| RadGraph F1 | TBD | > 0.500 | Semantic relation accuracy |
| CIDEr | TBD | > 0.400 | Text generation consensus metric |
| BLEU-4 | TBD | > 0.100 | N-gram overlap precision |
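For reference, micro-averaged F1 over multi-label predictions pools true positives, false positives, and false negatives across all labels and samples before computing the score. The sketch below shows that calculation on toy binary label matrices; the function name `micro_f1` and the data are illustrative, and the project's `evaluation/metrics.py` may compute it differently (for example, after extracting labels from generated text).

```python
from typing import Sequence


def micro_f1(y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]) -> float:
    """Micro-averaged F1 for binary multi-label matrices (rows are samples)."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t and p:
                tp += 1
            elif p and not t:
                fp += 1
            elif t and not p:
                fn += 1
    denominator = 2 * tp + fp + fn
    # Convention used here: score 0.0 when there are no positives at all.
    return 2 * tp / denominator if denominator else 0.0


# Toy example: 2 samples, 3 labels each.
y_true = [[1, 0, 1], [0, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 1]]

score = micro_f1(y_true, y_pred)
print(round(score, 4))  # 0.6667
```

In this example there are 2 true positives, 1 false positive, and 1 false negative, so F1 = 2·2 / (2·2 + 1 + 1) = 4/6.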
Contributions are welcome!
- Fork the repository.
- Create a feature branch (
git checkout -b feature/AmazingFeature). - Commit your changes (
git commit -m 'Add some AmazingFeature'). - Push to the branch (
git push origin feature/AmazingFeature). - Open a Pull Request.
If you use this code for your research, please cite:
@inproceedings{braindead2026,
title={Hi-CliTr: Cognitive Radiology Report Generation},
author={Team BrainDead},
booktitle={ML Hackathon 2026},
year={2026}
}
Distributed under the MIT License. See LICENSE for more information.
Made with 🧠 and ❤️ by Team BrainDead for ML Hackathon 2026
"Pushing the boundaries of Cognitive Simulation in Radiology"