REFOLD is a fully automated, end-to-end computational pipeline that takes any human disease-associated missense mutation as input and outputs:
- A binary prediction of whether the mutation causes misfolding (amenable to pharmacological chaperone rescue)
- The 3D structure of the mutant protein in its partially unfolded state with transiently exposed binding pockets
- De novo designed small molecule candidates predicted to bind those transient pockets and thermodynamically stabilize the correct fold
- Predicted pharmacokinetic properties, synthesizability scores, and cellular rescue probability for each candidate
# Install
pip install -e ".[ml]"
# Run on Fabry disease mutation (GLA Y152C)
refold-single \
--uniprot P06280 \
--position 152 \
--wildtype Y \
--mutant C \
--gene GLA \
--disease "Fabry disease"# Core dependencies
pip install -e .
# ML dependencies (ESM-2, torch-geometric)
pip install -e ".[ml]"
pip install fair-esm torch-geometric
# External tools (macOS)
brew install fpocket openbabel dssp# Download raw data (ClinVar, ThermoMutDB)
make download
# Preprocess ClinVar (~30-60 min)
make preprocess# Train rescue classifier
make train-rescue
# Train molecule generator (uses synthetic data by default)
make train-generator| Stage | Description | Time/mutation |
|---|---|---|
| 1 | Classify only (ΔΔG + rescue probability) | ~10s |
| 2 | Stage 1 + pocket detection (ANM + fpocket) | ~2min |
| 3 | Stage 2 + molecule generation | ~10min |
- ESM-1v: ΔΔG prediction via masked marginal probability
- ESM-2 (150M): Protein embeddings for rescue classifier input
- ANM/ENM: Conformational sampling (50 states per mutation)
- fpocket: Alpha-sphere binding pocket detection
- DDPM Diffusion: Pocket-conditioned de novo 3D molecule generation
- RescueClassifier: GNN + ESM-2 + thermodynamic + evolutionary features
| Metric | Target |
|---|---|
| AUROC | ≥ 0.85 |
| AUPRC | ≥ 0.70 |
| MCC | ≥ 0.55 |
| F1 | ≥ 0.65 |
| ΔΔG Pearson r | ≥ 0.60 |
# Run benchmark evaluation
make benchmark- Fabry disease (GLA): Y152C, R215W, R231H
- Pompe disease (GAA): L525S, P600L
- Gaucher disease (GBA): N444S, L296V
- Cystic fibrosis (CFTR): R551H
- Transthyretin amyloidosis (TTR): V30M, V122I
- Li-Fraumeni syndrome (TP53): R175H, R248W, R249S
MIT