I build machine learning systems that sit at the boundary between computational biology and drug discovery, centered on the idea that cancer is a cellular identity problem, not just a genetic one.
Most of my work is about making biological systems legible to computers: inferring gene regulatory networks from single-cell data, simulating epigenetic landscapes, and designing molecules that interact with those landscapes in precise ways.
ORACLE v2.0
End-to-end pipeline for cancer identity reversion. Maps the Waddington epigenetic landscape from scRNA-seq, predicts the minimum TF perturbation set to push a cancer cell into the normal attractor, then designs PROTAC-like bifunctional molecules (TCIPs) to execute it.
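ORACLE's step of finding "the minimum TF perturbation set" can be illustrated on a toy Boolean network. This is a hedged, assumption-laden stand-in, not ORACLE's method: it uses three hypothetical genes (a normal-identity TF `A`, an oncogenic TF `B` that mutually represses `A`, and a marker `C` driven by `A`), synchronous updates, and a brute-force search that transiently clamps the smallest set of TFs and then checks that the released network settles into the normal attractor. ORACLE works from a landscape inferred from scRNA-seq, which this sketch does not model; `RULES`, `attractor`, and `min_perturbation` are illustrative names.

```python
from itertools import combinations, product
from typing import Callable, Dict, FrozenSet, List, Optional, Sequence, Tuple

State = Tuple[int, ...]
Rule = Callable[[State], int]

# Hypothetical 3-gene toy network: A and B repress each other, C follows A.
RULES: Sequence[Rule] = (
    lambda s: 1 - s[1],  # A is on unless B is on
    lambda s: 1 - s[0],  # B is on unless A is on
    lambda s: s[0],      # C (identity marker) tracks A
)


def step(state: State, rules: Sequence[Rule], clamp: Dict[int, int]) -> State:
    """One synchronous update; clamped genes are held at their fixed value."""
    return tuple(clamp.get(i, rule(state)) for i, rule in enumerate(rules))


def attractor(state: State, rules: Sequence[Rule], clamp: Dict[int, int],
              max_steps: int = 1000) -> FrozenSet[State]:
    """Iterate until a state repeats and return the states of that cycle."""
    seen: Dict[State, int] = {}
    path: List[State] = []
    for t in range(max_steps):
        if state in seen:
            return frozenset(path[seen[state]:])
        seen[state] = t
        path.append(state)
        state = step(state, rules, clamp)
    raise RuntimeError("no attractor reached within max_steps")


def min_perturbation(start: State, target: FrozenSet[State],
                     rules: Sequence[Rule],
                     max_size: int = 2) -> Optional[Dict[int, int]]:
    """Smallest transient clamp (gene -> 0/1) that leaves the network in
    `target` once released, found by brute force over increasing set sizes."""
    for k in range(1, max_size + 1):
        for genes in combinations(range(len(rules)), k):
            for values in product((0, 1), repeat=k):
                clamp = dict(zip(genes, values))
                held = attractor(start, rules, clamp)
                # After release, every state of the clamped attractor must
                # relax into the target attractor.
                if all(attractor(s, rules, {}) == target for s in held):
                    return clamp
    return None


CANCER: State = (0, 1, 0)
NORMAL = frozenset({(1, 0, 1)})
print(min_perturbation(CANCER, NORMAL, RULES))  # {0: 1}: force A on
```

Brute force grows exponentially with the number of TFs, so a realistic search would need something smarter; the point here is only the shape of the question: which smallest intervention moves the system from one attractor basin to another.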
REFOLD
World's first proteome-scale database of pharmacological chaperones (see pcd-atlas-data).
Celery
Conditional generative model for designing oncogenically safe, cell-type-specific synthetic telomerase systems with multi-layer fail-safe architecture. Generates sequences constrained to avoid oncogenic activation patterns.
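One simple way to enforce a hard "avoid these patterns" rule on generated sequences is post-hoc rejection: sample, scan both strands for forbidden motifs, and keep only clean designs. Whether Celery applies its constraint this way or inside the model during decoding is not stated, so treat this as a hypothetical sketch. The motifs in `FORBIDDEN` are placeholders, not real oncogenic signatures, and the random `toy_generator` stands in for the conditional model.

```python
import random
from typing import Callable, List, Sequence

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def violates(seq: str, forbidden: Sequence[str]) -> bool:
    """True if any forbidden motif occurs on either strand of `seq`."""
    rc = reverse_complement(seq)
    return any(m in seq or m in rc for m in forbidden)


def constrained_sample(generate: Callable[[], str], forbidden: Sequence[str],
                       n: int, max_tries: int = 10_000) -> List[str]:
    """Draw from `generate`, keeping only sequences that avoid every motif."""
    accepted: List[str] = []
    tries = 0
    while len(accepted) < n:
        if tries >= max_tries:
            raise RuntimeError(
                f"only {len(accepted)}/{n} accepted after {max_tries} tries")
        tries += 1
        seq = generate()
        if not violates(seq, forbidden):
            accepted.append(seq)
    return accepted


# Placeholder motifs standing in for real oncogenic activation patterns.
FORBIDDEN = ["AAAAAA", "GCGCGC"]

rng = random.Random(0)


def toy_generator() -> str:
    """Hypothetical stand-in for the generative model: uniform random 30-mers."""
    return "".join(rng.choice("ACGT") for _ in range(30))


designs = constrained_sample(toy_generator, FORBIDDEN, n=5)
```

Rejection keeps the constraint exact at the cost of wasted samples; if the forbidden set is large relative to the model's output distribution, constraining during generation becomes the more efficient choice.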
TRC-TopoGen
De novo topoisomerase inhibitor design conditioned on transcription–replication conflict landscapes. Integrates molecular generation, structural dynamics, and genomic bias modeling to find disease-selective candidates.
DYNAMICS-AEX
Streamlined GROMACS interface with entropy-based energy tracking for accelerated molecular dynamics. Reports a ~300% speedup at 96.2% trajectory similarity relative to standard simulation, validated across multiple protein systems.
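"Entropy-based energy tracking" is not defined further here. One plausible reading, offered purely as a hypothetical sketch, is to track the Shannon entropy of the potential-energy distribution over sliding windows and treat a plateau as a sign that sampling has settled. The window size, stride, bin count, tolerance, and all function names below are illustrative assumptions; in practice the energy series would be extracted from a GROMACS run (for example with `gmx energy`).

```python
from typing import Optional

import numpy as np


def shannon_entropy(values: np.ndarray, bins: int = 32) -> float:
    """Shannon entropy (nats) of the histogram of `values`."""
    counts, _ = np.histogram(values, bins=bins)
    p = counts[counts > 0] / counts.sum()
    return float(-np.sum(p * np.log(p)))


def entropy_trace(energies, window: int = 100, stride: int = 50,
                  bins: int = 32) -> np.ndarray:
    """Entropy of the potential-energy distribution in sliding windows."""
    energies = np.asarray(energies, dtype=float)
    starts = range(0, len(energies) - window + 1, stride)
    return np.array([shannon_entropy(energies[s:s + window], bins)
                     for s in starts])


def plateau_index(trace: np.ndarray, tol: float = 0.01,
                  patience: int = 3) -> Optional[int]:
    """Index of the first window at which entropy has changed by less than
    `tol` for `patience` consecutive windows; None if it never settles."""
    deltas = np.abs(np.diff(trace))
    run = 0
    for i, d in enumerate(deltas):
        run = run + 1 if d < tol else 0
        if run >= patience:
            return i + 1
    return None
```

A detector like this could, under the stated assumption, let a driver script stop or re-seed a segment once the energy distribution stops changing, which is one way such a tool might trade a small loss in trajectory fidelity for wall-clock speed.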
pcd-atlas-data
Raw counts, processed
Languages: Python · SQL · Bash
ML/DL: PyTorch · JAX · Hugging Face · Lightning
Bio stack: scanpy · anndata · CellxGene · RDKit · BioPython
Models: GNNs · Transformers · Diffusion (EGNN/DDPM) · VAEs · SE(3)-equivariant nets
Data: scRNA-seq · scATAC-seq · spatial transcriptomics · PDB · SMILES
Infra: GROMACS · AlphaFold · GEO/TCGA/Census fetchers · HPC
- Training ORACLE's biological pretraining stage on 2M+ cancer cells from CellxGene Census across 18 cancer types
- GBM (glioblastoma) is the flagship: 8 TCIP molecules designed, all passing the PROTAC-space hard constraints and all classed as Tier-1 amide assemblies
- Next: a wet-lab collaboration to validate ORACLE's GBM designs
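The "PROTAC-space hard constraints" in the GBM bullet are not spelled out. As a hedged illustration only, a hard-constraint gate can be a set of (min, max) windows on molecular descriptors that every design must satisfy without exception. The `MolProps` fields and every threshold in `HARD_CONSTRAINTS` are hypothetical placeholders, not ORACLE's actual limits; the descriptor values themselves could be computed with RDKit (e.g. `Descriptors.MolWt`, `rdMolDescriptors.CalcTPSA`, `Lipinski.NumHDonors`, `rdMolDescriptors.CalcNumRotatableBonds`).

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class MolProps:
    """Precomputed descriptors for one candidate molecule (hypothetical)."""
    mol_weight: float  # Da
    tpsa: float        # topological polar surface area, Å^2
    h_donors: int
    rot_bonds: int


# Illustrative (min, max) windows; placeholders, not ORACLE's real limits.
HARD_CONSTRAINTS: Dict[str, Tuple[float, float]] = {
    "mol_weight": (700.0, 1200.0),
    "tpsa": (0.0, 250.0),
    "h_donors": (0, 6),
    "rot_bonds": (0, 20),
}


def violations(props: MolProps,
               limits: Dict[str, Tuple[float, float]] = HARD_CONSTRAINTS) -> List[str]:
    """Describe every descriptor that falls outside its allowed window."""
    out: List[str] = []
    for name, (lo, hi) in limits.items():
        value = getattr(props, name)
        if not lo <= value <= hi:
            out.append(f"{name}={value} outside [{lo}, {hi}]")
    return out


def passes(props: MolProps,
           limits: Dict[str, Tuple[float, float]] = HARD_CONSTRAINTS) -> bool:
    """A design passes only if it violates no hard constraint."""
    return not violations(props, limits)
```

Returning the list of violations rather than a bare boolean makes it easy to log why a candidate was rejected, which helps when tuning a generator against fixed constraints.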