A lightweight Python framework for perturbation signature retrieval and biologically interpretable program inference from transcriptomic gene sets.
Current release: PerturbAtlas-K562-v1.0
Current status: Version 1.0.0 (Initial Public Release)
PerturbAtlas is designed to behave like a biologically informed search engine rather than a black-box classifier. Given a transcriptomic gene set, it retrieves biologically similar perturbation signatures from a curated reference atlas, predicts the dominant biological program, and quantifies the strength of evidence supporting each prediction.
Traditional enrichment analyses identify pathways or biological processes associated with a gene set, but they do not directly answer which perturbation is most likely to have generated the observed transcriptional program.
PerturbAtlas addresses this problem by treating perturbation signatures as searchable biological programs. Instead of returning only enriched pathways or functional annotations, it retrieves the most similar perturbation signatures, predicts the dominant biological program represented by the query, and provides an interpretable evidence score describing the reliability of the prediction.
The first public release, PerturbAtlas-K562-v1.0, was constructed from 105 curated K562 cell line Perturb-seq perturbation signatures organized into three biologically meaningful transcriptional programs.
PerturbAtlas is intended for researchers working with:
- Bulk RNA-seq differential expression analyses
- CRISPR perturbation experiments
- Perturb-seq datasets
- Gene regulatory network studies
- Functional genomics
- Transcriptomic signature interpretation
PerturbAtlas was developed around four principles:
- Biological interpretability
- Transparency
- Simplicity
- Extensibility
Every prediction can be traced back to retrieved perturbation signatures. The package favors transparent biological reasoning over opaque machine-learning predictions.
Feature                            Description
---------------------------------  ---------------------------------------------
Biological program classification  Predicts the dominant transcriptional program represented by a query.
Perturbation retrieval             Retrieves the most similar perturbation signatures.
Evidence scoring                   Combines multiple validation-derived metrics into a single evidence score.
Biological interpretation          Produces concise human-readable summaries.
Master regulator prioritization    Highlights regulatory perturbations with strong influence.
Fully interpretable                No black-box inference.
Property                 Value
-----------------------  ----------------------
Atlas                    PerturbAtlas-K562-v1.0
Organism                 Homo sapiens
Cell line                K562
Dataset                  K562 Perturb-seq
Reference perturbations  105
Biological modules       3
Reference genes          1540
Modules:
- Myeloid differentiation
- Erythroid differentiation
- Developmental / cell-state transition
git clone https://github.com/UCSOLMAZ/PerturbAtlas.git
cd PerturbAtlas
python -m pip install -e .
PyPI distribution (pip install perturbatlas) is planned for a future release.
The following example loads the released K562 atlas, queries a small gene set, and prints a biological interpretation.
from perturbatlas import K562Atlas
atlas = K562Atlas.load("data/PerturbAtlas-K562-v1.0.pkl")
result = atlas.query([
"SPI1","CEBPA","CEBPE","CSF3R","MPO","ELANE"
])
result.summary()
result.interpret()
==================================================
PerturbAtlas-K562-v1.0 Query Result
==================================================
Predicted module:
myeloid
Module scores:
myeloid 0.0860
developmental 0.0278
erythroid 0.0076
Evidence score:
Very High (1.00)
Top perturbations:
1. CEBPA | recall=1.00 | precision=0.13 | module=myeloid
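The recall and precision values in the output above can be read as set-overlap statistics between the query and a reference signature. The sketch below is one plausible reading, not the package's confirmed implementation: it assumes recall is the fraction of query genes found in the signature and precision is the fraction of signature genes found in the query. The function name `overlap_metrics` and the 46-gene toy signature are hypothetical.

```python
def overlap_metrics(query_genes, signature_genes):
    """Return (recall, precision) for a query against one signature.

    Assumed definitions (not confirmed by the PerturbAtlas source):
      recall    = |query & signature| / |query|
      precision = |query & signature| / |signature|
    """
    query = set(query_genes)
    signature = set(signature_genes)
    if not query or not signature:
        return 0.0, 0.0
    shared = len(query & signature)
    return shared / len(query), shared / len(signature)


query = ["SPI1", "CEBPA", "CEBPE", "CSF3R", "MPO", "ELANE"]
# Hypothetical signature: the six query genes plus 40 placeholder genes.
toy_signature = query + [f"GENE{i}" for i in range(40)]

recall, precision = overlap_metrics(query, toy_signature)
print(f"recall={recall:.2f} | precision={precision:.2f}")
```

Under these assumed definitions, a six-gene query fully contained in a 46-gene signature gives recall 1.00 and precision of about 0.13. High recall with low precision is therefore expected whenever a short query is compared against a much larger signature.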
Input gene set
|
v
Module scoring
|
v
Perturbation retrieval
|
v
Evidence scoring
|
v
Biological interpretation
Component                     Weight
----------------------------  ------
Module separation             40%
Top perturbation agreement    40%
Mean retrieval recall         20%
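As an illustration of how the three weighted components could be combined, here is a minimal sketch. It assumes each component has already been scaled to the range 0 to 1 and that the evidence score is a plain weighted sum; the README does not state how each component is computed, and the dictionary keys and the function name `combine_evidence` are hypothetical.

```python
# Weights taken from the component table; the key names are illustrative.
EVIDENCE_WEIGHTS = {
    "module_separation": 0.40,
    "top_perturbation_agreement": 0.40,
    "mean_retrieval_recall": 0.20,
}


def combine_evidence(components):
    """Return the weighted sum of evidence components.

    Assumption: every component is already scaled to [0, 1]. Values outside
    that range are clipped so the combined score also stays in [0, 1].
    """
    missing = set(EVIDENCE_WEIGHTS) - set(components)
    if missing:
        raise KeyError(f"missing evidence components: {sorted(missing)}")
    return sum(
        weight * min(max(float(components[name]), 0.0), 1.0)
        for name, weight in EVIDENCE_WEIGHTS.items()
    )


example = {
    "module_separation": 0.50,
    "top_perturbation_agreement": 1.00,
    "mean_retrieval_recall": 0.75,
}
score = combine_evidence(example)
print(f"evidence score = {score:.2f}")
```

For the example values, the sum is 0.40 x 0.50 + 0.40 x 1.00 + 0.20 x 0.75 = 0.75. Because the weights add up to 1, a query that maximizes all three components reaches a score of 1.00.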
Score       Interpretation
--------    ----------------
>= 0.85     Very High
>= 0.70     High
>= 0.50     Moderate
<  0.50     Low
Low-evidence predictions are intentionally reported conservatively.
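The score-to-label mapping can be expressed as a short threshold lookup. This is a sketch of the thresholds listed above, assuming each lower bound is inclusive; the function name `evidence_label` is hypothetical and is not part of the documented API.

```python
def evidence_label(score):
    """Map a numeric evidence score to its interpretation label.

    Thresholds follow the score table; lower bounds are assumed inclusive.
    """
    if score >= 0.85:
        return "Very High"
    if score >= 0.70:
        return "High"
    if score >= 0.50:
        return "Moderate"
    return "Low"


for value in (1.00, 0.75, 0.50, 0.06):
    print(f"{value:.2f} -> {evidence_label(value)}")
```

With this mapping, the quick-start example's score of 1.00 is labeled Very High, while a score near 0.06 falls in the Low band.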
------------------------------------------------------------------------
Purpose: Recover known perturbation programs.
Result: 90.7% module accuracy, 98.1% Top-3 retrieval.
Conclusion: Excellent recovery of known perturbational programs.
Purpose: Evaluate specificity.
Result: Mean evidence score of approximately 0.06 across 400 random gene-set queries with no high-confidence predictions.
Recommendation: use 20 or more informative genes for optimal performance.
Twenty informative genes mixed with eighty random genes retained 96.3% module prediction accuracy.
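A noisy query of the kind described above can be assembled by padding an informative gene list with genes drawn at random from a background universe. The README does not describe the benchmark's actual sampling procedure, so the sketch below is only an assumption about how such a query might be built; `make_noisy_query`, the placeholder gene names, and the toy universe are all hypothetical.

```python
import random


def make_noisy_query(informative_genes, gene_universe, n_noise, seed=0):
    """Mix informative genes with randomly sampled background genes.

    Background genes are drawn without replacement from the universe,
    excluding the informative genes, so the query has no duplicates.
    """
    rng = random.Random(seed)
    informative = list(dict.fromkeys(informative_genes))  # de-duplicate, keep order
    pool = sorted(set(gene_universe) - set(informative))
    if n_noise > len(pool):
        raise ValueError("not enough background genes to sample from")
    query = informative + rng.sample(pool, n_noise)
    rng.shuffle(query)  # avoid any dependence on gene order
    return query


# Toy data: 20 placeholder informative genes in a 1540-gene universe.
informative = [f"INF{i}" for i in range(20)]
universe = informative + [f"BG{i}" for i in range(1520)]

noisy_query = make_noisy_query(informative, universe, n_noise=80)
print(len(noisy_query), len(set(informative) & set(noisy_query)))
```

The resulting list could then be passed to `atlas.query(...)` in the same way as the quick-start gene list. Fixing the seed keeps a robustness check reproducible across runs.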
Evidence decreases for mixed biological states and increases again as one program becomes dominant.
query(), info(), save(), load(), modules(), perturbations()
summary(), interpret(), top_perturbations(), to_csv(), evidence(), evidence_score(), confidence_score()
PerturbAtlas/
|
|-- perturbatlas/
| |-- __init__.py
| |-- atlas.py
| |-- result.py
|
|-- data/
| |-- PerturbAtlas-K562-v1.0.pkl
|
|-- examples/
| |-- QuickStart.ipynb
| |-- ExampleQueries.ipynb
| |-- Benchmark.ipynb
|
|-- README.md
|-- LICENSE
|-- pyproject.toml
|-- requirements.txt
`-- .gitignore
PerturbAtlas-K562-v1.0 was constructed using publicly available K562 Perturb-seq data. The current atlas is based on curated perturbation signatures derived from the following study:
Norman TM, Horlbeck MA, Replogle JM, Ge AY, Xu A, Jost M, Gilbert LA, Weissman JS.
Exploring genetic interaction manifolds constructed from rich single-cell phenotypes.
Science. 2019;365(6455):786-793.
DOI: https://doi.org/10.1126/science.aax4438
The processed dataset used during atlas construction was obtained from:
https://www.kaggle.com/datasets/alexandervc/scrnaseq-crisprperturbseq-normanweissman
We gratefully acknowledge the authors of the original Perturb-seq study for making these data publicly available.
If you use PerturbAtlas in your research, please cite both the software and the reference dataset used to construct the released atlas.
PerturbAtlas
Ufuk Solmaz.
PerturbAtlas: A lightweight Python framework for perturbation signature retrieval and biologically interpretable program inference from transcriptomic gene sets.
GitHub repository (current release)
https://github.com/UCSOLMAZ/PerturbAtlas
Reference dataset
Norman TM, Horlbeck MA, Replogle JM, Ge AY, Xu A, Jost M, Gilbert LA, Weissman JS.
Exploring genetic interaction manifolds constructed from rich single-cell phenotypes.
Science. 2019;365(6455):786-793.
DOI: https://doi.org/10.1126/science.aax4438
The current release (PerturbAtlas-K562-v1.0) represents the first public version of the framework. Planned future developments include:
- Additional cell-type perturbation atlases
- Expanded benchmarking and validation
- Drug perturbation atlas support
- Improved documentation and tutorials
- PyPI distribution (pip install perturbatlas)
- Single-cell perturbation atlases
- Cross-cell consensus perturbation atlases
- Automatic atlas download and version management
- Interactive visualization utilities
- Integration with additional transcriptomic resources
PerturbAtlas is released under the MIT License.
You are free to use, modify, and distribute the software in accordance with the terms of the license.
See the LICENSE file for complete license information.
Ufuk C. Solmaz
Developer and maintainer of PerturbAtlas
GitHub: https://github.com/UCSOLMAZ/
For questions, suggestions, bug reports, or collaborations, please open an issue or discussion on the GitHub repository.
Contributions, feature requests, bug reports, and suggestions are welcome.
If you encounter a bug or have ideas for improving PerturbAtlas, please open an issue or submit a pull request.
Community contributions are greatly appreciated.