Analysis pipeline for MERSCOPE spatial transcriptomics data, covering spatial embedding, batch integration, cell type annotation, and downstream analyses.
Core pipeline — runs on every dataset.
| Notebook | Description |
|---|---|
| Manual_slide_cluster.ipynb | Optional Stage 0. Interactive Napari-based tool to draw ROIs on a spatial section and split it into separate `.h5ad` files. Use it when one slide contains multiple distinct tissue sections. |
| 01_per_section_banksy.ipynb | Preprocesses each section (normalize → log1p) and runs the BANKSY spatial embedding (λ = 0.8). Saves two files per section: `*_banksy_genes.h5ad` (genes only) and `*_banksy_full.h5ad` (1500 features, including the neighbourhood features). |
| 02_global_integration_scvi.ipynb | Concatenates all `*_banksy_genes.h5ad` files, runs QC, trains scVI for batch correction, and clusters with Leiden at multiple resolutions. |
| 02_1_global_integration_scvi.ipynb | Re-integrates a cell subset (e.g. immune cells) with refined scVI parameters. Supports Apple Silicon MPS acceleration. |
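| 04_banksy_aware_integration.ipynb | Integrates all `*_banksy_full.h5ad` files using PCA + Harmony on the full 1500-feature BANKSY matrix. Because it includes the neighbourhood features, it is better suited than the gene-only scVI integration for identifying spatially defined tissue domains. |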
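To make the per-section step and the λ = 0.8 setting more concrete, here is a simplified numpy sketch, not the banksy package's implementation. It normalizes each cell to the median library size, applies `log1p`, then builds a BANKSY-style matrix that concatenates each cell's own expression (weighted by √(1−λ)) with a Gaussian-weighted mean over its k nearest spatial neighbours (weighted by √λ). A larger λ puts more weight on the neighbourhood, which is what favours tissue domains over cell types. Assumptions: the function names, `k = 10`, and the Gaussian bandwidth are illustrative; the real full matrix also contains azimuthal Gabor filter (AGF) features and uses the package's own scaling.

```python
import numpy as np


def normalize_log1p(counts, target_sum=None):
    """Scale each cell to the same total count, then apply log1p."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    if target_sum is None:
        # Default to the median library size of non-empty cells.
        nonzero = totals[totals > 0]
        target_sum = np.median(nonzero) if nonzero.size else 1.0
    safe = np.where(totals > 0, totals, 1.0)  # empty cells stay all-zero
    return np.log1p(counts / safe * target_sum)


def neighbour_mean(expr, coords, k=10):
    """Gaussian-weighted mean expression over each cell's k nearest spatial neighbours.

    Brute-force O(n^2) distances: fine for a toy example, not for a full section.
    """
    expr = np.asarray(expr, dtype=float)
    coords = np.asarray(coords, dtype=float)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # a cell is not its own neighbour
    k = min(k, len(coords) - 1)
    idx = np.argsort(d, axis=1)[:, :k]  # indices of the k nearest neighbours
    nd = np.take_along_axis(d, idx, axis=1)
    sigma = np.median(nd) or 1.0  # assumed bandwidth: median neighbour distance
    w = np.exp(-(nd**2) / (2 * sigma**2))
    w_sum = w.sum(axis=1, keepdims=True)
    w = w / np.where(w_sum > 0, w_sum, 1.0)  # weights sum to 1 per cell
    return np.einsum("nk,nkg->ng", w, expr[idx])


def banksy_like_matrix(expr, coords, lam=0.8, k=10):
    """[sqrt(1 - lam) * own expression | sqrt(lam) * neighbourhood mean]; no AGF block."""
    nbr = neighbour_mean(expr, coords, k)
    own = np.asarray(expr, dtype=float)
    return np.hstack([np.sqrt(1 - lam) * own, np.sqrt(lam) * nbr])


rng = np.random.default_rng(0)
counts = rng.poisson(2.0, size=(50, 5))     # toy data: 50 cells x 5 genes
coords = rng.uniform(0, 100, size=(50, 2))  # toy x/y positions
X = normalize_log1p(counts)
B = banksy_like_matrix(X, coords, lam=0.8)
print(B.shape)  # (50, 10): 5 own-expression columns + 5 neighbourhood columns
```

Setting `lam=0` reduces the matrix to plain expression (the neighbourhood block becomes zero), which is a quick way to see what the spatial term contributes.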
Identifies tissue domains that recur across conditions.
| Notebook | Description |
|---|---|
| Banksy_domain_analysis.ipynb | Runs BANKSY + Harmony per group (one injury type × timepoint), extracts domain expression profiles, then clusters the domains across conditions into meta-domains via hierarchical correlation clustering. |
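The meta-domain step can be sketched with SciPy: treat each domain's mean expression profile as one row, cluster rows by 1 − Pearson correlation, and cut the tree at a distance threshold so that similar domains from different groups share a label. Average linkage, the `0.5` cutoff, and the `meta_domains` helper are assumptions for illustration; the notebook's exact linkage and cut may differ.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def meta_domains(profiles, max_distance=0.5):
    """Cluster domain profiles (rows) by average linkage on 1 - Pearson correlation."""
    profiles = np.asarray(profiles, dtype=float)
    # metric="correlation" is SciPy's 1 - Pearson r between rows
    Z = linkage(profiles, method="average", metric="correlation")
    # Domains merged below max_distance receive the same meta-domain label
    return fcluster(Z, t=max_distance, criterion="distance")


# Toy profiles: two groups, two domains each (hypothetical names and values)
names = ["groupA_d0", "groupA_d1", "groupB_d0", "groupB_d1"]
profiles = np.array([
    [5.0, 1.0, 0.5, 0.2],
    [0.3, 0.4, 4.0, 6.0],
    [4.5, 1.2, 0.4, 0.3],
    [0.2, 0.6, 3.5, 5.5],
])
labels = meta_domains(profiles)
print(dict(zip(names, labels)))
```

Correlation distance ignores overall expression level, so domains with the same marker pattern group together even if one condition has higher detection efficiency. Rows with constant expression have undefined correlation and should be dropped first.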
Cell type annotation and downstream analyses on the integrated object.
| Notebook | Description |
|---|---|
| Tangram.ipynb | Maps a multiome scRNA-seq reference (immune cells + fibroblasts) onto spatial cells using Tangram to transfer fine-grained cell type labels. |
| Augur.ipynb | Ranks immune cell types by transcriptional perturbation (AUC) across injury models and phases using Augur (via pertpy). |
| Commot.ipynb | Infers spatial cell-cell communication using COMMOT and the CellChat ligand-receptor database (mouse, 50 µm distance threshold). |
Pseudobulk DEG analysis between injury models.
| Notebook | Description |
|---|---|
| DEG.ipynb | Pseudobulk differential expression between two injury conditions, stratified by phase. Three methods are provided: Wilcoxon (quick exploration), edgeR (recommended), and DESeq2 (Python-native alternative). Includes automatic bias correction for differences in transcript detection efficiency across slides. |
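Pseudobulk analysis sums the raw counts of all cells from the same replicate into one profile, so that edgeR or DESeq2 compares replicates rather than treating thousands of cells as independent samples. A minimal numpy sketch of the aggregation, assuming a dense cells × genes count array and one sample ID per cell (the `pseudobulk` helper is hypothetical):

```python
import numpy as np


def pseudobulk(counts, sample_ids):
    """Sum raw counts of all cells sharing a sample ID into one row per sample."""
    counts = np.asarray(counts)
    samples, inverse = np.unique(np.asarray(sample_ids), return_inverse=True)
    out = np.zeros((len(samples), counts.shape[1]), dtype=counts.dtype)
    np.add.at(out, inverse, counts)  # out[inverse[i]] += counts[i] for every cell i
    return samples, out


counts = np.array([[1, 0, 2], [3, 1, 0], [0, 5, 1], [2, 2, 2]])  # 4 cells x 3 genes
sample_ids = ["s1", "s1", "s2", "s2"]
names, pb = pseudobulk(counts, sample_ids)
print(names, pb.tolist())  # ['s1' 's2'] [[4, 1, 2], [2, 7, 3]]
```

To stratify by cell type or phase, pass a combined key per cell (for example `f"{sample}|{cell_type}"`) and split the resulting rows before testing.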
Utilities for moving data between R and Python.
| File | Description |
|---|---|
| Data_extraction_for_Python_conve… | Exports count matrices and metadata from a Seurat object to files readable by Python/AnnData. |
| Seurat_object_conversion_part2… | Converts a Seurat object to AnnData format. |
| Tutorial | Step-by-step guide for the conversion workflow. |
Two conda environments cover the full pipeline:
| Environment | Used for |
|---|---|
| banksy | 01_per_section_banksy, Manual_slide_cluster |
| scvi | All other notebooks |
banksy: scanpy · banksy · scipy · scikit-learn · napari · shapely · magicgui
scvi: scvi-tools · scanpy · pytorch · lightning · pertpy · commot · tangram-sc · harmonypy · pydeseq2 (optional)
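A quick way to confirm that the active environment has everything a notebook needs is to look up each module with `importlib.util.find_spec`, which locates a module without importing it. Several import names differ from the package names listed above (scikit-learn → `sklearn`, scvi-tools → `scvi`, pytorch → `torch`, tangram-sc → `tangram`); the `banksy` import name and the `ENVIRONMENTS` mapping are assumptions.

```python
import importlib.util

# Assumed import names for each environment (optional pydeseq2 left out)
ENVIRONMENTS = {
    "banksy": ["scanpy", "banksy", "scipy", "sklearn", "napari", "shapely", "magicgui"],
    "scvi": ["scvi", "scanpy", "torch", "lightning", "pertpy", "commot", "tangram", "harmonypy"],
}


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    for env, modules in ENVIRONMENTS.items():
        print(f"{env}: missing {missing_modules(modules) or 'nothing'}")
```

Run it inside the environment you intend to use; an empty result for that environment's list means every package is importable.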
DEG analysis with edgeR additionally requires an R installation with the edgeR Bioconductor package.
Each input `.h5ad` must have:

- `adata.X`: raw count matrix (integers)
- `adata.obs["x"]`, `adata.obs["y"]`: spatial coordinates
- `adata.obs["batch"]` or `adata.obs["sample"]`: slide/section ID
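These requirements can be checked before starting the pipeline. The sketch below works on any AnnData-like object with `.X` (dense or SciPy sparse) and `.obs` (a pandas DataFrame, or a dict in the toy example); the `check_input` helper and its messages are hypothetical.

```python
import numpy as np
import scipy.sparse as sp


def check_input(adata):
    """Return a list of problems with an AnnData-like object; an empty list means OK."""
    problems = []
    X = adata.X
    values = X.data if sp.issparse(X) else np.asarray(X)  # stored values only
    if values.size and (values.min() < 0 or not np.all(np.mod(values, 1) == 0)):
        problems.append("adata.X should hold raw, non-negative integer counts")
    for col in ("x", "y"):
        if col not in adata.obs:  # for a DataFrame this checks the column names
            problems.append(f'missing spatial coordinate adata.obs["{col}"]')
    if "batch" not in adata.obs and "sample" not in adata.obs:
        problems.append('missing slide/section ID: adata.obs["batch"] or adata.obs["sample"]')
    return problems
```

With a real file this would be called as `check_input(anndata.read_h5ad(path))`; normalized or log-transformed data fails the integer check, which is the usual sign that `adata.X` was overwritten upstream.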