Harmony

Integration of large, complex single-cell datasets with Harmony2

Check out our pre-print on bioRxiv.

For Python users, check out the harmonypy package by Kamil Slowikowski.

System requirements

Harmony has been tested on R versions >= 4.2. Please consult the DESCRIPTION file for more details on required R packages. Harmony has been tested on Linux, OS X, and Windows platforms.
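As a quick sanity check before installing, the R version requirement above can be verified from the console. This is a minimal sketch using base R only; the variable names `min_r_version` and `r_ok` are illustrative and not part of Harmony.

```r
# Minimum R version stated in the system requirements above
min_r_version <- "4.2.0"

# getRversion() returns the running R version as a comparable object
r_ok <- getRversion() >= min_r_version

if (!r_ok) {
  warning("R ", as.character(getRversion()),
          " is older than the tested minimum ", min_r_version)
}
```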

Installation

To install Harmony2 from CRAN (takes approximately 15 seconds):

install.packages("harmony")

To install the development version of Harmony2 directly from GitHub (should take less than 5 minutes):

devtools::install_github("immunogenomics/harmony", build_vignettes=TRUE)

Usage

Harmony is designed to be user-friendly and supports some SingleCellExperiment and Seurat R analysis pipelines. Alternatively, it can be used in standalone mode.

Quick Start

Standalone Mode

Check out this vignette for a quick start tutorial which demonstrates the usage of the tool in standalone mode (~4 seconds).

At minimum, three arguments are needed to run an integration: the PCA embeddings, the cell metadata, and the name of the metadata column to integrate over. For a few samples totaling fewer than 100K cells, integration should finish within seconds.

library(harmony)
my_harmony_embeddings <- RunHarmony(my_pca_embeddings, meta_data, "dataset")
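The call above assumes that `my_pca_embeddings` and `meta_data` already exist. The sketch below shows one way these inputs might be prepared, using simulated data and base R's `prcomp()`. The toy matrix, the batch shift, and the choice of 10 components are all assumptions made for illustration; a real analysis would start from normalized expression data. The Harmony call itself is guarded so the snippet also runs where the package is not installed.

```r
set.seed(1)

# Toy data (assumption): 200 cells x 50 features, two datasets with a shift
n_cells <- 200
n_genes <- 50
expr <- matrix(rnorm(n_cells * n_genes), nrow = n_cells, ncol = n_genes)
dataset <- rep(c("A", "B"), each = n_cells / 2)
expr[dataset == "B", ] <- expr[dataset == "B", ] + 1  # simulated batch effect

# PCA embeddings: one row per cell, one column per principal component
pca <- prcomp(expr, center = TRUE, scale. = FALSE, rank. = 10)
my_pca_embeddings <- pca$x

# Metadata: one row per cell, in the same order as the embeddings
meta_data <- data.frame(dataset = dataset)

# Run Harmony only if the package is available
if (requireNamespace("harmony", quietly = TRUE)) {
  my_harmony_embeddings <- harmony::RunHarmony(
    my_pca_embeddings, meta_data, "dataset", verbose = FALSE
  )
}
```

The key constraint is that the rows of `meta_data` line up one-to-one with the cells in the embedding matrix.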

Seurat Objects

By default, the Harmony API works on Seurat's PCA cell embeddings and corrects them. You can run Harmony within your Seurat workflow with RunHarmony(). Before calling RunHarmony(), the PCA cell embeddings need to be precomputed through Seurat's API. For downstream analyses, use the harmony embeddings instead of pca.

For example, the following snippet runs Harmony and then calculates a UMAP from the corrected embeddings:

seuratObj <- RunHarmony(seuratObj, "dataset")
seuratObj <- RunUMAP(seuratObj, reduction = "harmony")
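For context, the two lines above sit inside a longer Seurat pipeline. The following is a rough sketch of where they might fit, wrapped in a hypothetical helper, `integrate_with_harmony()`, which is not part of either package. It assumes the Seurat and harmony packages are installed and that the object's metadata contains the batch column; the default of 20 dimensions is an arbitrary illustrative choice.

```r
# Hypothetical wrapper (not part of Seurat or harmony): standard
# preprocessing, PCA, Harmony correction, then downstream steps that
# read the "harmony" reduction instead of "pca".
integrate_with_harmony <- function(seuratObj, batch_var = "dataset",
                                   dims = 1:20) {
  seuratObj <- Seurat::NormalizeData(seuratObj)
  seuratObj <- Seurat::FindVariableFeatures(seuratObj)
  seuratObj <- Seurat::ScaleData(seuratObj)

  # PCA must be computed before RunHarmony()
  seuratObj <- Seurat::RunPCA(seuratObj, npcs = max(dims))

  # Correct the PCA embeddings; the result is stored as "harmony"
  seuratObj <- harmony::RunHarmony(seuratObj, batch_var)

  # Downstream analyses use the corrected embeddings
  seuratObj <- Seurat::RunUMAP(seuratObj, reduction = "harmony", dims = dims)
  seuratObj <- Seurat::FindNeighbors(seuratObj, reduction = "harmony",
                                     dims = dims)
  seuratObj <- Seurat::FindClusters(seuratObj)
  seuratObj
}
```

A call would look like `seuratObj <- integrate_with_harmony(seuratObj, "dataset")`.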

For a more detailed overview of the RunHarmony() Seurat interface, check the Seurat vignette.

Harmony with two or more covariates

Harmony can integrate over multiple covariates. To do this, specify a vector of covariates to integrate over.

my_harmony_embeddings <- RunHarmony(
  my_pca_embeddings, meta_data, c("dataset", "donor", "batch_id")
)

Do the same with your Seurat object:

seuratObject <- RunHarmony(seuratObject, c("dataset", "donor", "batch_id"))
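Each covariate name must match a column in the metadata. The sketch below uses a hypothetical helper, `check_covariates()`, which is not part of Harmony, to confirm this before integration and to report how many distinct levels each covariate has. The toy `meta_data` is invented for illustration.

```r
# Hypothetical helper (not part of harmony): verify that every requested
# covariate is a column of meta_data and count its distinct levels.
check_covariates <- function(meta_data, vars_use) {
  missing_vars <- setdiff(vars_use, colnames(meta_data))
  if (length(missing_vars) > 0) {
    stop("Covariates not found in meta_data: ",
         paste(missing_vars, collapse = ", "))
  }
  n_levels <- vapply(meta_data[vars_use],
                     function(x) length(unique(x)), integer(1))
  # A covariate with a single level carries no batch information
  constant <- names(n_levels)[n_levels < 2]
  if (length(constant) > 0) {
    warning("Covariates with only one level: ",
            paste(constant, collapse = ", "))
  }
  n_levels
}

# Toy metadata (assumption): 12 cells described by three covariates
meta_data <- data.frame(
  dataset  = rep(c("A", "B"), each = 6),
  donor    = rep(c("d1", "d2", "d3"), times = 4),
  batch_id = rep(c("b1", "b2", "b3", "b4"), each = 3)
)

covariates <- c("dataset", "donor", "batch_id")
level_counts <- check_covariates(meta_data, covariates)
```

If the check passes, `covariates` can be passed as the third argument of the standalone `RunHarmony()` call shown above.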

Advanced tutorial

The examples above all return integrated PCA embeddings. We created a detailed walkthrough that explores the internal data structures and mechanics of the Harmony algorithm.

Performance Notes

  1. OpenBLAS will make a substantial performance difference. If you are not using OpenBLAS, have a look at PERFORMANCE.md.

  2. For very large datasets (>10M cells), see the OpenMP notes in PERFORMANCE.md in our GitHub repository.
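To find out which BLAS library an R session is linked against, base R exposes the library paths. The snippet below is a rough check that assumes the library's file path contains the string "openblas", which may not hold on every system; on some platforms the BLAS path is reported as an empty string. The variable names are illustrative.

```r
# Paths of the BLAS and LAPACK libraries used by this R session
blas_path <- extSoftVersion()[["BLAS"]]
lapack_path <- La_library()

# Heuristic (assumption): OpenBLAS builds usually have "openblas" in the path
uses_openblas <- grepl("openblas", paste(blas_path, lapack_path),
                       ignore.case = TRUE)

message("BLAS:   ", blas_path)
message("LAPACK: ", lapack_path)
message("OpenBLAS detected: ", uses_openblas)
```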
