ChaissonLab/CLASH

CLASH is a model designed to quantify chromatin loop strength and facilitate cross-sample comparisons of loop strength from Hi-C data. It is not a loop caller; it is designed to be run after other software has called chromatin loops. CLASH assigns each provided locus in each provided sample a loop score between 0 and 1 relative to the other loops in the dataset, with 0 representing the weakest loop in the set and 1 representing the strongest. CLASH scores incorporate data from the same locus across different samples, retaining local comparability.

Although the suggested usage of CLASH is as a continuous scorer, we recognize that certain tasks do require binarization. As CLASH scores are dataset specific, the score that represents the decision boundary between a loop and no-loop is also dataset specific. Thus, we provide the decision boundary score for each CLASH run as an output text file.
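For tasks that need binary calls, the decision boundary can be applied to the continuous scores. The snippet below is a minimal sketch, not part of CLASH: the `binarize` helper, the example scores, and the boundary value are all hypothetical, and it assumes that a score at or above the boundary counts as a loop.

```python
def binarize(scores, boundary):
    """Label each score as loop (True) or no-loop (False).

    Assumption: a score at or above the boundary counts as a loop.
    """
    return {key: score >= boundary for key, score in scores.items()}


# Hypothetical scores keyed by (genomic position, sample); not real CLASH output.
scores = {
    ("chr1:1000000-1002000", "sampleA"): 0.82,
    ("chr1:1000000-1002000", "sampleB"): 0.31,
}

# Hypothetical value; in practice, use the value from the decision boundary
# file written by the same CLASH run, since the boundary is dataset specific.
boundary = 0.5

calls = binarize(scores, boundary)
for (position, sample), is_loop in calls.items():
    print(position, sample, "loop" if is_loop else "no-loop")
```

Because the boundary is dataset specific, a boundary from one CLASH run should not be reused to binarize scores from another run.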

Users will be expected to provide two paths:

  1. input: This is a csv file with the format 'genomic position, sample'. The genomic position should have the format {chr}:{start}-{end}. Please see the provided 'ExampleLoops.csv' for an example. The file should list all previously called loop positions, with a row for each sample. CLASH scores each locus only for the samples listed for it, so it is imperative to include a row for each sample at each locus to account for loop caller errors.

  2. hic-dir: This is the path to a directory containing Hi-C matrices. These matrices should be sample and chromosome specific and should be named "{sample}{binSize}{chrom}.txt". Matrices can be generated by running cooler balance and cooler dump on cool or mcool files.
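The two inputs can be prepared and checked with a short script. The following is a sketch under stated assumptions rather than part of CLASH: the helper names are hypothetical, the input file is assumed to be comma-separated with no header row, and the matrix file name template is copied verbatim from the description above (pass a different `template` if your files are named differently).

```python
import csv
import itertools
from pathlib import Path


def full_rows(loci, samples):
    """Return one (genomic position, sample) row for every locus/sample pair."""
    return list(itertools.product(sorted(set(loci)), sorted(set(samples))))


def write_input(rows, path):
    """Write the rows as a comma-separated file (assumption: no header row)."""
    with open(path, "w", newline="") as handle:
        csv.writer(handle).writerows(rows)


def expected_matrices(rows, bin_size=2000, template="{sample}{binSize}{chrom}.txt"):
    """List the matrix file names implied by the rows.

    The default template is taken verbatim from the text above.
    """
    names = set()
    for position, sample in rows:
        chrom = position.split(":", 1)[0]  # "{chr}:{start}-{end}" -> "{chr}"
        names.add(template.format(sample=sample, binSize=bin_size, chrom=chrom))
    return sorted(names)


def missing_matrices(rows, hic_dir, **kwargs):
    """Return the expected matrix files that are absent from hic_dir."""
    directory = Path(hic_dir)
    return [n for n in expected_matrices(rows, **kwargs) if not (directory / n).is_file()]


# Hypothetical loci and sample names, for illustration only.
loci = ["chr1:1000000-1002000", "chr2:500000-502000"]
samples = ["sampleA", "sampleB"]

rows = full_rows(loci, samples)
print(len(rows), "rows")
print(expected_matrices(rows))
```

Calling `write_input(rows, "MyLoops.csv")` would then write the file, and `missing_matrices(rows, "./hic_file_directory")` would report any matrix files still to be generated.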

The simplest example command demonstrating CLASH usage is:

python clash.py \
  --input ExampleLoops.tsv \
  --hic-dir ./hic_file_directory

The default model is clash_xgb_model.json, which was trained and tested on Hi-C data binned at 2 kb resolution and should therefore currently be used only for analyses at 2 kb resolution. We hope to test and extend CLASH to lower resolutions in the near future. We may then add additional models, selected with the --model parameter, or support for other resolutions, selected with the --binsize parameter (default: 2000). Use the --output parameter to name the output CLASH score tsv file (default: "AllScores.tsv"). Use the --boundaryFile parameter to name the output decision boundary text file (default: "decision_boundary.txt").
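When CLASH is run from a pipeline, the command line can be assembled programmatically. The sketch below is illustrative only: `clash_command` is a hypothetical helper, it uses only the parameters named above, and the output file name is made up. Parameters left as `None` are omitted so that CLASH falls back to its defaults.

```python
import shlex


def clash_command(input_path, hic_dir, model=None, binsize=None,
                  output=None, boundary_file=None):
    """Build the argument list for a CLASH run, skipping unset options."""
    args = ["python", "clash.py", "--input", input_path, "--hic-dir", hic_dir]
    optional = [
        ("--model", model),
        ("--binsize", binsize),
        ("--output", output),
        ("--boundaryFile", boundary_file),
    ]
    for flag, value in optional:
        if value is not None:
            args += [flag, str(value)]
    return args


# "MyScores.tsv" is a hypothetical output name.
cmd = clash_command("ExampleLoops.tsv", "./hic_file_directory", output="MyScores.tsv")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the directory that contains clash.py.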

A list of required dependencies is included in requirements.txt.

About

Chromatin Loop Across-sample Score Harmonizer

License

GPL-3.0 (LICENSE) and MIT (LICENSE.txt)
