This document was written by Rebecca Chase. Code was written and executed by Allen Yen and Simona Sarafinovska.
This code takes FASTQ files generated from a Bulk Calling Cards experiment and prepares the reads for alignment. After alignment, the reads are tagged with barcodes to produce a list of reads with transposon insertions.
This code relies on the following tools and packages:
- python3
- cutadapt version 2.9
- umi-tools version 1.0.0
- bowtie2 version 2.4.2
- bwa version 0.7.17
- samtools version 1.13
- HOMER (http://homer.ucsd.edu/homer/download.html)
The code consists of:
- An sbatch file in which the following variables must be assigned:
  - GENOME
  - PROJECT_DIR
  - BOWTIE2_INDEX_PATH_AND_PREFIX
  - TWO_BIT_PATH
- A manifest file that allows information to be pulled into the sbatch script
- An SRT barcode text file listing the SRT barcodes used
- .py files used to filter files, tag reads, and create/sort BAM files
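
Because the sbatch file depends on four variables being assigned, a quick check before submitting a job can catch an unset value early. The following is a minimal sketch, not part of the original code: it assumes the variables are exported to the environment so that Python can see them, and the helper name `missing_variables` and the example paths are hypothetical.

```python
import os

# Variable names taken from the list above.
REQUIRED_VARIABLES = (
    "GENOME",
    "PROJECT_DIR",
    "BOWTIE2_INDEX_PATH_AND_PREFIX",
    "TWO_BIT_PATH",
)


def missing_variables(env):
    """Return the required variable names that are absent or empty in env."""
    return [name for name in REQUIRED_VARIABLES if not env.get(name)]


# Hypothetical example values; substitute the paths for your own system.
example_env = {
    "GENOME": "mm10",
    "PROJECT_DIR": "/path/to/project",
    "BOWTIE2_INDEX_PATH_AND_PREFIX": "/path/to/bowtie2_index/mm10",
    "TWO_BIT_PATH": "/path/to/mm10.2bit",
}

print("Missing in example:", missing_variables(example_env))
# Check the real environment (assumes the variables were exported).
print("Missing in environment:", missing_variables(os.environ))
```

An empty list means every variable has a value; it does not confirm that the paths themselves exist.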
HOMER was run using findMotifsGenome.pl with the output BED file, the mm10 genome, and a size of 200.
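
The HOMER call described above can be sketched as follows. This is an illustration under stated assumptions rather than the exact command that was run: the BED file name `insertions.bed` and the output directory `homer_output` are hypothetical placeholders, and only the genome (mm10) and size (200) come from this document. The argument order follows HOMER's documented usage for findMotifsGenome.pl (peak/BED file, genome, output directory, then options).

```python
import shlex


def homer_command(bed_file, output_dir, genome="mm10", size=200):
    """Build the argument list for a findMotifsGenome.pl run."""
    return [
        "findMotifsGenome.pl",
        str(bed_file),    # BED file produced by the pipeline (name is hypothetical)
        genome,           # genome used in this document
        str(output_dir),  # HOMER output directory (hypothetical)
        "-size",
        str(size),        # region size used in this document
    ]


cmd = homer_command("insertions.bed", "homer_output")
# Print the command as it would be typed in a shell.
print(shlex.join(cmd))
# To actually run it (requires HOMER on the PATH):
#   import subprocess; subprocess.run(cmd, check=True)
```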
This code was later incorporated into the nf-core/callingcards project (https://github.com/nf-core/callingcards). See the associated paper here: https://doi.org/10.1002/cpz1.883