This repository records the data processing for the Syt7 alternative splicing analysis. Data are from BICCN: https://knowledge.brain-map.org/data/Z0GBA7V12N4J4NNSUHA/collections
Ensembl mm10 was used for the reference genome and the GTF annotation.
R: scisorseqr (https://github.com/noush-joglekar/scisorseqr) and Isosceles (https://github.com/timbitz/Isosceles/tree/devel)
Python: pysam, pandas, os, sys
Command line: samtools, UMI-tools, bedtools, minimap2
The mm10 mouse Syt7 alternative splicing information was retrieved from Ensembl. More details can be found here: https://nov2020.archive.ensembl.org/Mus_musculus/Gene/Summary?db=core;g=ENSMUSG00000024743;r=19:10389090-10453180
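The Syt7 transcripts can also be listed directly from the GTF. The following is a minimal sketch, not a script from this repository: it assumes an Ensembl-style GTF in which the ninth column holds `key "value";` attributes, and it uses the gene ID from the Ensembl link above. The function names and the transcript IDs in the demo rows are made up for illustration.

```python
import re

# Gene ID taken from the Ensembl URL above (mouse Syt7).
SYT7_GENE_ID = "ENSMUSG00000024743"


def parse_attributes(attr_field):
    """Turn the 9th GTF column (key "value"; pairs) into a dict."""
    return dict(re.findall(r'(\w+) "([^"]*)"', attr_field))


def gene_transcripts(gtf_lines, gene_id=SYT7_GENE_ID):
    """Return {transcript_id: transcript_name} for 'transcript' rows of gene_id."""
    found = {}
    for line in gtf_lines:
        if line.startswith("#"):          # skip header comments
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "transcript":
            continue
        attrs = parse_attributes(cols[8])
        if attrs.get("gene_id") == gene_id and "transcript_id" in attrs:
            found[attrs["transcript_id"]] = attrs.get("transcript_name", "")
    return found


# Demo rows with placeholder transcript IDs and names (not real annotation).
demo = [
    "#!genome-build example\n",
    '19\tsrc\ttranscript\t1\t100\t.\t+\t.\tgene_id "ENSMUSG00000024743"; '
    'transcript_id "TX_EXAMPLE_1"; transcript_name "Syt7-example-A";\n',
    '19\tsrc\texon\t1\t50\t.\t+\t.\tgene_id "ENSMUSG00000024743"; '
    'transcript_id "TX_EXAMPLE_1";\n',
    '19\tsrc\ttranscript\t1\t100\t.\t+\t.\tgene_id "OTHER_GENE"; '
    'transcript_id "TX_EXAMPLE_2"; transcript_name "Other-example";\n',
]
syt7 = gene_transcripts(demo)
```

On a real annotation, the same function could be called with an open file handle, for example `gene_transcripts(open("annotation.gtf"))`, where the file name is a placeholder.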
1 Get the cell barcodes from the short-read 10x data. The data have been processed and the cell cluster labels were saved in the RObj file.
2 Filter the ONT long-read data by cell barcode and extract the UMI. Only reads with both a barcode and a UMI are kept in the processed FASTQ file (pipeline: IsoQuant). Also generate bc.tsv, which holds the cell cluster information.
3 Map the filtered long-read data with minimap2.
4 Add the cell barcode (BC) and UMI (UM) tags to the BAM file (script: Addtag.py).
5 Deduplicate the BAM reads with UMI-tools, based on the UMI tag.
6 Use Isosceles for isoform counting on the filtered BAM file. Cell cluster information is taken from bc.tsv and used for the pseudo-bulk analysis.
7 Use biomaRt to look up the Ensembl transcript IDs and annotate them with the 'Syt7-2**' transcript names.
8 Plot a heatmap of the results.
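Step 4 can be sketched as follows. This is not the contents of Addtag.py, which is not reproduced here. It is a minimal illustration that assumes each read name ends in `_<barcode>_<umi>`; that naming layout and both function names are assumptions. pysam is imported inside `add_tags` so that the name-parsing helper can be tried without it.

```python
def split_read_name(name):
    """Split '<read_id>_<barcode>_<umi>' (assumed layout) into three parts."""
    read_id, barcode, umi = name.rsplit("_", 2)
    return read_id, barcode, umi


def add_tags(in_bam, out_bam):
    """Copy in_bam to out_bam, adding BC and UM tags parsed from read names."""
    import pysam  # third-party; listed under the Python dependencies above

    with pysam.AlignmentFile(in_bam, "rb") as src, \
         pysam.AlignmentFile(out_bam, "wb", template=src) as dst:
        for read in src:
            _, barcode, umi = split_read_name(read.query_name)
            read.set_tag("BC", barcode, value_type="Z")  # cell barcode
            read.set_tag("UM", umi, value_type="Z")      # UMI
            dst.write(read)


# Parsing demo on made-up read names.
parts_simple = split_read_name("read1_ACGTACGT_TTGACC")
parts_nested = split_read_name("run_7_read2_GGGTTTAA_CCAATG")
```

Splitting from the right keeps read IDs that themselves contain underscores intact. With the UMI stored in a tag, UMI-tools can be pointed at it through its `--extract-umi-method=tag` and `--umi-tag` options; the exact options used in step 5 are not stated here.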
Scripts for running the pipeline on HTC were also created. They are saved under "/staging/groups/rizvi_group/yuchen/Syt7/Syt_Script_for_HTC/" as "/scripts/Main_script_Syt7.sh" and "/sub_files/Main_script_Syt7.sub".