Project website: robertslab.github.io/project-lake-trout (the annotation & phenotype landing page, with top candidate genes, GO enrichment, interpretation guardrails, and links to both interactive genome browsers).
Comprehensive genomic analysis of Salvelinus namaycush (lake trout) comparing two distinct ecotypes: lean and siscowet. This repository contains multiple integrated analyses including:
- RNAseq differential expression analysis using parasitized/non-parasitized liver tissue
- PacBio HiFi DNA methylation profiling and differential methylation analysis
- Presence-Absence Variation (PAV) analysis to identify structural genomic variations
- Functional annotation & phenotype interpretation linking DMRs and PAVs to candidate genes
- Interactive genome browsers (IGV.js and JBrowse 2) for visualizing genomic features
- Assembly: GCF_016432855.1 (SaNama_1.0)
- Species: Salvelinus namaycush (Lake Trout)
- Source: NCBI Genome
Analysis of liver RNAseq data from parasitized and non-parasitized samples (NCBI BioProject PRJNA316738) to identify:
- Differentially expressed genes (DEGs) between ecotypes
- Differentially expressed transcripts (DETs) and alternative isoforms
- Expression differences related to parasite status
Key Results:
- 202 differentially expressed transcripts (p < 0.05)
- Analysis performed using Ballgown
- See analyses/README.md for detailed results
Analysis Files:
- code/01-ballgown-analysis.Rmd - Primary differential expression analysis
- code/02-gene-explore.qmd - Gene-specific exploration
Whole-genome DNA methylation profiling using PacBio HiFi sequencing with 5mC modification calling:
- Sample-level methylation profiles for both ecotypes
- Differential methylation analysis between lean and siscowet
- Identification of Differentially Methylated Regions (DMRs)
Key Results:
- 540,040 CpG sites tested
- 4,440 significant differentially methylated cytosines (DMCs, p < 0.05)
- 302 Differentially Methylated Regions (DMRs)
  - 20 hypermethylated in siscowet
  - 282 hypomethylated in siscowet
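The DMR calls themselves come from code/14-diff-meth.Rmd and code/14-diff-meth.py, and their exact rules may differ from what is shown here. As a rough illustration of how per-CpG results can be rolled up into regions and labelled hyper- or hypomethylated, here is a minimal Python sketch. The `DMC` and `Region` records, the sign convention (siscowet minus lean), and the thresholds `max_gap`, `min_cpgs` and `min_abs_diff` are hypothetical choices, not the parameters used in this project.

```python
from dataclasses import dataclass


@dataclass
class DMC:
    chrom: str
    pos: int      # 0-based CpG position (assumed)
    diff: float   # hypothetical convention: siscowet minus lean, % methylation


@dataclass
class Region:
    chrom: str
    start: int    # 0-based, half-open
    end: int
    n_cpg: int
    mean_diff: float

    @property
    def direction(self) -> str:
        # Positive mean difference = more methylation in siscowet (assumed convention)
        return "hyper" if self.mean_diff > 0 else "hypo"


def merge_dmcs(dmcs, max_gap=1000, min_cpgs=3, min_abs_diff=10.0):
    """Merge significant CpGs into candidate DMRs (illustrative thresholds).

    Neighbouring DMCs on the same chromosome are grouped when they are at most
    max_gap bp apart and change in the same direction. A group becomes a region
    if it has >= min_cpgs sites and |mean difference| >= min_abs_diff.
    """
    regions = []

    def keep(group):
        if len(group) < min_cpgs:
            return
        mean_diff = sum(d.diff for d in group) / len(group)
        if abs(mean_diff) >= min_abs_diff:
            regions.append(Region(group[0].chrom, group[0].pos,
                                  group[-1].pos + 1, len(group), mean_diff))

    group = []
    for d in sorted(dmcs, key=lambda x: (x.chrom, x.pos)):
        # Start a new group on a chromosome change, a large gap, or a sign flip
        if group and (
            d.chrom != group[-1].chrom
            or d.pos - group[-1].pos > max_gap
            or (d.diff > 0) != (group[-1].diff > 0)
        ):
            keep(group)
            group = []
        group.append(d)
    keep(group)
    return regions


if __name__ == "__main__":
    dmcs = [DMC("chr1", 100, -25.0), DMC("chr1", 300, -30.0),
            DMC("chr1", 500, -20.0), DMC("chr1", 5000, 15.0)]
    for r in merge_dmcs(dmcs):
        print(r.chrom, r.start, r.end, r.n_cpg, r.direction)  # chr1 100 501 3 hypo
```

Splitting groups on a sign change keeps each region's direction unambiguous, which is what allows a hyper/hypo breakdown like the 20/282 split above.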
Analysis Files:
- code/04-pacbio/ - PacBio workflow (alignment, QC, methylation calling)
- code/10-mCG-call.Rmd - Methylation calling
- code/13.3-hifiasm-differential-methylation-plan.md - Plan for extending differential methylation analysis to ecotype-specific hifiasm assemblies
- code/14-diff-meth.Rmd - Differential methylation analysis
- code/14-diff-meth.py - Python implementation for DMR identification
Genome-wide structural variation analysis identifying insertions and deletions between ecotypes:
- Coverage-based detection of absent regions (deletions)
- CIGAR-based detection of novel insertions
- Ecotype-specific and shared structural variants
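Both detection modes reduce to simple scans over alignment data. The sketch below is illustrative only and is not the logic of code/11-pav.Rmd or code/12-pav.py: it finds insertions by walking a CIGAR string (operation codes as defined in the SAM specification) and finds absent regions by looking for long runs of low read depth. The function names and thresholds (`min_len=50` for insertions, `max_depth=0` and `min_len=500` for absent regions) are hypothetical. In a real run the CIGAR strings and start positions would come from the aligned BAMs (for example pysam's `read.cigarstring` and `read.reference_start`); here they are passed in directly so the example is self-contained.

```python
import re

# CIGAR operations as defined in the SAM specification
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # operations that advance along the reference


def insertions_from_cigar(chrom, ref_start, cigar, min_len=50):
    """Return (chrom, ref_pos, length) for each insertion of >= min_len bp.

    ref_start is the 0-based reference coordinate of the first aligned base.
    The inserted sequence sits between reference positions ref_pos - 1 and ref_pos.
    """
    ref_pos = ref_start
    hits = []
    for length, op in CIGAR_RE.findall(cigar):
        n = int(length)
        if op == "I" and n >= min_len:
            hits.append((chrom, ref_pos, n))
        if op in REF_CONSUMING:
            ref_pos += n
    return hits


def absent_regions(chrom, depth, max_depth=0, min_len=500):
    """Return (chrom, start, end) runs of >= min_len bp with depth <= max_depth.

    depth is a per-base read-depth sequence starting at position 0 of chrom;
    output intervals are 0-based, half-open.
    """
    regions = []
    start = None
    for i, d in enumerate(depth):
        if d <= max_depth:
            if start is None:
                start = i  # a low-coverage run begins
        elif start is not None:
            if i - start >= min_len:
                regions.append((chrom, start, i))
            start = None
    # Close a run that extends to the end of the sequence
    if start is not None and len(depth) - start >= min_len:
        regions.append((chrom, start, len(depth)))
    return regions


if __name__ == "__main__":
    print(insertions_from_cigar("chr1", 1000, "100M60I50M10D20M"))  # [('chr1', 1100, 60)]
    print(absent_regions("chr1", [5] * 10 + [0] * 600 + [3] * 10))  # [('chr1', 10, 610)]
```

Insertions do not consume the reference, which is why they can only be seen in the CIGAR string, while deletions relative to the reference can also surface as coverage gaps when they are absent from all reads.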
Key Results:
- Lean-specific: 996,228 variants (770,891 insertions + 225,337 deletions)
- Siscowet-specific: 1,332,705 variants (1,086,799 insertions + 245,906 deletions)
- Shared: 878,372 variants common to both ecotypes
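How variants are matched across samples and ecotypes is defined in code/15-diff-pav.py, and the counts above depend on those rules. As an assumed simplification, the split into lean-specific, siscowet-specific and shared sets can be expressed with set arithmetic over per-sample calls. The function `classify_pavs`, the exact-key matching, and the `min_samples` threshold are hypothetical; real pipelines usually match overlapping intervals rather than identical coordinates.

```python
from collections import Counter


def classify_pavs(calls_by_sample, ecotype_of, min_samples=2):
    """Split PAV calls into lean-specific, siscowet-specific and shared sets.

    calls_by_sample: sample ID -> set of hashable variant keys, e.g.
        (chrom, start, end, "INS" or "DEL"); exact-key matching is a simplification.
    ecotype_of: sample ID -> "lean" or "siscowet".
    A variant is "present" in an ecotype if called in >= min_samples of its samples,
    and "specific" only if no sample of the other ecotype has it at all.
    """
    counts = {"lean": Counter(), "siscowet": Counter()}
    for sample, variants in calls_by_sample.items():
        counts[ecotype_of[sample]].update(variants)
    present = {eco: {v for v, n in c.items() if n >= min_samples} for eco, c in counts.items()}
    seen = {eco: set(c) for eco, c in counts.items()}
    return {
        "lean_specific": present["lean"] - seen["siscowet"],
        "siscowet_specific": present["siscowet"] - seen["lean"],
        "shared": present["lean"] & present["siscowet"],
    }


if __name__ == "__main__":
    a, b, c = ("chr1", 100, 200, "DEL"), ("chr1", 500, 500, "INS"), ("chr2", 10, 90, "DEL")
    calls = {"bc2041": {a, b}, "bc2068": {a, b}, "bc2071": {b, c}, "bc2072": {b, c}}
    ecotype_of = {"bc2041": "lean", "bc2068": "lean", "bc2071": "siscowet", "bc2072": "siscowet"}
    for label, variants in classify_pavs(calls, ecotype_of).items():
        print(label, sorted(variants))
```

Requiring absence from every sample of the other ecotype keeps a variant seen in a single siscowet fish out of the lean-specific set, at the cost of being sensitive to false-positive calls.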
Analysis Files:
- code/11-pav.Rmd - PAV identification analysis
- code/12-pav.py - Python implementation
- code/15-diff-pav.py - Differential PAV analysis
Web-based genome browsers for exploring PAV and methylation data across the SaNama_1.0 assembly.
Features:
- Visualize ecotype-specific insertions and deletions
- View differential methylation tracks (CpG methylation, 8 samples; DMRs)
- Gene annotations with interactive navigation
- Mobile-responsive design
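Browser track files are produced by genome-browser/prepare_data.py, whose output format is not described here. As a hedged illustration of one way to turn PAV calls into a colour-coded feature track, the sketch below writes BED9 lines (BED is a format both IGV.js and JBrowse 2 can load), putting per-feature colours in the `itemRgb` column. The function `pav_to_bed9`, the record layout and the colour scheme are all hypothetical.

```python
# Hypothetical colour scheme: blue for insertions, orange for deletions
ITEM_RGB = {"INS": "0,114,178", "DEL": "213,94,0"}


def pav_to_bed9(records):
    """Format (chrom, start, end, svtype, ecotype) tuples as BED9 lines.

    Coordinates are taken to be 0-based, half-open, as BED requires.
    Insertions have no reference span, so they get a 1-bp interval
    to remain visible as a feature.
    """
    lines = []
    for chrom, start, end, svtype, ecotype in sorted(records):
        end = max(end, start + 1)
        # chrom, start, end, name, score, strand, thickStart, thickEnd, itemRgb
        fields = [chrom, start, end, f"{ecotype}_{svtype}", 0, ".", start, end, ITEM_RGB[svtype]]
        lines.append("\t".join(str(f) for f in fields))
    return lines


if __name__ == "__main__":
    records = [("chr1", 1100, 1100, "INS", "siscowet"), ("chr1", 10, 610, "DEL", "lean")]
    print("\n".join(pav_to_bed9(records)))
```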
Live Demos:
- IGV.js (quick exploration): robertslab.github.io/project-lake-trout/genome-browser/
- JBrowse 2 (advanced analysis): robertslab.github.io/project-lake-trout/jbrowse/
Documentation: genome-browser/README.md
The interpretive layer that turns DMR and PAV coordinates into genes and plausible ecotype phenotypes. A genome-wide RefSeq annotation backbone is built, differential features are assigned to genes with positional context, candidates are ranked, GO over-representation is tested, and the results are synthesized into hypothesized phenotype axes.
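Feature-to-gene assignment is done by code/18.1-assign-features-to-genes.py; the rules below are an assumed simplification, not a description of that script. This sketch reports every gene within a 5 kb flank of a feature (the window quoted in the key results) and labels the feature as exonic, intronic, upstream or downstream relative to gene strand. The `Gene` record and `assign_feature` function are hypothetical; a real run would load gene and exon coordinates from the RefSeq annotation for GCF_016432855.1 and would use an interval index instead of a linear scan.

```python
from dataclasses import dataclass, field


@dataclass
class Gene:
    gene_id: str
    chrom: str
    start: int                                  # 0-based, half-open
    end: int
    strand: str                                 # "+" or "-"
    exons: list = field(default_factory=list)   # [(start, end), ...]


def _overlaps(a0, a1, b0, b1):
    """True if half-open intervals [a0, a1) and [b0, b1) share at least one base."""
    return a0 < b1 and b0 < a1


def assign_feature(chrom, start, end, genes, flank=5000):
    """Return (gene_id, context, distance) for each gene within `flank` bp.

    context is 'exonic', 'intronic', 'upstream' or 'downstream' (strand-aware);
    distance is 0 for overlaps, else the bp gap to the gene body.
    """
    hits = []
    for g in genes:
        if g.chrom != chrom or not _overlaps(start, end, g.start - flank, g.end + flank):
            continue
        if _overlaps(start, end, g.start, g.end):
            exonic = any(_overlaps(start, end, e0, e1) for e0, e1 in g.exons)
            context, distance = ("exonic" if exonic else "intronic"), 0
        else:
            left_of_gene = end <= g.start
            distance = g.start - end if left_of_gene else start - g.end
            # Left of a '+' gene or right of a '-' gene is the 5' (upstream) side
            context = "upstream" if left_of_gene == (g.strand == "+") else "downstream"
        hits.append((g.gene_id, context, distance))
    return sorted(hits, key=lambda h: h[2])


if __name__ == "__main__":
    genes = [Gene("geneA", "chr1", 10000, 20000, "+", [(10000, 11000), (19000, 20000)]),
             Gene("geneB", "chr1", 30000, 40000, "-", [(30000, 31000)])]
    print(assign_feature("chr1", 41000, 41500, genes))  # [('geneB', 'upstream', 1000)]
```

Making the upstream/downstream label strand-aware matters for interpretation: a feature just past the end coordinate of a minus-strand gene sits on its promoter side.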
Key Results:
- 46,359 genes annotated (46,231 with a product, 34,367 with ≥1 GO term)
- 2,036 candidate genes within 5 kb of a DMR/DMC/stringent-PAV; 4 are convergent (DMR and stringent siscowet deletion), led by znf883-like (LOC120032414)
- Exonic siscowet deletions in lipid-metabolism genes (angptl5, mogat2, epoxide hydrolase 1)
- Most defensible GO enrichment (deletion set): calcium ion transport (FDR 3×10⁻³), with gene-length and lean-reference caveats carried throughout
- No DMCs survive q < 0.1, so interpretation leads with the DMR-level and stringent-PAV sets
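The project's test is implemented in code/18.3-go-enrichment.py. The sketch below shows the standard one-sided hypergeometric over-representation test with Benjamini-Hochberg FDR that such analyses commonly use, written in pure Python; it is not presented as that script's method. Like any plain hypergeometric test it ignores gene length, so long genes, which are more likely to overlap a DMR or PAV, can drive enrichment; that bias is presumably what the gene-length caveat refers to. The function names and the `gene2go` mapping (gene ID to set of GO IDs) are hypothetical.

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """P(X >= k) for X ~ Hypergeometric(M genes, n annotated to a term, N drawn)."""
    hi = min(n, N)
    return sum(comb(n, i) * comb(M - n, N - i) for i in range(k, hi + 1)) / comb(M, N)


def bh_fdr(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def go_enrichment(study, background, gene2go):
    """One-sided over-representation test for every GO term hit by the study set.

    Returns (term, k_study, n_background, p, fdr) tuples sorted by p-value.
    """
    background = set(background)
    study = set(study) & background  # study genes must be drawn from the background
    M, N = len(background), len(study)
    terms = sorted({t for g in study for t in gene2go.get(g, ())})
    rows = []
    for term in terms:
        annotated = {g for g in background if term in gene2go.get(g, ())}
        k = len(study & annotated)
        rows.append((term, k, len(annotated), hypergeom_sf(k, M, len(annotated), N)))
    fdrs = bh_fdr([r[3] for r in rows])
    return sorted((r + (q,) for r, q in zip(rows, fdrs)), key=lambda r: r[3])


if __name__ == "__main__":
    background = [f"g{i}" for i in range(1, 11)]
    gene2go = {"g1": {"GO:A"}, "g2": {"GO:A"}, "g3": {"GO:A"}, "g4": {"GO:B"}}
    for row in go_enrichment(["g1", "g2", "g3"], background, gene2go):
        print(row)
```

The background should be the annotated gene universe (here, genes with at least one GO term), not the whole genome, or enrichment p-values will be too optimistic.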
Caveat: the reference is a lean-background doubled-haploid genome, so siscowet-specific deletions are inflated by sequence divergence and are not directly comparable in magnitude to lean-specific deletions. All gene links are associations on a single reference, not validated mechanisms.
Analysis Files:
- code/18-diff-annotation-phenotype-plan.md - Plan of work
- code/18-build-gene-function-table.py - Annotation backbone
- code/18.1-assign-features-to-genes.py - DMR/PAV → gene assignment
- code/18.2-integrate-candidates.py - Ranked candidate integration
- code/18.3-go-enrichment.py - GO over-representation
- code/18-diff-annotation-phenotype.Rmd - Phenotype synthesis report
- analyses/18-annotation/README.md - Outputs & provenance
project-lake-trout/
├── index.html / index.qmd # Project website landing page (GitHub Pages)
├── code/ # Analysis scripts and notebooks
│   ├── 01-ballgown-analysis.Rmd # RNAseq differential expression
│   ├── 02-gene-explore.qmd # Gene exploration
│   ├── 04-pacbio/ # PacBio HiFi analysis workflow
│   ├── 05-pacbio-align.Rmd # PacBio alignment
│   ├── 07-pacbio-QC.Rmd # PacBio quality control
│   ├── 10-mCG-call.Rmd # Methylation calling
│   ├── 11-pav.Rmd # PAV analysis
│   ├── 13.3-hifiasm-differential-methylation-plan.md # Plan for hifiasm-based differential methylation
│   ├── 14-diff-meth.Rmd/py # Differential methylation
│   ├── 15-diff-pav.py # Differential PAV
│   └── 18-* # Annotation, candidate integration, GO, phenotype
├── data/ # Raw data and metadata
│   ├── SraRunTable.csv # RNAseq sample information
│   ├── ballgown-metadata.csv # Ballgown metadata
│   └── *.bed # Gene annotations
├── analyses/ # Analysis outputs and results
│   ├── DEG-*.csv # Differentially expressed genes
│   ├── DET-*.csv # Differentially expressed transcripts
│   ├── 04-pacbio/ # PacBio analysis outputs
│   ├── 14-diff-meth/ # Methylation results
│   └── 15-diff-pav/ # PAV results
├── genome-browser/ # Interactive IGV.js genome browser
│   ├── index.html # Browser interface
│   ├── prepare_data.py # Data preparation script
│   └── data/ # Browser data files
├── jbrowse/ # JBrowse 2 genome browser
└── figures/ # Generated figures and plots
See README files in each subdirectory for detailed information about specific analyses.
- Lean Nonparasitized: NPLL32, NPLL34, NPLL44, NPLL46, NPLL56, NPLL61
- Lean Parasitized: PLL20, PLL31, PLL43, PLL55, PLL59, PLL62
- Siscowet Nonparasitized: NPSL15, NPSL24, NPSL29, NPSL36, NPSL50, NPSL58
- Siscowet Parasitized: PSL13, PSL16, PSL35, PSL49, PSL53, PSL63
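The RNAseq sample IDs appear to encode group membership in their prefixes: NP or P for nonparasitized or parasitized, then L or S for lean or siscowet, then a trailing L. Assuming that convention holds for every sample (worth confirming against data/SraRunTable.csv), group labels such as those needed for a metadata table can be derived with a small parser. `parse_sample` is a hypothetical helper, not part of the repository.

```python
import re
from collections import Counter

# Assumed convention, inferred from the ID lists above: (NP|P) + (L|S) + "L" + number
SAMPLE_RE = re.compile(r"^(NP|P)([LS])L(\d+)$")


def parse_sample(sample_id):
    """Map an RNAseq sample ID such as 'NPSL15' to ecotype and parasite status."""
    m = SAMPLE_RE.match(sample_id)
    if m is None:
        raise ValueError(f"unrecognised sample ID: {sample_id!r}")
    status, ecotype, number = m.groups()
    return {
        "sample": sample_id,
        "ecotype": "lean" if ecotype == "L" else "siscowet",
        "parasitized": status == "P",
        "number": int(number),
    }


if __name__ == "__main__":
    ids = ["NPLL32", "PLL20", "NPSL15", "PSL13"]
    groups = Counter((p["ecotype"], p["parasitized"]) for p in map(parse_sample, ids))
    print(groups)
```

Raising on an unrecognised ID, rather than guessing a group, keeps a mislabelled sample from silently entering the wrong contrast.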
See data/SraRunTable.csv for complete RNAseq sample metadata.
- Lean: bc2041, bc2068, bc2069, bc2070
- Siscowet: bc2071, bc2072, bc2073, bc2096
The following data processing steps were performed prior to analyses in this repository:
- R/RStudio: Statistical analysis and visualization
  - Ballgown: Differential expression analysis
  - tidyverse: Data manipulation
- Python: Data processing and analysis pipelines
  - pysam, pandas, numpy: Data manipulation
  - modbampy: Modified base parsing
- PacBio Tools: HiFi sequencing analysis
  - pbmm2: Read alignment
  - pb-CpG-tools: Methylation calling
- IGV.js: Interactive genome visualization
- Quarto/RMarkdown: Reproducible analysis notebooks
If you use data or methods from this repository, please cite:
- Project website: robertslab.github.io/project-lake-trout
- Lake Trout RNAseq data: NCBI BioProject PRJNA316738
- Reference genome: GCF_016432855.1 (SaNama_1.0)
Roberts Lab
School of Aquatic and Fishery Sciences
University of Washington
For questions or issues, please open a GitHub issue in this repository.