RobertsLab/project-lake-trout

Lake Trout Genomics: Comparative Analysis of Lean and Siscowet Ecotypes

🌐 Project website: robertslab.github.io/project-lake-trout, which hosts the annotation & phenotype landing page, top candidate genes, GO enrichment, interpretation guardrails, and links to both interactive genome browsers.

Overview

Comprehensive genomic analysis of Salvelinus namaycush (lake trout) comparing two distinct ecotypes: lean and siscowet. This repository contains multiple integrated analyses including:

  • RNAseq differential expression analysis using parasitized/non-parasitized liver tissue
  • PacBio HiFi DNA methylation profiling and differential methylation analysis
  • Presence-Absence Variation (PAV) analysis to identify structural genomic variations
  • Functional annotation & phenotype interpretation linking DMRs and PAVs to candidate genes
  • Interactive genome browsers (IGV.js and JBrowse 2) for visualizing genomic features

Reference Genome

  • Assembly: GCF_016432855.1 (SaNama_1.0)
  • Species: Salvelinus namaycush (Lake Trout)
  • Source: NCBI Genome

Key Analyses

1. RNAseq Differential Expression

Analysis of liver RNAseq data from parasitized and non-parasitized samples (NCBI BioProject PRJNA316738) to identify:

  • Differentially expressed genes (DEGs) between ecotypes
  • Differentially expressed transcripts (DETs) and alternative isoforms
  • Expression differences related to parasite status

Key Results:

  • 202 differentially expressed transcripts (p < 0.05)
  • Analysis performed using Ballgown
  • See analyses/README.md for detailed results

Analysis Files:

2. PacBio HiFi DNA Methylation Analysis

Whole-genome DNA methylation profiling using PacBio HiFi sequencing with 5mC modification calling:

  • Sample-level methylation profiles for both ecotypes
  • Differential methylation analysis between lean and siscowet
  • Identification of Differentially Methylated Regions (DMRs)

Key Results:

  • 540,040 CpG sites tested
  • 4,440 nominally significant differentially methylated cytosines (DMCs, unadjusted p < 0.05)
  • 302 Differentially Methylated Regions (DMRs)
    • 20 hypermethylated in siscowet
    • 282 hypomethylated in siscowet

Analysis Files:

3. Presence-Absence Variation (PAV)

Genome-wide structural variation analysis identifying insertions and deletions between ecotypes:

  • Coverage-based detection of absent regions (deletions)
  • CIGAR-based detection of novel insertions
  • Ecotype-specific and shared structural variants
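
The two detection ideas above can be illustrated with a minimal sketch. This is not the code in `11-pav.Rmd` or `15-diff-pav.py`; the function names, the 50 bp minimum length, and the zero-depth criterion are all assumptions chosen for illustration. It parses a CIGAR string for long insertions and scans a per-base depth vector for runs with no coverage.

```python
import re
from typing import List, Tuple

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that advance the position on the reference sequence.
REF_CONSUMING = set("MDN=X")


def cigar_insertions(ref_start: int, cigar: str, min_len: int = 50) -> List[Tuple[int, int]]:
    """Return (reference_position, length) for each insertion of at least min_len bases.

    min_len=50 is an illustrative threshold, not a value taken from the project.
    """
    pos = ref_start
    found = []
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op == "I" and n >= min_len:
            found.append((pos, n))
        if op in REF_CONSUMING:
            pos += n
    return found


def zero_coverage_runs(depth: List[int], min_len: int = 50) -> List[Tuple[int, int]]:
    """Return half-open (start, end) intervals where depth is 0 for at least min_len bases."""
    runs = []
    start = None
    for i, d in enumerate(depth):
        if d == 0 and start is None:
            start = i  # a zero-depth run begins here
        elif d != 0 and start is not None:
            if i - start >= min_len:
                runs.append((start, i))
            start = None
    # Close a run that extends to the end of the vector.
    if start is not None and len(depth) - start >= min_len:
        runs.append((start, len(depth)))
    return runs
```

In practice the CIGAR strings and depth values would come from the aligned BAM files (for example via pysam, which is listed under Technologies Used), and a real caller would likely use a depth threshold relative to the sample mean rather than strict zero.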

Key Results:

  • Lean-specific: 996,228 variants (770,891 insertions + 225,337 deletions)
  • Siscowet-specific: 1,332,705 variants (1,086,799 insertions + 245,906 deletions)
  • Shared: 878,372 variants common to both ecotypes
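
The split into lean-specific, siscowet-specific, and shared variants amounts to set operations once each variant has a comparable key. The sketch below assumes exact-coordinate matching on a hypothetical `(chrom, start, end, kind)` tuple; the project may instead match variants by reciprocal overlap or another tolerance, which this sketch does not attempt.

```python
from typing import Dict, Set, Tuple

# Hypothetical variant key: (chrom, start, end, kind), where kind is "INS" or "DEL".
Variant = Tuple[str, int, int, str]


def partition_variants(lean: Set[Variant], siscowet: Set[Variant]) -> Dict[str, Set[Variant]]:
    """Split two ecotype variant sets into ecotype-specific and shared groups."""
    return {
        "lean_specific": lean - siscowet,
        "siscowet_specific": siscowet - lean,
        "shared": lean & siscowet,
    }


if __name__ == "__main__":
    lean = {("chr1", 100, 160, "INS"), ("chr1", 500, 900, "DEL")}
    siscowet = {("chr1", 500, 900, "DEL"), ("chr2", 40, 95, "INS")}
    for group, variants in partition_variants(lean, siscowet).items():
        print(group, len(variants))
```

The reported totals are internally consistent with this kind of partition: 770,891 + 225,337 = 996,228 lean-specific variants and 1,086,799 + 245,906 = 1,332,705 siscowet-specific variants.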

Analysis Files:

4. Interactive Genome Browsers

Web-based genome browsers for exploring PAV and methylation data across the SaNama_1.0 assembly.

Features:

  • Visualize ecotype-specific insertions and deletions
  • View differential methylation tracks (CpG methylation, 8 samples; DMRs)
  • Gene annotations with interactive navigation
  • Mobile-responsive design

Live Demos:

Documentation: genome-browser/README.md

5. Functional Annotation & Phenotype Interpretation

The interpretive layer that turns DMR and PAV coordinates into genes and plausible ecotype phenotypes. A genome-wide RefSeq annotation backbone is built, differential features are assigned to genes with positional context, candidates are ranked, GO over-representation is tested, and the results are synthesized into hypothesized phenotype axes.
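
As a rough illustration of the feature-to-gene assignment step, the sketch below links a DMR or PAV interval to every gene within a fixed window and records a simple positional context. The `Gene` class, the two context labels, and the linear scan are assumptions made for this example; the scripts under `code/18-*` may use finer categories (promoter, exon, intron) and an indexed lookup.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Gene:
    name: str   # hypothetical gene identifier
    chrom: str
    start: int  # 0-based, half-open coordinates
    end: int


def interval_gap(a_start: int, a_end: int, b_start: int, b_end: int) -> int:
    """Gap in bp between two half-open intervals; 0 if they overlap or abut."""
    return max(0, b_start - a_end, a_start - b_end)


def assign_genes(chrom: str, start: int, end: int, genes: List[Gene],
                 window: int = 5000) -> List[Tuple[str, str, int]]:
    """Return (gene_name, context, distance) for genes within `window` bp of a feature."""
    hits = []
    for g in genes:
        if g.chrom != chrom:
            continue
        dist = interval_gap(start, end, g.start, g.end)
        if dist > window:
            continue
        overlaps = start < g.end and g.start < end
        hits.append((g.name, "overlapping" if overlaps else "flanking", dist))
    # Nearest genes first.
    return sorted(hits, key=lambda h: h[2])
```

For example, with hypothetical genes at `chr1:1000-2000` and `chr1:8000-9000`, a feature at `chr1:4000-4500` is assigned to both as flanking (2,000 bp and 3,500 bp away), while a feature at `chr1:1500-1600` is assigned only to the first as overlapping.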

Key Results:

  • 46,359 genes annotated (46,231 with a product, 34,367 with ≥1 GO term)
  • 2,036 candidate genes within 5 kb of a DMR/DMC/stringent-PAV; 4 convergent (DMR and stringent siscowet deletion), led by znf883-like (LOC120032414)
  • Exonic siscowet deletions in lipid-metabolism genes (angptl5, mogat2, epoxide hydrolase 1)
  • Most defensible GO enrichment (deletion set): calcium ion transport (FDR 3×10⁻³), with gene-length and lean-reference caveats carried throughout
  • 0 DMCs survive q < 0.1, so interpretation leads with DMR-level and stringent-PAV sets
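
To see how thousands of sites can pass p < 0.05 while none pass q < 0.1, it helps to look at how a false-discovery-rate adjustment scales with the number of tests. The sketch below implements the Benjamini-Hochberg procedure in plain Python. Whether the project's q-values come from Benjamini-Hochberg or another method is not stated here, so treat this as an assumption.

```python
from typing import List


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


if __name__ == "__main__":
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))
```

Under this procedure, with 540,040 CpG sites tested, the top-ranked site would need a p-value of about 0.1 / 540,040 ≈ 1.9×10⁻⁷ or smaller to reach q < 0.1 on its own, which is far stricter than p < 0.05.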

Caveat: the reference is a lean-background doubled-haploid genome, so siscowet-specific deletions are divergence-inflated and not magnitude-comparable to lean. All links are associations on a single reference, not validated mechanisms.

Analysis Files:


Repository Structure

project-lake-trout/
├── index.html / index.qmd  # Project website landing page (GitHub Pages)
├── code/                    # Analysis scripts and notebooks
│   ├── 01-ballgown-analysis.Rmd       # RNAseq differential expression
│   ├── 02-gene-explore.qmd            # Gene exploration
│   ├── 04-pacbio/                     # PacBio HiFi analysis workflow
│   ├── 05-pacbio-align.Rmd            # PacBio alignment
│   ├── 07-pacbio-QC.Rmd               # PacBio quality control
│   ├── 10-mCG-call.Rmd                # Methylation calling
│   ├── 11-pav.Rmd                     # PAV analysis
│   ├── 13.3-hifiasm-differential-methylation-plan.md # Plan for hifiasm-based differential methylation
│   ├── 14-diff-meth.Rmd/py            # Differential methylation
│   ├── 15-diff-pav.py                 # Differential PAV
│   └── 18-*                           # Annotation, candidate integration, GO, phenotype
├── data/                    # Raw data and metadata
│   ├── SraRunTable.csv                # RNAseq sample information
│   ├── ballgown-metadata.csv          # Ballgown metadata
│   └── *.bed                          # Gene annotations
├── analyses/                # Analysis outputs and results
│   ├── DEG-*.csv                      # Differentially expressed genes
│   ├── DET-*.csv                      # Differentially expressed transcripts
│   ├── 04-pacbio/                     # PacBio analysis outputs
│   ├── 14-diff-meth/                  # Methylation results
│   └── 15-diff-pav/                   # PAV results
├── genome-browser/          # Interactive IGV.js genome browser
│   ├── index.html                     # Browser interface
│   ├── prepare_data.py                # Data preparation script
│   └── data/                          # Browser data files
├── jbrowse/                 # JBrowse 2 genome browser
└── figures/                 # Generated figures and plots

See README files in each subdirectory for detailed information about specific analyses.


Sample Information

RNAseq Samples

  • Lean Nonparasitized: NPLL32, NPLL34, NPLL44, NPLL46, NPLL56, NPLL61
  • Lean Parasitized: PLL20, PLL31, PLL43, PLL55, PLL59, PLL62
  • Siscowet Nonparasitized: NPSL15, NPSL24, NPSL29, NPSL36, NPSL50, NPSL58
  • Siscowet Parasitized: PSL13, PSL16, PSL35, PSL49, PSL53, PSL63

See data/SraRunTable.csv for complete RNAseq sample metadata.

PacBio HiFi Samples (for methylation and PAV analysis)

  • Lean: bc2041, bc2068, bc2069, bc2070
  • Siscowet: bc2071, bc2072, bc2073, bc2096

Pre-Analysis Data Processing

The following data processing steps were performed prior to analyses in this repository:


Technologies Used

  • R/RStudio: Statistical analysis and visualization
    • Ballgown: Differential expression analysis
    • tidyverse: Data manipulation
  • Python: Data processing and analysis pipelines
    • pysam, pandas, numpy: Data manipulation
    • modbampy: Modified base parsing
  • PacBio Tools: HiFi sequencing analysis
    • pbmm2: Read alignment
    • pb-CpG-tools: Methylation calling
  • IGV.js: Interactive genome visualization
  • Quarto/RMarkdown: Reproducible analysis notebooks

Citation

If you use data or methods from this repository, please cite:


Contact

Roberts Lab
School of Aquatic and Fishery Sciences
University of Washington

For questions or issues, please open a GitHub issue in this repository.