This directory contains research and documentation of public datasets with RNA-seq and DNA methylation data for the Pacific oyster, Crassostrea gigas (also listed under the updated genus name, Magallana gigas).
- code/ - Scripts for downloading the identified datasets (see code/HOW-TO.md)
- ncbi-datasets/ - Documentation of datasets found in NCBI databases
- literature-review/ - Information from published studies
- dataset-summaries/ - Consolidated summaries of identified datasets
- file-size-estimates/ - Estimates of cumulative file sizes for datasets
This research focuses on identifying publicly available datasets containing:
- RNA-seq data
- DNA methylation data (including bisulfite sequencing, MeDIP-seq, etc.)
For each dataset, we aim to collect:
- Number of samples
- Tissue types analyzed
- Environmental conditions
- Estimated cumulative file size
- Access information (SRA accessions, GEO series, etc.)
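The per-dataset fields listed above could be held in a small record type. The sketch below is illustrative only: the class name `DatasetRecord`, its field names, and the helper `total_size_gb` are assumptions, not the schema used in this repository, and the demo values are placeholders rather than real datasets or accessions.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """One public dataset. Field names are hypothetical, not the repo's schema."""
    name: str
    data_type: str                      # e.g. "RNA-seq" or "WGBS"
    n_samples: int
    tissues: list[str] = field(default_factory=list)
    conditions: list[str] = field(default_factory=list)
    est_size_gb: float = 0.0            # estimated cumulative file size
    accessions: list[str] = field(default_factory=list)  # SRA/GEO identifiers


def total_size_gb(records: list[DatasetRecord]) -> float:
    """Sum the estimated cumulative file size across datasets."""
    return sum(r.est_size_gb for r in records)


if __name__ == "__main__":
    # Placeholder values only; these are not real datasets or accessions.
    demo = [
        DatasetRecord("example_a", "RNA-seq", 12, ["gill"], ["control"], 40.0, ["PLACEHOLDER_1"]),
        DatasetRecord("example_b", "WGBS", 6, ["mantle"], ["low pH"], 95.5, ["PLACEHOLDER_2"]),
    ]
    print(f"{len(demo)} datasets, about {total_size_gb(demo):.1f} GB in total")
```

Keeping the size estimate on each record makes it straightforward to total the storage needed before any download starts.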
Searches use the following species names:
- Crassostrea gigas
- Magallana gigas (updated genus classification)
Searches use the following data type keywords:
- RNA-seq
- Transcriptome
- Bisulfite sequencing
- DNA methylation
- MeDIP-seq
- WGBS (Whole Genome Bisulfite Sequencing)
- RRBS (Reduced Representation Bisulfite Sequencing)
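One way to combine the species names and data type keywords above into a single boolean search string is sketched here. The function name `build_query` and the exact query shape are assumptions made for illustration; the searches behind this repository may have been run differently, for example one keyword at a time or through a web interface.

```python
SPECIES = ["Crassostrea gigas", "Magallana gigas"]
KEYWORDS = [
    "RNA-seq",
    "Transcriptome",
    "Bisulfite sequencing",
    "DNA methylation",
    "MeDIP-seq",
    "WGBS",
    "RRBS",
]


def build_query(species, keywords):
    """Return '(sp1 OR sp2) AND (kw1 OR kw2 ...)' with every term quoted."""
    species_part = " OR ".join(f'"{s}"' for s in species)
    keyword_part = " OR ".join(f'"{k}"' for k in keywords)
    return f"({species_part}) AND ({keyword_part})"


if __name__ == "__main__":
    print(build_query(SPECIES, KEYWORDS))
```

Searching both genus names matters because records deposited before and after the reclassification may be labeled differently.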
The code/ directory contains a downloader that fetches the identified DNA
methylation datasets from NCBI SRA. Sequencing runs are discovered live from
NCBI (no accession numbers are hardcoded), so the downloaded files are the actual runs currently listed at NCBI.
cd code
./install_dependencies.sh # one-time: install SRA Toolkit
python3 download_methylation_data.py --list # see available datasets
python3 download_methylation_data.py --dataset wgbs_roberts --dry-run # preview safely
python3 download_methylation_data.py --dataset wgbs_roberts # download
New to this? Start with the step-by-step code/HOW-TO.md.
For all options and troubleshooting, see code/USAGE.md.
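To illustrate what live discovery from NCBI can look like, here is a minimal sketch that queries the NCBI E-utilities `esearch` endpoint for SRA records. It is not the implementation in download_methylation_data.py, which may work differently; the function names `esearch_url`, `parse_ids`, and `discover` are hypothetical. Note that `esearch` returns internal SRA record IDs, so mapping them to run accessions would need a further step that is not shown.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(term: str, retmax: int = 100) -> str:
    """Build an esearch request URL against the SRA database."""
    params = {"db": "sra", "term": term, "retmode": "json", "retmax": retmax}
    return f"{ESEARCH}?{urlencode(params)}"


def parse_ids(payload: str) -> list[str]:
    """Extract the list of record IDs from an esearch JSON response."""
    result = json.loads(payload).get("esearchresult", {})
    return list(result.get("idlist", []))


def discover(term: str, retmax: int = 100) -> list[str]:
    """Query NCBI live and return the matching SRA record IDs."""
    with urlopen(esearch_url(term, retmax), timeout=30) as resp:
        return parse_ids(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Requires network access; the search term is only an example.
    print(discover('"Crassostrea gigas" AND "WGBS"', retmax=5))
```

Separating URL construction and response parsing from the network call keeps both pieces testable offline.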
- Created: December 2024
- Updated: June 2026, added data downloader (code/)