fix: remove STAR protrude params #261
Codecov Report
✅ All modified and coverable lines are covered by tests.

| Coverage Diff | main | #261 | +/- |
|---|---|---|---|
| Coverage | 87.50% | 87.50% | |
| Files | 1 | 1 | |
| Lines | 56 | 56 | |
| Hits | 49 | 49 | |
| Misses | 7 | 7 | |
Waiting on confirmation from Bahman Afsari that this fixed his error.
Pull request overview
This PR aims to prevent RSEM quantification failures on custom references with short transcripts (e.g., HPV) by removing STAR alignment parameters that can allow alignments to protrude past transcript boundaries.
Changes:
- Removed --alignEndsProtrude / --peOverlapNbasesMin from some STAR invocations in the single-end and paired-end Snakemake rules.
- Added a changelog entry describing the user-facing fix and linking to the PR.
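To make the failure mode in the overview concrete, here is a minimal sketch of the boundary condition: an alignment whose interval extends past the end of a short transcript. The `Alignment` record, the `overhanging` helper, and the transcript name `short_tx` are all hypothetical and only illustrate the idea; this is not RSEM's or STAR's actual logic.

```python
from typing import NamedTuple


class Alignment(NamedTuple):
    """Simplified, hypothetical transcript-space alignment record."""
    read_name: str
    transcript: str
    start: int     # 0-based start on the transcript
    ref_span: int  # number of transcript bases the alignment covers


def overhanging(alignments, transcript_lengths):
    """Return alignments that start before 0 or end past the transcript length."""
    bad = []
    for aln in alignments:
        length = transcript_lengths[aln.transcript]
        end = aln.start + aln.ref_span  # exclusive end coordinate
        if aln.start < 0 or end > length:
            bad.append(aln)
    return bad


# Illustrative data only: one short transcript and two 100-base alignments.
lengths = {"short_tx": 150}
alns = [
    Alignment("read1", "short_tx", 10, 100),  # covers 10..110, fits
    Alignment("read2", "short_tx", 60, 100),  # covers 60..160, past 150
]
print([a.read_name for a in overhanging(alns, lengths)])  # ['read2']
```

The shorter the transcript, the larger the fraction of reads that land near an end, which may be why the problem surfaced on a reference with very short transcripts.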
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| workflow/rules/single-end.smk | Removes protrude-related STAR params for the star_basic path. |
| workflow/rules/paired-end.smk | Removes protrude-related STAR params in specific STAR calls (but not all relevant branches). |
| CHANGELOG.md | Documents the fix as a user-facing change. |
    --sjdbGTFfile {params.gtffile} \
    --limitSjdbInsertNsj {params.nbjuncs} \
    --quantMode TranscriptomeSAM GeneCounts \
    --outSAMtype BAM SortedByCoordinate \
    --alignEndsProtrude 10 ConcordantPair \
    --peOverlapNbasesMin 10 \
    --outTmpDir=${{tmp}}/STARtmp_{wildcards.name} \
    --sjdbOverhang ${{readlength}}
This change removes --alignEndsProtrude / --peOverlapNbasesMin for the star_basic branch, but the same parameters are still present in the multi-sample 2-pass STAR path in this file (e.g., in star1p at ~493–494 and star2p at ~613–614). That means the RSEM “hung over the end of transcript” failure can still occur when options.star_2_pass_basic is false. Please remove these parameters consistently from the other STAR invocations as well (or gate them behind a config option if they’re still needed in some modes).
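If the parameters turn out to be needed in some modes, the gating option suggested above could look roughly like the sketch below. The config key `star_allow_protrude` and the helper `star_protrude_args` are hypothetical names chosen for illustration; they do not exist in the pipeline. The flag values are the ones shown in the diff context.

```python
def star_protrude_args(config):
    """Return the protrude-related STAR arguments, or "" when gated off.

    `star_allow_protrude` is a hypothetical config key; it defaults to False
    so that RSEM-compatible behavior is the default.
    """
    if not config.get("star_allow_protrude", False):
        return ""
    return "--alignEndsProtrude 10 ConcordantPair --peOverlapNbasesMin 10"


print(repr(star_protrude_args({})))
print(star_protrude_args({"star_allow_protrude": True}))
```

In a Snakemake rule, the returned string could be passed through `params` (for example `protrude=star_protrude_args(config)`) and interpolated into each STAR command as `{params.protrude}`, so that every invocation follows the same switch.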
    --readFilesIn {input.file1} {input.file2} \
    --readFilesCommand zcat \
    --runThreadN {threads} \
    --outFileNamePrefix {params.prefix}. \
    --outSAMtype BAM Unsorted \
    --alignEndsProtrude 10 ConcordantPair \
    --peOverlapNbasesMin 10 \
    --sjdbGTFfile {params.gtffile} \
    --outTmpDir=${{tmp}}/STARtmp_{wildcards.name} \
The protrude-related STAR params are removed in this hunk, but --alignEndsProtrude 10 ConcordantPair and --peOverlapNbasesMin 10 are still present in the star_basic command used when options.star_2_pass_basic is enabled (see same file around ~384–385). If users run paired-end with star_2_pass_basic: true, they may still hit the RSEM boundary issue described in #260. Please remove these parameters from the star_basic STAR call as well (and consider also whether the Arriba STAR call at ~689 should keep --peOverlapNbasesMin).
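Both review comments come down to leftover occurrences of the two flags, so a quick scan of the rule files can confirm whether any remain. The following is a minimal sketch, assuming the rules live in `workflow/rules` as listed in the reviewed-files table; the helper names `find_flags` and `scan_rules` are made up for this example.

```python
import re
from pathlib import Path

FLAGS = ("--alignEndsProtrude", "--peOverlapNbasesMin")
FLAG_RE = re.compile("|".join(re.escape(flag) for flag in FLAGS))


def find_flags(text):
    """Return (line_number, flag) pairs for every flag occurrence in text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in FLAG_RE.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits


def scan_rules(rules_dir="workflow/rules"):
    """Scan each .smk file in rules_dir; return {path: hits} for files with hits."""
    report = {}
    for path in sorted(Path(rules_dir).glob("*.smk")):
        hits = find_flags(path.read_text())
        if hits:
            report[str(path)] = hits
    return report


if __name__ == "__main__":
    # Prints nothing when no rule file still contains either flag.
    for path, hits in scan_rules().items():
        for lineno, flag in hits:
            print(f"{path}:{lineno}: {flag}")
```

Run from the repository root, this lists each remaining occurrence with its file and line number. Whether a given hit should be removed (for instance in the Arriba STAR call) is still a judgment call for the reviewers.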
Bahman confirmed this version works for him.
Changes
Remove STAR parameters --alignEndsProtrude and --peOverlapNbasesMin, which cause reads to protrude over transcript boundaries. This was noticed by a user with a custom genome containing HPV, which has very short transcripts.
Issues
fixes #260
PR Checklist
(Strikethrough any points that are not applicable.)
- [ ] Update docs if there are any API changes.
- Update CHANGELOG.md with a short description of any user-facing changes and reference the PR number. Guidelines: https://keepachangelog.com/en/1.1.0/