salmon seems to remove transcripts #1500

jedi-bogate · 2025-01-31T13:56:48Z

Hi,

I'm running a pipeline for unique mapping in order to map TEs to the reference genome, and I'm quantifying by transcript ID. When I run the pipeline, some of the transcripts that are present in the annotation file are missing from the salmon quantification output, so salmon seems to have removed them.

I read that this happens because salmon removes duplicates; however, I'm unable to find the duplicate_clusters.tsv file that is supposed to be generated when that happens. I have also tried supplying the --keepDuplicates option to salmon index and salmon quant, but neither worked.

I also tried running the pipeline with star_rsem instead of star_salmon. The same thing seems to be happening as with salmon. Perhaps it isn't a quantification problem.

What are my options here? Should I just remove the transcripts that don't show up in salmon.merged.transcript_counts.tsv from the annotation file?
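To make the comparison concrete, here is a minimal sketch of how the missing transcripts could be listed. It assumes the annotation is a GTF whose ninth column carries a `transcript_id "..."` attribute, and that the first column of salmon.merged.transcript_counts.tsv holds the transcript ID after a single header row. The function names are hypothetical helpers, not part of salmon or the pipeline.

```python
import csv
import re

# Matches the transcript_id attribute in a GTF attribute column (assumed format).
TRANSCRIPT_ID_RE = re.compile(r'transcript_id "([^"]+)"')


def gtf_transcript_ids(lines):
    """Collect every transcript_id found in the attribute column of GTF lines."""
    ids = set()
    for line in lines:
        if line.startswith("#"):
            continue  # skip GTF comment/header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a well-formed GTF record
        match = TRANSCRIPT_ID_RE.search(fields[8])
        if match:
            ids.add(match.group(1))
    return ids


def quantified_ids(lines):
    """Collect IDs from the first column of a tab-separated counts table."""
    reader = csv.reader(lines, delimiter="\t")
    next(reader, None)  # skip the header row (assumed to be present)
    return {row[0] for row in reader if row}


def missing_transcripts(gtf_lines, counts_lines):
    """Return annotation transcript IDs that are absent from the counts table."""
    return sorted(gtf_transcript_ids(gtf_lines) - quantified_ids(counts_lines))


# Example usage (file paths are placeholders):
# with open("annotation.gtf") as gtf, open("salmon.merged.transcript_counts.tsv") as tsv:
#     for transcript_id in missing_transcripts(gtf, tsv):
#         print(transcript_id)
```

Listing the missing IDs this way might show whether they share something in common (for example, a particular feature type or contig) before deciding to drop them from the annotation.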

Thank you in advance for your answer! I'm not sure whether this issue has already been raised or resolved (sorry if it has); I haven't been able to find a solution online.

These are the line counts for the annotation and each of the quantification output files:
