Hi, a big thank you to the developers of this useful and versatile tool. I am using sourmash to detect and remove undesired prokaryotes from WGS datasets in my de novo assembly pipeline for prokaryotic genomes. I was thinking I could use it in a similar way to detect and remove viral and eukaryotic contamination. My first thought was to build a sketch file for NCBI's RefSeq database containing genomes from microbial viruses and microbial eukaryotes. However, I have two doubts:
First, I am not sure whether sourmash is suitable for this kind of analysis across such distant lineages.
Second, I am not sure of the computational requirements to run such an analysis. I run my pipeline on a standard desktop PC (16 GB RAM), and I don't even know whether building the sketch file on my PC will be possible.
If you have any thoughts or recommendations I would greatly appreciate them.
hi @asaldivar93, the short answer is that it should work technically, within the memory constraints you have, but may not be able to do what you want scientifically!
We also have slightly out-of-date NCBI databases for GenBank fungi, protozoa, and viruses that I'd be happy to make available to you. They're actually pretty small. (We're working on building new & updated ones, too; I'll take your issue as a vote to work harder on that :)
Scientifically, you're going to run into two problems in using sourmash for this:
First, the sets of known genomes, especially those in NCBI, aren't really that complete for euks/viruses.
Second, sourmash's ability to reach across evolutionary distance to detect matches is not great. At the moment it's effectively a database lookup tool, so if you don't have the right species in the database, you're not going to find it as a contaminant.
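To illustrate the second point, here is a toy sketch in plain Python. It does not use the sourmash API and it compares exact k-mer sets rather than sourmash's subsampled sketches; the helper names (`kmers`, `containment`, `mutate`) and the random 20 kb "genome" are made up for this example. It assumes k=31, one of the k-mer sizes commonly used with sourmash for DNA, and shows roughly how fast shared k-mers disappear as point substitutions accumulate.

```python
import random


def kmers(seq, k=31):
    """Return the set of all overlapping k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def containment(query, reference, k=31):
    """Fraction of the query's k-mers that also occur in the reference."""
    q = kmers(query, k)
    if not q:
        return 0.0
    return len(q & kmers(reference, k)) / len(q)


def mutate(seq, rate, rng):
    """Substitute each base with a different base with probability `rate`."""
    out = []
    for base in seq:
        if rng.random() < rate:
            out.append(rng.choice([b for b in "ACGT" if b != base]))
        else:
            out.append(base)
    return "".join(out)


rng = random.Random(42)
# Hypothetical reference: a random 20 kb sequence standing in for a database genome.
genome = "".join(rng.choice("ACGT") for _ in range(20000))

# Containment of increasingly diverged "relatives" in the reference.
results = {}
for rate in (0.0, 0.01, 0.05, 0.20):
    relative = mutate(genome, rate, rng)
    results[rate] = containment(relative, genome)
    print(f"substitution rate {rate:.2f}: containment {results[rate]:.3f}")
```

A k-mer only survives if none of its 31 bases is changed, so under this simple model the expected shared fraction is about (1 - rate)^31. That drops very quickly with divergence, which is why a contaminant that is only distantly related to anything in the database tends to produce no match at all.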