classify plant reads in metagenomes #3172
Comments
Hi Gabri, sorry for ignoring your issue for so long 😭 Short version - we don't have anything formal for plants, BUT if you can find a listing of all the things you want - maybe an assembly_summary file? - we can put together a recipe for sketching it quickly. Sound good? |
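As a rough sketch of the listing step mentioned above, an NCBI `assembly_summary` file (tab-separated, with `#`-prefixed header lines) could be reduced to an accession/name table like this. The helper names `summary_to_rows` and `rows_to_csv`, and the two-column `accession,name` output layout, are assumptions for illustration; the thread does not confirm the exact input format the sketching recipe expects.

```python
import csv
import io


def summary_to_rows(summary_text):
    """Return (accession, name) pairs parsed from assembly_summary text.

    Hypothetical helper: assumes the column header is a '#'-prefixed,
    tab-separated line containing 'assembly_accession' and 'organism_name'.
    """
    header = None
    rows = []
    for line in summary_text.splitlines():
        if line.startswith("#"):
            # Comment lines; one of them carries the column names.
            fields = line.lstrip("#").strip().split("\t")
            if "assembly_accession" in fields:
                header = fields
            continue
        if not line.strip() or header is None:
            continue
        record = dict(zip(header, line.split("\t")))
        acc = record["assembly_accession"]
        rows.append((acc, f"{acc} {record['organism_name']}"))
    return rows


def rows_to_csv(rows):
    """Serialize the pairs as CSV text (assumed 'accession,name' layout)."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["accession", "name"])
    writer.writerows(rows)
    return out.getvalue()
```

The resulting CSV text can be written to disk and adapted to whatever columns the sketching tool actually requires.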
Hi Titus, Thanks a lot again! |
Hi Titus, Any chance of further progress on this? I would love to help if needed. Thanks! Gabri |
Hi Gabri, I went ahead and made this database for you. You can download the files here:
Lineage spreadsheet for sourmash
Two GenBank accessions did not have genome FASTAs associated with them. They are:
The commands and files I used are here: https://github.com/bluegenes/2024-ds-plant.
Then, I ran
There are a lot of very large genomes in here! It took a few days to run 😅. If you have an updated version with any genomes that have been added since July, I'm happy to sketch those and update this db for you. Please let me know how it goes - I haven't actually tested these with anything yet! Tessa |
@bluegenes this is sooo cool!!! thanks so much!!! @ctb and thanks Titus too you rock!!!!! I will let you know how it goes! |
Hello @bluegenes. One last question. I successfully ran gather between my metagenomes and the databases. Now I want to run taxonomy, but I get an error saying that the CSV file is not formatted correctly. I am having trouble finding the correct way to format it from the instructions on the website. This is the command I ran and the error I get:
error:
|
created with directsketch; see https://github.com/bluegenes/2024-ds-plant for details (ref #3172). Co-authored-by: C. Titus Brown
hi @gabridinosauro sorry, we need to format the taxdb for you/make the lineages CSV - there are some instructions in the plant repo mentioned above, but I'm not sure if they'll work. It's on our list! |
@gabridinosauro I made you a tax file that should work, though I didn't test it. Try downloading here: https://farm.cse.ucdavis.edu/~ctbrown/sourmash-db/genbank-plant-2024-07/genbank-plants-2024-07.lineages.csv.gz let me know if it works! |
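Before running the taxonomy step, a quick look at the header of the lineages file can catch formatting problems like the one reported above. This is a minimal sketch: the expected column names (`ident` plus the ranks `superkingdom` through `species`) are an assumption about the usual sourmash lineages layout and should be checked against the sourmash taxonomy documentation, and `check_lineages_header` / `check_lineages_file` are hypothetical helpers, not part of sourmash.

```python
import csv
import gzip

# Assumed rank columns; verify against the sourmash documentation.
EXPECTED_RANKS = [
    "superkingdom", "phylum", "class", "order", "family", "genus", "species",
]


def check_lineages_header(fieldnames):
    """Return a list of problems found in a lineages CSV header row."""
    cols = [c.strip() for c in (fieldnames or [])]
    problems = []
    if "ident" not in cols:
        problems.append("missing identifier column 'ident'")
    missing = [rank for rank in EXPECTED_RANKS if rank not in cols]
    if missing:
        problems.append("missing rank columns: " + ", ".join(missing))
    return problems


def check_lineages_file(path):
    """Read only the first row of a plain or gzipped CSV and check it."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt", newline="") as fh:
        header = next(csv.reader(fh), None)
    return check_lineages_header(header)
```

An empty list from `check_lineages_file("genbank-plants-2024-07.lineages.csv.gz")` would mean the header matches the assumed layout; it says nothing about whether the identifiers match the database.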
It worked, thanks!!! I downloaded the file from the GitHub repo though, not from that link, because I did not have permission to access it. Overall, the method seems to work, although I seem to get some false positives: I get hits to tropical plants even though my wild rodents live in Canada. I think it might be because there is bacterial contamination in the plant genomes? |
hi @gabridinosauro we have some new code that makes it easy to sketch all genomes under any given NCBI taxonomic rank; it's not particularly usable by others yet, but I wanted to link it for you, since you were part of the inspiration. see: #3487 basically, if you can locate one or more NCBI taxonomic IDs under which you want all the things sketched, we can produce said database :) |
@gabridinosauro updated reference databases described here: #3504 |
Dear Sourmash team,
I hope you are all well. I have a project with shotgun metagenomics data from wild rodents.
I want to see if I can classify reads against plant genomes, to get an idea of their diet.
Is it possible to do it with sourmash?
I suppose I would have to make my own database as I do not see any databases containing plants already available.
Thanks in advance.
Gabriele