downloading & sketching genomes by NCBI taxonomic hierarchy #3487

Closed
ctb opened this issue Jan 13, 2025 · 2 comments
Labels
fyi Information that is interesting or useful

Comments

@ctb
Contributor

ctb commented Jan 13, 2025

here is code that uses the NCBI REST API to grab genome links and produce directsketch-compatible input CSVs for sketching:

https://github.com/ctb/2025-ncbi-rest-api

Related issues:

@ctb ctb added the fyi Information that is interesting or useful label Jan 13, 2025
@ctb
Contributor Author

ctb commented Jan 13, 2025

note also that the repo contains scripts for comparing directsketch input CSVs ("links CSVs") and sketch databases and outputting the differences, as well as for subtracting links CSVs from one another. So we can sketch only the genomes that haven't already been sketched, and build subtree tax databases like "all euks, minus metazoa and fungi and plants".
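
A rough sketch of what "subtracting links CSVs" could look like, for illustration only. It is not the code from the repo above. It assumes each links CSV has an `accession` column that identifies a genome; the function names (`read_links`, `subtract_links`, `write_links`) and the accessions in the demo are hypothetical placeholders.

```python
import csv
import io


def read_links(handle, key="accession"):
    """Read a links CSV into a dict of rows keyed by `key` (assumed column name)."""
    return {row[key]: row for row in csv.DictReader(handle)}


def subtract_links(keep, *remove):
    """Return the rows of `keep` whose key appears in none of the `remove` mappings."""
    exclude = set()
    for other in remove:
        exclude.update(other)  # iterating a dict yields its keys
    return [row for acc, row in keep.items() if acc not in exclude]


def write_links(rows, handle, fieldnames):
    """Write the remaining rows back out as a links CSV."""
    writer = csv.DictWriter(handle, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)


# Demo with in-memory CSVs; the accessions and names are made-up placeholders.
all_euks = io.StringIO(
    "accession,name\n"
    "GCA_000000001.1,placeholder genome A\n"
    "GCA_000000002.1,placeholder genome B\n"
    "GCA_000000003.1,placeholder genome C\n"
)
metazoa = io.StringIO(
    "accession,name\n"
    "GCA_000000002.1,placeholder genome B\n"
)

remaining = subtract_links(read_links(all_euks), read_links(metazoa))
out = io.StringIO()
write_links(remaining, out, fieldnames=["accession", "name"])
print(out.getvalue())
```

The same function covers both use cases mentioned: pass the already-sketched links CSV as a `remove` argument to sketch only new genomes, or pass several clade CSVs (metazoa, fungi, plants) to carve a subtree out of a larger one.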

@ctb
Contributor Author

ctb commented Jan 22, 2025

closing in favor of #3504, which has files, instructions for using them, and links to relevant repos 🎉

@ctb ctb closed this as completed Jan 22, 2025