Add -m to seqtk command when low on RAM #63

Open
danejo3 opened this issue Oct 25, 2023 · 1 comment

Comments

@danejo3
Collaborator

danejo3 commented Oct 25, 2023

We had a case where we tried to downsample 11 billion reads on a machine with 1 TB of RAM, and seqtk reported that it needed more memory.

The seqtk usage text below suggests a fix: 2-pass mode (`-2`), which uses much less memory but runs about twice as slow.

```
Usage:   seqtk sample [-2] [-s seed=11] <in.fa> <frac>|<number>

Options: -s INT       RNG seed [11]
         -2           2-pass mode: twice as slow but with much reduced memory
```
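One plausible explanation for why `-2` saves memory when the target is a read count: a single pass has to hold every sampled read until the end of the file, whereas two passes can hold only the sampled read *indices* and re-read the file to emit the matching reads. The Python below is a minimal sketch of that general technique using reservoir sampling (Algorithm R). It is an assumption about the approach, not seqtk's actual code, and the function names are hypothetical.

```python
import random
from typing import Callable, Iterable, List


def sample_one_pass(records: Iterable[str], k: int, seed: int = 11) -> List[str]:
    """Reservoir sampling (Algorithm R) that holds k whole records in memory."""
    rng = random.Random(seed)
    reservoir: List[str] = []
    for i, rec in enumerate(records):
        if i < k:
            reservoir.append(rec)
        else:
            j = rng.randint(0, i)  # uniform over 0..i inclusive
            if j < k:
                reservoir[j] = rec
    return reservoir


def sample_two_pass(open_records: Callable[[], Iterable[str]], k: int, seed: int = 11) -> List[str]:
    """Same selection, but pass 1 stores only k integer indices instead of records.

    `open_records` is called once per pass, standing in for re-opening the input file.
    """
    rng = random.Random(seed)
    chosen: List[int] = []
    # Pass 1: reservoir-sample record indices; the reads themselves are discarded.
    for i, _ in enumerate(open_records()):
        if i < k:
            chosen.append(i)
        else:
            j = rng.randint(0, i)
            if j < k:
                chosen[j] = i
    keep = set(chosen)
    # Pass 2: re-read the input and emit the selected records in file order.
    return [rec for i, rec in enumerate(open_records()) if i in keep]
```

With the same seed, both functions pick the same reads. The difference is what stays resident between reads of the input (k indices versus k full reads), paid for by scanning the input twice.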
@standage
Member

standage commented Oct 25, 2023

It may be easiest to just use two-pass mode as a matter of course: 2x runtime shouldn't be too bad when x is small, and two passes look necessary when x is big. We could try to come up with some kind of threshold (measured, e.g., by file size) that best separates the cases suited to a single pass from those that need two passes to reduce memory, but I'm skeptical that the cost/benefit of that approach beats the simpler one.
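If the simpler option of always using two-pass mode is adopted, building the command stays trivial. The sketch below assembles a `seqtk sample` argument list using only the `-2` and `-s` options from the usage text; the function names are hypothetical. The `needs_two_pass` helper illustrates the file-size threshold alternative, and its cutoff (`TWO_PASS_MIN_BYTES`) is an arbitrary placeholder, not a measured value.

```python
import os
import shlex
from typing import List

# Hypothetical cutoff for the threshold alternative; not a measured value.
TWO_PASS_MIN_BYTES = 50 * 1024**3


def seqtk_sample_argv(in_path: str, target: str, seed: int = 11, two_pass: bool = True) -> List[str]:
    """Build argv for `seqtk sample [-2] [-s seed] <in.fa> <frac>|<number>`."""
    argv = ["seqtk", "sample"]
    if two_pass:
        argv.append("-2")  # 2-pass mode: roughly twice as slow, much less memory
    argv += ["-s", str(seed), in_path, target]
    return argv


def needs_two_pass(in_path: str, min_bytes: int = TWO_PASS_MIN_BYTES) -> bool:
    """Threshold alternative: enable -2 only for inputs at least `min_bytes` in size."""
    return os.path.getsize(in_path) >= min_bytes


if __name__ == "__main__":
    # Simpler approach: always two-pass.
    print(shlex.join(seqtk_sample_argv("reads.fq", "1000000")))
    # prints: seqtk sample -2 -s 11 reads.fq 1000000
```

The returned list can be passed straight to `subprocess.run(argv, stdout=out_handle, check=True)`, since seqtk writes the sampled reads to stdout. Keeping `two_pass=True` as the default matches the "always two-pass" option, and switching to the threshold variant only means passing `two_pass=needs_two_pass(path)`.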
