MRG: add delay before retrying #183

Open — wants to merge 2 commits into main
Conversation

bluegenes (Collaborator) commented Jan 17, 2025

When using n equal to the maximum requests/second (NCBI: 3 downloads/sec without an API key, 10 with one), we seem to get several failures per run that are resolved by retries -- best if retries is 3 or greater. I can only think that maybe we're retrying within the same second and exceeding the maximum request rate?

Here I've added a 1-second delay before retrying. An alternative (or an addition?) would be a fraction-of-a-second delay in the main loop, which would stagger the initial requests. I'm honestly not sure how much that would help, as the failures are generally not at the beginning of the run, but rather spread randomly across it (and not always on the same files).
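
For concreteness, here is a minimal sketch of the kind of retry-with-delay logic described above, assuming a tokio/reqwest-based download loop; the function name, signature, and hard-coded 1-second delay are illustrative, not the plugin's actual code:

```rust
use std::time::Duration;
use tokio::time::sleep;

/// Hypothetical helper: wait a fixed delay before each retry so repeated
/// attempts don't land in the same second and exceed the requests/sec cap.
async fn get_with_retry(
    client: &reqwest::Client,
    url: &str,
    retries: u32,
) -> Result<reqwest::Response, reqwest::Error> {
    let mut last_err = None;
    for attempt in 0..=retries {
        if attempt > 0 {
            // The change proposed here, roughly: pause ~1s before retrying
            // instead of immediately re-requesting.
            sleep(Duration::from_secs(1)).await;
        }
        match client.get(url).send().await.and_then(|r| r.error_for_status()) {
            Ok(resp) => return Ok(resp),
            Err(e) => last_err = Some(e),
        }
    }
    Err(last_err.expect("at least one attempt is always made"))
}
```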

bluegenes changed the title from "WIP: add delay before retrying" to "MRG: add delay before retrying" on Jan 17, 2025
bluegenes (Collaborator, Author) commented:

@ctb ready for review. I think the delays won't impact total time too much, and hopefully they reduce the total number of retries needed for any failures.

bluegenes (Collaborator, Author) commented:

Still consistently getting one failure (though not 2+ as with #181) with -n 10 -r 1, but none with -n 10 -r 3. The time difference is negligible (run times range from 44s to 51s), so I'd recommend this change, plus always using -r 3 or higher (r=3 is the default). I tried a 2-second delay, but it still didn't eliminate the one failure with -n 10 -r 1.

This branch, -n 10 -r 3: 47 seconds

time sourmash scripts gbsketch outputs/fungi-links.head100.csv -n 10 -c 8 -p k=21,k=31,k=51,dna --failed fungi.gbsketch-fail.181.n10r1.txt -r 1 --checksum-fail fungi.gbsketch-check-fail.181.n10r1.txt -o fungi.head100.181.n10r1.zip

193.14s user 5.23s system 418% cpu 47.435 total

bluegenes (Collaborator, Author) commented Jan 17, 2025

This doesn't actually solve the issue consistently, though it may decrease the failures a little. Using retries = 3 does consistently solve the issue, both on main and on this branch. Using one fewer simultaneous download (-n 9) also seems to consistently solve it. So I'm actually not sure we want to merge this.

The time for this branch was a little more variable -- 49 to 56 seconds.

For these test data, upping the delay to 3 seconds eliminated the failure with -n 10 -r 1 in 4 of 5 runs. Upping the delay to 5 seconds seemed to increase the failures!

time sourmash scripts gbsketch outputs/fungi-links.head100.csv -n 10 -c 8 -p k=21,k=31,k=51,dna --failed fungi.gbsketch-fail.181.n10r1.txt -r 1 --checksum-fail fungi.gbsketch-check-fail.181.n10r1.txt -o fungi.head100.181.n10r1.zip

user 5.09s system 379% cpu 49.534 total
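
As an aside, the other option floated in the opening comment (a fraction-of-a-second pause in the main loop to stagger the initial requests) might look roughly like this toy sketch; the function, the 250 ms value, and the task structure are all hypothetical, not the plugin's code:

```rust
use std::time::Duration;
use tokio::time::sleep;

/// Toy illustration of staggering: pause briefly in the main loop between
/// launching download tasks so the first requests don't all fire at once.
async fn launch_staggered(urls: Vec<String>) {
    let mut handles = Vec::new();
    for url in urls {
        // Hypothetical fraction-of-a-second stagger between task launches.
        sleep(Duration::from_millis(250)).await;
        handles.push(tokio::spawn(async move {
            // ... issue the actual download for `url` here (omitted) ...
            println!("fetching {url}");
        }));
    }
    for handle in handles {
        let _ = handle.await;
    }
}
```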
