Scrapy scheduler emits timeout errors #404
Labels
t-tooling
Issues with this label are in the ownership of the tooling team.
Comments
I have some new findings!
More Examples
Specimen 3
Specimen 4
When running my scrapers with the latest Apify SDK (meaning fd7650a), I get timeouts on the following lines of code. These timeouts don't crash the scraper immediately, but they corrupt the scraper run: the results are incomplete, and I've also seen strange request queue behavior after these errors, which at least once resembled an endless loop (I aborted the scraper after repeatedly seeing the same runtime stats).

I use the same technique with the same async thread for caching requests (see #403), but I can't see any timeout errors related to the key-value storage I use. AFAIK all the timeout errors I've seen were related to the RQ, even though the KV was heavily used during the same scraper run as well. (It happens with the KV as well, see my comment below.) The issue happens only occasionally, which makes it hard to track down. My scraper runs fine for 20 minutes and then spits out 5 of these errors. I've got these timeouts with two rather different spiders, so this isn't specific to the code of a single spider class.
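As a workaround for this kind of intermittent timeout, one option is to wrap the storage calls in a retry-with-timeout helper so a single slow RQ/KV call doesn't corrupt the run. This is a minimal, hedged sketch using only `asyncio` from the standard library; `call_with_timeout_retry` and the `flaky` coroutine are hypothetical stand-ins, not part of the Apify SDK or Scrapy APIs:

```python
import asyncio

async def call_with_timeout_retry(coro_factory, timeout=5.0, retries=3):
    """Await coro_factory() with a timeout, retrying on TimeoutError.

    coro_factory is a zero-argument callable returning a fresh coroutine,
    because a coroutine object cannot be awaited twice.
    """
    for attempt in range(1, retries + 1):
        try:
            return await asyncio.wait_for(coro_factory(), timeout=timeout)
        except asyncio.TimeoutError:
            if attempt == retries:
                raise
            # simple linear backoff before the next attempt
            await asyncio.sleep(0.1 * attempt)

async def main():
    calls = {"n": 0}

    async def flaky():
        # Hypothetical stand-in for a request-queue call: the first
        # attempt hangs long enough to trip the timeout, the retry succeeds.
        calls["n"] += 1
        if calls["n"] < 2:
            await asyncio.sleep(10)
        return "ok"

    result = await call_with_timeout_retry(flaky, timeout=0.1)
    print(result)  # first attempt times out, retry returns "ok"

if __name__ == "__main__":
    asyncio.run(main())
```

This only masks the symptom, of course; it may help keep runs usable while the underlying scheduler/storage timeout is tracked down.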
Debugging Plan & Ideas
--apify) 👉 no success

Examples
Specimen 1
Specimen 2