-
Crawlee optimizes requests by suppressing redundant URLs and by giving up on a URL after reaching a configurable retry limit. This is great, but I encountered an edge case.

Situation: You're extracting data from a page and realize that it wasn't downloaded properly.
Ideal goal: Re-queue the URL while decrementing its retry counter.
Alternative: Add the URL back to the queue with a fresh retry counter.

How can I achieve either of these?
-
You can enqueue it again with a different `unique_key`.
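For example, a minimal sketch of overriding the key (assuming `Request.from_url` accepts a `unique_key` argument; the uuid-based suffix below is just one way to make the key differ, not an API requirement):

```python
import uuid

from crawlee import Request

# Give the retried request a unique_key that differs from the default one
# (which is derived from the URL), so deduplication won't drop it.
retry_request = Request.from_url(
    'https://crawlee.dev/',
    unique_key=f'https://crawlee.dev/#retry-{uuid.uuid4()}',
)
```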
-
I am afraid this is not possible.
You can use `request = Request.from_url('https://crawlee.dev/', always_enqueue=True)`, which will generate a "unique" `unique_key` under the hood. Then add the request to the request queue.
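A sketch of how this could look inside a request handler, assuming a recent crawlee for Python release (`crawlee.crawlers` import path) and that `context.add_requests` accepts `Request` objects; the `looks_incomplete` check is a placeholder for your own validation:

```python
import asyncio

from crawlee import Request
from crawlee.crawlers import BeautifulSoupCrawler, BeautifulSoupCrawlingContext

crawler = BeautifulSoupCrawler()


@crawler.router.default_handler
async def handler(context: BeautifulSoupCrawlingContext) -> None:
    # Placeholder for "the page wasn't downloaded properly".
    looks_incomplete = context.soup.find('main') is None

    if looks_incomplete:
        # always_enqueue=True generates a fresh "unique" unique_key under the
        # hood, so the repeated URL is not suppressed as a duplicate.
        # Note: this bypasses the normal retry counter, so consider tracking
        # your own attempt count (e.g. in request.user_data) to avoid loops.
        retry = Request.from_url(context.request.url, always_enqueue=True)
        await context.add_requests([retry])
        return

    # ... normal extraction ...
    context.log.info(f'Extracted {context.request.url}')


async def main() -> None:
    await crawler.run(['https://crawlee.dev/'])


if __name__ == '__main__':
    asyncio.run(main())
```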
-
Thank you for the helpful answer.
-
I searched the documentation but couldn't find an example of `await queue.add_request(request)`, so I'm unsure how to import or instantiate `queue`. Nothing breaks when I try the following; would this be considered acceptable?
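For reference, a minimal sketch of one way `queue` can be obtained, assuming crawlee's `RequestQueue.open()` helper; the standalone `requeue` coroutine is an illustration, not taken from the original post:

```python
from crawlee import Request
from crawlee.storages import RequestQueue


async def requeue(url: str) -> None:
    # Open the (default) request queue; a crawler using the same queue
    # will pick up the re-added request.
    queue = await RequestQueue.open()

    # always_enqueue=True gives the request a fresh unique_key, so the
    # deduplication check won't suppress the repeated URL.
    request = Request.from_url(url, always_enqueue=True)
    await queue.add_request(request)
```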