@Mantisus
Collaborator

Description

  • This PR improves the index for the request_queue_records table, as the previous version of the index was not efficient for retrieving requests.

EXPLAIN result for over a million requests
Old index

image

New index

image

Issues

@Mantisus Mantisus requested review from janbuchar and vdusek October 31, 2025 22:41
@Mantisus
Collaborator Author

@ericvg97, thank you for your analysis and for finding this. Is it possible for you to check the new index in your environment and confirm that the queue is working correctly with it?

@Mantisus Mantisus self-assigned this Oct 31, 2025
@ericvg97

ericvg97 commented Nov 1, 2025

> @ericvg97, thank you for your analysis and for finding this. Is it possible for you to check the new index in your environment and confirm that the queue is working correctly with it?

It doesn't work in my environment. The query I shared was incorrect; the actual query Crawlee runs is:
SELECT * FROM request_queue_records
WHERE request_queue_id = 'QXxcAytWcFELg1yFR'
AND is_handled IS false
AND (time_blocked_until IS NULL OR time_blocked_until < now())
ORDER BY sequence_number ASC LIMIT 10 FOR UPDATE SKIP LOCKED

Notice the condition "is_handled IS false" versus the "is_handled = false" that I shared before. With the first, the planner doesn't use the index you are proposing; with the second, it does. I didn't know this could happen in Postgres, TBH. Can you check whether this also happens in your environment?
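
For context, the two spellings are distinct SQL expressions, not just different syntax: `x = false` evaluates to NULL when `x` is NULL, while `x IS FALSE` always evaluates to true or false. The sketch below models that three-valued logic in plain Python, with `None` standing in for NULL. The helper names `eq_false` and `is_false` are made up for illustration, and this does not claim to show how the Postgres planner itself compares predicates.

```python
from typing import Optional


def eq_false(x: Optional[bool]) -> Optional[bool]:
    """Model of SQL `x = false`: a NULL input yields NULL (unknown)."""
    return None if x is None else (x is False)


def is_false(x: Optional[bool]) -> bool:
    """Model of SQL `x IS FALSE`: never NULL; a NULL input yields false."""
    return x is False


# Compare both expressions for true, false and NULL inputs.
for value in (True, False, None):
    print(f"x={value!r}: x = false -> {eq_false(value)!r}, x IS FALSE -> {is_false(value)!r}")
```

In a WHERE clause both forms filter out NULL rows, so they select the same rows here, but they remain different expressions as far as matching an index predicate is concerned.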

@ericvg97

ericvg97 commented Nov 1, 2025

Changing "postgresql_where=text('is_handled = false')" to "postgresql_where=text('is_handled is false')" fixes it in my environment.
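
A minimal sketch of what such a partial index definition could look like in SQLAlchemy, compiled to PostgreSQL DDL without connecting to a database. Only the table name, the `postgresql_where` predicate, and the column names from the query above come from this thread; the index name, the choice and order of indexed columns, and the column types are assumptions for illustration and may differ from the actual Crawlee schema.

```python
from sqlalchemy import Boolean, Column, DateTime, Index, Integer, MetaData, String, Table, text
from sqlalchemy.dialects import postgresql
from sqlalchemy.schema import CreateIndex

metadata = MetaData()

# Simplified stand-in for the real table; column types are assumed.
records = Table(
    "request_queue_records",
    metadata,
    Column("request_queue_id", String, nullable=False),
    Column("sequence_number", Integer, nullable=False),
    Column("is_handled", Boolean, nullable=False),
    Column("time_blocked_until", DateTime(timezone=True)),
)

# Hypothetical index name and column list. The predicate is spelled
# exactly like the condition in the query ("IS false", not "= false").
fetch_index = Index(
    "idx_fetch_available",
    records.c.request_queue_id,
    records.c.sequence_number,
    postgresql_where=text("is_handled IS false"),
)

# Render the CREATE INDEX statement for the PostgreSQL dialect.
ddl = str(CreateIndex(fetch_index).compile(dialect=postgresql.dialect()))
print(ddl)
```

Running `EXPLAIN` on the fetch query before and after creating the index is the way to confirm that the planner actually picks it up.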

@Mantisus
Collaborator Author

Mantisus commented Nov 1, 2025

> Changing "postgresql_where=text('is_handled = false')" to "postgresql_where=text('is_handled is false')" fixes it in my environment.

@ericvg97 Thank you, I double-checked the query and updated the index. It works the same way in my environment.

> I didn't know this could happen in postgres TBH.

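Yes, I think it's because it's a partial index. The planner only uses a partial index when it can tell that the query's WHERE condition implies the index predicate, so in practice it is important that the condition in the query matches the index predicate exactly.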

Thanks again for double-checking this and helping to better identify the error. I really appreciate it.

@vdusek vdusek merged commit 6509534 into apify:master Nov 3, 2025
35 of 36 checks passed
@vdusek vdusek added this to the 126th sprint - Tooling team milestone Nov 3, 2025

Development

Successfully merging this pull request may close these issues.

Fetching new requests in Postgres does full table scan

4 participants