Skip to content

Possibility of making SolidQueue::Job.clear_finished_in_batches more performant? #403

@salmonsteakk

Description

@salmonsteakk

Hey there, I was analyzing some queries that is being done on my solid queue DB and I came across this:

DELETE
FROM
  "solid_queue_jobs"
WHERE
  "solid_queue_jobs"."id" IN (
  SELECT
    "solid_queue_jobs"."id"
  FROM
    "solid_queue_jobs"
  WHERE
    "solid_queue_jobs"."finished_at" IS NOT NULL
    AND "solid_queue_jobs"."finished_at" < $1
  LIMIT
    $2)

I believe it relates to the SolidQueue::Job.clear_finished_in_batches code, where I have been using it like so in a recurring job:

SolidQueue::Job.clear_finished_in_batches(finished_before: <some_time>)

I also did an EXPLAIN QUERY on this query, and I seems like it's doing a sequential scan on all rows in solid_queue_jobs which are older than the time specified in finished_before:

QUERY PLAN
Delete on solid_queue_jobs (cost=84.52..1473.01 rows=0 width=0) (actual time=3.724..3.725 rows=0 loops=1)
-> Nested Loop (cost=84.52..1473.01 rows=500 width=38) (actual time=0.383..2.057 rows=500 loops=1)
-> HashAggregate (cost=83.96..88.96 rows=500 width=40) (actual time=0.374..0.449 rows=500 loops=1)
Group Key: "ANY_subquery".id
Batches: 1 Memory Usage: 105kB
-> Subquery Scan on "ANY_subquery" (cost=0.00..82.71 rows=500 width=40) (actual time=0.012..0.287 rows=500 loops=1)
-> Limit (cost=0.00..77.71 rows=500 width=8) (actual time=0.005..0.225 rows=500 loops=1)
-> Seq Scan on solid_queue_jobs solid_queue_jobs_1 (cost=0.00..621188.83 rows=3996682 width=8) (actual time=0.004..0.201 rows=500 loops=1)
Filter: ((finished_at IS NOT NULL) AND (finished_at < now()))
-> Index Scan using solid_queue_jobs_pkey on solid_queue_jobs (cost=0.56..2.77 rows=1 width=14) (actual time=0.003..0.003 rows=1 loops=500)
Index Cond: (id = "ANY_subquery".id)
Planning Time: 0.723 ms
Trigger for constraint fk_rails_318a5533ed: time=2.681 calls=500
Trigger for constraint fk_rails_39bbc7a631: time=2.883 calls=500
Trigger for constraint fk_rails_4cd34e2228: time=2.044 calls=500
Trigger for constraint fk_rails_81fcbd66af: time=2.731 calls=500
Trigger for constraint fk_rails_9cfe4d4944: time=2.608 calls=500
Trigger for constraint fk_rails_c4316f352d: time=2.897 calls=500
Execution Time: 19.997 ms 

More information from query analysis on CloudSQL:
image

Is it possible to simplify this query, or would there be a timeline for when clear_finished_jobs_after will happen automatically?

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions