Hi, I'm looking for a way to reduce the number of duplicate jobs (identical kind & args). The jobs run correctly under at-least-once execution; this is more about reducing the load on downstream systems. I looked at the uniqueness feature, but my understanding is that I cannot exclude already-running jobs from the constraint. Is that correct? In this scenario workers read the latest state from the database, not from their args, so N jobs scheduled around the same time can effectively redo the exact same work N times. Is there a recommended approach for this that I've missed?
Replies: 1 comment 1 reply
Yeah that's correct. Running is a required state in the unique states list.
I should point out first that either of the Pro features, sequences or batching, would be a reasonable way to approach this problem.
Aside from those, one thing you could do is use unique jobs, but include in the args the largest ID known to need processing, and make the jobs unique on args.
So if two operations were to insert jobs for the same work, such that the second job would effectively be a no-op, you'd insert:
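A minimal sketch of what those two inserts could look like, assuming River's Go client with the pgx driver and a hypothetical ProcessItemsArgs type whose MaxID field carries the largest ID to process (the type and field names are just for illustration):

```go
package example

import (
	"context"

	"github.com/jackc/pgx/v5"
	"github.com/riverqueue/river"
)

// Hypothetical args type for illustration: jobs are made unique on args, and
// MaxID carries the largest ID known to need processing at insert time.
type ProcessItemsArgs struct {
	MaxID int64 `json:"max_id"`
}

func (ProcessItemsArgs) Kind() string { return "process_items" }

// Both operations observed the same MaxID, so the two inserts carry identical
// args and the second one deduplicates to a no-op.
func insertSameWork(ctx context.Context, client *river.Client[pgx.Tx]) error {
	opts := &river.InsertOpts{UniqueOpts: river.UniqueOpts{ByArgs: true}}
	if _, err := client.Insert(ctx, ProcessItemsArgs{MaxID: 42}, opts); err != nil {
		return err
	}
	_, err := client.Insert(ctx, ProcessItemsArgs{MaxID: 42}, opts) // deduplicated
	return err
}
```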
These would deduplicate because they're identical. If a job was already running while the second one was inserted, that'd be okay, because nothing new needs to be done. But if one operation had an additional item to work, the args would no longer be identical, so two jobs would be inserted; there's a sketch of that case below.

You could also do something like holding a PG advisory lock while a job's work is ongoing. That would cause a lot of blocking for jobs queued behind the one that's running, but it might be okay if they can no-op successfully with minimal work once they're allowed to run.
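Continuing the same hypothetical ProcessItemsArgs sketch from above, the case where one operation has seen an additional item might look like this; the args differ, so uniqueness no longer applies and both jobs are inserted:

```go
// The second caller has seen one more item (MaxID 43 vs. 42), so the args are
// no longer identical and both jobs get inserted.
func insertNewerWork(ctx context.Context, client *river.Client[pgx.Tx]) error {
	opts := &river.InsertOpts{UniqueOpts: river.UniqueOpts{ByArgs: true}}
	if _, err := client.Insert(ctx, ProcessItemsArgs{MaxID: 42}, opts); err != nil {
		return err
	}
	_, err := client.Insert(ctx, ProcessItemsArgs{MaxID: 43}, opts) // inserted: args differ
	return err
}
```

And a rough sketch of the advisory-lock idea, assuming a hypothetical worker that has a *pgxpool.Pool available (import github.com/jackc/pgx/v5/pgxpool) and an app-chosen lock key; acquiring a dedicated connection keeps the lock and unlock on the same session:

```go
type ProcessItemsWorker struct {
	river.WorkerDefaults[ProcessItemsArgs]
	pool *pgxpool.Pool // hypothetical: injected when the worker is registered
}

func (w *ProcessItemsWorker) Work(ctx context.Context, job *river.Job[ProcessItemsArgs]) error {
	const lockKey = int64(987654321) // hypothetical app-chosen key for this class of work

	// Dedicated connection so pg_advisory_lock/unlock run on the same session.
	conn, err := w.pool.Acquire(ctx)
	if err != nil {
		return err
	}
	defer conn.Release()

	if _, err := conn.Exec(ctx, "SELECT pg_advisory_lock($1)", lockKey); err != nil {
		return err
	}
	defer conn.Exec(ctx, "SELECT pg_advisory_unlock($1)", lockKey)

	// ... read the latest state and do the work; jobs queued behind this one
	// block here and can no-op quickly once the lock is released ...
	return nil
}
```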