Hi, I'm looking for a way to reduce the number of duplicate jobs (identical kind & args). The jobs run correctly under at-least-once execution; this is more about reducing the load on downstream systems. I looked at the uniqueness feature, but my understanding is that I cannot exclude already-running jobs from the constraint. Is that correct? In this scenario workers read the latest state from the database, not from their args, so N jobs scheduled around the same time can effectively redo the exact same work N times. Is there a recommended approach for this that I've missed?
Replies: 1 comment 1 reply
Yeah that's correct. Running is a required state in the unique states list.
I should point out first that either of the Pro features, sequences or batching, would be a reasonable way to approach this problem.
Aside from those, one thing you could do is use unique jobs, but include in the args the largest ID known to need processing, and make the jobs unique on args.
So if two operations were to insert jobs for the same work, such that the second job would effectively be a no-op, you'd insert:
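A minimal sketch of what those two inserts could look like, assuming River's Go client with the pgx driver and a hypothetical ProcessItemsArgs type whose MaxID field carries the largest ID to process (the type and field names are just for illustration):

```go
package example

import (
	"context"

	"github.com/jackc/pgx/v5"
	"github.com/riverqueue/river"
)

// Hypothetical args type for illustration: jobs are made unique on args, and
// MaxID carries the largest ID known to need processing at insert time.
type ProcessItemsArgs struct {
	MaxID int64 `json:"max_id"`
}

func (ProcessItemsArgs) Kind() string { return "process_items" }

// Both operations observed the same MaxID, so the two inserts carry identical
// args and the second one deduplicates to a no-op.
func insertSameWork(ctx context.Context, client *river.Client[pgx.Tx]) error {
	opts := &river.InsertOpts{UniqueOpts: river.UniqueOpts{ByArgs: true}}
	if _, err := client.Insert(ctx, ProcessItemsArgs{MaxID: 42}, opts); err != nil {
		return err
	}
	_, err := client.Insert(ctx, ProcessItemsArgs{MaxID: 42}, opts) // deduplicated
	return err
}
```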
These would deduplicate because they're identical. If a job was already running while the second one was inserted, that'd be okay, because nothing new needs to be done. But if one operation had an additional item to work, the args would no longer be identical, so two jobs would be inserted; there's a sketch of that case below.

You could also do something like holding a PG advisory lock while a job's work is ongoing. That would cause a lot of blocking for jobs queued behind the one that's running, but it might be okay if they can no-op successfully with minimal work once they're allowed to run.
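Continuing the same hypothetical ProcessItemsArgs sketch from above, the case where one operation has seen an additional item might look like this; the args differ, so uniqueness no longer applies and both jobs are inserted:

```go
// The second caller has seen one more item (MaxID 43 vs. 42), so the args are
// no longer identical and both jobs get inserted.
func insertNewerWork(ctx context.Context, client *river.Client[pgx.Tx]) error {
	opts := &river.InsertOpts{UniqueOpts: river.UniqueOpts{ByArgs: true}}
	if _, err := client.Insert(ctx, ProcessItemsArgs{MaxID: 42}, opts); err != nil {
		return err
	}
	_, err := client.Insert(ctx, ProcessItemsArgs{MaxID: 43}, opts) // inserted: args differ
	return err
}
```

And a rough sketch of the advisory-lock idea, assuming a hypothetical worker that has a *pgxpool.Pool available (import github.com/jackc/pgx/v5/pgxpool) and an app-chosen lock key; acquiring a dedicated connection keeps the lock and unlock on the same session:

```go
type ProcessItemsWorker struct {
	river.WorkerDefaults[ProcessItemsArgs]
	pool *pgxpool.Pool // hypothetical: injected when the worker is registered
}

func (w *ProcessItemsWorker) Work(ctx context.Context, job *river.Job[ProcessItemsArgs]) error {
	const lockKey = int64(987654321) // hypothetical app-chosen key for this class of work

	// Dedicated connection so pg_advisory_lock/unlock run on the same session.
	conn, err := w.pool.Acquire(ctx)
	if err != nil {
		return err
	}
	defer conn.Release()

	if _, err := conn.Exec(ctx, "SELECT pg_advisory_lock($1)", lockKey); err != nil {
		return err
	}
	defer conn.Exec(ctx, "SELECT pg_advisory_unlock($1)", lockKey)

	// ... read the latest state and do the work; jobs queued behind this one
	// block here and can no-op quickly once the lock is released ...
	return nil
}
```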