Summary
We are seeing @workflow/world-postgres produce duplicate suspension events for the same (run_id, correlation_id, type) when multiple Postgres World workers/engine instances race on the same pending step set.
The failure presents as a fatal replay error:
WorkflowRuntimeError: Unconsumed event in event log:
eventType=step_created
correlationId=step_01KRPCM1RF4KNAV0WY1YMXJ4R6
eventId=wevt_01KRPCM51K8B3ZDFHWNZ5NPPCT
This indicates a corrupted or invalid event log.
In the affected run, the same correlation_id had a normal lifecycle followed by a second step_created:
17:58:21.386 step_created
17:58:21.445 step_started
17:58:21.463 step_completed
17:58:21.501 step_created <-- duplicate on same correlation_id; replay rejects it
Versions
workflow: ^4.2.4
@workflow/world-postgres: ^4.1.1
- Runtime: Node service running multiple Kubernetes pods against the same Postgres World
- Queue/storage: Postgres World / graphile-worker
What we think is happening
workflow.workflow_events has a primary key on id, but no uniqueness constraint over the suspension-event identity: (run_id, correlation_id, type).
When two engine instances observe the same pending step set, both can compute the same stepsNeedingCreation and both insert a step_created event for the same correlation_id. Since the rows have distinct event IDs, both inserts succeed. On the next replay, the first event is consumed and the second has no subscriber, so the run fails with Unconsumed event in event log.
We have seen this when running three HPA-scaled pods against the same Postgres World and rolling the deployment. The affected workflow included a for_each step with parallel child execution, but the duplicate appears to be at the Postgres World suspension/event insertion layer rather than in user workflow code.
Reproduction shape
- Run two or more engine instances against the same Postgres World.
- Trigger a workflow that suspends/creates child steps.
- Cause both handlers/workers to observe the same pending step set close together, for example during pod rollover or HPA-scaled execution.
- Both handlers insert
step_created for the same (run_id, correlation_id).
- Replay encounters the second
step_created and fails with WorkflowRuntimeError: Unconsumed event in event log.
Proposed fix
Make suspension-event creation idempotent in @workflow/world-postgres.
One possible shape:
- Add a partial unique index for suspension events:
CREATE UNIQUE INDEX workflow_events_suspension_unique
ON workflow.workflow_events (run_id, correlation_id, type)
WHERE type IN ('step_created', 'wait_created',
'hook_created', 'run_created');
- Make the suspension-event insert tolerate that conflict, for example with
onConflictDoNothing or by raising the existing conflict path that the suspension handler already treats as benign.
await tx
.insert(workflowEvents)
.values(row)
.onConflictDoNothing({
target: [
workflowEvents.runId,
workflowEvents.correlationId,
workflowEvents.type,
],
});
The partial index is intentionally limited to suspension/creation events so other event types with different semantics are not constrained.
Current workaround
We are temporarily pinning the service that hosts the Postgres World worker to a single Kubernetes replica with a Recreate rollout strategy, and adding a duplicate-event monitor. This avoids the cross-pod race in our deployment, but it is a workaround and prevents us from safely scaling the worker horizontally until the insert is idempotent upstream.
Happy to provide more event-log detail or test a candidate patch.
Summary
We are seeing
@workflow/world-postgresproduce duplicate suspension events for the same(run_id, correlation_id, type)when multiple Postgres World workers/engine instances race on the same pending step set.The failure presents as a fatal replay error:
In the affected run, the same
correlation_idhad a normal lifecycle followed by a secondstep_created:Versions
workflow:^4.2.4@workflow/world-postgres:^4.1.1What we think is happening
workflow.workflow_eventshas a primary key onid, but no uniqueness constraint over the suspension-event identity:(run_id, correlation_id, type).When two engine instances observe the same pending step set, both can compute the same
stepsNeedingCreationand both insert astep_createdevent for the samecorrelation_id. Since the rows have distinct event IDs, both inserts succeed. On the next replay, the first event is consumed and the second has no subscriber, so the run fails withUnconsumed event in event log.We have seen this when running three HPA-scaled pods against the same Postgres World and rolling the deployment. The affected workflow included a
for_eachstep with parallel child execution, but the duplicate appears to be at the Postgres World suspension/event insertion layer rather than in user workflow code.Reproduction shape
step_createdfor the same(run_id, correlation_id).step_createdand fails withWorkflowRuntimeError: Unconsumed event in event log.Proposed fix
Make suspension-event creation idempotent in
@workflow/world-postgres.One possible shape:
onConflictDoNothingor by raising the existing conflict path that the suspension handler already treats as benign.The partial index is intentionally limited to suspension/creation events so other event types with different semantics are not constrained.
Current workaround
We are temporarily pinning the service that hosts the Postgres World worker to a single Kubernetes replica with a
Recreaterollout strategy, and adding a duplicate-event monitor. This avoids the cross-pod race in our deployment, but it is a workaround and prevents us from safely scaling the worker horizontally until the insert is idempotent upstream.Happy to provide more event-log detail or test a candidate patch.