fix: set FailureAction=rollback for swarm services default UpdateConfig by jaimehgb · Pull Request #3810 · Dokploy/dokploy

jaimehgb · 2026-02-26T23:51:51Z

Problem

Several issues have been reported about orphan containers piling up in Swarm deployments -- services stuck at Replicas: N/1 with multiple healthy containers that never go away.

See #1669, #2223, #2911, #2150

What's actually happening

Docker Swarm defaults FailureAction to "pause". If a task fails or gets killed mid-update (app crash, rapid deploys stepping on each other), Swarm pauses the update and stops reconciling. The extra containers sit there forever, healthy or not.

We confirmed this on a production cluster:

$ docker service inspect <service> --format '{{json .UpdateStatus}}'
{
    "State": "paused",
    "StartedAt": "2026-02-27T23:44:07.480239109Z",
    "Message": "update paused due to failure or early termination of task l38gsrsqg2rl..."
}

5 healthy containers sitting there for 30+ hours. Replicas: 5/1.
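
The check above can be scripted. This is a minimal sketch, assuming the JSON printed by `docker service inspect <service> --format '{{json .UpdateStatus}}'` is passed in as a string; `UpdateStatus` and `isUpdatePaused` are hypothetical names and are not part of Dokploy.

```typescript
// Shape of `.UpdateStatus` as shown in the inspect output above.
interface UpdateStatus {
  State?: string;
  StartedAt?: string;
  Message?: string;
}

// Hypothetical helper: true when Swarm has paused the rolling update.
// Assumption: a service that was never updated may print `null`,
// so that case is treated as "not paused".
function isUpdatePaused(inspectJson: string): boolean {
  const status = JSON.parse(inspectJson) as UpdateStatus | null;
  return status?.State === "paused";
}
```

A service that reports `true` here is in the frozen state described above and will keep its extra tasks until the update is resumed or rolled back.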

Fix

Sets FailureAction to "rollback" in the default UpdateConfig and adds a default RollbackConfig for Swarm services.

This isn't really a Dokploy bug -- it's a Docker Swarm default that happens to be a bad fit for a deployment platform. Most users won't know FailureAction exists, let alone that it defaults to "pause". Setting it to "rollback" makes Swarm revert to the previous working spec when a deploy fails, instead of freezing mid-update.

Only affects the default config. Users who have set their own updateConfigSwarm or rollbackConfigSwarm in the UI are not touched.

+// default rollback config to match update config
+RollbackConfig: {
+    Parallelism: 1,
+    Order: "start-first",
+},
+
 // default config if no updateConfigSwarm provided
 UpdateConfig: {
     Parallelism: 1,
     Order: "start-first",
+    FailureAction: "rollback",
 },

The RollbackConfig default sets Order: "start-first" to match the update order. Without it, Docker defaults rollbacks to "stop-first", which briefly takes the service down during rollback.
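
The "defaults only" behavior can be sketched as follows. This is an illustration under assumptions, not the code in this PR: `SwarmUpdateConfig`, `resolveConfig`, and the two constants are hypothetical names, and the sketch assumes a user-supplied config replaces the default wholesale rather than being merged field by field.

```typescript
type FailureAction = "pause" | "continue" | "rollback";
type Order = "stop-first" | "start-first";

interface SwarmUpdateConfig {
  Parallelism: number;
  Order?: Order;
  FailureAction?: FailureAction;
}

// Default values taken from the diff above.
const DEFAULT_UPDATE_CONFIG: SwarmUpdateConfig = {
  Parallelism: 1,
  Order: "start-first",
  FailureAction: "rollback",
};

const DEFAULT_ROLLBACK_CONFIG: SwarmUpdateConfig = {
  Parallelism: 1,
  Order: "start-first",
};

// Hypothetical helper: a config set by the user is returned untouched;
// the fallback applies only when nothing was configured.
function resolveConfig(
  userConfig: SwarmUpdateConfig | null | undefined,
  fallback: SwarmUpdateConfig,
): SwarmUpdateConfig {
  return userConfig ?? fallback;
}
```

With this shape, a user who explicitly chose `FailureAction: "pause"` keeps it, and only services with no custom config pick up `"rollback"`.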

Why `rollback`?

We tested all three options:

| Value | On failure | Orphans? | Availability |
| --- | --- | --- | --- |
| `pause` (current) | Freezes everything | Yes, permanent | Old tasks survive by accident |
| `continue` | Keeps retrying | No | Service goes down (broken deploy completes, kills healthy tasks) |
| `rollback` (this PR) | Reverts to previous spec | No | Previous version stays up |

continue looked promising but it actually pushes the broken deploy through to completion, killing healthy tasks in the process. rollback is the only option that both prevents orphans and keeps the service available.
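
To try the same settings on an existing service by hand, the equivalent `docker service update` flags can be assembled as below. This is a sketch: `rollbackSafeUpdateArgs` is a hypothetical helper and `my-service` is a placeholder name, while the flags themselves are standard Docker CLI options.

```typescript
// Hypothetical helper: builds the arguments for `docker service update`
// that mirror the default UpdateConfig and RollbackConfig from this PR.
function rollbackSafeUpdateArgs(serviceName: string): string[] {
  return [
    "service", "update",
    "--update-parallelism", "1",
    "--update-order", "start-first",
    "--update-failure-action", "rollback",
    "--rollback-parallelism", "1",
    "--rollback-order", "start-first",
    serviceName,
  ];
}

// Print the full command line for a placeholder service name.
console.log(["docker", ...rollbackSafeUpdateArgs("my-service")].join(" "));
```

Returning an argument array rather than a single string keeps the service name as one argument if the list is later passed to a process-spawning API.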

Reproduction

Script that reproduces the bug and verifies the fix on any Swarm node (no Dokploy needed):

Reproduction script and docs (Gist)

docker swarm init  # if not already
curl -sL https://gist.githubusercontent.com/jaimehgb/6ae57f6a079bf389ed57fe18c4fd3877/raw/reproduce-orphan-bug.sh | bash

Testing

Reproduced locally (3 healthy orphans, UpdateStatus=paused)
Reproduced on production Dokploy cluster (5 healthy orphans, 30+ hours)
Verified FailureAction=rollback prevents orphans (1 task, rollback_completed)
Verified FailureAction=continue prevents orphans but kills the service (0/1)
Built custom Dokploy image, deployed to local Swarm, confirmed service gets correct UpdateConfig and RollbackConfig
Confirmed custom updateConfigSwarm/rollbackConfigSwarm are not overwritten
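
The "correct UpdateConfig and RollbackConfig" check can be automated along these lines. It is a sketch that assumes the output of `docker service inspect <service> --format '{{json .Spec}}'` has been parsed into an object; `InspectedSpec` and `checkDefaultSpec` are hypothetical names. It only makes sense for services that rely on the defaults, since a custom config is expected to differ.

```typescript
// Subset of the service spec that this PR touches.
interface InspectedSpec {
  UpdateConfig?: { Order?: string; FailureAction?: string };
  RollbackConfig?: { Order?: string };
}

// Hypothetical check: returns a list of mismatches against the new defaults.
// An empty list means the service carries the expected settings.
function checkDefaultSpec(spec: InspectedSpec): string[] {
  const problems: string[] = [];
  if (spec.UpdateConfig?.FailureAction !== "rollback") {
    problems.push("UpdateConfig.FailureAction is not rollback");
  }
  if (spec.UpdateConfig?.Order !== "start-first") {
    problems.push("UpdateConfig.Order is not start-first");
  }
  if (spec.RollbackConfig?.Order !== "start-first") {
    problems.push("RollbackConfig.Order is not start-first");
  }
  return problems;
}
```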

Docker Swarm's default FailureAction is "pause". When a task fails or is terminated early during a rolling update, Swarm pauses the update and stops ALL reconciliation — orphan containers persist indefinitely, even when healthy. This is the root cause of orphan container issues reported in production (services showing Replicas: N/1 with multiple healthy containers that never get cleaned up). Setting FailureAction to "rollback" makes Swarm automatically revert to the previous working service spec on failure, preventing orphans while preserving service availability. Also adds a default RollbackConfig with Order: "start-first" to match the update config (Docker defaults rollback to "stop-first" otherwise). Only affects the default config — users who have configured their own updateConfigSwarm/rollbackConfigSwarm are not affected. Relates to Dokploy#1669, Dokploy#2223, Dokploy#2911, Dokploy#2150

dosubot · 2026-02-28T23:27:59Z

Related Documentation

Checked 7 published document(s) in 1 knowledge base(s). No updates required.

jaimehgb force-pushed the fix/swarm-convergence branch from 0b7ef69 to 0357eff Compare February 28, 2026 22:18

jaimehgb changed the title ~~fix: wait for swarm task convergence after service update~~ fix: set FailureAction=rollback for swarm services default UpdateConfig Feb 28, 2026

jaimehgb force-pushed the fix/swarm-convergence branch from 0357eff to fadc7fe Compare February 28, 2026 23:20

jaimehgb marked this pull request as ready for review February 28, 2026 23:26

jaimehgb requested a review from Siumauricio as a code owner February 28, 2026 23:26

dosubot bot added the size:S This PR changes 10-29 lines, ignoring generated files. label Feb 28, 2026

greptile-apps bot reviewed Feb 28, 2026

dosubot bot added the bug Something isn't working label Feb 28, 2026

jaimehgb wants to merge 1 commit into Dokploy:canary from jaimehgb:fix/swarm-convergence
