-
Notifications
You must be signed in to change notification settings - Fork 113
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add rolling_deploy_on_docker_failure option #180
base: master
Are you sure you want to change the base?
Conversation
46cb294
to
dcbf8a0
Compare
lib/tasks/deploy.rake
Outdated
container = start_new_container(server, service, defined_restart_policy) | ||
rescue e | ||
on_fail = fetch(:rolling_deploy_on_docker_failure, :exit) | ||
raise e unless on_fail == :continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think here you want just raise
which will re-raise the last error. Otherwise you get the stacktrace from here rather than the original error.
@@ -134,12 +134,19 @@ namespace :deploy do | |||
end | |||
|
|||
task :rolling_deploy do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be good to validate that the contents of rolling_deploy_on_failure
are one of the expected options.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you rather we have a validate_rolling_deploy_options
dependent task? I don't believe we perform any validation on the values of other rolling deploy options, could take a swing at that. Or just put the one validation at the top of this task if you think it's better to keep this changeset smaller.
README.md
Outdated
@@ -411,6 +411,7 @@ are the same everywhere. Settings are per-project. | |||
ports are not HTTP services, this allows you to only health check the ports | |||
that are. The default is an empty array. If you have non-HTTP services that you | |||
want to check, see Custom Health Checks in the previous section. | |||
* `rolling_deploy_on_docker_failure` => What to do when Centurion encounters an error stopping or starting a container during a rolling deploy. By default, when an error is encountered the deploy will stop and immediately raise that error. If this option is set to `:continue` Centurion will continue deploying to the remaining hosts and raise at the end. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's call this rolling_deploy_on_failure
because it's more generally about failure and not necessarily about a Docker failure.
Seems like a good addition to me. Thoughts @intjonathan ? |
Does the existing |
@intjonathan I think I think the main benefit of the option in this PR is to reduce the blast radius in terms of undeployed-to hosts in the event that an error in communicating with a given host occurs in the middle of a deploy. That plus a more robust |
container = start_new_container(server, service, defined_restart_policy) | ||
rescue e | ||
on_fail = fetch(:rolling_deploy_on_failure, :exit) | ||
raise unless on_fail == :continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding the validation discussion, I'd greenlight this if a log line here made it obvious what happened. Something like
if on_fail == :continue
info "Caught error #{e.message}, but continuing deploy because rolling_deploy_on_failure is #{on_fail}"
else
error "Raising exception, as rolling_deploy_on_failure was #{on_fail} and not :continue"
raise
end
Currently, when a rolling deploy errors during the process of stopping an existing container and spinning up the new one, the deploy stops and exits by raising that error. The longer the list of servers you're deploying to, the more of a pain it is to determine which servers did and did not successfully deploy, and you're left in an inconsistent state.
This PR adds an option,
rolling_deploy_on_docker_failure
that defaults to:exit
, which preserves the existing behavior. When set to:continue
, Centurion will try to deploy to every host on its list and keep a running collection of the errors it encounters along the way. When all the servers are done, it will raise a single error with a concatenation of all the error messages it encountered. This should: