Add rolling_deploy_on_docker_failure option #180

Open · wants to merge 2 commits into master from ctroughton/rolling_deploy_continue

Conversation

imakewebthings

Currently, when a rolling deploy errors during the process of stopping an existing container and spinning up the new one, the deploy stops and exits by raising that error. The longer the list of servers you're deploying to, the more of a pain it is to determine which servers did and did not successfully deploy, and you're left in an inconsistent state.

This PR adds an option, `rolling_deploy_on_docker_failure`, which defaults to `:exit` and preserves the existing behavior. When set to `:continue`, Centurion will try to deploy to every host on its list and keep a running collection of the errors it encounters along the way. When all the servers are done, it will raise a single error with a concatenation of all the error messages it encountered (a sketch follows the list below). This should:

  • Ensure hosts that are healthy at the time of deploy get deployed to, regardless of the health of other hosts in the list.
  • Make it easier to see at the end of the deploy failure which hosts are unhealthy and still running the old container.
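
A minimal sketch of that error-collection flow. It borrows `start_new_container` and the `fetch(:rolling_deploy_on_docker_failure, :exit)` lookup from the diff below; the surrounding loop, the `targets` list, the `failed` hash, and the `hostname` accessor are illustrative assumptions, not the PR's exact code:

    failed = {}
    targets.each do |server|
      begin
        container = start_new_container(server, service, defined_restart_policy)
      rescue => e
        # With :exit (the default), re-raise immediately; with :continue, record and move on.
        raise unless fetch(:rolling_deploy_on_docker_failure, :exit) == :continue
        failed[server.hostname] = e.message
      end
    end
    unless failed.empty?
      summary = failed.map { |host, message| "#{host}: #{message}" }.join("\n")
      raise "Rolling deploy failed on #{failed.size} host(s):\n#{summary}"
    end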

@imakewebthings force-pushed the ctroughton/rolling_deploy_continue branch from 46cb294 to dcbf8a0 on February 24, 2017 at 23:40
  container = start_new_container(server, service, defined_restart_policy)
rescue => e
  on_fail = fetch(:rolling_deploy_on_docker_failure, :exit)
  raise e unless on_fail == :continue
Collaborator

I think here you want just `raise`, which will re-raise the last error. Otherwise you get the stack trace from here rather than from the original error.
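
For reference, a bare `raise` inside a `rescue` block re-raises the exception currently being handled (`$!`), so no explicit reference to the rescued error is needed. A minimal standalone illustration, not code from this PR:

    begin
      raise ArgumentError, 'original failure'
    rescue ArgumentError => e
      warn "caught: #{e.message}"
      raise  # re-raises the same ArgumentError via $!
    end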

@@ -134,12 +134,19 @@ namespace :deploy do
end

task :rolling_deploy do
Collaborator

Would be good to validate that the value of `rolling_deploy_on_failure` is one of the expected options.

Author

Would you rather we have a `validate_rolling_deploy_options` dependent task? I don't believe we perform any validation on the values of other rolling deploy options, but I could take a swing at that. Or just put the one validation at the top of this task if you think it's better to keep this changeset smaller.
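
A rough sketch of what that dependent task could look like, assuming Centurion's Rake-style task definitions; the task name, allowed values, and error wording here are illustrative:

    task :validate_rolling_deploy_options do
      allowed = [:exit, :continue]
      on_fail = fetch(:rolling_deploy_on_failure, :exit)
      unless allowed.include?(on_fail)
        raise ArgumentError,
              "rolling_deploy_on_failure must be one of #{allowed.inspect}, got #{on_fail.inspect}"
      end
    end

    task :rolling_deploy => :validate_rolling_deploy_options do
      # ...existing rolling deploy steps...
    end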

README.md Outdated
@@ -411,6 +411,7 @@ are the same everywhere. Settings are per-project.
ports are not HTTP services, this allows you to only health check the ports
that are. The default is an empty array. If you have non-HTTP services that you
want to check, see Custom Health Checks in the previous section.
* `rolling_deploy_on_docker_failure` => What to do when Centurion encounters an error stopping or starting a container during a rolling deploy. By default, when an error is encountered the deploy will stop and immediately raise that error. If this option is set to `:continue` Centurion will continue deploying to the remaining hosts and raise at the end.
Collaborator

Let's call this `rolling_deploy_on_failure` because it's more generally about failure and not necessarily about a Docker failure.
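
For illustration, a project could then opt in alongside its other settings; a sketch assuming Centurion's standard config DSL (`set`, `host`, `set_current_environment`), with a placeholder image and hostnames:

    namespace :environment do
      task :production do
        set_current_environment(:production)
        set :image, 'example.com/myservice'
        # Keep deploying to the remaining hosts and raise one combined error at the end:
        set :rolling_deploy_on_failure, :continue
        host 'docker-server-1.example.com'
        host 'docker-server-2.example.com'
      end
    end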

@relistan
Collaborator

Seems like a good addition to me. Thoughts, @intjonathan?

@intjonathan
Contributor

Does the existing `deploy:repair` task just need documentation and fixing? Seems like that'd be a one-shot cleanup task you could run instead of using the output of this to build a host-specific deploy.

@imakewebthings
Author

imakewebthings commented Mar 9, 2017

@intjonathan I think `repair` would have to do a lot more work than check status endpoints to determine the hosts that need the deploy. It would have to check all running containers for version mismatches, and that's assuming the deployer is using distinct versioned image tags and not `latest`. This could be a neat feature of `repair`.

I think the main benefit of the option in this PR is to reduce the blast radius in terms of undeployed-to hosts in the event that an error in communicating with a given host occurs in the middle of a deploy. That plus a more robust repair would be a great combo.

  container = start_new_container(server, service, defined_restart_policy)
rescue => e
  on_fail = fetch(:rolling_deploy_on_failure, :exit)
  raise unless on_fail == :continue
Contributor

Regarding the validation discussion, I'd greenlight this if a log line here made it obvious what happened. Something like:

if on_fail == :continue
  info "Caught error #{e.message}, but continuing deploy because rolling_deploy_on_failure is #{on_fail}"
else
  error "Raising exception, as rolling_deploy_on_failure was #{on_fail} and not :continue"
  raise
end

@CLAassistant

CLAassistant commented Mar 3, 2020

CLA assistant check
All committers have signed the CLA.
