Description
Feature Request
Is your feature request related to a problem? Please describe.
Yes. It isn't possible to use leader-for-life leader election with controller-runtime's manager when also using liveness and readiness probes.
Using controller-runtime's manager out of the box, the following sequence of events happens when manager.Start() is called:
- Liveness and readiness probes are started.
- Leader election is started.
- Controllers are started.
When using leader-for-life from this repo, leader election must be invoked before manager.Start(), since controller-runtime doesn't support pluggable leader election implementations. The sequence of events in this case is:
- Leader election is started.
- Liveness and readiness probes are started.
- Controllers are started.
Notice that the first two steps are swapped. This swap causes deadlocks when upgrading operator deployments that use leader-for-life. When the deployment attempts to roll out a new version, the new pod starts up and first attempts to become the leader, failing indefinitely until the old pod relinquishes ownership. However, the old pod will not relinquish ownership until it disappears, and it won't disappear until the new pod reports that it's healthy. Unfortunately, the new pod will never be able to report that it's healthy, because it needs to be the leader before it starts its liveness and readiness probe servers.
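The deadlock can be illustrated with a toy model. This is only a sketch under simplifying assumptions: the `Step` type, `rolloutCompletes`, and the two order variables are hypothetical names made up for illustration, and the model assumes the old pod is removed (and its lock released) as soon as the new pod's probes are up. It does not use controller-runtime or the leader-for-life code.

```go
package main

import "fmt"

// Step is one phase of operator startup in this toy model (hypothetical type).
type Step string

const (
	Probes         Step = "probes"
	LeaderElection Step = "leader election"
	Controllers    Step = "controllers"
)

// rolloutCompletes models a new pod starting while an old pod still holds
// the leader-for-life lock. It reports whether the new pod gets through
// every startup step in the given order.
func rolloutCompletes(order []Step) bool {
	oldPodHoldsLock := true
	for _, step := range order {
		switch step {
		case Probes:
			// Assumption: once the new pod reports healthy, the deployment
			// removes the old pod, which releases the lock.
			oldPodHoldsLock = false
		case LeaderElection:
			if oldPodHoldsLock {
				// The new pod would wait here forever: deadlock.
				return false
			}
		case Controllers:
			// Reached only after leadership has been acquired.
		}
	}
	return true
}

var (
	// Order used by controller-runtime's manager out of the box.
	managerOrder = []Step{Probes, LeaderElection, Controllers}
	// Order forced by calling leader-for-life before manager.Start().
	leaderForLifeOrder = []Step{LeaderElection, Probes, Controllers}
)

func main() {
	fmt.Println("manager order completes:", rolloutCompletes(managerOrder))
	fmt.Println("leader-for-life order completes:", rolloutCompletes(leaderForLifeOrder))
}
```

In this model only the order that starts the probes first lets the rollout finish, which matches the behavior described above.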
Describe the solution you'd like
Work upstream to make controller-runtime support pluggable leader election implementations, so that leader-for-life can be used by the manager and run after the probes are started.