operator: LeaderElectionReleaseOnCancel #556
Conversation
I validated this change with a test commit. This comes from the related CI logs:
Controller-runtime defaults seem much shorter. I see we override them with OpenShift-inspired defaults: lease is 137s and renew 26s, so it should take no longer than ~2.7 min. Given it's not 5+ min, do we really need this?
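For reference, a minimal sketch of where these timings are plugged into a controller-runtime manager. The 137s/26s values come from the comment above; the 107s RenewDeadline and the election ID are assumptions (the ID is taken from the startup log quoted in the commit message below), not this repository's actual wiring.

```go
package main

import (
	"fmt"
	"time"

	ctrl "sigs.k8s.io/controller-runtime"
)

// leaderElectionOptions builds manager options with the OpenShift-inspired leader
// election timings discussed above. Worst case, a replacement instance waits
// roughly LeaseDuration + RetryPeriod before it can grab an unreleased lease.
func leaderElectionOptions() ctrl.Options {
	leaseDuration := 137 * time.Second // value quoted in the comment above
	renewDeadline := 107 * time.Second // assumption: openshift/library-go default
	retryPeriod := 26 * time.Second    // value quoted in the comment above

	return ctrl.Options{
		LeaderElection:   true,
		LeaderElectionID: "a56def2a.openshift.io", // lease name seen in the startup log below
		LeaseDuration:    &leaseDuration,
		RenewDeadline:    &renewDeadline,
		RetryPeriod:      &retryPeriod,
	}
}

func main() {
	opts := leaderElectionOptions()
	fmt.Println("worst-case lease acquisition:", *opts.LeaseDuration+*opts.RetryPeriod)
}
```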
main.go (outdated)
```go
setupLog.Info("starting leader election manager")
if err := leaderElectionMgr.Start(leaderElectionContext); err != nil {
	setupLog.Error(err, "Leader Election Manager exited non-zero")
	os.Exit(1)
```
Something seems off to me with the use of os.Exit() here and below.
Maybe we just need to use a WaitGroup to wait for these managers to complete.
What if we call os.Exit() before any defer statements are executed? It may not really be an issue, I don't know.
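As a side note, a tiny standalone example of the os.Exit()/defer interaction being questioned here; this is plain Go behaviour, nothing taken from the PR itself:

```go
package main

import (
	"fmt"
	"os"
)

func main() {
	// Deferred calls never run when the process exits via os.Exit,
	// so any cleanup registered with defer is silently skipped.
	defer fmt.Println("cleanup") // never printed
	fmt.Println("doing work")
	os.Exit(1)
}
```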
+1
TBH, I don't know: if one manager returns an error while the others are running, it should signal a stop and wait. But it could become a little bit complicated, and there is a risk of getting a deadlock as far as I can see.
I can explore a little more if you think it's worth it.
+1 on looking at how to streamline this
```go
	setupLog.Error(err, "unable to set up ready check")
	os.Exit(1)
}
leaderElectionContext, cancelLeaderElection := context.WithCancel(context.Background())
```
I think it's better to create this WithCancel context using the stopCh context (created below) instead of context.Background().
That way leaderElectionContext would stop as soon as the stop context is Done, which is what I'm trying to achieve with this PR.
I tried using nested contexts here: https://go.dev/play/p/6dFfBQyXlW1
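For readers following the thread, a generic illustration of how nested contexts propagate cancellation (standard library behaviour, not the contents of the playground link above):

```go
package main

import (
	"context"
	"fmt"
)

func main() {
	parent, cancelParent := context.WithCancel(context.Background())
	child, cancelChild := context.WithCancel(parent) // child derived from parent
	defer cancelChild()

	cancelParent() // cancelling the parent also cancels the child

	<-child.Done()
	fmt.Println("child cancelled:", child.Err()) // prints "context canceled"
}
```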
By using the stop context as the parent of this one, the leaderElectionMgr.Start function will be cancelled if a SIGINT or SIGTERM is received.
Yes, that's what I don't want to happen. Here's the ending sequence:
- a SIGINT arrives
- stopCh is Done
- mgr and mgrGlobal finish their work and return
- as mgr does not run on a goroutine, when it returns it completes the function, triggering all the defers
- the utils.Shutdown() defer goes first and does the cleanup (the leader election is still running here, so we still hold the lock lease)
- the cancelLeaderElection() defer goes second, stopping the manager, which in turn releases the lock

Am I missing something here?
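A self-contained sketch of the LIFO defer ordering this sequence relies on; the function names below merely stand in for the real ones (cleanup ~ utils.Shutdown, releaseLease ~ cancelLeaderElection) and are not the operator's actual code:

```go
package main

import "fmt"

func releaseLease() { fmt.Println("2) cancel leader election -> lease released") }
func cleanup()      { fmt.Println("1) shutdown cleanup, lease still held") }

func main() {
	defer releaseLease() // registered first, runs last
	defer cleanup()      // registered second, runs first
	fmt.Println("0) managers returned after SIGINT")
}
```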
Right, it would be the same as using the ReleaseOnCancel option.
My concern here is that after calling cancelLeaderElection() (with the defer), the program exits, so nothing ensures that leaderElectionMgr really finishes. There is a race condition.
The internal implementation of the ReleaseOnCancel option uses a channel to ensure that the routine has stopped before continuing with the shutdown. Perhaps you can do the same here.
> My concern here is that after calling cancelLeaderElection() (with the defer), the program exits, so nothing ensures that leaderElectionMgr really finishes. There is a race condition.

Added a wait group to ensure leaderElectionMgr is correctly stopped.
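A minimal sketch of that wait-group pattern, assuming a controller-runtime manager dedicated to leader election; the helper below is illustrative, not the PR's exact code:

```go
package operator

import (
	"context"
	"sync"

	"sigs.k8s.io/controller-runtime/pkg/manager"
)

// startLeaderElectionManager runs mgr in a goroutine and returns a stop function
// that cancels it and blocks until Start has actually returned, i.e. until the
// lease has been released.
func startLeaderElectionManager(mgr manager.Manager) (stop func(), errCh <-chan error) {
	ctx, cancel := context.WithCancel(context.Background())
	errs := make(chan error, 1)

	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		errs <- mgr.Start(ctx)
	}()

	return func() {
		cancel()  // ask the manager to stop (and release the lease)
		wg.Wait() // wait until Start has really returned
	}, errs
}
```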
It seems to me that this never exits: the leader election manager is stopped with a defer, but before that you are waiting for it to stop, so it won't happen?
fixed
You're right: regular clusters may get a delay of 2.7 min at most. With Single Node OpenShift (SNO in this context) it can grow to 5 minutes:
openshift/library-go@2612981#diff-61dd95c7fd45fa18038e825205fbfab8a803f1970068157608b6b1e9e6c27248R127
I'm taking that math for granted, and one of our customers is experiencing that 5-minute restart during upgrades on SNO.
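Back-of-the-envelope check of that math: worst case, a new instance waits roughly LeaseDuration + RetryPeriod before it can take over an unreleased lease. The regular-cluster values come from the discussion above; the SNO values (270s/60s) are an assumption taken from openshift/library-go's single-node config, not from this thread.

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// Worst-case acquisition time ≈ LeaseDuration + RetryPeriod.
	regular := 137*time.Second + 26*time.Second // ≈ 2.7 min, matches the comment above
	sno := 270*time.Second + 60*time.Second     // ≈ 5.5 min, assumed SNO values
	fmt.Println("regular cluster worst case:", regular)
	fmt.Println("single-node worst case:", sno)
}
```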
Hi @zeeke @adrianchiris, not related to this PR directly, but a question: do you know why we have two managers? And now three?
AFAIU, mgr watches resources in the operator's namespace. mgrGlobal watches every namespace, as it runs Sriov[IB]Network reconcilers, which react to NetworkAttachmentDefinitions. I guess we can unify those managers into one that watches all namespaces, but maybe we'd get some performance degradation. Also, I'm investigating the shutdown logic to see if we can get rid of some bits and pieces, as I'm not very happy with the current election-proxy-manager proposal.
Shutdown logic for the long run will be discussed in discussions/608. @SchSeba, @e0ne, @adrianchiris, please take another look at this. I added an end-to-end test to verify that the restart of the operator is fast enough.
I left some small comments. Nice work :)
```go
g.Expect(err).ToNot(HaveOccurred())

g.Expect(newLease.Spec.HolderIdentity).ToNot(Equal(oldLease.Spec.HolderIdentity))
}, 45*time.Second, 5*time.Second).Should(Succeed())
```
This one is a complicated test: if the operator starts on another master, for example, we need to add the time it takes for the image to get pulled onto the node.
Maybe we wait for the new pod to be running and then do the leader election check?
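Something along these lines could work as the two-step check, sketched against the existing Ginkgo/Gomega test; `clients`, `operatorNamespace`, `oldLease`, the pod label selector and the lease name are assumptions about the surrounding test code, not verbatim from the PR:

```go
// Step 1 (hypothetical): wait for the new operator pod to be Running, so image
// pull time on another node does not eat into the lease-check timeout.
Eventually(func(g Gomega) {
	pods, err := clients.Pods(operatorNamespace).List(context.Background(), metav1.ListOptions{
		LabelSelector: "name=sriov-network-operator", // assumed label
	})
	g.Expect(err).ToNot(HaveOccurred())
	g.Expect(pods.Items).ToNot(BeEmpty())
	g.Expect(pods.Items[0].Status.Phase).To(Equal(corev1.PodRunning))
}, 3*time.Minute, 5*time.Second).Should(Succeed())

// Step 2: only then assert that the Lease holder changed.
Eventually(func(g Gomega) {
	newLease, err := clients.Leases(operatorNamespace).
		Get(context.Background(), "a56def2a.openshift.io", metav1.GetOptions{}) // lease name from the startup log
	g.Expect(err).ToNot(HaveOccurred())
	g.Expect(newLease.Spec.HolderIdentity).ToNot(Equal(oldLease.Spec.HolderIdentity))
}, 45*time.Second, 5*time.Second).Should(Succeed())
```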
Good point! This will save us a lot of flakes. Fixing.
LGTM
Please rebase this one and open the issue so we don't forget :)
When manually restarting the operator, the leader election may take 5+ minutes to acquire the lease on startup:

```
I1205 16:06:02.101302 1 leaderelection.go:245] attempting to acquire leader lease openshift-sriov-network-operator/a56def2a.openshift.io...
...
I1205 16:08:40.133558 1 leaderelection.go:255] successfully acquired lease openshift-sriov-network-operator/a56def2a.openshift.io
```

The manager's option `LeaderElectionReleaseOnCancel` would solve this problem, but it's not safe, as the shutdown cleanup procedures (inhibiting webhooks and removing finalizers) would run without any leader guard.

This commit moves the LeaderElection mechanism from the namespaced manager to a dedicated, no-op controller manager. This approach has been preferred over dealing with the LeaderElection API directly because:
- it leverages library code that has proved to be stable;
- it includes recording k8s Events about the Lease process;
- the election process must come after setting up the health probe, and doing it manually would involve handling the healthz endpoint as well.

Signed-off-by: Andrea Panattoni <[email protected]>
Signed-off-by: Andrea Panattoni <[email protected]>
Add CoordinationV1 to `test/util/clients.go` to make assertions on `coordination.k8s.io/Lease` objects.

Add the `OPERATOR_LEADER_ELECTION_ENABLE` environment variable to `deploy/operator.yaml` to let the user enable leader election on the operator.

Signed-off-by: Andrea Panattoni <[email protected]>
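A condensed sketch of the approach described in the first commit message: a dedicated manager that registers no controllers and exists only to hold the leader lock with `LeaderElectionReleaseOnCancel`, while the real managers run with leader election disabled. Option values, names and the exact start/shutdown wiring are assumptions for illustration, not the PR's actual main.go:

```go
package main

import (
	"context"
	"os"

	ctrl "sigs.k8s.io/controller-runtime"
)

func main() {
	setupLog := ctrl.Log.WithName("setup")

	// No-op manager: no controllers are registered on it; it only runs leader election.
	leaderElectionMgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
		LeaderElection:                true,
		LeaderElectionID:              "a56def2a.openshift.io", // lease name seen in the startup log
		LeaderElectionReleaseOnCancel: true,                    // safe here: this manager performs no shutdown cleanup
	})
	if err != nil {
		setupLog.Error(err, "unable to create leader election manager")
		os.Exit(1)
	}

	leaderElectionCtx, cancelLeaderElection := context.WithCancel(context.Background())
	// Cancelled last (after the cleanup defers in the real code), so the lease is
	// released only once the shutdown procedures have finished.
	defer cancelLeaderElection()

	go func() {
		if err := leaderElectionMgr.Start(leaderElectionCtx); err != nil {
			setupLog.Error(err, "Leader Election Manager exited non-zero")
			os.Exit(1)
		}
	}()

	// Block until this replica holds the lease, then start the real managers
	// (namespaced and global) with LeaderElection disabled.
	<-leaderElectionMgr.Elected()
	// ... mgr.Start(ctrl.SetupSignalHandler()) etc. ...
}
```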
Rebased! The discussion about shutdown is here: discussions/608
@e0ne @ykulazhenkov, if you have time, please take a look at this PR.
> Hi @zeeke @adrianchiris, not related to this PR directly, but a question: do you know why we have two managers? And now three?
>
> AFAIU, mgr watches resources in the operator's namespace. mgrGlobal watches every namespace, as it runs Sriov[IB]Network reconcilers, which react to NetworkAttachmentDefinitions. I guess we can unify those managers into one that watches all namespaces, but maybe we'd get some performance degradation. Also, I'm investigating the shutdown logic to see if we can get rid of some bits and pieces, as I'm not very happy with the current election-proxy-manager proposal.
TBH, I think we should try to use a single manager and rely on the cancellation logic from the controller-runtime package.
Recent versions of the controller-runtime package provide a way to configure caches/watches in a very granular way. It is possible to watch some resources in all namespaces and others only in a specific namespace.
Here is an example:
https://github.com/Mellanox/nvidia-k8s-ipam/blob/a34b4a2547815b7c016df7d563a2e04000f0add3/cmd/ipam-node/app/app.go#L146
WDYT?
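Roughly, such a granular cache setup could look like the sketch below, assuming a recent controller-runtime (v0.16-style `cache.Options`); the object types and namespace choices are illustrative, not this repository's actual configuration:

```go
package operator

import (
	netattdefv1 "github.com/k8snetworkplumbingwg/network-attachment-definition-client/pkg/apis/k8s.cni.cncf.io/v1"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/cache"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// managerOptions restricts the cache to the operator namespace by default while
// watching NetworkAttachmentDefinitions cluster-wide, so a single manager could
// serve both the namespaced and the global reconcilers.
func managerOptions(operatorNamespace string) ctrl.Options {
	return ctrl.Options{
		Cache: cache.Options{
			DefaultNamespaces: map[string]cache.Config{
				operatorNamespace: {},
			},
			ByObject: map[client.Object]cache.ByObject{
				&netattdefv1.NetworkAttachmentDefinition{}: {
					Namespaces: map[string]cache.Config{cache.AllNamespaces: {}},
				},
			},
		},
	}
}
```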
Hi @ykulazhenkov, thank you for your feedback. I agree with you that we should simplify as much as we can. With your suggestion we can merge the namespaced and the non-namespaced managers into a single one, but that's not the problem this PR is trying to solve.
The problem here is that the operator has shutdown logic with two constraints: the cleanup (inhibiting webhooks and removing finalizers) must run while the leader lock is still held, and the lease must be released as soon as the operator shuts down.
The cleanest solution I found (apart from getting rid of the shutdown logic, see discussions/608) is to add a new manager that controls only the leader election. I admit it's probably not the best solution, but it solves an issue found in a production environment (OCPBUGS-23795).
Thanks for the clarification.
LGTM
LGTM
Thanks!
When manually restarting the operator, the leader election takes 5+ minutes to acquire the lease on startup.
This PR makes sure the lease is released when the operator shuts down.