Allow a previously reset node to rejoin its original cluster #13643
It's a reasonable idea but I think that
Yeah, I agree the current position is a bit odd. I'll move it, with comments!
…onsider node a member. Khepri: no-op. Khepri is less strict already, and rabbit_khepri:can_join would accept a join request from a node that is already a member
Branch updated from 96332e1 to dd49cbe.
rabbit_db_cluster now fails gmake dialyze:
rabbit_db_cluster.erl:239:9: The pattern Error = {error, _} can never match since previous clauses completely covered the type {'error', {'inconsistent_cluster', string()}} | {'ok', 'already_member' | [atom()]}
Works on my machine! Test process:
Note that the cluster is running.
Looking into failing tests.
Just trying to figure out if I messed up some test results that expect this to fail...
In this case Khepri would first remove itself and then join. This ensures it rejoins as a new member. Anyhow, it makes sense that Mnesia would also perform a similar set of steps.
The Selenium suite failure is due to an npm dependency installation failure, not anything in this PR.
Allow a previously reset node to rejoin its original cluster (backport #13643)
If a cluster member has its local state wiped for whatever reason, it has a hard time rejoining the cluster: when Mnesia is used, the old cluster members still consider the node a member and reject its join request.
Proposed Changes
Mnesia: if the join fails because the node is already a member, ask to leave the cluster first, then retry.
Khepri: no-op. Khepri is already less strict: rabbit_khepri:can_join would accept a join request from a node that is already a member.
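The Mnesia change can be pictured with a toy in-memory model. This is a hypothetical Python sketch, not RabbitMQ code: Cluster, request_join, leave, join_with_rejoin and the tuple results are invented for illustration, and "already_member" is only a label inside the model. The only idea taken from this PR is "on an already-a-member rejection, leave first and retry"; a single retry is assumed.

```python
"""Toy model of the Mnesia rejoin problem; hypothetical, not RabbitMQ's API."""

ALREADY_MEMBER = ("error", "already_member")  # illustrative rejection value


class Cluster:
    """Remaining members that still remember which nodes belong."""

    def __init__(self, members):
        self.members = set(members)

    def request_join(self, node):
        # Mnesia-like strictness: reject a node the cluster still lists.
        if node in self.members:
            return ALREADY_MEMBER
        self.members.add(node)
        return ("ok", sorted(self.members))

    def leave(self, node):
        # Drop the stale entry so the node can come back as a new member.
        self.members.discard(node)


def join_with_rejoin(cluster, node):
    """Join; if rejected as an existing member, leave first and retry once."""
    result = cluster.request_join(node)
    if result == ALREADY_MEMBER:
        cluster.leave(node)
        result = cluster.request_join(node)
    return result


if __name__ == "__main__":
    cluster = Cluster(["rabbit@a", "rabbit@b", "rabbit@c"])
    # rabbit@c had its local state wiped, but a and b still list it.
    print(join_with_rejoin(cluster, "rabbit@c"))
    # ('ok', ['rabbit@a', 'rabbit@b', 'rabbit@c'])
```

In this model, the Khepri "no-op" corresponds to request_join simply accepting a node it already lists, so no leave step is needed.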
Further Comments
I would like early feedback on whether this naive approach is acceptable at all, whether the number of retries should be limited, and whether the logic should live in rabbit_mnesia or in rabbit_db_cluster.
It feels a bit wonky that a function called can_join_cluster would also try to leave the cluster and try again, so perhaps it would be better if rabbit_db_cluster:join initiated the leave-and-retry request instead?
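One way to act on these questions is to keep the check side-effect free and let the join path own a bounded leave-and-retry loop. The sketch below is hypothetical Python, not the rabbit_db_cluster or rabbit_mnesia API: join, can_join, do_join, leave, max_attempts and the tuple results are invented stand-ins, passed in as callables so the retry policy can be exercised on its own.

```python
"""Hypothetical split: a side-effect-free check plus a join that owns retries."""

from typing import Callable, Tuple

Result = Tuple[str, object]
ALREADY_MEMBER: Result = ("error", "already_member")  # illustrative value


def join(
    can_join: Callable[[str], Result],
    do_join: Callable[[str], Result],
    leave: Callable[[str], None],
    node: str,
    max_attempts: int = 2,
) -> Result:
    """Check; on ALREADY_MEMBER, leave and re-check, at most max_attempts checks.

    can_join only inspects cluster state, so every side effect (leaving,
    joining) lives here, and the bounded loop cannot retry forever.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    result: Result = ALREADY_MEMBER
    for attempt in range(max_attempts):
        result = can_join(node)
        if result != ALREADY_MEMBER:
            break
        # Leave only if another check will follow; otherwise report as-is.
        if attempt + 1 < max_attempts:
            leave(node)
    if result[0] == "ok":
        return do_join(node)
    return result
```

With max_attempts=1 the function reports the rejection without leaving, so a caller that does not want the retry behaviour keeps its membership untouched; other errors (for example an inconsistent-cluster result) are returned unchanged.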