[KIP-1102] Enable clients to rebootstrap based on timeout or error code #4981

Draft
wants to merge 50 commits into base: master

Conversation

@emasab (Contributor) commented Feb 28, 2025

No description provided.

mfleming and others added 21 commits November 2, 2023 11:31
The Kafka protocol allows for brokers to have multiple host:port pairs
for a given node Id, e.g. see UpdateMetadata request which contains a
live_brokers list where each broker Id has a list of host:port pairs. It
follows from this that the thing that uniquely identifies a broker is
its Id, and not the host:port.

The behaviour right now is that if we have multiple brokers with the
same host:port but different Ids, the first broker in the list will be
updated to have the Id of whatever broker we're looking at as we iterate
through the brokers in the Metadata response in
rd_kafka_parse_Metadata0(), e.g.

 Step 1. Broker[0] = Metadata.brokers[0]
 Step 2. Broker[0] = Metadata.brokers[1]
 Step 3. Broker[0] = Metadata.brokers[2]

A typical situation where brokers have the same host:port pair but
differ in their Id is if the brokers are behind a load balancer.

The NODE_UPDATE mechanism responsible for this was originally added in
b09ff60 ("Handle broker name and nodeid updates (issue #343)") as a way
to forcibly update a broker hostname if an Id is reused with a new host
after the original one was decommissioned. But this isn't how the Java
Kafka client works, so let's use the Metadata response as the source of
truth instead of updating brokers if we can only match by their
host:port.
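
The difference between the two matching strategies can be sketched as follows. This is an illustrative model only, not librdkafka code: `MetadataBroker`, `update_by_host_port` and `update_by_node_id` are hypothetical names, and the broker list is reduced to plain Python containers.

```python
from typing import Dict, List, NamedTuple


class MetadataBroker(NamedTuple):
    """One broker entry from a Metadata response (hypothetical model)."""
    node_id: int
    host: str
    port: int


def update_by_host_port(known: List[dict], metadata: List[MetadataBroker]) -> None:
    """Simplified model of the old behaviour: a host:port match renames the broker."""
    for mb in metadata:
        for broker in known:
            if (broker["host"], broker["port"]) == (mb.host, mb.port):
                broker["node_id"] = mb.node_id  # first match takes the new Id
                break
        else:
            # No broker with this host:port yet, so add one.
            known.append({"node_id": mb.node_id, "host": mb.host, "port": mb.port})


def update_by_node_id(known: Dict[int, MetadataBroker],
                      metadata: List[MetadataBroker]) -> None:
    """Simplified model of the proposed behaviour: the node Id is the key."""
    for mb in metadata:
        known[mb.node_id] = mb


# Three brokers behind one load balancer address (assumed example data).
metadata = [MetadataBroker(i, "lb.example", 9092) for i in (1, 2, 3)]

old_list: List[dict] = []
update_by_host_port(old_list, metadata)

new_map: Dict[int, MetadataBroker] = {}
update_by_node_id(new_map, metadata)
```

With host:port matching, `old_list` ends up holding a single broker whose Id was overwritten on each iteration, which mirrors Steps 1 to 3 above. Keyed by node Id, `new_map` keeps all three brokers.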
Brokers that are not in the metadata should be purged from the internal
client lists. This helps to avoid annoying "No route to host" and other
connection failure messages.
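
A minimal sketch of the purge step, assuming the client keeps its brokers in a map keyed by node Id. `purge_missing_brokers` and the `keep` parameter are hypothetical and only illustrate the idea; they do not reflect the actual librdkafka implementation.

```python
from typing import Dict, Iterable, List, Set


def purge_missing_brokers(known: Dict[int, str],
                          metadata_ids: Iterable[int],
                          keep: Iterable[int] = ()) -> List[int]:
    """Drop every known broker whose Id is absent from the Metadata response.

    `keep` lists Ids that must survive regardless (an assumption made here
    for illustration, e.g. a broker that is still in use).
    Returns the Ids that were removed, in sorted order.
    """
    alive: Set[int] = set(metadata_ids) | set(keep)
    removed = sorted(node_id for node_id in known if node_id not in alive)
    for node_id in removed:
        del known[node_id]
    return removed


# Assumed example data: broker 2 has been decommissioned.
brokers = {1: "a.example:9092", 2: "b.example:9092", 3: "c.example:9092"}
removed_ids = purge_missing_brokers(brokers, metadata_ids=[1, 3])
```

After the call, `brokers` no longer contains Id 2, so the client has no reason to keep retrying a host that has gone away.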

Fixes #238.
Co-authored-by: Emanuele Sabellico
as it's now set only on creation
and not modified anymore
ones, except for currently used bootstrap broker.
Wait for decommissioned threads after they've stopped instead of on termination.
triggered when a broker is removed
without terminating the client.
remove remaining references when decommissioning a broker
and prevent it from being selected as leader again or having
partitions delegated to it
broker for the "all brokers down" error, to send
the error in all cases
@confluent-cla-assistant

🎉 All Contributor License Agreements have been signed. Ready to merge.
Please push an empty commit if you would like to re-run the checks to verify CLA status for all contributors.

emasab added 3 commits March 13, 2025 12:01
triggered when a broker is removed
without terminating the client.

Move the operation after the interceptors to avoid blocking the main thread.
Exclude terminating brokers
emasab added 2 commits March 28, 2025 16:37
…sible to await the correct list of brokers in all tests, since decommissioning brokers are excluded from that list
test log interceptor.
Used the test log interceptor for test 0151 too
@airlock-confluentinc airlock-confluentinc bot force-pushed the dev_kip899 branch 2 times, most recently from e9889e5 to 919a9bc Compare April 1, 2025 17:52
Base automatically changed from dev_kip899 to master April 1, 2025 18:45
2 participants