Topic/christian/disconnect slow peers #54

Open
ckreibich wants to merge 9 commits into master from topic/christian/disconnect-slow-peers


ckreibich
Owner

No description provided.

ckreibich force-pushed the topic/christian/disconnect-slow-peers branch from 30af91e to 9d11c51 (December 3, 2024 07:05)
ckreibich force-pushed the topic/christian/disconnect-slow-peers branch 2 times, most recently from b4f29ac to b622c6b (December 3, 2024 09:17)
ckreibich and others added 8 commits December 3, 2024 01:30
This adds re-peering at the Broker level for peers that Broker decided to
unpeer. We keep this at the Broker level since this behavior is specific to
it (as opposed to other cluster backends).

Includes baseline updates for btests that pick up on the new script's @load.

This translates Broker IDs to cluster-level node names, if available.

This module is loaded by the telemetry framework, which we're now loading via
the cluster framework, i.e. also in bare mode. The resulting additional
thread (for creating reporter.log) trips up a number of btest baselines.

version.zeek doesn't use any of the string helper functions.

This adds a Broker-specific script to the cluster framework, loaded only when
Zeek is running in cluster mode. It adds logging in cluster.log as well as
telemetry via a metrics counter for Broker-observed backpressure overflows. The
new zeek_broker_backpressure_overflows counter, labeled by the neighboring peer
that the reporting node has determined to be unresponsive, counts the number of
unpeerings.

Here the node "worker" has observed node "proxy" falling behind once:

# HELP zeek_broker_backpressure_overflows_total Number of Broker peering drops due to a neighbor falling too far behind in message I/O
# TYPE zeek_broker_backpressure_overflows_total counter
zeek_broker_backpressure_overflows_total{endpoint="worker",peer="proxy"} 1
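For anyone who wants to consume this counter outside of Prometheus, here is a minimal Python sketch that reads the sample above and tallies overflows per (endpoint, peer) pair. The function name `overflows_by_peer` and the regular expressions are illustrative assumptions and not part of this PR; the parser is deliberately simplified (no timestamps, no unescaping of label values) and only handles lines shaped like the one shown.

```python
import re
from collections import defaultdict

# The sample exposition text from the commit message above.
SAMPLE = """\
# HELP zeek_broker_backpressure_overflows_total Number of Broker peering drops due to a neighbor falling too far behind in message I/O
# TYPE zeek_broker_backpressure_overflows_total counter
zeek_broker_backpressure_overflows_total{endpoint="worker",peer="proxy"} 1
"""

METRIC = "zeek_broker_backpressure_overflows_total"

# Simplified patterns (assumption): name{labels} value, nothing after the value.
LINE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def overflows_by_peer(text):
    """Return {(endpoint, peer): count} for the backpressure overflow counter."""
    counts = defaultdict(float)
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the HELP/TYPE comment lines.
        if not line or line.startswith("#"):
            continue
        match = LINE_RE.match(line)
        if match is None or match.group("name") != METRIC:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels")))
        key = (labels.get("endpoint", ""), labels.get("peer", ""))
        counts[key] += float(match.group("value"))
    return dict(counts)


if __name__ == "__main__":
    for (endpoint, peer), count in overflows_by_peer(SAMPLE).items():
        print(f"{endpoint} unpeered {peer} {count:g} time(s)")
```

Keying on both labels keeps the reporting node and the slow neighbor separate, so repeated drops of the same peer by different nodes remain distinguishable.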

Includes btest baseline updates.

Yes, really. :-) We've hit the need for this on occasion in very specific
settings and always worked around it via ugly nested loops or similar.
This comes with ample warning that folks normally won't want to use it.
ckreibich force-pushed the topic/christian/disconnect-slow-peers branch from b622c6b to 65129b9 (December 3, 2024 09:30)
2 participants