eth: add connection manager to drop connections when needed #31476
base: master
Conversation
p2p/peer_error.go
Outdated
@@ -69,6 +69,7 @@ const (
 	DiscUnexpectedIdentity
 	DiscSelf
 	DiscReadTimeout
+	DiscDropped
We can't add a value here, since these values are defined by the protocol.
I can use 0x04 (Too many peers), at least for now, while we disconnect at random.
Later, after integrating metrics, we might switch to 0x03 (Useless peer) when dropping peers that are actually useless.
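For illustration, a minimal sketch of such a drop. The `dropRandomPeer` helper is hypothetical, not the PR's code; only `p2p.Peer.Disconnect` and the `DiscTooManyPeers`/`DiscUselessPeer` constants are the real package p2p API:

```go
package eth

import (
	"math/rand"

	"github.com/ethereum/go-ethereum/p2p"
)

// dropRandomPeer disconnects one randomly chosen peer using a reason code
// the devp2p protocol already defines, instead of a new DiscDropped value.
func dropRandomPeer(peers []*p2p.Peer) {
	if len(peers) == 0 {
		return
	}
	victim := peers[rand.Intn(len(peers))]
	// p2p.DiscTooManyPeers is 0x04; once metrics can identify genuinely
	// useless peers, p2p.DiscUselessPeer (0x03) would be the better fit.
	victim.Disconnect(p2p.DiscTooManyPeers)
}
```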
If you want to track the sync status of the node, this code will have to live outside of package p2p. It'll likely need to be created from …
Dropping peers randomly at a slow pace to create some artificial churn.
Signed-off-by: Csaba Kiraly <[email protected]>
Better positioned in package eth to access relevant data about connection quality.
Signed-off-by: Csaba Kiraly <[email protected]>
eth/connmanager.go
Outdated
if !syncing {
	// If a drop was already scheduled, Schedule does nothing.
	if cm.maxDialPeers-numDialed <= peerDropThreshold {
		cm.peerDropDialedTimer.Schedule(cm.clock.Now().Add(peerDropInterval))
I would prefer a random schedule. Otherwise it would be possible to predict exactly when the next slot opens.
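A sketch of what jittering the deadline could look like; `nextDropTime` and the uniform [base/2, 3*base/2) window are illustrative choices, not taken from the PR. Only `mclock.Clock.Now` and `AbsTime.Add` are the real API:

```go
package eth

import (
	"math/rand"
	"time"

	"github.com/ethereum/go-ethereum/common/mclock"
)

// nextDropTime returns a randomized deadline in [base/2, 3*base/2) from now,
// so an observer cannot predict when the next slot opens.
func nextDropTime(clock mclock.Clock, base time.Duration) mclock.AbsTime {
	jitter := time.Duration(rand.Int63n(int64(base)))
	return clock.Now().Add(base/2 + jitter)
}
```

Usage would then be e.g. `cm.peerDropDialedTimer.Schedule(nextDropTime(cm.clock, peerDropInterval))`.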
Agree. I also wanted to introduce it to decouple the two timers. Will add it soon.
I'm not so sure about even having two timers. It's not really necessary. We could just have one random timer and make a decision based on the current state.
Yes, we could use a single one, but I opted for the two separate timers because the two peer pools are well separated.
- Their limits are defined separately, which basically means there are dedicated slots for dialed and inbound connections.
- Their ingress processes are independent: one is discovery-based outgoing dials, the other is incoming connections based on our advertised ID.
- Their fill speeds are largely different: one is throttled (outstanding dial attempts are capped at 2x the free dial slots), while the other is not (it just depends on how often our ENR is found and how many nodes are actively hunting for dial candidates).
- Free slots in one or the other serve different purposes in the system: free dial slots mean we can find new peers, hoping to get better connected, while free inbound slots give other nodes the possibility to join the system.
We do connect the two pools in our download/fetch/tx propagation scheduling, where AFAIK we don't currently differentiate between dialed and inbound peers, but I would avoid using a single timer when there is value in keeping the two pools separate.
While I'm not planning to set different intervals for the two right now, this might change later on. Note that we could even autotune these intervals based on connection acquisition rate stats if we wanted to, although I think that would be overcomplicating things for no reason right now.
The simplest implementation would be a single always-running randomized timer checking all conditions, including sync status and exclusions, on each tick. But I think we will be moving towards more complex drop conditions, for which the double timer with start and stop seems a better base to me (see the sketch below).
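A rough sketch of the two-timer layout under discussion. The struct is a simplified stand-in and the constants are illustrative; the field and constant names follow the PR's diff, and only `mclock.Alarm`'s `Schedule`/`Stop` are the real API:

```go
package eth

import (
	"time"

	"github.com/ethereum/go-ethereum/common/mclock"
)

// Illustrative values; the PR defines its own.
const (
	peerDropInterval  = 5 * time.Minute
	peerDropThreshold = 2
)

// connManager shows why two alarms are convenient: each pool owns an alarm
// that is armed only while that pool is near its limit.
type connManager struct {
	clock                mclock.Clock
	maxDialPeers         int
	maxInboundPeers      int
	peerDropDialedTimer  *mclock.Alarm
	peerDropInboundTimer *mclock.Alarm
}

// updateTimers re-arms or stops each pool's alarm based on its own fill level.
// mclock.Alarm keeps the earliest pending deadline, so re-scheduling while a
// drop is already pending does nothing.
func (cm *connManager) updateTimers(numDialed, numInbound int, syncing bool) {
	if syncing {
		// Never drop while syncing; disarm both timers.
		cm.peerDropDialedTimer.Stop()
		cm.peerDropInboundTimer.Stop()
		return
	}
	if cm.maxDialPeers-numDialed <= peerDropThreshold {
		cm.peerDropDialedTimer.Schedule(cm.clock.Now().Add(peerDropInterval))
	} else {
		cm.peerDropDialedTimer.Stop()
	}
	if cm.maxInboundPeers-numInbound <= peerDropThreshold {
		cm.peerDropInboundTimer.Schedule(cm.clock.Now().Add(peerDropInterval))
	} else {
		cm.peerDropInboundTimer.Stop()
	}
}
```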
WIP: this is still WIP for a few reasons:
As of now, Geth disconnects peers only on protocol error or timeout, meaning that once the connection slots are filled, the peerset is largely fixed.
As mentioned in #31321, Geth should occasionally disconnect peers to ensure some churn. What/when to disconnect could depend on:
This PR adds very slow churn using random drops (see the sketch below).
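A minimal sketch of the idea, not the PR's actual code: the `peers`/`syncing` accessors, the `churnLoop` name, and the interval are assumptions; only `p2p.Peer.Disconnect` and `p2p.DiscTooManyPeers` come from the real API.

```go
package eth

import (
	"math/rand"
	"time"

	"github.com/ethereum/go-ethereum/p2p"
)

// Hypothetical knob; the PR defines its own name and value.
const churnInterval = 5 * time.Minute

// churnLoop drops one random peer per jittered interval to create slow,
// artificial churn. quit stops the loop.
func churnLoop(peers func() []*p2p.Peer, syncing func() bool, quit <-chan struct{}) {
	for {
		// Jitter the wait so the next drop time cannot be predicted.
		wait := churnInterval/2 + time.Duration(rand.Int63n(int64(churnInterval)))
		select {
		case <-time.After(wait):
			if syncing() {
				continue // never shrink the peerset while syncing
			}
			if list := peers(); len(list) > 0 {
				list[rand.Intn(len(list))].Disconnect(p2p.DiscTooManyPeers)
			}
		case <-quit:
			return
		}
	}
}
```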