eth: add connection manager to drop connections when needed #31476
base: master
Conversation
p2p/peer_error.go
Outdated
@@ -69,6 +69,7 @@ const (
 	DiscUnexpectedIdentity
 	DiscSelf
 	DiscReadTimeout
+	DiscDropped
We can't add a value here, since these values are defined by the protocol.
I can use 0x04 (Too many peers), at least for now, while we disconnect at random.
Later, after integrating metrics, we might switch to 0x03 (Useless peer) when dropping peers that are actually useless.
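For illustration, a minimal sketch of such a drop. The `dropRandomPeer` helper is hypothetical, not the PR's code; only `p2p.Peer.Disconnect` and the `DiscTooManyPeers`/`DiscUselessPeer` constants are the real package p2p API:

```go
package eth

import (
	"math/rand"

	"github.com/ethereum/go-ethereum/p2p"
)

// dropRandomPeer disconnects one randomly chosen peer using a reason code
// the devp2p protocol already defines, instead of a new DiscDropped value.
func dropRandomPeer(peers []*p2p.Peer) {
	if len(peers) == 0 {
		return
	}
	victim := peers[rand.Intn(len(peers))]
	// p2p.DiscTooManyPeers is 0x04; once metrics can identify genuinely
	// useless peers, p2p.DiscUselessPeer (0x03) would be the better fit.
	victim.Disconnect(p2p.DiscTooManyPeers)
}
```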
If you want to track the sync status of the node, this code will have to live outside of package p2p. It'll likely need to be created from …
Dropping peers randomly at a slow pace to create some artificial churn.
Signed-off-by: Csaba Kiraly <[email protected]>
Better positioned in package eth to access relevant data about connection quality.
Signed-off-by: Csaba Kiraly <[email protected]>
eth/connmanager.go
Outdated
if !syncing {
	// If a drop was already scheduled, Schedule does nothing.
	if cm.maxDialPeers-numDialed <= peerDropThreshold {
		cm.peerDropDialedTimer.Schedule(cm.clock.Now().Add(peerDropInterval))
I would prefer a random schedule. Otherwise it would be possible to predict exactly when the next slot opens.
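A sketch of what jittering the deadline could look like; `nextDropTime` and the uniform [base/2, 3*base/2) window are illustrative choices, not taken from the PR. Only `mclock.Clock.Now` and `AbsTime.Add` are the real API:

```go
package eth

import (
	"math/rand"
	"time"

	"github.com/ethereum/go-ethereum/common/mclock"
)

// nextDropTime returns a randomized deadline in [base/2, 3*base/2) from now,
// so an observer cannot predict when the next slot opens.
func nextDropTime(clock mclock.Clock, base time.Duration) mclock.AbsTime {
	jitter := time.Duration(rand.Int63n(int64(base)))
	return clock.Now().Add(base/2 + jitter)
}
```

Usage would then be e.g. `cm.peerDropDialedTimer.Schedule(nextDropTime(cm.clock, peerDropInterval))`.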
Agree. I also wanted to introduce it to decouple the two timers. Will add it soon.
I'm not so sure about even having two timers. It's not really necessary. We could just have one random timer and make a decision based on the current state.
Yes, we could use a single one, but I opted for the two separate timers because the two peer pools are well separated.
- Their limits are defined separately, which basically means there are dedicated slots for dialed and inbound connections.
- Their ingress processes are independent: one is discovery-based outgoing dials, the other is incoming connections based on our advertised ID.
- Their fill speeds are largely different: one is throttled (outstanding dial attempts are capped at 2x the free dial slots), while the other is not (it just depends on how often our ENR is found and how many nodes are actively hunting for dial candidates).
- Free slots in one or the other serve different purposes in the system: free dial slots mean we can find new peers, hoping to get better connected, while free inbound slots give other nodes the possibility to join the system.
We do connect the two pools in our download/fetch/tx propagation scheduling, where AFAIK we don't currently differentiate between dialed and inbound peers, but I would avoid using a single timer when there is value in keeping the two pools separate.
While I'm not planning to set different intervals for the two right now, this might change later on. Note that we could even autotune these intervals based on connection acquisition rate stats if we wanted to, although I think that would be overcomplicating things for no reason right now.
The simplest implementation would be a single always-running randomized timer checking all conditions, including sync status and exclusions, on each tick. But I think we will be moving towards more complex drop conditions, for which the double timer with start and stop seems a better base to me (see the sketch below).
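A rough sketch of the two-timer layout under discussion. The struct is a simplified stand-in and the constants are illustrative; the field and constant names follow the PR's diff, and only `mclock.Alarm`'s `Schedule`/`Stop` are the real API:

```go
package eth

import (
	"time"

	"github.com/ethereum/go-ethereum/common/mclock"
)

// Illustrative values; the PR defines its own.
const (
	peerDropInterval  = 5 * time.Minute
	peerDropThreshold = 2
)

// connManager shows why two alarms are convenient: each pool owns an alarm
// that is armed only while that pool is near its limit.
type connManager struct {
	clock                mclock.Clock
	maxDialPeers         int
	maxInboundPeers      int
	peerDropDialedTimer  *mclock.Alarm
	peerDropInboundTimer *mclock.Alarm
}

// updateTimers re-arms or stops each pool's alarm based on its own fill level.
// mclock.Alarm keeps the earliest pending deadline, so re-scheduling while a
// drop is already pending does nothing.
func (cm *connManager) updateTimers(numDialed, numInbound int, syncing bool) {
	if syncing {
		// Never drop while syncing; disarm both timers.
		cm.peerDropDialedTimer.Stop()
		cm.peerDropInboundTimer.Stop()
		return
	}
	if cm.maxDialPeers-numDialed <= peerDropThreshold {
		cm.peerDropDialedTimer.Schedule(cm.clock.Now().Add(peerDropInterval))
	} else {
		cm.peerDropDialedTimer.Stop()
	}
	if cm.maxInboundPeers-numInbound <= peerDropThreshold {
		cm.peerDropInboundTimer.Schedule(cm.clock.Now().Add(peerDropInterval))
	} else {
		cm.peerDropInboundTimer.Stop()
	}
}
```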
WIP: this is still WIP for a few reasons:
As of now, Geth disconnects peers only on protocol error or timeout, meaning that once the connection slots are filled, the peerset is largely fixed.
As mentioned in #31321, Geth should occasionally disconnect peers to ensure some churn. What/when to disconnect could depend on:
This PR adds very slow churn using random drops (see the sketch below).
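A minimal sketch of the idea, not the PR's actual code: the `peers`/`syncing` accessors, the `churnLoop` name, and the interval are assumptions; only `p2p.Peer.Disconnect` and `p2p.DiscTooManyPeers` come from the real API.

```go
package eth

import (
	"math/rand"
	"time"

	"github.com/ethereum/go-ethereum/p2p"
)

// Hypothetical knob; the PR defines its own name and value.
const churnInterval = 5 * time.Minute

// churnLoop drops one random peer per jittered interval to create slow,
// artificial churn. quit stops the loop.
func churnLoop(peers func() []*p2p.Peer, syncing func() bool, quit <-chan struct{}) {
	for {
		// Jitter the wait so the next drop time cannot be predicted.
		wait := churnInterval/2 + time.Duration(rand.Int63n(int64(churnInterval)))
		select {
		case <-time.After(wait):
			if syncing() {
				continue // never shrink the peerset while syncing
			}
			if list := peers(); len(list) > 0 {
				list[rand.Intn(len(list))].Disconnect(p2p.DiscTooManyPeers)
			}
		case <-quit:
			return
		}
	}
}
```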