
rotation of outbound peers#3037

Open
pompon0 wants to merge 55 commits into main from gprusak-stablehash

Conversation


@pompon0 pompon0 commented Mar 6, 2026

To make the p2p network uniformly random, nodes need to be able to change their peers periodically (otherwise the network would be centralized around the bootstrap peers). This PR implements an algorithm based on stable hashing, which makes each node assign a random priority to every node ID and connect to the peers with the highest priorities.
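To illustrate the stable-hashing idea, here is a minimal sketch. It assumes the priority is derived by hashing the local node ID together with the peer ID; the actual inputs and hash used by the PR may differ.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// dialPriority is a hypothetical sketch of the stable-hashing approach
// described above: the node derives a stable, pseudo-random priority for
// every peer ID. The inputs (localID concatenated with peerID) are an
// assumption, not the PR's actual implementation.
func dialPriority(localID, peerID string) uint64 {
	h := sha256.Sum256([]byte(localID + "/" + peerID))
	return binary.BigEndian.Uint64(h[:8])
}

func main() {
	// The priority is stable across calls, so the node can rank dial
	// candidates consistently and prefer higher-priority peers.
	fmt.Println(dialPriority("nodeA", "peerB") == dialPriority("nodeA", "peerB")) // true
}
```

Because the hash is deterministic, re-ranking the candidate set at any time yields the same ordering, which is what allows low-churn convergence toward a random graph.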

This PR also separates the inbound and outbound connection pools to simplify the logic. In particular, it is possible for two concurrent connections to exist between two peers (one inbound plus one outbound). Avoiding duplicate connections is best-effort: nodes try not to dial peers they are already connected to, but if two peers dial each other at the same time, both connections are kept.

Basic requirements:

  • the peer manager maintains a pex table: the addresses of our peers' peers. This bounds the size of the network view our node needs to maintain (even though it is an unauthenticated network) while providing enough exposure to select new peers. Peers periodically report their connections, so the peer manager keeps only fresh addresses
  • on startup a spike of dials is expected - the node will try to reconnect to the peers it was connected to before the restart
  • during stable operation the node will dial peers at a low rate, e.g. 1/s or 0.1/s, and each peer should be selected from a fresh set of addresses - i.e. we cannot snapshot the list of currently available addresses and try to dial them all (that would take hours)
  • despite the low dial rate, the node should round-robin over the ever-changing set of peer candidates - i.e. it should not get stuck dialing the same bad address over and over
  • in the stable-hash-based approach, each peer ID is assigned a dialing priority, which should be taken into account
  • the implementation should support replacing low-priority peers with higher-priority ones (to support convergence to a random graph). The churn of connections should remain low, though, so that connection efficiency is not affected. My initial guesstimate is that we should allow replacing a connection every ~1 min.
  • we need to support a pex table with ~100 * 100 = 10k addresses total (100 connections per peer is a safe estimate with the current implementation). Whether we can afford to rank all the addresses on every dial attempt is borderline, IMO.
  • addresses inserted into the pex table should become available for dialing ASAP, without any active polling, if possible.


github-actions bot commented Mar 6, 2026

The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).

Build | Format | Lint | Breaking | Updated (UTC)
✅ passed | ✅ passed | ✅ passed | ✅ passed | Mar 16, 2026, 9:13 PM

Comment on lines +28 to +37
for id := range s.last {
	if _, ok := GetAny(conns, id); !ok {
		delete(s.last, id)
		update = PeerUpdate{
			NodeID: id,
			Status: PeerStatusDown,
		}
		return true
	}
}

Check warning (Code scanning / CodeQL): Iteration over map

Iteration over map may be a possible source of non-determinism
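For context on the warning: Go randomizes map iteration order on purpose. If the flagged loop ever needs a deterministic order (e.g. reproducible peer selection), iterating over sorted keys is one common remedy. This is a generic sketch, not a claim that the PR's loop needs it; the warning may be harmless here.

```go
package main

import (
	"fmt"
	"sort"
)

// sortedKeys returns a map's keys in a fixed order, regardless of Go's
// randomized map iteration. A generic illustration only.
func sortedKeys(m map[string]bool) []string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys) // fixed order regardless of map internals
	return keys
}

func main() {
	last := map[string]bool{"c": true, "a": true, "b": true}
	fmt.Println(sortedKeys(last)) // [a b c]
}
```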
Comment on lines +66 to +70
for _, e := range t.bySender {
	if !yield(e) {
		return
	}
}

Check warning (Code scanning / CodeQL): Iteration over map

Iteration over map may be a possible source of non-determinism
Comment on lines +142 to +146
for old, priority := range p.out {
	if id, ok := low.Get(); !ok || priority < id.priority {
		low = utils.Some(pNodeID{priority, old})
	}
}

Check warning (Code scanning / CodeQL): Iteration over map

Iteration over map may be a possible source of non-determinism
regular: newPoolManager(&poolConfig{
	MaxIn:      options.maxInbound(),
	MaxOut:     options.maxOutbound(),
	FixedAddrs: bootstrapAddrs,
Contributor:

Is it possible that someone set all bootstrap addresses as persistent, so that FixedAddrs is now empty?

Contributor (Author):

Yeah, peers are either persistent or not. I can add logic to move bootstrap addresses to persistent addresses in case of overlap, to make it slightly more foolproof - wdyt?

Contributor:

As long as we can use the PEX from persistent peers in the regular pool, this is probably fine. Moving bootstrap addresses to the persistent pool is probably bad: if everyone connects to the bootstrap nodes and never disconnects, the bootstrap nodes will have all their incoming connection slots occupied and start rejecting newcomers, which is worse.
In fact, we should probably intentionally disconnect old connections on bootstrap nodes. But we can do that in the future.

@pompon0 (Contributor, Author) commented Mar 16, 2026:

To be clear, I didn't mean to make all bootstrap nodes persistent - I just meant that IF the same peer has an address in both the bootstrap list AND the persistent list, THEN both addresses could be used to establish the persistent connection to that peer (currently, the address from the bootstrap list is just discarded).

Contributor:

ah I see, that's fine then

@pompon0 pompon0 requested a review from wen-coding March 16, 2026 15:12
@pompon0 pompon0 marked this pull request as ready for review March 16, 2026 15:14
@pompon0 pompon0 requested a review from sei-will March 16, 2026 15:37