# Initial design/thoughts around tx availability #63

## Conversation
> 4. Additionally, both validators and provers need to verify the proofs that accompany the transactions; these proofs dominate the size of the transaction payload.
> 5. From a network perspective, we are satisfied with 66% of validators successfully re-executing and attesting to a block. The network only requires 1 prover to submit a proof of an epoch.
>
> When it comes to hardware and bandwidth requirements, it is desirable for people to be able to operate as validators with a single consumer-grade machine and home broadband connectivity with potentially limited upload bandwidth. Provers can be expected to have significantly more resources; these are likely to be professional/institutional organisations.
It would be good to specify the exact minimum supported up/down speeds.
> ## Proposed Solution
>
> Gossiping is already very effective at transaction propagation, and it is assumed that, with further reductions in proof size and increases in bandwidth availability, this will continue to be the case. However, for the reasons outlined above, we have to assume that there will be instances of nodes not having the transactions they require.
How "effective"?
From this doc: https://research.protocol.ai/publications/gossipsub-v1.1-evaluation-report/vyzovitis2020.pdf

> The latency distribution shows that most nodes receive messages under the 150 ms threshold in gs-v1.1, with p99 at 165 ms, while in gs-v1.0 the threshold is 200 ms with p99 at 192 ms. This is because [...]

The setup was 1k nodes: 100 nodes are active publishers, the remaining 900 are passive.
Thanks! Looks like the messages in the test were 2KB. @PhilWindle How big are our proposals?
> ### Request/Response from proposal propagator
> Upon receipt of a block proposal where the receiving peer P does not have all required transactions, P will start a request/response process but will more aggressively target the peer propagating the block proposal. That is to say, each round of peer sampling will always include the propagating peer. The block proposal will be propagated further immediately, regardless of whether P is missing transactions or not. This ensures that the proposal is not delayed.
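As a rough illustration of the sampling bias described above, here is a minimal TypeScript sketch; the names (`sampleRequestTargets`, etc.) are hypothetical, not the actual implementation:

```ts
// Hypothetical sketch: every round of request/response peer sampling
// includes the propagating peer, plus random fill from connected peers.
type PeerId = string;

function sampleRequestTargets(
  propagatingPeer: PeerId,
  connectedPeers: PeerId[],
  sampleSize: number,
): PeerId[] {
  // Always target the peer that sent us the proposal...
  const targets = new Set<PeerId>([propagatingPeer]);
  // ...then fill the rest of the round with randomly sampled peers.
  const others = connectedPeers.filter(p => p !== propagatingPeer);
  while (targets.size < sampleSize && others.length > 0) {
    const i = Math.floor(Math.random() * others.length);
    targets.add(others.splice(i, 1)[0]);
  }
  return [...targets];
}
```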
Why wouldn't we just have the proposer send the proposal directly to the validators?
Then it is much easier to mitigate DoS when doing req/resp for transactions with the proposer: the proposer only replies to peers it sent the proposal to directly.
Everyone still gossips thereafter, so you get eventual full propagation.
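A minimal sketch of that mitigation, assuming the proposer tracks its direct sends (all names here are illustrative):

```ts
// Hypothetical: the proposer remembers which peers it sent the proposal
// to directly, and only serves tx requests from those peers.
const directRecipients = new Set<string>();

function sendProposalDirectly(peerIds: string[], proposal: Uint8Array): void {
  for (const id of peerIds) {
    directRecipients.add(id);
    // transport send(id, proposal) omitted
  }
}

function shouldServeTxRequest(requesterId: string): boolean {
  // DoS mitigation: ignore req/resp from peers we never sent to.
  return directRecipients.has(requesterId);
}
```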
> Why wouldn't we just have the proposer send the proposal directly to the validators?

What if the proposer is not connected via P2P to the validators? I don't think it would be easy to guarantee this.
Committees have an entire Aztec epoch (30 minutes or so) to prepare for their active epoch. That would be plenty of time to trawl through the network and form a decent mesh.
Though yes, we couldn't rely on it entirely: it would be best effort, and some validators would still receive proposals on the slow path.
Regardless, if propagation time is truly ~200 ms at p99, then this is moot.
> Our usage of Gossipsub works such that:
>
> 1. The original publisher of a message sends to all peers. This number is configurable but likely to be no less than 50.
Nitpick: I'm not sure this is the case, but please correct me if I'm wrong; flooding is disabled by default.
I understand that publishing works similarly to forwarding: we'll send a message to direct and mesh peers. However, I don't see the direct peers being set/used in the code, so TL;DR, we'll send only to mesh peers.
Source
We currently have this switched on by default. It's configurable on the node, so it can easily be switched off. Do you see any downside to having it on? It seems like quite a good feature to kickstart propagation around the network.
> We currently have this switched on by default.

I completely missed this 🤦

> Do you see any downside to having it on?

Yes. If we implement a solution where peer P' asks for the missing transactions from peer P (the proposal sender), we could overburden it by ending up in a situation where a bunch of peers (up to maxPeerCount, which is currently 100) are asking the proposer for the missing txs simultaneously.
Furthermore, some of these peers could be physically far away from the proposer, so depending on the network, we could either waste bandwidth because they'll be receiving the proposal from other peers (which I don't think is a huge deal) or, worse, they could receive the proposal from the proposer and ask it for the transactions, effectively losing latency.
I feel relying on the mesh is a better choice, and I'd like to ask the validators to increase their mesh count (D) to 12 because it will give us better spread without losing much (if any?) latency.
Still, this is just my feeling/opinion/reasoning; I don't have any concrete numbers, experiments, or proofs. In practice, either option shouldn't make a huge difference.
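For concreteness, a sketch of how these knobs could look as js-libp2p gossipsub options (option names as in `@chainsafe/libp2p-gossipsub`; the `Dlo`/`Dhi` values are assumptions chosen to fit the proposed `D`, not taken from this discussion):

```ts
import { gossipsub } from '@chainsafe/libp2p-gossipsub';

// Sketch of the parameters discussed in this thread.
const pubsub = gossipsub({
  floodPublish: true, // publisher sends to all peers, not just its mesh
  D: 12,              // proposed larger target mesh size for validators
  Dlo: 6,             // assumed lower bound to go with D = 12
  Dhi: 16,            // assumed upper bound to go with D = 12
});
```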
> 2. All other peers propagate to a subset, around 5/6 peers.
Again, nitpick: I would say around 8 peers. This is the target mesh size, with the minimum being 4 and the maximum being 12.
Yes, I think you are right. I had the number 6 in my mind. Again, it's configurable with a default of 8.
> Targeting the propagating peer in this way ensures that the transaction 'should' propagate throughout the network, albeit at a potentially slower rate than gossiping. Every level of propagation requires a round trip of request/response.
>
> We don't wait for each level of propagation to retrieve the transactions, as doing so would delay the block proposal. It is possible (even likely) that a sufficient number of validators already have the transactions and can execute immediately.
My 2c:
If we waited to propagate the block until the node had all the necessary transactions, we'd have much better guarantees regarding their availability.
Node N knows peer P has all the transactions because it got a block proposal from peer P, so once we request missing transactions, we are sure we are going to get them.
This is in contrast to the proposed solution, where, once we request a transaction from the peer that sent us the proposal, we are only highly likely to get it.
I don't think this will have a substantial negative impact on latency, for the following reasons:
- We already expect that a sufficient number of nodes will have all transactions
- It takes approximately `log(D, node_count)` hops (log base `D` of `node_count`) to propagate a block proposal through the whole network. Given `D` (the mesh size) of 8 and `node_count` of 10k, we are looking at 5 hops at most, which is not a lot (quick check below).
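Spelling out that estimate (a quick check, not from the original thread):

```ts
// Hops for a proposal to reach ~all nodes via the mesh:
// log base D of node_count, i.e. Math.log(nodeCount) / Math.log(D).
const D = 8;
const nodeCount = 10_000;
const hops = Math.ceil(Math.log(nodeCount) / Math.log(D));
console.log(hops); // 5  (log_8(10000) ≈ 4.43)
```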
> If we waited to propagate the block until the node had all the necessary transactions,

After internal discussion, we agreed it would be better not to wait on this, as it is hard to argue the benefits of doing so.
So my comment above is not relevant anymore :)
> ### Request/Response directly from proposer
>
> Performing request/response more aggressively from the propagator should further increase the likelihood of transactions being generally available to all peers. However, if this is found to still be insufficient, then we propose a method of requesting directly from the proposer. This would only be an option for committee members.
I agree we should ask the propagator for the transactions. If we implement the requirement that the propagator must have all the transactions before broadcasting the proposal, I don't think the following is necessary:

> However, if this is found to still be insufficient then we propose a method of requesting directly from the proposer. This would only be an option for committee members.
> Validators would be required to post their Ethereum public key as part ...

Simplifying the protocol :)
> 1. Run multiple full nodes in addition to the prover node.
> 2. Configure every node with a very large transaction pool size, significantly reducing the transaction eviction rate.
> 3. Configure every node to have a high peer count, increasing the likelihood of request/response success.
We could increase the mesh size to D = 12 to be more aggressive.
AFAIK, the values above produce no observable improvements.
Linking here some general improvements to reqresp discussed with @Maddiaa0, for the sake of completeness: AztecProtocol/aztec-packages#14354