# Initial design/thoughts around tx availability #63

## Conversation
> 4. Additionally, both validators and provers need to verify the proofs that accompany the transactions; these proofs dominate the size of the transaction payload.
> 5. From a network perspective, we are satisfied with 66% of validators successfully re-executing and attesting to a block. The network only requires 1 prover to submit a proof of an epoch.
>
> When it comes to hardware and bandwidth requirements, it is desirable for people to be able to operate as validators with a single consumer-grade machine and home broadband connectivity with potentially limited upload bandwidth. Provers can be expected to have significantly more resources; these are likely to be professional/institutional organisations.
It would be good to specify the exact minimum supported up/down speeds.
> ## Proposed Solution
>
> Gossiping is already very effective at transaction propagation, and it is assumed that, with further reductions in proof size and increases in bandwidth availability, this will continue to be the case. However, for the reasons outlined above, we have to assume that there will be instances of nodes not having the transactions they require.
How "effective"?
From this doc: https://research.protocol.ai/publications/gossipsub-v1.1-evaluation-report/vyzovitis2020.pdf

> The latency distribution shows that most nodes receive messages under the 150 ms threshold in gs-v1.1, with p99 at 165 ms, while in gs-v1.0 the threshold is 200 ms with p99 at 192 ms. This is because [...]

The setup was 1k nodes: 100 nodes are active publishers, the remaining 900 are passive.
Thanks! Looks like the messages in the test were 2KB. @PhilWindle How big are our proposals?
> ### Request/Response from proposal propagator
> Upon receipt of a block proposal where the receiving peer P does not have all required transactions, P will start a request/response process but will more aggressively target the peer propagating the block proposal. That is to say, each round of peer sampling will always include the propagating peer. The block proposal will be propagated further immediately, regardless of whether P is missing transactions or not. This ensures that the proposal is not delayed.
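As a rough illustration of the sampling bias described above, here is a minimal TypeScript sketch; the names (`sampleRequestTargets`, etc.) are hypothetical, not the actual implementation:

```ts
// Hypothetical sketch: every round of request/response peer sampling
// includes the propagating peer, plus random fill from connected peers.
type PeerId = string;

function sampleRequestTargets(
  propagatingPeer: PeerId,
  connectedPeers: PeerId[],
  sampleSize: number,
): PeerId[] {
  // Always target the peer that sent us the proposal...
  const targets = new Set<PeerId>([propagatingPeer]);
  // ...then fill the rest of the round with randomly sampled peers.
  const others = connectedPeers.filter(p => p !== propagatingPeer);
  while (targets.size < sampleSize && others.length > 0) {
    const i = Math.floor(Math.random() * others.length);
    targets.add(others.splice(i, 1)[0]);
  }
  return [...targets];
}
```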
Why wouldn't we just have the proposer send the proposal directly to the validators?
Then it is much easier to mitigate DoS when doing req/resp for transactions with the proposer: the proposer only replies to peers it sent the proposal to directly.
Everyone still gossips thereafter, so you get eventual full propagation.
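A minimal sketch of that mitigation, assuming the proposer tracks its direct sends (all names here are illustrative):

```ts
// Hypothetical: the proposer remembers which peers it sent the proposal
// to directly, and only serves tx requests from those peers.
const directRecipients = new Set<string>();

function sendProposalDirectly(peerIds: string[], proposal: Uint8Array): void {
  for (const id of peerIds) {
    directRecipients.add(id);
    // transport send(id, proposal) omitted
  }
}

function shouldServeTxRequest(requesterId: string): boolean {
  // DoS mitigation: ignore req/resp from peers we never sent to.
  return directRecipients.has(requesterId);
}
```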
> Why wouldn't we just have the proposer send the proposal directly to the validators?

What if the proposer is not connected via P2P to the validators? I don't think it would be easy to guarantee this.
Committees have an entire Aztec epoch (30 minutes or so) to prepare for their active epoch. That would be plenty of time to trawl through the network and form a decent mesh.
Though yes, we couldn't rely on it entirely: it would be best effort, and some validators would still receive proposals on the slow path.
Regardless, if propagation time is truly ~200 ms at p99, then this is moot.
> Our usage of Gossipsub works such that:
>
> 1. The original publisher of a message sends to all peers. This number is configurable but likely to be no less than 50.
Nitpick: I'm not sure this is the case, but please correct me if I'm wrong; flooding is disabled by default.
I understand that publishing works similarly to forwarding: we'll send a message to direct and mesh peers. However, I don't see the direct peers being set/used in the code, so TL;DR, we'll send only to mesh peers.
Source
We currently have this switched on by default. It's configurable on the node, so it can easily be switched off. Do you see any downside to having it on? It seems like quite a good feature to kickstart propagation around the network.
> We currently have this switched on by default.

I completely missed this 🤦

> Do you see any downside to having it on?

Yes. If we implement a solution where peer P' asks for the missing transactions from peer P (the proposal sender), we could overburden it by ending up in a situation where a bunch of peers (up to maxPeerCount, which is currently 100) are asking the proposer for the missing txs simultaneously.
Furthermore, some of these peers could be physically far away from the proposer, so depending on the network, we could either waste bandwidth because they'll be receiving the proposal from other peers (which I don't think is a huge deal) or, worse, they could receive the proposal from the proposer and ask it for the transactions, effectively losing latency.
I feel relying on the mesh is a better choice, and I'd like to ask the validators to increase their mesh count (D) to 12 because it will give us better spread without losing much (if any?) latency.
Still, this is just my feeling/opinion/reasoning; I don't have any concrete numbers, experiments, or proofs. In practice, either option shouldn't make a huge difference.
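For concreteness, a sketch of how these knobs could look as js-libp2p gossipsub options (option names as in `@chainsafe/libp2p-gossipsub`; the `Dlo`/`Dhi` values are assumptions chosen to fit the proposed `D`, not taken from this discussion):

```ts
import { gossipsub } from '@chainsafe/libp2p-gossipsub';

// Sketch of the parameters discussed in this thread.
const pubsub = gossipsub({
  floodPublish: true, // publisher sends to all peers, not just its mesh
  D: 12,              // proposed larger target mesh size for validators
  Dlo: 6,             // assumed lower bound to go with D = 12
  Dhi: 16,            // assumed upper bound to go with D = 12
});
```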
> 2. All other peers propagate to a subset, around 5/6 peers.
Again, nitpick: I would say around 8 peers. This is the target mesh size, with the minimum being 4 and the maximum being 12.
Yes, I think you are right. I had the number 6 in my mind. Again, it's configurable with a default of 8.
> Targeting the propagating peer in this way ensures that the transaction 'should' propagate throughout the network, albeit at a potentially slower rate than gossiping. Every level of propagation requires a round trip of request/response.
>
> We don't wait for each level of propagation to retrieve the transactions, as doing so would delay the block proposal. It is possible (even likely) that a sufficient number of validators already have the transactions and can execute immediately.
My 2c:
If we waited to propagate the block until the node had all the necessary transactions, we'd have much better guarantees regarding their availability.
Node N knows peer P has all the transactions because it got a block proposal from peer P, so once we request missing transactions, we are sure we are going to get them.
This is in contrast to the proposed solution, where, once we request a transaction from the peer that sent us the proposal, we are only highly likely to get it.
I don't think this will have a substantial negative impact on latency, for the following reasons:
- We already expect that a sufficient number of nodes will have all transactions
- It takes approximately `log(D, node_count)` hops (log base `D` of `node_count`) to propagate a block proposal through the whole network. Given `D` (the mesh size) of 8 and `node_count` of 10k, we are looking at 5 hops at most, which is not a lot (quick check below).
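Spelling out that estimate (a quick check, not from the original thread):

```ts
// Hops for a proposal to reach ~all nodes via the mesh:
// log base D of node_count, i.e. Math.log(nodeCount) / Math.log(D).
const D = 8;
const nodeCount = 10_000;
const hops = Math.ceil(Math.log(nodeCount) / Math.log(D));
console.log(hops); // 5  (log_8(10000) ≈ 4.43)
```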
> If we waited to propagate the block until the node had all the necessary transactions,

After internal discussion, we agreed it would be better not to wait on this, as it is hard to argue the benefits of doing so.
So my comment above is not relevant anymore :)
> ### Request/Response directly from proposer
>
> Performing request/response more aggressively from the propagator should further increase the likelihood of transactions being generally available to all peers. However, if this is found to still be insufficient, then we propose a method of requesting directly from the proposer. This would only be an option for committee members.
I agree we should ask the propagator for the transactions. If we implement the requirement that the propagator must have all the transactions before broadcasting the proposal, I don't think the following is necessary:

> However, if this is found to still be insufficient then we propose a method of requesting directly from the proposer. This would only be an option for committee members.
> Validators would be required to post their Ethereum public key as part ...

Simplifying the protocol :)
> 1. Run multiple full nodes in addition to the prover node.
> 2. Configure every node with a very large transaction pool size, significantly reducing the transaction eviction rate.
> 3. Configure every node to have a high peer count, increasing the likelihood of request/response success.
We could increase the mesh size to D = 12 to be more aggressive.
AFAIK, the values above produce no observable improvements.
Linking here some general improvements to reqresp discussed with @Maddiaa0, for the sake of completeness: AztecProtocol/aztec-packages#14354