Rust simulation ends prematurely with Error: Missing input block
#255
Comments
Looking into it, this can happen when there's enough traffic in the system that
I think the issue will go away if you switch
What should the protocol do when a node has an endorsed certificate, but not all of the IBs it references? Should it just not attach a cert to that RB?
Probably an EB shouldn't be considered endorsed until the node has seen all of the IBs it references?
Oh, that makes sense; I threw that in.
Possession of the IB body should only be relevant for voting for the EB that references it, and for reconstructing the ledger state of an RB that references the EB? I think you should be able to "trust" certificates on their own while generating RBs, the same as when validating them.
I share @Saizan's understanding of the protocol. Isn't the certificate proof that the majority of stakeholders have seen & validated these IBs?
A certificate ensures that at least one honest node has seen & validated the referenced IBs.
Here things start to get tricky. E.g., in one of the ledger design proposals (reward accounts), the RB producer should include txs that satisfy some property w.r.t. the IBs referenced by the EB in the RB. Assuming our network assumptions hold, Short Leios should guarantee that all IBs inside certified EBs are delivered by the end of the respective deliver 2 phase. Given that this may not be the case currently, as we are still exploring the networking part of the protocol, I would suggest not including in an RB any EB with a missing IB. Instead, include an older EB, or no EB if there is no valid EB satisfying the criteria outlined.
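To make the suggestion above concrete, here is a minimal sketch of how an RB producer might skip EBs with missing IBs. The types `EndorserBlock` and `IbId` and the function `select_eb` are hypothetical names for illustration, not the simulation's actual API. The candidate list is assumed to be already sorted in whatever order the node prefers.

```rust
use std::collections::HashSet;

/// Hypothetical identifier type; the real simulation's types may differ.
type IbId = u64;

/// Hypothetical, simplified view of an endorser block.
#[derive(Debug, Clone, PartialEq)]
struct EndorserBlock {
    slot: u64,
    /// True once the EB has a certificate.
    certified: bool,
    /// IBs referenced by this EB.
    ibs: Vec<IbId>,
}

/// Return the first candidate (in the caller's preferred order) that is
/// certified and whose referenced IBs are all held locally.
/// `None` means the RB should be produced without an EB.
fn select_eb<'a>(
    candidates: &'a [EndorserBlock],
    known_ibs: &HashSet<IbId>,
) -> Option<&'a EndorserBlock> {
    candidates
        .iter()
        .find(|eb| eb.certified && eb.ibs.iter().all(|ib| known_ibs.contains(ib)))
}
```

With this shape, a missing IB is no longer an error condition: the producer falls through to the next candidate, or to no EB at all.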
I think that the freshest first strategy for IBs, combined with the oldest-first strategy when choosing an endorsed EB, is what leads to this. FF means that with a constant stream of newer IBs, the node will never prioritize downloading older ones. Oldest-EB-first means that these older IBs are needed more urgently than newer IBs.
This assumption will be broken if IB generation is fast enough to overwhelm the network. And the speed/resilience of the network isn't under our control IRL, so it makes sense to define the behavior if it does. The sims are using 1MB/sec bandwidth on each link, and generating ~10 IBs per second where each IB body is 150Kb. I think it's correct for a node's set of IBs to "fall behind" when it's fetching them from a single peer, and we generate more bytes of IB than fit in the link with that peer.
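A quick back-of-the-envelope check of the figures above, as a sketch. It assumes "150Kb" means 150 kilobytes and that units are decimal (1 MB = 1,000,000 bytes); both are assumptions, and the constant and function names are made up for illustration.

```rust
// Figures quoted in the thread, under the unit assumptions stated above.
const LINK_BYTES_PER_SEC: u64 = 1_000_000; // 1 MB/sec per link
const IBS_PER_SEC: u64 = 10; // ~10 IBs generated per second
const IB_BODY_BYTES: u64 = 150_000; // 150 KB per IB body (assumed kilobytes)

/// Bytes of IB bodies generated per second across the network.
fn offered_load(ibs_per_sec: u64, ib_body_bytes: u64) -> u64 {
    ibs_per_sec * ib_body_bytes
}

/// Bytes per second by which a single-link download queue grows
/// (zero when the link can keep up).
fn backlog_growth(offered: u64, link_capacity: u64) -> u64 {
    offered.saturating_sub(link_capacity)
}
```

Under these assumptions the offered load is 1.5 MB/sec against a 1 MB/sec link, so a node pulling every IB body from one peer falls roughly 500 KB further behind each second.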
Should it not be fetching them from more than one peer? Haskell sim interprets I don't disagree IB diffusion can fall behind in general.
The Rust sim interprets it as requesting the body from the first peer which announces it (and RequestFromAll as requesting from each peer which announces it). I could make that more sophisticated: request from the first peer which announces it and has capacity to send it. Not sure if that's what Haskell is doing.
Perhaps we need an "endorsed-EB-then-freshest-first" strategy for IBs; i.e. first download IBs for EBs that have reached the vote threshold (in anticipation of including them in an RB), and then fetch freshest first.
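As a sketch of what such an ordering could look like (not a recommendation, and not code from either simulation): `PendingIb`, `IbId`, and `download_order` are hypothetical names, and ordering the endorsed group freshest-first as well is an arbitrary choice made here for simplicity.

```rust
use std::cmp::Reverse;
use std::collections::HashSet;

/// Hypothetical identifier type for illustration.
type IbId = u64;

/// Hypothetical record of an announced IB whose body is not yet downloaded.
#[derive(Debug, Clone, PartialEq)]
struct PendingIb {
    id: IbId,
    slot: u64,
}

/// Order pending downloads: IBs referenced by EBs that reached the vote
/// threshold come first, then everything else; within each group,
/// higher (fresher) slots come first.
fn download_order(
    mut pending: Vec<PendingIb>,
    needed_by_endorsed: &HashSet<IbId>,
) -> Vec<PendingIb> {
    // `false` sorts before `true`, so IBs needed by endorsed EBs lead.
    pending.sort_by_key(|ib| (!needed_by_endorsed.contains(&ib.id), Reverse(ib.slot)));
    pending
}
```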
Changing the IB download strategy may introduce security issues and should not be done lightly. The idea is that the IB rate is such that, with the freshest-first policy, IBs are delivered within some known window (most of the time) if released on time. If that is not the case, we should rethink the IB rate. Of course, since for probabilistic reasons this may sometimes not happen, we should consider how the node should handle this situation.
The Haskell node does not evaluate the state of the sender. Since a node has different threads consuming IBs (or votes or EBs) from different peers, those threads simply race to be the first to request a particular body (there is no global view of the received announcements either; each thread only knows about its own peer). As a thread requests a body, it signals to the others not to. Out of a 200s sim I could collect download counts like so
Each row is a different node; each element in the array is how many bodies the node got from the peer with that index.
On closer inspection, that matches what the Rust sim is doing. Each node tracks which IBs a given peer has announced, requests one IB at a time from each peer, and doesn't request the same IB from two peers at once.
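The bookkeeping described above can be sketched roughly as follows. This is an illustration under assumptions, not the simulation's actual code: `IbRequests`, `PeerId`, `IbId`, and the method names are hypothetical, and picking the earliest unclaimed announcement is an arbitrary ordering choice.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical identifier types for illustration.
type PeerId = usize;
type IbId = u64;

#[derive(Default)]
struct IbRequests {
    /// IBs each peer has announced and that were not yet requested from it.
    announced: HashMap<PeerId, Vec<IbId>>,
    /// At most one outstanding request per peer.
    in_flight: HashMap<PeerId, IbId>,
    /// IBs already requested from some peer (or already received).
    claimed: HashSet<IbId>,
}

impl IbRequests {
    /// Record an announcement; returns the IB to request from `peer`, if any.
    fn on_announce(&mut self, peer: PeerId, ib: IbId) -> Option<IbId> {
        self.announced.entry(peer).or_default().push(ib);
        self.next_request(peer)
    }

    /// Called when the body requested from `peer` arrives; frees the peer
    /// and returns the next IB to request from it, if any.
    fn on_received(&mut self, peer: PeerId) -> Option<IbId> {
        self.in_flight.remove(&peer);
        self.next_request(peer)
    }

    fn next_request(&mut self, peer: PeerId) -> Option<IbId> {
        // One IB at a time per peer.
        if self.in_flight.contains_key(&peer) {
            return None;
        }
        let claimed = &self.claimed;
        let queue = self.announced.get_mut(&peer)?;
        // Never request an IB that another peer is already serving.
        queue.retain(|ib| !claimed.contains(ib));
        if queue.is_empty() {
            return None;
        }
        let ib = queue.remove(0);
        self.claimed.insert(ib);
        self.in_flight.insert(peer, ib);
        Some(ib)
    }
}
```

One consequence visible in this shape: an IB claimed from a slow or saturated peer is not re-requested elsewhere, so a single congested link can delay that IB even when other peers have it.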
I haven't encountered this in tag
I've fixed the error, but still want to understand more why IBs aren't propagating quickly enough. I suspect the root cause is going to be similar to another open issue, and I'll close it when I can narrow it down to one of those.
I have two Rust simulations that end prematurely: for example, one ends at Slot 422 with the following message:
Steps to reproduce