Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow Disconnects/Reconnects from TestComm #87

Merged
merged 3 commits into from
Feb 24, 2025
Merged

Allow Disconnects/Reconnects from TestComm #87

merged 3 commits into from
Feb 24, 2025

Conversation

samliok
Copy link
Collaborator

@samliok samliok commented Feb 17, 2025

No description provided.

@samliok samliok self-assigned this Feb 17, 2025
@samliok samliok linked an issue Feb 17, 2025 that may be closed by this pull request
@samliok samliok marked this pull request as draft February 17, 2025 16:36
@samliok samliok marked this pull request as ready for review February 19, 2025 17:47
nodes := []simplex.NodeID{{1}, {2}, {3}, []byte("lagging")}
net := newInMemNetwork(t, nodes)
startDisconnect := uint64(5)
endDisconnect := uint64(17) // TODO: create test where the lagging node is the leader after replication
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alternatively we can parametrize the startDisconnect and endDisconnect indices, and run through them in parallel. We don't need that much, I think a startDisconnect that starts from 0 and can go up to 5, and an endDisconnect that starts from, say, 10, and goes up to 20 should be sufficient. Overall 50 tests that can run in parallel.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure if this would work for certain cases.

require.False(t, isLaggingNodeLeader, "the lagging node should not be the leader")

We require this because after the node re-connects it needs to be sent an fcert to kick start replication. I can still create the tests but we'd need to skip when endDisconect+1 is the lagging node. Should I still do it?

missedSeqs := uint64(0)
// normal nodes continue to make progress
for i := startDisconnect; i < endDisconnect; i++ {
bb.triggerNewBlock()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we trigger new block when we have a empty round? I don't think we should do that. I know it's a no-op if the control buffer is full, but we're needlessly filling up the control buffer.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need to trigger this so that the node that is disconnected doesn't hang

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hang on what? We wait for a new block without the lock so we are open for new messages.

Can you elaborate?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah nevermind, fixed

@yacovm yacovm merged commit 5da298b into main Feb 24, 2025
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow Disconnects/Reconnects from TestComm
2 participants