RUST-1842 Update prose tests for mongos deprioritization during retryable ops #1397
Conversation
Can you add back in the necessary logs showing that the server was properly deprioritized in the server selection code? We can leave those in until all the code changes are approved, and then you can remove them as the last step before merging.
let mut guards = Vec::new();
for address in hosts {
For future reference, can you add a comment here explaining why we set the failpoints this way rather than with separate clients? and ditto elsewhere
Done! let me know if you have any suggestions on the explanation!
Some of these details aren't quite accurate - the important distinction to note is that we're using the same client to set the failpoints on each mongos as we are for the `find` operation. The fundamental problem we were encountering was a race between server discovery, which happens in the background after a client is created, and the server selection process for `find`, which was previously happening right after creating the client. Server discovery goes roughly as follows:

- `client` gets created with two mongos addresses (`localhost:27017` and `localhost:27018`) and stores each of these in its topology with an initial server type of `Unknown`. (`Unknown` servers are not eligible to be selected for operations.)
- `client` sends a `hello` message to each mongos and waits for a reply
- each mongos replies to the `hello` message with information about itself, and `client` uses this information to update its server type from `Unknown` to `Mongos`

Executing an operation (in this case, `enable_fail_point`) on each individual mongos forces the client to complete its discovery of that mongos and select it for the operation. This means that once we get to the `find` operation, `client` has a list of two `Mongos` servers to select from. By contrast, when we were creating a new client for each call to `enable_fail_point` and then the `find` operation, each of those clients was restarting the server discovery process from scratch.
The details here can be a little tricky to understand, so let me know if you have any questions about this and we can walk through it in more detail!
Thanks so much for the detailed explanation, Isabel! I hadn’t fully understood how server discovery works in the background or how using separate clients was restarting that process. I also realize now that some of my original terminology wasn’t quite accurate (e.g., implying it was about a single mongos instead of the client's discovery state), so I appreciate the correction.
I’ve updated the comment to reflect that. Let me know if it looks good now or if I should tweak anything further - would be happy to chat about it more if my understanding is still off!
looks great! thanks for making those changes.
lgtm! you can push a change to remove the logs and then I'll reapprove to merge.
`retryable_reads` and `retryable_writes`.