release-25.2: fix spurious LDR DLQ entries #147350
This reworks the crud table handler's exhaustive test so that it runs against the batch handler interface. That allows it to test all of the LDR handler implementations. The test was extended to metamorphically define a unique index/constraint and swap the primary key column ID. This allows the test to reproduce cockroachdb#145994.

Informs: cockroachdb#145994

Release note: none
DistSender splits requests up in range order and collects errors in the order requests were dispatched. Previously, if there were multiple errors, the highest priority error was returned and ties were broken by dispatch order. Now, ties are broken by the index of the request that generated the error, so the error from the earliest request in the batch wins. This is particularly useful for CPuts, as it allows the caller to observe the first CPut in the batch that failed.

The deterministic order allows LDR, `deleteSwap`, and `updateSwap` to guarantee that the CPut error is for the primary key. This property is essential because they use CPut to optimistically guess at the value of the row in the database. The batches may also contain CPut failures for unique indexes, but those errors should only be raised to the user if the primary key CPut succeeded.

This is still a bit of a hack. In a perfect world, a CPut failure would be treated as a result instead of an error. That would allow the caller to inspect each CPut result independently and would make it clear that CPut failures do not poison any of the other requests in the batch.

Informs: cockroachdb#146117

Release note: Fixes an issue with LDR where the presence of a unique index could cause spurious DLQ entries if the unique index has a smaller index ID than the primary key index.
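The tie-breaking change can be sketched as two merge functions folded over errors in dispatch order. This is a simplified model under stated assumptions, not the DistSender code: `batchErr`, its integer `priority`, `mergeByDispatch`, `mergeByIndex`, and `mergeAll` are all hypothetical names. The example assumes the unique index key sorts before the primary key (smaller index ID) and so is dispatched first, even though the primary key CPut is request 0.

```go
package main

import "fmt"

// batchErr is a hypothetical stand-in for an error from one per-range sub-batch.
type batchErr struct {
	priority int    // higher value wins (illustrative scale)
	index    int    // position of the failing request in the original batch
	msg      string // human-readable description
}

// mergeFunc picks which of two errors to keep; cur is the error kept so far.
type mergeFunc func(cur, next *batchErr) *batchErr

// mergeByDispatch models the old behavior: on a priority tie, keep the error
// that was collected first, i.e. the one from the earliest-dispatched range.
func mergeByDispatch(cur, next *batchErr) *batchErr {
	if cur == nil || (next != nil && next.priority > cur.priority) {
		return next
	}
	return cur
}

// mergeByIndex models the new behavior: on a priority tie, keep the error
// whose request has the lowest index in the original batch.
func mergeByIndex(cur, next *batchErr) *batchErr {
	if cur == nil {
		return next
	}
	if next == nil {
		return cur
	}
	if next.priority != cur.priority {
		if next.priority > cur.priority {
			return next
		}
		return cur
	}
	if next.index < cur.index {
		return next
	}
	return cur
}

// mergeAll folds errors in the order they were collected (dispatch order).
func mergeAll(errs []*batchErr, merge mergeFunc) *batchErr {
	var merged *batchErr
	for _, e := range errs {
		merged = merge(merged, e)
	}
	return merged
}

func main() {
	// Collected in dispatch order: the unique index range comes back first.
	errs := []*batchErr{
		{priority: 1, index: 1, msg: "unique index cput failed"},
		{priority: 1, index: 0, msg: "primary key cput failed"},
	}
	fmt.Println("old:", mergeAll(errs, mergeByDispatch).msg) // old: unique index cput failed
	fmt.Println("new:", mergeAll(errs, mergeByIndex).msg)    // new: primary key cput failed
}
```

Under the old rule the caller sees the unique index failure and cannot tell that its primary key guess was also wrong; under the new rule the primary key failure wins the tie regardless of which range answered first.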
Thanks for opening a backport. Please check the backport criteria before merging:
If your backport adds new functionality, please ensure that the following additional criteria are satisfied:
Also, please add a brief release justification to the body of your PR to justify this backport.
@arulajmani I added you as a TL reviewer because the core fix is a change to the DistSender.
Reviewed 3 of 3 files at r1, 3 of 3 files at r2, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @msbutler)
pkg/kv/kvclient/kvcoord/dist_sender.go
line 1684 at r2 (raw file):
// priority, the error with the lowest request index is preferred. This allows
// a caller issuing multiple cputs to control which ConditionFailedError is
// returned. Specifically, it allows returning a primary key cput failure
Is this comment incomplete? In particular, doesn't it rely on the primary key CPut being earlier in the batch than a unique index CPut? It doesn't need to hold this backport, or be backported, but could we improve this comment on master?
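The reviewer's point can be made concrete with a caller-side sketch: the "lowest index wins" rule only identifies the primary key failure if the caller puts the primary key CPut first. All names here (`cput`, `condFailed`, `buildBatch`, `classify`, and the outcome values) are hypothetical, and mapping a primary key failure to "re-read and retry" rather than a DLQ entry is an assumption about how an optimistic caller reacts to a stale guess.

```go
package main

import "fmt"

// cput is a hypothetical conditional put: write value at key only if the
// current value equals expected.
type cput struct {
	key      string
	expected string
	value    string
}

// condFailed is a hypothetical ConditionFailedError carrying the index of
// the request that produced it.
type condFailed struct {
	index int
}

type outcome int

const (
	applied         outcome = iota // every CPut matched
	refreshAndRetry                // primary key guess was stale: re-read the row (assumption)
	uniqueConflict                 // primary key matched, a unique index entry did not
)

// buildBatch places the primary key CPut at index 0 so that, with ties broken
// by lowest request index, a primary key failure beats any unique index
// failure of the same priority.
func buildBatch(pk cput, unique []cput) []cput {
	batch := make([]cput, 0, 1+len(unique))
	batch = append(batch, pk)
	return append(batch, unique...)
}

// classify interprets the single error returned for the whole batch. It is
// only sound because buildBatch put the primary key CPut first.
func classify(err *condFailed) outcome {
	switch {
	case err == nil:
		return applied
	case err.index == 0:
		return refreshAndRetry
	default:
		return uniqueConflict
	}
}

func main() {
	batch := buildBatch(
		cput{key: "/t/primary/1", expected: "old", value: "new"},
		[]cput{{key: "/t/unique_idx/a", expected: "", value: "1"}},
	)
	fmt.Println(batch[0].key)                                  // /t/primary/1
	fmt.Println(classify(&condFailed{index: 0}) == refreshAndRetry) // true
	fmt.Println(classify(&condFailed{index: 1}) == uniqueConflict)  // true
}
```

If `buildBatch` appended the unique index CPuts first, `classify` would misread a stale primary key guess as a genuine unique conflict, which is the kind of misattribution that can surface as a spurious DLQ entry.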
Here is a PR on the master branch clarifying the doc comment: #147493

Thanks for the reviews!
Backport:
Please see individual PRs for details.
Release justification: fixes #145994 which was discovered in customer environments.
/cc @cockroachdb/release