fix: track merge key in transaction for concurrent merge_insert conflict detection #6051
Draft
ozzieba wants to merge 1 commit into lance-format:main from
…ict detection

Add `merge_key_field_ids` to the Update operation in the transaction proto so conflict resolution can detect incompatible concurrent merge inserts.

- Always include bloom filter for inserted rows regardless of PK metadata
- Different merge keys (ON columns) are treated as conflicts
- Asymmetric bloom filter pairs (Some, None) are treated as conflicts
- Backward compatible: empty merge_key_field_ids for non-merge updates

Refs: lancedb/lancedb#2463, lance-format#4585, lance-format#6018

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
Fixes lancedb/lancedb#2463, #4585
Supersedes #6018 (addresses reviewer feedback from @jackye1995)
Concurrent `merge_insert` operations that insert the same new key silently produce duplicate rows when the schema lacks `unenforced-primary-key` metadata.
Per review feedback, the right fix is to track the merge key in the transaction model rather than relying solely on the presence of a bloom filter.
Changes
Spec change: `transaction.proto`
- Add `repeated int32 merge_key_field_ids = 9` to the `Update` message. When non-empty, this indicates the transaction is a merge insert and records which columns were used as the merge key (the ON columns). This enables conflict resolution to detect incompatible concurrent merge inserts even before checking bloom filters.
- Update `KeyExistenceFilter` comments to remove the requirement that field IDs must match an unenforced primary key; they now represent the merge key.
Conflict resolution
- Different merge keys → conflict: If two concurrent merge inserts use different ON columns (e.g., one merges on `id`, another on `name`), their bloom filters are incompatible and cannot be compared. This is now detected via `merge_key_field_ids` and treated as a retryable conflict.
- Same merge key → bloom filter check: If the merge keys match, the existing bloom filter intersection check determines whether the inserted rows overlap.
- Asymmetric bloom filters → conflict: `(Some, None)` and `(None, Some)` are both conservatively treated as conflicts (fixes the original bug where `(None, Some)` fell through silently).
- Backward compatible: Empty `merge_key_field_ids` means "not a merge insert"; the existing `(None, None)` fall-through is preserved for older transactions and regular updates.
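
The four rules above can be sketched as a single decision function. This is a minimal illustration and not the code in this PR: `UpdateTxn`, `Outcome`, `check_conflict`, and `txn` are hypothetical names, a `HashSet<u64>` of key hashes stands in for the real bloom filter, and merge keys are compared as ordered lists of field IDs (an assumption about how equality is defined).

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for the bloom filter over inserted merge-key values.
type KeyFilter = HashSet<u64>;

#[derive(Debug, PartialEq, Eq)]
enum Outcome {
    /// No conflict found by these rules; other checks may still apply.
    Compatible,
    /// The committing transaction should be retried.
    RetryableConflict,
}

/// Hypothetical, simplified view of an Update transaction.
struct UpdateTxn {
    /// Empty means "not a merge insert" (older writers, regular updates).
    merge_key_field_ids: Vec<i32>,
    /// Filter over the keys of inserted rows, if one was recorded.
    inserted_rows_filter: Option<KeyFilter>,
}

fn check_conflict(ours: &UpdateTxn, theirs: &UpdateTxn) -> Outcome {
    let both_merge =
        !ours.merge_key_field_ids.is_empty() && !theirs.merge_key_field_ids.is_empty();

    // Rule 1: different ON columns make the filters incomparable.
    if both_merge && ours.merge_key_field_ids != theirs.merge_key_field_ids {
        return Outcome::RetryableConflict;
    }

    match (&ours.inserted_rows_filter, &theirs.inserted_rows_filter) {
        // Rule 2: same merge key, so check whether inserted keys overlap.
        (Some(a), Some(b)) => {
            if a.is_disjoint(b) {
                Outcome::Compatible
            } else {
                Outcome::RetryableConflict
            }
        }
        // Rule 3: asymmetric filters are conservatively a conflict.
        (Some(_), None) | (None, Some(_)) => Outcome::RetryableConflict,
        // Rule 4: preserved fall-through for older transactions and regular updates.
        (None, None) => Outcome::Compatible,
    }
}

/// Hypothetical helper for building example transactions.
fn txn(merge_key_field_ids: &[i32], keys: Option<Vec<u64>>) -> UpdateTxn {
    UpdateTxn {
        merge_key_field_ids: merge_key_field_ids.to_vec(),
        inserted_rows_filter: keys.map(|k| k.into_iter().collect()),
    }
}
```

For example, `check_conflict(&txn(&[1], Some(vec![10])), &txn(&[2], Some(vec![20])))` yields `RetryableConflict` even though the inserted keys do not overlap, because the two transactions merged on different columns.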
Always emit bloom filter
The bloom filter is now always included for merge insert operations, regardless of whether the schema has `unenforced-primary-key` metadata. The `is_primary_key` gate has been removed.
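
To illustrate the effect of removing the gate, here is a small before/after sketch. It is an assumption-laden model and not the PR's code: `filter_before`, `filter_after`, and `hash_key` are hypothetical names, string keys stand in for merge-key values, and a `HashSet<u64>` of hashes stands in for the real bloom filter.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for the bloom filter over inserted merge-key values.
type KeyFilter = HashSet<u64>;

/// Hash one merge-key value (illustrative; the real filter hashes differently).
fn hash_key(key: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    hasher.finish()
}

/// Gated behavior as described above: no filter unless the schema
/// carried primary-key metadata.
fn filter_before(inserted_keys: &[&str], is_primary_key: bool) -> Option<KeyFilter> {
    if !is_primary_key {
        return None;
    }
    Some(inserted_keys.iter().map(|k| hash_key(k)).collect())
}

/// Ungated behavior: a merge insert always records a filter for its
/// inserted rows, so concurrent writers always have something to compare.
fn filter_after(inserted_keys: &[&str]) -> Option<KeyFilter> {
    Some(inserted_keys.iter().map(|k| hash_key(k)).collect())
}
```

With the gate in place, two writers on a schema without the metadata would both record no filter and fall through the `(None, None)` case; without it, both record a filter and the intersection check can run.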
Note on spec change process
Per @jackye1995's comment, this adds a new field to `transaction.proto`, which is a spec change. Happy to create a separate discussion for a community vote if needed (similar to #5485).
The change is backward compatible: older writers produce empty `merge_key_field_ids`, which is handled correctly by the new conflict resolver.
Test plan
`cargo clippy` clean, `cargo fmt` clean