spec: add merge_key_field_ids to Update transaction message #6052
Draft
ozzieba wants to merge 1 commit into lance-format:main
Add `repeated int32 merge_key_field_ids = 9` to the `Update` message in `transaction.proto`. This field records which columns were used as the merge key (the ON columns) in a merge insert operation, enabling conflict resolution to detect incompatible concurrent merge inserts that use different merge keys.

Backward compatible: empty for non-merge-insert updates and older writers.

Refs: lancedb/lancedb#2463, lance-format#4585, lance-format#6018

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
ACTION NEEDED: The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification. For details on the error, please inspect the "PR Title Check" action.
Summary
Refs: lancedb/lancedb#2463, #4585, #6018
Add `repeated int32 merge_key_field_ids = 9` to the `Update` message in `transaction.proto`, per @jackye1995's feedback that the merge key should be tracked in the transaction model.
Motivation
Concurrent `merge_insert` operations can silently produce duplicate rows when the schema lacks `unenforced-primary-key` metadata (#4585). To fix this properly, conflict resolution needs to know which columns were used as the merge key (the ON columns), so it can:
- detect concurrent merge inserts that use different merge keys (incompatible bloom filters; must conflict)
- […] inserted rows)
Currently the merge key is only embedded inside `KeyExistenceFilter.field_ids`, which is optional and was previously gated on the schema having PK metadata. Promoting the merge key to a top-level field on `Update` makes the semantics explicit and enables conflict detection independent of bloom filter presence.
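To make the intended check concrete, here is a minimal Python sketch of how conflict resolution might compare the merge keys of two concurrent updates. It is an illustration only, not the Lance implementation: the `Update` dataclass and the `merge_keys_conflict` helper are hypothetical names, and treating the key as an unordered set and treating an empty list as "cannot decide" are both assumptions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Update:
    """Hypothetical stand-in for the `Update` message; only the new field is modeled."""
    merge_key_field_ids: List[int] = field(default_factory=list)


def merge_keys_conflict(a: Update, b: Update) -> bool:
    """Return True when both updates are merge inserts that used different merge keys."""
    # An empty list means a non-merge-insert update or an older writer.
    # Assumption: this check then reports no conflict and defers to existing rules.
    if not a.merge_key_field_ids or not b.merge_key_field_ids:
        return False
    # Assumption: the order of the ON columns does not matter, so compare as sets.
    return set(a.merge_key_field_ids) != set(b.merge_key_field_ids)
```

For example, under these assumptions `merge_keys_conflict(Update([1]), Update([2]))` reports a conflict, while two updates keyed on the same columns, or an update from an older writer with an empty list, do not.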
Changes
- Add `repeated int32 merge_key_field_ids = 9` to the `Update` message
- Update `KeyExistenceFilter` comments to remove the PK-only restriction
Community vote
Per @jackye1995's suggestion, this is a spec change that may require a community vote (similar to #5485).
Happy to create a vote discussion if needed.
A companion implementation PR will follow once this spec change is accepted.
🤖 Generated with Claude Code