spec: add merge_key_field_ids to Update transaction message #6052

Draft
ozzieba wants to merge 1 commit into lance-format:main from purpleplatform:spec/merge-key-proto

ozzieba commented Feb 27, 2026

Summary

Refs: lancedb/lancedb#2463, #4585, #6018

Add `repeated int32 merge_key_field_ids = 9` to the Update message in
transaction.proto, per @jackye1995's feedback that the merge key should be
tracked in the transaction model.

Motivation

Concurrent merge_insert operations can silently produce duplicate rows when
the schema lacks unenforced-primary-key metadata (#4585). To fix this
properly, conflict resolution needs to know which columns were used as the
merge key (the ON columns), so it can:

  1. Detect when two concurrent merge inserts use different merge keys
    (their bloom filters are incompatible, so the operations must conflict)
  2. Compare bloom filters when merge keys match (to check for overlapping
    inserted rows)

Currently the merge key is only embedded inside KeyExistenceFilter.field_ids,
which is optional and was previously gated on the schema having PK metadata.
Promoting the merge key to a top-level field on Update makes the semantics
explicit and enables conflict detection independent of bloom filter presence.
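
As a rough illustration of the two checks listed in the motivation, here is a minimal Python sketch. `UpdateTxn`, `inserted_keys`, and `merge_inserts_conflict` are hypothetical names and not part of Lance. A plain set stands in for the bloom filter. Comparing field ids as ordered tuples and treating a missing filter as a conflict are conservative assumptions of this sketch, not behavior stated by the spec.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class UpdateTxn:
    """Hypothetical stand-in for an Update transaction written by a merge insert."""

    # Field ids of the ON columns (the proposed merge_key_field_ids).
    merge_key_field_ids: Tuple[int, ...]
    # Stand-in for the bloom filter over inserted keys; None means no filter was written.
    inserted_keys: Optional[FrozenSet[tuple]] = None


def merge_inserts_conflict(a: UpdateTxn, b: UpdateTxn) -> bool:
    """Return True if two concurrent merge inserts should be treated as conflicting."""
    if not a.merge_key_field_ids or not b.merge_key_field_ids:
        # Non-merge-insert updates are out of scope for this sketch.
        raise ValueError("both transactions must record a merge key")

    # Check 1: different merge keys mean the filters are not comparable.
    if a.merge_key_field_ids != b.merge_key_field_ids:
        return True

    # Assumption: with matching keys but a missing filter, be conservative.
    if a.inserted_keys is None or b.inserted_keys is None:
        return True

    # Check 2: matching keys, so look for overlapping inserted rows.
    return not a.inserted_keys.isdisjoint(b.inserted_keys)
```

With a real bloom filter the overlap test can report false positives, so a check of this shape would err toward reporting a conflict rather than missing one.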

Changes

  • Add repeated int32 merge_key_field_ids = 9 to Update message
  • Update KeyExistenceFilter comments to remove the PK-only restriction
  • Backward compatible: the field is empty for non-merge-insert updates and for older writers
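
To illustrate the backward-compatibility point: an absent repeated protobuf field decodes as an empty list, so a reader can treat "empty" as "no merge key recorded". The helper below is a hypothetical sketch in Python, not Lance code. The name `merge_key_of` and the choice to return `None` for the empty case are assumptions made for illustration.

```python
from typing import Optional, Sequence, Tuple


def merge_key_of(merge_key_field_ids: Sequence[int]) -> Optional[Tuple[int, ...]]:
    """Return the recorded merge key, or None when the field is empty.

    An empty field covers both cases named in the change list:
    a non-merge-insert update, or a transaction from an older writer.
    """
    key = tuple(merge_key_field_ids)
    return key if key else None
```

A caller would then apply the merge-key checks only when both transactions return a non-`None` key, and fall back to existing conflict rules otherwise.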

Community vote

Per @jackye1995's suggestion, this is a spec change that may require a community
vote (similar to #5485).
Happy to create a vote discussion if needed.

A companion implementation PR will follow once this spec change is accepted.

🤖 Generated with Claude Code

Add `repeated int32 merge_key_field_ids = 9` to the Update message in
transaction.proto. This field records which columns were used as the
merge key (the ON columns) in a merge insert operation, enabling
conflict resolution to detect incompatible concurrent merge inserts
that use different merge keys.

Backward compatible: empty for non-merge-insert updates and older
writers.

Refs: lancedb/lancedb#2463, lance-format#4585, lance-format#6018

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@github-actions (Contributor)

ACTION NEEDED
Lance follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

For details on the error please inspect the "PR Title Check" action.
