Skip to content

filter: support ignore update only columns for kafka sink#5194

Open
lidezhu wants to merge 7 commits into
masterfrom
ldz/value-aware-filter
Open

filter: support ignore update only columns for kafka sink#5194
lidezhu wants to merge 7 commits into
masterfrom
ldz/value-aware-filter

Conversation

@lidezhu

@lidezhu lidezhu commented Jun 3, 2026

Copy link
Copy Markdown
Collaborator

What problem does this PR solve?

Issue Number: close #5269

What is changed and how it works?

This pull request introduces a value-aware filtering capability for the Kafka sink in TiCDC. By allowing users to specify columns that should be ignored during update events, the system can now suppress unnecessary downstream updates when only non-critical columns (like metadata or version fields) are modified. This change involves updates to the configuration model, the event filtering engine, and the dispatcher logic to propagate these settings effectively.

Highlights

  • Feature Implementation: Added support for filtering out update events in the Kafka sink when only ignored columns are modified.
  • Configuration: Introduced ignore-update-only-columns configuration to specify columns that should not trigger an update event if they are the only ones changed.

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Questions

Will it cause performance regression or break compatibility?
Do you need to update user documentation, design documentation or monitoring documentation?

Release note

Please refer to [Release Notes Language Style Guide](https://pingcap.github.io/tidb-dev-guide/contribute-to-tidb/release-notes-style-guide.html) to write a quality release note.

If you don't think this PR needs a release note then fill it with `None`.

Summary by CodeRabbit

  • New Features
    • Introduced an ignore-update-only-columns option to skip UPDATE events that only change configured columns; exposed in the API and enabled for Kafka dispatchers.
  • Tests
    • Added unit and integration tests validating update-only-column filtering, dispatcher toggle behavior, and end-to-end scenarios.

@ti-chi-bot ti-chi-bot Bot added do-not-merge/needs-linked-issue do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. labels Jun 3, 2026
@ti-chi-bot

ti-chi-bot Bot commented Jun 3, 2026

Copy link
Copy Markdown

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@ti-chi-bot ti-chi-bot Bot added the release-note Denotes a PR that will be considered when it comes time to generate release notes. label Jun 3, 2026
@ti-chi-bot

ti-chi-bot Bot commented Jun 3, 2026

Copy link
Copy Markdown

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign flowbehappy for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai

coderabbitai Bot commented Jun 3, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 14ffbe6e-2639-4e37-9a09-4d15145a71bc

📥 Commits

Reviewing files that changed from the base of the PR and between 9ea83c0 and e38dada.

📒 Files selected for processing (2)
  • pkg/filter/filter_test.go
  • pkg/filter/update_only_columns_filter.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • pkg/filter/filter_test.go
  • pkg/filter/update_only_columns_filter.go

📝 Walkthrough

Walkthrough

Adds an ignore-update-only-columns filter (config + proto + API), implements per-table update-only detection, threads a DMLFilterContext through event processing, gates the behavior to Kafka via a dispatcher flag, and updates tests and integration validation for Kafka sinks.

Changes

Ignore Update Only Columns

Layer / File(s) Summary
Proto, config, and API models
eventpb/event.proto, pkg/config/filter.go, api/v2/model.go, tests/integration_tests/api_v2/model.go
Add ignore_update_only_columns to EventFilterRule across proto/config/API and add enable_ignore_update_only_columns to DispatcherRequest.
Dispatcher capability and wiring
downstreamadapter/dispatcher/*, downstreamadapter/dispatchermanager/helper.go, pkg/messaging/message.go, downstreamadapter/eventcollector/dispatcher_session.go
Add EnableIgnoreUpdateOnlyColumns() to dispatcher interfaces, implement Kafka-only enablement in BasicDispatcher, propagate flag into dispatcher registration/reset requests, and expose it via messaging wrapper.
Dispatcher and collector tests
downstreamadapter/eventcollector/*, downstreamadapter/dispatcher/*_test.go, downstreamadapter/eventcollector/event_collector_test.go
Extend mocks to expose/consume the new capability flag, inject routers/handlers as needed, and add assertions validating the emitted DispatcherRequest flag.
Filter core and tests
pkg/filter/filter.go, pkg/filter/update_only_columns_filter.go, pkg/filter/filter_test.go
Introduce DMLFilterContext and updateOnlyColumnsFilter; integrate it into ShouldIgnoreDML when enabled and add table-driven tests covering update-only scenarios and case sensitivity.
Event path integration
pkg/common/event/*, pkg/eventservice/event_scanner.go, pkg/eventservice/event_service.go, pkg/eventservice/*_test.go
Thread DMLFilterContext through dmlProcessor, TxnEvent, and DMLEvent.AppendRow; update signatures and tests to pass the context; add DispatcherInfo method.
Common callers and benchmarks
pkg/common/event/dml_event_benchmark.go, pkg/common/event/util.go, related tests
Update benchmarks, utilities, and test callers to pass filter.DMLFilterContext{} and add necessary imports.
Integration test
tests/integration_tests/ignore_update_only_columns/*
Add changefeed config, SQL fixture, and Kafka-only run script that asserts ignored vs retained update outcomes.

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant API_Server
  participant EventService
  participant Filter
  participant Dispatcher
  participant Kafka
  Client->>API_Server: configure EventFilterRule (ignore-update-only-columns)
  API_Server->>EventService: store config / create changefeed
  EventService->>Filter: build filter with UpdateOnlyColumns rules
  Dispatcher->>EventService: EnableIgnoreUpdateOnlyColumns() (Kafka -> true)
  EventService->>Filter: ShouldIgnoreDML(..., DMLFilterContext{EnableIgnoreUpdateOnlyColumns:true})
  alt Should be ignored
    EventService-->>Dispatcher: no message emitted
  else Should be kept
    EventService->>Dispatcher: emit event
    Dispatcher->>Kafka: publish event
  end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested labels

lgtm, approved

Suggested reviewers

  • asddongmen
  • flowbehappy

Poem

🐰 I nibble at rows where only clocks tick,
ignoring version hops that add no new trick.
Kafka nods—"let bookkeeping pass,"
downstream stays light, not a single morass.
Hooray — a rabbit’s hop saves bytes quick.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 18.52% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title "filter: support ignore update only columns for kafka sink" clearly and concisely summarizes the main feature being added.
Linked Issues check ✅ Passed All coding requirements from issue #5269 are met: configuration field added, column change detection implemented, UPDATE-only filtering applied, column name resolution with case-sensitivity handling, and Kafka-only scope enforced.
Out of Scope Changes check ✅ Passed All changes directly support the ignore-update-only-columns feature: configuration models, filter implementation, dispatcher propagation, event processing, API contracts, and integration tests—no unrelated modifications detected.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch ldz/value-aware-filter

Warning

Tools execution failed with the following error:

Failed to run tools: Ping-pong health check failed


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@ti-chi-bot ti-chi-bot Bot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Jun 3, 2026

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements the ignore-update-only-columns feature for TiCDC, allowing Kafka downstreams to filter out UPDATE events when all changed columns are configured as ignorable. The changes span configuration models, protobuf definitions, API models, and the core filtering pipeline. The reviewer provided valuable feedback focused on performance optimizations and safety on the hot path, suggesting the use of sync.RWMutex with a double-checked locking pattern, caching key column IDs to eliminate per-row map allocations, and adding a defensive nil check to prevent potential nil pointer dereferences.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread pkg/filter/update_only_columns_filter.go Outdated
Comment thread pkg/filter/update_only_columns_filter.go
return false, nil
}

keyColIDs := makeColumnIDSet(tableInfo.GetIndexColumns())

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Use the cached keyColIDs from the resolved schema to completely eliminate map allocations per row.

Suggested change
keyColIDs := makeColumnIDSet(tableInfo.GetIndexColumns())
keyColIDs := resolved.keyColIDs

Comment thread pkg/filter/update_only_columns_filter.go
Comment thread pkg/filter/filter.go
@lidezhu lidezhu changed the title [WIP] filter: support ignore update only columns for kafka sink Jun 9, 2026
@lidezhu

lidezhu commented Jun 9, 2026

Copy link
Copy Markdown
Collaborator Author

/test all

@lidezhu lidezhu marked this pull request as ready for review June 9, 2026 10:40
@ti-chi-bot ti-chi-bot Bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 9, 2026
@lidezhu

lidezhu commented Jun 9, 2026

Copy link
Copy Markdown
Collaborator Author

/gemini summary

@lidezhu

lidezhu commented Jun 9, 2026

Copy link
Copy Markdown
Collaborator Author

/gemini review

@gemini-code-assist

Copy link
Copy Markdown

Summary of Changes

This pull request introduces a value-aware filtering capability for the Kafka sink in TiCDC. By allowing users to specify columns that should be ignored during update events, the system can now suppress unnecessary downstream updates when only non-critical columns (like metadata or version fields) are modified. This change involves updates to the configuration model, the event filtering engine, and the dispatcher logic to propagate these settings effectively.

Highlights

  • Feature Implementation: Added support for filtering out update events in the Kafka sink when only ignored columns are modified.
  • Configuration: Introduced ignore-update-only-columns configuration to specify columns that should not trigger an update event if they are the only ones changed.
  • Performance Optimization: Implemented a new filter mechanism with cached column resolution to minimize overhead during DML processing.
  • Testing: Added comprehensive integration tests to verify the filtering logic for various scenarios, including primary/unique key updates and column case sensitivity.
New Features

🧠 You can now enable Memory (public preview) to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Activity
  • Pull request created by lidezhu.
  • Bot triggered CI for the draft PR.
  • Code review feedback provided by gemini-code-assist regarding mutex usage and performance optimizations.
  • Author requested a full test run via /test all.
  • Author requested a summary and review via /gemini commands.

@lidezhu

lidezhu commented Jun 9, 2026

Copy link
Copy Markdown
Collaborator Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor
✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a feature to ignore update-only columns during DML filtering, allowing users to skip replication of update events if only specified columns are modified. The code review feedback focuses on critical performance optimizations in the hot path of update_only_columns_filter.go. Key recommendations include using sync.RWMutex with double-checked locking to reduce lock contention, caching keyColIDs to avoid map allocations on every update event, optimizing columnValueEqual to directly compare common types and avoid heap allocations, and updating imports accordingly.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread pkg/filter/update_only_columns_filter.go
Comment thread pkg/filter/update_only_columns_filter.go
Comment thread pkg/filter/update_only_columns_filter.go
Comment thread pkg/filter/update_only_columns_filter.go
Comment thread pkg/filter/update_only_columns_filter.go

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/common/event/util.go`:
- Around line 669-670: In DML2UpdateEvent, the call to dmlEvent.AppendRow(raw,
s.mounter.DecodeToChunk, nil, filter.DMLFilterContext{}) ignores its returned
error; modify the DML2UpdateEvent logic to capture the error return from
AppendRow, handle it appropriately (return the error up the call stack or log
and fail the test/fixture creation), and ensure any upstream callers (e.g., the
function that invokes DML2UpdateEvent) propagate or handle that error; reference
AppendRow and DML2UpdateEvent to locate and update the call site and
signature/returns as needed.

In `@tests/integration_tests/ignore_update_only_columns/run.sh`:
- Around line 69-70: The script uses unsafe argument forwarding and unquoted
variables: replace the bare invocation run $* with a safe forwarder using "$@"
and quote the WORK_DIR when calling check_logs (use check_logs "$WORK_DIR") so
arguments containing whitespace are preserved; update the two invocations of run
and check_logs to use "$@" and quoted "$WORK_DIR" respectively, ensuring the
existing run and check_logs functions remain unchanged.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 403d2a1a-8d99-413a-8e4b-eaac4f45bd5b

📥 Commits

Reviewing files that changed from the base of the PR and between 1bd702f and b74b8ed.

⛔ Files ignored due to path filters (1)
  • eventpb/event.pb.go is excluded by !**/*.pb.go
📒 Files selected for processing (26)
  • api/v2/model.go
  • downstreamadapter/dispatcher/basic_dispatcher.go
  • downstreamadapter/dispatcher/basic_dispatcher_active_active_test.go
  • downstreamadapter/dispatcher/basic_dispatcher_info.go
  • downstreamadapter/dispatchermanager/helper.go
  • downstreamadapter/eventcollector/dispatcher_session.go
  • downstreamadapter/eventcollector/dispatcher_stat_test.go
  • downstreamadapter/eventcollector/event_collector_test.go
  • eventpb/event.proto
  • pkg/common/event/dml_event.go
  • pkg/common/event/dml_event_benchmark.go
  • pkg/common/event/dml_event_test.go
  • pkg/common/event/util.go
  • pkg/config/filter.go
  • pkg/eventservice/event_scanner.go
  • pkg/eventservice/event_scanner_test.go
  • pkg/eventservice/event_service.go
  • pkg/eventservice/event_service_test.go
  • pkg/filter/filter.go
  • pkg/filter/filter_test.go
  • pkg/filter/update_only_columns_filter.go
  • pkg/messaging/message.go
  • tests/integration_tests/api_v2/model.go
  • tests/integration_tests/ignore_update_only_columns/conf/changefeed.toml
  • tests/integration_tests/ignore_update_only_columns/data/data.sql
  • tests/integration_tests/ignore_update_only_columns/run.sh

Comment thread pkg/common/event/util.go Outdated
Comment thread tests/integration_tests/ignore_update_only_columns/run.sh
@lidezhu

lidezhu commented Jun 9, 2026

Copy link
Copy Markdown
Collaborator Author

/test all

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

filter: support ignore update only columns for kafka sink

1 participant