filter: support ignore update only columns for kafka sink#5194
Conversation
|
Skipping CI for Draft Pull Request. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
📝 WalkthroughWalkthroughAdds an ignore-update-only-columns filter (config + proto + API), implements per-table update-only detection, threads a DMLFilterContext through event processing, gates the behavior to Kafka via a dispatcher flag, and updates tests and integration validation for Kafka sinks. ChangesIgnore Update Only Columns
Sequence Diagram(s)sequenceDiagram
participant Client
participant API_Server
participant EventService
participant Filter
participant Dispatcher
participant Kafka
Client->>API_Server: configure EventFilterRule (ignore-update-only-columns)
API_Server->>EventService: store config / create changefeed
EventService->>Filter: build filter with UpdateOnlyColumns rules
Dispatcher->>EventService: EnableIgnoreUpdateOnlyColumns() (Kafka -> true)
EventService->>Filter: ShouldIgnoreDML(..., DMLFilterContext{EnableIgnoreUpdateOnlyColumns:true})
alt Should be ignored
EventService-->>Dispatcher: no message emitted
else Should be kept
EventService->>Dispatcher: emit event
Dispatcher->>Kafka: publish event
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Warning Tools execution failed with the following error: Failed to run tools: Ping-pong health check failed Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request implements the ignore-update-only-columns feature for TiCDC, allowing Kafka downstreams to filter out UPDATE events when all changed columns are configured as ignorable. The changes span configuration models, protobuf definitions, API models, and the core filtering pipeline. The reviewer provided valuable feedback focused on performance optimizations and safety on the hot path, suggesting the use of sync.RWMutex with a double-checked locking pattern, caching key column IDs to eliminate per-row map allocations, and adding a defensive nil check to prevent potential nil pointer dereferences.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| return false, nil | ||
| } | ||
|
|
||
| keyColIDs := makeColumnIDSet(tableInfo.GetIndexColumns()) |
|
/test all |
|
/gemini summary |
|
/gemini review |
Summary of ChangesThis pull request introduces a value-aware filtering capability for the Kafka sink in TiCDC. By allowing users to specify columns that should be ignored during update events, the system can now suppress unnecessary downstream updates when only non-critical columns (like metadata or version fields) are modified. This change involves updates to the configuration model, the event filtering engine, and the dispatcher logic to propagate these settings effectively. Highlights
New Features🧠 You can now enable Memory (public preview) to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Activity
|
|
@coderabbitai review |
✅ Action performedReview finished.
|
There was a problem hiding this comment.
Code Review
This pull request introduces a feature to ignore update-only columns during DML filtering, allowing users to skip replication of update events if only specified columns are modified. The code review feedback focuses on critical performance optimizations in the hot path of update_only_columns_filter.go. Key recommendations include using sync.RWMutex with double-checked locking to reduce lock contention, caching keyColIDs to avoid map allocations on every update event, optimizing columnValueEqual to directly compare common types and avoid heap allocations, and updating imports accordingly.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/common/event/util.go`:
- Around line 669-670: In DML2UpdateEvent, the call to dmlEvent.AppendRow(raw,
s.mounter.DecodeToChunk, nil, filter.DMLFilterContext{}) ignores its returned
error; modify the DML2UpdateEvent logic to capture the error return from
AppendRow, handle it appropriately (return the error up the call stack or log
and fail the test/fixture creation), and ensure any upstream callers (e.g., the
function that invokes DML2UpdateEvent) propagate or handle that error; reference
AppendRow and DML2UpdateEvent to locate and update the call site and
signature/returns as needed.
In `@tests/integration_tests/ignore_update_only_columns/run.sh`:
- Around line 69-70: The script uses unsafe argument forwarding and unquoted
variables: replace the bare invocation run $* with a safe forwarder using "$@"
and quote the WORK_DIR when calling check_logs (use check_logs "$WORK_DIR") so
arguments containing whitespace are preserved; update the two invocations of run
and check_logs to use "$@" and quoted "$WORK_DIR" respectively, ensuring the
existing run and check_logs functions remain unchanged.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 403d2a1a-8d99-413a-8e4b-eaac4f45bd5b
⛔ Files ignored due to path filters (1)
eventpb/event.pb.gois excluded by!**/*.pb.go
📒 Files selected for processing (26)
api/v2/model.godownstreamadapter/dispatcher/basic_dispatcher.godownstreamadapter/dispatcher/basic_dispatcher_active_active_test.godownstreamadapter/dispatcher/basic_dispatcher_info.godownstreamadapter/dispatchermanager/helper.godownstreamadapter/eventcollector/dispatcher_session.godownstreamadapter/eventcollector/dispatcher_stat_test.godownstreamadapter/eventcollector/event_collector_test.goeventpb/event.protopkg/common/event/dml_event.gopkg/common/event/dml_event_benchmark.gopkg/common/event/dml_event_test.gopkg/common/event/util.gopkg/config/filter.gopkg/eventservice/event_scanner.gopkg/eventservice/event_scanner_test.gopkg/eventservice/event_service.gopkg/eventservice/event_service_test.gopkg/filter/filter.gopkg/filter/filter_test.gopkg/filter/update_only_columns_filter.gopkg/messaging/message.gotests/integration_tests/api_v2/model.gotests/integration_tests/ignore_update_only_columns/conf/changefeed.tomltests/integration_tests/ignore_update_only_columns/data/data.sqltests/integration_tests/ignore_update_only_columns/run.sh
|
/test all |
What problem does this PR solve?
Issue Number: close #5269
What is changed and how it works?
This pull request introduces a value-aware filtering capability for the Kafka sink in TiCDC. By allowing users to specify columns that should be ignored during update events, the system can now suppress unnecessary downstream updates when only non-critical columns (like metadata or version fields) are modified. This change involves updates to the configuration model, the event filtering engine, and the dispatcher logic to propagate these settings effectively.
Highlights
ignore-update-only-columnsconfiguration to specify columns that should not trigger an update event if they are the only ones changed.Check List
Tests
Questions
Will it cause performance regression or break compatibility?
Do you need to update user documentation, design documentation or monitoring documentation?
Release note
Summary by CodeRabbit