[FLINK-38568] [mysql-cdc] [cdc-base] Optimize binlog split lookup using binary search #4166
Conversation
Hi @ruanhang1993, could you find time to review this PR? Thank you very much~
@huyuanfeng2018 please also add this improvement to the base framework
    if (sortedSplits == null || sortedSplits.isEmpty()) {
        return null;
    }
The current PR is already in good shape; I only have two possible suggestions:
(1) Considering that most tables use auto-increment primary keys and that INSERT operations outnumber UPDATE operations in the binlog, the majority of binlog events will typically correspond to either the last split or the first split. Leveraging this data locality, we can further optimize by first checking whether the changelog matches the first or last split and then performing binary search on the remaining splits (see the sketch after this comment).
(2) We can also apply the same optimization to IncrementalSourceStreamFetcher, which is used by the other CDC connectors.
WDYT? @huyuanfeng2018
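A minimal sketch of the fast path described in (1), using simplified stand-in types rather than the PR's actual code (SplitRange and its numeric bounds are hypothetical; the real connector works with FinishedSnapshotSplitInfo and composite chunk keys):

```java
import java.util.List;

// Hypothetical stand-in for FinishedSnapshotSplitInfo: each split covers [start, end).
record SplitRange(long start, long end) {
    boolean contains(long key) {
        return key >= start && key < end;
    }
}

class SplitLookup {
    // With auto-increment keys, most binlog events land in the last split (new
    // inserts) or the first split, so check those before the binary search.
    static SplitRange findSplit(List<SplitRange> sortedSplits, long key) {
        if (sortedSplits == null || sortedSplits.isEmpty()) {
            return null;
        }
        SplitRange last = sortedSplits.get(sortedSplits.size() - 1);
        if (last.contains(key)) {
            return last;
        }
        SplitRange first = sortedSplits.get(0);
        if (first.contains(key)) {
            return first;
        }
        // Fall back to an O(log n) binary search over the splits in between.
        int lo = 1;
        int hi = sortedSplits.size() - 2;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            SplitRange candidate = sortedSplits.get(mid);
            if (candidate.contains(key)) {
                return candidate;
            } else if (key < candidate.start()) {
                hi = mid - 1;
            } else {
                lo = mid + 1;
            }
        }
        return null;
    }
}
```

The fast path costs at most two extra range checks in the worst case, while the common auto-increment INSERT case resolves with a single comparison against the last split.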
Thanks to @loserwang1024 and @leonardBang for the review. I totally agree with this optimization for the auto-increment scenario; in that case, almost only one comparison is needed, which is very cool. I have also added this optimization to the other sources, except for MongoDB and Oracle. Please take the time to review it again~
Force-pushed from 26fd827 to e5aeeb1
CI failed. The Apache download CDN has already removed the Flink 1.20.1 binaries package. Perhaps we need to upgrade the dependency version.
     * @param key The chunk key to search for
     * @return The split containing the key, or null if not found
     */
    public static FinishedSnapshotSplitInfo findSplitByKeyBinary(
Thanks @huyuanfeng2018 for the update, it looks good. One minor comment: this Utils class is a little long after this PR; could we extract a new utils class like SplitKeyUtils to make the code more readable?
Thanks for the quick review~
done.
Force-pushed from cbd87e4 to 13c7594
@huyuanfeng2018 Could you append a commit to bump the Flink version to 1.20.3 to fix this issue? I've checked that https://dlcdn.apache.org/flink/flink-1.20.3/ should be okay.
ok
Force-pushed from 13c7594 to 168b487
@huyuanfeng2018 I like your community cooperation style. Now we can rebase this PR onto the latest master and convert it to a normal one.
Already rebased onto master. Once CI finishes, I will change the PR status from Draft to Ready.
leonardBang left a comment
+1
What is the purpose of the change
Optimize CDC binlog split lookup from O(n) to O(log n) using binary search.
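A minimal sketch of the sort-once / binary-search-per-record idea with simplified types; the names SnapshotSplit and SplitIndex below are hypothetical, and only the method names sortFinishedSplitInfos()/findSplitByKeyBinary() in the change log come from the PR:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Simplified stand-in for FinishedSnapshotSplitInfo with a single numeric chunk key.
record SnapshotSplit(String splitId, long startKey, long endKey) {}

class SplitIndex {
    private final List<SnapshotSplit> sorted;

    // Sort once when the binlog reader starts, so each record lookup is O(log n).
    SplitIndex(List<SnapshotSplit> finishedSplits) {
        this.sorted = new ArrayList<>(finishedSplits);
        this.sorted.sort(Comparator.comparingLong(SnapshotSplit::startKey));
    }

    // Binary-search the split whose [startKey, endKey) range contains the key;
    // previously this required a linear scan over all finished splits.
    SnapshotSplit findSplitByKey(long key) {
        int idx = Collections.binarySearch(
                sorted,
                new SnapshotSplit("probe", key, key),
                Comparator.comparingLong(SnapshotSplit::startKey));
        if (idx < 0) {
            idx = -idx - 2; // greatest split whose startKey <= key
        }
        if (idx < 0) {
            return null;
        }
        SnapshotSplit candidate = sorted.get(idx);
        return key >= candidate.startKey() && key < candidate.endKey() ? candidate : null;
    }
}
```

Compared with scanning every finished split for each binlog record, this drops the per-record lookup cost from O(n) to O(log n), which matters when a large table has been chunked into many snapshot splits.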
Brief change log
Added sortFinishedSplitInfos() and findSplitByKeyBinary() methods
Updated BinlogSplitReader to use binary search instead of linear search
Performance Impact
Verifying this change
Added 6 new unit tests covering various scenarios including edge cases and consistency verification with linear search.
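A sketch of the kind of consistency check described above, comparing a binary-search lookup against a brute-force linear scan over randomly generated keys (hypothetical names and data; not the PR's actual test code):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.Random;

// Hypothetical consistency check: binary search must agree with a linear scan.
class SplitLookupConsistencyCheck {

    record Split(long startKey, long endKey) {
        boolean contains(long key) {
            return key >= startKey && key < endKey;
        }
    }

    static Split linearFind(List<Split> splits, long key) {
        for (Split s : splits) {
            if (s.contains(key)) {
                return s;
            }
        }
        return null;
    }

    static Split binaryFind(List<Split> sortedSplits, long key) {
        int lo = 0;
        int hi = sortedSplits.size() - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            Split s = sortedSplits.get(mid);
            if (s.contains(key)) {
                return s;
            } else if (key < s.startKey()) {
                hi = mid - 1;
            } else {
                lo = mid + 1;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<Split> splits = new ArrayList<>();
        for (long start = 0; start < 1000; start += 10) {
            splits.add(new Split(start, start + 10)); // contiguous, already sorted
        }
        Random random = new Random(42);
        for (int i = 0; i < 10_000; i++) {
            long key = random.nextInt(1100) - 50; // include keys outside all splits
            Split expected = linearFind(splits, key);
            Split actual = binaryFind(splits, key);
            if (!Objects.equals(expected, actual)) {
                throw new AssertionError("Mismatch for key " + key);
            }
        }
        System.out.println("binary search matches linear scan");
    }
}
```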