[FLINK-31753] Support DataStream CoGroup in stream mode with similar performance as DataSet CoGroup #230

lindong28 · 2023-04-07T12:47:07Z

What is the purpose of the change

Add utility methods that allow algorithm developers to co-group two DataStreams with the same semantics as DataSet#coGroup(...) and similar performance.
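For readers unfamiliar with co-group semantics, here is a minimal plain-Java sketch of what "the same semantics as DataSet#coGroup(...)" means: the user function is called once per distinct key, with all records of that key from each input, and a key present in only one input still triggers a call with an empty group on the other side. This is an illustration only, not the code in this PR. The class `CoGroupSketch`, the interface `GroupFunction`, and the in-memory `TreeMap` grouping are assumptions made for the example; the real utility operates on DataStreams.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.function.Function;

public class CoGroupSketch {

    /** Hypothetical callback for this sketch: invoked once per distinct key. */
    @FunctionalInterface
    public interface GroupFunction<A, B, R> {
        void coGroup(List<A> first, List<B> second, List<R> out);
    }

    /** Groups both inputs by key and calls fn once per key found in either input. */
    public static <A, B, K extends Comparable<K>, R> List<R> coGroup(
            List<A> left,
            List<B> right,
            Function<A, K> leftKey,
            Function<B, K> rightKey,
            GroupFunction<A, B, R> fn) {
        // Bucket each input by its key; TreeMap keeps the keys in sorted order.
        Map<K, List<A>> leftGroups = new TreeMap<>();
        for (A a : left) {
            leftGroups.computeIfAbsent(leftKey.apply(a), k -> new ArrayList<>()).add(a);
        }
        Map<K, List<B>> rightGroups = new TreeMap<>();
        for (B b : right) {
            rightGroups.computeIfAbsent(rightKey.apply(b), k -> new ArrayList<>()).add(b);
        }

        // Visit the union of keys so that one-sided keys are not dropped.
        TreeSet<K> keys = new TreeSet<>(leftGroups.keySet());
        keys.addAll(rightGroups.keySet());

        List<R> out = new ArrayList<>();
        for (K key : keys) {
            List<A> first = leftGroups.getOrDefault(key, Collections.emptyList());
            List<B> second = rightGroups.getOrDefault(key, Collections.emptyList());
            fn.coGroup(first, second, out);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> left = List.of("a1", "b1", "a2");
        List<String> right = List.of("b9", "c9");

        // Key is the first character; emit "<left count>/<right count>" per key.
        List<String> result = coGroup(
                left,
                right,
                s -> s.substring(0, 1),
                s -> s.substring(0, 1),
                (l, r, out) -> out.add(l.size() + "/" + r.size()));

        System.out.println(result); // [2/0, 1/1, 0/1] for keys a, b, c
    }
}
```

Keys `a` and `c` appear in only one input, yet the function still runs for them with an empty list on the missing side. This is what distinguishes co-group from an inner join.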

Here are the results of running the benchmark specified in FLINK-31753's JIRA description:

DataSet#coGroup takes 27.6 seconds.
DataStreamUtils#coGroup takes 31.5 seconds.

DataStreamUtils#coGroup is roughly 14% slower than DataSet#coGroup (31.5 s vs. 27.6 s). The performance difference should be negligible for real-world applications whose co-group function is non-trivial.
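The size of the relative gap depends on which run is used as the baseline. Here is a minimal sketch of both calculations, using only the two timings reported above; the class and method names are made up for the example.

```java
import java.util.Locale;

public class SlowdownSketch {

    /** Percentage by which `value` exceeds `baseline`. */
    public static double percentOver(double value, double baseline) {
        return (value - baseline) / baseline * 100.0;
    }

    /** Percentage by which `value` falls short of `baseline`. */
    public static double percentUnder(double value, double baseline) {
        return (baseline - value) / baseline * 100.0;
    }

    public static void main(String[] args) {
        double dataSetSeconds = 27.6;    // DataSet#coGroup
        double dataStreamSeconds = 31.5; // DataStreamUtils#coGroup

        // Extra time taken by DataStream, relative to the DataSet run.
        System.out.println(String.format(Locale.ROOT,
                "DataStream vs DataSet baseline: +%.1f%%",
                percentOver(dataStreamSeconds, dataSetSeconds))); // +14.1%

        // Time saved by DataSet, relative to the DataStream run.
        System.out.println(String.format(Locale.ROOT,
                "DataSet vs DataStream baseline: -%.1f%%",
                percentUnder(dataSetSeconds, dataStreamSeconds))); // -12.4%
    }
}
```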

Notes:

The benchmark was run with a Flink 1.18 snapshot.
Most classes under the sort folder are copied from the corresponding classes in apache/flink. We can remove these classes from apache/flink-ml after making the corresponding classes in apache/flink public.

Brief change log

Added the static method DataStreamUtils#coGroup(...).

Does this pull request potentially affect one of the following parts:

Dependencies (does it add or upgrade a dependency): no
The public API, i.e., is any changed class annotated with @Public(Evolving): no

Documentation

Does this pull request introduce a new feature? yes
If yes, how is the feature documented? JavaDocs

lindong28 · 2023-04-07T12:52:05Z

@zhipeng93 Can you help review this PR?

zhipeng93

Thanks for the PR :) I left some comments below. Please take a look.

flink-ml-core/src/main/java/org/apache/flink/ml/common/datastream/sort/CoGroupOperator.java

flink-ml-core/src/test/java/org/apache/flink/ml/common/datastream/DataStreamUtilsTest.java

zhipeng93 · 2023-04-12T06:38:45Z

Thanks for the update. LGTM.

lindong28 · 2023-04-12T08:43:00Z

@zhipeng93 Thanks for the review.

lindong28 force-pushed the FLINK-31753 branch from fd3a7e3 to 92aac48 Compare April 7, 2023 12:50

lindong28 changed the title ~~[FLINK-31753] Support DataStream CoGroup in stream Mode with similar performance as DataSet CoGroup~~ [FLINK-31753] Support DataStream CoGroup in stream mode with similar performance as DataSet CoGroup Apr 7, 2023

lindong28 force-pushed the FLINK-31753 branch from 92aac48 to d71ed98 Compare April 7, 2023 12:52

lindong28 force-pushed the FLINK-31753 branch 4 times, most recently from 837fd3a to 514b8b3 Compare April 10, 2023 06:20

zhipeng93 reviewed Apr 11, 2023

lindong28 force-pushed the FLINK-31753 branch from 514b8b3 to 4a5914c Compare April 12, 2023 00:48

[FLINK-31753] Support DataStream CoGroup in stream mode with similar …

678eaf0

…performance as DataSet CoGroup

lindong28 force-pushed the FLINK-31753 branch from 4a5914c to 678eaf0 Compare April 12, 2023 08:29

lindong28 merged commit d7c9c8b into apache:master Apr 12, 2023

zhipeng93 pushed a commit to zhipeng93/flink-ml that referenced this pull request Apr 18, 2023

[FLINK-31753] Support DataStream CoGroup in stream mode with similar …

816c853

…performance as DataSet CoGroup This closes apache#230.
