[SPARK-56023][SS] Better load balance in LowLatencyMemoryStream by eason-yuchen-liu · Pull Request #54848 · apache/spark

eason-yuchen-liu · 2026-03-17T03:52:15Z

What changes were proposed in this pull request?

Rewrite addData to compute the target partition as records.size % numPartitions, for better load balancing across partitions.

Why are the changes needed?

Previously, data was only load-balanced across partitions when a sequence of rows was added in a single call. This change enables load balancing for input patterns that add one row at a time.
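To illustrate the old behavior, here is a minimal Scala sketch under simplifying assumptions. Partitions are modeled as plain ArrayBuffer[Int] buckets instead of (UnsafeRow, timestamp) pairs, the class name PerCallIndexModel is hypothetical, and index is assumed to be the item's position within the current addData call. The removed line index % numPartitions suggests this, but the surrounding code is not shown.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical, simplified model of the pre-PR partitioning (not the real class).
class PerCallIndexModel(numPartitions: Int) {
  // One bucket per partition; Array.fill re-evaluates its argument, so buckets are distinct.
  val records: Array[ArrayBuffer[Int]] = Array.fill(numPartitions)(ArrayBuffer.empty[Int])

  // Assumed pre-PR pattern: the index restarts at 0 on every addData call.
  def addData(data: Seq[Int]): Unit =
    data.zipWithIndex.foreach { case (item, index) =>
      val partitionId = index % numPartitions
      records(partitionId) += item
    }

  def sizes: List[Int] = records.map(_.size).toList
}

// One call with six rows: indices 0..5 spread evenly over three partitions.
val batched = new PerCallIndexModel(3)
batched.addData(1 to 6)

// Six calls with one row each: every row has index 0, so all land in partition 0.
val oneAtATime = new PerCallIndexModel(3)
(1 to 6).foreach(i => oneAtATime.addData(Seq(i)))
```

Here batched.sizes is List(2, 2, 2) while oneAtATime.sizes is List(6, 0, 0), which is the imbalance the PR description refers to.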

Does this PR introduce any user-facing change?

No. This is a test-only source.

How was this patch tested?

CI.

Was this patch authored or co-authored using generative AI tooling?

No.

HeartSaVioR · 2026-03-18T06:28:53Z

...src/main/scala/org/apache/spark/sql/execution/streaming/sources/LowLatencyMemoryStream.scala

-        val partitionId = index % numPartitions
-        records(partitionId) += ((toRow(item).copy().asInstanceOf[UnsafeRow], timestamp))
+    data.iterator.foreach { item =>
+      val partitionId = records.size % numPartitions


How does this work? records.size will always be the same (= numPartitions) regardless of how the events are currently distributed, right? This change will simply put all the data in a single partition, the first one.
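The reviewer's point can be checked with a small Scala model, assuming (as the comment states) that records holds exactly one buffer per partition. ProposedModel is a hypothetical name, and rows are plain Ints rather than UnsafeRows.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical, simplified model of the partition choice proposed in the diff.
class ProposedModel(numPartitions: Int) {
  // The outer array has exactly numPartitions buckets; appending rows never changes its size.
  val records: Array[ArrayBuffer[Int]] = Array.fill(numPartitions)(ArrayBuffer.empty[Int])

  def addData(data: Seq[Int]): Unit =
    data.iterator.foreach { item =>
      // records.size == numPartitions, so this is always numPartitions % numPartitions == 0.
      val partitionId = records.size % numPartitions
      records(partitionId) += item
    }

  def sizes: List[Int] = records.map(_.size).toList
}

val stream = new ProposedModel(3)
stream.addData(1 to 6)                        // one batch of six rows
(7 to 9).foreach(i => stream.addData(Seq(i))) // three single-row calls
```

Regardless of how rows arrive, stream.sizes ends up as List(9, 0, 0): every row goes to partition 0.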

While we are here, I'd love to see a test for this.
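One possible direction, sketched only for illustration and not necessarily what the follow-up commit does: keep a round-robin cursor that persists across addData calls, and test that one-row-at-a-time input is spread evenly. RoundRobinModel and nextPartition are hypothetical names, and the model again uses Ints in place of UnsafeRows.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical alternative: a cursor that survives between addData calls.
class RoundRobinModel(numPartitions: Int) {
  val records: Array[ArrayBuffer[Int]] = Array.fill(numPartitions)(ArrayBuffer.empty[Int])
  // Next partition to receive a row (hypothetical field, not in the original code).
  private var nextPartition = 0

  def addData(data: Seq[Int]): Unit = synchronized {
    data.foreach { item =>
      records(nextPartition) += item
      nextPartition = (nextPartition + 1) % numPartitions
    }
  }

  def sizes: List[Int] = records.map(_.size).toList
}

// Test-style check: seven single-row calls fill partitions 0, 1, 2, 0, 1, 2, 0.
val rr = new RoundRobinModel(3)
(1 to 7).foreach(i => rr.addData(Seq(i)))
assert(rr.sizes == List(3, 2, 2))

// A later batched call continues from the cursor (partitions 1 and 2).
rr.addData(8 to 9)
assert(rr.sizes == List(3, 3, 3))
```

Because the cursor is not reset per call, batched and single-row input are treated the same way, which is the property the PR description is after.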

Improve load balance in LowLatencyMemoryStream addData

69061a7

HeartSaVioR reviewed Mar 18, 2026

eason-yuchen-liu wants to merge 1 commit into apache:master from eason-yuchen-liu:lowLatencyMemoryStreamLoadBalance
