[SPARK-56023][SS] Better load balance in LowLatencyMemoryStream#54848

Open
eason-yuchen-liu wants to merge 1 commit into apache:master from eason-yuchen-liu:lowLatencyMemoryStreamLoadBalance

@eason-yuchen-liu
Contributor

What changes were proposed in this pull request?

Rewrite addData to use records.size % numPartitions for better load balance across partitions.

Why are the changes needed?

Previously, addData balanced load across partitions only when a sequence of data was added in a single call. This change enables load balancing for one-row-at-a-time input patterns.
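
To illustrate the one-row-at-a-time case, here is a simplified model. It assumes the pre-change code derived index from each item's position within a single addData call (for example via zipWithIndex). The class PerCallIndexModel and its Int payloads are hypothetical stand-ins, not the actual LowLatencyMemoryStream code.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical model of per-partition buffers; not Spark code.
class PerCallIndexModel(numPartitions: Int) {
  val records: Array[ArrayBuffer[Int]] =
    Array.fill(numPartitions)(ArrayBuffer.empty[Int])

  // Assumed pre-change behavior: the index restarts at 0 on every call,
  // so a single-item call always targets partition 0.
  def addData(data: Seq[Int]): Unit =
    data.zipWithIndex.foreach { case (item, index) =>
      records(index % numPartitions) += item
    }

  // Number of rows currently held by each partition.
  def sizes: List[Int] = records.map(_.size).toList
}
```

Under this model, one call with six items spreads them evenly over three partitions, while six calls with one item each leave every row in partition 0.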

Does this PR introduce any user-facing change?

No. This is a test-only source.

How was this patch tested?

CI.

Was this patch authored or co-authored using generative AI tooling?

No.

val partitionId = index % numPartitions
records(partitionId) += ((toRow(item).copy().asInstanceOf[UnsafeRow], timestamp))
data.iterator.foreach { item =>
val partitionId = records.size % numPartitions
Contributor

@HeartSaVioR Mar 18, 2026

How does this work? records.size will always be the same (= numPartitions) regardless of how the events are currently distributed, right? If so, records.size % numPartitions is always 0, and this change will simply put all the data in a single partition, the first one.

While we are here, I'd love to see the test at this point.
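
The concern above can be checked with a small model, assuming records holds exactly one buffer per partition as the comment suggests. The class RoundRobinModel, its nextPartition counter, and the Int payloads are hypothetical. The counter is just one possible way to balance one-row-at-a-time input, not necessarily what this PR should adopt.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical model of per-partition buffers; not Spark code.
class RoundRobinModel(numPartitions: Int) {
  val records: Array[ArrayBuffer[Int]] =
    Array.fill(numPartitions)(ArrayBuffer.empty[Int])

  // Running cursor that survives across addData calls (an assumption of this sketch).
  private var nextPartition = 0

  // The expression under review: records.size is the number of buffers,
  // which equals numPartitions in this model, so the result is always 0.
  def partitionFromRecordsSize: Int = records.size % numPartitions

  // Alternative assignment: advance the cursor once per item.
  def addData(data: Seq[Int]): Unit =
    data.foreach { item =>
      records(nextPartition) += item
      nextPartition = (nextPartition + 1) % numPartitions
    }

  // Number of rows currently held by each partition.
  def sizes: List[Int] = records.map(_.size).toList
}
```

A test along these lines, adding rows one call at a time and asserting on the per-partition counts, would show whether the rewritten addData actually spreads the data.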
