Spark 4.1: New Async Spark Micro Batch Planner #15299

Open
RjLi13 wants to merge 2 commits into apache:main from RjLi13:async-micro-batch-planner

RjLi13 commented Feb 11, 2026

This is part 2 of the split of PR #15059.

Part 1 is #15298.

This PR only introduces the new async Spark micro-batch planner and the changes needed to enable it.

Full context is in #15059 and is reposted below:


Implements a new feature for Spark Structured Streaming and Iceberg users: the Async Spark Micro Batch Planner.

Currently, micro-batch planning in Iceberg is synchronous: a streaming query plans which batches to read and how many rows and files go into each batch, processes the data, and then repeats. The async planner pre-fetches table metadata and file scan tasks in a background thread, so planning overlaps with data processing. This reduces micro-batch planning latency, improves streaming performance, and speeds up processing of large data volumes.
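
For readers who want a picture of the overlap described above, here is a minimal sketch. It is not the code in this PR: the class AsyncPlannerSketch, its planNextBatch supplier, and the use of plain file-name strings in place of Iceberg file scan tasks are all assumptions made for illustration. It only shows the general pattern of a background thread planning ahead into a bounded queue while the caller processes the previous batch.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Supplier;

/**
 * Hypothetical sketch (not the PR's implementation): plans micro-batches on a
 * background thread so that planning overlaps with processing.
 */
public class AsyncPlannerSketch implements AutoCloseable {
  // Bounded queue of already-planned batches; each batch is a list of file names here.
  private final BlockingQueue<List<String>> planned;
  private final Thread worker;

  public AsyncPlannerSketch(Supplier<List<String>> planNextBatch, int maxQueuedBatches) {
    this.planned = new LinkedBlockingQueue<>(maxQueuedBatches);
    this.worker = new Thread(() -> {
      try {
        while (!Thread.currentThread().isInterrupted()) {
          // put() blocks when the queue is full, which bounds how far planning runs ahead.
          planned.put(planNextBatch.get());
        }
      } catch (InterruptedException e) {
        // Restore the flag and let the thread exit.
        Thread.currentThread().interrupt();
      }
    }, "async-planner-sketch");
    worker.setDaemon(true);
    worker.start();
  }

  /** Returns the next pre-planned batch, waiting only if none is ready yet. */
  public List<String> nextBatch() {
    try {
      return planned.take();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("Interrupted while waiting for a planned batch", e);
    }
  }

  /** Stops the background planning thread. */
  @Override
  public void close() {
    worker.interrupt();
  }
}
```

A caller would construct the sketch with a supplier that performs the expensive planning step, call nextBatch() once per micro-batch, and call close() when the stream stops. The bounded queue is one possible way to keep the planner from running arbitrarily far ahead of processing; the actual limits used by this PR may differ.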

This PR adds the option spark.sql.iceberg.async-micro-batch-planning-enabled, which users can set to enable async planning. The existing planning code in SparkMicroBatchStream.java moves to SyncSparkMicroBatchPlanner.java, and SparkMicroBatchStream selects which planner to use. The option defaults to false, so existing behavior is unchanged.
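
As a rough illustration of flag-based planner selection, here is a small sketch that assumes the options arrive as a plain string map. Only the option name and its false default come from this PR. The MicroBatchPlanner interface, the two stub planner classes, and the choose() method are hypothetical stand-ins and do not mirror the real SparkMicroBatchStream or SyncSparkMicroBatchPlanner code.

```java
import java.util.Map;

/** Hypothetical sketch of choosing a planner from a boolean option. */
public class PlannerSelectionSketch {
  // Option name taken from the PR description.
  public static final String ASYNC_PLANNING_ENABLED =
      "spark.sql.iceberg.async-micro-batch-planning-enabled";

  /** Stand-in for whatever planner abstraction the stream uses (assumption). */
  public interface MicroBatchPlanner {
    String name();
  }

  /** Stand-in for the synchronous planner (assumption). */
  public static class SyncPlannerStub implements MicroBatchPlanner {
    @Override
    public String name() {
      return "sync";
    }
  }

  /** Stand-in for the asynchronous planner (assumption). */
  public static class AsyncPlannerStub implements MicroBatchPlanner {
    @Override
    public String name() {
      return "async";
    }
  }

  /** Picks the async planner only when the option is explicitly "true". */
  public static MicroBatchPlanner choose(Map<String, String> conf) {
    // A missing key falls back to "false", so existing behavior stays synchronous.
    boolean async = Boolean.parseBoolean(conf.getOrDefault(ASYNC_PLANNING_ENABLED, "false"));
    return async ? new AsyncPlannerStub() : new SyncPlannerStub();
  }
}
```

With an empty map, choose() returns the synchronous stub, which matches the stated default of false. Passing the option with the value "true" returns the async stub.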

This feature was originally authored by Drew Goya in our Netflix fork for Spark 3.3 and Iceberg 1.4. I built on Drew's work by porting it to Spark 4.1 and the current Iceberg version.

RjLi13 commented Feb 11, 2026

Will mark this as ready for review once #15298 is merged. cc @bryanck

RjLi13 changed the title from "Spark: New Async Spark Micro Batch Planner" to "Spark 4.1: New Async Spark Micro Batch Planner" Feb 11, 2026
RjLi13 force-pushed the async-micro-batch-planner branch from e665d1d to 7815e12 February 14, 2026 18:34
Ruijing Li added 2 commits February 14, 2026 20:44
This feature was originally built by Drew Goya <dgoya@netflix.com> for Spark 3.3 and Iceberg 1.4.
RjLi13 force-pushed the async-micro-batch-planner branch from 7815e12 to 64f07d6 February 15, 2026 04:46
RjLi13 marked this pull request as ready for review February 15, 2026 04:49
RjLi13 commented Feb 15, 2026

Reposting this comment about the benchmark here: #15059 (comment)
