Spark 4.1: New Async Spark Micro Batch Planner #15299
Open
RjLi13 wants to merge 2 commits into apache:main
added 2 commits
February 14, 2026 20:44
This feature was originally built by Drew Goya <dgoya@netflix.com> for Spark 3.3 and Iceberg 1.4.
Reposting this comment about benchmark here: #15059 (comment)
This is part 2 of the split of PR #15059.
Part 1 PR is here: #15298.
This PR focuses only on introducing the new async Spark micro-batch planner and the changes needed to enable it.
Full context is in #15059, but it is repeated below:
Implements a new feature for Spark Structured Streaming and Iceberg users, the Async Spark Micro Batch Planner.
Currently, micro-batch planning in Iceberg is synchronous: a streaming query plans which batches to read and how many rows and files go into each batch, processes the data, and then repeats. The async planner pre-fetches table metadata and file scan tasks in a background thread, so planning overlaps with data processing. This reduces micro-batch planning latency and speeds up processing of large data volumes.
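To illustrate the overlap described above, here is a minimal, JDK-only sketch of the general pre-fetching pattern. It is not the code in this PR: `AsyncPlanningSketch`, `planBatch`, `run`, and `prefetchDepth` are hypothetical names, and the "planning" step is a stand-in that just returns file names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical illustration only; none of these names come from the PR.
final class AsyncPlanningSketch {

  // Stand-in for planning one micro-batch: returns the files for that batch.
  static List<String> planBatch(int batchId) {
    return List.of("file-" + batchId + "-a", "file-" + batchId + "-b");
  }

  // Plans batches on a background thread while the caller consumes earlier ones.
  static List<String> run(int numBatches, int prefetchDepth) throws InterruptedException {
    // Bounded queue: the planner can run at most prefetchDepth batches ahead.
    BlockingQueue<List<String>> planned = new ArrayBlockingQueue<>(prefetchDepth);
    ExecutorService planner = Executors.newSingleThreadExecutor();
    planner.submit(() -> {
      try {
        for (int i = 0; i < numBatches; i++) {
          planned.put(planBatch(i)); // blocks while the queue is full
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });

    List<String> processed = new ArrayList<>();
    try {
      for (int i = 0; i < numBatches; i++) {
        List<String> batch = planned.take(); // waits only if planning has fallen behind
        processed.addAll(batch); // stand-in for processing the batch's data
      }
    } finally {
      planner.shutdownNow();
    }
    return processed;
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println(run(3, 2));
  }
}
```

The bounded queue is one possible design choice: it keeps the background thread from planning arbitrarily far ahead of the consumer. How the actual planner limits its look-ahead is not stated in this description.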
This PR adds the option spark.sql.iceberg.async-micro-batch-planning-enabled, which users can set to enable async planning. The existing planning code in SparkMicroBatchStream.java moves to SyncSparkMicroBatchPlanner.java, and SparkMicroBatchStream selects which planner to use. The option defaults to false, so existing behavior is unchanged.
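As a rough sketch of the flag-based selection described above, the snippet below picks a planner from a configuration map and falls back to the synchronous one when the option is absent. Only the option name comes from this PR; `PlannerSelectionSketch`, `Planner`, `SyncPlanner`, `AsyncPlanner`, and `choose` are hypothetical stand-ins, and reading the option from a plain `Map` is an assumption made to keep the example self-contained.

```java
import java.util.Map;

// Hypothetical sketch; these types are illustrative and are not the PR's actual classes.
final class PlannerSelectionSketch {
  // Option name as given in the PR description.
  static final String ASYNC_ENABLED = "spark.sql.iceberg.async-micro-batch-planning-enabled";

  interface Planner {
    String name();
  }

  static final class SyncPlanner implements Planner {
    public String name() {
      return "sync";
    }
  }

  static final class AsyncPlanner implements Planner {
    public String name() {
      return "async";
    }
  }

  // Picks the async planner only when the option is explicitly "true";
  // a missing option behaves like "false", so the default path stays synchronous.
  static Planner choose(Map<String, String> conf) {
    boolean async = Boolean.parseBoolean(conf.getOrDefault(ASYNC_ENABLED, "false"));
    return async ? new AsyncPlanner() : new SyncPlanner();
  }

  public static void main(String[] args) {
    System.out.println(choose(Map.of()).name());
    System.out.println(choose(Map.of(ASYNC_ENABLED, "true")).name());
  }
}
```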
This feature was originally authored by Drew Goya in our Netflix fork for Spark 3.3 and Iceberg 1.4. I built upon Drew's work by porting it to Spark 3.5 4.1 and the current Iceberg version.