
Spark 4.1: Separate compaction and main operations #15301

Merged
aokolnychyi merged 4 commits into apache:main from aokolnychyi:refactor-rewrites on Feb 17, 2026

Conversation


@aokolnychyi (Contributor) commented on Feb 11, 2026:

This PR pulls all compaction logic out of the main scans and writes, in preparation for making the main scans and writes versioned.

This is a subset of changes from PR #15240.

@github-actions bot added the spark label on Feb 11, 2026
spark()
    .read()
    .format("iceberg")
    .option(SparkReadOptions.SCAN_TASK_SET_ID, groupId)
@aokolnychyi (author) commented:

No longer needed. Just use group ID passed to table.
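
For context, the snippet above is the pattern being retired: callers threaded the task set ID through a read option. A rough sketch of the replacement flow follows; the identifier and resolution mechanics are assumptions for illustration, not this PR's code:

// Hypothetical sketch: the rewrite catalog resolves the staged scan task
// set from the table it hands out, so SCAN_TASK_SET_ID is no longer
// passed as a read option. `rewriteTableIdent` is an assumed placeholder.
Dataset<Row> scanDF =
    spark()
        .read()
        .format("iceberg")
        .load(rewriteTableIdent); // table already carries the group ID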

import org.apache.spark.sql.connector.expressions.Transform;
import org.apache.spark.sql.types.StructType;

abstract class BaseSparkTable
@aokolnychyi (author) commented:

There will be more extending this in future PRs.
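
A hypothetical illustration of the direction (the subclass names below are assumptions, not code from this PR):

// Sketch: shared plumbing lives in the abstract base; concrete table
// flavors extend it in follow-up PRs.
abstract class BaseSparkTable implements Table {
  // common schema(), partitioning(), properties() implementations
}

class SparkTable extends BaseSparkTable {
  // main (eventually versioned) scans and writes
}

class SparkRewriteTable extends BaseSparkTable {
  // compaction scans and writes driven by a scan task set
}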

    SparkReadOptions.SCAN_TASK_SET_ID,
    options.get(SparkWriteOptions.REWRITTEN_FILE_SCAN_TASK_SET_ID));
if (groupId != null) {
  selector = REWRITE_SELECTOR;
@aokolnychyi (author) commented:

Rewrite selectors are no longer required.

}

- private int specId(String fileSetId, List<PositionDeletesScanTask> tasks) {
+ private static int specId(String fileSetId, List<PositionDeletesScanTask> tasks) {
@aokolnychyi (author) commented:

Required to avoid checkstyle failures due to name collision (fileSetId).
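
For readers unfamiliar with the failure mode, a minimal sketch of the collision; the class and second method name are illustrative, and the exact behavior depends on the project's checkstyle HiddenField configuration:

class Example {
  private String fileSetId; // instance field

  // Instance method: the parameter shadows the field above and trips
  // checkstyle's HiddenField check.
  private int specId(String fileSetId) {
    return fileSetId.hashCode();
  }

  // Static method: the instance field is not accessible here, so the
  // parameter hides nothing and the check is satisfied.
  private static int specIdStatic(String fileSetId) {
    return fileSetId.hashCode();
  }
}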

@aokolnychyi force-pushed the refactor-rewrites branch 2 times, most recently from ac8db9d to f7901d9, on February 12, 2026 at 00:10
import org.apache.spark.sql.types.StructType;
import org.apache.spark.sql.util.CaseInsensitiveStringMap;

public class SparkRewriteTableCatalog implements TableCatalog, SupportsFunctions {
@aokolnychyi (author) commented:

This supports the bare minimum for compaction.
Nothing fancy like branch selection is required right now.
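
For orientation, a rough skeleton of what a bare-minimum TableCatalog looks like against Spark's connector API. This is an illustrative sketch under assumed names, not the PR's implementation:

import java.util.Map;
import org.apache.spark.sql.connector.catalog.Identifier;
import org.apache.spark.sql.connector.catalog.Table;
import org.apache.spark.sql.connector.catalog.TableCatalog;
import org.apache.spark.sql.connector.catalog.TableChange;
import org.apache.spark.sql.connector.expressions.Transform;
import org.apache.spark.sql.types.StructType;
import org.apache.spark.sql.util.CaseInsensitiveStringMap;

// Sketch only: load tables for compaction jobs, reject everything else.
class MinimalRewriteCatalog implements TableCatalog {
  private String catalogName;

  @Override
  public void initialize(String name, CaseInsensitiveStringMap options) {
    this.catalogName = name;
  }

  @Override
  public String name() {
    return catalogName;
  }

  @Override
  public Table loadTable(Identifier ident) {
    // Resolve the underlying Iceberg table for the rewrite job here.
    throw new UnsupportedOperationException("sketch only");
  }

  @Override
  public Identifier[] listTables(String[] namespace) {
    throw new UnsupportedOperationException("not needed for compaction");
  }

  @Override
  public Table createTable(
      Identifier ident, StructType schema, Transform[] partitions, Map<String, String> props) {
    throw new UnsupportedOperationException("not needed for compaction");
  }

  @Override
  public Table alterTable(Identifier ident, TableChange... changes) {
    throw new UnsupportedOperationException("not needed for compaction");
  }

  @Override
  public boolean dropTable(Identifier ident) {
    throw new UnsupportedOperationException("not needed for compaction");
  }

  @Override
  public void renameTable(Identifier oldIdent, Identifier newIdent) {
    throw new UnsupportedOperationException("not needed for compaction");
  }
}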

@amogh-jahagirdar self-requested a review on February 15, 2026 at 03:10
@amogh-jahagirdar (reviewer) left a comment:

Overall looks good to me, thanks @aokolnychyi. Just had a question on one of the test changes.

.format("iceberg")
.option(SparkReadOptions.SCAN_TASK_SET_ID, fileSetID)
.load(posDeletesTableName);
.option(SparkReadOptions.FILE_OPEN_COST, Integer.MAX_VALUE)
@amogh-jahagirdar (reviewer) commented:

Not entirely following why the file open cost needs to be set explicitly now?

@aokolnychyi (author) replied:

Oops, typo. Good catch.

@aokolnychyi merged commit 9ce0e6e into apache:main on Feb 17, 2026
22 checks passed
