[spark], [infra] run spark integration tests in CI. #5590

Merged: 9 commits into apache:master from zhongyujiang:gh/fix-ci-spark on Jul 25, 2025

Conversation

zhongyujiang
Contributor

@zhongyujiang zhongyujiang commented May 12, 2025

Purpose

The current GitHub CI for the Spark module does not run integration tests, and some of Spark's integration tests are actually failing; they have just been silently ignored by the CI.

This also fixes a bug in time travel queries for tags: previously, when a tag did not exist, the query did not throw an exception but instead returned the result of the latest snapshot.

Tests

API and Format

Documentation

Contributor Author

@zhongyujiang zhongyujiang left a comment

cc @Zouxxyy @YannByron Can you please take a look when you have time? Thanks!

@zhongyujiang
Contributor Author

Some unit tests are failing; it looks like this is related to the changes in the time travel part. Let me take a look.

@Zouxxyy
Contributor

Zouxxyy commented Jul 17, 2025

You are right, let me push this PR

@zhongyujiang
Contributor Author

@Zouxxyy Hi, sorry, I missed your message earlier. Previously, mvn test didn't run these ITCases, which is why I opened this PR. Let me fix the test failures and resolve the conflicts.

@Zouxxyy
Contributor

Zouxxyy commented Jul 23, 2025

Can you replace the comparison logic in testUpdateNestedColumnTypeInArray with checkAnswer(df, Seq(Row(...)))?

Error:    SparkSchemaEvolutionITCase.testUpdateNestedColumnTypeInArray:1019 
Expecting actual:
  ["[1,ArraySeq([apple,100], [banana,101])]",
    "[2,ArraySeq([cat,200], [dog,201])]"]
to contain exactly in any order:
  ["[1,WrappedArray([apple,100], [banana,101])]",
    "[2,WrappedArray([cat,200], [dog,201])]"]
elements not found:
  ["[1,WrappedArray([apple,100], [banana,101])]",
    "[2,WrappedArray([cat,200], [dog,201])]"]
and elements not expected:
  ["[1,ArraySeq([apple,100], [banana,101])]",
    "[2,ArraySeq([cat,200], [dog,201])]"]

@zhongyujiang
Contributor Author

@Zouxxyy Hi, thank you for reminding me about checkAnswer. I was planning to write a custom comparison method myself. However, checkAnswer is a protected method in a Scala class, and to use it I would need to convert the entire SparkSchemaEvolutionITCase test class to Scala. Do you think that's okay?

@Zouxxyy
Contributor

Zouxxyy commented Jul 24, 2025

> @Zouxxyy Hi, thank you for reminding me about checkAnswer. I was planning to write a custom comparison method myself. However, checkAnswer is a protected method in a Scala class, and to use it I would need to convert the entire SparkSchemaEvolutionITCase test class to Scala. Do you think that's okay?

Thanks, either is fine; you can choose whichever is less work.

Contributor Author

@zhongyujiang zhongyujiang left a comment

@Zouxxyy Hi, I've fixed all the tests; please take a look when you have time. Thanks!

@@ -147,6 +147,8 @@ private static void adaptScanVersion(Options options, TagManager tagManager) {
        } else if (version.chars().allMatch(Character::isDigit)) {
            options.set(SCAN_SNAPSHOT_ID.key(), version);
        } else {
            // By this point, the scan version should be a tag.
            options.set(SCAN_TAG_NAME.key(), version);
Contributor Author

Previously, when querying a tag using the VERSION AS OF syntax, if a tag did not exist, the query would not throw an error but instead return the result of the latest snapshot, which is wrong. This is because the scan version was removed from the options during time travel.
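
To make the expected behavior concrete, here is an illustrative sketch, not code from this PR. It assumes a Paimon catalog named paimon is already configured on the session and that the table paimon.db.t has a tag tag-1, while missing-tag does not exist.

import org.apache.spark.sql.SparkSession

object TagTimeTravelExample {
  def main(args: Array[String]): Unit = {
    // Assumes the Paimon Spark catalog (spark.sql.catalog.paimon and its
    // warehouse path) has been configured elsewhere.
    val spark = SparkSession.builder().master("local[1]").getOrCreate()

    // An existing tag stays in the scan options (SCAN_TAG_NAME) and resolves
    // to the snapshot that the tag points to.
    spark.sql("SELECT * FROM paimon.db.t VERSION AS OF 'tag-1'").show()

    // With the fix, querying a tag that does not exist fails instead of
    // silently returning the result of the latest snapshot.
    try {
      spark.sql("SELECT * FROM paimon.db.t VERSION AS OF 'missing-tag'").show()
    } catch {
      case e: Exception => println(s"expected failure: ${e.getMessage}")
    }
  }
}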

@zhongyujiang
Contributor Author

Error: Tests run: 12, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 4.245 s <<< FAILURE! - in org.apache.paimon.flink.lookup.FileStoreLookupFunctionTest
Error: org.apache.paimon.flink.lookup.FileStoreLookupFunctionTest.testLookupScanLeak(boolean)[2] Time elapsed: 0.173 s <<< FAILURE!
org.opentest4j.AssertionFailedError:
expected: 0
but was: 1

The failed test is not related.

@Zouxxyy
Contributor

Zouxxyy commented Jul 25, 2025

LGTM! CC @JingsongLi to take a look at the tag modification.

Contributor

@JingsongLi JingsongLi left a comment

+1

@JingsongLi JingsongLi merged commit f033cee into apache:master Jul 25, 2025
20 of 21 checks passed
@zhongyujiang zhongyujiang deleted the gh/fix-ci-spark branch July 25, 2025 08:37