[spark], [infra] run spark integration tests in CI. #5590

Merged: 9 commits into apache:master from zhongyujiang:gh/fix-ci-spark on Jul 25, 2025

Conversation

zhongyujiang
Contributor

@zhongyujiang zhongyujiang commented May 12, 2025

Purpose

The current GitHub CI for the Spark module does not run integration tests, and some of Spark's integration tests are actually failing; they have just been silently ignored by the CI.

This also fixes a bug in time travel queries for tags: previously, when a tag did not exist, the query did not throw an exception but instead returned the result of the latest snapshot.

Tests

API and Format

Documentation

Contributor Author

@zhongyujiang zhongyujiang left a comment

cc @Zouxxyy @YannByron Can you please take a look when you have time? Thanks!

@zhongyujiang
Contributor Author

Some unit tests are failing; it looks like this is related to the changes in the time travel part. Let me take a look.

@Zouxxyy
Contributor

Zouxxyy commented Jul 17, 2025

You are right, let me push this PR

@zhongyujiang
Contributor Author

@Zouxxyy Hi, sorry, I missed your message earlier. Previously, mvn test didn't run these ITCases, which is why I opened this PR. Let me fix the test failures and resolve the conflicts.

@Zouxxyy
Contributor

Zouxxyy commented Jul 23, 2025

Can you replace the comparison logic in testUpdateNestedColumnTypeInArray with checkAnswer(df, Seq(Row(...)))?

Error:    SparkSchemaEvolutionITCase.testUpdateNestedColumnTypeInArray:1019 
Expecting actual:
  ["[1,ArraySeq([apple,100], [banana,101])]",
    "[2,ArraySeq([cat,200], [dog,201])]"]
to contain exactly in any order:
  ["[1,WrappedArray([apple,100], [banana,101])]",
    "[2,WrappedArray([cat,200], [dog,201])]"]
elements not found:
  ["[1,WrappedArray([apple,100], [banana,101])]",
    "[2,WrappedArray([cat,200], [dog,201])]"]
and elements not expected:
  ["[1,ArraySeq([apple,100], [banana,101])]",
    "[2,ArraySeq([cat,200], [dog,201])]"]

@zhongyujiang
Contributor Author

@Zouxxyy Hi, thank you for reminding me about checkAnswer. I was planning to write a custom comparison method myself. However, checkAnswer is a protected method in a Scala class, and to use it I would need to convert the entire SparkSchemaEvolutionITCase test class to Scala. Do you think that's okay?

@Zouxxyy
Contributor

Zouxxyy commented Jul 24, 2025

> @Zouxxyy Hi, thank you for reminding me about checkAnswer. I was planning to write a custom comparison method myself. However, checkAnswer is a protected method in a Scala class, and to use it I would need to convert the entire SparkSchemaEvolutionITCase test class to Scala. Do you think that's okay?

Thanks, either is fine; you can choose whichever is less work.

Contributor Author

@zhongyujiang zhongyujiang left a comment

@Zouxxyy Hi, I've fixed all the tests; please take a look when you have time. Thanks!

@@ -147,6 +147,8 @@ private static void adaptScanVersion(Options options, TagManager tagManager) {
        } else if (version.chars().allMatch(Character::isDigit)) {
            options.set(SCAN_SNAPSHOT_ID.key(), version);
        } else {
            // By this point, the scan version should be a tag.
            options.set(SCAN_TAG_NAME.key(), version);
Contributor Author

Previously, when querying a tag using the VERSION AS OF syntax, if a tag did not exist, the query would not throw an error but instead return the result of the latest snapshot, which is wrong. This is because the scan version was removed from the options during time travel.
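
To make the expected behavior concrete, here is an illustrative sketch, not code from this PR. It assumes a Paimon catalog named paimon is already configured on the session and that the table paimon.db.t has a tag tag-1, while missing-tag does not exist.

import org.apache.spark.sql.SparkSession

object TagTimeTravelExample {
  def main(args: Array[String]): Unit = {
    // Assumes the Paimon Spark catalog (spark.sql.catalog.paimon and its
    // warehouse path) has been configured elsewhere.
    val spark = SparkSession.builder().master("local[1]").getOrCreate()

    // An existing tag stays in the scan options (SCAN_TAG_NAME) and resolves
    // to the snapshot that the tag points to.
    spark.sql("SELECT * FROM paimon.db.t VERSION AS OF 'tag-1'").show()

    // With the fix, querying a tag that does not exist fails instead of
    // silently returning the result of the latest snapshot.
    try {
      spark.sql("SELECT * FROM paimon.db.t VERSION AS OF 'missing-tag'").show()
    } catch {
      case e: Exception => println(s"expected failure: ${e.getMessage}")
    }
  }
}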

@zhongyujiang
Contributor Author

Error: Tests run: 12, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 4.245 s <<< FAILURE! - in org.apache.paimon.flink.lookup.FileStoreLookupFunctionTest
Error: org.apache.paimon.flink.lookup.FileStoreLookupFunctionTest.testLookupScanLeak(boolean)[2] Time elapsed: 0.173 s <<< FAILURE!
org.opentest4j.AssertionFailedError:
expected: 0
but was: 1

The failed test is not related.

@Zouxxyy
Contributor

Zouxxyy commented Jul 25, 2025

LGTM! CC @JingsongLi to take a look at the tag modification.

Contributor

@JingsongLi JingsongLi left a comment

+1

@JingsongLi JingsongLi merged commit f033cee into apache:master Jul 25, 2025
20 of 21 checks passed
@zhongyujiang zhongyujiang deleted the gh/fix-ci-spark branch July 25, 2025 08:37