chore: Bump arrow to 18.3.0 #1773

Merged: 1 commit, May 24, 2025

Kontinuation
Member

Which issue does this PR close?

Closes #1615 .

Rationale for this change

Issue #1615 is caused by a bug in arrow-java. The fix shipped in the latest release, 18.3.0, so bumping the arrow version resolves the issue.

Release 18.3.0 also includes apache/arrow-java#707, which fixes the sliced string array import issue, so we can remove the workaround in our code.

What changes are included in this PR?

  • Bump version of arrow from 16.0.0 to 18.3.0
  • Remove our workaround for importing sliced string/binary arrays
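For readers unfamiliar with why sliced or empty string arrays are easy to get wrong at an import/export boundary, here is a minimal sketch of the variable-width layout from the Arrow columnar format. It is a toy model in plain Java, not arrow-java code and not the removed workaround; the class and method names (VarWidthLayoutSketch, buildOffsets, buildData, readSlice) are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of a variable-width (string/binary) array: one offsets buffer plus one data buffer. */
public class VarWidthLayoutSketch {

    /** Builds the offsets buffer; it always has values.length + 1 entries, even when empty. */
    static int[] buildOffsets(String[] values) {
        int[] offsets = new int[values.length + 1];
        for (int i = 0; i < values.length; i++) {
            offsets[i + 1] = offsets[i] + values[i].getBytes(StandardCharsets.UTF_8).length;
        }
        return offsets;
    }

    /** Concatenates the UTF-8 bytes of all values into a single data buffer. */
    static byte[] buildData(String[] values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String value : values) {
            byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
            out.write(bytes, 0, bytes.length);
        }
        return out.toByteArray();
    }

    /** Reads the logical slice [start, start + length) without copying or rebasing the buffers. */
    static List<String> readSlice(int[] offsets, byte[] data, int start, int length) {
        List<String> result = new ArrayList<>();
        for (int i = start; i < start + length; i++) {
            int begin = offsets[i];      // for a slice, the first offset is usually not 0
            int end = offsets[i + 1];
            result.add(new String(data, begin, end - begin, StandardCharsets.UTF_8));
        }
        return result;
    }

    public static void main(String[] args) {
        String[] values = {"a", "bc", "def"};
        int[] offsets = buildOffsets(values);
        byte[] data = buildData(values);

        System.out.println(Arrays.toString(offsets));                     // [0, 1, 3, 6]
        System.out.println(readSlice(offsets, data, 1, 2));               // [bc, def]
        System.out.println(Arrays.toString(buildOffsets(new String[0]))); // [0]
    }
}
```

In this model a slice starting at index 1 reads offsets that begin at 1 rather than 0, so an importer that assumes the first offset is zero would read the wrong bytes. An empty array still carries a single offset entry, which is the case the title of #1615 refers to as an invalid offset buffer.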

How are these changes tested?

  1. Existing tests pass.
  2. Manually ran TPC-DS SF=100 to verify that #1615 ("Q23 fails when running TPC-DS SF=1 because of invalid offset buffer being exported for empty StringArray") is fixed and that #540 ("range end index 294912 out of range for slice of length 147456") does not resurface.

@Kontinuation Kontinuation changed the title Bump arrow to 18.3.0 chore: Bump arrow to 18.3.0 May 22, 2025
@Kontinuation
Member Author

Unfortunately, Arrow Java has required Java 11 since version 18, so we cannot bump the version while we still have to support Java 8.

@codecov-commenter

codecov-commenter commented May 22, 2025

Codecov Report

Attention: Patch coverage is 0% with 1 line in your changes missing coverage. Please review.

Project coverage is 59.45%. Comparing base (f09f8af) to head (dd1deeb).
Report is 213 commits behind head on main.

Files with missing lines Patch % Lines
...rc/main/java/org/apache/arrow/c/ArrowImporter.java 0.00% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #1773      +/-   ##
============================================
+ Coverage     56.12%   59.45%   +3.32%     
- Complexity      976     1139     +163     
============================================
  Files           119      128       +9     
  Lines         11743    12523     +780     
  Branches       2251     2355     +104     
============================================
+ Hits           6591     7445     +854     
+ Misses         4012     3886     -126     
- Partials       1140     1192      +52     

@andygrove
Member

Thanks @Kontinuation. Let's talk about dropping support for Java 8? #1775

@andygrove
Member

@Kontinuation We merged the PR to stop building with JDK 8

@Kontinuation Kontinuation marked this pull request as ready for review May 24, 2025 11:48
@Kontinuation
Member Author

I have rebased this branch onto the latest main. The ubuntu-latest/java 17-spark-4.0/java test slowed down and then failed. I'm not sure whether this is transient.

Sat, 24 May 2025 02:45:30 GMT - columnar shuffle on map [float] (3 seconds, 751 milliseconds)
Sat, 24 May 2025 02:45:34 GMT - columnar shuffle on map [double] (3 seconds, 776 milliseconds)
Sat, 24 May 2025 02:45:42 GMT - columnar shuffle on map [date] (7 seconds, 957 milliseconds)
Sat, 24 May 2025 02:54:54 GMT - columnar shuffle on map [timestamp] (9 minutes, 11 seconds)
Sat, 24 May 2025 03:05:03 GMT [INFO] ------------------------------------------------------------------------
Sat, 24 May 2025 03:05:03 GMT [INFO] Reactor Summary for Comet Project Parent POM 0.9.0-SNAPSHOT:

@andygrove
Member

Yes, this is a transient issue, and we have not figured out the root cause. It started happening just after we changed the DataFusion dependency to a git dependency on the latest code, so that is one clue. We do not understand why it only happens on the Spark 4 builds.

Here are the timings for the above test from the Spark 3.5 run:

2025-05-24T02:13:42.8902668Z - columnar shuffle on map [bool] (4 seconds, 787 milliseconds)
2025-05-24T02:13:47.7676309Z - columnar shuffle on map [byte] (4 seconds, 877 milliseconds)
2025-05-24T02:13:52.3613358Z - columnar shuffle on map [short] (4 seconds, 593 milliseconds)
2025-05-24T02:13:56.8987095Z - columnar shuffle on map [int] (4 seconds, 537 milliseconds)
2025-05-24T02:14:01.6694624Z - columnar shuffle on map [long] (4 seconds, 770 milliseconds)
2025-05-24T02:14:06.2930589Z - columnar shuffle on map [float] (4 seconds, 623 milliseconds)
2025-05-24T02:14:10.9593270Z - columnar shuffle on map [double] (4 seconds, 666 milliseconds)
2025-05-24T02:14:15.4833575Z - columnar shuffle on map [date] (4 seconds, 524 milliseconds)
2025-05-24T02:14:20.0771823Z - columnar shuffle on map [timestamp] (4 seconds, 594 milliseconds)
2025-05-24T02:14:24.7802633Z - columnar shuffle on map [decimal] (4 seconds, 703 milliseconds)
2025-05-24T02:14:29.4309000Z - columnar shuffle on map [string] (4 seconds, 651 milliseconds)
2025-05-24T02:14:33.6948025Z - columnar shuffle on map [binary] (4 seconds, 264 milliseconds)
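To compare timings like these across runs without reading them by eye, the trailing duration on each log line can be converted to milliseconds. The following is a minimal sketch; the class name TestDurationSketch and the method durationMillis are hypothetical, and the regular expression assumes only the duration formats that appear in the log lines quoted in this thread.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Parses the trailing "(N minutes, N seconds, N milliseconds)" suffix of a test log line. */
public class TestDurationSketch {

    // Minutes and milliseconds are optional; seconds are required (assumption based on the logs above).
    private static final Pattern DURATION = Pattern.compile(
        "\\((?:(\\d+) minutes?, )?(\\d+) seconds?(?:, (\\d+) milliseconds?)?\\)\\s*$");

    /** Returns the duration in milliseconds, or -1 if the line has no recognizable duration. */
    static long durationMillis(String line) {
        Matcher m = DURATION.matcher(line);
        if (!m.find()) {
            return -1;
        }
        long minutes = m.group(1) == null ? 0 : Long.parseLong(m.group(1));
        long seconds = Long.parseLong(m.group(2));
        long millis = m.group(3) == null ? 0 : Long.parseLong(m.group(3));
        return (minutes * 60 + seconds) * 1000 + millis;
    }

    public static void main(String[] args) {
        long spark4 = durationMillis("- columnar shuffle on map [timestamp] (9 minutes, 11 seconds)");
        long spark35 = durationMillis("- columnar shuffle on map [timestamp] (4 seconds, 594 milliseconds)");
        System.out.println(spark4);   // 551000
        System.out.println(spark35);  // 4594
    }
}
```

For the [timestamp] case this gives 551000 ms on the Spark 4.0 run against 4594 ms on the Spark 3.5 run, roughly a 120x slowdown.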

Member

@andygrove andygrove left a comment

Thanks @Kontinuation

@andygrove andygrove merged commit 6663245 into apache:main May 24, 2025
69 checks passed
Successfully merging this pull request may close these issues.

Q23 fails when running TPC-DS SF=1 because of invalid offset buffer being exported for empty StringArray.