
Upgrade DataFusion to version 45 #4241

Closed
patchwork01 opened this issue Feb 11, 2025 · 4 comments · Fixed by #4242

@patchwork01
Collaborator

Background

The following Dependabot upgrade failed for a couple of reasons:

First, the Arrow version didn't match the version used by the new DataFusion release. Second, the Java integration test failed because the output file from the compaction was not readable from Java: the schema in the output file appears to have the row keys changed to optional. This test passed before the DataFusion upgrade.

Description

We'd like to upgrade DataFusion to version 45.

Analysis

The DataFusion changelog doesn't make clear what may have caused the schema change:
https://github.com/apache/datafusion/blob/main/dev/changelog/45.0.0.md

@patchwork01 patchwork01 added the version-upgrades Issues to upgrade dependencies label Feb 11, 2025
@patchwork01 patchwork01 added this to the 0.29.0 milestone Feb 11, 2025
@patchwork01 patchwork01 self-assigned this Feb 11, 2025
@m09526
Member

m09526 commented Feb 11, 2025

Row keys in the schema shouldn't be marked as optional/nullable. Sleeper doesn't support nullable row keys or sort keys.

@gaffer01
Member

But what's changed in DataFusion 45 to cause this to break now?

@m09526
Member

m09526 commented Feb 11, 2025

The SketchUDF column (row key 0) is now being marked as optional, instead of required. Disabling the UDF avoids the bug. Why is the "is_nullable" output no longer being used properly?

@m09526
Member

m09526 commented Feb 11, 2025

This is the cause:

https://docs.rs/datafusion/latest/datafusion/logical_expr/trait.ScalarUDFImpl.html#method.is_nullable

The "is_nullable" method on the ScalarUDFImpl trait has been deprecated in favour of "return_type_from_args".

Since is_nullable is never called in DataFusion 45.0.0, our sketch column is now nullable again.

Change introduced by: apache/datafusion#14094
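
To illustrate the mechanism described above, here is a minimal, self-contained Rust sketch. It is a simplified stand-in and not DataFusion's real API: the trait `ScalarUdfModel`, the struct `ReturnInfoModel`, the function `plan_output_field`, both UDF structs, and the `"Int64"` type name are all hypothetical. It assumes that the default implementation of the new method reports the output as nullable, which is consistent with the behaviour seen in this thread.

```rust
// Simplified model of the nullability hooks on a scalar UDF trait.
// NOT DataFusion's real API: every name and signature here is illustrative.

/// What the planner learns about a UDF's output column.
#[derive(Debug, PartialEq)]
struct ReturnInfoModel {
    type_name: &'static str,
    nullable: bool,
}

trait ScalarUdfModel {
    fn return_type(&self) -> &'static str;

    /// Old hook: deprecated, and no longer consulted by the planner.
    fn is_nullable(&self) -> bool {
        true
    }

    /// New hook: the default (assumed here) ignores `is_nullable`
    /// and reports the output as nullable.
    fn return_type_from_args(&self) -> ReturnInfoModel {
        ReturnInfoModel {
            type_name: self.return_type(),
            nullable: true,
        }
    }
}

/// Planner stand-in: only the new hook is ever called.
fn plan_output_field(udf: &dyn ScalarUdfModel) -> ReturnInfoModel {
    udf.return_type_from_args()
}

/// Overrides only the old hook, so its answer is silently ignored.
struct OldStyleSketchUdf;

impl ScalarUdfModel for OldStyleSketchUdf {
    fn return_type(&self) -> &'static str {
        "Int64" // hypothetical return type
    }
    fn is_nullable(&self) -> bool {
        false
    }
}

/// Overrides the new hook, so the planner sees a required column.
struct NewStyleSketchUdf;

impl ScalarUdfModel for NewStyleSketchUdf {
    fn return_type(&self) -> &'static str {
        "Int64" // hypothetical return type
    }
    fn return_type_from_args(&self) -> ReturnInfoModel {
        ReturnInfoModel {
            type_name: self.return_type(),
            nullable: false,
        }
    }
}
```

In this model, `plan_output_field(&OldStyleSketchUdf)` reports a nullable column even though `is_nullable` returns `false`, while `plan_output_field(&NewStyleSketchUdf)` reports a non-nullable one. The real fix would presumably move the nullability answer into the UDF's `return_type_from_args` implementation in the same way.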
