This repository has been archived by the owner on Jan 7, 2025. It is now read-only.

feat(df-repr/bridge): upgrade datafusion to 43.0.0 #260

Merged: 8 commits, Dec 8, 2024


skyzh (Member) commented Dec 7, 2024

Besides the upgrade itself, this PR includes:

  • Add a new create_df_context, used across all crates to create a datafusion context with optd. Previously we had too much duplicated code for setting up the context.
  • The main refactor is in the aggregation expressions: datafusion has a new way of handling them.
  • Datafusion removed cross join; we didn't. We can eventually remove it, but for now that is blocked on two-stage cascades: if we simply treated cross join the same as inner join, we would time out.
  • Several other refactors to adapt to datafusion (e.g., the limit node now takes i64, and the empty relation / placeholder row executor)
  • Keep as much of the original datafusion cli crate as possible. We now only patch main.rs and exec.rs.
  • There's one more breaking change that we might encounter later when doing sort physical properties: the datafusion logical plan now removes duplicate sorts if no limits are present. I feel this is a bad move b/c it's not a direct mapping from the original SQL statement...
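
To illustrate the cross join point, here is a minimal sketch of how a bridge could keep a separate cross join node even when the upstream plan no longer has one. It assumes the upstream plan now expresses a cross join as a join with no condition; `UpstreamJoin`, `OptNode`, and `lower_join` are hypothetical names for illustration only, not the real datafusion or optd types.

```rust
// Hypothetical, simplified plan types. These are NOT the actual
// datafusion or optd definitions; they only illustrate the mapping.

/// A join as it might arrive from the upstream logical plan (assumed shape).
#[derive(Debug, Clone, PartialEq)]
struct UpstreamJoin {
    left: String,
    right: String,
    /// Equi-join key pairs; empty when there is no equi-join condition.
    on: Vec<(String, String)>,
    /// Optional extra (non-equi) filter.
    filter: Option<String>,
}

/// Optimizer-side node that still keeps cross join as its own variant.
#[derive(Debug, Clone, PartialEq)]
enum OptNode {
    CrossJoin { left: String, right: String },
    InnerJoin { left: String, right: String, cond: String },
}

/// Lower an upstream join: a join with no condition at all becomes a
/// cross join node; anything else becomes an inner join whose condition
/// is the conjunction of the key equalities and the optional filter.
fn lower_join(join: &UpstreamJoin) -> OptNode {
    let mut conjuncts: Vec<String> = join
        .on
        .iter()
        .map(|(l, r)| format!("{} = {}", l, r))
        .collect();
    if let Some(f) = &join.filter {
        conjuncts.push(f.clone());
    }
    if conjuncts.is_empty() {
        OptNode::CrossJoin {
            left: join.left.clone(),
            right: join.right.clone(),
        }
    } else {
        OptNode::InnerJoin {
            left: join.left.clone(),
            right: join.right.clone(),
            cond: conjuncts.join(" AND "),
        }
    }
}
```

Under this assumption, a condition-free join over `t1` and `t2` lowers to `OptNode::CrossJoin`, so the optimizer never has to explore it as a general inner join.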

@skyzh skyzh requested review from jurplel and yliang412 December 7, 2024 19:42
Signed-off-by: Alex Chi
jurplel (Member) commented Dec 8, 2024

Waiting to review until after sqllogictest failures are fixed

@jurplel jurplel merged commit 8f269c5 into main Dec 8, 2024
1 check passed
@jurplel jurplel deleted the skyzh/upgrade-df branch December 8, 2024 22:34
