WIP Built some initial e2e datafusion infrastructure #10

Closed
wants to merge 36 commits

Conversation

SarveshOO7
Contributor

Problem

We want to be able to run our queries on DataFusion.

Summary of changes

The only update I have is that I have set up the DataFusion end-to-end (e2e) harness so that it can run multiple queries and print their results and plans. Currently, since we do not have an optimizer built out and we don't have a logical plan IR to convert the DataFusion logical plan into, I am simply running the DataFusion optimizer.
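The harness described above comes down to a loop over queries: plan each one, pass the plan to an optimizer hook, execute it, and print the plans and results. What follows is a minimal, std-only Rust sketch of that shape, not the code in this PR. `Plan`, `Optimizer`, `Fallback`, and `run_all` are hypothetical names, and the planner, built-in optimizer, and executor are stand-in closures. In a real harness those roles would presumably be filled by DataFusion calls such as `SessionContext::sql` and `DataFrame::collect`.

```rust
use std::fmt;

/// Hypothetical stand-in for a query plan; a real harness would use
/// DataFusion's `LogicalPlan` instead of a string.
#[derive(Debug, Clone, PartialEq)]
struct Plan(String);

impl fmt::Display for Plan {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

/// Hypothetical optimizer hook that an optd-backed optimizer could implement later.
trait Optimizer {
    fn optimize(&self, plan: Plan) -> Plan;
}

/// Placeholder optimizer that simply defers to the engine's built-in optimizer,
/// modeled here as a closure.
struct Fallback<F: Fn(Plan) -> Plan> {
    builtin: F,
}

impl<F: Fn(Plan) -> Plan> Optimizer for Fallback<F> {
    fn optimize(&self, plan: Plan) -> Plan {
        (self.builtin)(plan)
    }
}

/// Plans, optimizes, and executes every query, collecting one report entry per
/// query with its unoptimized plan, optimized plan, and result.
fn run_all<O, P, E>(queries: &[&str], planner: P, optimizer: &O, execute: E) -> Vec<String>
where
    O: Optimizer,
    P: Fn(&str) -> Plan,
    E: Fn(&Plan) -> String,
{
    let mut report = Vec::new();
    for (i, &sql) in queries.iter().enumerate() {
        let logical = planner(sql);
        let optimized = optimizer.optimize(logical.clone());
        let result = execute(&optimized);
        report.push(format!(
            "query {i}: {sql}\n  plan: {logical}\n  optimized: {optimized}\n  result: {result}"
        ));
    }
    report
}

fn main() {
    // Stand-in closures; none of these call DataFusion.
    let queries = ["SELECT 1", "SELECT 2"];
    let planner = |sql: &str| Plan(format!("Projection({sql})"));
    let optimizer = Fallback {
        builtin: |p: Plan| Plan(format!("Optimized[{p}]")),
    };
    let execute = |p: &Plan| format!("ok: {p}");
    for entry in run_all(&queries, planner, &optimizer, execute) {
        println!("{entry}");
    }
}
```

Keeping the optimizer behind a trait means the pass-through fallback could later be swapped for an optd-backed implementation without touching the query loop, assuming optd eventually accepts and returns plans the executor understands.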

SarveshOO7 mentioned this pull request Jan 20, 2025
Member

connortsui20 left a comment


Should we call this optd-datafusion instead or something more descriptive than "infra"?

Make sure to fix all of the clippy lints (you can either look at the Files changed tab on GitHub or just run cargo clippy --all-targets).

Also, please add comments on everything because I can't really understand the code...

See the official Rust style guide as well for things that clippy can't check (things like full punctuation on comments).

@connortsui20
Member

It might be a good idea to fully rebase everything on top of main, with something like git rebase -i $(git merge-base HEAD master), and just squash everything into a single commit.

@SarveshOO7
Contributor Author

Yes, I agree. Most of what was done here has become useless. However, I think I'll still try and make a version of this stuff work on this branch and then worry about moving stuff around later.

@codecov-commenter

Codecov Report

Attention: Patch coverage is 0% with 163 lines in your changes missing coverage. Please review.

Project coverage is 0.0%. Comparing base (c41cf41) to head (ae97fec).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| optd-datafusion/src/lib.rs | 0.0% | 97 Missing ⚠️ |
| optd-datafusion/src/main.rs | 0.0% | 66 Missing ⚠️ |

Additional details and impacted files

| Files with missing lines | Coverage Δ |
|---|---|
| optd-datafusion/src/main.rs | 0.0% <0.0%> (ø) |
| optd-datafusion/src/lib.rs | 0.0% <0.0%> (ø) |
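As a quick check on the numbers in the report: 97 + 66 = 163 missing lines, and with no covered lines the patch coverage is 0 / 163 = 0%. The sketch below assumes Codecov's documented coverage ratio of hits / (hits + partials + misses). `FileCoverage` and `coverage_percent` are illustrative names, not part of any Codecov API.

```rust
/// Per-file line counts in the form Codecov reports them. Illustrative type only.
struct FileCoverage {
    path: &'static str,
    hits: u32,
    partials: u32,
    misses: u32,
}

/// Coverage ratio hits / (hits + partials + misses) as a percentage,
/// or None if no lines are tracked at all.
fn coverage_percent(files: &[FileCoverage]) -> Option<f64> {
    let hits: u32 = files.iter().map(|f| f.hits).sum();
    let tracked: u32 = files.iter().map(|f| f.hits + f.partials + f.misses).sum();
    if tracked == 0 {
        None
    } else {
        Some(100.0 * f64::from(hits) / f64::from(tracked))
    }
}

fn main() {
    // Counts taken from the report above: every changed line is a miss.
    let patch = [
        FileCoverage { path: "optd-datafusion/src/lib.rs", hits: 0, partials: 0, misses: 97 },
        FileCoverage { path: "optd-datafusion/src/main.rs", hits: 0, partials: 0, misses: 66 },
    ];
    for f in &patch {
        println!("{}: {} missing", f.path, f.misses);
    }
    let missing: u32 = patch.iter().map(|f| f.misses).sum();
    if let Some(pct) = coverage_percent(&patch) {
        // Prints "163 missing lines, patch coverage 0.0%".
        println!("{missing} missing lines, patch coverage {pct:.1}%");
    }
}
```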

yliang412 mentioned this pull request Feb 11, 2025
@yliang412
Member

Integrated into #26

yliang412 closed this Feb 11, 2025
yliang412 added a commit that referenced this pull request Feb 14, 2025
## Problem

With the initial representation and storage added in #4 and #22, we now
want to support the full pipeline going from parsing SQL, optimizing the
plan using optd, and executing the query in Datafusion.

## Summary of changes

- Integrate all @SarveshOO7's good work in
#10
- Add one mock physical implementation rule + operator for each
logical operator
- Refactor scalar operator storage and reduce code bloat.
- Add physical storage tables and memo API.
- Bump MSRV to 1.81.0 to be compatible with datafusion 45.0.0:
apache/datafusion#14330
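The mock implementation rules in the second bullet can be pictured as a one-to-one mapping from each logical operator to a single physical operator. This is a hypothetical Rust sketch: `LogicalOp`, `PhysicalOp`, and `implement` are illustrative names and do not reflect optd's actual types or rule API.

```rust
/// Hypothetical logical operators (not optd's real types).
#[derive(Debug, Clone, PartialEq)]
enum LogicalOp {
    Scan { table: String },
    Filter { predicate: String },
    Join { condition: String },
}

/// Hypothetical physical operators, one per logical operator.
#[derive(Debug, Clone, PartialEq)]
enum PhysicalOp {
    TableScan { table: String },
    PhysicalFilter { predicate: String },
    NestedLoopJoin { condition: String },
}

/// Mock implementation rule: exactly one physical alternative per logical
/// operator, with no cost-based choice between alternatives.
fn implement(op: &LogicalOp) -> PhysicalOp {
    match op {
        LogicalOp::Scan { table } => PhysicalOp::TableScan { table: table.clone() },
        LogicalOp::Filter { predicate } => PhysicalOp::PhysicalFilter { predicate: predicate.clone() },
        LogicalOp::Join { condition } => PhysicalOp::NestedLoopJoin { condition: condition.clone() },
    }
}

fn main() {
    let ops = vec![
        LogicalOp::Scan { table: "t1".into() },
        LogicalOp::Filter { predicate: "t1.a > 0".into() },
        LogicalOp::Join { condition: "t1.id = t2.id".into() },
    ];
    for op in &ops {
        println!("{op:?} => {:?}", implement(op));
    }
}
```

Because the `match` has no wildcard arm, adding a new logical variant without a physical counterpart fails to compile, which is one way to keep the one-rule-per-logical-operator invariant.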

---------

Signed-off-by: Yuchen Liang <[email protected]>
Co-authored-by: SarveshOO7 <[email protected]>

4 participants