Type Model #4
Renamed some variables for clarity, and moved around some code.
I added support to expressions and scalars.
Also added some MVP operators in there.
This commit adds core types for all of the core concepts we need to represent during query optimization (via definitions in the glossary):

- Logical/physical plans
- Partially materialized logical plans
- Logical/physical/scalar operators
- Logical/physical/scalar expressions
- Transformation/implementation rules

Co-authored-by: Connor Tsui <[email protected]>
Co-authored-by: Alexis Schlomer <[email protected]>
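As a rough illustration of how the concepts listed in that commit message can relate to each other, here is a minimal sketch. It is not the code from this PR: `GroupId`, `LogicalOperator`, `LogicalPlan`, `LogicalExpression`, `PartialLogicalPlan`, and `relational_children` are hypothetical names chosen for the example, and scalars are simplified to strings in the plan types. The idea shown is that one operator definition, generic over its child links, can back fully materialized plans, memo expressions, and partially materialized plans.

```rust
/// Identifier of a memo group (hypothetical newtype for this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct GroupId(pub u64);

/// A logical operator, generic over how its relational and scalar
/// children are linked (another plan, a memo group, ...).
#[derive(Debug, Clone, PartialEq)]
pub enum LogicalOperator<Rel, Scalar> {
    Scan { table: String },
    Filter { child: Rel, predicate: Scalar },
    Join { left: Rel, right: Rel, condition: Scalar },
}

impl<Rel, Scalar> LogicalOperator<Rel, Scalar> {
    /// Returns the relational children in left-to-right order.
    pub fn relational_children(&self) -> Vec<&Rel> {
        match self {
            LogicalOperator::Scan { .. } => vec![],
            LogicalOperator::Filter { child, .. } => vec![child],
            LogicalOperator::Join { left, right, .. } => vec![left, right],
        }
    }
}

/// Fully materialized logical plan: every relational child is a plan.
/// Scalars are simplified to strings in this sketch.
#[derive(Debug, Clone, PartialEq)]
pub struct LogicalPlan(pub Box<LogicalOperator<LogicalPlan, String>>);

/// Logical expression as it might be stored in a memo table:
/// children are references to groups rather than concrete plans.
pub type LogicalExpression = LogicalOperator<GroupId, GroupId>;

/// Partially materialized logical plan: each relational child is either
/// a concrete operator or a reference to a not-yet-expanded group.
#[derive(Debug, Clone, PartialEq)]
pub enum PartialLogicalPlan {
    Materialized(Box<LogicalOperator<PartialLogicalPlan, String>>),
    Unmaterialized(GroupId),
}
```

Physical operators and plans could follow the same pattern with a separate operator enum.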
@AlSchlo Several thoughts:
Yes. We already have the differentiator in the "upper" layer of the enum, right?
Me neither, but I liked child even less, as scalars are also children.
Yes, since it's not always the root. I find that to be confusing.
Requires some extra thought. Not too sure if we want Scalars as a completely different type. They are after all both logical and physical in Cascades.
Renames `RelLink` to `Relation` and `ScalarLink` to `Scalar`.
Great work guys. This is a solid foundation. Besides the nits, the two main improvements I want to point out are regarding the transformation rule API and the ways we represent subqueries in the system.
For the transformation rule API, to add newly generated expressions back to the memo table + recursively check for duplicates, you probably want to return some kind of partial plan.
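To make the suggestion about returning partial plans concrete, here is a minimal sketch, not the API in this PR. `GroupId`, `PartialPlan`, `TransformationRule`, and `JoinAssociativity` are hypothetical names, and join conditions are omitted for brevity. The rule rewrites `(A ⋈ B) ⋈ C` into `A ⋈ (B ⋈ C)`; the inner `B ⋈ C` is a brand-new expression, so returning a partial plan lets the caller walk the result and insert or deduplicate each materialized node in the memo table.

```rust
/// Identifier of a memo group (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct GroupId(pub u64);

/// A partially materialized plan: join nodes are concrete, while
/// `Group` leaves refer to existing memo groups.
#[derive(Debug, Clone, PartialEq)]
pub enum PartialPlan {
    Join { left: Box<PartialPlan>, right: Box<PartialPlan> },
    Group(GroupId),
}

/// A transformation rule that returns partial plans, so that newly
/// created nested expressions can be added to the memo recursively.
pub trait TransformationRule {
    /// Returns zero or more alternative plans equivalent to `input`.
    fn apply(&self, input: &PartialPlan) -> Vec<PartialPlan>;
}

/// Rewrites `(A JOIN B) JOIN C` into `A JOIN (B JOIN C)`.
pub struct JoinAssociativity;

impl TransformationRule for JoinAssociativity {
    fn apply(&self, input: &PartialPlan) -> Vec<PartialPlan> {
        if let PartialPlan::Join { left, right: c } = input {
            if let PartialPlan::Join { left: a, right: b } = &**left {
                // The inner join over B and C did not exist before,
                // which is why a flat expression would not be enough.
                let inner = PartialPlan::Join { left: b.clone(), right: c.clone() };
                return vec![PartialPlan::Join {
                    left: a.clone(),
                    right: Box::new(inner),
                }];
            }
        }
        // The pattern did not match: no alternatives are produced.
        Vec::new()
    }
}
```

Leaves that are still `Group` references can be reused as-is by the caller, while each materialized join node would need a lookup for duplicates before a new group is created.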
For subqueries, I would argue we might want to handle them in some form in our representation. There is the case where a scalar could contain relational children, but after normalization we should be fine. We might also want to support executing a subquery without unnesting (it will be a useful feature).
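A minimal sketch of what "a scalar containing relation children" could look like, assuming hypothetical `Relation` and `Scalar` enums that are not the types in this PR. The `ScalarSubquery` variant and the `contains_subquery` helper are illustrative only; a check like this could be used to assert that normalization has removed all nested subqueries.

```rust
/// Hypothetical relational operator tree.
#[derive(Debug, Clone, PartialEq)]
pub enum Relation {
    Scan { table: String },
    Filter { child: Box<Relation>, predicate: Scalar },
}

/// Hypothetical scalar operator tree. The `ScalarSubquery` variant is
/// the case where a scalar has a relational child.
#[derive(Debug, Clone, PartialEq)]
pub enum Scalar {
    ColumnRef(usize),
    Constant(i64),
    Equal(Box<Scalar>, Box<Scalar>),
    ScalarSubquery(Box<Relation>),
}

impl Scalar {
    /// Returns true if this scalar tree still embeds a subquery,
    /// i.e. it has not been unnested.
    pub fn contains_subquery(&self) -> bool {
        match self {
            Scalar::ColumnRef(_) | Scalar::Constant(_) => false,
            Scalar::Equal(left, right) => {
                left.contains_subquery() || right.contains_subquery()
            }
            Scalar::ScalarSubquery(_) => true,
        }
    }
}
```

Keeping such a variant in the representation would also leave room for executing a subquery directly when unnesting is not applied.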
optd-core/src/operator/relational/physical/join/sort_merge_join.rs
## Problem

#4 was the first major PR, but since we were making a lot of changes last minute there were a few things that slipped through with respect to quality control.

## Summary of changes

- Added very strict clippy documentation requirements. Discussion below.
- Added documentation for everything except the `operator::relational::physical` module and the `operator::scalar` module, as those seem kind of unstable right now.
- Changed the scalar generic param of expressions to use `ScalarGroupId` instead of `GroupId`.
- Added several TODOs that I think will be resolved pretty quickly, though I'd appreciate it if others took a look at those in the case that we can actually get those done now.

Now that we have something to work off of, requiring documentation on everything (including private items) means that we'll pay the cost right now of making sure people understand what we're doing in the future. I'm willing to relax it a little bit if we put _other_ stuff in place that ensures nobody in the future encounters large chunks of code with zero documentation.
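The switch from `GroupId` to `ScalarGroupId` for the scalar generic parameter can be pictured with the sketch below. Only the two identifier names come from the description above; the `i64` payload, the `Filter` struct, and the `LogicalFilterExpression` alias are assumptions made for illustration. The point is that two distinct newtypes let the compiler reject a relational group id passed where a scalar group id is expected.

```rust
/// Identifier of a relational group (payload type is an assumption).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct GroupId(pub i64);

/// Identifier of a scalar group, deliberately a distinct type.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct ScalarGroupId(pub i64);

/// Hypothetical filter operator, generic over its child link types.
#[derive(Debug, Clone, PartialEq)]
pub struct Filter<Rel, Scalar> {
    pub child: Rel,
    pub predicate: Scalar,
}

/// Hypothetical memo expression: relational children are `GroupId`s,
/// scalar children are `ScalarGroupId`s.
pub type LogicalFilterExpression = Filter<GroupId, ScalarGroupId>;

/// Builds a filter expression; swapping the two arguments is a
/// compile-time type error rather than a silent bug.
pub fn filter_expression(child: GroupId, predicate: ScalarGroupId) -> LogicalFilterExpression {
    Filter { child, predicate }
}
```

With a single `GroupId` type for both roles, a call such as `filter_expression(predicate, child)` would compile and only fail, if at all, at runtime.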
## Problem

With the initial representation and storage added in #4 and #22, we now want to support the full pipeline going from parsing SQL, optimizing the plan using optd, and executing the query in Datafusion.

## Summary of changes

- Integrate all @SarveshOO7's good work in #10
- Added one mock physical implementation rule + operator for each logical operator
- Refactor scalar operator storage and reduce code bloat.
- Add physical storage tables and memo API.
- Bump MSRV to 1.81.0 to be compatible with datafusion 45.0.0: apache/datafusion#14330

---------

Signed-off-by: Yuchen Liang <[email protected]>
Co-authored-by: SarveshOO7 <[email protected]>
This PR models almost all of the types that will be necessary for optimization. This includes:
I've named the crate itself `optd-core`. This can be subject to change, but I feel this is a reasonable default for now.

TODO: need to wait on #3 and #12 to be merged before proper CI checks can happen.

Edit: I removed the `cargo rustdoc` check because it's creating more problems than it would solve, see #14.