Type Model #4

connortsui20 · 2025-01-15T02:48:45Z

This PR models almost all of the types that will be necessary for optimization. This includes:

- generic relational algebra operators that allow us to use the same "type" for both expressions in the memo table and operators in the plans
- logical / physical plans
- scalar operators and expressions
- partially materialized logical plans for rule binding
- transformation rule + implementation rule traits and some empty structs that implement them
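The first bullet, one operator type shared by memo-table expressions and plans, can be sketched roughly as follows. This is a minimal illustration under assumptions, not the code in this PR: the variants, field names, `ScalarPlan`, and `relational_children` are hypothetical, and only the names `GroupId`, `Relation`, and `Scalar` appear in the discussion.

```rust
use std::sync::Arc;

/// Hypothetical identifier of a memo-table group (the representation is an assumption).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct GroupId(u64);

/// Illustrative logical operator, generic over how its relational and scalar
/// children are represented. The variants are assumptions for this sketch.
#[derive(Debug, Clone, PartialEq)]
enum LogicalOperator<Relation, Scalar> {
    Scan { table_name: String, predicate: Scalar },
    Filter { child: Relation, predicate: Scalar },
    Join { left: Relation, right: Relation, condition: Scalar },
}

impl<Relation, Scalar> LogicalOperator<Relation, Scalar> {
    /// Returns the relational children, whatever type represents them.
    fn relational_children(&self) -> Vec<&Relation> {
        match self {
            LogicalOperator::Scan { .. } => vec![],
            LogicalOperator::Filter { child, .. } => vec![child],
            LogicalOperator::Join { left, right, .. } => vec![left, right],
        }
    }
}

/// Memo-table expression: every child is a reference to a group.
type LogicalExpression = LogicalOperator<GroupId, GroupId>;

/// Placeholder for a materialized scalar tree (hypothetical).
#[derive(Debug, Clone, PartialEq)]
struct ScalarPlan(bool);

/// Fully materialized plan: every child is an owned subtree.
#[derive(Debug, Clone, PartialEq)]
struct LogicalPlan {
    node: Arc<LogicalOperator<LogicalPlan, ScalarPlan>>,
}
```

With this shape, code such as `relational_children` is written once and works for both `LogicalExpression` and the operator inside a `LogicalPlan`, which is the reuse the bullet describes.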

I've named the crate itself optd-core. This can be subject to change, but I feel this is a reasonable default for now.

~~TODO: need to wait on #3 and #12 to be merged before proper CI checks can happen~~

Edit: I removed the cargo rustdoc check because it's creating more problems than it would solve; see #14.

codecov-commenter · 2025-01-29T06:04:32Z

Welcome to Codecov 🎉

Once you merge this PR into your default branch, you're all set! Codecov will compare coverage reports and display results in all future pull requests.

Thanks for integrating Codecov - We've got you covered ☂️

AlSchlo

Renamed some variables for clarity, and moved around some code.

I added support for expressions and scalars.
Also added some MVP operators in there.

This commit adds core types for all of the core concepts we need to represent during query optimization (via definitions in the glossary):

- Logical/physical plans
- Partially materialized logical plans
- Logical/physical/scalar operators
- Logical/physical/scalar expressions
- Transformation/implementation rules

Co-authored-by: Connor Tsui <[email protected]>
Co-authored-by: Alexis Schlomer <[email protected]>

connortsui20 · 2025-01-29T14:31:25Z

@AlSchlo Several thoughts:

- Are we using the same `GroupId` for both relational and scalar groups? I don't have a strong preference either way, since that is easily changed in the future.
- I don't really like the name `RelLink` because I still think that `Rel` is somewhat ambiguous (hence why I spelled out "relational" every time I used it). I also abandoned `Link` because I felt it was not descriptive enough.
- Calling the root operator `node` instead of `root` is somewhat questionable, I think, but I don't have any strong feelings about it.
- I'm not totally convinced about the `LogicalExpression` and `PhysicalExpression` enums, as they imply that scalars can be both a logical expression and a physical expression. Based on what we talked about yesterday, this doesn't really make a lot of sense -- scalars should be a completely different type. So I'm inclined to keep a `RelationalExpression` and `ScalarExpression` type...

.github/workflows/check.yml

AlSchlo · 2025-01-29T14:45:42Z

> Are we using the same GroupId for both relational and scalar groups?

Yes. We already have the differentiator in the "upper" layer of the enum, right?

> I don't really like the name RelLink because I still think that Rel is ambiguous

Me neither, but I liked child even less, as scalars are also children.

> Calling the root operator node instead of root

Yes, since it's not always the root. I find that to be confusing.

> I'm not totally convinced about the LogicalExpression and PhysicalExpression enums

Requires some extra thought. Not too sure if we want Scalars as a completely different type. They are after all both logical and physical in Cascades.

Renames `RelLink` to `Relation` and `ScalarLink` to `Scalar`.

yliang412

Great work guys. This is a solid foundation. Besides the nits, the two main improvements I want to point out are regarding the transformation rule API and the way we represent subqueries in the system.

For the transformation rule API, to add newly generated expressions back to the memo table + recursively check for duplicates, you probably want to return some kind of partial plan.
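One way to read this suggestion is sketched below: the rule takes a partially materialized plan and returns alternatives in the same form, so the caller can insert each one into the memo table and check for duplicates group by group. Everything here is hypothetical; `TransformationRule::apply`, the `PartialLogicalPlan` variants, and the join-commutativity rule are illustrative assumptions, not the API in this PR.

```rust
/// Hypothetical memo group identifier.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct GroupId(u64);

/// Partially materialized plan: a child is either a concrete operator or a
/// reference to an existing memo group. Variants are assumptions.
#[derive(Debug, Clone, PartialEq, Eq)]
enum PartialLogicalPlan {
    /// An unexpanded reference to a memo group.
    Group(GroupId),
    /// A materialized join over two partial subplans.
    Join {
        left: Box<PartialLogicalPlan>,
        right: Box<PartialLogicalPlan>,
    },
}

/// Hypothetical rule trait that returns partial plans rather than finished
/// expressions, leaving memo insertion and deduplication to the caller.
trait TransformationRule {
    /// Returns zero or more alternatives equivalent to `plan`.
    fn apply(&self, plan: &PartialLogicalPlan) -> Vec<PartialLogicalPlan>;
}

/// Illustrative rule: swap the inputs of a join.
struct JoinCommutativity;

impl TransformationRule for JoinCommutativity {
    fn apply(&self, plan: &PartialLogicalPlan) -> Vec<PartialLogicalPlan> {
        match plan {
            PartialLogicalPlan::Join { left, right } => vec![PartialLogicalPlan::Join {
                left: right.clone(),
                right: left.clone(),
            }],
            // The rule does not match a bare group reference.
            PartialLogicalPlan::Group(_) => vec![],
        }
    }
}
```

Because the output still refers to the original groups through `Group(..)` leaves, only the newly created operators would need to be added to the memo table.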

For subqueries, I would argue we might want to handle them in some form in our representation, since there is the case where a scalar could contain relation children. After normalization, though, we should be fine. We might also want to support executing a subquery without unnesting (it will be a useful feature).

optd-core/src/operator/relational/physical/join/hash_join.rs

optd-core/src/rules/implementation/hash_join.rs

optd-core/src/operator/relational/physical/join/nl_join.rs

optd-core/src/operator/relational/physical/join/sort_merge_join.rs

optd-core/src/rules/transformation/mod.rs

optd-core/src/operator/relational/physical/join/sort_merge_join.rs

optd-core/src/operator/scalar/mod.rs

optd-core/src/expression.rs

optd-core/src/operator/relational/logical/scan.rs

optd-core/src/operator/relational/physical/scan/table_scan.rs

optd-core/src/plan/partial_logical_plan.rs

optd-core/src/rules/transformation/mod.rs

optd-core/src/operator/relational/physical/join/merge_join.rs

optd-core/src/operator/relational/physical/join/mod.rs

optd-core/src/operator/relational/physical/join/nl_join.rs

optd-core/src/operator/relational/physical/mod.rs

optd-core/src/operator/scalar/constants.rs

optd-core/src/plan/partial_logical_plan.rs

## Problem

#4 was the first major PR, but since we were making a lot of changes last minute there were a few things that slipped through with respect to quality control.

## Summary of changes

- Added very strict clippy documentation requirements. Discussion below.
- Added documentation for everything except the `operator::relational::physical` module and the `operator::scalar` module, as those seem kind of unstable right now.
- Changed the scalar generic param of expressions to use `ScalarGroupId` instead of `GroupId`.
- Added several TODOs that I think will be resolved pretty quickly, though I'd appreciate it if others took a look at those in case we can actually get those done now.

Now that we have something to work off of, requiring documentation on everything (including private items) means that we'll pay the cost right now of making sure people understand what we're doing in the future. I'm willing to relax it a little bit if we put _other_ stuff in place that ensures nobody in the future encounters large chunks of code with zero documentation.
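The switch from `GroupId` to `ScalarGroupId` for the scalar generic parameter can be pictured with newtypes, as in the rough sketch below. It assumes both IDs wrap a `u64`, and the `Filter` struct and `filter_expression` helper are hypothetical names used only for illustration. Every item carries a doc comment, in the spirit of the documentation requirement described above.

```rust
/// Identifier of a relational group (the `u64` representation is assumed).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct GroupId(u64);

/// Identifier of a scalar group, kept distinct from `GroupId` by the type system.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct ScalarGroupId(u64);

/// Hypothetical filter operator, generic over its child representations.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Filter<Relation, Scalar> {
    /// The relational input being filtered.
    child: Relation,
    /// The scalar predicate applied to each row.
    predicate: Scalar,
}

/// Memo-table form of the filter: relational child and scalar predicate use
/// different ID types, so they cannot be mixed up.
type FilterExpression = Filter<GroupId, ScalarGroupId>;

/// Builds a memo-table filter expression. Passing the two arguments in the
/// wrong order is a compile-time type error rather than a silent bug.
fn filter_expression(child: GroupId, predicate: ScalarGroupId) -> FilterExpression {
    Filter { child, predicate }
}
```

The design trade-off is a little extra boilerplate per ID type in exchange for the compiler rejecting code that stores a scalar group where a relational group is expected.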

@SarveshOO7

## Problem

With the initial representation and storage added in #4 and #22, we now want to support the full pipeline going from parsing SQL, optimizing the plan using optd, and executing the query in Datafusion.

## Summary of changes

- Integrate all @SarveshOO7's good work in #10
- Added one mock physical implementation rule + operator for each logical operator
- Refactor scalar operator storage and reduce code bloat.
- Add physical storage tables and memo API.
- Bump MSRV to 1.81.0 to be compatible with datafusion 45.0.0: apache/datafusion#14330

---------

Signed-off-by: Yuchen Liang <[email protected]>
Co-authored-by: SarveshOO7 <[email protected]>

connortsui20 force-pushed the connor/types branch 2 times, most recently from e422259 to 1f7d7fc Compare January 15, 2025 20:09

connortsui20 force-pushed the connor/types branch 3 times, most recently from bb1e918 to d54fd27 Compare January 28, 2025 21:54

connortsui20 changed the title ~~Modeling of types to represent query optimization with a persistent memo table~~ Type Model Jan 28, 2025

connortsui20 force-pushed the connor/types branch from 96273fc to 53f046f Compare January 28, 2025 22:59

connortsui20 marked this pull request as ready for review January 28, 2025 23:04

connortsui20 force-pushed the connor/types branch 3 times, most recently from d73bed6 to 0bc61c9 Compare January 29, 2025 03:22

connortsui20 requested review from AlSchlo, yliang412 and SarveshOO7 January 29, 2025 03:31

cmu-db deleted a comment from codecov-commenter Jan 29, 2025

AlSchlo approved these changes Jan 29, 2025

connortsui20 force-pushed the connor/types branch from 82c9deb to 7870639 Compare January 29, 2025 14:27

connortsui20 commented Jan 29, 2025

.github/workflows/check.yml (outdated)

connortsui20 added 2 commits January 29, 2025 09:34

fix proc-macro2 minimal version error

ff1ab5d

fix clippy and format errors

aa87617

connortsui20 force-pushed the connor/types branch from 7870639 to aa87617 Compare January 29, 2025 14:35

rename generic parameters

ec347a8

Renames `RelLink` to `Relation` and `ScalarLink` to `Scalar`.

yliang412 reviewed Jan 29, 2025

yliang412 mentioned this pull request Jan 29, 2025

repr: handle subqueries that have relation as children of scalar #16

Closed

yliang412 reviewed Jan 29, 2025

optd-core/src/expression.rs (outdated)

SarveshOO7 requested changes Jan 29, 2025

optd-core/src/operator/relational/logical/scan.rs

optd-core/src/operator/relational/physical/scan/table_scan.rs

optd-core/src/plan/partial_logical_plan.rs (outdated)

AlSchlo added 2 commits January 29, 2025 12:36

Address comments

fe28191

Fix cargo fmt

8b16be3

yliang412 approved these changes Jan 29, 2025

connortsui20 commented Jan 29, 2025

optd-core/src/rules/transformation/mod.rs

yliang412 requested a review from SarveshOO7 January 29, 2025 18:12

SarveshOO7 approved these changes Jan 29, 2025

distinguish between different types of IDs with newtypes

583f9de

connortsui20 commented Jan 29, 2025

AlSchlo force-pushed the connor/types branch from 8e99fb3 to 583f9de Compare January 29, 2025 21:53

AlSchlo added 3 commits January 29, 2025 17:16

Separate scalar expressions from logical

913bae8

Address Connor's nits

67465ed

Add missing file

7ce1585

AlSchlo merged commit c41cf41 into main Jan 29, 2025
12 checks passed

AlSchlo deleted the connor/types branch January 29, 2025 22:24

connortsui20 mentioned this pull request Jan 30, 2025

First PR cleanup #18

Merged

yliang412 mentioned this pull request Feb 11, 2025

feat: full e2e pipeline #26

Merged
