Type Model (#4)
This PR models almost all of the types that will be necessary for
optimization. This includes:

- generic relational algebra operators that allow us to use the same
"type" for both expressions in the memo table and operators in the plans
- logical / physical plans
- scalar operators and expressions
- partially materialized logical plans for rule binding
- transformation rule + implementation rule trait and some empty structs
that implement them

I've named the crate itself `optd-core`. This can be subject to change,
but I feel this is a reasonable default for now.

~~TODO: need to wait on #3 and #12 to be merged before proper CI checks
can happen~~

Edit: I removed the `cargo rustdoc` check because it's creating more
problems than it would solve; see #14

---------

Co-authored-by: Alexis Schlomer <[email protected]>
connortsui20 and AlSchlo authored Jan 29, 2025
1 parent 6ffbc5a commit c41cf41
Showing 44 changed files with 663 additions and 39 deletions.
17 changes: 0 additions & 17 deletions .github/workflows/check.yml
Original file line number Diff line number Diff line change
@@ -75,23 +75,6 @@ jobs:
       # components: rustfmt
       # - name: cargo-semver-checks
       # uses: obi1kenobi/cargo-semver-checks-action@v2
-  doc:
-    # run docs generation on nightly rather than stable. This enables features like
-    # https://doc.rust-lang.org/beta/unstable-book/language-features/doc-cfg.html which allows an
-    # API be documented as only available in some specific platforms.
-    runs-on: ubuntu-latest
-    name: nightly / doc
-    steps:
-      - uses: actions/checkout@v4
-        with:
-          submodules: true
-      - name: Install nightly
-        uses: dtolnay/rust-toolchain@nightly
-      - name: Install cargo-docs-rs
-        uses: dtolnay/install@cargo-docs-rs
-      - name: cargo docs-rs
-        # TODO: Once we figure out the crates, rename this.
-        run: cargo docs-rs -p optd-tmp
   hack:
     # cargo-hack checks combinations of feature flags to ensure that features are all additive
     # which is required for feature unification
52 changes: 51 additions & 1 deletion Cargo.lock
2 changes: 1 addition & 1 deletion Cargo.toml
@@ -1,3 +1,3 @@
 [workspace]
-members = ["optd-tmp"]
+members = ["optd-core"]
 resolver = "2"
10 changes: 10 additions & 0 deletions optd-core/Cargo.toml
@@ -0,0 +1,10 @@
[package]
name = "optd-core"
version = "0.1.0"
edition = "2021"

[dependencies]
trait-variant = "0.1.2"

# Pin more recent versions for `-Zminimal-versions`.
proc-macro2 = "1.0.60" # For a missing feature (https://github.com/rust-lang/rust/issues/113152).
17 changes: 17 additions & 0 deletions optd-core/src/expression.rs
@@ -0,0 +1,17 @@
//! Types for logical and physical expressions in the optimizer.
use crate::memo::GroupId;
use crate::operator::relational::logical::LogicalOperator;
use crate::operator::relational::physical::PhysicalOperator;

/// A logical expression in the memo table.
///
/// References children using [`GroupId`]s for expression sharing
/// and memoization.
pub type LogicalExpression = LogicalOperator<GroupId, GroupId>;

/// A physical expression in the memo table.
///
/// Like [`LogicalExpression`] but with specific implementation
/// strategies.
pub type PhysicalExpression = PhysicalOperator<GroupId, GroupId>;
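
To make the alias concrete, here is a minimal sketch of how a memo expression refers to its children by group rather than by value. It re-declares trimmed stand-ins for `GroupId`, `Filter`, and `LogicalOperator` so that it compiles on its own. The public `u64` field on `GroupId` and the helpers `child_groups` and `example_filter` are assumptions made for illustration and are not part of this commit.

```rust
// Stand-in for `crate::memo::GroupId`; the real field is private to `memo.rs`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct GroupId(pub u64);

// Trimmed copy of the logical `Filter` operator from this commit.
#[derive(Clone)]
pub struct Filter<Relation, Scalar> {
    pub child: Relation,
    pub predicate: Scalar,
}

// Trimmed `LogicalOperator` with a single variant, for brevity.
#[derive(Clone)]
pub enum LogicalOperator<Relation, Scalar> {
    Filter(Filter<Relation, Scalar>),
}

// Same shape as the alias above: both type parameters are group ids.
pub type LogicalExpression = LogicalOperator<GroupId, GroupId>;

/// Hypothetical helper: lists the relational child groups of an expression.
pub fn child_groups(expr: &LogicalExpression) -> Vec<GroupId> {
    match expr {
        LogicalOperator::Filter(filter) => vec![filter.child],
    }
}

/// Hypothetical example value: a filter whose input is group 1 and whose
/// predicate lives in group 2.
pub fn example_filter() -> LogicalExpression {
    LogicalOperator::Filter(Filter {
        child: GroupId(1),
        predicate: GroupId(2),
    })
}
```

Because the children are ids, two expressions that share an input group share the work done for that group, which is the point of the memo table.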
5 changes: 5 additions & 0 deletions optd-core/src/lib.rs
@@ -0,0 +1,5 @@
pub mod expression;
pub mod memo;
pub mod operator;
pub mod plan;
pub mod rules;
65 changes: 65 additions & 0 deletions optd-core/src/memo.rs
@@ -0,0 +1,65 @@
//! Memo table implementation for query optimization.
//!
//! The memo table is a core data structure that stores expressions and their logical equivalences
//! during query optimization. It serves two main purposes:
//!
//! - Avoiding redundant optimization by memoizing already explored expressions
//! - Grouping logically equivalent expressions together to enable rule-based optimization
//!
//! # Structure
//!
//! - Each unique expression is assigned an expression ID (either [`LogicalExpressionId`],
//! [`PhysicalExpressionId`], or [`ScalarExpressionId`])
//! - Logically equivalent expressions are grouped together under a [`GroupId`]
//! - Logically equivalent scalar expressions are grouped together under a [`ScalarGroupId`]
//!
//! # Usage
//!
//! The memo table provides methods to:
//! - Add new expressions and get their IDs
//! - Add expressions to existing groups
//! - Retrieve expressions in a group
//! - Look up group membership of expressions
//! - Create new groups for expressions
use crate::expression::LogicalExpression;

/// A unique identifier for a logical expression in the memo table.
#[repr(transparent)]
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub struct LogicalExpressionId(u64);

/// A unique identifier for a physical expression in the memo table.
#[repr(transparent)]
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub struct PhysicalExpressionId(u64);

/// A unique identifier for a scalar expression in the memo table.
#[repr(transparent)]
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub struct ScalarExpressionId(u64);

/// A unique identifier for a group of relational expressions in the memo table.
#[repr(transparent)]
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub struct GroupId(u64);

/// A unique identifier for a group of scalar expressions in the memo table.
#[repr(transparent)]
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub struct ScalarGroupId(u64);

/// TODO(alexis) Add fields & link to storage layer.
pub struct Memo;

/// TODO(alexis) Stabilize API by first expanding the Python code.
impl Memo {
    /// TODO(alexis) Add docs.
    pub async fn add_logical_expr_to_group(
        &mut self,
        _group_id: GroupId,
        _logical_expr: LogicalExpression,
    ) -> LogicalExpressionId {
        todo!()
    }
}
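
Since `Memo` is still a stub here, the following is only a rough sketch of the operations listed in the module docs (add an expression, add to an existing group, look up membership, list a group). It is an in-memory, synchronous toy built on `HashMap`, not the planned storage-backed design. `ToyMemo`, `ToyExpr`, and every method name below are hypothetical, and `ToyExpr` derives `Hash`/`Eq`, which the real `LogicalExpression` does not do in this commit.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct GroupId(pub u64);

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct LogicalExpressionId(pub u64);

/// Hypothetical stand-in for `LogicalExpression`; children are group ids.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub enum ToyExpr {
    Scan(String),
    Join { left: GroupId, right: GroupId },
}

/// Hypothetical in-memory memo table.
#[derive(Default)]
pub struct ToyMemo {
    ids: HashMap<ToyExpr, LogicalExpressionId>,
    expr_group: HashMap<LogicalExpressionId, GroupId>,
    members: HashMap<GroupId, Vec<LogicalExpressionId>>,
    next_expr: u64,
    next_group: u64,
}

impl ToyMemo {
    /// Memoization step: identical expressions always get the same id.
    fn intern(&mut self, expr: ToyExpr) -> LogicalExpressionId {
        if let Some(&id) = self.ids.get(&expr) {
            return id;
        }
        let id = LogicalExpressionId(self.next_expr);
        self.next_expr += 1;
        self.ids.insert(expr, id);
        id
    }

    /// Adds `expr`, creating a fresh group unless it is already known.
    pub fn add_logical_expr(&mut self, expr: ToyExpr) -> (LogicalExpressionId, GroupId) {
        let id = self.intern(expr);
        if let Some(&group) = self.expr_group.get(&id) {
            return (id, group);
        }
        let group = GroupId(self.next_group);
        self.next_group += 1;
        self.expr_group.insert(id, group);
        self.members.entry(group).or_default().push(id);
        (id, group)
    }

    /// Adds `expr` to an existing group. An expression that already belongs
    /// to a group is left where it is; a real memo would merge the groups.
    pub fn add_logical_expr_to_group(&mut self, group: GroupId, expr: ToyExpr) -> LogicalExpressionId {
        let id = self.intern(expr);
        if !self.expr_group.contains_key(&id) {
            self.expr_group.insert(id, group);
            self.members.entry(group).or_default().push(id);
        }
        id
    }

    /// Looks up which group an expression belongs to.
    pub fn group_of(&self, id: LogicalExpressionId) -> Option<GroupId> {
        self.expr_group.get(&id).copied()
    }

    /// Returns the expressions in a group, in insertion order.
    pub fn exprs_in(&self, group: GroupId) -> &[LogicalExpressionId] {
        self.members.get(&group).map(|v| v.as_slice()).unwrap_or(&[])
    }
}
```

A typical use would be to add `Join { left, right }` as a new group and then add the commuted `Join { right, left }` to that same group, so both alternatives are explored under one `GroupId`.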
5 changes: 5 additions & 0 deletions optd-core/src/operator/mod.rs
@@ -0,0 +1,5 @@
//! This module contains type definitions related to query plan operators, both relational (logical
//! / physical) and scalar.
pub mod relational;
pub mod scalar;
8 changes: 8 additions & 0 deletions optd-core/src/operator/relational/logical/filter.rs
@@ -0,0 +1,8 @@
/// Logical filter operator that selects rows matching a condition.
///
/// Takes input relation (`Relation`) and filters rows using a boolean predicate (`Scalar`).
#[derive(Clone)]
pub struct Filter<Relation, Scalar> {
    pub child: Relation,
    pub predicate: Scalar,
}
11 changes: 11 additions & 0 deletions optd-core/src/operator/relational/logical/join.rs
@@ -0,0 +1,11 @@
/// Logical join operator that combines rows from two relations.
///
/// Takes left and right relations (`Relation`) and joins their rows using a join condition
/// (`Scalar`).
#[derive(Clone)]
pub struct Join<Relation, Scalar> {
    pub join_type: String,
    pub left: Relation,
    pub right: Relation,
    pub condition: Scalar,
}
31 changes: 31 additions & 0 deletions optd-core/src/operator/relational/logical/mod.rs
@@ -0,0 +1,31 @@
//! Type definitions of logical operators in `optd`.
pub mod filter;
pub mod join;
pub mod project;
pub mod scan;

use filter::Filter;
use join::Join;
use project::Project;
use scan::Scan;

/// Each variant of `LogicalOperator` represents a specific kind of logical operator.
///
/// This type is generic over two types:
/// - `Relation`: Specifies whether the children relations are other logical operators or a group id.
/// - `Scalar`: Specifies whether the children scalars are other scalar operators or a group id.
///
/// This makes it possible to reuse the `LogicalOperator` type in [`LogicalPlan`],
/// [`PartialLogicalPlan`], and [`LogicalExpression`].
///
/// [`LogicalPlan`]: crate::plan::logical_plan::LogicalPlan
/// [`PartialLogicalPlan`]: crate::plan::partial_logical_plan::PartialLogicalPlan
/// [`LogicalExpression`]: crate::expression::LogicalExpression
#[derive(Clone)]
pub enum LogicalOperator<Relation, Scalar> {
    Scan(Scan<Scalar>),
    Filter(Filter<Relation, Scalar>),
    Project(Project<Relation, Scalar>),
    Join(Join<Relation, Scalar>),
}
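
The reuse described in the doc comment can be sketched as follows. The plan types themselves live in `crate::plan` and are not shown in this part of the diff, so `ToyLogicalPlan` and `ToyPartialPlan` below are assumed shapes rather than the real definitions, and scalars are reduced to a `String` placeholder. The idea is only that the same generic enum yields a fully materialized tree when `Relation` is the plan type, and a partially materialized tree when `Relation` may also be a bare group id.

```rust
use std::sync::Arc;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct GroupId(pub u64);

// Trimmed copies of two operators from this commit.
#[derive(Clone)]
pub struct Scan<Scalar> {
    pub table_name: String,
    pub predicate: Option<Scalar>,
}

#[derive(Clone)]
pub struct Filter<Relation, Scalar> {
    pub child: Relation,
    pub predicate: Scalar,
}

#[derive(Clone)]
pub enum LogicalOperator<Relation, Scalar> {
    Scan(Scan<Scalar>),
    Filter(Filter<Relation, Scalar>),
}

/// Hypothetical fully materialized plan: every child is another plan.
#[derive(Clone)]
pub struct ToyLogicalPlan(pub Arc<LogicalOperator<ToyLogicalPlan, String>>);

/// Hypothetical partial plan: a child is either a materialized operator or
/// a group id that has not been expanded yet (useful for rule binding).
#[derive(Clone)]
pub enum ToyPartialPlan {
    Materialized(Arc<LogicalOperator<ToyPartialPlan, String>>),
    Unmaterialized(GroupId),
}

/// Number of operators on the path from the root down to the scan.
pub fn depth(plan: &ToyLogicalPlan) -> usize {
    match plan.0.as_ref() {
        LogicalOperator::Scan(_) => 1,
        LogicalOperator::Filter(filter) => 1 + depth(&filter.child),
    }
}

/// Number of operators that are materialized in a partial plan.
pub fn materialized_ops(plan: &ToyPartialPlan) -> usize {
    match plan {
        ToyPartialPlan::Unmaterialized(_) => 0,
        ToyPartialPlan::Materialized(op) => match op.as_ref() {
            LogicalOperator::Scan(_) => 1,
            LogicalOperator::Filter(filter) => 1 + materialized_ops(&filter.child),
        },
    }
}

/// Filter over a scan, fully materialized.
pub fn filter_over_scan() -> ToyLogicalPlan {
    let scan = ToyLogicalPlan(Arc::new(LogicalOperator::Scan(Scan {
        table_name: "orders".to_string(),
        predicate: None,
    })));
    ToyLogicalPlan(Arc::new(LogicalOperator::Filter(Filter {
        child: scan,
        predicate: "amount > 10".to_string(),
    })))
}

/// Filter whose input is still just a group id.
pub fn filter_over_group(group: GroupId) -> ToyPartialPlan {
    ToyPartialPlan::Materialized(Arc::new(LogicalOperator::Filter(Filter {
        child: ToyPartialPlan::Unmaterialized(group),
        predicate: "amount > 10".to_string(),
    })))
}
```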
9 changes: 9 additions & 0 deletions optd-core/src/operator/relational/logical/project.rs
@@ -0,0 +1,9 @@
/// Logical project operator that specifies output columns.
///
/// Takes input relation (`Relation`) and defines output columns/expressions
/// (`Scalar`).
#[derive(Clone)]
pub struct Project<Relation, Scalar> {
    pub child: Relation,
    pub fields: Vec<Scalar>,
}
9 changes: 9 additions & 0 deletions optd-core/src/operator/relational/logical/scan.rs
@@ -0,0 +1,9 @@
/// Logical scan operator that reads from a base table.
///
/// Reads from table (`String`) and optionally filters rows using a pushdown predicate
/// (`Scalar`).
#[derive(Clone)]
pub struct Scan<Scalar> {
    pub table_name: String, // TODO(alexis): Mocked for now.
    pub predicate: Option<Scalar>,
}
4 changes: 4 additions & 0 deletions optd-core/src/operator/relational/mod.rs
@@ -0,0 +1,4 @@
//! Type definitions for relational (logical and physical) operators.
pub mod logical;
pub mod physical;
10 changes: 10 additions & 0 deletions optd-core/src/operator/relational/physical/filter/filter.rs
@@ -0,0 +1,10 @@
/// Physical filter operator that applies a boolean predicate to filter input rows.
///
/// Takes a child operator (`Relation`) providing input rows and a predicate expression
/// (`Scalar`) that evaluates to true/false. Only rows where predicate is true
/// are emitted.
#[derive(Clone)]
pub struct Filter<Relation, Scalar> {
    pub child: Relation,
    pub predicate: Scalar,
}
2 changes: 2 additions & 0 deletions optd-core/src/operator/relational/physical/filter/mod.rs
@@ -0,0 +1,2 @@
#[allow(clippy::module_inception)]
pub mod filter;
14 changes: 14 additions & 0 deletions optd-core/src/operator/relational/physical/join/hash_join.rs
@@ -0,0 +1,14 @@
/// Hash-based join operator that matches rows based on equality conditions.
///
/// Takes left and right input relations (`Relation`) and joins their rows using
/// a join condition (`Scalar`). Builds hash table from build side (right)
/// and probes with rows from probe side (left).
#[derive(Clone)]
pub struct HashJoin<Relation, Scalar> {
    pub join_type: String,
    /// Left relation that probes hash table.
    pub probe_side: Relation,
    /// Right relation used to build hash table.
    pub build_side: Relation,
    pub condition: Scalar,
}
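
The struct above only describes the operator. For readers unfamiliar with the build/probe terminology, here is a small sketch of the algorithm it names, reduced to an inner equi-join on an integer key. The `Row` type and the `hash_join` function are illustrative assumptions; this commit contains no execution code.

```rust
use std::collections::HashMap;

/// Hypothetical row: a join key plus a payload.
pub type Row = (u32, &'static str);

/// Inner equi-join on the key. The hash table is built from `build_side`
/// (the right input) and probed with each row of `probe_side` (the left).
pub fn hash_join(probe_side: &[Row], build_side: &[Row]) -> Vec<(Row, Row)> {
    // Build phase: group build-side rows by key.
    let mut table: HashMap<u32, Vec<Row>> = HashMap::new();
    for &row in build_side {
        table.entry(row.0).or_default().push(row);
    }

    // Probe phase: look up each probe-side key and emit all matches.
    let mut out = Vec::new();
    for &probe in probe_side {
        if let Some(matches) = table.get(&probe.0) {
            for &build in matches {
                out.push((probe, build));
            }
        }
    }
    out
}
```

Output order follows the probe side, and within one key it follows the order in which build rows were inserted.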
13 changes: 13 additions & 0 deletions optd-core/src/operator/relational/physical/join/merge_join.rs
@@ -0,0 +1,13 @@
/// Merge join operator that matches rows based on equality conditions.
///
/// Takes sorted left and right relations (`Relation`) and joins their rows using
/// a join condition (`Scalar`). Both inputs must be sorted on join keys.
#[derive(Clone)]
pub struct MergeJoin<Relation, Scalar> {
    pub join_type: String,
    /// Left sorted relation.
    pub left_sorted: Relation,
    /// Right sorted relation.
    pub right_sorted: Relation,
    pub condition: Scalar,
}
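
As a sketch of why both inputs must be sorted on the join keys, the following inner merge join advances two cursors in lockstep and pairs up runs of equal keys. `Row` and `merge_join` are hypothetical names used for illustration only, and the function assumes (without checking) that both slices are sorted by key in ascending order.

```rust
use std::cmp::Ordering;

/// Hypothetical row: a join key plus a payload.
pub type Row = (u32, &'static str);

/// Inner merge join on the key; both inputs must already be sorted by key.
pub fn merge_join(left_sorted: &[Row], right_sorted: &[Row]) -> Vec<(Row, Row)> {
    let (mut i, mut j) = (0, 0);
    let mut out = Vec::new();
    while i < left_sorted.len() && j < right_sorted.len() {
        let key = left_sorted[i].0;
        match key.cmp(&right_sorted[j].0) {
            // The smaller key cannot match anything further on, so skip it.
            Ordering::Less => i += 1,
            Ordering::Greater => j += 1,
            Ordering::Equal => {
                // Find the run of right rows that share this key.
                let run_start = j;
                while j < right_sorted.len() && right_sorted[j].0 == key {
                    j += 1;
                }
                // Pair every left row with this key against that run.
                while i < left_sorted.len() && left_sorted[i].0 == key {
                    for &right in &right_sorted[run_start..j] {
                        out.push((left_sorted[i], right));
                    }
                    i += 1;
                }
            }
        }
    }
    out
}
```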
3 changes: 3 additions & 0 deletions optd-core/src/operator/relational/physical/join/mod.rs
@@ -0,0 +1,3 @@
pub mod hash_join;
pub mod merge_join;
pub mod nested_loop_join;
Original file line number Diff line number Diff line change
@@ -0,0 +1,13 @@
/// Nested-loop join operator that matches rows based on a predicate.
///
/// Takes outer and inner relations (`Relation`) and joins their rows using
/// a join condition (`Scalar`). Scans inner relation for each outer row.
#[derive(Clone)]
pub struct NestedLoopJoin<Relation, Scalar> {
    pub join_type: String,
    /// Outer relation.
    pub outer: Relation,
    /// Inner relation scanned for each outer row.
    pub inner: Relation,
    pub condition: Scalar,
}
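
Unlike the hash and merge variants, a nested-loop join can evaluate any predicate, not just equality. A minimal sketch of that behaviour, with the join condition passed as a closure, might look like this. `Row` and `nested_loop_join` are assumed names for illustration and do not appear in this commit.

```rust
/// Hypothetical row: a key plus a payload.
pub type Row = (u32, &'static str);

/// Inner nested-loop join: scans all of `inner` once per row of `outer`
/// and keeps the pairs for which `condition` holds.
pub fn nested_loop_join<F>(outer: &[Row], inner: &[Row], condition: F) -> Vec<(Row, Row)>
where
    F: Fn(&Row, &Row) -> bool,
{
    let mut out = Vec::new();
    for o in outer {
        for i in inner {
            if condition(o, i) {
                out.push((*o, *i));
            }
        }
    }
    out
}
```

The cost is proportional to the product of the two input sizes, which is why an optimizer would normally prefer the other join implementations when the condition allows them.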
33 changes: 33 additions & 0 deletions optd-core/src/operator/relational/physical/mod.rs
@@ -0,0 +1,33 @@
//! Type definitions of physical operators in optd.
pub mod filter;
pub mod join;
pub mod project;
pub mod scan;

use filter::filter::Filter;
use join::{hash_join::HashJoin, merge_join::MergeJoin, nested_loop_join::NestedLoopJoin};
use project::project::Project;
use scan::table_scan::TableScan;

/// Each variant of `PhysicalOperator` represents a specific kind of physical operator.
///
/// This type is generic over two types:
/// - `Relation`: Specifies whether the children relations are other physical operators or a group
/// id.
/// - `Scalar`: Specifies whether the children scalars are other scalar operators or a group id.
///
/// This makes it possible to reuse the `PhysicalOperator` type in [`PhysicalPlan`]
/// and [`PhysicalExpression`].
///
/// [`PhysicalPlan`]: crate::plan::physical_plan::PhysicalPlan
/// [`PhysicalExpression`]: crate::expression::PhysicalExpression
#[derive(Clone)]
pub enum PhysicalOperator<Relation, Scalar> {
    TableScan(TableScan<Scalar>),
    Filter(Filter<Relation, Scalar>),
    Project(Project<Relation, Scalar>),
    HashJoin(HashJoin<Relation, Scalar>),
    NestedLoopJoin(NestedLoopJoin<Relation, Scalar>),
    SortMergeJoin(MergeJoin<Relation, Scalar>),
}
2 changes: 2 additions & 0 deletions optd-core/src/operator/relational/physical/project/mod.rs
@@ -0,0 +1,2 @@
#[allow(clippy::module_inception)]
pub mod project;
9 changes: 9 additions & 0 deletions optd-core/src/operator/relational/physical/project/project.rs
@@ -0,0 +1,9 @@
/// Column projection operator that transforms input rows.
///
/// Takes input relation (`Relation`) and projects columns/expressions (`Scalar`)
/// to produce output rows with selected/computed fields.
#[derive(Clone)]
pub struct Project<Relation, Scalar> {
    pub child: Relation,
    pub fields: Vec<Scalar>,
}
1 change: 1 addition & 0 deletions optd-core/src/operator/relational/physical/scan/mod.rs
@@ -0,0 +1 @@
pub mod table_scan;
9 changes: 9 additions & 0 deletions optd-core/src/operator/relational/physical/scan/table_scan.rs
@@ -0,0 +1,9 @@
/// Table scan operator that reads rows from a base table.
///
/// Reads from table (`String`) and optionally filters rows using
/// a pushdown predicate (`Scalar`).
#[derive(Clone)]
pub struct TableScan<Scalar> {
    pub table_name: String, // TODO(alexis): Mocked for now.
    pub predicate: Option<Scalar>,
}
6 changes: 6 additions & 0 deletions optd-core/src/operator/scalar/add.rs
@@ -0,0 +1,6 @@
/// Addition expression for scalar values.
#[derive(Clone)]
pub struct Add<Scalar> {
    pub left: Scalar,
    pub right: Scalar,
}
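
The `Scalar` parameter plays the same role for scalar operators as `Relation` does for relational ones. As a rough sketch, plugging a recursive tree type into `Add` gives an expression that can be evaluated directly. The scalar operator enum is not shown in this part of the diff, so `ToyScalar`, its `Constant` variant, and `eval` are assumptions for illustration.

```rust
// Copy of the `Add` struct from this commit.
#[derive(Clone)]
pub struct Add<Scalar> {
    pub left: Scalar,
    pub right: Scalar,
}

/// Hypothetical scalar tree: children are other scalar nodes.
#[derive(Clone)]
pub enum ToyScalar {
    Constant(i64),
    Add(Box<Add<ToyScalar>>),
}

/// Evaluates the tree bottom-up.
pub fn eval(expr: &ToyScalar) -> i64 {
    match expr {
        ToyScalar::Constant(value) => *value,
        ToyScalar::Add(add) => eval(&add.left) + eval(&add.right),
    }
}

/// Convenience constructor for an addition node.
pub fn add(left: ToyScalar, right: ToyScalar) -> ToyScalar {
    ToyScalar::Add(Box::new(Add { left, right }))
}
```

In the memo table the same `Add` would instead be instantiated with a group id for `Scalar`, so that its operands refer to groups of equivalent scalar expressions.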