refactor and make section for each term
connortsui20 committed Jan 17, 2025
1 parent e5bd30e commit 0a538af
Showing 1 changed file with 117 additions and 62 deletions.
179 changes: 117 additions & 62 deletions docs/src/architecture/glossary.md
@@ -6,24 +6,25 @@ document defines key names and definitions for concepts that are required in opt
Many of the names and definitions will be inspired by the Cascades framework. However, there are a
few important differences that need to be addressed considering our memo table will be persistent.

# Terms

- [Memo Table](#memo-table)
- [Group](#group)
- Relational Group
- Scalar Group
- [Expression](#expression-logical-physical-scalar)
- Relational Expression
- Logical Expression
- Physical Expression
- Scalar Expression
- [Plan](#logical--physical-plans)
- Logical Plan
- Physical Plan
- [Operator / Plan Node](#operators)
- Logical Operator
- Physical Operator
- Scalar Operator
# Contents

- [Memo Table]
- [Expression]
- [Relational Expression]
- [Logical Expression]
- [Physical Expression]
- [Scalar Expression]
- **[Equivalence of Expressions](#expression-equivalence)**
- [Group]
- [Relational Group]
- [Scalar Group]
- [Plan]
- [Logical Plan]
- [Physical Plan]
- [Operator] / [Plan Node]
- [Logical Operator]
- [Physical Operator]
- [Scalar Operator]
- Property
- Logical Property
- Physical Property
@@ -32,70 +33,106 @@ few important differences that need to be addressed considering our memo table w
- Transformation Rule
- Implementation Rule

[Memo Table]: #memo-table
[Expression]: #expression-logical-physical-scalar
[Relational Expression]: #relational-expression
[Logical Expression]: #logical-expression
[Physical Expression]: #physical-expression
[Scalar Expression]: #scalar-expression
[Group]: #group
[Relational Group]: #relational-group
[Scalar Group]: #scalar-group
[Plan]: #query-plan
[Logical Plan]: #logical-plan
[Physical Plan]: #physical-plan
[Plan Node]: #operator
[Operator]: #operator
[Logical Operator]: #logical-operator
[Physical Operator]: #physical-operator
[Scalar Operator]: #scalar-operator

# Comparison with Cascades

In the Cascades framework, an expression is a tree of operators. In `optd`, we are instead defining
a Plan to be a tree or DAG of operators. An Expression in `optd` strictly refers to the
representation of an operator in the Memo Table, not in plans.
a logical or physical query [Plan] to be a tree or DAG of [Operator]s. An expression in `optd`
strictly refers to the representation of an operator in the [Memo Table], not in query plans.

See the [section below](#expression-logical-physical-scalar) for more information.
See the [section below](#expression-logical-physical-scalar) on the kinds of expressions for more
information.

Most other terms in `optd` are similar to Cascades or self-explanatory.
Most other terms in `optd` are similar to Cascades or are self-explanatory.

# Memo Table Terms

This section describes names and definitions of concepts related to the **Memo Table**.
This section describes names and definitions of concepts related to the memo table.

## Memo Table

The **Memo Table** is the data structure used for dynamic programming in a top-down plan enumeration
The memo table is the data structure used for dynamic programming in a top-down plan enumeration
search algorithm. The memo table consists of a mutually recursive data structure made up of
**Expressions** and **Groups**.
[Expression]s and [Group]s.
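
The mutual recursion can be sketched in Rust as follows. This is a minimal in-memory illustration under stated assumptions, not `optd`'s actual (persistent) memo table: `MemoTable`, `Group`, `Expression`, `GroupId`, and `add_expression` are hypothetical names chosen for this example. Groups hold expressions, and expressions refer back to groups through identifiers.

```rust
use std::collections::HashMap;

/// Hypothetical identifier for a group in the memo table.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct GroupId(pub usize);

/// Hypothetical expression: its children are group identifiers, not operators.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum Expression {
    Scan { table: String },
    Join { left: GroupId, right: GroupId },
}

/// A group holds expressions that are considered equivalent.
#[derive(Debug, Default)]
pub struct Group {
    pub expressions: Vec<Expression>,
}

/// Assumed in-memory layout: groups stored by index, plus a lookup
/// from expression to owning group for deduplication.
#[derive(Debug, Default)]
pub struct MemoTable {
    groups: Vec<Group>,
    index: HashMap<Expression, GroupId>,
}

impl MemoTable {
    /// Returns the group that already contains `expr`, or creates a new
    /// single-expression group for it.
    pub fn add_expression(&mut self, expr: Expression) -> GroupId {
        if let Some(&id) = self.index.get(&expr) {
            return id;
        }
        let id = GroupId(self.groups.len());
        self.groups.push(Group { expressions: vec![expr.clone()] });
        self.index.insert(expr, id);
        id
    }

    /// Looks up a group by its identifier.
    pub fn group(&self, id: GroupId) -> Option<&Group> {
        self.groups.get(id.0)
    }
}
```

In this sketch, adding the same `Scan` twice yields the same `GroupId`, which is what lets a top-down search reuse work for repeated subproblems.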

## Expression (Logical, Physical, Scalar)

An **Expression** is the representation of an operator **inside of the Memo Table**.
An **expression** is the representation of a non-materialized operator _inside_ of the [Memo Table].

<details>
There are 2 types of expressions: [Relational Expression]s and [Scalar Expression]s. A [Relational
Expression] can be either a [Logical Expression] or a [Physical Expression].

<summary> Types of Expressions </summary>
Note that different kinds of expressions can have the same names as [Operator]s or [Plan Node]s, but
expressions solely indicate non-materialized relational or scalar operators in the [Memo Table].

There are two types of Expressions, **Relational Expressions** and **Scalar Expressions**. A
Relational Expression can be either a **Logical Expression** or a **Physical Expression**.
Operators outside of the [Memo Table] should _**not**_ be referred to as expressions, and should
instead be referred to as [Operator]s or [Plan Node]s.

Examples of Logical Expressions include Logical Scan, Logical Join, or Logical Sort Expressions.
Notably, when we refer to an expression, _we are specifically talking about the representation of_
_operators inside the memo table_. A logical operator from an incoming logical plan should _not_
be called a [Logical Expression], and similarly a physical execution operator in the final output
physical plan should also _not_ be called a [Physical Expression].

Examples of Physical Expressions include Table Scan, Hash Join, or Sort Merge Join.
Another way to think about this is that expressions are _not_ materialized, and plan nodes and
operators inside query plans _are_ materialized. Operators inside of query plans (both logical and
physical) should be referred to as either logical or physical [Operator]s or logical or physical
[Plan Node]s.

Examples of Scalar Expressions include the expressions `t1.a < 42` or `t1.b = t2.c`.
Another key difference between expressions and [Plan Node]s is that expressions have 0 or more
**Group Identifiers** as children, and [Plan Node]s have 0 or more other [Plan Node]s as children.
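
The difference in children can be sketched in Rust as follows. This is a minimal illustration, not `optd`'s actual types: `GroupId`, `LogicalExpression`, `LogicalPlanNode`, and their variants are hypothetical names chosen for this example.

```rust
/// Hypothetical identifier for a group in the memo table.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct GroupId(pub u64);

/// Lives inside the memo table: children are group identifiers.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum LogicalExpression {
    Scan { table: String },
    Filter { child: GroupId },
    Join { left: GroupId, right: GroupId },
}

/// Lives inside a query plan: children are other plan nodes.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum LogicalPlanNode {
    Scan { table: String },
    Filter { child: Box<LogicalPlanNode> },
    Join { left: Box<LogicalPlanNode>, right: Box<LogicalPlanNode> },
}

impl LogicalExpression {
    /// The 0 or more child groups this expression refers to.
    pub fn children(&self) -> Vec<GroupId> {
        match self {
            LogicalExpression::Scan { .. } => vec![],
            LogicalExpression::Filter { child } => vec![*child],
            LogicalExpression::Join { left, right } => vec![*left, *right],
        }
    }
}

impl LogicalPlanNode {
    /// The 0 or more child plan nodes directly below this node.
    pub fn children(&self) -> Vec<&LogicalPlanNode> {
        match self {
            LogicalPlanNode::Scan { .. } => vec![],
            LogicalPlanNode::Filter { child } => vec![&**child],
            LogicalPlanNode::Join { left, right } => vec![&**left, &**right],
        }
    }
}
```

The two enums deliberately share variant names, mirroring the note above that expressions and operators can have the same names while differing in where they live and what their children are.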

Note that different kinds of Expressions can be named the same as Operators or Plan Nodes, but
Expressions solely indicate objects in the Memo Table.
## Relational Expression

</details>
A relational expression is either a [Logical Expression] or a [Physical Expression].

<br>
When we say "relational", we mean representations of operations in the relational algebra of SQL.

Operators _outside of the memo table_ should _**not**_ be referred to as Expressions, and should
be referred to as **Operators** or **Plan Nodes**.
Relational expressions differ from [Scalar Expression]s in that algebraically evaluating a
relational expression produces a bag of tuples instead of a single scalar value.

<details>
See the following sections for more information.

<summary> Expressions vs Operators / Plan Nodes </summary>
## Logical Expression

Notably, when we refer to an Expression, _we are specifically talking about the representation of_
_operators inside the memo table_. A logical operator from an incoming logical plan should _not_
be called a Logical Expression, and similarly a physical execution operator in the final output
physical plan should also _not_ be called a Physical Expression.
A logical expression is a kind of [Relational Expression].

TODO(connor) Add more details.

Examples of logical expressions include Logical Scan, Logical Join, or Logical Sort expressions
(which can be shortened to Scan, Join, or Sort).

Those should be referred to as **Logical/Physical Operators** or **Logical/Physical Plan Nodes**.
## Physical Expression

Another key difference between Plan Nodes and Expressions is that Expressions have 0 or more
**Group Identifiers** as children, not other Plan Nodes as children.
A physical expression is a kind of [Relational Expression].

</details>
TODO(connor) Add more details.

<br>
Examples of Physical Expressions include Table Scan, Index Scan, Hash Join, or Sort Merge Join.

## Scalar Expression

A scalar expression is a kind of [Expression].

TODO(everyone) Figure out the semantics of what a scalar expression really is.

Examples of Scalar Expressions include the expressions `t1.a < 42` or `t1.b = t2.c`.

## Expression Equivalence

@@ -111,11 +148,7 @@ Physical Expression has the same Logical Property and delivers the Physical Prop

## Group

A **Group** is a set of equivalent **Expressions**.

<details>

<summary> Types of Groups </summary>
A **Group** is a set of equivalent [Expression]s.

We follow the definition of groups in the Volcano and Cascades frameworks. From the EQOP Microsoft
article (Section 2.2, page 205):
@@ -124,36 +157,58 @@ article (Section 2.2, page 205):
> and all equivalent expressions within the class are called _group expressions_ or simply
> _expressions_.
A **Relational Group** is a set of 1 or more equivalent Logical Expressions and 0 or more equivalent
Physical Expressions.
## Relational Group

A relational group is a set of 1 or more equivalent [Logical Expression]s and 0 or more equivalent
[Physical Expression]s.
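
One way to encode the "1 or more logical, 0 or more physical" shape is sketched below. This is an assumption-laden illustration rather than `optd`'s design: `RelationalGroup`, its methods, and the expression variants are hypothetical, and the sketch trusts the caller to add only expressions that are actually equivalent.

```rust
/// Hypothetical logical expression (children omitted for brevity).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum LogicalExpression {
    Scan { table: String },
}

/// Hypothetical physical expressions that could implement a logical scan.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum PhysicalExpression {
    TableScan { table: String },
    IndexScan { table: String, index: String },
}

#[derive(Debug)]
pub struct RelationalGroup {
    logical: Vec<LogicalExpression>,
    physical: Vec<PhysicalExpression>,
}

impl RelationalGroup {
    /// Requiring an initial logical expression guarantees the group
    /// always holds at least one; physical expressions start empty.
    pub fn new(first: LogicalExpression) -> Self {
        Self { logical: vec![first], physical: Vec::new() }
    }

    /// Adds a logical expression unless an identical one is already present.
    pub fn add_logical(&mut self, expr: LogicalExpression) {
        if !self.logical.contains(&expr) {
            self.logical.push(expr);
        }
    }

    /// Adds a physical expression unless an identical one is already present.
    pub fn add_physical(&mut self, expr: PhysicalExpression) {
        if !self.physical.contains(&expr) {
            self.physical.push(expr);
        }
    }

    pub fn logical(&self) -> &[LogicalExpression] {
        &self.logical
    }

    pub fn physical(&self) -> &[PhysicalExpression] {
        &self.physical
    }
}
```

Keeping the fields private and exposing only `new`, `add_logical`, and `add_physical` means no code path can leave the group without a logical expression.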

TODO(connor) Add more details.

A **Scalar Group** consists of equivalent Scalar Expressions.
TODO(connor) Add example.

</details>
## Scalar Group

<br>
A scalar group consists of equivalent [Scalar Expression]s.

TODO(connor) Add more details.

TODO(connor) Add example.

# Plan Enumeration and Search Concepts

This section describes names and definitions of concepts related to the general plan enumeration and
search of optimal query plans.

## Logical / Physical Plans
## Query Plan

TODO

## Logical Plan

A **Logical Plan** is a tree or DAG of **Logical Operators** that can be evaluated to produce a bag
of tuples. This can also be referred to as a Logical Query Plan. The Operators that make up this
Logical Plan can be considered Logical Plan Nodes.

## Physical Plan

A **Physical Plan** is a tree or DAG of **Physical Operators** that can be evaluated by an execution
engine to produce a table. This can also be referred to as a Physical Query Plan. The Operators that
make up this Physical Plan can be considered Physical Plan Nodes.

## Operators
## Operator

TODO

## Logical Operator

A **Logical Operator** is a node in a Logical Plan (which is a tree or DAG).

## Physical Operator

A **Physical Operator** is a node in a Physical Plan (which is a tree or DAG).

## Scalar Operator

A **Scalar Operator** describes an operation that can be evaluated to obtain a single value. This
can also be referred to as a SQL expression, a row expression, or a SQL predicate.
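
To illustrate "evaluated to obtain a single value", the sketch below evaluates a predicate such as `t1.a < 42` against one row. Everything here is a hypothetical simplification for this example: `Value`, `ScalarOperator`, `evaluate`, and the use of a `HashMap` keyed by column name as the row are assumptions, not `optd`'s representation.

```rust
use std::collections::HashMap;

/// Hypothetical scalar values; a real system would support more types and NULL.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Int(i64),
    Bool(bool),
}

/// Hypothetical scalar operator tree.
#[derive(Debug, Clone, PartialEq)]
pub enum ScalarOperator {
    ColumnRef(String),
    Literal(Value),
    LessThan(Box<ScalarOperator>, Box<ScalarOperator>),
    Equal(Box<ScalarOperator>, Box<ScalarOperator>),
}

impl ScalarOperator {
    /// Evaluates this operator against one row, producing a single value.
    /// Returns `None` for an unknown column or a type mismatch.
    pub fn evaluate(&self, row: &HashMap<String, Value>) -> Option<Value> {
        match self {
            ScalarOperator::ColumnRef(name) => row.get(name).cloned(),
            ScalarOperator::Literal(value) => Some(value.clone()),
            ScalarOperator::LessThan(left, right) => {
                match (left.evaluate(row)?, right.evaluate(row)?) {
                    (Value::Int(a), Value::Int(b)) => Some(Value::Bool(a < b)),
                    _ => None,
                }
            }
            ScalarOperator::Equal(left, right) => {
                Some(Value::Bool(left.evaluate(row)? == right.evaluate(row)?))
            }
        }
    }
}
```

Unlike a logical or physical operator, whose evaluation yields a bag of tuples, every branch here returns at most one `Value`.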
