-
Notifications
You must be signed in to change notification settings - Fork 9
Description
How to Perform Costing
Physical expressions need to be costed. Their children are either goals or other physical expressions (called goal members). Let's take the following example: EXPR(goal_1, sub_expr_2)
. To cost that expression, we have multiple approaches:
Approach 1: Recursive Optimal Costing
Recursively optimally cost goal_1
and sub_expr_2
. This approach is challenging because:
- It requires invalidation whenever we get a better expression for
goal_1
orsub_expr_2
- It doesn't ensure a global minimum, as greedy approaches are not always optimal
- We cannot support physical→physical optimizations (if that turns out to be useful)
Approach 2: Explore All Possibilities
Explore all possibilities and rely on the scheduler to avoid combinatorial explosion. This is more in line with what we do for transformations and implementations. We can define a costing function in the DSL with the following signature:
fn (plan: Physical*) cost(): (f64, Statistics)
Physical*
indicates that it is stored, so it has extra guarantees (e.g., all children are ingested). This mirrors what we use for logical implementations and transformations.
f64
is the cost, and Statistics
is any user-defined data type (could be ML weights, histograms, etc.).
When we encounter a goal, we expand it and materialize all physical expressions in that goal (and subgoals!). We need new syntax to expand/cost a nested physical expression. Idea: $
postfix, which means "into costed". The left type should be Physical*
, which can easily be tested with the type checker.
Approach 3: Final Approach (Best of All Worlds)
// This is a UDF/external function, similar to optimize for implementations
fn (plan: Physical) into_costed(cost: f64, stats: Statistics)
fn (plan: Physical*) cost(): Physical$
In the memo, each physical expression id will have a set of costed expressions:
pid -> {pid + cost + stats}
This approach is excellent because:
- It uses the same updating mechanisms as for implementations and explorations (consistent scheduler!)
- It allows for further physical→physical transformations
- You can do whatever you want when costing an expression! Can go as deep as needed, can choose to recursively cost if desired (or not!)
- Can propagate statistics perfectly
Only caveat: Cost pruning has no built-in mechanism, but you can instrument the scheduler.
📧 Contact: Please reach out to [email protected] for more information about this.