How to Perform Costing

## How to Perform Costing

Physical expressions need to be costed. Their children are either goals or other physical expressions (called goal members). Let's take the following example: `EXPR(goal_1, sub_expr_2)`. To cost that expression, we have multiple approaches:

### Approach 1: Recursive Optimal Costing
Recursively optimally cost `goal_1` and `sub_expr_2`. This approach is challenging because:
- It requires invalidation whenever we get a better expression for `goal_1` or `sub_expr_2`
- It doesn't ensure a global minimum, as greedy approaches are not always optimal
- We cannot support physical→physical optimizations (if that turns out to be useful)

### Approach 2: Explore All Possibilities
Explore all possibilities and rely on the scheduler to avoid combinatorial explosion. This is more in line with what we do for transformations and implementations. We can define a costing function in the DSL with the following signature:

```
fn (plan: Physical*) cost(): (f64, Statistics)
```

`Physical*` indicates that it is stored, so it has extra guarantees (e.g., all children are ingested). This mirrors what we use for logical implementations and transformations.

`f64` is the cost, and `Statistics` is any user-defined data type (could be ML weights, histograms, etc.).

When we encounter a goal, we expand it and materialize all physical expressions in that goal (and subgoals!). We need new syntax to expand/cost a nested physical expression. **Idea**: `$` postfix, which means "into costed". The left type should be `Physical*`, which can easily be tested with the type checker.

### Approach 3: Final Approach (Best of All Worlds)

```
// This is a UDF/external function, similar to optimize for implementations
fn (plan: Physical) into_costed(cost: f64, stats: Statistics)
```

```
fn (plan: Physical*) cost(): Physical$
```

In the memo, each physical expression id will have a set of costed expressions:
```
pid -> {pid + cost + stats}
```

This approach is excellent because:
1. It uses the same updating mechanisms as for implementations and explorations (consistent scheduler!)
2. It allows for further physical→physical transformations
3. You can do whatever you want when costing an expression! Can go as deep as needed, can choose to recursively cost if desired (or not!)
4. Can propagate statistics perfectly

**Only caveat**: Cost pruning has no built-in mechanism, but you can instrument the scheduler.

---

**📧 Contact**: Please reach out to aschlome@cs.cmu.edu for more information about this.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

How to Perform Costing #113