Skip to content

How to Perform Costing #113

@AlSchlo

Description

@AlSchlo

How to Perform Costing

Physical expressions need to be costed. Their children are either goals or other physical expressions (called goal members). Let's take the following example: EXPR(goal_1, sub_expr_2). To cost that expression, we have multiple approaches:

Approach 1: Recursive Optimal Costing

Recursively optimally cost goal_1 and sub_expr_2. This approach is challenging because:

  • It requires invalidation whenever we get a better expression for goal_1 or sub_expr_2
  • It doesn't ensure a global minimum, as greedy approaches are not always optimal
  • We cannot support physical→physical optimizations (if that turns out to be useful)

Approach 2: Explore All Possibilities

Explore all possibilities and rely on the scheduler to avoid combinatorial explosion. This is more in line with what we do for transformations and implementations. We can define a costing function in the DSL with the following signature:

fn (plan: Physical*) cost(): (f64, Statistics)

Physical* indicates that it is stored, so it has extra guarantees (e.g., all children are ingested). This mirrors what we use for logical implementations and transformations.

f64 is the cost, and Statistics is any user-defined data type (could be ML weights, histograms, etc.).

When we encounter a goal, we expand it and materialize all physical expressions in that goal (and subgoals!). We need new syntax to expand/cost a nested physical expression. Idea: $ postfix, which means "into costed". The left type should be Physical*, which can easily be tested with the type checker.

Approach 3: Final Approach (Best of All Worlds)

// This is a UDF/external function, similar to optimize for implementations
fn (plan: Physical) into_costed(cost: f64, stats: Statistics)
fn (plan: Physical*) cost(): Physical$

In the memo, each physical expression id will have a set of costed expressions:

pid -> {pid + cost + stats}

This approach is excellent because:

  1. It uses the same updating mechanisms as for implementations and explorations (consistent scheduler!)
  2. It allows for further physical→physical transformations
  3. You can do whatever you want when costing an expression! Can go as deep as needed, can choose to recursively cost if desired (or not!)
  4. Can propagate statistics perfectly

Only caveat: Cost pruning has no built-in mechanism, but you can instrument the scheduler.


📧 Contact: Please reach out to [email protected] for more information about this.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions