This repository was archived by the owner on Jan 7, 2025. It is now read-only.

Commit 932521f

feat: core filter selectivity (#81)
"Core" covers every case where we don't fall back to hardcoded defaults. The fallback cases (e.g. missing statistics, "col = col", "col = subquery", etc.) are not part of the core.

Note the difference between this PR and [hello world selectivity](https://github.com/cmu-db/optd/pull/70). In this PR, the only file that changed is `base_cost.rs`, because all the "infrastructure" was already set up in "hello world selectivity".

I have written 20 unit tests to test this core logic in a variety of scenarios (nulls vs no nulls, value in MCVs vs not in MCVs, reversing the order of children, etc.). I chose to write code-based unit tests instead of SQLPlanner-like tests because they allow fine-grained control of the expression tree, down to the order of children (and because it takes a lot more work to modify SQLPlanner to output cardinality).

I made an effort to make the unit tests less brittle by defining helper functions such as `bin_op()` or `const_i32()` for constructing expression trees. If our expression tree representation changes in the future, it is likely that only these helper functions need to be changed to make the unit tests work again.

Some TPC-H queries require "non-core" logic, so this code doesn't currently run with all of TPC-H. To avoid crashing, I simply return `INVALID_SELECTIVITY` (0.001) for any branches that aren't part of the "core". I will handle "non-core" logic in a future PR.

Another case not being handled is ordering comparisons (`<`, `<=`, `>=`, `>`) between `Value` objects. Handling this requires further discussion because not all `Value`s have a meaningful "order". In the meantime, I hardcoded converting all `Value` objects to `i32` to perform comparisons.
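The description above can be illustrated with a minimal, self-contained sketch. Everything in it is a simplified stand-in and not the actual code in `base_cost.rs`: the `Expr`, `BinOpType`, and `ColumnStats` types, the `col_ref()` helper, and the `eq_selectivity()` formula are assumptions made for illustration. Only the names `bin_op()`, `const_i32()`, and `INVALID_SELECTIVITY` (with its value 0.001) come from the description. The sketch shows how small constructor helpers keep tests independent of the tree representation, how "col = const" can be estimated from MCVs, and how non-core branches fall back to the sentinel value.

```rust
// Hypothetical, simplified expression tree; optd's real representation differs.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    ColumnRef(usize),
    ConstI32(i32),
    BinOp { op: BinOpType, left: Box<Expr>, right: Box<Expr> },
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum BinOpType {
    Eq,
    Lt,
}

// Sentinel returned for branches outside the "core" (value from the description).
const INVALID_SELECTIVITY: f64 = 0.001;

// Test helpers: if the tree representation changes, only these need updating.
fn col_ref(idx: usize) -> Expr {
    Expr::ColumnRef(idx)
}
fn const_i32(v: i32) -> Expr {
    Expr::ConstI32(v)
}
fn bin_op(op: BinOpType, left: Expr, right: Expr) -> Expr {
    Expr::BinOp { op, left: Box::new(left), right: Box::new(right) }
}

// Assumed per-column statistics: most common values with their frequencies.
struct ColumnStats {
    mcvs: Vec<(i32, f64)>,
    ndistinct: usize,
    null_frac: f64,
}

fn example_stats() -> ColumnStats {
    ColumnStats { mcvs: vec![(1, 0.5), (2, 0.25)], ndistinct: 4, null_frac: 0.125 }
}

// Assumed formula: use the MCV frequency if present; otherwise spread the
// remaining non-null, non-MCV mass evenly over the remaining distinct values.
fn eq_selectivity(stats: &ColumnStats, value: i32) -> f64 {
    if let Some((_, freq)) = stats.mcvs.iter().find(|(v, _)| *v == value) {
        return *freq;
    }
    let mcv_total: f64 = stats.mcvs.iter().map(|(_, f)| f).sum();
    let other_distinct = stats.ndistinct.saturating_sub(stats.mcvs.len());
    if other_distinct == 0 {
        return 0.0;
    }
    (1.0 - stats.null_frac - mcv_total) / other_distinct as f64
}

fn filter_selectivity(expr: &Expr, stats: &ColumnStats) -> f64 {
    match expr {
        Expr::BinOp { op: BinOpType::Eq, left, right } => match (&**left, &**right) {
            // "col = const" and "const = col" are handled identically.
            (Expr::ColumnRef(_), Expr::ConstI32(v))
            | (Expr::ConstI32(v), Expr::ColumnRef(_)) => eq_selectivity(stats, *v),
            // e.g. "col = col" is non-core in this sketch.
            _ => INVALID_SELECTIVITY,
        },
        // Everything else is non-core in this sketch.
        _ => INVALID_SELECTIVITY,
    }
}

fn main() {
    let stats = example_stats();
    let in_mcv = bin_op(BinOpType::Eq, col_ref(0), const_i32(1));
    let reversed = bin_op(BinOpType::Eq, const_i32(1), col_ref(0));
    let not_in_mcv = bin_op(BinOpType::Eq, col_ref(0), const_i32(5));
    let non_core = bin_op(BinOpType::Lt, col_ref(0), const_i32(5));

    assert_eq!(filter_selectivity(&in_mcv, &stats), 0.5);
    assert_eq!(filter_selectivity(&reversed, &stats), 0.5);
    assert_eq!(filter_selectivity(&not_in_mcv, &stats), 0.0625);
    assert_eq!(filter_selectivity(&non_core, &stats), INVALID_SELECTIVITY);
}
```

The statistics in `example_stats()` are chosen so that every expected value is exactly representable in binary floating point, which is why exact `assert_eq!` comparisons are safe in this sketch.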
1 parent ced5c32 commit 932521f

4 files changed: +928 -81 lines changed

Cargo.lock

+7
Some generated files are not rendered by default.

ci.sh

+6 -2
@@ -2,8 +2,12 @@
 # runs the stuff in CI.yaml locally
 # unfortunately this needs to be updated manually. just update it if you get annoyed by GHAs failing
 
-set -ex
+set -e
 
 cargo fmt --all -- --check
 cargo clippy --workspace --all-targets --all-features --locked -- -D warnings
-cargo test --no-fail-fast --workspace --all-features --locked
+cargo test --no-fail-fast --workspace --all-features --locked
+
+# %s is a workaround because printing --- doesn't work in some shells
+# this just makes it more obvious when the CI has passed
+printf "%s\n| \033[32m\033[1mCI PASSED\033[0m |\n%s\n" "-------------" "-------------"

optd-datafusion-repr/Cargo.toml

+1
@@ -20,3 +20,4 @@ camelpaste = "0.1"
 datafusion-expr = "32.0.0"
 async-trait = "0.1"
 datafusion = "32.0.0"
+assert_approx_eq = "1.1.0"
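The new `assert_approx_eq` dependency is presumably there because selectivities are floating-point values, and exact equality checks on computed floats can fail because of rounding. The sketch below illustrates that problem using only the standard library. The `approx_eq()` helper and its tolerance are hypothetical stand-ins for a macro-based check, not code from this commit.

```rust
/// Hypothetical helper: true when `a` and `b` differ by less than `eps`.
fn approx_eq(a: f64, b: f64, eps: f64) -> bool {
    (a - b).abs() < eps
}

fn main() {
    // Adding two estimated fractions, as a selectivity computation might do.
    let sel = 0.1_f64 + 0.2_f64;

    // Exact comparison fails because of binary rounding error...
    assert!(sel != 0.3);
    // ...while a tolerance-based comparison succeeds.
    assert!(approx_eq(sel, 0.3, 1e-6));
}
```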
