Support Date32/Date64 in unwrap_cast optimization #21665
Dandandan wants to merge 2 commits into apache:main
Add Date32 and Date64 to the supported numeric types in the existing
unwrap_cast_in_comparison optimizer. This allows filters like
CAST(CAST(col AS Int32) AS Date32) >= Date32("2013-07-01")
to be simplified to
col >= UInt16(15887)
eliminating per-row CAST operations. Date32 is internally i32 (days
since epoch) and Date64 is i64 (ms since epoch), so they participate
in numeric comparisons the same way as their integer counterparts.
This affects ClickBench Q36-Q42, which all filter on EventDate
(stored as UInt16, viewed as Date32). Each query previously evaluated
4 CAST operations per row; now it does 0.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
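For readers who want to check the literal in the example above, here is a minimal standalone sketch of the date-to-days arithmetic. It does not use DataFusion or Arrow; `days_from_civil` is a hypothetical helper name, and the conversion is the well-known civil-date algorithm rather than whatever code path the optimizer actually takes.

```rust
/// Days between 1970-01-01 and the given proleptic Gregorian date.
/// Illustration only; this helper is not part of DataFusion.
fn days_from_civil(year: i64, month: i64, day: i64) -> i64 {
    // Treat the year as starting on March 1 so the leap day falls at the end.
    let y = if month <= 2 { year - 1 } else { year };
    let era = y.div_euclid(400); // 400-year Gregorian cycle index
    let yoe = y.rem_euclid(400); // year within the cycle, 0..=399
    let mp = (month + 9) % 12; // March = 0, ..., February = 11
    let doy = (153 * mp + 2) / 5 + day - 1; // day within the shifted year
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy; // day within the cycle
    era * 146_097 + doe - 719_468 // re-base onto 1970-01-01
}

fn main() {
    let days = days_from_civil(2013, 7, 1);
    // The rewritten filter compares a UInt16 column against this value,
    // so it has to fit in u16 for the rewrite to apply.
    let fits_in_u16 = u16::try_from(days).is_ok();
    println!("days since epoch = {days}, fits in u16 = {fits_in_u16}");
}
```

The result for 2013-07-01 is 15887, matching the `UInt16(15887)` literal in the commit message.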
    | DataType::Decimal32(_, _)
    | DataType::Decimal64(_, _)
    | DataType::Decimal128(_, _)
    | DataType::Timestamp(_, _)
This is a different use case, but similar to the current one: is something similar also needed for the other temporal types missing here (Duration and Interval, for example)?
    // Date↔Timestamp casts are lossy (drop time-of-day or add midnight),
    // so unwrapping would change comparison semantics.
    if (is_date_type(&lit_data_type) && target_type.is_temporal())
The comment is about Date↔Timestamp casts, but the condition also covers other temporal types such as Duration and Interval. Maybe the comment should be extended.
This would also prevent unwrapping a cast from Date32 to Date64, which I think someone might want.
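To make the lossiness mentioned in the code comment concrete, here is a small standalone sketch using plain integers instead of Arrow types. The function names and the seconds-based timestamp are assumptions chosen for illustration; they are not taken from the PR.

```rust
const SECS_PER_DAY: i64 = 86_400;

/// Timestamp (seconds since epoch) to a Date32-style day count.
/// The time-of-day is dropped, which is what makes the cast lossy.
fn ts_to_date(ts_secs: i64) -> i32 {
    ts_secs.div_euclid(SECS_PER_DAY) as i32
}

/// Date32-style day count to a timestamp at midnight of that day.
fn date_to_ts(days: i32) -> i64 {
    i64::from(days) * SECS_PER_DAY
}

/// Original predicate shape: CAST(ts_col AS Date32) = date_literal
fn original_predicate(ts_secs: i64, lit_days: i32) -> bool {
    ts_to_date(ts_secs) == lit_days
}

/// What a naive unwrap would produce: ts_col = CAST(date_literal AS Timestamp)
fn unwrapped_predicate(ts_secs: i64, lit_days: i32) -> bool {
    ts_secs == date_to_ts(lit_days)
}

fn main() {
    let lit_days = 15_887; // 2013-07-01
    let noon = date_to_ts(lit_days) + 12 * 3_600; // 2013-07-01 12:00:00
    println!("original:  {}", original_predicate(noon, lit_days));
    println!("unwrapped: {}", unwrapped_predicate(noon, lit_days));
}
```

In this sketch the original predicate matches the noon timestamp while the unwrapped one does not, which is the change in comparison semantics the guard is meant to avoid.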
Which issue does this PR close?
N/A
Rationale for this change
Filters like WHERE EventDate >= '2013-07-01' on a UInt16 column exposed as Date32 via a view produce CAST(CAST(EventDate AS Int32) AS Date32) >= Date32("2013-07-01"): 4 CAST operations per row. The existing unwrap_cast_in_comparison optimizer can eliminate these but didn't support Date32/Date64.
What changes are included in this PR?
Add Date32/Date64 to is_supported_numeric_type and the match arms in try_cast_numeric_literal. They are treated like Int32/Int64 since that's their physical representation (days / ms since epoch).
A guard prevents Date↔Timestamp unwrapping, which is lossy (drops time-of-day).
ClickBench Q36-Q42 before:
After:
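As a rough illustration of what "treated like Int32/Int64" means for the literal side, here is a simplified sketch. The `ColType` and `Literal` enums and the `try_cast_date_literal` function are hypothetical stand-ins, not the actual DataFusion types or the real `try_cast_numeric_literal` signature; the only idea it shows is the range check that decides whether a date literal can be re-expressed in the column's type.

```rust
/// Hypothetical, simplified stand-in for the column's integer type.
#[derive(Clone, Copy, Debug, PartialEq)]
enum ColType {
    UInt16,
    Int32,
    Int64,
}

/// Hypothetical literal: Date32 carries days as i32, Date64 carries ms as i64.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Literal {
    Date32(i32),
    Date64(i64),
}

/// Re-express a date literal as a value of the column's type, if it fits.
/// Returns None when the value is out of range, so no rewrite would happen.
fn try_cast_date_literal(lit: Literal, target: ColType) -> Option<i64> {
    // Use the physical representation directly, as for Int32/Int64.
    let raw = match lit {
        Literal::Date32(days) => i64::from(days),
        Literal::Date64(ms) => ms,
    };
    let (min, max) = match target {
        ColType::UInt16 => (i64::from(u16::MIN), i64::from(u16::MAX)),
        ColType::Int32 => (i64::from(i32::MIN), i64::from(i32::MAX)),
        ColType::Int64 => (i64::MIN, i64::MAX),
    };
    (min..=max).contains(&raw).then_some(raw)
}

fn main() {
    // 2013-07-01 as a Date32 literal fits in a UInt16 column.
    println!("{:?}", try_cast_date_literal(Literal::Date32(15_887), ColType::UInt16));
    // A date before the epoch has no UInt16 representation.
    println!("{:?}", try_cast_date_literal(Literal::Date32(-1), ColType::UInt16));
}
```

When the literal does not fit, the sketch returns `None`, corresponding to leaving the original expression with its casts in place.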
Are these changes tested?
Existing casts tests pass. ClickBench sqllogictest expectations updated.
Are there any user-facing changes?
No.