Skip to content

perf: unwrap cast for comparing ints =/!= strings #15110

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
Mar 27, 2025

Conversation

alan910127
Copy link
Contributor

@alan910127 alan910127 commented Mar 9, 2025

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

  • Add a new function to cast literals to the target type when specific operators are applied.
  • Update unwrap_cast_in_comparison_for_binary and is_cast_expr_and_support_unwrap_cast_in_comparison_for_binary to use the new function.

Are these changes tested?

The following test cases are added:

  • CAST(c1, UTF8) < "123"
  • CAST(c1, UTF8) = "123"
  • CAST(c1, UTF8) != "123"
  • CAST(c6, UTF8) = "123"
  • CAST(c6, UTF8) != "123"

Are there any user-facing changes?

No.

@github-actions github-actions bot added the optimizer Optimizer rules label Mar 9, 2025
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much @alan910127

I think this is looking quite close. I left a suggestion about how to do the conversion

I also think this needs a few more tests as suggested below.

FYI @jayzhan211 and @ion-elgreco

negative / null cases

Can you also test predicates with null literals and

-- compare with Null literal (ScalarValue::Utf8(None)
cast(c1, Utf8) = NULL
-- compare with strings that are not a number
cast(c1, Utf8) = 'not a number'
-- compare with strings that are numbers but do not fit in the range
-- where c1 is a Int8 so max value of 128, try comparing to a number
-- that is greater than 128
cast(c1, Utf8) = '1000'

"End to end" (sql level)

Can we please add at least one "end to end" style test in sqllogictest style (EXPLAIN ....)

Here are the instructions: https://github.com/apache/datafusion/tree/main/datafusion/sqllogictest

Ideally you should be able to extend one of the existing test files in https://github.com/apache/datafusion/tree/main/datafusion/sqllogictest/test_files

| ScalarValue::Utf8View(Some(ref str))
| ScalarValue::LargeUtf8(Some(ref str)),
) => match target_type {
DataType::Int8 => str.parse::<i8>().ok().map(ScalarValue::from),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

by calling ok() here I think this converts a predicate like cast(col, utf8) = 'foo' into col = 'null'

Which is not quite the same thing.

Also, str.parse() likely has subltely different semantics than the arrow cast kernels.

This, can can you please make this function:

  1. Return None if the conversion fails (following the convention of other functions in this module like try_cast_dictionary)
  2. Use the cast kernel to parse the values?

The cast kernel can be called by:

  1. Converting ScalarValue to an array with ScalarValue::to_array
  2. Call cast: https://docs.rs/arrow/latest/arrow/array/cast/index.html
  3. Calling ScalarValue::try_from_array

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to use your suggested casting implementation, could you help me check if I'm implementing it correctly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also 2 sqllogictest test cases are added (optimized/number not in range)

@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Mar 10, 2025
alamb
alamb previously approved these changes Mar 10, 2025
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much @alan910127 -- this is great

While reviewing this PR I had a few more tests I wanted to suggest, but instead of doing another round of PR reviews I figured I would just push the commits directly to save time.

I also verified that DuckDB does this type of simplification too (note the CAST( │ │ '123' AS STRUCT(x INTEGER) which is casting the string ot int

D create table t (x int);
D insert into t values (1);
D explain select * from t where t = '123';

┌─────────────────────────────┐
│┌───────────────────────────┐│
││       Physical Plan       ││
│└───────────────────────────┘│
└─────────────────────────────┘
┌───────────────────────────┐
│           FILTER          │
│    ────────────────────   │
│  (struct_pack(x) = CAST(  │
│ '123' AS STRUCT(x INTEGER)│
│             ))            │
│                           │
│          ~1 Rows          │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│         SEQ_SCAN          │
│    ────────────────────   │
│          Table: t         │
│   Type: Sequential Scan   │
│       Projections: x      │
│                           │
│          ~1 Rows          │
└───────────────────────────┘

Strangely they don't seem to care about errors:

D  select * from t where t = '999999999999999999999999999999999999';
Conversion Error:
Type VARCHAR with value '999999999999999999999999999999999999' can't be cast to the destination type STRUCT

LINE 1: select * from t where t = '999999999999999999999999999999999999';

D select * from t where t = 'foo';
Conversion Error:
Type VARCHAR with value 'foo' can't be cast to the destination type STRUCT

LINE 1: select * from t where t = 'foo';
                                  ^
D select * from t;
┌───────┐
│   x   │
│ int32 │
├───────┤
│   1   │
└───────┘

target_type: &DataType,
op: Operator,
) -> Option<ScalarValue> {
macro_rules! cast_or_else_return_none {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since ScalarValue is already dynamic, I don't think we need a macro to run on each type

I played around a little locally and this is what I came up with. What do you think?

/// Try to move a cast from a literal to the other side of a `=` / `!=` operator
///
/// Specifically, rewrites
/// ```sql
/// cast(col) <op> <litera>
/// ```
///
/// To
///
/// ```sql
/// col <op> cast(<literal>)
/// col <op> <cast_literal>
/// ```
fn cast_literal_to_type_with_op(
    lit_value: &ScalarValue,
    target_type: &DataType,
    op: Operator,
) -> Option<ScalarValue> {
    let (
        Operator::Eq | Operator::NotEq,
        ScalarValue::Utf8(Some(_))
        | ScalarValue::Utf8View(Some(_))
        | ScalarValue::LargeUtf8(Some(_)),
    ) = (op, lit_value)
    else {
        return None;
    };

    // Only try for integer types (TODO can we do this for other types
    // like timestamps)?
    use DataType::*;
    if matches!(
        target_type,
        Int8 | Int16 | Int32 | Int64 | UInt8 | UInt16 | UInt32 | UInt64
    ) {
        let opts = arrow::compute::CastOptions {
            safe: false,
            format_options: Default::default(),
        };

        let array = ScalarValue::to_array(lit_value).ok()?;
        let casted =
            arrow::compute::cast_with_options(&array, target_type, &opts).ok()?;
        ScalarValue::try_from_array(&casted, 0).ok()
    } else {
        None
    }
}

Copy link
Contributor Author

@alan910127 alan910127 Mar 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of doing another round of PR reviews I figured I would just push the commits directly to save time.

Thank you! I’m just curious—is storing a parquet file in slt tests common? The tests I’ve worked with so far use tables created in the same test file.

Strangely they don't seem to care about errors:

Interesting! This makes me wonder if they treat this behavior as an "optimization" or if it's simply an expected behavior to cast the literal to the column's type.

I played around a little locally and this is what I came up with. What do you think?

Oh, this looks much better! However, I’d prefer leaving the outer match as is, maybe there’s potential for another optimization in the future?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if they treat this behavior as an "optimization" or if it's simply an expected behavior to cast the literal to the column's type

It looks like my guess might be correct—they always cast the literal to the column's type, regardless of the operator.

D create table t as values (1), (2), (3);
D select * from t where col0 < '10';
┌───────┐
│ col0  │
│ int32 │
├───────┤
│     1 │
│     2 │
│     3 │
└───────┘

@alamb
Copy link
Contributor

alamb commented Mar 10, 2025

@findepi I wonder if you have some time to double check the correctness of this optimization (distributing a cast)

@@ -1758,7 +1758,7 @@ impl<S: SimplifyInfo> TreeNodeRewriter for Simplifier<'_, S> {
// try_cast/cast(expr as data_type) op literal
Expr::BinaryExpr(BinaryExpr { left, op, right })
if is_cast_expr_and_support_unwrap_cast_in_comparison_for_binary(
info, &left, &right,
info, &left, &right, op,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add op between left and right args -- this helps understand the role of the two &Expr args to the function
(or maybe even pass the whole &BinaryExpr into the function)

ScalarValue::Utf8(Some(_))
| ScalarValue::Utf8View(Some(_))
| ScalarValue::LargeUtf8(Some(_)),
) => match target_type {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does target_type represent source type of the cast?

@findepi
Copy link
Member

findepi commented Mar 11, 2025

@findepi I wonder if you have some time to double check the correctness of this optimization (distributing a cast)

unwrapping cast is not an easy feat. The current implementation isn't exactly correct, as it conflates different strings which cast back to the same number:

CREATE OR REPLACE TABLE t AS SELECT arrow_cast('123', 'Int64') a;

-- correctly finds the row
SELECT * FROM t WHERE cast(a AS string) = '123';

-- incorrectly also finds the row
SELECT * FROM t WHERE cast(a AS string) = '0123';

@alan910127
Copy link
Contributor Author

@findepi so the test cases in unwrap_cast are not correct, and they're showing that I'm over-unwrapping. Am I understanding this correctly?

@alamb
Copy link
Contributor

alamb commented Mar 11, 2025

@findepi so the test cases in unwrap_cast are not correct, and they're showing that I'm over-unwrapping. Am I understanding this correctly?

I think @findepi is saying that the mapping proposed in the PR is more subtle

casting an integer 123 to a string will result in '123'.

It would not match the string '0123'

However, as written this PR will also match '0123' as we will cast 0123 to an integer which results in 123

So in other words, we should not do the conversion from '0123' (string) to 123 (int)

Maybe we can add a check that the string would be the same when we round tripped it

As in only do the rewrite if the sequence

  • cast to int
  • cast (back) to string

results in the exact same string as went int

@alamb
Copy link
Contributor

alamb commented Mar 11, 2025

(thank you for checking this @findepi -- 🙏

@findepi
Copy link
Member

findepi commented Mar 11, 2025

Maybe we can add a check that the string would be the same when we round tripped it

As in only do the rewrite if the sequence

  • cast to int
  • cast (back) to string

results in the exact same string as went int

neat

Comment on lines 213 to 230
match (op, lit_value) {
(
Operator::Eq | Operator::NotEq,
ScalarValue::Utf8(Some(_))
| ScalarValue::Utf8View(Some(_))
| ScalarValue::LargeUtf8(Some(_)),
) => match target_type {
DataType::Int8 => cast_or_else_return_none!(lit_value, DataType::Int8),
DataType::Int16 => cast_or_else_return_none!(lit_value, DataType::Int16),
DataType::Int32 => cast_or_else_return_none!(lit_value, DataType::Int32),
DataType::Int64 => cast_or_else_return_none!(lit_value, DataType::Int64),
DataType::UInt8 => cast_or_else_return_none!(lit_value, DataType::UInt8),
DataType::UInt16 => cast_or_else_return_none!(lit_value, DataType::UInt16),
DataType::UInt32 => cast_or_else_return_none!(lit_value, DataType::UInt32),
DataType::UInt64 => cast_or_else_return_none!(lit_value, DataType::UInt64),
_ => None,
},
_ => None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As it's written, and as the function is named, this looks like generic approach that will work many different source/target type pairs, we just need to add logic.
However, the exact checks necessary for soundness will likely differ.

For casts between exact numbers (ints and decimals) and string types, the #15110 (comment) looks like a sound plan.
Plus, we could also simplify impossible target literals like this

  • cast(an_int as string) = 'abc' or cast(an_int as string) = '007' to if(an_int is null, null, false)

For casts between numbers and numbers, we likely need to recognize max addressible values, and cast impreciseness, especially when we want to handle <, <=, >, >= operators.
https://github.com/trinodb/trino/blob/a870656f406b0f76e97c740659a58554d472994d/core/trino-main/src/main/java/io/trino/sql/planner/iterative/rule/UnwrapCastInComparison.java#L199-L354 might serve as a good reference. Note how it handles NaN, max values, lossy round-tripping, sometimes converting < to <= as necessary.

For e.g cast from timestamp to date, the logic will likely be different again
https://github.com/trinodb/trino/blob/a870656f406b0f76e97c740659a58554d472994d/core/trino-main/src/main/java/io/trino/sql/planner/iterative/rule/UnwrapCastInComparison.java#L357-L389.

With all this in mind, I'd suggest structuring this code so it's clear it addresses only the exact number and string case, by checking expr and literal types and delegating to a function that has "int" and "utf8" or "string" in the name.

@alan910127
Copy link
Contributor Author

As in only do the rewrite if the sequence

  • cast to int
  • cast (back) to string

results in the exact same string as went int

@alamb I found that this approach is different from how postgres and duckdb handle this situation:

  • SELECT * FROM t WHERE a = '0123' - they will cast the '0123' to an integer and it will match, so 1 row is returned.
  • SELECT * FROM t WHERE cast(a AS string) = '0123' - they will not cast the '0123', and this is a string comparison, so no rows are returned.
postgres
db=# create table t as select 123 a;
SELECT 1
db=# select * from t where a = '0123';
  a
-----
 123
(1 row)

db=# select * from t where cast(a AS text) = '0123';
 a
---
(0 rows)
duckdb
D create table t as select 123 a;
D select * from t where a = '0123';
┌───────┐
│   a   │
│ int32 │
├───────┤
│  123  │
└───────┘
D select * from t where cast(a AS string) = '0123';
┌────────┐
│   a    │
│ int32  │
├────────┤
│ 0 rows │
└────────┘

I'm not sure if it's only possible to handle it when the cast is created (type coercion?)

@alamb
Copy link
Contributor

alamb commented Mar 11, 2025

I'm not sure if it's only possible to handle it when the cast is created (type coercion?)

That may well be the right thing to do 🤔

It is funny that @scsmithr filed a ticket this morning with a similar symptom:

🤔

@findepi
Copy link
Member

findepi commented Mar 11, 2025

This cast unwrapping is valuable orthogonally to what coercions get applied by the frontend.

And it looks feasible -- the plan in #15110 (comment) IMO should work (for integer and string type pairs).

@scsmithr
Copy link
Contributor

scsmithr commented Mar 11, 2025

I'm not sure if it's only possible to handle it when the cast is created (type coercion?)

That may well be the right thing to do 🤔

It is funny that @scsmithr filed a ticket this morning with a similar symptom:

* [Incorrect cast of integer columns to utf8 when comparing with utf8 constant #15161](https://github.com/apache/datafusion/issues/15161)

🤔

For context, I was running some implicit cast queries on a few different systems to compare outputs, and hit the aforementioned issue with datafusion. "Unexpected" might be a better word than "incorrect", but datafusion is the outlier here.

My take on this:

I don't believe manipulating casts after-the-fact is the right approach.

Given the query select * from t1 where column1 < '10'; where column1 is an Int32, the literal '10' should be treated as unknown, and the preference should be to cast that. The generated predicate should then be column1 < cast('10' as int), and whatever const folding happens will then take care of parsing '10' as an int during planning.

Without a proper "unknown" type, I think it should be sufficient to rank implicitly casting to utf8 last compared to any other implicit cast.

@alan910127
Copy link
Contributor Author

@findepi sorry I didn't notice your comment and I just pushed a new version with the unwrapping logic deleted.

So you think the two optimizations should coexist?

And it looks feasible -- the plan in #15110 (comment) IMO should work

But we have no way to tell if the cast is set by the user or not in the unwrap cast function. Doesn't this matter?

@github-actions github-actions bot added the core Core DataFusion crate label Mar 11, 2025
@alamb
Copy link
Contributor

alamb commented Mar 11, 2025

I don't believe manipulating casts after-the-fact is the right approach.

Given the query select * from t1 where column1 < '10'; where column1 is an Int32, the literal '10' should be treated as unknown, and the preference should be to cast that. The generated predicate should then be column1 < cast('10' as int), and whatever const folding happens will then take care of parsing '10' as an int during planning.

I agree @scsmithr -- For the case in #15161 I agree that changing the coercion rules as you suggest is a better plan than trying to fix it up afterwards

However, I think the optimization in this PR is valuable for cases when the user has ended up with the casts some other way and DataFusion is now optimizing the expression

@findepi sorry I didn't notice your comment and I just pushed a new version with the unwrapping logic deleted.

So you think the two optimizations should coexist?

That is my understanding and also what I think should happen

And it looks feasible -- the plan in #15110 (comment) IMO should work

But we have no way to tell if the cast is set by the user or not in the unwrap cast function. Doesn't this matter?

I don't think the source of the cast matters (for this particular optimization)

@alan910127
Copy link
Contributor Author

I don't think the source of the cast matters (for this particular optimization)

Ah, yes. Since the string -> int -> string is checked and we're only checking equality, then they'll be the same. But I' m not sure if this unwrapping would happen after the string literal coercion.

That is my understanding and also what I think should happen

Got it, should I revert this PR to the unwrap cast optimization or directly combine both the optimization and the literal coercion in this PR?

@jayzhan211
Copy link
Contributor

I don't believe manipulating casts after-the-fact is the right approach.

Given the query select * from t1 where column1 < '10'; where column1 is an Int32, the literal '10' should be treated as unknown, and the preference should be to cast that. The generated predicate should then be column1 < cast('10' as int), and whatever const folding happens will then take care of parsing '10' as an int during planning.

Without a proper "unknown" type, I think it should be sufficient to rank implicitly casting to utf8 last compared to any other implicit cast.

Maybe we can figure out if it is possible to parse to the idea type for scalar, so '123' can be casted to i64 not str and '1.2' can be casted to float not str.

@jayzhan211
Copy link
Contributor

I try a bit and I think there are still some works to be done in how we parse the sql. #15202.

@jayzhan211
Copy link
Contributor

Actually I'm quite curious is string literal really an issue? If we want string, we can have query with quote select * from t1 where column1 < '10';, while if we want numeric, we can have query without quote select * from t1 where column1 < 10;. In practice, when is it beneficial to convert string literal into numeric type? 🤔

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To summarize the feedback so far:

I think this PR needs one more tests as suggested by @findepi #15110 (comment)

-- should not remove cast on string
explain select a from t where cast(a as string) = '0100';

Here is a suggestion on making that pass: #15110 (comment)

Maybe we can add @findepi 's analysis in as comments too

I recommend that we file separate tickets for future work (such as supporting other types).

Does this make sense @alan910127 ?

@alamb
Copy link
Contributor

alamb commented Mar 14, 2025

Actually I'm quite curious is string literal really an issue? If we want string, we can have query with quote select * from t1 where column1 < '10';, while if we want numeric, we can have query without quote select * from t1 where column1 < 10;. In practice, when is it beneficial to convert string literal into numeric type? 🤔

I think the core usecase is the one given on #15035

Specifcally when the user is comparing a numeric column to a string literal (date = '202401').

The date column and '202401' need to be converted to a common type, and converting the string to numeric is faster (and can be used to prune parquet predicates, for example)

@alan910127
Copy link
Contributor Author

Does this make sense @alan910127 ?

So we'll keep both the coercion and cast unwrapping optimizations in this PR, is that correct? I'm unsure about "support other types," do you mean in the unwrapping step?

Comment on lines 330 to 333
Operator::Eq | Operator::NotEq,
ScalarValue::Utf8(Some(_))
| ScalarValue::Utf8View(Some(_))
| ScalarValue::LargeUtf8(Some(_)),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems duplicated with the function in simplify_expressions, should I extract them into a utility function?

Copy link
Contributor Author

@alan910127 alan910127 Mar 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tho I was wondering If we should care about the operator, since both postgres and duckdb cast the literal regardless of the operator (they even emit an error when the literal cannot be casted)

we do not perform the round-trip casting verification here is also a concern

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @alan910127 -- I see this PR but I need to review it with a clear head. I plan to do so tomorrow morning.

It looks exciting in generla

let array = ScalarValue::to_array(lit_value).ok()?;
let casted =
arrow::compute::cast_with_options(&array, target_type, &opts).ok()?;
ScalarValue::try_from_array(&casted, 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason this code doesn't check for round-tripping as the other code in unwrap_cast.rs does?
Also, do we need to have the same/similar logic in two files?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@findepi thank you for the review. This file is focused on avoiding casts being applied to the columns, and I was trying to emulate the behavior of duckdb and postgres, where they always cast the literal and even throw an error if the literal cannot be casted. So I was wondering whether we should even take the operator into account here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO it should be enough to add logic in the other file.
What coercions get applied is the SQL dialect-specific, query planner-dependent.
What optimizations get applied -- is the common optimizer brain all datafusion users end up using.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you mean the "prevent cast" part should be removed, is that correct?

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @alan910127 for your patience. I am struggling to understand the implications of this PR and the changes -- while there may not be a large amount of code, I think what code there is is tricky and has potential correctness implications (e.g. @findepi 's comments #15035)

Thus I would like to request we break this PR into 2 smaller PRs for easier review and analysis:

  1. The changes for Avoid casting columns when comparing ints and strings #15035 (only to expr simplifier, not to type coercsion)
  2. The changes to type coercion (which I think handle Incorrect cast of integer columns to utf8 when comparing with utf8 constant #15161)

Perhaps you can revert the changes to type_coercsion.rs in this PR so we can get it merged and then make a follow on PR to address #15161 ?

THank you again so much for bearing with us

@alamb alamb dismissed their stale review March 25, 2025 20:40

PR now has more changes than when initially approved. Let's simplify and try again

@alan910127
Copy link
Contributor Author

Perhaps you can revert the changes to type_coercsion.rs in this PR so we can get it merged and then make a follow on PR to address #15161 ?

OK!

@github-actions github-actions bot removed the core Core DataFusion crate label Mar 26, 2025
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @alan910127 -- I think this is a really nice improvement. Thanks for sticking with it

@alan910127
Copy link
Contributor Author

@alamb @findepi Thanks for your reviews and helps!

@alamb
Copy link
Contributor

alamb commented Mar 27, 2025

🚀

@alamb alamb merged commit 26c0c49 into apache:main Mar 27, 2025
27 checks passed
@alan910127 alan910127 deleted the perf/unwrap_cast_for_int_eq_string branch March 28, 2025 19:25
qstommyshu pushed a commit to qstommyshu/datafusion that referenced this pull request Mar 28, 2025
* perf: unwrap cast for comparing ints =/!= strings

* fix: update casting logic

* test: add more unit test and new sqllogictest

* Tweak slt tests

* Revert "perf: unwrap cast for comparing ints =/!= strings"

This reverts commit 808d6ab.

* fix: eliminate column cast and cast literal before coercion

* fix: physical expr coercion test

* feat: unwrap cast after round-trip cast verification

* fix: unwrap cast on round-trip cast stable strings

* revert: remove avoid cast changes

* refactor: apply review suggestions

---------

Co-authored-by: Andrew Lamb <[email protected]>
nirnayroy pushed a commit to nirnayroy/datafusion that referenced this pull request May 2, 2025
* perf: unwrap cast for comparing ints =/!= strings

* fix: update casting logic

* test: add more unit test and new sqllogictest

* Tweak slt tests

* Revert "perf: unwrap cast for comparing ints =/!= strings"

This reverts commit 808d6ab.

* fix: eliminate column cast and cast literal before coercion

* fix: physical expr coercion test

* feat: unwrap cast after round-trip cast verification

* fix: unwrap cast on round-trip cast stable strings

* revert: remove avoid cast changes

* refactor: apply review suggestions

---------

Co-authored-by: Andrew Lamb <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
optimizer Optimizer rules sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Avoid casting columns when comparing ints and strings
5 participants