-
Notifications
You must be signed in to change notification settings - Fork 1.5k
fix!: incorrect coercion when comparing with string literals #15482
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -290,12 +290,31 @@ impl<'a> TypeCoercionRewriter<'a> { | |
right: Expr, | ||
right_schema: &DFSchema, | ||
) -> Result<(Expr, Expr)> { | ||
let (left_type, right_type) = BinaryTypeCoercer::new( | ||
&left.get_type(left_schema)?, | ||
&op, | ||
&right.get_type(right_schema)?, | ||
) | ||
.get_input_types()?; | ||
let left_type = left.get_type(left_schema)?; | ||
let right_type = right.get_type(right_schema)?; | ||
|
||
match (&left, &right) { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I am surprise this code is only triggered for literals -- I think coercion is supposed to happen based on data types not on expressions. Among other things, this code won't handle expressions ( I think we should change the base coercion rules for types There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think both are valid -- duckdb supports something like |
||
(left, Expr::Literal(ref lit_value)) => { | ||
if let Some(casted) = | ||
try_cast_literal_to_type_with_op(&left_type, op, lit_value)? | ||
{ | ||
return Ok((left.clone(), Expr::Literal(casted))); | ||
} | ||
} | ||
(Expr::Literal(ref lit_value), right) if op.swap().is_some() => { | ||
if let Some(casted) = try_cast_literal_to_type_with_op( | ||
&right_type, | ||
op.swap().unwrap(), | ||
lit_value, | ||
)? { | ||
return Ok((Expr::Literal(casted), right.clone())); | ||
} | ||
} | ||
_ => {} | ||
} | ||
|
||
let (left_type, right_type) = | ||
BinaryTypeCoercer::new(&left_type, &op, &right_type).get_input_types()?; | ||
|
||
Ok(( | ||
left.cast_to(&left_type, left_schema)?, | ||
|
@@ -304,6 +323,33 @@ impl<'a> TypeCoercionRewriter<'a> { | |
} | ||
} | ||
|
||
fn try_cast_literal_to_type_with_op( | ||
target_type: &DataType, | ||
op: Operator, | ||
lit_value: &ScalarValue, | ||
) -> Result<Option<ScalarValue>> { | ||
match (op, lit_value) { | ||
( | ||
op, | ||
ScalarValue::Utf8(_) | ScalarValue::Utf8View(_) | ScalarValue::LargeUtf8(_), | ||
) if op.supports_propagation() => { | ||
if target_type.is_numeric() { | ||
let Ok(casted) = lit_value.cast_to(target_type) else { | ||
return plan_err!( | ||
"Cannot coerce '{}' to type '{}'", | ||
lit_value, | ||
target_type | ||
); | ||
}; | ||
return Ok(Some(casted)); | ||
} | ||
|
||
Ok(None) | ||
} | ||
_ => Ok(None), | ||
} | ||
} | ||
|
||
impl TreeNodeRewriter for TypeCoercionRewriter<'_> { | ||
type Node = Expr; | ||
|
||
|
Original file line number | Diff line number | Diff line change | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
|
@@ -230,19 +230,19 @@ logical_plan TableScan: t projection=[a], full_filters=[t.a != Int32(100)] | |||||||||||
query TT | ||||||||||||
explain select a from t where a = '99999999999'; | ||||||||||||
---- | ||||||||||||
Comment on lines
230
to
232
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I imagine that instead of returning an empty plan, we should be expecting a runtime error here, something like
Suggested change
I do not see any instance in the sql logic tests that are actually expecting an empty plan upon an error. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think this is likely an EXPLAIN-related issue, but returning an error when no plan could be generated seems like a reasonable approach. I'm not entirely sure why it's implemented this way, maybe we should call for help 😆 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah, it is kind of wierd -- it would be nice to have at least some message in the explain plan if there was an error Perhaps we can file a ticket to track it I think we should update the test to just run the query (and expect an error) rather than There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. EDIT: @alamb > create table t as select CAST(123 AS int) a;
0 row(s) fetched.
Elapsed 0.021 seconds.
> select * from t where a = '9999999999';
type_coercion
caused by
Error during planning: Cannot coerce '9999999999' to type 'Int32'
> explain select * from t where a = '9999999999';
+---------------+-------------------------------+
| plan_type | plan |
+---------------+-------------------------------+
| physical_plan | ┌───────────────────────────┐ |
| | │ CoalesceBatchesExec │ |
| | │ -------------------- │ |
| | │ target_batch_size: │ |
| | │ 8192 │ |
| | └─────────────┬─────────────┘ |
| | ┌─────────────┴─────────────┐ |
| | │ FilterExec │ |
| | │ -------------------- │ |
| | │ predicate: │ |
| | │ a = 9999999999 │ |
| | └─────────────┬─────────────┘ |
| | ┌─────────────┴─────────────┐ |
| | │ DataSourceExec │ |
| | │ -------------------- │ |
| | │ bytes: 160 │ |
| | │ format: memory │ |
| | │ rows: 1 │ |
| | └───────────────────────────┘ |
| | |
+---------------+-------------------------------+
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. (in case anyone else find this, you can get the full plan via |
||||||||||||
logical_plan TableScan: t projection=[a], full_filters=[CAST(t.a AS Utf8) = Utf8("99999999999")] | ||||||||||||
|
||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. why is there no plan? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I thought it's because it returns a There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I can confirm that Postgres is also not able to plan for this query for the same reason: link to Postgres fiddle There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Let's file a ticket about the explain plan being missing |
||||||||||||
|
||||||||||||
# The predicate should still have the column cast when the value is a NOT valid i32 | ||||||||||||
query TT | ||||||||||||
explain select a from t where a = '99.99'; | ||||||||||||
---- | ||||||||||||
logical_plan TableScan: t projection=[a], full_filters=[CAST(t.a AS Utf8) = Utf8("99.99")] | ||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think the comments in this file are now out of date -- so perhaps we can update them |
||||||||||||
|
||||||||||||
|
||||||||||||
# The predicate should still have the column cast when the value is a NOT valid i32 | ||||||||||||
query TT | ||||||||||||
explain select a from t where a = ''; | ||||||||||||
---- | ||||||||||||
logical_plan TableScan: t projection=[a], full_filters=[CAST(t.a AS Utf8) = Utf8("")] | ||||||||||||
|
||||||||||||
|
||||||||||||
# The predicate should not have a column cast when the operator is = or != and the literal can be round-trip casted without losing information. | ||||||||||||
query TT | ||||||||||||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure if this test is still needed since the literal casting behavior is not considered an "optimization"