Closed
Labels
bug (Something isn't working), regression (Something that used to work no longer does)
Description
Describe the bug
When executing a hash join with multiple join keys where one column is dictionary-encoded with fewer unique values than rows, DataFusion panics with:
InvalidArgumentError("Incorrect array length for StructArray field \"c1\", expected N got M")
To Reproduce
-- Small table with dictionary-encoded region (2 rows, 1 unique value)
CREATE TABLE small AS
SELECT id, arrow_cast(region, 'Dictionary(Int32, Utf8)') as region
FROM (VALUES (1, 'west'), (2, 'west')) AS t(id, region);
CREATE TABLE large AS
SELECT id, region, value
FROM (VALUES (1, 'west', 100), (2, 'west', 200), (3, 'east', 300)) AS t(id, region, value);
-- Multi-column join triggers panic
SELECT s.id, s.region, l.value
FROM small s
JOIN large l ON s.id = l.id AND s.region = l.region;
Expected behavior
Query returns 2 rows:
+----+--------+-------+
| id | region | value |
+----+--------+-------+
| 1  | west   | 100   |
| 2  | west   | 200   |
+----+--------+-------+
Actual behavior
Panic:
thread 'main' panicked at arrow-array/src/array/struct_array.rs:91:46:
called `Result::unwrap()` on an `Err` value: InvalidArgumentError("Incorrect array length for StructArray field \"c1\", expected 3 got 2")
Root cause
In flatten_dictionary_array introduced by #18393:
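fn flatten_dictionary_array(array: &ArrayRef) -> ArrayRef {
    downcast_dictionary_array! {
        array => {
            flatten_dictionary_array(array.values())
        }
        _ => Arc::clone(array)
    }
}
The function calls array.values(), which returns only the dictionary's distinct values (one entry per dictionary entry), not the values expanded to one entry per row. The flattened array is therefore shorter than the other join-key columns whenever the dictionary has fewer unique values than rows.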
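To make the length difference concrete, here is a minimal std-only sketch. `DictColumn`, `values_only`, `expand`, and `small_region` are hypothetical names for illustration; this is a simplified model of dictionary encoding, not the arrow-rs `DictionaryArray` API, and not necessarily how the actual fix is implemented.

```rust
/// Simplified stand-in for a dictionary-encoded column (NOT the arrow-rs type):
/// `keys[i]` is the index into `values` for row `i`.
struct DictColumn {
    keys: Vec<usize>,
    values: Vec<String>,
}

impl DictColumn {
    /// Logical length of the column: one entry per row.
    fn len(&self) -> usize {
        self.keys.len()
    }

    /// Models taking only the dictionary's distinct values,
    /// which drops the per-row keys entirely.
    fn values_only(&self) -> Vec<String> {
        self.values.clone()
    }

    /// Row-aligned expansion: look up each key so the output
    /// has exactly one value per row.
    fn expand(&self) -> Vec<String> {
        self.keys.iter().map(|&k| self.values[k].clone()).collect()
    }
}

/// Models the `small.region` column from the reproduction:
/// 2 rows that both point at a single distinct value.
fn small_region() -> DictColumn {
    DictColumn {
        keys: vec![0, 0],
        values: vec!["west".to_string()],
    }
}
```

With `let col = small_region();`, `col.len()` and `col.expand().len()` agree, while `col.values_only().len()` is smaller. Only the expanded form can sit next to a plain per-row column such as `id`.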
When building a StructArray for multi-column join keys, StructArray::try_new_with_length() detects the length mismatch:
if a.len() != len {
    return Err(ArrowError::InvalidArgumentError(format!(
        "Incorrect array length for StructArray field {:?}, expected {} got {}",
        f.name(), len, a.len()
    )));
}
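The effect of that check can be sketched without Arrow at all. `check_field_lengths` below is a hypothetical helper that models each field as a `(name, length)` pair and reuses the message format quoted above. The field name `c0` and the lengths are illustrative assumptions, not values taken from the reported panic.

```rust
/// Hypothetical, std-only model of a struct column's length validation.
/// Every field must have exactly `len` rows; the first mismatch is reported.
/// This mirrors the quoted check but is not arrow-rs code.
fn check_field_lengths(fields: &[(&str, usize)], len: usize) -> Result<(), String> {
    for (name, field_len) in fields {
        if *field_len != len {
            return Err(format!(
                "Incorrect array length for StructArray field {:?}, expected {} got {}",
                name, len, field_len
            ));
        }
    }
    Ok(())
}
```

For example, `check_field_lengths(&[("c0", 2), ("c1", 1)], 2)` returns an error naming `c1`, which is the shape of failure you get when one join key was reduced to its dictionary values. `check_field_lengths(&[("c0", 2), ("c1", 2)], 2)` succeeds once every key column is row-aligned.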