Change `flatten` so it does only a level, not recursively #15160
    _ => arg_types[0].clone(),
},
LargeList(field) => match field.data_type() {
    LargeList(field) => LargeList(Arc::clone(field)),
It should return `LargeList` if the nested data is `List`.
Suggested change:
- LargeList(field) => LargeList(Arc::clone(field)),
+ List(field) | FixedSizeList(field, _) | LargeList(field) => {
+     LargeList(Arc::clone(field))
+ }
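As a rough illustration of the suggested rule, here is a minimal sketch using a toy type enum. `Ty` and `flatten_return_type` are hypothetical names invented for this example and are not the arrow-rs `DataType` or the DataFusion implementation; the sketch only models the `LargeList` outer case shown in the diff and assumes every other argument type is returned unchanged.

```rust
// Toy stand-in for Arrow's DataType (hypothetical, not the arrow-rs enum).
#[derive(Debug, Clone, PartialEq)]
enum Ty {
    Int64,
    List(Box<Ty>),
    LargeList(Box<Ty>),
    FixedSizeList(Box<Ty>, i32),
}

/// Return type of a single-level flatten when the outer type is `LargeList`.
/// Any list-like inner type yields `LargeList` of the inner element type.
/// Assumption: all other inputs are returned unchanged.
fn flatten_return_type(arg: &Ty) -> Ty {
    match arg {
        Ty::LargeList(inner) => match inner.as_ref() {
            Ty::List(elem) | Ty::FixedSizeList(elem, _) | Ty::LargeList(elem) => {
                Ty::LargeList(elem.clone())
            }
            // Inner type is not a list: nothing to flatten.
            _ => arg.clone(),
        },
        _ => arg.clone(),
    }
}
```

With this rule, `LargeList(List(Int64))` and `LargeList(FixedSizeList(Int64, 2))` both map to `LargeList(Int64)`, so the wider offset type of the outer list is preserved.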
Ah ok, I'll need to make some changes to support this. Currently, trying to flatten `LargeList(List)` will fail when casting the inner list to a generic list with an i64 offset (using the type parameter `O`).
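One way to think about the mixed-offset case is sketched below, under the assumption that a list array can be modeled as a plain offsets slice where row `i` spans `offsets[i]..offsets[i + 1]`. The function name `flatten_offsets_large` is hypothetical and this is not the code from the PR: it composes i64 outer offsets with i32 inner offsets and widens the result to i64, so no cast of the inner list is needed up front.

```rust
/// Compose the offsets of a `LargeList` outer (i64) with those of a `List`
/// inner (i32). Each outer offset indexes into the inner offsets; the value
/// found there is widened to i64 so the result can back a `LargeList`.
/// Assumption: outer offsets are non-negative and in bounds.
fn flatten_offsets_large(outer: &[i64], inner: &[i32]) -> Vec<i64> {
    outer
        .iter()
        .map(|&o| i64::from(inner[o as usize]))
        .collect()
}
```

For example, outer offsets `[0, 2, 3]` over inner offsets `[0, 2, 3, 6]` give `[0, 3, 6]`: the first row now covers leaf values 0..3 and the second covers 3..6.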
@@ -77,7 +77,6 @@ impl Flatten {
    pub fn new() -> Self {
        Self {
            signature: Signature {
                // TODO (https://github.com/apache/datafusion/issues/13757) flatten should be single-step, not recursive
                type_signature: TypeSignature::ArraySignature(
                    ArrayFunctionSignature::RecursiveArray,
Is this signature still applicable?
Maybe we should switch to Array, and mark RecursiveArray as deprecated?
What would the deprecation note be? From what I understand, this was added specifically for `flatten` to recursively coerce `FixedSizeList` to `List`. I'm wondering if any users rely on that downstream.
Hey @delamarch3, are you still tracking merging this? I'm interested in this new behavior.
Thanks @delamarch3 👍, I think the PR looks good. Waiting for other comments.
Hey @delamarch3, I noticed that nulls are not handled properly in the current implementation. Might as well fix it here directly.
let list_arr = as_list_array(&array)?;
let flattened_array = flatten_internal::<i32>(list_arr.clone(), None)?;
Ok(Arc::new(flattened_array) as ArrayRef)
let (field, offsets, values, _) = as_list_array(&array)?.clone().into_parts();
Suggested change:
- let (field, offsets, values, _) = as_list_array(&array)?.clone().into_parts();
+ let (field, offsets, values, nulls) = as_list_array(&array)?.clone().into_parts();
inner_field,
offsets,
inner_values,
None,
Suggested change:
- None,
+ nulls,
let list_arr = as_large_list_array(&array)?;
let flattened_array = flatten_internal::<i64>(list_arr.clone(), None)?;
Ok(Arc::new(flattened_array) as ArrayRef)
let (field, offsets, values, _) =
Suggested change:
- let (field, offsets, values, _) =
+ let (field, offsets, values, nulls) =
inner_field,
offsets,
inner_values,
None,
Suggested change:
- None,
+ nulls,
    Ok(Arc::new(flattened_array) as ArrayRef)
}
LargeList(_) => {
    let (inner_field, inner_offsets, inner_values, _) =
Suggested change:
- let (inner_field, inner_offsets, inner_values, _) =
+ let (inner_field, inner_offsets, inner_values, nulls) =
inner_field,
offsets,
inner_values,
None,
Suggested change:
- None,
+ nulls,
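To show why the null buffer matters when rebuilding the flattened array, here is a minimal sketch with a toy column type. `ListColumn`, `is_null`, and `flatten_keep_nulls` are hypothetical names for illustration only, not arrow-rs types; `validity[i] == true` means row `i` is valid, mirroring Arrow's convention, and a missing validity vector means no row is null.

```rust
/// Toy list column (hypothetical): row i spans offsets[i]..offsets[i + 1].
struct ListColumn {
    offsets: Vec<usize>,
    /// `Some(v)` with `v[i] == false` marks row i as null; `None` means no nulls.
    validity: Option<Vec<bool>>,
}

impl ListColumn {
    fn is_null(&self, row: usize) -> bool {
        self.validity.as_ref().map_or(false, |v| !v[row])
    }
}

/// Flatten one level by composing offsets, carrying the outer validity along.
/// Passing `None` here instead would silently turn null rows into empty lists.
fn flatten_keep_nulls(outer: &ListColumn, inner_offsets: &[usize]) -> ListColumn {
    ListColumn {
        offsets: outer.offsets.iter().map(|&o| inner_offsets[o]).collect(),
        validity: outer.validity.clone(),
    }
}
```

In this model, a column whose second row is null keeps that row null after flattening, whereas rebuilding it with `validity: None` would report the same row as a valid, empty list.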
Thanks for the reviews!
@alamb, can we get this merged please?
Thanks @cht42 @delamarch3 and @Weijun-H -- I reviewed this PR and marked it as an API change. I think it looks good and does what is requested. Thanks!
* flatten array in a single step instead of recursive
* clippy
* update flatten type signature to Array
* add fixed list to list coercion to flatten signature
* support LargeList(List) and LargeList(FixedSizeList) in flatten
* add test for LargeList(FixedSizeList)
* handle nulls
* uncomment flatten(NULL) test - it already works
Which issue does this PR close?
`flatten` should be single-step, not recursive #13757
Rationale for this change
Parity with the `flatten` implementation in DuckDB.
What changes are included in this PR?
Remove the recursion in `flatten_internal` so that only the top-level elements are flattened.
Are these changes tested?
Existing sqllogictests have been updated.
Are there any user-facing changes?
Yes, `flatten` no longer recursively flattens the array.
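The user-facing difference can be illustrated with nested `Vec`s standing in for list arrays. This is a sketch of the semantics only, not the DataFusion implementation, and `flatten_one_level` is a hypothetical name: each row loses exactly one level of nesting, and any deeper nesting in the element type `T` is left intact.

```rust
/// Flatten exactly one level of nesting inside each row.
/// Each row is a list of lists; the result row is the concatenation of
/// those inner lists. If `T` is itself a list, it is not flattened further.
fn flatten_one_level<T>(rows: Vec<Vec<Vec<T>>>) -> Vec<Vec<T>> {
    rows.into_iter()
        .map(|row| row.into_iter().flatten().collect())
        .collect()
}
```

For a row `[[1, 2], [3]]` this yields `[1, 2, 3]`. For a row `[[[1, 2]], [[3]]]` it yields `[[1, 2], [3]]`, whereas a recursive flatten would have produced `[1, 2, 3]`.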