Skip to content

Change flatten so it does only a level, not recursively #15160

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Apr 17, 2025

Conversation

delamarch3
Copy link
Contributor

Which issue does this PR close?

Rationale for this change

Parity with the flatten implementation in duckdb.

What changes are included in this PR?

Remove the recursion in flatten_internal so that only the top level elements are flattened.

Are these changes tested?

Existing sqllogictests have been updated.

Are there any user-facing changes?

Yes, flatten no longer recursively flattens the array

@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Mar 11, 2025
@alamb alamb requested a review from findepi March 11, 2025 18:56
_ => arg_types[0].clone(),
},
LargeList(field) => match field.data_type() {
LargeList(field) => LargeList(Arc::clone(field)),
Copy link
Member

@Weijun-H Weijun-H Mar 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It deserves to return LargeList, if the nested data is List

Suggested change
LargeList(field) => LargeList(Arc::clone(field)),
List(field) | FixedSizeList(field, _) | LargeList(field) => {
LargeList(Arc::clone(field))
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ok, I'll need to make some changes to support this. Currently trying to flatten LargeList(List) will fail when casting the inner list to generic list with i64 offset (using the type parameter O)

@@ -77,7 +77,6 @@ impl Flatten {
pub fn new() -> Self {
Self {
signature: Signature {
// TODO (https://github.com/apache/datafusion/issues/13757) flatten should be single-step, not recursive
type_signature: TypeSignature::ArraySignature(
ArrayFunctionSignature::RecursiveArray,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this signature still applicable?

Maybe we should switch to Array, and mark RecursiveArray as deprecated?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would the deprecated note be since, from what I understand, this was added in specifically for flatten to recursively coerce FixedLengthList to List. I'm wondering if any users would rely on that downstream

@delamarch3 delamarch3 requested review from findepi and Weijun-H March 15, 2025 15:31
@cht42
Copy link
Contributor

cht42 commented Apr 2, 2025

Hey @delamarch3 are you still tracking merging this ? interested in this new behavior

@delamarch3
Copy link
Contributor Author

Hi @cht42, I'm just waiting for another review.

@findepi could you take another look at this when you get the chance please?

@cht42
Copy link
Contributor

cht42 commented Apr 9, 2025

Maybe @alamb or @Weijun-H can review ?

Copy link
Member

@Weijun-H Weijun-H left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @delamarch3 👍 , I think the pr looks good. Waiting for other comments.

Copy link
Contributor

@cht42 cht42 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @delamarch3 , noticed that null handling is not managed properly in the current implementation. Might as well fix it here directly

let list_arr = as_list_array(&array)?;
let flattened_array = flatten_internal::<i32>(list_arr.clone(), None)?;
Ok(Arc::new(flattened_array) as ArrayRef)
let (field, offsets, values, _) = as_list_array(&array)?.clone().into_parts();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let (field, offsets, values, _) = as_list_array(&array)?.clone().into_parts();
let (field, offsets, values, nulls) = as_list_array(&array)?.clone().into_parts();

inner_field,
offsets,
inner_values,
None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
None,
nulls,

let list_arr = as_large_list_array(&array)?;
let flattened_array = flatten_internal::<i64>(list_arr.clone(), None)?;
Ok(Arc::new(flattened_array) as ArrayRef)
let (field, offsets, values, _) =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let (field, offsets, values, _) =
let (field, offsets, values, nulls) =

inner_field,
offsets,
inner_values,
None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
None,
nulls,

Ok(Arc::new(flattened_array) as ArrayRef)
}
LargeList(_) => {
let (inner_field, inner_offsets, inner_values, _) =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let (inner_field, inner_offsets, inner_values, _) =
let (inner_field, inner_offsets, inner_values, nulls) =

inner_field,
offsets,
inner_values,
None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
None,
nulls,

@delamarch3
Copy link
Contributor Author

Thanks for the reviews!

@cht42
Copy link
Contributor

cht42 commented Apr 16, 2025

@alamb can we get this merged please ?

@alamb alamb changed the title Flatten array in a single step instead of recursive Change flatten so it does only a level, not recursively Apr 17, 2025
@alamb alamb added the api change Changes the API exposed to users of the crate label Apr 17, 2025
@alamb
Copy link
Contributor

alamb commented Apr 17, 2025

Thanks @cht42 @delamarch3 and @Weijun-H -- I reviewed this PR and marked it as an API change. I think it looks good and does what is requested

Thanks!

@alamb alamb merged commit a0b0221 into apache:main Apr 17, 2025
28 checks passed
nirnayroy pushed a commit to nirnayroy/datafusion that referenced this pull request May 2, 2025
* flatten array in a single step instead of recursive

* clippy

* update flatten type signature to Array

* add fixed list to list coercion to flatten signature

* support LargeList(List) and LargeList(FixedSizeList) in flatten

* add test for LargeList(FixedSizeList)

* handle nulls

* uncomment flatten(NULL) test - it already works
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api change Changes the API exposed to users of the crate sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

flatten should be single-step, not recursive
5 participants