Skip to content

adding support for Min/Max over LargeList and FixedSizeList #16071

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

logan-keede
Copy link
Contributor

@logan-keede logan-keede commented May 16, 2025

Which issue does this PR close?

Rationale for this change

Using #16025 as reference, adding support and relevant tests for LargeList and FixedSizeList

What changes are included in this PR?

Some tests, and some code to include LargeList and FixedSizeList in generic function.

Are these changes tested?

Yes, BY CI

Are there any user-facing changes?

Yes, min/max is now supported for LargeList and FixedSizeList.

@github-actions github-actions bot added optimizer Optimizer rules sqllogictest SQL Logic Tests (.slt) common Related to common crate functions Changes to functions implementation labels May 16, 2025
@logan-keede logan-keede marked this pull request as ready for review May 16, 2025 19:38
Copy link
Contributor

@gabotechs gabotechs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice that this was shipped so fast, thanks!

Copy link
Contributor

@gabotechs gabotechs May 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really nice to get this tested! maybe it's a bit overkill to add dedicated .slt files though. Note that the core logic of the Min/Max function over lists is already tested in

{a: 1, b: 2, c: 3} {a: 1, b: 2, c: 4}
# Min/Max with list over integers
query ??
SELECT MIN(column1), MAX(column1) FROM VALUES
([1, 2, 3]),
([1, 2]);
----
[1, 2] [1, 2, 3]
# Min/Max with lists over strings
query ??
SELECT MIN(column1), MAX(column1) FROM VALUES
(['a', 'b', 'c']),
(['a', 'b']);
----
[a, b] [a, b, c]
# Min/Max with list over booleans
query ??
SELECT MIN(column1), MAX(column1) FROM VALUES
([true, true, false]),
([false, true]);
----
[false, true] [true, true, false]
# Min/Max with list over nullable integers
query ??
SELECT MIN(column1), MAX(column1) FROM VALUES
([NULL, 1, 2]),
([1, 2]);
----
[1, 2] [NULL, 1, 2]
# Min/Max list with different lengths and nulls
query ??
SELECT MIN(column1), MAX(column1) FROM VALUES
([1, NULL, 3]),
([1, 2, 3, 4]),
([1, 2]);
----
[1, 2] [1, NULL, 3]
# Min/Max list with only NULLs
query ??
SELECT MIN(column1), MAX(column1) FROM VALUES
([NULL, NULL]),
([NULL]);
----
[NULL] [NULL, NULL]
# Min/Max list with empty lists
query ??
SELECT MIN(column1), MAX(column1) FROM VALUES
([]),
([1]),
([]);
----
[] [1]
# Min/Max list of varying types (integers and NULLs)
query ??
SELECT MIN(column1), MAX(column1) FROM VALUES
([1, 2, 3]),
([NULL, 2, 3]),
([1, 2, NULL]);
----
[1, 2, 3] [NULL, 2, 3]
# Min/Max list grouped by key with NULLs and differing lengths
query I?? rowsort
SELECT column1, MIN(column2), MAX(column2) FROM VALUES
(0, [1, NULL, 3]),
(0, [1, 2, 3, 4]),
(1, [1, 2]),
(1, [NULL, 5]),
(1, [])
GROUP BY column1;
----
0 [1, 2, 3, 4] [1, NULL, 3]
1 [] [NULL, 5]
# Min/Max list grouped by key with NULLs and differing lengths
query I?? rowsort
SELECT column1, MIN(column2), MAX(column2) FROM VALUES
(0, [NULL]),
(0, [NULL, NULL]),
(1, [NULL])
GROUP BY column1;
----
0 [NULL] [NULL, NULL]
1 [NULL] [NULL]
# Min/Max grouped list with empty and non-empty
query I?? rowsort
SELECT column1, MIN(column2), MAX(column2) FROM VALUES
(0, []),
(0, [1]),
(0, []),
(1, [5, 6]),
(1, [])
GROUP BY column1;
----
0 [] [1]
1 [] [5, 6]
# Min/Max over lists with a window function
query ?
SELECT min(column1) OVER (ORDER BY column1) FROM VALUES
([1, 2, 3]),
([1, 2, 3]),
([2, 3])
----
[1, 2, 3]
[1, 2, 3]
[1, 2, 3]
# Min/Max over lists with a window function and nulls
query ?
SELECT min(column1) OVER (ORDER BY column1) FROM VALUES
(NULL),
([4, 5]),
([2, 3])
----
[2, 3]
[2, 3]
[2, 3]
# Min/Max over lists with a window function, nulls and ROWS BETWEEN statement
query ?
SELECT min(column1) OVER (ORDER BY column1 ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) FROM VALUES
(NULL),
([4, 5]),
([2, 3])
----
[2, 3]
[2, 3]
[4, 5]
# Min/Max over lists with a window function using a different column
query ?
SELECT max(column2) OVER (ORDER BY column1 ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) FROM VALUES
([1, 2, 3], [4, 5]),
([2, 3], [2, 3]),
([1, 2, 3], NULL)
----
[4, 5]
[4, 5]
[2, 3]

IMO it should be enough to add a small set of basic tests at the end of aggregate.slt proving that the new FixedSizeList and LargeList types are supported, and rely on the previous vanilla List tests for checking that the core logic works. Otherwise, it will be unclear if new tests to the Min/Max aggregations over lists logic should also be replicated for all list variants.

WDYT? would also be curious to know the stance of other contributors

Copy link
Contributor Author

@logan-keede logan-keede May 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice that this was shipped so fast, thanks!

I did not do anything, Everything was already done!

Really nice to get this tested! maybe it's a bit overkill to add dedicated .slt files though. Note that the core logic of the Min/Max function over lists is already tested in

{a: 1, b: 2, c: 3} {a: 1, b: 2, c: 4}
# Min/Max with list over integers
query ??
SELECT MIN(column1), MAX(column1) FROM VALUES
([1, 2, 3]),
([1, 2]);
----
[1, 2] [1, 2, 3]
# Min/Max with lists over strings
query ??
SELECT MIN(column1), MAX(column1) FROM VALUES
(['a', 'b', 'c']),
(['a', 'b']);
----
[a, b] [a, b, c]
# Min/Max with list over booleans
query ??
SELECT MIN(column1), MAX(column1) FROM VALUES
([true, true, false]),
([false, true]);
----
[false, true] [true, true, false]
# Min/Max with list over nullable integers
query ??
SELECT MIN(column1), MAX(column1) FROM VALUES
([NULL, 1, 2]),
([1, 2]);
----
[1, 2] [NULL, 1, 2]
# Min/Max list with different lengths and nulls
query ??
SELECT MIN(column1), MAX(column1) FROM VALUES
([1, NULL, 3]),
([1, 2, 3, 4]),
([1, 2]);
----
[1, 2] [1, NULL, 3]
# Min/Max list with only NULLs
query ??
SELECT MIN(column1), MAX(column1) FROM VALUES
([NULL, NULL]),
([NULL]);
----
[NULL] [NULL, NULL]
# Min/Max list with empty lists
query ??
SELECT MIN(column1), MAX(column1) FROM VALUES
([]),
([1]),
([]);
----
[] [1]
# Min/Max list of varying types (integers and NULLs)
query ??
SELECT MIN(column1), MAX(column1) FROM VALUES
([1, 2, 3]),
([NULL, 2, 3]),
([1, 2, NULL]);
----
[1, 2, 3] [NULL, 2, 3]
# Min/Max list grouped by key with NULLs and differing lengths
query I?? rowsort
SELECT column1, MIN(column2), MAX(column2) FROM VALUES
(0, [1, NULL, 3]),
(0, [1, 2, 3, 4]),
(1, [1, 2]),
(1, [NULL, 5]),
(1, [])
GROUP BY column1;
----
0 [1, 2, 3, 4] [1, NULL, 3]
1 [] [NULL, 5]
# Min/Max list grouped by key with NULLs and differing lengths
query I?? rowsort
SELECT column1, MIN(column2), MAX(column2) FROM VALUES
(0, [NULL]),
(0, [NULL, NULL]),
(1, [NULL])
GROUP BY column1;
----
0 [NULL] [NULL, NULL]
1 [NULL] [NULL]
# Min/Max grouped list with empty and non-empty
query I?? rowsort
SELECT column1, MIN(column2), MAX(column2) FROM VALUES
(0, []),
(0, [1]),
(0, []),
(1, [5, 6]),
(1, [])
GROUP BY column1;
----
0 [] [1]
1 [] [5, 6]
# Min/Max over lists with a window function
query ?
SELECT min(column1) OVER (ORDER BY column1) FROM VALUES
([1, 2, 3]),
([1, 2, 3]),
([2, 3])
----
[1, 2, 3]
[1, 2, 3]
[1, 2, 3]
# Min/Max over lists with a window function and nulls
query ?
SELECT min(column1) OVER (ORDER BY column1) FROM VALUES
(NULL),
([4, 5]),
([2, 3])
----
[2, 3]
[2, 3]
[2, 3]
# Min/Max over lists with a window function, nulls and ROWS BETWEEN statement
query ?
SELECT min(column1) OVER (ORDER BY column1 ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) FROM VALUES
(NULL),
([4, 5]),
([2, 3])
----
[2, 3]
[2, 3]
[4, 5]
# Min/Max over lists with a window function using a different column
query ?
SELECT max(column2) OVER (ORDER BY column1 ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) FROM VALUES
([1, 2, 3], [4, 5]),
([2, 3], [2, 3]),
([1, 2, 3], NULL)
----
[4, 5]
[4, 5]
[2, 3]

IMO it should be enough to add a small set of basic tests at the end of aggregate.slt proving that the new FixedSizeList and LargeList types are supported, and rely on the previous vanilla List tests for checking that the core logic works. Otherwise, it will be unclear if new tests to the Min/Max aggregations over lists logic should also be replicated for all list variants.

WDYT? would also be curious to know the stance of other contributors

If you consider that we are actually using a generic logic under the hood then yes, but if we think from the perspective of black-box testing then perhaps it is more prudent to have complete test suite.
(edit:- or we can just leave a note that code path is currently same, and developer/reviewer should account for that.)

on a side note, I would like to avoid adding any more test to aggregate.slt. see this #13723

it already gives 132 error on my laptop(configuration issues that can be fixed,not an actual problem) but yeah it is a mess to parse through.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
common Related to common crate functions Changes to functions implementation optimizer Optimizer rules sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support for Min/Max over LargeList and FixedSizeList
2 participants