
feat: support array_repeat #1680


Merged
merged 4 commits into from
Apr 29, 2025


comphead
Contributor

Which issue does this PR close?

Replaces #1205.

Closes #1347

Rationale for this change

What changes are included in this PR?

How are these changes tested?

@comphead
Contributor Author

Now the problem: DataFusion (DF) and Spark return different values when count is null.
DF returns an empty array:

> select array_repeat(null, arrow_cast(null, 'Int32'));
+---------------------------------------------------+
| array_repeat(NULL,arrow_cast(NULL,Utf8("Int32"))) |
+---------------------------------------------------+
| []                                                |
+---------------------------------------------------+

Spark returns NULL:

scala> spark.sql("select array_repeat(1, null)").show(false)
+---------------------+
|array_repeat(1, NULL)|
+---------------------+
|NULL                 |
+---------------------+

.map(|x| *x as usize)
.collect::<Vec<_>>();

let mut nulls = NullBufferBuilder::new(count_array.len());
Contributor Author


This is the actual fix: build a nulls buffer so that the result is null when the count is null.
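The fix relies on Arrow's validity bitmap: one bit per output row, set when the row is valid. The `ValidityBuilder` below is a hypothetical, simplified stand-in for arrow-rs's `NullBufferBuilder` (it is not that type's implementation). It packs bits least-significant-bit first, as the Arrow columnar format specifies, and appends one "is the count non-null" bit per row, the same idea as `nulls.append(!count_array.is_null(row_index))` in the diff.

```rust
/// Hypothetical, simplified stand-in for Arrow's `NullBufferBuilder`:
/// one bit per row, 1 = valid, packed LSB-first into bytes.
pub struct ValidityBuilder {
    bytes: Vec<u8>,
    len: usize,
    null_count: usize,
}

impl ValidityBuilder {
    pub fn new(capacity: usize) -> Self {
        // Reserve enough bytes for `capacity` bits.
        Self { bytes: Vec::with_capacity((capacity + 7) / 8), len: 0, null_count: 0 }
    }

    /// Append one row; `is_valid == false` marks the row as NULL.
    pub fn append(&mut self, is_valid: bool) {
        if self.len % 8 == 0 {
            self.bytes.push(0); // start a new byte every 8 rows
        }
        if is_valid {
            self.bytes[self.len / 8] |= 1 << (self.len % 8);
        } else {
            self.null_count += 1;
        }
        self.len += 1;
    }

    /// Returns (packed bitmap, null count). The Arrow format allows omitting
    /// the bitmap when there are no nulls; this sketch always returns it.
    pub fn finish(self) -> (Vec<u8>, usize) {
        (self.bytes, self.null_count)
    }
}

/// Build the validity of an `array_repeat` result from its count column:
/// a NULL count makes the corresponding output row NULL.
pub fn validity_from_counts(counts: &[Option<i32>]) -> (Vec<u8>, usize) {
    let mut nulls = ValidityBuilder::new(counts.len());
    for count in counts {
        nulls.append(count.is_some());
    }
    nulls.finish()
}
```

For counts `[Some(2), None, Some(0)]` this yields the bitmap byte `0b101` with one null: row 1 is NULL, while the zero-count row 2 stays valid.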

@comphead comphead requested a review from Copilot April 27, 2025 17:27

@Copilot Copilot AI left a comment


Pull Request Overview

This PR introduces support for the "array_repeat" scalar function. Key changes include:

  • Adding the new function implementation in a dedicated module (array_repeat).
  • Updating the scalar function registry to include "array_repeat".
  • Enhancing integration tests to verify the new function behavior.

Reviewed Changes

Copilot reviewed 5 out of 8 changed files in this pull request and generated no comments.

Summary per file:
  • native/spark-expr/src/comet_scalar_funcs.rs: Added registration for the "array_repeat" UDF in the scalar functions.
  • native/spark-expr/src/array_funcs/mod.rs: Included the new array_repeat module and its public export.
  • native/spark-expr/src/array_funcs/array_repeat.rs: Implements the logic for the "array_repeat" function.
  • native/core/src/execution/planner.rs: Added tests to validate the behavior of "array_repeat".
  • native/core/src/execution/operators/scan.rs: Minor documentation formatting updates.
Files not reviewed (3)
  • spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: Language not supported
  • spark/src/main/scala/org/apache/comet/serde/arrays.scala: Language not supported
  • spark/src/test/scala/org/apache/comet/CometArrayExpressionSuite.scala: Language not supported
Comments suppressed due to low confidence (1)

native/core/src/execution/planner.rs:2957

  • Consider adding additional test cases for array_repeat where the input column is of list type, as current tests only verify behavior for non-list (scalar) arrays.
fn test_array_repeat() {
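To see why list-typed input deserves its own test, consider the output layout when each element is itself a list: the result is a list of lists, so two levels of offsets must be built. This is a minimal sketch under stated assumptions: the input is given as Arrow-style `offsets`/`values` for a non-null `List<Int32>` column, counts are non-null and non-negative, and `repeat_list_rows` and `NestedLists` are hypothetical names, not Comet code.

```rust
/// Result of repeating list-typed rows: a List<List<i32>> in Arrow-style layout.
#[derive(Debug, PartialEq)]
pub struct NestedLists {
    pub outer_offsets: Vec<i32>, // row i spans inner lists outer_offsets[i]..outer_offsets[i + 1]
    pub inner_offsets: Vec<i32>, // inner list j spans values inner_offsets[j]..inner_offsets[j + 1]
    pub values: Vec<i32>,
}

/// Hypothetical helper: repeat row `i` of a List<Int32> column `counts[i]` times.
/// Assumes no NULL rows, no NULL counts, and `offsets.len() == counts.len() + 1`.
pub fn repeat_list_rows(offsets: &[i32], values: &[i32], counts: &[usize]) -> NestedLists {
    let mut out = NestedLists { outer_offsets: vec![0], inner_offsets: vec![0], values: Vec::new() };
    for (row, &count) in counts.iter().enumerate() {
        let (start, end) = (offsets[row] as usize, offsets[row + 1] as usize);
        let element = &values[start..end];
        for _ in 0..count {
            // Copy the element's values once per repetition and close one inner list.
            out.values.extend_from_slice(element);
            out.inner_offsets.push(out.values.len() as i32);
        }
        // Close the outer list for this row: it holds `count` inner lists.
        out.outer_offsets.push((out.inner_offsets.len() - 1) as i32);
    }
    out
}
```

For the input `[[1, 2], [3]]` with counts `[2, 1]`, the sketch produces outer offsets `[0, 2, 3]`, inner offsets `[0, 2, 4, 5]`, and values `[1, 2, 1, 2, 3]`, i.e. `[[[1, 2], [1, 2]], [[3]]]`.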

@codecov-commenter

codecov-commenter commented Apr 27, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 58.84%. Comparing base (f09f8af) to head (57c37d7).
Report is 166 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #1680      +/-   ##
============================================
+ Coverage     56.12%   58.84%   +2.71%     
- Complexity      976     1082     +106     
============================================
  Files           119      126       +7     
  Lines         11743    12608     +865     
  Branches       2251     2363     +112     
============================================
+ Hits           6591     7419     +828     
- Misses         4012     4018       +6     
- Partials       1140     1171      +31     


@comphead comphead marked this pull request as ready for review April 27, 2025 18:40
Contributor

@parthchandra parthchandra left a comment


One clarification (for my own understanding), otherwise LGTM.


for (row_index, &count) in count_vec.iter().enumerate() {
nulls.append(!count_array.is_null(row_index));
let repeated_array = if array.is_null(row_index) {
Contributor


Should the result be a null array if the count is zero?

Contributor Author


It would be an empty array; I added this test case as well.
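The empty-versus-null distinction shows up directly in the Arrow layout: an empty array and a NULL array both occupy a zero-length slot (equal consecutive offsets), and only the validity bit tells them apart. The sketch below mirrors the shape of the per-row loop in the diff (append validity from the count, then fill the row, repeating NULL elements as NULL entries) for an `Int32` element column. It is a hypothetical illustration: `RepeatedColumn` and `array_repeat_i32` are invented names, validity is a plain `Vec<bool>` rather than a bitmap, and negative counts are clamped to zero, which is assumed here to match Spark's empty-array result.

```rust
/// Arrow-style List<Int32> column: offsets, child values, and row validity.
#[derive(Debug, PartialEq)]
pub struct RepeatedColumn {
    pub offsets: Vec<i32>,
    pub values: Vec<Option<i32>>, // child values; None = NULL element
    pub validity: Vec<bool>,      // one entry per output row; false = NULL row
}

/// Hypothetical columnar array_repeat over an Int32 element column.
pub fn array_repeat_i32(elements: &[Option<i32>], counts: &[Option<i32>]) -> RepeatedColumn {
    assert_eq!(elements.len(), counts.len(), "columns must have the same length");
    let mut out = RepeatedColumn {
        offsets: vec![0],
        values: Vec::new(),
        validity: Vec::with_capacity(counts.len()),
    };
    for (row, &count) in counts.iter().enumerate() {
        // The row is NULL exactly when the count is NULL.
        out.validity.push(count.is_some());
        // A NULL count contributes no values; a zero or negative count gives a valid empty list.
        let len = count.map_or(0, |c| c.max(0) as usize);
        // A NULL element is repeated as NULL child entries.
        out.values.extend(std::iter::repeat(elements[row]).take(len));
        out.offsets.push(out.values.len() as i32);
    }
    out
}
```

For elements `[5, 5, 5, NULL]` and counts `[2, 0, NULL, 2]`, rows 1 and 2 both get zero-length slots (offsets `[0, 2, 2, 2, 4]`), but only row 2 is marked NULL in the validity.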

@comphead
Contributor Author

Thanks @parthchandra for the review.

@comphead comphead merged commit dbf2fb7 into apache:main Apr 29, 2025
78 checks passed
Development

Successfully merging this pull request may close these issues.

Concat error while testing "array_repeat"
3 participants