Fix and Refactor Spark shuffle function #20483

@erenavsarogullari

Description

Describe the bug

Currently, the Spark shuffle function returns the following error message when the seed is NULL. The message reports the received type as 'Int64', which is misleading; it should report 'NULL' instead.

Current:

query error
SELECT shuffle([2, 1], NULL);
----
DataFusion error: Execution error: shuffle seed must be Int64 type, got 'Int64'

New:

query error DataFusion error: Execution error: shuffle seed must be Int64 type but got 'NULL'
SELECT shuffle([1, 2, 3], NULL);
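
The intended behavior can be sketched in plain Rust. This is only an illustration, not the DataFusion implementation: the `Scalar` enum, `type_label`, and `extract_seed` are hypothetical names, and the sketch assumes the NULL seed reaches the function as a typed but empty Int64 value, which would explain why the old message printed 'Int64'.

```rust
/// Hypothetical stand-in for a scalar argument (not DataFusion's `ScalarValue`).
#[derive(Debug, Clone, PartialEq)]
enum Scalar {
    /// Untyped NULL literal.
    Null,
    /// Int64 value; `None` models a typed NULL (assumption).
    Int64(Option<i64>),
    /// Any other type, used here only to show the non-Int64 path.
    Utf8(String),
}

/// Label used in the error message. A typed NULL is reported as 'NULL'
/// rather than by its declared type.
fn type_label(value: &Scalar) -> &'static str {
    match value {
        Scalar::Null | Scalar::Int64(None) => "NULL",
        Scalar::Int64(Some(_)) => "Int64",
        Scalar::Utf8(_) => "Utf8",
    }
}

/// Returns the seed, or an error naming what was actually received.
fn extract_seed(value: &Scalar) -> Result<i64, String> {
    match value {
        Scalar::Int64(Some(seed)) => Ok(*seed),
        other => Err(format!(
            "shuffle seed must be Int64 type but got '{}'",
            type_label(other)
        )),
    }
}
```

Under these assumptions, both an untyped NULL and a typed Int64 NULL produce the message ending in `but got 'NULL'`, while a valid seed is returned unchanged.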

In addition to this fix, this PR also introduces the following refactoring to the shuffle function:

  • Combining the argument validation checks into a single error message.
  • Extending the current error message with the expected data types:
Current:
shuffle does not support type '{array_type}'.

New:
shuffle does not support type '{array_type}'. Expected types: List, LargeList, FixedSizeList or Null.
  • Adding new unit test coverage for both shuffle.rs and shuffle.slt.
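
As a rough illustration of the extended message, the array type check could look like the sketch below. The `ArgType` enum and `validate_array_type` function are hypothetical and only model the accepted types listed in the new message; the actual code in shuffle.rs works on Arrow data types and may be structured differently.

```rust
/// Hypothetical, simplified model of the first argument's data type.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ArgType {
    List,
    LargeList,
    FixedSizeList,
    Null,
    Int64,
    Utf8,
}

/// Accepts the list-like types and Null; any other type yields one error
/// message that also names the expected types.
fn validate_array_type(array_type: ArgType) -> Result<(), String> {
    match array_type {
        ArgType::List | ArgType::LargeList | ArgType::FixedSizeList | ArgType::Null => Ok(()),
        other => Err(format!(
            "shuffle does not support type '{other:?}'. \
             Expected types: List, LargeList, FixedSizeList or Null."
        )),
    }
}
```

Keeping the accepted types and the message in one `match` means the list in the error text sits next to the check it describes, which makes the two easier to keep in sync.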

To Reproduce

Explained under description section.

Expected behavior

Explained under description section.

Additional context

No response

Metadata

Labels

bug: Something isn't working
