
fix: Fix Spark slice function Null type to GenericListArray casting issue#20469

Open
erenavsarogullari wants to merge 1 commit intoapache:mainfrom
erenavsarogullari:df_spark_slice_func_fix

Conversation

@erenavsarogullari
Member

Which issue does this PR close?

Rationale for this change

Currently, Spark's slice function accepts Null arrays and returns NULL for such queries. The DataFusion-Spark slice function also needs to return NULL when a Null array is passed.
Spark Behavior (tested with latest Spark master):

> SELECT slice(NULL, 1, 2);
+-----------------+
|slice(NULL, 1, 2)|
+-----------------+
|             null|
+-----------------+

DF Behaviour:
Current:

query error
SELECT slice(NULL, 1, 2);
----
DataFusion error: Internal error: could not cast array of type Null to arrow_array::array::list_array::GenericListArray<i32>.
This issue was likely caused by a bug in DataFusion's code. Please help us to resolve this by filing a bug report in our issue tracker: https://github.com/apache/datafusion/issues

New:

query ?
SELECT slice(NULL, 1, 2);
----
NULL
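The expected null semantics can be sketched in plain Rust. This is an illustrative model only, not the actual DataFusion/arrow implementation: a NULL input array yields NULL, while a non-null array keeps its null elements.

```rust
// Minimal model of Spark's slice() null semantics (illustrative only):
// `None` stands in for a NULL array, inner `None`s for null elements.
fn spark_slice<T: Clone>(
    arr: Option<&[Option<T>]>,
    start: i64,
    length: usize,
) -> Option<Vec<Option<T>>> {
    let arr = arr?; // NULL input -> NULL output, mirroring the fix in this PR
    // SQL array indices are 1-based; a negative start counts from the end.
    let idx = if start > 0 {
        (start - 1) as usize
    } else {
        (arr.len() as i64 + start).max(0) as usize
    };
    Some(arr.iter().skip(idx).take(length).cloned().collect())
}
```

Under this model, `spark_slice::<i32>(None, 1, 2)` is `None` (Case 1 below), while `spark_slice::<i32>(Some(&[None]), 1, 2)` is `Some(vec![None])` (Case 2).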

What changes are included in this PR?

Explained under the first section.

Are these changes tested?

Added new unit test cases to both slice.rs and slice.slt.

Are there any user-facing changes?

Yes. Currently, the slice function returns an error message for Null array inputs; the expected behavior is to return NULL, so the end user will get the expected result instead of an error message.

@github-actions github-actions bot added sqllogictest SQL Logic Tests (.slt) spark labels Feb 21, 2026
@Jefffrey
Contributor

Does Spark return null (void) or an array of null (void)?

I tested on PySpark 4.1.1

>>> spark.sql("select slice(NULL, 1, 2) as a").printSchema()
root
 |-- a: array (nullable = true)
 |    |-- element: void (containsNull = true)
Seems to suggest the latter?

@erenavsarogullari
Member Author

Spark returns the array as nullable = true / false, and there are two representations depending on the input array, so it returns either NULL or [NULL]:
Case1: input array is null (array (nullable = true)):

+-----------------+
|slice(NULL, 1, 2)|
+-----------------+
|             NULL|
+-----------------+

root
 |-- slice(NULL, 1, 2): array (nullable = true)
 |    |-- element: void (containsNull = true)

Case2: input array has null element (array (nullable = false)):

+------------------------+
|slice(array(NULL), 1, 2)|
+------------------------+
|                  [NULL]|
+------------------------+

root
 |-- slice(array(NULL), 1, 2): array (nullable = false)
 |    |-- element: void (containsNull = true)

@Jefffrey
Contributor

This seems to suggest the return type should be a list of nulls instead of just null

mut func_args: ScalarFunctionArgs,
) -> Result<ColumnarValue> {
if func_args.args[0].data_type() == DataType::Null {
return Ok::<ColumnarValue, DataFusionError>(func_args.args[0].clone());

Is the turbofish needed here?

Suggested change
return Ok::<ColumnarValue, DataFusionError>(func_args.args[0].clone());
return Ok(func_args.args[0].clone());

This will also make the new import of DataFusionError not needed
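For context, the turbofish is only needed when the compiler cannot infer the `Ok`/`Err` types; here the function's declared return type already fixes them. A standalone illustration (with a hypothetical `MyError` standing in for `DataFusionError`):

```rust
// `MyError` is a stand-in error type for this illustration.
#[derive(Debug, PartialEq)]
struct MyError;

fn with_turbofish() -> Result<i32, MyError> {
    Ok::<i32, MyError>(42) // explicit, but the signature already pins the types
}

fn without_turbofish() -> Result<i32, MyError> {
    Ok(42) // inferred from the declared return type; no extra import needed
}
```

Both functions compile to the same result; dropping the turbofish just removes the redundant annotation and the need to name the error type at the call site.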

) -> Result<ColumnarValue> {
if func_args.args[0].data_type() == DataType::Null {
return Ok::<ColumnarValue, DataFusionError>(func_args.args[0].clone());
};
Suggested change
};
}

};
let slice = SparkSlice::new();
let result = slice.invoke_with_args(args).unwrap();
assert!(result.to_array(1).unwrap() == Arc::new(NullArray::new(1)));
Suggested change
assert!(result.to_array(1).unwrap() == Arc::new(NullArray::new(1)));
assert_eq!(result.to_array(1).unwrap(), Arc::new(NullArray::new(1)));

];

let args = ScalarFunctionArgs {
args: input_args.to_owned(),
Suggested change
args: input_args.to_owned(),
args: input_args,

args: input_args.to_owned(),
arg_fields: vec![Arc::new(Field::new(
"item",
List(FieldRef::new(Field::new("", DataType::Int64, true))),
Suggested change
List(FieldRef::new(Field::new("", DataType::Int64, true))),
List(FieldRef::new(Field::new("f", DataType::Int64, true))),

It looks weird to use an empty name for a field.

List(FieldRef::new(Field::new("", DataType::Int64, true))),
false,
))],
number_rows: 0,
Suggested change
number_rows: 0,
number_rows: 1,

to match with the length of NullArray::new(1)


Labels

spark sqllogictest SQL Logic Tests (.slt)


Development

Successfully merging this pull request may close these issues.

Fix Spark slice function Null type to GenericListArray casting issue

3 participants