feat: Implement Spark function `space` #19610

kazantsev-maksim · 2026-01-02T14:36:39Z

Which issue does this PR close?

N/A

Rationale for this change

Add new function: https://spark.apache.org/docs/latest/api/sql/index.html#space

What changes are included in this PR?

Implementation
Unit Tests
SLT tests

Are these changes tested?

Yes, tests added as part of this PR.

Are there any user-facing changes?

No, this PR only adds a new function.

andygrove · 2026-01-02T15:27:56Z

datafusion/spark/src/function/string/space.rs

+        if args.args.len() != 1 {
+            return plan_err!("space expects exactly 1 argument");
+        }
+        make_scalar_function(spark_space, vec![])(&args.args)


I think that space is quite often invoked with a literal argument, e.g., space(2). The current implementation converts the scalar value into an array for each batch and then calls " ".repeat(m as usize) for each row, resulting in the same string being allocated each time. This could be optimized in the scalar case to build the string once rather than per row and then append the same string to the builder repeatedly.

In fact, the return type in this case could be a scalar rather than an array.
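The scalar fast path described above could look roughly like the following minimal sketch. It uses plain standard-library types as stand-ins rather than DataFusion's actual argument and array types, and the names `SpaceInput`, `SpaceOutput`, `spaces`, and `space_sketch` are hypothetical, introduced only for illustration.

```rust
/// Hypothetical stand-in for a function argument that is either a single
/// literal or a per-row column (not DataFusion's real types).
#[derive(Debug, PartialEq)]
pub enum SpaceInput {
    Scalar(Option<i32>),
    Array(Vec<Option<i32>>),
}

/// Hypothetical stand-in for the function result.
#[derive(Debug, PartialEq)]
pub enum SpaceOutput {
    Scalar(Option<String>),
    Array(Vec<Option<String>>),
}

/// Build `n` spaces; negative lengths give an empty string,
/// mirroring the `m < 0` branch in the PR diff.
fn spaces(n: i32) -> String {
    if n < 0 {
        String::new()
    } else {
        " ".repeat(n as usize)
    }
}

pub fn space_sketch(input: SpaceInput) -> SpaceOutput {
    match input {
        // Scalar in, scalar out: the string is built exactly once,
        // no matter how many rows the batch contains.
        SpaceInput::Scalar(n) => SpaceOutput::Scalar(n.map(spaces)),
        // Array input still needs one result per row.
        SpaceInput::Array(values) => {
            SpaceOutput::Array(values.into_iter().map(|v| v.map(spaces)).collect())
        }
    }
}
```

With this shape, a call such as `space(2)` would produce a single two-space string instead of one allocation per row; the array branch here is deliberately naive and is not the optimized version discussed later in the review.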

andygrove · 2026-01-02T15:50:25Z

datafusion/spark/src/function/string/space.rs

+                if m < 0 {
+                    String::new()
+                } else {
+                    " ".repeat(m as usize)


This allocates a new string per row, which is discarded after being appended to the array. It would be more efficient for this function to build the string buffer and offsets directly.
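As a rough illustration of building the buffer and offsets directly, the sketch below models a variable-length string column with plain vectors. It is a simplified assumption about the layout, not Arrow's actual builder API; `StringColumnSketch` and `space_buffers_sketch` are hypothetical names, and a real implementation would also need to guard against `i32` offset overflow.

```rust
/// Hypothetical, simplified model of a variable-length string column:
/// `offsets[i]..offsets[i + 1]` is the byte range of row `i` in `values`.
#[derive(Debug, PartialEq)]
pub struct StringColumnSketch {
    pub offsets: Vec<i32>,
    pub values: Vec<u8>,
    pub validity: Vec<bool>,
}

pub fn space_buffers_sketch(lengths: &[Option<i32>]) -> StringColumnSketch {
    // Total number of spaces needed, so `values` is allocated once.
    // Null and negative rows contribute zero bytes.
    let total: usize = lengths
        .iter()
        .map(|v| v.unwrap_or(0).max(0) as usize)
        .sum();

    // Every byte of every row is a space, so the data buffer is
    // simply `total` spaces; only the offsets differ per row.
    let values = vec![b' '; total];

    let mut offsets = Vec::with_capacity(lengths.len() + 1);
    let mut validity = Vec::with_capacity(lengths.len());
    let mut end: i32 = 0;
    offsets.push(end);
    for v in lengths {
        // NOTE: this sketch does not check for i32 overflow.
        end += v.unwrap_or(0).max(0);
        offsets.push(end);
        validity.push(v.is_some());
    }

    StringColumnSketch { offsets, values, validity }
}
```

No per-row `String` is created here: the only allocations are the three output vectors, each sized up front.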

andygrove · 2026-01-02T15:50:56Z

datafusion/sqllogictest/test_files/spark/string/space.slt

+# under the License.
+
+query T
+SELECT space(1::INT);


Could you add tests that use array inputs? The tests currently only cover scalar inputs.

kazantsev-maksim · 2026-01-03T17:32:06Z

@andygrove thanks for the review. The benchmark results in Comet are still not ideal.

comphead · 2026-01-03T19:41:26Z

datafusion/spark/src/function/string/space.rs

+
+fn spark_space_array_inner(array: &Int32Array) -> StringArray {
+    let values = array.values();
+    let data_capacity = values


Thanks @kazantsev-maksim, just thinking aloud: can we iterate over the values only once?

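fn spark_space_array_inner(array: &Int32Array) -> StringArray {
    // `StringBuilder::with_capacity` takes (item_capacity, data_capacity).
    let mut builder = StringBuilder::with_capacity(array.len(), 0);
    // Reusable buffer of spaces, grown only when a longer run is needed.
    let mut space_buf = String::new();
    for v in array.iter() {
        match v {
            None => builder.append_null(),
            // `iter()` yields `Option<i32>` by value, so no dereference is needed.
            Some(l) if l > 0 => {
                let l = l as usize;
                if space_buf.len() < l {
                    space_buf = " ".repeat(l);
                }
                builder.append_value(&space_buf[..l]);
            }
            // Zero and negative lengths produce an empty string.
            Some(_) => builder.append_value(""),
        }
    }
    builder.finish()
}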

something like that?

@comphead thanks, performance has improved!

comphead · 2026-01-04T21:34:24Z

Thanks @kazantsev-maksim and @andygrove for the review

Impl spark sql space function

bff586e

github-actions bot added the sqllogictest (SQL Logic Tests (.slt)) and spark labels on Jan 2, 2026

Kazantsev Maksim added 2 commits January 2, 2026 19:11

Fix tests

8f039bd

Fix tests

d5fecbf

andygrove reviewed Jan 2, 2026

Kazantsev Maksim added 4 commits January 3, 2026 19:34

Fix PR issues

5720169

Fix PR issues

523b2ac

Fix PR issues

83ef891

Update space.slt

2453dc1

Fix slt

andygrove approved these changes Jan 3, 2026

Kazantsev Maksim added 4 commits January 3, 2026 21:05

Update space.slt

868180a

Fix slt

Update space.slt

b2e8f7b

fix slt

Update space.slt

dca244d

fix slt

Update space.slt

3c0b045

fix slt

comphead reviewed Jan 3, 2026

Add space benches

62fd392

comphead approved these changes Jan 4, 2026

comphead added this pull request to the merge queue Jan 4, 2026

Merged via the queue into apache:main with commit 7e04974 Jan 4, 2026
31 checks passed

kazantsev-maksim deleted the spark_space branch January 5, 2026 04:41
