[SPARK-55056][SQL][PYTHON][TEST] Add tests for nested array with empty outer array #54880

Open
Yicong-Huang wants to merge 2 commits into apache:master from Yicong-Huang:SPARK-55056-test

Yicong-Huang commented Mar 18, 2026

What changes were proposed in this pull request?

Add tests to verify that writing triple-nested arrays (and nested arrays with maps) with an empty outer array no longer triggers a SIGSEGV.

Why are the changes needed?

SPARK-55056 reported a segmentation fault when serializing triple-nested arrays with an empty outer array via Arrow IPC. The root cause was in arrow-java: ListVector.getBufferSizeFor(0) returned 0, causing the offset buffer to be omitted for empty vectors, which violates the Arrow spec (offset buffer must have N+1 entries even when N=0).
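The offset-buffer invariant described above can be illustrated with a minimal, standard-library-only sketch. It is not the arrow-java implementation: the helper names `list_offsets` and `offset_buffer_bytes` are hypothetical, offsets are assumed to be little-endian int32, and null handling is ignored. It only shows why a list vector with N=0 entries still needs a non-empty offset buffer.

```python
import struct


def list_offsets(values):
    """Return Arrow-style list offsets: N lists produce N + 1 offsets."""
    offsets = [0]  # the leading 0 is present even when there are no lists
    for item in values:
        offsets.append(offsets[-1] + len(item))
    return offsets


def offset_buffer_bytes(values):
    """Pack the offsets as little-endian int32 (an assumption of this sketch)."""
    offsets = list_offsets(values)
    return struct.pack(f"<{len(offsets)}i", *offsets)


# An empty outer array still yields one offset, so the buffer is 4 bytes, not 0.
print(len(offset_buffer_bytes([])))
# Two lists ([1, 2] and []) yield three offsets.
print(list_offsets([[1, 2], []]))
```

A size computation that returns 0 for the empty case, as the description says `ListVector.getBufferSizeFor(0)` did, drops that single required offset, so a reader expecting N+1 entries has no buffer to read from.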

This has been fixed upstream in arrow-java 19.0.0 (apache/arrow-java#343), which Spark adopted in SPARK-56000 (PR #54820). These tests confirm the fix works correctly without any Spark-side workaround.

Does this PR introduce any user-facing change?

No (test only).

How was this patch tested?

New unit tests.

Was this patch authored or co-authored using generative AI tooling?

No.
