[SPARK-50854][SS] Make path fully qualified before passing it to FileStreamSink #49654

vrozov · 2025-01-24T18:35:02Z

What changes were proposed in this pull request?

Ensure that when a relative path is used in DataStreamWriter, the path is resolved on the Spark Driver rather than deferred to the Spark Executors.
Construct the fully qualified path in DataSource before it is passed to FileStreamSink, similar to how it is done for DataFrameWriter.
Add a check to FileStreamSink that asserts that the path is absolute.
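
For illustration only, the idea behind these changes can be sketched with plain JDK classes. This is not the code in this PR: Spark works with Hadoop's Path and FileSystem (for example Path.makeQualified), whereas the sketch below uses java.nio and java.net.URI as a stand-in. The class QualifyPathSketch and the methods qualify and requireAbsolute are hypothetical names. The sketch resolves a possibly relative path against the driver's working directory once, and the consumer then rejects anything that is not fully qualified.

```java
import java.net.URI;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch; names are illustrative and not part of Spark.
final class QualifyPathSketch {

    // Resolve rawPath against an explicit working directory (the driver's)
    // and return a fully qualified file: URI. An absolute rawPath is kept as-is.
    static URI qualify(String rawPath, Path workingDir) {
        Path resolved = workingDir.resolve(rawPath).normalize();
        return resolved.toUri();
    }

    // Stand-in for the sink-side check: reject a path that has no scheme
    // or whose path component is not absolute.
    static void requireAbsolute(URI uri) {
        String path = uri.getPath();
        if (uri.getScheme() == null || path == null || !path.startsWith("/")) {
            throw new IllegalArgumentException("Path is not fully qualified: " + uri);
        }
    }

    public static void main(String[] args) {
        // Resolution happens once, here, against this process's working directory.
        Path driverCwd = Paths.get("").toAbsolutePath();
        URI qualified = qualify("out/events", driverCwd);
        requireAbsolute(qualified); // passes: the URI is already qualified
        System.out.println(qualified);
    }
}
```

Because the qualified URI no longer depends on any process's working directory, a consumer running elsewhere would see the same location the driver resolved.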

https://lists.apache.org/thread/ffzwn1y2fgyjw0j09cv4np9z00wymxwv

Why are the changes needed?

To properly support relative paths in Structured Streaming. This use case mostly applies to a single-node local Spark cluster.

Does this PR introduce any user-facing change?

The change applies only when a relative path is used in DataStreamWriter; in that case, data is now written to the correct directory. No changes are expected for absolute paths (the most common production case).

How was this patch tested?

Added a new test case to FileStreamSinkSuite.

Was this patch authored or co-authored using generative AI tooling?

No

github-actions bot added the SQL and STRUCTURED STREAMING labels Jan 24, 2025

vrozov force-pushed the SPARK-50854 branch from 1ab31ed to 9bd12dc on January 24, 2025 20:30

Commit c7c16ac: [SPARK-50854][SS] Make path fully qualified before passing it to FileStreamSink

vrozov force-pushed the SPARK-50854 branch from 9bd12dc to c7c16ac on January 24, 2025 23:15

github-actions bot added the R label Jan 24, 2025
