[SPARK-50854][SS] Make path fully qualified before passing it to FileStreamSink #49654
+55
−19
What changes were proposed in this pull request?
For DataStreamWriter, the path resolution is done on the Spark Driver and is not deferred to Spark Executor.
The path is made fully qualified in DataSource, similar to how it is done for DataFrameWriter, before it is passed to FileStreamSink.
A check in FileStreamSink asserts that path is an absolute path.
Related mailing list thread: https://lists.apache.org/thread/ffzwn1y2fgyjw0j09cv4np9z00wymxwv
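For readers who want the idea in miniature, here is a rough, self-contained sketch of the two steps described above: resolve a relative path once on the driver side, then have the sink reject anything that is not absolute. It is a plain-Python model, not the code in this patch. Spark works with Hadoop Path objects, and the names qualify, open_sink and driver_cwd are hypothetical and chosen only for illustration.

```python
from pathlib import PurePosixPath


def qualify(path: str, driver_cwd: str) -> PurePosixPath:
    """Resolve `path` against the driver's working directory (hypothetical helper)."""
    p = PurePosixPath(path)
    if p.is_absolute():
        # Absolute paths pass through unchanged.
        return p
    # Relative paths are anchored to the driver's working directory.
    return PurePosixPath(driver_cwd) / p


def open_sink(path: PurePosixPath) -> PurePosixPath:
    """Stand-in for the sink: it refuses any path that is not absolute."""
    assert path.is_absolute(), f"sink path must be absolute: {path}"
    return path


# Assumed driver working directory, for illustration only.
driver_cwd = "/home/user/project"

relative = open_sink(qualify("out/events", driver_cwd))
absolute = open_sink(qualify("/data/events", driver_cwd))
```

Because the relative path is resolved once, up front, the sink never has to guess which working directory the caller meant, and the absolute-path case is untouched.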
Why are the changes needed?
To properly support relative paths in Structured Streaming. The use case mostly applies to a single-node local Spark cluster.
Does this PR introduce any user-facing change?
The change applies only when a relative path is used in DataStreamWriter, resulting in data being output to the correct directory. No changes are expected for absolute paths (the most common production case).
How was this patch tested?
Added a new test case to FileStreamSinkSuite.
Was this patch authored or co-authored using generative AI tooling?
No