[SPARK-52511][SDP] Support dry-run mode in spark-pipelines command #51489

sryza · 2025-07-15T04:29:07Z

What changes were proposed in this pull request?

Adds a new `spark-pipelines` subcommand, `dry-run`, that launches a pipeline execution which doesn't read or write any data but still catches many kinds of errors that an actual run would surface. For example:

- Syntax errors: e.g. invalid Python or SQL code
- Analysis errors: e.g. selecting from a table that doesn't exist or selecting a column that doesn't exist
- Graph validation errors: e.g. cyclic dependencies
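To illustrate the graph validation case, here is a minimal sketch of cycle detection over a dataset dependency graph, using Python's standard `graphlib` module. This is not the validation code in Spark; the function name `find_cycle` and the dictionary layout (each dataset mapped to the set of datasets it reads from) are assumptions made for the example.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Optional


def find_cycle(deps: dict[str, set[str]]) -> Optional[list[str]]:
    """Return one dependency cycle as a list of dataset names, or None.

    `deps` maps each dataset to the datasets it reads from (hypothetical layout).
    """
    try:
        # prepare() raises CycleError when no topological order exists.
        TopologicalSorter(deps).prepare()
    except CycleError as exc:
        # The second element of args holds the detected cycle.
        return list(exc.args[1])
    return None


valid = {"raw": set(), "clean": {"raw"}, "report": {"clean"}}
broken = {"a": {"b"}, "b": {"a"}}

print(find_cycle(valid))                 # None
print(find_cycle(broken) is not None)    # True
```

A check like this needs only the declared dependencies, not the data, which is why it can run without reading or writing any tables.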

Why are the changes needed?

To leverage the declarative nature of Declarative Pipelines to make pipeline development easier: a pipeline definition can be validated without actually running it.

Does this PR introduce any user-facing change?

Yes. It adds the `dry-run` subcommand; existing behavior is unchanged.

How was this patch tested?

- Added unit tests
- Executed `dry-run` on the CLI, for both success and error scenarios

Was this patch authored or co-authored using generative AI tooling?

gengliangwang · 2025-07-16T20:42:41Z

python/pyspark/pipelines/cli.py

@@ -257,6 +257,13 @@ def run(spec_path: Path) -> None:
    run_parser = subparsers.add_parser("run", help="Run a pipeline.")
    run_parser.add_argument("--spec", help="Path to the pipeline spec.")

+    # "dry-run" subcommand
+    run_parser = subparsers.add_parser(


@sryza, shall we have an end-to-end test for the dry-run mode? We should check that it can detect failures without causing side effects.

Sure thing – just added, in test_spark_connect.py
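As a rough illustration of the property requested in this thread (failures are detected, nothing is written), here is a toy sketch. It is not the test added in test_spark_connect.py: `execute`, `flows`, and `out_dir` are hypothetical stand-ins, and a temporary directory plays the role of storage.

```python
import tempfile
from pathlib import Path


def execute(flows: dict[str, list[str]], out_dir: Path, dry: bool) -> None:
    """Toy pipeline run. `flows` maps each table to the tables it reads from."""
    # Analysis step: every source table must be defined in the pipeline.
    for table, sources in flows.items():
        for source in sources:
            if source not in flows:
                raise ValueError(f"'{table}' reads missing table '{source}'")
    if dry:
        return  # validation only: nothing is materialized
    for table in flows:
        (out_dir / f"{table}.txt").write_text("materialized")


def test_dry_run_detects_failure_without_side_effects() -> None:
    with tempfile.TemporaryDirectory() as tmp:
        out_dir = Path(tmp)
        try:
            execute({"clean": ["missing"]}, out_dir, dry=True)
            raise AssertionError("expected the dry run to fail")
        except ValueError:
            pass
        # The failed dry run must not have written anything.
        assert list(out_dir.iterdir()) == []


def test_successful_dry_run_writes_nothing() -> None:
    with tempfile.TemporaryDirectory() as tmp:
        out_dir = Path(tmp)
        execute({"raw": [], "clean": ["raw"]}, out_dir, dry=True)
        assert list(out_dir.iterdir()) == []
```

Both functions can be collected by a test runner such as pytest or called directly.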

gengliangwang · 2025-07-17T04:52:50Z

@sryza, it seems there is a linter failure: https://github.com/sryza/spark/actions/runs/16335212360/job/46145897514

dongjoon-hyun · 2025-07-17T17:27:25Z

python/pyspark/pipelines/cli.py

+        "dry-run",
+        help="Launch a run that just validates the graph and checks for errors.",
+    )
+    run_parser.add_argument("--spec", help="Path to the pipeline spec.")


This seems to have been added by mistake. Please remove this duplication, because we already have this at line 258, @sryza. 😄

Thanks for catching – just fixed
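For readers following the diff, here is a self-contained sketch of registering `run` and `dry-run` as sibling `argparse` subcommands, each with its own parser variable so that arguments are not attached to the wrong subcommand. It is not the merged code in cli.py: `build_parser`, `dry_run_parser`, `dest="command"`, and the spec path `pipeline.yml` are assumptions for the example, while the help strings are taken from the diff above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="spark-pipelines")
    subparsers = parser.add_subparsers(dest="command", required=True)

    # "run" subcommand
    run_parser = subparsers.add_parser("run", help="Run a pipeline.")
    run_parser.add_argument("--spec", help="Path to the pipeline spec.")

    # "dry-run" subcommand, kept in a separate variable (hypothetical name)
    dry_run_parser = subparsers.add_parser(
        "dry-run",
        help="Launch a run that just validates the graph and checks for errors.",
    )
    dry_run_parser.add_argument("--spec", help="Path to the pipeline spec.")
    return parser


args = build_parser().parse_args(["dry-run", "--spec", "pipeline.yml"])
print(args.command, args.spec)  # dry-run pipeline.yml
```

Using a distinct variable per subparser makes it easier to spot an argument that has been registered twice on the same subcommand.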

dongjoon-hyun · 2025-07-17T17:27:47Z

cc @peter-toth

dongjoon-hyun

+1, LGTM. Thank you, @sryza.

sryza · 2025-07-18T04:34:23Z

Merged to master

Closes apache#51489 from sryza/dry-run.

Lead-authored-by: Sandy Ryza <[email protected]>
Co-authored-by: Sandy Ryza <[email protected]>
Signed-off-by: Sandy Ryza <[email protected]>

github-actions bot added SQL PYTHON CONNECT labels Jul 15, 2025

sryza force-pushed the dry-run branch from 8d9407f to c969cf0 Compare July 15, 2025 04:34

github-actions bot added the DOCS label Jul 15, 2025

sryza force-pushed the dry-run branch from c969cf0 to 81b8eb7 Compare July 15, 2025 04:35

sryza changed the title ~~dry run~~ [SDP] dry run Jul 15, 2025

dry run

f8aec6f

sryza force-pushed the dry-run branch from 81b8eb7 to f8aec6f Compare July 15, 2025 15:23

sryza changed the title ~~[SDP] dry run~~ [SPARK-52511][SDP] dry run Jul 15, 2025

sryza marked this pull request as ready for review July 15, 2025 15:25

sryza requested review from gengliangwang and cloud-fan July 15, 2025 15:25

gengliangwang reviewed Jul 16, 2025

gengliangwang changed the title ~~[SPARK-52511][SDP] dry run~~ [SPARK-52511][SDP] Support dry-run mode in spark-pipelines command Jul 16, 2025

add Python end to end test

143fdd4

sryza requested a review from gengliangwang July 17, 2025 03:06

gengliangwang approved these changes Jul 17, 2025

dongjoon-hyun reviewed Jul 17, 2025

sryza added 3 commits July 17, 2025 10:45

fix lint

9e0dd6f

cli fix

5a00043

Merge remote-tracking branch 'apache/master' into dry-run

8f5c457

dongjoon-hyun approved these changes Jul 17, 2025

sryza added 2 commits July 17, 2025 16:12

fix refresh tests after merge conflict

5956c54

fix test_spark_connect after merge conflict

63282a0

sryza closed this in 93748cc Jul 18, 2025
