[SPARK-52511][SDP] Support dry-run mode in spark-pipelines command #51489
Conversation
python/pyspark/pipelines/cli.py (outdated)

```diff
@@ -257,6 +257,13 @@ def run(spec_path: Path) -> None:
     run_parser = subparsers.add_parser("run", help="Run a pipeline.")
     run_parser.add_argument("--spec", help="Path to the pipeline spec.")

+    # "dry-run" subcommand
+    run_parser = subparsers.add_parser(
```
@sryza shall we have an end-to-end test for the dry run mode? We should check that it can detect failures without side effects.
Sure thing – just added, in test_spark_connect.py
@sryza it seems there is a linter failure: https://github.com/sryza/spark/actions/runs/16335212360/job/46145897514
python/pyspark/pipelines/cli.py (outdated)

```diff
+        "dry-run",
+        help="Launch a run that just validates the graph and checks for errors.",
+    )
+    run_parser.add_argument("--spec", help="Path to the pipeline spec.")
```
It seems this `--spec` argument was added by mistake. Please remove the duplication; we already register it at line 258, @sryza. 😄
Thanks for catching – just fixed
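For context, the duplication flagged above came from reusing the `run_parser` variable for the new subparser. A minimal standalone sketch of the corrected registration (variable names like `dry_run_parser` are assumptions; the real code lives in `python/pyspark/pipelines/cli.py`):

```python
import argparse

# Hypothetical sketch, not the exact pyspark source: register "run" and
# "dry-run" subcommands on distinct parser variables so each --spec flag
# attaches to the intended subparser.
parser = argparse.ArgumentParser(prog="spark-pipelines")
subparsers = parser.add_subparsers(dest="command")

# "run" subcommand
run_parser = subparsers.add_parser("run", help="Run a pipeline.")
run_parser.add_argument("--spec", help="Path to the pipeline spec.")

# "dry-run" subcommand: uses its own variable, avoiding the bug where a
# second --spec was accidentally added to run_parser.
dry_run_parser = subparsers.add_parser(
    "dry-run",
    help="Launch a run that just validates the graph and checks for errors.",
)
dry_run_parser.add_argument("--spec", help="Path to the pipeline spec.")

args = parser.parse_args(["dry-run", "--spec", "pipeline.yml"])
print(args.command, args.spec)  # → dry-run pipeline.yml
```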
cc @peter-toth
+1, LGTM. Thank you, @sryza .
Merged to master
### What changes were proposed in this pull request?

Adds a `dry-run` command to the `spark-pipelines` CLI that launches an execution of a pipeline that doesn't write or read any data, but catches many kinds of errors that would be caught if the pipeline were to actually run, e.g.:

- Syntax errors, e.g. invalid Python or SQL code
- Analysis errors, e.g. selecting from a table that doesn't exist or selecting a column that doesn't exist
- Graph validation errors, e.g. cyclic dependencies

### Why are the changes needed?

To leverage the declarative nature of Declarative Pipelines to make pipeline development easier.

### Does this PR introduce _any_ user-facing change?

Adds behavior; doesn't change existing behavior.

### How was this patch tested?

- Added unit tests
- Executed `dry-run` on the CLI, for both success and error scenarios

### Was this patch authored or co-authored using generative AI tooling?

Closes apache#51489 from sryza/dry-run.

Lead-authored-by: Sandy Ryza <[email protected]>
Co-authored-by: Sandy Ryza <[email protected]>
Signed-off-by: Sandy Ryza <[email protected]>
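As a loose illustration of the third error category above (not Spark's actual validation code), checking a pipeline's dataset graph for cyclic dependencies can be sketched as a depth-first search with coloring; the function name and dependency-map shape here are invented for the example:

```python
# Hypothetical sketch: deps maps each dataset name to the list of upstream
# datasets it reads from. Returns True if the graph contains a cycle.
def find_cycle(deps):
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited / on current DFS path / done
    color = {name: WHITE for name in deps}

    def visit(node):
        color[node] = GRAY
        for upstream in deps.get(node, []):
            if color.get(upstream, WHITE) == GRAY:
                return True  # back edge: upstream is on the current path
            if color.get(upstream, WHITE) == WHITE and upstream in deps:
                if visit(upstream):
                    return True
        color[node] = BLACK
        return False

    return any(visit(n) for n in deps if color[n] == WHITE)

print(find_cycle({"a": ["b"], "b": ["a"]}))  # → True (cyclic)
print(find_cycle({"a": ["b"], "b": []}))     # → False (acyclic)
```

A dry run can surface this kind of error before any table is read or written, since the dependency graph is known from the pipeline definition alone.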