sryza (Contributor) commented Jul 15, 2025

What changes were proposed in this pull request?

Adds a new dry-run subcommand to spark-pipelines. It launches a pipeline execution that doesn't read or write any data, but still catches many kinds of errors that would surface if the pipeline were actually run. For example:

  • Syntax errors – e.g. invalid Python or SQL code
  • Analysis errors – e.g. selecting from a table that doesn't exist or selecting a column that doesn't exist
  • Graph validation errors – e.g. cyclic dependencies
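
As an illustration of the graph validation case, here is a minimal sketch of cycle detection over dataset dependencies. It is not the implementation in this PR: the `validate_graph` helper and the dict-of-sets graph shape are assumptions made for the example, and it relies only on Python's standard `graphlib` module.

```python
from graphlib import CycleError, TopologicalSorter


def validate_graph(dependencies: dict[str, set[str]]) -> list[str]:
    """Return dataset names in a valid execution order.

    `dependencies` maps each dataset to the datasets it reads from.
    Raises ValueError if the dependencies contain a cycle.
    (Hypothetical helper, not part of spark-pipelines.)
    """
    sorter = TopologicalSorter(dependencies)
    try:
        # static_order() walks the whole graph and raises CycleError on a cycle.
        return list(sorter.static_order())
    except CycleError as err:
        # The second element of args lists the nodes that form the cycle.
        cycle = " -> ".join(err.args[1])
        raise ValueError(f"Cyclic dependency detected: {cycle}") from err


# "c" reads from "b", which reads from "a": a valid, acyclic graph.
order = validate_graph({"b": {"a"}, "c": {"b"}})
```

No data is read or written here; only the declared dependencies are inspected, which is the property a dry run depends on.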

Why are the changes needed?

Leverage the declarative nature of Declarative Pipelines to make pipeline development easier.

Does this PR introduce any user-facing change?

Adds behavior; doesn't change existing behavior.

How was this patch tested?

  • Added unit tests
  • Executed dry-run on the CLI, for both success and error scenarios

Was this patch authored or co-authored using generative AI tooling?

sryza changed the title from "[SDP] dry run" to "[SPARK-52511][SDP] dry run" on Jul 15, 2025
sryza marked this pull request as ready for review on July 15, 2025 15:25
sryza requested review from gengliangwang and cloud-fan on July 15, 2025 15:25
@@ -257,6 +257,13 @@ def run(spec_path: Path) -> None:
run_parser = subparsers.add_parser("run", help="Run a pipeline.")
run_parser.add_argument("--spec", help="Path to the pipeline spec.")

# "dry-run" subcommand
run_parser = subparsers.add_parser(
Member

@sryza, shall we add an end-to-end test for dry-run mode? It should check that dry-run detects failures without causing side effects.

Contributor Author

Sure thing – just added, in test_spark_connect.py

gengliangwang changed the title from "[SPARK-52511][SDP] dry run" to "[SPARK-52511][SDP] Support dry-run mode in spark-pipelines command" on Jul 16, 2025
sryza requested a review from gengliangwang on July 17, 2025 03:06
@gengliangwang
Member

"dry-run",
help="Launch a run that just validates the graph and checks for errors.",
)
run_parser.add_argument("--spec", help="Path to the pipeline spec.")
Member

This seems to have been added by mistake. Please remove the duplication, since we already have this at line 258, @sryza. 😄

Contributor Author

Thanks for catching – just fixed
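
For readers following this thread, here is a rough sketch of how the two subcommands could be registered so that each gets its own `--spec` argument. It is not the merged code: the `build_parser` wrapper, the `dest="command"` setting, the `dry_run_parser` variable name, and the `pipeline.yml` path are assumptions made for illustration. Only the subcommand names and help strings come from the diff above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a CLI parser with "run" and "dry-run" subcommands (illustrative only)."""
    parser = argparse.ArgumentParser(prog="spark-pipelines")
    subparsers = parser.add_subparsers(dest="command", required=True)

    # "run" subcommand
    run_parser = subparsers.add_parser("run", help="Run a pipeline.")
    run_parser.add_argument("--spec", help="Path to the pipeline spec.")

    # "dry-run" subcommand, bound to its own variable (assumed name) so that
    # its --spec argument is not mistaken for a duplicate on the "run" parser.
    dry_run_parser = subparsers.add_parser(
        "dry-run",
        help="Launch a run that just validates the graph and checks for errors.",
    )
    dry_run_parser.add_argument("--spec", help="Path to the pipeline spec.")
    return parser


# Parse a sample dry-run invocation; "pipeline.yml" is a placeholder path.
args = build_parser().parse_args(["dry-run", "--spec", "pipeline.yml"])
```

Each parser returned by `add_parser` is independent, so `--spec` has to be declared once per subcommand even though the two declarations look identical.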

@dongjoon-hyun
Member

cc @peter-toth

Member

dongjoon-hyun left a comment

+1, LGTM. Thank you, @sryza.

sryza closed this in 93748cc on Jul 18, 2025
Contributor Author

sryza commented Jul 18, 2025

Merged to master

haoyangeng-db pushed a commit to haoyangeng-db/apache-spark that referenced this pull request Jul 22, 2025
Closes apache#51489 from sryza/dry-run.

Lead-authored-by: Sandy Ryza <[email protected]>
Co-authored-by: Sandy Ryza <[email protected]>
Signed-off-by: Sandy Ryza <[email protected]>