
Conversation

@yaooqinn (Member) commented Jan 23, 2025

What changes were proposed in this pull request?

This PR adds a SparkDataFramePi Scala example that works for both Spark Connect and Classic.
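
For illustration, a rough sketch of what such a DataFrame-based pi example can look like (not necessarily the exact code merged by this PR; just a minimal version using only public SparkSession/DataFrame APIs):

```scala
// Illustrative sketch only; the merged SparkDataFramePi may differ in detail.
package org.apache.spark.examples.sql

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object SparkDataFramePi {
  def main(args: Array[String]): Unit = {
    // getOrCreate() picks up whatever spark-submit configures (--remote or --master),
    // so the same compiled class runs against a Spark Connect server or a classic cluster.
    val spark = SparkSession.builder().appName("Spark DataFrame Pi").getOrCreate()

    val slices = if (args.length > 0) args(0).toInt else 2
    val n = 100000L * slices

    // Monte Carlo estimate: sample n points in [-1, 1] x [-1, 1]
    // and count how many land inside the unit circle.
    val count = spark.range(n)
      .select((rand() * 2 - 1).as("x"), (rand() * 2 - 1).as("y"))
      .where(col("x") * col("x") + col("y") * col("y") <= 1)
      .count()

    println(s"Pi is roughly ${4.0 * count / n}")
    spark.stop()
  }
}
```

Because the example contains no Connect-specific code, the submission arguments alone decide which mode it runs in, as the test below demonstrates.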

Why are the changes needed?

The SparkPi example, often the first step for users getting to know Spark, should also be able to run in Spark Connect mode.

Does this PR introduce any user-facing change?

no

How was this patch tested?

Manually built and tested:

bin/spark-submit --remote 'sc://localhost' --class org.apache.spark.examples.sql.SparkDataFramePi examples/jars/spark-examples_2.13-4.1.0-SNAPSHOT.jar
WARNING: Using incubator modules: jdk.incubator.vector
25/01/23 15:00:03 INFO BaseAllocator: Debug mode disabled. Enable with the VM option -Darrow.memory.debug.allocator=true.
25/01/23 15:00:03 INFO DefaultAllocationManagerOption: allocation manager type not specified, using netty as the default type
25/01/23 15:00:03 INFO CheckAllocator: Using DefaultAllocationManager at memory/netty/DefaultAllocationManagerFactory.class
Pi is roughly 3.1388756943784717
25/01/23 15:00:04 INFO ShutdownHookManager: Shutdown hook called
25/01/23 15:00:04 INFO ShutdownHookManager: Deleting directory /private/var/folders/84/dgr9ykwn6yndcmq1kjxqvk200000gn/T/spark-25ed842e-5888-47ce-bb0b-442385d643cb

Was this patch authored or co-authored using generative AI tooling?

no

@yaooqinn (Member, Author)

cc @cloud-fan @dongjoon-hyun @HyukjinKwon, thank you!

@github-actions bot added the SQL label on Jan 23, 2025
@yaooqinn changed the title from "[SPARK-50917][EXAMPLES] Make SparkPi Scala example spark-connect compatible" to "[SPARK-50917][EXAMPLES] Add SparkSQLPi Scala example to work both for Connect and Classic" on Jan 23, 2025
@dongjoon-hyun (Member) left a comment

I understand what you're aiming for, but this is not SQL from a user's perspective, @yaooqinn.

We should distinguish SQL from Spark Connect because Apache Spark already has Spark SQL modules and user interfaces such as JDBC and the spark-sql shell. Could you revise the name? 😄

@yaooqinn (Member, Author)

I'd rename it with the fully qualified class name org.apache.spark.examples.sql.connect.SparkConnectPi.

@dongjoon-hyun (Member) commented Jan 24, 2025

Thank you. Please revise the PR title and description accordingly too.

@yaooqinn changed the title from "[SPARK-50917][EXAMPLES] Add SparkSQLPi Scala example to work both for Connect and Classic" to "[SPARK-50917][EXAMPLES] Add SparkConnectPi Scala example to work both for Connect and Classic" on Jan 24, 2025
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

/** Computes an approximation to pi with SparkSession/DataFrame APIs */
@cloud-fan (Contributor) commented Jan 24, 2025

How is this different from the SQL example? My understanding is that the example should just use public SQL/DataFrame APIs, and then it will work for both Classic and Spark Connect. We should encourage users to use Spark SQL correctly (not rely on private APIs), and in the example we can enable or disable Spark Connect based on the arguments.

@yaooqinn (Member, Author)

IIUC, this example seems to be exactly what you described. Or are you just concerned about the class name?

@cloud-fan (Contributor)

My point is: why do we need to mention Spark Connect here? This is just a normal Spark SQL program, and Spark Connect can support it because it doesn't use private APIs.
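
(To illustrate the point in this thread: whether Spark Connect is used is decided purely by the submission arguments, not by the example code. The --remote command below is the one from the PR description; the --master invocation is an assumed, standard classic-mode equivalent shown only for comparison.)

```sh
# Spark Connect mode: run the example against a Connect server (command from the PR description).
bin/spark-submit --remote 'sc://localhost' \
  --class org.apache.spark.examples.sql.SparkDataFramePi \
  examples/jars/spark-examples_2.13-4.1.0-SNAPSHOT.jar

# Classic mode: same jar and class, submitted to a regular master instead (illustrative).
bin/spark-submit --master 'local[*]' \
  --class org.apache.spark.examples.sql.SparkDataFramePi \
  examples/jars/spark-examples_2.13-4.1.0-SNAPSHOT.jar
```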

@dongjoon-hyun (Member)

To @yaooqinn, the PR description still seems to be outdated~

bin/spark-submit --remote 'sc://localhost' --class org.apache.spark.examples.SparkPi examples/jars/spark-examples_2.13-4.1.0-SNAPSHOT.jar

@yaooqinn (Member, Author)

Updated. Thank you, @dongjoon-hyun.

@dongjoon-hyun (Member) left a comment

+1, LGTM from my side. Thank you, @yaooqinn.

Since there seems to be an ongoing discussion with @cloud-fan, I'll leave this to you and him.

@cloud-fan (Contributor)

My point is that we don't need Spark Connect-specific examples. All legitimate SQL examples (those that don't use private APIs) should just work with Spark Connect.

I think it's a good idea to have more examples using the DataFrame APIs instead of RDDs. How about SparkDataFramePi?
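
(For context on the RDD vs. DataFrame point: the existing SparkPi example drives its computation through spark.sparkContext and the RDD API, which a Spark Connect client does not expose, so only a DataFrame-based variant can run in both modes. Below is an approximate sketch of the classic RDD-style core for spark-shell, not the exact code of either example.)

```scala
// Approximate core of the classic, RDD-based SparkPi; paste into spark-shell (classic mode),
// where `spark` is predefined. spark.sparkContext is not available over Spark Connect,
// which is why a DataFrame-only rewrite is needed for --remote submissions.
val n = 1000000
val count = spark.sparkContext.parallelize(1 until n).map { _ =>
  val x = math.random() * 2 - 1
  val y = math.random() * 2 - 1
  if (x * x + y * y <= 1) 1 else 0
}.reduce(_ + _)
println(s"Pi is roughly ${4.0 * count / n}")
```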

@dongjoon-hyun (Member)

+1 for the naming change (SparkConnectPi -> SparkDataFramePi).

@yaooqinn (Member, Author)

Thank you, @cloud-fan and @dongjoon-hyun. SparkDataFramePi sounds good to me.

@yaooqinn changed the title from "[SPARK-50917][EXAMPLES] Add SparkConnectPi Scala example to work both for Connect and Classic" to "[SPARK-50917][EXAMPLES] Add Pi Scala example to work both for Connect and Classic" on Feb 10, 2025
@dongjoon-hyun pushed a commit that referenced this pull request on Feb 10, 2025:

[SPARK-50917][EXAMPLES] Add Pi Scala example to work both for Connect and Classic

### What changes were proposed in this pull request?

This PR adds a SparkDataFramePi Scala example that works for both Spark Connect and Classic.

### Why are the changes needed?

The SparkPi example, often the first step for users getting to know Spark, should also be able to run in Spark Connect mode.

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
Manually built and tested:

```log
bin/spark-submit --remote 'sc://localhost' --class org.apache.spark.examples.sql.SparkDataFramePi examples/jars/spark-examples_2.13-4.1.0-SNAPSHOT.jar
WARNING: Using incubator modules: jdk.incubator.vector
25/01/23 15:00:03 INFO BaseAllocator: Debug mode disabled. Enable with the VM option -Darrow.memory.debug.allocator=true.
25/01/23 15:00:03 INFO DefaultAllocationManagerOption: allocation manager type not specified, using netty as the default type
25/01/23 15:00:03 INFO CheckAllocator: Using DefaultAllocationManager at memory/netty/DefaultAllocationManagerFactory.class
Pi is roughly 3.1388756943784717
25/01/23 15:00:04 INFO ShutdownHookManager: Shutdown hook called
25/01/23 15:00:04 INFO ShutdownHookManager: Deleting directory /private/var/folders/84/dgr9ykwn6yndcmq1kjxqvk200000gn/T/spark-25ed842e-5888-47ce-bb0b-442385d643cb
```

### Was this patch authored or co-authored using generative AI tooling?

no

Closes #49617 from yaooqinn/SPARK-50917.

Authored-by: Kent Yao <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit e823afa)
Signed-off-by: Dongjoon Hyun <[email protected]>
@dongjoon-hyun (Member)

Merged to master/4.0.

Thank you, @yaooqinn, @cloud-fan, @HyukjinKwon.

@yaooqinn (Member, Author)

Thank you very much, @dongjoon-hyun, @cloud-fan, @HyukjinKwon.

@yaooqinn deleted the SPARK-50917 branch on February 11, 2025 at 02:04