
[Documentation]Instructions on how to take your application to production #345


Closed
wants to merge 7 commits into from

Conversation

elvaliuliuliu
Contributor

Currently, customers are asking how they can run a Spark .NET application in different scenarios. This PR gathers the most commonly asked scenarios and provides general instructions on how customers can package their applications and submit jobs in those scenarios.

@imback82
Contributor

cc: @bamurtaugh

@@ -0,0 +1,108 @@
Taking your Spark .Net Application to Production
Member

Do we need to call it ".NET for Apache Spark" or ".NET for Spark" application instead (since we steer away from calling it Spark.NET publicly)? Also, I think ".NET" should be all caps for consistency.

Contributor Author

@elvaliuliuliu Nov 20, 2019

Sure, I will change it to .NET for Apache Spark for now, and keep .NET in all caps. Thanks!

This how-to provides general instructions on how to take your .NET for Apache Spark application to production.
In this documentation, we will summarize the most commonly asked scenarios when running a .NET for Apache Spark application.
You will also learn how to package your application and submit it with [spark-submit](https://spark.apache.org/docs/latest/submitting-applications.html) and [Apache Livy](https://livy.incubator.apache.org/).
- [How to take your application to production when you have single dependency](#how-to-take-your-application-to-production-when-you-have-single-dependency)
Member

Suggested change
- [How to take your application to production when you have single dependency](#how-to-take-your-application-to-production-when-you-have-single-dependency)
- [How to take your application to production when you have a single dependency](#how-to-take-your-application-to-production-when-you-have-a-single-dependency)

Not sure if we can change the phrasing here and still have it be precise, but "a single dependency" might sound a little cleaner.

Member

Alternatively, could we make these headings either more concise or more precise? i.e., either remove the "How to take your application to production" part since that phrase is already in the article title, or add a phrase that more specifically states what it means to take an app to production (does it just mean running spark-submit, so we could say something like "Deploy app with a single dependency"?).

Suggested change
- [How to take your application to production when you have single dependency](#how-to-take-your-application-to-production-when-you-have-single-dependency)
- [Single dependency](#single-dependency)
Suggested change
- [How to take your application to production when you have single dependency](#how-to-take-your-application-to-production-when-you-have-single-dependency)
- [How to deploy your application when you have a single dependency](#how-to-deploy-your-application-when-you-have-a-single-dependency)

Contributor Author

Thanks for your suggestion! I would prefer the second one, which I think is both concise and precise.

```
#### 2. Using Apache Livy
- Please see below as an example of running your app with Apache Livy in Scenario 3 and Scenario 5.
And you should use `"files": ["adl://<cluster name>.azuredatalakestore.net/<some dir>/nugetLibrary.dll"]` in Scenario 4.
Member

Suggested change
And you should use `"files": ["adl://<cluster name>.azuredatalakestore.net/<some dir>/nugetLibrary.dll"]` in Scenario 4.
Additionally, you should use `"files": ["adl://<cluster name>.azuredatalakestore.net/<some dir>/nugetLibrary.dll"]` in Scenario 4.
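For context, the `"files"` setting quoted above would sit inside a complete Livy batch request along these lines. This is only a sketch for Scenario 4: the cluster name, directories, and version placeholders are assumptions to be filled in, and it mirrors the fields of the Scenario 5 example later in this review.

```json
{
  "file": "adl://<cluster name>.azuredatalakestore.net/<some dir>/microsoft-spark-<spark_majorversion.spark_minorversion.x>-<spark_dotnet_version>.jar",
  "className": "org.apache.spark.deploy.dotnet.DotnetRunner",
  "files": ["adl://<cluster name>.azuredatalakestore.net/<some dir>/nugetLibrary.dll"],
  "args": ["dotnet", "adl://<cluster name>.azuredatalakestore.net/<some dir>/mySparkApp.dll", "<app arg 1>", "<app arg n>"]
}
```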

Contributor Author

I have made the changes to resolve all the comments (except a few which need some input). Thanks so much @bamurtaugh for your comments and feedback!

#### Scenario 4. SparkSession code references a function from a NuGet package that has been installed in the csproj
This would be the use case when `SparkSession` code references a function from a NuGet package in the same project (e.g. mySparkApp.csproj).
#### Scenario 5. SparkSession code references a function from a DLL on the user's machine
This would be the use case when `SparkSession` code references business logic (UDFs) in a DLL on the user's machine (e.g. `SparkSession` code in mySparkApp.csproj and businessLogic.dll on a different machine).
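As a sketch, Scenario 5 submitted with spark-submit instead of Livy might look like the following. All paths, the cluster name, and the jar version placeholders are assumptions; the DLL is shipped to executors via `--files`, matching the `"files"` field of the Livy example in this doc.

```
spark-submit \
  --class org.apache.spark.deploy.dotnet.DotnetRunner \
  --master yarn \
  --files adl://<cluster name>.azuredatalakestore.net/<some dir>/businessLogic.dll \
  adl://<cluster name>.azuredatalakestore.net/<some dir>/microsoft-spark-<spark_majorversion.spark_minorversion.x>-<spark_dotnet_version>.jar \
  dotnet adl://<cluster name>.azuredatalakestore.net/<some dir>/mySparkApp.dll <app arg 1> <app arg n>
```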
Member

Why would businessLogic.dll be on a different machine?

Comment on lines +91 to +98
```json
{
  "file": "adl://<cluster name>.azuredatalakestore.net/<some dir>/microsoft-spark-<spark_majorversion.spark_minorversion.x>-<spark_dotnet_version>.jar",
  "className": "org.apache.spark.deploy.dotnet.DotnetRunner",
  "files": ["adl://<cluster name>.azuredatalakestore.net/<some dir>/businessLogic.dll"],
  "args": ["dotnet", "adl://<cluster name>.azuredatalakestore.net/<some dir>/mySparkApp.dll", "<app arg 1>", "<app arg 2>", "...", "<app arg n>"]
}
```
Member

Should just provide the zip example.

Contributor Author

Thanks for your comments! I have resolved all of them in PR #349 (I opened a new PR, #349, because I could not edit this one, and I will close this one soon). Let's discuss and review in the new PR. Thanks for your understanding, and sorry for the inconvenience.

@elvaliuliuliu
Contributor Author

Closing this one and opening a new PR, #349, to move forward from there. Thanks!
