
Commit 1274643

Initial commit - Go plugin for natively launching Spark applications on Kubernetes
1 parent 3f0d3a8 commit 1274643

19 files changed: +3672 −34 lines changed

CODEOWNERS

Lines changed: 0 additions & 1 deletion
@@ -1,3 +1,2 @@
 # Comment line immediately above ownership line is reserved for related other information. Please be careful while editing.
 #ECCN:Open Source
-#GUSINFO:Open Source,Open Source Workflow

CONTRIBUTING.md

Lines changed: 6 additions & 27 deletions
@@ -1,46 +1,25 @@
-*This is a suggested `CONTRIBUTING.md` file template for use by open sourced Salesforce projects. The main goal of this file is to make clear the intents and expectations that end-users may have regarding this project and how/if to engage with it. Adjust as needed (especially look for `{project_slug}` which refers to the org and repo name of your project) and remove this paragraph before committing to your repo.*
+# Contributing Guide For Native Submit Plugin
 
-# Contributing Guide For {NAME OF PROJECT}
-
-This page lists the operational governance model of this project, as well as the recommendations and requirements for how to best contribute to {PROJECT}. We strive to obey these as best as possible. As always, thanks for contributing – we hope these guidelines make it easier and shed some light on our approach and processes.
-
-# Governance Model
-> Pick the most appropriate one
+This page lists the operational governance model of this project, as well as the recommendations and requirements for how to best contribute to native-submit-plugin. We strive to obey these as best as possible. As always, thanks for contributing – we hope these guidelines make it easier and shed some light on our approach and processes.
 
 ## Community Based
 
 The intent and goal of open sourcing this project is to increase the contributor and user base. The governance model is one where new project leads (`admins`) will be added to the project based on their contributions and efforts, a so-called "do-acracy" or "meritocracy" similar to that used by all Apache Software Foundation projects.
 
-> or
-
-## Salesforce Sponsored
-
-The intent and goal of open sourcing this project is to increase the contributor and user base. However, only Salesforce employees will be given `admin` rights and will be the final arbitrars of what contributions are accepted or not.
-
-> or
-
-## Published but not supported
-
-The intent and goal of open sourcing this project is because it may contain useful or interesting code/concepts that we wish to share with the larger open source community. Although occasional work may be done on it, we will not be looking for or soliciting contributions.
-
-# Getting started
-
-Please join the community on {Here list Slack channels, Email lists, Glitter, Discord, etc... links}. Also please make sure to take a look at the project [roadmap](ROADMAP.md) to see where are headed.
-
 # Issues, requests & ideas
 
 Use GitHub Issues page to submit issues, enhancement requests and discuss ideas.
 
 ### Bug Reports and Fixes
-- If you find a bug, please search for it in the [Issues](https://github.com/{project_slug}/issues), and if it isn't already tracked,
-  [create a new issue](https://github.com/{project_slug}/issues/new). Fill out the "Bug Report" section of the issue template. Even if an Issue is closed, feel free to comment and add details, it will still
+- If you find a bug, please search for it in the [Issues](https://github.com/native-submit-plugin/issues), and if it isn't already tracked,
+  [create a new issue](https://github.com/native-submit-plugin/issues/new). Fill out the "Bug Report" section of the issue template. Even if an Issue is closed, feel free to comment and add details, it will still
   be reviewed.
 - Issues that have already been identified as a bug (note: able to reproduce) will be labelled `bug`.
 - If you'd like to submit a fix for a bug, [send a Pull Request](#creating_a_pull_request) and mention the Issue number.
 - Include tests that isolate the bug and verifies that it was fixed.
 
 ### New Features
-- If you'd like to add new functionality to this project, describe the problem you want to solve in a [new Issue](https://github.com/{project_slug}/issues/new).
+- If you'd like to add new functionality to this project, describe the problem you want to solve in a [new Issue](https://github.com/native-submit-plugin/issues/new).
 - Issues that have been identified as a feature request will be labelled `enhancement`.
 - If you'd like to implement the new feature, please wait for feedback from the project
   maintainers before spending too much time writing the code. In some cases, `enhancement`s may
@@ -51,7 +30,7 @@ Use GitHub Issues page to submit issues, enhancement requests and discuss ideas.
   alternative implementation of something that may have advantages over the way its currently
   done, or you have any other change, we would be happy to hear about it!
 - If its a trivial change, go ahead and [send a Pull Request](#creating_a_pull_request) with the changes you have in mind.
-- If not, [open an Issue](https://github.com/{project_slug}/issues/new) to discuss the idea first.
+- If not, [open an Issue](https://github.com/native-submit-plugin/issues/new) to discuss the idea first.
 
 If you're new to our project and looking for some way to make your first contribution, look for
 Issues labelled `good first contribution`.

README.md

Lines changed: 4 additions & 6 deletions
@@ -1,10 +1,8 @@
 # README
 
-A repo containing all the basic file templates and general guidelines for any open source project at Salesforce.
+This plugin provides a native alternative to `spark-submit` for launching Spark applications via the Spark Operator in a Kubernetes cluster. By bypassing the default mechanism of using `spark-submit` command to launch Spark applications, users can avoid the JVM spin-up overhead associated with spark-submit.
 
-## Usage
-
-It's required that all files must be placed at the top level of your repository.
-
-> **NOTE** Your README should contain detailed, useful information about the project!
+## Build
+Run the following command from the root directory of the project
 
+`go build -buildmode=plugin -o plugin.so ./main`
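
A plugin built with `-buildmode=plugin` is loaded by a host process rather than executed directly. As a hedged sketch only, the snippet below shows how a Go host (for example, the Spark Operator) could load the resulting `plugin.so` using the standard library `plugin` package. The exported symbol name `LaunchSparkApplication` and its signature are illustrative assumptions, not the contract defined by this commit.

// Hypothetical host-side loading of the built plugin (sketch, not from this commit).
package main

import (
	"fmt"
	"log"
	"plugin"
)

func main() {
	// Open the shared object produced by `go build -buildmode=plugin`.
	p, err := plugin.Open("plugin.so")
	if err != nil {
		log.Fatalf("failed to open plugin: %v", err)
	}
	// Look up an exported symbol by name (name assumed for illustration).
	sym, err := p.Lookup("LaunchSparkApplication")
	if err != nil {
		log.Fatalf("symbol not found: %v", err)
	}
	// Assert the symbol to the expected function type before calling it.
	launch, ok := sym.(func(appName, namespace string) error)
	if !ok {
		log.Fatal("unexpected symbol type")
	}
	if err := launch("spark-pi", "default"); err != nil {
		log.Fatalf("launch failed: %v", err)
	}
	fmt.Println("application submitted")
}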

config/config_constants.go

Lines changed: 146 additions & 0 deletions
@@ -0,0 +1,146 @@
package config

const (
	// SparkConfDirEnvVar is the environment variable to add to the driver and executor Pods that points
	// to the directory where the Spark ConfigMap is mounted.
	SparkConfDirEnvVar = "SPARK_CONF_DIR"
	// LabelAnnotationPrefix is the prefix of every label and annotation added by the controller.
	LabelAnnotationPrefix = "sparkoperator.k8s.io/"
	// SparkAppNameLabel is the name of the label for the SparkApplication object name.
	SparkAppNameLabel = LabelAnnotationPrefix + "app-name"
	// ScheduledSparkAppNameLabel is the name of the label for the ScheduledSparkApplication object name.
	ScheduledSparkAppNameLabel = LabelAnnotationPrefix + "scheduled-app-name"
	// LaunchedBySparkOperatorLabel is a label on Spark pods launched through the Spark Operator.
	LaunchedBySparkOperatorLabel = LabelAnnotationPrefix + "launched-by-spark-operator"
	// SparkApplicationSelectorLabel is the AppID set by the spark-distribution on the driver/executors Pods.
	SparkApplicationSelectorLabel = "spark-app-selector"
	// SparkRoleLabel is the driver/executor label set by the operator/spark-distribution on the driver/executors Pods.
	SparkRoleLabel = "spark-role"
	// SparkDriverRole is the value of the spark-role label for the driver.
	SparkDriverRole = "driver"
	// SparkExecutorRole is the value of the spark-role label for the executors.
	SparkExecutorRole = "executor"
	// SubmissionIDLabel is the label that records the submission ID of the current run of an application.
	SubmissionIDLabel = LabelAnnotationPrefix + "submission-id"
	// SparkAppNameKey is the configuration property for application name.
	SparkAppNameKey = "spark.app.name"
	// SparkAppNamespaceKey is the configuration property for application namespace.
	SparkAppNamespaceKey = "spark.kubernetes.namespace"
	// SparkContainerImageKey is the configuration property for specifying the unified container image.
	SparkContainerImageKey = "spark.kubernetes.container.image"
	// SparkImagePullSecretKey is the configuration property for specifying the comma-separated list of image-pull
	// secrets.
	SparkImagePullSecretKey = "spark.kubernetes.container.image.pullSecrets"
	// SparkContainerImagePullPolicyKey is the configuration property for specifying the container image pull policy.
	SparkContainerImagePullPolicyKey = "spark.kubernetes.container.image.pullPolicy"
	// SparkNodeSelectorKeyPrefix is the configuration property prefix for specifying node selector for the pods.
	SparkNodeSelectorKeyPrefix = "spark.kubernetes.node.selector."
	// SparkDriverNodeSelectorKeyPrefix is the configuration property prefix for specifying node selector for the driver pods.
	SparkDriverNodeSelectorKeyPrefix = "spark.kubernetes.driver.node.selector."
	// SparkExecutorNodeSelectorKeyPrefix is the configuration property prefix for specifying node selector for the executor pods.
	SparkExecutorNodeSelectorKeyPrefix = "spark.kubernetes.executor.node.selector."
	// SparkDriverContainerImageKey is the configuration property for specifying a custom driver container image.
	SparkDriverContainerImageKey = "spark.kubernetes.driver.container.image"
	// SparkExecutorContainerImageKey is the configuration property for specifying a custom executor container image.
	SparkExecutorContainerImageKey = "spark.kubernetes.executor.container.image"
	// SparkDriverCoreRequestKey is the configuration property for specifying the physical CPU request for the driver.
	SparkDriverCoreRequestKey = "spark.kubernetes.driver.request.cores"
	// SparkExecutorCoreRequestKey is the configuration property for specifying the physical CPU request for executors.
	SparkExecutorCoreRequestKey = "spark.kubernetes.executor.request.cores"
	// SparkExecutorCoreKey is the configuration property for specifying the number of cores per executor.
	SparkExecutorCoreKey = "spark.executor.cores"
	// SparkDriverCoreLimitKey is the configuration property for specifying the hard CPU limit for the driver pod.
	SparkDriverCoreLimitKey = "spark.kubernetes.driver.limit.cores"
	// SparkExecutorCoreLimitKey is the configuration property for specifying the hard CPU limit for the executor pods.
	SparkExecutorCoreLimitKey = "spark.kubernetes.executor.limit.cores"
	// SparkDriverSecretKeyPrefix is the configuration property prefix for specifying secrets to be mounted into the
	// driver.
	SparkDriverSecretKeyPrefix = "spark.kubernetes.driver.secrets."
	// SparkExecutorSecretKeyPrefix is the configuration property prefix for specifying secrets to be mounted into the
	// executors.
	SparkExecutorSecretKeyPrefix = "spark.kubernetes.executor.secrets."
	// SparkDriverSecretKeyRefKeyPrefix is the configuration property prefix for specifying environment variables
	// from SecretKeyRefs for the driver.
	SparkDriverSecretKeyRefKeyPrefix = "spark.kubernetes.driver.secretKeyRef."
	// SparkExecutorSecretKeyRefKeyPrefix is the configuration property prefix for specifying environment variables
	// from SecretKeyRefs for the executors.
	SparkExecutorSecretKeyRefKeyPrefix = "spark.kubernetes.executor.secretKeyRef."
	// SparkDriverEnvVarConfigKeyPrefix is the Spark configuration prefix for setting environment variables
	// into the driver.
	SparkDriverEnvVarConfigKeyPrefix = "spark.kubernetes.driverEnv."
	// SparkExecutorEnvVarConfigKeyPrefix is the Spark configuration prefix for setting environment variables
	// into the executor.
	SparkExecutorEnvVarConfigKeyPrefix = "spark.executorEnv."
	// SparkDriverAnnotationKeyPrefix is the Spark configuration key prefix for annotations on the driver Pod.
	SparkDriverAnnotationKeyPrefix = "spark.kubernetes.driver.annotation."
	// SparkExecutorAnnotationKeyPrefix is the Spark configuration key prefix for annotations on the executor Pods.
	SparkExecutorAnnotationKeyPrefix = "spark.kubernetes.executor.annotation."
	// SparkDriverLabelKeyPrefix is the Spark configuration key prefix for labels on the driver Pod.
	SparkDriverLabelKeyPrefix = "spark.kubernetes.driver.label."
	// SparkExecutorLabelKeyPrefix is the Spark configuration key prefix for labels on the executor Pods.
	SparkExecutorLabelKeyPrefix = "spark.kubernetes.executor.label."
	// SparkDriverVolumesPrefix is the Spark volumes configuration for mounting a volume into the driver pod.
	SparkDriverVolumesPrefix = "spark.kubernetes.driver.volumes."
	// SparkExecutorVolumesPrefix is the Spark volumes configuration for mounting a volume into the executor pods.
	SparkExecutorVolumesPrefix = "spark.kubernetes.executor.volumes."
	// SparkDriverPodNameKey is the Spark configuration key for driver pod name.
	SparkDriverPodNameKey = "spark.kubernetes.driver.pod.name"
	// SparkDriverServiceAccountName is the Spark configuration key for specifying name of the Kubernetes service
	// account used by the driver pod.
	SparkDriverServiceAccountName = "spark.kubernetes.authenticate.driver.serviceAccountName"
	// SparkExecutorAccountName is the Spark configuration key for specifying name of the Kubernetes service
	// account used by the executor pods.
	SparkExecutorAccountName = "spark.kubernetes.authenticate.executor.serviceAccountName"
	// SparkWaitAppCompletion is the Spark configuration key for specifying whether to wait for application to complete.
	SparkWaitAppCompletion = "spark.kubernetes.submission.waitAppCompletion"
	// SparkPythonVersion is the Spark configuration key for specifying python version used.
	SparkPythonVersion = "spark.kubernetes.pyspark.pythonVersion"
	// SparkMemoryOverheadFactor is the Spark configuration key for specifying memory overhead factor used for Non-JVM memory.
	SparkMemoryOverheadFactor = "spark.kubernetes.memoryOverheadFactor"
	// SparkDriverJavaOptions is the Spark configuration key for a string of extra JVM options to pass to the driver.
	SparkDriverJavaOptions = "spark.driver.extraJavaOptions"
	// SparkExecutorJavaOptions is the Spark configuration key for a string of extra JVM options to pass to executors.
	SparkExecutorJavaOptions = "spark.executor.extraJavaOptions"
	// SparkExecutorDeleteOnTermination is the Spark configuration for specifying whether executor pods should be deleted in case of failure or normal termination.
	SparkExecutorDeleteOnTermination = "spark.kubernetes.executor.deleteOnTermination"
	// SparkDriverKubernetesMaster is the Spark configuration key for specifying the Kubernetes master the driver uses
	// to manage executor pods and other Kubernetes resources.
	SparkDriverKubernetesMaster = "spark.kubernetes.driver.master"
	// SparkDriverServiceAnnotationKeyPrefix is the key prefix of annotations to be added to the driver service.
	SparkDriverServiceAnnotationKeyPrefix = "spark.kubernetes.driver.service.annotation."
	// SparkDynamicAllocationEnabled is the Spark configuration key for specifying if dynamic
	// allocation is enabled or not.
	SparkDynamicAllocationEnabled = "spark.dynamicAllocation.enabled"
	// SparkDynamicAllocationShuffleTrackingEnabled is the Spark configuration key for
	// specifying if shuffle data tracking is enabled.
	SparkDynamicAllocationShuffleTrackingEnabled = "spark.dynamicAllocation.shuffleTracking.enabled"
	// SparkDynamicAllocationShuffleTrackingTimeout is the Spark configuration key for specifying
	// the shuffle tracking timeout in milliseconds if shuffle tracking is enabled.
	SparkDynamicAllocationShuffleTrackingTimeout = "spark.dynamicAllocation.shuffleTracking.timeout"
	// SparkDynamicAllocationInitialExecutors is the Spark configuration key for specifying
	// the initial number of executors to request if dynamic allocation is enabled.
	SparkDynamicAllocationInitialExecutors = "spark.dynamicAllocation.initialExecutors"
	// SparkDynamicAllocationMinExecutors is the Spark configuration key for specifying the
	// lower bound of the number of executors to request if dynamic allocation is enabled.
	SparkDynamicAllocationMinExecutors = "spark.dynamicAllocation.minExecutors"
	// SparkDynamicAllocationMaxExecutors is the Spark configuration key for specifying the
	// upper bound of the number of executors to request if dynamic allocation is enabled.
	SparkDynamicAllocationMaxExecutors = "spark.dynamicAllocation.maxExecutors"
	// GoogleApplicationCredentialsEnvVar is the environment variable used by the
	// Application Default Credentials mechanism. More details can be found at
	// https://developers.google.com/identity/protocols/application-default-credentials.
	GoogleApplicationCredentialsEnvVar = "GOOGLE_APPLICATION_CREDENTIALS"
	// ServiceAccountJSONKeyFileName is the assumed name of the service account
	// JSON key file. This name is added to the service account secret mount path to
	// form the path to the JSON key file referred to by GOOGLE_APPLICATION_CREDENTIALS.
	ServiceAccountJSONKeyFileName = "key.json"
	// HadoopTokenFileLocationEnvVar is the environment variable for specifying the location
	// where the file storing the Hadoop delegation token is located.
	HadoopTokenFileLocationEnvVar = "HADOOP_TOKEN_FILE_LOCATION"
	// HadoopDelegationTokenFileName is the assumed name of the file storing the Hadoop
	// delegation token. This name is added to the delegation token secret mount path to
	// form the path to the file referred to by HADOOP_TOKEN_FILE_LOCATION.
	HadoopDelegationTokenFileName = "hadoop.token"
	// SparkDriverContainerName is the name of the driver container in the Spark driver pod.
	SparkDriverContainerName = "spark-kubernetes-driver"
	// SparkLocalDirVolumePrefix is the volume name prefix for the "scratch" space directory.
	SparkLocalDirVolumePrefix = "spark-local-dir-"
)
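
As an illustration of how these constants compose, the hedged sketch below builds driver-pod labels and a small Spark properties map from them. The helper names `driverLabels` and `driverCoreConf` are hypothetical and not part of this commit; they only show the intended use of the label and configuration-key constants defined above.

// Hypothetical helpers in package config (sketch, not from this commit).
package config

// driverLabels shows how the label constants above could be combined to
// label a driver pod created for a SparkApplication.
func driverLabels(appName, submissionID string) map[string]string {
	return map[string]string{
		SparkAppNameLabel:            appName,
		SparkRoleLabel:               SparkDriverRole,
		LaunchedBySparkOperatorLabel: "true",
		SubmissionIDLabel:            submissionID,
	}
}

// driverCoreConf shows how the spark.* configuration keys could populate a
// Spark properties map for the driver's CPU request and limit.
func driverCoreConf(request, limit string) map[string]string {
	return map[string]string{
		SparkDriverCoreRequestKey: request,
		SparkDriverCoreLimitKey:   limit,
	}
}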
