@@ -161,7 +161,7 @@ Training and Hosting an XGBoost model using SageMaker PySpark
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
An XGBoostSageMakerEstimator runs a training job using the Amazon SageMaker XGBoost algorithm upon
- invocation of fit(), returning a SageMakerModel.
+ invocation of fit(), returning a SageMakerModel.

.. code-block:: python
@@ -173,16 +173,16 @@ invocation of fit(), returning a SageMakerModel.
# there is no need to do this in code.
conf = (SparkConf()
        .set("spark.driver.extraClassPath", ":".join(classpath_jars())))
- SparkContext(conf=conf)
+ SparkContext.getOrCreate(conf=conf)
iam_role = "arn:aws:iam::0123456789012:role/MySageMakerRole"
region = "us-east-1"
- training_data = spark.read.format("libsvm").option("numFeatures", "784")
-     .load("s3a://sagemaker-sample-data-{}/spark/mnist/train/".format(region))
+ training_data = (spark.read.format("libsvm").option("numFeatures", "784")
+     .load("s3a://sagemaker-sample-data-{}/spark/mnist/train/".format(region)))

- test_data = spark.read.format("libsvm").option("numFeatures", "784")
-     .load("s3a://sagemaker-sample-data-{}/spark/mnist/train/".format(region))
+ test_data = (spark.read.format("libsvm").option("numFeatures", "784")
+     .load("s3a://sagemaker-sample-data-{}/spark/mnist/train/".format(region)))

xgboost_estimator = XGBoostSageMakerEstimator(
    trainingInstanceType="ml.m4.xlarge",
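The hunk above ends mid-constructor. For context, a minimal sketch of how the estimator is
typically completed and used, following the surrounding README (the instance types/counts and
the ``setNumRound`` value here are illustrative assumptions):

.. code-block:: python

    from sagemaker_pyspark import IAMRole
    from sagemaker_pyspark.algorithms import XGBoostSageMakerEstimator

    # Completing the constructor shown above; instance types/counts are
    # illustrative, and iam_role is the role defined earlier.
    xgboost_estimator = XGBoostSageMakerEstimator(
        trainingInstanceType="ml.m4.xlarge",
        trainingInstanceCount=1,
        endpointInstanceType="ml.m4.xlarge",
        endpointInitialInstanceCount=1,
        sagemakerRole=IAMRole(iam_role))

    # XGBoost hyperparameters are set through setters on the estimator;
    # the round count is an arbitrary example value.
    xgboost_estimator.setNumRound(15)

    # fit() runs a SageMaker training job and returns a SageMakerModel backed
    # by a hosted endpoint; transform() sends data to that endpoint for inference.
    model = xgboost_estimator.fit(training_data)
    predictions = model.transform(test_data)
    predictions.show()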
@@ -265,7 +265,7 @@ Create a bootstrap script to install sagemaker_pyspark in your new EMR cluster:
#!/bin/bash

sudo pip install sagemaker_pyspark
- sudo /usr/bin/pip-3.4 install sagemaker_pyspark
+ sudo pip3 install sagemaker_pyspark

Upload this script to an S3 bucket:
@@ -274,7 +274,7 @@ Upload this script to an S3 bucket:
$ aws s3 cp bootstrap.sh s3://your-bucket/prefix/

- In the AWS Console, launch a new EMR Spark Cluster, set s3://your-bucket/prefix/bootstrap.sh as the
+ In the AWS Console, launch a new EMR Spark Cluster, set s3://your-bucket/prefix/bootstrap.sh as the
bootstrap script. Make sure to:

- Run the Cluster in the same VPC as your SageMaker Notebook Instance.
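If you prefer to script the launch rather than use the console, a rough sketch with boto3
follows; the release label, instance types and counts, roles, and subnet id are all
assumptions, and the subnet must sit in the same VPC as the notebook instance:

.. code-block:: python

    import boto3

    emr = boto3.client("emr", region_name="us-east-1")

    # Launch a Spark cluster that runs the bootstrap script uploaded above.
    # Livy is included because sparkmagic talks to the cluster through it.
    response = emr.run_job_flow(
        Name="sagemaker-pyspark-cluster",   # illustrative name
        ReleaseLabel="emr-5.11.0",          # assumed release label
        Applications=[{"Name": "Spark"}, {"Name": "Livy"}],
        Instances={
            "MasterInstanceType": "m4.xlarge",
            "SlaveInstanceType": "m4.xlarge",
            "InstanceCount": 3,
            "Ec2SubnetId": "subnet-0123456789abcdef0",  # same VPC as the notebook
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        BootstrapActions=[{
            "Name": "install-sagemaker-pyspark",
            "ScriptBootstrapAction": {"Path": "s3://your-bucket/prefix/bootstrap.sh"},
        }],
        JobFlowRole="EMR_EC2_DefaultRole",  # default EMR instance profile
        ServiceRole="EMR_DefaultRole",      # default EMR service role
    )
    print(response["JobFlowId"])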
@@ -315,15 +315,13 @@ Configure your SageMaker Notebook instance to connect to the cluster
Open a terminal session in your notebook: New -> Terminal

- Copy the default `sparkmagic config <https://github
- .com/jupyter-incubator/sparkmagic/blob/master/sparkmagic/example_config.json>`__
+ Copy the default `sparkmagic config <https://github.com/jupyter-incubator/sparkmagic/blob/master/sparkmagic/example_config.json>`__
You can download it in your terminal using:

.. code-block:: sh

- $ wget https://raw.githubusercontent
- .com/jupyter-incubator/sparkmagic/master/sparkmagic/example_config.json
+ $ wget https://raw.githubusercontent.com/jupyter-incubator/sparkmagic/master/sparkmagic/example_config.json
In the ``kernel_python_credentials`` section, replace the ``url`` with
``http://your-cluster-private-dns-name:8998``.
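If you would rather script that edit, a small sketch follows (it assumes the
``kernel_python_credentials``/``url`` layout of the example config linked above; the DNS
name is a placeholder):

.. code-block:: python

    import json

    # Point sparkmagic's python kernel at the cluster's Livy endpoint before
    # copying the config into place; the DNS name below is a placeholder.
    with open("example_config.json") as f:
        config = json.load(f)

    config["kernel_python_credentials"]["url"] = "http://your-cluster-private-dns-name:8998"

    with open("example_config.json", "w") as f:
        json.dump(config, f, indent=2)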
@@ -335,7 +333,7 @@ Override the default spark magic config
$ cp example_config.json ~/.sparkmagic/config.json

- Launch a notebook using either the ``pyspark2`` or ``pyspark3`` Kernel. As soon as you try to run
+ Launch a notebook using the ``Sparkmagic (Pyspark)`` Kernel. As soon as you try to run
any code block, the notebook will connect to your spark cluster and get a ``SparkSession`` for you.
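As a quick smoke test, the first cell can be trivial; running it should trigger sparkmagic
to start a Livy session on the cluster (``spark`` is provided by the kernel):

.. code-block:: python

    # Any first cell works; this one just confirms the remote SparkSession is live.
    print(spark.version)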