System Information
- Spark or PySpark: PySpark
- SDK Version: v1.2.8
- Spark Version: v2.3.2
- Algorithm (e.g. KMeans): n/a
Describe the problem
I'm following the instructions proposed HERE to connect a local Spark session, running in a SageMaker notebook, to the Glue Data Catalog of my account.
I know this is doable via EMR, but I'd like to do the same using a SageMaker notebook (or any other kind of separate Spark installation).
Minimal repo / logs
Below is the code I currently run in the notebook; it does not actually work.
import sagemaker_pyspark
from pyspark.sql import SparkSession

# Put the JARs shipped with sagemaker_pyspark on the driver classpath.
classpath = ":".join(sagemaker_pyspark.classpath_jars())

spark = (
    SparkSession.builder
    .config("spark.driver.extraClassPath", classpath)
    # Ask Hive to use the Glue Data Catalog client factory for the metastore.
    .config("hive.metastore.client.factory.class", "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory")
    .config("hive.metastore.schema.verification", "false")
    .enableHiveSupport()
    .getOrCreate()
)
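One thing that may help narrow this down is checking whether any JAR on the classpath actually contains the Glue client factory class named in the config above. The following is only a diagnostic sketch, not a fix: the helper name `jars_containing_class` is hypothetical, and it relies only on the fact that a JAR is a zip archive in which a class is stored under its package path with a `.class` suffix.

```python
import zipfile
from pathlib import Path

# Class name taken from the hive.metastore.client.factory.class setting above.
GLUE_FACTORY = "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"


def jars_containing_class(jar_paths, class_name=GLUE_FACTORY):
    """Return the JAR paths whose archive contains the compiled class.

    Hypothetical helper for illustration; it is not part of sagemaker_pyspark.
    """
    # com.example.Foo is stored inside a JAR as com/example/Foo.class
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for jar in jar_paths:
        path = Path(jar)
        # Skip entries that are missing or are not zip archives.
        if not path.is_file() or not zipfile.is_zipfile(path):
            continue
        with zipfile.ZipFile(path) as archive:
            if entry in archive.namelist():
                hits.append(str(path))
    return hits
```

In the notebook this could be called as `jars_containing_class(sagemaker_pyspark.classpath_jars())`. An empty result would suggest that the factory class is not on the driver classpath at all, which would be one possible reason the setting has no effect; a non-empty result would point the investigation elsewhere.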