Usage of Glue Data Catalog with sagemaker_pyspark

### System Information
- **Spark or PySpark**: PySpark
- **SDK Version**: v1.2.8
- **Spark Version**: v2.3.2
- **Algorithm (e.g. KMeans)**: n/a

### Describe the problem
I'm following the instructions [here](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark-glue.html) to connect a local Spark session, running in a SageMaker notebook, to the Glue Data Catalog of my account.

I know this is doable via EMR, but I'd like to do the same from a SageMaker notebook (or any other kind of separate Spark installation).

### Minimal repo / logs
Below is the code I currently run in the notebook. It runs, but it doesn't actually work.
```python
import sagemaker_pyspark
from pyspark.sql import SparkSession

# Join the jars bundled with sagemaker_pyspark into a single classpath string.
classpath = ":".join(sagemaker_pyspark.classpath_jars())

spark = SparkSession.builder \
    .config("spark.driver.extraClassPath", classpath) \
    .config("hive.metastore.client.factory.class", "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory") \
    .config("hive.metastore.schema.verification", "false") \
    .enableHiveSupport() \
    .getOrCreate()
```
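One thing that may be worth checking is whether the factory class named in the configuration is present in any of the jars placed on the driver classpath. The following is only a diagnostic sketch, not a fix: the helper `jars_containing` and the constant `GLUE_FACTORY_ENTRY` are hypothetical names introduced here for illustration, and the sketch assumes each classpath entry is an ordinary jar (zip) file.

```python
import zipfile
from typing import Iterable, List

# Class name from the configuration, written as its path inside a jar archive.
GLUE_FACTORY_ENTRY = (
    "com/amazonaws/glue/catalog/metastore/"
    "AWSGlueDataCatalogHiveClientFactory.class"
)


def jars_containing(jar_paths: Iterable[str], entry: str = GLUE_FACTORY_ENTRY) -> List[str]:
    """Return the paths in jar_paths whose archive contains the given entry.

    Hypothetical helper for illustration; it is not part of sagemaker_pyspark.
    """
    matches = []
    for path in jar_paths:
        try:
            with zipfile.ZipFile(path) as jar:
                if entry in jar.namelist():
                    matches.append(path)
        except (OSError, zipfile.BadZipFile):
            # Skip paths that are missing, unreadable, or not valid zip archives.
            continue
    return matches
```

In the notebook this could be called as `jars_containing(sagemaker_pyspark.classpath_jars())`. An empty result would suggest that the Glue client factory is not among the jars on the classpath, which would be one possible explanation for the behaviour described here.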

