
Conversation

kevinjqliu
Contributor

Rationale for this change

Add a make notebook target to spin up a Jupyter notebook.

With Spark Connect (#2491) and our testing setup, we can quickly spin up a local environment with:

  • Spark
  • Iceberg REST catalog
  • Hive Metastore (HMS)
  • MinIO

make test-integration-exec
make notebook

In the Jupyter notebook, you can connect to Spark easily:

from pyspark.sql import SparkSession

# Create SparkSession against the remote Spark Connect server
spark = SparkSession.builder.remote("sc://localhost:15002").getOrCreate()
spark.sql("SHOW CATALOGS").show()
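
The same notebook can also talk to the Iceberg REST catalog directly through PyIceberg. Here is a minimal sketch, assuming the integration setup exposes the REST catalog on localhost:8181 and MinIO on localhost:9000 with admin/password credentials (the ports and credentials are assumptions, not part of this change):

from pyiceberg.catalog import load_catalog

# Connect to the locally running Iceberg REST catalog; endpoints and credentials
# below are assumptions based on the docker-compose integration setup
catalog = load_catalog(
    "rest",
    **{
        "type": "rest",
        "uri": "http://localhost:8181",
        "s3.endpoint": "http://localhost:9000",
        "s3.access-key-id": "admin",
        "s3.secret-access-key": "password",
    },
)
print(catalog.list_namespaces())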

Are these changes tested?

Are there any user-facing changes?

@kevinjqliu requested a review from Fokko on September 26, 2025 02:43
@echo "Cleanup complete."

notebook: ## Launch Jupyter Notebook
${POETRY} run pip install jupyter
Contributor

Should we move this into a poetry dependency group? Similar to the docs.

@Fokko
Contributor

Fokko commented Sep 26, 2025

With Spark Connect (#2491) and our testing setup, we can quickly spin up a local environment with

I agree, and that's great, but should we also spin up the resources as part of this effort? We could even inject a notebook that imports Spark Connect, etc. (which won't be installed from a fresh install? I think it is a dev dependency; we probably want to double-check to avoid scaring newcomers to the project).
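
A hedged sketch of how such an injected notebook could guard that import, assuming pyspark (with Spark Connect support) is only available as a dev dependency:

# Minimal import guard for the injected notebook; assumes pyspark is a dev-only dependency
try:
    from pyspark.sql import SparkSession
except ImportError as exc:
    raise RuntimeError(
        "pyspark is not installed; install the dev dependencies first"
    ) from exc

spark = SparkSession.builder.remote("sc://localhost:15002").getOrCreate()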

@jayceslesar
Contributor

Bonus idea: what if make notebook or some other CLI entry point spun up pyspark + catalog configured via pyiceberg.yaml so users could immediately start querying their data?
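
As a rough sketch of what that could look like from the notebook side, assuming a ~/.pyiceberg.yaml that defines a catalog named "default" (the table identifier below is only a placeholder):

from pyiceberg.catalog import load_catalog

# With no properties passed, load_catalog() falls back to the catalogs
# configured in ~/.pyiceberg.yaml (or PYICEBERG_* environment variables)
catalog = load_catalog("default")

# Placeholder identifier; any table already registered in the catalog works
table = catalog.load_table("examples.my_table")
print(table.scan().to_arrow())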

@kevinjqliu
Contributor Author

kevinjqliu commented Sep 26, 2025

We could even inject a notebook that imports Spark-connect

We could do the getting started guide as a notebook! https://py.iceberg.apache.org/#getting-started-with-pyiceberg
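
A condensed sketch of what such a notebook could contain, loosely following the getting started guide (the SqlCatalog, warehouse path, and table name below are illustrative assumptions, not the final notebook content):

import os

import pyarrow as pa
from pyiceberg.catalog.sql import SqlCatalog

# Local SQLite-backed catalog with a file-based warehouse (paths are illustrative)
warehouse_path = "/tmp/warehouse"
os.makedirs(warehouse_path, exist_ok=True)
catalog = SqlCatalog(
    "default",
    uri=f"sqlite:///{warehouse_path}/pyiceberg_catalog.db",
    warehouse=f"file://{warehouse_path}",
)

catalog.create_namespace("docs_example")
table = catalog.create_table(
    "docs_example.taxi_dataset",
    schema=pa.schema([("trip_distance", pa.float64())]),
)
table.append(pa.table({"trip_distance": [1.1, 2.2]}))
print(table.scan().to_arrow())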

@kevinjqliu
Contributor Author

kevinjqliu commented Sep 26, 2025

Bonus idea: what if make notebook or some other CLI entry point spun up pyspark + catalog configured via pyiceberg.yaml so users could immediately start querying their data?

Yeah, we could do that. The integration test setup gives us two different catalogs (REST and HMS).

@Fokko
Contributor

Fokko commented Sep 30, 2025

@kevinjqliu I would keep it simple and go with the preferred catalog: REST :)
