-
Notifications
You must be signed in to change notification settings - Fork 42
Docs: add PostGIS integration guide with GeoPandas and ADBC examples #543
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Thank you for opening! I see a few CI issues ( I'm sorry I didn't get to this today...I will take a look Monday! |
|
Thanks for the review and for the pointers! I’ll add the Apache license header to the first cell of the notebook (following the pattern from the existing docs) and run I’ll push the updates shortly — thanks, and no worries at all. Looking forward to your review on Monday! |
|
I’ve added the Apache license header to the first cell of the notebook and addressed the formatting concerns. The updates are pushed. |
docs/postgis.md
Outdated
| > Note: | ||
| > SedonaDB is not currently distributed via PyPI. | ||
| > To run the SedonaDB examples in this notebook, you must install SedonaDB | ||
| > from source or use a development environment where SedonaDB is available. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can remove this part (SedonaDB is distributed via PyPI)
docs/postgis.md
Outdated
| ## PostGIS Setup | ||
|
|
||
| Keep SQL static(do NOT execute). | ||
|
|
||
| ### Preparing a PostGIS table | ||
|
|
||
| ```md | ||
|
|
||
| The following SQL creates a simple PostGIS table that SedonaDB can read. | ||
|
|
||
| ```sql | ||
| CREATE TABLE my_places ( | ||
| id SERIAL PRIMARY KEY, | ||
| name VARCHAR(100), | ||
| geom GEOMETRY(Point, 4326) | ||
| ); | ||
|
|
||
| INSERT INTO my_places (name, geom) VALUES | ||
| ('New York', ST_SetSRID(ST_MakePoint(-74.006, 40.7128), 4326)), | ||
| ('Los Angeles', ST_SetSRID(ST_MakePoint(-118.2437, 34.0522), 4326)), | ||
| ('Chicago', ST_SetSRID(ST_MakePoint(-87.6298, 41.8781), 4326)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of leaving this as unexecuted, I think it would be better to (1) use the built-in PostGIS container we have in the repo (you can start it via docker compose up --detach) and (2) start the tutorial by writing a simple GeoDataFrame into PostGIS. That way when this tutorial needs editing somebody else can easily recreate the data required.
| sd = sedona.db.connect() | ||
| df = sd.create_data_frame(gdf) | ||
| df.show() | ||
| df.schema |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think outputs like df.show() and df.schema should be shown. If you implement the suggestion about making the notebook reproducible above, it should be easy to execute all cells in the notebook and have the output rendered automatically.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you run this notebook such that the cells render?
| ```python | ||
| df.head(5).show() | ||
| ``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is another output that would be great to show.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you run this notebook so that the cells render?
|
just a quick note that I’ve pushed the latest updates addressing the feedback. Thanks! |
paleolimbot
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for continuing to work on this!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you revert the changes to this file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure!
| sd = sedona.db.connect() | ||
| df = sd.create_data_frame(gdf) | ||
| df.show() | ||
| df.schema |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you run this notebook such that the cells render?
| ```python | ||
| df.head(5).show() | ||
| ``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you run this notebook so that the cells render?
|
|
||
| ```python |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you briefly explain these steps? Like:
First, use adbc_ingest to insert the values into a temporary table with geometry encoded as WKB. Then, use a CREATE TABLE AS query to create a table with the appropriate geometry column.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good suggestion — I’ll add a brief step-by-step explanation in the notebook describing the adbc_ingest → temporary table → CREATE TABLE AS flow and push an update shortly
| ``` | ||
|
|
||
| ### Reading data from PostGIS into SedonaDB using ADBC | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you describe the steps here? Like:
First, write a query that returns geometry as well-known binary (WKB). Then, use SedonaDB to transform that column into a geometry column.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that makes sense — I’ll add a brief explanation in the notebook clarifying that we first query PostGIS to return geometries as WKB, then use SedonaDB to convert that WKB column back into a geometry column.
| - `adbc-driver-postgresql` | ||
|
|
||
| ### Optional: Installing dependencies in a Jupyter environment | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can these two sections be merged?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, sure — I’ll merge these two sections to avoid duplication.
|
|
||
| ````bash | ||
| pip install geopandas sqlalchemy psycopg2-binary adbc-driver-postgresql | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These two bash blocks are missing the closing backticks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, thanks! I’ll fix the missing closing backticks.
|
Regarding running the notebook so the cells render: I attempted to run all cells locally, but rendering the outputs requires a working SedonaDB runtime (which isn’t available via PyPI) and a fully configured PostGIS + SedonaDB development environment. To keep the notebook reproducible for contributors, I’ve kept all cells executable and focused on correctness rather than committing environment-specific outputs. Please let me know if you’d prefer placeholder outputs or screenshots instead. |
|
|
Fixes #177
This PR adds a PostGIS integration guide for SedonaDB.
The documentation is written as a Jupyter notebook and rendered to Markdown
using the existing docs build pipeline.
The page covers:
adbc_ingest()andfetch_arrow()to avoid row-wise iteration and intermediate Pandas DataFrames
This PR only updates documentation and does not modify any source code.