-
Description

The GeoParquet 1.0 schema has been released recently. Since this is a Parquet-based schema, there shouldn't be much work to formally support it.

Current situation

If I take the example file, I can load it with geopandas:

    import geopandas
    df = geopandas.read_parquet("example.parquet")
    df

           pop_est      continent  ...  gdp_md_est  geometry
    0     889953.0        Oceania  ...        5496  MULTIPOLYGON (((180.00000 -16.06713, 180.00000...
    1   58005463.0         Africa  ...       63177  POLYGON ((33.90371 -0.95000, 34.07262 -1.05982...
    2     603253.0         Africa  ...         907  POLYGON ((-8.66559 27.65643, -8.66512 27.58948...
    3   37589262.0  North America  ...     1736425  MULTIPOLYGON (((-122.84000 49.00000, -122.9742...
    4  328239523.0  North America  ...    21433226  MULTIPOLYGON (((-122.84000 49.00000, -120.0000...

But attempting to write the table with delta-rs fails via pyarrow, since pyarrow does not understand the binary geometry column:

    from deltalake import DeltaTable, write_deltalake
    write_deltalake("./data/geoparquet_delta", df)
    ...
    pyarrow.lib.ArrowTypeError: ('Did not pass numpy.dtype object', 'Conversion failed for column geometry with type geometry')

An attempt at next steps

Using the geoarrow spec, which is implemented in both Python and Rust already, we can overcome the pyarrow error above.

    import geopandas
    import geoarrow.pyarrow as ga
    import pyarrow.parquet as pq
    from deltalake import DeltaTable, write_deltalake

    df = geopandas.read_parquet("example.parquet")
    # Writing the GeoDataFrame directly fails while converting `geometry`:
    # write_deltalake("./data/geoparquet_delta", df)

    # Reading with pyarrow instead keeps `geometry` as a binary blob
    arrow_df = pq.read_table("example.parquet")
    # The blob is written to the Delta table as-is
    write_deltalake("./data/geoparquet_delta", arrow_df)

    # Read back from Delta and rebuild the GeoDataFrame
    pandas_df = DeltaTable("./data/geoparquet_delta").to_pandas()
    df = geopandas.GeoDataFrame(pandas_df, geometry=ga.to_geopandas(pandas_df.geometry))

Ultimately a solution may just be a case of adding a

Use Case

Storing geospatial datasets in a delta lake. For those interested in a deeper reasoning surrounding GeoParquet: this blog post (now a little old but still relevant) is a good motivational basis.

Related Issue(s)

Was not able to find any specific ones.
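
As a variation on the workaround under "An attempt at next steps", the geometry can also be converted to WKB bytes explicitly before the write and decoded again after the read, using only geopandas. This is a minimal sketch, not an officially supported path: the helper names `encode_geometry`, `decode_geometry` and `roundtrip_via_delta` are made up for illustration, and it assumes the CRS has to be carried separately because a plain binary column does not record it.

```python
import geopandas
import pandas as pd


def encode_geometry(gdf: geopandas.GeoDataFrame) -> pd.DataFrame:
    """Return a plain pandas DataFrame whose geometry columns hold WKB bytes."""
    # GeoDataFrame.to_wkb() replaces each geometry column with its WKB encoding
    return gdf.to_wkb()


def decode_geometry(pdf: pd.DataFrame, crs=None, column: str = "geometry") -> geopandas.GeoDataFrame:
    """Rebuild a GeoDataFrame from a DataFrame with a WKB-encoded column."""
    # The CRS is not stored in the binary column, so the caller passes it in
    geometry = geopandas.GeoSeries.from_wkb(pdf[column], crs=crs)
    return geopandas.GeoDataFrame(pdf.drop(columns=[column]), geometry=geometry)


def roundtrip_via_delta(gdf: geopandas.GeoDataFrame, table_uri: str) -> geopandas.GeoDataFrame:
    """Hypothetical helper: write to a Delta table as WKB and read it back."""
    # Imported lazily so the two helpers above work without deltalake installed
    from deltalake import DeltaTable, write_deltalake

    write_deltalake(table_uri, encode_geometry(gdf))
    return decode_geometry(DeltaTable(table_uri).to_pandas(), crs=gdf.crs)
```

With the example file this would be called as `roundtrip_via_delta(geopandas.read_parquet("example.parquet"), "./data/geoparquet_delta")`. The Delta table then only sees an ordinary binary column, which is also why the geospatial meaning of that column is invisible to other Delta readers.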
Replies: 3 comments 1 reply
-
What's the datatype of the Something like this should work:
-
Delta Lake isn't just Parquet files; it has its own spec and schema definition. So the most official way to go about this would be to add geometry columns as a type in the schema spec. I've had previous discussions with the core Delta Lake maintainers and they seem open to more specialized data types; it's just that no one has proposed them yet.
I have one question on the GeoParquet spec. Initially I thought we could have geo-spatial as a column type. But in the docs it says:
I think if we were to keep with that, we wouldn't make it a data type. Though at the same time, maybe it's fine if we ignore that part?
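
To make the "column type vs. metadata" question a bit more concrete: as far as I understand GeoParquet 1.0, the geometry column is stored as an ordinary binary column, and its meaning is described by a JSON document under the `geo` key of the Parquet file metadata. The sketch below pulls the geometry column names and encodings out of such a metadata mapping. The function name `geometry_columns` is made up for illustration, and the input is assumed to be a bytes-to-bytes mapping like the one pyarrow exposes as `schema.metadata`.

```python
import json


def geometry_columns(schema_metadata):
    """Return {column name: encoding} from GeoParquet file-level metadata.

    `schema_metadata` is assumed to be a dict of bytes -> bytes (or None),
    such as `pyarrow.parquet.read_schema(path).metadata`.
    """
    raw = (schema_metadata or {}).get(b"geo")
    if raw is None:
        # Not a GeoParquet file, or the metadata was dropped along the way
        return {}
    geo = json.loads(raw)
    return {
        name: column_meta.get("encoding")
        for name, column_meta in geo.get("columns", {}).items()
    }
```

For the example file, one would pass `pq.read_schema("example.parquet").metadata`. Whatever Delta ends up doing, this is the information that would need a home somewhere, either as a dedicated type in the Delta schema or as metadata attached to a binary column.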
-
Which can also be found here. The formal spec is here.
Thanks for the clarification. I'll read up on this and attempt an answer to your question, although I'm hoping we can pull in some of the experts on the GeoParquet schema to tell us those details with confidence. I'll reach out to them.