-
Notifications
You must be signed in to change notification settings - Fork 742
[GH-2504] Geopandas: Implement force_3d #2512
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
petern48
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Just some minor feedback about improving the docs and the tests. Should be good to merge afterwards.
| >>> from shapely import Polygon, LineString, Point | ||
| >>> s = geopandas.GeoSeries( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| >>> from shapely import Polygon, LineString, Point | |
| >>> s = geopandas.GeoSeries( | |
| >>> from shapely import Polygon, LineString, Point | |
| >>> from sedona.spark.geopandas import GeoSeries | |
| >>> s = GeoSeries( |
nit
| 2D geometries will get the provided Z coordinate; 3D geometries | ||
| are unchanged (unless their Z coordinate is ``np.nan``). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a test for this in test_geoseries.py? I see that this is from the original geopandas docstring, so I'm not entirely sure if we follow the same behavior naturally in Sedona. If we don't let me know, and we can figure out what to do next.
We could test something like whether Point(1, 1, np.nan) becomes Point(1, 1, 3) when calling .force_3d(3)
| Note that for empty geometries, 3D is only supported since GEOS 3.9 and then | ||
| still only for simple geometries (non-collections). | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| Note that for empty geometries, 3D is only supported since GEOS 3.9 and then | |
| still only for simple geometries (non-collections). |
I think we can delete this part. The GEOS 3.9 part doesn't apply to use because we're built not on GEOS (which is a C library). And I think we do support collections since the test_match_geopandas_series test passed naturally (self.geoms in that test contains collections). Could you add a single test in test_geoseries.py that tests a GEOMETRYCOLLECTION with a non-zero z-arg?
e.g. GeometryCollection(Point(1, 1), LineString([(1, 1), (0, 1), (1, 0)])) and force_3d(3)
| sgpd_result = GeoSeries(geom).force_3d() | ||
| gpd_result = gpd.GeoSeries(geom).force_3d() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| sgpd_result = GeoSeries(geom).force_3d() | |
| gpd_result = gpd.GeoSeries(geom).force_3d() | |
| sgpd_result = GeoSeries(geom).force_3d(4) | |
| gpd_result = gpd.GeoSeries(geom).force_3d(4) |
This is a minor change, and I don't think Sedona would fail it, but I think this would be a slightly better test if we test a non-zero z argument. These "match" tests in test_match_geopandas_series.py are the most comprehensive/powerful, so I'd prefer to avoid a common default case like 0. e.g. the chances that we have a bug that always set z=0 in some case is much higher than the chance that it always set z=4.
@petern48 Thanks Peter, Great insights as always. Just addressed them. |
| LineString([(1, 1, 2), (0, 1, 2), (1, 0, 2)]), | ||
| Polygon([(0, 0, 3), (0, 10, 3), (10, 10, 3), (0, 0, 3)]), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| LineString([(1, 1, 2), (0, 1, 2), (1, 0, 2)]), | |
| Polygon([(0, 0, 3), (0, 10, 3), (10, 10, 3), (0, 0, 3)]), | |
| LineString([(1, 1, 3), (0, 1, 3), (1, 0, 3)]), | |
| Polygon([(0, 0, 4), (0, 10, 4), (10, 10, 4), (0, 0, 4)]), |
I noticed our tests here were slightly buggy. Here's what the expected results should look like. These geoms correspond to the 3 and 4 in z = [0, 2, 2, 3, 4, 5]. What's concerning is that the tests pass anyway. Experimenting locally, I noticed we can change the z-coordinate however we like, and the tests still pass. I think what's happening is that the check_sgpd_equals_gpd() doesn't yet check anything past the x and y dimensions...
(sigh) Unfortunately, what this means is that we'll need to modify the test code to test this properly before we can merge this PR. I filed an issue for this: #2513. You're welcome to try taking a stab at it or just move on to something else (while keeping this PR open). My time is limited, so I wouldn't be able to investigate it for at least a few days. I don't love having to put a hold on merging your great work here, but this is just how it is sometimes. Sorry about this :(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, Thanks for the insights Peter. No problem, I'll move on to a new one.
9fafeb5 to
f90875a
Compare
|
@petern48 I made some adjustments to the tests in test_match_geopandas_series.py because the previous assertions were failing for M and ZM geometry cases. Please review the changes and let me know if there’s a better or cleaner approach. Feedback is welcome. |
|
Responding to the deletions in this commit: f90875a. Here are the failures from the test_match_geopandas tests (I ran locally). Next time, ideally report back when you see behavior like this, instead of relying on me to investigate myself (sometimes, it may take longer that way). Sometimes, it's reasonable to skip; sometimes it's major enough to just put a halt to it. Alright let's see. Error for LinearRing: Sedona is returning the equivalent of a LineString. Sedona doesn't support LinearRings since LineStrings are enough. Yep this is fine to skip ✅. Error for MultiPolygon: Ah interesting, it's the same geometry, but we return polygon instead of multipolygon. This is a minor bug in Sedona we found. Ideally, we want to return MultiPolygon. I dug in real quick and submitted a fix for this: #2526 Trying it on PostGIS, I see returning MultiPolygon is the desired behavior as well Error for GeometryCollection Fails for the same reason as multipolygon (since it has a multipolygon inside of it). Error for M coordinates. Hmm, yeah this one is interesting. Give me a bit more time to figure out what to do here. |
@petern48 Sure, I'll report the anomalies from next time, Thanks for a very clear explanation. Take your time. Thanks |
petern48
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay @chay0112. Thanks for being patient. I took another look and saw that Sedona follows PostGIS behavior (which is what we want), and Geopandas behavior apparently is just different. I think we can keep this difference in behavior, but specify this difference in the docstring. Left some suggestions, but I think this should be ready once addressed.
| pytest.skip("geopandas force_3d requires version 1.0.0 or higher") | ||
| # 1) Promote 2D to 3D with z = 4 | ||
| for geom in self.geoms: | ||
| if isinstance(geom[0], (LinearRing, GeometryCollection, MultiPolygon)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| if isinstance(geom[0], (LinearRing, GeometryCollection, MultiPolygon)): | |
| if isinstance(geom[0], (LinearRing)): |
This should theoretically pass now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Peter, Thanks for your suggestions.
I believe if we remove Multipolygon and GeometryCollections the test would still fail for geoms, something like below
`FAILED
============================================================================== FAILURES ==============================================================================
_______________________________________________________________ TestMatchGeopandasSeries.test_force_3d _______________________________________________________________
python/tests/geopandas/test_match_geopandas_series.py:890:
python/tests/geopandas/test_geopandas_base.py:70: in check_sgpd_equals_gpd
cls.assert_geometry_almost_equal(a, e, tolerance)
cls = <class 'tests.geopandas.test_match_geopandas_series.TestMatchGeopandasSeries'>
left_geom = <POLYGON Z ((0 0 4, 0 1 4, 1 0 4, 0 0 4), (0.1 0.1 4, 0.1 0.2 4, 0.2 0.1 4, ...>
right_geom = <MULTIPOLYGON Z (((0 0 4, 0 1 4, 1 0 4, 0 0 4), (0.1 0.1 4, 0.1 0.2 4, 0.2 0...>, tolerance = 0.01
ValueError: Geometry equality check failed for POLYGON Z ((0 0 4, 0 1 4, 1 0 4, 0 0 4), (0.1 0.1 4, 0.1 0.2 4, 0.2 0.1 4, 0.1 0.1 4)) and MULTIPOLYGON Z (((0 0 4, 0 1 4, 1 0 4, 0 0 4), (0.1 0.1 4, 0.1 0.2 4, 0.2 0.1 4, 0.1 0.1 4)))
`
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I realize you're not yet familiar with iterating with Java changes. You need to update your code to reflect the changes I made in this PR. I'm guessing you likely followed this guide for dev env setup, since it was mentioned in the original geopandas EPIC issue.
- First, make sure you pull in the latest main and merge it into this branch.
- Now, even after you get the changes, you need to rebuild the Sedona JAR and move it to your SPARK_HOME directory. These are the two commands in section 3) of those compile docs.
Basically, you just need to redo the 2 commands in section 3, and then the changes should be reflected when you run pytest.
mvn clean install -DskipTests -Dgeotools
cp spark-shaded/target/sedona-spark-shaded-*.jar $SPARK_HOME/jars/
Note, if you run into trouble with the first command succeeding, you may need to specify versions in the mvn build command. e.g. Here's what I run each time: mvn clean install -DskipTests -Dgeotools -Dspark=3.4 -Pscala2.12. The same command likely would still work for you if you set up the same versions as in those docs. @chay0112 Give it a try. Knowing how to do this is also very helpful for getting into fixing small Java bugs, like I had done earlier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, Thanks for letting me know.
| """Force the dimensionality of a geometry to 3D. | ||
| 2D geometries will get the provided Z coordinate; 3D geometries | ||
| are unchanged (unless their Z coordinate is ``np.nan``). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| are unchanged (unless their Z coordinate is ``np.nan``). | |
| are unchanged (unless their Z coordinate is ``np.nan``). | |
| Note: Sedona's behavior may differ from Geopandas' for M and ZM geometries. | |
| For M geometries, Sedona will replace the M coordinate and add the Z coordinate. | |
| For ZM geometries, Sedona will drop the M coordinate and retain the Z coordinate. |
| ] | ||
| ) | ||
| result = s.force_3d(z) | ||
| self.check_sgpd_equals_gpd(result, expected) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add these cases below?
ST_Force3D(st_geomfromtext('POINT M (1 2 3)'), 7.5)
-- Result: POINT Z(1 2 7.5)
ST_Force3D(st_geomfromtext('POINT ZM (1 2 3 4)'), 7.5)
-- Result: POINT Z(1 2 3)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For those curious about the difference in behavior, Geopandas returns the following instead. It's strange and inconsistent imo... I would guess it's unintentional and just a result of adding M support later on.
gpd.GeoSeries([shapely.wkt.loads("POINT M (1 2 3)")]).force_3d(7.5)
0 POINT M (1 2 3)
gpd.GeoSeries([shapely.wkt.loads("POINT ZM (1 2 3 4)")]).force_3d(7.5)
0 POINT Z (1 2 7.5)|
Also, could you mention "geopandas" in the PR title (somehow). The PR title eventually becomes the commit message when merged, so it's helpful to specify these details. Could be something like one of the following (I typically do the first one).
|
f90875a to
52f268b
Compare
|
@petern48 Just addressed all the comments. It looks like the docs build is failing, and I noticed the same issue in other PRs as well. I'm not sure of the root cause, but if that's not a blocker, this PR is ready for review. No rush though ! |
|
Yeah, the doc build failure should be unrelated. Recently was fixed here: #2551 |
petern48
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for sticking it through!
Did you read the Contributor Guide?
Is this PR related to a ticket?
[GH-XXX] my subject. Closes Geopandas: Implement force_3d #2504What changes were proposed in this PR?
How was this patch tested?
Did this PR include necessary documentation updates?