
@aulemahal
Collaborator

Pull Request Checklist:

  • This PR addresses an already opened issue (for bug fixes / features)
    • This PR fixes issue #xyz
  • Tests for the changes have been added (for bug fixes / features)
  • Documentation has been added / updated (for bug fixes / features)
  • HISTORY.rst has been updated (with summary of main changes)

What kind of change does this PR introduce?:

When subsetting a curvilinear dataset (2D lat/lon) to grid points, use a scipy.spatial.KDTree to find the nearest neighbours using Euclidean distance in lat/lon space, instead of computing the great circle distance (in meters) to every grid point.
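A minimal sketch of the KDTree lookup described above, assuming the grid is given as two 2D NumPy arrays `lat2d` and `lon2d` in degrees. The function name `nearest_gridpoints` is hypothetical, and this is not necessarily how the PR implements it. Like any plain lat/lon Euclidean metric, it ignores longitude wrap-around (e.g. -179° vs 179°).

```python
import numpy as np
from scipy.spatial import KDTree


def nearest_gridpoints(lat2d, lon2d, lats, lons):
    """Return (rows, cols) of the grid cells nearest to each target point,
    using Euclidean distance in (lat, lon) degree space (hypothetical helper)."""
    # Flatten the 2D curvilinear coordinates into an (N, 2) array of points.
    grid = np.column_stack([lat2d.ravel(), lon2d.ravel()])
    tree = KDTree(grid)
    # Query all targets at once; `idx` are positions in the flattened grid.
    _, idx = tree.query(np.column_stack([lats, lons]))
    # Convert flat indices back to 2D (row, col) indices.
    return np.unravel_index(idx, lat2d.shape)


# Usage on a tiny 3x4 grid (a regular grid, stored as 2D arrays).
lon2d, lat2d = np.meshgrid(np.array([10.0, 11.0, 12.0, 13.0]), np.array([0.0, 1.0, 2.0]))
rows, cols = nearest_gridpoints(lat2d, lon2d, np.array([1.1, 0.0]), np.array([12.2, 10.0]))
```

Building the tree costs one pass over the grid; each query is then roughly logarithmic in the number of grid cells instead of linear.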

We already use a Euclidean distance in lat/lon space for the rectilinear case (1D lat/lon). The loss of precision is offset by a significant performance gain. For example, my use case needed to extract 94 points from an 800x1000 grid. Instead of computing 94 × 800 × 1000 (about 75 million) great circle distances, we now compute only 94, and only when a tolerance is passed; otherwise none are computed.
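To illustrate why only 94 great circle distances remain: once the KDTree has picked one grid cell per requested point, the tolerance check needs a single distance per pair. This is a hedged sketch using the haversine formula; the helper names, the 6,371 km mean Earth radius, and the assumption that the tolerance is expressed in meters are illustrative, not taken from the PR.

```python
import numpy as np

EARTH_RADIUS_M = 6_371_000.0  # assumed mean Earth radius, in meters


def haversine(lat1, lon1, lat2, lon2):
    """Great circle distance in meters between points given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))


def within_tolerance(target_lat, target_lon, matched_lat, matched_lon, tolerance_m):
    """Hypothetical helper: True where the matched grid cell is close enough.

    Only one distance per requested point is computed, not one per grid cell.
    """
    dist = haversine(target_lat, target_lon, matched_lat, matched_lon)
    return dist <= tolerance_m
```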

Before this change, my use case took ~120 s and now it takes 350 ms.

I also changed how we get the lat and lon coordinates: we now use the utility functions instead of relying on the names "lat" and "lon". I also modified these utilities so that a variable named "lon" is detected as a longitude (and similarly for "lat").
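A hypothetical sketch of the kind of coordinate detection described above: CF metadata (`standard_name`, `units`) is checked first, and the variable name is used as a fallback. The function `is_coord_type` and its lookup tables are assumptions for illustration, not the project's actual utils.

```python
# Unit spellings the CF conventions accept for longitude / latitude.
CF_UNITS = {
    "longitude": {"degrees_east", "degree_east", "degrees_E", "degree_E", "degreesE", "degreeE"},
    "latitude": {"degrees_north", "degree_north", "degrees_N", "degree_N", "degreesN", "degreeN"},
}
# Variable names accepted as a last resort (hypothetical list).
NAMES = {
    "longitude": {"lon", "longitude"},
    "latitude": {"lat", "latitude"},
}


def is_coord_type(name: str, attrs: dict, kind: str) -> bool:
    """Return True if a variable looks like `kind` ("longitude" or "latitude")."""
    # 1. The CF standard_name is the most reliable signal.
    if attrs.get("standard_name") == kind:
        return True
    # 2. CF units such as "degrees_east" / "degrees_north".
    if attrs.get("units") in CF_UNITS[kind]:
        return True
    # 3. Fall back on the variable name itself.
    return name.lower() in NAMES[kind]
```

With the name fallback, a bare `lon` variable without attributes is still recognised, while a rotated-pole `rlon` is not.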

Does this PR introduce a breaking change?:

Kinda, since the distance metric has changed. In most cases I don't expect different results, but there could be edge cases with points near the poles where a different neighbour is now chosen.
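A small constructed example of the pole edge case mentioned above: near 89°N, a 20° longitude offset is only about 39 km, while a 1° latitude offset is about 111 km, so the Euclidean lat/lon metric and the great circle metric pick different neighbours. The points are made up for illustration, and the 6,371 km Earth radius is an assumption.

```python
import numpy as np
from scipy.spatial import KDTree

EARTH_RADIUS_M = 6_371_000.0  # assumed mean Earth radius, in meters


def haversine(lat1, lon1, lat2, lon2):
    """Great circle distance in meters between points given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))


target = (89.0, 0.0)  # (lat, lon) close to the North Pole
candidates = np.array([
    [89.0, 20.0],  # same latitude, 20 degrees of longitude away (~39 km)
    [88.0, 0.0],   # same longitude, 1 degree of latitude away (~111 km)
])

# Euclidean distance in degree space: 20 vs 1, so candidate 1 wins.
_, euclid_idx = KDTree(candidates).query(target)

# Great circle distance: ~39 km vs ~111 km, so candidate 0 wins.
gc_dist = haversine(target[0], target[1], candidates[:, 0], candidates[:, 1])
gc_idx = int(np.argmin(gc_dist))

print(euclid_idx, gc_idx)  # prints: 1 0
```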

Other information:

@aulemahal requested a review from sol1105 on September 9, 2025 at 15:59