Conversation

@ljwolf (Member) commented Jul 4, 2025

This adds supervised spatial indices to the package. A supervised spatial index structures a spatial training regime for local models: you fit a local model on each page of a spatial index, and split a page whenever the model can be improved by doing so. This adds quite a few classes (usage and split-rule sketches follow the list):

  1. QuadtreeRegressor: Uses a Quadtree to structure a spatial feature engineering search. This trains a quadtree on the data, splitting pages when the split improves the regression loss. After the tree is built, we train a single final model on the set of feature:spatial_index_page interaction terms. Pruning "rolls up" feature:spatial_index_page interaction terms along the tree structure. This works on both raster data (X is (n_row, n_col, n_features)) and point data (X is (n_sites, n_features)).
  2. QuadtreeClassifier: Same as above but for discrete outcomes.
  3. QuadtreeBoostingRegressor: Same as QuadtreeRegressor, but predictions accumulate down the tree. Thus, at each split, the parent prediction plus the child prediction is compared against the parent prediction alone, rather than comparing a standalone child model to the parent model within the child's page.
  4. QuadtreeEnsembleRegressor: Same as QuadtreeRegressor, but instead of training a single global model on the discovered feature:spatial_index_page interaction terms, we use an ensemble of local models, one at each leaf of the spatial index. For any additive model, this will give the same result as QuadtreeRegressor.
  5. QuadtreeEnsembleClassifier: Same as QuadtreeEnsembleRegressor, but for discrete outcomes.
  6. KDTreeRegressor: Like QuadtreeRegressor, but using a KDTree instead of a Quadtree. This means that each parent has two children (rather than four), and by default splits are made at the absolute residual-weighted median of the longest page side (see the second sketch below).
  7. KDTreeClassifier: Like KDTreeRegressor, but for discrete outcomes.
  8. KDTreeBoostingRegressor: Like QuadtreeBoostingRegressor, but with KDTrees.
  9. KDTreeEnsembleRegressor: Like QuadtreeEnsembleRegressor, but with KDTrees.
  10. KDTreeEnsembleClassifier: Like QuadtreeEnsembleClassifier, but with KDTrees.
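
For orientation, here is a minimal usage sketch on point data. This assumes the estimators follow the scikit-learn fit/predict convention; the exact signature (in particular, how site coordinates are passed) is an assumption, not something the diff pins down:

```python
import numpy as np

from gwlearn.quadtree import QuadtreeRegressor  # module added in this PR

rng = np.random.default_rng(0)
coords = rng.uniform(0, 1, size=(500, 2))  # site locations
X = rng.normal(size=(500, 3))              # point data: (n_sites, n_features)
# an outcome whose coefficients drift over space, so local models should help
y = (X @ np.array([1.0, -2.0, 0.5])) * (1 + coords[:, 0]) + rng.normal(scale=0.1, size=500)

# pages are split while splitting improves the regression loss; the
# `coords` keyword is a guessed spelling for passing locations
model = QuadtreeRegressor().fit(X, y, coords=coords)
y_hat = model.predict(X, coords=coords)
```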
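
Since the KDTree split rule in item 6 is the least standard piece, here is a self-contained sketch of what "split at the absolute residual-weighted median of the longest page side" means; the helper names are hypothetical, not the PR's API:

```python
import numpy as np

def weighted_median(values, weights):
    # smallest value at which the cumulative weight reaches half the total
    order = np.argsort(values)
    cumulative = np.cumsum(weights[order])
    cut = np.searchsorted(cumulative, cumulative[-1] / 2)
    return values[order][cut]

def kdtree_split(coords, residuals, bounds):
    # split the page along its longest side, at the coordinate median
    # weighted by absolute residuals (the default rule described above)
    (xmin, ymin), (xmax, ymax) = bounds
    axis = 0 if (xmax - xmin) >= (ymax - ymin) else 1  # longest side
    return axis, weighted_median(coords[:, axis], np.abs(residuals))
```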

Todo:

  • tests
  • Hilbert RTree for areal/lattice data
  • set split_test='eps' by default (sketched after this list).
  • Allow splines. Each page would instantiate a knot (at the median for a KDTree, at the center for a Quadtree). Right now, we just fit piecewise-linear models over the domain. But one could also apply a basis function b(X, Y), which would ensure that predictions are smooth from page to page.
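
For concreteness, the split_test='eps' rule above amounts to something like the following; the option name comes from the todo item, while the surrounding function is hypothetical:

```python
def should_split(parent_loss: float, child_loss: float, eps: float = 1e-3) -> bool:
    # 'eps' split test: accept a split only when it improves the loss
    # by more than a fixed tolerance, not on any improvement at all
    return (parent_loss - child_loss) > eps
```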

@ljwolf ljwolf marked this pull request as draft July 4, 2025 09:02

codecov bot commented Jul 4, 2025

Codecov Report

Attention: Patch coverage is 0% with 396 lines in your changes missing coverage. Please review.

Please upload report for BASE (main@5858f2f). Learn more about missing BASE report.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| gwlearn/quadtree.py | 0.00% | 396 Missing ⚠️ |
Additional details and impacted files
```diff
@@           Coverage Diff           @@
##             main      #20   +/-   ##
=======================================
  Coverage        ?   59.72%
=======================================
  Files           ?        6
  Lines           ?     1100
  Branches        ?        0
=======================================
  Hits            ?      657
  Misses          ?      443
  Partials        ?        0
```

☔ View full report in Codecov by Sentry.


@ljwolf (Member, Author) commented Jul 4, 2025

After researching it, I am going to say that adding splines here is out of scope for this PR. I thought it would be as easy as applying the B-spline recursion to the dummy variable matrix, but I do not immediately see how to do this.

Naively, we'd do the following (sketched in code after the list).

  1. Start with a global model.
  2. Consider a candidate split in branch j. This split introduces extra knots (a KDTree adds 2, a Quadtree adds 4). Calculate a new scipy.interpolate.bisplrep(*data_coords, z=y, tx=new_knots, ty=new_knots).
  3. Test whether the new spline improves the score by at least eps. If so, keep the new knots and add the child splits to the queue. If not, consider branch j fathomed.
  4. After growing, use the same "roll-up" pruning procedure: check the feature importance of the sets of spline terms by knot. If a set is not important (in sum), zero that set of coefficients for that feature.
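
A sketch of steps 2-3, assuming FITPACK's fixed-knot least-squares path (task=-1); kx=ky=1 matches the piecewise-linear setting, and the knot-vector conventions (e.g., boundary-knot multiplicity) are assumptions that would need checking:

```python
import numpy as np
from scipy.interpolate import bisplrep

def fit_with_knots(x, y, z, tx, ty):
    # least-squares bivariate spline on a fixed knot vector; full_output=1
    # also returns fp, the weighted sum of squared residuals
    tck, fp, ier, msg = bisplrep(x, y, z, kx=1, ky=1, task=-1,
                                 tx=tx, ty=ty, full_output=1)
    if ier > 0:
        raise RuntimeError(msg)
    return tck, fp

def consider_split(x, y, z, tx, ty, new_tx, new_ty, eps=1e-3):
    # steps 2-3: refit with the candidate knots, keep them only if the
    # loss improves by at least eps; otherwise the branch is fathomed
    _, fp_parent = fit_with_knots(x, y, z, tx, ty)
    cand_tx = np.sort(np.concatenate([tx, new_tx]))
    cand_ty = np.sort(np.concatenate([ty, new_ty]))
    tck, fp_child = fit_with_knots(x, y, z, cand_tx, cand_ty)
    if fp_parent - fp_child > eps:
        return tck, cand_tx, cand_ty  # accepted: enqueue further splits
    return None  # fathomed
```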

I think introducing splines would make this a SpatialMARS. If this were desired, I'd need to dig into the bivariate spline literature to see whether this style of dynamic knot generation has already been figured out; it seems highly likely someone has done this.
