Supervised spatial indices #20
Draft
This adds supervised spatial indices to the package. Supervised spatial indices are a method of structuring a spatial training regime for local models. Basically, you fit a local model for each page in a spatial index, and split the page when the model can be improved by doing so. This adds quite a few classes:
- `QuadtreeRegressor`: Use a Quadtree to structure a spatial feature engineering search. This trains a quadtree on the data, where pages are split if doing so improves the regression loss. After the tree is built, we train a single final model on the set of `feature:spatial_index_page` interaction terms. Pruning "rolls up" `feature:spatial_index_page` interaction terms along the tree structure. This works on both raster (`X` is `n_row, n_col, n_features`) and point (`X` is `n_sites, n_features`) data.
- `QuadtreeClassifier`: Same as above, but for discrete outcomes.
- `QuadtreeBoostingRegressor`: Same as `QuadtreeRegressor`, but predictions accumulate down the tree. Thus, for each split, the parent prediction plus the child prediction is compared to the parent prediction alone, rather than comparing a separate child model against the parent model within the child page.
- `QuadtreeEnsembleRegressor`: Same as `QuadtreeRegressor`, but instead of training a single global model on the discovered `feature:spatial_index_page` interaction terms, we use an ensemble of local models, one at each leaf in the spatial index. For any additive model, this will be the same as `QuadtreeRegressor`.
- `QuadtreeEnsembleClassifier`: Same as `QuadtreeEnsembleRegressor`, but for discrete outcomes.
- `KDTreeRegressor`: Like `QuadtreeRegressor`, but using a KDTree instead of a Quadtree. This means that each parent has two children (rather than four), and splits are made at the absolute-residual-weighted median of the longest page side by default.
- `KDTreeClassifier`: Like `KDTreeRegressor`, but for discrete outcomes.
- `KDTreeBoostingRegressor`: Like `QuadtreeBoostingRegressor`, but with KDTrees.
- `KDTreeEnsembleRegressor`: Like `QuadtreeEnsembleRegressor`, but with KDTrees.
- `KDTreeEnsembleClassifier`: Like `QuadtreeEnsembleClassifier`, but with KDTrees.

Todo:
- `split_test='eps'` by default.
- Allow splines. Each page instantiates a knot (KDTree at the median, Quadtree at the center). Right now, we just fit piecewise-linear models over the domain. But one could also apply a basis function in `b(X,Y)`, which would ensure that predictions are smooth from page to page.
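
For reviewers who want a concrete picture of the split rule described at the top of this PR (fit a local model on a page, split when the children improve on it), here is a minimal NumPy sketch of a single KD-tree-style split decision. It is not the code in this PR. The function names (`fit_sse`, `weighted_median`, `try_kd_split`), the `min_samples` guard, and the relative-improvement reading of `eps` are all assumptions made for illustration, and may not match what `split_test='eps'` actually does in the package.

```python
import numpy as np

def fit_sse(X, y):
    """Ordinary least squares with an intercept; returns (coefficients, residuals)."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta, y - A @ beta

def weighted_median(values, weights):
    """Smallest value at which the cumulative weight reaches half the total."""
    order = np.argsort(values)
    v, w = values[order], weights[order]
    cum = np.cumsum(w)
    return v[np.searchsorted(cum, 0.5 * cum[-1])]

def try_kd_split(coords, X, y, eps=1e-3, min_samples=5):
    """Return (axis, cut) if splitting the page is worthwhile, else None.

    Hypothetical rule: accept the split when the two child models reduce
    the parent's sum of squared errors by more than a fraction `eps`.
    """
    _, resid = fit_sse(X, y)
    parent_sse = float(resid @ resid)

    # Split along the longest side of the page's bounding box.
    extent = coords.max(axis=0) - coords.min(axis=0)
    axis = int(np.argmax(extent))

    # Cut at the median of that coordinate, weighted by |residual|, so the
    # cut is pulled toward where the parent model fits worst.
    cut = weighted_median(coords[:, axis], np.abs(resid))

    left = coords[:, axis] <= cut
    right = ~left
    if left.sum() < min_samples or right.sum() < min_samples:
        return None

    # Separate child fits can never do worse than the parent in-sample,
    # so a threshold is needed to avoid splitting forever.
    child_sse = 0.0
    for mask in (left, right):
        _, r = fit_sse(X[mask], y[mask])
        child_sse += float(r @ r)

    if parent_sse - child_sse > eps * parent_sse:
        return axis, cut
    return None
```

Calling `try_kd_split(coords, X, y)` with `coords` of shape `(n_sites, 2)` and `X` of shape `(n_sites, n_features)` gives one split decision for one page; a full tree would apply it recursively to each accepted child.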