How to handle NAs in raster stacks #43
-
Hey mlr3spatial team, On another note, I was wondering how to handle NAs within a prediction pipeline in mlr3/mlr3spatial, as NAs can come quite naturally in raster data (e.g. rotated satellite imagery with a lot of "white space" within the bounding box). In a flattened sf object (each raster cell is a data row) I could just use something like na.omit() to get a clean dataset - but how could this look like for raster stacks/stars objects? In my specific case I also have several bands where some cell values are missing - I would rather remove them from the training pipeline than imputing them. I have a feeling that this is rather a general technical mlr3 topic but is quite specific to this data format, which is why I am asking the question here. Thanks in advance for your input! |
Beta Was this translation helpful? Give feedback.
Replies: 3 comments
-
Thanks for raising this issue! You're definitely in the correct place. I think so far we haven't investigated this scenario in detail, so yeah, currently there is no straightforward solution to this. I think we should be able to reproduce this by tweaking the stack we use in the examples and inject some NA's. In an optimal way this should be doable via a pipeop from mlr3 to stay consistent rather than adding a option just within {mlr3spatial}. What do you think @be-marc? |
Beta Was this translation helpful? Give feedback.
-
ProposalDemo raster with missing values. library(mlr3)
library(mlr3spatial)
library(sf)
library(terra)
library(R6)
library(checkmate)
demo_raster = function(dimension, missings = FALSE) {
assert_int(dimension, lower = 2)
assert_int(dimension, lower = 2)
data = matrix(c(stats::rnorm(floor(dimension^2 / 2), 0, 1),
stats::rnorm(ceiling(dimension^2 / 2), 1, 1)), nrow = dimension)
if (missings) {
data = apply(data, 2, function(x) {
x[sample(c(TRUE, FALSE), length(x), replace = TRUE, prob = c(0.1, 0.9))] = NA
x
})
}
write_raster(data)
} Demo raster stack with missing values. demo_stack_spatraster = function(size = 50, layers = 5, missings = FALSE) {
assert_int(layers, lower = 1)
dimension = floor(sqrt(size / layers * 1e+06 / 4))
raster_features = replicate(layers - 1, demo_raster(dimension, missings = missings))
data_response = matrix(c(rep(0, floor(dimension^2 / 2)), rep(1, ceiling(dimension^2 / 2))), nrow = dimension)
raster_response = write_raster(data_response)
raster = terra::rast(c(raster_features, list(raster_response)))
names(raster) = c(paste0("x_", 1:(layers - 1)), "y")
raster
} Generate a study area. The truth data is an sf object with point observations. The features are a raster stack. generate_study_area = function(size = 50L, layers = 5L, missings = TRUE) {
raster = demo_stack_spatraster(size, layers, missings)
observations = sf::st_as_sf(terra::spatSample(raster, size = 1000, as.points = TRUE))
raster$y = NULL
list(observation = na.omit(observations), data = raster)
} Let's create a study area and train a learner on the observations. # generate study area
study_area = generate_study_area(size = 10L, missings = TRUE)
# train task
backend = as_data_backend(study_area$observation)
train_task = as_task_regr(backend, target = "y")
# learner fitted on training data
learner = lrn("regr.rpart")
learner$train(train_task) Now we want to predict all raster cells but leave out cells that have missing values. Usually, the data is read in the We can do this by wrapping the trained learner inside a new special learner. In a final release, we can do the wrapping in LearnerRegrSpatialPredict = R6::R6Class("LearnerRegrSpatialPredict",
inherit = LearnerRegr,
public = list(
learner = NULL,
initialize = function(learner) {
self$learner = assert_learner(learner)
},
predict = function(task, row_ids = NULL) {
data = task$data(rows = row_ids)
ids = complete.cases(data[, task$feature_names, with = FALSE])
pred = self$learner$predict_newdata(data[ids])
response = rep(NaN, nrow(data))
response[ids] = pred$data$response
pred$data$row_ids = seq_len(nrow(data))
pred$data$response = response
pred$data$truth = rep(NaN, nrow(data))
pred
}
)
) Using Predict the raster. # predict task
backend = as_data_backend(study_area$data)
predict_task = TaskUnsupervised$new("study_area", task_type = "regr", backend = backend)
# wrap learner
spatial_learner = LearnerRegrSpatialPredict$new(learner)
# predict raster
response = predict_spatial(predict_task, spatial_learner, 5L)
plot(response) |
Beta Was this translation helpful? Give feedback.
-
Raster cells with NAs in one or multiple layers are now NA in the predicted raster. Done by #50. |
Beta Was this translation helpful? Give feedback.
Raster cells with NAs in one or multiple layers are now NA in the predicted raster. Done by #50.