How to handle NAs in raster stacks #43

damondamondamon · 2022-03-09T11:22:24Z

damondamondamon
Mar 9, 2022

Hey mlr3spatial team,
thanks for handling the last bug I reported.

On another note, I was wondering how to handle NAs within a prediction pipeline in mlr3/mlr3spatial, as NAs can come quite naturally in raster data (e.g. rotated satellite imagery with a lot of "white space" within the bounding box).

In a flattened sf object (each raster cell is a data row) I could just use something like na.omit() to get a clean dataset - but how could this look like for raster stacks/stars objects?

In my specific case I also have several bands where some cell values are missing - I would rather remove them from the training pipeline than imputing them. I have a feeling that this is rather a general technical mlr3 topic but is quite specific to this data format, which is why I am asking the question here.

Thanks in advance for your input!

Answered by be-marc

May 26, 2022

Raster cells with NAs in one or multiple layers are now NA in the predicted raster. Done by #50.

View full answer

pat-s · 2022-03-09T13:44:43Z

pat-s
Mar 9, 2022
Maintainer

Thanks for raising this issue! You're definitely in the correct place.

I think so far we haven't investigated this scenario in detail, so yeah, currently there is no straightforward solution to this.
But we should definitely provide a solution or guidance how to deal with this scnario.

I think we should be able to reproduce this by tweaking the stack we use in the examples and inject some NA's.

In an optimal way this should be doable via a pipeop from mlr3 to stay consistent rather than adding a option just within {mlr3spatial}. What do you think @be-marc?

0 replies

be-marc · 2022-03-09T18:10:07Z

be-marc
Mar 9, 2022
Maintainer Sponsor

Proposal

Demo raster with missing values.

library(mlr3)
library(mlr3spatial)
library(sf)
library(terra)
library(R6)
library(checkmate)

demo_raster = function(dimension, missings = FALSE) {
  assert_int(dimension, lower = 2)
  assert_int(dimension, lower = 2)
  data = matrix(c(stats::rnorm(floor(dimension^2 / 2), 0, 1),
    stats::rnorm(ceiling(dimension^2 / 2), 1, 1)), nrow = dimension)
  if (missings) {
    data = apply(data, 2, function(x) {
      x[sample(c(TRUE, FALSE), length(x), replace = TRUE, prob = c(0.1, 0.9))] = NA
      x
    })
  }
  write_raster(data)
}

Demo raster stack with missing values.

demo_stack_spatraster = function(size = 50, layers = 5, missings = FALSE) {
  assert_int(layers, lower = 1)
  dimension = floor(sqrt(size / layers * 1e+06 / 4))
  raster_features = replicate(layers - 1, demo_raster(dimension, missings = missings))
  data_response = matrix(c(rep(0, floor(dimension^2 / 2)), rep(1, ceiling(dimension^2 / 2))), nrow = dimension)
  raster_response = write_raster(data_response)
  raster = terra::rast(c(raster_features, list(raster_response)))
  names(raster) = c(paste0("x_", 1:(layers - 1)), "y")
  raster
}

Generate a study area. The truth data is an sf object with point observations. The features are a raster stack.

generate_study_area = function(size = 50L, layers = 5L, missings = TRUE) {
  raster = demo_stack_spatraster(size, layers, missings)
  observations = sf::st_as_sf(terra::spatSample(raster, size = 1000, as.points = TRUE))
  raster$y = NULL
  list(observation = na.omit(observations), data = raster)
}

Let's create a study area and train a learner on the observations.

# generate study area
study_area = generate_study_area(size = 10L, missings = TRUE)

# train task
backend = as_data_backend(study_area$observation)
train_task = as_task_regr(backend, target = "y")

# learner fitted on training data
learner = lrn("regr.rpart")
learner$train(train_task)

Now we want to predict all raster cells but leave out cells that have missing values. Usually, the data is read in the Learner$.predict() method. At the moment this fails with learners that don't support missing values. So we need to filter the cells with missing values before they are passed to the trained learner. Just filtering them out is not enough because we need to insert NaNs in the raster at cells with missing values.

We can do this by wrapping the trained learner inside a new special learner. In a final release, we can do the wrapping in predict_spatial().

LearnerRegrSpatialPredict = R6::R6Class("LearnerRegrSpatialPredict",
  inherit = LearnerRegr,
  public = list(
    learner = NULL,

    initialize = function(learner) {
      self$learner = assert_learner(learner)
    },

    predict = function(task, row_ids = NULL) {
      data = task$data(rows = row_ids)
      ids = complete.cases(data[, task$feature_names, with = FALSE])
      pred = self$learner$predict_newdata(data[ids])
      response = rep(NaN, nrow(data))
      response[ids] = pred$data$response
      pred$data$row_ids = seq_len(nrow(data))
      pred$data$response = response
      pred$data$truth = rep(NaN, nrow(data))
      pred
    }
  )
)

Using $predict_newdata avoids to read the data more than once from disk. However, $predict_newdata copies the data frame once. This will increase the ram usage but hopefully not affect the runtime too much.

Predict the raster.

# predict task
backend = as_data_backend(study_area$data)
predict_task = TaskUnsupervised$new("study_area", task_type = "regr", backend = backend)

# wrap learner
spatial_learner = LearnerRegrSpatialPredict$new(learner)

# predict raster
response = predict_spatial(predict_task, spatial_learner, 5L)
plot(response)

0 replies

be-marc · 2022-05-26T08:59:38Z

be-marc
May 26, 2022
Maintainer Sponsor

Raster cells with NAs in one or multiple layers are now NA in the predicted raster. Done by #50.

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

How to handle NAs in raster stacks #43

Uh oh!

{{title}}

Uh oh!

Replies: 3 comments

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{editor}}'s edit

{{editor}}'s edit

Uh oh!

Uh oh!

{{title}}

Uh oh!

Select a reply

Uh oh!

Uh oh!

How to handle NAs in raster stacks #43

Uh oh!

damondamondamon Mar 9, 2022

Replies: 3 comments

Uh oh!

pat-s Mar 9, 2022 Maintainer

Uh oh!

Uh oh!

be-marc Mar 9, 2022 Maintainer Sponsor

Proposal

Uh oh!

be-marc May 26, 2022 Maintainer Sponsor

damondamondamon
Mar 9, 2022

pat-s
Mar 9, 2022
Maintainer

be-marc
Mar 9, 2022
Maintainer Sponsor

be-marc
May 26, 2022
Maintainer Sponsor