Skip to content

Reworked critical questions and added summary #13

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: gh-pages
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
61 changes: 49 additions & 12 deletions index.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -24,7 +24,7 @@ library(knitr)
opts_chunk$set(fig.width = 12, fig.align = 'center', fig.height = 8, dpi = 72)
```

# Week 2, Tutorial 6: Advanced Raster Analysis
# Advanced Raster Analysis

## Learning objectives
* Carry out a supervised classification on a SpatRaster
Expand All @@ -35,14 +35,14 @@ opts_chunk$set(fig.width = 12, fig.align = 'center', fig.height = 8, dpi = 72)

## Introduction to Sentinel-2 data used here

We will carry out a supervised classification using Sentinel 2 data for the Gewata region in Ethiopia. To do this, we use atmospherically corrected Level 2A data acquired on December 27 2020. These data were downloaded from ESA's online data hub (https://scihub.copernicus.eu/dhus), a part of the Copernicus European Programme. As it is freely available, Sentinel data has been commonly used next to Landsat data for environmental monitoring.
We will carry out a supervised classification using Sentinel 2 data for the Gewata region in Ethiopia. To do this, we use atmospherically corrected Level 2A data acquired on December 27, 2020. These data were downloaded from [ESA's online data hub](https://scihub.copernicus.eu/dhus), a part of the Copernicus European Programme. As it is freely available, Sentinel data has been commonly used next to Landsat data for environmental monitoring.
![Sentinel bands in comparison to Lansat bands ](figs/Landsat.v.Sentinel-2.jpg)

## Data exploration

Download the data to your computer and open your preferred R IDE to the directory of this tutorial.

After downloading the data we begin with visualization. The data consists of all the Sentinel 2 bands at a spatial resolution of 20 m. We will also make use of training polygons for the land cover classification, which will be introduced later.
After downloading the data we begin with visualization. The data consists of all the Sentinel 2 bands at a spatial resolution of 20 m, meaning that each pixel on the scene corresponds to a ground distance of 20 m by 20 m. We will also make use of training polygons for the land cover classification, which will be introduced later.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a little more description since students may not be familiar with the term "spatial resolution".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, and in fact the preferred term is actually "pixel size"! The term "resolution" means the smallest recognisable object. Sentinel-2 resolution is actually 10 m (you can tell apart objects that are 10 m across), but in this example, we use bands aggregated to 20 m pixel size or pixel spacing. Likewise, we could disaggregate it to 5 m pixel size (e.g. using bilinear interpolation), but that won't make the resolution any finer, as you still won't be able to identify any objects that are smaller than 10 m across.


```{r, message=FALSE, include=TRUE, results='hide', warning=FALSE}
# Check for packages and install if missing
Expand Down Expand Up @@ -102,13 +102,13 @@ pairs(Gewata, maxpixels = 1000)

Note that both `hist()` and `pairs()` compute histograms and scatterplots based on a random sample of raster pixels. The size of this sample can be changed with the argument `maxpixels`.

Calling `pairs()` on a `SpatRaster` reveals potential correlations between the layers themselves. In the case of bands of the Gewata subset, we can see that band 3 and 5 (in the visual part of the EM spectrum) and bands 6 and 7 (in the near infrared part of the EM spectrum) are highly correlated. Similar correlation also exist between band 11 and 12. However, Band 8a contains significant non-redundant information.
Calling `pairs()` on a `SpatRaster` reveals potential correlations between the layers themselves, which give an indication of which information may be redundant.

```{block, type="alert alert-success"}
> **Question 1**: Given what we know about the location of these bands along the EM spectrum, how could these scatterplots be explained?
> **Question 1**: In the case of the bands of the Gewata subset, list two pairs of bands that contain redundant information and two bands that have significant non-redundant information.
```

In the previous tutorial, we explored two ways to calculate NDVI, using direct raster algebra or using `app()`. Since we will be using NDVI again later in this tutorial, let's calculate it again and store it in our workspace using `app()`.
In a [previous tutorial](../IntroToRaster/index.html#subsetting-layers-from-spatraster), we explored two ways to calculate NDVI, using direct raster algebra or using `app()`. Since we will be using NDVI again later in this tutorial, let's calculate it again and store it in our workspace using `app()`.

```{r}
par(mfrow = c(1, 1)) # reset plotting window
Expand All @@ -119,7 +119,7 @@ plot(ndvi)
Aside from the advantages of `app()` regarding memory usage, an additional advantage of this function is the fact that the result can be written immediately to the file by including the `filename = "..."` argument, which will allow you to write your results to file immediately, after which you can reload it in subsequent sessions without having to repeat your analysis.

```{block, type="alert alert-success"}
> **Question 2**: What is the advantage of including the NDVI layer in the classification?
> **Question 2**: What is the advantage of including the NDVI layer in the landcover classification? Hint: For information on NDVI, check out [this source](https://gisgeography.com/ndvi-normalized-difference-vegetation-index/).
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a hint with more information about NDVI to help students understand this new term. This is a pretty open ended question so it should be ok.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it should be good. Note that there should be a space in "land cover", though.

```

## Classifying raster data
Expand Down Expand Up @@ -293,10 +293,22 @@ class(modelRF)
str(modelRF)
names(modelRF)

# Inspect the prediction error
modelRF$prediction.error

# Calculate the overall accuracy
1 - modelRF$prediction.error

# Inspect the confusion matrix of the OOB error assessment
modelRF$confusion.matrix
```

The overall accuracy and the confusion matrix are often used to evaluate the results of a supervised classification. However, the confusion matrix can provide more detailed information by giving per-class accuracies.

```{block, type="alert alert-info"}
**Note**: If you wish to learn how to read a confusion matrix, check out this [tutorial](https://www.evidentlyai.com/classification-metrics/confusion-matrix).
```

Earlier we provided a brief explanation of OOB error, though it can be a valuable metric for evaluating your model, it can overestimate the true prediction error depending on the parameters presents in the model.

Since we set `importance = "permutation"`, we now also have information on the statistical importance of each of our covariates which we can retrieve using the `importance()` command.
Expand All @@ -311,7 +323,7 @@ The mean decrease in accuracy indicates the amount by which the classification a
Since the NDVI layer scores relatively low according to the mean accuracy decrease criterion, try to construct an alternate Random Forest model as above, but excluding this layer, you can name it something like 'modelRF2'.

```{block, type="alert alert-success"}
> **Question 4**: What effect does this have on the overall accuracy of the results (hint: compare the confusion matrices of the original and new outputs). What effect does leaving this variable out have on the processing time (hint: use `system.time()`)?
> **Question 4**: What effect does this have on the accuracy of the results? Hint: Compare the overall accuracies (or the confusion matrices) of the original and new outputs. What effect does leaving this variable out have on the processing time? Hint: use `system.time()`?
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The question assumes that students have worked with accuracies and confusion matrices, which may not be the case. I would make the question more flexible, allowing students to solely focus on the overall accuracy to answer the question if they do not feel comfortable with the confusion matrix. (I just noticed a typo at the end of the question, misplaced question mark.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's fine. And yes, there's a misplaced question mark :) It's also technically two questions, maybe good to split them?

```

Now we apply this model to the rest of the image and assign classes to all pixels. Note that for this step, the names of the layers in the input SpatRaster (here `covs`) must correspond exactly to the column names of the training table. We will use the `predict()` function from the `terra` package. This function uses a pre-defined model to predict values of raster cells based on a SpatRaster. This model can be derived by a linear regression, for example. In our case, we will use the model provided by the `ranger()` function.
Expand Down Expand Up @@ -515,8 +527,33 @@ plot(forest, col = "dark green", legend = FALSE)

# Today's summary

We learned about:
Today you performed a supervised classification, you identified patches and sieve connected cells, and you learned to deal with thematic raster data. Some functions to retain:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"retain"? Maybe "remember" would be more usual here.


## Data exploration

* `hist()`: Create a histogram for each layer of a `SpatRaster`.
* `pairs()`: Create a scatterplot for each pair of layers of a `SpatRaster`.
* `app()`: Apply a function to all pixels of a `SpatRaster` more efficiently.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also apply a custom function, so perhaps add in parentheses "(custom)"


## Training data preparation

* `extract()`: Retrieve a value for the raster below a polygon.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But also point or line, so you can just say "a vector"


## Contruct a Random Forest model

* `ranger()`: Build a Random Forest model based on a data frame of predictors and a response variable. `importance = "permutation"` and `importance()` to get the statistical importance of each predictor.

## Run a model on the data

* `predict()`: Predict raster values based on a predefined model.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps "pretrained" or "trained" would be more clear here.


## Applying a raster sieve by identifying patches

* `setValues()`: Assign a new value to a raster
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not entirely clear; rather, setValues() sets all values of a raster to a certain value or certain values. This is equivalent to MySpatRaster[] = MyValue.

* `patches()`: Identify patches of raster cells and attribute an ID to each. Specify neighbors with `direction = 4`or `direction = 8`.
* `freq()`: Count the number of raster cells per ID.

## Working with thematic rasters

* `segregate()`: Create a `SpatRaster` object for each layer or class.

- performing a supervised classification using R scripts
- identifying patches and sieve connected cells within a raster
- dealing with thematic (i.e. categorical) raster data