-
Notifications
You must be signed in to change notification settings - Fork 16
Reworked critical questions and added summary #13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: gh-pages
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -24,7 +24,7 @@ library(knitr) | |
opts_chunk$set(fig.width = 12, fig.align = 'center', fig.height = 8, dpi = 72) | ||
``` | ||
|
||
# Week 2, Tutorial 6: Advanced Raster Analysis | ||
# Advanced Raster Analysis | ||
|
||
## Learning objectives | ||
* Carry out a supervised classification on a SpatRaster | ||
|
@@ -35,14 +35,14 @@ opts_chunk$set(fig.width = 12, fig.align = 'center', fig.height = 8, dpi = 72) | |
|
||
## Introduction to Sentinel-2 data used here | ||
|
||
We will carry out a supervised classification using Sentinel 2 data for the Gewata region in Ethiopia. To do this, we use atmospherically corrected Level 2A data acquired on December 27 2020. These data were downloaded from ESA's online data hub (https://scihub.copernicus.eu/dhus), a part of the Copernicus European Programme. As it is freely available, Sentinel data has been commonly used next to Landsat data for environmental monitoring. | ||
We will carry out a supervised classification using Sentinel 2 data for the Gewata region in Ethiopia. To do this, we use atmospherically corrected Level 2A data acquired on December 27, 2020. These data were downloaded from [ESA's online data hub](https://scihub.copernicus.eu/dhus), a part of the Copernicus European Programme. As it is freely available, Sentinel data has been commonly used next to Landsat data for environmental monitoring. | ||
 | ||
|
||
## Data exploration | ||
|
||
Download the data to your computer and open your preferred R IDE to the directory of this tutorial. | ||
|
||
After downloading the data we begin with visualization. The data consists of all the Sentinel 2 bands at a spatial resolution of 20 m. We will also make use of training polygons for the land cover classification, which will be introduced later. | ||
After downloading the data we begin with visualization. The data consists of all the Sentinel 2 bands at a spatial resolution of 20 m, meaning that each pixel on the scene corresponds to a ground distance of 20 m by 20 m. We will also make use of training polygons for the land cover classification, which will be introduced later. | ||
|
||
```{r, message=FALSE, include=TRUE, results='hide', warning=FALSE} | ||
# Check for packages and install if missing | ||
|
@@ -102,13 +102,13 @@ pairs(Gewata, maxpixels = 1000) | |
|
||
Note that both `hist()` and `pairs()` compute histograms and scatterplots based on a random sample of raster pixels. The size of this sample can be changed with the argument `maxpixels`. | ||
|
||
Calling `pairs()` on a `SpatRaster` reveals potential correlations between the layers themselves. In the case of bands of the Gewata subset, we can see that band 3 and 5 (in the visual part of the EM spectrum) and bands 6 and 7 (in the near infrared part of the EM spectrum) are highly correlated. Similar correlation also exist between band 11 and 12. However, Band 8a contains significant non-redundant information. | ||
Calling `pairs()` on a `SpatRaster` reveals potential correlations between the layers themselves, which give an indication of which information may be redundant. | ||
|
||
```{block, type="alert alert-success"} | ||
> **Question 1**: Given what we know about the location of these bands along the EM spectrum, how could these scatterplots be explained? | ||
> **Question 1**: In the case of the bands of the Gewata subset, list two pairs of bands that contain redundant information and two bands that have significant non-redundant information. | ||
GreatEmerald marked this conversation as resolved.
Show resolved
Hide resolved
|
||
``` | ||
|
||
In the previous tutorial, we explored two ways to calculate NDVI, using direct raster algebra or using `app()`. Since we will be using NDVI again later in this tutorial, let's calculate it again and store it in our workspace using `app()`. | ||
In a [previous tutorial](../IntroToRaster/index.html#subsetting-layers-from-spatraster), we explored two ways to calculate NDVI, using direct raster algebra or using `app()`. Since we will be using NDVI again later in this tutorial, let's calculate it again and store it in our workspace using `app()`. | ||
|
||
```{r} | ||
par(mfrow = c(1, 1)) # reset plotting window | ||
|
@@ -119,7 +119,7 @@ plot(ndvi) | |
Aside from the advantages of `app()` regarding memory usage, an additional advantage of this function is the fact that the result can be written immediately to the file by including the `filename = "..."` argument, which will allow you to write your results to file immediately, after which you can reload it in subsequent sessions without having to repeat your analysis. | ||
|
||
```{block, type="alert alert-success"} | ||
> **Question 2**: What is the advantage of including the NDVI layer in the classification? | ||
> **Question 2**: What is the advantage of including the NDVI layer in the landcover classification? Hint: For information on NDVI, check out [this source](https://gisgeography.com/ndvi-normalized-difference-vegetation-index/). | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Added a hint with more information about NDVI to help students understand this new term. This is a pretty open ended question so it should be ok. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, it should be good. Note that there should be a space in "land cover", though. |
||
``` | ||
|
||
## Classifying raster data | ||
|
@@ -293,10 +293,22 @@ class(modelRF) | |
str(modelRF) | ||
names(modelRF) | ||
|
||
# Inspect the prediction error | ||
modelRF$prediction.error | ||
|
||
# Calculate the overall accuracy | ||
1 - modelRF$prediction.error | ||
|
||
GreatEmerald marked this conversation as resolved.
Show resolved
Hide resolved
|
||
# Inspect the confusion matrix of the OOB error assessment | ||
modelRF$confusion.matrix | ||
``` | ||
|
||
The overall accuracy and the confusion matrix are often used to evaluate the results of a supervised classification. However, the confusion matrix can provide more detailed information by giving per-class accuracies. | ||
|
||
```{block, type="alert alert-info"} | ||
**Note**: If you wish to learn how to read a confusion matrix, check out this [tutorial](https://www.evidentlyai.com/classification-metrics/confusion-matrix). | ||
``` | ||
GreatEmerald marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
Earlier we provided a brief explanation of OOB error, though it can be a valuable metric for evaluating your model, it can overestimate the true prediction error depending on the parameters presents in the model. | ||
|
||
Since we set `importance = "permutation"`, we now also have information on the statistical importance of each of our covariates which we can retrieve using the `importance()` command. | ||
|
@@ -311,7 +323,7 @@ The mean decrease in accuracy indicates the amount by which the classification a | |
Since the NDVI layer scores relatively low according to the mean accuracy decrease criterion, try to construct an alternate Random Forest model as above, but excluding this layer, you can name it something like 'modelRF2'. | ||
|
||
```{block, type="alert alert-success"} | ||
> **Question 4**: What effect does this have on the overall accuracy of the results (hint: compare the confusion matrices of the original and new outputs). What effect does leaving this variable out have on the processing time (hint: use `system.time()`)? | ||
> **Question 4**: What effect does this have on the accuracy of the results? Hint: Compare the overall accuracies (or the confusion matrices) of the original and new outputs. What effect does leaving this variable out have on the processing time? Hint: use `system.time()`? | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The question assumes that students have worked with accuracies and confusion matrices, which may not be the case. I would make the question more flexible, allowing students to solely focus on the overall accuracy to answer the question if they do not feel comfortable with the confusion matrix. (I just noticed a typo at the end of the question, misplaced question mark.) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, that's fine. And yes, there's a misplaced question mark :) It's also technically two questions, maybe good to split them? |
||
``` | ||
|
||
Now we apply this model to the rest of the image and assign classes to all pixels. Note that for this step, the names of the layers in the input SpatRaster (here `covs`) must correspond exactly to the column names of the training table. We will use the `predict()` function from the `terra` package. This function uses a pre-defined model to predict values of raster cells based on a SpatRaster. This model can be derived by a linear regression, for example. In our case, we will use the model provided by the `ranger()` function. | ||
|
@@ -515,8 +527,33 @@ plot(forest, col = "dark green", legend = FALSE) | |
|
||
# Today's summary | ||
|
||
We learned about: | ||
Today you performed a supervised classification, you identified patches and sieve connected cells, and you learned to deal with thematic raster data. Some functions to retain: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. "retain"? Maybe "remember" would be more usual here. |
||
|
||
## Data exploration | ||
|
||
* `hist()`: Create a histogram for each layer of a `SpatRaster`. | ||
* `pairs()`: Create a scatterplot for each pair of layers of a `SpatRaster`. | ||
* `app()`: Apply a function to all pixels of a `SpatRaster` more efficiently. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Also apply a custom function, so perhaps add in parentheses "(custom)" |
||
|
||
## Training data preparation | ||
|
||
* `extract()`: Retrieve a value for the raster below a polygon. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. But also point or line, so you can just say "a vector" |
||
|
||
## Contruct a Random Forest model | ||
|
||
* `ranger()`: Build a Random Forest model based on a data frame of predictors and a response variable. `importance = "permutation"` and `importance()` to get the statistical importance of each predictor. | ||
|
||
## Run a model on the data | ||
|
||
* `predict()`: Predict raster values based on a predefined model. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Perhaps "pretrained" or "trained" would be more clear here. |
||
|
||
## Applying a raster sieve by identifying patches | ||
|
||
* `setValues()`: Assign a new value to a raster | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is not entirely clear; rather, |
||
* `patches()`: Identify patches of raster cells and attribute an ID to each. Specify neighbors with `direction = 4`or `direction = 8`. | ||
* `freq()`: Count the number of raster cells per ID. | ||
|
||
## Working with thematic rasters | ||
|
||
* `segregate()`: Create a `SpatRaster` object for each layer or class. | ||
GreatEmerald marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
- performing a supervised classification using R scripts | ||
- identifying patches and sieve connected cells within a raster | ||
- dealing with thematic (i.e. categorical) raster data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a little more description since students may not be familiar with the term "spatial resolution".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, and in fact the preferred term is actually "pixel size"! The term "resolution" means the smallest recognisable object. Sentinel-2 resolution is actually 10 m (you can tell apart objects that are 10 m across), but in this example, we use bands aggregated to 20 m pixel size or pixel spacing. Likewise, we could disaggregate it to 5 m pixel size (e.g. using bilinear interpolation), but that won't make the resolution any finer, as you still won't be able to identify any objects that are smaller than 10 m across.