# Forecast evaluation {#accuracy}
Where possible, accuracy evaluation should be handled by existing tidymodels tools such as [yardstick](https://tidymodels.github.io/yardstick/). Some changes or extensions will likely be needed to fully support time series accuracy metrics.
## Accuracy
The [forecast package](https://github.com/robjhyndman/forecast/) implements `accuracy()` as a function that is applied to a model. Out-of-sample accuracy can be computed by additionally providing a test set.
It is probably more transparent to compute accuracy metrics by directly providing actual response values and model predictions.
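As a minimal sketch of this data-centric approach, common point forecast measures can be computed in base R directly from two vectors. The values below are made up for illustration, and the error convention (actual minus predicted) is an assumption rather than something fixed by this proposal.

```r
# Hypothetical actual values and point predictions (illustrative data only).
actual    <- c(10, 12, 15, 11)
predicted <- c(11, 12, 13, 12)

# Assumed error convention: actual minus predicted.
err <- actual - predicted

mae  <- mean(abs(err))                  # mean absolute error
rmse <- sqrt(mean(err^2))               # root mean squared error
mape <- mean(abs(err / actual)) * 100   # mean absolute percentage error

c(MAE = mae, RMSE = rmse, MAPE = mape)
```

Nothing here depends on a model object, which is what makes the calculation easy to inspect and to reuse across modelling packages.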
## Model vs data centric
forecast is model-centric:
```{r, eval = FALSE}
# forecast
accuracy(f = forecast, x = new_ts)
```
yardstick is data-centric (see also https://github.com/r-lib/generics/pull/22):
```{r, eval = FALSE}
# yardstick
fit_tbl %>%
  accuracy(col1, col2)
```
## [Proposed fable API](https://github.com/tidyverts/fable/issues/66)
### Desirable functionality
By default, `accuracy()` should provide a basic set of measures of fit for both models (`mdl_df`) and forecasts (`fbl_ts`), similarly to the `forecast` package (perhaps only MAE, RMSE/MSE, and MAPE by default).
It should be sufficiently flexible to support analysts in calculating a wide variety of accuracy measures, including:
- Point forecast accuracy measures
- Interval accuracy measures
- Distribution accuracy measures
- User-specified accuracy measures
The user should be able to specify which measures to compute, including measures exported by `fablelite`, measures from extension packages, and user-specified measures.
### Proposed user interface
The accuracy measures to be calculated are specified as a list of accuracy measure functions via the `measures` argument. Nested lists are flattened, so groups of accuracy measures can be defined once and reused.
The `...` is used to provide additional arguments that are applied to all accuracy measures (where supported).
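The flattening of grouped measures could look something like the sketch below. The helper `flatten_measures()` and the three measure definitions are hypothetical stand-ins written for this illustration; they are not taken from `fablelite`.

```r
# Hypothetical stand-ins for accuracy measure functions.
ME   <- function(.resid, ...) mean(.resid)
MAE  <- function(.resid, ...) mean(abs(.resid))
RMSE <- function(.resid, ...) sqrt(mean(.resid^2))

# A reusable group of measures.
point_measures <- list(MAE = MAE, RMSE = RMSE)

# Hypothetical helper: recursively flatten nested lists of functions
# into a single flat list, keeping names where they are given.
flatten_measures <- function(measures) {
  out <- list()
  for (i in seq_along(measures)) {
    item <- measures[[i]]
    if (is.function(item)) {
      out <- c(out, stats::setNames(list(item), names(measures)[i]))
    } else {
      out <- c(out, flatten_measures(item))
    }
  }
  out
}

flat <- flatten_measures(list(ME = ME, point_measures))
names(flat)
```

With this approach a user can mix individual measures and predefined groups in the same `measures` list.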
For models (`mdl_df`), no additional inputs are required:
```r
mbl %>%
  accuracy(
    measures = list(MASE, MAE, ME),
    ...
  )
```
For forecasts (`fbl_ts`), the test set must be provided. Additionally, the dataset used for model training can be provided (interface still under consideration) to extend the inputs (required for MASE):
```r
fbl %>%
  accuracy(
    new_data,
    measures = list(MASE, MAE, ME),
    training_data = NULL,
    ...
  )
```
### Implementation details
To achieve this, accuracy measure functions can expect a set of basic inputs from `accuracy()`. Each measure function should declare the inputs it requires as its formals. These inputs include (the list is not yet comprehensive and will be extended):
- `.resid`: A vector of residuals from either the training (model accuracy) or test (forecast accuracy) data.
- `.resp`: A vector of responses matching the residuals (for forecast accuracy, the original data must be provided).
- `.fitted`: The fitted values from the model, or forecasted values from the forecast.
- `.dist`: The distribution of fitted values from the model, or forecasted values from the forecast.
- `.period`: The seasonal period of the data (defaulting to the smallest seasonal period).
- `.expr_resp`: An expression for the response variable.
If a measure accepts further inputs, such as demeaning for MASE, these additional arguments are passed through the `...` of `accuracy()`.
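One way this dispatch could work is sketched below in base R. Everything here is an assumption made for illustration: `apply_measures()` is a hypothetical helper, the measure definitions are stand-ins, and `na.rm` plays the role of an extra argument (much as a demeaning option would for MASE). Each measure receives only the inputs named in its formals, and extra arguments are forwarded only to measures that can accept them.

```r
# Hypothetical measures: each declares only the inputs it needs.
ME   <- function(.resid) mean(.resid)
MAE  <- function(.resid, na.rm = FALSE, ...) mean(abs(.resid), na.rm = na.rm)
MAPE <- function(.resid, .resp, na.rm = FALSE, ...) {
  mean(abs(.resid / .resp), na.rm = na.rm) * 100
}

# Hypothetical helper: match available inputs to each measure's formals.
apply_measures <- function(measures, inputs, ...) {
  extras <- list(...)
  vapply(measures, function(fn) {
    wanted <- names(formals(fn))
    args <- inputs[intersect(names(inputs), wanted)]
    # Forward an extra argument if the measure names it or accepts `...`.
    keep <- names(extras) %in% wanted | "..." %in% wanted
    do.call(fn, c(args, extras[keep]))
  }, numeric(1))
}

# Illustrative inputs, including a missing residual.
inputs <- list(
  .resid  = c(-1, 0, 2, NA),
  .resp   = c(10, 12, 15, 11),
  .fitted = c(11, 12, 13, NA)
)

res <- apply_measures(list(ME = ME, MAE = MAE, MAPE = MAPE), inputs, na.rm = TRUE)
res
```

Because `ME` has neither an `na.rm` formal nor `...`, it does not receive the extra argument and returns `NA` here, which illustrates the "where supported" behaviour of arguments passed through the dots.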
## Cross validation
`CV(tsbl, mdl, h, window_type, ...)`
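The signature above is only a stub, so the following is a loose sketch of the index bookkeeping that such a function might need, not a proposal for its implementation. The helper `cv_splits()`, its `initial` argument, and the reading of `window_type` as either "expanding" or "sliding" are all assumptions.

```r
# Hypothetical helper: training/test index pairs for rolling-origin evaluation.
# n: series length; h: forecast horizon; initial: size of the first training set.
cv_splits <- function(n, h, initial, window_type = c("expanding", "sliding")) {
  window_type <- match.arg(window_type)
  stopifnot(initial >= 1, h >= 1, initial + h <= n)
  origins <- seq(initial, n - h)
  lapply(origins, function(origin) {
    # A sliding window keeps the training size fixed at `initial`;
    # an expanding window always starts from the first observation.
    start <- if (window_type == "sliding") origin - initial + 1 else 1
    list(train = seq(start, origin), test = seq(origin + 1, origin + h))
  })
}

splits  <- cv_splits(n = 10, h = 2, initial = 6)
sliding <- cv_splits(n = 10, h = 2, initial = 6, window_type = "sliding")
```

Each element pairs a training window with the `h` observations that follow it, so a model can be fitted on `train` and scored on `test` with the accuracy measures described earlier.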
## Visualisation