06-evaluation.Rmd

# Forecast evaluation {#accuracy}

Where possible, the accuracy evaluation should be handled by existing tidymodels tools such as [yardstick](https://tidymodels.github.io/yardstick/). It is likely that some changes or extensions will be needed for full support of time series accuracy metrics.

## Accuracy

The [forecast package](https://github.com/robjhyndman/forecast/) implements accuracy as a function which is applied to a model. Out of sample accuracy can be computed by additionally providing a test set.

It is probably more transparent to compute accuracy metrics by directly providing actual response values and model predictions.

## Model vs data centric

forecast is model centric
```{r, eval = FALSE}
# forecast
accuracy(f = forecast, x = new_ts)
```
yardstick is data centric
https://github.com/r-lib/generics/pull/22

```{r, eval = FALSE}
# yardstick
fit_tbl %>% 
  accuracy(col1, col2)
```

## [Proposed fable API](https://github.com/tidyverts/fable/issues/66)

### Desirable functionality
By default, `accuracy()` should provide a basic set of measures of fit for both models (`mdl_df`) and forecasts (`fbl_ts`), similarly to the `forecast` package (perhaps only MAE, RMSE/MSE,  and MAPE by default).

It should be sufficiently flexible to support analysts in calculating a wide variety of accuracy measures, including:

- Point forecast accuracy measures
- Interval accuracy measures
- Distribution accuracy measures
- User specified accuracy measures

The user should be able to specify which measures they wish to compute, including measures exported by `fablelite`, measures from extension packages, and user specified measures.

### Proposed user interface

The accuracy measures to be calculated can be specified as a list of accuracy measure functions as the `measures` argument. This input will also be flattened, allowing groups of accuracy measures to be defined.

The `...` is used to provide additional arguments that will be applied to all accuracy measures (where supported).

For models (`mdl_df`), no additional inputs are required:

```r
mbl %>% 
  accuracy(
    measures = list(MASE, MAE, ME),
    ...
  )
```

For forecasts (`fbl_ts`), the test set must be provided. Additionally, the dataset used for model training can be provided (interface still under consideration) to extend the inputs (required for MASE):
```r
mbl %>% 
  accuracy(
    new_data,
    measures = list(MASE, MAE, ME),
    training_data = NULL
    ...
  )
```

### Implementation details

To achieve this, accuracy measure functions can expect a set of basic inputs from `accuracy()`. The measures that are required for computation should be used as formals for the function. These inputs include (list is not yet comprehensive and will be added to):

- .resid: A vector of residuals from either the training (model accuracy) or test (forecast accuracy) data.
- .resp: A vector of responses matching the residuals (for forecast accuracy, the original data must be provided).
- .fitted: The fitted values from the model, or forecasted values from the forecast.
- .dist: The distribution of fitted values from the model, or forecasted values from the forecast.
- .period: The seasonal period of the data (defaulting to 'smallest' seasonal period).
- .expr_resp: An expression for the response variable.

If a method allows more inputs than this, such as demeaning for MASE, these additional arguments are provided in the dots of the accuracy function.

## Cross validation

`CV(tsbl, mdl, h, window_type, ...)`

## Visualisation