fix: Improving burden_level vignette
Lasse Engbo Christiansen committed Jan 31, 2025
1 parent 3bcfa28 commit 85ddfd7
Showing 2 changed files with 39 additions and 39 deletions.
74 changes: 37 additions & 37 deletions vignettes/burden_levels.Rmd
@@ -1,8 +1,8 @@
---
title: "Automated Detection of Seasonal Epidemic Burden Levels"
title: "Seasonal Burden Levels"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Automated Detection of Seasonal Epidemic Burden Levels}
%\VignetteIndexEntry{Seasonal Burden Levels}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
@@ -19,13 +19,13 @@ library(aedseo)
```

To provide a concise overview of how the `seasonal_burden_levels()` algorithm operates, we utilize the same example data presented
in the `vignette("aedseo")`. The plot below illustrates the two `method` arguments available in the
in the `vignette("aedseo")`. The plot below illustrates the two methods available for estimating burden levels in the
`combined_seasonal_output()` function:

- **`intensity_levels`**: This method assesses burden levels by comparing it to observations from previous seasons.
- **`peak_levels`**: This method assesses burden levels by referencing only the highest observations within each season.
- **`intensity_levels`**: Intended for within-season classification of observations.
- **`peak_levels`**: Intended for comparing the height of peaks between seasons.

The disease-specific threshold is the `very low` threshold for both methods.
The disease-specific threshold is the `very low` breakpoint for both methods. Each breakpoint is named after the burden level for which it is the upper bound.

```{r, echo = FALSE}
withr::local_seed(222)
@@ -127,7 +127,7 @@ burden_levels_df |>
legend.position = "right",
legend.key.width = grid::unit(2, "cm"),
axis.title = ggplot2::element_text(size = 12)
) +
ggplot2::scale_y_log10(

Check warning on line 132 in vignettes/burden_levels.Rmd (GitHub Actions / lint): col=3, [indentation_linter] Indentation should be 2 spaces but is 3 spaces.
breaks = y_tics,
@@ -142,61 +142,61 @@ The methodology used to define the burden levels of seasonal epidemics is based
Historical data from all available seasons is used to establish the levels for the current season.
This is done by:

- Using `n` peak weekly observations from each season.
- Selecting only peak observations if they surpass the disease-specific threshold.
- Weighting the observations such that recent observations have a greater impact than older observations.
- A proper distribution (log-normal, weibull and exponential are implemented) is fitted to the weighted
`n` peak observations. Then the parameters of the selected distribution are optimised to select the best fit.
- Using the `n` highest (peak) observations from each season.
- Selecting only observations that surpass the disease-specific threshold.
- Weighting the observations such that recent observations have a greater influence than older observations.
- A proper distribution (log-normal, Weibull and exponential are currently implemented) is fitted to the weighted
`n` peak observations. The selected distribution with the fitted parameters is used to calculate percentiles to be used as breakpoints.
- Burden levels can be defined by two methods:
- `intensity_levels` which models the risk compared to what has been observed in previous seasons.
- `peak_levels` which models the risk compared to what has been observed in the `n` peak observations each season.
This is the method used in [mem](https://github.com/lozalojo/mem), with log normal distribution and without weights.
- `peak_levels` which models the height of the seasonal peaks. Using the log-normal distribution without weights is similar
to the default in [mem](https://github.com/lozalojo/mem).
- `intensity_levels` which models the within-season levels. The highest breakpoint coincides with the `peak_levels` method.
Intermediate breakpoints are evenly distributed on a logarithmic scale between the `very low` and `high` breakpoints,
to give the same relative difference between the breakpoints.
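
The first three steps of the list above can be sketched in base R. This is a minimal illustration with made-up data, not the package's implementation: the column names `season` and `observation`, the values, the threshold, and the weighting rule (`decay_factor` raised to the number of seasons back) are all assumptions made for the example.

```r
# Hypothetical weekly observations for two seasons (illustrative values only)
obs <- data.frame(
  season = rep(c("2022/2023", "2023/2024"), each = 5),
  observation = c(5, 30, 80, 45, 12, 8, 25, 15, 60, 10)
)
threshold <- 20 # assumed disease-specific threshold
n_peak <- 3
decay_factor <- 0.8

# Keep observations above the threshold, then the n_peak highest per season
peaks <- do.call(rbind, lapply(split(obs, obs$season), function(d) {
  d <- d[d$observation > threshold, ]
  utils::head(d[order(d$observation, decreasing = TRUE), ], n_peak)
}))

# Assumed weighting: the most recent season gets weight 1, and each
# earlier season is down-weighted by another factor of decay_factor
season_index <- as.integer(factor(peaks$season))
peaks$weight <- decay_factor^(max(season_index) - season_index)
peaks
```

A season with fewer than `n_peak` observations above the threshold simply contributes fewer rows, as the second season does here.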

The model is implemented in the `seasonal_burden_levels()` function of the `aedseo` package.
In the following sections we will describe the arguments for the function and how the model is built.

#### Peak observations
`n_peak` observations are used to describe the highest observations that are observed each season.
The default of `n_peak` is `6` as we are only interested in the highest observations.
`n_peak` is the number of highest observations that are used from each season.
The default of `n_peak` is `6` - this is similar to using the latest five seasons in `mem`.

#### Weighting
`A decay_factor` is implemented to give more weight to recent seasons as they are often more indicative of current and future trends.
As time progresses, the relevance of older seasons may decrease due to changes in factors like population immunity,
virus mutations, or intervention strategies. Weighting older seasons less reflects this reduced relevance.
The default of `decay_factor` is `0.8`, allowing the model to be responsive to recent changes without being overly
A `decay_factor` is implemented to give more weight to recent seasons as they are often more indicative of current and future trends.
As time progresses, the relevance of older seasons may decrease due to changes in factors like testing recommendations, population immunity,
virus mutations, or intervention strategies. Weighting older seasons less reflects this reduced relevance.
The default `decay_factor` is `0.8`, allowing the model to be responsive to recent changes without being overly
sensitive to short-term fluctuations.
The optimal decay factor can vary depending on the variability and trends within the data. For datasets where seasonal
patterns are highly stable, a higher decay factor (i.e. longer memory) might be appropriate. Conversely, for data that has changed a lot across
seasons, a lower factor could improve predictions.
From time-series analysis, $1/(1-\text{decay\_factor})$ is the effective memory, so the default of `0.8` corresponds to five seasons.
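
As a quick check of the effective-memory formula, the sketch below evaluates it for a few decay factors. Only `0.8` is the documented default; the other two values are arbitrary choices for comparison.

```r
# Effective memory (in seasons) for a few decay factors
decay_factors <- c(0.5, 0.8, 0.9)
effective_memory <- 1 / (1 - decay_factors)

data.frame(
  decay_factor = decay_factors,
  effective_memory = effective_memory
)
```

The formula is the sum of the geometric series $1 + d + d^2 + \dots$ for a decay factor $d$, so a higher decay factor spreads the weight over more seasons.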

#### Distribution and optimisation
`family` is the argument used to select which distribution the `n_peak` observations should be fitted to; users can
choose between `lnorm`, `weibull` and `exp` distributions. The log-normal distribution theoretically
aligns well with the nature of epidemic data, which often exhibits multiplicative growth patterns.
In our optimization process, we evaluated the distributions to determine their performance in fitting Danish non-sentinel
cases and hospitalisation data for RSV, SARS-CoV-2 and Influenza (A and B). All three distributions had comparable
objective function values during optimisation, hence we did not see any statistical significant difference in their performance.
weighted likelihood values during optimisation, hence we did not see any statistically significant difference in their performance.

The model uses the `fit_quantiles()` function which employs the `stats::optim` for optimisation of the distribution parameters.
The model uses the `fit_quantiles()` function, which employs `stats::optim` to estimate the parameters that maximize the weighted likelihood.
The `optim_method` argument can be passed to `seasonal_burden_levels()`; the default is `Nelder-Mead`, but other methods can be selected,
see `?fit_quantiles`.

*Note:* [mem](https://github.com/lozalojo/mem) uses the log-normal distribution. To allow for more straightforward benchmarking,
the default is `lnorm`.
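
To make the idea of maximizing a weighted likelihood concrete, here is a minimal sketch that fits a log-normal distribution with `stats::optim`. It is not the code inside `fit_quantiles()`: the observations, the weights, the objective function `neg_weighted_loglik()` and the log-scale parametrisation of `sdlog` are assumptions made for illustration.

```r
# Hypothetical peak observations and weights (illustrative values only)
obs <- c(120, 150, 180, 200, 240, 310)
w <- c(0.64, 0.64, 0.8, 0.8, 1, 1)

# Negative weighted log-likelihood for a log-normal distribution;
# sdlog is optimised on the log scale so that it stays positive
neg_weighted_loglik <- function(par, x, w) {
  -sum(w * stats::dlnorm(x, meanlog = par[1], sdlog = exp(par[2]), log = TRUE))
}

fit <- stats::optim(
  par = c(mean(log(obs)), log(stats::sd(log(obs)))),
  fn = neg_weighted_loglik,
  x = obs,
  w = w,
  method = "Nelder-Mead"
)

meanlog_hat <- fit$par[1]
sdlog_hat <- exp(fit$par[2])

# A percentile of the fitted distribution, usable as a breakpoint
stats::qlnorm(0.95, meanlog = meanlog_hat, sdlog = sdlog_hat)
```

For the log-normal case the weighted maximum has a closed form (the weighted mean and weighted standard deviation of `log(obs)`), which gives a simple way to sanity-check the optimiser's result.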

#### Burden levels
`method` is the argument used to select one of the two methods `intensity_levels` (default) and `peak_levels`.
Both methods return quantile(s) from the fitted distribution which are used to define the burden levels.
Burden levels are "very low", "low", "medium" and "high".
Both methods return quantile(s) from the fitted distribution which are used to define the breakpoints for the burden levels.
Breakpoints are named `very low`, `low`, `medium` and `high` and define the upper bound of the corresponding burden level.

- `intensity_levels` takes one quantile as argument, representing the highest intensity that has been observed in previous seasons.
The default is set at a 95% confidence level, which is used to determine the "high" burden level. The disease-specific threshold
determines the "very low" burden level. The "low" and "medium" burden levels are calculated based on the relative
increase between "very low" and "high" burden levels.
- `intensity_levels` takes one percentile as argument, representing the highest breakpoint.
The default is the 95th percentile. The disease-specific threshold
determines the `very low` breakpoint. The `low` and `medium` breakpoints are calculated to give identical relative
increases between the `very low` and `high` breakpoints.

- `peak_levels` takes three quantiles as argument, representing the "low", "medium" and "high" burden levels.
The default thresholds are set at 40%, 90%, and 97.5% to align with the parameters used in the [mem](https://github.com/lozalojo/mem).
The disease-specific threshold defines the "very low" burden level.
- `peak_levels` takes three percentiles as arguments, representing the `low`, `medium` and `high` breakpoints.
The default percentiles are 40%, 90%, and 97.5% to align with the parameters used in [mem](https://github.com/lozalojo/mem).
The disease-specific threshold defines the `very low` breakpoint.
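
The two ways of deriving breakpoints can be sketched as follows. The fitted parameters `meanlog` and `sdlog` and the threshold `very_low` are hypothetical values chosen for the example, and the code only mirrors the description above; it is not taken from the package.

```r
# Hypothetical fitted log-normal parameters and disease-specific threshold
meanlog <- log(100)
sdlog <- 0.5
very_low <- 20

# peak_levels: three percentiles of the fitted distribution
peak_breaks <- c(
  "very low" = very_low,
  stats::setNames(
    stats::qlnorm(c(0.4, 0.9, 0.975), meanlog = meanlog, sdlog = sdlog),
    c("low", "medium", "high")
  )
)

# intensity_levels: one percentile gives the high breakpoint; low and medium
# are spaced evenly on the log scale between very low and high
high <- stats::qlnorm(0.95, meanlog = meanlog, sdlog = sdlog)
intensity_breaks <- stats::setNames(
  exp(seq(log(very_low), log(high), length.out = 4)),
  c("very low", "low", "medium", "high")
)

# The ratio between consecutive intensity breakpoints is constant
ratios <- unname(intensity_breaks[-1] / intensity_breaks[-4])
ratios
```

Even spacing on the log scale is what gives identical relative increases: each breakpoint is the previous one multiplied by the same factor.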

## Applying the `seasonal_burden_levels()` algorithm

@@ -273,7 +273,7 @@ intensity_levels_n_neg_t <- seasonal_burden_levels(
```

### Use the `peak_levels` method
[mem](https://github.com/lozalojo/mem) uses the `n` highest observations from each epidemic period to fit the parameters of the distribution,
`mem` uses the `n` highest observations from each epidemic period to fit the parameters of the distribution,
where `n = 30/seasons`. The data has four seasons, so `30/4 = 7.5`; to align with mem, we use `n_peak = 8`.
```{r}
peak_levels_n <- seasonal_burden_levels(
@@ -352,7 +352,7 @@ The same data is created with following combinations:

These combinations are selected because real-world data realistically contains noise, and the
trend can decline or incline between seasons.
Burden levels correspond to season *2024/2025* calculated based on the three previous seasons.
Breakpoints for season *2024/2025* are calculated based on the three previous seasons.

### Aedseo levels
```{r, echo = FALSE, fig.width=10, fig.height=8, dpi=300}
4 changes: 2 additions & 2 deletions vignettes/seasonal_onset.Rmd
@@ -1,8 +1,8 @@
---
title: "Automated Detection of Seasonal Epidemic Onset"
title: "Seasonal Epidemic Onset"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Automated Detection of Seasonal Epidemic Onset}
%\VignetteIndexEntry{Seasonal Epidemic Onset}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---