Commit

documentation adjustions
SofiaOtero committed Jan 24, 2025
1 parent 63fde87 commit 81c5b9f
Showing 5 changed files with 45 additions and 26 deletions.
6 changes: 5 additions & 1 deletion NEWS.md
@@ -24,7 +24,11 @@

## Improvements

* Enhanced clarity and user guidance in the `vignette("generate_seasonal_wave")`, providing a comprehensive walkthrough of the application of the 'generate_seasonal_data()' function with detailed explanations and illustrative examples (#56).
* Enhanced clarity and user guidance in the vignettes:
- `vignette("generate_seasonal_wave")`,
- `vignette("aedseo")`,
- `vignette("seasonal_onset")`
providing a comprehensive walkthrough of the application of the functions provided by the `aedseo` package with detailed explanations and illustrative examples (#56, #57).

* Improved the autoplot function which can now visualise dates as days, weeks and months on the x-axis with the `time_interval` argument (#56).

4 changes: 2 additions & 2 deletions R/autoplot.R
@@ -177,14 +177,14 @@ autoplot.tsd_onset <- function(
#' @param factor_to_max A numeric specifying the factor to multiply the high burden level for extending the y-axis.
#' @param disease_color A character specifying the base color for the disease level regions.
#' @param season_start,season_end `r rd_season_start_end()`
#' @param time_interval_step `rd time_interval_step`
#' @param time_interval_step `r rd_time_interval_step`
#' @param y_label A character vector specifying the y label text.
#' @param fill_alpha A numeric vector specifying the transparency levels for the fill colors of burden levels.
#' Must match the number of levels.
#' @param text_burden_size A numeric specifying the size of the text labels.
#' @param text_family A character specifying the font family for the text labels.
#' @param line_color A character specifying the color of the line connecting observations.
#' @param line_width `rd_line_width`
#' @param line_width `r rd_line_width`
#' @param vline_color A character specifying the color of the vertical outbreak start lines.
#' @param vline_linetype A character specifying the line type for outbreak start lines.
#' @param vline_width A numeric specifying the width of the outbreak start line.
5 changes: 3 additions & 2 deletions man/autoplot.Rd


38 changes: 23 additions & 15 deletions vignettes/aedseo.Rmd
@@ -20,17 +20,19 @@ library(aedseo)

## Introduction

The `aedseo` package performs automated and early detection of seasonal epidemic onsets and calculates the burden levels from time series dataset stratified by season.
The `aedseo` package performs automated and early detection of seasonal epidemic onsets and calculates
the burden levels from a time series dataset stratified by season.
The seasonal onset (`seasonal_onset()`) estimates growth rates for consecutive time intervals and calculates the sum of cases.
The burden levels (`seasonal_burden_levels()`) use the previous seasons to calculate the levels of the current season.
The algorithm allows for surveillance of pathogens, by alarming when the observations increase significantly in the selected time interval and based on the
disease specific threshold, while also evaluating the burden of current rates based on previous seasons.
The burden levels (`seasonal_burden_levels()`) use the previous seasons to calculate the burden levels of the current season.
The algorithm supports surveillance of pathogens by raising an alarm when the observations increase significantly in
the selected time interval and exceed the
disease-specific threshold, while also evaluating the burden of current rates based on previous seasons.

### Generate seasonal data

To apply the `aedseo` algorithm, data needs to be transformed into a `tsd` object.
If you have your own data, the `to_time_series()` function can be used with the arguments: `observation`, `time`, `time_interval`.
In the following section, the application of the algorithm is shown with simulated data created with the `generate_seasonal_data()` function.
In the following section, the application of the algorithm is shown with simulated data created with the `generate_seasonal_data()` function.
More information about the function can be found in the `vignette("generate_seasonal_wave")`.

```{r, include = FALSE}
@@ -48,10 +50,9 @@ tsd_data <- generate_seasonal_data(
)
```

In the following figure, the simulated data (solid circles) is visualized alongside the mean (solid line) for the three arbitrary years of weekly data.

In the following figure, simulated data (solid circles) are visualised as individual observations.
The solid line connects these points, representing the underlying mean trend over three years of weekly data.
```{r}
# Have a glance at the time varying mean and the simulated data
plot(tsd_data)
```

@@ -60,11 +61,11 @@ Respiratory viruses can circulate in different seasons based on the location.
In the northern hemisphere they mostly circulate in the fall and winter seasons, hence surveillance is intensified from week 40 to week 20 in the following year.
To include all data, the season in the example is set from week 21 to week 20 in the following year.

### Determining the disease specific threshold
The disease specific threshold can be determined by examining seasonal observations for the pathogen from previous seasons,
### Determining the disease-specific threshold
The disease-specific threshold can be determined by examining seasonal observations for the pathogen from previous seasons,
and determining the number of observations at which the rate begins to increase sharply.

In this example the disease specific threshold is determined based on consecutive significant observations from all available previous seasons.
In this example the disease-specific threshold is determined based on consecutive significant observations from all available previous seasons.
Significant observations are defined as those with a significant positive growth rate (the lower bound of the growth rate confidence interval is positive).

To capture short-term changes and fluctuations in the data, a rolling window of size $k = 5$ is used to create subsets of the data for model fitting,
@@ -136,11 +137,15 @@ significant_vs_obs |>
ggplot2::theme_bw()
```

To declare a seasonal onset, the number of cases must exceed a threshold within a five week window. This threshold is determined by dividing the sum of cases at the onset of continuous significant observations by five, resulting in a disease-specific threshold for each time step.
To declare a seasonal onset, the number of cases must exceed a threshold within a five-week window.
This threshold is determined by dividing the sum of cases at the onset of continuous significant observations by five,
resulting in a disease-specific threshold for each time step.

From the analysis plot, it is observed that the significant growth rate begins at approximately 700 observations. Therefore, the disease-specific threshold is calculated as: $\frac{700}{5} = 140$.
From the analysis plot, it is observed that the significant growth rate begins at approximately 700 observations.
Therefore, the disease-specific threshold is calculated as: $\frac{700}{5} = 140$.

In other words, a weekly observation count surpassing 140 for five consecutive weeks alongside with a significant positive growth rate signifies the onset of the season.
In other words, a weekly observation count surpassing 140 for five consecutive weeks, together with a
significant positive growth rate, signifies the onset of the season.

Inspect exactly when the significant observations start in each season:
```{r}
@@ -153,7 +158,10 @@ significant_vs_obs |>
dplyr::select(season, week, disease_threshold)
```

By inspecting the output from the above code, we observe that the first season initiates significantly later compared to the subsequent seasons. This delayed start was also evident in the plot. As a result, we determine the disease specific threshold based on the complete seasons spanning from 2022/2023 to 2024/2025. As a result, the disease specific threshold is established at `140`.
By inspecting the output from the above code, we observe that the first season initiates significantly later
compared to the subsequent seasons. This delayed start was also evident in the plot, hence we determine
the disease-specific threshold based on the complete seasons spanning from 2022/2023 to 2024/2025.
As a result, the disease-specific threshold is established at `140`.
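
The arithmetic behind this threshold can be written out as a minimal base R sketch. The variable names `sum_of_cases_at_onset` and `window_size` are illustrative and not part of `aedseo`, and the value 700 is the approximate figure read from the plot, not a computed quantity.

```r
# Approximate sum of cases at which sustained significant growth begins
# (an eyeballed value taken from the plot, assumed here for illustration)
sum_of_cases_at_onset <- 700

# Rolling window size k used for the growth rate estimate
window_size <- 5

# Per-time-step disease-specific threshold
disease_threshold <- sum_of_cases_at_onset / window_size
disease_threshold
#> [1] 140
```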

## Applying the main algorithm
The primary function of the `aedseo` package is `combined_seasonal_output()`, which integrates the `seasonal_onset()` and `seasonal_burden_levels()` functions to deliver a comprehensive seasonal analysis.
18 changes: 12 additions & 6 deletions vignettes/seasonal_onset.Rmd
@@ -22,12 +22,16 @@ library(aedseo)

The methodology used to detect the onset of seasonal respiratory epidemics can be divided into two essential criteria:

- The local estimate of the exponential growth rate, $r$, is significantly greater than zero.
- The sum of cases (SoC) over the past $k$ units of time exceeds a disease-specific threshold.
1. The local estimate of the exponential growth rate, $r$, is significantly greater than zero.
2. The sum of cases (SoC) over the past $k$ units of time exceeds a disease-specific threshold.

Here, $k$ denotes the window size employed to obtain the local estimate of the exponential growth rate and the SoC.
Here, $k$ denotes the window size employed to obtain the local estimate of the exponential growth rate and SoC.
When both of these criteria are met, an alarm is triggered and the onset of the seasonal epidemic is detected.

The model is implemented in the `seasonal_onset()` function of the `aedseo` package.
Criterion one is fulfilled if `growth_warning` in the output is `TRUE`.
Criterion two is fulfilled if `sum_of_cases_warning` in the output is `TRUE`.
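
To make the two criteria concrete, the following base R sketch evaluates them for a single window of observations. It is not the `aedseo` implementation: `check_onset_window()` and its argument `sum_of_cases_threshold` are hypothetical names, and a Wald confidence interval from `confint.default()` is used for brevity, which may differ from the interval the package computes.

```r
# Hypothetical helper (not part of aedseo): evaluate both onset criteria
# for a single window of k consecutive observations.
check_onset_window <- function(observation, sum_of_cases_threshold, level = 0.95) {
  window <- data.frame(
    observation = observation,
    time_index = seq_along(observation)
  )

  # Log-linear quasi-Poisson model; the slope estimates the growth rate r
  fit <- stats::glm(
    observation ~ time_index,
    family = stats::quasipoisson(link = "log"),
    data = window
  )

  # Wald confidence interval for r (an approximation chosen for brevity)
  ci <- stats::confint.default(fit, parm = "time_index", level = level)

  sum_of_cases <- sum(observation)

  list(
    growth_rate = unname(stats::coef(fit)["time_index"]),
    lower_growth_rate = ci[1, 1],
    upper_growth_rate = ci[1, 2],
    # Criterion 1: growth rate significantly greater than zero
    growth_warning = ci[1, 1] > 0,
    sum_of_cases = sum_of_cases,
    # Criterion 2: sum of cases over the window exceeds the threshold
    sum_of_cases_warning = sum_of_cases > sum_of_cases_threshold
  )
}

# A steadily growing window and a flat window, both with k = 5
rising <- check_onset_window(c(100, 150, 220, 330, 500), sum_of_cases_threshold = 700)
flat <- check_onset_window(c(100, 98, 103, 99, 101), sum_of_cases_threshold = 700)

# An alarm requires both criteria to hold
rising$growth_warning && rising$sum_of_cases_warning
flat$growth_warning && flat$sum_of_cases_warning
```

In this sketch the threshold is expressed on the scale of the sum of cases over the whole window, so that the comparison matches criterion two directly.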

### Exponential growth rate

The exponential growth rate, denoted as $r$, represents the per capita change in the number of new cases per unit of time.
@@ -60,7 +64,7 @@ or to use negative binomial regression (not implemented yet), which assumes $v=\

## Applying the `seasonal_onset` algorithm

First we generate some data as an `tsd` object, with the `geberate_seasonal_data()` function.
First we generate some data as a `tsd` object, with the `generate_seasonal_data()` function.
```{r}
# Construct a 'tsd' object with time series data
set.seed(222)
@@ -76,8 +80,9 @@ tsd_data <- generate_seasonal_data(
```
Next, the `tsd` object is passed to the `seasonal_onset()` function. Here, a window size of `k=5` is specified,
meaning that a total of 5 weeks is used in the local estimate of the exponential growth rate.
`na_fraction_allowed = 0.4` defines how large a fraction of observables in the k window that are allowed to be NA, here $0.4*5 = 2$ observables.
`na_fraction_allowed = 0.4` defines the fraction of observations in the window of size $k$ that are allowed to be `NA`, here $0.4 \cdot 5 = 2$ observations.
Additionally, a 95\% confidence interval is specified. Finally, the exponential growth rate is estimated using quasi-Poisson regression to account for overdispersion in the data.
A disease-specific threshold can additionally be passed to the function, but is not necessary if only the growth rate calculations are wanted.
`season_start` and `season_end` can be used to specify the season to stratify the observations by.
This algorithm runs across seasons, such that the first date in a new season will use the last `k-1` weeks from the previous season.
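
The `NA` allowance described above can be illustrated with a small base R sketch. `window_is_usable()` is a hypothetical helper, not an `aedseo` function, and the inclusive comparison (`<=`) is an assumption based on the statement that 2 of 5 missing values are allowed.

```r
# Hypothetical helper (not part of aedseo): is the number of missing values
# in a window within the allowed fraction?
window_is_usable <- function(window, na_fraction_allowed = 0.4) {
  sum(is.na(window)) <= na_fraction_allowed * length(window)
}

window_is_usable(c(12, NA, 15, NA, 21)) # 2 of 5 values missing
window_is_usable(c(12, NA, NA, NA, 21)) # 3 of 5 values missing
```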

@@ -126,7 +131,8 @@ The resulting prediction object will contain estimates, lower bounds, and upper
### Summarizing seasonal_onset results

The summary method for `tsd_onset` objects provides a concise summary of your automated early detection of seasonal epidemic onset (seasonal_onset) analysis.
You can use it to retrieve important information about your analysis, including the latest growth warning and the total number of growth warnings:
You can use it to retrieve important information about your analysis, including the latest growth warning, the latest sum of cases warning
(if a disease-specific threshold has been defined), and the total number of growth warnings in the series:


```{r summary}
