Commit bdb4d28

add a readme
1 parent 9f52d08 commit bdb4d28

6 files changed, +786 -101 lines changed

.Rbuildignore

Lines changed: 1 addition & 0 deletions
@@ -2,3 +2,4 @@
 ^\.Rproj\.user$
 ^data-raw$
 ^LICENSE\.md$
+^README\.Rmd$
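
The new `^README\.Rmd$` line tells `R CMD build` to leave the R Markdown source out of the package tarball while the rendered `README.md` stays in. Per Writing R Extensions, each `.Rbuildignore` line is a Perl-like regular expression matched case-insensitively against paths relative to the package root. The base-R sketch below mimics that matching for top-level names only; `is_build_ignored()` is a hypothetical helper, not how `R CMD build` is implemented, and it does not handle files nested inside an ignored directory.

```r
# Patterns from this commit's .Rbuildignore, escaped for R string literals.
build_ignore <- c("^\\.Rproj\\.user$", "^data-raw$", "^LICENSE\\.md$", "^README\\.Rmd$")

# Hypothetical helper: TRUE when any pattern matches the given top-level path.
# Matching is Perl-like and case-insensitive, mirroring the documented rule.
is_build_ignored <- function(path, patterns = build_ignore) {
  vapply(path, function(p) {
    # Test each pattern separately; grepl() takes one pattern at a time.
    any(vapply(patterns, grepl, logical(1), x = p,
               perl = TRUE, ignore.case = TRUE))
  }, logical(1), USE.NAMES = FALSE)
}

is_build_ignored(c("README.Rmd", "README.md", "LICENSE.md", "DESCRIPTION", "data-raw"))
# returns TRUE FALSE TRUE FALSE TRUE
```

Because matching ignores case, a file named `readme.rmd` would also be excluded.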

README.Rmd

Lines changed: 55 additions & 100 deletions
@@ -6,132 +6,87 @@ output: github_document
 
 ```{r, include = FALSE}
 knitr::opts_chunk$set(
-  collapse = TRUE,
+  collapse = FALSE,
   comment = "#>",
   fig.path = "man/figures/README-",
   out.width = "100%"
 )
 ```
 
-# epipredict
+# epidatasets
 
 <!-- badges: start -->
-[![R-CMD-check](https://github.com/cmu-delphi/epipredict/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/cmu-delphi/epipredict/actions/workflows/R-CMD-check.yaml)
 <!-- badges: end -->
 
-**Note:** This package is currently in development and may not work as expected. Please file bug reports as issues in this repo, and we will do our best to address them quickly.
+This package contains data sets used to compile vignettes
+and other documentation in Delphi R Packages. The goal is to
+avoid calls to the Delphi Epidata API, and deposit some
+examples here for easy offline use.
 
 ## Installation
 
-You can install the development version of epipredict from [GitHub](https://github.com/) with:
+You can install the development version of `{epidatasets}` like so:
 
 ``` r
 # install.packages("remotes")
-remotes::install_github("cmu-delphi/epipredict")
+remotes::install_github("cmu-delphi/epidatasets")
 ```
 
-## Documentation
 
-You can view documentation for the `main` branch at <https://cmu-delphi.github.io/epipredict>.
+## Contents
 
+This package contains a number of different datasets, along
+with the code used to generate them. See the Source Code if
+you want to examine the necessary API calls.
 
-## Goals for `epipredict`
+All data included here is in `epi_df` format, which is a
+subclass of `tbl_df`, which is in turn a subclass of `data.frame`.
+The data will print nicely if you load the `{epiprocess}`
+or `{tibble}` packages, but these are not required to access
+or inspect the data sets. For example,
 
-**We hope to provide:**
-
-1. A set of basic, easy-to-use forecasters that work out of the box. You should be able to do a reasonably limited amount of customization on them. For the basic forecasters, we currently provide:
-    * Baseline flat-line forecaster
-    * Autoregressive forecaster
-    * Autoregressive classifier
-2. A framework for creating custom forecasters out of modular components. There are four types of components:
-    * Preprocessor: do things to the data before model training
-    * Trainer: train a model on data, resulting in a fitted model object
-    * Predictor: make predictions, using a fitted model object
-    * Postprocessor: do things to the predictions before returning
-
-**Target audiences:**
-
-* Basic. Has data, calls forecaster with default arguments.
-* Intermediate. Wants to examine changes to the arguments, take advantage of built in flexibility.
-* Advanced. Wants to write their own forecasters. Maybe willing to build up from some components that we write.
-
-The Advanced user should find their task to be relatively easy. Examples of these tasks are illustrated in the [vignettes and articles](https://cmu-delphi.github.io/epipredict).
-
-## Intermediate example
-
-The package comes with some built-in historical data for illustration, but
-up-to-date versions of this could be downloaded with the [`{covidcast}` package](https://cmu-delphi.github.io/covidcast/covidcastR/index.html) and processed using [`{epiprocess}`](https://cmu-delphi.github.io/epiprocess/).[^1]
-
-[^1]: Other epidemiological signals for non-Covid related illnesses are available with [`{epidatr}`](https://github.com/cmu-delphi/epidatr) which interfaces directly to Delphi's [Epidata API](https://cmu-delphi.github.io/delphi-epidata/)
-
-```{r epidf, message=FALSE}
-library(tidyverse)
-library(epipredict)
-jhu <- case_death_rate_subset
-jhu
-```
-
-To create and train a simple auto-regressive forecaster to predict the death rate two weeks into the future using past (lagged) deaths and cases, we could use the following function.
-
-```{r make-forecasts, warning=FALSE}
-two_week_ahead <- arx_forecaster(
-  jhu,
-  outcome = "death_rate",
-  predictors = c("case_rate", "death_rate"),
-  args_list = arx_args_list(
-    lags = list(c(0,1,2,3,7,14), c(0,7,14)),
-    ahead = 14
-  )
-)
+```{r}
+library(epidatasets)
+head(cases_deaths_subset)
 ```
 
-In this case, we have used a number of different lags for the case rate, while only using 3 weekly lags for the death rate (as predictors). The result is both a fitted model object which could be used any time in the future to create different forecasts, as well as a set of predicted values (and prediction intervals) for each location 14 days after the last available time value in the data.
-
-```{r print-model}
-two_week_ahead$epi_workflow
-```
-
-The fitted model here involved preprocessing the data to appropriately generate lagged predictors, estimating a linear model with `stats::lm()` and then postprocessing the results to be meaningful for epidemiological tasks. We can also examine the predictions.
-
-```{r show-preds}
-two_week_ahead$predictions
+Compared to
+```{r}
+library(tibble)
+cases_deaths_subset
 ```
 
-The results above show a distributional forecast produced using data through the end of 2021 for the 14th of January 2022. A prediction for the death rate per 100K inhabitants is available for every state (`geo_value`) along with a 90% predictive interval.
-
-<!--
-
-During a quiet period, a user decides they want to first predict whether a surge is about to occur, say using variant information from GISAID. Then for surging locations, they want to train an AR model using past surges in the same location. Everywhere else, they predict a flat line. We should be able to do this in a few lines of code.
-
-Delphi's own forecasts have been produced/evaluated in this way for a while now, but the code base is scattered and evolving. We want to consolidate, generalize, and simplify to allow others to benefit as well.
-
-The basic framework should allow for something like the following. This would
-feel very familiar to anyone working in `R`+`{tidyverse}`.
-
-**Simple linear autoregressive model with scaling (modular)**
-
-```{r ideal-framework, eval=FALSE}
-my_fcaster = new_epi_predictor() %>%
-  add_preprocessor(scaler, var = cases, by = pop) %>%
-  add_preprocessor(lagger, var = dv_cli, lags = c(0, 7, 14)) %>%
-  add_trainer(lm) %>%
-  add_predictor(lm.predict) %>%
-  add_postprocessor(scaler, by = 1/pop)
+Compared to
+```{r, message=FALSE}
+library(epiprocess)
+cases_deaths_subset
 ```
 
-Then you could run this on an `epi_df` with one line.
-
-```{r run-ideal, eval=FALSE}
-my_fcaster(lead(cases, 7) ~ ., epi_df, key_vars, time_vars)
+Note that an `epi_df` comes with metadata (visible in that
+final version) that describes the observation frequency,
+`time_type`, the unit of geographical measurement, `geo_type`,
+and the data vintage, `as_of`. For more on these, see the
+`{epiprocess}` documentation.
+
+For the more visually inclined, that particular data set contains
+reported 7-day averaged cases and deaths per capita for a
+handful of US states.
+
+```{r, echo=FALSE, message=FALSE, dev='svg'}
+library(ggplot2)
+lab = c(case_rate_7d_av = "Weekly-average cases per 100K inhabitants",
+        death_rate_7d_av = "Weekly-average deaths per 100K inhabitants")
+cases_deaths_subset |>
+  dplyr::select(geo_value:death_rate_7d_av) |>
+  tidyr::pivot_longer(case_rate_7d_av:death_rate_7d_av) |>
+  ggplot(aes(time_value, value, colour = geo_value)) +
+  geom_line() +
+  scale_x_date(name = "") +
+  scale_y_continuous(expand = expansion(c(0, 0.05))) +
+  facet_wrap(~ name, scales = "free_y", nrow = 2,
+             labeller = labeller(name = lab)) +
+  theme_bw() +
+  guides(colour = guide_legend(nrow = 1)) +
+  scale_color_brewer(palette = "Set1") +
+  theme(legend.position = "bottom", legend.title = element_blank())
 ```
-
-The hypothetical example of first classifying, then fitting different models would also fit into this framework. And this isn't far from our current production models.
-
-
-
-
-### What this isn't
-
-This is not a framework for SIR models. We intend to create some simple versions, but advanced models---those that use variants, hospitalizations, different types of immunity, age stratification, etc.---cannot be compartmentalized in the same way (though see [pypm](https://pypm.github.io/home/)). These types of models also are better at scenario modeling than short term forecasts unless they are quite complicated.
-
--->
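
The new README states that an `epi_df` is a subclass of `tbl_df`, itself a subclass of `data.frame`, and that neither `{epiprocess}` nor `{tibble}` is needed to inspect the data. That works because S3 dispatch walks the class vector and falls back to base `data.frame` methods. A minimal base-R sketch of that fallback follows. It assumes the class vector is `c("epi_df", "tbl_df", "tbl", "data.frame")` (the README names only three of these), and `toy` and its values are made up for illustration; it is not `cases_deaths_subset`.

```r
# Made-up stand-in for an `epi_df`; not the real `cases_deaths_subset`.
toy <- data.frame(
  geo_value = c("ca", "ca", "ny"),
  time_value = as.Date(c("2020-03-01", "2020-03-02", "2020-03-01")),
  cases = c(6, 4, 10),
  stringsAsFactors = FALSE
)
# Assumed class vector: most specific class first, data.frame last.
class(toy) <- c("epi_df", "tbl_df", "tbl", "data.frame")

# Base R treats it as a data.frame because "data.frame" is in the class vector.
is.data.frame(toy)        # TRUE
inherits(toy, "tbl_df")   # TRUE

# Base helpers work with neither {tibble} nor {epiprocess} attached.
dim(toy)                  # 3 3
unique(toy$geo_value)     # "ca" "ny"

# In a session where neither package's namespace is loaded, there is no
# print method for "epi_df", "tbl_df" or "tbl", so this dispatches to
# print.data.frame().
print(toy)
```

Loading a package later registers its more specific print methods for the same object, which is why the README shows three different printouts of one data set.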

README.md

Lines changed: 104 additions & 0 deletions
@@ -0,0 +1,104 @@
+
+<!-- README.md is generated from README.Rmd. Please edit that file -->
+
+# epidatasets
+
+<!-- badges: start -->
+<!-- badges: end -->
+
+This package contains data sets used to compile vignettes and other
+documentation in Delphi R Packages. The goal is to avoid calls to the
+Delphi Epidata API, and deposit some examples here for easy offline use.
+
+## Installation
+
+You can install the development version of `{epidatasets}` like so:
+
+``` r
+# install.packages("remotes")
+remotes::install_github("cmu-delphi/epidatasets")
+```
+
+## Contents
+
+This package contains a number of different datasets, along with the
+code used to generate them. See the Source Code if you want to examine
+the necessary API calls.
+
+All data included here is in `epi_df` format, which is a subclass of
+`tbl_df`, which is in turn a subclass of `data.frame`. The data will print nicely
+if you load the `{epiprocess}` or `{tibble}` packages, but these are not
+required to access or inspect the data sets. For example,
+
+``` r
+library(epidatasets)
+head(cases_deaths_subset)
+```
+
+#> geo_value time_value case_rate_7d_av death_rate_7d_av cases cases_7d_av
+#> 1 ca 2020-03-01 0.0032659 0.0000000 6 1.285714
+#> 2 ca 2020-03-02 0.0043545 0.0000000 4 1.714286
+#> 3 ca 2020-03-03 0.0061689 0.0000000 6 2.428571
+#> 4 ca 2020-03-04 0.0097976 0.0003629 11 3.857143
+#> 5 ca 2020-03-05 0.0134264 0.0003629 10 5.285714
+#> 6 ca 2020-03-06 0.0199582 0.0003629 18 7.857143
+
+Compared to
+
+``` r
+library(tibble)
+cases_deaths_subset
+```
+
+#> # A tibble: 4,026 × 6
+#> geo_value time_value case_rate_7d_av death_rate_7d_av cases cases_7d_av
+#> * <chr> <date> <dbl> <dbl> <dbl> <dbl>
+#> 1 ca 2020-03-01 0.00327 0 6 1.29
+#> 2 ca 2020-03-02 0.00435 0 4 1.71
+#> 3 ca 2020-03-03 0.00617 0 6 2.43
+#> 4 ca 2020-03-04 0.00980 0.000363 11 3.86
+#> 5 ca 2020-03-05 0.0134 0.000363 10 5.29
+#> 6 ca 2020-03-06 0.0200 0.000363 18 7.86
+#> 7 ca 2020-03-07 0.0294 0.000363 26 11.6
+#> 8 ca 2020-03-08 0.0341 0.000363 19 13.4
+#> 9 ca 2020-03-09 0.0410 0.000726 23 16.1
+#> 10 ca 2020-03-10 0.0468 0.000726 22 18.4
+#> # ℹ 4,016 more rows
+
+Compared to
+
+``` r
+library(epiprocess)
+cases_deaths_subset
+```
+
+#> An `epi_df` object, 4,026 x 6 with metadata:
+#> * geo_type = state
+#> * time_type = day
+#> * as_of = 2023-06-07 16:50:07.8681
+#>
+#> # A tibble: 4,026 × 6
+#> geo_value time_value case_rate_7d_av death_rate_7d_av cases cases_7d_av
+#> * <chr> <date> <dbl> <dbl> <dbl> <dbl>
+#> 1 ca 2020-03-01 0.00327 0 6 1.29
+#> 2 ca 2020-03-02 0.00435 0 4 1.71
+#> 3 ca 2020-03-03 0.00617 0 6 2.43
+#> 4 ca 2020-03-04 0.00980 0.000363 11 3.86
+#> 5 ca 2020-03-05 0.0134 0.000363 10 5.29
+#> 6 ca 2020-03-06 0.0200 0.000363 18 7.86
+#> 7 ca 2020-03-07 0.0294 0.000363 26 11.6
+#> 8 ca 2020-03-08 0.0341 0.000363 19 13.4
+#> 9 ca 2020-03-09 0.0410 0.000726 23 16.1
+#> 10 ca 2020-03-10 0.0468 0.000726 22 18.4
+#> # ℹ 4,016 more rows
+
+Note that an `epi_df` comes with metadata (visible in that final
+version) that describes the observation frequency, `time_type`, the
+unit of geographical measurement, `geo_type`, and the data vintage,
+`as_of`. For more on these, see the `{epiprocess}` documentation.
+
+For the more visually inclined, that particular data set contains
+reported 7-day averaged cases and deaths per capita for a handful of US
+states.
+
+<img src="man/figures/README-unnamed-chunk-5-1.svg" width="100%" />
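
The README describes `epi_df` metadata (`geo_type`, `time_type`, `as_of`) that travels with the data frame itself rather than as extra columns. The sketch below assumes it is stored as a list in an attribute named `metadata`. That matches our understanding of `{epiprocess}` but is not stated in this commit. `toy` reuses the first two `ca` values of `case_rate_7d_av` from the printout, the UTC time zone for `as_of` is an assumption, and `describe_metadata()` is a hypothetical helper, not part of any Delphi package.

```r
# Two rows reusing values printed in the README; everything else is assumed.
toy <- data.frame(
  geo_value = c("ca", "ca"),
  time_value = as.Date(c("2020-03-01", "2020-03-02")),
  case_rate_7d_av = c(0.0032659, 0.0043545),
  stringsAsFactors = FALSE
)
class(toy) <- c("epi_df", "tbl_df", "tbl", "data.frame")

# Assumed layout: a list stored in the "metadata" attribute.
attr(toy, "metadata") <- list(
  geo_type = "state",                                    # unit of geography
  time_type = "day",                                     # observation frequency
  as_of = as.POSIXct("2023-06-07 16:50:07", tz = "UTC")  # data vintage (time zone assumed)
)

# Hypothetical helper: summarise the metadata using base R only.
describe_metadata <- function(x) {
  meta <- attr(x, "metadata")
  if (is.null(meta)) stop("no 'metadata' attribute found")
  sprintf("geo_type=%s, time_type=%s, as_of=%s",
          meta$geo_type, meta$time_type, format(meta$as_of, "%Y-%m-%d"))
}

describe_metadata(toy)
# returns "geo_type=state, time_type=day, as_of=2023-06-07"
```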

_pkgdown.yml

Lines changed: 9 additions & 1 deletion
@@ -2,9 +2,17 @@ url: https://cmu-delphi.github.io/epipredict/
 template:
   bootstrap: 5
   bootswatch: cosmo
+  bslib:
+    font_scale: 1.0
+    primary: "#C41230" # dark red active text
+    link-color: "#C41230" # brighter link color, contrast 4.52
+    # navbar
+    navbar-bg: "#C41230" # match to site bg
+    navbar-fg: "#f8f8f8"
 
 navbar:
-  bg: dark
+  bg: "#C41230"
+  fg: "#f8f8f8"
 
 home:
   links: