Merged
51 commits
01b71e3
Change to just file `LICENSE`, add `@source` for CTIS data
brookslogan Jun 14, 2023
4dc2306
describe format for all datasets; unify doc section naming format
nmdefries Jul 6, 2023
0bfcc1e
documentation wording
nmdefries Jul 7, 2023
e8d975a
move data dictionary docs to separate section to ensure heritability
nmdefries Jul 12, 2023
9ea783c
build docs
nmdefries Jul 12, 2023
d98b829
add data contributors as authors
nmdefries Jul 12, 2023
2b00f03
remove dangling partial attribution for CTIS data
nmdefries Aug 1, 2023
403e890
remove copyright holder info for CTIS. Resolve in issue cmu-delphi/ep…
nmdefries Aug 1, 2023
b9cab85
note use in epiprocess vignettes
nmdefries Jul 13, 2023
a66ebd1
build docs
nmdefries May 31, 2024
aec5488
Merge branch 'lcb/activate-roxygen-markdown' into lcb/adjust-attribution
nmdefries May 31, 2024
09abc23
archive, cases_deaths doc wording
nmdefries May 31, 2024
854401b
county and outlier doc line wrapping
nmdefries May 31, 2024
653fd18
add description tag to keep external use in top-level description
nmdefries Aug 30, 2024
4226ecc
outlier dataset uses nj, not ca data
nmdefries Aug 30, 2024
af635dc
add nat as maintainer
nmdefries Sep 4, 2024
d4a1a42
title suggestions
nmdefries Sep 5, 2024
4e8119e
attribute COVID Canada working group
nmdefries Sep 5, 2024
a9ad878
match wording between data attributions
nmdefries Sep 5, 2024
bca6556
build docs with new data titles
nmdefries Sep 5, 2024
89d8bd6
convert archive..._dt to archive
nmdefries Sep 6, 2024
0017d3a
generate all datasets with explicit as_of
nmdefries Sep 6, 2024
20a0142
document all as_ofs
nmdefries Sep 6, 2024
adccc93
check if epiprocess installed on load
nmdefries Sep 9, 2024
aacbe27
remove on-load error
nmdefries Sep 12, 2024
18c7a60
on data load, convert to epiprocess object if epiprocess installed
nmdefries Sep 12, 2024
0cf2d69
describe archive format of can_prov_cases
nmdefries Sep 13, 2024
04339e3
use helper to modify rather than overwrite sysdata.rda
nmdefries Sep 13, 2024
49b622c
warn on data load if epiprocess not installed
nmdefries Sep 13, 2024
046ac65
update links and types in docs
nmdefries Sep 13, 2024
ee36573
document new links and types
nmdefries Sep 13, 2024
1173874
remove discontinued direction field from jhu
nmdefries Sep 13, 2024
bcb3e44
zero-pad state census fips
nmdefries Sep 16, 2024
4d561d8
document JHU missings as "NA"
nmdefries Sep 16, 2024
89d8aa9
Merge branch 'main' into lcb/adjust-attribution
nmdefries Sep 16, 2024
59c4d89
covid_case_death_rates should only import starting from Dec 2020
nmdefries Sep 17, 2024
89de76e
add canadian grad income dataset
nmdefries Sep 18, 2024
d9bc11d
add attribution for doctor-visits
nmdefries Sep 25, 2024
8769da3
double-check attribution and modifications to all datasets
nmdefries Sep 25, 2024
1030ae7
don't wrap href
nmdefries Sep 26, 2024
2cbe3d1
helper avoid error when sysdata.rda is empty
nmdefries Sep 26, 2024
51452a1
use stronger compression on external data files
nmdefries Sep 26, 2024
c43f260
use stronger compression on internal data
nmdefries Sep 26, 2024
9347a04
move tibble to Enhances
nmdefries Oct 1, 2024
68dbeff
_helper.R sysdata env to inherit from empty environment
nmdefries Oct 1, 2024
3a8ab75
spelling in documentation
nmdefries Oct 1, 2024
c55d178
spelling in documentation
nmdefries Oct 1, 2024
5638351
require newer version of epiprocess
nmdefries Oct 1, 2024
e87ec28
rename starter datasets from _dt to _tbl
nmdefries Oct 1, 2024
c6f8fcc
rebuild docs
nmdefries Oct 1, 2024
619a6cc
save datasets with version=2 in `save` for backwards compatibility
nmdefries Oct 1, 2024
20 changes: 15 additions & 5 deletions DESCRIPTION
@@ -2,32 +2,42 @@ Type: Package
Package: epidatasets
Title: Epidemiological Data for Delphi Tooling Examples
Version: 0.0.1
Authors@R:
person(c("Daniel", "J."), "McDonald", , "[email protected]", role = c("cre", "aut"))
Authors@R: c(
person(c("Daniel", "J."), "McDonald", email="[email protected]", role = c("aut")),
person("Nat", "DeFries", email="[email protected]", role = c("cre", "aut")),
person("Johns Hopkins University Center for Systems Science and Engineering", role = "dtc", comment = "Owner of COVID-19 cases and deaths data from the COVID-19 Data Repository"),
person("Johns Hopkins University", role = "cph", comment = "Copyright holder of COVID-19 cases and deaths data from the COVID-19 Data Repository"),
person("Carnegie Mellon University Delphi Group", role = "dtc", comment = "Owner of masking and social-distancing data from the COVID-19 Trends and Impacts Survey. Owner of claims-based CLI data from the Delphi Epidata API"),
person("The COVID-19 Canada Open Data Working Group", role = "dtc", comment = "Owner of Canadian COVID-19 case rates from the Covid19Canada data repository"),
person("Statistics Canada", role = "dtc", comment = "Owner of Canadian graduate employment income data from the Statistics Canada website")
)
Description: This package contains data sets used to compile vignettes and
other documentation in Delphi R Packages. The goal is to avoid calls
to the Delphi Epidata API, and to deposit some examples here for easy
offline use.
License: MIT + file LICENSE
License: file LICENSE
Depends:
R (>= 2.10)
Suggests:
covidcast,
data.table,
dplyr,
epidatr,
epiprocess,
here,
httr,
jsonlite,
lubridate,
magrittr,
purrr,
readr
Enhances:
epiprocess (>= 0.9.0),
tibble
Remotes:
cmu-delphi/epidatr,
cmu-delphi/epiprocess
Encoding: UTF-8
LazyData: true
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.3.1
RoxygenNote: 7.3.2
URL: https://cmu-delphi.github.io/epidatasets/
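Because epiprocess is listed under `Enhances` (>= 0.9.0), installing epidatasets does not install it, so downstream code that wants `epi_df` or `epi_archive` objects may need to check for it at run time. A minimal sketch using only base R and `utils`; the helper name `epiprocess_available()` is hypothetical and is not part of epidatasets:

```r
# Hypothetical helper (not part of epidatasets): TRUE only if epiprocess is
# installed and at least `min_version`, mirroring the `Enhances` constraint.
epiprocess_available <- function(min_version = "0.9.0") {
  # requireNamespace() loads the namespace without attaching it and returns
  # FALSE (instead of erroring) when the package is missing.
  requireNamespace("epiprocess", quietly = TRUE) &&
    utils::packageVersion("epiprocess") >= min_version
}

if (epiprocess_available()) {
  message("epiprocess >= 0.9.0 is available")
} else {
  message("epiprocess >= 0.9.0 is not available")
}
```

The `&&` short-circuits, so `packageVersion()` is only called when the namespace actually loads.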
4 changes: 2 additions & 2 deletions LICENSE
@@ -1,2 +1,2 @@
YEAR: 2023
COPYRIGHT HOLDER: epidatasets authors
This contains a collection of data from different sources under different
licenses; please see the documentation for each object for license information.
21 changes: 0 additions & 21 deletions LICENSE.md

This file was deleted.

177 changes: 150 additions & 27 deletions R/epipredict-data.R
@@ -1,12 +1,15 @@
#' Subset of JHU daily state cases and deaths
#' JHU daily COVID-19 cases and deaths rates from all states
#'
#' This data source of confirmed COVID-19 cases and deaths
#' is based on reports made available by the Center for
#' Systems Science and Engineering at Johns Hopkins University.
#' This example data ranges from Dec 31, 2020 to Dec 31, 2021,
#' and includes all states.
#' This data source of confirmed COVID-19 cases and deaths is based on reports
#' made available by the Center for Systems Science and Engineering at Johns
#' Hopkins University, as downloaded from the CMU Delphi COVIDcast Epidata
#' API. This example data is a snapshot as of March 20, 2024, and
#' ranges from December 31, 2020 to December 31, 2021. It
#' includes all states. It is used in the {epiprocess} correlation vignette.
#'
#' @format A tibble with 20,496 rows and 4 variables:
#' @format An [`epiprocess::epi_df`] (object of class `c("epi_df", "tbl_df", "tbl", "data.frame")`) with 37576 rows and 4 columns.
#' @section Data dictionary:
#' The data has columns:
#' \describe{
#' \item{geo_value}{the geographic value associated with each row
#' of measurements.}
@@ -38,47 +41,104 @@
#'
#' Data set on state populations, from the 2019 US Census.
#'
#' @format Data frame with 57 rows (including one for the United States as a
#' whole, plus the District of Columbia, Puerto Rico Commonwealth,
#' American Samoa, Guam, the U.S. Virgin Islands, and the Northern Mariana,
#' Islands).
#' @format A [`tibble::tibble`] (object of class `c("tbl_df", "tbl", "data.frame")`) with 57 rows and 4 columns.
#' @section Data dictionary:
#' The data includes 57 regions (all US states, the United
#' States as a whole, the District of Columbia, Puerto Rico Commonwealth,
#' American Samoa, Guam, the U.S. Virgin Islands, and the Northern Mariana
#' Islands) with columns:
#'
#' \describe{
#' \item{fips}{FIPS code}
#' \item{fips}{2-digit FIPS code}
#' \item{name}{Full name of the state or territory}
#' \item{pop}{Estimate of the location's resident population in
#' 2019.}
#' \item{abbr}{Postal abbreviation for the location}
#' }
#'
#' @source United States Census Bureau, at
#' @source
#' This object is derived from several datasets from the United States
#' Census Bureau, Population Division, at
#' \url{https://www2.census.gov/programs-surveys/popest/datasets/2010-2019/counties/totals/co-est2019-alldata.pdf},
#' \url{https://www.census.gov/data/tables/time-series/demo/popest/2010s-total-puerto-rico-municipios.html},
#' and \url{https://www.census.gov/data/tables/2010/dec/2010-island-areas.html}
#' and \url{https://www.census.gov/data/tables/2010/dec/2010-island-areas.html}.
#' It is made available through the `covidcast` package. This data is in the
#' public domain.
"state_census"
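Storing the state FIPS code as a 2-digit string keeps the leading zero, so codes such as Alabama's `01` and California's `06` match other character-coded FIPS data exactly. A small base-R sketch of the zero-padding, assuming `fips` is stored as character; the toy table only mimics the documented `fips` and `name` columns and is not taken from `state_census`:

```r
# Numeric FIPS codes lose their leading zero; sprintf() restores a fixed
# 2-digit width.
raw_fips <- c(1, 6, 42, 72)
padded <- sprintf("%02d", as.integer(raw_fips))
print(padded)  # "01" "06" "42" "72"

# Toy table shaped like two of the documented columns.
toy_census <- data.frame(
  fips = padded,
  name = c("Alabama", "California", "Pennsylvania", "Puerto Rico"),
  stringsAsFactors = FALSE
)

# Character keys match exactly, including the leading zero.
toy_census$name[toy_census$fips == "06"]
```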

# Epipredict Vignette Data ----------------------------------------------------

#' CTIS COVID Behaviours
#' Subset of CTIS COVID-19-related behaviours from 5 states
#'
#' Data set for a handful of states on masking and distancing behaviours
#' during the COVID-19 Pandemic and downloaded from the CMU Delphi COVIDcast
#' Epidata API. This data set covers the period from
#' June to December 2021.
#' during the COVID-19 Pandemic, and downloaded from the CMU Delphi COVIDcast
#' Epidata API. This example data is a snapshot as of March 20, 2024, and
#' ranges from June 4, 2021 to December 31, 2021.
#' It is limited to California, Florida, Texas, New Jersey, and New York.
#'
#' @format A [`tibble::tibble`] (object of class `c("tbl_df", "tbl", "data.frame")`) with 1055 rows and 4 columns.
#' @section Data dictionary:
#' The data has columns:
#' \describe{
#' \item{geo_value}{the geographic value associated with each row
#' of measurements.}
#' \item{time_value}{the time value associated with each row of measurements.}
#' \item{masking}{Estimated percentage of people who wore a mask for most or all of the time while in public in the past 7 days; those not in public in the past 7 days are not counted.}
#' \item{distancing}{Estimated percentage of respondents who reported that all or most people they encountered in public in the past 7 days maintained a distance of at least 6 feet. Respondents who said that they have not been in public for the past 7 days are excluded.}
#' }
#'
#' @source
#' This object contains a modified part of the
#' \href{https://cmu-delphi.github.io/delphi-epidata/symptom-survey/#covid-19-trends-and-impact-survey}{data
#' aggregations in the API} that are prepared from the
#' \href{https://www.pnas.org/doi/full/10.1073/pnas.2111454118}{COVID-19
#' Trends and Impact Survey}; see the first link for more information on
#' citing in publications.
#' The data is made available via the
#' \href{https://cmu-delphi.github.io/delphi-epidata/}{Delphi Epidata API}.
#'
#' These aggregations are licensed under the terms of
#' the \href{https://creativecommons.org/licenses/by/4.0/}{Creative Commons
#' Attribution license}.
#'
#' Modifications:
#' * The data has been limited to a very small number of rows, the
#' signal names slightly altered, and formatted into an `epi_df`.
"ctis_covid_behaviours"

#' COVID-19 Incident Cases and Deaths
#' Subset of COVID-19 incident cases and deaths from 5 states
#'
#' Data set for 5 states containing COVID-19 Incident Cases and Deaths as
#' reported
#' by JHU-CSSE and downloaded from the CMU Delphi COVIDcast Epidata API.
#' This data set covers the period from June 2021 to December 2021, and is
#' used in the epipredict Vignette on ... .
#' reported by JHU-CSSE and downloaded from the CMU Delphi COVIDcast Epidata
#' API. This example data is a snapshot as of March 20, 2024, and
#' ranges from June 4, 2021 to December 31, 2021. It
#' is limited to California, Florida, Texas, New Jersey, and New York.
#'
#' @source This object contains a modified part of the \href{https://github.com/CSSEGISandData/COVID-19}{COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University} as \href{https://cmu-delphi.github.io/delphi-epidata/api/covidcast-signals/jhu-csse.html}{republished in the COVIDcast Epidata API}. This data set is licensed under the terms of the
#' @format An [`epiprocess::epi_df`] (object of class `c("epi_df", "tbl_df", "tbl", "data.frame")`) with 1055 rows and 4 columns.
#' @section Data dictionary:
#' The data has columns:
#' \describe{
#' \item{geo_value}{the geographic value associated with each row
#' of measurements.}
#' \item{time_value}{the time value associated with each row of measurements.}
#' \item{cases}{Number of new confirmed COVID-19 cases, daily}
#' \item{deaths}{Number of new confirmed COVID-19 deaths, daily}
#' }
#'
#' @source This object contains a modified part of the \href{https://github.com/CSSEGISandData/COVID-19}{COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University}
#' as \href{https://cmu-delphi.github.io/delphi-epidata/api/covidcast-signals/jhu-csse.html}{republished in the COVIDcast Epidata API}.
#' This data set is licensed under the terms of the
#' \href{https://creativecommons.org/licenses/by/4.0/}{Creative Commons Attribution 4.0 International license}
#' by the Johns Hopkins University on behalf of its Center for Systems Science in Engineering.
#' Copyright Johns Hopkins University 2020.
#'
#' Modifications:
#' * \href{https://cmu-delphi.github.io/delphi-epidata/api/covidcast-signals/jhu-csse.html}{From the COVIDcast Epidata API}:
#' The signals are taken directly from the JHU CSSE
#' \href{https://github.com/CSSEGISandData/COVID-19}{COVID-19 GitHub repository}
#' without changes.
#' * Furthermore, the data has been limited to a very small number of rows, the
#' signal names slightly altered, and formatted into an `epi_df`.
"counts_subset"

#' Canadian COVID-19 case rates
@@ -93,13 +153,76 @@
#' \href{https://github.com/ccodwg/CovidTimelineCanada}{ccodwg/CovidTimelineCanada GitHub repository},
#' which also reports vaccine-related signals.
#'
#' This dataset contains versioned data covering the period from April 2020 to
#' December 2021 and is used in the epipredict slide vignette.
#' This dataset contains versioned data snapshots from February 1, 2021 to
#' December 1, 2021, covering the period from April 2, 2020 to December 1, 2021.
#' It is used in the epipredict slide vignette.
#'
#' @format An [`epiprocess::epi_archive`]. The DT attribute contains the data formatted as a [`data.table::data.table`] (object of class `c("data.table", "data.frame")`) with 65299 rows and 4 columns.
#' @section Data dictionary:
#' The data in the `epi_archive$DT` attribute has columns:
#' \describe{
#' \item{version}{the time value specifying the version for each row of measurements.}
#' \item{geo_value}{the province or territory associated with each row of measurements.}
#' \item{time_value}{the time value associated with each row of measurements.}
#' \item{case_rate}{number of new confirmed cases due to COVID-19 per 100,000 population, daily}
#' }
#' @source This object contains a modified part of the COVID-19 Canada Open
#' Data Working Group's
#' \href{https://github.com/ccodwg/Covid19Canada}{Covid19Canada data repository} (archived).
#' This data set is licensed under the terms of the
#' \href{https://creativecommons.org/licenses/by/4.0/}{Creative Commons Attribution 4.0 International license}
#' by the COVID-19 Canada Open Data Working Group.
#' by the COVID-19 Canada Open Data Working Group. The COVID-19 Canada Open
#' Data Working Group collected the data from publicly available sources such
#' as government datasets and news releases.
#'
#' Modifications:
#' * The case rate signal is calculated using the case count taken directly from the CCODWG
#' \href{https://github.com/ccodwg/Covid19Canada}{ccodwg/Covid19Canada GitHub repository}
#' and population data.
#' * Furthermore, the data has been limited to a very small number of rows, the
#' signal names slightly altered, some province names replaced with abbreviations, and
#' formatted into an `epi_archive`.
#'
#' The population data used (but not included in the dataset itself) is from the
#' \href{https://github.com/mountainMath/BCCovidSnippets/}{mountainMath/BCCovidSnippets GitHub repository}.
"can_prov_cases"
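The `epi_archive$DT` table holds rows keyed by `version`, `geo_value`, and `time_value`, so a snapshot "as of" a given date can be read by keeping, for each location and date, the latest row whose version is on or before that date. The sketch below illustrates that idea, plus the per-100,000 rate calculation mentioned under Modifications, in base R on made-up values. It is not epiprocess's implementation (epiprocess provides `epix_as_of()` for real archives), and `snapshot_as_of()` is a hypothetical helper:

```r
# Made-up versioned rows shaped like `can_prov_cases$DT`
# (version, geo_value, time_value, case_rate).
dt <- data.frame(
  version    = as.Date(c("2021-02-01", "2021-02-08", "2021-02-01")),
  geo_value  = c("on", "on", "bc"),
  time_value = as.Date(c("2021-01-31", "2021-01-31", "2021-01-31")),
  case_rate  = c(10.0, 12.5, 7.0),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for each (geo_value, time_value), keep the row with
# the latest version on or before `as_of`.
snapshot_as_of <- function(dt, as_of) {
  known <- dt[dt$version <= as_of, , drop = FALSE]
  # Sort so the latest version of each key comes last within its group.
  known <- known[order(known$geo_value, known$time_value, known$version), , drop = FALSE]
  key <- paste(known$geo_value, known$time_value)
  out <- known[!duplicated(key, fromLast = TRUE), c("geo_value", "time_value", "case_rate")]
  rownames(out) <- NULL
  out
}

# As of 2021-02-01 only the first report for "on" is visible (10);
# by 2021-02-08 its revised value (12.5) has replaced it.
snapshot_as_of(dt, as.Date("2021-02-01"))
snapshot_as_of(dt, as.Date("2021-02-08"))

# Rate per 100,000 population from a count and a population
# (both numbers invented): 250 / 2,000,000 * 100,000 = 12.5
cases <- 250
population <- 2e6
case_rate <- cases / population * 1e5
```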

#' Subset of Statistics Canada median employment income for postsecondary graduates
#'
#' Data set for all territories (aggregated) and all 10 provinces containing
#' yearly income data for postsecondary graduates as reported by Statistics
#' Canada, downloaded from the Statistics Canada website at
#' www.statcan.gc.ca. This example data is a snapshot as of September 18,
#' 2024, and ranges from 2010 to 2017 (yearly).
#'
#' @format An [`epiprocess::epi_df`] (object of class `c("epi_df", "tbl_df", "tbl", "data.frame")`) with 10193 rows and 8 columns.
#' @section Data dictionary:
#' The data has columns:
#' \describe{
#' \item{geo_value}{The province in Canada, or the aggregate of all
#' territories, associated with each row of measurements.}
#' \item{time_value}{The time value, a year integer in YYYY format}
#' \item{edu_qual}{The education qualification}
#' \item{fos}{The field of study}
#' \item{age_group}{The age group; either 15 to 34 or 35 to 64}
#' \item{num_graduates}{The number of graduates for the given row of characteristics}
#' \item{med_income_2y}{The median employment income two years after graduation}
#' \item{med_income_5y}{The median employment income five years after graduation}
#' }
#' @source This object contains modified data adapted from
#' Statistics Canada, \href{https://www150.statcan.gc.ca/t1/tbl1/en/tv.action?pid=3710011501}{
#' Table 37-10-0115-01 Characteristics and median employment income of
#' longitudinal cohorts of postsecondary graduates two and five years after
#' graduation, by educational qualification and field of study
#' (primary groupings)}. This does not constitute an endorsement by Statistics Canada of this product.
#'
#' The data is licensed under the terms of the
#' \href{https://www.statcan.gc.ca/en/reference/licence}{Statistics Canada Open License}.
#'
#' Modifications:
#' * Only provincial and territorial regions are kept.
#' * Only age group, field of study, and educational qualification are kept as
#' covariates. For the remaining covariates, we keep aggregated values and
#' drop the level-specific rows.
#' * No modifications were made to the time range of the data.
"grad_employ_subset"