midas-network/tycho_national_incidence

Tycho Data - Level 1 National Incidence Rate

This repository contains data and code to reproduce national incidence rate results as in:

Panhuis, Willem G. van, John Grefenstette, Su Yon Jung, Nian Shong Chok, Anne Cross, Heather Eng, Bruce Y. Lee, et al. 2013. “Contagious Diseases in the United States from 1888 to the Present.” New England Journal of Medicine 369 (22): 2152–58. https://doi.org/10.1056/NEJMms1215400.

Infectious Disease - Data

Tycho Version 1 - Level 1

Data are downloadable from Zenodo

Version 1.0.0 of level 1 data includes counts at the state level for smallpox, polio, measles, mumps, rubella, hepatitis A, and whooping cough and at the city level for diphtheria. The time period covered varies by disease and falls between 1916 and 2011. This version includes cases as well as incidence rates per 100,000 population based on historical population estimates. These data have been used by investigators at the University of Pittsburgh to estimate the impact of vaccination programs in the United States, published in the New England Journal of Medicine.

Raw data at the city/state level are mirrored at raw/tycho_level1/ProjectTycho_Level1_v1.0.0.csv.

Processed data containing national rates can be found at process_data/national_incidence_rate_lvl1.csv. These rates were calculated by summing the case counts of all states or cities per epi week and disease, and then calculating the incidence rate per 100,000: the number of cases divided by the population size, multiplied by 100,000.
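The calculation described above can be sketched in base R as follows. This is a minimal illustration on toy data, not the code in the src directory: the data frame name, the column names (disease, epi_week, state, cases), the case counts, and the population figure are all assumptions made for the example.

```r
# Toy weekly case counts for two states (hypothetical values and column names).
weekly <- data.frame(
  disease  = c("MEASLES", "MEASLES", "MEASLES", "MEASLES"),
  epi_week = c(192801, 192801, 192802, 192802),
  state    = c("PA", "NY", "PA", "NY"),
  cases    = c(120, 80, 90, 60)
)

# Hypothetical national population for the year covering those weeks.
national_population <- 2000000

# Step 1: sum the counts of all locations per disease and epi week.
national <- aggregate(cases ~ disease + epi_week, data = weekly, FUN = sum)

# Step 2: incidence rate per 100,000 = cases / population * 100,000.
national$incidence_rate <- national$cases / national_population * 100000

print(national)
```

With these toy numbers, the first week sums to 200 cases, which gives 200 / 2,000,000 * 100,000 = 10 cases per 100,000.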

Additional Information

Measles

Polio

Census

Total population estimates per state and at the national level, per year, from census.gov.

Age Group

Additional age group information is available at the national level (and at the state level for specific years) on the census.gov website.

Data Processing Pipeline

Scripts in the src directory pull data from Zenodo and census.gov and prepare them for processing. These data are also mirrored in the raw directory.

Output

Data

The output of the data processing pipeline is available in the process_data directory.

The directory contains 4 files:

  • census.csv: population size per year and location (state, national) from 1900 to 2023.
  • national_incidence_rate_lvl1.csv: case count ("value") and incidence rate per 100,000 ("incidence_rate") per disease and per week ("date"). For each disease, the vaccine introduction year (extracted from the NEJM article) is available and set to the first day of the year for visualization purposes ("vaccine").
  • vaccine_year_introduction.csv: year of the vaccine introduction for each disease of interest (extracted from the NEJM article).
  • sysdata.rda: internal location dictionary to help with standardizing location name in between different sources.
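As a starting point for working with the national file, the sketch below reads it in base R and parses the two date columns. The column names "date", "value", "incidence_rate" and "vaccine" come from the description above; the ISO date format (YYYY-MM-DD), the helper name load_national_incidence, and the post_vaccine column are assumptions made for illustration.

```r
# Hypothetical helper: read the processed file and parse its date columns.
# Assumes dates are stored as YYYY-MM-DD; unparsable values become NA.
load_national_incidence <- function(path) {
  dat <- read.csv(path, stringsAsFactors = FALSE)
  dat$date    <- as.Date(as.character(dat$date), format = "%Y-%m-%d")
  dat$vaccine <- as.Date(as.character(dat$vaccine), format = "%Y-%m-%d")
  dat
}

path <- "process_data/national_incidence_rate_lvl1.csv"
if (file.exists(path)) {
  nat <- load_national_incidence(path)
  # Flag the weeks on or after the vaccine introduction date.
  nat$post_vaccine <- nat$date >= nat$vaccine
  print(head(nat))
}
```

The file.exists() guard lets the snippet run from any working directory; run it from the repository root to load the actual data.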

Visualization

The visualization folder contains multiple output files and the code, in a quarto file, to generate them. Every quarto file has at least one output format. To generate all the available formats, use the quarto::quarto_render() function.

For example:

quarto::quarto_render("visualization/nat_incidence_level1.qmd", 
                      output_format = "all")
