initial commit

cboettig · cboettig · commit 0919a964417c · 2017-11-13T20:26:48.000-08:00
diff --git a/.gitignore b/.gitignore
@@ -0,0 +1,5 @@
+.Rproj.user
+.Rhistory
+.RData
+.Ruserdata
+.Rbuildignore
diff --git a/.travis.yml b/.travis.yml
@@ -0,0 +1,10 @@
+# R for travis: see documentation at https://docs.travis-ci.com/user/languages/r
+
+language: R
+sudo: false
+cache: packages
+script: 
+#  - R -e "devtools::install(dep=TRUE)"
+  - R -e 'lapply(list.files(pattern=".*.Rmd", recursive=TRUE), rmarkdown::render)'
+
+
diff --git a/DESCRIPTION b/DESCRIPTION
@@ -0,0 +1,3 @@
+Package: compendium
+Version: 0.1.0
+Depends: rmarkdown, tidyverse, httr, jsonlite 
diff --git a/README.md b/README.md
@@ -0,0 +1,34 @@
+
+*add travis-ci badge here*
+
+## Team Members:
+
+- full name, github handle
+- full name, github handle
+
+This repository is a template for your team's repository.
+
+## assignment
+
+All work for this assignment should be in the `assignment` directory.  You will work in the `.Rmd` notebook, and commit your rendered output files (`.md` and associated files) in the `assignment` directory as well.
+
+## Special files
+
+All team repositories will also include most of the special files found here:
+
+### Common files
+
+- `README.md` this file, a general overview of the repository in markdown format.  
+- `.gitignore` Optional file, ignore common file types we don't want to accidentally commit to GitHub. Most projects should use this. 
+- `<REPO-NAME>.Rproj` Optional, an R-Project file created by RStudio for it's own configuration.  Some people prefer to `.gitignore` this file.
+
+
+### Infrastructure for Testing
+
+- `.travis.yml`: A configuration file for automatically running [continuous integration](https://travis-ci.com) checks to verify reproducibility of all `.Rmd` notebooks in the repo.  If all `.Rmd` notebooks can render successfully, the "Build Status" badge above will be green (`build success`), otherwise it will be red (`build failure`).  
+- `DESCRIPTION` a metadata file for the repository, based on the R package standard. It's main purpose here is as a place to list any additional R packages/libraries needed for any of the `.Rmd` files to run.
+- `tests/render_rmds.R` an R script that is run to execute the above described tests, rendering all `.Rmd` notebooks. 
+
+
+
+
diff --git a/assignment/extinction-assignment.Rmd b/assignment/extinction-assignment.Rmd
@@ -0,0 +1,182 @@
+---
+title: "Extinctions Unit"
+author: "Your name, partner name"
+maketitle: true
+output: github_document
+---
+
+
+
+```{r include=FALSE}
+library("tidyverse")
+library("httr")
+library("jsonlite")
+#library("printr")
+knitr::opts_chunk$set(comment=NA)
+```
+
+
+
+## Extinctions Module
+
+_Are we experiencing the sixth great extinction?_  
+
+What is the current pace of extinction? Is it accelerating? How does it compare to background extinction rates?
+
+## Background
+
+- [Section Intro Video](https://youtu.be/QsH6ytm89GI)
+- [Ceballos et al (2015)](http://doi.org/10.1126/sciadv.1400253)
+
+## Computational Topics
+
+- Accessing data from a RESTful API
+- Error handling
+- JSON data format
+- Regular expressions
+- Working with missing values
+
+## Additional references:
+
+- http://www.hhmi.org/biointeractive/biodiversity-age-humans (Video)
+- [Barnosky et al. (2011)](http://doi.org/10.1038/nature09678)
+- [Pimm et al (2014)](http://doi.org/10.1126/science.1246752)
+- [Sandom et al (2014)](http://dx.doi.org/10.1098/rspb.2013.3254)
+
+
+# Getting started (based on live code session)
+
+
+
+
+## CURL and REST
+
+Use the `httr` package to make a single API query against the following endpoint: `http://api.iucnredlist.org/index/species/Acaena-exigua.json`
+
+```{r}
+
+```
+
+Examine the response and the content of the response.  Can you tell if the call was successful? What was the return type object? Can you parse the return into an R object? Can you represent the return data as a data.frame?
+
+```{r}
+```
+
+
+
+# Working with Regular Expressions
+
+- [Self-guided Tutorial](http://regexone.com/)
+- [Cheetsheet](http://www.cheatography.com/davechild/cheat-sheets/regular-expressions/)
+- [stringr RegEx Vignette](https://cran.r-project.org/web/packages/stringr/vignettes/regular-expressions.html)
+
+
+One of the entries in the response contains a field that may contain some information on when the species went extinct.  Identify the appropriate column and extract this information using a *regular expression*, as discussed in the live code exercise.  
+
+
+```{r}
+```
+
+
+
+
+# Calculating Extinction Rates: Putting it all together
+
+First, to know what queries to make to the IUCN REST API, we need a list of extinct species names.  This information can be downloaded from the IUCN website, but unfortunately this is not easily automated.  Thus we'll download the data file using a copy already prepared for the course: 
+
+
+```{r}
+extinct = read_csv("http://berkeley.carlboettiger.info/espm-88b/extinction/data/extinct.csv")
+extinct
+```
+
+
+Write a function to extract the rationale for the extinction for all extinct species in the data set (see above file)
+
+```{r}
+
+```
+
+Test your function on a subset of the data before attempting the full data set.  Use our `dplyr` pipe syntax to iterate over your function.  
+
+
+
+```{r}
+
+
+```
+
+Now create a function that can extract the date from the rationale, and include this function in your data analysis pipeline. 
+
+
+```{r}
+
+```
+
+
+
+
+## Histogram of Extinction Dates
+
+We can get a sense for the tempo of extinctions by plotting extinctions since 1500 in 25-year interval bins.  
+
+```{r}
+```
+
+# Exercises
+
+
+# Question 1: Extinctions by group
+
+A. Compute the number of extinctions from 1500 - 1900 and from 1900 to present of each of the following taxonomic groups: 
+
+  - Vertebrates
+  - Mammals
+  - Birds
+  - Fish
+  - Amphibians
+  - Reptiles
+  - Insects
+  - Plants
+
+Compare your estimates to Table 1 of [Ceballos et al (2015)](http://doi.org/10.1126/sciadv.1400253).  
+
+
+## Question 2: Weighing by number of species
+
+
+The number of species going extinct per century in a given taxonomic group will be influenced by how many species are present in the group to begin with. (For an obvious example, the number of vertebrate extinctions is always going to be higher than the number of mammal extinctions, since mammals are vertebrates).  Overall, these numbers do not change greatly over a period of a few hundred years, so we were able to make the relative comparisons between the roughly pre-industrial and post-industrial periods above.  
+
+As discussed by Tony Barnosky in the introductory video (or in [Ceballos et al (2015)](http://doi.org/10.1126/sciadv.1400253) paper), if we want to compare these extinction rates against the long-term palentological record, it is necessary to weigh the rates by the total number of species. That is, to compute the number of extinctions per million species per year (MSY; equivalently, the number extinctions per 10,000 species per 100 years).  
+
+A) First, we will compute how many species are present in each of the taxonomic groups.  To do so, we need a table that has not only extinct species, but all assessed species.  We will once again query this information from the IUCN API.
+
+
+This is going to involve a lot of data -- more than the API can serve in a single chunk.  Instead, the API breaks the returns up into groups of 10,000 species per page (see API docs: http://apiv3.iucnredlist.org/api/v3/docs#species).  Luckily, the API also tells us the total number of species:
+
+http://apiv3.iucnredlist.org/api/v3/speciescount?token=9bb4facb6d23f48efbf424bb05c0c1ef1cf6f468393bc745d42179ac4aca5fee
+
+The code below queries the first page.  How many pages will we need to get all the data?  Modify the example below to collect all of the data into a single DataFrame.  Note the use of `append` to add data to an existing data.frame with matching column labels.  
+
+
+```{r}
+
+```
+
+
+B) Based on the complete data, write queries that count the number of species in each group.  Then use these numbers to compute MSY, the number extinctions per 10,000 species per 100 years, for each of the groups listed in Question 1.  How do your estimates compare to the overall historical average of about 2 MSY?
+
+## Question 3: Improving our algorithm
+
+
+In parsing the data with regular expressions, we encountered certain data that resulted in missing values.  Identify and investigate the strings for which we were not able to extract a date value.  
+
+- Why did the date extraction fail?  
+- Can you deduce an approximate date by examining the text? 
+- Can you modify the regular expression to reduce the number of missing values?  
+- How do these missing values impact our overall estimate of the extinction rate? (In which direction, and by approximately what amount?)
+
+
+## Question 4: Looking forward (bonus)
+
+Plot the MSY rates in intervals of 50 years for each of the groups as a line plot (compare to Figure 1a of [Ceballos et al (2015)](http://doi.org/10.1126/sciadv.1400253) paper).  Compute the slope of these curves to forecast the extinction rate in 2100.  
diff --git a/extinction-template.Rproj b/extinction-template.Rproj
@@ -0,0 +1,17 @@
+Version: 1.0
+
+RestoreWorkspace: Default
+SaveWorkspace: Default
+AlwaysSaveHistory: Default
+
+EnableCodeIndexing: Yes
+UseSpacesForTab: Yes
+NumSpacesForTab: 2
+Encoding: UTF-8
+
+RnwWeave: knitr
+LaTeX: pdfLaTeX
+
+BuildType: Package
+PackageUseDevtools: Yes
+PackageInstallArgs: --no-multiarch --with-keep.source

Original file line number	Diff line number	Diff line change
`@@ -0,0 +1,3 @@`
	`1`	`+Package: compendium`
	`2`	`+Version: 0.1.0`
	`3`	`+Depends: rmarkdown, tidyverse, httr, jsonlite`