Merge pull request #39 from worldbank/fct19-dev

Fct19 dev
worldbank · Jul 25, 2019 · c8e072b
2 parents 955df21 + 5926e6c
commit c8e072b
Showing 18 changed files with 1,946 additions and 591 deletions.
diff --git a/DataWork/DataSets/DG1__DS_DIR_HDR b/DataWork/DataSets/DG1__DS_DIR_HDR
diff --git a/DataWork/DataSets/Final/DG1__DS_DIR_HDR b/DataWork/DataSets/Final/DG1__DS_DIR_HDR
diff --git a/DataWork/DataSets/Final/whr_panel.Rda b/DataWork/DataSets/Final/whr_panel.Rda
diff --git a/DataWork/DataSets/Final/whr_panel.csv b/DataWork/DataSets/Final/whr_panel.csv
diff --git a/DataWork/DataSets/Final/whr_panel.dta b/DataWork/DataSets/Final/whr_panel.dta
diff --git a/DataWork/DataSets/Raw/DG1__DS_DIR_HDR b/DataWork/DataSets/Raw/DG1__DS_DIR_HDR
diff --git a/DataWork/Output/Raw/desc_table.tex b/DataWork/Output/Raw/desc_table.tex
@@ -1,6 +1,6 @@
 
 % Table created by stargazer v.5.2.2 by Marek Hlavac, Harvard University. E-mail: hlavac at fas.harvard.edu
-% Date and time: Tue, Apr 23, 2019 - 9:28:13 AM
+% Date and time: Thu, Jun 13, 2019 - 10:22:57 AM
 \begin{table}[!htbp] \centering 
   \caption{} 
   \label{} 

diff --git a/Presentations/Lab 1 - Intro to R - Part I.Rmd b/Presentations/Lab 1 - Intro to R - Part I.Rmd
@@ -1,7 +1,7 @@
 ---
 title: "Introduction I - R basics"
 subtitle: "R for Stata Users"
-date: "April 18"
+date: "June 2019"
 author: "Luiza Andrade, Leonardo Viotti & Rob Marty "
 output:
   beamer_presentation:
@@ -52,8 +52,24 @@ whr <- read.csv(file.path(finalData,"whr_panel.csv"),
 
 ```
 
+# Installation
+
+## Installation
+
+This training requires that you have R installed on your computer:
+
+### Instructions
+
+ * Please visit (https://cran.r-project.org) and select a Comprehensive R Archive Network (CRAN) mirror close to you.
+
+ * If you're in the US, you can go directly to the mirror at the University of California, Berkeley (https://cran.cnr.berkeley.edu).
+
+ * We also strongly suggest installing RStudio. You can get it at (https://www.rstudio.com/), but you need to install R first.
+
+
 # Introduction
 
+
 ## Introduction
 
 These training sessions will offer a quick introduction to R, its amazing features and why it is so much better than Stata. 
@@ -65,10 +81,10 @@ This first session will present the basic concepts you will need to use R.
 The next sessions will include:
 
  * __Introduction to R part II__
+ * __Data Processing__
  * __Descriptive Analysis__
  * __Data Visualization__
- * __Geospatial__
- * __Data Processing__
+ * __Geospatial__ (hands-on session on Friday)
 
 For the most recent versions of these trainings, visit the R-training GitHub repo at
 https://github.com/worldbank/dime-r-training
@@ -91,7 +107,6 @@ Some advantages of R over Stata:
 Some possible disadvantages of R:
 
   * Higher cost of entry than Stata.
-    + That doesn't mean that the learning curve is steeper all the way up!
   * Stata is more specialized:
     + Certain common tasks are simpler in Stata.
   * Stata has wider adoption among micro-econometricians.
@@ -123,25 +138,12 @@ Python is even more flexible and has more users than R. So, why should I bother
 
   * Despite being super popular for data science, Python has fewer libraries developed for econometrics.
 
-  * Python still cannot do everything Stata does without some trouble, R can.
+  * Python is a bit harder to set up and get started with.
 
-  * R and Python are very similar, specially if your background is in Stata.
-
+  * It can be harder to find help specific to statistics and econometrics, especially for beginners.
 
 # Getting started
 
-## Getting started
-
-This training requires that you have R installed in your computer:
-
-### Installation
-
- * Please visit (https://cran.r-project.org) and select a Comprehensive R Archive Network (CRAN) mirror close to you.
-
- * If you're in the US, you can directly visit the mirror at Berkley university at (https://cran.cnr.berkeley.edu).
-
- * we also strongly suggest installing R studio. You can get it in (https://www.rstudio.com/), but you need to install R first.
-
 
 ## Getting started
 
@@ -228,23 +230,29 @@ Let's start by loading the data set we'll be using:
 
  * If you wish to make any non-permanent changes to your data, you'll need to preserve the original data to keep it intact.
 
- * R works in a completely different way: you can have as many datasets (objects) as you wish (or your computer's memory allows) and operations will only have lasting effects if you store them.
+
+## Data in R
+
+ R works in a completely different way: 
+
+ * You can have as many datasets (objects) as you wish, or as many as your computer's memory allows.
+
+ * Operations will only have lasting effects if you store them.
 
 ## Data in R
 
 * Everything that exists in R's memory -- variables, datasets, functions -- is an object.
 
-* You could think of an object like a chunk of data stored in the memory that has a name by which you call it (exactly like macros in Stata).
+* You can think of an object as a chunk of data with some properties and a name by which you call it.
 
 * If you create an object, it is going to be stored in memory until you delete it or quit R.
 
 * Whenever you run anything you intend to use in the future, you need to store it as an object.
 
 
-
 ## Data in R
 
-To better understand the idea, we're going to use the data from the United Nations' World Happiness Report. First, let's take a look at the data.
+To better understand the idea, we're going to use the data we opened from the United Nations' World Happiness Report. First, let's take a look at the data.
 
 Type the following code to explore the data:
 ```{r, include = T, results = "hide"}
@@ -317,7 +325,7 @@ We can see that nothing happened to the original data. This happens because we d
 x <- 42
 ```
 
-From now on, *x* is associated with the stored value (until you replace it delete it or close R).
+From now on, *x* is associated with the stored value (until you replace it, delete it, or quit the R session).
 
 ## Data in R
 
@@ -380,6 +388,20 @@ You can also see that your environment pane now has two objects:
  3. Print (display) is built into R. If you execute any action without storing it, R will simply print the results of that action but won't save anything in the memory.
 
 
+# Functions
+
+## Quick intro to functions
+
+
+`head()`, `View()`, `subset()` and `read.csv()` are functions!
+
+  * Functions in R take named arguments (unlike in Stata, where you have arguments and options).
+  * Usually the first argument is the object you want to use the function on, e.g. `subset(whr, ...)`
+  * Functions usually return values that you can store in an object, print or use directly as an argument of another function.
+
+We will explore these ideas in depth in the next session.
+
+
 # R objects
 
 ## R objects
@@ -491,7 +513,7 @@ whr[22,"country"] # The same as whr$country[22]
 
 Lists are more complex objects that can contain many objects of different classes and dimensions.
 
-Lists are fancy and can have a lot of functionalities and attributes. They are the output of many functions and are used to construct complex objects.
+The outputs of many functions, a regression for example, are similar to lists.
 
 It would be beyond the scope of this introduction to go deep into them, but here's a quick example
 
@@ -711,8 +733,8 @@ help(summary)
 * Surviving graduate econometrics with R:
 https://thetarzan.wordpress.com/2011/05/24/surviving-graduate-econometrics-with-r-the-basics-1-of-8/
 
-* An Introduction to R at
-https://cran.r-project.org/
+* CRAN's manuals:
+https://cran.r-project.org/manuals.html
 
 * R programming in Coursera:
 https://www.coursera.org/learn/r-programming
@@ -738,7 +760,7 @@ https://www.r-graph-gallery.com/
 
 * R Graphics Cookbook - Winston Chang
 
-* R for Data Science - Hadley Wickha and Garrett Grolemund
+* R for Data Science - Hadley Wickham and Garrett Grolemund
 
 ----
 

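The "Quick intro to functions" slide added to Lab 1 above describes named arguments, the object as first argument, and return values that can be stored, printed or passed on. A minimal sketch of those three points follows; the `toy` data frame and its values are made up for illustration and stand in for the `whr` data set used in the slides.

```r
# Toy data frame standing in for `whr` (hypothetical values)
toy <- data.frame(country = c("A", "B", "C"),
                  happy_score = c(7.1, 5.4, 6.3),
                  stringsAsFactors = FALSE)

# The first argument is the object; the remaining ones are passed by name
high <- subset(toy, subset = happy_score > 6, select = c(country, happy_score))

# Without an assignment, the result is only printed and nothing is stored
head(toy, n = 2)

# A returned value can be used directly as an argument of another function
n_high <- nrow(subset(toy, happy_score > 6))
```

Only `high` and `n_high` remain in the environment afterwards; the output of the bare `head()` call is displayed and then discarded.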
diff --git a/Presentations/Lab 2 - Intro to R - Part II.Rmd b/Presentations/Lab 2 - Intro to R - Part II.Rmd
@@ -1,7 +1,7 @@
 ---
 title: "Intro to R -- Part II"
 subtitle: "R for Stata Users"
-date: "April, 2019"
+date: "June 2019"
 author: "Luiza Andrade, Leonardo Viotti & Rob Marty "
 output:
   beamer_presentation:
@@ -174,14 +174,18 @@ Let's test if that worked:
   dataWorkFolder
 
 ```
-
+ 
 ## Loading a data set from CSV
+\begin{block}{\texttt{read.csv(file, header = FALSE)}}
 
-### ``read.csv(file, header = FALSE)``
+  \begin{itemize}
+    \item \textbf{file:} is the path to the file you want to open, including its name and format (\texttt{.csv}).
+    \item \textbf{header:} if \texttt{TRUE}, will read the first row as variable names.
+    \item \textbf{stringsAsFactors:} logical. See next slide for more.
+
+  \end{itemize}
+\end{block}
 
- * **file**: is the path to the file you want to open, including it's name and format (``.csv``)
- * **header**: if `TRUE`, will read the first row as variable names
- * **stringsAsFactors:** logical. See next slide for more.
 
 ## Loading a data set from CSV
 
@@ -200,7 +204,7 @@ Let's test if that worked:
 
 3. Open the code you just saved.
 
-4. Add a line opening the data set in `PART 5` of your Master script
+4. Add a line opening the data set in your code
 ```{r, eval = F}
 # Load data set
 whr <- read.csv(file.path(finalData,"whr_panel.csv"),
@@ -235,7 +239,7 @@ Use some of the functions listed above to explore the `whr` data set.
 
 \footnotesize
 ```{r, eval = F}
-# View the data set (same as clickin on it in the Environment pane)
+# View the data set (same as clicking on it in the Environment pane)
 View(whr)
 ```
 
@@ -472,10 +476,10 @@ RStudio's default is to print warning messages, but not stop the code at the lin
 
 ## Looping
 
-  * ``sapply(X, FUN, ...)``: applies a function to all elements of a vector or list and returns the result in a vector. Its arguments are
-    * **X:** a matrix (or data frame) the function will be applied to
-    * **FUN:** the function you want to apply
-    * **...:** possible function options
+### ``sapply(X, FUN, ...)``: applies a function to all elements of a vector or list and returns the result in a vector. Its arguments are:
+* **X:** a vector or list (a data frame is treated as a list of columns) the function will be applied to
+* **FUN:** the function you want to apply
+* **...:** possible function options
 
 ## Looping
 
@@ -493,11 +497,11 @@ RStudio's default is to print warning messages, but not stop the code at the lin
 
 A more general version is the `apply` function.
 
-  * ``apply(X, MARGIN, FUN, ...)``: applies a function to all columns or rows of matrix. Its arguments are
-    * **X:** a matrix (or data frame) the function will be applied to
-    * **MARGIN:** 1 to apply the function to all rows or 2 to apply the function to all columns
-    * **FUN:** the function you want to apply
-    * **...:** possible function options
+### ``apply(X, MARGIN, FUN, ...)``: applies a function to all columns or rows of a matrix. Its arguments are:
+* **X:** a matrix (or data frame) the function will be applied to
+* **MARGIN:** 1 to apply the function to all rows or 2 to apply the function to all columns
+* **FUN:** the function you want to apply
+* **...:** possible function options
 
 ## Looping
 

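The "Looping" slides in Lab 2 above describe `sapply` and `apply` only through their argument lists. The following is a small sketch of how the two calls behave, using made-up vectors and matrices rather than the training data; all object names are hypothetical.

```r
# sapply: apply a function to each element of a vector or list
squares <- sapply(1:3, function(i) i^2)

# A data frame is a list of columns, so sapply works column by column
toy <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))  # made-up values
col_means <- sapply(toy, mean)

# apply: MARGIN = 1 works on rows, MARGIN = 2 works on columns
m <- matrix(1:6, nrow = 2)
row_sums <- apply(m, 1, sum)
col_max  <- apply(m, 2, max)

# Extra arguments given in `...` are passed on to FUN
mean_no_na <- sapply(list(c(1, NA, 3)), mean, na.rm = TRUE)
```

The last call also shows the `na.rm` option: without it, the mean of a vector containing `NA` would itself be `NA`.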
diff --git a/Presentations/Lab 6 - Data Processing.Rmd → Presentations/Lab 3 - Data Processing.Rmd b/Presentations/Lab 6 - Data Processing.Rmd → Presentations/Lab 3 - Data Processing.Rmd
@@ -1,7 +1,7 @@
 ---
 title: "Data Processing"
 subtitle: "R for Stata Users"
-date: "May 2019"
+date: "June 2019"
 author: "Luiza Andrade, Leonardo Viotti & Rob Marty"
 output:
   beamer_presentation:
@@ -138,7 +138,7 @@ Here's a how you can do that:
 ### Exercise 1: Load data
 Use the `read.csv` function to load the three `WHR` data sets from `DataWork > DataSets > Raw`. Create an object called `whrYY` with each data set.
 
- * TIP 1: use the ``file.path()`` function and the ``rawData`` object created in the master to simplify the folder path.
+ * TIP 1: use the ``file.path()`` function to simplify the folder path.
  * TIP 2: for this data set, we want to read strings as strings, not factors.
 
 
@@ -443,6 +443,15 @@ whr17$Region[whr17$Country %in% c("Mozambique",
 any(is.na(whr17$Region))
 ```
 
+## Missing values
+
+Unlike Stata, R never\footnote{Of course, there might be an obscure package with a function that does this, or you can write your own function. But base R and all the major packages don't, and we have never come across any function on CRAN that does.} treats missing values as zeros by default in any function.
+
+ * If your vector (column or row in your dataset) has at least one `NA`, functions that compute on its values, such as `mean` or `sum`, will return `NA` by default.
+ * If you wish to treat missing values as zeros or ignore them, you need to do so explicitly.
+ * E.g. `mean(myVector, na.rm = T)` or `rowSums(myDataFrame, na.rm = T)`
+
+
 ## Renaming variables
 
 The second problem we found, of different names for the same variable in different data sets, can be easily fixed with the rename function:
@@ -605,7 +614,7 @@ whr_panel$happy_high <-
 
   * The `tidyverse` function `mutate` makes this process simpler
 
-## Creating variables base on a formula
+## Creating variables based on a formula
 
 ### `mutate(.data, ...)`
 
@@ -615,7 +624,7 @@ Adds new variables and preserves existing
   * **...:**    name-value pairs of expressions. Use NULL to drop a variable
 
 
-## Creating variables base on a formula
+## Creating variables based on a formula
 
 ### Exercise 10: Create a variable based on a formula
 Use the `mutate` function to create a variable called `happy_high` in the `whr_panel` data set indicating whether the `happy_score` is above the median.
@@ -799,7 +808,7 @@ head(happy_long)
 
 ## `melt`: reshape from wide to long
 
-### `melt(data, id.vars, measure.vars)
+### `melt(data, id.vars, measure.vars)`
 
   * **data:** a **data.table** object to melt
   * **id.vars:** a vector of unique IDs in `data`