Commit

Merge pull request #39 from worldbank/fct19-dev
Fct19 dev
luizaandrade authored Jul 25, 2019
2 parents 955df21 + 5926e6c commit c8e072b
Showing 18 changed files with 1,946 additions and 591 deletions.
Binary file removed DataWork/DataSets/DG1__DS_DIR_HDR
Binary file not shown.
Binary file removed DataWork/DataSets/Final/DG1__DS_DIR_HDR
Binary file not shown.
Binary file modified DataWork/DataSets/Final/whr_panel.Rda
Binary file not shown.
941 changes: 471 additions & 470 deletions DataWork/DataSets/Final/whr_panel.csv

Large diffs are not rendered by default.

Binary file modified DataWork/DataSets/Final/whr_panel.dta
Binary file not shown.
Binary file removed DataWork/DataSets/Raw/DG1__DS_DIR_HDR
Binary file not shown.
2 changes: 1 addition & 1 deletion DataWork/Output/Raw/desc_table.tex
@@ -1,6 +1,6 @@

% Table created by stargazer v.5.2.2 by Marek Hlavac, Harvard University. E-mail: hlavac at fas.harvard.edu
% Date and time: Tue, Apr 23, 2019 - 9:28:13 AM
% Date and time: Thu, Jun 13, 2019 - 10:22:57 AM
\begin{table}[!htbp] \centering
\caption{}
\label{}
78 changes: 50 additions & 28 deletions Presentations/Lab 1 - Intro to R - Part I.Rmd
@@ -1,7 +1,7 @@
---
title: "Introduction I - R basics"
subtitle: "R for Stata Users"
date: "April 18"
date: "June 2019"
author: "Luiza Andrade, Leonardo Viotti & Rob Marty "
output:
beamer_presentation:
@@ -52,8 +52,24 @@ whr <- read.csv(file.path(finalData,"whr_panel.csv"),
```

# Installation

## Installation

This training requires that you have R installed on your computer:

### Instructions

* Please visit (https://cran.r-project.org) and select a Comprehensive R Archive Network (CRAN) mirror close to you.

* If you're in the US, you can go directly to the mirror at the University of California, Berkeley (https://cran.cnr.berkeley.edu).

* We also strongly suggest installing RStudio. You can get it at (https://www.rstudio.com/), but you need to install R first.


# Introduction


## Introduction

These training sessions will offer a quick introduction to R, its amazing features and why it is so much better than Stata.
@@ -65,10 +81,10 @@ This first session will present the basic concepts you will need to use R.
The next sessions will include:

* __Introduction to R part II__
* __Data Processing__
* __Descriptive Analysis__
* __Data Visualization__
* __Geospatial__
* __Data Processing__
* __Geospatial__ (Hands-on session on Friday)

For the most recent versions of these trainings, visit the R-training GitHub repo at
https://github.com/worldbank/dime-r-training
@@ -91,7 +107,6 @@ Some advantages of R over Stata:
Some possible disadvantages of R:

* Higher cost of entry than Stata.
+ That doesn't mean that the learning curve is steeper all the way up!
* Stata is more specialized:
+ Certain common tasks are simpler in Stata.
* Stata has wider adoption among micro-econometricians.
@@ -123,25 +138,12 @@ Python is even more flexible and has more users than R. So, why should I bother

* Despite being super popular for data science, Python has fewer libraries developed for econometrics.

* Python still cannot do everything Stata does without some trouble, R can.
* Python is a bit harder to set up and get started.

* R and Python are very similar, especially if your background is in Stata.

* It can be harder to find help specific to statistics and econometrics, especially for beginners.

# Getting started

## Getting started

This training requires that you have R installed on your computer:

### Installation

* Please visit (https://cran.r-project.org) and select a Comprehensive R Archive Network (CRAN) mirror close to you.

* If you're in the US, you can go directly to the mirror at the University of California, Berkeley (https://cran.cnr.berkeley.edu).

* We also strongly suggest installing RStudio. You can get it at (https://www.rstudio.com/), but you need to install R first.


## Getting started

@@ -228,23 +230,29 @@ Let's start by loading the data set we'll be using:

* If you wish to make any non-permanent changes to your data, you'll need to preserve the original data to keep it intact.

* R works in a completely different way: you can have as many datasets (objects) as you wish (or your computer's memory allows) and operations will only have lasting effects if you store them.

## Data in R

R works in a completely different way:

* You can have as many datasets (objects) as you wish or your computer's memory allows.

* Operations will only have lasting effects if you store them.

## Data in R

* Everything that exists in R's memory -- variables, datasets, functions -- is an object.

* You could think of an object like a chunk of data stored in the memory that has a name by which you call it (exactly like macros in Stata).
* You could think of an object like a chunk of data with some properties that has a name by which you call it.

* If you create an object, it is going to be stored in memory until you delete it or quit R.

* Whenever you run anything you intend to use in the future, you need to store it as an object.
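
The points above can be illustrated with a small sketch. The data frame `df`, its columns, and the object `df_high` are made up for this example and are not part of the training data:

```r
# A made-up data frame, used only for illustration
df <- data.frame(country = c("A", "B", "C"),
                 score   = c(7.5, 6.1, 4.9))

# Running an operation without storing it only prints the result
subset(df, score > 5)

# df itself is unchanged: it still has all of its rows
nrow(df)

# To keep the result, store it as a new object
df_high <- subset(df, score > 5)
nrow(df_high)

# Objects stay in memory until you remove them or quit R
rm(df_high)
still_there <- exists("df_high")
still_there
```

Here `subset()` never modifies `df`; the filtered rows only persist once they are assigned to a name.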



## Data in R

To better understand the idea, we're going to use the data from the United Nations' World Happiness Report. First, let's take a look at the data.
To better understand the idea, we're going to use the data we opened from the United Nations' World Happiness Report. First, let's take a look at the data.

Type the following code to explore the data:
```{r, include = T, results = "hide"}
@@ -317,7 +325,7 @@ We can see that nothing happened to the original data. This happens because we d
x <- 42
```

From now on, *x* is associated with the stored value (until you replace it delete it or close R).
From now on, *x* is associated with the stored value (until you replace it, delete it, or quit the R session).

## Data in R

@@ -380,6 +388,20 @@ You can also see that your environment pane now has two objects:
3. Print (display) is built into R. If you execute any action without storing it, R will simply print the results of that action but won't save anything in the memory.


# Functions

## Quick intro to functions


`head()`, `View()`, `subset()` and `read.csv()` are functions!

* Functions in R take named arguments (unlike in Stata, where you have arguments and options).
* Usually the first argument is the object you want to use the function on, e.g. `subset(whr, ...)`.
* Functions usually return values that you can store in an object, print, or use directly as an argument of another function.

We will explore these ideas in depth in the next session.
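
As a rough sketch of these three points, using only base R functions (the vector `scores` is made up for illustration):

```r
# A small numeric vector, made up for this example
scores <- c(7.5, 6.1, 4.9)

# Arguments can be passed by name; the first one is usually the object
avg <- mean(x = scores)

# The returned value can be stored (as above), printed...
print(avg)

# ...or used directly as the argument of another function
rounded <- round(mean(scores), digits = 1)
rounded
```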


# R objects

## R objects
@@ -491,7 +513,7 @@ whr[22,"country"] # The same as whr$country[22]

Lists are more complex objects that can contain many objects of different classes and dimensions.

Lists are fancy and can have a lot of functionalities and attributes. They are the output of many functions and are used to construct complex objects.
The outputs of many functions, a regression for example, are similar to lists.

It would be beyond the scope of this introduction to go deep into them, but here's a quick example

@@ -711,8 +733,8 @@ help(summary)
* Surviving graduate econometrics with R:
https://thetarzan.wordpress.com/2011/05/24/surviving-graduate-econometrics-with-r-the-basics-1-of-8/

* An Introduction to R at
https://cran.r-project.org/
* CRAN's manuals:
https://cran.r-project.org/manuals.html

* R programming in Coursera:
https://www.coursera.org/learn/r-programming
@@ -738,7 +760,7 @@ https://www.r-graph-gallery.com/

* R Graphics Cookbook - Winston Chang

* R for Data Science - Hadley Wickha and Garrett Grolemund
* R for Data Science - Hadley Wickham and Garrett Grolemund

----

38 changes: 21 additions & 17 deletions Presentations/Lab 2 - Intro to R - Part II.Rmd
@@ -1,7 +1,7 @@
---
title: "Intro to R -- Part II"
subtitle: "R for Stata Users"
date: "April, 2019"
date: "June 2019"
author: "Luiza Andrade, Leonardo Viotti & Rob Marty "
output:
beamer_presentation:
@@ -174,14 +174,18 @@ Let's test if that worked:
dataWorkFolder
```

## Loading a data set from CSV
\begin{block}{\texttt{read.csv(file, header = FALSE)}}

### ``read.csv(file, header = FALSE)``
\begin{itemize}
\item \textbf{file:} is the path to the file you want to open, including its name and format (\texttt{.csv}).
\item \textbf{header:} if \texttt{TRUE}, will read the first row as variable names.
\item \textbf{stringsAsFactors:} logical. See next slide for more.

\end{itemize}
\end{block}

* **file**: is the path to the file you want to open, including its name and format (``.csv``)
* **header**: if `TRUE`, will read the first row as variable names
* **stringsAsFactors:** logical. See next slide for more.
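
To see what the `header` argument does, here is a minimal sketch that writes a tiny CSV to a temporary file and reads it back both ways. The file contents and the object names `with_header` and `no_header` are made up for illustration:

```r
# Write a tiny CSV to a temporary file (made-up data, for illustration only)
tmp <- tempfile(fileext = ".csv")
writeLines(c("country,score", "A,7.5", "B,6.1"), tmp)

# header = TRUE: the first row becomes the variable names
with_header <- read.csv(tmp, header = TRUE, stringsAsFactors = FALSE)
names(with_header)

# header = FALSE: the first row is read as data,
# and the columns get default names (V1, V2)
no_header <- read.csv(tmp, header = FALSE, stringsAsFactors = FALSE)
names(no_header)
nrow(no_header)
```

With `stringsAsFactors = FALSE`, the `country` column is read as plain character strings rather than as a factor.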

## Loading a data set from CSV

@@ -200,7 +204,7 @@ Let's test if that worked:

3. Open the code you just saved.

4. Add a line opening the data set in `PART 5` of your Master script
4. Add a line opening the data set in your code
```{r, eval = F}
# Load data set
whr <- read.csv(file.path(finalData,"whr_panel.csv"),
@@ -235,7 +239,7 @@ Use some of the functions listed above to explore the `whr` data set.

\footnotesize
```{r, eval = F}
# View the data set (same as clickin on it in the Environment pane)
# View the data set (same as clicking on it in the Environment pane)
View(whr)
```

@@ -472,10 +476,10 @@ RStudio's default is to print warning messages, but not stop the code at the lin

## Looping

* ``sapply(X, FUN, ...)``: applies a function to all elements of a vector or list and returns the result in a vector. Its arguments are
* **X:** a vector or list (including a data frame, whose columns are its elements) the function will be applied to
* **FUN:** the function you want to apply
* **...:** possible function options
### ``sapply(X, FUN, ...)``: applies a function to all elements of a vector or list and returns the result in a vector. Its arguments are
* **X:** a vector or list (including a data frame, whose columns are its elements) the function will be applied to
* **FUN:** the function you want to apply
* **...:** possible function options
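
A minimal sketch of `sapply()` in use. The lists `scores` and `with_na` are made up for illustration:

```r
# A made-up list of numeric vectors, for illustration only
scores <- list(a = c(1, 2, 3), b = c(10, 20), c = 5)

# Apply mean() to every element; the result is a named numeric vector
avg <- sapply(scores, mean)
avg

# Extra arguments after FUN are passed on to the function
with_na <- list(a = c(1, NA, 3), b = c(10, 20))
avg_no_na <- sapply(with_na, mean, na.rm = TRUE)
avg_no_na
```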

## Looping

@@ -493,11 +497,11 @@ RStudio's default is to print warning messages, but not stop the code at the lin

A more general version is the `apply` function.

* ``apply(X, MARGIN, FUN, ...)``: applies a function to all columns or rows of a matrix. Its arguments are
* **X:** a matrix (or data frame) the function will be applied to
* **MARGIN:** 1 to apply the function to all rows or 2 to apply the function to all columns
* **FUN:** the function you want to apply
* **...:** possible function options
### ``apply(X, MARGIN, FUN, ...)``: applies a function to all columns or rows of a matrix. Its arguments are
* **X:** a matrix (or data frame) the function will be applied to
* **MARGIN:** 1 to apply the function to all rows or 2 to apply the function to all columns
* **FUN:** the function you want to apply
* **...:** possible function options
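
The effect of `MARGIN` can be sketched with a small matrix (the matrix `m` is made up for illustration):

```r
# A made-up matrix with 2 rows and 3 columns, filled column by column:
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6
m <- matrix(1:6, nrow = 2)

# MARGIN = 1: apply sum() to each row
row_totals <- apply(m, 1, sum)
row_totals

# MARGIN = 2: apply sum() to each column
col_totals <- apply(m, 2, sum)
col_totals
```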

## Looping

@@ -1,7 +1,7 @@
---
title: "Data Processing"
subtitle: "R for Stata Users"
date: "May 2019"
date: "June 2019"
author: "Luiza Andrade, Leonardo Viotti & Rob Marty"
output:
beamer_presentation:
@@ -138,7 +138,7 @@ Here's a how you can do that:
### Exercise 1: Load data
Use the `read.csv` function to load the three `WHR` data sets from `DataWork > DataSets > Raw`. Create an object called `whrYY` with each data set.

* TIP 1: use the ``file.path()`` function and the ``rawData`` object created in the master to simplify the folder path.
* TIP 1: use the ``file.path()`` function to simplify the folder path.
* TIP 2: for this data set, we want to read strings as strings, not factors.


@@ -443,6 +443,15 @@ whr17$Region[whr17$Country %in% c("Mozambique",
any(is.na(whr17$Region))
```

## Missing values

Unlike in Stata, R never\footnote{Of course, there might be an obscure package with a function that does this, or you can write your own function. But base R and all the major packages don't, and we never came across any function on CRAN that does.} treats missings as zeros by default in any function.

* If your vector (column or row in your dataset) has at least one `NA`, most functions that summarize it (for example `mean` or `sum`) will return `NA`.
* If you wish to treat missings as zeros or ignore them, you need to explicitly do it.
* E.g. `mean(myVector, na.rm = T)` or `rowSums(myDataFrame, na.rm = T)`
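
A minimal sketch of this behavior, using made-up objects `myVector` and `myDataFrame` that mirror the names in the example above:

```r
# A made-up vector with one missing value
myVector <- c(7.5, NA, 4.5)

# By default, the missing value propagates to the result
mean_default <- mean(myVector)
mean_default

# na.rm = TRUE drops the missing value before computing the mean
mean_no_na <- mean(myVector, na.rm = TRUE)
mean_no_na

# The same logic applies row by row in a data frame
myDataFrame <- data.frame(a = c(1, NA), b = c(2, 3))
sums_default <- rowSums(myDataFrame)
sums_no_na   <- rowSums(myDataFrame, na.rm = TRUE)
sums_no_na
```

Note that `na.rm = TRUE` ignores the missing values; it does not replace them with zeros, although for sums the two give the same result.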


## Renaming variables

The second problem we found, of different names for the same variable in different data sets, can be easily fixed with the rename function:
@@ -605,7 +614,7 @@ whr_panel$happy_high <-

* The `tidyverse` function `mutate` makes this process simpler

## Creating variables base on a formula
## Creating variables based on a formula

### `mutate(.data, ...)`

@@ -615,7 +624,7 @@ Adds new variables and preserves existing
* **...:** name-value pairs of expressions. Use NULL to drop a variable


## Creating variables base on a formula
## Creating variables based on a formula

### Exercise 10: Create a variable based on a formula
Use the `mutate` function to create a variable called `happy_high` in the `whr_panel` data set indicating whether the `happy_score` is above the median.
@@ -799,7 +808,7 @@ head(happy_long)

## `melt`: reshape from wide to long

### `melt(data, id.vars, measure.vars)
### `melt(data, id.vars, measure.vars)`

* **data:** a **data.table** object to melt
* **id.vars:** a vector of unique IDs in `data`

0 comments on commit c8e072b
