Matching_and_Subclassification.Rmd

---
title: "Causal Inference: <br> *The Mixtape*"
subtitle: "<it>Matching and Subclassification</it>"
output: 
  learnr::tutorial:
    css: css/style.css
    highlight: "kate"
runtime: shiny_prerendered
---

## Welcome

This is material for the **Matching and Subclassification** chapter in Scott Cunningham's book, [Causal Inference: The Mixtape.](https://mixtape.scunning.com/)

### Packages needed

The first thing you need to do is install a few packages to make sure everything runs:

```{r, eval = FALSE}
install.packages("tidyverse")
install.packages("cli")
install.packages("haven")
install.packages("rmarkdown")
install.packages("learnr")
install.packages("haven")
install.packages("stargazer")

# For this chapter only
install.packages("MatchIt")
install.packages("cem")
install.packages("estimatr")
```

### Load

```{r load, warning=FALSE, message=FALSE}
library(learnr)
library(haven)
library(tidyverse)
library(stargazer)

# For this chapter only
library(MatchIt)
library(cem)
library(estimatr)

# 10 minute code time limit
options(tutorial.exercise.timelimit = 600)

# read_data function
read_data <- function(df) {
  full_path <- paste0("https://raw.github.com/scunning1975/mixtape/master/", df)
  return(haven::read_dta(full_path))
}
```


## Titatnic


```{r titanic_subclassification, exercise=TRUE, echo=FALSE}

## Simple Difference in Outcomes

titanic <- read_data("titanic.dta") %>% 
  mutate(d = case_when(class == 1 ~ 1, TRUE ~ 0))

ey1 <- titanic %>% 
  filter(d == 1) %>%
  pull(survived) %>% 
  mean()

ey0 <- titanic %>% 
  filter(d == 0) %>%
  pull(survived) %>% 
  mean()

sdo <- ey1 - ey0

cli::cli_text("The {.emph simple difference in outcomes} estimate is {round(sdo,2)}")


## Weighted Average Treatment Effect 
titanic <- titanic %>% 
  mutate(s = case_when(sex == 0 & age == 1 ~ 1,
                       sex == 0 & age == 0 ~ 2,
                       sex == 1 & age == 1 ~ 3,
                       sex == 1 & age == 0 ~ 4,
                       TRUE ~ 0))

ey11 <- titanic %>% 
  filter(s == 1 & d == 1) %>%
  pull(survived) %>% mean()

ey10 <- titanic %>% 
  filter(s == 1 & d == 0) %>%
  pull(survived) %>% mean()

ey21 <- titanic %>% 
  filter(s == 2 & d == 1) %>%
  pull(survived) %>% mean()

ey20 <- titanic %>% 
  filter(s == 2 & d == 0) %>%
  pull(survived) %>% mean()

ey31 <- titanic %>% 
  filter(s == 3 & d == 1) %>%
  pull(survived) %>% mean()

ey30 <- titanic %>% 
  filter(s == 3 & d == 0) %>%
  pull(survived) %>% mean()

ey41 <- titanic %>% 
  filter(s == 4 & d == 1) %>%
  pull(survived) %>% mean()

ey40 <- titanic %>% 
  filter(s == 4 & d == 0) %>%
  pull(survived) %>% mean()

diff1 = ey11 - ey10
diff2 = ey21 - ey20
diff3 = ey31 - ey30
diff4 = ey41 - ey40

obs = nrow(titanic)

wt1 <- titanic %>% 
  filter(s == 1 & d == 0) %>%
  nrow(.)/obs

wt2 <- titanic %>% 
  filter(s == 2 & d == 0) %>%
  nrow(.)/obs

wt3 <- titanic %>% 
  filter(s == 3 & d == 0) %>%
  nrow(.)/obs

wt4 <- titanic %>% 
  filter(s == 4 & d == 0) %>%
  nrow(.)/obs

wate = diff1*wt1 + diff2*wt2 + diff3*wt3 + diff4*wt4

cli::cli_text("The {.emph weigthted average treatment effect} estimate is {round(wate,2)}")


```

#### Questions
- Using the simple difference in outcomes, how much does the probability of survival increase for first-class passengers relative to some control group?
- Explain in your own words what stratifying on gender and age did for this difference in outcomes between treatment and control?
- After stratifying on gender and age, what happens to the difference in probability of survival between first-class and non-first-class passengers?


## Training Example

First, we will look at the distribution of age between the treated and non-treated groups

```{r training}

training_example <- read_data("training_example.dta") %>% 
  slice(1:20)

ggplot(training_example, aes(x=age_treat)) +
  stat_bin(bins = 10, na.rm = TRUE)

ggplot(training_example, aes(x=age_control)) +
  geom_histogram(bins = 10, na.rm = TRUE)

```

#### Questions

- Compare the distribution of ages between the treated and the control groups. How do they differ, if at all?


```{r training_bias_reduction, exercise=TRUE, echo=FALSE}

training_bias_reduction <- read_data("training_bias_reduction.dta") %>% 
    mutate(
      Y1 = case_when(Unit %in% c(1,2,3,4) ~ Y),
      Y0 = c(4,0,5,1,4,0,5,1)
    )

train_reg <- lm(Y ~ X, training_bias_reduction)

training_bias_reduction <- training_bias_reduction %>% 
  mutate(u_hat0 = predict(train_reg))


training_bias_reduction %>% 
  select(Unit, Y1, Y0, Y, D, X, u_hat0)


```


## National Supported Work Demonstration Experiment

To compare results, let's first look at the treatment effect identified by a true experiment.

```{r nsw_experimental, exercise=TRUE, echo=FALSE}

nsw_dw <- read_data("nsw_mixtape.dta")

mean1 <- nsw_dw %>% 
  filter(treat == 1) %>% 
  pull(re78) %>% 
  mean()

mean0 <- nsw_dw %>% 
  filter(treat == 0) %>% 
  pull(re78) %>% 
  mean()


ate <- unique(mean1 - mean0)

cli::cli_text("The {.emph experimental ATE} estimate is {round(ate,2)}")


```

#### Questions
- How do you interpret the above estimated ATE?
- Say you were interested in the ATT.  Can you report the ATT from a randomized experiment?  If so, what is it? If not, why not?


Now, lets turn to a non-experimental control group. We first have to load the data from the CPS. and estimate the propensity score

```{r}

# Prepare data for logit 
nsw_dw_cpscontrol <- read_data("cps_mixtape.dta") %>% 
  bind_rows(nsw_dw) %>% 
  mutate(
    agesq = age^2,
    agecube = age^3,
    educsq = educ*educ,
    u74 = case_when(re74 == 0 ~ 1, TRUE ~ 0),
    u75 = case_when(re75 == 0 ~ 1, TRUE ~ 0),
    interaction1 = educ*re74,
    re74sq = re74^2,
    re75sq = re75^2,
    interaction2 = u74*hisp
  )

# estimating propensity score
logit_nsw <- glm(treat ~ age + agesq + agecube + educ + educsq + 
                   marr + nodegree + black + hisp + re74 + re75 + u74 +
                   u75 + interaction1, family = binomial(link = "logit"), 
                 data = nsw_dw_cpscontrol)

nsw_dw_cpscontrol <- nsw_dw_cpscontrol %>% 
  mutate(pscore = logit_nsw$fitted.values)

```


```{r nsw_pscore, exercise=TRUE, echo=FALSE}

# mean pscore 
pscore_control <- nsw_dw_cpscontrol %>% 
  filter(treat == 0) %>% 
  pull(pscore) %>% 
  mean()

pscore_treated <- nsw_dw_cpscontrol %>% 
  filter(treat == 1) %>% 
  pull(pscore) %>% 
  mean()

cli::cli_text("The mean propensity score among the treated group is {round(pscore_treated, 2)} and among the control group is {round(pscore_control, 2)}")


# histogram
nsw_dw_cpscontrol %>% 
  filter(treat == 0) %>% 
  ggplot() +
  geom_histogram(aes(x = pscore))

nsw_dw_cpscontrol %>% 
  filter(treat == 1) %>% 
  ggplot() +
  geom_histogram(aes(x = pscore))

```

#### Questions
- Compare the mean propensity score between the treated and the control groups. What does this reveal about the two groups?
- Compare the distribution of propensity scores between the treated and the control groups. How do they differ, if at all?


```{r nsw_ipw, exercise=TRUE, echo=FALSE}


#continuation
N <- nrow(nsw_dw_cpscontrol)

# Manual with non-normalized weights using all data
nsw_dw_cpscontrol <- nsw_dw_cpscontrol %>% 
  mutate(d1 = treat/pscore,
         d0 = (1-treat)/(1-pscore))

s1 <- sum(nsw_dw_cpscontrol$d1)
s0 <- sum(nsw_dw_cpscontrol$d0)

nsw_dw_cpscontrol <- nsw_dw_cpscontrol %>% 
  mutate(y1 = treat * re78/pscore,
         y0 = (1-treat) * re78/(1-pscore),
         ht = y1 - y0)


te_1 <- nsw_dw_cpscontrol %>% 
  pull(ht) %>% 
  mean()

cli::cli_text("Treatment Effect {.emph (non-normalized, all data)}: {round(te_1, 2)}")

# Manual with normalized weights
nsw_dw_cpscontrol <- nsw_dw_cpscontrol %>% 
  mutate(y1 = (treat*re78/pscore)/(s1/N),
         y0 = ((1-treat)*re78/(1-pscore))/(s0/N),
         norm = y1 - y0)

te_2 <- nsw_dw_cpscontrol %>% 
  pull(norm) %>% 
  mean()

cli::cli_text("Treatment Effect {.emph (normalized, all data)}: {round(te_2, 2)}")


# Trimming propensity score ---------------
nsw_dw_trimmed <- nsw_dw_cpscontrol %>% 
  select(-d1, -d0, -y1, -y0, -ht, -norm) %>% 
  filter(!(pscore >= 0.9)) %>% 
  filter(!(pscore <= 0.1))

N <- nrow(nsw_dw_trimmed)

# Manual with non-normalized weights using trimmed data
nsw_dw_trimmed <- nsw_dw_trimmed %>% 
  mutate(d1 = treat/pscore,
         d0 = (1-treat)/(1-pscore))

s1 <- sum(nsw_dw_trimmed$d1)
s0 <- sum(nsw_dw_trimmed$d0)

nsw_dw_trimmed <- nsw_dw_trimmed %>% 
  mutate(y1 = treat * re78/pscore,
         y0 = (1-treat) * re78/(1-pscore),
         ht = y1 - y0)


te_3 <- nsw_dw_trimmed %>% 
  pull(ht) %>% 
  mean()

cli::cli_text("Treatment Effect {.emph (non-normalized, trimmed data)}: {round(te_3, 2)}")

# Manual with normalized weights with trimmed data
nsw_dw_trimmed <- nsw_dw_trimmed %>% 
  mutate(y1 = (treat*re78/pscore)/(s1/N),
         y0 = ((1-treat)*re78/(1-pscore))/(s0/N),
         norm = y1 - y0)

te_4 <- nsw_dw_trimmed %>% 
  pull(norm) %>% 
  mean()

cli::cli_text("Treatment Effect {.emph (normalized, trimmed data)}: {round(te_4, 2)}")

```


#### Questions

- Explain the overlap condition in the context of these data.  How did we ensure that overlap held in the data? 
- When we are using non-trimmed data, why is the treatment effect negative? (*hint:* it has to do with extreme probability scores)
- What does this imply about the challenges of using non-experimental data when estimating causal effects, and why is conditioning on a trimmed propensity score important?


## Nearest-Neighbor Matching


```{r nn, exercise=TRUE, echo=FALSE}}
m_out <- matchit(treat ~ age + agesq + agecube + educ +
                   educsq + marr + nodegree +
                   black + hisp + re74 + re75 + 
                   u74 + u75 + interaction1,
                 data = nsw_dw_cpscontrol, 
                 method = "nearest", min.controls=5)

m_data <- match.data(m_out)

(m_ate <- lm_robust(re78 ~ treat, 
               data = m_data,
               weights = m_data$weights))


```


```{r cem, exercise=TRUE, echo=FALSE}

m_out <- matchit(treat ~ age + agesq + agecube + educ +
                   educsq + marr + nodegree +
                   black + hisp + re74 + re75 + 
                   u74 + u75 + interaction1,
                 data = nsw_dw_cpscontrol, 
                 method = "cem", min.controls=5)

m_data <- match.data(m_out)

(m_ate <- lm_robust(re78 ~ treat, 
               data = m_data,
               weights = m_data$weights))


```

#### Questions
- Compare our results from nearest-neighbor matching to what we found using the experimental data, the simple difference in outcomes using non-experimental controls, and propensity score weighting using non-experimental controls.
- DIFFICULT: Write a program that performs bootstrapping to get an estimate of the variance of the estimator. (HINT: Write a loop)