The Potential Outcomes Framework

The POF in practice

Let's revisit the example from our slides once again.

Say we are interested in assessing the premise of Allport's hypothesis about interpersonal contact being conducive to reducing intergroup prejudice.

We are studying a set of \(n=8\) students, each assigned to a dorm room either with a person from their own ethnic group (contact=0) or with a person from a different group (contact=1).

Student (i)   Prejudice (C=0)   Prejudice (C=1)
1             6                 5
2             4                 2
3             4                 4
4             6                 7
5             3                 1
6             2                 2
7             8                 7
8             4                 5

Data set

Today we will work with the prejudice_df object. The data frame contains the following four variables:

  • student_id: numeric student identification
  • prej_0: prejudice level under \(Y_{0i}\) (Contact=0)
  • prej_1: prejudice level under \(Y_{1i}\) (Contact=1)
  • dorm_type: binary indicator of the actual treatment state (1 = contact, 0 = no contact)
## # A tibble: 8 x 4
##   student_id prej_0 prej_1 dorm_type
##        <dbl>  <dbl>  <dbl>     <dbl>
## 1          1      6      5         0
## 2          2      4      2         1
## 3          3      4      4         0
## 4          4      6      7         0
## 5          5      3      1         1
## 6          6      2      2         1
## 7          7      8      7         0
## 8          8      4      5         0
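If you do not have prejudice_df loaded, the following is a minimal sketch for rebuilding it by hand. It assumes the data are exactly the eight rows printed above and that the tibble package is installed; how the object was originally created for the course is not shown here.

```r
# Rebuild prejudice_df from the values printed above (assumed to be the full data).
library(tibble)

prejudice_df <- tibble::tibble(
  student_id = as.numeric(1:8),
  prej_0     = c(6, 4, 4, 6, 3, 2, 8, 4),  # potential outcome under contact = 0
  prej_1     = c(5, 2, 4, 7, 1, 2, 7, 5),  # potential outcome under contact = 1
  dorm_type  = c(0, 1, 0, 0, 1, 1, 0, 0)   # actual treatment state
)

prejudice_df
```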

Treatment Effects

a) Individual Treatment Effect (ITE)

We assume from the potential outcomes framework that each subject has a potential outcome under both treatment states. Let's take the first student in the list as an example.

The figure illustrates the potential outcomes for Student 1.

We see that in a reality where Student 1 is assigned to an in-group dorm (contact=0), their level of prejudice is 6. By contrast, in a reality where Student 1 is assigned to an inter-ethnic dorm (contact=1), their level of prejudice is 5.

From this illustration, we can gather the individual treatment effect (ITE) for Student 1. The ITE is equal to the value under treatment (contact=1) minus the value without treatment (contact=0), or \(ITE_i = Y_{1i} - Y_{0i}\).

\[ITE_1 = 5 - 6 = -1\]

As Cunningham (2021) puts it, the ITE is a “comparison of two states of the world”: one in which the individual is exposed to contact and one in which they are not.

Evidently, in real life each subject can only be observed in one treatment state at any point in time. This is known as the fundamental problem of causal inference (Holland, 1986). In reality, the Individual Treatment Effect (ITE) is therefore unattainable. Still, it provides us with a conceptual foundation for causal estimation.
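To make the fundamental problem concrete, here is a small sketch of what the same data would look like if we could only see the outcome for the dorm each student actually lives in. The data frame is rebuilt from the values printed earlier, and the names observed_df, prej_0_seen, prej_1_seen and ite_seen are hypothetical names used only for this illustration.

```r
library(dplyr)
library(tibble)

# Rebuilt from the printed values (assumption: these eight rows are the full data).
prejudice_df <- tibble::tibble(
  student_id = as.numeric(1:8),
  prej_0     = c(6, 4, 4, 6, 3, 2, 8, 4),
  prej_1     = c(5, 2, 4, 7, 1, 2, 7, 5),
  dorm_type  = c(0, 1, 0, 0, 1, 1, 0, 0)
)

# Keep only the potential outcome that matches the actual treatment state;
# the other one is the unobserved counterfactual, so we set it to NA.
observed_df <- prejudice_df %>%
  dplyr::mutate(
    prej_0_seen = ifelse(dorm_type == 0, prej_0, NA_real_),
    prej_1_seen = ifelse(dorm_type == 1, prej_1, NA_real_),
    ite_seen    = prej_1_seen - prej_0_seen
  ) %>%
  dplyr::select(student_id, dorm_type, prej_0_seen, prej_1_seen, ite_seen)

observed_df
```

Every row has one missing potential outcome, so ite_seen is NA for all eight students: no individual effect can be computed from what is actually observed.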

Exercise: Our data are coming from a world with perfect information. In that sense, we have both potential outcomes prej_0 and prej_1. Can you think of a way to calculate the ITE for the eight students with one of the dplyr verbs we learned in the previous section?

Hint

Can you think of a way we can use the verb mutate()?

Answer

#you can employ dplyr::mutate() to create the new variable ite
prejudice_df %>% 
  dplyr::mutate(ite = prej_1 - prej_0)
student_id  prej_0  prej_1  dorm_type  ite
1           6       5       0          -1
2           4       2       1          -2
3           4       4       0           0
4           6       7       0           1
5           3       1       1          -2
6           2       2       1           0
7           8       7       0          -1
8           4       5       0           1


Average Treatment Effect (ATE)

Normally, we are not interested in the effects for individual subjects, but rather in the effect for a population. The Average Treatment Effect (ATE) is the difference in the average potential outcomes of the population.

\[ATE = E(Y_{1i}) - E(Y_{0i})\]

In other words, the ATE is the average ITE of all the subjects in the population. As you can see, the ATE as defined in the formula is also not attainable. Can you think of why?

Exercise: Since our data come from a world with perfect information, can you think of a way to calculate the ATE for the eight students based on what we learned last week?

Hint

We have already extracted the ite with mutate(). We know that the ATE is the average of every subject's ITE. Do you remember summarize()?

Answer

#we know that the ATE is the average of every subject's ITE. Do you remember dplyr::summarize()?
#how can we use the verbs from last week to get the average treatment effect?

prejudice_df %>%
  dplyr::mutate(ite = prej_1 - prej_0) %>%
  dplyr::summarize(ate=mean(ite))
ate
-0.5
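As a check by hand, the same number follows from the eight ITEs computed earlier, and also from the definition as a difference in average potential outcomes (the column means of prej_1 and prej_0):

\[ATE = \frac{1}{8}\sum_{i=1}^{8} ITE_i = \frac{(-1) + (-2) + 0 + 1 + (-2) + 0 + (-1) + 1}{8} = \frac{-4}{8} = -0.5\]

\[ATE = E(Y_{1i}) - E(Y_{0i}) = \frac{33}{8} - \frac{37}{8} = 4.125 - 4.625 = -0.5\]

Both routes agree because the mean of the differences equals the difference of the means.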


The Average Treatment Effect Among the Treated (ATT) and Among the Controls (ATC)

The names for these two estimates are very self-explanatory. These two estimates are simply the average treatment effects conditional on the group subjects are assigned to.

The average treatment effect on the treated (ATT) is defined as the difference in the average potential outcomes for those subjects who were treated: \[ATT = E(Y_{1i}-Y_{0i} | D_{i} = 1)\]

The average treatment effect on the controls (ATC) is defined as the difference in the average potential outcomes for those subjects who were not treated: \[ATC = E(Y_{1i}-Y_{0i} | D_{i} = 0)\]

Exercise: Since our data come from a world with perfect information, can you think of a way to calculate the ATT and ATC for the eight students based on what we learned last week?

Hint

We have already extracted the ite with mutate(). We know that the ATT and ATC are the average of every subject's ITE grouped by their treatment status. Do you remember how the combination of group_by() and summarize() worked?

Answer

#we know that the ATT and ATC are the average of every subject's ITE grouped by their treatment status. Do you remember how the combination of dplyr::group_by() and dplyr::summarize() worked?
#how can we use the verbs from last week to get the average treatment effect on the treated and untreated?

prejudice_df %>%
  dplyr::mutate(ite = prej_1 - prej_0) %>%
  dplyr::group_by(dorm_type) %>%
  dplyr::summarize(treatment_effects=mean(ite))
dorm_type treatment_effects
0 0.000000
1 -1.333333
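The row with dorm_type = 1 is the ATT and the row with dorm_type = 0 is the ATC. Worked by hand from the ITEs of the three treated students (2, 5, 6) and the five control students (1, 3, 4, 7, 8):

\[ATT = \frac{(-2) + (-2) + 0}{3} = -\frac{4}{3} \approx -1.33\]

\[ATC = \frac{(-1) + 0 + 1 + (-1) + 1}{5} = 0\]

As a consistency check, the ATE is the average of the two, weighted by group shares. Writing \(\pi\) for the share of treated subjects (a symbol introduced here for illustration, \(\pi = 3/8\) in our data):

\[ATE = \pi \cdot ATT + (1-\pi) \cdot ATC = \frac{3}{8}\left(-\frac{4}{3}\right) + \frac{5}{8}(0) = -0.5\]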


The Naive Average Treatment Effect (NATE)

So far, we have worked with perfect information. Still, we know that in reality we can only observe subjects in one treatment state. This is the information we do have.

The Naive Average Treatment Effect (NATE) is the quantity we can compute from the observed outcomes alone.

\[NATE = E(Y_{1i}|D_{i}=1) - E(Y_{0i}|D_{i}=0)\]

In plain English: "the expected outcome under treatment for those treated minus the expected outcome under control for those not treated."

Exercise: Can you think of a way to calculate the NATE for the eight students employing the new observed_prej variable?

prejudice_df %>%
  dplyr::mutate(observed_prej = ifelse(dorm_type == 1, prej_1, prej_0))
student_id  prej_0  prej_1  dorm_type  observed_prej
1           6       5       0          6
2           4       2       1          2
3           4       4       0          4
4           6       7       0          6
5           3       1       1          1
6           2       2       1          2
7           8       7       0          8
8           4       5       0          4

Hint

We have already created each student's observed outcome, which depends on their treatment status, with mutate(). We know that the NATE is the difference in average observed outcomes between the two treatment groups. Do you remember how the combination of group_by() and summarize() worked?

Answer

#we know that the NATE is the difference in average observed outcomes grouped by their treatment status. Do you remember how the combination of dplyr::group_by() and dplyr::summarize() worked?

prejudice_df %>%
  dplyr::mutate(observed_prej = ifelse(dorm_type == 1, prej_1, prej_0)) %>%
  dplyr::group_by(dorm_type) %>%
  dplyr::summarize(mean(observed_prej))
  
#You can then subtract the control mean from the treated mean
dorm_type mean(observed_prej)
0 5.600000
1 1.666667

Subtracting the control-group mean from the treatment-group mean gives the NATE: \(1.67 - 5.60 \approx -3.93\).
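If you would rather have R do the subtraction, one possible sketch is to compute both group means and their difference inside a single summarize() call. The data frame is rebuilt from the values printed earlier, and the names nate_df, mean_treated, mean_control and nate are hypothetical names chosen for this example.

```r
library(dplyr)
library(tibble)

# Rebuilt from the printed values (assumption: these eight rows are the full data).
prejudice_df <- tibble::tibble(
  student_id = as.numeric(1:8),
  prej_0     = c(6, 4, 4, 6, 3, 2, 8, 4),
  prej_1     = c(5, 2, 4, 7, 1, 2, 7, 5),
  dorm_type  = c(0, 1, 0, 0, 1, 1, 0, 0)
)

nate_df <- prejudice_df %>%
  # observed outcome: prej_1 for treated students, prej_0 for control students
  dplyr::mutate(observed_prej = ifelse(dorm_type == 1, prej_1, prej_0)) %>%
  dplyr::summarize(
    mean_treated = mean(observed_prej[dorm_type == 1]),
    mean_control = mean(observed_prej[dorm_type == 0]),
    nate         = mean_treated - mean_control
  )

nate_df
```

With these data the nate column is roughly -3.93, far from the ATE of -0.5 obtained under perfect information.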


Note. The ifelse() function is a very handy tool to have. It checks a condition for each element of a vector and returns one value where the condition is met and another where it is not. The syntax is the following:

ifelse(condition_to_meet, what_to_do_if_met, what_to_do_if_not_met)

In the case of observed_prej, we ask R to create a new variable where, if the subject is in an inter-ethnic dorm (dorm_type == 1), we assign the prejudice value under treatment. If that condition is not met, we assign the prejudice value under control.
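A tiny stand-alone sketch of the same idea, using a made-up four-element vector rather than the course data (the labels and the name dorm_label are hypothetical):

```r
# ifelse() works element by element on the condition vector
dorm_type  <- c(0, 1, 0, 1)
dorm_label <- ifelse(dorm_type == 1, "inter-ethnic dorm", "in-group dorm")

dorm_label
```

Each element of dorm_type is tested separately, so the result has the same length as the input.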



Bias


During the lecture, we met two sources of bias:


Baseline bias

Baseline bias, also known as selection bias, is the difference in expected outcomes in the absence of treatment between the actual treatment group and the actual control group. In other words, these are the underlying differences that individuals in either group start off with.


Differential treatment effect bias

Differential treatment effect bias, also known as Heterogeneous Treatment Effect (HTE) bias, is the difference in returns to treatment (the treatment effect) between the treatment and control group, multiplied by the share of the population in control. In other words, this type of bias relates to the dissimilarities stemming from the ways in which individuals in either group are affected differently by the treatment.
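Put together, the two verbal definitions above correspond to the usual decomposition of the naive estimate. This is a sketch in the notation of this section, with \(\pi\) denoting the share of the population that is treated (a symbol introduced here for illustration, so \(1-\pi\) is the share in control):

\[NATE = \underbrace{E(Y_{1i}) - E(Y_{0i})}_{ATE} + \underbrace{E(Y_{0i}|D_{i}=1) - E(Y_{0i}|D_{i}=0)}_{\text{baseline bias}} + \underbrace{(1-\pi)(ATT - ATC)}_{\text{differential treatment effect bias}}\]

The identity follows from writing \(E(Y_{1i}|D_{i}=1) = ATT + E(Y_{0i}|D_{i}=1)\) and \(ATE = \pi \cdot ATT + (1-\pi) \cdot ATC\). When both bias terms are zero, the NATE equals the ATE.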

We will let you think about these for the mock assignment.

Exercise: Since our data come from a world with perfect information, can you think of a way to explore the existence of baseline bias in our data?

Hint

We know that the baseline bias is the difference in average potential outcomes under control (prej_0) between the two treatment groups. Do you remember how the combination of dplyr::group_by() and dplyr::summarize() worked?

Exercise: Since our data come from a world with perfect information, can you think of a way to explore the existence of differential treatment effect bias in our data?

Hint

We know that the differential treatment effect bias builds on the difference between the average ITEs of the two treatment groups (that is, the difference between the ATT and the ATC), weighted by the share of the population in control. Maybe you can go back and check how to get the average treatment effect on the treated and untreated.
