You are hired as an outside consultant by the Organization of Economic Non-Cooperation for Development (OENCD), they are interested in studying the effect of a carbon tax on national carbon dioxide emissions per capita. You are provided with data for the twenty-members of the organization from 2009 to 2019. The data can be called a balanced panel based on the description given in the lecture (i.e. each panel member is observed every year)
set.seed(42) #for consistent results
library(dplyr) # to wrangle our data
library(tidyr) # to wrangle our data - pivot_longer()
library(ggplot2) # to render our graphs
library(readr) # for loading the .csv data
library(lmtest) # to gather our clustered standard errors - coeftest()
library(plm) # to gather our clustered standard errors - vcovHC() and plm()
library(kableExtra) # to render better formatted tables
library(stargazer) # for formatting your model output
library(lubridate) # for working with dates-times in a more intuitive manner
carbon_tax_df <- readr::read_csv("https://raw.githubusercontent.com/seramirezruiz/stats-ii-lab/master/Session%207/data/carbon_tax_df.csv") # simulated data
names(carbon_tax_df) # to check the names of the variables in our data
## [1] "country_name" "country_code" "year" "tax"
## [5] "income_class" "co2_per_capita"
Our dataset carbon_tax_df, contains the following information:
country_name
: Name of the countrycountry_code
: Three-letter country codeyear
: Yeartax
: Dummy for whether the carbon tax was in placeincome_class
: Categorical variable with income label (Low to High)co2_per_capita
: carbon dioxide emissions per capita in metric tons (T)
We can use what we have learnt about the base R table() function, to check how many countries had a tax
in place every year.
table(carbon_tax_df$tax, carbon_tax_df$year) %>%
knitr::kable() %>% # create kable table
kableExtra::kable_styling() # view kable table
2009 | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 12 | 12 | 10 | 9 | 8 | 8 | 5 | 4 | 4 | 4 | 5 |
1 | 0 | 0 | 2 | 3 | 4 | 4 | 7 | 8 | 8 | 8 | 7 |
ggplot(carbon_tax_df, aes(x = factor(year),
y= co2_per_capita,
color = factor(tax))) +
geom_point() + #create scatterplot
theme(legend.position="bottom") + #move legend to the bottom
labs(title = "Exploratory plot of CO2 emissions per capita",
x = "Year",
y = "CO2 emissions in metric tons (T)",
color = "Carbon tax") +
theme_minimal() +
scale_color_discrete(labels = c("No", "In place")) #change labels of the legend
carbon_tax_df %>%
dplyr::mutate(year_as_date = lubridate::ymd(year, truncated = 2L), #turning numeric to date format
income_class = factor(carbon_tax_df$income_class,
levels = c("High", "High-Middle",
"Low-Middle", "Low"))) %>% #reordering the factors
ggplot(., aes(x = year_as_date,
y= co2_per_capita,
color = factor(tax))) +
geom_point() + #create scatterplot
geom_path(aes(group = 1)) + #to render consecutive lines disregarding the tax (you will likely use geom_line() for the assignment)
facet_wrap(country_name~income_class) + #to split plot into panels based on this dimension
scale_x_date(date_labels = "%Y") + #telling ggplot that we want the ticks to be the years in the dates
labs(title = "Exploratory plot of CO2 emissions per capita",
x = "Year",
y = "CO2 emissions in metric tons (T)",
color = "Carbon tax") +
theme_bw() +
theme(legend.position="bottom") + #move legend to the bottom
scale_color_discrete(labels = c("No", "In place")) #change labels of the legend
Note: The exploratory data analysis portions of our scripts will not transfer directly to this week's assignment; however, they will be very useful for the extension portion of your final replication paper. Summarizing, graphing, and exploring your data will be critical to discover patterns, to spot anomalies, and to check assumptions
In the example at hand, we can think of unit-specific factors as characteristics of individual countries that are constant over time (e.g. a country that just loves big pick-up trucks). We can also think about time-specific factors that affect all countries (e.g. global economic shocks).
We can formulate this as:
\[Y_{it} = β_0 + β_1D_{it} + \theta_{i} + \delta_t + \upsilon_{it}\] where \(\theta_i\) reflects the time-invariant traits of the units, \(\delta_t\) reflects the time-specific factors that affect everyone and \(\upsilon_{it}\) is the idiosyncratic error
We will move forward by creating three models:
co2_per_capita
on tax
.
naive_carbon <- lm(co2_per_capita ~ tax, data = carbon_tax_df)
pooled_naive_carbon <- plm(co2_per_capita ~ tax, data = carbon_tax_df, model = "pooling")
stargazer(naive_carbon, type = "text")
=============================================== Dependent variable:
--------------------------- co2_per_capita
----------------------------------------------- tax -6.261***
(0.566)
Constant 10.627***
(0.352)
Observations 132
R2 0.485
Adjusted R2 0.481
Residual Std. Error 3.166 (df = 130)
F Statistic 122.394*** (df = 1; 130)
=============================================== Note: p<0.1; p<0.05; p<0.01
stargazer(pooled_naive_carbon, type = "text")
======================================== Dependent variable:
--------------------------- co2_per_capita
---------------------------------------- tax -6.261***
(0.566)
Constant 10.627***
(0.352)
Observations 132
R2 0.485
Adjusted R2 0.481
F Statistic 122.394*** (df = 1; 130)
======================================== Note: p<0.1; p<0.05; p<0.01
This model is telling us that on average, the \(CO_2\) emmissions per capita are reduced by 6.2 metric tons when a carbon tax is put in place. Still, after all the work we have done throughout the semester, we understand that there may be a plethora of factors that could be skewing the results of this bivariate regression.
We will learn two ways of gathering unit- and time-fixed effects in R. First, we will perform Least Squares Dummy Variables (LSDV) estimation with lm(), where we essentially get an individual estimate for each unit. Second, we will run our model with plm(), which will do the same mechanics, yet it will not estimate each of the units intercepts (because it relies on the "de-meaning" approach).
lsdv_unit_fe <- lm(co2_per_capita ~ tax + country_name, data = carbon_tax_df)
unit_fe <- plm(co2_per_capita ~ tax,
data = carbon_tax_df,
index = c("country_name","year"),
effect = "individual",
model = "within")
stargazer(lsdv_unit_fe, type = "text")
================================================== Dependent variable:
--------------------------- co2_per_capita
-------------------------------------------------- tax -4.436***
(0.474)
country_nameBorovia 1.507
(0.912)
country_nameCarpania 5.106***
(0.912)
country_nameCorinthia 6.434***
(0.887)
country_nameFreedonia 6.120***
(0.903)
country_nameGlenraven 0.372
(0.951)
country_nameKhemed -0.672
(0.951)
country_nameLaurania 1.533
(0.937)
country_nameParano 3.693***
(0.887)
country_nameRon 2.715***
(0.912)
country_nameRumekistan 7.431***
(0.887)
country_nameTransia 1.883*
(0.968)
Constant 6.912***
(0.627)
Observations 132
R2 0.797
Adjusted R2 0.776
Residual Std. Error 2.079 (df = 119)
F Statistic 38.844*** (df = 12; 119)
================================================== Note: p<0.1; p<0.05; p<0.01
stargazer(unit_fe, type = "text")
======================================== Dependent variable:
--------------------------- co2_per_capita
---------------------------------------- tax -4.436***
(0.474)
Observations 132
R2 0.424
Adjusted R2 0.366
F Statistic 87.716*** (df = 1; 119)
======================================== Note: p<0.1; p<0.05; p<0.01
Adding unit-level fixed effects to the model, i.e. accounting for unobserved, time-invariant characteristics of countries and only focusing on within-state variation, an increase in the imposition of a carbon tax reduces \(CO_2\) per capita emissions by 4.44 metric tons. Once we have captured the variation between countries, we can see that our results from the naive model were substantially biased. We can still try to capture the time-specific portion of the error.
The results from the Least Squares Dummy Variables (LSDV) estimation are read in reference to a baseline. In this case, the constant is representing the intercept for Adjikistan. We can utilize the individual slopes for each country to say that Freedonians emit on average 5.93 more metric tons of \(CO_2\) per capita than Adjikistanians.
We will perform our regressions with Least Squares Dummy Variables (LSDV) estimation with lm() and our simplified way with plm().
lsdv_unit_time_fe <- lm(co2_per_capita ~ tax + country_name + factor(year),
data = carbon_tax_df)
unit_time_fe <- plm(co2_per_capita ~ tax,
data = carbon_tax_df,
index = c("country_name", "year"),
model = "within",
effect = "twoways")
stargazer(lsdv_unit_time_fe, type = "text")
================================================== Dependent variable:
--------------------------- co2_per_capita
-------------------------------------------------- tax -3.908***
(0.280)
country_nameBorovia 1.267***
(0.418)
country_nameCarpania 4.866***
(0.418)
country_nameCorinthia 6.434***
(0.398)
country_nameFreedonia 5.928***
(0.410)
country_nameGlenraven -0.012
(0.447)
country_nameKhemed -1.056**
(0.447)
country_nameLaurania 1.196***
(0.436)
country_nameParano 3.693***
(0.398)
country_nameRon 2.474***
(0.418)
country_nameRumekistan 7.431***
(0.398)
country_nameTransia 1.450***
(0.459)
factor(year)2010 -4.697***
(0.381)
factor(year)2011 1.890***
(0.384)
factor(year)2012 -3.061***
(0.387)
factor(year)2013 0.106
(0.392)
factor(year)2014 -1.878***
(0.392)
factor(year)2015 -3.018***
(0.414)
factor(year)2016 -3.292***
(0.424)
factor(year)2017 -0.963**
(0.424)
factor(year)2018 -2.488***
(0.424)
factor(year)2019 -1.399***
(0.414)
Constant 8.621***
(0.396)
Observations 132
R2 0.963
Adjusted R2 0.955
Residual Std. Error 0.933 (df = 109)
F Statistic 127.305*** (df = 22; 109) ================================================== Note: p<0.1; p<0.05; p<0.01
stargazer(unit_time_fe, type = "text")
======================================== Dependent variable:
--------------------------- co2_per_capita
---------------------------------------- tax -3.908***
(0.280)
Observations 132
R2 0.641
Adjusted R2 0.568
F Statistic 194.229*** (df = 1; 109)
======================================== Note: p<0.1; p<0.05; p<0.01
Now in addition to adding unit-level fixed effects to the model, we control for time-specific factors that affect the global per capita \(CO_2\) emissions. The results suggest that the effect of a carbon-tax leads to a decrease \(CO_2\) emissions of 3.91 metric tons per capita.
Assumption for errors is \(υ_{it} ∼ iid(0, σ_υ^{2})\).
With longer panels, serial correlation between errors is a problem: \(Cor(υ_{it}, υ_i(t−1))≠0\).
We can test for serial correlation in our time and unit FE specification model, as such:
pbgtest(unit_time_fe, order = 2)
##
## Breusch-Godfrey/Wooldridge test for serial correlation in panel models
##
## data: co2_per_capita ~ tax
## chisq = 0.25101, df = 2, p-value = 0.8821
## alternative hypothesis: serial correlation in idiosyncratic errors
In this case, we do not find any serial correlation, so we do not need to correct our standard errors manually. If we were to find serial correlation, we could introduce robust standard errors with a similar syntax to the one used last week for clustered standard errors:
model_with_robust_se = coeftest(unit_time_fe, vcov=vcovHC(unit_time_fe, type = "sss"))
Your results are welcomed internationally and all states move forward with the measure.