Replicating Angrist & Krueger: Results & Further Analysis
Ankur Singh Chawla
December 14, 2015
I. Summary
Angrist & Krueger (hereafter referred to as A&K) published an important paper in
1991 studying school attendance and returns to education. In this seminal paper, entitled
Does Compulsory School Attendance Affect Schooling and Earnings?, they studied
whether compulsory schooling in high school has an effect on both the amount of schooling
completed and earnings later in life. The paper used a distinctive methodology: it focused on
students born at different times of the year, under the assumption that high schools in most
states do not let students drop out until their sixteenth or seventeenth birthday.1 Their
results were meaningful: the study found that students born in the first quarter of the year
have a slightly lower average level of education than children born later in the year. A&K
conducted analyses to provide evidence that this “quarter of birth effect” was due to
compulsory education (i.e. high school) rather than voluntary education (i.e. college).
They also performed a number of variations on their regression analyses of season of
birth and returns to education, which we will explore in this paper. Finally, they used
instruments and the 2SLS method to address endogeneity and the resulting bias in OLS.
In this paper, we will replicate the key analyses A&K conducted. We will
additionally provide some of our own extensions of their work. Like A&K, we will use
three successive birth cohorts upon which to perform our tests.
1 It is important to note that the strong legal incentives for compulsory education (truancy laws, etc.) are an important assumption here, as noted by A&K.
II. Empirical Strategies
A&K hypothesized that people born at the beginning of the year obtain less education,
and hence lower earnings, than people born later in the year. Their reasoning was the
following: most high schools require students in a given grade to have their birthday by
Jan 1 of the following year. So students who are born earlier in the year are “older” than
their classmates: they reach their sixteenth birthday sooner and can legally drop out after
completing less schooling. A&K provide evidence for this claim in a few
ways. First, they simply graphed ‘quarter and year of birth’ against ‘years of completed
education’ for all three cohorts. In order to remove the effect of natural increases in
education levels in the cohorts, they subtracted a moving average from ‘education
attained’ at each quarter of birth to obtain education differentials at each quarter in each
year. They put these differentials in a bar graph at each quarter of birth to visualize the
effects. A&K also regressed education against each quarter of birth for high school
graduates, college grads, master’s grads, and doctorate holders. This is important because
it demonstrates that the first-quarter shortfall appears only at the high school level,
which supports the claim that compulsory education has an effect on returns
to education.
A&K use several estimators to measure returns to education. They
calculate returns to education using Wald estimates and compare them with a simple
bivariate OLS regression; the Wald estimate is only slightly lower than the OLS return to
education for the “middle-aged cohort”. Another variation is the comparison of 2SLS and
OLS estimates. Here, they treat education as endogenous, so that OLS could contain bias,
and they instrument for it using dummy variable interactions. These instruments include an
interaction between quarter of birth and year of birth and an interaction between quarter of
birth and state of birth, along with numerous exogenous dummy variables to remove
distorting effects. A&K make this OLS and 2SLS comparison across all three cohorts, and
also for black males.
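The logic of the OLS and 2SLS comparison can be sketched on simulated data. The sketch below is not A&K's specification: it uses a single first-quarter indicator as the instrument, omits all controls, and every number in it (the true return of 0.07, the size of the quarter-of-birth effect, the role of unobserved "ability") is an assumption chosen only to make the contrast visible.

```python
import random


def slope(x, y):
    """OLS slope from regressing y on x with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def two_stage_least_squares(z, x, y):
    """2SLS with one instrument z for one endogenous regressor x."""
    n = len(x)
    pi = slope(z, x)                            # first stage: x on z
    mz, mx = sum(z) / n, sum(x) / n
    x_hat = [mx + pi * (zi - mz) for zi in z]   # fitted education
    return slope(x_hat, y)                      # second stage: y on fitted x


random.seed(0)
TRUE_RETURN = 0.07   # assumed for the simulation only
z, x, y = [], [], []
for _ in range(50_000):
    q1 = 1.0 if random.random() < 0.25 else 0.0   # born in the first quarter
    ability = random.gauss(0, 1)                  # unobserved, drives both x and y
    educ = 12.0 - 1.0 * q1 + ability              # effect of q1 is exaggerated
    lwage = 5.0 + TRUE_RETURN * educ + 0.5 * ability + random.gauss(0, 0.5)
    z.append(q1)
    x.append(educ)
    y.append(lwage)

beta_ols = slope(x, y)                         # biased upward by ability
beta_2sls = two_stage_least_squares(z, x, y)   # close to TRUE_RETURN
print(round(beta_ols, 3), round(beta_2sls, 3))
```

Because the simulated instrument is unrelated to ability, 2SLS recovers something close to the assumed return while OLS does not. Running the second stage by hand like this gives the right coefficient but not the right standard errors, so it is an illustration rather than a substitute for a proper 2SLS routine.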
III. Replications
Here, we discuss our replications and make comparisons with A&K’s work. All
figures and tables can be found in Appendix I.
Figures I and II
These figures show the education level by quarter and year of birth for the cohorts
born between 1930 – 1939 and 1940 – 1949. We took the mean education level of
everyone in our dataset born in each quarter. As we can see, Figures I and II visually
depict that people born in quarters 3 and 4 tend to have spikes in education level,
especially in comparison to those born in quarters 1 and 2 of the next year. This is a
meaningful visualization, and it agrees with A&K’s treatment of education level and
quarter of birth.
Figure IV
Figure IV has two parts, for two cohorts. Here, we attempt to remove trends in years of
education between different cohorts. This helps us better understand the effect of season
of birth on education level. It improves upon Figures I and II by removing the extra effect
of education trends. We calculate a ‘moving average of education’ around each quarter of
birth. Then we subtract the moving average from the mean education at each quarter to
get our differentials. Again, we see that those born in quarters 1 and 2 tend to have a
negative differential (a lower education level), unlike those born in quarters 3 and 4.
Again, our analysis agrees with A&K’s.
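The detrending step can be sketched as follows. The window used here (two quarters on each side, leaving out the current quarter) is an assumption for illustration, as are the function name `detrend` and the quarterly means, which are made up and are not taken from our data.

```python
def detrend(means, half_window=2):
    """Subtract a two-sided moving average of neighbouring quarters.

    The current quarter is left out of its own average. Positions without a
    full window on both sides get None.
    """
    differentials = []
    for i, value in enumerate(means):
        lo, hi = i - half_window, i + half_window
        if lo < 0 or hi >= len(means):
            differentials.append(None)
            continue
        neighbours = means[lo:i] + means[i + 1:hi + 1]
        differentials.append(value - sum(neighbours) / len(neighbours))
    return differentials


# Hypothetical quarterly means: a linear upward trend plus a dip of 0.1
# years for first-quarter births (indices 0, 4, 8 are Q1). Not real data.
means = [12.0 + 0.05 * t - (0.1 if t % 4 == 0 else 0.0) for t in range(12)]
diffs = detrend(means)
print(diffs[4])  # first-quarter differential, roughly -0.1
```

With a symmetric window, a linear trend cancels out of the differential, which is why the simulated first-quarter dip survives while the upward drift in education does not.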
Table I
This builds upon Figure I, but calculates differentials for both cohorts across high school
graduates, college graduates, master’s grads, and PhD holders. We regress the
differentials against each quarter of birth and provide an F-test of joint significance. The
numbers differ slightly from A&K’s, but the general trends remain the same. For
instance, A&K estimate a 12.4% lower return to education for people in the 1930 cohort born
in quarter 1, and we estimate 15.5%. However, the crux of this analysis is the F-tests. The
1930 and 1940 cohorts have a high F-statistic for total education and an even higher
F-statistic for high school graduates. The F-statistics for higher education are low and
insignificant. Both this study and A&K agree on this. This is strong evidence that points to
compulsory education as the major reason for variations in returns to education.
Figure V
Figure V uses a methodology similar to that of Figures I and II, except that now we are
plotting quarters and years of birth against the natural logarithm of weekly earnings. Essentially,
we are seeing if season of birth influences weekly earnings later in life. Again, we see
that those born in quarters 1 and 2 have weekly earnings less than those born later in the
year. A note on data: we use the “older” cohort here in order to remove effects of
increasing earnings in a younger population. Again, we agree with A&K’s results here.
Table III
This table employs a Wald estimate: the difference in mean log weekly earnings between
those born in quarter 1 and those born in quarters 2, 3, and 4, divided by the corresponding
difference in mean years of education. We do this because, following A&K, we want to find
the return to education based on season of birth. The Wald estimate is lower than the OLS
estimate for the 1920 cohort, and higher than the OLS estimate for the 1930 cohort.
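As a check on the arithmetic, the Wald estimate can be recomputed from the group means reported in Table III (Appendix I). The function name `wald_estimate` is ours; the inputs are the rounded means from the table, so the results match the reported estimates only up to rounding.

```python
def wald_estimate(wage_q1, wage_rest, educ_q1, educ_rest):
    """Ratio of the earnings gap to the education gap between the two groups."""
    return (wage_q1 - wage_rest) / (educ_q1 - educ_rest)


# Group means as reported in Table III (Appendix I).
wald_1920 = wald_estimate(5.148471, 5.15745, 11.3996, 11.52515)
wald_1930 = wald_estimate(5.891596, 5.902695, 12.68807, 12.79688)

# Approximately 0.0715 and 0.1020, in line with the table.
print(round(wald_1920, 4), round(wald_1930, 4))
```

Both gaps are negative (first-quarter births earn slightly less and have slightly less schooling), so their ratio is a positive return per year of education.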
Tables IV – VI
Tables IV – VI provide a comparison of OLS and 2SLS estimates (and standard errors)
for all three of our cohorts. This gives two ways of estimating the return to
education: typical OLS, and 2SLS (which uses instruments for ‘years of education’).
It’s important to note a few differences with A&K’s findings. One of our
dummy variables, SMSA (location in city), had a negative estimate in all our outputs.
Additionally, our chi-squared statistic consistently did not match A&K’s. Apart
from these notable differences and a few minor variations in our estimates, the general
trends remain consistent between studies.
Tables VII and VIII
These tables use the same method as Tables IV – VI, but with a few differences. Table
VII adds a dummy variable “state person was born in” as both an exogenous variable and
an instrument for education.2 Table VIII provides estimates for black males from
one cohort. We find that black males born in quarter 1 have a worse return to education
than the total population. This agrees with A&K.
2 Specifically, the instrument is the interaction between quarter of birth and state. They are both dummy variables.
IV. Extensions
The extensions provide us an opportunity to do a little more analysis on A&K’s
findings.
First-stage results from 2SLS Quarter of Birth – Education model
The first-stage regressions regress the endogenous variable (EDUC) on the instruments and
exogenous variables and provide an F-test for the joint significance of the instruments. For
all three cohorts, the F-statistic is under 5, which is below the common rule-of-thumb value
of 10, so we cannot rule out that the instruments are weak. This has serious implications for
the relationship, and it signals to us that
much of the analysis must depend on the strength of the instruments.
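To make the first-stage F-test concrete, here is a minimal sketch for the simplest case: education regressed on a constant and three quarter-of-birth dummies, with no other controls. In that case the fitted values are just the quarter means, so the F-statistic reduces to a comparison of between-quarter and within-quarter variation. The function name and the toy numbers are assumptions; our actual first stage also includes the exogenous variables.

```python
def first_stage_f(educ_by_quarter):
    """F-statistic for the joint significance of quarter-of-birth dummies.

    educ_by_quarter maps a quarter label to a list of years of education.
    Assumes the first stage has only a constant and the quarter dummies.
    """
    groups = list(educ_by_quarter.values())
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    # Variation explained by the dummies (between quarters).
    between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Residual variation (within quarters).
    within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    df_num = len(groups) - 1      # number of dummies being tested
    df_den = n - len(groups)      # observations minus estimated means
    return (between / df_num) / (within / df_den)


# Toy data, not from the census extract.
toy = {
    "Q1": [11, 12, 13],
    "Q2": [12, 13, 14],
    "Q3": [12, 14, 13],
    "Q4": [13, 14, 15],
}
print(first_stage_f(toy))  # 2.0, well below the rule-of-thumb value of 10
```

A small F-statistic like this one says that quarter of birth explains little of the variation in education relative to the noise, which is the weak-instrument concern raised above.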
Quarter of Birth, various relationships
We run simple bivariate regressions of quarter of birth on various variables. Extension
II can be viewed in Appendix I. Here, we regress an indicator for the first quarter of birth
on various dummies and we regress an indicator for the last quarter of birth on the same
variables. We see that these variables have little relationship with when one was born in
the year.
LIML & Fuller Estimators
LIML (limited-information maximum likelihood) is an alternative to 2SLS for estimating
one equation from a system of simultaneous equations, and it is generally regarded as less
sensitive than 2SLS to weak instruments. The Fuller estimator is a modification of LIML.
Both estimators are k-class, as is 2SLS (which sets k = 1), and the LIML value of k is
essentially the smallest attainable ratio of two sums of squared residuals, so it cannot
fall below 1. A value of k near 1 tells us that the LIML and Fuller estimates are close to
the 2SLS estimates. In Extension III (Appendix I), we provide a table of LIML and Fuller
values from the preferred 2SLS model in Tables IV – VIII. As we can see, the LIML and
Fuller values are all near 1, which strengthens the overall findings (or at least suggests
that they are not sensitive to the choice of estimator).
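The idea of a k-class estimator can be sketched for the simplest case of one endogenous regressor, one instrument, and an intercept. Setting k = 0 reproduces OLS and k = 1 reproduces 2SLS; LIML and Fuller pick k from the data, which this sketch does not attempt. The function names and the toy data are assumptions for illustration only.

```python
def demean(v):
    m = sum(v) / len(v)
    return [a - m for a in v]


def k_class(z, x, y, k):
    """k-class slope for y on x, with instrument z and an intercept.

    Uses beta(k) = (x'y - k * x'My) / (x'x - k * x'Mx) on demeaned data,
    where M projects off the instrument (and the constant).
    """
    zd, xd, yd = demean(z), demean(x), demean(y)
    szz = sum(a * a for a in zd)
    pi_x = sum(a * b for a, b in zip(zd, xd)) / szz
    res_x = [b - pi_x * a for a, b in zip(zd, xd)]   # first-stage residuals
    sxx = sum(a * a for a in xd)
    sxy = sum(a * b for a, b in zip(xd, yd))
    rxx = sum(a * a for a in res_x)
    rxy = sum(a * b for a, b in zip(res_x, yd))
    return (sxy - k * rxy) / (sxx - k * rxx)


# Toy data, not from the census extract.
z = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]        # instrument (e.g. born in Q1)
x = [12.0, 13.0, 14.0, 10.0, 11.0, 13.0]  # years of education
y = [5.2, 5.1, 5.4, 5.0, 5.1, 5.3]        # log weekly wage

print(k_class(z, x, y, 0.0))  # same as the OLS slope
print(k_class(z, x, y, 1.0))  # same as the 2SLS (IV) slope
```

With a single instrument the model is exactly identified and LIML coincides with 2SLS; with many instruments, as in Tables IV – VIII, the LIML value of k can exceed 1, and how far it does so indicates how far LIML departs from 2SLS.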
V. Conclusion
Through working on these models developed by A&K, we learned that much of
their original research can be replicated successfully. The biggest concerns for this study
are endogeneity and instrument strength. According to our extensions, the first-stage
F-tests on the 2SLS regressions raise the concern that the instruments are weak, and weak
instruments can leave 2SLS biased when education is endogenous. However, as Tables
IV – VIII demonstrate, our instruments yield high F-tests, which may mitigate this
weakness. Strong instruments are the key to eliminating the
bias implied by endogeneity. The LIML and Fuller values in Extension III, all near 1,
provide some reassurance on this point, which helps strengthen the overall relationship of
interest.
A&K provide strong evidence that season of birth has a direct relationship with
returns to education. They measure returns to education by the total amount of education
received and by weekly earnings. Further, A&K demonstrate that this relationship is
significant at the high school level. This leads them to attribute the effect to compulsory
education, which is a root assumption of their research. If strong instruments can mitigate
the primary concern of endogeneity, we will have ample evidence provided by A&K about the
relationship between season of birth and returns to education.
APPENDIX I.
Figure I.
Figure II.
Figure IV-1.
Figure IV-2.
Table I.
Figure V.
Table III.
1920 - 1929 Cohort
                 Born in Q1   Born in Q2, Q3, Q4   Difference of SEs
ln (wkly wage)   5.148471     5.15745              -0.0089789
Education        11.3996      11.52515             -0.1255553
Wald estimate    .            .                    0.0715133
OLS estimate     .            .                    0.0801112
1930 - 1939 Cohort
                 Born in Q1   Born in Q2, Q3, Q4   Difference of SEs
ln (wkly wage)   5.891596     5.902695             -0.0110989
Education        12.68807     12.79688             -0.1088179
Wald estimate    .            .                    0.101995
OLS estimate     .            .                    0.070851
Table IV. 1920 – 1929 cohort.
Table V. 1930 – 1939 cohort.
Table VI. 1940 – 1949 cohort.
Table VII. 1930 – 1939 cohort. Adjusted for states.
Table VIII. Black males. 1930 – 1939 cohort.
Extension II.
Extension III.
         Table IV   Table V    Table VI   Table VII   Table VIII
LIML     1.000099   1.000059   1.000101   1.001902    1.010613
Fuller   1          1          1          1           1