Replicating Angrist & Krueger: Results & Further Analysis

Ankur Singh Chawla

December 14, 2015

I. Summary

Angrist & Krueger (hereafter referred to as A&K) published an important paper in

1991 studying school attendance and returns to education. In this seminal paper, entitled

“Does Compulsory School Attendance Affect Schooling and Earnings?”, they studied

whether compulsory schooling in high school has an effect on both amount of school

completed and earnings later in life. The paper used a unique methodology: it focused on

students born at different times of the year, under the assumption that high schools in most

states do not let students drop out until their sixteenth or seventeenth birthday.1 Their

results were meaningful: the study found that students born in the first quarter of the year

have a slightly lower average level of education than children born later in the year. A&K

conducted analyses to provide evidence that this “quarter of birth effect” was due to

compulsory education (i.e. high school) rather than voluntary education (i.e. college).

They also performed a number of variations on their regression analyses of season of

birth and returns to education, which we will explore in this paper. Finally, they used

instrumental variables and two-stage least squares (2SLS) to address the endogeneity of education and the resulting bias in OLS.

In this paper, we will replicate the key analyses A&K conducted. We will

additionally provide some of our own extensions of their work. Like A&K, we will use

three successive birth cohorts upon which to perform our tests.

1 The existence of strong legal incentives for compulsory education (truancy laws, etc.) is an important assumption here, as noted by A&K.

II. Empirical Strategies

A&K hypothesized that returns to education (measured here by years of schooling completed and by later earnings) are lower for people born at the

beginning of the year. Their reasoning was the following: most schools require

students entering a given grade to have reached the required age by January 1 of that school year. So

students who are born earlier in the year are “older” than their classmates: they reach their sixteenth

birthday, and can legally drop out, after completing less schooling than their peers. A&K provide evidence for this claim in a few

ways. First, they simply graphed ‘quarter and year of birth’ against ‘years of completed

education’ for all three cohorts. In order to remove the effect of natural increases in

education levels in the cohorts, they subtracted a moving average from ‘education

attained’ at each quarter of birth to gain education differentials at each quarter in each

year. They put these differentials in a bar graph at each quarter of birth to visualize the

effects. A&K also regressed education on quarter-of-birth dummies for high school

graduates, college graduates, master’s graduates, and doctorate holders. This is important because

it demonstrates that the returns to education are lower in the first quarter only for high

school graduates, which supports the claim that compulsory education has an effect on returns

to education.

A&K use a series of variations on OLS to estimate returns to education. They

calculate returns to education using a Wald estimate and compare it with the estimate from a simple bivariate

OLS regression; the Wald estimate is only slightly lower than the OLS return to education for the “middle-aged

cohort”. Another variation is the comparison of 2SLS and OLS estimates. Here, they

treat education as endogenous, so that the OLS estimate could be biased. They instrument for it with various

dummy variable interactions. These instruments include an interaction

between quarter of birth and year of birth and an interaction between quarter of birth and

state of birth, along with numerous exogenous dummy variables that control for distorting

effects. A&K make this OLS and 2SLS comparison across all three cohorts, and also separately

for black males.

III. Replications

Here, we discuss our replications and make comparisons with A&K’s work. All

figures and tables can be found in Appendix I.

Figures I and II

These figures show us the education level of each quarter and year of birth for the cohorts

born in 1930 – 1939 and 1940 – 1949. We took the mean education level of

everyone in our dataset born in each quarter. As we can see, Figures I and II visually

depict that people born in quarters 3 and 4 tend to have spikes in education level,

especially in comparison to those born in quarters 1 and 2 of the next year. This is a

meaningful visualization, and it agrees with A&K’s treatment of education level and

quarter of birth.

Figure IV

Figure IV has two parts, for two cohorts. Here, we attempt to remove trends in years of

education between different cohorts. This helps us better understand the effect of season

of birth on education level. It improves upon Figures I and II by removing the extra effect

of education trends. We calculate a ‘moving average of education’ around each quarter of

birth. Then we subtract the moving average from the mean education at each quarter to

get our differentials. Again, we see that those born in quarters 1 and 2 tend to have a negative

differential (lower education level) relative to those born in quarters 3 and 4. Our

analysis agrees with A&K’s here as well.
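
The differential calculation can be sketched in Python as follows. This is an illustration under stated assumptions rather than our exact procedure: the window of two quarters on either side (excluding the quarter itself) is an assumption, the function name detrend is hypothetical, and the input means are made-up values, not figures from our dataset.

```python
def detrend(means, window=2):
    """Subtract a centered moving average from each quarterly mean.

    The moving average uses `window` quarters on each side and excludes
    the quarter itself (an assumed window, see the note above). Quarters
    too close to either end of the series are dropped.
    """
    differentials = []
    for i in range(window, len(means) - window):
        neighbors = means[i - window:i] + means[i + 1:i + window + 1]
        moving_average = sum(neighbors) / len(neighbors)
        differentials.append(means[i] - moving_average)
    return differentials


# Hypothetical mean years of education by quarter of birth (illustrative only).
means = [12.0, 12.1, 12.3, 12.4, 12.2, 12.3, 12.5, 12.6]
diffs = detrend(means)
```

A series that rises by the same amount every quarter has differentials of zero, so any remaining pattern reflects season of birth rather than the overall trend.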

Table I

This builds upon Figure I, but calculates differentials for both cohorts across high school

graduates, college graduates, master’s graduates, and PhD holders. We regress the

differentials on quarter-of-birth dummies and provide an F-test of joint significance. The

numbers differ slightly from A&K’s but all the general trends remain the same. For

instance, A&K estimate a 12.4% lower return to education for people in the 1930 cohort born in

quarter 1, and we estimate 15.5%. However, the crux of this test is the F-statistics. The 1930

and 1940 cohorts have a high F-statistic for total education and an even higher F-statistic for high

school graduates. The F-statistics for higher education are low and insignificant. Both this

study and A&K agree on this. This is strong evidence that points to compulsory

education as the major reason for variations in returns to education.

Figure V

Figure V uses a methodology similar to that of Figures I and II, except now we are comparing

quarters and years of birth against the natural logarithm of weekly earnings. Essentially,

we are seeing if season of birth influences weekly earnings later in life. Again, we see

that those born in quarters 1 and 2 have weekly earnings less than those born later in the

year. A note on data: we use the “older” cohort here in order to remove effects of

increasing earnings in a younger population. Again, we agree with A&K’s results here.

Table III

This table employs a Wald estimate: the difference in mean log weekly wages between people born in

quarter 1 and people born in quarters 2 – 4, divided by the difference in their mean years of education. We do this because, following A&K, we want to find the return to education

based on season of birth. The Wald estimate is lower than the OLS estimate for the 1920

cohort, and higher than the OLS estimate for the 1930 cohort.
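
As a quick check of the arithmetic, the Wald estimates can be recomputed from the group means reported in Table III (Appendix I). The function name wald_estimate is our own illustrative choice. Because the means in the table are rounded, the results agree with the tabulated Wald estimates only approximately.

```python
def wald_estimate(wage_q1, wage_rest, educ_q1, educ_rest):
    """Ratio of the Q1-vs-rest gap in mean log wages to the gap in mean education."""
    return (wage_q1 - wage_rest) / (educ_q1 - educ_rest)


# Group means copied from Table III (Appendix I).
wald_1920 = wald_estimate(5.148471, 5.15745, 11.3996, 11.52515)
wald_1930 = wald_estimate(5.891596, 5.902695, 12.68807, 12.79688)

print(round(wald_1920, 4), round(wald_1930, 4))
```

Both wage and education are lower for the first-quarter group, so the two negative differences give a positive estimated return.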

Tables IV – VI

Tables IV – VI provide a comparison of OLS and 2SLS estimates (and standard errors)

for all three of our cohorts. They show two ways of estimating the return to

education: ordinary OLS, and 2SLS, which uses instruments for ‘years of education’.

It’s important to note a few differences from A&K’s findings. One of our

dummy variables, SMSA (location in city), had a negative estimate in all our outputs.

Additionally, our chi-squared statistic consistently did not match A&K’s. Apart

from these notable differences and a few minor variations in our estimates, the general

trends remain consistent between the two studies.
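
To illustrate why the OLS and 2SLS columns can differ, the following Python sketch compares the two estimators on simulated data. Everything in it is an assumption made for illustration: the data are synthetic, the single binary instrument stands in for a quarter-of-birth dummy, the unobserved "ability" term is a made-up confounder, and the true return is set to 0.10. It is not the specification used in Tables IV – VI.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000

# Synthetic data (all values are made up for illustration).
z = rng.integers(0, 2, size=n).astype(float)   # hypothetical instrument, e.g. born in Q1
ability = rng.normal(size=n)                   # unobserved confounder
educ = 12.0 - 1.0 * z + ability + rng.normal(size=n)
log_wage = 5.0 + 0.10 * educ + 0.5 * ability + rng.normal(size=n)


def ols(y, X):
    """Least-squares coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


const = np.ones(n)
X = np.column_stack([const, educ])
Z = np.column_stack([const, z])

# OLS: biased upward here because ability raises both education and wages.
beta_ols = ols(log_wage, X)

# 2SLS: first stage predicts education from the instrument,
# second stage regresses log wages on the predicted education.
educ_hat = Z @ ols(educ, Z)
beta_2sls = ols(log_wage, np.column_stack([const, educ_hat]))

print("OLS return: ", beta_ols[1])
print("2SLS return:", beta_2sls[1])
```

Running the second stage by hand gives the 2SLS point estimate but not correct standard errors, so a dedicated IV routine would be needed for inference.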

Tables VII and VIII

These tables use the same method as Tables IV – VI, but with a few differences. Table

VII adds a dummy variable “state person was born in” as both an exogenous variable and

an instrument for education.2 Table VIII provides estimates for black males from

one cohort. We find that black males born in quarter 1 have a worse return to education

than the total population. This agrees with A&K.

2 Specifically, the instrument is the interaction between quarter of birth and state. Both are dummy variables.

IV. Extensions

The extensions provide us an opportunity to do a little more analysis on A&K’s

findings.

First-stage results from 2SLS Quarter of Birth – Education model

The first-stage regressions regress the endogenous variable (EDUC) on the instruments and exogenous variables and provide an F-test of the

joint significance of the instruments. For all three cohorts, the F-statistic is under 5, below the conventional rule-of-thumb threshold of 10, which suggests that the

instruments are weak. This has serious implications for the relationship, and it signals to us that

much of the analysis must depend on the strength of the instruments.
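
A first-stage F-statistic of this kind can be sketched in Python by comparing the residual sum of squares with and without the instruments. The function name first_stage_f, the simulated data, and the choice of three quarter-of-birth dummies as instruments are all assumptions made for illustration. This is not the code behind our reported statistics.

```python
import numpy as np


def first_stage_f(educ, instruments, controls):
    """F-statistic for the joint significance of the instruments.

    Compares the regression of educ on the controls alone (restricted)
    with the regression on controls plus instruments (unrestricted).
    """
    def ssr(X):
        resid = educ - X @ np.linalg.lstsq(X, educ, rcond=None)[0]
        return float(resid @ resid)

    unrestricted = np.column_stack([controls, instruments])
    q = instruments.shape[1]                     # number of instruments tested
    df = len(educ) - unrestricted.shape[1]       # residual degrees of freedom
    ssr_r, ssr_u = ssr(controls), ssr(unrestricted)
    return ((ssr_r - ssr_u) / q) / (ssr_u / df)


# Synthetic example: quarter of birth shifts education only slightly.
rng = np.random.default_rng(1)
n = 5_000
qob = rng.integers(1, 5, size=n)
instruments = np.column_stack([(qob == q).astype(float) for q in (1, 2, 3)])
controls = np.ones((n, 1))                       # intercept only
educ = 12.0 - 0.1 * instruments[:, 0] + rng.normal(scale=3.0, size=n)

f_stat = first_stage_f(educ, instruments, controls)
print("first-stage F:", f_stat)
```

When the instruments move education only a little relative to its overall variation, as in this simulated example, the statistic tends to be small.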

Quarter of Birth, various relationships

We run simple bivariate regressions of quarter of birth on various variables. Extension

II can be viewed in Appendix I. Here, we regress an indicator for the first quarter of birth on various

dummies and we regress an indicator for the last quarter of birth on the same variables. We see that these

variables have little effect on when one was born in the year.

LIML & Fuller Estimators

LIML (limited information maximum likelihood) is an alternative to 2SLS for estimating one equation from a system of simultaneous equations.

Like 2SLS, it allows for endogenous variables on the right side of a regression. The Fuller

estimator is a modification of LIML. Both estimators are k-class, and the k value for LIML is essentially

a ratio of sums of squared residuals from the equations in the model; a k-class estimator with k = 1 is 2SLS. A value near

1 therefore tells us that the LIML and 2SLS estimates nearly coincide. In Extension

III (Appendix I), we provide a table of LIML and Fuller values from the preferred

2SLS model in Tables IV – VIII. As we can see, the LIML and Fuller values are all

near 1, which strengthens the overall findings (or at least provides some evidence for

unbiasedness).
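
The k-class family can be sketched in Python as below. This is a minimal illustration on simulated data, not the routine that produced Extension III: the function names (residualize, k_class, liml_kappa) are hypothetical, the data are made up, and the Fuller adjustment shown (subtracting a / (n − L) with a = 1, where L is the number of columns of the instrument matrix) is one common form that we assume here.

```python
import numpy as np


def residualize(A, B):
    """Residuals from least-squares regressions of the columns of A on B."""
    return A - B @ np.linalg.lstsq(B, A, rcond=None)[0]


def k_class(y, X, Z, k):
    """k-class estimate: solves X'(I - k M_Z) X b = X'(I - k M_Z) y."""
    MX = residualize(X, Z)                       # M_Z X
    lhs = X.T @ X - k * (X.T @ MX)
    rhs = X.T @ y - k * (MX.T @ y)
    return np.linalg.solve(lhs, rhs)


def liml_kappa(y, endog, exog, Z):
    """Smallest eigenvalue of (W' M_Z W)^-1 (W' M_exog W), with W = [y, endog]."""
    W = np.column_stack([y, endog])
    W_exog = residualize(W, exog)                # controls partialled out
    W_all = residualize(W, Z)                    # controls and instruments partialled out
    ratio = np.linalg.solve(W_all.T @ W_all, W_exog.T @ W_exog)
    return float(np.min(np.linalg.eigvals(ratio).real))


# Synthetic data (made up): three quarter-of-birth dummies as instruments.
rng = np.random.default_rng(2)
n = 10_000
qob = rng.integers(1, 5, size=n)
dummies = np.column_stack([(qob == q).astype(float) for q in (1, 2, 3)])
ability = rng.normal(size=n)
educ = 12.0 - 0.5 * dummies[:, 0] + ability + rng.normal(size=n)
log_wage = 5.0 + 0.10 * educ + 0.5 * ability + rng.normal(size=n)

exog = np.ones((n, 1))                           # intercept only
X = np.column_stack([exog, educ])
Z = np.column_stack([exog, dummies])

kappa = liml_kappa(log_wage, educ, exog, Z)
fuller_k = kappa - 1.0 / (n - Z.shape[1])        # assumed Fuller constant a = 1

beta_ols = k_class(log_wage, X, Z, 0.0)          # k = 0 reproduces OLS
beta_2sls = k_class(log_wage, X, Z, 1.0)         # k = 1 reproduces 2SLS
beta_liml = k_class(log_wage, X, Z, kappa)
beta_fuller = k_class(log_wage, X, Z, fuller_k)
print("kappa:", kappa)
```

Because k = 1 gives 2SLS, a LIML k value close to 1 means the LIML and 2SLS coefficients will be close to each other.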

V. Conclusion

Through working on these models developed by A&K, we learned that much of

their original research can be replicated successfully. The biggest concern for this study is

endogeneity and instrument strength. According to our extensions, there is clear cause for

concern based on our first-stage F-tests on the 2SLS regressions (weak instruments are a source

of bias in 2SLS). However, as Tables IV – VIII demonstrate, our instruments yield high

F-tests, which may mitigate this weakness. Strong instruments are the key to eliminating

the bias implied by endogeneity. The LIML and Fuller estimates in Extension III actually provide

some evidence of unbiasedness, which helps strengthen the overall relationship of

interest.

A&K provide strong evidence that season of birth has a direct relationship with

returns to education. They define returns to education as total amount of education

received and weekly earnings. Further, A&K demonstrate that this relationship is

significant in high school. This leads them to claim that compulsory education

is a root assumption of their research. If strong instruments can mitigate the primary

concern of endogeneity, we will have ample evidence provided by A&K about the

relationship between season of birth and returns to education.

APPENDIX I.

Figure I.

Figure II.

Figure IV-1.

Figure IV-2.

Table I.

Figure V.

Table III.

1920 – 1929 Cohort

                  Born in Q1    Born in Q2, Q3, Q4    Difference
ln (wkly wage)    5.148471      5.15745               -0.0089789
Education         11.3996       11.52515              -0.1255553
Wald estimate     .             .                     0.0715133
OLS estimate      .             .                     0.0801112

1930 – 1939 Cohort

                  Born in Q1    Born in Q2, Q3, Q4    Difference
ln (wkly wage)    5.891596      5.902695              -0.0110989
Education         12.68807      12.79688              -0.1088179
Wald estimate     .             .                     0.101995
OLS estimate      .             .                     0.070851

Table IV. 1920 – 1929 cohort.

Table V. 1930 – 1939 cohort.

Table VI. 1940 – 1949 cohort.

Table VII. 1930 – 1939 cohort. Adjusted for states.

Table VIII. Black males. 1930 – 1939 cohort.

Extension II.

Extension III.

          Table IV    Table V     Table VI    Table VII   Table VIII
LIML      1.000099    1.000059    1.000101    1.001902    1.010613
Fuller    1           1           1           1           1