COARSENING BIAS: HOW INSTRUMENTING FOR COARSENED TREATMENTS UPWARDLY BIASES INSTRUMENTAL VARIABLE ESTIMATES

JOHN MARSHALL∗

JANUARY 2015

Political scientists increasingly use instrumental variable (IV) methods, and must often choose between operationalizing their endogenous treatment variable as discrete or continuous. For theoretical and data availability reasons, researchers frequently coarsen treatments with multiple intensities (e.g. treating a continuous treatment as binary). I demonstrate that such coarsening can substantially upwardly bias IV estimates by subtly violating the exclusion restriction. However, I show that using a treatment where multiple intensities are affected by the instrument, even when a fine-grained measure of all intensities is unavailable, recovers a consistent causal estimate. These important analytical insights are illustrated using simulations and in the context of identifying the long-run effect of high school education on voting Conservative in Great Britain. Exploiting a major school leaving age reform to instrument for schooling, I demonstrate that coarsening years of schooling into a dummy for completing high school upwardly biases the nevertheless large IV estimate by a factor of three.

∗PhD candidate, Department of Government, Harvard University. [email protected]. I thank John Bullock, Anthony Fowler, Andy Hall, Torben Iversen, Rakeen Mabud, Horacio Larreguy, Brandon Stewart, Dustin Tingley and Tess Wise for illuminating discussions or useful comments.


1 Introduction

Instrumental variable (IV) techniques are now a standard part of the political scientist’s methodological toolkit. IV analyses have illuminated complex relationships such as the effects of democracy on economic development (Acemoglu, Johnson and Robinson 2001), international trade agreements on foreign direct investment (Buthe and Milner 2008), and campaign spending on election outcomes (Gerber 1998). Given that an appropriate instrument can identify important causal relationships that cannot be easily disentangled, it is not surprising to find that the number of articles published in the American Journal of Political Science and the American Political Science Review using IV techniques has almost doubled over the last decade (see Figure 1).

[Figure 1 about here]

Best practice for using IV methods is now receiving greater scrutiny (e.g. Angrist and Pischke 2008; Sovey and Green 2011). However, this article highlights a previously unrecognized but potentially severe source of bias: how coarsening a continuous or multi-valued (endogenous) treatment variable can substantially upwardly bias IV estimates.1 Given that 36% of AJPS and APSR publications since 2005 (that use IV methods) instrument for binary treatments or ordinal treatments with few categories, this risk of bias is highly relevant.2 Such coarsening may appear appealing when: (1) the researcher believes that coarsening aids interpretation, or avoids imposing linearity on a statistical relationship; (2) theory suggests that the treatment effect may be non-linear; or (3) more granular measures of the treatment are unavailable. However, using formal proofs, simulations and an empirical application, this article demonstrates that coarsening can cause large biases in the IV context.

1Like Angrist, Imbens and Rubin (1996), I refer to the endogenous variable as the “treatment”. Although it is not random, the predicted values from the first stage are effectively random.

2I count ordinal treatments with five or fewer categories.

[Plot omitted. Y-axis: Number of articles published; x-axis: Year (1986 to 2014); series: AJPS, APSR.]

Figure 1: Annual trends in the usage of instrumental variable techniques in political science

Notes: Counts are based on data provided by Allison Carnegie (from Sovey and Green 2011) and the author’s own reading of AJPS and APSR articles. The APSR count for 2014 includes only the first three (of four) editions of each journal. Any reference to implementing an IV technique is included.

I first analytically characterize coarsening bias and illustrate the results using Monte Carlo simulations, before leveraging a natural discontinuity in the school leaving age to illustrate its extent in the context of identifying high school’s effect on vote choice in Great Britain. I show that instrumenting for completing high school over-estimates the nevertheless large causal effect of late high school education on voting Conservative later in life by a factor of three. In general, these results suggest that coarsening bias can explain why IV estimates are often orders of magnitude larger than the analogous OLS or reduced form estimates. However, I show that using a linear measure of the treatment, which consistently and robustly recovers a causal effect weighting the causal effect at each intensity by the relative proportion of subjects induced to reach that intensity by the instrument, represents an appealing alternative. To avoid drawing false inferences, this article thus demonstrates that it is imperative that political scientists become aware of how coarsening bias can substantially inflate IV estimates.

This article’s primary theoretical contribution characterizes the potential biases associated with coarsening an (endogenous) treatment variable in an IV analysis. For ease of exposition, I focus on instrumenting for a treatment intensity (a treatment that takes multiple values) that is coded as a binary variable.3 Intuitively, coarsening bias arises when an instrument affects the intensity of the underlying treatment, which in turn affects the outcome of interest, in a way that is not registered when the treatment is operationalized as a binary variable. For example, a dummy for completing high school would fail to register the effects of school leaving laws that encouraged some students to stay in school for an additional year without completing high school. By grouping together multiple years of schooling (or treatment intensities) that each affect the outcome, coarsening falsely creates the impression that the sum of the effects at each intensity can be attributed to completing high school. Coarsening bias therefore violates the exclusion restriction underpinning IV estimation (e.g. Angrist, Imbens and Rubin 1996), because the instrument affects the outcome through an avenue not captured by the dichotomous treatment variable in the first stage.

3The argument equally applies when only a single value of an ordinal treatment is affected by the instrument because the treatment effectively serves as a dummy variable.

Coarsening bias is especially large when: (1) any change in treatment intensity affects the outcome, e.g. if the treatment’s true effect is linear; and (2) the instrument has large effects on intensities other than the coarsened threshold. In general, only when the true effect of a treatment is discontinuous (among the intensities affected by the instrument) and the researcher is able to both recognize and measure the value of the treatment where this occurs, will the IV estimate associated with a coarsened treatment variable be unbiased. Together, these requirements constitute a significantly more demanding assumption that I refer to as the strong exclusion restriction. In many applications, this is unlikely to hold.

Both observational and experimental data are vulnerable to such coarsening bias. For example, in the case of Pierskalla and Hollenbach (2013), who use local communication regulations to instrument for an indicator of local cell phone coverage, favorable regulations could increase cell phone usage without affecting the services used to define their treatment. Lacking the correct first stage could explain why their IV estimates are 20 times larger than their OLS estimates. Similarly, Milligan, Moretti and Oreopoulos (2004) use the compulsory school laws of U.S. states to instrument for completing high school, and find that completing high school increases turnout by 30-40 percentage points and the likelihood that an individual follows politics by 40-85 percentage points. These enormous estimates would be upwardly biased if the instrument increases education levels below the point of completing high school that also increase these outcomes. In experimental studies, the extent of coarsening bias depends upon the type of endogenous variable that a randomized instrument affects. For Gerber, Huber and Washington (2010), who in one specification instrument for an individual’s partisan identification and find effects 15 times larger than their corresponding OLS estimates, upward bias may occur if their randomized mailing causes voters to move toward a political party without passing the threshold required to register a new partisan identification.4 Conversely, experiments inducing respondents to take up truly binary treatments are not affected by this bias. For example, get-out-the-vote canvassing is unlikely to impact respondents that did not answer the door (e.g. Gerber and Green 2000).

4Gerber, Huber and Washington (2010) also consider another specification where partisanship is coded using a 7-point scale. This is unlikely to be biased.

Fortunately, I show that a consistent causal estimate can still be recovered without requiring that the strong exclusion restriction holds. In particular, provided that the treatment can be coded as a multi-valued variable where the instrument affects multiple treatment intensities, two stage least squares (2SLS) consistently estimates the local average per-unit treatment effect: the average causal effect of a unit increase in the treatment among compliers, who only received a particular treatment intensity because of the instrument. The formal analysis presents several important justifications for using this approach, which is already standard practice when the treatment is believed to linearly affect the outcome. First, by not coarsening the treatment, coarsening bias cannot occur. Second, although an average across intensities (weighted by the proportion of compliers at each intensity) may not be the ideal quantity of interest, it always recovers a consistent estimate even when the treatment is non-linearly related to the outcome. Third, since coding the treatment as a linear object always yields an estimate no larger than coarsening, this approach is more conservative. Fourth, by not requiring the strong exclusion restriction, it does not require that the researcher correctly specifies the treatment’s functional form. Finally, under reasonable conditions this approach is robust even without observing all treatment intensities.

I illustrate the extent of coarsening bias in the context of identifying the causal effect of high school education on vote choice in Great Britain. To address the selection concern that certain types of individuals receive more education (e.g. Kam and Palmer 2008), I leverage a regression discontinuity (RD) design comparing the first cohorts to be affected by Britain’s 1947 school leaving age reform, which raised the minimum leaving age from 14 to 15, to cohorts slightly too old to be affected. The reform induced around 40% of students to stay in school until 15, but also induced 15% to go on to complete high school at age 16. Using this reform to instrument for completing high school is thus liable to over-estimate high school’s political effects if an additional year of education at age 15 (without completing high school) has downstream effects on political preferences. However, instrumenting for years of schooling consistently estimates the complier-weighted causal effect of an additional year of late high school.

The results show that while receiving late high school education does significantly increase the likelihood that an individual votes for the Conservative party later in life, coarsening substantially upwardly biases the IV estimate for completing high school. Instrumenting for years of schooling in the context of a “fuzzy” RD, an additional year of late high school education increases the probability of voting Conservative by 12 percentage points. However, using the 1947 reform to instead instrument for completing high school suggests that completing high school somewhat implausibly increases Conservative support by 46 percentage points. A decomposition analysis, which ensures that this large estimate cannot be accounted for by unusual compliers, demonstrates that coarsening is responsible for upwardly biasing the estimate for completing high school by a factor of three.

This paper is organized as follows. Section 2 formally characterizes coarsening bias and shows how a linear measure of the treatment can consistently estimate a weighted causal effect. Section 3 uses simulations to demonstrate the analytical results. Section 4 estimates the effect of an additional year of schooling on Conservative voting in Britain, and calculates the bias arising from using a dichotomous measure of high school. Section 5 concludes.

2 Formal analysis of IV with coarsened treatments

This section demonstrates the upward bias of coarsening an endogenous treatment intensity, focusing on the simplest case where there is a binary instrument and the treatment is coarsened into a binary indicator.5 After briefly reviewing the IV assumptions in the heterogeneous potential outcomes framework (see Angrist, Imbens and Rubin 1996), I first show how coarsening bias arises from a subtle exclusion restriction violation, before analyzing how the extent of coarsening bias depends upon the relationship between the treatment and the outcome. Finally, I provide justifications for a simple linear operationalization of the treatment intensity.

5All results naturally extend to multi-valued instruments, multiple instruments, multi-valued coarsened treatments, and the inclusion of control variables.

2.1 IV framework review

Instrumental variable techniques are designed to address the concern that a treatment may be correlated with an omitted variable that affects the outcome of interest. An instrumental variable (which is correlated with the treatment, but does not directly affect the outcome) is used to instrument for the endogenous treatment in order to isolate an exogenous effect of the treatment. Since only variation in the treatment induced by an effectively random instrument is exploited, the causal effect identified only pertains to compliers: observations that only received a particular treatment intensity because they received the instrument.

Formally, denote the instrument, for each observation $i \in N \equiv \{1, \dots, n\}$, as $Z_i \in \{0,1\}$. The observed endogenous treatment intensity of observation $i$, $T_i \in \{1, \dots, J\}$, takes one of $J$ ordered values/intensities. $Y_i$ is $i$’s observed outcome of interest. We first assume that the stable unit treatment value assumption (SUTVA) holds, which requires that potential outcomes are independent of the instruments and treatments received by other individuals. Given SUTVA, $T_i(Z_i = z)$ denotes $i$’s potential treatment intensity conditional on receiving instrument $Z_i = z$, while $Y_i(Z_i = z, T_i(z) = t)$ correspondingly denotes $i$’s potential outcome conditional on receiving treatment intensity $T_i = t$ and instrument $Z_i = z$.6


6Observed outcomes relate to potential outcomes through $T_i = Z_i T_i(1) + (1 - Z_i) T_i(0)$ and $Y_i = Z_i Y_i(1, T_i(1)) + (1 - Z_i) Y_i(0, T_i(0))$.

In addition to SUTVA (A1), IV estimation requires four additional assumptions. First, assume that $Z_i$ is randomly assigned (A2), and thus that the instrument is independent of potential outcomes and potential treatment intensities. Second, assume that there exists a first stage (A3), such that the instrument affects the intensity of the treatment $i$ receives. Third, monotonicity (A4) requires that, for all individuals, the instrument either never decreases the treatment or never increases the treatment. Fourth, the weak exclusion restriction requires that $Z_i$ only affects $Y_i$ through the treatment $T_i$ (A5). Consequently, $Y_i(z, t) = Y_i(t)$ for any $z$. The diagram in Figure 2(a) represents this weak exclusion restriction, marked by an “X”, graphically. The weakness of this assumption when $T_i$ is coarsened is explained below. These standard assumptions, which are explained in greater detail elsewhere (e.g. Angrist, Imbens and Rubin 1996; Angrist and Pischke 2008; Sovey and Green 2011), are formalized below.

[Diagram omitted. Panel (a), weak exclusion restriction: nodes Instrument (Z), Treatment (T) and Outcome (Y), with an “X” marking the excluded path. Panel (b), strong exclusion restriction: nodes Instrument (Z), Coarsened Treatment (D), Outcome (Y) and Other treatment intensities (t ≠ k), with “X” marks on the excluded paths.]

Figure 2: Graphical representation of weak and strong exclusion restrictions

A1. Stable unit treatment value assumption: for all $i \in N$,

(a) $T_i(Z_i = z, Z_{-i} = w) = T_i(Z_i = z, Z_{-i} = w')$ for all $z$, $w$, and $w'$,

(b) $Y_i(Z_i = z, T_i(z) = t, Z_{-i} = w, T_{-i}(z) = v) = Y_i(Z_i = z, T_i(z) = t, Z_{-i} = w', T_{-i}(z) = v')$ for all $z$, $w$, $w'$, $v$ and $v'$,

where $Z_{-i}$ and $T_{-i}$ represent the instrument and treatment assignments for all observations except $i$.

A2. Instrument independence: for all $i \in N$, the potential outcomes $Y_i(z, t)$ (for every $z$ and $t$) and the potential treatment intensities $T_i(0)$ and $T_i(1)$ are jointly independent of $Z_i$.

A3. First stage: $E[T_i \mid Z_i = 1] - E[T_i \mid Z_i = 0] \neq 0$.

A4. Monotonicity: either $T_i(1) - T_i(0) \geq 0$ for all $i \in N$, or $T_i(1) - T_i(0) \leq 0$ for all $i \in N$.

A5. Weak exclusion restriction: for all $i \in N$, $Y_i(z, t) = Y_i(z', t)$ for any $t$ and all $z$ and $z'$.

Without loss of generality, I analyze the case where $T_i(1) - T_i(0) \geq 0$.

2.2 Formal demonstration of coarsening bias

Coarsening bias occurs when the researcher, whether by choice or constrained by data availability, coarsens the treatment intensity. In particular, in the hope of identifying the effect of treatment intensity $k > 1$, $T_i$ is partitioned by defining the indicator $D_{ik} \equiv 1(T_i \geq k)$. This binary variable, which could arise from coarsening years of schooling into completing high school, indicates whether an individual receives at least treatment intensity $k$.

The researcher seeking to identify the effect of obtaining $T_i = k$ is thus interested in estimating the following causal quantity:

$$\beta_k \equiv E[Y_i(k) - Y_i(k-1) \mid T_i(1) \geq k > T_i(0)]. \tag{1}$$

$\beta_k$ is the local average treatment effect (LATE) of obtaining intensity $k$ beyond only obtaining the preceding level $k-1$ for instrument compliers (i.e. individuals that only reached category $k$ because of the instrument). In the case of schooling, this could be the effect of completing high school beyond completing the penultimate grade of high school. To give another example, this could be the effect of identifying as a Republican partisan beyond leaning Republican.

As is well known, the IV framework seeks to estimate the effect of the endogenous binary treatment $D_{ik}$ using the following system of equations:

$$Y_i = \beta_k D_{ik} + u_i, \tag{2}$$

$$D_{ik} = \gamma Z_i + \varepsilon_i. \tag{3}$$

Equation (2) is the structural model relating $D_{ik}$ to the outcome $Y_i$, while equation (3) is the first stage regression estimating the effect of the instrument $Z_i$ on $D_{ik}$.

The IV estimator of the effect of $D_{ik}$ on $Y_i$, denoted $\beta^{IV}_k$, divides the reduced form estimate of the effect of $Z_i$ on $Y_i$ by the first stage effect of $Z_i$ on $D_{ik}$.7 Causal estimates of the reduced form and first stage are identified under assumptions A1 and A2. Using assumptions A4 and A5, the IV estimator can be expressed as the weighted sum of the causal effect for compliers moving from intensity $t-1$ to $t$ for each such interval (see the proof in the Online Appendix and Angrist and Imbens 1995):

$$\beta^{IV}_k \equiv \frac{E[Y_i \mid Z_i = 1] - E[Y_i \mid Z_i = 0]}{E[D_{ik} \mid Z_i = 1] - E[D_{ik} \mid Z_i = 0]} = \frac{\sum_{t=2}^{J} p_t \beta_t}{p_k}, \tag{4}$$

where $p_t \equiv \Pr(T_i(1) \geq t > T_i(0))$ denotes the probability that an individual only reaches treatment intensity $t$ because they received the instrument $Z_i = 1$, and thus represents the proportion of compliers at treatment intensity $t$ in the population. Analogously, $p_k = \Pr(T_i(1) \geq k > T_i(0)) = E[D_{ik} \mid Z_i = 1] - E[D_{ik} \mid Z_i = 0]$ is the relevant first stage for reaching treatment intensity $k$ because of the instrument. The estimator is well-defined under A3, provided $p_k \neq 0$. Finally, $\beta_t \equiv E[Y_i(t) - Y_i(t-1) \mid T_i(1) \geq t > T_i(0)]$ is the LATE for compliers moving from treatment intensity $t-1$ to treatment intensity $t$. Collecting the effect of the treatment among compliers (the LATE, $\beta_t$) at each intensity defines the causal response function (CRF). The CRF thus describes how each level of the treatment affects the outcome.
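The identity in equation (4) can be checked by hand on a small population. The sketch below is only an illustration: the complier types, their shares and the step effects are hypothetical values, and it assumes a homogeneous effect of each step so that the LATE at intensity t equals the assumed step effect. It computes the Wald ratio directly from potential treatments and outcomes and compares it with the right-hand side of equation (4).

```python
# Hypothetical population of types: (T_i(0), T_i(1), population share).
# Monotonicity holds because T_i(1) >= T_i(0) for every type.
TYPES = [(1, 1, 0.4), (1, 2, 0.2), (1, 3, 0.1), (2, 3, 0.3)]

# Assumed homogeneous effect of each step t-1 -> t, so beta_t = STEP_EFFECT[t].
STEP_EFFECT = {2: 0.05, 3: 0.10}


def outcome(t):
    """Potential outcome Y_i(t), normalised so that Y_i(1) = 0."""
    return sum(STEP_EFFECT[s] for s in range(2, t + 1))


def complier_share(t):
    """p_t = Pr(T_i(1) >= t > T_i(0))."""
    return sum(w for t0, t1, w in TYPES if t1 >= t > t0)


def wald_coarsened(k):
    """Reduced form divided by the first stage for D_ik = 1(T_i >= k)."""
    reduced_form = sum(w * (outcome(t1) - outcome(t0)) for t0, t1, w in TYPES)
    first_stage = sum(w * (int(t1 >= k) - int(t0 >= k)) for t0, t1, w in TYPES)
    return reduced_form / first_stage


def equation_4(k):
    """Right-hand side of equation (4): sum over t of p_t * beta_t, divided by p_k."""
    numerator = sum(complier_share(t) * STEP_EFFECT[t] for t in STEP_EFFECT)
    return numerator / complier_share(k)


print(wald_coarsened(3), equation_4(3), STEP_EFFECT[3])
```

With these assumed values both expressions equal roughly 0.1375, while the effect of reaching intensity 3 is only 0.10. Note that the type moving from intensity 1 to 3 counts towards both $p_2$ and $p_3$.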

7Adding covariates provides similar results, while multiple instruments involves weighting multiple Wald estimators (Angrist and Pischke 2008).

However, under the standard assumptions enumerated above, the IV estimator (which should return $\beta_k$, the effect of treatment intensity $k$ for compliers) is often inconsistent. Inspection of equation (4) shows that $\beta^{IV}_k$ is only equal to $\beta_k$ when $\sum_{t \neq k} p_t \beta_t = 0$. Therefore, $\beta^{IV}_k$ is consistent only in four special cases. First, when the instrument only affects reaching intensity $k$; or $p_t = 0, \forall t \neq k$. Second, when the effect at all intensities other than $k$ is zero; or $\beta_t = 0, \forall t \neq k$. Third, when one of the preceding conditions holds for each intensity $t$, ensuring that $p_t \beta_t = 0$ for all $t \neq k$. Fourth, if the direction of the effects (weighted by $p_t$) differs across intensities but the effects ultimately cancel out.

The following proposition summarizes these insights, formally proving that the coarsened IV estimator only consistently estimates the LATE of obtaining treatment intensity $k$ on $Y_i$ in the special cases enumerated above.

Proposition 1. (Coarsening bias) Assume, in addition to assumptions A1, A2, A3, A4 and A5, that $p_k > 0$ holds. The coarsened IV estimator $\beta^{IV}_k$ can be expressed as:

$$\beta^{IV}_k = \beta_k + \frac{\sum_{t \neq k} p_t \beta_t}{p_k}. \tag{5}$$

Provided $\mathrm{sign}(\beta_k) = \mathrm{sign}(\beta_t)$ for all $t \neq k$ where $p_t \neq 0$, the coarsened IV estimator accentuates the true causal effect: $|\beta_k| \leq |\beta^{IV}_k|$.

All proofs are provided in the Online Appendix.

Proposition 1 proves that, beyond the special cases enumerated above, the IV estimate resulting from the coarsening of $T_i$ is inconsistent, and thus biased even as the sample size becomes large.8 Provided that each treatment intensity causes $Y_i$ to move in the same direction, i.e. the CRF is monotonic, equation (5) demonstrates that the IV estimate is upwardly biased (in magnitude). Furthermore, coarsening bias is increasing in both $p_t/p_k$ and $|\beta_t|$, for any $t \neq k$. In other words, the bias is greater when the first stage is relatively large for intensities other than $k$ and the LATE for intensities other than $k$ is large.
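A small Monte Carlo exercise can make Proposition 1 concrete. The sketch below is not the article's simulation design: the data-generating process, the complier shares (0.3 at intensity 2 and 0.2 at intensity 3), the linear effect `tau` and the sample size are all assumptions chosen for illustration. It compares the Wald estimate that uses the coarsened indicator for reaching intensity 3 with the Wald estimate that uses the multi-valued treatment itself.

```python
import random


def simulate_wald(n=200_000, tau=1.0, k=3, seed=0):
    """Return (coarsened IV estimate, linear IV estimate) from one simulated sample."""
    rng = random.Random(seed)
    # Running totals by instrument value z: [sum of Y, sum of D_k, sum of T, count].
    totals = {0: [0.0, 0.0, 0.0, 0], 1: [0.0, 0.0, 0.0, 0]}
    for _ in range(n):
        z = rng.randint(0, 1)              # randomly assigned binary instrument
        u = rng.random()
        if u < 0.5:
            t0, t1 = 1, 1                  # unaffected by the instrument
        elif u < 0.8:
            t0, t1 = 1, 2                  # compliers at intensity 2 (p_2 = 0.3)
        else:
            t0, t1 = 2, 3                  # compliers at intensity 3 (p_3 = 0.2)
        t = t1 if z == 1 else t0
        y = tau * (t - 1) + rng.gauss(0.0, 1.0)   # linear CRF plus noise
        row = totals[z]
        row[0] += y
        row[1] += 1.0 if t >= k else 0.0
        row[2] += t
        row[3] += 1

    def diff_in_means(col):
        return totals[1][col] / totals[1][3] - totals[0][col] / totals[0][3]

    reduced_form = diff_in_means(0)
    return reduced_form / diff_in_means(1), reduced_form / diff_in_means(2)


coarsened, linear = simulate_wald()
print(f"coarsened IV: {coarsened:.2f}, linear IV: {linear:.2f}")
```

Under these assumptions equation (5) implies a population value of $\tau + (0.3/0.2)\tau = 2.5\tau$ for the coarsened estimator, so the first number should be close to 2.5 and the second close to the true per-unit effect of 1.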

8IV estimators are consistent, but biased in finite samples (e.g. Staiger and Stock 1997).

The general inconsistency of the coarsened IV estimator, which has not previously been highlighted to applied researchers, reflects a subtle exclusion restriction violation that arises from coarsening $T_i$ into $D_{ik}$. Although assumption A5 requires that $Z_i$ does not affect $Y_i$ through avenues other than $T_i$, the coarsening of $T_i$ into $D_{ik}$ allows for $T_i$ to affect $Y_i$ without going through $D_{ik}$ and without violating A5. Consequently, while any effect of $Z_i$ on $Y_i$ is registered in the reduced form estimate (the numerator of equation (4)), the first stage (the denominator) only registers cases where $Z_i$ induces $i$ to pass the threshold used to define the treatment indicator (e.g. moving from intensity $k-1$ to $k$). Coarsening bias thus arises if values of $T_i$ other than $k$, such as years of high school before completing high school, also affect $Y_i$ because such changes in the reduced form are not captured in the first stage.

Consistent estimation of the causal effect of $D_{ik}$ generally requires stronger assumptions. In particular, the strong exclusion restriction (A5*), such that $Z_i$ only affects $Y_i$ through $D_{ik}$, is sufficient.9

A5*. Strong exclusion restriction: for all $i \in N$, all $t$ such that $p_t \neq 0$, and all $z$ and $z'$,

(a) $Y_i(z, t) = Y_i(z', t')$ for all $t, t' \geq k$,

(b) $Y_i(z, t) = Y_i(z', t')$ for all $t, t' < k$.

As illustrated in Figure 2(b), this assumption is more demanding than A5 because, in addition to requiring that the instrument only affect the outcome by altering the intensity of the treatment, it also requires that the researcher coarsen the treatment such that $\beta_t = 0$ for all intensities $t \neq k$ that are affected by the instrument. In other words, the stronger assumption of A5* requires knowledge of the functional form relating the treatment to the outcome to ensure that other levels of the treatment are not also affecting the outcome.

The following proposition establishes the consistency of the coarsened IV estimator when A5* holds, and thus demonstrates the importance of the seemingly minor distinction between assumptions A5 and A5* when $T_i$ is coarsened into $D_{ik}$.

Proposition 2. (Consistency when the strong exclusion restriction holds) If $p_k > 0$ and assumptions A1, A2, A3, A4, and A5* hold, then $\beta^{IV}_k$ consistently estimates $\beta_k$.

9This captures the first, second and third of the four special cases described above. The second is probably the most empirically relevant, given that the $\beta_t$ typically all go in the same direction and most instruments induce $p_t \neq 0$ for some $t \neq k$.

The economic or political effects of high school education represent an important example where coarsening bias could arise from instrumenting for completing high school rather than a continuous measure of education. Consider an instrument that increases both the probability that individuals complete the penultimate year of high school (i.e. $p_{k-1} > 0$) and the probability that they complete high school (i.e. $p_k > 0$). Proposition 1 then shows that coarsening bias arises if the penultimate year of schooling affects the outcome of interest (e.g. $\beta_{k-1} > 0$). For outcomes like income, additional schooling could easily increase labor market returns (e.g. Becker 1993). Furthermore, if increased income affects political preferences, then estimates for political outcomes such as vote choice may also be biased. Provided we cannot find an instrument that only affects completing high school, Proposition 2 shows that consistent estimation is only possible if the penultimate year of schooling does not affect the outcome (i.e. $\beta_{k-1} = 0$).
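To see how quickly the bias in this two-intensity schooling example grows, equation (5) can be evaluated for a few values of the effect of the penultimate year. The numbers below are hypothetical and are not estimates from the article: the sketch assumes complier shares of 0.40 at intensity k-1 and 0.15 at intensity k, and a true effect of 0.12 for completing high school.

```python
def coarsened_iv_two_steps(p_prev, p_k, beta_prev, beta_k):
    """Equation (5) when the instrument moves compliers into intensities k-1 and k only."""
    if p_k <= 0:
        raise ValueError("Proposition 1 requires p_k > 0")
    bias = p_prev * beta_prev / p_k      # second term of equation (5)
    return beta_k + bias


# Hypothetical complier shares and true effect (illustrative assumptions only).
P_PREV, P_K, BETA_K = 0.40, 0.15, 0.12

for beta_prev in (0.0, 0.06, 0.12):
    estimate = coarsened_iv_two_steps(P_PREV, P_K, beta_prev, BETA_K)
    print(f"beta_(k-1) = {beta_prev:.2f} -> coarsened IV = {estimate:.2f}")
```

With these assumed inputs the coarsened estimate is 0.12 when the penultimate year has no effect (the case covered by Proposition 2), 0.28 when its effect is half as large as that of the final year, and 0.44 when the two effects are equal.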

2.3 When is coarsening bias severe?

Proposition 1 demonstrated that the extent of coarsening bias depends upon the first stage and the CRF. Specifically, bias depends upon the LATE at different treatment intensities ($\beta_t$) weighted by the relative effects of the instrument on treatment intensities not defining the coarsened treatment ($\sum_{t \neq k} p_t / p_k$). Since the CRF is almost never known in advance, it is essential to understand the types of causal relationships for which the strong exclusion restriction is tenable, and what causal quantities can be recovered when only the weak exclusion restriction is tenable.

2.3.1 Sharp jumps in the CRF

When the CRF exhibits sharp discontinuities, as exemplified in Figure 3, coarsening can be appropriate. Provided the researcher is able to correctly identify intensity $k$, the only point at which there is a (positive) causal effect in the figure, as the key jump, then $\beta_k$ can be consistently estimated when a suitable instrument ensures $p_k > 0$. This works because $\beta_t = 0$ for all $t \neq k$, and thus the coarsened IV estimator is consistent regardless of whether $p_t > 0$ for some other $t \neq k$.


[Plot omitted. X-axis: Treatment intensity (Ti), with ticks at k and k+1; y-axis: Outcome (Yi).]

Figure 3: Discontinuous causal response function

In practice, however, it is hard to know a priori whether $k$ correctly captures the true discontinuity. In general, tipping points are not straightforward to precisely theorize. In experiments where subjects cannot be partially treated it is easier to determine clear cutoffs. As noted above, in the case of Gerber and Green’s (2000) get-out-the-vote canvassing, knocking on a door is only likely to affect respondents that opened the door to receive the treatment. But even experiments can be hard to evaluate if there is partial compliance, such that individuals can experience some of the treatment without being designated as treated. This could occur, for example, if some subjects in a treatment group receive information about the treatment but do not ultimately receive the treatment.

Conversely, if the researcher incorrectly surmises that $k+1$ is the correct threshold, at best they fail to detect the existence of the effect of intensity $k$ but correctly identify no effect at $k+1$. In the simple example of Figure 3, where $\beta_t = 0$ for all $t \neq k$, the researcher correctly concludes that $\beta_{k+1} = 0$ only if their instrument does not induce subjects to reach intensity $k$. In other words, $p_k = 0$ ensures a correct estimate of a quantity that was probably not of primary interest. However, when $p_k > 0$ and $p_{k+1} > 0$, the coarsened IV estimator will produce the following inconsistent estimate of the LATE at intensity $k+1$:

$$\beta^{IV}_{k+1} = \frac{p_k \beta_k}{p_{k+1}} > 0. \tag{6}$$

Although this estimate is approximately correct in the sense that there is a causal effect nearby, it both wrongly attributes the effect to intensity $k+1$ and does not even consistently estimate $\beta_k$ unless $p_{k+1} = p_k$. This bias is increasing in the relative first stage for intensity $k$.
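Equation (6) is easy to evaluate numerically. The sketch below uses hypothetical complier shares and a hypothetical effect of 0.20 at intensity k (none of these values come from the article) to show how the estimate attributed to intensity k+1 varies with the relative first stage, even though the true effect at k+1 is zero.

```python
def misplaced_threshold_estimate(p_k, p_k_plus_1, beta_k):
    """Equation (6): coarsened IV at threshold k+1 when the only causal effect is at k."""
    if p_k_plus_1 <= 0:
        raise ValueError("the first stage at k+1 must be positive")
    return p_k * beta_k / p_k_plus_1


BETA_K = 0.20  # hypothetical effect at intensity k; the true effect at k+1 is zero

for p_k, p_next in ((0.30, 0.10), (0.10, 0.10), (0.05, 0.10)):
    estimate = misplaced_threshold_estimate(p_k, p_next, BETA_K)
    print(f"p_k = {p_k:.2f}, p_(k+1) = {p_next:.2f} -> estimate at k+1 = {estimate:.2f}")
```

Under these assumptions the estimate is three times the effect at k when $p_k = 3p_{k+1}$, coincides with it only when the two complier shares are equal, and understates it when fewer compliers reach k than k+1.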

2.3.2 Linear CRFs

When the instrument affects another intensity $t \neq k$, the coarsening bias associated with the coarsened IV estimator for category $k$ can be particularly large when the true CRF is linear. Letting the causal effect associated with each interval be $\beta_t = \tau \neq 0$, the coarsened IV estimator yields:

$$\beta^{IV}_k = \beta_k + \frac{\sum_{t \neq k} p_t}{p_k}\,\tau. \tag{7}$$

Given $\beta_k = \tau$, coarsening bias is increasing in the first stage at intensities other than $k$. In particular, more than one half of all compliers must be reaching intensity $k$ for the estimator to yield less than double the size of the true effect.10 This concern also increases with how close the treatment intensity categories are to one another (i.e. increases in the number of categories $J$), because it becomes increasingly implausible that any instrument could ensure $p_t = 0$ for all $t \neq k$, while $\sum_{t \neq k} p_t / p_k$ becomes increasingly large.

When the causal response is linear and $T_i$ is observable, an IV estimator replacing $D_{ik}$ with $T_i$ in the first stage (which is the standard IV estimator used when the treatment is regarded as continuous) is more appropriate. The first stage thus regresses $T_i$, rather than $D_{ik}$, on $Z_i$. By using $T_i$ in the first stage to re-scale the reduced form estimate, this approach instead identifies the local average per-unit treatment effect (LAPTE). This estimator is defined as:

$$\beta^{IV}_{LAPTE} \equiv \frac{E[Y_i \mid Z_i = 1] - E[Y_i \mid Z_i = 0]}{E[T_i \mid Z_i = 1] - E[T_i \mid Z_i = 0]} = \frac{\sum_{t=2}^{J} p_t \beta_t}{\sum_{t=2}^{J} p_t}. \tag{8}$$

The LAPTE is a linear approximation, where the causal effect at each treatment intensity is weighted by the share of compliers at that intensity. It is easy to see that when the true effect is $\tau$ at each interval, it is exactly recovered by $\beta^{IV}_{LAPTE}$.
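The contrast between equations (7) and (8) under a linear CRF can be sketched in a few lines. The complier shares and the per-unit effect `TAU` below are hypothetical, chosen so that intensity 3 receives exactly half of all compliers; the two helper functions simply evaluate the population expressions in equations (4) and (8).

```python
def coarsened_iv(p, beta, k):
    """Population value of the coarsened IV estimator, equation (4)."""
    return sum(p[t] * beta[t] for t in p) / p[k]


def lapte(p, beta):
    """Population value of the LAPTE, equation (8)."""
    return sum(p[t] * beta[t] for t in p) / sum(p.values())


TAU = 0.10                              # hypothetical per-unit effect (linear CRF)
P = {2: 0.10, 3: 0.20, 4: 0.10}         # hypothetical complier shares p_t
BETA = {t: TAU for t in P}              # beta_t = tau at every interval

for k in (3, 4):
    ratio = coarsened_iv(P, BETA, k) / TAU
    print(f"threshold k = {k}: coarsened IV is {ratio:.1f} times the true effect")
print(f"LAPTE = {lapte(P, BETA):.2f}")
```

With a linear CRF, equation (7) reduces to $\tau \sum_{t} p_t / p_k$, so under these assumed shares the coarsened estimate is exactly double the true effect at k = 3 and four times the true effect at k = 4, while the LAPTE returns $\tau$.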

10Note that $\frac{\sum_{t \neq k} p_t}{p_k} = \frac{p - p_k}{p_k} < 1$ only when $p_k > p/2$, where $p \equiv \sum_{t=2}^{J} p_t$.

The LAPTE estimator can be interpreted as correcting the first stage by appropriately measuring the instrument’s effect on the endogenous treatment. This approach is also more generally robust in two attractive respects, which provide new justifications for political scientists already using this approach. First, only the weaker exclusion restriction (A5) is required for consistent estimation of the LAPTE, even when the true causal effect is not linear. This is because the first stage is assigned a linear form that captures all changes in $T_i$.

Second, the linear approach can be robust even without observing all categories. If the J

observed categories represent a coarsening of the true intervals (e.g. because T is continuous), the

linear causal effect can still be recovered provided the intervals are equally spaced.

Proposition 3. (Robustness of LAPTE to missing categories) Let only J equally-spaced categories of Ti be observed when there are in fact αJ equally-spaced categories, where α > 1 is finite and αJ is an integer. Denote $\beta^{IV,J}_{LAPTE}$ and $\beta^{IV,\alpha J}_{LAPTE}$ respectively as the IV estimators in the observed sample (denoted by superscript J) and unobserved sample (denoted by superscript αJ). Let assumptions A1-A5 hold, and assume pt > 0 for at least two intensities. If the effect of Ti is linear such that $\beta^{J}_{j} = \tau$ for all intervals j, then $\beta^{IV,J}_{LAPTE} = \alpha \beta^{IV,\alpha J}_{LAPTE}$.

Consequently, obtaining the coefficient on the quantity of interest only requires an adjustment by

factor α to identify the average linear causal effect for any desired unit interval. A first stage

is required for at least two treatment intensities to ensure that the IV estimate averages across

coarsened categories; without estimates for two intensities to “draw a line through”, the estimates

would be equally susceptible to coarsening bias. If measurement of the treatment is sufficiently

coarse that the instrument only affects a single intensity, researchers may require a new dataset,

or a second dataset drawn from the same population that can separately estimate the first stage

(see Angrist and Krueger 1992; Franklin 1989). Otherwise, they should focus on the reduced form

estimates.


2.3.3 Other CRFs

In practice, the CRF is often non-linear. However, given the impracticality of implementing non-

linear IV estimation strategies,11 an important question is: when are coarsening and the LAPTE

appropriate approximations?

As demonstrated above, the coarsened IV estimate will generally be upwardly biased when the

CRF is not discontinuous at only the coarsened treatment intensity. Furthermore, it is easy to show

that coarsening always yields a coefficient at least as large as the LAPTE.12 Accordingly, if the

CRF is that in Figure 3, then the linear approach under-estimates the true causal effect at intensity

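k when βt = 0 for all t ≠ k but pt > 0 for some t ≠ k:

$$\beta^{IV}_{LAPTE} = \frac{p_k \beta_k}{\sum_{t=2}^{J} p_t} = \left( \frac{1}{\beta_k} + \frac{\sum_{t \neq k} p_t}{p_k \beta_k} \right)^{-1} \leq \beta_k \tag{9}$$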

Although the LAPTE does not consistently recover the effect at k, it remains a consistent estimate

of the complier-weighted average across the intervals. Consequently, where the instrument only affects

the first stage of interest (i.e. pt = 0, ∀t ≠ k) the LAPTE estimator yields an identical estimate

to the coarsened IV estimator. Furthermore, when the causal effect differs across intensities, such

a complier-weighted estimate is informative about the average effect in the particular sample. The

LAPTE is thus always a consistent causal estimate of a complier-weighted average, even though it

may not necessarily be the quantity the researcher would ideally estimate.
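The relationship between the two estimands can be made concrete with a small numerical sketch. It assumes, following footnote 12, that the coarsened estimand shares the LAPTE's numerator and differs only in dividing by pk rather than by the sum of all pt. The function names, the three-intensity example, and the complier shares are hypothetical values chosen purely for illustration.

```python
from typing import Sequence


def lapte(p: Sequence[float], beta: Sequence[float]) -> float:
    """Complier-weighted average per-unit effect: sum(p_t * beta_t) / sum(p_t)."""
    total = sum(p)
    if total <= 0:
        raise ValueError("the instrument must shift at least one intensity")
    return sum(pt * bt for pt, bt in zip(p, beta)) / total


def coarsened_iv(p: Sequence[float], beta: Sequence[float], k: int) -> float:
    """Same numerator as the LAPTE, but rescaled by the first stage at index k only."""
    if p[k] <= 0:
        raise ValueError("the instrument must shift intensity k")
    return sum(pt * bt for pt, bt in zip(p, beta)) / p[k]


# Hypothetical complier shares at three intensities; index 1 plays the role of k.
p = [0.2, 0.1, 0.1]
k = 1

# Discontinuous CRF (the Figure 3 case): an effect of 0.05 at k and zero elsewhere.
beta_discontinuous = [0.0, 0.05, 0.0]
lapte_discontinuous = lapte(p, beta_discontinuous)
coarsened_discontinuous = coarsened_iv(p, beta_discontinuous, k)

# Linear CRF: an effect of 0.05 at every intensity.
beta_linear = [0.05, 0.05, 0.05]
lapte_linear = lapte(p, beta_linear)
coarsened_linear = coarsened_iv(p, beta_linear, k)
```

In this example only a quarter of the instrument's first stage occurs at k, so the coarsened estimand under the linear CRF is four times the per-unit effect, while under the discontinuous CRF the LAPTE is a quarter of the effect at k.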

11Semi-parametric approaches rely on substantially stronger assumptions than typical IV estimation, and are challenging to estimate (e.g. Blundell, Chen and Kristensen 2007; Newey and Powell 2003).

12Comparing denominators, ∑Jt=2 pt ≥ pk if monotonicity holds.


3 Illustrating the analytical results: simulated data

Before examining coarsening bias in a political application with observational data, it is important

to first illustrate and verify the importance of the analytical results using purely simulated data that

can exactly control the true empirical relationships. I demonstrate that the size of the effect at each

treatment intensity on the outcome (i.e. the shape of the CRF) and the strength of the first stage at

intensity k relative to the first stage at all other intensities (∑t≠k pt/pk) are critical in determining

the extent of coarsening bias.

3.1 Monte Carlo simulations

The analytical insights are captured in a simple simulation framework. For each simulated dataset,

I draw a random sample of 1,000 independent observations. Each observation i has a 50:50 chance

of receiving a binary instrument Zi ∼ Bernoulli(0.5). The observation’s endogenous treatment

intensity Ti is given by the nearest round number to Zi + ξi, where ξi ∼ Normal(10,σ2). The

treatment thus takes multiple integer intensities, where the mean for observations that did not

receive the instrument is 10 and the mean for observations that did is 11. By increasing the variance

parameter σ2, the instrument has a relatively larger effect on reaching treatment intensities other

than 11. Finally, the treatment is coarsened as an indicator for receiving an intensity of at least 11:

Di ≡ 1(Ti ≥ 11).

I consider two types of CRF. I first examine a linear CRF where each additional intensity

increases the outcome by 0.05 units:

$$Y_i = 0.05 \times \sum_{t=1}^{T} 1(T_i \geq t) + \eta_i \tag{10}$$

where ηi ∼ Normal(0.4,0.2) determines the location of the outcome and its variation. This is

held constant across CRFs. I also consider a discontinuous CRF where only the eleventh intensity


increases the outcome by 0.05 units:

Yi = 0.05∗1(Ti ≥ 11)+ηi. (11)

In both cases, the standard IV assumptions (A1-A5) hold. However, the strong exclusion restriction

(A5*) only holds for the discontinuous CRF.

To illustrate the bias associated with coarsening, I vary σ2 to examine how 2SLS estimates for

the different CRFs depend upon ∑t≠k pt/pk (which increases in σ2). For each σ2, I examine 1,000

simulated datasets and present the coarsened IV estimates (using the indicator Di for reaching the

eleventh intensity) and the LAPTE (treating Ti as linear).13
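The simulation design described above can be sketched as follows. This is a minimal illustration rather than the code used for Figure 4: the function names are hypothetical, the second argument of Normal(0.4, 0.2) is assumed to be a standard deviation, and the sum of indicators in equation (10) is assumed to equal Ti for non-negative Ti. With a binary instrument and no covariates, the 2SLS estimate equals the ratio of the reduced form to the first stage, which is what `wald` computes.

```python
import numpy as np


def simulate(n, sigma2, crf, rng, k=11, tau=0.05):
    """Draw one dataset following the design described in Section 3.1."""
    z = rng.binomial(1, 0.5, size=n)                # binary instrument
    xi = rng.normal(10.0, np.sqrt(sigma2), size=n)  # latent treatment intensity
    t = np.rint(z + xi)                             # observed integer intensity
    d = (t >= k).astype(float)                      # coarsened treatment
    eta = rng.normal(0.4, 0.2, size=n)              # assumption: 0.2 is a standard deviation
    if crf == "linear":
        # Equation (10); assumption: the indicator sum equals T_i when T_i >= 0.
        y = tau * np.clip(t, 0, None) + eta
    elif crf == "discontinuous":
        y = tau * d + eta                           # equation (11)
    else:
        raise ValueError("crf must be 'linear' or 'discontinuous'")
    return z, t, d, y


def wald(y, x, z):
    """Reduced form divided by first stage for a binary instrument."""
    reduced_form = y[z == 1].mean() - y[z == 0].mean()
    first_stage = x[z == 1].mean() - x[z == 0].mean()
    return reduced_form / first_stage


def monte_carlo(sigma2, crf, reps=1000, n=1000, seed=0):
    """Average the coarsened IV and LAPTE estimates across simulated datasets."""
    rng = np.random.default_rng(seed)
    draws = np.empty((reps, 2))
    for r in range(reps):
        z, t, d, y = simulate(n, sigma2, crf, rng)
        draws[r] = wald(y, d, z), wald(y, t, z)
    return draws.mean(axis=0)  # (mean coarsened IV, mean LAPTE)


if __name__ == "__main__":
    for crf in ("linear", "discontinuous"):
        coarse, per_unit = monte_carlo(sigma2=1.0, crf=crf, reps=200)
        print(f"{crf}: coarsened={coarse:.3f}, LAPTE={per_unit:.3f}")
```

Looping `monte_carlo` over a grid of σ2 values would trace out the horizontal axis of the figure, since a larger σ2 spreads the instrument's effect across more intensities.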

I first analyze the linear CRF. As demonstrated in Proposition 1, Figure 4(a) shows that the bias

associated with coarsening the treatment can be substantial and is increasing linearly in ∑t≠k pt/pk.

When only the coarsened intensity is affected by the instrument (i.e. pk ≠ 0 and pt = 0, ∀t ≠ k),

or as ∑t≠k pt/pk ↓ 0, the IV estimate is essentially unbiased. However, bias increases in the

relative magnitude of the effect of the instrument on other intensities. Once only 50% (25%) of

the instrument’s effect occurs at intensity k, the IV estimate is fully two (four) times larger than

the true causal effect. This reflects the fact that the instrument is inducing many changes in the

outcome through intensities that are not captured by the coarsened first stage. Conversely, Figure

4(b) shows that the LAPTE estimate correctly identifies the true causal effect regardless of which

intensities are affected by the instrument, and does not lose precision because each intensity is

equally relevant. This is not surprising given that the LAPTE is designed precisely for this case.


Second, consider a truly discontinuous CRF (like Figure 3), where the researcher correctly

identifies the discontinuity in the CRF at the eleventh intensity. Reinforcing Proposition 2, Figure

4(c) confirms that coarsening the treatment unsurprisingly produces an unbiased estimate of the

13Specifically, I consider 14 values of σ2, linearly increasing from 0.17 to 2.33.

[Figure 4 about here. Panels: (a) Linear CRF, coarsened IV estimate; (b) Linear CRF, LAPTE estimate; (c) Discontinuous CRF, correctly coarsened IV estimate; (d) Discontinuous CRF, LAPTE estimate; (e) Discontinuous CRF, incorrectly coarsened IV estimate. In each panel the vertical axis is the estimate and the horizontal axis is ∑t≠k pt/pk.]

Figure 4: Monte Carlo simulations illustrating how IV estimates of a 0.05 causal effect depend upon the CRF, coarsening, and the relative strength of the instrument on different treatment intensities (95% confidence intervals)

Notes: Estimates at each level of ∑t≠11 pt/pk are based on 1,000 simulated datasets containing 1,000 observations. For the linear CRF, the effect of each additional intensity is 0.05. For the discontinuous CRF, the effect is 0.05 only at intensity k = 11. When correctly coarsened, the coarsened treatment is defined as Di = 1(Ti ≥ k). When incorrectly coarsened, the coarsened treatment is defined as D′i = 1(Ti ≥ k+1). See the main text for the underlying equations and distributions. Point estimates are the average estimate across simulations; the horizontal dashed line denotes the true causal effect the researcher seeks to estimate. 95% confidence intervals are based on variation within and between simulated estimates.

effect at the eleventh intensity when the strong exclusion restriction holds. The precision of this

estimate is decreasing in the relative first stage at other (uninformative) intensities. Turning to

the LAPTE estimate, Figure 4(d) demonstrates that—as shown analytically in equation (9)—the

LAPTE only returns the causal effect at the eleventh intensity when the instrument only affects

this intensity. The size of the LAPTE declines with ∑t≠k pt/pk, as the relative weight attached

to compliers with zero effects increases. However, it is important to reiterate that the LAPTE

remains a consistent causal estimate—it just weights the effects at all the intensities affected by the

instrument. Furthermore, the comparison of Figures 4(c) and 4(d) illustrates the general result that

the LAPTE is never larger than the coarsened IV estimate. When the CRF is neither linear

nor discontinuous, the LAPTE thus provides both consistent and conservative estimates.

Finally, consider the same discontinuous CRF but where the researcher incorrectly coarsens the

treatment at intensity 12. Figure 4(e) shows that coarsening often incorrectly ascribes a significant

positive effect to the twelfth intensity where no such effect exists. This occurs where, in addition

to inducing subjects to receive intensity 12 (where there is no effect), the instrument also induces

subjects to receive the eleventh treatment intensity where there exists a positive effect. As the

figure shows, this is not simply an issue of interpretation because the points systematically fail to

even recover the true effect of the eleventh intensity.

3.2 Implications for applied research

The simulations clearly show that coarsening an endogenous treatment substantially upwardly bi-

ases IV estimates, except in the special cases where the instrument only induces subjects to reach

the intensity where the treatment is coarsened or the CRF is discontinuous. For most CRFs, esti-

mating the LAPTE is more appropriate. Even when the CRF is not linear, treating the endogenous

treatment as linear has four important advantages: the LAPTE (1) does not suffer from coarsening

bias, (2) always provides a consistent causal estimate, (3) does not rely on researchers correctly

specifying the intensity where the effect is large, and (4) provides a conservative estimate relative


to the coarsened IV estimator.

In general, researchers must rely on their theoretical intuition and descriptive data—including

the reduced form relationship, separate first stage regressions and the (endogenous) OLS relationship—

to determine the appropriate specification in the standard case where only a single instrument is

available. Under special circumstances where multiple instruments are available, I show below

that a sharper empirical assessment is possible. This will allow me to isolate the coarsening bias

associated with instrumenting for a dichotomous measure of schooling.

4 Observational application: high school education’s effect on

voting Conservative in Great Britain

This section illustrates the risk of coarsening bias using observational data. In particular, I quantify

coarsening bias in the context of examining how high school education affects vote choice in Great

Britain. Given that education has many downstream consequences for adulthood, it is potentially

one of the most important determinants of voting behavior. In particular, the increased income

associated with higher education may cause voters to support the right-wing economic policies of

Britain’s Conservative party (e.g. Meltzer and Richard 1981).

Analyses of education’s effect on electoral turnout suggest that existing IV estimates may be

vulnerable to coarsening bias. Recent field and natural experiments using IV methods have re-

ported extremely large effects of completing high school on turnout (Milligan, Moretti and Ore-

opoulos 2004; Sondheimer and Green 2010).14 These estimates could be upwardly biased if each

additional year of schooling increases turnout while the instrument increases schooling for many

students without inducing them to complete high school.

14Sondheimer and Green (2010:185) pool three U.S. field experiments that encourage students to complete high school, and find that “a high school dropout with a 15.6% chance of voting would have a 65.2% chance of turnout if randomly induced to graduate from high school.” The equally huge effects reported by Milligan, Moretti and Oreopoulos (2004) are noted in the introduction.


To estimate the extent of coarsening bias, I use Britain’s major 1947 school leaving reform

to instrument for measures of schooling in the context of a regression discontinuity (RD) design.

Comparing essentially identical students just affected and just too old to be affected by the reform

addresses the selection concern that the types of individuals who receive more education are unlikely

to be random. The main focus is to demonstrate the bias that arises in IV estimates when

years of schooling is coarsened as an indicator for completing high school.

4.1 Britain’s 1947 reform raising the school leaving age to 15

Britain’s education laws define the maximum age by which students must start school and the

minimum age at which students can leave school. Following the landmark Education Act 1944,

the leaving age in Britain was increased from 14 to 15 for students who reached the age

of 14 after April 1st 1947.15 The reform substantially altered the education profile of Britain’s

students. Around 40% of all students remained in school at least one year longer following the 1947

reform. Creating the risk of coarsening bias, the reform also had “knock-on” effects, increasing

the proportion staying in school until 16 and thus completing high school. However, as the top

and bottom trend lines in Figure 5 show, only these two levels of schooling were affected by the

reform.


4.2 Data and estimation

In order to examine how coarsening bias affects estimates of high school’s effect on voting Con-

servative, I utilize data from the British Election Survey (BES). I use the eight elections from 1979

15The circumstances surrounding this reform, which did not affect Northern Ireland, and educational policy in Britain are described in greater detail in the Online Appendix.


http://www.legislation.gov.uk/ukpga/1944/31/pdfs/ukpga_19440031_en.pdf

[Figure 5 about here. Vertical axis: proportion leaving. Horizontal axis: cohort (year aged 14), 1925 to 1970. Series: leave before 14, leave before 15, leave before 16, leave before 17.]

Figure 5: 1947 compulsory schooling reform and student leaving age by cohort

Notes: Data from my BES sample (described below). Curves represent fourth-order polynomial fits. Grey dots are birth-year cohort averages, and their size reflects their weight in the sample.

to 2010 for which data on the outcome, instrument and treatment measures are available, and fo-

cus on working age respondents (aged below 70). This produced a maximum sample of 13,853

observations. The data and variables used in this application are described in detail in the Online

Appendix.

I code the decision to vote Conservative as an indicator for the 34% of respondents that re-

ported voting for the Conservative party at the last election. The instrument is simply an indicator

for respondents that turned 14 in 1947 or later, and were thus affected by the higher school leav-

ing age.16 The reduced form plot in Figure 6 suggests a clear upward shift in Conservative voting

among the cohorts affected by the increased leaving age. Finally, schooling—the endogenous treat-

ment variable in this application—is measured both as a linear intensity and a coarsened indicator.

The linear intensity is the number of years of schooling.17 The coarsened treatment is an indicator

for the 64% of respondents that completed high school.18


To identify the effect of late high school education on voting Conservative, I use Britain’s 1947

leaving age reform to instrument for both measures of schooling in the context of a “fuzzy” RD

design.19 In particular, whether an individual was affected by the reform is a discontinuous func-

tion of their birth year cohort (the “running variable”). However, since the reform could not force

every student to remain in school for (at least) an additional year, the 1947 reform is used as an

instrument that discontinuously increases the probability that a student received greater schooling

(Hahn, Todd and Van der Klaauw 2001).

16Although month of birth is unavailable in the BES, the large first stage is very similar to Clark and Royer’s (2013) study where monthly data were available.

17Comparing university and vocational programs is difficult beyond age 18, so years of schooling is top-coded at 13 years. Since the 1947 reform did not affect education beyond age 16, this coding is inconsequential.

18Completing high school is defined as leaving school at age 16 (or later) or receiving a high school leaving diploma for a minimum level of achievement.

19Due to the dramatic changes in education it induced, Britain’s 1947 reform has proved a popular instrument among economists (e.g. Clark and Royer 2013; Oreopoulos 2006).

[Figure 6 about here. Vertical axis: vote share. Horizontal axis: birth year cohort, 1925 to 1970.]

Figure 6: Proportion voting Conservative by birth year cohort

Notes: Curves represent fourth-order polynomial fits. Grey dots are birth-year cohort averages, and their size reflects their weight in the sample.

The fuzzy RD entails estimating the following structural equation for individuals i from birth

year cohort c:

$$\text{Vote Conservative}_{ic} = \beta \, \text{Schooling}_{ic} + f(\text{Birth year}_{c}) + \varepsilon_{ic}, \tag{12}$$

where f is a flexible function of the running variable designed to capture trends away from the

reform discontinuities. Specifically, I estimate local linear regressions (LLRs) where f includes

linear birth year trends either side of the discontinuity, and only utilizes observations within the

Imbens and Kalyanaraman (2012) optimal bandwidth (of 11.514 cohorts). To maximize the com-

parability of treated and untreated cohorts, observations are weighted according to their proximity

to the discontinuity using a triangle kernel. The Online Appendix demonstrates that the results are

insensitive to particular specification choices. As noted above, I examine two measures of school-

ing: instrumenting for years of schooling estimates the LAPTE of schooling, while instrumenting

for the indicator for completing high school aims to estimate the effect of completing high school.

Conditioning on f (Birth yearc), these estimates respectively correspond to equations (8) and (4).

The first stage regression generating exogenous variation in Schoolingic is given by:

$$\text{Schooling}_{ic} = \alpha \, \text{Post 1947 reform}_{c} + f(\text{Birth year}_{c}) + \varepsilon_{ic}. \tag{13}$$

A strong first stage implies that the instrument explains substantial variation in schooling. Testing

the exclusion of the instrument, an F statistic exceeding 10 is the conventional rule of thumb for

regarding “weak instrument bias” as small (Staiger and Stock 1997).
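The estimation strategy in equations (12) and (13) can be sketched in a few lines. This is an illustrative implementation on synthetic data, not the code behind Table 1: the function names and the data-generating values are hypothetical, the running variable is assumed to be centred so that zero is the first cohort affected by the reform, and the fuzzy RD estimate is computed as the ratio of the reduced form jump to the first stage jump (which coincides with weighted 2SLS when there is a single instrument and both stages share the same weights and trend terms).

```python
import numpy as np


def triangular_weights(running, bandwidth):
    """Triangular kernel: weight falls linearly to zero at the bandwidth."""
    return np.clip(1.0 - np.abs(running) / bandwidth, 0.0, None)


def weighted_jump(outcome, running, post, weights):
    """Coefficient on `post` from weighted least squares with a separate
    linear trend in the running variable on each side of the cutoff."""
    design = np.column_stack([np.ones_like(running), post, running, post * running])
    root_w = np.sqrt(weights)
    coef, *_ = np.linalg.lstsq(design * root_w[:, None], outcome * root_w, rcond=None)
    return coef[1]


def fuzzy_rd(outcome, schooling, running, bandwidth):
    """Return (IV estimate, first stage jump) for a cutoff at running == 0."""
    post = (running >= 0).astype(float)
    weights = triangular_weights(running, bandwidth)
    keep = weights > 0  # drop observations outside the bandwidth
    args = (running[keep], post[keep], weights[keep])
    reduced_form = weighted_jump(outcome[keep], *args)
    first_stage = weighted_jump(schooling[keep], *args)
    return reduced_form / first_stage, first_stage


if __name__ == "__main__":
    # Synthetic data with an unobserved confounder; all values are made up.
    rng = np.random.default_rng(0)
    n = 20_000
    running = rng.integers(-20, 21, size=n).astype(float)
    confounder = rng.normal(size=n)
    schooling = (10 + 0.6 * (running >= 0) + 0.02 * running
                 + 0.5 * confounder + rng.normal(scale=0.5, size=n))
    vote = 0.1 * schooling + 0.003 * running + 0.2 * confounder + rng.normal(scale=0.3, size=n)
    estimate, first_stage = fuzzy_rd(vote, schooling, running, bandwidth=11.514)
    print(f"first stage jump: {first_stage:.3f}, IV estimate: {estimate:.3f}")
```

Passing years of schooling as `schooling` corresponds to the LAPTE specification, while passing the high school completion indicator corresponds to the coarsened specification; only the first stage denominator changes between the two.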

Given that Figure 5 clearly suggests a strong and monotonic first stage, the key identifying

assumptions are that Conservative voting is continuous across cohorts in all covariates other than

the school leaving age at the reform discontinuity (this implies assumption A2 above) and that

the reform only affected Conservative voting by increasing years of schooling (assumption A5

above). In the Online Appendix, I offer substantial support for these assumptions using qualitative


evidence, balance tests around the discontinuity, and placebo reform dates.20 Crucially, consistent

estimation of the effect of completing high school also requires that the strong exclusion restriction

(assumption A5*) holds.

4.3 IV estimates and coarsening bias

The first stage estimates in Table 1 verify that the 1947 reform substantially increased schooling.

Column (1) examines years of schooling, and reinforces the graphical analysis in Figure 5 by show-

ing that the reform increased the average schooling by 0.63 years. The large F statistic indicates

a strong first stage. Turning to the dummy for completing high school, column (2) shows that,

in addition to keeping students in school until 15, the reform also significantly increased the

probability of completing high school by 15 percentage points. This increase is highly statistically

significant (F statistic of 24.8), confirming that any substantial bias in the IV estimates does not

simply reflect a weak instrument.

[Table 1 about here]

I now examine the IV estimates in order to evaluate the impact of coarsening bias. I first in-

strument for years of schooling to estimate the LAPTE. As noted above, this provides a consistent

causal estimate weighting the effects of the penultimate and final year of high school by the pro-

portion of students induced by reform to remain in school until that stage. The LAPTE estimate

in column (3) shows that an additional year of late high school increases a complier’s probability

of voting Conservative in later life by 12 percentage points. Reinforcing the reduced form rela-

tionship in Figure 6, this large and statistically significant coefficient provides clear evidence that

high school education increases support for the Conservative party in later life. Marshall (2015)

examines the political implications of the reform in detail and provides evidence that, by increasing

20Given the linear and coarsened treatments both require these assumptions, the effects of coarsening can still be identified if the standard IV assumptions fail to hold.


Table 1: Estimates of schooling’s effect on voting Conservative

                          (1)          (2)           (3)         (4)
Outcome                   Years of     Completed     Vote        Vote
                          schooling    high school   Con.        Con.
Estimator                 LLR          LLR           LLR IV      LLR IV

Post 1947 reform          0.628***     0.150**
                          (0.107)      (0.029)
Years of schooling                                   0.118**
                                                     (0.053)
Completed high school                                            0.464**
                                                                 (0.209)

Observations              4,820        4,820         4,820       4,820
First stage F statistic   37.5         24.8          37.5        24.8

Notes: Specifications (1) and (2) present the first stage estimates where years of schooling and completed high school are respectively the endogenous treatment variable. Specifications (3) and (4) are respectively the IV estimates for years of schooling and completed high school. All specifications are local linear regressions using a triangular kernel and the Imbens and Kalyanaraman (2012) optimal bandwidth of 11.514. Robust standard errors in parentheses. * denotes p < 0.1, ** denotes p < 0.05, *** denotes p < 0.01.


income and support for more right-wing economic policies, high school education

causes voters to become more conservative.

To see how coarsening the treatment variable affects the IV estimates, I turn to the case where

schooling is dichotomized as an indicator for completing high school. Column (4) reports that

voters induced to complete high school by the reform are fully 46 percentage points more likely

to vote Conservative in later life. This statistically significant estimate is almost implausibly large,

particularly when considering that around half the student population were compliers, and given

that the average effect of an additional year’s schooling is four times smaller. Furthermore, the

conditions under which coarsening bias is large—highlighted in equation (5)—appear to be sat-

isfied. First, the reform predominantly kept students in school until age 15, but did not compel

most students to complete high school (i.e. large pt/pk). Second, there are theoretical reasons

to believe that an additional year of late high school, without completing high school, still affects

income and ultimately political preferences (i.e. large βt, t ≠ k). Supporting this claim, the large

reduced form effect in Figure 6 suggests that raising the leaving age must have affected more stu-

dents than just those that completed high school. Third, since both IV estimates rely on the weak

exclusion restriction, differences cannot be explained by the 1947 reform affecting vote choice

through channels other than schooling. There are thus strong reasons to believe that the estimate

for completing high school suffers from substantial coarsening bias.

Nevertheless, although such large effects of completing high school are surprising, they are not

completely infeasible for predominantly working class (and likely pro-Labour) compliers. Fur-

thermore, the huge effect of completing high school could perhaps be squared with the smaller

average effect of an additional year of schooling if the effect of staying until 15 is actually zero. To

confirm that the large estimate results from coarsening bias, I exploit Britain’s 1972 reform raising

the school leaving age from 15 to 16 to separate the effects of the penultimate and final year of

high school. Because the 1972 reform also significantly increased the proportion of students leav-

ing school at age 16 without inducing them to stay in education past 16, I can use both reforms


simultaneously as instruments to identify the effects of the penultimate year of high school and

completing high school.21 The availability of two instruments that only affect two levels of the

treatment is uncommon, so this represents a rare opportunity to quantify coarsening bias.

The results, presented in detail in the Online Appendix, show that completing the penultimate

year of high school increases the probability of voting Conservative later in life by 10 percent-

age points, while completing high school similarly increases Conservative voting by a further

17 percentage points. This implies that the effect of additional late high school on Conservative

voting is relatively linear. These estimates are also consistent with the LAPTE estimate of 12

percentage points for each additional year (in column (3) of Table 1). Furthermore, by failing to

return the coarsened IV estimate for completing high school of 0.46, the results demonstrate that

the strong exclusion restriction fails to hold, and therefore the coarsened IV estimate substantially

over-estimates the effect of completing high school. In fact, coarsening is responsible for upwardly

biasing the IV estimate by a factor of three.
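The arithmetic behind these comparisons can be reproduced directly from the point estimates reported in Table 1 and in the paragraph above. The snippet below is only a back-of-the-envelope check: the variable names are hypothetical, sampling uncertainty is ignored, and the implied reduced forms are inferred by multiplying each IV estimate by its own first stage rather than taken from a reported regression.

```python
# Point estimates as reported in Table 1 and the surrounding text.
first_stage_years = 0.628      # column (1): effect of the reform on years of schooling
first_stage_completed = 0.150  # column (2): effect of the reform on completing high school
iv_years = 0.118               # column (3): LAPTE per additional year
iv_completed = 0.464           # column (4): coarsened IV estimate
final_year_effect = 0.17       # two-instrument estimate for completing high school

# Each IV estimate is a reduced form divided by a first stage, so multiplying
# back should give roughly the same reduced form in both columns.
implied_rf_years = iv_years * first_stage_years
implied_rf_completed = iv_completed * first_stage_completed

# Size of the coarsened estimate relative to the two benchmarks.
ratio_to_final_year = iv_completed / final_year_effect
ratio_to_lapte = iv_completed / iv_years

print(f"implied reduced forms: {implied_rf_years:.3f} and {implied_rf_completed:.3f}")
print(f"coarsened / final-year effect: {ratio_to_final_year:.1f}")
print(f"coarsened / LAPTE: {ratio_to_lapte:.1f}")
```

Both columns rescale approximately the same reduced form effect of about 7 percentage points; the coarsened estimate is larger only because it divides by the much smaller first stage for completing high school.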

5 Conclusion

Both theoretically and empirically, this article demonstrates that coarsening bias should be a ma-

jor concern when utilizing IV methods. Coarsening bias arises where a multi-valued treatment is

transformed into a binary indicator or short scale. Except in special cases that are likely to be rare

in applied empirical research, such coarsening subtly violates the exclusion restriction requiring

that the instrument only affects the outcome through the measured treatment. As demonstrated in

the case of high school’s long-run effects on vote choice in Britain, the upward bias can be consid-

erable. I show that using a binary indicator for completing high school over-estimated the causal

effect of the final year of high school on voting Conservative by nearly three times. The extent of

coarsening bias clearly has important implications for the interpretation of academic findings, but

21This is because pt ≈ 0 for all t other than leaving school at 15 and leaving school at 16.


could also substantially mislead policy-makers seeking to choose the most efficacious policy. Con-

sequently, as IV methods become a standard part of a political scientist’s methodological toolkit,

it is essential that researchers become aware of the risks of coarsening bias.

Coarsening bias may be common in applied research using IV methods. Coarsening an en-

dogenous treatment variable may appear to be theoretically appealing or easier to interpret, to

avoid strong linearity assumptions, or be necessitated by data unavailability. However, in gen-

eral, only when the treatment’s effects are truly discontinuous—in that only a certain level of the

treatment has a causal effect—and the researcher is able to pinpoint the specific threshold where

this causal effect occurs will coarsening produce unbiased estimates of the desired causal quantity.

Otherwise, it is more appropriate to avoid invoking the strong exclusion restriction required to coarsen

an endogenous treatment, and instead consistently estimate the local average per-unit treatment

effect by measuring the treatment intensity using a linear treatment variable. Unlike coarsening,

which induces bias whenever the true causal response function is not discontinuous, this approach provides

consistent causal estimates even when the underlying causal response function is not linear.

Provided that the instrument affects at least two intensities of the treatment, this approach is also

robust to unobserved treatment categories.


References

Acemoglu, Daron, Simon Johnson and James A. Robinson. 2001. “The Colonial Origins of Com-

parative Development: An Empirical Investigation.” American Economic Review 91(5):1369–

1401.

Angrist, Joshua D. and Alan B. Krueger. 1992. “The Effect of Age at School Entry on Educational

Attainment: An Application of Instrumental Variables with Moments from Two Samples.” Jour-

nal of the American Statistical Association 87(418):328–336.

Angrist, Joshua D. and Guido W. Imbens. 1995. “Two-Stage Least Squares Estimation of Average

Causal Effects in Models With Variable Treatment Intensity.” Journal of the American Statistical

Association 90(430):431–442.

Angrist, Joshua D., Guido W. Imbens and Donald B. Rubin. 1996. “Identification of Causal Effects

Using Instrumental Variables.” Journal of the American Statistical Association 91(June):444–

455.

Angrist, Joshua D. and Jörn-Steffen Pischke. 2008. Mostly Harmless Econometrics: An Empiricist’s

Companion. Princeton, NJ: Princeton University Press.

Becker, Gary S. 1993. Human Capital: A Theoretical and Empirical Analysis, with Special Refer-

ence to Education. University of Chicago Press.

Blundell, Richard, Xiaohong Chen and Dennis Kristensen. 2007. “Semi-Nonparametric IV Esti-

mation of Shape-Invariant Engel Curves.” Econometrica 75(6):1613–1669.

Buthe, Tim and Helen V. Milner. 2008. “The politics of foreign direct investment into develop-

ing countries: increasing FDI through international trade agreements?” American Journal of

Political Science 52(4):741–762.


Clark, Damon and Heather Royer. 2013. “The Effect of Education on Adult Mortality and Health:

Evidence from Britain.” American Economic Review 103(6):2087–2120.

Franklin, Charles H. 1989. “Estimation across data sets: two-stage auxiliary instrumental variables

estimation (2SAIV).” Political Analysis 1(1):1–23.

Gerber, Alan. 1998. “Estimating the Effect of Campaign Spending on Senate Election Outcomes

Using Instrumental Variables.” American Political Science Review 92(2):401–411.

Gerber, Alan S. and Donald P. Green. 2000. “The Effects of Canvassing, Telephone Calls, and Di-

rect Mail on Voter Turnout: A Field Experiment.” American Political Science Review 94(3):653–

663.

Gerber, Alan S., Gregory A. Huber and Ebonya Washington. 2010. “Party affiliation, partisanship,

and political beliefs: A field experiment.” American Political Science Review 104(4):720–744.

Gillard, Derek. 2011. “Education in England: A Brief History.” Web link.

Hahn, Jinyong, Petra Todd and Wilbert Van der Klaauw. 2001. “Identification and estimation of

treatment effects with a regression-discontinuity design.” Econometrica 69(1):201–209.

Imbens, Guido W. and Karthik Kalyanaraman. 2012. “Optimal bandwidth choice for the regression

discontinuity estimator.” Review of Economic Studies 79(3):933–959.

Kam, Cindy D. and Carl L. Palmer. 2008. “Reconsidering the Effects of Education on Political

Participation.” Journal of Politics 70(3):612–631.

Marshall, John. 2015. “Education and voting Conservative: Evidence from a major schooling

reform in Great Britain.” Working paper.

McCrary, Justin. 2008. “Manipulation of the running variable in the regression discontinuity de-

sign: A density test.” Journal of Econometrics 142(2):698–714.

Meltzer, Allan H. and Scott F. Richard. 1981. “A rational theory of the size of government.”

Journal of Political Economy 89:914–927.

Milligan, Kevin, Enrico Moretti and Philip Oreopoulos. 2004. “Does education improve citizen-

ship? Evidence from the United States and the United Kingdom.” Journal of Public Economics

88:1667–1695.

Newey, Whitney K. and James L. Powell. 2003. “Instrumental variable estimation of nonparamet-

ric models.” Econometrica 71(5):1565–1578.

Office for National Statistics. 2013. “Vital Statistics: Population and Health Reference Tables—

Annual Time Series Data.” http://www.ons.gov.uk/ons/publications/re-reference-tables.html?edition=tcm%3A77-289020

Oreopoulos, Philip. 2006. “Estimating Average and Local Average Treatment Effects of Education

when Compulsory Schooling Laws Really Matter.” American Economic Review 96(1):152–175.

Pierskalla, Jan H. and Florian M. Hollenbach. 2013. “Technology and collective action: The

effect of cell phone coverage on political violence in Africa.” American Political Science Review

107(2):207–224.

Sondheimer, Rachel M. and Donald P. Green. 2010. “Using Experiments to Estimate the Effects

of Education on Voter Turnout.” American Journal of Political Science 54(1):174–189.

Sovey, Allison J. and Donald P. Green. 2011. “Instrumental variables estimation in political sci-

ence: A readers’ guide.” American Journal of Political Science 55(1):188–200.

Staiger, Douglas and James H. Stock. 1997. “Instrumental Variables Regression with Weak Instru-

ments.” Econometrica 65(3):557–586.

Woodin, Tom, Gary McCulloch and Steven Cowan. 2013. “Raising the participation age in

historical perspective: policy learning from the past?” British Educational Research Journal

39(4):635–653.

Woodin, Tom, Gary McCulloch and Steven Cowan. 2013. Secondary Education and the Raising of the School Leaving Age: Coming of Age? New York: Palgrave Macmillan.

6 Appendix

6.1 Proofs from main article

Proof of Proposition 1. Assumption A1 allows us to write the potential outcomes of $Y_i$ as $Y_i(Z_i, T_i)$ and the potential outcomes of $T_i$ as $T_i(Z_i)$. Without loss of generality, let A4 hold such that $T_i(1) - T_i(0) \geq 0$. (All results hold for $T_i(0) - T_i(1) \geq 0$, where $\beta_t$ and $p_t$ are respectively redefined as $\beta_t \equiv E[Y_i(t) - Y_i(t-1) \mid T_i(0) \geq t > T_i(1)]$ and $p_t \equiv \Pr(T_i(0) \geq t > T_i(1))$.) The reduced form can be written as:

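\begin{align*}
E[Y_i \mid Z_i = 1] - E[Y_i \mid Z_i = 0] &= E[Y_i(1, T_i(1)) - Y_i(0, T_i(0))] \\
&= E\left[\sum_{t=2}^{J} I(T_i(1) \geq t)[Y_i(1,t) - Y_i(1,t-1)] - \sum_{t=2}^{J} I(T_i(0) \geq t)[Y_i(0,t) - Y_i(0,t-1)]\right] \\
&= E\left[\sum_{t=2}^{J} [I(T_i(1) \geq t) - I(T_i(0) \geq t)][Y_i(t) - Y_i(t-1)]\right] \\
&= \sum_{t=2}^{J} \Pr[T_i(1) \geq t > T_i(0)] \, E[Y_i(t) - Y_i(t-1) \mid T_i(1) \geq t > T_i(0)] \\
&= \sum_{t=2}^{J} p_t \beta_t,
\end{align*}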

where the first line uses random assignment (A2), the third uses A5 to re-write $Y_i(z,t) = Y_i(z',t) \equiv Y_i(t)$, and the fourth line uses A4 (which implies that $I(T_i(1) \geq t) - I(T_i(0) \geq t)$ is either 0 or 1).

In terms of potential outcomes, the coarsened treatment indicator is given by $D_{ik}(T_i(Z_i = z) = t)$; because $D_{ik}$ is a deterministic mapping from $t$ to 0 or 1, $z$ cannot affect $D_{ik}$ other than through $T_i$. Using assumption A2, the first stage for the coarsening indicator $D_{ik}$ is given by:

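\begin{align*}
E[D_{ik} \mid Z_i = 1] - E[D_{ik} \mid Z_i = 0] &= E\left[\sum_{t=2}^{J} [D_{ik}(t) - D_{ik}(t-1)] I(T_i(1) \geq t) - \sum_{t=2}^{J} I(T_i(0) \geq t)[D_{ik}(t) - D_{ik}(t-1)]\right] \\
&= E\left[\sum_{t=2}^{J} [I(T_i(1) \geq t) - I(T_i(0) \geq t)][D_{ik}(t) - D_{ik}(t-1)]\right] \\
&= E\left[I(T_i(1) \geq k) - I(T_i(0) \geq k)\right] \\
&= \Pr(T_i(1) \geq k > T_i(0)) \\
&= p_k,
\end{align*}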

where the third line follows from the fact that, by definition of Dik(t), only Dik(k)−Dik(k−1) = 1

(all other jumps are 0), while the fourth line follows from A4 (as above).

Combining the first stage with the reduced form, the Wald IV estimator of the LATE for Dik is:

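\begin{align*}
\beta^{IV}_k &= \frac{E[Y_i \mid Z_i = 1] - E[Y_i \mid Z_i = 0]}{E[D_{ik} \mid Z_i = 1] - E[D_{ik} \mid Z_i = 0]} \\
&= \frac{\sum_{t=2}^{J} p_t \beta_t}{p_k} \\
&= \beta_k + \frac{\sum_{t=2, t \neq k}^{J} p_t \beta_t}{p_k},
\end{align*}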

where the final line follows from simple algebra, and the final term is the additional bias of the estimator (beyond finite sample bias). Assumption A3 ensures that $\beta^{IV}_k$ is well-defined, by preventing the denominator from equalling zero.

It is immediately clear that the additional bias term is positive whenever $\sum_{t=2, t \neq k}^{J} p_t \beta_t > 0$, given that A4 ensures $p_t \geq 0, \forall t$. $\beta_t > 0$ for all $t$ is thus a sufficient condition for positive bias; the bias is thus upward in magnitude if $\beta_k > 0$. Similarly for $\sum_{t=2, t \neq k}^{J} p_t \beta_t < 0$ and $\beta_k < 0$. Consequently, $\mathrm{sign}(\beta_k) = \mathrm{sign}(\beta_t), \forall t$ implies $|\beta_k| \leq |\beta^{IV}_k|$. □
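The bias expression in this proof can be checked numerically. The following Python sketch is not from the paper: it assumes a hypothetical four-level treatment, homogeneous per-interval effects (the values in `beta` are invented for illustration), a monotone instrument that raises intensity by up to two levels, and coarsening at `k = 3`. Under those assumptions the Wald estimate for the coarsened indicator should sit close to the value implied by the bias formula rather than close to the effect at intensity k alone.

```python
import numpy as np


def simulate_coarsening(n=200_000, J=4, k=3, seed=0):
    """Compare the Wald estimate for a coarsened treatment with the value
    implied by the bias formula. All parameter values are illustrative
    assumptions, not quantities taken from the paper."""
    rng = np.random.default_rng(seed)
    # Hypothetical effects beta_t of moving from intensity t-1 to t.
    beta = {2: 0.10, 3: 0.20, 4: 0.05}

    z = rng.integers(0, 2, size=n)                       # randomly assigned instrument (A2)
    t0 = rng.integers(1, J + 1, size=n)                  # T_i(0), uniform on 1..J
    t1 = np.minimum(t0 + rng.integers(0, 3, size=n), J)  # T_i(1) >= T_i(0): monotonicity (A4)
    t = np.where(z == 1, t1, t0)                         # realized treatment intensity

    # Y_i(t): cumulative effect of all intervals up to t, plus noise.
    cumulative = np.zeros(J + 1)
    for s in range(2, J + 1):
        cumulative[s] = cumulative[s - 1] + beta[s]
    y = cumulative[t] + rng.normal(0.0, 0.5, size=n)

    d = (t >= k).astype(float)                           # coarsened indicator D_ik

    reduced_form = y[z == 1].mean() - y[z == 0].mean()
    first_stage = d[z == 1].mean() - d[z == 0].mean()
    wald = reduced_form / first_stage

    # p_t = Pr(T_i(1) >= t > T_i(0)), computed from the simulated potential treatments.
    p = {s: np.mean((t1 >= s) & (s > t0)) for s in range(2, J + 1)}
    implied = sum(p[s] * beta[s] for s in p) / p[k]
    return wald, implied, beta[k]


wald, implied, beta_k = simulate_coarsening()
print(f"Wald estimate: {wald:.3f}; bias formula: {implied:.3f}; beta_k: {beta_k:.3f}")
```

Because every hypothetical effect is positive and the instrument shifts several intensities at once, the coarsened estimate in this toy setup exceeds the effect at intensity k, which is the upward bias the proposition describes.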

Proof of Proposition 2. Consistency requires that $\sum_{t=2, t \neq k}^{J} p_t \beta_t = 0$. I now show that A5* is a sufficient condition. Both A5 and A5* entail that $[Y_i(z,t) - Y_i(z,t-1)] = [Y_i(t) - Y_i(t-1)]$. Furthermore, the definition of A5* entails that $[Y_i(t) - Y_i(t-1)] = 0$ for all $t \neq k$ where $p_t \neq 0$. Consequently,

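\begin{align*}
E[Y_i \mid Z_i = 1] - E[Y_i \mid Z_i = 0] &= E\left[\sum_{t=2}^{J} [I(T_i(1) \geq t) - I(T_i(0) \geq t)][Y_i(t) - Y_i(t-1)]\right] \\
&= E\left[\sum_{t=2, t \neq k}^{J} [I(T_i(1) \geq t) - I(T_i(0) \geq t)][Y_i(t) - Y_i(t-1)]\right] + E\left[[I(T_i(1) \geq k) - I(T_i(0) \geq k)][Y_i(k) - Y_i(k-1)]\right] \\
&= 0 + p_k \beta_k,
\end{align*}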

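where the first line follows from A5 and the third line requires A5*. Under A5*, it is thus clear that the Wald estimator then yields:

\[
\beta^{IV}_k = \frac{E[Y_i \mid Z_i = 1] - E[Y_i \mid Z_i = 0]}{E[D_{ik} \mid Z_i = 1] - E[D_{ik} \mid Z_i = 0]} = \frac{p_k \beta_k}{p_k} = \beta_k.
\]

Therefore, $\beta^{IV}_k$ is a consistent estimator under A1, A2, A3, A4 and A5* (which implies A5). □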

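Proof of Proposition 3. Note, using the first part of the proof of Proposition 1, that

\[
\beta^{W,J}_{LAPTE} = \frac{\sum_{t=2}^{J} p_t \beta^{J}_t}{\sum_{t=2}^{J} p_t} = \tau
\quad \text{and} \quad
\beta^{W,\alpha J}_{LAPTE} = \frac{\sum_{t=2}^{\alpha J} p_t \beta^{\alpha J}_t}{\sum_{t=2}^{\alpha J} p_t} = \tau/\alpha,
\]

where the linearity of the causal effect at each intensity interval implies $\alpha \beta^{\alpha J}_t = \beta^{J}_t$. The result follows. □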

6.2 History of school leaving age reforms in Britain

There have been three landmark pieces of legislation in the area of CSLs in the twentieth century.22

22This brief history borrows from Gillard (2011), Woodin, McCulloch and Cowan (2013, 2013) and the relevant legislative documents.

First, David Lloyd George’s Liberal government moved on the recommendations of the Lewis Report 1916 to raise the school leaving age from 13 to 14 as part of the post-WW1 reforms under

the Education Act 1918 (or Fisher Act). The Act was ambitious in that it also aimed to institution-

alize schooling until 18 and expand higher education, as well as abolish fees at state-run schools

(although secondary education beyond the age of 14 did not become free until 1944) and establish

a national schooling infrastructure. However, the change in the school leaving age was not imple-

mented by Lloyd George until the Education Act 1921, coming into effect in 1922. Although the

1918 Act had intended for further increases in the leaving age, these did not transpire for financial

reasons despite repeated attempts in the 1920s and 1930s (Oreopoulos 2006). In practice, this Act

had relatively little effect on school enrollment: most students remained in school until age 14.

Staying in school beyond 14 typically required attending a grammar school (given secondary ed-

ucation was otherwise limited in supply and poorly subsidized), which entailed significant school

fees and passing an entrance exam. Consequently, the large majority of working class families did

not send their children to secondary school.

Second, as part of the Beveridge reforms, Churchill’s wartime coalition government passed

the Education Act 1944 (or Butler Act), which increased the school leaving age from 14 to 15 in

England and Wales;23 the Education (Scotland) Act 1945 cemented the same reform in Scotland.

No such reform occurred in Northern Ireland until 1957, which is not included in the BES sur-

veys. The leaving age did not come into force until 1st April 1947, giving the education system

time to expand its operations to accommodate the changes in the system (as well as many other

new provisions under the 116-page monolith).24 However, the reform also established the new

Tripartite system (coming into force in 1945), which meant that in most parts of the country for-

mal secondary education began at 11 (rather than 14) and whether students attended a grammar,

secondary technical or secondary modern school was determined by the “eleven plus” examination taken at age 11.25 As shown in the main paper, although fees had already been removed in 1944 and the new Tripartite system adopted in 1945, it was not until the school leaving age was raised that the dramatic increase in enrollment occurred (see also Oreopoulos 2006).

23The Education Act 1936 had determined that the age should be raised in 1939, but this did not occur because of the onset of WW2.

24The lack of teachers was a serious concern, requiring an emergency training program in 1945 to address the lack of capacity.

[Figure 7 omitted: proportion of students leaving school (vertical axis) plotted by year from 1950 to 1995 (horizontal axis), with separate series for leaving before age 14, before 15, before 16 and before 17.]

Figure 7: 1972 compulsory schooling reform and student leaving age by cohort

Notes: Data from the BES. Curves represent fourth-order polynomial fits. Grey dots are birth-year cohort averages, and their size reflects their weight in the sample.

25The Tripartite system was abolished in England in 1976 by the Labour government, and was replaced by the comprehensive system which did not bifurcate students at age 11.

26An Order of Council does not require approval like an Act. It may be laid before the House of Commons and is accepted unless a resolution is passed against it.

The Education Act 1944 also provided for raising the leaving age to 16 once practical. Consequently the leaving age could be raised to 16 by an Order of Council.26 Conservative Prime Minister Harold Macmillan presided over plans to raise the school leaving age to 16 in the Education Act 1962, which ultimately fixed spring and summer leaving dates, although it was Conservative Edward Heath who finalized the update to the current system under Statutory Instrument

444 (1972). The new rule, which had been overseen by Margaret Thatcher and heavily pushed by

the Crowther Report 1959, was implemented for the academic year starting 1st September 1972 in

England and Wales. Statutory Instrument 59 (1972) raised the leaving age more flexibly in Scot-

land to allow local authorities, who were very concerned about teacher shortages (especially in

Strathclyde/Glasgow), to allow part-time schooling and early leaving in the summer terms. Conse-

quently, the 1972 reform was relatively weak for many Scottish students. The reform in Scotland

was not fully implemented until the Education Act 1976. The Education (School-leaving Dates)

Act 1976 introduced slightly more subtle leaving age rules—which are not utilized in this paper

as they require monthly birth data (as in Clark and Royer 2013). In England, Wales and Scotland

these reforms again raised education participation rates, although less dramatically than the 1944

Act (Milligan, Moretti and Oreopoulos 2004). Because the reform also introduced middle schools

in some parts of the country, which meant that students entered secondary school one year later,

the secondary population remained relatively steady. Nevertheless, as Figure 7 indicates, the proportion of students staying in school until age 16 increased substantially. However, this did not carry over

to higher levels of education: the plot clearly indicates that the proportion of students leaving at

age 17 or higher was not affected by the 1972 reform.

Although the 1947 and especially 1972 reforms were implemented by Conservatives, Labour

had consistently pushed for the increases in the leaving age (Gillard 2011). This suggests that the

reforms were not politically motivated. In practice, the reforms were slow to occur, despite persis-

tent campaigning, because they posed severe infrastructural challenges and entailed considerable

cost (Woodin, McCulloch and Cowan 2013).

In 2008, Labour Prime Minister Gordon Brown passed the Education and Skills Act 2008. This

requires that by 2013 young people must remain in at least part-time education or training until age

17; by 2015, this rises to 18. Although regional implementation will vary, the Act applies across

the entire United Kingdom.

6.3 BES survey data

All variables are from the British Election Survey (BES). The BES uses a multi-stage design,

randomly selecting postal addresses from several wards from randomly sampled constituencies

(stratifying by region). The BES has been conducted following every general election since 1964,

although I only use the surveys since 1979 where appropriate variables are available. As noted in

the main paper, the sample is restricted to working age respondents (i.e. aged below 70) who were aged at least 18 at the time of the survey. The upper age limit reflects the likelihood that education affects political behavior through earned income (see Marshall 2015).

Summary statistics for the RD and full samples are provided in Table 2.

• Vote Conservative. Indicator coded one for respondents who reported voting for the Conservative party at the last general election. Only respondents who refused to respond, did not answer or did not vote were excluded.

• Years of schooling. Years of schooling is calculated as the age at which the respondent left full-time education minus five (the age at which students start formal schooling). Years of schooling is top-coded at 13 years to ensure comparability and focus on state-provided education. Indicators for 10 and 11 years of schooling are defined according to this measure.

• Completed high school. Indicator coded one for respondents who reported that they either: (1) possess a grade 1 Certificate of Secondary Education (CSE), 5 O-levels at A-C, 5 General Certificates of Secondary Education (GCSEs) at A*-C or a lower grade on the Scottish Certificate of Education (SCE); or (2) left school at age 16 or later.

• Birth year. Birth year is estimated by subtracting age at the date of the survey from the year in which the survey was conducted. I then add 14 to obtain the year in which the respondent was aged 14. Non-responses were deleted.

• Post 1947 reform. Indicator coded one for students aged 14 or below in 1947, and aged 15 or above in 1972.

• Post 1972 reform. Indicator coded one for students aged 14 or below in 1972.

• Male. Indicator coded one for respondents identifying as male. Non-responses were deleted.

• Age. Standardized age at the date of the survey.

• Race. Indicators coded one for respondents who respectively identify their ethnicity as white, black, or Asian (including South Asian ethnicities and Chinese).

• Father manual/unskilled job. Indicator coded one for respondents who answered that their father had a manual or unskilled job.

• Survey year. Year in which the survey was conducted.

1979 survey did not ask this question, the 1983 survey used a 21-point scale that was rescaled to 0-10, while the 1974 survey (which is included) asked about just social services. Respondents that answered “don’t know” were excluded.

Table 2: Summary statistics: RD and full BES samples

                                RD sample                                     Full BES sample
                                Obs.    Mean      Std. dev.  Min.  Max.       Obs.    Mean      Std. dev.  Min.  Max.
Dependent variable
Vote Conservative               4,820   0.409     0.492      0     1          13,853  0.343     0.475      0     1
Endogenous treatment variables
Years of schooling              4,820   10.317    1.417      0     13         13,853  10.982    1.436      0     13
Completed high school           4,820   0.400     0.490      0     1          13,853  0.642     0.480      0     1
Excluded instruments
Post 1947 reform                4,820   0.589     0.492      0     1          13,853  0.493     0.500      0     1
Post 1972 reform                4,820   0.000     0.000      0     0          13,853  0.314     0.464      0     1
Pre-treatment covariates
Male                            4,820   0.474     0.499      0     1          13,853  0.466     0.499      0     1
White                           4,820   0.984     0.126      0     1          13,853  0.965     0.183      0     1
Black                           4,820   0.006     0.079      0     1          13,853  0.009     0.094      0     1
Asian                           4,820   0.007     0.084      0     1          13,853  0.016     0.127      0     1
Age                             4,820   54.732    8.323      35    69         13,853  43.365    14.073     18    69
Father manual/unskilled job     3,820   0.724     0.447      0     1          9,922   0.672     0.470      0     1
Survey                          4,820   1988.912  7.342      1979  2010       13,853  1991.697  8.926      1979  2010
Birth year                      4,820   1934.180  6.465      1922  1944       13,853  1948.332  16.153     1910  1992
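The coding rules for the schooling and reform variables described in this section can be written out as small functions. This is only a sketch of the stated rules: the function and argument names are hypothetical, and the indicator for exactly 10 years of schooling follows the definition Penultimate = 1(Schooling = 10) used in Section 6.6. The qualification-based part of the completed high school indicator is not reproduced here.

```python
def years_of_schooling(age_left_education: int) -> int:
    """Age at leaving full-time education minus five, top-coded at 13 years."""
    return min(age_left_education - 5, 13)


def year_aged_14(survey_year: int, age_at_survey: int) -> int:
    """Estimated birth year (survey year minus age) plus 14."""
    return (survey_year - age_at_survey) + 14


def post_1947_reform(cohort: int) -> int:
    """One if aged 14 or below in 1947 and aged 15 or above in 1972.
    `cohort` is the year in which the respondent was aged 14."""
    return int(1947 <= cohort <= 1971)


def post_1972_reform(cohort: int) -> int:
    """One if aged 14 or below in 1972."""
    return int(cohort >= 1972)


def penultimate_year(schooling: int) -> int:
    """One if the respondent has exactly 10 years of schooling."""
    return int(schooling == 10)


# Example: a respondent aged 54 in the 1987 survey who left school at 15.
cohort = year_aged_14(1987, 54)
schooling = years_of_schooling(15)
print(cohort, schooling, post_1947_reform(cohort), post_1972_reform(cohort))
```

Defining the two reform indicators as mutually exclusive, as the text does, means each coefficient is read relative to the pre-1947 cohorts.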

6.4 Continuity across the discontinuity

The key “sorting” concern for RD designs is that another key variable simultaneously changes at

the discontinuity. However, selection into cohorts in Britain is implausible since parents could

not have precisely predicted CSL reforms over a decade in advance. The lack of heaping around

the 1947 reform is confirmed by annual birth rate data (Office for National Statistics 2013). Furthermore, a McCrary (2008) density test, shown in Figure 8, confirms that the density of the data is indistinguishable across the reform discontinuities.27

[Figure 8 omitted: estimated density (vertical axis, 0 to 0.025) plotted against year (horizontal axis, 1920 to 2010).]

Figure 8: McCrary (2008) sorting test

Table 3 confirms that pre-treatment variables do not differ significantly across the 1947 reform.

In no case is the coefficient on post 1947 reform close to being statistically significant. In each

case I use the bandwidth used for the main results presented in the main paper.

27The null hypothesis of no difference in density across the discontinuity is far from rejected.

Table 3: Continuity in other variables around the 1947 reform

                     Male      White     Black     Asian     Father manual/
                                                             unskilled job
                     LLR       LLR       LLR       LLR       LLR
                     (1)       (2)       (3)       (4)       (5)

Post 1947 reform     0.012     0.001     0.005     -0.004    -0.030
                     (0.032)   (0.008)   (0.006)   (0.005)   (0.030)

Observations         4,820     4,820     4,820     4,820     3,820

Notes: All specifications are local linear regressions using a triangular kernel and a bandwidth of 11.514. Robust standard errors in parentheses. * denotes p < 0.1, ** denotes p < 0.05, *** denotes p < 0.01.

6.5 Robustness checks

6.5.1 Regression discontinuity specification tests

I now show that the reduced form and IV estimates are robust to various specification concerns.

First, I demonstrate that the results are not sensitive to the regression discontinuity (RD) speci-

fication. Figure 9 shows that the point estimates are stable across bandwidths and the choice of

triangular or rectangular kernel, although the precision of the estimates unsurprisingly declines for

very small bandwidths.
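To make the bandwidth and kernel comparison concrete, here is a minimal sketch of a local linear jump estimator and the corresponding fuzzy RD Wald ratio. It is an illustration under assumptions, not the paper's code: the function names are hypothetical, cohorts at or above the cutoff are treated as exposed to the reform, and the toy data are noiseless and invented so that the true jump is known.

```python
import numpy as np


def llr_jump(running, outcome, cutoff, bandwidth, kernel="triangular"):
    """Estimate the jump in `outcome` at `cutoff` with separate linear trends
    on each side, using kernel-weighted least squares."""
    r = np.asarray(running, dtype=float) - cutoff
    y = np.asarray(outcome, dtype=float)
    if kernel == "triangular":
        weights = np.clip(1.0 - np.abs(r) / bandwidth, 0.0, None)
    elif kernel == "rectangular":
        weights = (np.abs(r) <= bandwidth).astype(float)
    else:
        raise ValueError("kernel must be 'triangular' or 'rectangular'")
    keep = weights > 0
    r, y, w = r[keep], y[keep], weights[keep]
    post = (r >= 0).astype(float)
    # Columns: intercept, jump at the cutoff, slope below, change in slope above.
    X = np.column_stack([np.ones_like(r), post, r, post * r])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return coef[1]


def fuzzy_rd_wald(running, outcome, treatment, cutoff, bandwidth, kernel="triangular"):
    """Ratio of the reduced-form jump to the first-stage jump."""
    return (llr_jump(running, outcome, cutoff, bandwidth, kernel)
            / llr_jump(running, treatment, cutoff, bandwidth, kernel))


# Invented, noiseless toy data: schooling jumps by 0.5 years at the 1947 cohort,
# and each year of schooling raises the outcome by 0.12.
cohorts = np.arange(1930, 1965)
schooling = 9.5 + 0.01 * (cohorts - 1947) + 0.5 * (cohorts >= 1947)
vote = 0.2 + 0.12 * schooling

for h in (5, 11.514, 17):
    print(h, fuzzy_rd_wald(cohorts, vote, schooling, 1947, h))
```

With real survey data the estimates would vary with the bandwidth and standard errors would be needed; the sketch only shows the mechanics behind each point plotted in a bandwidth sensitivity figure.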

Second, Table 4 shows that the results are robust to including quadratic or cubic polynomial

trends in the running variable (birth year) either side of the discontinuity. Since I change the order

of the polynomial, I also allow the Imbens and Kalyanaraman (2012) optimal bandwidth to adjust

accordingly. Unsurprisingly, the estimates become less precise as higher-order polynomials are

included.

Finally, I examine placebo tests treating each of the ten years prior to the 1947 reform as a placebo reform. Figure 10 presents the results of placebo reforms. Each bar is the estimate associated with treating the corresponding year as the 1947 reform. In no prior year, including those immediately before the reform, do we observe any evidence of an increase in Conservative voting. This provides strong support for the claim that the results are not being driven by pre-trends.

[Figure 9 omitted: six panels plotting point estimates against bandwidth (2 to 20): effect of 1947 reform (triangle), effect of 1947 reform (rectangular), effect of years of schooling (triangle), effect of years of schooling (rectangular), effect of completing high school (triangle), effect of completing high school (rectangular).]

Figure 9: Robustness to choice of bandwidth and kernel

Notes: Triangle and rectangular denote the choice of kernel used for the specifications in each plot. Bars represent 95% confidence intervals.

[Figure 10 omitted: effect of placebo reform on Vote Conservative (vertical axis) by placebo reform year, 1937 to 1946 (horizontal axis).]

Figure 10: Effect of placebo reforms on Conservative voting

Notes: All specifications are local linear regressions using a triangular kernel and the Imbens and Kalyanaraman (2012) optimal bandwidth for each placebo reform. Bars represent robust 95% confidence intervals.

Table 4: Robustness to higher-order polynomials in the running variable

                                    Vote Con.   Vote Con.   Vote Con.   Vote Con.
                                    LLR IV      LLR IV      LLR IV      LLR IV
                                    (1)         (2)         (3)         (4)

Years of schooling                  0.144**     0.158*
                                    (0.071)     (0.092)
Completed high school                                       0.514**     0.552*
                                                            (0.253)     (0.319)

Running variable polynomial order   quadratic   cubic       quadratic   cubic
Optimal bandwidth                   15.580      20.052      15.580      20.052
Observations                        6,515       8,325       6,515       8,325
First stage F statistic             20.5        12.3        17.6        11.4

Notes: All specifications are local linear regressions using a triangular kernel. Robust standard errors in parentheses. * denotes p < 0.1, ** denotes p < 0.05, *** denotes p < 0.01.

6.5.2 Exclusion restriction (weak)

Second, the weak exclusion restriction is violated if the 1947 reform affected political preferences

through channels other than schooling. Although political or cultural changes are unlikely to differ-

entially affect cohorts one year apart, it is possible that an additional year in school could affect life

choices—such as marriage or having children—by simply keeping students in school, but without

operating through schooling itself. Figure 11 examines these possibilities using data from Labour

Force Surveys (for the election years in the BES sample). The graphs indicate that around nei-

ther the 1947 nor 1972 reform (used to confirm coarsening bias) was there a clear change in how

cohorts born just before and after the reforms tend to have children or get married. In particular, whether the respondent has a dependent child (defined as being under the age of 19), the age of their eldest dependent child, their number of dependent children, and having once been married are all effectively continuous through the reforms. These results indicate that simply remaining in school did not substantially alter other life choices proximate to the decision to remain in school.

[Figure 11 omitted: four panels by cohort (year aged 14, 1940 to 2000): proportion with a dependent child, average age of eldest dependent child, average number of dependent children, and proportion once married.]

Figure 11: Exclusion restriction checks

Notes: All data are from the Labour Force Surveys associated with the election years in the BES sample. Dependent children are those under the age of 19.

An alternative concern relates to the possibility that other concurrent changes differentially

affected students aged 14 before and after 1947. However, the most significant post-war changes

in the education system had already been implemented before 1947. Fees for secondary schooling

were removed in 1944, while the new Tripartite system—which formally established three types of

secondary school emphasizing academic, scientific and practical skills—came into force in 1945.

However, as Figure 5 indicates, these structural reforms did not affect enrollment (Oreopoulos

2006). Furthermore, prior to the 1947 reform, the government pre-emptively engaged in a major

expansion effort to maintain quality by increasing the number of teachers, buildings and classroom

materials (Woodin, McCulloch and Cowan 2013). The additional year of schooling was primarily

intended to ensure that students grasped the material they had previously been taught (Clark and

Royer 2013). Nevertheless, any reduction in schooling quality or spillover causing older cohorts

to behave more like treated cohorts would reduce between-cohort differences around the reforms,

and thus downwardly bias the estimates.

6.6 Using the 1972 school leaving reform to identify coarsening bias

As noted in the main text, the availability of two instruments offers the opportunity to precisely

estimate the extent of coarsening bias. In particular, the availability of two instruments means that

it is possible to instrument for two treatment variables—in this case completing the penultimate

year of high school (i.e. Penultimateic = 1(Schoolingic = 10)) and completing at least the final

year of high school (as defined in the main paper). Since Figure 5 in the main paper and Figure

7 respectively show that the 1947 and 1972 reforms did not affect students leaving at older or

younger ages, the reforms only affected whether students remained in school for the penultimate

or final year of high school. Consequently, the first stage at all other levels of the treatment is zero

(i.e. pt = 0). Therefore, instrumenting for two indicator variables—completing the penultimate

and final years of high school—does not suffer from coarsening bias because the special case

that pt = 0 at all other t is satisfied. This enables estimation of the causal effect βt for both additional years of high school, and thus exact identification of the effect of completing high school. This latter quantity can then be compared to the estimate for completing high school in the main paper to gauge the extent of coarsening bias.

To simultaneously estimate the effects of the penultimate and final year of high school, I use 2SLS to estimate the following structural equation:

\begin{equation}
\text{Vote Conservative}_{ic} = \beta_1 \text{Penultimate}_{ic} + \beta_2 \text{Completed high school}_{ic} + f(\text{Birth year}_c) + \varepsilon_{ic}, \tag{14}
\end{equation}

where the first stage regressions generating exogenous variation in educational attainment are given by:

\begin{align}
\text{Penultimate}_{ic} &= \alpha_1 \text{Post 1947 reform}_c + \alpha_2 \text{Post 1972 reform}_c + f(\text{Birth year}_c) + \varepsilon_{ic} \tag{15} \\
\text{Completed high school}_{ic} &= \gamma_1 \text{Post 1947 reform}_c + \gamma_2 \text{Post 1972 reform}_c + f(\text{Birth year}_c) + \varepsilon_{ic}. \tag{16}
\end{align}

Because I now use two discontinuities as instruments, this system cannot be estimated using local

linear regression, as in the main paper. However, I adopt a similar approach by using the full

BES sample and letting f include cubic global polynomials in the running variable (birth year). I

use flexible polynomials in order to capture general trends in Conservative support across cohorts

around both reforms, and thus mimic the local linear approach in the main paper.28
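The estimation strategy in equations (14) to (16) can be sketched with plain linear algebra. The code below is a hedged illustration, not the paper's implementation: `two_sls` and `cubic_trend` are hypothetical helper names, the data are simulated, the coefficient values are invented, and the two endogenous variables are generated as continuous quantities for simplicity even though the paper's treatments are indicators.

```python
import numpy as np


def two_sls(y, endog, instruments, exog):
    """2SLS point estimates. Returns coefficients ordered as
    [endogenous regressors..., exogenous controls...]."""
    X = np.column_stack([endog, exog])
    Z = np.column_stack([instruments, exog])
    # First stage: project every regressor on the instruments and controls.
    first_stage_coef, *_ = np.linalg.lstsq(Z, X, rcond=None)
    X_hat = Z @ first_stage_coef
    # Second stage: regress the outcome on the fitted regressors.
    coef, *_ = np.linalg.lstsq(X_hat, y, rcond=None)
    return coef


def cubic_trend(cohort):
    """Intercept plus a standardized cubic polynomial in the running variable."""
    s = (cohort - cohort.mean()) / cohort.std()
    return np.column_stack([np.ones_like(s), s, s**2, s**3])


# Simulated data under invented parameter values.
rng = np.random.default_rng(1)
n = 20_000
cohort = rng.integers(1925, 2000, size=n).astype(float)   # year aged 14
post_1947 = ((cohort >= 1947) & (cohort < 1972)).astype(float)
post_1972 = (cohort >= 1972).astype(float)
u = rng.normal(size=n)                                     # unobserved confounder
penultimate = 0.5 * post_1947 + 0.1 * post_1972 + 0.3 * u + rng.normal(size=n)
completed = 0.2 * post_1947 + 0.6 * post_1972 + 0.3 * u + rng.normal(size=n)
vote = 0.10 * penultimate + 0.27 * completed + 0.5 * u + rng.normal(size=n)

coef = two_sls(
    vote,
    endog=np.column_stack([penultimate, completed]),
    instruments=np.column_stack([post_1947, post_1972]),
    exog=cubic_trend(cohort),
)
print("2SLS estimates for the two endogenous regressors:", coef[:2])
```

With two instruments and two endogenous regressors the system is exactly identified, so both first stages need independent variation from the two reforms for the estimates to be informative.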

Table 5 reports the reduced form and IV estimates when exploiting both reforms simultaneously. Column (1) presents the reduced form estimates, and demonstrates that both reforms significantly increase the probability that an individual votes Conservative. Consistent with the 1947 reform having a larger effect on education, the 1947 reform also has a larger effect on Conservative voting: the 1947 reform increased Conservative voting per cohort by 6 percentage points, while the 1972 reform added a further two percentage points. Column (2) estimates the LAPTE for an additional year of schooling by using the 1947 and 1972 reforms as instruments for years of schooling. Similarly to the main paper, which reports a 12 percentage point increase, the estimates suggest that an additional year increases the probability of voting Conservative later in life by 13 percentage points. Furthermore, a Sargan overidentification $\chi^2_1$ test fails to reject the null hypothesis that the instruments produce the same IV estimate (p = 0.70). This provides evidence that both reforms had similar effects on voters, and again suggests that an additional year of compulsory education substantially increases the probability of voting Conservative. The similarity of the effects means that despite using two different instruments, I am able to plausibly use both instruments to separate out the effects of the penultimate and final years of schooling.

28Encouragingly, column (1) of Table 5 shows that the cubic polynomials report a similar estimate for the 1947 reform as the local linear regression approach in the main paper. Nevertheless, I found very similar results when using fourth, fifth, sixth and seventh-order polynomials.

Table 5: Using the 1972 reform to identify the extent of coarsening bias

                                  Vote Con.   Vote Con.   Vote Con.
                                  OLS         2SLS        2SLS
                                  (1)         (2)         (3)

Post 1947 reform                  0.059***
                                  (0.021)
Post 1972 reform                  0.079***
                                  (0.030)
Years of schooling                            0.130***
                                              (0.046)
Penultimate year of high school                           0.098**
                                                          (0.039)
Completed high school                                     0.269**
                                                          (0.124)

Observations                      13,853      13,853      13,853
First stage F statistic                       28.5        658.4/52.1

Notes: All specifications include standardized cubic polynomials in birth year. Robust standard errors in parentheses (which are always larger than errors clustered by cohort). * denotes p < 0.1, ** denotes p < 0.05, *** denotes p < 0.01.

I now turn to the key element of this exercise: determining the extent of coarsening bias. Col-

umn (3) estimates equation (14), and reports the estimates for completing the penultimate and final

year of high school. The results indicate that the penultimate year of high school increases Conser-

vative voting by 10 percentage points, while completing high school adds a further 17 percentage

points. This implies that the effect of completing high school, over completing a lower level of

schooling, is to increase the probability of voting Conservative by 17 percentage points. Therefore,

this demonstrates that coarsening bias almost triples the size of this estimate (46 percentage points

in the main paper). Although the coefficients come from different samples, plugging the estimates

for βt and pt into equation (4) in the main paper yields an IV estimate of around 0.5. This is similar

to the 0.46 estimate in the main paper.
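As a quick arithmetic check on these magnitudes, the following uses only the column (3) coefficients in Table 5 and the 0.46 estimate cited from the main paper. It assumes that the 17 percentage points is the difference between the two column (3) coefficients, which is how the rounded figures in the text line up:

\[
0.269 - 0.098 = 0.171, \qquad \frac{0.46}{0.171} \approx 2.7 .
\]

On this reading the coarsened estimate is a little under three times the estimate that separates the two years.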
