COARSENING BIAS: HOW INSTRUMENTING FOR
COARSENED TREATMENTS UPWARDLY BIASES
INSTRUMENTAL VARIABLE ESTIMATES
JOHN MARSHALL∗
JANUARY 2015
Political scientists increasingly use instrumental variable (IV) methods, and must often choose between operationalizing their endogenous treatment variable as discrete or continuous. For theoretical and data availability reasons, researchers frequently coarsen treatments with multiple intensities (e.g. treating a continuous treatment as binary). I demonstrate that such coarsening can substantially upwardly bias IV estimates by subtly violating the exclusion restriction. However, I show that using a treatment where multiple intensities are affected by the instrument, even when a fine-grained measure of all intensities is unavailable, recovers a consistent causal estimate. These important analytical insights are illustrated using simulations and in the context of identifying the long-run effect of high school education on voting Conservative in Great Britain. Exploiting a major school leaving age reform to instrument for schooling, I demonstrate that coarsening years of schooling into a dummy for completing high school upwardly biases the nevertheless large IV estimate by a factor of three.
∗PhD candidate, Department of Government, Harvard University. [email protected]. I thank John Bullock, Anthony Fowler, Andy Hall, Torben Iversen, Rakeen Mabud, Horacio Larreguy, Brandon Stewart, Dustin Tingley and Tess Wise for illuminating discussions or useful comments.
1 Introduction
Instrumental variable (IV) techniques are now a standard part of the political scientist’s method-
ological toolkit. IV analyses have illuminated complex relationships such as the effects of democ-
racy on economic development (Acemoglu, Johnson and Robinson 2001), international trade
agreements on foreign direct investment (Buthe and Milner 2008), and campaign spending on
election outcomes (Gerber 1998). Given that an appropriate instrument can identify important causal
relationships that cannot be easily disentangled, it is not surprising to find that the number of articles
published in the American Journal of Political Science and the American Political Science
Review using IV techniques has almost doubled over the last decade (see Figure 1).
[Figure 1 about here]
Best practice for using IV methods is now receiving greater scrutiny (e.g. Angrist and Pis-
chke 2008; Sovey and Green 2011). However, this article highlights a previously-unrecognized
but potentially severe source of bias: how coarsening a continuous or multi-valued (endogenous)
treatment variable can substantially upwardly bias IV estimates.1 Given that 36% of AJPS and
APSR publications since 2005 (that use IV methods) instrument for binary treatments or ordinal
treatments with few categories, this risk of bias is highly relevant.2 Such coarsening may appear
appealing when: (1) the researcher believes that coarsening aids interpretation, or avoids imposing
linearity on a statistical relationship; (2) theory suggests that the treatment effect may be non-
linear; or (3) more granular measures of the treatment are unavailable. However, using formal
proofs, simulations and an empirical application, this article clearly demonstrates that coarsening
can cause large biases in the IV context.
1Like Angrist, Imbens and Rubin (1996), I refer to the endogenous variable as the “treatment”. Although it is not random, the predicted values from the first stage are effectively random.
2I count ordinal treatments with five or fewer categories.
[Figure 1 plot: number of articles published per year, 1986 to 2014, with separate series for AJPS and APSR]
Figure 1: Annual trends in the usage of instrumental variable techniques in political science
Notes: Counts are based on data provided by Allison Carnegie (from Sovey and Green 2011) and the author’s own reading of AJPS and APSR articles. The APSR count for 2014 includes only the first three (of four) editions of each journal. Any reference to implementing an IV technique is included.
I first analytically characterize coarsening bias and illustrate the results using Monte Carlo sim-
ulations, before leveraging a natural discontinuity in the school leaving age to illustrate its extent
in the context of identifying high school’s effect on vote choice in Great Britain. I show that instru-
menting for completing high school over-estimates the nevertheless large causal effect of late high
school education on voting Conservative later in life by a factor of three. In general, these results
suggest that coarsening bias can explain why IV estimates are often orders of magnitude larger
than the analogous OLS or reduced form estimates. However, I show that using a linear measure
of the treatment—which consistently and robustly recovers a causal effect weighting the causal
effect at each intensity by the relative proportion of subjects induced to reach that intensity by the
instrument—represents an appealing alternative. To avoid drawing false inferences, this article
thus demonstrates that it is imperative that political scientists become aware of how coarsening
bias can substantially inflate IV estimates.
This article’s primary theoretical contribution characterizes the potential biases associated with
coarsening an (endogenous) treatment variable in an IV analysis. For ease of exposition, I focus on
instrumenting for a treatment intensity—a treatment that takes multiple values—that is coded as a
binary variable.3 Intuitively, coarsening bias arises when an instrument affects the intensity of the
underlying treatment, which in turn affects the outcome of interest, in a way that is not registered
when the treatment is operationalized as a binary variable. For example, a dummy for completing
high school would fail to register the effects of school leaving laws that encouraged some students
to stay in school for an additional year without completing high school. By grouping together
multiple years of schooling (or treatment intensities) that each affect the outcome, coarsening
falsely creates the impression that the sum of the effects at each intensity can be attributed to
completing high school. Coarsening bias therefore violates the exclusion restriction underpinning
IV estimation (e.g. Angrist, Imbens and Rubin 1996), because the instrument affects the outcome
through an avenue not captured by the dichotomous treatment variable in the first stage.
3The argument equally applies when only a single value of an ordinal treatment is affected by the instrument because the treatment effectively serves as a dummy variable.
Coarsening bias is especially large when: (1) any change in treatment intensity affects the
outcome, e.g. if the treatment’s true effect is linear; and (2) when the instrument has large effects on
intensities other than the coarsened threshold. In general, only when the true effect of a treatment is
discontinuous (among the intensities affected by the instrument) and the researcher is able to both
recognize and measure the value of the treatment where this occurs, will the IV estimate associated
with a coarsened treatment variable be unbiased. Together, these requirements constitute a significantly more
demanding condition that I refer to as the strong exclusion restriction. In many applications, this is
unlikely to hold.
Both observational and experimental data are vulnerable to such coarsening bias. For exam-
ple, in the case of Pierskalla and Hollenbach (2013), who use local communication regulations to
instrument for an indicator of local cell phone coverage, favorable regulations could increase cell
phone usage without affecting the services used to define their treatment. Lacking the correct first
stage could explain why their IV estimates are 20 times larger than their OLS estimates. Similarly,
Milligan, Moretti and Oreopoulos (2004) use the compulsory school laws of U.S. states to instru-
ment for completing high school, and find that completing high school increases turnout by 30-40
percentage points and the likelihood that an individual follows politics by 40-85 percentage points.
These enormous estimates would be upwardly biased if the instrument increases education levels,
below the point of completing high school, that also increase these outcomes. In experimental stud-
ies, the extent of coarsening bias depends upon the type of endogenous variable that a randomized
instrument affects. For Gerber, Huber and Washington (2010), who in one specification instrument
for an individual’s partisan identification and find effects 15 times larger than their corresponding
OLS estimates, upward bias may occur if their randomized mailing causes voters to move toward
a political party without passing the threshold required to register a new partisan identification.4
4Gerber, Huber and Washington (2010) also consider another specification where partisanship is coded using a 7-point scale. This is unlikely to be biased.
Conversely, experiments inducing respondents to uptake truly binary treatments are not affected
by this bias. For example, get-out-the-vote canvassing is unlikely to impact respondents that did
not answer the door (e.g. Gerber and Green 2000).
Fortunately, I show that a consistent causal estimate can still be recovered without requiring
that the strong exclusion restriction holds. In particular, provided that the treatment can be coded
as a multi-valued variable where the instrument affects multiple treatment intensities, two stage
least squares (2SLS) consistently estimates the local average per-unit treatment effect: the average
causal effect of a unit increase in the treatment among compliers, who only received a particular
treatment intensity because of the instrument. The formal analysis presents several important justi-
fications for using this approach, which is already standard practice when the treatment is believed
to linearly affect the outcome. First, by not coarsening the treatment, coarsening bias cannot oc-
cur. Second, although an average across intensities (weighted by the proportion of compliers at
each intensity) may not be the ideal quantity of interest, it always recovers a consistent estimate
even when the treatment is non-linearly related to the outcome. Third, since coding the treatment
as a linear object always yields a smaller estimate than coarsening, this approach is more
conservative. Fourth, by not requiring the strong exclusion restriction, it does not require that the
researcher correctly specifies the treatment’s functional form. Finally, under reasonable conditions
this approach is robust even without observing all treatment intensities.
I illustrate the extent of coarsening bias in the context of identifying the causal effect of high
school education on vote choice in Great Britain. To address the selection concern that certain
types of individual receive more education (e.g. Kam and Palmer 2008), I leverage a regression
discontinuity (RD) design comparing the first cohorts to be affected by Britain’s 1947 school leav-
ing age reform—which raised the minimum leaving age from 14 to 15—to cohorts slightly too
old to be affected. The reform induced around 40% of students to stay in school until 15, but
also induced 15% to go on to complete high school at age 16. Using this reform to instrument
for completing high school is thus liable to over-estimate high school’s political effects if an additional
year of education at age 15 (without completing high school) has downstream effects on
political preferences. However, instrumenting for years of schooling consistently estimates the
complier-weighted causal effect of an additional year of late high school.
The results show that while receiving late high school education does significantly increase the
likelihood that an individual votes for the Conservative party later in life, coarsening substantially
upwardly biases the IV estimate for completing high school. Instrumenting for years of schooling
in the context of a “fuzzy” RD, an additional year of late high school education increases the
probability of voting Conservative by 12 percentage points. However, using the 1947 reform to
instead instrument for completing high school suggests that completing high school somewhat
implausibly increases Conservative support by 46 percentage points. A decomposition analysis,
which ensures that this large estimate cannot be accounted for by unusual compliers, demonstrates
that coarsening is responsible for upwardly biasing the estimate for completing high school by a
factor of three.
This paper is organized as follows. Section 2 formally characterizes coarsening bias and shows
how a linear measure of the treatment can consistently estimate a weighted causal effect. Section
3 uses simulations to demonstrate the analytical results. Section 4 estimates the effect of an ad-
ditional year of schooling on Conservative voting in Britain, and calculates the bias arising from
using a dichotomous measure of high school. Section 5 concludes.
2 Formal analysis of IV with coarsened treatments
This section demonstrates the upward bias of coarsening an endogenous treatment intensity, fo-
cusing on the simplest case where there is a binary instrument and the treatment is coarsened into
a binary indicator.5 After briefly reviewing the IV assumptions in the heterogeneous potential out-
comes framework (see Angrist, Imbens and Rubin 1996), I first show how coarsening bias arises
from a subtle exclusion restriction violation, before analyzing how the extent of coarsening bias
depends upon the relationship between the treatment and the outcome. Finally, I provide justifications
for a simple linear operationalization of the treatment intensity.
5All results naturally extend to multi-valued instruments, multiple instruments, multi-valued coarsened treatments, and the inclusion of control variables.
2.1 IV framework review
Instrumental variable techniques are designed to address the concern that a treatment may be cor-
related with an omitted variable that affects the outcome of interest. An instrumental variable—
which is correlated with the treatment, but does not directly affect the outcome—is used to instru-
ment for the endogenous treatment in order to isolate an exogenous effect of the treatment. Since
only variation in the treatment induced by an effectively random instrument is exploited, the causal
effect identified only pertains to compliers—observations that only received a particular treatment
intensity because they received the instrument.
Formally, denote the instrument, for each observation i ∈N ≡ {1, ...,n}, as Zi ∈ {0,1}. The
observed endogenous treatment intensity of observation i, Ti ∈ {1, ...,J}, takes one of J ordered
values/intensities. Yi is i’s observed outcome of interest. We first assume that the stable unit treat-
ment value assumption (SUTVA) holds, which requires that potential outcomes are independent of
the instruments and treatments received by other individuals. Given SUTVA, Ti(Zi = z) denotes
i’s potential outcomes of Ti conditional on receiving instrument Zi = z, while Yi(Zi = z,Ti(z) = t)
correspondingly denotes i’s potential outcomes of Yi conditional on receiving treatment intensity
Ti = t and instrument Zi = z.6
6Observed outcomes relate to potential outcomes through Ti = ZiTi(1) + (1 − Zi)Ti(0) and Yi = ZiYi(1,Ti(1)) + (1 − Zi)Yi(0,Ti(0)).
[Figure 2 about here]
Figure 2: Graphical representation of weak and strong exclusion restrictions
[Panel (a), weak exclusion restriction: Instrument (Z), Treatment (T), Outcome (Y), with the excluded path marked “X”. Panel (b), strong exclusion restriction: Instrument (Z), Coarsened Treatment (D), Outcome (Y) and other treatment intensities (t≠k), with the excluded paths marked “X”.]
In addition to SUTVA (A1), IV estimation requires four additional assumptions. First, assume
that Zi is randomly assigned (A2), and thus that the instrument is independent of potential outcomes
and potential treatment intensities. Second, assume that there exists a first stage (A3), such that
the instrument affects the intensity of the treatment i receives. Third, monotonicity (A4) requires
that, for all individuals, the instrument either never decreases the treatment or never increases
the treatment. Fourth, the weak exclusion restriction requires that Zi only affects Yi through the
treatment Ti (A5). Consequently, Yi(z, t) = Yi(t) for any z. The diagram in Figure 2(a) represents
this weak exclusion restriction, marked by an “X”, graphically. The weakness of this assumption
when Ti is coarsened is explained below. These standard assumptions, which are explained in
greater detail elsewhere (e.g. Angrist, Imbens and Rubin 1996; Angrist and Pischke 2008; Sovey
and Green 2011), are formalized below.
A1. Stable unit treatment value assumption: for all i ∈N ,
(a) Ti(Zi = z,Z−i = w) = Ti(Zi = z,Z−i = w′) for all z, w, and w′,
(b) Yi(Zi = z,Ti(z) = t,Z−i = w,T−i(z) = v) = Yi(Zi = z,Ti(z) = t,Z−i = w′,T−i(z) = v′)
for all z, w, w′, v and v′,
where Z−i and T−i represent the instrument and treatment assignments for all observations
except i.
A2. Instrument independence: for all i ∈N , the potential outcomes Yi(z, t) and the potential
treatment intensities Ti(0) and Ti(1) are jointly independent of Zi.
A3. First stage: E[Ti|Zi = 1] − E[Ti|Zi = 0] ≠ 0.
A4. Monotonicity: for all i ∈N , Ti(1)−Ti(0) ≥ 0 or Ti(1)−Ti(0) ≤ 0.
A5. Weak exclusion restriction: for all i ∈N , Yi(z, t) = Yi(z′, t) for any t and all z and z′.
Without loss of generality, I analyze the case where Ti(1)−Ti(0) ≥ 0.
2.2 Formal demonstration of coarsening bias
Coarsening bias occurs when the researcher, whether by choice or constrained by data availabil-
ity, coarsens the treatment intensity. In particular, in the hope of identifying the effect of treat-
ment intensity k > 1, Ti is partitioned by defining the indicator Dik ≡ 1(Ti ≥ k). This binary
variable—which could arise from coarsening years of schooling into completing high school—
indicates whether an individual receives at least treatment intensity k.
The researcher seeking to identify the effect of obtaining Ti = k is thus interested in estimating
the following causal quantity:
βk ≡E[Yi(k)−Yi(k−1)|Ti(1) ≥ k > Ti(0)]. (1)
βk is the local average treatment effect (LATE) of obtaining intensity k beyond only obtaining
the preceding level k − 1 for instrument compliers (i.e. individuals that only reached category k
because of the instrument). In the case of schooling, this could be the effect of completing high
school beyond completing the penultimate grade of high school. To give another example, this
could be the effect of identifying as a Republican partisan beyond leaning Republican.
As is well known, the IV framework seeks to estimate the effect of the endogenous binary
treatment Dik using the following system of equations:
Yi = βkDik + ui, (2)
Dik = γZi + εi. (3)
Equation (2) is the structural model relating Dik to the outcome Yi, while equation (3) is the first
stage regression estimating the effect of the instrument Zi on Dik.
The IV estimator of the effect of Dik on Yi, denoted $\beta_k^{IV}$, divides the reduced form estimate of
the effect of Zi on Yi by the first stage effect of Zi on Dik.7 Causal estimates of the reduced form
and first stage are identified under assumptions A1 and A2. Using assumptions A4 and A5, the
IV estimator can be expressed as the weighted sum of the causal effect for compliers moving from
intensity t− 1 to t for each such interval (see the proof in the Online Appendix and Angrist and
Imbens 1995):
$$\beta_k^{IV} \equiv \frac{E[Y_i \mid Z_i = 1] - E[Y_i \mid Z_i = 0]}{E[D_{ik} \mid Z_i = 1] - E[D_{ik} \mid Z_i = 0]} = \frac{\sum_{t=2}^{J} p_t \beta_t}{p_k}, \qquad (4)$$
where pt ≡ Pr(Ti(1)≥ t > Ti(0)) denotes the probability that an individual only reaches treatment
intensity t because they received the instrument Zi = 1, and thus represents the proportion of
compliers at treatment intensity t in the population. Analogously, pk = Pr(Ti(1) ≥ k > Ti(0)) =
E[Dik|Zi = 1]−E[Dik|Zi = 0] is the relevant first stage for reaching treatment intensity k because of
the instrument. The estimator is well-defined under A3, provided pk ≠ 0. Finally, βt ≡ E[Yi(t)−
Yi(t − 1)|Ti(1) ≥ t > Ti(0)] is the LATE for compliers moving from treatment intensity t − 1 to
treatment intensity t. Collecting the effect of the treatment among compliers (the LATE, βt) at
each intensity defines the causal response function (CRF). The CRF thus describes how each level
of the treatment affects the outcome.
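As a minimal numerical sketch of equation (4), the Python snippet below evaluates the coarsened IV estimand from a set of complier shares pt and interval-specific LATEs βt. The function name `coarsened_iv_estimand` and the illustrative shares and effects are assumptions introduced here for exposition; they are not quantities taken from the article.

```python
def coarsened_iv_estimand(p, beta, k):
    """Population value of equation (4): sum_t p_t * beta_t / p_k.

    p and beta map each intensity t = 2, ..., J to the complier share p_t
    and to the LATE beta_t of moving from t - 1 to t (hypothetical inputs).
    """
    if p[k] == 0:
        raise ValueError("the first stage p_k must be non-zero")
    reduced_form = sum(p[t] * beta[t] for t in p)  # E[Y|Z=1] - E[Y|Z=0]
    first_stage = p[k]                             # E[D_k|Z=1] - E[D_k|Z=0]
    return reduced_form / first_stage


# Hypothetical three-category treatment (J = 3), coarsened at k = 3.
p = {2: 0.30, 3: 0.10}
beta = {2: 0.05, 3: 0.05}
estimand = coarsened_iv_estimand(p, beta, k=3)

# With no compliers at t = 2, the estimand collapses to beta_3.
only_k = coarsened_iv_estimand({2: 0.0, 3: 0.10}, beta, k=3)
print(estimand, only_k, beta[3])
```

In this illustration the reduced form collects the effect at both intensities while the first stage only counts compliers crossing k, so the first estimand is a multiple of β3, whereas the second coincides with it.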
However, under the standard assumptions enumerated above, the IV estimator—which should
return βk, the effect of treatment intensity k for compliers—is often inconsistent. Inspection of
equation (4) shows that $\beta_k^{IV}$ is only equal to βk when $\sum_{t \neq k} p_t \beta_t = 0$. Therefore, $\beta_k^{IV}$ is consistent
only in four special cases. First, when the instrument only affects reaching intensity k; or pt =
0, ∀t ≠ k. Second, when the effect at all intensities other than k is zero; or βt = 0, ∀t ≠ k. Third,
when one of the preceding conditions holds for each intensity t, ensuring that ptβt = 0 for all t ≠ k.
Fourth, if the directions of the effects (weighted by pt) differ across intensities but ultimately cancel
out.
7Adding covariates provides similar results, while multiple instruments involves weighting multiple Wald estimators (Angrist and Pischke 2008).
The following proposition summarizes these insights, formally proving that the coarsened IV
estimator only consistently estimates the LATE of obtaining treatment intensity k on Yi in the
special cases enumerated above.
Proposition 1. (Coarsening bias) Assume, in addition to assumptions A1, A2, A3, A4 and A5, that
pk > 0 holds. The coarsened IV estimator $\beta_k^{IV}$ can be expressed as:
$$\beta_k^{IV} = \beta_k + \frac{\sum_{t \neq k} p_t \beta_t}{p_k}. \qquad (5)$$
Provided sign(βk) = sign(βt) for all t ≠ k where pt ≠ 0, the coarsened IV estimator accentuates
the true causal effect: $|\beta_k| \leq |\beta_k^{IV}|$.
All proofs are provided in the Online Appendix.
Proposition 1 proves that, beyond the special cases enumerated above, the IV estimate resulting
from the coarsening of Ti is inconsistent, and thus biased even as the sample size becomes large.8
Provided that each treatment intensity causes Yi to move in the same direction, i.e. the CRF is
monotonic, equation (5) demonstrates that the IV estimate is upwardly biased (in magnitude).
Furthermore, coarsening bias is increasing in both pt/pk and |βt |, for any t 6= k. In other words,
the bias is greater when the first stage is relatively large for intensities other than k and the LATE
for intensities other than k is large.
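Proposition 1 can also be checked by simulation. The sketch below uses a hypothetical data-generating process: the complier shares, the linear per-unit effect `tau`, the noise terms and the function names are all assumptions made for illustration and are not the article's own simulation design. It compares the Wald estimate that instruments for the coarsened indicator Dik with the one that instruments for Ti itself.

```python
import random


def wald(y, x, z):
    """Wald/IV estimate for a binary instrument: reduced form / first stage."""
    def mean_diff(v):
        v1 = [vi for vi, zi in zip(v, z) if zi == 1]
        v0 = [vi for vi, zi in zip(v, z) if zi == 0]
        return sum(v1) / len(v1) - sum(v0) / len(v0)
    return mean_diff(y) / mean_diff(x)


def simulate(n=200_000, tau=1.0, k=3, seed=0):
    """Return (coarsened IV estimate, IV estimate using T) for one sample."""
    rng = random.Random(seed)
    y, t, d, z = [], [], [], []
    for _ in range(n):
        r = rng.random()
        # Assumed potential treatments (T(0), T(1)); monotonicity holds.
        if r < 0.3:
            t0, t1 = 1, 2        # compliers at intensity 2 (p_2 = 0.3)
        elif r < 0.4:
            t0, t1 = 2, 3        # compliers at intensity 3 (p_3 = 0.1)
        elif r < 0.6:
            t0, t1 = 1, 1        # unaffected by the instrument
        elif r < 0.8:
            t0, t1 = 2, 2
        else:
            t0, t1 = 3, 3
        zi = 1 if rng.random() < 0.5 else 0   # randomly assigned instrument
        ti = t1 if zi == 1 else t0            # observed treatment intensity
        # Linear CRF plus an unobservable that is correlated with T(0).
        yi = tau * (ti - 1) + 0.5 * t0 + rng.gauss(0.0, 0.5)
        y.append(yi)
        t.append(ti)
        d.append(1 if ti >= k else 0)         # coarsened indicator D_ik
        z.append(zi)
    return wald(y, d, z), wald(y, t, z)


coarse_hat, linear_hat = simulate()
# Population values under these assumptions: (p_2 + p_3) * tau / p_3 = 4 * tau
# for the coarsened estimator, and tau for the estimator that uses T.
print(coarse_hat, linear_hat)
```

Under these assumed shares, three quarters of the reduced form comes from compliers who never cross the threshold k = 3, which is why the coarsened estimate is inflated while the estimate using Ti is not.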
The general inconsistency of the coarsened IV estimator, which has not previously been high-
lighted to applied researchers, reflects a subtle exclusion restriction violation that arises from coars-
ening Ti into Dik. Although assumption A5 requires that Zi does not affect Yi through avenues other
than Ti, the coarsening of Ti into Dik allows for Ti to affect Yi without going through Dik and without
violating A5. Consequently, while any effect of Zi on Yi is registered in the reduced form estimate
(the numerator of equation (4)), the first stage (the denominator) only registers cases where Zi
induces i to pass the threshold used to define the treatment indicator (e.g. moving from intensity
k − 1 to k). Coarsening bias thus arises if values of Ti other than k, such as years of high school
before completing high school, also affect Yi because such changes in the reduced form are not
captured in the first stage.
8IV estimators are biased in finite samples but consistent (e.g. Staiger and Stock 1997).
Consistent estimation of the causal effect of Dik generally requires stronger assumptions. In
particular, the strong exclusion restriction (A5*), such that Zi only affects Yi through Dik, is suffi-
cient.9
A5*. Strong exclusion restriction: for all i ∈N , all t such that pt ≠ 0, and all z and z′,
(a) Yi(z, t) = Yi(z′, t′) for all t, t′ ≥ k,
(b) Yi(z, t) = Yi(z′, t′) for all t, t′ < k.
As illustrated in Figure 2(b), this assumption is more demanding than A5 because, in addition to
requiring that the instrument only affect the outcome by altering the intensity of the treatment, it
also requires that the researcher coarsen the treatment such that βt = 0 for all intensities t ≠ k that
are affected by the instrument. In other words, the stronger assumption of A5* requires knowledge
of the functional form relating the treatment to the outcome to ensure that other levels of the
treatment are not also affecting the outcome.
The following proposition establishes the consistency of the coarsened IV estimator when A5*
holds, and thus demonstrates the importance of the seemingly minor distinction between assump-
tions A5 and A5* when Ti is coarsened into Dik.
Proposition 2. (Consistency when the strong exclusion restriction holds) If pk > 0 and assumptions
A1, A2, A3, A4, and A5* hold, then $\beta_k^{IV}$ consistently estimates βk.
9This captures the first, second and third of the four special cases described above. The second is probably the most empirically relevant, given that the βt typically all go in the same direction and most instruments induce pt ≠ 0 for some t ≠ k.
The economic or political effects of high school education represent an important example
where coarsening bias could arise from instrumenting for completing high school rather than a
continuous measure of education. Consider an instrument that increases both the probability that
individuals complete the penultimate year of high school (i.e. pk−1 > 0) and complete high school
(i.e. pk > 0). Proposition 1 then shows that coarsening bias arises if the penultimate year of school-
ing affects the outcome of interest (e.g. βk−1 > 0). For outcomes like income, additional schooling
could easily increase labor market returns (e.g. Becker 1993). Furthermore, if increased income
affects political preferences, then IV estimates for political outcomes such as vote choice may also be biased. Provided
we cannot find an instrument that only affects completing high school, Proposition 2 shows
that consistent estimation is only possible if the penultimate year of schooling does not affect the
outcome (i.e. βk−1 = 0).
2.3 When is coarsening bias severe?
Proposition 1 demonstrated that the extent of coarsening bias depends upon the first stage and the
CRF. Specifically, bias depends upon the LATE at different treatment intensities (βt) weighted by
the relative effects of the instrument on treatment intensities not defining the coarsened treatment
($\sum_{t \neq k} p_t / p_k$). Since the CRF is almost never known in advance, it is essential to understand the
types of causal relationships for which the strong exclusion restriction is tenable, and what causal
quantities can be recovered when only the weak exclusion restriction is tenable.
2.3.1 Sharp jumps in the CRF
When the CRF exhibits sharp discontinuities, as exemplified in Figure 3, coarsening can be appropriate.
Provided the researcher is able to correctly identify intensity k, the only point at which
there is a (positive) causal effect in the figure, as the key jump, then βk can be consistently estimated
when a suitable instrument ensures pk > 0. This works because βt = 0 for all t ≠ k, and
thus the coarsened IV estimator is consistent regardless of whether pt > 0 for some other t ≠ k.
[Figure 3 about here]
[Figure 3 plot: outcome (Yi) against treatment intensity (Ti), with the intensities k and k+1 marked on the horizontal axis]
Figure 3: Discontinuous causal response function
In practice, however, it is hard to know a priori whether k correctly captures the true discon-
tinuity. In general, tipping points are not straight-forward to precisely theorize. In experiments
where subjects cannot be partially treated it is easier to determine clear cutoffs. As noted above,
in the case of Gerber and Green’s (2000) get-out-the-vote canvassing, knocking on a door is only
likely to affect respondents that opened the door to receive the treatment. But even experiments
can be hard to evaluate if there is partial compliance, such that individuals can experience some
of the treatment without being designated as treated. This could occur, for example, if some sub-
jects in a treatment group receive information about the treatment but do not ultimately receive the
treatment.
Conversely, if the researcher incorrectly surmises that k+1 is the correct threshold, at best they
fail to detect the existence of the effect of intensity k but correctly identify no effect at k+ 1. In
the simple example of Figure 3, where βt = 0 for all t ≠ k, the researcher correctly concludes that
βk+1 = 0 only if their instrument does not induce subjects to reach intensity k. In other words,
pk = 0 ensures a correct estimate of a quantity that was probably not of primary interest. However,
when pk > 0 and pk+1 > 0, the coarsened IV estimator will produce the following inconsistent
estimate of the LATE at intensity k+ 1:
$$\beta_{k+1}^{IV} = \frac{p_k \beta_k}{p_{k+1}} > 0. \qquad (6)$$
Although this estimate is approximately correct in the sense that there is a causal effect nearby,
it both wrongly attributes the effect to intensity k+ 1 and does not even consistently estimate βk
unless pk+1 = pk. This bias is increasing in the relative first stage for intensity k.
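To make equation (6) concrete, the short sketch below evaluates the estimand obtained when the threshold is wrongly placed at k + 1 while only βk is non-zero. The function name and the numerical values are hypothetical and chosen purely for illustration.

```python
def misplaced_threshold_estimand(p_k, p_k_plus_1, beta_k):
    """Equation (6): coarsened IV estimand at k + 1 when only beta_k != 0."""
    if p_k_plus_1 == 0:
        raise ValueError("the first stage at k + 1 must be non-zero")
    return p_k * beta_k / p_k_plus_1


beta_k = 0.10   # assumed true effect at intensity k
p_k = 0.20      # assumed complier share at intensity k
results = {
    p_next: misplaced_threshold_estimand(p_k, p_next, beta_k)
    for p_next in (0.20, 0.10, 0.05)   # assumed complier shares at k + 1
}
for p_next, value in results.items():
    print(p_next, value)
```

Only when the two complier shares coincide does the estimand equal βk; as the share at k + 1 shrinks relative to the share at k, the estimand grows in proportion.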
2.3.2 Linear CRFs
When the instrument affects another intensity t ≠ k, the coarsening bias associated with the coarsened
IV estimator for category k can be particularly large when the true CRF is linear. Letting the
causal effect associated with each interval be βt = τ ≠ 0, the coarsened IV estimator yields:
$$\beta_k^{IV} = \beta_k + \frac{\sum_{t \neq k} p_t}{p_k} \tau. \qquad (7)$$
Given βk = τ , coarsening bias is increasing in the first stage at intensities other than k. In particular,
more than one half of all compliers must be reaching intensity k for the estimator to yield less than
double the size of the true effect.10 This concern also increases with how close the treatment
intensity categories are to one another (i.e. increases in the number of categories J), because it
becomes increasingly implausible that any instrument could ensure pt = 0 for all t ≠ k, while
$\sum_{t \neq k} p_t / p_k$ becomes increasingly large.
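The role of the number of categories can be illustrated with a small calculation. Under a linear CRF, equation (7) implies that the coarsened estimand equals τ multiplied by the ratio of all compliers to compliers at k. The sketch below assumes, purely for illustration, that the instrument induces an equal share of compliers at every intensity; the helper names are hypothetical.

```python
def linear_crf_inflation(p, k):
    """Ratio of the coarsened IV estimand to tau under a linear CRF.

    From equation (7) with beta_t = tau: 1 + sum_{t != k} p_t / p_k,
    which equals sum_t p_t / p_k.
    """
    return sum(p.values()) / p[k]


def equal_shares(J, share=0.05):
    """Assumed complier shares: the same share at every t = 2, ..., J."""
    return {t: share for t in range(2, J + 1)}


# More categories affected by the instrument means a larger inflation factor.
factors = {J: linear_crf_inflation(equal_shares(J), k=J) for J in (3, 5, 9)}
print(factors)

# When more than half of all compliers reach k, the factor stays below two.
majority_at_k = linear_crf_inflation({2: 0.10, 3: 0.30}, k=3)
print(majority_at_k)
```

With equal shares the inflation factor is J − 1, so finer treatment categories make the coarsened estimate progressively less informative about τ.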
When the causal response is linear and Ti is observable, an IV estimator replacing Dik with
Ti in the first stage—which is the standard IV estimator used when the treatment is regarded as
continuous—is more appropriate. The first stage thus regresses Ti, rather than Dik, on Zi. By using
Ti in the first stage to re-scale the reduced form estimate, this approach instead identifies the local
average per-unit treatment effect (LAPTE). This estimator is defined as:
$$\beta_{LAPTE}^{IV} \equiv \frac{E[Y_i \mid Z_i = 1] - E[Y_i \mid Z_i = 0]}{E[T_i \mid Z_i = 1] - E[T_i \mid Z_i = 0]} = \frac{\sum_{t=2}^{J} p_t \beta_t}{\sum_{t=2}^{J} p_t}. \qquad (8)$$
The LAPTE is a linear approximation, where the causal effect at each treatment intensity is
weighted by the number of compliers at that intensity. It is easy to see that when the true effect is
τ at each interval, it is exactly recovered by $\beta_{LAPTE}^{IV}$.
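A brief sketch of equation (8) shows both properties of the LAPTE: it returns τ when the CRF is linear, and otherwise returns a complier-weighted average of the interval-specific effects. The function name and all numerical inputs are illustrative assumptions.

```python
def lapte_estimand(p, beta):
    """Population value of equation (8): sum_t p_t * beta_t / sum_t p_t."""
    total_compliers = sum(p.values())   # E[T|Z=1] - E[T|Z=0]
    if total_compliers == 0:
        raise ValueError("the instrument must move at least one intensity")
    return sum(p[t] * beta[t] for t in p) / total_compliers


p = {2: 0.30, 3: 0.10}                  # assumed complier shares

linear_beta = {2: 0.05, 3: 0.05}        # linear CRF with tau = 0.05
nonlinear_beta = {2: 0.02, 3: 0.10}     # assumed non-linear CRF

lapte_linear = lapte_estimand(p, linear_beta)
lapte_nonlinear = lapte_estimand(p, nonlinear_beta)
print(lapte_linear, lapte_nonlinear)
```

In the non-linear case the result lies between the two interval effects and sits closer to the effect at t = 2, because most of the assumed compliers are at that intensity.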
The LAPTE estimator can be interpreted as correcting the first stage by appropriately mea-
suring the instrument’s effect on the endogenous treatment. This approach is also more generally
robust in two attractive respects, which provide new justifications for political scientists already
using this approach. First, only the weaker exclusion restriction (A5) is required for consistent
estimation of the LAPTE, even when the true causal effect is not linear. This is because the first
stage is assigned a linear form that captures all changes in Ti.
10Note that $\frac{\sum_{t \neq k} p_t}{p_k} = \frac{p - p_k}{p_k} < 1$ only when pk > p/2, where $p \equiv \sum_{t=2}^{J} p_t$.
Second, the linear approach can be robust even without observing all categories. If the J
observed categories represent a coarsening of the true intervals (e.g. because T is continuous), the
linear causal effect can still be recovered provided the intervals are equally spaced.
Proposition 3. (Robustness of LAPTE to missing categories) Let only J equally-spaced categories
of Ti be observed when there are in fact αJ equally-spaced categories, where α > 1 is finite and αJ
is an integer. Denote $\beta_{LAPTE}^{IV,J}$ and $\beta_{LAPTE}^{IV,\alpha J}$ respectively as the IV estimators in the observed sample
(denoted by superscript J) and unobserved sample (denoted by superscript αJ). Let assumptions
A1-A5 hold, and assume pt > 0 for at least two intensities. If the effect of Ti is linear such that
$\beta_j^J = \tau$ for all intervals j, then $\beta_{LAPTE}^{IV,J} = \alpha \beta_{LAPTE}^{IV,\alpha J}$.
Consequently, obtaining the coefficient on the quantity of interest only requires an adjustment by
factor α to identify the average linear causal effect for any desired unit interval. A first stage
is required for at least two treatment intensities to ensure that the IV estimate averages across
coarsened categories; without estimates for two intensities to “draw a line through”, the estimates
would be equally susceptible to coarsening bias. If measurement of the treatment is sufficiently
coarse that the instrument only affects a single intensity, researchers may require a new dataset,
or a second dataset drawn from the same population that can separately estimate the first stage
(see Angrist and Krueger 1992; Franklin 1989). Otherwise, they should focus on the reduced form
estimates.
2.3.3 Other CRFs
In practice, the CRF is often non-linear. However, given the impracticality of implementing non-
linear IV estimation strategies,11 an important question is: when are coarsening and the LAPTE
appropriate approximations?
As demonstrated above, the coarsened IV estimate will generally be upwardly biased when the
CRF is not discontinuous at only the coarsened treatment intensity. Furthermore, it is easy to show
that coarsening always yields a coefficient at least as large as the LAPTE.12 Accordingly, if the
CRF is that in Figure 3, then the linear approach under-estimates the true causal effect at intensity
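k when $\beta_t = 0$ for all $t \neq k$ but $p_t > 0$ for some $t \neq k$:
$$\beta^{IV}_{LAPTE} = \frac{p_k \beta_k}{\sum_{t=2}^{J} p_t} = \left( \frac{1}{\beta_k} + \frac{\sum_{t \neq k} p_t}{p_k \beta_k} \right)^{-1} \leq \beta_k \qquad (9)$$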
Although the LAPTE does not consistently recover the effect at k, it remains a consistent estimate
of the complier-weighted average across the intervals. Consequently, where the instrument only affects
the first stage of interest (i.e. $p_t = 0, \forall t \neq k$) the LAPTE estimator yields an identical estimate
to the coarsened IV estimator. Furthermore, when the causal effect differs across intensities, such
a complier-weighted estimate is informative about the average effect in the particular sample. The
LAPTE is thus always a consistent causal estimate of a complier-weighted average, even though it
may not necessarily be the quantity the researcher would ideally estimate.
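To make the comparison concrete, the following sketch evaluates both estimands for a hypothetical discontinuous CRF. The complier shares `p`, the effects `beta`, and the helper names are assumptions chosen for illustration; the coarsened estimand is taken to be the reduced form divided by the first stage at intensity k, which is consistent with the Wald form of the estimators discussed here.

```python
def reduced_form(p, beta):
    """Instrument's effect on the outcome: sum over intensities of p_t * beta_t."""
    return sum(p[t] * beta[t] for t in p)

def lapte(p, beta):
    """Complier-weighted average per-unit effect, as in equation (8)."""
    return reduced_form(p, beta) / sum(p.values())

def coarsened_iv(p, beta, k):
    """Reduced form rescaled only by the first stage at intensity k."""
    return reduced_form(p, beta) / p[k]

# Hypothetical values: the effect is 0.05 at intensity 11 and zero elsewhere.
p = {10: 0.25, 11: 0.50, 12: 0.25}
beta = {10: 0.0, 11: 0.05, 12: 0.0}
k = 11

others = sum(share for t, share in p.items() if t != k)
closed_form = beta[k] / (1.0 + others / p[k])   # beta_k shrunk by the relative first stage

print(lapte(p, beta), closed_form, coarsened_iv(p, beta, k))
```

Here the LAPTE is half of the effect at intensity 11 because half of the compliers cross intensities with no effect, while the coarsened estimand equals the effect at intensity 11 because the CRF is discontinuous exactly there.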
11 Semi-parametric approaches rely on substantially stronger assumptions than typical IV estimation, and are challenging to estimate (e.g. Blundell, Chen and Kristensen 2007; Newey and Powell 2003).
12 Comparing denominators, $\sum_{t=2}^{J} p_t \geq p_k$ if monotonicity holds.
3 Illustrating the analytical results: simulated data
Before examining coarsening bias in a political application with observational data, it is important
to first illustrate and verify the importance of the analytical results using purely simulated data that
can exactly control the true empirical relationships. I demonstrate that the size of the effect at each
treatment intensity on the outcome (i.e. the shape of the CRF) and the strength of the first stage at
intensity k relative to the first stage at all other intensities ($\sum_{t \neq k} p_t / p_k$) are critical in determining
the extent of coarsening bias.
3.1 Monte Carlo simulations
The analytical insights are captured in a simple simulation framework. For each simulated dataset,
I draw a random sample of 1,000 independent observations. Each observation i has a 50:50 chance
of receiving a binary instrument Zi ∼ Bernoulli(0.5). The observation’s endogenous treatment
intensity Ti is given by the nearest integer to Zi + ξi, where $\xi_i \sim \text{Normal}(10, \sigma^2)$. The
treatment thus takes multiple integer intensities, where the mean for observations that did not
receive the instrument is 10 and the mean for observations that did is 11. As the variance
parameter $\sigma^2$ increases, the instrument has a relatively larger effect on reaching treatment intensities other
than 11. Finally, the treatment is coarsened as an indicator for receiving an intensity of at least 11:
Di ≡ 1(Ti ≥ 11).
I consider two types of CRF. I first examine a linear CRF where each additional intensity
increases the outcome by 0.05 units:
$$Y_i = 0.05 \sum_{t=1}^{T} \mathbf{1}(T_i \geq t) + \eta_i \qquad (10)$$
where ηi ∼ Normal(0.4,0.2) determines the location of the outcome and its variation. This is
held constant across CRFs. I also consider a discontinuous CRF where only the eleventh intensity
increases the outcome by 0.05 units:
Yi = 0.05∗1(Ti ≥ 11)+ηi. (11)
In both cases, the standard IV assumptions (A1-A5) hold. However, the strong exclusion restriction
(A5*) only holds for the discontinuous CRF.
To illustrate the bias associated with coarsening, I vary $\sigma^2$ to examine how 2SLS estimates for
the different CRFs depend upon $\sum_{t \neq k} p_t / p_k$ (which increases in $\sigma^2$). For each $\sigma^2$, I examine 1,000
simulated datasets and present the coarsened IV estimates (using the indicator Di for reaching the
eleventh intensity) and the LAPTE (treating Ti as linear).13
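The design just described can be sketched in a few lines. This is an illustrative reimplementation under stated assumptions, not the code used for the paper: the function names are hypothetical, the second argument of Normal(0.4, 0.2) is treated as a standard deviation, the number of replications is reduced to 400, and only a single value of the variance parameter is shown. With a binary instrument and no covariates, 2SLS reduces to the Wald ratio used here.

```python
import numpy as np

def one_dataset(rng, sigma2, crf, n=1000, k=11, effect=0.05):
    """Simulate one dataset and return (coarsened IV, LAPTE) Wald estimates."""
    z = rng.binomial(1, 0.5, size=n)                  # binary instrument
    xi = rng.normal(10.0, np.sqrt(sigma2), size=n)
    t = np.rint(z + xi)                               # integer treatment intensity
    d = (t >= k).astype(float)                        # coarsened treatment
    eta = rng.normal(0.4, 0.2, size=n)                # 0.2 read as an SD (assumption)
    if crf == "linear":
        y = effect * t + eta      # sum_{s=1}^{T} 1(T_i >= s) equals T_i for positive integers
    else:
        y = effect * d + eta      # effect only at the k-th intensity
    treated = z == 1
    rf = y[treated].mean() - y[~treated].mean()
    coarse = rf / (d[treated].mean() - d[~treated].mean())
    lapte = rf / (t[treated].mean() - t[~treated].mean())
    return coarse, lapte

def monte_carlo(sigma2, crf, reps=400, seed=0):
    """Average the two estimates across simulated datasets."""
    rng = np.random.default_rng(seed)
    draws = np.array([one_dataset(rng, sigma2, crf) for _ in range(reps)])
    return draws.mean(axis=0)

lin_coarse, lin_lapte = monte_carlo(1.0, "linear")
disc_coarse, disc_lapte = monte_carlo(1.0, "discontinuous")
print(lin_coarse, lin_lapte, disc_coarse, disc_lapte)
```

Looping `monte_carlo` over a grid of variance values would trace out how each estimate changes as the instrument's effect spreads to other intensities.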
I first analyze the linear CRF. As demonstrated in Proposition 1, Figure 4(a) shows that the bias
associated with coarsening the treatment can be substantial and increases linearly in $\sum_{t \neq k} p_t / p_k$.
When only the coarsened intensity is affected by the instrument (i.e. $p_k \neq 0$ and $p_t = 0, \forall t \neq k$),
or as $\sum_{t \neq k} p_t / p_k \downarrow 0$, the IV estimate is essentially unbiased. However, bias increases in the
relative magnitude of the effect of the instrument on other intensities. Once only 50% (25%) of
the instrument’s effect occurs at intensity k, the IV estimate is fully two (four) times as large as
the true causal effect. This reflects the fact that the instrument is inducing many changes in the
outcome through intensities that are not captured by the coarsened first stage. Conversely, Figure
4(b) shows that the LAPTE estimate correctly identifies the true causal effect regardless of which
intensities are affected by the instrument, and does not lose precision because each intensity is
equally relevant. This is not surprising given that the LAPTE is designed precisely for this case.
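The two-fold and four-fold figures follow from simple arithmetic. The derivation below is a sketch that assumes, consistent with the linear dependence on the relative first stage noted above, that with a linear CRF the coarsened estimand equals the reduced form divided by the first stage at intensity k; the subscript "coarse" is a label introduced here for illustration.

```latex
\beta^{IV}_{coarse}
  = \frac{\tau \sum_{t=2}^{J} p_t}{p_k}
  = \tau \left( 1 + \frac{\sum_{t \neq k} p_t}{p_k} \right),
\qquad
\frac{p_k}{\sum_{t=2}^{J} p_t} = 0.5 \;\Rightarrow\; \beta^{IV}_{coarse} = 2\tau,
\qquad
\frac{p_k}{\sum_{t=2}^{J} p_t} = 0.25 \;\Rightarrow\; \beta^{IV}_{coarse} = 4\tau .
```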
[Figure 4 about here]
Second, consider a truly discontinuous CRF (like Figure 3), where the researcher correctly
identifies the discontinuity in the CRF at the eleventh intensity. Reinforcing Proposition 2, Figure
4(c) confirms that coarsening the treatment unsurprisingly produces an unbiased estimate of the
13 Specifically, I consider 14 values of $\sigma^2$, linearly increasing from 0.17 to 2.33.
[Figure 4 panels: (a) Linear CRF, coarsened IV estimate; (b) Linear CRF, LAPTE estimate; (c) Discontinuous CRF, correctly coarsened IV estimate; (d) Discontinuous CRF, LAPTE estimate; (e) Discontinuous CRF, incorrectly coarsened IV estimate. In each panel the horizontal axis is $\sum_{t \neq k} p_t / p_k$ and the vertical axis is the coarsened or LAPTE estimate.]
Figure 4: Monte Carlo simulations illustrating how IV estimates of a 0.05 causal effect depend upon the CRF, coarsening, and the relative strength of the instrument on different treatment intensities (95% confidence intervals)
Notes: Estimates at each level of $\sum_{t \neq 11} p_t / p_k$ are based on 1,000 simulated datasets containing 1,000 observations. For the linear CRF, the effect of each additional intensity is 0.05. For the discontinuous CRF, the effect is 0.05 only at intensity k = 11. When correctly coarsened, the coarsened treatment is defined as $D_i = 1(T_i \geq k)$. When incorrectly coarsened, the coarsened treatment is defined as $D'_i = 1(T_i \geq k+1)$. See the main text for the underlying equations and distributions. Point estimates are the average estimate across simulations; the horizontal dashed line denotes the true causal effect the researcher seeks to estimate. 95% confidence intervals are based on variation within and between simulated estimates.
effect at the eleventh intensity when the strong exclusion restriction holds. The precision of this
estimate is decreasing in the relative first stage at other (uninformative) intensities. Turning to
the LAPTE estimate, Figure 4(d) demonstrates that—as shown analytically in equation (9)—the
LAPTE only returns the causal effect at the eleventh intensity when the instrument only affects
this intensity. The size of the LAPTE declines with $\sum_{t \neq k} p_t / p_k$, as the relative weight attached
to compliers with zero effects increases. However, it is important to reiterate that the LAPTE
remains a consistent causal estimate; it simply weights the effects at all the intensities affected by the
instrument. Furthermore, the comparison of Figures 4(c) and 4(d) illustrates the general result that
the LAPTE is never larger than the coarsened IV estimate. When the CRF is neither linear
nor discontinuous, the LAPTE thus provides both consistent and conservative estimates.
Finally, consider the same discontinuous CRF but where the researcher incorrectly coarsens the
treatment at intensity 12. Figure 4(e) shows that coarsening often incorrectly ascribes a significant
positive effect to the twelfth intensity where no such effect exists. This occurs where, in addition
to inducing subjects to receive intensity 12 (where there is no effect), the instrument also induces
subjects to receive the eleventh treatment intensity where there exists a positive effect. As the
figure shows, this is not simply an issue of interpretation because the point estimates systematically fail
even to recover the true effect of the eleventh intensity.
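The size of this spurious effect can be approximated analytically under the simulation design described earlier. The sketch below is an illustration under assumptions: it fixes the standard deviation of the latent shock at 1, uses hypothetical helper names, and computes population quantities rather than simulated estimates. The first stage at intensity t is the instrument-induced change in the probability that the rounded treatment reaches t.

```python
from math import erf, sqrt

def norm_cdf(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def complier_share(t, sigma):
    """p_t = P(T_i >= t | Z_i = 1) - P(T_i >= t | Z_i = 0) when T_i = round(Z_i + xi_i)."""
    upper = (t - 0.5 - 10.0) / sigma   # standardized threshold faced when Z_i = 0
    lower = (t - 1.5 - 10.0) / sigma   # standardized threshold faced when Z_i = 1
    return norm_cdf(upper) - norm_cdf(lower)

sigma = 1.0                      # assumed standard deviation of xi_i
effect_at_11 = 0.05
p11 = complier_share(11, sigma)
p12 = complier_share(12, sigma)

reduced_form = effect_at_11 * p11     # only crossing intensity 11 changes the outcome
miscoarsened = reduced_form / p12     # estimand when D'_i = 1(T_i >= 12)
print(p11, p12, miscoarsened)
```

Under these assumptions the incorrectly coarsened estimand is positive and exceeds 0.05, even though the true effect of the twelfth intensity is zero.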
3.2 Implications for applied research
The simulations clearly show that coarsening an endogenous treatment substantially upwardly bi-
ases IV estimates, except in the special cases where the instrument only induces subjects to reach
the intensity where the treatment is coarsened or the CRF is discontinuous. For most CRFs, esti-
mating the LAPTE is more appropriate. Even when the CRF is not linear, treating the endogenous
treatment as linear has four important advantages: the LAPTE (1) does not suffer from coarsening
bias, (2) always provides a consistent causal estimate, (3) does not rely on researchers correctly
specifying the intensity where the effect is large, and (4) provides a conservative estimate relative
to the coarsened IV estimator.
In general, researchers must rely on their theoretical intuition and descriptive data—including
the reduced form relationship, separate first stage regressions and the (endogenous) OLS relationship—
to determine the appropriate specification in the standard case where only a single instrument is
available. Under special circumstances where multiple instruments are available, I show below
that a sharper empirical assessment is possible. This will allow me to isolate the coarsening bias
associated with instrumenting for a dichotomous measure of schooling.
4 Observational application: high school education’s effect on
voting Conservative in Great Britain
This section illustrates the risk of coarsening bias using observational data. In particular, I quantify
coarsening bias in the context of examining how high school education affects vote choice in Great
Britain. Given that education has many downstream consequences for adulthood, it is potentially
one of the most important determinants of voting behavior. In particular, the increased income
associated with higher education may cause voters to support the right-wing economic policies of
Britain’s Conservative party (e.g. Meltzer and Richard 1981).
Analyses of education’s effect on electoral turnout suggest that existing IV estimates may be
vulnerable to coarsening bias. Recent field and natural experiments using IV methods have re-
ported extremely large effects of completing high school on turnout (Milligan, Moretti and Ore-
opoulos 2004; Sondheimer and Green 2010).14 These estimates could be upwardly biased if each
additional year of schooling increases turnout while the instrument increases schooling for many
students without inducing them to complete high school.
14 Sondheimer and Green (2010:185) pool three U.S. field experiments that encourage students to complete high school, and find that “a high school dropout with a 15.6% chance of voting would have a 65.2% chance of turnout if randomly induced to graduate from high school.” The equally huge effects reported by Milligan, Moretti and Oreopoulos (2004) are noted in the introduction.
To estimate the extent of coarsening bias, I use Britain’s major 1947 school leaving reform
to instrument for measures of schooling in the context of a regression discontinuity (RD) design.
Comparing essentially identical students just affected and just too old to be affected by the reform
addresses the selection concern that the types of individuals who receive more education are
unlikely to be random. The main focus is to demonstrate the bias that arises in IV estimates when
years of schooling is coarsened as an indicator for completing high school.
4.1 Britain’s 1947 reform raising the school leaving age to 15
Britain’s education laws define the maximum age by which students must start school and the
minimum age at which students can leave school. Following the landmark Education Act 1944,
the leaving age in Britain was increased from 14 to 15 for students who reached the age
of 14 after April 1st 1947.15 The reform substantially altered the education profile of Britain’s
students. Around 40% of all students remained in school at least one year longer following the 1947
reform. Creating the risk of coarsening bias, the reform also had “knock-on” effects, increasing
the proportion staying in school until 16 and thus completing high school. However, as the top
and bottom trend lines in Figure 5 show, only these two levels of schooling were affected by the
reform.
[Figure 5 about here]
4.2 Data and estimation
In order to examine how coarsening bias affects estimates of high school’s effect on voting Con-
servative, I utilize data from the British Election Survey (BES). I use the eight elections from 1979
15 The circumstances surrounding this reform, which did not affect Northern Ireland, and educational policy in Britain are described in greater detail in the Online Appendix.
[Figure 5 plot: proportion of students leaving school (vertical axis, 0 to 1) by cohort, defined as the year aged 14 (horizontal axis, 1925 to 1970), with separate series for leaving before 14, before 15, before 16 and before 17.]
Figure 5: 1947 compulsory schooling reform and student leaving age by cohort
Notes: Data from my BES sample (described below). Curves represent fourth-order polynomial fits. Grey dots are birth-year cohort averages, and their size reflects their weight in the sample.
to 2010 for which data on the outcome, instrument and treatment measures are available, and fo-
cus on working age respondents (aged below 70). This produced a maximum sample of 13,853
observations. The data and variables used in this application are described in detail in the Online
Appendix.
I code the decision to vote Conservative as an indicator for the 34% of respondents that re-
ported voting for the Conservative party at the last election. The instrument is simply an indicator
for respondents that turned 14 in 1947 or later, and were thus affected by the higher school leav-
ing age.16 The reduced form plot in Figure 6 suggests a clear upward shift in Conservative voting
among the cohorts affected by the increased leaving age. Finally, schooling—the endogenous treat-
ment variable in this application—is measured both as a linear intensity and a coarsened indicator.
The linear intensity is the number of years of schooling.17 The coarsened treatment is an indicator
for the 64% of respondents that completed high school.18
[Figure 6 about here]
To identify the effect of late high school education on voting Conservative, I use Britain’s 1947
leaving age reform to instrument for both measures of schooling in the context of a “fuzzy” RD
design.19 In particular, whether an individual was affected by the reform is a discontinuous func-
tion of their birth year cohort (the “running variable”). However, since the reform could not force
every student to remain in school for (at least) an additional year, the 1947 reform is used as an
instrument that discontinuously increases the probability that a student received greater schooling
(Hahn, Todd and Van der Klaauw 2001).
16 Although month of birth is unavailable in the BES, the large first stage is very similar to Clark and Royer’s (2013) study where monthly data were available.
17 Comparing university and vocational programs is difficult beyond age 18, so years of schooling is top-coded at 13 years. Since the 1947 reform did not affect education beyond age 16, this coding is inconsequential.
18 Completing high school is defined as leaving school at age 16 (or later) or receiving a high school leaving diploma for a minimum level of achievement.
19 Due to the dramatic changes in education it induced, Britain’s 1947 reform has proved a popular instrument among economists (e.g. Clark and Royer 2013; Oreopoulos 2006).
[Figure 6 plot: Conservative vote share (vertical axis, 0.2 to 0.6) by cohort, defined as the year aged 14 (horizontal axis, 1925 to 1970).]
Figure 6: Proportion voting Conservative by birth year cohort
Notes: Curves represent fourth-order polynomial fits. Grey dots are birth-year cohort averages, and their size reflects their weight in the sample.
The fuzzy RD entails estimating the following structural equation for individuals i from birth
year cohort c:
Vote Conservativeic = βSchoolingic + f (Birth yearc)+ εic, (12)
where f is a flexible function of the running variable designed to capture trends away from the
reform discontinuities. Specifically, I estimate local linear regressions (LLRs) where f includes
linear birth year trends either side of the discontinuity, and only utilizes observations within the
Imbens and Kalyanaraman (2012) optimal bandwidth (of 11.514 cohorts). To maximize the com-
parability of treated and untreated cohorts, observations are weighted according to their proximity
to the discontinuity using a triangle kernel. The Online Appendix demonstrates that the results are
insensitive to particular specification choices. As noted above, I examine two measures of school-
ing: instrumenting for years of schooling estimates the LAPTE of schooling, while instrumenting
for the indicator for completing high school aims to estimate the effect of completing high school.
Conditioning on f (Birth yearc), these estimates respectively correspond to equations (8) and (4).
The first stage regression generating exogenous variation in Schoolingic is given by:
Schoolingic = αPost 1947 reformc + f (Birth yearc)+ εic. (13)
A strong first stage implies that the instrument explains substantial variation in schooling. When testing
the exclusion of the instrument from the first stage, an F statistic exceeding 10 is the conventional rule
of thumb for regarding “weak instrument bias” as small (Staiger and Stock 1997).
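The estimation strategy in equations (12) and (13) can be sketched as a kernel-weighted, just-identified IV regression. The code below is a minimal illustration under assumptions, not the paper's implementation: the function name `fuzzy_rd_iv` is hypothetical, the bandwidth is supplied by the user rather than computed with the Imbens and Kalyanaraman procedure, no standard errors or F statistics are produced, and the data are synthetic and noiseless so that the known coefficients are recovered.

```python
import numpy as np

def fuzzy_rd_iv(y, s, cohort, cutoff, h):
    """Local linear fuzzy RD: returns (IV coefficient on s, first stage jump)."""
    r = np.asarray(cohort, dtype=float) - cutoff       # centered running variable
    post = (r >= 0).astype(float)                      # instrument: affected cohorts
    w = np.clip(1.0 - np.abs(r) / h, 0.0, None)        # triangular kernel weights
    keep = w > 0
    y, s, r, post, w = y[keep], s[keep], r[keep], post[keep], w[keep]
    ones = np.ones_like(r)
    X = np.column_stack([s, ones, r, r * post])        # structural regressors
    Z = np.column_stack([post, ones, r, r * post])     # instruments (trends each side)
    ZW = Z.T * w                                       # weight each observation
    beta = np.linalg.solve(ZW @ X, ZW @ y)             # weighted IV: (Z'WX)^{-1} Z'Wy
    alpha = np.linalg.solve(ZW @ Z, ZW @ s)            # weighted first stage
    return beta[0], alpha[0]

# Synthetic, noiseless example with known coefficients (all values hypothetical).
cohort = np.arange(1936, 1959, dtype=float)
r0 = cohort - 1947
post0 = (r0 >= 0).astype(float)
schooling = 9.0 + 0.6 * post0 + 0.02 * r0
vote = 0.3 + 0.12 * schooling + 0.001 * r0

beta_hat, alpha_hat = fuzzy_rd_iv(vote, schooling, cohort, cutoff=1947, h=11.514)
print(beta_hat, alpha_hat)
```

Passing a completed-high-school indicator instead of years of schooling as `s` would give the coarsened counterpart of the same specification.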
Given that Figure 5 clearly suggests a strong and monotonic first stage, the key identifying
assumptions are that Conservative voting is continuous across cohorts in all covariates other than
the school leaving age at the reform discontinuity (this implies assumption A2 above) and that
the reform only affected Conservative voting by increasing years of schooling (assumption A5
above). In the Online Appendix, I offer substantial support for these assumptions using qualitative
evidence, balance tests around the discontinuity, and placebo reform dates.20 Crucially, consistent
estimation of the effect of completing high school also requires that the strong exclusion restriction
(assumption A5*) holds.
4.3 IV estimates and coarsening bias
The first stage estimates in Table 1 verify that the 1947 reform substantially increased schooling.
Column (1) examines years of schooling, and reinforces the graphical analysis in Figure 5 by show-
ing that the reform increased the average schooling by 0.63 years. The large F statistic indicates
a strong first stage. Turning to the dummy for completing high school, column (2) shows that,
in addition to keeping students in school until 15, the reform also significantly increased the
probability of completing high school by 15 percentage points. This increase is highly statistically
significant (F statistic of 24.8), confirming that any substantial bias in the IV estimates does not
simply reflect a weak instrument.
[Table 1 about here]
I now examine the IV estimates in order to evaluate the impact of coarsening bias. I first in-
strument for years of schooling to estimate the LAPTE. As noted above, this provides a consistent
causal estimate weighting the effects of the penultimate and final year of high school by the pro-
portion of students induced by reform to remain in school until that stage. The LAPTE estimate
in column (3) shows that an additional year of late high school increases a complier’s probability
of voting Conservative in later life by 12 percentage points. Reinforcing the reduced form rela-
tionship in Figure 6, this large and statistically significant coefficient provides clear evidence that
high school education increases support for the Conservative party in later life. Marshall (2015)
examines the political implications of the reform in detail and provides evidence that, by increasing
20 Given that the linear and coarsened treatments both require these assumptions, the effects of coarsening can still be identified if the standard IV assumptions fail to hold.
Table 1: Estimates of schooling’s effect on voting Conservative

                            Years of     Completed      Vote Con.    Vote Con.
                            schooling    high school
                            LLR          LLR            LLR IV       LLR IV
                            (1)          (2)            (3)          (4)
Post 1947 reform            0.628***     0.150**
                            (0.107)      (0.029)
Years of schooling                                      0.118**
                                                        (0.053)
Completed high school                                                0.464**
                                                                     (0.209)
Observations                4,820        4,820          4,820        4,820
First stage F statistic     37.5         24.8           37.5         24.8

Notes: Specifications (1) and (2) present the first stage estimates where years of schooling and completed high school are respectively the endogenous treatment variable. Specifications (3) and (4) are respectively the IV estimates for years of schooling and completed high school. All specifications are local linear regressions using a triangular kernel and the Imbens and Kalyanaraman (2012) optimal bandwidth of 11.514. Robust standard errors in parentheses. * denotes p < 0.1, ** denotes p < 0.05, *** denotes p < 0.01.
income and support for more right-wing economic policies, high school education
causes voters to become more conservative.
To see how coarsening the treatment variable affects the IV estimates, I turn to the case where
schooling is dichotomized as an indicator for completing high school. Column (4) reports that
voters induced to complete high school by the reform are fully 46 percentage points more likely
to vote Conservative in later life. This statistically significant estimate is almost implausibly large,
particularly when considering that around half the student population were compliers, and given
that the average effect of an additional year’s schooling is four times smaller. Furthermore, the
conditions under which coarsening bias is large—highlighted in equation (5)—appear to be sat-
isfied. First, the reform predominantly kept students in school until age 15, but did not compel
most students to complete high school (i.e. large pt/pk). Second, there are theoretical reasons
to believe that an additional year of late high school, without completing high school, still affects
income and ultimately political preferences (i.e. large $\beta_t$, $t \neq k$). Supporting this claim, the large
reduced form effect in Figure 6 suggests that raising the leaving age must have affected more stu-
dents than just those that completed high school. Third, since both IV estimates rely on the weak
exclusion restriction, differences cannot be explained by the 1947 reform affecting vote choice
through channels other than schooling. There are thus strong reasons to believe that the estimate
for completing high school suffers from substantial coarsening bias.
Nevertheless, although such large effects of completing high school are surprising, they are not
completely infeasible for predominantly working class (and likely pro-Labour) compliers. Fur-
thermore, the huge effect of completing high school could perhaps be squared with the smaller
average effect of an additional year of schooling if the effect of staying until 15 is actually zero. To
confirm that the large estimate results from coarsening bias, I exploit Britain’s 1972 reform raising
the school leaving age from 15 to 16 to separate the effects of the penultimate and final year of
high school. Because the 1972 reform also significantly increased the proportion of students leav-
ing school at age 16 without inducing them to stay in education past 16, I can use both reforms
simultaneously as instruments to identify the effects of the penultimate year of high school and
completing high school.21 The availability of two instruments that only affect two levels of the
treatment is uncommon, so this represents a rare opportunity to quantify coarsening bias.
The results, presented in detail in the Online Appendix, show that completing the penultimate
year of high school increases the probability of voting Conservative later in life by 10 percent-
age points, while completing high school similarly increases Conservative voting by a further
17 percentage points. This implies that the effect of additional late high school on Conservative
voting is relatively linear. These estimates are also consistent with the LAPTE estimate of 12
percentage points for each additional year (in column (3) of Table 1). Furthermore, by failing to
return the coarsened IV estimate for completing high school of 0.46, the results demonstrate that
the strong exclusion restriction fails to hold, and therefore the coarsened IV estimate substantially
over-estimates the effect of completing high school. In fact, coarsening is responsible for upwardly
biasing the IV estimate by a factor of three.
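A back-of-the-envelope check of these magnitudes uses only the point estimates in Table 1 and the 17 percentage point effect quoted above. It is an illustrative calculation rather than a result reported in the paper, and the variable names are introduced here. Because both IV columns rescale the same reduced form, multiplying each IV estimate by its first stage should give roughly the same number.

```python
# Point estimates copied from Table 1 and from the two-instrument result in the text.
fs_years, fs_high_school = 0.628, 0.150      # first stage coefficients, columns (1)-(2)
iv_years, iv_high_school = 0.118, 0.464      # IV coefficients, columns (3)-(4)
final_year_effect = 0.17                     # effect of completing high school

# Implied reduced form from each specification (IV estimate times first stage).
rf_via_years = iv_years * fs_years
rf_via_high_school = iv_high_school * fs_high_school

# Ratio of the coarsened IV estimate to the separately identified effect.
inflation = iv_high_school / final_year_effect

print(round(rf_via_years, 3), round(rf_via_high_school, 3), round(inflation, 2))
```

The two implied reduced forms are close to one another at roughly 0.07, and the ratio of the coarsened estimate to the separately identified effect is a little under three.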
5 Conclusion
Both theoretically and empirically, this article demonstrates that coarsening bias should be a ma-
jor concern when utilizing IV methods. Coarsening bias arises where a multi-valued treatment is
transformed into a binary indicator or short scale. Except in special cases that are likely to be rare
in applied empirical research, such coarsening subtly violates the exclusion restriction requiring
that the instrument only affects the outcome through the measured treatment. As demonstrated in
the case of high school’s long-run effects on vote choice in Britain, the upward bias can be consid-
erable. I show that using a binary indicator for completing high school over-estimated the causal
effect of the final year of high school on voting Conservative by nearly three times. The extent of
coarsening bias clearly has important implications for the interpretation of academic findings, but
21 This is because $p_t \approx 0$ for all t other than leaving school at 15 and leaving school at 16.
could also substantially mislead policy-makers seeking to choose the most efficacious policy. Con-
sequently, as IV methods become a standard part of a political scientist’s methodological toolkit,
it is essential that researchers become aware of the risks of coarsening bias.
Coarsening bias may be common in applied research using IV methods. Coarsening an en-
dogenous treatment variable may appear to be theoretically appealing or easier to interpret, to
avoid strong linearity assumptions, or be necessitated by data unavailability. However, in gen-
eral, only when the treatment’s effects are truly discontinuous—in that only a certain level of the
treatment has a causal effect—and the researcher is able to pinpoint the specific threshold where
this causal effect occurs will coarsening produce unbiased estimates of the desired causal quantity.
Otherwise, it is more appropriate to avoid invoking the strong exclusion restriction required to coarsen
an endogenous treatment, and instead consistently estimate the local average per-unit treatment
effect by measuring the treatment intensity using a linear treatment variable. Unlike coarsening,
which induces bias whenever the true causal response function is not discontinuous, this approach
provides consistent causal estimates even when the underlying causal response function is not linear.
Provided that the instrument affects at least two intensities of the treatment, this approach is also
robust to unobserved treatment categories.
References
Acemoglu, Daron, Simon Johnson and James A. Robinson. 2001. “The Colonial Origins of Com-
parative Development: An Empirical Investigation.” American Economic Review 91(5):1369–
1401.
Angrist, Joshua D. and Alan B. Krueger. 1992. “The Effect of Age at School Entry on Educational
Attainment: An Application of Instrumental Variables with Moments from Two Samples.” Jour-
nal of the American Statistical Association 87(418):328–336.
Angrist, Joshua D. and Guido W. Imbens. 1995. “Two-Stage Least Squares Estimation of Average
Causal Effects in Models With Variable Treatment Intensity.” Journal of the American Statistical
Association 90(430):431–442.
Angrist, Joshua D., Guido W. Imbens and Donald B. Rubin. 1996. “Identification of Causal Effects
Using Instrumental Variables.” Journal of the American Statistical Association 91(June):444–
455.
Angrist, Joshua D. and Jörn-Steffen Pischke. 2008. Mostly Harmless Econometrics: An Empiricist’s
Companion. Princeton, NJ: Princeton University Press.
Becker, Gary S. 1993. Human Capital: A Theoretical and Empirical Analysis, with Special Refer-
ence to Education. University of Chicago Press.
Blundell, Richard, Xiaohong Chen and Dennis Kristensen. 2007. “Semi-Nonparametric IV Esti-
mation of Shape-Invariant Engel Curves.” Econometrica 75(6):1613–1669.
Büthe, Tim and Helen V. Milner. 2008. “The politics of foreign direct investment into developing
countries: increasing FDI through international trade agreements?” American Journal of
Political Science 52(4):741–762.
Clark, Damon and Heather Royer. 2013. “The Effect of Education on Adult Mortality and Health:
Evidence from Britain.” American Economic Review 103(6):2087–2120.
Franklin, Charles H. 1989. “Estimation across data sets: two-stage auxiliary instrumental variables
estimation (2SAIV).” Political Analysis 1(1):1–23.
Gerber, Alan. 1998. “Estimating the Effect of Campaign Spending on Senate Election Outcomes
Using Instrumental Variables.” American Political Science Review 92(2):401–411.
Gerber, Alan S. and Donald P. Green. 2000. “The Effects of Canvassing, Telephone Calls, and Di-
rect Mail on Voter Turnout: A Field Experiment.” American Political Science Review 94(3):653–
663.
Gerber, Alan S., Gregory A. Huber and Ebonya Washington. 2010. “Party affiliation, partisanship,
and political beliefs: A field experiment.” American Political Science Review 104(4):720–744.
Gillard, Derek. 2011. “Education in England: A Brief History.” Web link.
Hahn, Jinyong, Petra Todd and Wilbert Van der Klaauw. 2001. “Identification and estimation of
treatment effects with a regression-discontinuity design.” Econometrica 69(1):201–209.
Imbens, Guido W. and Karthik Kalyanaraman. 2012. “Optimal bandwidth choice for the regression
discontinuity estimator.” Review of Economic Studies 79(3):933–959.
Kam, Cindy D. and Carl L. Palmer. 2008. “Reconsidering the Effects of Education on Political
Participation.” Journal of Politics 70(3):612–631.
Marshall, John. 2015. “Education and voting Conservative: Evidence from a major schooling
reform in Great Britain.” Working paper.
McCrary, Justin. 2008. “Manipulation of the running variable in the regression discontinuity de-
sign: A density test.” Journal of Econometrics 142(2):698–714.
Meltzer, Allan H. and Scott F. Richard. 1981. “A rational theory of the size of government.”
Journal of Political Economy 89:914–927.
Milligan, Kevin, Enrico Moretti and Philip Oreopoulos. 2004. “Does education improve citizen-
ship? Evidence from the United States and the United Kingdom.” Journal of Public Economics
88:1667–1695.
Newey, Whitney K. and James L. Powell. 2003. “Instrumental variable estimation of nonparamet-
ric models.” Econometrica 71(5):1565–1578.
Office for National Statistics. 2013. “Vital Statistics: Population and Health Reference Tables—
Annual Time Series Data.” Web link.
Oreopoulos, Philip. 2006. “Estimating Average and Local Average Treatment Effects of Education
when Compulsory Schooling Laws Really Matter.” American Economic Review 96(1):152–175.
Pierskalla, Jan H. and Florian M. Hollenbach. 2013. “Technology and collective action: The
effect of cell phone coverage on political violence in Africa.” American Political Science Review
107(2):207–224.
Sondheimer, Rachel M. and Donald P. Green. 2010. “Using Experiments to Estimate the Effects
of Education on Voter Turnout.” American Journal of Political Science 54(1):174–189.
Sovey, Allison J. and Donald P. Green. 2011. “Instrumental variables estimation in political sci-
ence: A readers’ guide.” American Journal of Political Science 55(1):188–200.
Staiger, Douglas and James H. Stock. 1997. “Instrumental Variables Regression with Weak Instru-
ments.” Econometrica 65(3):557–586.
Woodin, Tom, Gary McCulloch and Steven Cowan. 2013. “Raising the participation age in
historical perspective: policy learning from the past?” British Educational Research Journal
39(4):635–653.
Woodin, Tom, Gary McCulloch and Steven Cowan. 2013. Secondary Education and the Raising
of the School Leaving Age: Coming of Age? New York: Palgrave Macmillan.
6 Appendix
6.1 Proofs from main article
Proof of Proposition 1. Assumption A1 allows us to write the potential outcomes of $Y_i$ as
$Y_i(Z_i, T_i)$ and the potential outcomes of $T_i$ as $T_i(Z_i)$. Without loss of generality, let A4
hold such that $T_i(1) - T_i(0) \geq 0$. (All results also hold for $T_i(0) - T_i(1) \geq 0$, where
$\beta_t$ and $p_t$ are respectively redefined as
$\beta_t \equiv E[Y_i(t) - Y_i(t-1) \mid T_i(0) \geq t > T_i(1)]$ and
$p_t \equiv \Pr(T_i(0) \geq t > T_i(1))$.) The reduced form can be written as:
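\begin{align*}
E[Y_i \mid Z_i = 1] - E[Y_i \mid Z_i = 0]
  &= E[Y_i(1, T_i(1)) - Y_i(0, T_i(0))] \\
  &= E\left[ \sum_{t=2}^{J} I(T_i(1) \geq t)[Y_i(1,t) - Y_i(1,t-1)]
       - \sum_{t=2}^{J} I(T_i(0) \geq t)[Y_i(0,t) - Y_i(0,t-1)] \right] \\
  &= E\left[ \sum_{t=2}^{J} [I(T_i(1) \geq t) - I(T_i(0) \geq t)][Y_i(t) - Y_i(t-1)] \right] \\
  &= \sum_{t=2}^{J} \Pr[T_i(1) \geq t > T_i(0)] \, E[Y_i(t) - Y_i(t-1) \mid T_i(1) \geq t > T_i(0)] \\
  &= \sum_{t=2}^{J} p_t \beta_t,
\end{align*}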
where the first line uses random assignment (A2), the third uses A5 to re-write
$Y_i(z,t) = Y_i(z',t) \equiv Y_i(t)$, and the fourth line uses A4 (which implies that
$I(T_i(1) \geq t) - I(T_i(0) \geq t)$ is either 0 or 1).
In terms of potential outcomes, the coarsening treatment indicator is given by
$D_{ik}(T_i(Z_i = z) = t)$; because $D_{ik}$ is a deterministic assignment from $t$ to 0 or 1, $z$
cannot affect $D_{ik}$ other than through $t$. Using assumption A2, the first stage
for the coarsening indicator Dik is given by:
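\begin{align*}
E[D_{ik} \mid Z_i = 1] - E[D_{ik} \mid Z_i = 0]
  &= E\left[ \sum_{t=2}^{J} [D_{ik}(t) - D_{ik}(t-1)] I(T_i(1) \geq t)
       - \sum_{t=2}^{J} I(T_i(0) \geq t)[D_{ik}(t) - D_{ik}(t-1)] \right] \\
  &= E\left[ \sum_{t=2}^{J} [I(T_i(1) \geq t) - I(T_i(0) \geq t)][D_{ik}(t) - D_{ik}(t-1)] \right] \\
  &= E\left[ I(T_i(1) \geq k) - I(T_i(0) \geq k) \right] \\
  &= \Pr(T_i(1) \geq k > T_i(0)) \\
  &= p_k,
\end{align*}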
where the third line follows from the fact that, by definition of Dik(t), only Dik(k)−Dik(k−1) = 1
(all other jumps are 0), while the fourth line follows from A4 (as above).
Combining the first stage with the reduced form, the Wald IV estimator of the LATE for Dik is:
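\begin{align*}
\beta^{IV}_k
  &= \frac{E[Y_i \mid Z_i = 1] - E[Y_i \mid Z_i = 0]}{E[D_{ik} \mid Z_i = 1] - E[D_{ik} \mid Z_i = 0]} \\
  &= \frac{\sum_{t=2}^{J} p_t \beta_t}{p_k} \\
  &= \beta_k + \frac{\sum_{t=2, t \neq k}^{J} p_t \beta_t}{p_k},
\end{align*}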
where the final line follows from simple algebra, and the final term is the additional bias of the
estimator (beyond finite sample bias). Assumption A3 ensures that $\beta^{IV}_k$ is well-defined, by
preventing the denominator from equalling zero.
It is immediately clear that the additional bias term is positive whenever
$\sum_{t=2, t \neq k}^{J} p_t \beta_t > 0$, given that A4 ensures $p_t \geq 0, \forall t$.
$\beta_t > 0$ for all $t$ is thus a sufficient condition for positive bias; the bias is thus upward
in magnitude if $\beta_k > 0$. Similarly for $\sum_{t=2, t \neq k}^{J} p_t \beta_t < 0$ and
$\beta_k < 0$. Consequently, $\mathrm{sign}(\beta_k) = \mathrm{sign}(\beta_t), \forall t$ implies
$|\beta_k| \leq |\beta^{IV}_k|$. □
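As an illustrative sketch (not part of the original analysis), the closed-form result of Proposition 1
can be evaluated directly. The function names and the numerical values of $p_t$ and $\beta_t$ below
are hypothetical and chosen only to show the mechanics of the bias term.

```python
def coarsened_wald(p, beta, k):
    """Population Wald estimand when instrumenting for D_k = 1(T >= k).

    p and beta map each intensity t = 2..J to p_t and beta_t.
    Requires p[k] != 0 (assumption A3).
    """
    reduced_form = sum(p[t] * beta[t] for t in p)  # sum_t p_t * beta_t
    return reduced_form / p[k]                     # divide by the first stage p_k


def coarsening_bias(p, beta, k):
    """Additional bias term: sum over t != k of p_t * beta_t, divided by p_k."""
    return sum(p[t] * beta[t] for t in p if t != k) / p[k]


# Hypothetical first stages and effects for intensities t = 2 and t = 3.
p = {2: 0.4, 3: 0.2}
beta = {2: 0.1, 3: 0.2}

estimate = coarsened_wald(p, beta, k=3)
bias = coarsening_bias(p, beta, k=3)
print(estimate, beta[3] + bias)  # the two printed numbers should agree
```

With all $\beta_t$ sharing the same sign, the estimand exceeds $\beta_k$ in magnitude, as the proof states.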
Proof of Proposition 2. Consistency requires that $\sum_{t=2, t \neq k}^{J} p_t \beta_t = 0$. I now
show that A5* is a sufficient condition. Both A5 and A5* entail that
$[Y_i(z,t) - Y_i(z,t-1)] = [Y_i(t) - Y_i(t-1)]$. Furthermore, the definition of A5* entails that
$[Y_i(t) - Y_i(t-1)] = 0$ for all $t \neq k$ where $p_t \neq 0$. Consequently,
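\begin{align*}
E[Y_i \mid Z_i = 1] - E[Y_i \mid Z_i = 0]
  &= E\left[ \sum_{t=2}^{J} [I(T_i(1) \geq t) - I(T_i(0) \geq t)][Y_i(t) - Y_i(t-1)] \right] \\
  &= E\left[ \sum_{t=2, t \neq k}^{J} [I(T_i(1) \geq t) - I(T_i(0) \geq t)][Y_i(t) - Y_i(t-1)] \right]
     + E\left[ [I(T_i(1) \geq k) - I(T_i(0) \geq k)][Y_i(k) - Y_i(k-1)] \right] \\
  &= 0 + p_k \beta_k,
\end{align*}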
where the first line follows from A5 and the third line requires A5*. Under A5*, it is thus clear
that the Wald estimator then yields:
\[
\beta^{IV}_k
  = \frac{E[Y_i \mid Z_i = 1] - E[Y_i \mid Z_i = 0]}{E[D_{ik} \mid Z_i = 1] - E[D_{ik} \mid Z_i = 0]}
  = \frac{p_k \beta_k}{p_k} = \beta_k.
\]
Therefore, $\beta^{IV}_k$ is a consistent estimator under A1, A2, A3, A4 and A5* (which implies A5). □
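A small Monte Carlo sketch can illustrate the contrast between Propositions 1 and 2. It is not taken
from the paper's own simulations: the three-level treatment, the share of units that respond to the
instrument, the noise level and the function name `simulate_wald` are all assumptions made for
illustration. By construction $p_2 = p_3 = 0.2$, so the coarsened Wald estimate for $k = 3$ should be
close to $\beta_3 + \beta_2$ in general, and close to $\beta_3$ when $\beta_2 = 0$ (the A5* case).

```python
import random


def simulate_wald(beta, k, n=100_000, seed=0):
    """Wald estimate for D_k = 1(T >= k) in a simulated design.

    beta maps each intensity t in {2, 3} to a homogeneous effect of moving
    from t-1 to t. All design choices here are hypothetical.
    """
    rng = random.Random(seed)
    totals = {0: [0.0, 0.0, 0], 1: [0.0, 0.0, 0]}  # per arm: sum Y, sum D_k, count
    for _ in range(n):
        z = rng.randint(0, 1)                     # random assignment (A2)
        t0 = rng.choice([1, 1, 2, 2, 3])          # intensity without the instrument
        responds = rng.random() < 0.5             # half of units respond to Z
        t1 = min(t0 + 1, 3) if responds else t0   # monotonicity (A4): T(1) >= T(0)
        t = t1 if z == 1 else t0
        # Exclusion restriction (A5): Z affects Y only through T.
        y = sum(beta[s] for s in (2, 3) if t >= s) + rng.gauss(0.0, 0.1)
        arm = totals[z]
        arm[0] += y
        arm[1] += 1 if t >= k else 0
        arm[2] += 1
    mean_y = [totals[z][0] / totals[z][2] for z in (0, 1)]
    mean_d = [totals[z][1] / totals[z][2] for z in (0, 1)]
    return (mean_y[1] - mean_y[0]) / (mean_d[1] - mean_d[0])


biased = simulate_wald({2: 0.1, 3: 0.2}, k=3)      # beta_2 != 0: coarsening bias
consistent = simulate_wald({2: 0.0, 3: 0.2}, k=3)  # beta_2 == 0: A5* holds
print(round(biased, 3), round(consistent, 3))
```

Under these assumptions the first estimate should sit near 0.3 and the second near the true
$\beta_3 = 0.2$, up to simulation noise.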
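Proof of Proposition 3. Note, using the first part of the proof of Proposition 1, that
\[
\beta^{W,J}_{LAPTE} = \frac{\sum_{t=2}^{J} p_t \beta^{J}_t}{\sum_{t=2}^{J} p_t} = \tau
\quad \text{and} \quad
\beta^{W,\alpha J}_{LAPTE} = \frac{\sum_{t=2}^{\alpha J} p_t \beta^{\alpha J}_t}{\sum_{t=2}^{\alpha J} p_t} = \tau/\alpha,
\]
where the linearity of the causal effect at each intensity interval implies
$\alpha \beta^{\alpha J}_t = \beta^{J}_t$. The result follows. □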
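The rescaling in Proposition 3 can be checked numerically with a short sketch. The helper names
`lapte` and `refine` and the values of $p_t$ and $\beta_t$ are hypothetical. The sketch also assumes
that every unit crossing a coarse interval crosses each of its $\alpha$ sub-intervals, so each
sub-interval inherits the coarse interval's $p_t$, and that effects are linear within an interval.

```python
def lapte(p, beta):
    """Weighted average of per-interval effects: sum(p_t * beta_t) / sum(p_t)."""
    return sum(pt * bt for pt, bt in zip(p, beta)) / sum(p)


def refine(p, beta, alpha):
    """Split every interval into `alpha` equal sub-intervals.

    Assumption: each sub-interval keeps the coarse interval's p_t and, by
    linearity, carries 1/alpha of its effect.
    """
    fine_p = [pt for pt in p for _ in range(alpha)]
    fine_beta = [bt / alpha for bt in beta for _ in range(alpha)]
    return fine_p, fine_beta


p, beta = [0.2, 0.1], [0.1, 0.2]   # hypothetical values
tau = lapte(p, beta)
tau_fine = lapte(*refine(p, beta, alpha=2))
print(tau, tau_fine)               # tau_fine should equal tau / 2
```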
6.2 History of school leaving age reforms in Britain
There have been three landmark pieces of legislation in the area of CSLs in the twentieth century.22
First, David Lloyd George’s Liberal government moved on the recommendations of the Lewis Report
1916 to raise the school leaving age from 13 to 14 as part of the post-WW1 reforms under the
Education Act 1918 (or Fisher Act). The Act was ambitious in that it also aimed to institutionalize
schooling until 18 and expand higher education, as well as abolish fees at state-run schools
(although secondary education beyond the age of 14 did not become free until 1944) and establish
a national schooling infrastructure. However, the change in the school leaving age was not
implemented by Lloyd George until the Education Act 1921, coming into effect in 1922. Although the
1918 Act had intended for further increases in the leaving age, these did not transpire for financial
reasons despite repeated attempts in the 1920s and 1930s (Oreopoulos 2006). In practice, this Act
had relatively little effect on school enrollment: most students remained in school until age 14.
Staying in school beyond 14 typically required attending a grammar school (given secondary
education was otherwise limited in supply and poorly subsidized), which entailed significant school
fees and passing an entrance exam. Consequently, the large majority of working class families did
not send their children to secondary school.
22 This brief history borrows from Gillard (2011), Woodin, McCulloch and Cowan (2013, 2013) and
the relevant legislative documents.
Second, as part of the Beveridge reforms, Churchill’s wartime coalition government passed
the Education Act 1944 (or Butler Act), which increased the school leaving age from 14 to 15 in
England and Wales;23 the Education (Scotland) Act 1945 cemented the same reform in Scotland.
No such reform occurred in Northern Ireland until 1957, which is not included in the BES surveys.
The new leaving age did not come into force until 1st April 1947, giving the education system
time to expand its operations to accommodate the changes in the system (as well as many other
new provisions under the 116-page monolith).24 However, the reform also established the new
Tripartite system (coming into force in 1945), which meant that in most parts of the country formal
secondary education began at 11 (rather than 14) and whether students attended a grammar,
secondary technical or secondary modern school was determined by the “eleven plus” examination
taken at age 11.25 As shown in the main paper, although fees had already been removed in 1944
and the new Tripartite system adopted in 1945, it was not until the school leaving age was raised
that the dramatic increase in enrollment occurred (see also Oreopoulos 2006).
23 The Education Act 1936 had determined that the age should be raised in 1939, but this did not
occur because of the onset of WW2.
24 The lack of teachers was a serious concern, requiring an emergency training program in 1945
to address the lack of capacity.
Figure 7: 1972 compulsory schooling reform and student leaving age by cohort
[Plot omitted. Y-axis: proportion leaving (0 to .8); x-axis: cohort, year aged 15 (1950 to 1995);
series: leave before 14, leave before 15, leave before 16, leave before 17.]
Notes: Data from the BES. Curves represent fourth-order polynomial fits. Grey dots are birth-year
cohort averages, and their size reflects their weight in the sample.
The Education Act 1944 also provided for raising the leaving age to 16 once practical.
Consequently the leaving age could be raised to 16 by an Order of Council.26 Conservative Prime
Minister Harold Macmillan presided over plans to raise the school leaving age to 16 in the
Education Act 1962, which ultimately fixed spring and summer leaving dates, although it was
Conservative Edward Heath who finalized the update to the current system under Statutory Instrument
444 (1972). The new rule, which had been overseen by Margaret Thatcher and heavily pushed by
the Crowther Report 1959, was implemented for the academic year starting 1st September 1972 in
England and Wales. Statutory Instrument 59 (1972) raised the leaving age more flexibly in Scotland
to let local authorities, who were very concerned about teacher shortages (especially in
Strathclyde/Glasgow), permit part-time schooling and early leaving in the summer terms.
Consequently, the 1972 reform was relatively weak for many Scottish students. The reform in Scotland
was not fully implemented until the Education Act 1976. The Education (School-leaving Dates)
Act 1976 introduced slightly more subtle leaving age rules, which are not utilized in this paper
as they require monthly birth data (as in Clark and Royer 2013). In England, Wales and Scotland
these reforms again raised education participation rates, although less dramatically than the 1944
Act (Milligan, Moretti and Oreopoulos 2004). Because the reform also introduced middle schools
in some parts of the country, which meant that students entered secondary school one year later,
the secondary population remained relatively steady. Nevertheless, as Figure 7 indicates, the
proportion of students staying in school until age 16 increased substantially. However, this did
not carry over to higher levels of education: the plot clearly indicates that the proportion of
students leaving at age 17 or higher was not affected by the 1972 reform.
25 The Tripartite system was abolished in England in 1976 by the Labour government, and was
replaced by the comprehensive system which did not bifurcate students at age 11.
26 An Order of Council does not require approval like an Act. It may be laid before the House of
Commons and is accepted unless a resolution is passed against it.
Although the 1947 and especially 1972 reforms were implemented by Conservatives, Labour
had consistently pushed for the increases in the leaving age (Gillard 2011). This suggests that the
reforms were not politically motivated. In practice, the reforms were slow to occur, despite persis-
tent campaigning, because they posed severe infrastructural challenges and entailed considerable
cost (Woodin, McCulloch and Cowan 2013).
In 2008, Labour Prime Minister Gordon Brown passed the Education and Skills Act 2008. This
requires that by 2013 young people must remain in at least part-time education or training until age
17; by 2015, this rises to 18. Although regional implementation will vary, the Act applies across
the entire United Kingdom.
6.3 BES survey data
All variables are from the British Election Survey (BES). The BES uses a multi-stage design,
randomly selecting postal addresses from several wards from randomly sampled constituencies
(stratifying by region). The BES has been conducted following every general election since 1964,
although I only use the surveys since 1979 where appropriate variables are available. As noted in
the main paper, the sample is restricted to working age respondents (i.e. aged below 70), and
those aged at least 18 at the time of the survey. The sample is restricted to those aged below 70 given the
likelihood that education affects political behavior through earned income (see Marshall 2015).
Summary statistics for the RD and full samples are provided in Table 2.
• Vote Conservative. Indicator coded one for respondents who reported voting for the
Conservative party at the last general election. Only respondents who refused to respond, did not
answer or did not vote were excluded.
• Years of schooling. Years of schooling is calculated as the age that the respondent left full
time education minus five (the age at which students start formal schooling). Years of schooling
is top-coded at 13 years to ensure comparability and focus on state-provided education.
Indicators for 10 and 11 years of schooling are defined according to this measure.
• Completed high school. Indicator coded one for respondents who answered that they either: (1)
possess a grade 1 Certificate of Secondary Education (CSE), 5 O-levels at A-C, 5 General
Certificates of Secondary Education (GCSEs) at A*-C or a lower grade on the Scottish
Certificate of Education (SCE); or (2) left school at age 16 or later.
• Birth year. Birth year is estimated by subtracting age at the date of the survey from the year
in which the survey was conducted. I then add 14 for year aged 14. Non-responses were
deleted.
• Post 1947 reform. Indicator coded one for students aged 14 or below in 1947, and aged 15
or above in 1972.
• Post 1972 reform. Indicator coded one for students aged 14 or below in 1972.
• Male. Indicator coded one for respondents identifying as male. Non-responses were deleted.
• Age. Standardized age at the date of the survey.
• Race. Indicators coded one for respondents who respectively identify their ethnicity as white,
black, or Asian (including South Asian ethnicities and Chinese).
• Father manual/unskilled job. Indicator coded one for respondents who answered that their
father had a manual or unskilled job.
• Survey year. Year in which the survey was conducted.
1979 survey did not ask this question, the 1983 survey used a 21-point scale that was
rescaled to 0-10, while the 1974 survey (which is included) asked about just social services.
Respondents that answered “don’t know” were excluded.
Table 2: Summary statistics: RD and full BES samples
                               RD sample                                  Full BES sample
                               Obs.    Mean      Std. dev.  Min.  Max.    Obs.    Mean      Std. dev.  Min.  Max.
Dependent variable
  Vote Conservative            4,820   0.409     0.492      0     1       13,853  0.343     0.475      0     1
Endogenous treatment variables
  Years of schooling           4,820   10.317    1.417      0     13      13,853  10.982    1.436      0     13
  Completed high school        4,820   0.400     0.490      0     1       13,853  0.642     0.480      0     1
Excluded instruments
  Post 1947 reform             4,820   0.589     0.492      0     1       13,853  0.493     0.500      0     1
  Post 1972 reform             4,820   0.000     0.000      0     0       13,853  0.314     0.464      0     1
Pre-treatment covariates
  Male                         4,820   0.474     0.499      0     1       13,853  0.466     0.499      0     1
  White                        4,820   0.984     0.126      0     1       13,853  0.965     0.183      0     1
  Black                        4,820   0.006     0.079      0     1       13,853  0.009     0.094      0     1
  Asian                        4,820   0.007     0.084      0     1       13,853  0.016     0.127      0     1
  Age                          4,820   54.732    8.323      35    69      13,853  43.365    14.073     18    69
  Father manual/unskilled job  3,820   0.724     0.447      0     1       9,922   0.672     0.470      0     1
  Survey                       4,820   1988.912  7.342      1979  2010    13,853  1991.697  8.926      1979  2010
  Birth year                   4,820   1934.180  6.465      1922  1944    13,853  1948.332  16.153     1910  1992
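The schooling and cohort definitions listed above can be sketched in code. This is an illustration
of the stated coding rules only, not the paper's replication code; the function names are
hypothetical, and the reform indicators are written in terms of the year a respondent was aged 14,
which is one reading of the definitions above.

```python
def years_of_schooling(age_left_education, start_age=5, top_code=13):
    """Age left full-time education minus five, top-coded at 13 years."""
    return min(age_left_education - start_age, top_code)


def year_aged_14(survey_year, age_at_survey):
    """Estimated birth year (survey year minus age) plus 14."""
    return survey_year - age_at_survey + 14


def post_1947_reform(cohort):
    """One if aged 14 or below in 1947 and aged 15 or above in 1972."""
    return int(1947 <= cohort <= 1971)


def post_1972_reform(cohort):
    """One if aged 14 or below in 1972."""
    return int(cohort >= 1972)


# Hypothetical respondent: surveyed in 1987 at age 40, left school at 15.
cohort = year_aged_14(1987, 40)
schooling = years_of_schooling(15)
print(cohort, schooling, post_1947_reform(cohort), post_1972_reform(cohort))
```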
6.4 Continuity across the discontinuity
The key “sorting” concern for RD designs is that another key variable simultaneously changes at
the discontinuity. However, selection into cohorts in Britain is implausible since parents could
not have precisely predicted CSL reforms over a decade in advance. The lack of heaping around
the 1947 reform is confirmed by annual birth rate data (Office for National Statistics 2013).
Furthermore, a McCrary (2008) density test, shown in Figure 8, confirms that the density of the
data is indistinguishable across the reform discontinuities.27
Figure 8: McCrary (2008) sorting test
[Plot omitted. Y-axis: density (0 to .025); x-axis: cohort, year aged 14 (1920 to 2010).]
Table 3 confirms that pre-treatment variables do not differ significantly across the 1947 reform.
In no case is the coefficient on post 1947 reform close to being statistically significant. In each
case I use the bandwidth used for the main results presented in the main paper.
27 The null hypothesis of no difference in density across the discontinuity is far from rejected.
Table 3: Continuity in other variables around the 1947 reform
                    Male      White     Black     Asian     Father manual/
                                                            unskilled job
                    LLR       LLR       LLR       LLR       LLR
                    (1)       (2)       (3)       (4)       (5)
Post 1947 reform    0.012     0.001     0.005     -0.004    -0.030
                    (0.032)   (0.008)   (0.006)   (0.005)   (0.030)
Observations        4,820     4,820     4,820     4,820     3,820
Notes: All specifications are local linear regressions using a triangular kernel and a bandwidth of
11.514. Robust standard errors in parentheses. * denotes p < 0.1, ** denotes p < 0.05, *** denotes
p < 0.01.
6.5 Robustness checks
6.5.1 Regression discontinuity specification tests
I now show that the reduced form and IV estimates are robust to various specification concerns.
First, I demonstrate that the results are not sensitive to the regression discontinuity (RD) speci-
fication. Figure 9 shows that the point estimates are stable across bandwidths and the choice of
triangular or rectangular kernel, although the precision of the estimates unsurprisingly declines for
very small bandwidths.
Second, Table 4 shows that the results are robust to including quadratic or cubic polynomial
trends in the running variable (birth year) either side of the discontinuity. Since I change the order
of the polynomial, I also allow the Imbens and Kalyanaraman (2012) optimal bandwidth to adjust
accordingly. Unsurprisingly, the estimates become less precise as higher-order polynomials are
included.
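The local linear regression with a triangular kernel referred to in these checks can be sketched as
follows. This is a minimal pure-Python illustration on synthetic, noise-free data, not the
estimation code behind the reported results: the function names and the simulated jumps are
assumptions, and only the 1947 cutoff and the 11.514 bandwidth are taken from the text.

```python
def triangular_weight(distance, bandwidth):
    """Triangular kernel: weight falls linearly to zero at the bandwidth."""
    return max(0.0, 1.0 - abs(distance) / bandwidth)


def weighted_intercept(points):
    """Intercept of a weighted least-squares line through (x, y, w) triples."""
    sw = sum(w for _, _, w in points)
    swx = sum(w * x for x, _, w in points)
    swy = sum(w * y for _, y, w in points)
    swxx = sum(w * x * x for x, _, w in points)
    swxy = sum(w * x * y for x, y, w in points)
    slope = (sw * swxy - swx * swy) / (sw * swxx - swx ** 2)
    return (swy - slope * swx) / sw


def rd_jump(running, outcome, cutoff, bandwidth):
    """Local linear estimate of the jump in `outcome` at `cutoff`,
    fitting separate weighted lines on each side of the cutoff."""
    left, right = [], []
    for r, y in zip(running, outcome):
        x = r - cutoff
        w = triangular_weight(x, bandwidth)
        if w > 0:
            (right if x >= 0 else left).append((x, y, w))
    return weighted_intercept(right) - weighted_intercept(left)


# Synthetic cohorts (year aged 14) with made-up jumps at the 1947 cutoff.
cohorts = list(range(1930, 1965))
treated = [1 if c >= 1947 else 0 for c in cohorts]
schooling = [9.5 + 0.02 * (c - 1947) + 0.4 * d for c, d in zip(cohorts, treated)]
vote = [0.30 + 0.001 * (c - 1947) + 0.06 * d for c, d in zip(cohorts, treated)]

reduced_form = rd_jump(cohorts, vote, cutoff=1947, bandwidth=11.514)
first_stage = rd_jump(cohorts, schooling, cutoff=1947, bandwidth=11.514)
iv_estimate = reduced_form / first_stage  # fuzzy RD: ratio of the two jumps
print(reduced_form, first_stage, iv_estimate)
```

Re-running `rd_jump` over a grid of bandwidths, or swapping the kernel for a rectangular one,
would mirror the kind of sensitivity check described above.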
Finally, I examine placebo tests treating each of the ten years prior to the 1947 reform as
a placebo reform. Figure 10 presents the results of placebo reforms. Each bar is the estimate
associated with treating the corresponding year as the 1947 reform. In no prior year, including
those immediately before the reform, do we observe any evidence of an increase in Conservative
voting. This provides strong support for the claim that the results are not being driven by
pre-trends.
Figure 9: Robustness to choice of bandwidth and kernel
[Plots omitted. Six panels, each plotted against bandwidth (2 to 20): Effect of 1947 reform
(triangle); Effect of 1947 reform (rectangular); Effect of years of schooling (triangle); Effect of
years of schooling (rectangular); Effect of completing high school (triangle); Effect of completing
high school (rectangular).]
Notes: Triangle and rectangular denote the choice of kernel used for the specifications in each plot.
Bars represent 95% confidence intervals.
Figure 10: Effect of placebo reforms on Conservative voting
[Plot omitted. Y-axis: effect of placebo reform on Vote Conservative; x-axis: placebo reform year
(1937 to 1946).]
Notes: All specifications are local linear regressions using a triangular kernel and the Imbens and
Kalyanaraman (2012) optimal bandwidth for each placebo reform. Bars represent robust 95%
confidence intervals.
Table 4: Robustness to higher-order polynomials in the running variable
                                    Vote Con.   Vote Con.   Vote Con.   Vote Con.
                                    LLR IV      LLR IV      LLR IV      LLR IV
                                    (1)         (2)         (3)         (4)
Years of schooling                  0.144**     0.158*
                                    (0.071)     (0.092)
Completed high school                                       0.514**     0.552*
                                                            (0.253)     (0.319)
Running variable polynomial order   quadratic   cubic       quadratic   cubic
Optimal bandwidth                   15.580      20.052      15.580      20.052
Observations                        6,515       8,325       6,515       8,325
First stage F statistic             20.5        12.3        17.6        11.4
Notes: All specifications are local linear regressions using a triangular kernel. Robust standard
errors in parentheses. * denotes p < 0.1, ** denotes p < 0.05, *** denotes p < 0.01.
6.5.2 Exclusion restriction (weak)
Second, the weak exclusion restriction is violated if the 1947 reform affected political preferences
through channels other than schooling. Although political or cultural changes are unlikely to differ-
entially affect cohorts one year apart, it is possible that an additional year in school could affect life
choices—such as marriage or having children—by simply keeping students in school, but without
operating through schooling itself. Figure 11 examines these possibilities using data from Labour
Force Surveys (for the election years in the BES sample). The graphs indicate that around nei-
ther the 1947 nor 1972 reform (used to confirm coarsening bias) was there a clear change in how
cohorts born just before and after the reforms tend to have children or get married. In particular,
whether the respondent has dependent children (defined by being under the age of 19), the age
of their eldest dependent child, their number of dependent children, and having once been married
are all effectively continuous through the reforms. These results indicate that simply remaining in
school did not substantially alter other life choices proximate to the decision to remain in school.
Figure 11: Exclusion restriction checks
[Plots omitted. Four panels, each plotted against cohort (year aged 14, 1940 to 2000): Dependent
child (proportion); Age of eldest dependent child (average age); Number of dependent children
(average number of children); Once married (proportion).]
Notes: All data are from the Labour Force Surveys associated with the election years in the BES
sample. Dependent children are those under the age of 19.
An alternative concern relates to the possibility that other concurrent changes differentially
affected students aged 14 before and after 1947. However, the most significant post-war changes
in the education system had already been implemented before 1947. Fees for secondary schooling
were removed in 1944, while the new Tripartite system—which formally established three types of
secondary school emphasizing academic, scientific and practical skills—came into force in 1945.
However, as Figure 5 indicates, these structural reforms did not affect enrollment (Oreopoulos
2006). Furthermore, prior to the 1947 reform, the government pre-emptively engaged in a major
expansion effort to maintain quality by increasing the number of teachers, buildings and classroom
materials (Woodin, McCulloch and Cowan 2013). The additional year of schooling was primarily
intended to ensure that students grasped the material they had previously been taught (Clark and
Royer 2013). Nevertheless, any reduction in schooling quality or spillover causing older cohorts
to behave more like treated cohorts would reduce between-cohort differences around the reforms,
and thus downwardly bias the estimates.
6.6 Using the 1972 school leaving reform to identify coarsening bias
As noted in the main text, the availability of two instruments offers the opportunity to precisely
estimate the extent of coarsening bias. In particular, the availability of two instruments means that
it is possible to instrument for two treatment variables—in this case completing the penultimate
year of high school (i.e. Penultimateic = 1(Schoolingic = 10)) and completing at least the final
year of high school (as defined in the main paper). Since Figure 5 in the main paper and Figure
7 respectively show that the 1947 and 1972 reforms did not affect students leaving at older or
younger ages, the reforms only affected whether students remained in school for the penultimate
or final year of high school. Consequently, the first stage at all other levels of the treatment is zero
(i.e. pt = 0). Therefore, instrumenting for two indicator variables—completing the penultimate
and final years of high school—does not suffer from coarsening bias because the special case
that pt = 0 at all other t is satisfied. This enables me to estimate the causal effect βt for both
additional years of high school, and thus to exactly identify the effect of completing high school.
This latter quantity can then be compared to the estimate for completing high school in the main
paper to gauge the extent of coarsening bias.
To simultaneously estimate the effects of the penultimate and final year of high school, I use
2SLS to estimate the following structural equation:
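\begin{equation*}
\text{Vote Conservative}_{ic} = \beta_1 \text{Penultimate}_{ic} + \beta_2 \text{Completed high school}_{ic}
  + f(\text{Birth year}_c) + \varepsilon_{ic}, \tag{14}
\end{equation*}
where the first stage regressions generating exogenous variation in educational attainment are given
by:
\begin{align*}
\text{Penultimate}_{ic} &= \alpha_1 \text{Post 1947 reform}_c + \alpha_2 \text{Post 1972 reform}_c
  + f(\text{Birth year}_c) + \varepsilon_{ic} \tag{15} \\
\text{Completed high school}_{ic} &= \gamma_1 \text{Post 1947 reform}_c + \gamma_2 \text{Post 1972 reform}_c
  + f(\text{Birth year}_c) + \varepsilon_{ic}. \tag{16}
\end{align*}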
Because I now use two discontinuities as instruments, this system cannot be estimated using local
linear regression, as in the main paper. However, I adopt a similar approach by using the full
BES sample and letting f include cubic global polynomials in the running variable (birth year). I
use flexible polynomials in order to capture general trends in Conservative support across cohorts
around both reforms, and thus mimic the local linear approach in the main paper.28
28 Encouragingly, column (1) of Table 5 shows that the cubic polynomials report a similar estimate
for the 1947 reform as the local linear regression approach in the main paper. Nevertheless, I
found very similar results when using fourth, fifth, sixth and seventh-order polynomials.
Table 5 reports the reduced form and IV estimates when exploiting both reforms simultaneously.
Column (1) presents the reduced form estimates, and demonstrates that both reforms significantly
increase the probability that an individual votes Conservative. Consistent with the 1947
reform having a larger effect on education, the 1947 reform also has a larger effect on Conservative
voting: the 1947 reform increased Conservative voting per cohort by 6 percentage points, while the
1972 reform added a further two percentage points. Column (2) estimates the LAPTE for an additional
year of schooling by using the 1947 and 1972 reforms as instruments for years of schooling.
Similarly to the main paper, which reports a 12 percentage point increase, the estimates suggest that
an additional year increases the probability of voting Conservative later in life by 13 percentage
points. Furthermore, a Sargan overidentification $\chi^2_1$ test fails to reject the null hypothesis
that the instruments produce the same IV estimate (p = 0.70). This provides evidence that both
reforms had similar effects on voters, and again suggests that an additional year of compulsory
education substantially increases the probability of voting Conservative. The similarity of the
effects means that despite using two different instruments, I am able to plausibly use both
instruments to separate out the effects of the penultimate and final years of schooling.
Table 5: Using the 1972 reform to identify the extent of coarsening bias
                                   Vote Con.   Vote Con.   Vote Con.
                                   OLS         2SLS        2SLS
                                   (1)         (2)         (3)
Post 1947 reform                   0.059***
                                   (0.021)
Post 1972 reform                   0.079***
                                   (0.030)
Years of schooling                             0.130***
                                               (0.046)
Penultimate year of high school                            0.098**
                                                           (0.039)
Completed high school                                      0.269**
                                                           (0.124)
Observations                       13,853      13,853      13,853
First stage F statistic                        28.5        658.4/52.1
Notes: All specifications include standardized cubic polynomials in birth year. Robust standard
errors in parentheses (which are always larger than errors clustered by cohort). * denotes p < 0.1,
** denotes p < 0.05, *** denotes p < 0.01.
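The logic of instrumenting for two treatment indicators with two reform dummies can be illustrated
with a small simulation. This sketch is not the paper's estimation code. It omits the polynomial
f(Birth year), uses a continuous outcome, and assumes a hypothetical data-generating process in
which an unobserved trait `u` drives both schooling and the outcome. The function names and the
"true" effects (0.10 and 0.25) are likewise made up. In an exactly identified system such as this
one, 2SLS reduces to $(Z'X)^{-1}Z'y$.

```python
import random


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def just_identified_iv(regressors, instruments, outcome):
    """IV estimate (Z'X)^{-1} Z'y for an exactly identified system."""
    k = len(regressors[0])
    zx = [[sum(z[i] * x[j] for z, x in zip(instruments, regressors))
           for j in range(k)] for i in range(k)]
    zy = [sum(z[i] * y for z, y in zip(instruments, outcome)) for i in range(k)]
    return solve(zx, zy)


rng = random.Random(1)
X, Z, Y = [], [], []
for _ in range(60_000):
    group = rng.randrange(3)                  # 0: pre-1947, 1: post-1947, 2: post-1972
    u = rng.gauss(0.0, 1.0)                   # unobserved confounder
    minimum = {0: 9, 1: 10, 2: 11}[group]     # years of schooling required by law
    chosen = 9 if u < 0 else (10 if u < 1 else 11)
    years = max(minimum, chosen)
    penultimate, completed = int(years == 10), int(years >= 11)
    y = 0.10 * penultimate + 0.25 * completed + 0.3 * u + rng.gauss(0.0, 0.1)
    X.append([1.0, penultimate, completed])
    Z.append([1.0, int(group == 1), int(group == 2)])
    Y.append(y)

intercept, b_penultimate, b_completed = just_identified_iv(X, Z, Y)
print(round(b_penultimate, 3), round(b_completed, 3))
```

Under these assumptions both estimates should land close to the simulated effects, whereas a naive
regression of `y` on the two indicators would be contaminated by `u`.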
I now turn to the key element of this exercise: determining the extent of coarsening bias. Col-
umn (3) estimates equation (14), and reports the estimates for completing the penultimate and final
year of high school. The results indicate that the penultimate year of high school increases Conser-
vative voting by 10 percentage points, while completing high school adds a further 17 percentage
points. This implies that the effect of completing high school, over completing a lower level of
schooling, is to increase the probability of voting Conservative by 17 percentage points. Therefore,
this demonstrates that coarsening bias almost triples the size of this estimate (46 percentage points
in the main paper). Although the coefficients come from different samples, plugging the estimates
for βt and pt into equation (4) in the main paper yields an IV estimate of around 0.5. This is similar
to the 0.46 estimate in the main paper.
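The arithmetic behind the "almost triples" comparison can be laid out explicitly. The sketch below
uses only the coefficients reported in column (3) of Table 5 and the 0.46 estimate quoted from the
main paper. Reading the "further 17 percentage points" as the difference between the two
coefficients is an interpretation, and the variable names are hypothetical.

```python
penultimate = 0.098         # Table 5, column (3): penultimate year of high school
completed = 0.269           # Table 5, column (3): completed high school
coarsened_estimate = 0.46   # estimate for completing high school quoted from the main paper

# Effect of the final year over and above the penultimate year.
final_year_effect = completed - penultimate

# How many times larger the coarsened estimate is than this effect.
inflation = coarsened_estimate / final_year_effect

print(round(final_year_effect, 3), round(inflation, 2))
```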