1-1
NBER Summer Institute 2018
Methods Lectures
Weak Instruments and What To Do About Them
Isaiah Andrews, Harvard University
James H. Stock, Harvard University
July 22, 2018 (updated July 25, 2018)
3-4:20pm     1. Weak instruments in the wild (Stock)
             2. Detecting weak instruments (Stock)
4:20-4:40pm  Break
4:40-6pm     3. Inference with weak instruments (Andrews)
             4. Open issues and recent research (Andrews)
1-2
Overview and Summary
Topic: IV regression with a single included endogenous regressor, control
variables, and non-homoskedastic errors.
This covers heteroskedasticity, HAC, cluster, etc.
We assume that consistent robust SEs exist for the reduced form & first stage
regressions.
Early literature (through ~2006): homoskedastic case
This mini-course focuses on weak instruments in the non-homoskedastic
case (i.e., the relevant case).
Outline
1) So what?
2) Detecting weak instruments
3) Estimation (brief)
4) Weak instrument-robust inference about parameter of interest (β)
5) Extensions
1-3
So what? (1) Theory
An instrumental variable is weak if its correlation with the included
endogenous regressor is small.
1. “small” depends on the inference problem at hand, and on the sample size
With weak instruments, TSLS is biased towards OLS, and TSLS tests have
the wrong size.
Figure: Distribution of the TSLS t-statistic (Nelson-Startz (1990a,b)). Dark line = irrelevant instruments; dashed light line = strong instruments; intermediate cases = weak instruments.
1-4
So what? (2) Simulation
DGP: 8 AER papers 2014-2018 (Sample: 17 that use IV; 16 with a single X; 8 in simulation sample)
Median of TSLS t-statistic under the null
1-5
So what? (3) Practice (the “in the wild” bit)
Histogram of first-stage Fs in AER papers (108 specifications), 2014-2018
The first-stage F tests the
hypothesis that the first-stage
coefficients are zero.
Of the 17 papers, all but 1 report
first-stage Fs for at least one
specification; the histogram is of
the 108 specifications that report
a first-stage F (72 of which are
<50 and are in the plot).
Great that authors/editors/referees are aware of the potential importance of weak instruments, as evidenced by nearly all papers reporting first-stage Fs.
The spike at F = 10 is “interesting”
1-6
Detecting Weak Instruments
It is convenient to have a way to decide if instruments are strong (TSLS “works”)
or weak (use weak-instrument robust methods).
The standard method is “the” first-stage F. Candidates:
FN – nonrobust
FR – robust (HR, HAC, cluster), also called Kleibergen-Paap (2006)
FE – Effective first-stage F statistic of Montiel Olea and Pflueger (2013)
Actually there are other candidates too, not used and not to be discussed here
including Hahn-Hausman (2002), Shea’s (1997) partial R2
1-7
Detecting weak instruments in practice
Reported first-stage F’s: what authors say they use
Candidates: FN – nonrobust
FR – robust (HR, HAC, cluster), also called Kleibergen-Paap (2006)
FE – Effective first-stage F statistic of Montiel Olea and Pflueger (2013)
1-8
Detecting weak instruments in practice, ctd
Actual first-stage F’s: what authors actually use
Candidates: FN – nonrobust
FR – robust (HR, HAC, cluster), also called Kleibergen-Paap (2006)
FE – Effective first-stage F statistic of Montiel Olea and Pflueger (2013)
1-9
Our recommendations (1 included endogenous regressor)
Do:
o Use the Montiel Olea-Pflueger (2013) effective first-stage F statistic
FEff = FN × correction factor for non-homoskedasticity
o Report FEff
o Compare FEff to MOP critical values (weakivtest.ado), or to 10.
o If FEff ≥ MOP critical value, or ≥ 10 for rule-of-thumb method, use TSLS
inference; else use weak-instrument robust inference.
Don’t
o use/report p-values of test of π = 0 (null of irrelevant instruments)
o use/report nonrobust first stage F (FN)
o use/report usual robust first-stage F (except OK for k = 1 where FR = FEff)
o use/report Kleibergen-Paap (2006) statistic (same thing).
o compare HR/HAC/Kleibergen-Paap to Stock-Yogo critical values
o reject a paper because FEff < 10!
Instead, tell the authors to use weak-IV robust inference.
1-10
Notation and Review of IV Regression
IV regression model with a single endogenous regressor and k instruments
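$Y_i = \beta X_i + \gamma_1' W_i + \varepsilon_i$   (Structural equation) (1)
$X_i = \pi' Z_i + \gamma_2' W_i + V_i$   (First stage) (2)
where W includes the constant. Substitute (2) into (1):
$Y_i = \delta' Z_i + \gamma_3' W_i + U_i$   (Reduced form) (3)
where δ = πβ and $U_i = \varepsilon_i + \beta V_i$.
OLS is in general inconsistent: $\hat\beta^{OLS} \to_p \beta + \sigma_{X\varepsilon}/\sigma_X^2$.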
β can be estimated by IV using the k instruments Z.
By Frisch-Waugh, you can eliminate W by regressing Y, X, Z against W and
using the residuals. This applies to everything we cover in the linear model
so we drop W henceforth.
1-11
Setup:
$Y_i = \beta X_i + \varepsilon_i$   (Structural equation) (1)
$X_i = \pi' Z_i + V_i$   (First stage) (2)
$Y_i = \delta' Z_i + U_i$, δ = πβ, ε = U – βV   (Reduced form) (3)
The two conditions for instrument validity
(i) Relevance: cov(Z,X) ≠ 0 or π ≠ 0 (general k)
(ii) Exogeneity: cov(Z,ε) = 0
The IV estimator when k = 1 (Wright 1926)
$\mathrm{cov}(Z,Y) = \mathrm{cov}(Z, \beta X + \varepsilon) = \beta\,\mathrm{cov}(Z,X) + \mathrm{cov}(Z,\varepsilon) = \beta\,\mathrm{cov}(Z,X)$ by (ii),
so $\beta = \mathrm{cov}(Z,Y)/\mathrm{cov}(Z,X)$, which is well defined by (i).
IV estimator: $\hat\beta^{IV} = \dfrac{\hat\delta}{\hat\pi} = \dfrac{n^{-1}\sum_{i=1}^n Z_i Y_i}{n^{-1}\sum_{i=1}^n Z_i X_i}$
1-12
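k > 1: Two stage least squares (TSLS)
$\hat\beta^{TSLS} = \dfrac{n^{-1}\sum_{i=1}^n \hat X_i Y_i}{n^{-1}\sum_{i=1}^n \hat X_i^2}$, where $\hat X_i$ = predicted value from first stage
$\qquad = \dfrac{\hat\pi'\hat Q_{ZZ}\hat\delta}{\hat\pi'\hat Q_{ZZ}\hat\pi}$, where $\hat Q_{ZZ} = n^{-1}\sum_{i=1}^n Z_i Z_i'$
$\qquad = \dfrac{X'Z(Z'Z)^{-1}Z'Y}{X'Z(Z'Z)^{-1}Z'X}$
The weak instruments problem is a “divide by zero” problem
cov(Z,X) is nearly zero; or π is nearly zero; or
$\hat\pi'\hat Q_{ZZ}\hat\pi$ is noisy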
Weak IV is a subset of weak identification (Stock-Wright 2000, Nelson-Startz 2006, Andrews-Cheng 2012)
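The TSLS expressions on this slide can be checked numerically. The following is a minimal sketch, assuming controls W have already been partialled out and using a simulated design (coefficients, sample size, and the function name `tsls` are illustrative choices, not taken from the lecture).

```python
import numpy as np

def tsls(y, x, Z):
    """TSLS with one endogenous regressor x and an (n, k) instrument matrix Z.

    Uses the ratio form pi' Q_ZZ delta / pi' Q_ZZ pi from the slide.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    Z = np.asarray(Z, dtype=float).reshape(len(x), -1)
    ZtZ = Z.T @ Z
    pi_hat = np.linalg.solve(ZtZ, Z.T @ x)      # first-stage OLS coefficients
    delta_hat = np.linalg.solve(ZtZ, Z.T @ y)   # reduced-form OLS coefficients
    Q_zz = ZtZ / len(x)
    beta = (pi_hat @ Q_zz @ delta_hat) / (pi_hat @ Q_zz @ pi_hat)
    return beta, pi_hat, delta_hat

# Hypothetical strong-instrument design with true beta = 2
rng = np.random.default_rng(0)
n = 5000
Z = rng.normal(size=(n, 2))
V = rng.normal(size=n)
eps = 0.8 * V + 0.6 * rng.normal(size=n)        # cov(eps, V) = 0.8, so X is endogenous
x = Z @ np.array([1.0, 0.5]) + V
y = 2.0 * x + eps

beta_tsls, pi_hat, delta_hat = tsls(y, x, Z)
beta_ols = (x @ y) / (x @ x)

# Same estimator written as X'Z(Z'Z)^{-1}Z'Y / X'Z(Z'Z)^{-1}Z'X
PzX = Z @ np.linalg.solve(Z.T @ Z, Z.T @ x)
beta_proj = (PzX @ y) / (PzX @ x)
```

In this design OLS is pulled away from 2 by roughly σ_Xε/σ_X², while the two TSLS forms coincide.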
1-13
Statistics for measuring instrument strength
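Non-robust: $F^N = \dfrac{n\,\hat\pi'\hat Q_{ZZ}\hat\pi}{k\,\hat\sigma_V^2}$
Robust: $F^R = \dfrac{\hat\pi'\hat\Sigma_{\pi\pi}^{-1}\hat\pi}{k}$
MOP Effective F: $F^{Eff} = \dfrac{\hat\pi'\hat Q_{ZZ}\hat\pi}{\mathrm{tr}(\hat\Sigma_{\pi\pi}^{1/2}\hat Q_{ZZ}\hat\Sigma_{\pi\pi}^{1/2})} = \dfrac{k\,\hat\sigma_V^2}{n\,\mathrm{tr}(\hat\Sigma_{\pi\pi}^{1/2}\hat Q_{ZZ}\hat\Sigma_{\pi\pi}^{1/2})}\,F^N$
(here $\hat\Sigma_{\pi\pi}$ is the robust estimate of the variance of $\hat\pi$)
compare to TSLS: $\hat\beta^{TSLS} = \dfrac{\hat\pi'\hat Q_{ZZ}\hat\delta}{\hat\pi'\hat Q_{ZZ}\hat\pi}$
Intuition
FN measures the right thing ($\pi' Q_{ZZ}\pi$), but gets the SEs wrong
FR measures the wrong thing ($\pi'\Sigma_{\pi\pi}^{-1}\pi$), but gets the SEs right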
FEff measures the right thing and gets SEs right “on average”
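The three statistics can be computed side by side. This is a rough sketch, assuming a heteroskedasticity-robust (White, HC0) variance for the first-stage coefficients, no HAC or cluster adjustment, and instruments from which controls have already been partialled out; `first_stage_Fs` is a hypothetical helper name, not the weakivtest implementation.

```python
import numpy as np

def first_stage_Fs(x, Z):
    """Return (F_N, F_R, F_eff) for the first stage x = Z pi + V."""
    x = np.asarray(x, dtype=float)
    Z = np.asarray(Z, dtype=float).reshape(len(x), -1)
    n, k = Z.shape
    ZtZ = Z.T @ Z
    pi_hat = np.linalg.solve(ZtZ, Z.T @ x)
    v_hat = x - Z @ pi_hat
    Q_zz = ZtZ / n
    sigma2_v = v_hat @ v_hat / (n - k)
    # HC0 variance of pi_hat: (Z'Z)^{-1} [sum_i v_i^2 z_i z_i'] (Z'Z)^{-1}
    meat = (Z * v_hat[:, None] ** 2).T @ Z
    ZtZ_inv = np.linalg.inv(ZtZ)
    Sigma_pp = ZtZ_inv @ meat @ ZtZ_inv
    quad = pi_hat @ Q_zz @ pi_hat
    F_N = n * quad / (k * sigma2_v)
    F_R = pi_hat @ np.linalg.solve(Sigma_pp, pi_hat) / k
    # tr(S^{1/2} Q S^{1/2}) equals tr(S Q) by the cyclic property of the trace
    F_eff = quad / np.trace(Sigma_pp @ Q_zz)
    return F_N, F_R, F_eff

# Hypothetical homoskedastic design: F_N and F_eff should be close
rng = np.random.default_rng(1)
n = 4000
Z = rng.normal(size=(n, 2))
x = Z @ np.array([0.2, 0.1]) + rng.normal(size=n)
F_N, F_R, F_eff = first_stage_Fs(x, Z)

# With a single instrument, F_R and F_eff coincide
F_N1, F_R1, F_eff1 = first_stage_Fs(x, Z[:, :1])
```

Under heteroskedasticity the three numbers can diverge sharply, which is the point of the intuition above.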
1-14
Distributional assumptions
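CLT: $\sqrt{n}\begin{pmatrix}\hat\delta - \delta\\ \hat\pi - \pi\end{pmatrix} \to_d N(0, \Sigma^*)$
Σ* is HR/HAC/Cluster (henceforth, “HR”)
(i) CLT limit holds exactly: $\begin{pmatrix}\hat\delta\\ \hat\pi\end{pmatrix} \sim N\left(\begin{pmatrix}\delta\\ \pi\end{pmatrix}, \Sigma\right)$, where $\Sigma = n^{-1}\Sigma^*$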
(ii) Reduced form variance & moment matrices are all known: Σ, QZZ
A lot is going on here!
HR/HAC/cluster variance estimators are consistent
1950s-1970s finite-sample normal (fixed Z’s) literature
1-15
A lot is going on here, ctd
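From $\sqrt{n}\begin{pmatrix}\hat\delta - \delta\\ \hat\pi - \pi\end{pmatrix} \to_d N(0, \Sigma^*)$
to $\begin{pmatrix}\hat\delta\\ \hat\pi\end{pmatrix} \sim N\left(\begin{pmatrix}\delta\\ \pi\end{pmatrix}, \Sigma\right)$, where $\Sigma = n^{-1}\Sigma^*$
Weak IV asymptotics (Staiger-Stock 1997): $\pi = C/\sqrt{n}$.
$k F^R = \hat\pi'\hat\Sigma_{\pi\pi}^{-1}\hat\pi = \hat\pi'(\hat\Sigma^*_{\pi\pi}/n)^{-1}\hat\pi = (\sqrt{n}\hat\pi)'\hat\Sigma_{\pi\pi}^{*-1}(\sqrt{n}\hat\pi)$
$\qquad = [\sqrt{n}(\hat\pi - \pi) + C]'\,\hat\Sigma_{\pi\pi}^{*-1}\,[\sqrt{n}(\hat\pi - \pi) + C] \to_d \chi^2_{k;\,C'\Sigma_{\pi\pi}^{*-1}C}$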
Limit experiment interpretation (Hirano-Porter 2015)
Uniformity (D. Andrews-Cheng 2012)
1-16
Homework problem
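Let k = 2 and $\hat Q_{ZZ} = I_2$. Suppose $\Sigma = \begin{pmatrix}\sigma_U^2 & \sigma_{UV}\\ \sigma_{UV} & \sigma_V^2\end{pmatrix} \otimes \begin{pmatrix}\omega_1^2 & 0\\ 0 & \omega_2^2\end{pmatrix}/n$.
1) Show that:
a) $\mathrm{tr}(\Sigma_{\pi\pi}^{1/2} Q_{ZZ} \Sigma_{\pi\pi}^{1/2}) = (\omega_1^2 + \omega_2^2)\,\sigma_V^2/n$.
b) $F^N \sim \frac{1}{2}\left(\omega_1^2 z_{\pi,1}^2 + \omega_2^2 z_{\pi,2}^2\right)$
c) $F^R \sim \frac{1}{2}\, z_\pi' z_\pi$
d) $F^{Eff} \sim \dfrac{\omega_1^2 z_{\pi,1}^2 + \omega_2^2 z_{\pi,2}^2}{\omega_1^2 + \omega_2^2}$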
2) Adopt the weak instrument nesting π = n-1/2C, where C1, C2 ≠ 0. Show that as 2 :
a) “bias” of 2ˆ ˆ= plimTSLS OLS
V V
b) pNF c) pRF
d) 2
1
dEffF
3) Discuss
1-17
Work out the details for k = 1 first.
Preliminaries:
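(a) Use distributional assumption (i),
$\begin{pmatrix}\hat\delta\\ \hat\pi\end{pmatrix} \sim N\left(\begin{pmatrix}\delta\\ \pi\end{pmatrix}, \Sigma\right)$, where $\Sigma = n^{-1}\Sigma^*$,
to write
$\begin{pmatrix}\hat\delta\\ \hat\pi\end{pmatrix} = \begin{pmatrix}\delta\\ \pi\end{pmatrix} + \begin{pmatrix}\xi_\delta\\ \xi_\pi\end{pmatrix}$, where $\begin{pmatrix}\xi_\delta\\ \xi_\pi\end{pmatrix} \sim N(0, \Sigma)$
(b) Connect to the structural regression:
$\hat\delta - \hat\pi\beta = \xi_\delta - \beta\xi_\pi \equiv \xi_\varepsilon$, where $\xi_\varepsilon = (Z'Z)^{-1}Z'\varepsilon$
(c) Standardize:
$\begin{pmatrix}\Sigma_{\varepsilon\varepsilon}^{-1/2}(\hat\delta - \hat\pi\beta)\\ \Sigma_{\pi\pi}^{-1/2}\hat\pi\end{pmatrix} \sim \begin{pmatrix} z_\varepsilon\\ \lambda + z_v\end{pmatrix}$, where $\lambda = \Sigma_{\pi\pi}^{-1/2}\pi$ and $\begin{pmatrix} z_\varepsilon\\ z_v\end{pmatrix} \sim N\left(0, \begin{pmatrix}1 & \rho\\ \rho & 1\end{pmatrix}\right)$
(d) Project & orthogonalize:
$z_\varepsilon = \rho z_v + \eta$, where $\eta \sim N(0, 1 - \rho^2)$, $\eta \perp z_v$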
1-18
What parameter governs departures from usual asymptotics (k = 1)?
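$\hat\beta^{IV} = \dfrac{\hat\delta}{\hat\pi} = \beta + \dfrac{\hat\delta - \hat\pi\beta}{\hat\pi}$   (add and subtract $\hat\pi\beta$)
$\hat\beta^{IV} - \beta = \dfrac{\xi_\varepsilon}{\pi + \xi_\pi}$   (use representations in (a) and (b))
$\qquad \sim \left(\dfrac{\Sigma_{\varepsilon\varepsilon}}{\Sigma_{\pi\pi}}\right)^{1/2} \dfrac{z_\varepsilon}{\lambda + z_v}$   (standardize using representation in (c))
$\qquad = \left(\dfrac{\Sigma_{\varepsilon\varepsilon}}{\Sigma_{\pi\pi}}\right)^{1/2}\left[\rho\,\dfrac{z_v}{\lambda + z_v} + \dfrac{\eta}{\lambda + z_v}\right]$   (using projection (d), $z_\varepsilon = \rho z_v + \eta$)
The first term in brackets is the “bias” and the second is the “noise”.
Parameter measuring instrument strength (k = 1) is $\lambda^2 = \pi^2/\Sigma_{\pi\pi}$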
1-19
“Bias” part of IV representation
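$\hat\beta^{IV} - \beta \sim \left(\dfrac{\Sigma_{\varepsilon\varepsilon}}{\Sigma_{\pi\pi}}\right)^{1/2}\dfrac{z_\varepsilon}{\lambda + z_v}$, where $\lambda = \Sigma_{\pi\pi}^{-1/2}\pi$
Instrument strength depends on λ2
Strong instruments: $\lambda^2 \to \infty$, usual asymptotic distribution
Irrelevant instruments: π = 0 so λ = 0: $\hat\beta^{IV} - \beta \sim \left(\dfrac{\Sigma_{\varepsilon\varepsilon}}{\Sigma_{\pi\pi}}\right)^{1/2}\dfrac{z_\varepsilon}{z_v}$ ~ Cauchy centered at $\rho\left(\dfrac{\Sigma_{\varepsilon\varepsilon}}{\Sigma_{\pi\pi}}\right)^{1/2}$
o In homoskedastic case, $\rho\left(\dfrac{\Sigma_{\varepsilon\varepsilon}}{\Sigma_{\pi\pi}}\right)^{1/2} = \dfrac{\sigma_{\varepsilon V}}{\sigma_V^2} = \mathrm{plim}(\hat\beta^{OLS} - \beta)$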
In the homoskedastic case, λ2 = the concentration parameter (old Edgeworth
expansion/finite sample distribution literature)
1-20
Instrument strength, k = 1, ctd.
How big does λ need to be? A “bias” heuristic:
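$E(\hat\beta^{IV} - \beta) \approx \left(\dfrac{\Sigma_{\varepsilon\varepsilon}}{\Sigma_{\pi\pi}}\right)^{1/2} E\left[\dfrac{z_\varepsilon}{\lambda + z_v}\right]$, and
$E\left[\dfrac{z_\varepsilon}{\lambda + z_v}\right] = \rho\,E\left[\dfrac{z_v}{\lambda + z_v}\right] = \rho\,E\left[\dfrac{z_v/\lambda}{1 + z_v/\lambda}\right] = \rho\,E\left[\dfrac{z_v}{\lambda}\left(1 - \dfrac{z_v}{\lambda} + \dots\right)\right] \approx -\dfrac{\rho}{\lambda^2}$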
For bias, relative to unidentified case, to be <0.1, need λ2 > 10.
But we don’t know λ! So, we need a statistic with a distribution that depends
on λ, which we can use to back out an estimate/test/rule of thumb.
This is the Nagar (1959) expansion for the bias
How do the three candidate first-stage Fs fare?
1-21
Distributions of the three first-stage Fs, k = 1
First note that, when k = 1, FR = FEff:
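$F^{Eff} = \dfrac{\hat\pi'\hat Q_{ZZ}\hat\pi}{\mathrm{tr}(\hat\Sigma_{\pi\pi}^{1/2}\hat Q_{ZZ}\hat\Sigma_{\pi\pi}^{1/2})} = \dfrac{\hat\pi^2\hat Q_{ZZ}}{\hat\Sigma_{\pi\pi}\hat Q_{ZZ}} = \dfrac{\hat\pi^2}{\hat\Sigma_{\pi\pi}} = F^R$
Distributions
$F^{Eff} = F^R = \dfrac{\hat\pi^2}{\hat\Sigma_{\pi\pi}} \sim (\lambda + z_v)^2 \sim \chi^2_{1;\lambda^2}$
$F^N = \dfrac{n\,\hat\pi^2\hat Q_{ZZ}}{\hat\sigma_V^2} = \dfrac{n\hat\Sigma_{\pi\pi}\hat Q_{ZZ}}{\hat\sigma_V^2}\cdot\dfrac{\hat\pi^2}{\hat\Sigma_{\pi\pi}} \sim \dfrac{\Sigma^*_{\pi\pi} Q_{ZZ}}{\sigma_V^2}\,(\lambda + z_v)^2$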
Implications
FR, FEff can be used for inference about λ2 when k = 1
Estimation: $E F^{Eff} = E(\lambda + z_v)^2 = \lambda^2 + 1$, so $\hat\lambda^2 = F^{Eff} - 1$
Testing: H0: “bias” ≥ 0.1 (weak instruments). Reject H0 if FEff > critical value.
Rule of thumb: Feff < 10 will detect weak IVs with probability that increases
as λ2 gets smaller
1-22
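Implications, ctd.
$F^N \sim \dfrac{\Sigma^*_{\pi\pi} Q_{ZZ}}{\sigma_V^2}\,(\lambda + z_v)^2$,   $F^{Eff}, F^R \sim (\lambda + z_v)^2$
FN is misleading in the HR case.
Suppose $\Sigma^*_{\pi\pi}$ is large (i.e., first stage HR SEs are a lot bigger than NR SEs)
$F^N \sim \dfrac{\Sigma^*_{\pi\pi} Q_{ZZ}}{\sigma_V^2}\,(\lambda + z_v)^2 \sim \dfrac{\Sigma^*_{\pi\pi} Q_{ZZ}}{\sigma_V^2}\,\chi^2_{1;\lambda^2}$
where λ2 = π2/Σππ. For $\Sigma^*_{\pi\pi}$ large, $\lambda^2 \to 0$, and $F^N \sim \dfrac{\Sigma^*_{\pi\pi} Q_{ZZ}}{\sigma_V^2}\,\chi^2_1$
i.e., instruments are in the limit irrelevant, but $F^N \to \infty$.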
In the k = 1 case, FR = FEff. These differ in the k > 1 case, where FEff is preferred.
1-24
Homework problem solution
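Let k = 2 and $\hat Q_{ZZ} = I_2$. Suppose $\Sigma = \begin{pmatrix}\sigma_U^2 & \sigma_{UV}\\ \sigma_{UV} & \sigma_V^2\end{pmatrix} \otimes \begin{pmatrix}\omega_1^2 & 0\\ 0 & \omega_2^2\end{pmatrix}/n$ and π1, π2 ≠ 0
1(a) Direct calculation: $\mathrm{tr}(\Sigma_{\pi\pi}^{1/2} Q_{ZZ} \Sigma_{\pi\pi}^{1/2}) = (\omega_1^2 + \omega_2^2)\,\sigma_V^2/n$
1(b)-(d): We have already done the work to get the expressions below following
“~”, and the final expressions come from substitution of QZZ and Σ:
(b) $F^N = \dfrac{n\,\hat\pi'\hat Q_{ZZ}\hat\pi}{k\,\hat\sigma_V^2} \sim \dfrac{n\, z_\pi'\Sigma_{\pi\pi}^{1/2} Q_{ZZ}\Sigma_{\pi\pi}^{1/2} z_\pi}{k\,\sigma_V^2} = \frac{1}{2}\left(\omega_1^2 z_{\pi,1}^2 + \omega_2^2 z_{\pi,2}^2\right)$
(c) $F^R = \dfrac{\hat\pi'\hat\Sigma_{\pi\pi}^{-1}\hat\pi}{k} \sim \dfrac{z_\pi' z_\pi}{k} = \frac{1}{2}\, z_\pi' z_\pi$
(d) $F^{Eff} = \dfrac{\hat\pi'\hat Q_{ZZ}\hat\pi}{\mathrm{tr}(\hat\Sigma_{\pi\pi}^{1/2}\hat Q_{ZZ}\hat\Sigma_{\pi\pi}^{1/2})} \sim \dfrac{z_\pi'\Sigma_{\pi\pi}^{1/2} Q_{ZZ}\Sigma_{\pi\pi}^{1/2} z_\pi}{\mathrm{tr}(\Sigma_{\pi\pi}^{1/2} Q_{ZZ}\Sigma_{\pi\pi}^{1/2})} = \dfrac{\omega_1^2 z_{\pi,1}^2 + \omega_2^2 z_{\pi,2}^2}{\omega_1^2 + \omega_2^2}$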
1-25
Homework problem solution, ctd.
2) Adopt the weak instrument nesting π = n-1/2C, where C1, C2 ≠ 0. Show that as 2 :
a) “bias” of 2ˆ ˆ~ plimTSLS OLS
V V
Last part first: 2 2ˆplim OLS
X X V V because π = n1/2C.
Next obtain the expression (several tedious steps),
“Bias” part ( )ˆ
( ) ( )
TSLS z HRz
z H z
where 2
1/2 1/2 2
2
0/
0ZZ VH Q n
and
1/2 1/2
22
V
V
R I
.
For the weak instrument nesting, /12
2
1/2 2
2
11
1/2 1
2
0/
0
/0/
/0
V
V
V
V
n
Cn
C
1-26
Homework problem solution, ctd.
Now substitute these expressions for λ, H, and R into the “bias” part:
2
2
22
2
1 1
1 ,1 ,1 2 ,2 ,2
2 1 2 2
1 ,1 2 ,2
1
2
0( )
0ˆ0
( ) ( )0
( / ) ( / )
( / ) ( / )
1 ( )
TSLS V
V
V V V
V V V
Vp
V
z z
z z
C z z C z z
C z C z
O
1-27
Homework problem solution, ctd.
Remaining parts by substitution and taking limits:
2 22 2
1 ,1 2 ,2
22 1 2 2
1 ,1 2 ,2 1
1(b)
2
1 1/ / ( )
2 2
N
V V p
F z z
C z C z O
2 21
1 ,1 2 ,2
222
2
1(c)
2
1/ /
2
1( )
2
R
V V
p
V
F z z
C z C z
CO
2 2
,1 2 1 2
,1 12 2 2 2
( )(d) ( ) ~
NpEff
p
z OFF z O
3) Discuss
1-28
OK, FEff – but what cutoff?
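$F^{Eff} \sim z_\pi' H z_\pi$, where $H = \dfrac{\Sigma_{\pi\pi}^{1/2} Q_{ZZ} \Sigma_{\pi\pi}^{1/2}}{\mathrm{tr}(\Sigma_{\pi\pi}^{1/2} Q_{ZZ} \Sigma_{\pi\pi}^{1/2})}$
~ weighted average of noncentral χ2’s – depends on full matrix H,
0 ≤ eigenvalues (H) ≤ 1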
Hierarchy of options
1. Testing approach: test null of H ≥ some threshold (e.g. 10% bias)
a) (MOP Monte Carlo method) Given H , compute cutoff H ; critical value
by simulation
b) (MOP Patnaik-Nagar method) Approximate weighted average of noncentral
χ2’s by noncentral χ2; compute cutoff value of H using Nagar
approximation to the bias, with some maximal allowable bias. Implemented in
weakivtest.ado.
c) (MOP simple method) Pick a maximal allowable bias (or size distortion) and
use their “simple” critical values (based on noncentral χ2 bounding
distribution). These are simple, but conservative.
2. Consistent sequence approach: “Weak” if $F^{Eff} < \kappa_n$, $\kappa_n \to \infty$ (but what is $\kappa_n$?)
3. Rule-of-thumb approach: “Weak” if FEff < 10
1-29
k=1 case, additional comments about FEff and FR
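$\hat\beta^{IV} - \beta \sim \left(\dfrac{\Sigma_{\varepsilon\varepsilon}}{\Sigma_{\pi\pi}}\right)^{1/2}\dfrac{z_\varepsilon}{\lambda + z_v}$, where $\lambda = \Sigma_{\pi\pi}^{-1/2}\pi$
$t^{IV} = \dfrac{\hat\beta^{IV} - \beta_0}{SE(\hat\beta^{IV})} \sim \dfrac{z_\varepsilon}{\left[1 - 2\rho\,\dfrac{z_\varepsilon}{\lambda + z_v} + \left(\dfrac{z_\varepsilon}{\lambda + z_v}\right)^2\right]^{1/2}}$, where $F^R = F^{Eff} \sim (\lambda + z_v)^2$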
By maximizing over ρ you can find worst case size distortion for usual IV t-
stat testing β0. This depends on λ, which can be estimated from FR = FEff.
These are the same expressions, with different definition of λ, as in
homoskedastic case (special to k = 1)
Critical values for k = 1 – two choices:
Nagar bias ≤ 10%: 23 (5% critical value from $\chi^2_{1;\,\lambda^2 = 10}$) (MOP)
Maximum tIV size distortion of 0.10: 16.4; of 0.15: 9.0
But with k = 1 there are fully robust methods that are easy and have very
strong theoretical properties (AR) (Lecture 3).
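The value 23 quoted above for the Nagar-bias cutoff can be reproduced from the noncentral χ² with 1 degree of freedom. A minimal sketch follows, using the representation (λ + z)² with z standard normal and a simple bisection; the function names are illustrative, and only the Python standard library is used.

```python
from math import sqrt
from statistics import NormalDist

def ncx2_1_cdf(c, lam2):
    """CDF at c of a noncentral chi-squared, 1 df, noncentrality lam2.

    Uses P((lam + z)^2 <= c) = Phi(sqrt(c) - lam) - Phi(-sqrt(c) - lam).
    """
    if c <= 0.0:
        return 0.0
    lam, r = sqrt(lam2), sqrt(c)
    Phi = NormalDist().cdf
    return Phi(r - lam) - Phi(-r - lam)

def ncx2_1_quantile(p, lam2, upper=1e4):
    """p-quantile by bisection on [0, upper]."""
    lo, hi = 0.0, upper
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if ncx2_1_cdf(mid, lam2) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

cv_weak_iv = ncx2_1_quantile(0.95, 10.0)   # cutoff when lambda^2 = 10
cv_central = ncx2_1_quantile(0.95, 0.0)    # usual chi-squared(1) 95% quantile
```

With λ² = 10 the 95% quantile is about 23.1, against about 3.84 for the central χ²₁.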
1-30
Detecting weak instruments with multiple included endogenous regressors
Methods are based on multivariate F: Cragg-Donald statistic and robust variants
Nonrobust:
o Minimum eigenvalue of Cragg-Donald statistic, Stock-Yogo (2005)
critical values
o Sanderson-Windmeijer (2016)
HR: Main method used is Kleibergen-Paap statistic, which is HR Cragg-
Donald.
o But recall that this doesn’t work (theory) for 1 X, and having multiple X’s
doesn’t improve things.
MOP Effective F: Hasn’t been developed.
More work is needed….
1-31
What if you plan to use efficient 2-step GMM, not TSLS?
Everything above is tailored to TSLS!
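Suppose that, if you have strong instruments, you use efficient 2-step GMM:
$\hat\beta^{GMM} = \dfrac{\hat\pi'\hat\Sigma_{\varepsilon\varepsilon}^{-1}\hat\delta}{\hat\pi'\hat\Sigma_{\varepsilon\varepsilon}^{-1}\hat\pi}$, where $\hat\Sigma_{\varepsilon\varepsilon} = \dfrac{1}{n}\sum_{i=1}^n \left(\hat\varepsilon_i^{(1)}\right)^2 Z_i Z_i'$
where $\hat\varepsilon_i^{(1)}$ is the residual from a first-stage estimate of β, e.g. TSLS.
Things get complicated because the first step (TSLS) isn’t consistent with
weak instruments.
o $\hat\Sigma_{\varepsilon\varepsilon}$ converges in distribution to a random limit
o If $\Sigma_{\varepsilon\varepsilon}$ were known (infeasible),
$\hat\beta^{GMM} - \beta \sim \dfrac{(\lambda + z_v)'\,\Sigma_{\pi\pi}^{1/2}\Sigma_{\varepsilon\varepsilon}^{-1/2}\, z_\varepsilon}{(\lambda + z_v)'\,\Sigma_{\pi\pi}^{1/2}\Sigma_{\varepsilon\varepsilon}^{-1}\Sigma_{\pi\pi}^{1/2}\,(\lambda + z_v)}$
In general none of the F’s discussed so far get at the right object, $\lambda'\Sigma_{\pi\pi}^{1/2}\Sigma_{\varepsilon\varepsilon}^{-1}\Sigma_{\pi\pi}^{1/2}\lambda \,/\, \mathrm{tr}(\Sigma_{\pi\pi}^{1/2}\Sigma_{\varepsilon\varepsilon}^{-1}\Sigma_{\pi\pi}^{1/2})$. (And this is “right” only if Σεε is known.)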
1-32
OK – now what should you do if you have weak instruments?
Wrong answer: reject the paper.
Size distortion from screening based on first stage F
Isaiah will discuss further…
1-33
Estimation – What have we learned/state of knowledge
k = 1
There does not exist an unbiased or asymptotically-unbiased IV estimator
(folk theorem; Hirano and Porter 2015).
Only one moment condition, so weighting (HR) isn’t an issue
LIML=TSLS=IV doesn’t have moments…
Fuller seems to have advantage over IV in terms of “bias” (location) in
simulations (e.g., Hahn, Hausman, Kuersteiner (2004), I. Andrews and
Armstrong 2017) (so should k-class).
If you know a-priori the sign of π, then unbiased, strong-instrument efficient
estimation is possible (I. Andrews and Armstrong 2017)
1-34
Estimation – What have we learned/state of knowledge, ctd.
k > 1
There does not exist an unbiased or asymptotically-unbiased IV estimator
(folk theorem; Hirano and Porter 2015).
The IV estimators that were developed in the 60s-90s (LIML, k-class, double
k-class, JIVE, Fuller) are special to the homoskedastic case, and in general
lose their good properties in the HR case
Different IV estimators place different weights on the moments, and thus in
general have different LATEs
With heterogeneity, the LIML estimand (Fuller too?) can be outside the
convex hull of the LATEs of the individual instruments (Kolesár 2013)
For GMM applications estimating a structural parameter (e.g. New
Keynesian Phillips Curve, etc.), the LATE concerns don’t apply, however
when the moment conditions are nonlinear in θ, things get difficult.
If you know a-priori the sign of π, then unbiased estimation is possible (I.
Andrews and Armstrong 2017)
1-35
References
Andrews, D.W.K. and X. Cheng (2012). “Estimation and Inference with Weak, Semi-Strong, and Strong
Identification,” Econometrica 80, 2153-2211.
Andrews, I. and T. Armstrong (2017). “Unbiased Instrumental Variables Estimation under Known First-Stage
Sign,” Quantitative Economics 8, 479-503.
Hahn, J., J. Hausman, and G. Kuersteiner (2004), “Estimation with weak instruments: Accuracy of higher order
bias and MSE approximations,” Econometrics Journal, 7, 272–306.
Kleibergen, F., and R. Paap (2006). “Generalized Reduced Rank Tests using the Singular Value
Decomposition.” Journal of Econometrics 133: 97–126.
Montiel Olea, J.L. and C.E. Pflueger (2013). “A Robust Test for Weak Instruments,” Journal of Business and
Economic Statistics 31, 358-369.
Nagar, A. L. (1959). “The bias and moment matrix of the general k-class estimators of the parameters in
simultaneous equations,” Econometrica 27: 575–595.
Nelson, C. R., and R. Startz (1990). “Some further results on the exact small sample properties of the
instrumental variable estimator.” Econometrica 58, 967–976.
Nelson, C. R., and R. Startz (2006). “The zero-information-limit condition and spurious inference in weakly
identified models,” Journal of Econometrics 138, 47-62.
Pflueger, C.E. and S. Wang (2015). “A Robust Test for Weak Instruments in Stata,” Stata Journal 15, 216-225.
Sanderson, E. and F. Windmeijer (2016). “A Weak Instrument F-test in Linear IV Models with Multiple
Endogenous Variables,” Journal of Econometrics 190, 212-221.
Staiger, D., and J. H. Stock. 1997. Instrumental variables regression with weak instruments. Econometrica 65:
557–86.
1-36
Stock, J.H. and M. Yogo (2005). “Testing for Weak Instruments in Linear IV Regression,” Ch. 5 in J.H. Stock
and D.W.K. Andrews (eds), Identification and Inference for Econometric Models: Essays in Honor of
Thomas J. Rothenberg, Cambridge University Press, 80-108.
3: Robust Inference with Weak Instruments
July 22, 2018
The Story So Far...
Conventional t-test-based confidence intervals can under-cover true parameter value when instruments are weak
o Effective first-stage F-statistic provides a guide to bias
o But screening applications on F-statistics can induce size distortions
This section: identification-robust confidence sets
o Ensure correct coverage regardless of instrument strength
o No need to screen on first stage
o Avoids pretesting bias
o Avoids throwing away applications with valid instruments just because weak
o Confidence sets can be informative even with weak instruments
Reminder: Normal Model
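To discuss these issues, continue to consider the normal model
$\begin{pmatrix}\hat\delta\\ \hat\pi\end{pmatrix} \sim N\left(\begin{pmatrix}\delta\\ \pi\end{pmatrix}, \Sigma\right)$
where
o $\hat\delta$ is the reduced-form OLS coefficient
o $\hat\pi$ is the first-stage OLS coefficient
o Σ is known
IV model implies δ = πβ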
Negative Result
Initial Question: can we obtain correct coverage by adjusting our standard errors?
o Confidence interval $[\hat\beta \pm b(\hat\delta, \hat\pi)]$ for some b(·, ·)
Answer: no (unless $b(\hat\delta, \hat\pi)$ can be infinite)
Gleser and Hwang (1987) and Dufour (1997) show that for any robust confidence set CS with coverage 1 − α,
$\Pr_{\beta,\pi}\{\beta \in CS\} \ge 1 - \alpha$ for all β, π,
we must have
$\Pr_{\beta,\pi}\{CS \text{ has infinite length}\} > 0$ for all β, π
Intuition: in case with π = 0, must cover every value β with probability 1 − α
Adjusting our (finite) standard errors isn’t enough: need alternative approach
Test Inversion
Leading alternative: test inversion
Idea: Define a family of tests φ(·) where
o φ(β0) is a test for H0 : β = β0
o φ(β0) = 1 if reject H0, 0 otherwise
Suppose φ(β0) has size α for all β0, i.e.
$E_{\beta_0,\pi}[\phi(\beta_0)] \le \alpha$ for all β0, π
If we form CS by collecting the non-rejected values
$CS = \{\beta : \phi(\beta) = 0\}$
then CS has coverage 1 − α
o Called test inversion
Hence, to form an identification-robust confidence set, we only need to form identification-robust tests of H0 : β = β0
Restriction Implied By IV Model
To implement test inversion, need to find a test
To construct robust test, use restrictions that hold regardless of instrument strength
IV model implies that δ − πβ = 0
Under H0 : β = β0,
$\hat\delta - \hat\pi\beta_0 \sim N(0, \Omega(\beta_0))$
for $\Omega(\beta_0) = \Sigma_{\delta\delta} - \beta_0(\Sigma_{\delta\pi} + \Sigma_{\pi\delta}) + \beta_0^2\Sigma_{\pi\pi}$
Holds regardless of instrument strength
AR Statistic
Building on this observation, can introduce AR statistic
$AR(\beta_0) = (\hat\delta - \hat\pi\beta_0)'\,\Omega(\beta_0)^{-1}\,(\hat\delta - \hat\pi\beta_0)$
o Originally introduced by Anderson and Rubin (1949) for homoskedastic normal case
o Here, generalization to non-homoskedastic case
Under H0 : β = β0, $AR(\beta_0) \sim \chi^2_k$ for all π
o Recall k = dim(Zi)
AR test $\phi_{AR}(\beta_0) = 1\{AR(\beta_0) > \chi^2_{k,1-\alpha}\}$
o $\chi^2_{k,1-\alpha}$ is the 1 − α quantile of a $\chi^2_k$ distribution
AR Confidence set $CS_{AR} = \{\beta : AR(\beta) \le \chi^2_{k,1-\alpha}\}$
AR test and CS fully robust to weak instruments
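The AR statistic and its inversion over a grid can be sketched in a few lines. This is an illustrative sketch in the normal model with known Σ: Σ is assumed ordered as (δ̂, π̂), the critical value is passed in by the caller, and the numbers in the example are made up.

```python
import numpy as np

def ar_stat(beta0, delta_hat, pi_hat, Sigma):
    """AR(beta0) = g' Omega(beta0)^{-1} g with g = delta_hat - pi_hat * beta0."""
    delta_hat = np.atleast_1d(delta_hat).astype(float)
    pi_hat = np.atleast_1d(pi_hat).astype(float)
    k = len(pi_hat)
    Sigma = np.asarray(Sigma, dtype=float)
    S_dd, S_dp = Sigma[:k, :k], Sigma[:k, k:]
    S_pd, S_pp = Sigma[k:, :k], Sigma[k:, k:]
    g = delta_hat - pi_hat * beta0
    Omega = S_dd - beta0 * (S_dp + S_pd) + beta0 ** 2 * S_pp
    return float(g @ np.linalg.solve(Omega, g))

def ar_confidence_set(delta_hat, pi_hat, Sigma, crit, grid):
    """Grid points beta that the AR test does not reject."""
    keep = [b for b in grid if ar_stat(b, delta_hat, pi_hat, Sigma) <= crit]
    return np.array(keep)

# Hypothetical k = 1 example; 3.841 is the 95% quantile of chi-squared(1)
Sigma = [[0.02, 0.005], [0.005, 0.01]]
grid = np.linspace(0.0, 4.0, 401)
cs = ar_confidence_set(2.0, 1.0, Sigma, crit=3.841, grid=grid)
ar_at_beta_hat = ar_stat(2.0, 2.0, 1.0, Sigma)   # zero at delta_hat / pi_hat
```

A grid only approximates the set and cannot reveal unboundedness beyond its endpoints, so in practice the grid range needs care.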
The Form of AR Confidence Sets
CSAR can behave in counterintuitive ways
In just-identified setting (k = dim(X) = 1) can take form of:
o bounded interval: CSAR = [a, b]
o real line: CSAR = (−∞, ∞)
o real line excluding bounded interval: CSAR = (−∞, a] ∪ [b, ∞)
In over-identified settings can also be empty, CSAR = ∅
In overidentified non-homoskedastic settings, can take additional forms
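For k = 1 the three shapes follow from a quadratic inequality: AR(β) ≤ c is equivalent to aβ² + bβ + c₀ ≤ 0 with a = π̂² − cΣ_ππ, b = −2(δ̂π̂ − cΣ_δπ), c₀ = δ̂² − cΣ_δδ. The sketch below classifies the set under that algebra; the function name and example numbers are illustrative, and the knife-edge case a = 0 is deliberately not handled.

```python
from math import sqrt

def ar_cs_k1(delta_hat, pi_hat, s_dd, s_dp, s_pp, crit=3.841):
    """Shape and endpoints of the AR confidence set when k = 1."""
    a = pi_hat ** 2 - crit * s_pp
    b = -2.0 * (delta_hat * pi_hat - crit * s_dp)
    c0 = delta_hat ** 2 - crit * s_dd
    if a == 0.0:
        raise ValueError("knife-edge case a == 0 is not handled in this sketch")
    disc = b * b - 4.0 * a * c0
    if a < 0.0 and disc <= 0.0:
        return ("real line", None, None)
    root = sqrt(max(disc, 0.0))
    r1, r2 = sorted(((-b - root) / (2.0 * a), (-b + root) / (2.0 * a)))
    # a > 0: set is [r1, r2]; a < 0: set is (-inf, r1] U [r2, inf)
    return ("bounded" if a > 0.0 else "complement", r1, r2)

strong = ar_cs_k1(2.0, 1.0, 0.02, 0.005, 0.01)   # pi_hat^2 / s_pp = 100
weak = ar_cs_k1(0.2, 0.1, 0.02, 0.005, 0.01)     # pi_hat^2 / s_pp = 1
odd = ar_cs_k1(1.0, 0.1, 0.02, 0.0, 0.01)        # pi_hat^2 / s_pp = 1
```

The sign of a is the sign of π̂²/Σ_ππ − c, so the set is bounded exactly when the robust first-stage statistic exceeds the critical value.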
The Form of AR Confidence Sets
Infinite confidence sets strange-looking... but have natural explanation
Unboundedness consistent with Gleser and Hwang (1987), Dufour (1997)
Moreover, can show that
$\lim_{\beta_0 \to \pm\infty} AR(\beta_0) = \hat\pi'\Sigma_{\pi\pi}^{-1}\hat\pi = k \cdot F^R$
Implies that CSAR unbounded if and only if first-stage F-test cannot reject π = 0 at level α
Unbounded AR confidence sets arise only when cannot reject that model totally unidentified
The Form of AR Confidence Sets
Empty confidence sets more awkward
Arise from fact that AR tests H0 : δ = πβ0. Can be decomposed into
o Parametric restriction β = β0
o Overidentifying restrictions δ ∝ π if k > 1
If k = 1, no overidentifying restrictions to test
Empty AR confidence sets can be interpreted as a rejection of the overidentifying restrictions
o Unfortunate feature: how to interpret a small CS?
o Confidence sets non-empty with probability one in just-identified case
Optimality of AR in Just-Identified Models
In just-identified case with single endogenous regressor, AR is optimal
o 101 out of 230 specifications in our AER sample are just-identified with a single endogenous regressor
Moreira (2009) shows that AR test uniformly most powerful unbiased
AR equivalent to two-sided t-test when instruments are strong
In just-identified settings, strong case for using AR CS
o Optimal among CS robust to weak instruments
o No loss of power relative to t-test if instruments strong
AR Tests in Applications
To examine practical impact of using CSAR, return to our AER sample
Limit attention to just-identified specifications with single endogenous regressor where can estimate variance-covariance matrix of $(\hat\delta, \hat\pi)$
o Yields 36 specifications
Comparing 95% t and AR confidence sets, find infinite AR CS in two cases. In remaining cases:
o AR confidence sets 56.5% longer on average in all specifications
o 20.3% longer on average in specifications that report F>10
o 0.04% longer on average in specifications that report F>50
AR Tests in Overidentified Models
Strong argument for using AR in just-identified settings
AR tests and CS perform worse in over-identified settings
o As already noted, CSAR may be empty
o Also inefficient under strong instruments
o Tests violations of H0 : β = β0, and of overidentifying restrictions
o If only care about parametric restrictions, “wastes” degrees of freedom
Improving Efficiency in Over-identified Settings
To obtain efficiency under strong instruments, need alternative tests
For example, tests based on t-statistic
$t(\beta_0) = \dfrac{|\hat\beta - \beta_0|}{\hat\sigma_\beta}$
Problem: distribution of t(β0) under H0 : β = β0 depends on π
o Already know this: distribution of t-statistic depends on instrument strength
o Since π unknown, not clear what critical values to use with t(β0)
Conditional Critical Values
Moreira (2003): conditional critical values
o Originally for homoskedastic case. Here discuss generalization
Idea: Find a sufficient statistic D(β0) for π under H0 : β = β0
o Means conditional distribution of t(β0) | D(β0) doesn’t depend on π under H0
o Once condition on D(β0), can compute data-dependent critical values cα(D(β0))
Question: how to find D(β0)
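Conditional Critical Values
Idea for sufficient statistic: separate parts of $(\hat\delta, \hat\pi)$ that do/don’t depend on π
Define
$g(\beta) = \hat\delta - \hat\pi\beta$
Let
$D(\beta) = \hat\pi - (\Sigma_{\pi\delta} - \Sigma_{\pi\pi}\beta)\,\Omega(\beta)^{-1} g(\beta)$
denote $\hat\pi$ orthogonalized with respect to g(β)
Under H0 : β = β0,
$\begin{pmatrix} g(\beta_0)\\ D(\beta_0)\end{pmatrix} \sim N\left(\begin{pmatrix}0\\ \pi\end{pmatrix}, \begin{pmatrix}\Omega(\beta_0) & 0\\ 0 & \Psi(\beta_0)\end{pmatrix}\right)$
Conditional distribution of g(β0) given D(β0) doesn’t depend on π under H0 : β = β0
$g(\beta_0)\,|\,D(\beta_0) \sim N(0, \Omega(\beta_0))$
(g(β0), D(β0)) one-to-one transformation of $(\hat\delta, \hat\pi)$
⇒ D(β0) is sufficient statistic for π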
Conditional Critical Values
To construct conditional distribution of t(β0) | D(β0):
1 Fix D(β0) at observed value
2 Repeatedly draw g∗(β0) ∼ N(0, Ω(β0))
3 Construct $(\hat\delta^*, \hat\pi^*)$ from (g∗(β0), D(β0))
4 Calculate t∗(β0) based on $(\hat\delta^*, \hat\pi^*)$
Conditional critical value cα(D(β0)): 1 − α quantile of t∗(β0)
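The four steps above can be sketched for the just-identified case. This is a rough illustration, not the lecture's implementation: it assumes k = 1, a known 2×2 Σ ordered as (δ̂, π̂), and a delta-method standard error $\hat\sigma_\beta = \Omega(\hat\beta)^{1/2}/|\hat\pi|$ for the t-statistic; the function name and inputs are hypothetical.

```python
import numpy as np

def conditional_cv_t(beta0, delta_hat, pi_hat, Sigma,
                     alpha=0.05, draws=20000, seed=0):
    """Simulated conditional critical value for |t(beta0)| given D(beta0), k = 1."""
    Sigma = np.asarray(Sigma, dtype=float)
    s_dd, s_dp, s_pp = Sigma[0, 0], Sigma[0, 1], Sigma[1, 1]
    omega0 = s_dd - 2.0 * beta0 * s_dp + beta0 ** 2 * s_pp
    c = (s_dp - s_pp * beta0) / omega0          # (Sigma_pi,delta - Sigma_pi,pi beta0) / Omega
    # Step 1: fix D(beta0) at its observed value
    D = pi_hat - c * (delta_hat - pi_hat * beta0)
    # Step 2: draw g* ~ N(0, Omega(beta0))
    rng = np.random.default_rng(seed)
    g_star = rng.normal(0.0, np.sqrt(omega0), size=draws)
    # Step 3: rebuild (delta*, pi*) from (g*, D)
    pi_star = D + c * g_star
    delta_star = g_star + pi_star * beta0
    # Step 4: recompute the t-statistic on each draw
    beta_star = delta_star / pi_star
    se_star = np.sqrt(s_dd - 2.0 * beta_star * s_dp
                      + beta_star ** 2 * s_pp) / np.abs(pi_star)
    t_star = np.abs(beta_star - beta0) / se_star
    return float(np.quantile(t_star, 1.0 - alpha))

Sigma = [[0.02, 0.005], [0.005, 0.01]]
cv_strong = conditional_cv_t(2.0, 20.0, 10.0, Sigma)   # very strong first stage
cv_weak = conditional_cv_t(2.0, 0.3, 0.1, Sigma)       # weak first stage
```

With a very strong first stage the simulated cutoff should sit near the usual 1.96; with a weak one it depends on D(β0).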
Conditional Critical Values
Conditional t-test that rejects when t(β0) exceeds cα(D(β0)),
$\phi(\beta_0) = 1\{t(\beta_0) > c_\alpha(D(\beta_0))\}$,
is fully robust to weak instruments,
$E_{\beta_0,\pi}[\phi(\beta_0)] = \alpha$ for all π
Conditioning not specific to t(β0), works for any statistic s(β0)
o In each case construct data-dependent critical value cα(D(β0))
o Yields tests that control size
Question: what statistic s(β0) to use
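Alternative Test Statistics
Many possible choices of statistic s(β0)
t-statistic
$t(\beta_0) = \dfrac{|\hat\beta - \beta_0|}{\hat\sigma_\beta}$
Score statistic (Kleibergen 2002, 2005)
$K(\beta_0) = g(\beta_0)'\Omega(\beta_0)^{-1} D(\beta_0) \times \left(D(\beta_0)'\Omega(\beta_0)^{-1} D(\beta_0)\right)^{-1} D(\beta_0)'\Omega(\beta_0)^{-1} g(\beta_0)$
AR statistic
$AR(\beta_0) = g(\beta_0)'\Omega(\beta_0)^{-1} g(\beta_0)$
LR statistic
$LR(\beta_0) = 2\left(\max_{\beta,\pi} \ell(\beta, \pi) - \max_{\pi} \ell(\beta_0, \pi)\right)$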
Properties
Different test statistics imply different cα(D(β0))
o For t and LR, need data-dependent critical values
o Conditional distributions AR(β0) | D(β0) and K(β0) | D(β0) don’t depend on D(β0)
o Can use $\chi^2_k$ and $\chi^2_1$ critical values, respectively
Conditional t, K, and LR tests efficient under strong instruments
o AR test inefficient in overidentified models
Under weak instruments, also yield different power properties
o Conditional two-sided t-test: poor power
o K test: sometimes poor power
o Conditional LR (CLR) test performs well (in homoskedastic case)
See D. Andrews, Moreira, and Stock (2006), (2007) Plots
Near-Optimality of CLR Test
D. Andrews, Moreira, and Stock (2006) show that CLR test near-optimal
o In homoskedastic case with single endogenous regressor
Power close to upper bound for a natural class of tests over a wide range of parameter values
Consensus in literature that CLR is a good test for homoskedastic settings
o ... but homoskedasticity assumption unappealing
Tests for Non-Homoskedastic Models
Variety of CLR extensions for non-homoskedastic case
o D. Andrews, Moreira, and Stock (2004), Kleibergen (2005), D. Andrews and Guggenberger (2015), I. Andrews (2016), I. Andrews and Mikusheva (2016)
o All efficient with strong instruments, but only simulation evidence on power with weak instruments
Alternative: tests proposed that maximize weighted average power
o Maximize integral of power function with respect to some weights
o Moreira and Moreira (2015), Montiel Olea (2017), Moreira and Ridder (2018)
o Question: what are “right” weights?
Many options, but so far no consensus on what tests should be used in over-identified and non-homoskedastic models
o In just-identified setting, use AR
o In over-identified settings, use something that’s efficient under strong instruments
Two-Step Confidence Sets
Robust confidence sets not widely used in practice
o When reported, usually only after authors find evidence of weak instruments
o In AER sample, reported in 2 papers. Minimal first-stage F of 2.3 and 6.3, respectively
If only report robust confidence set when F small, can view as constructing confidence set in two steps
o If F ≥ 10, report t-statistic confidence set
o If F < 10, report robust CS
Screening applications on first-stage F can generate very bad behavior. Does two-step CS do the same?
o Positive results for FN in homoskedastic case based on Stock and Yogo (2005)
o Negative result for FN in non-homoskedastic case based on Montiel Olea and Pflueger (2013), for FR with conventional critical values based on I. Andrews (2018)
o Negative results based on extreme forms of non-homoskedasticity: open question how bad in practice
Implementation
Implementations of some weak-IV tests are in Stata package weakiv, available on SSC
o Finlay, Magnusson, and Schaffer
o Versions of CLR, AR, K, and other tests applicable to non-homoskedastic models
o Can be used with fixed effects, clustered standard errors, etc.
o Stata Journal article on previous version of package: Finlay and Magnusson (2009)
Summary
A number of tests and confidence sets are available that are fully robust to weak instruments
o Avoid pretesting bias, discarding applications
o Many efficient under strong instruments
In just-identified models, strong case for using AR CS
o Covers many applications
In over-identified models, less clear
o CLR if assume homoskedastic
o No consensus for non-homoskedastic case... other than using something efficient under strong instruments
References
Anderson T, Rubin H. 1949. Estimators for the parameters of a single equation in a complete set of stochastic equations. Annals of Mathematical Statistics 21:570–582
Andrews D, Guggenberger P. 2015. Identification- and singularity-robust inference for moment condition models. Unpublished Manuscript
Andrews D, Moreira M, Stock J. 2004. Optimal invariant similar tests of instrumental variables regression. Unpublished Manuscript
Andrews D, Moreira M, Stock J. 2006. Optimal two-sided invariant similar tests of instrumental variables regression. Econometrica 74:715–752
Andrews D, Moreira M, Stock J. 2007. Performance of conditional Wald tests in IV regression with weak instruments. Econometrica 74:715–752
Andrews I. 2016. Conditional linear combination tests for weakly identified models. Econometrica 84:2155–2182
Andrews I, Mikusheva A. 2016. Conditional inference with a functional nuisance parameter. Econometrica 84:1571–1612
Dufour J. 1997. Some impossibility theorems in econometrics with applications to structural and dynamic models. Econometrica 65:1365–1387
Finlay K, Magnusson L. 2009. Implementing weak-instrument robust tests for a general class of instrumental-variables models. Stata Journal 9:398–421
Gleser L, Hwang J. 1987. The nonexistence of 100(1-α)% confidence sets of finite expected diameter in errors-in-variables and related models. Journal of the American Statistical Association 15:1341–1362
Kleibergen F. 2002. Pivotal statistics for testing structural parameters in instrumental variables regression. Econometrica 70:1781–1803
Kleibergen F. 2005. Testing parameters in GMM without assuming they are identified. Econometrica 73:1103–1123
Montiel Olea J. 2017. Admissible, similar tests: A characterization. Unpublished Manuscript
Montiel Olea J, Pflueger C. 2013. A robust test for weak instruments. Journal of Business and Economic Statistics 31:358–369
Moreira H, Moreira M. 2015. Optimal two-sided tests for instrumental variables regression with heteroskedastic and autocorrelated errors. Unpublished Manuscript
Moreira M. 2003. A conditional likelihood ratio test for structural models. Econometrica 71:1027–1048
Moreira M. 2009. Tests with correct size when instruments can be arbitrarily weak. Journal of Econometrics 152:131–140
Moreira M, Ridder G. 2017. Optimal invariant tests in an instrumental variables regression with heteroskedastic and autocorrelated errors. Unpublished Manuscript
Power Comparisons
Figure: Power of AR, K, and CLR tests in homoskedastic case (from D. Andrews, Moreira, and Stock (2006))
Power Comparisons
Figure: Power of conditional t- and LR-tests in homoskedastic case (from D. Andrews, Moreira, and Stock (2007))
4: Open Issues and Recent Research
July 22, 2018
Outline
Two goals for this section
1. Examine practical importance of issues covered so far
  Simulations calibrated to specifications published in AER
2. Discuss other open questions and recent research on weak instruments
AER Simulation Specifications
To assess practical importance of weak instrument issues, calibrate simulations to AER data
Specifications from AER articles (excluding Papers and Proceedings) from 2014-2018 that:
1. Published in main text
2. Allow us to estimate variance matrix $\Sigma$ of $(\hat\delta, \hat\pi)$
  Mostly papers with replication data
  In one other case, back out $\Sigma$ from published results
Yields 124 specifications from 8 papers
All specifications have a single endogenous variable
Simulation Design
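To focus attention on weak instrument issues, simulations use normal model
$$\begin{pmatrix}\hat\delta \\ \hat\pi\end{pmatrix} \sim N\left(\begin{pmatrix}\delta \\ \pi\end{pmatrix}, \Sigma\right)$$
with $\Sigma$ known, $\delta = \pi\beta$
Simulations fix $\beta$, $\pi$, and $\Sigma$ at estimated values
Abstracts away from:
  Non-normality of $\hat\delta$, $\hat\pi$
  Estimation error in $\Sigma$
  Will return to these later
Any distortions must arise from weak instruments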
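A minimal sketch of one draw from this design, assuming the just-identified case with a single instrument so that $\hat\delta$ and $\hat\pi$ are scalars and $\Sigma$ is 2×2. The function name `draw_reduced_form` and all parameter values are illustrative placeholders, not values from the AER specifications.

```python
import math
import random

def draw_reduced_form(beta, pi, sigma, rng):
    """Draw (delta_hat, pi_hat) ~ N((pi*beta, pi), sigma) for a known 2x2 sigma.

    sigma is ((s_dd, s_dp), (s_dp, s_pp)); hypothetical helper for illustration.
    """
    (s_dd, s_dp), (_, s_pp) = sigma
    # Cholesky factor of the 2x2 covariance matrix, written out by hand.
    l11 = math.sqrt(s_dd)
    l21 = s_dp / l11
    l22 = math.sqrt(s_pp - l21 ** 2)
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    delta_hat = pi * beta + l11 * z1          # reduced-form estimate, mean delta = pi*beta
    pi_hat = pi + l21 * z1 + l22 * z2         # first-stage estimate, mean pi
    return delta_hat, pi_hat

# Illustrative values only (assumed, not taken from the slides).
rng = random.Random(0)
sigma = ((1.0, 0.5), (0.5, 1.0))
draws = [draw_reduced_form(beta=1.0, pi=2.0, sigma=sigma, rng=rng)
         for _ in range(20000)]
mean_delta = sum(d for d, _ in draws) / len(draws)
mean_pi = sum(p for _, p in draws) / len(draws)
cov_dp = sum((d - mean_delta) * (p - mean_pi) for d, p in draws) / len(draws)
print(mean_delta, mean_pi, cov_dp)
```

The sample means and covariance should sit close to $\pi\beta$, $\pi$, and the off-diagonal element of $\Sigma$, which is a quick check that the draws follow the stated model.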
Distribution of t-Statistics
Theoretical results show t-tests can perform poorly when instruments are weak
  Distribution of t-statistics may not be centered at zero
  Rejection probability of 5% t-tests may be much larger than 5%
In each of our 124 AER specifications simulate
  Median t-statistic
  $$\mathrm{Med}\left(\frac{\hat\beta - \beta_0}{\hat\sigma_\beta}\right)$$
  Size of 5% t-tests
  $$\Pr\left\{\frac{|\hat\beta - \beta_0|}{\hat\sigma_\beta} > 1.96\right\}$$
Plot against average effective first-stage F-stat
  Little action for $E[F^E] > 50$. Limit plot to $E[F^E] \le 50$
  Includes 106 of 124 specifications
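The two summary statistics can be sketched as follows, again assuming a single instrument, so that $\hat\beta = \hat\delta/\hat\pi$, the standard error comes from the delta method, and the effective F reduces to $\hat\pi^2/\Sigma_{\pi\pi}$. The function names and the values of $\pi$ and $\Sigma$ are assumptions for illustration, not calibrated to any AER specification.

```python
import math
import random
import statistics

def simulate_t_stats(beta, pi, sigma, n_draws, seed=0):
    """Simulate t-statistics for H0: beta0 = true beta (just-identified, known sigma)."""
    (s_dd, s_dp), (_, s_pp) = sigma
    l11 = math.sqrt(s_dd)
    l21 = s_dp / l11
    l22 = math.sqrt(s_pp - l21 ** 2)
    rng = random.Random(seed)
    t_stats = []
    for _ in range(n_draws):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        delta_hat = pi * beta + l11 * z1
        pi_hat = pi + l21 * z1 + l22 * z2
        beta_hat = delta_hat / pi_hat
        # Delta-method standard error of delta_hat / pi_hat.
        se = math.sqrt(s_dd - 2 * beta_hat * s_dp + beta_hat ** 2 * s_pp) / abs(pi_hat)
        t_stats.append((beta_hat - beta) / se)
    return t_stats

def summarize(t_stats):
    """Return (median t-statistic, rejection rate of the nominal 5% t-test)."""
    median_t = statistics.median(t_stats)
    size = sum(abs(t) > 1.96 for t in t_stats) / len(t_stats)
    return median_t, size

# Illustrative design: strongly correlated errors, one weak and one strong first stage.
sigma = ((1.0, 0.9), (0.9, 1.0))
for pi in (1.0, 10.0):
    mean_f = pi ** 2 / sigma[1][1] + 1.0   # E[pi_hat^2] / s_pp
    median_t, size = summarize(simulate_t_stats(0.0, pi, sigma, 20000))
    print(f"E[F]={mean_f:.0f}  median t={median_t:.3f}  size={size:.3f}")
```

In this sketch the weak design shifts the median t-statistic away from zero and pushes the rejection rate above 5%, while the strong design stays close to nominal.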
Distribution of t-Statistics
Figure: Median t-statistic (vertical axis, -1.5 to 1.5) against average effective F-statistic (horizontal axis, 0 to 50)
Distribution of t-Statistics
Figure: Size of 5% t-test (vertical axis, 0 to 0.5) against average effective F-statistic (horizontal axis, 0 to 50)
Distribution of t-Statistics
Weak instrument issues apparent in some specifications
  Median t-statistic far from zero
  Size of nominal 5% t-test much larger than 5%
Problems limited to specifications with a small average effective F-stat
  No large distortions in specifications with $E[F^E] > 10$
Population rule of thumb seems to work pretty well
  Not a theorem!
Weak instrument issues appear relevant for some recently published specifications, but only in cases with $E[F^E]$ small
Screening on F-Statistics
Given that average effective F-statistics seem to capture weak instrument issues, tempting to screen applications on F
  e.g. only pursue applications with $F^E > 10$
  Distribution of F-statistics in AER sample suggests this may be common
  As already discussed, can introduce size distortions
Examine effect in our AER specifications
  For each specification, calculate size of 5% t-test conditional on $F^E > 10$
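The conditional size calculation can be sketched in the same single-instrument normal model, taking the first-stage F to be $\hat\pi^2/\Sigma_{\pi\pi}$. The function name `t_test_size` and the design values are assumptions chosen for illustration; the sketch simply discards draws that fail the screen before computing the rejection rate.

```python
import math
import random

def t_test_size(beta, pi, sigma, n_draws, f_cutoff=None, seed=0):
    """Rejection rate of the 5% t-test at the true beta.

    If f_cutoff is given, only draws with F = pi_hat^2 / s_pp > f_cutoff are kept.
    Returns (size among kept draws, share of draws kept).
    """
    (s_dd, s_dp), (_, s_pp) = sigma
    l11 = math.sqrt(s_dd)
    l21 = s_dp / l11
    l22 = math.sqrt(s_pp - l21 ** 2)
    rng = random.Random(seed)
    kept = rejected = 0
    for _ in range(n_draws):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        delta_hat = pi * beta + l11 * z1
        pi_hat = pi + l21 * z1 + l22 * z2
        if f_cutoff is not None and pi_hat ** 2 / s_pp <= f_cutoff:
            continue  # specification would not have been pursued
        beta_hat = delta_hat / pi_hat
        se = math.sqrt(s_dd - 2 * beta_hat * s_dp + beta_hat ** 2 * s_pp) / abs(pi_hat)
        kept += 1
        rejected += abs((beta_hat - beta) / se) > 1.96
    size = rejected / kept if kept else float("nan")
    return size, kept / n_draws

# Illustrative design with E[F] = 5, so passing the screen requires a lucky first stage.
sigma = ((1.0, 0.9), (0.9, 1.0))
size_all, _ = t_test_size(0.0, 2.0, sigma, 40000)
size_screened, share_kept = t_test_size(0.0, 2.0, sigma, 40000, f_cutoff=10.0)
print(size_all, size_screened, share_kept)
```

Draws that pass the screen are those where the first-stage noise happened to be large, and that noise is correlated with the reduced-form error, which is why the conditional rejection rate in this sketch exceeds the unconditional one.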
Screening on F-Statistics
Figure: Size of 5% t-test, screened on effective F > 10 (vertical axis, 0 to 1) against average effective F-statistic (horizontal axis, 0 to 50)
Screening on F-Statistics
Screening leads to much larger size for some specifications
  Not specific to $F^E$: same issues appear for $F^N$, $F^R$
  Not specific to threshold of 10: if move threshold, get distortions in neighborhood of new cutoff
Screening on F-statistics can make published results less reliable
Robust Confidence Sets
Rather than screening applications on $F^E$, can compute robust confidence sets
  Guaranteed to have correct coverage regardless of instrument strength
For illustration, here consider Anderson-Rubin (AR)
  Plot size of AR tests in AER specifications
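A sketch of the AR test in the single-instrument normal model with known $\Sigma$, where the statistic is $(\hat\delta - \hat\pi\beta_0)^2 / (\Sigma_{\delta\delta} - 2\beta_0\Sigma_{\delta\pi} + \beta_0^2\Sigma_{\pi\pi})$ and is compared with the 95% critical value of a $\chi^2_1$ (about 3.841). Function names and design values are illustrative assumptions.

```python
import math
import random

CHI2_1_CRIT = 3.841  # 95% critical value of chi-squared with 1 degree of freedom

def ar_stat(delta_hat, pi_hat, beta0, sigma):
    """Anderson-Rubin statistic for H0: beta = beta0 with one instrument, known sigma."""
    (s_dd, s_dp), (_, s_pp) = sigma
    g = delta_hat - pi_hat * beta0
    return g ** 2 / (s_dd - 2 * beta0 * s_dp + beta0 ** 2 * s_pp)

def ar_size(beta, pi, sigma, n_draws, seed=0):
    """Simulated rejection rate of the 5% AR test at the true beta."""
    (s_dd, s_dp), (_, s_pp) = sigma
    l11 = math.sqrt(s_dd)
    l21 = s_dp / l11
    l22 = math.sqrt(s_pp - l21 ** 2)
    rng = random.Random(seed)
    rejected = 0
    for _ in range(n_draws):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        delta_hat = pi * beta + l11 * z1
        pi_hat = pi + l21 * z1 + l22 * z2
        rejected += ar_stat(delta_hat, pi_hat, beta, sigma) > CHI2_1_CRIT
    return rejected / n_draws

# Illustrative design: irrelevant, weak, and strong first stages.
sigma = ((1.0, 0.9), (0.9, 1.0))
sizes = {pi: ar_size(0.5, pi, sigma, 20000) for pi in (0.0, 1.0, 10.0)}
print(sizes)
```

Under the null, $\hat\delta - \hat\pi\beta_0$ does not depend on $\pi$ at all, so the rejection rate in this sketch is the same for every first-stage strength, including an irrelevant instrument.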
Robust Confidence Sets
Figure: Size of 5% Anderson-Rubin test (vertical axis, 0 to 0.5) against average effective F-statistic (horizontal axis, 0 to 50)
Robust Confidence Sets
AR size is flat at 5% regardless of instrument strength
  AR also efficient in just-identified case
For over-identified models, variety of robust tests and confidence sets available
  Many ensure efficiency in strongly-identified case
  All again ensure correct size regardless of instrument strength
Robust confidence sets eliminate size distortions from weak instruments
Two-Step Confidence Sets
Robust confidence sets currently little-used in practice
  When used, often because weak identification is suspected
When used in this way, can be viewed as alternative to screening on F-statistic. For example
  If $F^E \ge 10$, report t-statistic
  If $F^E < 10$, report AR
  Alternatively, could use Montiel Olea and Pflueger (2013) critical values
May introduce size distortions, but not clear how large
  Examine performance in our AER specifications
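The two-step rule with cutoff 10 can be sketched as below for the single-instrument normal model, taking the effective F to be $\hat\pi^2/\Sigma_{\pi\pi}$. The function names, the fixed cutoff, and the design values are illustrative assumptions; the Montiel Olea and Pflueger (2013) critical values are not implemented here.

```python
import math
import random

def t_rejects(delta_hat, pi_hat, beta0, sigma):
    """Nominal 5% t-test of H0: beta = beta0 (delta-method standard error)."""
    (s_dd, s_dp), (_, s_pp) = sigma
    beta_hat = delta_hat / pi_hat
    se = math.sqrt(s_dd - 2 * beta_hat * s_dp + beta_hat ** 2 * s_pp) / abs(pi_hat)
    return abs((beta_hat - beta0) / se) > 1.96

def ar_rejects(delta_hat, pi_hat, beta0, sigma):
    """Nominal 5% Anderson-Rubin test of H0: beta = beta0."""
    (s_dd, s_dp), (_, s_pp) = sigma
    ar = (delta_hat - pi_hat * beta0) ** 2 / (s_dd - 2 * beta0 * s_dp + beta0 ** 2 * s_pp)
    return ar > 3.841

def two_step_rejects(delta_hat, pi_hat, beta0, sigma, cutoff=10.0):
    """Use the t-test when F >= cutoff, otherwise fall back to AR."""
    f_stat = pi_hat ** 2 / sigma[1][1]
    if f_stat >= cutoff:
        return t_rejects(delta_hat, pi_hat, beta0, sigma)
    return ar_rejects(delta_hat, pi_hat, beta0, sigma)

def two_step_size(beta, pi, sigma, n_draws, seed=0):
    """Simulated rejection rate of the two-step test at the true beta."""
    (s_dd, s_dp), (_, s_pp) = sigma
    l11 = math.sqrt(s_dd)
    l21 = s_dp / l11
    l22 = math.sqrt(s_pp - l21 ** 2)
    rng = random.Random(seed)
    rejected = 0
    for _ in range(n_draws):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        delta_hat = pi * beta + l11 * z1
        pi_hat = pi + l21 * z1 + l22 * z2
        rejected += two_step_rejects(delta_hat, pi_hat, beta, sigma)
    return rejected / n_draws

# Illustrative design with E[F] = 10, where the cutoff binds about half the time.
sigma = ((1.0, 0.9), (0.9, 1.0))
size_two_step = two_step_size(0.0, 3.0, sigma, 20000)
print(size_two_step)
```

The error correlation of 0.9 used here is deliberately high, so the size in this sketch need not match what the AER-calibrated simulations show.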
Two-Step Confidence Sets
Figure: Size of 5% two-step test with cutoff 10 (vertical axis, 0 to 0.5) against average effective F-statistic (horizontal axis, 0 to 50)
Two-Step Confidence Sets
Observe some size distortions for specifications with $E[F^E] \approx 10$
  True size never above 10% in these simulations
  Results similar if instead use Montiel Olea and Pflueger (2013) critical values
Also not a theorem!
Deciding to use a robust confidence set based on $F^E$ isn’t theoretically guaranteed to work, but cost appears small in our AER specifications
Summary of Simulation Results
1. Weak instruments appear to be a problem in some published specifications
2. Bad behavior largely limited to specifications with $E[F^E] < 10$
3. Screening on $F^E$ can amplify problems
4. Robust confidence sets eliminate size distortions
5. Choosing whether to report robust CS based on $F^E$ introduces some distortions, but small
Questions from Simulations: Performance of F-Statistics
Simulation results suggest some questions
1. Theoretical justification for $F^E$ in Montiel Olea and Pflueger (2013) only concerns bias. Appears to also diagnose size problems reasonably well. Can this be formalized?
2. Two-step confidence sets based on $F^E$ work reasonably well in simulations. Can this be formalized?
Questions from Simulations: Normal Approximation
Our simulations take $(\hat\delta, \hat\pi)$ to be normally distributed with known variance
  Focuses attention solely on distortions from weak instruments
Results from Young (2018) suggest this may be problematic
  Based on sample of papers from AEA journals (larger than, but overlaps with, our AER sample)
Young (2018) finds that
1. A small number of observations have a large influence on estimates and p-values
2. Variance estimates $\hat\Sigma$ often extremely noisy in simulation
3. As a result of noisy estimates $\hat\Sigma$, AR tests can have large size distortions in over-identified settings
Further exploration of interaction between weak instruments and issues discussed by Young (2018) of considerable interest
Other Research: Subvector Inference
Some applications have more than one endogenous regressor
  19 out of 230 specifications in AER sample
Most tests previously discussed extend to tests of $\dim(X) \times 1$ vector $\beta$ in settings with multiple endogenous variables
  Imply joint confidence sets for full vector
  Joint confidence sets rarely reported in strong-instrument settings. Instead, usually report e.g. estimates and standard errors for each element of $\beta$ separately
Write $\beta = (\beta_1, \beta_2)$, and want confidence set for $\beta_1$ alone
If assume instruments strong for $\beta_2$, simple solution
  “Strong for $\beta_2$” meaning strong if treat $\beta_1$ as known
  Plug appropriate estimate $\hat\beta_2(\beta_1)$ into robust test statistics (see e.g. Stock and Wright (2000))
If instruments weak for $\beta_2$, hard problem
Other Research: Subvector Inference
One option: projection method
  Form joint confidence set for $(\beta_1, \beta_2)$, and collect implied set of values for $\beta_1$
  Can have very low power
Several recent papers seeking to improve power of projection method
  Smaller critical values for AR statistic in homoskedastic case: Guggenberger et al. (2012)
  Modified projection approach to improve power in well-identified case: Chaudhuri and Zivot (2011), D. Andrews (2017)
Active area of research
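A minimal sketch of the basic projection method (not the refinements cited above), under a strong simplifying assumption made only for illustration: two endogenous regressors, two instruments, and all reduced-form and first-stage estimates independent with common known variance `s2`. Under that assumption the joint AR statistic is $\|\hat\delta - \hat\Pi b\|^2 / (s^2(1 + b_1^2 + b_2^2))$, which is $\chi^2_2$ at the true $\beta$. All names, grids, and numbers are hypothetical.

```python
CHI2_2_CRIT = 5.991  # 95% critical value of chi-squared with 2 degrees of freedom

def ar_joint(b1, b2, delta_hat, pi_hat, s2):
    """Joint AR statistic for H0: (beta1, beta2) = (b1, b2).

    Assumes independent estimates with common variance s2 (illustrative only).
    """
    g1 = delta_hat[0] - pi_hat[0][0] * b1 - pi_hat[0][1] * b2
    g2 = delta_hat[1] - pi_hat[1][0] * b1 - pi_hat[1][1] * b2
    return (g1 ** 2 + g2 ** 2) / (s2 * (1.0 + b1 ** 2 + b2 ** 2))

def projection_set_b1(delta_hat, pi_hat, s2, grid):
    """Grid approximation to the projection confidence set for beta1.

    Keeps b1 if some b2 on the grid puts (b1, b2) in the joint AR confidence set.
    The joint chi-squared(2) critical value is used, as in the projection method.
    """
    kept = []
    for b1 in grid:
        best = min(ar_joint(b1, b2, delta_hat, pi_hat, s2) for b2 in grid)
        if best <= CHI2_2_CRIT:
            kept.append(b1)
    return kept

# Hypothetical estimates constructed so that delta_hat = pi_hat @ (1.0, -0.5) exactly.
pi_hat = ((3.0, 1.0), (1.0, 2.0))
delta_hat = (2.5, 0.0)
grid = [i / 10 for i in range(-30, 31)]   # beta values from -3.0 to 3.0
cs_b1 = projection_set_b1(delta_hat, pi_hat, s2=0.05, grid=grid)
print(min(cs_b1), max(cs_b1))
```

Because the search over $\beta_2$ is restricted to a finite grid, the result is only an approximation to the projected set; a wider or finer grid trades run time for accuracy.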
Subvector Inference: Implementation
Stata package weakiv can
  Compute joint confidence sets for $\beta$
  Plug in estimates for strongly identified $\beta_2$
  Implement projection method
Package twostepweakiv (also on SSC/Github) implements refined projection based on Chaudhuri and Zivot (2011)
  Stata Journal article: Sun (Forthcoming)
  Nearly-efficient inference on $\beta_1$ under strong identification
  Also implements two-step CS with guaranteed coverage
Other Research: Nonlinear Models
All the results discussed for IV apply directly to linear GMM
  GMM moments linear in parameters
Many (though not all) generalize to nonlinear GMM
No known analog of first-stage F-statistic
  Alternative approach for detecting weak identification: I. Andrews (2018)
Many procedures for robust inference, e.g. Stock and Wright (2000), Kleibergen (2005), D. Andrews and Guggenberger (2015), I. Andrews and Mikusheva (2016)
The End
Thank you!
References
Andrews D. 2017. Identification-robust subvector inference. Unpublished Manuscript
Andrews D, Guggenberger P. 2015. Identification- and singularity-robust inference for moment condition models. Unpublished Manuscript
Andrews I. 2018. Valid two-step identification-robust confidence sets for GMM. Review of Economics and Statistics 100:337–348
Andrews I, Mikusheva A. 2016. Conditional inference with a functional nuisance parameter. Econometrica 84:1571–1612
Chaudhuri S, Zivot E. 2011. A new method of projection-based inference in GMM with weakly identified nuisance parameters. Journal of Econometrics 164:239–251
Guggenberger P, Kleibergen F, Mavroeidis S, Chen L. 2012. On the asymptotic sizes of subset Anderson–Rubin and Lagrange multiplier tests in linear instrumental variables regression. Econometrica 80:2649–2666
Kleibergen F. 2005. Testing parameters in GMM without assuming they are identified. Econometrica 73:1103–1123
Montiel Olea J, Pflueger C. 2013. A robust test for weak instruments. Journal of Business and Economic Statistics 31:358–369
Sun L. Forthcoming. Implementing valid two-step identification-robust confidence sets for linear instrumental-variables models. Stata Journal
Stock J, Wright J. 2000. GMM with weak identification. Econometrica 68:1055–1096
Young A. 2018. Consistency without inference: Instrumental variables in practical application. Unpublished Manuscript