Weak Instruments and What To Do About Them · 1-2 Overview and Summary Topic: IV regression with a...

1-1

NBER Summer Institute 2018

Methods Lectures

Weak Instruments and What To Do About Them

Isaiah Andrews, Harvard University

James H. Stock, Harvard University

July 22, 2018 (updated July 25, 2018)

3-4:20pm 1. Weak instruments in the wild

2. Detecting weak instruments

Stock

Stock

4:20-4:40pm Break

4:40-6pm 3. Inference with weak instruments

4. Open issues and recent research

Andrews

Andrews

1-2

Overview and Summary

Topic: IV regression with a single included endogenous regressor, control

variables, and non-homoskedastic errors.

This covers heteroskedasticity, HAC, cluster, etc.

We assume that consistent robust SEs exist for the reduced form & first stage

regressions.

Early literature (through ~2006): homoskedastic case

This mini-course focuses on weak instruments in the non-homoskedastic

case (i.e., the relevant case).

Outline

1) So what?

2) Detecting weak instruments

3) Estimation (brief)

4) Weak instrument-robust inference about parameter of interest (β)

5) Extensions

1-3

So what? (1) Theory

An instrumental variable is weak if its correlation with the included

endogenous regressor is small.

1. “small” depends on the inference problem at hand, and on the sample size

With weak instruments, TSLS is biased towards OLS, and TSLS tests have

the wrong size.

Distribution of the TSLS

t-statistic (Nelson-Startz

(1990a,b))

Dark line = irrelevant

instruments

dashed light line = strong

instruments

intermediate cases = weak

instruments

1-4

So what? (2) Simulation

DGP: 8 AER papers 2014-2018 (Sample: 17 that use IV; 16 with a single X; 8 in simulation sample)

Median of TSLS t-statistic under the null

1-5

So what? (3) Practice (the “in the wild” bit)

Histogram of first-stage Fs in AER papers (108 specifications), 2014-2018

The first-stage F tests the

hypothesis that the first-stage

coefficients are zero.

Of the 17 papers, all but 1 report

first-stage Fs for at least one

specification; the histogram is of

the 108 specifications that report

a first-stage F (72 of which are

<50 and are in the plot).

Great that

authors/editors/referees are

aware of the potential

importance of weak

instruments, as evidence by

nearly all papers reporting first stages Fs.

The spike at F = 10 is “interesting”

1-6

Detecting Weak Instruments

It is convenient to have a way to decide if instruments are strong (TSLS “works”)

or weak (use weak-instrument robust methods).

The standard method is “the” first-stage F. Candidates:

FN – nonrobust

FR – robust (HR, HAC, cluster), also called Kleibergen-Paap (2006)

FE – Effective first-stage F statistic of Montiel Olea and Plueger (2013)

Actually there are other candidates too, not used and not to be discussed here

including Hahn-Hausman (2002), Shea’s (1997) partial R2

1-7

Detecting weak instruments in practice

Reported first-stage F’s: what authors say they use

Candidates: FN – nonrobust



1-8

Detecting weak instruments in practice, ctd

Actual first-stage F’s: what authors actually use

Candidates: FN – nonrobust



1-9

Our recommendations (1 included endogenous regressor)

Do:

o Use the Montiel Olea-Pflueger (2013) effective first-stage F statistic

FEff = FN × correction factor for non-homoskedasticity

o Report FEff

o Compare FEff to MOP critical values (weakivtest.ado), or to 10.

o If FEff ≥ MOP critical value, or ≥ 10 for rule-of-thumb method, use TSLS

inference; else use weak-instrument robust inference.

Don’t

o use/report p-values of test of π = 0 (null of irrelevant instruments)

o use/report nonrobust first stage F (FN)

o use/report usual robust first-stage F (except OK for k = 1 where FR = FEff)

o use/report Kleibergen-Paap (2006) statistic (same thing).

o compare HR/HAC/Kleibergen-Paap to Stock-Yogo critical values

o reject a paper because FEff < 10!

Instead, tell the authors to use weak-IV robust inference.

1-10

Notation and Review of IV Regression

IV regression model with a single endogenous regressor and k instruments

1i i i iY X W (Structural equation) (1)

2i i i iX Z W V (First stage) (2)

where W includes the constant. Substitute (2) into (1):

3i i i iY Z W U (Reduced form) (3)

where δ = πβ and i i iU V .

OLS is in general inconsistent: 2

ˆ pOLS X

X

.

β can be estimated by IV using the k instruments Z.

By Frisch-Waugh, you can eliminate W by regressing Y, X, Z against W and

using the residuals. This applies to everything we cover in the linear model

so we drop W henceforth.

1-11

Setup: i i iY X (Structural equation) (1)

i i iX Z V (First stage) (2)

, i i iY Z U , ε = U – βV. (Reduced form) (3) __________________________________________________________________________________________________________________________________________________________________________________

The two conditions for instrument validity

(i) Relevance: cov(Z,X) ≠ 0 or π ≠ 0 (general k)

(ii) Exogeneity: cov(Z,ε) = 0

The IV estimator when k = 1 (Wright 1926)

cov( , ) cov( , ) cov( , ) cov( , )

cov( , ) by (i)

Z Y Z X Z X Z

Z X

so

cov( , )

by (ii)cov( , )

Z Y

Z X

IV estimator:

1

1

1

1

ˆˆ

ˆ

n

i iIV i

n

i ii

n Z Y

n Z X

1-12

Setup: i i iY X (Structural equation) (1)

i i iX Z V (First stage) (2)

, i i iY Z U , ε = U – βV. (Reduced form) (3) __________________________________________________________________________________________________________________________________________________________________________________

k > 1: Two stage least squares (TSLS) 1

1

1 2

1

1

1

ˆˆ ˆ, where predicted value from first stage

ˆ

ˆ ˆˆ ˆ, where ˆˆ ˆ

n

i iTSLS iin

ii

nZZZZ i ii

ZZ

n X YX

n X

QQ n Z Z

Q

-1

-1

X Z(Z Z) Z Y

X Z(Z Z) Z X

The weak instruments problem is a “divide by zero” problem

cov(Z,X) is nearly zero; or π is nearly zero; or

ˆˆ ˆZZQ is noisy

Weak IV is a subset of weak identification (Stock-Wright 2000, Nelson-

Starts 2006, Andrews-Cheng 2012)

1-13

Statistics for measuring instrument strength

Non-robust: 2

ˆˆ ˆ

ˆ

N ZZ

V

QF n

k

Robust: 1ˆˆ ˆRF

k

MOP Effective F:

2

1/2 1/2 1/2 1/2

ˆ ˆˆ ˆ

ˆ ˆˆ ˆ ˆ ˆ

Eff NVZZ

ZZ ZZ

kQF F

tr Q tr Q

compare to TSLS: ˆ ˆˆˆˆˆ ˆ

TSLS ZZ

ZZ

Q

Q

Intuition

FN measures the right thing ( ZZQ ), but gets the SEs wrong

FR measures the wrong thing (1

), but gets the SEs right

FEff measures the right thing and gets SEs right “on average”

1-14

Distributional assumptions

Setup: i i iX Z V (First stage) (2)

, i i iY Z U , ε = U – βV. (Reduced form) (3)

CLT:

*

ˆ

0, ,ˆ

dn

N

n

Σ* is HR/HAC/Cluster (henceforth, “HR”)

(i) CLT limit holds exactly: 1 *ˆ

, , where ˆ

N n

(ii) Reduced form variance & moment matrices are all known: Σ, QZZ

A lot is going on here!

HR/HAC/cluster variance estimators are consistent

1950s-1970s finite-sample normal (fixed Z’s) literature

1-15

A lot is going on here, ctd

From

*

ˆ

0,ˆ

dn

N

n

to 1 *ˆ

, , where ˆ

N n

Weak IV asymptotics (Staiger-Stock 1997): /C n .

*

1 1

* 1

* 1 2

;

ˆ ˆˆ ˆ ˆ ˆ/

ˆˆ ˆ

ˆˆ ˆ

R

d

k C C

kF n n n

n n n n

n C n C

Limit experiment interpretation (Hirano-Porter 2015)

Uniformity (D. Andrews-Cheng 2012)

1-16

Homework problem

Let k = 2 and 2ˆ

ZZQ I . Suppose 2 2

2 2

0/

0

U UV

UV V

n

.

1) Show that:

a) 1/2 1/2 2 2 2( ) /ZZ Vtr Q n .

b) 2 22 2

1 ,1 2 ,2

1

2

NF z z

c) 1

2

RF z z

d)

2 22 2

1 ,1 2 ,2

2 2

Effz z

F

2) Adopt the weak instrument nesting π = n-1/2C, where C1, C2 ≠ 0. Show that as 2 :

a) “bias” of 2ˆ ˆ= plimTSLS OLS

V V

b) pNF c) pRF

d) 2

1

dEffF

3) Discuss

1-17

Work out the details for k = 1 first.

Preliminaries:

(a) Use distributional assumption (i)

1 *ˆ

, , where ˆ

N n

to write,

ˆ

ˆ

, where 0,N

(b) Connect to the structural regression:

ˆ ˆ

, where

-1(Z Z) Z ε

(c) Standardize:

1/2

1/2

ˆ ~ z

z

, where 1/2

and 1

~ 0,1

zN

z

(d) Project & orthogonalize:

z z , where 2~ (0,1 ), vN z ,

1-18

What parameter governs departures from usual asymptotics (k = 1)?

1/2

1/2

ˆˆ

ˆ

ˆˆ ˆ( )ˆ add and subtract

ˆ

use representations in (a) and (b)

standardize using representation in (c)

IV

z

z

z

z z

using projection (d)

“bias” “noise”

Parameter measuring instrument strength (k = 1) is 2 2

1-19

“Bias” part of IV representation

ˆ IV z

z

, where 1/2

Instrument strength depends on λ2

Strong instruments: 2 , usual asymptotic distribution

Irrelevant instruments: π = 0 so λ = 0: 1/2

1/2ˆ IV

z

~ Cauchy centered at

o In homoskedastic case, 2ˆplim OLSV

V

In the homoskedastic case, λ2 = the concentration parameter (old Edgeworth

expansion/finite sample distribution literature)

1-20

Instrument strength, k = 1, ctd.

How big does λ need to be? A “bias” heuristic:

2

2 2

ˆ

/

1 /

11 ...

IVE zE

z

zE

z

z z zE E

For bias, relative to unidentified case, to be <0.1, need λ2 > 10.

But we don’t know λ! So, we need a statistic with a distribution that depends

on λ, which we can use to back out an estimate/test/rule of thumb.

This is the Nagar (1959) expansion for the bias

How do the three candidate first-stage Fs fare?

1-21

Distributions of the three first-stage Fs, k = 1

First note that, when k = 1, FR = FEff:

2

1/2 1/2

ˆˆ ˆ ˆ

ˆˆˆ ˆ

Eff RZZ

ZZ

QF F

tr Q

Distributions

2

22 2

1;

ˆ, ( )

ˆEff R

vF F z

*22

2 22

ˆˆ ˆ ˆ( )

ˆˆ ˆ

N ZZ

V V ZZV ZZ

QF n z

QnQ

Implications

FR, FEff can be used for inference about λ2 when k = 1

Estimation: 2 2 1eff

VEF E z , so 2ˆ 1EffF

Testing: H0: “bias” ≤ 0.1. Reject H0 if FEff > critical value.

Rule of thumb: Feff < 10 will detect weak IVs with probability that increases

as λ2 gets smaller

1-22

Implications, ctd. *

2

2( )N

V

V ZZ

F zQ

2

,Eff R

VF F z

FN is misleading in the HR case.

Suppose *

is large (i.e., first stage HR SEs are a lot bigger than NR SEs)

2

* *2 2

2 2 1;( )N

V

V ZZ V ZZ

F zQ Q

where λ2 = π2/Σππ. For *

large, 2 0 , and *

2

12~N

V ZZ

FQ

i.e., Instruments are in the limit irrelevant – but NF .

In the k = 1 case, FR = FEff. These differ in the k > 1 case, where FEff is preferred.

1-23

Homework problem

Let k = 2 and 2ˆ

ZZQ I . Suppose 2 2

2 2

0/

0

U UV

UV V

n

.

1) Show that:

a) 1/2 1/2 2 2 2( ) /ZZ Vtr Q n .

b) 2 22 2

1 ,1 2 ,2

1

2

NF z z

c) 1

2

RF z z

d)

2 22 2

1 ,1 2 ,2

2 2

Effz z

F


a) “bias” of 2ˆ ˆ~ = plimTSLS OLS

V V

b) pNF c) pRF

d) 2

1

dEffF

3) Discuss

1-24

Homework problem solution

Let k = 2 and 2ˆ

ZZQ I . Suppose 2 2

2 2

0/

0

U UV

UV V

n

and π1, π2 ≠ 0

1(a) Direct calculation: 1/2 1/2 2 2 2( ) /ZZ Vtr Q n

1(b)-(d): We have already done the work to get the expressions below following

“~”, and the final expressions come from substitution of QZZ and Σ:

1/2 1/2

2 2

2 22 2

1 ,1 2 ,2

ˆˆˆ ˆ(b)

1

2

zzN zz

V V

z n Q zn QF

k k

z z

1ˆˆ ˆ 1(c)

2

V VR z zF z z

k k

1/2 1/2

1/2 1/2 1/2 1/2

2 22 2

1 ,1 2 ,2

2 2

ˆˆˆ ˆ(d)

ˆˆ ˆ

zzEff ZZ

ZZ ZZ

z Q zQF

tr Q tr Q

z z

1-25

Homework problem solution, ctd.


a) “bias” of 2ˆ ˆ~ plimTSLS OLS

V V

Last part first: 2 2ˆplim OLS

X X V V because π = n1/2C.

Next obtain the expression (several tedious steps),

“Bias” part ( )ˆ

( ) ( )

TSLS z HRz

z H z

where 2

1/2 1/2 2

2

0/

0ZZ VH Q n

and

1/2 1/2

22

V

V

R I

.

For the weak instrument nesting, /12

2

1/2 2

2

11

1/2 1

2

0/

0

/0/

/0

V

V

V

V

n

Cn

C

1-26


Now substitute these expressions for λ, H, and R into the “bias” part:

2

2

22

2

1 1

1 ,1 ,1 2 ,2 ,2

2 1 2 2

1 ,1 2 ,2

1

2

0( )

0ˆ0

( ) ( )0

( / ) ( / )

( / ) ( / )

1 ( )

TSLS V

V

V V V

V V V

Vp

V

z z

z z

C z z C z z

C z C z

O

1-27


Remaining parts by substitution and taking limits:

2 22 2

1 ,1 2 ,2

22 1 2 2

1 ,1 2 ,2 1

1(b)

2

1 1/ / ( )

2 2

N

V V p

F z z

C z C z O

2 21

1 ,1 2 ,2

222

2

1(c)

2

1/ /

2

1( )

2

R

V V

p

V

F z z

C z C z

CO

2 2

,1 2 1 2

,1 12 2 2 2

( )(d) ( ) ~

NpEff

p

z OFF z O

3) Discuss

1-28

OK, FEff – but what cutoff?

1/2 1/2

1/2 1/2, where

( )

Eff ZZ

ZZ

QF z H z H

tr Q

~ weighted average of noncentral χ2’s – depends on full matrix H,

0 ≤ eigenvalues (H) ≤ 1

Hierarchy of options

1. Testing approach: test null of H ≥ some threshold (e.g. 10% bias)

a) (MOP Monte Carlo method) Given H , compute cutoff H ; critical value

by simulation

b) (MOP Paitnik-Nagar method) Approximate weighted average of noncentral

χ2’s by noncentral χ2; compute cutoff value of H using Nagar

approximation to the bias, with some maximal allowable bias. Implemented in

weakivtest.ado.

c) (MOP simple method) Pick a maximal allowable bias (or size distortion) and

use their “simple” critical values (based on noncentral χ2 bounding

distribution). These are simple, but conservative.

2. Consistent sequence approach: “Weak” if FEff < n , n (but what is n ?)

3. Rule-of-thumb approach: “Weak” if FEff < 10

1-29

k=1 case, additional comments about FEff and FR

ˆ IV z

z

, where 1/2

0

1/22

ˆ

ˆ( )1 2

IVIV

IV

zt

SEz z

z z

, where

1/2

R EffF F z z

By maximizing over ρ you can find worst case size distortion for usual IV t-

stat testing β0. This depends on λ, which can be estimated from FR = FEff.

These are the same expressions, with different definition of λ, as in

homoskedastic case (special to k = 1)

Critical values for k = 1 – two choices:

Nagar bias ≤ 10%: 23 (5% critical value from 2

2

1; 0

) (MOP)

Maximum tIV size distortion of 0.10: 16.4; of 0.15: 9.0

But with k = 1 there are fully robust methods that are easy and have very

strong theoretical properties (AR) (Lecture 3).

1-30

Detecting weak instruments with multiple included endogenous regressors

Methods are based on multivariate F: Cragg-Donald statistic and robust variants

Nonrobust:

o Minimum eigenvalue of Cragg-Donald statistic, Stock-Yogo (2005)

critical values

o Sanderson-Windmeijer (2016)

HR: Main method used is Kleibergen-Paap statistic, which is HR Cragg-

Donald.

o But recall that this doesn’t work (theory) for 1 X, and having multiple X’s

doesn’t improve things.

MOP Effective F: Hasn’t been developed.

More work is needed….

1-31

What if you plan to use efficient 2-step GMM, not TSLS?

Everything above is tailored to TSLS!

Suppose that, if you have strong instruments, you use efficient 2-step GMM: 1

1

ˆˆˆˆˆˆ ˆ

GMM

, where

2(1)

1

1ˆ ˆn

i i i

i

Z Zn

where (1)

i is the residual from a first-stage estimate of β, e.g. TSLS.

Things get complicated because the first step (TSLS) isn’t consistent with

weak instruments.

o ˆ converges in distribution to a random limit

o If were known (infeasible),

1/2 1/2

1/2 1 1/2

ˆGMM z z

z z

In general none of the F’s discussed so far get at the right object, 1/2 1 1/2 1/2 1 1/2/ ( )tr . (And this is “right” only if Σεε is known.)

1-32

OK – now what should you do if you have weak instruments?

Wrong answer: reject the paper.

Size distortion from screening based on first stage F

Isaiah’s will discuss further…

1-33

Estimation – What have we learned/state of knowledge

k = 1

There does not exist an unbiased or asymptotically-unbiased IV estimator

(folk theorem; Hirano and Porter 2015).

Only one moment condition, so weighting (HR) isn’t an issue

LIML=TSLS=IV doesn’t have moments…

Fuller seems to have advantage over IV in terms of “bias” (location) in

simulations (e.g., Hahn, Hausman, Kuersteiner (2004), I. Andrews and

Armstrong 2017) (so should k-class).

If you know a-priori the sign of π, then unbiased, strong-instrument efficient

estimation is possible (I. Andrews and Armstrong 2017)

1-34

Estimation – What have we learned/state of knowledge, ctd.

k > 1

There does not exist an unbiased or asymptotically-unbiased IV estimator

(folk theorem; Hirano and Porter 2015).

The IV estimators that were developed in the 60s-90s (LIML, k-class, double

k-class, JIVE, Fuller) are special to the homoskedastic case, and in general

lose their good properties in the HR case

Different IV estimators place different weights on the moments, and thus in

general have different LATEs

With heterogeneity, the LIML estimand (Fuller too?) can be outside the

convex hull of the LATEs of the individual instruments (Kolesár 2013)

For GMM applications estimating a structural parameter (e.g. New

Keynesian Phillips Curve, etc.), the LATE concerns don’t apply, however

when the moment conditions are nonlinear in θ, things get difficult.

If you know a-priori the sign of π, then unbiased estimation is possible (I.

Andrews and Armstrong 2017)

1-35

References

Andrews, D.W.K. and X. Cheng (2012). “Estimation and Inference with Weak, Semi-Strong, and Strong

Identification,” Econometrica 80, 2153-2211.

Andrews, I. and T. Armstrong (2017). “Unbiased Instrumental Variables Estimation under Known First-Stage

Sign,” Quantiative Economics 8, 479-503.

Hahn, J., J. Hausman, and G. Kuersteiner (2004), “Estimation with weak instruments: Accuracy of higher order

bias and MSE approximations,” Econometrics Journal, 7, 272–306.

Kleibergen, F., and R. Paap (2006). “Generalized Reduced Rank Tests using the Singular Value

Decomposition.” Journal of Econometrics 133: 97–126.

Montiel Olea, J.L. and C.E. Pflueger (2013). “A Robust Test for Weak Instruments,” Journal of Business and

Economic Statistics 31, 358-369.

Nagar, A. L. (1959). “The bias and moment matrix of the general k-class estimators of the parameters in

simultaneous equations,” Econometrica 27: 575–595.

Nelson, C. R., and R. Startz (1990). “Some further results on the exact small sample properties of the

instrumental variable estimator.” Econometrica 58, 967–976.

Nelson, C. R., and R. Startz (2006). “The zero-information-limit condition and spurious inference in weakly

identified models,” Journal of Econometrics 138, 47-62.

Pflueger, C.E. and S. Wang (2015). “A Robust Test for Weak Instruments in Stata,” Stata Journal 15, 216-225.

Sanderson, E. and F. Windmeijer (2016). “A Weak Instrument F-test in Linear IV Models with Multiple

Endogenous Variables,” Journal of Econometrics 190, 212-221.

Staiger, D., and J. H. Stock. 1997. Instrumental variables regression with weak instruments. Econometrica 65:

557–86.

1-36

Stock, J.H. and M. Yogo (2005). “Testing for Weak Instruments in Linear IV Regression,” Ch. 5 in J.H. Stock

and D.W.K. Andrews (eds), Identification and Inference for Econometric Models: Essays in Honor of

Thomas J. Rothenberg, Cambridge University Press, 80-108.

3: Robust Inference with Weak Instruments

July 22, 2018

The Story So Far...

Conventional t-test-based confidence intervals can under-covertrue parameter value when instruments are weakEffective First-stage F-statistic provides a guide to bias

But screening applications on F-statistics can induce sizedistortions

This section: identification-robust confidence setsEnsure correct coverage regardless of instrument strengthNo need to screen on first stage

Avoids pretesting biasAvoids throwing away applications with valid instruments justbecause weakConfidence sets can be informative even with weakinstruments

Reminder: Normal Model

To discuss these issues, continue to consider the normal model(δπ

)∼ N

((δπ

),Σ

)where

δ is the reduced-form OLS coefficientπ is the first-stage OLS coefficientΣ is known

IV model implies δ = πβ

Negative ResultInitial Question: can we obtain correct coverage by adjustingour standard errors?

Confidence interval[β ± b

(δ, π)]

for some b (·, ·)

Answer: no (unless b(δ, π)can be infinite)

Gleser and Hwang (1989) and Dufour (1997) show that forany robust confidence set CS with coverage 1− α,

Prβ,π β ∈ CS ≥ 1− α for all β, π,

we must have

Prβ,π CS has infinite length > 0 for all β, π

Inutition: in case with π = 0, must cover every value β withprobability 1− α

Adjusting our (finite) standard errors isn’t enough: needalternative approach

Test Inversion

Leading alternative: test inversionIdea: Define a family of tests φ (·) where

φ (β0) test for H0 : β = β0φ (β0) = 1 if reject H0, 0 otherwise

Suppose φ (β0) has size α for all β0, i.e.

Eβ0,π [φ (β0)] ≤ α for all β0, π

If we form CS by collecting the non-rejected values

CS = β : φ (β) = 0

then CS has coverage 1− αCalled test inversion

Hence, to form an identification-robust confidence set, we onlyneed to form identification-robust tests of H0 : β = β0

Restriction Implied By IV Model

To implement test inversion, need to find a testTo construct robust test, use restrictions that hold regardlessof instrument strength

IV model implies that δ − πβ = 0

Under H0 : β = β0,

δ − πβ0 ∼ N (0,Ω (β0))

forΩ(β0) = Σδδ − β0(Σδπ + Σπδ) + β2

0Σππ

Holds regardless of instrument strength

AR Statistic

Building on this observation, can introduce AR statistic

AR (β0) =(δ − πβ0

)′Ω (β0)−1

(δ − πβ0

)Originally introduced by Anderson and Rubin (1949) forhomoskedastic normal caseHere, generalization to non-homoskedastic case

Under H0 : β = β0, AR (β0) ∼ χ2k for all π

Recall k = dim (Zi )

AR test φAR (β0) = 1AR (β0) > χ2

k,1−α

χ2k,1−α the 1− α quantile of a χ2

k distribution

AR Confidence set CSAR =β : AR (β) ≤ χ2

k,1−α

AR test and CS fully robust to weak instruments

The Form of AR Confidence Sets

CSAR can behave in counterintuitive waysIn just-identified setting (k = dim (X ) = 1) can take form of:

bounded interval: CSAR = [a, b]real line: CSAR = (−∞,∞)real line excluding bounded interval: CSAR = (−∞, a] ∪ [b,∞)

In over-identified settings can also be empty, CSAR = ∅In overidentifed non-homoskedastic settings, can takeadditional forms


Infinite confidence sets strange-looking...but have natural explanation

Unboundedness consistent with Gleser and Hwang (1989),Dufour (1997)Moreover, can show that

limβ0→±∞

AR (β0) = π′Σ−1ππ π = k · FR

Implies that CSAR unbounded if and only if first-stage F-testcannot reject π = 0 at level α

Unbounded AR confidence sets arise only when cannot rejectthat model totally unidentified


Empty confidence sets more awkwardArise from fact that AR tests H0 : δ = πβ0. Can bedecomposed into

Parameteric restriction β = β0Overidentifying restrictions δ ∝ π if k > 1

If k = 1, no overidentifying restrictions to test

Empty AR confidence sets can be interpreted as a rejection ofthe overidentifying restrictions

Unfortunate feature: how to interpret a small CS?Confidence sets non-empty with probability one injust-identified case

Optimality of AR in Just-Identified Models

In just-identified case with single endogenous regressor, AR isoptimal

101 out of 230 specifications in our AER sample arejust-identified with a single endogenous regressor

Moreira (2009) shows that AR test uniformly most powerfulunbiasedAR equivalent to two-sided t-test when instruments are strongIn just-identified settings, strong case for using AR CS

Optimal among CS robust to weak instrumentsNo loss of power relative to t-test if instruments strong

AR Tests in Applications

To examine practical impact of using CSAR , return to our AERsampleLimit attention to just-identified specifications with singleendogenous regressor where can estimate variance-covariancematrix of

(δ, π)

Yields 36 specificationsComparing 95% t and AR confidence sets, find infinite AR CSin two cases. In remaining cases:

AR confidence sets 56.5% longer on average in allspecifications20.3% longer on average in specifications that report F>100.04% longer on average in specifications that report F>50

AR Tests in Overidentified Models

Strong argument for using AR in just-identified settingsAR tests and CS perform worse in over-identified settings

As already noted, CSAR may be emptyAlso inefficient under strong instruments

Tests violations of H0 : β = β0, and of overidentifyingrestrictionsIf only care about parametric restrictions, “wastes” degrees offreedom

Improving Efficiency in Over-identified Settings

To obtain efficiency under strong instruments, need alternativetestsFor example, tests based on t-statistic

t (β0) =

∣∣∣β − β0

∣∣∣σβ

Problem: distribution of t (β0) under H0 : β = β0 depends onπ

Already know this: distribution of t-statistic depends oninstrument strengthSince π unknown, not clear what critical values to use witht (β0)

Conditional Critical Values

Moreira (2003): conditional critical valuesOriginally for homoskedastic case. Here discuss generalization

Idea: Find a sufficient statistic D (β0) for π under H0 : β = β0

Means conditional distribution of t (β0) |D (β0) doesn’t dependon π under H0Once condition on D (β0) , can compute data-dependentcritical values cα (D (β0))

Question: how to find D (β0)

Conditional Critical ValuesIdea for sufficient statistic: separate parts of

(δ, π)that

do/don’t depend on πDefine

g (β) = δ − πβLet

D(β) = π − (Σπδ − Σππβ) Ω(β)−1g(β),

denote π orthogonalized with respect to g (β)Under H0 : β = β0(

g(β0)D(β0)

)∼ N

((0π

),

(Ω (β0) 0

0 Ψ(β0)

))Conditional distribution of g(β0) given D (β0) doesn’t dependon π under H0 : β = β0

g (β0) |D (β0) ∼ N (0,Ω (β0))

(g(β0),D (β0)) one-to-one transformation of(δ, π)

⇒D (β0) is sufficient statistic for π


To construct conditional distribution of t (β0) |D (β0):1 Fix D (β0) at observed value2 Repeatedly draw g∗ (β0) ∼ N (0,Ω (β0))

3 Construct(δ∗, π∗

)from (g∗ (β0) ,D (β0))

4 Calculate t∗ (β0) based on(δ∗, π∗

)Conditional critical value cα (D (β0)): 1− α quantile of t∗ (β0)


Conditional t-test that rejects when t (β0) exceeds cα (D (β0))

φ (β0) = 1 t (β0) > cα (D (β0))

is fully robust to weak instruments,

Eβ0,π [φ (β0)] = α for all π

Conditioning not specific to t (β0), works for any statistics (β0)

In each case construct data-dependent critical valuecα (D (β0))Yields tests that control size

Question what statistic s (β0) to use

Alternative Test StatisticsMany possible choices of statistic s (β0)

t-statistic

t (β0) =

∣∣∣β − β0

∣∣∣σβ

Score statistic (Kleibergen 2002, 2005)

K (β0) = g (β0)′Ω (β0)−1 D (β0)×(D (β0)′Ω (β0)−1 D (β0)

)−1D (β0)′Ω (β0)−1 g (β0)

AR statistic

AR (β0) = g (β0)′Ω (β0)−1 g (β0)

LR statistic

LR (β0) = 2(maxβ,π

` (β, π)−maxπ` (β0, π)

)

Properties

Different test statistics imply different cα (D (β0))

For t and LR, need data-dependent critical valuesConditional distributions AR (β0) |D (β0) and K (β0) |D (β0)don’t depend on D (β0)

Can use χ2k and χ2

1 critical values, respectively

Conditional t, K, and LR tests efficient under stronginstruments

AR test inefficient in overidentified modelsUnder weak instruments, also yield different power properties

Conditional two-sided t-test poor powerK test sometimes poor powerConditional LR (CLR) test performs well (in homoskedasticcase)

See D. Andrews Moreira and Stock (2006), (2007) Plots

Near-Optimality of CLR Test

D. Andrews, Moreira, and Stock (2006) show that CLR testnear-optimal

In homoskedastic case with single endogenous regressor

Power close to upper bound for a natural class of tests over awide range of parameter valuesConsensus in literature that CLR is a good test forhomoskedastic settings

... but homoskedasticity assumption unappealing

Tests for Non-Homoskedastic ModelsVariety of CLR extensions for non-homoskedastic case

D. Andrews Moreira and Stock (2004), Kleibergen (2005), D.Andrews and Guggenberger (2015), I. Andrews (2016), I.Andrews and Mikusheva (2016)All efficient with strong instruments, but only simulationevidence on power with weak instruments

Alternative: tests proposed that maximize weighted averagepower

Maximize integral of power function with respect to someweightsMoreira and Moreira (2015), Montiel Olea (2017), Moreira andRidder (2018)Question: what are “right” weights?

Many options, but so far no consensus on what tests should beused in over-identified and non-homoskedastic models

In just-identified setting, use ARIn over-identified settings, use something that’s efficient understrong instruments

Two-Step Confidence Sets

Robust confidence sets not widely used in practiceWhen reported, usually only after authors find evidence ofweak instrumentsIn AER sample, reported in 2 papers. Minimal first-stage F of2.3 and 6.3, respectively

If only report robust confidence set when F small, can view asconstructing confidence set in two steps

If F ≥ 10, report t-statistic confidence setIf F < 10, report robust CS

Screening applications on first-stage F can generate very badbehavior. Does two-step CS do the same?

Positive results for FN in homoskedastic case based on Stockand Yogo (2005)Negative result for FN in non-homoskedastic case based onMontiel Olea and Pflueger (2013), for FR with conventionalcritical values based on I. Andrews (2018)

Negative results based on extreme forms ofnon-homoskedasticity: open question how bad in practice

Implementation

Implementations of some weak-IV tests are in Stata packageweakiv, available on SSC

Finlay, Magnusson, and SchafferVersions of CLR, AR, K, and other tests applicable tonon-homoskedastic modelsCan be used with fixed effects, clustered standard errors, etc.Stata Journal article on previous version of package: Finlayand Magnusson (2009)

Summary

A number of tests and confidence sets are available that arefully robust to weak instruments

Avoid pretesting bias, discarding applicationsMany efficient under strong instruments

In just-identified models, strong case for using AR CSCovers many applications

In over-identified models, less clearCLR if assume homoskedasticNo consensus for non-homoskedastic case...other than using something efficient under stronginstruments

ReferencesAnderson T, Rubin H. 1949. Estimators for the parameters ofa single equation in a complete set of stochastic equations.Annals of Mathematical Statistics 21:570–582Andrews D, Guggenberger P. 2015. Identification- andsingularity-robust inference for moment condition models.Unpublished ManuscriptAndrews D, Moreira M, Stock J. 2004. Optimal invariantsimilar tests of instrumental variables regression. UnpublishedManuscriptAndrews D, Moreira M, Stock J. 2006. Optimal two-sidedinvariant similar tests of instrumental variables regression.Econometrica 74:715–752Andrews D, Moreira M, Stock J. 2007. Performance ofconditional wald tests in iv regression with weak instruments .Econometrica 74:715–752Andrews I. 2016. Conditional linear combination tests forweakly identified models. Econometrica 84:2155–2182

ReferencesAndrews I, Mikusheva A. 2016. Conditional inference with afunctional nuisance parameter. Econometrica 84:1571–1612.Dufour J. 1997. Some impossibility theorems in econometricswith applications to structural and dynamic models.Econometrica 65:1365–1387Finlay K, Magnusson L. 2009. Implementing weak-instrumentrobust tests for a general class of instrumental-variablesmodels. Stata Journal 9, 398-421Geser L, Hwang J. 1987. The nonexistence of 100(1-α)%confidence sets of finite expected diameter inerrors-in-variables and related models. Journal of the AmericanStatistical Association 15:1341–1362Kleibergen F. 2002. Pivotal statistics for testing structuralparameters in instrumental variables regression. Econometrica70:1781–1803Kleibergen F. 2005. Testing parameters in gmm withoutassuming they are identified. Econometrica 73:1103–1123

References

Montiel Olea J. 2017. Admissible, similar tests: Acharacterization. Unpublished ManuscriptMontiel Olea J, Pflueger C. 2013. A robust test for weakinstruments. Journal of Business and Economic Statistics31:358–369Moreira H, Moreira M. 2015. Optimal two-sided tests forinstrumental variables regression with heteroskedastic andautocorrelated errors. Unpublished ManuscriptMoreira M. 2003. A conditional likelihood ratio test forstructural models. Econometrica 71:1027–1048Moreira M. 2009. Tests with correct size when instrumentscan be arbitrarily weak. Journal of Econometrics 152:131–140Moreira M, Ridder G. 2017. Optimal invariant tests in aninstrumental variables regression with heteroskedastic andautocorrelated errors. Unpublished Manuscript

Power Comparisons

Figure: Power of AR, K, and C LR tests in homoskedastic case (from D.Andrews, Moreira, and Stock (2006))

Power Comparisons

Figure: Power of Conditional t-and LR-tests in homokedastic (from D.Andrews, Moreira, and Stock (2007)) Return

4: Open Issues and Recent Research

July 22, 2018

Outline

Two goals for this section1 Examine practical importance of issues covered so far

Simulations calibrated to specifications published in AER2 Discuss other open questions and recent research on weak

instruments

AER Simulation Specifications

To assess practical importance of weak instrument issues,calibrate simulations to AER dataSpecifications from AER articles (excluding Papers andProceedings) from 2014-2018 that:

1 Published in main text2 Allow us to estimate variance matrix Σ of

(δ, π)

Mostly papers with replication dataIn one other case, back out Σ from published results

Yields 124 specifications from 8 papersAll specifications have a single endogenous variable

Simulation Design

To focus attention on weak instrument issues, simulations usenormal model (

δπ

)∼ N

((δπ

),Σ

)with Σ known, δ = πβ

Simulations fix β, π, and Σ at estimated valuesAbstracts away from:

Non-normality of δ, πEstimation error in ΣWill return to these later

Any disortions must arise from weak instruments

Distribution of t-Statistics

Theoretical results show t-tests can perform poorly wheninstruments weak

Distribution of t-statistics may not be centered at zeroRejection probability of 5% t-tests may be much larger

In each of our 124 AER specifications simulateMedian t-statistic

Med

(β − β0

σβ

)Size of 5% t-tests

Pr

∣∣∣β − β0

∣∣∣σβ

> 1.96

Plot against average effective first-stage F-stat

Little action for E[FE]> 50. Limit plot to E

[FE]≤ 50

Includes 106 of 124 specifications


0 5 10 15 20 25 30 35 40 45 50

Average Effective F-Statistic

-1.5

-1

-0.5

0

0.5

1

1.5

Media

n t-S

tatistic


0 5 10 15 20 25 30 35 40 45 50

Average Effective F-statistic

0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

0.45

0.5S

ize o

f 5%

t-T

est


Weak instrument issues apparent in some specificationsMedian t-statistic far from zeroSize of nominal 5% t-test much larger than 5%

Problems limited to specifications with a small averageeffective F-stat

No large distortions in specifications with E[F E]> 10

Population rule of thumb seems to work pretty wellNot a theorem!

Weak instrument issues appear relevant for some recently-publishedspecifications, but only in cases with E

[FE]small

Screening on F-Statistics

Given that average effective F-statistics seem to capture weakinstrument issues, tempting to screen applications on F

e.g. only pursue applications with F E > 10

Distribution of F-statistics in AER sample suggests may becommonAs already discussed, can introduce size distortionsExamine effect in our AER specifications

For each specification, calculate size of 5% t-test conditionalon F E > 10


0 5 10 15 20 25 30 35 40 45 50

Average Effective F-Statistic

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1S

ize o

f 5%

t-T

est, S

cre

ened o

n E

ffective F

>10


Screening leads to much larger size for some specificationsNot specific to F E , same issues appear for FN , FR

Not specific to threshold of 10: if move threshold, getdistortions in neighborhood of new cutoff

Screening on F-statistics can make published results less reliable

Robust Confidence Sets

Rather than screening applications on FE , can compute robustconfidence sets

Guaranteed to have correct coverage regardless of instrumentstrength

For illustration, here consider Anderson-Rubin (AR)Plot size of AR tests in AER specifications


0 5 10 15 20 25 30 35 40 45 50


0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

0.45

0.5S

ize o

f 5%

Anders

on-R

ubin

Test


AR size is flat at 5% regardless of instrument strengthAR also efficient in just-identified case

For over-identified models, variety of robust tests andconfidence sets available

Many ensure efficiency in strongly-identified caseAll again ensure correct size regardless of instrument strength

Robust confidence sets eliminate size distortions from weakinstruments


Robust confidence sets currently little-used in practiceWhen used, often because weak identification is suspected

When used in this way, can be viewed as alternative toscreening on F-statistic. For example

If F E ≥ 10, report t-statisticIf F E < 10, report AR

Alternatively, could use Montiel Olea and Pflueger (2013)critical valuesMay introduce size distortions, but not clear how large

Examine performance in our AER specifications


0 5 10 15 20 25 30 35 40 45 50


0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

0.45

0.5S

ize o

f 5%

Tw

o-S

tep T

est w

ith c

uto

ff 1

0


Observe some size distortions for specifications withE[FE]≈ 10

True size never above 10% in these simulationsResults similar if instead use Montiel Olea and Pflueger (2013)critical values

Also not a theorem!

Deciding to use a robust confidence set based on FE isn’ttheoretically guaranteed to work, but cost appears small in our AERspecifications

Summary of Simulation Results

1 Weak instruments appear to be a problem in some publishedspecifications

2 Bad behavior largely limited to specifications with E[FE]< 10

3 Screening on FE can amplify problems4 Robust confidence sets eliminate size distortions5 Choosing whether to report robust CS based on FE introduces

some distortions, but small

Questions from Simulations: Performance of F-Statistics

Simulation results suggest some questions1 Theoretical justification for FE in Montiel Olea and Pflueger

(2013) only concerns bias. Appears to also diagnose sizeproblems reasonably well. Can this be formalized?

2 Two-step confidence sets based on FE work reasonably well insimulations. Can this be formalized?

Questions from Simulations: Normal Approximation

Our simulations take(δ, π)to be normally distributed with

known varianceFocuses attention solely on distortions from weak instruments

Results from Young (2018) suggest may be problematicBased on sample of papers from AEA journals (larger than,but overlaps with, our AER sample)

Young (2018) finds that1 A small number of observations have a large influence on

estimates and p-values2 Variance estimates Σ often extremely noisy in simulation3 As a result of noisy estimates Σ, AR tests can have large size

distortions in over-identified settings

Further exploration of interaction between weak instrumentsand issues discussed by Young (2018) of considerable interest

Other Research: Subvector Inference

Some applications have more than one endogenous regressor19 out of 230 specifications in AER sample

Most tests previously discussed extend to tests of dim (X ) × 1vector β in settings with multiple endogenous variables

Imply joint confidence sets for full vectorJoint confidence sets rarely reported in strong-instrumentsettings. Instead, usually report e.g. estimates and standarderrors for each element of β separately

Write β = (β1, β2) , and want confidence set for β1 aloneIf assume instruments strong for β2, simple solution

“Strong for β2” meaning strong if treat β1 as knownPlug appropriate estimate β2 (β1) into robust test statistics(see e.g. Stock and Wright (2000))

If instruments weak for β2, hard problem

Other Research: Subvector Inference

One option projection methodForm joint confidence set for (β1, β2), and collect implied setof values for β1Can have very low power

Several recent papers seeking to improve power of projectionmethod

Smaller critical values for AR statistic in homoskedastic case:Guggenberger et al. (2012)Modified projection approach to improve power inwell-identified case: Chaudhuri and Zivot (2011), D. Andrews(2017)

Active area of research

Subvector Inference: Implementation

Stata package weakiv canCompute joint confidence sets for βPlug in estimates for strongly identified β2Implement projection method

Package twostepweakiv (also on SSC/Github) implementsrefined projection based on Chaudhuri and Zivot (2011)

Stata Journal article: Sun (Forthcoming)Nearly-efficient inference on β1 under strong identificationAlso implements two-step CS with guaranteed coverage

Other Research: Nonlinear Models

All the results discussed for IV apply directly to linear GMMGMM moments linear in parameters

Many (though not all) generalize to nonlinear GMMNo known analog of first-stage F-statistic

Alternative for approach detecting weak identification: I.Andrews (2018)

Many procedures for robust inference, e.g. Stock and Wright(2000), Kleibergen (2005), D. Andrews and Guggenberger(2015), I. Andrews and Mikusheva (2016)

The End

Thank you!

ReferencesAndrews D. 2017. Identification-robust subvector inference.Unpublished ManuscriptAndrews D, Guggenberger P. 2015. Identification- andsingularity-robust inference for moment condition models.Unpublished ManuscriptAndrews I. 2018. Valid two-step identification-robustconfidence sets for gmm. Review of Economics and Statistics100:337–348Andrews I, Mikusheva A. 2016. Conditional inference with afunctional nuisance parameter. Econometrica 84:1571–1612.Chaudhuri S, Zivot E. 2011. A new method ofprojection-based inference in gmm with weakly identifiednuisance parameters. Journal of Econometrics 164:239–251Guggenberger P, Kleibergen F, Mavroeidis S, Chen L. 2012.On the asymptotic sizes of subset anderson–rubin and lagrangemultiplier tests in linear instrumental variables regression.Econometrica 80:2649–2666

References

Kleibergen F. 2005. Testing parameters in gmm withoutassuming they are identified. Econometrica 73:1103–1123Montiel Olea J, Pflueger C. 2013. A robust test for weakinstruments. Journal of Business and Economic Statistics31:358–369Sun L. Forthcoming. Implementing valid two-stepidentification-robust confidence sets for linearinstrumental-variables models. Stata JournalStock J, Wright J. 2000. Gmm with weak identification.Econometrica 68:1055–1096Young A. 2018. Consistency without inference: Instrumentalvariables in practical application. Unpublished Manuscript

Weak Instruments and What To Do About Them · 1-2 Overview and Summary Topic: IV regression with a...

Documents

Transcript of Weak Instruments and What To Do About Them · 1-2 Overview and Summary Topic: IV regression with a...