
On Bootstrap Validity for Subset Anderson-Rubin Test in IV

Regressions

Firmin Doko Tchatoka Wenjie Wang

Working Paper No. 2015-01 January, 2015

Copyright the authors

School of Economics

Working Papers ISSN 2203-6024

On bootstrap validity for subset Anderson-Rubin test in IV

regressions

Firmin Doko Tchatoka∗

The University of Adelaide

Wenjie Wang†

Kyoto University

January 5, 2015

∗ School of Economics, The University of Adelaide, 10 Pulteney Street, Adelaide SA 5005, Tel: +618 8313 1174, Fax: +618 8223 1460, e-mail: [email protected]

†Graduate School of Economics, Kyoto University. e-mail: [email protected]

ABSTRACT

This paper sheds new light on subset hypothesis testing in linear structural models in which instrumental variables (IVs) can be arbitrarily weak. For the first time, we investigate the validity of the bootstrap for Anderson-Rubin (AR) type tests of hypotheses specified on a subset of structural parameters, with or without identification. Our investigation focuses on two subset AR type statistics based on the plug-in principle. The first one uses the restricted limited information maximum likelihood (LIML) as the plug-in method, and the second exploits the restricted two-stage least squares (2SLS). We provide an analysis of the limiting distributions of both the standard and proposed bootstrap AR statistics under the subset null hypothesis of interest. Our results provide some new insights and extensions of earlier studies. In all cases, we show that when identification is strong and the number of instruments is fixed, the bootstrap provides a high-order approximation of the null limiting distributions of both plug-in subset statistics. However, the bootstrap is inconsistent when instruments are weak. This contrasts with the bootstrap of the AR statistic of the null hypothesis specified on the full vector of structural parameters, which remains valid even when identification is weak; see Moreira et al. (2009). We present a Monte Carlo experiment that confirms our theoretical findings.

Key words: Subset AR-test; bootstrap validity; bootstrap inconsistency; weak instruments; Edgeworth expansion; subsampling.

JEL classification: C12; C13; C36.


1. Introduction

The literature on weak instruments is widespread,1 and most studies have focused on testing hypotheses specified on the full set of "structural parameters". There is now a growing interest in inference procedures for subset hypotheses; for example, see Stock and Wright (2000), Dufour and Jasiak (2001), Kleibergen (2004, 2008), Dufour and Taamouti (2005, 2007), and Startz, Nelson and Zivot (2006). The literature concerned with subset hypothesis testing falls generally into two categories.

The first category is the projection method based on identification-robust statistics.2 This method consists of inverting robust statistics to build a confidence set for the full set of parameters, and then uses projection techniques to obtain a confidence set for the subset of parameters. In addition to being robust to weak identification, the projection method based on the Anderson and Rubin (1949, AR) statistic also enjoys robustness to instrument exclusion. However, this method has often been criticized for being overly conservative and having low power when too many instruments are used.

The second category includes the robust subset procedures originally suggested by Stock and Wright (2000) and recently developed by Kleibergen (2004, 2008) and Startz et al. (2006). These procedures, known as plug-in based tests, consist of replacing the nuisance parameters that are not specified by the hypothesis of interest by estimators. It is well known that plug-in based tests never over-reject the true parameter values when the nuisance parameters are identified. Guggenberger, Kleibergen, Mavroeidis and Chen (2012), and Guggenberger and Chen (2011) show that the plug-in subset AR test has a correct asymptotic size even when identification is weak, i.e., this statistic is robust to identifying assumptions, while the subset test based on the Kleibergen (2002) (K) statistic is sensitive to such assumptions. Doko Tchatoka (2014) shows that even for moderate sample sizes, all conventional plug-in subset tests (including the subset AR test) are overly conservative when the nuisance parameters which are not specified by the hypothesis of interest are weakly identified.

1See the reviews by Stock, Wright and Yogo (2002), Dufour (2003), Andrews and Stock (2007), Poskitt and Skeels (2012), and Mikusheva (2013), among others.

2See Dufour (1997), Dufour and Jasiak (2001), Dufour and Taamouti (2005, 2007), Dufour, Khalaf and Beaulieu (2010), and Doko Tchatoka and Dufour (2014).


This paper contributes to the literature on weak instruments by investigating whether a distribution-free method such as the bootstrap can improve the size property of the subset AR tests, especially when identification is not very strong. To be more specific, we suggest a bootstrap method similar to those of Moreira, Porter and Suarez (2009) for the score test of the null hypothesis on the structural parameters. Testing subset hypotheses is substantially more complex than testing joint hypotheses on the full set of parameters, especially when identification is weak. Thus, the validity of Moreira et al.'s (2009) bootstrap for the subset AR test is not obvious, and further investigation is required to elucidate this issue.

In this paper, our analysis focuses on two plug-in subset AR statistics. The first one uses the restricted LIML as the plug-in estimator of the nuisance parameters, and the second uses the restricted 2SLS as the plug-in method. We observe that most studies of subset hypothesis testing in linear instrumental variables (IV) regressions [except Startz et al. (2006)] usually consider the restricted LIML as the plug-in method. So, not much is known about the size property of the plug-in AR test based on the 2SLS estimator when identification is weak (weak instruments). This paper also aims to fill this gap by extending the analysis to the plug-in method based on the 2SLS estimator.

After formulating a general asymptotic framework which allows us to address this issue in a convenient way, we provide an analysis of the limiting distributions of both the standard and bootstrap AR statistics under the subset null hypothesis of interest. Our results provide some new insights and extensions of earlier studies. In all cases, we show that when identification is strong, the bootstrap provides a high-order approximation of the null limiting distributions of both plug-in subset statistics. However, the bootstrap is inconsistent when instruments are weak. This contrasts with the bootstrap of the AR statistic of the null hypothesis specified on the full vector of structural parameters, which remains valid even when identification is weak; see Moreira et al. (2009). The inconsistency of the bootstrap under weak instruments is mainly due to the fact that bootstrapping fails to mimic the concentration matrix (or parameter) that characterizes the strength of the identification of the nuisance parameters (which are not specified by the null hypothesis) in such contexts. Moreover, we also show that alternative resampling techniques, such as subsampling, which usually offer a good approximation of the size of the tests in many cases where the bootstrap typically fails [see Andrews and Guggenberger (2009)], are invalid under Staiger and Stock's (1997) local to zero weak instruments asymptotics; see Section A.1 of the appendix. Finally, we present a Monte Carlo experiment that confirms our theoretical findings.

This paper is organized as follows. In Section 2, the model and assumptions are formulated, and the studied statistics are presented. In Section 3, the limiting behaviors of the standard subset AR tests are provided, while in Section 4, the proposed bootstrap method as well as its asymptotic validity are discussed, with and without weak instruments. In Section 5, the Monte Carlo experiment is presented. Conclusions are drawn in Section 6. The subsampling results, the auxiliary lemmata, and the proofs are provided in the appendix.

Throughout the paper, Iq stands for the identity matrix of order q. For any full-column rank n×m matrix A, PA = A(A′A)^{−1}A′ is the projection matrix on the space spanned by A, and MA = In − PA. The notation vec(A) is the nm×1 column vectorization of A. B > 0 for a square matrix B means that B is positive definite. Convergence almost surely is symbolized by "a.s.", "→p" stands for convergence in probability, while "→d" means convergence in distribution. The usual orders of magnitude are denoted by Op(·), op(·), O(1), and o(1). ‖U‖ denotes the usual Euclidean or Frobenius norm of a matrix U. For any set B, ∂B is the boundary of B and (∂B)^ε is the ε-neighborhood of B. Finally, sup_{ω∈Ω} |f(ω)| is the supremum norm on the space of bounded continuous real functions, with topological space Ω.
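The projection notation can be checked numerically; the snippet below (our own illustration, not part of the paper) verifies the defining properties of PA and MA with NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 3))             # a full-column-rank 8x3 matrix
PA = A @ np.linalg.solve(A.T @ A, A.T)      # P_A = A (A'A)^{-1} A'
MA = np.eye(8) - PA                         # M_A = I_n - P_A
assert np.allclose(PA @ PA, PA)             # P_A is idempotent
assert np.allclose(PA @ A, A)               # P_A leaves the columns of A fixed
assert np.allclose(MA @ A, 0)               # M_A annihilates the columns of A
```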

2. Framework

In this section, we present the model and the statistics studied. Sections 3 and 4 will give the

concrete results.

2.1. Model and assumptions

We consider a standard linear IV regression with two (possibly) endogenous right-hand-side (rhs) variables and L instrumental variables (IVs). The sample size is n. The model consists of a structural equation and a reduced-form equation:

y = Xβ + Wγ + ε,  (2.1)

(X, W) = Z(Πx : Πw) + (Vx, Vw),  (2.2)

where y ∈ R^n is a vector of observations on a dependent variable, X ∈ R^n and W ∈ R^n are vectors of observations on (possibly) endogenous explanatory variables, Z ∈ R^{n×L} contains L exogenous variables excluded from (2.1) (instruments), ε ∈ R^n is a vector of structural disturbances, Vx ∈ R^n and Vw ∈ R^n are vectors of reduced-form disturbances, β ∈ R and γ ∈ R are (typically) unknown fixed coefficients (structural parameters), and Πx ∈ R^L and Πw ∈ R^L represent vectors of unknown (fixed) coefficients.

We suppose that Z has full-column rank L with probability one, and that L ≥ 2. The full-column rank condition on Z ensures the existence of unique least squares estimates in (2.2) when X and W are regressed on each column of Z. As long as Z has full-column rank with probability one and the conditional distributions of X and W, given Z, are absolutely continuous (with respect to the Lebesgue measure), the matrix [X, W] also has full-column rank with probability one. So, the least squares estimates of β and γ in (2.1) are also unique. In practice, it may be relevant to include exogenous regressors in (2.1)-(2.2), whose coefficients remain unrestricted in the inference. If so, the results of this paper do not change qualitatively by replacing the variables that currently appear in (2.1)-(2.2) by the residuals from their regression on these exogenous variables.

If the errors ε, Vx and Vw have zero means3 and Z has full-column rank with probability one, then the usual necessary and sufficient condition for the identification of model (2.1)-(2.2) is that Πxw = [Πx : Πw] has full-column rank. If Πxw = 0, Z is irrelevant, so that β and γ are completely unidentified. If Πxw is close to zero, β and γ are ill-determined by the data, a situation often called "weak identification" in the literature; see Staiger and Stock (1997), Stock et al. (2002), Dufour (2003), Andrews and Stock (2007), Poskitt and Skeels (2012), and Mikusheva (2013).
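To fix ideas, model (2.1)-(2.2) is easy to simulate. In the sketch below (a minimal illustration with parameter values of our own choosing, not from the paper), Πw is taken close to zero so that γ is weakly identified, which shows up in a small first-stage F statistic for W:

```python
import numpy as np

rng = np.random.default_rng(0)
n, L = 500, 4
beta, gamma = 1.0, 0.5                         # structural parameters (assumed values)
Z = rng.standard_normal((n, L))                # instruments
Pi_x = np.full(L, 1.0)                         # strong first stage for X
Pi_w = np.full(L, 0.1 / np.sqrt(n))            # Pi_w close to zero: gamma ill-determined
eps, Vx, Vw = rng.standard_normal((3, n))      # homoskedastic disturbances
X = Z @ Pi_x + Vx                              # reduced form (2.2)
W = Z @ Pi_w + Vw
y = X * beta + W * gamma + eps                 # structural equation (2.1)

# First-stage F statistic for W: near 1 when Z is (almost) irrelevant for W
PZ = Z @ np.linalg.solve(Z.T @ Z, Z.T)
F_w = (W @ PZ @ W / L) / (W @ (W - PZ @ W) / (n - L))
```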

We are concerned with the validity of the bootstrap for the plug-in Anderson and Rubin (1949) subset AR-test of the subset null hypothesis

H0 : β = β0  (2.3)

without any identifying assumptions on model (2.1)-(2.2), where β0 is a constant.

3This assumption may also be replaced by another "location assumption," such as zero medians.

Let Y = [y, X, W] = [Y1, . . . , Yn]′ denote the matrix of endogenous variables, where we define Yi ∈ R^3 as the ith row of Y, written as a column vector, and similarly for other model random matrices. Let also Xn = {(Y′1, Z′1), . . . , (Y′n, Z′n)} and Rn = vech[(Y′n, Z′n)′(Y′n, Z′n)] = (f1(Y′n, Z′n), f2(Y′n, Z′n), . . . , fK(Y′n, Z′n))′, where fp(·), p = 1, . . . , K = (1/2)(L+2)(L+3), are the elements of (Y′n, Z′n)′(Y′n, Z′n), and Rn has a distribution F.

We make the following generic assumptions on the model variables.

Assumption 2.1 (i) E(‖Rn‖^{2+r}) < ∞ for some r > 0, and (ii) lim sup_{‖t‖→∞} |E(exp(it′Rn))| < 1, where i = √−1.

Assumption 2.2 E[(ε : Vx : Vw) | Z] = 0.

Assumption 2.3 When the sample size n converges to infinity, the following convergence results hold jointly:

(i) n^{−1}(ε : Vx : Vw)′(ε : Vx : Vw) →p Ω, n^{−1}Z′Z →p QZ, where Ω = [σεε, σ′Vε; σVε, ΣV], ΣV = [σVxVx, σVwVx; σVwVx, σVwVw], σVε = (σVxε, σVwε)′, and σjk ∈ R for all j, k ∈ {ε, Vx, Vw};

(ii) n^{−1/2}Z′(ε : Vx : Vw) →d (ψZε : ψZVx : ψZVw), vec(ψZε : ψZVx : ψZVw) ∼ N(0, Σ ⊗ QZ), where ψZε, ψZVx, ψZVw are L×1 random vectors.

Assumption 2.1 is similar to Moreira et al. (2009, Assumptions 2-3) with r = s−2 and s ≥ 3. Assumption 2.1-(i) requires that Rn have second moments or greater, while Assumption 2.1-(ii) requires that its characteristic function be bounded above by 1. In particular, the second moments of Rn exist if E(‖(Y′n, Z′n)‖^{2(r+2)}) < ∞ for some r > 0. The bound on the characteristic function is the commonly used Cramér condition [see Bhattacharya and Ghosh (1978)].


Assumption 2.2 is the usual conditional zero-mean assumption on the model errors. The two convergences in Assumption 2.3-(i) are the weak law of large numbers (WLLN) property of [ε, Vx, Vw] and Z, respectively, while the convergence in Assumption 2.3-(ii) is the central limit theorem (CLT) property.

From (2.1)-(2.2), we can write the reduced form for Y(β0) = [y(β0), W] under H0 : β = β0 as:

Y(β0) = ZΠw(γ : 1) + (v1, Vw),  (2.4)

where v1 = Vwγ + ε. As long as Z has full-column rank with probability one and [ε, Vw] has zero mean, the coefficients of the reduced-form equation (2.4) are identifiable. As a result, γ is identifiable under H0 whenever Πw ≠ 0 is fixed. Let

ψv1 = QZ^{−1/2}(ψZVw γ + ψZε),  ΩW = [σ11, σVw1; σVw1, σVwVw],

where σ11 = σεε + 2σVwε γ + γ²σVwVw, and σVw1 = σVwVw γ + σVwε. Under Assumption 2.3, it is a simple exercise to see that

n^{−1}(v1, Vw)′(v1, Vw) →p ΩW,  n^{−1}Z′(v1, Vw) →p 0,  (2.5)

vec((Z′Z/n)^{−1/2} n^{−1/2} Z′(v1, Vw)) →d vec(ψv1, ψZVw) ∼ N(0, ΩW ⊗ IL).  (2.6)

We will now present the subset AR statistics studied.

2.2. Plug-in subset AR statistics

The plug-in principle usually consists of two steps. The first step is to take an identification-robust statistic [usually the Anderson and Rubin (1949, AR), Kleibergen (2002, K) and Moreira (2003, CLR) type statistics] which results from the test of the joint hypothesis H(β0, γ0) : β = β0, γ = γ0. The second step consists of replacing the nuisance parameters γ which are not specified by the subset null hypothesis of interest (2.3) by an estimator in the expression of the above statistics. In this paper, we consider the plug-in AR-type statistics that use the restricted LIML and 2SLS estimators of γ under H0, namely γ̂LIML and γ̂2SLS, respectively. For convenience, we adopt the notation γ̂j, j ∈ {LIML, 2SLS}, in the remainder of the paper. The subset AR statistic for H0 corresponding to γ̂j can be expressed4 as

AR(β0, γ̂j) = (1/L) ‖S(β0, γ̂j)‖²,  (2.7)

where S(β0, γ̂j) = (Z′Z)^{−1/2} Z′Y(β0) r̃j (r̃′jΩ̃Wr̃j)^{−1/2}, Y(β0) = [y(β0) : W], y(β0) = y − Xβ0, Ω̃W = (n−L)^{−1} Y(β0)′MZY(β0), and r̃j = (1, −γ̂j)′. It is well known that γ̂j is given by

γ̂j = [W′(PZ − κ̂jMZ)W]^{−1} W′(PZ − κ̂jMZ)(y − Xβ0),  (2.8)

where κ̂2SLS = 0, κ̂LIML = (n−L)^{−1}κLIML, and κLIML is the smallest root of the characteristic polynomial |κΩ̃W − Y(β0)′PZY(β0)| = 0. Furthermore, we have AR(β0, γ̂LIML) = min_{γ∈R} AR(β0, γ) [for example, see Guggenberger et al. (2012)], so that AR(β0, γ̂2SLS) ≥ AR(β0, γ̂LIML). This means that the test that uses γ̂2SLS rejects the subset null hypothesis more often than the one that uses γ̂LIML if the same critical value is applied.
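The two plug-in statistics are straightforward to compute. The sketch below (our own illustration; the variable names and toy data are assumptions, not the authors' code) implements (2.7)-(2.8) for scalar X and W and checks the ordering AR(β0, γ̂2SLS) ≥ AR(β0, γ̂LIML) numerically:

```python
import numpy as np

def subset_ar(y, X, W, Z, beta0, method="LIML"):
    """Plug-in subset AR statistic AR(beta0, gamma_hat_j), eqs. (2.7)-(2.8)."""
    n, L = Z.shape
    y0 = y - X * beta0                              # y(beta0)
    Y0 = np.column_stack([y0, W])                   # Y(beta0) = [y(beta0) : W]
    PZ = Z @ np.linalg.solve(Z.T @ Z, Z.T)
    MZ = np.eye(n) - PZ
    Omega_W = Y0.T @ MZ @ Y0 / (n - L)              # tilde Omega_W
    if method == "2SLS":
        kappa = 0.0
    else:   # LIML: smallest root of |k Omega_W - Y0' PZ Y0| = 0, scaled by (n-L)^{-1}
        roots = np.linalg.eigvals(np.linalg.solve(Omega_W, Y0.T @ PZ @ Y0))
        kappa = roots.real.min() / (n - L)
    A = PZ - kappa * MZ
    gamma = (W @ A @ y0) / (W @ A @ W)              # eq. (2.8), scalar case
    r = np.array([1.0, -gamma])                     # r_tilde_j = (1, -gamma_hat_j)'
    u = Z.T @ (Y0 @ r)
    return (u @ np.linalg.solve(Z.T @ Z, u)) / (r @ Omega_W @ r) / L

# Toy data (assumed design): strong instruments, beta0 at the true value
rng = np.random.default_rng(0)
n, L = 200, 4
Z = rng.standard_normal((n, L))
eps, Vx, Vw = rng.standard_normal((3, n))
X, W = Z @ np.ones(L) + Vx, Z @ np.full(L, 0.5) + Vw
y = X * 1.0 + W * 0.5 + eps
ar_liml = subset_ar(y, X, W, Z, 1.0, "LIML")
ar_2sls = subset_ar(y, X, W, Z, 1.0, "2SLS")
assert 0 <= ar_liml <= ar_2sls + 1e-8               # LIML minimizes AR(beta0, .) over gamma
```

Since AR(β0, γ̂LIML) = min_γ AR(β0, γ), the LIML version can never exceed the 2SLS version at the same β0, which the final assertion checks.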

Guggenberger et al. (2012) and Guggenberger and Chen (2011) show that a test of H0 based on AR(β0, γ̂LIML) has a correct asymptotic size when the errors are homoskedastic, even when γ is weakly identified. However, this test is overly conservative even in moderate samples when the identification of γ is weak and the usual asymptotic χ²-critical values are applied in the inference; see Doko Tchatoka (2014). Moreover, little is known about the size property of the test based on AR(β0, γ̂2SLS), especially when γ is not identified. In this paper, we characterize the asymptotic behavior of both tests, including when γ is not identified, and we investigate whether bootstrapping can improve their size.

In the remainder of the paper, let GL−1(·) and gL−1(·) denote the cumulative distribution function (cdf) and the probability density function (pdf), respectively, of a χ²-distributed random variable with L−1 degrees of freedom. We now characterize the limiting distribution under H0 of AR(β0, γ̂j) for j ∈ {LIML, 2SLS}.

4See Stock and Wright (2000); Zivot, Startz and Nelson (2006); Guggenberger et al. (2012); and Doko Tchatoka (2014).


3. Asymptotic behavior of AR(β0, γ̂j)

In this section, we study the limiting behavior of AR(β0, γ̂j), j ∈ {LIML, 2SLS}, under the subset null hypothesis H0 in both the strong and weak identification setups. Since strong identification is relatively easy to tackle, we focus on that case first. Section 3.1 presents the results.

3.1. Strong identification

We focus first on the case in whichγ is identified underH0, i.e., Πw 6= 0 is fixed. Although this

setup is widely considered in many studies of weak instruments,5 only a first-order approximation

of the limiting distribution of the subset AR-statistic is provided. Here, we provide a high-order

approximation of the limiting distributions of the statistics underH0. Theorem3.1 presents the

results.

Theorem 3.1 Suppose that Assumptions 2.1-2.3 are satisfied with r ≥ 1. If further H0 holds and Πw ≠ 0 is fixed, then we have:

sup_{τ∈R} | GARj(τ) − GL−1(τ) − Σ_{h=1}^{r} n^{−h} phARj(τ; F, β0, γ, Πx, Πw) gL−1(τ) | = o(n^{−r})

for j ∈ {LIML, 2SLS}, where phARj is a polynomial in τ with coefficients depending on β0, γ, Πx, Πw, and the moments of the distribution F of Rn, and GARj(τ) = P[AR(β0, γ̂j) ≤ τ] is the cdf of AR(β0, γ̂j) evaluated at τ ∈ R.

Remark. Let cARj(α) = inf{τ ∈ R : GARj(τ) ≥ 1−α} define the 1−α quantile of the distribution of AR(β0, γ̂j), j ∈ {LIML, 2SLS}. Theorem 3.1 gives the conditions under which cARj(α) ≈ χ²L−1(α) + Σ_{h=1}^{r} n^{−h} qhARj(χ²L−1(α)) uniformly in ζ < α < 1−ζ for any 0 < ζ < 1/2 [see Hall (1992)], where the qhARj are polynomials derivable from the phARj and χ²L−1(α) is the 1−α quantile of a χ²-distributed random variable with L−1 degrees of freedom, i.e., the solution of GL−1[χ²L−1(α)] = 1−α. Thus, Theorem 3.1 provides a more accurate approximation of the distribution of AR(β0, γ̂j) than the usual first-order asymptotic χ²-distribution.

5For example, see Stock and Wright (2000); Kleibergen (2004); Startz et al. (2006); Mikusheva (2010); Guggenberger et al. (2012); Guggenberger and Chen (2011); and Doko Tchatoka (2014).


We will now study the null limiting behavior of AR(β0, γ̂j) under Staiger and Stock's (1997) local to zero weak instruments setup.

3.2. Local to zero weak instruments

We now suppose that Πw = C0w/√n, where C0w ∈ R^L is a fixed vector (possibly zero). Due to identification failure, a high-order refinement of the distribution of AR(β0, γ̂j), such as in Theorem 3.1, is no longer possible. Because of this, we only derive the asymptotic distributions of the statistics under H0. Since AR(β0, γ̂j) depends on γ̂j, S(β0, γ̂j) and r̃′jΩ̃Wr̃j for j ∈ {LIML, 2SLS}, it will be illuminating to study the asymptotic behavior of the latter statistics first. Lemma 3.2 presents the results, where

Σρ = [1, ρVwε; ρVwε, 1],  Ψ = σVwVw^{−1/2} QZ^{1/2} C0w + ψVw,  ψVw = σVwVw^{−1/2} QZ^{−1/2} ψZVw,

ψε = σεε^{−1/2} QZ^{−1/2} ψZε,  ρVwε = (σεε σVwVw)^{−1/2} σVwε.  (3.1)

Lemma 3.2 Suppose Assumptions 2.2-2.3 and H0 are satisfied. If Πw = C0w/√n, where C0w ∈ R^L is fixed, then the following convergence holds jointly for j ∈ {LIML, 2SLS}:

(a) γ̂j − γ0 →d σεε^{1/2} σVwVw^{−1/2} Δj, where Δj = (Ψ′Ψ − κj)^{−1}(Ψ′ψε − κj ρVwε), κ2SLS = 0 and κLIML is the smallest root of |(ψε : Ψ)′(ψε : Ψ) − κΣρ| = 0;

(b) r̃′jΩ̃Wr̃j →d σεε,j = σεε(1 − 2ρVwε Δj + Δj²);

(c) S(β0, γ̂j) →d (1 − 2ρVwε Δj + Δj²)^{−1/2} Sj, where Sj = ψε − ΨΔj.

Remarks. (i) Lemma 3.2-(a) shows that both the restricted LIML and 2SLS estimators under H0 are inconsistent under Staiger and Stock's (1997) local to zero weak instruments setup [similar to Staiger and Stock (1997) and Doko Tchatoka (2014)]. Under Assumptions 2.2-2.3, it is straightforward to see that (ψ′ε, ψ′Vw)′ is Gaussian with zero mean and covariance matrix Σρ ⊗ IL. So, given ψVw, ψε follows a normal distribution with mean ρVwε ψVw and covariance matrix (1 − ρ²Vwε)IL. Therefore, the conditional distribution of γ̂2SLS − γ0, given ψVw, is Gaussian with mean ρVwε(Ψ′Ψ)^{−1}Ψ′ψVw and covariance matrix (1 − ρ²Vwε)(Ψ′Ψ)^{−1}. Since the mean and covariance matrix of the conditional distribution of γ̂2SLS − γ0, given ψVw, depend only on ψVw, its unconditional distribution is a mixture of Gaussian processes with nonzero means. This is not, however, the case for LIML. Indeed, since κLIML ≠ 0 with probability one, the conditional distribution of γ̂LIML − γ0, given ψVw, is not necessarily Gaussian. As a result, γ̂LIML − γ0 does not necessarily converge to a mixture of Gaussian processes; see Doko Tchatoka (2014).

(ii) Lemma 3.2-(b) shows that r̃′jΩ̃Wr̃j converges to a random process, rather than to a constant scalar for all j ∈ {LIML, 2SLS} as in the case of strong identification. Meanwhile, Lemma 3.2-(c) implies that S(β0, γ̂j) does not have a standard normal distribution even for j = 2SLS. Indeed, while S(β0, γ̂2SLS) converges to a mixture of Gaussian processes with nonzero mean under H0, the null limiting distribution of S(β0, γ̂LIML) is nonstandard and more complex to characterize because of the presence of κLIML in it.

We can now state the following theorem on the asymptotic behavior of AR(β0, γ̂j) under H0 for all j ∈ {LIML, 2SLS} when instruments are weak.

Theorem 3.3 Suppose that Assumptions 2.2-2.3 are satisfied and H0 holds. Let Πw = C0w/√n, where C0w ∈ R^L is fixed (possibly zero). Then we have:

AR(β0, γ̂j) →d ξ∞ARj = (1/L) ‖(1 − 2ρVwε Δj + Δj²)^{−1/2} Sj‖²  for j ∈ {LIML, 2SLS},

where Sj, ρVwε and Δj are defined in Lemma 3.2.
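Because ξ∞ARj depends only on L, ρVwε, and the standardized concentration vector σVwVw^{−1/2}QZ^{1/2}C0w, the limit in Theorem 3.3 can be simulated directly. The sketch below (our own illustration; the values of L, rho, and lam are assumptions) draws from the 2SLS limit (κ2SLS = 0) using the conditional Gaussian structure noted in the remark after Lemma 3.2:

```python
import numpy as np

rng = np.random.default_rng(1)
L = 4            # number of instruments (assumed)
rho = 0.9        # rho_{Vw,eps}: endogeneity of W (assumed)
lam = 1.0        # norm of sigma_VwVw^{-1/2} Q_Z^{1/2} C0w: instrument strength (assumed)
mu = np.zeros(L); mu[0] = lam

draws = np.empty(20000)
for s in range(draws.size):
    psi_vw = rng.standard_normal(L)
    # given psi_Vw, psi_eps ~ N(rho * psi_Vw, (1 - rho^2) I_L)
    psi_eps = rho * psi_vw + np.sqrt(1 - rho**2) * rng.standard_normal(L)
    Psi = mu + psi_vw
    delta = (Psi @ psi_eps) / (Psi @ Psi)       # Delta_2SLS (kappa = 0)
    S = psi_eps - Psi * delta                   # S_j = psi_eps - Psi Delta_j
    draws[s] = (S @ S) / (1.0 - 2.0 * rho * delta + delta**2) / L

# Rejection rate of a nominal 5% test that treats L * xi as chi2(L-1)
rate = (L * draws > 7.8147).mean()              # 7.8147 = 0.95 quantile of chi2(3)
```

Comparing `rate` with 0.05 illustrates how far the weak-instrument limit is from the χ² benchmark; varying `lam` and `rho` traces out the size distortions discussed above.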

Remarks. (i) Theorem 3.3 shows that the null limiting distribution of AR(β0, γ̂j), j ∈ {LIML, 2SLS}, is nonstandard and depends on nuisance parameters under Staiger and Stock's (1997) local to zero weak instruments framework. Therefore, using the asymptotic χ²-critical values in the inference is not recommended in such contexts.

(ii) Guggenberger et al. (2012, eq. 14) and Guggenberger and Chen (2011) provide an upper bound on the limiting distribution of AR(β0, γ̂LIML) and show that a test based on this statistic has correct asymptotic size even with weak instruments. However, it performs poorly even with relatively moderate sample sizes [see Doko Tchatoka (2014)]. Meanwhile, it is not clear from Lemma 3.2 and Theorem 3.3 whether this result on the upper bound of AR(β0, γ̂LIML) extends to AR(β0, γ̂2SLS). The Monte Carlo results show that a test based on AR(β0, γ̂2SLS) sometimes over-rejects the subset null hypothesis when identification is weak and endogeneity is large, while the test based on AR(β0, γ̂LIML) remains overly conservative with weak instruments; see Table 1.

Based on the above results, it is natural to question whether bootstrapping can improve the size of these subset AR tests, especially when γ is weakly identified. Section 4 addresses this issue.

4. Bootstrapping subset AR-test

Moreira et al. (2009) show that a residual-based bootstrap yields a test with correct size for the AR statistic of the joint hypothesis H(β0, γ0) : β = β0, γ = γ0 in model (2.1)-(2.2), without identifying assumptions on the model parameters. However, unlike the AR statistic of H(β0, γ0), the subset AR statistics of H0 in (2.7) are not pivotal (even asymptotically) when γ is weakly identified (see Theorem 3.3). So, the validity of the bootstrap for the subset AR statistics of H0 is not obvious, at least when identification is weak, and further investigation is required. In this section, we examine the validity of a bootstrap similar to that of Moreira et al. (2009) for the subset AR statistics in (2.7).

Before proceeding, let Π̂x = (Z′Z)^{−1}Z′X and Π̂w = (Z′Z)^{−1}Z′W denote the ordinary least squares estimators of Πx and Πw in (2.2). Let also γ̂j, j ∈ {LIML, 2SLS}, denote the restricted LIML and 2SLS estimators of γ under H0 given in (2.8). We adapt the bootstrap procedure of Moreira et al. (2009) to the subset AR statistics as follows.

1. For a given β0 and the observed data, compute Π̂x, Π̂w and γ̂j, j ∈ {LIML, 2SLS}, along with all other items necessary to obtain the realizations of the statistic AR(β0, γ̂j) and the residuals from the reduced-form equation (2.4): v̂1(β0) ≡ v̂1 = y(β0) − ZΠ̂wγ̂j, V̂x(β0) ≡ V̂x = X − ZΠ̂x, and V̂w = W − ZΠ̂w. These residuals are then re-centered by subtracting sample means to yield (v̄1, V̄x, V̄w);

2. For each bootstrap sample b = 1, . . . , B, the data are generated following

X* = Z*Π̂x + V*x,  (4.1)

W* = Z*Π̂w + V*w,  (4.2)

y* = X*β0 + Z*Π̂wγ̂j + v*1,  (4.3)

where (Z*, v*1, V*x, V*w) is drawn independently from the joint empirical distribution of (Z, v̄1, V̄x, V̄w). The corresponding bootstrap subset AR statistics AR*(b)(β0, γ̂*j), b = 1, . . . , B, are computed as

AR*(b)(β0, γ̂*j) = (1/L) ‖S*(b)(β0, γ̂*j)‖²,  (4.4)

S*(b)(β0, γ̂*j) = (Z*′Z*)^{−1/2} Z*′Y*(β0) r̃*j (r̃*′jΩ̃*1Wr̃*j)^{−1/2},  (4.5)

where Y*(β0) = (y*(β0) : W*) and r̃*j = (1, −γ̂*j)′;

3. The bootstrap test rejects H0 if (1/B) Σ_{b=1}^{B} 1[AR*(b)(β0, γ̂*j) > c*ARj(α)] is less than α, where c*ARj(α) is the 1−α quantile of AR*(b)(β0, γ̂*j), i.e., the value that minimizes |P*[AR*(b)(β0, γ̂*j) ≤ τ] − (1−α)| over τ ∈ R; see Andrews (2002).
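Steps 1-3 above can be sketched as follows (a minimal illustration with the 2SLS plug-in only; the data-generating values, B, and the direct comparison of the observed statistic with the bootstrap critical value are our own simplifications, not the authors' code):

```python
import numpy as np

rng = np.random.default_rng(0)

def ar_2sls(y, X, W, Z, beta0):
    """Subset AR statistic with the restricted 2SLS plug-in (kappa = 0)."""
    n, L = Z.shape
    y0 = y - X * beta0
    PZ = Z @ np.linalg.solve(Z.T @ Z, Z.T)
    gamma = (W @ PZ @ y0) / (W @ PZ @ W)        # eq. (2.8) with kappa = 0
    Y0 = np.column_stack([y0, W])
    r = np.array([1.0, -gamma])
    MZ = np.eye(n) - PZ
    s2 = r @ (Y0.T @ MZ @ Y0 / (n - L)) @ r     # r' Omega_W r
    u = Z.T @ (Y0 @ r)
    return (u @ np.linalg.solve(Z.T @ Z, u)) / s2 / L, gamma

# Toy data from model (2.1)-(2.2) (assumed parameter values)
n, L, beta0 = 200, 4, 1.0
Z = rng.standard_normal((n, L))
eps, Vx, Vw = rng.standard_normal((3, n))
X = Z @ np.ones(L) + Vx
W = Z @ np.full(L, 0.5) + Vw
y = X * beta0 + W * 0.5 + eps

# Step 1: estimates and re-centered reduced-form residuals
ar_obs, gamma_hat = ar_2sls(y, X, W, Z, beta0)
Pi_x_hat = np.linalg.solve(Z.T @ Z, Z.T @ X)
Pi_w_hat = np.linalg.solve(Z.T @ Z, Z.T @ W)
v1 = (y - X * beta0) - (Z @ Pi_w_hat) * gamma_hat
res = np.column_stack([v1, X - Z @ Pi_x_hat, W - Z @ Pi_w_hat])
res -= res.mean(axis=0)

# Steps 2-3: resample rows of (Z, residuals) jointly, rebuild data via (4.1)-(4.3)
B = 399
ar_star = np.empty(B)
for b in range(B):
    idx = rng.integers(0, n, n)
    Zs = Z[idx]
    v1s, Vxs, Vws = res[idx].T
    Xs = Zs @ Pi_x_hat + Vxs                    # eq. (4.1)
    Ws = Zs @ Pi_w_hat + Vws                    # eq. (4.2)
    ys = Xs * beta0 + (Zs @ Pi_w_hat) * gamma_hat + v1s   # eq. (4.3)
    ar_star[b], _ = ar_2sls(ys, Xs, Ws, Zs, beta0)
c_star = np.quantile(ar_star, 0.95)             # bootstrap 5%-level critical value
reject = ar_obs > c_star
```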

In the remainder of the paper, Fn denotes the empirical distribution of R*n = vech((Y*′n, Z*′n)′(Y*′n, Z*′n)) conditional on Xn = {(Y′1, Z′1), . . . , (Y′n, Z′n)}, P* is the probability under the empirical distribution function (conditional on Xn), and E* is its corresponding expectation operator. As in Section 3, we deal separately with the case where identification is strong and the one where it is weak. Section 4.1 presents the results for strong identification.

4.1. Bootstrap consistency under strong identification

We first focus on the case in which γ is identified, and we provide an analysis of the behavior of the proposed bootstrap subset AR statistics and the associated tests under H0. Lemma 4.1 states the validity of a high-order expansion for all bootstrap subset AR statistics considered, and Theorem 4.2 provides a high-order refinement of the size of the corresponding bootstrap subset AR tests.

Lemma 4.1 Suppose that Assumptions 2.1-2.3 are satisfied with r ≥ 1. Suppose further that H0 holds and Πw ≠ 0 is fixed. Then we have:

sup_{τ∈R} | GAR*j(τ) − GL−1(τ) − Σ_{h=1}^{r} n^{−h} phARj(τ; Fn, β0, γ̂j, Π̂x, Π̂w) gL−1(τ) | = o(n^{−r}) a.s.

for j ∈ {LIML, 2SLS}, where phARj is a polynomial in τ with coefficients depending on β0, γ̂j, Π̂x, Π̂w and the moments of Fn, and GAR*j(τ) = P*[AR*(β0, γ̂*j) ≤ τ] is the empirical cdf of AR*(β0, γ̂*j) evaluated at τ ∈ R.

Remark. Lemma 4.1 shows that the bootstrap estimate and the (r+1)-term empirical Edgeworth expansion in Theorem 3.1 are asymptotically equivalent up to the o(n^{−r}) order under H0. Furthermore, the bootstrap makes an error of size O(n^{−1}) under H0, which vanishes faster as n → +∞ than both O(n^{−1/2}) and the error made by the first-order asymptotic approximation. The bootstrap provides greater accuracy than the O(n^{−1/2}) order because each subset AR statistic in (2.7) is a quadratic function of a symmetric pivotal statistic [see Horowitz (2001, Ch. 52, eq. 3.13)] under H0 if γ is identified (Πw ≠ 0 is fixed).

We can now prove the following theorem on the size of the tests when bootstrap critical values are used in the inference.

Theorem 4.2 Suppose that Assumptions 2.1-2.3 are satisfied with r ≥ 1. Suppose further that H0 holds and Πw ≠ 0 is fixed. Then we have:

P[AR(β0, γ̂j) > c*ARj] = α + o(n^{−1})  for all j ∈ {LIML, 2SLS},

where c*ARj ≡ c*ARj(α) is the 1−α quantile of the empirical distribution of AR*(β0, γ̂*j), i.e., the value that minimizes |P*[AR*(β0, γ̂*j) ≤ τ] − (1−α)| over τ ∈ R.

Remarks. (i) Theorem 4.2 gives the conditions under which the bootstrap critical values for AR*(β0, γ̂*j) yield a level for the AR(β0, γ̂j) test that is correct through O(n^{−1}) under H0. So, the bootstrap makes an error of size O(n^{−1}) under H0 when γ is identified (Πw ≠ 0 is fixed), for all values of Πx. Hence, the identification of the parameters specified by the subset null hypothesis (here β) does not affect the results of Theorem 4.2.


(ii) Although Theorem 4.2 focuses on the validity of the bootstrap under H0, there is no impediment to extending the above results to the alternative hypothesis (β ≠ β0). In this case, we can show that the tests are consistent when the bootstrap critical values are used in the inference and the instruments are strong for both X and W (i.e., if Πx ≠ 0 and Πw ≠ 0 are fixed). This is because the asymptotic distribution of AR(β0, γ̂j) diverges when β ≠ β0 and Πx ≠ 0, Πw ≠ 0 are fixed. Note that test consistency would still hold in this case if empirical size-corrected critical values were used instead of the bootstrap critical values, as suggested by Horowitz (1994). However, the tests will have low power for weak values of Πx even if γ is identified (Πw ≠ 0 is fixed). These proofs are omitted to shorten the exposition of our results.

We shall now focus on Staiger and Stock's (1997) local to zero weak instruments framework.

4.2. Bootstrap inconsistency under weak instruments

We now assume that Πw = C0w/√n, where C0w ∈ R^L is fixed (possibly zero), and we study the validity of the bootstrap for the above subset AR tests. As in Section 3.2, we first characterize the asymptotic behavior of the bootstrap statistics γ̂*j, r̃*′jΩ̃*1Wr̃*j, and S*(β0, γ̂*j) under H0. Lemma 4.3 presents the results.

Lemma 4.3 Suppose Assumptions 2.2-2.3 are satisfied and H0 holds. Let Πw = C0w/√n, where C0w ∈ R^L is fixed. If, for some δ > 0, E(‖Zi‖^{4+δ}, ‖Vi‖^{2+δ}) < +∞, then we have:

(a) γ̂*j − γ̂j | Xn →d σεε,j^{1/2} σVwVw^{−1/2} ΔBj a.s.;

(b) r̃*′jΩ̃*1Wr̃*j | Xn →d σεε,j(1 − 2ρVwε,j ΔBj + (ΔBj)²) a.s.;

(c) S*(β0, γ̂*j) | Xn →d (1 − 2ρVwε,j ΔBj + (ΔBj)²)^{−1/2} SBj a.s.

for j ∈ {LIML, 2SLS}, where ΔBj = (ΨB′ΨB − κBj)^{−1}(ΨB′ψε,j − κBj ρVwε,j), with κB2SLS = 0 and κBLIML the smallest root of the determinantal equation |(ψε,j : ΨB)′(ψε,j : ΨB) − κΣρ,j| = 0, ΨB = Ψ + ψVw, ψε,j = ψv1 − ψVw(γ0 + σεε^{1/2}σVwVw^{−1/2}Δj) = ψε − σεε^{1/2}σVwVw^{−1/2}Δj ψVw, Σρ,j = [1, ρVwε,j; ρVwε,j, 1], ρVwε,j = (σεε,j σVwVw)^{−1/2} σVwε,j, σεε,j = σεε(1 − 2ρVwε Δj + (Δj)²), σVwε,j = σVwε − σεε^{1/2}σVwVw^{1/2}Δj, SBj = ψε,j − ΨBΔBj, and Δj is given in Lemma 3.2.

Remarks. (i) First, we note that the convergence results in Lemma 4.3 (a)-(c) differ from those in Lemma 3.2 (a)-(c). This means that the bootstrap fails to mimic the asymptotic distributions of the statistics r̃′jΩ̃Wr̃j and S(β0, γ̂j) under H0 when γ̂j is used as the pseudo true value of γ. Indeed, while γ̂j − γ0 →d σεε^{1/2}σVwVw^{−1/2}Δj in Lemma 3.2-(a), we have γ̂*j − γ̂j | Xn →d σεε,j^{1/2}σVwVw^{−1/2}ΔBj a.s. by Lemma 4.3-(a). Both limits can differ significantly, since Δj = (Ψ′Ψ − κj)^{−1}(Ψ′ψε − κj ρVwε) ≠ ΔBj = (ΨB′ΨB − κBj)^{−1}(ΨB′ψε,j − κBj ρVwε,j) and σεε ≠ σεε,j with probability 1.

(ii) The bootstrap failure is mainly due to the fact that replacing Π_w and γ by Π̂_w and γ̂_j, respectively, in (2.4) generates an extra noise term that is added to the original residuals when re-sampled. This can lead to substantial differences between the limits in the original sample and those in the bootstrap sample. To be more explicit, observe that in the local-to-zero weak instruments setup, we have Z′W/√n = (Z′Z/n)C_{0w} + Z′V_w/√n →d Q_Z C_{0w} + ψ_{ZVw} under Assumptions 2.2-2.3. Meanwhile, Z*′W*/√n = (Z*′Z*/√n)Π̂_w + Z*′V*_w/√n = (Z*′Z*/n)C_{0w} + (Z*′Z*/n)(Z′Z/n)^{-1}(Z′V_w/√n) + Z*′V*_w/√n | X_n →d Q_Z C_{0w} + 2ψ_{ZVw} a.s. under the conditions of Lemma 4.3. Hence, the bootstrap fails to mimic the limiting behavior of Z′W/√n (and therefore that of the concentration factor W′P_Z W) under Staiger and Stock's (1997) local-to-zero weak instruments asymptotics, and similarly for the other statistics (for example, see the difference between Ψ and Ψ^B in Lemma 4.3).

We can now prove the following theorem on the limiting distribution of AR*(β_0, γ*_j) under H_0 and weak instruments.

Theorem 4.4 Suppose Assumptions 2.2-2.3 are satisfied and H_0 holds. Let Π_w = C_{0w}/√n, where C_{0w} ∈ R^L is fixed. If for some δ > 0, E(‖Z_i‖^{4+δ}, ‖V_i‖^{2+δ}) < +∞, then we have:

AR*(β_0, γ*_j) | X_n →d ξ^∞_{AR*_j} = ‖(1 − 2ρ_{Vwε,j} Δ^B_j + (Δ^B_j)²)^{-1/2} S^B_j‖² a.s., j ∈ {LIML, 2SLS}.

Remark. Theorem 4.4 provides a characterization of the limiting distribution of the bootstrap subset AR statistics under H_0 similar to that in Theorem 3.3. As can be seen, a straightforward comparison confirms our previous analysis in Lemma 4.3, i.e., AR*(β_0, γ*_j) and AR(β_0, γ̂_j) have different asymptotic behavior under H_0. So, we can state the following theorem on the inconsistency of the bootstrap for subset AR tests under weak instruments.

Theorem 4.5 Suppose Assumptions 2.2-2.3 are satisfied and H_0 holds. Let Π_w = C_{0w}/√n, where C_{0w} ∈ R^L is fixed. If for some δ > 0, E(‖Z_i‖^{4+δ}, ‖V_i‖^{2+δ}) < +∞, then we have:

|P[AR(β_0, γ̂_j) > c*_{AR_j}] − α| = O_p(1) for all j ∈ {LIML, 2SLS},

where c*_{AR_j} ≡ c*_{AR_j}(α) is the 1−α quantile of the empirical distribution of AR*(β_0, γ*_j).

Remarks. (i) Theorem 4.5 shows that none of the subset tests has a correct size asymptotically when the bootstrap critical values are used for inference. So, the bootstrap is inconsistent when the nuisance parameter γ, which is not specified by the subset null hypothesis of interest, is weakly identified.

(ii) In the econometric and statistical literature, subsampling is often considered a more robust resampling technique than the bootstrap, and it is valid in many cases where the bootstrap typically fails; for example, see Andrews and Guggenberger (2009). In Section A.1 of the appendix, we show that subsampling is invalid for the AR(β_0, γ̂_j) subset tests under Staiger and Stock's (1997) local-to-zero weak instruments asymptotics.

We will now complete our analysis through a Monte Carlo experiment.

5. Monte Carlo experiment

We use simulation to examine the performance of both the standard and bootstrap subset AR tests described above. The data generating process is described by (2.1) and (2.2), where y, X and W are n×1 vectors. The n rows of [ε : V_x : V_w] are i.i.d. normal with zero mean and covariance matrix Σ = [1, ρ, ρ; ρ, 1, ρ; ρ, ρ, 1], where the endogeneity ρ varies in {0, 0.1, 0.5, 0.9}. The L columns of the instrument matrix Z, L ∈ {3, 5, 10, 20}, are N(0, I_L), independently of [ε : V_x : V_w]. Note that Z is fixed over the simulation experiment, but the results do not change qualitatively otherwise. The true values of β and γ are set at 2 and −1, respectively. The reduced-form coefficient matrix Π_{xw} = [Π_x : Π_w] is chosen as Π_{xw} = (μ²/(n‖ZC_0‖))^{1/2} Π_0, where Π_0 is obtained by taking the first 2 columns of an identity matrix of order L, and μ² is the concentration parameter which describes the strength of Z. In this experiment, we vary μ² in {0, 13, 1000}, where μ² = 0 is a complete non-identification or irrelevant-IVs setup, μ² = 13 represents weak instruments (both β and γ are weakly identified), and μ² = 1000 implies strong instruments; see Guggenberger (2010) and Doko Tchatoka (2014) for a similar parametrization. The rejection frequencies are computed using 1,000 replications for the standard subset AR tests, while those of the bootstrap subset AR tests are obtained with N = 1,000 replications and B = 299 bootstrap pseudo-samples of size n = 100. The nominal level of all tests is set at 5%.
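To make the design concrete, the following is a minimal Python sketch of this experiment for the 2SLS plug-in version of the subset AR test with asymptotic χ² critical values. It is our own illustration, not the paper's code: the function name `subset_ar_2sls` and the simplified scaling of Π (chosen so that the concentration parameter is roughly μ²) are assumptions.

```python
import numpy as np

def subset_ar_2sls(y, X, W, Z, beta0):
    # Subset AR statistic for H0: beta = beta0, with the nuisance gamma
    # replaced by the restricted 2SLS plug-in estimator (sketch of eq. (2.7)):
    # L*AR = e'P_Z e / (e'M_Z e / (n - L)), e = y(beta0) - W*gamma_hat
    n, L = Z.shape
    y0 = y - X * beta0                       # y(beta0) = y - X beta0
    PZ = Z @ np.linalg.solve(Z.T @ Z, Z.T)   # projection matrix P_Z
    g = (W @ PZ @ y0) / (W @ PZ @ W)         # restricted 2SLS estimate of gamma
    e = y0 - W * g                           # restricted residuals
    return (e @ PZ @ e) / L / ((e @ e - e @ PZ @ e) / (n - L))

rng = np.random.default_rng(2015)
n, L, rho, mu2 = 100, 3, 0.5, 1000           # strong-instrument design
beta, gamma = 2.0, -1.0
Z = rng.standard_normal((n, L))              # Z held fixed across replications
Sigma = np.full((3, 3), rho)
np.fill_diagonal(Sigma, 1.0)
Cchol = np.linalg.cholesky(Sigma)
Pi = np.sqrt(mu2 / n) * np.eye(L)[:, :2]     # simplified scaling: concentration ~ mu2
crit = 7.8147                                # 95% quantile of chi-squared, L = 3 df

rej, reps = 0, 500
for _ in range(reps):
    U = rng.standard_normal((n, 3)) @ Cchol.T   # rows of [eps : Vx : Vw]
    X = Z @ Pi[:, 0] + U[:, 1]
    W = Z @ Pi[:, 1] + U[:, 2]
    y = X * beta + W * gamma + U[:, 0]
    rej += L * subset_ar_2sls(y, X, W, Z, beta) > crit
print(rej / reps)   # should be close to the 5% nominal level here
```

Under strong identification the rejection frequency stays near 5%, which is the pattern Table 1 reports for the μ² = 1000 columns; replacing `mu2` by 13 or 0 reproduces the conservative behavior discussed below.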

Table 1 presents the empirical rejection frequencies of both the standard and bootstrap subset

AR tests that use the restricted LIML and 2SLS estimators as plug-in estimators. The first column

of the table shows the statistics. The second column indicates the number of instruments (L) used

in the plug-in procedure. The other columns show, for each value of endogeneityρ and instrument

quality µ2, the rejection frequencies.

First, we note that when the restricted LIML estimator is used as a plug-in estimator and the usual asymptotic χ² critical values are applied, the resulting subset AR test is overly conservative with weak instruments (see columns μ² ∈ {0, 13} in the table). These results are similar to those in Doko Tchatoka (2014). However, the test has rejections close to the nominal 5% level when identification is strong (see columns μ² = 1000 in the table), thus confirming our theoretical findings in Section 3.1. Note, however, that the rejection frequencies of this test are slightly greater than the nominal 5% level as both the endogeneity (ρ) and the number of instruments (L) increase (for example, with ρ = 0.9 and L = 20, the rejection frequency is around 7%). Meanwhile, the subset AR test that uses the restricted 2SLS estimator as the plug-in estimator is less conservative than the one based on LIML when instruments are weak and the asymptotic χ² critical values are applied (see columns μ² ∈ {0, 13} in the second block of Table 1). This is not surprising, because we always have AR(β_0, γ̂_{2SLS}) ≥ AR(β_0, γ̂_{LIML}). In particular, the rejection frequencies of this test are close to the nominal 5% level with weak instruments and small endogeneity (see columns ρ ∈ {0, 0.1} and μ² ∈ {0, 13} in the table), but they are greater than 5% for large endogeneity (see columns ρ = 0.9 and μ² ∈ {0, 13}). We also observe that this test sometimes over-rejects when identification is strong. For example, the rejection frequencies when μ² = 1000 (strong instruments) and ρ = 0.9 are 9.3%, 16.5%, and 40.7% for L = 5, 10, 20 instruments, respectively. So, while the subset AR test that uses the restricted 2SLS estimator seems to outperform the one that uses LIML under weak instruments and small endogeneity, its finite-sample size property is worse than that of the LIML-based test when identification is strong and endogeneity is large.

Second, we observe that bootstrapping does not improve the size properties of either test when identification is weak, as shown in columns μ² ∈ {0, 13} of the last block of Table 1. This confirms the inconsistency of the bootstrap for subset AR tests when identification is weak (see Section 4.2). However, the bootstrap provides a better approximation of the size of the tests than the asymptotic critical values when identification is strong and the number of instruments is moderate, especially in the case of the restricted LIML estimator. Note that even in the case of the restricted 2SLS estimator, the bootstrap improves the size of the test for a moderate or large number of instruments (L = 10, 20) and large endogeneity (ρ = 0.9). For example, the rejection frequencies of the test when μ² = 1000 and ρ = 0.9 are 6.4% and 14.1% for L = 10, 20, respectively. This represents a huge drop compared with the usual asymptotic critical values, where these rejection frequencies were 16.5% and 40.7%, respectively.

Table 1. Rejection frequencies (in %) at 5% nominal level

Asymptotic χ² critical values
                            ρ = 0             ρ = 0.1           ρ = 0.5           ρ = 0.9
Statistic           L  μ²:  0    13   1000    0    13   1000    0    13   1000    0    13   1000
Restricted LIML AR  3       0.4  0.9   5.2    0.3  0.9   5.3    0.1  0.5   5.0    0.3  1.8   5.2
  -                 5       0.3  0.2   5.4    0.3  0.3   4.1    0.1  0.3   5.6    0.3  0.9   5.6
  -                10       0.3  0.3   6.2    0.1  0.0   5.7    0.5  0.2   5.1    0.3  0.6   6.5
  -                20       0.4  0.2   4.3    0.3  0.4   4.6    0.4  0.6   6.2    0.2  0.5   7.0
Restricted 2SLS AR  3       3.0  3.5   6.4    2.3  3.4   5.6    2.5  4.5   5.6    3.2 12.9   6.4
  -                 5       3.5  3.4   6.8    3.4  2.8   7.0    3.1  5.0   5.5    3.5  9.5   9.3
  -                10       3.8  5.5   5.3    4.9  3.4   4.8    4.3  4.8   8.4    4.4  8.8  16.5
  -                20       6.5  6.3   7.3    6.3  6.9   7.3    5.5  8.2  11.6    7.7  7.7  40.7

Bootstrap critical values
                            ρ = 0             ρ = 0.1           ρ = 0.5           ρ = 0.9
Statistic           L  μ²:  0    13   1000    0    13   1000    0    13   1000    0    13   1000
Restricted LIML AR  3       0.7  1.2   5.8    0.4  0.7   4.8    1.3  0.7   4.8    1.6  1.0   4.9
  -                 5       0.2  0.4   5.4    0.6  0.4   4.3    0.7  0.3   5.6    1.4  1.5   5.4
  -                10       0.3  0.1   5.1    0.4  0.0   5.2    0.3  0.4   4.7    0.3  0.3   5.1
  -                20       0.2  0.1   3.2    0.1  0.1   3.1    0.0  0.6   3.6    0.2  0.0   3.8
Restricted 2SLS AR  3       4.9  5.9   6.8    3.2  5.4   4.1    3.6  5.5   3.6    3.5  4.3   7.6
  -                 5       2.2  3.2   6.7    4.0  2.3   4.1    2.2 10.0   2.1    0.6 10.7   7.3
  -                10       1.6  6.7   4.4    1.7  3.7   2.8    2.5  4.8   7.4    1.6  4.7   6.4
  -                20       4.5  3.2   5.7    4.4  3.3   3.3    2.8  2.7   6.2    4.0  5.6  14.1

6. Conclusions

In this paper, we focus on linear IV regressions and study the asymptotic validity of the bootstrap for the plug-in based subset AR type tests. Specifically, we suggest a bootstrap method similar to that of Moreira et al. (2009) for the score test of the null hypothesis specified on the full set of the structural parameters. We stress the fact that testing subset hypotheses is substantially more complex than testing joint hypotheses on the full set of parameters, so the validity of Moreira et al.'s (2009) bootstrap for tests of subvectors is not obvious, especially when identification is weak. Our analysis considers two plug-in subset AR statistics. The first one uses the restricted limited information maximum likelihood (LIML) estimator as the plug-in estimator of the nuisance parameters, and the second uses the restricted two-stage least squares (2SLS) estimator as the plug-in method.

We provide a characterization of the asymptotic distributions of both the standard and proposed bootstrap AR statistics under the subset null hypothesis of interest. Our results provide some new insights and extensions of earlier studies. We show that when identification is strong, the bootstrap provides a high-order approximation of the null limiting distributions of both plug-in subset statistics. However, the bootstrap is inconsistent when instruments are weak. This contrasts with the bootstrap of the AR statistic for the null hypothesis specified on the full vector of structural parameters, which remains valid even when identification is weak; see Moreira et al. (2009). The inconsistency of the bootstrap is mainly due to its failure to mimic the concentration parameter that characterizes the strength of identification of the nuisance parameters (which are not specified by the null hypothesis of interest) when instruments are weak. Furthermore, we show that alternative resampling techniques, such as subsampling, which usually offer a good approximation of the size of the tests in many cases where the bootstrap typically fails [see Andrews and Guggenberger (2009)], are also invalid under Staiger and Stock's (1997) local-to-zero weak instruments asymptotics; see Section A.1 of the appendix. We present a Monte Carlo experiment that confirms our theoretical findings.

A. Appendix

We begin by establishing the inconsistency of the subsampling method. The subsequent subsections present the supplemental lemmata and proofs.

A.1. Subsampling inconsistency under weak instruments

In this section, we establish the invalidity of subsampling for AR(β_0, γ̂_j), j ∈ {LIML, 2SLS}, under Staiger and Stock's (1997) local-to-zero weak instruments asymptotics.

Let b be the subsample size, G^b_{AR_j}(τ) the empirical distribution function of the subsample statistics {AR^b_l(β_0, γ̂_{j,b}) : l = 1, ..., q_n} evaluated at τ ∈ R, and c^b_{AR_j} ≡ c^b_{AR_j}(α) the 1−α sample quantile of G^b_{AR_j}(·), where q_n is the number of different subsamples of size b. With i.i.d. observations, there are q_n = n!/((n−b)! b!) different subsamples of size b. With time series observations, there are q_n = n − b + 1 subsamples, each consisting of b consecutive observations. We assume that b > L.
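For example, the two subsample counts q_n can be computed directly; the values of n and b below are an arbitrary illustration, not values used in the paper.

```python
from math import comb

n, b = 100, 25
q_iid = comb(n, b)   # i.i.d. case: q_n = n! / ((n - b)! b!)
q_ts = n - b + 1     # time-series case: blocks of b consecutive observations
print(q_ts)          # 76
```

The i.i.d. count grows combinatorially in n, which is why practical implementations typically evaluate the statistic on a random selection of subsamples rather than on all q_n of them.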

The subsampled AR statistic, AR^b_l(β_0, γ̂_{j,b}), is defined as

AR^b_l(β_0, γ̂_{j,b}) = (1/L)‖S^b_l(β_0, γ̂_{j,b})‖², (A.1)

where S^b_l(β_0, γ̂_{j,b}) = (Z′_b Z_b)^{-1/2} Z′_b Y_b(β_0) r_{j,b} (r′_{j,b} Ω^W_b r_{j,b})^{-1/2}, Ω^W_b = (b−L)^{-1} Y_b(β_0)′ M_{Z,b} Y_b(β_0), Y_b(β_0) = [y_b − X_b β_0 : W_b], r_{j,b} = (1, −γ̂_{j,b})′, j ∈ {2SLS, LIML}, and

γ̂_{j,b} = [W′_b (P_{Z_b} − κ_{j,b} M_{Z_b}) W_b]^{-1} W′_b (P_{Z_b} − κ_{j,b} M_{Z_b}) (y_b − X_b β_0), (A.2)

with κ_{2SLS,b} = 0, while κ_{LIML,b} equals (b−L)^{-1} times the smallest root of the characteristic polynomial |κ Ω^W_b − Y_b(β_0)′ P_{Z_b} Y_b(β_0)| = 0. As in the bootstrap case, the subsampling subset AR test based on AR^b_l(β_0, γ̂_{j,b}) rejects H_0 if

AR^b_l(β_0, γ̂_{j,b}) > c^b_{AR_j}(α). (A.3)

In what follows, P_b denotes the probability under the subsampling empirical distribution function conditional on X_n = {(Y′_1, Z′_1), ..., (Y′_n, Z′_n)}, and E_b its corresponding expectation operator.
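Operationally, c^b_{AR_j}(α) is just the empirical 1−α quantile of the subsample statistics. A schematic Python sketch follows; `ar_stat` is a placeholder for the subset AR statistic AR^b_l, and random subsamples stand in for full enumeration of the q_n subsamples, so this is an assumed implementation detail, not the paper's algorithm.

```python
import numpy as np

def subsample_critical_value(data, b, alpha, ar_stat, n_draws=999, seed=0):
    # Empirical 1 - alpha quantile of ar_stat over subsamples of size b.
    # Random subsampling without replacement is a feasible stand-in for
    # enumerating all q_n = C(n, b) subsamples in the i.i.d. case.
    rng = np.random.default_rng(seed)
    n = len(data)
    stats = [ar_stat(data[rng.choice(n, size=b, replace=False)])
             for _ in range(n_draws)]
    return np.quantile(stats, 1.0 - alpha)

# Toy usage with a placeholder statistic (NOT the subset AR statistic):
data = np.random.default_rng(1).standard_normal(200)
c = subsample_critical_value(data, b=50, alpha=0.05,
                             ar_stat=lambda s: len(s) * s.mean() ** 2)
```

The decision rule (A.3) then rejects H_0 when the full-sample statistic exceeds this quantile; the results below show that, under weak instruments, this quantile does not approximate the correct null distribution.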

Lemma A.1 characterizes the asymptotic null distributions of the subsampled statistics γ̂_{j,b}, r′_{j,b} Ω^W_b r_{j,b}, and S^b_l(β_0, γ̂_{j,b}) under Staiger and Stock's (1997) local-to-zero weak instruments setup, while Theorem A.2 is concerned with the behavior of the subsampled subset AR statistics AR^b_l(β_0, γ̂_{j,b}).

Lemma A.1 Suppose Assumptions 2.2-2.3 are satisfied and H_0 holds. Let Π_w = C_{0w}/√n, where C_{0w} ∈ R^L is fixed. If b → ∞ and b/n → 0 as n → ∞, then the following convergence holds jointly for j ∈ {LIML, 2SLS}:

(a) γ̂_{j,b} − γ_0 | X_n →d σ^{1/2}_{εε} σ^{-1/2}_{VwVw} Δ^S_j a.s.;

(b) r′_{j,b} Ω^W_b r_{j,b} | X_n →d σ_{εε}(1 − 2ρ_{Vwε} Δ^S_j + (Δ^S_j)²) a.s.;

(c) S^b_l(β_0, γ̂_{j,b}) | X_n →d (1 − 2ρ_{Vwε} Δ^S_j + (Δ^S_j)²)^{-1/2} S^S_j a.s.,

where Δ^S_j = (ψ′_{Vw} ψ_{Vw} − κ^S_j)^{-1}(ψ′_{Vw} ψ_ε − κ^S_j ρ_{Vwε}), Ψ^S = ψ_{Vw}, κ^S_{2SLS} = 0 and κ^S_{LIML} is the smallest root of the determinantal equation |(ψ_ε : ψ_{Vw})′(ψ_ε : ψ_{Vw}) − κ Σ_ρ| = 0; S^S_j = ψ_ε − ψ_{Vw} Δ^S_j.

Remark. As in the case of the bootstrap [see Lemma 4.3 (a)-(c)], the convergence results in Lemma A.1 (a)-(c) differ from those of Lemma 3.2 (a)-(c). So, subsampling fails to mimic the asymptotic distributions of the statistics r̃′_j Ω_{1W} r̃_j and S(β_0, γ̂_j) under H_0 and weak instruments.

We can now state the following theorem on the inconsistency of subsampling for the subset AR tests.

Theorem A.2 Suppose Assumptions 2.2-2.3 are satisfied and H_0 holds. Let Π_w = C_{0w}/√n, where C_{0w} ∈ R^L is fixed. If b → +∞ and b/n → 0 as n → +∞, then we have:

AR^b_l(β_0, γ̂_{j,b}) | X_n →d ‖(1 − 2ρ_{Vwε} Δ^S_j + (Δ^S_j)²)^{-1/2} S^S_j‖² a.s. for all j ∈ {LIML, 2SLS}.

Remark. Theorem A.2 confirms our previous analysis in Lemma A.1. It is straightforward to see that the limiting distributions in Theorem A.2 differ from those in Theorem 3.3, thus indicating the invalidity of subsampling with weak instruments.

An alternative subsampling method used in the literature is the jackknife resampling technique; see Wu (1990). This method differs from the previous subsampling procedure only by assuming that b → +∞ and b/n → f > 0 as n → +∞. Berkowitz, Caner and Fang (2012) show that this method yields a test that controls the level of the AR test for the full-vector hypothesis, without identifying assumptions on the model parameters, when the IVs violate the orthogonality conditions locally. Lemma A.3 and Theorem A.4 show that this method is also invalid for the subset AR type tests under weak instruments.

Lemma A.3 Suppose Assumptions 2.2-2.3 are satisfied and H_0 holds. Let Π_w = C_{0w}/√n, where C_{0w} ∈ R^L is fixed. If b → +∞ and b/n → f > 0 as n → +∞, then the following convergence holds jointly for j ∈ {LIML, 2SLS}:

(a) γ̂_{j,b} − γ_0 | X_n →d σ^{1/2}_{εε} σ^{-1/2}_{VwVw} Δ^J_j a.s.;

(b) r′_{j,b} Ω^W_b r_{j,b} | X_n →d σ_{εε}(1 − 2ρ_{Vwε} Δ^J_j + (Δ^J_j)²) a.s.;

(c) S^b_l(β_0, γ̂_{j,b}) | X_n →d (1 − 2ρ_{Vwε} Δ^J_j + (Δ^J_j)²)^{-1/2} S^J_j a.s.,

where Δ^J_j = (Ψ^{J′}Ψ^J − κ^J_j)^{-1}(Ψ^{J′}ψ_ε − κ^J_j ρ_{Vwε}), Ψ^J = f σ^{-1/2}_{VwVw} Q^{1/2}_Z C_{0w} + ψ_{Vw} = fΨ + (1 − f)ψ_{Vw}, κ^J_{2SLS} = 0 and κ^J_{LIML} is the smallest root of the determinantal equation |(ψ_ε : Ψ^J)′(ψ_ε : Ψ^J) − κ Σ_ρ| = 0; S^J_j = ψ_ε − Ψ^J Δ^J_j.

Theorem A.4 Suppose Assumptions 2.2-2.3 are satisfied and H_0 holds. Let Π_w = C_{0w}/√n, where C_{0w} ∈ R^L is fixed. If b → +∞ and b/n → f > 0 as n → +∞, then we have:

AR^b_l(β_0, γ̂_{j,b}) | X_n →d ‖(1 − 2ρ_{Vwε} Δ^J_j + (Δ^J_j)²)^{-1/2} S^J_j‖² a.s. for all j ∈ {LIML, 2SLS}.

The inconsistency of the jackknife method follows straightforwardly by observing that the limiting distribution of AR^b_l(β_0, γ̂_{j,b}) in Theorem A.4 differs significantly from those of Theorem 3.3 for all j ∈ {LIML, 2SLS}.

A.2. Supplemental lemmata

Lemma A.5 Suppose that Assumptions 2.2-2.3 and H_0 are satisfied and that Π_w ≠ 0 is fixed. Then for some integer r ≥ 1, we have:

sup_{τ∈R} |P[S(β_0, γ̂_j) ≤ τ] − Φ(τ) − Σ^r_{h=1} n^{-h/2} p^h_{S_j}(τ; F, β_0, γ, Π_x, Π_w) φ(τ)| = o(n^{-r/2})

for all j ∈ {LIML, 2SLS}, where Φ(·) and φ(·) are the cdf and pdf of a standard normal random variable, and the p^h_{S_j} are polynomials in τ with coefficients depending on β_0, γ, Π_x, Π_w and the moments of the distribution F of R_n.

Lemma A.6 Suppose that Assumptions 2.2-2.3 and H_0 are satisfied and that Π_w ≠ 0 is fixed. Then for some integer r ≥ 1, we have:

sup_{τ∈R} |P*[S*(β_0, γ*_j) ≤ τ] − Φ(τ) − Σ^r_{h=1} n^{-h/2} p^h_{S_j}(τ; F_n, β_0, γ̂_j, Π̂_x, Π̂_w) φ(τ)| = o(n^{-r/2})

for all j ∈ {LIML, 2SLS}, where Φ(·) and φ(·) are the cdf and pdf of a standard normal random variable, and the p^h_{S_j} are polynomials in τ with coefficients depending on β_0, γ̂_j, Π̂_x, Π̂_w and the moments of the distribution F_n of R*_n.


Lemma A.7 Suppose Assumptions 2.2-2.3 are satisfied and H_0 holds. If Π_w = C_{0w}/√n, where C_{0w} ∈ R^L is fixed, then for j ∈ {2SLS, LIML} we have:

(a) E*[(v*_{1,i} − V*′_{w,i} γ̂_j)²] →d σ_{εε,j} = σ_{εε}(1 − 2ρ_{Vwε} Δ_j + (Δ_j)²) a.s.;

(b) E*[V*_{w,i}(v*_{1,i} − V*′_{w,i} γ̂_j)] →d σ_{Vwε,j} = σ_{Vwε} − σ^{1/2}_{εε} σ^{1/2}_{VwVw} Δ_j a.s.;

(c) E*[V*_{w,i} V*′_{w,i}] →p σ_{VwVw} a.s.

Lemma A.8 Suppose Assumptions 2.2-2.3 are satisfied and H_0 holds. Let Π_w = C_{0w}/√n, where C_{0w} ∈ R^L is fixed. If, for some δ > 0, E(‖Z_i‖^{4+δ}, ‖V_i‖^{2+δ}) < ∞, then we have:

[(Z*′Z*/n)^{-1/2}(Z*′V*/√n)(Ω̂_W)^{-1/2} ; n^{1/2}(n^{-1}M*′1_n − n^{-1}M′1_n)] | X_n →d (ψ′_{V1}, ψ′_{Vw}, ψ′_m)′ ∼ N(0, [I_{2L}, 0; 0, Ω_{mm}]) a.s.,

where M = (m_1, ..., m_n)′, m_i = vech(Z_i Z′_i) ∈ R^{L(L+1)/2}, Ω_{mm} = Var(m_i), M* = (m*_1, ..., m*_n)′ and m*_i = vech(Z*_i Z*′_i) are the bootstrap counterparts of M and m_i, and 1_n is an n×1 vector of ones.

A.3. Proofs

PROOF OF LEMMA A.5 To shorten the exposition, we focus on the proof for S(β_0, γ̂_{2SLS}); the proof for S(β_0, γ̂_{LIML}) can be deduced easily by following steps similar to those presented here. First, observe that we can write S(β_0, γ̂_{2SLS}) = √n N/D^{1/2} under the subset null hypothesis H_0, where

N = (Z′Z/n)^{-1/2}[Z′y(β_0)/n − (Z′W/n) γ̂_{2SLS}]
  = (Z′Z/n)^{-1/2}{Z′ε/n − (Z′W/n)[(W′Z/n)(Z′Z/n)^{-1}(Z′W/n)]^{-1}(W′Z/n)(Z′Z/n)^{-1}(Z′ε/n)}, (A.4)

D = y′(β_0) M_Z y(β_0)/(n−L) − 2[y′(β_0) M_Z W/(n−L)] γ̂_{2SLS} + [W′M_Z W/(n−L)](γ̂_{2SLS})², (A.5)

γ̂_{2SLS} = [(W′Z/n)(Z′Z/n)^{-1}(Z′W/n)]^{-1}(W′Z/n)(Z′Z/n)^{-1}(Z′y(β_0)/n), and y(β_0) = y − Xβ_0. So, under H_0 and from (A.4)-(A.5), we can write S(β_0, γ̂_{2SLS}) as

S(β_0, γ̂_{2SLS}) = √n H(R_n) = √n[H(R_n) − H(μ)], (A.6)

where H(·) is a real-valued Borel measurable function on R^K whose derivatives of order s ≥ 3 and lower are continuous in a neighborhood of μ = E(R_n) when Π_w ≠ 0 is fixed, and H(μ) = 0 under H_0. Note that the derivatives of order s ≥ 3 and lower of H(·) are not well-defined when Π_w = 0, and do not even exist if Π_w = C_{0w} c_n for any sequence c_n ↓ 0 [similar to Moreira et al. (2009, footnote 2) and Doko Tchatoka (2013)]. The result in Lemma A.5 follows by applying Bhattacharya and Ghosh (1978, Theorem 2) to (A.6) with s − 2 = r.

PROOF OF THEOREM 3.1 From (2.7), we have L × AR(β_0, γ̂_j) = ‖S(β_0, γ̂_j)‖². We want to approximate P[AR(β_0, γ̂_j) ≤ τ] uniformly in τ under H_0. First, we can write P[AR(β_0, γ̂_j) ≤ τ] as

P[AR(β_0, γ̂_j) ≤ τ] = P[AR(β_0, γ̂_j) ∈ C_τ],

where the C_τ = {x ∈ R : x² ≤ τ} are convex sets. From Bhattacharya and Rao (1976, Corollary 3.2), we have sup_{τ∈R} Φ((∂C_τ)^ε) ≤ d·ε for some constant d and ε > 0. So, Bhattacharya and Ghosh (1978, Theorem 1) holds with B = C_τ and W_n = S(β_0, γ̂_j), j ∈ {2SLS, LIML}. By using the approximation of P[S(β_0, γ̂_j) ≤ τ] in Lemma A.5 and the definition of C_τ, Theorem 3.1 follows directly from the fact that the odd terms of the expansion are even polynomials.

PROOF OF LEMMA 3.2 (a) We begin with the result for γ̂_{2SLS}. We have

(σ_{VwVw})^{-1} W′P_Z W = (σ_{VwVw})^{-1}(Π′_w Z′Z Π_w + V′_w Z Π_w + Π′_w Z′V_w + V′_w P_Z V_w)
= (σ^{-1/2}_{VwVw} C_{0w})′(Z′Z/n)(σ^{-1/2}_{VwVw} C_{0w}) + (σ^{-1/2}_{VwVw} Z′V_w/√n)′(σ^{-1/2}_{VwVw} C_{0w}) + (σ^{-1/2}_{VwVw} C_{0w})′(σ^{-1/2}_{VwVw} Z′V_w/√n) + (σ^{-1/2}_{VwVw} Z′V_w/√n)′(Z′Z/n)^{-1}(σ^{-1/2}_{VwVw} Z′V_w/√n)
→d C′_{0w} σ^{-1/2}_{VwVw} Q_Z σ^{-1/2}_{VwVw} C_{0w} + ψ′_{Vw} σ^{-1/2}_{VwVw} Q^{1/2}_Z C_{0w} + C′_{0w} σ^{-1/2}_{VwVw} Q^{1/2}_Z ψ_{Vw} + ψ′_{Vw} ψ_{Vw}
= (Q^{1/2}_Z σ^{-1/2}_{VwVw} C_{0w} + ψ_{Vw})′(Q^{1/2}_Z σ^{-1/2}_{VwVw} C_{0w} + ψ_{Vw}) = Ψ′Ψ.

By the same token, we have

(σ_{εε} σ_{VwVw})^{-1/2} W′P_Z ε = (σ_{εε} σ_{VwVw})^{-1/2}(Π′_w Z′ε + V′_w P_Z ε)
= (σ^{-1/2}_{VwVw} C_{0w})′(Z′Z/n)^{1/2}(Z′Z/n)^{-1/2}(σ^{-1/2}_{εε} Z′ε/√n) + (σ^{-1/2}_{VwVw} Z′V_w/√n)′(Z′Z/n)^{-1}(σ^{-1/2}_{εε} Z′ε/√n)
→d C′_{0w} σ^{-1/2}_{VwVw} Q^{1/2}_Z ψ_ε + ψ′_{Vw} ψ_ε = Ψ′ψ_ε.

Because γ̂_{2SLS} − γ_0 = (W′P_Z W)^{-1} W′P_Z ε, the result follows immediately.

For the case of γ̂_{LIML}, we note that κ_{LIML} is the smallest root of the characteristic polynomial

|κ Ω̂_W − (y(β_0) : W)′ P_Z (y(β_0) : W)| = 0.

Observe that P_Z(y(β_0) : W) = P_Z{ZΠ_w(γ : 1) + (ε : V_w)[1, 0; γ, 1]}. Substituting this into the characteristic polynomial, pre-multiplying by [1, 0; −γ, 1]′ and post-multiplying by [1, 0; −γ, 1] yields

|κ Σ − (ε : ZΠ_w + V_w)′ P_Z (ε : ZΠ_w + V_w)| = 0,

where Σ = [1, 0; −γ, 1]′ Ω̂_W [1, 0; −γ, 1] →p [σ_{εε}, σ_{εVw}; σ_{Vwε}, σ_{VwVw}].

From Theorem 1(a) and Theorem 2 in Staiger and Stock (1997), we get

(σ^{-1}_{εε} σ_{VwVw})^{1/2}(γ̂_{LIML} − γ_0) →d Δ_{LIML} = (Ψ′Ψ − κ_{LIML})^{-1}(Ψ′ψ_ε − κ_{LIML} ρ_{Vwε}),

where κ_{LIML} is now the smallest root of the characteristic polynomial

|(ψ_ε : Ψ)′(ψ_ε : Ψ) − κ Σ_ρ| = 0, Σ_ρ = [1, ρ_{Vwε}; ρ_{Vwε}, 1].

Thus, the result follows.

(b) Similarly, we have

r̃′_j Ω_{1W} r̃_j = (n−L)^{-1}(y(β_0) − Wγ̂_j)′ M_Z (y(β_0) − Wγ̂_j)
= (n−L)^{-1} ε′ε − (n−L)^{-1}(γ̂_j − γ_0)′ W′M_Z ε − (n−L)^{-1} ε′M_Z W (γ̂_j − γ_0) + (n−L)^{-1}(γ̂_j − γ_0)′ W′M_Z W (γ̂_j − γ_0)
→d σ_{εε} − (σ^{1/2}_{εε} σ^{-1/2}_{VwVw} Δ_j)′ σ_{Vwε} − σ_{εVw}(σ^{1/2}_{εε} σ^{-1/2}_{VwVw} Δ_j) + (σ^{1/2}_{εε} σ^{-1/2}_{VwVw} Δ_j)′ σ_{VwVw}(σ^{1/2}_{εε} σ^{-1/2}_{VwVw} Δ_j)
= σ_{εε}(1 − 2ρ_{Vwε} Δ_j + (Δ_j)²).

(c) First, note that S(β_0, γ̂_j) = (Z′Z)^{-1/2} Z′(y(β_0) : W) r̃_j (r̃′_j Ω_{1W} r̃_j)^{-1/2}. Moreover, we have

(Z′Z)^{-1/2} Z′(y(β_0) : W) r̃_j = (Z′Z/n)^{-1/2} Z′ε/√n + (Z′Z/n)^{-1/2}(Z′W/√n)(γ_0 − γ̂_j)
= σ^{1/2}_{εε} {σ^{-1/2}_{εε}(Z′Z/n)^{-1/2} Z′ε/√n + [σ^{-1/2}_{VwVw}(Z′Z/n)^{1/2} C_{0w} + σ^{-1/2}_{VwVw}(Z′Z/n)^{-1/2} Z′V_w/√n] σ^{-1/2}_{εε} σ^{1/2}_{VwVw}(γ_0 − γ̂_j)}
→d σ^{1/2}_{εε}[ψ_ε − Ψ(Ψ′Ψ − κ_j)^{-1}(Ψ′ψ_ε − κ_j ρ_{Vwε})] = σ^{1/2}_{εε}(ψ_ε − ΨΔ_j).

Combining this with (b), the result follows.

PROOF OF THEOREM 3.3 The proof follows immediately from equation (2.7) and Lemma 3.2.

PROOF OF LEMMA A.6 The proof follows the same steps as Theorem 3 of Moreira et al. (2009) and is therefore omitted.

PROOF OF LEMMA 4.1 The proof follows the same steps as Theorem 3 of Moreira et al. (2009) and is therefore omitted.

PROOF OF THEOREM 4.2 The proof is similar to Hall and Horowitz (1996), exploiting Theorem 3.1 and Lemma 4.1, and is hence omitted.

PROOF OF LEMMA A.7 (a) First, we have

E*[(v*_{1,i} − V*′_{w,i} γ̂_j)²] = n^{-1} Σ^n_{i=1}(v_{1,i} − V′_{w,i} γ̂_j)² = n^{-1} v′_1 v_1 − 2(n^{-1} v′_1 V_w) γ̂_j + (n^{-1} V′_w V_w)(γ̂_j)²
→d σ_{εε}(1 − 2ρ_{Vwε} Δ_j + (Δ_j)²).

(b) Similarly to (a), we have

E*[V*_{w,i}(v*_{1,i} − V*′_{w,i} γ̂_j)] = n^{-1} V′_w v_1 − (n^{-1} V′_w V_w) γ̂_j
= n^{-1} V′_w v_1 − (n^{-1} V′_w V_w) γ_0 + (n^{-1} V′_w V_w)(γ_0 − γ̂_j)
→d σ_{Vwε} − σ^{1/2}_{εε} σ^{1/2}_{VwVw} Δ_j.

(c) By the same means, we find E*(V*_{w,i} V*′_{w,i}) = n^{-1} V′_w V_w →p σ_{VwVw} a.s.

PROOF OF LEMMA A.8 The proof follows closely Lemma A.2 of Moreira et al. (2009). Let (c′, d′)′ be a nonzero vector with c = (c′_1, c′_w)′ ∈ R^{2L} and d ∈ R^{L(L+1)/2}. Define

X*_{n,i} = [c′(V*_i ⊗ Z*_i) + d′(m*_i − m̄)]/√n,

where V*_i = (v*_{1,i}, V*′_{w,i})′ is the i-th bootstrap draw of the (re-centered) reduced-form residuals and m̄ = n^{-1} Σ^n_{i=1} m_i. We use the Cramér-Wold device to verify the conditions of the Liapunov central limit theorem for X*_{n,i}.

(a) E*[X*_{n,i}] = 0 follows from the independence of the bootstrap draws and E*[V*_i] = 0.

(b) By noting that E*[V*_i V*′_i] = n^{-1} V′V, where V = (v_1 : V_w), and E*[Z*_i Z*′_i] = n^{-1} Z′Z, and that n^{-1} V′V →p Ω_W and n^{-1} Z′Z →p Q_Z, it follows immediately that

E*[X*²_{n,i}] = n^{-1}{c′[(n^{-1} V′V) ⊗ (n^{-1} Z′Z)]c + d′ Ω̄_mm d} < +∞,

where Ω̄_mm = n^{-1} Σ^n_{i=1}(m_i − m̄)(m_i − m̄)′.

(c) By using the same argument as in Lemma A.2 of Moreira et al. (2009), we have lim_{n→∞} Σ^n_{i=1} E*[|X*_{n,i}|^{2+δ}] = 0 a.s. Since E*[n^{-1} Z*′Z*] = n^{-1} Z′Z and n^{-1} Z*′Z* − n^{-1} Z′Z | X_n →a.s. 0 by the Markov law of large numbers (LLN), we also have n^{-1} Z*′Z* | X_n →p Q_Z a.s. In addition, it is easy to see that Ω̂_W →p Ω_W. Lemma A.8 follows by the Liapunov central limit theorem.

PROOF OF LEMMA 4.3

(a) As before, we begin with the 2SLS estimator. Let C^B_{0w} = C_{0w} + (n^{-1} Z′Z)^{-1}(n^{-1/2} Z′V_w), so that √n Π̂_w = C^B_{0w}. Then

(E*[V*_{w,i} V*′_{w,i}])^{-1} W*′P_{Z*} W* = (E*[V*_{w,i} V*′_{w,i}])^{-1}(Π̂′_w Z*′Z* Π̂_w + V*′_w Z* Π̂_w + Π̂′_w Z*′V*_w + V*′_w P_{Z*} V*_w)
= (E*[V*_{w,i} V*′_{w,i}])^{-1}[C^{B′}_{0w}(Z*′Z*/n) C^B_{0w} + (V*′_w Z*/√n) C^B_{0w} + C^{B′}_{0w}(Z*′V*_w/√n) + (V*′_w Z*/√n)(Z*′Z*/n)^{-1}(Z*′V*_w/√n)],

and similarly, we have

(E*[(v*_{1,i} − V*′_{w,i} γ̂_j)²] E*[V*_{w,i} V*′_{w,i}])^{-1/2} W*′P_{Z*}(V*_1 − V*_w γ̂_j)
= (E*[(v*_{1,i} − V*′_{w,i} γ̂_j)²] E*[V*_{w,i} V*′_{w,i}])^{-1/2}[Π̂′_w Z*′(V*_1 − V*_w γ̂_j) + V*′_w P_{Z*}(V*_1 − V*_w γ̂_j)]
= (E*[(v*_{1,i} − V*′_{w,i} γ̂_j)²] E*[V*_{w,i} V*′_{w,i}])^{-1/2}[C^{B′}_{0w}(Z*′(V*_1 − V*_w γ̂_j)/√n) + (V*′_w Z*/√n)(Z*′Z*/n)^{-1}(Z*′(V*_1 − V*_w γ̂_j)/√n)].

From Lemmata A.7-A.8, we have n^{-1} Z*′Z* | X_n →a.s. E(Z_i Z′_i) = Q_Z, and since C^B_{0w} = C_{0w} + (n^{-1} Z′Z)^{-1}(n^{-1/2} Z′V_w), it follows that

(E*[V*_{w,i} V*′_{w,i}])^{-1} W*′P_{Z*} W* | X_n →d (σ^{-1/2}_{VwVw} Q^{1/2}_Z C_{0w} + 2ψ_{Vw})′(σ^{-1/2}_{VwVw} Q^{1/2}_Z C_{0w} + 2ψ_{Vw}) = (Ψ + ψ_{Vw})′(Ψ + ψ_{Vw}) ≡ Ψ^{B′}Ψ^B a.s.

Similarly, we have

(E*[(v*_{1,i} − V*′_{w,i} γ̂_j)²] E*[V*_{w,i} V*′_{w,i}])^{-1/2} W*′P_{Z*}(V*_1 − V*_w γ̂_j) | X_n →d Ψ^{B′} ψ_{ε,2SLS} a.s.,

where ψ_{ε,2SLS} = ψ_{V1} − ψ_{Vw}(γ_0 + σ^{1/2}_{εε} σ^{-1/2}_{VwVw} Δ_{2SLS}) = ψ_ε − σ^{1/2}_{εε} σ^{-1/2}_{VwVw} Δ_{2SLS} ψ_{Vw}.

For LIML, observe that κ*_{LIML} is the smallest root of

|κ Σ*_{LIML} − (V*_1 − V*_w γ̂_{LIML} : Z*Π̂_w + V*_w)′ P_{Z*}(V*_1 − V*_w γ̂_{LIML} : Z*Π̂_w + V*_w)| = 0,

where Σ*_{LIML} = [1, 0; −γ̂_{LIML}, 1]′ Ω*_W [1, 0; −γ̂_{LIML}, 1].

So, by following the same steps as in the case of 2SLS and using Lemma 3.2, we get (conditionally on X_n)

{E*[V*_{w,i} V*′_{w,i}] / E*[(v*_{1,i} − V*′_{w,i} γ̂_{LIML})²]}^{1/2}(γ*_{LIML} − γ̂_{LIML}) →d (Ψ^{B′}Ψ^B − κ^B_j)^{-1}(Ψ^{B′} ψ_{ε,LIML} − κ^B_j ρ_{Vwε,LIML}) a.s.,

where κ^B_j is the smallest root of the characteristic polynomial

|(ψ_{ε,LIML} : Ψ^B)′(ψ_{ε,LIML} : Ψ^B) − κ Σ_{ρ,LIML}| = 0, Σ_{ρ,LIML} = [1, ρ_{Vwε,LIML}; ρ_{Vwε,LIML}, 1].

This establishes Lemma 4.3-(a).

(b) Similarly, we have

r*′_j Ω*_{1W} r*_j = (n−L)^{-1}(y*(β_0) − W*γ*_j)′ M_{Z*}(y*(β_0) − W*γ*_j)
= (n−L)^{-1}(V*_1 − V*_w γ̂_j)′(V*_1 − V*_w γ̂_j) − (n−L)^{-1}(γ*_j − γ̂_j)′ W*′M_{Z*}(V*_1 − V*_w γ̂_j) − (n−L)^{-1}(V*_1 − V*_w γ̂_j)′ M_{Z*} W*(γ*_j − γ̂_j) + (n−L)^{-1}(γ*_j − γ̂_j)′ W*′M_{Z*} W*(γ*_j − γ̂_j).

Since n^{-1} Z*′Z* | X_n →a.s. Q_Z and Ω*_{1W} | X_n →a.s. Ω_W, we have

r*′_j Ω*_{1W} r*_j | X_n →d σ_{εε,j}(1 − 2ρ_{Vwε,j} Δ^B_j + (Δ^B_j)²) a.s.

(c) Again, we have

(Z*′Z*)^{-1/2} Z*′(y*(β_0) : W*) r*_j = (Z*′Z*/n)^{-1/2} Z*′(V*_1 − V*_w γ̂_j)/√n + (Z*′Z*/n)^{-1/2} Z*′W*(γ̂_j − γ*_j)/√n
= σ^{1/2}_{εε,j} {σ^{-1/2}_{εε,j}(Z*′Z*/n)^{-1/2} Z*′(V*_1 − V*_w γ̂_j)/√n + [σ^{-1/2}_{VwVw}(Z*′Z*/n)^{1/2} C^B_{0w} + σ^{-1/2}_{VwVw}(Z*′Z*/n)^{-1/2} Z*′V*_w/√n] σ^{-1/2}_{εε,j} σ^{1/2}_{VwVw}(γ̂_j − γ*_j)} + o_p(1).

Thus, by proceeding as in (a) and (b), we get

(Z*′Z*)^{-1/2} Z*′(y*(β_0) : W*) r*_j | X_n →d σ^{1/2}_{εε,j}(ψ_{ε,j} − Ψ^B Δ^B_j) a.s.,

where Δ^B_j = (Ψ^{B′}Ψ^B − κ^B_j)^{-1}(Ψ^{B′} ψ_{ε,j} − κ^B_j ρ_{Vwε,j}). The final result follows by combining this with (b).

PROOF OF THEOREM 4.4 The proof follows immediately from equation (4.4) and Lemma 4.3.

PROOF OF THEOREM 4.5 From Theorems 3.3 and 4.4, it is clear that

|P*[AR*(β_0, γ*_j) ≤ τ] − P[AR(β_0, γ̂_j) ≤ τ]| = |P[AR(β_0, γ̂_j) > τ] − P*[AR*(β_0, γ*_j) > τ]| = O_p(1) a.s.

for all τ ∈ R. By choosing τ = c*_{AR*_j}(α) ≡ c*_{AR*_j}, we have P*[AR*(β_0, γ*_j) > c*_{AR*_j}] = α, so that we get |P[AR(β_0, γ̂_j) > c*_{AR*_j}] − α| = O_p(1) a.s.

PROOF OF LEMMA A.1 (a) From the subsampling algorithm, we have b → +∞ and f_n = b/n → 0 as n → +∞. Thus, we get b^{-1} Z′_b Z_b →a.s. Q_Z given the realized sample X_n. Also, conditional on X_n, we have

b^{-1/2} Z′_b V_{w,b} = b^{1/2}[b^{-1} Σ^b_{i=1} Z_{i,b} V_{wi,b} − E_b(Z_{i,b} V_{wi,b})] + b^{1/2} E_b(Z_{i,b} V_{wi,b})
= b^{1/2}[b^{-1} Σ^b_{i=1} Z_{i,b} V_{wi,b} − n^{-1} Σ^n_{i=1} Z_i V_{wi}] + f^{1/2}_n (n^{-1/2} Σ^n_{i=1} Z_i V_{wi}) →d ψ_{ZVw} a.s.

Similarly, we have b^{-1/2} Z′_b ε_b | X_n →d ψ_{Zε} a.s. and

b^{-1/2} Z′_b W_b = b^{-1/2} Z′_b(Z_b Π_w + V_{w,b}) = f^{1/2}_n (b^{-1} Z′_b Z_b) C_{0w} + b^{-1/2} Z′_b V_{w,b} | X_n →d ψ_{ZVw} a.s.

So, we get

γ̂_{2SLS,b} − γ_0 = (W′_b P_{Z,b} W_b)^{-1} W′_b P_{Z,b} ε_b
= [(b^{-1/2} W′_b Z_b)(b^{-1} Z′_b Z_b)^{-1}(b^{-1/2} Z′_b W_b)]^{-1}(b^{-1/2} W′_b Z_b)(b^{-1} Z′_b Z_b)^{-1}(b^{-1/2} Z′_b ε_b)
| X_n →d (ψ′_{ZVw} Q^{-1}_Z ψ_{ZVw})^{-1} ψ′_{ZVw} Q^{-1}_Z ψ_{Zε} = σ^{1/2}_{εε} σ^{-1/2}_{VwVw} Δ^S_{2SLS} a.s.,

where Δ^S_{2SLS} = (ψ′_{Vw} ψ_{Vw})^{-1} ψ′_{Vw} ψ_ε. The result for LIML is deduced similarly.

(b) From (a), we have

r′_{j,b} Ω^W_b r_{j,b} = (b−L)^{-1} ε′_b ε_b − (b−L)^{-1}(γ̂_{j,b} − γ_0)′ W′_b M_{Z,b} ε_b − (b−L)^{-1} ε′_b M_{Z,b} W_b(γ̂_{j,b} − γ_0) + (b−L)^{-1}(γ̂_{j,b} − γ_0)′ W′_b M_{Z,b} W_b(γ̂_{j,b} − γ_0)
| X_n →d σ_{εε} − (σ^{1/2}_{εε} σ^{-1/2}_{VwVw} Δ^S_j) σ_{Vwε} − σ_{εVw}(σ^{1/2}_{εε} σ^{-1/2}_{VwVw} Δ^S_j) + σ_{VwVw}(σ^{1/2}_{εε} σ^{-1/2}_{VwVw} Δ^S_j)² a.s.
= σ_{εε}(1 − 2ρ_{Vwε} Δ^S_j + (Δ^S_j)²).

(c) We have

(Z′_b Z_b)^{-1/2} Z′_b(y_b(β_0) : W_b) r_{j,b} = (Z′_b Z_b/b)^{-1/2} Z′_b ε_b/√b + (Z′_b Z_b/b)^{-1/2}(Z′_b W_b/√b)(γ_0 − γ̂_{j,b})
| X_n →d σ^{1/2}_{εε}[ψ_ε − Ψ^S(Ψ^{S′}Ψ^S − κ^S_j)^{-1}(Ψ^{S′}ψ_ε − κ^S_j ρ_{Vwε})] a.s. = σ^{1/2}_{εε}(ψ_ε − Ψ^S Δ^S_j).

The result follows by combining this with (b).

PROOF OF THEOREM A.2 The proof follows immediately from equation (A.3) and Lemma A.1.

PROOF OF LEMMA A.3 The proof is similar to that of Lemma A.1, noting that b/n → f > 0 as n → +∞, and is therefore omitted.

PROOF OF THEOREM A.4 The proof follows immediately from equation (A.3) and Lemma A.3.

References

Anderson, T. W., Rubin, H., 1949. Estimation of the parameters of a single equation in a complete

system of stochastic equations. Annals of Mathematical Statistics 20, 46–63.

Andrews, D. W., Guggenberger, P., 2009. Validity of subsampling and “ plug-in asymptotic” infer-

ence for parameters defined by moment inequalities. Econometric Theory 25, 669–709.

Andrews, D. W. K., 2002. High-order improvement of a computationally attractive k-step bootstrap

for extremum estimators. Econometrica 70(1), 119–162.

Andrews, D. W. K., Stock, J. H., 2007. Inference with weak instruments. In: R. Blundell, W. Newey

, T. Pearson, eds, Advances in Economics and Econometrics, Theory and Applications, 9th

Congress of the Econometric Society Vol. 3. Cambridge University Press, Cambridge, U.K.,

chapter 6.

Berkowitz, D., Caner, M., Fang, Y., 2012. The validity of instruments revisited. Journal of Econo-

metrics 166, 255–266.

Bhattacharya, R. N. , Ghosh, J. , 1978. On the validity of the formal Edgeworth expansion. The

Annals of Statistics 6, 434–451.

34

Bhattacharya, R. N., Rao, R., 1976. Normal approximation and asymptotic expansions. In: R. Bhat-

tacharya, R. Rao, eds, Normal Approximation and AsymptoticExpansions. Wiley Series in

Probability and Mathematical Analysis, New York.

Corbae, D., Durlauf, S. N., Hansen, B. E., eds, 2006. Econometric Theory and Practice: Frontiers

of Analysis and Applied Research. Cambridge University Press, Cambridge, U.K.

Doko Tchatoka, F., 2013. On bootstrap validity for specification tests with weak instruments. Technical report, School of Economics and Finance, University of Tasmania, Hobart, Australia.

Doko Tchatoka, F., 2014. Subset hypotheses testing and instrument exclusion in the linear IV regression. Econometric Theory, forthcoming.

Doko Tchatoka, F., Dufour, J.-M., 2014. Identification-robust inference for endogeneity parameters

in linear structural models. The Econometrics Journal 17, 165–187.

Dufour, J.-M., 1997. Some impossibility theorems in econometrics, with applications to structural

and dynamic models. Econometrica 65, 1365–1389.

Dufour, J.-M., 2003. Identification, weak instruments and statistical inference in econometrics. Canadian Journal of Economics 36(4), 767–808.

Dufour, J.-M., Jasiak, J., 2001. Finite sample limited information inference methods for structural

equations and models with generated regressors. International Economic Review 42, 815–843.

Dufour, J.-M., Khalaf, L., Beaulieu, M.-C., 2010. Multivariate residual-based finite-sample tests for

serial dependence and GARCH with applications to asset pricing models. Journal of Applied

Econometrics 25, 263–285.

Dufour, J.-M., Taamouti, M., 2005. Projection-based statistical inference in linear structural models

with possibly weak instruments. Econometrica 73(4), 1351–1365.

Dufour, J.-M., Taamouti, M., 2007. Further results on projection-based inference in IV regressions

with weak, collinear or missing instruments. Journal of Econometrics 139(1), 133–153.


Guggenberger, P., 2010. The impact of a Hausman pretest on the size of a hypothesis test: the panel data case. Journal of Econometrics 156, 337–343.

Guggenberger, P., Chen, L., 2011. On the asymptotic size of subvector tests in the linear instrumental

variables model. Technical report, Department of Economics, UCSD.

Guggenberger, P., Kleibergen, F., Mavroeidis, S., Chen, L., 2012. On the asymptotic sizes of subset Anderson-Rubin and Lagrange multiplier tests in linear instrumental variables regression. Econometrica 80(6), 2649–2666.

Hall, P., 1992. The Bootstrap and Edgeworth Expansion. Springer-Verlag, New York.

Hall, P., Horowitz, J. L., 1996. Bootstrap critical values for tests based on generalized-method-of-

moments estimators. Econometrica 64(4), 891–916.

Horowitz, J. L., 1994. Bootstrap-based critical values for the IM test. Journal of Econometrics 61, 395–411.

Horowitz, J. L., 2001. The bootstrap. In: J. Heckman, E. E. Leamer, eds, Handbook of Econometrics. Elsevier Science, Amsterdam, The Netherlands.

Kleibergen, F., 2002. Pivotal statistics for testing structural parameters in instrumental variables regression. Econometrica 70(5), 1781–1803.

Kleibergen, F., 2004. Testing subsets of structural coefficients in the IV regression model. Review

of Economics and Statistics 86, 418–423.

Kleibergen, F., 2008. Subset statistics in the linear IV regression model. Technical report, Department of Economics, Brown University, Providence, Rhode Island.

Mikusheva, A., 2010. Robust confidence sets in the presence of weak instruments. Journal of Econo-

metrics 157, 236–247.


Mikusheva, A., 2013. Survey on statistical inferences in weakly-identified instrumental variable models. Applied Econometrics 29(1), 117–131.

Moreira, M. J., 2003. A conditional likelihood ratio test for structural models. Econometrica 71(4), 1027–1048.

Moreira, M. J., Porter, J., Suarez, G., 2009. Bootstrap validity for the score test when instruments

may be weak. Journal of Econometrics 149, 52–64.

Poskitt, D., Skeels, C., 2012. Inference in the presence of weak instruments: A selected survey. Foundations and Trends in Econometrics 6(1), 26–44.

Staiger, D., Stock, J. H., 1997. Instrumental variables regression with weak instruments. Economet-

rica 65(3), 557–586.

Startz, R., Nelson, C. R., Zivot, E., 2006. Improved inference in weakly identified instrumental variables regression. In: (Corbae, Durlauf and Hansen 2006), chapter 5.

Stock, J. H., Wright, J. H., 2000. GMM with weak identification. Econometrica 68, 1055–1096.

Stock, J. H., Wright, J. H., Yogo, M., 2002. A survey of weak instruments and weak identification in

generalized method of moments. Journal of Business and Economic Statistics 20(4), 518–529.

Wu, C., 1990. On the asymptotic properties of the jackknife histogram. The Annals of Statistics

18, 1438–1452.

Zivot, E., Startz, R., Nelson, C. R., 2006. Inference in partially identified instrumental variables regression with weak instruments. In: (Corbae et al. 2006), chapter 5, pp. 125–163.
