Robust Permutation Tests in Linear Instrumental
Variables Regression∗
Purevdorj Tuvaandorj†
York University
This draft: November 30, 2021
Abstract
This paper develops permutation versions of identification-robust tests in linear instrumental
variables (IV) regression. Unlike the existing randomization and rank-based tests, in which
independence between the instruments and the error terms is assumed, the permutation
Anderson-Rubin (AR), Lagrange Multiplier (LM) and Conditional Likelihood Ratio (CLR) tests are
asymptotically similar and robust to conditional heteroskedasticity under the standard exclusion
restriction, i.e. the orthogonality between the instruments and the error terms. Moreover, when the
instruments are independent of the structural error term, the permutation AR tests are exact, hence
robust to heavy tails. As such, these tests share the strengths of the rank-based tests and the
wild bootstrap AR tests. Numerical illustrations corroborate the theoretical results.
Keywords: Anderson-Rubin statistic, Asymptotic validity, Conditional likelihood ratio statis-
tic, Exact test, Heteroskedasticity, Identification, Lagrange Multiplier statistic, Permuta-
tion test, Randomization test
1 Introduction
This paper proposes randomization (permutation) test versions of heteroskedasticity-robust
Anderson and Rubin (1949)’s AR, Kleibergen (2002)’s LM and Andrews and Guggenberger (2019a)’s
CLR tests in IV regression. These tests are asymptotically similar (in a uniform sense) and
heteroskedasticity-robust under the usual exogeneity condition, i.e. the orthogonality (or
uncorrelatedness) between the error terms and the instruments. When the latter assumption is replaced
by independence of the instruments and the error term in the structural equation, the permutation
AR tests become exact, hence robust to heavy tails.
∗First version: January 25, 2020; Second version: December 14, 2020. I thank Dmitry Arkhangelsky, Xavier
D’Haultfœuille, Antoine Djogbenou, Hiroyuki Kasahara, Uros Petronijevic and the participants of the Econometric
Society European Winter Meeting 2019, and the 37th Canadian Econometrics Study Group (CESG) Meeting 2021
for helpful comments. I gratefully acknowledge financial support from the LA&PS Minor Research Grant, York
University. All errors are my own.
†[email protected]
arXiv:2111.13774v1 [econ.EM] 26 Nov 2021
Permutation inference is attractive because of its exactness when relevant assumptions hold. It
also bears a natural connection to IV methods since IVs provide an exogenous variation independent
of the unobserved confounders for the endogenous regressor in the IV regression and the permutation
inference typically seeks to exploit independence of two sets of variables by permuting the elements
of one variable holding the other fixed to replicate the distribution of a test statistic at hand.
Despite the link, the literature on randomization inference in IV models is scarce. Imbens
and Rosenbaum (2005) develop exact permutation tests in the IV model where the instruments are
assigned randomly.1 Their test statistic takes the form of an inner product between the instruments
(or transformations thereof e.g. ranks) and the structural error terms evaluated under the null
hypothesis. Exact permutation tests are obtained by permuting the former while holding the latter
fixed. Given the advantage of the permutation method in an IV setting as shown by Imbens and
Rosenbaum (2005), we broaden its scope by proposing the permutation versions of the commonly
used identification-robust AR, LM and CLR test statistics.
Recently, DiCiccio and Romano (2017) propose permutation tests for correlation between a
pair of random vectors and regression coefficients. DiCiccio and Romano (2017) show that using an
appropriate studentization argument, one can obtain tests that are asymptotically pivotal under the
assumption of zero correlation between the two random variables instead of the usual independence
assumption. The results are then generalized to heteroskedastic linear regression where the error
term is conditionally mean-independent of the regressors.
We extend the result of DiCiccio and Romano (2017) to the IV setting and propose two
permutation AR (PAR) statistics, denoted as PAR1 and PAR2 respectively. The PAR1 statistic is
based on permuting the rows of the instrument matrix that shifts the endogenous regressors, and the
PAR2 statistic is based on permuting the null-restricted residuals in the structural equation. We
establish the finite sample validity of the permutation AR tests under an independence assumption
akin to the one used in Andrews and Marmer (2008). We then show their asymptotic validity under the
usual exogeneity condition allowing for conditional heteroskedasticity. Thus, our result differs from
that of Imbens and Rosenbaum (2005) who maintain the independence between the error terms and
the instruments and do not deal with heteroskedasticity.
1This leads to what is referred to as the design-based approach by Abadie et al. (2020) who make an explicit
distinction between design-based and sampling-based uncertainties and propose robust standard errors to account for
each uncertainty in a regression model.
In addition, we consider permutation LM (PLM) and CLR (PCLR) statistics. Following Freedman
and Lane (1983), we permute the residuals from the first-stage estimation of the reduced-form
equation and generate permuted right-hand side endogenous variables. The permutation LM statistic
is then constructed from these permuted variables and the permuted null-restricted residuals in the
structural equation. In the PCLR tests, we condition on the nuisance parameter estimates and
permute only the null-restricted residuals of the structural equation, as in the PAR2 test, to generate
the conditional permutation distribution.
We compare the robust permutation tests to both the identification-robust rank-based tests
of Andrews and Marmer (2008) and Andrews and Soares (2007), and the wild bootstrap tests of
Davidson and MacKinnon (2012) through simulations. In terms of the control of type I error,
we find that the robust permutation tests outperform the rank-based tests under the standard
exclusion restriction, and perform on par with or have an edge over the wild bootstrap tests in the
heteroskedastic designs considered.
As a main technical contribution, we show the uniform asymptotic similarity of the proposed
permutation tests which, to the best of our knowledge, has not been shown before in the context of
bootstrap and randomization inference in the IV model. In doing so, we adapt the results of
Andrews and Guggenberger (2017a,b, 2019a,b) and Andrews et al. (2020) in a nontrivial way.
Discussion of the related literature
The literature on the linear IV model is vast. Earlier surveys on this topic are provided by Stock et al.
(2002), Dufour (2003), and Andrews and Stock (2007), and recent contributions include Moreira
and Moreira (2019), Young (2020) and the survey of Andrews et al. (2019).
Andrews and Marmer (2008) develop a rank-based AR statistic under the independence assump-
tion between the instruments and the structural error term. Under their assumption, the PAR1
test proposed here is exact while the PAR2 test is so when there are no included non-constant
exogenous variables. Andrews and Soares (2007) consider a rank-version of the CLR statistic of
Moreira (2003). The latter test employs asymptotic critical value after conditioning on a rank-based
nuisance parameter estimate and is robust to heavy-tailed errors but not to heteroskedasticity.
A strand of literature that focuses on bootstrap inference in the linear IV model includes, among
others, Davidson and MacKinnon (2008), Moreira et al. (2009), and Davidson and MacKinnon
(2012), who document an improved performance of bootstrap inference over asymptotic
inference. The results of this paper complement these studies.
The key for our results is the fact that the identification-robust PAR, PLM and PCLR statistics
are studentized. The use of studentization to obtain an improved test is not new; a well-known exam-
ple is the higher-order accuracy of the bootstrap-t confidence interval, as opposed to the percentile
bootstrap, in the one-sample problem, see, for instance, Chapter 15.5 of Lehmann and Romano
(2005). So and Shin (1999) develop a persistence-robust test for autoregressive models based on a
Cauchy estimator and a studentized statistic. In the context of permutation tests, Neuhaus (1993)
proposes a studentized permutation statistic in a two-sample problem with randomly censored
data. For further extensions and applications of permutation tests based on studentized statistic,
see Janssen (1997), Neubert and Brunner (2007), Pauly (2011) and Chung and Romano (2013,
2016).
For several recent papers that consider randomization inference, see Chernozhukov et al. (2009)
for finite sample inference in quantile regression with and without endogeneity, Canay et al. (2017)
for randomization inference under an approximate symmetry condition with applications to
difference-in-differences and regression models with clustered errors, Canay and Kamat (2018) and Ganong and
Jager (2018) for permutation tests in regression discontinuity and kink designs, respectively, and
Dufour et al. (2019) for permutation tests for inequality measures.
The paper is organized as follows. Section 2 introduces the model and the identification-robust
test statistics and develops the permutation tests. Simulation results comparing the performance of
alternative test procedures are provided in Section 3. Section 4 presents two empirical applications.
Section 5 briefly concludes. The proofs are collected in Appendix A.
2 Robust permutation tests
2.1 Model setup
We develop permutation-based tests for a restriction on the $d \times 1$ vector of structural coefficients
\[ H_0 : \theta = \theta_0, \]
in the linear IV regression model:
\[ y = Y\theta + X\gamma + u, \tag{2.1} \]
\[ Y = W\Gamma + X\Psi + V, \tag{2.2} \]
where $y = [y_1, \dots, y_n]'$ is an $n \times 1$ vector of endogenous variables, $Y = [Y_1, \dots, Y_n]'$ is an $n \times d$ matrix
of right-hand side endogenous variables, $X = [X_1, \dots, X_n]'$ is an $n \times p$ matrix of right-hand side
exogenous variables whose first column is the $n \times 1$ vector of ones $\iota = [1, \dots, 1]'$, $W = [W_1, \dots, W_n]'$
is an $n \times k$ ($k \geq d$) matrix of instrumental variables that does not include $\iota$ and the exogenous
variables in $X$, $u = [u_1, \dots, u_n]'$ is an $n \times 1$ vector of structural error terms, $V = [V_1, \dots, V_n]'$ with
$V_i = [V_{i1}, \dots, V_{id}]'$ is an $n \times d$ matrix of reduced-form error terms; $\gamma$ is a $p \times 1$ parameter vector, and $\Gamma$
and $\Psi$ are $k \times d$ and $p \times d$ matrices of reduced-form coefficients, respectively.
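Let $X_i = [X_{i1}, \dots, X_{ip}]'$, $W_i = [W_{i1}, \dots, W_{ik}]'$, $Y_i = [Y_{i1}, \dots, Y_{id}]'$ and $Z = [Z_1, \dots, Z_n]' \equiv M_X W$,
where $M_A \equiv I_n - P_A \equiv I_n - A(A'A)^{-1}A'$ for a matrix $A$ of full column rank. Rewrite the
equations (2.1)-(2.2) as
\[ y = Y\theta + X\gamma + u, \tag{2.3} \]
\[ Y = Z\Gamma + X\xi + V, \tag{2.4} \]
where $\xi \equiv (X'X)^{-1}X'W\Gamma + \Psi$. Let $Y = [Y_{\cdot 1}, \dots, Y_{\cdot d}]$, where $Y_{\cdot s} = [Y_{1s}, \dots, Y_{ns}]'$ denotes the $s$-th
column of $Y$, $s = 1, \dots, d$. Define $u_i(\theta) \equiv y_i - Y_i'\theta - X_i'(X'X)^{-1}X'(y - Y\theta)$,
$u(\theta) = M_X(y - Y\theta) = [u_1(\theta), \dots, u_n(\theta)]'$, $\tilde{Y} = [\tilde{Y}_1, \dots, \tilde{Y}_n]' \equiv M_X Y$, and
\[ m(\theta) \equiv n^{-1} Z'u(\theta) = n^{-1} Z'(y - Y\theta), \tag{2.5} \]
\[ G_s(\theta) \equiv n^{-1} Z'Y_{\cdot s}, \qquad C_s(\theta) \equiv n^{-1} \sum_{i=1}^{n} Z_i Z_i' \tilde{Y}_{is} u_i(\theta), \tag{2.6} \]
\[ \Sigma(\theta) \equiv n^{-1} \sum_{i=1}^{n} Z_i Z_i' u_i(\theta)^2, \qquad J(\theta) = [J_1(\theta), \dots, J_d(\theta)], \tag{2.7} \]
\[ J_s(\theta) \equiv G_s(\theta) - C_s(\theta)\Sigma(\theta)^{-1} m(\theta), \qquad s = 1, \dots, d, \tag{2.8} \]
where $\tilde{Y}_{is}$ is the $s$-th element of $\tilde{Y}_i$. The heteroskedasticity-robust AR and Kleibergen (2002)’s LM
statistics are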
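\[ \mathrm{AR} = \mathrm{AR}(W, X, u(\theta_0)) \equiv n\, m(\theta_0)'\Sigma(\theta_0)^{-1} m(\theta_0), \]
\[ \mathrm{LM} = \mathrm{LM}(Z, u(\theta_0), Y) \equiv n\, m(\theta_0)'\Sigma(\theta_0)^{-1} J(\theta_0) \big( J(\theta_0)'\Sigma(\theta_0)^{-1} J(\theta_0) \big)^{-1} J(\theta_0)'\Sigma(\theta_0)^{-1} m(\theta_0). \tag{2.9} \]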
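As an illustration of (2.5)-(2.9), the following is a minimal NumPy sketch of the two statistics. The function names `residualize` and `ar_lm_statistics` are hypothetical and not taken from the paper, and the sketch assumes that $C_s(\theta)$ is built from the $X$-residualized endogenous regressor $M_X Y$, that `y` is a one-dimensional array, and that `Y` has shape `(n, d)`.

```python
import numpy as np

def residualize(A, X):
    """Return M_X A, the residuals from least-squares projection of A on X."""
    coef, *_ = np.linalg.lstsq(X, A, rcond=None)
    return A - X @ coef

def ar_lm_statistics(y, Y, W, X, theta0):
    """Heteroskedasticity-robust AR and LM statistics at theta0 (sketch of (2.9))."""
    n, d = Y.shape
    Z = residualize(W, X)                        # Z = M_X W
    Y_res = residualize(Y, X)                    # M_X Y (assumed regressor in C_s)
    u0 = residualize(y - Y @ theta0, X)          # u(theta0) = M_X (y - Y theta0)
    m = Z.T @ u0 / n                             # (2.5)
    Sigma = (Z * u0[:, None] ** 2).T @ Z / n     # (2.7)
    Sigma_inv_m = np.linalg.solve(Sigma, m)
    J = np.empty((Z.shape[1], d))
    for s in range(d):
        G_s = Z.T @ Y[:, s] / n                              # (2.6)
        C_s = (Z * (Y_res[:, s] * u0)[:, None]).T @ Z / n    # (2.6)
        J[:, s] = G_s - C_s @ Sigma_inv_m                    # (2.8)
    ar = n * (m @ Sigma_inv_m)
    Sigma_inv_J = np.linalg.solve(Sigma, J)
    lm = n * (m @ Sigma_inv_J) @ np.linalg.solve(J.T @ Sigma_inv_J, Sigma_inv_J.T @ m)
    return float(ar), float(lm)
```

Because the LM statistic is a quadratic form in a projection of $\Sigma(\theta_0)^{-1/2} m(\theta_0)$, it can never exceed AR, and the two coincide in the just-identified case $k = d$ whenever $J(\theta_0)$ is invertible.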
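The LM statistic may also be defined as
\[ \mathrm{LM} \equiv (y - Y\theta_0)'Z\tilde{\Gamma} \Big( \tilde{\Gamma}' \sum_{i=1}^{n} Z_i Z_i' u_i(\theta_0)^2 \, \tilde{\Gamma} \Big)^{-1} \tilde{\Gamma}'Z'(y - Y\theta_0), \tag{2.10} \]
where $\tilde{\Gamma} = [\tilde{\Gamma}_1, \dots, \tilde{\Gamma}_d]$ with $\tilde{\Gamma}_s \equiv (Z'Z)^{-1}Z'Y_{\cdot s} - (Z'Z)^{-1}C_s(\theta_0)\Sigma(\theta_0)^{-1}Z'(y - Y\theta_0)$, $s = 1, \dots, d$.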
In this paper, we focus on the statistic in (2.9). Finally, we consider the CLR-type statistics. Define
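\[ S \equiv \Sigma(\theta_0)^{-1/2} n^{1/2} m(\theta_0) \in \mathbb{R}^{k}, \tag{2.11} \]
\[ T(\Sigma(\theta_0), J(\theta_0), \mathcal{V}) \equiv \Sigma(\theta_0)^{-1/2} n^{1/2} J(\theta_0) \big( (\theta_0, I_d)\, \Omega^{\varepsilon}(\Sigma(\theta_0), \mathcal{V})^{-1} (\theta_0, I_d)' \big)^{1/2} \in \mathbb{R}^{k \times d}, \tag{2.12} \]
where $\Omega^{\varepsilon}(\cdot, \cdot) \in \mathbb{R}^{(d+1) \times (d+1)}$ is an eigenvalue-adjusted version (see footnote 2) of the matrix defined as
\[ \Omega(\Sigma(\theta), \mathcal{V}) = \{\Omega_{ij}\}_{1 \leq i, j \leq d+1} \in \mathbb{R}^{(d+1) \times (d+1)}, \qquad \Omega_{ij} \equiv \mathrm{tr}(K_{ij}(\mathcal{V})'\Sigma(\theta)^{-1})/k \in \mathbb{R}, \tag{2.13} \]
and the $k \times k$ matrix $K_{ij}(\mathcal{V})$ is the $(i, j)$ submatrix of $K(\mathcal{V})$ given by
\[ K(\mathcal{V}) \equiv (B(\theta_0)' \otimes I_k)\, \mathcal{V}\, (B(\theta_0) \otimes I_k) \in \mathbb{R}^{k(d+1) \times k(d+1)}, \qquad B(\theta) \equiv \begin{bmatrix} 1 & 0_{1 \times d} \\ -\theta & -I_d \end{bmatrix} \in \mathbb{R}^{(d+1) \times (d+1)}. \tag{2.14} \]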
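The CLR-type statistics proposed by Andrews and Guggenberger (2019a,b) are
\[ \mathrm{CLR}_a = \mathrm{CLR}(S, T_a) \equiv S'S - \lambda_{\min}\big[ (S, T_a)'(S, T_a) \big], \tag{2.15} \]
\[ \mathrm{CLR}_b = \mathrm{CLR}(S, T_b) \equiv S'S - \lambda_{\min}\big[ (S, T_b)'(S, T_b) \big], \tag{2.16} \]
where $\lambda_{\min}(\cdot)$ denotes the smallest eigenvalue of a matrix, and
\[ T_a = T(\Sigma(\theta_0), J(\theta_0), \mathcal{V}_a(\theta_0)), \qquad \mathcal{V}_a(\theta) \equiv \begin{bmatrix} \Sigma(\theta) & C(\theta) \\ C(\theta)' & \Sigma^{G} \end{bmatrix} \in \mathbb{R}^{k(d+1) \times k(d+1)}, \tag{2.17} \]
\[ C(\theta) \equiv [C_1(\theta), \dots, C_d(\theta)] \in \mathbb{R}^{k \times kd}, \qquad \Sigma^{G} = \{\Sigma^{G}_{st}\}_{1 \leq s, t \leq d} \in \mathbb{R}^{kd \times kd}, \]
\[ \Sigma^{G}_{st} \equiv n^{-1} \sum_{i=1}^{n} Z_i Z_i' \tilde{Y}_{is} \tilde{Y}_{it} - \Big( n^{-1} \sum_{i=1}^{n} Z_i \tilde{Y}_{is} \Big) \Big( n^{-1} \sum_{i=1}^{n} Z_i \tilde{Y}_{it} \Big)' \in \mathbb{R}^{k \times k}, \tag{2.18} \]
\[ T_b = T(\Sigma(\theta_0), J(\theta_0), \mathcal{V}_b(\theta_0)), \]
\[ \mathcal{V}_b(\theta) \equiv n^{-1} \sum_{i=1}^{n} \big[ (\varepsilon_i(\theta) - \varepsilon_{in}(\theta))(\varepsilon_i(\theta) - \varepsilon_{in}(\theta))' \big] \otimes (Z_i Z_i') \in \mathbb{R}^{k(d+1) \times k(d+1)}, \tag{2.19} \]
\[ \varepsilon_i(\theta) \equiv \big[ u_i(\theta), -\tilde{Y}_i' \big]' \in \mathbb{R}^{d+1}, \qquad \tilde{Y} = [\tilde{Y}_1, \dots, \tilde{Y}_n]' = M_X Y \in \mathbb{R}^{n \times d}, \]
\[ \varepsilon_{in}(\theta) \equiv \varepsilon(\theta)'Z(Z'Z)^{-1}Z_i \in \mathbb{R}^{d+1}, \qquad \varepsilon(\theta) \equiv [\varepsilon_1(\theta), \dots, \varepsilon_n(\theta)]' \in \mathbb{R}^{n \times (d+1)}. \tag{2.20} \]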
The CLRa and CLRb statistics are denoted as QLR and QLRP, respectively, in Andrews and
Guggenberger (2019b), and differ only in their definitions of $\mathcal{V}_a(\theta_0)$ and $\mathcal{V}_b(\theta_0)$. Andrews and
Guggenberger (2019b) show that the CLRb statistic is asymptotically equivalent to Moreira (2003)’s
CLR statistic in the homoskedastic linear IV regression with fixed instruments and multiple
endogenous variables under all strengths of identification, while the CLRa statistic has the same property
under only certain strengths of identification but its general version is more broadly applicable.
2The eigenvalue-adjustment of Andrews and Guggenberger (2019b) is as follows. Let A be a nonzero positive
2.2 Main results
We begin by recalling the basic notion of randomization tests from Chapter 15.2 of Lehmann and
Romano (2005). Let $G_n$ be the set of all permutations $\pi = [\pi(1), \dots, \pi(n)]$ of $\{1, \dots, n\}$, and denote
a permutation version of a generic test statistic $R$ by $PR = R^{\pi}$. The corresponding permutation
distribution function is
\[ F_n^{PR}(x) \equiv \frac{1}{n!} \sum_{\pi \in G_n} 1(R^{\pi} \leq x), \]
where $1(\cdot)$ is the indicator function. Let $R^{\pi}_{(1)} \leq \dots \leq R^{\pi}_{(n!)}$ be the order statistics of $\{R^{\pi} : \pi \in G_n\}$,
and for a nominal significance level $\alpha \in (0, 1)$, define
\[ r \equiv n! - I(n!\alpha), \tag{2.21} \]
\[ N^{+} \equiv |\{ j = 1, \dots, n! : R^{\pi}_{(j)} > R^{\pi}_{(r)} \}|, \qquad N^{0} \equiv |\{ j = 1, \dots, n! : R^{\pi}_{(j)} = R^{\pi}_{(r)} \}|, \]
\[ a^{PR} \equiv (n!\alpha - N^{+})/N^{0}, \tag{2.22} \]
where $I(\cdot)$ denotes the integer part of a number. Clearly, $N^{+}$ and $N^{0}$ are the number of values
$R^{\pi}_{(j)}$, $j = 1, \dots, n!$, that are greater than $R^{\pi}_{(r)}$ and equal to $R^{\pi}_{(r)}$, respectively. A randomization test
is then defined as
\[ \phi_n^{PR} \equiv \begin{cases} 1 & \text{if } R > R^{\pi}_{(r)}, \\ a^{PR} & \text{if } R = R^{\pi}_{(r)}, \\ 0 & \text{if } R < R^{\pi}_{(r)}. \end{cases} \tag{2.23} \]
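The construction in (2.21)-(2.23) can be sketched in plain Python as follows. The function name `randomization_test` is hypothetical; the list `perm_stats` is assumed to hold the statistic evaluated at every permutation in $G_n$, so its length plays the role of $n!$. Passing a random subset of permutations instead would be an approximation that the formulas above do not cover.

```python
import math

def randomization_test(r_obs, perm_stats, alpha):
    """Value of the test function phi in [0, 1], following (2.21)-(2.23)."""
    ordered = sorted(perm_stats)
    m = len(ordered)                          # stands in for n!
    r = m - math.floor(m * alpha)             # (2.21)
    crit = ordered[r - 1]                     # r-th order statistic (1-indexed)
    n_plus = sum(v > crit for v in ordered)   # values strictly above the cutoff
    n_zero = sum(v == crit for v in ordered)  # values tied with the cutoff
    a = (m * alpha - n_plus) / n_zero         # (2.22), randomization probability
    if r_obs > crit:
        return 1.0
    if r_obs == crit:
        return a
    return 0.0
```

Averaging the returned value over all entries of `perm_stats` gives exactly `alpha`, which is the mechanism behind the exactness of randomization tests: the tie-breaking probability `a` fills the gap left by the discreteness of the permutation distribution.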
The robust permutation test statistics are described below. Let $W_{\pi} = [W_{\pi(1)}, \dots, W_{\pi(n)}]'$ and
$u_{\pi}(\theta) = [u_{\pi(1)}(\theta), \dots, u_{\pi(n)}(\theta)]'$ be an instrument matrix and a residual vector obtained by
permuting the rows of $W$ and the elements of $u(\theta)$, respectively, for a uniformly chosen permutation
$\pi \in G_n$. Set
\[ \Sigma_u(\theta) \equiv \mathrm{diag}(u_1(\theta)^2, \dots, u_n(\theta)^2), \qquad m^{\pi}(\theta) \equiv n^{-1} Z'u_{\pi}(\theta), \qquad \Sigma^{\pi}(\theta) \equiv n^{-1} \sum_{i=1}^{n} Z_i Z_i' u_{\pi(i)}(\theta)^2. \]
2 (continued) semi-definite matrix of dimension $p \times p$ that has a spectral decomposition $A = N\Delta N'$, where $\Delta = \mathrm{diag}(\lambda_1, \dots, \lambda_p)$,
$\lambda_1 \geq \dots \geq \lambda_p \geq 0$, is the diagonal matrix that consists of the eigenvalues of $A$, and $N$ is an orthogonal matrix of the
corresponding eigenvectors. Given a constant $\varepsilon > 0$, the eigenvalue adjusted matrix is defined as $A^{\varepsilon} \equiv N\Delta^{\varepsilon}N'$, where
$\Delta^{\varepsilon} \equiv \mathrm{diag}(\max\{\lambda_1, \lambda_1\varepsilon\}, \dots, \max\{\lambda_p, \lambda_1\varepsilon\})$. Andrews and Guggenberger (2019a) recommend $\varepsilon = 0.01$. We refer to
Andrews and Guggenberger (2019a) for further properties of the eigenvalue-adjustment procedure.
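The eigenvalue adjustment described in footnote 2 amounts to flooring every eigenvalue at $\varepsilon$ times the largest one. A minimal NumPy sketch, with the hypothetical function name `eigenvalue_adjust` and assuming a symmetric positive semi-definite input, is:

```python
import numpy as np

def eigenvalue_adjust(A, eps=0.01):
    """Return N diag(max(lambda_j, lambda_1 * eps)) N' for symmetric PSD A."""
    lam, N = np.linalg.eigh(A)                   # eigenvalues in ascending order
    lam_adj = np.maximum(lam, lam.max() * eps)   # floor at eps * largest eigenvalue
    return (N * lam_adj) @ N.T                   # N diag(lam_adj) N'
```

A matrix whose condition number is already below $1/\varepsilon$ is returned unchanged, so the adjustment only matters for nearly singular inputs.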
We consider two heteroskedasticity-robust permutation AR statistics defined as:
\[ \mathrm{PAR1} \equiv \mathrm{AR}(W_{\pi}, X, u(\theta_0)) = u(\theta_0)'M_X W_{\pi} \big( W_{\pi}'M_X \Sigma_u(\theta_0) M_X W_{\pi} \big)^{-1} W_{\pi}'M_X u(\theta_0), \tag{2.24} \]
\[ \mathrm{PAR2} \equiv \mathrm{AR}(W, X, u_{\pi}(\theta_0)) = n\, m^{\pi}(\theta_0)'\Sigma^{\pi}(\theta_0)^{-1} m^{\pi}(\theta_0). \tag{2.25} \]
The PAR1 statistic is based on the permutation of the rows of the instrument matrix $W$, and the
PAR2 statistic uses the permutation of the null-restricted residuals $u(\theta_0)$. The finite sample validity
of these tests is shown under the following condition.
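The two permutation schemes in (2.24)-(2.25) differ only in which argument of the AR statistic is shuffled. The sketch below makes this explicit; the names `residualize`, `ar_stat` and `par_draws` are hypothetical, and drawing `n_perm` random permutations rather than enumerating all of $G_n$ is an assumption made here for tractability.

```python
import numpy as np

def residualize(A, X):
    """Return M_X A, the residuals from least-squares projection of A on X."""
    coef, *_ = np.linalg.lstsq(X, A, rcond=None)
    return A - X @ coef

def ar_stat(W, X, u):
    """AR(W, X, u) = u'Z (Z' diag(u^2) Z)^{-1} Z'u with Z = M_X W."""
    Z = residualize(W, X)
    Zu = Z.T @ u
    S = (Z * u[:, None] ** 2).T @ Z
    return float(Zu @ np.linalg.solve(S, Zu))

def par_draws(y, Y, W, X, theta0, n_perm=999, seed=0):
    """Observed AR statistic plus random draws of PAR1 and PAR2."""
    rng = np.random.default_rng(seed)
    u0 = residualize(y - Y @ theta0, X)        # null-restricted residuals u(theta0)
    ar = ar_stat(W, X, u0)
    par1 = np.empty(n_perm)
    par2 = np.empty(n_perm)
    for b in range(n_perm):
        pi = rng.permutation(len(u0))
        par1[b] = ar_stat(W[pi], X, u0)        # (2.24): permute the rows of W
        par2[b] = ar_stat(W, X, u0[pi])        # (2.25): permute the residuals
    return ar, par1, par2
```

The factor $n$ in $n\, m' \Sigma^{-1} m$ cancels against the $n^{-1}$ normalizations of $m$ and $\Sigma$, which is why `ar_stat` works with unnormalized sums. The observed `ar` would then be compared with the upper quantile of either set of draws.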
Assumption 1. $\{(W_i', X_i', u_i)'\}_{i=1}^{n}$ are i.i.d., and $W_i$ and $[X_i', u_i]'$ are independently distributed.
Andrews and Marmer (2008) develop exact rank-based AR tests under assumptions similar to
Assumption 1. When W is independent of X and u, W ′MX u(θ0) and W ′πMX u(θ0) have the same
distribution under H0 : θ = θ0, so the PAR1 test is exact. On the other hand, the PAR2 test is
not, in general, exact because W ′MX u(θ0) and W ′MX uπ(θ0) may not have identical distributions.
However, when X = ι in Assumption 1, Z and uπ are independent, Z ′u(θ0) and Z ′uπ(θ0) are
identically distributed, and consequently, the PAR2 test is exact. The following result summarizes
the finite sample validity of the PAR tests.
Proposition 2.1 (Finite sample validity). Under Assumption 1 and $H_0 : \theta = \theta_0$, $E[\phi_n^{\mathrm{PAR1}}] = \alpha$ for
$\alpha \in (0, 1)$. If $X = \iota$ in Assumption 1, then $E[\phi_n^{\mathrm{PAR2}}] = \alpha$.
Even if $[X_i', u_i]'$ and $W_i$ are not distributed independently but satisfy the orthogonality
condition $E[(W_i', X_i')'u_i] = 0$, we show that the permutation AR tests are asymptotically similar and
heteroskedasticity-robust.
The argument for the permutation LM statistic is more involved because it is difficult to construct an
estimator of the reduced-form coefficients $\Gamma$ whose (asymptotic) distribution remains invariant to a
permutation of the data. The OLS estimators of the reduced-form coefficients and the corresponding
residuals are
\[ \hat{\Gamma} = (Z'Z)^{-1}Z'Y, \qquad \hat{\xi} = (X'X)^{-1}X'Y, \qquad \hat{V} = Y - Z\hat{\Gamma} - X\hat{\xi} = [\hat{V}_1, \dots, \hat{V}_n]' = [\hat{V}_{\cdot 1}, \dots, \hat{V}_{\cdot d}], \]
where $\hat{V}_{\cdot s} = [\hat{V}_{1s}, \dots, \hat{V}_{ns}]'$ is the $s$-th column of $\hat{V}$, $s = 1, \dots, d$. Now permute the residuals,
$\hat{V}_{\pi} = [\hat{V}_{\pi(1)}, \dots, \hat{V}_{\pi(n)}]'$, and let
\[ Y^{\pi} = [Y^{\pi}_{\cdot 1}, \dots, Y^{\pi}_{\cdot d}] \equiv Z\hat{\Gamma} + X\hat{\xi} + \hat{V}_{\pi}, \]
where $Y^{\pi}_{\cdot s} = [Y^{\pi}_{1s}, \dots, Y^{\pi}_{ns}]'$ denotes the $s$-th column of $Y^{\pi}$. The idea of permuting the residuals to
obtain asymptotically valid randomization tests appears in Freedman and Lane (1983) and DiCiccio
and Romano (2017).
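The residual-permutation step can be sketched as follows. The function names `residualize` and `permuted_endogenous` are hypothetical; the sketch relies on $Z = M_X W$ being orthogonal to $X$, so that the two separate least-squares fits reproduce the joint first-stage fit.

```python
import numpy as np

def residualize(A, X):
    """Return M_X A, the residuals from least-squares projection of A on X."""
    coef, *_ = np.linalg.lstsq(X, A, rcond=None)
    return A - X @ coef

def permuted_endogenous(Y, W, X, pi):
    """Return (Y_pi, V_hat): fitted first stage plus residual rows reordered by pi."""
    Z = residualize(W, X)                                 # Z = M_X W
    gamma_hat = np.linalg.lstsq(Z, Y, rcond=None)[0]      # (Z'Z)^{-1} Z'Y
    xi_hat = np.linalg.lstsq(X, Y, rcond=None)[0]         # (X'X)^{-1} X'Y
    fitted = Z @ gamma_hat + X @ xi_hat
    v_hat = Y - fitted                                    # first-stage residuals
    return fitted + v_hat[pi], v_hat
```

With the identity permutation the function returns the original `Y`, and because `X` contains a column of ones the permuted and original regressors always share the same column sums.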
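One may also permute the null-restricted residuals $\tilde{V} = Y - Z\tilde{\Gamma} - X\tilde{\xi} = [\tilde{V}_1, \dots, \tilde{V}_n]'$, where
$\tilde{\Gamma}$ and $\tilde{\xi}$ are the restricted estimators of the reduced-form parameters given $\theta = \theta_0$, and obtain
$Y^{\pi} = [Y^{\pi}_{\cdot 1}, \dots, Y^{\pi}_{\cdot d}] = Z\tilde{\Gamma} + X\tilde{\xi} + \tilde{V}_{\pi}$, where $\tilde{V}_{\pi} = [\tilde{V}_{\pi(1)}, \dots, \tilde{V}_{\pi(n)}]'$.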
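The Jacobian estimator used in the permutation LM statistic is
\[ J^{\pi}(\theta) = [J^{\pi}_1(\theta), \dots, J^{\pi}_d(\theta)], \tag{2.26} \]
\[ J^{\pi}_s(\theta) \equiv n^{-1} Z'Y^{\pi}_{\cdot s} - C^{\pi}_s(\theta)\Sigma^{\pi}(\theta)^{-1} m^{\pi}(\theta), \tag{2.27} \]
\[ C^{\pi}_s(\theta) \equiv n^{-1} \sum_{i=1}^{n} Z_i Z_i' \hat{V}_{\pi(i)s}\, u_{\pi(i)}(\theta), \qquad s = 1, \dots, d, \tag{2.28} \]
where $\hat{V}_{\pi(i)s}$ is the $(\pi(i), s)$ element of the matrix of first-stage residuals.
It is not difficult to see that in strongly identified models where $\Gamma$ is a fixed matrix of full rank,
$J^{\pi}(\theta_0) - n^{-1}Z'Z\Gamma \xrightarrow{p^{\pi}} 0$ in probability, where $\xrightarrow{p^{\pi}}$ denotes convergence in probability with
respect to the probability measure induced by the random permutation $\pi$ uniformly distributed
over $G_n$. The permutation LM statistic is then defined as
\[ \mathrm{PLM} \equiv \mathrm{LM}(Z, u_{\pi}(\theta_0), Y^{\pi}) = n\, m^{\pi}(\theta_0)'\Sigma^{\pi}(\theta_0)^{-1/2} P_{\Sigma^{\pi}(\theta_0)^{-1/2} J^{\pi}(\theta_0)} \Sigma^{\pi}(\theta_0)^{-1/2} m^{\pi}(\theta_0). \tag{2.29} \]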
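The permutation LM statistic that corresponds to (2.10) may be defined as
\[ \mathrm{PLM} \equiv u_{\pi}(\theta_0)'Z\Gamma^{\pi} \Big( \Gamma^{\pi\prime} \sum_{i=1}^{n} Z_i Z_i' u_{\pi(i)}(\theta_0)^2 \, \Gamma^{\pi} \Big)^{-1} \Gamma^{\pi\prime}Z'u_{\pi}(\theta_0), \]
where $\Gamma^{\pi} = [\Gamma^{\pi}_1, \dots, \Gamma^{\pi}_d]$ with $\Gamma^{\pi}_s \equiv (Z'Z)^{-1}Z'Y^{\pi}_{\cdot s} - (Z'Z)^{-1}C^{\pi}_s(\theta_0)\Sigma^{\pi}(\theta_0)^{-1}Z'u_{\pi}(\theta_0)$, $s = 1, \dots, d$.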
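Finally, we define the permutation CLR statistics as follows:
\[ \mathrm{PCLR}_a \equiv \mathrm{CLR}(S^{\pi}, T_a) = S^{\pi\prime}S^{\pi} - \lambda_{\min}\big[ (S^{\pi}, T_a)'(S^{\pi}, T_a) \big], \tag{2.30} \]
\[ \mathrm{PCLR}_b \equiv \mathrm{CLR}(S^{\pi}, T_b) = S^{\pi\prime}S^{\pi} - \lambda_{\min}\big[ (S^{\pi}, T_b)'(S^{\pi}, T_b) \big], \tag{2.31} \]
where $T_a$ and $T_b$ are as defined in (2.17) and (2.19), and
\[ S^{\pi} \equiv n^{1/2}\Sigma^{\pi}(\theta_0)^{-1/2} m^{\pi}(\theta_0) = \Big( \sum_{i=1}^{n} Z_i Z_i' u_{\pi(i)}(\theta_0)^2 \Big)^{-1/2} Z'u_{\pi}(\theta_0). \tag{2.32} \]
Let $F_n^{\mathrm{PCLR}}(x, T) \equiv (n!)^{-1} \sum_{\pi \in G_n} 1(\mathrm{CLR}(S^{\pi}, T) \leq x)$ denote the permutation distribution of the
statistic $\mathrm{PCLR} = \mathrm{CLR}(S^{\pi}, T) \in \{\mathrm{PCLR}_a, \mathrm{PCLR}_b\}$ given $T \in \{T_a, T_b\}$, respectively. The nominal
level $\alpha$ PCLR test rejects when
\[ \mathrm{CLR}(S, T) > \mathrm{PCLR}_{(r)}(T), \tag{2.33} \]
where $\mathrm{PCLR}_{(r)}(T)$ is the $r$-th order statistic of $\{\mathrm{CLR}(S^{\pi}, T) : \pi \in G_n\}$ for $r$ defined in (2.21), and
randomizes when $\mathrm{CLR}(S, T) = \mathrm{PCLR}_{(r)}(T)$.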
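To illustrate how the conditional permutation distribution in (2.30)-(2.33) is generated, the sketch below holds the conditioning statistic `T` fixed and recomputes only $S^{\pi}$ for each permutation. The names `inv_sqrt`, `clr_stat` and `pclr_draws` are hypothetical; `T` is taken as a given $k \times d$ array (its construction from (2.12) is not reproduced here), and random permutations are used in place of full enumeration.

```python
import numpy as np

def inv_sqrt(S):
    """Symmetric inverse square root of a positive definite matrix."""
    lam, N = np.linalg.eigh(S)
    return (N / np.sqrt(lam)) @ N.T

def clr_stat(S, T):
    """CLR(S, T) = S'S - lambda_min[(S, T)'(S, T)]."""
    ST = np.column_stack([S, T])
    return float(S @ S - np.linalg.eigvalsh(ST.T @ ST)[0])

def pclr_draws(Z, u0, T, n_perm=999, seed=0):
    """Draws of CLR(S_pi, T) with T fixed and the residuals u0 permuted."""
    rng = np.random.default_rng(seed)
    draws = np.empty(n_perm)
    for b in range(n_perm):
        u_pi = u0[rng.permutation(len(u0))]
        weight = (Z * u_pi[:, None] ** 2).T @ Z       # sum_i Z_i Z_i' u_pi(i)^2
        S_pi = inv_sqrt(weight) @ (Z.T @ u_pi)        # (2.32)
        draws[b] = clr_stat(S_pi, T)
    return draws
```

When `T` is a zero matrix the smallest eigenvalue is zero and the statistic collapses to $S'S$, the AR statistic, which matches the usual intuition that the CLR test behaves like AR under very weak identification.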
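Let $P$ denote the distribution of the vector $(W_i', X_i', u_i, V_i')'$; we shall index the relevant
quantities by $P$ in the sequel. Define
\[ e_i \equiv [u_i, V_i']', \qquad \Sigma^{e}_P \equiv \mathrm{Var}_P[e_i] = \begin{bmatrix} \sigma_P^2 & \Sigma_P^{uV} \\ \Sigma_P^{Vu} & \Sigma_P^{V} \end{bmatrix} \in \mathbb{R}^{(d+1) \times (d+1)}, \tag{2.34} \]
\[ Z_i^{*} \equiv W_i - E_P[W_i X_i'](E_P[X_i X_i'])^{-1} X_i \in \mathbb{R}^{k}, \]
\[ \mathcal{V}_{P,a} \equiv \mathrm{Var}_P \begin{bmatrix} Z_i^{*} u_i \\ \mathrm{vec}(Z_i^{*}(Z_i^{*\prime}\Gamma + V_i')) \end{bmatrix} = \begin{bmatrix} \Sigma_P & C_P \\ C_P' & \Sigma_P^{G} \end{bmatrix} \in \mathbb{R}^{k(d+1) \times k(d+1)}, \qquad \Sigma_P \equiv E_P[Z_i^{*} Z_i^{*\prime} u_i^2] \in \mathbb{R}^{k \times k}, \]
\[ C_P = [C_{P1}, \dots, C_{Pd}] \in \mathbb{R}^{k \times kd}, \qquad C_{Ps} \equiv E_P[Z_i^{*} Z_i^{*\prime} u_i (Z_i^{*\prime}\Gamma_s + V_{is})] \in \mathbb{R}^{k \times k}, \quad s = 1, \dots, d, \]
\[ \Sigma_P^{G} = \{\Sigma_{P,st}^{G}\}, \qquad \Sigma_{P,st}^{G} \equiv E_P[Z_i^{*} Z_i^{*\prime} (Z_i^{*\prime}\Gamma_s + V_{is})(Z_i^{*\prime}\Gamma_t + V_{it})] - E_P[Z_i^{*} Z_i^{*\prime}\Gamma_s]\, E_P[Z_i^{*} Z_i^{*\prime}\Gamma_t]', \quad s, t = 1, \dots, d, \tag{2.35} \]
with $\Gamma_s$ denoting the $s$-th column of $\Gamma$, and
\[ \mathcal{V}_{P,b} \equiv E_P \left[ \begin{bmatrix} u_i^2 & -u_i V_i' \\ -u_i V_i & V_i V_i' \end{bmatrix} \otimes (Z_i^{*} Z_i^{*\prime}) \right] \in \mathbb{R}^{k(d+1) \times k(d+1)}. \tag{2.36} \]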
We maintain the following assumptions to develop the asymptotic results.
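Assumption 2. For some $\delta, \delta_1 > 0$ and $M_0 < \infty$,
(a) $\{(W_i', X_i', u_i, V_i')'\}_{i=1}^{n}$ are i.i.d. with distribution $P$,
(b) $E_P[u_i(W_i', X_i')] = 0$,
(c) $E_P[\|(W_i', X_i', u_i)'\|^{4+\delta}] < M_0$,
(d) $\lambda_{\min}(A) \geq \delta_1$ for $A \in \{ E_P[(W_i', X_i')'(W_i', X_i')],\ E_P[Z_i^{*} Z_i^{*\prime}],\ \Sigma_P,\ \sigma_P^2 \}$,
(e) $E_P[V_i(W_i', X_i')] = 0$,
(f) $E_P[\|V_i\|^{4+\delta}] < M_0$,
(g) $\lambda_{\min}(A) \geq \delta_1$ for $A \in \{ \Sigma_P^{V} - \Sigma_P^{Vu}(\sigma_P^2)^{-1}\Sigma_P^{uV},\ \mathcal{V}_{P,a} \}$.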
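Next we define the parameter spaces for the permutation tests.
\[ \mathcal{P}_0^{\mathrm{PAR1}} = \mathcal{P}_0^{\mathrm{PAR2}} \equiv \{ P : \text{Assumption 2(a)-(d) hold} \}, \tag{2.37} \]
\[ \mathcal{P}_0^{\mathrm{PLM}} \equiv \{ P : \text{Assumption 2 holds} \}, \tag{2.38} \]
\[ \mathcal{P}_0^{\mathrm{PCLR}} \equiv \{ P : \text{Assumption 2(a)-(f) hold} \}, \qquad \mathrm{PCLR} \in \{\mathrm{PCLR}_a, \mathrm{PCLR}_b\}. \tag{2.39} \]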
The distribution P in Assumption 2(a) is allowed to vary with the sample size i.e. P = Pn. For
simplicity, we suppress the dependence on n. Assumption 2(c) and (f) impose finite 4 + δ moment
on the error terms and the exogenous variables, and are slightly stronger than the cross moment
restrictions used in Andrews and Guggenberger (2019a). For pointwise asymptotic results or when
P does not vary with n, weaker restrictions would be sufficient.
The independence assumption between the instruments and the error terms, which is typically
made in the randomization test literature (see, for example, Imbens and Rosenbaum (2005)), is not
maintained here; the instruments are only assumed to satisfy the standard exogeneity conditions
in Assumption 2(b) and (e). If independence assumptions are maintained, as shown in Proposition
2.1, the PAR tests are exact and do not require the finite 4 + δ moment assumptions.
Assumption 2(d) and (g) require that the second moment matrix of the instruments and
exogenous covariates, and the covariance matrices, are (uniformly) nonsingular, and are similar to the
assumptions employed by Andrews and Guggenberger (2017a, 2019a) and Andrews et al. (2020) when
developing identification-robust (but not singularity-robust) AR, LM and CLR tests. The condition
$\lambda_{\min}(\mathcal{V}_{P,a}) \geq \delta_1$ in Assumption 2(g) plays an important role for showing the asymptotic similarity
of the LM test, and it can be substituted by the condition $\lambda_{\min}(E_P[(e_i e_i') \otimes (Z_i^{*} Z_i^{*\prime})]) \geq \delta_1$, see
Section 3.2 of Andrews and Guggenberger (2017a). The condition $\lambda_{\min}(\Sigma_P^{V} - \Sigma_P^{Vu}(\sigma_P^2)^{-1}\Sigma_P^{uV}) \geq \delta_1$
is used in the asymptotic distribution of the PLM statistic.
The main result of this paper is given in the following theorem which establishes the asymptotic
validity of the robust permutation tests.
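Theorem 2.2 (Asymptotic similarity). Let $H_0 : \theta = \theta_0$ hold. Then, for $\alpha \in (0, 1)$ and the
permutation statistics
\[ PR \in \{\mathrm{PAR1}, \mathrm{PAR2}, \mathrm{PLM}, \mathrm{PCLR}_a, \mathrm{PCLR}_b\} \]
with the corresponding parameter spaces $\mathcal{P}_0 \in \{\mathcal{P}_0^{\mathrm{PAR1}}, \mathcal{P}_0^{\mathrm{PAR2}}, \mathcal{P}_0^{\mathrm{PLM}}, \mathcal{P}_0^{\mathrm{PCLR}_a}, \mathcal{P}_0^{\mathrm{PCLR}_b}\}$,
respectively,
\[ \limsup_{n \to \infty} \sup_{P \in \mathcal{P}_0} E_P[\phi_n^{PR}] = \liminf_{n \to \infty} \inf_{P \in \mathcal{P}_0} E_P[\phi_n^{PR}] = \alpha. \]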
Theorem 2.2 shows the asymptotic validity of the permutation tests. Thus, the advantage of
the PAR1 and PAR2 tests relative to the existing tests is that they inherit the desirable properties
of both the exact and asymptotic AR tests (including other simulation-based tests that are asymp-
totically valid); the PAR tests are exact under the same assumptions as the rank-based tests are,
and asymptotically pivotal and heteroskedasticity-robust under the assumptions used to show the
validity of the asymptotic tests.
The main technical tools for the asymptotics of the permutation statistics are Hoeffding’s
combinatorial CLT (Hoeffding (1951), Motoo (1956) and Lemma A.1 in Appendix A) and a variance
formula (the equation (10) of Hoeffding (1951) and Lemma S.3.4 of DiCiccio and Romano (2017))
for the sums
\[ n^{-1/2} \sum_{i=1}^{n} Z_i u_{\pi(i)}(\theta_0), \qquad n^{-1/2} \sum_{i=1}^{n} Z_i V_{\pi(i)s}, \quad s = 1, \dots, d. \]
The proof of asymptotic similarity of the permutation tests uses the generic method of Andrews et al.
(2020) for establishing the asymptotic size of tests; in particular, the derivation of the asymptotic
distribution of the PCLR statistics relies extensively on Theorems 9.1 and 16.6 of Andrews and
Guggenberger (2019a,b).
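Next, we shall derive the asymptotic distribution of the permutation statistics under the local
alternatives $\theta_n = \theta_0 + h_{\theta} n^{-1/2}$ with $h_{\theta} \in \mathbb{R}^{d}$ fixed, assuming strong identification. Let $\chi^2_l(\eta^2)$ denote
a noncentral chi-square random variable with $l$ degrees of freedom and noncentrality parameter $\eta^2$.
Proposition 2.3 (Local asymptotic power under strong identification). Let Assumption 2 hold,
and assume that the model is strongly or semi-strongly identified in the sense that the smallest
singular value $\tau_{dn}$ of $\Sigma_P^{-1/2} G_P \big( (\theta_0, I_d)\, \Omega^{\varepsilon}(\Sigma_P, \mathcal{V}_P)^{-1} (\theta_0, I_d)' \big)^{1/2}$, where $G_P \equiv E_P[Z_i^{*} Z_i^{*\prime}]\Gamma$ and
$\mathcal{V}_P \in \{\mathcal{V}_{P,a}, \mathcal{V}_{P,b}\}$, satisfies $n^{1/2}\tau_{dn} \to \infty$. Then, under $H_1 : \theta_n = \theta_0 + h_{\theta} n^{-1/2}$,
$\mathrm{AR} \xrightarrow{d} \chi^2_k(\eta^2)$, $\mathrm{LM} \xrightarrow{d} \chi^2_d(\eta^2)$, $\mathrm{CLR}_a \xrightarrow{d} \chi^2_d(\eta^2)$ and $\mathrm{CLR}_b \xrightarrow{d} \chi^2_d(\eta^2)$,
where $\eta^2 \equiv \lim_{n \to \infty} h_{\theta}'G'\Sigma^{-1}G h_{\theta}$, $G = \lim_{n \to \infty} G_P$ and $\Sigma = \lim_{n \to \infty} \Sigma_P$, and
\[ \sup_{x \in \mathbb{R}} |F_n^{\mathrm{PAR1}}(x) - P[\chi^2_k \leq x]| \xrightarrow{p} 0, \tag{2.40} \]
\[ \sup_{x \in \mathbb{R}} |F_n^{\mathrm{PAR2}}(x) - P[\chi^2_k \leq x]| \xrightarrow{p} 0, \tag{2.41} \]
\[ \sup_{x \in \mathbb{R}} |F_n^{\mathrm{PLM}}(x) - P[\chi^2_d \leq x]| \xrightarrow{p} 0, \tag{2.42} \]
\[ \sup_{x \in \mathbb{R}} |F_n^{\mathrm{PCLR}_a}(x, T_a) - P[\chi^2_d \leq x]| \xrightarrow{p} 0, \tag{2.43} \]
\[ \sup_{x \in \mathbb{R}} |F_n^{\mathrm{PCLR}_b}(x, T_b) - P[\chi^2_d \leq x]| \xrightarrow{p} 0. \tag{2.44} \]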
It follows that the asymptotic local power of the PAR tests is equal to that of the asymptotic
AR test given by $P[\chi^2_k(\eta^2) > r_{1-\alpha}(\chi^2_k)]$, where $r_{1-\alpha}(\chi^2_k)$ is the $1 - \alpha$ quantile of a $\chi^2_k$ random variable.
Moreover, under strong identification, the PLM and PCLR tests, and the asymptotic LM and CLR
tests have the same asymptotic local power equal to $P[\chi^2_d(\eta^2) > r_{1-\alpha}(\chi^2_d)]$.
3 Simulations
This section presents simulation evidence on the performance of the proposed permutation tests.
The data were generated according to
\[ y_i = Y_i\theta + \gamma_1 + X_{2i}'\gamma_2 + u_i, \]
\[ Y_i = W_i'\Gamma + \psi_1 + X_{2i}'\Psi_2 + V_i, \qquad i = 1, \dots, n, \]
where $\theta = 0$, $d = 1$, $\Gamma = (1, \dots, 1)'\sqrt{\lambda/(nk)}$ ($k \times 1$) and $\lambda \in \{0.1, 4, 20\}$, $X_{2i}$ is a $(p - 1) \times 1$
vector of non-constant included exogenous variables with $p \in \{1, 5\}$, and $V_i = \rho u_i + \sqrt{1 - \rho^2}\,\varepsilon_i$
with $\rho = 0.5$. The parameters $\gamma_1$, $\gamma_2$, $\psi_1$, and $\Psi_2$ are set equal to 0. As specified below, $W_i$ has
zero mean and unit covariance matrix except for the first part of simulations in Section 3.1, hence
$\lambda = n\Gamma' E[W_i W_i']\Gamma \approx \Gamma'W'W\Gamma$. We consider the values $\lambda \in \{0.1, 4, 20\}$ which correspond to very
weak, weak and strong identification. The tested restriction is $H_0 : \theta = \theta_0 = 0$.
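A minimal sketch of this data generating process is given below. The function name `simulate_iv` is hypothetical, and drawing every primitive from a standard normal distribution is an assumption made for concreteness; the sections that follow replace these draws with Cauchy, multivariate t and heteroskedastic variants.

```python
import numpy as np

def simulate_iv(n, k, p, lam, rho=0.5, theta=0.0, seed=0):
    """Simulate (y, Y, W, X, Gamma) with d = 1 and all nuisance coefficients zero."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((n, k))              # instruments (assumed Gaussian here)
    X2 = rng.standard_normal((n, p - 1))         # non-constant exogenous controls
    u = rng.standard_normal(n)                   # structural error
    eps = rng.standard_normal(n)
    V = rho * u + np.sqrt(1.0 - rho ** 2) * eps  # reduced-form error
    Gamma = np.full(k, np.sqrt(lam / (n * k)))   # so that n * Gamma'Gamma = lam
    Y = W @ Gamma + V                            # psi_1 = 0 and Psi_2 = 0
    y = theta * Y + u                            # gamma_1 = 0 and gamma_2 = 0
    X = np.column_stack([np.ones(n), X2])        # first column is the constant
    return y, Y[:, None], W, X, Gamma
```

With unit instrument covariance, $n\Gamma'\Gamma$ equals $\lambda$ exactly, so `lam` directly sets the concentration parameter that governs identification strength.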
We implement the heteroskedasticity-robust asymptotic tests denoted as AR, LM, CLRa, CLRb,
their permutation versions PAR1, PAR2, PLM, PCLRa, PCLRb, the normal score and Wilcoxon
score rank-based AR tests of Andrews and Marmer (2008) denoted as RARn and RARw, respec-
tively, the normal score and Wilcoxon score rank-based CLR statistics of Andrews and Soares (2007)
denoted as RCLRn and RCLRw, respectively, and the wild bootstrap AR and LM tests of David-
son and MacKinnon (2012) denoted as WAR and WLM, respectively. For both the asymptotic and
permutation CLR-type tests, we do not make the eigenvalue-adjustment (i.e. ε = 0 in footnote 2).
In all simulations, the rejection probabilities of the tests (including CLR, PCLR, the rank-based
and the wild bootstrap tests considered below) are computed using 2000 replications with 999
simulated samples for each replication. We consider the following three cases in turn.
3.1 Heavy tails
To examine the finite sample validity, we consider heavy-tailed observations where each element of
the random vector $(W_i', X_{2i}', u_i, \varepsilon_i)'$ is drawn independently from the standard Cauchy distribution, and
set $k \in \{5, 10\}$, $n \in \{50, 100\}$, and $\lambda = 4$.
Table 1 presents the null rejection probabilities. The asymptotic tests all underreject. The PAR1
test has nearly correct levels as predicted by Proposition 2.1 in all cases, and the PAR2 test, which is
exact only when p = 1, has more accurate rejection rates in the case p = 1 than in the case p = 5.
The RARn, RARw, RCLRn and RCLRw tests are all robust against heavy-tailed errors when the
IV’s, the exogenous covariates and the error terms are independent as stated in Assumption 1.
The RARn and RARw tests are exact and this is borne out in the simulation results. The RCLRn
and RCLRw tests slightly underreject which may be attributed to their asymptotic nature.
Note, however, that, under the independence assumption and heavy-tailed errors, the rank-based
tests have better power properties than the asymptotic tests, see Andrews and Soares (2007) and
Andrews and Marmer (2008). In such cases, the permutation tests are likely to be dominated by the
rank-based tests in terms of power given the local asymptotic equivalence between the permutation
tests and the asymptotic tests under strong identification established in Proposition 2.3.
3.2 Independence vs. standard exclusion restriction
The next set of simulations examines the role of studentization for obtaining valid permutation
tests and highlights the difference between the independence and the standard exclusion restriction
assumptions in a homoskedastic setting. Two cases are considered:
1. $(W_i', X_{2i}', u_i, \varepsilon_i)' \sim t_5[0, I_{k+p+1}]$, where $t_5[0, I_{k+p+1}]$ stands for the $(k+p+1)$-variate t-distribution
with 5 degrees of freedom and covariance matrix $I_{k+p+1}$. Under this setting, these random
variables have finite fourth moments, and $(W_i', X_{2i}')'$ satisfies the standard exclusion restriction
in Assumption 2: $E[(u_i, V_i')'(W_i', X_i')] = 0$ but $(u_i, \varepsilon_i)'$ and $(W_i', X_i')'$ are dependent;3
2. $(W_i', X_{2i}', u_i, \varepsilon_i)' \sim N[0, I_{k+p+1}]$. Clearly, in this case the instruments and the error terms are
independent.
In addition to the previous tests, for the just-identified design with k = 1, we consider
a non-studentized randomization test, labelled as PNS, that rejects when $W'u(\theta_0)$, with $u(\theta_0) = M_{\iota}(y - Y\theta_0)$,
is below the 0.025 or above the 0.975 quantile of the permutation distribution of $W'u_{\pi}(\theta_0)$.
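For concreteness, the non-studentized PNS rule can be sketched as below. The function name `pns_test` is hypothetical; the sketch takes the lower cutoff to be the 0.025 quantile (an assumption about the intended two-sided 5% rule), treats `W` as a one-dimensional array since $k = 1$, and approximates the permutation distribution with random draws.

```python
import numpy as np

def pns_test(W, r, n_perm=999, seed=0):
    """Two-sided non-studentized permutation test based on W'u, with u = M_iota r."""
    rng = np.random.default_rng(seed)
    u0 = r - r.mean()                          # demeaning applies M_iota
    stat = W @ u0                              # observed inner product W'u(theta0)
    perm = np.array([W @ u0[rng.permutation(len(u0))] for _ in range(n_perm)])
    lo, hi = np.quantile(perm, [0.025, 0.975])
    return bool(stat < lo or stat > hi)        # True means reject H0
```

Here `r` stands for $y - Y\theta_0$. Because the statistic is not divided by a variance estimate that is recomputed under each permutation, the permutation distribution need not match the sampling distribution when the instrument and the error are uncorrelated but dependent, which is the situation the first design is meant to expose.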
Table 2 reports the empirical level of the tests in the case with k = 1 across different sample sizes.
When the error terms and the instruments satisfy the exogeneity condition but are dependent, the
PNS test overrejects; somewhat unexpectedly, the overrejection becomes more severe as the sample
size grows. The rank-type tests also show some size distortions, which do not diminish as the sample
size increases. The right panel of Table 2 shows that when the instruments and the error terms
are independent, the empirical levels of the asymptotic, permutation and rank-based CLR-type
tests are close to the nominal significance level.
3Because the distribution is fixed i.e. does not vary with the sample size, the permutation tests remain
asymptotically valid under the finite fourth moment assumption.
The result for the overidentified models with n = 100 and k = 5 is reported in Table 3. The
permutation tests perform better than the asymptotic tests regardless of whether the instruments
and the error terms are independent or not. The rank-based tests, displayed in the bottom part of
Table 3, have nearly correct level in the independent case but overreject in the dependent case.
3.3 Conditional heteroskedasticity
The next part of the simulations considers designs with conditional heteroskedasticity. The data are generated according to (2.1) and (2.2) with
$$u_i = W_{i1}\upsilon_i, \qquad V_i = \rho u_i + \sqrt{1 - \rho^2}\,\varepsilon_i,$$
where $W_{i1}$ is the first element of $W_i$, and similarly to the previous case
1. $(W_i', X_{2i}', \upsilon_i, \varepsilon_i)' \sim t_5[0, I_{k+p+1}]$;
2. $(W_i', X_{2i}', \upsilon_i, \varepsilon_i)' \sim N[0, I_{k+p+1}]$.
The sample size and the number of instruments are $n = 100$ and $k \in \{2, 5, 10\}$. The results are displayed in Tables 4-6. The asymptotic AR test tends to underreject and performs poorly compared to the LM test. The permutation tests perform better than their asymptotic counterparts in most cases, and on par with the wild bootstrap in the independent case. However, when the instruments satisfy the standard exogeneity condition and there are included exogenous variables, the permutation tests appear to have an edge over the wild bootstrap AR test, which overrejects. Qualitatively, the observations made for the rank-based tests in the homoskedastic case carry over to the heteroskedastic case: the rank-based tests overreject by a substantial margin in the dependent case.
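The error structure of this design can be sketched in Python as follows. The sketch only generates the instruments, controls and errors; the structural and first-stage equations (2.1) and (2.2) are not reproduced. The function names, the value of $\rho$ (not stated in this section), and the split of the $(k+p+1)$-vector into $k$ instruments, $p - 1$ non-constant controls and two scalar errors are assumptions made for illustration.

```python
import numpy as np

def draw_base(rng, n, dim, dist):
    """Draw n rows with mean zero and identity covariance."""
    z = rng.standard_normal((n, dim))
    if dist == "normal":
        return z
    df = 5
    # Multivariate t_5: a common chi-square mixing variable per row makes the
    # components uncorrelated but dependent; rescale so the covariance is I.
    mix = rng.chisquare(df, size=(n, 1)) / df
    return z / np.sqrt(mix) * np.sqrt((df - 2) / df)

def heteroskedastic_design(n, k, p, rho, dist="t5", seed=0):
    """Hypothetical generator for (W, X2, upsilon, eps) and the errors (u, V)."""
    rng = np.random.default_rng(seed)
    base = draw_base(rng, n, k + p + 1, dist)
    W = base[:, :k]                    # instruments
    X2 = base[:, k:k + p - 1]          # assumed: p - 1 non-constant controls
    upsilon, eps = base[:, -2], base[:, -1]
    u = W[:, 0] * upsilon              # u_i = W_{i1} * upsilon_i
    V = rho * u + np.sqrt(1.0 - rho**2) * eps
    return {"W": W, "X2": X2, "upsilon": upsilon, "eps": eps, "u": u, "V": V}
```

In the $t_5$ case the common mixing variable is what makes the instruments and the errors dependent even though they are uncorrelated; in the normal case the components are independent.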
4 Empirical applications
4.1 Becker and Woessmann (2010)
Becker and Woessmann (2010) study the effect of Protestantism on education before the Industrialization using counties and towns-level data from 1816 Prussia. To unravel the direction
Table 1: Null rejection probabilities at 5% level. Cauchy distributed IV’s, controls and error terms.
λ = 4. Columns are indexed by (n, p, k):
Test (50, 1, 5) (50, 1, 10) (50, 5, 5) (50, 5, 10) (100, 1, 5) (100, 1, 10) (100, 5, 5) (100, 5, 10)
AR 0.95 0.55 0.85 0.25 0.50 1.05 0.25 0.40
LM 2.35 4.25 2.05 3.10 2.10 4.45 1.85 3.45
CLRa 1.45 1.70 1.15 0.75 0.75 1.95 0.55 1.30
CLRb 2.10 3.60 1.60 2.60 1.75 3.85 1.25 3.25
PAR1 4.40 5.30 4.70 5.60 4.05 4.95 4.45 4.70
PAR2 4.40 5.45 3.40 3.50 3.95 4.75 3.25 3.05
RARn 4.25 4.55 4.10 4.15 5.10 4.05 4.90 3.75
RARw 4.10 4.85 4.30 4.55 5.15 3.95 4.55 3.60
RCLRn 3.15 3.75 4.20 5.10 4.55 3.50 3.85 4.00
RCLRw 2.35 3.25 3.35 4.25 3.50 2.80 3.25 3.45
Note: AR, LM, and CLR are the heteroskedasticity-robust asymptotic tests, PAR1 and PAR2 denote the robust
permutation AR tests, RARn and RARw denote the normal score and Wilcoxon score rank-based AR tests of Andrews
and Marmer (2008), and RCLRn and RCLRw denote the normal score and Wilcoxon score rank-based CLR statistics
of Andrews and Soares (2007) respectively. The number of replications is 2000, and the number of permutation
samples for each replication is N = 999.
Table 2: Null rejection probabilities at 5% level. Just-identified model with k = p = 1.
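λ = 4. Left panel (first four columns): $t_5[0, I_{k+p+1}]$; right panel (last four columns): $N[0, I_{k+p+1}]$.
Test n = 50 n = 100 n = 200 n = 400 n = 50 n = 100 n = 200 n = 400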
AR 3.75 5.00 4.55 4.25 4.60 5.50 4.85 5.80
LM 3.75 5.00 4.55 4.25 4.60 5.50 4.85 5.80
CLRa 3.55 4.75 4.75 4.30 4.55 5.50 4.70 5.65
CLRb 3.55 4.75 4.75 4.30 4.55 5.50 4.70 5.65
PAR1 4.50 5.05 4.75 4.40 4.75 5.70 5.05 5.85
PAR2 4.25 5.40 4.90 4.30 4.75 5.65 4.90 5.90
PLM 4.25 5.40 4.90 4.30 4.75 5.65 4.90 5.90
PCLRa 4.25 5.40 4.90 4.30 4.75 5.65 4.90 5.90
PCLRb 4.60 5.40 4.90 4.30 4.75 5.65 4.90 5.90
RARn 10.35 11.15 12.35 12.00 4.90 5.45 4.90 6.05
RARw 7.65 9.30 8.70 8.35 4.60 5.35 4.85 5.65
RCLRn 8.10 9.65 11.40 11.65 3.45 4.45 4.25 5.50
RCLRw 7.10 8.95 8.65 8.10 4.55 5.20 5.05 5.75
WAR 4.65 6.00 5.35 4.80 4.40 5.45 4.85 5.60
WLM 4.65 6.00 5.35 4.80 4.40 5.45 4.85 5.60
PNS 15.30 16.90 17.80 19.40 5.20 6.00 5.55 6.20
Note: In the left panel, $(W_i', u_i, \varepsilon_i)' \sim t_5[0, I_{k+p+1}]$, and the single instrument $W_i$ and the structural error term $u_i$ are orthogonal but dependent. In the right panel, $(W_i', u_i, \varepsilon_i)' \sim N[0, I_{k+p+1}]$, and $W_i$ and $u_i$ are independent. PLM and PCLR denote the robust permutation LM and CLR tests, WAR and WLM are the wild bootstrap AR and LM tests of Davidson and MacKinnon (2012), PNS denotes the non-studentized permutation test, and the remaining tests are as in Table 1.
Table 3: Null rejection probabilities at 5% level. Over-identified homoskedastic model with k = 5, n = 100.
Columns are indexed by (λ, distribution, p), where t5 and N abbreviate $t_5[0, I_{k+p+1}]$ and $N[0, I_{k+p+1}]$:
Test (0.1, t5, 1) (0.1, t5, 5) (0.1, N, 1) (0.1, N, 5) (4, t5, 1) (4, t5, 5) (4, N, 1) (4, N, 5) (20, t5, 1) (20, t5, 5) (20, N, 1) (20, N, 5)
AR 3.15 4.50 4.50 4.25 3.15 4.50 4.50 4.25 3.15 4.50 4.50 4.25
LM 2.90 4.30 4.15 4.25 3.20 4.05 3.70 3.70 3.00 4.10 3.25 3.85
CLRa 2.90 3.95 4.00 4.05 3.20 4.15 3.30 3.55 2.95 4.25 3.45 3.75
CLRb 2.80 3.95 4.05 4.05 3.20 4.10 3.35 3.45 2.55 4.00 3.30 3.95
PAR1 5.00 5.55 5.45 4.75 5.00 5.55 5.45 4.75 5.00 5.55 5.45 4.75
PAR2 5.05 6.85 5.60 5.45 5.05 6.85 5.60 5.45 5.05 6.85 5.60 5.45
PLM 5.10 5.60 5.40 5.15 5.00 6.05 5.05 5.05 4.70 6.15 4.95 5.45
PCLRa 4.80 6.25 5.20 4.85 4.30 5.50 4.20 4.30 4.25 5.20 3.70 4.15
PCLRb 6.20 7.25 5.35 5.10 6.30 6.70 4.65 4.75 5.80 7.00 4.30 5.00
RARn 18.65 15.70 5.25 4.45 18.65 15.70 5.25 4.45 18.65 15.70 5.25 4.45
RARw 12.40 10.20 5.35 4.10 12.40 10.20 5.35 4.10 12.40 10.20 5.35 4.10
RCLRn 13.85 14.55 3.35 3.70 13.15 14.50 3.15 3.75 11.10 12.65 4.00 3.50
RCLRw 10.40 12.90 4.45 4.55 10.20 11.90 4.20 4.65 9.15 10.90 4.55 4.35
WAR 5.30 7.45 5.35 4.10 5.30 7.45 5.35 4.10 5.30 7.45 5.35 4.10
WLM 4.65 5.85 4.95 4.80 4.20 6.40 4.40 4.05 4.50 5.95 4.30 4.35
Note: $t_5[0, I_{k+p+1}]$ and $N[0, I_{k+p+1}]$ correspond to $(W_i', X_{2i}', u_i, \varepsilon_i)' \sim t_5[0, I_{k+p+1}]$ (dependent but uncorrelated) and $(W_i', X_{2i}', u_i, \varepsilon_i)' \sim N[0, I_{k+p+1}]$ (independent), respectively. The remaining part is as in Tables 1 and 2.
Table 4: Null rejection probabilities at 5% level. Conditionally heteroskedastic design with weak identification (λ = 0.1).
n = 100. Columns are indexed by (p, distribution, k), where t5 and N abbreviate $t_5[0, I_{k+p+1}]$ and $N[0, I_{k+p+1}]$:
Tests (1, t5, 2) (1, t5, 5) (1, t5, 10) (1, N, 2) (1, N, 5) (1, N, 10) (5, t5, 2) (5, t5, 5) (5, t5, 10) (5, N, 2) (5, N, 5) (5, N, 10)
AR 2.35 1.75 0.60 4.05 2.90 1.40 4.30 2.40 1.15 5.25 3.25 2.60
LM 2.95 4.35 3.55 4.90 4.30 4.30 5.25 3.35 2.85 6.00 4.30 3.85
CLRa 2.40 2.05 0.80 4.05 3.65 2.00 4.10 2.40 1.35 5.30 3.30 3.00
CLRb 2.45 2.25 1.25 4.10 3.60 2.20 4.25 2.60 1.25 5.25 3.45 3.00
PAR1 3.60 5.15 4.35 4.85 5.60 5.20 7.40 7.55 8.85 5.95 4.70 4.70
PAR2 3.35 4.85 4.35 4.90 5.30 5.00 6.35 5.50 4.95 6.40 5.30 5.35
PLM 4.45 5.95 6.20 5.45 5.80 6.35 6.80 4.65 5.05 6.50 5.60 5.95
PCLRa 3.60 5.20 4.80 5.50 5.45 4.35 6.65 4.70 4.10 6.35 5.25 5.70
PCLRb 3.65 6.40 8.45 5.50 5.65 5.15 6.80 5.20 7.55 6.35 5.50 6.85
RARn 22.30 27.30 32.55 13.45 10.20 8.05 23.55 24.80 29.85 13.75 10.10 8.70
RARw 14.35 17.90 20.10 10.40 8.85 6.65 15.95 16.00 17.55 10.40 8.25 7.35
RCLRn 19.75 25.70 32.30 11.40 8.30 6.40 32.25 33.75 44.75 13.20 8.75 7.85
RCLRw 14.35 18.05 21.95 9.75 8.20 6.60 27.00 27.55 38.65 11.15 8.55 8.60
WAR 4.50 5.80 5.55 4.75 5.20 4.90 8.65 10.80 12.60 5.75 4.40 4.95
WLM 4.65 3.90 3.95 5.65 4.20 6.30 7.00 6.80 7.00 5.25 4.30 5.45
Note: As in Table 3.
Table 5: Null rejection probabilities at 5% level. Conditionally heteroskedastic design with weak identification (λ = 4).
n = 100. Columns are indexed by (p, distribution, k), where t5 and N abbreviate $t_5[0, I_{k+p+1}]$ and $N[0, I_{k+p+1}]$:
Tests (1, t5, 2) (1, t5, 5) (1, t5, 10) (1, N, 2) (1, N, 5) (1, N, 10) (5, t5, 2) (5, t5, 5) (5, t5, 10) (5, N, 2) (5, N, 5) (5, N, 10)
AR 2.35 1.75 0.60 4.05 2.90 1.40 4.30 2.40 1.15 5.25 3.25 2.60
LM 3.10 3.95 3.70 4.55 3.85 3.80 4.50 2.80 2.50 6.00 3.60 4.00
CLRa 2.45 2.10 1.00 4.15 3.20 1.55 4.25 2.55 1.25 5.70 2.85 3.15
CLRb 2.40 2.25 1.50 4.10 3.15 1.85 4.15 2.45 1.30 5.65 3.00 3.45
PAR1 3.60 5.15 4.35 4.85 5.60 5.20 7.40 7.55 8.85 5.95 4.70 4.70
PAR2 3.35 4.85 4.35 4.90 5.30 5.00 6.35 5.50 4.95 6.40 5.30 5.35
PLM 4.20 5.35 5.80 5.15 4.90 5.05 6.20 3.90 4.60 6.30 4.45 6.10
PCLRa 3.70 5.55 4.15 4.90 4.70 4.05 6.05 4.65 3.90 6.45 4.55 5.65
PCLRb 3.80 6.20 8.75 5.05 5.20 5.40 6.20 5.80 7.85 6.85 5.05 7.10
RARn 22.30 27.30 32.55 13.45 10.20 8.05 23.55 24.80 29.85 13.75 10.10 8.70
RARw 14.35 17.90 20.10 10.40 8.85 6.65 15.95 16.00 17.55 10.40 8.25 7.35
RCLRn 19.45 22.40 29.95 11.05 7.50 6.05 29.25 31.25 41.50 11.85 8.50 7.15
RCLRw 13.70 17.50 19.55 9.40 7.45 6.35 25.00 26.20 36.00 10.10 8.25 7.55
WAR 4.50 5.80 5.55 4.75 5.20 4.90 8.65 10.80 12.60 5.75 4.40 4.95
WLM 3.20 3.50 4.15 4.00 4.05 4.90 6.60 5.85 7.00 5.55 3.90 5.30
Note: As in Table 3.
Table 6: Null rejection probabilities at 5% level. Conditionally heteroskedastic design with strong identification (λ = 20).
n = 100. Columns are indexed by (p, distribution, k), where t5 and N abbreviate $t_5[0, I_{k+p+1}]$ and $N[0, I_{k+p+1}]$:
Tests (1, t5, 2) (1, t5, 5) (1, t5, 10) (1, N, 2) (1, N, 5) (1, N, 10) (5, t5, 2) (5, t5, 5) (5, t5, 10) (5, N, 2) (5, N, 5) (5, N, 10)
AR 2.35 1.75 0.60 4.05 2.90 1.40 4.30 2.40 1.15 5.25 3.25 2.60
LM 3.05 3.20 2.40 4.90 3.00 3.15 4.55 3.10 2.10 5.65 3.95 3.90
CLRa 2.25 2.00 0.95 4.60 2.80 1.85 4.55 2.70 1.45 5.70 3.50 3.20
CLRb 2.45 2.40 1.35 4.65 2.95 2.20 4.70 2.80 1.75 5.80 3.55 3.25
PAR1 3.60 5.15 4.35 4.85 5.60 5.20 7.40 7.55 8.85 5.95 4.70 4.70
PAR2 3.35 4.85 4.35 4.90 5.30 5.00 6.35 5.50 4.95 6.40 5.30 5.35
PLM 4.35 4.35 4.50 5.45 4.20 4.75 5.95 4.40 4.20 6.45 4.95 5.90
PCLRa 4.35 4.55 3.55 4.95 3.40 3.20 5.85 4.10 3.30 6.35 4.10 4.60
PCLRb 4.50 6.00 8.25 5.00 3.90 4.65 6.00 5.35 7.25 6.50 4.90 6.65
RARn 22.30 27.30 32.55 13.45 10.20 8.05 23.55 24.80 29.85 13.75 10.10 8.70
RARw 14.35 17.90 20.10 10.40 8.85 6.65 15.95 16.00 17.55 10.40 8.25 7.35
RCLRn 16.75 17.90 23.45 9.80 7.05 5.85 25.35 24.15 33.65 11.60 6.25 6.95
RCLRw 12.80 13.80 16.75 8.85 7.05 5.95 22.05 21.15 28.75 9.85 6.25 6.90
WAR 4.50 5.80 5.55 4.75 5.20 4.90 8.65 10.80 12.60 5.75 4.40 4.95
WLM 3.60 3.85 3.70 4.90 4.75 4.55 6.15 5.25 7.00 5.80 3.65 4.95
Note: As in Table 3.
of causation between a better education of Protestants and industrialization in the 1870s, the authors estimate IV regressions where the dependent variable is the number of primary schools per 1000 inhabitants, and the endogenous regressor is the fraction of Protestants, instrumented by the distance to Wittenberg, the city from which Protestantism spread in a roughly concentric manner in Martin Luther’s times.
The authors consider several different specifications which differ in terms of the exogenous
controls used: for the counties-level data,
• Specification 1: constant;
• Specification 2: latitude, longitude, a product of latitude and longitude, percentage of popu-
lation living in towns;
• Specification 3: latitude, longitude, a product of latitude and longitude, percentage of pop-
ulation living in towns, percentage of population younger than 15 years of age, percentage
of population older than 60 years of age, number of horses per capita, number of bulls per
capita, looms per capita, share of farm laborers in total population and tonnage of transport
ships;
and for the towns-level data,
• Specification 4: latitude, longitude, a product of latitude and longitude, percentage of population younger than 15 years of age, percentage of population older than 60 years of age, looms per capita, buildings with massive walls, businesses per capita and retailers per capita.
For further details of the study and the arguments underlying the use of the distance instrument, see Becker and Woessmann (2010).
Since there is a single instrument, all specifications are just-identified. The sample sizes are 293 for the counties-level data and 156 for the towns-level data in the sample for 1816 Prussia. We construct confidence intervals for the coefficient on the share of Protestants using 1999 simulated samples for the asymptotic CLR, wild bootstrap, permutation and rank-based confidence intervals. In all of the specifications, we also include confidence intervals from the homoskedastic two-stage least squares t test, t2sls, and from a non-studentized permutation test, PNS, based on $W'u(\theta_0)$ with $u(\theta_0) = M_X(y - Y\theta_0)$, which rejects when this statistic falls in the lower or upper tail of the permutation distribution of $W'u_\pi(\theta_0)$.
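Confidence intervals of this kind are obtained by test inversion: a candidate value $\theta_0$ belongs to the confidence set when the test of $H_0: \theta = \theta_0$ does not reject. The Python sketch below illustrates the idea on a grid for a single instrument. It is a simplified stand-in, not the procedure used for the reported intervals: the function names are hypothetical, only a constant is partialled out (rather than all controls via $M_X$), and a symmetric two-sided permutation p-value replaces the equal-tailed quantile rule.

```python
import numpy as np

def perm_pvalue(y, Y, W, theta0, n_perm, rng):
    """Two-sided permutation p-value for |W'u(theta0)| (non-studentized)."""
    resid = y - Y * theta0
    u = resid - resid.mean()           # only the constant is partialled out here
    stat = abs(W @ u)
    perm = np.array([abs(W @ rng.permutation(u)) for _ in range(n_perm)])
    return (1 + np.sum(perm >= stat)) / (n_perm + 1)

def confidence_set(y, Y, W, grid, alpha=0.05, n_perm=199, seed=0):
    """Grid points theta0 at which the permutation test does not reject."""
    rng = np.random.default_rng(seed)
    kept = [t for t in grid if perm_pvalue(y, Y, W, t, n_perm, rng) > alpha]
    return np.array(kept)
```

The set of accepted grid points need not be an interval, and it can cover the whole grid when the instrument is weak; reporting the smallest and largest accepted values is therefore only a summary.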
The 95% confidence intervals and their lengths based on inverting different tests are reported in Table 7. In all of the specifications, the heteroskedasticity-robust asymptotic, permutation and wild bootstrap confidence intervals are nearly identical. Interestingly, they are shorter than the homoskedastic AR, RARn and RARw confidence intervals. The RCLRn and RCLRw confidence intervals are comparable to the permutation confidence intervals for the counties-level data, and shorter for the towns-level data; however, these tests do not guard against heteroskedasticity. The Breusch-Pagan tests for heteroskedasticity in the reduced-form regressions of the dependent variable and the endogenous regressor on the instrument and the exogenous covariates have p-values 0.01366 and $2.22 \times 10^{-16}$ for Specification 1, $1.0916 \times 10^{-6}$ and 0.0033 for Specification 2, $5.3147 \times 10^{-7}$ and $6.8536 \times 10^{-6}$ for Specification 3, and 0.14162 and $7.0197 \times 10^{-5}$ for Specification 4. There is thus strong evidence of heteroskedasticity, and so the heteroskedasticity-robust methods should be more reliable. The non-studentized permutation test, which is exact under independence (Assumption 2.1) and does not account for conditional heteroskedasticity, yields a confidence interval considerably wider than the ones based on the robust permutation tests.
Using the heteroskedasticity and identification-robust confidence intervals, we confirm the find-
ings of Becker and Woessmann (2010) that there is a significant effect of the share of Protestants
on the school supply in the counties-level data.
4.2 Bazzi and Clemens (2013)
Bazzi and Clemens (2013) reexamine cross-sectional IV regression results on the effect of foreign aid on economic growth from several published studies. The variables in the “aid-growth” IV regressions estimated by Rajan and Subramanian (2008) and Bazzi and Clemens (2013) are
• yi : the average annual growth of per capita GDP;
• Yi : the foreign-aid receipts to GDP ratio (Aid/GDP);
• Wi1 : a variable constructed from aid-recipient population size, aid-donor population size,
colonial relationship, and language traits (see Appendix A of Bazzi and Clemens (2013));
• Xi : constant, initial per capita GDP, initial level of policy, initial level of life expectancy,
geography, institutional quality, initial inflation, initial M2/GDP, initial budget balance/GDP,
Revolutions, and Ethnic fractionalization.
The data cover the period 1970-2000 and the sample size is n = 78. Bazzi and Clemens (2013)
consider the following four specifications (Table 1, Columns 1-4 therein):
• Specification 1: The baseline specification of Rajan and Subramanian (2008) (Table 4, Column
2) with the variables as above;
Table 7: Confidence intervals for the coefficient on the share of Protestants in Becker and Woessmann (2010) data.
Specifications 1-3 use the counties-level data (n = 293); Specification 4 uses the towns-level data (n = 156).
Columns (left to right): test statistic; Spec. 1 (95% CI, Length); Spec. 2 (95% CI, Length); Spec. 3 (95% CI, Length); Spec. 4 (95% CI, Length).
t2sls [0.24, 1.27] 1.03 [0.94, 1.80] 0.86 [0.98, 1.94] 0.96 [−0.48, 1.03] 1.51
Hom. AR [0.21, 1.25] 1.04 [0.95, 1.80] 0.85 [0.98, 1.95] 0.97 [−0.48, 1.05] 1.53
RARn [0.35, 1.28] 0.93 [0.83, 2.04] 1.21 [0.86, 2.14] 1.28 [−0.49, 1.06] 1.55
RARw [0.46, 1.31] 0.85 [0.79, 1.81] 1.02 [0.81, 1.97] 1.16 [−0.51, 0.98] 1.49
RCLRn [0.32, 1.28] 0.96 [0.99, 1.78] 0.79 [1.07, 1.89] 0.82 [−0.29, 0.73] 0.93
RCLRw [0.45, 1.32] 0.87 [1.03, 1.68] 0.65 [1.11, 1.83] 0.72 [−0.26, 0.68] 0.94
AR [0.27, 1.22] 0.95 [1.07, 1.69] 0.62 [1.10, 1.86] 0.76 [−0.51, 0.97] 1.48
PAR1 [0.28, 1.21] 0.93 [1.07, 1.69] 0.62 [1.07, 1.87] 0.80 [−0.53, 0.98] 1.51
PAR2 [0.27, 1.23] 0.96 [1.07, 1.70] 0.63 [1.10, 1.86] 0.76 [−0.51, 0.95] 1.46
PNS [0.22, 1.25] 1.03 [0.77, 2.03] 1.26 [0.72, 2.22] 1.50 [−0.89, 1.59] 2.48
WAR [0.25, 1.23] 0.98 [1.07, 1.69] 0.62 [1.10, 1.88] 0.78 [−0.56, 0.98] 1.54
F -statistic 80.78 174.65 143.78 79.50
Breusch-Pagan p-values 0.01, 0.00 0.00, 0.00 0.00, 0.00 0.14, 0.00
Note: The number of permutation samples is N = 1999. t2sls and Hom. AR are the homoskedastic two-stage least
squares t and AR tests, PNS denotes the non-studentized permutation test, and the remaining tests are as in Tables
1 and 2. The Breusch-Pagan test p-values are computed using the fitted values in the reduced-form regressions of yi
and Yi on the instrument and exogenous covariates respectively.
• Specification 2: log population is included in the second stage of Specification 1;
• Specification 3: Wi2 = log population replaces the instrument Wi1 in Specification 1;
• Specification 4: An instrument, Wi3, constructed from the colonial ties indicators only replaces
Wi1 in Specification 1.
Specifications 1-4 are just-identified as there is a single instrument for the single endogenous regres-
sor, hence we only consider the AR and PAR test results. In addition, we consider Specifications
5-7 with IV’s (Wi1,Wi2)′, (Wi2,Wi3)′ and (Wi1,Wi2,Wi3)′, respectively, to examine whether the
over-identified specifications could have any instrumentation power. Wi2 is included in the latter
three cases because it is the strongest instrument as reported in Bazzi and Clemens (2013). We also
include the confidence interval based on the two-stage least squares t-test, denoted as t2sls.
Tables 8 and 9 report the 95% confidence intervals for the coefficient on Aid/GDP. In all specifications, the Breusch-Pagan test p-values from the two separate reduced-form regressions of the endogenous variables on the exogenous variables show evidence of heteroskedasticity.
The results for Specifications 1-4 in Table 8 qualitatively agree with the homoskedastic CLR confidence intervals reported in Bazzi and Clemens (2013). The permutation confidence intervals are nearly identical to, but slightly shorter than, the asymptotic and wild bootstrap confidence intervals in Specifications 1 and 3. The unbounded confidence intervals in Specifications 2 and 4, which do not use log population or the instruments based on it, indicate that the population size instrument used in the other specifications indeed has the most identifying power.
The above results also extend to Table 9. Some noteworthy findings are as follows. The WAR confidence intervals are wider than the AR and PAR confidence intervals. The PLM confidence intervals are shorter than the LM and WLM confidence intervals, but still wider than the other confidence intervals. Somewhat surprisingly, in Specification 6, the AR, CLRa, PAR and PCLR confidence intervals exclude 0, thereby pointing towards a borderline significant effect of foreign aid on growth despite the rather small value of the first-stage F-statistic, 17.90. Also, in Specification 7, the PCLR confidence intervals exclude 0 while the opposite holds for the asymptotic CLR confidence intervals, which underscores the utility of the proposed tests.
However, these results should be interpreted with caution because the population size might affect growth through multiple channels, as argued by Bazzi and Clemens (2013). Overall, the uncertainty regarding the effect of foreign aid in the sample data may be expressed more accurately by the permutation and the wild bootstrap confidence intervals, as the unobserved confounders in the “aid-growth” regression are more likely to be orthogonal to, rather than independent of, the population size.
Table 8: Confidence intervals for the coefficient on Aid/GDP in Rajan and Subramanian (2008) and Bazzi and Clemens (2013) data.
Columns (left to right): test statistic; Spec. 1 (95% CI, Length); Spec. 2 (95% CI); Spec. 3 (95% CI, Length); Spec. 4 (95% CI).
t2sls [−0.05, 0.24] 0.29 [−5.14, 6.96] [−0.06, 0.21] 0.27 [−973.38, 941.49]
Hom. AR [−0.02, 0.28] 0.30 (−∞,∞) [−0.03, 0.25] 0.28 (−∞,∞)
RARn [−0.05, 0.29] 0.34 (−∞,∞) [−0.06, 0.28] 0.34 (−∞,∞)
RARw [−0.06, 0.20] 0.26 (−∞,∞) [−0.07, 0.16] 0.23 (−∞,∞)
RCLRn [−0.03, 0.26] 0.29 (−∞,∞) [−0.04, 0.22] 0.26 (−∞,∞)
RCLRw [−0.04, 0.18] 0.22 (−∞,∞) [−0.05, 0.16] 0.21 (−∞,∞)
AR [−0.01, 0.29] 0.30 (−∞,∞) [−0.02, 0.25] 0.27 (−∞,∞)
PAR1 [−0.01, 0.30] 0.31 (−∞,∞) [−0.03, 0.27] 0.30 (−∞,∞)
PAR2 [−0.01, 0.28] 0.29 (−∞,∞) [−0.02, 0.25] 0.27 (−∞,∞)
PNS [−0.04, 0.37] 0.41 (−∞,∞) [−0.05, 0.30] 0.35 (−∞,∞)
WAR [−0.02, 0.32] 0.34 (−∞,∞) [−0.03, 0.28] 0.31 (−∞,∞)
F -statistic 31.63 0.13 36.30 0.00
Breusch-Pagan p-values 0.03, 0.00 0.03, 0.00 0.02, 0.00 0.01, 0.00
Note: The number of permutation samples is N = 1999. The sample size is 78. Specifications 1-4 are just-identified.
t2sls and Hom. AR are the homoskedastic two-stage least squares t and AR tests, PNS denotes the non-studentized
permutation test, and the remaining tests are as in Tables 1 and 2. The Breusch-Pagan test p-values are computed
using the fitted values in the reduced-form regressions of yi and Yi on the instrument and exogenous covariates
respectively.
5 Conclusion
This paper studies randomization inference in the linear IV model. The proposed permutation tests are analogs of the linear IV identification-robust tests, and share the strengths of the rank/permutation tests and the wild bootstrap tests. In particular, these tests allow for conditional heteroskedasticity, do not require the instruments to be distributed independently of the error terms unlike most
Table 9: Confidence intervals for the coefficient on Aid/GDP in Rajan and Subramanian (2008) and Bazzi and Clemens (2013) data.
Columns (left to right): test statistic; Spec. 5 (95% CI, Length); Spec. 6 (95% CI, Length); Spec. 7 (95% CI, Length).
t2sls [−0.06, 0.22] 0.28 [−0.06, 0.21] 0.27 [−0.06, 0.21] 0.27
Hom. AR [−0.05, 0.31] 0.36 [−0.02, 0.26] 0.28 [−0.04, 0.32] 0.36
Hom. LM [−0.63, 0.26] 0.89 [−0.63, 0.28] 0.91 [−0.63, 0.28] 0.91
Hom. CLR [−0.03, 0.26] 0.29 [−0.03, 0.28] 0.31 [−0.03, 0.28] 0.31
RARn [−0.08, 0.41] 0.49 [−0.06, 0.32] 0.38 [−0.09, 0.40] 0.49
RARw [−0.08, 0.29] 0.37 [−0.06, 0.23] 0.29 [−0.09, 0.27] 0.36
RCLRn [−0.03, 0.24] 0.27 [−0.03, 0.25] 0.28 [−0.03, 0.25] 0.28
RCLRw [−0.05, 0.17] 0.22 [−0.03, 0.17] 0.20 [−0.04, 0.17] 0.21
AR [−0.04, 0.34] 0.38 [0.04, 0.32] 0.28 [0.01, 0.42] 0.41
LM [−0.43, 0.27] 0.70 [−0.61, 0.36] 0.97 [−0.69, 0.35] 1.04
CLRa [−0.03, 0.28] 0.31 [0.02, 0.36] 0.34 [0.00, 0.35] 0.35
CLRb [−0.02, 0.28] 0.30 [0.00, 0.36] 0.36 [−0.03, 0.34] 0.37
PAR1 [−0.04, 0.35] 0.39 [0.04, 0.34] 0.30 [0.01, 0.43] 0.42
PAR2 [−0.04, 0.34] 0.38 [0.04, 0.31] 0.27 [0.02, 0.38] 0.36
PLM [−0.43, 0.25] 0.68 [−0.60, 0.33] 0.93 [−0.65, 0.33] 0.98
PCLRa [−0.03, 0.27] 0.30 [0.02, 0.34] 0.32 [0.01, 0.35] 0.34
PCLRb [−0.02, 0.27] 0.29 [0.02, 0.34] 0.32 [0.01, 0.35] 0.34
PNS [−0.04, 0.39] 0.43 [−0.06, 0.32] 0.38 [−0.04, 0.35] 0.39
WAR [−0.05, 0.44] 0.49 [0.00, 0.33] 0.33 [−0.04, 0.48] 0.52
WLM [−0.64, 0.30] 0.94 [−0.66, 0.30] 0.96 [−0.66, 0.33] 0.99
F -statistic 17.98 17.90 11.86
Sargan p-value 0.42 0.08 0.21
Breusch-Pagan p-values 0.03, 0.00 0.02, 0.00 0.02, 0.00
Note: The number of permutation samples is N = 1999. The sample size is 78. Specifications 5-7 are over-identified.
t2sls, Hom. AR, Hom. LM and Hom. CLR are the homoskedastic two-stage least squares t, AR, LM and CLR
tests, PNS denotes the non-studentized permutation test, and the remaining tests are as in Tables 1 and 2. The
Breusch-Pagan test p-values are computed using the fitted values in the reduced-form regressions of yi and Yi on the
instrument and exogenous covariates respectively.
rank-based or permutation tests currently available in the literature, and could be robust to heavy
tails in certain cases.
We plan to address the important issue of cluster and identification-robust inference in IV models
in another work.
References
Abadie, A., Athey, S., Imbens, G. W. and Wooldridge, J. M. (2020). Sampling-based vs.
Design-based Uncertainty in Regression Analysis. Econometrica, 88 265–296.
Anderson, T. W. and Rubin, H. (1949). Estimation of the Parameters of a Single Equation in
a Complete System of Stochastic Equations. Annals of Mathematical Statistics, 20 46–63.
Andrews, D. W. K. and Guggenberger, P. (2019a). Identification- and Singularity-Robust Inference for Moment Condition Models. Quantitative Economics, 10 1703–1746.
Andrews, D. W. K., Cheng, X. and Guggenberger, P. (2020). Generic Results for Establish-
ing the Asymptotic Size of Confidence Sets and Tests. Journal of Econometrics, 218 496–531.
Andrews, D. W. K. and Guggenberger, P. (2017a). Asymptotic Size of Kleibergen’s LM and
Conditional LR Tests for Moment Condition Models. Econometric Theory, 33 1046–1080.
Andrews, D. W. K. and Guggenberger, P. (2017b). Supplemental material to “Asymptotic Size of Kleibergen’s LM and Conditional LR Tests for Moment Condition Models”. Econometric Theory, 33.
Andrews, D. W. K. and Guggenberger, P. (2019b). Supplemental material to “Identification-
and Singularity-Robust Inference for Moment Condition Models”. Quantitative Economics, 10.
Andrews, D. W. K. and Marmer, V. (2008). Exactly Distribution-Free Inference in Instrumental
Variables Regression with Possibly Weak Instruments. Journal of Econometrics, 142 183–200.
Andrews, D. W. K. and Soares, G. (2007). Rank Tests for Instrumental Variables Regression
with Weak Instruments. Econometric Theory, 23 1033–1082.
Andrews, D. W. K. and Stock, J. H. (2007). Inference with Weak Instruments. In Advances
in Econometrics: Proceedings of the Ninth World Congress of the Econometric Society, vol. 3.
Andrews, I., Stock, J. H. and Sun, L. (2019). Weak Instruments in Instrumental Variables
Regression: Theory and Practice. Annual Review of Economics, 11 727–753.
Bazzi, S. and Clemens, M. A. (2013). Blunt Instruments: Avoiding Common Pitfalls in Identi-
fying the Causes of Economic Growth. American Economic Journal: Macroeconomics, 5 152–86.
Becker, S. O. and Woessmann, L. (2010). The Effect of Protestantism on Education before the
Industrialization: Evidence from 1816 Prussia. Economics Letters, 107 224–228.
Canay, I. A. and Kamat, V. (2018). Approximate Permutation Tests and Induced Order Statistics
in the Regression Discontinuity Design. The Review of Economic Studies, 85 1577–1608.
Canay, I. A., Romano, J. P. and Shaikh, A. M. (2017). Randomization Tests under an
Approximate Symmetry Assumption. Econometrica, 85 1013–1030.
Chernozhukov, V., Hansen, C. and Jansson, M. (2009). Finite Sample Inference for Quantile
Regression Models. Journal of Econometrics, 152 93–103.
Chung, E. and Romano, J. P. (2013). Exact and Asymptotically Robust Permutation Tests.
Annals of Statistics, 41 484–507.
Chung, E. and Romano, J. P. (2016). Multivariate and Multiple Permutation Tests. Journal of
Econometrics, 193 76–91.
Davidson, R. and MacKinnon, J. G. (2008). Bootstrap Inference in a Linear Equation Estimated
by Instrumental Variables. The Econometrics Journal, 11 443–477.
Davidson, R. and MacKinnon, J. G. (2012). Wild Bootstrap Tests for IV Regression. Journal
of Business & Economic Statistics, 28 128–144.
DiCiccio, C. J. and Romano, J. P. (2017). Robust Permutation Tests for Correlation and
Regression Coefficients. Journal of the American Statistical Association, 112 1211–1220.
Dufour, J.-M. (2003). Identification, Weak Instruments and Statistical Inference in Econometrics.
Canadian Journal of Economics, 36 767–808.
Dufour, J.-M., Flachaire, E. and Khalaf, L. (2019). Permutation Tests for Comparing
Inequality Measures. Journal of Business & Economic Statistics, 37 457–470.
Freedman, D. and Lane, D. (1983). A Nonstochastic Interpretation of Reported Significance
Levels. Journal of Business & Economic Statistics, 1 292–298.
Ganong, P. and Jager, S. (2018). A Permutation Test for the Regression Kink Design. Journal
of the American Statistical Association, 113 494–504.
Hansen, B. E. (2021). Econometrics. Manuscript.
Hoeffding, W. (1951). A Combinatorial Central Limit Theorem. Annals of Mathematical Statis-
tics, 22 558–566.
Hu, T.-C., Moricz, F. and Taylor, R. L. (1989). Strong Laws of Large Numbers for Arrays of
Rowwise Independent Random Variables. Acta Mathematica Hungarica, 54 153–162.
Imbens, G. W. and Rosenbaum, P. R. (2005). Robust, Accurate Confidence Intervals with a
Weak Instrument: Quarter of Birth and Education. Journal of the Royal Statistical Society:
Series A (Statistics in Society), 168 109–126.
Janssen, A. (1997). Studentized Permutation Tests for Non-IID Hypotheses and the Generalized Behrens-Fisher Problem. Statistics & Probability Letters, 36 9–21.
Kleibergen, F. (2002). Pivotal Statistics for Testing Structural Parameters in Instrumental Vari-
ables Regression. Econometrica, 70 1781–1803.
Lehmann, E. L. and Romano, J. P. (2005). Testing Statistical Hypotheses. 3rd ed. Springer.
Moreira, H. and Moreira, M. J. (2019). Optimal Two-Sided Tests for Instrumental Variables
Regression with Heteroskedastic and Autocorrelated Errors. Journal of Econometrics, 213 398–
433.
Moreira, M. J. (2003). A Conditional Likelihood Ratio Test for Structural Models. Econometrica,
71 1027–1048.
Moreira, M. J., Porter, J. R. and Suarez, G. A. (2009). Bootstrap Validity for the Score
Test When Instruments May be Weak. Journal of Econometrics, 149 52–64.
Motoo, M. (1956). On the Hoeffding’s Combinatorial Central Limit Theorem. Annals of the
Institute of Statistical Mathematics, 8 145–154.
Neubert, K. and Brunner, E. (2007). A Studentized Permutation Test for the Non-Parametric
Behrens–Fisher Problem. Computational Statistics & Data Analysis, 51 5192–5204.
Neuhaus, G. (1993). Conditional Rank Tests for the Two-Sample Problem under Random Cen-
sorship. Annals of Statistics, 21 1760–1779.
Pauly, M. (2011). Discussion about the Quality of F-Ratio Resampling Tests for Comparing
Variances. Test, 20 163–179.
Rajan, R. G. and Subramanian, A. (2008). Aid and Growth: What Does the Cross-Country
Evidence Really Show? The Review of Economics and Statistics, 90 643–665.
Schneller, W. (1988). A Short Proof of Motoo’s Combinatorial Central Limit Theorem using
Stein’s Method. Probability Theory and Related Fields, 78 249–252.
So, B. S. and Shin, D. W. (1999). Cauchy Estimators for Autoregressive Processes with Appli-
cations to Unit Root Tests and Confidence Intervals. Econometric Theory, 15 165–176.
Stock, J. H., Wright, J. H. and Yogo, M. (2002). A Survey of Weak Instruments and Weak
Identification in Generalized Method of Moments. Journal of Business and Economic Statistics,
20 518–529.
Young, A. (2020). Consistency without Inference: Instrumental Variables in Practical Application.
Tech. rep., London School of Economics.
A Proofs
In the sequel, $\overset{d}{=}$ denotes distributional equality. We use the abbreviations SLLN for the strong law of large numbers, WLLN for the weak law of large numbers for independent $L^{1+\delta}$-bounded random variables, CLT for the central limit theorem, CMT for the continuous mapping theorem, and SV and SVD for “singular value” and “singular value decomposition”, respectively.
Let $(\Omega_0, \mathcal{F}, P)$ be the probability triplet underlying the observations $(y, Y, W, X)$, and let $P^\pi$ denote the probability measure induced by a random permutation $\pi$ uniformly distributed over $G_n$. $\overset{p^\pi}{\longrightarrow}$ and $\overset{d^\pi}{\longrightarrow}$ denote convergence in probability and convergence in distribution with respect to $P^\pi$. Also, $o_{a.s.}(\cdot)$ and $O_{a.s.}(\cdot)$ stand for $o(\cdot)$ $P$-almost surely and $O(\cdot)$ $P$-almost surely, respectively.
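Proof of Proposition 2.1. Since $\Sigma_u(\theta_0) = \mathrm{diag}(u_1(\theta_0)^2, \dots, u_n(\theta_0)^2)$ and $u_i(\theta_0) = u_i - X_i'(X'X)^{-1}X'u$ under $H_0: \theta = \theta_0$, we may write $\Sigma_u(\theta_0) = \Sigma_u(u, X)$. Then
$$\begin{aligned}
\mathrm{PAR1} &= u'M_X W_\pi (W_\pi' M_X \Sigma_u(u, X) M_X W_\pi)^{-1} W_\pi' M_X u \\
&\overset{d}{=} u'M_X W (W' M_X \Sigma_u(u, X) M_X W)^{-1} W' M_X u \\
&= \mathrm{AR},
\end{aligned} \tag{A.1}$$
where the equality in distribution follows from Assumption 1. Then, letting $\phi_n^{\mathrm{PAR1}} = \phi_n^{\mathrm{PAR1}}(W_\pi, X, u(\theta_0))$,
$$E[\phi_n^{\mathrm{PAR1}}] = \frac{1}{n!} E\Big[\sum_{\pi \in G_n} \phi_n^{\mathrm{PAR1}}(W_\pi, X, u(\theta_0))\Big] = \frac{1}{n!} E\Big[N^{+} + \frac{n!\alpha - N^{+}}{N^{0}} N^{0}\Big] = \alpha,$$
where the first equality uses (A.1), and the second equality follows from the definitions of $\phi_n^{\mathrm{PAR1}}(W_\pi, X, u(\theta_0))$, $N^{+}$ and $N^{0}$. The proof for the PAR2 test is similar and is thus omitted.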
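The counting step in this argument can be checked numerically. The Python sketch below enumerates all permutations for a small sample and builds a randomized test function in the usual way: it equals 1 above the critical order statistic, a randomization probability at ties, and 0 below. The definitions of $N^{+}$ and $N^{0}$ used here (numbers of permutation statistics strictly above and equal to the critical value) follow the standard construction in Lehmann and Romano (2005) and are an assumption, since they are stated outside this appendix; the non-studentized statistic $(W_\pi'u)^2$ and the function name are likewise illustrative.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial, floor

def randomized_test_function(w, u, alpha):
    """Values of the randomized permutation test over all of G_n (exact)."""
    n = len(w)
    # Illustrative statistic (W_pi' u)^2 for every permutation pi in G_n.
    stats = [sum(w[p[i]] * u[i] for i in range(n)) ** 2
             for p in permutations(range(n))]
    m = factorial(n)
    k = m - floor(m * alpha)
    crit = sorted(stats)[k - 1]              # k-th smallest permutation statistic
    n_plus = sum(s > crit for s in stats)    # N^+: statistics strictly above crit
    n_zero = sum(s == crit for s in stats)   # N^0: statistics tied with crit
    a = (m * alpha - n_plus) / n_zero        # randomization probability at ties
    phi = [Fraction(1) if s > crit else a if s == crit else Fraction(0)
           for s in stats]
    return phi, n_plus, n_zero, a

# Integer data and a Fraction level keep all comparisons and sums exact.
alpha = Fraction(1, 20)
phi, n_plus, n_zero, a = randomized_test_function(
    [1, 2, 3, 4, 5], [2, -1, 0, 3, -4], alpha)
average = sum(phi) / len(phi)   # (N^+ + a N^0) / n!, which equals alpha by construction
```

The average of the test function over $G_n$ equals $\alpha$ for any data, which is the identity used in the second equality of the proof.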
A.1 Outline of the asymptotic results
The proof of the asymptotic results is organized as follows. In Section A.2, we provide some asymptotic results
for triangular arrays of random variables which will be used later in the proof of asymptotic similarity. In
Section A.3, following Andrews and Guggenberger (2019b) we define the drifting sequences of parameters for
establishing the uniform validity of the permutation tests, and state two lemmas pertaining to the conditional
critical value of the CLR-type statistics. Section A.4 provides some asymptotic results for quantities that
depend on the sample data only. Section A.5 presents the main results about the asymptotic distribution of
key permutation quantities (Lemma A.6) and the PCLR statistics under the drifting parameter sequences.
The uniform asymptotic similarity is proved in Section A.6. The asymptotic distributions of the test statistics
under strong or semi-strong identification, and local alternatives are derived in Section A.7. Finally, Section
A.8 presents the proof of Lemma A.8 used in the derivation of the asymptotic distribution of the PCLR
statistics.
A.2 Supplementary results
We use the following auxiliary results in the proofs of the asymptotic results. The first result is a version of Hoeffding (1951)'s combinatorial CLT established by Motoo (1956) (see also Schneller (1988)), which is based on a Lindeberg-type condition.
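Lemma A.1 (Combinatorial CLT). For $n^2$, $n > 1$, real scalars $c_n(i,j)$, $i, j = 1, \dots, n$, and a random permutation $\pi$ uniformly distributed over $G_n$, consider the sum
\[
S_n \equiv \sum_{i=1}^{n} c_n(i, \pi(i))
\]
with $E_{P^\pi}[S_n] = n^{-1}\sum_{i=1}^{n}\sum_{j=1}^{n} c_n(i,j)$ and $\mathrm{Var}_{P^\pi}[S_n] = \frac{1}{n-1}\sum_{i=1}^{n}\sum_{j=1}^{n} g_n^2(i,j)$, where
\[
g_n(i,j) = c_n(i,j) - n^{-1}\sum_{i=1}^{n} c_n(i,j) - n^{-1}\sum_{j=1}^{n} c_n(i,j) + n^{-2}\sum_{i=1}^{n}\sum_{j=1}^{n} c_n(i,j).
\]
If $g_n(i,j) \neq 0$ for some $(i,j)$, so that $\mathrm{Var}_{P^\pi}[S_n] > 0$, and for any $\varepsilon > 0$
\[
\lim_{n\to\infty} \frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{n} \frac{g_n^2(i,j)}{\frac{1}{n-1}\sum_{i=1}^{n}\sum_{j=1}^{n} g_n^2(i,j)}\, 1\!\left(\frac{|g_n(i,j)|}{\big(\frac{1}{n-1}\sum_{i=1}^{n}\sum_{j=1}^{n} g_n^2(i,j)\big)^{1/2}} > \varepsilon\right) = 0, \tag{A.2}
\]
then as $n \to \infty$
\[
\frac{S_n - E_{P^\pi}[S_n]}{\sqrt{\mathrm{Var}_{P^\pi}[S_n]}} \xrightarrow{d^{\pi}} N[0,1] \quad P\text{-almost surely}.
\]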
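The mean and variance formulas stated in Lemma A.1 can be checked by brute force for small $n$. The following is a minimal sketch, assuming an arbitrary random array `c` (a stand-in for $c_n(i,j)$); the function names are hypothetical. It compares the moments of $S_n$ obtained by enumerating all $n!$ permutations with the closed-form expressions.

```python
import itertools
import numpy as np

def exact_moments(c):
    """Mean and variance of S_n = sum_i c[i, pi(i)] by enumerating all permutations."""
    n = c.shape[0]
    sums = np.array([sum(c[i, p[i]] for i in range(n))
                     for p in itertools.permutations(range(n))])
    return sums.mean(), sums.var()   # population variance over the n! permutations

def formula_moments(c):
    """Mean and variance of S_n from the expressions in Lemma A.1."""
    n = c.shape[0]
    # g(i, j): c(i, j) minus column mean, minus row mean, plus grand mean
    g = (c - c.mean(axis=0, keepdims=True)
           - c.mean(axis=1, keepdims=True) + c.mean())
    return c.sum() / n, (g ** 2).sum() / (n - 1)

rng = np.random.default_rng(0)
c = rng.normal(size=(5, 5))          # illustrative stand-in for c_n(i, j)
mean_enum, var_enum = exact_moments(c)
mean_form, var_form = formula_moments(c)
```

Enumeration is feasible only for very small $n$; the point is that the two pairs of moments agree exactly, not asymptotically.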
The next result is a triangular array SLLN.
Lemma A.2 (Corollary to Theorem 1 of Hu et al. (1989)). Let $\{Z_{nk} : k = 1, \dots, n;\ n = 1, 2, \dots\}$ be an array of i.i.d. random variables such that $E[|Z_{nk}|^{2p}] < \infty$ for some $1 \le p < 2$. Then,
\[
n^{-1/p}\sum_{k=1}^{n} (Z_{nk} - E[Z_{nk}]) \xrightarrow{a.s.} 0. \tag{A.3}
\]
Proof of Lemma A.2. By the $c_r$ inequality, $E[(Z_{nk} - E[Z_{nk}])^{2p}] \le 2^{2p-1}(E[Z_{nk}^{2p}] + (E[Z_{nk}])^{2p}) < \infty$. Applying Theorem 1 of Hu et al. (1989) with $X_{nk} = Z_{nk} - E[Z_{nk}]$ yields that $n^{-1/p}\sum_{k=1}^{n} X_{nk}$ converges completely to 0, i.e. for any $\varepsilon > 0$
\[
\sum_{n=1}^{\infty} P\Big[\Big|n^{-1/p}\sum_{k=1}^{n} X_{nk}\Big| > \varepsilon\Big] < \infty. \tag{A.4}
\]
From the Borel-Cantelli lemma, it follows that $n^{-1/p}\sum_{k=1}^{n} X_{nk} \xrightarrow{a.s.} 0$.
We will also use the following fact.
Lemma A.3. If $\{X_i\}_{i=1}^{n}$ and $X$ are defined on the probability space $(\Omega_0, \mathcal{F}, P)$ and $X_n \xrightarrow{p} X$, then $X_n \xrightarrow{p^{\pi}} X$ in $P$-probability.
Proof of Lemma A.3. This proof is analogous to the bootstrap version of the result, see Hansen (2021),
Theorem 10.1.
A.3 Drifting sequences of parameters
Following Andrews and Guggenberger (2019b), we first introduce some notations, and define the DGP’s.
Henceforth, we shall suppress the argument θ0 in some quantities, and write, for example, m(θ0) = m,
Σ(θ0) = Σ, J(θ0) = J , mπ(θ0) = mπ, Σπ(θ0) = Σπ, Jπ(θ0) = Jπ.
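Define $G_P \equiv E_P[Z_i^{*} Z_i^{*\prime}]\Gamma$, $H \equiv \Sigma^{-1/2}$, $H_P \equiv \Sigma_P^{-1/2}$,
\[
U \equiv \big((\theta_0, I_d)\,\Omega^{\varepsilon}(\Sigma, \mathcal{V})^{-1}(\theta_0, I_d)'\big)^{1/2} \quad \text{and} \quad U_P \equiv \big((\theta_0, I_d)\,\Omega^{\varepsilon}(\Sigma_P, \mathcal{V}_P)^{-1}(\theta_0, I_d)'\big)^{1/2},
\]
where $\mathcal{V} = \mathcal{V}_a(\theta_0)$, $\mathcal{V}_P = \mathcal{V}_{P,a}$ (defined in (2.17) and (2.35)) and $\mathcal{V} = \mathcal{V}_b(\theta_0)$, $\mathcal{V}_P = \mathcal{V}_{P,b}$ (defined in (2.19) and (2.36)) for the CLRa, PCLRa and CLRb, PCLRb statistics, respectively.
The SVD of $H_P G_P U_P$ is given as follows. Let $\kappa_{1P} \ge \cdots \ge \kappa_{dP}$ and $B_P$ denote the ordered eigenvalues, and a $d \times d$ orthogonal matrix of corresponding eigenvectors, respectively, of
\[
U_P' G_P' H_P H_P' G_P U_P, \tag{A.5}
\]
and $\kappa_{1P} \ge \cdots \ge \kappa_{kP}$ and $A_P$ denote the ordered eigenvalues, and a $k \times k$ orthogonal matrix of corresponding eigenvectors, respectively, of
\[
H_P G_P U_P U_P' G_P' H_P'. \tag{A.6}
\]
Let $\tau_{1P} \ge \cdots \ge \tau_{dP} \ge 0$ denote the singular values of
\[
H_P G_P U_P. \tag{A.7}
\]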
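Define
\begin{align}
\lambda_{1,P} &\equiv (\tau_{1P}, \dots, \tau_{dP})' \in \mathbb{R}^{d}, \tag{A.8} \\
\lambda_{2,P} &\equiv B_P \in \mathbb{R}^{d \times d}, \tag{A.9} \\
\lambda_{3,P} &\equiv A_P \in \mathbb{R}^{k \times k}, \tag{A.10} \\
\lambda_{4,P} &\equiv G_P = E_P[Z_i^{*} Z_i^{*\prime}]\Gamma \in \mathbb{R}^{k \times d}, \tag{A.11} \\
\lambda_{5,P} &\equiv \mathcal{V}_{P,a} = E_P\left[\begin{pmatrix} Z_i^{*} u_i \\ \mathrm{vec}(Z_i^{*}(Z_i^{*\prime}\Gamma + V_i')) \end{pmatrix}\begin{pmatrix} Z_i^{*} u_i \\ \mathrm{vec}(Z_i^{*}(Z_i^{*\prime}\Gamma + V_i')) \end{pmatrix}'\right] \in \mathbb{R}^{(d+1)k \times (d+1)k}, \tag{A.12} \\
\lambda_{6,P} &\equiv [\lambda_{6,1P}, \dots, \lambda_{6,(d-1)P}]' = \left[\frac{\tau_{2P}}{\tau_{1P}}, \dots, \frac{\tau_{dP}}{\tau_{(d-1)P}}\right]' \in [0,1]^{d-1}, \tag{A.13} \\
\lambda_{7,P} &\equiv E_P[(W_i', X_i')'(W_i', X_i')] \in \mathbb{R}^{(k+p) \times (k+p)}, \tag{A.14} \\
\lambda_{8,P} &\equiv \Sigma_e = \mathrm{Var}_P[(u_i, V_i')'] \in \mathbb{R}^{(d+1) \times (d+1)}, \tag{A.15} \\
\lambda_{9,P} &= (\Sigma_P, K(\mathcal{V}_P)) \in (\mathbb{R}^{k \times k}, \mathbb{R}^{(d+1)k \times (d+1)k}), \quad \mathcal{V}_P \in \{\mathcal{V}_{P,a}, \mathcal{V}_{P,b}\}, \tag{A.16} \\
\lambda_{10,P} &\equiv P, \tag{A.17} \\
\lambda_{P} &\equiv (\lambda_{1,P}, \dots, \lambda_{10,P})'. \tag{A.18}
\end{align}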
As in Andrews and Guggenberger (2019b), the drifting parameters λ1,P , . . . , λ6,P , λ9,P , λ10,P are used in the
asymptotic results for the AR, LM, CLR statistics. The parameter vector λ1,P characterizes the identifi-
cation strength. The sequences λ7,P and λ8,P are needed additionally for the asymptotic properties of the
permutation statistics.
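Define
\[
\Lambda \equiv \{\lambda_P : P \in \mathcal{P}_0^{\mathrm{PCLR}}\},
\]
\[
h_n(\lambda_P) \equiv [n^{1/2}\lambda_{1,P}, \lambda_{2,P}, \lambda_{3,P}, \lambda_{4,P}, \lambda_{5,P}, \lambda_{6,P}, \lambda_{7,P}, \lambda_{8,P}, \lambda_{9,P}],
\]
and
\[
\mathcal{H} \equiv \big\{h \in (\mathbb{R} \cup \{\pm\infty\})^{\dim(h_n(\lambda_{P_n}))} : h_{w_n}(\lambda_{P_{w_n}}) \to h \text{ for some subsequence } \{w_n\} \text{ of } \{n\} \text{ and some sequence } \{\lambda_{P_{w_n}} \in \Lambda : n \ge 1\}\big\}.
\]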
The following assumption corresponds to Assumption B∗ of Andrews and Guggenberger (2019b) which is key
for establishing the asymptotic similarity of the tests.
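Assumption 3. For any subsequence $\{w_n\}$ of $\{n\}$ and any sequence $\{\lambda_{P_{w_n}} \in \Lambda : n \ge 1\}$ for which $h_{w_n}(\lambda_{P_{w_n}}) \to h \in \mathcal{H}$, $E_{P_{w_n}}[\phi^{\mathrm{PR}}_{w_n}] \to \alpha$ for some $\alpha \in (0,1)$.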
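Let $\{\lambda_{P_n} \in \Lambda : n \ge 1\}$ be a sequence such that $h_n(\lambda_{P_n}) \to h \in \mathcal{H}$, and
\begin{align}
n^{1/2}\tau_{jP_n} &\to h_{1,j} \ge 0, \quad j = 1, \dots, d, \tag{A.19} \\
\lambda_{j,P_n} &\to h_j, \quad j = 2, \dots, 8, \tag{A.20} \\
\lambda_{5,mP_n} = E_{P_n}[Z_i^{*} Z_i^{*\prime} u_i^2] &\to h_{5,m} = \Sigma, \tag{A.21} \\
\lambda_{6,jP_n} &\to h_{6,j}, \quad j = 1, \dots, d-1, \tag{A.22} \\
\lambda_{7,P_n} &\to h_7 \equiv \begin{pmatrix} Q_{ww} & Q_{wx} \\ Q_{xw} & Q_{xx} \end{pmatrix}, \tag{A.23} \\
\lambda_{8,P_n} &\to h_8 \equiv \Sigma_e = \begin{pmatrix} \sigma^2 & \Sigma_{uV} \\ \Sigma_{Vu} & \Sigma_V \end{pmatrix}, \tag{A.24} \\
\lambda_{9,P_n} &\to h_9 \equiv (\Sigma, K(\mathcal{V})), \tag{A.25}
\end{align}
where $\lambda_{5,mP_n} \in \mathbb{R}^{k \times k}$ is the upper-left sub-matrix of $\lambda_{5,P_n}$ with the corresponding limit $h_{5,m} = \Sigma$, $h_7$ and $h_8$ are partitioned conformably with $\lambda_{7,P_n}$ and $\lambda_{8,P_n}$, respectively, and for both $(\Sigma_P, K(\mathcal{V}_{P,a}))$ and $(\Sigma_P, K(\mathcal{V}_{P,b}))$, the limit is denoted by the same variable $h_9$, with $\mathcal{V}$ in (A.25) denoting the limit of $\mathcal{V}_P \in \{\mathcal{V}_{P,a}, \mathcal{V}_{P,b}\}$ defined in (A.16).
Consider $q = q_h \in \{0, \dots, d\}$ such that
\[
h_{1,j} = \infty \text{ for } 1 \le j \le q, \qquad h_{1,j} < \infty \text{ for } q+1 \le j \le d.
\]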
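The normalization matrices used to derive the limiting distribution of the test statistics are defined as follows:
\begin{align}
S_{P_n} &\equiv \mathrm{diag}\big((n^{1/2}\tau_{1P_n})^{-1}, \dots, (n^{1/2}\tau_{qP_n})^{-1}, 1, \dots, 1\big) \in \mathbb{R}^{d \times d}, \tag{A.26} \\
T_{P_n} &\equiv B_{P_n} S_{P_n} \in \mathbb{R}^{d \times d}. \tag{A.27}
\end{align}
In the sequel, we index by $n$ the quantities indexed by $P = P_n$; for example, we write $\Sigma_P = E_P[Z_i^{*} Z_i^{*\prime} u_i^2] = \Sigma_n$, $G_P = E_P[Z_i^{*} Z_i^{*\prime}]\Gamma = G_n$, $\lambda_{w_n} = \lambda_{P_{w_n}}$ and $\lambda_n = \lambda_{P_n}$.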
Let κ1 ≥ · · · ≥ κd ≥ 0 denote the eigenvalues of
nU ′J ′HH ′J U .
Partition
An = [An,q, An,k−q], An,q ∈ Rk×q, An,k−q ∈ Rk×(k−q), (A.28)
Bn = [Bn,q, Bn,d−q] , Bn,q ∈ Rd×q, Bn,d−q ∈ Rd×(d−q), (A.29)
h2 = [h2,q, h2,d−q], h2,q ∈ Rd×q, h2,d−q ∈ Rd×(d−q), (A.30)
h3 = [h3,q, h3,k−q], h3,q ∈ Rk×q, h3,k−q ∈ Rk×(k−q). (A.31)
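Define
\[
\Upsilon_n \equiv \begin{pmatrix} \Upsilon_{n,q} & 0_{q \times (d-q)} \\ 0_{(d-q) \times q} & \Upsilon_{n,d-q} \\ 0_{(k-d) \times q} & 0_{(k-d) \times (d-q)} \end{pmatrix} \in \mathbb{R}^{k \times d}, \quad \Upsilon_{n,q} \equiv \mathrm{diag}(\tau_{1n}, \dots, \tau_{qn}) \in \mathbb{R}^{q \times q}, \quad \Upsilon_{n,d-q} \equiv \mathrm{diag}(\tau_{(q+1)n}, \dots, \tau_{dn}) \in \mathbb{R}^{(d-q) \times (d-q)}. \tag{A.32}
\]
For a non-random matrix $J \in \mathbb{R}^{(k-q) \times (d-q)}$, let $r_{k,d,q}(J, 1-\alpha)$ denote the $1-\alpha$ quantile of
\[
\mathcal{Z}^{*\prime}\mathcal{Z}^{*} - \lambda_{\min}[(J, \mathcal{Z}_2^{*})'(J, \mathcal{Z}_2^{*})], \quad \mathcal{Z}^{*} = [\mathcal{Z}_1^{*\prime}, \mathcal{Z}_2^{*\prime}]' \sim N[0_{k \times 1}, I_k], \quad \mathcal{Z}_1^{*} \in \mathbb{R}^{q}, \ \mathcal{Z}_2^{*} \in \mathbb{R}^{k-q}. \tag{A.33}
\]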
The following results due to Andrews and Guggenberger (2019b) will be used in the proof of the asymptotic
similarity of the PCLR tests.
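Lemma A.4 (Lemma 27.3 of Andrews and Guggenberger (2019b)). Let $\mathcal{Z} = [\mathcal{Z}_1', \mathcal{Z}_2']' \sim N[0_{k \times 1}, I_k]$, $\mathcal{Z}_1 \in \mathbb{R}^{q}$, $\mathcal{Z}_2 \in \mathbb{R}^{k-q}$, and $\Upsilon(\tau^c) \equiv [\mathrm{diag}(\tau^c), 0_{(d-q) \times (k-d)}]' \in \mathbb{R}^{(k-q) \times (d-q)}$, where $\tau^c \equiv (\tau^c_{q+1}, \dots, \tau^c_d)' \in \mathbb{R}^{d-q}$ with $\tau^c_{q+1} \ge \cdots \ge \tau^c_d \ge 0$, and $k \ge d \ge 1$ and $0 \le q \le d$. Then, the distribution function of
\[
\mathrm{CLR}_\infty(\tau^c) \equiv \mathcal{Z}'\mathcal{Z} - \lambda_{\min}[(\Upsilon(\tau^c), \mathcal{Z}_2)'(\Upsilon(\tau^c), \mathcal{Z}_2)] \tag{A.34}
\]
is continuous and strictly increasing at its $1-\alpha$ quantile $r_{k,d,q}(\Upsilon(\tau^c), 1-\alpha)$ for all $\alpha \in (0,1)$.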
Lemma A.5 (Lemma 16.2 of Andrews and Guggenberger (2019b)). Let $J^c$ be a $(k-q) \times (d-q)$ matrix with the singular value decomposition $A^c \Upsilon^c B^{c\prime}$, where $A^c$ is a $(k-q) \times (k-q)$ orthogonal matrix of eigenvectors of $J^c J^{c\prime}$, $B^c$ is a $(d-q) \times (d-q)$ orthogonal matrix of eigenvectors of $J^{c\prime} J^c$, and $\Upsilon^c$ is the $(k-q) \times (d-q)$ matrix with the $d-q$ singular values $\tau^c_{q+1} \ge \cdots \ge \tau^c_d$ of $J^c$ as its first $d-q$ diagonal elements and zeros elsewhere. Then, for $r_{k,d,q}(\cdot, 1-\alpha)$ defined in (A.33)
\[
r_{k,d,q}(J^c, 1-\alpha) = r_{k,d,q}(\Upsilon^c, 1-\alpha),
\]
and the distribution of the random variable in (A.34) is identical to that of (A.33) with $J^c$ in place of $J$.
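The invariance in Lemma A.5 rests on a simple algebraic fact, which the sketch below checks numerically. The matrix sizes, the random draws and the helper `lam_min` are illustrative assumptions. Writing $J^c = A^c \Upsilon^c B^{c\prime}$, the matrix $(J^c, \mathcal{Z}_2)$ equals $A^c(\Upsilon^c, A^{c\prime}\mathcal{Z}_2)$ post-multiplied by an orthogonal block-diagonal matrix, so the smallest eigenvalue is unchanged when $J^c$ is replaced by $\Upsilon^c$ and $\mathcal{Z}_2$ by $A^{c\prime}\mathcal{Z}_2$, which is again standard normal.

```python
import numpy as np

rng = np.random.default_rng(1)
k_q, d_q = 4, 2                      # k - q and d - q (illustrative sizes)
J = rng.normal(size=(k_q, d_q))      # stand-in for the matrix J^c
z2 = rng.normal(size=k_q)            # one draw of Z_2

# Full SVD J = A @ Ups @ Bt with A and Bt orthogonal
A, tau, Bt = np.linalg.svd(J, full_matrices=True)
Ups = np.zeros((k_q, d_q))
Ups[:d_q, :d_q] = np.diag(tau)       # singular values on the leading diagonal

def lam_min(M, z):
    """Smallest eigenvalue of (M, z)'(M, z)."""
    Mz = np.column_stack([M, z])
    return np.linalg.eigvalsh(Mz.T @ Mz)[0]

lam_J = lam_min(J, z2)               # based on J^c and Z_2
lam_U = lam_min(Ups, A.T @ z2)       # based on Upsilon^c and the rotated draw A'Z_2
```

The equality holds draw by draw after the rotation, which is why the two quantiles in the lemma coincide.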
A.4 Probability limits of some key sample quantities
The proof of the asymptotic similarity of the permutation tests involves the derivation of the asymptotic distributions of the AR, LM, CLRa and CLRb statistics under the drifting sequences of parameters given above. These can be derived following Andrews and Guggenberger (2017a,b, 2019a,b) based on the moment function $m_i^{*}(\theta) = Z_i^{*} u_i^{*}(\theta)$, where $u_i^{*}(\theta) \equiv Z_i^{*\prime}\Gamma(\theta - \theta_0) + u_i + V_i'(\theta - \theta_0)$, and the corresponding sample moment function $m^{*}(\theta) \equiv n^{-1}\sum_{i=1}^{n} Z_i^{*} u_i^{*}(\theta)$. Note here that the test statistics are based on $m(\theta) \equiv n^{-1}Z'(y - Y\theta) = n^{-1}Z'Z\Gamma(\theta - \theta_0) + n^{-1}Z'u + n^{-1}Z'V(\theta - \theta_0)$ rather than $m^{*}(\theta)$. Therefore, since
\[
m^{*}(\theta) = n^{-1}Z^{*\prime}Z^{*}\Gamma(\theta - \theta_0) + n^{-1}Z^{*\prime}u + n^{-1}Z^{*\prime}V(\theta - \theta_0),
\]
where $Z^{*} \equiv W - X(E_P[X_i X_i'])^{-1}E_P[X_i W_i']$, one needs to account for the randomness that results when $Z^{*}$ is replaced by $Z$ in order to derive the asymptotic distributions. Since this does not entail any substantial change in the proofs of Andrews and Guggenberger (2017a,b, 2019a,b), we do not provide the derivations here to avoid overlap.
However, following Andrews and Guggenberger (2019b), we shall determine the probability limits of
certain sample random quantities that appear in the permutation statistics but do not depend on the random
permutation π, under the drifting parameter sequences.
We also remark that the derivation of the asymptotic distributions of the permutation test statistics is similar to that for the asymptotic test statistics. Hence one can derive the asymptotic distributions of the AR, LM, CLRa and CLRb statistics by proceeding as in the derivation for the robust permutation statistics provided below and replacing the permutation quantities by their sample counterparts.
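The limits of $\mathcal{V}_a(\theta_0)$, $\mathcal{V}_b(\theta_0)$ and $J$. By the triangular array WLLN and the CMT, for $s = 1, \dots, d$,
\begin{align}
n^{-1}\sum_{i=1}^{n} Z_i Y_{is} = n^{-1}Z'Y_{,s} &= n^{-1}Z'Z\Gamma_s + n^{-1}Z'V_{,s} = E_P[Z_i^{*} Z_i^{*\prime}]\Gamma_s + o_p(1), \notag \\
D \equiv E_P[W_i X_i'](E_P[X_i X_i'])^{-1} - W'X(X'X)^{-1} &\xrightarrow{p} 0, \tag{A.35} \\
(X'X)^{-1}X'V_{,s} &\xrightarrow{p} 0, \tag{A.36} \\
n^{-1}X'u &\xrightarrow{p} 0. \tag{A.37}
\end{align}
We may rewrite $Z_i = W_i - W'X(X'X)^{-1}X_i = W_i - E_P[W_i X_i'](E_P[X_i X_i'])^{-1}X_i + DX_i = Z_i^{*} + DX_i$, and $Y_{is} = Z_i'\Gamma_s + V_{is} - X_i'(X'X)^{-1}X'V_{,s} = Z_i^{*\prime}\Gamma_s + V_{is} + X_i'(D'\Gamma_s - (X'X)^{-1}X'V_{,s})$. Note that $\|Z_i^{*}\|^4$, $\|Z_i^{*}\|^3\|X_i\|$, $\|Z_i^{*}\|^3\|V_i\|$, $\|Z_i^{*}\|^2\|X_i\|\|V_i\|$, $\|Z_i^{*}\|^2\|X_i\|^2$, $\|Z_i^{*}\|^2\|V_i\|^2$, $\|Z_i^{*}\|\|X_i\|^3$, $\|Z_i^{*}\|\|X_i\|^2\|V_i\|$, $\|Z_i^{*}\|\|X_i\|\|V_i\|^2$, $\|X_i\|^3\|V_i\|$, $\|X_i\|^2\|V_i\|^2$ have finite $1 + \delta/4$ moments by the Cauchy-Schwarz inequality. By (A.35), (A.36), (A.37), the triangular array WLLN and the CMT, for $s, t = 1, \dots, d$
\begin{align}
n^{-1}\sum_{i=1}^{n} Z_i Z_i' Y_{is} Y_{it} &= n^{-1}\sum_{i=1}^{n} (Z_i^{*} + DX_i)(Z_i^{*} + DX_i)'\big(Z_i^{*\prime}\Gamma_s + V_{is} + X_i'(D'\Gamma_s - (X'X)^{-1}X'V_{,s})\big) \notag \\
&\qquad \times \big(Z_i^{*\prime}\Gamma_t + V_{it} + X_i'(D'\Gamma_t - (X'X)^{-1}X'V_{,t})\big) \notag \\
&= n^{-1}\sum_{i=1}^{n} Z_i^{*} Z_i^{*\prime}(Z_i^{*\prime}\Gamma_s + V_{is})(Z_i^{*\prime}\Gamma_t + V_{it}) + o_p(1) \notag \\
&= E_P[Z_i^{*} Z_i^{*\prime}(Z_i^{*\prime}\Gamma_s + V_{is})(Z_i^{*\prime}\Gamma_t + V_{it})] + o_p(1). \tag{A.38}
\end{align}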
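Similarly,
\begin{align}
C_s &= n^{-1}\sum_{i=1}^{n} Z_i Z_i' Y_{is} u_i(\theta_0) \notag \\
&= n^{-1}\sum_{i=1}^{n} (Z_i^{*} + DX_i)(Z_i^{*} + DX_i)'\big(Z_i^{*\prime}\Gamma_s + V_{is} + X_i'(D'\Gamma_s - (X'X)^{-1}X'V_{,s})\big)(u_i - X_i'(X'X)^{-1}X'u) \notag \\
&= n^{-1}\sum_{i=1}^{n} Z_i^{*} Z_i^{*\prime}(Z_i^{*\prime}\Gamma_s + V_{is})u_i + o_p(1) = E_P[Z_i^{*} Z_i^{*\prime}(Z_i^{*\prime}\Gamma_s + V_{is})u_i] + o_p(1) = C_{ns} + o_p(1), \tag{A.39}
\end{align}
and
\begin{align}
\Sigma &= n^{-1}\sum_{i=1}^{n} Z_i Z_i' u_i(\theta_0)^2 = n^{-1}\sum_{i=1}^{n} (Z_i^{*} + DX_i)(Z_i^{*} + DX_i)'(u_i - X_i'(X'X)^{-1}X'u)^2 \notag \\
&= E_P[Z_i^{*} Z_i^{*\prime} u_i^2] + o_p(1) = \Sigma_n + o_p(1) = h_{5,m} + o_p(1). \tag{A.40}
\end{align}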
By (A.38), (A.39), (A.40) and the CMT (see also the equations (27.73) and (27.74) of Andrews and Guggen-
berger (2019b))
Va(θ0) = Vn,a + op(1) = h5 + op(1),
K(Va(θ0)) = K(Vn,a) + op(1) = K(h5) + op(1),
(Σ,K(Va(θ0))) = (h5,m,K(h5)) + op(1),
Ωε(Σ,K(Va(θ0))) = Ωε(h5,m,K(h5)) + op(1) = Ωε(h9) + op(1), (A.41)
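where the last equality holds by the continuity of the eigenvalue-adjusted matrix, see Andrews and Guggenberger (2019a), p.1717. To determine the limit of $\mathcal{V}_b(\theta_0)$ in (2.19), note that $\varepsilon_i(\theta_0) = [u_i(\theta_0), -Y_i' + X_i'(X'X)^{-1}X'Y]'$ and
\[
\varepsilon_i(\theta_0) - \begin{pmatrix} 0_{1 \times k} \\ -\Gamma' \end{pmatrix} Z_i = \begin{pmatrix} u_i - u'X(X'X)^{-1}X_i \\ -V_i + V'X(X'X)^{-1}X_i \end{pmatrix} = v_i - U'X(X'X)^{-1}X_i, \tag{A.42}
\]
where $v_i \equiv [u_i, -V_i']'$ and $U \equiv [u, -V]$. Letting $Q \equiv (Z'Z)^{-1}Z'U$, we have
\[
\varepsilon_{in}(\theta_0) - [0_{k \times 1}, -\Gamma]'Z_i = \varepsilon(\theta_0)'Z(Z'Z)^{-1}Z_i - [0_{k \times 1}, -\Gamma]'Z_i = Q'Z_i. \tag{A.43}
\]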
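Note here that $Q = O_p(n^{-1/2})$ because $n^{-1}Z'Z = n^{-1}W'W - n^{-1}W'X(X'X)^{-1}X'W \xrightarrow{a.s.} Q_{zz}$, which follows from $E_P[Z_i^{*} Z_i^{*\prime}] \to Q_{ww} - Q_{wx}Q_{xx}^{-1}Q_{xw} \equiv Q_{zz}$, the WLLN and the CMT, and
\[
n^{-1/2}Z'U = n^{-1/2}W'U - n^{-1}W'X(n^{-1}X'X)^{-1}n^{-1/2}X'U = O_p(1),
\]
which holds due to the triangular array CLT, the WLLN and the CMT. Rewrite
\begin{align}
\mathcal{V}_b(\theta_0) &= n^{-1}\sum_{i=1}^{n} (\varepsilon_i(\theta_0) - \varepsilon_{in}(\theta_0))(\varepsilon_i(\theta_0) - \varepsilon_{in}(\theta_0))' \otimes (Z_i Z_i') \notag \\
&= n^{-1}\sum_{i=1}^{n} (\varepsilon_i(\theta_0) - [0_{k \times 1}, -\Gamma]'Z_i)(\varepsilon_i(\theta_0) - [0_{k \times 1}, -\Gamma]'Z_i)' \otimes (Z_i Z_i') \tag{A.44} \\
&\quad + n^{-1}\sum_{i=1}^{n} (\varepsilon_i(\theta_0) - [0_{k \times 1}, -\Gamma]'Z_i)([0_{k \times 1}, -\Gamma]'Z_i - \varepsilon_{in}(\theta_0))' \otimes (Z_i Z_i') \tag{A.45} \\
&\quad + n^{-1}\sum_{i=1}^{n} ([0_{k \times 1}, -\Gamma]'Z_i - \varepsilon_{in}(\theta_0))(\varepsilon_i(\theta_0) - [0_{k \times 1}, -\Gamma]'Z_i)' \otimes (Z_i Z_i') \tag{A.46} \\
&\quad + n^{-1}\sum_{i=1}^{n} ([0_{k \times 1}, -\Gamma]'Z_i - \varepsilon_{in}(\theta_0))([0_{k \times 1}, -\Gamma]'Z_i - \varepsilon_{in}(\theta_0))' \otimes (Z_i Z_i'). \tag{A.47}
\end{align}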
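Using (A.42), the term (A.44) can be rewritten as
\begin{align*}
&n^{-1}\sum_{i=1}^{n} (\varepsilon_i(\theta_0) - [0_{k \times 1}, -\Gamma]'Z_i)(\varepsilon_i(\theta_0) - [0_{k \times 1}, -\Gamma]'Z_i)' \otimes (Z_i Z_i') \\
&\quad = n^{-1}\sum_{i=1}^{n} (v_i - U'X(X'X)^{-1}X_i)(v_i - U'X(X'X)^{-1}X_i)' \otimes (Z_i Z_i') \\
&\quad = n^{-1}\sum_{i=1}^{n} \big(v_i v_i' - U'X(X'X)^{-1}X_i v_i' - v_i X_i'(X'X)^{-1}X'U + U'X(X'X)^{-1}X_i X_i'(X'X)^{-1}X'U\big) \otimes (Z_i Z_i').
\end{align*}
Let $v_{ij}$ and $v_{il}$ denote the $j$-th and $l$-th elements of $v_i$, $j, l = 1, \dots, d+1$. By the Cauchy-Schwarz inequality, $E_P[\|v_{ij}v_{il}Z_i^{*}Z_i^{*\prime}\|^{1+\delta/2}] \le (E_P[\|v_i\|^{4+\delta}])^{1/2}(E_P[\|Z_i^{*}\|^{4+\delta}])^{1/2} < \infty$. Similarly, $E_P[\|v_{ij}v_{il}X_iX_i'\|^{1+\delta/2}] < \infty$ and $E_P[\|v_{ij}v_{il}Z_i^{*}X_i'\|^{1+\delta/4}] < \infty$. Then, the WLLN and (A.35) yield
\begin{align}
&n^{-1}\sum_{i=1}^{n} v_i v_i' \otimes (Z_i Z_i') - E_P[v_i v_i' \otimes (Z_i^{*} Z_i^{*\prime})] \notag \\
&\quad = n^{-1}\sum_{i=1}^{n} v_i v_i' \otimes ((Z_i^{*} + DX_i)(Z_i^{*} + DX_i)') - E_P[v_i v_i' \otimes (Z_i^{*} Z_i^{*\prime})] \xrightarrow{p} 0. \tag{A.48}
\end{align}
Using the Cauchy-Schwarz inequality again,
\[
E_P[\|X_{ij}v_{il}Z_i^{*}Z_i^{*\prime}\|^{1+\delta/4}] \le (E_P[\|Z_i^{*}\|^{4+\delta}])^{1/2}(E_P[X_{ij}^{4+\delta}])^{1/4}(E_P[\|v_i\|^{4+\delta}])^{1/4} < \infty,
\]
for $j = 1, \dots, p$ and $l = 1, \dots, d+1$. Similarly, $E_P[\|X_{ij}v_{il}X_iZ_i^{*\prime}\|^{1+\delta/4}] < \infty$ and $E_P[\|X_{ij}v_{il}X_iX_i'\|^{1+\delta/4}] < \infty$. Therefore, by the WLLN and (A.35)
\begin{align*}
&n^{-1}\sum_{i=1}^{n} X_{ij}v_{il}Z_iZ_i' - E_P[X_{ij}v_{il}Z_i^{*}Z_i^{*\prime}] \\
&\quad = n^{-1}\sum_{i=1}^{n} X_{ij}v_{il}(Z_i^{*} + DX_i)(Z_i^{*} + DX_i)' - E_P[X_{ij}v_{il}Z_i^{*}Z_i^{*\prime}] \xrightarrow{p} 0,
\end{align*}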
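and
\[
n^{-1}\sum_{i=1}^{n} v_i X_i'(X'X)^{-1}X'U \xrightarrow{p} 0. \tag{A.49}
\]
By the Cauchy-Schwarz inequality, $E_P[\|X_{ij}X_{il}Z_i^{*}Z_i^{*\prime}\|^{1+\delta/2}] \le (E_P[\|X_i\|^{4+\delta}])^{1/2}(E_P[\|Z_i^{*}\|^{4+\delta}])^{1/2} < \infty$ and similarly
\[
E_P[\|X_{ij}X_{il}X_iZ_i^{*\prime}\|^{1+\delta/4}] < \infty \quad \text{and} \quad E_P[\|X_{ij}X_{il}X_iX_i'\|^{1+\delta/2}] < \infty.
\]
Then, by the WLLN and (A.35)
\begin{align}
&n^{-1}\sum_{i=1}^{n} U'X(X'X)^{-1}X_iX_i'(X'X)^{-1}X'U \otimes (Z_iZ_i') \notag \\
&\quad = n^{-1}\sum_{i=1}^{n} U'X(X'X)^{-1}X_iX_i'(X'X)^{-1}X'U \otimes ((Z_i^{*} + DX_i)(Z_i^{*} + DX_i)') \xrightarrow{p} 0. \tag{A.50}
\end{align}
Combining (A.48), (A.49) and (A.50), we obtain
\[
n^{-1}\sum_{i=1}^{n} (\varepsilon_i - [0_{k \times 1}, -\Gamma]'Z_i)(\varepsilon_i - [0_{k \times 1}, -\Gamma]'Z_i)' \otimes (Z_iZ_i') - E_P[(v_iv_i') \otimes (Z_i^{*}Z_i^{*\prime})] \xrightarrow{p} 0. \tag{A.51}
\]
We consider the term in (A.45) next. By the Cauchy-Schwarz inequality,
\begin{align*}
E_P[\|v_{ij}Z_{il}^{*}Z_i^{*}Z_i^{*\prime}\|^{1+\delta/4}] &\le (E_P[v_{ij}^{4+\delta}])^{1/4}(E_P[Z_{il}^{*4+\delta}])^{1/4}(E_P[\|Z_i^{*}\|^{4+\delta}])^{1/2} < \infty, \\
E_P[\|X_{ij}Z_{il}^{*}Z_i^{*}Z_i^{*\prime}\|^{1+\delta/4}] &\le (E_P[X_{ij}^{4+\delta}])^{1/4}(E_P[Z_{il}^{*4+\delta}])^{1/4}(E_P[\|Z_i^{*}\|^{4+\delta}])^{1/2} < \infty,
\end{align*}
and the above moment bounds hold when $Z_i^{*}Z_i^{*\prime}$ is replaced by $X_iZ_i^{*\prime}$ and $X_iX_i'$. Using $Q = O_p(n^{-1/2})$, (A.35), the WLLN and the CMT, the terms in (A.45) and (A.46) are $o_p(1)$:
\begin{align}
&n^{-1}\sum_{i=1}^{n} (\varepsilon_i - [0_{k \times 1}, -\Gamma]'Z_i)([0_{k \times 1}, -\Gamma]'Z_i - \varepsilon_{in})' \otimes (Z_iZ_i') \notag \\
&\quad = -n^{-1}\sum_{i=1}^{n} \big(v_i(Z_i^{*} + DX_i)'Q - U'X(X'X)^{-1}X_i(Z_i^{*} + DX_i)'Q\big) \otimes ((Z_i^{*} + DX_i)(Z_i^{*} + DX_i)') \xrightarrow{p} 0. \tag{A.52}
\end{align}
Finally, consider the last term in (A.47). Since $E[\|Z_{ij}^{*}Z_{il}^{*}Z_i^{*}Z_i^{*\prime}\|^{1+\delta/4}] \le E[\|Z_i^{*}\|^{4+\delta}] < \infty$, $E[\|Z_{ij}^{*}Z_{il}^{*}X_iZ_i^{*\prime}\|^{1+\delta/4}] < \infty$, and $E[\|Z_{ij}^{*}Z_{il}^{*}X_iX_i'\|^{1+\delta/4}] < \infty$, by (A.35) and the WLLN
\begin{align}
&n^{-1}\sum_{i=1}^{n} ([0_{k \times 1}, -\Gamma]'Z_i - \varepsilon_{in})([0_{k \times 1}, -\Gamma]'Z_i - \varepsilon_{in})' \otimes (Z_iZ_i') \notag \\
&\quad = n^{-1}\sum_{i=1}^{n} Q'Z_iZ_i'Q \otimes (Z_iZ_i') = n^{-1}\sum_{i=1}^{n} Q'((Z_i^{*} + DX_i)(Z_i^{*} + DX_i)')Q \otimes ((Z_i^{*} + DX_i)(Z_i^{*} + DX_i)') \xrightarrow{p} 0. \tag{A.53}
\end{align}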
Therefore, (A.51), (A.52), (A.53) and the CMT together with the continuity of the eigenvalue-adjusted matrix
give
Vb(θ0) = Vn,b + op(1),
K(Vb(θ0)) = K(Vn,b) + op(1),
(Σ,K(Vb(θ0))) = (h5,m,K(V)) + op(1) ≡ h9 + op(1),
Ωε(Σ,K(Vb(θ0))) = Ωε(h9) + op(1). (A.54)
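Next we determine the limit of $n^{1/2}(J - G_n)$. Rewrite
\begin{align}
n^{1/2}(G - G_n) &= n^{1/2}(n^{-1}Z'Y - E_P[Z_i^{*}Z_i^{*\prime}]\Gamma) \notag \\
&= n^{-1/2}Z'Z\Gamma - n^{1/2}E_P[Z_i^{*}Z_i^{*\prime}]\Gamma + n^{-1/2}Z'V \notag \\
&= n^{-1/2}Z'V + (n^{-1/2}W'W - n^{1/2}E_P[W_iW_i'])\Gamma \notag \\
&\quad - (n^{-1/2}W'X - n^{1/2}E_P[W_iX_i'])(X'X)^{-1}X'W\Gamma \notag \\
&\quad - E_P[W_iX_i'](n^{-1}X'X)^{-1}(n^{-1/2}X'W - n^{1/2}E_P[X_iW_i'])\Gamma \notag \\
&\quad - E_P[W_iX_i']\big(n^{1/2}(n^{-1}X'X)^{-1} - n^{1/2}(E_P[X_iX_i'])^{-1}\big)E_P[X_iW_i']\Gamma. \tag{A.55}
\end{align}
On noting that $n^{-1/2}Z'V = n^{-1/2}W'V - n^{-1}W'X(n^{-1}X'X)^{-1}n^{-1/2}X'V = n^{-1/2}Z^{*\prime}V + o_p(1)$ and $n^{-1/2}Z'u = n^{-1/2}W'u - n^{-1}W'X(n^{-1}X'X)^{-1}n^{-1/2}X'u = n^{-1/2}Z^{*\prime}u + o_p(1)$, by the multivariate triangular array Lyapunov CLT
\[
\begin{pmatrix} n^{1/2}\mathrm{vec}(n^{-1}W'W - E_P[W_iW_i']) \\ n^{1/2}\mathrm{vec}(n^{-1}W'X - E_P[W_iX_i']) \\ n^{1/2}\mathrm{vec}(n^{-1}X'X - E_P[X_iX_i']) \\ n^{-1/2}\mathrm{vec}(Z'V) \\ n^{-1/2}Z'u \end{pmatrix}
= \begin{pmatrix} n^{1/2}\mathrm{vec}(n^{-1}W'W - E_P[W_iW_i']) \\ n^{1/2}\mathrm{vec}(n^{-1}W'X - E_P[W_iX_i']) \\ n^{1/2}\mathrm{vec}(n^{-1}X'X - E_P[X_iX_i']) \\ n^{-1/2}\mathrm{vec}(Z^{*\prime}V) \\ n^{-1/2}Z^{*\prime}u \end{pmatrix} + o_p(1)
\xrightarrow{d} \begin{pmatrix} \mathrm{vec}(\mathcal{Z}_{ww}) \\ \mathrm{vec}(\mathcal{Z}_{wx}) \\ \mathrm{vec}(\mathcal{Z}_{xx}) \\ \mathrm{vec}(\mathcal{Z}_{zv}) \\ \mathcal{Z}_{zu} \end{pmatrix}, \tag{A.56}
\]
where $Z^{*} = W - X(E_P[X_iX_i'])^{-1}E_P[X_iW_i']$ and
\begin{align*}
\mathcal{Z}_{ww} &\equiv [\mathcal{Z}_{ww,1}, \dots, \mathcal{Z}_{ww,k}], & \mathcal{Z}_{ww,j} &\sim N[0, \lim_{n\to\infty}\mathrm{Var}_P[W_iW_{ij}]], \quad j = 1, \dots, k, \\
\mathcal{Z}_{wx} &\equiv [\mathcal{Z}_{wx,1}, \dots, \mathcal{Z}_{wx,p}], & \mathcal{Z}_{wx,l} &\sim N[0, \lim_{n\to\infty}\mathrm{Var}_P[W_iX_{il}]], \quad l = 1, \dots, p, \\
\mathcal{Z}_{xx} &\equiv [\mathcal{Z}_{xx,1}, \dots, \mathcal{Z}_{xx,p}], & \mathcal{Z}_{xx,s} &\sim N[0, \lim_{n\to\infty}\mathrm{Var}_P[X_iX_{is}]], \quad s = 1, \dots, p, \\
\mathcal{Z}_{zv} &\equiv [\mathcal{Z}_{zv,1}, \dots, \mathcal{Z}_{zv,d}], & \mathcal{Z}_{zv,r} &\sim N[0, \lim_{n\to\infty}E_P[Z_i^{*}Z_i^{*\prime}V_{ir}^2]], \quad r = 1, \dots, d, \\
& & \mathcal{Z}_{zu} &\sim N[0, \lim_{n\to\infty}E_P[Z_i^{*}Z_i^{*\prime}u_i^2]].
\end{align*}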
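By the triangular array WLLN,
\[
n^{-1}W'W \xrightarrow{p} Q_{ww}, \quad n^{-1}W'X \xrightarrow{p} Q_{wx}, \quad n^{-1}X'X \xrightarrow{p} Q_{xx}. \tag{A.57}
\]
Next we derive the asymptotic distribution of $n^{1/2}((n^{-1}X'X)^{-1} - (E_P[X_iX_i'])^{-1})$. Let $f : \mathrm{vec}(A) \mapsto \mathrm{vec}(A^{-1})$, where $A \in \mathbb{R}^{p \times p}$ is positive definite. The Jacobian of the mapping evaluated at $\mathrm{vec}(A) = \mathrm{vec}(Q_{xx})$ is $\dot{f} = -(A^{-1})' \otimes A^{-1}|_{A = Q_{xx}} = -Q_{xx}^{-1} \otimes Q_{xx}^{-1}$. Since $n^{1/2}(\mathrm{vec}(n^{-1}X'X) - \mathrm{vec}(E_P[X_iX_i'])) \xrightarrow{d} \mathrm{vec}(\mathcal{Z}_{xx})$, by the delta method
\begin{align}
n^{1/2}((n^{-1}X'X)^{-1} - (E_P[X_iX_i'])^{-1}) &\xrightarrow{d} \mathrm{vec}^{-1}_{p,p}(\dot{f}\,\mathrm{vec}(\mathcal{Z}_{xx})) \notag \\
&= -\mathrm{vec}^{-1}_{p,p}((Q_{xx}^{-1} \otimes Q_{xx}^{-1})\mathrm{vec}(\mathcal{Z}_{xx})) \notag \\
&= -Q_{xx}^{-1}\mathcal{Z}_{xx}Q_{xx}^{-1}, \tag{A.58}
\end{align}
where $\mathrm{vec}^{-1}_{p,p}$ denotes the inverse vec operator with the property that $\mathrm{vec}^{-1}_{p,p}(\mathrm{vec}(A)) = A$. Since $n^{1/2}((n^{-1}X'X)^{-1} - (E_P[X_iX_i'])^{-1}) = -Q_{xx}^{-1}n^{1/2}(n^{-1}X'X - E_P[X_iX_i'])Q_{xx}^{-1} + o_p(1)$ by (A.56) and (A.58), the convergence in (A.58) holds jointly with (A.56). Using (A.56)-(A.58) in (A.55) and invoking Slutsky's lemma,
\[
n^{1/2}(G - G_n) \xrightarrow{d} G_h \equiv (\mathcal{Z}_{ww} - \mathcal{Z}_{wx}Q_{xx}^{-1}Q_{xw} - Q_{wx}Q_{xx}^{-1}\mathcal{Z}_{xw} + Q_{wx}Q_{xx}^{-1}\mathcal{Z}_{xx}Q_{xx}^{-1}Q_{xw})\Gamma + \mathcal{Z}_{zv}. \tag{A.59}
\]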
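The Jacobian used in the delta-method step (A.58) can be verified numerically. The sketch below is an illustration only: the matrix `q_xx` is a randomly generated positive definite stand-in for $Q_{xx}$, and the helper names are hypothetical. It compares a central finite-difference Jacobian of $\mathrm{vec}(A) \mapsto \mathrm{vec}(A^{-1})$ with $-(A^{-1})' \otimes A^{-1}$, using the column-stacking vec operator.

```python
import numpy as np

def vec(a):
    """Column-stacking vec operator."""
    return a.flatten(order="F")

def numerical_jacobian(a, eps=1e-6):
    """Central finite-difference Jacobian of vec(A) -> vec(A^{-1})."""
    p = a.shape[0]
    jac = np.zeros((p * p, p * p))
    for col in range(p * p):
        step = np.zeros(p * p)
        step[col] = eps
        da = step.reshape((p, p), order="F")   # perturb one entry of A
        diff = vec(np.linalg.inv(a + da)) - vec(np.linalg.inv(a - da))
        jac[:, col] = diff / (2 * eps)
    return jac

rng = np.random.default_rng(2)
b = rng.normal(size=(3, 3))
q_xx = b @ b.T + 3 * np.eye(3)        # positive definite stand-in for Q_xx
q_inv = np.linalg.inv(q_xx)

analytic = -np.kron(q_inv.T, q_inv)   # -(A^{-1})' kron A^{-1} evaluated at Q_xx
numeric = numerical_jacobian(q_xx)

# Applying the Jacobian to vec(Z) reproduces -Q^{-1} Z Q^{-1}
z = rng.normal(size=(3, 3))
mapped = (analytic @ vec(z)).reshape((3, 3), order="F")
```

The last two lines mirror the final equality in (A.58), with `z` standing in for a draw of $\mathcal{Z}_{xx}$.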
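Then, from (A.39), (A.40), (A.56), (A.59) and the CMT
\begin{align}
n^{1/2}\mathrm{vec}(J - G_n) &= n^{-1/2}\sum_{i=1}^{n} \mathrm{vec}(Z_iY_i' - E_P[Z_i^{*}Z_i^{*\prime}]\Gamma) - [C_1', \dots, C_d']'\Sigma^{-1}n^{1/2}m \notag \\
&= \mathrm{vec}((\mathcal{Z}_{ww} - \mathcal{Z}_{wx}Q_{xx}^{-1}Q_{xw} - Q_{wx}Q_{xx}^{-1}\mathcal{Z}_{xw} + Q_{wx}Q_{xx}^{-1}\mathcal{Z}_{xx}Q_{xx}^{-1}Q_{xw})\Gamma + \mathcal{Z}_{zv}) \notag \\
&\quad - [C_{P1}', \dots, C_{Pd}']'\Sigma_P^{-1}\mathcal{Z}_{zu} + o_p(1) \notag \\
&= \mathrm{vec}(G_h) - [C_{P1}', \dots, C_{Pd}']'\Sigma_P^{-1}\mathcal{Z}_{zu} + o_p(1). \tag{A.60}
\end{align}
A further result in Lemma 16.4 of Andrews and Guggenberger (2019b) is as follows:
\[
n^{1/2}[m, J - G_n, H_nJU_nT_n] \xrightarrow{d} [m_h, J_h, \Delta_h], \tag{A.61}
\]
where $(J_h, \Delta_h)$ and $m_h$ are independent,
\[
\begin{pmatrix} m_h \\ \mathrm{vec}(J_h) \end{pmatrix} \sim N\left[0_{k \times (d+1)}, \begin{pmatrix} h_{5,m} & 0_{k \times dk} \\ 0_{k \times dk} & \Phi_h \end{pmatrix}\right], \quad \Phi_h \equiv \lim_{n\to\infty}(\Sigma^{G}_{n} - C_n'\Sigma_n^{-1}C_n)
\]
whenever the limit exists, and
\begin{align}
&\Delta_h \equiv [\Delta_{h,q}, \Delta_{h,d-q}] \in \mathbb{R}^{k \times d}, \quad \Delta_{h,q} \equiv h_{3,q} \in \mathbb{R}^{k \times q}, \quad \Delta_{h,d-q} \equiv h_3h_{1,d-q} + h_{5,m}^{-1/2}J_hh_{91}h_{2,d-q} \in \mathbb{R}^{k \times (d-q)}, \notag \\
&h_{1,d-q} \equiv \begin{pmatrix} 0_{q \times (d-q)} \\ \mathrm{diag}(h_{1,q+1}, \dots, h_{1,d}) \\ 0_{(k-d) \times (d-q)} \end{pmatrix}, \quad h_{91} \equiv ((\theta_0, I_d)\Omega^{\varepsilon}(h_9)(\theta_0, I_d)')^{1/2}, \quad \Omega = \Omega(\Sigma, K(\mathcal{V})). \tag{A.62}
\end{align}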
A.5 Asymptotic distributions of key permutation quantities
In this section, we derive the asymptotic distributions of key quantities that underlie the permutation test
statistics under the drifting sequence of parameters.
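Conditioning events. We first describe the events on which we condition when deriving the asymptotic results for the permutation tests. From (A.60), for any $\varepsilon > 0$ we can find a constant $U_0$ and $n_0 \in \mathbb{N}$ such that for all $n \ge n_0$,
\[
\Omega_1 \equiv \{\omega \in \Omega_0 : \|n^{1/2}(J - G_n)\| < U_0\}, \quad P[\Omega_1^c] < \varepsilon/2. \tag{A.63}
\]
We shall establish that for $\mathrm{PR} \in \{\mathrm{PAR1}, \mathrm{PAR2}, \mathrm{PLM}, \mathrm{PCLRa}, \mathrm{PCLRb}\}$ and all $\omega \in \Omega_1$
\[
\sup_{x \in \mathbb{R}} |P^{\pi}[\mathrm{PR} \le x] - P[\mathrm{PR}_\infty \le x]| \xrightarrow{p} 0, \tag{A.64}
\]
where $\mathrm{PR}_\infty$ is the limiting random variable. When (A.64) holds, for any $\varepsilon, \varepsilon_1 > 0$, there exists $n_1 \in \mathbb{N}$ such that $P[\sup_{x \in \mathbb{R}} |P^{\pi}[\mathrm{PR} \le x] - P[\mathrm{PR}_\infty \le x]| > \varepsilon_1] \le \varepsilon/2$ for all $\omega \in \Omega_1$ and $n \ge n_1$. Then, by the Boole-Bonferroni inequality, for all $n \ge \max\{n_0, n_1\}$
\begin{align}
&P[\omega \in \Omega_0 : |P^{\pi}[\mathrm{PR} \le x] - P[\mathrm{PR}_\infty \le x]| > \varepsilon_1] \notag \\
&\quad \le P[\omega \in \Omega_1 : |P^{\pi}[\mathrm{PR} \le x] - P[\mathrm{PR}_\infty \le x]| > \varepsilon_1] + P[\Omega_1^c] \le \varepsilon. \tag{A.65}
\end{align}
Since $\varepsilon > 0$ is arbitrary, it will follow that
\[
\sup_{x \in \mathbb{R}} |P^{\pi}[\mathrm{PR} \le x] - P[\mathrm{PR}_\infty \le x]| \xrightarrow{p} 0. \tag{A.66}
\]
This means that we can use distributional results such as that in (A.60) on the conditioning events to obtain the convergence in $P^{\pi}$-distribution in $P$-probability.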
The following lemma is a permutation analog of Lemma 16.4 (see also (A.61)) of Andrews and Guggen-
berger (2019b).
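Lemma A.6. Under all sequences $\{\lambda_n \in \Lambda : n \ge 1\}$,
\[
n^{1/2}[m^{\pi}, J^{\pi} - G, H_nJ^{\pi}U_nT_n] \xrightarrow{d^{\pi}} [m_h^{\pi}, J_h^{\pi}, \Delta_h^{\pi}] \quad \text{in } P\text{-probability},
\]
where
(a) $\begin{pmatrix} m_h^{\pi} \\ \mathrm{vec}(J_h^{\pi}) \end{pmatrix} \sim N\left[0_{k \times (d+1)}, \begin{pmatrix} \sigma^2Q_{zz} & 0_{k \times dk} \\ 0_{k \times dk} & \Phi_h^{\pi} \end{pmatrix}\right]$, $\Phi_h^{\pi} \equiv (\Sigma_V - \Sigma_{Vu}(\sigma^2)^{-1}\Sigma_{uV}) \otimes Q_{zz}$,
(b) $\Delta_h^{\pi} \equiv [\Delta_{h,q}^{\pi}, \Delta_{h,d-q}^{\pi}] \in \mathbb{R}^{k \times d}$, $\Delta_{h,q}^{\pi} \equiv h_{3,q} \in \mathbb{R}^{k \times q}$, $\Delta_{h,d-q}^{\pi} \equiv h_3h_{1,d-q} + h_{5,m}^{-1/2}(G_h + J_h^{\pi})h_{91}h_{2,d-q} \in \mathbb{R}^{k \times (d-q)}$, where $G_h$ is defined in (A.59), and $h_{1,d-q}$ and $h_{91}$ are defined in (A.62),
(c) $(J_h^{\pi}, \Delta_h^{\pi})$ and $m_h^{\pi}$ are independent,
(d) When $U_n \equiv I_d$ (as opposed to $U_P \equiv ((\theta_0, I_d)\Omega^{\varepsilon}(\Sigma_P, \mathcal{V}_P)^{-1}(\theta_0, I_d)')^{1/2}$ defined in Section A.3) with the eigenvalues, the orthogonal eigenvectors and the SV's of the matrices (A.5)-(A.7) defined accordingly as in Section A.3, the above results also hold with $\Delta_{h,d-q}^{\pi} \equiv h_3h_{1,d-q} + h_{5,m}^{-1/2}(G_h + J_h^{\pi})h_{2,d-q}$, where $G_h$ is as defined in (A.59), and $h_{1,d-q}$ is as defined in (A.62),
(e) Under all subsequences $\{w_n\}$ and all sequences $\{\lambda_{w_n,n} : n \ge 1\} \in \Lambda$, the convergence results above hold with $n$ replaced with $w_n$.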
Proof of Lemma A.6. The proof is divided into three main steps. In the first step, we prove the multivariate
normality of some key permutation quantities. In the second step, we prove the consistency of the covariance
matrix estimates. The final step shows the desired result of the lemma.
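Step 1: Multivariate normality. By the triangular array SLLN (Lemma A.2), $n^{-1}W'W - E_P[W_iW_i'] \xrightarrow{a.s.} 0$, $n^{-1}X'W - E_P[X_iW_i'] \xrightarrow{a.s.} 0$ and $n^{-1}X'X - E_P[X_iX_i'] \xrightarrow{a.s.} 0$, and by (A.23) $E_P[W_iW_i'] \to Q_{ww}$, $E_P[X_iW_i'] \to Q_{xw}$, and $E_P[X_iX_i'] \to Q_{xx}$. Thus,
\[
E_P[Z_i^{*}Z_i^{*\prime}] \to Q_{ww} - Q_{wx}Q_{xx}^{-1}Q_{xw} \equiv Q_{zz}, \tag{A.67}
\]
and by the CMT
\[
n^{-1}Z'Z = n^{-1}W'W - n^{-1}W'X(X'X)^{-1}X'W \xrightarrow{a.s.} Q_{zz}. \tag{A.68}
\]
Note that $n^{1/2}(J_s^{\pi} - n^{-1}Z'Z\hat{\Gamma}_s) = n^{-1/2}Z'\hat{V}_{\pi,s} - C_s^{\pi}(\Sigma^{\pi})^{-1}n^{1/2}m^{\pi}$, $s = 1, \dots, d$, where $\hat{\Gamma}_s = (Z'Z)^{-1}Z'Y_{,s}$ and $\hat{V}_{\pi,s}$ is the $s$-th column of $\hat{V}_{\pi}$. Next we will show that
\[
\begin{pmatrix} n^{-1/2}Z'M_\iota u_\pi(\theta_0) \\ n^{-1/2}\mathrm{vec}(Z'M_\iota\hat{V}_\pi) \end{pmatrix} \xrightarrow{d^{\pi}} \begin{pmatrix} m_h^{\pi} \\ G_h^{\pi} \end{pmatrix} \sim N[0, \Sigma_e \otimes Q_{zz}] \quad P\text{-almost surely}. \tag{A.69}
\]
Using the Cramér-Wold device, it suffices to prove that for any $t = [t_0', t_1', \dots, t_d']'$ with $t_l \in \mathbb{R}^{k}$, $l = 0, \dots, d$,
\[
t'\begin{pmatrix} n^{-1/2}Z'M_\iota u_\pi(\theta_0) \\ n^{-1/2}\mathrm{vec}(Z'M_\iota\hat{V}_\pi) \end{pmatrix} \xrightarrow{d^{\pi}} N[0, t'(\Sigma_e \otimes Q_{zz})t] \quad P\text{-almost surely}. \tag{A.70}
\]
Rewrite
\begin{align*}
t'\begin{pmatrix} n^{-1/2}Z'M_\iota u_\pi(\theta_0) \\ n^{-1/2}\mathrm{vec}(Z'M_\iota\hat{V}_\pi) \end{pmatrix} &= n^{-1/2}\sum_{i=1}^{n} \big(t_0'Z_iu_{\pi(i)}(\theta_0) + t_1'Z_i\hat{V}_{\pi(i)1} + \cdots + t_d'Z_i\hat{V}_{\pi(i)d}\big) \\
&= n^{-1/2}\sum_{i=1}^{n} Z_i'[t_0, t_1, \dots, t_d][u_{\pi(i)}(\theta_0), \hat{V}_{\pi(i)1}, \dots, \hat{V}_{\pi(i)d}]' \\
&= n^{-1/2}\sum_{i=1}^{n} Z_i'\zeta_{\pi(i)},
\end{align*}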
where $\zeta_i \equiv [t_0, t_1, \dots, t_d][u_i(\theta_0), \hat{V}_{i1}, \dots, \hat{V}_{id}]'$, $i = 1, \dots, n$. We verify the condition (A.2) of Lemma A.1 with $c_n(i,j) \equiv n^{-1/2}Z_i'\zeta_j$. Since
\begin{align*}
E_{P^\pi}\Big[n^{-1/2}\sum_{i=1}^{n} t_0'Z_iu_{\pi(i)}(\theta_0) \,\Big|\, y, Y, W, X\Big] &= n^{1/2}\Big(n^{-1}\sum_{i=1}^{n} t_0'Z_i\Big)\Big(n^{-1}\sum_{i=1}^{n} u_i(\theta_0)\Big) = 0, \\
E_{P^\pi}\Big[n^{-1/2}\sum_{i=1}^{n} t_0'Z_i\hat{V}_{\pi(i)l} \,\Big|\, y, Y, W, X\Big] &= n^{1/2}\Big(n^{-1}\sum_{i=1}^{n} t_0'Z_i\Big)\Big(n^{-1}\sum_{i=1}^{n} \hat{V}_{il}\Big) = 0, \quad l = 1, \dots, d,
\end{align*}
we have $g_n(i,j) = c_n(i,j)$ in Lemma A.1. Next we will show that
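\[
n^{-1}e'e = n^{-1}\sum_{i=1}^{n} e_ie_i' = n^{-1}\begin{pmatrix} u(\theta_0)'u(\theta_0) & u(\theta_0)'\hat{V} \\ \hat{V}'u(\theta_0) & \hat{V}'\hat{V} \end{pmatrix} \xrightarrow{a.s.} \Sigma_e, \tag{A.71}
\]
or equivalently
\begin{align}
n^{-1}\hat{V}'\hat{V} &\xrightarrow{a.s.} \Sigma_V, \tag{A.72} \\
n^{-1}\hat{V}'u(\theta_0) &\xrightarrow{a.s.} \Sigma_{Vu}, \tag{A.73} \\
n^{-1}\sum_{i=1}^{n} u_i(\theta_0)^2 &\xrightarrow{a.s.} \sigma^2. \tag{A.74}
\end{align}
By the SLLN (Lemma A.2), $n^{-1}W'V - E_P[W_iV_i'] \xrightarrow{a.s.} 0$, hence $n^{-1}W'V \xrightarrow{a.s.} 0$ and similarly $n^{-1}X'V \xrightarrow{a.s.} 0$. Combining these with $n^{-1}W'X \xrightarrow{a.s.} Q_{wx}$ and $n^{-1}X'X \xrightarrow{a.s.} Q_{xx}$, and using the CMT,
\[
n^{-1}Z'V = n^{-1}W'V - n^{-1}W'X(n^{-1}X'X)^{-1}n^{-1}X'V \xrightarrow{a.s.} 0. \tag{A.75}
\]
Thus, on noting that $\hat{V} = Y - Z\hat{\Gamma} - X\hat{\xi} = (I_n - P_Z - P_X)V$, by the SLLN and CMT
\[
n^{-1}\hat{V}'\hat{V} - E_P[V_iV_i'] = n^{-1}V'V - n^{-1}V'P_ZV - n^{-1}V'P_XV - E_P[V_iV_i'] \xrightarrow{a.s.} 0
\]
and $E_P[V_iV_i'] \to \Sigma_V$, we have $n^{-1}\hat{V}'\hat{V} \xrightarrow{a.s.} \Sigma_V$, which verifies (A.72). Analogously to (A.75), $n^{-1}Z'u \xrightarrow{a.s.} 0$ and $n^{-1}X'u \xrightarrow{a.s.} 0$. Then, (A.73) follows using (A.68) and on noting that
\begin{align*}
n^{-1}\hat{V}'u(\theta_0) - E_P[V_iu_i] &= n^{-1}V'u(\theta_0) - n^{-1}V'P_Zu(\theta_0) - n^{-1}V'P_Xu(\theta_0) - E_P[V_iu_i] \\
&= n^{-1}V'u - n^{-1}V'X(X'X)^{-1}X'u - E_P[V_iu_i] - n^{-1}V'Z(Z'Z)^{-1}Z'u \xrightarrow{a.s.} 0,
\end{align*}
and $E_P[V_iu_i] \to \Sigma_{Vu}$. (A.74) holds because $n^{-1}\sum_{i=1}^{n} u_i(\theta_0)^2 - E_P[u_i^2] = n^{-1}u'u - E_P[u_i^2] - n^{-1}u'X(X'X)^{-1}X'u \xrightarrow{a.s.} 0$ by the SLLN and CMT, and $\sigma_n^2 \equiv E_P[u_i^2] \to \sigma^2$. Furthermore, note that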
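\begin{align}
D &\equiv \frac{1}{n-1}\sum_{i=1}^{n}\sum_{j=1}^{n} g_n^2(i,j) \notag \\
&= \frac{1}{n(n-1)}\sum_{i=1}^{n}\sum_{j=1}^{n} \zeta_i'Z_jZ_j'\zeta_i \notag \\
&= \frac{1}{n-1}\sum_{i=1}^{n} \zeta_i'\Big(n^{-1}\sum_{j=1}^{n} Z_jZ_j'\Big)\zeta_i \notag \\
&= \frac{1}{n-1}\sum_{i=1}^{n} \zeta_i'(Q_{zz} + o_{a.s.}(1))\zeta_i \notag \\
&= \frac{1}{n-1}\sum_{i=1}^{n} \zeta_i'Q_{zz}\zeta_i + \frac{1}{n-1}\sum_{i=1}^{n} \zeta_i'\zeta_i\,o_{a.s.}(1) \notag \\
&= \frac{1}{n-1}\sum_{i=1}^{n} t'(e_i \otimes I_k)Q_{zz}(e_i' \otimes I_k)t + \frac{1}{n-1}\sum_{i=1}^{n} \zeta_i'\zeta_i\,o_{a.s.}(1) \notag \\
&= \frac{1}{n-1}\sum_{i=1}^{n} t'(e_ie_i' \otimes Q_{zz})t + o_{a.s.}(1) \notag \\
&= t'(\Sigma_e \otimes Q_{zz})t + o_{a.s.}(1), \tag{A.76}
\end{align}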
where the third equality follows from (A.68), the second to the last equality uses (A.71) and the fact that $n^{-1}\sum_{i=1}^{n} \zeta_i'\zeta_i = n^{-1}\sum_{i=1}^{n} t'(e_ie_i' \otimes I_k)t \xrightarrow{a.s.} t'(\Sigma_e \otimes I_k)t$, which follows from (A.71) and Slutsky's lemma.
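By the triangular array SLLN,
\begin{align}
&n^{-1}\sum_{i=1}^{n} \|V_i\|^{2+\delta} = O_{a.s.}(1), \quad n^{-1}\sum_{i=1}^{n} \|X_i\|^{2+\delta} = O_{a.s.}(1), \quad n^{-1}\sum_{i=1}^{n} \|W_i\|^{2+\delta} = O_{a.s.}(1), \tag{A.77} \\
&n^{-1}\sum_{i=1}^{n} \|Z_i\|^{2+\delta} \le 2^{1+\delta}\Big(n^{-1}\sum_{i=1}^{n} \big(\|W_i\|^{2+\delta} + \|X_i\|^{2+\delta}\|(X'X)^{-1}X'W\|^{2+\delta}\big)\Big) = O_{a.s.}(1). \tag{A.78}
\end{align}
By the $c_r$ inequality and the triangular array SLLN,
\[
n^{-1}\sum_{i=1}^{n} |u_i(\theta_0)|^{2+\delta} \le 2^{1+\delta}n^{-1}\sum_{i=1}^{n} |u_i|^{2+\delta} + 2^{1+\delta}n^{-1}\sum_{i=1}^{n} \|X_i\|^{2+\delta}\|(X'X)^{-1}X'u\|^{2+\delta} = O_{a.s.}(1). \tag{A.79}
\]
Since $\hat{V} = (I_n - P_Z - P_X)V$, $\hat{V}_i = V_i - Z_i'(Z'Z)^{-1}Z'V - X_i'(X'X)^{-1}X'V$, and by Jensen's inequality (applied with the function $f(x) = x^{2+\delta}$, $x > 0$), (A.77) and (A.78)
\begin{align}
n^{-1}\sum_{i=1}^{n} \|\hat{V}_i\|^{2+\delta} &\le 3^{1+\delta}\Big(n^{-1}\sum_{i=1}^{n} \|V_i\|^{2+\delta} + n^{-1}\sum_{i=1}^{n} \|Z_i\|^{2+\delta}\|(n^{-1}Z'Z)^{-1}n^{-1}Z'V\|^{2+\delta} \notag \\
&\qquad + n^{-1}\sum_{i=1}^{n} \|X_i\|^{2+\delta}\|(n^{-1}X'X)^{-1}n^{-1}X'V\|^{2+\delta}\Big) = O_{a.s.}(1). \tag{A.80}
\end{align}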
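By the $c_r$ inequality, (A.79) and (A.80)
\[
n^{-1}\sum_{i=1}^{n} \|\zeta_i\|^{2+\delta} \le 2^{1+\delta}\|t_0\|^{2+\delta}n^{-1}\sum_{i=1}^{n} |u_i(\theta_0)|^{2+\delta} + 2^{1+\delta}\|[t_1, \dots, t_d]\|^{2+\delta}n^{-1}\sum_{i=1}^{n} \|\hat{V}_i\|^{2+\delta} = O_{a.s.}(1). \tag{A.81}
\]
We show that for $D$ defined in (A.76) and any $\varepsilon > 0$
\[
\lim_{n\to\infty} n^{-2}\sum_{i=1}^{n}\sum_{j=1}^{n} \frac{|Z_i'\zeta_j|^2}{D}\,1\Big(\frac{n^{-1}|Z_i'\zeta_j|^2}{D} > \varepsilon^2\Big) = 0. \tag{A.82}
\]
Note that
\begin{align}
n^{-2}\sum_{i=1}^{n}\sum_{j=1}^{n} \frac{|Z_i'\zeta_j|^2}{D}\,1\Big(\frac{n^{-1}|Z_i'\zeta_j|^2}{D} > \varepsilon^2\Big) &= n^{-2}D^{-1}\sum_{i=1}^{n}\sum_{j=1}^{n} \frac{|Z_i'\zeta_j|^{2+\delta}}{|Z_i'\zeta_j|^{\delta}}\,1\big(|Z_i'\zeta_j|^{\delta} > \varepsilon^{\delta}D^{\delta/2}n^{\delta/2}\big) \notag \\
&\le n^{-\delta/2}\varepsilon^{-\delta}D^{-1-\delta/2}n^{-2}\sum_{i=1}^{n}\sum_{j=1}^{n} |Z_i'\zeta_j|^{2+\delta} \notag \\
&\le n^{-\delta/2}\varepsilon^{-\delta}D^{-1-\delta/2}\,n^{-1}\sum_{i=1}^{n} \|Z_i\|^{2+\delta}\,n^{-1}\sum_{i=1}^{n} \|\zeta_i\|^{2+\delta} \xrightarrow{a.s.} 0, \tag{A.83}
\end{align}
where the convergence uses (A.78) and (A.81). Thus, the Lindeberg-type condition in (A.2) is satisfied. Lemma A.1 and (A.76) yield (A.70).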
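Step 2: Consistency of the covariance matrix estimates. Next we determine the limits of $C_s^{\pi}$, $s = 1, \dots, d$, and $\Sigma^{\pi}$. Note first that by Lemma S.3.4 of DiCiccio and Romano (2017) (see also Hoeffding (1951)),
\begin{align}
E_{P^\pi}[\Sigma^{\pi} \mid y, Y, W, X] &= E_{P^\pi}\Big[n^{-1}\sum_{i=1}^{n} Z_iZ_i'u_{\pi(i)}(\theta_0)^2 \,\Big|\, y, Y, W, X\Big] \notag \\
&= (n^{-1}Z'Z)\Big(n^{-1}\sum_{i=1}^{n} u_i(\theta_0)^2\Big) \xrightarrow{a.s.} \sigma^2Q_{zz}, \tag{A.84}
\end{align}
where the convergence follows from the CMT, (A.68) and (A.74). On noting that $n^{-1}X'u \xrightarrow{p} 0$ and $n^{-1}\sum_{i=1}^{n} u_i^4 - E_P[u_i^4] \xrightarrow{p} 0$, by Minkowski's inequality, the triangular array WLLN and the CMT,
\begin{align}
n^{-1}\sum_{i=1}^{n} \|Z_i\|^4 &= n^{-1}\sum_{i=1}^{n} \|W_i - W'X(X'X)^{-1}X_i\|^4 \notag \\
&\le 8n^{-1}\sum_{i=1}^{n} \|W_i\|^4 + \|W'X(X'X)^{-1}\|^4\,8n^{-1}\sum_{i=1}^{n} \|X_i\|^4 = O_p(1), \tag{A.85} \\
n^{-1}\sum_{i=1}^{n} u_i(\theta_0)^4 &\le 8n^{-1}\sum_{i=1}^{n} u_i^4 + 8n^{-1}\sum_{i=1}^{n} \|X_i\|^4\|(X'X)^{-1}X'u\|^4 = O_p(1), \tag{A.86}
\end{align}
hence
\[
n^{-1}\sum_{i=1}^{n} Z_{ij}^2Z_{il}^2 \le \Big(n^{-1}\sum_{i=1}^{n} Z_{ij}^4\Big)^{1/2}\Big(n^{-1}\sum_{i=1}^{n} Z_{il}^4\Big)^{1/2} = O_p(1), \quad j, l = 1, \dots, k. \tag{A.87}
\]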
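Therefore, letting $\Sigma_{jl}^{\pi}$ denote the $(j,l)$, $j, l = 1, \dots, k$, element of $\Sigma^{\pi}$, (A.86) and (A.87) give
\begin{align}
\mathrm{Var}_{P^\pi}[\Sigma_{jl}^{\pi} \mid y, Y, W, X] &= \mathrm{Var}_{P^\pi}\Big[n^{-1}\sum_{i=1}^{n} Z_{ij}Z_{il}u_{\pi(i)}(\theta_0)^2 \,\Big|\, y, Y, W, X\Big] \notag \\
&= \frac{1}{n-1}\Big(n^{-1}\sum_{i=1}^{n} Z_{ij}^2Z_{il}^2 - \Big(n^{-1}\sum_{i=1}^{n} Z_{ij}Z_{il}\Big)^2\Big)\Big(n^{-1}\sum_{i=1}^{n} u_i(\theta_0)^4 - \Big(n^{-1}\sum_{i=1}^{n} u_i(\theta_0)^2\Big)^2\Big) \notag \\
&\le \frac{1}{n-1}\Big(n^{-1}\sum_{i=1}^{n} Z_{ij}^2Z_{il}^2\Big)\Big(n^{-1}\sum_{i=1}^{n} u_i(\theta_0)^4\Big) \xrightarrow{p} 0, \tag{A.88}
\end{align}
where the second equality follows from Lemma S.3.4 of DiCiccio and Romano (2017). Combining (A.84) and (A.88), and using Chebyshev's inequality, $P^{\pi}[|\Sigma_{jl}^{\pi} - E_{P^\pi}[\Sigma_{jl}^{\pi} \mid y, Y, W, X]| > \varepsilon] \le \mathrm{Var}_{P^\pi}[\Sigma_{jl}^{\pi} \mid y, Y, W, X]/\varepsilon^2 \xrightarrow{p} 0$. Thus $\Sigma^{\pi} - E_{P^\pi}[\Sigma^{\pi} \mid y, Y, W, X] \xrightarrow{p^{\pi}} 0$ in $P$-probability. Combining this with (A.84) and using the triangle inequality, $\|\Sigma^{\pi} - \sigma^2Q_{zz}\| \le \|E_{P^\pi}[\Sigma^{\pi} \mid y, Y, W, X] - \sigma^2Q_{zz}\| + \|\Sigma^{\pi} - E_{P^\pi}[\Sigma^{\pi} \mid y, Y, W, X]\| \xrightarrow{p^{\pi}} 0$ in $P$-probability, hence
\[
n^{-1}\sum_{i=1}^{n} Z_iZ_i'u_{\pi(i)}(\theta_0)^2 \xrightarrow{p^{\pi}} \sigma^2Q_{zz} \quad \text{in } P\text{-probability}. \tag{A.89}
\]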
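The conditional mean and variance formulas used in (A.84) and (A.88) for a permutation average $n^{-1}\sum_i a_ib_{\pi(i)}$ can be checked by enumeration. The sketch below is illustrative only: `a` and `b` are random stand-ins for $Z_{ij}Z_{il}$ and $u_i(\theta_0)^2$, and the helper `perm_moments` is hypothetical.

```python
import itertools
import numpy as np

def perm_moments(a, b):
    """Exact mean and variance of n^{-1} sum_i a_i b_{pi(i)} over all permutations."""
    n = len(a)
    vals = np.array([np.mean(a * b[list(p)])
                     for p in itertools.permutations(range(n))])
    return vals.mean(), vals.var()   # population moments over the n! permutations

rng = np.random.default_rng(4)
n = 6
a = rng.normal(size=n)               # stand-in for Z_ij * Z_il
b = rng.normal(size=n) ** 2          # stand-in for u_i(theta_0)^2

mean_enum, var_enum = perm_moments(a, b)

# Closed forms: product of sample means, and product of sample variances / (n - 1)
mean_form = a.mean() * b.mean()
var_form = ((np.mean(a ** 2) - a.mean() ** 2)
            * (np.mean(b ** 2) - b.mean() ** 2) / (n - 1))
```

Because the variance carries the factor $1/(n-1)$, it vanishes as $n$ grows whenever the two sample second moments stay bounded, which is the content of (A.88).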
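Furthermore, by (A.68), (A.73) and the CMT, for $s = 1, \dots, d$
\[
E_{P^\pi}[C_s^{\pi} \mid y, Y, W, X] = E_{P^\pi}\Big[n^{-1}\sum_{i=1}^{n} Z_iZ_i'\hat{V}_{\pi(i)s}u_{\pi(i)}(\theta_0) \,\Big|\, y, Y, W, X\Big] = (n^{-1}Z'Z)\Big(n^{-1}\sum_{i=1}^{n} \hat{V}_{is}u_i(\theta_0)\Big) \xrightarrow{a.s.} Q_{zz}\Sigma_s^{Vu}, \tag{A.90}
\]
where $\Sigma_s^{Vu}$ denotes the $s$-th element of $\Sigma_{Vu}$. Since $\hat{V}_i = V_i - Z_i'(Z'Z)^{-1}Z'V - X_i'(X'X)^{-1}X'V$, applying Minkowski's inequality twice and using $n^{-1}\sum_{i=1}^{n} \|V_i\|^4 = O_p(1)$ and $n^{-1}\sum_{i=1}^{n} \|X_i\|^4 = O_p(1)$, which hold by the triangular array WLLN,
\begin{align}
n^{-1}\sum_{i=1}^{n} \|\hat{V}_i\|^4 &\le 8n^{-1}\sum_{i=1}^{n} \|V_i\|^4 + 64n^{-1}\sum_{i=1}^{n} \|Z_i\|^4\|(n^{-1}Z'Z)^{-1}n^{-1}Z'V\|^4 \notag \\
&\quad + 64n^{-1}\sum_{i=1}^{n} \|X_i\|^4\|(n^{-1}X'X)^{-1}n^{-1}X'V\|^4 = O_p(1). \tag{A.91}
\end{align}
Then,
\begin{align}
\mathrm{Var}_{P^\pi}[C_s^{\pi} \mid y, Y, W, X] &= \mathrm{Var}_{P^\pi}\Big[n^{-1}\sum_{i=1}^{n} Z_{ij}Z_{il}\hat{V}_{\pi(i)s}u_{\pi(i)}(\theta_0) \,\Big|\, y, Y, W, X\Big] \notag \\
&= \frac{1}{n-1}\Big(n^{-1}\sum_{i=1}^{n} Z_{ij}^2Z_{il}^2 - \Big(n^{-1}\sum_{i=1}^{n} Z_{ij}Z_{il}\Big)^2\Big)\Big(n^{-1}\sum_{i=1}^{n} \hat{V}_{is}^2u_i(\theta_0)^2 - \Big(n^{-1}\sum_{i=1}^{n} \hat{V}_{is}u_i(\theta_0)\Big)^2\Big) \xrightarrow{p} 0, \tag{A.92}
\end{align}
where the convergence follows from (A.68), (A.73), (A.87), and
\[
n^{-1}\sum_{i=1}^{n} \hat{V}_{is}^2u_i(\theta_0)^2 \le \Big(n^{-1}\sum_{i=1}^{n} \hat{V}_{is}^4\Big)^{1/2}\Big(n^{-1}\sum_{i=1}^{n} u_i(\theta_0)^4\Big)^{1/2} = O_p(1), \tag{A.93}
\]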
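which in turn holds by (A.86) and (A.91). Therefore, from (A.90), (A.92) and Chebyshev's inequality
\[
C_s^{\pi} = n^{-1}\sum_{i=1}^{n} Z_iZ_i'\hat{V}_{\pi(i)s}u_{\pi(i)}(\theta_0) \xrightarrow{p^{\pi}} Q_{zz}\Sigma_s^{Vu} \quad \text{in } P\text{-probability}. \tag{A.94}
\]
Step 3: Completing the proof. Let $G_h^{\pi} = [G_{h,1}^{\pi}, \dots, G_{h,d}^{\pi}]$. Using (A.69), (A.89) and (A.94) along with the CMT, for $s = 1, \dots, d$
\begin{align}
n^{1/2}J_s^{\pi} - n^{-1/2}Z'Z\hat{\Gamma}_s &= n^{-1/2}Z'M_\iota\hat{V}_{\pi,s} - \Big(n^{-1}\sum_{i=1}^{n} Z_iZ_i'\hat{V}_{\pi(i)s}u_{\pi(i)}(\theta_0)\Big)\Big(n^{-1}\sum_{i=1}^{n} Z_iZ_i'u_{\pi(i)}(\theta_0)^2\Big)^{-1}n^{-1/2}Z'u_\pi(\theta_0) \notag \\
&\xrightarrow{d^{\pi}} G_{h,s}^{\pi} - \Sigma_s^{Vu}(\sigma^2)^{-1}m_h^{\pi} \quad \text{in } P\text{-probability}. \tag{A.95}
\end{align}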
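This proves the part (a). Next we determine the limit of $H_nJ^{\pi}U_nT_n$ following Andrews and Guggenberger (2019b). Write
\[
n^{1/2}H_nJ^{\pi}U_nT_n = n^{1/2}H_nG_nU_nT_n + n^{1/2}H_n(G - G_n)U_nT_n + n^{1/2}H_n(J^{\pi} - G)U_nT_n. \tag{A.96}
\]
Using (A.26), (A.27), (A.28), (A.29), (A.32) and the SVD $H_nG_nU_n = A_n\Upsilon_nB_n'$, the first term on the right-hand side of the equation (A.96) can be expressed as
\begin{align}
n^{1/2}H_nG_nU_nT_n &= n^{1/2}H_nG_nU_nB_nS_n \notag \\
&= n^{1/2}H_nG_nU_n(B_{n,q}, B_{n,d-q})\begin{pmatrix} \Upsilon_{n,q}^{-1}n^{-1/2} & 0_{q \times (d-q)} \\ 0_{(d-q) \times q} & I_{d-q} \end{pmatrix} \notag \\
&= [H_nG_nU_nB_{n,q}\Upsilon_{n,q}^{-1}, \ n^{1/2}H_nG_nU_nB_{n,d-q}] \notag \\
&= [A_n\Upsilon_nB_n'B_{n,q}\Upsilon_{n,q}^{-1}, \ n^{1/2}H_nG_nU_nB_{n,d-q}] \notag \\
&= [A_{n,q}, \ n^{1/2}H_nG_nU_nB_{n,d-q}], \tag{A.97}
\end{align}
where we used $\Upsilon_nB_n'B_{n,q}\Upsilon_{n,q}^{-1} = \Upsilon_nB_n'B_n[I_q, 0_{q \times (d-q)}]'\Upsilon_{n,q}^{-1} = [I_q, 0_{q \times (d-q)}]'$. For the second sub-matrix in (A.97),
\begin{align}
n^{1/2}H_nG_nU_nB_{n,d-q} &= n^{1/2}A_n\Upsilon_nB_n'B_{n,d-q} \notag \\
&= n^{1/2}A_n\Upsilon_n[0_{(d-q) \times q}, I_{d-q}]' \notag \\
&= A_n[0_{(d-q) \times q}, \ n^{1/2}\Upsilon_{n,d-q}, \ 0_{(d-q) \times (k-d)}]' \notag \\
&\to h_3[0_{(d-q) \times q}, \ \mathrm{diag}(h_{1,q+1}, \dots, h_{1,d}), \ 0_{(d-q) \times (k-d)}]' \notag \\
&= h_3h_{1,d-q}, \tag{A.98}
\end{align}
where $h_{1,d-q}$ is defined in (A.62). Furthermore,
\begin{align}
n^{1/2}H_n(G - G_n)U_nT_n &= n^{1/2}H_n(G - G_n)U_nB_nS_n \notag \\
&= n^{1/2}H_n(G - G_n)U_n[B_{n,q}, B_{n,d-q}]\begin{pmatrix} \Upsilon_{n,q}^{-1}n^{-1/2} & 0_{q \times (d-q)} \\ 0_{(d-q) \times q} & I_{d-q} \end{pmatrix} \notag \\
&= [H_n(G - G_n)U_nB_{n,q}\Upsilon_{n,q}^{-1}, \ n^{1/2}H_n(G - G_n)U_nB_{n,d-q}]. \tag{A.99}
\end{align}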
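Using $H_n \to \Sigma^{-1/2}$, (A.59), $U_n \to h_{91}$ where $h_{91}$ is defined in (A.62), $B_{n,q} \to h_{2,q}$, and $n^{1/2}\tau_{jn} \to \infty$ for all $j \le q$,
\[
H_n(G - G_n)U_nB_{n,q}\Upsilon_{n,q}^{-1} = H_nn^{1/2}(G - G_n)U_nB_{n,q}(n^{1/2}\Upsilon_{n,q})^{-1} = o_p(1). \tag{A.100}
\]
Since $B_{n,d-q} \to h_{2,d-q}$,
\[
n^{1/2}H_n(G - G_n)U_nB_{n,d-q} \xrightarrow{d} \Sigma^{-1/2}G_hh_{91}h_{2,d-q}. \tag{A.101}
\]
Using (A.100) and (A.101) in (A.99),
\[
n^{1/2}H_n(G - G_n)U_nT_n \xrightarrow{d} [0_{k \times q}, \ \Sigma^{-1/2}G_hh_{91}h_{2,d-q}]. \tag{A.102}
\]
Moreover, rewrite the third term on the right-hand side of the equation (A.96) as
\begin{align}
n^{1/2}H_n(J^{\pi} - G)U_nT_n &= n^{1/2}H_n(J^{\pi} - G)U_nB_nS_n \tag{A.103} \\
&= n^{1/2}H_n(J^{\pi} - G)U_n[B_{n,q}, B_{n,d-q}]\begin{pmatrix} \Upsilon_{n,q}^{-1}n^{-1/2} & 0_{q \times (d-q)} \\ 0_{(d-q) \times q} & I_{d-q} \end{pmatrix} \notag \\
&= [H_n(J^{\pi} - G)U_nB_{n,q}\Upsilon_{n,q}^{-1}, \ n^{1/2}H_n(J^{\pi} - G)U_nB_{n,d-q}]. \tag{A.104}
\end{align}
In addition, by the part (a) of Lemma A.6, $n^{1/2}(J^{\pi} - G) \xrightarrow{d^{\pi}} J_h^{\pi}$ in $P$-probability. Using an argument analogous to those in (A.100) and (A.101) and Lemma A.3,
\begin{align}
H_n(J^{\pi} - G)U_nB_{n,q}\Upsilon_{n,q}^{-1} &= o_{p^{\pi}}(1) \quad \text{in } P\text{-probability}, \tag{A.105} \\
n^{1/2}H_n(J^{\pi} - G)U_nB_{n,d-q} &\xrightarrow{d^{\pi}} \Sigma^{-1/2}J_h^{\pi}h_{91}h_{2,d-q} \quad \text{in } P\text{-probability}. \tag{A.106}
\end{align}
Using (A.105) and (A.106) in (A.104),
\[
n^{1/2}H_n(J^{\pi} - G)U_nT_n \xrightarrow{d^{\pi}} [0_{k \times q}, \ \Sigma^{-1/2}J_h^{\pi}h_{91}h_{2,d-q}] \quad \text{in } P\text{-probability}. \tag{A.107}
\]
Combining (A.97), (A.98), (A.102), (A.107), and using $A_{n,q} \to h_{3,q} = \Delta_{h,q}^{\pi}$ and an argument similar to that in (A.66) in (A.96), we obtain
\[
n^{1/2}H_nJ^{\pi}U_nT_n \xrightarrow{d^{\pi}} (\Delta_{h,q}, \ h_3h_{1,d-q} + \Sigma^{-1/2}(G_h + J_h^{\pi})h_{91}h_{2,d-q}) \quad \text{in } P\text{-probability}.
\]
This proves the part (b). Since $\Delta_{h,d-q}^{\pi}$ is a deterministic function of $J_h^{\pi}$, the part (c) follows. The part (d) follows similarly to (c). The part (e) is also analogous to the parts above.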
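The SVD algebra behind (A.97) and (A.98) can be checked on a random matrix. In the sketch below `M` is an arbitrary stand-in for $H_nG_nU_n$, the dimensions are illustrative, and the variable names are hypothetical. It verifies that $H_nG_nU_nB_{n,q}\Upsilon_{n,q}^{-1} = A_{n,q}$ and that $H_nG_nU_nB_{n,d-q} = A_n[0, \Upsilon_{n,d-q}, 0]'$.

```python
import numpy as np

rng = np.random.default_rng(5)
k, d, q = 5, 3, 2                    # illustrative dimensions with k >= d >= q
M = rng.normal(size=(k, d))          # stand-in for H_n G_n U_n

A, tau, Bt = np.linalg.svd(M, full_matrices=True)   # M = A @ Ups @ Bt
B = Bt.T
Ups_q_inv = np.diag(1.0 / tau[:q])   # inverse of Upsilon_{n,q}

# First block of (A.97): M B_q Upsilon_q^{-1} equals the first q columns of A
left = M @ B[:, :q] @ Ups_q_inv
A_q = A[:, :q]

# Second block, before scaling by n^{1/2}: M B_{d-q} = A [0, Upsilon_{d-q}, 0]'
right = M @ B[:, q:]
stacked = np.zeros((k, d - q))
stacked[q:d, :] = np.diag(tau[q:])   # rows q+1..d carry the remaining singular values
```

The normalization by $\Upsilon_{n,q}^{-1}$ is what removes the diverging singular values from the first $q$ columns, leaving the orthonormal block $A_{n,q}$.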
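Define
\begin{align}
J^{+} &\equiv [J, \ H^{-1}(\Sigma^{\pi})^{-1/2}m^{\pi}] \in \mathbb{R}^{k \times (d+1)}, \tag{A.108} \\
U^{+} &\equiv \begin{pmatrix} U & 0_{d \times 1} \\ 0_{1 \times d} & 1 \end{pmatrix} \in \mathbb{R}^{(d+1) \times (d+1)}, \quad U_n^{+} \equiv \begin{pmatrix} U_n & 0_{d \times 1} \\ 0_{1 \times d} & 1 \end{pmatrix} \in \mathbb{R}^{(d+1) \times (d+1)}, \tag{A.109} \\
h_{91}^{+} &\equiv \begin{pmatrix} h_{91} & 0_{d \times 1} \\ 0_{1 \times d} & 1 \end{pmatrix} \in \mathbb{R}^{(d+1) \times (d+1)}, \quad B_n^{+} \equiv \begin{pmatrix} B_n & 0_{d \times 1} \\ 0_{1 \times d} & 1 \end{pmatrix} \in \mathbb{R}^{(d+1) \times (d+1)}, \tag{A.110}
\end{align}
where $h_{91}$ is as defined in (A.62). Also, let
\begin{align*}
&B_n^{+} = [B_{n,q}^{+}, B_{n,d+1-q}^{+}], \quad B_{n,q}^{+} \in \mathbb{R}^{(d+1) \times q}, \quad B_{n,d+1-q}^{+} \in \mathbb{R}^{(d+1) \times (d+1-q)}, \\
&J_n^{+} \equiv [G_n, 0_{k \times 1}] \in \mathbb{R}^{k \times (d+1)}, \quad \Upsilon_n^{+} \equiv [\Upsilon_n, 0_{k \times 1}] \in \mathbb{R}^{k \times (d+1)}, \\
&S_n^{+} \equiv \mathrm{diag}((n^{1/2}\tau_{1n})^{-1}, \dots, (n^{1/2}\tau_{qn})^{-1}, 1, \dots, 1) = \begin{pmatrix} S_n & 0_{d \times 1} \\ 0_{1 \times d} & 1 \end{pmatrix} \in \mathbb{R}^{(d+1) \times (d+1)}.
\end{align*}
Let $\kappa_1^{+} \ge \cdots \ge \kappa_{d+1}^{+} \ge 0$ denote the eigenvalues of
\[
nU^{+\prime}J^{+\prime}H'HJ^{+}U^{+}. \tag{A.111}
\]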
Next we state a result analogous to Theorem 16.6 of Andrews and Guggenberger (2019b).
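Theorem A.7 (Asymptotic distribution of the PCLR statistic). Under all sequences $\{\lambda_{n,h} : n \geq 1\}$ with $\lambda_{n,h} \in \Lambda$,
\[ \mathrm{PCLR} \xrightarrow{d^{\pi}} \mathrm{PCLR}_{\infty} \equiv Z^{\pi\prime} Z^{\pi} - \lambda_{\min}\left[(\Delta_{h,d-q}, Z^{\pi})' h_{3,k-q} h_{3,k-q}' (\Delta_{h,d-q}, Z^{\pi})\right] \text{ in } P\text{-probability}, \]
where $Z^{\pi} \equiv (\sigma^2 Q_{zz})^{-1/2} m^{\pi}_h \sim N[0, I_k]$, and the convergence holds jointly with the convergence in Lemma A.6. If $q = d$, $\mathrm{PCLR}_{\infty} = Z^{\pi\prime} h_{3,d} h_{3,d}' Z^{\pi} \sim \chi^2_d$. Under all subsequences $\{w_n\}$ and all sequences $\{\lambda_{w_n,h} : n \geq 1\}$ with $\lambda_{w_n,h} \in \Lambda$, the above results hold with $n$ replaced with $w_n$.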
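The limiting functional in Theorem A.7 can be evaluated numerically. The following is a minimal sketch, not part of the paper: the function name `pclr_limit`, the random orthogonal matrix standing in for $h_3$, and the dimensions are illustrative assumptions. It checks that, when $q = d$ (so $\Delta_{h,d-q}$ has no columns), the general expression collapses to $Z^{\pi\prime} h_{3,d} h_{3,d}' Z^{\pi}$.

```python
import numpy as np

def pclr_limit(Z, Delta, h3, q):
    """Z'Z - lambda_min[(Delta, Z)' h3_{k-q} h3_{k-q}' (Delta, Z)] (illustrative helper)."""
    h3_kq = h3[:, q:]                      # last k-q columns of h3
    M = np.column_stack([Delta, Z])        # k x (d-q+1) matrix (Delta, Z)
    inner = h3_kq.T @ M                    # (k-q) x (d-q+1)
    lam_min = np.linalg.eigvalsh(inner.T @ inner)[0]  # eigvalsh sorts ascending
    return float(Z @ Z - lam_min)

rng = np.random.default_rng(0)
k, d = 5, 2
h3, _ = np.linalg.qr(rng.standard_normal((k, k)))  # assumed stand-in for the orthogonal h3
Z = rng.standard_normal(k)

# Case q = d: Delta_{h,d-q} is empty, so the bracket is the scalar Z' h3_{k-d} h3_{k-d}' Z.
val = pclr_limit(Z, np.empty((k, 0)), h3, q=d)
chk = float(Z @ h3[:, :d] @ h3[:, :d].T @ Z)
```

The agreement of `val` and `chk` is just the identity $h_3 h_3' = I_k$ applied to the quadratic form.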
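Proof of Theorem A.7. By Lemma A.6, $n^{1/2} m^{\pi} \xrightarrow{d^{\pi}} m^{\pi}_h \sim N[0_{k\times 1}, \sigma^2 Q_{zz}]$ in $P$-probability and by (A.89) $\Sigma^{\pi} \xrightarrow{p^{\pi}} \sigma^2 Q_{zz}$ in $P$-probability. Thus, by Slutsky's lemma
\[ \mathrm{PAR}_2 \xrightarrow{d^{\pi}} m^{\pi\prime}_h (\sigma^2 Q_{zz})^{-1} m^{\pi}_h = Z^{\pi\prime} Z^{\pi} \sim \chi^2_k \text{ in } P\text{-probability}. \tag{A.112} \]
By definition, $H J^{+} U^{+} = [HJU,\; (\Sigma^{\pi})^{-1/2} m^{\pi}]$. As defined in (A.111), the eigenvalues of $n U^{+\prime} J^{+\prime} H' H J^{+} U^{+}$ are $\kappa_1^{+}, \dots, \kappa_{d+1}^{+}$. Then,
\[ \kappa_{d+1}^{+} = \lambda_{\min}[n (HJU, (\Sigma^{\pi})^{-1/2} m^{\pi})' (HJU, (\Sigma^{\pi})^{-1/2} m^{\pi})] = \lambda_{\min}[n U^{+\prime} J^{+\prime} H' H J^{+} U^{+}]. \]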
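Now premultiply the determinantal equation
\[ |n U^{+\prime} J^{+\prime} H' H J^{+} U^{+} - \kappa I_{d+1}| = 0 \]
by $|S_n^{+\prime} B_n^{+\prime} U_n^{+\prime} (U^{+})^{-1\prime}|$ and postmultiply by $|(U^{+})^{-1} U_n^{+} B_n^{+} S_n^{+}|$ to obtain
\[ |Q^{+}(\kappa)| = 0, \quad Q^{+}(\kappa) \equiv n S_n^{+\prime} B_n^{+\prime} U_n^{+\prime} J^{+\prime} H' H J^{+} U_n^{+} B_n^{+} S_n^{+} - \kappa S_n^{+\prime} B_n^{+\prime} U_n^{+\prime} (U^{+})^{-1\prime} (U^{+})^{-1} U_n^{+} B_n^{+} S_n^{+}. \tag{A.113} \]
Since $|S_n^{+}|, |B_n^{+}|, |U_n^{+}| > 0$ and $|U^{+}| > 0$ with $P^{\pi}$-probability approaching 1 (wppa 1), the eigenvalues $\{\kappa_j^{+} : 1 \leq j \leq d+1\}$ above also solve $|Q^{+}(\kappa)| = 0$ wppa 1.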
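The step above relies on the fact that pre- and postmultiplying a determinantal equation by nonsingular matrices does not change its roots. A small numerical sketch of this fact follows; the matrices `M` and `C` are arbitrary stand-ins (assumptions for illustration) for $n U^{+\prime} J^{+\prime} H' H J^{+} U^{+}$ and $(U^{+})^{-1} U_n^{+} B_n^{+} S_n^{+}$, respectively.

```python
import numpy as np

rng = np.random.default_rng(1)
m = 4
A = rng.standard_normal((6, m))
M = A.T @ A                                        # symmetric psd stand-in matrix
C = np.eye(m) + 0.2 * rng.standard_normal((m, m))  # nonsingular stand-in transformation

# Roots of |M - kappa I| = 0.
kappa = np.sort(np.linalg.eigvalsh(M))

# Roots of |C' M C - kappa C' C| = 0, i.e. eigenvalues of (C'C)^{-1} C' M C = C^{-1} M C.
roots = np.sort(np.linalg.eigvals(np.linalg.solve(C.T @ C, C.T @ M @ C)).real)
```

Because $(C'C)^{-1} C' M C = C^{-1} M C$ is similar to $M$, `roots` and `kappa` coincide up to rounding error.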
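Letting $S_{n,q} \equiv \mathrm{diag}((n^{1/2}\tau_{1n})^{-1}, \dots, (n^{1/2}\tau_{qn})^{-1}) = (n^{1/2}\Upsilon_{n,q})^{-1} \in \mathbb{R}^{q\times q}$,
\[ B_n^{+} S_n^{+} = [B_{n,q}^{+} S_{n,q},\; B_{n,d+1-q}^{+}]. \tag{A.114} \]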
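Thus, $H J^{+} U_n^{+} B_n^{+} S_n^{+} = [H J^{+} U_n^{+} B_{n,q}^{+} S_{n,q},\; H J^{+} U_n^{+} B_{n,d+1-q}^{+}]$, $n^{1/2} H_n J^{+} U_n^{+} B_{n,q}^{+} S_{n,q} = n^{1/2} H_n J U_n B_{n,q} S_{n,q}$, and by the SVD, $H_n J_n U_n = A_n \Upsilon_n B_n'$, where $\Upsilon_n = [\mathrm{diag}(\tau_{1n}, \dots, \tau_{dn}), 0_{d\times(k-d)}]'$. Since $A_n = [A_{n,q}, A_{n,k-q}]$, $A_{n,q} \in \mathbb{R}^{k\times q}$, $A_{n,k-q} \in \mathbb{R}^{k\times(k-q)}$,
\[
\begin{aligned}
n^{1/2} H J^{+} U_n^{+} B_{n,q}^{+} S_{n,q}
&= n^{1/2} H H_n^{-1} H_n J U_n B_{n,q} S_{n,q} \\
&= n^{1/2} H H_n^{-1} H_n G_n U_n B_{n,q} S_{n,q} + n^{1/2} H H_n^{-1} H_n (J - G_n) U_n B_{n,q} S_{n,q} \\
&= H H_n^{-1} A_n \Upsilon_n B_n' B_n [I_q, 0_{q\times(d-q)}]' \Upsilon_{n,q}^{-1} + n^{1/2} H H_n^{-1} H_n (J - G_n) U_n B_{n,q} (n^{1/2} \Upsilon_{n,q})^{-1} \\
&= A_n [I_q, 0_{q\times(d-q)}]' + o_{p^{\pi}}(1) \\
&= A_{n,q} + o_{p^{\pi}}(1) \\
&\xrightarrow{p^{\pi}} h_{3,q} = \Delta_{h,q} \text{ in } P\text{-probability},
\end{aligned} \tag{A.115}
\]
where the third equality holds by the definitions of $S_{n,q}$, $\Upsilon_n$ and $\Upsilon_{n,q}$, the fourth equality holds by $B_n' B_n = I_d$, $J - G_n = O_p(n^{-1/2})$ (shown in (A.60)), $H H_n^{-1} \xrightarrow{p} I_k$ which in turn holds by (A.40), $H_n \to \Sigma^{-1/2}$ and the CMT, $\|H_n\| < \infty$, $U_n = O(1)$ by definition, $B_n = O(1)$ by the orthogonality of $B_n$, and $n^{1/2}\tau_{jn} \to \infty$ for all $j \leq q$, and the convergence uses the definition of $\Delta_{h,q}$ and Lemma A.3. Using (A.60), (A.98), $B_{n,d-q} \to h_{2,d-q}$, $H_n \to h_{5,m}^{-1/2}$ and $U_n \to h_{91}$,
\[
n^{1/2} H_n J U_n B_{n,d-q} = n^{1/2} H_n G_n U_n B_{n,d-q} + n^{1/2} H_n (J - G_n) U_n B_{n,d-q}
\xrightarrow{d} h_3 h_{1,d-q} + h_{5,m}^{-1/2} J_h h_{91} h_{2,d-q} = \Delta_{h,d-q}. \tag{A.116}
\]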
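Then, under $H_0 : \theta = \theta_0$, using (A.116) and an argument analogous to that in (A.65),
\[
\begin{aligned}
n^{1/2} H_n J^{+} U_n^{+} B_{n,d+1-q}^{+}
&= [n^{1/2} H_n J U_n,\; n^{1/2} H_n H^{-1} (\Sigma^{\pi})^{-1/2} m^{\pi}] B_{n,d+1-q}^{+} \\
&= [n^{1/2} H_n J U_n B_{n,d-q},\; n^{1/2} H_n H^{-1} (\Sigma^{\pi})^{-1/2} m^{\pi}] \\
&\xrightarrow{d^{\pi}} [\Delta_{h,d-q}, Z^{\pi}] \text{ in } P\text{-probability},
\end{aligned} \tag{A.117}
\]
where the convergence holds by Slutsky's lemma, $n^{1/2} (\Sigma^{\pi})^{-1/2} m^{\pi} \xrightarrow{d^{\pi}} Z^{\pi}$ in $P$-probability, $H H_n^{-1} \xrightarrow{p^{\pi}} I_k$ in $P$-probability and (A.116). Define
\[
E^{+} = \begin{bmatrix} E_1^{+} & E_2^{+} \\ E_2^{+\prime} & E_3^{+} \end{bmatrix}
\equiv B_n^{+\prime} U_n^{+\prime} (U^{+})^{-1\prime} (U^{+})^{-1} U_n^{+} B_n^{+} - I_{d+1}, \tag{A.118}
\]
where $E_1^{+} \in \mathbb{R}^{q\times q}$, $E_2^{+} \in \mathbb{R}^{q\times(d+1-q)}$ and $E_3^{+} \in \mathbb{R}^{(d+1-q)\times(d+1-q)}$. By the CMT, (A.41) and (A.54), $U - U_n \xrightarrow{p} 0$ with $U_n$ positive definite, hence $(U^{+})^{-1} U_n^{+} \xrightarrow{p} I_{d+1}$. Using the CMT again in conjunction with $B_n^{+\prime} B_n^{+} = I_{d+1}$, we have $E^{+} \xrightarrow{p} 0_{(d+1)\times(d+1)}$ and by Lemma A.3
\[ E^{+} \xrightarrow{p^{\pi}} 0_{(d+1)\times(d+1)} \text{ in } P\text{-probability}. \tag{A.119} \]
Using (A.115), (A.117), (A.118) and the fact that $\Delta_{h,q}' \Delta_{h,q} = h_{3,q}' h_{3,q} = \lim_{n\to\infty} A_{n,q}' A_{n,q} = I_q$ in (A.113)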
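yields
\[
Q^{+}(\kappa) =
\begin{bmatrix}
n S_{n,q}' B_{n,q}^{+\prime} U_n^{+\prime} J^{+\prime} H' H J^{+} U_n^{+} B_{n,q}^{+} S_{n,q} & n S_{n,q}' B_{n,q}^{+\prime} U_n^{+\prime} J^{+\prime} H' H J^{+} U_n^{+} B_{n,d+1-q}^{+} \\
n B_{n,d+1-q}^{+\prime} U_n^{+\prime} J^{+\prime} H' H J^{+} U_n^{+} B_{n,q}^{+} S_{n,q} & n B_{n,d+1-q}^{+\prime} U_n^{+\prime} J^{+\prime} H' H J^{+} U_n^{+} B_{n,d+1-q}^{+}
\end{bmatrix}
- \kappa S_n^{+\prime} (E^{+} + I_{d+1}) S_n^{+}
\]
\[
=
\begin{bmatrix}
I_q + o_{p^{\pi}}(1) - \kappa S_{n,q} E_1^{+} S_{n,q} - \kappa S_{n,q}^2 & n^{1/2} A_{n,q}' H_n J^{+} U_n^{+} B_{n,d+1-q}^{+} + o_{p^{\pi}}(1) - \kappa S_{n,q} E_2^{+} \\
n^{1/2} B_{n,d+1-q}^{+\prime} U_n^{+\prime} J^{+\prime} H_n A_{n,q} + o_{p^{\pi}}(1) - \kappa E_2^{+\prime} S_{n,q} & n B_{n,d+1-q}^{+\prime} U_n^{+\prime} J^{+\prime} H_n' H_n J^{+} U_n^{+} B_{n,d+1-q}^{+} - \kappa E_3^{+} - \kappa I_{d+1-q}
\end{bmatrix}.
\]
If $q = 0$, $B_n^{+} = B_{n,d+1-q}^{+}$ and
\[
\begin{aligned}
n B_n^{+\prime} U^{+\prime} J^{+\prime} H' H J^{+} U^{+} B_n^{+}
&= n B_n^{+\prime} ((U_n^{+})^{-1} U^{+})' (B_n^{+})^{-1\prime} B_n^{+\prime} U_n^{+\prime} J^{+\prime} H_n' (H H_n^{-1})' \\
&\quad \times (H H_n^{-1}) (H_n J^{+} U_n^{+} B_n^{+}) (B_n^{+})^{-1} ((U_n^{+})^{-1} U^{+}) B_n^{+} \\
&\xrightarrow{d^{\pi}} (\Delta_{h,d-q}, (\sigma^2 Q_{zz})^{-1/2} m^{\pi}_h)' (\Delta_{h,d-q}, (\sigma^2 Q_{zz})^{-1/2} m^{\pi}_h),
\end{aligned}
\]
where the convergence uses (A.117), $(U_n^{+})^{-1} U^{+} \xrightarrow{p^{\pi}} I_{d+1}$ and $H H_n^{-1} \xrightarrow{p^{\pi}} I_k$ in $P$-probability. Using the CMT in conjunction with the fact that the smallest eigenvalue of a matrix is a continuous function of the matrix, and $h_{3,k-q} h_{3,k-q}' = h_3' h_3 = I_k$, we obtain the result stated in the theorem.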
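Next, we consider the case $q \geq 1$. Using the formula for the determinant of a partitioned matrix and provided $Q_1^{+}(\kappa)$ is nonsingular, $|Q^{+}(\kappa)| = |Q_1^{+}(\kappa)| |Q_2^{+}(\kappa)|$, where
\[
\begin{aligned}
Q_1^{+}(\kappa) &\equiv I_q + o_{p^{\pi}}(1) - \kappa S_{n,q} E_1^{+} S_{n,q} - \kappa S_{n,q}^2, \\
Q_2^{+}(\kappa) &\equiv n B_{n,d+1-q}^{+\prime} U_n^{+\prime} J^{+\prime} H_n' H_n J^{+} U_n^{+} B_{n,d+1-q}^{+} - \kappa E_3^{+} - \kappa I_{d+1-q} + o_{p^{\pi}}(1) \\
&\quad - [n^{1/2} B_{n,d+1-q}^{+\prime} U_n^{+\prime} J^{+\prime} H_n A_{n,q} + o_{p^{\pi}}(1) - \kappa E_2^{+\prime} S_{n,q}] (I_q + o_{p^{\pi}}(1) - \kappa S_{n,q} E_1^{+} S_{n,q} - \kappa S_{n,q}^2)^{-1} \\
&\quad \times [n^{1/2} A_{n,q}' H_n J^{+} U_n^{+} B_{n,d+1-q}^{+} + o_{p^{\pi}}(1) - \kappa S_{n,q} E_2^{+}].
\end{aligned} \tag{A.120}
\]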
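The factorization $|Q^{+}(\kappa)| = |Q_1^{+}(\kappa)||Q_2^{+}(\kappa)|$ is the Schur complement formula for the determinant of a partitioned matrix, with $Q_2^{+}$ playing the role of the Schur complement of the upper-left block. A minimal numerical sketch of that identity, on an arbitrary positive definite test matrix chosen here purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
q, r = 2, 3                                  # block sizes (illustrative)
A = rng.standard_normal((q + r, q + r))
Q = A @ A.T + np.eye(q + r)                  # symmetric positive definite test matrix

Q11, Q12 = Q[:q, :q], Q[:q, q:]
Q21, Q22 = Q[q:, :q], Q[q:, q:]

# Schur complement of the (nonsingular) upper-left block Q11.
schur = Q22 - Q21 @ np.linalg.solve(Q11, Q12)

lhs = np.linalg.det(Q)
rhs = np.linalg.det(Q11) * np.linalg.det(schur)
```

Here `lhs` and `rhs` agree up to floating-point error, which mirrors why roots of $|Q^{+}(\kappa)| = 0$ at which $Q_1^{+}$ is nonsingular must be roots of $|Q_2^{+}(\kappa)| = 0$.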
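Moreover, by Lemma A.8 and $E_1^{+} = o_{p^{\pi}}(1)$,
\[ \kappa_j^{+} S_{n,q}^2 = o_{p^{\pi}}(1), \quad \kappa_j^{+} S_{n,q} E_1^{+} S_{n,q} = o_{p^{\pi}}(1), \quad j = q+1, \dots, d+1, \tag{A.121} \]
in $P$-probability. Thus,
\[ Q_1^{+}(\kappa_j^{+}) = I_q + o_{p^{\pi}}(1) - \kappa_j^{+} S_{n,q} E_1^{+} S_{n,q} - \kappa_j^{+} S_{n,q}^2 = I_q + o_{p^{\pi}}(1), \]
in $P$-probability, and so
\[ |Q_1^{+}(\kappa_j^{+})| \neq 0, \quad j = q+1, \dots, d+1, \text{ wppa 1 in } P\text{-probability}. \tag{A.122} \]
Furthermore,
\[ E_3^{+} = o_{p^{\pi}}(1), \tag{A.123} \]
\[ n^{1/2} H_n J^{+} U_n^{+} B_{n,d+1-q}^{+} = O_{p^{\pi}}(1), \tag{A.124} \]
\[ S_{n,q} = o(1), \tag{A.125} \]
\[ E_2^{+} = o_{p^{\pi}}(1), \tag{A.126} \]
\[ \kappa_j^{+} E_2^{+\prime} S_{n,q} (I_q + o_{p^{\pi}}(1)) S_{n,q} E_2^{+} = \kappa_j^{+} E_2^{+\prime} S_{n,q}^2 E_2^{+} + \kappa_{d+1}^{+} E_2^{+\prime} S_{n,q}^2 E_2^{+} o_{p^{\pi}}(1) = o_{p^{\pi}}(1), \tag{A.127} \]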
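where (A.123) and (A.126) follow from (A.119), (A.124) holds by (A.117), and (A.127) uses (A.121) and (A.126). Therefore, using the results in the previous display
\[
\begin{aligned}
Q_2^{+}(\kappa_j^{+})
&= n B_{n,d+1-q}^{+\prime} U_n^{+\prime} J^{+\prime} H_n' H_n J^{+} U_n^{+} B_{n,d+1-q}^{+} + o_{p^{\pi}}(1) \\
&\quad - [n^{1/2} B_{n,d+1-q}^{+\prime} U_n^{+\prime} J^{+\prime} H_n A_{n,q} + o_{p^{\pi}}(1)] (I_q + o_{p^{\pi}}(1)) [n^{1/2} A_{n,q}' H_n J^{+} U_n^{+} B_{n,d+1-q}^{+} + o_{p^{\pi}}(1)] \\
&\quad - \kappa_j^{+} \Big[ I_{d+1-q} + E_3^{+} - (n^{1/2} B_{n,d+1-q}^{+\prime} U_n^{+\prime} J^{+\prime} H_n A_{n,q} + o_{p^{\pi}}(1)) (I_q + o_{p^{\pi}}(1)) S_{n,q} E_2^{+} \\
&\qquad\quad - E_2^{+\prime} S_{n,q}' (I_q + o_{p^{\pi}}(1)) (n^{1/2} A_{n,q}' H_n J^{+} U_n^{+} B_{n,d+1-q}^{+} + o_{p^{\pi}}(1)) + \kappa_j^{+} E_2^{+\prime} S_{n,q} (I_q + o_{p^{\pi}}(1)) S_{n,q} E_2^{+} \Big] \\
&= n B_{n,d+1-q}^{+\prime} U_n^{+\prime} J^{+\prime} H_n' H_n J^{+} U_n^{+} B_{n,d+1-q}^{+} + o_{p^{\pi}}(1) \\
&\quad - [n^{1/2} B_{n,d+1-q}^{+\prime} U_n^{+\prime} J^{+\prime} H_n A_{n,q} + o_{p^{\pi}}(1)] (I_q + o_{p^{\pi}}(1)) [n^{1/2} A_{n,q}' H_n J^{+} U_n^{+} B_{n,d+1-q}^{+} + o_{p^{\pi}}(1)] \\
&\quad - \kappa_j^{+} (I_{d+1-q} + o_{p^{\pi}}(1)) \\
&= M_{d+1-q}^{+} - \kappa_j^{+} (I_{d+1-q} + o_{p^{\pi}}(1)),
\end{aligned} \tag{A.128}
\]
where
\[ M_{d+1-q}^{+} \equiv n B_{n,d+1-q}^{+\prime} U_n^{+\prime} J^{+\prime} H_n' h_{3,k-q} h_{3,k-q}' H_n J^{+} U_n^{+} B_{n,d+1-q}^{+} + o_{p^{\pi}}(1). \tag{A.129} \]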
From (A.113), (A.122) and $|Q^{+}(\kappa)| = |Q_1^{+}(\kappa)||Q_2^{+}(\kappa)|$, $|Q_2^{+}(\kappa_j^{+})| = |M_{d+1-q}^{+} - \kappa_j^{+}(I_{d+1-q} + o_{p^{\pi}}(1))| = 0$, $j = q+1, \dots, d+1$, wppa 1 in $P$-probability. Therefore, $\{\kappa_j^{+} : j = q+1, \dots, d+1\}$ are the $d+1-q$ eigenvalues of $(I_{d+1-q} + o_{p^{\pi}}(1))^{-1/2} M_{d+1-q}^{+} (I_{d+1-q} + o_{p^{\pi}}(1))^{-1/2}$ wppa 1 in $P$-probability. Using (A.117) in (A.129),
\[ (I_{d+1-q} + o_{p^{\pi}}(1))^{-1/2} M_{d+1-q}^{+} (I_{d+1-q} + o_{p^{\pi}}(1))^{-1/2} \xrightarrow{d^{\pi}} (\Delta_{h,d-q}, Z^{\pi})' (\Delta_{h,d-q}, Z^{\pi}) \text{ in } P\text{-probability}. \tag{A.130} \]
Since the vector of eigenvalues of a matrix is a continuous function of the matrix elements, the CMT yields the convergence in $P^{\pi}$-distribution of $\{\kappa_j^{+} : j = q+1, \dots, d+1\}$ to the vector of eigenvalues of $(\Delta_{h,d-q}, Z^{\pi})'(\Delta_{h,d-q}, Z^{\pi})$ in $P$-probability, and so
\[ \lambda_{\min}[n U^{+\prime} J^{+\prime} H' H J^{+} U^{+}] \xrightarrow{d^{\pi}} \lambda_{\min}[(\Delta_{h,d-q}, Z^{\pi})' (\Delta_{h,d-q}, Z^{\pi})] \text{ in } P\text{-probability}. \tag{A.131} \]
Combining (A.131) with (A.112) completes the proof of the asymptotic distribution of the PCLR statistic.
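When $q = d$, $U_n^{+} B_{n,d+1-q}^{+} = [0_{1\times d}, 1]'$ and
\[ n^{1/2} H_n J^{+} U_n^{+} B_{n,d+1-q}^{+} = n^{1/2} H_n [J,\; H^{-1} (\Sigma^{\pi})^{-1/2} m^{\pi}] [0_{1\times d}, 1]' = n^{1/2} H_n H^{-1} (\Sigma^{\pi})^{-1/2} m^{\pi} \xrightarrow{d^{\pi}} Z^{\pi} \tag{A.132} \]
in $P$-probability. Therefore, using (A.132) in (A.129), (A.130) and (A.131) gives
\[ \lambda_{\min}[n U^{+\prime} J^{+\prime} H' H J^{+} U^{+}] \xrightarrow{d^{\pi}} Z^{\pi\prime} h_{3,k-q} h_{3,k-q}' Z^{\pi} \sim \chi^2_{k-d}. \]
Since $h_3 h_3' = I_k$, $\mathrm{PCLR}_{\infty} = Z^{\pi\prime} Z^{\pi} - Z^{\pi\prime} h_{3,k-q} h_{3,k-q}' Z^{\pi} = Z^{\pi\prime} h_{3,d} h_{3,d}' Z^{\pi} \sim \chi^2_d$. The proof for subsequences follows similarly. Also, the convergence holds jointly with Lemma A.6 because we only used the convergence of the permutation quantity $S^{\pi} \equiv n^{1/2} (\Sigma^{\pi})^{-1/2} m^{\pi}$ conditional on the convergence of other quantities computed from the observed data, e.g. $n^{1/2}(J - G_n)$, as shown in Lemma A.6.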
The following lemma is a permutation analog of Lemma 26.1 of Andrews and Guggenberger (2019b).
Lemma A.8 (Rates of convergence of sample eigenvalues). Under all sequences $\{\lambda_{n,h} : n \geq 1\}$ with $\lambda_{n,h} \in \Lambda$ and $q \geq 1$,
(a) $\kappa_j^{+} \xrightarrow{p^{\pi}} \infty$ in $P$-probability for all $j \leq q$,
(b) $\kappa_j^{+} = o_{p^{\pi}}((n^{1/2}\tau_{ln})^2)$ in $P$-probability for all $l \leq q$ and $j = q+1, \dots, d+1$.
Under all subsequences $\{w_n\}$ and all sequences $\{\lambda_{w_n,h} : n \geq 1\}$ with $\lambda_{w_n,h} \in \Lambda$, the same results hold with $n$ replaced with $w_n$.
The proof of Lemma A.8 is essentially the same as the proofs of Lemma 26.1 of Andrews and Guggenberger (2019b) and Lemma 17.1 of Andrews and Guggenberger (2017b) with only minor (mostly notational) modifications. For completeness, we reproduce the arguments but defer the proof to Section A.8.
A.6 Asymptotic similarity
Proof of Theorem 2.2. The (uniform) asymptotic similarity of the permutation tests is established using
Proposition 16.3 of Andrews and Guggenberger (2019b).
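PAR1 test: Let $\tilde{W} = [\tilde{W}_1, \dots, \tilde{W}_n]' \equiv M_{\iota} W$. Since $\iota$ is a column of $X$, $M_{\iota} M_X = M_X$. Write
\[
\begin{aligned}
W_{\pi}' M_X \Sigma_u M_X W_{\pi} &= W_{\pi}' M_{\iota} \Sigma_u M_{\iota} W_{\pi} - W_{\pi}' M_{\iota} X (X'X)^{-1} X' \Sigma_u M_{\iota} W_{\pi} - W_{\pi}' M_{\iota} \Sigma_u X (X'X)^{-1} X' M_{\iota} W_{\pi} \\
&\quad + W_{\pi}' M_{\iota} X (X'X)^{-1} X' \Sigma_u X (X'X)^{-1} X' M_{\iota} W_{\pi}.
\end{aligned} \tag{A.133}
\]
Consider the first term $n^{-1} W_{\pi}' M_{\iota} \Sigma_u M_{\iota} W_{\pi} = n^{-1} \tilde{W}_{\pi}' \Sigma_u \tilde{W}_{\pi} = n^{-1} \sum_{i=1}^{n} \tilde{W}_{\pi(i)} \tilde{W}_{\pi(i)}' u_i(\theta_0)^2$ on the right-hand side of the expression above. Proceeding similarly to (A.89) and noting that $\mathrm{Var}_P[W_i] - (Q_{ww} - Q_w Q_w') \to 0$, where $Q_w$ denotes the column of $Q_{wx}$ corresponding to the element 1 of $X_i$ for $Q_{ww}$ and $Q_{wx}$ defined in (A.23), we obtain
\[ n^{-1} \sum_{i=1}^{n} \tilde{W}_{\pi(i)} \tilde{W}_{\pi(i)}' u_i(\theta_0)^2 - (Q_{ww} - Q_w Q_w') \sigma^2 \xrightarrow{p^{\pi}} 0 \text{ in } P\text{-probability}. \tag{A.134} \]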
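Next we determine the limit of $n^{-1} W_{\pi}' M_{\iota} X$. Letting $[\tilde{X}_1, \dots, \tilde{X}_n]' \equiv M_{\iota} X$, $\tilde{W}_i = [\tilde{W}_{i1}, \dots, \tilde{W}_{ik}]'$, $\tilde{W}_{ij} = W_{ij} - \bar{W}_j$, $\bar{W}_j \equiv n^{-1} \sum_{i=1}^{n} W_{ij}$, and $\tilde{X}_i = [\tilde{X}_{i1}, \dots, \tilde{X}_{ip}]'$, note that
\[ E_{P^{\pi}}[n^{-1} W_{\pi}' M_{\iota} X \mid y, Y, W, X] = \Big(n^{-1} \sum_{i=1}^{n} \tilde{W}_i\Big) \Big(n^{-1} \sum_{i=1}^{n} \tilde{X}_i\Big)' = 0, \]
\[ \mathrm{Var}_{P^{\pi}}\Big[n^{-1} \sum_{i=1}^{n} \tilde{W}_{\pi(i)j} \tilde{X}_{il} \,\Big|\, y, Y, W, X\Big] = \frac{1}{n-1} \Big(n^{-1} \sum_{i=1}^{n} \tilde{W}_{ij}^2\Big) \Big(n^{-1} \sum_{i=1}^{n} \tilde{X}_{il}^2\Big) \to 0, \quad j = 1, \dots, k, \; l = 1, \dots, p, \]
where the convergence uses $n^{-1} \sum_{i=1}^{n} \tilde{W}_{ij}^2 - (E_P[W_{ij}^2] - (E_P[W_{ij}])^2) \xrightarrow{a.s.} 0$ and $n^{-1} \sum_{i=1}^{n} \tilde{X}_{il}^2 - (E_P[X_{il}^2] - (E_P[X_{il}])^2) \xrightarrow{a.s.} 0$, which, in turn, hold by the triangular array SLLN and the CMT. By Chebyshev's inequality, we obtain
\[ n^{-1} W_{\pi}' M_{\iota} X \xrightarrow{p^{\pi}} 0 \text{ a.s.} \tag{A.135} \]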
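The permutation mean and variance used above are exact finite-sample identities for demeaned sequences, so they can be checked by brute force. The sketch below enumerates all $n!$ permutations for a small $n$; the data are arbitrary simulated draws and the variable names are illustrative assumptions, not quantities from the paper.

```python
import itertools
import numpy as np

rng = np.random.default_rng(3)
n = 5
w = rng.standard_normal(n)
x = rng.standard_normal(n)
w_c = w - w.mean()                 # demeaned instrument column (one j)
x_c = x - x.mean()                 # demeaned regressor column (one l)

# n^{-1} sum_i w_c[pi(i)] * x_c[i] for every permutation pi of {0, ..., n-1}.
stats = np.array([np.mean(w_c[list(p)] * x_c)
                  for p in itertools.permutations(range(n))])

perm_mean = stats.mean()           # exact permutation expectation
perm_var = stats.var()             # exact permutation variance (divides by n!)
formula = (1.0 / (n - 1)) * np.mean(w_c**2) * np.mean(x_c**2)
```

The enumeration gives `perm_mean` equal to zero and `perm_var` equal to `formula` up to rounding; the factor $1/(n-1)$ is what drives the variance to zero as $n$ grows.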
Consider the part n−1X ′ΣuMιWπ = n−1∑ni=1XiW
′π(i)ui(θ0)2 in the second term of (A.133). Using the
Cauchy-Schwarz inequality twice,
‖n−1X ′ΣuMιWπ‖ ≤
(n−1
n∑i=1
‖Xi‖4)1/4(
n−1n∑i=1
‖Wi‖4)1/4( n∑
i=1
ui(θ0)4
)1/2
= Op(1), (A.136)
where we used∑ni=1 ‖Xi‖4 = Op(1) and
∑ni=1 ‖Wi‖4 ≤
∑ni=1(8‖Wi‖4 + 8‖W‖4) = Op(1) which hold by the
triangular array WLLN, and (A.86). Furthermore, note that n−1∑ni=1XijXiluiX
′i − EP [XijXiluiX
′i]
p−→ 0
and n−1∑ni=1XijXilXiX
′i − EP [XijXilXiX
′i]
p−→ 0, j, l = 1, . . . , p, by the WLLN, EP [‖XijXiluiX′i‖1+ δ
4 ] ≤
(EP [‖Xi‖4+δ])1/2(EP [u4+δi ])1/4(EP [‖Xi‖4+δ])1/4 < ∞ by the Cauchy-Schwarz inequality, and E[‖Xi‖4+δ] <
∞. Therefore, we have
n−1X ′ΣuX − EP [XiX′iu
2i ] = n−1
n∑i=1
XiX′iui(θ0)2 − EP [XiX
′iu
2i ]
= n−1n∑i=1
XiX′i(ui(θ0)2 − 2uiX
′i(X
′X)−1X ′u+ u′X(X ′X)−1XiX′i(X
′X)−1X ′u)
− EP [XiX′iu
2i ]
p−→ 0, (A.137)
where the last line uses the WLLN, CMT, EP [‖XiX′iu
2i ‖1+δ/4] ≤ (EP [‖Xi‖4+δ])1/2(EP [u4+δ
i ])1/2, n−1X ′up−→
0 and (X ′X)−1 − (EP [XiX′i])−1 p−→ 0 which follow from the WLLN and the CMT. Using (A.134), (A.135),
(A.136), (A.137), and the CMT in (A.133), we obtain
n−1W ′πMXΣuMXWπ − (Qww −QwQ′w)σ2 p−→ 0 in P -probability. (A.138)
Next, we determine the limit of n−1/2W ′πu(θ0). By the cr inequality and the triangular array SLLN,
n−1n∑i=1
ui(θ0)2+δ − 21+δ EP [u2+δi ] ≤ 21+δ(n−1
n∑i=1
u2+δi − EP [u2+δ
i ] + n−1n∑i=1
‖Xi‖2+δ‖(X ′X)−1X ′u‖2+δ)
a.s.−→ 0, (A.139)
and for any t ∈ Rk
n−1n∑i=1
‖t′Wi‖2+δ ≤ ‖t‖2+δn−1n∑i=1
‖Wi‖2+δ ≤ ‖t‖2+δ21+δ
(n−1
n∑i=1
‖Wi‖2+δ + ‖n−1n∑i=1
Wi‖2+δ
)= Oa.s.(1),
(A.140)
D ≡ 1
n− 1
n∑i=1
n∑j=1
g2n(i, j) =
1
n(n− 1)
n∑i=1
n∑j=1
(t′Wiuj(θ0))2 =1
n− 1
n∑i=1
ui(θ0)2n−1n∑i=1
(t′Wi)2
= σ2t′VarP [Wi]t+ oa.s.(1). (A.141)
Note that by (A.138), (A.140) and (A.141), for any ε > 0
n−2n∑i=1
n∑j=1
(t′Wiui(θ0))2
D1
(n−1(t′Wiui(θ0))2
D> ε2
)
= n−2D−1n∑i=1
n∑j=1
|t′Wiui(θ0)|2+δ
|t′Wiui(θ0)|δ1(|t′Wiui(θ0)|δ > εδDδ/2nδ/2
)≤ n−δ/2ε−δD−1−δ/2n−2
n∑i=1
n∑j=1
|t′Wiui(θ0)|2+δ
≤ n−δ/2ε−δD−1−δ/2n−1n∑i=1
‖t′Wi‖2+δn−1n∑i=1
|ui(θ0)|2+δ
a.s.−→ 0. (A.142)
Thus,
limn→∞
n−2n∑i=1
n∑j=1
(t′Wiui(θ0))2
D1
(n−1(t′Wiui(θ0))2
D> ε2
)= 0. (A.143)
On noting that we may write n−1/2W ′πMX u(θ0) = n−1/2W ′πu(θ0) = n−1/2W ′uπ(θ0) for a permutation π
uniformly distributed over Gn, Lemma A.1 gives
n−1/2t′W ′πu(θ0)dπ−→ N [0, t′(Qww −QwQ′w)tσ2] P -almost surely. (A.144)
Using the Cramer-Wold device
n−1/2W ′πu(θ0)dπ−→ N [0, (Qww −QwQ′w)σ2] P -almost surely. (A.145)
Finally, from Slutsky's lemma, (A.138), (A.145) followed by Pólya's theorem (Theorem 11.2.9 of Lehmann and Romano (2005)), we obtain $\mathrm{PAR}_1 \xrightarrow{d} \mathrm{PAR}_{1,\infty} \sim \chi^2_k$ in $P$-probability, and
\[ \sup_{x \in \mathbb{R}} |F_n^{\mathrm{PAR}_1}(x) - P[\chi^2_k \leq x]| \xrightarrow{p} 0. \]
The above result holds for any subsequence wn of n. Let r1−α(χ2k) denote the 1 − α quantile of χ2
k
distribution. Since χ2k distribution function is continuous and strictly increasing at its 1 − α quantile, we
have PAR1(r)p−→ r1−α(χ2
k) by Lemma 11.2.1 of Lehmann and Romano (2005). By definition, EP [φPAR1n ] =
P [AR > PAR1(r)] + aPAR1P [AR = PAR1(r)], hence
P [AR > PAR1(r)] ≤ EP [φPAR1n ] ≤ P [AR ≥ PAR1(r)].
By Theorem 16.6 of Andrews and Guggenberger (2019b), ARd−→ AR∞ ∼ χ2
k. By Slutsky’s lemma (Corollary
11.2.3 of Lehmann and Romano (2005)), we have limn→∞ P [AR > PAR1(r)] = limn→∞ P [AR ≥ PAR1(r)] =
P [AR∞ > r1−α(χ2k)] because of the continuity of the χ2
k distribution function at its 1−α quantile. Therefore,
limn→∞
EP [φPAR1n ] = P [AR∞ > r1−α(χ2
k)] = α.
Thus, Assumption 3 is verified for the PAR1 statistic. By Proposition 16.3 of Andrews and Guggenberger
(2019b), we obtain the desired result.
PAR2 test: By (A.112), we have $\mathrm{PAR}_2 \xrightarrow{d^{\pi}} \chi^2_k$ in $P$-probability, and by Pólya's theorem
\[ \sup_{x \in \mathbb{R}} |F_n^{\mathrm{PAR}_2}(x) - P[\chi^2_k \leq x]| \xrightarrow{p} 0. \]
The remaining argument is analogous to that for the PAR1 test.
PLM test: Note first that because Σn and Tn are nonsingular
PLM = n mπ′Σπ−1/2PΣπ−1/2Jπ Σπ−1/2mπ
= n mπ′Σπ−1/2P(Σπ−1/2Σ
1/2n )n1/2Σ
−1/2n JπTn
Σπ−1/2mπ.
By Lemma A.6(d), n1/2Σ−1/2n JπTn
dπ−→ ∆πh = [∆π
h,q,∆πh,d−q] = [h3,q, h3h1,d−q+h
−1/25,m (Gh+Jπh )h2,d−q], where
h5,m = Σ. The only random term conditional on the data is h−1/25,m Jπhh2,d−q. We apply Corollary 16.2 of
Andrews and Guggenberger (2017b) with p = d, q∗ = q, ∆q∗ = h3,q ∈ Rk×q, ∆p−q∗ = ∆πh,d−q ∈ Rk×(d−q),
M = h′3 ∈ Rk×k, M1 = h′3,q ∈ Rq×k, M2 = h′3,k−q ∈ R(k−q)×k, ξ2 ∈ Rd−q and ∆ = ∆πh ∈ Rk×d. Note that for
ξ2 ∈ Rd−q with ‖ξ2‖ = 1,
VarPπ (h′3,k−qΣ−1/2Jπhh2,d−qξ2)
= VarPπ (vec(h′3,k−qΣ−1/2Jπhh2,d−qξ2))
= VarPπ(
((h2,d−qξ2)′ ⊗ (h′3,k−qΣ−1/2))(vec(Jπh ))
)=(
(h2,d−qξ2)′ ⊗ (h′3,k−qΣ−1/2)
)VarPπ (vec(Jπh ))
((h2,d−qξ2)⊗ (Σ−1/2h3,k−q)
)=(
(h2,d−qξ2)′ ⊗ (h′3,k−qΣ−1/2)
) ((ΣV − ΣV u(σ2)−1ΣuV )⊗Qzz
) ((h2,d−qξ2)⊗ (Σ−1/2h3,k−q)
),
= (ξ′2h′2,d−q(Σ
V − ΣV u(σ2)−1ΣuV )h2,d−qξ2)(h′3,k−qΣ−1/2QzzΣ
−1/2h3,k−q),
where the second equality uses the formula vec(ABC) = (C ′⊗A)vec(B) and the fourth equality uses Lemma
A.6(a). Observe that ξ′2h2,d−q(ΣV − ΣV u(σ2)−1ΣuV )h2,d−qξ2 is scalar, and
ξ′2h′2,d−q(Σ
V − ΣV u(σ2)−1ΣuV )h2,d−qξ2
ξ′2h′2,d−qh2,d−qξ2
≥ λmin(ΣV − ΣV u(σ2)−1ΣuV ) > 0.
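The variance calculation above uses two standard facts: $\mathrm{vec}(ABC) = (C' \otimes A)\mathrm{vec}(B)$ with column-stacking $\mathrm{vec}$, and the mixed-product rule for Kronecker products, which turns the sandwich $(x' \otimes A)(S \otimes Q)(x \otimes A')$ into the scalar $x'Sx$ times $AQA'$. A short numerical sketch with arbitrary matrices (all names and dimensions here are illustrative assumptions):

```python
import numpy as np

def vec(M):
    """Stack the columns of M into one vector (column-major order)."""
    return M.reshape(-1, order="F")

rng = np.random.default_rng(4)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 2))
C = rng.standard_normal((2, 5))

# vec(ABC) = (C' kron A) vec(B)
lhs_vec = vec(A @ B @ C)
rhs_vec = np.kron(C.T, A) @ vec(B)

# Mixed product: (x' kron A)(S kron Q)(x kron A') = (x' S x) * (A Q A')
x = rng.standard_normal((2, 1))
S0 = rng.standard_normal((2, 2)); S = S0 @ S0.T      # symmetric psd, 2 x 2
Q0 = rng.standard_normal((4, 4)); Qm = Q0 @ Q0.T     # symmetric psd, 4 x 4
lhs_kron = np.kron(x.T, A) @ np.kron(S, Qm) @ np.kron(x, A.T)
rhs_kron = (x.T @ S @ x).item() * (A @ Qm @ A.T)
```

In the proof, $x$ corresponds to $h_{2,d-q}\xi_2$, $A$ to $h_{3,k-q}'\Sigma^{-1/2}$, $S$ to $\Sigma^{V} - \Sigma^{Vu}(\sigma^2)^{-1}\Sigma^{uV}$ and $Q$ to $Q_{zz}$.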
Moreover, using h′3,k−qh3,k−q = Ik−q
λmin(h′3,k−qΣ−1/2QzzΣ
−1/2h3,k−q) = minη∈Rk−q\0
η′h′3,k−qΣ−1/2QzzΣ
−1/2h3,k−qη
η′η
= minη∈Rk−q\0
η′h′3,k−qΣ−1/2QzzΣ
−1/2h3,k−qη
η′h′3,k−qh3,k−qη
≥ mina∈Rk\0
a′Σ−1/2QzzΣ−1/2a
a′a
≥ λmin(Σ−1)λmin(Qzz)
> 0.
Therefore, rank(VarPπ (h′3,k−qΣ−1/2Jπhh2,d−qξ2)) = k − q ≥ d − q, and by Corollary 16.2 of Andrews and
Guggenberger (2017b), Pπ[rank(∆πh) = d] = 1 in P -probability. Since (Σπ)−1/2Σ1/2 = (σ2Qzz)
−1/2Σ1/2 is
nonsingular, Pπ[rank((Σπ)−1/2Σ1/2∆πh) = d] = 1 in P -probability. As shown in the proof of Theorem 11.1
of Andrews and Guggenberger (2017b), the mapping defined as f : Rk×d → Rk×k, f(D) = D(D′D)−1D′ is
continuous at D ∈ Rk×d that has full column rank with probability one. Then,
PLM = n mπ′Σπ−1/2P(Σπ−1/2Σ
1/2n )n1/2Σ
−1/2n JπTn
Σπ−1/2mπ
dπ−→ mπ′h Σπ−1/2P(Σπ)−1/2Σ1/2∆π
hΣπ−1/2mπ
h
∼ χ2d in P -probability,
where the convergence follows from the CMT, Lemma A.6 and the full column rank property established
above, and the last line holds because using the independence between ∆πh and mπ
h, the full column rank
of ∆πh and Σπ−1/2mπ
h ∼ N [0, Ik], the equality in distribution holds conditional on ∆πh, hence also holds
unconditionally. The null asymptotic distribution of the sample LM statistic follows from Theorem 4.1 of
Andrews and Guggenberger (2017a) because the parameter space assumptions in the equations (3.3), (3.9)
and (3.10) of Andrews and Guggenberger (2017a) are implied by Assumption 2. The asymptotic similarity
of the PLM test is similar to the PAR tests, thus is omitted.
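The $\chi^2_d$ limit of the PLM statistic rests on the projection $P_D = D(D'D)^{-1}D'$ being idempotent with rank $d$ whenever $D$ has full column rank, so that $Z'P_DZ \sim \chi^2_d$ for $Z \sim N[0, I_k]$ independent of $D$. The following Monte Carlo sketch illustrates this under assumed dimensions and an arbitrary fixed $D$; it is not the paper's simulation design.

```python
import numpy as np

def proj(D):
    """Orthogonal projection onto the column space of a full-column-rank D."""
    return D @ np.linalg.solve(D.T @ D, D.T)

rng = np.random.default_rng(5)
k, d = 6, 2
D = rng.standard_normal((k, d))        # full column rank with probability one
P = proj(D)

# Draw Z ~ N(0, I_k) independently of D and form the quadratic forms Z' P Z.
Z = rng.standard_normal((20000, k))
qf = np.einsum("ij,jk,ik->i", Z, P, Z)

idempotent = np.allclose(P @ P, P)
rank_via_trace = np.trace(P)           # equals d for a rank-d projection
mc_mean = qf.mean()                    # should be close to E[chi2_d] = d
```

Conditioning on $D$ and then integrating it out is the same device the proof uses to pass from the conditional to the unconditional $\chi^2_d$ law.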
PCLR tests: By Theorem A.7,
PCLRdπ−→ PCLR∞ = Zπ′Zπ − λmin
[(∆h,d−q,Zπ)′h3,k−qh
′3,k−q(∆h,d−q,Zπ)
]in P -probability.
Denote the SV’s of h′3,q∆h,d−q by τ(2)h = [τ(q+1)n, . . . , τdn]′. Then, rk,d,q(h
′3,k−q∆h,d−q, 1−α) = rk,d,q(τ
(2)h , 1−
α) and the distribution of PCLR∞ is continuous and strictly increasing at its 1− α quantile by Lemma A.4
and A.5 because conditional on the data (e.g. ω ∈ Ω1 with Ω1 defined in (A.63)), h′3,q∆h,d−q is nonrandom.
It follows from Lemma 11.2.1 of Lehmann and Romano (2005) (see also Theorem 15.2.3 therein) that
rk,d(T , 1− α)p−→ rk,d,q(h
′3,k−q∆d−q, 1− α). (A.146)
By Lemma 16.6 of Andrews and Guggenberger (2019b),
CLRd−→ CLR∞ ≡ Z ′Z − λmin
[(∆h,d−q,Z)′h3,k−qh
′3,k−q(∆h,d−q,Z)
], Z ∼ N [0k×1, Ik]. (A.147)
By the CMT, (A.146) and (A.147)
CLR− rk,d(T , 1− α)d−→ CLR∞ − rk,d,q(h′3,k−q∆h,d−q, 1− α). (A.148)
As shown in Andrews and Guggenberger (2019b) p.72,
P [CLR∞ = rk,d,q(h′3,k−q∆h,d−q, 1− α)] = 0. (A.149)
Then,
P [CLR > PCLR(r)]→ P [CLR∞ > rk,d,q(h′3,k−q∆h,d−q, 1− α)]
= E[P [CLR∞ > rk,d,q(h
′3,k−q∆h,d−q, 1− α)|∆h,d−q]
]= α,
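where the convergence follows from (A.148) and (A.149), the first equality holds by the law of total expectation, and the last equality holds because the conditional rejection probability is $\alpha$. (A.149) also implies that $\lim_{n\to\infty} P[\mathrm{CLR} \geq \mathrm{PCLR}_{(r)}] = \alpha$. Since $E_P[\phi_n^{\mathrm{PCLR}}] = P[\mathrm{CLR} > \mathrm{PCLR}_{(r)}] + a^{\mathrm{PCLR}} P[\mathrm{CLR} = \mathrm{PCLR}_{(r)}]$,
\[ P[\mathrm{CLR} > \mathrm{PCLR}_{(r)}] \leq E_P[\phi_n^{\mathrm{PCLR}}] \leq P[\mathrm{CLR} \geq \mathrm{PCLR}_{(r)}]. \]
By the sandwich rule,
\[ \lim_{n\to\infty} E_P[\phi_n^{\mathrm{PCLR}}] = \alpha. \]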
The above result holds for any subsequence wn of n. Thus, Assumption 3 holds for the PCLR statistic.
Using Proposition 16.3 of Andrews and Guggenberger (2019b), we obtain the desired result.
A.7 Local power under strong identification
Proof of Proposition 2.3. Under H1 : θn = θ0 + hθn−1/2,
ui(θ0) = (W ′iΓ + V ′i )hθn−1/2 −X ′i(X ′X)−1X ′(WΓ + V )hθn
−1/2 + ui −X ′i(X ′X)−1X ′u.
Note that ‖Z∗i ‖2‖Wi‖2, ‖Z∗i ‖2‖Xi‖2, ‖Z∗i ‖2‖Vi‖2, ‖Z∗i ‖2u2i , ‖Z∗i ‖2‖Wi‖‖Xi‖, ‖Z∗i ‖2‖Wi‖‖Vi‖, ‖Z∗i ‖2‖Wi‖|ui|,
‖Z∗i ‖2‖Xi‖‖Vi‖, ‖Z∗i ‖2‖Xi‖|ui|, ‖Z∗i ‖2‖Vi‖|ui| have finite 1+δ/2 moments by the Cauchy-Schwarz inequality.
The same moment bounds are also obtained when ‖Z∗i ‖2 in the previous quantities is replaced by ‖Xi‖2 and
‖Z∗i ‖‖Xi‖. By the triangular array WLLN, CMT and the fact that Dp−→ 0 in (A.35),
Σ = n−1n∑i=1
ZiZ′iui(θ0)2 = n−1
n∑i=1
(Z∗i +DXi)(Z∗i +DXi)
′
((W ′iΓ + V ′i )hθn−1/2 −X ′i(X ′X)−1X ′(WΓ + V )hθn
−1/2 + ui −X ′i(X ′X)−1X ′u)2
= EP [Z∗i Z∗′i u
2i ] + op(1) = Σn + op(1) = h5,m + op(1). (A.150)
Proceeding similarly to the argument in (A.150),
Cs = n−1n∑i=1
ZiZ′iYisui(θ0)
= n−1n∑i=1
(Z∗i +DXi)(Z∗i +DXi)
′(Z∗′i Γs + Vis +X ′i(D′Γs − (X ′X)−1X ′V, s))
((W ′iΓ + V ′i )hθn−1/2 −X ′i(X ′X)−1X ′(WΓ + V )hθn
−1/2 + ui −X ′i(X ′X)−1X ′u)
= n−1n∑i=1
Z∗i Z∗′i (Z∗′i Γs + Vis)ui + op(1) = EP [Z∗i Z
∗′i (Z∗′i Γs + Vis)ui] + op(1) = Cns + op(1). (A.151)
The limit in (A.38) remains unchanged under the local alternatives. Thus, from (A.150) and (A.151), we
have
Va(θ0) = Vn,a + op(1). (A.152)
Next, to determine the limit of Vb(θ0), note first that
εi(θ0)−
01×k
−Γ ′
Zi =
(W ′iΓ + V ′i )hθn−1/2 −X ′i(X ′X)−1X ′(WΓ + V )hθn
−1/2 + ui −X ′i(X ′X)−1X ′u
−Vi + V ′X(X ′X)−1Xi
and
εin(θ0)−
01×k
−Γ ′
Zi =
n−1/2h′θΓ ′Zi + n−1/2h′θV′Z(Z ′Z)−1Zi + u′Z(Z ′Z)−1Zi
−V ′Z(Z ′Z)−1Zi
.Hence,
εi(θ0)− εin(θ0) =
ai
0d×1
+
ui −X ′i(X ′X)−1X ′u− u′Z(Z ′Z)−1Zi
−Vi + V ′X(X ′X)−1Xi + V ′Z(Z ′Z)−1Zi
, (A.153)
where
ai ≡ (W ′iΓ +V ′i )hθn−1/2−X ′i(X ′X)−1X ′(WΓ +V )hθn
−1/2+n−1/2h′θΓ ′Zi+n−1/2h′θV
′Z(Z ′Z)−1Zi. (A.154)
Next we show that n−1∑ni=1 a
2iZiZ
′i
p−→ 0. For Di ∈ Wi, Xi, Vi and j + l ≤ 4, j ≥ 0, l ≥ 2, by the cr
inequality, and the WLLN,
n−1n∑i=1
‖Di‖j‖Zi‖l ≤ n−1n∑i=1
‖Di‖j2l−1(‖Wi‖l + ‖W ′X(X ′X)−1‖l‖Xi‖l) = Op(1). (A.155)
Therefore, using (A.155) and n−1X ′X = Op(1), n−1X ′W = Op(1), n−1Z ′Z = Op(1), n−1Z ′V = Op(1),
n−1X ′V = Op(1) which hold by the WLLN and the CMT,
‖n−1n∑i=1
a2iZiZ
′i‖ ≤ ‖hθ‖2n−1
(n−1
n∑i=1
‖W ′iΓ + V ′i ‖2‖Zi‖2 + ‖(X ′X)−1X ′(WΓV )‖2n−1n∑i=1
‖Xi‖2‖Zi‖2
+ ‖Γ‖2n−1n∑i=1
‖Zi‖3 + ‖V ′Z(Z ′Z)−1‖2n−1n∑i=1
‖Zi‖3
+ 2‖(X ′X)−1X ′(WΓ + V )‖n−1n∑i=1
‖Xi‖‖Zi‖2 + 2‖Γ‖n−1n∑i=1
‖W ′iΓ + V ′i ‖‖Zi‖3
+ 2‖V ′Z(Z ′Z)−1‖n−1n∑i=1
‖W ′iΓ + V ′i ‖‖Zi‖3
+ 2‖(X ′X)−1X ′(WΓ + V )‖‖Γ‖n−1n∑i=1
‖Xi‖‖Zi‖3
+ 2‖V ′Z(Z ′Z)−1‖‖(X ′X)−1X ′(WΓ + V )‖n−1n∑i=1
‖Xi‖‖Zi‖3
+ 2‖V ′Z(Z ′Z)−1‖‖Γ‖n−1n∑i=1
‖Zi‖4)
p−→ 0. (A.156)
Similarly, we have
n−1n∑i=1
ai(ui −X ′i(X ′X)−1X ′u− u′Z(Z ′Z)−1Zi)⊗ (ZiZ′i)
p−→ 0, (A.157)
n−1n∑i=1
ai(−Vi + V ′X(X ′X)−1Xi + V ′Z(Z ′Z)−1Zi)⊗ (ZiZ′i)
p−→ 0. (A.158)
As shown in (A.51)
n−1n∑i=1
ui −X ′i(X ′X)−1X ′u− u′Z(Z ′Z)−1Zi
−Vi + V ′X(X ′X)−1Xi + V ′Z(Z ′Z)−1Zi
ui −X ′i(X ′X)−1X ′u− u′Z(Z ′Z)−1Zi
−Vi + V ′X(X ′X)−1Xi + V ′Z(Z ′Z)−1Zi
′ ⊗ (ZiZ′i)
− EP
u2i −uiV ′i
−uiVi ViV′i
⊗ (Z∗i Z∗′i )
p−→ 0. (A.159)
By (A.153), (A.156), (A.157), (A.158), (A.159) and the CMT, we have
Vb(θ0)− Vn,bp−→ 0. (A.160)
AR and PAR statistics: Since, under H1 : θn = θ0 + hθn−1/2, y − Y θ0 = Y hθn
−1/2 +Xγ + u, using
n−1W ′MXWa.s.−→ Qzz and n−1W ′MXV
a.s.−→ 0, the convergence in Lemma 16.4 of Andrews and Guggenberger
(2019b) and Slutsky’s lemma we obtain
n1/2m(θ0) = n−1/2W ′MX(y − Y θ0)
= n−1/2W ′MX(Y hθn−1/2 +Xγ + u)
= n−1W ′MXWΓhθ + n−1W ′MXV hθ + n−1/2W ′MXu
d−→ N [Ghθ,Σ] , (A.161)
where $\Sigma = \lim_{n\to\infty} E_P[Z_i^{*} Z_i^{*\prime} u_i^2]$ and $G = \lim_{n\to\infty} E_P[Z_i^{*} Z_i^{*\prime}] \Gamma = h_4$. (A.150), (A.161) and the CMT together yield $\mathrm{AR} \xrightarrow{d} \chi^2_k(\eta^2)$.
Next we derive the asymptotic distribution of the permutation statistics. Since n−1W ′MXWa.s.−→ Qzz,
n−1V ′MXWa.s.−→ 0, n−1W ′MXu
a.s.−→ 0, n−1V ′MXVa.s.−→ ΣV , n−1u′MXV
a.s.−→ ΣuV and n−1u′MXua.s.−→ σ2,
by the SLLN and CMT
n−1u(θ0)′u(θ0) = n−1(MXWΓhθn
−1/2 +MXV hθn−1/2 +MXu
)′ (MXWΓhθn
−1/2 +MXV hθn−1/2 +MXu
)a.s.−→ σ2. (A.162)
By Jensen’s inequality, the WLLN, CMT, and (A.86)
n−1n∑i=1
ui(θ0)4 − EP [u4i ]
≤ 27n−1n∑i=1
‖W ′iΓ + V ′i ‖4‖hθ‖4n−2 + 27n−1n∑i=1
‖Xi‖4‖(X ′X)−1X ′(WΓ + V )‖4‖hθ‖4n−2
+ 27n−1n∑i=1
(ui −X ′i(X ′X)−1X ′u)4 − EP [u4i ]
p−→ 0, (A.163)
where for the first term we used n−1∑ni=1 ‖W ′iΓ + V ′i ‖4 ≤ 8n−1
∑ni=1(‖Wi‖4‖Γ‖4 + ‖Vi‖4) = Op(1) which
holds by the cr inequality and the WLLN. (A.134) holds by (A.162), (A.163) and an argument analogous to
the consistency of Σπ in (A.89). (A.136) holds by (A.163). (A.137) holds by an argument similar to (A.150).
Combining these and following an argument analogous to that in (A.138) then give
n−1W ′πMXΣuMXWπ − (Qww −QwQ′w)σ2 p−→ 0 in P -probability. (A.164)
Furthermore, the convergence in (A.89) holds here using (A.162) and (A.163) in (A.84) and (A.88), respec-
tively:
Σπpπ−→ σ2Qzz in P -probability. (A.165)
(A.141) holds by (A.162). A bound analogous to (A.139) is obtained by an argument similar to (A.163).
Thus, (A.142) and (A.143) hold. Therefore, by Lemma A.1
n−1/2W ′πMX(y − Y θ0) = n−1/2W ′Mιuπ(θ0)dπ−→ N
[0, (Qww −QwQ′w)σ2
]P -almost surely,
n1/2mπ(θ0) = n−1/2Z ′uπ(θ0)dπ−→ N
[0, Qzzσ
2]
P -almost surely. (A.166)
Finally, by (A.164), (A.165), (A.166) and Slutsky’s lemma, we obtain that PAR1dπ−→ χ2
k and PAR2dπ−→ χ2
k
in P -probability.
LM and PLM statistics: By definition, q = d if and only if n1/2τdn →∞. Decompose n1/2HnJUnTn =
n1/2HnGnUnTn + n1/2Hn(J −Gn)UnTn. By an analogous argument to that in (A.97) (see also Lemma 16.4
and the equations (25.5)-(25.6) of Andrews and Guggenberger (2019b)),
n1/2HnGnUnTnd−→ h3,q (A.167)
because the second term involving Bn,d−q in (A.97) drops out when q = d. The asymptotic distribution of
n1/2(G − Gn) remains invariant under H1 : θn = θ0 + hθn−1/2. This combined with (A.150), (A.151) and
(A.161) yields
n1/2(J −Gn) = Op(1). (A.168)
Therefore, n1/2Hn(J −Gn)UnTn = Hnn1/2(J −Gn)UnBn,qn
−1/2Υ−1n,q = op(1) because n1/2τdn →∞, and
n1/2HnJUnTnd−→ h3,q. (A.169)
Using Σ−1/2H−1n
p−→ Ik which holds by (A.150) and the CMT, (A.150), (A.169) and the CMT
LM = n m′Σ−1/2PΣ−1/2J Σ−1/2m = n m′Σ−1/2PΣ−1/2H−1n HnJUnTn
Σ−1/2m
d−→ χ2d(η
2), (A.170)
where we used PΣ−1/2H−1n HnJUnTn
p−→ Ph3,q , and the fact that Un and Tn are nonsingular, and h3,q = h3,d
has full column rank d.
Next consider the PLM statistic. To see that (A.79) holds under the local alternatives, note that by
Jensen’s inequality, the SLLN and (A.79),
n−1n∑i=1
|ui(θ0)|2+δ/2 ≤ 31+δ/2n−1n∑i=1
‖W ′iΓ + V ′i ‖2+δ/2‖hθn−1/2‖2+δ/2
+ 31+δn−1n∑i=1
‖Xi‖2+δ/2‖(X ′X)−1X ′(WΓ + V )‖2+δ/2‖hθn−1/2‖2+δ/2
+ 31+δ/2n−1n∑i=1
|ui −X ′i(X ′X)−1X ′u|2+δ/2
= Oa.s.(1), (A.171)
where n−1∑ni=1 ‖W ′iΓ + V ′i ‖2+δ/2 ≤ 21+δ/2n−1
∑ni=1(‖Wi‖2+δ/2‖Γ‖2+δ/2 + ‖Vi‖2+δ/2) = Oa.s.(1) by the cr
inequality and the triangular array SLLN. (A.81) holds by (A.171), hence (A.82) holds. Also, by the SLLN
and the CMT
n−1u(θ0)′V (A.172)
= n−1(MXWΓhθn−1/2 +MXV hθn
−1/2 +MXu)′(In − PZ − PX)V,
= n−1/2h′θn−1V ′(MX −MXW (W ′MXW )−1W ′MX)V + n−1u′(MX −MXW (W ′MXW )−1W ′MX)V
a.s.−→ ΣuV . (A.173)
Hence, (A.76) holds. By (A.82) and Lemma A.1, we then obtain (A.70). Note that (A.92) holds because
(A.93) holds by (A.163). Also, (A.90) holds by (A.173). This verifies (A.94). Combining the latter with
(A.70) and (A.165), we obtain
n1/2(Jπ − G) = Opπ (1) in P -probability. (A.174)
Thus, following the proof of Lemma A.6, noting that the term involving Bn,d−q in (A.97) drops out when
q = d, and using (A.168) and (A.174), we obtain
n1/2HnJπUnTn = n1/2HnGnUnTn + n1/2Hn(G−Gn)UnTn + n1/2Hn(Jπ − G)UnTn
pπ−→ ∆πh = h3,d in P -probability,
where ∆πh = h3,q = h3,d = ∆h has full column rank. Since (Σπ)−1/2Σ1/2 = (σ2Qzz)
−1/2Σ1/2 is nonsingular,
Pπ[rank((Σπ)−1/2Σ1/2∆πh) = d] = 1 with P -probability one. Then,
PLM = n mπ′Σπ−1/2PΣπ−1/2Jπ Σπ−1/2mπ
= n mπ′Σπ−1/2P(Σπ−1/2Σ
1/2n )n1/2Σ
−1/2n JπUnTn
Σπ−1/2mπ
dπ−→ mπ′h Σπ−1/2P(Σπ)−1/2Σ1/2∆π
hΣπ−1/2mπ
h
∼ χ2d in P -probability, (A.175)
where the second equality holds because Σn and Tn are nonsingular, the convergence follows from the CMT,
the full column rank property verified above, and the last line uses (A.166) and the fact that P(Σπ)−1/2Σ1/2∆πh
is idempotent with rank d.
CLR and PCLR statistics: The asymptotic distributions of the CLR and PCLR statistics under the sequence of local alternatives and strong or semi-strong identification with $n^{1/2}\tau_{dn} \to \infty$ are derived proceeding similarly to Theorem A.7 and Lemma 28.1 of Andrews and Guggenberger (2019b). By definition,
q = d if and only if n1/2τdn → ∞ in which case Q+2 (κ) defined in (A.120) is scalar. From (A.128) which
also holds under the local alternatives because J −Gn = Op(n−1/2) as verified in (A.168), all the remaining
quantities n1/2Σπ−1/2mπ, Σ, V ∈ Va(θ0), Vb(θ0) have the same limits as under the null as shown in (A.150),
(A.152), (A.160), (A.165), (A.166), and
Q+2 (κ+
j ) = M+d+1−q − κ
+j (Id+1−q + opπ (1)), (A.176)
where
M+d+1−q ≡ nB
+′n,d+1−qU
+′J+′H ′nh3,k−qh′3,k−qHnJ
+U+n B
+n,d+1−q + op(1). (A.177)
Since |Q+2 (κ+
d+1)| = |M+d+1−q − κ+
d+1(Id+1−q + opπ (1))| = 0 with Pπ-probability approaching 1, in P -
probability, we obtain
κ+d+1 = nB+′
n,d+1−qU+′n J+′H ′nh3,k−qh
′3,k−qHnJ
+U+n B
+n,d+1−q(1 + opπ (1)) + opπ (1),
= n mπ′Σπ−1/2h3,k−qh′3,k−qΣ
π−1/2mπ + opπ (1) in P -probability. (A.178)
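When $q = d$ and under the local alternatives $H_1 : \theta_n = \theta_0 + h_{\theta} n^{-1/2}$, as shown in (A.169), $n^{1/2} H J^{\pi} U_n B_n S_n \xrightarrow{d^{\pi}} \Delta^{\pi}_h = h_{3,q}$ because the term $\Delta^{\pi}_{h,d-q}$ drops out. Then,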
PCLR = n mπ′Σπ−1mπ − κ+d+1 (A.179)
= n mπ′Σπ−1/2(Ik − h3,k−qh′3,k−q)Σ
π−1/2mπ + opπ (1)
= n mπ′Σπ−1/2h3,qh′3,qΣ
π−1/2mπ + opπ (1)
= n mπ′Σπ−1/2Ph3,qΣπ−1/2mπ + opπ (1)
= n mπ′Σπ−1/2P(Σπ−1/2Σ
1/2n )n1/2Σ
−1/2n JπUnTn
Σπ−1/2mπ + opπ (1)
dπ−→ χ2d in P -probability,
where the second equality uses (A.178), the third and the fourth equalities use Pn1/2HJπUnBnSn= Ph3,q
+
op(1) = h3,q(h′3,qh3,q)
−1h′3,q + op(1) = h3,qh′3,q + op(1) and (A.161) since h′3,qh3,q = Id, and the convergence
follows from (A.175).
Recall that by (A.167) (see also Lemma A.6, Lemma 10.3 of Andrews and Guggenberger (2017b) and
Lemma 16.4 of Andrews and Guggenberger (2019b)), n1/2HJUnBnSnd−→ h3,q. By replacing Σπ−1/2mπ in
(A.178) and (A.179) with Σ−1/2m and using (A.161), the asymptotic distribution of the CLR statistic is
obtained similarly:
CLR = n m′Σ−1/2h3,qh′3,qΣ
−1/2m+ op(1)
= n m′Σ−1/2PΣ−1/2H−1n HnJUnTn
Σ−1/2m
= n m′Σ−1/2PΣ−1/2J Σ−1/2m+ op(1)
d−→ χ2d(η
2),
where the second and the third equalities use PΣ−1/2J = Pn1/2HJUnBnSn= Ph3,q+op(1) = h3,q(h
′3,qh3,q)
−1h′3,q+
op(1) = h3,qh′3,q + op(1) and the convergence uses (A.170).
A.8 Proof of Lemma A.8
Proof of Lemma A.8. In view of Lemma A.3, $o_p(1)$ terms are $o_{p^{\pi}}(1)$ in $P$-probability. For simplicity, we suppress the qualifier "in $P$-probability". Also, whenever we use the convergence result $n^{1/2}(J - G_n) = O_p(1)$ given in (A.60), we are conditioning on the event $\omega \in \Omega_1$ as discussed in Section A.4. The result for subsequences follows similarly, thus we provide only the full sequence result.
Recall that by definition, $h_{6,j} \equiv \lim_{n\to\infty} \tau_{(j+1)n}/\tau_{jn}$, $j = 1, \dots, d$, and $h_{6,q} = 0$ if $q < d+1$. Define $h_{6,q} = 0$ if $q = d+1$. Because $\tau_{1n} \geq \dots \geq \tau_{(d+1)n} \geq 0$, $h_{6,j} \in [0, 1]$. When $h_{6,j} = \lim_{n\to\infty} \tau_{(j+1)n}/\tau_{jn} > 0$, $\{\tau_{jn} : n \geq 1\}$ and $\{\tau_{(j+1)n} : n \geq 1\}$ have the same orders of magnitude.
Next divide the first $q$ SV's $\tau_{1n} \geq \dots \geq \tau_{qn}$ into groups such that the SV's within each group have the same orders of magnitude. Denote the number of groups by $G_h$ with $q \geq G_h \geq 1$, and the first and last indices of the SV's in the $g$-th group by $r_g$ and $r_g^{*}$ respectively. Thus, $r_1 = 1$, $r_g^{*} = r_{g+1} - 1$ and $r_{G_h}^{*} = q$ with the definition $r_{G_h+1} \equiv q+1$. By construction, $h_{6,j} > 0$ for all $j = r_g, \dots, r_g^{*} - 1$, $g = 1, \dots, G_h$. Also, $\lim_{n\to\infty} \tau_{j'n}/\tau_{jn} = 0$ for any $j \in g$ and $j' \in g'$ with $g \neq g'$. When $d = 1$, $G_h = 1$ and $r_1 = r_1^{*} = 1$.
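The grouping of singular values described above is determined entirely by which limiting ratios $h_{6,j}$ are zero. The following sketch makes the bookkeeping concrete; the function name `group_singular_values` and the example ratios are hypothetical and serve only to illustrate the definitions of $r_g$ and $r_g^{*}$.

```python
def group_singular_values(h6, q):
    """Return the 1-based index pairs (r_g, r_g_star) of each group.

    h6[j - 1] is the assumed limit of tau_{(j+1)n} / tau_{jn} for j = 1, ..., q - 1.
    A zero limit means tau_{(j+1)n} is of smaller order than tau_{jn}, so a new
    group starts at index j + 1.
    """
    groups, start = [], 1
    for j in range(1, q):
        if h6[j - 1] == 0:
            groups.append((start, j))
            start = j + 1
    groups.append((start, q))          # the last group always ends at q
    return groups

# Example with q = 4: tau_1 and tau_2 are of the same order, tau_3 is of smaller
# order than tau_2, and tau_3 and tau_4 are again of the same order.
groups = group_singular_values([0.5, 0.0, 0.3], q=4)
```

In this example `groups` contains two groups, so $G_h = 2$, with $r_1 = 1$, $r_1^{*} = 2$, $r_2 = 3$ and $r_2^{*} = q = 4$, consistent with $r_g^{*} = r_{g+1} - 1$.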
Step 1: The first group of eigenvalues. Recall that the eigenvalues of nU+′J+′H ′HJ+U+ are
denoted as κ+1 , . . . , κ
+d+1. Then,
κ+d+1 = λmin[n(HJU , Σπ−1/2mπ)′(HJU , Σπ−1/2mπ)] = λmin[nU+′J+′H ′HJ+U+].
Now premultiply the determinantal equation
|nU+′J+′H ′HJ+U+ − κId+1| = 0
by |τ−2r1nn
−1B+′n U
+′n (U+)−1′| and postmultiply by |(U+)−1U+
n B+n | to obtain
|τ−2r1nB
+′n U
+′n J+′H ′HJ+U+
n B+n − (n1/2τr1n)−2κB+′
n U+′n (U+)−1′(U+)−1U+
n B+n | = 0. (A.180)
Since |B+n |, |U+
n | > 0 and |U+| > 0 with Pπ-probability approaching 1 (wppa 1), and τr1n > 0 for n large
given n1/2τr1n →∞ for r1 ≤ q, the eigenvalues κ+j , 1 ≤ j ≤ d+ 1 above also solve (A.180) wppa 1. Thus,
(n1/2τr1n)−2κ+j : 1 ≤ j ≤ d+ 1 solve
|τ−2r1nB
+′n U
+′n J+′H ′HJ+U+
n B+n − κ(Id+1 + E+)| = 0, (A.181)
where
E+ =
E+1 E+
2
E+′2 E+
3
= B+′n U
+′n (U+)−1′(U+)−1U+
n B+n − Id+1, (A.182)
where E+1 ∈ Rr∗1×r∗1 , E+
2 ∈ Rr∗1×(d+1−r∗1 ) and E+3 ∈ R(d+1−r∗1 )×(d+1−r∗1 ).4 Note that HJ+U+
n B+n =
[HJUnBn, Σπ−1/2mπ] and HJ+
n U+n B
+n = [HGnUnBn, 0k×1], and by the SVD, HnGnUn = AnΥnB
′n. Let
4E+j , j = 1, 2, 3, and E+
j , j = 1, 2, 3, in (A.118) are different matrices because they have different dimensions.
67
O(τr2n/τr1n)d1×d2 denote a d1 × d2 matrix with O(τr2n/τr1n) elements on the main diagonal and zeros else-
where. Thus,
τ−1r1nHJ
+U+n B
+n
= τ−1r1nHJ
+n U
+n B
+n + τ−1
r1nH(J+ − J+n )U+
n B+n ,
= τ−1r1n[HGnUnBn, 0k×1] + (n1/2τr1n)−1[Hn1/2(J −Gn)UnBn, n
1/2Σπ−1/2mπ]
= τ−1r1n[HH−1
n AnΥnB′nBn, 0k×1] +Opπ ((n1/2τr1n)−1),
= τ−1r1n(Ik + opπ (1))AnΥ+
n +Opπ ((n1/2τr1n)−1),
= (Ik + opπ (1))An
h∗6,r∗1 + o(1) 0r∗1×(d−r∗1 ) 0r∗1×1
0(d−r∗1 )×r∗1 O(τr2n/τr1n)(d−r∗1 )×(d−r∗1 ) 0(d−r∗1 )×1
0(k−d)×r∗1 0(k−d)×(d−r∗1 ) 0(k−d)×1
+Opπ ((n1/2τr1n)−1), (A.183)
pπ−→ h3
h∗6,r∗1 0r∗1×(d+1−r∗1 )
0(k−r∗1 )×r∗1 0(k−r∗1 )×(d+1−r∗1 )
, h∗6,r∗1 ≡ diag(1, h6,1, h6,1h6,2, . . . ,
r∗1−1∏l=1
h6,l),
where the third equality holds by B′nBn = Id, J − Gn = Op(n−1/2) as shown in (A.60), n1/2Σπ−1/2mπ =
Opπ (1) as shown in Lemma A.6, HH−1n
p−→ Ik, ‖Hn‖ <∞, Un = O(1) by definition, and Bn = O(1) by the
orthogonality of Bn, the fifth equality uses τjn/τr1n =∏j−1l=1 (τ(l+1)n/τln) =
∏j−1l=1 h6,l + o(1), j = 2, . . . , r∗1 ,
and τjn/τr1n = O(τjn/τr1n), j = r2, . . . , d + 1, which holds by τ1n ≥ . . . τ(d+1)n, and the convergence uses
An → h3, τr2n/τr1n → 0 and n1/2τr1n →∞ since r1 ≤ q. Therefore, using h′3h3 = Ik
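\begin{align*}
\tau_{r_1 n}^{-2} B_n^{+\prime} U_n^{+\prime} J^{+\prime} H' H J^{+} U_n^{+} B_n^{+}
&\xrightarrow{p^\pi}
\begin{bmatrix}
h^*_{6,r_1^*} & 0_{r_1^* \times (d+1-r_1^*)} \\
0_{(k-r_1^*) \times r_1^*} & 0_{(k-r_1^*) \times (d+1-r_1^*)}
\end{bmatrix}'
h_3' h_3
\begin{bmatrix}
h^*_{6,r_1^*} & 0_{r_1^* \times (d+1-r_1^*)} \\
0_{(k-r_1^*) \times r_1^*} & 0_{(k-r_1^*) \times (d+1-r_1^*)}
\end{bmatrix} \\
&=
\begin{bmatrix}
h^{*2}_{6,r_1^*} & 0_{r_1^* \times (d+1-r_1^*)} \\
0_{(d+1-r_1^*) \times r_1^*} & 0_{(d+1-r_1^*) \times (d+1-r_1^*)}
\end{bmatrix}.
\end{align*}
Since $U - U_n \xrightarrow{p} 0$ and $U_n \to h_{91}$ positive definite, $U^{+} - U_n^{+} \xrightarrow{p} 0$ and $U_n^{+} \to h_{91}^{+}$. Thus, $(U^{+})^{-1} U_n^{+} \xrightarrow{p} I_{d+1}$ and using the CMT in conjunction with $B_n^{+\prime} B_n^{+} = I_{d+1}$, we have
\[
E^{+} = B_n^{+\prime} U_n^{+\prime} (U^{+\prime})^{-1} (U^{+})^{-1} U_n^{+} B_n^{+} - I_{d+1} \xrightarrow{p^\pi} 0_{(d+1) \times (d+1)}. \tag{A.184}
\]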
Thus, $\{(n^{1/2}\tau_{r_1 n})^{-2}\kappa_j^{+} : j = 1, \dots, d+1\}$ solve $|\tau_{r_1 n}^{-2} B_n^{+\prime} U_n^{+\prime} J^{+\prime} H' H J^{+} U_n^{+} B_n^{+} - \kappa I_{d+1}| = 0$ wppa 1. Then, since the ordered vector of eigenvalues of a matrix is a continuous function of the matrix, using Slutsky's lemma
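\begin{align*}
\big((n^{1/2}\tau_{r_1 n})^{-2}\kappa_1^{+}, \dots, (n^{1/2}\tau_{r_1 n})^{-2}\kappa_{r_1^*}^{+}\big) &\xrightarrow{p^\pi} \Big(1, h_{6,1}^2, h_{6,1}^2 h_{6,2}^2, \dots, \prod_{l=1}^{r_1^*-1} h_{6,l}^2\Big), \tag{A.185} \\
\kappa_j^{+} &\xrightarrow{p^\pi} \infty, \quad j = 1, \dots, r_1^*, \tag{A.186}
\end{align*}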
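because $n^{1/2}\tau_{r_1 n} \to \infty$ given $r_1 \le q$ and $h_{6,l} > 0$ for all $l = 1, \dots, r_1^*-1$. The same argument also yields that the remaining $d+1-r_1^*$ eigenvalues $\{(n^{1/2}\tau_{r_1 n})^{-2}\kappa_j^{+} : j = r_1^*+1, \dots, d+1\}$ of $|\tau_{r_1 n}^{-2} B_n^{+\prime} U_n^{+\prime} J^{+\prime} H' H J^{+} U_n^{+} B_n^{+} - \kappa I_{d+1}| = 0$ satisfy
\[
(n^{1/2}\tau_{r_1 n})^{-2}\kappa_j^{+} \xrightarrow{p^\pi} 0, \quad j = r_1^*+1, \dots, d+1. \tag{A.187}
\]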
Let $B^{+}_{n,j_1,j_2}$ denote the $(d+1) \times (j_2 - j_1)$ matrix that consists of columns $j_1+1, \dots, j_2$ of $B_n^{+}$ for $0 \le j_1 < j_2 \le d+1$. Thus, we may write $B_n^{+} = [B^{+}_{n,0,r_1^*}, B^{+}_{n,r_1^*,d+1}]$. Proceeding similarly to (A.183),
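\begin{align*}
\tau_{r_1 n}^{-1} H J^{+} U_n^{+} B^{+}_{n,0,r_1^*}
&= (I_k + o_p(1)) A_n
\begin{bmatrix} h^*_{6,r_1^*} + o(1) \\ 0_{(k-r_1^*) \times r_1^*} \end{bmatrix}
+ O_{p^\pi}((n^{1/2}\tau_{r_1 n})^{-1}), \\
\tau_{r_1 n}^{-1} H J^{+} U_n^{+} B^{+}_{n,r_1^*,d+1}
&= (I_k + o_p(1)) A_n
\begin{bmatrix} 0_{r_1^* \times (d+1-r_1^*)} \\ O(\tau_{r_2 n}/\tau_{r_1 n})_{(k-r_1^*) \times (d+1-r_1^*)} \end{bmatrix}
+ O_{p^\pi}((n^{1/2}\tau_{r_1 n})^{-1}).
\end{align*}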
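From the last two expressions,
\begin{align*}
\varrho_n &\equiv \tau_{r_1 n}^{-2} B^{+\prime}_{n,0,r_1^*} U_n^{+\prime} J^{+\prime} H' H J^{+} U_n^{+} B^{+}_{n,r_1^*,d+1} \\
&= \begin{bmatrix} h^*_{6,r_1^*} + o(1) \\ 0_{(k-r_1^*) \times r_1^*} \end{bmatrix}'
A_n' (I_k + o_p(1)) A_n
\begin{bmatrix} 0_{r_1^* \times (d+1-r_1^*)} \\ O(\tau_{r_2 n}/\tau_{r_1 n})_{(k-r_1^*) \times (d+1-r_1^*)} \end{bmatrix}
+ O_{p^\pi}((n^{1/2}\tau_{r_1 n})^{-1}) \\
&= \begin{bmatrix} h^*_{6,r_1^*} + o(1) \\ 0_{(k-r_1^*) \times r_1^*} \end{bmatrix}'
\begin{bmatrix}
I_{r_1^*} + o_{p^\pi}(1) & o_{p^\pi}(1)_{r_1^* \times (k-r_1^*)} \\
o_{p^\pi}(1)_{(k-r_1^*) \times r_1^*} & I_{k-r_1^*} + o_{p^\pi}(1)
\end{bmatrix}
\begin{bmatrix} 0_{r_1^* \times (d+1-r_1^*)} \\ O(\tau_{r_2 n}/\tau_{r_1 n})_{(k-r_1^*) \times (d+1-r_1^*)} \end{bmatrix}
+ O_{p^\pi}((n^{1/2}\tau_{r_1 n})^{-1}) \\
&= o_{p^\pi}(\tau_{r_2 n}/\tau_{r_1 n}) + O_{p^\pi}((n^{1/2}\tau_{r_1 n})^{-1}). \tag{A.188}
\end{align*}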
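Define
\begin{align*}
\xi_1(\kappa) &\equiv \tau_{r_1 n}^{-2} B^{+\prime}_{n,0,r_1^*} U_n^{+\prime} J^{+\prime} H' H J^{+} U_n^{+} B^{+}_{n,0,r_1^*} - \kappa (I_{r_1^*} + E_1^{+}) \in \mathbb{R}^{r_1^* \times r_1^*}, \\
\xi_2(\kappa) &\equiv \varrho_n - \kappa E_2^{+} \in \mathbb{R}^{r_1^* \times (d+1-r_1^*)}, \\
\xi_3(\kappa) &\equiv \tau_{r_1 n}^{-2} B^{+\prime}_{n,r_1^*,d+1} U_n^{+\prime} J^{+\prime} H' H J^{+} U_n^{+} B^{+}_{n,r_1^*,d+1} - \kappa (I_{d+1-r_1^*} + E_3^{+}) \in \mathbb{R}^{(d+1-r_1^*) \times (d+1-r_1^*)}.
\end{align*}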
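Next we verify that $(n^{1/2}\tau_{r_1 n})^{-2}\kappa_j^{+}$, $j = r_1^*+1, \dots, d+1$, cannot solve the determinantal equation $|\xi_1(\kappa)| = 0$ wppa 1. First, note that
\[
\tau_{r_1 n}^{-2} B^{+\prime}_{n,0,r_1^*} U_n^{+\prime} J^{+\prime} H' H J^{+} U_n^{+} B^{+}_{n,0,r_1^*} \xrightarrow{p^\pi} h^{*2}_{6,r_1^*}. \tag{A.189}
\]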
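Then, using (A.184), (A.187) and (A.189), for $j = r_1^*+1, \dots, d+1$,
\begin{align*}
\xi_{j1} &\equiv \xi_1((n^{1/2}\tau_{r_1 n})^{-2}\kappa_j^{+}) \\
&= \tau_{r_1 n}^{-2} B^{+\prime}_{n,0,r_1^*} U_n^{+\prime} J^{+\prime} H' H J^{+} U_n^{+} B^{+}_{n,0,r_1^*} - (n^{1/2}\tau_{r_1 n})^{-2}\kappa_j^{+} (I_{r_1^*} + E_1^{+}) \\
&= h^{*2}_{6,r_1^*} + o_{p^\pi}(1) - o_{p^\pi}(1)(I_{r_1^*} + o_{p^\pi}(1)) \\
&= h^{*2}_{6,r_1^*} + o_{p^\pi}(1). \tag{A.190}
\end{align*}
Note that $\lambda_{\min}(h^{*2}_{6,r_1^*}) > 0$ because $h_{6,l} > 0$ for all $l = 1, \dots, r_1^*-1$. Hence,
\[
|\xi_1((n^{1/2}\tau_{r_1 n})^{-2}\kappa_j^{+})| \neq 0 \tag{A.191}
\]
wppa 1. Using the determinant formula for partitioned matrices,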
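\begin{align*}
0 &= \left|\tau_{r_1 n}^{-2} B_n^{+\prime} U_n^{+\prime} J^{+\prime} H' H J^{+} U_n^{+} B_n^{+} - \kappa (I_{d+1} + E^{+})\right| \tag{A.192} \\
&= \begin{vmatrix} \xi_1(\kappa) & \xi_2(\kappa) \\ \xi_2(\kappa)' & \xi_3(\kappa) \end{vmatrix}
= |\xi_1(\kappa)| \, |\xi_3(\kappa) - \xi_2(\kappa)' \xi_1(\kappa)^{-1} \xi_2(\kappa)| \\
&= |\xi_1(\kappa)| \, \Big| \tau_{r_1 n}^{-2} B^{+\prime}_{n,r_1^*,d+1} U_n^{+\prime} J^{+\prime} H' H J^{+} U_n^{+} B^{+}_{n,r_1^*,d+1} - \varrho_n' \xi_1(\kappa)^{-1} \varrho_n \\
&\qquad - \kappa \big( I_{d+1-r_1^*} + E_3^{+} - E_2^{+\prime} \xi_1(\kappa)^{-1} \varrho_n - \varrho_n' \xi_1(\kappa)^{-1} E_2^{+} + \kappa E_2^{+\prime} \xi_1(\kappa)^{-1} E_2^{+} \big) \Big|, \tag{A.193}
\end{align*}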
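for $\kappa$ equal to any solution $(n^{1/2}\tau_{r_1 n})^{-2}\kappa_j^{+}$ to the equation (A.192). From (A.191), $(n^{1/2}\tau_{r_1 n})^{-2}\kappa_j^{+}$, $j = r_1^*+1, \dots, d+1$, solve $|\xi_3(\kappa) - \xi_2(\kappa)' \xi_1(\kappa)^{-1} \xi_2(\kappa)| = 0$. Thus,
\begin{align*}
\Big| &\tau_{r_1 n}^{-2} B^{+\prime}_{n,r_1^*,d+1} U_n^{+\prime} J^{+\prime} H' H J^{+} U_n^{+} B^{+}_{n,r_1^*,d+1} - o((\tau_{r_2 n}/\tau_{r_1 n})^2) + O_{p^\pi}((n^{1/2}\tau_{r_1 n})^{-2}) \\
&- (n^{1/2}\tau_{r_1 n})^{-2}\kappa_j^{+} (I_{d+1-r_1^*} + E_{j2}) \Big| = 0,
\end{align*}
where $E_{j2} \equiv E_3 - E_2' \xi_{j1}^{-1} \varrho_n - \varrho_n' \xi_{j1}^{-1} E_2 + (n^{1/2}\tau_{r_1 n})^{-2}\kappa_j^{+} E_2' \xi_{j1}^{-1} E_2 \in \mathbb{R}^{(d+1-r_1^*) \times (d+1-r_1^*)}$. Multiplying the equation in the previous display by $\tau_{r_2 n}^{-2}/\tau_{r_1 n}^{-2}$ and using $O_{p^\pi}((n^{1/2}\tau_{r_2 n})^{-2}) = o_{p^\pi}(1)$, which follows from $r_2 \le q$ and $n^{1/2}\tau_{jn} \to \infty$ for all $j \le q$,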
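\[
\left|\tau_{r_2 n}^{-2} B^{+\prime}_{n,r_1^*,d+1} U_n^{+\prime} J^{+\prime} H' H J^{+} U_n^{+} B^{+}_{n,r_1^*,d+1} + o_{p^\pi}(1) - (n^{1/2}\tau_{r_2 n})^{-2}\kappa_j^{+} (I_{d+1-r_1^*} + E_{j2})\right| = 0. \tag{A.194}
\]
Thus, $\{(n^{1/2}\tau_{r_2 n})^{-2}\kappa_j^{+} : j = r_1^*+1, \dots, d+1\}$ solve
\[
\left|\tau_{r_2 n}^{-2} B^{+\prime}_{n,r_1^*,d+1} U_n^{+\prime} J^{+\prime} H' H J^{+} U_n^{+} B^{+}_{n,r_1^*,d+1} + o_{p^\pi}(1) - \kappa (I_{d+1-r_1^*} + E_{j2})\right| = 0. \tag{A.195}
\]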
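On noting that $E_2 = o_{p^\pi}(1)$ and $E_3 = o_{p^\pi}(1)$ by (A.184), $\xi_{j1}^{-1} = O_{p^\pi}(1)$ by (A.190), $\varrho_n = o_{p^\pi}(1)$ using (A.188), $\tau_{r_2 n} \le \tau_{r_1 n}$ and $n^{1/2}\tau_{r_1 n} \to \infty$, and $(n^{1/2}\tau_{r_1 n})^{-2}\kappa_j^{+} = o_{p^\pi}(1)$ for $j = r_1^*+1, \dots, d+1$ by (A.187),
\[
E_{j2} = o_{p^\pi}(1). \tag{A.196}
\]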
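The partitioned-determinant identity that drives the step from (A.192) to (A.193), namely $|M| = |\xi_1| \, |\xi_3 - \xi_2' \xi_1^{-1} \xi_2|$ for a symmetric block matrix with nonsingular leading block $\xi_1$, can be checked numerically. The sketch below uses an arbitrary random positive definite matrix as a stand-in; the function name `partitioned_det` and the block sizes are illustrative assumptions and have no counterpart in the paper.

```python
import numpy as np

def partitioned_det(xi1, xi2, xi3):
    """det([[xi1, xi2], [xi2', xi3]]) computed as det(xi1) * det(Schur complement).

    Requires xi1 to be nonsingular, which is the role (A.191) plays in the proof.
    """
    schur = xi3 - xi2.T @ np.linalg.solve(xi1, xi2)
    return np.linalg.det(xi1) * np.linalg.det(schur)

rng = np.random.default_rng(1)
S = rng.standard_normal((5, 5))
S = S @ S.T + np.eye(5)  # symmetric positive definite, so the leading block is invertible
xi1, xi2, xi3 = S[:2, :2], S[:2, 2:], S[2:, 2:]

full = np.linalg.det(S)
block = partitioned_det(xi1, xi2, xi3)
print(full, block)
```

The two printed numbers should agree up to floating-point error; the identity fails to apply only when the leading block is singular, which is why the proof first rules out $|\xi_1(\kappa)| = 0$ at the relevant values of $\kappa$.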
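Now the same argument is repeated for the subsequent groups of indices. Specifically, replace (A.181) and (A.184) by (A.195) and (A.196), and replace $j = r_1^*+1, \dots, d+1$, $E$, $B_n^{+}$, $\tau_{r_1 n}$, $\tau_{r_2 n}$, $r_1^*$, $d+1-r_1^*$, $h^*_{6,r_1^*}$, $B^{+}_{n,0,r_1^*}$, and $B^{+}_{n,r_1^*,d+1}$ by $j = r_2^*+1, \dots, d+1$, $E_{j2}$, $B^{+}_{n,d+1-r_1^*}$, $\tau_{r_2 n}$, $\tau_{r_3 n}$, $r_2^*-r_1^*$, $d+1-r_2^*$, $h^*_{6,r_2^*} \equiv \mathrm{diag}(1, h_{6,r_1^*+1}, h_{6,r_1^*+1}h_{6,r_1^*+2}, \dots, \prod_{l=r_1^*+1}^{r_2^*-1} h_{6,l}) \in \mathbb{R}^{(r_2^*-r_1^*) \times (r_2^*-r_1^*)}$, $B^{+}_{n,r_1^*,r_2^*}$, and $B^{+}_{n,r_2^*,d+1}$, respectively. Moreover, $E_{j2}$ is partitioned further as
\[
E_{j2} = \begin{bmatrix} E_{1j2} & E_{2j2} \\ E_{2j2}' & E_{3j2} \end{bmatrix}, \quad
E_{1j2} \in \mathbb{R}^{r_2^* \times r_2^*}, \quad
E_{2j2} \in \mathbb{R}^{r_2^* \times (d+1-r_1^*-r_2^*)}, \quad
E_{3j2} \in \mathbb{R}^{(d+1-r_1^*-r_2^*) \times (d+1-r_1^*-r_2^*)}.
\]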
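The analog of (A.195) is that $\{(n^{1/2}\tau_{r_3 n})^{-2}\kappa_j^{+} : j = r_2^*+1, \dots, d+1\}$ solve
\[
\left|\tau_{r_3 n}^{-2} B^{+\prime}_{n,r_2^*,d+1} U_n^{+\prime} J^{+\prime} H' H J^{+} U_n^{+} B^{+}_{n,r_2^*,d+1} + o_{p^\pi}(1) - \kappa (I_{d+1-r_2^*} + E_{j3})\right| = 0, \tag{A.197}
\]
where
\begin{align*}
E_{j3} &\equiv E_{3j2} - E_{2j2}' \xi_{1j2}^{-1} \varrho_{2n} - \varrho_{2n}' \xi_{1j2}^{-1} E_{2j2} + (n^{1/2}\tau_{r_2 n})^{-2}\kappa_j^{+} E_{2j2}' \xi_{1j2}^{-1} E_{2j2}, \\
\xi_{1j2} &\equiv \xi_{1j2}((n^{1/2}\tau_{r_2 n})^{-2}\kappa_j^{+}) \\
&= \tau_{r_2 n}^{-2} B^{+\prime}_{n,r_1^*,r_2^*} U_n^{+\prime} J^{+\prime} H' H J^{+} U_n^{+} B^{+}_{n,r_1^*,r_2^*} - (n^{1/2}\tau_{r_2 n})^{-2}\kappa_j^{+} (I_{r_2^*-r_1^*} + E_{1j2}) \\
&= h^{*2}_{6,r_2^*} + o_{p^\pi}(1) - o_{p^\pi}(1)(I_{r_2^*-r_1^*} + o_{p^\pi}(1)) \\
&= h^{*2}_{6,r_2^*} + o_{p^\pi}(1), \\
\varrho_{2n} &\equiv \tau_{r_2 n}^{-2} B^{+\prime}_{n,r_1^*,r_2^*} U_n^{+\prime} J^{+\prime} H' H J^{+} U_n^{+} B^{+}_{n,r_2^*,d+1}.
\end{align*}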
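Analogously to (A.185), (A.186) and (A.187), it then follows that
\begin{align*}
\kappa_j^{+} &\xrightarrow{p^\pi} \infty, \quad j = r_2, \dots, r_2^*, \tag{A.198} \\
(n^{1/2}\tau_{r_2 n})^{-2}\kappa_j^{+} &= o_{p^\pi}(1), \quad j = r_2^*+1, \dots, d+1. \tag{A.199}
\end{align*}
Then, repeating the previous steps $G_h - 2$ more times gives (as verified below)
\begin{align*}
\kappa_j^{+} &\xrightarrow{p^\pi} \infty, \quad j = 1, \dots, r^*_{G_h}, \tag{A.200} \\
(n^{1/2}\tau_{r_g n})^{-2}\kappa_j^{+} &= o_{p^\pi}(1), \quad j = r_g^*+1, \dots, d+1, \quad g = 1, \dots, G_h. \tag{A.201}
\end{align*}
Since $r^*_{G_h} = q$, part (a) of the lemma follows from (A.200). Setting $g = G_h$ in (A.201) and noting that $r^*_{G_h} = q$ yield
\[
(n^{1/2}\tau_{r_{G_h} n})^{-2}\kappa_j^{+} = o_{p^\pi}(1), \quad j = q+1, \dots, d+1. \tag{A.202}
\]
If $r_{G_h} = r^*_{G_h} = q$, then (A.202) gives $(n^{1/2}\tau_{qn})^{-2}\kappa_j^{+} = o_{p^\pi}(1)$ for $j = q+1, \dots, d+1$. If $r_{G_h} < r^*_{G_h} = q$, using $h_{6,l} > 0$ for all $l = r_{G_h}, \dots, r^*_{G_h}-1$,
\[
\lim_{n \to \infty} \frac{\tau_{qn}}{\tau_{r_{G_h} n}} = \lim_{n \to \infty} \frac{\tau_{r^*_{G_h} n}}{\tau_{r_{G_h} n}} = \prod_{j=r_{G_h}}^{r^*_{G_h}-1} h_{6,j} > 0. \tag{A.203}
\]
Combining (A.202) and (A.203), $(n^{1/2}\tau_{qn})^{-2}\kappa_j^{+} = o_{p^\pi}(1)$ for $j = q+1, \dots, d+1$. On noting that $\tau_{ln} \ge \tau_{qn}$ for $l \le q$, part (b) of the lemma follows. The result for all subsequences $\{w_n\}$ and $\{\lambda_{w_n,h} \in \Lambda : n \ge 1\}$ holds analogously. The formal induction proof of (A.200) and (A.201) is provided next.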
Step 2: Using induction to complete the proof. Let $o_{gp}$ denote a symmetric $(d+1-r^*_{g-1}) \times (d+1-r^*_{g-1})$ matrix whose $(l,m)$ element, $l,m = 1, \dots, d+1-r^*_{g-1}$, is $o_{p^\pi}(\tau_{(r^*_{g-1}+l)n}\tau_{(r^*_{g-1}+m)n}/\tau_{r_g n}^2) + O_{p^\pi}((n^{1/2}\tau_{r_g n})^{-1})$. We have $o_{gp} = o_{p^\pi}(1)$ because $r^*_{g-1}+l \ge r_g$ for $l \ge 1$, the $\tau_{jn}$'s are nonincreasing in $j$, and $n^{1/2}\tau_{r_g n} \to \infty$ for $g = 1, \dots, G_h$. The goal is to show that $\{(n^{1/2}\tau_{r_g n})^{-2}\kappa_j^{+} : j = r^*_{g-1}+1, \dots, d+1\}$ solve
\[
\left|\tau_{r_g n}^{-2} B^{+\prime}_{n,r^*_{g-1},d+1} U_n^{+\prime} J^{+\prime} H' H J^{+} U_n^{+} B^{+}_{n,r^*_{g-1},d+1} + o_{gp} - \kappa (I_{d+1-r^*_{g-1}} + E_{jg})\right| = 0, \tag{A.204}
\]
for some $(d+1-r^*_{g-1}) \times (d+1-r^*_{g-1})$ symmetric matrices $E_{jg} = o_{p^\pi}(1)$ and $o_{gp}$. This is shown using induction over $g = 1, \dots, G_h$. For $g = 1$, the above holds upon noting that $E_{jg} = E$, $o_{gp} = 0$, $r^*_{g-1} = r^*_0 \equiv 0$ and $B^{+}_{n,r^*_{g-1},d+1} = B^{+}_{n,0,d+1} = B_n^{+}$. Now suppose that (A.204) holds for $g$ with $E_{jg} = o_{p^\pi}(1)$ and $o_{gp}$. Proceeding
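similarly to the argument in (A.183),
\begin{align*}
\tau_{r_g n}^{-1} H J^{+} U_n^{+} B^{+}_{n,r^*_{g-1},d+1}
&= \tau_{r_g n}^{-1} (I_k + o_{p^\pi}(1)) A_n
\begin{bmatrix}
0_{r^*_{g-1} \times (d+1-r^*_{g-1})} \\
\mathrm{diag}(\tau_{r_g n}, \dots, \tau_{(d+1)n}) \\
0_{(k-d-1) \times (d+1-r^*_{g-1})}
\end{bmatrix}
+ O_{p^\pi}((n^{1/2}\tau_{r_g n})^{-1}) \tag{A.205} \\
&\xrightarrow{p^\pi} h_3
\begin{bmatrix}
0_{r^*_{g-1} \times (r_g^*-r^*_{g-1})} & 0_{r^*_{g-1} \times (d+1-r_g^*)} \\
h^*_{6,r_g^*} & 0_{(r_g^*-r^*_{g-1}) \times (d+1-r_g^*)} \\
0_{(k-r_g^*) \times (r_g^*-r^*_{g-1})} & 0_{(k-r_g^*) \times (d+1-r_g^*)}
\end{bmatrix},
\end{align*}
\[
h^*_{6,r_g^*} \equiv \mathrm{diag}\Big(1, h_{6,r_g}, \dots, \prod_{l=r_g}^{r_g^*-1} h_{6,l}\Big) \in \mathbb{R}^{(r_g^*-r^*_{g-1}) \times (r_g^*-r^*_{g-1})},
\]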
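and $h^*_{6,r_g^*} \equiv 1$ for $r_g^* = 1$. From (A.205) and $h_3' h_3 = I_k = \lim_{n \to \infty} A_n' A_n$,
\[
\tau_{r_g n}^{-2} B^{+\prime}_{n,r^*_{g-1},d+1} U_n^{+\prime} J^{+\prime} H' H J^{+} U_n^{+} B^{+}_{n,r^*_{g-1},d+1}
\xrightarrow{p^\pi}
\begin{bmatrix}
h^{*2}_{6,r_g^*} & 0_{(r_g^*-r^*_{g-1}) \times (d+1-r_g^*)} \\
0_{(d+1-r_g^*) \times (r_g^*-r^*_{g-1})} & 0_{(d+1-r_g^*) \times (d+1-r_g^*)}
\end{bmatrix}. \tag{A.206}
\]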
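Using (A.204), which is assumed to hold in the induction step, and $o_{gp} = o_{p^\pi}(1)$, wppa 1, $\{(n^{1/2}\tau_{r_g n})^{-2}\kappa_j^{+} : j = r^*_{g-1}+1, \dots, d+1\}$ solve
\[
\left|(I_{d+1-r^*_{g-1}} + E_{jg})^{-1} \tau_{r_g n}^{-2} B^{+\prime}_{n,r^*_{g-1},d+1} U_n^{+\prime} J^{+\prime} H' H J^{+} U_n^{+} B^{+}_{n,r^*_{g-1},d+1} + o_{p^\pi}(1) - \kappa I_{d+1-r^*_{g-1}}\right| = 0. \tag{A.207}
\]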
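The passage from a determinantal equation of the form $|M - \kappa (I + E)| = 0$ to the standard eigenvalue form $|(I + E)^{-1} M - \kappa I| = 0$ can be sketched numerically as follows. The matrices here are hypothetical stand-ins: `M` mimics a limit with two nonzero and two zero eigenvalues, and `E` is a small symmetric perturbation. Neither is constructed from the paper's quantities.

```python
import numpy as np

def generalized_roots(M, E):
    """Roots kappa of det(M - kappa (I + E)) = 0 as eigenvalues of (I + E)^{-1} M.

    Assumes I + E is nonsingular; for symmetric M and symmetric positive
    definite I + E the roots are real, so the real part is taken.
    """
    p = M.shape[0]
    vals = np.linalg.eigvals(np.linalg.solve(np.eye(p) + E, M))
    return np.sort(vals.real)[::-1]

rng = np.random.default_rng(2)
M = np.diag([1.0, 0.25, 0.0, 0.0])   # hypothetical limit matrix
Z = rng.standard_normal((4, 4))
E = 1e-3 * (Z + Z.T)                 # small symmetric perturbation

roots = generalized_roots(M, E)
# Each root should make the original determinant vanish up to rounding error.
residuals = [abs(np.linalg.det(M - kap * (np.eye(4) + E))) for kap in roots]
print(roots)
print(residuals)
```

With `E` small, the roots stay close to the eigenvalues of `M`, which is the continuity property the proof relies on when $E_{jg} = o_{p^\pi}(1)$.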
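By the same arguments that led to (A.185)-(A.187), (A.206), (A.207) and the induction assumption that $E_{jg} = o_{p^\pi}(1)$,
\begin{align*}
\kappa_j^{+} &\xrightarrow{p^\pi} \infty, \quad j = r^*_{g-1}, \dots, r_g^*, \tag{A.208} \\
(n^{1/2}\tau_{r_g n})^{-2}\kappa_j^{+} &= o_{p^\pi}(1), \quad j = r_g^*+1, \dots, d+1. \tag{A.209}
\end{align*}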
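Let $o^*_{gp}$ denote an $(r_g^*-r^*_{g-1}) \times (d+1-r_g^*)$ matrix whose elements in column $j$, $j = 1, \dots, d+1-r_g^*$, are $o_{p^\pi}(\tau_{(r_g^*+j)n}/\tau_{r_g n}) + O_{p^\pi}((n^{1/2}\tau_{r_g n})^{-1})$. Then $o^*_{gp} = o_{p^\pi}(1)$. Next, replacing $B^{+}_{n,r^*_{g-1},d+1}$ in (A.205) with $B^{+}_{n,r_g^*,d+1}$ and $B^{+}_{n,r^*_{g-1},r_g^*}$, and multiplying the resulting matrices give
\begin{align*}
\varrho_{gn} &\equiv \tau_{r_g n}^{-2} B^{+\prime}_{n,r^*_{g-1},r_g^*} U_n^{+\prime} J^{+\prime} H' H J^{+} U_n^{+} B^{+}_{n,r_g^*,d+1} \\
&= \begin{bmatrix}
0_{r^*_{g-1} \times (r_g^*-r^*_{g-1})} \\
\mathrm{diag}(\tau_{(r^*_{g-1}+1)n}, \dots, \tau_{r_g^* n})/\tau_{r_g n} \\
0_{(k-r_g^*) \times (r_g^*-r^*_{g-1})}
\end{bmatrix}'
A_n' (I_k + o_{p^\pi}(1)) A_n
\begin{bmatrix}
0_{r_g^* \times (d+1-r_g^*)} \\
\mathrm{diag}(\tau_{(r_g^*+1)n}, \dots, \tau_{(d+1)n})/\tau_{r_g n} \\
0_{(k-d-1) \times (d+1-r_g^*)}
\end{bmatrix}
+ O_{p^\pi}((n^{1/2}\tau_{r_g n})^{-1}) \\
&= o^*_{gp} \in \mathbb{R}^{(r_g^*-r^*_{g-1}) \times (d+1-r_g^*)}, \tag{A.210}
\end{align*}
where $\mathrm{diag}(\tau_{(r^*_{g-1}+1)n}, \dots, \tau_{r_g^* n})/\tau_{r_g n} = h^*_{6,r_g^*} + o(1)$, and the last equality uses $A_n' (I_k + o_{p^\pi}(1)) A_n = I_k + o_{p^\pi}(1)$.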
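Partition the $(d+1-r^*_{g-1}) \times (d+1-r^*_{g-1})$ matrices $o_{gp}$ and $E_{jg}$ as:
\[
o_{gp} = \begin{bmatrix} o_{1gp} & o_{2gp} \\ o_{2gp}' & o_{3gp} \end{bmatrix}, \quad
E_{jg} = \begin{bmatrix} E_{1jg} & E_{2jg} \\ E_{2jg}' & E_{3jg} \end{bmatrix},
\]
where $o_{1gp}, E_{1jg} \in \mathbb{R}^{(r_g^*-r^*_{g-1}) \times (r_g^*-r^*_{g-1})}$, $o_{2gp}, E_{2jg} \in \mathbb{R}^{(r_g^*-r^*_{g-1}) \times (d+1-r_g^*)}$, $o_{3gp}, E_{3jg} \in \mathbb{R}^{(d+1-r_g^*) \times (d+1-r_g^*)}$, $j = r^*_{g-1}+1, \dots, d+1$ and $g = 1, \dots, G_h$.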
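Define
\begin{align*}
\xi_{1jg}(\kappa) &\equiv \tau_{r_g n}^{-2} B^{+\prime}_{n,r^*_{g-1},r_g^*} U_n^{+\prime} J^{+\prime} H' H J^{+} U_n^{+} B^{+}_{n,r^*_{g-1},r_g^*} + o_{1gp} - \kappa (I_{r_g^*-r^*_{g-1}} + E_{1jg}) \in \mathbb{R}^{(r_g^*-r^*_{g-1}) \times (r_g^*-r^*_{g-1})}, \\
\xi_{2jg}(\kappa) &\equiv \varrho_{gn} + o_{2gp} - \kappa E_{2jg} \in \mathbb{R}^{(r_g^*-r^*_{g-1}) \times (d+1-r_g^*)}, \\
\xi_{3jg}(\kappa) &\equiv \tau_{r_g n}^{-2} B^{+\prime}_{n,r_g^*,d+1} U_n^{+\prime} J^{+\prime} H' H J^{+} U_n^{+} B^{+}_{n,r_g^*,d+1} + o_{3gp} - \kappa (I_{d+1-r_g^*} + E_{3jg}) \in \mathbb{R}^{(d+1-r_g^*) \times (d+1-r_g^*)}.
\end{align*}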
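From (A.204), $\{(n^{1/2}\tau_{r_g n})^{-2}\kappa_j^{+} : j = r^*_{g-1}+1, \dots, d+1\}$ solve
\begin{align*}
0 &= \left|\tau_{r_g n}^{-2} B^{+\prime}_{n,r^*_{g-1},d+1} U_n^{+\prime} J^{+\prime} H' H J^{+} U_n^{+} B^{+}_{n,r^*_{g-1},d+1} + o_{gp} - \kappa (I_{d+1-r^*_{g-1}} + E_{jg})\right| \tag{A.211} \\
&= \begin{vmatrix} \xi_{1jg}(\kappa) & \xi_{2jg}(\kappa) \\ \xi_{2jg}(\kappa)' & \xi_{3jg}(\kappa) \end{vmatrix}
= |\xi_{1jg}(\kappa)| \, |\xi_{3jg}(\kappa) - \xi_{2jg}(\kappa)' \xi_{1jg}(\kappa)^{-1} \xi_{2jg}(\kappa)| \\
&= |\xi_{1jg}(\kappa)| \, \Big| \tau_{r_g n}^{-2} B^{+\prime}_{n,r_g^*,d+1} U_n^{+\prime} J^{+\prime} H' H J^{+} U_n^{+} B^{+}_{n,r_g^*,d+1} + o_{3gp} - (\varrho_{gn} + o_{2gp})' \xi_{1jg}(\kappa)^{-1} (\varrho_{gn} + o_{2gp}) \\
&\qquad - \kappa \big( I_{d+1-r_g^*} + E_{3jg} - E_{2jg}' \xi_{1jg}(\kappa)^{-1} (\varrho_{gn} + o_{2gp}) - (\varrho_{gn} + o_{2gp})' \xi_{1jg}(\kappa)^{-1} E_{2jg} + \kappa E_{2jg}' \xi_{1jg}(\kappa)^{-1} E_{2jg} \big) \Big|, \tag{A.212}
\end{align*}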
where the second equality holds by the formula for the determinant of a partitioned matrix, provided $\xi_{1jg}((n^{1/2}\tau_{r_g n})^{-2}\kappa_j^{+})$, $j = r^*_{g-1}+1, \dots, d+1$, is nonsingular wppa 1, a fact that is verified below. By the same argument as in (A.190), $E_{1jg} = o_{p^\pi}(1)$ which holds by definition, and $o_{gp} = o_{p^\pi}(1)$,
\[
\xi_{1jg} \equiv \xi_{1jg}((n^{1/2}\tau_{r_g n})^{-2}\kappa_j^{+}) = h^{*2}_{6,r_g^*} + o_{p^\pi}(1). \tag{A.213}
\]
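Since $\lambda_{\min}(h^{*2}_{6,r_g^*}) > 0$, $\xi_{1jg}((n^{1/2}\tau_{r_g n})^{-2}\kappa_j^{+})$, $j = r^*_{g-1}+1, \dots, d+1$, is nonsingular wppa 1. Therefore,
\begin{align*}
0 = \Big| &\tau_{r_g n}^{-2} B^{+\prime}_{n,r_g^*,d+1} U_n^{+\prime} J^{+\prime} H' H J^{+} U_n^{+} B^{+}_{n,r_g^*,d+1} + o_{3gp} - (\varrho_{gn} + o_{2gp})' \xi_{1jg}^{-1} (\varrho_{gn} + o_{2gp}) \\
&- (n^{1/2}\tau_{r_g n})^{-2}\kappa_j^{+} (I_{d+1-r_g^*} + E_{j(g+1)}) \Big|, \tag{A.214}
\end{align*}
where
\begin{align*}
E_{j(g+1)} \equiv{} & E_{3jg} - E_{2jg}' \xi_{1jg}^{-1} (\varrho_{gn} + o_{2gp}) - (\varrho_{gn} + o_{2gp})' \xi_{1jg}^{-1} E_{2jg} \\
& + (n^{1/2}\tau_{r_g n})^{-2}\kappa_j^{+} E_{2jg}' \xi_{1jg}^{-1} E_{2jg} \in \mathbb{R}^{(d+1-r_g^*) \times (d+1-r_g^*)}. \tag{A.215}
\end{align*}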
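Furthermore,
\begin{align*}
o_{3gp} - (\varrho_{gn} + o_{2gp})' \xi_{1jg}^{-1} (\varrho_{gn} + o_{2gp})
&= o_{3gp} - (o^*_{gp} + o_{2gp})' (h^{*-2}_{6,r_g^*} + o_{p^\pi}(1)) (o^*_{gp} + o_{2gp}) \\
&= o_{3gp} - o^{*\prime}_{gp} o^*_{gp} \\
&= (\tau_{r_{g+1} n}^2/\tau_{r_g n}^2) \, o_{(g+1)p}, \tag{A.216}
\end{align*}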
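where
1. the first equality follows from (A.210) and (A.213);
2. the second equality uses $o_{2gp} = o^*_{gp}$. The latter holds because the $(j,m)$ element of $o_{2gp}$, $j = 1, \dots, r_g^*-r^*_{g-1}$, $m = 1, \dots, d+1-r_g^*$, is
\[
o_{p^\pi}(\tau_{(r^*_{g-1}+j)n}\tau_{(r_g^*+m)n}/\tau_{r_g n}^2) + O_{p^\pi}((n^{1/2}\tau_{r_g n})^{-1}) = o_{p^\pi}(\tau_{(r_g^*+m)n}/\tau_{r_g n}) + O_{p^\pi}((n^{1/2}\tau_{r_g n})^{-1})
\]
since $r^*_{g-1}+j \ge r_g$, and $(h^{*-2}_{6,r_g^*} + o_{p^\pi}(1)) o^*_{gp} = o^*_{gp}$, which in turn holds because $h^*_{6,r_g^*}$ is diagonal and $\lambda_{\min}(h^{*2}_{6,r_g^*}) > 0$;
3. the third equality uses the fact that the $(j,m)$ element, $j,m = 1, \dots, d+1-r_g^*$, of $(\tau_{r_g n}^2/\tau_{r_{g+1} n}^2) o_{3gp}$ is of order
\begin{align*}
& o_{p^\pi}(\tau_{(r_g^*+j)n}\tau_{(r_g^*+m)n}/\tau_{r_g n}^2)(\tau_{r_g n}^2/\tau_{r_{g+1} n}^2) + O_{p^\pi}((n^{1/2}\tau_{r_g n})^{-1})(\tau_{r_g n}^2/\tau_{r_{g+1} n}^2) \\
&\quad = o_{p^\pi}(\tau_{(r_g^*+j)n}\tau_{(r_g^*+m)n}/\tau_{r_{g+1} n}^2) + O_{p^\pi}((n^{1/2}\tau_{r_{g+1} n})^{-1})(\tau_{r_g n}/\tau_{r_{g+1} n}),
\end{align*}
which is the same order as the $(j,m)$ element of $o_{(g+1)p}$ using $\tau_{r_g n}/\tau_{r_{g+1} n} \le 1$;
4. the third equality also uses the fact that the $(j,m)$ element, $j,m = 1, \dots, d+1-r_g^*$, of $(\tau_{r_g n}^2/\tau_{r_{g+1} n}^2) o^{*\prime}_{gp} o^*_{gp}$ is the sum of two terms that are of orders
\begin{align*}
o_{p^\pi}(\tau_{(r_g^*+j)n}\tau_{(r_g^*+m)n}/\tau_{r_g n}^2)(\tau_{r_g n}^2/\tau_{r_{g+1} n}^2) &= o_{p^\pi}(\tau_{(r_g^*+j)n}\tau_{(r_g^*+m)n}/\tau_{r_{g+1} n}^2), \\
O_{p^\pi}((n^{1/2}\tau_{r_g n})^{-2})(\tau_{r_g n}^2/\tau_{r_{g+1} n}^2) &= O_{p^\pi}((n^{1/2}\tau_{r_{g+1} n})^{-2}),
\end{align*}
respectively. Thus, $(\tau_{r_g n}^2/\tau_{r_{g+1} n}^2) o^{*\prime}_{gp} o^*_{gp}$ is $o_{(g+1)p}$.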
For $j = r_g^*+1, \dots, d+1$, since $E_{2jg} = o_{p^\pi}(1)$ and $E_{3jg} = o_{p^\pi}(1)$ by (A.204), $\xi_{1jg}^{-1} = O_{p^\pi}(1)$ by (A.213), $\varrho_{gn} + o_{2gp} = o_{p^\pi}(1)$ by (A.210) and $o^*_{gp} = o_{p^\pi}(1)$, and $(n^{1/2}\tau_{r_g n})^{-2}\kappa_j^{+} = o_{p^\pi}(1)$ by (A.209), it follows that
\[
E_{j(g+1)} = o_{p^\pi}(1). \tag{A.217}
\]
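Using (A.216) and (A.217) in (A.214),
\[
\left|\tau_{r_{g+1} n}^{-2} B^{+\prime}_{n,r_g^*,d+1} U_n^{+\prime} J^{+\prime} H' H J^{+} U_n^{+} B^{+}_{n,r_g^*,d+1} + o_{(g+1)p} - (n^{1/2}\tau_{r_{g+1} n})^{-2}\kappa_j^{+} (I_{d+1-r_g^*} + E_{j(g+1)})\right| = 0. \tag{A.218}
\]
Thus, wppa 1, $\{(n^{1/2}\tau_{r_{g+1} n})^{-2}\kappa_j^{+} : j = r_{g+1}, \dots, d+1\}$ solve
\[
\left|\tau_{r_{g+1} n}^{-2} B^{+\prime}_{n,r_g^*,d+1} U_n^{+\prime} J^{+\prime} H' H J^{+} U_n^{+} B^{+}_{n,r_g^*,d+1} + o_{(g+1)p} - \kappa (I_{d+1-r_g^*} + E_{j(g+1)})\right| = 0. \tag{A.219}
\]
This completes the induction step, and (A.204) holds for all $g = 1, \dots, G_h$.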