Weighted Sums and Residual Empirical Processes for
Time-varying Processes
Gabriel Chandler¹* and Wolfgang Polonik²
¹Department of Mathematics, Pomona College, Claremont, CA 91711
²Department of Statistics, University of California, Davis, CA 95616
E-mail: [email protected] and [email protected]
March 6, 2011
Abstract
Function indexed weighted sums and sequential residual empirical processes based on time-varying AR-processes are studied. It is shown that under appropriate assumptions non-parametric estimation of the parameter functions does not influence the asymptotic distribution of the residual empirical process. An exponential inequality for weighted sums of time-varying processes provides the basis for a weak convergence result of weighted sum processes. A motivation for studying these two types of processes is provided by the fact that, as shown in this paper, large sample results for both processes can be utilized to derive the asymptotic distribution of a test for modality of the variance function that is studied by the authors in an accompanying paper.
Keywords: Cumulants, empirical process theory, exponential inequality, locally stationary processes, nonstationary processes, investigating unimodality.
1 Introduction
Consider the following time-varying AR-model of the form
\[ Y_t \;-\; \sum_{k=1}^{p} \theta_k\big(\tfrac{t}{T}\big)\, Y_{t-k} \;=\; \sigma\big(\tfrac{t}{T}\big)\,\varepsilon_t, \qquad t = 1, \dots, T, \tag{1} \]
where θk are the autoregressive parameter functions, p is the order of the model, σ is a
function controlling the volatility, and εt ∼ (0, 1) i.i.d. Following Dahlhaus (1997), time is
rescaled to the interval [0, 1] in order to make a large sample analysis feasible. Observe that
this in particular means that Yt = Yt,T satisfying (1) in fact forms a triangular array.
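As a concrete illustration (ours, not part of the paper), the triangular array defined by (1) can be simulated directly. The sketch below assumes Gaussian innovations; the parameter functions and the burn-in device are hypothetical choices for this example.

```python
import numpy as np

def simulate_tvar(T, theta_funcs, sigma_func, seed=None, burn_in=200):
    """Simulate the time-varying AR model (1):
    Y_t - sum_k theta_k(t/T) Y_{t-k} = sigma(t/T) eps_t,
    with eps_t i.i.d. standard normal.  Time is rescaled to [0, 1]
    (Dahlhaus, 1997), so Y_t = Y_{t,T} is a triangular array in T."""
    rng = np.random.default_rng(seed)
    p = len(theta_funcs)
    n_total = T + burn_in
    Y = np.zeros(n_total)
    for t in range(n_total):
        # rescaled time; frozen at the left endpoint during burn-in
        u = min(max(t - burn_in + 1, 1) / T, 1.0)
        ar = sum(theta_funcs[k](u) * Y[t - k - 1]
                 for k in range(p) if t - k - 1 >= 0)
        Y[t] = ar + sigma_func(u) * rng.standard_normal()
    return Y[burn_in:]

# hypothetical tvAR(1): slowly varying coefficient, unimodal variance
theta1 = lambda u: 0.5 * np.cos(np.pi * u)       # stays inside the unit circle
sigma  = lambda u: 1.0 + np.exp(-10 * (u - 0.5) ** 2)
Y = simulate_tvar(500, [theta1], sigma, seed=0)
print(Y.shape)  # (500,)
```

Keeping the coefficient functions inside the unit circle (uniformly in $u$) mirrors the condition under which assumptions (i) and (ii) below are known to hold.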
The consideration of non-stationary time series models goes back to Priestley (1965), who
considered evolutionary spectra, i.e. spectra of time series evolving in time. The time-
varying AR-process has always been an important special case, either in more methodological
and theoretical considerations of non-stationary processes, or in applications such as signal
processing and (financial) econometrics; see, e.g., Subba Rao (1970), Grenier (1983), Hall et al. (1983), Rajan and Rayner (1996), Girault et al. (1998), Eom (1999), Drees and Starica (2002), Orbe et al. (2005), Fryzlewicz et al. (2006), and Chandler and Polonik (2006).
Dahlhaus (1997) advanced the formal analysis of time-varying processes by introducing the
notion of a locally stationary process. This is a time-varying process with time being
rescaled to [0, 1] that satisfies certain regularity assumptions (see (9 - 11) below for the case
of a time-varying AR-process). We would like to point out, however, that in this paper local
stationarity is only used to calculate the asymptotic covariance function in Theorem 2. All
the other results hold under weaker assumptions.
The interest of this paper is the investigation of the large sample behavior of two types of processes, residual sequential empirical processes and weighted sum processes, based on observations from the non-stationary process satisfying (1). One motivation for studying both of these processes is the fact that their large sample behavior enters the derivation of the limit distribution of a test statistic for modality of the variance function in model (1). This testing procedure is discussed in the accompanying paper Chandler and Polonik (2010); see Section 4.
Residual sequential empirical processes are defined as follows. Let $\mathbf Y_t = (Y_{t-1}, \dots, Y_{t-p})'$. Given an estimator $\widehat{\boldsymbol\theta}$ of $\boldsymbol\theta = (\theta_1, \dots, \theta_p)'$ and corresponding residuals $\widehat\eta_t = Y_t - \widehat{\boldsymbol\theta}\big(\tfrac{t}{T}\big)'\,\mathbf Y_t$, we estimate the distribution function of the innovations $\eta_t = \sigma\big(\tfrac{t}{T}\big)\varepsilon_t$ by the empirical distribution function of the residuals
\[ \widehat H_T(z) = \frac{1}{T}\sum_{t=1}^{T} \mathbf 1\{\widehat\eta_t \le z\}, \qquad z \in \mathbb R. \]
Observe that we can write $\widehat H_T(z) = \frac{1}{T}\sum_{t=1}^{T}\mathbf 1\big\{\sigma\big(\tfrac{t}{T}\big)\varepsilon_t \le (\boldsymbol\theta - \widehat{\boldsymbol\theta})'\big(\tfrac{t}{T}\big)\,\mathbf Y_t + z\big\}$. This motivates
to consider the following type of process. Let $[a,b] \subset [0,1]$, and let $Y_s$, $s = 1, \dots, n = \lfloor bT\rfloor - \lceil aT\rceil + 1$, denote the observations $Y_t$ with $\tfrac{t}{T} \in [a,b]$. Define
\[ \nu_n(\alpha, \mathbf g, z) = \frac{1}{\sqrt n}\sum_{s=1}^{\lfloor \alpha n\rfloor}\Big[ \mathbf 1\big\{\sigma\big(a + \tfrac{s}{T}\big)\varepsilon_s \le \mathbf g'\big(a + \tfrac{s}{T}\big)\mathbf Y_s + z\big\} - F_{a+\frac{s}{T}}\big(\mathbf g'\big(a + \tfrac{s}{T}\big)\mathbf Y_s + z\big)\Big], \tag{2} \]
where $\alpha \in [0,1]$, $z \in \mathbb R$, $\mathbf g : [a,b] \to \mathbb R^p$, $\mathbf g \in \mathcal G^p = \{\mathbf g = (g_1,\dots,g_p)',\ g_i \in \mathcal G\}$ with $\mathcal G$ an appropriate function class such that $\boldsymbol\theta - \widehat{\boldsymbol\theta} \in \mathcal G^p$ (see below), and $F_u(z)$ denotes the distribution function of $\sigma(u)\varepsilon_t$, $u \in [0,1]$. The reason for considering a subinterval $[a,b] \subset [0,1]$ is again our application of the results to testing for modality of the
variance function. Observe that with $\mathbf 0 = (0,\dots,0)' \in \mathcal G^p$ denoting the $p$-vector of null functions, $\nu_n(\alpha, \mathbf 0, z)$ equals the empirical process based on the innovations rather than the residuals. The basic form of $\nu_n$ is standard, but in contrast to most of the related literature we consider a non-parametric index class $\mathcal G$, and of course $Y_t$ is non-stationary here. Letting $F_n(z)$ denote the empirical distribution function of the innovations, our results will in particular imply that under appropriate assumptions $\sup_{z\in\mathbb R}|\widehat H_n(z) - F_n(z)| = o_P(T^{-1/2})$, even though non-parametric estimation of the parameter functions $\theta_k$ is involved.
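To make the residuals and their empirical distribution function concrete, here is a minimal numerical sketch of our own (not from the paper): a tvAR(1) is simulated, and the true parameter function stands in for a nonparametric estimator $\widehat\theta$, so the residuals coincide with the innovations.

```python
import numpy as np

def residual_edf(Y, theta_hat_funcs):
    """Empirical distribution function of the residuals
    eta_hat_t = Y_t - theta_hat(t/T)' (Y_{t-1}, ..., Y_{t-p})'.
    Returns a function z -> H_T(z)."""
    T = len(Y)
    p = len(theta_hat_funcs)
    eta = np.array([
        Y[t] - sum(theta_hat_funcs[k](t / T) * Y[t - k - 1] for k in range(p))
        for t in range(p, T)
    ])
    return lambda z: np.mean(eta <= z)

# hypothetical tvAR(1) with slowly varying coefficient theta1(u) = 0.5*u
rng = np.random.default_rng(1)
theta1 = lambda u: 0.5 * u
T = 1000
Y = np.zeros(T)
for t in range(1, T):
    Y[t] = theta1(t / T) * Y[t - 1] + rng.standard_normal()

H = residual_edf(Y, [theta1])
print(round(H(0.0), 2))   # close to 0.5 for symmetric innovations
```

With an estimated (rather than true) parameter function, Theorem 1 below implies that this empirical distribution function is asymptotically as good as the one based on the unobserved innovations.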
The second type of process considered here are weighted sum processes of the form
\[ Z_n(h) = \frac{1}{\sqrt n}\sum_{s=1}^{n} h\big(\tfrac{s}{n}\big)\, Y_s, \qquad h \in \mathcal H, \]
where $\mathcal H$ is an appropriate function class. Such processes can be considered as generalizations of partial sum processes. Large sample behavior, including exponential inequalities and weak convergence of $Z_n$, is considered below.
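For intuition (an illustration of ours, not part of the paper), $Z_n(h)$ is a one-line computation, and the classical partial sums arise from indicator weights $h_\alpha(u) = \mathbf 1(u \le \alpha)$:

```python
import numpy as np

def weighted_sum_process(Y, h):
    """Z_n(h) = n^{-1/2} sum_{s=1}^n h(s/n) Y_s -- a function-indexed
    generalization of the partial-sum process."""
    n = len(Y)
    s = np.arange(1, n + 1)
    return np.sum(h(s / n) * Y) / np.sqrt(n)

rng = np.random.default_rng(2)
Y = rng.standard_normal(400)
# constant weight h = 1 gives the ordinary normalized sum
Z_full = weighted_sum_process(Y, lambda u: np.ones_like(u))
# indicator weight recovers the partial sum up to alpha = 1/2
Z_half = weighted_sum_process(Y, lambda u: (u <= 0.5).astype(float))
print(round(float(Z_full), 3), round(float(Z_half), 3))
```

Here `Z_half` equals the normalized sum of the first half of the observations, illustrating why weighted sums indexed by a function class subsume sequential partial-sum processes.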
The obtained results will be applied to derive the large sample distribution of a test for modality of the variance function $\sigma^2(\cdot)$ that is proposed in the accompanying paper Chandler and Polonik (2010). In fact, the corresponding test statistic essentially is a functional of $\alpha \mapsto \nu_n(\alpha, \boldsymbol\theta - \widehat{\boldsymbol\theta}, \widehat q_\gamma)$ plus a term that can be approximated by a weighted sum, where $\widehat q_\gamma$ is an estimate of an appropriate $\gamma$-quantile. It turns out that the test statistic is asymptotically distribution free, again despite the non-parametric estimation of the parameter functions. (For more details see below.)
The outline of the paper is as follows. In sections 2 and 3 we analyze the large sample
behavior of the function indexed residual empirical process and weighted sums, respectively,
under the time varying model (1). Section 4 applies some of the results from the preceding sections to derive a crucial approximation result for a quantity based on the residual empirical process. This approximation provides a key ingredient in Chandler and Polonik (2010) to derive the asymptotically distribution-free limit of a test statistic for testing modality of the variance function in model (1). Proofs are deferred to Section 5.
Remark on measurability. Suprema of function indexed processes will enter the theo-
retical results below. We assume throughout the paper that such suprema are measurable.
Otherwise statements “in probability” have to be replaced by “outer” or “inner” probability,
respectively.
2 Residual empirical processes under time varying AR-models
There exists a substantial body of work on residual empirical processes indexed by a finite
dimensional parameter. For the time series setting see for instance Horváth et al. (2001), Stute (2001), Koul (2002), Koul and Ling (2006), Laïb et al. (2008), Müller et al. (2009),
and references therein. There is not much work available on residual empirical processes for
models involving infinite dimensional parameter spaces. Except for Müller et al. (2009), who consider the estimation of the innovation distribution in a nonparametric autoregression model, we are only aware of Akritas and van Keilegom (2001) and Cheng (2005). Akritas and van Keilegom consider the estimation of the error distribution in a heteroscedastic non-parametric regression model, and Cheng (2005) estimates the distribution and density function of the errors in a nonparametric (homoscedastic) regression model utilizing a sample splitting technique. Here we are considering function indexed residual empirical processes
based on non-stationary time series.
In order to formulate one of our main results for the residual empirical process νn(α,g, z)
defined in (2) above, we first introduce some notation. For a function $h : [a,b] \to \mathbb R$ we denote
\[ \|h\|_\infty := \sup_{u\in[a,b]} |h(u)| \qquad\text{and}\qquad \|h\|_n^2 := \frac{1}{n}\sum_{t\in[aT,\,bT]} h^2\big(\tfrac{t}{T}\big). \]
Let $\mathcal H$ denote a class of functions defined on $[a,b]$. For a given $\delta > 0$, let $N(\delta, \mathcal H)$ denote the minimal number $N$ of $\|\cdot\|_n$-balls of radius less than or equal to $\delta$ that are needed to cover $\mathcal H$; i.e., there exist functions $g_k$, $k = 1, \dots, N$, such that the balls $A_k = \{h \in \mathcal H : \|g_k - h\|_n \le \delta\}$ satisfy $\mathcal H \subset \bigcup_{k=1}^{N} A_k$. Then $\log N(\delta, \mathcal H)$ is called the metric entropy of $\mathcal H$ with respect to $\|\cdot\|_n$. If the balls $A_k$ are replaced by brackets $B_k = \{h \in \mathcal H : \underline g_k \le h \le \overline g_k\}$ for pairs of functions $\underline g_k \le \overline g_k$, $k = 1, \dots, N$, with $\|\overline g_k - \underline g_k\|_n \le \delta$, then the minimal number $N = N_B(\delta, \mathcal H)$ of such brackets with $\mathcal H \subset \bigcup_{k=1}^{N} B_k$ is called a bracketing covering number, and $\log N_B(\delta, \mathcal H)$ is called the metric entropy with bracketing of $\mathcal H$ with respect to $\|\cdot\|_n$.
Assumptions. (i) The process $Y_t = Y_{t,T}$ has an MA-type representation
\[ Y_{t,T} = \sum_{j=0}^{\infty} a_{t,T}(j)\,\varepsilon_{t-j}, \]
where the $\varepsilon_t$ are i.i.d. with mean zero and variance one. The distribution $F$ of $\varepsilon_t$ has a strictly positive Lipschitz continuous Lebesgue density $f$. The function $\sigma(u)$ in (1) is of bounded variation with $0 < m_* < \sigma(u) < m^* < \infty$ for all $u$.
(ii) The coefficients $a_{t,T}(\cdot)$ of the MA-type representation of $Y_{t,T}$ given in (i) satisfy
\[ \sup_{1\le t\le T} |a_{t,T}(j)| \le \frac{K}{\ell(j)} \]
for some $K > 0$, where, for some $\kappa > 0$, $\ell(j) = j(\log j)^{1+\kappa}$ for $j > 1$ and $\ell(j) = 1$ for $j = 0, 1$.
(iii) There exists a class $\mathcal G$ with
\[ \widehat\theta_k(\cdot) - \theta_k(\cdot) \in \mathcal G, \qquad k = 1, \dots, p, \]
such that $\sup_{g\in\mathcal G}\|g\|_\infty < \infty$ and, for some $\gamma \in [0,1)$, $C, c > 0$, and all $\delta > \tfrac{c}{n}$,
\[ \log N_B(\delta, \mathcal G) \le \begin{cases} C\,\delta^{-\gamma}, & \text{for } 0 < \gamma < 1,\\[2pt] C\,\log(\delta^{-1}), & \text{for } \gamma = 0. \end{cases} \]
(iv) For $k = 1, \dots, p$ we have $\frac{1}{T}\sum_{t=1}^{T}\big(\widehat\theta_k\big(\tfrac tT\big) - \theta_k\big(\tfrac tT\big)\big)^2 = O_P(m_n^{-1})$ with $m_n \to \infty$ as $T \to \infty$.
Theorem 1 Suppose that assumptions (i) and (ii) hold, and that for some $c > 0$ and $n$ sufficiently large $\mathcal G$ satisfies $\int_{c/n}^{1}\sqrt{\log N_B(u^2, \mathcal G)}\,du < \infty$. Then we have for any $L > 0$ and $\delta_n \to 0$ as $n \to \infty$ that
\[ \sup_{\substack{\alpha\in[0,1],\ z\in[-L,L],\ \mathbf g\in\mathcal G^p\\ \|\mathbf g\|_n \le \delta_n}} \big|\nu_n(\alpha, z, \mathbf g) - \nu_n(\alpha, z, \mathbf 0)\big| = o_P(1). \tag{3} \]
The above type of result is typical for work on residual empirical processes (e.g. (8.2.32) in Koul 2002). However, as mentioned above, compared to the existing work we are dealing with non-stationary observations, and we are considering an index set $\mathcal G$ that is a non-parametric class of functions. Also keep in mind that we are in fact considering triangular arrays (recall that $Y_t = Y_{t,T}$).
Discussion of the assumptions. Assumptions (i) and (ii) have been used in the literature
on locally stationary processes (e.g. Dahlhaus and Polonik, 2006, 2009). It is shown in
Dahlhaus and Polonik (2006) that (i) and (ii) (and also the additional assumptions (9) -
(11) needed for Theorem 2 below) hold for time-varying AR-processes (1) and, more generally, for time-varying ARMA-models as long as the zeros of the corresponding AR-polynomials
are bounded away from the unit circle (uniformly in the rescaled time u) and the parameter
functions are of bounded variation.
Assumption (iii) controls the complexity of the class $\mathcal G$. In fact, this assumption implies that for some $c > 0$ we have
\[ \int_{c/n}^{1}\sqrt{\log N_B(u^2, \mathcal G)}\,du < \infty \qquad\text{and}\qquad \int_{c/n}^{1}\log N(u, \mathcal G)\,du < \infty. \]
Both of these conditions are used below. Notice that the standard condition on the covering integral is $\int_0^1\sqrt{\log N_B(u, \mathcal G)}\,du < \infty$ (or similarly without bracketing). In contrast to that, our first condition uses $N_B(u^2, \mathcal G)$ (rather than $N_B(u, \mathcal G)$) in the integrand, and the second does not have a square root. This makes both of our conditions stronger than the standard assumption. The reason for this is that the exponential inequality underlying the derivations of our results is not of sub-Gaussian type (see Lemma 3).
A class of non-parametric estimators satisfying conditions (iii) and (iv) is given by the wavelet estimators of Dahlhaus et al. (1999). These estimators lie in the Besov smoothness class $B^s_{p,q}(C)$, where the smoothness parameters satisfy $s + \tfrac12 - \tfrac{1}{\max(2,p)} > 1$. The constant $C > 0$ is a uniform bound on the (Besov) norm of the functions in the class. Dahlhaus et al. derive conditions under which their estimators converge at rate $\big(\tfrac{\log n}{n}\big)^{s/(2s+1)}$ in the $L_2$-norm. For $s \ge 1$ the functions in $B^s_{p,q}(C)$ have uniformly bounded total variation. Assuming that the model parameter functions also possess this property, the rate of convergence in the $\|\cdot\|_n$-norm is the same as that in the $L_2$-norm, because in this case the error in approximating the integral by the average over equidistant points is of order $O(n^{-1})$. Consequently, in this case we have $m_n^{-1} = \big(\tfrac{\log n}{n}\big)^{s/(2s+1)}$. In order to verify the condition on the bracketing covering numbers from (iii), we use Nickl and Pötscher (2007). Their Corollary 1, applied with $s = 2$, $p = q = 2$, implies that the bracketing entropy with respect to the $L_2$-norm can be bounded by $C\,\delta^{-1/2}$. (When applying their Corollary to our situation choose, in their notation, $\beta = 0$, $\mu = U[0,1]$, $r = 2$ and $\gamma = 2$, say.)
The proof of Theorem 1 rests on the following lemma, which is of independent interest. It is modeled after similar results for empirical processes (see van de Geer, 2000, Theorem 5.11). Let $H = [0,1] \times [-L,L] \times \mathcal G^p$ denote the index space of the process $\nu_n$, where $L > 0$ is a constant. Define a metric on $H$ as
\[ d(h_1, h_2) = d\big((\alpha_1, z_1, \mathbf g_1), (\alpha_2, z_2, \mathbf g_2)\big) = |\alpha_1 - \alpha_2| + |z_1 - z_2| + \sum_{k=1}^{p}\|g_{1,k} - g_{2,k}\|_n. \tag{4} \]
Let DB(ε,H) denote the bracketing covering number of H with respect to the metric d.
Lemma 1 For $C_0 > 0$ let $\mathcal F_n = \big\{\frac{1}{n}\sum_{s=1-p}^{n} Y_s^2 \le C_0^2\big\}$ and define $K^* = 1 + (1 + pC_0)\,\frac{\|f\|_\infty}{m_*}$. Suppose that $H$ is totally bounded with respect to $d$, that assumptions (i) and (ii) hold, and that for $C_1 > 0$
\[ \eta \ \ge\ \frac{26\,K^*}{\sqrt n}, \tag{5} \]
\[ \eta \ \le\ \frac{1}{2K^*}\,\sqrt n\,(\tau^2 \wedge \tau), \tag{6} \]
\[ \eta \ \ge\ C_1\Big(\int_{\eta/(28 K^*\sqrt n)}^{\tau}\sqrt{\log D_B(u^2, H)}\,du \ \vee\ \tau\Big). \tag{7} \]
Then, for $C_1 \ge 26\sqrt{10\,K^*}$, we have
\[ P\Big[\sup_{d(h_1,h_2)\le \tau^2}\big|\nu_n(h_1) - \nu_n(h_2)\big| \ge \eta,\ \mathcal F_n\Big] \ \le\ \Big(\frac{26(26+1)\,K^*}{C_1^2} + 2\Big)\exp\Big(-\frac{\eta^2}{26(26+1)\,K^*\,\tau^2}\Big), \]
where the supremum extends over $h_1, h_2 \in H$.
3 Weighted sums under local stationarity
As discussed above, the second type of process of importance in our context are weighted partial sums of locally stationary processes, given by
\[ Z_n(h) = \frac{1}{\sqrt n}\sum_{s=1}^{n} h\big(a + \tfrac{s}{T}\big)\, Y_s, \qquad h \in \mathcal H. \tag{8} \]
In the iid case weighted sums have received some attention in the literature. For functional
central limit theorems and exponential inequalities see for instance Alexander and Pyke
(1986), van de Geer (2000), and references therein.
We will show below that under appropriate assumptions, $Z_n(h)$ converges weakly to a Gaussian process. In order to calculate the covariance function of the limit we assume that the
process Yt is locally stationary as in Dahlhaus and Polonik (2009). This means that we
assume the existence of functions $a(\cdot, j) : (0,1] \to \mathbb R$ with
\[ \sup_u |a(u,j)| \le \frac{K}{\ell(j)}, \tag{9} \]
\[ \sup_j \sum_{t=1}^{n} \Big|a_{t,T}(j) - a\big(\tfrac tT, j\big)\Big| \le K, \tag{10} \]
\[ TV\big(a(\cdot, j)\big) \le \frac{K}{\ell(j)}, \tag{11} \]
where for a function $g : [0,1] \to \mathbb R$ we denote by $TV(g)$ the total variation of $g$ on $[0,1]$. Further we define the time varying spectral density as the function
\[ f(u, \lambda) := \frac{1}{2\pi}\,|A(u,\lambda)|^2 \qquad\text{with}\qquad A(u,\lambda) := \sum_{j=-\infty}^{\infty} a(u,j)\exp(-i\lambda j), \]
and
\[ c(u,k) := \int_{-\pi}^{\pi} f(u,\lambda)\exp(i\lambda k)\,d\lambda = \sum_{j=-\infty}^{\infty} a(u, k+j)\,a(u,j) \tag{12} \]
is the time varying covariance of lag $k$ at rescaled time $u \in [0,1]$.
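To illustrate (12) with a sketch of our own, consider a tvAR(1) with coefficient function $\theta(u)$ and scale $\sigma(u)$; under the stationary approximation at rescaled time $u$ the local MA coefficients are $a(u,j) = \sigma(u)\theta(u)^j$ (an assumption for this example, not a statement from the paper), which yields the closed forms $c(u,k) = \sigma^2(u)\theta(u)^{|k|}/(1-\theta(u)^2)$ and $S(u) = \sum_k c(u,k) = \sigma^2(u)/(1-\theta(u))^2$:

```python
import numpy as np

def c_tvar1(u, k, theta, sigma, jmax=2000):
    """Time-varying covariance (12) for a tvAR(1), computed from the
    (assumed) local MA coefficients a(u, j) = sigma(u) * theta(u)**j."""
    j = np.arange(jmax)
    a = sigma(u) * theta(u) ** j
    # sum_j a(u, |k| + j) * a(u, j), truncated at jmax
    return np.sum(a[abs(k):] * a[: jmax - abs(k)])

theta = lambda u: 0.3 + 0.4 * u          # hypothetical coefficient function
sigma = lambda u: 1.0 + u                # hypothetical scale function
u = 0.5                                  # theta(u) = 0.5, sigma(u) = 1.5
c0 = c_tvar1(u, 0, theta, sigma)
print(round(c0, 4), round(sigma(u) ** 2 / (1 - theta(u) ** 2), 4))  # 3.0 3.0
S = sum(c_tvar1(u, k, theta, sigma) for k in range(-50, 51))
print(round(S, 4), round(sigma(u) ** 2 / (1 - theta(u)) ** 2, 4))   # 9.0 9.0
```

The quantity $S(u)$ computed here is exactly the local long-run variance appearing in the covariance function of the limit process in Theorem 2 below.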
Theorem 2 Let $\mathcal H$ denote a class of uniformly bounded, real valued functions of bounded variation defined on $[a,b]$. Assume further that for some $c > 0$, $\int_{c/n}^{1}\log N(u, \mathcal H)\,du < \infty$. Then we have under assumptions (i), (ii) and (9)–(11) that as $n \to \infty$ the process $Z_n(h)$, $h \in \mathcal H$, converges weakly to a tight, mean zero Gaussian process $\{G(h),\ h \in \mathcal H\}$ with variance-covariance function $C(h_1, h_2) = \frac{1}{b-a}\int_a^b h_1(u)\,h_2(u)\,S(u)\,du$, where $S(u) = \sum_{k=-\infty}^{\infty} c(u,k)$.
Remarks. a) Here weak convergence is meant in the sense of Hoffmann-Jørgensen; see van der Vaart and Wellner (1996) for more details.
b) Partial weighted sums of the form
\[ Z_n(\alpha, h) = \frac{1}{\sqrt n}\sum_{s=1}^{\lfloor \alpha n\rfloor} h\big(a + \tfrac sT\big)\, Y_s \]
are in fact a special case of processes considered in the theorem. This can be seen by writing $Z_n(\alpha, h) = \frac{1}{\sqrt n}\sum_{s=1}^{n} h_\alpha\big(a + \tfrac sT\big) Y_s$ with $h_\alpha(u) = \mathbf 1(a \le u \le a + \alpha(b-a))\,h(u)$. In other words, the partial weighted sums are weighted sums indexed by the slightly modified class of functions $\overline{\mathcal H} = \{h_\alpha = \mathbf 1(a \le u \le a + \alpha(b-a))\,h(u),\ \alpha \in [0,1],\ h \in \mathcal H\}$. If the class $\mathcal H$ satisfies the assumptions on the covering integral from the above theorem, then so does $\overline{\mathcal H}$. The limit covariance can then be written as $C(h_{1,\alpha}, h_{2,\beta}) = \frac{1}{b-a}\int_a^{a+(\alpha\wedge\beta)(b-a)} h_1(u)\,h_2(u)\,S(u)\,du$.
c) Assumptions (9)–(11) are only used for calculating the covariance function of the limit process.
The main ingredients to the proof of this theorem are presented in the following results, the
proofs of which are given in the appendix. These results are of independent interest.
Theorem 3 Let $\{Y_t,\ t = 1, \dots, T\}$ satisfy assumptions (i) and (ii) and let $\mathcal H = \{h : [a,b] \to \mathbb R\}$ be totally bounded with respect to $\|\cdot\|_n$. Further assume that there exists a constant $C > 0$ such that for all $k = 1, 2, \dots$ we have $E|\varepsilon_s|^k \le C^k$, and let $\mathcal F_n = \big\{\frac1n\sum_{s=1}^{n} Y_s^2 \le M^2\big\}$, where $M > 0$. There exist constants $c_0, c_1, c_2 > 0$ such that for all $\eta > 0$ satisfying
\[ \eta < 16\,M\sqrt n\,\tau \tag{13} \]
and
\[ \eta > c_0\Big(\int_{\eta/(8M\sqrt n)}^{\tau}\log N(u, \mathcal H)\,du \ \vee\ \tau\Big) \tag{14} \]
we have
\[ P\Big[\sup_{h\in\mathcal H,\ \|h\|_n\le\tau}|Z_n(h)| > \eta,\ \mathcal F_n\Big] \le c_1\exp\Big\{-\frac{\eta}{c_2\,\tau}\Big\}. \]
The second result of importance for the proof of Theorem 2 deals with cumulants. For random variables $X_1, \dots, X_k$ we denote by $\mathrm{cum}(X_1, \dots, X_k)$ their joint cumulant, and if $X_i = X$ for all $i = 1, \dots, k$, then $\mathrm{cum}(X_1, \dots, X_k) = \mathrm{cum}(X, \dots, X) = \mathrm{cum}_k(X)$, the $k$-th order cumulant of $X$.
Lemma 2 Let $\{Y_t,\ t = 1, \dots, T\}$ have an MA-type representation given in assumption (i) with coefficients satisfying assumption (ii). For $j = 1, 2, \dots$ let $h_j$ be functions defined on $[a,b]$ with $\|h_j\|_n < \infty$. Then there exists a constant $1 \le K_0 < \infty$ such that for all $k \ge 1$,
\[ \big|\mathrm{cum}(Z_n(h_1), \dots, Z_n(h_k))\big| \le K_0^{k-1}\,\big|\mathrm{cum}_k(\varepsilon_1)\big|\prod_{j=1}^{k}\|h_j\|_n. \]
If, in addition, $\|h_j\|_\infty \le M < \infty$, $j = 1, \dots, k$, then for $k \ge 3$,
\[ \big|\mathrm{cum}(Z_n(h_1), \dots, Z_n(h_k))\big| \le (K_0 M)^{k-2}\, n^{-\frac{k-2}{2}}. \]
The behavior of the cumulants given in the above lemma is needed for the following crucial
exponential inequality.
Lemma 3 Suppose the assumptions of Lemma 2 hold. Let $h$ be a function with $\|h\|_n < \infty$. Assume that there exists a constant $C > 0$ such that for all $m = 1, 2, \dots$ we have $E|\varepsilon_s|^m \le C^m$. Then there exist constants $c_1, c_2 > 0$ such that for any $\eta > 0$ we have
\[ P\big[|Z_n(h)| > \eta\big] \le c_1\exp\Big\{-\frac{\eta}{c_2\,\|h\|_n}\Big\}. \tag{15} \]
4 An application: Asymptotic distribution of a test for modality of the variance function
In an accompanying paper, Chandler and Polonik (2010), we propose a test for modality of the variance function in model (1). The test statistic is essentially given by $T_n = \sup_{\alpha\in[0,1]}\big(\widehat G_{n,\gamma}(\alpha) - \alpha\gamma\big)$, where
\[ \widehat G_{n,\gamma}(\alpha) = \frac{1}{n}\sum_{t=\lceil aT\rceil}^{\lceil aT\rceil + \lfloor \alpha n\rfloor - 1} \mathbf 1\big(\widehat\eta_t^{\,2} \ge \widehat q_\gamma^{\,2}\big), \qquad \alpha \in [0,1], \]
and $\widehat q_\gamma$ is the empirical quantile of the squared residuals on the interval $[a,b] \subset [0,1]$ under consideration. Notice that $\widehat G_{n,\gamma}(\alpha)$ counts the number of large residuals within the first $(100\times\alpha)\%$ of the interval $[a,b]$. If the variance function is constant then, since one has a total of $\lfloor n\gamma\rfloor$ large residuals, the expected value of $n\,\widehat G_{n,\gamma}(\alpha)$ approximately equals $\alpha n\gamma$. This motivates the form of $T_n$. We assume the interval $[a,b]$ to be fixed. For conducting the test for modality, using an appropriate interval is crucial, and Chandler and Polonik (2010) provide an estimator for the interval of interest and show that this estimation does not influence the asymptotic limit of the test statistic.
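The construction of $T_n$ can be written out in a few lines. The sketch below is our own simplification: it works directly with a given residual sequence on the whole interval (so $[a,b] = [0,1]$) rather than with a fitted tvAR model.

```python
import numpy as np

def modality_test_stat(eta_hat, gamma=0.5):
    """T_n = sup_alpha ( G_{n,gamma}(alpha) - alpha * gamma ), where
    G_{n,gamma}(alpha) is the fraction (out of n) of 'large' squared
    residuals among the first floor(alpha * n) observations; 'large'
    means exceeding the empirical upper-gamma-quantile of eta_hat**2."""
    n = len(eta_hat)
    sq = np.asarray(eta_hat) ** 2
    q2 = np.quantile(sq, 1.0 - gamma)      # upper gamma-quantile of eta^2
    G = np.cumsum(sq >= q2) / n            # G(alpha) on the grid alpha = s/n
    alphas = np.arange(1, n + 1) / n
    return np.max(G - alphas * gamma)

# under a constant variance the statistic is small (of order n^{-1/2})
rng = np.random.default_rng(3)
Tn = modality_test_stat(rng.standard_normal(500))
print(round(float(Tn), 3))
```

If the variance function is unimodal instead of constant, the large residuals cluster, $G_{n,\gamma}$ deviates from the diagonal $\alpha\gamma$, and $T_n$ tends to be large; this is the heuristic behind the test.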
Notice that $\widehat G_{n,\gamma}(\alpha)$ is closely related to the sequential residual empirical process, and as can be seen from the proof of Theorem 4, weighted empirical processes enter the analysis of $\widehat G_{n,\gamma}(\alpha)$ through handling the estimation of $\widehat q_\gamma$. Theorem 4 below provides an approximation of the test statistic $T_n$ by independent (but not necessarily identically distributed) random variables. This result crucially enters the proofs in Chandler and Polonik (2010). In particular, it implies that the large sample behavior of the test statistic $T_n$ under the null hypothesis is not influenced by the in general non-parametric estimation of the parameter functions, as long as the rate of convergence of these estimators is sufficiently fast. This is somewhat of a surprise, and it is connected to the particular structure of our model.
First we introduce some additional notation. Let $f_u$ denote the pdf of $\sigma(u)\varepsilon_t$, i.e. $f_u(z) = \frac{1}{\sigma(u)}f\big(\frac{z}{\sigma(u)}\big)$, and
\[ G_{n,\gamma}(\alpha) = \frac{1}{n}\sum_{t=\lceil aT\rceil}^{\lceil aT\rceil+\lfloor\alpha n\rfloor-1} \mathbf 1\Big(\varepsilon_t^2\,\sigma^2\big(\tfrac tT\big) \ge q_\gamma^2\Big), \]
where $q_\gamma$ is defined via
\[ \Psi_{a,b}(z) = \int_0^1 F\Big(\frac{z}{\sigma(a+\beta(b-a))}\Big)\,d\beta - \int_0^1 F\Big(\frac{-z}{\sigma(a+\beta(b-a))}\Big)\,d\beta, \qquad z \ge 0, \]
as the solution to the equation
\[ \Psi_{a,b}(q_\gamma) = 1 - \gamma. \tag{16} \]
Notice that this solution is unique since we have assumed $F$ to be strictly monotonic, and if $\sigma^2(u) = \sigma_0^2$ is constant for all $u \in [a,b]$, then $q_\gamma^2$ equals the upper $\gamma$-quantile of the squared innovations $\eta_t^2 = \sigma_0^2\,\varepsilon_t^2$. The following approximation result does not assume that the variance is constant, however.
Theorem 4 Let $\gamma \in [0,1]$ and suppose that $0 \le a < b \le 1$ are non-random. Assume further that $E|\varepsilon_t|^k \le C^k$ for some $C > 0$ and all $k = 1, 2, \dots$. Then, under assumptions (i)–(iv), with $\frac{n^{1/2}}{m_n^2\log n} = o(1)$, we have
\[ \sqrt n\,\sup_{\alpha\in[0,1]}\Big|\,\widehat G_{n,\gamma}(\alpha) - G_{n,\gamma}(\alpha) + c(\alpha)\big(G_{n,\gamma}(1) - E\,G_{n,\gamma}(1)\big)\Big| = o_P(1), \tag{17} \]
where
\[ c(\alpha) = \frac{\int_a^{a+\alpha(b-a)}\big[f_u(q_\gamma) + f_u(-q_\gamma)\big]\,du}{\int_a^{b}\big[f_u(q_\gamma) + f_u(-q_\gamma)\big]\,du}. \]
Under the null hypothesis $\sigma(u) \equiv \sigma_0 > 0$ for $u \in [a,b]$ we have $c(\alpha) = \alpha$. Moreover, in case the AR-parameters in model (1) are constant and $\sqrt n$-consistent estimators are used, the moment assumptions on the innovations can be significantly relaxed to $E\varepsilon_t^2 < \infty$.
Under the worst-case null hypothesis $\sigma^2(u) = \sigma_0^2$ for all $u \in [0,1]$, the innovations are i.i.d., we have $c(\alpha) = \alpha$, and $E\,G_{n,\gamma}(\alpha) = \alpha\gamma$. Therefore the above result implies that $(\gamma(1-\gamma))^{-1/2}\sqrt n\,\big(\widehat G_{n,\gamma}(\alpha) - \alpha\gamma\big)$ converges weakly to a standard Brownian bridge. For more details we refer to Chandler and Polonik (2010).
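A quick Monte Carlo sanity check of this distribution-free limit (our own illustration, using the true i.i.d. innovations under constant variance): the one-sided supremum of a standard Brownian bridge has mean $\sqrt{\pi/8} \approx 0.63$, and the scaled supremum statistic should be of comparable size.

```python
import numpy as np

def scaled_sup(n, gamma, rng):
    """sup_alpha sqrt(n) (G_{n,gamma}(alpha) - alpha*gamma) / sqrt(gamma(1-gamma))
    computed from i.i.d. standard normal innovations (constant-variance null)."""
    sq = rng.standard_normal(n) ** 2
    q2 = np.quantile(sq, 1.0 - gamma)      # empirical upper gamma-quantile
    G = np.cumsum(sq >= q2) / n
    alphas = np.arange(1, n + 1) / n
    return np.sqrt(n) * np.max(G - alphas * gamma) / np.sqrt(gamma * (1 - gamma))

rng = np.random.default_rng(4)
sims = np.array([scaled_sup(400, 0.5, rng) for _ in range(1000)])
print(round(float(sims.mean()), 2))   # near E[sup of Brownian bridge] = sqrt(pi/8)
```

This is only a heuristic check on the one-dimensional supremum functional, not a substitute for the weak convergence statement itself.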
5 Proofs
5.1 Proof of Lemma 1
We only present a brief outline. In order to simplify the notation we again assume w.l.o.g. that $a = 0$. First notice that $\nu_n(\alpha, z, \mathbf g)$ is a sum of bounded martingale differences, because with $\xi_s^{z,\mathbf g} = \mathbf 1\big(\sigma\big(a+\tfrac sT\big)\varepsilon_s \le \mathbf g'\big(a+\tfrac sT\big)\mathbf Y_{s-1} + z\big)$ we have
\[ \nu_n(\alpha, z, \mathbf g) = \frac{1}{\sqrt n}\sum_{s=1}^{n} \overline\xi_s^{\,z,\mathbf g}\,\mathbf 1\big(\tfrac sT \le \alpha b\big), \]
where $\overline\xi_s^{\,z,\mathbf g} = \xi_s^{z,\mathbf g} - E(\xi_s^{z,\mathbf g}\,|\,\mathcal F_{s-1})$ and $\mathcal F_s = \sigma(\varepsilon_s, \varepsilon_{s-1}, \dots)$ denotes the $\sigma$-algebra generated by $\{\varepsilon_s, \varepsilon_{s-1}, \dots\}$; obviously $\nu_n(\alpha_1, z_1, \mathbf g_1) - \nu_n(\alpha_2, z_2, \mathbf g_2)$ is also a sum of martingale differences. The proof of this lemma is based on the basic chaining device that is well-known in empirical process theory, utilizing the following exponential inequality for sums of bounded martingale differences from Freedman (1975).
Lemma. (Freedman 1975) Let $Z_1, \dots, Z_T$ denote martingale differences with respect to a filtration $\{\mathcal F_t,\ t = 0, \dots, T-1\}$ with $|Z_t| \le C$ for all $t = 1, \dots, T$. Let further $S_n = \frac{1}{\sqrt T}\sum_{t=1}^{T} Z_t$ and $V_n = V_n(S_n) = \frac{1}{T}\sum_{t=1}^{T} E(Z_t^2\,|\,\mathcal F_{t-1})$. Then we have for all $\varepsilon, \tau^2 > 0$ that
\[ P\big(S_n \ge \varepsilon,\ V_n \le \tau^2\big) \le \exp\Big(-\frac{\varepsilon^2}{2\tau^2 + 2\varepsilon C/\sqrt n}\Big). \tag{18} \]
The form of (18) shows that we first need to control the quadratic variation $V_n$. Let $\eta_s^{\alpha,z,\mathbf g} = \overline\xi_s^{\,z,\mathbf g}\,\mathbf 1\big(\tfrac sT \le \alpha b\big)$.
We have for $h_1 = (\alpha_1, z_1, \mathbf g_1),\ h_2 = (\alpha_2, z_2, \mathbf g_2) \in H$ with $d(h_1, h_2) \le \varepsilon$ that
\begin{align*}
V_n = V_n\big(\nu_n(h_1) - \nu_n(h_2)\big) &= \frac1n\sum_{s=1}^{n} E\big(\big|\eta_s^{\alpha_1,z_1,\mathbf g_1} - \eta_s^{\alpha_2,z_2,\mathbf g_2}\big|\,\big|\,\mathcal F_{s-1}\big)\\
&\le \frac1n\sum_{s=1}^{n}\big|\mathbf 1\big(\tfrac sT \le \alpha_1 b\big) - \mathbf 1\big(\tfrac sT \le \alpha_2 b\big)\big|\,E\big(\big|\overline\xi_s^{\,z_1,\mathbf g_1}\big|\big) + \frac1n\sum_{s=1}^{n} E\big(\big|\overline\xi_s^{\,z_1,\mathbf g_1} - \overline\xi_s^{\,z_2,\mathbf g_2}\big|\,\big|\,\mathcal F_{s-1}\big)\\
&\le \frac1n\big|n(\alpha_2 - \alpha_1) + 1\big| + \frac1n\sum_{s=1}^{n} E\big(\big|\xi_s^{z_1,\mathbf g_1} - \xi_s^{z_2,\mathbf g_2}\big|\,\big|\,\mathcal F_{s-1}\big)\\
&\le |\alpha_1 - \alpha_2| + \frac1n + \frac1n\sum_{s=1}^{n} E\big(\big|\xi_s^{z_1,\mathbf g_1} - \xi_s^{z_2,\mathbf g_2}\big|\,\big|\,\mathcal F_{s-1}\big).
\end{align*}
On $\mathcal F_n$ we have for the last sum that
\begin{align*}
\frac1n\sum_{s=1}^{n} E\big(\big|\xi_s^{z_1,\mathbf g_1} - \xi_s^{z_2,\mathbf g_2}\big|\,\big|\,\mathcal F_{s-1}\big) &\le \frac1n\sum_{s=1}^{n}\Big|F_{a+\frac sT}\big(z_1 + (\mathbf g_1(\tfrac sT))'\mathbf Y_{s-1}\big) - F_{a+\frac sT}\big(z_2 + (\mathbf g_1(\tfrac sT))'\mathbf Y_{s-1}\big)\Big|\\
&\quad + \frac1n\sum_{s=1}^{n}\Big|F_{a+\frac sT}\big(z_2 + (\mathbf g_1(\tfrac sT))'\mathbf Y_{s-1}\big) - F_{a+\frac sT}\big(z_2 + (\mathbf g_2(\tfrac sT))'\mathbf Y_{s-1}\big)\Big|\\
&\le \sup_{u,x} f_u(x)\,|z_1 - z_2| + \sup_{u,x} f_u(x)\,\frac1n\sum_{s=1}^{n}\big|\big((\mathbf g_1 - \mathbf g_2)(\tfrac sT)\big)'\mathbf Y_{s-1}\big|\\
&\le \frac{\|f\|_\infty}{m_*}\,|z_1 - z_2| + \frac{\|f\|_\infty}{m_*}\sum_{k=1}^{p}\|g_{1k} - g_{2k}\|_n\,\sqrt{\frac1n\sum_{s=-p}^{n} Y_s^2} \ \le\ (1 + C_0)\,\frac{\|f\|_\infty}{m_*}\,\varepsilon. \tag{19}
\end{align*}
Thus, on $\mathcal F_n$ we have for $h_1 = (\alpha_1, z_1, \mathbf g_1),\ h_2 = (\alpha_2, z_2, \mathbf g_2) \in H$ with $d(h_1, h_2) \le \varepsilon$ and $\varepsilon \ge \frac1n$ that
\[ V_n\big(\nu_n(h_1) - \nu_n(h_2)\big) \le \frac1n\sum_{s=1}^{n} E\big(\big(\eta_s^{h_1} - \eta_s^{h_2}\big)^2\,\big|\,\mathcal F_{s-1}\big) \le K^*\,\varepsilon. \tag{20} \]
This control of the quadratic variation, in conjunction with Freedman's exponential bound for martingales, now enables us to apply the chaining argument in a way similar to the proof of Theorem 5.11 in van de Geer (2000). Details are omitted.
5.2 Proof of Theorem 1
We will utilize Lemma 1. For $\eta > 0$ we have
\[ P\Big(\sup_{\alpha\in[0,1],\,z\in[-L,L],\,\mathbf g\in\mathcal G^p}\big|\nu_n(\alpha, z, \mathbf g) - \nu_n(\alpha, z, \mathbf 0)\big| \ge \eta\Big) \le P\big((\mathcal F_n)^c\big) + P\Big(\sup_{d(h_1,h_2)\le C_\varepsilon m_n^{-1}}\big|\nu_n(h_1) - \nu_n(h_2)\big| \ge \eta,\ \mathcal F_n\Big), \]
where $\mathcal F_n = \big\{\frac1n\sum_{s=1-p}^{n} Y_s^2 \le C_0\big\}$ for some $C_0 > 0$. We will see below that $\frac1n\sum_{s=1-p}^{n} Y_s^2 = O_P(1)$ as $n \to \infty$. Therefore, for any given $\varepsilon > 0$ we can choose $C_0$ such that $P\big((\mathcal F_n)^c\big) \le \varepsilon$ for $n$ large enough. An application of Lemma 1 now gives the assertion. Similar arguments as those leading to (19) show that $\int_{c/n}^{1}\sqrt{\log N_B(u^2, \mathcal G)}\,du < \infty$ implies $\int_{c/n}^{1}\sqrt{\log D_B(u^2, H)}\,du < \infty$ (see also (40)). It remains to show that in fact $\frac1n\sum_{s=1-p}^{n} Y_s^2 = O_P(1)$. To this end we show that
\[ E\Big[\frac1n\sum_{s=1}^{n} Y_s^2\Big] < \infty, \tag{21} \]
\[ \mathrm{Var}\Big[\frac1n\sum_{s=1}^{n} Y_s^2\Big] = o(1). \tag{22} \]
These two facts can be seen by direct calculations, as demonstrated now. First we consider (21). We have
\[ E\,Y_s^2 = E\Big[\sum_{j=-\infty}^{s} a_{s,T}(s-j)\,\varepsilon_j\Big]^2 = E\sum_{j=-\infty}^{s}\sum_{k=-\infty}^{s} a_{s,T}(s-j)\,a_{s,T}(s-k)\,\varepsilon_j\,\varepsilon_k = \sum_{j=-\infty}^{s} a_{s,T}^2(s-j) \le \sum_{j=0}^{\infty}\Big(\frac{K}{\ell(j)}\Big)^2 < C < \infty \tag{23} \]
for some $C > 0$, where we used assumption (ii). This implies (21). Next we indicate (22). Straightforward calculations show that
\begin{align*}
E(Y_t^2\,Y_s^2) - E(Y_t^2)\,E(Y_s^2) &= \sum_{i=-\infty}^{t}\sum_{j=-\infty}^{t}\sum_{k=-\infty}^{s}\sum_{\ell=-\infty}^{s} a_{t,T}(t-i)\,a_{t,T}(t-j)\,a_{s,T}(s-k)\,a_{s,T}(s-\ell)\,E(\varepsilon_i\,\varepsilon_j\,\varepsilon_k\,\varepsilon_\ell)\\
&\quad - \sum_{i=-\infty}^{t} a_{t,T}^2(t-i)\sum_{k=-\infty}^{s} a_{s,T}^2(s-k)\\
&= \big(E\varepsilon_t^4 - 3\big)\,A(t,s) + 2\,B(t,s),
\end{align*}
where
\[ A(t,s) = \sum_{i=0}^{\infty} a_{t,T}^2(i)\,a_{s,T}^2(i + |t-s|), \qquad B(t,s) = \Big(\sum_{i=0}^{\infty} a_{t,T}(i)\,a_{s,T}(i + |t-s|)\Big)^2. \]
We obtain that for some $C^* < \infty$,
\[ A(t,s) \le K^4\sum_{i=0}^{\infty}\frac{1}{\ell^2(i)}\,\frac{1}{\ell^2(i+|t-s|)} \le K^4\sum_{i=0}^{\infty}\frac{1}{\ell(i)}\,\frac{1}{\ell(i+|t-s|)} \le K^4\,\frac{1}{\ell(|t-s|)}\sum_{i=0}^{\infty}\frac{1}{\ell(i)} < C^*\,\frac{1}{\ell(|t-s|)}, \]
and similarly
\[ |B(t,s)| \le \Big[K^2\sum_{i=0}^{\infty}\frac{1}{\ell(i)}\,\frac{1}{\ell(i+|t-s|)}\Big]^2 \le C^*\,\frac{1}{\ell(|t-s|)}. \tag{24} \]
Now we have
\[ \Big|\frac{1}{n^2}\sum_{s=1}^{n}\sum_{t=1}^{n} A(t,s)\Big| \le \frac{C^*}{n^2}\sum_{s=1}^{n}\sum_{t=1}^{n}\frac{1}{\ell(|t-s|)} \le \frac{C^*}{n}\sum_{k=0}^{n-1}\frac{2(n-k)}{n}\,\frac{1}{\ell(k)} \le \frac{2C^*}{n}\sum_{k=0}^{n-1}\frac{1}{\ell(k)} = O\Big(\frac1n\Big) \quad\text{as } n \to \infty. \]
This implies that $\mathrm{Var}\big[\frac1n\sum_{s=1}^{n} Y_s^2\big] = O(1/n)$, which is (22).
5.3 Proof of Theorem 2
Assume w.l.o.g. that $a = 0$. Showing weak convergence of the process $Z_n(h)$ means proving asymptotic tightness and convergence of the finite dimensional distributions (see, e.g., van der Vaart and Wellner, 1996). Tightness follows from Theorem 3.
It remains to show convergence of the finite dimensional distributions. To this end we utilize the Cramér–Wold device in conjunction with the method of cumulants. It follows from Lemma 2 that all the cumulants of $Z_n(h)$ of order $k \ge 3$ converge to zero as $n \to \infty$. Using the linearity of the cumulants, the same holds for any linear combination of $Z_n(h_i)$, $i = 1, \dots, K$. The mean of all the $Z_n(h)$ equals zero. It remains to show convergence of the covariances $\mathrm{cov}(Z_n(h_1), Z_n(h_2))$. The ranges of the summation indices below are such that the indices of the $Y$-variables are between $1$ and $n$. For ease of notation we achieve this by formally setting $h_i(u) = 0$ for $u \le 0$ and $u > b$, $i = 1, 2$. We have
\begin{align*}
\mathrm{cov}(Z_n(h_1), Z_n(h_2)) &= \frac1n\sum_{s_1=1}^{n}\sum_{s_2=1}^{n} h_1\big(\tfrac{s_1}{T}\big)\,h_2\big(\tfrac{s_2}{T}\big)\,\mathrm{cov}(Y_{s_1}, Y_{s_2})\\
&= \frac1n\sum_{s_1=1}^{n}\sum_{k=s_1-n}^{s_1-1} h_1\big(\tfrac{s_1}{T}\big)\,h_2\big(\tfrac{s_1-k}{T}\big)\,\mathrm{cov}(Y_{s_1}, Y_{s_1-k})\\
&= \frac1n\sum_{s_1=1}^{n}\sum_{|k|\le\sqrt n} h_1\big(\tfrac{s_1}{T}\big)\,h_2\big(\tfrac{s_1-k}{T}\big)\,\mathrm{cov}(Y_{s_1}, Y_{s_1-k}) + R_{1n}, \tag{25}
\end{align*}
where for $n$ sufficiently large
\[ |R_{1n}| \le \frac1n\sum_{s_1=1}^{n}\sum_{|k|>\sqrt n}\Big|h_1\big(\tfrac{s_1}{T}\big)\,h_2\big(\tfrac{s_1-k}{T}\big)\Big|\,\big|\mathrm{cov}(Y_{s_1}, Y_{s_1-k})\big|. \]
From Proposition 5.4 of Dahlhaus and Polonik (2009) we obtain that $\sup_s|\mathrm{cov}(Y_s, Y_{s-k})| \le \frac{K}{\ell(k)}$ for some constant $K$. Since both $h_1$ and $h_2$ are bounded and $\sum_{k=-\infty}^{\infty}\frac{1}{\ell(k)} < \infty$, we can conclude that $R_{1n} = o(1)$. The main term in (25) can be approximated as
\[ \frac1n\sum_{s_1=1}^{n}\sum_{|k|\le\sqrt n} h_1\big(\tfrac{s_1}{T}\big)\,h_2\big(\tfrac{s_1-k}{T}\big)\,c\big(\tfrac{s_1}{T}, k\big) + R_{2n}, \tag{26} \]
where
\[ |R_{2n}| \le \frac1n\sum_{s_1=1}^{n}\sum_{|k|\le\sqrt n}\Big|h_1\big(\tfrac{s_1}{T}\big)\,h_2\big(\tfrac{s_1-k}{T}\big)\Big|\,\Big|\mathrm{cov}(Y_{s_1}, Y_{s_1-k}) - c\big(\tfrac{s_1}{T}, k\big)\Big|. \]
Proposition 5.4 of Dahlhaus and Polonik (2009) also says that for $|k| \le \sqrt n$ we have $\sum_{s_1=0}^{n}\big|\mathrm{cov}(Y_{s_1}, Y_{s_1-k}) - c\big(\tfrac{s_1}{T}, k\big)\big| \le K\,\frac{1+|k|}{\ell(|k|)}$ for some $K > 0$. Applying this result we obtain that
\[ |R_{2n}| \le \frac1n\sum_{s_1=1}^{n}\sum_{k=-\sqrt n}^{\sqrt n}\Big|h_1\big(\tfrac{s_1}{T}\big)\,h_2\big(\tfrac{s_1-k}{T}\big)\Big|\,\Big|\mathrm{cov}(Y_{s_1}, Y_{s_1-k}) - c\big(\tfrac{s_1}{T}, k\big)\Big| \le K_1\,\frac1n\sum_{k=-\sqrt n}^{\sqrt n}\sum_{s_1=1}^{n}\Big|\mathrm{cov}(Y_{s_1}, Y_{s_1-k}) - c\big(\tfrac{s_1}{T}, k\big)\Big| \le K_1\,\frac1n\sum_{k=-\sqrt n}^{\sqrt n}\frac{1+|k|}{\ell(|k|)} = o(1) \]
as $n \to \infty$. Next we replace $h_2\big(\tfrac{s_1-k}{T}\big)$ in the main term of (26) by $h_2\big(\tfrac{s_1}{T}\big)$. The approximation error can be bounded by
\[ \frac1n\sum_{s_1=1}^{n}\sum_{|k|\le\sqrt n}\Big|h_1\big(\tfrac{s_1}{T}\big)\Big|\cdot\Big|h_2\big(\tfrac{s_1-k}{T}\big) - h_2\big(\tfrac{s_1}{T}\big)\Big|\,\frac{K}{\ell(k)} = o(1). \]
Here we are using the fact that $\sup_u|c(u,k)| \le \frac{K}{\ell(k)}$ (see Proposition 5.4 in Dahlhaus and Polonik, 2009), together with the assumed (uniform) continuity of $h_2$, the boundedness of $h_1$, and the boundedness of $\sum_{k=-\infty}^{\infty}\frac{1}{\ell(k)}$. We have seen that
\[ \mathrm{cov}(Z_n(h_1), Z_n(h_2)) = \frac1n\sum_{s_1=1}^{n} h_1\big(\tfrac{s_1}{T}\big)\,h_2\big(\tfrac{s_1}{T}\big)\sum_{|k|\le\sqrt n} c\big(\tfrac{s_1}{T}, k\big) + o(1). \]
Since $S(u) = \sum_{k=-\infty}^{\infty} c(u,k) < \infty$ we also have
\[ \mathrm{cov}(Z_n(h_1), Z_n(h_2)) = \sum_{k=-\infty}^{\infty}\frac1n\sum_{s_1=1}^{n} h_1\big(\tfrac{s_1}{T}\big)\,h_2\big(\tfrac{s_1}{T}\big)\,c\big(\tfrac{s_1}{T}, k\big) + o(1). \]
Finally, we utilize the fact that $TV(c(\cdot,k)) \le \frac{K}{\ell(k)}$, which is another result from Proposition 5.4 of Dahlhaus and Polonik (2009). This result, together with the assumed bounded variation of both $h_1$ and $h_2$, allows us to replace the average over $s_1$ by the integral. Recalling that $n = \lfloor bT\rfloor - \lceil aT\rceil + 1 = (b-a)T + O(1)$ gives the assertion.
5.4 Proof of Lemma 2
For ease of notation we assume w.l.o.g. that $a = 0$. By utilizing multilinearity of cumulants we obtain that
\[ \mathrm{cum}(Z_n(h_1), \dots, Z_n(h_k)) = n^{-\frac k2}\sum_{s_1, s_2, \dots, s_k=1}^{n} h_1\big(\tfrac{s_1}{T}\big)\cdots h_k\big(\tfrac{s_k}{T}\big)\,\mathrm{cum}(Y_{s_1}, \dots, Y_{s_k}). \]
In order to estimate $\mathrm{cum}(Y_{s_1}, \dots, Y_{s_k})$ we utilize the special structure of $Y_s$. Since the $\varepsilon_j$ are independent, we again obtain by using multilinearity of the cumulants, together with the fact that $\mathrm{cum}(\varepsilon_{j_1}, \dots, \varepsilon_{j_k}) = 0$ unless all the $j_\ell$, $\ell = 1, \dots, k$, are equal, that
\[ \mathrm{cum}(Y_{s_1}, \dots, Y_{s_k}) = \sum_{j=-\infty}^{\min\{s_1,\dots,s_k\}} a_{s_1,T}(s_1-j)\cdots a_{s_k,T}(s_k-j)\,\mathrm{cum}(\varepsilon_j, \dots, \varepsilon_j) = \mathrm{cum}_k(\varepsilon_1)\sum_{j=-\infty}^{\min\{s_1,\dots,s_k\}} a_{s_1,T}(s_1-j)\cdots a_{s_k,T}(s_k-j). \tag{27} \]
Thus $\big|\mathrm{cum}(Y_{s_1}, \dots, Y_{s_k})\big| \le \big|\mathrm{cum}_k(\varepsilon_1)\big|\sum_{j=-\infty}^{\infty}\prod_{i=1}^{k}\frac{K}{\ell(s_i-j)}$, and consequently,
\begin{align*}
\big|\mathrm{cum}(Z_n(h_1), \dots, Z_n(h_k))\big| &\le n^{-\frac k2}\,\big|\mathrm{cum}_k(\varepsilon_1)\big|\sum_{j=-\infty}^{\infty}\prod_{i=1}^{k}\Big[\sum_{s_i=0}^{n}\Big|h_i\big(\tfrac{s_i}{T}\big)\Big|\,\frac{K}{\ell(s_i-j)}\Big]\\
&= n^{-\frac k2}\,\big|\mathrm{cum}_k(\varepsilon_1)\big|\sum_{j=-\infty}^{\infty}\prod_{i=1}^{2}\Big[\sum_{s_i=0}^{n}\Big|h_i\big(\tfrac{s_i}{T}\big)\Big|\,\frac{K}{\ell(s_i-j)}\Big]\times\prod_{i=3}^{k}\Big[\sum_{s_i=0}^{n}\Big|h_i\big(\tfrac{s_i}{T}\big)\Big|\,\frac{K}{\ell(s_i-j)}\Big].
\end{align*}
Utilizing the Cauchy–Schwarz inequality we have for the last product
\[ \prod_{i=3}^{k}\Big[\sum_{s_i=0}^{n}\Big|h_i\big(\tfrac{s_i}{T}\big)\Big|\,\frac{K}{\ell(s_i-j)}\Big] \le \prod_{i=3}^{k}\sqrt{\sum_{s_i=0}^{n} h_i\big(\tfrac{s_i}{T}\big)^2}\,\sqrt{\sum_{s_i=0}^{n}\Big(\frac{K}{\ell(s_i-j)}\Big)^2} \le n^{\frac k2 - 1}\prod_{i=3}^{k}\|h_i\|_n\Bigg(\sqrt{\sum_{s=-\infty}^{\infty}\Big(\frac{K}{\ell(s)}\Big)^2}\Bigg)^{k-2} \le K_0^{k-2}\,n^{\frac k2 - 1}\prod_{i=3}^{k}\|h_i\|_n, \tag{28} \]
where we used the fact that $\sqrt{\sum_{s=-\infty}^{\infty}\big(\frac{K}{\ell(s)}\big)^2} \le \sum_{s=-\infty}^{\infty}\frac{K}{\ell(s)} \le K_0$ for some $K_0 < \infty$. Notice that the bound (28) does not depend on the index $j$ anymore, so that
\[ \big|\mathrm{cum}(Z_n(h_1), \dots, Z_n(h_k))\big| \le K_0^{k-2}\,n^{-1}\prod_{i=3}^{k}\|h_i\|_n\,\big|\mathrm{cum}_k(\varepsilon_1)\big|\sum_{j=-\infty}^{\infty}\prod_{i=1}^{2}\Big[\sum_{s_i=0}^{n}\Big|h_i\big(\tfrac{s_i}{T}\big)\Big|\,\frac{K}{\ell(s_i-j)}\Big]. \]
As for the last sum, we have, by again using the fact that $\sum_{j=-\infty}^{\infty}\frac{K}{\ell(s_1-j)}\,\frac{K}{\ell(s_2-j)} \le \frac{K^*}{\ell(s_1-s_2)}$ for some $K^* > 0$ (cf. (24)), and the Cauchy–Schwarz inequality, that
\begin{align*}
\sum_{j=-\infty}^{\infty}\prod_{i=1}^{2}\Big[\sum_{s_i=0}^{n}\Big|h_i\big(\tfrac{s_i}{T}\big)\Big|\,\frac{K}{\ell(s_i-j)}\Big] &= \sum_{s_1=0}^{n}\sum_{s_2=0}^{n}\Big|h_1\big(\tfrac{s_1}{T}\big)\Big|\,\Big|h_2\big(\tfrac{s_2}{T}\big)\Big|\sum_{j=-\infty}^{\infty}\frac{K}{\ell(s_1-j)}\,\frac{K}{\ell(s_2-j)}\\
&\le \sum_{s_1=0}^{n}\sum_{s_2=0}^{n}\Big|h_1\big(\tfrac{s_1}{T}\big)\Big|\,\Big|h_2\big(\tfrac{s_2}{T}\big)\Big|\,\frac{K^*}{\ell(s_1-s_2)}\\
&\le K^*\sqrt{\sum_{s_1=0}^{n}\sum_{s_2=0}^{n} h_1\big(\tfrac{s_1}{T}\big)^2\,\frac{1}{\ell(s_1-s_2)}}\,\sqrt{\sum_{s_1=0}^{n}\sum_{s_2=0}^{n} h_2\big(\tfrac{s_2}{T}\big)^2\,\frac{1}{\ell(s_1-s_2)}}\\
&= K^*\sqrt{\sum_{s_1=0}^{n} h_1\big(\tfrac{s_1}{T}\big)^2\sum_{s_2=0}^{n}\frac{1}{\ell(s_1-s_2)}}\,\sqrt{\sum_{s_2=0}^{n} h_2\big(\tfrac{s_2}{T}\big)^2\sum_{s_1=0}^{n}\frac{1}{\ell(s_1-s_2)}}\\
&\le K_0\,n\,\|h_1\|_n\,\|h_2\|_n.
\end{align*}
This completes the proof of the first part of the lemma. The second part follows similarly to the above by observing that if $\|h_i\|_\infty \le M$ for all $i = 1, \dots, k$, then, instead of the estimate (28), we have
\[ \prod_{i=3}^{k}\Big[\sum_{s_i=0}^{n}\Big|h_i\big(\tfrac{s_i}{T}\big)\Big|\,\frac{1}{\ell(s_i-j)}\Big] \le M^{k-2}\prod_{i=3}^{k}\sum_{s_i=0}^{n}\frac{1}{\ell(s_i-j)} \le (M K_0)^{k-2}, \]
with $K_0 = \sum_{s=-\infty}^{\infty}\frac{1}{\ell(s)}$.
5.5 Proof of Lemma 3
First observe that since $|\mathrm{cum}_k(\varepsilon_s)| \le E(|\varepsilon_s|^k) + \sum_{j=1}^{k-1}\binom{k-1}{j-1}E|\varepsilon_s|^{k-j}\,|\mathrm{cum}_j(\varepsilon_s)|$, it is straightforward to see that the assumption $E|\varepsilon_s|^k \le C^k$ implies $|\mathrm{cum}_k(\varepsilon_s)| \le 3^k C^k$. It follows by utilizing Lemma 2 that
\[ \Psi_{Z_n(h)}(t) = \log E\,e^{t Z_n(h)} = \sum_{k=1}^{\infty}\frac{t^k}{k!}\,\mathrm{cum}_k(Z_n(h)) \le \frac{1}{K_0}\sum_{k=1}^{\infty}\frac{t^k}{k!}\big(3 C K_0\,\|h\|_n\big)^k = K_0^{-1}\big(e^{3 t C K_0\|h\|_n} - 1\big). \]
We obtain for any $t > 0$
\[ P\big[|Z_n(h)| > \eta\big] \le 2\,e^{-t\eta}\,E\big(e^{t Z_n(h)}\big) = 2\exp\{-t\eta\}\exp\{\Psi_{Z_n(h)}(t)\} \le 2\exp\big\{-t\eta + K_0^{-1}\big(e^{3 t C K_0\|h\|_n} - 1\big)\big\}. \]
Choosing $t = \frac{1}{3 C K_0\|h\|_n}$ gives the assertion:
\[ P\big[|Z_n(h)| > \eta\big] \le 2\exp\Big\{-\frac{\eta}{3 C K_0\|h\|_n}\Big\}\,e^{K_0^{-1}(e-1)}. \]
5.6 Proof of Theorem 3
Using the exponential inequality from Lemma 3 we can mimic the proof of Lemma 3.2
from van de Geer (2000). As compared to van de Geer, our exponential bound is of the form
c1 exp{−c2 η‖h‖n} rather than c1 exp{−c2
(η‖h‖n
)2}. It is well-known that this type of inequality
leads to the covering integral being the integral of the metric entropy rather than the square
root of the metric entropy. (See for instance Theorem 2.2.4 in van der Vaart and Wellner,
1996.) This indicates the necessary modifications to the proof in van de Geer. Details are
omitted. Condition (13) just makes sure that the upper limit in the integral from (14) is
larger than the lower limit.
21
5.7 Proof of Theorem 4
First we indicate how the two processes studied in this paper enter the picture, and simul-
taneously we provide an outline of the proof.
Recall that we have relabeled the observations Yt inside the rescaled interval [a, b] ⊂ [0, 1],
i.e. aT ≤ t ≤ bT to Ys, s = 1, . . . , n. Also, Ys−k with k ≥ s denotes YdaT e−(k−s). Let
Hn(α, z) =1
n
bαnc∑s=1
1(ηs ≤ z) and Fn(α, z) =1
n
bαnc∑s=1
1(σ(a+ sT
) εs ≤ z).
With this notation
√n ( Gn,γ(α)−Gn,γ(α) )
=[√
n(Fn(α, qγ)−Hn(α, qγ)
)−√n(Fn(α,−qγ)−Hn(α,−qγ)
) ]−[√n(Fn(α, qγ)− Fn(α, qγ)
)−√n(Fn(α,−qγ)− Fn(α,−qγ)
) ]=: In(α) − IIn(α) (29)
with In(α) and IIn(α), respectively, denoting the two quantities inside the two [·]-brackets.
The assertion of Theorem 4 follows from
supα∈[0,1]
|In(α)| = oP (1) and (30)
supα∈[0,1]
|IIn(α)− c(α)(Gn,γ(1)− EGn,γ(1))| = oP (1). (31)
Property (31) can be shown by using empirical process theory based on independent, but not
identically distributed random variables. Verification of (30) involves both residual empirical
processes and weighted sum processes. To see the latter, observe that for each z ∈ R
Hn(α, z ) =1
n
bαnc∑s=1
1(ηs ≤ z
)=
1
n
bαnc∑s=1
1(σ(a+ s
T) εs ≤ σ(a+ s
T) εs − ηs + z
)=
1
n
bαnc∑s=1
1(σ(a+ s
T) εs ≤
((θ − θ)(a+ s
T))′Ys−1 + z
).
Recall that by assumption θn(·) − θ(·) ∈ Gp, where for g = (g1, . . . , gp)′ : [a, b]p → R we
22
write {g ∈ Gp} for {gi ∈ G, i = 1, . . . , p}. We also write
Fn(α, z,g) =1
n
bαnc∑s=1
1{σ( t
T) εt ≤ g′( t
T) Yt + z
}.
By taking into account that by assumption θ − θ = OP (m−1n ) the above implies that with
probability tending to one we have for any C > 0 that
supα∈[0,1], z∈[−L,L]
√n∣∣∣Fn(α, z)− Hn(α, z)
∣∣∣ ≤ supα∈[0,1], z∈[−L,L],
g∈Gp, ‖g‖n≤Cm−1n
√n∣∣∣ Fn(α, z)− Fn(α, z,g)
∣∣∣. (32)
Thus, if L is such that qγ ∈ (−L,L) and P (qγ ∈ [−L,L]) → 1, as n → ∞, then the right
hand side in (32) being oP (1) implies that supα |In(α)| = oP (1). In order to control the right
hand side in (32), an appropriate centering is needed. Let Fu(z) = F(
zσ(u)
)denote the cdf
corresponding to the pdf fu(z), defined above Theorem 4. Let
En(α, z,g) :=1
n
bαnc∑s=1
E[1{σ( t
T) εt ≤ g′( t
T) Yt + z
} ∣∣ εs−1, εs−2, . . .]
=1
n
bαnc∑s=1
Fa+ sT
( (g(a+ s
T))′Ys−1 + z
)(33)
and let 0 = (0, . . . , 0)′ ∈ Gp denote the p-vector of null-functions. Write
Fn(α, z)− Fn(α, z,g) =[(Fn(α, z,0)− En(α, z,0)
)−(Fn(α, z,g)− En(α, z,g)
) ]+[En(α, z,0)− En(α, z,g)
]=: Tn1(α, z,g) + Tn2(α, z,g) (34)
with Tn1 and Tn2 denoting the two terms inside the two [ ] brackets. This centering makes
T1n a sum of martingale differences (cf. proof of Lemma 1 given above). The quantity
νn(α, z,g)) =√n(Fn(α, z,g) − En(α, z,g)
)is the residual empirical process discussed in
section 2. As for Tn2(α, z,g) notice that we have
√nTn2(α, z,g) =
1√n
dnαe∑s=1
[Fa+ s
T
(z +
(g(a+ s
T
))′Ys−1
)− Fa+ s
T(z)]
23
=1√n
dnαe∑s=1
fa+ sT
(z)(g(a+ s
T
))′Ys−1 +
1√n
dnαe∑s=1
(fa+ sT
(ζs)− fa+ sT
(z))(g(a+ s
T
))′Ys−1 (35)
with ζs between z and z +(g(a + s
T
))′Ys−1. The second term in (35) is a remainder term,
while the first term is a weighted sum of the Ys’s.
The above motivates that (30) can be verified by appropriate control of both the residual
empirical process and a weighted sum process.
After this outline we now present the missing details of the proof. Without loss of generality
we assume that aT > p so that the residuals are all well defined, and we assume a = 0.
Choose L large enough such that qγ ∈ (−L,L). As outlined above, it suffices to show that
both
supα∈[0,1]
|In(α)| = oP (1) and supα∈[0,1]
|IIn(α)− c(α)(Gn,γ(1)− EGn,γ(1))| = oP (1).
Proof of supα∈[0,1] |In(α)| = oP (1). It follows from (32) and (34) that together
supα∈[0,1],z∈[−L,L],g∈Gp
|√nTn1(α, z,g)| = oP (1) and (36)
supα∈[0,1],z∈[−L,L],g∈Gp
∣∣∣√nTn2(α, z,g)∣∣∣ = oP (1) (37)
imply the desired result, provided we can show that P (qγ /∈ [−L,L]) = o(1) as n → ∞.
Since qγ ∈ (−L,L), the desired property follows from consistency of qγ as an estimator for
qγ. This will be shown at the end of this proof.
Notice that by assumption (iv), for any given ε > 0 and n large enough there exists a constant
Cε > 0 such that with
An(ε) :={‖θn,k(u)− θk(u)‖n ≥ Cεm
−1n for all k = 1, . . . , p
}we have
P (An(ε)) ≤ ε.
Thus, on A{n(ε) we can assume that ‖g‖n ≤ Cεm
−1n , which means that on A{
n(ε) we can
24
assume d((α, z,g), (α, z,0)) ≤ Cεm−1n . Thus, on A{
n(ε)
supα∈[0,1],z∈[−L,L],g∈Gp
|νn(α, z,g)− νn(α, z,0)| ≤ supd(h1,h2)≤Cεm−1
n
|νn(h1)− νn(h2)|.
It is shown in the proof of Theorem 1 that 1n
∑ns=−p Y
2s = OP (1), and thus P (Fn) = o(1).
By definition of Tn2(α, z,g) it remains to show that and for η > 0 we have
P(
supd(h1,h2)≤Cεm−1
n
|νn(h1)− νn(h2)| ≥ η,Fn
)= o(1). (38)
This, however, is an immediate application of Theorem 1, and (36) is verified. Now we
consider (37). Again we assume that we are on A{n(ε). We have already seen in (35) that
√nTn2(α, z,g)
=1√n
dnαe∑s=1
fa+ sT
(z)(g(sT
))′Ys−1 +
1√n
dnαe∑s=1
(fa+ sT
(ζs)− fa+ sT
(z))(g(sT
))′Ys−1 (39)
with ζs between z and z+(g(sT
))′Ys−1. The second term in (39) will be treated below. The
first sum on the right hand side can be written as∑p
k=1 Zk,n(h) where
Zk,n(h) =1√n
n∑s=1
h( sT
)Ys−k, k = 1, . . . , p, (40)
for h ∈ H with H = {hα,z,g(u) = 1α(u) fu(z)g(u), α ∈ [0, 1], z ∈ R, g ∈ G} where we use
the shorthand notation 1α(u) = 1(u ≤ α b
). We will apply Theorem 3 to show that each
Zk,n(h) tends to zero uniformly in h ∈ H. Since by our assumptions the functions {fu(z), z ∈[−L,L]} are uniformly bounded, we have supα,z,g ‖hα,z,g‖2n ≤ supu,z |fu(z)|m−1
n = o(1). Since
EZn(hα,z,g) = 0, Theorem 3 implies the result if we have shown that H has a finite covering
integral.
The finiteness of the covering integral of H with respect to ‖ · ‖n follows by standard argu-
ments. In fact, it is not difficult to see that for some C0 > 0
logN(C1δ,H) ≤ −C0 log ε+ logN(ε,G). (41)
Our assumptions now allow an application of Theorem 3, showing that the first term of (39)
converges to zero in probability uniformly in (α, z, g).
25
Now we treat the second term in (39). Recall that our assumptions imply that the functions
{fu, u ∈ [0, 1]} are uniformly Lipschitz continuous with Lipschitz constant c, say. Therefore
we can estimate the last term in (39) by
c√n
n∑s=1
∣∣ (g( sT
))′Ys−1
∣∣2 ≤ c√n
n∑s=1
( p∑k=1
g2k
(sT
) p∑j=0
Y 2s−j
)≤ c p sup
−p≤t≤TY 2t
√n
p∑k=1
‖gk‖2n = c p
√n
m2n
OP (log n) = oP (1).
where the last inequality uses the fact that sup−p≤t≤T Y2t = OP (log n), which follows as in
the proof of Lemma 5.9 of Dahlhaus and Polonik (2009).
Proof of supα∈[0,1] | IIn(α)− c(α)(Gn,γ(1)− EGn,γ(1)) | = oP (1). Define
F n(α, z) := EFn(α, z) =1
n
bαnc∑s=1
Fa+ sT
(z).
(Recall that Fu(z) = F ( zσ(u)
).) We can write
IIn(α) =√n((Fn − F n)(α, qγ)− (Fn − F n)(α, qγ)
)(42)
+√n((Fn − F n)(α,−qγ) − (Fn − F n)(α,−qγ)
)(43)
+√n (F n(α, qγ)− F n(α, qγ)) −
√n (F n(α,−qγ)− F n(α,−qγ)) (44)
The process νn(α, z,0) =√n (Fn − F n)(α, z) is a sequential empirical process, or a Kiefer-
Muller process, based on independent, but not necessarily identically distributed random
variables. This process is asymptotically stochastically equicontinuous, uniformly in α with
respect to dn(v, w) =∣∣F n(1, v)− F 1,n(1, w)
∣∣, i.e. for every η > 0 there exists an ε > 0 with
lim supn→∞
P[
supα∈[0,1], dn(z1,z2)≤ε
|νn(α, z1,0)− νn(α, z2,0)| > η]
= 0. (45)
In fact, with dn(α1, z1), (α2, z2)
)= |α1 − α2|+ dn(z1, z2) we have
supα∈[0,1]
supz1,z2,∈R, dn(z1,z2)≤ε
|νn(α, z1,0)− νn(α, z2,0)|
≤ supα1,α2∈[0,1], z1,z2∈Rdn( (α,z1), (α,z2) )≤ε
|νn(α1, z1,0)− νn(α2, z2,0)|.
26
Thus, (45) follows from asymptotic stochastic dn-equicontinuity of νn(α, z,0). This in turn
follows from a proof similar to, but simpler than, the proof of Lemma 1. In fact, it can be
seen from (19) that for g1 = g2 = 0 we simply can use the metric d(
(α1, z1), (α2, z2))
=
|α1−α2|+dn(z1, z2) in the estimation of the quadratic variation, which in the simple case of
g1 = g2 = 0 amounts to the estimation of the variance, because the randomness only comes
in through the εs. With this modification the proof of the dn-equicontinuity of νn(α, z1,0)
follows the proof of Lemma 1.
Thus, if qγ is consistent for qγ with respect to dn, then it follows that both (42) and (43) are
oP (1). We now prove this consistency of qγ.
First observe that 1b
∫ b0F ( qγ
σ(v)) dv =
∫ 1
0F ( qγ
σ(b u)) du is close to F n(1, qγ). In fact, since by
assumption u → F ( qγσ(u)
) is of bounded variation, the difference is of the order O(1/n).
Consequently we have
∣∣ (1− γ)−(F n(1, qγ)− F n(1,−qγ)
) ∣∣=∣∣∣Ψ0,b(1, qγ)−
(F n(1, qγ)− F n(1,−qγ)
) ∣∣∣ ≤ c n−1 (46)
for some c ≥ 0. In fact (46) holds uniformly in γ. This follows from the fact that the functions
u→ Fu(
zσ(u)
)are Lipschitz continuous uniformly in z. (Notice that in the case where σ(·) ≡
σ0 is constant on [a, b] then Ψ0,b(z) =∫ 1
0F ( z
σ(b u)) du −
∫ 1
0F ( −z
σ(b u)) du = F ( z
σ0) − F (−z
σ0) =
F n(1, z)− F n(1,−z). In other words, in this case we can choose c = 0.) We now show that
under our assumptions we have for any fixed 0 < γ < 1 that
dn(qγ, qγ) =∣∣F n(1, qγ)− F n(1, qγ)
∣∣ = oP (1). (47)
By assumption Ψ0,b(z) is a strictly monotonic function in z. Together with (46) this implies
that (47) is equivalent to∣∣∣ (F n(1, qγ) − F n(1,−qγ)
)−(F n(1, qγ) − F n(1,−qγ)
) ∣∣∣ = oP (1),
which (by using (46)) follows from∣∣∣ (F n(1, qγ)− F n(1,−qγ))−(
1− γ) ∣∣∣ = oP (1), (48)
Since by definition of qγ we have Hn(1, qγ)− Hn(1,−qγ) = 1− γ, (48) follows from
supz|Hn(1, z)− F n(1, z)| = oP (1). (49)
27
To see (49) notice that supz |Hn(1, z)− Fn(1, z)| = oP (1). This follows from (32), (34), (36)
and (37). Utilizing triangular inequality it remains to show that supz |Fn(1, z)−F n(1, z)| =oP (1). This uniform law of large numbers result follows from the arguments given below. It
can also be seen directly by observing that for every fixed z we have |Fn(1, z)− F n(1, z)| =oP (1) (which easily follows by observing that the variance of this quantity tends to zero as
n→∞) together with a standard argument as in the proof of the classical Glivenko-Cantelli
theorem for continuous random variables, utilizing monotonicity of F n(1, z) and Fn(1, z).
This completes the proof of (47), and as outlined above, this implies that both (42) and (43)
are oP (1), uniformly in α.
It remains to consider the quantity in (44). First, we derive an upper bound for qγ. Let
Bn(ε) = {|F n(qγ) − F n(qγ)| < ε; supz |(Hn − F n)(1, z)| < δε/6} where δε > 0 is such that
|F n(qγ±δε)−F n(qγ)| < ε. The above shows that P (Bn(ε))→ 1 as n→∞ for any ε > 0. Now
choose ε > 0 small enough (such that γ > δε/6 in order for the below to be well defined).
On Bn(ε) we have
qγ = inf{z ≥ 0 : Hn(1, z)− Hn(1,−z) ≥ 1− γ; |F n(z)− F n(qγ)| < ε}
≤ inf{z ≥ 0 : F n(1, z)− F n(1,−z) ≥ 1− γ −[
(Hn − F n)(1, qγ)− (Hn − F n)(1,−qγ)]
+ 2 sup|Fn(v)−Fn(w)|≤ε
|(Hn − F n)(1, v)− |(Hn − F n)(1, w)|}
≤ inf{z ≥ 0 : Ψ0,b(z) ≥ 1− γ −[
(Hn − F n)(1, qγ)− (Hn − F n)(1,−qγ)]
+ 2 sup|Fn(v)−Fn(w)|≤ε
|(Hn − F n)(1, v)− |(Hn − F n)(1, w)|+ c n−1}
= Ψ−10,b(1− γ −Qn + rn),
where for short rn = 2 sup|Fn(v)−Fn(w)|≤ε |(Hn − F n)(1, v)− |(Hn − F n)(1, w)|+ cn−1 with c
from (46), and Qn =[
(Hn − F n)(1, qγ)− (Hn − F n)(1,−qγ)]. Now let
Ψ0,b(α, z) =
∫ α
0
F ( zσ(b u))
) du−∫ α
0
F ( −zσ(b u))
) du, z ≥ 0, α ∈ [0, 1]. (50)
Observe that Ψ0,b(z) = Ψ0,b(1, z). For ease of notation we omit the subscripts on Ψ in what
follows. Since by definition of qγ we have qγ = Ψ−1(1 − γ), and since Ψ(α, z) is strictly
increasing in z for any α we have on Bn(ε),
[F n(α, qγ)− F n(α, qγ)
]−[F n(α,−qγ)− F n(α,−qγ)
]28
= Ψ(α, qγ)−Ψ(α, qγ) + c n−1
≤ Ψ(α,Ψ−1
(1− γ −Qn + rn
) )−Ψ
(α,Ψ−1(1− γ)
)+ c n−1
= −∂∂z
Ψ(α,Ψ−1(ξ+n ))
Ψ′(Ψ−1(ξ+n ))
(Qn − rn) + c n−1
= −∫ bα
0
[fu(Ψ
−1(ξ+n )) + fu(−Ψ−1(ξ+
n ))]du∫ b
0
[fu(Ψ−1(ξ+
n )) + fu(−Ψ−1(ξ+n ))]du
(Qn − rn) + c n−1
=: − (c(α) + oP (1)) (Qn − rn) + c n−1
with ξ+n ∈ [1−γ, 1−γ− Qn+rn], fu(z) = ∂
∂zFu(z) = 1
σ(u)f(
zσ(u)
), and the oP (1)-term equals
cn(α)− c(α) with cn(α) the ratio from the second to last line in the above formula, and
c(α) =
∫ b α0
[fu(qγ)) + fu(−qγ)
]du∫ b
0
[fu(qγ) + fu(−qγ)
]du
.
The fact that |cn(α)− c(α)| = oP (1) follows from our smoothness assumptions together with
the fact that |Qn − rn| = oP (1). Similarly, we obtain a lower bound of the form
[F n(α, qγ)− F n(α, qγ)
]−[F n(α,−qγ)− F n(α,−qγ)
]≥ −(c(α) + oP (1)) (Qn + rn)− c n−1.
From the above we know that√nQn =
√n (Fn−F n)(qγ)−
√n (Fn−F n)(−qγ) + oP (1) and
√n rn = oP (1), so that
√n[F n(α, qγ)− F n(α, qγ)
]−[F n(α,−qγ)− F n(α,−qγ)
]= −c(α)√
n
n∑s=1
(1(σ2
(sT
)ε2s ≤ q2
γ)− P (σ2(sT
)ε2s ≤ q2
γ))
+ oP (1)
= − c(α)√n (Gn,γ(1)− EGn,γ(1) ) + oP (1).
Finally, observe that under H0, where σ(u) ≡ σ0, we have c(α) = α, because fu does not
depend on u ∈ [a, b]. The proof is complete once we have shown that |qγ − qγ| = oP (1)
as n → ∞ (cf. comment right after (37)). To see that, first notice that our regularity
conditions imply that infu fu(qγ) =: d∗ > 0. Again using the fact that our assumptions
imply the uniform Lipschitz continuity of the functions z → fu(z) we obtain the existence
29
of an ε0 > 0 such that fu(z) > d∗2
for all |z − qγ| ≤ ε0 and all u ∈ [0, 1]. Consequently, if
|qγ−qγ| ≤ ε0 then∣∣F s
T(qγ)−F s
T(qγ)
∣∣ =∣∣∣ ∫ bqγ
qγf sT
(u) du∣∣∣ ≥ d∗
2|qγ−qγ| ∀ s = 0, 1, . . . , n.On the
other hand, if |qγ−qγ| > ε0 then, because the fu(z) are all non-negative,∣∣F s
T(qγ)−F s
T(qγ)
∣∣ =∣∣∣ ∫ bqγqγf sT
(u) du∣∣∣ ≥ d∗
2ε0 ∀ s = 0, 1, . . . , n. Thus, for any ε > 0 there exists an η > 0 with
{|qγ − qγ| > ε} ⊂{∣∣F s
T(qγ)− F s
T(qγ)
∣∣ > η ∀ s = 0, 1, . . . , n}. (51)
Since all the function z → F sT
(z) are strictly monotone, we have
dn(qγ, qγ) =∣∣F n(qγ)− F n(qγ)
∣∣ =1
n
n∑s=1
∣∣F sT
(qγ)− F sT
(qγ))∣∣.
Consequently, (51) implies that if |qγ − qγ| is not oP (1), then also dn(qγ, qγ) is not oP (1).
This is a contradiction to (47) and this completes our proof.
6 References:
AKRITAS, M.G. and VAN KEILEGOM, I. (2001): Non-paramteric estimation of the resid-
ual distribution. Scand. J. Statist. 28, 549-567.
ALEXANDER, K.S. and PYKE, R. (1986): A uniform central limit theorem for set-indexed
partial-sum processes with finite variance. Ann. Probab. 14, 582-597.
CHANDLER, G. and POLONIK, W. (2006). Discrimination of locally stationary time series
based on the excess mass functional. J. Amer. Statist. Assoc. 101, 240-253.
CHANDLER, G. and POLONIK, W. (2010): A test of modality for the variance function
in time-varying autoregression. Submitted.
CHENG, F. (2005). Asymptotic distributions of error density and distribution function es-
timators in nonparametric regression. J. Statist. Plann. Inference 128, 327349.
DAHLHAUS, R. (1997). Fitting time series models to nonstationary processes. Ann. Statist.
25, 1-37.
DAHLHAUS, R., NEUMANN, M. and V. SACHS, R. (1999). Nonlinear wavelet estimation
of time-varying autoregressive processes. Bernoulli, 5, 873-906.
DAHLHAUS, R. and POLONIK, W. (2006). Nonparametric quasi-maximum likelihood es-
timation for gaussian locally stationary processes. Ann. Statist. 34, 2790 - 2824.
30
DAHLHAUS, R. and POLONIK, W. (2009). Empirical spectral processes for locally sta-
tionary time series Bernoulli 15, 1 - 39.
DREES, H. and STARICA, C. (2002). A simple non-stationary model for stock returns.
Preprint.
FRYZLEWICZ, P., SAPATINAS, T., and SUBBA RAO, S. (2006). A Haar-Fisz technique
for locally stationary volatility estimation. Biometrika 93, 687-704.
EOM, K.B. (1999). Analysis of acoustic signatures from moving vehicles using time-varying
autoregressive models, Multidimensional Systems and Signal Processing 10, 357-378.
GIRAULT, J-M., OSSANT, F. OUAHABI, A., KOUAME, D. and PATAT, F. (1998). Time-
varying autoregressive spectral estimation for ultrasound attenuation in tissue characteriza-
tion. IEEE Trans. Ultrasonic, Ferroeletric, and Frequency Control 45, 650 - 659.
GRENIER, Y. (1983): Time-dependent ARMA modeling of nonstationary Signals, IEEE
Trans. on Acoustics, Speech, and Signal Processing 31 , 899911.
HALL, M.G., OPPENHEIM, A.V. and WILLSKY, A.S. (1983), Time varying parametric
modeling of speech, Signal Processing 5, 267285.
HORVATH, L., KOKOSZKA, P., TEYESSIERE, G. (2001): Empirical process of the
squared residuals of an ARCH sequence. Ann. Statist. 29, 445469.
KOUL, H. L.. (2002). Weighted empirical processes in dynamic nonlinear models, 2nd edn.,
Lecture Notes in Statistics, 166, Springer, New York.
KOUL, H.L., LING, S. (2006): Fitting an error distribution in some heteroscedastic time
series models. Ann. Statist. 34, 994-1012.
MULLER, U., SCHICK, A. and WEFELMEYER, W. (2008): Estimating the innovation
distribution in nonparametric autoregression. Probab. Theory Relat. Fields 144, 5377
LAIB, N., LEMDANI, M. and OLUD-SAID, E. (2008): On residual empirical processes of
GARCH-SM models: Application to conditional symmetry tests. J. Time Ser. Anal. 29,
762- 782.
NICKL, R. and POTSCHER, B.M. (2007). Bracketing metric entropy rates and empirical
central limit theorems for function classes of Besov- and Sobolev-type. J. Thoer. Probab.
20, 177-199.
ORBE, S., FERREIRA, E., and RODRIGUES-POO, J. (2005). Nonparametric estimation
31
of time varying parameters under shape restrictions. J. Econometrics 126, 53 - 77.
PRIESTLEY, M.B. (1965). Evolutionary spectra and non-stationary processes. J. Royal
Statist. Soc. - Ser. B 27, 204-237.
RAJAN, J.J. and Rayner, P.J. (1996). Generalized feature extraction for time-varying au-
toregressive models, IEEE Trans. on Signal Processing 44, 24982507.
STUTE, W. (2001). Residual analysis for ARCH(p)-time series. Test 1, 393403.
SUBBA RAO, T.(1970). The fitting of nonstationary time-series models with time-
dependent parameters, J. Royal Stat. Soc. - Ser. B 32, 312322.
VAN DE GEER, S. (2000): Empirical processes in M-estimation. Cambridge University
Press.
VAN DER VAART, A.W. and WELLNER, J. (1996). Weak convergence and empirical
processes. Springer, New York.
32