Estimation of conditional copulas: revisiting asymptotic results in the i.i.d. case and extension to serial dependence

Félix Camirand Lemyre^a, Taoufik Bouezmarni^a, Jean-François Quessy^b

^a Département de mathématiques, Université de Sherbrooke, Québec, Canada
^b Département de mathématiques et d'informatique, Université du Québec à Trois-Rivières, Trois-Rivières, Canada

Abstract

Let (Y1, Y2, X) ∈ R³ be a random vector and consider the conditional distribution Hx(y1, y2) = P(Y1 ≤ y1, Y2 ≤ y2 | X = x). The conditional copula in that context is the dependence structure that one extracts from Hx via the celebrated Sklar's theorem. Specifically, if the conditional marginal distributions F1x and F2x of Hx are continuous, then there exists a unique conditional copula Cx : [0, 1]² → [0, 1] such that Hx(y1, y2) = Cx{F1x(y1), F2x(y2)} for all (y1, y2) ∈ R². Then, Cx contains all the information on the form, and in particular on the strength, of the dependence between Y1 and Y2 for a given value of the covariate X. This paper considers the estimation of Cx when serially dependent copies of (Y1, Y2, X) are available, extending recent results obtained by Veraverbeke et al. (2011) and Gijbels et al. (2011). Such observations naturally appear in financial contexts. It is shown that, under appropriate conditions, the limiting behavior of suitably standardized versions matches that in the i.i.d. case. An application is given to measuring causality between Standard & Poor's 500 index data and volume.

Keywords: α-mixing processes, conditional copula, empirical copula process, functional delta method, weak convergence

Email addresses: [email protected] (Félix Camirand Lemyre), [email protected] (Taoufik Bouezmarni), [email protected] (Jean-François Quessy)

Preprint submitted to Elsevier, November 19, 2014

1. Introduction

Consider a random pair (Y1, Y2) from a joint distribution H having continuous marginal distributions F1 and F2. Then a celebrated theorem due to Sklar (1959) ensures the existence of a unique copula C : [0, 1]² → [0, 1] such that H(y1, y2) = C{F1(y1), F2(y2)} holds for all (y1, y2) ∈ R²; see Nelsen (2006) for details on the theory of copulas. Based on (Y11, Y21), . . . , (Y1n, Y2n) i.i.d. from H, the nonparametric estimation of C is usually performed by computing the empirical copula. A version that is asymptotically equivalent to the one originally proposed by Rüschendorf (1976) is

Cn(u1, u2) = (1/n) ∑_{i=1}^n I{Y1i ≤ F⁻¹n1(u1), Y2i ≤ F⁻¹n2(u2)},

where Fn1 and Fn2 are the marginal empirical distribution functions.

The asymptotic behavior of the empirical copula process Cn = √n (Cn − C) has been investigated by Deheuvels (1979) under independence, i.e. when C(u, v) = uv is the independence copula. General weak convergence in the space D([0, 1]²) of càdlàg functions equipped with the Skorohod topology was investigated by Gaenßler & Stute (1987); van der Vaart & Wellner (1996) show weak convergence in the space ℓ∞([a, b]²) of bounded functions on [a, b]² for 0 < a < b < 1. The result was extended to the space ℓ∞([0, 1]²) by Fermanian et al. (2004) while assuming the existence and continuity of the partial derivatives C[1](u1, u2) = ∂C(u1, u2)/∂u1 and C[2](u1, u2) = ∂C(u1, u2)/∂u2 on [0, 1]². In that case, Cn converges weakly with respect to the supremum distance to a limit process of the form

C(u1, u2) = α(u1, u2) − C[1](u1, u2) α(u1, 1) − C[2](u1, u2) α(1, u2),   (1)

where α is a continuous and centered Gaussian process such that

E{α(u1, u2) α(u′1, u′2)} = C(u1 ∧ u′1, u2 ∧ u′2) − C(u1, u2) C(u′1, u′2).

Here and in the sequel, a ∧ b = min(a, b) for a, b ∈ R. As shown by Segers (2012), these requirements on the partial derivatives of C are not satisfied by many extensively used copula models. Fortunately, this author obtained that the result still holds if C[1] and C[2] exist and are continuous on the sets (0, 1) × [0, 1] and [0, 1] × (0, 1), respectively. The extension of this result to


serially dependent data has been considered recently by Bücher & Volgushev (2013) and Bücher & Ruppert (2013).
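In code, a rank-based version of the empirical copula above (asymptotically equivalent, as noted) takes only a few lines; the data in this sketch are simulated purely for illustration.

```python
import numpy as np

def empirical_copula(y1, y2, u1, u2):
    """Rank-based empirical copula: (1/n) * sum_i I{rank1_i/n <= u1, rank2_i/n <= u2},
    a version asymptotically equivalent to the Ruschendorf (1976) estimator."""
    y1, y2 = np.asarray(y1), np.asarray(y2)
    n = y1.size
    r1 = np.argsort(np.argsort(y1)) + 1  # ranks of the Y1 sample
    r2 = np.argsort(np.argsort(y2)) + 1  # ranks of the Y2 sample
    return float(np.mean((r1 / n <= u1) & (r2 / n <= u2)))

# illustrative simulated pair with positive dependence (corr = 0.8)
rng = np.random.default_rng(0)
y1 = rng.normal(size=500)
y2 = 0.8 * y1 + 0.6 * rng.normal(size=500)
c = empirical_copula(y1, y2, 0.5, 0.5)
```

For the Gaussian pair above, the population value C(0.5, 0.5) = 1/4 + arcsin(0.8)/(2π) ≈ 0.40, so the estimate should land near that figure.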

One is often interested in the behavior of a random couple conditional on the value taken by some covariate. Specifically, let (Y1, Y2, X) be a random vector taking values in R³ and, for a fixed x ∈ R, consider the conditional joint distribution of (Y1, Y2) given X = x, that is

Hx(y1, y2) = P(Y1 ≤ y1, Y2 ≤ y2 | X = x).   (2)

The conditional marginal distributions of Y1 and Y2 given X = x obtain from Hx via F1x(y) = lim_{w→∞} Hx(y, w) and F2x(y) = lim_{w→∞} Hx(w, y). If F1x and F2x are continuous, then Sklar's theorem ensures that there exists a unique copula Cx : [0, 1]² → [0, 1] such that Hx(y1, y2) = Cx{F1x(y1), F2x(y2)}. Conversely, the copula associated with the bivariate conditional distribution Hx can be extracted from the formula

Cx(u1, u2) = Hx{F⁻¹1x(u1), F⁻¹2x(u2)}.   (3)

The bivariate function Cx is called the conditional copula. It contains all the dependence features of (Y1, Y2) given a fixed value taken by the covariate. For that reason, being able to estimate Cx is an important task.

The estimation of Cx from i.i.d. observations has been considered by Veraverbeke et al. (2011) and Gijbels et al. (2011). Specifically, assuming the availability of independent random triplets W1, . . . , Wn, where Wi = (Y1i, Y2i, Xi), two empirical versions of Cx were proposed. First consider the estimator of the joint conditional distribution Hx given by

Hxh(y1, y2) = ∑_{i=1}^n whi(x) I(Y1i ≤ y1, Y2i ≤ y2),   (4)

where wh1(x), . . . , whn(x) are weight functions that smooth the covariate space and h is a bandwidth parameter. The latter typically depends on the sample size, but the subscript n is omitted in the sequel for notational simplicity. The conditional empirical marginal distributions are simply F1xh(y) = lim_{w→∞} Hxh(y, w) and F2xh(y) = lim_{w→∞} Hxh(w, y). From representation (3), a natural plug-in estimator of Cx is given by

Cxh(u1, u2) = Hxh{F⁻¹1xh(u1), F⁻¹2xh(u2)} = ∑_{i=1}^n whi(x) I{Y1i ≤ F⁻¹1xh(u1), Y2i ≤ F⁻¹2xh(u2)},   (5)


where for j = 1, 2, F⁻¹jxh(u) = inf{y ∈ R : Fjxh(y) ≥ u} is the left-continuous generalized inverse of Fjxh. As noted by Gijbels et al. (2011), the estimator Cxh may be severely biased if either of the two marginal distributions is strongly influenced by the covariate. For that reason, they proposed a second estimator that aims at removing this effect of the covariate on the margins, in the hope of obtaining a smaller bias. To this end, define for each i ∈ {1, . . . , n} the pseudo-uniformized observations (Ũ1i, Ũ2i) = (F1Xih1(Y1i), F2Xih2(Y2i)), where h1, h2 are bandwidth parameters that may differ from h. Then, let

G̃xh(v1, v2) = ∑_{i=1}^n whi(x) I{F1Xih1(Y1i) ≤ v1, F2Xih2(Y2i) ≤ v2}   (6)

    and estimate Cx with

C̃xh(u1, u2) = G̃xh{G̃⁻¹1xh(u1), G̃⁻¹2xh(u2)},   (7)

where G̃1xh and G̃2xh are the marginals extracted from G̃xh. The investigation of the asymptotic properties of the conditional empirical copula processes

Cxh = √nh (Cxh − Cx) and C̃xh = √nh (C̃xh − Cx),

as random elements in the space ℓ∞([0, 1]²) of bounded functions on [0, 1]² endowed with the supremum norm, has been carried out by Veraverbeke et al. (2011). Specifically, they obtained that Cxh and C̃xh both converge weakly to Gaussian processes whose stochastic behaviors differ only in their respective biases. The main goals of this paper are to

(i) re-derive the above-mentioned asymptotic results of Veraverbeke et al. (2011) in a simpler way, using the functional delta method;

    (ii) extend these results to serially dependent observations, i.e. time series.

The paper is organized as follows. Section 2 revisits some results on classical and conditional empirical copula processes in the light of the functional delta method. Section 3 extends these results to conditional empirical copula processes under serial dependence. Some simulation results that aim at studying the accuracy of these estimators are provided in Section 4. An application is given in Section 5 on measuring causality in the bivariate time series of returns and volume of the Standard & Poor's 500 index. The proofs of the main results are to be found in Section 6, and auxiliary results needed for these theoretical results to hold are relegated to an Appendix.
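To fix ideas on the estimators just introduced, the plug-in estimator (5) can be sketched in a few lines. The sketch below uses Gaussian-kernel Nadaraya–Watson-type weights; the kernel choice, bandwidth and simulated data are illustrative assumptions, not prescriptions from the paper.

```python
import numpy as np

def nw_weights(xs, x, h):
    """Nadaraya-Watson-type weights w_hi(x): nonnegative and summing to one."""
    k = np.exp(-0.5 * ((np.asarray(xs) - x) / h) ** 2)  # Gaussian kernel (illustrative)
    return k / k.sum()

def conditional_empirical_copula(y1, y2, xs, x, h, u1, u2):
    """Plug-in estimator (5): Cxh(u1, u2) = Hxh(F1xh^{-1}(u1), F2xh^{-1}(u2))."""
    w = nw_weights(xs, x, h)
    y1, y2 = np.asarray(y1), np.asarray(y2)

    def quantile(y, u):
        # generalized inverse inf{y : F_jxh(y) >= u} of the weighted marginal
        order = np.argsort(y)
        cw = np.cumsum(w[order])
        idx = min(np.searchsorted(cw, u), y.size - 1)
        return y[order][idx]

    q1, q2 = quantile(y1, u1), quantile(y2, u2)
    return float(np.sum(w * ((y1 <= q1) & (y2 <= q2))))

# illustrative data: dependence between Y1 and Y2 induced through X
rng = np.random.default_rng(1)
xs = rng.uniform(-1, 1, size=800)
y1 = xs + 0.3 * rng.normal(size=800)
y2 = xs + 0.3 * rng.normal(size=800)
cxh = conditional_empirical_copula(y1, y2, xs, x=0.0, h=0.2, u1=0.5, u2=0.5)
```

Since Y1 and Y2 are conditionally independent given X here, the estimate at (0.5, 0.5) should sit near the independence value 0.25, up to smoothing bias and sampling noise.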


2. Revisiting conditional empirical copulas in the i.i.d. case

    2.1. A Hadamard functional

Let D be the space of bivariate distribution functions on R² and consider the mapping Φ : D → ℓ∞([0, 1]²) defined by

Φ(H) = H ∘ (F⁻¹1, F⁻¹2),   (8)

where F1(y) = lim_{w→∞} H(y, w) and F2(y) = lim_{w→∞} H(w, y). Then, if F1 and F2 are continuous, the unique copula of H can be written C = Φ(H). The Hadamard differentiability of Φ was first investigated in van der Vaart & Wellner (1996) under strong assumptions on the partial derivatives of C. These assumptions were recently relaxed by Bücher & Volgushev (2013) and match those identified by Segers (2012).

Definition 2.1. In the sequel, D is the class of copulas C such that the partial derivatives C[1](u1, u2) = ∂C(u1, u2)/∂u1 and C[2](u1, u2) = ∂C(u1, u2)/∂u2 exist and are continuous respectively on the sets (0, 1) × [0, 1] and [0, 1] × (0, 1).

If the copula C of H belongs to D, then Bücher & Volgushev (2013) show that Φ is Hadamard differentiable, with derivative at H given by

Φ′H(∆)(u1, u2) = ∆̃(u1, u2) − C[1](u1, u2) ∆̃(u1, 1) − C[2](u1, u2) ∆̃(1, u2),

with ∆̃(u1, u2) = ∆{F⁻¹1(u1), F⁻¹2(u2)}. As a consequence, if Hn is an estimator of H such that Hn = √n (Hn − H) converges weakly to some limit H, an application of the functional delta method yields, for Cn = Φ(Hn),

Cn = √n (Cn − C) ⇝ C = Φ′H(H).

When (Y11, Y21), . . . , (Y1n, Y2n) are i.i.d., Hn is usually the bivariate empirical distribution function. From classical results, H is then a centered Gaussian process with covariance function given, for each (y1, y2), (y′1, y′2) ∈ R², by

ΓH(y1, y2, y′1, y′2) = H(y1 ∧ y′1, y2 ∧ y′2) − H(y1, y2) H(y′1, y′2).   (9)

It follows that the limit C = Φ′H(H) of Cn is a centered Gaussian process on [0, 1]² of the form given in equation (1). One then recovers the result obtained by Segers (2012).


2.2. Asymptotic behavior of Cxh

As already mentioned, the asymptotic behavior of Cxh has been investigated by Veraverbeke et al. (2011). Here, this result will be re-derived upon noting that Cxh = Φ(Hxh), where the functional Φ is defined in equation (8). When the weights in the definition of Hxh are nonnegative and sum to one, Hxh is a distribution function and thus belongs to the space D of bivariate distribution functions on R². In that case, the asymptotic behavior of Cxh will be a consequence of the large-sample behavior of Hxh = √nh (Hxh − Hx). To this end, one needs to introduce the following family of distributions.

Definition 2.2. In the sequel, Fx is the class of distribution functions H on R³ whose conditional distribution Hz(y1, y2) = ∂H(y1, y2, z)/∂z is such that

Ḣz(y1, y2) = (∂/∂z) Hz(y1, y2) and Ḧz(y1, y2) = (∂²/∂z²) Hz(y1, y2)

are uniformly continuous for each z in a neighborhood of x.

    Now the following proposition can be deduced from Veraverbeke et al. (2011).

Proposition 2.3. Let W1, . . . , Wn, where Wi = (Y1i, Y2i, Xi), be i.i.d. from a distribution function H that belongs to Fx. Also assume that the weights in the definition of Hxh satisfy conditions W1–W5 described in Appendix A. If nh → ∞ and nh⁵ → K < ∞ as n → ∞, then the empirical process Hxh converges weakly to a Gaussian process Hx having covariance function

Cov{Hx(y1, y2), Hx(y′1, y′2)} = K5 ΓHx(y1, y2, y′1, y′2)

and asymptotic bias

BHx(y1, y2) = K {K3 Ḣx(y1, y2) + (K4/2) Ḧx(y1, y2)}.   (10)

As a straightforward application of the functional delta method, one obtains from the conclusion of Proposition 2.3 that, as long as Cx belongs to the class D, the empirical process Cxh converges weakly to Cx = Φ′Hx(Hx). One then deduces the asymptotic representation

Cx(u1, u2) = αx(u1, u2) − C[1]x(u1, u2) αx(u1, 1) − C[2]x(u1, u2) αx(1, u2),


where αx(u1, u2) = Hx{F⁻¹1x(u1), F⁻¹2x(u2)} is a Cx-Brownian bridge. Because Φ′Hx is a linear functional, the asymptotic bias of Cx obtains via E{Φ′Hx(Hx)} = Φ′Hx{E(Hx)} = Φ′Hx(BHx). From equation (10), one obtains the expression already derived by Veraverbeke et al. (2011), namely

E{Cx(u1, u2)} = B̃Hx(u1, u2) − C[1]x(u1, u2) B̃Hx(u1, 1) − C[2]x(u1, u2) B̃Hx(1, u2),

where B̃Hx(u1, u2) = BHx{F⁻¹1x(u1), F⁻¹2x(u2)}.

Remark 2.4. When Hxh fails to be a distribution function, a version H′xh is obtained upon setting the negative weights to zero and re-scaling the resulting weights so that they sum to one. Then, as long as the difference between Hxh and H′xh is asymptotically negligible, the large-sample behavior of Cxh obtains as a consequence of that of H′xh. As pointed out in Omelka et al. (2013), this holds if

√nh sup_{z∈J(n)x} |∑_{i=1}^n whi(z) − 1| and sup_{z∈J(n)x} |∑_{i=1}^n whi(z) I{whi(z) < 0}|

both converge in probability to zero, where J(n)x = [min_{i∈I(n)x} Xi, max_{i∈I(n)x} Xi] and I(n)x = {i : whi(x) > 0}.
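In practice, the adjustment described in this remark is a two-line operation; a minimal sketch (the weight values below are made up for illustration, mimicking local linear weights that turn negative near a boundary):

```python
import numpy as np

def renormalize(w):
    """Remark 2.4 adjustment: set negative weights to zero,
    then rescale the remaining weights so that they sum to one."""
    w = np.where(np.asarray(w, dtype=float) > 0, w, 0.0)
    return w / w.sum()

# illustrative weights with one negative entry
w = np.array([-0.05, 0.10, 0.25, 0.40, 0.30])
w_adj = renormalize(w)
```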

    2.3. Asymptotic behavior of C̃xh

Obtaining the asymptotic behavior of C̃xh is trickier, because it is based on the pseudo-uniformized observations (Ũ11, Ũ21), . . . , (Ũ1n, Ũ2n). As a first step, consider a version of G̃xh computed from (U11, U21), . . . , (U1n, U2n), where U1i = F1Xi(Y1i) and U2i = F2Xi(Y2i), namely

Gxh(u1, u2) = ∑_{i=1}^n whi(x) I(U1i ≤ u1, U2i ≤ u2).

The weak limit of Gxh = √nh (Gxh − Cx) obtains from Proposition 2.3 with H replaced by C. Therefore, under the conditions of Proposition 2.3, Gxh converges weakly to Gx, a Gaussian process with covariance function Cov{Gx(u1, u2), Gx(u′1, u′2)} = K5 ΓCx(u1, u2, u′1, u′2) and asymptotic bias E{Gx(u1, u2)} = BCx(u1, u2). If, in addition, Cx belongs to D, then the empirical process C̃*xh = √nh {Φ(Gxh) − Φ(Cx)} converges weakly to C̃*x = Φ′Cx(Gx). Since BCx(u, 1) = BCx(1, u) = 0 for any u ∈ [0, 1], one deduces E{Φ′Cx(Gx)} = Φ′Cx(BCx) = BCx as the bias of C̃*x.

It remains to show that the processes C̃*xh and C̃xh are asymptotically equivalent. This is indeed the case if one can show that the pseudo-uniformized observations are close to the truly uniformized observations. This is the subject of the next proposition, whose consequence is that one recovers the result stated in Veraverbeke et al. (2011).

Proposition 2.5. Suppose that for j = 1, 2, the functions Fjz{F⁻¹jz(u)}, Ḟjz{F⁻¹jz(u)} and F̈jz{F⁻¹jz(u)} are continuous in (z, u) for z in a neighborhood of x. Further assume that nh1 → ∞ and nh2 → ∞ as n → ∞, that h/min(h1, h2) < ∞, and that nh⁵1 < ∞ and nh⁵2 < ∞. Then, under the conditions of Proposition 2.3 and if, in addition, conditions W6–W10 in Appendix A hold, the empirical process C̃xh converges weakly to Φ′Cx(Gx).

    3. Asymptotic behavior of Cxh and C̃xh under serial dependence

    3.1. α-mixing processes

Consider a sequence (ηi)i∈Z of random vectors defined on some probability space (Ω, F, P). The σ-field generated by (ηi)a≤i≤b, a, b ∈ Z ∪ {−∞, +∞}, is denoted F^b_a. As defined, e.g., in Rio (2000), the α-mixing coefficients of (ηi)i∈Z are given by αη(0) = 1/2 and, for r ≥ 1,

αη(r) = sup_{k∈Z} α(F^k_{−∞}, F^∞_{k+r}),

where, for two σ-fields A, B ⊂ F,

α(A, B) = sup_{(A,B)∈A×B} |P(A ∩ B) − P(A) P(B)|.

The process (ηi)i∈Z is said to be α-mixing if αη(r) → 0 as r → ∞. The α-mixing condition is also called the strong mixing assumption in the literature. Many well-known stochastic processes satisfy this property; for example, ARMA and ARCH processes are α-mixing under some regularity conditions on their parameters. For more details, see Meitz & Saikkonen (2002), Doukhan (1994) and Carrasco & Chen (2002).
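To illustrate, a stationary Gaussian AR(1) process is a standard example of an α-mixing sequence. Its mixing coefficients are not directly observable, but the underlying geometric decay of dependence shows up in the lag autocorrelations; a small simulation sketch (parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(42)
theta, n = 0.5, 100_000
eps = rng.normal(size=n)
w = np.empty(n)
w[0] = eps[0]  # stationary initialization (unit variance)
for i in range(1, n):
    # AR(1) recursion scaled to keep unit stationary variance
    w[i] = theta * w[i - 1] + np.sqrt(1 - theta**2) * eps[i]

def lag_corr(x, r):
    """Empirical lag-r autocorrelation; theory for AR(1): theta**r."""
    return float(np.corrcoef(x[:-r], x[r:])[0, 1])

corrs = [lag_corr(w, r) for r in (1, 5, 10)]
```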

    The following two subsections establish the asymptotic behavior of Cxhand C̃xh under the strong mixing assumption.


3.2. The process Cxh

Similarly to the i.i.d. case, the asymptotic behavior of Cxh under serial dependence can be derived as a consequence of the large-sample behavior of the associated weighted empirical process Hxh = √nh (Hxh − Hx). As pointed out in the next theorem, one obtains, under certain conditions on the weights, that Hxh converges weakly, under a strong mixing assumption, to a Gaussian process whose representation coincides with that of Hx appearing in Proposition 2.3 in the i.i.d. situation.

Theorem 3.1. Let (Wi)i∈Z, where Wi = (Y1i, Y2i, Xi), be a stationary sequence with α-mixing coefficients such that αW(r) = O(r^{−a}) for some a > 6. Assume that their common joint distribution H belongs to the class Fx and suppose that conditions W*1, W2–W5 and W11–W13 on the weights are fulfilled. Then, if nh → ∞ and nh⁵ → K < ∞ as n → ∞, the process Hxh converges weakly to the same Gaussian process Hx as in the i.i.d. case.

Interestingly, the asymptotic covariance structure of Hxh under a strong mixing assumption matches the asymptotic covariance structure of Hxh found in the i.i.d. setting. In other words, the impact of time dependency, compared to the serially independent case, is asymptotically negligible. This is due to the use of weight functions that smooth the covariate space in a shrinking neighborhood of x as n goes to infinity. Note, however, that the stronger assumptions W*1 and W11–W13 on the weight functions were needed in order to handle the moments of order six induced by the time dependency. These conditions are fulfilled, among others, by the Nadaraya–Watson and local linear weights, given respectively by

wNWhi(x) = K((Xi − x)/h) / Sn,0 and wLLhi(x) = K((Xi − x)/h) {Sn,2 − ((Xi − x)/h) Sn,1} / (Sn,0 Sn,2 − S²n,1),

where K is a symmetric and continuously differentiable kernel density function on [−1, 1] and, for j ∈ {0, 1, 2},

Sn,j = ∑_{i=1}^n ((Xi − x)/h)^j K((Xi − x)/h).
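Both weight families are straightforward to compute. The sketch below uses the quartic (biweight) kernel as an illustrative choice of K and checks the defining properties: both weight sets sum to one, the Nadaraya–Watson weights are nonnegative, and the local linear weights have a vanishing first local moment.

```python
import numpy as np

def K(u):
    """Quartic (biweight) kernel supported on [-1, 1] -- an illustrative choice."""
    return np.where(np.abs(u) <= 1, 15 / 16 * (1 - u**2) ** 2, 0.0)

def weights(xs, x, h, kind="NW"):
    """Nadaraya-Watson ("NW") and local linear ("LL") weights w_hi(x)."""
    u = (np.asarray(xs) - x) / h
    k = K(u)
    s0, s1, s2 = k.sum(), (u * k).sum(), (u**2 * k).sum()
    if kind == "NW":
        return k / s0
    return k * (s2 - u * s1) / (s0 * s2 - s1**2)  # local linear

rng = np.random.default_rng(3)
xs = rng.uniform(0, 1, size=300)
w_nw = weights(xs, 0.5, 0.1, "NW")
w_ll = weights(xs, 0.5, 0.1, "LL")
```

Note that, unlike the Nadaraya–Watson weights, the local linear weights may be negative near the boundary of the design, which is exactly the situation addressed by Remark 2.4.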

The following corollary to Theorem 3.1 is a straightforward consequence of the Hadamard differentiability of the functional Φ defined in (8) and of the weak convergence of Hxh under a strong mixing assumption.


Corollary 3.2. If the conditions of Theorem 3.1 are met and Cx belongs to D, then the empirical process Cxh converges weakly to Cx = Φ′Hx(Hx). Hence, Cxh has the same asymptotic behavior as in the i.i.d. case.

    3.3. The process C̃xh

From the definitions of G̃xh and Gxh, one can write

C̃xh = √nh {Φ(Gxh) − Cx} + √nh {Φ(G̃xh) − Φ(Gxh)}.

Provided the conditions of Corollary 3.2 are satisfied, one deduces that the first summand on the right converges weakly to C̃x = Φ′Cx(Gx). Therefore, the key result for the weak convergence of C̃xh is the asymptotic negligibility of the second summand; this is established next.

Theorem 3.3. Let (Wi)i∈Z, where Wi = (Y1i, Y2i, Xi), be a stationary sequence with associated α-mixing coefficients such that αW(r) = O(r^{−a}) for some a > 6. Suppose that for j = 1, 2, the functions Fjz{F⁻¹jz(u)}, Ḟjz{F⁻¹jz(u)} and F̈jz{F⁻¹jz(u)} are continuous in (z, u) for z in a neighborhood of x. Further assume that nh1 → ∞ and nh2 → ∞ as n → ∞, that √nh1 h²1 < ∞ and √nh2 h²2 < ∞, and that h/min(h1, h2) < ∞. In addition, assume that conditions W*1 and W2–W13 on the weights are satisfied. If the distribution function of (F1X(Y1), F2X(Y2), X) belongs to Fx, then

sup_{(u1,u2)∈[0,1]²} √nh |Φ(G̃xh)(u1, u2) − Φ(Gxh)(u1, u2)|

converges in probability to zero as n → ∞.

    4. Sample behavior of the conditional copula estimators

In order to evaluate the sample performance of the estimators Cxh and C̃xh, Veraverbeke et al. (2011) compared their asymptotic biases under various scenarios. They observed that, generally speaking, C̃xh has a significantly smaller bias than Cxh. The aim of this section is to perform a similar investigation in the case of serially dependent data. To this end, consider the general autoregressive data generating process

Wi = θ Wi−1 + √(1 − θ²) εi,

where, for each i ∈ N, Wi = (Y1i, Y2i, Xi) and the εi are i.i.d. from some distribution on R³ having mean vector zero. In that model, the parameter θ ∈ [0, 1) controls the strength of the serial dependence between successive observations; it is a situation where the data are α-mixing.

For the results that will be presented next, one restricts attention to the case where the εi are standard normal with some correlation matrix R = (Rij)³i,j=1, Rij being the correlation coefficient between the ith and jth components. In that case, it is well known that the joint conditional distribution of (Y1i, Y2i) given Xi = x is the bivariate normal distribution with correlation coefficient

ρx = (R12 − R13 R23) / √{(1 − R²13)(1 − R²23)}.

As a consequence, the conditional copula of (Y1i, Y2i) given Xi = x is simply the so-called Gaussian copula with parameter ρx. A convenient way to quantify the dependence in a bivariate random vector is via its corresponding value of Kendall's tau, which writes in terms of the underlying copula C as

τ(C) = 4 ∫₀¹ ∫₀¹ C(u1, u2) dC(u1, u2) − 1.   (11)

For a given conditional copula Cx, the conditional Kendall's tau is simply τx = τ(Cx). For the conditional Gaussian copula, τx = (2/π) sin⁻¹(ρx). Four variants of the model have been considered; they are described in Table 1.

Table 1: The four models considered in the simulation study

Model   R12   R13   R23    ρx    τx
M1       .9    .8    .8   .72   .52
M2      -.9    .8   -.8  -.72  -.52
M3       .8    .1    .1   .80   .59
M4       .1    .1    .1   .09   .06
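The ρx and τx columns follow directly from the two displayed formulas; a quick sanity check in code:

```python
import math

def rho_x(r12, r13, r23):
    """Conditional correlation of (Y1, Y2) given X = x under joint normality."""
    return (r12 - r13 * r23) / math.sqrt((1 - r13**2) * (1 - r23**2))

def tau_x(rho):
    """Kendall's tau of the Gaussian copula: (2/pi) * arcsin(rho)."""
    return 2 / math.pi * math.asin(rho)

models = {
    "M1": (0.9, 0.8, 0.8),
    "M2": (-0.9, 0.8, -0.8),
    "M3": (0.8, 0.1, 0.1),
    "M4": (0.1, 0.1, 0.1),
}
values = {m: (rho_x(*r), tau_x(rho_x(*r))) for m, r in models.items()}
```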

The relative performance of Cxh and C̃xh for the estimation of Cx has been evaluated in the light of the average integrated squared bias (AISB) and the average integrated variance (AIV). To be specific, if Ĉ is some estimator of Cx, then

AISB(Ĉ) = ∫₀¹ ∫₀¹ {E(Ĉ(u1, u2)) − Cx(u1, u2)}² du1 du2


and

AIV(Ĉ) = ∫₀¹ ∫₀¹ {E(Ĉ²(u1, u2)) − (E(Ĉ(u1, u2)))²} du1 du2.

The latter have been estimated from 1 000 replicates under each of the scenarios considered, with n = 250. The estimated values of AISB and AIV for the two conditional copula estimators are reported in Figures 1–4, respectively for models M1–M4.

First observe that, interestingly, the integrated variance is very similar for all values of θ. This is in accordance with the theoretical results of Section 3, which state that the estimators behave, asymptotically, as in the i.i.d. case. Considering model M1 first, one notes that C̃xh outperforms Cxh in terms of AISB. This difference might be explained by the fact that E(Cxh) depends in general on F1x and F2x, and is therefore affected in some way by the dependence of the conditional marginals on the covariate. Also observe that the integrated bias of C̃xh stabilizes as the bandwidth parameter h takes large values; this is not the case for Cxh. Finally, note that Cxh does slightly better than C̃xh in terms of AIV. Under model M2, one has R13 = −R23, which entails that E{Cx(u, u)} = E{C̃x(u, u)}. As a consequence, the terms involving the marginal distributions cancel. For M3 and M4, the two estimators perform quite similarly in terms of bias. This may be explained by the fact that, here, the covariate has a weak influence on the marginal distributions. However, there is a slight advantage for Cxh in terms of variance for small values of the bandwidth h.

    5. Empirical application: measuring causality in financial data

The concept of causality was introduced by Wiener (1956) and Granger (1969) in order to study the dynamic relationship between time series. Whenever the hypothesis of non-causality is rejected by a test, it is important to appropriately quantify the strength of this causality. In this regard, measures based on linear models were proposed by Geweke (1982) and Geweke (1984), while measures using the Kullback–Leibler information criterion were investigated by Gouriéroux et al. (1987) under a normality assumption on the data. A nonparametric copula-based measure was recently proposed by Taamouti et al. (2014). However, the above-mentioned measures of causality are global, in the sense that they do not allow one to quantify the local strength of causality.

A way to measure local causality in time series is by means of the conditional Kendall's measure of association, namely τx = τ(Cx), where τ(C) is


defined in (11). To be specific, we consider (Zi)i∈Z, where Zi = (Z1i, Z2i). Then, define the process (Wi)i∈Z, where Wi = (Z1i, Z2,i−1, Z1,i−1). The conditional Kendall's tau of (Z1i, Z2,i−1) given Z1,i−1 is a nonparametric measure of the local Granger causality from Z2 to Z1. In other words, based on W1, . . . , Wn, one considers the estimator τ̂x obtained by replacing Cx in τ(Cx) with one of the two conditional copula estimators described in this work.
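The rearrangement of a bivariate series into the triplets Wi = (Z1i, Z2,i−1, Z1,i−1) is a simple alignment; the sketch below uses made-up arrays to show it, after which any conditional copula estimator can be applied to the three columns.

```python
import numpy as np

def causality_triplets(z1, z2):
    """Build W_i = (Z1_i, Z2_{i-1}, Z1_{i-1}) for i = 2, ..., n.
    Column 0 plays the role of Y1, column 1 of Y2, column 2 of the covariate X."""
    z1, z2 = np.asarray(z1), np.asarray(z2)
    return np.column_stack([z1[1:], z2[:-1], z1[:-1]])

z1 = np.arange(10.0)        # stand-in for, e.g., returns
z2 = np.arange(10.0) + 100  # stand-in for, e.g., volume growth
w = causality_triplets(z1, z2)
```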

An illustration of this new local measure of causality will be given on the time series of daily returns and volume of the Standard & Poor's 500 (S&P 500) index. The relationship between these two series has been the subject of extensive theoretical and empirical research; see Bouezmarni et al. (2012) for more details. The data set that will be analyzed comes from Yahoo Finance and consists of n = 3 032 daily observations taken between January 1997 and January 2009. Specifically, one considers the continuously compounded changes in prices (returns) and in trading volume (volume growth rate). Also, according to the tests of stationarity reported in Bouezmarni et al. (2012), the first differences of the logarithmic price and volume, rather than their levels, are considered. Consequently, the upcoming causality relations have to be interpreted in terms of growth rates. First note that the null hypothesis of Granger non-causality from returns to volume was clearly rejected by the tests proposed by Bouezmarni et al. (2012) and Su & White (2008). Moreover, a non-linear feedback effect from volume to returns was detected. It is therefore of interest to study these relationships in the light of the proposed local causality measure.

First, we consider the causality measure from volume to return. In this case, the return (resp. volume) plays the role of Z1 (resp. Z2) in the process (Wi)i∈Z given above. Figure A.5 reports the resulting value of τ̂x as a function of x. Note that the confidence bands are based on a multiplier bootstrap method under the independence assumption; see Camirand (2013) for more details. In this figure, one can see that the conditional Kendall's tau that measures the causality from volume to return clearly exceeds the confidence bands in the central region of the return when τ̂x is based on C̃xh; this is less obvious when τ̂x is based on Cxh.

Second, we consider the causality measure from return to volume. In this case, the volume (resp. return) plays the role of Z1 (resp. Z2). Figure A.6 illustrates the resulting value of τ̂x as a function of x. In this figure, the conditional Kendall's tau that measures the causality from return to volume clearly exceeds the confidence bands in the right region of the volume when τ̂x is based on Cxh. Also, one can see that it clearly exceeds the confidence


bands in the central region of the volume when τ̂x is based on C̃xh.

    6. Proofs of the main results

    6.1. Proof of Theorem 3.1

The proof is divided into two steps, namely

(i) it is first shown that the finite-dimensional distributions of the process Zxn = Hxh − E(Hxh) are asymptotically normal;

    (ii) the tightness of the sequence Zxn is established.

Finite-dimensional distributions. It will be shown that the random variable Zxn(y1, y2) is asymptotically normally distributed for fixed and arbitrary y1, y2 ∈ R. The arguments presented next for that purpose can easily be adapted to show the joint weak convergence of any finite-dimensional components by means of the Cramér–Wold device.

First define σ²z(y1, y2) = ΓHz(y1, y2, y1, y2), where ΓH is given in (9). Then, for each i ∈ {1, . . . , n}, let ϑi(y1, y2) = I(Y1i ≤ y1, Y2i ≤ y2) − HXi(y1, y2) and observe that

Var{Zxn(y1, y2)} = nh ∑_{i=1}^n {whi(x)}² σ²Xi(y1, y2) + 2nh ∑_{i=1}^n ∑_{ℓ=1}^{n−i} whi(x) wh,i+ℓ(x) Cov{ϑi(y1, y2), ϑi+ℓ(y1, y2)} = Λn1(y1, y2) + Λn2(y1, y2).

Since H ∈ Fx, one has σ²Xi(y1, y2) = σ²x(y1, y2) + o(1) for any i ∈ I(n)x, and then, in view of Condition W4, Λn1(y1, y2) → K5 σ²x(y1, y2) as n → ∞. For the second summand, one follows a similar idea as in Li & Racine (2007). To this end, note that the α-mixing coefficients are such that

|Cov{ϑi(y1, y2), ϑi+ℓ(y1, y2)}| ≤ αW(ℓ) ≤ 1.


Then, for πn = ⌊h^{−1/2}⌋, one can write

Λn2(y1, y2) = nh ∑_{i=1}^n ∑_{ℓ=1}^{πn} whi(x) wh,i+ℓ(x) Cov{ϑi(y1, y2), ϑi+ℓ(y1, y2)}
  + nh ∑_{i=1}^n ∑_{ℓ=πn+1}^{n−i} whi(x) wh,i+ℓ(x) Cov{ϑi(y1, y2), ϑi+ℓ(y1, y2)}
  ≤ nh (πn max_{1≤ℓ≤πn} ∑_{i=1}^{n−πn} whi(x) wh,i+ℓ(x) + max_{1≤i≤n} whi(x) ∑_{ℓ=πn+1}^n αW(ℓ))
  = O(h^{1/2} + ∑_{ℓ=πn+1}^n αW(ℓ)),

where the last equality follows from assumptions W*1 and W11. From the assumptions on h and on the α-mixing coefficients, one concludes that this last expression is indeed o(1). Thus, as n → ∞, Var{Zxn(y1, y2)} → K5 σ²x(y1, y2).

Now in order to show the asymptotic normality, a blocking technique described for instance in Billingsley (1968) will be used. To this end, write $n = r_n(b_n + \ell_n)$ and assume without loss of generality that $r_n$, $b_n$ and $\ell_n$ are integers such that $b_n \sim n^{1-\varepsilon+\delta}$ and $\ell_n \sim n^{1-\varepsilon}$ for some $\delta, \varepsilon > 0$. Then, introduce for each $i \in \{1,\ldots,r_n\}$ the sets $S_{n1}(i) = \{j \in \mathbb{N} : (i-1)(b_n+\ell_n)+1 \le j \le (i-1)(b_n+\ell_n)+b_n\}$ and $S_{n2}(i) = \{j \in \mathbb{N} : i(b_n+\ell_n)+1-\ell_n \le j \le i(b_n+\ell_n)\}$. With this notation, one can write $Z_{xn} = U_n + W_n$, where $U_n = \sqrt{nh}\sum_{i=1}^{r_n} U_{ni}$ and $W_n = \sqrt{nh}\sum_{i=1}^{r_n} W_{ni}$, with

$$
U_{ni} = \sum_{j \in S_{n1}(i)} w_{hj}(x)\,\vartheta_j(y_1,y_2) \quad\text{and}\quad W_{ni} = \sum_{j \in S_{n2}(i)} w_{hj}(x)\,\vartheta_j(y_1,y_2).
$$
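The alternating big-block/small-block index sets $S_{n1}(i)$ and $S_{n2}(i)$ can be sketched as follows (a minimal illustration; the block lengths here are toy values, not the $n^{1-\varepsilon+\delta}$ and $n^{1-\varepsilon}$ rates of the proof):

```python
def blocking(n, b, l):
    # Partition {1, ..., n} with n = r(b + l) into r 'big' blocks S1 of length b
    # and r 'small' separating blocks S2 of length l, as in Billingsley's scheme.
    r = n // (b + l)
    S1 = [list(range((i - 1) * (b + l) + 1, (i - 1) * (b + l) + b + 1))
          for i in range(1, r + 1)]
    S2 = [list(range(i * (b + l) + 1 - l, i * (b + l) + 1))
          for i in range(1, r + 1)]
    return S1, S2

S1, S2 = blocking(12, 3, 1)
# S1 = [[1, 2, 3], [5, 6, 7], [9, 10, 11]], S2 = [[4], [8], [12]]
```

The small blocks separate the big ones by $\ell_n$ indices, so under $\alpha$-mixing the $U_{ni}$ are nearly independent while the contribution of the $W_{ni}$ is negligible.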

For any $\kappa > 0$, one has

$$
P(|W_n| > \kappa) \le \sum_{i=1}^{r_n} P\left( |W_{ni}| > \frac{\kappa}{\sqrt{nh}\, r_n} \right) \le \sum_{i=1}^{r_n} \frac{r_n^4 (nh)^2}{\kappa^4}\, E\left(|W_{ni}|^4\right).
$$

    Now consider the following instrumental Lemma.

Lemma 6.1. Assume W$_1^\star$, W11 and W12 are satisfied and suppose $\sum_{r\ge 0}(r+1)^2 \alpha_W(r) < \infty$. Let $d \in \{\ell_n, b_n\}$, let $S_n(k)$ be the corresponding set of indices (either $S_{n1}(k)$ or $S_{n2}(k)$) of length $d$, and suppose $dh \to \infty$. For any $y_1, y_2 \in \mathbb{R}$, write $S_d(k) = \sum_{i \in S_n(k)} \vartheta_i(y_1,y_2)\, w_{hi}(x)$. Then there exists a constant $C_\alpha$ such that

$$
\sum_{k=1}^{r_n} E\left\{S_d(k)^4\right\} \le C_\alpha\, n^{-3} h^{-2} d.
$$

Because $\alpha_W(r) = O(r^{-a})$ for some $a > 6$, one can apply Lemma 6.1 with $d = \ell_n$ whenever $\ell_n h \to \infty$ in order to bound $\sum_{i=1}^{r_n} E(|W_{ni}|^4)$. Hence, one concludes that there exists a constant $C_\alpha$ such that

$$
P(|W_n| > \kappa) \le \frac{C_\alpha\, r_n^4\, \ell_n}{\kappa^4\, n} \to 0 \quad\text{whenever } 3\varepsilon < 4\delta. \tag{12}
$$

In order to deal with $U_n$, let $U'_{n1}, \ldots, U'_{nr_n}$ be independent random variables having the same distribution as $U_{n1}, \ldots, U_{nr_n}$. Based on Billingsley (1968), p. 376, one can show that the respective characteristic functions of $U_n$ and $U'_n = \sqrt{nh}\sum_{i=1}^{r_n} U'_{ni}$ differ by at most $16\, r_n\, \alpha_W(\ell_n) \to 0$ whenever $(a+1)\varepsilon < a + \delta$. As a consequence, $U_n$ and $U'_n$ are asymptotically equivalent. By straightforward computations,

$$
\mathrm{Var}\left( \sqrt{nh}\sum_{i=1}^{r_n} U'_{ni} \right) = K_5\, \sigma_x^2(y_1,y_2) + o(1).
$$

Next, using Lemma 6.1 again with $d = b_n$ such that $b_n h \to \infty$,

$$
\sum_{i=1}^{r_n} E\left\{ nh\,(U'_{ni})^2\, \mathbb{I}\left( nh\,(U'_{ni})^2 > \kappa^2 \right) \right\} \le \sum_{i=1}^{r_n} \left( \frac{nh}{\kappa} \right)^2 E\left\{ (U'_{ni})^4 \right\} = O\left( \frac{b_n}{n} \right).
$$

This last expression is $o(1)$ as long as $\varepsilon > \delta$. Thus if $\tau$ is such that $h \sim n^{-\tau}$, the choice of $\varepsilon = (4/5)\min\{a/(a+1),\, 1-\tau\}$ and $\delta = (4/5)\varepsilon$ leads the Lyapunov ratio to converge to $0$.
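As a sanity check on this bookkeeping, the choice $\varepsilon = (4/5)\min\{a/(a+1), 1-\tau\}$ and $\delta = (4/5)\varepsilon$ can be verified numerically to satisfy all four constraints used above; a minimal sketch:

```python
def constraints_hold(a, tau):
    # eps, delta as chosen at the end of the finite-dimensional step
    eps = 0.8 * min(a / (a + 1), 1.0 - tau)
    delta = 0.8 * eps
    return (3 * eps < 4 * delta            # negligibility of W_n, Eq. (12)
            and eps > delta                # Lindeberg term O(b_n / n) -> 0
            and (a + 1) * eps < a + delta  # characteristic-function coupling
            and eps < 1.0 - tau)           # block lengths satisfy l_n h -> infinity

checks = [constraints_hold(a, tau) for a in (6.1, 8.0, 20.0)
          for tau in (0.21, 0.30, 0.45)]
```

The grid of values for $a > 6$ and $\tau$ is illustrative; the constraints hold identically in these ranges.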

Asymptotic tightness. For $\mathbf{y} = (y_1,y_2)$ and $\mathbf{y}' = (y'_1,y'_2)$ in $\mathbb{R}^2$ and a fixed $x \in \mathbb{R}$, consider the semi-metric

$$
\rho(\mathbf{y},\mathbf{y}') = |F_{1x}(y_1) - F_{1x}(y'_1)| + |F_{2x}(y_2) - F_{2x}(y'_2)|.
$$

Now for a bounded function $f : \mathbb{R}^2 \to \mathbb{R}$ and a subset $T$ of $\mathbb{R}^2$, let

$$
W_\delta(f, T) = \sup_{\mathbf{y},\mathbf{y}' \in T;\; \rho(\mathbf{y},\mathbf{y}') \le \delta} |f(\mathbf{y}) - f(\mathbf{y}')|, \quad \delta > 0.
$$

Hence, $W_\delta(Z_{xn},\mathbb{R}^2)$ corresponds to the modulus of $\rho$-continuity of $Z_{xn}$. Now for $\gamma \in (0,1/2)$ and $\kappa_{n,\gamma} = \lfloor (nh)^{1/2+\gamma} \rfloor$, define for each $j \in \{1,2\}$ the grid

$$
I^{(j)}_{n,\gamma} = \left\{ F_{jx}^{-1}\!\left(\frac{0}{\kappa_{n,\gamma}}\right), F_{jx}^{-1}\!\left(\frac{1}{\kappa_{n,\gamma}}\right), \ldots, F_{jx}^{-1}\!\left(\frac{\kappa_{n,\gamma}}{\kappa_{n,\gamma}}\right) \right\},
$$

and let $T_{n,\gamma} = I^{(1)}_{n,\gamma} \times I^{(2)}_{n,\gamma}$. For $\mathbf{y} = (y_1,y_2) \in \mathbb{R}^2$, define $\underline{y}_{j,\gamma} = \max\{z \in I^{(j)}_{n,\gamma} : z \le y_j\}$ and $\overline{y}_{j,\gamma} = \min\{z \in I^{(j)}_{n,\gamma} : z \ge y_j\}$, so that $F_{jx}(\overline{y}_{j,\gamma}) - F_{jx}(\underline{y}_{j,\gamma}) \le 1/\kappa_{n,\gamma}$. Now observe that for any $\mathbf{y} \in \mathbb{R}^2$,

$$
Z_{xn}(\mathbf{y}) - Z_{xn}(\underline{\mathbf{y}}_\gamma) \le Z_{xn}(\overline{\mathbf{y}}_\gamma) - Z_{xn}(\underline{\mathbf{y}}_\gamma) + 2\sqrt{nh}\sum_{i=1}^n w_{hi}(x)\left\{ H_{x_i}(\overline{\mathbf{y}}_\gamma) - H_{x_i}(\underline{\mathbf{y}}_\gamma) \right\} = Z_{xn}(\overline{\mathbf{y}}_\gamma) - Z_{xn}(\underline{\mathbf{y}}_\gamma) + 2\sqrt{nh}\,\rho(\underline{\mathbf{y}}_\gamma, \overline{\mathbf{y}}_\gamma) + o(1).
$$

Indeed, the assumption $H \in \mathcal{F}_x$ allows the Taylor expansion

$$
\sqrt{nh}\sum_{i=1}^n w_{hi}(x)\left\{ H_{x_i}(\overline{\mathbf{y}}_\gamma) - H_{x_i}(\underline{\mathbf{y}}_\gamma) \right\} = \sqrt{nh}\left[ H_x(\overline{\mathbf{y}}_\gamma) - H_x(\underline{\mathbf{y}}_\gamma) \right] + \left[ \dot{H}_x(\overline{\mathbf{y}}_\gamma) - \dot{H}_x(\underline{\mathbf{y}}_\gamma) \right] \sqrt{nh}\sum_{i=1}^n w_{hi}(x)(x_i - x) + \sqrt{nh}\sum_{i=1}^n w_{hi}(x)(x_i - x)^2\left[ \ddot{H}_{z_i}(\overline{\mathbf{y}}_\gamma) - \ddot{H}_{z_i}(\underline{\mathbf{y}}_\gamma) \right],
$$

where $z_i$ lies between $x_i$ and $x$. From Conditions W2–W3 and the fact that $H \in \mathcal{F}_x$, the latter is bounded by

$$
\sqrt{nh}\,\rho(\underline{\mathbf{y}}_\gamma, \overline{\mathbf{y}}_\gamma) + \sqrt{nh^5}\,(K_3 + K_4) \times o(1),
$$

where one uses the general inequality on distribution functions

$$
|H_x(\mathbf{y}) - H_x(\mathbf{y}')| \le |F_{1x}(y_1) - F_{1x}(y'_1)| + |F_{2x}(y_2) - F_{2x}(y'_2)|.
$$

As a consequence, uniformly in $\mathbf{y} \in \mathbb{R}^2$,

$$
Z_{xn}(\mathbf{y}) - Z_{xn}(\underline{\mathbf{y}}_\gamma) \le Z_{xn}(\overline{\mathbf{y}}_\gamma) - Z_{xn}(\underline{\mathbf{y}}_\gamma) + o(1).
$$

From similar arguments, one deduces

$$
Z_{xn}(\underline{\mathbf{y}}_\gamma) - Z_{xn}(\mathbf{y}) \le Z_{xn}(\overline{\mathbf{y}}_\gamma) - Z_{xn}(\underline{\mathbf{y}}_\gamma) + o(1).
$$

Thus for any $\mathbf{y}, \mathbf{z} \in \mathbb{R}^2$,

$$
|Z_{xn}(\mathbf{y}) - Z_{xn}(\mathbf{z})| \le \left| Z_{xn}(\overline{\mathbf{y}}_\gamma) - Z_{xn}(\underline{\mathbf{y}}_\gamma) \right| + \left| Z_{xn}(\overline{\mathbf{z}}_\gamma) - Z_{xn}(\underline{\mathbf{z}}_\gamma) \right| + \left| Z_{xn}(\underline{\mathbf{y}}_\gamma) - Z_{xn}(\underline{\mathbf{z}}_\gamma) \right|.
$$

Since for $n$ sufficiently large, $\rho(\mathbf{y},\mathbf{z}) < \delta$ implies $\rho(\underline{\mathbf{y}}_\gamma, \underline{\mathbf{z}}_\gamma) < 2\delta$, it follows that $W_\delta(Z_{xn},\mathbb{R}^2) \le 3\, W_{2\delta}(Z_{xn}, T_{n,\gamma})$. It remains to show that for any positive sequence $\delta_n \downarrow 0$ and any $\varepsilon > 0$,

$$
\lim_{n\to\infty} P\left( W_{\delta_n}(Z_{xn}, T_{n,\gamma}) > \varepsilon \right) = 0.
$$

Observe that $W_{\delta_n}(Z_{xn}, T_{n,\gamma}) = 0$ whenever $\delta_n < 2\kappa_{n,\gamma}^{-1}$, while $W_{\delta_n}(Z_{xn}, T_{n,\gamma}) \le W_{2\kappa_{n,\gamma}^{-1}}(Z_{xn}, T_{n,\gamma})$ otherwise. Then, for any $1 \le i,j \le \kappa_{n,\gamma}$, define the rectangles

$$
A_{n,\gamma}(i,j) = \left[ F_{1x}^{-1}\!\left( \frac{i-1}{\kappa_{n,\gamma}} \right), F_{1x}^{-1}\!\left( \frac{i}{\kappa_{n,\gamma}} \right) \right] \times \left[ F_{2x}^{-1}\!\left( \frac{j-1}{\kappa_{n,\gamma}} \right), F_{2x}^{-1}\!\left( \frac{j}{\kappa_{n,\gamma}} \right) \right].
$$

One can then conclude that

$$
P\left( W_{\delta_n}(Z_{xn}, T_{n,\gamma}) \ge \varepsilon \right) \le P\left( \max_{1\le i,j\le \kappa_{n,\gamma}} |\mathbb{H}_{xh}(A_{n,\gamma}(i,j))| \ge \varepsilon \right),
$$

where for an arbitrary nonempty rectangle $A \subseteq \mathbb{R}^2$,

$$
\mathbb{H}_{xh}(A) = \sqrt{nh}\sum_{i=1}^n w_{hi}(x)\left\{ \mathbb{I}\left( (Y_{1i},Y_{2i}) \in A \right) - \nu_{x_i}(A) \right\},
$$

with $\nu_x(A) = P(\mathbf{Y}_x \in A)$, $\mathbf{Y}_x \sim H_x$. Define $\mu_x = \nu_x \otimes \lambda$, where $\lambda$ denotes the $\rho$-measure of $A$. In order to derive an extension of Theorem 3 of Bickel & Wichura (1971), one needs to find $\beta > 1$ and $C \in \mathbb{R}$ (that may depend on $\varepsilon$ and $\beta$) such that

$$
P\left( |\mathbb{H}_{xh}(A_{n,\gamma}(i,j))| \ge \varepsilon \right) \le C\, \mu_x(A_{n,\gamma}(i,j))^\beta.
$$

The following lemma provides a moment inequality for $\mathbb{H}_{xh}(A)$.


Lemma 6.2. Let $(\mathbf{W}_i)_{i\in\mathbb{Z}}$, where $\mathbf{W}_i = (Y_{1i}, Y_{2i}, X_i)$, be a stationary sequence with $\alpha$-mixing coefficients satisfying $\alpha_W(r) = O(r^{-a})$ for some $a > 6$ and distribution function $H \in \mathcal{F}_x$. Suppose the bandwidth parameter $h$ is such that $nh^{1+r_a} < \infty$, where $r_a$ denotes the largest even integer such that $r_a < a$. Moreover, assume Conditions W$_1^\star$, W2–W5 and W11–W13 are satisfied. Then for any rectangle $A \subseteq \mathbb{R}^2$ such that $\mu_x(A) \ge (nh)^{-1-\delta}$ for some positive $\delta < 1$, there exists $C_\alpha(a,x) < \infty$ such that

$$
E\left\{ \mathbb{H}_{xh}(A)^{r_a} \right\} \le \frac{C_\alpha(r_a, x)}{(nh)^{r_a/2}} \sum_{p=1}^{r_a/2} \left\{ \mu_x(A) + h^2 \right\}^p (nh)^p \le \frac{C_\alpha(a, x)}{(nh)^{r_a/2}} \sum_{p=1}^{r_a/2} n^p h^{3p} \left[ \sum_{k=1}^{p} \binom{p}{k} \mu_x(A)^k\, h^{-2k} + 1 \right].
$$

Starting from the Markov inequality,

$$
P\left( |\mathbb{H}_{xh}(A_{n,\gamma}(i,j))| \ge \varepsilon \right) \le \varepsilon^{-r_a}\, E\left\{ \mathbb{H}_{xh}(A_{n,\gamma}(i,j))^{r_a} \right\}.
$$

Next note that for any $1 \le i,j \le \kappa_{n,\gamma}$, the rectangle $A = A_{n,\gamma}(i,j)$ satisfies

$$
(nh)^{-1-2\gamma} \le \mu_x(A) \le (nh)^{-1/2-\gamma}.
$$

Upon taking $\beta < (r_a+2)/4$ and $\gamma$ such that $n^{2\gamma} h < 1$ in Lemma 6.2, this yields for any $p < \beta$ and $n$ sufficiently large,

$$
\sum_{k=1}^{p} \binom{p}{k} \mu_x(A)^{k-\beta}\, h^{2p-2k}\, (nh)^{-r_a/2+p} < 1.
$$

Moreover, if $\beta(1+2\gamma) < r_a/2$, then the same holds true when $p > \beta$. Thus, if $\beta < \min\{(r_a+2)/4,\; r_a/(2(1+2\gamma))\}$ and for a suitable choice of $\gamma$,

$$
\sum_{p=1}^{r_a/2} \sum_{k=1}^{p} \binom{p}{k} \mu_x(A)^{k-\beta}\, h^{2(p-k)}\, (nh)^{-r_a/2+p} \le \frac{r_a}{2}.
$$

Finally,

$$
\sum_{p=1}^{r_a/2} h^{2p}\, \mu_x(A)^{-\beta}\, (nh)^{-r_a/2+p} \le \max_{p\in\{1,\ldots,r_a/2\}} \left\{ h^{2p}\, (nh)^{(1+2\gamma)\beta}\, (nh)^{-r_a/2+p} \right\} \sim n^{-2p\tau + (1+2\gamma)(1-\tau)\beta - (1-\tau)(r_a/2 - p)} \sim n^{p(1-3\tau) + (1+2\gamma)(1-\tau)\beta - (1-\tau) r_a/2}.
$$


If $\tau < 1/3$, the maximum is attained at $p = r_a/2$, which yields the bound $\beta < r_a/4$ to ensure the negligibility of the previous display, since $\tau > 1/5$. Else, if $\tau > 1/3$, the maximum is at $p = 1$ and imposes $\beta < r_a/2$. Since $a > 6$, $r_a \ge 6$ and all of the above constraints allow a choice of $\beta$ greater than $1$. As a consequence, one can find a constant $C$ that depends on $\varepsilon$, $\beta$, $r_a$, $\gamma$ and $\tau$ such that

$$
P\left( |\mathbb{H}_{xh}(A_{n,\gamma}(i,j))| \ge \varepsilon \right) \le C\, \mu_x(A_{n,\gamma}(i,j))^\beta.
$$

The asymptotic $\rho$-equicontinuity follows from an application of an extension of Theorem 3 in Bickel & Wichura (1971) or Lemma 2 in Balacheff & Dupont (1980). Since for each $\mathbf{y}$ the sequence $Z_{xn}(\mathbf{y})$ is asymptotically tight in $\mathbb{R}$ and because $\mathbb{R}^2$ is totally bounded for $\rho$, the tightness of the sequence $Z_{xn}$ in $\ell^\infty(\mathbb{R}^2)$ can be deduced from Theorem 1.5.7 in van der Vaart & Wellner (1996), which completes the proof.

6.2. Proof of Theorem 3.3

As pointed out by Veraverbeke et al. (2011), the asymptotic negligibility of $\sqrt{nh}\{\Phi(G_{xh}) - \Phi(\tilde{G}_{xh})\}$ is closely related to the asymptotic behavior of the processes of pseudo-observations $\tilde{Z}_{1xn} = Z_{1xn} - E(Z_{1xn})$ and $\tilde{Z}_{2xn} = Z_{2xn} - E(Z_{2xn})$, where for $j = 1, 2$,

$$
Z_{jxn}(t,u) = \sqrt{nh_j}\, F_{jz_t h_j}\left\{ F_{jz_t}^{-1}(u) \right\}, \qquad z_t = x + tCh.
$$

The key result is the following lemma, whose proof is to be found in Appendix B.

Lemma 6.3. Let $(\mathbf{W}_i)_{i\in\mathbb{Z}}$, where $\mathbf{W}_i = (Y_{1i}, Y_{2i}, X_i)$, be a stationary sequence with associated $\alpha$-mixing coefficients such that $\alpha_W(r) = O(r^{-a})$ for some $a > 6$. Suppose that for $j = 1, 2$, the functions $F_{jz}\{F_{jz}^{-1}(u)\}$, $\dot{F}_{jz}\{F_{jz}^{-1}(u)\}$ and $\ddot{F}_{jz}\{F_{jz}^{-1}(u)\}$ are continuous in $(z,u)$ for $z$ in a neighborhood of $x$. Moreover, assume $nh_1^5 < \infty$ and $nh_2^5 < \infty$. Finally, suppose assumptions W$_1^\star$, W6–W13 are satisfied. Then the sequences $\tilde{Z}_{1xn}$ and $\tilde{Z}_{2xn}$ are asymptotically tight in $\ell^\infty([-1,1]\times[0,1])$.

Since the assumptions in Lemma 6.3 are satisfied, one can conclude that $\tilde{Z}_{1xn}$ and $\tilde{Z}_{2xn}$ are asymptotically tight in the space $\ell^\infty([-1,1]\times[0,1])$. The asymptotic negligibility of $\sqrt{nh}\{\Phi(G_{xh}) - \Phi(\tilde{G}_{xh})\}$ will then follow from similar arguments as in Appendix B.2 of Veraverbeke et al. (2011).


References

Balacheff, S. & Dupont, G. (1980). Normalité asymptotique des processus empiriques tronqués et des processus de rang. Lecture Notes in Mathematics, 19–45.

Bickel, P. J. & Wichura, M. J. (1971). Convergence criteria for multiparameter stochastic processes and some applications. The Annals of Mathematical Statistics 42, 1656–1670.

    Billingsley, P. (1968). Convergence of probability measures. New York:John Wiley & Sons Inc.

Bouezmarni, T., Rombouts, J. & Taamouti, A. (2012). A nonparametric copula based test for conditional independence with applications to Granger causality. Journal of Business & Economic Statistics 30, 275–287.

    Bücher, A. & Ruppert, M. (2013). Consistent testing for a constant cop-ula under strong mixing based on the tapered block multiplier technique.J. Multivariate Anal. 116, 208–229.

Bücher, A. & Volgushev, S. (2013). Empirical and sequential empirical copula processes under serial dependence. J. Multivariate Anal.

    Camirand, F. (2013). Test d’indépendance conditionnel basé sur la copuleconditionnelle. Master’s thesis, Université de Sherbrooke.

Carrasco, M. & Chen, X. (2002). Mixing and moment properties of various GARCH and stochastic volatility models. Econometric Theory 18, 17–39.

    Deheuvels, P. (1979). La fonction de dépendance empirique et ses pro-priétés. Un test non paramétrique d’indépendance. Acad. Roy. Belg. Bull.Cl. Sci. (5) 65, 274–292.

Doukhan, P. (1994). Mixing: Properties and Examples. Lecture Notes in Statistics. New York: Springer.

    Fermanian, J.-D., Radulović, D. & Wegkamp, M. H. (2004). Weakconvergence of empirical copula processes. Bernoulli 10, 847–860.

    Gaenßler, P. & Stute, W. (1987). Seminar on empirical processes.Basel: Birkhäuser Verlag.

Geweke, J. (1982). Measurement of linear dependence and feedback between multiple time series. Journal of the American Statistical Association 77, 304–313.

    Geweke, J. (1984). Measures of conditional linear dependence and feedbackbetween time series. Journal of the American Statistical Association 79,907–915.

    Gijbels, I., Veraverbeke, N. & Omelka, M. (2011). Conditional cop-ulas, association measures and their applications. Comput. Statist. DataAnal. 55, 1919–1932.

Gouriéroux, C., Monfort, A. & Renault, É. (1987). Kullback causality measures. Annales d'Économie et de Statistique 6/7, 369–410.

    Granger, C. W. J. (1969). Investigating causal relations by econometricmodels and cross-spectral methods. Econometrica 37, 424–459.

    Li, Q. & Racine, J. S. (2007). Nonparametric econometrics: Theory andpractice. Princeton University Press.

Meitz, M. & Saikkonen, P. (2002). Ergodicity, mixing, and existence of moments of a class of Markov models with applications to GARCH and ACD models. Econometric Theory 24, 1291–1320.

    Nelsen, R. B. (2006). An introduction to copulas. Springer Series in Statis-tics. New York: Springer, 2nd ed.

    Omelka, M., Veraverbeke, N. & Gijbels, I. (2013). Bootstrapping theconditional copula. J. Statist. Plann. Inference 143, 1–23.

    Rio, E. (2000). Théorie asymptotique des processus aléatoires faiblementdépendants. Mathématiques et Applications. Springer.

    Rüschendorf, L. (1976). Asymptotic distributions of multivariate rankorder statistics. Ann. Statist. 4, 912–923.

    Segers, J. (2012). Weak convergence of empirical copula processes undernonrestrictive smoothness assumptions. Bernoulli 18, 764–782.

    Sklar, A. (1959). Fonctions de répartition à n dimensions et leurs marges.Publ. Inst. Statist. Univ. Paris 8, 229–231.

Su, L. & White, H. (2008). A nonparametric Hellinger metric test for conditional independence. Econometric Theory 24, 829–864.

Taamouti, A., Bouezmarni, T. & El Ghouch, A. (2014). Nonparametric estimation and inference for conditional density based Granger causality measures. Journal of Econometrics 180, 251–264.

    van der Vaart, A. W. & Wellner, J. A. (1996). Weak convergenceand empirical processes. Springer Series in Statistics. New York: Springer-Verlag. With applications to statistics.

    Veraverbeke, N., Omelka, M. & Gijbels, I. (2011). Estimation of aconditional copula and association measures. Scand. J. Stat. 38, 766–780.

Wiener, N. (1956). The theory of prediction. In E. F. Beckenbach, ed., Modern Mathematics for Engineers. New York: McGraw-Hill, chapter 8.

    Appendix A. Conditions on the weights

    The following conditions are needed in order to establish Proposition 2.3.

W1. $\max_{1\le i\le n} |w_{hi}(x)| = o_P\{(nh)^{-1/2}\}$;

W2. $\sum_{i=1}^n w_{hi}(x)(X_i - x) = h^2 K_3 + o_P\{(nh)^{-1/2}\}$ for some $K_3 < \infty$;

W3. $\sum_{i=1}^n w_{hi}(x)(X_i - x)^2 = h^2 K_4 + o_P\{(nh)^{-1/2}\}$ for some $K_4 > 0$;

W4. $nh \sum_{i=1}^n \{w_{hi}(x)\}^2 = K_5 + o_P(1)$ for some $K_5 > 0$;

W5. $\max_{i\in I_{nx}} X_i - \min_{i\in I_{nx}} X_i = o_P(1)$, where $I_{nx} = \{j \in \{1,\ldots,n\} : w_{hj}(x) > 0\}$.

Additionally, one has to assume the following conditions in Proposition 2.5.

W6. $\sup_{z\in J_x^{(n)}} \sum_{i=1}^n |w'_{g_j i}(z)| = O_P(g_j^{-1})$, where $J_x^{(n)} = [\min_{i\in I_{nx}} X_i,\; \max_{i\in I_{nx}} X_i]$;

W7. $\sup_{z\in J_x^{(n)}} \sum_{i=1}^n \{w'_{g_j i}(z)\}^2 = O_P(n^{-1} g_j^{-3})$;

W8. For some finite constant $C$,

$$
\sup_{z\in J_x^{(n)}} \max_{1\le i\le n} \left| w_{g_j i}(z)\, \mathbb{I}\left( |x_i - x| > Ch > 0 \right) \right| = o_P(1);
$$


W9. There exists $D_K < \infty$ such that for all $a_n$,

$$
\sup_{z\in J_x^{(n)}} \left| \sum_{i=1}^n w_{a_n i}(z)(x_i - z) - a_n^2 D_K \right| = o_P(a_n^2);
$$

W10. There exists $E_K < \infty$ such that for all $a_n$,

$$
\sup_{z\in J_x^{(n)}} \left| \sum_{i=1}^n w_{a_n i}(z)(x_i - z)^2 - a_n^2 E_K \right| = o_P(a_n^2).
$$

The following version of condition W1 is needed to calculate the variance of $\mathbb{H}_{xh}$:

W$_1^\star$. $\max_{1\le i\le n} |w_{hi}(x)| = O_P\{(nh)^{-1}\}$.

In order to establish moment inequalities of order $r$, one needs that for any integer $1 \le k \le r$, any choice of $L_1, \ldots, L_k \in \mathbb{N}$ such that $L_1 + \cdots + L_k = r$, and for some positive sequence $\upsilon_n$ satisfying $n - \upsilon_n \to \infty$:

W11. $\sup_{z\in J_x} \max_{1\le \ell_2 < \cdots}$

$$
S_d^{(2)}(k) = 4 \sum_{\substack{i_1,i_2\in S_n(k)\\ i_1\ne i_2}} \vartheta_{i_1}^3 \vartheta_{i_2}\, w_{hi_1}(x)^3\, w_{hi_2}(x), \qquad S_d^{(5)}(k) = \sum_{\substack{i_1,\ldots,i_4\in S_n(k)\\ i_1\ne\cdots\ne i_4}} \prod_{j=1}^4 \vartheta_{i_j}\, w_{hi_j}(x),
$$

$$
S_d^{(3)}(k) = 6 \sum_{\substack{i_1,i_2\in S_n(k)\\ i_1\ne i_2}} \vartheta_{i_1}^2 \vartheta_{i_2}^2\, w_{hi_1}(x)^2\, w_{hi_2}(x)^2.
$$

First, in view of condition W12, one can find a constant $K_{12}$ such that $\sum_{i=1}^n w_{hi}(x)^4 \le K_{12}\, n^{-3} h^{-3}$. Since $d > h^{-1}$, one writes

$$
\sum_{k=1}^{r_n} E\left\{ S_d^{(1)}(k) \right\} \le K_{12}\, n^{-3} h^{-3} \le K_{12}\, n^{-3} h^{-2} d.
$$

Next, split $S_d^{(2)}(k)$ into $S_d^{(2,1)}(k)$ and $S_d^{(2,2)}(k)$ according to the cases $i_1 < i_2$ and $i_1 > i_2$. Once again, for any $1 \le \ell \le d - i$, $|E\{\vartheta_i^3 \vartheta_{i+\ell}\}| \le \alpha_W(\ell) \le 1$. One writes, for $\pi_n = \lfloor 1/\sqrt{h} \rfloor$,

$$
\sum_{k=1}^{r_n} E\left\{ S_d^{(2,\cdot)}(k) \right\} \sim o\left( n^{-3} h^{-2} d \right).
$$

Next decompose $S_d^{(3)}(k) = A_d^{(3)}(k) + \{S_d^{(3)}(k) - A_d^{(3)}(k)\}$, where

$$
A_d^{(3)}(k) = 6 \sum_{\substack{i_1,i_2\in S_n(k)\\ i_1\ne i_2}} E\{\vartheta_{i_1}^2\}\, E\{\vartheta_{i_2}^2\}\, w_{hi_1}(x)^2\, w_{hi_2}(x)^2 = 12 \sum_{\substack{i_1,i_2\in S_n(k)\\ i_1 < i_2}} E\{\vartheta_{i_1}^2\}\, E\{\vartheta_{i_2}^2\}\, w_{hi_1}(x)^2\, w_{hi_2}(x)^2.
$$

Taking care of the first term, with assumption W11 in mind, one writes

$$
\sum_{k=1}^{r_n} A_d^{(3)}(k) \le \sum_{i=1}^n \sum_{\ell=1}^{d\wedge(n-i)} E\{\vartheta_i^2\}\, E\{\vartheta_{i+\ell}^2\}\, w_{hi}(x)^2\, w_{h,i+\ell}(x)^2 \le d \times \left\{ \max_{1\le\ell\le d} \sum_{i=1}^{n-\ell} w_{hi}(x)^2\, w_{h,i+\ell}(x)^2 \right\} \sim O\left( d\, n^{-3} h^{-2} \right).
$$

Next, since $|E\{\vartheta_i^2 \vartheta_{i+\ell}^2\} - E\{\vartheta_i^2\}\, E\{\vartheta_{i+\ell}^2\}| \le \alpha_W(\ell)$,

$$
\sum_{k=1}^{r_n} \left[ S_d^{(3)}(k) - A_d^{(3)}(k) \right] \le \pi_n \max_{1\le\ell\le\pi_n} \sum_{i=1}^{n-\ell} w_{hi}(x)^2\, w_{h,i+\ell}(x)^2 + \left\{ \max_{1\le i\le n} w_{hi}(x) \right\}^2 \left\{ \sum_{i=1}^n w_{hi}(x)^2 \right\} \sum_{\ell=\pi_n+1}^{d} \alpha_W(\ell) \sim O(n^{-3} h^{-3/2}) + O(n^{-3} h^{-3})\, o(1) = o(n^{-3} h^{-2} d),
$$

where the last display is obtained from conditions W$_1^\star$, W11–W12, and the (assumed) boundedness of $\sum_{\ell=1}^\infty \alpha_W(\ell)$. Using a similar strategy, with the help of those conditions and the fact that $\sum_{\ell=1}^\infty (\ell+1)\,\alpha_W(\ell) < \infty$, one obtains

$$
\sum_{k=1}^{r_n} E\left\{ S_d^{(4)}(k) \right\} \sim o\left( n^{-3} h^{-2} d \right).
$$

Finally observe that

$$
E\, S_d^{(5)}(k) = 4! \sum_{\substack{i_1,\ldots,i_4\in S_n(k)\\ i_1 < i_2 < i_3 < i_4}} E\{\vartheta_{i_1}\vartheta_{i_2}\vartheta_{i_3}\vartheta_{i_4}\} \prod_{j=1}^4 w_{hi_j}(x).
$$

Since $|E\{\vartheta_i \vartheta_{i+g}\}| \le \alpha_W(g)$, one writes

$$
\sum_{k=1}^{r_n} A_d^{(5,1)}(k) \le \pi_n \left\{ \max_{1\le g_2, g_3 \le g_1 \le \pi_n} \sum_{i=1}^{n-g_1-g_2-g_3} w_{hi}(x) \prod_{j=1}^{3} w_{h,i+g_j}(x) \right\} + \left\{ \max_{1\le i\le n} w_{hi}(x) \right\}^3 \sum_{g_1=\pi_n+1}^{d} (g_1+1)^2\, \alpha_W(g_1) = O(n^{-3} h^{-3/2}) + O(n^{-3} h^{-3})\, o(1) \sim o(n^{-3} h^{-2} d),
$$

where the last equality follows from assumptions W$_1^\star$ and W11 together with the assumed finiteness of $\sum_{g_1=\pi_n+1}^{d} (g_1+1)^2\, \alpha_W(g_1)$. Using the same conditions one deduces

$$
\sum_{k=1}^{r_n} A_d^{(5,3)}(k) \sim o\left( n^{-3} h^{-2} d \right).
$$

Finally, as for $S_d^{(3)}(k)$, decompose $A_d^{(5,2)}(k) = D_d^{(5,2)}(k) + \{A_d^{(5,2)}(k) - D_d^{(5,2)}(k)\}$, where

$$
D_d^{(5,2)}(k) = \sum_{S_n^{(2)}(k)} E\{\vartheta_{i_1}\vartheta_{i_2}\}\, E\{\vartheta_{i_3}\vartheta_{i_4}\} \prod_{j=1}^4 w_{hi_j}(x).
$$

Roughly,

$$
\sum_{k=1}^{r_n} D_d^{(5,2)}(k) \le d \sum_{i=1}^n \sum_{\ell=1}^{d\wedge(n-i)} E\{\vartheta_i \vartheta_{i+\ell}\}\, w_{hi}(x)\, w_{h,i+\ell}(x)\, \max_{1\le i\le n} w_{hi}(x)^2 \sum_{g_3=1}^{d} \alpha_W(g_3) \le \left\{ \max_{1\le\ell\le d} \sum_{i=1}^{n-\ell} w_{hi}(x)\, w_{h,i+\ell}(x) \right\} \times O(d\, n^{-2} h^{-2}) = O(d\, n^{-3} h^{-2}),
$$

using W$_1^\star$, W11 and the assumption on the mixing coefficients $\alpha_W(r)$.

Moreover, for any $i_1 < i_2 < i_3 < i_4$, $|E\{\vartheta_{i_1}\vartheta_{i_2}\vartheta_{i_3}\vartheta_{i_4}\} - E\{\vartheta_{i_1}\vartheta_{i_2}\}\, E\{\vartheta_{i_3}\vartheta_{i_4}\}| \le \alpha_W(g_2)$. A development similar to that for $S_d^{(3)}(k) - A_d^{(3)}(k)$ leads to

$$
\sum_{k=1}^{r_n} \left\{ A_d^{(5,2)}(k) - D_d^{(5,2)}(k) \right\} \sim o\left( n^{-3} h^{-2} d \right).
$$


Wrapping up all the terms of the decomposition of $\sum_{k=1}^{r_n} E\{S_d(k)^4\}$, one concludes that there exists a finite constant $C_\alpha$, depending on the coefficients $\alpha_W(r)$ through the sums $\sum_{\ell=1}^\infty (1+\ell)^j\, \alpha_W(\ell)$ for $j = 1, 2, 3$ and on the choice of weight system through conditions W$_1^\star$, W11 and W12, such that

$$
\sum_{k=1}^{r_n} E\left\{ S_d(k)^4 \right\} \le C_\alpha\, n^{-3} h^{-2} d.
$$

Appendix B.2. Proof of Lemma 6.2

First note that since $H \in \mathcal{F}_x$, whenever $x_i$ falls in a neighborhood of $x$,

$$
\nu_{x_i}(A) = \nu_x(A) + \dot{\nu}_x(A)(x_i - x) + \ddot{\nu}_{z_i}(A)(x_i - x)^2, \tag{B.1}
$$

where $z_i$ is between $x_i$ and $x$. For simplicity, let $\vartheta_i$ stand for $\vartheta_i(A)$ and $\nu_z$ for $\nu_z(A)$. The proof begins with a proposition.

Proposition Appendix B.1. Suppose Conditions W$_1^\star$, W11–W13 are satisfied, and assume the mixing coefficients of the sequence $\mathbf{W}_i = (Y_{1i}, Y_{2i}, X_i)$ satisfy $\alpha_W(r) = O(r^{-a})$ for some $a > 6$. For any integers $L_1, L_2 \ge 1$, write $S_n(L_1,L_2) = \sum_{\ell=1}^{n-1} \sum_{i=1}^{n-\ell} \vartheta_i^{L_1} \vartheta_{i+\ell}^{L_2}\, w_{hi}(x)^{L_1}\, w_{h,i+\ell}(x)^{L_2}$. Then there exists a constant $K_\alpha > 0$ such that

$$
E\{S_n(L_1,L_2)\} \le K_\alpha \begin{cases} (nh)^{-L_1-L_2+1} \left[ \{\nu_x + h^2\} + \pi_n^{-r_a} + nh\{\nu_x + h^2\}^2 \right] & \text{if } \min(L_1,L_2) > 1, \\ (nh)^{-L_1-L_2+1} \left[ \{\nu_x + h^2\} + \pi_n^{-r_a} \right] & \text{else}, \end{cases}
$$

where $\pi_n = \lfloor h^{-1} \rfloor$.

Proof. First decompose $S_n(L_1,L_2) = \{S_n(L_1,L_2) - A_n(L_1,L_2)\} + A_n(L_1,L_2)$, where

$$
A_n(L_1,L_2) = \sum_{\ell=1}^{n-1} \sum_{i=1}^{n-\ell} E(\vartheta_i^{L_1})\, E(\vartheta_{i+\ell}^{L_2})\, w_{hi}(x)^{L_1}\, w_{h,i+\ell}(x)^{L_2}.
$$

Notice that on one side $|E\{\vartheta_i^{L_1} \vartheta_{i+\ell}^{L_2}\} - E(\vartheta_i^{L_1})\, E(\vartheta_{i+\ell}^{L_2})| \le \alpha_W(\ell)$, while on the other side $|E\{\vartheta_i^{L_1} \vartheta_{i+\ell}^{L_2}\} - E(\vartheta_i^{L_1})\, E(\vartheta_{i+\ell}^{L_2})| \le \nu_{x_i}$. Set $\upsilon_n = \lfloor n/2 \rfloor$. For $n$ sufficiently large, $\pi_n < \upsilon_n$ and one writes


$$
E\{S_n(L_1,L_2) - A_n(L_1,L_2)\} = \sum_{\ell=1}^{\pi_n} \sum_{i=1}^{n-\ell} \left[ E\{\vartheta_i^{L_1} \vartheta_{i+\ell}^{L_2}\} - E(\vartheta_i^{L_1})\, E(\vartheta_{i+\ell}^{L_2}) \right] w_{hi}(x)^{L_1}\, w_{h,i+\ell}(x)^{L_2} + \sum_{\ell=\pi_n+1}^{n-1} \sum_{i=1}^{n-\ell} \left[ E\{\vartheta_i^{L_1} \vartheta_{i+\ell}^{L_2}\} - E(\vartheta_i^{L_1})\, E(\vartheta_{i+\ell}^{L_2}) \right] w_{hi}(x)^{L_1}\, w_{h,i+\ell}(x)^{L_2}
$$

$$
\le \pi_n \left\{ \max_{1\le\ell\le\pi_n} \sum_{i=1}^{n-\ell} \nu_{x_i}\, w_{hi}(x)^{L_1}\, w_{h,i+\ell}(x)^{L_2} \right\} + \sum_{\ell=\pi_n+1}^{\upsilon_n} \alpha_W(\ell) \left\{ \max_{\pi_n+1\le\ell\le\upsilon_n} \sum_{i=1}^{n-\ell} w_{hi}(x)^{L_1}\, w_{h,i+\ell}(x)^{L_2} \right\} + \sum_{\ell=\upsilon_n}^{n-1} \alpha_W(\ell)\, (\upsilon_n + 1) \left\{ \max_{1\le i\le n} w_{hi}(x) \right\}^{L_1+L_2}
$$

$$
= O\left( (nh)^{-L_1-L_2-1} \left\{ \pi_n h \{\nu_x + \epsilon_A h^2\} + \pi_n^{-r_a+1} h + \upsilon_n^{-r_a+2} h^{-1} \right\} \right),
$$

where $\epsilon_A = \dot{\nu}_x(A) + \ddot{\nu}_x(A)$, picked from equation (B.1), together with assumptions W11, W13 and W$_1^\star$, since $\sum_{\ell=\gamma_n}^{n} \alpha_W(\ell) \sim O(\gamma_n^{-r_a+1})$ whenever $\gamma_n \to \infty$. From $H \in \mathcal{F}_x$, $\epsilon_A$ remains uniformly bounded and can simply be ignored. From $\pi_n h \to 1$, there exists a constant $C_\alpha$ such that $E\{S_n(L_1,L_2) - A_n(L_1,L_2)\} \le C_\alpha (nh)^{-L_1-L_2-1} (\{\nu_x + h^2\} + \pi_n^{-r_a})$.

Next, notice that $A_n(1, L_2) = A_n(L_1, 1) = 0$ since $E\{\vartheta_i\} = 0$. Otherwise, using W12 and W13 coupled with equation (B.1), one deduces there exists a constant $C'$ such that $A_n(L_1,L_2) \le C' \{\nu_x + h^2\}^2 (nh)^{-L_1-L_2}$. Setting $K_\alpha = \max\{C_\alpha, C'\}$ concludes the proof.

Denote the set of indices

$$
B_p^{(r)} = \left\{ (L_1, \ldots, L_r) \in \{1,\ldots,p\}^r : L_1 + \cdots + L_r = p \right\}.
$$

In the following, using an induction-type argument, one shows that for any $p \le r_a$, for any $r \le p$ and for any $(L_1,\ldots,L_r) \in B_p^{(r)}$, there exists a constant $C(r,p)$ such that

$$
\sum_{i_1 < \cdots < i_r} E\left\{ \prod_{k=1}^{r} \vartheta_{i_k}^{L_k}\, w_{hi_k}(x)^{L_k} \right\} \le C(r,p) \left\{ \sum_{k=1}^{r^\star} \{\mu_x(A) + h^2\}^k (nh)^{-p+k} + \pi_n^{-r_a} (nh)^{-p+1} \right\}. \tag{B.2}
$$

Step 1. The cases $p = 2$, $p = 3$ and $p = 4$.

For $p = 2$ and $r = 1$, the set $B_2^{(1)}$ contains only the index $(2)$. Moreover, since Condition W12 holds, one can find a constant $C(1,2)$ such that

$$
\sum_{i=1}^n E(\vartheta_i^2)\, w_{hi}(x)^2 \le C(1,2) \{\nu_x + h^2\} (nh)^{-1} \le C(1,2) \{\mu_x + h^2\} (nh)^{-1}.
$$

For $r = 2$, notice that the set $B_2^{(2)}$ only contains the pair $(1,1)$. One can then use Proposition Appendix B.1 with $L_1 = L_2 = 1$ to find a constant $C(2,2)$ such that

$$
\sum_{i_1 < i_2} \cdots
$$

where

$$
T_{1n}(g) = \sum_{i_1:i_r}^{\ell_g} E\left\{ \prod_{k=1}^{r} \vartheta_{i_k}^{L_k}\, w_{hi_k}(x)^{L_k} \right\} - T_{2n}(g), \qquad T_{2n}(g) = \sum_{i_1:i_r}^{\ell_g} E\left\{ \prod_{k=1}^{g} \vartheta_{i_k}^{L_k}\, w_{hi_k}(x)^{L_k} \right\} E\left\{ \prod_{k=g+1}^{r} \vartheta_{i_k}^{L_k}\, w_{hi_k}(x)^{L_k} \right\}.
$$

Dealing with $T_{1n}(g)$ first, using W$_1^\star$ and W11 together with equation (B.1), for $\upsilon_n = n - (nh)^{1-\delta}$ (where $\delta$ is defined in the statement of the lemma):

$$
T_{1n}(g) \le \left\{ \pi_n \{\nu_x + h^2\} + \sum_{\ell=\pi_n+1}^{\upsilon_n} (\ell+1)^{r-2}\, \alpha_W(\ell) \right\} \times \max_{1\le\ell_2 \cdots}
$$

since $nh^{1+r_a} < \infty$, which implies $\pi_n^{-r_a} < (nh)^{-1}$. Thus there exists a constant $C(r,p)$ such that

$$
T_{1n}(g) + T_{2n}(g) \le C(r,p) \left\{ \sum_{k=1}^{r^\star} \{\mu_x(A) + h^2\}^k (nh)^{-p+k} + \pi_n^{-r_a} (nh)^{-p+1} \right\}.
$$

Equation (B.2) is therefore satisfied. This completes Step 3. The proof follows from the decomposition

$$
E\left( \mathbb{H}_{xh}(A)^{r_a} \right) = (nh)^{r_a/2} \sum_{r=1}^{r_a} r! \sum_{(L_1,\ldots,L_r)\in B_{r_a}^{(r)}} \;\sum_{i_1 < \cdots < i_r} E\left\{ \prod_{k=1}^{r} \vartheta_{i_k}^{L_k}\, w_{hi_k}(x)^{L_k} \right\}.
$$

As in the proof of Theorem 3.1, it suffices to show that for any positive sequence $\delta_n \downarrow 0$ and any $\varepsilon > 0$, $\lim_{n\to\infty} P( W_{\delta_n}(\tilde{Z}_{jxn}, I) > \varepsilon ) = 0$. For $\kappa_{n,\gamma}^{(j)} = \lfloor (nh_j)^{1/2+\gamma} \rfloor$, define the grids

$$
I_{n,\gamma}^{(1)} = \left\{ 0, \pm\frac{1}{\kappa_{n,\gamma}^{(j)}}, \ldots, \pm\frac{\kappa_{n,\gamma}^{(j)}-1}{\kappa_{n,\gamma}^{(j)}}, \pm 1 \right\}, \qquad I_{n,\gamma}^{(2)} = \left\{ 0, \frac{1}{\kappa_{n,\gamma}^{(j)}}, \ldots, \frac{\kappa_{n,\gamma}^{(j)}-1}{\kappa_{n,\gamma}^{(j)}}, 1 \right\},
$$


where $\gamma \in (0,1/2)$ is a grid parameter to be fixed later, and set $T_n = I_{n,\gamma}^{(1)} \times I_{n,\gamma}^{(2)}$. For any $(t,u) \in [-1,1]\times[0,1]$, define $(\underline{t}_\gamma, \underline{u}_\gamma)$ and $(\overline{t}_\gamma, \overline{u}_\gamma)$ as in Section 6.1. Analogously to that section, observe that

$$
\tilde{Z}_{jxn}(t,u) - \tilde{Z}_{jxn}(\underline{t}_\gamma, \underline{u}_\gamma) = \{\tilde{Z}_{jxn}(t,u) - \tilde{Z}_{jxn}(t, \underline{u}_\gamma)\} + \{\tilde{Z}_{jxn}(t, \underline{u}_\gamma) - \tilde{Z}_{jxn}(\underline{t}_\gamma, \underline{u}_\gamma)\}
$$

$$
\le \{\tilde{Z}_{jxn}(t, \overline{u}_\gamma) - \tilde{Z}_{jxn}(t, \underline{u}_\gamma)\} + \{\tilde{Z}_{jxn}(t, \underline{u}_\gamma) - \tilde{Z}_{jxn}(\underline{t}_\gamma, \underline{u}_\gamma)\} + E\{Z_{jxn}(t, \overline{u}_\gamma) - Z_{jxn}(t, \underline{u}_\gamma)\}. \tag{B.3}
$$

Starting with the last term, using a Taylor expansion of $F_{jx_i}$ around $z_t$:

$$
E\{Z_{jxn}(t, \overline{u}_\gamma) - Z_{jxn}(t, \underline{u}_\gamma)\} = \sqrt{nh_j}\{\overline{u}_\gamma - \underline{u}_\gamma\} + \left\{ \dot{F}_{jz_t}(F_{jz_t}^{-1}(\overline{u}_\gamma)) - \dot{F}_{jz_t}(F_{jz_t}^{-1}(\underline{u}_\gamma)) \right\} \left[ \sqrt{nh_j} \sum_{i=1}^n w_{h_j i}(z_t)(x_i - z_t) \right] + \frac{1}{2}\sqrt{nh_j} \sum_{i=1}^n \left\{ \ddot{F}_{j r_{ti}}(F_{jz_t}^{-1}(\overline{u}_\gamma)) - \ddot{F}_{j r_{ti}}(F_{jz_t}^{-1}(\underline{u}_\gamma)) \right\} w_{h_j i}(z_t)(x_i - z_t)^2,
$$

where $r_{ti}$ lies between $z_t$ and $x_i$. Assumptions W9 and W10, together with the (assumed) uniform continuity of the functions $\dot{F}_{jz}(F_{jz}^{-1})$ and $\ddot{F}_{jz}(F_{jz}^{-1})$, yield

$$
\left\{ \dot{F}_{jz_t}(F_{jz_t}^{-1}(\overline{u}_\gamma)) - \dot{F}_{jz_t}(F_{jz_t}^{-1}(\underline{u}_\gamma)) \right\} \sqrt{nh_j} \sum_{i=1}^n w_{h_j i}(z_t)(x_i - z_t) = o(1)\, O(\sqrt{nh_j}\, h_j^2)
$$

and

$$
\sqrt{nh_j} \sum_{i=1}^n \left\{ \ddot{F}_{j r_{ti}}(F_{jz_t}^{-1}(\overline{u}_\gamma)) - \ddot{F}_{j r_{ti}}(F_{jz_t}^{-1}(\underline{u}_\gamma)) \right\} w_{h_j i}(z_t)(x_i - z_t)^2 = o(1)\, O(\sqrt{nh_j}\, h_j^2).
$$

From the boundedness of $\sqrt{nh_j}\, h_j^2$, one deduces the negligibility of the last two displays. From the grid definition, $\sqrt{nh_j}\{\overline{u}_\gamma - \underline{u}_\gamma\} = O((nh_j)^{-\gamma})$, which leads to the negligibility of $E\{Z_{jxn}(t, \overline{u}_\gamma) - Z_{jxn}(t, \underline{u}_\gamma)\}$.

Next we deal with the term $\tilde{Z}_{jxn}(t, \underline{u}_\gamma) - \tilde{Z}_{jxn}(\underline{t}_\gamma, \underline{u}_\gamma)$ in equation (B.3). Denote $\mathbb{F}_{jzh_j} = \sqrt{nh_j}\{F_{jzh_j} - E F_{jzh_j}\}$ and notice that $\mathbb{F}_{jz_t h_j} \circ F_{jz_t}^{-1} = \tilde{Z}_{jxn}$. One writes

$$
\tilde{Z}_{jxn}(t, \underline{u}_\gamma) - \tilde{Z}_{jxn}(\underline{t}_\gamma, \underline{u}_\gamma) = \left[ \mathbb{F}_{jz_t h_j}\{F_{jz_t}^{-1}(\underline{u}_\gamma)\} - \mathbb{F}_{jz_t h_j}\{F_{jz_{\underline{t}_\gamma}}^{-1}(\underline{u}_\gamma)\} \right] + \left[ \mathbb{F}_{jz_t h_j}\{F_{jz_{\underline{t}_\gamma}}^{-1}(\underline{u}_\gamma)\} - \mathbb{F}_{jz_{\underline{t}_\gamma} h_j}\{F_{jz_{\underline{t}_\gamma}}^{-1}(\underline{u}_\gamma)\} \right].
$$


In view of assumption W6 and the fact that $z_t - z_{\underline{t}_\gamma} = Ch(t - \underline{t}_\gamma)$, for any $y \in \mathbb{R}$:

$$
\sqrt{nh_j}\, |F_{jz_t h_j}(y) - F_{jz_{\underline{t}_\gamma} h_j}(y)| = \sqrt{nh_j} \left| \sum_{i=1}^n \mathbb{I}\{Y_{ji} \le y\} \left[ w_{h_j i}(z_t) - w_{h_j i}(z_{\underline{t}_\gamma}) \right] \right| \le \sqrt{nh_j} \left\{ \sup_{z\in I_x} \sum_{i=1}^n |w'_{h_j i}(z)| \right\} \times h(t - \underline{t}_\gamma) = O\left( (nh_j)^{-\gamma}\, h_j^{-1} h \right).
$$

Since $h/\min(h_1, h_2) < \infty$, the latter is $o(1)$ uniformly in $y$. From similar arguments one deduces that $\sup_{y,t} |\mathbb{F}_{jz_t h_j}(y) - \mathbb{F}_{jz_{\underline{t}_\gamma} h_j}(y)| = o(1)$. It follows that

$$
\tilde{Z}_{jxn}(t, \underline{u}_\gamma) - \tilde{Z}_{jxn}(\underline{t}_\gamma, \underline{u}_\gamma) = \mathbb{F}_{jz_t h_j}\{F_{jz_t}^{-1}(\underline{u}_\gamma)\} - \mathbb{F}_{jz_t h_j}\{F_{jz_{\underline{t}_\gamma}}^{-1}(\underline{u}_\gamma)\} + o(1) = \mathbb{F}_{jz_{\underline{t}_\gamma} h_j}\{F_{jz_t}^{-1}(\underline{u}_\gamma)\} - \mathbb{F}_{jz_{\underline{t}_\gamma} h_j}\{F_{jz_{\underline{t}_\gamma}}^{-1}(\underline{u}_\gamma)\} + o(1).
$$

Using the same strategy with the first term of equation (B.3), one deduces that this term is

$$
\tilde{Z}_{jxn}(t, \overline{u}_\gamma) - \tilde{Z}_{jxn}(t, \underline{u}_\gamma) = \mathbb{F}_{jz_{\underline{t}_\gamma} h_j}\{F_{jz_t}^{-1}(\overline{u}_\gamma)\} - \mathbb{F}_{jz_{\underline{t}_\gamma} h_j}\{F_{jz_{\underline{t}_\gamma}}^{-1}(\underline{u}_\gamma)\} + o(1).
$$

From the continuity of the function $z \mapsto F_{jz}^{-1}$ in a neighborhood of $x$ and arguments similar to those used previously,

$$
|\mathbb{F}_{jz_{\underline{t}_\gamma} h_j}\{F_{jz_t}^{-1}(\underline{u}_\gamma)\} - \mathbb{F}_{jz_{\underline{t}_\gamma} h_j}\{F_{jz_{\underline{t}_\gamma}}^{-1}(\underline{u}_\gamma)\}| \le |\mathbb{F}_{jz_{\underline{t}_\gamma} h_j}\{F_{jz_{\underline{t}_\gamma}}^{-1}(\overline{u}_\gamma)\} - \mathbb{F}_{jz_{\underline{t}_\gamma} h_j}\{F_{jz_{\underline{t}_\gamma}}^{-1}(\underline{u}_\gamma)\}| + |\mathbb{F}_{jz_{\underline{t}_\gamma} h_j}\{F_{jz_{\underline{t}_\gamma}}^{-1}(\underline{u}_\gamma)\} - \mathbb{F}_{jz_{\overline{t}_\gamma} h_j}\{F_{jz_{\overline{t}_\gamma}}^{-1}(\underline{u}_\gamma)\}| + o(1).
$$

Wrapping up the discussion around decomposition (B.3), one concludes that

$$
\sup_{(t,u)\in I} |\tilde{Z}_{jxn}(t,u) - \tilde{Z}_{jxn}(\underline{t}_\gamma, \underline{u}_\gamma)| \le 2 \sup_{t\in I_{n,\gamma}^{(1)},\, u\in[0,1]} |\mathbb{F}_{jz_t h_j}\{F_{jz_t}^{-1}(\overline{u}_\gamma)\} - \mathbb{F}_{jz_t h_j}\{F_{jz_t}^{-1}(\underline{u}_\gamma)\}| + 2 \sup_{t\in[-1,1],\, u\in I_{n,\gamma}^{(2)}} |\mathbb{F}_{jz_{\underline{t}_\gamma} h_j}\{F_{jz_{\underline{t}_\gamma}}^{-1}(u)\} - \mathbb{F}_{jz_{\overline{t}_\gamma} h_j}\{F_{jz_{\overline{t}_\gamma}}^{-1}(u)\}| + o(1).
$$

For $t_k = k/\kappa_{n,\gamma}^{(j)}$, denote $A_{n,\gamma}^{(j)}(i,t) = \left[ F_{jz_t}^{-1}\{(i-1)/\kappa_{n,\gamma}^{(j)}\},\; F_{jz_t}^{-1}\{i/\kappa_{n,\gamma}^{(j)}\} \right]$ and $B_{n,\gamma}^{(j)}(i,k) = \left[ F_{jz_{t_k}}^{-1}\{i/\kappa_{n,\gamma}^{(j)}\},\; F_{jz_{t_{k+1}}}^{-1}\{i/\kappa_{n,\gamma}^{(j)}\} \right]$. Moreover, write

$$
G_{n,\gamma} = \{0, 1, \ldots, \kappa_{n,\gamma}^{(j)}\} \times \{0, \pm 1, \ldots, \pm\kappa_{n,\gamma}^{(j)}\}.
$$


Then from similar arguments as at the end of Section 6.1, for sufficiently large $n$:

$$
W_\delta(\tilde{Z}_{jxn}, I) \le 6 \max_{(i,k)\in G_{n,\gamma}} |\mathbb{F}_{jz_{t_k} h_j}\{A_{n,\gamma}^{(j)}(i,t_k)\}| + 6 \max_{(i,k)\in G_{n,\gamma}} |\mathbb{F}_{jz_{t_k} h_j}\{B_{n,\gamma}^{(j)}(i,k)\}|.
$$

For any interval $A = [a,b] \subset \mathbb{R}$, denote $\nu_{jz}(A) = F_{jz}(b) - F_{jz}(a)$. On one hand, $\nu_{jz_{t_k}}(A_{n,\gamma}^{(j)}(i,t_k)) = (nh_j)^{-1/2-\gamma}$. On the other hand,

$$
\nu_{jz_{t_k}}(B_{n,\gamma}^{(j)}(i,k)) = |u - F_{jz_{t_k}}\{F_{jz_{t_{k+1}}}^{-1}(u)\}| = \left| \dot{F}_{jz_{t_{k+1}}}\{F_{jz_{t_{k+1}}}^{-1}(u)\}(z_{t_k} - z_{t_{k+1}}) + \frac{1}{2}\ddot{F}_{jz_{t^\star}}\{F_{jz_{t_{k+1}}}^{-1}(u)\}(z_{t_k} - z_{t_{k+1}})^2 \right|,
$$

where $t^\star \in [t_k, t_{k+1}]$. Since $z_{t_k} - z_{t_{k+1}} = O\{h\,(nh_j)^{-1/2-\gamma}\}$, the $\nu_{jz_{t_k}}$-measure of the set $B_{n,\gamma}^{(j)}(i,k)$ is smaller than the $\nu_{jz_{t_k}}$-measure of the set $A_{n,\gamma}^{(j)}(i,t_k)$. One then argues that for $n$ sufficiently large, for any $(i,k) \in G_{n,\gamma}$, either $B_{n,\gamma}^{(j)}(i,k) \subset A_{n,\gamma}^{(j)}(i-1, t_k)$ or $B_{n,\gamma}^{(j)}(i,k) \subset A_{n,\gamma}^{(j)}(i, t_k)$. Thus for any $\varepsilon > 0$:

    (j)n,γ(i− 1, tk) or B(j)n,γ(i, k) ⊂ A(j)n,γ(i, tk). Thus for any � > 0:

    P

    (Wδ(Z̃jxn, I) ≥ �

    )≤ P

    (12 max

    (i,k)∈Gn,γ|Fjztkhj{A

    (j)n,γ(i, tk)}| ≥

    12

    )

    ≤∑

    (i,k)∈Gn,γ

    P

    (|Fjztkhj{A

    (j)n,γ(i, tk)}| ≥

    12

    )

    ≤ (nhj)1+2γ max(i,k)∈Gn,γ

    P

    (|Fjztkhj{A

    (j)n,γ(i, tk)}| ≥

    12

    )

Using the Markov inequality coupled with Lemma 6.2 with $\mu_x = \nu_{jz_{t_k}}$ and $r_a = 6$, for some $\beta > 0$:

$$
P\left( |\mathbb{F}_{jz_{t_k} h_j}\{A_{n,\gamma}^{(j)}(i,t_k)\}| \ge \varepsilon \right) \le \varepsilon^{-6}\, E\left( (\mathbb{F}_{jz_{t_k} h_j}\{A_{n,\gamma}^{(j)}(i,t_k)\})^6 \right) \le C_\alpha(r_a, z_t) \sum_{\ell=1}^{3} \left\{ \nu_{jz_{t_k}}(A_{n,\gamma}^{(j)}(i,t_k)) + h_j^2 \right\}^\ell (nh_j)^{\ell-3} = C_\alpha(r_a, z_t)\, (nh_j)^{-1-2\gamma-\beta}\, S_{n,\gamma}(\beta),
$$

where $S_{n,\gamma}(\beta) = \sum_{\ell=1}^{3} \{(nh_j)^{-1/2-\gamma} + h_j^2\}^\ell (nh_j)^{\ell-2+\beta+2\gamma}$. It will now be shown that $S_{n,\gamma}(\beta) < 1$ for a proper choice of $\beta$. Consider first the case where $h_j \sim n^{-\tau}$ with $\tau \le 1/(3+2\gamma)$. This implies $h_j^2 > (nh_j)^{-1/2-\gamma}$. In that case:

$$
S_{n,\gamma}(\beta) \le 8 h_j^6 (nh_j)^{1+2\gamma+\beta} + 4 h_j^4 (nh_j)^{2\gamma+\beta} + 2 h_j^2 (nh_j)^{-1+2\gamma+\beta} \sim n^{-6\tau + (1-\tau)(1+2\gamma+\beta)} + n^{-4\tau + (1-\tau)(2\gamma+\beta)} + n^{-2\tau + (1-\tau)(-1+2\gamma+\beta)}.
$$

Thus, taking $\beta < 1/2 - 2\gamma$ leads to $S_{n,\gamma}(\beta) < 1$ for some small $\gamma < 1/4$, since $\tau \ge 1/5$. Next consider the case where $\tau > 1/(3+2\gamma)$, which implies $h_j^2 < (nh_j)^{-1/2-\gamma}$. In that case:

$$
S_{n,\gamma}(\beta) \le 8 (nh_j)^{-1/2-\gamma+\beta} + 4 (nh_j)^{-1+\beta} + 2 (nh_j)^{-3/2+\gamma+\beta}.
$$

The choice of $\beta < 1/2 - 2\gamma$ still implies $S_{n,\gamma}(\beta) < 1$. The conclusion is that $P( W_\delta(\tilde{Z}_{jxn}, I) \ge \varepsilon ) \le (nh)^{-\beta}$. The lemma is therefore proven.
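The exponent bookkeeping in the first case above can be checked numerically: with $\beta < 1/2 - 2\gamma$ and $1/5 \le \tau \le 1/(3+2\gamma)$, all three polynomial-in-$n$ exponents are negative; a minimal sketch (the particular values of $\gamma$, $\beta$ and the $\tau$ grid are our illustrative choices):

```python
def sn_gamma_exponents(tau, gamma, beta):
    # Exponents of n for the three terms bounding S_{n,gamma}(beta)
    # when h_j ~ n^{-tau} with tau <= 1/(3 + 2*gamma).
    return (-6 * tau + (1 - tau) * (1 + 2 * gamma + beta),
            -4 * tau + (1 - tau) * (2 * gamma + beta),
            -2 * tau + (1 - tau) * (-1 + 2 * gamma + beta))

gamma, beta = 0.05, 0.3          # beta < 1/2 - 2*gamma
taus = (0.2, 0.25, 0.3, 1 / (3 + 2 * gamma))
exps = [e for tau in taus for e in sn_gamma_exponents(tau, gamma, beta)]
```

All twelve exponents are strictly negative, so each term of the bound on $S_{n,\gamma}(\beta)$ vanishes as $n \to \infty$.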


Figure 1: Average integrated squared bias (left) and average integrated variance (right) of $C_{xh}$ (solid line) and $\tilde{C}_{xh}$ (dashed line) as a function of $h$ under model M1 when $\theta = 0$ (upper panels), $\theta = .25$ (middle panels) and $\theta = .5$ (bottom panels).

Figure 2: Average integrated squared bias (left) and average integrated variance (right) of $C_{xh}$ (solid line) and $\tilde{C}_{xh}$ (dashed line) as a function of $h$ under model M2 when $\theta = 0$ (upper panels), $\theta = .25$ (middle panels) and $\theta = .5$ (bottom panels).

Figure 3: Average integrated squared bias (left) and average integrated variance (right) of $C_{xh}$ (solid line) and $\tilde{C}_{xh}$ (dashed line) as a function of $h$ under model M3 when $\theta = 0$ (upper panels), $\theta = .25$ (middle panels) and $\theta = .5$ (bottom panels).

Figure 4: Average integrated squared bias (left) and average integrated variance (right) of $C_{xh}$ (solid line) and $\tilde{C}_{xh}$ (dashed line) as a function of $h$ under model M4 when $\theta = 0$ (upper panels), $\theta = .25$ (middle panels) and $\theta = .5$ (bottom panels).

Figure 5: Estimated values of Kendall's tau of $(Z_{1i}, Z_{2,i-1})$ given $Z_{1,i-1} = x$ as a function of $x$, where $Z_{1i}$ and $Z_{2i}$ are respectively the return and the volume at time $i$. Upper panel: estimation based on $C_{xh}$; bottom panel: estimation based on $\tilde{C}_{xh}$.

Figure 6: Estimated values of Kendall's tau of $(Z_{1i}, Z_{2,i-1})$ given $Z_{1,i-1} = x$ as a function of $x$, where $Z_{1i}$ and $Z_{2i}$ are respectively the volume and the return at time $i$. Upper panel: estimation based on $C_{xh}$; bottom panel: estimation based on $\tilde{C}_{xh}$.