IntroductionTesting Equality of Mean Functions
Testing Equality of Covariance OperatorsBibliography
Two Sample Problem for Functional Data
Radek Hendrych
Stochastic Modelling in Economics and Finance 1
November 25, 2013
Radek Hendrych Two Sample Problem for Functional Data
Outline
1 Introduction
2 Testing Equality of Mean Functions
3 Testing Equality of Covariance Operators
4 Bibliography
Two Sample Problem for Functional Data
testing the equality of the means in two independent samples
testing the equality of the covariance operators in two independent samples
Asymptotic procedures will be introduced.
Testing Equality of Mean Functions
Model
Consider two samples $X_1, \dots, X_N$ and $X^*_1, \dots, X^*_M$ satisfying the model
\[ X_i(t) = \mu(t) + \varepsilon_i(t), \quad i = 1, \dots, N, \tag{1} \]
\[ X^*_j(t) = \mu^*(t) + \varepsilon^*_j(t), \quad j = 1, \dots, M. \tag{2} \]
Assumptions
(A1) the two samples are independent
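(A2) $\varepsilon_1, \dots, \varepsilon_N$ are i.i.d. with $E\varepsilon_1(t) = 0$ and $E\|\varepsilon_1\|^4 < \infty$
(A3) $\varepsilon^*_1, \dots, \varepsilon^*_M$ are i.i.d. with $E\varepsilon^*_1(t) = 0$ and $E\|\varepsilon^*_1\|^4 < \infty$
The errors $\varepsilon_1$ and $\varepsilon^*_1$ do not have to follow the same distribution.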
Main Goal
Testing Hypothesis
$H_0 : \mu = \mu^*$ in $L^2$ against $H_1 : \neg H_0$
Method I
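The sample means
\[ \bar{X}_N(t) = \frac{1}{N} \sum_{i=1}^{N} X_i(t) \quad \text{and} \quad \bar{X}^*_M(t) = \frac{1}{M} \sum_{j=1}^{M} X^*_j(t) \]
are unbiased estimators for $\mu(t)$ and $\mu^*(t)$, respectively.
It is natural to reject the null hypothesis if
\[ U_{N,M} = \frac{NM}{N+M} \int_0^1 \left( \bar{X}_N(t) - \bar{X}^*_M(t) \right)^2 dt \tag{3} \]
is large.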
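As a rough illustration of (3), the statistic can be evaluated numerically when every curve is observed on a common grid. The sketch below assumes curves stored as rows of NumPy arrays and approximates the integral with the trapezoidal rule; the function name `u_statistic`, the grid `t` and the simulated data are hypothetical and not part of the original slides.

```python
import numpy as np

def u_statistic(X, X_star, t):
    """Approximate U_{N,M} from (3) for curves sampled on a common grid t.

    X has shape (N, len(t)), X_star has shape (M, len(t)).
    """
    N, M = X.shape[0], X_star.shape[0]
    # Pointwise difference of the two sample mean functions.
    diff = X.mean(axis=0) - X_star.mean(axis=0)
    # Trapezoidal rule for the integral of diff(t)^2 over the grid.
    sq = diff ** 2
    integral = np.sum((sq[1:] + sq[:-1]) * np.diff(t)) / 2.0
    return N * M / (N + M) * integral

# Simulated data (assumption): noisy sine curves on [0, 1].
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 101)
X = np.sin(2 * np.pi * t) + rng.normal(size=(40, t.size))
X_null = np.sin(2 * np.pi * t) + rng.normal(size=(50, t.size))         # same mean
X_shift = np.sin(2 * np.pi * t) + 1.0 + rng.normal(size=(50, t.size))  # shifted mean

u_null = u_statistic(X, X_null, t)
u_shift = u_statistic(X, X_shift, t)
print(u_null, u_shift)
```

With a shifted mean the statistic should come out much larger than under equal means, which is the behaviour the rejection rule relies on.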
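Convergence of $U_{N,M}$ under $H_0$
Theorem
If $H_0$ and the assumptions (A1), (A2) and (A3) hold, and
\[ \frac{N}{N+M} \to \theta, \quad \text{for some } \theta \in [0, 1], \text{ as } N \to \infty, \]
then
\[ U_{N,M} \stackrel{d}{\longrightarrow} \int_0^1 \Gamma^2(t)\,dt, \quad N, M \to \infty, \tag{4} \]
where $\{\Gamma(t), t \in [0, 1]\}$ is a Gaussian process satisfying $E\Gamma(t) = 0$ and
\[ E[\Gamma(t)\Gamma(s)] = (1-\theta)\,c(t,s) + \theta\,c^*(t,s), \]
with $c(t,s) = \mathrm{cov}(X_1(t), X_1(s))$ and $c^*(t,s) = \mathrm{cov}(X^*_1(t), X^*_1(s))$.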
Proof. See [1].
The limit distribution of $U_{N,M}$ in (4) depends on the unknown covariance functions $c$ and $c^*$.
According to the Karhunen-Loève expansion, one can suppose that
\[ \Gamma(t) = \sum_{k=1}^{\infty} \tau_k^{1/2} N_k \varphi_k(t), \]
where $N_k$, $k \in \mathbb{N}$, are independent $N(0,1)$ random variables, and $\tau_1 \ge \tau_2 \ge \dots$ and $\varphi_1, \varphi_2, \dots$ are the eigenvalues and eigenfunctions of the operator determined by $(1-\theta)c + \theta c^*$.
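Since
\[ \int_0^1 \Gamma^2(t)\,dt = \sum_{k=1}^{\infty} \tau_k N_k^2, \]
to provide a reasonable approximation for $\int_0^1 \Gamma^2(t)\,dt$, one only needs to estimate the $\tau_k$.
This can be done using $\hat{\tau}_k$, the eigenvalues of the empirical covariance function
\[ \hat{z}_{N,M}(t,s) = \frac{M}{N+M}\,\frac{1}{N} \sum_{i=1}^{N} \left( X_i(t) - \bar{X}_N(t) \right)\left( X_i(s) - \bar{X}_N(s) \right) + \frac{N}{N+M}\,\frac{1}{M} \sum_{j=1}^{M} \left( X^*_j(t) - \bar{X}^*_M(t) \right)\left( X^*_j(s) - \bar{X}^*_M(s) \right). \]
Thus, the sum $\sum_{k=1}^{d} \hat{\tau}_k N_k^2$ offers an approximation to the limit distribution in (4) if $d$ is large enough.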
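The approximation just described can be sketched numerically. The code below is a minimal illustration, assuming curves observed on an equally spaced grid: it discretises the empirical covariance function, takes the leading $d$ eigenvalues as estimates of $\tau_k$, and simulates the truncated sum by Monte Carlo. The names `pooled_cov` and `limit_draws`, the Riemann-sum discretisation and the simulated data are assumptions of this sketch, not part of the slides.

```python
import numpy as np

def pooled_cov(X, X_star):
    """Empirical covariance function z_{N,M}(t, s) evaluated on the grid."""
    N, M = X.shape[0], X_star.shape[0]
    Xc = X - X.mean(axis=0)
    Xsc = X_star - X_star.mean(axis=0)
    return (M / (N + M)) * (Xc.T @ Xc) / N + (N / (N + M)) * (Xsc.T @ Xsc) / M

def limit_draws(X, X_star, t, d, n_draws=10000, seed=0):
    """Monte Carlo draws of sum_{k<=d} tau_k * N_k^2 with estimated tau_k."""
    h = t[1] - t[0]  # assumes an equally spaced grid
    # Eigenvalues of the integral operator are approximated by those of Z * h.
    tau = np.linalg.eigvalsh(pooled_cov(X, X_star) * h)[::-1][:d]
    tau = np.clip(tau, 0.0, None)  # guard against tiny negative round-off
    rng = np.random.default_rng(seed)
    draws = (rng.standard_normal((n_draws, d)) ** 2) @ tau
    return tau, draws

# Simulated data (assumption): two samples with the same mean function.
rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 101)
X = np.sin(2 * np.pi * t) + rng.normal(size=(40, t.size))
X_star = np.sin(2 * np.pi * t) + rng.normal(size=(50, t.size))

tau, draws = limit_draws(X, X_star, t, d=5)

# Observed statistic (Riemann approximation of (3)) and Monte Carlo p-value.
N, M = X.shape[0], X_star.shape[0]
diff = X.mean(axis=0) - X_star.mean(axis=0)
u_obs = N * M / (N + M) * np.sum(diff ** 2) * (t[1] - t[0])
p_value = np.mean(draws >= u_obs)
print(tau, p_value)
```

The choice of $d$ and of the number of Monte Carlo draws is left to the user here; larger values give a closer approximation at a higher computational cost.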
Asymptotic Consistency of Method I
Theorem
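If the assumptions (A1), (A2) and (A3) hold,
\[ \frac{N}{N+M} \to \theta, \quad \text{for some } \theta \in [0, 1], \text{ as } N \to \infty, \]
and
\[ \int_0^1 \left( \mu(t) - \mu^*(t) \right)^2 dt > 0, \]
then $U_{N,M} \stackrel{P}{\longrightarrow} \infty$, as $N, M \to \infty$.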
Proof. See [1].
Method II
The method is a projection version of the first procedure based on $U_{N,M}$.
It does not require the numerical evaluation of the integral in the definition of $U_{N,M}$ (⇒ an easier implementation).
Consider projections onto the space determined by the leading eigenfunctions of the operator $Z = (1-\theta)C + \theta C^*$.
In particular, assume that the eigenvalues of $Z$ satisfy
\[ \tau_1 > \tau_2 > \dots > \tau_d > \tau_{d+1}. \tag{5} \]
One wants to project the observations onto the space spanned by $\varphi_1, \dots, \varphi_d$, i.e. the corresponding eigenfunctions.
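In fact, the functions $\varphi_1, \dots, \varphi_d$ are unknown.
⇓
The corresponding eigenfunctions of $\hat{Z}_{N,M}$, denoted by $\hat{\varphi}_i$, are used.
This delivers a projection of $\bar{X}_N - \bar{X}^*_M$ into the linear space spanned by $\hat{\varphi}_1, \dots, \hat{\varphi}_d$. Let
\[ \hat{a} = (\hat{a}_1, \dots, \hat{a}_d)^\top, \quad \text{where } \hat{a}_i = \langle \bar{X}_N - \bar{X}^*_M, \hat{\varphi}_i \rangle. \]
Under the conditions of the first theorem, it can be shown that $\sqrt{NM/(N+M)}\,\hat{a}$ has approximately a $d$-variate normal distribution (up to some random signs) with the asymptotic variance $Q = \{Q(i,j)\}_{i,j=1}^{d}$,
\[ Q(i,j) = (1-\theta)\,E\langle X_1 - \mu, \varphi_i \rangle \langle X_1 - \mu, \varphi_j \rangle + \theta\,E\langle X^*_1 - \mu^*, \varphi_i \rangle \langle X^*_1 - \mu^*, \varphi_j \rangle. \]
Thus, $Q(i,i) = \tau_i$ and $Q(i,j) = 0$ for $i \neq j$.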
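The last step can be checked directly. Assuming, as the definition of $Z$ suggests, that $C$ and $C^*$ are the covariance operators of $X_1$ and $X^*_1$, so that $E\langle X_1 - \mu, x \rangle \langle X_1 - \mu, y \rangle = \langle C x, y \rangle$ and similarly for $C^*$, one obtains
\[ Q(i,j) = (1-\theta)\langle C\varphi_i, \varphi_j \rangle + \theta \langle C^*\varphi_i, \varphi_j \rangle = \langle Z\varphi_i, \varphi_j \rangle = \tau_i \langle \varphi_i, \varphi_j \rangle = \tau_i \delta_{ij}, \]
because $\varphi_i$ is an eigenfunction of $Z$ with eigenvalue $\tau_i$ and the eigenfunctions are orthonormal.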
According to the facts above, testing procedures can be based on
\[ T^{(1)}_{N,M} = \frac{NM}{N+M} \sum_{k=1}^{d} \frac{\hat{a}_k^2}{\hat{\tau}_k}, \tag{6} \]
\[ T^{(2)}_{N,M} = \frac{NM}{N+M} \sum_{k=1}^{d} \hat{a}_k^2. \tag{7} \]
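A possible way to compute (6) and (7) from discretised curves is sketched below. It assumes an equally spaced grid, approximates inner products by Riemann sums, and obtains the estimated eigenvalues and eigenfunctions from the discretised pooled covariance; the function name `projection_statistics` and the simulated data are hypothetical.

```python
import numpy as np

def projection_statistics(X, X_star, t, d):
    """Return (T1, T2, tau) for the projection tests (6) and (7)."""
    N, M = X.shape[0], X_star.shape[0]
    h = t[1] - t[0]  # assumes an equally spaced grid
    Xc = X - X.mean(axis=0)
    Xsc = X_star - X_star.mean(axis=0)
    # Discretised pooled empirical covariance function.
    Z = (M / (N + M)) * (Xc.T @ Xc) / N + (N / (N + M)) * (Xsc.T @ Xsc) / M
    evals, evecs = np.linalg.eigh(Z * h)
    order = np.argsort(evals)[::-1][:d]
    tau = evals[order]
    # Rescale eigenvectors so that sum(phi_k^2) * h = 1 (unit L2 norm).
    phi = evecs[:, order] / np.sqrt(h)
    # Scores a_k = <mean difference, phi_k>, approximated by a Riemann sum.
    diff = X.mean(axis=0) - X_star.mean(axis=0)
    a = phi.T @ diff * h
    scale = N * M / (N + M)
    T1 = scale * np.sum(a ** 2 / tau)
    T2 = scale * np.sum(a ** 2)
    return T1, T2, tau

# Simulated data (assumption): the second sample has a shifted mean.
rng = np.random.default_rng(2)
t = np.linspace(0.0, 1.0, 101)
X = np.sin(2 * np.pi * t) + rng.normal(size=(40, t.size))
X_star = np.sin(2 * np.pi * t) + 1.0 + rng.normal(size=(50, t.size))

T1, T2, tau = projection_statistics(X, X_star, t, d=3)
print(T1, T2)
```

The signs of the estimated eigenfunctions are arbitrary, but both statistics depend on the scores only through their squares, so the result does not change with the sign convention of the eigen-solver.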
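Convergence of $T^{(1)}_{N,M}$ and $T^{(2)}_{N,M}$ under $H_0$
Theorem
If $H_0$, the assumptions (A1), (A2), (A3) and (5) hold, and
\[ \frac{N}{N+M} \to \theta, \quad \text{for some } \theta \in [0, 1], \text{ as } N \to \infty, \]
then
\[ T^{(1)}_{N,M} \stackrel{d}{\longrightarrow} \chi^2_d \quad \text{and} \quad T^{(2)}_{N,M} \stackrel{d}{\longrightarrow} \sum_{k=1}^{d} \tau_k N_k^2, \quad N, M \to \infty, \tag{8} \]
where $N_1, \dots, N_d$ are independent standard normal random variables.
Proof. See [1].
$T^{(2)}_{N,M}$ is a projection version of $U_{N,M}$ in which only the first $d$ terms of the $L^2$ expansion of $\bar{X}_N - \bar{X}^*_M$ are used. The statistic $T^{(1)}_{N,M}$ is an asymptotically distribution-free modification of $T^{(2)}_{N,M}$ (and hence of $U_{N,M}$).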
Asymptotic Consistency of Method II
Theorem
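If the assumptions (A1), (A2), (A3) and (5) hold,
\[ \frac{N}{N+M} \to \theta, \quad \text{for some } \theta \in [0, 1], \text{ as } N \to \infty, \]
and $\mu - \mu^*$ is not orthogonal to the linear span of $\varphi_1, \dots, \varphi_d$, then
\[ T^{(1)}_{N,M} \stackrel{P}{\longrightarrow} \infty \quad \text{and} \quad T^{(2)}_{N,M} \stackrel{P}{\longrightarrow} \infty, \quad \text{as } N, M \to \infty. \]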
Proof. See [1].
Empirical Example
Data
The data set consists of egg-laying trajectories of Mediterranean fruit flies (medflies).
Consider 534 egg-laying curves of flies that lived at least 30 days.
Each function is defined over the interval $[0, 30]$.
Its value on day $t \in [0, 30]$ is the number of eggs laid by fly $i$ on that day.
The 534 medflies are classified into:
256 short-lived (died before the end of the 44th day after birth),
278 long-lived (lived longer than 44 days).
Two Samples
$X_i(t)$, $t \in [0, 30]$, $i = 1, \dots, 256$ (for short-lived medflies)
$X^*_j(t)$, $t \in [0, 30]$, $j = 1, \dots, 278$ (for long-lived medflies)
Figure: Randomly selected egg-laying curves for short- and long-lived medflies.
Figure: The estimated mean functions for the medfly data: short-lived by the solid line, long-lived by the dashed line.
d                         1     2     3     4     5     6     7     8     9
$T^{(1)}_{256,278}$     1.0   2.2   3.0   5.7  10.3  15.3   3.2   2.7   5.0
$T^{(2)}_{256,278}$     1.0   1.0   1.0   1.1   1.1   1.0   1.0   1.1   1.1
Table: The p-values (in %) of the tests based on the statistics $T^{(1)}_{256,278}$ and $T^{(2)}_{256,278}$ applied to the medfly data.
The p-values for $T^{(2)}_{256,278}$ are much more stable across the choice of $d$.
The test based on $T^{(1)}_{N,M}$ is easier to apply (due to the standard chi-squared critical values), whereas the test based on $T^{(2)}_{N,M}$ may be more reliable.
Testing Equality of Covariance Operators
Model Assumptions
Consider two samples $X_1, \dots, X_N$ and $X^*_1, \dots, X^*_M$ satisfying:
(A1) the two samples are independent,
(A2) $X_1, \dots, X_N$ are i.i.d. elements of $L^2$ with $EX_1(t) = 0$,
(A3) $X^*_1, \dots, X^*_M$ are i.i.d. elements of $L^2$ with $EX^*_1(t) = 0$.
$X_1$ and $X^*_1$ do not have to follow the same distribution.
Main Goal
Testing Hypothesis
Consider the covariance operators
\[ C(x) = E[\langle X, x \rangle X], \quad C^*(x) = E[\langle X^*, x \rangle X^*], \quad x \in L^2, \]
where $X$ has the same distribution as $X_1$, and $X^*$ has the same distribution as $X^*_1$.
One wants to test
$H_0 : C = C^*$ against $H_1 : \neg H_0$.
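Let $\hat{R}$ be the empirical covariance operator of the pooled data, i.e.
\[ \hat{R}(x) = \frac{1}{N+M} \left\{ \sum_{i=1}^{N} \langle X_i, x \rangle X_i + \sum_{j=1}^{M} \langle X^*_j, x \rangle X^*_j \right\} = \hat{\theta}\,\hat{C}(x) + (1-\hat{\theta})\,\hat{C}^*(x), \]
where $\hat{C}$ and $\hat{C}^*$ are the empirical counterparts of $C$ and $C^*$, $x \in L^2$, and $\hat{\theta} = N/(N+M)$.
The operator $\hat{R}$ has $N + M$ eigenfunctions (denoted by $\hat{\phi}_k$).
Set
\[ \hat{\lambda}_k = \frac{1}{N} \sum_{i=1}^{N} \langle X_i, \hat{\phi}_k \rangle^2 \quad \text{and} \quad \hat{\lambda}^*_k = \frac{1}{M} \sum_{j=1}^{M} \langle X^*_j, \hat{\phi}_k \rangle^2. \]
These are the sample variances of the Fourier coefficients of $X$ and $X^*$ with respect to the orthonormal system $\{\hat{\phi}_k, k = 1, \dots, N+M\}$.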
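Convergence of $T$ under $H_0$ and Gaussianity
The test statistic $T$ is defined by
\[ T = \frac{N+M}{2}\,\hat{\theta}(1-\hat{\theta}) \sum_{i,j=1}^{N+M} \frac{\langle (\hat{C} - \hat{C}^*)\hat{\phi}_i, \hat{\phi}_j \rangle^2}{\left( \hat{\theta}\hat{\lambda}_i + (1-\hat{\theta})\hat{\lambda}^*_i \right)\left( \hat{\theta}\hat{\lambda}_j + (1-\hat{\theta})\hat{\lambda}^*_j \right)}. \]
Theorem
Suppose $X$ and $X^*$ are Gaussian elements of $L^2$ such that $E\|X\|^4 < \infty$ and $E\|X^*\|^4 < \infty$. Suppose also that $\hat{\theta} \to \theta \in (0, 1)$, as $N \to \infty$. If $H_0$ holds, then
\[ T \stackrel{d}{\longrightarrow} \chi^2_{(N+M)(N+M+1)/2}, \quad N, M \to \infty. \tag{9} \]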
Proof. See [1].
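The construction of $T$ can be sketched for discretised, mean-zero curves as follows. This is only an illustration under stated assumptions: the grid is equally spaced, inner products are approximated by Riemann sums, and the double sum is truncated to the first `p` eigenfunctions of the pooled operator, which is a simplification introduced here and not the summation range written on the slide. The function name `covariance_test_statistic` and the simulated data are hypothetical.

```python
import numpy as np

def covariance_test_statistic(X, X_star, t, p):
    """Statistic T for H0: C = C*, truncated to the p leading eigenfunctions."""
    N, M = X.shape[0], X_star.shape[0]
    h = t[1] - t[0]  # assumes an equally spaced grid
    theta = N / (N + M)
    # Empirical covariance kernels; no centering because the mean is assumed zero.
    C = X.T @ X / N
    C_star = X_star.T @ X_star / M
    R = theta * C + (1 - theta) * C_star  # pooled operator kernel
    evals, evecs = np.linalg.eigh(R * h)
    order = np.argsort(evals)[::-1][:p]
    phi = evecs[:, order] / np.sqrt(h)  # unit L2 norm on the grid
    # Sample variances of the scores <X_i, phi_k> and <X*_j, phi_k>.
    lam = np.mean((X @ phi * h) ** 2, axis=0)
    lam_star = np.mean((X_star @ phi * h) ** 2, axis=0)
    # Matrix of <(C - C*) phi_i, phi_j>, via a double Riemann sum.
    D = phi.T @ (C - C_star) @ phi * h ** 2
    w = theta * lam + (1 - theta) * lam_star
    return (N + M) / 2 * theta * (1 - theta) * np.sum(D ** 2 / np.outer(w, w))

# Simulated data (assumption): mean-zero noise with different scales.
rng = np.random.default_rng(3)
t = np.linspace(0.0, 1.0, 51)
X = rng.normal(size=(30, t.size))
X_star = 2.0 * rng.normal(size=(35, t.size))

T_diff = covariance_test_statistic(X, X_star, t, p=3)
print(T_diff)
```

When both samples share the same empirical covariance, the numerator vanishes and the statistic is zero; larger discrepancies between the two covariance structures inflate it.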
Bibliography
[1] Horvath, L., Kokoszka, P.: Inference for Functional Data with Applications. New York: Springer, 2012. Chapter 5.
Thank you for your attention.