Functional Data Analysis: Introduction III Functional Data ... · Functional data indicate that the...

Functional Data Analysis: Introduction I

Definition (functional data)

Functional data are collected (observed) data that are observations of functions varying over a continuum.

Functional Data Analysis, J. O. Ramsay and B. W. Silverman, Springer, 2006.


Functional Data Analysis: Introduction II

Functional Data Analysis (FDA) aims at:

representing the data in ways that help further analysis,

displaying the data so as to highlight various characteristics,

studying important sources of pattern and variation among the data.


Functional Data Analysis: Introduction III

Example: Height of Girls


Functional Data Analysis: Introduction IV

10 subjects took part in the study.

Each record has 31 discrete height values that are not equally spaced in time.

There is uncertainty, or noise, in the collected height values (about 3 mm).

The underlying process can be assumed to be a smooth function $\mu_j(t)$ for each subject $j$ taking part in the study, and the observations have been collected from the stochastic process

$$s_j(t) = \mu_j(t) + \epsilon_j(t)$$


Functional Data Analysis: Introduction V

For each record $j$, it is of interest to compute an estimate of the function $\mu_j(t)$ (linear smoothing methods).

Given a family of functions $\{\mu_j(t)\}_{j=1,\dots,J}$, it is of interest to investigate the variations within this family (functional Principal Component Analysis).

For simplicity, we consider temporal stochastic processes here. All the methods can easily be extended to spatio-temporal stochastic processes.


Linear Smoothing: Introduction I

We assume that we have observations $\{(t^{(i)}, s^{(i)})\}_{i=1,\dots,n}$.

Definition (Linear smoothing)

A linear smoother estimates the function value $s(t)$ by a linear combination of the discrete observations:

$$\hat{s}(t) = \sum_{i=1}^{n} \lambda_i(t)\, s^{(i)} = \langle \vec{\lambda}(t) \,|\, \vec{s} \rangle$$

The behaviour of the smoother at $t$ is controlled by the weights $\lambda_i(t)$.
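
As a minimal sketch of this definition, the snippet below evaluates a linear smoother as the inner product of a weight vector with the observations. The running-mean weights (`local_mean_weights`, with window half-width `w`) and the toy data are illustrative assumptions, not one of the methods discussed in these slides.

```python
import numpy as np

def local_mean_weights(t, t_obs, w):
    """Hypothetical weights: equal weight on every observation within w of t."""
    inside = np.abs(t_obs - t) <= w
    if not inside.any():
        raise ValueError("no observation within the window")
    return inside / inside.sum()

def linear_smoother(t, t_obs, s_obs, weight_fn, **kwargs):
    """Estimate s(t) as the inner product <lambda(t) | s>."""
    lam = weight_fn(t, t_obs, **kwargs)
    return float(lam @ s_obs)

# Toy observations (made up for illustration).
t_obs = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
s_obs = np.array([1.0, 3.0, 2.0, 5.0, 4.0])

# Averages the observations at t = 1, 2, 3.
estimate = linear_smoother(2.0, t_obs, s_obs, local_mean_weights, w=1.0)
```

Any other choice of `weight_fn` that returns one weight per observation gives another linear smoother; only the weights change.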


Linear Smoothing: Introduction II

Examples of linear smoothing methods:

1 Kriging

2 Regression on a basis of functions

3 Nadaraya-Watson estimator

Note that these methods make different hypotheses about the nature of the noise $\epsilon$.


Linear Smoothing: Regression on basis of functions I

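A model can be proposed for the deterministic part of the process as follows:

$$\mu(t) = \sum_{k=0}^{K} \theta_k\, \phi_k(t) = \langle \vec{\phi}(t) \,|\, \vec{\theta} \rangle$$

The standard model assumes that the error $\epsilon$ is distributed as $\mathcal{N}(0, \sigma^2)$ and independent of time (or that $E[s(t)] = \mu(t)$ and $\mathrm{Var}[s(t)] = \sigma^2$). The observations collected are $\{(t^{(i)}, s^{(i)})\}_{i=1,\dots,n}$.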


Linear Smoothing: Regression on basis of functions II

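The coefficients $\{\theta_k\}_{k=0,\dots,K}$ are estimated using least squares:

$$\hat{\vec{\theta}} = (\Phi^T \Phi)^{-1} \Phi^T \vec{s}$$

where $\vec{s} = [s^{(1)}, s^{(2)}, \dots, s^{(n)}]^T$ and $\Phi$ is the $n \times (K+1)$ matrix collecting the values $\{\phi_k(t^{(i)})\}$. So the estimate for $E(s(t))$ is:

$$\hat{s}(t) = \langle \vec{\phi}(t) \,|\, \hat{\vec{\theta}} \rangle = \underbrace{\vec{\phi}(t)^T (\Phi^T \Phi)^{-1} \Phi^T}_{\vec{S}(t)^T}\, \vec{s}$$

Exercise: What are the basis functions $\{\phi_k\}_{k=0,\dots,K}$ that can be used?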
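
The least-squares estimate above can be sketched in a few lines of NumPy. The monomial basis, the synthetic data and the helper names (`monomial_design`, `fit_basis_regression`, `predict`) are assumptions made for illustration; `np.linalg.lstsq` is used in place of the explicit inverse, which gives the same solution when $\Phi$ has full column rank.

```python
import numpy as np

def monomial_design(t, K):
    """n x (K+1) matrix Phi with entries phi_k(t_i) = t_i**k (assumed basis)."""
    return np.vander(np.asarray(t, dtype=float), K + 1, increasing=True)

def fit_basis_regression(t_obs, s_obs, K):
    """Least-squares estimate of theta, i.e. the solution of the normal equations."""
    Phi = monomial_design(t_obs, K)
    theta, *_ = np.linalg.lstsq(Phi, s_obs, rcond=None)
    return theta

def predict(t, theta):
    """Evaluate <phi(t) | theta> at new times t."""
    return monomial_design(t, len(theta) - 1) @ theta

# Synthetic data: a quadratic trend plus Gaussian noise (illustrative only).
rng = np.random.default_rng(0)
t_obs = np.linspace(0.0, 1.0, 50)
true_theta = np.array([1.0, -2.0, 3.0])
s_obs = monomial_design(t_obs, 2) @ true_theta + rng.normal(0.0, 0.05, t_obs.size)

theta_hat = fit_basis_regression(t_obs, s_obs, K=2)
s_hat = predict(t_obs, theta_hat)
```

Swapping `monomial_design` for another design-matrix function changes the basis without touching the fitting step.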


Linear Smoothing: Regression on basis of functions III

Example: Vancouver precipitation data with 13 Fourier Bases.


Linear Smoothing: Regression on basis of functions IV

Choice of Basis Expansions.

When performing least-squares fitting of a basis expansion, $\phi_0(t), \phi_1(t), \dots$ is a basis system for $s$. The choice of this basis system is important, in particular if you want to explore the time derivatives of $s(t)$.


Linear Smoothing: Regression on basis of functions V

Monomial Basis: $\phi_0(t) = 1,\ \phi_1(t) = t,\ \phi_2(t) = t^2,\ \dots,\ \phi_K(t) = t^K$

Properties

◮ difficult to use for $K > 6$

◮ Derivatives of $\phi_k(t)$ get simpler with each differentiation, but this is often not a desirable property when fitting real-world data.
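
One plausible reading of the difficulty for larger $K$ (an assumption, since the slide does not elaborate) is numerical: the matrix $\Phi^T \Phi$ of the monomial basis becomes badly conditioned as $K$ grows. The sketch below prints its condition number for a few values of $K$ on 31 equally spaced times in $[0, 1]$, a grid chosen only for illustration.

```python
import numpy as np

t = np.linspace(0.0, 1.0, 31)   # illustrative grid of 31 sample times
conds = []
for K in (2, 4, 6, 8):
    Phi = np.vander(t, K + 1, increasing=True)   # columns 1, t, ..., t**K
    conds.append(np.linalg.cond(Phi.T @ Phi))
    print(f"K = {K}: cond(Phi^T Phi) = {conds[-1]:.3e}")
```

A large condition number means that small perturbations of the observations can produce large changes in the estimated coefficients.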


Linear Smoothing: Regression on basis of functions VI

[Figure: monomial basis, four panels: K = 0, K = 1, K = 2, K = 3]

Linear Smoothing: Regression on basis of functions VII

Fourier Basis

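$\{1, \sin(\omega t), \cos(\omega t), \sin(2\omega t), \cos(2\omega t), \sin(3\omega t), \cos(3\omega t), \dots, \sin(K\omega t), \cos(K\omega t)\}$

Properties

◮ natural basis for periodic data

◮ Derivatives retain the same complexity (they are again sines and cosines).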
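
A possible way to build the Fourier design matrix is sketched below. It assumes $\omega = 2\pi / T$ for a known period $T$ (the slide does not define $\omega$), and the monthly signal is synthetic, not the Vancouver precipitation data. With $K$ harmonics the basis has $2K + 1$ functions, so $K = 6$ gives 13.

```python
import numpy as np

def fourier_design(t, K, period):
    """Columns: 1, sin(w t), cos(w t), ..., sin(K w t), cos(K w t), w = 2*pi/period."""
    t = np.asarray(t, dtype=float)
    omega = 2.0 * np.pi / period          # assumed definition of omega
    cols = [np.ones_like(t)]
    for k in range(1, K + 1):
        cols.append(np.sin(k * omega * t))
        cols.append(np.cos(k * omega * t))
    return np.column_stack(cols)

# Synthetic periodic signal sampled at 12 mid-month times over one period.
t_obs = np.arange(12) + 0.5
s_obs = 3.0 + 2.0 * np.cos(2.0 * np.pi * t_obs / 12.0)

Phi = fourier_design(t_obs, K=2, period=12.0)
theta, *_ = np.linalg.lstsq(Phi, s_obs, rcond=None)
# theta is ordered as [constant, sin(wt), cos(wt), sin(2wt), cos(2wt)]
```

Because the signal lies exactly in the span of the basis, the fit recovers the constant and the first cosine coefficient and leaves the others at zero up to rounding.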


Linear Smoothing: Regression on basis of functions VIII

[Figure: Fourier basis, four panels: K = 3, K = 4, K = 5, K = 6]

Linear Smoothing: Regression on basis of functions IX

Splines. Splines are polynomial segments joined end-to-end. They are

constrained to be smooth at the joints (called knots). The order (order =

degree+1) of the polynomial segments and the location of the knots

define the system.
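
To make the role of the order and the knots concrete, here is a sketch of a spline design matrix using the truncated power basis. This basis is an assumption chosen for brevity (B-splines are the more usual choice in practice); the knot locations and the test signal are also illustrative.

```python
import numpy as np

def truncated_power_design(t, knots, degree=3):
    """Columns: 1, t, ..., t**degree, then (t - knot)_+**degree for each interior knot."""
    t = np.asarray(t, dtype=float)
    knots = np.asarray(knots, dtype=float)
    poly = np.vander(t, degree + 1, increasing=True)
    # (t - knot)_+ is zero to the left of the knot, so each added column
    # changes the polynomial piece only to the right of its knot.
    hinge = np.clip(t[:, None] - knots[None, :], 0.0, None) ** degree
    return np.hstack([poly, hinge])

# Cubic splines (order 4) with three interior knots on [0, 1].
t_obs = np.linspace(0.0, 1.0, 40)
s_obs = np.sin(2.0 * np.pi * t_obs)
Phi = truncated_power_design(t_obs, knots=[0.25, 0.5, 0.75], degree=3)

theta, *_ = np.linalg.lstsq(Phi, s_obs, rcond=None)
max_abs_error = np.max(np.abs(Phi @ theta - s_obs))
```

The number of columns equals the order plus the number of interior knots (here 4 + 3 = 7), and each truncated power term is continuous up to its second derivative at its knot, which is what keeps the joined segments smooth.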


Linear Smoothing: Regression on basis of functions X

[Figure: splines, four panels: K = 1, K = 2, K = 3, K = 4]

Linear Smoothing: Regression on basis of functions XI

Other bases:

wavelets

...

How to choose the number K of functions?

Information Criterion: AIC, BIC.

Cross-Validation
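
As one hedged illustration of the cross-validation option, the sketch below scores each candidate $K$ by leave-one-out cross-validation. It assumes a monomial basis and synthetic data, and it uses the standard shortcut for least-squares fits in which the leave-one-out residual equals the ordinary residual divided by $1 - H_{ii}$, where $H = \Phi (\Phi^T \Phi)^{-1} \Phi^T$. AIC and BIC are not shown.

```python
import numpy as np

def loo_cv_score(Phi, s):
    """Leave-one-out mean squared error of a least-squares fit, via the hat matrix."""
    H = Phi @ np.linalg.pinv(Phi)            # H = Phi (Phi^T Phi)^{-1} Phi^T
    residuals = s - H @ s
    return float(np.mean((residuals / (1.0 - np.diag(H))) ** 2))

# Synthetic data: quadratic trend plus noise (illustrative only).
rng = np.random.default_rng(1)
t_obs = np.linspace(-1.0, 1.0, 60)
s_obs = 1.0 - 2.0 * t_obs + 3.0 * t_obs**2 + rng.normal(0.0, 0.1, t_obs.size)

# Score the monomial basis with K = 0, ..., 6 and keep the smallest score.
scores = {K: loo_cv_score(np.vander(t_obs, K + 1, increasing=True), s_obs)
          for K in range(0, 7)}
best_K = min(scores, key=scores.get)
```

Values of $K$ that are too small leave structure in the residuals and score badly; values that are too large start fitting the noise, which the held-out points penalise.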


Linear Smoothing: Nadaraya-Watson estimator I

The model is $s(t) = \mu(t) + \epsilon(t)$ with $E(\epsilon(t)) = 0$.

The estimate $\hat{s}(t)$ is computed as the expectation of $s$ at time $t$:

$$E[s|t] = E[s(t)] = E[\mu(t) + \epsilon(t)] = \mu(t)$$

By definition of the expectation:

$$E[s|t] = \int s\, p_{s|t}(s|t)\, ds = \frac{\int s\, p_{st}(s,t)\, ds}{p_t(t)}$$

Note how Bayes' formula is used.


Linear Smoothing: Nadaraya-Watson estimator II

Since we do not know the true densities $p_{st}(s,t)$ and $p_t(t)$, the idea of the Nadaraya-Watson estimator is to replace them by density estimates chosen such that the integral is easy to compute and the solution provides a smooth estimate of $s(t)$.

Consequently the Nadaraya-Watson estimator can be written:

$$E[s(t)] \simeq \frac{\int s\, \hat{p}_{st}(s,t)\, ds}{\hat{p}_t(t)} = \frac{\sum_{i=1}^{n} \frac{1}{h}\, k\!\left(\frac{t - t^{(i)}}{h}\right) s^{(i)}}{\sum_{i=1}^{n} \frac{1}{h}\, k\!\left(\frac{t - t^{(i)}}{h}\right)} = \sum_{i=1}^{n} S_i(t)\, s^{(i)} = \hat{s}(t)$$
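
The estimator is a kernel-weighted average of the observations, which the following sketch implements directly. The Gaussian kernel, the bandwidth value and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard Gaussian density, one possible choice for the kernel k."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def nadaraya_watson(t, t_obs, s_obs, h, kernel=gaussian_kernel):
    """Kernel-weighted average of s_obs, evaluated at each time in t."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    t_obs = np.asarray(t_obs, dtype=float)
    s_obs = np.asarray(s_obs, dtype=float)
    # weights[j, i] = k((t_j - t_i) / h); the 1/h factors cancel in the ratio.
    weights = kernel((t[:, None] - t_obs[None, :]) / h)
    return (weights @ s_obs) / weights.sum(axis=1)

# Synthetic noisy observations of a smooth function (illustrative only).
rng = np.random.default_rng(2)
t_obs = np.sort(rng.uniform(0.0, 1.0, 200))
s_obs = np.sin(2.0 * np.pi * t_obs) + rng.normal(0.0, 0.1, t_obs.size)

t_grid = np.linspace(0.1, 0.9, 9)
s_hat = nadaraya_watson(t_grid, t_obs, s_obs, h=0.05)
```

Each row of `weights`, once divided by its sum, holds the $S_i(t)$ of the formula: the weights are non-negative and sum to one, so every estimate lies between the smallest and largest observed value.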


Linear Smoothing: Nadaraya-Watson estimator III

Choosing the kernel $k$ as the Dirac function is equivalent to using the empirical density estimates for $p_{st}(s,t)$ and $p_t(t)$.

The resulting estimate for $s(t)$ would not be smooth with the Dirac kernel, so other kernels are used instead (e.g. Gaussian, uniform, quadratic, triangle, Epanechnikov, etc.).


Linear Smoothing: Nadaraya-Watson estimator IV

Definition (Kernel Density Estimate)

Consider a random variable $x$ for which observations $\{x^{(i)}\}_{i=1,\dots,n}$ have been collected. Choosing an even function $k$ such that

$$\int k(x)\, dx = 1 \quad \text{and} \quad k(x) \geq 0 \ \ \forall x,$$

a Kernel Density Estimate (KDE) for the density function $p_x$ is:

$$\hat{p}_x(x) = \frac{1}{nh} \sum_{i=1}^{n} k\!\left(\frac{x - x^{(i)}}{h}\right)$$

$h$ is called the bandwidth and is a parameter controlling the level of smoothing.
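
A direct transcription of the KDE formula is sketched below. The Epanechnikov kernel, the bandwidth and the simulated sample are assumptions chosen for illustration; any even, non-negative kernel that integrates to one could be passed instead.

```python
import numpy as np

def epanechnikov(u):
    """Even, non-negative kernel supported on [-1, 1] that integrates to 1."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def kde(x, x_obs, h, kernel=epanechnikov):
    """p_hat(x) = 1/(n h) * sum_i k((x - x_i) / h), evaluated at each x."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    x_obs = np.asarray(x_obs, dtype=float)
    u = (x[:, None] - x_obs[None, :]) / h
    return kernel(u).sum(axis=1) / (x_obs.size * h)

# Simulated sample from a standard normal distribution (illustrative only).
rng = np.random.default_rng(3)
x_obs = rng.normal(0.0, 1.0, 500)

grid = np.linspace(-6.0, 6.0, 2001)
density = kde(grid, x_obs, h=0.4)
area = np.sum(density) * (grid[1] - grid[0])   # Riemann sum of the estimate
```

A smaller `h` gives a spikier estimate that follows individual observations, while a larger `h` gives a smoother but flatter one.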


Linear Smoothing: Conclusion and Extensions I

Conclusion:

Nadaraya-Watson estimator: the bandwidth $h$ needs to be chosen/estimated.

Regression on a basis of functions: the coefficients $\{\theta_k\}$ need to be estimated, the basis $\{\phi_k(t)\}$ needs to be chosen, and the number $K$ of functions needs to be selected.
