Introduction to FDA and linear models
Nathalie Villa-Vialaneix - [email protected] - http://www.nathalievilla.org
Institut de Mathématiques de Toulouse - IUT de Carcassonne, Université de Perpignan, France
Havana, September 15th, 2008
Table of contents
1 Motivations
2 Functional Principal Component Analysis
3 Functional linear regression models
4 References
What is Functional Data Analysis (FDA)?

FDA deals with data that are measurements of continuous phenomena on a discrete sampling grid.

Example 1: Regression case 1. Find the fat content of pieces of meat from their absorbance spectra (100 wavelengths).

Example 2: Regression case 2. Find the disease content in wheat from its absorbance spectra (1,049 wavelengths).

Example 3: Classification case 1. Recognize one of five phonemes from its log-periodograms (256 wavelengths).

Example 4: Classification case 2. Recognize one of two words from its recording in the frequency domain (8,192 time steps).

Example 5: Regression on functional data. [Azaïs et al., 2008] Estimate a typical load curve (electricity consumption) from multivariate economic variables.

Example 6: Curves clustering. [Bensmail et al., 2005] Create a typology of sick cells from their "SELDI mass" spectra.
Specific issues of learning with FDA

- High dimensional data (the number of discretization points is often larger, sometimes much larger, than the number of observations);
- Highly correlated data (because of the underlying functional structure, the values at two sampling points are correlated).

Consequences: direct use of classical statistical methods on the discretization leads to ill-posed problems and provides inaccurate solutions.
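To make the ill-posedness concrete, here is a minimal Python sketch (the smooth synthetic curves and grid sizes are illustrative assumptions) showing that the empirical covariance matrix of finely discretized curves is numerically singular, so classical least squares cannot be inverted directly:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 200                      # 50 curves observed on 200 grid points (p >> n)
t = np.linspace(0, 1, p)

# Smooth random curves: random combinations of a few Fourier modes
modes = np.stack([np.sin((k + 1) * np.pi * t) for k in range(5)])
X = rng.normal(size=(n, 5)) @ modes

S = X.T @ X / n                     # empirical covariance of the discretized curves
print(np.linalg.matrix_rank(S))     # at most n (here 5): far below p
print(np.linalg.cond(S))            # enormous condition number: S is not invertible
```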
Theoretical model

A functional random variable is a random variable $X$ taking its values in a functional space $\mathcal{X}$, where $\mathcal{X}$ can be:

- an (infinite dimensional) Hilbert space with inner product $\langle \cdot, \cdot \rangle_{\mathcal{X}}$; in particular, $L^2$ is often used;
- any (infinite dimensional) Banach space with norm $\|\cdot\|_{\mathcal{X}}$ (less usual); for example, $\mathcal{C}^0$.
Hilbertian context

In the Hilbertian context, we are able to define:

- the expectation of $X$ as (by the Riesz representation theorem) the unique element $\mathbb{E}(X)$ of $\mathcal{X}$ such that
$$\forall\, u \in \mathcal{X}, \quad \langle \mathbb{E}(X), u \rangle_{\mathcal{X}} = \mathbb{E}\left( \langle X, u \rangle_{\mathcal{X}} \right);$$
- for any $u_1, u_2 \in \mathcal{X}$, the tensor product $u_1 \otimes u_2$ as the linear operator
$$u_1 \otimes u_2 : v \in \mathcal{X} \mapsto \langle u_1, v \rangle_{\mathcal{X}}\, u_2 \in \mathcal{X};$$
- as $(X - \mathbb{E}(X)) \otimes (X - \mathbb{E}(X))$ is an element of the Hilbert space $HS(\mathcal{X})$ of Hilbert-Schmidt operators from $\mathcal{X}$ to $\mathcal{X}$¹, the variance of $X$ as the linear operator
$$\Gamma_X = \mathbb{E}\big( (X - \mathbb{E}(X)) \otimes (X - \mathbb{E}(X)) \big) : u \in \mathcal{X} \mapsto \mathbb{E}\left( \langle X - \mathbb{E}(X), u \rangle_{\mathcal{X}}\, (X - \mathbb{E}(X)) \right).$$

¹ This Hilbert space is equipped with the inner product $\langle g_1, g_2 \rangle_{HS(\mathcal{X})} = \sum_i \langle g_1 e_i, g_2 e_i \rangle_{\mathcal{X}}$ for all $g_1, g_2 \in HS(\mathcal{X})$, where $(e_i)_i$ is any orthonormal basis of $\mathcal{X}$.
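These definitions discretize naturally. A minimal sketch (assuming curves in $L^2([0,1])$ observed on a common regular grid, so that $L^2$ inner products are approximated by Riemann sums):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 100, 64
t = np.linspace(0, 1, p)
X = np.cumsum(rng.normal(size=(n, p)), axis=1) / np.sqrt(p)  # rough random curves

w = 1.0 / p                         # quadrature weight of the regular grid
mean_fun = X.mean(axis=0)           # pointwise estimate of E(X)(t)
Xc = X - mean_fun                   # centered curves

gamma = Xc.T @ Xc / n               # empirical covariance kernel gamma(t, t')

def apply_Gamma(u):
    """Apply the (discretized) empirical covariance operator to a function u."""
    return w * gamma @ u            # (Gamma u)(t_j) ~ sum_k gamma(t_j, t_k) u(t_k) w

u = np.sin(2 * np.pi * t)
print(w * (apply_Gamma(u) * u).sum() >= 0)   # <Gamma u, u> >= 0: Gamma is positive
```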
Case $\mathcal{X} = L^2([0,1])$

If $\mathcal{X} = L^2([0,1])$, these expressions simplify to:

- norm: $\|X\|^2 = \int_{[0,1]} (X(t))^2\, dt < +\infty$;
- expectation: for all $t \in [0,1]$,
$$\mathbb{E}(X)(t) = \mathbb{E}(X(t)) = \int X(t)\, d\mathbb{P}_X;$$
- variance: for all $t, t' \in [0,1]$,
$$\Gamma_X \simeq \gamma(t, t') = \mathbb{E}\left( X(t) X(t') \right)$$
(assuming $\mathbb{E}(X) = 0$ for clarity), because:
  1. for all $t \in [0,1]$, we can define $\Gamma_X^t : u \in \mathcal{X} \mapsto (\Gamma_X u)(t) \in \mathbb{R}$;
  2. by Riesz's theorem, there exists $\zeta^t \in \mathcal{X}$ such that $\forall\, u \in \mathcal{X}$, $\Gamma_X^t u = \langle \zeta^t, u \rangle_{\mathcal{X}}$. As
$$(\Gamma_X u)(t) = \mathbb{E}\left( \langle X, u \rangle_{\mathcal{X}}\, X(t) \right) = \mathbb{E}\left( \langle X(t) X, u \rangle_{\mathcal{X}} \right) = \langle \mathbb{E}(X(t) X), u \rangle_{\mathcal{X}},$$
we have that $\zeta^t = \mathbb{E}(X(t) X)$;
  3. we define $\gamma : (t, t') \in [0,1]^2 \mapsto \zeta^t(t') = \mathbb{E}(X(t) X(t'))$. □
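In other words, point 3 identifies the covariance operator with an integral (kernel) operator; writing out $\langle \zeta^t, u \rangle_{L^2}$ makes this explicit:

$$(\Gamma_X u)(t) = \langle \zeta^t, u \rangle_{L^2} = \int_0^1 \gamma(t, t')\, u(t')\, dt' \quad \text{for all } u \in L^2([0,1]).$$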
Properties of $\Gamma_X$

- $\Gamma_X$ is Hilbert-Schmidt: (definition) $\sum_i \|\Gamma_X e_i\|^2 < +\infty$;
- there exists a countable eigensystem $((\lambda_i)_{i \geq 1}, (v_i)_{i \geq 1})$ of $\Gamma_X$: $\Gamma_X v_i = \lambda_i v_i$ (for all $i \geq 1$). This eigensystem is such that:
  - $\lambda_1 \geq \lambda_2 \geq \ldots \geq 0$ and $0$ is the only possible accumulation value of $(\lambda_i)_i$;
  - Karhunen-Loève decomposition of $\Gamma_X$:
$$\Gamma_X = \sum_{i \geq 1} \lambda_i\, v_i \otimes v_i.$$
- $\Gamma_X$ has no inverse in the space of continuous operators from $\mathcal{X}$ to $\mathcal{X}$ if $\mathcal{X}$ has infinite dimension. More precisely, if $\lambda_i > 0$ for all $i \geq 1$,
$$\sum_{i \geq 1} \frac{1}{\lambda_i} = +\infty.$$
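A short numerical illustration of the last point (Brownian-like synthetic curves; sizes are illustrative): the eigenvalues of the empirical covariance decay toward $0$, so the partial sums of $1/\lambda_i$ blow up and inverting $\Gamma_X$ is ill-posed.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 200, 100
t = np.linspace(0, 1, p)

# Brownian-like curves: the eigenvalues of the covariance operator decay to 0
X = np.cumsum(rng.normal(size=(n, p)), axis=1) / np.sqrt(p)
Xc = X - X.mean(axis=0)
lam = np.linalg.eigvalsh(Xc.T @ Xc / n / p)[::-1]   # operator eigenvalues, decreasing

print(lam[:3])                            # a few dominant eigenvalues
print(lam[-3:])                           # tail eigenvalues, close to 0
print((1.0 / lam[lam > 1e-12]).sum())     # the sum of 1/lambda_i is already enormous
```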
Model of the observed data

We focus on:

1. a regression problem: $Y \in \mathbb{R}$ has to be predicted from $X \in \mathcal{X}$,
2. OR a (binary) classification problem: $Y \in \{-1, 1\}$ has to be predicted from $X \in \mathcal{X}$.

Learning set - Version I

$(x_1, y_1), \ldots, (x_n, y_n)$ are i.i.d. realizations of the random pair $(X, Y)$.

Remark: if $\mathbb{E}\left( \|X\|^2_{\mathcal{X}} \right) < +\infty$ (i.e., if $\Gamma_X$ exists), a functional version of the Central Limit Theorem exists.
Model of the uncertainty on $X$

In fact, realizations of $X$ are never observed. Only a (possibly noisy) discretization of them is given.

Learning set - Version II

$(x_1, y_1), \ldots, (x_n, y_n)$ are i.i.d. realizations of the random pair $(X, Y)$ and, for all $i = 1, \ldots, n$, $x_i^{\tau_i} = (x_i(t))_{t \in \tau_i}$ is observed, where $\tau_i$ is a finite set.

Questions:

1. How to obtain $(x_i)_i$ from $(x_i^{\tau_i})_i$?
2. What are the consequences of this uncertainty on the accuracy of the solution of the regression/classification problem? Can we obtain a solution that is as good as the one obtained from the direct observation of $(x_i)_i$?
Noisy data model

Learning set - Version III

$(x_1, y_1), \ldots, (x_n, y_n)$ are i.i.d. realizations of the random pair $(X, Y)$ and, for all $i = 1, \ldots, n$, $x_i^{\tau_i} = (x_i(t) + \varepsilon_{it})_{t \in \tau_i}$ is observed, where $\tau_i$ is a finite set and $\varepsilon_{it}$ is a centered random variable independent of $X$.

Again:

1. How to obtain $(x_i)_i$ from $(x_i^{\tau_i})_i$? (work has been done here: function estimation)
2. What are the consequences of this uncertainty on the accuracy of the solution of the regression/classification problem? Can we obtain a solution that is as good as the one obtained from the direct observation of $(x_i)_i$? (related to "errors-in-variables" problems; almost no work in FDA)

In these presentations, works coming from the three points of view will be presented.
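For question 1, a standard function-estimation answer is to project the noisy discretization onto a small functional basis. A minimal sketch (the Fourier basis, its size, and the noise level are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
tau = np.sort(rng.uniform(0, 1, size=30))                 # finite sampling grid tau_i
x_true = lambda s: np.sin(2 * np.pi * s) + 0.5 * np.cos(6 * np.pi * s)
obs = x_true(tau) + rng.normal(scale=0.2, size=tau.size)  # x_i(t) + eps_it on tau_i

def design(s, K=7):
    """Design matrix of a small Fourier basis evaluated at the points s."""
    cols = [np.ones_like(s)]
    for k in range(1, (K - 1) // 2 + 1):
        cols += [np.cos(2 * np.pi * k * s), np.sin(2 * np.pi * k * s)]
    return np.column_stack(cols)

coef, *_ = np.linalg.lstsq(design(tau), obs, rcond=None)  # least-squares fit

grid = np.linspace(0, 1, 200)
x_hat = design(grid) @ coef                  # the estimate, evaluable anywhere
print(np.abs(x_hat - x_true(grid)).mean())   # small average reconstruction error
```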
Multidimensional PCA: context and notations

Data: a real matrix
$$\mathbf{X} = (x_i^j)_{i=1,\ldots,n,\ j=1,\ldots,p}$$
which is the observation of $p$ variables on $n$ individuals:

- $n$ rows, each corresponding to the values of the $p$ variables for an individual: $x_i = (x_i^1, \ldots, x_i^p)^T$;
- $p$ columns, each corresponding to the $n$ observations of a variable: $x^j = (x_1^j, \ldots, x_n^j)^T$.

Aim: find linearly independent variables, that are linear combinations of the original ones, ordered by "importance" in $\mathbf{X}$.
First principal component

Suppose (to simplify):

1. data are centered: for all $j = 1, \ldots, p$, $\bar{x}^j = \frac{1}{n} \sum_{i=1}^n x_i^j = 0$;
2. the empirical variance of $\mathbf{X}$ is $\mathrm{Var}(\mathbf{X}) = \frac{1}{n} \mathbf{X}^T \mathbf{X}$.

Problem: find $a^* \in \mathbb{R}^p$ such that
$$a^* := \arg\max_{a : \|a\|_{\mathbb{R}^p} = 1} \underbrace{\mathrm{Var}\left( \left[ \|P_a(x_i)\|_{\mathbb{R}^p} \right]_i \right)}_{\text{inertia}}.$$

Solution:
$$\mathrm{Var}\left( \left[ \|P_a(x_i)\|_{\mathbb{R}^p} \right]_i \right) = \frac{1}{n} \sum_{i=1}^n \left\| (a^T x_i)\, a \right\|^2_{\mathbb{R}^p} = \frac{1}{n} \sum_{i=1}^n (a^T x_i)^2 \underbrace{\|a\|^2_{\mathbb{R}^p}}_{=1} = \frac{1}{n} \sum_{i=1}^n (a^T x_i)(x_i^T a) = a^T \left( \frac{1}{n} \sum_{i=1}^n x_i x_i^T \right) a = a^T \mathrm{Var}(\mathbf{X})\, a.$$

$\Rightarrow$ $a^*$ is the eigenvector of $\mathrm{Var}(\mathbf{X})$ associated with the largest (positive) eigenvalue.
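A quick numerical check of this result (a minimal sketch on synthetic data):

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 500, 3
X = rng.normal(size=(n, p)) @ np.diag([3.0, 1.0, 0.3])   # anisotropic point cloud
X -= X.mean(axis=0)                                       # center the data

V = X.T @ X / n                          # empirical variance Var(X) = X^T X / n
eigval, eigvec = np.linalg.eigh(V)       # eigenvalues in ascending order
a_star = eigvec[:, -1]                   # eigenvector of the largest eigenvalue

# a* maximizes the inertia a^T Var(X) a over unit vectors
print(a_star @ V @ a_star, eigval[-1])   # the two printed values coincide
```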
An eigenvalue decomposition

Generalization: if $((\lambda_i)_{i=1,\ldots,p}, (a_i)_{i=1,\ldots,p})$ is the eigenvalue decomposition of $\mathrm{Var}(\mathbf{X})$ (by decreasing order of the positive $\lambda_i$), then the $(a_i)$ are the factorial axes of $\mathbf{X}$. The principal components of $\mathbf{X}$ are the coordinates of the projections of the data onto these axes.

Then, we have:
$$\mathrm{Var}(\mathbf{X}) = \sum_{j=1}^p \lambda_j\, a_j a_j^T \qquad \text{and} \qquad x_i = \sum_{j=1}^p \underbrace{(x_i^T a_j)}_{\text{principal component } c_i^j} a_j.$$
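Both identities are easy to verify numerically; a self-contained sketch (same kind of synthetic data as above):

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 500, 3
X = rng.normal(size=(n, p)) @ np.diag([3.0, 1.0, 0.3])
X -= X.mean(axis=0)
V = X.T @ X / n

eigval, eigvec = np.linalg.eigh(V)
order = np.argsort(eigval)[::-1]            # decreasing eigenvalues
lam, A = eigval[order], eigvec[:, order]    # factorial axes as the columns of A

C = X @ A                                   # principal components c_i^j = x_i^T a_j
print(np.allclose(X, C @ A.T))              # True: x_i = sum_j c_i^j a_j
print(np.allclose(V, (A * lam) @ A.T))      # True: Var(X) = sum_j lam_j a_j a_j^T
```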
Ordinary generalization of PCA in FDA

Data: $x_1, \ldots, x_n$ are $n$ centered observations of a random functional variable $X$ taking its values in $\mathcal{X}$.

Aim: find $a^* \in \mathcal{X}$ such that
$$a^* := \arg\max_{a : \|a\|_{\mathcal{X}} = 1} \left\{ \mathrm{Var}\left( \left[ \|P_a(x_i)\|_{\mathcal{X}} \right]_i \right) \right\}.$$

Solution:
$$\mathrm{Var}\left( \left[ \|P_a(x_i)\|_{\mathcal{X}} \right]_i \right) = \frac{1}{n} \sum_{i=1}^n \left\| \langle a, x_i \rangle_{\mathcal{X}}\, a \right\|^2_{\mathcal{X}} = \frac{1}{n} \sum_{i=1}^n \langle a, x_i \rangle^2_{\mathcal{X}} \underbrace{\|a\|^2_{\mathcal{X}}}_{=1} = \frac{1}{n} \sum_{i=1}^n \langle a, \langle a, x_i \rangle_{\mathcal{X}}\, x_i \rangle_{\mathcal{X}} = \left\langle \frac{1}{n} \sum_{i=1}^n \langle a, x_i \rangle_{\mathcal{X}}\, x_i,\, a \right\rangle_{\mathcal{X}} = \langle \Gamma_X^n a, a \rangle_{\mathcal{X}},$$
where $\Gamma_X^n = \frac{1}{n} \sum_{i=1}^n x_i \otimes x_i$ is the empirical estimate of $\Gamma_X$ and is of rank at most $n$ (a Hilbert-Schmidt operator).

$\Rightarrow$ $a^*$ is the eigenvector of $\Gamma_X^n$ associated with the largest (positive) eigenvalue: $\Gamma_X^n a^* = \lambda^* a^*$.
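In practice, on a regular grid, $\Gamma_X^n$ becomes a matrix acting through a quadrature weight. A minimal sketch (grid, weights, and synthetic curves are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(6)
n, p = 150, 200
t = np.linspace(0, 1, p)
w = 1.0 / p                                  # quadrature weight: <u, v> ~ w * u @ v

# Centered random curves built from a few smooth modes
modes = np.stack([np.sin((k + 1) * np.pi * t) for k in range(4)])
X = (rng.normal(size=(n, 4)) * np.array([2.0, 1.0, 0.5, 0.2])) @ modes
X -= X.mean(axis=0)

gamma = X.T @ X / n                          # empirical covariance kernel values
lam, V = np.linalg.eigh(gamma * w)           # eigenproblem of u -> w * gamma @ u
a_star = V[:, -1] / np.sqrt(w)               # renormalize so that ||a*||_{L2} = 1

print(lam[-1])                               # largest eigenvalue of Gamma_X^n
print(w * a_star @ a_star)                   # ~1: the axis has unit L2 norm
```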
Eigenvalue decomposition of $\Gamma_X^n$

Factorial axes and principal components: if $((\lambda_i)_{i \geq 1}, (a_i)_{i \geq 1})$ is the eigenvalue decomposition of $\Gamma_X^n$ (by decreasing order of the positive $\lambda_i$), then the $(a_i)$ are the factorial axes of $x_1, \ldots, x_n$ (note that at most $n$ of the $\lambda_i$ are nonzero). The principal components of $x_1, \ldots, x_n$ are the coordinates of the projections of the data onto these axes.

Then, we have:
$$\Gamma_X^n = \sum_{j=1}^n \lambda_j\, a_j \otimes a_j \qquad \text{and} \qquad x_i = \sum_{j=1}^n \underbrace{\langle x_i, a_j \rangle_{\mathcal{X}}}_{\text{principal component } c_i^j} a_j.$$
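Continuing the grid-based sketch above, the scores $c_i^j = \langle x_i, a_j \rangle_{\mathcal{X}}$ are approximated by weighted dot products and reconstruct the curves exactly (self-contained, same illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(7)
n, p = 150, 200
t = np.linspace(0, 1, p)
w = 1.0 / p
modes = np.stack([np.sin((k + 1) * np.pi * t) for k in range(4)])
X = rng.normal(size=(n, 4)) @ modes
X -= X.mean(axis=0)

lam, V = np.linalg.eigh((X.T @ X / n) * w)
A = V[:, ::-1] / np.sqrt(w)          # L2-normalized factorial axes, decreasing order

C = w * X @ A                        # scores c_i^j = <x_i, a_j>_{L2}
X_rec = C[:, :n] @ A[:, :n].T        # x_i = sum_j c_i^j a_j (at most n nonzero terms)
print(np.allclose(X, X_rec))         # True: exact reconstruction of the curves
```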
Example on Tecator dataset

[Figures: the Tecator data, the two first factorial axes, and the 3rd and 4th factorial axes.]
Link with the regression problem

[Figures.]
Smoothness of the factorial axes

From a practical point of view, functional PCA in its original version is computed like multivariate PCA (on the discretization or on the decomposition of the curves on a Hilbert basis). Hence, if the original data are irregular, the factorial axes won't be smooth:

[Figure: rough factorial axes obtained from irregular data.]
Smooth functional PCA

Aim: introduce a penalty in the optimization problem so as to obtain smooth (regular) factorial axes.

Ordinary functional PCA:
$$a^* := \arg\max_{a : \|a\|_{\mathcal{X}} = 1} \left\{ \mathrm{Var}\left( \left[ \|P_a(x_i)\|_{\mathcal{X}} \right]_i \right) \right\}$$
and hence $a^*$ is the eigenvector of $\Gamma_X^n$ associated with the largest eigenvalue.

Penalized functional PCA: if $D^{2m} X \in \mathcal{X} = L^2$,
$$a^* := \arg\max_{a : \|a\|_{\mathcal{X}} = 1} \left\{ \mathrm{Var}\left( \left[ \|P_a(x_i)\|_{\mathcal{X}} \right]_i \right) + \mu \left\| D^m a \right\|^2_{\mathcal{X}} \right\}$$
($\mu > 0$) and hence $a^*$ is the eigenvector of $\Gamma_X^n + \mu D^{2m}$ associated with the largest eigenvalue.
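The sign conventions here depend on how $D^{2m}$ is interpreted; the sketch below is an assumption-laden illustration, not the slide's exact computation. It takes $m = 1$ and uses the standard smoothing convention: discretizing the second-derivative operator as $-D^T D$ for a forward-difference matrix $D$, so that the matrix of $\Gamma_X^n + \mu D^2$ damps rough directions:

```python
import numpy as np

rng = np.random.default_rng(8)
n, p = 100, 120
t = np.linspace(0, 1, p)
w = 1.0 / p

# Rough curves: one smooth dominant mode plus additive noise
X = np.sin(2 * np.pi * t) * rng.normal(size=(n, 1)) + 0.3 * rng.normal(size=(n, p))
X -= X.mean(axis=0)

gamma = (X.T @ X / n) * w                     # discretized covariance operator
D = (np.eye(p, k=1) - np.eye(p))[:-1] / (t[1] - t[0])  # forward differences (m = 1)
P = w * D.T @ D                               # ||D a||^2 ~ a^T P a (roughness)

mu = 1e-4
lam, V = np.linalg.eigh(gamma - mu * P)       # Gamma + mu D^2, with D^2 = -D^T D
a_smooth = V[:, -1] / np.sqrt(w)              # penalized first factorial axis

lam0, V0 = np.linalg.eigh(gamma)
a_rough = V0[:, -1] / np.sqrt(w)              # ordinary first factorial axis

# Total variation of each axis: the penalized one is less wiggly
print(np.abs(np.diff(a_smooth)).sum(), np.abs(np.diff(a_rough)).sum())
```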
![Page 89: Introduction to FDA and linear models](https://reader031.fdocuments.net/reader031/viewer/2022020207/5550b47ab4c90504628b4aaf/html5/thumbnails/89.jpg)
Practical implementation of smooth PCA
Let $(e_k)_{k\geq 1}$ be any functional basis. Then,
1 Approximate the observations by $x_i = \sum_{k=1}^K \xi^i_k e_k$.
2 Approximate the derivatives of the $(e_k)_k$ by $D^{2m} e_{k'} = \sum_{k=1}^K \beta_{k'k}\, e_k$.
3 Then, we can show that:
$$\Gamma_X^n a + \mu D^{2m} a = \sum_{k=1}^K \Big[ \frac{1}{n}\sum_{i=1}^n \big((\xi^i)^T E a\big)\,\xi^i_k + \mu\,\beta_k^T a \Big]\, e_k,$$
where $E$ is the matrix containing $(\langle e_k, e_{k'}\rangle_{\mathcal{X}})_{k,k'=1,\dots,K}$; in coordinates, $\Gamma_X^n + \mu D^{2m}$ thus acts as
$$a \in \mathbb{R}^K \mapsto \frac{1}{n}\sum_{i=1}^n \big((\xi^i)^T E a\big)\,\xi^i + \mu\,\beta^T a.$$
4 Smooth PCA is performed by an eigendecomposition of $\frac{1}{n}\sum_{i=1}^n \xi^i (\xi^i)^T E + \mu\,\beta^T$.
Remark: the decomposition $D^{2m} e_{k'} = \sum_{k=1}^K \beta_{k'k}\, e_k$ is easy to obtain when using a spline basis ⇒ splines are well suited to represent data with smoothness properties (see Presentation 4 for further details).
Nathalie Villa (IMT & UPVD) Presentation 1 La Havane, Sept. 15th, 2008 22 / 37
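As a rough illustration (not from the original slides), here is a minimal Python sketch of step 4, assuming the curves have already been expanded on a basis; `xi`, `E`, `beta` and `mu` are hypothetical names for the coefficient matrix, the Gram matrix of the basis, the matrix of the $D^{2m}e_{k'}$ coefficients and the penalty parameter:

```python
import numpy as np
from scipy.linalg import eig

def smooth_fpca_axes(xi, E, beta, mu, n_axes=2):
    """Leading penalized factorial axes, expressed in the basis (e_k).

    xi   : (n, K) array, coefficients of the n curves on the basis
    E    : (K, K) Gram matrix (<e_k, e_k'>)
    beta : (K, K) coefficients of D^{2m} e_k' on the basis
    mu   : penalty parameter (mu > 0)
    """
    n = xi.shape[0]
    # Coordinate representation of Gamma_X^n + mu D^{2m} (step 4 of the slide)
    M = (xi.T @ xi) @ E / n + mu * beta.T
    vals, vecs = eig(M)                    # M is not symmetric in general
    order = np.argsort(vals.real)[::-1]    # biggest eigenvalues first
    return vals.real[order][:n_axes], vecs[:, order][:, :n_axes].real
```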
![Page 93: Introduction to FDA and linear models](https://reader031.fdocuments.net/reader031/viewer/2022020207/5550b47ab4c90504628b4aaf/html5/thumbnails/93.jpg)
To conclude, several references...
Theoretical background for functional PCA: [Deville, 1974], [Dauxois and Pousse, 1976]
Smooth PCA: [Besse and Ramsay, 1986], [Pezzulli and Silverman, 1993], [Silverman, 1996]
Several examples and discussion: [Ramsay and Silverman, 2002], [Ramsay and Silverman, 1997]
Nathalie Villa (IMT & UPVD) Presentation 1 La Havane, Sept. 15th, 2008 23 / 37
![Page 94: Introduction to FDA and linear models](https://reader031.fdocuments.net/reader031/viewer/2022020207/5550b47ab4c90504628b4aaf/html5/thumbnails/94.jpg)
Table of contents
1 Motivations
2 Functional Principal Component Analysis
3 Functional linear regression models
4 References
Nathalie Villa (IMT & UPVD) Presentation 1 La Havane, Sept. 15th, 2008 24 / 37
![Page 96: Introduction to FDA and linear models](https://reader031.fdocuments.net/reader031/viewer/2022020207/5550b47ab4c90504628b4aaf/html5/thumbnails/96.jpg)
The model
We are interested here in the functional linear regression model:
- $Y$ is a random variable taking its values in $\mathbb{R}$,
- $X$ is a random variable taking its values in $\mathcal{X}$,
- $X$ and $Y$ satisfy the following model:
$$Y = \langle X, \alpha\rangle_{\mathcal{X}} + \varepsilon$$
where $\varepsilon$ is a centered real random variable independent of $X$, and $\alpha$ is the parameter to be estimated.
We are given a training set of size $n$, $(x_i, y_i)_{i=1,\dots,n}$, of independent realizations of the random pair $(X, Y)$.
Or, alternatively, we are given a training set of size $n$ with errors-in-variables, $(w_i, y_i)_{i=1,\dots,n}$, with:
- $w_i = x_i + \eta_i$ ($\eta_i$ is the realization of a centered random variable independent of $Y$),
- $y_i = \langle x_i, \alpha\rangle_{\mathcal{X}} + \varepsilon_i$.
This problem will be investigated in Presentation 4.
Nathalie Villa (IMT & UPVD) Presentation 1 La Havane, Sept. 15th, 2008 25 / 37
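To make the model concrete, here is a toy simulation (not from the slides) with $\mathcal{X} = L^2([0,1])$, where the curves are observed on a regular grid and the inner product is approximated by a Riemann sum; the choice of $X$ as a Brownian motion and the form of $\alpha$ are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 100                       # sample size and grid size
t = np.linspace(0, 1, p)

# X: standard Brownian motion observed on the grid (an arbitrary choice)
X = np.cumsum(rng.normal(scale=1 / np.sqrt(p), size=(n, p)), axis=1)

alpha = np.sin(2 * np.pi * t)         # a made-up "true" parameter
eps = rng.normal(scale=0.05, size=n)  # centered noise, independent of X

# <X, alpha>_{L^2([0,1])} approximated by a Riemann sum on the grid
y = X @ alpha / p + eps
```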
![Page 98: Introduction to FDA and linear models](https://reader031.fdocuments.net/reader031/viewer/2022020207/5550b47ab4c90504628b4aaf/html5/thumbnails/98.jpg)
Basics about the functional linear regression model
To avoid useless difficulties, we will suppose that $\mathbb{E}(X) = 0$.
Let us first define the covariance between $X$ and $Y$ as:
$$\Delta_{(X,Y)} = \mathbb{E}(XY) \in \mathcal{X}.$$
Then, we have:
$$\Gamma_X \alpha = \Delta_{(X,Y)}.$$
But, as $\Gamma_X$ is Hilbert-Schmidt, it is not invertible, and thus the empirical estimate of $\alpha$ (using a generalized inverse of $\Gamma_X^n$) does not converge to $\alpha$ when $n$ tends to infinity: it is an ill-posed inverse problem.
⇒ Penalization or regularization is needed to obtain a relevant estimate.
Nathalie Villa (IMT & UPVD) Presentation 1 La Havane, Sept. 15th, 2008 26 / 37
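The normal equation above follows in one line from the model (a step left implicit on the slide):

```latex
\Delta_{(X,Y)} = \mathbb{E}(XY)
  = \mathbb{E}\big(\langle X, \alpha\rangle_{\mathcal{X}}\, X\big)
    + \mathbb{E}(X\varepsilon)
  = \Gamma_X \alpha + 0,
```

where $\mathbb{E}(X\varepsilon) = 0$ because $\varepsilon$ is centered and independent of $X$, and $\Gamma_X a := \mathbb{E}\big(\langle X, a\rangle_{\mathcal{X}}\, X\big)$ is the covariance operator of $X$.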
![Page 102: Introduction to FDA and linear models](https://reader031.fdocuments.net/reader031/viewer/2022020207/5550b47ab4c90504628b4aaf/html5/thumbnails/102.jpg)
PCA approach
References: [Cardot et al., 1999], building on the works of [Bosq, 1991] on Hilbertian AR models.
PCA decomposition of $X$: denote by
- $((\lambda_i^n, v_i^n))_{i\geq 1}$ the eigendecomposition of $\Gamma_X^n$ (the $(\lambda_i)_i$ are ordered in decreasing order and at most $n$ eigenvalues are non-zero; the $(v_i^n)_i$ are orthonormal);
- $k_n$ an integer such that $k_n \leq n$ and $\lim_{n\to+\infty} k_n = +\infty$;
- $P_{k_n}$ the projector $P_{k_n}(u) = \sum_{i=1}^{k_n} \langle v_i^n, u\rangle_{\mathcal{X}}\, v_i^n$;
- $\Gamma_X^{n,k_n} = P_{k_n} \circ \Gamma_X^n \circ P_{k_n} = \sum_{i=1}^{k_n} \lambda_i^n \langle v_i^n, \cdot\rangle_{\mathcal{X}}\, v_i^n$;
- $\Delta^{n,k_n}_{(X,Y)} = P_{k_n}\Big(\frac{1}{n}\sum_{i=1}^n y_i x_i\Big) = \frac{1}{n}\sum_{i=1,\dots,n,\ i'=1,\dots,k_n} y_i \langle x_i, v_{i'}^n\rangle_{\mathcal{X}}\, v_{i'}^n$.
Nathalie Villa (IMT & UPVD) Presentation 1 La Havane, Sept. 15th, 2008 27 / 37
![Page 108: Introduction to FDA and linear models](https://reader031.fdocuments.net/reader031/viewer/2022020207/5550b47ab4c90504628b4aaf/html5/thumbnails/108.jpg)
Definition of a consistent estimate for α
Define:
$$\alpha_n = \left(\Gamma_X^{n,k_n}\right)^+ \Delta^{n,k_n}_{(X,Y)}$$
where $\left(\Gamma_X^{n,k_n}\right)^+$ denotes the generalized (Moore-Penrose) inverse:
$$\left(\Gamma_X^{n,k_n}\right)^+ = \sum_{i=1}^{k_n} (\lambda_i^n)^{-1} \langle v_i^n, \cdot\rangle_{\mathcal{X}}\, v_i^n.$$
Nathalie Villa (IMT & UPVD) Presentation 1 La Havane, Sept. 15th, 2008 28 / 37
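A minimal numpy sketch of this estimator (not from the slides) for curves observed on a regular grid of $[0,1]$, with the $L^2$ inner product approximated by a Riemann sum; the function name and arguments are invented for the illustration, and `X`, `y` are assumed centered:

```python
import numpy as np

def fpca_regression(X, y, k):
    """PCA-based estimate of alpha in y_i = <x_i, alpha> + eps_i.

    X : (n, p) centered curves observed on a regular grid of [0, 1]
    y : (n,) centered responses
    k : number of principal components kept (k_n)
    """
    n, p = X.shape
    Z = X / np.sqrt(p)               # rows live in (approximate) L^2 geometry
    lam, V = np.linalg.eigh(Z.T @ Z / n)
    lam, V = lam[::-1], V[:, ::-1]   # eigenvalues in decreasing order
    delta = Z.T @ y / n              # empirical cross-covariance Delta^n
    coefs = (V[:, :k].T @ delta) / lam[:k]   # generalized inverse on k axes
    return (V[:, :k] @ coefs) * np.sqrt(p)   # back to grid values of alpha_n
```

On the toy data simulated earlier, `fpca_regression(X - X.mean(0), y - y.mean(), k=5)` should roughly recover the shape of `alpha` for a moderate number of components.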
![Page 109: Introduction to FDA and linear models](https://reader031.fdocuments.net/reader031/viewer/2022020207/5550b47ab4c90504628b4aaf/html5/thumbnails/109.jpg)
Assumptions for consistency result
(A1) $(\lambda_i^n)_{i=1,\dots,k_n}$ are all distinct and non-zero a.s.
(A2) $(\lambda_i)_{i\geq 1}$ are all distinct and non-zero
(A3) $X$ is a.s. bounded in $\mathcal{X}$ ($\|X\|_{\mathcal{X}} \leq C_1$ a.s.)
(A4) $\varepsilon$ is a.s. bounded ($\|\varepsilon\| \leq C_2$ a.s.)
(A5) $\lim_{n\to+\infty} \frac{n\,\lambda_{k_n}^4}{\log n} = +\infty$
(A6) $\lim_{n\to+\infty} \frac{n\,\lambda_{k_n}^2}{\big(\sum_{j=1}^{k_n} a_j\big)^2 \log n} = +\infty$, where $a_1 = \frac{2\sqrt{2}}{\lambda_1 - \lambda_2}$ and $a_j = \frac{2\sqrt{2}}{\min(\lambda_{j-1} - \lambda_j;\ \lambda_j - \lambda_{j+1})}$ for $j > 1$
Example of $\Gamma_X$ satisfying those assumptions: if the eigenvalues of $\Gamma_X$ are geometrically or exponentially decreasing, these assumptions are fulfilled as long as the sequence $(k_n)_n$ tends slowly enough to $+\infty$; for example, $X$ a Brownian motion on $[0,1]$ and $k_n = o(\log n)$.
Nathalie Villa (IMT & UPVD) Presentation 1 La Havane, Sept. 15th, 2008 29 / 37
![Page 111: Introduction to FDA and linear models](https://reader031.fdocuments.net/reader031/viewer/2022020207/5550b47ab4c90504628b4aaf/html5/thumbnails/111.jpg)
Consistency result
Theorem [Cardot et al., 1999]. Under assumptions (A1)-(A6), we have:
$$\|\alpha_n - \alpha\|_{\mathcal{X}} \xrightarrow{\ n\to+\infty\ } 0.$$
Nathalie Villa (IMT & UPVD) Presentation 1 La Havane, Sept. 15th, 2008 30 / 37
![Page 112: Introduction to FDA and linear models](https://reader031.fdocuments.net/reader031/viewer/2022020207/5550b47ab4c90504628b4aaf/html5/thumbnails/112.jpg)
Smoothing approach based on B-splines
References: [Cardot et al., 2003]
Suppose that $X$ takes values in $L^2([0,1])$.
Basics on B-splines: let $q$ and $k$ be two integers and denote by $S_{qk}$ the space of functions $s$ satisfying:
- $s \in S_{qk}$ is a polynomial of degree $q$ on each interval $\left[\frac{l-1}{k}, \frac{l}{k}\right]$, for all $l = 1, \dots, k$;
- $s \in S_{qk}$ is $q-1$ times differentiable on $[0,1]$.
The space $S_{qk}$ has dimension $q+k$ and a normalized basis of $S_{qk}$ is denoted by $\{B_j^{qk},\ j = 1,\dots,q+k\}$ (normalized B-splines, see [de Boor, 1978]).
These functions are easy to manipulate and have interesting smoothness properties. They can be used both to express $X$ and the parameter $\alpha$, and to impose smoothness constraints on $\alpha$.
Nathalie Villa (IMT & UPVD) Presentation 1 La Havane, Sept. 15th, 2008 31 / 37
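As an illustration (not from the slides), such a basis can be built with scipy: the knot vector below, with $q+1$ repeated boundary knots and $k-1$ interior knots, yields exactly $q+k$ basis functions, the dimension of $S_{qk}$ (this assumes a reasonably recent scipy, where `BSpline.design_matrix` is available):

```python
import numpy as np
from scipy.interpolate import BSpline

q, k = 3, 8                                # degree q, k sub-intervals of [0, 1]
interior = np.linspace(0, 1, k + 1)[1:-1]  # the k - 1 interior knots
knots = np.r_[np.zeros(q + 1), interior, np.ones(q + 1)]

t = np.linspace(0, 1, 200)
# Column j holds the values of the j-th normalized B-spline B_j^{qk} on t;
# the basis has len(knots) - q - 1 = q + k functions.
B = BSpline.design_matrix(t, knots, q).toarray()
print(B.shape)                             # (200, q + k) = (200, 11)
```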
![Page 116: Introduction to FDA and linear models](https://reader031.fdocuments.net/reader031/viewer/2022020207/5550b47ab4c90504628b4aaf/html5/thumbnails/116.jpg)
Definition of a consistent estimate for α
Note $B_{qk} := \left(B_1^{qk}, \dots, B_{q+k}^{qk}\right)^T$ and $B_{qk}^{(m)}$ the $m$-th derivatives of $B_{qk}$, for an $m < q + k$.
A penalized mean square estimate: provided that $\alpha$ is smooth enough, we aim at finding $\alpha_n = \sum_{j=1}^{q+k} a_j^n B_j^{qk} = (a^n)^T B_{qk}$, a solution of the optimization problem:
$$\arg\min_{a\in\mathbb{R}^{q+k}}\ \underbrace{\frac{1}{n}\sum_{i=1}^n \left(y_i - \langle a^T B_{qk}, x_i\rangle_{\mathcal{X}}\right)^2}_{\text{mean square criterion}} + \underbrace{\mu \left\| a^T B_{qk}^{(m)} \right\|_{\mathcal{X}}^2}_{\text{smoothness penalization}}$$
The solution of the previous problem is given by
$$a^n = \left(C_n + \mu\, G_{qk}^n\right)^{-1} b_n$$
where $C_n$ is the matrix with components $\langle \Gamma_X^n B_j^{qk}, B_{j'}^{qk}\rangle_{\mathcal{X}}$ ($j, j' = 1,\dots,q+k$), $b_n$ is the vector with components $\langle \Delta^n_{(X,Y)}, B_j^{qk}\rangle_{\mathcal{X}}$ ($j = 1,\dots,q+k$), and $G_{qk}^n$ is the matrix with components $\langle B_j^{qk(m)}, B_{j'}^{qk(m)}\rangle_{\mathcal{X}}$ ($j, j' = 1,\dots,q+k$).
Nathalie Villa (IMT & UPVD) Presentation 1 La Havane, Sept. 15th, 2008 32 / 37
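Putting the pieces together, here is a compact sketch (again not from the slides) of this penalized estimator for curves sampled on a regular grid of $[0,1]$, with the $L^2$ inner products approximated by Riemann sums; the function name and default values are invented for the illustration:

```python
import numpy as np
from scipy.interpolate import BSpline

def spline_flr(X, y, t, q=3, k=8, m=2, mu=1e-3):
    """Penalized B-spline estimate of alpha, returned on the grid t."""
    n, p = X.shape
    K = q + k                                  # dimension of S_qk
    knots = np.r_[np.zeros(q + 1),
                  np.linspace(0, 1, k + 1)[1:-1],
                  np.ones(q + 1)]
    basis = BSpline(knots, np.eye(K), q)       # vector of the K basis splines
    B = basis(t)                               # (p, K) values B_j^{qk}(t_l)
    Bm = basis.derivative(m)(t)                # (p, K) m-th derivative values
    S = X @ B / p                              # <x_i, B_j>, Riemann sums
    C = S.T @ S / n                            # C_n
    b = S.T @ y / n                            # b_n
    G = Bm.T @ Bm / p                          # G_qk^n
    a = np.linalg.solve(C + mu * G, b)         # a^n = (C_n + mu G_qk^n)^{-1} b_n
    return B @ a                               # alpha_n on the grid
```

The tuning parameters $q$, $k$, $m$ and $\mu$ would in practice be chosen by cross-validation or a similar criterion.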
![Page 120: Introduction to FDA and linear models](https://reader031.fdocuments.net/reader031/viewer/2022020207/5550b47ab4c90504628b4aaf/html5/thumbnails/120.jpg)
Assumptions for consistency result
(A1) $X$ is a.s. bounded in $\mathcal{X}$
(A2) $\mathrm{Var}(Y|X=x) \leq C_1$ for all $x \in \mathcal{X}$
(A3) $\mathbb{E}(Y|X=x) \leq C_2$ for all $x \in \mathcal{X}$
(A4) there exist an integer $p'$ and a real $\nu \in [0,1]$ such that $p' + \nu \leq q$ and $\big|\alpha^{(p')}(t_1) - \alpha^{(p')}(t_2)\big| \leq |t_1 - t_2|^{\nu}$
(A5) $\mu = O\big(n^{-(1-\delta)/2}\big)$ for a $0 < \delta < 1$
(A6) $\lim_{n\to+\infty} \mu\, k^{2(m-p)} = 0$ for $p = p' + \nu$
(A7) $k = O\big(n^{1/(4p+1)}\big)$
Nathalie Villa (IMT & UPVD) Presentation 1 La Havane, Sept. 15th, 2008 33 / 37
![Page 121: Introduction to FDA and linear models](https://reader031.fdocuments.net/reader031/viewer/2022020207/5550b47ab4c90504628b4aaf/html5/thumbnails/121.jpg)
Consistency result
Theorem [Cardot et al., 2003]. Under assumptions (A1)-(A7),
- $\lim_{n\to+\infty} \mathbb{P}\,(\text{there exists a unique solution to the minimization problem}) = 1$;
- $\mathbb{E}\big(\|\alpha_n - \alpha\|_{\mathcal{X}}^2 \,\big|\, x_1, \dots, x_n\big) = O_P\big(n^{-2p/(4p+1)}\big)$.
Nathalie Villa (IMT & UPVD) Presentation 1 La Havane, Sept. 15th, 2008 34 / 37
![Page 122: Introduction to FDA and linear models](https://reader031.fdocuments.net/reader031/viewer/2022020207/5550b47ab4c90504628b4aaf/html5/thumbnails/122.jpg)
Other functional linear methods
Canonical correlation: [Leurgans et al., 1993]
Factorial Discriminant Analysis: [Hastie et al., 1995]
Partial Least Squares: [Preda and Saporta, 2005]
. . .
Nathalie Villa (IMT & UPVD) Presentation 1 La Havane, Sept. 15th, 2008 35 / 37
![Page 123: Introduction to FDA and linear models](https://reader031.fdocuments.net/reader031/viewer/2022020207/5550b47ab4c90504628b4aaf/html5/thumbnails/123.jpg)
Table of contents
1 Motivations
2 Functional Principal Component Analysis
3 Functional linear regression models
4 References
Nathalie Villa (IMT & UPVD) Presentation 1 La Havane, Sept. 15th, 2008 36 / 37
![Page 124: Introduction to FDA and linear models](https://reader031.fdocuments.net/reader031/viewer/2022020207/5550b47ab4c90504628b4aaf/html5/thumbnails/124.jpg)
References
Further details for the references are given in the joint document.
Azaïs, J., Bercu, S., Chaoui, O., Fort, J., Lagnoux-Renaudie, A., and Lé, P. (2008). Load curves estimation and simultaneous confidence bands. Preprint, available (in French) at http://www.lsp.ups-tlse.fr/Fp/Lagnoux/rapport_final.pdf.
Bensmail, H., Aruna, B., Semmes, O., and Haoudi, A. (2005). Functional clustering algorithm for high-dimensional proteomics data. Journal of Biomedicine and Biotechnology, 2:80–86.
Besse, P. and Ramsay, J. (1986). Principal component analysis of sampled curves. Psychometrika, 51:285–311.
Bosq, D. (1991). Modelization, non-parametric estimation and prediction for continuous time processes. In ASI Series, volume 335, pages 509–529. NATO.
Cardot, H., Ferraty, F., and Sarda, P. (1999). Functional linear model. Statistics and Probability Letters, 45:11–22.
Cardot, H., Ferraty, F., and Sarda, P. (2003). Spline estimators for the functional linear model. Statistica Sinica, 13:571–591.
Dauxois, J. and Pousse, A. (1976). Les analyses factorielles en calcul des probabilités et en statistique : essai d'étude synthétique. Thèse d'État, Université Toulouse III.
de Boor, C. (1978). A Practical Guide to Splines. Springer, New York.
Deville, J. (1974). Méthodes statistiques et numériques de l'analyse harmonique. Annales de l'INSEE, 15(Janvier–Avril):3–97.
Hastie, T., Buja, A., and Tibshirani, R. (1995). Penalized discriminant analysis. Annals of Statistics, 23:73–102.
Leurgans, S., Moyeed, R., and Silverman, B. (1993). Canonical correlation analysis when the data are curves. Journal of the Royal Statistical Society, Series B, 55:725–740.
Pezzulli, S. and Silverman, B. (1993). Some properties of smoothed principal components analysis for functional data. Computational Statistics, 8:1–16.
Preda, C. and Saporta, G. (2005). Clusterwise PLS regression on a stochastic process. Computational Statistics and Data Analysis, 49(1):99–108.
Ramsay, J. and Silverman, B. (1997). Functional Data Analysis. Springer Verlag, New York.
Ramsay, J. and Silverman, B. (2002). Applied Functional Data Analysis. Springer Verlag.
Silverman, B. (1996). Smoothed functional principal components analysis by choice of norm. Annals of Statistics, 24:1–24.
Nathalie Villa (IMT & UPVD) Presentation 1 La Havane, Sept. 15th, 2008 37 / 37