Functional Principal Component Analysis of Financial Time Series
G. Damiana Costanzo
Dipartimento di Economia e Statistica, Università della Calabria
87036 Arcavacata di Rende (CS)
Cnam - Paris, November 23rd 2005
Summary
1. Introduction
2. Functional data vs. Multidimensional data modeling
3. Functional PCA
4. The e.e.v. MIB30 dataset
5. The statistical analysis
6. Conclusions and perspectives
Opening
The problem (methodological perspective):
• Dimensional reduction of a functional data set with
homogeneous piecewise components
The datasets:
• Daily quantities (prices and exchanged equivalent value, e.e.v.) of the
shares constituting the MIB30 basket in the period January 3rd, 2000 -
December 30th, 2002 (courtesy of the Research & Development DBMS
(Borsa Italiana)).
The statistical method:
• Functional principal component analysis
Why Functional Data
Functional data analysis (FDA) is a generalization of classical
multivariate analysis (MVA) to the case where the data are functions,
curves or trajectories.
Such data arise quite naturally in different fields.
For example, in phenomena where measurements come from an
automated on-line collection process (on-line sensing and monitoring
equipment):
in economic analysis, statistical quality control of manufacturing
processes, shape analysis and natural science: seismology, meteorology,
physiology and medicine (see the recent paper on gait data by Preda &
Saporta, 2005).
MD vs. FD modeling
Given a set of n units ω1, . . . , ωn:

                Multidimensional Data           Functional Data
Data set:       X = {x1, x2, ..., xn},          XT = {x1(t), x2(t), ..., xn(t)},
                a set of points in R^p          a set of functions on T
# variables:    p < ∞                           p = ∞
Vector space:   Euclidean space R^p             Hilbert space H

see e.g. Ramsay (1982), Saporta (1985).
Rationale
Observed data functions must be thought of as single entities rather
than as a sequence of individual observations: the term functional
refers to the intrinsic structure of the data rather than to their
explicit form. In fact, from a practical point of view, functional data
are usually observed and recorded discretely.
Let $\{\omega_1, \dots, \omega_n\}$ be a set of $n$ units and let
$y_i = (y_i(t_1), \dots, y_i(t_p))$ be a sample of measurements of a
variable $Y$ taken at $p$ times $t_1, \dots, t_p \in T = [a, b]$ on the
$i$-th unit $\omega_i$ $(i = 1, \dots, n)$. Such data $y_i$
$(i = 1, \dots, n)$ are regarded as functional, so they are called raw
functional data.
What is new then?
Owing to the functional nature of the data, it is assumed that there
is a true function underlying the (discretely) observed data.
=⇒
The first step is to convert raw functional data into a suitable
functional form: a smooth function $x_i(t)$ is assumed to
lie behind $y_i$ and is referred to as the true functional form;
this implies, in principle, that we can evaluate $x$ at any point
$t \in T$ and, in addition, that we can evaluate any derivative
$x^{(m)}(t)$ that exists at $t$, up to some order $m$. Finally, the set
$X_T = \{x_1(t), \dots, x_n(t)\}_{t \in T}$ is called the functional dataset.
A Remark
Though functional data analysis often deals with temporal data,
its scope and objectives are quite different from those of time series
analysis. While the latter focuses mainly on modelling data or on
predicting future observations, the techniques in FDA are essentially
exploratory in nature: the emphasis is on trajectories and shapes.
Moreover, by adopting a functional approach:
a) unequally spaced observations and missing values can be handled;
b) in some cases a full description of the data involves the study of
certain derivatives (e.g. velocity and acceleration).
Estimation Strategy 1/3
The definition of the FDA of a set of data which are functional, either in fact or in principle, usually implies the following tasks:
1. choice of the function space in which the analysis has to take place;
2. specification of the analysis in functional analytic terms;
3. determination of how a finite dimensional observation vector has to be mapped into the function space;
4. description of what the FDA of the functional representations of the finite observations means in terms of analysing the observations themselves.
Estimation Strategy 2/3
We distinguish two cases depending on whether the data $y_i$ are assumed to be errorless or not. In the first case the function $x_i(t)$ should satisfy the constraints:
$$ x_i(t_j) = y_i(t_j), \qquad j = 1, \dots, p. \qquad (1) $$
When observational errors are assumed to be present in the raw data, the conversion from $y_i$ to the function $x_i(t)$ may involve a smoothing procedure, and in modelling terms we write:
$$ y_i(t_j) = x_i(t_j) + \epsilon_j, \qquad j = 1, \dots, p \qquad (2) $$
where the error term $\epsilon_j$ contributes a roughness to the raw data. The standard assumption requires that the $\epsilon_j$'s are i.i.d., with zero mean and common finite variance.
Estimation Strategy 3/3
A number of strategies, based on different approaches, can be considered to convert raw functional data into the true functional form; see e.g. Simonoff (1996). Here we consider the roughness penalty or regularization approach based on spline smoothing.
This method estimates $x$ from observations of the form (2) by making explicit two aims in curve estimation:
a) we wish to ensure that the estimated curve gives a good fit to the data, for example in terms of the residual sum of squares $\sum_j [y_j - x(t_j)]^2$;
b) we do not want the fit to be too good if this results in a curve $x$ that is excessively irregular (for example, smoothing lets us gain information about derivatives of the true functions).
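The trade-off between aims a) and b) can be illustrated numerically. The following is a minimal sketch and not the spline smoother used in the analysis: it swaps the roughness of a smooth curve for a sum of squared second differences on an equally spaced grid (a discrete analogue, often called a Whittaker smoother). The names `penalized_smooth`, `rss`, `roughness`, the parameter `lam` and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def penalized_smooth(y, lam):
    """Minimise sum_j (y_j - x_j)^2 + lam * sum_j (x_{j-1} - 2*x_j + x_{j+1})^2.

    Discrete stand-in for the roughness penalty; assumes equally spaced t_j.
    """
    y = np.asarray(y, dtype=float)
    p = y.size
    D = np.diff(np.eye(p), n=2, axis=0)      # (p-2) x p second-difference matrix
    # Setting the gradient to zero gives the linear system (I + lam * D'D) x = y.
    return np.linalg.solve(np.eye(p) + lam * (D.T @ D), y)

def rss(y, x):
    """Residual sum of squares, aim a)."""
    return float(np.sum((y - x) ** 2))

def roughness(x):
    """Sum of squared second differences, aim b)."""
    return float(np.sum(np.diff(x, n=2) ** 2))

# Synthetic raw data following model (2): smooth curve plus i.i.d. noise.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 200)
y = np.sin(2 * np.pi * t) + 0.2 * rng.standard_normal(t.size)

x_interp = penalized_smooth(y, 0.0)      # lam = 0 reproduces the data, as in (1)
x_light = penalized_smooth(y, 1.0)
x_heavy = penalized_smooth(y, 100.0)

for name, x in [("lam=0", x_interp), ("lam=1", x_light), ("lam=100", x_heavy)]:
    print(name, "RSS:", round(rss(y, x), 3), "roughness:", round(roughness(x), 5))
```

Increasing `lam` raises the residual sum of squares and lowers the roughness; a straight line has zero second differences and is therefore returned unchanged for any `lam`.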
Functional PCA 1/2
The objective in principal component analysis of functional data is the orthogonal decomposition of the variance (kernel) function:
$$ v(t, u) := \frac{1}{n} \sum_{i=1}^{n} \{x_i(t) - \bar{x}(t)\}\{x_i(u) - \bar{x}(u)\} $$
(which is the counterpart of the covariance matrix of a multidimensional dataset) in order to isolate the dominant components of functional variation, see e.g. Ramsay & Silverman (1997, 2002), James et al. (2000).
In the functional space $H$, the role of the covariance matrix is played by the covariance operator $V$ defined by:
$$ V\xi := \int v(\cdot, t)\, \xi(t)\, dt \quad \text{for any function } \xi \in H. $$
In analogy with the multivariate case, the functional PCA problem leads to the eigenequation:
$$ V\xi = \lambda \xi $$
where now $\xi$ is an eigenfunction, rather than an eigenvector, and $\lambda$ is the eigenvalue.
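One way to see the eigenequation at work is to discretise it. The sketch below assumes that the curves are sampled on an equally spaced grid with step `h` and approximates the integral by a rectangle rule, so that the operator $V$ becomes the matrix `h * v`. The function name `fpca_eigen` and the synthetic curves are assumptions; this is not the basis-expansion approach of the cited references.

```python
import numpy as np

def fpca_eigen(X, h, n_comp=2):
    """Leading eigenpairs of the discretised covariance operator.

    X : (n, p) array, row i holds x_i evaluated on an equally spaced grid.
    h : grid step, used as the quadrature weight (rectangle rule).
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)                   # x_i(t) - xbar(t)
    v = Xc.T @ Xc / n                         # v(t_j, t_k), with the 1/n factor
    evals, evecs = np.linalg.eigh(h * v)      # V xi = lambda xi on the grid
    idx = np.argsort(evals)[::-1][:n_comp]    # largest eigenvalues first
    lam = evals[idx]
    xi = evecs[:, idx].T / np.sqrt(h)         # rescale so that h * sum(xi_j^2) = 1
    return v, lam, xi

# Synthetic curves (an assumption): two modes of variation around a constant mean.
rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 101)
h = t[1] - t[0]
a = rng.standard_normal(30)
b = 0.3 * rng.standard_normal(30)
X = 1.0 + np.outer(a, np.sin(np.pi * t)) + np.outer(b, np.cos(2 * np.pi * t))

v, lam, xi = fpca_eigen(X, h)
residual = h * v @ xi[0] - lam[0] * xi[0]     # discretised V xi_1 - lambda_1 xi_1
print("eigenvalues:", lam)
print("max |V xi_1 - lambda_1 xi_1|:", np.abs(residual).max())
```

The division by `sqrt(h)` turns the unit-length eigenvectors returned by `eigh` into weight functions whose squared integral is approximately one.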
Functional PCA 2/2
Functional PCA is characterized by the decomposition of the variance function:
$$ v(t, u) = \sum_j \lambda_j\, \xi_j(t)\, \xi_j(u) $$
where the eigenvalues
$$ \lambda_j := \int_T \int_T \xi_j(t)\, v(t, u)\, \xi_j(u)\, dt\, du $$
are non-negative and arranged in non-increasing order, while the eigenfunctions must satisfy the constraints:
$$ \int_T \xi_j^2(t)\, dt = 1 \quad \text{and} \quad \int_T \xi_j(t)\, \xi_i(t)\, dt = 0 \quad (i < j). $$
The $\xi_j$'s are usually called principal component weight functions.
Finally, the principal component scores (with respect to $\xi(t)$) of the units in the dataset are the values $w_i$ given by:
$$ w_i := \int_T \xi(t)\, x_i(t)\, dt. $$
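The decomposition and the scores can be checked on sampled curves. The sketch below is an illustration under stated assumptions: an equally spaced grid with step `h`, a rectangle-rule integral, and scores computed from the centred curves (a common convention; the formula above is written with $x_i$ itself). It uses the singular value decomposition of the centred data matrix, which is convenient when there are many more grid points than curves. The function name `fpca_scores` and the synthetic data are hypothetical.

```python
import numpy as np

def fpca_scores(X, h):
    """Eigenvalues, weight functions and scores from the SVD of the centred data.

    X : (n, p) array of curves on an equally spaced grid with step h.
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    U, s, Wt = np.linalg.svd(Xc, full_matrices=False)
    lam = h * s**2 / n                 # eigenvalues, already in decreasing order
    xi = Wt / np.sqrt(h)               # rows are xi_j with h * sum(xi_j^2) = 1
    scores = np.sqrt(h) * U * s        # w_ij = h * sum_k xi_j(t_k) * xc_i(t_k)
    return Xc, lam, xi, scores

# Synthetic curves (an assumption): two main modes of variation plus noise.
rng = np.random.default_rng(2)
t = np.linspace(0.0, 1.0, 201)
h = t[1] - t[0]
X = (np.outer(rng.standard_normal(30), np.sin(np.pi * t))
     + np.outer(0.3 * rng.standard_normal(30), np.cos(2 * np.pi * t))
     + 0.05 * rng.standard_normal((30, t.size)))

Xc, lam, xi, scores = fpca_scores(X, h)
share = lam / lam.sum()                # proportion of variability per component
print("share of the first two components:", share[:2])
```

With this scaling the mean squared score of component j equals its eigenvalue, and `share` is the quantity reported as "percentage of variability" once multiplied by 100.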
The MIB30 dataset 1/2
The data considered here consist of the total exchanged equivalent value (e.e.v.) of the 30 shares composing the MIB30 index in the period January 3rd, 2000 - December 30th, 2002, see e.g. Costanzo (2003). The data matrix is 30 × 758 (note that p ≫ n).
An important characteristic of this basket is that it is "open", since its composition is normally updated twice a year, in the months of March and September (ordinary revisions). However, in response to extraordinary events, or for technical reasons, ordinary revisions may be brought forward or postponed with respect to the scheduled date; see www.borsaitalia.it for details.
In our dataset there are 21 companies which remain in the basket during the whole three years and 23 companies sharing the other 9 places in the basket (since they remain in the basket only for one or more short periods): these places have been denoted by T1, . . . , T9. Such mixed trajectories will be called here homogeneous piecewise components of the functional data set.
The MIB30 dataset 2/2
Example of homogeneous piecewise components T1, T2, T3.
Date        T1                 T2                          T3
03/01/2000  AEM                Banca Commerciale Italiana  Banca di Roma
04/04/2000  AEM                Banca Commerciale Italiana  Banca di Roma
18/09/2000  AEM                Banca Commerciale Italiana  Banca di Roma
02/01/2001  AEM                Banca Commerciale Italiana  Banca di Roma
19/03/2001  AEM                Italgas                     Banca di Roma
02/05/2001  AEM                Italgas                     Banca di Roma
24/08/2001  AEM                Italgas                     Banca di Roma
24/09/2001  AEM                Italgas                     Banca di Roma
18/03/2002  Snam Rete Gas      Italgas                     Banca di Roma
01/07/2002  AEM                Italgas                     Capitalia
15/07/2002  AEM                Italgas                     Capitalia
23/09/2002  Banca Antonveneta  Italgas                     Capitalia
04/12/2002  Banca Antonveneta  Italgas                     Capitalia
The statistical analysis 1/5
(see Ingrassia and Costanzo, 2004)
Examples of two trajectories: we set up a B-spline basis with 150 knots (approximately one knot for each week) and order 6 and make the functional data:
[Figure: two panels, Enel and Eni; e.e.v. plotted against Day (0-600).]
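A basis of the kind described for these two trajectories can be sketched with the Cox-de Boor recursion. The code below is an illustration under stated assumptions, not the software used for the analysis: "150 knots" is read as 150 equally spaced interior knots over the 758 days, the boundary knots are repeated `order` times, and the curve is fitted by ordinary least squares without a roughness penalty. The function name `bspline_basis` and the synthetic curve are hypothetical.

```python
import numpy as np

def bspline_basis(x, knots, order):
    """Evaluate all B-splines of the given order at x (Cox-de Boor recursion).

    knots : full non-decreasing knot vector, boundary knots repeated `order` times.
    Returns an array of shape (len(x), len(knots) - order).
    """
    x = np.asarray(x, dtype=float)
    t = np.asarray(knots, dtype=float)
    # Order 1: indicator functions of the half-open knot intervals.
    B = np.zeros((x.size, t.size - 1))
    for j in range(t.size - 1):
        B[:, j] = (t[j] <= x) & (x < t[j + 1])
    # Assign the right end point to the last non-empty interval.
    last = np.max(np.nonzero(t[:-1] < t[1:])[0])
    B[x == t[-1], last] = 1.0
    # Raise the order one step at a time; skip terms with zero-width support.
    for k in range(2, order + 1):
        B_new = np.zeros((x.size, t.size - k))
        for j in range(t.size - k):
            d1 = t[j + k - 1] - t[j]
            d2 = t[j + k] - t[j + 1]
            if d1 > 0:
                B_new[:, j] += (x - t[j]) / d1 * B[:, j]
            if d2 > 0:
                B_new[:, j] += (t[j + k] - x) / d2 * B[:, j + 1]
        B = B_new
    return B

days = np.arange(1, 759, dtype=float)                  # 758 trading days
order = 6                                              # piecewise polynomials of degree 5
interior = np.linspace(days[0], days[-1], 152)[1:-1]   # 150 interior knots (assumed)
knots = np.concatenate([np.repeat(days[0], order), interior,
                        np.repeat(days[-1], order)])
B = bspline_basis(days, knots, order)                  # 758 x 156 design matrix

y = np.sin(2 * np.pi * days / 758.0)                   # synthetic raw curve
coef, *_ = np.linalg.lstsq(B, y, rcond=None)           # basis coefficients
fit = B @ coef
print(B.shape, float(np.max(np.abs(fit - y))))
```

Under this reading the basis has 150 + 6 = 156 functions, and the interior knots are about five trading days apart, which matches "approximately one knot for each week".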
The statistical analysis 2/5
The entire functional dataset:
[Figure: e.e.v. of all the curves plotted against Day (0-600).]
The statistical analysis 3/5
Summary statistics
[Figure: two panels plotted against Day (0-600): "Titles Mean" (mean e.e.v.) and "Titles Standard Deviation" (std. dev. of e.e.v.).]
The statistical analysis 4/5
PCA results: The first two components explain respectively 88.9% and 7.1% of the functional variability.
[Figure: two panels plotted against Day (0-600), with curves drawn using "+" and "-" symbols: PCA function 1 (percentage of variability 88.9) and PCA function 2 (percentage of variability 7.1).]
The statistical analysis 5/5
PC scores on the first two harmonics.
[Figure: scatter plot of Scores on Harmonic 2 against Scores on Harmonic 1. Labelled points: Alleanza, Autostrade, B.Fideraum, MontePaschi, B.N.L, Enel, Eni, Fiat, Finmecc., Generali, Mediaset, MedioB., Mediolanum, Olivetti, Pirelli, Ras, SanPaolo, SeatP.Gialle, Telecom, Tim, Unicred.It., T1, T2, T3, T4, T5, T6, T7, T8, T9.]
1. companies with large positive (negative) values on the first harmonic present a larger (smaller) value than the mean during the entire period considered;
2. companies with large positive (negative) values on the second harmonic show a large decrement (increment) after September 11th, 2001 (Day=431).
FPCA on standardized data 1/2
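Further insight comes from the PC analysis of the daily standardized raw functional data:
$$ z_{ij} = \frac{x_{ij} - \bar{x}_i}{s_i} \qquad (i = 1, \dots, 758,\; j = 1, \dots, 30) $$
where $\bar{x}_i$ and $s_i$ are the mean and the standard deviation of the 30 shares on day $i$.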
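The daily standardization can be sketched as follows. The array layout (days in rows, shares in columns), the function name `standardize_by_day` and the use of the sample standard deviation (`ddof=1`) are assumptions, since the slides do not state which divisor is used; the data are synthetic and only mimic the 758 × 30 shape.

```python
import numpy as np

def standardize_by_day(X):
    """z_ij = (x_ij - mean_i) / s_i, with mean_i and s_i taken over the shares of day i.

    X : (days, shares) array of raw values.
    """
    X = np.asarray(X, dtype=float)
    mean_i = X.mean(axis=1, keepdims=True)
    s_i = X.std(axis=1, ddof=1, keepdims=True)   # sample std. dev. (assumed divisor)
    return (X - mean_i) / s_i

# Synthetic stand-in for the 758 x 30 matrix of daily e.e.v. (not the real data).
rng = np.random.default_rng(3)
X = rng.lognormal(mean=18.0, sigma=1.0, size=(758, 30))
Z = standardize_by_day(X)
print(Z.mean(axis=1)[:3], Z.std(axis=1, ddof=1)[:3])
```

After this step every day has mean zero and unit standard deviation across the basket, so the curves describe the position of each share relative to the others rather than its absolute level.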
[Figure: two panels plotted against Day (0-600), with curves drawn using "+" and "-" symbols: PCA function 1 (percentage of variability 89.4) and PCA function 2 (percentage of variability 6.9).]
The second PC highlights the shock of September 11th, 2001 (Day=431), so it can be considered as the shock component.
What about the 3rd and 4th Functional Curves?
Just have a look:
[Figure: two panels plotted against Day (0-600), with curves drawn using "+" and "-" symbols: PCA function 3 (percentage of variability 2.5) and PCA function 4 (percentage of variability 0.9).]
They account for small proportions of the functional variability, but they show other modes of variation in the curves dataset.
FPCA on standardized data 2/2
PC scores on the first two harmonics.
[Figure: scatter plot of Scores on Harmonic 2 against Scores on Harmonic 1. Labelled points: Autostrade, Enel, Eni, Generali, Mediaset, Olivetti, SanPaolo, SeatP.Gialle, Telecom, Tim, Unicred.It., T3, T4, T9.]
Back to raw data: Analysis of the 1st PC
Consider the companies with the largest minimum standardized values over the three years:

min_{i=1,...,758} zij   Company              Score
 0.3294                 Eni                  w1 > 40
 0.0579                 Telecom              w1 > 40
 0.0566                 Tim                  w1 > 40
-0.2197                 Enel                 10 < w1 < 40
-0.2770                 Generali             10 < w1 < 40
-0.5360                 Olivetti             10 < w1 < 40
-0.5554                 Unicredito           10 < w1 < 40
-0.6404                 T4                   near 0
-0.6552                 Mediaset             near 0
-0.7511                 Seat Pagine Gialle   near 0

The correlation between the scores on the 1st PC and these minimum values is r(w1i, zij) = 0.96.
Analysis of the 2nd PC
Let $\bar{x}_{Bi}$ be the mean value of the e.e.v. of the $i$-th company over the days 1, ..., 431 (i.e. before September 11th, 2001) and $\bar{x}_{Ai}$ the corresponding mean value after September 11th, 2001. Let us consider the percentage variation:
$$ \delta_i = \frac{\bar{x}_{Ai} - \bar{x}_{Bi}}{\bar{x}_{Bi}} \cdot 100\% $$

δi         Company              Score
-80.20%    Seat Pagine Gialle   w2 > 10
-58.50%    Olivetti             w2 > 10
-47.08%    Enel                 0 < w2 < 10
 63.21%    Unicredito           −20 < w2 < −10
 83.10%    Autostrade           −10 < w2 < 0
133.79%    T9                   w2 < −20

The companies with large positive (negative) scores on the 2nd PC present the largest decrements (increments) after September 11th, 2001.
The correlation r(w2i, zij) = 0.84
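The percentage variation used in the analysis of the 2nd PC is simple to compute from a company-by-day matrix. The sketch below assumes companies in rows and the 758 days in columns, with days 1 to 431 counted as "before"; the function name `percent_variation` and the two made-up series are hypothetical and are not taken from the MIB30 data.

```python
import numpy as np

def percent_variation(X, split_day=431):
    """delta_i = 100 * (mean_after_i - mean_before_i) / mean_before_i.

    X : (companies, days) array; days 1..split_day count as 'before'.
    """
    X = np.asarray(X, dtype=float)
    before = X[:, :split_day].mean(axis=1)
    after = X[:, split_day:].mean(axis=1)
    return 100.0 * (after - before) / before

# Two made-up companies over 758 days: one drops by 80%, one rises by 50%.
X = np.empty((2, 758))
X[0, :431], X[0, 431:] = 10.0, 2.0
X[1, :431], X[1, 431:] = 4.0, 6.0
delta = percent_variation(X)
print(delta)
```

A negative `delta` marks a decrement after the split day, which in the table above goes with positive scores on the 2nd PC.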
Conclusions and perspectives for future research
1. Functional PCA looks like an interesting tool for gaining insight into functional datasets.
2. Does it open methodological perspectives for the construction of new financial indices?
Some existing stock market indices have been criticized (e.g. Elton and Gruber, 1973, 1995): the famous U.S. Dow Jones presents some statistical flaws.
In Italy the MIB30 basket is summarized by the MIB30 index
=⇒ analysis of the MIB30 index within the FDA framework: 'how' and how much is it statistically representative of the basket?
[Figure: MIB30 Index plotted against Year (2000.0 to 2003.0).]
In fact, MIB30 is calculated according to the formula:
$$ \mathrm{MIB30} = 10000 \sum_{i=1}^{30} \frac{p_{it}}{p_{i0}}\, w_{iT}\, r_T \qquad (3) $$
where the weight of the $i$-th share in the basket (i.e. the weight of each company in the index) is:
$$ w_{iT} = \frac{p_{i0}\, q_{i0}}{\sum_{i=1}^{30} p_{i0}\, q_{i0}} $$
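Formula (3) and the weights can be traced with a small numerical example. This is a sketch under stated assumptions: $p_{i0}$ and $p_{it}$ are read as base and current prices and $q_{i0}$ as base quantities, the factor $r_T$ is not defined in the slides and is therefore passed in as a parameter (defaulting to 1), and the three-share basket is made up. The function names `basket_weights` and `mib30_index` are hypothetical.

```python
def basket_weights(p0, q0):
    """w_iT = p_i0 * q_i0 / sum_i p_i0 * q_i0."""
    caps = [p * q for p, q in zip(p0, q0)]
    total = sum(caps)
    return [c / total for c in caps]

def mib30_index(pt, p0, q0, r_T=1.0):
    """Index value from formula (3); r_T is a parameter because the slides do not define it."""
    w = basket_weights(p0, q0)
    return 10000.0 * r_T * sum(w_i * p_t / p_0 for w_i, p_t, p_0 in zip(w, pt, p0))

# Made-up three-share basket (the real one has 30 shares).
p0 = [10.0, 20.0, 5.0]       # base prices p_i0
q0 = [100.0, 50.0, 400.0]    # base quantities q_i0 (interpretation assumed)
pt = [11.0, 19.0, 5.5]       # prices at time t

w = basket_weights(p0, q0)
print(w, mib30_index(pt, p0, q0))
```

The weights sum to one, so with unchanged prices and `r_T = 1` the index equals its base value of 10000; each share then moves the index in proportion to its weight and its price relative $p_{it}/p_{i0}$.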
REFERENCES
- Costanzo, G. D. (2003), A graphical analysis of the dynamics of
the MIB30 index in the period 2000-2002 by a functional data
approach. SIS 2003, Napoli.
- Elton, E. J. and Gruber, M. J. (1973), Estimating the dependence
structures of share prices. Implications for Portfolio. Journal of
Finance, 1203-1232.
- Elton, E. J. and Gruber, M. J. (1995), Modern Portfolio Theory and
Investment Analysis. John Wiley and Sons, New York.
- Ingrassia, S. and Costanzo, G. D. (2004), Functional principal
component analysis of financial time series, New Developments
in Classification and Data Analysis, Springer-Verlag, Berlin.
- Preda, C. and Saporta, G. (2005), PLS discriminant analysis for
functional data. XI ASMDA Symposium, Brest, May 2005.
- Ramsay, J. O. (1982), When the data are functions, Psychometrika,
47.
- Ramsay, J. O. and Silverman, B. W. (1997), Functional Data
Analysis, Springer-Verlag, New York.
- Ramsay, J. O. and Silverman, B. W. (2002), Applied Functional
Data Analysis, Springer-Verlag, New York.
- Saporta, G. (1985), Data Analysis For Numerical and Categorical
Individual Time Series, Applied Stochastic Models and Data
Analysis, 1.
- Simonoff, J. S. (1996), Smoothing Methods in Statistics,
Springer-Verlag, New York.