Functional Principal Component Analysis of Financial Time...

Functional Principal Component Analysis of Financial Time Series

G. Damiana Costanzo

Dipartimento di Economia e Statistica, Università della Calabria

87036 Arcavacata di Rende (CS), [email protected]

Cnam - Paris, November 23rd 2005

Summary

1. Introduction

2. Functional data vs. Multidimensional data modeling

3. Functional PCA

4. The e.e.v. MIB30 dataset

5. The statistical analysis

6. Conclusions and perspectives

Overview

The problem (methodological perspective):

• Dimension reduction of a functional data set with homogeneous piecewise components

The datasets:

• Daily quantities (prices and e.e.v.) of the shares constituting the MIB30 basket in the period January 3rd, 2000 - December 30th, 2002 (courtesy of the Research & Development DBMS, Borsa Italiana).

The statistical method:

• Functional principal component analysis

Why Functional Data

Functional data analysis (FDA) is a generalization of classical multivariate analysis (MVA) to the case where the data are functions, curves or trajectories.

Such data arise quite naturally in different fields, for example in phenomena where measurements come from an automated on-line collection process (on-line sensing and monitoring equipment): in economic analysis, statistical quality control of manufacturing processes, shape analysis and the natural sciences (seismology, meteorology, physiology and medicine; see the recent paper on gait data by Preda & Saporta, 2005).

MD vs. FD modeling

Given a set of $n$ units $\omega_1, \ldots, \omega_n$:

               Multidimensional data            Functional data
Data set       $X$: set of points in $R^p$      $X_T$: set of functions on $T$
               $x_1, x_2, \ldots, x_n$          $x_1(t), x_2(t), \ldots, x_n(t)$
# variables    $p < \infty$                     $p = \infty$
Vector space   Euclidean space $R^p$            Hilbert space $H$

See e.g. Ramsay (1982), Saporta (1985).

Rationale

Observed data functions must be thought of as single entities rather than as a sequence of individual observations: the term functional refers to the intrinsic structure of the data rather than to their explicit form. In fact, from a practical point of view, functional data are usually observed and recorded discretely.

Let $\{\omega_1, \ldots, \omega_n\}$ be a set of $n$ units and let $y_i = (y_i(t_1), \ldots, y_i(t_p))$ be a sample of measurements of a variable $Y$ taken at $p$ times $t_1, \ldots, t_p \in T = [a, b]$ on the $i$-th unit $\omega_i$ $(i = 1, \ldots, n)$. Such data $y_i$ $(i = 1, \ldots, n)$ are regarded as functional, so they are called raw functional data.

What is new then ?

Owing to the functional nature of the data, it is assumed that there is a true function underlying the (discretely) observed data.

=⇒

The first step is to convert the raw functional data into a suitable functional form: a smooth function $x_i(t)$ is assumed to lie behind $y_i$ and is referred to as the true functional form. This implies, in principle, that we can evaluate $x$ at any point $t \in T$ and, in addition, that we can evaluate any derivative $x^{(m)}(t)$ that exists at $t$, up to some order $m$. Finally, the set $X_T = \{x_1(t), \ldots, x_n(t)\}_{t \in T}$ is called the functional dataset.

A Remark

Though functional data analysis often deals with temporal data, its scope and objectives are quite different from those of time series analysis. While the latter focuses mainly on modelling data or on predicting future observations, the techniques of FDA are essentially exploratory in nature: the emphasis is on trajectories and shapes.

Moreover, by adopting a functional approach:

a) unequally spaced observations, possibly with missing values, can be considered;

b) in some cases a full description of the data involves the study of certain derivatives (e.g. velocity and acceleration).

Estimation Strategy 1/3

The definition of the FDA of a set of data which are functional, either in fact or in principle, usually involves the following tasks:

1. choice of the function space in which the analysis has to take place;

2. specification of the analysis in functional analytic terms;

3. determination of how a finite-dimensional observation vector has to be mapped into function space;

4. description of what the FDA of the functional representations of the finite observations means in terms of analysing the observations themselves.


We distinguish two cases, depending on whether the data $y_i$ are assumed to be errorless or not. In the first case the function $x_i(t)$ should satisfy the constraints:

$x_i(t_j) = y_i(t_j), \quad j = 1, \ldots, p.$  (1)

When observational errors are assumed to be present in the raw data, the conversion from $y_i$ to the function $x_i(t)$ may involve a smoothing procedure, and in modelling terms we write:

$y_i(t_j) = x_i(t_j) + \epsilon_j, \quad j = 1, \ldots, p$  (2)

where the error term $\epsilon_j$ contributes roughness to the raw data. The standard assumption requires that the $\epsilon_j$'s are i.i.d., with zero mean and common finite variance.


A number of strategies, based on different approaches, can be considered to convert raw functional data into the true functional form; see e.g. Simonoff (1996). Here we consider the roughness penalty, or regularization, approach based on spline smoothing.

This method estimates $x$ from observations of the form (2) by making explicit two aims in curve estimation:

a) we wish to ensure that the estimated curve gives a good fit to the data, for example in terms of the residual sum of squares $\sum_j [y_j - x(t_j)]^2$;

b) we do not want the fit to be too good if this results in a curve $x$ that is excessively irregular (for example, smoothing lets us gain information about the derivatives of the true function).
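The trade-off between aims a) and b) can be illustrated with a minimal sketch. It is not the spline smoother used in the talk: it assumes equally spaced observations and replaces the spline roughness penalty with a penalty on squared second differences, which is a discrete analogue. The function names, the parameter `lam` and the synthetic data are illustrative assumptions.

```python
import numpy as np


def penalized_smooth(y, lam):
    """Minimise sum_j (y_j - x_j)^2 + lam * sum_j (second difference of x)_j^2.

    Assumes equally spaced observation times. lam = 0 reproduces the data
    (perfect fit); larger lam gives a smoother, less faithful curve.
    """
    y = np.asarray(y, dtype=float)
    p = y.size
    # (p-2) x p second-difference matrix: each row is [..., 1, -2, 1, ...]
    D = np.diff(np.eye(p), n=2, axis=0)
    # Normal equations of the penalised least-squares problem
    return np.linalg.solve(np.eye(p) + lam * D.T @ D, y)


def roughness(x):
    """Discrete roughness: sum of squared second differences."""
    return float(np.sum(np.diff(x, n=2) ** 2))


# Synthetic noisy observations of a smooth curve (illustrative data only)
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 200)
y = np.sin(2 * np.pi * t) + rng.normal(scale=0.3, size=t.size)

x_interp = penalized_smooth(y, lam=0.0)    # aim a) only: interpolates the data
x_smooth = penalized_smooth(y, lam=50.0)   # compromise between a) and b)

print("roughness of raw data     :", roughness(y))
print("roughness of smoothed data:", roughness(x_smooth))
```

The single parameter `lam` plays the role of the smoothing parameter: it sets the price paid for irregularity relative to the residual sum of squares.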

Functional PCA 1/2

The objective of principal component analysis of functional data is the orthogonal decomposition of the variance (kernel) function:

$v(t, u) := \frac{1}{n} \sum_{i=1}^{n} \{x_i(t) - \bar{x}(t)\}\{x_i(u) - \bar{x}(u)\}$

(which is the counterpart of the covariance matrix of a multidimensional dataset) in order to isolate the dominant components of functional variation; see e.g. Ramsay & Silverman (1997, 2002), James et al. (2000).

In the functional space $H$, the role of the covariance matrix is played by the covariance operator $V$ defined by:

$V\xi := \int_T v(\cdot, t)\,\xi(t)\,dt$ for any function $\xi \in H$.

In analogy with the multivariate case, the functional PCA problem leads to the eigenequation:

$V\xi = \lambda\xi$

where now $\xi$ is an eigenfunction, rather than an eigenvector, and $\lambda$ is the eigenvalue.
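One common way to solve the eigenequation numerically is to discretize it on a grid. The following worked step is a sketch under an assumption not stated in the slides: an equally spaced grid $t_1 < \ldots < t_p$ on $T$ with spacing $h$ and a simple rectangle quadrature rule. The symbols $\mathbf{V}$, $\mathbf{u}$ and $\mu$ are introduced here for illustration only.

```latex
% Assumption: equally spaced grid t_1 < ... < t_p on T with spacing h.
\begin{align*}
(V\xi)(t_j) = \int_T v(t_j, t)\,\xi(t)\,dt
  &\approx h \sum_{k=1}^{p} v(t_j, t_k)\,\xi(t_k), \\
\text{so } V\xi = \lambda\xi \text{ becomes}\quad
  h\,\mathbf{V}\boldsymbol{\xi} &= \lambda\,\boldsymbol{\xi},
  \qquad \mathbf{V} = [\,v(t_j, t_k)\,]_{j,k=1}^{p}. \\
\text{If } \mathbf{V}\mathbf{u} = \mu\,\mathbf{u} \text{ with } \|\mathbf{u}\| = 1:\quad
  \lambda &= h\,\mu, \qquad \xi(t_k) = u_k / \sqrt{h}.
\end{align*}
% The rescaling by 1/sqrt(h) gives h * sum_k xi(t_k)^2 = 1,
% the grid version of the unit-norm condition on xi.
```

Under this approximation the functional problem reduces to an ordinary symmetric eigenproblem for the $p \times p$ matrix $\mathbf{V}$.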

Functional PCA 2/2

Functional PCA is characterized by the decomposition of the variance function:

$v(t, u) = \sum_j \lambda_j\, \xi_j(t)\, \xi_j(u)$

where the eigenvalues:

$\lambda_j := \int_T \int_T \xi_j(t)\, v(t, u)\, \xi_j(u)\, dt\, du$

are non-negative and arranged in non-increasing order, while the eigenfunctions must satisfy the constraints:

$\int_T \xi_j^2(t)\,dt = 1$ and $\int_T \xi_j(t)\,\xi_i(t)\,dt = 0$ $(i < j)$.

The $\xi_j$'s are usually called principal component weight functions.

Finally, the principal component scores (of $\xi(t)$) of the units in the dataset are the values $w_i$ given by:

$w_i := \int_T \xi(t)\, x_i(t)\, dt.$
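The quantities on this slide (eigenvalues, weight functions, scores) can be computed on a grid with a few lines of NumPy. This is a minimal sketch, not the basis-expansion implementation used for the MIB30 analysis. It assumes curves sampled on a common equally spaced grid, approximates integrals by a rectangle rule with spacing `h`, and centres the curves before computing scores. The function name `fpca` and the synthetic curves are hypothetical.

```python
import numpy as np


def fpca(X, t):
    """Grid-based functional PCA (rectangle-rule approximation).

    X : (n, p) array, row i holds x_i(t_1), ..., x_i(t_p)
    t : (p,) equally spaced grid on T
    Returns eigenvalues (non-increasing), eigenfunctions (one per row)
    and the matrix of scores (n x p).
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    h = t[1] - t[0]                    # quadrature weight
    Xc = X - X.mean(axis=0)            # x_i(t) - xbar(t)
    V = Xc.T @ Xc / n                  # v(t_j, t_k) evaluated on the grid
    mu, U = np.linalg.eigh(V)          # symmetric eigenproblem, ascending order
    order = np.argsort(mu)[::-1]       # reorder to non-increasing eigenvalues
    lam = h * mu[order]                # eigenvalues of the integral operator
    xi = U[:, order].T / np.sqrt(h)    # rescale so that h * sum(xi_j^2) = 1
    scores = h * Xc @ xi.T             # w_ij ~ integral of xi_j(t) * centred x_i(t)
    return lam, xi, scores


# Synthetic curves: random level plus random sinusoidal shape (illustrative only)
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 101)
h = t[1] - t[0]
n = 30
level = rng.normal(scale=3.0, size=(n, 1))
shape = rng.normal(scale=1.0, size=(n, 1))
X = level + shape * np.sin(2 * np.pi * t) + rng.normal(scale=0.05, size=(n, t.size))

lam, xi, scores = fpca(X, t)
explained = lam / lam.sum()            # proportion of functional variability
print("first two proportions of variability:", explained[:2])
```

With this normalization the mean squared score on component j equals `lam[j]`, which is why the ratios `lam / lam.sum()` can be read as percentages of functional variability.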

The MIB30 dataset 1/2

The data considered here consist of the total exchanged equivalent value (e.e.v.) of the 30 shares composing the MIB30 index in the period January 3rd, 2000 - December 30th, 2002; see e.g. Costanzo (2003). The data matrix is 30 × 758 (note that $p \gg n$).

An important characteristic of this basket is that it is "open", since its composition is normally updated twice a year, in the months of March and September (ordinary revisions). However, in response to extraordinary events, or for technical reasons, ordinary revisions may be brought forward or postponed with respect to the scheduled date; see www.borsaitalia.it for details.

In our dataset there are 21 companies which remain in the basket for the whole three years and 23 companies sharing the other 9 places in the basket (since they remain in the basket only for one or more short periods): these 9 places have been denoted by $T_1, \ldots, T_9$. Such mixed trajectories will be called here homogeneous piecewise components of the functional data set.

The MIB30 dataset 2/2

Example of homogeneous piecewise components T1, T2, T3.

Date        T1                 T2                          T3
03/01/2000  AEM                Banca Commerciale Italiana  Banca di Roma
04/04/2000  AEM                Banca Commerciale Italiana  Banca di Roma
18/09/2000  AEM                Banca Commerciale Italiana  Banca di Roma
02/01/2001  AEM                Banca Commerciale Italiana  Banca di Roma
19/03/2001  AEM                Italgas                     Banca di Roma
02/05/2001  AEM                Italgas                     Banca di Roma
24/08/2001  AEM                Italgas                     Banca di Roma
24/09/2001  AEM                Italgas                     Banca di Roma
18/03/2002  Snam Rete Gas      Italgas                     Banca di Roma
01/07/2002  AEM                Italgas                     Capitalia
15/07/2002  AEM                Italgas                     Capitalia
23/09/2002  Banca Antonveneta  Italgas                     Capitalia
04/12/2002  Banca Antonveneta  Italgas                     Capitalia

The statistical analysis 1/5

(see Ingrassia and Costanzo, 2004)

Examples of two trajectories: we set up a B-spline basis with 150 knots (approximately one knot for each week) and order 6, and make the functional data.

[Figure: two panels, Enel and Eni; x-axis: Day, y-axis: e.e.v.]
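The basis set-up described on this slide can be sketched with SciPy. This is a simplified stand-in under several assumptions: the 150 knots are read as 150 equally spaced interior knots, the fit is an unpenalized least-squares spline (`scipy.interpolate.make_lsq_spline`) rather than a roughness-penalty fit, and the daily series is synthetic because the e.e.v. data are not reproduced here.

```python
import numpy as np
from scipy.interpolate import make_lsq_spline

k = 5                                    # degree 5, i.e. B-splines of order 6
days = np.arange(758, dtype=float)       # 758 trading days, as in the data matrix

# Synthetic stand-in for one e.e.v. trajectory (illustrative data only)
rng = np.random.default_rng(1)
y = 1.0 + 0.5 * np.sin(days / 60.0) + rng.normal(scale=0.1, size=days.size)

# 150 equally spaced interior knots (assumption), roughly one per trading week
interior = np.linspace(days[0], days[-1], 152)[1:-1]
# Repeat each boundary knot k+1 times so the basis is defined on the whole range
knots = np.concatenate(([days[0]] * (k + 1), interior, [days[-1]] * (k + 1)))

spline = make_lsq_spline(days, y, knots, k=k)   # least-squares B-spline fit
x_hat = spline(days)                            # functional form evaluated on the grid
velocity = spline.derivative(1)(days)           # first derivative of the fitted curve

print("number of basis functions:", spline.c.size)
```

Order 6 leaves the fitted curve with several continuous derivatives, which matters when derivatives such as velocity and acceleration are of interest.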


The entire functional dataset:

[Figure: all e.e.v. trajectories; x-axis: Day, y-axis: e.e.v.]


Summary statistics

[Figure: two panels, "Titles Mean" (y-axis: Mean e.e.v.) and "Titles Standard Deviation" (y-axis: Std. Dev. e.e.v.); x-axis: Day.]


PCA results: the first two components explain respectively 88.9% and 7.1% of the functional variability.

[Figure: two panels over Day 0-600, each showing curves marked "+" and "-"; first panel title: PCA function 1 (Percentage of variability 88.9).]

PC scores on the first two harmonics.

[Figure: scatter plot; x-axis: Scores on Harmonic 1, y-axis: Scores on Harmonic 2. Labelled points: Alleanza, Autostrade, B.Fideraum, MontePaschi, B.N.L, Enel, Eni, Fiat, Finmecc., Generali, Mediaset, MedioB., Mediolanum, Olivetti, Pirelli, Ras, SanPaolo, SeatP.Gialle, Telecom, Tim, Unicred.It., T1, T2, T3, T4, T5, T6, T7, T8, T9.]

1. companies with large positive (negative) values on the first harmonic present a larger (smaller) value than the mean during the entire period considered;

2. companies with large positive (negative) values on the second harmonic show a large decrement (increment) after September 11th, 2001 (Day = 431).

FPCA on standardized data 1/2

An insightful understanding comes from the PC analysis of the daily standardized raw functional data:

$z_{ij} = \frac{x_{ij} - \bar{x}_i}{s_i} \quad (i = 1, \ldots, 758,\ j = 1, \ldots, 30)$
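The daily standardization can be sketched as follows. Given the index ranges, the sketch reads the mean and standard deviation for day i as being taken over the 30 shares on that day. Whether the standard deviation uses divisor n or n - 1 is not stated in the slides, so the population form (`statistics.pstdev`) is an assumption. The function name and the toy numbers are hypothetical.

```python
from statistics import mean, pstdev


def standardize_by_day(x):
    """Return z[i][j] = (x[i][j] - mean_i) / sd_i.

    x[i][j] is the value for day i and company j. The mean and standard
    deviation are computed across companies within each day. The population
    standard deviation is an assumption. A day on which all companies have
    the same value would give sd = 0 and is not handled here.
    """
    z = []
    for day in x:
        m = mean(day)
        s = pstdev(day)
        z.append([(value - m) / s for value in day])
    return z


# Toy example: 2 days, 3 companies (illustrative numbers only)
x = [[10.0, 20.0, 30.0],
     [1.0, 1.5, 5.0]]
z = standardize_by_day(x)
for row in z:
    print([round(v, 3) for v in row])
```

After this transformation every day has mean 0 and unit standard deviation across companies, so the curves describe each company's position relative to the basket rather than its absolute level.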

[Figure: two panels over Day 0-600, each showing curves marked "+" and "-".]

The second PC highlights the shock of September 11th, 2001 (Day = 431), so it can be considered as the shock component.

What about the 3rd and 4th functional curves? Just have a look:

[Figure: two panels over Day 0-600 (y-axis from -0.3 to 0.3), each showing curves marked "+" and "-".]

They account for small proportions of the functional variability, but they show other modes of variation in the curves dataset.

FPCA on standardized data 2/2

PC scores on the first two harmonics.

[Figure: scatter plot; x-axis: Scores on Harmonic 1, y-axis: Scores on Harmonic 2. Labelled points: Autostrade, Enel, Eni, Generali, Mediaset, Olivetti, SanPaolo, SeatP.Gialle, Telecom, Tim, Unicred.It., T3, T4, T9.]

Back to raw data: analysis of the 1st PC

Consider the companies with the largest minimum standardized values over the three years:

$\min_{i=1,\ldots,758} z_{ij}$   Company              Score
 0.3294                          Eni                  $w_1 > 40$
 0.0579                          Telecom              $w_1 > 40$
 0.0566                          Tim                  $w_1 > 40$
-0.2197                          Enel                 $10 < w_1 < 40$
-0.2770                          Generali             $10 < w_1 < 40$
-0.5360                          Olivetti             $10 < w_1 < 40$
-0.5554                          Unicredito           $10 < w_1 < 40$
-0.6404                          T4                   near 0
-0.6552                          Mediaset             near 0
-0.7511                          Seat Pagine Gialle   near 0

The correlation is $r(w_{1i}, z_{ij}) = 0.96$.

Analysis of the 2nd PC

Let $\bar{x}^B_i$ be the mean value of the e.e.v. of the $i$-th company over the days 1, ..., 431 (i.e. before September 11th, 2001) and $\bar{x}^A_i$ the corresponding mean value after September 11th, 2001. Let us consider the percentage variation:

$\delta_i = \frac{\bar{x}^A_i - \bar{x}^B_i}{\bar{x}^B_i} \cdot 100\%$

$\delta_i$    Company              Score
-80.20%       Seat Pagine Gialle   $w_2 > 10$
-58.50%       Olivetti             $w_2 > 10$
-47.08%       Enel                 $0 < w_2 < 10$
 63.21%       Unicredito           $-20 < w_2 < -10$
 83.10%       Autostrade           $-10 < w_2 < 0$
133.79%       T9                   $w_2 < -20$

The companies with large positive (negative) scores on the 2nd PC present the largest decrements (increments) after September 11th, 2001. The correlation is $r(w_{2i}, z_{ij}) = 0.84$.
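The percentage variation defined on this slide is simple to compute from a daily series. The sketch below is illustrative only: the function name `percent_variation` and the toy series are hypothetical, and the split point of 431 days follows the slide (days 1 to 431 count as "before").

```python
def percent_variation(series, split=431):
    """Percentage change of the mean after the split relative to the mean before.

    series : daily e.e.v. values for one company
    split  : number of days in the "before" period (431 on the slide)
    """
    before = series[:split]
    after = series[split:]
    mean_before = sum(before) / len(before)
    mean_after = sum(after) / len(after)
    return 100.0 * (mean_after - mean_before) / mean_before


# Toy series over 758 days whose level halves after day 431 (illustrative only)
toy = [100.0] * 431 + [50.0] * 327
delta = percent_variation(toy)
print(f"delta = {delta:.2f}%")
```

A negative value indicates that the average e.e.v. fell after the split date, which on the slide corresponds to positive scores on the second PC.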

Conclusions and perspectives for future research

1. Functional PCA looks like an interesting tool for gaining insight into functional datasets.

2. Does it open methodological perspectives for the construction of new financial indices?

Some existing stock market indices have been criticized (e.g. Elton and Gruber, 1973, 1995): the famous U.S. Dow Jones presents some statistical flaws. In Italy the MIB30 basket is summarized by the MIB30 index

=⇒

analysis of the MIB30 index within the FDA framework: how, and how much, is it statistically representative of the basket?

[Figure: MIB30 index over time; x-axis: Year (2000.0 to 2003.0), y-axis: MIB30 Index.]

In fact, the MIB30 is calculated according to the formula:

$\mathrm{MIB30} = 10000 \sum_{i=1}^{30} \frac{p_{it}}{p_{i0}}\, w_{iT}\, r_T$  (3)

where the weight of the $i$-th share in the basket (i.e. the weight of each company in the index) is:

$w_{iT} = \frac{p_{i0}\, q_{i0}}{\sum_{i=1}^{30} p_{i0}\, q_{i0}}$
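Formula (3) and the weights can be evaluated directly, as in the sketch below. The slides do not define $q_{i0}$ and $r_T$; here they are treated as a base-date quantity for each share and a scalar adjustment factor, which is an assumption. The function names and the three-share toy basket are hypothetical (the real basket has 30 shares).

```python
def basket_weights(p0, q0):
    """w_iT = p_i0 * q_i0 / sum_i (p_i0 * q_i0); the weights sum to 1."""
    values = [p * q for p, q in zip(p0, q0)]
    total = sum(values)
    return [v / total for v in values]


def mib30_index(pt, p0, q0, r_T=1.0, base=10000.0):
    """Evaluate formula (3): base * sum_i (p_it / p_i0) * w_iT * r_T.

    pt  : current prices p_it
    p0  : base prices p_i0
    q0  : base quantities q_i0 (interpretation assumed, not given in the slides)
    r_T : scalar adjustment factor (interpretation assumed)
    """
    weights = basket_weights(p0, q0)
    relatives = [p_t / p_0 for p_t, p_0 in zip(pt, p0)]
    return base * r_T * sum(rel * w for rel, w in zip(relatives, weights))


# Toy basket with 3 shares instead of 30 (illustrative numbers only)
p0 = [10.0, 20.0, 5.0]
q0 = [100.0, 50.0, 400.0]
pt = [11.0, 20.0, 4.0]

weights = basket_weights(p0, q0)
index_now = mib30_index(pt, p0, q0)
print("weights:", weights)
print("index  :", index_now)
```

Because the weights are fixed at the base date, the index is a weighted average of price relatives: it equals the base value when all prices are unchanged and scales proportionally if all prices move by the same factor.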

REFERENCES

- Costanzo, G. D. (2003), A graphical analysis of the dynamics of the MIB30 index in the period 2000-2002 by a functional data approach. SIS 2003, Napoli.

- Elton, E. J. and Gruber, M. J. (1973), Estimating the dependence structures of share prices. Implications for Portfolio. Journal of Finance, 1203-1232.

- Elton, E. J. and Gruber, M. J. (1995), Modern Portfolio Theory and Investment Analysis. John Wiley and Sons, New York.

- Ingrassia, S. and Costanzo, G. D. (2004), Functional principal component analysis of financial time series, New Developments in Classification and Data Analysis, Springer-Verlag, Berlin.

- Preda, C. and Saporta, G. (2005), PLS discriminant analysis for functional data. XI ASMDA Symposium, Brest, May 2005.

- Ramsay, J. O. (1982), When the data are functions, Psychometrika, 47.

- Ramsay, J. O. and Silverman, B. W. (1997), Functional Data Analysis, Springer-Verlag, New York.

- Ramsay, J. O. and Silverman, B. W. (2002), Applied Functional Data Analysis, Springer-Verlag, New York.

- Saporta, G. (1985), Data Analysis For Numerical and Categorical Individual Time Series, Applied Stochastic Models and Data Analysis, 1.

- Simonoff, J. S. (1996), Smoothing Methods in Statistics, Springer-Verlag, New York.
