Hierarchically nested factor models


Transcript of Hierarchically nested factor models

Page 1: Hierarchically nested factor models

Hierarchically nested factor models
Michele Tumminello, Fabrizio Lillo, R. N. Mantegna

University of Palermo (Italy)

Rome, June 20, 2007

Observatory of Complex Systems

EPL 78 (2007) 30006

Page 2: Hierarchically nested factor models

Motivation

• In many systems the dynamics of N elements is monitored by sampling each time series with T observations.

• One way of quantifying the interaction between elements is the correlation matrix.

• Since there are TN observations and the number of parameters in the correlation matrix is O(N²), the estimated correlation matrix unavoidably suffers from statistical uncertainty due to the finiteness of the sample.

Page 3: Hierarchically nested factor models

Questions

• How can one process (filter) the correlation matrix in order to get a statistically reliable correlation matrix?

• How can one build a time series factor model which describes the dynamics of the system?

• How can one compare the characteristics of different correlation matrix filtering procedures?

Page 4: Hierarchically nested factor models

A real example

As an example we consider the time series of price returns of a set of stocks traded in a financial market:

r_i(t) = \ln P_i(t) - \ln P_i(t-1)

Similarity measure between stocks i and j = correlation coefficient ρ_ij:

\rho_{ij} = \frac{\langle r_i r_j \rangle - \langle r_i \rangle \langle r_j \rangle}{\sqrt{(\langle r_i^2 \rangle - \langle r_i \rangle^2)(\langle r_j^2 \rangle - \langle r_j \rangle^2)}}

Correlation matrix C = (ρ_ij)

Page 5: Hierarchically nested factor models

Factor models

Factor models are simple and widespread models of multivariate time series. A general multifactor model for N variables x_i(t) is

x_i(t) = \sum_{j=1}^{K} \gamma_{ij} f_j(t) + \epsilon_i(t)

γ_ij is a constant describing the weight of factor j in explaining the dynamics of the variable x_i(t). The number of factors is K and they are described by the time series f_j(t). ε_i(t) is a (Gaussian) zero mean noise with unit variance.
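As an illustration, here is a minimal simulation sketch of such a multifactor model; the number of factors K, the loading values gamma, and the NumPy-based implementation are illustrative choices, not part of the original slides:

```python
import numpy as np

rng = np.random.default_rng(0)

N, K, T = 50, 3, 1000                 # variables, factors, observations

# gamma[i, j]: weight of factor j in explaining variable x_i (illustrative values)
gamma = rng.uniform(-0.5, 0.5, size=(N, K))

f = rng.standard_normal((T, K))       # factor time series f_j(t)
eps = rng.standard_normal((T, N))     # Gaussian noise, zero mean, unit variance

# x_i(t) = sum_j gamma[i, j] * f_j(t) + eps_i(t)
x = f @ gamma.T + eps

# Sample correlation matrix of the generated series
print(np.round(np.corrcoef(x, rowvar=False)[:3, :3], 2))
```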

Page 6: Hierarchically nested factor models

Factor models: examples

Multifactor models have been introduced to model a set of asset prices, generalizing the CAPM:

x(t) = B f(t) + ε(t)

where now B is an (N×K) matrix and f(t) is a (K×1) vector of factors.

The factors can be selected either on a theoretical ground (e.g. interest rates for bonds, inflation, industrial production growth, oil price, etc.) or on a statistical ground (i.e. by applying factor analysis methods, etc.)

Examples of multifactor models are Arbitrage Pricing Theory (Ross 1976) and the Intertemporal CAPM (Merton 1973).

Page 7: Hierarchically nested factor models


Factor models and Principal Component Analysis (PCA)

A factor is associated to each relevant eigenvalue-eigenvector

x_i(t) = \sum_{h=1}^{K} \sqrt{\lambda_h}\, v_i^{(h)} f^{(h)}(t) + \sqrt{1 - \sum_{h=1}^{K} \lambda_h \left(v_i^{(h)}\right)^2}\; \epsilon_i(t)

K = number of relevant eigenvalues; λ_h = h-th eigenvalue; v_i^{(h)} = i-th component of the h-th eigenvector of C. The h-th term of the sum is the contribution of the h-th factor; the last term is the idiosyncratic term.

f^{(h)}(t) for h = 1, ..., K and ε_i(t) for i = 1, ..., N are i.i.d. random variables with mean 0 and variance 1.

How many eigenvalues should be included?
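A minimal sketch of the construction above: the K largest eigenvalue–eigenvector pairs of a sample correlation matrix define the loadings of a K-factor model. The toy data and the choice K = 2 are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: 20 series sharing a common component, plus noise
x = rng.standard_normal((1000, 20)) + 0.3 * rng.standard_normal((1000, 1))
C = np.corrcoef(x, rowvar=False)

eigval, eigvec = np.linalg.eigh(C)            # ascending order
order = np.argsort(eigval)[::-1]
eigval, eigvec = eigval[order], eigvec[:, order]

K = 2                                          # number of retained eigenvalues
lam = eigval[:K]                               # lambda_h
v = eigvec[:, :K]                              # v[:, h]: h-th eigenvector

# Loadings of the K-factor model: sqrt(lambda_h) * v_i^(h)
loadings = v * np.sqrt(lam)

# Idiosyncratic variances 1 - sum_h lambda_h (v_i^(h))^2 (non-negative by construction)
idio = 1.0 - np.sum(loadings**2, axis=1)

# Correlation matrix implied by the K-factor model
C_model = loadings @ loadings.T + np.diag(idio)
print(np.round(C_model[:4, :4], 2))
```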

Page 8: Hierarchically nested factor models


Random Matrix Theory

The idea is to compare the properties of an empirical correlation matrix C with the null hypothesis of a random matrix.

Q = T/N ≥ 1 fixed; T → ∞; N → ∞

Density of eigenvalues of a random matrix (Marchenko–Pastur):

\rho(\lambda) = \frac{Q}{2\pi\sigma^2} \frac{\sqrt{(\lambda_{max} - \lambda)(\lambda - \lambda_{min})}}{\lambda}, \qquad \lambda_{min} \le \lambda \le \lambda_{max}

\lambda_{max/min} = \sigma^2 \left(1 + 1/Q \pm 2\sqrt{1/Q}\right)

For correlation matrices σ² = 1.
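A small numerical sketch of these bounds, assuming Gaussian data and σ² = 1; it checks how many eigenvalues of a purely random correlation matrix exceed λ_max:

```python
import numpy as np

rng = np.random.default_rng(2)

N, T = 100, 500
Q = T / N

# Marchenko-Pastur bounds for a correlation matrix (sigma^2 = 1)
lam_max = (1 + np.sqrt(1 / Q)) ** 2
lam_min = (1 - np.sqrt(1 / Q)) ** 2

# Purely random data: almost all eigenvalues should fall inside [lam_min, lam_max]
x = rng.standard_normal((T, N))
eig = np.linalg.eigvalsh(np.corrcoef(x, rowvar=False))

print(f"lambda_min = {lam_min:.3f}, lambda_max = {lam_max:.3f}")
print("eigenvalues above lambda_max:", np.sum(eig > lam_max))
```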

Page 9: Hierarchically nested factor models


Random Matrix Theory

σ² = 1 − λ₁/N ≈ 0.85 (dotted line)

best fit: σ² = 0.74 (solid line)

L. Laloux et al., PRL 83, 1468 (1999)

N = 406 assets of the S&P 500 (1991–1996), Q = 3.22

Random Matrix Theory helps to select the relevant eigenvalues

Page 10: Hierarchically nested factor models


A simple (hierarchical) model

x_i(t) = \gamma_0 f^{(0)}(t) + \gamma_1 f^{(1)}(t) + \sqrt{1 - \gamma_0^2 - \gamma_1^2}\; \epsilon_i(t) \quad \text{for } i \le n_1

x_i(t) = \gamma_0 f^{(0)}(t) + \gamma_2 f^{(2)}(t) + \sqrt{1 - \gamma_0^2 - \gamma_2^2}\; \epsilon_i(t) \quad \text{for } n_1 < i \le N

The resulting correlation matrix C has a block structure: the n₁ × n₁ block has off-diagonal entries ρ₁ = γ₀² + γ₁², the n₂ × n₂ block (with n₂ = N − n₁) has off-diagonal entries ρ₂ = γ₀² + γ₂², the entries connecting the two blocks are ρ_M = γ₀², and the diagonal entries are 1.
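A minimal simulation sketch of this two-group model; the values of γ₀, γ₁, γ₂ and n₁ are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)

N, n1, T = 100, 40, 2000
n2 = N - n1
g0, g1, g2 = 0.4, 0.5, 0.6                   # gamma_0, gamma_1, gamma_2 (illustrative)

f0, f1, f2 = rng.standard_normal((3, T))     # common factor and the two group factors
eps = rng.standard_normal((T, N))

x = np.empty((T, N))
# First group: x_i = g0 f0 + g1 f1 + sqrt(1 - g0^2 - g1^2) eps_i
x[:, :n1] = (g0 * f0 + g1 * f1)[:, None] + np.sqrt(1 - g0**2 - g1**2) * eps[:, :n1]
# Second group: x_i = g0 f0 + g2 f2 + sqrt(1 - g0^2 - g2^2) eps_i
x[:, n1:] = (g0 * f0 + g2 * f2)[:, None] + np.sqrt(1 - g0**2 - g2**2) * eps[:, n1:]

# Model correlation matrix with the block structure described above
C = np.full((N, N), g0**2)
C[:n1, :n1] = g0**2 + g1**2
C[n1:, n1:] = g0**2 + g2**2
np.fill_diagonal(C, 1.0)

print("within group 1 (model / sample):", g0**2 + g1**2,
      round(np.corrcoef(x[:, 0], x[:, 1])[0, 1], 2))
```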

Page 11: Hierarchically nested factor models


Spectral Analysis

The spectrum of C has two large eigenvalues

\lambda_{1,2} = \frac{2 + (n_1 - 1)\rho_1 + (n_2 - 1)\rho_2 \pm \sqrt{q^2 + 4 n_1 n_2 \rho_M^2}}{2}, \qquad q = (n_1 - 1)\rho_1 - (n_2 - 1)\rho_2,

and two corresponding eigenvectors, constant within each block, whose components are determined by y = q / (2\sqrt{n_1 n_2}\,\rho_M).
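As a quick sanity check of the eigenvalue expression above, a short numerical sketch (the parameter values are illustrative):

```python
import numpy as np

n1, n2 = 40, 60
g0, g1, g2 = 0.4, 0.5, 0.6
rho1, rho2, rhoM = g0**2 + g1**2, g0**2 + g2**2, g0**2

# Block correlation matrix of the simple hierarchical model
N = n1 + n2
C = np.full((N, N), rhoM)
C[:n1, :n1] = rho1
C[n1:, n1:] = rho2
np.fill_diagonal(C, 1.0)

# Two large eigenvalues from the reduced 2x2 problem on block-constant vectors
trace = 2 + (n1 - 1) * rho1 + (n2 - 1) * rho2
q = (n1 - 1) * rho1 - (n2 - 1) * rho2
lam = 0.5 * (trace + np.array([1, -1]) * np.sqrt(q**2 + 4 * n1 * n2 * rhoM**2))

eig = np.sort(np.linalg.eigvalsh(C))[::-1]
print("analytic :", np.round(lam, 3))
print("numerical:", np.round(eig[:2], 3))
```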

PCA is not able to reconstruct the true model and/or to give insights about its hierarchical features.

Page 12: Hierarchically nested factor models

Hierarchical organization

• Many natural and artificial systems are intrinsically organized in a hierarchical structure.

• This means that the elements of the system can be partitioned into clusters, which in turn can be partitioned into subclusters, and so on up to a certain level.
  – How is it possible to detect the hierarchical structure of the system?
  – How is it possible to model the time series dynamics of the system?

Page 13: Hierarchically nested factor models

Clustering algorithms

• The natural answer to the first question is the use of clustering algorithms.
  » Clustering algorithms are data analysis techniques that allow one to extract a hierarchical partitioning of the data.
  » We are mainly interested in hierarchical clustering methods, which allow one to display the hierarchical structure by means of a dendrogram.
  » We focus our attention on two widely used clustering methods (a minimal sketch of both is given below):
    -) the single linkage cluster analysis (SLCA)
    -) the average linkage cluster analysis (ALCA)
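A minimal sketch of both linkages using SciPy; converting correlations to the distance d_ij = √(2(1 − ρ_ij)) is one common convention and is assumed here:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

rng = np.random.default_rng(4)

# Toy data: 10 correlated time series
x = rng.standard_normal((500, 10)) + 0.5 * rng.standard_normal((500, 1))
C = np.corrcoef(x, rowvar=False)

# Convert correlations to distances: d_ij = sqrt(2 (1 - rho_ij))
d = np.sqrt(2.0 * (1.0 - C))
np.fill_diagonal(d, 0.0)
condensed = squareform(d, checks=False)

Z_alca = linkage(condensed, method="average")   # ALCA
Z_slca = linkage(condensed, method="single")    # SLCA

# scipy.cluster.hierarchy.dendrogram(Z_alca) would draw the hierarchical structure
print(Z_alca[:3])   # each row: merged clusters, merge distance, size of the new cluster
```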

Page 14: Hierarchically nested factor models


How is it possible to extract a time series model for the stocks which takes into account the structure of the dendrogram?

Daily returns of 100 stocks traded at NYSE in the time period 1/1995–12/1998 (T = 1011).

[Figure: ALCA dendrogram of the 100 stocks, with leaves grouped by economic sector: Energy, Technology, Financial, Healthcare, Basic Materials, Services, Utilities.]

Page 15: Hierarchically nested factor models


Hierarchical clustering approach

Dendrograms obtained by hierarchical clustering are naturally associated with a correlation matrix C^< = (ρ^<_ij) given by

ρ^<_ij = ρ_k

where k is the first node where elements i and j merge together.

We propose to use as a model of the system the factor model whose correlation matrix is C^<.

The motivations are:
• The hierarchical structure is revealed by the dendrogram.
• The algorithm often filters robust information of the time series.
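A minimal sketch of how C^< can be computed from a SciPy linkage through the cophenetic matrix, assuming the same distance convention d = √(2(1 − ρ)) as above:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, cophenet
from scipy.spatial.distance import squareform

rng = np.random.default_rng(5)

x = rng.standard_normal((500, 10)) + 0.5 * rng.standard_normal((500, 1))
C = np.corrcoef(x, rowvar=False)

# Distance associated with the correlation coefficient
d = np.sqrt(2.0 * (1.0 - C))
np.fill_diagonal(d, 0.0)

Z = linkage(squareform(d, checks=False), method="average")   # ALCA

# Cophenetic distance: distance of the first node where i and j merge together
coph = squareform(cophenet(Z))

# Back to correlations: rho^<_ij = 1 - d_k^2 / 2, i.e. the correlation of node k
C_filt = 1.0 - coph**2 / 2.0
np.fill_diagonal(C_filt, 1.0)

print(np.round(C_filt[:4, :4], 2))
```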

Page 16: Hierarchically nested factor models


Hierarchical Clustering (HC)

The application of both the ALCA and the SLCA to C allows one to reveal the hierarchical structure of the model.

Is it possible to recover the 3-factor model starting from

such a dendrogram?

The three nodes of the dendrogram carry the correlations of the model: ρ_M, ρ_1, and ρ_2.

Page 17: Hierarchically nested factor models


Hierarchically Nested Factor Model (HNFM)

A factor is associated to each node

x_i(t) = \sum_{h \in G(i)} \gamma_h f^{(h)}(t) + \sqrt{1 - \sum_{h \in G(i)} \gamma_h^2}\; \epsilon_i(t)

The sum runs over the factors associated with the nodes in G(i); the last term is the idiosyncratic term.

\gamma_h = \sqrt{\rho_h - \rho_{g(h)}}

g(h) = parent of node h (with ρ_{g(h)} = 0 when h is the root); G(i) = pedigree of element i, i.e. the set of nodes on the path connecting the root to the leaf i.

The model explains C^< = (ρ^<_ij).
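A minimal sketch of the loading construction γ_h = √(ρ_h − ρ_g(h)) on a toy dendrogram of four elements; the parent dictionary, node correlations and pedigrees are illustrative assumptions:

```python
import numpy as np

# Toy dendrogram for 4 elements {0, 1, 2, 3}:
# node "a" joins 0 and 1, node "b" joins 2 and 3, the root "r" joins a and b.
parent = {"a": "r", "b": "r", "r": None}
rho = {"a": 0.6, "b": 0.5, "r": 0.2}                 # correlation of each node (illustrative)
pedigree = {0: ["r", "a"], 1: ["r", "a"],            # G(i): nodes from the root to leaf i
            2: ["r", "b"], 3: ["r", "b"]}

# gamma_h = sqrt(rho_h - rho_g(h)), with rho of the (absent) parent of the root set to 0
gamma = {h: np.sqrt(rho[h] - (rho[parent[h]] if parent[h] else 0.0)) for h in rho}

def model_corr(i, j):
    """Correlation implied by the HNFM: sum of gamma_h^2 over the common nodes of i and j."""
    common = set(pedigree[i]) & set(pedigree[j])
    return sum(gamma[h] ** 2 for h in common)

print(round(model_corr(0, 1), 3))   # 0.6 -> the correlation of node "a"
print(round(model_corr(0, 2), 3))   # 0.2 -> the correlation of the root
```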

Page 18: Hierarchically nested factor models

• We have shown that it is possible to associate a factor model to a dendrogram.

• If the system has a hierarchical structure and if the clustering algorithm is able to detect it, it is likely that the factor model describes the hierarchical features of the system.

• If the system has N elements, the factor model has one factor per node of the dendrogram, i.e. N − 1 factors.
  – How is it possible to reduce the dimensionality of the model?
  – Principal Component Analysis prescribes to use the k largest eigenvalues (and the corresponding eigenvectors) to build a k-factor model.

Page 19: Hierarchically nested factor models


Statistical uncertainty and necessity of node reduction

dendrogram of the model

3 nodes (factors)

dendrogram from a realization of finite length

99 nodes (factors)

Page 20: Hierarchically nested factor models

Bootstrap procedure

• HC is applied to the data set. The result is the dendrogram D.

• HC is applied to the N surrogate data matrices, giving the set of surrogate dendrograms D*_1, D*_2, ..., D*_N.

• For each node k of D, the bootstrap value b(k) is computed as the percentage of surrogate dendrograms in which the node is preserved.

• A node is preserved in the bootstrap if it identifies a branch composed of the same elements as in the real-data dendrogram.
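A simplified sketch of the bootstrap: rows of the data matrix are resampled with replacement, the ALCA dendrogram is recomputed, and the bootstrap value of a node is the fraction of replicas in which the same branch (set of leaves) appears. The toy data and the number of replicas are illustrative assumptions:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

rng = np.random.default_rng(6)

def alca(x):
    """Average linkage on the distance d = sqrt(2 (1 - rho))."""
    C = np.corrcoef(x, rowvar=False)
    d = np.sqrt(2.0 * (1.0 - C))
    np.fill_diagonal(d, 0.0)
    return linkage(squareform(d, checks=False), method="average")

def branches(Z, n):
    """Set of leaf-sets (frozensets) identified by the nodes of a linkage matrix."""
    members = {i: frozenset([i]) for i in range(n)}
    out = set()
    for k, row in enumerate(Z):
        members[n + k] = members[int(row[0])] | members[int(row[1])]
        out.add(members[n + k])
    return out

T, N, n_boot = 500, 10, 200
x = rng.standard_normal((T, N)) + 0.5 * rng.standard_normal((T, 1))

real_branches = branches(alca(x), N)

# Bootstrap: resample rows with replacement and count how often each branch reappears
counts = {b: 0 for b in real_branches}
for _ in range(n_boot):
    xb = x[rng.integers(0, T, size=T)]
    for b in branches(alca(xb), N) & real_branches:
        counts[b] += 1

bootstrap_value = {b: c / n_boot for b, c in counts.items()}
print(sorted(bootstrap_value.values(), reverse=True)[:5])
```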

Page 21: Hierarchically nested factor models

Example

Daily returns of 100 stocks traded at NYSE in the time period 1/1995–12/1998 (T = 1011).

ALCA bootstrap value distribution

Page 22: Hierarchically nested factor models

Node-factor reduction

• Select a bootstrap value threshold b_t.

• For any node k with bootstrap value b(k): if b(k) < b_t, then merge the node with its first ancestor q in the path to the root such that b(q) ≥ b_t (a sketch of this step follows below).

We do not choose the value of b_t a priori, but we infer the optimal value from the data in a self-consistent way.

(cf. Hillis and Bull, Syst. Biol. 1993)
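A minimal sketch of the node-reduction step on a toy dendrogram encoded as a parent dictionary; node names, bootstrap values and the threshold are illustrative assumptions:

```python
# Node-reduction sketch: every node whose bootstrap value is below the threshold b_t
# is replaced by its first ancestor (towards the root) with bootstrap value >= b_t.

parent = {"n1": "n3", "n2": "n3", "n3": "root", "n4": "root", "root": None}
b = {"n1": 0.40, "n2": 0.95, "n3": 0.70, "n4": 0.30, "root": 1.00}
b_t = 0.60

def reduced_node(k):
    """Return the node k is merged into: k itself if b(k) >= b_t, else the
    first ancestor on the path to the root with bootstrap value >= b_t."""
    while b[k] < b_t and parent[k] is not None:
        k = parent[k]
    return k

mapping = {k: reduced_node(k) for k in b}
kept = sorted(set(mapping.values()))
print(mapping)                              # n1 -> n3, n4 -> root, the others survive
print("surviving nodes (factors):", kept)
```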

Page 23: Hierarchically nested factor models

Empirical Application: node reduction

Daily returns of 100 stocks traded at NYSE in the time period 1/1995–12/1998 (T = 1011).

23 nodes

23-node model: E1 = oil well and services, E2 = oil and gas integrated, S1 = communication services, S2 = retail, H = major drugs, U = electric utilities.

Page 24: Hierarchically nested factor models


Meaning of factors in the HNFM

HNFM associated to the reduced dendrogram with 23 nodes. Equations for stocks belonging to the Technology and Financial sectors.

Technology Factor

Financial Factor

Page 25: Hierarchically nested factor models

Comparing filtering procedures

• A filtering procedure is a recipe to replace a sample correlation matrix with another one which is supposed to better describe the system.

• How can we compare different filtering procedures?

• A good filtering procedure should be able to
  – remove the right amount of noise from the matrix to reveal the underlying model;
  – be statistically robust to different realizations of the process.

Page 26: Hierarchically nested factor models


Kullback-Leibler distance

The Kullback-Leibler distance between two pdf's p and q is K(p, q) = E_p[\ln(p/q)], and it is closely related to the mutual information between the variables.

For multivariate normally distributed random variables with zero mean and covariance matrices Σ₁ and Σ₂ we have:

K(\Sigma_1, \Sigma_2) = \frac{1}{2}\left[\ln\frac{\det \Sigma_2}{\det \Sigma_1} + \mathrm{Tr}\left(\Sigma_2^{-1}\Sigma_1\right) - N\right]

Minimizing the Kullback-Leibler distance is equivalent to maximizing the likelihood in maximum likelihood factor analysis (MLFA).

We propose to use the Kullback-Leibler distance to quantify the performance of different filtering procedures of the correlation matrix
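A minimal sketch of the Gaussian Kullback-Leibler distance applied to a model correlation matrix and a sample correlation matrix estimated from a finite realization; the one-factor model used here is illustrative:

```python
import numpy as np

def kl_gaussian(C1, C2):
    """Kullback-Leibler distance K(C1, C2) between two zero-mean multivariate
    normal distributions with covariance (correlation) matrices C1 and C2."""
    n = C1.shape[0]
    _, logdet1 = np.linalg.slogdet(C1)
    _, logdet2 = np.linalg.slogdet(C2)
    return 0.5 * (logdet2 - logdet1 + np.trace(np.linalg.solve(C2, C1)) - n)

rng = np.random.default_rng(7)
N, T = 30, 200

# "True" model: a one-factor correlation matrix (illustrative)
g = 0.4
C_model = np.full((N, N), g ** 2)
np.fill_diagonal(C_model, 1.0)

# Sample correlation matrix estimated from a finite realization of the model
x = g * rng.standard_normal((T, 1)) + np.sqrt(1 - g ** 2) * rng.standard_normal((T, N))
S = np.corrcoef(x, rowvar=False)

print("K(model, sample) =", round(kl_gaussian(C_model, S), 3))
print("K(model, model)  =", round(kl_gaussian(C_model, C_model), 3))   # zero by definition
```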

Page 27: Hierarchically nested factor models


By applying the theory of Wishart matrices it is possible to show that

where Σ is the model correlation matrix of the system, while S₁ and S₂ are two sample correlation matrices obtained from two independent realizations, each of length T. The three expectation values are independent of Σ, i.e. they do not depend on the underlying model.

Page 28: Hierarchically nested factor models


Filtered correlation matrices

We consider two filtered correlation matrices, C_B and C_S, both obtained by comparing the empirical correlation matrix eigenvalues with the expectations of Random Matrix Theory.

We consider two further filtered correlation matrices, C_ALCA and C_SLCA, obtained by applying the ALCA and the SLCA to the empirical correlation matrix, respectively.

Page 29: Hierarchically nested factor models


Filtered correlation matrix (1)

C: correlation matrix. D = diag(λ₁, λ₂, ..., λ_N): diagonal matrix of eigenvalues of C. V: orthogonal matrix of eigenvectors of C.

Select λ_max and define D* = diag(λ*_i) with

λ*_i = λ_i if λ_i > λ_max, λ*_i = λ₀ otherwise,

where λ₀ is the average of the eigenvalues smaller than λ_max.

C_B^t = (c^{Bt}_{ij}) = V^T D* V, and the filtered matrix C_B is obtained by rescaling to unit diagonal:

c^B_{ij} = \frac{c^{Bt}_{ij}}{\sqrt{c^{Bt}_{ii}\, c^{Bt}_{jj}}}

M. Potters, J.-P. Bouchaud & L. Laloux, Acta Phys. Pol. B 36 (9), pp. 2767-2784 (2005).

Page 30: Hierarchically nested factor models


Filtered correlation matrix (2)

C: correlation matrix. D = diag(λ₁, λ₂, ..., λ_N): diagonal matrix of eigenvalues of C. V: orthogonal matrix of eigenvectors of C.

Select λ_max and define D* = diag(λ*_i) with

λ*_i = λ_i if λ_i > λ_max, λ*_i = 0 otherwise.

C_S^t = (c^{St}_{ij}) = V^T D* V, and the filtered matrix C_S is obtained by setting the diagonal to one:

c^S_{ij} = c^{St}_{ij} for i ≠ j, and c^S_{ii} = 1.

B. Rosenow, V. Plerou, P. Gopikrishnan & H.E. Stanley, Europhys. Lett. 59 (4), pp. 500-506 (2002)
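A minimal sketch of the two eigenvalue-based filters described on the last two slides (C_B and C_S); using the Marchenko–Pastur edge as λ_max is an assumption of this sketch. The two variants differ only in how the discarded eigenvalues and the diagonal are treated.

```python
import numpy as np

def rmt_filters(C, Q):
    """Return the two RMT-filtered matrices C_B and C_S sketched above.
    C: sample correlation matrix, Q = T/N."""
    lam_max = (1 + np.sqrt(1 / Q)) ** 2            # Marchenko-Pastur upper edge
    eigval, V = np.linalg.eigh(C)                  # columns of V are eigenvectors

    # Filter (1): small eigenvalues replaced by their average, then the
    # matrix is rescaled so that its diagonal is exactly 1 (C_B).
    small = eigval <= lam_max
    d_b = np.where(small, eigval[small].mean(), eigval)
    Cb_t = V @ np.diag(d_b) @ V.T
    scale = np.sqrt(np.diag(Cb_t))
    C_B = Cb_t / np.outer(scale, scale)

    # Filter (2): small eigenvalues set to zero, diagonal simply set to 1 (C_S).
    d_s = np.where(small, 0.0, eigval)
    C_S = V @ np.diag(d_s) @ V.T
    np.fill_diagonal(C_S, 1.0)
    return C_B, C_S

rng = np.random.default_rng(8)
N, T = 50, 200
x = rng.standard_normal((T, N)) + 0.5 * rng.standard_normal((T, 1))
C = np.corrcoef(x, rowvar=False)

C_B, C_S = rmt_filters(C, Q=T / N)
print(np.round(C_B[:3, :3], 2))
print(np.round(C_S[:3, :3], 2))
```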

Page 31: Hierarchically nested factor models


Comparison of filtered correlation matrices

Block diagonal model with 12 factors. N = 100, T = 748. Gaussian random variables.

Page 32: Hierarchically nested factor models


Comparison of filtered correlation matrices

Block diagonal model with 12 factors. N = 100, T = 748. Gaussian random variables.

Page 33: Hierarchically nested factor models


Comparison of filtered correlation matrices

N = 300 stocks (NYSE), daily returns 2001–2003, T = 748.

Page 34: Hierarchically nested factor models

Conclusions

• It is possible to associate a time series factor model to a dendrogram, the output of a hierarchical clustering algorithm.

• The robustness of the factors with respect to statistical uncertainty can be determined by using the bootstrap technique.

• The Kullback-Leibler distance allows one to compare the characteristics of different filtering procedures, taking also into account the noise due to the finiteness of the time series.

• This suggests the existence of a tradeoff between information and stability.