Clustering Financial Time Series: How Long is Enough?


Page 1: Clustering Financial Time Series: How Long is Enough?

Introduction

Clustering Financial Time Series: How Long is Enough?

25th International Joint Conference on Artificial Intelligence (IJCAI-16)

S. Andler, G. Marti, F. Nielsen, P. Donnat

July 14, 2016

Gautier Marti Clustering Financial Time Series: How Long is Enough?

Page 2: Clustering Financial Time Series: How Long is Enough?

Clustering of Financial Time Series

Goal: Build Risk & Trading AI agents...

[Figure; source: www.datagrapple.com]

...which can thrive on this kind of data.

Page 3: Clustering Financial Time Series: How Long is Enough?

Clustering of Financial Time Series

Stylized fact I: Financial time series correlations have a strong hierarchical block diagonal structure (Econophysics [4])

Stylized fact II: Most correlations are spurious (RMT [2])

Motivation for clustering financial time series using correlation as a similarity measure: dimensionality reduction ≡ filtering noisy correlations

Page 4: Clustering Financial Time Series: How Long is Enough?

Challenge for the statistical practitioner

The dilemma:

the longer the time interval, the more precise the correlation estimates, but also

the longer the time interval, the more unrealistic the stationarity hypothesis for these time series.

Question: How does the clustering behave in the presence of statistical errors in the correlation estimates?

How long is enough? 30 days? 120 days? 10 years?

Page 5: Clustering Financial Time Series: How Long is Enough?

A first theoretical approach - simplified setting

We consider the following framework:

financial time series ≡ random walks

they follow a joint elliptical distribution (e.g. Gaussian, Student) parameterized by a correlation matrix

the correlation matrix has a hierarchical block structure
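
A minimal sketch of this simplified setting, using only the Python standard library. It assumes a two-level hierarchy (one correlation level inside clusters, a lower one across clusters) generated by a nested factor model; the function names, the parameters `rho_intra`, `rho_inter` and `dof`, and the factor construction itself are illustrative assumptions, not taken from the slides.

```python
import math
import random

def simulate_returns(cluster_sizes, rho_intra, rho_inter, T, rng, dof=None):
    """Draw T return vectors with correlation rho_intra inside a cluster and
    rho_inter across clusters (assumes 0 <= rho_inter <= rho_intra <= 1).
    If dof is given (> 2), rows are rescaled to follow a Student distribution."""
    labels = [c for c, size in enumerate(cluster_sizes) for _ in range(size)]
    a = math.sqrt(rho_inter)              # loading on the common factor
    b = math.sqrt(rho_intra - rho_inter)  # loading on the cluster factor
    e = math.sqrt(1.0 - rho_intra)        # idiosyncratic loading
    returns = []
    for _ in range(T):
        common = rng.gauss(0, 1)
        cluster = [rng.gauss(0, 1) for _ in cluster_sizes]
        row = [a * common + b * cluster[c] + e * rng.gauss(0, 1) for c in labels]
        if dof is not None:
            # chi-square(dof) is Gamma(shape=dof/2, scale=2)
            scale = math.sqrt(dof / rng.gammavariate(dof / 2, 2))
            row = [scale * x for x in row]
        returns.append(row)
    return returns, labels

def random_walks(returns):
    """Cumulate the returns of each variable to obtain random walks."""
    walks, level = [], [0.0] * len(returns[0])
    for row in returns:
        level = [l + x for l, x in zip(level, row)]
        walks.append(level)
    return walks

def pearson(x, y):
    """Sample Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    return sxy / math.sqrt(sxx * syy)

if __name__ == "__main__":
    rng = random.Random(0)
    rets, labels = simulate_returns([3, 3], 0.7, 0.2, 500, rng)
    cols = list(zip(*rets))
    print("intra:", round(pearson(cols[0], cols[1]), 2))
    print("inter:", round(pearson(cols[0], cols[3]), 2))
```

Each variable has unit variance by construction, so the loadings directly give the target correlations: the squared common loading across clusters, and the sum of the squared common and cluster loadings inside a cluster.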

Page 6: Clustering Financial Time Series: How Long is Enough?

Simulations in the simplified setting

Some influential parameters:

clustering algorithm

number of observations T

number of variables N relative to T

contrast between the correlations, and their values

correlation estimator (e.g. Pearson, Spearman)

[Figure: three panels, "Empirical rates of convergence for Single Linkage", "Empirical rates of convergence for Average Linkage" and "Empirical rates of convergence for Ward". x-axis: sample size (100 to 500); y-axis: score (0.0 to 1.0); curves: Gaussian - Pearson, Gaussian - Spearman, Student - Pearson, Student - Spearman.]

Ratio of the number of correct clusterings obtained over the number of trials, as a function of T
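
The score plotted in this figure can be estimated by Monte Carlo. The sketch below is an assumption-laden illustration rather than the authors' experiment: it uses a two-level Gaussian factor model, the Pearson estimator, the distance 1 - correlation, and a single-linkage clustering cut at the true number of clusters. All function names and parameter values are hypothetical.

```python
import math
import random

def sample(labels, rho_intra, rho_inter, T, rng):
    """T Gaussian observations; returns one series per variable."""
    a = math.sqrt(rho_inter)
    b = math.sqrt(rho_intra - rho_inter)
    e = math.sqrt(1.0 - rho_intra)
    k = max(labels) + 1
    rows = []
    for _ in range(T):
        common = rng.gauss(0, 1)
        factors = [rng.gauss(0, 1) for _ in range(k)]
        rows.append([a * common + b * factors[c] + e * rng.gauss(0, 1)
                     for c in labels])
    return [list(col) for col in zip(*rows)]

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    return sxy / math.sqrt(sxx * syy)

def single_linkage(dist, k):
    """Single linkage cut at k clusters: merge along the shortest
    remaining edge between two different clusters until k are left."""
    n = len(dist)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted((dist[i][j], i, j) for i in range(n) for j in range(i + 1, n))
    clusters = n
    for _, i, j in edges:
        if clusters == k:
            break
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            clusters -= 1
    return [find(i) for i in range(n)]

def same_partition(a, b):
    """True if two labelings group the points identically."""
    n = len(a)
    return all((a[i] == a[j]) == (b[i] == b[j])
               for i in range(n) for j in range(i + 1, n))

def success_ratio(labels, rho_intra, rho_inter, T, trials, rng):
    """Number of correct clusterings divided by the number of trials."""
    k = max(labels) + 1
    hits = 0
    for _ in range(trials):
        series = sample(labels, rho_intra, rho_inter, T, rng)
        n = len(series)
        dist = [[1.0 - pearson(series[i], series[j]) for j in range(n)]
                for i in range(n)]
        hits += same_partition(single_linkage(dist, k), labels)
    return hits / trials

if __name__ == "__main__":
    rng = random.Random(0)
    labels = [0, 0, 0, 1, 1, 1, 2, 2, 2]
    for T in (5, 10, 20, 50, 100):
        print(T, success_ratio(labels, 0.6, 0.3, T, 50, rng))
```

Swapping in another linkage rule, another estimator, or a heavier-tailed sampler changes only one function each, which is one way to explore the influential parameters listed on this slide.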

Page 7: Clustering Financial Time Series: How Long is Enough?

A consistency proof & first convergence bounds

A 2-step proof. First step:

We consider Hierarchical Agglomerative Clustering algorithms

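Space contracting vs. space conserving vs. space dilating [1]:

Space contracting: $D^{(t+1)}(C_i^{(t)} \cup C_j^{(t)}, C_k^{(t)}) \le \min\{D_{ik}^{(t)}, D_{jk}^{(t)}\}$

Space conserving: $D^{(t+1)}(C_i^{(t)} \cup C_j^{(t)}, C_k^{(t)}) \in \left[\min\{D_{ik}^{(t)}, D_{jk}^{(t)}\}, \max\{D_{ik}^{(t)}, D_{jk}^{(t)}\}\right]$

Space dilating: $D^{(t+1)}(C_i^{(t)} \cup C_j^{(t)}, C_k^{(t)}) \ge \max\{D_{ik}^{(t)}, D_{jk}^{(t)}\}$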
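
To make the space-conserving property concrete, here is a small sketch with the standard update rules of single, complete and average linkage after merging clusters i and j, together with a check that the updated distance to a third cluster k stays between the two previous distances. The function names, the cluster-size arguments `n_i`, `n_j` and the tolerance are illustrative choices.

```python
def single(d_ik, d_jk, n_i=1, n_j=1):
    """Single linkage: distance to the closer of the two merged clusters."""
    return min(d_ik, d_jk)

def complete(d_ik, d_jk, n_i=1, n_j=1):
    """Complete linkage: distance to the farther of the two merged clusters."""
    return max(d_ik, d_jk)

def average(d_ik, d_jk, n_i=1, n_j=1):
    """Average linkage: size-weighted mean of the two previous distances."""
    return (n_i * d_ik + n_j * d_jk) / (n_i + n_j)

def is_space_conserving(update, d_ik, d_jk, n_i=1, n_j=1, tol=1e-12):
    """Check min{D_ik, D_jk} <= D_new <= max{D_ik, D_jk} for one update."""
    d_new = update(d_ik, d_jk, n_i, n_j)
    return min(d_ik, d_jk) - tol <= d_new <= max(d_ik, d_jk) + tol

if __name__ == "__main__":
    for rule in (single, complete, average):
        print(rule.__name__, is_space_conserving(rule, 0.2, 0.8, n_i=3, n_j=1))
```

Single linkage sits on the lower end of the interval and complete linkage on the upper end, so both are boundary cases of the space-conserving definition.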

Page 8: Clustering Financial Time Series: How Long is Enough?

A consistency proof & first convergence bounds

A 2-step proof. First step:

Which geometrical configurations lead to the true clustering?

For space-conserving algorithms (e.g. Single, Complete, Average Linkage), a sufficient separability condition reads

$\max D_{intra} := \max_{1 \le i,j \le N,\ C(i) = C(j)} d(X_i, X_j) < \min_{1 \le i,j \le N,\ C(i) \ne C(j)} d(X_i, X_j) =: \min D_{inter}$
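
The separability condition can be checked directly on a distance matrix and a labeling. This is a minimal sketch; the function name and the toy matrices are hypothetical.

```python
def is_separable(dist, labels):
    """True if the largest intra-cluster distance is strictly smaller
    than the smallest inter-cluster distance."""
    n = len(labels)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    intra = [dist[i][j] for i, j in pairs if labels[i] == labels[j]]
    inter = [dist[i][j] for i, j in pairs if labels[i] != labels[j]]
    if not intra or not inter:
        return True  # nothing to compare: the condition holds vacuously
    return max(intra) < min(inter)

if __name__ == "__main__":
    labels = [0, 0, 1, 1]
    good = [[0.0, 0.2, 0.7, 0.8],
            [0.2, 0.0, 0.6, 0.9],
            [0.7, 0.6, 0.0, 0.3],
            [0.8, 0.9, 0.3, 0.0]]
    print(is_separable(good, labels))
```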

Page 9: Clustering Financial Time Series: How Long is Enough?

A consistency proof & first convergence bounds

A 2-step proof. Second step:

How long does it take for the estimates of the correlation coefficients to be precise enough to be, with high probability, in a good configuration for the clustering algorithm?

Answer: Concentration inequalities for correlation coefficients.

Page 10: Clustering Financial Time Series: How Long is Enough?

Convergence bounds

Combining both steps, we get the following convergence rate:

Convergence rate

The probability of the clustering algorithm making an error is

$O\left(\sqrt{\frac{\log N}{T}}\right)$.

Page 11: Clustering Financial Time Series: How Long is Enough?

Proof. Step 1 - A bit more details

By induction.

Let’s assume the separability condition is satisfied at step t, then

$\min D^{(t)}_{intra} \le \max D^{(t)}_{intra} < \min D^{(t)}_{inter} \le \max D^{(t)}_{inter}$

From the space-conserving property, we get:

$D^{(t+1)}_{intra} \in \left[\min D^{(t)}_{intra}, \max D^{(t)}_{intra}\right]$ and $D^{(t+1)}_{inter} \in \left[\min D^{(t)}_{inter}, \max D^{(t)}_{inter}\right]$.

Therefore:

separability condition is satisfied at t+1,

the clustering algorithm has not linked points from two different clusters between step t and step t + 1.

Page 12: Clustering Financial Time Series: How Long is Enough?

Proof. Step 2 - A bit more details

Maximum statistical error

For space-conserving algorithms, the separability condition is met if

$\|\hat{\Sigma} - \Sigma\|_\infty < \frac{\min_{\rho_i, \rho_j} |\rho_i - \rho_j|}{2}$,

where $C(i) \ne C(j)$.

This means that the statistical error has to be below half the minimum correlation ‘contrast’ between the clusters.

The weaker the ‘contrast’, the more precise the correlation estimates have to be.
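
A small sketch of this requirement, under two assumptions that the slide does not spell out: the norm is read as the largest entrywise absolute error, and the minimum contrast is passed in as a number rather than derived from the matrix. If every estimate is within half the contrast of its true value, an intra-cluster estimate cannot fall below an inter-cluster one, so the ordering needed for separability is preserved. The function names and toy matrices are hypothetical.

```python
def max_abs_error(estimate, truth):
    """Largest entrywise absolute difference between two matrices."""
    return max(abs(e - t)
               for row_e, row_t in zip(estimate, truth)
               for e, t in zip(row_e, row_t))

def error_below_half_contrast(estimate, truth, contrast):
    """True if the statistical error is below half the minimum contrast."""
    return max_abs_error(estimate, truth) < contrast / 2

if __name__ == "__main__":
    truth = [[1.0, 0.7, 0.3],
             [0.7, 1.0, 0.3],
             [0.3, 0.3, 1.0]]
    estimate = [[1.0, 0.6, 0.45],
                [0.6, 1.0, 0.35],
                [0.45, 0.35, 1.0]]
    # Contrast between the intra level (0.7) and the inter level (0.3).
    print(error_below_half_contrast(estimate, truth, contrast=0.4))
```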

N.B. From the Cramér–Rao lower bound, we get for the Pearson correlation estimator:

$\mathrm{var}(\hat{\rho}) \ge \frac{(1 - \rho^2)^2}{1 + \rho^2}$.

When the correlation is high, it is easier to estimate.
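
A quick numerical look at this lower bound, evaluating the expression exactly as written on the slide (which shows no explicit dependence on T). The function name is hypothetical.

```python
def cramer_rao_bound(rho):
    """Lower bound on the variance of the correlation estimator,
    as stated on the slide: (1 - rho^2)^2 / (1 + rho^2)."""
    return (1 - rho ** 2) ** 2 / (1 + rho ** 2)

if __name__ == "__main__":
    for rho in (0.0, 0.3, 0.6, 0.9):
        print(rho, round(cramer_rao_bound(rho), 4))
```

The bound decreases as the absolute correlation grows and vanishes at perfect correlation, which is the sense in which high correlations are easier to estimate.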

Page 13: Clustering Financial Time Series: How Long is Enough?

Correlation estimates concentration bounds

Number of variables N, number of observations T, minimum separation d

Concentration bounds [3]

If $\Sigma$ and $\hat{\Sigma}$ are the population and empirical Spearman correlation matrices respectively, then for $N \ge \frac{24}{\log T} + 2$, we have with probability at least $1 - \frac{1}{T^2}$,

$\|\hat{\Sigma} - \Sigma\|_\infty \le 24 \sqrt{\frac{\log N}{T}}$.

$P(\text{“correct clustering”}) \ge 1 - 2 N^2 e^{-T d^2 / 24}$

Not sharp enough! (for reasonable values of N, T, d)
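
Plugging numbers into the two bounds shows why they are not sharp enough. The sketch below only evaluates the formulas; it does not check the applicability condition of the concentration result. The values N = 100, T = 250, d = 0.1 and the target confidence are illustrative assumptions, as are the function names.

```python
import math

def spearman_sup_bound(N, T):
    """Right-hand side of the concentration bound: 24 * sqrt(log N / T)."""
    return 24 * math.sqrt(math.log(N) / T)

def correct_clustering_lower_bound(N, T, d):
    """Lower bound on P("correct clustering"): 1 - 2 N^2 exp(-T d^2 / 24)."""
    return 1 - 2 * N ** 2 * math.exp(-T * d ** 2 / 24)

def min_T_for_confidence(N, d, delta):
    """Smallest T such that 2 N^2 exp(-T d^2 / 24) <= delta,
    obtained by solving the probability bound for T."""
    return math.ceil(24 / d ** 2 * math.log(2 * N ** 2 / delta))

if __name__ == "__main__":
    N, T, d = 100, 250, 0.1
    print("sup-norm bound:", spearman_sup_bound(N, T))
    print("probability bound:", correct_clustering_lower_bound(N, T, d))
    print("T needed for 95%:", min_T_for_confidence(N, d, 0.05))
```

For these values the sup-norm bound exceeds 2, the largest possible gap between two correlations, and the probability bound is negative, so both statements are vacuous. Solving the probability bound for T gives a requirement in the tens of thousands of observations.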

Page 14: Clustering Financial Time Series: How Long is Enough?

Future developments

Bounds are not sharp enough. We can try to refine them using:

(theoretical) Intrinsic dimension of the HCBM model [5];

(empirical) A distance between dendrograms (instead of correct/incorrect) for a finer analysis;

(empirical) A study of ‘correctness’ isoquants.

Precise convergence rates of clustering methodologies can provide a useful model selection criterion for practitioners!

Page 15: Clustering Financial Time Series: How Long is Enough?

[1] Zhenmin Chen and John W. Van Ness. Space-conserving agglomerative algorithms. Journal of Classification, 13(1):157–168, 1996.

[2] Laurent Laloux, Pierre Cizeau, Marc Potters, and Jean-Philippe Bouchaud. Random matrix theory and financial correlations. International Journal of Theoretical and Applied Finance, 3(03):391–397, 2000.

[3] Han Liu, Fang Han, Ming Yuan, John Lafferty, Larry Wasserman, et al. High-dimensional semiparametric Gaussian copula graphical models. The Annals of Statistics, 40(4):2293–2326, 2012.

[4] Rosario N. Mantegna. Hierarchical structure in financial markets. The European Physical Journal B - Condensed Matter and Complex Systems, 11(1):193–197, 1999.

Page 16: Clustering Financial Time Series: How Long is Enough?

[5] Joel A. Tropp. An introduction to matrix concentration inequalities. arXiv preprint arXiv:1501.01571, 2015.