

Journal of Economic Dynamics & Control 32 (2008) 279–302

Determining the optimal dimensionality of multivariate volatility models with tools from random matrix theory

Bernd Rosenow

Institut für Theoretische Physik, Universität zu Köln, Zülpicher Strasse 77, D-50923 Köln, Germany

Received 1 May 2005; received in revised form 1 December 2006; accepted 26 January 2007. Available online 7 September 2007.

Abstract

We present a brief review of methods from random matrix theory (RMT) which allow one to gain insight into the problem of estimating cross-correlation matrices of a large number of financial assets. These methods make it possible to determine the optimal number of principal components or factors for the description of correlations in such a way that only statistically relevant information is used. As an application of this method, we suggest two classes of multivariate GARCH models which are both easy to estimate and perform well in forecasting the multivariate volatility process for more than 100 stocks. © 2007 Elsevier B.V. All rights reserved.

JEL classification: C51; C53; G12

Keywords: Correlation estimation; Random matrix theory; Noise filtering; Multivariate volatility forecasts

0165-1889/$ - see front matter © 2007 Elsevier B.V. All rights reserved. doi:10.1016/j.jedc.2007.01.026
www.elsevier.com/locate/jedc
Tel.: +1 6174965390; fax: +1 6174962545. E-mail addresses: [email protected], [email protected] (B. Rosenow).

1. Introduction

The multivariate analysis of high dimensional data sets is an important statistical problem and has numerous applications not only in finance and economics, but also in other areas like signal transmission, climate analysis, and denoising of dynamic imaging data. One typical problem is the estimation of cross-correlation matrices for N time series of length T. In the limit where the ratio T/N tends to infinity, traditional approaches to multivariate statistics are well founded and the traditional maximum likelihood estimators are applicable (Muirhead, 1982; Anderson, 1984). However, in many applications the length of the time series and the number of variables are of the same order; hence, the number of correlation coefficients is about the same as the number of available data points. Such an (almost) degeneracy of the correlation matrix due to the large number of time series is referred to as the 'curse of dimensionality'. Then, the statistical fluctuations of the correlation coefficients are significant and turn the correlation matrix, at least partially, into a random matrix.

If a sample correlation matrix is calculated from Gaussian i.i.d. time series with finite ratio T/N, it belongs to the Wishart ensemble of random matrices. In the limit of infinite T and N but finite T/N, the eigenvalue distribution of Wishart matrices was calculated by Marcenko and Pastur (1967) and Stein (1969). There exists a large body of literature in which this result was rediscovered and generalized; for a review see Bai (1999). In the physics literature, the eigenvalue distribution of Wishart matrices was first found by Dyson (1971) and later generalized by Sengupta and Mitra (1999).

Using insights into this random matrix aspect of high dimensional correlation estimates, a separation of noise and information is possible and has been used to obtain improved predictions of cross-correlations (Laloux et al., 2000; Rosenow et al., 2002). For practical applications, however, multivariate volatility forecasts are needed, i.e. cross-correlations have to be estimated together with volatilities. Several multivariate GARCH (generalized autoregressive conditional heteroscedasticity, MV-GARCH) models have been developed for this purpose (see Campbell et al., 1997 and the discussion below). In this article, we suggest ways to incorporate knowledge about random matrix theory (RMT) based, improved correlation estimates into MV-GARCH models.

In recent years, there have been a number of approaches to the study of cross-correlations which are based on RMT. In a generic model for a financial market without any correlations between time series, the sample cross-correlation matrix is a random matrix R. Agreement between the statistical properties of the empirical cross-correlation matrix C and the random control R is a signature of measurement noise due to the 'curse of dimensionality', whereas deviations between C and R indicate the presence of true information. In Laloux et al. (1999) and Plerou et al. (1999), the eigenvalue pdf of empirical cross-correlation matrices was compared to the RMT prediction (Sengupta and Mitra, 1999). The largest part of the empirical eigenvalue spectrum was found to agree well with the RMT prediction; only a few large eigenvalues were found to deviate from it.

The studies Laloux et al. (2000) and Rosenow et al. (2002) tested correlation forecasts in the framework of Markowitz portfolio theory. Correlation matrices with a reduced dimensionality of the parameter space, which contain only economically meaningful information as defined by an RMT analysis, turned out to provide much better correlation forecasts than traditional estimates. Based on simulations of time series models, it was found that the influence of estimation noise on portfolios with nonlinear constraints is much more pronounced than in the case of linear constraints


(Pafka and Kondor, 2002). The influence of noise is reduced if the ratio T/N is significantly larger than one (Pafka and Kondor, 2003). In a comparison of different dimensional reduction techniques (Pafka and Kondor, 2004), the filtering method described in Section 3 turned out to be the most robust one.

Whereas the econophysics literature has stressed the necessity of dimensional reduction for correlation estimates and its relation to RMT, less attention has been paid so far to an accurate modeling of the volatility process together with the correlation estimates. In the recent economics literature, on the other hand, correlation estimates are mostly embedded into multivariate volatility estimates. As the influence of randomness on high dimensional volatility forecasts is not accounted for in these studies, many of the suggested models are not suitable for application to a large number of time series.

Often, MV-GARCH processes are used for volatility modeling. In the univariate setting, a GARCH process relates the present volatility to past volatilities and past squared returns (Bollerslev, 1986). In the most general formulation of MV-GARCH, each covariance is linked to all other covariances. Thus, the number of parameters scales like the number N of time series to the fourth power, and such a model is clearly not applicable to the description of a large number of time series. While the literature on MV-GARCH models is reviewed in detail in Section 5, we would like to mention some models here which either use only O(N) parameters in their original formulation, or which can easily be modified to use noise filtered correlation matrices as described in Section 3.

In Engle et al. (1990), a one-factor model with an ARCH description of the factor dynamics was used to describe the pricing of treasury bills. In orthogonal GARCH (O-GARCH) models, the observed components of a multivariate time series are transformed to uncorrelated components via an orthogonal transformation, and the volatility dynamics of these uncorrelated components is described by univariate GARCH processes (Alexander, 2002). In the constant conditional correlation (CCC) model of Bollerslev, only the parameters of the N univariate GARCH processes have to be estimated simultaneously, whereas the time-constant correlation matrix is the unconditional correlation matrix of GARCH residuals (Bollerslev, 1990). In the dynamic conditional correlation model (DCC-GARCH) (Engle and Sheppard, 2001; Engle, 2002), parameter estimation takes place as a two step process: in the first step, univariate GARCH processes for the individual time series are estimated, and in a second step, a GARCH process for the covariance matrix of the standardized residuals is formulated. For practical applications, the RiskMetrics estimator with exponentially weighted moving averages (Longerstaey and Zangari, 1996) is used frequently. It has no adjustable parameters and hence easily produces large dimensional covariance matrices.

In this manuscript, we go beyond previous RMT based approaches to correlation estimates in that we model the volatility dynamics together with the cross-correlations. Our novel contribution to the MGARCH literature is the formulation of MV-GARCH models with a reduced dimensionality of the correlation structure, which can be adjusted with the help of RMT tools in such a way that only


statistically relevant information is used. In this way, one obtains models in which only O(N) parameters have to be estimated sequentially and which at the same time provide reliable and stable covariance estimates. In this study, we use intraday data to judge the economic relevance of our covariance forecasts. We find that sliding correlation models (SCMs) with a one year horizon for correlation estimates are very suitable for tracking the volatility of an equally distributed portfolio. The same models allow the construction of minimum variance portfolios with only half the risk of an equally distributed portfolio.

The outline of the manuscript is as follows. In Section 2, we explain how RMT can be used to distinguish information from noise in correlation matrices, while in Section 3 the removal of noise from empirical correlation estimates is described, and this method of noise filtering is empirically illustrated in Section 4. In Section 5, existing MGARCH models are reviewed with emphasis on their effective dimensionality and their applicability to a large number of time series, and two classes of models for multivariate volatility forecasts are defined in Section 6. In Section 7, a method for evaluating their performance is described. We present the test results in Section 8, discuss them in Section 9, and sum up our key conclusions in Section 10.

2. RMT and correlations

Traditional multivariate statistics is not able to deal with problems in which the number N of different time series is comparable to their length T. Traditional estimators for sample correlation matrices are consistent only in the limit of T/N going to infinity (Muirhead, 1982). In typical applications, one is dealing with ratios T/N ranging from one to ten, and hence statistical fluctuations due to the lack of a sufficient number of observations dominate an empirically observed correlation matrix.

To identify the effects of randomness on the eigenvalue spectrum of an empirical cross-correlation matrix, one considers the null hypothesis of normally distributed i.i.d. time series. As far as the eigenvalue statistics is concerned, the cross-correlation matrix R of such time series is equivalent to a random Wishart matrix, i.e. to the covariance matrix of standard normally distributed i.i.d. time series. This equivalence holds because the influence of the N fluctuating diagonal elements of the covariance matrix, as compared to the fixed diagonal elements of the cross-correlation matrix, is negligible relative to that of the N(N − 1) fluctuating off-diagonal elements. Hence, the cross-correlation matrix R has the eigenvalue pdf of random Wishart matrices.

In the limit T → ∞, N → ∞, with Q ≡ T/N fixed, the eigenvalue pdf of Wishart matrices converges to a limiting distribution. This limiting eigenvalue pdf is given by (Marcenko and Pastur, 1967; Stein, 1969; Dyson, 1971; Sengupta and Mitra, 1999)

    ρ(λ) = (Q / 2π) √((λ_+ − λ)(λ − λ_-)) / λ    (1)

for λ within the bounds λ_- < λ < λ_+, where

    λ_± = 1 + 1/Q ± 2 √(1/Q)    (2)

are the minimum and maximum eigenvalues of R, respectively. Agreement between R and the empirical cross-correlation matrix C is a sign of randomness, whereas deviations indicate the presence of economically meaningful information. Specifically, eigenvalues larger than λ_+ and the corresponding eigenvectors contain information about true correlations and should be used for forecasting purposes.
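As a quick numerical illustration (our sketch, not code from the article), one can check how the eigenvalues of a sample correlation matrix of uncorrelated Gaussian series compare with the bounds of Eq. (2); the values of N, T and the random seed below are arbitrary choices:

```python
import numpy as np

# Sample correlation matrix of i.i.d. Gaussian "returns" (null hypothesis)
rng = np.random.default_rng(0)
N, T = 100, 500
Q = T / N                                     # Q = T/N = 5

returns = rng.standard_normal((T, N))
C = np.corrcoef(returns, rowvar=False)        # N x N sample correlation matrix
eigvals = np.linalg.eigvalsh(C)

# Marcenko-Pastur bounds of Eq. (2)
lam_minus = 1 + 1 / Q - 2 * np.sqrt(1 / Q)
lam_plus = 1 + 1 / Q + 2 * np.sqrt(1 / Q)

inside = np.mean((eigvals > lam_minus) & (eigvals < lam_plus))
print(f"RMT interval: [{lam_minus:.3f}, {lam_plus:.3f}]")
print(f"fraction of eigenvalues inside: {inside:.2f}")
```

For uncorrelated series essentially all eigenvalues fall inside the interval (up to finite-size effects); eigenvalues well above λ_+ therefore signal genuine correlations.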

Empirically, it was found that more than 95% of the observed eigenvalues fall in the random matrix interval (Laloux et al., 1999; Plerou et al., 1999). In addition to their eigenvalue pdf, which in general does depend on the specific way in which a random matrix is defined, random matrices have universal statistical properties depending on their symmetry only. For example, the distribution of spacings between neighboring eigenvalues is the same for all real symmetric random matrices. Real symmetric random matrices whose elements are independently drawn from a Gaussian distribution constitute the Gaussian orthogonal ensemble (GOE) and are, with respect to universal statistical properties, representative of all real symmetric random matrices. The investigation of universal properties is important, as the eigenvalue pdf alone does not prove the randomness and lack of information in the eigenvalue spectrum of a matrix (Mehta, 1991; Guhr et al., 1998).

In Plerou et al. (1999, 2002), the universal properties of empirical cross-correlation matrices were studied. For a cross-correlation matrix calculated from two years of intraday data for 1000 stocks, it was found that both the nearest neighbor spacing distribution and the long range spectral correlations show good agreement with the universal predictions for the GOE (Plerou et al., 1999, 2002), indicating that the main part of the eigenvalue spectrum indeed does not contain economically meaningful information. In addition, eigenvectors with eigenvalues in the central, random part of the spectrum are found to have components of similar size (Plerou et al., 1999, 2002). The distribution of the components of these eigenvectors is described by a normal distribution (Laloux et al., 1999; Plerou et al., 2002). In contrast, eigenvectors with eigenvalues at the edges of the spectrum were found to be localized, i.e. to be dominated by a few large components. The eigenvectors belonging to the twelve largest eigenvalues were found to have an economic interpretation (Gopikrishnan et al., 2001): the eigenvector with the largest eigenvalue describes correlations permeating the whole stock market and is similar to a market index. The next eigenvector describes correlations between companies with large market capitalization, whereas the remaining eigenvectors with large eigenvalues could be identified with industry sectors or were found to describe geographical correlations (Gopikrishnan et al., 2001; Plerou et al., 2002). These economically meaningful correlations are stable in time and hence suitable for forecasting future correlations. As correlation matrices are not directly influenced by the volatility dynamics on short time scales, they generally show a higher degree of time stability than covariance matrices, which fluctuate due to time changing volatility.


3. Noise filtering of correlation matrices

As the RMT analysis of correlation matrices discussed in the last section suggests that economically relevant information resides mostly in the large eigenvalues and the corresponding eigenvectors, a filtering process keeping only this information and discarding the noise in the rest of the spectrum is called for. In this section, a filtering algorithm based on principal component analysis is described. A review of traditional methods for forecasting correlation matrices can be found in Elton and Gruber (1995) and Campbell et al. (1997).

In order to use only the statistically relevant information in the correlation matrix C, we diagonalize C by an orthogonal transformation U via U^T C U = Λ. Here, the kth column of U is the eigenvector u^(k) with eigenvalue λ_k. We construct a filtered correlation matrix C^p by keeping only the p largest eigenvalues in the diagonal matrix (Rosenow et al., 2002)

    Λ^p_ii = Λ_ii for i > N − p,  and  Λ^p_ii = 0 for i ≤ N − p,    (3)

and by transforming back to the original basis, C^p = U Λ^p U^T. To satisfy the requirement that every time series is fully correlated with itself, the diagonal elements of C^p are set to one. Here, p should be chosen such that λ_p is the smallest eigenvalue clearly above the upper bound λ_+ of the noise spectrum. For instance, in the eigenvalue spectrum of Fig. 4 there is only one eigenvalue clearly above the upper edge of the noise part of the spectrum, and two eigenvalues are close to the upper edge.
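The filtering step of Eq. (3) takes only a few lines of numpy; the following sketch is ours (the function name and the toy one-factor matrix are illustrative, not from the article):

```python
import numpy as np

def filter_correlation(C, p):
    """Keep the p largest eigenvalues of C (Eq. 3), zero out the rest,
    transform back, and set the diagonal to one."""
    eigvals, U = np.linalg.eigh(C)       # eigenvalues in ascending order
    lam_p = np.zeros_like(eigvals)
    lam_p[-p:] = eigvals[-p:]            # Lambda^p of Eq. (3)
    Cp = U @ np.diag(lam_p) @ U.T        # back to the original basis
    np.fill_diagonal(Cp, 1.0)            # each series fully correlated with itself
    return Cp

# toy 4x4 one-factor correlation matrix, keep a single principal component
C = np.full((4, 4), 0.3)
np.fill_diagonal(C, 1.0)
Cp = filter_correlation(C, p=1)
print(np.round(Cp, 3))
```

For this uniform toy matrix the largest eigenvalue is 1.9 with eigenvector components 1/2, so the filtered off-diagonal elements all equal 1.9 × (1/2)² = 0.475.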

Alternatively, the rationale behind this filtering procedure can be understood by applying principal component analysis to the normalized returns ε̃_{i,t} = (ε_{i,t} − ⟨ε_i⟩)/σ_i. Due to the normalization, we have C = Cov({ε̃_i}), i.e. the correlation matrix is the covariance matrix of the normalized returns. After rank ordering the eigenvalues, the eigenvectors u^(N), u^(N−1), …, u^(N−p+1) correspond to the p largest eigenvalues λ_N, λ_{N−1}, …, λ_{N−p+1}. The eigenvectors define principal components w^(k) = ε̃ · u^(k), where ε̃ is the T by N matrix of time series. Using the p principal components with the largest eigenvalues, the normalized time series can be decomposed as

    ε̃_{i,t} = Σ_{l=0}^{p−1} u_i^(N−l) w_t^(N−l) + r_{i,t},    (4)

where the residuals r_{i,t} are defined by this decomposition. As the principal components w^(k), N − p < k ≤ N, are uncorrelated among each other and with the residuals r_i, the correlation matrix C can be expressed as

    C = U Λ^p U^T + Cov({r_i}).    (5)

However, according to the RMT analysis of C, p was chosen such that the p largest principal components capture all the correlation information contained in C. Hence, the off-diagonal elements of Cov({r_i}) only contain noise, and the residual time series r_{i,t} should be considered uncorrelated, i.e. their covariance matrix should have the variances σ²_{r_i} on the diagonal and be zero elsewhere. Thus, the filtered correlation matrix is given by

    C^p = U Λ^p U^T + diag({σ²_{r_i}}).    (6)

However, adding the variances of the residuals on the diagonal amounts to just setting the diagonal elements equal to one by hand. This filtering approach is similar to principal component based approaches like O-GARCH in that it uses only the most relevant principal components for the description of correlations. However, there are two differences. First, the residual variances {σ²_{r_i}} are kept and only the residual covariances are discarded; in this way, the method works well for weakly correlated time series, where O-GARCH may suffer from identification problems. Second, the number of relevant principal components is derived from theory and not from a somewhat arbitrary empirical rule.
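The decomposition of Eqs. (4)–(5) can be verified numerically; the following sketch (ours, using an assumed one-factor toy data set) checks that C = U Λ^p U^T + Cov({r_i}) holds exactly in sample:

```python
import numpy as np

rng = np.random.default_rng(1)
T, N, p = 400, 10, 2

# toy "returns" with one common factor, then normalized to zero mean, unit std
factor = rng.standard_normal((T, 1))
X = factor + 0.8 * rng.standard_normal((T, N))
X = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

C = (X.T @ X) / (T - 1)              # correlation matrix of normalized returns
eigvals, U = np.linalg.eigh(C)       # eigenvalues in ascending order
Up = U[:, -p:]                       # eigenvectors of the p largest eigenvalues
W = X @ Up                           # principal components w^(k) of Eq. (4)
residuals = X - W @ Up.T             # r_{i,t} defined by the decomposition

C_pca = Up @ np.diag(eigvals[-p:]) @ Up.T
C_res = (residuals.T @ residuals) / (T - 1)
print(np.allclose(C, C_pca + C_res))  # prints True: Eq. (5) holds in sample
```

The identity is exact because the sample covariance between the retained principal components and the residuals vanishes by construction.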

4. Empirical illustration of noise filtering

The quality of correlation predictions can be judged either with respect to their statistical significance or with respect to their economic significance. While the statistical significance of a correlation prediction is related to the difference between prediction and realization with respect to a given matrix norm, the economic significance is related to the ability to predict the risk of a portfolio. Most studies so far have investigated the economic significance of correlation forecasts by using Markowitz (1959) portfolio theory to design minimum variance portfolios and by comparing the predicted risk of these portfolios to their realized risk. To empirically demonstrate the virtues of noise filtering as described in the last section, we perform an analysis in the spirit of Laloux et al. (2000) and Rosenow et al. (2002).

Specifically, we study (i) the error in predicting the variance of an equally distributed portfolio, (ii) the error in predicting the variance of a minimum variance portfolio, and (iii) the realized variance of minimum variance portfolios. In order to disentangle the influence of correlation predictions from the influence of variance predictions, we always use the actually realized variances to calculate covariance matrices from correlation matrices. In this way, only the accuracy of the correlation predictions determines the final result, and possible errors in the prediction of variances do not interfere. While a variance prediction for equally distributed portfolios is only sensitive to the average correlation strength, variance predictions for minimum variance portfolios depend on the whole structure of the correlation matrix and are a more powerful test of correlation predictions.

We define a portfolio as a selection of N stocks with weights m_i and the constraint Σ_{i=1}^{N} m_i = 1. The variance D² of such a portfolio is given by

    D² = Σ_{i,j=1}^{N} S_ij m_i m_j  with  S_ij = C_ij σ_i σ_j.    (7)

Here, the {σ_i} are the standard deviations of the individual stocks. In an equally distributed portfolio, all stocks have equal weight m_i ≡ 1/N, whereas in a minimum variance portfolio, the weights are given by

    m_i = Σ_{j=1}^{N} (S^{-1})_ij / Σ_{j,k=1}^{N} (S^{-1})_jk.    (8)

For both equally weighted and minimum variance portfolios, we calculate the estimated variance D²_est, the realized variance D²_real, and the mean absolute percentage error (MAPE) |D²_est − D²_real| / D²_real.
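Eqs. (7) and (8) translate directly into code; in the sketch below, the function names and the toy correlation matrices are our illustrative assumptions, not the paper's data:

```python
import numpy as np

def min_variance_weights(S):
    """Minimum variance weights of Eq. (8); they sum to one."""
    S_inv = np.linalg.inv(S)
    return S_inv.sum(axis=1) / S_inv.sum()

def portfolio_variance(S, m):
    """Portfolio variance D^2 of Eq. (7)."""
    return m @ S @ m

sigma = np.array([0.2, 0.3, 0.25])       # individual standard deviations
C_est = np.array([[1.0, 0.4, 0.2],
                  [0.4, 1.0, 0.3],
                  [0.2, 0.3, 1.0]])
S_est = C_est * np.outer(sigma, sigma)   # S_ij = C_ij sigma_i sigma_j

m = min_variance_weights(S_est)
D2_est = portfolio_variance(S_est, m)

# a (hypothetical) realized correlation matrix differs from the estimate;
# the MAPE measures the relative forecast error
C_real = np.array([[1.0, 0.5, 0.3],
                   [0.5, 1.0, 0.35],
                   [0.3, 0.35, 1.0]])
S_real = C_real * np.outer(sigma, sigma)
D2_real = portfolio_variance(S_real, m)
mape = abs(D2_est - D2_real) / D2_real
print(f"weights sum: {m.sum():.3f}, MAPE: {mape:.3f}")
```

By construction, the minimum variance weights cannot yield a larger estimated variance than the equally distributed portfolio under the same covariance estimate.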

For our empirical study, we use daily data for the years 1991–2000 for the 238 most actively traded German stocks. We compare different predictions for the actually realized correlation matrices C_real of five one year testing periods during the years 1996–2000. To obtain forecasts for them, we calculate correlation matrices C_est for (i) a one year estimation period prior to the testing period, and (ii) a five year estimation period prior to the testing period. For the testing year 2000, the eigenvalue spectra of the one-year estimate, the five-year estimate, and the realized matrix are shown in Fig. 1 together with the respective random matrix predictions. We make two key observations. First, the largest eigenvalue λ_max varies significantly, from 39.0 for the five year estimate 1995–1999 to 22.0 for the one-year estimate 1999 to 16.2 for the one-year realization 2000. Second, due to the smaller width of the random matrix part of the eigenvalue spectrum, there are significantly more eigenvalues above the upper bound for the five year estimate (nine eigenvalues) than for the one-year estimate (five eigenvalues). From the first observation it follows that a short estimation period is desirable in order to accurately track changes in the overall correlation strength described by the largest eigenvalue, whereas the second observation implies that estimation noise is more pronounced for short estimation periods.

The empirical results are shown in Tables 1 (one year estimation period) and 2 (five year estimation period). For each estimation period, we calculate the number p_rmt of eigenvalues larger than the upper bound λ_+ of the random matrix part of the spectrum. We compare this with the number p_opt of eigenvalues kept in the filtering algorithm which minimizes the MAPE of the variance of minimum variance portfolios. For the one year estimation periods, we find that p_rmt and p_opt agree closely with each other, while we find reasonable agreement for the five year estimation periods. The discrepancy between p_rmt and p_opt for the five year estimation periods 1991–1995 and 1992–1996 and the respective testing periods hints at the possibility that including only eigenvalues clearly separated from the bulk (about eight in both cases) might be a better choice for practical applications than the inclusion of all eigenvalues larger than λ_+. The full dependence of the MAPE on p for the 2000 testing period is shown in Fig. 2; the MAPE becomes minimal for p ≈ p_rmt in that case. For both one year and five year forecasts, the relative forecast error MAPE_rmt obtained from filtered matrices with p_rmt eigenvalues kept is significantly smaller than the forecast error MAPE_N for unfiltered estimates with p = N.

In addition, we have calculated the predicted risk D²_rmt of minimum variance portfolios with p = p_rmt, and the realized risk D²_rmt,real for these portfolios. When comparing the one year predictions with the five year ones, we find that the five year


Fig. 1. Eigenvalue spectrum of the cross-correlation matrix from daily data for 238 German stocks (grey line): (a) for the one year prediction period 1999, (b) for the five year prediction period 1995–1999, and (c) for the testing period 2000. The noise part of the spectrum (black line) of the five year period (b) is compressed as compared to the noise part of the spectrum for the one year period (a), and more eigenvalues are outside the noise part in (b) than in (a). In addition, the largest eigenvalue λ_max in (a) is closer to the realization (c) than that of (b).


predictions have a smaller prediction error four out of five times, but that the one year predictions have a realized risk about twenty percent smaller four out of five times. We interpret the former finding as an indication of a higher information content of the five year predictions, and the latter as evidence for a more accurate prediction of the actual correlation strength by the one year predictions. The dependence of the realized risk of minimum variance portfolios on the number p of eigenvalues kept in the filtering algorithm is shown in Fig. 3; the realized risk


Table 1
Estimation of the variance of minimum variance (upper part) and of equally distributed (lower part) portfolios, one year estimation period

Test period          2000    1999    1998    1997    1996
p_rmt                   5       5       4       4       5
p_opt                   3       5       5       4       6
D²_est,rmt × 10⁵    0.303   0.407   0.544   0.437   0.292
D²_real,rmt × 10⁵   0.403   0.553   1.09    0.748   0.502
MAPE_rmt            0.249   0.264   0.501   0.415   0.418
D²_est,N × 10⁵      0.022   0.023   0.020   0.027   0.016
D²_real,N × 10⁵     5.43    11.6    15.8    9.58    17.4
MAPE_N              0.996   0.998   0.999   0.997   0.999
λ_max,est           22.0    39.0    43.4    30.5    31.8
λ_max,real          16.2    22.0    39.0    43.4    30.5
MAPE_rmt,equal      0.341   0.995   0.025   0.337   0.142
MAPE_N,equal        0.299   0.960   0.010   0.354   0.121

For the calculation of minimum variance portfolios, variances realized in the testing period were used.

Table 2
Estimation of the variance of minimum variance (upper part) and of equally distributed (lower part) portfolios, five year estimation period

Test period          2000    1999    1998    1997    1996
p_rmt                   9       9      13      15      19
p_opt                  10       8       8       9       2
D²_est,rmt × 10⁵    0.391   0.438   0.760   0.687   0.449
D²_real,rmt × 10⁵   0.527   0.630   1.04    0.854   0.602
MAPE_rmt            0.258   0.305   0.270   0.195   0.255
D²_est,N × 10⁵      0.332   0.355   0.602   0.500   0.309
D²_real,N × 10⁵     0.646   0.973   1.26    1.15    1.10
MAPE_N              0.486   0.635   0.533   0.564   0.718
λ_max,est           31.8    34.9    32.2    28.6    29.9
λ_max,real          16.2    22.0    39.0    43.4    30.5
MAPE_rmt,equal      1.082   0.814   0.174   0.252   0.364
MAPE_N,equal        1.032   0.775   0.192   0.269   0.333

For the calculation of minimum variance portfolios, variances realized in the testing period were used.


becomes minimal for p ≈ p_rmt. For comparison, we have calculated D²_est,N and D²_real,N for unfiltered estimates with p = N. For both one and five year predictions, one finds without exception that the filtered estimates have a significantly lower realized risk than the unfiltered prediction.

For filtered correlation matrices with p_rmt eigenvalues kept, we calculate the prediction error MAPE_rmt,equal for the variance of equally distributed portfolios. One sees that this prediction error is small when the predicted and realized largest eigenvalues are close to each other, and large when they are far apart. It seems that an accurate prediction of the largest eigenvalue is important for the estimation of the variance of equally distributed portfolios. On average, correlation estimates from one year


Fig. 2. MAPE of the one year prediction period (grey line) and the five year prediction period (black line) for forecasting the volatility of minimum variance portfolios for the year 2000, as a function of the number p of eigenvalues kept in the filtering algorithm.

Fig. 3. Realized variance of minimum variance portfolios in the year 2000 for a one year prediction period (grey line) and a five year prediction period (black line), as a function of the number p of eigenvalues kept in the filtering algorithm.


periods are more accurate than correlation estimates from five year periods. When comparing the prediction error MAPE_N,equal of unfiltered correlation matrices with that of filtered matrices, we find no significant difference. It is to be expected that noise filtering has little influence on the variance of equally distributed portfolios, as this variance is only sensitive to the average strength of correlations, which is not changed by discarding random contributions in the filtering algorithm.

In summary, we draw the following conclusions: (i) filtering correlation matrices while keeping all eigenvalues larger than the upper bound of the RMT spectrum significantly increases the accuracy of correlation estimates as compared to unfiltered matrices, and (ii) a short estimation period is desirable to capture the dynamical evolution of the correlation strength.
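The filtering algorithm itself is described in Section 3, which is not part of this excerpt. A common variant of RMT filtering (keep the eigenvalues above the Marchenko-Pastur edge, replace the noisy bulk by its average so that the trace is preserved, and renormalize the diagonal to one) can be sketched as follows; the function names and the trace-preserving convention are assumptions, not the paper's exact algorithm:

```python
import numpy as np

def mp_upper_edge(T, N, sigma2=1.0):
    """Marchenko-Pastur upper edge lambda_+ = sigma^2 (1 + sqrt(N/T))^2."""
    return sigma2 * (1.0 + np.sqrt(N / T)) ** 2

def filter_correlation(C, T, p=None):
    """Keep the p largest eigenvalues of the correlation matrix C (by default
    all eigenvalues above the RMT edge), replace the noisy bulk by its average
    so the trace is preserved, then renormalize to unit diagonal."""
    N = C.shape[0]
    lam, vec = np.linalg.eigh(C)              # eigenvalues in ascending order
    if p is None:
        p = int(np.sum(lam > mp_upper_edge(T, N)))
    keep = lam.copy()
    if p < N:
        keep[:N - p] = lam[:N - p].mean()     # bulk replaced by its average
    Cf = (vec * keep) @ vec.T                 # vec @ diag(keep) @ vec.T
    d = np.sqrt(np.diag(Cf))
    return Cf / np.outer(d, d)                # renormalize to unit diagonal
```

A matrix filtered this way remains a valid correlation matrix (symmetric, unit diagonal) while discarding the statistically insignificant bulk of the spectrum.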

5. MV-GARCH models

The estimation of large dimensional covariance matrices is a key element in the process of estimating and minimizing portfolio risk. Although modern risk management is founded on the analysis of huge databases and the use of sophisticated theoretical models, there has been only partial progress with respect to covariance forecasts. The reason for this lack of progress is the ‘curse of dimensionality’, as in modern financial engineering the dimension of the investment universe is comparable to the number of observations.

Much research has been devoted to describing the volatility process with the help of MV-GARCH models, which relate present volatilities to both past volatilities and past returns squared. Here, MV-GARCH models are reviewed with emphasis on both the dimensionality of their parameter space and the possibility of estimating them for a large number of time series.

In the most general setup of an MV-GARCH model, each covariance is linked to all other covariances, and the number of parameters scales like the number N of assets to the fourth power. Linking a given covariance only to past values of itself and the respective product of returns, the number of parameters is reduced to O(N^2) in the diagonal VECH model (Bollerslev et al., 1988). Another model specification with O(N^2) parameters is obtained when the covariance matrices in a GARCH setup are transformed by quadratic forms (Engle and Kroner, 1995). If these transformation matrices are of rank one, one obtains a one-factor model with O(N) parameters to be estimated simultaneously. In Engle et al. (1990), such a model was applied to the pricing of Treasury bills by using factors with prespecified weights. In the CCC model of Bollerslev, N univariate GARCH processes for the individual time series are estimated. The covariance matrix is obtained by multiplying the volatilities of the individual time series with the unconditional correlation matrix of the GARCH residuals (Bollerslev, 1990).

By transforming the observed components of a time series to uncorrelated components via an orthogonal transformation, the volatility dynamics can be described by specifying univariate GARCH processes for these uncorrelated components and then transforming back to the original basis. If the transformation


matrix of such an O-GARCH model is estimated from unconditional information like principal components of the correlation matrix, the parameters describing the volatility dynamics of the uncorrelated components can be estimated sequentially. In applications to strongly correlated time series like term structure data or commodity futures with different maturities, it can be sufficient to keep O(1) uncorrelated components (Alexander, 2002).

While working well for highly correlated data sets, O-GARCH suffers from an identification problem for weakly correlated time series. By relaxing the orthogonality condition on the transformation relating correlated observed and uncorrelated unobserved components, generalized orthogonal GARCH (GO-GARCH) (van der Weide, 2002) is able to avoid the above-mentioned identification problems. However, the nonorthogonal transformation is estimated from conditional information and thus the estimation complexity increases. The method was shown to work well when applied to modeling a small number of time series.

Bollerslev’s CCC model has been generalized to allow for a time evolution of the correlation matrix with a GARCH-like structure (Tse and Tsui, 2002). Testing this varying correlation MGARCH (VC-MGARCH), the above authors indeed find evidence that the assumption of constant correlations has to be rejected. However, in the present specification of the model, O(N^2) parameters have to be estimated simultaneously, making VC-MGARCH unsuitable for the estimation of high dimensional covariance matrices.

In the dynamical conditional correlation model (DCC-GARCH) (Engle and Sheppard, 2001; Engle, 2002), the authors suggest first estimating GARCH parameters for the individual time series. Next, a GARCH process for the covariance matrix of the standardized residuals is estimated. Here, the unconditional covariance matrix of the residuals is used as the constant term, and the GARCH parameters for all matrix elements are the same. Hence, only O(1) parameters have to be estimated in the second step. For this reason, the model is suitable for application to a large number of financial assets and is the only one of the above-mentioned models that has actually been studied empirically for up to one hundred time series.
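For illustration, the second step described above can be sketched with the standard scalar DCC correlation recursion (a sketch of the published model, not of code from this paper; the parameter values a and b and the helper name are illustrative):

```python
import numpy as np

def dcc_correlations(z, a=0.05, b=0.93):
    """Scalar DCC recursion for standardized residuals z (shape T x N):
    Q_t = (1 - a - b) * Qbar + a * z_{t-1} z_{t-1}' + b * Q_{t-1},
    with Qbar the unconditional covariance of z; a and b are the O(1)
    parameters shared by all matrix elements. Returns the correlation path."""
    T, N = z.shape
    Qbar = np.cov(z, rowvar=False, bias=True)
    Q = Qbar.copy()
    R = np.empty((T, N, N))
    for t in range(T):
        if t > 0:
            Q = (1 - a - b) * Qbar + a * np.outer(z[t - 1], z[t - 1]) + b * Q
        d = 1.0 / np.sqrt(np.diag(Q))
        R[t] = Q * np.outer(d, d)          # rescale Q_t to a correlation matrix
    return R
```

Only the two scalars a and b are estimated in the second step, which is what makes the model tractable for large N.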

In practical applications, the Risk Metrics estimator with exponentially weighted moving averages (Longerstaey and Zangari, 1996) is used frequently. As it has no adjustable parameters, it can easily be applied to the estimation of high dimensional covariance matrices.

6. Model description

The RMT approach to correlation matrices provides a rationale for the use of parsimonious models with a reduced dimensionality of the parameter space. A low dimensional parameter space has the double benefit of more reliable forecasts due to the removal of noise and the applicability of models to a large number of time series due to a reduction of estimation complexity.

Of the above discussed models, the factor model (Engle et al., 1990) and O-GARCH (Alexander, 2002) employ a dimensionally reduced correlation structure.


Despite the usage of a high dimensional correlation matrix, the CCC model (Bollerslev, 1990) and DCC-GARCH (Engle and Sheppard, 2001) can be easily estimated even for a large number of time series. In the following, we suggest two classes of MV-GARCH models, which combine elements of these models with the possibility to use an optimal number p of principal components or factors as determined by an RMT analysis of the correlation matrix.

The basis of the Sliding Correlation Model (SCM) is the decomposition of the covariance matrix into the cross-correlation matrix C and standard deviations {s_i}, as first suggested in Bollerslev (1990). Each individual time series is described by a univariate GARCH(1,1) (Bollerslev, 1986) process defined by

s_{i,t}^2 = a_{i,0} + a_{i,1}\, s_{i,t-1}^2 + b_{i,1}\, \epsilon_{i,t-1}^2 . \qquad (9)

Here, s_{i,t}^2 and ε_{i,t} are the volatility and the innovation of time series i at time t, respectively. The innovations are modeled as a product ε_{i,t} = s_{i,t} · x_{i,t} with normally distributed x ∼ N(0,1). As the emphasis of the present article is on comparing different multivariate volatility models, we make the simplifying normality assumption, although empirical studies suggest the presence of leptokurtosis in GARCH residuals, see e.g. Nelson (1991). The parameters a_{i,0}, a_{i,1}, and b_{i,1} are obtained from maximum likelihood estimations. In order to take into account the dynamics of the correlation strength, the cross-correlation matrix C is calculated as a moving average to accommodate the change of correlations in time. We use the filtering method described in Section 3 and keep only the p largest principal components. The covariance matrix in the SCM(p) of rank p is described by

\Sigma_{ij,t} = C^{p}_{ij,t}\, s_{i,t}\, s_{j,t} . \qquad (10)
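To make the two-layer structure concrete, the variance recursion of Eq. (9) and the covariance assembly of Eq. (10) can be sketched as follows; the initialization at the sample variance and the function names are assumptions, and the maximum likelihood estimation of the parameters is omitted:

```python
import numpy as np

def garch11_variance(eps, a0, a1, b1):
    """Conditional variance path of Eq. (9):
    s2_t = a0 + a1 * s2_{t-1} + b1 * eps_{t-1}^2 (parameterization as in the text)."""
    s2 = np.empty(len(eps) + 1)
    s2[0] = eps.var()                     # initialization is an assumption
    for t in range(1, len(eps) + 1):
        s2[t] = a0 + a1 * s2[t - 1] + b1 * eps[t - 1] ** 2
    return s2                             # s2[-1] is the one-step-ahead forecast

def scm_covariance(C_filtered, s):
    """SCM(p) covariance of Eq. (10): Sigma_ij = C^p_ij * s_i * s_j."""
    return C_filtered * np.outer(s, s)
```

Given the filtered correlation matrix and the vector of GARCH volatilities for one day, `scm_covariance` assembles the model's covariance forecast for that day.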

This model is related to the constant correlation model (Bollerslev, 1990), where a fixed and unfiltered correlation matrix is used, in contrast to our C^p_{ij,t}.

As a second class of models with a reduced dimensionality of the covariance structure, we consider GARCH factor models (Engle et al., 1990). If the market index is one of the factors, these models have the attractive property that the strength of correlations is tied to market volatility. Considering only the market index as a factor for the moment, one finds C_{ij} = s^2_factor b_i b_j / (s_i s_j), where s^2_factor is the volatility of the market factor, s^2_i is the volatility of stock i, and b_i is the regression coefficient of stock i on the market factor. When the market volatility increases more strongly than the individual volatilities, the correlation coefficients increase in such a model, in agreement with empirical observations (Drozdz et al., 2000; Plerou et al., 2002).

We consider GARCH factor models with p factors (abbreviated Factor(p))

\Sigma_{ij,t} = \tilde{s}_{i,t}^2\, \delta_{ij} + \sum_{k=1}^{p} l_i^{(k)} l_j^{(k)} S_t^{(k)} . \qquad (11)

Here, \tilde{s}_{i,t}^2 is the variance of the residual of time series i after linear regression on the p factors f_t^{(k)} with k = 1, …, p, and l_i^{(k)} is the regression coefficient (factor loading) of factor k on time series i. The dynamics of the residuals is described by a parsimonious GARCH(1,1) process, and the dynamics of the factors is


similarly given by

S!k"t $ a!k"0 % b!k"1 S!k"

t#1 % a!k"1 !f !k"t#1"2. (12)

The factors f !k"t are defined via an iterative procedure: we use the largest eigenvectoru!N" of C to define f !1"t $

PNi$1u

!N"i !i;t. Next, we perform a linear regression of f !1" on

the !i and recompute the correlation matrix from the residuals. From the neweigenvector corresponding to the largest eigenvalue, we define a factor f !2", performanother linear regression, and so on. Such an iterative procedure of calculatingfactor loadings was found to provide an intuitive decomposition of correlations inthe American stock market, whereas the additional orthogonality condition imposedon eigenvectors of the correlation matrix can lead to less intuitive results like themixing of two business sectors in one eigenvector (Gopikrishnan et al., 2001; Plerouet al., 2002). In contrast to the correlation matrix of the SCM models, the timeinterval for the calculation of factors and factor loadings is not moving but fixed.For both the SCM and the factor models, the number p of principal components orfactors should be chosen according to the number of eigenvalues of the correlationmatrix outside the RMT-interval.

We compare the SCM(p) and Factor(p) models to the commonly used Risk Metrics covariance estimator (Longerstaey and Zangari, 1996), defined by

\Sigma_{ij,t} = 0.94\, \Sigma_{ij,t-1} + 0.06\, \epsilon_{i,t} \epsilon_{j,t} . \qquad (13)

This model is easy to estimate and provides an example of a model which is completely dominated by estimation noise when applied to a large number of assets: fluctuations of the covariance strength have a decay constant of approximately 16 time steps, implying that for significantly more than 16 time series the estimated covariance matrix is almost singular.
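The recursion of Eq. (13) can be written out directly (a sketch; the initialization at the sample covariance is an assumption, as Eq. (13) does not specify a starting value):

```python
import numpy as np

def riskmetrics_cov(eps, lam=0.94, S0=None):
    """Exponentially weighted covariance of Eq. (13):
    Sigma_t = lam * Sigma_{t-1} + (1 - lam) * eps_t eps_t'."""
    T, N = eps.shape
    S = np.cov(eps, rowvar=False) if S0 is None else S0.copy()
    for t in range(T):
        S = lam * S + (1.0 - lam) * np.outer(eps[t], eps[t])
    return S
```

With lam = 0.94 the effective memory is roughly 1/(1 - 0.94) ≈ 16 observations, which is the decay constant referred to in the text.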

7. Test method

We judge the quality of a multivariate volatility model by its ability to forecast the daily variance of both equally distributed and minimum variance portfolios. The predicted portfolio volatility at time t is given by

D_{t,\mathrm{predicted}}^2 = \sum_{i,j=1}^{N} m_{i,t}\, m_{j,t}\, \Sigma_{ij,t} . \qquad (14)

Here, m_{i,t} is the fraction of the capital invested in stock i at time t. For an equally distributed portfolio, we have m_{i,t} ≡ 1/N, hence D^2 is just the average element of the covariance matrix. This test probes the ability of a model to correctly predict the average covariance between stocks.

On the other hand, the choice

m_{i,t} = \sum_{j=1}^{N} \Sigma_{ij,t}^{-1} \Big/ \sum_{k,l=1}^{N} \Sigma_{kl,t}^{-1} \qquad (15)


minimizes the variance under the constraint that the total invested money is equal to one. The volatility prediction for minimum variance portfolios is a more powerful test than the prediction for equally distributed portfolios, as it probes the prediction not only of the average covariance, but of the whole covariance structure: the covariance matrix is first used to calculate the portfolio weights, and then used to estimate the variance of that portfolio.
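Eqs. (14) and (15) translate into a few lines of linear algebra (a sketch; solving the linear system avoids forming the explicit inverse, which is a numerical choice of ours, not of the paper):

```python
import numpy as np

def min_variance_weights(Sigma):
    """Weights of Eq. (15): row sums of Sigma^{-1}, normalized to add to one."""
    ones = np.ones(Sigma.shape[0])
    w = np.linalg.solve(Sigma, ones)      # computes Sigma^{-1} 1
    return w / w.sum()

def predicted_portfolio_variance(Sigma, m):
    """Eq. (14): D^2 = m' Sigma m."""
    return m @ Sigma @ m
```

For a diagonal covariance matrix the weights are inversely proportional to the individual variances, so low-risk assets receive more capital, as expected.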

For both types of portfolios, we compare the predicted volatility for a given day to the realized volatility on that day, which is calculated from high frequency (hourly) data. This type of comparison was shown to provide a good assessment of the prediction accuracy of univariate GARCH models (Anderson and Bollerslev, 1997). The realized portfolio variance at time t is calculated as

D_{t,\mathrm{realized}}^2 = \sum_{i,j=1}^{N} m_{i,t} m_{j,t} \bigl( \langle g_{i,t} g_{j,t} \rangle - \langle g_{i,t} \rangle \langle g_{j,t} \rangle \bigr) , \qquad (16)

where the expectation values are taken over the 10 rescaled hourly returns g_{i,t}(h) = \sqrt{10}\, \ln[ s_{i,t}(h+1) / s_{i,t}(h) ] per trading day, and s_{i,t}(h) is the price of stock i on day t and hour h = 0, …, 10. We calculated the average predicted portfolio variance, the average realized variance, the mean square error (MSE) of the prediction, and the MAPE.

For equally distributed portfolios, the quality of a volatility model is judged by its tracking error with respect to the ‘true’ portfolio volatility calculated from intraday data. For minimum variance portfolios, the different volatility models are judged with respect to their ability (i) to produce portfolios with a low average variance, and (ii) to correctly predict this variance.

8. Empirical results

For our empirical analysis, we use two different data sets. Data set one comprises daily closing prices of the 118 most frequently traded German stocks for the period from 12/01/93 until 08/31/01. Data set two contains hourly returns for the same stocks for the period 09/01/00 until 08/31/01; it is used for the estimation of daily volatilities in an out of sample test. The time period 12/01/93 until 08/31/00 contained in data set one is used for parameter estimation, and the year 09/01/00 until 08/31/01 for out of sample covariance predictions from the SCM and factor models. The 118 stocks have been chosen in such a way that for the less frequently traded ones there is at least one transaction per hour in fifty percent of the one-hour intervals.

In a first step, we use the estimation period 12/01/93 until 08/31/00 contained in data set one for the calculation of correlation matrices, whose eigenvalue spectrum is then compared to RMT predictions. We define daily returns ε_{i,t} = ln s_{i,t} − ln s_{i,t−Δt} with Δt = 1 day, and calculate the sample covariance matrix

\Sigma_{ij} = \frac{1}{T-1} \sum_{t=1}^{T} \bigl( \epsilon_{i,t} - \langle \epsilon_i \rangle \bigr) \bigl( \epsilon_{j,t} - \langle \epsilon_j \rangle \bigr) . \qquad (17)


The cross-correlation matrix C is defined as C_{ij} = \Sigma_{ij} / (s_i s_j) by normalizing with the standard deviations s_i = \sqrt{\Sigma_{ii}}. We calculate cross-correlation matrices C_a, C_b from:

(a) T = 1711 daily returns ε_{i,t} starting 12/01/93 until 08/31/00, and (b) a subset of T = 250 daily returns starting 09/01/99 until 08/31/00. We diagonalize C_a, C_b and rank-order their eigenvalues λ_i < λ_k for i < k. The eigenvalue pdf of C_b is displayed in Fig. 4. The bulk of eigenvalues is due to noise and well described by the prediction of RMT (black line). The largest eigenvalue λ_{max,b} = 7.1 is clearly above the RMT prediction; two more eigenvalues are just separated from the bulk. The spectrum of C_a looks similar to that of C_b, with the main difference that the largest eigenvalue λ_{max,a} = 14.5 is much larger than in the case of the one year interval. This confirms the result of Plerou et al. (1999, 2002) and Drozdz et al. (2000) that the strength of market correlations changes in time. In both cases, we find that three eigenvalues lie outside the random matrix interval, but only one of them clearly. In previous empirical studies (Gopikrishnan et al., 2001; Plerou et al., 2002) it was found that the time stability of eigenvectors decreases when the corresponding eigenvalue approaches the upper edge of the RMT spectrum. In agreement with this result, the prediction quality of filtered correlation matrices calculated according to the algorithm described in Section 3 does not change much when eigenvalues barely above the RMT edge are included, whereas it decreases with the inclusion of eigenvalues below the RMT edge, see Rosenow et al. (2002) and the discussion in Section 4. As a conclusion, it seems best to choose p such that only eigenvalues clearly above the upper RMT edge are included. For the data sets studied here, this implies that p = 1 should yield the best results, and p = 2 should be as good. Choosing p > 3 should lead to a decrease in prediction quality. As the estimation of GARCH processes is too complex for considering many different values of p, we restrict ourselves to p = 1, 2 for both SCM and factor models, and in addition p = N for the SCM models.


Fig. 4. Probability distribution of the eigenvalues of the cross-correlation matrix C_b calculated from 118 daily return time series from 09/01/99 until 08/31/00 (grey line). The bulk of eigenvalues is due to noise and well described by the prediction of random matrix theory (black line). One eigenvalue is clearly separated from the bulk and contains information about market correlations; two more eigenvalues are just separated from the bulk.


From Eq. (2) one sees that the length of the interval [λ−, λ+] shrinks to zero when Q → ∞. The smaller Q gets, the larger is the influence of noise, and fewer eigenvalues of the empirical matrix are expected to lie above the RMT prediction. At first sight, the conclusion seems to be that the time period over which the cross-correlation matrix is estimated should be as long as possible. However, we have seen that the largest eigenvalue, and hence the strength of market correlations, changes in time, as λ_{max,a} = 14.5 for the seven year interval is much larger than λ_{max,b} = 7.1 for the one year interval. The insight gained from an RMT analysis of cross-correlations helps in striking a compromise between the requirement of a long estimation window for increased statistical accuracy and a short estimation window for capturing the dynamics of cross-correlations.
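Eq. (2) is not reproduced in this excerpt; assuming it is the standard Marchenko-Pastur result λ± = σ²(1 ± √(1/Q))² with Q = T/N, the shrinking of the noise interval with growing Q can be checked directly:

```python
import math

def rmt_interval(Q, sigma2=1.0):
    """Marchenko-Pastur bounds lambda_± = sigma^2 (1 ± sqrt(1/Q))^2 for Q = T/N
    (assumed to be the content of the Eq. (2) referred to in the text)."""
    r = math.sqrt(1.0 / Q)
    return sigma2 * (1.0 - r) ** 2, sigma2 * (1.0 + r) ** 2
```

For the one year window (Q ≈ 250/118 ≈ 2.1) the upper edge is about 2.9, while for the seven year window (Q ≈ 1711/118 ≈ 14.5) it is only about 1.6, so fewer eigenvalues count as signal in the short window even though the noise band is wider.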

The data set for parameter estimation comprises seven years of daily data from 12/01/93 until 08/31/00. These data are used for the maximum likelihood estimation of all GARCH parameters and for the calculation of factors and factor loadings of the models Factor(p)-seven. For the models Factor(p)-one, only the last year of daily data is used for estimating factors and factor loadings, but the full data set is used for estimating GARCH parameters. The cross-correlation matrix of model SCM(p)-seven is calculated over a sliding time window of length seven years, and a sliding window of length one year is used for model SCM(p)-one. We would like to note that the estimation of a given SCM or factor model takes less than half an hour on a PC. We perform an out of sample test from 09/01/00 until 08/31/01.

Results for equally distributed portfolios: all models under study overestimate the realized variance. The SCMs with an estimation window of seven years have the largest forecast error. For these models, the average estimated variance is about 65 percent larger than the average realized variance, and the MAPE ranges from 86% to 92%. It seems that the origin of this error is the overestimation of cross-correlations, which apparently were much higher in the seven years preceding the test period than in the test period itself, see the discussion of the largest eigenvalue of seven- and one-year correlation estimates. A similarly strong overestimation of the variance of minimum variance portfolios was found for the five year estimation period 1995–1999, when applied to the testing period 2000, see Table 2.

The SCMs with a one year estimation window perform much better than those with the seven year window: the average estimated variance is only 22% and 26% higher than the realized one, and the MAPE lies between 46% and 49%. Their prediction accuracy is higher than that of the Risk Metrics estimator, which has a MAPE of 53%. For all SCM models, dimensional reduction is unimportant for forecasting equally distributed portfolios, i.e. the results are similar for p = 1, 2, and N. This is expected, as only the average matrix element is probed, and reducing the covariance matrix to its average is already an extreme form of dimensional reduction in itself.

In contrast to SCM models, the performance of factor models with a seven year estimation period is comparable to that of models with a one year estimation period. This result indicates that factor models are indeed able to describe the time dependence of the average correlation strength. The average predicted portfolio


variance is between 43% and 45% larger than the average realized variance, and the MAPE ranges from 61.5% to 62.5%. This prediction accuracy lies between that of the SCM-one models, which provide the best forecasts, and the SCM-seven models, which provide the worst forecasts. However, from the day by day comparison between predicted and realized variance in Fig. 5, one can see that the factor model is able to reproduce the effect of volatility shocks better than the SCM. Details for all models can be found in Table 3.

Minimum variance portfolios: Here, we need to distinguish between (i) the ability of a model to produce portfolios with a low average realized variance, and (ii) the ability to correctly predict this variance. With respect to (i), by far the best results are obtained with models SCM(1)-one and SCM(2)-one: the realized variance is 47% lower than that of an equally distributed portfolio, a quite remarkable achievement. In contrast, the average variance of model SCM(N)-one without any noise filtering is only 28% lower than that of an equally distributed portfolio. The influence of


Fig. 5. Prediction of daily volatility of equally weighted portfolios. The black line shows the estimated volatility, the thin grey line the realized volatility calculated from high frequency data, and the thick grey line is a seven-day average of the realized volatility. (a) Sliding correlation model SCM(1)-seven, which uses one principal component and is estimated over a seven-year period, overestimates the variance by 65% due to an overestimation of the average correlation strength. (b) Model Factor(1)-seven, which uses one factor and is estimated over a seven-year period, is able to describe changes in the market volatility but tends to overestimate volatility in the time after a volatility peak.


estimation noise on minimum variance portfolios is most pronounced in the Risk Metrics result: the average realized portfolio variance is 14% larger than that of an equally distributed portfolio, i.e. minimizing the variance with respect to the Risk Metrics estimator leads to an increase of risk relative to an equally distributed portfolio. The reduction of average variance achieved with the SCM-seven models lies between 38% and 40%; here, the influence of noise filtering is much less pronounced than for the SCM-one models, as the ratio of time series length to number of assets is larger by a factor of seven. The results for the various factor models are again similar to each other: the average realized variance is between 21% and 24% lower than that of an equally distributed portfolio, see Table 4.


Table 3
Estimation of the variance of equally distributed portfolios with MV-GARCH models

Model             D^2_est × 10^4   D^2_real × 10^4   MSE × 10^8   MAPE
SCM(1)-seven      0.654            0.396             0.092        0.886
SCM(2)-seven      0.666            0.396             0.098        0.916
SCM(N)-seven      0.643            0.396             0.086        0.856
SCM(1)-one        0.497            0.396             0.037        0.486
SCM(2)-one        0.500            0.396             0.037        0.493
SCM(N)-one        0.484            0.396             0.034        0.456
Factor(1)-seven   0.568            0.396             0.063        0.615
Factor(2)-seven   0.576            0.396             0.065        0.634
Factor(1)-one     0.571            0.396             0.062        0.618
Factor(2)-one     0.575            0.396             0.064        0.629
Risk Metrics      0.519            0.396             0.057        0.535

Table 4
Estimation of the variance of minimum variance portfolios with MV-GARCH models

Model             D^2_est × 10^4   D^2_real × 10^4   MSE × 10^8   MAPE
SCM(1)-seven      0.177            0.243             0.022        0.335
SCM(2)-seven      0.188            0.238             0.025        0.415
SCM(N)-seven      0.190            0.245             0.029        0.409
SCM(1)-one        0.158            0.208             0.016        0.340
SCM(2)-one        0.157            0.208             0.017        0.338
SCM(N)-one        0.128            0.284             0.046        0.479
Factor(1)-seven   0.215            0.304             0.033        0.318
Factor(2)-seven   0.238            0.303             0.035        0.375
Factor(1)-one     0.212            0.300             0.031        0.319
Factor(2)-one     0.220            0.312             0.033        0.305
Risk Metrics      0.013            0.453             0.241        0.967


With respect to the ability to predict the realized variance, the effect of noise filtering on models with a comparatively short estimation period is quite dramatic: whereas the models SCM(1)-one and SCM(2)-one have a MAPE of only 34%, SCM(N)-one without noise filtering has a MAPE of 47%, and the Risk Metrics estimator has a MAPE of 97%, i.e. the realized risk is larger by a factor of 35 as compared to the predicted risk (Fig. 6a). The explanation for this poor performance is the short decay constant of approximately 16 days in its exponential smoothing average. The resulting Q ≈ 0.15 explains that the Risk Metrics covariance estimator is almost entirely dominated by noise.

On the other hand, for the SCM models with a seven year estimation period, the effect of noise filtering is not very pronounced: SCM(N)-seven has a MAPE of 40.9%, which lies between that of SCM(1)-seven and SCM(2)-seven. The fact that noise filtering is not necessary for the seven year estimation period is due to a large


Fig. 6. Prediction of daily volatility of minimum variance portfolios. The black line shows the estimated volatility, the thin grey line the realized volatility calculated from high frequency data, and the thick grey line is a seven-day average of the realized volatility. (a) The Risk Metrics estimator, Eq. (13), predicts a very small variance for a minimum variance portfolio, while the realized variance is a factor 35 larger. (b) Sliding correlation model SCM(1)-one uses one principal component and estimates correlations over a sliding time window of one year. The predicted variance is close to the realized one, which is 47% smaller than for an equally distributed portfolio.


Q ≈ 14.5 for the seven year window as compared to Q ≈ 2.1 for the one year window, in agreement with the findings of Pafka and Kondor (2003).

The factor models do a good job of predicting the realized variance; their MAPE varies between 31.8% and 37.5%. However, the large absolute portfolio variance of these models puts this good prediction accuracy in perspective.

9. Discussion

Analyzing the results of the last section, we find two main tendencies. First, for a good covariance prediction, it is necessary to describe the time dependence of the correlation strength correctly, which can be achieved by choosing a relatively short estimation period for correlations, see also the results of Section 4. Second, a short estimation period for correlations induces noise, which needs to be removed by filtering. The clear winners of the comparison are the models SCM(1)-one and SCM(2)-one. These two models have an excellent prediction accuracy for both equally distributed and minimum variance portfolios. In addition, the realized risk of the respective minimum variance portfolios is by far the lowest of all models considered; it is 47% lower than that of an equally distributed portfolio. In agreement with the findings in Rosenow et al. (2002), it is a good choice to keep only eigenvalues of the correlation matrix which lie clearly above the random matrix bound, i.e. one eigenvalue in our example. It is expected that the inclusion of a third eigenvalue (barely above the RMT bound) would not change the performance much, and that the inclusion of further eigenvalues would lead to a gradual decrease in model performance, as observed in Rosenow et al. (2002) and in Section 4.

The test method in this study is not altogether different from one of the tests performed in Engle and Sheppard (2001), and a rough comparison between our results and those for DCC-GARCH is possible. Among other tests, Engle and Sheppard (2001) calculated the returns of a minimum variance portfolio for one hundred time series of S&P 500 sector indices over six years of daily data as an in-sample test. These returns were normalized with the predicted standard deviation every day, and the standard deviation of these normalized portfolio returns was calculated. The square of this standard deviation is roughly comparable to the ratio of average realized variance to average predicted variance in the present study. For DCC-GARCH, the normalized portfolio returns have a squared standard deviation of 3.17, which is larger than the ratio of 1.51 between average predicted and average realized variance for the models SCM(1)-one and SCM(2)-one. Although a direct comparison between SCM and DCC models is clearly desirable, this finding hints at the possibility that DCC-GARCH could benefit from noise filtering as discussed in the present manuscript. It would easily be possible to apply noise filtering as suggested in Eq. (3) to the covariance matrix of GARCH residuals in the DCC setup at every time step.
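The normalized-return test described above can be sketched as follows. This is a schematic reconstruction, not the implementation of either paper: the data layout (one predicted covariance matrix per day) and the function names are assumptions. A squared standard deviation near 1 indicates well-calibrated variance forecasts:

```python
import numpy as np

def min_variance_weights(cov):
    """Global minimum variance weights w = C^{-1} 1 / (1' C^{-1} 1)."""
    ones = np.ones(cov.shape[0])
    w = np.linalg.solve(cov, ones)
    return w / w.sum()

def normalized_return_test(returns, predicted_covs):
    """Squared standard deviation of minimum variance portfolio returns,
    each normalized by its one-step-ahead predicted standard deviation.
    Illustrative sketch of the test discussed in the text.

    returns: iterable of daily return vectors r_t.
    predicted_covs: iterable of predicted covariance matrices for each day.
    """
    z = []
    for r_t, cov_t in zip(returns, predicted_covs):
        w = min_variance_weights(cov_t)
        port_ret = w @ r_t                     # realized portfolio return
        pred_std = np.sqrt(w @ cov_t @ w)      # predicted portfolio std
        z.append(port_ret / pred_std)
    return np.std(z) ** 2
```

If the predicted covariance equals the true data-generating covariance, the statistic converges to 1; values well above 1 (such as the 3.17 quoted for DCC-GARCH) indicate that realized risk exceeds predicted risk.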

Possibly the performance of the factor models could be increased if the factor loadings were estimated by the maximum likelihood method instead of using regression coefficients.


10. Conclusion

In summary, we have shown that the 'curse of dimensionality' can be overcome by using models with a reduced dimensionality of the parameter space. RMT allows one to distinguish between statistically relevant information and noise in the correlation matrix, and thus helps to determine the number of principal components or factors to be used in parsimonious models for the description of covariances. We find that the method of RMT-assisted dimensionality reduction is as successful in the M-GARCH context as for the prediction of correlation matrices studied in previous work.

We have studied two types of MV-GARCH models that are easy to estimate even for more than 100 time series. For an accurate prediction of the variance of equally distributed portfolios, it is important to take the dynamics of the average correlation strength into account. With respect to minimum variance portfolios, it is crucial to use only statistically meaningful information in the covariance estimates, i.e. to use models with a suitably reduced parameter space. The best prediction accuracy and the lowest realized variance of minimum variance portfolios are achieved by the 'filtered' SCMs with an estimation window of one year. These models are clearly superior to the other models studied in this article and should be a good basis for practical applications.

Acknowledgment

The author would like to thank C. Reese for performing the numerical analysis of the MV-GARCH models and for discussions on an earlier version of the manuscript.

References

Alexander, C., 2002. Principal component models for generating large GARCH covariance matrices. Economic Notes 31, 337–359.

Anderson, T., Bollerslev, T., 1997. Answering the critics: yes, ARCH models do provide good volatility forecasts. Technical Report 6023, National Bureau of Economic Research.

Anderson, T.W., 1984. An Introduction to Multivariate Statistical Analysis, second ed. Wiley, New York.

Bai, Z.D., 1999. Methodologies in spectral analysis for principal component analysis. Statistica Sinica 9, 611–677.

Bollerslev, T., 1986. Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics 31, 307–327.

Bollerslev, T., 1990. Modelling the coherence in short-run nominal exchange rates: a multivariate generalized ARCH approach. Review of Economics and Statistics 72, 498–505.

Bollerslev, T., Engle, R., Wooldridge, J., 1988. A capital asset pricing model with time varying covariances. Journal of Political Economy 96, 116–131.

Campbell, J., Lo, A., MacKinlay, A., 1997. The Econometrics of Financial Markets. Princeton University Press, Princeton, NJ.

Drozdz, S., Grummer, F., Gorski, A., Ruf, F., Speth, J., 2000. Dynamics of competition between collectivity and noise in the stock market. Physica A 287, 440.

Dyson, F.J., 1971. Distribution of eigenvalues for a class of real symmetric matrices. Revista Mexicana de Física 20, 231.

Elton, E., Gruber, M., 1995. Modern Portfolio Theory and Investment Analysis. Wiley, New York.

Engle, R., 2002. Dynamic conditional correlation: a simple class of multivariate generalized autoregressive conditional heteroscedasticity models. Journal of Business & Economic Statistics 20, 339–350.

Engle, R., Kroner, K., 1995. Multivariate simultaneous generalized ARCH. Econometric Theory 11, 122–150.

Engle, R., Sheppard, K., 2001. Theoretical and empirical properties of dynamic conditional correlation multivariate GARCH. NBER Working Paper 8554.

Engle, R., Ng, V., Rothschild, M., 1990. Asset pricing with a factor ARCH covariance structure: empirical estimates for treasury bills. Journal of Econometrics 45, 213–238.

Gopikrishnan, P., Rosenow, B., Plerou, V., Stanley, H.E., 2001. Quantifying and interpreting collective behavior in financial markets. Physical Review E 64, 035106(R).

Guhr, T., Müller-Groeling, A., Weidenmüller, H.A., 1998. Random matrix theories in quantum physics: common concepts. Physics Reports 299, 189–425.

Laloux, L., Cizeau, P., Bouchaud, J.-P., Potters, M., 1999. Noise dressing of financial correlation matrices. Physical Review Letters 83, 1467.

Laloux, L., Cizeau, P., Potters, M., Bouchaud, J.-P., 2000. Random matrix theory and financial correlations. International Journal of Theoretical and Applied Finance 3, 391.

Longerstaey, J., Zangari, P., 1996. RiskMetrics Technical Document, fourth ed. Morgan Guaranty Trust Co., New York.

Marcenko, V.A., Pastur, L.A., 1967. Distributions of eigenvalues of some sets of random matrices. Mathematics of the USSR-Sbornik 1, 507–536.

Markowitz, H., 1959. Portfolio Selection: Efficient Diversification of Investments. Wiley, New York.

Mehta, M.L., 1991. Random Matrices. Academic Press, San Diego.

Muirhead, R.J., 1982. Aspects of Multivariate Statistical Theory. Wiley, New York.

Nelson, D., 1991. Conditional heteroscedasticity in asset returns: a new approach. Econometrica 59, 347–370.

Pafka, S., Kondor, I., 2002. Noisy covariance matrices and portfolio optimization. European Physical Journal B 27, 277–280.

Pafka, S., Kondor, I., 2003. Noisy covariance matrices and portfolio optimization II. Physica A 319, 487–494.

Pafka, S., Kondor, I., 2004. Estimated correlation matrices and portfolio optimization. Physica A, 623–634.

Plerou, V., Gopikrishnan, P., Rosenow, B., Amaral, L.A., Stanley, H.E., 1999. Universal and nonuniversal properties of cross correlations in financial time series. Physical Review Letters 83, 1471.

Plerou, V., Gopikrishnan, P., Rosenow, B., Amaral, L., Guhr, T., Stanley, H., 2002. Random matrix approach to cross correlations in financial data. Physical Review E 65, 066136.

Rosenow, B., Gopikrishnan, P., Plerou, V., Stanley, H.E., 2002. Portfolio optimization and the random magnet problem. Europhysics Letters 59, 500–506.

Sengupta, A.M., Mitra, P.P., 1999. Distributions of singular values for some random matrices. Physical Review E 60, 3389–3392.

Stein, C., 1969. Multivariate analysis I. Technical Report, Stanford University, Department of Statistics. Notes prepared by M.L. Eaton in 1966, see pp. 79–81.

Tse, Y.K., Tsui, A.K.C., 2002. A multivariate GARCH model with time-varying correlations. Journal of Business and Economic Statistics 20, 351–362.

van der Weide, R., 2002. GO-GARCH: a multivariate generalized orthogonal GARCH model. Journal of Applied Econometrics 17, 549–564.
