Nonnegative Matrix Factorization and Financial Applications
Zélia Cazalet
Research & Development, Lyxor Asset Management
[email protected]

Thierry Roncalli
Research & Development, Lyxor Asset Management
[email protected]
May 2011
Abstract
Nonnegative matrix factorization (NMF) is a recent tool to analyse multivariate data. It can be compared to other decomposition methods like principal component analysis (PCA) or independent component analysis (ICA). However, NMF differs from them because it imposes the nonnegativity of the matrices. In this paper, we use this special feature in order to identify patterns in stock market data. Indeed, we may use NMF to estimate common factors from the dynamics of stock prices. In this perspective, we compare NMF and clustering algorithms to identify endogenous equity sectors.
Keywords: Nonnegative matrix factorization, principal component analysis, clustering, sparsity.
JEL classification: G1, C5.
1 Introduction

NMF is a recent technique which has met with success not only in data analysis but also in image and audio processing. It is an alternative approach to decomposition methods like PCA and ICA, with the special feature of considering nonnegative matrices. Let A be a nonnegative m × p matrix. We define a NMF decomposition as follows:
A ≈ BC
with B and C two nonnegative matrices with respective dimensions m × n and n × p. Compared to classic decomposition algorithms, we remark that BC is an approximation of A. There are also different ways to obtain this approximation, meaning that B and C are not necessarily unique. Because the dimensions m, n and p may be very large, one of the difficulties of NMF is to derive a numerical algorithm with a reasonable computation time. In 1999, Lee and Seung developed a simple algorithm with strong performance and applied it to pattern recognition with success. Since this seminal work, this algorithm has been improved and there are today several ways to obtain a nonnegative matrix factorization.
Drakakis et al. (2008) apply NMF to analyse financial data. From a machine learning point of view, NMF could be viewed as a procedure to reduce the dimensionality of data. That is why we could consider NMF in different fields: time series denoising, blind source deconvolution, pattern recognition, etc. One of the most interesting applications concerns data classification. Indeed, Drakakis et al. (2008) show a clustering example with stocks. In this paper, we explore this field of research. The paper is organized as follows. In Section 2, we present different NMF algorithms and compare them to the PCA and ICA decompositions. In Section 3, we apply NMF to stock classification and compare the results with those obtained by clustering methods. Finally, Section 4 concludes.
2 Nonnegative matrix factorization
2.1 Interpretation of the NMF decomposition

We first notice that the decomposition A ≈ BC is equivalent to A⊤ ≈ C⊤B⊤. It means that the storage of the data is not important. The rows of A may represent either the observations or the variables, but the interpretation of the B and C matrices depends on the choice of the storage. We remark that:
A_{i,j} = \sum_{k=1}^{n} B_{i,k} C_{k,j}
Suppose that we consider a variable/observation storage. Therefore, B_{i,k} depends on the variable i whereas C_{k,j} depends on the observation j. In this case, we may interpret B as a matrix of weights. In factor analysis, B is called the loading matrix and C the factor matrix. B_{i,k} is then the weight of the factor k for the variable i and C_{k,j} is the value taken by the factor k for the observation j. If we use an observation/variable storage, which is the common way to store data in statistics, C becomes the loading matrix and B the factor matrix.
Remark 1 In the original work of Lee and Seung, the NMF decomposition is noted V ≈ WH with W a matrix of weights, meaning that the data are stored in a variable/observation order.
Let D be a nonnegative matrix such that D−1 is nonnegative too. For example, D maybe a permutation of a diagonal matrix. In this case, we have:
A ≈ B D^{-1} D C = B^\star C^\star

with B^\star = B D^{-1} and C^\star = D C. It shows that the decomposition is not unique. Moreover, the decomposition may be rescaled by the matrix D = diag(B^\top 1). In this case, B^\star is a nonnegative matrix such that:

\sum_{i=1}^{m} B^\star_{i,k} = 1

and B^\star is a matrix of weights.
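This rescaling is easy to check numerically. The following numpy sketch (the matrices are random and purely illustrative) verifies that B⋆C⋆ equals BC and that the columns of B⋆ sum to one:

```python
import numpy as np

# Scale indeterminacy of the NMF decomposition A ≈ BC: for a positive
# diagonal D, B* = B D^{-1} and C* = D C yield the same product.
rng = np.random.default_rng(0)
B = rng.uniform(0.1, 1.0, size=(5, 3))  # m x n nonnegative loadings
C = rng.uniform(0.1, 1.0, size=(3, 4))  # n x p nonnegative factors

D = np.diag(B.sum(axis=0))              # D = diag(B' 1): column sums of B
B_star = B @ np.linalg.inv(D)
C_star = D @ C

assert np.allclose(B @ C, B_star @ C_star)   # same approximation of A
assert np.allclose(B_star.sum(axis=0), 1.0)  # B* is a matrix of weights
```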
2.2 Some algorithms

In order to find the approximate factorization, we need to define the cost function f which quantifies the quality of the factorization. The optimization program is then:
(\hat{B}, \hat{C}) = \arg\min f(A, BC)   (1)

u.c. B ≥ 0 and C ≥ 0
Lee and Seung (2001) consider two cost functions. The first one is the Frobenius norm:

f(A, BC) = \sum_{i=1}^{m} \sum_{j=1}^{p} (A_{i,j} - (BC)_{i,j})^2

whereas the second one is the Kullback-Leibler divergence:

f(A, BC) = \sum_{i=1}^{m} \sum_{j=1}^{p} \left( A_{i,j} \ln \frac{A_{i,j}}{(BC)_{i,j}} - A_{i,j} + (BC)_{i,j} \right)
To solve the problem (1), Lee and Seung (2001) propose to use the multiplicative update algorithm. Let B_{(t)} and C_{(t)} be the matrices at iteration t. For the Frobenius norm, we have:

B_{(t+1)} = B_{(t)} \odot (A C_{(t)}^\top) \oslash (B_{(t)} C_{(t)} C_{(t)}^\top)
C_{(t+1)} = C_{(t)} \odot (B_{(t+1)}^\top A) \oslash (B_{(t+1)}^\top B_{(t+1)} C_{(t)})

where \odot and \oslash are respectively the element-wise multiplication and division operators¹. A similar algorithm may be derived for the Kullback-Leibler divergence. Under some assumptions, we may show that \hat{B} = B_{(\infty)} and \hat{C} = C_{(\infty)}, meaning that the multiplicative update algorithm converges to the optimal solution.
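A minimal numpy implementation of these multiplicative updates for the Frobenius norm may look as follows. The data matrix is a random low-rank example, and the small eps added to the denominators follows the suggestion of Pauca et al. (2006) recalled in footnote 1:

```python
import numpy as np

def nmf_multiplicative(A, n, iters=1000, eps=1e-9, seed=0):
    """NMF by the multiplicative updates of Lee and Seung (Frobenius norm).

    A small positive eps is added to the denominators to avoid divisions
    by zero (Pauca et al., 2006).
    """
    m, p = A.shape
    rng = np.random.default_rng(seed)
    B = rng.uniform(0.1, 1.0, size=(m, n))
    C = rng.uniform(0.1, 1.0, size=(n, p))
    for _ in range(iters):
        B *= (A @ C.T) / (B @ C @ C.T + eps)
        C *= (B.T @ A) / (B.T @ B @ C + eps)
    return B, C

# On an exact rank-2 nonnegative matrix, the factorization is near exact
rng = np.random.default_rng(1)
A = rng.uniform(0, 1, size=(20, 2)) @ rng.uniform(0, 1, size=(2, 10))
B, C = nmf_multiplicative(A, n=2)
```

Note that the iterates stay nonnegative by construction, because the updates only multiply the current matrices by nonnegative ratios.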
For large datasets, the computational time to find the optimal solution may be large with the previous algorithm. Since the seminal work of Lee and Seung, many methods have been proposed to improve the multiplicative update algorithm and speed up the convergence. Among these methods, we may mention the algorithm developed by Lin (2007). The idea is to use alternating nonnegative least squares²:

B_{(t+1)} = \arg\min_B f(A, B C_{(t)})
C_{(t+1)} = \arg\min_C f(A, B_{(t+1)} C)   (2)
with the constraints B_{(t+1)} ≥ 0 and C_{(t+1)} ≥ 0. To solve the previous problem, Lin (2007) uses the projected gradient method for bound-constrained optimization. We first remark that the two optimization problems (2) are symmetric, because we may cast the first problem in the form of the second one:

B_{(t+1)}^\top = \arg\min f(A^\top, C_{(t)}^\top B^\top)
So, we may only focus on the following optimization problem:

C^\star = \arg\min f(A, BC)   u.c. C ≥ 0
¹To prevent problems with denominators close to zero, Pauca et al. (2006) propose to add a small positive number to the two previous denominators.
²We notice that the algorithm of Lee and Seung is a special case of this one where the optimality criterion is replaced by a sufficient criterion:

f(A, B_{(t+1)} C_{(t)}) ≤ f(A, B_{(t)} C_{(t)})
f(A, B_{(t+1)} C_{(t+1)}) ≤ f(A, B_{(t+1)} C_{(t)})
Let us consider the case of the Frobenius norm. We have:

\partial_C f(A, BC) = 2 B^\top (BC - A)

The projected gradient method consists in the following iterating scheme:

C ← \max(0, C - \alpha \, \partial_C f(A, BC))

where the max operator projects the update onto the nonnegative orthant.
with α the step length. Instead of finding at each iteration the optimal value of α, Lin (2007) proposes to update α in a very simple way depending on the following inequality:

(1 - \sigma) \, \partial_C f(A, BC)^\top (\tilde{C} - C) + \frac{1}{2} (\tilde{C} - C)^\top \partial_C^2 f(A, BC) (\tilde{C} - C) ≤ 0

with \tilde{C} the update of C. If this inequality is verified, α is increased, whereas we decrease α otherwise.
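The subproblem in C can be sketched with a plain Armijo backtracking search along the projection arc. This is a simplification of Lin's scheme, which additionally reuses and adapts α across iterations; the matrices below are random placeholders:

```python
import numpy as np

def nnls_projected_gradient(A, B, iters=500, alpha0=1.0, beta=0.5,
                            sigma=0.01, seed=0):
    """Solve min_C ||A - BC||_F^2 s.t. C >= 0 by projected gradient.

    Simplified sketch of Lin (2007): an Armijo backtracking search along
    the projection arc replaces his adaptive step-size update.
    """
    rng = np.random.default_rng(seed)
    C = rng.uniform(0.1, 1.0, size=(B.shape[1], A.shape[1]))
    f = lambda C_: np.linalg.norm(A - B @ C_) ** 2
    for _ in range(iters):
        grad = 2.0 * B.T @ (B @ C - A)
        step = alpha0
        while True:
            C_new = np.maximum(C - step * grad, 0.0)  # projection on C >= 0
            # sufficient decrease condition along the projection arc
            if f(C_new) - f(C) <= sigma * np.sum(grad * (C_new - C)):
                break
            step *= beta
            if step < 1e-12:  # no further progress possible
                C_new = C
                break
        C = C_new
    return C

# Sanity check on a consistent system with a known nonnegative solution
rng = np.random.default_rng(2)
B = rng.uniform(0, 1, size=(30, 3))
A = B @ rng.uniform(0, 1, size=(3, 8))
C_hat = nnls_projected_gradient(A, B)
```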
2.3 Comparing the NMF method with other factor decompositions
In order to understand why NMF is different from other factor methods, we consider a simulation study. We consider a basket of 4 financial assets. The asset prices are driven by a multi-dimensional geometric Brownian motion. The drift parameter is equal to 5% whereas the diffusion parameter is 20%. The cross-correlation ρ_{i,j} between assets i and j is equal to 20%, except ρ_{1,2} = 70% and ρ_{3,4} = 50%. In order to preserve the time homogeneity, the data correspond to x_{i,t} = ln S_{i,t} where S_{i,t} is the price of the asset i at time t. In Figure 1, we report the time series x_{i,t} for the 4 assets and the first factor estimated by the NMF and PCA methods. We remark that the NMF factor is not scaled in the same way as the PCA factor. However, the correlation between the first differences is equal to 98.8%.
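The simulated basket can be reproduced with numpy; the one-year horizon and the daily step below are illustrative assumptions, and the correlation structure is enforced through a Cholesky factor:

```python
import numpy as np

# 4 correlated geometric Brownian motions: drift 5%, volatility 20%,
# cross-correlation 20%, except rho(1,2) = 70% and rho(3,4) = 50%.
# The one-year horizon with 260 daily steps is an illustrative assumption.
mu, sigma, dt = 0.05, 0.20, 1.0 / 260
n_steps = 260

rho = np.full((4, 4), 0.20)
np.fill_diagonal(rho, 1.0)
rho[0, 1] = rho[1, 0] = 0.70
rho[2, 3] = rho[3, 2] = 0.50
L = np.linalg.cholesky(rho)

rng = np.random.default_rng(7)
dW = rng.normal(size=(n_steps, 4)) @ L.T * np.sqrt(dt)  # correlated increments
# x_t = ln S_t is an arithmetic Brownian motion with drift mu - sigma^2 / 2
x = np.log(100.0) + np.cumsum((mu - 0.5 * sigma**2) * dt + sigma * dW, axis=0)
```

The empirical correlation of the increments of the first two assets should then be close to the 70% target.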
Figure 1: Estimating the first factor of a basket of financial assets
Table 1: Loading matrix of the PCA factors

Asset     F1      F2      F3      F4
#1       0.55   −0.47    0.19    0.66
#2       0.60   −0.35   −0.33   −0.65
#3       0.45    0.51    0.70   −0.22
#4       0.37    0.63   −0.60    0.32

Table 2: Loading matrix of the NMF factors

          n = 1        n = 2              n = 3
Asset      F1       F1     F2       F1     F2     F3
#1        0.91     0.85   0.68     0.86   0.93   0.29
#2        0.99     0.89   1.00     0.85   1.00   1.00
#3        1.00     1.00   0.18     1.00   0.05   0.69
#4        0.93     0.95   0.03     0.99   0.13   0.13
If we compare now the loading matrices of the implied factors, we obtain different results (see Tables 1 and 2). The first factor is comparable, which is not the case of the other factors, because all the other factors of the PCA decomposition are long/short factors. By construction, NMF factors are long-only and depend on the choice of the number of factors. Therefore, their interpretation is more difficult. In Figure 2, we compare the dynamics of
Figure 2: Reconstruction of the asset prices
the first asset with the dynamics given by the NMF factors. Another interesting result is the decomposition of the variance according to the factors.

Figure 3: Variance explained by each factor

In Figure 3, we notice that PCA explains more variance than NMF for a given number of factors. We may explain this result easily because NMF may be viewed as a constrained principal component analysis with nonnegative matrices. However, it does not mean that PCA explains more variance than NMF from a marginal point of view. For illustration, the second NMF factor explains more variance than the second PCA factor in Figure 3.
3 Financial applications with NMF
3.1 Factor extraction of an equity universe
In what follows, we consider the EuroStoxx 50 index. Using the composition of this index at the end of 2010, we compute the nonnegative matrix factorization on the logarithm of the stock prices. We have represented the first NMF factor in Figure 4. We may compare it with the logarithm of the corresponding index. Even if the decomposition is done just for one date and does not take into account entries/exits in the index, we notice that the first NMF factor is highly correlated with the index³. One interesting thing is the sensitivity of the stocks with respect to this factor⁴. In Table 3, we indicate the stocks corresponding to the 5 smallest and largest sensitivities to the NMF factor. We also indicate their corresponding rank and sensitivity in the case of PCA. We notice that the NMF and PCA results are very different. First, the range of the sensitivities is larger for PCA than for NMF. Second, the correlation between the NMF and PCA rankings is low. For the first PCA factor, the largest contribution comes from the financial and insurance sectors. This is not the case for NMF.
³The correlation between the index returns and the factor returns is equal to 97.2%. This is a little bit higher than that for PCA, which is equal to 96.3%.
⁴In order to compare results, we normalize the sensitivities such that the largest value is equal to 1.
Figure 4: Comparison between the EuroStoxx 50 and the first NMF factor
Table 3: Sensitivity of the stock prices to the first factor

                         NMF                 PCA
Name                Rank  Sensitivity   Rank  Sensitivity
KONINKLIJKE           1      0.56        12      0.46
DEUTSCHE TELEKOM      2      0.56        23      0.57
FRANCE TELECOM        3      0.57        30      0.62
VIVENDI               4      0.64        31      0.64
NOKIA                 5      0.65        38      0.72
IBERDROLA            46      0.92         5      0.39
ENI                  47      0.94        10      0.44
VINCI                48      0.99        13      0.46
UNIBAIL-RODAMCO      49      1.00         2      0.31
PERNOD-RICARD        50      1.00         1      0.29
If we consider now two factors, the results are those given in Figure 5. In this case, we may interpret them as a bear market factor and a bull market factor. Another way to convince ourselves of the nature of these factors is to consider the statistic β_i defined as follows⁵:

β_i = B^\star_{i,2} - B^\star_{i,1}

We have β_i ∈ [−1, 1]. A large value of β_i indicates a stock for which the sensitivity to the second factor is much more important than the sensitivity to the first factor. If we rank the stocks according to this statistic, we verify that high (resp. low) values of β_i are associated with stocks which have outperformed (resp. underperformed) the EuroStoxx 50 index.
When the number n of factors increases, it becomes more and more difficult to interpret the factors. For example, Figure 6 corresponds to the case n = 4. There are some similarities between the third and fourth factors and some sectors (see Table 4). For the first and second factors, we do not find an interpretation in terms of sectors. Moreover, the endogenous character of these factors increases when n is large.
Table 4: Correlation of NMF factors with ICB sectors

Sector  0001   1000   2000   3000   4000   5000   6000   7000   8000   9000
#1      −3.5   11.9  −13.6    4.8  −38.5  −16.6  −29.6   −7.3  −31.4  −25.7
#2       0.8    6.4   13.0   −3.1  −24.2    2.0    6.5  −11.3  −13.0   17.0
#3       2.8   52.2   52.0   43.7   70.5   61.5   73.8   45.8   68.7   70.7
#4       4.7   34.0   28.5   14.7   59.9   36.2   38.0   37.9   53.9   24.7

We have the following correspondence between codes and sectors: 0001 = Oil & Gas, 1000 = Basic Materials, 2000 = Industrials, 3000 = Consumer Goods, 4000 = Health Care, 5000 = Consumer Services, 6000 = Telecommunications, 7000 = Utilities, 8000 = Financials, 9000 = Technology.
3.2 Pattern recognition of asset returns

Let S_{t,i} be the price of the asset i at time t. We define the one-period return R_{t,i} as follows:

R_{t,i} = \frac{S_{t,i}}{S_{t-1,i}} - 1

We may decompose this return as the sum of a positive part and a negative part. We obtain:

R_{t,i} = \underbrace{\max(R_{t,i}, 0)}_{R^+_{t,i}} - \underbrace{\max(-R_{t,i}, 0)}_{R^-_{t,i}}

Let R be a T × N matrix containing the return R_{t,i} of the asset i at time t. We define the matrices R⁺ and R⁻ in the same way by replacing the elements R_{t,i} respectively by R^+_{t,i} and R^-_{t,i}. We may apply the nonnegative matrix factorization to the matrices R⁺ and R⁻. We have:

R = R^+ - R^- \approx B^+ C^+ - B^- C^-   (3)

In this case, the dimension of the C⁺ and C⁻ matrices is K × N. We may interpret the decomposition of R as a projection of the T periods onto K patterns.
⁵We remind that we have rescaled the B and C matrices with D = diag(max(B)). In this case, B⋆ satisfies the property 0 ≤ B⋆_{i,k} ≤ 1.
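The splitting of R into its positive and negative parts is immediate in numpy; in the sketch below, the random return matrix stands in for actual weekly returns:

```python
import numpy as np

# Split a T x N return matrix into two nonnegative matrices R+ and R-,
# so that NMF can be applied to each part separately.
rng = np.random.default_rng(3)
R = rng.normal(0.0, 0.02, size=(574, 20))   # placeholder for weekly returns

R_plus = np.maximum(R, 0.0)    # R+ = max(R, 0)
R_minus = np.maximum(-R, 0.0)  # R- = max(-R, 0)
```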
Figure 5: NMF with two factors
Figure 6: NMF with four factors
Figure 7: The case of one pattern
Remark 2 Another way to do pattern recognition is to consider the following factorization:

(R^+ \; R^-) = BC   (4)

In this case, C is a K × 2N matrix. Decompositions (3) and (4) are very similar, but the last one is more restrictive. Indeed, we have:

(R^+ \; R^-) = B (C^+ \; C^-)

This factorization is a restrictive case of the decomposition (3) with B^+ = B^-.
In what follows, we consider the weekly returns of 20 stocks⁶ over the period from January 2000 to December 2010. The dimension of the R matrix is then 574 × 20 (number of periods T × number of stocks N). Let us consider the case of one pattern (K = 1). In Figure 7, we have reported the C matrix, that is the values of the pattern. Figure 8 corresponds to the B matrix, that is the sensitivity of each weekly period to the pattern. If we compute the R² statistic associated with this pattern model, we obtain a value close to 50% (see Figure 9). If we consider more patterns, we may of course obtain a better R². For example, with 12 patterns, the R² statistic is equal to 90%.
To give an example of patterns, we have reported the case K = 4 in Figure 10. We notice that not all the stocks are represented in each pattern. For example, the second pattern describing the positive part of returns concerns mainly 5 stocks (Siemens, Telefonica, Allianz, SAP, Deutsche Telekom). With these 4 patterns, the R² statistic is equal to 70% for these 20 stocks and the entire period. We remark however that the R² statistic differs between stocks (see Table 5). Indeed, it is equal to 90% for SAP whereas it is equal to 43% for Danone.
⁶Total, Siemens, Banco Santander, Telefonica, BASF, Sanofi, BNP Paribas, Bayer, Daimler, Allianz, ENI, E.ON, SAP, Deutsche Bank, BBVA, Unilever, ING, Schneider, Danone, Deutsche Telekom.
Figure 8: Sensibility to the pattern
Figure 9: R2 (in %) of the pattern model
Figure 10: The case of 4 patterns
Table 5: R² (in %) for each stock

                       Number of patterns
Stock                1    2    3    4    5   10   20
Total               46   46   57   59   59   83  100
Siemens             57   70   70   76   76   79  100
Banco Santander     70   73   75   76   81   90  100
Telefonica          36   54   57   66   64   73   97
BASF                64   63   69   71   74   76   98
Sanofi              27   28   45   54   56   67  100
BNP Paribas         58   69   74   75   74   93  100
Bayer               44   44   54   56   76   97  100
Daimler             57   57   60   69   74   87  100
Allianz             62   63   66   67   68   95  100
ENI                 44   48   57   61   62   83  100
E.ON                35   36   58   59   60   65  100
SAP                 38   65   76   90   92   99  100
Deutsche Bank       72   75   75   75   75   91   99
BBVA                72   75   77   78   84   91  100
Unilever            20   20   43   51   51   68   99
ING                 71   82   84   86   94   97  100
Schneider Electric  49   49   50   52   56   87  100
Danone              26   25   40   43   42   62  100
Deutsche Telekom    22   54   56   77   80   91   98
3.3 Classification of stocks
In what follows, we consider the universe of the 100 largest stocks of the DJ EuroStoxx index. The study period begins in January 2000 and ends in December 2010. Let P_{i,t} be the price of the stock i for the period t. We consider the matrix A0 with elements ln P_{i,t}. We perform a NMF of the matrix A0 with several initialization methods, which are described in Appendix B:
1. RAND corresponds to the random method;
2. NNDSVD uses the algorithm of nonnegative double singular value decomposition proposed by Boutsidis and Gallopoulos (2008);

3. NNDSVDa and NNDSVDar are two variants of NNDSVD where the zeros are replaced by the average value a of the matrix A or by random numbers from the uniform distribution U[0, a/100];
4. KM is based on the K-means algorithm proposed by Bertin (2009);
5. CRO corresponds to the closeness to rank-one algorithm described by Kim and Choi(2007).
3.3.1 Relationship between NMF factors and sectors
Remark 3 Let A0 be the m0 × p matrix which represents the initial data. With NMF, we estimate the factorization A0 ≈ B0C0 with (B0, C0) = argmin f(A0, BC) under the constraints B ≥ 0 and C ≥ 0. Suppose now that the m1 × p matrix A1 represents other data. We may estimate a new NMF and obtain A1 ≈ B1C1. In some applications, we would like to have the same factors for A0 and A1, that is C1 = C0. In this case, the second optimization program becomes B1 = argmin f(A1, BC0) u.c. B ≥ 0. For the Frobenius norm, this program may be easily solved with quadratic programming. The optimal solution is B1 = [β_1 ··· β_{m1}]⊤ with:

β_i = \arg\min \frac{1}{2} \beta^\top (C_0 C_0^\top) \beta - \beta^\top (C_0 A_1^\top e_i)   u.c. β ≥ 0

Of course, this optimal solution is valid even if the matrix C0 is not given by NMF, but corresponds to exogenous factors.
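For the Frobenius norm, the row-by-row quadratic program of Remark 3 is a nonnegative least squares problem. Assuming SciPy is available, its nnls routine solves it directly; the matrices below are random placeholders:

```python
import numpy as np
from scipy.optimize import nnls

def loadings_given_factors(A1, C0):
    """Estimate B1 >= 0 such that A1 ≈ B1 C0 for a fixed factor matrix C0.

    Row i solves min ||C0' beta - a_i|| s.t. beta >= 0, which is the
    quadratic program of Remark 3 for the Frobenius norm.
    """
    return np.array([nnls(C0.T, a_i)[0] for a_i in A1])

# Consistency check: if A1 is generated from C0, B1 is recovered exactly
rng = np.random.default_rng(4)
C0 = rng.uniform(0, 1, size=(3, 50))      # n x p factor matrix
B_true = rng.uniform(0, 1, size=(10, 3))  # m1 x n loadings
A1 = B_true @ C0
B1 = loadings_given_factors(A1, C0)
```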
One may wonder if the NMF factors are linked to sectors. We use the ICB classification with 10 sectors. Let I_{j,t} be the value of the jth sector index. A first idea is to apply NMF to the matrix A1 with elements ln I_{j,t}, but by imposing that the matrix of factors C corresponds to the matrix C0 estimated with the stocks. Using the results of the previous remark, the estimation of the matrix B1 with B1 ≥ 0 is straightforward. If we have a one-to-one correspondence between NMF factors and ICB sectors, B1 must be a permutation of a diagonal matrix. Results are reported in Appendix C.1 with n = 10 factors. We notice that the results depend on the initialization method. By construction, two methods (NNDSVD and CRO) produce sparse factorizations. For the other four, the factorization is very dense. Except for some very specific cases, a sector is generally linked to several factors.
Another way to see if sectors could be related to NMF factors is to consider the C2 matrix with elements ln I_{j,t}. In this case, we impose exogenous factors and we could use the previous technique to estimate the B2 matrix such that A0 ≈ B2C2 and B2 ≥ 0. If sectors are the natural factors, we must verify that (B2)_{i,j} > 0 if the stock i belongs to the sector j and (B2)_{i,j} ≃ 0 otherwise. Let S(i) = j be the mapping function between stocks and sectors. For each stock i, we may compute the proportion of factor weights explained by the sector S(i) with respect to the sum of weights:

p_i = \frac{(B_2)_{i,S(i)}}{\sum_{j=1}^{n} (B_2)_{i,j}}

If p_i = 100%, it means that the sector S(i) of the stock explains all the factor component. If p_i = 0%, it means that the sector S(i) of the stock explains nothing. In Table 6, we have reported the distribution of the p_i statistic.

Table 6: Frequencies of the p_i statistic

p_i            Frequency
0%                39
]0%, 10%]          4
]10%, 20%]         7
]20%, 30%]         4
]30%, 40%]         9
]40%, 50%]         7
]50%, 60%]         5
]60%, 70%]         7
]70%, 80%]         3
]80%, 90%]         4
]90%, 100%[        2
100%               9

We notice that, among the 100 stocks, only 11 stocks have a statistic larger than 90%. These stocks are TOTAL (Oil & Gas), BASF (Basic Materials), SANOFI (Health Care), ARCELORMITTAL (Basic Materials), LINDE (Basic Materials), NOKIA (Technology), KONINKLIJKE (Telecommunications), ALCATEL-LUCENT (Technology), K+S (Basic Materials), CAP GEMINI (Technology) and STMICROELECTRONICS (Technology). At the same time, 39 stocks are not explained at all by their sectors. We have reported the 25 largest stocks with p_i = 0 in Table 15 (see Appendix C.2). In Figure 11, we have reported some of these stocks⁷. For each stock, we also report its corresponding sector (green line) and the "representative" sector which presents the largest weight in the B2 matrix (red line). Most of the time, we confirm that the behavior of the stock is closer to the behavior of the representative sector than to the behavior of its own sector.
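The p_i statistic is straightforward to compute from B2 and the sector mapping S; in the sketch below, both are random placeholders:

```python
import numpy as np

# p_i: share of the factor weights of stock i carried by its own sector S(i).
rng = np.random.default_rng(5)
n_stocks, n_sectors = 100, 10
B2 = rng.uniform(0, 1, size=(n_stocks, n_sectors))   # hypothetical loadings
sector = rng.integers(0, n_sectors, size=n_stocks)   # hypothetical S(i)

p = B2[np.arange(n_stocks), sector] / B2.sum(axis=1)
```

By construction, each p_i lies between 0 and 1, since the numerator is one of the nonnegative terms of the denominator.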
Remark 4 The previous analysis is done for a long period (11 years). If we consider a shorter period (for example 1 or 2 years), the difference between the corresponding sector and the representative sector is more important.
⁷We remind that the data are the logarithm of the prices ln P_{i,t}. Moreover, we have normalized the prices such that P_{i,0} = 100.
Figure 11: Some stocks which have a behavior different from their sector index
3.3.2 NMF classifiers
Before doing a classification based on NMF, we apply the K-means procedure directly to the stock returns. The results set a benchmark for the subsequent NMF classifiers. For that, we consider 10 clusters. In Table 7, we remark that the cluster #1 groups together a large part of the stocks, with 8 sectors represented. This cluster has certainly no economic significance. It contains the stocks which are difficult to classify in the other clusters.
Table 7: Number of stocks by sector/cluster

sector   stocks per cluster              total
0001     5                                  5
1000     3, 6, 1                           10
2000     4, 2, 1, 7, 1, 1                  16
3000     5, 1, 3, 1, 7                     17
4000     4, 1                               5
5000     4, 2, 3, 1                        10
6000     6                                  6
7000     6                                  6
8000     4, 8, 3, 4                        19
9000     6                                  6
total (#1–#10)  35, 9, 8, 9, 19, 2, 3, 3, 8, 4   100
In Figure 12, we report the frequency of each sector in each cluster. We do not notice an exact correspondence between sectors and clusters, because several clusters are represented by the same sector. Indeed, this classification highlights the Financials sector, which has a frequency of 100% for 3 of the 10 clusters (#3, #7 and #10). In addition, the clusters #2, #4 and #9, respectively represented by the Technology, Telecommunications and Consumer Goods sectors, stand out. It seems logical that the classification brings to light these sectors, because the studied period is strongly influenced by the dot-com and financial bubbles. However, we notice that some clusters, strongly characterized by some sectors, only represent a small part of them. For instance, clusters #7 and #10 each represent less than 25% of the Financials sector. As a consequence, some clusters represent more a subset of a sector than the average behavior of the sector. Figure 13 confirms this idea. In some cases, the dynamics of the centroids differ from the dynamics of the most represented sectors. For instance, the evolution of cluster #10 differs from the global evolution of the Financials sector. Indeed, the centroid has a flatter evolution than the sector. So, the cluster #10 highlights a subset of Financials stocks which has a behavior very different from the Financials sector, especially during the burst of the dot-com bubble.

To conclude, this preliminary study shows the heterogeneity within sectors and singles out some special stocks whose behavior differs from the behavior of their sectors.
Figure 12: Results of the cluster analysis
Let us see now if NMF classifiers can represent an alternative to the sector classification. Before performing the classification, we have to make two choices: the initialization method and the order n of the factorization. In Figure 14, we see the evolution of the NMF error with respect to the dimension n. We notice that the NNDSVD and CRO methods converge more slowly than the others. This is explained by the fact that these methods produce sparse factorizations. The other initialization methods provide similar results. However, NNDSVDar has the lowest Euclidean error in most cases. As a result, in what follows, we choose to analyse the results obtained with the NNDSVDar initialization method.
Figure 13: Centroid of some clusters
Figure 14: Evolution of the error with respect to initialization method
The next step consists in fixing the smallest order n of the factorization which does not lead to a serious information loss. For this, we study the inertia retrieved by the NMF method and we select the order which keeps more than 90% of the information. According to Figure 15, n = 4 is sufficient to retrieve 90% of the information, but n = 5 brings 3.47% of supplementary information. As a consequence, we select n = 5 as the number of factors resulting from the NMF.
Figure 15: NMF inertia
We then proceed to classify the stocks by applying an unsupervised algorithm to the normalized matrix B⋆. The idea is to group stocks that are influenced by the same factors. For this, we consider the K-means algorithm with 10 clusters. According to Table 8, we notice that the clusters are well distributed. Contrary to the previous classification, we do not observe one cluster that contains a large part of the stocks. Figure 16 indicates that these clusters do not represent sectors. An exception is the cluster #8, which is only represented by the Industrials sector.
Table 8: Number of stocks by sector/cluster

sector   stocks per cluster              total
0001     1, 2, 0, 1, 1                      5
1000     4, 1, 1, 2, 1, 1                  10
2000     3, 1, 3, 1, 1, 2, 4, 1            16
3000     5, 3, 1, 4, 2, 1, 1               17
4000     1, 3, 1                            5
5000     1, 2, 1, 2, 4                     10
6000     3, 3                               6
7000     1, 4, 1                            6
8000     3, 2, 1, 3, 4, 6                  19
9000     1, 2, 1, 2                         6
total (#1–#10)  17, 7, 14, 10, 10, 9, 7, 4, 12, 10   100
Figure 16: Frequencies of sectors in each cluster
In Figure 17, we report the evolution of the centroid of each cluster k. It is computed as follows:

\bar{x}_k = w_k^\top C^\star

with:

w_{k,j} = \frac{1}{N_k} \sum_{i=1}^{m} \mathbf{1}\{C(i) = k\} \cdot B^\star_{i,j}

where N_k = \sum_{i=1}^{m} \mathbf{1}\{C(i) = k\} is the number of stocks in the cluster k.
Figure 17: Centroid of the clusters
We notice that NMF classifiers highlight distinct behaviors. Indeed, the centroids of clusters #5 and #8 are strongly influenced by the financial bubble whereas the centroid of cluster #2 is influenced by the dot-com bubble. Stocks which belong to clusters #1, #6 and #7 do not suffer from the dot-com bubble, whereas this is not the case of stocks which belong to clusters #3, #4, #9 and #10. We notice also that the centroids of clusters #1, #6 and #7 are very different at the end of the study period.

To conclude, NMF classifiers are useful for pattern recognition but, generally, the clusters do not have an economic signification. Another drawback is the dependence of the clusters on the initialization procedure selected for the NMF.
4 Conclusion

Given its success in image recognition and audio processing, NMF seems to be a useful decomposition method for financial applications. Imposing the nonnegativity of the matrices allows us to interpret the factorization as a decomposition with a loading matrix and a matrix of long-only factors. This special feature is interesting for pattern recognition, but the interpretation becomes more and more difficult as the number of factors increases. We can also apply a clustering algorithm to the loading matrix in order to group together stocks with the same patterns. According to our results, NMF classifiers set apart different behaviors, but the economic interpretation of the clusters is difficult. A direct classification of stock returns with the K-means procedure seems more robust and highlights some special stocks whose behavior differs from the behavior of their sectors.
References

[1] Bertin N. (2009), Les factorisations en matrices non-négatives : approches contraintes et probabilistes, application à la transcription automatique de musique polyphonique, Mémoire de thèse, Télécom ParisTech.

[2] Boutsidis C. and Gallopoulos E. (2008), SVD Based Initialization: A Head Start for Nonnegative Matrix Factorization, Pattern Recognition, 41(4), pp. 1350-1362.

[3] Drakakis K., Rickard S., de Fréin R. and Cichocki A. (2008), Analysis of Financial Data Using Non-negative Matrix Factorization, International Mathematical Forum, 3(38), pp. 1853-1870.

[4] Hastie T., Tibshirani R. and Friedman J. (2009), The Elements of Statistical Learning, Second Edition, Springer.

[5] Hoyer P.O. (2004), Non-negative Matrix Factorization with Sparseness Constraints, Journal of Machine Learning Research, 5, pp. 1457-1469.

[6] Kim Y-D. and Choi S. (2007), A Method of Initialization for Nonnegative Matrix Factorization, IEEE International Conference on Acoustics, Speech and Signal Processing, 2(1), pp. 537-540.

[7] Lee D.D. and Seung H.S. (1999), Learning the Parts of Objects by Non-negative Matrix Factorization, Nature, 401, pp. 788-791.

[8] Lee D.D. and Seung H.S. (2001), Algorithms for Non-negative Matrix Factorization, Advances in Neural Information Processing Systems, 13, pp. 556-562.

[9] Lin C.J. (2007), Projected Gradient Methods for Non-negative Matrix Factorization, Neural Computation, 19(10), pp. 2756-2779.

[10] Pauca V.P., Piper J. and Plemmons R.J. (2006), Nonnegative Matrix Factorization for Spectral Data Analysis, Linear Algebra and its Applications, 416(1), pp. 29-47.

[11] Sonneveld P., van Kan J.J.I.M., Huang X. and Oosterlee C.W. (2009), Nonnegative Matrix Factorization of a Correlation Matrix, Linear Algebra and its Applications, 431(3-4), pp. 334-349.
A Cluster analysis

Cluster analysis is a method for the assignment of observations into groups (or clusters). It is thus an exploratory data analysis technique which groups similar observations together. The objective of clustering methods is to maximize the pairwise proximity between observations of a same cluster and to maximize the dissimilarity between observations which belong to different clusters. In what follows, we are concerned with unsupervised learning algorithms, that is segmentation methods with no prior information on the groups.
A.1 The K-means algorithm

It is a special case of combinatorial algorithms. This kind of algorithm does not use a probability distribution but works directly on the observed data. We consider m objects with n attributes x_{i,j} (i = 1, …, m and j = 1, …, n). We would like to build K clusters defined by the index k (k = 1, …, K). Let C be the mapping function which assigns an object to a cluster⁸. The principle of combinatorial algorithms is to adjust the mapping function C in order to minimize the following loss function:

L(C) = \frac{1}{2} \sum_{k=1}^{K} \sum_{C(i)=k} \sum_{C(i')=k} d(x_i, x_{i'})
where d (xi, xi′) is the dissimilarity measure between the objects i and i′. As a result, theoptimal mapping function is denoted C⋆ = argminL (C).
In the case of the K-means algorithm, the dissimilarity measure is the squared Euclidean (Frobenius) distance:

d(x_i, x_{i′}) = ∑_{j=1}^{n} (x_{i,j} − x_{i′,j})² = ∥x_i − x_{i′}∥²

Therefore, the loss function becomes (Hastie et al., 2009):

L(C) = ∑_{k=1}^{K} N_k ∑_{C(i)=k} ∥x_i − x̄_k∥²

where x̄_k = (x̄_{k,1}, …, x̄_{k,n}) is the mean vector associated with the kth cluster and N_k = ∑_{i=1}^{m} 1{C(i) = k}. Because m⋆_Ω = arg min_m ∑_{i∈Ω} ∥x_i − m∥², another form of the previous minimization problem is:

(C⋆, m⋆_1, …, m⋆_K) = arg min ∑_{k=1}^{K} N_k ∑_{C(i)=k} ∥x_i − m_k∥²

This problem is solved by iterations. At iteration s, we compute the optimal means m_1^(s), …, m_K^(s) of the clusters for the given mapping function C^(s−1). Then, we update the mapping function using the following rule:

k = C^(s)(i) = arg min_k ∥x_i − m_k^(s)∥²

We repeat these two steps until the convergence of the algorithm: C⋆ = C^(s) = C^(s−1).
⁸For example, C(i) = k assigns the ith observation to the kth cluster.
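The two-step iteration above can be sketched in a few lines (a minimal NumPy illustration; the function name, the random initial mapping and the iteration cap are our own choices, not part of the original description):

```python
import numpy as np

def k_means(x, K, n_iter=100, seed=0):
    # x is the m x n matrix of objects, K the number of clusters
    rng = np.random.default_rng(seed)
    m = x.shape[0]
    C = rng.integers(0, K, size=m)            # initial mapping function C^(0)
    for _ in range(n_iter):
        # step 1: optimal means m_k^(s) for the mapping C^(s-1)
        means = np.array([x[C == k].mean(axis=0) for k in range(K)])
        # step 2: reassign each object to the closest mean
        dist = ((x[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        C_new = np.argmin(dist, axis=1)
        if np.array_equal(C_new, C):          # convergence: C^(s) = C^(s-1)
            break
        C = C_new
    return C, means
```

On well-separated data, the mapping function stabilizes after a few iterations.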
A.2 Hierarchical clustering

This algorithm creates a hierarchy of clusters which may be represented in a tree structure. Unlike the K-means algorithm, this method depends neither on the number of clusters nor on the starting configuration. The lowest level of the hierarchy corresponds to the individual objects whereas the highest level corresponds to one cluster containing all the objects. We generally distinguish two methods:

• in the agglomerative method, the algorithm starts with the individual clusters and recursively merges the closest pair of clusters into one single cluster;

• in the divisive method, the algorithm starts with the single cluster and recursively splits the cluster into two new clusters which present the maximum dissimilarity.
In this study, we only consider the agglomerative method.
Let k and k′ be two clusters. We define the dissimilarity measure d(k, k′) as a linkage function ℓ of the pairwise dissimilarities d(x_i, x_{i′}) where C(i) = k and C(i′) = k′:

d(k, k′) = ℓ({d(x_i, x_{i′}) : C(i) = k, C(i′) = k′})

There exist different ways to define the function ℓ:

• Single linkage: d(k, k′) = min_{x_i∈k, x_{i′}∈k′} d(x_i, x_{i′})

• Complete linkage: d(k, k′) = max_{x_i∈k, x_{i′}∈k′} d(x_i, x_{i′})

• Average linkage: d(k, k′) = (1 / (N_k N_{k′})) ∑_{x_i∈k} ∑_{x_{i′}∈k′} d(x_i, x_{i′})
At each iteration, we search for the clusters k and k′ which minimize the dissimilarity measure and we merge them into one single cluster. When all the objects have been merged into only one cluster, we obtain a tree which is called a dendrogram. It is then easy to perform a segmentation by considering a particular level of the tree. In Figure 18, we report a dendrogram on simulated data using the single linkage rule. We consider 20 objects divided into two groups. The attributes of the first (resp. second) group correspond to simulated Gaussian variates with a mean of 20% (resp. 30%) and a standard deviation of 5% (resp. 10%). The intra-group cross-correlation is set to 75% whereas the inter-group correlation is equal to 0. We obtain very good results. In practice, however, hierarchical clustering may produce concentrated segmentations, as illustrated in Figure 19. We use the same simulated data as previously, except that the standard deviation for the second group is set to 25%. In this case, if we consider two clusters, we obtain a cluster with 19 elements and another cluster with only one element (the 18th object).
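A naive sketch of the agglomerative method with the single linkage rule (for illustration only; the quadratic merge loop is our own simplification, and production implementations use much faster update schemes):

```python
import numpy as np

def single_linkage(x, n_clusters):
    # agglomerative clustering: at each step, merge the two clusters
    # whose minimum pairwise distance is smallest, until n_clusters remain
    clusters = [[i] for i in range(len(x))]
    d = np.sqrt(((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=2))
    while len(clusters) > n_clusters:
        best, best_d = (0, 1), np.inf
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # single linkage: min over all cross-cluster pairs
                dd = min(d[i, j] for i in clusters[a] for j in clusters[b])
                if dd < best_d:
                    best_d, best = dd, (a, b)
        a, b = best
        clusters[a] += clusters.pop(b)        # merge the closest pair
    return clusters
```

Stopping the merges at a given level of the tree yields the segmentation discussed above.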
B Initialization methods for NMF algorithms

We have to use starting matrices B(0) and C(0) to initialize NMF algorithms. In this section, we present the most popular methods to define them.
Figure 18: An example of dendrogram
Figure 19: Bad classification
B.1 Random method
The first idea is to consider matrices of positive random numbers. Generally, one uses the uniform distribution U[0,1] or the absolute value of Gaussian variates |N(0,1)|.
B.2 K-means method
Bertin (2009) proposes to apply the K-means algorithm to the matrix x = A⊤, meaning that the clustering is done on the columns of the matrix A. When the n clusters are determined, we compute the n × m centroid matrix x̄_{n×m} whose kth row is x̄_k⊤, where x̄_k = (x̄_{k,1}, …, x̄_{k,m}) is the mean vector associated with the kth cluster. Finally, we have B(0) = x̄⊤_{n×m} and C(0) is a matrix of positive random numbers.
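A possible sketch of this initialization (assuming a basic K-means loop as in Appendix A.1; the empty-cluster reseeding and the fixed iteration count are our own safeguards, not part of the original description):

```python
import numpy as np

def kmeans_init(A, n, seed=0):
    # cluster the columns of A (the rows of x = A^T) into n groups,
    # then build B(0) from the transposed centroid matrix
    rng = np.random.default_rng(seed)
    x = A.T                                   # p x m matrix of column profiles
    C = rng.integers(0, n, size=x.shape[0])   # initial mapping
    for _ in range(50):                       # plain K-means loop
        means = np.array([x[C == k].mean(axis=0) if (C == k).any()
                          else x[rng.integers(x.shape[0])]   # reseed empty cluster
                          for k in range(n)])
        C = np.argmin(((x[:, None, :] - means[None, :, :]) ** 2).sum(axis=2), axis=1)
    B0 = means.T                              # B(0) = xbar^T, an m x n matrix
    C0 = rng.random((n, A.shape[1]))          # C(0): positive random numbers
    return B0, C0
```

Since A is nonnegative, the centroids are nonnegative and so is B(0).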
B.3 SVD method
Let A be an m × p matrix. The singular value decomposition is given by:

A = usv⊤

where u is an m × m unitary matrix, s is an m × p diagonal matrix with nonnegative entries and v is a p × p unitary matrix. The rank-n approximation is then:

A ≈ ∑_{k=1}^{n} u_k σ_k v_k⊤

with u_k and v_k the kth left and right singular vectors and σ_k = s_{k,k} the kth largest singular value (σ_1 ≥ … ≥ σ_n). An idea to initialize the NMF algorithm is to define B(0) and C(0) such that the kth column of B(0) is equal to √σ_k u_k and the kth row of C(0) is equal to √σ_k v_k⊤. However, this method does not work because the vectors u_k and v_k are not necessarily nonnegative, except for the largest singular value (k = 1), as explained in the following remark.
Remark 5 We may show that the left singular vectors u of A are the eigenvectors of AA⊤, that the right singular vectors v of A are the eigenvectors of A⊤A and that the non-zero singular values σ_k are the square roots of the non-zero eigenvalues of AA⊤. The Perron-Frobenius theorem implies that the first eigenvector of a nonnegative irreducible matrix is nonnegative. We deduce that the first eigenvector of the AA⊤ or A⊤A matrices is nonnegative if all the entries of A are nonnegative. It proves that the singular vectors u_1 and v_1 are nonnegative.
Boutsidis and Gallopoulos (2008) propose to modify the previous factorization:

A ≈ ∑_{k=1}^{n} ũ_k σ_k ṽ_k⊤

where ũ_k and ṽ_k are nonnegative vectors. The algorithm is called nonnegative double singular value decomposition (NNDSVD) and is a simple modification of the singular value decomposition which considers only the nonnegative parts of the singular vectors. For each singular value σ_k, we have to decide which nonnegative part is the largest by noticing that:

u_k σ_k v_k⊤ = (−u_k) σ_k (−v_k)⊤

Let x⁺ be the nonnegative part of the vector x. We define u_k⁻ = (−u_k)⁺ and v_k⁻ = (−v_k)⁺. It comes that:

ũ_k = γ_k × { u_k⁺ / ∥u_k⁺∥ if m⁺ > m⁻ ; u_k⁻ / ∥u_k⁻∥ otherwise }

and:

ṽ_k = γ_k × { v_k⁺ / ∥v_k⁺∥ if m⁺ > m⁻ ; v_k⁻ / ∥v_k⁻∥ otherwise }

where m⁺ = ∥u_k⁺∥ ∥v_k⁺∥ and m⁻ = ∥u_k⁻∥ ∥v_k⁻∥. In order to preserve the inertia, the singular vectors are scaled by the factor γ_k = max(√m⁺, √m⁻). The initialization of B(0) and C(0) is then done by using ũ_k and ṽ_k in place of u_k and v_k.
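The NNDSVD recipe above may be sketched as follows (our own translation of the formulas into NumPy; variable names are illustrative):

```python
import numpy as np

def nndsvd_init(A, n):
    # NNDSVD-style initialization: for each singular triplet, keep the
    # dominant nonnegative parts of u_k and v_k, scaled by gamma_k
    u, s, v = np.linalg.svd(A, full_matrices=False)
    B0 = np.zeros((A.shape[0], n))
    C0 = np.zeros((n, A.shape[1]))
    for k in range(n):
        uk, vk = u[:, k], v[k, :]
        up, un = np.maximum(uk, 0), np.maximum(-uk, 0)   # u_k^+ and u_k^-
        vp, vn = np.maximum(vk, 0), np.maximum(-vk, 0)   # v_k^+ and v_k^-
        mp = np.linalg.norm(up) * np.linalg.norm(vp)     # m^+
        mn = np.linalg.norm(un) * np.linalg.norm(vn)     # m^-
        gamma = max(np.sqrt(mp), np.sqrt(mn))            # inertia factor
        if mp > mn:
            ut, vt = up / np.linalg.norm(up), vp / np.linalg.norm(vp)
        else:
            ut, vt = un / np.linalg.norm(un), vn / np.linalg.norm(vn)
        B0[:, k] = np.sqrt(s[k]) * gamma * ut            # kth column of B(0)
        C0[k, :] = np.sqrt(s[k]) * gamma * vt            # kth row of C(0)
    return B0, C0
```

For k = 1 the nonnegative parts coincide with |u_1| and |v_1|, so the first term reproduces the best rank-one approximation of A.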
B.4 CRO method

The CRO-based hierarchical clustering is an alternative to the classical agglomerative hierarchical clustering, where the similarity measure is given by the closeness to rank-one (CRO) measure:

cro(X) = σ_1² / ∑_i σ_i²

with σ_1 ≥ σ_2 ≥ … ≥ 0 the singular values of the matrix X. Let k_1 and k_2 be two clusters. The CRO measure between the two clusters k_1 and k_2 of the matrix A is defined by cro(A^{(k_1,k_2)}), where A^{(k_1,k_2)} is the submatrix of A which contains the row vectors of the two clusters k_1 and k_2. Kim and Choi (2007) summarize the CRO-based hierarchical algorithm as follows. First, we assign each row vector of A to its own cluster, giving m clusters. Then, we merge the pair of clusters with the largest CRO measure into one single cluster until n clusters remain.

When the CRO-based hierarchical clustering is applied to the matrix A, we obtain n clusters represented by the submatrices A_1, …, A_n. Kim and Choi (2007) then consider the rank-one SVD approximations⁹:

(A_1, …, A_n) ≈ (u_1 σ_1 v_1⊤, …, u_n σ_n v_n⊤)

This decomposition leads to a very simple assignment of the matrices B(0) and C(0). The kth column of B(0) corresponds to the vector u_k for the rows which belong to the kth cluster, while the remaining elements of the kth column are equal to zero (or a small number). The kth row of C(0) corresponds to σ_k v_k⊤. By construction, this initialization method produces a sparse factorization.
Remark 6 We have proved previously that the singular vectors u and v associated with the rank-one SVD approximation of a nonnegative matrix A are nonnegative. From a numerical analysis point of view, however, the rank-one SVD approximation may be returned as usv⊤ or as (−u)s(−v)⊤. In practice, we therefore consider |u| and |v| in place of u and v when working with the rank-one approximation.
⁹We have A_k ≈ u_k σ_k v_k⊤ where u_k and v_k are the left and right singular vectors associated with the largest singular value σ_k.
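The CRO measure is straightforward to compute from the singular values; a small sketch (the helper cro_between, which evaluates the merging criterion on a stacked submatrix, is our own illustration):

```python
import numpy as np

def cro(X):
    # closeness to rank-one: sigma_1^2 / sum_i sigma_i^2
    s = np.linalg.svd(X, compute_uv=False)
    return s[0] ** 2 / (s ** 2).sum()

def cro_between(A, rows1, rows2):
    # CRO measure between two clusters of row vectors of A:
    # cro of the submatrix stacking both clusters' rows
    return cro(A[np.r_[rows1, rows2]])
```

A value close to 1 indicates that the stacked rows are nearly proportional, so merging the two clusters preserves the rank-one structure.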
C Results
C.1 Estimation of the B1 matrix
Table 9: B1 matrix with sector indexes (RAND)

Sector              #1   #2   #3   #4   #5   #6   #7   #8   #9   #10
Oil & Gas          1.00 0.84 0.56 0.69 0.63 0.84 0.82 0.69 0.54 0.79
Basic Materials    0.58 1.00 0.53 0.69 0.64 0.86 1.00 0.49 0.49 1.00
Industrials        0.47 0.83 0.68 0.64 0.71 0.72 0.74 0.60 0.55 0.84
Consumer Goods     0.70 0.89 0.59 0.68 0.81 0.77 0.81 0.67 0.35 0.87
Health Care        0.95 0.95 0.44 0.80 0.71 0.79 0.66 1.00 0.51 0.56
Consumer Services  0.55 0.68 0.66 0.46 0.55 0.65 0.68 0.84 0.41 0.49
Telecommunications 0.69 0.66 1.00 0.30 0.47 0.78 0.43 0.62 0.27 0.49
Utilities          0.97 0.91 0.71 0.72 0.65 1.00 0.83 0.42 0.63 0.65
Financials         0.89 0.76 0.60 1.00 1.00 0.81 0.60 0.40 1.00 0.46
Technology         0.54 0.49 0.96 0.55 0.45 0.45 0.69 0.53 0.57 0.63
Table 10: B1 matrix with sector indexes (NNDSVD); empty cells of the original table are omitted

Sector              NMF factors (#1 to #10)
Oil & Gas          0.99 0.24 0.02 0.21 0.34 0.20 0.36
Basic Materials    0.93 0.21 0.44 0.21 0.01 0.82 0.71 0.13 0.21 1.00
Industrials        0.79 0.52 0.55 0.64 0.31 1.00 0.46 0.34 0.59 0.64
Consumer Goods     0.93 0.26 0.18 0.11 0.86 0.32 0.10 0.24 0.60
Health Care        1.00 0.35 0.02 0.13 0.14 0.41
Consumer Services  0.71 0.68 0.34 0.53 0.38 0.60 0.18 0.33 0.12
Telecommunications 0.59 0.73 1.00 0.59 1.00 0.68 1.00 0.10
Utilities          0.92 0.33 0.64 0.41 0.30 1.00 0.56 0.30
Financials         0.88 0.47 0.49 1.00 0.11 0.50 0.51 0.67 0.43
Technology         0.59 1.00 0.63 0.97 0.65 0.23 0.70 0.73 1.00 0.16
Table 11: B1 matrix with sector indexes (NNDSVDa)

Sector              #1   #2   #3   #4   #5   #6   #7   #8   #9   #10
Oil & Gas          0.83 0.99 1.00 1.00 0.67 0.97 0.29 0.34 0.70 0.07
Basic Materials    0.79 1.00 0.56 0.84 0.50 0.57 0.86 1.00 0.64 0.99
Industrials        0.90 0.75 0.54 0.65 0.55 0.49 0.94 0.91 0.64 0.95
Consumer Goods     0.86 0.86 0.60 0.88 0.45 0.60 0.90 0.87 0.65 0.87
Health Care        0.83 0.90 0.85 0.99 0.66 0.92 0.77 0.30 0.81 0.37
Consumer Services  0.85 0.33 0.54 0.47 0.61 0.69 1.00 0.72 1.00 1.00
Telecommunications 1.00 0.35 0.57 0.42 0.33 0.93 0.69 0.52 0.78 0.71
Utilities          0.80 0.92 0.84 0.77 0.70 1.00 0.46 0.56 0.87 0.74
Financials         0.73 0.94 0.90 0.69 1.00 0.68 0.48 0.48 0.81 0.80
Technology         0.98 0.39 0.71 0.50 0.79 0.67 0.46 0.43 0.65 0.30
Table 12: B1 matrix with sector indexes (NNDSVDar)

Sector              #1   #2   #3   #4   #5   #6   #7   #8   #9   #10
Oil & Gas          0.94 0.38 0.32 0.25 0.27 0.16 0.83 0.28 0.33 1.00
Basic Materials    0.79 0.50 0.83 0.67 0.23 0.92 1.00 0.32 0.38 0.87
Industrials        0.65 0.65 0.73 0.79 0.57 1.00 0.97 0.56 0.55 0.64
Consumer Goods     0.82 0.46 0.55 0.36 0.36 0.78 0.97 0.36 0.42 0.56
Health Care        1.00 0.44 0.21 0.06 0.15 0.10 0.46 0.25 0.31 0.83
Consumer Services  0.63 0.71 0.49 0.51 0.56 0.50 0.86 0.49 0.39 0.29
Telecommunications 0.52 0.61 0.96 0.59 1.00 0.29 0.90 0.50 0.56 0.22
Utilities          0.82 0.53 1.00 0.70 0.31 0.16 0.95 0.53 0.41 0.90
Financials         0.74 0.69 0.84 1.00 0.20 0.32 0.78 1.00 0.68 0.94
Technology         0.43 1.00 0.83 0.99 0.78 0.54 0.99 0.76 1.00 0.51
Table 13: B1 matrix with sector indexes (KM)

Sector              #1   #2   #3   #4   #5   #6   #7   #8   #9   #10
Oil & Gas          0.93 0.98 0.84 0.76 0.95 0.99 0.87 0.88 0.79 0.71
Basic Materials    1.00 0.93 0.76 1.00 0.75 0.83 0.84 0.89 0.91 0.61
Industrials        0.80 0.83 0.72 0.88 0.75 0.74 0.78 0.74 0.83 0.79
Consumer Goods     0.87 1.00 0.72 0.95 0.82 0.88 0.82 0.83 0.76 0.72
Health Care        0.96 0.94 0.82 0.77 1.00 1.00 1.00 0.91 0.59 0.77
Consumer Services  0.66 0.69 0.59 0.62 0.74 0.78 0.78 0.67 0.60 0.88
Telecommunications 0.70 0.71 0.51 0.52 0.74 0.63 0.55 0.74 0.62 0.93
Utilities          1.00 0.89 0.85 0.77 0.84 0.86 0.86 1.00 0.95 0.67
Financials         0.81 0.82 1.00 0.78 0.81 0.80 0.97 0.85 1.00 0.65
Technology         0.57 0.73 0.58 0.44 0.73 0.70 0.76 0.55 0.64 1.00
Table 14: B1 matrix with sector indexes (CRO); empty cells of the original table are omitted

Sector              NMF factors (#1 to #10)
Oil & Gas          0.22 0.93 0.34 0.09 0.67
Basic Materials    0.29 1.00 0.25 1.00 0.17
Industrials        0.75 0.05 0.60 0.30 0.71 0.80 0.20
Consumer Goods     0.92 0.20 0.15 0.59
Health Care        1.00 0.03 0.60 0.12
Consumer Services  0.46 0.76 0.07 0.35 0.41 0.26 0.21 0.23
Telecommunications 0.04 0.27 0.75 1.00 0.78
Utilities          0.77 1.00 1.00 0.26 0.27
Financials         0.38 0.70 1.00 0.02 1.00 0.05 1.00 0.13
Technology         0.28 0.10 0.66 0.17 0.18 0.76 0.65 1.00
C.2 Results of classification

Here are the clusters obtained with the K-means classifier:

• #1: TOTAL SA (1); ENI SPA (1); E.ON AG (7000); DANONE (3000); ENEL SPA (7000); AIR LIQUIDE SA (1000); IBERDROLA SA (7000); VINCI SA (2000); ARCELORMITTAL (1000); L'OREAL (3000); ASSICURAZIONI GENERALI (8000); REPSOL YPF SA (1); CARREFOUR SA (5000); RWE AG (7000); UNIBAIL-RODAMCO SE (8000); PERNOD-RICARD SA (3000); ESSILOR INTERNATIONAL (4000); SAMPO OYJ-A SHS (8000); FORTUM OYJ (7000); HEINEKEN NV (3000); FRESENIUS MEDICAL CARE AG & (4000); SAIPEM SPA (1); VALLOUREC (2000); HENKEL AG & CO KGAA VORZUG (3000); FRESENIUS SE & CO KGAA (4000); TECHNIP SA (1); REED ELSEVIER NV (5000); SOLVAY SA (1000); EDP-ENERGIAS DE PORTUGAL SA (7000); ABERTIS INFRAESTRUCTURAS SA (2000); DELHAIZE GROUP (5000); SODEXO (5000); MERCK KGAA (4000); GROUPE BRUXELLES LAMBERT SA (8000); ATLANTIA SPA (2000);

• #2: SIEMENS AG-REG (2000); SAP AG (9000); KONINKLIJKE PHILIPS ELECTRON (3000); NOKIA OYJ (9000); ASML HOLDING NV (9000); BOUYGUES SA (2000); ALCATEL-LUCENT (9000); CAP GEMINI (9000); STMICROELECTRONICS NV (9000);

• #3: BANCO SANTANDER SA (8000); DEUTSCHE BANK AG-REGISTERED (8000); BANCO BILBAO VIZCAYA ARGENTA (8000); AXA SA (8000); INTESA SANPAOLO (8000); UNICREDIT SPA (8000); ERSTE GROUP BANK AG (8000); NATIONAL BANK OF GREECE (8000);

• #4: TELEFONICA SA (6000); DEUTSCHE TELEKOM AG-REG (6000); FRANCE TELECOM SA (6000); VIVENDI (5000); KONINKLIJKE KPN NV (6000); TELECOM ITALIA SPA (6000); SAFRAN SA (2000); PUBLICIS GROUPE (5000); PORTUGAL TELECOM SGPS SA-REG (6000);

• #5: BASF SE (1000); SCHNEIDER ELECTRIC SA (2000); LVMH MOET HENNESSY LOUIS VUI (3000); LINDE AG (1000); COMPAGNIE DE SAINT-GOBAIN (2000); THYSSENKRUPP AG (2000); AKZO NOBEL (1000); CRH PLC (2000); ADIDAS AG (3000); PPR (5000); LAFARGE SA (2000); K+S AG (1000); KONINKLIJKE DSM NV (1000); HEIDELBERGCEMENT AG (2000); UPM-KYMMENE OYJ (1000); CHRISTIAN DIOR (3000); METRO AG (5000); RYANAIR HOLDINGS PLC (5000); METSO OYJ (2000);

• #6: SANOFI (4000); UNILEVER NV-CVA (3000);

• #7: BNP PARIBAS (8000); ING GROEP NV-CVA (8000); SOCIETE GENERALE (8000);

• #8: BAYER AG-REG (1000); KONINKLIJKE AHOLD NV (5000); ALSTOM (2000);

• #9: DAIMLER AG-REGISTERED SHARES (3000); BAYERISCHE MOTOREN WERKE AG (3000); VOLKSWAGEN AG-PFD (3000); MICHELIN (CGDE)-B (3000); MAN SE (2000); RENAULT SA (3000); PORSCHE AUTOMOBIL HLDG-PFD (3000); FIAT SPA (3000);

• #10: ALLIANZ SE-REG (8000); MUENCHENER RUECKVER AG-REG (8000); COMMERZBANK AG (8000); AEGON NV (8000);

The next table indicates some stocks which are not explained by their sector index.
Table 15: B2 coefficients

Name                        Sector               Nonzero coefficients
ALLIANZ                     Financials           0.24, 0.78
SAP                         Technology           0.08, 0.12, 0.12, 0.39, 0.30
ENEL                        Utilities            0.66, 0.34
VIVENDI                     Consumer Services    0.03, 0.91
VINCI                       Industrials          1.17, 0.02
SAINT-GOBAIN                Industrials          0.48, 0.01, 0.47
VOLKSWAGEN                  Consumer Goods       1.08
MUENCHENER RUECKVER         Financials           0.86, 0.22
UNIBAIL-RODAMCO             Financials           1.19
PERNOD-RICARD               Consumer Goods       0.12, 1.08
ESSILOR INTERNATIONAL       Health Care          1.13
THYSSENKRUPP                Industrials          0.82, 0.09
MICHELIN                    Consumer Goods       0.28, 0.68, 0.07
CRH                         Industrials          0.61, 0.04, 0.08, 0.26
ADIDAS                      Consumer Goods       0.10, 1.01
FORTUM                      Utilities            1.25
TELECOM ITALIA              Telecommunications   1.02
FRESENIUS MEDICAL CARE      Health Care          0.86, 0.17
MAN                         Industrials          1.02
SAIPEM                      Oil & Gas            1.27
VALLOUREC                   Industrials          1.37
ALSTOM                      Industrials          0.79
LAFARGE                     Industrials          0.48, 0.28, 0.20
FRESENIUS SE & CO           Health Care          0.99, 0.08, 0.01
AEGON                       Financials           0.89

We have the following correspondence between codes and sectors: 0001 = Oil & Gas, 1000 = Basic Materials, 2000 = Industrials, 3000 = Consumer Goods, 4000 = Health Care, 5000 = Consumer Services, 6000 = Telecommunications, 7000 = Utilities, 8000 = Financials, 9000 = Technology.
Here are the clusters obtained with the NMF classifier (NNDSVDar, n = 5):

• #1: TOTAL SA (1); BASF SE (1000); BNP PARIBAS (8000); DANONE (3000); AIR LIQUIDE SA (1000); UNIBAIL-RODAMCO SE (8000); ESSILOR INTERNATIONAL (4000); MICHELIN (CGDE)-B (3000); CRH PLC (2000); ADIDAS AG (3000); HEINEKEN NV (3000); HENKEL AG & CO KGAA VORZUG (3000); KONINKLIJKE DSM NV (1000); SOLVAY SA (1000); ABERTIS INFRAESTRUCTURAS SA (2000); GROUPE BRUXELLES LAMBERT SA (8000); ATLANTIA SPA (2000);

• #2: DEUTSCHE TELEKOM AG-REG (6000); FRANCE TELECOM SA (6000); VIVENDI (5000); ARCELORMITTAL (1000); KONINKLIJKE KPN NV (6000); SAFRAN SA (2000); CAP GEMINI (9000);

• #3: BANCO SANTANDER SA (8000); BAYER AG-REG (1000); SCHNEIDER ELECTRIC SA (2000); INTESA SANPAOLO (8000); REPSOL YPF SA (1); THYSSENKRUPP AG (2000); FRESENIUS MEDICAL CARE AG & (4000); BOUYGUES SA (2000); FRESENIUS SE & CO KGAA (4000); TECHNIP SA (1); EDP-ENERGIAS DE PORTUGAL SA (7000); METRO AG (5000); DELHAIZE GROUP (5000); MERCK KGAA (4000);

• #4: SIEMENS AG-REG (2000); TELEFONICA SA (6000); SAP AG (9000); LVMH MOET HENNESSY LOUIS VUI (3000); KONINKLIJKE PHILIPS ELECTRON (3000); ASML HOLDING NV (9000); TELECOM ITALIA SPA (6000); CHRISTIAN DIOR (3000); PUBLICIS GROUPE (5000); PORTUGAL TELECOM SGPS SA-REG (6000);

• #5: E.ON AG (7000); IBERDROLA SA (7000); VINCI SA (2000); LINDE AG (1000); VOLKSWAGEN AG-PFD (3000); RWE AG (7000); SAMPO OYJ-A SHS (8000); FORTUM OYJ (7000); SAIPEM SPA (1); K+S AG (1000);

• #6: SANOFI (4000); ENI SPA (1); UNILEVER NV-CVA (3000); L'OREAL (3000); BAYERISCHE MOTOREN WERKE AG (3000); PERNOD-RICARD SA (3000); REED ELSEVIER NV (5000); UPM-KYMMENE OYJ (1000); RYANAIR HOLDINGS PLC (5000);

• #7: SOCIETE GENERALE (8000); UNICREDIT SPA (8000); COMPAGNIE DE SAINT-GOBAIN (2000); LAFARGE SA (2000); RENAULT SA (3000); ERSTE GROUP BANK AG (8000); PORSCHE AUTOMOBIL HLDG-PFD (3000);

• #8: MAN SE (2000); VALLOUREC (2000); HEIDELBERGCEMENT AG (2000); METSO OYJ (2000);

• #9: DAIMLER AG-REGISTERED SHARES (3000); DEUTSCHE BANK AG-REGISTERED (8000); BANCO BILBAO VIZCAYA ARGENTA (8000); ENEL SPA (7000); ASSICURAZIONI GENERALI (8000); CARREFOUR SA (5000); MUENCHENER RUECKVER AG-REG (8000); AKZO NOBEL (1000); KONINKLIJKE AHOLD NV (5000); PPR (5000); SODEXO (5000); STMICROELECTRONICS NV (9000);

• #10: ALLIANZ SE-REG (8000); ING GROEP NV-CVA (8000); AXA SA (8000); NOKIA OYJ (9000); COMMERZBANK AG (8000); ALSTOM (2000); ALCATEL-LUCENT (9000); AEGON NV (8000); FIAT SPA (3000); NATIONAL BANK OF GREECE (8000);