Nonnegative Matrix Factorization and Financial Applications
Zélia Cazalet
Research & Development, Lyxor Asset Management
[email protected]

Thierry Roncalli
Research & Development, Lyxor Asset Management
[email protected]
May 2011
Abstract
Nonnegative matrix factorization (NMF) is a recent tool to analyse multivariate data. It can be compared to other decomposition methods like principal component analysis (PCA) or independent component analysis (ICA). However, NMF differs from them because it imposes the nonnegativity of the matrices. In this paper, we use this special feature in order to identify patterns in stock market data. Indeed, we may use NMF to estimate common factors from the dynamics of stock prices. In this perspective, we compare NMF and clustering algorithms to identify endogenous equity sectors.
Keywords: Nonnegative matrix factorization, principal component analysis, clustering, sparsity.
JEL classification: G1, C5.
1 Introduction

NMF is a recent technique which has met with success not only in data analysis but also in image and audio processing. It is an alternative approach to decomposition methods like PCA and ICA, with the special feature of considering nonnegative matrices. Let A be a nonnegative m × p matrix. We define a NMF decomposition as follows:
A ≈ BC
with B and C two nonnegative matrices with respective dimensions m × n and n × p. Compared to classic decomposition algorithms, we remark that BC is an approximation of A. There are also different ways to obtain this approximation, meaning that B and C are not necessarily unique. Because the dimensions m, n and p may be very large, one of the difficulties of NMF is to derive a numerical algorithm with a reasonable computation time. In 1999, Lee and Seung developed a simple algorithm with strong performance and applied it to pattern recognition with success. Since this seminal work, this algorithm has been improved and there are today several ways to obtain a nonnegative matrix factorization.
Drakakis et al. (2008) apply NMF to analyse financial data. From a machine learning point of view, NMF could be viewed as a procedure to reduce the dimensionality of data. That is why we could consider NMF in different fields: time series denoising, blind source deconvolution, pattern recognition, etc. One of the most interesting applications concerns data classification. Indeed, Drakakis et al. (2008) show a clustering example with stocks. In this paper, we explore this field of research. The paper is organized as follows. In Section 2, we present different NMF algorithms and compare them to the PCA and ICA decompositions. In Section 3, we apply NMF to stock classification and compare the results with those obtained by clustering methods. Finally, Section 4 concludes.
2 Nonnegative matrix factorization
2.1 Interpretation of the NMF decomposition

We first notice that the decomposition A ≈ BC is equivalent to A⊤ ≈ C⊤B⊤. It means that the storage of the data is not important. The rows of A may represent either the observations or the variables, but the interpretation of the B and C matrices depends on the choice of the storage. We remark that:
A_{i,j} = \sum_{k=1}^{n} B_{i,k} C_{k,j}
Suppose that we consider a variable/observation storage. Therefore, B_{i,k} depends on the variable i whereas C_{k,j} depends on the observation j. In this case, we may interpret B as a matrix of weights. In factor analysis, B is called the loading matrix and C the factor matrix. B_{i,k} is then the weight of the factor k for the variable i and C_{k,j} is the value taken by the factor k for the observation j. If we use an observation/variable storage, which is the common way to store data in statistics, C becomes the loading matrix and B the factor matrix.
Remark 1 In the original work of Lee and Seung, the NMF decomposition is noted V ≈ WH with W a matrix of weights, meaning that the data are stored in a variable/observation order.
Let D be a nonnegative matrix such that D−1 is nonnegative too. For example, D maybe a permutation of a diagonal matrix. In this case, we have:
A ≈ B D^{-1} D C = B^\star C^\star

with B^\star = B D^{-1} and C^\star = D C. It shows that the decomposition is not unique. Moreover, the decomposition may be rescaled by the matrix D = diag(B^\top 1). In this case, B^\star is a nonnegative matrix such that:

\sum_{i=1}^{m} B^\star_{i,k} = 1

and B^\star is a matrix of weights.
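This rescaling is easy to check numerically. The following numpy sketch (the matrices are random and purely illustrative) verifies that B⋆C⋆ equals BC and that the columns of B⋆ sum to one:

```python
import numpy as np

# Scale indeterminacy of the NMF decomposition A ≈ BC: for a positive
# diagonal D, B* = B D^{-1} and C* = D C yield the same product.
rng = np.random.default_rng(0)
B = rng.uniform(0.1, 1.0, size=(5, 3))  # m x n nonnegative loadings
C = rng.uniform(0.1, 1.0, size=(3, 4))  # n x p nonnegative factors

D = np.diag(B.sum(axis=0))              # D = diag(B' 1): column sums of B
B_star = B @ np.linalg.inv(D)
C_star = D @ C

assert np.allclose(B @ C, B_star @ C_star)   # same approximation of A
assert np.allclose(B_star.sum(axis=0), 1.0)  # B* is a matrix of weights
```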
2.2 Some algorithms

In order to find the approximate factorization, we need to define the cost function f which quantifies the quality of the factorization. The optimization program is then:
(\hat{B}, \hat{C}) = \arg\min f(A, BC)   (1)

u.c. B ≥ 0 and C ≥ 0
Lee and Seung (2001) consider two cost functions. The first one is the Frobenius norm:

f(A, BC) = \sum_{i=1}^{m} \sum_{j=1}^{p} (A_{i,j} - (BC)_{i,j})^2

whereas the second one is the Kullback-Leibler divergence:

f(A, BC) = \sum_{i=1}^{m} \sum_{j=1}^{p} \left( A_{i,j} \ln \frac{A_{i,j}}{(BC)_{i,j}} - A_{i,j} + (BC)_{i,j} \right)
To solve the problem (1), Lee and Seung (2001) propose to use the multiplicative update algorithm. Let B_{(t)} and C_{(t)} be the matrices at iteration t. For the Frobenius norm, we have:

B_{(t+1)} = B_{(t)} \odot (A C_{(t)}^\top) \oslash (B_{(t)} C_{(t)} C_{(t)}^\top)
C_{(t+1)} = C_{(t)} \odot (B_{(t+1)}^\top A) \oslash (B_{(t+1)}^\top B_{(t+1)} C_{(t)})

where \odot and \oslash are respectively the element-wise multiplication and division operators¹. A similar algorithm may be derived for the Kullback-Leibler divergence. Under some assumptions, we may show that \hat{B} = B_{(\infty)} and \hat{C} = C_{(\infty)}, meaning that the multiplicative update algorithm converges to the optimal solution.
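A minimal numpy implementation of these multiplicative updates for the Frobenius norm may look as follows. The data matrix is a random low-rank example, and the small eps added to the denominators follows the suggestion of Pauca et al. (2006) recalled in footnote 1:

```python
import numpy as np

def nmf_multiplicative(A, n, iters=1000, eps=1e-9, seed=0):
    """NMF by the multiplicative updates of Lee and Seung (Frobenius norm).

    A small positive eps is added to the denominators to avoid divisions
    by zero (Pauca et al., 2006).
    """
    m, p = A.shape
    rng = np.random.default_rng(seed)
    B = rng.uniform(0.1, 1.0, size=(m, n))
    C = rng.uniform(0.1, 1.0, size=(n, p))
    for _ in range(iters):
        B *= (A @ C.T) / (B @ C @ C.T + eps)
        C *= (B.T @ A) / (B.T @ B @ C + eps)
    return B, C

# On an exact rank-2 nonnegative matrix, the factorization is near exact
rng = np.random.default_rng(1)
A = rng.uniform(0, 1, size=(20, 2)) @ rng.uniform(0, 1, size=(2, 10))
B, C = nmf_multiplicative(A, n=2)
```

Note that the iterates stay nonnegative by construction, because the updates only multiply the current matrices by nonnegative ratios.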
For large datasets, the computational time to find the optimal solution may be large with the previous algorithm. Since the seminal work of Lee and Seung, many methods have been proposed to improve the multiplicative update algorithm and speed up the convergence. Among these methods, we may mention the algorithm developed by Lin (2007). The idea is to use alternating nonnegative least squares²:

B_{(t+1)} = \arg\min_B f(A, B C_{(t)})
C_{(t+1)} = \arg\min_C f(A, B_{(t+1)} C)   (2)
with the constraints B_{(t+1)} ≥ 0 and C_{(t+1)} ≥ 0. To solve the previous problem, Lin (2007) uses the projected gradient method for bound-constrained optimization. We first remark that the two optimization problems (2) are symmetric, because we may cast the first problem in the form of the second one:

B_{(t+1)}^\top = \arg\min f(A^\top, C_{(t)}^\top B^\top)
So, we may only focus on the following optimization problem:

C^\star = \arg\min f(A, BC)   u.c. C ≥ 0
¹To prevent problems with denominators close to zero, Pauca et al. (2006) propose to add a small positive number to the two previous denominators.
²We notice that the algorithm of Lee and Seung is a special case of this one where the optimality criterion is replaced by a sufficient criterion:

f(A, B_{(t+1)} C_{(t)}) ≤ f(A, B_{(t)} C_{(t)})
f(A, B_{(t+1)} C_{(t+1)}) ≤ f(A, B_{(t+1)} C_{(t)})
Let us consider the case of the Frobenius norm. We have:

\partial_C f(A, BC) = 2 B^\top (BC - A)

The projected gradient method consists in the following iterating scheme:

C ← \max(0, C - \alpha \, \partial_C f(A, BC))

where the max operator projects the update onto the nonnegative orthant.
with α the step length. Instead of finding at each iteration the optimal value of α, Lin (2007) proposes to update α in a very simple way depending on the following inequality:

(1 - \sigma) \, \partial_C f(A, BC)^\top (\tilde{C} - C) + \frac{1}{2} (\tilde{C} - C)^\top \partial_C^2 f(A, BC) (\tilde{C} - C) ≤ 0

with \tilde{C} the update of C. If this inequality is verified, α is increased, whereas we decrease α otherwise.
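The subproblem in C can be sketched with a plain Armijo backtracking search along the projection arc. This is a simplification of Lin's scheme, which additionally reuses and adapts α across iterations; the matrices below are random placeholders:

```python
import numpy as np

def nnls_projected_gradient(A, B, iters=500, alpha0=1.0, beta=0.5,
                            sigma=0.01, seed=0):
    """Solve min_C ||A - BC||_F^2 s.t. C >= 0 by projected gradient.

    Simplified sketch of Lin (2007): an Armijo backtracking search along
    the projection arc replaces his adaptive step-size update.
    """
    rng = np.random.default_rng(seed)
    C = rng.uniform(0.1, 1.0, size=(B.shape[1], A.shape[1]))
    f = lambda C_: np.linalg.norm(A - B @ C_) ** 2
    for _ in range(iters):
        grad = 2.0 * B.T @ (B @ C - A)
        step = alpha0
        while True:
            C_new = np.maximum(C - step * grad, 0.0)  # projection on C >= 0
            # sufficient decrease condition along the projection arc
            if f(C_new) - f(C) <= sigma * np.sum(grad * (C_new - C)):
                break
            step *= beta
            if step < 1e-12:  # no further progress possible
                C_new = C
                break
        C = C_new
    return C

# Sanity check on a consistent system with a known nonnegative solution
rng = np.random.default_rng(2)
B = rng.uniform(0, 1, size=(30, 3))
A = B @ rng.uniform(0, 1, size=(3, 8))
C_hat = nnls_projected_gradient(A, B)
```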
2.3 Comparing the NMF method with other factor decompositions
In order to understand why NMF is different from other factor methods, we consider a simulation study. We consider a basket of 4 financial assets. The asset prices are driven by a multi-dimensional geometric Brownian motion. The drift parameter is equal to 5% whereas the diffusion parameter is 20%. The cross-correlation ρ_{i,j} between assets i and j is equal to 20%, except ρ_{1,2} = 70% and ρ_{3,4} = 50%. In order to preserve the time homogeneity, the data correspond to x_{i,t} = ln S_{i,t} where S_{i,t} is the price of the asset i at time t. In Figure 1, we report the time series x_{i,t} for the 4 assets and the first factor estimated by the NMF and PCA methods. We remark that the NMF factor is not scaled in the same way as the PCA factor. However, the correlation between the first differences is equal to 98.8%.
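The simulated basket can be reproduced with numpy; the one-year horizon and the daily step below are illustrative assumptions, and the correlation structure is enforced through a Cholesky factor:

```python
import numpy as np

# 4 correlated geometric Brownian motions: drift 5%, volatility 20%,
# cross-correlation 20%, except rho(1,2) = 70% and rho(3,4) = 50%.
# The one-year horizon with 260 daily steps is an illustrative assumption.
mu, sigma, dt = 0.05, 0.20, 1.0 / 260
n_steps = 260

rho = np.full((4, 4), 0.20)
np.fill_diagonal(rho, 1.0)
rho[0, 1] = rho[1, 0] = 0.70
rho[2, 3] = rho[3, 2] = 0.50
L = np.linalg.cholesky(rho)

rng = np.random.default_rng(7)
dW = rng.normal(size=(n_steps, 4)) @ L.T * np.sqrt(dt)  # correlated increments
# x_t = ln S_t is an arithmetic Brownian motion with drift mu - sigma^2 / 2
x = np.log(100.0) + np.cumsum((mu - 0.5 * sigma**2) * dt + sigma * dW, axis=0)
```

The empirical correlation of the increments of the first two assets should then be close to the 70% target.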
Figure 1: Estimating the first factor of a basket of financial assets
Table 1: Loading matrix of the PCA factors

Asset     F1      F2      F3      F4
#1       0.55   −0.47    0.19    0.66
#2       0.60   −0.35   −0.33   −0.65
#3       0.45    0.51    0.70   −0.22
#4       0.37    0.63   −0.60    0.32

Table 2: Loading matrix of the NMF factors

          n = 1        n = 2              n = 3
Asset      F1       F1     F2       F1     F2     F3
#1        0.91     0.85   0.68     0.86   0.93   0.29
#2        0.99     0.89   1.00     0.85   1.00   1.00
#3        1.00     1.00   0.18     1.00   0.05   0.69
#4        0.93     0.95   0.03     0.99   0.13   0.13
If we compare now the loading matrices of the implied factors, we obtain different results (see Tables 1 and 2). The first factor is comparable, which is not the case of the other factors, because all the other factors of the PCA decomposition are long/short factors. By construction, NMF factors are long-only and depend on the choice of the number of factors. Therefore, their interpretation is more difficult. In Figure 2, we compare the dynamics of
Figure 2: Reconstruction of the asset prices
the first asset with the dynamics given by the NMF factors. Another interesting result is the decomposition of the variance according to the factors.

Figure 3: Variance explained by each factor

In Figure 3, we notice that PCA explains more variance than NMF for a given number of factors. We may explain this result easily because NMF may be viewed as a constrained principal component analysis with nonnegative matrices. However, it does not mean that PCA explains more variance than NMF from a marginal point of view. For illustration, the second NMF factor explains more variance than the second PCA factor in Figure 3.
3 Financial applications with NMF
3.1 Factor extraction of an equity universe
In what follows, we consider the EuroStoxx 50 index. Using the composition of this index at the end of 2010, we compute the nonnegative matrix factorization on the logarithm of the stock prices. We have represented the first NMF factor in Figure 4. We may compare it with the logarithm of the corresponding index. Even if the decomposition is done just for one date and does not take into account entries/exits in the index, we notice that the first NMF factor is highly correlated with the index³. One interesting thing is the sensitivity of the stocks with respect to this factor⁴. In Table 3, we indicate the stocks corresponding to the 5 smallest and largest sensitivities to the NMF factor. We also indicate their corresponding rank and sensitivity in the case of PCA. We notice that the NMF and PCA results are very different. First, the range of the sensitivities is larger for PCA than for NMF. Second, the correlation between the NMF and PCA rankings is low. For the first PCA factor, the largest contribution comes from the financial and insurance sectors. This is not the case for NMF.
³The correlation between the index returns and the factor returns is equal to 97.2%. This is a little bit higher than that for PCA, which is equal to 96.3%.
⁴In order to compare results, we normalize the sensitivities such that the largest value is equal to 1.
Figure 4: Comparison between the EuroStoxx 50 and the first NMF factor
Table 3: Sensitivity of the stock prices to the first factor

                         NMF                 PCA
Name                Rank  Sensitivity   Rank  Sensitivity
KONINKLIJKE           1      0.56        12      0.46
DEUTSCHE TELEKOM      2      0.56        23      0.57
FRANCE TELECOM        3      0.57        30      0.62
VIVENDI               4      0.64        31      0.64
NOKIA                 5      0.65        38      0.72
IBERDROLA            46      0.92         5      0.39
ENI                  47      0.94        10      0.44
VINCI                48      0.99        13      0.46
UNIBAIL-RODAMCO      49      1.00         2      0.31
PERNOD-RICARD        50      1.00         1      0.29
If we consider now two factors, the results are those given in Figure 5. In this case, we may interpret them as a bear market factor and a bull market factor. Another way to convince ourselves of the nature of these factors is to consider the statistic β_i defined as follows⁵:

β_i = B^\star_{i,2} - B^\star_{i,1}

We have β_i ∈ [−1, 1]. A large value of β_i indicates a stock for which the sensitivity to the second factor is much more important than the sensitivity to the first factor. If we rank the stocks according to this statistic, we verify that high (resp. low) values of β_i are associated with stocks which have outperformed (resp. underperformed) the EuroStoxx 50 index.
When the number n of factors increases, it becomes more and more difficult to interpret the factors. For example, Figure 6 corresponds to the case n = 4. There are some similarities between the third and fourth factors and some sectors (see Table 4). For the first and second factors, we do not find an interpretation in terms of sectors. Moreover, the endogenous character of these factors increases when n is large.
Table 4: Correlation of NMF factors with ICB sectors

Sector  0001   1000   2000   3000   4000   5000   6000   7000   8000   9000
#1      −3.5   11.9  −13.6    4.8  −38.5  −16.6  −29.6   −7.3  −31.4  −25.7
#2       0.8    6.4   13.0   −3.1  −24.2    2.0    6.5  −11.3  −13.0   17.0
#3       2.8   52.2   52.0   43.7   70.5   61.5   73.8   45.8   68.7   70.7
#4       4.7   34.0   28.5   14.7   59.9   36.2   38.0   37.9   53.9   24.7

We have the following correspondence between codes and sectors: 0001 = Oil & Gas, 1000 = Basic Materials, 2000 = Industrials, 3000 = Consumer Goods, 4000 = Health Care, 5000 = Consumer Services, 6000 = Telecommunications, 7000 = Utilities, 8000 = Financials, 9000 = Technology.
3.2 Pattern recognition of asset returns

Let S_{t,i} be the price of the asset i at time t. We define the one-period return R_{t,i} as follows:

R_{t,i} = \frac{S_{t,i}}{S_{t-1,i}} - 1

We may decompose this return as the sum of a positive part and a negative part. We obtain:

R_{t,i} = \underbrace{\max(R_{t,i}, 0)}_{R^+_{t,i}} - \underbrace{\max(-R_{t,i}, 0)}_{R^-_{t,i}}

Let R be a T × N matrix containing the return R_{t,i} of the asset i at time t. We define the matrices R⁺ and R⁻ in the same way by replacing the elements R_{t,i} respectively by R^+_{t,i} and R^-_{t,i}. We may apply the nonnegative matrix factorization to the matrices R⁺ and R⁻. We have:

R = R^+ - R^- \approx B^+ C^+ - B^- C^-   (3)

In this case, the dimension of the C⁺ and C⁻ matrices is K × N. We may interpret the decomposition of R as a projection of the T periods onto K patterns.
⁵We remind that we have rescaled the B and C matrices with D = diag(max(B)). In this case, B⋆ satisfies the property 0 ≤ B⋆_{i,k} ≤ 1.
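The splitting of R into its positive and negative parts is immediate in numpy; in the sketch below, the random return matrix stands in for actual weekly returns:

```python
import numpy as np

# Split a T x N return matrix into two nonnegative matrices R+ and R-,
# so that NMF can be applied to each part separately.
rng = np.random.default_rng(3)
R = rng.normal(0.0, 0.02, size=(574, 20))   # placeholder for weekly returns

R_plus = np.maximum(R, 0.0)    # R+ = max(R, 0)
R_minus = np.maximum(-R, 0.0)  # R- = max(-R, 0)
```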
Figure 5: NMF with two factors
Figure 6: NMF with four factors
Figure 7: The case of one pattern
Remark 2 Another way to do pattern recognition is to consider the following factorization:

(R^+ \; R^-) = BC   (4)

In this case, C is a K × 2N matrix. Decompositions (3) and (4) are very similar, but the last one is more restrictive. Indeed, we have:

(R^+ \; R^-) = B (C^+ \; C^-)

This factorization is a restrictive case of the decomposition (3) with B^+ = B^-.
In what follows, we consider the weekly returns of 20 stocks⁶ over the period from January 2000 to December 2010. The dimension of the R matrix is then 574 × 20 (number of periods T × number of stocks N). Let us consider the case of one pattern (K = 1). In Figure 7, we have reported the C matrix, that is the values of the pattern. Figure 8 corresponds to the B matrix, that is the sensitivity of each weekly period to the pattern. If we compute the R² statistic associated with this pattern model, we obtain a value close to 50% (see Figure 9). If we consider more patterns, we may of course obtain a better R². For example, with 12 patterns, the R² statistic is equal to 90%.
To give an example of patterns, we have reported the case K = 4 in Figure 10. We notice that not all the stocks are represented in each pattern. For example, the second pattern describing the positive part of returns concerns mainly 5 stocks (Siemens, Telefonica, Allianz, SAP, Deutsche Telekom). With these 4 patterns, the R² statistic is equal to 70% for these 20 stocks and the entire period. We remark however that the R² statistic differs between stocks (see Table 5). Indeed, it is equal to 90% for SAP whereas it is equal to 43% for Danone.
⁶Total, Siemens, Banco Santander, Telefonica, BASF, Sanofi, BNP Paribas, Bayer, Daimler, Allianz, ENI, E.ON, SAP, Deutsche Bank, BBVA, Unilever, ING, Schneider, Danone, Deutsche Telekom.
Figure 8: Sensibility to the pattern
Figure 9: R2 (in %) of the pattern model
Figure 10: The case of 4 patterns
Table 5: R² (in %) for each stock

                       Number of patterns
Stock                1    2    3    4    5   10   20
Total               46   46   57   59   59   83  100
Siemens             57   70   70   76   76   79  100
Banco Santander     70   73   75   76   81   90  100
Telefonica          36   54   57   66   64   73   97
BASF                64   63   69   71   74   76   98
Sanofi              27   28   45   54   56   67  100
BNP Paribas         58   69   74   75   74   93  100
Bayer               44   44   54   56   76   97  100
Daimler             57   57   60   69   74   87  100
Allianz             62   63   66   67   68   95  100
ENI                 44   48   57   61   62   83  100
E.ON                35   36   58   59   60   65  100
SAP                 38   65   76   90   92   99  100
Deutsche Bank       72   75   75   75   75   91   99
BBVA                72   75   77   78   84   91  100
Unilever            20   20   43   51   51   68   99
ING                 71   82   84   86   94   97  100
Schneider Electric  49   49   50   52   56   87  100
Danone              26   25   40   43   42   62  100
Deutsche Telekom    22   54   56   77   80   91   98
3.3 Classification of stocks
In what follows, we consider the universe of the 100 largest stocks of the DJ EuroStoxx index. The study period begins in January 2000 and ends in December 2010. Let P_{i,t} be the price of the stock i for the period t. We consider the matrix A0 with elements ln P_{i,t}. We perform a NMF of the matrix A0 with several initialization methods, which are described in Appendix B:
1. RAND corresponds to the random method;
2. NNDSVD uses the algorithm of nonnegative double singular value decomposition proposed by Boutsidis and Gallopoulos (2008);

3. NNDSVDa and NNDSVDar are two variants of NNDSVD where the zeros are replaced by the average value a of the matrix A or by random numbers from the uniform distribution U[0, a/100];
4. KM is based on the K-means algorithm proposed by Bertin (2009);
5. CRO corresponds to the closeness to rank-one algorithm described by Kim and Choi(2007).
3.3.1 Relationship between NMF factors and sectors
Remark 3 Let A0 be the m0 × p matrix which represents the initial data. With NMF, we estimate the factorization A0 ≈ B0C0 with (B0, C0) = argmin f(A0, BC) under the constraints B ≥ 0 and C ≥ 0. Suppose now that the m1 × p matrix A1 represents other data. We may estimate a new NMF and obtain A1 ≈ B1C1. In some applications, we would like to have the same factors for A0 and A1, that is C1 = C0. In this case, the second optimization program becomes B1 = argmin f(A1, BC0) u.c. B ≥ 0. For the Frobenius norm, this program may be easily solved with quadratic programming. The optimal solution is B1 = [β_1 ··· β_{m1}]⊤ with:

β_i = \arg\min \frac{1}{2} \beta^\top (C_0 C_0^\top) \beta - \beta^\top (C_0 A_1^\top e_i)   u.c. β ≥ 0

Of course, this optimal solution is valid even if the matrix C0 is not given by NMF, but corresponds to exogenous factors.
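For the Frobenius norm, the row-by-row quadratic program of Remark 3 is a nonnegative least squares problem. Assuming SciPy is available, its nnls routine solves it directly; the matrices below are random placeholders:

```python
import numpy as np
from scipy.optimize import nnls

def loadings_given_factors(A1, C0):
    """Estimate B1 >= 0 such that A1 ≈ B1 C0 for a fixed factor matrix C0.

    Row i solves min ||C0' beta - a_i|| s.t. beta >= 0, which is the
    quadratic program of Remark 3 for the Frobenius norm.
    """
    return np.array([nnls(C0.T, a_i)[0] for a_i in A1])

# Consistency check: if A1 is generated from C0, B1 is recovered exactly
rng = np.random.default_rng(4)
C0 = rng.uniform(0, 1, size=(3, 50))      # n x p factor matrix
B_true = rng.uniform(0, 1, size=(10, 3))  # m1 x n loadings
A1 = B_true @ C0
B1 = loadings_given_factors(A1, C0)
```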
One may wonder if the NMF factors are linked to sectors. We use the ICB classification with 10 sectors. Let I_{j,t} be the value of the jth sector index. A first idea is to apply NMF to the matrix A1 with elements ln I_{j,t}, but by imposing that the matrix of factors C corresponds to the matrix C0 estimated with the stocks. Using the results of the previous remark, the estimation of the matrix B1 with B1 ≥ 0 is straightforward. If we have a one-to-one correspondence between NMF factors and ICB sectors, B1 must be a permutation of a diagonal matrix. Results are reported in Appendix C.1 with n = 10 factors. We notice that the results depend on the initialization method. By construction, two methods (NNDSVD and CRO) produce sparse factorizations. For the other four, the factorization is very dense. Except for some very specific cases, a sector is generally linked to several factors.
Another way to see if sectors could be related to NMF factors is to consider the C2 matrix with elements ln I_{j,t}. In this case, we impose exogenous factors and we could use the previous technique to estimate the B2 matrix such that A0 ≈ B2C2 and B2 ≥ 0. If sectors are the natural factors, we must verify that (B2)_{i,j} > 0 if the stock i belongs to the sector j and (B2)_{i,j} ≃ 0 otherwise. Let S(i) = j be the mapping function between stocks and sectors. For each stock i, we may compute the proportion of factor weights explained by the sector S(i) with respect to the sum of weights:

p_i = \frac{(B_2)_{i,S(i)}}{\sum_{j=1}^{n} (B_2)_{i,j}}

If p_i = 100%, it means that the sector S(i) of the stock explains all the factor component. If p_i = 0%, it means that the sector S(i) of the stock explains nothing. In Table 6, we have reported the distribution of the p_i statistic.

Table 6: Frequencies of the p_i statistic

p_i            Frequency
0%                39
]0%, 10%]          4
]10%, 20%]         7
]20%, 30%]         4
]30%, 40%]         9
]40%, 50%]         7
]50%, 60%]         5
]60%, 70%]         7
]70%, 80%]         3
]80%, 90%]         4
]90%, 100%[        2
100%               9

We notice that, among the 100 stocks, only 11 stocks have a statistic larger than 90%. These stocks are TOTAL (Oil & Gas), BASF (Basic Materials), SANOFI (Health Care), ARCELORMITTAL (Basic Materials), LINDE (Basic Materials), NOKIA (Technology), KONINKLIJKE (Telecommunications), ALCATEL-LUCENT (Technology), K+S (Basic Materials), CAP GEMINI (Technology) and STMICROELECTRONICS (Technology). At the same time, 39 stocks are not explained at all by their sectors. We have reported the 25 largest stocks with p_i = 0 in Table 15 (see Appendix C.2). In Figure 11, we have reported some of these stocks⁷. For each stock, we also report its corresponding sector (green line) and the "representative" sector which presents the largest weight in the B2 matrix (red line). Most of the time, we confirm that the behavior of the stock is closer to the behavior of the representative sector than to the behavior of its own sector.
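The p_i statistic is straightforward to compute from B2 and the sector mapping S; in the sketch below, both are random placeholders:

```python
import numpy as np

# p_i: share of the factor weights of stock i carried by its own sector S(i).
rng = np.random.default_rng(5)
n_stocks, n_sectors = 100, 10
B2 = rng.uniform(0, 1, size=(n_stocks, n_sectors))   # hypothetical loadings
sector = rng.integers(0, n_sectors, size=n_stocks)   # hypothetical S(i)

p = B2[np.arange(n_stocks), sector] / B2.sum(axis=1)
```

By construction, each p_i lies between 0 and 1, since the numerator is one of the nonnegative terms of the denominator.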
Remark 4 The previous analysis is done for a long period (11 years). If we consider a shorter period (for example 1 or 2 years), the difference between the corresponding sector and the representative sector is more important.
⁷We remind that the data are the logarithm of the prices ln P_{i,t}. Moreover, we have normalized the prices such that P_{i,0} = 100.
Figure 11: Some stocks which have a behavior different from their sector index
3.3.2 NMF classifiers
Before doing a classification based on NMF, we apply the K-means procedure directly to the stock returns. The results set a benchmark for the subsequent NMF classifiers. For that, we consider 10 clusters. In Table 7, we remark that the cluster #1 groups together a large part of the stocks, with 8 sectors represented. This cluster has certainly no economic significance. It contains the stocks which are difficult to classify in the other clusters.
Table 7: Number of stocks by sector/cluster

sector   stocks per cluster              total
0001     5                                  5
1000     3, 6, 1                           10
2000     4, 2, 1, 7, 1, 1                  16
3000     5, 1, 3, 1, 7                     17
4000     4, 1                               5
5000     4, 2, 3, 1                        10
6000     6                                  6
7000     6                                  6
8000     4, 8, 3, 4                        19
9000     6                                  6
total (#1–#10)  35, 9, 8, 9, 19, 2, 3, 3, 8, 4   100
In Figure 12, we report the frequency of each sector in each cluster. We do not notice an exact correspondence between sectors and clusters, because several clusters are represented by the same sector. Indeed, this classification highlights the Financials sector, which has a frequency of 100% for 3 of the 10 clusters (#3, #7 and #10). In addition, the clusters #2, #4 and #9, respectively represented by the Technology, Telecommunications and Consumer Goods sectors, stand out. It seems logical that the classification brings to light these sectors, because the studied period is strongly influenced by the dot-com and financial bubbles. However, we notice that some clusters, strongly characterized by some sectors, only represent a small part of them. For instance, clusters #7 and #10 each represent less than 25% of the Financials sector. As a consequence, some clusters represent more a subset of a sector than the average behavior of the sector. Figure 13 confirms this idea. In some cases, the dynamics of the centroids differ from the dynamics of the most represented sectors. For instance, the evolution of cluster #10 differs from the global evolution of the Financials sector. Indeed, the centroid has a flatter evolution than the sector. So, the cluster #10 highlights a subset of Financials stocks which has a behavior very different from the Financials sector, especially during the burst of the dot-com bubble.

To conclude, this preliminary study shows the heterogeneity within sectors and singles out some special stocks whose behavior differs from the behavior of their sectors.
Figure 12: Results of the cluster analysis
Let us see now if NMF classifiers can represent an alternative to the sector classification. Before performing the classification, we have to make two choices: the initialization method and the order n of the factorization. In Figure 14, we see the evolution of the NMF error with respect to the dimension n. We notice that the NNDSVD and CRO methods converge more slowly than the others. This is explained by the fact that these methods produce sparse factorizations. The other initialization methods provide similar results. However, NNDSVDar has the lowest Euclidean error in most cases. As a result, in what follows, we choose to analyse the results obtained with the NNDSVDar initialization method.
Figure 13: Centroid of some clusters
Figure 14: Evolution of the error with respect to initialization method
The next step consists in fixing the smallest order n of the factorization which does not lead to a serious information loss. For this, we study the inertia retrieved by the NMF method and we select the order which keeps more than 90% of the information. According to Figure 15, n = 4 is sufficient to retrieve 90% of the information, but n = 5 brings 3.47% of supplementary information. As a consequence, we select n = 5 as the number of factors resulting from the NMF.
Figure 15: NMF inertia
We then proceed to classify the stocks by applying an unsupervised algorithm to the normalized matrix B⋆. The idea is to group stocks that are influenced by the same factors. For this, we consider the K-means algorithm with 10 clusters. According to Table 8, we notice that the clusters are well distributed. Contrary to the previous classification, we do not observe one cluster that contains a large part of the stocks. Figure 16 indicates that these clusters do not represent sectors. An exception is the cluster #8, which is only represented by the Industrials sector.
Table 8: Number of stocks by sector/cluster

sector   stocks per cluster              total
0001     1, 2, 0, 1, 1                      5
1000     4, 1, 1, 2, 1, 1                  10
2000     3, 1, 3, 1, 1, 2, 4, 1            16
3000     5, 3, 1, 4, 2, 1, 1               17
4000     1, 3, 1                            5
5000     1, 2, 1, 2, 4                     10
6000     3, 3                               6
7000     1, 4, 1                            6
8000     3, 2, 1, 3, 4, 6                  19
9000     1, 2, 1, 2                         6
total (#1–#10)  17, 7, 14, 10, 10, 9, 7, 4, 12, 10   100
Figure 16: Frequencies of sectors in each cluster
In Figure 17, we report the evolution of the centroid of each cluster k. It is computed as follows:

\bar{x}_k = w_k^\top C^\star

with:

w_{k,j} = \frac{1}{N_k} \sum_{i=1}^{m} \mathbf{1}\{C(i) = k\} \cdot B^\star_{i,j}

where N_k = \sum_{i=1}^{m} \mathbf{1}\{C(i) = k\} is the number of stocks in the cluster k.
Figure 17: Centroid of the clusters
We notice that NMF classifiers highlight distinct behaviors. Indeed, the centroids of clusters #5 and #8 are strongly influenced by the financial bubble whereas the centroid of cluster #2 is influenced by the dot-com bubble. Stocks which belong to clusters #1, #6 and #7 do not suffer from the dot-com bubble, whereas this is not the case of stocks which belong to clusters #3, #4, #9 and #10. We notice also that the centroids of clusters #1, #6 and #7 are very different at the end of the study period.

To conclude, NMF classifiers are useful for pattern recognition but, generally, the clusters do not have an economic signification. Another drawback is the dependence of the clusters on the initialization procedure selected for the NMF.
4 Conclusion

Given its success in image recognition and audio processing, NMF seems to be a useful decomposition method for financial applications. Imposing the nonnegativity of the matrices allows us to interpret the factorization as a decomposition with a loading matrix and a matrix of long-only factors. This special feature is interesting for pattern recognition, but the interpretation becomes more and more difficult as the number of factors increases. We can also apply a clustering algorithm to the loading matrix in order to group together stocks with the same patterns. According to our results, NMF classifiers set apart different behaviors, but the economic interpretation of the clusters is difficult. A direct classification of stock returns with the K-means procedure seems more robust and highlights some special stocks whose behavior differs from the behavior of their sectors.
References

[1] Bertin N. (2009), Les factorisations en matrices non-négatives : approches contraintes et probabilistes, application à la transcription automatique de musique polyphonique, Mémoire de thèse, Télécom ParisTech.

[2] Boutsidis C. and Gallopoulos E. (2008), SVD Based Initialization: A Head Start for Nonnegative Matrix Factorization, Pattern Recognition, 41(4), pp. 1350-1362.

[3] Drakakis K., Rickard S., de Fréin R. and Cichocki A. (2008), Analysis of Financial Data Using Non-negative Matrix Factorization, International Mathematical Forum, 3(38), pp. 1853-1870.

[4] Hastie T., Tibshirani R. and Friedman J. (2009), The Elements of Statistical Learning, Second Edition, Springer.

[5] Hoyer P.O. (2004), Non-negative Matrix Factorization with Sparseness Constraints, Journal of Machine Learning Research, 5, pp. 1457-1469.

[6] Kim Y-D. and Choi S. (2007), A Method of Initialization for Nonnegative Matrix Factorization, IEEE International Conference on Acoustics, Speech and Signal Processing, 2(1), pp. 537-540.

[7] Lee D.D. and Seung H.S. (1999), Learning the Parts of Objects by Non-negative Matrix Factorization, Nature, 401, pp. 788-791.

[8] Lee D.D. and Seung H.S. (2001), Algorithms for Non-negative Matrix Factorization, Advances in Neural Information Processing Systems, 13, pp. 556-562.

[9] Lin C.J. (2007), Projected Gradient Methods for Non-negative Matrix Factorization, Neural Computation, 19(10), pp. 2756-2779.

[10] Pauca V.P., Piper J. and Plemmons R.J. (2006), Nonnegative Matrix Factorization for Spectral Data Analysis, Linear Algebra and its Applications, 416(1), pp. 29-47.

[11] Sonneveld P., van Kan J.J.I.M., Huang X. and Oosterlee C.W. (2009), Nonnegative Matrix Factorization of a Correlation Matrix, Linear Algebra and its Applications, 431(3-4), pp. 334-349.
A Cluster analysis

Cluster analysis is a method for the assignment of observations into groups (or clusters). It is thus an exploratory data analysis technique which groups similar observations together. The objective of clustering methods is to maximize the pairwise proximity between observations of a same cluster and to maximize the dissimilarity between observations which belong to different clusters. In what follows, we are concerned with unsupervised learning algorithms, that is segmentation methods with no prior information on the groups.
A.1 The K-means algorithm

It is a special case of combinatorial algorithms. This kind of algorithm does not use a probability distribution but works directly on the observed data. We consider m objects with n attributes x_{i,j} (i = 1, …, m and j = 1, …, n). We would like to build K clusters defined by the index k (k = 1, …, K). Let C be the mapping function which assigns an object to a cluster⁸. The principle of combinatorial algorithms is to adjust the mapping function C in order to minimize the following loss function:

L(C) = \frac{1}{2} \sum_{k=1}^{K} \sum_{C(i)=k} \sum_{C(i')=k} d(x_i, x_{i'})
where d (xi, xi′) is the dissimilarity measure between the objects i and i′. As a result, theoptimal mapping function is denoted C⋆ = argminL (C).
In the case of the K-means algorithm, the dissimilarity measure is the squared Euclidean (Frobenius) distance:

d(x_i, x_{i′}) = ∑_{j=1}^{n} (x_{i,j} − x_{i′,j})² = ∥x_i − x_{i′}∥²

Therefore, the loss function becomes (Hastie et al., 2009):

L(C) = ∑_{k=1}^{K} N_k ∑_{C(i)=k} ∥x_i − x̄_k∥²

where x̄_k = (x̄_{k,1}, …, x̄_{k,n}) is the mean vector associated with the kth cluster and N_k = ∑_{i=1}^{m} 1{C(i) = k}. Because m⋆_Ω = arg min_m ∑_{i∈Ω} ∥x_i − m∥², another form of the previous minimization problem is:

(C⋆, m⋆_1, …, m⋆_K) = arg min ∑_{k=1}^{K} N_k ∑_{C(i)=k} ∥x_i − m_k∥²

This problem is solved by iterations. At iteration s, we compute the optimal means m_1^(s), …, m_K^(s) of the clusters for the given mapping function C^(s−1). Then, we update the mapping function using the following rule:

k = C^(s)(i) = arg min_k ∥x_i − m_k^(s)∥²

We repeat these two steps until the convergence of the algorithm: C⋆ = C^(s) = C^(s−1).
⁸For example, C(i) = k assigns the ith observation to the kth cluster.
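The two-step iteration above can be sketched in a few lines (a minimal NumPy illustration; the function name, the random initial mapping and the iteration cap are our own choices, not part of the original description):

```python
import numpy as np

def k_means(x, K, n_iter=100, seed=0):
    # x is the m x n matrix of objects, K the number of clusters
    rng = np.random.default_rng(seed)
    m = x.shape[0]
    C = rng.integers(0, K, size=m)            # initial mapping function C^(0)
    for _ in range(n_iter):
        # step 1: optimal means m_k^(s) for the mapping C^(s-1)
        means = np.array([x[C == k].mean(axis=0) for k in range(K)])
        # step 2: reassign each object to the closest mean
        dist = ((x[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        C_new = np.argmin(dist, axis=1)
        if np.array_equal(C_new, C):          # convergence: C^(s) = C^(s-1)
            break
        C = C_new
    return C, means
```

On well-separated data, the mapping function stabilizes after a few iterations.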
A.2 Hierarchical clustering

This algorithm creates a hierarchy of clusters which may be represented in a tree structure. Unlike the K-means algorithm, this method depends neither on the number of clusters nor on the starting configuration. The lowest level of the hierarchy corresponds to the individual objects whereas the highest level corresponds to one cluster containing all the objects. We generally distinguish two methods:

• in the agglomerative method, the algorithm starts with the individual clusters and recursively merges the closest pair of clusters into one single cluster;

• in the divisive method, the algorithm starts with the single cluster and recursively splits the cluster into two new clusters which present the maximum dissimilarity.
In this study, we only consider the agglomerative method.
Let k and k′ be two clusters. We define the dissimilarity measure d(k, k′) as a linkage function ℓ of the pairwise dissimilarities d(x_i, x_{i′}) where C(i) = k and C(i′) = k′:

d(k, k′) = ℓ({d(x_i, x_{i′}) : C(i) = k, C(i′) = k′})

There exist different ways to define the function ℓ:

• Single linkage: d(k, k′) = min_{x_i∈k, x_{i′}∈k′} d(x_i, x_{i′})

• Complete linkage: d(k, k′) = max_{x_i∈k, x_{i′}∈k′} d(x_i, x_{i′})

• Average linkage: d(k, k′) = (1 / (N_k N_{k′})) ∑_{x_i∈k} ∑_{x_{i′}∈k′} d(x_i, x_{i′})
At each iteration, we search for the clusters k and k′ which minimize the dissimilarity measure and we merge them into one single cluster. When all the objects have been merged into only one cluster, we obtain a tree which is called a dendrogram. It is then easy to perform a segmentation by considering a particular level of the tree. In Figure 18, we report a dendrogram on simulated data using the single linkage rule. We consider 20 objects divided into two groups. The attributes of the first (resp. second) group correspond to simulated Gaussian variates with a mean of 20% (resp. 30%) and a standard deviation of 5% (resp. 10%). The intra-group cross-correlation is set to 75% whereas the inter-group correlation is equal to 0. We obtain very good results. In practice, however, hierarchical clustering may produce concentrated segmentations, as illustrated in Figure 19. We use the same simulated data as previously, except that the standard deviation for the second group is set to 25%. In this case, if we consider two clusters, we obtain a cluster with 19 elements and another cluster with only one element (the 18th object).
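A naive sketch of the agglomerative method with the single linkage rule (for illustration only; the quadratic merge loop is our own simplification, and production implementations use much faster update schemes):

```python
import numpy as np

def single_linkage(x, n_clusters):
    # agglomerative clustering: at each step, merge the two clusters
    # whose minimum pairwise distance is smallest, until n_clusters remain
    clusters = [[i] for i in range(len(x))]
    d = np.sqrt(((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=2))
    while len(clusters) > n_clusters:
        best, best_d = (0, 1), np.inf
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # single linkage: min over all cross-cluster pairs
                dd = min(d[i, j] for i in clusters[a] for j in clusters[b])
                if dd < best_d:
                    best_d, best = dd, (a, b)
        a, b = best
        clusters[a] += clusters.pop(b)        # merge the closest pair
    return clusters
```

Stopping the merges at a given level of the tree yields the segmentation discussed above.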
B Initialization methods for NMF algorithms

We have to use starting matrices B(0) and C(0) to initialize NMF algorithms. In this section, we present the most popular methods to define them.
Figure 18: An example of dendrogram
Figure 19: Bad classification
B.1 Random method
The first idea is to consider matrices of positive random numbers. Generally, one uses the uniform distribution U[0,1] or the absolute value of Gaussian variates |N(0,1)|.
B.2 K-means method
Bertin (2009) proposes to apply the K-means algorithm to the matrix x = A⊤, meaning that the clustering is done on the columns of the matrix A. When the n clusters are determined, we compute the n × m centroid matrix x̄_{n×m} whose kth row is x̄_k⊤, where x̄_k = (x̄_{k,1}, …, x̄_{k,m}) is the mean vector associated with the kth cluster. Finally, we have B(0) = x̄⊤_{n×m} and C(0) is a matrix of positive random numbers.
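A possible sketch of this initialization (assuming a basic K-means loop as in Appendix A.1; the empty-cluster reseeding and the fixed iteration count are our own safeguards, not part of the original description):

```python
import numpy as np

def kmeans_init(A, n, seed=0):
    # cluster the columns of A (the rows of x = A^T) into n groups,
    # then build B(0) from the transposed centroid matrix
    rng = np.random.default_rng(seed)
    x = A.T                                   # p x m matrix of column profiles
    C = rng.integers(0, n, size=x.shape[0])   # initial mapping
    for _ in range(50):                       # plain K-means loop
        means = np.array([x[C == k].mean(axis=0) if (C == k).any()
                          else x[rng.integers(x.shape[0])]   # reseed empty cluster
                          for k in range(n)])
        C = np.argmin(((x[:, None, :] - means[None, :, :]) ** 2).sum(axis=2), axis=1)
    B0 = means.T                              # B(0) = xbar^T, an m x n matrix
    C0 = rng.random((n, A.shape[1]))          # C(0): positive random numbers
    return B0, C0
```

Since A is nonnegative, the centroids are nonnegative and so is B(0).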
B.3 SVD method
Let A be an m × p matrix. The singular value decomposition is given by:

A = usv⊤

where u is an m × m unitary matrix, s is an m × p diagonal matrix with nonnegative entries and v is a p × p unitary matrix. The rank-n approximation is then:

A ≈ ∑_{k=1}^{n} u_k σ_k v_k⊤

with u_k and v_k the kth left and right singular vectors and σ_k = s_{k,k} the kth largest singular value (σ_1 ≥ … ≥ σ_n). An idea to initialize the NMF algorithm is to define B(0) and C(0) such that the kth column of B(0) is equal to √σ_k u_k and the kth row of C(0) is equal to √σ_k v_k⊤. However, this method does not work because the vectors u_k and v_k are not necessarily nonnegative, except for the largest singular value (k = 1), as explained in the following remark.
Remark 5 We may show that the left singular vectors u of A are the eigenvectors of AA⊤, that the right singular vectors v of A are the eigenvectors of A⊤A and that the non-zero singular values σ_k are the square roots of the non-zero eigenvalues of AA⊤. The Perron-Frobenius theorem implies that the first eigenvector of a nonnegative irreducible matrix is nonnegative. We deduce that the first eigenvector of the AA⊤ or A⊤A matrices is nonnegative if all the entries of A are nonnegative. It proves that the singular vectors u_1 and v_1 are nonnegative.
Boutsidis and Gallopoulos (2008) propose to modify the previous factorization:

A ≈ ∑_{k=1}^{n} ũ_k σ_k ṽ_k⊤

where ũ_k and ṽ_k are nonnegative vectors. The algorithm is called nonnegative double singular value decomposition (NNDSVD) and is a simple modification of the singular value decomposition which considers only the nonnegative parts of the singular vectors. For each singular value σ_k, we have to decide which nonnegative part is the largest by noticing that:

u_k σ_k v_k⊤ = (−u_k) σ_k (−v_k)⊤

Let x⁺ be the nonnegative part of the vector x. We define u_k⁻ = (−u_k)⁺ and v_k⁻ = (−v_k)⁺. It comes that:

ũ_k = γ_k × { u_k⁺ / ∥u_k⁺∥ if m⁺ > m⁻ ; u_k⁻ / ∥u_k⁻∥ otherwise }

and:

ṽ_k = γ_k × { v_k⁺ / ∥v_k⁺∥ if m⁺ > m⁻ ; v_k⁻ / ∥v_k⁻∥ otherwise }

where m⁺ = ∥u_k⁺∥ ∥v_k⁺∥ and m⁻ = ∥u_k⁻∥ ∥v_k⁻∥. In order to preserve the inertia, the singular vectors are scaled by the factor γ_k = max(√m⁺, √m⁻). The initialization of B(0) and C(0) is then done by using ũ_k and ṽ_k in place of u_k and v_k.
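The NNDSVD recipe above may be sketched as follows (our own translation of the formulas into NumPy; variable names are illustrative):

```python
import numpy as np

def nndsvd_init(A, n):
    # NNDSVD-style initialization: for each singular triplet, keep the
    # dominant nonnegative parts of u_k and v_k, scaled by gamma_k
    u, s, v = np.linalg.svd(A, full_matrices=False)
    B0 = np.zeros((A.shape[0], n))
    C0 = np.zeros((n, A.shape[1]))
    for k in range(n):
        uk, vk = u[:, k], v[k, :]
        up, un = np.maximum(uk, 0), np.maximum(-uk, 0)   # u_k^+ and u_k^-
        vp, vn = np.maximum(vk, 0), np.maximum(-vk, 0)   # v_k^+ and v_k^-
        mp = np.linalg.norm(up) * np.linalg.norm(vp)     # m^+
        mn = np.linalg.norm(un) * np.linalg.norm(vn)     # m^-
        gamma = max(np.sqrt(mp), np.sqrt(mn))            # inertia factor
        if mp > mn:
            ut, vt = up / np.linalg.norm(up), vp / np.linalg.norm(vp)
        else:
            ut, vt = un / np.linalg.norm(un), vn / np.linalg.norm(vn)
        B0[:, k] = np.sqrt(s[k]) * gamma * ut            # kth column of B(0)
        C0[k, :] = np.sqrt(s[k]) * gamma * vt            # kth row of C(0)
    return B0, C0
```

For k = 1 the nonnegative parts coincide with |u_1| and |v_1|, so the first term reproduces the best rank-one approximation of A.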
B.4 CRO method

The CRO-based hierarchical clustering is an alternative to the classical agglomerative hierarchical clustering, where the similarity measure is given by the closeness to rank-one (CRO) measure:

cro(X) = σ_1² / ∑_i σ_i²

with σ_1 ≥ σ_2 ≥ … ≥ 0 the singular values of the matrix X. Let k_1 and k_2 be two clusters. The CRO measure between the two clusters k_1 and k_2 of the matrix A is defined by cro(A^{(k_1,k_2)}), where A^{(k_1,k_2)} is the submatrix of A which contains the row vectors of the two clusters k_1 and k_2. Kim and Choi (2007) summarize the CRO-based hierarchical algorithm as follows. First, we assign each row vector of A to its own cluster, giving m clusters. Then, we merge the pair of clusters with the largest CRO measure into one single cluster until n clusters remain.

When the CRO-based hierarchical clustering is applied to the matrix A, we obtain n clusters represented by the submatrices A_1, …, A_n. Kim and Choi (2007) then consider the rank-one SVD approximations⁹:

(A_1, …, A_n) ≈ (u_1 σ_1 v_1⊤, …, u_n σ_n v_n⊤)

This decomposition leads to a very simple assignment of the matrices B(0) and C(0). The kth column of B(0) corresponds to the vector u_k for the rows which belong to the kth cluster, while the remaining elements of the kth column are equal to zero (or a small number). The kth row of C(0) corresponds to σ_k v_k⊤. By construction, this initialization method produces a sparse factorization.
Remark 6 We have proved previously that the singular vectors u and v associated with the rank-one SVD approximation of a nonnegative matrix A are nonnegative. From a numerical analysis point of view, however, the rank-one SVD approximation may be returned as usv⊤ or as (−u)s(−v)⊤. In practice, we therefore consider |u| and |v| in place of u and v when working with the rank-one approximation.
⁹We have A_k ≈ u_k σ_k v_k⊤ where u_k and v_k are the left and right singular vectors associated with the largest singular value σ_k.
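The CRO measure is straightforward to compute from the singular values; a small sketch (the helper cro_between, which evaluates the merging criterion on a stacked submatrix, is our own illustration):

```python
import numpy as np

def cro(X):
    # closeness to rank-one: sigma_1^2 / sum_i sigma_i^2
    s = np.linalg.svd(X, compute_uv=False)
    return s[0] ** 2 / (s ** 2).sum()

def cro_between(A, rows1, rows2):
    # CRO measure between two clusters of row vectors of A:
    # cro of the submatrix stacking both clusters' rows
    return cro(A[np.r_[rows1, rows2]])
```

A value close to 1 indicates that the stacked rows are nearly proportional, so merging the two clusters preserves the rank-one structure.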
C Results
C.1 Estimation of the B1 matrix
Table 9: B1 matrix with sector indexes (RAND)

Sector              #1   #2   #3   #4   #5   #6   #7   #8   #9   #10
Oil & Gas          1.00 0.84 0.56 0.69 0.63 0.84 0.82 0.69 0.54 0.79
Basic Materials    0.58 1.00 0.53 0.69 0.64 0.86 1.00 0.49 0.49 1.00
Industrials        0.47 0.83 0.68 0.64 0.71 0.72 0.74 0.60 0.55 0.84
Consumer Goods     0.70 0.89 0.59 0.68 0.81 0.77 0.81 0.67 0.35 0.87
Health Care        0.95 0.95 0.44 0.80 0.71 0.79 0.66 1.00 0.51 0.56
Consumer Services  0.55 0.68 0.66 0.46 0.55 0.65 0.68 0.84 0.41 0.49
Telecommunications 0.69 0.66 1.00 0.30 0.47 0.78 0.43 0.62 0.27 0.49
Utilities          0.97 0.91 0.71 0.72 0.65 1.00 0.83 0.42 0.63 0.65
Financials         0.89 0.76 0.60 1.00 1.00 0.81 0.60 0.40 1.00 0.46
Technology         0.54 0.49 0.96 0.55 0.45 0.45 0.69 0.53 0.57 0.63
Table 10: B1 matrix with sector indexes (NNDSVD); empty cells of the original table are omitted

Sector              NMF factors (#1 to #10)
Oil & Gas          0.99 0.24 0.02 0.21 0.34 0.20 0.36
Basic Materials    0.93 0.21 0.44 0.21 0.01 0.82 0.71 0.13 0.21 1.00
Industrials        0.79 0.52 0.55 0.64 0.31 1.00 0.46 0.34 0.59 0.64
Consumer Goods     0.93 0.26 0.18 0.11 0.86 0.32 0.10 0.24 0.60
Health Care        1.00 0.35 0.02 0.13 0.14 0.41
Consumer Services  0.71 0.68 0.34 0.53 0.38 0.60 0.18 0.33 0.12
Telecommunications 0.59 0.73 1.00 0.59 1.00 0.68 1.00 0.10
Utilities          0.92 0.33 0.64 0.41 0.30 1.00 0.56 0.30
Financials         0.88 0.47 0.49 1.00 0.11 0.50 0.51 0.67 0.43
Technology         0.59 1.00 0.63 0.97 0.65 0.23 0.70 0.73 1.00 0.16
Table 11: B1 matrix with sector indexes (NNDSVDa)

Sector              #1   #2   #3   #4   #5   #6   #7   #8   #9   #10
Oil & Gas          0.83 0.99 1.00 1.00 0.67 0.97 0.29 0.34 0.70 0.07
Basic Materials    0.79 1.00 0.56 0.84 0.50 0.57 0.86 1.00 0.64 0.99
Industrials        0.90 0.75 0.54 0.65 0.55 0.49 0.94 0.91 0.64 0.95
Consumer Goods     0.86 0.86 0.60 0.88 0.45 0.60 0.90 0.87 0.65 0.87
Health Care        0.83 0.90 0.85 0.99 0.66 0.92 0.77 0.30 0.81 0.37
Consumer Services  0.85 0.33 0.54 0.47 0.61 0.69 1.00 0.72 1.00 1.00
Telecommunications 1.00 0.35 0.57 0.42 0.33 0.93 0.69 0.52 0.78 0.71
Utilities          0.80 0.92 0.84 0.77 0.70 1.00 0.46 0.56 0.87 0.74
Financials         0.73 0.94 0.90 0.69 1.00 0.68 0.48 0.48 0.81 0.80
Technology         0.98 0.39 0.71 0.50 0.79 0.67 0.46 0.43 0.65 0.30
Table 12: B1 matrix with sector indexes (NNDSVDar)

Sector              #1   #2   #3   #4   #5   #6   #7   #8   #9   #10
Oil & Gas          0.94 0.38 0.32 0.25 0.27 0.16 0.83 0.28 0.33 1.00
Basic Materials    0.79 0.50 0.83 0.67 0.23 0.92 1.00 0.32 0.38 0.87
Industrials        0.65 0.65 0.73 0.79 0.57 1.00 0.97 0.56 0.55 0.64
Consumer Goods     0.82 0.46 0.55 0.36 0.36 0.78 0.97 0.36 0.42 0.56
Health Care        1.00 0.44 0.21 0.06 0.15 0.10 0.46 0.25 0.31 0.83
Consumer Services  0.63 0.71 0.49 0.51 0.56 0.50 0.86 0.49 0.39 0.29
Telecommunications 0.52 0.61 0.96 0.59 1.00 0.29 0.90 0.50 0.56 0.22
Utilities          0.82 0.53 1.00 0.70 0.31 0.16 0.95 0.53 0.41 0.90
Financials         0.74 0.69 0.84 1.00 0.20 0.32 0.78 1.00 0.68 0.94
Technology         0.43 1.00 0.83 0.99 0.78 0.54 0.99 0.76 1.00 0.51
Table 13: B1 matrix with sector indexes (KM)

Sector              #1   #2   #3   #4   #5   #6   #7   #8   #9   #10
Oil & Gas          0.93 0.98 0.84 0.76 0.95 0.99 0.87 0.88 0.79 0.71
Basic Materials    1.00 0.93 0.76 1.00 0.75 0.83 0.84 0.89 0.91 0.61
Industrials        0.80 0.83 0.72 0.88 0.75 0.74 0.78 0.74 0.83 0.79
Consumer Goods     0.87 1.00 0.72 0.95 0.82 0.88 0.82 0.83 0.76 0.72
Health Care        0.96 0.94 0.82 0.77 1.00 1.00 1.00 0.91 0.59 0.77
Consumer Services  0.66 0.69 0.59 0.62 0.74 0.78 0.78 0.67 0.60 0.88
Telecommunications 0.70 0.71 0.51 0.52 0.74 0.63 0.55 0.74 0.62 0.93
Utilities          1.00 0.89 0.85 0.77 0.84 0.86 0.86 1.00 0.95 0.67
Financials         0.81 0.82 1.00 0.78 0.81 0.80 0.97 0.85 1.00 0.65
Technology         0.57 0.73 0.58 0.44 0.73 0.70 0.76 0.55 0.64 1.00
Table 14: B1 matrix with sector indexes (CRO); empty cells of the original table are omitted

Sector              NMF factors (#1 to #10)
Oil & Gas          0.22 0.93 0.34 0.09 0.67
Basic Materials    0.29 1.00 0.25 1.00 0.17
Industrials        0.75 0.05 0.60 0.30 0.71 0.80 0.20
Consumer Goods     0.92 0.20 0.15 0.59
Health Care        1.00 0.03 0.60 0.12
Consumer Services  0.46 0.76 0.07 0.35 0.41 0.26 0.21 0.23
Telecommunications 0.04 0.27 0.75 1.00 0.78
Utilities          0.77 1.00 1.00 0.26 0.27
Financials         0.38 0.70 1.00 0.02 1.00 0.05 1.00 0.13
Technology         0.28 0.10 0.66 0.17 0.18 0.76 0.65 1.00
C.2 Results of classification

Here are the clusters obtained with the K-means classifier:

• #1: TOTAL SA (1); ENI SPA (1); E.ON AG (7000); DANONE (3000); ENEL SPA (7000); AIR LIQUIDE SA (1000); IBERDROLA SA (7000); VINCI SA (2000); ARCELORMITTAL (1000); L'OREAL (3000); ASSICURAZIONI GENERALI (8000); REPSOL YPF SA (1); CARREFOUR SA (5000); RWE AG (7000); UNIBAIL-RODAMCO SE (8000); PERNOD-RICARD SA (3000); ESSILOR INTERNATIONAL (4000); SAMPO OYJ-A SHS (8000); FORTUM OYJ (7000); HEINEKEN NV (3000); FRESENIUS MEDICAL CARE AG & (4000); SAIPEM SPA (1); VALLOUREC (2000); HENKEL AG & CO KGAA VORZUG (3000); FRESENIUS SE & CO KGAA (4000); TECHNIP SA (1); REED ELSEVIER NV (5000); SOLVAY SA (1000); EDP-ENERGIAS DE PORTUGAL SA (7000); ABERTIS INFRAESTRUCTURAS SA (2000); DELHAIZE GROUP (5000); SODEXO (5000); MERCK KGAA (4000); GROUPE BRUXELLES LAMBERT SA (8000); ATLANTIA SPA (2000);

• #2: SIEMENS AG-REG (2000); SAP AG (9000); KONINKLIJKE PHILIPS ELECTRON (3000); NOKIA OYJ (9000); ASML HOLDING NV (9000); BOUYGUES SA (2000); ALCATEL-LUCENT (9000); CAP GEMINI (9000); STMICROELECTRONICS NV (9000);

• #3: BANCO SANTANDER SA (8000); DEUTSCHE BANK AG-REGISTERED (8000); BANCO BILBAO VIZCAYA ARGENTA (8000); AXA SA (8000); INTESA SANPAOLO (8000); UNICREDIT SPA (8000); ERSTE GROUP BANK AG (8000); NATIONAL BANK OF GREECE (8000);

• #4: TELEFONICA SA (6000); DEUTSCHE TELEKOM AG-REG (6000); FRANCE TELECOM SA (6000); VIVENDI (5000); KONINKLIJKE KPN NV (6000); TELECOM ITALIA SPA (6000); SAFRAN SA (2000); PUBLICIS GROUPE (5000); PORTUGAL TELECOM SGPS SA-REG (6000);

• #5: BASF SE (1000); SCHNEIDER ELECTRIC SA (2000); LVMH MOET HENNESSY LOUIS VUI (3000); LINDE AG (1000); COMPAGNIE DE SAINT-GOBAIN (2000); THYSSENKRUPP AG (2000); AKZO NOBEL (1000); CRH PLC (2000); ADIDAS AG (3000); PPR (5000); LAFARGE SA (2000); K+S AG (1000); KONINKLIJKE DSM NV (1000); HEIDELBERGCEMENT AG (2000); UPM-KYMMENE OYJ (1000); CHRISTIAN DIOR (3000); METRO AG (5000); RYANAIR HOLDINGS PLC (5000); METSO OYJ (2000);

• #6: SANOFI (4000); UNILEVER NV-CVA (3000);

• #7: BNP PARIBAS (8000); ING GROEP NV-CVA (8000); SOCIETE GENERALE (8000);

• #8: BAYER AG-REG (1000); KONINKLIJKE AHOLD NV (5000); ALSTOM (2000);

• #9: DAIMLER AG-REGISTERED SHARES (3000); BAYERISCHE MOTOREN WERKE AG (3000); VOLKSWAGEN AG-PFD (3000); MICHELIN (CGDE)-B (3000); MAN SE (2000); RENAULT SA (3000); PORSCHE AUTOMOBIL HLDG-PFD (3000); FIAT SPA (3000);

• #10: ALLIANZ SE-REG (8000); MUENCHENER RUECKVER AG-REG (8000); COMMERZBANK AG (8000); AEGON NV (8000);

The next table indicates some stocks which are not explained by their sector index.
Table 15: B2 coefficients

Name                        Sector               Nonzero coefficients
ALLIANZ                     Financials           0.24, 0.78
SAP                         Technology           0.08, 0.12, 0.12, 0.39, 0.30
ENEL                        Utilities            0.66, 0.34
VIVENDI                     Consumer Services    0.03, 0.91
VINCI                       Industrials          1.17, 0.02
SAINT-GOBAIN                Industrials          0.48, 0.01, 0.47
VOLKSWAGEN                  Consumer Goods       1.08
MUENCHENER RUECKVER         Financials           0.86, 0.22
UNIBAIL-RODAMCO             Financials           1.19
PERNOD-RICARD               Consumer Goods       0.12, 1.08
ESSILOR INTERNATIONAL       Health Care          1.13
THYSSENKRUPP                Industrials          0.82, 0.09
MICHELIN                    Consumer Goods       0.28, 0.68, 0.07
CRH                         Industrials          0.61, 0.04, 0.08, 0.26
ADIDAS                      Consumer Goods       0.10, 1.01
FORTUM                      Utilities            1.25
TELECOM ITALIA              Telecommunications   1.02
FRESENIUS MEDICAL CARE      Health Care          0.86, 0.17
MAN                         Industrials          1.02
SAIPEM                      Oil & Gas            1.27
VALLOUREC                   Industrials          1.37
ALSTOM                      Industrials          0.79
LAFARGE                     Industrials          0.48, 0.28, 0.20
FRESENIUS SE & CO           Health Care          0.99, 0.08, 0.01
AEGON                       Financials           0.89

We have the following correspondence between codes and sectors: 0001 = Oil & Gas, 1000 = Basic Materials, 2000 = Industrials, 3000 = Consumer Goods, 4000 = Health Care, 5000 = Consumer Services, 6000 = Telecommunications, 7000 = Utilities, 8000 = Financials, 9000 = Technology.
Here are the clusters obtained with the NMF classifier (NNDSVDar, n = 5):

• #1: TOTAL SA (1); BASF SE (1000); BNP PARIBAS (8000); DANONE (3000); AIR LIQUIDE SA (1000); UNIBAIL-RODAMCO SE (8000); ESSILOR INTERNATIONAL (4000); MICHELIN (CGDE)-B (3000); CRH PLC (2000); ADIDAS AG (3000); HEINEKEN NV (3000); HENKEL AG & CO KGAA VORZUG (3000); KONINKLIJKE DSM NV (1000); SOLVAY SA (1000); ABERTIS INFRAESTRUCTURAS SA (2000); GROUPE BRUXELLES LAMBERT SA (8000); ATLANTIA SPA (2000);

• #2: DEUTSCHE TELEKOM AG-REG (6000); FRANCE TELECOM SA (6000); VIVENDI (5000); ARCELORMITTAL (1000); KONINKLIJKE KPN NV (6000); SAFRAN SA (2000); CAP GEMINI (9000);

• #3: BANCO SANTANDER SA (8000); BAYER AG-REG (1000); SCHNEIDER ELECTRIC SA (2000); INTESA SANPAOLO (8000); REPSOL YPF SA (1); THYSSENKRUPP AG (2000); FRESENIUS MEDICAL CARE AG & (4000); BOUYGUES SA (2000); FRESENIUS SE & CO KGAA (4000); TECHNIP SA (1); EDP-ENERGIAS DE PORTUGAL SA (7000); METRO AG (5000); DELHAIZE GROUP (5000); MERCK KGAA (4000);

• #4: SIEMENS AG-REG (2000); TELEFONICA SA (6000); SAP AG (9000); LVMH MOET HENNESSY LOUIS VUI (3000); KONINKLIJKE PHILIPS ELECTRON (3000); ASML HOLDING NV (9000); TELECOM ITALIA SPA (6000); CHRISTIAN DIOR (3000); PUBLICIS GROUPE (5000); PORTUGAL TELECOM SGPS SA-REG (6000);

• #5: E.ON AG (7000); IBERDROLA SA (7000); VINCI SA (2000); LINDE AG (1000); VOLKSWAGEN AG-PFD (3000); RWE AG (7000); SAMPO OYJ-A SHS (8000); FORTUM OYJ (7000); SAIPEM SPA (1); K+S AG (1000);

• #6: SANOFI (4000); ENI SPA (1); UNILEVER NV-CVA (3000); L'OREAL (3000); BAYERISCHE MOTOREN WERKE AG (3000); PERNOD-RICARD SA (3000); REED ELSEVIER NV (5000); UPM-KYMMENE OYJ (1000); RYANAIR HOLDINGS PLC (5000);

• #7: SOCIETE GENERALE (8000); UNICREDIT SPA (8000); COMPAGNIE DE SAINT-GOBAIN (2000); LAFARGE SA (2000); RENAULT SA (3000); ERSTE GROUP BANK AG (8000); PORSCHE AUTOMOBIL HLDG-PFD (3000);

• #8: MAN SE (2000); VALLOUREC (2000); HEIDELBERGCEMENT AG (2000); METSO OYJ (2000);

• #9: DAIMLER AG-REGISTERED SHARES (3000); DEUTSCHE BANK AG-REGISTERED (8000); BANCO BILBAO VIZCAYA ARGENTA (8000); ENEL SPA (7000); ASSICURAZIONI GENERALI (8000); CARREFOUR SA (5000); MUENCHENER RUECKVER AG-REG (8000); AKZO NOBEL (1000); KONINKLIJKE AHOLD NV (5000); PPR (5000); SODEXO (5000); STMICROELECTRONICS NV (9000);

• #10: ALLIANZ SE-REG (8000); ING GROEP NV-CVA (8000); AXA SA (8000); NOKIA OYJ (9000); COMMERZBANK AG (8000); ALSTOM (2000); ALCATEL-LUCENT (9000); AEGON NV (8000); FIAT SPA (3000); NATIONAL BANK OF GREECE (8000);