Low-rank matrix approximations in Python by Christian Thurau PyData 2014
Low-rank matrix approximations with Python
Christian Thurau
Table of Contents
1 Intro
2 The Basics
3 Matrix approximation
4 Some methods
5 Matrix Factorization with Python
6 Example & Conclusion
For Starters...
Observations
• Data matrix factorization has become an important tool in information retrieval, data mining, and pattern recognition
• Nowadays, typical data matrices are HUGE
• Examples include:
  • Gene expression data and microarrays
  • Digital images
  • Term-by-document matrices
  • User ratings for movies, products, ...
  • Graph adjacency matrices
Matrix Factorization
• given a matrix V
• determine matrices W and H
• such that V = WH or V ≈ WH
• characteristics such as the entries, shape, and rank of V, W, and H depend on the application context
The Basics
matrix factorization allows for:
• solving linear equations
• transforming data
• compressing data
matrix factorization facilitates subsequent processing in:
• information retrieval
• pattern recognition
• data mining
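As a small illustration of the first bullet (solving linear equations), here is a minimal NumPy sketch that factorizes a square matrix as A = QR and then solves A x = b through the factors. QR is only one of several factorizations usable for this (LU and Cholesky are others); it is picked here for brevity, and the helper name `solve_via_qr` is hypothetical.

```python
import numpy as np

def solve_via_qr(A, b):
    """Solve the square system A x = b by factorizing A = QR (hypothetical helper)."""
    Q, R = np.linalg.qr(A)  # Q has orthonormal columns, R is upper triangular
    # A x = b  =>  Q R x = b  =>  R x = Q^T b
    # np.linalg.solve does not exploit R's triangular structure;
    # scipy.linalg.solve_triangular would use back substitution instead.
    return np.linalg.solve(R, Q.T @ b)

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])
b = np.array([1.0, 2.0])
x = solve_via_qr(A, b)  # 4x + y = 1, 2x + 3y = 2  ->  x = 0.1, y = 0.6
```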
Low-rank Matrix Approximations
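• Approximate V:
$$V \approx WH$$
• where
$$V \in \mathbb{R}^{m \times n}, \quad W \in \mathbb{R}^{m \times k}, \quad H \in \mathbb{R}^{k \times n}$$
• and
$$\operatorname{rank}(W) \ll \operatorname{rank}(V), \qquad k \ll \min(m, n)$$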
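To make the shapes concrete, a small NumPy sketch (sizes chosen arbitrarily) that builds an m × n matrix V from an m × k factor W and a k × n factor H, then compares the storage needed for V itself with the storage needed for W and H.

```python
import numpy as np

m, n, k = 1000, 500, 10
rng = np.random.default_rng(0)
W = rng.random((m, k))   # m x k "basis" matrix
H = rng.random((k, n))   # k x n coefficient matrix
V = W @ H                # m x n, rank at most k

full_storage = m * n             # entries needed to store V directly
factored_storage = k * (m + n)   # entries needed to store W and H instead
rank_V = np.linalg.matrix_rank(V)
```

For these sizes that is 500,000 entries for V versus 15,000 for W and H together, which is why k ≪ min(m, n) amounts to compression.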
Matrix Approximation
• If $V = WH$
• then
$$v_{i,j} = w_{i,*}\, h_{*,j} = \sum_{x=1}^{k} w_{i,x}\, h_{x,j}$$
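The element-wise formula can be checked numerically. A minimal sketch with random factors (sizes arbitrary); note that NumPy indices start at 0 while the formula's sum runs from x = 1 to k.

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.random((4, 3))   # m = 4, k = 3
H = rng.random((3, 5))   # k = 3, n = 5
V = W @ H

i, j = 2, 4
# v_{i,j} as the inner product of row i of W with column j of H ...
v_inner = W[i, :] @ H[:, j]
# ... and as the explicit sum over the k latent dimensions x
v_sum = sum(W[i, x] * H[x, j] for x in range(W.shape[1]))
```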
Matrix Approximation
• More importantly:
$$v_{*,j} = W h_{*,j} = \sum_{x=1}^{k} w_{*,x}\, h_{x,j}$$
• therefore
W ↔ "basis" matrix
H ↔ coefficient matrix
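The column view is what justifies calling W a "basis": each data column is a weighted sum of the columns of W, with the weights taken from the matching column of H. A tiny hand-made example (the numbers are arbitrary):

```python
import numpy as np

W = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])       # 3 x 2: two basis columns w_{*,1}, w_{*,2}
H = np.array([[2.0, 0.5, 0.0],
              [1.0, 0.5, 3.0]])  # 2 x 3: one coefficient column per data point
V = W @ H

j = 0
# column j of V = sum over x of h_{x,j} * w_{*,x}
v_col = sum(H[x, j] * W[:, x] for x in range(W.shape[1]))
# here: 2.0 * [1, 0, 1] + 1.0 * [0, 1, 1] = [2, 1, 3]
```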
On Matrix Factorization Methods
• matrix factorization ↔ data transformation
• matrix rank reduction ↔ data compression
• Common form: V = WH
• Broad range of methods:
  • K-means clustering
  • SVD/PCA
  • Non-negative Matrix Factorization
  • Archetypal Analysis
  • Binary matrix factorization
  • CUR decomposition
  • ...
• Each method yields a unique view on data . . .
• . . . and is suited for different tasks
K-means Clustering [1]
• Baseline clustering method
• Constrained quadratic optimization problem:
$$\min_{W,H} \lVert V - WH \rVert^2 \quad \text{s.t.} \quad h_{k,i} \in \{0, 1\}, \quad \sum_{k} h_{k,i} = 1$$
• Find W, H using expectation maximization
• Optimal k-means partitioning is NP-hard
• Goal: group similar data points
• Interesting: k-means clustering is matrix factorization
[1] J.B. MacQueen, "Some Methods for Classification and Analysis of Multivariate Observations", Berkeley Symposium on Mathematical Statistics and Probability, 1967.
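One common form of the alternating, EM-like scheme for k-means is Lloyd's algorithm. The sketch below writes it directly in the V ≈ WH form of the objective above: the H-step picks a binary column with a single 1 (nearest centroid), the W-step sets each centroid to the mean of its points. It is a plain NumPy illustration, not PyMF's implementation; the function name `kmeans_wh` is hypothetical, and like any such scheme it finds a local optimum, not the NP-hard global one.

```python
import numpy as np

def kmeans_wh(V, k, n_iter=20, seed=0):
    """Lloyd-style k-means written as V ~= W H (hypothetical helper).

    V: (m, n) data, one data point per column.
    W: (m, k) centroids; H: (k, n) binary assignments, one 1 per column.
    """
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = V[:, rng.choice(n, size=k, replace=False)]  # init: k distinct data points
    for _ in range(n_iter):
        # H-step: assign each column of V to its nearest centroid
        d = ((V[:, None, :] - W[:, :, None]) ** 2).sum(axis=0)  # (k, n) squared distances
        labels = d.argmin(axis=0)
        H = np.zeros((k, n))
        H[labels, np.arange(n)] = 1.0
        # W-step: each non-empty centroid becomes the mean of its assigned points
        counts = H.sum(axis=1)
        nonempty = counts > 0
        W[:, nonempty] = (V @ H.T)[:, nonempty] / counts[nonempty]
    return W, H
```

With two well-separated groups the centroids settle on the group means:

```python
V = np.array([[0.0, 0.1, 5.0, 5.1],
              [0.0, 0.1, 5.0, 5.1]])
W, H = kmeans_wh(V, k=2, n_iter=10)
```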
K-means Clustering is Matrix Factorization!
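$$\underbrace{\begin{bmatrix} x_{1,1} & x_{1,2} & x_{1,3} & \dots & x_{1,n} \\ x_{2,1} & x_{2,2} & x_{2,3} & \dots & x_{2,n} \\ x_{3,1} & x_{3,2} & x_{3,3} & \dots & x_{3,n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ x_{m,1} & x_{m,2} & x_{m,3} & \dots & x_{m,n} \end{bmatrix}}_{X} \underbrace{\begin{bmatrix} b_{1,1} & b_{1,2} & b_{1,3} \\ b_{2,1} & b_{2,2} & b_{2,3} \\ b_{3,1} & b_{3,2} & b_{3,3} \\ \vdots & \vdots & \vdots \\ b_{n,1} & b_{n,2} & b_{n,3} \end{bmatrix}}_{B} \underbrace{\begin{bmatrix} 0 & 1 & 1 & \dots & 0 \\ 1 & 0 & 0 & \dots & 0 \\ 0 & 0 & 0 & \dots & 1 \end{bmatrix}}_{A}$$
• i.e. for $X \in \mathbb{R}^{m \times n}$, $B \in \mathbb{R}^{n \times 3}$, and $A \in \mathbb{R}^{3 \times n}$ as above, the product
$$XBA = MA$$
realizes an assignment
$$x_i \to m_j, \quad \text{where } m_j = X b_j$$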
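The slide leaves the entries of B implicit. One consistent choice, assumed here, sets $b_{i,j} = 1/|C_j|$ when data point $x_i$ belongs to cluster $C_j$ (and 0 otherwise), so that $m_j = X b_j$ is the cluster mean. A small NumPy sketch with hand-picked data and labels:

```python
import numpy as np

X = np.array([[0.0, 1.0, 9.0, 10.0, 5.0],
              [0.0, 1.0, 9.0, 10.0, 20.0]])  # m = 2 features, n = 5 points (columns)
labels = np.array([0, 0, 1, 1, 2])            # cluster index of each column

n, k = X.shape[1], 3
A = np.zeros((k, n))
A[labels, np.arange(n)] = 1.0   # A: k x n, exactly one 1 per column (assignment)
B = A.T / A.sum(axis=1)         # B: n x k, column j = indicator of C_j divided by |C_j|
M = X @ B                       # m x k, column j is the mean m_j = X b_j
approx = X @ B @ A              # = M A: every point replaced by its cluster mean
```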
Example: K-means
[Figure: an image approximated by a weighted sum of cluster means with weights 0.0, 0.0, …, 1.0, …, 0.0]
• Similar images are grouped into k groups
• Approximate data by mapping each data point onto the mean of its cluster
Python Matrix Factorization Toolbox (PyMF) [2]
• Started in 2010 at Fraunhofer IAIS/University of Bonn
• Vast number of different methods!
• Supports hdf5/h5py and sparse matrices
How to factorize a data matrix V:
>>> import pymf
>>> import numpy as np
>>> data = np.array([[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]])  # columns are data points
>>> mdl = pymf.kmeans.Kmeans(data, num_bases=2)
>>> mdl.factorize(niter=10)  # optimize for W and H
>>> V_approx = np.dot(mdl.W, mdl.H)  # V ≈ WH
[2] http://github.com/cthurau/pymf
Python Matrix Factorization Toolbox (PyMF)
• Restarted development a few weeks back ;)
• Looking for contributors!
How to map data onto W:
>>> import pymf
>>> import numpy as np
>>> test_data = np.array([[1.0], [0.3]])
>>> mdl_test = pymf.kmeans.Kmeans(test_data, num_bases=2)
>>> mdl_test.W = mdl.W  # mdl.W -> existing basis W
>>> mdl_test.factorize(compute_w=False)  # keep W fixed, compute H only
>>> test_data_approx = np.dot(mdl.W, mdl_test.H)
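How PyMF computes H internally when W is held fixed is not shown here. For k-means, the natural meaning of "map data onto W" is nearest-centroid assignment, sketched below in plain NumPy as an assumption. The helper `kmeans_h_for_fixed_w` and the centroid values in `W` are hypothetical stand-ins for a trained `mdl.W`.

```python
import numpy as np

def kmeans_h_for_fixed_w(W, V_new):
    """Binary H for new data given fixed centroids W; W itself is not updated."""
    d = ((V_new[:, None, :] - W[:, :, None]) ** 2).sum(axis=0)  # (k, n_new) squared distances
    H = np.zeros_like(d)
    H[d.argmin(axis=0), np.arange(V_new.shape[1])] = 1.0        # one 1 per column
    return H

W = np.array([[1.0, 0.0],
              [0.5, 1.0]])             # hypothetical trained centroids (2 features, k = 2)
test_data = np.array([[1.0], [0.3]])
H_test = kmeans_h_for_fixed_w(W, test_data)
test_data_approx = W @ H_test          # the test point replaced by its nearest centroid
```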
Principal Component Analysis (PCA) [3]
• SVD/PCA are baseline matrix factorization methods
• Optimize:
$$\min_{W,H} \lVert V - WH \rVert^2 \quad \text{s.t.} \quad W^T W = I$$
• Restrict W to singular vectors of V (orthonormal columns)
• Can (and usually does) violate non-negativity
• Goal: best possible matrix approximation for a given k
• Great for compression or filtering out noise!
[3] K. Pearson, "On Lines and Planes of Closest Fit to Systems of Points in Space", Philosophical Magazine, 1901.
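A minimal NumPy sketch of the objective above via the truncated SVD: W takes the first k left singular vectors (so $W^T W = I$) and H = Wᵀ V. It also checks the Eckart-Young result behind "best possible approximation for a given k": the residual norm equals the root of the sum of the squared discarded singular values. PCA proper would first subtract the mean of each row (feature); the sketch skips centering to match the objective as written, and `truncated_svd_approx` is a hypothetical name.

```python
import numpy as np

def truncated_svd_approx(V, k):
    """Best rank-k approximation of V in the Frobenius norm, via the SVD."""
    U, s, Vt = np.linalg.svd(V, full_matrices=False)
    W = U[:, :k]   # m x k, orthonormal columns: W^T W = I
    H = W.T @ V    # k x n coefficients, equal to diag(s[:k]) @ Vt[:k]
    return W, H, s

rng = np.random.default_rng(0)
V = rng.standard_normal((6, 8))
k = 2
W, H, s = truncated_svd_approx(V, k)
err = np.linalg.norm(V - W @ H)          # Frobenius norm of the residual
expected = np.sqrt(np.sum(s[k:] ** 2))   # Eckart-Young: discarded singular values
```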
Example PCA
>>> from pymf.pca import PCA
>>> import numpy as np
>>> mdl = PCA(data, num_bases=2)
>>> mdl.factorize()
>>> V_approx = np.dot(mdl.W, mdl.H)
• Usefulness for data analysis is questionable
• Basis vectors are usually not interpretable
[Figure: V ≈ V_approx, with the basis images in W]
Non-negative Matrix Factorization [4]
• For V ≥ 0, a constrained quadratic optimization problem:
$$\min_{W,H} \lVert V - WH \rVert^2 \quad \text{s.t.} \quad W \ge 0, \quad H \ge 0$$
• A globally optimal solution provably exists, but algorithms guaranteed to find it remain elusive; exact NMF is NP-hard
• Often W converges to parts-based representations
• Active area of research
• Goal: reconstruct data from independent parts
[4] D.D. Lee and H.S. Seung, "Learning the Parts of Objects by Non-Negative Matrix Factorization", Nature, 401(6755), 1999.
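The slide states the problem but not a solver. A minimal sketch of the multiplicative update rules for the squared-error objective, commonly attributed to Lee and Seung, in plain NumPy (not PyMF's code; the name `nmf_multiplicative` and the small `eps` guard against division by zero are assumptions). Each update multiplies by a non-negative ratio, so W and H stay non-negative; the result is a local optimum, consistent with the remark that exact NMF is NP-hard.

```python
import numpy as np

def nmf_multiplicative(V, k, n_iter=200, seed=0, eps=1e-10):
    """NMF for V >= 0 via multiplicative updates on ||V - WH||^2."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps
    errors = []
    for _ in range(n_iter):
        # element-wise multiplicative updates; non-negative factors stay non-negative
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        errors.append(np.linalg.norm(V - W @ H))
    return W, H, errors
```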
Example NMF
>>>from pymf.nmf import NMF
>>>import numpy as np
>>> mdl = NMF(data, num_bases=2)
>>> mdl.factorize(niter=50)
>>> V_approx = np.dot(mdl.W, mdl.H)
• Additive combination of parts
• Interesting options for data analysis
[Figure: V ≈ V_approx, with the basis images in W]
Archetypal Analysis [5]
• Convexity-constrained quadratic optimization problem:
$$\min_{W,H} \lVert V - VWH \rVert^2 \quad \text{s.t.} \quad w_{l,i} \ge 0, \ \sum_{l} w_{l,i} = 1, \quad h_{k,i} \ge 0, \ \sum_{k} h_{k,i} = 1$$
• Reconstruct data by its archetypes, i.e. convex combinations of polar opposites
• Yields novel and intuitive insights into data
• Great for interpretable data representations!
• $O(n^2)$, but efficient approximations for large data exist
[5] A. Cutler and L. Breiman, "Archetypal Analysis", Technometrics, 36(4), 1994.
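To show how the two convexity constraints act, here is a naive sketch that minimizes the objective above by alternating projected gradient steps, projecting every column of W and H onto the probability simplex. This solver is a swapped-in technique chosen for brevity; it is not the alternating constrained least squares procedure of Cutler and Breiman, nor PyMF's AA implementation, and all function names are hypothetical. Step sizes are 1/L for each block's Lipschitz constant, so the error never increases.

```python
import numpy as np

def project_to_simplex(y):
    """Euclidean projection of a vector onto {x : x >= 0, sum(x) = 1}."""
    u = np.sort(y)[::-1]
    css = np.cumsum(u) - 1.0
    idx = np.arange(1, len(y) + 1)
    rho = idx[u - css / idx > 0][-1]
    theta = css[rho - 1] / rho
    return np.maximum(y - theta, 0.0)

def project_columns(M):
    return np.column_stack([project_to_simplex(M[:, j]) for j in range(M.shape[1])])

def archetypal_analysis(V, k, n_iter=300, seed=0):
    """Alternating projected gradient on ||V - V W H||^2 (hypothetical helper).

    V: (m, n) data; W: (n, k) so that V W holds the k archetypes;
    H: (k, n) convex weights reconstructing each data point from the archetypes.
    """
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = project_columns(rng.random((n, k)))
    H = project_columns(rng.random((k, n)))
    VtV = V.T @ V
    errors = [np.linalg.norm(V - V @ W @ H) ** 2]
    for _ in range(n_iter):
        # H-step: gradient of ||V - Z H||^2 is -2 Z^T (V - Z H), with Z = V W
        Z = V @ W
        L_h = 2.0 * np.linalg.norm(Z.T @ Z, 2) + 1e-12
        H = project_columns(H + (2.0 / L_h) * (Z.T @ (V - Z @ H)))
        # W-step: gradient w.r.t. W is -2 V^T (V - V W H) H^T
        L_w = 2.0 * np.linalg.norm(VtV, 2) * np.linalg.norm(H @ H.T, 2) + 1e-12
        R = V - V @ W @ H
        W = project_columns(W + (2.0 / L_w) * (V.T @ R @ H.T))
        errors.append(np.linalg.norm(V - V @ W @ H) ** 2)
    return W, H, errors
```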
Example Archetypal Analysis
>>>from pymf.aa import AA
>>>import numpy as np
>>> mdl = AA(data, num_bases=2)
>>> mdl.factorize(niter=50)
>>> V_approx = np.dot(mdl.W, mdl.H)
• Existing data points as basis vectors
• Convex combination allows a probabilistic interpretation
[Figure: V ≈ V_approx, with the basis images in W]
Method Summary
• Common form: V = WH (or V = VWH)

| Method | W constraint | H constraint | Outcome |
|---|---|---|---|
| PCA | $W^T W = I$ | - | compressed V |
| K-means | - | $h_{k,i} \in \{0,1\}$, $\sum_k h_{k,i} = 1$ | groups |
| NMF | $W \ge 0$ | $H \ge 0$ | parts |
| AA | $W \ge 0$, $\sum_l w_{l,i} = 1$ | $H \ge 0$, $\sum_k h_{k,i} = 1$ | opposites |

• Doesn't only work for images ;)
• More complex constraints usually result in more complex solvers
• An active area of research deals with approximations for large data
Large matrices: PyMF and h5py
>>> import h5py
>>> import numpy as np
>>> from pymf.sivm import SIVM  # uses [6]
>>> h5file = h5py.File('myfile.hdf5', 'w')
>>> h5file['dataset'] = np.random.random((100, 1000))
>>> h5file['W'] = np.random.random((100, 10))
>>> h5file['H'] = np.random.random((10, 1000))
>>> sivm_mdl = SIVM(h5file['dataset'], num_bases=10)
>>> sivm_mdl.W = h5file['W']
>>> sivm_mdl.H = h5file['H']
>>> sivm_mdl.factorize()
>>> h5file.close()
[6] Thurau, Kersting, and Bauckhage, "Simplex Volume Maximization for Descriptive Web-Scale Matrix Factorization", CIKM 2010.
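When V is too large for memory, even checking the fit should avoid forming WH in full. A minimal sketch that accumulates $\lVert V - WH \rVert^2$ over column blocks; `chunked_sq_error` is a hypothetical helper, and the assumption that V and H may be h5py datasets rests only on h5py supporting NumPy-style slicing, which reads just the requested block.

```python
import numpy as np

def chunked_sq_error(V, W, H, chunk=256):
    """||V - W H||_F^2 accumulated over column blocks (hypothetical helper).

    V and H may be any 2-D arrays supporting NumPy-style slicing
    (in-memory arrays, or assumed here: h5py datasets).
    """
    W = np.asarray(W)  # m x k, assumed small enough to hold in memory
    n = V.shape[1]
    total = 0.0
    for start in range(0, n, chunk):
        stop = min(start + chunk, n)
        V_blk = np.asarray(V[:, start:stop])  # load only these columns
        H_blk = np.asarray(H[:, start:stop])
        total += float(np.sum((V_blk - W @ H_blk) ** 2))
    return total
```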
Take Home Message
• Most clustering and data analysis methods are matrix approximations
• Imposed constraints shape the factorization
• Imposed constraints yield different views on data
• One of the most effective and versatile tools for data exploration!
• Python implementation → http://github.com/cthurau/pymf
Thank you for your attention!