Post on 22-Aug-2020
Provable Deterministic Leverage Score Sampling
Dimitris Papailiopoulos (UC Berkeley)
Anastasios Kyrillidis (EPFL)
Christos Boutsidis (Yahoo Labs)
KDD
New York, New York
August 27th, 2014
Singular Value Decomposition
m × n matrix A
k < ρ = rank(A)
Low-rank matrix approximation problem:
$$\min_{X \in \mathbb{R}^{m \times n},\ \mathrm{rank}(X) \le k} \|A - X\|_F$$
Singular Value Decomposition (SVD):
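$$A = U \Sigma V^T = \underbrace{\begin{pmatrix} U_k & U_{\rho-k} \end{pmatrix}}_{m \times \rho} \underbrace{\begin{pmatrix} \Sigma_k & 0 \\ 0 & \Sigma_{\rho-k} \end{pmatrix}}_{\rho \times \rho} \underbrace{\begin{pmatrix} V_k^T \\ V_{\rho-k}^T \end{pmatrix}}_{\rho \times n}$$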
Uk ∈ Rm×k , Σk ∈ Rk×k , and Vk ∈ Rn×k
Solution via Eckart-Young Theorem
$$A_k = U_k \Sigma_k V_k^T = A V_k V_k^T$$
Computed in O(mn min{m, n}) time.
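As a minimal sketch of the Eckart-Young solution, assuming NumPy and a small random test matrix (the function name `best_rank_k` and the matrix sizes are illustrative, not from the talk), the best rank-k approximation can be formed from a thin SVD:

```python
import numpy as np

def best_rank_k(A, k):
    """Return A_k = U_k Sigma_k V_k^T, V_k, and all singular values of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    Vk = Vt[:k].T                        # n x k, top-k right singular vectors
    Ak = (U[:, :k] * s[:k]) @ Vt[:k]     # scale columns of U_k by Sigma_k
    return Ak, Vk, s

# Illustrative data: a 20 x 30 Gaussian matrix (an assumption for the demo).
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 30))
k = 5
Ak, Vk, s = best_rank_k(A, k)
proj = A @ Vk @ Vk.T                     # the equivalent form A V_k V_k^T
err = np.linalg.norm(A - Ak, "fro")      # Frobenius error of the rank-k fit
```

The two expressions for A_k agree up to rounding, and the Frobenius error equals the square root of the sum of the squared discarded singular values.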
The Column Subset Selection Problem (CSSP)
Definition
Let $A \in \mathbb{R}^{m \times n}$ and let c < n be a sampling parameter. Find c columns of A, denoted as $C \in \mathbb{R}^{m \times c}$, that minimize
$$\|A - CC^\dagger A\|_F \quad \text{or} \quad \|A - CC^\dagger A\|_2,$$
where C† denotes the Moore-Penrose pseudo-inverse.
CSSP gives a low-rank matrix factorization of A (with $X = C^\dagger A$):
$$A = C X + E$$
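The CSSP objective can be evaluated directly for any candidate column set. The sketch below assumes NumPy; the helper name `cssp_error` and the column indices are hypothetical choices for illustration:

```python
import numpy as np

def cssp_error(A, cols, norm="fro"):
    """Residual norm ||A - C C^+ A|| for the columns of A indexed by cols."""
    C = A[:, cols]                       # m x c matrix of selected columns
    X = np.linalg.pinv(C) @ A            # X = C^+ A, shape c x n
    E = A - C @ X                        # residual of the factorization A = CX + E
    return np.linalg.norm(E, norm)       # norm="fro" or norm=2 (spectral)

# Illustrative data: an 8 x 12 Gaussian matrix (an assumption for the demo).
rng = np.random.default_rng(1)
A = rng.standard_normal((8, 12))
err_3 = cssp_error(A, [0, 3, 7])                  # three columns, Frobenius norm
err_8 = cssp_error(A, list(range(8)), norm=2)     # eight columns span R^8 here
```

Once the selected columns span the column space of A, the residual vanishes up to rounding, which is why the interesting regime is c well below rank(A).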
Motivation
Consider applying this to date-by-stock matrices.
CSSP then returns the most important stocks in the portfolio.
Interpretable matrix decompositions in general.
Prior work on CSSP
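#   c                            $\|A - CC^\dagger A\|_F^2 \le$                 Running time
1   $k/\varepsilon^2$            $\|A - A_k\|_F^2 + \varepsilon\|A\|_F^2$       $\mathrm{nnz}(A)$
2   $(k \log k)/\varepsilon^2$   $(1+\varepsilon)\|A - A_k\|_F^2$               $mn^2$
3   $(k \log k)/\varepsilon^2$   $(1+\varepsilon)\|A - A_k\|_F^2$               $mnk^2 \log k$
4   $k/\varepsilon$              $(1+\varepsilon)\|A - A_k\|_F^2$               $mnk/\varepsilon$
5   $k/\varepsilon$              $(1+\varepsilon)\|A - A_k\|_F^2$               $m^3 nk/\varepsilon$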
References:
1 Frieze, Kannan, Vempala. FOCS, 2003.
2 Drineas, Mahoney, Muthukrishnan. RANDOM, 2006.
3 Deshpande, Rademacher, Vempala, Wang. SODA, 2006.
4 Boutsidis, Drineas, Magdon-Ismail. FOCS, 2011.
5 Guruswami, Sinop. SODA, 2012.
There are more results in the linear algebra literature focusing on the spectral norm version of the CSSP.
Leverage scores and randomized sampling
Drineas, Mahoney, and Muthukrishnan. RANDOM, 2006.
Definition
[Leverage scores] Let $V_k \in \mathbb{R}^{n \times k}$ contain the top k right singular vectors of an m × n matrix A with rank ρ = rank(A) ≥ k. Then, the (rank-k) leverage score of the i-th column of A is defined as
$$\ell_i^{(k)} = \|[V_k]_{i,:}\|_2^2, \quad i = 1, 2, \ldots, n.$$
For a target rank k < rank(A), define a probability distribution over the columns of A: $p_i = \ell_i^{(k)}/k$.
In c independent and identically distributed passes, sample with replacement c columns from A.
For $c = O(k \log k/\varepsilon^2)$ and with constant probability:
$$\|A - CC^\dagger A\|_F \le (1 + \varepsilon)\,\|A - A_k\|_F.$$
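The randomized scheme can be sketched as follows, assuming NumPy; the function names, the seed, and the matrix sizes are illustrative assumptions, and the sketch only draws the columns without reproducing any rescaling used in the cited analysis:

```python
import numpy as np

def leverage_scores(A, k):
    """Rank-k leverage scores: squared Euclidean norms of the rows of V_k."""
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    return np.sum(Vt[:k] ** 2, axis=0)   # length n; the scores sum to k

def sample_columns(A, k, c, rng):
    """Draw c column indices i.i.d. with replacement, with p_i = l_i / k."""
    p = leverage_scores(A, k) / k
    p = p / p.sum()                      # renormalize to absorb rounding error
    return rng.choice(A.shape[1], size=c, replace=True, p=p)

# Illustrative data: a 15 x 40 Gaussian matrix (an assumption for the demo).
rng = np.random.default_rng(2)
A = rng.standard_normal((15, 40))
scores = leverage_scores(A, k=4)
idx = sample_columns(A, k=4, c=10, rng=rng)
C = A[:, idx]                            # sampled columns; repeats are possible
```

Because V_k has orthonormal columns, every score lies in [0, 1] and the scores sum to k, so dividing by k yields a valid probability distribution.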
Deterministic leverage score sampling [Jolliffe, 1972]
Compute the leverage scores of A w.r.t. some k .
Pick the c columns with the largest leverage scores.
Nice empirical results.
No theoretical analysis.
Contribution of this talk: theoretical analysis of deterministic leverage score sampling.
Deterministic leverage score sampling [revisited]
Input: $A \in \mathbb{R}^{m \times n}$, k, θ (0 < θ < k)
- Compute $V_k \in \mathbb{R}^{n \times k}$ (via SVD).
- Compute the leverage scores:
  for i = 1, 2, ..., n
    $\ell_i^{(k)} = \|[V_k]_{i,:}\|_2^2$
  end for
Without loss of generality, let the $\ell_i^{(k)}$'s be sorted:
$$\ell_1^{(k)} \ge \cdots \ge \ell_i^{(k)} \ge \ell_{i+1}^{(k)} \ge \cdots \ge \ell_n^{(k)}.$$
Find the index c ∈ {1, ..., n} such that:
$$c = \arg\min_{c} \left( \sum_{i=1}^{c} \ell_i^{(k)} > \theta \right).$$
If c < k, set c = k.
Output: $C \in \mathbb{R}^{m \times c}$ containing the first c columns of A.
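The thresholding rule above can be sketched in a few lines, assuming NumPy. The function name, the test matrix, and the choice θ = k − 0.5 are illustrative assumptions; instead of physically sorting the columns of A, the sketch returns the indices of the selected columns:

```python
import numpy as np

def deterministic_leverage_sampling(A, k, theta):
    """Indices of the top-scoring columns whose scores sum to more than theta."""
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    scores = np.sum(Vt[:k] ** 2, axis=0)          # leverage scores, sum to k
    order = np.argsort(-scores, kind="stable")    # columns by decreasing score
    cumulative = np.cumsum(scores[order])
    # Number of entries <= theta, plus one: smallest c with cumulative sum > theta.
    c = int(np.searchsorted(cumulative, theta, side="right")) + 1
    c = min(max(c, k), A.shape[1])                # enforce c >= k, cap at n
    return order[:c]

# Illustrative data: a 20 x 50 Gaussian matrix (an assumption for the demo).
rng = np.random.default_rng(3)
A = rng.standard_normal((20, 50))
k, eps = 3, 0.5
theta = k - eps
cols = deterministic_leverage_sampling(A, k, theta)
C = A[:, cols]                                    # the selected columns of A
```

No randomness enters the selection itself: for a fixed A, k, and θ the same columns are always returned.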
Main result
Theorem
Let θ = k − ε for some ε ∈ (0, 1). Then, for ξ ∈ {2, F}, we have
$$\|A - CC^\dagger A\|_\xi^2 < (1 + \varepsilon) \cdot \|A - A_k\|_\xi^2.$$
Weak result if the leverage scores are almost uniform.
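A short worked case shows why near-uniform scores make the result weak. Suppose, purely for illustration, that all n leverage scores are exactly equal; since the scores sum to k, each equals k/n, and the stopping rule with θ = k − ε becomes:

```latex
\[
\sum_{i=1}^{c} \ell_i^{(k)} = \frac{c\,k}{n} > k - \varepsilon
\quad\Longleftrightarrow\quad
c > n\left(1 - \frac{\varepsilon}{k}\right).
\]
```

With the illustrative values n = 1000, k = 10, and ε = 0.5, this requires c > 950, so almost every column of A must be kept.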
Main result: leverage scores following a power law
Theorem
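Let the leverage scores follow a power-law decay with exponent $\alpha_k = 1 + \eta$, for η > 0:
$$\ell_i^{(k)} = \frac{\ell_1^{(k)}}{i^{\alpha_k}}.$$
Let θ = k − ε. Then,
$$c = \left(\frac{2k}{\varepsilon}\right)^{\frac{1}{1+\eta}}$$
and
$$\|A - CC^\dagger A\|_\xi^2 < (1 + \varepsilon) \cdot \|A - A_k\|_\xi^2.$$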
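To get a feel for the column count in the power-law case, the expression for c stated on this slide can simply be evaluated. The helper name and the parameter values below are illustrative assumptions, and rounding up to an integer is a choice made here rather than something the slide specifies:

```python
import math

def columns_for_power_law(k, eps, eta):
    """Evaluate c = (2k / eps)^(1 / (1 + eta)), rounded up to an integer."""
    return math.ceil((2 * k / eps) ** (1.0 / (1.0 + eta)))

# Illustrative parameters: k = 10 and eps = 0.5, so 2k / eps = 40.
c_fast = columns_for_power_law(k=10, eps=0.5, eta=0.5)   # 40^(2/3) ~ 11.7, so 12
c_slow = columns_for_power_law(k=10, eps=0.5, eta=0.1)   # slower decay, more columns
```

A faster decay (larger η) shrinks the exponent 1/(1 + η), so fewer columns suffice for the same k and ε.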
Is power law a realistic assumption?
Test leverage scores of large graphs.
Show leverage scores follow power law decays.
Power law is a realistic assumption
[Figure: decay of the sorted leverage scores for k = 10, one panel per dataset. Horizontal axis: index of the sorted leverage score, 1 to 1000. Vertical axis: leverage score, logarithmic scale. Fitted exponent $\alpha_{10}$ per panel: amazon 1.45, citeseer 1.5, foursquare 1.7, github 1.13, gnutella 2, (panel title missing) 1.6, gowalla 0.9, livejournal 0.2, slashdot 0.9, nips 1.6, skitter 0.2, slice 0.12, cora 1.58, writers 4, youtube groups 1.75, youtube 0.5.]
k = 10.
The decay of the leverage scores is shown on a logarithmic scale.
Each panel also plots a fitted power-law curve $\beta \cdot x^{-\alpha_k}$.
True leverage scores are plotted with a red × marker.
The fitted curves are denoted with a solid blue line.
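The slides do not say how the curves $\beta \cdot x^{-\alpha_k}$ were fitted. One common option, shown here only as an assumed sketch, is an ordinary least-squares line in log-log coordinates; the function name and the synthetic data are hypothetical:

```python
import numpy as np

def fit_power_law(scores):
    """Fit score ~ beta * rank^(-alpha) by least squares on log-log axes."""
    y = np.sort(np.asarray(scores, dtype=float))[::-1]   # decreasing order
    x = np.arange(1, y.size + 1, dtype=float)            # ranks 1..n
    keep = y > 0                                         # log needs positive values
    slope, intercept = np.polyfit(np.log(x[keep]), np.log(y[keep]), deg=1)
    return -slope, np.exp(intercept)                     # (alpha, beta)

# Synthetic scores that follow an exact power law (an assumption for the demo).
ranks = np.arange(1, 1001, dtype=float)
synthetic = 0.8 * ranks ** -1.5
alpha, beta = fit_power_law(synthetic)
```

On exact power-law data the fit recovers the exponent and the scale; on real leverage scores the result depends on how many of the sorted scores are included in the fit.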
Power-law decaying leverage scores
[Figure: relative error ratio $\|A - CC^\dagger A\|_2^2 / \|A - A_k\|_2^2$ versus the number of selected columns c. Two rows of four panels: rows $\alpha_k = 0.5$ and $\alpha_k = 1.5$; columns k = 5, 10, 50, 100. Values of c annotated in the panels: 10, 38, 97, 152 (first row) and 7, 11, 88, 129 (second row).]
m = 200, n = 1000.
k = 5, 10, 50, 100.
c = 1, 2, ..., 1000.
$\alpha_k = 0.5$ and $\alpha_k = 1.5$.
The blue curve is the relative error ratio $\|A - CC^\dagger A\|_2^2 / \|A - A_k\|_2^2$.
The vertical cyan line corresponds to the point where k = c.
The vertical magenta line indicates the point where the c sampled columns offer a better approximation compared to the best rank-k matrix $A_k$.
Nearly-uniform leverage scores
[Figure: relative error ratio $\|A - CC^\dagger A\|_2^2 / \|A - A_k\|_2^2$ versus the number of selected columns c, four panels for k = 5, 10, 50, 100. Values of c annotated in the panels: 473, 404, 629, 630.]
m = 200, n = 1000.
k = 5, 10, 50, 100.
c = 1, 2, ..., 1000.
The blue curve is the relative error ratio $\|A - CC^\dagger A\|_2^2 / \|A - A_k\|_2^2$.
The leftmost vertical cyan line corresponds to the point where k = c.
The rightmost vertical magenta line indicates the point where the c sampled columns offer as good an approximation as that of the best rank-k matrix $A_k$.
Conclusions
The Column Subset Selection Problem
approach: sampling w.r.t. the leverage scores.
Randomized leverage score sampling
theory: strong results [Drineas et al., 2008].
practice: strong performance.
Deterministic leverage score sampling
theory: good performance if the leverage scores follow a power-law decay.
practice: many real data sets exhibit leverage scores with power-law decays.