Principal component analysis and matrix factorizations for learning (part 3)
Chris Ding, ICML 2005 Tutorial

Transcript of "PCA & Matrix Factorization for Learning", ICML 2005 Tutorial, Chris Ding (part 3, slides 100–122).

Page 1 (slide 100)

Part 3. Nonnegative Matrix Factorization ⇔ K-means and Spectral Clustering

Page 2 (slide 101)

Nonnegative Matrix Factorization (NMF)

Data matrix, n points in p dimensions:  X = (x_1, x_2, …, x_n)

Decomposition (low-rank approximation):  X ≈ F G^T

Nonnegative matrices:  X_{ij} ≥ 0,  F_{ij} ≥ 0,  G_{ij} ≥ 0

F = (f_1, f_2, …, f_k),   G = (g_1, g_2, …, g_k)

Each x_i is an image, document, webpage, etc.
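To make the shapes concrete, here is a minimal numpy sketch; the names X, F, G follow the slides, while the sizes (n = 6, p = 4, k = 2) are made up for illustration:

```python
import numpy as np

# Illustrative sizes only: n = 6 points in p = 4 dimensions, k = 2 components.
rng = np.random.default_rng(0)
F = rng.random((4, 2))   # p x k nonnegative basis, F = (f_1, ..., f_k)
G = rng.random((6, 2))   # n x k nonnegative coefficients, G = (g_1, ..., g_k)
X = F @ G.T              # p x n data matrix built exactly as F G^T

assert X.shape == (4, 6)
assert (X >= 0).all()    # a product of nonnegative factors is nonnegative
```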

Page 3 (slide 102)

Some historical notes

• Earlier work by statisticians
• P. Paatero (1994), Environmetrics
• Lee and Seung (1999, 2000)
  – Parts of a whole (no cancellation)
  – A multiplicative update algorithm

Page 4 (slide 103)

[Figure: a grayscale image flattened into a single column of nonnegative intensities — the "pixel vector".]

Page 5 (slide 104)

F = (f_1, f_2, …, f_k),   G = (g_1, g_2, …, g_k)

X ≈ F G^T,   X = (x_1, x_2, …, x_n)

Parts-based perspective

Page 6 (slide 105)

X ≈ F G^T,   F = (f_1, f_2, …, f_k)

Sparsify F to get a parts-based picture

(Li et al., 2001; Hoyer, 2003)

Page 7 (slide 106)

Theorem. NMF = kernel K-means clustering

NMF produces a holistic modeling of the data.
Theoretical results and experimental verification.

(Ding, He, Simon, 2005)

Page 8 (slide 107)

Our Results: NMF = Data Clustering

Page 9 (slide 108)

Our Results: NMF = Data Clustering

Page 10 (slide 109)

Theorem: K-means = NMF

• Reformulate K-means and Kernel K-means
• Show equivalence

max_{H^T H = I, H ≥ 0} Tr(H^T W H)   ⇔   min_{H^T H = I, H ≥ 0} ||W − H H^T||^2

Page 11 (slide 110)

NMF = K-means

H = argmin_{H^T H = I, H ≥ 0} ||W − H H^T||^2
  = argmin_{H^T H = I, H ≥ 0} [ ||W||^2 − 2 Tr(H^T W H) + ||H^T H||^2 ]
  = argmin_{H^T H = I, H ≥ 0} [ −2 Tr(H^T W H) ]        (||W||^2 is fixed and ||H^T H||^2 = ||I||^2 = k)
  = argmax_{H^T H = I, H ≥ 0} Tr(H^T W H)

Relaxing the orthogonality constraint:   ⇒ argmin_{H ≥ 0} ||W − H H^T||^2
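The algebraic expansion in the second line above is easy to check numerically. A small sketch (the nonnegativity constraint plays no role in the identity itself, so it is ignored here):

```python
import numpy as np

# Check: for H with orthonormal columns and symmetric W,
# ||W - H H^T||^2 = ||W||^2 - 2 Tr(H^T W H) + ||H^T H||^2.
rng = np.random.default_rng(1)
A = rng.random((5, 5))
W = A + A.T                                # symmetric "similarity" matrix
H, _ = np.linalg.qr(rng.random((5, 2)))    # H^T H = I by construction

lhs = np.linalg.norm(W - H @ H.T, "fro") ** 2
rhs = (np.linalg.norm(W, "fro") ** 2
       - 2 * np.trace(H.T @ W @ H)
       + np.linalg.norm(H.T @ H, "fro") ** 2)
assert np.isclose(lhs, rhs)
```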

Page 12 (slide 111)

Spectral Clustering = NMF

Normalized Cut:
J_Ncut = Σ_{k<l} [ s(C_k, C_l) / d_{C_k} + s(C_k, C_l) / d_{C_l} ] = Σ_k s(C_k, G − C_k) / d_{C_k}

Unsigned cluster indicators:
y_k = D^{1/2} (0, …, 0, 1, …, 1, 0, …, 0)^T / ||D^{1/2} h_k||

Re-write:
J_Ncut(y_1, …, y_k) = h_1^T (D − W) h_1 / (h_1^T D h_1) + … + h_k^T (D − W) h_k / (h_k^T D h_k)
                    = y_1^T (I − W~) y_1 + … + y_k^T (I − W~) y_k
                    = Tr( Y^T (I − W~) Y ),   where W~ = D^{−1/2} W D^{−1/2}

(Gu et al., 2001)

Optimize:  max Tr(Y^T W~ Y)  subject to  Y^T Y = I   ⇒   min_{H^T H = I, H ≥ 0} ||W~ − H H^T||^2
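Forming the scaled similarity W~ = D^{-1/2} W D^{-1/2} takes two lines of numpy. A minimal sketch on a random symmetric nonnegative similarity matrix (the eigenvalue check is a standard property of this scaling, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.random((5, 5))
W = (A + A.T) / 2                      # symmetric nonnegative similarity
d = W.sum(axis=1)                      # degrees, D = diag(d)
D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
W_tilde = D_inv_sqrt @ W @ D_inv_sqrt  # W~ = D^{-1/2} W D^{-1/2}

# Sanity check: W~ has top eigenvalue 1, with eigenvector D^{1/2} 1.
evals = np.linalg.eigvalsh(W_tilde)
assert np.isclose(evals.max(), 1.0)
```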

Page 13 (slide 112)

Advantages of NMF over standard K-means

Soft clustering

Page 14 (slide 113)

Experiments on Internet Newsgroups

NG2: comp.graphics
NG9: rec.motorcycles
NG10: rec.sport.baseball
NG15: sci.space
NG18: talk.politics.mideast

100 articles from each group; 1000 words; tf.idf weighting; cosine similarity.

Accuracy of clustering results (cosine similarity):

W = HH'   K-means
0.711     0.697
0.652     0.632
0.608     0.576
0.590     0.491
0.612     0.531

Page 15 (slide 114)

Summary for Symmetric NMF

• K-means, Kernel K-means
• Spectral clustering
• Equivalence to:

max_{H^T H = I, H ≥ 0} Tr(H^T W H)   ⇔   min_{H^T H = I, H ≥ 0} ||W − H H^T||^2

Page 16 (slide 115)

Nonsymmetric NMF

• K-means, Kernel K-means
• Spectral clustering
• Equivalence to:

max_{H^T H = I, H ≥ 0} Tr(H^T W H)   ⇔   min_{H^T H = I, H ≥ 0} ||W − H H^T||^2

Page 17 (slide 116)

Non-symmetric NMF: Rectangular Data Matrix ⇔ Bipartite Graph

• Information retrieval: word-to-document matrices
• DNA gene expression
• Image pixels
• Supermarket transaction data

Page 18 (slide 117)

K-means Clustering of Bipartite Graphs

J^{CR}_{Kmeans} = Σ_{j=1}^{k} s(R_j, C_j) = Tr(F^T B G)

Simultaneous clustering of rows and columns

Row indicators:     F = (f_1, f_2, f_3) — binary cluster-membership columns, e.g. (1, 1, 0, 0)^T, (0, 0, 1, 1)^T
Column indicators:  G = (g_1, g_2, g_3) — binary cluster-membership columns, e.g. (1, 0, 1, 0)^T, (0, 1, 0, 1)^T

Page 19 (slide 118)

NMF = K-means clustering

argmax_{F^T F = I, F ≥ 0; G^T G = I, G ≥ 0} Tr(F^T B G)
  = argmin_{F^T F = I, F ≥ 0; G^T G = I, G ≥ 0} [ −2 Tr(F^T B G) ]
  = argmin_{F^T F = I, F ≥ 0; G^T G = I, G ≥ 0} [ ||B||^2 − 2 Tr(F^T B G) + Tr(F^T F G^T G) ]
  = argmin_{F^T F = I, F ≥ 0; G^T G = I, G ≥ 0} ||B − F G^T||^2

Page 20 (slide 119)

Solving NMF with nonnegative least squares

J = ||X − F G^T||^2,   F ≥ 0, G ≥ 0

Fix F, solve for G; fix G, solve for F.
Iterate; the procedure converges to a local minimum.

J = ||X − F G^T||^2 = Σ_{i=1}^{n} ||x_i − F g̃_i||^2,   where G^T = (g̃_1, …, g̃_n)

so each g̃_i is the solution of an independent nonnegative least-squares problem.
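The alternating scheme above can be sketched with scipy.optimize.nnls solving one column at a time. This is an illustration under made-up sizes, not the tutorial's code:

```python
import numpy as np
from scipy.optimize import nnls

# Alternating nonnegative least squares for X ~ F G^T:
# fix F, solve for G; fix G, solve for F.
rng = np.random.default_rng(3)
X = rng.random((6, 8))        # p x n nonnegative data
k = 3
F = rng.random((6, k))
G = rng.random((8, k))

def err(F, G):
    return np.linalg.norm(X - F @ G.T, "fro")

e0 = err(F, G)
for _ in range(5):
    # Each g~_i (row of G) is an independent NNLS problem in the basis F.
    G = np.array([nnls(F, X[:, i])[0] for i in range(X.shape[1])])
    # Symmetrically for F, using X^T ~ G F^T.
    F = np.array([nnls(G, X[j, :])[0] for j in range(X.shape[0])])
e1 = err(F, G)
assert e1 <= e0 + 1e-9        # each half-step is an exact minimization, so J never increases
```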

Page 21 (slide 120)

Solving NMF with multiplicative updating

J = ||X − F G^T||^2,   F ≥ 0, G ≥ 0

Fix F, solve for G; fix G, solve for F.

Lee & Seung (2000) propose:

G_{jk} ← G_{jk} (X^T F)_{jk} / (G F^T F)_{jk}

F_{ik} ← F_{ik} (X G)_{ik} / (F G^T G)_{ik}

Page 22 (slide 121)

Symmetric NMF

J = ||W − H H^T||^2,   H ≥ 0

Gradient descent:  H_{ik} ← H_{ik} − ε_{ik} ∂J/∂H_{ik}

Constrained optimization; the KKT first-order condition with complementary slackness gives

0 = (∂J/∂H)_{ik} H_{ik} = (−4 W H + 4 H H^T H)_{ik} H_{ik}

Choosing the step size ε_{ik} = β H_{ik} / (4 (H H^T H)_{ik}) yields the multiplicative update

H_{ik} ← H_{ik} ( 1 − β + β (W H)_{ik} / (H H^T H)_{ik} )
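The update above can be sketched as follows, with β = 1/2 (the value used in Ding, He & Simon's algorithm); the random symmetric W here is only for illustration:

```python
import numpy as np

# Symmetric NMF, J = ||W - H H^T||^2, via the multiplicative update
# H_ik <- H_ik (1 - beta + beta (W H)_ik / (H H^T H)_ik).
rng = np.random.default_rng(5)
A = rng.random((6, 6))
W = (A + A.T) / 2                  # symmetric nonnegative similarity
H = rng.random((6, 2))             # strictly positive init keeps denominators safe
beta = 0.5

def err(H):
    return np.linalg.norm(W - H @ H.T, "fro")

e0 = err(H)
for _ in range(100):
    H *= (1 - beta) + beta * (W @ H) / (H @ H.T @ H)
e1 = err(H)
assert e1 <= e0                    # the update does not increase J
```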

Page 23 (slide 122)

Summary

• NMF is a new low-rank approximation
• The holistic picture (vs. parts-based)
• NMF is equivalent to spectral clustering
• Main advantage: soft clustering