Spectral Clustering Jianping Fan Dept of Computer Science UNC, Charlotte.

Page 1: Spectral Clustering Jianping Fan Dept of Computer Science UNC, Charlotte.

Spectral Clustering

Jianping Fan

Dept of Computer Science

UNC, Charlotte

Page 2: Spectral Clustering Jianping Fan Dept of Computer Science UNC, Charlotte.

2

Lecture Outline

Motivation
Graph overview and construction
Spectral Clustering
Cool implementations

Page 3: Spectral Clustering Jianping Fan Dept of Computer Science UNC, Charlotte.

3

Semantic interpretations of clusters

Page 4: Spectral Clustering Jianping Fan Dept of Computer Science UNC, Charlotte.

4

Spectral Clustering Example – 2 Spirals

[Figure: the two-spirals dataset; both axes run from -2 to 2.]

The dataset exhibits complex cluster shapes.

K-means performs very poorly in this space due to its bias toward dense spherical clusters.

[Figure: the same points in the embedded space; horizontal axis from -0.709 to -0.706, vertical axis from -0.8 to 0.8.]

In the embedded space given by the two leading eigenvectors, the clusters are trivial to separate.

Page 5: Spectral Clustering Jianping Fan Dept of Computer Science UNC, Charlotte.

Spectral Clustering Example

[Figure panels: Original Points; K-means (2 Clusters)]

Why does k-means fail on these two examples?

Geometry vs. Manifold

Page 6: Spectral Clustering Jianping Fan Dept of Computer Science UNC, Charlotte.

6

Lecture Outline

Motivation
Graph overview and construction
Spectral Clustering
Cool implementations

Page 7: Spectral Clustering Jianping Fan Dept of Computer Science UNC, Charlotte.

7

Graph-based Representation of Data Similarity

Page 8: Spectral Clustering Jianping Fan Dept of Computer Science UNC, Charlotte.

8

Graph-based Representation of Data Similarity

similarity

Page 9: Spectral Clustering Jianping Fan Dept of Computer Science UNC, Charlotte.

9

Graph-based Representation of Data Relationship

Page 10: Spectral Clustering Jianping Fan Dept of Computer Science UNC, Charlotte.

10

Manifold

Page 11: Spectral Clustering Jianping Fan Dept of Computer Science UNC, Charlotte.

11

Graph-based Representation of Data Relationships

Manifold

Page 12: Spectral Clustering Jianping Fan Dept of Computer Science UNC, Charlotte.

12

Graph-based Representation of Data Relationships

Page 13: Spectral Clustering Jianping Fan Dept of Computer Science UNC, Charlotte.

13

Data Graph Construction

Page 14: Spectral Clustering Jianping Fan Dept of Computer Science UNC, Charlotte.

14

Graph-based Representation of Data Relationships

Page 15: Spectral Clustering Jianping Fan Dept of Computer Science UNC, Charlotte.

15

Graph-based Representation of Data Relationships

Page 17: Spectral Clustering Jianping Fan Dept of Computer Science UNC, Charlotte.

17

Graph-based Representation of Data Relationships

Page 18: Spectral Clustering Jianping Fan Dept of Computer Science UNC, Charlotte.

18

Graph-based Representation of Data Relationships

Page 19: Spectral Clustering Jianping Fan Dept of Computer Science UNC, Charlotte.

19

Graph Cut

Page 20: Spectral Clustering Jianping Fan Dept of Computer Science UNC, Charlotte.

20

Lecture Outline

Motivation
Graph overview and construction
Spectral Clustering
Cool implementations

Page 21: Spectral Clustering Jianping Fan Dept of Computer Science UNC, Charlotte.

21

Graph-based Representation of Data Relationships

Page 23: Spectral Clustering Jianping Fan Dept of Computer Science UNC, Charlotte.

23

Graph Cut

Page 28: Spectral Clustering Jianping Fan Dept of Computer Science UNC, Charlotte.

28

Graph-based Representation of Data Relationships

Page 29: Spectral Clustering Jianping Fan Dept of Computer Science UNC, Charlotte.

29

Graph Cut

Page 34: Spectral Clustering Jianping Fan Dept of Computer Science UNC, Charlotte.

34

Eigenvectors & Eigenvalues

Page 37: Spectral Clustering Jianping Fan Dept of Computer Science UNC, Charlotte.

37

Normalized Cut

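A graph $G(V, E)$ can be partitioned into two disjoint sets $A$ and $B$.

The optimal partition of the graph $G$ is achieved by minimizing the cut.

The cut is defined as the total weight of the edges that connect $A$ to $B$:

$$\mathrm{cut}(A, B) = \sum_{i \in A,\, j \in B} W_{ij}$$

The partition sought is $\min_{A, B} \mathrm{cut}(A, B)$.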
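To make the cut value concrete, here is a minimal sketch that sums the weights crossing a partition. The function name `cut_value` and the 4-node weight matrix are hypothetical, chosen only for illustration.

```python
import numpy as np

def cut_value(W, A, B):
    """Sum of the edge weights W[i, j] with i in A and j in B."""
    W = np.asarray(W, dtype=float)
    return float(W[np.ix_(A, B)].sum())

# Hypothetical graph: nodes 0-1 and nodes 2-3 are strongly linked (weight 1.0),
# and two weak edges (weight 0.1 each) connect the two pairs.
W = np.array([
    [0.0, 1.0, 0.1, 0.0],
    [1.0, 0.0, 0.0, 0.1],
    [0.1, 0.0, 0.0, 1.0],
    [0.0, 0.1, 1.0, 0.0],
])

weak = cut_value(W, [0, 1], [2, 3])    # crosses only the two weak edges
strong = cut_value(W, [0, 2], [1, 3])  # crosses both strong edges
print(weak, strong)
```

The partition that separates the two strongly linked pairs has the smaller cut, which is the partition a minimum-cut criterion would prefer on this graph.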

Page 38: Spectral Clustering Jianping Fan Dept of Computer Science UNC, Charlotte.

38

Normalized Cut

Normalized Cut

Association between partition set and whole graph
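The formula on this slide did not survive in the transcript. The sketch below assumes the standard normalized-cut definition of Shi and Malik, Ncut(A, B) = cut(A, B) / assoc(A, V) + cut(A, B) / assoc(B, V), where assoc(A, V) is the total weight from the nodes in A to every node in the graph. The function names and the 5-node graph are hypothetical.

```python
import numpy as np

def assoc(W, A):
    """Total weight from the nodes in A to all nodes of the graph."""
    return float(np.asarray(W, dtype=float)[A, :].sum())

def ncut(W, A, B):
    """Normalized cut of the partition (A, B), assuming the Shi-Malik form."""
    W = np.asarray(W, dtype=float)
    c = float(W[np.ix_(A, B)].sum())
    return c / assoc(W, A) + c / assoc(W, B)

# Hypothetical chain graph 0 - 1 - 2 - 3 - 4 in which node 4 hangs on by a
# weak edge (0.2) and the middle edge 1 - 2 is moderately weak (0.3).
W = np.array([
    [0.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 0.3, 0.0, 0.0],
    [0.0, 0.3, 0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0, 0.0, 0.2],
    [0.0, 0.0, 0.0, 0.2, 0.0],
])

isolated = ncut(W, [4], [0, 1, 2, 3])   # plain cut = 0.2, isolates one node
balanced = ncut(W, [0, 1], [2, 3, 4])   # plain cut = 0.3, balanced split
print(isolated, balanced)
```

On this graph the plain cut favors isolating node 4 (0.2 versus 0.3), while the normalized cut favors the balanced split, because dividing by the association penalizes very small partitions.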

Page 39: Spectral Clustering Jianping Fan Dept of Computer Science UNC, Charlotte.

39

Normalized Cut

Page 40: Spectral Clustering Jianping Fan Dept of Computer Science UNC, Charlotte.

40

Normalized Cut

Page 41: Spectral Clustering Jianping Fan Dept of Computer Science UNC, Charlotte.

41

Normalized Cut

Page 42: Spectral Clustering Jianping Fan Dept of Computer Science UNC, Charlotte.

42

Normalized Cut

Normalized Cut becomes

Normalized cut can be solved by eigenvalue equation:
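The equation itself is missing from the transcript. The sketch below assumes the usual relaxation, the generalized eigenproblem (D - W) y = lambda D y, and splits the graph by the sign of the eigenvector with the second-smallest eigenvalue. The function name `ncut_bipartition` and the two-triangle graph are hypothetical.

```python
import numpy as np

def ncut_bipartition(W):
    """Two-way split from the second-smallest generalized eigenvector.

    Solves (D - W) y = lambda D y through the equivalent symmetric problem
    D^{-1/2} (D - W) D^{-1/2} z = lambda z, with y = D^{-1/2} z.
    Assumes every node has positive degree.
    """
    W = np.asarray(W, dtype=float)
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L_sym = np.eye(len(d)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(L_sym)   # eigenvalues in ascending order
    y = d_inv_sqrt * eigvecs[:, 1]             # second-smallest eigenvector
    return y >= 0                              # boolean side for each node

# Hypothetical graph: two triangles (0,1,2) and (3,4,5) joined by one weak edge.
W = np.zeros((6, 6))
for i, j, w in [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
                (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0),
                (2, 3, 0.1)]:
    W[i, j] = W[j, i] = w

labels = ncut_bipartition(W)
print(labels)
```

The sign of an eigenvector is arbitrary, so which triangle ends up labeled True may differ between runs or platforms; only the grouping is meaningful.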

Page 43: Spectral Clustering Jianping Fan Dept of Computer Science UNC, Charlotte.

43

K-way Min-Max Cut

Intra-cluster similarity

Inter-cluster similarity

Decision function for spectral clustering

Page 44: Spectral Clustering Jianping Fan Dept of Computer Science UNC, Charlotte.

44

Mathematical Description of Spectral Clustering

Refined decision function for spectral clustering

We can further define:

Page 45: Spectral Clustering Jianping Fan Dept of Computer Science UNC, Charlotte.

45

Refined decision function for spectral clustering

This decision function can be solved as

Page 46: Spectral Clustering Jianping Fan Dept of Computer Science UNC, Charlotte.

46

Spectral Clustering Algorithm (Ng, Jordan, and Weiss)

Motivation: given a set of points $S = \{s_1, \dots, s_n\} \subset \mathbb{R}^l$, we would like to cluster them into $k$ subsets.

Page 47: Spectral Clustering Jianping Fan Dept of Computer Science UNC, Charlotte.

47

Algorithm

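Form the affinity matrix $W \in \mathbb{R}^{n \times n}$.

Define $W_{ij} = e^{-\|s_i - s_j\|^2 / 2\sigma^2}$ if $i \neq j$, and $W_{ii} = 0$.

The scaling parameter $\sigma^2$ is chosen by the user.

Define $D$, a diagonal matrix whose $(i, i)$ element is the sum of $W$'s row $i$.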
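A minimal sketch of this step, assuming the points are stored as the rows of an array. The names `affinity_matrix` and `degree_matrix` and the three sample points are hypothetical.

```python
import numpy as np

def affinity_matrix(S, sigma):
    """Gaussian affinities W_ij = exp(-||s_i - s_j||^2 / (2 sigma^2)), W_ii = 0."""
    S = np.asarray(S, dtype=float)
    diff = S[:, None, :] - S[None, :, :]       # pairwise differences
    sq_dist = (diff ** 2).sum(axis=2)          # squared Euclidean distances
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)                   # no self-affinity
    return W

def degree_matrix(W):
    """Diagonal matrix whose (i, i) entry is the sum of row i of W."""
    return np.diag(W.sum(axis=1))

# Hypothetical data: two nearby points and one distant point.
S = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
W = affinity_matrix(S, sigma=1.0)
D = degree_matrix(W)
print(np.round(W, 4))
```

Nearby points get an affinity close to 1 and distant points an affinity close to 0; sigma controls how quickly the affinity falls off with distance.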

Page 48: Spectral Clustering Jianping Fan Dept of Computer Science UNC, Charlotte.

48

Algorithm

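Form the matrix $L = D^{-1/2} W D^{-1/2}$.

Find $x_1, x_2, \dots, x_k$, the $k$ largest eigenvectors of $L$ (the eigenvectors belonging to the $k$ largest eigenvalues). These form the columns of the new matrix $X$.

Note: the dimension has been reduced from $n \times n$ to $n \times k$.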
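This step can be sketched with a symmetric eigensolver. The function name `top_k_eigenvectors` and the small weight matrix are hypothetical, and the code assumes every row sum of W is positive.

```python
import numpy as np

def top_k_eigenvectors(W, k):
    """Eigenpairs of L = D^{-1/2} W D^{-1/2} for the k largest eigenvalues."""
    W = np.asarray(W, dtype=float)
    d = W.sum(axis=1)                          # diagonal of D
    d_inv_sqrt = 1.0 / np.sqrt(d)              # assumes all row sums > 0
    L = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(L)       # ascending order, L is symmetric
    order = np.argsort(eigvals)[::-1][:k]      # indices of the k largest
    return eigvals[order], eigvecs[:, order]   # columns of X are x_1..x_k

# Hypothetical affinity matrix with two strongly linked pairs of nodes.
W = np.array([
    [0.0, 1.0, 0.1, 0.0],
    [1.0, 0.0, 0.0, 0.1],
    [0.1, 0.0, 0.0, 1.0],
    [0.0, 0.1, 1.0, 0.0],
])
vals, X = top_k_eigenvectors(W, k=2)
print(vals, X.shape)
```

The largest eigenvalue of this L is always 1 when all degrees are positive, since L is similar to the row-stochastic matrix D^{-1} W.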

Page 49: Spectral Clustering Jianping Fan Dept of Computer Science UNC, Charlotte.

49

Algorithm

Form the matrix $Y \in \mathbb{R}^{n \times k}$ by renormalizing each of $X$'s rows to have unit length:

$$Y_{ij} = X_{ij} / \left( \sum_j X_{ij}^2 \right)^{1/2}$$

Treat each row of $Y$ as a point in $\mathbb{R}^k$.

Cluster the rows into $k$ clusters via K-means.
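The row renormalization is a one-liner in array form. The function name `normalize_rows` and the sample matrix are hypothetical, and the sketch assumes no row of X is entirely zero.

```python
import numpy as np

def normalize_rows(X):
    """Scale each row of X to unit Euclidean length."""
    X = np.asarray(X, dtype=float)
    norms = np.sqrt((X ** 2).sum(axis=1, keepdims=True))
    return X / norms                           # assumes no all-zero row

# Hypothetical 3 x 2 matrix standing in for X.
X = np.array([[3.0, 4.0], [0.0, 2.0], [1.0, 0.0]])
Y = normalize_rows(X)
print(Y)
```

After this step every row of Y lies on the unit sphere in k dimensions, so K-means compares the directions of the rows rather than their magnitudes.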

Page 50: Spectral Clustering Jianping Fan Dept of Computer Science UNC, Charlotte.

50

Algorithm

Final cluster assignment: assign point $s_i$ to cluster $j$ iff row $i$ of $Y$ was assigned to cluster $j$.
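Putting the steps together, here is a compact end-to-end sketch of the algorithm as described on these slides. It is an illustration under stated assumptions, not a reference implementation: the K-means routine is a plain Lloyd iteration with a deterministic farthest-first initialization (a choice made here so the example is reproducible), and the function names, the two-ring dataset, and sigma = 0.5 are all hypothetical.

```python
import numpy as np

def kmeans(Y, k, n_iter=100):
    """Lloyd's K-means with a deterministic farthest-first initialization."""
    centers = [Y[0]]
    for _ in range(1, k):
        d2 = np.min([((Y - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(d2)])       # farthest point from chosen centers
    centers = np.array(centers)
    for _ in range(n_iter):
        d2 = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)             # nearest center for every row
        new_centers = np.array([
            Y[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def njw_spectral_clustering(S, k, sigma):
    """Affinity -> L = D^{-1/2} W D^{-1/2} -> top-k eigenvectors -> K-means."""
    S = np.asarray(S, dtype=float)
    sq = ((S[:, None, :] - S[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(L)
    X = eigvecs[:, np.argsort(eigvals)[::-1][:k]]
    Y = X / np.linalg.norm(X, axis=1, keepdims=True)
    return kmeans(Y, k)                        # label of row i = label of s_i

def ring(radius, n):
    """n evenly spaced points on a circle of the given radius."""
    t = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    return np.column_stack([radius * np.cos(t), radius * np.sin(t)])

# Hypothetical non-convex data: an inner ring (20 points) inside an outer ring (40).
S = np.vstack([ring(1.0, 20), ring(4.0, 40)])
labels = njw_spectral_clustering(S, k=2, sigma=0.5)
print(labels)
```

With this sigma the affinities between the rings are negligible compared with those along each ring, so the rows of Y collapse into two nearly orthogonal directions that K-means separates easily.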

Page 51: Spectral Clustering Jianping Fan Dept of Computer Science UNC, Charlotte.

51

Why?

If we eventually use K-means, why not just apply K-means to the original data?

This method allows us to cluster non-convex regions

Page 52: Spectral Clustering Jianping Fan Dept of Computer Science UNC, Charlotte.

52

Some Examples

Page 61: Spectral Clustering Jianping Fan Dept of Computer Science UNC, Charlotte.

61

User’s Prerogative

Affinity matrix construction

Choice of scaling factor $\sigma^2$: realistically, search over $\sigma^2$ and pick the value that gives the tightest clusters

Choice of $k$, the number of clusters

Choice of clustering method

Page 62: Spectral Clustering Jianping Fan Dept of Computer Science UNC, Charlotte.

62

[Figure: largest eigenvalues of the Cisi/Medline data; horizontal axis K (1 to 20), vertical axis Eigenvalue (0 to 50); $\lambda_1$ and $\lambda_2$ are labeled.]

How to select $k$?

Eigengap: the difference between two consecutive eigenvalues.

The most stable clustering is generally given by the value of $k$ that maximises the expression

$$\Delta_k = |\lambda_k - \lambda_{k-1}|$$

Here $\max_k \Delta_k = |\lambda_2 - \lambda_1|$, so choose $k = 2$.
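The eigengap rule can be sketched as follows, using the slide's indexing, in which the gap for k is the difference between the k-th and (k-1)-th largest eigenvalues. Other references index the gap differently (between the k-th and (k+1)-th eigenvalues), so treat the convention as an assumption. The function name `eigengap_k` and the eigenvalues are hypothetical, not the Cisi/Medline values.

```python
import numpy as np

def eigengap_k(eigenvalues):
    """Return the k >= 2 that maximises |lambda_k - lambda_{k-1}|.

    Eigenvalues are sorted in descending order, so lambda_1 is the largest.
    """
    lam = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    gaps = np.abs(lam[1:] - lam[:-1])          # gaps[i] is the gap for k = i + 2
    return int(np.argmax(gaps)) + 2

# Hypothetical spectrum with one dominant eigenvalue, as in the plot's shape.
k = eigengap_k([50.0, 20.0, 18.0, 17.0, 16.0])
print(k)
```

Here the largest gap lies between the first and second eigenvalues, so the rule as stated on the slide selects k = 2.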

Page 63: Spectral Clustering Jianping Fan Dept of Computer Science UNC, Charlotte.

63

Recap – The bottom line

Page 64: Spectral Clustering Jianping Fan Dept of Computer Science UNC, Charlotte.

64

Summary

Spectral clustering can help us in hard clustering problems

The technique is simple to understand

The solution comes from solving a simple algebra problem which is not hard to implement

Great care should be taken in choosing the “starting conditions”
