
Page 1

Cut-based & divisive clustering

Clustering algorithms: Part 2b

Pasi Fränti

17.3.2014

Speech & Image Processing Unit, School of Computing

University of Eastern Finland, Joensuu, FINLAND

Page 2

Part I: Cut-based clustering

Page 3

Cut-based clustering

• What is a cut?

• Can we use graph theory in clustering?

• Is normalized cut useful?

• Are cut-based algorithms efficient?

Page 4

• Clustering method = defines the problem

• Clustering algorithm = solves the problem

• Problem defined as a cost function:
– Goodness of one cluster
– Similarity vs. distance
– Global vs. local cost function (what is a "cut")

• Solution: algorithm to solve the problem

Clustering method

Page 5

• Usually assumes a graph

• Based on the concept of a cut

• Includes implicit assumptions, which often:
– make no difference compared to clustering in vector space
– imply sub-optimal heuristics
– are sometimes even false!

Cut-based clustering

Page 6

• Minimum spanning tree based clustering (single link)

• Split-and-merge (Lin & Chen, TKDE 2005): split the data set using K-means, then merge similar clusters based on a Gaussian-distribution cluster similarity.

• Split-and-merge (Liu, Jiang & Kot, PR 2009): split the data into a large number of sub-clusters, then remove and add prototypes until no change occurs.

• DIVFRP (Zhong et al., PRL 2008): divide according to a furthest-point heuristic.

• Normalized cut (Shi & Malik, PAMI 2000): cut-based, minimizing the disassociation between the groups while maximizing the association within the groups (the criterion is given below).

• Ratio cut (Hagen & Kahng, 1992)

• Mcut (Ding et al., ICDM 2001)

• Max k-cut (Frieze & Jerrum, 1997)

• Feng et al. (PRL 2010): particle swarm optimization for selecting the dividing hyperplane.

Cut-based clustering methods

Details to be added later…
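For reference, the normalized-cut criterion of Shi & Malik mentioned in the list above (a standard formulation, not spelled out on the slide) is:

Ncut(A, B) = cut(A, B) / assoc(A, V) + cut(A, B) / assoc(B, V)

where cut(A, B) is the total weight of the edges removed between the two groups and assoc(A, V) is the total weight of the edges from group A to all vertices of the graph. Minimizing Ncut instead of the plain cut penalizes cuts that isolate small, weakly connected groups.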

Page 7

Clustering a graph

But where do we get this…?

Page 8

Distance graph

[Figure: a distance graph with numeric edge weights between neighbouring points.]

Calculate from vector space!
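A minimal sketch of that calculation (the toy data set and variable names are illustrative only, assuming plain Euclidean distances):

import numpy as np

# Toy data set in 2-D vector space (values are illustrative only).
X = np.array([[1.0, 1.0], [2.0, 3.0], [6.0, 5.0], [7.0, 7.0], [2.0, 1.5]])

# Pairwise Euclidean distances: entry (i, j) is the weight of the edge
# between points i and j in the distance graph.
diff = X[:, None, :] - X[None, :, :]
D = np.sqrt((diff ** 2).sum(axis=2))
print(np.round(D, 1))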

Page 9

Space complexity of graph

Distance graph → complete graph

N·(N−1)/2 edges = O(N²)

[Figure: the distance graph drawn as a complete graph with all pairwise edge weights.]

But…

Page 10

Minimum spanning tree (MST)

Distance graph → MST

Works with simple examples like this

[Figure: the distance graph reduced to its minimum spanning tree.]
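A small sketch of MST-based (single-link style) clustering, assuming SciPy is available and the distance matrix D is computed as in the previous sketch; k is the desired number of clusters:

import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree, connected_components

def mst_clustering(D, k):
    """Single-link style clustering: build the MST of the distance graph
    and cut its k-1 heaviest edges. Assumes strictly positive pairwise
    distances (zeros are read as 'no edge' by the csgraph routines)."""
    mst = minimum_spanning_tree(D).toarray()       # MST as an upper-triangular matrix
    edges = np.argwhere(mst > 0)
    weights = mst[mst > 0]
    if k > 1:
        for i, j in edges[np.argsort(weights)[-(k - 1):]]:
            mst[i, j] = 0.0                        # remove the k-1 longest MST edges
    # The remaining connected components are the clusters.
    _, labels = connected_components(mst, directed=False)
    return labels

For the toy matrix above, mst_clustering(D, 2) separates the two distant points (rows 2 and 3) from the other three.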

Page 11

Cut

Graph cut → resulting clusters

The cost function is to maximize the total weight of the edges cut.

This equals minimizing the within-cluster edge weights.
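A small sketch that makes the equivalence concrete: for any partition, the total edge weight splits into cut weight plus within-cluster weight, so maximizing one minimizes the other (the toy weight matrix W and the labels are illustrative only):

import numpy as np

def cut_and_within(W, labels):
    """Return (cut weight, within-cluster weight) for a symmetric
    weight matrix W and one cluster label per vertex."""
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]
    triu = np.triu(np.ones_like(W, dtype=bool), k=1)   # count each edge once
    within = W[triu & same].sum()
    cut = W[triu & ~same].sum()
    return cut, within

W = np.array([[0, 2, 6, 7],
              [2, 0, 5, 3],
              [6, 5, 0, 4],
              [7, 3, 4, 0]], dtype=float)
cut, within = cut_and_within(W, [0, 0, 1, 1])
# cut + within equals the total edge weight, so maximizing the cut
# is the same as minimizing the within-cluster weight.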

Page 12

Cut

Graph cut → resulting clusters

Equivalent to minimizing MSE!

Page 13

Stopping criterion: ends up in a local minimum

[Plot: SC value as a function of the number of clusters (50–90) for Repeated K-means, RLS, Divisive and Agglomerative.]

Page 14

Clustering method

[Plot: MSE and the DBI & F-test values as a function of the number of clusters (2–25); the minimum point of the index indicates the number of clusters.]

Page 15

Conclusions of “Cut”

• Cut → same as partition
• Cut-based method → empty concept
• Cut-based algorithm → same as divisive
• Graph-based clustering → flawed concept
• Clustering of a graph → the more relevant topic

Page 16

Part II: Divisive algorithms

Page 17

Motivation

• Efficiency of the divide-and-conquer approach
• Hierarchy of clusters as a result
• Useful when solving the number of clusters

Challenges

• Design problem 1: which cluster to split?
• Design problem 2: how to split?
• Sub-optimal local optimization at best

Divisive approach

Page 18

Split(X, M) → C, P
m ← 1;
REPEAT
  Select cluster to be split;
  Split the cluster;
  m ← m + 1;
  UpdateDataStructures;
UNTIL m = M;

Split-based (divisive) clustering
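A rough runnable sketch of the same loop (a minimal illustration; the helper names select_cluster and split_cluster are placeholders for the two design choices discussed on the following slides):

def divisive_clustering(X, M, select_cluster, split_cluster):
    """Generic split-based (divisive) clustering skeleton.
    X: data set, M: desired number of clusters."""
    clusters = [list(range(len(X)))]                  # start with one cluster holding every vector
    while len(clusters) < M:
        i = select_cluster(X, clusters)               # design problem 1: which cluster to split
        part1, part2 = split_cluster(X, clusters[i])  # design problem 2: how to split it
        clusters[i:i + 1] = [part1, part2]            # replace the cluster by its two halves
    return clusters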

Page 19

• Heuristic choices:
– Cluster with the highest variance (MSE)
– Cluster with the most skewed distribution (3rd moment)

• Optimal choice:
– Tentatively split all clusters
– Select the one that decreases MSE the most!

• Complexity of the choice:
– The heuristics take the time needed to compute the measure
– The optimal choice takes only about twice (2×) as much time!!!
– The measures can be stored; only the two new clusters created at each step need to be calculated.

Select cluster to be split

Use this!
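A sketch of this optimal selection (an illustration only, assuming some split_cluster helper such as the principal-axis split sketched later):

import numpy as np

def sse(X, idx):
    """Sum of squared errors of the vectors idx around their centroid."""
    idx = np.asarray(idx)
    if len(idx) < 2:
        return 0.0
    P = X[idx]
    return float(((P - P.mean(axis=0)) ** 2).sum())

def select_cluster_optimal(X, clusters, split_cluster):
    """Tentatively split every cluster and return the index of the one whose
    split decreases total SSE the most. In a real implementation these
    decreases are stored, so after each accepted split only the two newly
    created clusters have to be re-evaluated."""
    decreases = []
    for idx in clusters:
        a, b = split_cluster(X, idx)                       # tentative split
        decreases.append(sse(X, idx) - sse(X, a) - sse(X, b))
    return int(np.argmax(decreases))

Wrapped with functools.partial (fixing the split_cluster argument), it can serve as the select_cluster argument of the skeleton above.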

Page 20

Selection example

[Figure: six clusters with MSE values 8.2, 6.5, 7.5, 4.3, 11.2 and 11.6.]

Biggest MSE…

… but dividing this one decreases MSE more

Page 21

Selection example

[Figure: the split cluster is replaced by two sub-clusters with MSE values 4.1 and 6.3; the other values (7.5, 8.2, 4.3, 6.5, 11.6) are unchanged.]

Only two new values need to be calculated

Page 22

How to split

• Centroid methods:
– Heuristic 1: replace C by C− and C+
– Heuristic 2: two furthest vectors.
– Heuristic 3: two random vectors.

• Partition according to the principal axis:
– Calculate the principal axis
– Select a dividing point along the axis
– Divide by a hyperplane
– Calculate centroids of the two sub-clusters

Page 23

Splitting along principal axis (pseudo code)

Step 1: Calculate the principal axis.

Step 2: Select a dividing point.

Step 3: Divide the points by a hyperplane.

Step 4: Calculate centroids of the new clusters.
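A compact sketch of Steps 1–4 in NumPy (not the exact routine of reference [1]; for simplicity it uses the mean projection as the dividing point, whereas the optimal dividing point of the next slides can be substituted):

import numpy as np

def split_principal_axis(X, idx):
    """Split the cluster given by the index list idx with a hyperplane
    orthogonal to its principal axis (Steps 1-4)."""
    idx = np.asarray(idx)
    P = X[idx]
    centered = P - P.mean(axis=0)
    # Step 1: principal axis = eigenvector of the covariance matrix
    # belonging to the largest eigenvalue.
    cov = np.atleast_2d(np.cov(centered, rowvar=False))
    axis = np.linalg.eigh(cov)[1][:, -1]
    # Step 2: projections on the principal axis; here the dividing point is
    # simply the mean projection (the optimal choice follows on the next slides).
    proj = centered @ axis
    left = proj <= proj.mean()
    # Steps 3-4: divide by the hyperplane; the centroids of the two
    # sub-clusters are then just the means of the two index sets.
    return idx[left].tolist(), idx[~left].tolist()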

Page 24

Example of dividing

[Figure: the cluster divided by a hyperplane orthogonal to the principal axis; the dividing hyperplane and the principal axis are marked.]

Page 25

Optimal dividing point (pseudo code of Step 2)

Step 2.1: Calculate projections on the principal axis.

Step 2.2: Sort the vectors according to the projection.

Step 2.3: FOR each vector xi DO:
– Divide using xi as the dividing point.
– Calculate the distortions D1 and D2 of the two subsets.

Step 2.4: Choose the point minimizing D1 + D2.
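A direct (unoptimized) sketch of Step 2; proj holds the projections on the principal axis and P the cluster's vectors, as in the previous sketch. It recomputes both distortions from scratch for every candidate, which the next slide improves to O(1) per candidate:

import numpy as np

def optimal_dividing_point(proj, P):
    """Return the sorted order and the split index minimizing D1 + D2."""
    order = np.argsort(proj)                       # Step 2.2: sort by projection
    best_i, best_cost = None, np.inf
    for i in range(1, len(order)):                 # Step 2.3: try every dividing point
        left, right = P[order[:i]], P[order[i:]]
        d1 = ((left - left.mean(axis=0)) ** 2).sum()
        d2 = ((right - right.mean(axis=0)) ** 2).sum()
        if d1 + d2 < best_cost:                    # Step 2.4: keep the minimum
            best_i, best_cost = i, d1 + d2
    return order, best_i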

Page 26

Finding dividing point

• Calculating error for the next dividing point:

D1' = D1 + (n1 / (n1 + 1)) · ||v_i − c1||²
D2' = D2 − (n2 / (n2 − 1)) · ||v_i − c2||²

• Update centroids:

c1' = (n1 · c1 + v_i) / (n1 + 1)
c2' = (n2 · c2 − v_i) / (n2 − 1)

Can be done in O(1) time!!!
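A sketch of the same scan using these incremental updates (assuming NumPy; proj and P as in the previous sketch):

import numpy as np

def optimal_dividing_point_fast(proj, P):
    """Same search as before, but the vectors are moved one by one from the
    right-hand subset to the left-hand subset, and the centroids (c1, c2)
    and distortions (d1, d2) are updated with the O(1) formulas above."""
    order = np.argsort(proj)
    n, dim = len(order), P.shape[1]
    c1, n1, d1 = np.zeros(dim), 0, 0.0               # left side starts empty
    c2, n2 = P[order].mean(axis=0), n                # right side holds everything
    d2 = float(((P[order] - c2) ** 2).sum())
    best_i, best_cost = None, np.inf
    for i in range(1, n):                            # dividing point after i vectors
        v = P[order[i - 1]]                          # vector crossing the dividing point
        d1 += n1 / (n1 + 1) * ((v - c1) ** 2).sum()  # D1' = D1 + n1/(n1+1)·||v - c1||²
        d2 -= n2 / (n2 - 1) * ((v - c2) ** 2).sum()  # D2' = D2 - n2/(n2-1)·||v - c2||²
        c1 = (n1 * c1 + v) / (n1 + 1)                # c1' = (n1·c1 + v)/(n1+1)
        c2 = (n2 * c2 - v) / (n2 - 1)                # c2' = (n2·c2 - v)/(n2-1)
        n1, n2 = n1 + 1, n2 - 1
        if d1 + d2 < best_cost:
            best_i, best_cost = i, d1 + d2
    return order, best_i

Note that the distortions must be updated with the old centroids and sizes before the centroids themselves are updated.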

Page 27

Sub-optimality of the split

[Figure: optimal partition boundary vs. the boundary obtained by 2-level clustering.]

Pages 28–34

Example of splitting process

[Figures: the data set is split step by step along the principal axis by a dividing hyperplane, producing 2, 3, 4, …, 15 clusters; MSE after 15 clusters = 1.94.]

Page 35

K-means refinement

Result directly after split: MSE = 1.94

Result after re-partition: MSE = 1.39

Result after K-means: MSE = 1.33
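A minimal sketch of the refinement, assuming the split result is given as an array of prototypes C and plain Lloyd iterations are used for the re-partition and K-means steps:

import numpy as np

def kmeans_refine(X, C, iterations=10):
    """Refine the prototypes C produced by the splits with standard K-means
    (Lloyd) iterations: re-partition to the nearest prototype, then move each
    prototype to the centroid of its partition."""
    C = np.array(C, dtype=float)
    for _ in range(iterations):
        dist = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)                  # re-partition step
        for k in range(len(C)):
            if np.any(labels == k):
                C[k] = X[labels == k].mean(axis=0)    # centroid update step
    dist = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    labels = dist.argmin(axis=1)
    mse = float(dist.min(axis=1).mean())              # final distortion
    return C, labels, mse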

Page 36

Time complexity

Number of processed vectors, assuming that clusters are always split into two equal halves:

Σ nᵢ = N + (N/2 + N/2) + (N/4 + N/4 + N/4 + N/4) + … = N + N + … + N = N·log₂M

Assuming an unequal split into parts of sizes n_max and n_min, with p = n_min / n_max:

Page 37

Number of vectors processed:

Σ_{m=1}^{M} n_m = N/p + N/(2p) + N/(3p) + … + N/(Mp) ≈ (N/p)·log M

At each step, sorting the vectors is the bottleneck:

T(N) = Σ_{m=1}^{M} n_m·log n_m ≤ Σ_{m=1}^{M} n_m·log N ≈ (N/p)·log M·log N

Time complexity
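As a quick worked example of these bounds (numbers chosen only for illustration): with N = 100 000 vectors and M = 256 clusters, equal halves give Σ nᵢ = N·log₂M = 100 000 · 8 = 800 000 processed vectors, instead of the roughly N·M = 25.6 million that a naive approach re-processing the whole data set at every split would touch; with sorting as the bottleneck, the total work is about N·log₂M·log₂N ≈ 800 000 · 17 ≈ 1.4·10⁷ operations.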

Page 38

Comparison of results

[Plot: MSE (4.0–8.0) versus running time in seconds (0–160) on the Birch1 data set for Split 1, Split 2, S+KM, S(KM), SLR, SLR+KM and SLR(KM).]

Page 39

Conclusions

• Divisive algorithms are efficient
• Good quality clustering
• Several non-trivial design choices
• Selection of the dividing axis can be improved!

Page 40

References

1. P. Fränti, T. Kaukoranta and O. Nevalainen, "On the splitting method for vector quantization codebook generation", Optical Engineering, 36 (11), 3043–3051, November 1997.

2. C.-R. Lin and M.-S. Chen, "Combining partitional and hierarchical algorithms for robust and efficient data clustering with cohesion self-merging", IEEE Transactions on Knowledge and Data Engineering, 17 (2), 2005.

3. M. Liu, X. Jiang and A.C. Kot, "A multi-prototype clustering algorithm", Pattern Recognition, 42 (2009), 689–698.

4. J. Shi and J. Malik, "Normalized cuts and image segmentation", IEEE Transactions on Pattern Analysis and Machine Intelligence, 22 (8), 2000.

5. L. Feng, M.-H. Qiu, Y.-X. Wang, Q.-L. Xiang, Y.-F. Yang and K. Liu, "A fast divisive clustering algorithm using an improved discrete particle swarm optimizer", Pattern Recognition Letters, 2010.

6. C. Zhong, D. Miao, R. Wang and X. Zhou, "DIVFRP: An automatic divisive hierarchical clustering method based on the furthest reference points", Pattern Recognition Letters, 29 (2008), 2067–2077.