Clustering Methods: Part 2d Pasi Fränti 31.3.2014 Speech & Image Processing Unit School of Computing University of Eastern Finland Joensuu, FINLAND Swap-based algorithms


Clustering Methods: Part 2d

Pasi Fränti

31.3.2014

Speech & Image Processing Unit

School of Computing

University of Eastern Finland

Joensuu, FINLAND

Swap-based algorithms


Part I:

Random Swap algorithm

P. Fränti and J. Kivijärvi, "Randomised local search algorithm for the clustering problem", Pattern Analysis and Applications, 3 (4), 358-369, 2000.


Pseudo code of Random Swap

RandomSwap(X) → C, P

    C ← SelectRandomRepresentatives(X);
    P ← OptimalPartition(X, C);

    REPEAT T times
        (Cnew, j) ← RandomSwap(X, C);
        Pnew ← LocalRepartition(X, Cnew, P, j);
        (Cnew, Pnew) ← Kmeans(X, Cnew, Pnew);
        IF f(Cnew, Pnew) < f(C, P) THEN
            (C, P) ← (Cnew, Pnew);

    RETURN (C, P);
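The pseudo code can be tried out with a minimal pure-Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the function names, the `kmeans_iters` and `seed` parameters, and the use of a full optimal partition in place of the faster local repartition are all choices made here for brevity. The cost function f is taken to be the mean squared error.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def optimal_partition(X, C):
    """Assign every vector to its nearest centroid."""
    return [min(range(len(C)), key=lambda k: sq_dist(x, C[k])) for x in X]

def update_centroids(X, P, C):
    """Move each centroid to the mean of its cluster (empty clusters stay put)."""
    dim = len(X[0])
    sums = [[0.0] * dim for _ in C]
    counts = [0] * len(C)
    for x, p in zip(X, P):
        counts[p] += 1
        for d in range(dim):
            sums[p][d] += x[d]
    return [[s / counts[k] for s in sums[k]] if counts[k] else list(C[k])
            for k in range(len(C))]

def mse(X, C, P):
    """Cost function f: mean squared error of the solution (C, P)."""
    return sum(sq_dist(x, C[p]) for x, p in zip(X, P)) / len(X)

def random_swap(X, M, T, kmeans_iters=2, seed=0):
    rng = random.Random(seed)
    C = [list(x) for x in rng.sample(X, M)]      # random representatives
    P = optimal_partition(X, C)
    best = mse(X, C, P)
    for _ in range(T):
        C_new = [list(c) for c in C]
        j = rng.randrange(M)                     # centroid to be swapped
        C_new[j] = list(rng.choice(X))           # new location: a random data vector
        # Assumption: full repartition instead of LocalRepartition.
        P_new = optimal_partition(X, C_new)
        for _ in range(kmeans_iters):            # fine-tuning by K-means
            C_new = update_centroids(X, P_new, C_new)
            P_new = optimal_partition(X, C_new)
        cost = mse(X, C_new, P_new)
        if cost < best:                          # accept only improving swaps
            C, P, best = C_new, P_new, cost
    return C, P, best
```

Because a trial solution is accepted only when it lowers the cost, the returned error can never be worse than that of the initial random solution obtained with the same seed.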


Demonstration of the algorithm

Two centroids, but only one cluster.

One centroid, but two clusters.


Centroid swap

The swap is made from a centroid-rich area to a centroid-poor area.


Local repartition

Fine-tuning by K-means: 1st iteration

Fine-tuning by K-means: 2nd iteration

Fine-tuning by K-means: 3rd iteration

Fine-tuning by K-means: 16th iteration

Fine-tuning by K-means: 17th iteration

Fine-tuning by K-means: 18th iteration

Fine-tuning by K-means: 19th iteration

Fine-tuning by K-means: final result after 25 iterations

Implementation of the swap

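1. Random swap:

$$c_j \leftarrow x_i, \quad j = \mathrm{random}(1, M), \; i = \mathrm{random}(1, N)$$

2. Re-partition vectors from old cluster:

$$p_i \leftarrow \arg\min_{1 \le k \le M} d(x_i, c_k)^2 \quad \forall i : p_i = j$$

3. Create new cluster:

$$p_i \leftarrow \arg\min_{k = j \,\lor\, k = p_i} d(x_i, c_k)^2 \quad \forall i \in [1, N]$$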
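Steps 2 and 3 can be sketched in Python as a single pass over the data. This is an illustrative sketch, assuming 0-based indices, a partition stored as a list `P` of cluster indices, and that the centroid `C[j]` has already been replaced by the swap; the function name `local_repartition` is chosen here to mirror the pseudo code.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def local_repartition(X, C, P, j):
    """Update partition P in place after centroid C[j] has been swapped."""
    M = len(C)
    for i, x in enumerate(X):
        if P[i] == j:
            # Step 2: the old cluster j no longer exists, so its vectors
            # search for the nearest centroid among all M centroids.
            P[i] = min(range(M), key=lambda k: sq_dist(x, C[k]))
        elif sq_dist(x, C[j]) < sq_dist(x, C[P[i]]):
            # Step 3: the new centroid attracts vectors from other clusters;
            # only k = j and k = p_i need to be compared.
            P[i] = j
    return P
```

Only the vectors of the removed cluster need a full search over the M centroids; every other vector is compared against two centroids, which keeps the repartition cheap.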


Random swap as local search

Study neighbor solutions

Random swap as local search

Select one and move

Role of K-means

Fine-tune solution by hill-climbing technique!

Role of K-means

Consider only local optima!


Effective search space

Role of swap: reduce search space


Chain reaction by K-means after swap

Independency of initialization

Results for T = 5000 iterations

[Figure: bar chart of MSE (axis 150 to 190) on the Bridge data set for K-means, Random + RS, K-means + RS, Split + RS and Ward + RS, marking the initial, worst and best results. Values printed on the chart: 176.53, 163.93, 163.63, 163.51, 163.08.]


Part II:

Efficiency of Random Swap


Probability of good swap

• Select a proper centroid for removal:

– There are M clusters in total: $p_{removal} = 1/M$.

• Select a proper new location:

– There are N choices: $p_{add} = 1/N$

– Only M are significantly different: $p_{add} = 1/M$

• In total:

– $M^2$ significantly different swaps.

– Probability of each different swap is $p_{swap} = 1/M^2$

– Open question: how many of these are good?


Number of neighbors

Open question: what is the size of the neighborhood (α)?

[Figure: two ways to define neighboring clusters. Panels: "Voronoi neighbors" and "Neighbors by distance".]

Observed number of neighbors

Data set S2

[Figure: histogram of the number of neighbours (1 to 9) against frequency (0 % to 45 %). Average = 3.9.]


Average number of neighbors

Expected number of iterations

• Probability of not finding good swap:

$$q = \left(1 - \frac{\alpha^2}{M^2}\right)^T$$

Here α is the size of the neighborhood, so a single random swap is good with probability $\alpha^2 / M^2$.

• Estimated number of iterations:

$$\log q = T \cdot \log\left(1 - \frac{\alpha^2}{M^2}\right) \quad\Rightarrow\quad T = \frac{\log q}{\log\left(1 - \frac{\alpha^2}{M^2}\right)}$$
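The estimate is easy to evaluate numerically. The sketch below is only an illustration: the function names are chosen here, and the values M = 15 and α = 3 in the usage note are hypothetical inputs, not measurements from the slides.

```python
import math

def failure_probability(T, M, alpha):
    """q: probability that none of T random swaps is a good one."""
    p = (alpha / M) ** 2          # probability that one random swap is good
    return (1.0 - p) ** T

def iterations_needed(q, M, alpha):
    """T: number of iterations so that the failure probability equals q."""
    p = (alpha / M) ** 2
    return math.log(q) / math.log(1.0 - p)
```

With the hypothetical values M = 15 and α = 3, `iterations_needed(0.1, 15, 3)` gives about 56 iterations. Tightening q from 10% to 1% doubles the estimate, which is the logarithmic dependency on q.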

Estimated number of iterations depending on T

| | Observed q-values: S1 | S2 | S3 | S4 | Estimated iterations (T): S1 | S2 | S3 | S4 |
|---|---|---|---|---|---|---|---|---|
| q = 10% | 19% | 14% | 22% | 22% | 53 | 47 | 39 | 37 |
| q = 1% | 3.1% | 1.2% | 1.0% | 3.6% | 106 | 93 | 78 | 74 |
| q = 0.1% | 0.1% | 0.1% | 0.2% | 1.1% | 159 | 140 | 117 | 111 |
| Expected: | 72 | 56 | 55 | 48 | 23 | 21 | 17 | 16 |

Observed = Number of iterations needed in practice.

Estimated = Estimate of the number of iterations needed for given q.

Probability of success (p) depending on T

[Figure: p (0 to 100) versus iterations (0 to 300).]

Probability of failure (q) depending on T

[Figure: q (logarithmic axis, from 1 down to 0.000000001) versus iterations (0 to 300).]

Observed probabilities depending on dimensionality

[Figure: q (logarithmic axis, 0.01 % to 100 %) versus dimensionality (16 to 1024). Curves: observed for q = 10%, observed for q = 1%, observed for q = 0.10%.]

Bounds for the number of iterations

Upper limit:

$$T = \frac{\ln q}{\ln\left(1 - \alpha^2 / M^2\right)} \le \frac{\ln q}{-\alpha^2 / M^2} = -\ln q \cdot \frac{M^2}{\alpha^2}$$

Lower limit similarly; resulting in:

$$T \approx -\ln q \cdot \frac{M^2}{\alpha^2}$$

Multiple swaps (w)

Probability for performing less than w swaps:

$$q = \sum_{i=0}^{w-1} \binom{T}{i} \left(\frac{\alpha^2}{M^2}\right)^i \left(1 - \frac{\alpha^2}{M^2}\right)^{T-i}$$

Expected number of iterations:

[Equation: the expected number of iterations T, written as a sum over i = 1, ..., w in terms of M; not legible in this transcript.]
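The probability of performing fewer than w good swaps in T iterations is a binomial tail, which can be evaluated directly. The sketch below assumes that each iteration succeeds independently with probability $\alpha^2 / M^2$, as in the single-swap analysis; the function name and the example values are illustrative only.

```python
import math

def failure_probability_multi(T, w, M, alpha):
    """Probability that fewer than w of T random swaps are good ones."""
    p = (alpha / M) ** 2          # success probability of a single swap
    return sum(math.comb(T, i) * p ** i * (1.0 - p) ** (T - i)
               for i in range(w))
```

For w = 1 the sum reduces to the single-swap failure probability $(1 - \alpha^2/M^2)^T$, and for a fixed T the failure probability grows with w because more good swaps must be found.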

Number of swaps needed

Example from image quantization

[Figure panels: "K-means clustering result (3 swaps needed)" and "Final clustering result".]

Efficiency of the random swap

Total time to find correct clustering:

– Time per iteration × Number of iterations

Time complexity of a single step:

– Swap: O(1)

– Remove cluster: 2MN/M = O(N)

– Add cluster: 2N = O(N)

– Centroids: 2(2N/M) + 2 + 2 = O(N/M)

– (Fast) K-means iteration: 4N = O(N)*

*See Fast K-means for analysis.

Time complexity and the observed number of steps

| Step | Time complexity | Observed number of steps at iteration 50 | at iteration 100 | at iteration 500 |
|---|---|---|---|---|
| Centroid swap | 2 | 2 | 2 | 2 |
| Cluster removal | 2N | 7,526 | 8,448 | 10,137 |
| Cluster addition | 2N | 8,192 | 8,192 | 8,192 |
| Update centroids | 4N/M + 2 + 1 | 53 | 61 | 60 |
| K-means iterations | 4N | 300,901 | 285,555 | 197,327 |
| Total | O(N) | 316,674 | 302,258 | 215,718 |

Time spent by K-means iterations

[Figure: Bridge data set. Curves for local repartition, k-means 1st iteration and k-means 2nd iteration; horizontal axis 0 to 500, vertical axis 0 to 140.]

Effect of K-means iterations

[Figure: Bridge data set. Error (MSE) versus time (s) for 1, 2, 3, 4 and 5 K-means iterations.]

The version with one iteration seems to be the weakest all the time.

Versions with other numbers of iterations are pretty even.

Total time complexity

Time complexity of a single step (t):

$$t = O(\alpha N)$$

Number of iterations needed (T):

$$T \approx -\ln q \cdot \frac{M^2}{\alpha^2}$$

Total time:

$$T(N, M) \approx -\ln q \cdot \frac{M^2}{\alpha^2} \cdot \alpha N = -\ln q \cdot \frac{N M^2}{\alpha}$$

Time complexity: conclusions

$$T(N, M) \approx -\ln q \cdot \frac{N M^2}{\alpha}$$

1. Logarithmic dependency on q

2. Linear dependency on N

3. Quadratic dependency on M (with a large number of clusters, the method can be too slow)

4. Inverse dependency on α (worst case α = 2) (the higher the dimensionality and the cluster overlap, the faster the method)

Time-distortion performance

[Figures: MSE versus time (logarithmic time axis) for Random Swap and Repeated k-means, one plot per data set: Bridge, Missa1, Birch1, Birch2, Europe and KDD-Cup04 Bio. The MSE axis is in millions for Birch1, Birch2, Europe and KDD-Cup04 Bio.]

References

Random swap algorithm:

• P. Fränti and J. Kivijärvi, "Randomised local search algorithm for the clustering problem", Pattern Analysis and Applications, 3 (4), 358-369, 2000.

• P. Fränti, J. Kivijärvi and O. Nevalainen, "Tabu search algorithm for codebook generation in VQ", Pattern Recognition, 31 (8), 1139-1148, August 1998.

Pseudo code:

• http://cs.joensuu.fi/sipu/soft/

Efficiency of Random swap algorithm:

• P. Fränti, O. Virmajoki and V. Hautamäki, "Efficiency of random swap based clustering", IAPR Int. Conf. on Pattern Recognition (ICPR'08), Tampa, FL, Dec 2008.


Part III:

Example when 4 swaps needed

1st swap

MSE = $4.2 \times 10^9$, MSE = $3.4 \times 10^9$

2nd swap

MSE = $3.1 \times 10^9$, MSE = $3.0 \times 10^9$

3rd swap

MSE = $2.3 \times 10^9$, MSE = $2.1 \times 10^9$

4th swap

MSE = $1.9 \times 10^9$, MSE = $1.7 \times 10^9$

Final result

MSE = $1.3 \times 10^9$


Part IV:

Deterministic Swap

Deterministic swap

From where to where?

[Figure: data set with 15 numbered clusters. Annotations: "Two centroids, but only one cluster." and "One centroid, but two clusters."]

Costs for the swap:

| Cluster | Removal | Addition |
|---|---|---|
| 1 | 0.80 | 0.39 |
| 2 | 1.04 | 0.64 |
| 3 | 5.48 | 1.09 |
| 4 | 5.66 | 0.92 |
| 5 | 6.50 | 0.76 |
| 6 | 7.67 | 1.01 |
| 7 | 8.47 | 0.45 |
| 8 | 9.10 | 0.75 |
| 9 | 9.90 | 1.42 |
| 10 | 11.09 | 1.26 |
| 11 | 11.47 | 0.61 |
| 12 | 12.17 | 4.70 |
| 13 | 14.61 | 0.94 |
| 14 | 16.41 | 0.93 |
| 15 | 16.68 | 1.41 |
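One way to read the cost table is sketched below. It rests on an interpretation that the slide does not spell out: the Removal column is taken as the increase in distortion caused by removing that cluster's centroid, and the Addition column as the decrease obtained by adding a centroid into that cluster. Under that assumption a deterministic swap removes where removal is cheapest and adds where addition helps most. The function name is hypothetical.

```python
# Cost table copied from the slide: cluster -> (removal, addition).
costs = {
    1: (0.80, 0.39), 2: (1.04, 0.64), 3: (5.48, 1.09), 4: (5.66, 0.92),
    5: (6.50, 0.76), 6: (7.67, 1.01), 7: (8.47, 0.45), 8: (9.10, 0.75),
    9: (9.90, 1.42), 10: (11.09, 1.26), 11: (11.47, 0.61), 12: (12.17, 4.70),
    13: (14.61, 0.94), 14: (16.41, 0.93), 15: (16.68, 1.41),
}

def choose_deterministic_swap(costs):
    """Pick (cluster to remove from, cluster to add into)."""
    # Assumption: smallest removal value = least harmful centroid to remove.
    remove_from = min(costs, key=lambda k: costs[k][0])
    # Assumption: largest addition value = most useful place for a new centroid.
    add_to = max((k for k in costs if k != remove_from),
                 key=lambda k: costs[k][1])
    return remove_from, add_to
```

On the table above this selects cluster 1 for removal and cluster 12 for addition.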

Cluster removal

• Merge two existing clusters [Frigui 1997, Kaukoranta 1998] following the spirit of agglomerative clustering.

• Local optimization: remove the prototype that increases the cost function value least [Fritzke 1997, Likas 2003, Fränti 2006].

• Smart swap: find two nearest prototypes, and remove one of them randomly [Chen, 2010].

• Pairwise swap: locate a pair of inconsistent prototypes in two solutions [Zhao, 2012].

Cluster addition

1. Select an existing cluster

– Depending on strategy: 1..M choices.

– Each choice takes O(N) time to test.

2. Select a location within this cluster

– Add new prototype

– Consider only existing points

Select the cluster

• Cluster with the biggest MSE

– Intuitive heuristic [Fritzke 1997, Chen 2010]

– Computationally demanding:

• Local optimization

– Try all clusters for the addition [Likas et al, 2003]

– Computationally demanding: O(NM)-O(N²)


Select the location

1. Current prototype + ε [Fritzke 1997]

2. Furthest vector [Fränti et al 1997]

3. Any other split heuristic [Fränti et al, 1997]

4. Random location

5. Every possible location [Likas et al, 2003]


Complexity of swaps

Furthest point in cluster

[Figure labels: "Prototype removed", "Cluster where added", "Furthest point selected".]

Smart swap

• Initialization: O(MN)

• Swap iteration

– Finding nearest pair: O(M²)

– Calculating distortion: O(N)

– Sorting clusters: O(M∙log M)

– Evaluation of result: O(N)

– Repartition and fine-tuning: O(N)

Total: O(MN + M² + I∙N)

• Number of iterations expected: < 2∙M

• Estimated total time: O(2M²N)

Smart swap

[Figure labels: "Nearest prototypes", "Cluster with largest distortion".]

Smart swap pseudo code

SmartSwap(X, M) → C, P

    C ← InitializeCentroids(X);
    P ← PartitionDataset(X, C);
    Maxorder ← log2 M;
    order ← 1;

    WHILE order < Maxorder
        ci, cj ← FindNearestPair(C);
        S ← SortClustersByDistortion(P, C);
        cswap ← RandomSelect(ci, cj);
        clocation ← s_order;
        Cnew ← Swap(cswap, clocation);
        Pnew ← LocalRepartition(P, Cnew);
        KmeansIteration(Pnew, Cnew);
        IF f(Cnew) < f(C) THEN
            order ← 1;
            C ← Cnew;
        ELSE
            order ← order + 1;

    KmeansIteration(P, C);
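The two selection steps of the pseudo code, FindNearestPair and SortClustersByDistortion, can be sketched in Python as follows. This is an illustrative sketch, not the published implementation: the function names follow the pseudo code, but the signatures, the 0-based cluster indices and the brute-force O(M²) pair search are assumptions made here.

```python
from itertools import combinations

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def find_nearest_pair(C):
    """Indices (i, j) of the two centroids closest to each other, O(M^2)."""
    return min(combinations(range(len(C)), 2),
               key=lambda ij: sq_dist(C[ij[0]], C[ij[1]]))

def sort_clusters_by_distortion(X, P, C):
    """Cluster indices ordered from the largest to the smallest distortion."""
    distortion = [0.0] * len(C)
    for x, p in zip(X, P):
        distortion[p] += sq_dist(x, C[p])
    return sorted(range(len(C)), key=lambda k: distortion[k], reverse=True)
```

One centroid of the nearest pair is the removal candidate, and the sorted list supplies the locations for the addition: the cluster with the largest distortion is tried first, and the next one in the list after each failed attempt.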

Pairwise swap

[Figure: prototypes of two solutions. Labels: "Unpaired prototypes"; "Nearest neighbors of each other"; "Nearest neighbor of the other set further than in the same set → Subject to swap".]

Combinations of random and deterministic swap

| Variant | Removal | Addition |
|---|---|---|
| RR | Random | Random |
| RD | Random | Deterministic |
| DR | Deterministic | Random |
| DD | Deterministic | Deterministic |
| D2R | Deterministic + data update | Random |
| D2D | Deterministic + data update | Deterministic |

Summary of the time complexities

RR and RD use random removal; DR, DD, D2R and D2D use deterministic removal.

| | RR | RD | DR | DD | D2R | D2D |
|---|---|---|---|---|---|---|
| Removal | O(1) | O(1) | O(MN) | O(MN) | O(αN) | O(αN) |
| Addition | O(1) | O(N) | O(1) | O(N) | O(1) | O(N) |
| Repartition | O(N) | O(N) | O(N) | O(N) | O(N) | O(N) |
| K-means | O(αN) | O(αN) | O(αN) | O(αN) | O(αN) | O(αN) |
| Total | O(αN) | O(αN) | O(MN) | O(MN) | O(αN) | O(αN) |

Profiles of the processing time

[Figure: stacked bars of time (s) per iteration for the variants RR, RD, DR, DD, D2R and D2D, split into K-means, Swap, Repartition and Others. Panels: Bridge and Birch2.]

Test data sets

| Data set | Type of data set | Number of data vectors (N) | Number of clusters (M) | Dimension of data vector (d) |
|---|---|---|---|---|
| Bridge | Gray-scale image | 4096 | 256 | 16 |
| House* | RGB image | 34112 | 256 | 3 |
| Miss America | Residual vectors | 6480 | 256 | 16 |
| Europe | Differential coordinates | 169673 | | 2 |
| BIRCH1-BIRCH3 | Synthetically generated | 100000 | 100 | 2 |
| S1-S4 | Synthetically generated | 5000 | 15 | 2 |
| Dim32-1024 | Synthetically generated | 1000 | 256 | 32-1024 |

[Figure: scatter plots of data sets S1, S2, S3 and S4.]


Birch data sets

Birch1 Birch2 Birch3

Experiments: Bridge

[Figure: Error (MSE) versus time (s) on Bridge for the variants RR, DR, RD and DD. Curve labels: Random Swap, RD, DR, DD.]

Experiments: Bridge

[Figure: MSE versus time on Bridge for Random Swap, Repeated k-means, DR and D2R.]

Experiments: Birch2

[Figure: Error (MSE, ×10^6) versus time (s) on Birch2 for the variants RR, DR, RD and DD. Curve labels: Random Swap, DD, DR, RD.]

Experiments: Miss America

[Figure: MSE versus time on Missa1 for Random Swap, Repeated k-means, DR and D2R.]

Quality comparisons (MSE) with 10 second time constraint

| | Bridge | House | Miss America | Europe (×10^7) | Birch1 (×10^8) | Birch2 (×10^6) |
|---|---|---|---|---|---|---|
| Repeated Random | 251.32 | 12.12 | 8.34 | 2.37 | 13.10 | 22.35 |
| Repeated K-means | 177.66 | 6.58 | 5.92 | 1.52 | 5.49 | 4.10 |
| Random Swap | 174.08 | 6.41 | 5.85 | 1.26 | 5.70 | 4.43 |
| RD-variant | 171.20 | 6.10 | 5.58 | 1.02 | 5.11 | 2.78 |
| Average speed-up from RR to RD | 2:1 | 4:1 | 5:1 | 6:1 | 4:1 | 18:1 |

Literature

1. P. Fränti and J. Kivijärvi, "Randomised local search algorithm for the clustering problem", Pattern Analysis and Applications, 3 (4), 358-369, 2000.

2. P. Fränti, J. Kivijärvi and O. Nevalainen, "Tabu search algorithm for codebook generation in VQ", Pattern Recognition, 31 (8), 1139-1148, August 1998.

3. P. Fränti, O. Virmajoki and V. Hautamäki, "Efficiency of random swap based clustering", IAPR Int. Conf. on Pattern Recognition (ICPR'08), Tampa, FL, Dec 2008.

4. P. Fränti, M. Tuononen and O. Virmajoki, "Deterministic and randomized local search algorithms for clustering", IEEE Int. Conf. on Multimedia and Expo (ICME'08), Hannover, Germany, 837-840, June 2008.

5. P. Fränti and O. Virmajoki, "On the efficiency of swap-based clustering", Int. Conf. on Adaptive and Natural Computing Algorithms (ICANNGA'09), Kuopio, Finland, LNCS 5495, 303-312, April 2009.

6. J. Chen, Q. Zhao and P. Fränti, "Smart swap for more efficient clustering", Int. Conf. Green Circuits and Systems (ICGCS'10), Shanghai, China, 446-450, June 2010.

7. B. Fritzke, "The LBG-U method for vector quantization – an improvement over LBG inspired from neural networks", Neural Processing Letters, 5 (1), 35-45, 1997.

8. P. Fränti and O. Virmajoki, "Iterative shrinking method for clustering problems", Pattern Recognition, 39 (5), 761-765, May 2006.

9. T. Kaukoranta, P. Fränti and O. Nevalainen, "Iterative split-and-merge algorithm for VQ codebook generation", Optical Engineering, 37 (10), 2726-2732, October 1998.

10. H. Frigui and R. Krishnapuram, "Clustering by competitive agglomeration", Pattern Recognition, 30 (7), 1109-1119, July 1997.

11. A. Likas, N. Vlassis and J.J. Verbeek, "The global k-means clustering algorithm", Pattern Recognition, 36, 451-461, 2003.

12. PAM (Kaufman and Rousseeuw, 1987)

13. CLARA (Kaufman and Rousseeuw, 1990)

14. CLARANS (A Clustering Algorithm based on Randomized Search) (Ng and Han, 1994)

15. R.T. Ng and J. Han, "CLARANS: A method for clustering objects for spatial data mining", IEEE Transactions on Knowledge and Data Engineering, 14 (5), September/October 2002.