
University of Eastern Finland, School of Computing, P.O. Box 111, FIN-80101 Joensuu, FINLAND. Tel. +358 13 251 7959, fax +358 13 251 7955, www.uef.fi/cs

K-means*: Clustering by Gradual Data Transformation

Mikko Malinen and Pasi Fränti

Speech and Image Processing Unit

School of Computing

University of Eastern Finland


K-means* clustering: gradual transformation of data

Fit the data to a model.

[Figure: the data is moved toward the model; panels show the Data, the Model, and the Intermediate and Final stages]
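The slides show the transformation only as a figure. The sketch below is minimal and rests on an assumption: each intermediate data set is taken to be a linear interpolation between a model-shaped starting set and the real data. The interpolation rule, the point-by-point pairing and the name `transform_step` are hypothetical and do not come from the slides.

```python
# Minimal sketch of gradual data transformation.
# ASSUMPTION: point i of the model-shaped set moves in a straight line
# toward point i of the real data; the slides do not give the formula.

Point = tuple[float, ...]


def transform_step(model: list[Point], data: list[Point],
                   step: int, steps: int) -> list[Point]:
    """Intermediate data set after `step` of `steps` steps.

    step == 0 returns the model-shaped set; step == steps returns the data.
    """
    if len(model) != len(data):
        raise ValueError("model and data must contain the same number of points")
    if steps <= 0 or not 0 <= step <= steps:
        raise ValueError("need steps > 0 and 0 <= step <= steps")
    w = step / steps  # fraction of the transformation that is done
    return [tuple((1.0 - w) * m + w * x for m, x in zip(mp, xp))
            for mp, xp in zip(model, data)]


if __name__ == "__main__":
    model = [(0.0, 0.0), (10.0, 10.0)]  # two perfectly tight "clusters"
    data = [(2.0, 4.0), (6.0, 8.0)]     # toy data, not from the slides
    for s in (0, 5, 10):
        print(f"{10 * s}% done:", transform_step(model, data, s, 10))
```

At 0% the points sit exactly on the model, and at 100% they coincide with the real data. This mirrors the Intermediate and Final panels.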


K-means clustering

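Iterate between two steps:

1. Assignment step: assign each point to the nearest centroid,

$$S_i^{(t)} = \left\{ x_j : \left\| x_j - m_i^{(t)} \right\| \le \left\| x_j - m_{i^*}^{(t)} \right\| \ \forall\, i^* = 1, \dots, k \right\}$$

2. Update step: move each centroid to the mean of its assigned points,

$$m_i^{(t+1)} = \frac{1}{\left| S_i^{(t)} \right|} \sum_{x_j \in S_i^{(t)}} x_j$$

Here $S_i^{(t)}$ is the set of points assigned to cluster $i$ at iteration $t$, $m_i^{(t)}$ is its centroid, and $k$ is the number of clusters.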
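The two steps above can be sketched in pure Python. This is a minimal sketch of standard k-means (Lloyd's iteration), not the authors' code. The helper names and the random initialization are illustrative assumptions.

```python
# Minimal sketch of standard k-means (Lloyd's iteration), pure Python.
import random

Point = tuple[float, ...]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(data: list[Point], centroids: list[Point]) -> list[int]:
    """Assignment step: index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda i: sq_dist(x, centroids[i]))
            for x in data]


def update(data: list[Point], labels: list[int],
           centroids: list[Point]) -> list[Point]:
    """Update step: move each centroid to the mean of its assigned points.

    A cluster that received no points keeps its old centroid here;
    empty clusters need separate handling.
    """
    k, d = len(centroids), len(data[0])
    sums = [[0.0] * d for _ in range(k)]
    counts = [0] * k
    for x, j in zip(data, labels):
        counts[j] += 1
        for t in range(d):
            sums[j][t] += x[t]
    return [tuple(s / counts[i] for s in sums[i]) if counts[i] else centroids[i]
            for i in range(k)]


def random_init(data: list[Point], k: int, seed: int = 0) -> list[Point]:
    """Hypothetical initializer: k data points chosen at random."""
    return random.Random(seed).sample(data, k)


def kmeans(data: list[Point], centroids: list[Point], max_iter: int = 100):
    """Alternate the two steps until no point changes cluster."""
    labels = None
    for _ in range(max_iter):
        new_labels = assign(data, centroids)
        if new_labels == labels:  # converged
            break
        labels = new_labels
        centroids = update(data, labels, centroids)
    return centroids, labels


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    print(kmeans(data, random_init(data, 2)))
```

The loop stops as soon as an assignment step leaves every point in its cluster. At that point the centroids are the means of the final clusters.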


K-means* clustering
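The algorithm slide itself did not survive extraction. The sketch below combines gradual transformation with k-means, under loudly stated assumptions:

- The model-shaped starting set is built by snapping every point onto one of k randomly chosen data points.
- The data then moves toward its real positions by linear interpolation.
- Each k-means run is warm-started from the previous step's centroids.

`kmeans_star` and its parameters are hypothetical names. This is not the authors' implementation.

```python
# Minimal sketch of a K-means*-style loop, pure Python.
# ASSUMPTIONS (not from the slides): the model-shaped starting set snaps
# every point onto one of k randomly chosen data points, and the data moves
# toward its real positions by linear interpolation.
import random

Point = tuple[float, ...]


def sq_dist(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(x: Point, centroids: list[Point]) -> int:
    return min(range(len(centroids)), key=lambda i: sq_dist(x, centroids[i]))


def kmeans(data: list[Point], centroids: list[Point], max_iter: int = 100):
    """Plain k-means started from the given centroids."""
    labels = None
    for _ in range(max_iter):
        new_labels = [nearest(x, centroids) for x in data]
        if new_labels == labels:
            break
        labels = new_labels
        k, d = len(centroids), len(data[0])
        sums, counts = [[0.0] * d for _ in range(k)], [0] * k
        for x, j in zip(data, labels):
            counts[j] += 1
            for t in range(d):
                sums[j][t] += x[t]
        # Empty clusters keep their previous centroid in this sketch.
        centroids = [tuple(s / counts[i] for s in sums[i]) if counts[i]
                     else centroids[i] for i in range(k)]
    return centroids, labels


def kmeans_star(data: list[Point], k: int, steps: int = 20, seed: int = 0):
    if steps < 1:
        raise ValueError("steps must be at least 1")
    anchors = random.Random(seed).sample(data, k)  # hypothetical model positions
    # Model-shaped start: zero-variance clusters, one at each anchor.
    model = [anchors[nearest(x, anchors)] for x in data]
    centroids = list(anchors)  # zero error, hence optimal, for the starting set
    for s in range(1, steps + 1):
        w = s / steps  # fraction done; w == 1.0 reproduces the real data
        current = [tuple((1.0 - w) * m + w * v for m, v in zip(mp, xp))
                   for mp, xp in zip(model, data)]
        # Warm start: centroids from the previous step initialize k-means.
        centroids, labels = kmeans(current, centroids)
    return centroids, labels


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    print(kmeans_star(data, 2, steps=10))
```

The design choice is that k-means starts from a clustering that is already optimal for the starting set. Each later run only has to track small movements of the data.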


Example of clustering (s2 dataset)

[Figures: snapshots of the clustering at 0%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90% and 100% done]


Empty clusters problem
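A cluster becomes empty when no point has its centroid as the nearest one, and the update step then has no mean to compute. The sketch below detects empty clusters and re-seeds each one with the point farthest from its current centroid. That remedy is a common heuristic swapped in for illustration. It is an assumption, not necessarily the "empty clusters removal" step used in K-means*, and `fix_empty_clusters` is a hypothetical name.

```python
# Sketch: detect empty clusters and re-seed them.
# ASSUMPTION: giving an empty cluster the point farthest from its current
# centroid is a common heuristic, not necessarily the K-means* method.

Point = tuple[float, ...]


def sq_dist(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fix_empty_clusters(data: list[Point], labels: list[int],
                       centroids: list[Point]):
    """Return new (centroids, labels) in which no cluster is empty, if possible."""
    centroids, labels = list(centroids), list(labels)
    k = len(centroids)
    for i in range(k):
        counts = [0] * k
        for j in labels:
            counts[j] += 1
        if counts[i] > 0:
            continue  # cluster i is not empty
        # Only take points from clusters that would stay non-empty.
        movable = [p for p in range(len(data)) if counts[labels[p]] > 1]
        if not movable:
            break  # fewer points than clusters: nothing can be moved
        far = max(movable, key=lambda p: sq_dist(data[p], centroids[labels[p]]))
        labels[far] = i
        centroids[i] = data[far]
        # The donor cluster's centroid is refreshed by the next update step.
    return centroids, labels


if __name__ == "__main__":
    data = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]
    # Cluster 1 is empty: its centroid is far from every point.
    print(fix_empty_clusters(data, [0, 0, 0], [(11 / 3, 0.0), (100.0, 100.0)]))
```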


Time complexity

[Table: time complexity of each phase (Initialization, Data set transform, Empty clusters removal, K-means, Algorithm total) for k free, k = O(n), k = O(√n) and k = O(1)]


Time complexity, fixed k-means

[Table: time complexity of each phase (Initialization, Data set transform, Empty clusters removal, K-means, Algorithm total) for fixed k-means, for k free, k = O(n), k = O(√n) and k = O(1)]
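Growth rates like those in the tables follow from the cost of one k-means iteration. Below is a short derivation. It assumes the dimension d is constant and that each k-means run performs a bounded number of iterations I; the slides do not state either.

```latex
\begin{aligned}
T_{\text{assign}} &= O(nkd), \qquad T_{\text{update}} = O(nd) \\
T_{\text{k-means}} &= I \left( T_{\text{assign}} + T_{\text{update}} \right) = O(kn) \quad (d,\ I \text{ constant}) \\
k = O(1) &\;\Rightarrow\; O(n), \qquad
k = O(\sqrt{n}) \;\Rightarrow\; O(n^{1.5}), \qquad
k = O(n) \;\Rightarrow\; O(n^{2})
\end{aligned}
```

Suppose k-means is rerun after each of S transformation steps. The total is then S times this cost, which stays in the same class only when S is bounded by a constant.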


Datasets (d = dimensions, n = number of points, k = number of clusters)

Dataset   d     n       k
s1        2     5000    15
s2        2     5000    15
s3        2     5000    15
s4        2     5000    15
bridge    16    4096    256
missa     16    6480    256
house     3     34000   256
thyroid   5     215     2
iris      4     150     2
wine      13    178     3


Mean square error

Dataset   k-means   proposed   GKM     optimal
s1        1.85      1.01       0.89    0.89
s2        1.94      1.52       1.33    1.33
s3        1.97      1.71       1.69    1.69
s4        1.69      1.63       1.57    1.57
bridge    168.2     164.7      164.1   160.7
missa     5.33      5.15       5.34    5.12
house     9.88      9.48       5.94    5.86
thyroid   6.97      6.92       1.52    1.52
iris      3.70      3.70       2.02    2.02
wine      1.92      1.90       0.88    0.88
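The table compares mean square error (MSE), the average squared distance from each point to the centroid of its cluster. Below is a minimal sketch. The normalization is an assumption: the slides do not say whether the error is also divided by the dimension d, so a hypothetical `per_dimension` flag offers both conventions.

```python
Point = tuple[float, ...]


def mean_square_error(data: list[Point], centroids: list[Point],
                      labels: list[int], per_dimension: bool = False) -> float:
    """Average squared Euclidean distance from each point to its centroid.

    ASSUMPTION: per_dimension=True also divides by the dimension d; which
    normalization the slides use is not stated.
    """
    if not data:
        raise ValueError("data must not be empty")
    total = sum(
        sum((x - c) ** 2 for x, c in zip(p, centroids[j]))
        for p, j in zip(data, labels)
    )
    n, d = len(data), len(data[0])
    return total / (n * d) if per_dimension else total / n


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    centroids = [(0.0, 0.5), (10.0, 10.5)]
    labels = [0, 0, 1, 1]
    print(mean_square_error(data, centroids, labels))        # 0.25
    print(mean_square_error(data, centroids, labels, True))  # 0.125
```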

Mean square error vs. number of steps

[Figures: seven plots of mean square error against the number of steps]

Number of incorrect clusters

Incorrect clusters   proposed   k-means
0 (all correct)      36%        14%
1                    64%        38%
2                    0%         34%
3                    0%         10%
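The slides do not define how incorrect clusters are counted. One plausible way is a centroid-index style count, sketched here as an assumption:

- Map every centroid to its nearest centroid in the other solution.
- Count the centroids that nothing maps to ("orphans").
- Take the larger of the two directional counts.

The function names are hypothetical.

```python
Point = tuple[float, ...]


def sq_dist(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def orphans(a: list[Point], b: list[Point]) -> int:
    """Centroids in b that are not the nearest neighbour of any centroid in a."""
    hit = {min(range(len(b)), key=lambda j: sq_dist(c, b[j])) for c in a}
    return len(b) - len(hit)


def incorrect_clusters(result: list[Point], truth: list[Point]) -> int:
    """Symmetric count: 0 means every cluster is located correctly."""
    return max(orphans(result, truth), orphans(truth, result))


if __name__ == "__main__":
    truth = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
    result = [(0.0, 0.0), (0.5, 0.0), (10.0, 0.0)]  # two centroids share one cluster
    print(incorrect_clusters(result, truth))  # 1
```

In the example, two centroids crowd into one true cluster while the cluster at (0, 10) gets none. The count reports exactly one incorrect cluster.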


Summary

• We have presented a clustering method based on gradual transformation of data and k-means. Instead of fitting the model to data, we fit the data to a model.

• The proposed method gives lower mean square error than k-means on all tested data sets except iris, where the two are equal.