University of Eastern Finland, School of Computing, P.O. Box 111, FIN-80101 Joensuu, FINLAND. Tel. +358 13 251 7959, fax +358 13 251 7955, www.uef.fi/cs
K-means*: Clustering by Gradual Data Transformation
Mikko Malinen and Pasi Fränti
Speech and Image Processing Unit
School of Computing
University of Eastern Finland
K-means* clustering: gradual transformation of data
Fit the data to a model
[Figure: the data and the model, shown at an intermediate and at the final stage]
K-means clustering
Iterate between two steps:
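1. Assignment step: assign each point to its nearest centroid.
\[ S_i^{(t)} = \{\, x_j : \| x_j - m_i^{(t)} \| \le \| x_j - m_{i^*}^{(t)} \| \ \ \forall\, i^* = 1, \dots, k \,\} \]
2. Update step: move each centroid to the mean of the points assigned to it.
\[ m_i^{(t+1)} = \frac{1}{|S_i^{(t)}|} \sum_{x_j \in S_i^{(t)}} x_j \]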
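The two steps can be sketched in plain Python as follows. This is a minimal illustration and not the authors' implementation: the function names, the stopping rule (stop when the assignment no longer changes) and the choice to leave an empty cluster's centroid where it is are all assumptions made for the example.

```python
import math


def assign(points, centroids):
    """Assignment step: index of the nearest centroid for every point."""
    return [
        min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
        for p in points
    ]


def update(points, labels, centroids):
    """Update step: move each centroid to the mean of its assigned points."""
    new_centroids = []
    for i, old in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == i]
        if not members:
            # Assumption: an empty cluster keeps its previous centroid.
            new_centroids.append(old)
            continue
        new_centroids.append(
            tuple(sum(coord) / len(members) for coord in zip(*members))
        )
    return new_centroids


def kmeans(points, centroids, max_iter=100):
    """Alternate the two steps until the assignment stops changing."""
    labels = None
    for _ in range(max_iter):
        new_labels = assign(points, centroids)
        if new_labels == labels:
            break
        labels = new_labels
        centroids = update(points, labels, centroids)
    return centroids, labels


# Toy data: two well-separated pairs of points.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centroids, labels = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
```

On this toy input the loop converges after one update, with each pair of points forming its own cluster.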
K-means* clustering
Example of clustering (s2 dataset)
[Figure sequence: clustering of the s2 dataset with the transformation 0% to 100% done, in steps of 10%]
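One way to read "fit the data to a model" and the percent-done sequence is sketched below. The slides do not spell out the procedure, so everything specific here is an assumption: each point is first placed on one of k model centroids (the round-robin mapping `j % k` is hypothetical), the points are then moved back towards their original positions by linear interpolation, and k-means is rerun after every step starting from the previous centroids. The names `kmeans_star`, `model` and `steps` are illustrative only.

```python
import math


def nearest(p, centroids):
    """Index of the centroid closest to point p."""
    return min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))


def kmeans(points, centroids, max_iter=100):
    """Plain k-means started from the given centroids."""
    centroids = list(centroids)
    labels = None
    for _ in range(max_iter):
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for i in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:  # empty clusters keep their old centroid in this sketch
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels


def kmeans_star(points, model_centroids, steps=10):
    """Start with the data sitting on the model, then move it back gradually."""
    k = len(model_centroids)
    # Hypothetical mapping: point j starts on model centroid j mod k.
    start = [model_centroids[j % k] for j in range(len(points))]
    centroids = list(model_centroids)
    labels = [j % k for j in range(len(points))]
    for step in range(1, steps + 1):
        t = step / steps  # fraction of the transformation done
        moved = [
            tuple(s + t * (x - s) for s, x in zip(s_pt, x_pt))
            for s_pt, x_pt in zip(start, points)
        ]
        # Track the slowly moving data with k-means, reusing the last centroids.
        centroids, labels = kmeans(moved, centroids)
    return centroids, labels


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
model = [(0.0, 0.0), (1.0, 1.0)]
centroids, labels = kmeans_star(points, model, steps=10)
```

With `steps=10` the loop visits the same stages as the figure sequence (10%, 20%, up to 100% done); at the last step the moved points coincide with the original data.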
Empty clusters problem
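A cluster becomes empty when no point is nearest to its centroid, which leaves the update step without a mean to compute. The slide names the problem but the transcript does not preserve the remedy, so the sketch below shows one common repair as an assumption: move every centroid that owns no points onto a randomly chosen data point. The function name and the `rng` parameter are hypothetical.

```python
import random


def repair_empty_clusters(points, labels, centroids, rng=random):
    """Move every centroid that owns no points onto a random data point."""
    used = set(labels)
    repaired = list(centroids)
    for i in range(len(repaired)):
        if i not in used:
            # Assumed remedy: reseed the empty cluster from the data.
            repaired[i] = rng.choice(points)
    return repaired


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0)]
labels = [0, 0, 0]  # cluster 1 received no points
centroids = [(3.3, 3.7), (50.0, 50.0)]
repaired = repair_empty_clusters(points, labels, centroids, random.Random(0))
```

After the repair, the next assignment step gives the reseeded centroid at least the point it was placed on, unless another centroid sits at exactly the same location.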
Time Complexity

Phase                    k free             k = O(n)                   k = O(√n)                           k = O(1)
Initialization           $O(n)$             $O(n)$                     $O(n)$                              $O(n)$
Data set transform       $O(n)$             $O(n)$                     $O(n)$                              $O(n)$
Empty clusters removal   $O(kn)$            $O(n^2)$                   $O(n^{3/2})$                        $O(n)$
K-means                  $O(kn^{kd+1})$     $O(n^{O(n) \cdot d + 2})$  $O(n^{O(\sqrt{n}) \cdot d + 3/2})$  $O(n^{kd+1})$
Algorithm total          $O(kn^{kd+1})$     $O(n^{O(n) \cdot d + 2})$  $O(n^{O(\sqrt{n}) \cdot d + 3/2})$  $O(n^{kd+1})$
Time Complexity: Fixed k-means

Phase                    k free     k = O(n)    k = O(√n)      k = O(1)
Initialization           $O(n)$     $O(n)$      $O(n)$         $O(n)$
Data set transform       $O(n)$     $O(n)$      $O(n)$         $O(n)$
Empty clusters removal   $O(kn)$    $O(n^2)$    $O(n^{3/2})$   $O(n)$
K-means                  $O(kn)$    $O(n^2)$    $O(n^{3/2})$   $O(n)$
Algorithm total          $O(kn)$    $O(n^2)$    $O(n^{3/2})$   $O(n)$
Datasets
Dataset   d    n       k
s1        2    5000    15
s2        2    5000    15
s3        2    5000    15
s4        2    5000    15
bridge    16   4096    256
missa     16   6480    256
house     3    34000   256
thyroid   5    215     2
iris      4    150     2
wine      13   178     3
Mean square error
Dataset k-means proposed GKM optimal
s1 1.85 1.01 0.89 0.89
s2 1.94 1.52 1.33 1.33
s3 1.97 1.71 1.69 1.69
s4 1.69 1.63 1.57 1.57
bridge 168.2 164.7 164.1 160.7
missa 5.33 5.15 5.34 5.12
house 9.88 9.48 5.94 5.86
thyroid 6.97 6.92 1.52 1.52
iris 3.70 3.70 2.02 2.02
wine 1.92 1.90 0.88 0.88
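The transcript does not state how the mean square error in the table is normalized. The sketch below assumes the total squared distance from each point to its assigned centroid, divided by the number of points n times the dimensionality d; dividing by n alone is another common convention, so treat the normalization as an assumption. The function name is illustrative.

```python
def mean_square_error(points, centroids, labels):
    """Total squared point-to-centroid distance divided by n * d (assumed)."""
    n = len(points)
    d = len(points[0])
    total = sum(
        (x - c) ** 2
        for p, lab in zip(points, labels)
        for x, c in zip(p, centroids[lab])
    )
    return total / (n * d)


# Two points at distance 1 on either side of a single centroid.
points = [(0.0, 0.0), (0.0, 2.0)]
centroids = [(0.0, 1.0)]
labels = [0, 0]
mse = mean_square_error(points, centroids, labels)
```

Here the total squared error is 2 and n * d is 4, so `mse` is 0.5.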
Mean square error vs. number of steps
[Seven plots of mean square error against the number of steps]
Number of incorrect clusters
Incorrect clusters   proposed   k-means
0 (all correct)      36%        14%
1                    64%        38%
2                    0%         34%
3                    0%         10%
Summary
• We have presented a clustering method based on gradual transformation of data and k-means. Instead of fitting the model to data, we fit the data to a model.
• The proposed method gives a lower mean square error than k-means on nine of the ten test datasets and an equal one on iris.