New Developments In The Theory Of Clustering
theory.stanford.edu/~sergei/slides/kdd10-thclust.pdf
New Developments In The Theory Of Clustering
“that’s all very well in practice, but does it work in theory?”
Sergei Vassilvitskii (Yahoo! Research)
Suresh Venkatasubramanian (U. Utah)
Sergei V. and Suresh V. Theory of Clustering
Overview
What we will cover: a few of the recent theory results on clustering:
Practical algorithms that have strong theoretical guarantees
Models to explain behavior observed in practice
Overview
What we will not cover (the rest):
Recent strands of clustering theory, such as metaclustering and privacy-preserving clustering
Clustering with distributional data assumptions
Proofs
Outline
I. Euclidean Clustering and the k-means algorithm
   What to do to select initial centers (and what not to do)
   How long k-means takes to run in theory, in practice, and in theoretical practice
   How to run k-means on large datasets
II. Bregman Clustering and k-means
   Bregman clustering as a generalization of k-means
   Performance results
III. Stability
   How to relate closeness in cost function to closeness in clusters
Euclidean Clustering and k-means
Introduction
What does it mean to cluster?
Given n points in R^d, find the best way to split them into k groups.
Introduction
How do we define “best”?
Example: given n points in R^d, split them into k similar groups.
Introduction
How do we define “best”?
Objective: minimize the maximum radius of a cluster.
Introduction
How do we define “best”?
Objective: maximize the average inter-cluster distance.
Introduction
How do we define “best”?
Objective: minimize the variance within each cluster.
Introduction
How do we define “best”? Minimize the variance within each cluster.

Minimizing total variance
For each cluster C_i ∈ C, the center c_i = (1/|C_i|) ∑_{x ∈ C_i} x is the expected location of a point in the cluster. The variance of cluster C_i is then

    ∑_{x ∈ C_i} ‖x − c_i‖²

and the total objective is

    φ = ∑_{C_i ∈ C} ∑_{x ∈ C_i} ‖x − c_i‖²
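The objective φ is straightforward to compute directly; a minimal NumPy sketch (the function name and toy data are illustrative, not from the tutorial):

```python
import numpy as np

def kmeans_cost(X, labels, centers):
    """Total within-cluster variance: phi = sum_i sum_{x in C_i} ||x - c_i||^2."""
    diffs = X - centers[labels]          # each point minus its assigned center
    return float(np.sum(diffs ** 2))

# Tiny example: two obvious clusters on a line, centers at the cluster means.
X = np.array([[0.0], [1.0], [10.0], [11.0]])
labels = np.array([0, 0, 1, 1])
centers = np.array([[0.5], [10.5]])
print(kmeans_cost(X, labels, centers))   # 4 points, each 0.5 from its mean -> 1.0
```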
Approximations
Minimizing variance: given X and k, find a clustering C = C_1, C_2, …, C_k that minimizes

    φ(X, C) = ∑_{C_i ∈ C} ∑_{x ∈ C_i} ‖x − c_i‖²

Definition. Let φ* denote the value of the optimum solution above. We say that a clustering C′ is α-approximate if

    φ* ≤ φ(X, C′) ≤ α · φ*
Approximations
Minimizing variance: given X and k, find a clustering C = C_1, C_2, …, C_k that minimizes φ(X, C) = ∑_{C_i ∈ C} ∑_{x ∈ C_i} ‖x − c_i‖².

Solving this problem
This problem is NP-complete, even when the pointset X lies in two dimensions...
...but we’ve been solving it for over 50 years! [S56][L57][M67]
k-means
Lloyd’s Method:
1. Given a set of data points, select initial centers at random.
2. Assign each point to its nearest center.
3. Recompute the optimum centers (the means), given the fixed clustering.
4. Repeat steps 2–3...
5. ...until the clustering does not change.
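The loop above can be sketched in a few lines of NumPy (a minimal illustrative version, not the tutorial’s code; names and the toy data are assumptions):

```python
import numpy as np

def lloyd(X, k, iters=100, seed=0):
    """Lloyd's method: random init, then alternate assignment / mean steps."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # random initial centers
    for _ in range(iters):
        # Step 2: assign each point to its nearest center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Step 3: recompute each center as the mean of its cluster.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):   # clustering stopped changing
            break
        centers = new
    return centers, labels

# Two well-separated groups of three points each.
X = np.array([[0., 0.], [0., 1.], [1., 0.],
              [10., 10.], [10., 11.], [11., 10.]])
centers, labels = lloyd(X, k=2)
```

On this easy instance every choice of two distinct initial points converges to the natural two-group split; the next slides show that in general the fixed point reached can be far from optimal.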
Performance
This algorithm terminates! Recall the total error:

    φ(X, C) = ∑_{C_i ∈ C} ∑_{x ∈ C_i} ‖x − c_i‖²

In every iteration φ is reduced:
Assigning each point to the nearest center reduces φ.
Given a fixed cluster, the mean is the optimal location for the center (requires proof).
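The second bullet follows from a standard identity (a sketch of the omitted proof, with μ_i denoting the cluster mean):

```latex
\sum_{x \in C_i} \|x - c\|^2
  \;=\; \sum_{x \in C_i} \|x - \mu_i\|^2 \;+\; |C_i|\,\|\mu_i - c\|^2,
\qquad \mu_i = \frac{1}{|C_i|} \sum_{x \in C_i} x,
```

where the cross term vanishes because ∑_{x ∈ C_i} (x − μ_i) = 0. The second term is nonnegative and zero exactly at c = μ_i, so the mean minimizes the cluster’s contribution to φ.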
Performance
k-means Accuracy: how good is this algorithm?
It finds a local optimum...
...that is potentially arbitrarily worse than the optimal solution.
Performance
But does this really happen? YES!
Even with many random restarts!
Performance
Finding a good set of initial points is a black art.
Try many times with different random seeds: the most common method, but it has limited benefit even in the case of Gaussians.
Find a different way to initialize the centers: hundreds of heuristics, including pre- and post-processing ideas.
There exists a fast and simple initialization scheme with provable performance guarantees.
Random Initializations on Gaussians
k-means on Gaussians: some Gaussians are combined.
Seeding on Gaussians
But the Gaussian case has an easy fix: use a furthest-point heuristic.

Simple Fix
Select centers using a furthest-point algorithm (a 2-approximation to k-Center clustering).
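The furthest-point rule can be sketched as follows (a minimal illustrative implementation of the greedy k-center heuristic; function names and data are assumptions):

```python
import numpy as np

def furthest_point_centers(X, k, first=0):
    """Greedy k-center seeding: repeatedly take the point furthest
    from the centers chosen so far (a 2-approximation to k-Center)."""
    idx = [first]                          # an arbitrary first center
    d = ((X - X[first]) ** 2).sum(axis=1)  # squared distance to nearest chosen center
    for _ in range(k - 1):
        nxt = int(d.argmax())              # furthest point becomes the next center
        idx.append(nxt)
        d = np.minimum(d, ((X - X[nxt]) ** 2).sum(axis=1))
    return X[idx]

# Two tight pairs plus one far outlier.
X = np.array([[0., 0.], [0.1, 0.], [10., 0.], [10.1, 0.], [100., 0.]])
C = furthest_point_centers(X, k=3)
```

Note that the outlier at (100, 0) is immediately grabbed as a center, which is exactly the sensitivity the next slides illustrate.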
Seeding on Gaussians
But this fix is overly sensitive to outliers.
k-means++
What if we interpolate between the two methods?
Let D(x) be the distance between a point x and its nearest cluster center. Choose the next point proportionally to D^α(x).

α = 0 → random initialization
α = ∞ → furthest-point heuristic
α = 2 → k-means++

More generally
Set the probability of selecting a point proportional to its contribution to the overall error.
If minimizing ∑_{C_i ∈ C} ∑_{x ∈ C_i} ‖x − c_i‖, sample according to D.
If minimizing ∑_{C_i ∈ C} ∑_{x ∈ C_i} ‖x − c_i‖^∞, sample according to D^∞ (i.e., take the furthest point).
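The α = 2 case (D²-sampling) can be sketched as follows (a minimal illustrative version of k-means++ seeding; names and data are assumptions, not the authors’ code):

```python
import numpy as np

def kmeanspp_seed(X, k, rng=None):
    """k-means++ seeding: first center uniform at random, then each next
    center sampled with probability proportional to D(x)^2."""
    rng = np.random.default_rng(rng)
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        # D(x)^2: squared distance from each point to its nearest chosen center.
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        probs = d2 / d2.sum()              # Pr[pick x] proportional to D(x)^2
        centers.append(X[rng.choice(len(X), p=probs)])
    return np.array(centers)

X = np.array([[0., 0.], [0., 1.], [5., 5.], [5., 6.]])
C = kmeanspp_seed(X, k=2, rng=0)
```

Already-chosen points have D(x)² = 0 and therefore probability zero, so the seeds are always distinct; far-away points are likely but not certain to be picked, which is what tames the outlier sensitivity of the pure furthest-point rule.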
Example of k-means++

If the data set looks Gaussian… [figure omitted]
Example of k-means++

If the outlier should be its own cluster… [figure omitted]
Analyzing k-means++
What can we say about the performance of k-means++?

Theorem (AV07). This algorithm always attains an O(log k) approximation in expectation.

Theorem (ORSS06). A slightly modified version of this algorithm attains an O(1) approximation if the data is ‘nicely clusterable’ with k clusters.
Nice Clusterings
What do we mean by ‘nicely clusterable’?
Intuitively, X is nicely clusterable if going from k − 1 to k clusters drops the total error by a constant factor.
Definition. A point set X is (k, ε)-separated if φ*_k(X) ≤ ε² φ*_{k−1}(X).
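One way to see the definition in action: for 1-D data the optimal k-means clusters are contiguous in sorted order, so φ*_k can be computed exactly by brute force on tiny inputs. This is an illustrative sketch (the function names are mine):

```python
import itertools

def sse(xs):
    """k-means cost of one cluster: sum of squared deviations from the mean."""
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs)

def opt_cost_1d(xs, k):
    """Optimal k-means cost phi*_k for 1-D data: optimal clusters are
    contiguous in sorted order, so enumerate all ways to cut the list."""
    xs = sorted(xs)
    n = len(xs)
    best = float("inf")
    for cuts in itertools.combinations(range(1, n), k - 1):
        bounds = (0,) + cuts + (n,)
        best = min(best, sum(sse(xs[a:b]) for a, b in zip(bounds, bounds[1:])))
    return best

# Two tight, well-separated groups: going from 1 to 2 clusters collapses
# the error, so this data is (2, eps)-separated for a small eps.
data = [0.0, 0.1, 0.2, 10.0, 10.1, 10.2]
phi2, phi1 = opt_cost_1d(data, 2), opt_cost_1d(data, 1)
```

Here φ*_2 ≈ 0.04 while φ*_1 ≈ 150, so the data is (2, ε)-separated already for ε = 0.1.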
Why does this work?
Intuition: look at the optimum clustering. In expectation:

1. If the algorithm selects a point from a new OPT cluster, that cluster is covered pretty well.

2. If the algorithm picks two points from the same OPT cluster, then the other clusters must contribute little to the overall error.

As long as the points are reasonably well separated, the first condition holds.

Two theorems: assume the points are (k, ε)-separated and get an O(1) approximation.
Make no assumptions about separability and get an O(log k)approximation.
Summary
k-means++ summary: to select the next center, sample a point in proportion to its current contribution to the error.

Works for k-means, k-median, and other objective functions.

Universal O(log k) approximation; O(1) approximation under some assumptions.

Can be implemented to run in O(nkd) time (the same as a single k-means step).
But does it actually work?
Large Evaluation
Typical Run

KM++ vs. KM vs. KM-Hybrid [figure: error vs. stage (0–500) for LLOYD, HYBRID, and KM++]
Other Runs

KM++ vs. KM vs. KM-Hybrid [figure: error vs. stage (0–500) for LLOYD, HYBRID, and KM++]
Convergence

How fast does k-means converge? It appears the algorithm converges in under 100 iterations (even faster with smart initialization).

Theorem (V09). There exists a point set X in R² and a set of initial centers C such that k-means takes 2^Ω(k) iterations to converge when initialized with C.
Theory vs. Practice
Finding the disconnect. In theory:
k-means might run in exponential time
In practice:
k-means converges after a handful of iterations
It works in practice but it does not work in theory!
Finding the disconnect
Robustness of worst-case examples: perhaps the worst-case examples are too precise, and can never arise from natural data.

Quantifying the robustness: if we slightly perturb the points of the example:
The optimum solution shouldn’t change too much
Will the running time stay exponential?
Small Perturbations

k-means convergence: huge gap between worst-case and observed results.

Check how fragile the worst case is: add a little bit of noise to the data before running the algorithm.

The optimum solution barely changes.
Smoothed Analysis
Perturbation
To each point x ∈ X add independent noise drawn from N(0, σ²).

Definition. The smoothed complexity of an algorithm is the maximum expected running time after adding the noise:

max_X E_σ[Time(X + σ)]
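A minimal sketch of the perturbation step, assuming points are tuples of floats (the helper name is mine):

```python
import random

def perturb(points, sigma, rng=random):
    """Add independent N(0, sigma^2) noise to every coordinate of every
    point -- the smoothed-analysis perturbation applied before clustering."""
    return [tuple(x + rng.gauss(0.0, sigma) for x in p) for p in points]
```

Running k-means on `perturb(points, sigma)` instead of `points` is exactly the experiment smoothed analysis reasons about.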
Smoothed Analysis
Theorem (AMR09). The smoothed complexity of k-means is bounded by

O( n^34 · k^34 · d^8 · D^6 · log^4 n / σ^6 )

Notes

While the bound is large, it is not exponential (2^k ≫ k^34 for large enough k).

The (D/σ)^6 factor shows the bound is scale invariant.
Smoothed Analysis
Comparing bounds: the smoothed complexity of k-means is polynomial in n, k, and D/σ (where D is the diameter of X), whereas the worst-case complexity of k-means is exponential in k.
Implications: the pathological examples:
Are very brittle
Can be avoided with a little bit of random noise
k-means Summary
Running time
Exponential worst-case running time.
Polynomial typical-case running time.

Solution quality
Arbitrary local optimum, even with many random restarts.
Simple initialization leads to a good solution.
Large Datasets
Implementing k-means++
Initialization:
Takes O(nd) time and one pass over the data to select the next center.
Takes O(nkd) time total
Overall running time:
Each round of k-means takes O(nkd) running time
Typically finishes after a constant number of rounds.

Large data: what if O(nkd) is too much? Can we parallelize this algorithm?
Parallelizing k-means
Approach. Partition the data:
Split X into X1, X2, . . . , Xm of roughly equal size.
In parallel compute a clustering on each partition:
Find C^j = {C^j_1, …, C^j_k}: a good clustering of each partition, and denote by w^j_i the number of points in cluster C^j_i.

Cluster the clusters:

Let Y = ∪_{1≤j≤m} C^j. Find a clustering of Y, weighted by the weights W = {w^j_i}.
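The three steps above can be sketched as a small two-phase routine. Here `lloyd` is a plain stand-in for whichever clustering subroutine each machine actually runs, and all names are illustrative, not the authors' implementation:

```python
import math
import random

def nearest(p, centers):
    """Index of the center closest to point p."""
    return min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))

def lloyd(weighted_points, k, iters=10, rng=random):
    """Plain Lloyd's k-means on (point, weight) pairs; a stand-in for any
    good clustering algorithm used inside the two-phase scheme."""
    centers = [p for p, _ in rng.sample(weighted_points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p, w in weighted_points:
            groups[nearest(p, centers)].append((p, w))
        for i, g in enumerate(groups):
            if g:  # weighted mean of the group's points
                tw = sum(w for _, w in g)
                centers[i] = tuple(sum(p[j] * w for p, w in g) / tw
                                   for j in range(len(centers[i])))
    return centers

def two_phase(points, k, m, rng=random):
    """Split into m parts, cluster each part, then cluster the
    weighted cluster centers."""
    parts = [points[i::m] for i in range(m)]       # partition the data
    summary = []
    for part in parts:                             # (in parallel in practice)
        weighted = [(p, 1) for p in part]
        centers = lloyd(weighted, k, rng=rng)
        counts = [0] * k
        for p, _ in weighted:
            counts[nearest(p, centers)] += 1
        summary += [(c, w) for c, w in zip(centers, counts) if w > 0]
    return lloyd(summary, k, rng=rng)              # cluster the clusters
```

Each part's output is a set of at most k centers with point counts as weights, so the second phase works on only O(km) weighted points.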
Parallelization Example

[figures: given X; partition the dataset; cluster each partition separately; cluster the clusters; final clustering]
Analysis
Quality of the solution: what happens when we approximate the approximation?

Suppose the algorithm in phase 1 gave a β-approximate solution to its input.

The algorithm in phase 2 gave a γ-approximate solution to its (smaller) input.

Theorem (GNMO00, AJM09). The two-phase algorithm gives a 4γ(1 + β) + 2β approximate solution.
Analysis
Running time: suppose we partition the input across m different machines.

First phase running time: O(nkd/m).

Second phase running time: O(mk²d).
Improving the algorithm
Approximation Guarantees
Using k-means++ sets β = γ = O(log k) and leads to an O(log² k) approximation.
Improving the approximation: we must improve the approximation guarantee of the first round, but can use a larger k to ensure every cluster is well summarized.

Theorem (ADK09). Running the k-means++ initialization for O(k) rounds leads to an O(1) approximation to the optimal solution (but uses more centers than OPT).
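Spelling out the composition behind the O(log² k) claim (a step not on the slides, assuming β = γ = c log k for some constant c):

```latex
4\gamma(1+\beta) + 2\beta
  \;=\; 4c\log k\,(1 + c\log k) + 2c\log k
  \;=\; O(\log^2 k).
```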
Sergei V. and Suresh V. Theory of Clustering
![Page 112: New Developments In The Theory Of Clusteringtheory.stanford.edu/~sergei/slides/kdd10-thclust.pdf · New Developments In The Theory Of Clustering ... Suresh Venkatasubramanian (U.](https://reader031.fdocuments.net/reader031/viewer/2022030701/5aec608e7f8b9a90318e3701/html5/thumbnails/112.jpg)
Improving the algorithm
Approximation Guarantees
Using k-means++ sets β = γ= O(log k) and leads to a O(log2 k)approximation.
Improving the ApproximationMust improve the approximation guarantee of the first round, butcan use a larger k to ensure every cluster is well summarized.
Theorem (ADK09)Running k-means++ initialization for O(k) rounds leads to a O(1)approximation to the optimal solution (but uses more centers thanOPT).
Sergei V. and Suresh V. Theory of Clustering
![Page 113: New Developments In The Theory Of Clusteringtheory.stanford.edu/~sergei/slides/kdd10-thclust.pdf · New Developments In The Theory Of Clustering ... Suresh Venkatasubramanian (U.](https://reader031.fdocuments.net/reader031/viewer/2022030701/5aec608e7f8b9a90318e3701/html5/thumbnails/113.jpg)
Improving the algorithm
Approximation Guarantees
Using k-means++ sets β = γ= O(log k) and leads to a O(log2 k)approximation.
Improving the ApproximationMust improve the approximation guarantee of the first round, butcan use a larger k to ensure every cluster is well summarized.
Theorem (ADK09)Running k-means++ initialization for O(k) rounds leads to a O(1)approximation to the optimal solution (but uses more centers thanOPT).
Sergei V. and Suresh V. Theory of Clustering
Two round k-means++

Final Algorithm
Partition the data:
  Split X into X_1, X_2, ..., X_m of roughly equal size.
Compute a clustering using ℓ = O(k) centers on each partition:
  Find C^j = {C^j_1, ..., C^j_ℓ} using k-means++ on each partition, and denote by w^j_i the number of points in cluster C^j_i.
Cluster the clusters:
  Let Y = ∪_{1≤j≤m} C^j, a set of O(ℓm) points. Use k-means++ to cluster Y, weighted by the weights W = {w^j_i}.

Theorem
The algorithm achieves an O(1) approximation in time O(nkd/m + mk²d).
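The two-round scheme above can be sketched in code. This is a minimal illustration, not the authors' implementation: the helper names, the striding split, and the choice ℓ = 2k are assumptions.

```python
import random

def d2(p, q):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans_pp(points, weights, k):
    """Weighted k-means++ seeding: D^2 sampling over weighted points."""
    centers = [random.choices(points, weights=weights)[0]]
    while len(centers) < k:
        w2 = [w * min(d2(p, c) for c in centers)
              for p, w in zip(points, weights)]
        centers.append(random.choices(points, weights=w2)[0]
                       if sum(w2) > 0 else random.choice(points))
    return centers

def two_round_kmeans_pp(X, k, m):
    """Round 1: summarize each of m partitions by O(k) weighted centers.
    Round 2: run k-means++ on the weighted summary Y."""
    X = list(X)
    random.shuffle(X)
    parts = [X[j::m] for j in range(m)]            # roughly equal split
    Y, W = [], []
    for part in parts:
        ell = min(2 * k, len(part))                # l = O(k) centers per part
        centers = kmeans_pp(part, [1] * len(part), ell)
        counts = [0] * ell
        for p in part:                             # w^j_i = cluster size
            counts[min(range(ell), key=lambda i: d2(p, centers[i]))] += 1
        Y.extend(centers)
        W.extend(counts)
    return kmeans_pp(Y, W, k)                      # cluster the clusters
```

Carrying the cluster sizes as weights into the second round is what lets the small summary Y stand in for the full dataset.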
Summary

Before...
k-means used to be a prime example of the disconnect between theory and practice: it works well, but has a horrible worst-case analysis.

...and after
Smoothed analysis explains the running time, and rigorously analyzed initialization routines help improve clustering quality.
Outline

I Euclidean Clustering and the k-means algorithm
  What to do to select initial centers (and what not to do)
  How long k-means takes to run in theory, practice, and theoretical practice
  How to run k-means on large datasets

II Bregman Clustering and k-means
  Bregman Clustering as a generalization of k-means
  Performance Results

III Stability
  How to relate closeness in the cost function to closeness in the clusters.
Clustering With Non-Euclidean Metrics
Application I: Clustering Documents

Kullback-Leibler distance:

  D(p, q) = ∑_i p_i log(p_i / q_i)
Application II: Image Analysis

Kullback-Leibler distance:

  D(p, q) = ∑_i p_i log(p_i / q_i)
Application III: Speech Analysis

Itakura-Saito distance:

  D(p, q) = ∑_i (p_i / q_i − log(p_i / q_i) − 1)
Bregman Divergences

Definition
Let φ : ℝ^d → ℝ be a strictly convex function. The Bregman divergence Dφ is defined as

  Dφ(x ‖ y) = φ(x) − φ(y) − ⟨∇φ(y), x − y⟩

Examples:
  Kullback-Leibler: φ(x) = ∑_i (x_i ln x_i − x_i),  Dφ(x ‖ y) = ∑_i x_i ln(x_i / y_i)
  Itakura-Saito: φ(x) = −∑_i ln x_i,  Dφ(x ‖ y) = ∑_i (x_i / y_i − log(x_i / y_i) − 1)
  ℓ²₂: φ(x) = ½‖x‖²,  Dφ(x ‖ y) = ½‖x − y‖²
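The definition can be checked numerically. A small sketch (the function names are ours) that evaluates Dφ from a hand-supplied φ and ∇φ, and recovers the closed forms above:

```python
import math

def bregman(phi, grad_phi, x, y):
    """D_phi(x || y) = phi(x) - phi(y) - <grad phi(y), x - y>."""
    return phi(x) - phi(y) - sum(g * (a - b)
                                 for g, a, b in zip(grad_phi(y), x, y))

# l2^2: phi(x) = 1/2 ||x||^2  ->  D_phi(x || y) = 1/2 ||x - y||^2
phi_l2  = lambda x: 0.5 * sum(a * a for a in x)
grad_l2 = lambda x: list(x)

# Kullback-Leibler: phi(x) = sum_i (x_i ln x_i - x_i)
phi_kl  = lambda x: sum(a * math.log(a) - a for a in x)
grad_kl = lambda x: [math.log(a) for a in x]
```

For probability vectors x, y the KL case reduces to ∑ x_i ln(x_i / y_i), matching the slide.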
Overview
k-means clustering ≡ Bregman clustering
The algorithm works the same way.
Same (bad) worst-case behavior
Same (good) smoothed behavior
Same (good) quality guarantees, with correct initialization
Properties

  Dφ(x ‖ y) = φ(x) − φ(y) − ⟨∇φ(y), x − y⟩

(figure: two points p and q with the two divergences D(p ‖ q) and D(q ‖ p))

Asymmetry: In general, Dφ(p ‖ q) ≠ Dφ(q ‖ p).
No triangle inequality: Dφ(p ‖ q) + Dφ(q ‖ r) can be less than Dφ(p ‖ r)!

How can we now do clustering?
Breaking down k-means

  Initialize cluster centers
  while not converged do
    Assign points to the nearest cluster center
    Find new cluster centers by averaging the points assigned together
  end while

Key Point
Setting the cluster center as the centroid minimizes the average squared distance to the center.
Bregman Centroids

Problem
Given points x_1, ..., x_n ∈ ℝ^d, find c such that

  ∑_i Dφ(x_i ‖ c)

is minimized.

Answer

  c = (1/n) ∑_i x_i

Independent of φ [BMDG05]!
Bregman k-means

  Initialize cluster centers
  while not converged do
    Assign points to the nearest cluster center (by measuring Dφ(x ‖ c))
    Find new cluster centers by averaging the points assigned together
  end while

Key Point
Setting the cluster center as the centroid minimizes the average Bregman divergence to the center.
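The loop above in code. A minimal sketch (ours, not the [BMDG05] implementation): `div` is any Bregman divergence, and the update step is the plain average no matter which divergence is used.

```python
def bregman_kmeans(X, centers, div, iters=100):
    """Lloyd-style loop: assign by the divergence, update by the centroid."""
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for x in X:                        # assignment step
            j = min(range(len(centers)), key=lambda i: div(x, centers[i]))
            clusters[j].append(x)
        # update step: the centroid (plain average) of each cluster
        new = [tuple(sum(col) / len(c) for col in zip(*c)) if c else centers[i]
               for i, c in enumerate(clusters)]
        if new == centers:                 # converged
            break
        centers = new
    cost = sum(min(div(x, c) for c in centers) for x in X)
    return centers, cost
```

With `div` set to the squared Euclidean distance this is exactly the classical k-means loop; swapping in KL or Itakura-Saito changes only the assignment step.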
Convergence

Lemma ([BMDG05])
The (Bregman) k-means algorithm converges in cost.

Euclidean distance: the quantity

  ∑_C ∑_{x∈C} ‖x − center(C)‖²

decreases with each iteration of k-means.

Bregman divergence: the Bregman Information

  ∑_C ∑_{x∈C} Dφ(x ‖ center(C))

decreases with each iteration of the Bregman k-means algorithm.
EM and Soft Clustering

Expectation maximization:
  Initialize density parameters and means for k distributions
  while not converged do
    For distribution i and point x, compute the conditional probability p(i|x) that x was drawn from i (by Bayes' rule)
    For each distribution i, recompute new density parameters and means (via maximum likelihood)
  end while

This yields a soft clustering of points to “clusters”.

Originally used for mixtures of Gaussians.
Exponential Families And Bregman Divergences

Definition (Exponential Family)
A parametric family of distributions p_{Ψ,θ} is an exponential family if each density is of the form

  p_{Ψ,θ}(x) = exp(⟨x, θ⟩ − Ψ(θ)) p₀(x)

with Ψ convex.

Let φ(t) = Ψ*(t) be the Legendre-Fenchel dual of Ψ(x):

  φ(t) = sup_x ⟨x, t⟩ − Ψ(x)

Theorem ([BMDG05])

  p_{Ψ,θ}(x) = exp(−Dφ(x ‖ μ)) b_φ(x)

where μ = ∇Ψ(θ) is the expectation parameter.
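As a concrete check of the theorem, take the one-dimensional unit-variance Gaussian (our choice of example): Ψ(θ) = θ²/2, its dual is φ(t) = t²/2, the expectation parameter is μ = ∇Ψ(θ) = θ, and Dφ(x ‖ μ) = (x − μ)²/2, so the density factors as exp(−Dφ(x ‖ μ)) · b_φ(x) with b_φ(x) = 1/√(2π):

```python
import math

def gaussian_pdf(x, mu):
    """Unit-variance Gaussian density N(mu, 1)."""
    return math.exp(-(x - mu) ** 2 / 2) / math.sqrt(2 * math.pi)

def d_phi(x, mu):
    """Bregman divergence of phi(t) = t^2 / 2; equals (x - mu)^2 / 2."""
    return x * x / 2 - mu * mu / 2 - mu * (x - mu)

b_phi = 1 / math.sqrt(2 * math.pi)   # the x-only factor b_phi(x)
```

The same factorization holds coordinate-wise for any exponential family, with φ the dual of the log-partition function Ψ.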
EM: Euclidean and Bregman

Expectation maximization:
  Initialize density parameters and means for k distributions
  while not converged do
    For distribution i and point x, compute the conditional probability p(i|x) that x was drawn from i (by Bayes' rule)
    For each distribution i, recompute new density parameters and means (via maximum likelihood)
  end while

Choosing the corresponding Bregman divergence Dφ(· ‖ ·), φ = Ψ*, gives mixture density estimation for any exponential family p_{Ψ,θ}.
Performance Analysis
Performance Analysis

Two questions:

Problem (Rate of convergence)
Given an arbitrary set of n points in d dimensions, how long does it take for (Bregman) k-means to converge?

Problem (Quality of Solution)
Let OPT denote the optimal clustering that minimizes the average sum of (Bregman) distances to cluster centers. How close to OPT is the solution returned by (Bregman) k-means?
Convergence of k-means

Parameters: n, k, d.

Good news
k-means always converges, in at most O(n^{kd}) time.

Bad news
k-means can take time 2^{Ω(k)} to converge:
  Even if d = 2, i.e. in the plane
  Even if centers are chosen from the initial data
Convergence of Bregman k-means

Euclidean distance:
k-means can take time 2^{Ω(k)} to converge:
  Even if d = 2, i.e. in the plane
  Even if centers are chosen from the initial data

Bregman divergence:
For some Bregman divergences, k-means can take time 2^{Ω(k)} to converge [MR09]:
  Even if d = 2, i.e. in the plane
  Even if centers are chosen from the initial data
Proof Idea

“Well behaved” Bregman divergences look “locally Euclidean”:

(figure: the Euclidean ball {x | ‖x − c‖² ≤ 1} around c next to the Bregman ball {x | Dφ(x ‖ c) ≤ 1} around c)

Take a bad Euclidean instance and shrink it to make it local.
Smoothed Analysis

Real inputs aren’t worst-case!

k-means Convergence
Huge gap between worst-case and observed results.

Check how fragile the worst case is:
  Add a little bit of noise to the data before running the algorithm
  The optimum solution barely changes
  Analyze the expected run-time over the perturbations
k-means: Worst-case vs Smoothed

Theorem
The smoothed complexity of k-means using Gaussian noise with variance σ is polynomial in n and 1/σ.

Compare this to the worst-case lower bound of 2^{Θ(n)}.
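The perturbation model itself is tiny: before running k-means, each input point x is replaced by x + g with g ~ N(0, σ²I). A sketch (σ and the helper name are ours):

```python
import random

def perturb(X, sigma):
    """Smoothed-analysis perturbation: add independent Gaussian noise
    N(0, sigma^2) to every coordinate of every point."""
    return [tuple(a + random.gauss(0.0, sigma) for a in x) for x in X]
```

The smoothed bound is then a statement about the expected number of Lloyd iterations over this randomness, for the worst possible starting input.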
Bregman Smoothing

Normal (Gaussian) smoothing doesn’t work! Many Bregman divergences are only defined on a restricted domain, such as the simplex

  Δ_n = {(x_1, ..., x_n) | ∑ x_i = 1}

and Gaussian perturbations push points off it.
Bregman smoothing

A more general notion of smoothing:
  The perturbation should stay close to a hyperplane
  The density of the perturbation is proportional to 1/σ^d
Bregman smoothing: Results

Theorem ([MR09])
For “well-behaved” Bregman divergences, the smoothed complexity is bounded by poly(n^{√k}, 1/σ) and by k^{kd} · poly(n, 1/σ).

This is in comparison to the worst-case bound of 2^{Ω(n)}.
Performance Analysis

Two questions:

Problem (Rate of convergence)
Given an arbitrary set of n points in d dimensions, how long does it take for (Bregman) k-means to converge?

Problem (Quality of Solution)
Let OPT denote the optimal clustering that minimizes the average sum of (Bregman) distances to cluster centers. How close to OPT is the solution returned by (Bregman) k-means?
Optimality and Approximations

Problem
Given x_1, ..., x_n and parameter k, find k centers c_1, ..., c_k such that

  ∑_{i=1}^n min_{1≤j≤k} d(x_i, c_j)

is minimized.

Problem (c-approximation)
Let OPT be the optimal solution above. Fix c > 0. Find centers c′_1, ..., c′_k such that if A = ∑_{i=1}^n min_{1≤j≤k} d(x_i, c′_j), then

  OPT ≤ A ≤ c · OPT
k-means++: Initialize carefully!

Initialization
  Let the distance from x to the nearest cluster center be D(x)
  Pick x as a new center with probability

    p(x) ∝ D²(x)

Properties of the solution:
  For arbitrary data, this gives an O(log k)-approximation
  For “well-separated” data, this gives a constant (O(1)) approximation
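The initialization above is a few lines of code. A sketch (names ours): passing the squared Euclidean distance as `dissim` gives exactly k-means++ D² sampling, and passing a Bregman divergence instead gives the Bregman variant.

```python
import random

def seed_centers(points, k, dissim):
    """Sample k initial centers; each new center is drawn with probability
    proportional to its dissimilarity to the nearest already-chosen center.
    With dissim = squared Euclidean distance this is k-means++ (p ∝ D²)."""
    centers = [random.choice(points)]
    while len(centers) < k:
        w = [min(dissim(p, c) for c in centers) for p in points]
        centers.append(random.choices(points, weights=w)[0]
                       if sum(w) > 0 else random.choice(points))
    return centers
```

Points far from every chosen center get high weight, so the seeds spread out across well-separated clusters with high probability.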
What is ‘well-separated’?

Informally, data is (k, α)-well separated if the best clustering that uses k − 1 clusters has cost ≥ 1/α · OPT.
Bregman k-means++

Initialization
  Let the Bregman divergence from x to the nearest cluster center be D(x)
  Pick x as a new center with probability

    p(x) ∝ D(x)

Run the algorithm as before.

Theorem ([AB09, AB10])
  O(1)-approximation for (k, α)-separated sets.
  O(log n) approximation in general.
Stability in clustering
Target and Optimal clustering

(figure: the target clustering C∗, the optimal clustering OPT, and a computed clustering C, with the distances d(OPT, C∗) and dq(OPT, C))

Two measures of cost:

Distance between clusterings C, C∗:

  d(C, C∗) = fraction of points on which they disagree

(Quality) distance from C to OPT:

  dq(C, OPT) = cost(C) / cost(OPT)

Can closeness in dq imply closeness in d?
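The disagreement distance d(C, C∗) must be taken over the best matching of cluster labels, since the labels themselves are arbitrary. A brute-force sketch over label permutations, fine for small k (the label-vector representation is our choice):

```python
from itertools import permutations

def disagreement(labels_a, labels_b, k):
    """d(C, C*): fraction of points on which two k-clusterings disagree,
    minimized over all relabelings of the clusters."""
    n = len(labels_a)
    best = n
    for perm in permutations(range(k)):      # try every label matching
        mismatches = sum(perm[a] != b for a, b in zip(labels_a, labels_b))
        best = min(best, mismatches)
    return best / n
```

For larger k, the minimum over matchings can be computed in polynomial time with a maximum-weight bipartite matching instead of brute force.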
NP-hardness

NP-hardness is an obstacle to finding good clusterings:
  k-means and k-median are NP-hard, and hard to approximate in general graphs
  k-means and k-median can be approximated in ℝ^d, but seem to need time exponential in d
  The same is true for Bregman clustering [CM08]
Target And Optimal Clusterings

What happens if the target clustering and the optimal clustering are not the same?

(figure: the target C∗, the optimum OPT, and a computed clustering C)

Measuring dq, or measuring d:
The two distance functions might be incompatible.
![Page 167: New Developments In The Theory Of Clusteringtheory.stanford.edu/~sergei/slides/kdd10-thclust.pdf · New Developments In The Theory Of Clustering ... Suresh Venkatasubramanian (U.](https://reader031.fdocuments.net/reader031/viewer/2022030701/5aec608e7f8b9a90318e3701/html5/thumbnails/167.jpg)
Stability Of Clusterings
An instance is stable if approximating the cost function gives us a solution close to the target clustering.
View 1: If we perturb the inputs, the output should not change.
View 2: If we change the distance function, the output should not change.
View 3: If we change the cost (quality) of the solution, the output should not change.
Stability I: Perturbing Inputs
Well-separated sets:
Data is (k, α)-well separated if the best clustering that uses k − 1 clusters has cost ≥ (1/α) · OPT.
Two interesting properties [ORSS06]:
All optimal clusterings mostly look the same: dq small ⇒ d small.
Small perturbations of the data don't change this property.
Computationally, well-separatedness makes k-means work well.
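A brute-force sanity check of this definition on a toy 1-D input (illustrative only; `opt_cost` and `well_separated` are our names, and the enumeration over all k^n assignments is exponential, so tiny inputs only):

```python
from itertools import product

def kmeans_cost(points, assignment, k):
    """Sum of squared distances from each 1-D point to its cluster's mean."""
    cost = 0.0
    for j in range(k):
        cluster = [p for p, a in zip(points, assignment) if a == j]
        if cluster:
            mu = sum(cluster) / len(cluster)
            cost += sum((p - mu) ** 2 for p in cluster)
    return cost

def opt_cost(points, k):
    """Exact optimal k-means cost by enumerating all assignments."""
    return min(kmeans_cost(points, a, k) for a in product(range(k), repeat=len(points)))

def well_separated(points, k, alpha):
    """(k, alpha)-well separated: best (k-1)-clustering costs >= OPT / alpha."""
    return opt_cost(points, k - 1) >= opt_cost(points, k) / alpha

# three tight groups far apart on the line: merging any two is much costlier
pts = [0.0, 0.1, 10.0, 10.1, 20.0, 20.1]
print(well_separated(pts, k=3, alpha=0.01))  # True
```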
Stability II: Perturbing Distance Function
Definition (α-perturbations [BL09]): A clustering instance (P, d) is α-perturbation-resilient if the optimal clustering is identical to the optimal clustering for any (P, d′), where
d(x, y)/α ≤ d′(x, y) ≤ d(x, y) · α
The larger the α, the more resilient the instance (and the more "stable"): the optimum must survive bigger perturbations.
Center-based clustering problems (k-median, k-means, k-center) can be solved optimally for √3-perturbation-resilient inputs [ABS10].
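This can be sanity-checked empirically on tiny instances: sample random α-perturbations of the distance matrix and verify that the optimal partition (here found by brute-force k-median) never changes. A passing run is evidence of resilience, not a proof; all names here are our own illustration:

```python
import random
from itertools import combinations

def kmedian_partition(dist, k):
    """Optimal k-median partition by brute force over center sets (tiny n)."""
    n = len(dist)
    best_cost, best_part = float("inf"), None
    for centers in combinations(range(n), k):
        cost = sum(min(dist[i][c] for c in centers) for i in range(n))
        if cost < best_cost:
            assign = [min(centers, key=lambda c: dist[i][c]) for i in range(n)]
            # compare partitions, not center identities
            best_part = frozenset(
                frozenset(i for i in range(n) if assign[i] == c) for c in centers)
            best_cost = cost
    return best_part

def survives_perturbations(dist, k, alpha, trials=100, seed=0):
    """Sample random per-pair factors in [1/alpha, alpha] and check that the
    optimal partition never changes."""
    rng = random.Random(seed)
    n = len(dist)
    target = kmedian_partition(dist, k)
    for _ in range(trials):
        d2 = [[0.0] * n for _ in range(n)]
        for i in range(n):
            for j in range(i + 1, n):
                d2[i][j] = d2[j][i] = dist[i][j] * rng.uniform(1 / alpha, alpha)
        if kmedian_partition(d2, k) != target:
            return False
    return True

# two tight pairs far apart: the 2-median optimum is robust to 1.5-perturbations
pos = [0.0, 1.0, 100.0, 101.0]
dist = [[abs(a - b) for b in pos] for a in pos]
print(survives_perturbations(dist, k=2, alpha=1.5))  # True
```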
Stability III: Perturbing Quality of Solution
Definition ((c, ε)-property [BBG09]): Given an input, all clusterings that are c-approximate are also ε-close to the target clustering.
Surprising facts:
Finding a c-approximation in general might be NP-hard.
Finding a c-approximation here is easy!
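On a toy instance the (c, ε)-property can be verified exhaustively: enumerate every labeling, and check that each one within a factor c of the optimal cost is within ε of the target (a hypothetical sketch, exponential in n):

```python
from itertools import permutations, product

def cost_1d(points, labels, k):
    """k-means cost on the line: squared distances to each cluster's mean."""
    total = 0.0
    for j in range(k):
        cl = [p for p, l in zip(points, labels) if l == j]
        if cl:
            mu = sum(cl) / len(cl)
            total += sum((p - mu) ** 2 for p in cl)
    return total

def dist_frac(a, b, k):
    """Fraction of points on which two labelings disagree, up to relabeling."""
    return min(sum(x != p[y] for x, y in zip(a, b))
               for p in permutations(range(k))) / len(a)

def has_c_eps_property(points, target, k, c, eps):
    """Check by enumeration: is every c-approximate clustering eps-close to target?"""
    labelings = list(product(range(k), repeat=len(points)))
    opt = min(cost_1d(points, l, k) for l in labelings)
    return all(dist_frac(l, target, k) <= eps
               for l in labelings if cost_1d(points, l, k) <= c * opt)

pts = [0.0, 0.2, 5.0, 5.2]
target = (0, 0, 1, 1)
print(has_c_eps_property(pts, target, k=2, c=2.0, eps=0.0))    # True
print(has_c_eps_property(pts, target, k=2, c=1000.0, eps=0.1)) # False: c too loose
```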
Proof Idea
If near-optimal clusters are close to the true answer, then clusters must be well-separated.
If clusters are well-separated, then choosing the right threshold separates them cleanly.
Important that ALL near-optimal clusterings are close to the true answer.
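The thresholding step can be sketched with a union-find: link every pair of points closer than a threshold τ and read off the connected components as clusters (an illustration of the idea, not the actual [BBG09] algorithm):

```python
def threshold_components(dist, tau):
    """Link every pair with distance < tau; clusters are the connected
    components (union-find with path halving)."""
    n = len(dist)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] < tau:
                parent[find(i)] = find(j)
    comps = {}
    for i in range(n):
        comps.setdefault(find(i), []).append(i)
    return sorted(comps.values())

# two well-separated groups on the line: any tau between ~1 and ~9 recovers them
pos = [0.0, 0.5, 1.0, 10.0, 10.5]
d = [[abs(a - b) for b in pos] for a in pos]
print(threshold_components(d, tau=2.0))  # [[0, 1, 2], [3, 4]]
```

If the clusters are well-separated, a wide range of thresholds yields the same components, which is what makes choosing τ feasible.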
Main Result
Theorem: In polynomial time, we can find a clustering that is O(ε)-close to the target clustering, even if finding a c-approximation is NP-hard.
Generalization
Strong assumption: ALL near-optimal clusterings are close to the true answer.
Variant [ABS10]: Only consider Voronoi-based clusterings, where each point is assigned to the nearest cluster center.
Same results hold as for the previous case.
Wrap Up
We understand much more about the behavior of k-means, and why it does well in practice.
A simple initialization procedure for k-means is both effective and gives provable guarantees.
Much of the theoretical machinery around k-means works for the generalization to Bregman divergences.
New and interesting questions on the relationship between the target clustering and the cost measures used to get near it: ways of subverting NP-hardness.
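The initialization procedure referred to above is k-means++ seeding [AV07]; a minimal 1-D sketch, with the D²-sampling step that drives its guarantee:

```python
import random

def kmeanspp_seeds(points, k, seed=0):
    """k-means++ seeding: first center uniform at random, each subsequent
    center drawn with probability proportional to its squared distance
    to the nearest center chosen so far (D^2 sampling)."""
    rng = random.Random(seed)
    centers = [rng.choice(points)]
    while len(centers) < k:
        d2 = [min((p - c) ** 2 for c in centers) for p in points]
        r = rng.uniform(0, sum(d2))
        acc = 0.0
        for p, w in zip(points, d2):
            acc += w
            if acc > r:
                centers.append(p)
                break
        else:
            centers.append(points[-1])  # guard against float round-off
    return centers

# three tight groups; D^2 sampling tends to pick one seed per group
pts = [0.0, 0.1, 5.0, 5.1, 10.0, 10.1]
seeds = kmeanspp_seeds(pts, k=3)
print(sorted(seeds))
```

The seeding alone is O(log k)-competitive in expectation [AV07]; Lloyd iterations started from these seeds can only improve the cost.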
Thank You
Slides for this tutorial can be found at
http://www.cs.utah.edu/~suresh/web/2010/05/08/new-developments-in-the-theory-of-clustering-tutorial/
Research on this tutorial was partially supported by NSF CCF-0953066
References I
Marcel R. Ackermann and Johannes Blömer. Coresets and approximate clustering for Bregman divergences. In Mathieu [Mat09], pages 1088–1097.

Marcel R. Ackermann and Johannes Blömer. Bregman clustering for separable instances. In Kaplan [Kap10], pages 212–223.

P. Awasthi, A. Blum, and O. Sheffet. Clustering under natural stability assumptions. Computer Science Department, page 123, 2010.

Ankit Aggarwal, Amit Deshpande, and Ravi Kannan. Adaptive sampling for k-means clustering. In APPROX '09 / RANDOM '09: Proceedings of the 12th International Workshop and 13th International Workshop on Approximation, Randomization, and Combinatorial Optimization: Algorithms and Techniques, pages 15–28, Berlin, Heidelberg, 2009. Springer-Verlag.

Nir Ailon, Ragesh Jaiswal, and Claire Monteleoni. Streaming k-means approximation. In Y. Bengio, D. Schuurmans, J. Lafferty, C. K. I. Williams, and A. Culotta, editors, Advances in Neural Information Processing Systems 22, pages 10–18, 2009.

David Arthur, Bodo Manthey, and Heiko Röglin. k-means has polynomial smoothed complexity. In FOCS '09: Proceedings of the 50th Annual IEEE Symposium on Foundations of Computer Science, pages 405–414, Washington, DC, USA, 2009. IEEE Computer Society.
References II
David Arthur and Sergei Vassilvitskii. k-means++: the advantages of careful seeding. In SODA '07: Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1027–1035, Philadelphia, PA, USA, 2007. Society for Industrial and Applied Mathematics.

Maria-Florina Balcan, Avrim Blum, and Anupam Gupta. Approximate clustering without the approximation. In Mathieu [Mat09], pages 1068–1077.

Yonatan Bilu and Nathan Linial. Are stable instances easy? CoRR, abs/0906.3162, 2009.

Arindam Banerjee, Srujana Merugu, Inderjit S. Dhillon, and Joydeep Ghosh. Clustering with Bregman divergences. Journal of Machine Learning Research, 6:1705–1749, 2005.

Kamalika Chaudhuri and Andrew McGregor. Finding metric structure in information theoretic clustering. In Servedio and Zhang [SZ08], pages 391–402.

Yingfei Dong, Ding-Zhu Du, and Oscar H. Ibarra, editors. Algorithms and Computation, 20th International Symposium, ISAAC 2009, Honolulu, Hawaii, USA, December 16–18, 2009, Proceedings, volume 5878 of Lecture Notes in Computer Science. Springer, 2009.

S. Guha, N. Mishra, R. Motwani, and L. O'Callaghan. Clustering data streams. In FOCS '00: Proceedings of the 41st Annual Symposium on Foundations of Computer Science, page 359, Washington, DC, USA, 2000. IEEE Computer Society.
References III
Haim Kaplan, editor. Algorithm Theory - SWAT 2010, 12th Scandinavian Symposium and Workshops on Algorithm Theory, Bergen, Norway, June 21–23, 2010, Proceedings, volume 6139 of Lecture Notes in Computer Science. Springer, 2010.

Claire Mathieu, editor. Proceedings of the Twentieth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2009, New York, NY, USA, January 4–6, 2009. SIAM, 2009.

Bodo Manthey and Heiko Röglin. Worst-case and smoothed analysis of k-means clustering with Bregman divergences. In Dong et al. [DDI09], pages 1024–1033.

Rafail Ostrovsky, Yuval Rabani, Leonard J. Schulman, and Chaitanya Swamy. The effectiveness of Lloyd-type methods for the k-means problem. In FOCS '06: Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science, pages 165–176, Washington, DC, USA, 2006. IEEE Computer Society.

Rocco A. Servedio and Tong Zhang, editors. 21st Annual Conference on Learning Theory - COLT 2008, Helsinki, Finland, July 9–12, 2008. Omnipress, 2008.

Andrea Vattani. k-means requires exponentially many iterations even in the plane. In SCG '09: Proceedings of the 25th Annual Symposium on Computational Geometry, pages 324–332, New York, NY, USA, 2009. ACM.