
Transcript of Weighted Clustering (Margareta Ackerman, Computer Science, FSU, CIS5930 course notes)

Page 1:

Weighted Clustering

Margareta Ackerman

Based on joint work with Shai Ben-David, David Loker, and Simina Branzei.

Page 2:

• There are three fundamental categories that clearly delineate some essential differences between common clustering methods.
• The strength of these categories lies in their simplicity.

Properties based on weight response: Ackerman, Ben-David, Branzei, and Loker (AAAI 2012)

Page 3:

• Every element is associated with a real-valued weight, representing its "mass" or "importance".
• This generalizes the notion of element duplication.

Weighted Clustering

Page 4:

• Apply clustering to facility allocation, such as the placement of police stations in a new district.
• The distribution of stations should enable quick access to most areas in the district.
• Accessibility of different institutions to a station may have varying importance.
• The weighted setting enables a convenient method for prioritizing certain landmarks.

Other Reasons to Add Weight: An Example

Page 5:

Traditional clustering algorithms can be readily translated into the weighted setting by considering their behavior on data containing element duplicates.

Algorithms in the Weighted Clustering Setting
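A minimal sketch of the duplication view (illustrative code, not from the slides): for integer weights, running an unweighted algorithm on the expanded data set below is the behavior that the weighted translation of the algorithm must reproduce on (w[X], d).

```python
# Hypothetical helper illustrating the duplication view of weights.
# For integer weights, running an unweighted algorithm on duplicate(X, w)
# matches running its weighted translation on (w[X], d).

def duplicate(X, w):
    """Expand each element x of X into w[x] identical copies."""
    expanded = []
    for x in X:
        expanded.extend([x] * w[x])
    return expanded

X = ["a", "b", "c"]
w = {"a": 3, "b": 1, "c": 2}
print(duplicate(X, w))  # ['a', 'a', 'a', 'b', 'c', 'c']
```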

Page 6:

A weight function w: X → R+ defines the weight of every element.

A dissimilarity function d: X × X → R+ ∪ {0} gives the dissimilarity between pairs of elements.

Formal Setting

Page 7:

(w[X], d) denotes weighted data.

A Partitional Algorithm maps
Input: (w[X], d, k)
to
Output: a k-clustering of X.

Formal Setting: Partitional Clustering Algorithm

Page 8:

Range(A(X, d)) = {C | ∃ w s.t. C = A(w[X], d)}

The set of clusterings that A outputs on (X, d) over all possible weight functions.

Towards Basic Categories: Range(X, d)

Page 9:

A is weight-robust if for all (X, d), |Range(X, d)| = 1.

A never responds to weight.

Categories: Weight Robust

Page 10:

A is weight-sensitive if for all (X, d), |Range(X, d)| > 1.

A always responds to weight.

Categories: Weight Sensitive

Page 11:

An algorithm A is weight-considering if
1) there exists (X, d) where |Range(X, d)| = 1, and
2) there exists (X, d) where |Range(X, d)| > 1.

A responds to weight on some data sets, but not others.

Categories: Weight Considering

Page 12:

Range(A(X, d)) = {C | ∃ w such that A(w[X], d) = C}

Weight-robust: for all (X, d), |Range(X, d)| = 1.

Weight-sensitive: for all (X, d), |Range(X, d)| > 1.

Weight-considering:
1) ∃ (X, d) where |Range(X, d)| = 1.
2) ∃ (X, d) where |Range(X, d)| > 1.

Summary of Categories
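To make the categories concrete, here is a small probing sketch (the name probe_range and the callable signature for A are our own illustration, not from the paper): sample random weight functions and collect the distinct clusterings an algorithm produces on a fixed (X, d).

```python
import random

def probe_range(A, X, d, k, trials=200):
    """Empirically approximate Range(A(X, d)): run the partitional
    algorithm A under random weight functions and collect the distinct
    clusterings it produces. A is a callable taking (w, X, d, k) and
    returning a k-clustering as a list of lists."""
    seen = set()
    for _ in range(trials):
        w = {x: random.uniform(0.1, 10.0) for x in X}
        C = A(w, X, d, k)
        # Canonical form so that cluster and point order do not matter.
        seen.add(frozenset(frozenset(c) for c in C))
    return seen

# A size of 1 on every (X, d) is consistent with weight-robustness;
# a size > 1 on every (X, d) is consistent with weight-sensitivity.
```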

Page 13:

In the facility allocation example above, a weight-sensitive algorithm may be preferred.

In phylogeny, where sampling procedures can be highly biased, weight-robustness may be desired.

The desired category depends on the application.

Connecting To Applications

Page 14:

Weight Robust: Min Diameter, K-center, Single Linkage, Complete Linkage

Weight Sensitive: K-means, k-medoids, k-median, min-sum, Ward's Method, Bisecting K-means

Weight Considering: Ratio Cut, Average Linkage

Classification

Page 15:

We show that k-means (the objective function) is weight-sensitive.

A is weight-separable if for any data set (X, d) and subset S of X with at most k points, ∃ w so that A(w[X], d, k) separates all points of S.

Fact: Every algorithm that is weight-separable is also weight-sensitive.

Zooming Into: Weight Sensitive Algorithms

Page 16:

Given a clustering C = {C1, C2, …, Ck}, the weighted k-means objective function is

cost(C) = Σ_{i=1..k} Σ_{x ∈ Ci} w(x) · ||x − ci||²

where ci is the weighted mean of Ci. That is,

ci = ( Σ_{x ∈ Ci} w(x) · x ) / ( Σ_{x ∈ Ci} w(x) )

Weighted k-means objective function
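The objective translates directly into code. A minimal sketch, assuming points are numeric tuples (used as their own dictionary keys) and squared Euclidean distance:

```python
import numpy as np

def weighted_kmeans_cost(clustering, w):
    """Weighted k-means objective: for each cluster, sum w(x) times the
    squared Euclidean distance from x to the cluster's weighted mean."""
    total = 0.0
    for C in clustering:
        pts = np.array(C, dtype=float)
        wts = np.array([w[p] for p in C], dtype=float)
        center = (wts[:, None] * pts).sum(axis=0) / wts.sum()  # c_i
        total += float((wts * ((pts - center) ** 2).sum(axis=1)).sum())
    return total

# Example: two clusters on the line; the heavier point pulls the mean.
w = {(0.0,): 1.0, (1.0,): 5.0, (10.0,): 1.0}
print(weighted_kmeans_cost([[(0.0,), (1.0,)], [(10.0,)]], w))  # ~0.833
```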

Page 17:

Theorem: k-means is weight-sensitive.

Proof:
• Show that k-means is weight-separable.
• Consider any (X, d) and S ⊂ X on at most k points.
• Increase the weight of points in S until each belongs to a distinct cluster.

K-means is Weight-Sensitive
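A small sketch of the proof idea (our construction, reusing weighted_kmeans_cost from the previous page): brute-force the optimal 2-clustering of four points on a line and watch a heavy pair get separated as its weight grows.

```python
from itertools import combinations

def best_2_clustering(points, w):
    """Brute-force minimizer of the weighted k-means objective for k=2,
    reusing weighted_kmeans_cost from the sketch above."""
    best, best_cost = None, float("inf")
    n = len(points)
    for r in range(1, n):
        for S in combinations(range(n), r):
            C1 = [points[i] for i in S]
            C2 = [points[i] for i in range(n) if i not in S]
            cost = weighted_kmeans_cost([C1, C2], w)
            if cost < best_cost:
                best, best_cost = (C1, C2), cost
    return best

# With S = {0, 1}: as their weight M grows, the points 0 and 1 end up
# in distinct clusters, so the output (hence Range) changes with weight.
pts = [(0.0,), (1.0,), (10.0,), (11.0,)]
for M in (1.0, 1000.0):
    w = {pts[0]: M, pts[1]: M, pts[2]: 1.0, pts[3]: 1.0}
    print(M, best_2_clustering(pts, w))
# M=1:    ([(0.0,), (1.0,)], [(10.0,), (11.0,)])
# M=1000: ([(0.0,)], [(1.0,), (10.0,), (11.0,)])
```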

Page 18:

These algorithms are invariant to element duplication.

Example: single linkage (Kruskal's algorithm for the minimum spanning tree).

As the minimum spanning tree is independent of the weights of the points, single linkage is weight-robust.

Zooming Into: Weight Robust Algorithms
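A minimal single-linkage sketch (a standard Kruskal-style construction, not code from the slides). Note that the weight function w never appears in the computation, which is exactly why the algorithm is weight-robust.

```python
def single_linkage(X, d, k):
    """Single linkage via Kruskal's MST construction: repeatedly join
    the two closest clusters until exactly k remain. The weight
    function w is never consulted, so the output is the same for
    every weighting of X."""
    parent = {x: x for x in X}

    def find(x):  # union-find with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    edges = sorted((d(a, b), a, b) for i, a in enumerate(X) for b in X[i + 1:])
    clusters = len(X)
    for dist, a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb and clusters > k:
            parent[ra] = rb
            clusters -= 1
    groups = {}
    for x in X:
        groups.setdefault(find(x), []).append(x)
    return list(groups.values())

print(single_linkage([0.0, 1.0, 2.0, 10.0, 11.0], lambda a, b: abs(a - b), 2))
# [[0.0, 1.0, 2.0], [10.0, 11.0]]
```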

Page 19:

• We will show that Average Linkage is weight-considering.
• We could also characterize the precise conditions under which it is sensitive to weight.

Recall: an algorithm A is weight-considering if
1) there exists (X, d) where |Range(X, d)| = 1, and
2) there exists (X, d) where |Range(X, d)| > 1.

Zooming Into: Weight Considering Algorithms

Page 20:

• Average Linkage starts by creating a cluster for every element.
• It then repeatedly merges the "closest" clusters using the following linkage function, until exactly k clusters remain:

ℓ(C1, C2) = ( Σ_{x ∈ C1, y ∈ C2} w(x) · w(y) · d(x, y) ) / ( W(C1) · W(C2) ), where W(C) = Σ_{x ∈ C} w(x)

Weighted Average Linkage
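In code, this linkage is just a weighted mean of cross-cluster dissimilarities (a sketch; the signature is ours). Weighting each pair by w(x)·w(y) is what makes duplicating a point equivalent to scaling its weight.

```python
def avg_linkage(C1, C2, w, d):
    """Weighted average linkage: mean cross-cluster dissimilarity with
    each pair (x, y) weighted by w(x) * w(y). Duplicating a point m
    times is equivalent to multiplying its weight by m."""
    num = sum(w[x] * w[y] * d(x, y) for x in C1 for y in C2)
    W1 = sum(w[x] for x in C1)
    W2 = sum(w[y] for y in C2)
    return num / (W1 * W2)

# Unit weights reduce to classical average linkage:
w = {0.0: 1.0, 1.0: 1.0, 5.0: 1.0}
print(avg_linkage([0.0, 1.0], [5.0], w, lambda a, b: abs(a - b)))  # 4.5
```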

Page 21:

Data where Average Linkage ignores weights:

[Figure: four points A, B, C, D on a line.]

For k=2, average linkage outputs the clustering {{A, B}, {C, D}} regardless of the weights (or number of occurrences) of these points.

Average Linkage is Weight Considering

Page 22:

Data where Average Linkage responds to weights:

[Figure: five collinear points A, B, C, D, E with consecutive gaps 2+2ϵ, 1, 1, 1+ϵ, shown twice: once with all weights equal, once with some points (dark) given much higher weight.]

Weights are all 1: {{A, B, C, D}, {E}}

Dark points have much higher weights than light points: {{A, B}, {C, D, E}}

Average Linkage is Weight Considering

Page 23:

When is average linkage sensitive to weight?

It turns out that this can be characterized!

When is Average Linkage Sensitive to Weight?

Page 24:

A clustering is nice if every point is closer to all points within its cluster than to all other points.

[Figure: an example of a nice clustering.]

Nice clustering
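A small checker for this definition (an illustrative sketch, not from the slides): a clustering is nice exactly when, for every point, the maximum in-cluster dissimilarity is smaller than the minimum out-of-cluster dissimilarity.

```python
def is_nice(clustering, d):
    """True iff every point is closer to every point in its own cluster
    than to any point in a different cluster."""
    for C in clustering:
        others = [y for D in clustering if D is not C for y in D]
        for x in C:
            in_max = max((d(x, y) for y in C if y != x), default=0.0)
            out_min = min((d(x, y) for y in others), default=float("inf"))
            if in_max >= out_min:
                return False
    return True

d = lambda a, b: abs(a - b)
print(is_nice([[0.0, 1.0, 2.0], [10.0, 11.0]], d))  # True
print(is_nice([[0.0, 1.0], [2.0, 10.0, 11.0]], d))  # False
```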

Page 25:

[Figure: a second example of a nice clustering.]

Nice clustering

Page 26:

[Figure: an example of a clustering that is not nice.]

Nice clustering

Page 27:

It can be shown that average linkage ignores weights (for fixed k) on data that has a (unique) nice k-clustering.

Furthermore, it responds to weight when there are no nice k-clusterings.

There is a more elegant result in the hierarchical clustering setting.

When is Average Linkage Sensitive to Weight?

Page 28:

• The above analysis for k-means and similar methods is for their corresponding objective functions.
• Unfortunately, optimal partitions are NP-hard to find. In practice, heuristics such as the Lloyd method are used.

What about heuristics?

Page 29:

• Some heuristics are randomized. As such, we introduce the notion of a randomized range:

randRange(A(X, d)) = {C | ∀ϵ < 1, ∃ w such that P(A(w[X], d) = C) > 1 − ϵ}

• That is, the randomized range is the set of clusterings that are produced with arbitrarily high probability when we can modify weights.

Randomized heuristics
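An empirical reading of this definition (a sketch with illustrative names, assuming A is a callable randomized heuristic): fix a weight function, rerun the heuristic many times, and check whether one clustering's frequency can be pushed toward 1 by choosing w.

```python
from collections import Counter

def mode_frequency(A, w, X, d, k, runs=500):
    """Rerun a randomized partitional algorithm A under a fixed weight
    function w and report the most frequent clustering together with
    its empirical frequency. A clustering C is in randRange if some
    choice of w drives this frequency above every 1 - epsilon."""
    counts = Counter()
    for _ in range(runs):
        C = A(w, X, d, k)
        counts[frozenset(frozenset(c) for c in C)] += 1
    best, n = counts.most_common(1)[0]
    return best, n / runs
```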

Page 30:

Weight Sensitive:
• Lloyd with random initialization
• K-means++
• PAM

Weight Considering:
• The Lloyd Method with Furthest Centroid Initialization

Note that the more popular heuristics respond to weights in the same way as the k-means and k-medoids objective functions.

Heuristics classification

Page 31:

Weight Sensitive:
• Lloyd with random initialization
• K-means++
• PAM

Weight Considering:
• The Lloyd Method with Furthest Centroid Initialization

Just like Average Linkage, the Lloyd Method with Furthest Centroid initialization responds to weight only on data without nice clusterings.

Heuristics classification

Page 32:

Weight Robust:
• Min Diameter
• K-center
• Single Linkage
• Complete Linkage

Weight Sensitive:
• K-means
• k-medoids
• k-median
• Min-Sum
• Randomized Lloyd
• k-means++
• Ward's Method
• Bisecting K-means

Weight Considering:
• Ratio Cut
• Lloyd with Furthest Centroid
• Average Linkage

Classification

Page 33:

• We saw some of the most popular clustering algorithms.
• We introduced a framework for choosing clustering algorithms based on their input-output behavior.
• We saw three categories describing how algorithms respond to weights.
• The same results apply in the non-weighted setting for data duplicates.

Conclusions