Towards the world's fastest k-means...

59
The k-means clustering algorithm Opportunities to speed up Lloyd’s algorithm Algorithms that avoid distance calculations Experimental results Finally Towards the world’s fastest k-means algorithm Greg Hamerly Associate Professor Computer Science Department Baylor University Joint work with Jonathan Drake May 15, 2014 Greg Hamerly / Baylor University Towards the world’s fastest k-means algorithm

Transcript of Towards the world's fastest k-means...

Page 1: Towards the world's fastest k-means algorithmcs.ecs.baylor.edu/~hamerly/software/fast_kmeans_talk_20140515.pdf · Towards the world’s fastest k-means algorithm 1 The k-means clustering

The k-means clustering algorithmOpportunities to speed up Lloyd’s algorithmAlgorithms that avoid distance calculations

Experimental resultsFinally

Towards the world’s fastest k-means algorithm

Greg HamerlyAssociate Professor

Computer Science DepartmentBaylor University

Joint work with Jonathan Drake

May 15, 2014

Greg Hamerly / Baylor University Towards the world’s fastest k-means algorithm

Page 2: Towards the world's fastest k-means algorithmcs.ecs.baylor.edu/~hamerly/software/fast_kmeans_talk_20140515.pdf · Towards the world’s fastest k-means algorithm 1 The k-means clustering

The k-means clustering algorithmOpportunities to speed up Lloyd’s algorithmAlgorithms that avoid distance calculations

Experimental resultsFinally

Objective function and optimizationLloyd’s algorithm

Towards the world’s fastest k-means algorithm

1 The k-means clustering algorithmObjective function and optimizationLloyd’s algorithm

2 Opportunities to speed up Lloyd’s algorithm

3 Algorithms that avoid distance calculations

4 Experimental results

5 Finally

Greg Hamerly / Baylor University Towards the world’s fastest k-means algorithm

Page 3: Towards the world's fastest k-means algorithmcs.ecs.baylor.edu/~hamerly/software/fast_kmeans_talk_20140515.pdf · Towards the world’s fastest k-means algorithm 1 The k-means clustering

The k-means clustering algorithmOpportunities to speed up Lloyd’s algorithmAlgorithms that avoid distance calculations

Experimental resultsFinally

Objective function and optimizationLloyd’s algorithm

Visual representation of k-means

Input Output

Greg Hamerly / Baylor University Towards the world’s fastest k-means algorithm

Page 4: Towards the world's fastest k-means algorithmcs.ecs.baylor.edu/~hamerly/software/fast_kmeans_talk_20140515.pdf · Towards the world’s fastest k-means algorithm 1 The k-means clustering

The k-means clustering algorithmOpportunities to speed up Lloyd’s algorithmAlgorithms that avoid distance calculations

Experimental resultsFinally

Objective function and optimizationLloyd’s algorithm

Popularity and applications of k-means

Google searches (May 2014):

# hits

Search query Google Google Scholar

k-means clustering 2.6M 316k

support vector machine classifier 1.7M 477k

nearest neighbor classifier 0.5M 103k

logistic regression classifier 0.3M 61k

Applications:

Discovering groups/structure in data

Lossy data compression (e.g. color quantization, voice coding,representative sampling)

Initialize more expensive algorithms (e.g. Gaussian mixtures)

Greg Hamerly / Baylor University Towards the world’s fastest k-means algorithm

Page 5: Towards the world's fastest k-means algorithmcs.ecs.baylor.edu/~hamerly/software/fast_kmeans_talk_20140515.pdf · Towards the world’s fastest k-means algorithm 1 The k-means clustering

The k-means clustering algorithmOpportunities to speed up Lloyd’s algorithmAlgorithms that avoid distance calculations

Experimental resultsFinally

Objective function and optimizationLloyd’s algorithm

Optimization criteria and NP-hardness

K-means is not really an algorithm, it’s a criterion for clusteringquality.

Criterion: J(C ,X ) =∑x∈X

minc∈C||x − c||2

Goal: Find C that minimizes J(C ,X )

NP-hard in general.

There are lots of approaches to finding ‘good enough’solutions.

Greg Hamerly / Baylor University Towards the world’s fastest k-means algorithm

Page 6: Towards the world's fastest k-means algorithmcs.ecs.baylor.edu/~hamerly/software/fast_kmeans_talk_20140515.pdf · Towards the world’s fastest k-means algorithm 1 The k-means clustering

The k-means clustering algorithmOpportunities to speed up Lloyd’s algorithmAlgorithms that avoid distance calculations

Experimental resultsFinally

Objective function and optimizationLloyd’s algorithm

Hill-climbing approaches

The most popular algorithms rely on hill-climbing:

Choose an initial set of centers.

Repeat until convergence:

Move the centers to better locations.

Because J(C ,X ) is non-convex, hill-climbing won’t in general findoptimal solutions.

Greg Hamerly / Baylor University Towards the world’s fastest k-means algorithm

Page 7: Towards the world's fastest k-means algorithmcs.ecs.baylor.edu/~hamerly/software/fast_kmeans_talk_20140515.pdf · Towards the world’s fastest k-means algorithm 1 The k-means clustering

The k-means clustering algorithmOpportunities to speed up Lloyd’s algorithmAlgorithms that avoid distance calculations

Experimental resultsFinally

Objective function and optimizationLloyd’s algorithm

Lloyd’s algorithm

The most popular algorithm for k-means (Lloyd 1982)

Batch version:

Choose an initial set of centers.

Repeat until convergence:

Assign each point x ∈ X to its currently closest center.Move each center c ∈ C to the average of its assigned points.

Greg Hamerly / Baylor University Towards the world’s fastest k-means algorithm

Page 8: Towards the world's fastest k-means algorithmcs.ecs.baylor.edu/~hamerly/software/fast_kmeans_talk_20140515.pdf · Towards the world’s fastest k-means algorithm 1 The k-means clustering

The k-means clustering algorithmOpportunities to speed up Lloyd’s algorithmAlgorithms that avoid distance calculations

Experimental resultsFinally

Objective function and optimizationLloyd’s algorithm

Example

K-means (k=3, n=100) iteration 1

K-means (k=3, n=100) iteration 2 K-means (k=3, n=100) iteration 3

K-means (k=3, n=100) iteration 4 K-means (k=3, n=100) iteration 5

Greg Hamerly / Baylor University Towards the world’s fastest k-means algorithm

Page 9: Towards the world's fastest k-means algorithmcs.ecs.baylor.edu/~hamerly/software/fast_kmeans_talk_20140515.pdf · Towards the world’s fastest k-means algorithm 1 The k-means clustering

The k-means clustering algorithmOpportunities to speed up Lloyd’s algorithmAlgorithms that avoid distance calculations

Experimental resultsFinally

Objective function and optimizationLloyd’s algorithm

Example

K-means (k=3, n=100) iteration 1 K-means (k=3, n=100) iteration 2

K-means (k=3, n=100) iteration 3

K-means (k=3, n=100) iteration 4 K-means (k=3, n=100) iteration 5

Greg Hamerly / Baylor University Towards the world’s fastest k-means algorithm

Page 10: Towards the world's fastest k-means algorithmcs.ecs.baylor.edu/~hamerly/software/fast_kmeans_talk_20140515.pdf · Towards the world’s fastest k-means algorithm 1 The k-means clustering

The k-means clustering algorithmOpportunities to speed up Lloyd’s algorithmAlgorithms that avoid distance calculations

Experimental resultsFinally

Objective function and optimizationLloyd’s algorithm

Example

K-means (k=3, n=100) iteration 1 K-means (k=3, n=100) iteration 2 K-means (k=3, n=100) iteration 3

K-means (k=3, n=100) iteration 4 K-means (k=3, n=100) iteration 5

Greg Hamerly / Baylor University Towards the world’s fastest k-means algorithm

Page 11: Towards the world's fastest k-means algorithmcs.ecs.baylor.edu/~hamerly/software/fast_kmeans_talk_20140515.pdf · Towards the world’s fastest k-means algorithm 1 The k-means clustering

The k-means clustering algorithmOpportunities to speed up Lloyd’s algorithmAlgorithms that avoid distance calculations

Experimental resultsFinally

Objective function and optimizationLloyd’s algorithm

Example

K-means (k=3, n=100) iteration 1 K-means (k=3, n=100) iteration 2 K-means (k=3, n=100) iteration 3

K-means (k=3, n=100) iteration 4

K-means (k=3, n=100) iteration 5

Greg Hamerly / Baylor University Towards the world’s fastest k-means algorithm

Page 12: Towards the world's fastest k-means algorithmcs.ecs.baylor.edu/~hamerly/software/fast_kmeans_talk_20140515.pdf · Towards the world’s fastest k-means algorithm 1 The k-means clustering

The k-means clustering algorithmOpportunities to speed up Lloyd’s algorithmAlgorithms that avoid distance calculations

Experimental resultsFinally

Objective function and optimizationLloyd’s algorithm

Example

K-means (k=3, n=100) iteration 1 K-means (k=3, n=100) iteration 2 K-means (k=3, n=100) iteration 3

K-means (k=3, n=100) iteration 4 K-means (k=3, n=100) iteration 5

Greg Hamerly / Baylor University Towards the world’s fastest k-means algorithm

Page 13: Towards the world's fastest k-means algorithmcs.ecs.baylor.edu/~hamerly/software/fast_kmeans_talk_20140515.pdf · Towards the world’s fastest k-means algorithm 1 The k-means clustering

The k-means clustering algorithmOpportunities to speed up Lloyd’s algorithmAlgorithms that avoid distance calculations

Experimental resultsFinally

Objective function and optimizationLloyd’s algorithm

Efficiency

Lloyd’s algorithm is ‘fast enough’ most of the time:

Each iteration is O(nkd) in the size of the data

number of points n, clusters k , dimension d

The number of iterations is usually small...

Theoretically it can be superpolynomial: 2Ω(√

n) (Vassilvitskiiand Arthur 2006).

Greg Hamerly / Baylor University Towards the world’s fastest k-means algorithm

Page 14: Towards the world's fastest k-means algorithmcs.ecs.baylor.edu/~hamerly/software/fast_kmeans_talk_20140515.pdf · Towards the world’s fastest k-means algorithm 1 The k-means clustering

The k-means clustering algorithmOpportunities to speed up Lloyd’s algorithmAlgorithms that avoid distance calculations

Experimental resultsFinally

Objective function and optimizationLloyd’s algorithm

Initialization (and restarting)

Lloyd’s algorithm is deterministic (given the same initialization).

A ‘good’ initialization is ‘close to’ the global optimum.

What if the initialization is at the local (or global) optimum?

Common practice: try many initializations, keep the best.

k-means++ is a really good initialization, with provably optimalexpected quality (Arthur and Vassilvitskii 2007).

Greg Hamerly / Baylor University Towards the world’s fastest k-means algorithm

Page 15: Towards the world's fastest k-means algorithmcs.ecs.baylor.edu/~hamerly/software/fast_kmeans_talk_20140515.pdf · Towards the world’s fastest k-means algorithm 1 The k-means clustering

The k-means clustering algorithmOpportunities to speed up Lloyd’s algorithmAlgorithms that avoid distance calculations

Experimental resultsFinally

Many unnecessary distance calculationsThree key ideas

Towards the world’s fastest k-means algorithm

1 The k-means clustering algorithm

2 Opportunities to speed up Lloyd’s algorithmMany unnecessary distance calculationsThree key ideas

3 Algorithms that avoid distance calculations

4 Experimental results

5 Finally

Greg Hamerly / Baylor University Towards the world’s fastest k-means algorithm

Page 16: Towards the world's fastest k-means algorithmcs.ecs.baylor.edu/~hamerly/software/fast_kmeans_talk_20140515.pdf · Towards the world’s fastest k-means algorithm 1 The k-means clustering

The k-means clustering algorithmOpportunities to speed up Lloyd’s algorithmAlgorithms that avoid distance calculations

Experimental resultsFinally

Many unnecessary distance calculationsThree key ideas

Way too many distance calculations

Lloyd’s algorithm spends the vast majority of its time determiningdistances.

For each point, what is its closest center?

Naively, this is O(kd) for each point.

Many (most!) of these distance calculations are unnecessary.

Greg Hamerly / Baylor University Towards the world’s fastest k-means algorithm

Page 17: Towards the world's fastest k-means algorithmcs.ecs.baylor.edu/~hamerly/software/fast_kmeans_talk_20140515.pdf · Towards the world’s fastest k-means algorithm 1 The k-means clustering

The k-means clustering algorithmOpportunities to speed up Lloyd’s algorithmAlgorithms that avoid distance calculations

Experimental resultsFinally

Many unnecessary distance calculationsThree key ideas

Reinventing the wheel

If you’re like me, you want to implement algorithms to understandthem.

K-means is available in many packages: ELKI, graphlab, Mahout,MATLAB, MLPACK, Octave, OpenCV, R, SciPy, Weka, and Yael.

None of these implement the accelerations presented here.

Let’s do something about this.

Greg Hamerly / Baylor University Towards the world’s fastest k-means algorithm

Page 18: Towards the world's fastest k-means algorithmcs.ecs.baylor.edu/~hamerly/software/fast_kmeans_talk_20140515.pdf · Towards the world’s fastest k-means algorithm 1 The k-means clustering

The k-means clustering algorithmOpportunities to speed up Lloyd’s algorithmAlgorithms that avoid distance calculations

Experimental resultsFinally

Many unnecessary distance calculationsThree key ideas

Exactly replacing Lloyd’s algorithm

Lloyd’s algorithm is pervasive.

Therefore, we have a strong desire to create a fast version.

The algorithms I’ll talk about give exactly the same answer, butmuch faster.

This work is not about approximation, which can of course bemany times faster still.

Greg Hamerly / Baylor University Towards the world’s fastest k-means algorithm

Page 19: Towards the world's fastest k-means algorithmcs.ecs.baylor.edu/~hamerly/software/fast_kmeans_talk_20140515.pdf · Towards the world’s fastest k-means algorithm 1 The k-means clustering

The k-means clustering algorithmOpportunities to speed up Lloyd’s algorithmAlgorithms that avoid distance calculations

Experimental resultsFinally

Many unnecessary distance calculationsThree key ideas

Key idea #1: Caching previous distances

From one iteration to the next, if a center doesn’t move much, theO(n) distances to that center won’t change much either.

K-means (k=3, n=100) iteration 3 K-means (k=3, n=100) iteration 4

Could we save the distances computed in iteration t to use initeration t + 1?

Not directly...

Greg Hamerly / Baylor University Towards the world’s fastest k-means algorithm

Page 20: Towards the world's fastest k-means algorithmcs.ecs.baylor.edu/~hamerly/software/fast_kmeans_talk_20140515.pdf · Towards the world’s fastest k-means algorithm 1 The k-means clustering

The k-means clustering algorithmOpportunities to speed up Lloyd’s algorithmAlgorithms that avoid distance calculations

Experimental resultsFinally

Many unnecessary distance calculationsThree key ideas

Key idea #2: Distances are sufficient but not necessary

What if we didn’t have distances, but we had an oracle that cananswer the question:

Given a point x , what is its closest c ∈ C?

We could still run Lloyd’s k-means algorithm!

Point: distances are unnecessary; we only need which center isclosest.

Greg Hamerly / Baylor University Towards the world’s fastest k-means algorithm

Page 21: Towards the world's fastest k-means algorithmcs.ecs.baylor.edu/~hamerly/software/fast_kmeans_talk_20140515.pdf · Towards the world’s fastest k-means algorithm 1 The k-means clustering

The k-means clustering algorithmOpportunities to speed up Lloyd’s algorithmAlgorithms that avoid distance calculations

Experimental resultsFinally

Many unnecessary distance calculationsThree key ideas

Key idea #3: Triangle inequality

||a− b|| ≤ ||a− c ||+ ||b − c||

We can apply this to moving centers.

If we know ||x − c||, and c moves to c ′,then

||x − c ′|| ≤ ||x − c ||+ ||c − c ′||

x

c

c ′

This is an upper bound; we can also construct a lower bound.

Greg Hamerly / Baylor University Towards the world’s fastest k-means algorithm

Page 22: Towards the world's fastest k-means algorithmcs.ecs.baylor.edu/~hamerly/software/fast_kmeans_talk_20140515.pdf · Towards the world’s fastest k-means algorithm 1 The k-means clustering

The k-means clustering algorithmOpportunities to speed up Lloyd’s algorithmAlgorithms that avoid distance calculations

Experimental resultsFinally

Many unnecessary distance calculationsThree key ideas

Key idea #3: Triangle inequality

||a− b|| ≤ ||a− c ||+ ||b − c||

We can apply this to moving centers.

If we know ||x − c||, and c moves to c ′,then

||x − c ′|| ≤ ||x − c ||+ ||c − c ′||

x

c

c ′

This is an upper bound; we can also construct a lower bound.

Greg Hamerly / Baylor University Towards the world’s fastest k-means algorithm

Page 23: Towards the world's fastest k-means algorithmcs.ecs.baylor.edu/~hamerly/software/fast_kmeans_talk_20140515.pdf · Towards the world’s fastest k-means algorithm 1 The k-means clustering

The k-means clustering algorithmOpportunities to speed up Lloyd’s algorithmAlgorithms that avoid distance calculations

Experimental resultsFinally

Many unnecessary distance calculationsThree key ideas

Key idea #3: Triangle inequality

||a− b|| ≤ ||a− c ||+ ||b − c||

We can apply this to moving centers.

If we know ||x − c||, and c moves to c ′,then

||x − c ′|| ≤ ||x − c ||+ ||c − c ′||

x

c

c ′

This is an upper bound; we can also construct a lower bound.

Greg Hamerly / Baylor University Towards the world’s fastest k-means algorithm

Page 24: Towards the world's fastest k-means algorithmcs.ecs.baylor.edu/~hamerly/software/fast_kmeans_talk_20140515.pdf · Towards the world’s fastest k-means algorithm 1 The k-means clustering

The k-means clustering algorithmOpportunities to speed up Lloyd’s algorithmAlgorithms that avoid distance calculations

Experimental resultsFinally

Many unnecessary distance calculationsThree key ideas

Combining these three ideas

We can maintain bounds on the distance between x and eachcenter c ∈ C .

Upper bound between x and its closest center.

Lower bound(s) between x and other centers.

Efficiently update bounds when centers move, using the triangleinequality.

Use the bounds to prune point-center distance computations.

Between points and far-away centers.

Greg Hamerly / Baylor University Towards the world’s fastest k-means algorithm

Page 25: Towards the world's fastest k-means algorithmcs.ecs.baylor.edu/~hamerly/software/fast_kmeans_talk_20140515.pdf · Towards the world’s fastest k-means algorithm 1 The k-means clustering

The k-means clustering algorithmOpportunities to speed up Lloyd’s algorithmAlgorithms that avoid distance calculations

Experimental resultsFinally

Towards the world’s fastest k-means algorithm

1 The k-means clustering algorithm

2 Opportunities to speed up Lloyd’s algorithm

3 Algorithms that avoid distance calculations

4 Experimental results

5 Finally

Greg Hamerly / Baylor University Towards the world’s fastest k-means algorithm

Page 26: Towards the world's fastest k-means algorithmcs.ecs.baylor.edu/~hamerly/software/fast_kmeans_talk_20140515.pdf · Towards the world’s fastest k-means algorithm 1 The k-means clustering

The k-means clustering algorithmOpportunities to speed up Lloyd’s algorithmAlgorithms that avoid distance calculations

Experimental resultsFinally

Avoiding distance calculations

K-d tree:

Pelleg & Moore (1999)

Kanungo et. al (1999)

Triangle inequality:

Moore (2000) (anchors hierarchy)

Phillips (2002) (compare-means, sort-means)

Triangle inequality plus distance bounds (today’s talk):

Elkan (2003)

Hamerly (2010)

Drake (2012)

Annular (2014)

Heap (2014)

Greg Hamerly / Baylor University Towards the world’s fastest k-means algorithm

Page 27: Towards the world's fastest k-means algorithmcs.ecs.baylor.edu/~hamerly/software/fast_kmeans_talk_20140515.pdf · Towards the world’s fastest k-means algorithm 1 The k-means clustering

The k-means clustering algorithmOpportunities to speed up Lloyd’s algorithmAlgorithms that avoid distance calculations

Experimental resultsFinally

Elkan’s k-means

Elkan (2003) proposed using:

`(x , c): k lower bounds per point (one for each center)

u(x): one lower bound per point (for its assigned center)

k2 inter-center distances

s(c): distance from c to the closest other center

Several ways to apply these bounds. Key ones are:

if u(x) ≤ s(a(x))/2, then a(x) is closest to x

if u(x) ≤ `(x , c), then a(x) is closer than c ′ to x

Greg Hamerly / Baylor University Towards the world’s fastest k-means algorithm

Page 28: Towards the world's fastest k-means algorithmcs.ecs.baylor.edu/~hamerly/software/fast_kmeans_talk_20140515.pdf · Towards the world’s fastest k-means algorithm 1 The k-means clustering

The k-means clustering algorithmOpportunities to speed up Lloyd’s algorithmAlgorithms that avoid distance calculations

Experimental resultsFinally

Elkan’s k-means

Elkan (2003) proposed using:

`(x , c): k lower bounds per point (one for each center)

u(x): one lower bound per point (for its assigned center)

k2 inter-center distances

s(c): distance from c to the closest other center

Several ways to apply these bounds. Key ones are:

if u(x) ≤ s(a(x))/2, then a(x) is closest to x

if u(x) ≤ `(x , c), then a(x) is closer than c ′ to x

Greg Hamerly / Baylor University Towards the world’s fastest k-means algorithm

Page 29: Towards the world's fastest k-means algorithmcs.ecs.baylor.edu/~hamerly/software/fast_kmeans_talk_20140515.pdf · Towards the world’s fastest k-means algorithm 1 The k-means clustering

The k-means clustering algorithmOpportunities to speed up Lloyd’s algorithmAlgorithms that avoid distance calculations

Experimental resultsFinally

Elkan’s k-means

Elkan (2003) proposed using:

`(x , c): k lower bounds per point (one for each center)

u(x): one lower bound per point (for its assigned center)

k2 inter-center distances

s(c): distance from c to the closest other center

Several ways to apply these bounds. Key ones are:

if u(x) ≤ s(a(x))/2, then a(x) is closest to x

if u(x) ≤ `(x , c), then a(x) is closer than c ′ to x

Greg Hamerly / Baylor University Towards the world’s fastest k-means algorithm

Page 30: Towards the world's fastest k-means algorithmcs.ecs.baylor.edu/~hamerly/software/fast_kmeans_talk_20140515.pdf · Towards the world’s fastest k-means algorithm 1 The k-means clustering

The k-means clustering algorithmOpportunities to speed up Lloyd’s algorithmAlgorithms that avoid distance calculations

Experimental resultsFinally

Hamerly’s k-means

Hamerly (2010) proposed the following simplifications of Elkan’salgorithm:

`(x): only one lower bound per point (for the second-closestcenter)

no inter-center distances for pruning

Advantages:

Simpler (u(x) ≤ `(x))

Lower memory footprint

Better at skipping innermost loop over centers

Faster in practice in low dimension

Greg Hamerly / Baylor University Towards the world’s fastest k-means algorithm

Page 31: Towards the world's fastest k-means algorithmcs.ecs.baylor.edu/~hamerly/software/fast_kmeans_talk_20140515.pdf · Towards the world’s fastest k-means algorithm 1 The k-means clustering

The k-means clustering algorithmOpportunities to speed up Lloyd’s algorithmAlgorithms that avoid distance calculations

Experimental resultsFinally

Drake’s k-means

Drake (2012) proposed a bridge between Elkan and Hamerly’salgorithms:

`(x , c): b lower bounds per point (1 < b < k), for the bclosest centers

Advantages:

Tunable parameter b

Faster in practice for moderate dimensions

Greg Hamerly / Baylor University Towards the world’s fastest k-means algorithm

Page 32: Towards the world's fastest k-means algorithmcs.ecs.baylor.edu/~hamerly/software/fast_kmeans_talk_20140515.pdf · Towards the world’s fastest k-means algorithm 1 The k-means clustering

The k-means clustering algorithmOpportunities to speed up Lloyd’s algorithmAlgorithms that avoid distance calculations

Experimental resultsFinally

Annular k-means

Hamerly and Drake (2014) proposed an extraacceleration on Hamerly’s algorithm.

Each iteration, order the centers by distancefrom the origin.

When searching for the closest center, use dis-tance bounds to prune the search.

Advantages:

Negligible extra memory and overhead

Large benefit in low dimension

annulus

x

‖x − c(4)‖

c(1)

c(2)

c(3) c(4)

c(5)c(6)

Greg Hamerly / Baylor University Towards the world’s fastest k-means algorithm

Page 33: Towards the world's fastest k-means algorithmcs.ecs.baylor.edu/~hamerly/software/fast_kmeans_talk_20140515.pdf · Towards the world’s fastest k-means algorithm 1 The k-means clustering

The k-means clustering algorithmOpportunities to speed up Lloyd’s algorithmAlgorithms that avoid distance calculations

Experimental resultsFinally

Heap k-means

Hamerly and Drake (2014) inverted the order of loops using kmin-heaps.

For each center cFor each point x assigned to c

Find the closest center to x

Idea: Use priority queues to prune those pointsclose to their assigned centers.

Each cluster has a heap, ordered by the difference between thelower and upper bounds: `(x)− u(x).

Naively, heap priorities change with each center move.Efficient updates are an interesting problem.

Greg Hamerly / Baylor University Towards the world’s fastest k-means algorithm

Page 34: Towards the world's fastest k-means algorithmcs.ecs.baylor.edu/~hamerly/software/fast_kmeans_talk_20140515.pdf · Towards the world’s fastest k-means algorithm 1 The k-means clustering

The k-means clustering algorithmOpportunities to speed up Lloyd’s algorithmAlgorithms that avoid distance calculations

Experimental resultsFinally

Heap k-means

Hamerly and Drake (2014) inverted the order of loops using kmin-heaps.

For each center cFor each point x assigned to c

Find the closest center to x

Idea: Use priority queues to prune those pointsclose to their assigned centers.

Each cluster has a heap, ordered by the difference between thelower and upper bounds: `(x)− u(x).

Naively, heap priorities change with each center move.Efficient updates are an interesting problem.

Greg Hamerly / Baylor University Towards the world’s fastest k-means algorithm

Page 35: Towards the world's fastest k-means algorithmcs.ecs.baylor.edu/~hamerly/software/fast_kmeans_talk_20140515.pdf · Towards the world’s fastest k-means algorithm 1 The k-means clustering

The k-means clustering algorithmOpportunities to speed up Lloyd’s algorithmAlgorithms that avoid distance calculations

Experimental resultsFinally

Heap k-means

Hamerly and Drake (2014) inverted the order of loops using kmin-heaps.

For each center cFor each point x assigned to c

Find the closest center to x

Idea: Use priority queues to prune those pointsclose to their assigned centers.

Each cluster has a heap, ordered by the difference between thelower and upper bounds: `(x)− u(x).

Naively, heap priorities change with each center move.Efficient updates are an interesting problem.

Greg Hamerly / Baylor University Towards the world’s fastest k-means algorithm

Page 36: Towards the world's fastest k-means algorithmcs.ecs.baylor.edu/~hamerly/software/fast_kmeans_talk_20140515.pdf · Towards the world’s fastest k-means algorithm 1 The k-means clustering

The k-means clustering algorithmOpportunities to speed up Lloyd’s algorithmAlgorithms that avoid distance calculations

Experimental resultsFinally

Summary

Upper Lower ClosestAlgorithm Year bound bounds other center Sorting Other

Compare-means (1) 2002 - - x - -Sort-means (1) 2002 - - - k2 centers (2)Elkan 2003 1 k x - (2)Hamerly 2010 1 1 x - -Annular 2014 1 1 x centers -Heap 2014 1 1 x lower - upper -Drake 2012 1 b x lower bounds -

(1) Phillips

(2) k2 center-center distances

Greg Hamerly / Baylor University Towards the world’s fastest k-means algorithm

Page 37: Towards the world's fastest k-means algorithmcs.ecs.baylor.edu/~hamerly/software/fast_kmeans_talk_20140515.pdf · Towards the world’s fastest k-means algorithm 1 The k-means clustering

The k-means clustering algorithmOpportunities to speed up Lloyd’s algorithmAlgorithms that avoid distance calculations

Experimental resultsFinally

SpeedupEffect of dimensionBound effectivenessParallelismMemory use

Towards the world’s fastest k-means algorithm

1 The k-means clustering algorithm

2 Opportunities to speed up Lloyd’s algorithm

3 Algorithms that avoid distance calculations

4 Experimental resultsSpeedupEffect of dimensionBound effectivenessParallelismMemory use

5 Finally Greg Hamerly / Baylor University Towards the world’s fastest k-means algorithm

Page 38: Towards the world's fastest k-means algorithmcs.ecs.baylor.edu/~hamerly/software/fast_kmeans_talk_20140515.pdf · Towards the world’s fastest k-means algorithm 1 The k-means clustering

The k-means clustering algorithmOpportunities to speed up Lloyd’s algorithmAlgorithms that avoid distance calculations

Experimental resultsFinally

SpeedupEffect of dimensionBound effectivenessParallelismMemory use

Datasets

Name Description Number of points n Dimension d

uniform-2/8/32 synthetic, uniform distribution 1,000,000 2/8/32

clustered-2/8/32 synthetic, 50 separated spheri-cal Gaussian clusters

1,000,000 2/8/32

BIRCH 10 × 10 grid of Gaussian clus-ters

100,000 2

MNIST-50 random projection frommnist784

60,000 50

Covertype soil cover measurements 581,012 54

KDD Cup 1998 response rates for fundraisingcampaign

95,412 56

MNIST-784 raster images of handwrittendigits

60,000 784

Greg Hamerly / Baylor University Towards the world’s fastest k-means algorithm

Page 39: Towards the world's fastest k-means algorithmcs.ecs.baylor.edu/~hamerly/software/fast_kmeans_talk_20140515.pdf · Towards the world’s fastest k-means algorithm 1 The k-means clustering

The k-means clustering algorithmOpportunities to speed up Lloyd’s algorithmAlgorithms that avoid distance calculations

Experimental resultsFinally

SpeedupEffect of dimensionBound effectivenessParallelismMemory use

Experimental platform

Linux running on 8-12 core machines with 16 GB of RAM permachine.

Software written in C++ with a lot of shared code for similaralgorithms.

Greg Hamerly / Baylor University Towards the world’s fastest k-means algorithm

Page 40: Towards the world's fastest k-means algorithmcs.ecs.baylor.edu/~hamerly/software/fast_kmeans_talk_20140515.pdf · Towards the world’s fastest k-means algorithm 1 The k-means clustering

The k-means clustering algorithmOpportunities to speed up Lloyd’s algorithmAlgorithms that avoid distance calculations

Experimental resultsFinally

SpeedupEffect of dimensionBound effectivenessParallelismMemory use

Speedup (relative to naive algorithm) for clustered data

2 8 32 128k

0

10

20

30

40

50

60

Speedup (versus naive)

Algorithmic Speedupclustered (n=106 , d=2)

annulus

compare

drake

elkan

hamerly

heap

naive

sort

2 8 32 128k

0

5

10

15

20

25

30

35

Speedup (versus naive)

Algorithmic Speedupclustered (n=106 , d=8)

annulus

compare

drake

elkan

hamerly

heap

naive

sort

2 8 32 128k

0

5

10

15

20

25

30

35

40

45

Speedup (versus naive)

Algorithmic Speedupclustered (n=106 , d=32)

annulus

compare

drake

elkan

hamerly

heap

naive

sort

d = 2 d = 8 d = 32

50 true Gaussians, n = 106.

K varies from 2 to 128.

Greg Hamerly / Baylor University Towards the world’s fastest k-means algorithm

Page 41: Towards the world's fastest k-means algorithmcs.ecs.baylor.edu/~hamerly/software/fast_kmeans_talk_20140515.pdf · Towards the world’s fastest k-means algorithm 1 The k-means clustering

The k-means clustering algorithmOpportunities to speed up Lloyd’s algorithmAlgorithms that avoid distance calculations

Experimental resultsFinally

SpeedupEffect of dimensionBound effectivenessParallelismMemory use

Speedup (relative to naive algorithm) for uniform data

2 8 32 128k

0

5

10

15

20

25

30

35

40

45

Speedup (versus naive)

Algorithmic Speedupuniform (n=106 , d=2)

annulus

compare

drake

elkan

hamerly

heap

naive

sort

2 8 32 128k

0

2

4

6

8

10

12

Speedup (versus naive)

Algorithmic Speedupuniform (n=106 , d=8)

annulus

compare

drake

elkan

hamerly

heap

naive

sort

2 8 32 128k

0

2

4

6

8

10

12

14

16

18

Speedup (versus naive)

Algorithmic Speedupuniform (n=106 , d=32)

annulus

compare

drake

elkan

hamerly

heap

naive

sort

d = 2 d = 8 d = 32

Uniform distribution, n = 106.

K varies from 2 to 128.

Greg Hamerly / Baylor University Towards the world’s fastest k-means algorithm

Page 42: Towards the world's fastest k-means algorithmcs.ecs.baylor.edu/~hamerly/software/fast_kmeans_talk_20140515.pdf · Towards the world’s fastest k-means algorithm 1 The k-means clustering

The k-means clustering algorithmOpportunities to speed up Lloyd’s algorithmAlgorithms that avoid distance calculations

Experimental resultsFinally

SpeedupEffect of dimensionBound effectivenessParallelismMemory use

Speedup (relative to naive algorithm) for Covtype, KDDCup

2 8 32 128k

0

5

10

15

20

25

30

35

Speedup (versus naive)

Algorithmic SpeedupCovertype

annulus

compare

drake

elkan

hamerly

heap

naive

sort

2 8 32 128k

0

5

10

15

20

Speedup (versus naive)

Algorithmic Speedup1998 KDD Cup

annulus

compare

drake

elkan

hamerly

heap

naive

sort

d = 54, n = 581012 d = 56, n = 95412

Greg Hamerly / Baylor University Towards the world’s fastest k-means algorithm

Page 43: Towards the world's fastest k-means algorithmcs.ecs.baylor.edu/~hamerly/software/fast_kmeans_talk_20140515.pdf · Towards the world’s fastest k-means algorithm 1 The k-means clustering

The k-means clustering algorithmOpportunities to speed up Lloyd’s algorithmAlgorithms that avoid distance calculations

Experimental resultsFinally

SpeedupEffect of dimensionBound effectivenessParallelismMemory use

Speedup (relative to naive algorithm) for MNIST

2 8 32 128k

0

5

10

15

20

25

30

35

Speedup (versus naive)

Algorithmic SpeedupMNIST-784

annulus

compare

drake

elkan

hamerly

heap

naive

sort

2 8 32 128k

0

5

10

15

20

25

30

Speedup (versus naive)

Algorithmic SpeedupMNIST-50

annulus

compare

drake

elkan

hamerly

heap

naive

sort

d = 784, n = 60000 d = 50, n = 60000

Greg Hamerly / Baylor University Towards the world’s fastest k-means algorithm

Page 44: Towards the world's fastest k-means algorithmcs.ecs.baylor.edu/~hamerly/software/fast_kmeans_talk_20140515.pdf · Towards the world’s fastest k-means algorithm 1 The k-means clustering

The k-means clustering algorithmOpportunities to speed up Lloyd’s algorithmAlgorithms that avoid distance calculations

Experimental resultsFinally

SpeedupEffect of dimensionBound effectivenessParallelismMemory use

Curse of dimensionality

annu

lus

compa

redra

keelk

an

hamerl

yhe

apna

ive sort

algorithm

0

1

2

3

4

5

6

7

num

ber of dis

tance

calc

ula

tions

1e10

# distances; uniform (n=106 , d=2)k=128; 484 iters

annulus

compare

drake

elkan

hame

rlyheap

naive sor

t

algorithm

0.0

0.2

0.4

0.6

0.8

1.0

1.2

1.4

1.6

1.8

number of distance calculations

1e10

# distances; uniform (n=106 , d=8)k=128; 136 iters

annulus

compare

drake

elkan

hame

rlyheap

naive sor

t

algorithm

0.0

0.5

1.0

1.5

2.0

2.5

3.0

number of distance calculations

1e11

# distances; uniform (n=106 , d=32)k=128; 2269 iters

d = 2 d = 8 d = 32

Uniform data, k = 128.

Reporting number of distance calculations.

Algorithms which use bounds do much better.

Greg Hamerly / Baylor University Towards the world’s fastest k-means algorithm

Page 45: Towards the world's fastest k-means algorithmcs.ecs.baylor.edu/~hamerly/software/fast_kmeans_talk_20140515.pdf · Towards the world’s fastest k-means algorithm 1 The k-means clustering

The k-means clustering algorithmOpportunities to speed up Lloyd’s algorithmAlgorithms that avoid distance calculations

Experimental resultsFinally

SpeedupEffect of dimensionBound effectivenessParallelismMemory use

Effectiveness of bounds

Elkan and Drake’s algorithms use multiple lower bounds.

Which bounds are most effective?

Hamerly showed the single lower bound can avoid 80+% ofinnermost loops, regardless of dataset and dimension.

Drake showed:

In early iterations (< 10), the first several bounds are mosteffective.

After that, the first bound prevents 90+% of avoided distancecalculations.

Greg Hamerly / Baylor University Towards the world’s fastest k-means algorithm

Page 46: Towards the world's fastest k-means algorithmcs.ecs.baylor.edu/~hamerly/software/fast_kmeans_talk_20140515.pdf · Towards the world’s fastest k-means algorithm 1 The k-means clustering

The k-means clustering algorithmOpportunities to speed up Lloyd’s algorithmAlgorithms that avoid distance calculations

Experimental resultsFinally

SpeedupEffect of dimensionBound effectivenessParallelismMemory use

K-means has natural parallelism

0 2 4 6 8 10 12 14number of threads

0

2

4

6

8

10

12

speedup (versus one thread)

Parallel Speedupuniform (n=106 , d=8); k=32

annulus

compare

drake

elkan

hamerly

heap

naive

sort

0 2 4 6 8 10 12 14number of threads

0

2

4

6

8

10

12

speedup (versus one thread)

Parallel Speedupuniform (n=106 , d=8); k=128

annulus

compare

drake

elkan

hamerly

heap

naive

sort

k = 32 k = 128

Used pthreads on 12-core machine.Naive algorithm embarrassingly parallel within an iteration.Partition data over threads, replicate centers.Acceleration can cause work imbalance and addsynchronization.

Greg Hamerly / Baylor University Towards the world’s fastest k-means algorithm

Page 47: Towards the world's fastest k-means algorithmcs.ecs.baylor.edu/~hamerly/software/fast_kmeans_talk_20140515.pdf · Towards the world’s fastest k-means algorithm 1 The k-means clustering

The k-means clustering algorithmOpportunities to speed up Lloyd’s algorithmAlgorithms that avoid distance calculations

Experimental resultsFinally

SpeedupEffect of dimensionBound effectivenessParallelismMemory use

Memory overhead

annulus

compare

drake

elkan

hamerl

yheap

naive sor

t

algorithm

0

50

100

150

200

250

300

350

memory use

d (MB)

Memory useduniform (n=106 , d=32); k=8

annulus

compare

drake

elkan

hame

rlyheap

naive sor

t

algorithm

0

100

200

300

400

500

600

memory used (MB)

Memory useduniform (n=106 , d=32); k=32

annulus

compare

drake

elkan

hame

rlyheap

naive sor

t

algorithm

0

200

400

600

800

1000

1200

1400

memory used (MB)

Memory useduniform (n=106 , d=32); k=128

k = 8 k = 32 k = 128

Uniform dataset, d = 32.

Algorithms using 1 lower bound use negligible extra memory.

Drake & Elkan’s algorithms use significantly more memorywhen k is large.

Greg Hamerly / Baylor University Towards the world’s fastest k-means algorithm

Page 48: Towards the world's fastest k-means algorithmcs.ecs.baylor.edu/~hamerly/software/fast_kmeans_talk_20140515.pdf · Towards the world’s fastest k-means algorithm 1 The k-means clustering

The k-means clustering algorithmOpportunities to speed up Lloyd’s algorithmAlgorithms that avoid distance calculations

Experimental resultsFinally

Towards the world’s fastest k-means algorithm

1 The k-means clustering algorithm

2 Opportunities to speed up Lloyd’s algorithm

3 Algorithms that avoid distance calculations

4 Experimental results

5 Finally

Greg Hamerly / Baylor University Towards the world’s fastest k-means algorithm

Page 49: Towards the world's fastest k-means algorithmcs.ecs.baylor.edu/~hamerly/software/fast_kmeans_talk_20140515.pdf · Towards the world’s fastest k-means algorithm 1 The k-means clustering

The k-means clustering algorithmOpportunities to speed up Lloyd’s algorithmAlgorithms that avoid distance calculations

Experimental resultsFinally

Discussion

Key to acceleration: cached bounds, updated using the triangleinequality.

More lower bounds avoids more distances, giving betterperformance in high dimension.

Low dimension (< 50) really only needs one lower bound.

Memory impact is negligible for one lower bound.

Greg Hamerly / Baylor University Towards the world’s fastest k-means algorithm

Page 50: Towards the world's fastest k-means algorithmcs.ecs.baylor.edu/~hamerly/software/fast_kmeans_talk_20140515.pdf · Towards the world’s fastest k-means algorithm 1 The k-means clustering

The k-means clustering algorithmOpportunities to speed up Lloyd’s algorithmAlgorithms that avoid distance calculations

Experimental resultsFinally

Future work

Theoretical lower bounds on required number of distancecalculations.

Other clever ways to avoid doing work; e.g. other bounds.

Accelerating other algorithms using these techniques.

Dynamic nearest neighbor search.

Clustering of dynamic datasets – any takers?

Greg Hamerly / Baylor University Towards the world’s fastest k-means algorithm

Page 51: Towards the world's fastest k-means algorithmcs.ecs.baylor.edu/~hamerly/software/fast_kmeans_talk_20140515.pdf · Towards the world’s fastest k-means algorithm 1 The k-means clustering

The k-means clustering algorithmOpportunities to speed up Lloyd’s algorithmAlgorithms that avoid distance calculations

Experimental resultsFinally

Conclusion

K-means is popular, and easy to imple-ment.

Therefore, everyone implementsit... slowly.

Simple acceleration methods exist thatuse little extra memory.

Key ideas: caching, triangleinequality, and distance bounds.

Software (C++) is available, just email [email protected]

Questions?

Greg Hamerly / Baylor University Towards the world’s fastest k-means algorithm

Page 52: Towards the world's fastest k-means algorithmcs.ecs.baylor.edu/~hamerly/software/fast_kmeans_talk_20140515.pdf · Towards the world’s fastest k-means algorithm 1 The k-means clustering

The k-means clustering algorithmOpportunities to speed up Lloyd’s algorithmAlgorithms that avoid distance calculations

Experimental resultsFinally

References

Accelerating k-means:

Pelleg, Moore. Accelerating exact k-means with geometric reasoning. KDD, 1999.

Moore. The anchors hierarchy: Using the triangle inequality to survive high-dimensional data. UAI, 2000.

Phillips. Acceleration of k-means and related clustering algorithms. ALENEX, 2002.

Kanungo et. al. An efficient k-means clustering algorithm: analysis and an algorithm. TPAMI, 2002.

Elkan. Using the triangle inequality to accelerate k-means. ICML, 2003.

Hamerly. Making k-means even faster. SDM, 2010.

Drake, Hamerly. Accelerated k-means with adaptive distance bounds. OPT, 2012.

Drake, Hamerly. Accelerating Lloyd’s algorithm for k-means clustering. Chapter in forthcoming Springerbook (to appear).

Other references:

Lloyd. Least squares quantization in PCM. Trans. Inf. Theory, 1982.

Dasgupta. Experiments with random projection. UAI, 2001.

Vassilvitskii, Arthur. k-means++: The advantages of careful seeding. SODA, 2007.

Greg Hamerly / Baylor University Towards the world’s fastest k-means algorithm

Page 53: Towards the world's fastest k-means algorithmcs.ecs.baylor.edu/~hamerly/software/fast_kmeans_talk_20140515.pdf · Towards the world’s fastest k-means algorithm 1 The k-means clustering

Other acceleration methods

Towards the world’s fastest k-means algorithm

6 Other acceleration methods

Greg Hamerly / Baylor University Towards the world’s fastest k-means algorithm

Page 54: Towards the world's fastest k-means algorithmcs.ecs.baylor.edu/~hamerly/software/fast_kmeans_talk_20140515.pdf · Towards the world’s fastest k-means algorithm 1 The k-means clustering

Other acceleration methods

Tree indexes

Pelleg & Moore, Kanungo & Mount (1999) each separatelyproposed using k-d trees to accelerate k-means.

Works well in low dimension, but slow above about 8 dimensions.

Moore (2000) proposed a new structure, the anchors hierarchy,based on the triangle inequality. Uses carefully-chosen ‘anchors’.

Built middle-out (rather than top-down).

Common disadvantages: extra structure, complicated,preprocessing, don’t adapt to changing centers.

Greg Hamerly / Baylor University Towards the world’s fastest k-means algorithm

Page 55: Towards the world's fastest k-means algorithmcs.ecs.baylor.edu/~hamerly/software/fast_kmeans_talk_20140515.pdf · Towards the world’s fastest k-means algorithm 1 The k-means clustering

Other acceleration methods

Tree indexes

Pelleg & Moore, Kanungo & Mount (1999) each separatelyproposed using k-d trees to accelerate k-means.

Works well in low dimension, but slow above about 8 dimensions.

Moore (2000) proposed a new structure, the anchors hierarchy,based on the triangle inequality. Uses carefully-chosen ‘anchors’.

Built middle-out (rather than top-down).

Common disadvantages: extra structure, complicated,preprocessing, don’t adapt to changing centers.

Greg Hamerly / Baylor University Towards the world’s fastest k-means algorithm

Page 56: Towards the world's fastest k-means algorithmcs.ecs.baylor.edu/~hamerly/software/fast_kmeans_talk_20140515.pdf · Towards the world’s fastest k-means algorithm 1 The k-means clustering

Other acceleration methods

Sampling-based approximations

Downsample the dataset and cluster just that sample.

Stochastic gradient descent (Bottou and Bengio 1995): movecenters after considering each example.

Mini-batch (Sculley 2010): stochastic gradient descent using smallsamples.

Greg Hamerly / Baylor University Towards the world’s fastest k-means algorithm

Page 57: Towards the world's fastest k-means algorithmcs.ecs.baylor.edu/~hamerly/software/fast_kmeans_talk_20140515.pdf · Towards the world’s fastest k-means algorithm 1 The k-means clustering

Other acceleration methods

Projection to combat dimensionality problems

The curse of dimensionality limits acceleration algorithms.

Random projection (see Dasgupta 2000) is an excellent way toreduce the dimension of data for clustering.

fast – linear time

tends to produce spherical, well-separated clusters

Applying random projection:

generate a random projection matrix P

project the data using P

cluster in the low-dimension space

project clusterings back to original space using assignments

finish clustering in original space (if desired)

Greg Hamerly / Baylor University Towards the world’s fastest k-means algorithm

Page 58: Towards the world's fastest k-means algorithmcs.ecs.baylor.edu/~hamerly/software/fast_kmeans_talk_20140515.pdf · Towards the world’s fastest k-means algorithm 1 The k-means clustering

Other acceleration methods

Good initializations

A good initialization leads to few k-means iterations.

K-means++ is the best current initialization method, but it isslow.

Runs in time O(nkd).

Can apply triangle inequality to reduce the d factor.

Can we do it faster? (Current work.)

Greg Hamerly / Baylor University Towards the world’s fastest k-means algorithm

Page 59: Towards the world's fastest k-means algorithmcs.ecs.baylor.edu/~hamerly/software/fast_kmeans_talk_20140515.pdf · Towards the world’s fastest k-means algorithm 1 The k-means clustering

Other acceleration methods

Partial distance search

Partial distance search can prune parts of distance calculations,especially in high dimension.

Suppose d is dimension, d ′ < d , and x , a, b are d-dimensionalpoints:

d∑i=1

(xi − ai )2 ≤

d ′∑i=1

(xi − bi )2

Then we know a is closer than b to x , even before computing thedistance between x and b.

This works for k-means: any known distance (or upper bound) canprune the search.

Greg Hamerly / Baylor University Towards the world’s fastest k-means algorithm