Theoretical Foundations of Clustering – CS497 Talk Shai Ben-David February 2007.

What is clustering?

Given a collection of objects (characterized by feature vectors, or just by a matrix of pairwise similarities), detect the presence of distinct groups and assign objects to groups.

[Figure: scatter plot of an example data set; horizontal axis from about 40 to 55, vertical axis from about 74 to 84]

Another example

Why should we care about clustering?

Clustering is a basic step in most data mining procedures:

Examples:

Clustering movie viewers for movie ranking.

Clustering proteins by their functionality.

Clustering text documents for content similarity.

Clustering is one of the most widely used tools for exploratory data analysis. Social sciences, biology, astronomy, computer science, and other fields all apply clustering to gain a first understanding of the structure of large data sets.

The Theory-Practice Gap

Yet there exists distressingly little theoretical understanding of clustering.

What questions should research address?

What is clustering?

What is “good” clustering?

Can clustering be carried out efficiently?

Can we distinguish “clusterable” from “structureless” data?

Many more …

“Clustering” is an ill-defined problem

There are many different clustering tasks, leading to different clustering paradigms:

There are Many Clustering Tasks

Some more examples

[Figure: three scatter-plot panels over the same axes (horizontal from -2 to 2, vertical from -3 to 3), titled “2-d data set”, “Compact partitioning into two strata”, and “Unsupervised learning”]

I would like to discuss the broad notion of clustering, independently of any particular algorithm, objective function, or generative data model.

What for?

Choosing a suitable algorithm for a given task.

Evaluating the quality of clustering methods.

Distinguishing significant structure from random fata morgana.

Distinguishing clusterable data from structureless data.

Providing performance guarantees for clustering algorithms.

Much more …

The Basic Setting

For a finite domain set $S$, a dissimilarity function (DF) is a symmetric mapping $d : S \times S \to \mathbb{R}^{+}$ such that $d(x,y) = 0$ iff $x = y$.

A clustering function takes a dissimilarity function on $S$ and returns a partition of $S$.
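The two definitions above can be checked mechanically on small inputs. The following is a minimal Python sketch, not part of the original talk: the helper names `is_dissimilarity` and `is_partition` are hypothetical, and the points are assumed to be distinct and hashable. Note that the definition does not require the triangle inequality, so a DF need not be a metric.

```python
from itertools import combinations


def is_dissimilarity(points, d):
    """Check that d is a dissimilarity function on the distinct points given:
    symmetric, zero on identical points, strictly positive otherwise."""
    for x in points:
        if d(x, x) != 0:
            return False
    for x, y in combinations(points, 2):
        if d(x, y) != d(y, x):   # symmetry
            return False
        if d(x, y) <= 0:         # distinct points must be strictly apart
            return False
    return True


def is_partition(points, clusters):
    """Check that clusters are non-empty, pairwise disjoint, and cover points."""
    members = [x for c in clusters for x in c]
    return (
        all(len(c) > 0 for c in clusters)
        and len(members) == len(set(members))   # no point appears twice
        and set(members) == set(points)         # every point is covered
    )


# Illustrative data (an assumption for this sketch): three points on a line.
S = [0.0, 1.0, 5.0]
abs_diff = lambda x, y: abs(x - y)
```

In this setting a clustering function is any callable that maps `(S, d)` to a list of sets passing `is_partition`.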

We wish to define the properties that distinguish clustering functions (from any other functions that output domain partitions).

Kleinberg’s Axioms

Scale Invariance: $F(\lambda d) = F(d)$ for all $d$ and all strictly positive $\lambda$.

Richness: For any finite domain $S$, $\{F(d) : d \text{ is a DF over } S\} = \{P : P \text{ is a partition of } S\}$.

Consistency: If $d'$ equals $d$ except for shrinking distances within clusters of $F(d)$ or stretching between-cluster distances (w.r.t. $F(d)$), then $F(d) = F(d')$.
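The transformations that Scale Invariance and Consistency quantify over can be written down directly. The sketch below is an illustration under stated assumptions, not code from the talk: a dissimilarity function is stored as a dict from point pairs to distances, a partition as a list of sets, and the names `scaled` and `is_consistent_change` are hypothetical.

```python
def scaled(d, lam):
    """Return the dissimilarity lam * d (the change Scale Invariance refers to)."""
    if lam <= 0:
        raise ValueError("lam must be strictly positive")
    return {pair: lam * dist for pair, dist in d.items()}


def is_consistent_change(d, d_new, partition):
    """Return True if d_new differs from d only by shrinking within-cluster
    distances and stretching between-cluster distances w.r.t. partition."""
    cluster_of = {x: i for i, c in enumerate(partition) for x in c}
    for (a, b), old in d.items():
        new = d_new[(a, b)]
        if cluster_of[a] == cluster_of[b]:
            if new > old:        # within-cluster distances may only shrink
                return False
        elif new < old:          # between-cluster distances may only grow
            return False
    return True


# Illustrative data (an assumption for this sketch).
d = {("a", "b"): 1.0, ("a", "c"): 4.0, ("b", "c"): 5.0}
partition = [{"a", "b"}, {"c"}]
```

A clustering function F then satisfies Consistency if `F(d_new) == F(d)` whenever `is_consistent_change(d, d_new, F(d))` holds and `d_new` is still a dissimilarity function.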

Note that any pair of these axioms is realizable.

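Consider Single-Linkage with different stopping criteria:

k connected components: stop once k components remain (Scale Invariance and Consistency hold, Richness fails).

Distance r stopping: add only edges of length at most r (Richness and Consistency hold, Scale Invariance fails).

Scale α stopping: add edges as long as their length is at most α · (max-distance) (Scale Invariance and Richness hold, Consistency fails).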
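The three stopping criteria can be compared side by side with a small implementation. This is a minimal sketch, assuming a Kruskal-style union-find formulation of single linkage; the function name `single_linkage` and its keyword parameters are hypothetical. With tied edge lengths, the k-stopping result can depend on the order in which ties are processed.

```python
from itertools import combinations


def single_linkage(points, d, k=None, r=None, alpha=None):
    """Single linkage with exactly one stopping rule:
    k     -> stop once k connected components remain,
    r     -> add only edges of length at most r,
    alpha -> add only edges of length at most alpha * (largest distance)."""
    if sum(p is not None for p in (k, r, alpha)) != 1:
        raise ValueError("give exactly one of k, r, alpha")

    parent = {x: x for x in points}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    edges = sorted(combinations(points, 2), key=lambda e: d(*e))
    if alpha is not None:
        # Scale-alpha stopping is distance stopping with a data-dependent radius.
        r = alpha * max((d(a, b) for a, b in edges), default=0.0)

    components = len(points)
    for a, b in edges:
        if k is not None and components <= k:
            break
        if r is not None and d(a, b) > r:
            break
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b
            components -= 1

    clusters = {}
    for x in points:
        clusters.setdefault(find(x), set()).add(x)
    return sorted(clusters.values(), key=min)   # deterministic cluster order


# Illustrative data (an assumption for this sketch): two groups on a line.
pts = [0.0, 1.0, 2.0, 10.0, 11.0]
dist = lambda x, y: abs(x - y)
dist_times_10 = lambda x, y: 10 * abs(x - y)
```

Running all three variants on `dist` and then on `dist_times_10` shows the asymmetry between the rules: the k and α variants return the same partition for both, while distance r stopping with a fixed r does not, which is the failure of Scale Invariance in miniature.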

Kleinberg’s Impossibility result

There exists no clustering function that simultaneously satisfies Scale Invariance, Richness, and Consistency.

Proof:

Scaling up

Consistency
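One way to read the two proof steps is the following short argument, given here as a sketch. It assumes $|S| \ge 2$ and that $F$ satisfies Scale Invariance, Richness, and Consistency; whether it matches the pictures on the original slide is an assumption.

```latex
\begin{enumerate}
  \item By Richness, there is a DF $d_1$ with $F(d_1) = \{\{x\} : x \in S\}$
        (all singletons), and a DF $d_2$ with $F(d_2) = P$ for some
        partition $P \neq F(d_1)$.
  \item \emph{Scaling up.} Since $S$ is finite and $d_2(x,y) > 0$ for
        $x \neq y$, there is a $\lambda > 0$ with
        $\lambda d_2(x,y) \ge d_1(x,y)$ for all $x \neq y$.
        By Scale Invariance, $F(\lambda d_2) = F(d_2) = P$.
  \item \emph{Consistency.} Under $F(d_1)$ every pair of distinct points
        lies in different clusters, so $\lambda d_2$ is obtained from $d_1$
        purely by stretching between-cluster distances.
        By Consistency, $F(\lambda d_2) = F(d_1)$.
  \item Hence $P = F(\lambda d_2) = F(d_1)$, contradicting $P \neq F(d_1)$.
\end{enumerate}
```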

A Different Perspective

The goal is to generate a variety of axioms (or properties) over a fixed framework, so that different clustering approaches could be classified by the different subsets of axioms they satisfy.

Axioms as a tool for classifying clustering paradigms

Axioms as a tool for a taxonomy of clustering paradigms

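Method | Scale Invariance | Richness | Local Consistency | Full Consistency

Single Linkage | - | + | + | +

Center Based | + | + | + | -

Spectral | + | + | + | -

MDL | + + - (only three marks survive in the source; their column assignment is unclear)

Rate Distortion | + + - (only three marks survive in the source; their column assignment is unclear)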

“Axioms” “Properties”

Ideal Theory

We would like to have a list of simple properties so that major clustering methods are distinguishable from each other using these properties.

We would like the axioms to be such that all methods satisfy all of them, and nothing that is clearly not a clustering satisfies all of them.

(This is probably too much to hope for.)

In the remainder of this talk, I would like to discuss some candidate “axioms” and “properties” to get a taste of what this theory-development program may involve.

Types of Axioms/Properties

Richness requirements

E.g., relaxations of Kleinberg’s Richness, such as

$\{F(d) : d \text{ is a DF over } S\} = \{P : P \text{ is a partition of } S \text{ into } k \text{ sets}\}$

Invariance/Robustness/Stability requirements.

E.g., Scale Invariance, Consistency, robustness to perturbations of $d$ (“smoothness” of $F$), or stability w.r.t. sampling of $S$.

Relaxations of Consistency

Local Consistency:

Let $C_1, \ldots, C_k$ be the clusters of $F(d)$.

For every $\lambda_0 \ge 1$ and positive $\lambda_1, \ldots, \lambda_k \le 1$, if $d'$ is defined by

$$d'(a,b) = \begin{cases} \lambda_i \, d(a,b) & \text{if } a \text{ and } b \text{ are in } C_i, \\ \lambda_0 \, d(a,b) & \text{if } a, b \text{ are not in the same } F(d)\text{-cluster,} \end{cases}$$

then $F(d) = F(d')$.
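The rescaling that Local Consistency allows is easy to construct explicitly. The sketch below is illustrative only: it assumes a dissimilarity function stored as a dict from point pairs to distances and a partition stored as a list of sets, and the name `locally_rescaled` and its parameters are hypothetical.

```python
def locally_rescaled(d, partition, lam_between, lam_within):
    """Build d' from d: multiply distances inside cluster i by lam_within[i]
    (each in (0, 1]) and all between-cluster distances by lam_between (>= 1)."""
    if lam_between < 1:
        raise ValueError("lam_between must be at least 1")
    if len(lam_within) != len(partition):
        raise ValueError("need one within-cluster factor per cluster")
    if any(not (0 < lam <= 1) for lam in lam_within):
        raise ValueError("within-cluster factors must lie in (0, 1]")

    cluster_of = {x: i for i, c in enumerate(partition) for x in c}
    d_new = {}
    for (a, b), dist in d.items():
        if cluster_of[a] == cluster_of[b]:
            d_new[(a, b)] = lam_within[cluster_of[a]] * dist   # uniform shrink
        else:
            d_new[(a, b)] = lam_between * dist                 # uniform stretch
    return d_new


# Illustrative data (an assumption for this sketch).
d = {("a", "b"): 2.0, ("a", "c"): 4.0, ("b", "c"): 6.0}
clusters = [{"a", "b"}, {"c"}]
d_prime = locally_rescaled(d, clusters, lam_between=2.0, lam_within=[0.5, 1.0])
```

Every such d' only shrinks within-cluster distances and stretches between-cluster ones, so any F that satisfies Consistency also satisfies Local Consistency; the relaxation lies in allowing only one uniform factor per cluster rather than arbitrary per-pair changes.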

Is there any known clustering method for which it fails?

Other types of clustering

Culotta and McCallum’s “Clusterwise Similarity”

Edge-Detection (advantage to smooth contours)

Texture clustering

The professors example.

Conclusions and open questions

There is a place for developing an axiomatic framework for clustering.

The existing negative results do not rule out the possibility of useful axiomatization.

We should also develop a system of “clustering properties” for a taxonomy of clustering methods.

There are many possible routes to take and hidden subtleties in this project.
