Theoretical Foundations of Clustering – CS497 Talk Shai Ben-David February 2007.
-
Upload
belinda-potter -
Category
Documents
-
view
222 -
download
0
Transcript of Theoretical Foundations of Clustering – CS497 Talk Shai Ben-David February 2007.
Theoretical Foundations of Clustering – CS497 Talk
Shai Ben-David
February 2007
What is clustering?
Given a collection of objects (characterized by feature vectors, or just a matrix of pair-wise similarities), detects the presence of distinct groups, and assign objects to groups.
40 45 50 55
74
76
78
80
82
84
Another example
Why should we care about clustering?
Clustering is a basic step in most data mining procedures:
Examples :
Clustering movie viewers for movie ranking.
Clustering proteins by their functionality.
Clustering text documents for content similarity.
Clustering is one of the most widely used toolfor exploratory data analysis. Social Sciences Biology Astronomy Computer Science . .All apply clustering to gain a first understanding of the structure of large data sets.
The Theory-Practice Gap
Yet, there exist distressingly little theoretical understanding of clustering
What questions should research address? What is clustering?
What is “good” clustering?
Can clustering be carried out efficiently?
Can we distinguish “clusterable” from “structureless” data?
Many more …
“Clustering” is an ill defined problem
There are many different clustering tasks, leading to different clustering paradigms:
There are Many Clustering TasksThere are Many Clustering Tasks
“Clustering” is an ill defined problem
There are many different clustering tasks, leading to different clustering paradigms:
There are Many Clustering TasksThere are Many Clustering Tasks
Some more examples
-2 0 2
-3-2
-10
12
3
2-d data set
-2 0 2
-3-2
-10
12
3
Compact partitioning into tw o strata
-2 0 2
-3-2
-10
12
3
Unsupervised learning
I would like to discuss the broad notion of clustering
Independently of any particular algorithm, objective function or generative data model
What for?
Choosing a suitable algorithm for a given task.
Evaluating the quality of clustering methods. Distinguishing significant structure from random fata morgana,
Distinguishing clusterable data from structure-less data.
Providing performance guarantees for clustering algorithms.
Much more …
The Basic Setting
For a finite domain set SS, a dissimilarity function (DF) is a symmetric mapping
d:SxSd:SxS → RR++ such that d(x,y)=0d(x,y)=0 iff x=yx=y.
A clustering function takes a dissimilarity function on SS and returns a partition of SS.
We wish to define the properties that distinguish clustering functions (from any other functions that output domain partitions).
Kleinberg’s Axioms
Scale Invariance F(F(λλd)=F(d)d)=F(d) for all d d and all strictly positive λλ.
Richness For any finite domain SS, {F(d): d {F(d): d is a DF over S}={P:P S}={P:P a partition of S} S}
Consistency d’d’ equals dd except for shrinking distances within
clusters of F(d)F(d) or stretching between-cluster distances (w.r.t. F(d)F(d)), then F(d)=F(d’).F(d)=F(d’).
Note that any pair is realizable
Consider Single-Linkage with different stopping criteria:
k connected components. Distance r stopping. Scale α stopping:
add edges as long as their length is
at most α(max-distance)
Kleinberg’s Impossibility result
There exist no clustering function
Proof:
Scaling up
Consistency
A Different Perspective
The goal is to generate a variety of axioms (or properties) over a fixed framework, so that different clustering approaches could be classified by the different subsets of axioms they satisfy.
Axioms as a tool for classifying clustering paradigms
Axioms as a tool for a taxonomy of clustering paradigms
The goal is to generate a variety of axioms (or properties) over a fixed framework, so that different clustering approaches could be classified by the different subsets of axioms they satisfy.
Scale Invariance
Richness Local Consistency
Full Consistency
Single Linkage - + + +Center Based + + + -Spectral + + + -MDL + + -Rate Distortion + + -
“Axioms” “Properties”
Ideal Theory
We would like to have a list of simple properties so that major clustering methods are distinguishable from
each other using these properties.
We would like the axioms to be such that all methods satisfy all of them, and nothing that is clearly not a clustering satisfies all of them.
(this is probably too much to hope for).
In the remainder of this talk, I would like to discuss some candidate “axioms” and “properties” to get a taste of
what this theory-development program may involve.
Types of Axioms/Properties
Richness requirements
E.g., relaxations of Kelinberg’s richness, e.g.,
{F(d): d {F(d): d is a DF over S}={P:P S}={P:P a partition of S S into k k sets}}
Invariance/Robustness/Stability requirements.
E.g., Scale-Invariance, Consistency, robustness
to perturbations of dd (“smoothness” of F F) or stability
w.r.t. sampling of SS.
Relaxations of Consistency
Local Consistency –
Let CC11, …C…Ckk be the clusters of F(d). F(d).
For every λλ0 0 ≥ 1 ≥ 1 and positive λλ11, .., ..λλk k ≤≤ 11, if d’d’ is defined by:
λλiid(a,b)d(a,b) if aa and bb are in CCii
d’(a,b)=d’(a,b)=
λλ00d(a,b)d(a,b) if a,ba,b are not in the same F(d)F(d)--cluster,
then F(d)=F(d’).F(d)=F(d’).
Is there any known clustering method for which it fails?
Other types of clustering
Culotta and McCallum’s “Clusterwise Similarity”
Edge-Detection (advantage to smooth contours)
Texture clustering
The professors example.
Conclusions and open questions
There is a place for developing an axiomatic framework for clustering.
The existing negative results do not rule
out the possibility of useful axiomatization. We should also develop a system of “clustering
properties” for a taxonomy of clustering methods. There are many possible routes to take and
hidden subtleties in this project.