Other Clustering Algorithms
MST: Divisive Hierarchical Clustering
- Build the MST (Minimum Spanning Tree):
  - Start with a tree that consists of any single point.
  - In successive steps, look for the closest pair of points (p, q) such that one point (p) is in the current tree but the other (q) is not.
  - Add q to the tree and put an edge between p and q.
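For concreteness, a minimal sketch of this construction (Prim's algorithm), assuming a precomputed n x n pairwise distance matrix `dist` as input:

```python
import numpy as np

def build_mst(dist):
    """Return MST edges (p, q, d) for an n x n distance matrix."""
    n = dist.shape[0]
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True                      # start with an arbitrary point
    edges = []
    for _ in range(n - 1):
        # closest pair (p, q) with p in the tree and q outside it
        masked = np.where(np.outer(in_tree, ~in_tree), dist, np.inf)
        p, q = np.unravel_index(np.argmin(masked), masked.shape)
        edges.append((p, q, dist[p, q]))
        in_tree[q] = True                  # add q and the edge (p, q)
    return edges
```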
MST: Divisive Hierarchical Clustering
- Use the MST to construct a hierarchy of clusters: each divisive step breaks the largest remaining edge of the MST, splitting one cluster into two.
DBSCAN
- DBSCAN is a density-based algorithm.
  - Density = number of points within a specified radius (Eps).
- A point is a core point if it has more than a specified number of points (MinPts) within Eps.
  - These are points that are in the interior of a cluster.
- A border point has fewer than MinPts points within Eps, but is in the neighborhood of a core point.
- A noise point is any point that is not a core point or a border point.
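A minimal sketch of these definitions, assuming a NumPy array `X` of points; `eps` and `min_pts` are the parameters above (conventions differ on whether the count is strict, so `>=` here is a choice):

```python
import numpy as np

def point_types(X, eps, min_pts):
    d = np.linalg.norm(X[:, None] - X[None, :], axis=-1)  # pairwise distances
    counts = (d <= eps).sum(axis=1)          # neighborhood sizes (includes self)
    core = counts >= min_pts                 # core: at least MinPts within Eps
    border = ~core & (d[:, core] <= eps).any(axis=1)  # non-core near a core point
    noise = ~core & ~border                  # neither core nor border
    return core, border, noise
```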
DBSCAN: Core, Border, and Noise Points
DBSCAN Algorithm
- Eliminate noise points.
- Perform clustering on the remaining points.
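DBSCAN is widely available off the shelf; a usage sketch with scikit-learn, assuming a data array `X` (the parameter values echo the Eps = 10, MinPts = 4 example on the next slide):

```python
from sklearn.cluster import DBSCAN

labels = DBSCAN(eps=10, min_samples=4).fit_predict(X)
# label -1 marks noise points; all other labels are cluster ids
```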
DBSCAN: Core, Border and Noise Points
[Figure: original points and their point types (core, border, noise); Eps = 10, MinPts = 4.]
When DBSCAN Works Well
[Figure: original points and the resulting clusters.]
- Resistant to noise.
- Can handle clusters of different shapes and sizes.
When DBSCAN Does NOT Work Well
[Figure: original points and two failure cases, with (MinPts = 4, Eps = 9.75) and (MinPts = 4, Eps = 9.92).]
- Varying densities.
- High-dimensional data.
DBSCAN: Determining Eps and MinPts
- The idea is that for points in a cluster, their k-th nearest neighbors are at roughly the same distance.
- Noise points have their k-th nearest neighbor at a farther distance.
- So, plot the sorted distance of every point to its k-th nearest neighbor; a sharp bend ("knee") in the curve suggests a good value for Eps.
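A sketch of this diagnostic plot with scikit-learn, assuming a data array `X`; k = MinPts is the usual choice:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import NearestNeighbors

k = 4                                             # k = MinPts is the usual choice
nn = NearestNeighbors(n_neighbors=k + 1).fit(X)   # +1: each point is its own 0th neighbor
dist, _ = nn.kneighbors(X)
kth = np.sort(dist[:, -1])                        # each point's k-th NN distance
plt.plot(kth)                                     # read Eps off the knee of the curve
plt.ylabel(f"distance to {k}th nearest neighbor")
plt.show()
```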
Cluster Validity
- For supervised classification we have a variety of measures to evaluate how good our model is: accuracy, precision, recall.
- For cluster analysis, the analogous question is: how do we evaluate the "goodness" of the resulting clusters?
- But clusters are in the eye of the beholder!
- Then why do we want to evaluate them?
  - To avoid finding patterns in noise
  - To compare clustering algorithms
  - To compare two sets of clusters
  - To compare two clusters
Graph Theoretic Interpretation
- As we successively find the nearest pair of clusters to join, will this joining distance go up or down?
- Let d_n now be the distance between the clusters merged at step n (counting from the bottom).
- Let G(d_n) be a graph that has edges between all data points within distance d_n of each other. Then the clusters after step n are the connected components of G(d_n). Why?
- This explains the potentially long, straggly clusters.
- Assumptions?
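This connected-components view corresponds to single link (MIN). A minimal sketch with SciPy, assuming a pairwise distance matrix `dist`:

```python
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def clusters_at(dist, d_n):
    adj = csr_matrix(dist <= d_n)                  # edges within distance d_n
    n_comp, labels = connected_components(adj, directed=False)
    return labels                                  # component id for each point
```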
How to Define Inter-Cluster Similarity
- MIN
- MAX
- Group Average
- Distance Between Centroids
- Other methods driven by an objective function
  - Ward's method uses squared error
- Paraphrasing The President: "You're either part of my cluster or you're not."
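These choices map directly onto SciPy's hierarchical clustering; a usage sketch, assuming a data array `X`:

```python
from scipy.cluster.hierarchy import linkage

Z_min  = linkage(X, method="single")    # MIN (single link)
Z_max  = linkage(X, method="complete")  # MAX (complete link)
Z_avg  = linkage(X, method="average")   # group average
Z_cent = linkage(X, method="centroid")  # distance between centroids
Z_ward = linkage(X, method="ward")      # Ward's squared-error criterion
```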
Graph Theoretic Interpretation
- Assume each pair of points has a unique distance. Let d_n be the diameter of the cluster formed at step n.
- Let G(d_n) be a graph that has edges between all data points within distance d_n of one another. Then the clusters after step n are *cliques* of G(d_n). Why?
- How can we make use of these graph-theoretic interpretations?
Graph-Based Clustering
- Graph-based clustering uses the proximity graph:
  - Start with the proximity matrix.
  - Consider each point as a node in a graph.
  - Each edge between two nodes has a weight, which is the proximity between the two points.
  - Initially the proximity graph is fully connected.
  - MIN (single link) and MAX (complete link) can be viewed as starting with this graph.
- In the simplest case, clusters are connected components in the graph.
Graph-Based Clustering: Sparsification
- Clustering may work better after sparsification:
  - Sparsification techniques keep the connections to the most similar (nearest) neighbors of a point while breaking the connections to less similar points.
  - The nearest neighbors of a point tend to belong to the same class as the point itself.
  - This reduces the impact of noise and outliers and sharpens the distinction between clusters.
- Sparsification facilitates the use of graph partitioning algorithms (or algorithms based on graph partitioning), e.g., Chameleon and hypergraph-based clustering; see the sketch below.
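A minimal sketch of k-NN sparsification with scikit-learn, assuming a data array `X`; the value of k is arbitrary here:

```python
from sklearn.neighbors import kneighbors_graph

k = 10
# keep only each point's k strongest (nearest-neighbor) connections;
# all other edges of the fully connected proximity graph are dropped
sparse_graph = kneighbors_graph(X, n_neighbors=k, mode="distance")
```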
Sparsification in the Clustering Process
Chameleon: Clustering Using Dynamic Modeling
- Adapt to the characteristics of the data set to find the natural clusters.
- Use a dynamic model to measure the similarity between clusters:
  - The main properties are the relative closeness and relative interconnectivity of the clusters.
  - Two clusters are combined if the resulting cluster shares certain properties with the constituent clusters.
  - The merging scheme preserves self-similarity.
- One of the areas of application is spatial data.
Characteristics of Spatial Data Sets
- Clusters are defined as densely populated regions of the space.
- Clusters have arbitrary shapes, orientations, and non-uniform sizes.
- Differences in density across clusters and variation in density within clusters.
- Existence of special artifacts (streaks) and noise.
- The clustering algorithm must address the above characteristics and also require minimal supervision.
Chameleon: Steps
- Preprocessing step: represent the data by a graph.
  - Given a set of points, construct the k-nearest-neighbor (k-NN) graph to capture the relationship between a point and its k nearest neighbors.
  - The concept of neighborhood is captured dynamically (even if the region is sparse).
- Phase 1: use a multilevel graph partitioning algorithm on the graph to find a large number of clusters of well-connected vertices.
  - Each cluster should contain mostly points from one true cluster, i.e., be a sub-cluster of a real cluster.
Chameleon: Steps
- Phase 2: use hierarchical agglomerative clustering to merge sub-clusters.
  - Two clusters are combined if the resulting cluster shares certain properties with the constituent clusters.
  - Two key properties are used to model cluster similarity, trying to capture how similar two sub-graphs are:
    - Relative Interconnectivity: absolute interconnectivity of two clusters normalized by the internal connectivity of the clusters.
    - Relative Closeness: absolute closeness of two clusters normalized by the internal closeness of the clusters.
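For reference, these two measures can be written out explicitly; the following formulas come from the original Chameleon paper (Karypis, Han, and Kumar, 1999) rather than from anything shown on the slide. Here EC(C_i, C_j) is the set of edges connecting the two clusters, EC(C_i) is the min-cut bisector of C_i, and S-bar denotes average edge weight:

```latex
RI(C_i, C_j) = \frac{|EC(C_i, C_j)|}{\tfrac{1}{2}\left(|EC(C_i)| + |EC(C_j)|\right)}
\qquad
RC(C_i, C_j) = \frac{\bar{S}_{EC}(C_i, C_j)}
  {\frac{|C_i|}{|C_i| + |C_j|}\,\bar{S}_{EC}(C_i) + \frac{|C_j|}{|C_i| + |C_j|}\,\bar{S}_{EC}(C_j)}
```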
Experimental Results: CHAMELEON
Experimental Results: CHAMELEON
Experimental Results: CURE (10 clusters)
Experimental Results: CURE (15 clusters)
Experimental Results: CHAMELEON
Experimental Results: CURE (9 clusters)
Experimental Results: CURE (15 clusters)
Shared Near Neighbor Approach
- SNN graph: the weight of an edge is the number of shared neighbors between the vertices, given that the vertices are connected.
[Figure: two connected vertices i and j whose edge receives weight 4, the number of neighbors they share.]
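A minimal sketch of this edge weight, assuming `knn` is a hypothetical dict mapping each point to the set of its k nearest neighbors:

```python
def snn_weight(i, j, knn):
    # vertices must be connected in the k-NN graph to get an SNN edge
    if j not in knn[i] and i not in knn[j]:
        return 0
    return len(knn[i] & knn[j])   # number of shared neighbors
```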
Creating the SNN Graph
[Figure: a sparse graph, whose link weights are similarities between neighboring points, and the derived shared near neighbor graph, whose link weights are the numbers of shared nearest neighbors.]
ROCK (RObust Clustering using linKs)
- A clustering algorithm for data with categorical and Boolean attributes.
  - A pair of points is defined to be neighbors if their similarity is greater than some threshold.
- Uses a hierarchical clustering scheme to cluster the data:
  1. Obtain a sample of points from the data set.
  2. Compute the link value for each set of points, i.e., transform the original similarities (computed by the Jaccard coefficient) into similarities that reflect the number of shared neighbors between points.
  3. Perform an agglomerative hierarchical clustering on the data, using the number of shared neighbors as the similarity measure and maximizing the shared-neighbors objective function.
  4. Assign the remaining points to the clusters that have been found.
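A sketch of the link computation in step 2, assuming a binary attribute matrix `B` (points x attributes) and a hypothetical neighbor threshold `theta`; this is an illustration, not the paper's implementation:

```python
import numpy as np

def rock_links(B, theta):
    inter = B @ B.T                               # pairwise intersection sizes
    union = B.sum(1)[:, None] + B.sum(1)[None, :] - inter
    jaccard = inter / np.maximum(union, 1)        # Jaccard coefficient
    nbr = (jaccard >= theta).astype(int)          # neighbor indicator matrix
    return nbr @ nbr                              # links = shared-neighbor counts
```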
Jarvis-Patrick Clustering
- First, the k nearest neighbors of all points are found.
  - In graph terms this can be regarded as breaking all but the k strongest links from a point to the other points in the proximity graph.
- A pair of points is put in the same cluster if:
  - the two points share more than T neighbors, and
  - the two points are in each other's k-nearest-neighbor lists.
- For instance, we might choose a nearest-neighbor list of size 20 and put points in the same cluster if they share more than 10 near neighbors.
- Jarvis-Patrick clustering is too brittle, as the examples below show.
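Putting the rule together, a minimal sketch, assuming a data array `X`; the defaults for `k` and `T` match the example above (20 and 10):

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components
from sklearn.neighbors import NearestNeighbors

def jarvis_patrick(X, k=20, T=10):
    n = len(X)
    _, idx = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X)
    knn = [set(row[1:]) for row in idx]            # drop self at column 0
    adj = np.zeros((n, n), dtype=bool)
    for i in range(n):
        for j in knn[i]:
            if i in knn[j] and len(knn[i] & knn[j]) > T:
                adj[i, j] = adj[j, i] = True       # mutual k-NN and > T shared
    return connected_components(csr_matrix(adj), directed=False)[1]
```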
When Jarvis-Patrick Works Reasonably Well
[Figure: original points and the Jarvis-Patrick clustering, requiring 6 shared neighbors out of 20.]
When Jarvis-Patrick Does NOT Work Well
[Figure: results with the smallest threshold, T, that does not merge clusters, and with a threshold of T - 1.]
SNN Clustering Algorithm
1. Compute the similarity matrix.
   - This corresponds to a similarity graph with data points for nodes and edges whose weights are the similarities between data points.
2. Sparsify the similarity matrix by keeping only the k most similar neighbors.
   - This corresponds to keeping only the k strongest links of the similarity graph.
3. Construct the shared nearest neighbor graph from the sparsified similarity matrix.
   - At this point, we could apply a similarity threshold and find the connected components to obtain the clusters (the Jarvis-Patrick algorithm).
4. Find the SNN density of each point.
   - Using a user-specified parameter, Eps, find the number of points that have an SNN similarity of Eps or greater to each point. This is the SNN density of the point.
SNN Clustering Algorithm
5. Find the core points.
   - Using a user-specified parameter, MinPts, find the core points, i.e., all points that have an SNN density greater than MinPts.
6. Form clusters from the core points.
   - If two core points are within a radius, Eps, of each other, they are placed in the same cluster.
7. Discard all noise points.
   - All non-core points that are not within a radius of Eps of a core point are discarded.
8. Assign all non-noise, non-core points to clusters.
   - This can be done by assigning such points to the nearest core point.
(Note that steps 4-8 are DBSCAN.)
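One plausible end-to-end realization of steps 1-8, assuming a data array `X`; turning SNN similarity into a distance (k minus similarity) so that scikit-learn's DBSCAN can consume it is an implementation choice, not something prescribed by the slides:

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.neighbors import NearestNeighbors

def snn_dbscan(X, k=20, eps=10, min_pts=4):
    n = len(X)
    _, idx = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X)
    knn = np.zeros((n, n), dtype=int)
    for i, row in enumerate(idx):
        knn[i, row[1:]] = 1                      # k-NN indicator, self dropped
    shared = knn @ knn.T                         # shared-neighbor counts
    connected = (knn * knn.T).astype(bool)       # SNN edge only if mutually linked
    snn = np.where(connected, shared, 0)         # SNN similarity graph (steps 1-3)
    dist = (k - snn).astype(float)               # similarity -> distance
    np.fill_diagonal(dist, 0.0)
    # steps 4-8 are DBSCAN on the SNN distances; SNN similarity >= eps
    # corresponds to distance <= k - eps
    return DBSCAN(eps=k - eps, min_samples=min_pts,
                  metric="precomputed").fit_predict(dist)
```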
SNN Density
[Figure: (a) all points, (b) high SNN density, (c) medium SNN density, (d) low SNN density.]
SNN Clustering Can Handle Differing Densities
[Figure: original points and the SNN clustering.]
SNN Clustering Can Handle Other Difficult Situations
Finding Clusters of Time Series in Spatio-Temporal Data
[Figure: SNN clusters of SLP; 26 SLP clusters found via shared nearest neighbor clustering (100 NN, 1982-1994), plotted on a longitude/latitude map of the globe.]
[Figure: SNN density of SLP time series data; SNN density of points on the globe.]
Features and Limitations of SNN Clustering
- Does not cluster all the points.
- The complexity of SNN clustering is high:
  - O(n × time to find a point's neighbors within Eps); in the worst case, this is O(n^2).
  - For lower dimensions, there are more efficient ways to find the nearest neighbors: R* trees and k-d trees.