A New Gravitational Clustering Algorithm


Jonatan Gomez, Dipankar Dasgupta, Olfa Nasraoui

Page 2: A New Gravitational Clustering Algorithm

Outline

Introduction
Background
Proposed Algorithm
Analysis

Page 3: A New Gravitational Clustering Algorithm

Introduction

Many clustering techniques rely on the assumption that a data set follows a certain distribution and is free of noise.

In the presence of noise, techniques based on a least-squares estimate (k-means, fuzzy k-means) perform poorly.

Most clustering algorithms require the number of clusters to be specified in advance.

The authors propose a novel, robust, unsupervised clustering technique based on Newton’s law of gravitation and Newton’s second law of motion.

Page 4: A New Gravitational Clustering Algorithm

Introduction

Gravitational concepts have been applied to cluster visualization and analysis before.

Properties of Wright’s Gravitational Clustering [2]:
The new position of a particle is found using the remaining particles
When two particles are close, they merge
The maximum movement of particles per iteration is capped
The algorithm terminates when only one particle remains

Improvements over Wright: speed, robustness, and determining the number of clusters

Page 5: A New Gravitational Clustering Algorithm

Background

Newton’s Laws of Motion

If acceleration is constant:
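The equations for this slide are not present in the transcript. The standard forms the slide most likely refers to are Newton's second law and the constant-acceleration position update, shown here with conventional symbols that are an assumption rather than the slide's own notation:

```latex
\vec{F} = m\,\vec{a}
\qquad\text{and, for constant } \vec{a}:\qquad
\vec{x}(t + \Delta t) = \vec{x}(t) + \vec{v}(t)\,\Delta t + \tfrac{1}{2}\,\vec{a}\,\Delta t^{2}
```

Here m is the mass of the particle, and x, v, and a are its position, velocity, and acceleration.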

Page 6: A New Gravitational Clustering Algorithm

Background

Newton’s Law of Gravitation
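The formula for this slide is not present in the transcript. Newton's law of gravitation in its standard form is given below; the symbol names are the conventional ones and are an assumption about what the slide showed:

```latex
F_g = G\,\frac{m_1\, m_2}{r^{2}}
```

F_g is the magnitude of the attractive force between two bodies of masses m_1 and m_2 separated by distance r, and G is the gravitational constant. The force on each body points toward the other.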

Page 7: A New Gravitational Clustering Algorithm

Background

Optimal Disjoint Set Union-Find Structure

A disjoint-set Union-Find structure supports three operations: MAKESET(X), FIND(X), and UNION(X, Y)

With union by rank and path compression, any sequence of m Union and Find operations on n elements takes O((m + n) α(n)) time, where α is the inverse Ackermann function; this is effectively O(m + n) in practice
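A minimal sketch of such a structure, assuming the usual union-by-rank and path-compression heuristics. The class and method names are illustrative choices, not taken from the slides:

```python
class DisjointSet:
    """Union-Find with union by rank and path compression."""

    def __init__(self):
        self.parent = {}
        self.rank = {}

    def make_set(self, x):
        # Create a singleton set {x} if x has not been seen before.
        if x not in self.parent:
            self.parent[x] = x
            self.rank[x] = 0

    def find(self, x):
        # First pass: walk up to the root of x's tree.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Second pass: point every node on the path directly at the root.
        while self.parent[x] != root:
            next_node = self.parent[x]
            self.parent[x] = root
            x = next_node
        return root

    def union(self, x, y):
        # Merge the sets containing x and y; return the new root.
        root_x, root_y = self.find(x), self.find(y)
        if root_x == root_y:
            return root_x
        # Attach the shallower tree under the deeper one.
        if self.rank[root_x] < self.rank[root_y]:
            root_x, root_y = root_y, root_x
        self.parent[root_y] = root_x
        if self.rank[root_x] == self.rank[root_y]:
            self.rank[root_x] += 1
        return root_x
```

In the clustering setting, each data point index would be one element, and two points that end up close together would be joined with `union`.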

Page 8: A New Gravitational Clustering Algorithm

Proposed Algorithm

Ideas behind applying the gravitational law:

A data point exerts a higher gravitational force on other data points in the same cluster than on data points outside it. Thus, points in a cluster move toward the center of the cluster.

If a point is a noise point, the gravitational forces acting on it are so small that the point is effectively immobile. Thus, noise points are not assigned to any cluster.

Page 9: A New Gravitational Clustering Algorithm

Proposed Algorithm

A simplified equation is used to move point x according to the gravitational field of point y

Velocity is considered to be zero at all points in time

G is reduced after each iteration to prevent the “big crunch”
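The equation itself is not present in the transcript. One form consistent with the description above, assuming unit masses, zero velocity, and a unit time step with all constants folded into G, is x(t+1) = x(t) + G d / ||d||^3, where d = y - x. The sketch below implements that assumed form. The function names `move` and `decay` and the multiplicative decay G <- (1 - ∆G) G are illustrative assumptions:

```python
import math


def move(x, y, G):
    """Return the new position of x after one step toward y.

    Assumed update: x + G * d / ||d||**3 with d = y - x
    (unit masses, zero velocity, unit time step).
    """
    d = [yi - xi for xi, yi in zip(x, y)]
    dist = math.sqrt(sum(di * di for di in d))
    if dist == 0.0:
        # Coincident points exert no well-defined pull on each other.
        return list(x)
    scale = G / dist ** 3
    return [xi + scale * di for xi, di in zip(x, d)]


def decay(G, delta_G):
    """Shrink G after an iteration (assumed multiplicative decay)."""
    return (1.0 - delta_G) * G
```

Because the step grows as the inverse square of the distance, very close points can overshoot each other with this plain form; a practical implementation would likely need to cap the step.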

Page 10: A New Gravitational Clustering Algorithm

Proposed Algorithm
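The algorithm listing for this slide is not present in the transcript. The following is a rough sketch of how the pieces described in the deck could fit together, using the parameter names that appear in the experiments (M iterations, G, ∆G, ε). Everything else is an assumption made for illustration: the function name `g_clustering`, pairing each point with one random partner per iteration, moving both points of a pair, and capping the step at half the separation so that close points meet instead of overshooting:

```python
import math
import random


def g_clustering(points, G, delta_G, M, eps, seed=0):
    """Return one cluster label per point (needs at least two points)."""
    rng = random.Random(seed)
    pts = [list(p) for p in points]   # work on a copy; points get moved
    n = len(pts)
    parent = list(range(n))           # MAKESET for every point index

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    def distance(a, b):
        return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

    for _ in range(M):
        for j in range(n):
            # Pick a random partner k != j.
            k = rng.randrange(n - 1)
            if k >= j:
                k += 1
            d = [b - a for a, b in zip(pts[j], pts[k])]
            dist = distance(pts[j], pts[k])
            if dist > 0.0:
                # Assumed cap: never move more than half the separation.
                scale = min(G / dist ** 3, 0.5)
                pts[j] = [a + scale * c for a, c in zip(pts[j], d)]
                pts[k] = [b - scale * c for b, c in zip(pts[k], d)]
            # UNION the two points once they are within eps of each other.
            if distance(pts[j], pts[k]) <= eps:
                parent[find(j)] = find(k)
        G *= 1.0 - delta_G            # reduce G to avoid the "big crunch"
    return [find(i) for i in range(n)]
```

Points that share a label in the returned list belong to the same set. Small sets are candidates for noise, and filtering them out is a separate step.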

Page 11: A New Gravitational Clustering Algorithm

Proposed Algorithm

Use a threshold to extract the valid clusters, that is, those with at least a minimum number of points
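A small sketch of this filtering step. It assumes the threshold is expressed as a fraction α of the data set size, which is a guess based on the value α = 0.03 used in the experiments. The function name `extract_clusters` and the use of `None` to mark noise are illustrative:

```python
from collections import Counter


def extract_clusters(labels, alpha):
    """Keep labels of clusters holding at least alpha * N points.

    Points in smaller clusters are treated as noise and marked None.
    """
    min_size = alpha * len(labels)
    sizes = Counter(labels)
    valid = {label for label, size in sizes.items() if size >= min_size}
    return [label if label in valid else None for label in labels]
```

For example, with eight points and `alpha = 0.25`, a cluster needs at least two members to be kept.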

Page 12: A New Gravitational Clustering Algorithm

Proposed Algorithm

Similarities to Agglomerative Hierarchical Clustering

Differences from Agglomerative Hierarchical Clustering

Page 13: A New Gravitational Clustering Algorithm

Proposed Algorithm

Comparison to Wright [2]

Page 14: A New Gravitational Clustering Algorithm

Experiments

Synthetic data

Page 15: A New Gravitational Clustering Algorithm

Experiments

Results (over 10 trials)

Parameters: M = 500, G = 7 × 10^-6, ∆G = 0.01, ε = 10^-4

k-means and fuzzy k-means were given 150 iterations

Page 16: A New Gravitational Clustering Algorithm

Experiments

Clusters found by the G-algorithm

Page 17: A New Gravitational Clustering Algorithm

Experiments

Clusters found by the G-algorithm (noise removed)

Page 18: A New Gravitational Clustering Algorithm

Experiments

Movement of points over iterations

Page 19: A New Gravitational Clustering Algorithm

Experiments

Scalability (average of 50 trials for each percentage)

The entire data set is not needed to get good results

Page 20: A New Gravitational Clustering Algorithm

Experiments

Sensitivity to α

Use α = 0.03

Page 21: A New Gravitational Clustering Algorithm

Experiments

Sensitivity to G

Too big => one cluster
Too small => no clusters
No universal value => depends on the data set

Page 22: A New Gravitational Clustering Algorithm

Experiments

Sensitivity to ∆G

Too big => no clusters
Too small => one cluster
Best value ~0.01 based on experiments

Page 23: A New Gravitational Clustering Algorithm

Experiments

Sensitivity to ε

Too big => one cluster

Page 24: A New Gravitational Clustering Algorithm

Experiments

Real data set

Intrusion detection benchmark data set
42 attributes, 33 numerical, N = 492,021
2 classes: no intrusion (19.3%) and intrusion (80.7%)
Use only the numerical attributes
Use only 1% of the data (chosen randomly)

Parameter settings: G = 1 × 10^-4 (based on testing), ∆G = 0.01, α = 0.03, ε = 1 × 10^-6, M = 100

Page 25: A New Gravitational Clustering Algorithm

Experiments

Clustering-Classification Strategy

Assign to each cluster the class with the most training data records assigned to that cluster

An unknown data point is assigned to the closest cluster (the cluster centers are used to compute the distance)
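The strategy can be sketched as a majority vote per cluster followed by a nearest-center lookup. The sketch assumes that the cluster center is the mean of the training points assigned to the cluster and that distance is Euclidean. Neither choice is stated on the slide, and all function names are illustrative:

```python
import math
from collections import Counter, defaultdict


def label_clusters(cluster_ids, classes):
    """Map each cluster id to the most frequent class among its points."""
    votes = defaultdict(Counter)
    for cid, cls in zip(cluster_ids, classes):
        if cid is not None:              # None marks a noise point
            votes[cid][cls] += 1
    return {cid: c.most_common(1)[0][0] for cid, c in votes.items()}


def cluster_centers(points, cluster_ids):
    """Mean position of the points in each cluster (assumed center)."""
    sums, counts = {}, Counter()
    for p, cid in zip(points, cluster_ids):
        if cid is None:
            continue
        prev = sums.get(cid, [0.0] * len(p))
        sums[cid] = [s + v for s, v in zip(prev, p)]
        counts[cid] += 1
    return {cid: [s / counts[cid] for s in total]
            for cid, total in sums.items()}


def classify(point, centers, cluster_class):
    """Return the class of the cluster whose center is nearest to point."""
    def dist_to(cid):
        return math.sqrt(sum((a - b) ** 2
                             for a, b in zip(point, centers[cid])))
    return cluster_class[min(centers, key=dist_to)]
```

A typical use would compute `label_clusters` and `cluster_centers` once on the training sample and then call `classify` for each unseen record.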

Page 26: A New Gravitational Clustering Algorithm

Experiments

Real data set results (over 100 trials)

Page 27: A New Gravitational Clustering Algorithm

Conclusions / Future Work

Successfully determines the number of clusters in noisy data sets

Can be used to pre-process data by removing noise

Three of four parameters can be set to constant values

Future Work:
Determine a method to set G automatically
Extend to different distance metrics

Page 28: A New Gravitational Clustering Algorithm

References

[1] J. Gomez, D. Dasgupta, and O. Nasraoui, “A New Gravitational Clustering Algorithm,” in Proc. of the SIAM Int. Conf. on Data Mining, 2003.

[2] W. E. Wright, “Gravitational Clustering,” Pattern Recognition, 9:151-166, Pergamon Press, 1977.