CLUSTERING DENSITY-BASED METHODS Elsayed Hemayed Data Mining Course.
Clustering Density
-
Upload
guilherme-andrade -
Category
Documents
-
view
225 -
download
0
Transcript of Clustering Density
-
8/14/2019 Clustering Density
1/16
ClusteringLecture 4: Density-based Methods
Jing GaoSUNY Buffalo
1
-
8/14/2019 Clustering Density
2/16
Outline
Basics
Motivation, definition, evaluation
Methods
Partitional
Hierarchical
Density-based
Mixture model
Spectral methods
Advanced topics
Clustering ensemble Clustering in MapReduce
Semi-supervised clustering, subspace clustering, co-clustering,etc.
2
-
8/14/2019 Clustering Density
3/16
Density-based Clustering
Basic idea Clusters are dense regions in the data space,
separated by regions of lower object density
A cluster is defined as a maximal set of density-connected points
Discovers clusters of arbitrary shape
Method
DBSCAN
3
-
8/14/2019 Clustering Density
4/16
Density Definition
-NeighborhoodObjects within a radius of
froman object.
High density - -Neighborhood of an object contains
at least MinPtsof objects.
q p
-Neighborhood ofp
-Neighborhood of q
Density of p is high (MinPts = 4)
Density of qis low (MinPts = 4)
}),(|{:)(
qpdqpN
4
-
8/14/2019 Clustering Density
5/16
Core, Border & Outlier
Given and MinPts,categorize the objects into
three exclusive groups.
= 1unit, MinPts = 5
Core
Border
Outlier
A point is a core pointif it has more than a
specified number of points (MinPts) within
EpsThese are points that are at the
interior of a cluster.
A border pointhas fewer than MinPts
within Eps, but is in the neighborhoodof a core point.
A noise pointis any point that is not a
core point nor a border point.
5
-
8/14/2019 Clustering Density
6/16
Example
Original Points Point types: core,
borderand outliers
= 10, MinPts = 4
6
-
8/14/2019 Clustering Density
7/16
Density-reachability
Directly density-reachable
An object qis directly density-reachable from objectp
ifpis a core object and qis in ps -neighborhood.
q p
qis directly density-reachable fromp
pis not directly density-reachable from
q
Density-reachability is asymmetric
MinPts = 4
7
-
8/14/2019 Clustering Density
8/16
-
8/14/2019 Clustering Density
9/16
DBSCAN Algorithm: Example
Parameter
= 2 cm
MinPts= 3
foreach o Ddo
ifois not yet classified thenifois a core-object then
collect all objects density-reachable from o
and assign them to a new cluster.
else
assign oto NOISE
9
-
8/14/2019 Clustering Density
10/16
DBSCAN Algorithm: Example
Parameter
= 2 cm
MinPts= 3
foreach o Ddo
ifois not yet classified thenifois a core-object then
collect all objects density-reachable from o
and assign them to a new cluster.
else
assign oto NOISE
10
-
8/14/2019 Clustering Density
11/16
DBSCAN Algorithm: Example
Parameter
= 2 cm
MinPts= 3
foreach o Ddo
ifois not yet classified then
ifois a core-object thencollect all objects density-reachable from o
and assign them to a new cluster.
else
assign oto NOISE
11
-
8/14/2019 Clustering Density
12/16
DBSCAN: Sensitive to Parameters
12
-
8/14/2019 Clustering Density
13/16
DBSCAN: Determining EPS and MinPts
Idea is that for points in a cluster, their kthnearest
neighbors are at roughly the same distance
Noise points have the kthnearest neighbor at fartherdistance
So, plot sorted distance of every point to its kthnearest
neighbor
13
-
8/14/2019 Clustering Density
14/16
When DBSCAN Works Well
Original Points Clusters
Resistant to Noise
Can handle clusters of different shapes and sizes
14
-
8/14/2019 Clustering Density
15/16
-
8/14/2019 Clustering Density
16/16
Take-away Message
The basic idea of density-based clustering The two important parameters and the definitions of
neighborhood and density in DBSCAN
Core, border and outlier points
DBSCAN algorithm
DBSCANs pros and cons
16