Density-Based Clustering
Izabela Moise, Evangelos Pournaras, Dirk Helbing
Reminder
Unsupervised data mining
• Clustering → k-means
Main Clustering Approaches
• Partitioning methods
  → construct partitions of the data points
  → evaluate the partitions by some criterion
  → examples: k-means, k-medoids
• Density-based methods
  → based on connectivity and density functions
  → examples: DBSCAN, DJCluster
Density-Based Clustering
Density-based clustering locates regions of high density that are separated from one another by regions of low density.
Main principles
• Two parameters:
  1. Eps: the maximum radius of the neighbourhood
  2. MinPts: the minimum number of points in the Eps-neighbourhood of a point
• Eps-neighbourhood of a point p: $N_{Eps}(p) = \{q \in D \mid dist(p, q) \le Eps\}$
• Key idea: the density of the neighbourhood has to exceed some threshold.
• The shape of a neighbourhood depends on the dist function.
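The neighbourhood definition and the density threshold can be sketched in a few lines of Python. This is a minimal illustration, not code from the slides: the function names, the sample points and the choice of Manhattan distance as a second metric are all assumptions made for the example. Note that p itself belongs to its own neighbourhood, since dist(p, p) = 0.

```python
from math import dist as euclidean


def manhattan(p, q):
    """City-block distance; used only to show that the metric changes the shape."""
    return sum(abs(a - b) for a, b in zip(p, q))


def eps_neighbourhood(data, p, eps, dist=euclidean):
    """Return N_Eps(p): every q in data with dist(p, q) <= eps (p included)."""
    return [q for q in data if dist(p, q) <= eps]


def is_dense(data, p, eps, min_pts, dist=euclidean):
    """True if the Eps-neighbourhood of p holds at least min_pts points."""
    return len(eps_neighbourhood(data, p, eps, dist)) >= min_pts


# Hypothetical sample data.
data = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (0.8, 0.8), (3.0, 3.0)]
p = (0.0, 0.0)

round_nbrs = eps_neighbourhood(data, p, eps=1.2)                    # circular region
diamond_nbrs = eps_neighbourhood(data, p, eps=1.2, dist=manhattan)  # diamond region
print(len(round_nbrs), len(diamond_nbrs))
```

With the same Eps, the point (0.8, 0.8) falls inside the Euclidean neighbourhood but outside the Manhattan one, so the same point can pass or fail the MinPts threshold depending on the distance function.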
Core, Border and Noise/Outlier
[Figure: core, border and noise/outlier points. Source: Jing Gao, SUNY Buffalo]
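The three point types can be made concrete with a small sketch. It assumes the usual DBSCAN definitions, which the slide shows only as a figure: a core point has at least MinPts points in its Eps-neighbourhood, a border point is not core but lies in the neighbourhood of a core point, and everything else is noise. The function names and sample data are hypothetical.

```python
from math import dist


def neighbours(data, i, eps):
    """Indices of all points within eps of data[i] (the point itself included)."""
    return [j for j, q in enumerate(data) if dist(data[i], q) <= eps]


def classify(data, eps, min_pts):
    """Label every point as 'core', 'border' or 'noise'."""
    core = {i for i in range(len(data))
            if len(neighbours(data, i, eps)) >= min_pts}
    labels = []
    for i in range(len(data)):
        if i in core:
            labels.append("core")
        elif any(j in core for j in neighbours(data, i, eps)):
            labels.append("border")   # not dense itself, but next to a core point
        else:
            labels.append("noise")
    return labels


# Hypothetical sample: a unit square, one point hanging off it, one far away.
data = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0), (5.0, 5.0)]
labels = classify(data, eps=1.0, min_pts=3)
print(labels)
```

Here the four corners of the square come out as core points, (2, 1) as a border point attached to the corner (1, 1), and (5, 5) as noise.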
Directly Density-Reachable
A point p is directly density-reachable from a point q wrt. Eps and MinPts if:
1. $p \in N_{Eps}(q)$, and
2. $|N_{Eps}(q)| \ge MinPts$
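The two conditions translate directly into code. The sketch below is illustrative only; the function names and the sample points are assumptions. It also shows why the relation is one-directional: the density condition is checked on q only, so a border point is directly density-reachable from a core point but not the other way round.

```python
from math import dist


def eps_neighbourhood(data, q, eps):
    """N_Eps(q): all points of data within eps of q (q included)."""
    return [x for x in data if dist(q, x) <= eps]


def directly_density_reachable(data, p, q, eps, min_pts):
    """True if p is directly density-reachable from q wrt. eps and min_pts."""
    n_q = eps_neighbourhood(data, q, eps)
    return p in n_q and len(n_q) >= min_pts   # conditions 1 and 2


# Hypothetical sample: (1, 0) has three points in its neighbourhood, (2, 0) only two.
data = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (2.0, 0.0)]
core, border = (1.0, 0.0), (2.0, 0.0)

forward = directly_density_reachable(data, border, core, eps=1.0, min_pts=3)
backward = directly_density_reachable(data, core, border, eps=1.0, min_pts=3)
print(forward, backward)
```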
Density-Reachable
• Density-reachable:
  → A point p is density-reachable from a point q wrt. Eps and MinPts if there is a chain of points $p_1, \dots, p_n$ with $p_1 = q$ and $p_n = p$, such that $p_{i+1}$ is directly density-reachable from $p_i$.
• The relation is transitive but not symmetric.
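One way to test density-reachability is a breadth-first search that only continues the chain through core points. This is a sketch under assumptions: points are addressed by index, the function names are hypothetical, and the sample data is a row of equally spaced points chosen so that both ends are border points.

```python
from collections import deque
from math import dist


def neighbours(data, i, eps):
    """Indices of all points within eps of data[i] (the point itself included)."""
    return [j for j in range(len(data)) if dist(data[i], data[j]) <= eps]


def density_reachable(data, p, q, eps, min_pts):
    """True if data[p] is density-reachable from data[q] (p and q are indices)."""
    seen, queue = {q}, deque([q])
    while queue:
        i = queue.popleft()
        nbrs = neighbours(data, i, eps)
        if len(nbrs) < min_pts:
            continue              # i is not a core point: no chain continues from it
        for j in nbrs:
            if j == p:
                return True
            if j not in seen:
                seen.add(j)
                queue.append(j)
    return False


# Hypothetical sample: five points on a line; indices 1-3 are core, 0 and 4 are border.
data = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]

print(density_reachable(data, 4, 1, eps=1.0, min_pts=3))  # border from core
print(density_reachable(data, 1, 4, eps=1.0, min_pts=3))  # core from border
```

The first call follows the chain 1 → 2 → 3 → 4 and succeeds; the second fails at once because point 4 is not a core point, which illustrates the missing symmetry.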
Density-Connected
Density-connected:
→ A point p is density-connected to a point q wrt. Eps and MinPts if there is a point o such that both p and q are density-reachable from o wrt. Eps and MinPts.
→ The relation is symmetric.
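The definition can be checked by brute force: collect everything density-reachable from each candidate o and see whether both p and q are in that set. The sketch below is an assumption-laden illustration (index-based points, hypothetical function names and data), not an efficient implementation.

```python
from math import dist


def reachable_from(data, o, eps, min_pts):
    """Indices of all points density-reachable from data[o]."""
    reached, expanded, frontier = set(), set(), [o]
    while frontier:
        i = frontier.pop()
        if i in expanded:
            continue
        expanded.add(i)
        nbrs = [j for j in range(len(data)) if dist(data[i], data[j]) <= eps]
        if len(nbrs) >= min_pts:          # only core points extend the chain
            reached.update(nbrs)
            frontier.extend(j for j in nbrs if j not in expanded)
    return reached


def density_connected(data, p, q, eps, min_pts):
    """True if some point o density-reaches both data[p] and data[q]."""
    return any({p, q} <= reachable_from(data, o, eps, min_pts)
               for o in range(len(data)))


# Hypothetical sample: a row of five points plus one isolated point (index 5).
data = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0), (10.0, 0.0)]

print(density_connected(data, 0, 4, eps=1.0, min_pts=3))
print(density_connected(data, 0, 5, eps=1.0, min_pts=3))
```

The two ends of the row (indices 0 and 4) are border points and cannot reach each other, yet they are density-connected through any of the core points in between. Swapping p and q does not change the result, which is the symmetry stated above.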
DBSCAN - Density-Based Spatial Clustering of Applications with Noise
Main Principles
One of the most cited clustering algorithms
Main principle: a cluster is defined as a maximal set of density-connected points.
• Discovers clusters of arbitrary shape (spherical, elongated, linear) and noise
• Works with spatial datasets
  → geomarketing, tomography, satellite images
• Requires only two parameters (no prior knowledge of the number of clusters)
Definition: Cluster
[Figure: definition of a cluster. Source: Erik Kropat, University of the Bundeswehr Munich]
Definition: Noise
[Figure: definition of noise. Source: Erik Kropat, University of the Bundeswehr Munich]
The Algorithm
1. Randomly select a point p
2. Retrieve all points density-reachable from p wrt. Eps and MinPts
3. If p is a core point, a cluster is formed
4. If p is a border point, no points are density-reachable from p → visit the next data point
5. Continue the process until all points have been processed
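The steps above can be sketched as a short pure-Python routine. It is a minimal illustration under stated assumptions, not a reference implementation: points are visited in index order rather than randomly, neighbourhoods are found by a linear scan (quadratic overall), and the names `dbscan`, `NOISE` and the sample data are hypothetical.

```python
from math import dist

NOISE = -1


def dbscan(data, eps, min_pts):
    """Return one cluster label per point; NOISE (-1) marks outliers."""
    def neighbours(i):
        return [j for j in range(len(data)) if dist(data[i], data[j]) <= eps]

    labels = [None] * len(data)          # None = not yet processed
    cluster = -1
    for p in range(len(data)):           # step 1 (in index order, an assumption)
        if labels[p] is not None:
            continue
        seeds = neighbours(p)
        if len(seeds) < min_pts:         # step 4: p is not core, move on
            labels[p] = NOISE            # may be relabelled as border later
            continue
        cluster += 1                     # step 3: p is core, a cluster is formed
        labels[p] = cluster
        queue = [j for j in seeds if j != p]
        while queue:                     # step 2: collect density-reachable points
            q = queue.pop()
            if labels[q] == NOISE:       # border point reached from a core point
                labels[q] = cluster
            if labels[q] is not None:
                continue
            labels[q] = cluster
            q_nbrs = neighbours(q)
            if len(q_nbrs) >= min_pts:   # q is core: keep expanding through it
                queue.extend(q_nbrs)
    return labels                        # step 5: every point has been processed


# Hypothetical sample: two unit squares and one isolated point.
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
        (5.0, 5.0), (5.0, 6.0), (6.0, 5.0), (6.0, 6.0),
        (10.0, 0.0)]
labels = dbscan(data, eps=1.0, min_pts=3)
print(labels)
```

A point first marked as noise is relabelled if a later core point reaches it, which is how border points end up inside clusters. For larger datasets a spatial index would normally replace the linear neighbourhood scan.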
Selecting Eps and MinPts
The two parameters can be determined by a heuristic.
Observation:
• For points in a cluster, their k-th nearest neighbours are at roughly the same distance.
• Noise points have their k-th nearest neighbour at a farther distance.
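A common way to use this observation, assumed here since the slide text does not spell out the procedure, is to compute each point's distance to its k-th nearest neighbour, sort these distances, and look for the place where they change sharply. The sketch below only computes the sorted k-distances; the function name and sample data are hypothetical, and choosing Eps from the curve is left to inspection.

```python
from math import dist


def k_distances(data, k):
    """Distance from each point to its k-th nearest neighbour, largest first."""
    result = []
    for i, p in enumerate(data):
        others = sorted(dist(p, q) for j, q in enumerate(data) if j != i)
        result.append(others[k - 1])
    return sorted(result, reverse=True)


# Hypothetical sample: two unit squares and one isolated point.
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
        (5.0, 5.0), (5.0, 6.0), (6.0, 5.0), (6.0, 6.0),
        (10.0, 0.0)]
kd = k_distances(data, k=2)
print(kd)
```

In this sample the isolated point has a 2nd-nearest-neighbour distance above 7, while all cluster points sit at 1.0, so any Eps between those values separates clusters from noise. On real data the drop is rarely this clean.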
[Figures. Source: Erik Kropat, University of the Bundeswehr Munich]
Pros and Cons
Pros:
• discovers clusters of arbitrary shapes
• handles noise
• needs density parameters as termination condition
Cons:
• cannot handle varying densities
• sensitive to parameters → hard to determine the correct set of parameters
• sampling affects density measures