CPSC 340: Machine Learning and Data Mining
Density-Based Clustering
Fall 2015
Admin
• Tutorials today.
• Office hours tomorrow.
• Assignment 2 due Friday.
K-Means++
• Steps of k-means++:
1. Select the initial mean µ1 uniformly at random from among the objects xi.
2. Compute the distance dic of each object xi to each mean µc.
3. For each object xi, set di to the minimum distance across all means chosen so far.
4. Choose the next mean by sampling an object with probability proportional to (di)².
5. Stop when we have k means, otherwise return to 2.
• Expected approximation ratio is O(log(k)).
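The initialization steps above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the course's reference code; the names `squared_distance`, `kmeans_pp_init`, and the optional `rng` argument are assumptions made for this example, and it assumes the data contain at least k distinct points.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans_pp_init(X, k, rng=None):
    """Pick k initial means from the objects in X by k-means++ sampling.

    Hypothetical helper for illustration; assumes X has at least k
    distinct points so the sampling weights never all become zero.
    """
    rng = rng or random.Random()
    # Step 1: the first mean is an object chosen uniformly at random.
    means = [X[rng.randrange(len(X))]]
    while len(means) < k:
        # Steps 2-3: squared distance from each object to its nearest mean.
        d2 = [min(squared_distance(x, mu) for mu in means) for x in X]
        # Step 4: sample the next mean with probability proportional to d2.
        next_index = rng.choices(range(len(X)), weights=d2, k=1)[0]
        means.append(X[next_index])
    # Step 5: stop once we have k means.
    return means
```

An object that is already a mean has squared distance zero, so it has zero probability of being picked again.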
First mean is a random example.
Weight examples by distance squared.
Sample mean proportional to distances squared.
Weight examples by squared distance to mean.
Sample mean proportional to distances squared.
Weight examples by squared distance to mean.
Sample mean proportional to distances squared.
(We’ve now hit target k=4.)
Assign each object to the closest mean.
Update the mean of each group.
Assign each object to the closest mean.
Keep going until no objects change groups.
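Once the initial means are chosen, the assign/update loop from these steps can be sketched as follows. This is an illustrative sketch, not a tuned implementation; the function name `kmeans`, the `max_iter` safety cap, and the choice to leave an empty group's mean unchanged are assumptions made here.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans(X, means, max_iter=100):
    """Alternate assignments and mean updates until no object changes groups."""
    means = [tuple(mu) for mu in means]
    labels = [None] * len(X)
    for _ in range(max_iter):  # max_iter is an assumed safety cap
        # Assign each object to the closest mean.
        new_labels = [
            min(range(len(means)), key=lambda c: squared_distance(x, means[c]))
            for x in X
        ]
        if new_labels == labels:
            break  # no object changed groups
        labels = new_labels
        # Update the mean of each group (an empty group keeps its old mean).
        for c in range(len(means)):
            members = [x for x, y in zip(X, labels) if y == c]
            if members:
                means[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return means, labels
```

The starting `means` would typically come from a k-means++ style initialization, but any list of k points works.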
Shape of K-Means Clusters
• K-means clusters are formed by the intersection of half-spaces.
[Figure: two half-spaces and their intersection.]
Red over green half-space
Green over red half-space
Blue over green half-space
Green over blue half-space
Magenta over green half-space
Green over magenta half-space
• Intersection of half-spaces forms a convex set:
– Line between any two points in the set stays in the set.
[Figure: two convex sets and one set that is not convex.]
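A short derivation may help show why the half-spaces appear and why their intersection is convex. The symbols µ1 and µ2 (two of the means), and a, b, θ (a generic half-space and an interpolation weight) are notation chosen for this sketch.

```latex
% x is at least as close to \mu_1 as to \mu_2:
\begin{align*}
\|x - \mu_1\|^2 \le \|x - \mu_2\|^2
&\iff -2\mu_1^\top x + \|\mu_1\|^2 \le -2\mu_2^\top x + \|\mu_2\|^2 \\
&\iff 2(\mu_2 - \mu_1)^\top x \le \|\mu_2\|^2 - \|\mu_1\|^2 .
\end{align*}
% This is a linear inequality a^\top x \le b, i.e. a half-space.
% If x and y both satisfy it and 0 \le \theta \le 1, then
\[
a^\top\bigl(\theta x + (1-\theta) y\bigr)
= \theta\, a^\top x + (1-\theta)\, a^\top y
\le \theta b + (1-\theta) b = b .
\]
```

The region assigned to one mean is the set of points satisfying such an inequality against every other mean, so it is an intersection of half-spaces, and the line segment between any two of its points stays inside it.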
K-Means with Non-Convex Clusters
https://corelifesciences.com/human-long-non-coding-rna-expression-microarray-service.html
K-means cannot separate non-convex clusters.
Though over-clustering can help (next class).
Application: Elephant Range Map
• Find habitat area of African elephants.
– Useful for assessing/protecting population.
• Build clusters from observations of locations.
• Clusters are non-convex:
– affected by vegetation, relief, rivers, water access.
• We do not want a partition:
– Some regions should not have a cluster.
http://www.defenders.org/elephant/basic-facts
Motivation for Density-Based Clustering
• Density-based clustering is a non-parametric clustering method:
– Clusters are defined by connected dense regions.
– Clusters can become more complicated the more data we have.
– Data points in non-dense regions are not assigned a cluster.
Other Potential Applications
• Where are high crime regions of a city?
• Where should taxis patrol?
• Where does Iguodala make/miss shots?
• Which products are similar to this one?
• Which pictures are in the same place?
• Where can protein ‘dock’?
https://en.wikipedia.org/wiki/Cluster_analysis https://www.flickr.com/photos/dbarefoot/420194128/ http://letsgowarriors.com/replacing-jarrett-jack/2013/10/04/ http://www.dbs.informatik.uni-muenchen.de/Forschung/KDD/Clustering/
Density-Based Clustering
• Density-based clustering algorithm (DBSCAN) has two parameters:
– Radius: maximum distance between points to be considered ‘close’.
– MinPoints: number of ‘close’ points needed to define a cluster.
• Pseudocode for DBSCAN:
– For each example xi:
• If xi is already assigned to a cluster, do nothing.
• If xi is not a core point (fewer than minPoints neighbours within distance ‘r’), do nothing.
• If xi is a core point, expand cluster.
– Expand cluster function:
• Assign all xj within distance ‘r’ of core point xi to cluster.
• For each newly-assigned neighbour xj that is a core point, expand cluster.
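The pseudocode above can be turned into a small runnable sketch. This version uses only the Python standard library and a brute-force O(n²) neighbour search. The function name `dbscan`, the convention that a point counts as its own neighbour, and the rule that an already-assigned point keeps its first cluster are assumptions made for this illustration.

```python
import math


def dbscan(X, r, min_points):
    """Return one label per example: a cluster index, or None if unassigned."""
    n = len(X)
    # Neighbours of i: all examples within distance r. The count includes
    # i itself, which is an assumed convention for this sketch.
    neighbours = [
        [j for j in range(n) if math.dist(X[i], X[j]) <= r] for i in range(n)
    ]
    is_core = [len(nb) >= min_points for nb in neighbours]
    labels = [None] * n
    cluster = 0
    for i in range(n):
        # Do nothing if xi is already assigned or is not a core point.
        if labels[i] is not None or not is_core[i]:
            continue
        # Expand cluster from core point i (a stack replaces recursion).
        labels[i] = cluster
        stack = [i]
        while stack:
            p = stack.pop()
            for q in neighbours[p]:
                if labels[q] is None:
                    labels[q] = cluster
                    # Only newly-assigned core points keep expanding.
                    if is_core[q]:
                        stack.append(q)
        cluster += 1
    return labels
```

For example, two tight groups of three points plus one distant point, with `r=1.5` and `min_points=3`, give two clusters and leave the distant point unassigned.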
Density-Based Clustering Issues
• Some points are not assigned to a cluster.
– Good or bad, depending on the application.
• Sensitive to the choice of radius and minPoints.
• Ambiguity of ‘non-core’ (boundary) points:
– A boundary point within distance ‘r’ of core points from two clusters could be assigned to either one, depending on processing order.
• Other than this ambiguity, not sensitive to initialization.
• Assigning new points to clusters is expensive.
• In high dimensions, need a lot of points to ‘fill’ the space.
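To illustrate why assigning a new point can be expensive, here is one possible rule: give the new point the cluster of its nearest core point within distance ‘r’, or leave it unassigned. The function `assign_new_point` and its inputs (`labels` and `is_core` from an earlier clustering run) are hypothetical. The slides do not prescribe this rule.

```python
import math


def assign_new_point(x_new, X, labels, is_core, r):
    """Cluster label of the nearest core point within distance r, else None."""
    best_label, best_dist = None, r
    # Every stored core point is checked, so the cost grows with the
    # size of the training set rather than with the number of clusters.
    for x, label, core in zip(X, labels, is_core):
        if not core:
            continue
        d = math.dist(x_new, x)
        if d <= best_dist:
            best_label, best_dist = label, d
    return best_label
```

By contrast, k-means only needs the distance from the new point to each of the k means.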
Summary
1. K-means++: randomized initialization with good expected performance.
2. Shape of K-means clusters: intersection of half-spaces => convex sets.
3. Density-based clustering: useful for finding non-convex connected clusters.
4. DBSCAN algorithm: assign points in dense regions to same cluster.
• Next time:
– Dealing with clusters of different densities.