CSC 4510 – Machine Learning
Dr. Mary-Angela Papalaskari
Department of Computing Sciences
Villanova University
Course website: www.csc.villanova.edu/~map/4510/
11: Unsupervised Learning ‐ Clustering
Some of the slides in this presentation are adapted from:
• Prof. Frank Klassner’s ML class at Villanova
• the University of Manchester ML course http://www.cs.manchester.ac.uk/ugt/COMP24111/
• The Stanford online ML course http://www.ml-class.org/
Supervised learning
Training set: [figure; source: the Stanford online ML course, http://www.ml-class.org/]
Unsupervised learning
Training set: [figure; source: the Stanford online ML course, http://www.ml-class.org/]
Unsupervised Learning
• Learning “what normally happens”
• No output
• Clustering: Grouping similar instances
• Example applications
  – Customer segmentation
  – Image compression: Color quantization
  – Bioinformatics: Learning motifs
4 CSC 4510 ‐ M.A. Papalaskari ‐ Villanova University
Clustering Algorithms
• K means
• Hierarchical
  – Bottom up or top down
• Probabilistic
  – Expectation Maximization (E-M)
Clustering algorithms
• Partitioning method: Construct a partition of n examples into a set of K clusters
• Given: a set of examples and the number K
• Find: a partition of K clusters that optimizes the chosen partitioning criterion
  – Globally optimal: exhaustively enumerate all partitions
  – Effective heuristic method: K-means algorithm.
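To see why exhaustive enumeration is rarely practical, it may help to count the candidate partitions. The number of ways to split n distinct examples into K non-empty clusters is the Stirling number of the second kind. The sketch below computes it with the standard recurrence; the function name `count_partitions` is an illustrative choice, not something from the slides.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def count_partitions(n: int, k: int) -> int:
    """Ways to split n distinct examples into k non-empty clusters
    (Stirling number of the second kind)."""
    if n == k:
        return 1          # one cluster per example (also covers n == k == 0)
    if k == 0 or k > n:
        return 0          # impossible: no clusters, or more clusters than examples
    # The n-th example either starts a new cluster on its own,
    # or joins one of the k clusters formed by the other n - 1 examples.
    return count_partitions(n - 1, k - 1) + k * count_partitions(n - 1, k)

# The count grows very quickly with n, even for a fixed small K.
for n in (4, 10, 20, 40):
    print(n, count_partitions(n, 3))
```

Even with K fixed at 3, the count grows roughly like 3^n / 6, which is why a heuristic such as K-means is used instead.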
http://www.csee.umbc.edu/~nicholas/676/MRSslides/lecture17-clustering.ppt
K-Means
• Assumes instances are real-valued vectors.
• Clusters are based on centroids: the center of gravity, or mean, of the points in a cluster c: μ(c) = (1/|c|) Σ_{x ∈ c} x
• Reassignment of instances to clusters is based on distance to the current cluster centroids.
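The two ideas on this slide, the centroid as the mean of a cluster and reassignment by distance to the centroids, can be sketched in a few lines. This is a minimal illustration that assumes Euclidean distance; the helper names and sample points are made up for the example.

```python
import math

def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dims = len(points[0])
    return tuple(sum(p[i] for p in points) / len(points) for i in range(dims))

def nearest_centroid(x, centroids):
    """Index of the centroid closest to x (Euclidean distance assumed)."""
    return min(range(len(centroids)), key=lambda j: math.dist(x, centroids[j]))

# Two hypothetical clusters of 2-D points.
cluster_a = [(1.0, 1.0), (2.0, 1.0), (3.0, 4.0)]
cluster_b = [(8.0, 8.0), (10.0, 10.0)]
centroids = [centroid(cluster_a), centroid(cluster_b)]

print(centroids)                                # [(2.0, 2.0), (9.0, 9.0)]
print(nearest_centroid((7.0, 7.0), centroids))  # 1, i.e. closer to cluster_b
```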
Based on: www.cs.utexas.edu/~mooney/cs388/slides/TextClustering.ppt
K-means intuition
• Randomly choose k points as seeds, one per cluster.
• Form initial clusters based on these seeds.
• Iterate, repeatedly recomputing the seeds and reallocating instances to clusters, to improve the overall clustering.
• Stop when the clustering converges or after a fixed number of iterations.
Based on: www.cs.utexas.edu/~mooney/cs388/slides/TextClustering.ppt
[Slides 9–17: figures only; source: the Stanford online ML course, http://www.ml-class.org/]
K‐Means Algorithm
http://www.csc.villanova.edu/~matuszek/spring2012/index2012.html, based on: www.cs.utexas.edu/~mooney/cs388/slides/TextClustering.ppt
• Let d be the distance measure between instances.
• Select k random points {s1, s2, …, sk} as seeds.
• Until clustering converges or other stopping criterion:
  – For each instance xi:
    • Assign xi to the cluster cj such that d(xi, sj) is minimal.
  – (Update the seeds to the centroid of each cluster)
    • For each cluster cj, sj = μ(cj)
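The loop above can be sketched directly in Python. This is one possible reading of the pseudocode, not a reference implementation. Several choices are assumptions the slide does not specify: the seeds default to k randomly chosen instances, an empty cluster keeps its previous seed, convergence means no instance changed cluster, and `max_iters` caps the loop.

```python
import math
import random

def mean(points):
    """Centroid μ(c): component-wise mean of the vectors in a cluster."""
    dims = len(points[0])
    return tuple(sum(p[i] for p in points) / len(points) for i in range(dims))

def k_means(instances, k, d=math.dist, seeds=None, max_iters=100):
    """Return (seeds, assignment) following the slide's loop.

    Assumptions not in the slide: seeds default to k random instances,
    an empty cluster keeps its previous seed, and max_iters caps the loop.
    """
    seeds = list(seeds) if seeds is not None else random.sample(instances, k)
    assignment = None
    for _ in range(max_iters):
        # Assign each x_i to the cluster c_j whose seed s_j minimizes d(x_i, s_j).
        new_assignment = [
            min(range(k), key=lambda j: d(x, seeds[j])) for x in instances
        ]
        if new_assignment == assignment:  # converged: no instance changed cluster
            break
        assignment = new_assignment
        # Update each seed to the centroid of its cluster: s_j = μ(c_j).
        for j in range(k):
            members = [x for x, a in zip(instances, assignment) if a == j]
            if members:
                seeds[j] = mean(members)
    return seeds, assignment

# Hypothetical data: two well-separated groups, with explicit starting seeds
# so the run is reproducible.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
final_seeds, labels = k_means(points, k=2, seeds=[points[0], points[3]])
print(labels)       # [0, 0, 0, 1, 1, 1]
print(final_seeds)
```

Passing `d` as a parameter mirrors the slide's "let d be the distance measure"; any function of two vectors that returns a number can be substituted.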
Distance measures
• Euclidean distance
• Manhattan
• Hamming
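As a quick illustration, the three measures can be written as small functions using their standard textbook definitions. The function names and sample inputs are chosen for this sketch only.

```python
def euclidean(a, b):
    """Straight-line distance: square root of the summed squared differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def manhattan(a, b):
    """City-block distance: sum of the absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

print(euclidean((0, 0), (3, 4)))      # 5.0
print(manhattan((0, 0), (3, 4)))      # 7
print(hamming("karolin", "kathrin"))  # 3
```

Euclidean and Manhattan distances apply to numeric vectors, while Hamming distance also works on categorical or symbolic attributes.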
Orange schema
http://store02.prostores.com/selectsocksinc/images/store_version1/Sigvaris%20120%20Pantyhose%20SIZE%20chart.gif
Clusters aren’t always separated…
K-means for non-separated clusters
[Figure: T-shirt sizing; axes: Height, Weight. Source: the Stanford online ML course, http://www.ml-class.org/]
Weaknesses of k-means
• The algorithm is only applicable to numeric data.
• The user needs to specify k.
• The algorithm is sensitive to outliers.
  – Outliers are data points that are very far away from other data points.
  – Outliers could be errors in the data recording or some special data points with very different values.
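The sensitivity to outliers comes from the centroid being a mean. A small sketch with made-up numbers shows how a single distant point drags the centroid away from the rest of the cluster; the data and names are hypothetical.

```python
def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dims = len(points[0])
    return tuple(sum(p[i] for p in points) / len(points) for i in range(dims))

# A tight hypothetical cluster and one far-away point (e.g. a recording error).
cluster = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (2.0, 2.0)]
outlier = (50.0, 50.0)

clean = centroid(cluster)
skewed = centroid(cluster + [outlier])

print(clean)   # (1.5, 1.5): in the middle of the four points
print(skewed)  # about (11.2, 11.2): far from every original point
```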
www.cs.uic.edu/~liub/teach/cs583-fall-05/CS583-unsupervised-learning.ppt
Strengths of k-means
• Strengths:
  – Simple: easy to understand and to implement.
  – Efficient: time complexity is O(tkn), where n is the number of data points, k is the number of clusters, and t is the number of iterations.
  – Since both k and t are usually small compared with n, k-means is considered a linear algorithm.
• K-means is the most popular clustering algorithm.
www.cs.uic.edu/~liub/teach/cs583-fall-05/CS583-unsupervised-learning.ppt