Clustering UNSUPERVISED LEARNING INTRODUCTION Machine Learning.
Clustering
UNSUPERVISED LEARNING INTRODUCTION
Machine Learning
Andrew Ng
Supervised learning
Training set: $\{(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \ldots, (x^{(m)}, y^{(m)})\}$
Unsupervised learning
Training set: $\{x^{(1)}, x^{(2)}, \ldots, x^{(m)}\}$
Applications of clustering
• Organize computing clusters
• Social network analysis
• Astronomical data analysis (image credit: NASA/JPL-Caltech/E. Churchwell, Univ. of Wisconsin, Madison)
• Market segmentation
Clustering
VARIANT
Clustering Category
• Based on the clustering algorithm, clustering methods fall into four major categories:
– Partitional (centroid-based): try to partition the data into k clusters. Examples: K-Means, K-Means++, Fuzzy C-Means.
– Hierarchical
  • Agglomerative: start with every data point as its own cluster, then merge clusters step by step.
  • Divisive: start with the entire data set as a single cluster, then split it step by step.
– Distribution-based: the clustering model most closely related to statistics, built on distribution models. Example: EM clustering. Less popular because these models tend to overfit.
– Density-based: clusters are defined as areas of higher density than the remainder of the data set.
Based on the data
• Clustering is categorized into:
– Numerical data clustering
– Categorical data clustering
Clustering
K-MEANS ALGORITHM
K-means algorithm
Input:
- $K$ (number of clusters)
- Training set $\{x^{(1)}, x^{(2)}, \ldots, x^{(m)}\}$
$x^{(i)} \in \mathbb{R}^n$ (drop $x_0 = 1$ convention)
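K-means algorithm
Randomly initialize $K$ cluster centroids $\mu_1, \mu_2, \ldots, \mu_K \in \mathbb{R}^n$
Repeat {
    for $i$ = 1 to $m$
        $c^{(i)}$ := index (from 1 to $K$) of cluster centroid closest to $x^{(i)}$
    for $k$ = 1 to $K$
        $\mu_k$ := average (mean) of points assigned to cluster $k$
}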
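The loop above can be sketched in plain Python. This is a minimal illustration, not the lecture's own code: the function names, the `max_iters` cap, the choice to stop when assignments no longer change, and the handling of empty clusters are all assumptions. Cluster indices are 0-based here, while the slides count from 1, and centroids are initialized by sampling training points.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def assign_clusters(points, centroids):
    """Cluster assignment step: index of the closest centroid for each point."""
    return [
        min(range(len(centroids)), key=lambda k: squared_distance(x, centroids[k]))
        for x in points
    ]

def move_centroids(points, assignments, centroids):
    """Move-centroid step: each centroid becomes the mean of its assigned points."""
    new_centroids = []
    for k, old in enumerate(centroids):
        members = [x for x, c in zip(points, assignments) if c == k]
        if not members:
            # Assumption: a cluster with no points keeps its previous centroid.
            new_centroids.append(list(old))
            continue
        new_centroids.append([sum(col) / len(members) for col in zip(*members)])
    return new_centroids

def k_means(points, k, max_iters=100, seed=0):
    """Return (assignments, centroids) after alternating the two steps."""
    rng = random.Random(seed)
    # Assumption: initialize centroids as k distinct training points.
    centroids = [list(p) for p in rng.sample(points, k)]
    assignments = assign_clusters(points, centroids)
    for _ in range(max_iters):
        centroids = move_centroids(points, assignments, centroids)
        new_assignments = assign_clusters(points, centroids)
        if new_assignments == assignments:
            break  # assignments are stable, so the centroids will not move again
        assignments = new_assignments
    return assignments, centroids
```

Calling `k_means(points, 2)` on a list of 2-D points returns one cluster index per point plus the two centroids.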
K-means for non-separated clusters
[Figure: T-shirt sizing; axes: Height (horizontal), Weight (vertical)]
Clustering
OPTIMIZATION OBJECTIVE
Machine Learning
K-means optimization objective
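$c^{(i)}$ = index of cluster (1, 2, …, $K$) to which example $x^{(i)}$ is currently assigned
$\mu_k$ = cluster centroid $k$ ($\mu_k \in \mathbb{R}^n$)
$\mu_{c^{(i)}}$ = cluster centroid of cluster to which example $x^{(i)}$ has been assigned
Optimization objective:
$$J(c^{(1)}, \ldots, c^{(m)}, \mu_1, \ldots, \mu_K) = \frac{1}{m} \sum_{i=1}^{m} \left\| x^{(i)} - \mu_{c^{(i)}} \right\|^2$$
$$\min_{c^{(1)}, \ldots, c^{(m)},\; \mu_1, \ldots, \mu_K} J(c^{(1)}, \ldots, c^{(m)}, \mu_1, \ldots, \mu_K)$$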
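As a rough illustration of the objective, the cost (distortion) can be computed as the mean squared distance between each example and the centroid of its assigned cluster. The function name `distortion` and the tiny 1-D data set are hypothetical, and cluster indices are 0-based in this sketch.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def distortion(points, assignments, centroids):
    """Mean squared distance from each point to the centroid it is assigned to."""
    m = len(points)
    total = sum(squared_distance(x, centroids[c]) for x, c in zip(points, assignments))
    return total / m

# Hypothetical 1-D example: two tight pairs of points.
points = [[0.0], [2.0], [10.0], [12.0]]
good = distortion(points, [0, 0, 1, 1], [[1.0], [11.0]])
bad = distortion(points, [0, 1, 0, 1], [[5.0], [7.0]])
print(good, bad)  # 1.0 25.0
```

The grouping that keeps nearby points together has the lower cost, which is what both K-means steps try to reduce.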
Clustering
RANDOM INITIALIZATION
Machine Learning
Random initialization
Should have $K < m$.
Randomly pick $K$ training examples.
Set $\mu_1, \ldots, \mu_K$ equal to these $K$ examples.
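A minimal sketch of this initialization, assuming the training set is a list of points. The name `init_centroids` and the optional `rng` argument are hypothetical; `random.sample` draws without replacement, so the chosen examples come from distinct positions in the training set.

```python
import random

def init_centroids(points, k, rng=None):
    """Pick k distinct training examples and use copies of them as centroids."""
    if not k < len(points):
        raise ValueError("K-means initialization expects K < m (number of examples)")
    rng = rng or random.Random()
    return [list(p) for p in rng.sample(points, k)]
```

Copying each sampled point keeps later centroid updates from modifying the training data.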
Local optima
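Random initialization
For i = 1 to 100 {
    Randomly initialize K-means.
    Run K-means. Get $c^{(1)}, \ldots, c^{(m)}, \mu_1, \ldots, \mu_K$.
    Compute cost function (distortion) $J(c^{(1)}, \ldots, c^{(m)}, \mu_1, \ldots, \mu_K)$
}
Pick clustering that gave lowest cost $J(c^{(1)}, \ldots, c^{(m)}, \mu_1, \ldots, \mu_K)$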
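The restart procedure can be sketched as follows. This is an illustrative version under stated assumptions: `run_k_means`, `distortion`, and `best_of_restarts` are hypothetical names, the compact K-means inside is one possible implementation with 0-based cluster indices, and an empty cluster simply keeps its previous centroid.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def run_k_means(points, k, rng, max_iters=100):
    """One K-means run from a random start; returns (assignments, centroids)."""
    centroids = [list(p) for p in rng.sample(points, k)]
    assignments = None
    for _ in range(max_iters):
        new_assignments = [
            min(range(k), key=lambda j: sq_dist(x, centroids[j])) for x in points
        ]
        if new_assignments == assignments:
            break  # converged: assignments did not change
        assignments = new_assignments
        for j in range(k):
            members = [x for x, c in zip(points, assignments) if c == j]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assignments, centroids

def distortion(points, assignments, centroids):
    """Cost function: mean squared distance to the assigned centroid."""
    return sum(sq_dist(x, centroids[c]) for x, c in zip(points, assignments)) / len(points)

def best_of_restarts(points, k, n_restarts=100, seed=0):
    """Run K-means n_restarts times and keep the lowest-cost clustering."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_restarts):
        assignments, centroids = run_k_means(points, k, rng)
        cost = distortion(points, assignments, centroids)
        if best is None or cost < best[0]:
            best = (cost, assignments, centroids)
    return best
```

Each restart can end in a different local optimum, so keeping the lowest-cost run reduces the chance of reporting a poor clustering.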
Clustering
CHOOSING THE NUMBER OF CLUSTERS
Machine Learning
What is the right value of K?
Choosing the value of K
Elbow method:
[Figure: two plots of the cost function against $K$ (no. of clusters), with $K$ from 1 to 8]
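One way to produce the numbers behind such a plot is to run K-means for each candidate K and record the lowest cost found. The sketch below is an assumption-laden illustration: the helper names, the synthetic 1-D data with three blobs, and the choice of 10 restarts per K are all hypothetical.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def k_means_cost(points, k, rng, max_iters=100):
    """Run K-means once from a random start and return its distortion."""
    centroids = [list(p) for p in rng.sample(points, k)]
    assignments = None
    for _ in range(max_iters):
        new_assignments = [
            min(range(k), key=lambda j: sq_dist(x, centroids[j])) for x in points
        ]
        if new_assignments == assignments:
            break
        assignments = new_assignments
        for j in range(k):
            members = [x for x, c in zip(points, assignments) if c == j]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return sum(sq_dist(x, centroids[c]) for x, c in zip(points, assignments)) / len(points)

def elbow_curve(points, k_values, n_restarts=10, seed=0):
    """Lowest cost found for each K, as a dict {K: cost}."""
    rng = random.Random(seed)
    return {
        k: min(k_means_cost(points, k, rng) for _ in range(n_restarts))
        for k in k_values
    }

# Hypothetical data: three 1-D blobs centred near 0, 10 and 20.
data_rng = random.Random(1)
points = [[data_rng.gauss(center, 1.0)] for center in (0.0, 10.0, 20.0) for _ in range(20)]
costs = elbow_curve(points, range(1, 9))
for k, cost in costs.items():
    print(k, round(cost, 3))
```

On data like this the cost should fall steeply up to K = 3 and flatten afterwards, which is the elbow; on real data the bend is often much less clear.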
Choosing the value of K
Sometimes, you’re running K-means to get clusters to use for some later/downstream purpose. Evaluate K-means based on a metric for how well it performs for that later purpose.
E.g. T-shirt sizing
[Figure: two T-shirt sizing plots; axes: Height (horizontal), Weight (vertical)]