Transcript of Lecture 15 – Part 01: Supervised and Unsupervised Learning
Source: sierra.ucsd.edu/cse151a-2020-sp/lectures/15...
[Figure: scatter plot of Eruption Time vs. Waiting Time]
Lecture 15 – Part 01Supervised and
Unsupervised Learning
Supervised Learning
▶ We tell the machine the “right answer”.
▶ There is a ground truth.
▶ Data set: {( ⃗𝑥(𝑖), 𝑦𝑖)}.
▶ Goal: learn the relationship between features ⃗𝑥(𝑖) and labels 𝑦𝑖.
▶ Examples: classification, regression.
Unsupervised Learning
▶ We don’t tell the machine the “right answer”.
▶ In fact, there might not be one!
▶ Data set: { ⃗𝑥(𝑖)} (usually no test set).
▶ Goal: learn the structure of the data itself.
▶ To discover something, for compression, or to use as a feature later.
▶ Example: clustering.
Example
▶ We gather measurements ⃗𝑥(𝑖) of a bunch of flowers.
▶ Question: how many species are there?
▶ Goal: cluster the similar flowers into groups.
Example
[Figure: scatter plot of Petal Length (cm) vs. Petal Width (cm)]
Example
[Figure: scatter plot of Height vs. Weight]
Clustering and Dimensionality
▶ Groups emerge with more features.
▶ But with too many features, the groups disappear.
▶ This is the curse of dimensionality.
▶ Also: we can’t visualize data when 𝑑 > 3.
Ground Truth
▶ If we don’t have labels, we can’t measure accuracy.
▶ Sometimes, labels don’t exist.
▶ Example: cluster customers into types by their previous purchases.
[Figure: scatter plot of Eruption Time vs. Waiting Time]
Lecture 15 – Part 02K-Means Clustering
Learning
▶ Goal: turn clustering into an optimization problem.
▶ Idea: clustering is like compression.
[Figure: scatter plot of Petal Length (cm) vs. Petal Width (cm)]
K-Means Objective
▶ Given: data points ⃗𝑥(1), … , ⃗𝑥(𝑛) ∈ ℝ𝑑 and a parameter 𝑘.
▶ Find: 𝑘 cluster centers ⃗𝜇(1), … , ⃗𝜇(𝑘) so that the average squared distance from each data point to its nearest cluster center is small.
▶ The 𝑘-means objective function:

Cost( ⃗𝜇(1), … , ⃗𝜇(𝑘)) = (1/𝑛) ∑_{𝑖=1}^{𝑛} min_{𝑗∈{1,…,𝑘}} ‖ ⃗𝑥(𝑖) − ⃗𝜇(𝑗)‖²
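This objective is easy to compute directly. As an illustration (not from the lecture), here is a NumPy sketch of the cost function; the names are mine:

```python
import numpy as np

def kmeans_cost(X, centers):
    """Average squared distance from each point to its nearest center.

    X:       (n, d) array of data points
    centers: (k, d) array of cluster centers
    """
    # (n, k) matrix of squared distances ||x_i - mu_j||^2, via broadcasting
    sq_dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    # min over centers, then mean over points -- exactly the objective above
    return sq_dists.min(axis=1).mean()

# Tiny example: two pairs of points, one center at each pair's mean.
X = np.array([[0., 0.], [0., 2.], [10., 0.], [10., 2.]])
centers = np.array([[0., 1.], [10., 1.]])
print(kmeans_cost(X, centers))  # every point is distance 1 from its center -> 1.0
```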
Optimization
▶ Goal: find ⃗𝜇(1), … , ⃗𝜇(𝑘) minimizing the 𝑘-means objective function.
▶ Problem: this is NP-hard.
▶ We use a heuristic instead of solving exactly.
Lloyd’s Algorithm for K-Means
▶ Initialize the centers ⃗𝜇(1), … , ⃗𝜇(𝑘) somehow.
▶ Repeat until convergence:
▶ Assign each point ⃗𝑥(𝑖) to its closest center.
▶ Update each center ⃗𝜇(𝑗) to be the mean of the points assigned to it.
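These two alternating steps can be sketched in NumPy as follows. This is an illustrative minimal implementation, not the lecture's code; the `init` parameter is my addition so that starting centers can be supplied explicitly:

```python
import numpy as np

def lloyds(X, k, n_iter=100, init=None, rng=None):
    """A minimal sketch of Lloyd's algorithm for k-means."""
    rng = np.random.default_rng(rng)
    if init is None:
        # "Basic approach" initialization: k distinct random data points
        centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    else:
        centers = np.array(init, dtype=float)
    for _ in range(n_iter):
        # Assignment step: each point goes to its closest center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: each center becomes the mean of its assigned points
        # (an empty cluster keeps its old center).
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):  # fixed point reached
            break
        centers = new_centers
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return centers, d2.argmin(axis=1)
```

For example, on two separated pairs of points with one starting center near each pair, the centers converge to the pair means in a couple of iterations.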
Example
Theory
▶ Each iteration reduces the cost (or leaves it unchanged).
▶ This guarantees convergence, but possibly only to a local minimum.
▶ Initialization is very important.
Example
Initialization Strategies
▶ Basic Approach: Pick 𝑘 data points at random.
▶ Better Approach: 𝑘-means++:
▶ Pick the first center at random from the data.
▶ Let 𝐶 = { ⃗𝜇(1)} (the centers chosen so far).
▶ Repeat 𝑘 − 1 more times:
▶ Pick a random data point ⃗𝑥 according to the distribution

ℙ( ⃗𝑥) ∝ min_{ ⃗𝜇∈𝐶} ‖ ⃗𝑥 − ⃗𝜇‖²

▶ Add ⃗𝑥 to 𝐶.
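The 𝑘-means++ seeding rule above can be sketched in NumPy like this (an illustration, assuming the points don't all coincide with the chosen centers, so the distances don't all vanish):

```python
import numpy as np

def kmeans_pp_init(X, k, rng=None):
    """k-means++ seeding: each later center is drawn with probability
    proportional to squared distance from the centers chosen so far."""
    rng = np.random.default_rng(rng)
    centers = [X[rng.integers(len(X))]]  # first center: uniform at random
    for _ in range(k - 1):
        # Squared distance from each point to its nearest chosen center
        d2 = ((X[:, None, :] - np.array(centers)[None, :, :]) ** 2).sum(axis=2)
        d2 = d2.min(axis=1)
        probs = d2 / d2.sum()            # P(x) proportional to min_mu ||x - mu||^2
        centers.append(X[rng.choice(len(X), p=probs)])
    return np.array(centers)
```

The effect is that points far from every existing center are much more likely to be picked, so the seeds tend to spread across the data rather than landing in one cluster.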
Picking k
▶ How do we know how many clusters the data contains?
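One common heuristic, which a plot of the 𝑘-means objective hints at, is to run 𝑘-means for several values of 𝑘 and look for the "elbow" where the cost stops dropping sharply. A self-contained sketch (with random restarts to dodge bad initializations; the function name and data are illustrative, not from the lecture):

```python
import numpy as np

def best_kmeans_cost(X, k, n_restarts=20, n_iter=50, seed=0):
    """Lowest k-means cost found over several random restarts of Lloyd's."""
    rng = np.random.default_rng(seed)
    best = np.inf
    for _ in range(n_restarts):
        centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
        for _ in range(n_iter):
            d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = d2.argmin(axis=1)
            centers = np.array([X[labels == j].mean(axis=0)
                                if np.any(labels == j) else centers[j]
                                for j in range(k)])
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        best = min(best, d2.min(axis=1).mean())
    return best

rng = np.random.default_rng(0)
# Three tight, well-separated blobs -> the "true" k is 3.
X = np.vstack([rng.normal(c, 0.1, size=(30, 2))
               for c in ((0., 0.), (8., 0.), (0., 8.))])
for k in (1, 2, 3, 4, 5):
    print(k, round(best_kmeans_cost(X, k), 3))
# The cost falls sharply up to k = 3, then flattens out: the elbow.
```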
Plot of K-Means Objective
Applications of K-Means
▶ Discovery
▶ Vector Quantization:
▶ Find a finite set of representatives of a large (possibly infinite) set.
Example #1
▶ Cluster animal descriptions.
▶ 50 animals: grizzly bear, dalmatian, rabbit, pig, …
▶ 85 attributes: long neck, tail, walks, swims, …
▶ 50 data points in ℝ85. Run 𝑘-means with 𝑘 = 10.
Results
Example #2
▶ How do we represent images of different sizes as fixed-length feature vectors for use in classification tasks?
Visual Bags-of-Words
▶ Idea: build a “dictionary” of image patches.
▶ Extract all ℓ × ℓ image patches from all training images.
▶ Cluster them with 𝑘-means.
▶ Each cluster center is now a dictionary “word”.
▶ Represent an image as a histogram over {1, 2, … , 𝑘} by associating each patch with its nearest center.
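Assuming a dictionary of patch centers has already been learned with 𝑘-means, the representation step might look like this NumPy sketch (the function names and the grayscale-image assumption are mine, not the lecture's):

```python
import numpy as np

def extract_patches(img, ell):
    """All ell x ell patches of a 2-D grayscale image, flattened to vectors."""
    H, W = img.shape
    return np.array([img[i:i + ell, j:j + ell].ravel()
                     for i in range(H - ell + 1)
                     for j in range(W - ell + 1)])

def bow_histogram(img, centers, ell):
    """Fixed-length (k,) vector: fraction of patches nearest each 'word'."""
    patches = extract_patches(img, ell)                    # (m, ell*ell)
    d2 = ((patches[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)                              # nearest word per patch
    return np.bincount(words, minlength=len(centers)) / len(words)
```

Because the histogram always has 𝑘 bins, images of any size map to vectors of the same length, which is exactly what a downstream classifier needs.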
Online Learning
▶ What if the dataset is huge?
▶ It doesn’t even fit in memory.
▶ What if we’re continuously getting new data?
▶ We don’t want to retrain with every new point.
▶ We can update the model online.
Sequential k-Means

▶ Set the centers ⃗𝜇(1), … , ⃗𝜇(𝑘) to be the first 𝑘 points.
▶ Set the counts to 𝑛1 = 𝑛2 = … = 𝑛𝑘 = 1.
▶ Repeat:
▶ Get the next data point, ⃗𝑥.
▶ Let ⃗𝜇(𝑗) be the closest center.
▶ Update ⃗𝜇(𝑗) and 𝑛𝑗:

⃗𝜇(𝑗) ← (𝑛𝑗 ⃗𝜇(𝑗) + ⃗𝑥) / (𝑛𝑗 + 1)

𝑛𝑗 ← 𝑛𝑗 + 1
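The update rule above can be sketched as a small online class in NumPy (an illustration; the class and method names are mine):

```python
import numpy as np

class SequentialKMeans:
    """Online k-means: each center is a running mean of the points routed to it."""

    def __init__(self, first_k_points):
        self.centers = np.array(first_k_points, dtype=float)
        self.counts = np.ones(len(self.centers))  # n_1 = ... = n_k = 1

    def update(self, x):
        """Process one new point; returns the index of the updated center."""
        x = np.asarray(x, dtype=float)
        # Route x to the closest center...
        j = ((self.centers - x) ** 2).sum(axis=1).argmin()
        # ...and fold it into that center's running mean.
        self.centers[j] = (self.counts[j] * self.centers[j] + x) / (self.counts[j] + 1)
        self.counts[j] += 1
        return j
```

Each point is touched once and then discarded, so the dataset never needs to fit in memory and new data can be absorbed without retraining.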
K-Means
▶ Perhaps the most popular clustering algorithm.
▶ Fast, easy to understand.
▶ Assumes spherical clusters.
Example