Perceptron, Kernels, and SVM - University of Washington
Perceptron, Kernels, and SVM
CSE 546 Recitation, November 5, 2013
Grading Update
● Midterms: likely by Monday
– Expected average is 60%
● HW 2: after midterms are graded
● Project proposals: mostly or all graded (everyone gets full credit)
– Check your dropbox for comments
● HW 3 scheduled to be released tomorrow, due in two weeks
Perceptron Basics
● Online algorithm
● Linear classifier
● Learns set of weights
● Update rule: $w \leftarrow w + y_i\, x_i$ whenever example $i$ is misclassified
● Always converges on linearly separable data
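A minimal sketch in Python of the algorithm described above (array names and the `max_passes` cap are illustrative assumptions, not from the slides; labels are assumed to be ±1):

```python
import numpy as np

def perceptron(X, y, max_passes=100):
    """Online perceptron: update w on every mistake; stops once a pass makes no mistakes."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(max_passes):
        mistakes = 0
        for i in range(n):
            if y[i] * np.dot(w, X[i]) <= 0:   # mistake (or exactly on the boundary)
                w = w + y[i] * X[i]           # mistake-driven update
                mistakes += 1
        if mistakes == 0:                     # separating hyperplane found
            break
    return w

# Predictions on new data are sign(X_new @ w).
```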
What does perceptron optimize?
● Perceptron appears to work, but is it solving an optimization problem like every other algorithm?
● $y_i\,(w \cdot x_i) \le 0$ is equivalent to making a mistake
● Hinge loss penalizes mistakes by $\max(0,\ -y_i\,(w \cdot x_i))$
Hinge Loss
● Total hinge loss on the data: $\ell(w) = \sum_i \max(0,\ -y_i\,(w \cdot x_i))$
● Only the mistaken examples contribute to the loss and to its (sub)gradient
● Gradient descent update rule:
– $w^{t+1} = w^t + \eta \sum_{i:\ y_i (w^t \cdot x_i) \le 0} y_i\, x_i$
● Stochastic gradient descent update rule = perceptron:
– $w^{t+1} = w^t + \eta\, y_i\, x_i$ on a mistake (the perceptron takes $\eta = 1$)
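To spell out the connection (a standard derivation, not taken verbatim from the slide): a subgradient of the per-example hinge loss vanishes on correctly classified points and is $-y_i\, x_i$ on mistakes,

```latex
\ell_i(w) = \max\bigl(0,\ -y_i\,(w \cdot x_i)\bigr),
\qquad
g_i(w) =
\begin{cases}
-\,y_i\, x_i & \text{if } y_i\,(w \cdot x_i) \le 0 \ \ (\text{mistake})\\
0 & \text{otherwise,}
\end{cases}
```

so the SGD step $w \leftarrow w - \eta\, g_i(w)$ with $\eta = 1$ is exactly the perceptron's mistake-driven update $w \leftarrow w + y_i\, x_i$.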
Feature Maps
● What if data aren't linearly separable?
● Sometimes, if we map features to a new space, we can put the data in a form more amenable to an algorithm, e.g. linearly separable (see the sketch after this list)
● The maps could have extremely high or even infinite dimension, so is there a shortcut to represent them?
– Don't want to store every $\phi(x_i)$ or do computation in high dimensions
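A concrete illustration (my own toy example, not from the slides): 1-D points labeled by whether they lie near the origin are not linearly separable, but the map $x \mapsto (x, x^2)$ makes them separable.

```python
import numpy as np

# 1-D points labeled +1 near the origin and -1 far from it: no single threshold on x works.
X = np.array([-3.0, -2.0, -0.5, 0.5, 2.0, 3.0])
y = np.array([-1, -1, +1, +1, -1, -1])

# Map each x to (x, x^2); the classes are now separated by the horizontal line x2 = 2.
phi = np.column_stack([X, X ** 2])

# A linear classifier in the mapped space, sign(2 - x^2), gets every point right.
print(np.sign(2 - phi[:, 1]) == y)  # all True
```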
Kernel Trick
● Kernels (aka kernel functions) compute dot products of mapped features, $K(x, x') = \phi(x) \cdot \phi(x')$, while working in the same dimension as the original features
– Apply to algorithms that only depend on dot products
– Lower dimension for computation
– Don't have to store $\phi(x)$ explicitly
● Choose mappings that have kernels, since not all do
– e.g. the polynomial kernel sketched below
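One standard example of the trick: the degree-2 polynomial kernel $K(x, x') = (x \cdot x')^2$ equals the dot product of explicit degree-2 feature maps, so $\phi$ never needs to be built. A quick numerical check:

```python
import numpy as np

def phi2(x):
    """Explicit degree-2 feature map for a 2-D input: (x1^2, x2^2, sqrt(2)*x1*x2)."""
    return np.array([x[0] ** 2, x[1] ** 2, np.sqrt(2) * x[0] * x[1]])

def k_poly2(x, z):
    """Degree-2 polynomial kernel: computed directly in the original 2-D space."""
    return np.dot(x, z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])
print(np.dot(phi2(x), phi2(z)))  # 1.0
print(k_poly2(x, z))             # 1.0 -- same value, no explicit mapping needed
```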
Kernelized Perceptron
● Recall perceptron update rule: $w^{t+1} = w^t + y_i\, x_i$ on a mistake
– Implies $w^t = \sum_{i \in M^t} y_i\, x_i$, where $M^t$ is the set of mistake indices up to time $t$
● Classification rule: $\hat{y} = \mathrm{sign}(w \cdot x) = \mathrm{sign}\bigl(\sum_{i \in M} y_i\,(x_i \cdot x)\bigr)$
● With mapping $\phi$: $\hat{y} = \mathrm{sign}\bigl(\sum_{i \in M} y_i\,(\phi(x_i) \cdot \phi(x))\bigr)$
● If we have a kernel $K$: $\hat{y} = \mathrm{sign}\bigl(\sum_{i \in M} y_i\, K(x_i, x)\bigr)$, so $\phi$ never has to be computed explicitly
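A minimal sketch of the kernelized perceptron (storing per-example coefficients instead of an explicit weight vector; the `kernel` argument can be any kernel function, e.g. `k_poly2` from the earlier sketch; names are illustrative):

```python
import numpy as np

def kernelized_perceptron(X, y, kernel, max_passes=100):
    """Track alpha_i = (number of mistakes on example i) * y_i; w is never formed."""
    n = X.shape[0]
    alpha = np.zeros(n)
    # Precompute the Gram matrix of kernel evaluations.
    K = np.array([[kernel(X[i], X[j]) for j in range(n)] for i in range(n)])
    for _ in range(max_passes):
        mistakes = 0
        for i in range(n):
            # Prediction uses only kernel values: sum_j alpha_j * K(x_j, x_i)
            if y[i] * np.dot(alpha, K[:, i]) <= 0:
                alpha[i] += y[i]          # record the mistake; same as w += y_i * phi(x_i)
                mistakes += 1
        if mistakes == 0:
            break
    return alpha

def predict(alpha, X_train, kernel, x_new):
    return np.sign(sum(a * kernel(xi, x_new) for a, xi in zip(alpha, X_train)))
```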
SVM Basics
● Linear classifier (without kernels)
● Find separating hyperplane by maximizing margin
● One of the most popular and robust classifiers
Setting Up SVM Optimization
● Weights and margin
– Maximizing the margin directly is unbounded, since rescaling $w$ and $b$ rescales $y_i\,(w \cdot x_i + b)$ arbitrarily
● Use canonical hyperplanes to remedy this
– Fix the scale by requiring $y_i\,(w \cdot x_i + b) \ge 1$ for all $i$; the margin is then $1/\|w\|$
● If the data are linearly separable, we can solve the resulting optimization problem (stated below)
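In its usual form (standard notation; a sketch of the formulation rather than the slide's exact statement):

```latex
\min_{w,\, b}\ \ \tfrac{1}{2}\,\|w\|_2^2
\qquad \text{subject to} \qquad
y_i\,(w \cdot x_i + b) \ \ge\ 1 \quad \text{for all } i
```

Minimizing $\|w\|$ under the canonical constraints is equivalent to maximizing the margin $1/\|w\|$.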
SVM Optimization
● If non-linearly separable data, could map to new space
– But this doesn't guarantee separability
● Therefore, remove the separability constraints and instead penalize the violation in the objective
– Soft-margin SVM minimizes the regularized hinge loss (see below)
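Written out (a standard formulation of the soft-margin objective, assuming the hinge loss as the violation penalty):

```latex
\min_{w,\, b}\ \ \tfrac{1}{2}\,\|w\|_2^2 \;+\; C \sum_{i=1}^{n} \max\bigl(0,\ 1 - y_i\,(w \cdot x_i + b)\bigr)
```

Each hinge term measures how badly example $i$ violates its margin constraint, and $C$ trades off margin size against those violations.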
SVM vs Perceptron
● The SVM objective has almost the same goal as an L2-regularized perceptron
● The perceptron objective differs only in dropping the margin (the "1" inside the hinge) and the regularizer (see the comparison below)
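Side by side, the two objectives (standard forms consistent with the formulas above; $\lambda$ plays the role of a regularization weight):

```latex
\text{SVM:}\qquad \min_{w,\,b}\ \sum_{i} \max\bigl(0,\ 1 - y_i\,(w \cdot x_i + b)\bigr) \;+\; \lambda\,\|w\|_2^2
\\[4pt]
\text{Perceptron:}\qquad \min_{w}\ \sum_{i} \max\bigl(0,\ -\,y_i\,(w \cdot x_i)\bigr)
```

The only differences are the margin inside the hinge (the SVM asks for a margin, not just a correct sign) and the L2 regularizer.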
Other SVM Comments
● C > 0 is the “soft margin” penalty parameter
– High C means we care more about getting a good separation
– Low C means we care more about getting a large margin
● How to implement SVM?
– A simple but suboptimal method is SGD (see HW 3; a sketch follows below)
– More advanced methods can be used to employ the kernel trick
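A rough sketch of that SGD approach (a Pegasos-style subgradient step on the regularized hinge loss; the step-size schedule and parameter names are illustrative choices, not the HW 3 specification):

```python
import numpy as np

def svm_sgd(X, y, lam=0.01, n_epochs=20):
    """Stochastic subgradient descent on  lam/2 * ||w||^2 + (1/n) * sum_i hinge(w; x_i, y_i)."""
    n, d = X.shape
    w = np.zeros(d)
    t = 0
    for _ in range(n_epochs):
        for i in np.random.permutation(n):
            t += 1
            eta = 1.0 / (lam * t)          # decaying step size
            grad = lam * w                 # gradient of the regularizer
            if y[i] * np.dot(w, X[i]) < 1: # hinge loss is active for this example
                grad -= y[i] * X[i]
            w -= eta * grad
    return w
```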
Questions?