Support Vector Machines Jordan Smith MUMT 611 14 February 2008.

Topics to cover
What do Support Vector Machines (SVMs) do?
How do SVMs work?
  Linear data
  Non-linear data (kernel functions)
  Unseparable data (added cost function)
Search optimization
Why?

What SVMs do
[Figure sequence, slides 3-9; legend: margin, support vector; label: (optimum separating hyperplane)]
[Figure, slide 10: Sherrod 230]

The linear, separable case
Training data $\{\mathbf{x}_i, y_i\}$
Separating hyperplane defined by normal vector $\mathbf{w}$
  hyperplane equation: $\mathbf{w}\cdot\mathbf{x} + b = 0$
  distance from plane to origin: $|b|/|\mathbf{w}|$
Distances from hyperplane to nearest point in each collection are $d_+$ and $d_-$
Goal: maximize $d_+ + d_-$ (margins)
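
The quantities on this slide can be computed directly for a small example. The sketch below is only an illustration: the four training points and the hyperplane (`w`, `b`) are made-up values, not taken from the slides, and the helper names are hypothetical.

```python
import math

def dot(u, v):
    """Dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def norm(v):
    """Euclidean length |v|."""
    return math.sqrt(dot(v, v))

def distance_to_hyperplane(x, w, b):
    """Perpendicular distance from point x to the plane w.x + b = 0."""
    return abs(dot(w, x) + b) / norm(w)

# Hypothetical 2-D training data {x_i, y_i} (assumed for illustration)
X = [(2.0, 2.0), (3.0, 3.0), (0.0, 0.0), (-1.0, 0.0)]
y = [+1, +1, -1, -1]

# Hypothetical separating hyperplane: x1 + x2 - 2 = 0
w, b = (1.0, 1.0), -2.0

# Distance from the plane to the origin: |b| / |w|
origin_dist = abs(b) / norm(w)

# d+ and d-: distance from the plane to the nearest point of each class
d_plus = min(distance_to_hyperplane(x, w, b)
             for x, label in zip(X, y) if label == +1)
d_minus = min(distance_to_hyperplane(x, w, b)
              for x, label in zip(X, y) if label == -1)

# The quantity the SVM tries to maximize
margin = d_plus + d_minus
```

Here the nearest points of the two classes are (2, 2) and (0, 0), so a different choice of `w` and `b` that moved the plane closer to either of them would shrink `d_plus` or `d_minus`.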

The linear, separable case
1) $\mathbf{x}_i\cdot\mathbf{w} + b \geq +1$ (for $y_i = +1$)
2) $\mathbf{x}_i\cdot\mathbf{w} + b \leq -1$ (for $y_i = -1$)
Combined: $y_i(\mathbf{x}_i\cdot\mathbf{w} + b) - 1 \geq 0$
For our support vectors, equality holds; distance from origin to the plane through the $y_i = +1$ support vectors = $|1-b|/|\mathbf{w}|$ (and $|-1-b|/|\mathbf{w}|$ for $y_i = -1$)
Algebra: $d_+ + d_- = 2/|\mathbf{w}|$
New goal: maximize $2/|\mathbf{w}|$, i.e., minimize $|\mathbf{w}|$
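
The "Algebra" step can be spelled out as follows. This is a sketch under the constraints stated on this slide; the labels $H_1$ and $H_2$ for the two planes on which the support vectors lie are introduced here for convenience.

```latex
\begin{align*}
H_1 &: \mathbf{x}\cdot\mathbf{w} + b = +1, \qquad
H_2 : \mathbf{x}\cdot\mathbf{w} + b = -1 \\[4pt]
% distance from a point x to the separating plane w.x + b = 0
\operatorname{dist}(\mathbf{x}) &= \frac{|\mathbf{x}\cdot\mathbf{w} + b|}{|\mathbf{w}|} \\[4pt]
% support vectors lie on H_1 or H_2, where |x.w + b| = 1
d_+ &= \frac{|{+1}|}{|\mathbf{w}|} = \frac{1}{|\mathbf{w}|}, \qquad
d_- = \frac{|{-1}|}{|\mathbf{w}|} = \frac{1}{|\mathbf{w}|} \\[4pt]
d_+ + d_- &= \frac{2}{|\mathbf{w}|}
\end{align*}
```

Since $2/|\mathbf{w}|$ grows as $|\mathbf{w}|$ shrinks, maximizing the margin is the same as minimizing $|\mathbf{w}|$, or equivalently $|\mathbf{w}|^2/2$.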

Nonlinear SVMs
[Figure: Sherrod 235]

Nonlinear SVMs
Kernel trick:
  Map data into a higher-dimensional space using $\Phi: \mathbb{R}^d \to H$
  Training problems involve only the dot product, so $H$ can even be of infinite dimension
  Kernel trick makes nonlinear solutions linear again!
  YouTube example
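
A small numerical check may make the kernel trick concrete. The sketch below uses a degree-2 polynomial kernel on R^2, chosen here only because its mapping is easy to write out; the slides do not specify this kernel, and the names `phi` and `poly2_kernel` are hypothetical.

```python
import math

def dot(u, v):
    """Dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def phi(x):
    """Explicit map from R^2 into R^3 matching the kernel K(x, z) = (x.z)^2."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)

def poly2_kernel(x, z):
    """K(x, z) = (x.z)^2, computed without ever forming phi(x)."""
    return dot(x, z) ** 2

# Two arbitrary example points (assumed values)
x, z = (1.0, 2.0), (3.0, -1.0)

explicit = dot(phi(x), phi(z))   # dot product taken in the mapped space H
implicit = poly2_kernel(x, z)    # same value obtained in the input space
```

Because training only ever needs dot products between mapped points, `poly2_kernel` can stand in for `phi` everywhere, which is why the mapped space is allowed to be very large or even infinite-dimensional.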

Nonlinear SVMs
Radial basis function:
[Figure: Sherrod 236]

Nonlinear SVMs
Sigmoid
[Figure: Sherrod 237]
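
The two kernels named on the preceding slides were shown as figures. For reference, the sketch below writes them in their commonly used forms; the exact formulas on the original slides are not recoverable, so the parameter names `gamma` and `coef0` are assumptions borrowed from common SVM software.

```python
import math

def dot(u, v):
    """Dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def rbf_kernel(x, z, gamma):
    """Radial basis function kernel: K(x, z) = exp(-gamma * |x - z|^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

def sigmoid_kernel(x, z, gamma, coef0):
    """Sigmoid kernel: K(x, z) = tanh(gamma * x.z + coef0)."""
    return math.tanh(gamma * dot(x, z) + coef0)

# Example evaluations on made-up points
same_point = rbf_kernel((1.0, 2.0), (1.0, 2.0), gamma=0.5)
far_point = rbf_kernel((0.0, 0.0), (1.0, 1.0), gamma=0.5)
sig = sigmoid_kernel((0.0, 0.0), (1.0, 1.0), gamma=1.0, coef0=0.0)
```

The RBF kernel equals 1 for identical points and decays toward 0 as points move apart, with `gamma` controlling how fast.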

Another demonstration
applet

The unseparable case
Classifiers need to have a balanced capacity:
  Bad botanist: “It has 847 leaves. Not a tree!”
  Bad botanist: “It’s green. That’s a tree!”

The unseparable case
[Figure: Sherrod 237]
[Figure sequence, slides 21-22; legend: error, fuzzy margin]

The unseparable case
Add a cost function:
  $\mathbf{x}_i\cdot\mathbf{w} + b \geq +1 - \xi_i$ (for $y_i = +1$)
  $\mathbf{x}_i\cdot\mathbf{w} + b \leq -1 + \xi_i$ (for $y_i = -1$)
  $\xi_i \geq 0$
old goal: minimize $|\mathbf{w}|^2/2$
new goal: minimize $|\mathbf{w}|^2/2 + C(\sum_i \xi_i)^k$
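
To see how the added cost term behaves, the sketch below evaluates the new goal for a fixed hyperplane. The data, the hyperplane and the value of `C` are invented for illustration, and the function names are hypothetical; an actual SVM would search over `w` and `b` to minimize this quantity rather than evaluate it at one point.

```python
def dot(u, v):
    """Dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def slacks(X, y, w, b):
    """Smallest xi_i >= 0 that satisfies y_i (x_i.w + b) >= 1 - xi_i."""
    return [max(0.0, 1.0 - label * (dot(x, w) + b))
            for x, label in zip(X, y)]

def objective(X, y, w, b, C, k=1):
    """The new goal: |w|^2 / 2 + C * (sum_i xi_i)^k."""
    xi = slacks(X, y, w, b)
    return dot(w, w) / 2.0 + C * sum(xi) ** k

# Hypothetical data: the last point has label -1 but sits on the +1 side
X = [(2.0, 2.0), (3.0, 3.0), (0.0, 0.0), (1.5, 1.5)]
y = [+1, +1, -1, -1]

# Hypothetical hyperplane and cost constant
w, b = (0.5, 0.5), -1.0

xi = slacks(X, y, w, b)                 # only the misplaced point gets xi > 0
cost = objective(X, y, w, b, C=10.0)    # margin term plus penalty term
```

A larger `C` makes each unit of slack more expensive, pushing the solution toward fewer errors at the price of a narrower margin.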

Optimizing your search
To find the separating hyperplane, you must manipulate many parameters, depending on which kernel function you select:
  C, the cost constant
  Gamma, $\xi_i$, etc.
There are two basic methods:
  Grid search
  Pattern search
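
The two search methods can be sketched as follows. This is a minimal illustration, not the procedure of any particular SVM package: `toy_error` is a hypothetical stand-in for a cross-validation error (in practice each call would train and validate an SVM), the exponent ranges are arbitrary, and the pattern search shown is a simple compass search over (log2 C, log2 gamma).

```python
import itertools
import math

def toy_error(C, gamma):
    """Hypothetical stand-in for validation error; lowest at C=2**3, gamma=2**-5."""
    return (math.log2(C) - 3.0) ** 2 + (math.log2(gamma) + 5.0) ** 2

def grid_search(error, c_exps, g_exps):
    """Try every (C, gamma) = (2**ce, 2**ge) on the grid; return the best triple."""
    best = None
    for ce, ge in itertools.product(c_exps, g_exps):
        C, gamma = 2.0 ** ce, 2.0 ** ge
        e = error(C, gamma)
        if best is None or e < best[0]:
            best = (e, C, gamma)
    return best

def pattern_search(error, start, step=2.0, min_step=0.01):
    """Compass search: step along each axis, halve the step when nothing improves."""
    point = list(start)                      # [log2 C, log2 gamma]
    best = error(2.0 ** point[0], 2.0 ** point[1])
    while step >= min_step:
        improved = False
        for axis in (0, 1):
            for direction in (+1, -1):
                trial = list(point)
                trial[axis] += direction * step
                e = error(2.0 ** trial[0], 2.0 ** trial[1])
                if e < best:
                    point, best, improved = trial, e, True
        if not improved:
            step /= 2.0
    return best, 2.0 ** point[0], 2.0 ** point[1]

# Grid search over illustrative exponent ranges
grid_best = grid_search(toy_error, range(-5, 16, 2), range(-15, 4, 2))

# Pattern search starting from C = 1, gamma = 1
pattern_best = pattern_search(toy_error, start=(0.0, 0.0))
```

Grid search evaluates every combination and so scales with the size of the grid, while pattern search evaluates only neighbours of the current best point and refines its step size; it needs fewer evaluations but can settle in a local minimum when the error surface is not as well behaved as this toy one.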

Why use SVMs?
Uses:
  Optical character recognition
  Spam detection
  MIR
    genre, artist classification (Mandel 2004, 2005)
    mood classification (Laurier 2007)
    popularity classification, based on lyrics (Dhanaraj 2005)

Why use SVMs?
Machine learner of choice for high-dimensional data, such as text, images, music!
Conceptually simple.
Generalizable and efficient.
Next slides: results of a benchmark study (Meyer 2004) comparing SVMs and other learning techniques
[Slides 28-31: benchmark results from Meyer 2004; figures not recoverable]

Questions?

Key References
Burges, C. J. C. 1998. "A tutorial on support vector machines for pattern recognition." Data Mining and Knowledge Discovery 2: 121-167. http://citeseer.ist.psu.edu/burges98tutorial.html
Cortes, C., and V. Vapnik. 1995. "Support-vector networks." Machine Learning 20: 273-297. http://citeseer.ist.psu.edu/cortes95supportvector.html
Sherrod, Phillip H. 2008. DTREG: Predictive Modeling Software. User's guide, 227-41. http://www.dtreg.com/DTREG.pdf
Smola, A. J., and B. Schölkopf. 1998. "A tutorial on support vector regression." NeuroCOLT Technical Report NC-TR-98-030. Royal Holloway College, London.