Support Vector Machines
Joseph Gonzalez
Posted: 19-Dec-2015
From a linear classifier to ...
*One of the most famous slides you will see, ever!
The Big Idea
[Figure: X and O training examples in the plane, separated by a linear decision boundary]
Maximum margin
Maximum possible separation between positive and negative training examples
Geometric Intuition
[Figure: three builds of the same scatter plot; the Xs and Os closest to the decision boundary, which determine the margin, are highlighted as the SUPPORT VECTORS]
Primal Version
min ||w||² + C ∑ξi
s.t. (w·xi + b)yi ≥ 1 - ξi
     ξi ≥ 0
DUAL Version
Where did this come from? Remember Lagrange multipliers.
Let us “incorporate” the constraints into the objective, then solve the problem in the “dual” space of Lagrange multipliers.
max ∑αi - 1/2 ∑αiαjyiyj(xi·xj)
s.t. ∑αiyi = 0, C ≥ αi ≥ 0
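For reference, a sketch of the Lagrangian derivation behind this, using the conventional 1/2 ||w||² scaling (the slides drop the 1/2, which only rescales C):

```latex
% Lagrangian of the primal (multipliers \alpha_i \ge 0 for the margin
% constraints, \mu_i \ge 0 for \xi_i \ge 0):
L(w,b,\xi,\alpha,\mu) = \tfrac{1}{2}\|w\|^2 + C\sum_i \xi_i
  - \sum_i \alpha_i \left[ y_i (w \cdot x_i + b) - 1 + \xi_i \right]
  - \sum_i \mu_i \xi_i
% Stationarity conditions:
%   \partial L / \partial w = 0     =>  w = \sum_i \alpha_i y_i x_i
%   \partial L / \partial b = 0     =>  \sum_i \alpha_i y_i = 0
%   \partial L / \partial \xi_i = 0 =>  C = \alpha_i + \mu_i, hence 0 \le \alpha_i \le C
% Substituting back eliminates w, b, \xi and leaves the dual:
\max_{\alpha} \; \sum_i \alpha_i
  - \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \, (x_i \cdot x_j)
\quad \text{s.t.} \quad \sum_i \alpha_i y_i = 0, \;\; 0 \le \alpha_i \le C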
Primal vs Dual
Number of parameters?
- Primal: one weight per feature (plus b and the slacks)
- Dual: one αi per training example
Large # of features? Large # of examples?
For a large # of features, the DUAL is preferred; many αi can go to zero!
DUAL:   max ∑αi - 1/2 ∑αiαjyiyj(xi·xj)
        s.t. ∑αiyi = 0, C ≥ αi ≥ 0
PRIMAL: min ||w||² + C ∑ξi
        s.t. (w·xi + b)yi ≥ 1 - ξi, ξi ≥ 0
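As an aside, the primal can also be attacked directly with subgradient descent on its hinge-loss form; a minimal sketch on made-up 2-D data (the data, learning rate, and regularization strength here are all illustrative, not from the slides):

```python
# Subgradient descent on the primal SVM objective
# lam/2 * ||w||^2 + max(0, 1 - y*(w.x + b)), on tiny made-up 2-D data.
X = [(0.0, 1.0), (1.0, 2.0), (2.0, 2.0), (3.0, 3.0)]
Y = [+1, +1, -1, -1]                   # linearly separable toy labels

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

w, b = [0.0, 0.0], 0.0
lam, eta = 0.001, 0.05                 # regularization strength, step size
for epoch in range(2000):
    for x, y in zip(X, Y):
        if y * (dot(w, x) + b) < 1:    # margin violated: hinge is active
            w = [wi + eta * (y * xi - lam * wi) for wi, xi in zip(w, x)]
            b += eta * y
        else:                          # only the regularizer contributes
            w = [wi - eta * lam * wi for wi in w]

preds = [1 if dot(w, x) + b > 0 else -1 for x in X]
print(w, b, preds)
```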
DUAL: the “Support vector” version
How do we find α?
Quadratic programming
How do we find C?
Cross-validation!
Wait... how do we predict y for a new point x??
How do we find w?
How do we find b?
y = sign(w·x + b)
w = ∑i αi yi xi
max ∑αi - 1/2 ∑αiαjyiyj(xi·xj)
s.t. ∑αiyi = 0, C ≥ αi ≥ 0
y = sign(∑i αi yi (xi·x) + b)
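Prediction in the dual uses only inner products with the support vectors; a sketch using the two-point example from these slides ((0,1) labeled +1, (2,2) labeled -1, with α1 = α2 = 0.4 and b = 1.4 as computed in the worked example):

```python
# Dual ("support vector") form of SVM prediction:
#   y(x) = sign( sum_i alpha_i * y_i * K(x_i, x) + b )
# shown here with a plain linear kernel.

def linear_kernel(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

sv_x = [(0.0, 1.0), (2.0, 2.0)]    # support vectors
sv_y = [+1, -1]                    # their labels
alpha = [0.4, 0.4]                 # dual variables from the worked example
b = 1.4

def predict(x, kernel=linear_kernel):
    score = sum(a * y * kernel(xi, x) for a, y, xi in zip(alpha, sv_y, sv_x)) + b
    return 1 if score > 0 else -1

# Equivalently, with a linear kernel, w = sum_i alpha_i y_i x_i:
w = [sum(a * y * xi[d] for a, y, xi in zip(alpha, sv_y, sv_x)) for d in range(2)]
print(w, predict((0.0, 0.0)), predict((3.0, 3.0)))   # w recovers [-0.8, -0.4]
```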
“Support Vector”s?
[Figure: X at (0,1) with y = +1 and O at (2,2) with y = -1, with multipliers α1 and α2]
max ∑αi - 1/2 ∑αiαjyiyj(xi·xj)
s.t. ∑αiyi = 0, C ≥ αi ≥ 0
Plugging in the two points:
max α1 + α2 - α1α2(-1)(0+2) - 1/2 α1²(1)(0+1) - 1/2 α2²(1)(4+4)
  = α1 + α2 + 2α1α2 - α1²/2 - 4α2²
s.t. α1 - α2 = 0, C ≥ αi ≥ 0
The constraint forces α1 = α2 = α, so we maximize
max 2α - 5/2 α² = 5/2 α(4/5 - α)
a downward parabola with roots at 0 and 4/5, maximized at α = 2/5. So α1 = α2 = 2/5.
w = ∑i αi yi xi
w = 0.4([0 1] - [2 2]) = 0.4[-2 -1]
y = w·x + b, so b = y - w·x. Using x1: b = 1 - 0.4[-2 -1]·[0 1] = 1 + 0.4 = 1.4
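The arithmetic above can be checked numerically; a quick sketch with the same two points:

```python
# Check the worked example: x1=(0,1) with y1=+1, x2=(2,2) with y2=-1.
x1, y1 = (0.0, 1.0), 1
x2, y2 = (2.0, 2.0), -1
alpha = 2.0 / 5.0                      # alpha1 = alpha2 = 2/5 from the slide

# w = sum_i alpha_i y_i x_i = 0.4 * ([0 1] - [2 2])
w = [alpha * (y1 * a + y2 * b) for a, b in zip(x1, x2)]

# b from the margin condition on x1: b = y1 - w.x1
b = y1 - (w[0] * x1[0] + w[1] * x1[1])

# Both points should sit exactly on the margin: y_i * (w.x_i + b) = 1
m1 = y1 * (w[0] * x1[0] + w[1] * x1[1] + b)
m2 = y2 * (w[0] * x2[0] + w[1] * x2[1] + b)
print(w, b, m1, m2)   # w = [-0.8, -0.4], b = 1.4, margins 1.0 and 1.0
```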
“Support Vector”s?
[Figure: the same X at (0,1) and O at (2,2), plus a third O point farther from the boundary, with multiplier α3]
max ∑αi - 1/2 ∑αiαjyiyj(xi·xj)
s.t. ∑αiyi = 0, C ≥ αi ≥ 0
What is α3? Try this at home.
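The slide doesn't give the third point's coordinates, so as an illustration assume a hypothetical third O at (4,4) with y3 = -1; a brute-force grid search over the dual then shows its multiplier stays at zero, i.e. points away from the margin are not support vectors:

```python
# Brute-force the dual for three points. The third O point (4,4) is a
# hypothetical stand-in, since the slide does not give its coordinates.
X = [(0.0, 1.0), (2.0, 2.0), (4.0, 4.0)]
Y = [+1, -1, -1]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def dual_objective(alphas):
    # sum_i alpha_i - 1/2 sum_ij alpha_i alpha_j y_i y_j (x_i . x_j)
    quad = sum(ai * aj * yi * yj * dot(xi, xj)
               for ai, yi, xi in zip(alphas, Y, X)
               for aj, yj, xj in zip(alphas, Y, X))
    return sum(alphas) - 0.5 * quad

best, best_a = float("-inf"), None
for i in range(101):                    # grid over alpha2, alpha3 in [0, 1]
    for j in range(101):
        a2, a3 = i / 100, j / 100
        a1 = a2 + a3                    # enforces sum_i alpha_i y_i = 0
        val = dual_objective([a1, a2, a3])
        if val > best:
            best, best_a = val, [a1, a2, a3]
print(best_a, best)   # alpha3 ends up 0: the far-away O is not a support vector
```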
Playing With SVMS
• http://www.csie.ntu.edu.tw/~cjlin/libsvm/
More on Kernels
• Kernels represent inner products
  – K(a,b) = a·b
  – K(a,b) = φ(a)·φ(b)
• The kernel trick allows an extremely complex φ( ) while keeping K(a,b) simple
• Goal: avoid having to directly construct φ( ) at any point in the algorithm
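A concrete instance of the trick (the standard textbook example, not from the slides): in 2-D, K(a,b) = (a·b)² equals φ(a)·φ(b) for φ(a) = (a1², √2·a1a2, a2²), so the feature map never has to be built explicitly:

```python
import math

# Kernel trick demo: K(a, b) = (a.b)^2 is an inner product in the
# 3-D feature space phi(a) = (a1^2, sqrt(2)*a1*a2, a2^2).

def K(a, b):
    return (a[0] * b[0] + a[1] * b[1]) ** 2

def phi(a):
    return (a[0] ** 2, math.sqrt(2) * a[0] * a[1], a[1] ** 2)

a, b = (1.0, 2.0), (3.0, -1.0)
lhs = K(a, b)                                     # cheap: one dot product, squared
rhs = sum(u * v for u, v in zip(phi(a), phi(b)))  # explicit feature map
print(lhs, rhs)   # both equal (1*3 + 2*(-1))^2 = 1
```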
Kernels
The complexity of the optimization problem depends only on the dimensionality of the input space, not on the dimensionality of the (possibly much larger) feature space!
Can We Use Kernels to Measure Distances?
• Can we measure distance between φ(a) and φ(b) using K(a,b)?
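Yes: expanding ||φ(a) - φ(b)||² = φ(a)·φ(a) - 2 φ(a)·φ(b) + φ(b)·φ(b) shows the squared feature-space distance is K(a,a) - 2K(a,b) + K(b,b). A sketch with a Gaussian (RBF) kernel, where K(x,x) = 1 so the distance reduces to 2 - 2K(a,b) (the choice of γ is illustrative):

```python
import math

# Squared distance in feature space from kernel evaluations only:
#   ||phi(a) - phi(b)||^2 = K(a,a) - 2*K(a,b) + K(b,b)

def rbf(a, b, gamma=0.5):
    sq = sum((u - v) ** 2 for u, v in zip(a, b))
    return math.exp(-gamma * sq)

def kernel_dist2(a, b, K=rbf):
    return K(a, a) - 2.0 * K(a, b) + K(b, b)

a, b = (0.0, 0.0), (1.0, 0.0)
print(kernel_dist2(a, a), kernel_dist2(a, b))
# For RBF, K(x,x) = 1, so d^2 = 2 - 2*K(a,b); here 2 - 2*exp(-0.5) ≈ 0.787
```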
Popular Kernel Methods
• Gaussian Processes
• Kernel Regression (Smoothing)
  – Nadaraya-Watson Kernel Regression
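A minimal sketch of Nadaraya-Watson kernel regression: the prediction at x is a kernel-weighted average of the training targets (the 1-D data and bandwidth here are made up for illustration):

```python
import math

# Nadaraya-Watson kernel regression:
#   yhat(x) = sum_i K(x, x_i) * y_i / sum_i K(x, x_i)
# with a Gaussian smoothing kernel of bandwidth h.

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.0, 0.8, 0.9, 0.1, -0.8]       # made-up noisy samples of a smooth curve

def nw_predict(x, h=0.5):
    weights = [math.exp(-((x - xi) ** 2) / (2 * h * h)) for xi in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)

print(nw_predict(1.5))   # smoothed value between the samples at x=1 and x=2
```

Because the weights are positive and normalized, the prediction is always a convex combination of the training targets, which is what makes this a smoother rather than an interpolator.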