Support Vector Machines
Joseph Gonzalez
Posted: 19-Dec-2015
From a linear classifier to ...
*One of the most famous slides you will see, ever!
The Big Idea
[Figure: X and O training examples in the plane, separated by a linear decision boundary]
Maximum margin
Maximum possible separation between positive and negative training examples
Geometric Intuition
[Figure: three builds of the same scatter plot; the Xs and Os closest to the decision boundary, which determine the margin, are highlighted as the SUPPORT VECTORS]
Primal Version
min ||w||² + C ∑ξi
s.t. (w·xi + b)yi ≥ 1 - ξi
     ξi ≥ 0
DUAL Version
Where did this come from? Remember Lagrange multipliers.
Let us “incorporate” the constraints into the objective, then solve the problem in the “dual” space of Lagrange multipliers.
max ∑αi - 1/2 ∑αiαjyiyj(xi·xj)
s.t. ∑αiyi = 0, C ≥ αi ≥ 0
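For reference, a sketch of the Lagrangian derivation behind this, using the conventional 1/2 ||w||² scaling (the slides drop the 1/2, which only rescales C):

```latex
% Lagrangian of the primal (multipliers \alpha_i \ge 0 for the margin
% constraints, \mu_i \ge 0 for \xi_i \ge 0):
L(w,b,\xi,\alpha,\mu) = \tfrac{1}{2}\|w\|^2 + C\sum_i \xi_i
  - \sum_i \alpha_i \left[ y_i (w \cdot x_i + b) - 1 + \xi_i \right]
  - \sum_i \mu_i \xi_i
% Stationarity conditions:
%   \partial L / \partial w = 0     =>  w = \sum_i \alpha_i y_i x_i
%   \partial L / \partial b = 0     =>  \sum_i \alpha_i y_i = 0
%   \partial L / \partial \xi_i = 0 =>  C = \alpha_i + \mu_i, hence 0 \le \alpha_i \le C
% Substituting back eliminates w, b, \xi and leaves the dual:
\max_{\alpha} \; \sum_i \alpha_i
  - \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \, (x_i \cdot x_j)
\quad \text{s.t.} \quad \sum_i \alpha_i y_i = 0, \;\; 0 \le \alpha_i \le C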
Primal vs Dual
Number of parameters?
- Primal: one weight per feature (plus b and the slacks)
- Dual: one αi per training example
Large # of features? Large # of examples?
For a large # of features, the DUAL is preferred; many αi can go to zero!
DUAL:   max ∑αi - 1/2 ∑αiαjyiyj(xi·xj)
        s.t. ∑αiyi = 0, C ≥ αi ≥ 0
PRIMAL: min ||w||² + C ∑ξi
        s.t. (w·xi + b)yi ≥ 1 - ξi, ξi ≥ 0
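As an aside, the primal can also be attacked directly with subgradient descent on its hinge-loss form; a minimal sketch on made-up 2-D data (the data, learning rate, and regularization strength here are all illustrative, not from the slides):

```python
# Subgradient descent on the primal SVM objective
# lam/2 * ||w||^2 + max(0, 1 - y*(w.x + b)), on tiny made-up 2-D data.
X = [(0.0, 1.0), (1.0, 2.0), (2.0, 2.0), (3.0, 3.0)]
Y = [+1, +1, -1, -1]                   # linearly separable toy labels

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

w, b = [0.0, 0.0], 0.0
lam, eta = 0.001, 0.05                 # regularization strength, step size
for epoch in range(2000):
    for x, y in zip(X, Y):
        if y * (dot(w, x) + b) < 1:    # margin violated: hinge is active
            w = [wi + eta * (y * xi - lam * wi) for wi, xi in zip(w, x)]
            b += eta * y
        else:                          # only the regularizer contributes
            w = [wi - eta * lam * wi for wi in w]

preds = [1 if dot(w, x) + b > 0 else -1 for x in X]
print(w, b, preds)
```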
DUAL: the “Support vector” version
How do we find α?
Quadratic programming
How do we find C?
Cross-validation!
Wait... how do we predict y for a new point x??
How do we find w?
How do we find b?
y = sign(w·x + b)
w = ∑i αi yi xi
max ∑αi - 1/2 ∑αiαjyiyj(xi·xj)
s.t. ∑αiyi = 0, C ≥ αi ≥ 0
y = sign(∑i αi yi (xi·x) + b)
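Prediction in the dual uses only inner products with the support vectors; a sketch using the two-point example from these slides ((0,1) labeled +1, (2,2) labeled -1, with α1 = α2 = 0.4 and b = 1.4 as computed in the worked example):

```python
# Dual ("support vector") form of SVM prediction:
#   y(x) = sign( sum_i alpha_i * y_i * K(x_i, x) + b )
# shown here with a plain linear kernel.

def linear_kernel(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

sv_x = [(0.0, 1.0), (2.0, 2.0)]    # support vectors
sv_y = [+1, -1]                    # their labels
alpha = [0.4, 0.4]                 # dual variables from the worked example
b = 1.4

def predict(x, kernel=linear_kernel):
    score = sum(a * y * kernel(xi, x) for a, y, xi in zip(alpha, sv_y, sv_x)) + b
    return 1 if score > 0 else -1

# Equivalently, with a linear kernel, w = sum_i alpha_i y_i x_i:
w = [sum(a * y * xi[d] for a, y, xi in zip(alpha, sv_y, sv_x)) for d in range(2)]
print(w, predict((0.0, 0.0)), predict((3.0, 3.0)))   # w recovers [-0.8, -0.4]
```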
“Support Vector”s?
[Figure: X at (0,1) with y = +1 and O at (2,2) with y = -1, with multipliers α1 and α2]
max ∑αi - 1/2 ∑αiαjyiyj(xi·xj)
s.t. ∑αiyi = 0, C ≥ αi ≥ 0
Plugging in the two points:
max α1 + α2 - α1α2(-1)(0+2) - 1/2 α1²(1)(0+1) - 1/2 α2²(1)(4+4)
  = α1 + α2 + 2α1α2 - α1²/2 - 4α2²
s.t. α1 - α2 = 0, C ≥ αi ≥ 0
The constraint forces α1 = α2 = α, so we maximize
max 2α - 5/2 α² = 5/2 α(4/5 - α)
a downward parabola with roots at 0 and 4/5, maximized at α = 2/5. So α1 = α2 = 2/5.
w = ∑i αi yi xi
w = 0.4([0 1] - [2 2]) = 0.4[-2 -1]
y = w·x + b, so b = y - w·x. Using x1: b = 1 - 0.4[-2 -1]·[0 1] = 1 + 0.4 = 1.4
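The arithmetic above can be checked numerically; a quick sketch with the same two points:

```python
# Check the worked example: x1=(0,1) with y1=+1, x2=(2,2) with y2=-1.
x1, y1 = (0.0, 1.0), 1
x2, y2 = (2.0, 2.0), -1
alpha = 2.0 / 5.0                      # alpha1 = alpha2 = 2/5 from the slide

# w = sum_i alpha_i y_i x_i = 0.4 * ([0 1] - [2 2])
w = [alpha * (y1 * a + y2 * b) for a, b in zip(x1, x2)]

# b from the margin condition on x1: b = y1 - w.x1
b = y1 - (w[0] * x1[0] + w[1] * x1[1])

# Both points should sit exactly on the margin: y_i * (w.x_i + b) = 1
m1 = y1 * (w[0] * x1[0] + w[1] * x1[1] + b)
m2 = y2 * (w[0] * x2[0] + w[1] * x2[1] + b)
print(w, b, m1, m2)   # w = [-0.8, -0.4], b = 1.4, margins 1.0 and 1.0
```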
“Support Vector”s?
[Figure: the same X at (0,1) and O at (2,2), plus a third O point farther from the boundary, with multiplier α3]
max ∑αi - 1/2 ∑αiαjyiyj(xi·xj)
s.t. ∑αiyi = 0, C ≥ αi ≥ 0
What is α3? Try this at home.
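The slide doesn't give the third point's coordinates, so as an illustration assume a hypothetical third O at (4,4) with y3 = -1; a brute-force grid search over the dual then shows its multiplier stays at zero, i.e. points away from the margin are not support vectors:

```python
# Brute-force the dual for three points. The third O point (4,4) is a
# hypothetical stand-in, since the slide does not give its coordinates.
X = [(0.0, 1.0), (2.0, 2.0), (4.0, 4.0)]
Y = [+1, -1, -1]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def dual_objective(alphas):
    # sum_i alpha_i - 1/2 sum_ij alpha_i alpha_j y_i y_j (x_i . x_j)
    quad = sum(ai * aj * yi * yj * dot(xi, xj)
               for ai, yi, xi in zip(alphas, Y, X)
               for aj, yj, xj in zip(alphas, Y, X))
    return sum(alphas) - 0.5 * quad

best, best_a = float("-inf"), None
for i in range(101):                    # grid over alpha2, alpha3 in [0, 1]
    for j in range(101):
        a2, a3 = i / 100, j / 100
        a1 = a2 + a3                    # enforces sum_i alpha_i y_i = 0
        val = dual_objective([a1, a2, a3])
        if val > best:
            best, best_a = val, [a1, a2, a3]
print(best_a, best)   # alpha3 ends up 0: the far-away O is not a support vector
```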
Playing With SVMS
• http://www.csie.ntu.edu.tw/~cjlin/libsvm/
More on Kernels
• Kernels represent inner products
  – K(a,b) = a·b
  – K(a,b) = φ(a)·φ(b)
• The kernel trick allows an extremely complex φ( ) while keeping K(a,b) simple
• Goal: avoid having to directly construct φ( ) at any point in the algorithm
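A concrete instance of the trick (the standard textbook example, not from the slides): in 2-D, K(a,b) = (a·b)² equals φ(a)·φ(b) for φ(a) = (a1², √2·a1a2, a2²), so the feature map never has to be built explicitly:

```python
import math

# Kernel trick demo: K(a, b) = (a.b)^2 is an inner product in the
# 3-D feature space phi(a) = (a1^2, sqrt(2)*a1*a2, a2^2).

def K(a, b):
    return (a[0] * b[0] + a[1] * b[1]) ** 2

def phi(a):
    return (a[0] ** 2, math.sqrt(2) * a[0] * a[1], a[1] ** 2)

a, b = (1.0, 2.0), (3.0, -1.0)
lhs = K(a, b)                                     # cheap: one dot product, squared
rhs = sum(u * v for u, v in zip(phi(a), phi(b)))  # explicit feature map
print(lhs, rhs)   # both equal (1*3 + 2*(-1))^2 = 1
```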
Kernels
The complexity of the optimization problem depends only on the dimensionality of the input space, not on the dimensionality of the (possibly much larger) feature space!
Can We Use Kernels to Measure Distances?
• Can we measure distance between φ(a) and φ(b) using K(a,b)?
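Yes: expanding ||φ(a) - φ(b)||² = φ(a)·φ(a) - 2 φ(a)·φ(b) + φ(b)·φ(b) shows the squared feature-space distance is K(a,a) - 2K(a,b) + K(b,b). A sketch with a Gaussian (RBF) kernel, where K(x,x) = 1 so the distance reduces to 2 - 2K(a,b) (the choice of γ is illustrative):

```python
import math

# Squared distance in feature space from kernel evaluations only:
#   ||phi(a) - phi(b)||^2 = K(a,a) - 2*K(a,b) + K(b,b)

def rbf(a, b, gamma=0.5):
    sq = sum((u - v) ** 2 for u, v in zip(a, b))
    return math.exp(-gamma * sq)

def kernel_dist2(a, b, K=rbf):
    return K(a, a) - 2.0 * K(a, b) + K(b, b)

a, b = (0.0, 0.0), (1.0, 0.0)
print(kernel_dist2(a, a), kernel_dist2(a, b))
# For RBF, K(x,x) = 1, so d^2 = 2 - 2*K(a,b); here 2 - 2*exp(-0.5) ≈ 0.787
```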
Popular Kernel Methods
• Gaussian Processes
• Kernel Regression (Smoothing)
  – Nadaraya-Watson Kernel Regression
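A minimal sketch of Nadaraya-Watson kernel regression: the prediction at x is a kernel-weighted average of the training targets (the 1-D data and bandwidth here are made up for illustration):

```python
import math

# Nadaraya-Watson kernel regression:
#   yhat(x) = sum_i K(x, x_i) * y_i / sum_i K(x, x_i)
# with a Gaussian smoothing kernel of bandwidth h.

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.0, 0.8, 0.9, 0.1, -0.8]       # made-up noisy samples of a smooth curve

def nw_predict(x, h=0.5):
    weights = [math.exp(-((x - xi) ** 2) / (2 * h * h)) for xi in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)

print(nw_predict(1.5))   # smoothed value between the samples at x=1 and x=2
```

Because the weights are positive and normalized, the prediction is always a convex combination of the training targets, which is what makes this a smoother rather than an interpolator.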