SVM for Overlapping Classes - University at Buffalo
srihari/CSE574/Chap7/7.2-SVM-Overlap.pdf

Page 1: SVM for Overlapping Classes
Machine Learning, Srihari
srihari@buffalo.edu

Page 2: SVM Discussion Overview

1. Importance of SVMs
2. Overview of Mathematical Techniques Employed
3. Margin Geometry
4. SVM Training Methodology
5. Overlapping Distributions
6. Dealing with Multiple Classes
7. SVM and Computational Learning Theory
8. Relevance Vector Machines

Page 3: Overlapping Class Distributions

• We have assumed the training data are linearly separable in the mapped feature space ϕ(x)
• The resulting SVM gives exact separation in input space x, although the decision boundary is nonlinear
• In practice, class-conditional distributions overlap
• Exact separation then leads to poor generalization
• We therefore need to allow the SVM to misclassify some training points

Page 4: Overlapping Distributions

Page 5: Modifying SVM

• The implicit error function of the SVM is
  ∑_{n=1}^{N} E_∞( y(x_n) t_n − 1 ) + λ ‖w‖²
• It gives infinite error if a point is misclassified and zero error if it is classified correctly
• We optimize the parameters to maximize the margin
• We now modify this approach:
  • Points are allowed to be on the wrong side of the margin
  • But with a penalty that increases with the distance from the boundary
  • It is convenient if the penalty is a linear function of that distance
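The hard-margin error E_∞ above can be sketched numerically; a minimal NumPy sketch, with illustrative margin values that are not from the slides:

```python
import numpy as np

def e_inf(z):
    # Implicit hard-margin error: zero if z >= 0 (point on the
    # correct side of the margin), infinite if not.
    return np.where(z >= 0, 0.0, np.inf)

# Margins z_n = y(x_n) t_n - 1 for three hypothetical points
z = np.array([1.5, 0.0, -0.2])
errors = e_inf(z)  # any negative margin makes the total error infinite
```

A single violating point drives the sum to infinity, which is why this error function cannot tolerate overlapping classes.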

Page 6: Definition of Slack Variables ξ_n

• Introduce slack variables ξ_n ≥ 0, n = 1,..,N, with one slack variable for each training point
• These are defined by:
  • ξ_n = 0 for data points that are on or inside the correct margin boundary
  • ξ_n = |t_n − y(x_n)| for all other points

Page 7: Slack Variables

• A data point on the decision boundary y(x_n) = 0 has ξ_n = 1
• Points with ξ_n > 1 are misclassified
• The exact classification constraints
  t_n (wᵀϕ(x_n) + b) ≥ 1, n = 1,..,N
  are then replaced with
  t_n y(x_n) ≥ 1 − ξ_n, n = 1,..,N
  in which the slack variables are constrained to satisfy ξ_n ≥ 0
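The slack-variable definition can be checked numerically. A minimal sketch, assuming a hypothetical linear model y(x) = wᵀx + b; the weights and data below are made up for illustration:

```python
import numpy as np

# Hypothetical fitted model and toy data (illustrative only)
w, b = np.array([1.0, -1.0]), 0.0
X = np.array([[2.0, 0.0], [1.2, 0.5], [0.2, 0.5], [-1.0, 0.0]])
t = np.array([1, 1, 1, -1])

y = X @ w + b  # y(x_n) for each point
# xi_n = 0 on/inside the correct margin (t_n y_n >= 1),
# xi_n = |t_n - y(x_n)| otherwise
xi = np.where(t * y >= 1, 0.0, np.abs(t - y))
# -> [0.0, 0.3, 1.3, 0.0]: the third point has xi > 1, i.e. it is misclassified
```

The second point lies inside the margin but on the correct side (0 < ξ ≤ 1), matching the cases on the next slide.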

Page 8: Slack Variables and Margins

(Figure: hard margin vs. soft margin, showing the contours y = 1, y = 0, y = −1 and the margin)

• Correctly classified points on the correct side of the margin have ξ = 0
• Correctly classified points inside the margin but on the correct side of the decision boundary have 0 < ξ ≤ 1
• Data points on the wrong side of the decision boundary have ξ > 1
• Data points with circles are support vectors

Page 9: Soft Margin Classification

• Allows some of the data points to be misclassified
• While slack variables allow for overlapping class distributions, the framework is still sensitive to outliers, because the penalty for misclassification increases linearly with ξ

Page 10: Optimization with Slack Variables

• Our goal is to maximize the margin while softly penalizing points that lie on the wrong side of the margin boundary
• We therefore minimize
  C ∑_{n=1}^{N} ξ_n + (1/2) ‖w‖²
• The parameter C > 0 controls the trade-off between the slack-variable penalty and the margin
• Subject to the constraints
  t_n y(x_n) ≥ 1 − ξ_n, ξ_n ≥ 0, n = 1,..,N
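The objective being minimized can be evaluated directly. A minimal sketch (the helper name and data are illustrative), using the fact that at the optimum each slack takes its smallest feasible value ξ_n = max(0, 1 − t_n y(x_n)):

```python
import numpy as np

def primal_objective(w, b, X, t, C):
    # C * sum_n xi_n + 0.5 * ||w||^2, with each slack at its
    # smallest feasible value xi_n = max(0, 1 - t_n y(x_n))
    xi = np.maximum(0.0, 1.0 - t * (X @ w + b))
    return C * xi.sum() + 0.5 * np.dot(w, w)

w, b = np.array([1.0, -1.0]), 0.0
X = np.array([[2.0, 0.0], [1.2, 0.5], [0.2, 0.5], [-1.0, 0.0]])
t = np.array([1, 1, 1, -1])
obj = primal_objective(w, b, X, t, C=1.0)  # -> 2.6
```

Increasing C penalizes slack more heavily, pushing the solution toward the hard-margin limit as C → ∞.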

Page 11: Lagrangian with Slack Variables

• The corresponding Lagrangian is
  L(w, b, a) = (1/2) ‖w‖² + C ∑_{n=1}^{N} ξ_n − ∑_{n=1}^{N} a_n { t_n y(x_n) − 1 + ξ_n } − ∑_{n=1}^{N} μ_n ξ_n
  where {a_n ≥ 0} and {μ_n ≥ 0} are Lagrange multipliers
• The corresponding KKT conditions are
  a_n ≥ 0
  t_n y(x_n) − 1 + ξ_n ≥ 0
  a_n { t_n y(x_n) − 1 + ξ_n } = 0
  μ_n ≥ 0
  ξ_n ≥ 0
  μ_n ξ_n = 0
• We then optimize out w, b and {ξ_n} by setting the derivatives of L with respect to w, b and ξ_n to zero

Page 12: Dual Lagrangian

• The dual Lagrangian has the form
  L̃(a) = ∑_{n=1}^{N} a_n − (1/2) ∑_{n=1}^{N} ∑_{m=1}^{N} a_n a_m t_n t_m k(x_n, x_m)
• This is identical to the separable case, except that the constraints are somewhat different: the box constraints 0 ≤ a_n ≤ C replace a_n ≥ 0, together with ∑_{n=1}^{N} a_n t_n = 0
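Given the kernel (Gram) matrix, the dual objective is straightforward to evaluate; a minimal sketch with made-up multipliers:

```python
import numpy as np

def dual_objective(a, t, K):
    # L~(a) = sum_n a_n - 0.5 * sum_{n,m} a_n a_m t_n t_m k(x_n, x_m)
    at = a * t
    return a.sum() - 0.5 * (at @ K @ at)

# Tiny illustrative example: two points, identity Gram matrix
a = np.array([1.0, 1.0])
t = np.array([1.0, -1.0])
K = np.eye(2)
value = dual_objective(a, t, K)  # -> 1.0
```

A QP solver would maximize this over a subject to the box and equality constraints; here we only evaluate it at a feasible point.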

Page 13: ν-SVM

• An equivalent formulation of the SVM
• It involves maximizing
  L̃(a) = −(1/2) ∑_{n=1}^{N} ∑_{m=1}^{N} a_n a_m t_n t_m k(x_n, x_m)
• Subject to the constraints
  0 ≤ a_n ≤ 1/N
  ∑_{n=1}^{N} a_n t_n = 0
  ∑_{n=1}^{N} a_n ≥ ν
• ν-SVM has the advantage of using a parameter ν for controlling the number of support vectors

Page 14: ν-SVM Applied to Non-Separable Data

• Support vectors are indicated by circles
• Training is done by introducing slack variables, with one slack variable per training data point
• We maximize the margin while softly penalizing points that lie on the wrong side of the margin boundary
• ν is an upper bound on the fraction of margin errors (points that lie on the wrong side of the margin boundary) and a lower bound on the fraction of support vectors
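As a sanity check of the role of ν, scikit-learn's NuSVC implements this formulation: ν upper-bounds the fraction of margin errors and lower-bounds the fraction of support vectors. A sketch on synthetic overlapping Gaussians (the data and the value ν = 0.2 are illustrative, not from the slides):

```python
import numpy as np
from sklearn.svm import NuSVC

rng = np.random.default_rng(0)
# Two overlapping Gaussian classes, 40 points each
X = np.vstack([rng.normal(-1.0, 1.2, (40, 2)),
               rng.normal(1.0, 1.2, (40, 2))])
t = np.array([-1] * 40 + [1] * 40)

clf = NuSVC(nu=0.2, kernel="rbf").fit(X, t)
sv_fraction = len(clf.support_) / len(X)
# nu lower-bounds the fraction of support vectors: sv_fraction >= 0.2
```

Raising ν admits more margin errors and recruits more support vectors, giving direct control over model sparsity, unlike C, whose effect on the support-vector count is indirect.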