Classification Problem: 2-Category Linearly Separable Case

[Figure: two point sets, A+ (Benign) and A− (Malignant), separated by the plane $x'w + b = 0$, with bounding planes $x'w + b = +1$ and $x'w + b = -1$.]
Support Vector Machines: Maximizing the Margin between Bounding Planes

[Figure: the bounding planes $x'w + b = +1$ and $x'w + b = -1$ with normal vector $w$; the distance between them, $\frac{2}{\|w\|_2}$, is the margin.]
Algebra of the Classification Problem
2-Category Linearly Separable Case
Given m points in the n dimensional real spaceRn
Represented by anmâ nmatrixAor Membership of each pointA iin the classesAà A+
is specified by anmâ mdiagonal matrix D :
D ii = à 1 if A i 2 Aà and D ii = 1 A i 2 A+if SeparateAà and A+by two bounding planes such that:
A iw+ b > + 1; for D ii = + 1;A iw+ b 6 à 1; for D ii = à 1
More succinctly:D(Aw+ eb)>e
e= [1;1;. . .;1]02 Rm:
, where
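To make the notation concrete, here is a minimal numpy sketch (the data and the separating plane are invented for illustration) that builds $A$, $D$, and $e$ and checks the condition $D(Aw + eb) \geq e$:

```python
import numpy as np

# Toy data (hypothetical): four points in R^2 as the rows of A; labels on D's diagonal.
A = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
d = np.array([1.0, 1.0, -1.0, -1.0])      # D_ii = +1 for A+, -1 for A-
D = np.diag(d)
e = np.ones(len(d))

# A hand-picked separating plane x'w + b = 0 (also hypothetical).
w, b = np.array([1.0, 1.0]), 0.0

print(np.all(D @ (A @ w + e * b) >= e))   # True: D(Aw + eb) >= e holds
```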
Support Vector Classification(Linearly Separable Case)
Let S = f (x1;y1);(x2;y2);. . .(xl;yl)gbe a linearly separable training sample and represented by
matrices
A =
(x1)0
(x2)0...
(xl)0
2
64
3
75 2 R lâ n; D =
y1 ááá 0......
...0 ááá yl
" #
2 R lâ l
Support Vector Classification (Linearly Separable Case, Primal)

The hyperplane $(w, b)$ that solves the minimization problem

$$\min_{(w,b) \in R^{n+1}} \ \tfrac{1}{2}\|w\|_2^2 \quad \text{s.t.} \quad D(Aw + eb) \geq e$$

realizes the maximal margin hyperplane with geometric margin $\gamma = \frac{1}{\|w\|_2}$.
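As a sanity check, a near-hard-margin fit with scikit-learn recovers $(w, b)$ and the margin $\frac{1}{\|w\|_2}$ (a sketch: the toy data are invented, and a very large C approximates the hard-margin problem):

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])  # toy data
y = np.array([1, 1, -1, -1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)           # huge C ~ hard margin
w, b = clf.coef_[0], clf.intercept_[0]
print("geometric margin:", 1.0 / np.linalg.norm(w))   # gamma = 1/||w||_2
```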
Support Vector Classification (Linearly Separable Case, Dual Form)

The dual problem of the previous mathematical program:

$$\max_{\alpha \in R^l} \ e'\alpha - \tfrac{1}{2}\alpha' D A A' D \alpha \quad \text{subject to} \quad e'D\alpha = 0, \ \alpha \geq 0.$$

Applying the KKT optimality conditions, we have $w = A'D\alpha$. But where is $b$? Don't forget:

$$0 \leq \alpha \perp D(Aw + eb) - e \geq 0.$$
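The dual QP is small enough to hand to a general-purpose solver. A minimal sketch with scipy (toy data invented; SLSQP is just one convenient choice of method):

```python
import numpy as np
from scipy.optimize import minimize

A = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])  # toy data
y = np.array([1.0, 1.0, -1.0, -1.0])
D, l = np.diag(y), len(y)
Q = D @ A @ A.T @ D                               # the matrix DAA'D in the dual

# Minimize the negated dual objective (1/2) a'Qa - e'a, with e'Da = 0, a >= 0.
res = minimize(lambda a: 0.5 * a @ Q @ a - a.sum(), np.zeros(l),
               method="SLSQP", bounds=[(0, None)] * l,
               constraints={"type": "eq", "fun": lambda a: y @ a})
alpha = res.x
w = A.T @ D @ alpha                               # KKT: w = A'D alpha
i = int(np.argmax(alpha))                         # a support vector has alpha_i > 0
b = y[i] - A[i] @ w                               # complementarity: y_i(A_i w + b) = 1
```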
Dual Representation of SVM (Key of Kernel Methods)

The hypothesis is determined by $(\alpha^*, b^*)$:

$$h(x) = \mathrm{sgn}(\langle x, A'D\alpha^* \rangle + b^*) = \mathrm{sgn}\Big(\sum_{i=1}^{l} y_i \alpha_i^* \langle x^i, x \rangle + b^*\Big) = \mathrm{sgn}\Big(\sum_{\alpha_i^* > 0} y_i \alpha_i^* \langle x^i, x \rangle + b^*\Big)$$

$$w = A'D\alpha^* = \sum_{i=1}^{l} y_i \alpha_i^* A_i'$$

Remember: $A_i' = x^i$.
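scikit-learn exposes exactly this dual representation: its `dual_coef_` attribute stores $y_i \alpha_i^*$ for the support vectors only, so $h(x)$ can be evaluated without ever forming $w$. A sketch (toy data invented):

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1, 1, -1, -1])
clf = SVC(kernel="linear", C=1e6).fit(X, y)       # near-hard-margin

x = np.array([1.0, 0.5])
# dual_coef_[0] holds y_i * alpha_i* over the support vectors
score = clf.dual_coef_[0] @ (clf.support_vectors_ @ x) + clf.intercept_[0]
print(np.sign(score), clf.predict([x]))           # the two predictions agree
```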
Compute the Geometric Margin via Dual Solution

The geometric margin is $\gamma = \frac{1}{\|w^*\|_2}$, and $\langle w^*, w^* \rangle = (\alpha^*)' D A A' D \alpha^*$, hence we can compute $\gamma$ using $\alpha^*$. Use KKT again (in the dual)!

$$0 \leq \alpha^* \perp D(AA'D\alpha^* + b^*e) - e \geq 0$$

Don't forget $e'D\alpha^* = 0$. It follows that

$$\gamma = (e'\alpha^*)^{-\frac{1}{2}} = \Big(\sum_{\alpha_i^* > 0} \alpha_i^*\Big)^{-\frac{1}{2}}.$$
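The dual-only formula can be checked numerically against $\frac{1}{\|w^*\|_2}$ (a sketch: $\alpha^*$ is read off scikit-learn's `dual_coef_`, and the huge C again approximates the hard margin):

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1, 1, -1, -1])
clf = SVC(kernel="linear", C=1e6).fit(X, y)

alpha = np.abs(clf.dual_coef_[0])             # alpha_i* = |y_i alpha_i*|
gamma_dual = alpha.sum() ** -0.5              # (e'alpha*)^(-1/2)
gamma_w = 1.0 / np.linalg.norm(clf.coef_[0])  # 1/||w*||_2
print(gamma_dual, gamma_w)                    # the two should match closely
```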
Soft Margin SVM (Nonseparable Case)

If the data are not linearly separable, the primal problem is infeasible and the dual problem is unbounded above. Introduce a slack variable $\xi_i$ for each training point:

$$y_i(w'x^i + b) \geq 1 - \xi_i, \quad \xi_i \geq 0 \ \forall i.$$

The inequality system is then always feasible, e.g. $w = 0$, $b = 0$, $\xi = e$.
[Figure: nonseparable data with margin width $\gamma$ on each side of the separating plane; two points, $x^i$ and $x^j$, cross their bounding planes with slacks $\xi_i$ and $\xi_j$.]
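A three-line check of the feasibility claim (arbitrary toy data; $w = 0$, $b = 0$, $\xi = e$ satisfies the system no matter what the data are):

```python
import numpy as np

X = np.array([[2.0, 2.0], [-1.0, 3.0], [0.5, -0.5]])          # arbitrary toy data
y = np.array([1.0, -1.0, 1.0])
w, b, xi = np.zeros(2), 0.0, np.ones(len(y))                  # w = 0, b = 0, xi = e
print(np.all(y * (X @ w + b) >= 1 - xi) and np.all(xi >= 0))  # True
```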
Two Different Measures of Training Error

2-Norm Soft Margin:

$$\min_{(w,b,\xi) \in R^{n+1+l}} \ \tfrac{1}{2}\|w\|_2^2 + \tfrac{C}{2}\|\xi\|_2^2 \quad \text{s.t.} \quad D(Aw + eb) + \xi \geq e$$

1-Norm Soft Margin:

$$\min_{(w,b,\xi) \in R^{n+1+l}} \ \tfrac{1}{2}\|w\|_2^2 + C e'\xi \quad \text{s.t.} \quad D(Aw + eb) + \xi \geq e, \ \xi \geq 0$$
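For a fixed candidate $(w, b, \xi)$, the two objectives differ only in how the slack is penalized. A tiny sketch (all values hypothetical):

```python
import numpy as np

w, C = np.array([1.0, -1.0]), 10.0             # hypothetical weights and trade-off C
xi = np.array([0.0, 0.3, 1.2])                 # hypothetical slacks

obj_2norm = 0.5 * w @ w + 0.5 * C * (xi @ xi)  # (1/2)||w||^2 + (C/2)||xi||^2
obj_1norm = 0.5 * w @ w + C * xi.sum()         # (1/2)||w||^2 + C e'xi
print(obj_2norm, obj_1norm)
```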
2-Norm Soft Margin Dual Formulation

The Lagrangian for the 2-norm soft margin:

$$L(w, b, \xi, \alpha) = \tfrac{1}{2}w'w + \tfrac{C}{2}\xi'\xi + \alpha'[e - D(Aw + eb) - \xi], \quad \text{where } \alpha \geq 0.$$

Setting the partial derivatives with respect to the primal variables to zero:

$$\frac{\partial L}{\partial w} = w - A'D\alpha = 0, \quad \frac{\partial L}{\partial b} = e'D\alpha = 0, \quad \frac{\partial L}{\partial \xi} = C\xi - \alpha = 0.$$
Dual Maximization Problem for 2-Norm Soft Margin

Dual:

$$\max_{\alpha \in R^l} \ e'\alpha - \tfrac{1}{2}\alpha' D\big(AA' + \tfrac{1}{C}I\big) D\alpha \quad \text{s.t.} \quad e'D\alpha = 0, \ \alpha \geq 0.$$

The corresponding KKT complementarity condition:

$$0 \leq \alpha \perp D(Aw + eb) + \xi - e \geq 0.$$

Use the above conditions to find $b^*$.
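The only change from the hard-margin dual sketch earlier is the ridge term $\frac{1}{C}I$, and $\xi = \alpha/C$ falls out of the KKT conditions. A sketch with scipy on overlapping toy data (data and C invented):

```python
import numpy as np
from scipy.optimize import minimize

A = np.array([[2.0, 2.0], [1.0, 1.5], [-2.0, -2.0], [1.0, 0.5]])  # not separable
y = np.array([1.0, 1.0, -1.0, -1.0])
D, l, C = np.diag(y), len(y), 10.0
Q = D @ (A @ A.T + np.eye(l) / C) @ D         # AA' + (1/C)I inside the dual

res = minimize(lambda a: 0.5 * a @ Q @ a - a.sum(), np.zeros(l),
               method="SLSQP", bounds=[(0, None)] * l,
               constraints={"type": "eq", "fun": lambda a: y @ a})
alpha = res.x
w = A.T @ D @ alpha                           # KKT: w = A'D alpha
xi = alpha / C                                # KKT: C xi = alpha
i = int(np.argmax(alpha))                     # a point with alpha_i > 0
b = y[i] * (1 - alpha[i] / C) - A[i] @ w      # from complementarity
```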
Linear Machine in Feature Space

Let $\phi : X \to F$ be a nonlinear map from the input space to some feature space. The classifier will be in the form (primal):

$$f(x) = \Big(\sum_i w_i \phi_i(x)\Big) + b$$

Make it in the dual form:

$$f(x) = \Big(\sum_{i=1}^{l} \alpha_i y_i \langle \phi(x^i) \cdot \phi(x) \rangle\Big) + b$$
Kernel: Represent Inner Product in Feature Space

Definition: A kernel is a function $K : X \times X \to R$ such that for all $x, z \in X$

$$K(x, z) = \langle \phi(x) \cdot \phi(z) \rangle,$$

where $\phi : X \to F$. The classifier will become:

$$f(x) = \Big(\sum_{i=1}^{l} \alpha_i y_i K(x^i, x)\Big) + b$$
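A concrete check of the definition: for the degree-2 homogeneous polynomial kernel on $R^2$, the feature map can be written out explicitly, and the kernel value matches the inner product in feature space (a sketch; this particular $\phi$ is one standard choice):

```python
import numpy as np

def k(x, z):                      # homogeneous polynomial kernel of degree 2
    return (x @ z) ** 2

def phi(x):                       # explicit feature map for n = 2
    return np.array([x[0]**2, np.sqrt(2) * x[0] * x[1], x[1]**2])

x, z = np.array([1.0, 2.0]), np.array([3.0, 4.0])
assert np.isclose(k(x, z), phi(x) @ phi(z))   # <phi(x), phi(z)> = (x'z)^2
```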
Introduce Kernel into DualFormulation
Let S = f (x1;y1);(x2;y2);. . .(xl;yl)gbe a linearly separable training sample in the feature space
implicitly defined by the kernel K (x;z).The SV classifier is determined byëã that
solvesmaxë2R l
e0ë à 21ë0DK (A;A0)Dë
subject to
e0Dë = 0; ë>0:
The value of kernel function represents the inner product in feature space
Kernel functions merge two steps 1. map input data from input space to feature space (might be infinite dim.) 2. do inner product in the feature space
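The dual sketch from before carries over almost verbatim: replace $AA'$ with the Gram matrix $K(A, A')$, and $\alpha^*$ is found the same way, without ever computing $\phi$. A sketch with a Gaussian kernel (toy data invented):

```python
import numpy as np
from scipy.optimize import minimize

A = np.array([[0.0, 0.0], [0.1, 0.1], [1.0, 0.0], [0.0, 1.0]])  # toy data
y = np.array([1.0, 1.0, -1.0, -1.0])
D, l = np.diag(y), len(y)
K = np.exp(-np.linalg.norm(A[:, None] - A[None, :], axis=2) ** 2)  # Gaussian Gram
Q = D @ K @ D                                 # DK(A,A')D in the dual

res = minimize(lambda a: 0.5 * a @ Q @ a - a.sum(), np.zeros(l),
               method="SLSQP", bounds=[(0, None)] * l,
               constraints={"type": "eq", "fun": lambda a: y @ a})
alpha = res.x    # determines the SV classifier without ever forming phi
```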
Kernel Technique: Based on Mercer's Condition (1909)

Mercer's condition guarantees the convexity of the QP. Let $X = \{x^1, x^2, \ldots, x^n\}$ be a finite space and $k(x, z)$ a symmetric function on $X$. Then $k(x, z)$ is a kernel function if and only if the matrix $K \in R^{n \times n}$, $K_{ij} = k(x^i, x^j)$, is positive semi-definite.
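On a finite input space the condition is checkable directly: form the Gram matrix and inspect its eigenvalues (a sketch; the Gaussian kernel and random points are just an example):

```python
import numpy as np

X = np.random.default_rng(0).normal(size=(5, 2))   # finite space {x^1, ..., x^5}
K = np.exp(-np.linalg.norm(X[:, None] - X[None, :], axis=2) ** 2)  # K_ij = k(x^i, x^j)
print(np.all(np.linalg.eigvalsh(K) >= -1e-10))     # PSD up to round-off -> kernel
```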
Introduce Kernel in Dual Formulation for 2-Norm Soft Margin

Suppose $\alpha^*$ solves the QP problem

$$\max_{\alpha \in R^l} \ e'\alpha - \tfrac{1}{2}\alpha' D\big(K(A, A') + \tfrac{1}{C}I\big) D\alpha \quad \text{s.t.} \quad e'D\alpha = 0, \ \alpha \geq 0,$$

in the feature space implicitly defined by $k(x, z)$. Then the decision rule is defined by

$$h(x) = \mathrm{sgn}(K(x, A')D\alpha^* + b^*).$$

Use the above conditions to find $b^*$.
Introduce Kernel in Dual Formulation for 2-Norm Soft Margin (cont.)

$b^*$ is chosen so that

$$y_i[K(A_i, A')D\alpha^* + b^*] = 1 - \frac{\alpha_i^*}{C} \quad \text{for any } i \text{ with } \alpha_i^* \neq 0.$$

Because:

$$0 \leq \alpha^* \perp D(K(A, A')D\alpha^* + eb^*) + \xi^* - e \geq 0 \quad \text{and} \quad \alpha^* = C\xi^*.$$
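Putting the pieces together for the kernel case (a self-contained sketch; data, kernel, and C are all invented): solve the 2-norm soft-margin kernel dual, then read $b^*$ off any point with $\alpha_i^* \neq 0$:

```python
import numpy as np
from scipy.optimize import minimize

A = np.array([[0.0, 0.0], [0.1, 0.1], [1.0, 0.0], [0.0, 1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
D, l, C = np.diag(y), len(y), 10.0
K = np.exp(-np.linalg.norm(A[:, None] - A[None, :], axis=2) ** 2)  # Gaussian Gram
Q = D @ (K + np.eye(l) / C) @ D               # K(A,A') + (1/C)I in the dual

alpha = minimize(lambda a: 0.5 * a @ Q @ a - a.sum(), np.zeros(l),
                 method="SLSQP", bounds=[(0, None)] * l,
                 constraints={"type": "eq", "fun": lambda a: y @ a}).x
i = int(np.argmax(alpha))                     # any i with alpha_i != 0
b = y[i] * (1 - alpha[i] / C) - K[i] @ D @ alpha   # the b* formula above
```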
Geometric Margin in Feature Space for 2-Norm Soft Margin

The geometric margin in the feature space is defined by

$$\gamma = \frac{1}{\|w^*\|_2} = \big(e'\alpha^* - \tfrac{1}{C}\|\alpha^*\|_2^2\big)^{-\frac{1}{2}},$$

since

$$\|w^*\|_2^2 = (\alpha^*)' D K(A, A') D \alpha^* = e'\alpha^* - \tfrac{1}{C}\|\alpha^*\|_2^2.$$

Why is $e'\xi^* \geq \|\xi^*\|_2^2$?
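Continuing the kernel sketch above (this fragment reuses alpha, K, D, and C from it, so it is not standalone), the two expressions for $\|w^*\|_2^2$ can be compared numerically:

```python
# Reuses alpha, K, D, C from the preceding sketch (not standalone).
w_sq = alpha @ D @ K @ D @ alpha                 # (alpha*)' D K(A,A') D alpha*
w_sq_alt = alpha.sum() - (alpha @ alpha) / C     # e'alpha* - (1/C)||alpha*||_2^2
gamma = w_sq ** -0.5                             # geometric margin 1/||w*||_2
print(w_sq, w_sq_alt, gamma)
```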
Discussion about C for 2-Norm Soft Margin

The only difference between the "hard margin" and the 2-norm soft margin formulation is the objective matrix in the dual optimization problem: compare $K(A, A')$ with $K(A, A') + \tfrac{1}{C}I$. A larger C gives a smaller margin in the feature space; a smaller C gives a better numerical condition, since the regularized Gram matrix is better conditioned.
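The conditioning claim is easy to see numerically: a rank-deficient linear-kernel Gram matrix is singular, and adding $\frac{1}{C}I$ with a smaller C shifts the spectrum further from zero (a sketch; data and C values invented):

```python
import numpy as np

A = np.random.default_rng(0).normal(size=(20, 2))
K = A @ A.T                                       # linear-kernel Gram, rank <= 2
for C in (0.1, 10.0, 1e4):
    print(C, np.linalg.cond(K + np.eye(20) / C))  # condition number grows with C
```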