Ch. Eick: Support Vector Machines: The Main Ideas
Reading Material on Support Vector Machines:
1. Textbook
2. First 3 columns of the Smola/Schölkopf article on SV Regression
3. http://en.wikipedia.org/wiki/Kernel_trick
Likelihood- vs. Discriminant-based Classification
Likelihood-based: Assume a model for p(x|Ci) and use Bayes’ rule to calculate P(Ci|x); gi(x) = log P(Ci|x).

Discriminant-based: Assume a model for gi(x|Φi); no density estimation.

Prototype-based: Make classification decisions based on the nearest prototypes, without constructing decision boundaries (kNN, k-means approach).

Estimating the boundaries is enough; there is no need to accurately estimate the densities/probabilities inside the boundaries. We are just interested in learning decision boundaries (lines along which the densities of the two classes are equal), and many popular classification techniques learn decision boundaries without explicitly constructing density functions.
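A minimal sketch (not from the original slides) contrasting the three families on the same synthetic data, using scikit-learn classifiers as stand-ins: GaussianNB is likelihood-based, LogisticRegression is discriminant-based, and KNeighborsClassifier is prototype-based.

from sklearn.datasets import make_blobs
from sklearn.naive_bayes import GaussianNB            # likelihood-based: models p(x|Ci)
from sklearn.linear_model import LogisticRegression   # discriminant-based: models gi(x) directly
from sklearn.neighbors import KNeighborsClassifier    # prototype-based: nearest neighbors

X, y = make_blobs(n_samples=200, centers=2, random_state=0)
for clf in (GaussianNB(), LogisticRegression(), KNeighborsClassifier(n_neighbors=5)):
    clf.fit(X, y)
    print(type(clf).__name__, clf.score(X, y))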
Support Vector Machines
SVMs use a single hyperplane. One possible solution: hyperplane B1.
http://en.wikipedia.org/wiki/Hyperplane
Another possible solution: hyperplane B2.
Other possible solutions (in addition to B2).
Which one is better, B1 or B2? How do you define “better”?
Find the hyperplane maximizing the margin => B1 is better than B2.

[Figure: B1 with its margin hyperplanes b11 and b12, B2 with b21 and b22; the margin is the distance between a hyperplane's two margin hyperplanes]
Key Properties of Support Vector Machines
1. They use a single hyperplane which subdivides the space into two half-spaces, one occupied by Class 1 and the other by Class 2.
2. They maximize the margin of the decision boundary using quadratic optimization techniques which find the optimal hyperplane.
3. When used in practice, SVM approaches frequently map the examples (using a function Φ) to a higher dimensional space and find margin-maximal hyperplanes in the mapped space, obtaining decision boundaries which are not hyperplanes in the original space.
4. Moreover, versions of SVMs exist that can be used when linear separability cannot be accomplished.
Example (kernel trick): with the mapping Φ(x) = (x1^2, x2^2, sqrt(2)*x1x2, sqrt(2)*x1, sqrt(2)*x2, 1)^T, the dot product in the transformed space can be evaluated in the original space:

K(x,y) = Φ(x)^T Φ(y) = x1^2*y1^2 + x2^2*y2^2 + 2*x1x2*y1y2 + 2*x1y1 + 2*x2y2 + 1 = (x·y + 1)^2
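A quick numerical check of this identity (a sketch; the mapping phi below is the Φ reconstructed above):

import numpy as np

def phi(x):
    # Phi(x) = (x1^2, x2^2, sqrt(2)x1x2, sqrt(2)x1, sqrt(2)x2, 1)
    x1, x2 = x
    return np.array([x1**2, x2**2, np.sqrt(2)*x1*x2,
                     np.sqrt(2)*x1, np.sqrt(2)*x2, 1.0])

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])
print(phi(x) @ phi(y))     # 4.0, computed in the 6-dimensional space
print((x @ y + 1.0)**2)    # 4.0, computed in the original 2-dimensional space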
Support Vector Machines

[Figure: decision boundary B1 with margin hyperplanes b11 and b12]

Decision boundary: w·x + b = 0
Margin hyperplanes: w·x + b = 1 and w·x + b = -1

f(x) = 1 if w·x + b >= 1
f(x) = -1 if w·x + b <= -1

Margin = 2 / ||w||

Examples are: (x1,..,xn,y) with y ∈ {-1,1}
L2 Norm: http://en.wikipedia.org/wiki/L2_norm#Euclidean_norm
Dot-Product: http://en.wikipedia.org/wiki/Dot_product
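A sketch of these quantities in code (scikit-learn names; the toy dataset is an arbitrary illustration). A large C approximates the hard margin, and coef_/intercept_ expose w and b:

import numpy as np
from sklearn.svm import SVC

X = np.array([[0., 0.], [0., 1.], [2., 2.], [2., 3.]])
y = np.array([-1, -1, 1, 1])

clf = SVC(kernel='linear', C=1e6).fit(X, y)   # large C ~ hard margin
w, b = clf.coef_[0], clf.intercept_[0]
print('w =', w, 'b =', b)
print('Margin = 2/||w|| =', 2 / np.linalg.norm(w))
print(np.sign(X @ w + b))                     # f(x) = sign(w.x + b)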
Support Vector Machines

We want to maximize: Margin = 2 / ||w||

This is equivalent to minimizing: L(w) = ||w||^2 / 2

subject to the following N constraints:

yi(w·xi + b) >= 1 for i = 1,..,N

This is a constrained convex quadratic optimization problem that can be solved in polynomial time.
Numerical approaches to solve it (e.g., quadratic programming) exist.
The function to be optimized has only a single minimum, so there is no local-minimum problem.
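To make the formulation concrete, here is a sketch that hands the primal problem directly to a generic solver (scipy's SLSQP standing in for the specialized quadratic-programming solvers mentioned above; the toy data is hypothetical):

import numpy as np
from scipy.optimize import minimize

X = np.array([[0., 0.], [0., 1.], [2., 2.], [2., 3.]])
y = np.array([-1., -1., 1., 1.])

def objective(p):                 # p = (w1, w2, b); minimize ||w||^2 / 2
    return 0.5 * (p[:2] @ p[:2])

# N constraints: yi(w.xi + b) - 1 >= 0
cons = [{'type': 'ineq', 'fun': lambda p, i=i: y[i] * (X[i] @ p[:2] + p[2]) - 1.0}
        for i in range(len(y))]

res = minimize(objective, x0=np.zeros(3), method='SLSQP', constraints=cons)
print('w =', res.x[:2], 'b =', res.x[2])      # expect w ~ (1, 0), b ~ -1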
Support Vector Machines

What if the problem is not linearly separable?
Linear SVM for Non-linearly Separable Problems
What if the problem is not linearly separable? Introduce slack variables ξi. We need to minimize:

L(w) = ||w||^2 / 2 + C·(Σi=1..N ξi)^k

Here ||w||^2 / 2 is the inverse size of the margin between the hyperplanes, the sum of the slack variables measures the prediction error, C is a parameter, and the slack variables ξi allow constraint violation to a certain degree.

This is subject to (i=1,..,N):
(1) yi(w·xi + b) >= 1 - ξi
(2) ξi >= 0

C is chosen using a validation set, trying to keep the margins wide while keeping the training error low.
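A sketch of this model-selection procedure (the grid of C values and the synthetic data are arbitrary illustrations; scikit-learn's SVC implements the soft margin above with k = 1):

from sklearn.svm import SVC
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=300, n_features=5, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

best_C, best_acc = None, -1.0
for C in [0.01, 0.1, 1, 10, 100]:     # small C: wide margin; large C: low training error
    acc = SVC(kernel='linear', C=C).fit(X_tr, y_tr).score(X_val, y_val)
    if acc > best_acc:
        best_C, best_acc = C, acc
print('chosen C:', best_C, 'validation accuracy:', best_acc)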
Nonlinear Support Vector Machines

What if the decision boundary is not linear?

Alternative 1: Use a technique that employs non-linear decision boundaries.

[Figure: two panels labeled “No kernel” and “Non-linear function”, contrasting a linear with a non-linear decision boundary]
Nonlinear Support Vector Machines

Alternative 2: Transform into a higher dimensional attribute space and find linear decision boundaries in this space:
1. Transform the data into a higher dimensional space.
2. Find the best hyperplane using the methods introduced earlier.
Nonlinear Support Vector Machines

1. Choose a non-linear function Φ to transform the data into a different, usually higher dimensional, attribute space.
2. Minimize L(w) = ||w||^2 / 2, subject to the following N constraints:

yi(w·Φ(xi) + b) >= 1 for i = 1,..,N

This finds a good (margin-maximal) hyperplane in the transformed space.
Remark: The Soft Margin SVM can be generalized similarly.
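A sketch of this two-step recipe on a 1-D problem that is not linearly separable (the dataset and the map Φ(x) = (x, x^2) are illustrative assumptions): class +1 sits between two groups of class -1, so no threshold on x works, but the classes become linearly separable in the transformed space.

import numpy as np
from sklearn.svm import SVC

x = np.array([-3., -2., -0.5, 0., 0.5, 2., 3.])
y = np.array([-1, -1, 1, 1, 1, -1, -1])

Phi = np.column_stack([x, x**2])          # step 1: transform the data
clf = SVC(kernel='linear').fit(Phi, y)    # step 2: linear SVM in the new space
print(clf.score(Phi, y))                  # 1.0: separable after the mapping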
Example: Polynomial Kernel Function

Polynomial kernel function: Φ(x1,x2) = (x1^2, x2^2, sqrt(2)*x1x2, sqrt(2)*x1, sqrt(2)*x2, 1); K(u,v) = Φ(u)·Φ(v) = (u·v + 1)^2

A support vector machine with a polynomial kernel function classifies a new example z as follows:

sign((Σi αi yi Φ(xi)·Φ(z)) + b) = sign((Σi αi yi (xi·z + 1)^2) + b)

Remark: the αi and b are determined using the methods for linear SVMs that were discussed earlier.

Kernel function trick: perform the computations in the original space, although we solve an optimization problem in the transformed space; this is more efficient. More details in Topic 14.
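A sketch checking this decision rule against scikit-learn (assumptions: kernel='poly' with degree=2, gamma=1, coef0=1 gives K(u,v) = (u·v + 1)^2, and dual_coef_ stores αi·yi for the support vectors):

import numpy as np
from sklearn.svm import SVC

X = np.array([[0., 0.], [1., 1.], [2., 0.], [3., 1.]])
y = np.array([-1, -1, 1, 1])
clf = SVC(kernel='poly', degree=2, gamma=1.0, coef0=1.0).fit(X, y)

z = np.array([2.5, 0.5])
K = (clf.support_vectors_ @ z + 1.0) ** 2           # K(xi, z) for each support vector
score = clf.dual_coef_[0] @ K + clf.intercept_[0]   # sum_i (alpha_i yi) K(xi, z) + b
print(np.sign(score), clf.predict([z]))             # both give the same class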
Other Material on SVMs

Support Vector Machines in Rapid Miner: http://www.youtube.com/watch?v=27RQRUR7Ubc
SVM tutorial pointers: http://stackoverflow.com/questions/1072097/pointers-to-some-good-svm-tutorial
LIBSVM: http://www.csie.ntu.edu.tw/~cjlin/libsvm/
Adaboost/SVM relationship lecture: http://videolectures.net/mlss05us_rudin_da/
Summary: Support Vector Machines

Support vector machines learn hyperplanes that separate two classes, maximizing the margin between them (the empty space between the instances of the two classes).

Support vector machines introduce slack variables, in the case that the classes are not linearly separable, trying to maximize the margins while keeping the training error low.

The most popular versions of SVMs use non-linear kernel functions and map the attribute space into a higher dimensional space to facilitate finding “good” linear decision boundaries in the modified space.

Support vector machines find “margin optimal” hyperplanes by solving a convex quadratic optimization problem. However, this optimization process is quite slow, and support vector machines tend to fail if the number of examples goes beyond 500/5000/50000…

In general, support vector machines achieve quite high accuracies compared to other techniques.

In the last 10 years, support vector machines have been generalized to other tasks such as regression, PCA, outlier detection,…
Kernels: What can they do for you?

Some machine learning/statistical problems only depend on the dot products of the objects in the dataset O = {x1,..,xn} and not on other characteristics of the objects; in other words, those techniques only depend on the Gram matrix of O, which stores x1·x1, x1·x2, …, xn·xn (http://en.wikipedia.org/wiki/Gramian_matrix).

These techniques can be generalized by mapping the dataset into a higher dimensional space, as long as the non-linear mapping Φ can be kernelized; that is, a kernel function K can be found such that K(u,v) = Φ(u)·Φ(v). In this case the results are computed in the mapped space based on K(x1,x1), K(x1,x2), …, K(xn,xn), which is called the kernel trick: http://en.wikipedia.org/wiki/Kernel_trick

Kernels have been successfully used to generalize PCA, k-means, support vector machines, and many other techniques, allowing them to use non-linear coordinate systems, more complex decision boundaries, or more complex cluster boundaries.

We will revisit kernels later when discussing transparencies 13-25 and 30-35 of the Vasconcelos lecture.
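A sketch of the point that only the Gram matrix matters (using the polynomial kernel from the example above): even squared distances in the mapped space follow from kernel values alone, since ||Φ(u) - Φ(v)||^2 = K(u,u) - 2K(u,v) + K(v,v).

import numpy as np

def K(u, v):                              # kernel from the polynomial example
    return (u @ v + 1.0) ** 2

O = [np.array([0., 0.]), np.array([1., 2.]), np.array([2., 1.])]
gram = np.array([[K(u, v) for v in O] for u in O])   # all such a technique needs
print(gram)

u, v = O[1], O[2]
print(K(u, u) - 2 * K(u, v) + K(v, v))    # squared distance between Phi(u) and Phi(v)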