Kernel Functions for Support Vector Machines
Jordan Boyd-Graber, University of Colorado Boulder
Lecture 9B
Slides adapted from Jerry Zhu
Can you solve this with a linear separator?
Adding another dimension
Behold yon miserable creature. That Point is a Being like ourselves, but confined to the non-dimensional Gulf. He is himself his own World, his own Universe; of any other than himself he can form no conception; he knows not Length, nor Breadth, nor Height, for he has had no experience of them; he has no cognizance even of the number Two; nor has he a thought of Plurality, for he is himself his One and All, being really Nothing. Yet mark his perfect self-contentment, and hence learn this lesson, that to be self-contented is to be vile and ignorant, and that to aspire is better than to be blindly and impotently happy.
Problems get easier in higher dimensions
What’s special about SVMs?
$$\max_{\vec{\alpha}} \; \sum_{i=1}^{m} \alpha_i \;-\; \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j y_i y_j \, (\vec{x}_i \cdot \vec{x}_j) \qquad (1)$$
• This dot product is basically just how much $\vec{x}_i$ looks like $\vec{x}_j$. Can we generalize that? (See the numerical sketch below.)
• Kernels!
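To make the role of the dot product concrete, here is a minimal numerical sketch of Eq. (1); the toy data and dual variables are made up for illustration, and swapping in a different kernel is a one-argument change.

import numpy as np

# Toy data: m = 4 points in 2 dimensions with labels +/-1 (illustrative values only).
X = np.array([[1.0, 2.0], [2.0, 1.0], [-1.0, -1.5], [-2.0, -0.5]])
y = np.array([1, 1, -1, -1])
alpha = np.array([0.30, 0.10, 0.25, 0.15])  # some dual variables, also made up

def dual_objective(alpha, X, y, kernel=lambda a, b: a @ b):
    """SVM dual objective of Eq. (1); the kernel defaults to the plain dot product."""
    m = len(y)
    K = np.array([[kernel(X[i], X[j]) for j in range(m)] for i in range(m)])
    return alpha.sum() - 0.5 * (alpha * y) @ K @ (alpha * y)

print(dual_objective(alpha, X, y))                                      # linear kernel
print(dual_objective(alpha, X, y, kernel=lambda a, b: (a @ b + 1)**2))  # degree-2 polynomial kernel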
What’s a kernel?
• A function $K: \mathcal{X} \times \mathcal{X} \mapsto \mathbb{R}$ is a kernel over $\mathcal{X}$.
• This is equivalent to taking the dot product $\langle \phi(x_1), \phi(x_2) \rangle$ for some mapping $\phi$
• Mercer's Theorem: so long as the function is continuous, symmetric, and positive semi-definite, $K$ admits an expansion of the form
$$K(x, x') = \sum_{n=0}^{\infty} a_n \, \phi_n(x) \, \phi_n(x') \qquad (2)$$
• The computational cost is just in computing the kernel, never in the (possibly infinite-dimensional) mapping $\phi$ (a quick Gram-matrix check follows below)
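As a sanity check on this definition, the sketch below (random points, arbitrary kernel parameters) builds the Gram matrix of a polynomial kernel and confirms it is symmetric and positive semi-definite, which is what Mercer's condition requires of a valid kernel.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))      # 10 random points in 3 dimensions

K = (X @ X.T + 1.0) ** 3          # polynomial kernel with c = 1, d = 3
eigvals = np.linalg.eigvalsh(K)   # eigvalsh assumes a symmetric matrix

print(np.allclose(K, K.T))        # symmetric
print(eigvals.min() >= -1e-10)    # no (meaningfully) negative eigenvalues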
Polynomial Kernel
$$K(x, x') = (x \cdot x' + c)^d \qquad (3)$$
When d = 2 (an explicit feature map is sketched below):
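One way to see what d = 2 does: for 2-dimensional inputs there is an explicit 6-dimensional feature map whose ordinary dot product reproduces the kernel of Eq. (3) exactly. The points below are made up, and the map shown is one standard construction rather than the only possible one.

import numpy as np

def phi(x, c=1.0):
    """Explicit degree-2 feature map for 2-D input x = (x1, x2)."""
    x1, x2 = x
    return np.array([x1**2, x2**2,
                     np.sqrt(2) * x1 * x2,
                     np.sqrt(2 * c) * x1,
                     np.sqrt(2 * c) * x2,
                     c])

x, xp, c = np.array([1.0, 3.0]), np.array([2.0, -1.0]), 1.0
print((x @ xp + c) ** 2)       # kernel value from Eq. (3) with d = 2
print(phi(x, c) @ phi(xp, c))  # identical value via the explicit 6-dimensional map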
Gaussian Kernel
$$K(x, x') = \exp\left(-\frac{\|x' - x\|^2}{2\sigma^2}\right) \qquad (4)$$
which, after pulling out factors that depend only on $\|x\|$ and $\|x'\|$, expands as
$$K(x, x') \propto \sum_{n} \frac{(x \cdot x')^n}{\sigma^{2n} \, n!} \qquad (5)$$
(All polynomials!)
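A minimal sketch of Eq. (4) in code (the example points are arbitrary): the Gaussian kernel equals 1 when the two points coincide and decays toward 0 as they move apart, with σ setting how quickly.

import numpy as np

def rbf_kernel(x, xp, sigma=1.0):
    """Gaussian (RBF) kernel of Eq. (4)."""
    return np.exp(-np.sum((x - xp) ** 2) / (2 * sigma ** 2))

x, xp = np.array([1.0, 2.0]), np.array([1.5, 1.0])
print(rbf_kernel(x, x))               # exactly 1: a point is maximally similar to itself
print(rbf_kernel(x, xp, sigma=1.0))   # smaller for distinct points
print(rbf_kernel(x, xp, sigma=0.2))   # a narrower sigma makes similarity fall off much faster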
Tree Kernels
• Sometimes we have examples x that are hard to express as vectors
• For example, the sentences “a dog” and “a cat” have internal syntactic structure
• 3/5 structures match, so the tree kernel returns 0.6 (toy sketch below)
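The snippet below is only a toy illustration of the idea, not the full tree kernel over all subtree fragments: each parse is reduced to a hand-written bag of productions (invented here for the two example sentences) and the score is the fraction the trees share.

# Hand-written productions standing in for the parses of "a dog" and "a cat".
tree_dog = {"NP -> DT NN", "DT -> a", "NN -> dog"}
tree_cat = {"NP -> DT NN", "DT -> a", "NN -> cat"}

shared = len(tree_dog & tree_cat)
total = len(tree_dog | tree_cat)
print(shared / total)  # 2 of 4 distinct fragments match here (0.5); the slides' richer
                       # fragment set is what yields 3/5 = 0.6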
What does this do to learnability?
• Kernelized hypothesis spaces are obviously more complicated
• What does this do to complexity?
• Rademacher complexity for data of radius r and hypotheses with weight norm at most $\Lambda$: $S \subset \{x : K(x, x) \leq r^2\}$, $H = \{x \mapsto w \cdot \phi(x) : \|w\| \leq \Lambda\}$
$$\hat{R}_S(H) \leq \sqrt{\frac{r^2 \Lambda^2}{m}} \qquad (6)$$
• Proof requires real analysis (though plugging numbers into the bound is easy; see below)
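To get a feel for the bound in Eq. (6), the snippet below plugs in made-up values of r and Λ and shows the familiar 1/√m decay as the sample grows.

import numpy as np

r, Lam = 1.0, 10.0                         # hypothetical data radius and weight-norm bound
for m in (100, 1_000, 10_000):
    print(m, np.sqrt(r**2 * Lam**2 / m))   # Eq. (6): shrinks like 1/sqrt(m)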
Margin learnability
• With probability $1 - \delta$:
$$R(h) \leq \hat{R}_\rho(h) + 2\sqrt{\frac{r^2 \Lambda^2 / \rho^2}{m}} + \sqrt{\frac{\log \frac{1}{\delta}}{2m}} \qquad (7)$$
• So if you can find a simple kernel representation that induces a margin, use it!
• ...so long as you can handle the computational complexity
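Similarly, here are the two penalty terms of Eq. (7) evaluated at hypothetical values of r, Λ, the margin ρ, δ, and m, just to see which one dominates; none of these numbers come from the slides.

import numpy as np

r, Lam, rho, delta, m = 1.0, 10.0, 0.5, 0.05, 5_000   # all made-up values
margin_term = 2 * np.sqrt(r**2 * Lam**2 / rho**2 / m)
confidence_term = np.sqrt(np.log(1 / delta) / (2 * m))
print(margin_term, confidence_term)   # the margin term usually dominates unless rho is large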
How does it affect optimization?
• Replace all dot products with kernel evaluations $K(x_1, x_2)$
• Makes computation more expensive, but the overall structure is the same
• Try linear first! (see the scikit-learn sketch below)
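A minimal scikit-learn sketch (assuming scikit-learn is installed; the dataset and hyperparameters are illustrative): switching from a linear to a kernelized SVM is just a change of the kernel argument, which is why trying linear first costs nothing.

from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Concentric circles: the classic case a linear separator cannot handle.
X, y = make_circles(n_samples=200, noise=0.1, factor=0.3, random_state=0)

linear = SVC(kernel="linear").fit(X, y)
rbf = SVC(kernel="rbf", gamma=1.0).fit(X, y)

print(linear.score(X, y))  # near chance: no linear separator exists for this data
print(rbf.score(X, y))     # near 1.0: the RBF kernel separates the circles easily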
Recap
• This completes our discussion of SVMs
• Workhorse method of machine learning
• Flexible, fast, effective
• Kernels: applicable to a wide range of data; the inner product trick keeps the method simple