Introduction: Overview of Kernel Methodsfukumizu/Kyushu2008/Kernel_intro_1.pdf · Basic idea of...
Transcript of Introduction: Overview of Kernel Methodsfukumizu/Kyushu2008/Kernel_intro_1.pdf · Basic idea of...
![Page 1: Introduction: Overview of Kernel Methodsfukumizu/Kyushu2008/Kernel_intro_1.pdf · Basic idea of kernel methods Some examples of kernel methods Outline Basic idea of kernel methods](https://reader034.fdocuments.net/reader034/viewer/2022042208/5eaae0b3188a1c3d087013f7/html5/thumbnails/1.jpg)
Introduction: Overview of Kernel MethodsStatistical Data Analysis with Positive Definite Kernels
Kenji Fukumizu
Institute of Statistical Mathematics, ROISDepartment of Statistical Science, Graduate University for Advanced Studies
October 6-10, 2008, Kyushu University
![Page 2: Introduction: Overview of Kernel Methodsfukumizu/Kyushu2008/Kernel_intro_1.pdf · Basic idea of kernel methods Some examples of kernel methods Outline Basic idea of kernel methods](https://reader034.fdocuments.net/reader034/viewer/2022042208/5eaae0b3188a1c3d087013f7/html5/thumbnails/2.jpg)
Basic idea of kernel methods Some examples of kernel methods
Outline
Basic idea of kernel methodsLinear and nonlinear Data AnalysisEssence of kernel methodology
Some examples of kernel methodsKernel PCA: Nonlinear extension of PCARidge regression and its kernelization
2 / 25
![Page 3: Introduction: Overview of Kernel Methodsfukumizu/Kyushu2008/Kernel_intro_1.pdf · Basic idea of kernel methods Some examples of kernel methods Outline Basic idea of kernel methods](https://reader034.fdocuments.net/reader034/viewer/2022042208/5eaae0b3188a1c3d087013f7/html5/thumbnails/3.jpg)
Basic idea of kernel methods Some examples of kernel methods
Basic idea of kernel methodsLinear and nonlinear Data AnalysisEssence of kernel methodology
Some examples of kernel methodsKernel PCA: Nonlinear extension of PCARidge regression and its kernelization
3 / 25
![Page 4: Introduction: Overview of Kernel Methodsfukumizu/Kyushu2008/Kernel_intro_1.pdf · Basic idea of kernel methods Some examples of kernel methods Outline Basic idea of kernel methods](https://reader034.fdocuments.net/reader034/viewer/2022042208/5eaae0b3188a1c3d087013f7/html5/thumbnails/4.jpg)
Basic idea of kernel methods Some examples of kernel methods
Nonlinear Data Analysis I
• Classical linear methods• Data is expressed by a matrix.
X =
X1
1 X21 · · · Xm
1
X12 X2
2 · · · Xm2
...X1
N X2N · · · Xm
N
(m dimensional, N data)
• Linear operations (matrix operations) are used for data analysis.e.g.
- Principal component analysis (PCA)- Canonical correlation analysis (CCA)- Linear regression analysis- Fisher discriminant analysis (FDA)- Logistic regression, etc.
4 / 25
![Page 5: Introduction: Overview of Kernel Methodsfukumizu/Kyushu2008/Kernel_intro_1.pdf · Basic idea of kernel methods Some examples of kernel methods Outline Basic idea of kernel methods](https://reader034.fdocuments.net/reader034/viewer/2022042208/5eaae0b3188a1c3d087013f7/html5/thumbnails/5.jpg)
Basic idea of kernel methods Some examples of kernel methods
Nonlinear Data Analysis II
Are linear methods sufficient? Nonlinear transform can help.
• Example 1: classification
-6 -4 -2 0 2 4 6-6
-4
-2
0
2
4
6
0
5
10
15
20 0
5
10
15
20
-15
-10
-5
0
5
10
15
x1
x2
z1
z3
z2
linearly inseparable linearly separable
(x1, x2) 7→ (z1, z2, z3) = (x21, x
22,√
2x1x2)
(Unclear? watch http://jp.youtube.com/watch?v=3liCbRZPrZA)
5 / 25
![Page 6: Introduction: Overview of Kernel Methodsfukumizu/Kyushu2008/Kernel_intro_1.pdf · Basic idea of kernel methods Some examples of kernel methods Outline Basic idea of kernel methods](https://reader034.fdocuments.net/reader034/viewer/2022042208/5eaae0b3188a1c3d087013f7/html5/thumbnails/6.jpg)
Basic idea of kernel methods Some examples of kernel methods
• Example 2: dependence of two data
Correlation
ρXY =Cov[X,Y ]√Var[X]Var[Y ]
=E[(X − E[X])(Y − E[Y ])]√E[(X − E[X])2]E[(Y − E[Y ])2]
.
• Transforming data to incorporate high-order moments seemsattractive.
6 / 25
![Page 7: Introduction: Overview of Kernel Methodsfukumizu/Kyushu2008/Kernel_intro_1.pdf · Basic idea of kernel methods Some examples of kernel methods Outline Basic idea of kernel methods](https://reader034.fdocuments.net/reader034/viewer/2022042208/5eaae0b3188a1c3d087013f7/html5/thumbnails/7.jpg)
Basic idea of kernel methods Some examples of kernel methods
Basic idea of kernel methodsLinear and nonlinear Data AnalysisEssence of kernel methodology
Some examples of kernel methodsKernel PCA: Nonlinear extension of PCARidge regression and its kernelization
7 / 25
![Page 8: Introduction: Overview of Kernel Methodsfukumizu/Kyushu2008/Kernel_intro_1.pdf · Basic idea of kernel methods Some examples of kernel methods Outline Basic idea of kernel methods](https://reader034.fdocuments.net/reader034/viewer/2022042208/5eaae0b3188a1c3d087013f7/html5/thumbnails/8.jpg)
Basic idea of kernel methods Some examples of kernel methods
Feature space for transforming data• Kernel methodology = a systematic way of analyzing data by
transforming them into a high-dimensional feature space.
Apply linear methods on the feature space.
• Which type of space serves as a feature space?• The space should incorporate various nonlinear information of the
original data.• The inner product of the feature space is essential for data analysis
(seen in the next subsection).
8 / 25
![Page 9: Introduction: Overview of Kernel Methodsfukumizu/Kyushu2008/Kernel_intro_1.pdf · Basic idea of kernel methods Some examples of kernel methods Outline Basic idea of kernel methods](https://reader034.fdocuments.net/reader034/viewer/2022042208/5eaae0b3188a1c3d087013f7/html5/thumbnails/9.jpg)
Basic idea of kernel methods Some examples of kernel methods
Computational problem of inner product
• For example, how about this?
(X,Y, Z) 7→ (X,Y, Z,X2, Y 2, Z2, XY, Y Z,ZX, . . .).
• But, for high-dimensional data, the above expansion makes thefeature space very huge!
e.g. If X is 100 dimensional and the moments up to the thirdorder are used, the dimensionality of feature space is
100C1 + 100C2 + 100C3 = 166750.
• This causes a serious computational problem in working on theinner product of the feature space.We need a cleverer way of computing it. ⇒ Kernel method.
9 / 25
![Page 10: Introduction: Overview of Kernel Methodsfukumizu/Kyushu2008/Kernel_intro_1.pdf · Basic idea of kernel methods Some examples of kernel methods Outline Basic idea of kernel methods](https://reader034.fdocuments.net/reader034/viewer/2022042208/5eaae0b3188a1c3d087013f7/html5/thumbnails/10.jpg)
Basic idea of kernel methods Some examples of kernel methods
Inner product by positive definite kernel
• A positive definite kernel gives efficient computation of the innerproduct:
With special choice of the feature space, we have a functionk(x, y) such that
〈Φ(Xi),Φ(Xj)〉 = k(Xi, Xj), positive definite kernel
whereX 3 x 7→ Φ(x) ∈ H (feature space).
• Many linear methods use only the inner product withoutnecessity of the explicit form of the vector Φ(X).
10 / 25
![Page 11: Introduction: Overview of Kernel Methodsfukumizu/Kyushu2008/Kernel_intro_1.pdf · Basic idea of kernel methods Some examples of kernel methods Outline Basic idea of kernel methods](https://reader034.fdocuments.net/reader034/viewer/2022042208/5eaae0b3188a1c3d087013f7/html5/thumbnails/11.jpg)
Basic idea of kernel methods Some examples of kernel methods
Basic idea of kernel methodsLinear and nonlinear Data AnalysisEssence of kernel methodology
Some examples of kernel methodsKernel PCA: Nonlinear extension of PCARidge regression and its kernelization
11 / 25
![Page 12: Introduction: Overview of Kernel Methodsfukumizu/Kyushu2008/Kernel_intro_1.pdf · Basic idea of kernel methods Some examples of kernel methods Outline Basic idea of kernel methods](https://reader034.fdocuments.net/reader034/viewer/2022042208/5eaae0b3188a1c3d087013f7/html5/thumbnails/12.jpg)
Basic idea of kernel methods Some examples of kernel methods
Review of PCA I
X1, . . . , XN : m-dimensional data.
Principal Component Analysis (PCA)• Find d-directions to maximize the variance.• Purpose: represent the structure of the data in a low dimensional
space.
12 / 25
![Page 13: Introduction: Overview of Kernel Methodsfukumizu/Kyushu2008/Kernel_intro_1.pdf · Basic idea of kernel methods Some examples of kernel methods Outline Basic idea of kernel methods](https://reader034.fdocuments.net/reader034/viewer/2022042208/5eaae0b3188a1c3d087013f7/html5/thumbnails/13.jpg)
Basic idea of kernel methods Some examples of kernel methods
Review of PCA IIThe first principal direction:
u1 = arg max‖u‖=1
1N
{∑Ni=1u
T (Xi − 1N
∑Nj=1Xj)
}2 = arg max‖u‖=1
uTV u,
where V is the variance-covariance matrix:
V = 1N
∑Ni=1(Xi − 1
N
∑Nj=1Xj)(Xi − 1
N
∑Nj=1Xj)T .
- Eigenvectors u1, . . . , um of V (in descending order).- The p-th principal axis = up.- The p-th principal component of Xi = uT
pXi
Observation: PCA can be done if we can compute the inner product• covariance matrix V ,• inner product between the unit eigenvector and the data.
13 / 25
![Page 14: Introduction: Overview of Kernel Methodsfukumizu/Kyushu2008/Kernel_intro_1.pdf · Basic idea of kernel methods Some examples of kernel methods Outline Basic idea of kernel methods](https://reader034.fdocuments.net/reader034/viewer/2022042208/5eaae0b3188a1c3d087013f7/html5/thumbnails/14.jpg)
Basic idea of kernel methods Some examples of kernel methods
Kernel PCA IX1, . . . , XN : m-dimensional data.Transform the data by a feature map Φ into a feature space H:
X1, . . . , XN 7→ Φ(X1), . . . ,Φ(XN )
Assume that the feature space has the inner product 〈 , 〉.
Apply PCA to the transformed data:• Maximize the variance of the projection onto the unit vector f .
max‖f‖=1
Var[〈f,Φ(X)〉] = max‖f‖=1
1N
∑Ni=1
(〈f,Φ(Xi)− 1
N
∑Nj=1Φ(Xj)〉
)2• Note: it suffices to use f =
∑ni=1 aiΦ(Xi), where
Φ(Xi) = Φ(Xi)− 1N
∑Nj=1Φ(Xj).
The direction orthogonal to Span{Φ(X1), . . . , Φ(XN )} does notcontribute.
14 / 25
![Page 15: Introduction: Overview of Kernel Methodsfukumizu/Kyushu2008/Kernel_intro_1.pdf · Basic idea of kernel methods Some examples of kernel methods Outline Basic idea of kernel methods](https://reader034.fdocuments.net/reader034/viewer/2022042208/5eaae0b3188a1c3d087013f7/html5/thumbnails/15.jpg)
Basic idea of kernel methods Some examples of kernel methods
Kernel PCA II• The PCA solution:
max aT K2a subject to aT Ka = 1,
where K is N ×N matrix with Kij = 〈Φ(Xi), Φ(Xj)〉.Note:
1N
∑Ni=1〈f, Φ(Xi)〉2 = 1
N
∑Ni=1〈
∑Nj=1ajΦ(Xj), Φ(Xi)〉2 = 1
N aT K2a,
‖f‖2 = 〈∑n
i=1aiΦ(Xi),∑n
i=1aiΦ(Xi)〉 = aT Ka.
• The first principal component of the data Xi is
〈Φ(Xi), f〉 =∑N
i=1
√λ1u
1i ,
where K =∑N
i=1 λiuiuiT is the eigen decomposition.
15 / 25
![Page 16: Introduction: Overview of Kernel Methodsfukumizu/Kyushu2008/Kernel_intro_1.pdf · Basic idea of kernel methods Some examples of kernel methods Outline Basic idea of kernel methods](https://reader034.fdocuments.net/reader034/viewer/2022042208/5eaae0b3188a1c3d087013f7/html5/thumbnails/16.jpg)
Basic idea of kernel methods Some examples of kernel methods
Kernel PCA IIIObservation:• PCA in the feature space can be done if we can compute〈Φ(Xi), Φ(Xj)〉 or
〈Φ(Xi),Φ(Xj)〉 = k(Xi, Xj).
• The principal direction is obtained in the form f =∑
i aiΦ(Xi),i.e., in the linear hull of the data.
Note:
Kij = 〈Φ(Xi), Φ(Xj)〉
= 〈Φ(Xi), Φ(Xj)〉 − 1N
∑Nb=1〈Φ(Xi), Φ(Xb)〉
− 1N
∑Na=1〈Φ(Xa), Φ(Xj)〉+ 1
N2
∑Na=1〈Φ(Xa), Φ(Xb)〉
= k(Xi, Xj)− 1N
∑Nb=1k(Xi, Xb)− 1
N
∑Na=1k(Xa, Xj) + 1
N2
∑Na=1k(Xa, Xb)
16 / 25
![Page 17: Introduction: Overview of Kernel Methodsfukumizu/Kyushu2008/Kernel_intro_1.pdf · Basic idea of kernel methods Some examples of kernel methods Outline Basic idea of kernel methods](https://reader034.fdocuments.net/reader034/viewer/2022042208/5eaae0b3188a1c3d087013f7/html5/thumbnails/17.jpg)
Basic idea of kernel methods Some examples of kernel methods
Basic idea of kernel methodsLinear and nonlinear Data AnalysisEssence of kernel methodology
Some examples of kernel methodsKernel PCA: Nonlinear extension of PCARidge regression and its kernelization
17 / 25
![Page 18: Introduction: Overview of Kernel Methodsfukumizu/Kyushu2008/Kernel_intro_1.pdf · Basic idea of kernel methods Some examples of kernel methods Outline Basic idea of kernel methods](https://reader034.fdocuments.net/reader034/viewer/2022042208/5eaae0b3188a1c3d087013f7/html5/thumbnails/18.jpg)
Basic idea of kernel methods Some examples of kernel methods
Review: Linear Regression ILinear regression• Data: (X1, Y1), . . . , (XN , YN ): data
• Xi: explanatory variable, covariate (m-dimensional)• Yi: response variable, (1 dimensional)
• Regression model: find the best linear relation
Yi = aTXi + εi
18 / 25
![Page 19: Introduction: Overview of Kernel Methodsfukumizu/Kyushu2008/Kernel_intro_1.pdf · Basic idea of kernel methods Some examples of kernel methods Outline Basic idea of kernel methods](https://reader034.fdocuments.net/reader034/viewer/2022042208/5eaae0b3188a1c3d087013f7/html5/thumbnails/19.jpg)
Basic idea of kernel methods Some examples of kernel methods
Review: Linear Regression II• Least square method:
mina
∑Ni=1‖Yi − aTXi‖2.
• Matrix expression
X =
X1
1 X21 · · · Xm
1
X12 X2
2 · · · Xm2
...X1
N X2N · · · Xm
N
, Y =
Y1
Y2
...YN
.
• Solution:a = (XTX)−1XTY
y = aTx = Y TX(XTX)−1x.
Observation: Linear regressio can be done if we can compute theinner product XTX, aTx and so on.
19 / 25
![Page 20: Introduction: Overview of Kernel Methodsfukumizu/Kyushu2008/Kernel_intro_1.pdf · Basic idea of kernel methods Some examples of kernel methods Outline Basic idea of kernel methods](https://reader034.fdocuments.net/reader034/viewer/2022042208/5eaae0b3188a1c3d087013f7/html5/thumbnails/20.jpg)
Basic idea of kernel methods Some examples of kernel methods
Ridge Regression
Ridge regression:• Find a linear relation by
mina
∑Ni=1‖Yi − aTXi‖2 + λ‖a‖2.
λ: regularization coefficient.• Solution
a = (XTX + λIN )−1XTY
For a general x,
y(x) = aTx = Y TX(XTX + λIN )−1x.
• Ridge regression is useful when (XTX)−1 does not exist, orinversion is numerically unstable.
20 / 25
![Page 21: Introduction: Overview of Kernel Methodsfukumizu/Kyushu2008/Kernel_intro_1.pdf · Basic idea of kernel methods Some examples of kernel methods Outline Basic idea of kernel methods](https://reader034.fdocuments.net/reader034/viewer/2022042208/5eaae0b3188a1c3d087013f7/html5/thumbnails/21.jpg)
Basic idea of kernel methods Some examples of kernel methods
Kernelization of Ridge Regression I(X1, Y1) . . . , (XN , YN ) (Yi: 1-dimensional)
Transform Xi by a feature map Φ into a feature space H:
X1, . . . , XN 7→ Φ(X1), . . . ,Φ(XN )
Assume that the feature space has the inner product 〈 , 〉.
Apply ridge regression to the transformed data:• Find the vector f such that
minf∈H
∑Ni=1|Yi − 〈f,Φ(Xi)〉H|2 + λ‖f‖2H.
• Similarly to kernel PCA, we can assume f =∑n
j=1 cjΦ(Xj).
minc
∑Ni=1|Yi − 〈
∑Nj=1cjΦ(Xj),Φ(Xi)〉H|2 + λ‖
∑Nj=1cjΦ(Xj)‖2H
21 / 25
![Page 22: Introduction: Overview of Kernel Methodsfukumizu/Kyushu2008/Kernel_intro_1.pdf · Basic idea of kernel methods Some examples of kernel methods Outline Basic idea of kernel methods](https://reader034.fdocuments.net/reader034/viewer/2022042208/5eaae0b3188a1c3d087013f7/html5/thumbnails/22.jpg)
Basic idea of kernel methods Some examples of kernel methods
Kernelization of Ridge Regression II
• Solution:
c = (K+λIN )−1Y, where Kij = 〈Φ(Xi),Φ(Xj)〉H = k(Xi, Xj).
For a general x,
y(x) = 〈f ,Φ(x)〉H = 〈∑
j cjΦ(Xj),Φ(x)〉H= Y T (K + λIN )−1k,
where
k =
〈Φ(X1),Φ(x)〉...
〈Φ(XN ),Φ(x)〉
=
k(X1, x)...
k(XN , x)
.
22 / 25
![Page 23: Introduction: Overview of Kernel Methodsfukumizu/Kyushu2008/Kernel_intro_1.pdf · Basic idea of kernel methods Some examples of kernel methods Outline Basic idea of kernel methods](https://reader034.fdocuments.net/reader034/viewer/2022042208/5eaae0b3188a1c3d087013f7/html5/thumbnails/23.jpg)
Basic idea of kernel methods Some examples of kernel methods
Kernelization of Ridge Regression III
Proof.Matrix expression gives∑N
i=1 |Yi − 〈∑N
j=1cjΦ(Xj),Φ(Xi)〉H|2 + λ‖∑N
j=1cjΦ(Xj)‖2H= (Y −Kc)T (Y −Kc) + λcTKc
= cT (K2 + λK)c− 2Y TKc+ Y TY.
It follows that the optimal c is given by
c = (K + λIN )−1Y.
Inserting this to y(x) = 〈∑
j cjΦ(Xj),Φ(x)〉H, we have the claim.
23 / 25
![Page 24: Introduction: Overview of Kernel Methodsfukumizu/Kyushu2008/Kernel_intro_1.pdf · Basic idea of kernel methods Some examples of kernel methods Outline Basic idea of kernel methods](https://reader034.fdocuments.net/reader034/viewer/2022042208/5eaae0b3188a1c3d087013f7/html5/thumbnails/24.jpg)
Basic idea of kernel methods Some examples of kernel methods
Kernelization of Ridge Regression IV
Observation:• Ridge regression in the feature space can be done if we can
compute the inner product
〈Φ(Xi),Φ(Xj)〉 = k(Xi, Xj).
• The resulting coefficient is of the form f =∑
i ciΦ(Xi), i.e., in thelinear hull of the data.The orthogonal directions do not contribute to the objectivefunction.
24 / 25
![Page 25: Introduction: Overview of Kernel Methodsfukumizu/Kyushu2008/Kernel_intro_1.pdf · Basic idea of kernel methods Some examples of kernel methods Outline Basic idea of kernel methods](https://reader034.fdocuments.net/reader034/viewer/2022042208/5eaae0b3188a1c3d087013f7/html5/thumbnails/25.jpg)
Basic idea of kernel methods Some examples of kernel methods
Kernel methodology
• A feature space H with inner product 〈 , 〉.
• Mapping of the data into a feature space:
X1, . . . , XN 7→ Φ(X1), . . . ,Φ(XN ) ∈ H.
• If the computation of the inner product 〈Φ(Xi),Φ(Xi)〉 istractable, various linear methods can be extended to the featurespace.
• Give Methods of nonlinear data analysis.
How can we prepare such a feature space?⇒ Positive definite kernel!
25 / 25