CS 59000 Statistical Machine learning Lecture 13 Yuan (Alan) Qi Purdue CS Oct. 8 2008.
CS 59000 Statistical Machine Learning
Lecture 13
Yuan (Alan) Qi, Purdue CS
Oct. 8, 2008
Outline
• Review of the kernel trick, kernel ridge regression, and kernel Principal Component Analysis
• Gaussian processes (GPs)
• From linear regression to GPs
• GPs for regression
Kernel Trick
1. Reformulate an algorithm such that the input vector x enters only in the form of inner products x^T x'.
2. Replace the input x by its feature mapping φ(x).
3. Replace the inner product by a kernel function: k(x, x') = φ(x)^T φ(x').
Examples: kernel PCA, kernel Fisher discriminant, support vector machines.
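The three steps above can be sketched numerically: the same algorithm skeleton works whether we use plain inner products or swap in a kernel evaluation. The function names and test data below are illustrative, not from the slides.

```python
import numpy as np

def linear_kernel(X, Y):
    # Plain inner products: K[i, j] = x_i^T y_j
    return X @ Y.T

def gaussian_kernel(X, Y, sigma=1.0):
    # k(x, x') = exp(-||x - x'||^2 / (2 sigma^2)): an implicit inner
    # product in an infinite-dimensional feature space
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * sigma ** 2))

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))
K_lin = linear_kernel(X, X)   # step 1: algorithm in terms of inner products
K_rbf = gaussian_kernel(X, X) # steps 2-3: inner products replaced by a kernel
print(K_lin.shape, K_rbf.shape)  # both (5, 5)
```

Any algorithm written purely in terms of the entries of such a Gram matrix can be kernelized by this substitution.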
Dual Representation for Ridge Regression
Minimize J(w) = (1/2) Σ_n (w^T φ(x_n) − t_n)² + (λ/2) w^T w.
Setting the gradient with respect to w to zero gives w = Φ^T a, where the dual variables are a_n = −(1/λ)(w^T φ(x_n) − t_n).
Kernel Ridge Regression
Using the kernel trick, express everything in terms of the Gram matrix K = Φ Φ^T.
Minimizing over the dual variables gives a = (K + λ I_N)^{−1} t, so the prediction at a new input is y(x) = k(x)^T (K + λ I_N)^{−1} t, where k(x) has elements k_n(x) = k(x_n, x).
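A minimal sketch of this dual solution, assuming a Gaussian kernel and synthetic data of my choosing (function and variable names are mine, not from the slides):

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * sigma ** 2))

def fit_kernel_ridge(X, t, lam=1e-2, sigma=1.0):
    K = gaussian_kernel(X, X, sigma)
    # Dual variables: solve (K + lam I) a = t rather than forming an inverse
    a = np.linalg.solve(K + lam * np.eye(len(X)), t)
    return a

def predict(X_train, a, X_new, sigma=1.0):
    # y(x) = sum_n a_n k(x_n, x)
    return gaussian_kernel(X_new, X_train, sigma) @ a

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, (30, 1))
t = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(30)
a = fit_kernel_ridge(X, t)
y = predict(X, a, X)
print(np.abs(y - t).mean())  # small training error
```

Note that the fit touches the data only through kernel evaluations, exactly as the dual formulation promises.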
Generating a Kernel Matrix
A valid kernel matrix must be positive semidefinite. Consider the Gaussian kernel: k(x, x') = exp(−‖x − x'‖² / (2σ²)).
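The positive-semidefiniteness requirement can be checked numerically: all eigenvalues of the kernel matrix must be nonnegative. A quick check for the Gaussian kernel (the data and σ = 1 are my choices):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((8, 2))
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-sq / 2.0)            # Gaussian kernel matrix, sigma = 1
eigvals = np.linalg.eigvalsh(K)  # symmetric matrix -> real eigenvalues
print(eigvals.min() >= -1e-10)   # True: K is PSD up to round-off
```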
Principal Component Analysis (PCA)
Assume the data have zero mean: Σ_n x_n = 0.
We have the sample covariance S = (1/N) Σ_n x_n x_n^T.
u_i is a normalized eigenvector: S u_i = λ_i u_i, with u_i^T u_i = 1.
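A short numerical check of this eigen-problem, on synthetic zero-mean data of my choosing:

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.standard_normal((100, 3))
X = X - X.mean(0)           # enforce the zero-mean assumption
S = X.T @ X / len(X)        # sample covariance S = (1/N) sum_n x_n x_n^T
lam, U = np.linalg.eigh(S)  # columns of U are orthonormal eigenvectors
# The leading principal direction satisfies S u = lambda u:
print(np.allclose(S @ U[:, -1], lam[-1] * U[:, -1]))  # True
```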
Feature Mapping
Mapping x to φ(x), the eigen-problem in feature space becomes C v_i = λ_i v_i, where C = (1/N) Σ_n φ(x_n) φ(x_n)^T (assuming for now that the mapped data have zero mean).
Dual Variables
Suppose v_i = Σ_n a_{in} φ(x_n). Substituting this expansion into C v_i = λ_i v_i, we obtain an eigen-problem in terms of the dual variables a_i alone.
Eigen-problem in Feature Space (1)
Multiplying both sides by φ(x_m)^T turns the eigen-problem into K² a_i = λ_i N K a_i, whose solutions of interest can be found by solving K a_i = λ_i N a_i.
Eigen-problem in Feature Space (2)
Normalization condition: 1 = v_i^T v_i = a_i^T K a_i = λ_i N a_i^T a_i.
Projection coefficient: y_i(x) = φ(x)^T v_i = Σ_n a_{in} k(x, x_n).
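The whole kernel-PCA recipe (solve K a_i = λ_i N a_i, normalize so that λ_i N a_i^T a_i = 1, then project via the kernel expansion) can be sketched as follows, still under the zero-mean assumption; the data and kernel width are my choices:

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * sigma ** 2))

rng = np.random.default_rng(3)
X = rng.standard_normal((20, 2))
N = len(X)
K = gaussian_kernel(X, X)

# Eigendecomposition of K; eigh returns eigenvalues in ascending order
vals, vecs = np.linalg.eigh(K)
vals, vecs = vals[::-1], vecs[:, ::-1]  # largest first
lam = vals / N                          # lambda_i = eigval_i / N
# Scale eigenvectors so that lambda_i N a_i^T a_i = 1 (i.e. a_i^T K a_i = 1)
a = vecs[:, :2] / np.sqrt(vals[:2])

# Project the training points onto the first two kernel components:
# y_i(x) = sum_n a_{in} k(x, x_n)
Y = K @ a
print(Y.shape)  # (20, 2)
```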
General Case: Non-zero Mean
When the mapped data are not zero-mean, use the centered kernel matrix: K̃ = K − 1_N K − K 1_N + 1_N K 1_N, where 1_N denotes the N × N matrix in which every element equals 1/N.
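The centering formula above translates directly into code. A small sketch (the example matrix is mine):

```python
import numpy as np

def center_kernel(K):
    # K~ = K - 1_N K - K 1_N + 1_N K 1_N,
    # where 1_N is the N x N matrix with every entry 1/N
    N = K.shape[0]
    one = np.full((N, N), 1.0 / N)
    return K - one @ K - K @ one + one @ K @ one

K = np.array([[2.0, 1.0], [1.0, 2.0]])
Kc = center_kernel(K)
print(Kc.sum())  # ~0: the centered kernel matrix has zero-sum rows/columns
```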
Gaussian Processes
How do kernels arise naturally in a Bayesian setting?
Instead of assigning a prior to the parameters w, we assign a prior directly to the function values y. In theory this is a distribution over an infinite-dimensional space; in practice we only ever work with a finite set of function values (at the training and test inputs).
Linear Regression Revisited
Let y(x) = w^T φ(x), with prior p(w) = N(w | 0, α^{−1} I).
We have y = Φ w, where Φ is the design matrix with rows φ(x_n)^T.
From a Prior on Parameters to a Prior on Functions
The prior on the function values: E[y] = Φ E[w] = 0 and cov[y] = (1/α) Φ Φ^T = K, so p(y) = N(y | 0, K) with K_{nm} = (1/α) φ(x_n)^T φ(x_m).
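This induced prior can be visualized by sampling: compute K = Φ Φ^T / α on a grid and draw function values from N(0, K). The basis functions and grid below are my choices; a tiny jitter is added for numerical stability since K is rank-deficient here.

```python
import numpy as np

rng = np.random.default_rng(4)
alpha = 1.0
x = np.linspace(-1, 1, 50)
centers = np.linspace(-1, 1, 9)
# Gaussian basis functions (illustrative choice): Phi[n, j] = phi_j(x_n)
Phi = np.exp(-(x[:, None] - centers[None, :]) ** 2 / 0.1)

K = Phi @ Phi.T / alpha                 # induced covariance on y
K = K + 1e-10 * np.eye(len(x))          # jitter: K is low-rank (9 basis fns)
# Draw sample functions from the prior p(y) = N(0, K)
y_samples = rng.multivariate_normal(np.zeros(len(x)), K, size=3)
print(y_samples.shape)  # (3, 50)
```

Each row of `y_samples` is one random function evaluated on the grid, drawn without ever sampling w explicitly.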
Stochastic Process
A stochastic process is specified by giving the joint distribution of the function values at any finite set of points, in a consistent manner. (Loosely speaking, consistency means that marginalizing the joint distribution over some points gives the same distribution as the one defined directly on the remaining points.)
Gaussian Processes
The joint distribution of any finite set of function values is a multivariate Gaussian.
Without any prior knowledge, we often set the mean to zero. The GP is then specified entirely by the covariance: E[y(x_n) y(x_m)] = k(x_n, x_m).
Impact of the Kernel Function
The covariance matrix is given by the kernel function, so the choice of kernel determines the properties of the sampled functions.
Applications: economics and finance.
Gaussian Process for Regression
Likelihood: p(t | y) = N(t | y, β^{−1} I_N).
Prior: p(y) = N(y | 0, K).
Marginal distribution: p(t) = ∫ p(t | y) p(y) dy = N(t | 0, C), where C = K + β^{−1} I_N.
Samples of GP Prior over Functions
Samples of Data Points
Predictive Distribution
p(t_{N+1} | t) is a Gaussian distribution with mean and variance
m(x_{N+1}) = k^T C_N^{−1} t,
σ²(x_{N+1}) = c − k^T C_N^{−1} k,
where k has elements k_n = k(x_n, x_{N+1}) and c = k(x_{N+1}, x_{N+1}) + β^{−1}.
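These two formulas can be sketched directly; the 1-D data, kernel width, and noise precision below are my choices, not from the slides:

```python
import numpy as np

def kern(a, b, sigma=0.5):
    # Gaussian kernel on scalars (illustrative choice)
    return np.exp(-(a - b) ** 2 / (2 * sigma ** 2))

rng = np.random.default_rng(5)
beta = 25.0                               # noise precision (assumed)
x = rng.uniform(-3, 3, 15)
t = np.sin(x) + rng.standard_normal(15) / np.sqrt(beta)

K = kern(x[:, None], x[None, :])
C = K + np.eye(len(x)) / beta             # C_N = K + beta^{-1} I

x_star = 0.7
k = kern(x, x_star)                       # k_n = k(x_n, x*)
mean = k @ np.linalg.solve(C, t)          # m(x*)  = k^T C_N^{-1} t
var = kern(x_star, x_star) + 1 / beta \
      - k @ np.linalg.solve(C, k)         # s2(x*) = c - k^T C_N^{-1} k
print(mean, var)
```

The variance shrinks near observed inputs and grows back toward the prior variance c far from the data.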
Predictive Mean
The predictive mean is a kernel expansion, m(x_{N+1}) = Σ_n a_n k(x_n, x_{N+1}) with a = C_N^{−1} t: the same form we saw in kernel ridge regression and kernel PCA.
GP Regression
Discussion: what is the difference between GP regression and Bayesian regression with Gaussian basis functions?
Marginal Distribution of Target Values