Gaussian Processes
Li An (anli@temple.edu)
![Page 2: Gaussian Processes Li An anli@temple.edu Li An anli@temple.edu.](https://reader035.fdocuments.net/reader035/viewer/2022062304/56649f515503460f94c74f43/html5/thumbnails/2.jpg)
The Plan
• Introduction to Gaussian Processes
• Revisit Linear regression
• Linear regression updated by Gaussian Processes
• Gaussian Processes for Regression
• Conclusion
![Page 3: Gaussian Processes Li An anli@temple.edu Li An anli@temple.edu.](https://reader035.fdocuments.net/reader035/viewer/2022062304/56649f515503460f94c74f43/html5/thumbnails/3.jpg)
Why GPs?
• Here are some data points! What function did they come from?
• I have no idea.
• Oh. Okay. Uh, you think this point is likely in the function too?
• I have no idea.
![Page 4: Gaussian Processes Li An anli@temple.edu Li An anli@temple.edu.](https://reader035.fdocuments.net/reader035/viewer/2022062304/56649f515503460f94c74f43/html5/thumbnails/4.jpg)
Why GPs?
• You can’t get anywhere without making some assumptions
• GPs are a nice way of expressing this ‘prior on functions’ idea.
• Can do a bunch of cool stuff
  • Regression
  • Classification
  • Optimization
![Page 5: Gaussian Processes Li An anli@temple.edu Li An anli@temple.edu.](https://reader035.fdocuments.net/reader035/viewer/2022062304/56649f515503460f94c74f43/html5/thumbnails/5.jpg)
Gaussian
• Unimodal
• Concentrated
• Easy to compute with
  • Sometimes
• Tons of crazy properties

$\mathcal{N}(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
![Page 6: Gaussian Processes Li An anli@temple.edu Li An anli@temple.edu.](https://reader035.fdocuments.net/reader035/viewer/2022062304/56649f515503460f94c74f43/html5/thumbnails/6.jpg)
Linear Regression Revisited
• Linear regression model: combination of M fixed basis functions given by the elements of $\boldsymbol{\phi}(x)$, so that
$y(x) = \mathbf{w}^T \boldsymbol{\phi}(x)$
• Prior distribution
$p(\mathbf{w}) = \mathcal{N}(\mathbf{w} \mid \mathbf{0}, \alpha^{-1} I)$
• Given training data points $x_1, \ldots, x_n$, what is the joint distribution of $y(x_1), \ldots, y(x_n)$?
• $\mathbf{y}$ is the vector with elements $y_n = y(x_n)$; this vector is given by
$\mathbf{y} = \Phi \mathbf{w}$
where $\Phi$ is the design matrix with elements $\Phi_{nk} = \phi_k(x_n)$
![Page 7: Gaussian Processes Li An anli@temple.edu Li An anli@temple.edu.](https://reader035.fdocuments.net/reader035/viewer/2022062304/56649f515503460f94c74f43/html5/thumbnails/7.jpg)
Linear Regression Revisited
• $\mathbf{y} = \Phi \mathbf{w}$, so $\mathbf{y}$ is a linear combination of Gaussian distributed variables given by the elements of $\mathbf{w}$, hence is itself Gaussian.
• Find its mean and covariance
$E[\mathbf{y}] = \Phi \, E[\mathbf{w}] = \mathbf{0}$
$\operatorname{cov}[\mathbf{y}] = E[\mathbf{y}\mathbf{y}^T] = \Phi \, E[\mathbf{w}\mathbf{w}^T] \, \Phi^T = \frac{1}{\alpha} \Phi \Phi^T = K$
where $K$ is the Gram matrix with elements
$K_{nm} = k(x_n, x_m) = \frac{1}{\alpha} \boldsymbol{\phi}(x_n)^T \boldsymbol{\phi}(x_m)$
and $k(x, x')$ is the kernel function.
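The covariance result above can be checked numerically. The following is a minimal sketch, not part of the original slides: the Gaussian-bump basis functions, the value of alpha, and the helper name `design_matrix` are all assumptions chosen for illustration. It builds the Gram matrix analytically and compares it with a Monte Carlo estimate of E[y yᵀ].

```python
import numpy as np

def design_matrix(x, centers, width):
    """Phi[n, k] = phi_k(x_n), with Gaussian bumps as the (assumed) basis functions."""
    return np.exp(-0.5 * ((x[:, None] - centers[None, :]) / width) ** 2)

alpha = 2.0                          # precision of the weight prior (assumed value)
x = np.linspace(-1.0, 1.0, 5)        # n = 5 input points
centers = np.linspace(-1.0, 1.0, 4)  # M = 4 basis functions
Phi = design_matrix(x, centers, width=0.5)

# Analytic covariance of y = Phi w when w ~ N(0, alpha^{-1} I).
K = Phi @ Phi.T / alpha

# Monte Carlo check: draw many weight vectors and estimate E[y y^T] empirically.
rng = np.random.default_rng(0)
W = rng.normal(scale=alpha ** -0.5, size=(200_000, centers.size))
Y = W @ Phi.T                        # each row is one sample of y
K_mc = Y.T @ Y / Y.shape[0]          # valid as a covariance because E[y] = 0

# The largest discrepancy is small and shrinks as the sample count grows.
print(np.max(np.abs(K - K_mc)))
```

Note that the weights never need to be sampled in practice: once K is known, the distribution of y is fully specified, which is the observation the following slides build on.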
![Page 8: Gaussian Processes Li An anli@temple.edu Li An anli@temple.edu.](https://reader035.fdocuments.net/reader035/viewer/2022062304/56649f515503460f94c74f43/html5/thumbnails/8.jpg)
Definition of GP
• A Gaussian process is defined as a probability distribution over functions y(x), such that the set of values of y(x) evaluated at an arbitrary set of points x1, ..., xn jointly have a Gaussian distribution.
  • Probability distribution indexed by an arbitrary set
  • Any finite subset of indices defines a multivariate Gaussian distribution
• Input space X: for each x the distribution is a Gaussian. What determines the GP is
  • The mean function µ(x) = E(y(x))
  • The covariance function (kernel) k(x,x') = E(y(x)y(x'))
  • In most applications, we take µ(x)=0. Hence the prior is represented by the kernel.
![Page 9: Gaussian Processes Li An anli@temple.edu Li An anli@temple.edu.](https://reader035.fdocuments.net/reader035/viewer/2022062304/56649f515503460f94c74f43/html5/thumbnails/9.jpg)
Linear regression updated by GP
• Specific case of a Gaussian Process
• It is defined by the linear regression model
$y(x) = \mathbf{w}^T \boldsymbol{\phi}(x)$
with a weight prior
$p(\mathbf{w}) = \mathcal{N}(\mathbf{w} \mid \mathbf{0}, \alpha^{-1} I)$
the kernel function is given by
$k(x_n, x_m) = \frac{1}{\alpha} \boldsymbol{\phi}(x_n)^T \boldsymbol{\phi}(x_m)$
![Page 10: Gaussian Processes Li An anli@temple.edu Li An anli@temple.edu.](https://reader035.fdocuments.net/reader035/viewer/2022062304/56649f515503460f94c74f43/html5/thumbnails/10.jpg)
Kernel function
• We can also define the kernel function directly.
• The figure shows samples of functions drawn from Gaussian processes for two different choices of kernel function.
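Since the figure itself is not reproduced in this transcript, here is a minimal sketch of how such samples could be drawn. The two kernels used (a Gaussian kernel and an exponential kernel), their parameter values, and the small diagonal `jitter` added for numerical stability are assumptions for illustration; the slide does not say which kernels it plotted.

```python
import numpy as np

def gaussian_kernel(xa, xb, length_scale=0.3):
    """k(x, x') = exp(-(x - x')^2 / (2 * length_scale^2)); gives smooth samples."""
    d = xa[:, None] - xb[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def exponential_kernel(xa, xb, theta=4.0):
    """k(x, x') = exp(-theta * |x - x'|); gives rough, non-differentiable samples."""
    d = np.abs(xa[:, None] - xb[None, :])
    return np.exp(-theta * d)

def sample_prior(kernel, x, n_samples, rng, jitter=1e-6):
    """Draw functions from a zero-mean GP prior, evaluated on the grid x."""
    K = kernel(x, x) + jitter * np.eye(x.size)  # jitter keeps the Cholesky stable
    L = np.linalg.cholesky(K)                   # K = L L^T
    z = rng.standard_normal((x.size, n_samples))
    return (L @ z).T                            # each row is one sampled function

rng = np.random.default_rng(1)
x = np.linspace(-1.0, 1.0, 100)
smooth = sample_prior(gaussian_kernel, x, 5, rng)
rough = sample_prior(exponential_kernel, x, 5, rng)

# Average jump between neighbouring grid points: much larger for the rough kernel.
print(np.mean(np.abs(np.diff(smooth, axis=1))), np.mean(np.abs(np.diff(rough, axis=1))))
```

Plotting each row of `smooth` and `rough` against `x` gives a picture of the kind the slide describes: the choice of kernel alone controls how wiggly the sampled functions are.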
![Page 11: Gaussian Processes Li An anli@temple.edu Li An anli@temple.edu.](https://reader035.fdocuments.net/reader035/viewer/2022062304/56649f515503460f94c74f43/html5/thumbnails/11.jpg)
GP for Regression
Take account of the noise on the observed target values, which are given by
$t_n = y_n + \epsilon_n$
where $y_n = y(x_n)$, and $\epsilon_n$ is a random noise variable.
Here we consider noise processes that have a Gaussian distribution, so that
$p(t_n \mid y_n) = \mathcal{N}(t_n \mid y_n, \beta^{-1})$
where $\beta$ is a hyperparameter representing the precision of the noise.
Because the noise is independent, the joint distribution of $\mathbf{t} = (t_1, \ldots, t_n)^T$ conditioned on $\mathbf{y} = (y_1, \ldots, y_n)^T$ is given by
$p(\mathbf{t} \mid \mathbf{y}) = \mathcal{N}(\mathbf{t} \mid \mathbf{y}, \beta^{-1} I_n)$
![Page 12: Gaussian Processes Li An anli@temple.edu Li An anli@temple.edu.](https://reader035.fdocuments.net/reader035/viewer/2022062304/56649f515503460f94c74f43/html5/thumbnails/12.jpg)
GP for Regression
• From the definition of GP, the marginal distribution p(y) is given by
$p(\mathbf{y}) = \mathcal{N}(\mathbf{y} \mid \mathbf{0}, K)$
• The marginal distribution of t is given by
$p(\mathbf{t}) = \int p(\mathbf{t} \mid \mathbf{y}) \, p(\mathbf{y}) \, d\mathbf{y} = \mathcal{N}(\mathbf{t} \mid \mathbf{0}, C)$
• Where the covariance matrix C has elements
$C(x_n, x_m) = k(x_n, x_m) + \beta^{-1} \delta_{nm}$
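The form of C can be seen without evaluating the integral. As a short worked step, assuming the noise vector $\boldsymbol{\epsilon}$ is zero-mean with covariance $\beta^{-1} I_n$ and independent of $\mathbf{y}$ (the symbol $\boldsymbol{\epsilon}$ for the stacked noise terms is introduced here for illustration), the cross terms vanish:

$\operatorname{cov}[\mathbf{t}] = E[(\mathbf{y} + \boldsymbol{\epsilon})(\mathbf{y} + \boldsymbol{\epsilon})^T] = E[\mathbf{y}\mathbf{y}^T] + E[\boldsymbol{\epsilon}\boldsymbol{\epsilon}^T] = K + \beta^{-1} I_n$

so the two independent sources of randomness simply add their covariances, which is the elementwise statement $C(x_n, x_m) = k(x_n, x_m) + \beta^{-1} \delta_{nm}$.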
![Page 13: Gaussian Processes Li An anli@temple.edu Li An anli@temple.edu.](https://reader035.fdocuments.net/reader035/viewer/2022062304/56649f515503460f94c74f43/html5/thumbnails/13.jpg)
GP for Regression
• The sampling of data points t
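The sampling procedure can be sketched in two steps: draw a function y from the GP prior, then add independent Gaussian noise to each value. The kernel, the noise precision `beta`, the input grid, and the `jitter` term below are assumed values for illustration, not taken from the slides.

```python
import numpy as np

def gaussian_kernel(xa, xb, length_scale=0.3):
    """k(x, x') = exp(-(x - x')^2 / (2 * length_scale^2)); an assumed kernel choice."""
    d = xa[:, None] - xb[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

rng = np.random.default_rng(2)
beta = 25.0                           # noise precision, i.e. noise std = 0.2 (assumed)
x = np.linspace(-1.0, 1.0, 50)
K = gaussian_kernel(x, x)

# Step 1: draw the noise-free function values y ~ N(0, K).
jitter = 1e-6                         # small diagonal term for a stable Cholesky
L = np.linalg.cholesky(K + jitter * np.eye(x.size))
y = L @ rng.standard_normal(x.size)

# Step 2: add independent noise, t_n = y_n + eps_n with eps_n ~ N(0, 1/beta).
t = y + rng.normal(scale=beta ** -0.5, size=x.size)

# Marginally, t ~ N(0, C) with C = K + I / beta.
C = K + np.eye(x.size) / beta
print(t[:5])
```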
![Page 14: Gaussian Processes Li An anli@temple.edu Li An anli@temple.edu.](https://reader035.fdocuments.net/reader035/viewer/2022062304/56649f515503460f94c74f43/html5/thumbnails/14.jpg)
GP for Regression
• We’ve used GP to build a model of the joint distribution over sets of data points
• Goal: given training points $\mathbf{t} = (t_1, \ldots, t_n)^T$ and input values $x_1, \ldots, x_n$, predict $t_{n+1}$ for a new input $x_{n+1}$
• To find $p(t_{n+1} \mid \mathbf{t})$, we begin by writing down the joint distribution
$p(\mathbf{t}_{n+1}) = \mathcal{N}(\mathbf{t}_{n+1} \mid \mathbf{0}, C_{n+1})$
where $\mathbf{t}_{n+1} = (t_1, \ldots, t_n, t_{n+1})^T$ and $C_{n+1}$ is an $(n+1) \times (n+1)$ matrix,
$C_{n+1} = \begin{pmatrix} C_n & \mathbf{k} \\ \mathbf{k}^T & c \end{pmatrix}$
where $C_n$ is an $n \times n$ matrix, $\mathbf{k}$ is the vector with elements $k(x_i, x_{n+1})$ for $i = 1, \ldots, n$, and $c = k(x_{n+1}, x_{n+1}) + \beta^{-1}$
![Page 15: Gaussian Processes Li An anli@temple.edu Li An anli@temple.edu.](https://reader035.fdocuments.net/reader035/viewer/2022062304/56649f515503460f94c74f43/html5/thumbnails/15.jpg)
GP for Regression
• The conditional distribution $p(t_{n+1} \mid \mathbf{t})$ is a Gaussian distribution with mean and covariance given by
$m(x_{n+1}) = \mathbf{k}^T C_n^{-1} \mathbf{t}$
$\sigma^2(x_{n+1}) = c - \mathbf{k}^T C_n^{-1} \mathbf{k}$
• These are the key results that define Gaussian process regression.
• The predictive distribution is a Gaussian whose mean and variance both depend on $x_{n+1}$
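The two predictive formulas translate almost line for line into code. The sketch below is an illustration under stated assumptions: the Gaussian kernel, the value of `beta`, the synthetic sine-wave training data, and the function name `gp_predict` are not from the slides.

```python
import numpy as np

def gaussian_kernel(xa, xb, length_scale=0.3):
    """k(x, x') = exp(-(x - x')^2 / (2 * length_scale^2)); an assumed kernel choice."""
    d = xa[:, None] - xb[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_predict(x_train, t_train, x_new, beta, kernel=gaussian_kernel):
    """Return the predictive mean and variance for each entry of x_new."""
    C_n = kernel(x_train, x_train) + np.eye(x_train.size) / beta
    k = kernel(x_train, x_new)                       # column j is the vector k for x_new[j]
    c = np.diag(kernel(x_new, x_new)) + 1.0 / beta   # c = k(x_new, x_new) + 1/beta
    mean = k.T @ np.linalg.solve(C_n, t_train)       # m = k^T C_n^{-1} t
    var = c - np.sum(k * np.linalg.solve(C_n, k), axis=0)  # sigma^2 = c - k^T C_n^{-1} k
    return mean, var

# Synthetic training data (an assumption): noisy samples of a sine wave.
rng = np.random.default_rng(3)
beta = 100.0
x_train = np.linspace(0.0, 1.0, 10)
t_train = np.sin(2 * np.pi * x_train) + rng.normal(scale=beta ** -0.5, size=x_train.size)

# Two inputs inside the training range and one far outside it.
x_new = np.array([0.25, 0.5, 10.0])
mean, var = gp_predict(x_train, t_train, x_new, beta)
print(mean, var)
```

Far from the training data the vector k is essentially zero, so the mean falls back to the prior mean of 0 and the variance returns to its prior value c. Near the data the variance is smaller, which is how the dependence of both quantities on the new input shows up in practice.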
![Page 16: Gaussian Processes Li An anli@temple.edu Li An anli@temple.edu.](https://reader035.fdocuments.net/reader035/viewer/2022062304/56649f515503460f94c74f43/html5/thumbnails/16.jpg)
An Example of GP Regression
![Page 17: Gaussian Processes Li An anli@temple.edu Li An anli@temple.edu.](https://reader035.fdocuments.net/reader035/viewer/2022062304/56649f515503460f94c74f43/html5/thumbnails/17.jpg)
GP for Regression
• The only restriction on the kernel is that the covariance matrix given by
$C(x_n, x_m) = k(x_n, x_m) + \beta^{-1} \delta_{nm}$
must be positive definite.
• GP regression involves a matrix of size $n \times n$, whose inversion requires $O(n^3)$ computations.
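One common way to test positive definiteness is to attempt a Cholesky factorization, which succeeds only for positive definite matrices and is itself an $O(n^3)$ operation. The sketch below assumes a Gaussian kernel as the valid example and uses a plain distance function as a deliberately invalid "kernel"; both choices, and the helper name `is_positive_definite`, are illustrative assumptions.

```python
import numpy as np

def is_positive_definite(C):
    """True if the symmetric matrix C admits a Cholesky factorization."""
    try:
        np.linalg.cholesky(C)
        return True
    except np.linalg.LinAlgError:
        return False

beta = 100.0
x = np.linspace(0.0, 1.0, 5)
d = np.abs(x[:, None] - x[None, :])   # pairwise distances |x_n - x_m|

# A valid kernel (Gaussian) gives a positive definite C.
C_good = np.exp(-0.5 * (d / 0.3) ** 2) + np.eye(x.size) / beta

# Using the raw distance as a "kernel" is not valid: the resulting C has
# a tiny diagonal and large off-diagonal entries, so it is not positive definite.
C_bad = d + np.eye(x.size) / beta

print(is_positive_definite(C_good), is_positive_definite(C_bad))
```

Because the factorization cost grows with the cube of the number of training points, this same step is what limits exact GP regression to moderately sized data sets.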
![Page 18: Gaussian Processes Li An anli@temple.edu Li An anli@temple.edu.](https://reader035.fdocuments.net/reader035/viewer/2022062304/56649f515503460f94c74f43/html5/thumbnails/18.jpg)
Conclusion
• Distribution over functions
• Jointly have a Gaussian distribution
• Index set can be pretty much whatever
  • Reals
  • Real vectors
  • Graphs
  • Strings
  • …
• Most interesting structure is in k(x,x’), the ‘kernel.’
• Used for regression to predict the target for a new input
![Page 19: Gaussian Processes Li An anli@temple.edu Li An anli@temple.edu.](https://reader035.fdocuments.net/reader035/viewer/2022062304/56649f515503460f94c74f43/html5/thumbnails/19.jpg)
Questions
• Thank you!