Additional Topics in Prediction Methodology
Introduction
• The predictive distribution of the random variable $Y_0$ is meant to capture all the information about $Y_0$ that is contained in the training data $Y^n$.
• It does not completely specify $Y_0$, but it does provide a probability distribution of more likely and less likely values of $Y_0$.
• $E\{Y_0 \mid Y^n\}$ is the best MSPE (mean squared prediction error) predictor of $Y_0$.
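The optimality of the conditional mean can be checked by simulation. The sketch below is only an illustration under stated assumptions: $(Y_0, Y_1)$ is taken to be a standard bivariate normal pair with correlation `rho`, so that $E\{Y_0 \mid Y_1\} = \rho Y_1$, and the function name and parameter values are hypothetical.

```python
import random

def simulate_mspe(rho=0.5, n_draws=20000, seed=0):
    """Monte Carlo MSPE of two predictors of Y0 based on Y1.

    Assumption: (Y0, Y1) is standard bivariate normal with correlation rho,
    so the conditional mean is E{Y0 | Y1} = rho * Y1.
    """
    rng = random.Random(seed)
    sse_cond = 0.0   # squared errors of the conditional-mean predictor
    sse_naive = 0.0  # squared errors of the naive predictor Y1
    for _ in range(n_draws):
        y1 = rng.gauss(0.0, 1.0)
        y0 = rho * y1 + (1.0 - rho ** 2) ** 0.5 * rng.gauss(0.0, 1.0)
        sse_cond += (y0 - rho * y1) ** 2
        sse_naive += (y0 - y1) ** 2
    return sse_cond / n_draws, sse_naive / n_draws

mspe_cond, mspe_naive = simulate_mspe()
```

For this toy model the theoretical values are $1 - \rho^2 = 0.75$ for the conditional mean and $2 - 2\rho = 1$ for the naive predictor, so the simulated `mspe_cond` should come out smaller.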
Hierarchical models have two stages
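• $X \subset \mathbb{R}^d$
• $f_0 = f(x_0)$: known $p \times 1$ vector
• $F = (f_j(x_i))$: known $n \times p$ matrix
• $\beta$: unknown $p \times 1$ vector of regression coefficients
• $R = (R(x_i - x_j))$: known $n \times n$ matrix of correlations among the training data $Y^n$
• $r_0 = (R(x_i - x_0))$: known $n \times 1$ vector of correlations of $Y_0$ with $Y^n$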
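The quantities above can be assembled for a small one-dimensional example. This is a sketch under assumptions not stated on the slides: a Gaussian correlation function $R(h) = \exp(-\theta h^2)$, regression functions $f(x) = (1, x)$, and the particular inputs and value of `theta` are all illustrative choices.

```python
import numpy as np

def gauss_corr(h, theta=10.0):
    """Assumed Gaussian correlation function R(h) = exp(-theta * h^2)."""
    return np.exp(-theta * np.asarray(h) ** 2)

def regressors(x):
    """Assumed regression functions f(x) = (1, x), so p = 2."""
    x = np.atleast_1d(x)
    return np.column_stack([np.ones_like(x), x])

x_train = np.linspace(0.0, 1.0, 5)   # n = 5 training inputs in X = [0, 1]
x0 = 0.37                            # new input at which Y0 = Y(x0) is predicted

F = regressors(x_train)                                # n x p regression matrix
f0 = regressors(x0)[0]                                 # p-vector f(x0)
R = gauss_corr(x_train[:, None] - x_train[None, :])    # n x n correlation matrix
r0 = gauss_corr(x_train - x0)                          # n-vector of correlations with Y0
```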
Predictive Distributions when $\sigma_Z^2$, $R$ and $r_0$ are known
Interesting features of (a) and (b)
• The non-informative prior is the limit of the normal prior as $\tau^2 \to \infty$.
• Although the non-informative prior is not a proper distribution, the corresponding predictive distribution is proper.
• The same conditioning argument can be applied to derive the posterior mean for both the non-informative prior and the normal prior.
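The limiting behaviour can be illustrated numerically. The sketch assumes a normal prior of the form $\beta \sim N(b_0, \tau^2 V_0)$ with known $\sigma_Z^2$ and uses the standard conjugate formula for the posterior mean of $\beta$; the names `b0`, `V0`, `tau2`, the Gaussian correlation function, and the toy data are assumptions made for the example.

```python
import numpy as np

def posterior_mean_beta(y, F, R, sigma2_z, b0, V0, tau2):
    """Posterior mean of beta under the assumed prior beta ~ N(b0, tau2 * V0)."""
    V0_inv = np.linalg.inv(V0)
    A = F.T @ np.linalg.solve(R, F) / sigma2_z + V0_inv / tau2
    c = F.T @ np.linalg.solve(R, y) / sigma2_z + V0_inv @ b0 / tau2
    return np.linalg.solve(A, c)

def gls_beta(y, F, R):
    """Generalized least squares estimate, the non-informative-prior answer."""
    return np.linalg.solve(F.T @ np.linalg.solve(R, F),
                           F.T @ np.linalg.solve(R, y))

# Toy data (illustrative only).
x = np.linspace(0.0, 1.0, 6)
F = np.column_stack([np.ones_like(x), x])
R = np.exp(-20.0 * (x[:, None] - x[None, :]) ** 2)
y = np.exp(-1.4 * x) * np.cos(3.5 * np.pi * x)
b0 = np.array([1.0, -1.0])
V0 = np.eye(2)

beta_gls = gls_beta(y, F, R)
beta_diffuse = posterior_mean_beta(y, F, R, 1.0, b0, V0, tau2=1e8)   # nearly flat prior
beta_tight = posterior_mean_beta(y, F, R, 1.0, b0, V0, tau2=1e-8)    # nearly degenerate prior
```

With a very large `tau2` the posterior mean should agree with the generalized least squares estimate, while a very small `tau2` pulls it to the prior mean `b0`.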
The mean and variance of the predictive distribution (mean)
• $\mu_{0|n}(x_0)$ and $\sigma_{0|n}(x_0)$ depend on $x_0$ only through the regression vector $f_0$ and the correlation vector $r_0$.
• $\mu_{0|n}(x_0)$ is a linear unbiased predictor of $Y(x_0)$.
• The continuity and other smoothness properties of $\mu_{0|n}(x_0)$ are inherited from the correlation function $R(\cdot)$ and the regressors $\{f_j(\cdot)\}_{j=1}^{p}$.
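• $\mu_{0|n}(x_0)$ depends on the parameters $\sigma_Z^2$ and $\tau^2$ only through their ratio.
• $\mu_{0|n}(x_0)$ interpolates the training data: when $x_0 = x_i$, $f_0 = f(x_i)$ and $r_0^\top R^{-1} = e_i^\top$, the $i$th unit vector, so the predictor returns the observed value $y_i$.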
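Both properties can be checked on a small example. The following is a sketch, not the slides' own computation: it assumes the normal prior $\beta \sim N(b_0, \tau^2 I)$, regressors $f(x) = (1, x)$, a Gaussian correlation function with parameter `theta`, and the standard form $\mu_{0|n} = f_0^\top \mu_{\beta|n} + r_0^\top R^{-1}(y^n - F\mu_{\beta|n})$ for the predictive mean.

```python
import numpy as np

def predictive_mean(x0, x, y, theta, sigma2_z, tau2, b0):
    """Predictive mean at x0 under the assumed prior beta ~ N(b0, tau2 * I)."""
    corr = lambda h: np.exp(-theta * h ** 2)      # assumed Gaussian correlation
    F = np.column_stack([np.ones_like(x), x])     # assumed regressors (1, x)
    f0 = np.array([1.0, x0])
    R = corr(x[:, None] - x[None, :])
    r0 = corr(x - x0)
    # Posterior mean of beta (conjugate normal update).
    A = F.T @ np.linalg.solve(R, F) / sigma2_z + np.eye(2) / tau2
    c = F.T @ np.linalg.solve(R, y) / sigma2_z + b0 / tau2
    mu_beta = np.linalg.solve(A, c)
    return f0 @ mu_beta + r0 @ np.linalg.solve(R, y - F @ mu_beta)

# Illustrative training data.
x = np.linspace(0.0, 1.0, 7)
y = np.exp(-1.4 * x) * np.cos(3.5 * np.pi * x)
b0 = np.array([0.5, 0.0])

# Same ratio sigma2_z / tau2 = 1/4 in both calls.
m1 = predictive_mean(0.3, x, y, 20.0, 1.0, 4.0, b0)
m2 = predictive_mean(0.3, x, y, 20.0, 10.0, 40.0, b0)

# Predictions at the training inputs themselves.
at_training = np.array([predictive_mean(xi, x, y, 20.0, 1.0, 4.0, b0) for xi in x])
```

`m1` and `m2` should coincide because only the ratio of the two variances enters, and `at_training` should reproduce `y`.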
Example: $y(x) = e^{-1.4x}\cos(7\pi x/2)$
[Figure: predictor $\mu_{0|n}(x_0)$ for the example function]
The mean and variance of the predictive distribution (Variance)
• MSPE($\mu_{0|n}(x_0)$) $= \sigma^2_{0|n}(x_0)$
• The variance of the posterior of $Y(x_0)$ given $Y^n$ should be 0 whenever $x_0 = x_i$: $\sigma^2_{0|n}(x_i) = 0$
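The zero-variance property at training inputs can be verified numerically. This sketch assumes the non-informative prior for $\beta$, for which the predictive variance takes the familiar form $\sigma_Z^2\{1 - r_0^\top R^{-1} r_0 + h^\top (F^\top R^{-1} F)^{-1} h\}$ with $h = f_0 - F^\top R^{-1} r_0$; the Gaussian correlation function, the regressors $(1, x)$, and the parameter values are illustrative assumptions.

```python
import numpy as np

def predictive_variance(x0, x, theta, sigma2_z):
    """Predictive variance at x0 under the non-informative prior on beta."""
    corr = lambda h: np.exp(-theta * h ** 2)      # assumed Gaussian correlation
    F = np.column_stack([np.ones_like(x), x])     # assumed regressors (1, x)
    f0 = np.array([1.0, x0])
    R = corr(x[:, None] - x[None, :])
    r0 = corr(x - x0)
    Rinv_r0 = np.linalg.solve(R, r0)
    h = f0 - F.T @ Rinv_r0                        # extra term from estimating beta
    Q = F.T @ np.linalg.solve(R, F)
    return sigma2_z * (1.0 - r0 @ Rinv_r0 + h @ np.linalg.solve(Q, h))

x = np.linspace(0.0, 1.0, 7)
var_at_training = [predictive_variance(xi, x, 50.0, 1.0) for xi in x]
var_between = predictive_variance(0.25, x, 50.0, 1.0)   # 0.25 is not a training input
```

The entries of `var_at_training` should vanish up to rounding error, while `var_between` stays strictly positive.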
Most important use of Theorem 4.1.1
Predictive Distributions when R and r0 are known
The posterior is a location-shifted and scaled univariate t distribution, with degrees of freedom that are enhanced when there is informative prior information for either $\beta$ or $\sigma_Z^2$.
Degrees of freedom
• Base value for the degrees of freedom: $\nu_i = n - p$
• $p$ additional degrees of freedom when the prior for $\beta$ is informative
• $\nu_0$ additional degrees of freedom when the prior for $\sigma_Z^2$ is informative
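The counting rule above is small enough to write down directly. The function name and its arguments are hypothetical; `nu0` stands for the degrees of freedom contributed by an informative prior on $\sigma_Z^2$.

```python
def degrees_of_freedom(n, p, nu0=0, beta_informative=False, sigma2_informative=False):
    """Degrees of freedom of the predictive t distribution, following the rule above."""
    nu = n - p                     # base value
    if beta_informative:
        nu += p                    # informative prior for beta
    if sigma2_informative:
        nu += nu0                  # informative prior for sigma_Z^2
    return nu

nu_flat = degrees_of_freedom(10, 2)
nu_beta = degrees_of_freedom(10, 2, beta_informative=True)
nu_both = degrees_of_freedom(10, 2, nu0=3, beta_informative=True, sigma2_informative=True)
```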
Location shift
• The same centering value as in Theorem 4.1.1 (known $\sigma_Z^2$)
• The non-informative prior gives the BLUP
Scale factor $\sigma_i^2(x_0)$ (compare 4.1.15 with 4.1.6)
• Estimate of the scale factor $\sigma^2_{0|n}(x_0)$.
• $Q_i^2/\nu_i$: estimate of $\sigma_Z^2$
• $Q_i^2$: combines information about $\sigma_Z^2$ from the conditional distribution of $Y^n$ given $\sigma_Z^2$ with information from the prior of $\sigma_Z^2$
• $\sigma_i^2(x_i) = 0$ when $x_i$ is any of the training data points.
Predictive Distributions when Correlation Parameters are Unknown
• What if the correlations among the observations are unknown ($R$ and $r_0$ are unknown)?
– Assume $y(\cdot)$ has a Gaussian prior with correlation function $R(\cdot \mid \psi)$, where $\psi$ is an unknown vector of parameters
• Two issues
– The standard error of the plug-in predictor $\mu_{0|n}(x_0 \mid \hat\psi)$, obtained by substituting an estimate $\hat\psi$ that comes from MLE or REML
– The Bayesian approach to uncertainty in $\psi$, which is to model it by a prior distribution
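A plug-in predictor can be sketched as follows. Everything specific here is an assumption made for illustration: a scalar correlation parameter `psi` in a Gaussian correlation function $\exp(-\psi h^2)$, a constant mean, and a simple grid search over the profile log-likelihood in place of a proper optimizer. REML is not shown.

```python
import numpy as np

def gls_fit(psi, x, y):
    """Correlation matrix, GLS mean estimate and residual for a given psi."""
    n = len(x)
    R = np.exp(-psi * (x[:, None] - x[None, :]) ** 2)   # assumed Gaussian correlation
    F = np.ones((n, 1))                                 # assumed constant mean
    beta = np.linalg.solve(F.T @ np.linalg.solve(R, F),
                           F.T @ np.linalg.solve(R, y))
    return R, beta, y - F @ beta

def profile_loglik(psi, x, y):
    """Log-likelihood with beta and sigma_Z^2 profiled out (up to a constant)."""
    n = len(x)
    R, _, resid = gls_fit(psi, x, y)
    sigma2_hat = resid @ np.linalg.solve(R, resid) / n
    _, logdet = np.linalg.slogdet(R)
    return -0.5 * (n * np.log(sigma2_hat) + logdet)

def plug_in_predict(x0, x, y, psi):
    """Predictor with the estimated psi substituted for the unknown value."""
    R, beta, resid = gls_fit(psi, x, y)
    r0 = np.exp(-psi * (x - x0) ** 2)
    return beta[0] + r0 @ np.linalg.solve(R, resid)

# Illustrative training data.
x = np.linspace(0.0, 1.0, 7)
y = np.exp(-1.4 * x) * np.cos(3.5 * np.pi * x)

grid = np.linspace(10.0, 200.0, 39)                     # candidate psi values
psi_hat = grid[np.argmax([profile_loglik(p, x, y) for p in grid])]
y_hat = plug_in_predict(0.45, x, y, psi_hat)
```

The grid is kept away from very small `psi`, where the Gaussian correlation matrix becomes numerically ill-conditioned. Treating `psi_hat` as known, as this predictor does, ignores the uncertainty from estimating it, which is the first of the two issues listed above.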
Prediction of Multiple Response Models
• Several outputs are available from a computer experiment
• Several codes are available for computing the same response (fast and slow code)
• Competing responses
• Several stochastic models for the joint response
• Using these models to describe the optimal predictor for one of the several computed responses
Modeling Multiple Outputs
• $Z_i(\cdot)$: marginally mean-zero stationary Gaussian stochastic processes with unknown variance $\sigma_i^2$ and correlation function $R_i(\cdot)$
• Stationarity of $Z_i(\cdot)$ implies that the correlation between $Z_i(x_1)$ and $Z_i(x_2)$ depends only on $x_1 - x_2$
• Assume $\mathrm{Cov}(Z_i(x_1), Z_j(x_2)) = \sigma_i \sigma_j R_{ij}(x_1 - x_2)$
• $R_{ij}(\cdot)$: cross-correlation function of $Z_i(\cdot)$ and $Z_j(\cdot)$
• Linear model $f_i^\top(\cdot)\beta_i$: global mean of the $Y_i$ process. $f_i(\cdot)$: known regression functions; $\beta_i$: unknown regression parameters
Selecting correlation and cross-correlation functions is complicated
• Reason: for any input sites $x_l^i$, the multivariate normally distributed random vector $(Z_1(x_1^1), \ldots)^\top$ must have a nonnegative definite covariance matrix
• Solution: construct the $Z_i(\cdot)$ from a set of elementary processes (usually these processes are mutually independent)
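The difference between a constructed and an ad hoc choice can be seen numerically. In this sketch, two outputs are built from independent elementary processes as $Z_1 = W_1$ and $Z_2 = aW_1 + W_2$; the Gaussian correlations, the constant `a`, and the ad hoc alternative are illustrative assumptions.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 6)
H = x[:, None] - x[None, :]
R_w1 = np.exp(-10.0 * H ** 2)     # assumed correlation of elementary process W1
R_w2 = np.exp(-40.0 * H ** 2)     # assumed correlation of elementary process W2
a = 0.7

# Constructed model: Z1 = W1, Z2 = a * W1 + W2 with W1, W2 independent.
K_constructed = np.block([[R_w1, a * R_w1],
                          [a * R_w1, a ** 2 * R_w1 + R_w2]])

# Ad hoc model: marginal correlation R_w1 for both outputs, with an
# arbitrarily chosen cross-correlation 0.99 * R_w2.
K_ad_hoc = np.block([[R_w1, 0.99 * R_w2],
                     [0.99 * R_w2, R_w1]])

min_eig_constructed = np.linalg.eigvalsh(K_constructed).min()
min_eig_ad_hoc = np.linalg.eigvalsh(K_ad_hoc).min()
```

The constructed matrix is nonnegative definite by design, whereas the ad hoc choice has a clearly negative eigenvalue and so is not a valid covariance matrix.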
Example by Kennedy and O’Hagan
• $Y_i(x)$: prior for the $i$th code level ($i = m$ is the top-level code). The autoregressive model:
– $Y_i(x) = \rho_{i-1} Y_{i-1}(x) + \delta_i(x)$, $i = 2, \ldots, m$
• The output of each successively higher-level code $i$ at $x$ is related to the output of the less precise code $i-1$ at $x$ plus the refinement $\delta_i(x)$
– $\mathrm{Cov}(Y_i(x), Y_{i-1}(w) \mid Y_{i-1}(x)) = 0$ for all $w \neq x$
• No additional second-order knowledge of code $i$ at $x$ can be obtained from the lower-level code $i-1$ if the value of code $i-1$ at $x$ is known (Markov property on the hierarchy of codes)
• Since there is no natural hierarchy of computer codes in such applications, we need to find something better.
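The Markov property can be checked for a two-level version of the autoregressive model. The sketch assumes $Y_2(x) = \rho Y_1(x) + \delta(x)$ with $\delta$ independent of $Y_1$, a stationary $Y_1$ with variance `sigma2_1` and Gaussian correlation, and uses the usual Gaussian conditioning formula; all names and values are illustrative.

```python
import math

def corr(h, theta=10.0):
    """Assumed Gaussian correlation function of the lower-level code Y1."""
    return math.exp(-theta * h * h)

def conditional_cov(x, w, rho=0.8, sigma2_1=2.0):
    """Cov(Y2(x), Y1(w) | Y1(x)) when Y2 = rho * Y1 + delta, delta independent of Y1."""
    cov_y2x_y1w = rho * sigma2_1 * corr(x - w)   # Cov(Y2(x), Y1(w))
    cov_y2x_y1x = rho * sigma2_1                 # Cov(Y2(x), Y1(x))
    cov_y1x_y1w = sigma2_1 * corr(x - w)         # Cov(Y1(x), Y1(w))
    var_y1x = sigma2_1                           # Var(Y1(x))
    # Gaussian conditioning: subtract the part explained by Y1(x).
    return cov_y2x_y1w - cov_y2x_y1x * cov_y1x_y1w / var_y1x

cond_cov = conditional_cov(0.3, 0.8)
```

The result is zero for any pair $w \neq x$, because the refinement carries no information about the lower-level code.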
A More Reasonable Model
• Each constraint function is associated with the objective function plus a refinement
– $Y_i(x) = \rho_i Y_1(x) + \delta_i(x)$, $i = 2, \ldots, m+1$
• Ver Hoef and Barry
– Form models in the environmental sciences
– Include an unknown smooth surface plus a random measurement error
– Moving averages over white noise processes
Morris and Mitchell model
• Prior information about $y(x)$ is specified by a Gaussian process $Y(\cdot)$
• Prior information about the partial derivatives $y^{(j)}(x)$ is obtained by considering the “derivative” processes of $Y(\cdot)$
– $Y_1(\cdot) = Y(\cdot)$, $Y_2(\cdot) = Y^{(1)}(\cdot)$, $\ldots$, $Y_{1+m}(\cdot) = Y^{(m)}(\cdot)$
• Natural prior for $y^{(j)}(x)$:
• The covariances between $Y(x_1)$, $Y^{(j)}(x_2)$ and $Y^{(i)}(x_1)$, $Y^{(j)}(x_2)$ are:
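For a process with covariance $\sigma^2 R(x_1 - x_2)$, covariances involving derivative processes are obtained by differentiating the covariance function with respect to the corresponding arguments. The sketch below works this out for a one-dimensional input and an assumed Gaussian correlation $R(h) = \exp(-\theta h^2)$, and checks the closed forms against finite differences; function names and parameter values are illustrative.

```python
import math

def R(h, theta):
    """Assumed Gaussian correlation R(h) = exp(-theta * h^2)."""
    return math.exp(-theta * h * h)

def cov_y_dy(x1, x2, theta=5.0, sigma2=1.0):
    """Cov(Y(x1), Y'(x2)) = sigma2 * d/dx2 R(x1 - x2)."""
    h = x1 - x2
    return sigma2 * 2.0 * theta * h * R(h, theta)

def cov_dy_dy(x1, x2, theta=5.0, sigma2=1.0):
    """Cov(Y'(x1), Y'(x2)) = sigma2 * d^2/(dx1 dx2) R(x1 - x2)."""
    h = x1 - x2
    return sigma2 * (2.0 * theta - 4.0 * theta ** 2 * h * h) * R(h, theta)

def finite_diff_check(x1, x2, theta=5.0, eps=1e-4):
    """Central finite differences of R, used only to sanity-check the formulas."""
    d2 = (R(x1 - (x2 + eps), theta) - R(x1 - (x2 - eps), theta)) / (2.0 * eps)
    d12 = (R((x1 + eps) - (x2 + eps), theta)
           - R((x1 + eps) - (x2 - eps), theta)
           - R((x1 - eps) - (x2 + eps), theta)
           + R((x1 - eps) - (x2 - eps), theta)) / (4.0 * eps ** 2)
    return d2, d12

fd_first, fd_mixed = finite_diff_check(0.3, 0.7)
```

Note that the process and its own derivative are uncorrelated at the same input, since `cov_y_dy(x, x)` is zero.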
Optimal Predictors for Multiple Outputs
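• The best MSPE predictor based on the training data is: $\hat{Y}_0 = E\{Y_0 \mid Y_1^{n_1} = y_1^{n_1}, \ldots, Y_m^{n_m} = y_m^{n_m}\}$
• where $Y_0 = Y_1(x_0)$, $Y_i^{n_i} = (Y_i(x_1^i), \ldots)^\top$, and $y_i^{n_i}$ is its observed value, for $i = 1, \ldots, m$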
The joint distribution is the multivariate normal distribution
Conditional expectation
…..
• In practice, this is of little use (it requires knowledge of the marginal correlation functions, the joint correlation functions and the ratios of all the process variances)
• Empirical versions are of practical use:
– In every case, assume that each of the correlation matrices $R_i$ and cross-correlation matrices $R_{ij}$ is known up to a vector of parameters
– Estimate the parameters using MLE or REML
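When all covariances are treated as known, the conditional expectation is the usual multivariate normal formula $\mu_0 + \Sigma_{0y}\Sigma_{yy}^{-1}(y - \mu_y)$. The sketch below applies it to two outputs; the construction $Y_2 = aY_1 + W_2$, the zero means, the Gaussian correlations, the input sites, and the observed values are all made up for illustration, and an empirical version would replace the fixed parameters with estimates.

```python
import numpy as np

def conditional_mean(mu0, mu_y, cov_0y, cov_yy, y):
    """E{Y0 | Y = y} for jointly normal (Y0, Y)."""
    return mu0 + cov_0y @ np.linalg.solve(cov_yy, y - mu_y)

def corr(s, t, theta):
    """Assumed Gaussian correlation between every pair of inputs in s and t."""
    return np.exp(-theta * (s[:, None] - t[None, :]) ** 2)

def predict_y1(x0, x1, y1, x2, y2, a=0.7, th1=10.0, th2=40.0):
    """Predict Y1(x0) from observations of Y1 and of Y2 = a * Y1 + W2."""
    s0 = np.array([x0])
    cov_yy = np.block([
        [corr(x1, x1, th1), a * corr(x1, x2, th1)],
        [a * corr(x2, x1, th1), a ** 2 * corr(x2, x2, th1) + corr(x2, x2, th2)],
    ])
    cov_0y = np.concatenate([corr(s0, x1, th1)[0], a * corr(s0, x2, th1)[0]])
    y = np.concatenate([y1, y2])
    return conditional_mean(0.0, np.zeros(len(y)), cov_0y, cov_yy, y)

# Made-up observations: 3 runs of the first output, 6 of the second.
x1 = np.array([0.0, 0.5, 1.0])
y1 = np.array([1.0, 0.2, -0.5])
x2 = np.linspace(0.0, 1.0, 6)
y2 = np.array([0.7, 0.5, 0.2, 0.0, -0.2, -0.3])

pred_new = predict_y1(0.3, x1, y1, x2, y2)
pred_at_site = predict_y1(0.5, x1, y1, x2, y2)   # 0.5 is a training site of Y1
```

At a training site of the first output the predictor returns the observed value, so `pred_at_site` should equal `y1[1]`.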
Example 1
• The 14-point training design is space-filling, which allows us to learn about the response over the entire input space
• Compare two models
– The predictor of $y(\cdot)$ based on $y(\cdot)$ alone
– The predictor of $y(\cdot)$ based on $(y(\cdot), y^{(1)}(\cdot), y^{(2)}(\cdot))$
• The second one both fits better visually and has a 24% smaller ERMSPE
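The comparison criterion can be sketched as follows, taking ERMSPE to be the empirical root mean squared prediction error over a set of test inputs. The function names are hypothetical and the numbers in the usage lines are made up; they are not the values from the example.

```python
import math

def ermspe(predicted, actual):
    """Empirical root mean squared prediction error over a set of test inputs."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

def relative_reduction(ermspe_base, ermspe_new):
    """Fractional reduction in ERMSPE of the new predictor relative to the base."""
    return (ermspe_base - ermspe_new) / ermspe_base

# Made-up values for illustration only.
err = ermspe([0.0, 0.0], [3.0, 4.0])
gain = relative_reduction(1.0, 0.76)
```

A `relative_reduction` of 0.24 corresponds to a 24% smaller ERMSPE.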
Thank you!