Multiscale Gaussian Process regression for massive remote sensing data

Jouni Susiluoto 1,2,3, Alessio Spantini 1, Heikki Haario 2,3, Youssef Marzouk 1
[email protected]
June 2, 2019

1 Massachusetts Institute of Technology  2 Lappeenranta University of Technology  3 Finnish Meteorological Institute
Introduction
• Satellite data are spatially irregular (clouds, orbit gaps, etc.)
• The instrument does not measure everywhere all the time

We apply the methods here to OCO-2 v9 XCO2 data, but they would apply to another quantity of interest just as well. We ask:

1. How much XCO2 is there where there are no measurements?
2. What is the uncertainty?
3. What do realizations of the random field look like?
4. How can we deal with the enormous amounts of data?
5. How do we deal with the different spatial scales in the data?
Outline

• Introduction
• Gaussian process definitions
• Data vs mean function
• Finding covariance function parameters θ
• Validating the multi-scale covariance kernel approach
• Results II
• OCO-2 parameters
• Conclusion
Consider a random function (field) Ψ ~ GP(m(x; β), k(x, x′; θ)).

• m(x; β, δ) = f(x_t, δ(x_s))^T β(x_s) is the mean function at x
• The components f_i of f are some fixed functions
• β and δ are coefficients; x_s and x_t are the spatial and temporal parts of x
• Notice the separation of the space and time dimensions
• k is the covariance function with parameters θ
• A multi-scale covariance function can have several components,

  k(x, x′; θ) = δ_{x,x′} σ_x² + Σ_i k_i(x, x′; θ_i),   (1)

  where the k_i can be exponential, Matérn, periodic, or wind-informed.
• Ψ is called a Gaussian process (field) if the joint distribution of its values at any finite set of points x_obs is multivariate Gaussian.
The covariance function determines the smoothness of the realizations. Here τ² is the maximum covariance, ℓ the length-scale parameter, γ the exponent, and ν the smoothness parameter.

[Figure: draws from γ-exponential fields with (γ = 2, τ = 1, ℓ = 2), (γ = 2, τ = 1.5, ℓ = 0.25), (γ = 1, τ = 1, ℓ = 1), and draws from Matérn fields with (ν = 0.5, ℓ = 1), (ν = 1.5, ℓ = 1), (ν = 1.5, ℓ = 0.5).]
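To illustrate how these parameters shape the realizations, the following minimal sketch samples one-dimensional draws from a γ-exponential and a Matérn kernel via a Cholesky factor. The kernel forms are the standard textbook ones, not code from the slides; the jitter term is a numerical assumption for stability.

```python
import numpy as np
from scipy.special import gamma, kv

def gamma_exp_kernel(r, tau, ell, g):
    """gamma-exponential covariance: tau^2 * exp(-(r/ell)^g)."""
    return tau**2 * np.exp(-(r / ell) ** g)

def matern_kernel(r, ell, nu, tau=1.0):
    """Matern covariance for a matrix of distances r; k(0) = tau^2."""
    r = np.where(r == 0.0, 1e-12, r)          # avoid 0/0 at zero distance
    s = np.sqrt(2.0 * nu) * r / ell
    return tau**2 * (2.0 ** (1.0 - nu) / gamma(nu)) * s**nu * kv(nu, s)

rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 200)
r = np.abs(x[:, None] - x[None, :])           # pairwise distances

for K in (gamma_exp_kernel(r, tau=1.0, ell=2.0, g=2.0),
          matern_kernel(r, ell=1.0, nu=1.5)):
    L = np.linalg.cholesky(K + 1e-6 * np.eye(len(x)))   # jitter for stability
    draw = L @ rng.standard_normal(len(x))              # one realization
```

Smaller ℓ or ν produces visibly rougher draws, matching the panels above.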
The conditional of the mean function parameters β has a closed form. Given covariance function parameters θ and observations Ψ_obs = ψ_obs, the coefficients β satisfy

p(β | θ, ψ_obs) ∝ exp( −½ (ψ_obs − Fβ)^T K⁻¹ (ψ_obs − Fβ) ),   (2)

with K_{i,j} = k(x_i^obs, x_j^obs) and F_{i,j} = f_i(x_j). This corresponds to

β | ψ_obs, θ ~ N( (F^T K⁻¹ F)⁻¹ F^T K⁻¹ ψ_obs, (F^T K⁻¹ F)⁻¹ ).   (3)
The posterior mean of β is used as a point estimate. The spatial dependence of δ is modeled via a Markov random field (MRF), solved with an approximate elimination algorithm on a lattice graph. The fitted mean functions vary by location and fit the data reasonably well, so that the residuals are roughly zero-mean.
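A minimal numpy sketch of Eq. (3), assuming K and F as defined above (the function name is illustrative, not from the slides):

```python
import numpy as np

def beta_posterior(K, F, psi_obs):
    """Closed-form conditional of the mean-function coefficients beta:
    Gaussian with mean (F^T K^-1 F)^-1 F^T K^-1 psi_obs and
    covariance (F^T K^-1 F)^-1 (Eq. 3)."""
    Kinv_F = np.linalg.solve(K, F)
    Kinv_psi = np.linalg.solve(K, psi_obs)
    cov = np.linalg.inv(F.T @ Kinv_F)
    mean = cov @ (F.T @ Kinv_psi)
    return mean, cov
```

With K = I this reduces to ordinary least squares, which gives a quick sanity check: noise-free data generated from known coefficients is recovered exactly.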
[Figure: lattice graph for the MRF over δ, with nodes ν_{0,0}, …, ν_N indexed row-wise; the (n_lon + 1)th, (n_lon + 2)th, …, (N − 1)th, and Nth nodes are marked.]
[Figure: fitted mean functions at Azores (37.7° N, 25.7° W), Lagos (6.5° N, 3.3° E), Mauna Loa (19.5° N, 155.6° W), Perth (32.0° S, 115.9° E), St. Petersburg (60.0° N, 30.3° E), Ulan Bator (46.9° N, 106.9° E), Washington DC (38.9° N, 77.3° W).]
The seasonal part of the mean function is a sinusoid with a one-year period and a location-dependent phase shift,

m(x; β, δ) = β₀ sin( 2π x_t (1 year)⁻¹ + δ_{x_s} ) + …
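A sketch of this seasonal component, with δ taken as a scalar for illustration (in the slides δ varies over space via the MRF):

```python
import numpy as np

def seasonal_mean(t_years, beta0, delta):
    """Sinusoidal mean with one-year period and phase shift delta.
    t_years is time in years; delta is a hypothetical scalar phase here."""
    return beta0 * np.sin(2.0 * np.pi * t_years + delta)
```

For example, with β₀ = 1 and δ = 0, the mean peaks a quarter of the way into the year.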
The log marginal likelihood is given by

log p(ψ_obs | β, θ) = −½ (ψ_obs − Fβ)^T K⁻¹ (ψ_obs − Fβ) − ½ log|K| − (n/2) log(2π).   (5)

The marginal MLE of θ is found by minimizing the negative log marginal likelihood (NLL):

θ_MLE = arg min_θ { (ψ_obs − Fβ)^T K⁻¹ (ψ_obs − Fβ) + log|K| + n log(2π) }.   (6)

With massive data the full NLL is intractable, so we minimize instead

θ̂ = arg min_θ Σ_{x ∈ A} L_{B(x)}(θ),   (7)

where L_{B(x)} is the NLL of the observations in an x-centered ball, without the last term in Eq. (6), and A is a random sample of space-time points.
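The objective in Eq. (6) (which is −2× the log marginal likelihood) can be sketched as follows; `theta_to_K` is a hypothetical map from kernel parameters to the Gram matrix, and the Cholesky-based solve and log-determinant are standard numerical choices, not code from the slides:

```python
import numpy as np

def nll_objective(theta_to_K, theta, F, beta, psi_obs):
    """Objective of Eq. (6): quadratic form + log|K| + n*log(2*pi).
    theta_to_K(theta) must return the n-by-n kernel matrix K."""
    K = theta_to_K(theta)
    n = len(psi_obs)
    r = psi_obs - F @ beta
    L = np.linalg.cholesky(K)                 # stable solve and log-det
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, r))
    logdet = 2.0 * np.sum(np.log(np.diag(L)))
    return r @ alpha + logdet + n * np.log(2.0 * np.pi)
```

For the subsampled objective of Eq. (7), one would evaluate this (without the constant term) on the observations in each x-centered ball and sum over the sampled points in A.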
Multi-scale kernel parameters can be recovered with MCMC in synthetic studies
Optimization gets stuck in local minima, while the mean of the MCMC estimate works.

[Figure: MCMC posterior samples of the multi-scale kernel parameters in a synthetic study; the chains recover the true parameter values.]
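A generic random-walk Metropolis sketch for sampling kernel parameters from a target ∝ exp(−NLL(θ)); this is a textbook sampler, not the slides' exact MCMC scheme:

```python
import numpy as np

def metropolis(nll, theta0, step, n_iter, rng):
    """Random-walk Metropolis targeting density proportional to exp(-nll)."""
    theta = np.asarray(theta0, dtype=float)
    current = nll(theta)
    chain = []
    for _ in range(n_iter):
        prop = theta + step * rng.standard_normal(theta.shape)
        prop_nll = nll(prop)
        # accept with probability min(1, exp(current - prop_nll))
        if np.log(rng.uniform()) < current - prop_nll:
            theta, current = prop, prop_nll
        chain.append(theta.copy())
    return np.array(chain)
```

The posterior mean estimate is then the average of the chain after burn-in, e.g. `chain[1000:].mean(axis=0)`.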
We have

Ψ_obs ~ N( Fβ, K(x_obs, x_obs) ),   (8)

so for any test input x, the field (e.g. XCO2) can be modeled as a random variable Ψ with

[ Ψ, Ψ_obs ]^T ~ N( [ f(x)^T β, Fβ ]^T, [ [K(x, x), K(x, x_obs)], [K(x_obs, x), K(x_obs, x_obs)] ] ),   (9)

where the joint has been divided into two parts. Conditioning on Ψ_obs = ψ_obs gives

Ψ | ψ_obs ~ N( f(x)^T β + K(x, x_obs) K(x_obs, x_obs)⁻¹ (ψ_obs − Fβ), Σ ),   (10)

where Σ is the Schur complement

Σ = K(x, x) − K(x, x_obs) K(x_obs, x_obs)⁻¹ K(x_obs, x).   (11)
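Eqs. (10)–(11) translate directly into a short numpy sketch (the function name and argument layout are illustrative assumptions):

```python
import numpy as np

def gp_posterior(k, x_test, x_obs, psi_obs, F, f_test, beta):
    """Posterior mean (Eq. 10) and Schur-complement covariance (Eq. 11).
    k(a, b) returns the cross-covariance matrix between point sets a and b."""
    K_oo = k(x_obs, x_obs)
    K_to = k(x_test, x_obs)
    mean = f_test @ beta + K_to @ np.linalg.solve(K_oo, psi_obs - F @ beta)
    cov = k(x_test, x_test) - K_to @ np.linalg.solve(K_oo, K_to.T)
    return mean, cov
```

With a noise-free kernel, the posterior interpolates: at a test point that coincides with an observation, the mean equals the observed value and the variance collapses to zero.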
OCO-2 covariance parameters with two kernels

Table 1: Covariance function parameter values learned from OCO-2 data. The first column shows the Matérn kernel parameters, and the second column the exponential kernel parameters. The length scale along the parallels, ℓ_lon, is much larger than that along the meridians, ℓ_lat.

        Matérn    exponential
  τ     0.899     2.72
Example: mean and uncertainty of the Gaussian process posterior

We computed global fields (80° S … 80° N) daily at 0.5° resolution for 1526 days (350 million marginals conditioned on 116 million observations).
[Figure: (a) XCO2 (ppm), multiscale kernel; (b) uncertainty (std), multiscale kernel; (c) XCO2 (ppm), larger-scale kernel only; (d) uncertainty (std), larger-scale kernel only; (e) difference in XCO2 (ppm); (f) difference in uncertainty (std).]
Wind data can be used to inform covariance directions locally. A single kernel component was used, and its parameters were learned from OCO-2 data.

[Figure: wind-informed covariance directions, 2014-09-15.]
Conclusion

• We solve a general spatial statistics problem using Gaussian processes
• The approach produces means, uncertainties, and draws from the process
• We described a way to learn a space-dependent mean function
• The multi-scale kernel resolves many spatial scales
• We are able to learn covariance function parameters from data
• Various computational tricks were employed to make the computation feasible
• We computed GP posteriors using all OCO-2 v9 data flagged as good
For basic GP references, see e.g.

• Rasmussen and Williams, Gaussian Processes for Machine Learning, MIT Press, 2006. http://www.gaussianprocess.org/gpml/chapters/
• Santner et al., The Design and Analysis of Computer Experiments, Springer-Verlag, 2003.
• Gelman et al., Bayesian Data Analysis, 3rd ed., CRC Press, 2013.