
Transcript of Big Data Meets Big Models: Solution of Large-Scale Bayesian Inverse Problems

  • Big Data Meets Big Models: Solution of Large-Scale Bayesian Inverse Problems

    Omar Ghattas

    joint work with:

    Tan Bui-Thanh, Carsten Burstedde, James Martin, Noemi Petra, Georg Stadler, Hari Sundar, Lucas Wilcox

    Institute for Computational Engineering & Sciences
    Departments of Geological Sciences and Mechanical Engineering

    The University of Texas at Austin

    NSF Cyberbridges 2013, Arlington, VA, July 15, 2013


  • Outline

    1 Background, motivation, and goals

    2 Langevin MCMC methods and stochastic Newton

    3 Low rank Hessian approximation and scalability

    4 Example: Full waveform global seismic inversion



  • The inverse problem: The quest for knowledge from data and models

    Input parameters, computational model, and output observables

    The forward problem:

    Given input parameters, solve the model to yield output observables

    Well-posed: solution exists, is unique, and is stable to perturbations in inputs

    Causal: later-time solutions depend only on earlier-time solutions

    Local: the forward operator includes derivatives that couple nearby solutions in space and time


  • The inverse problem: The quest for knowledge from data and models

    Input parameters, computational model, and output observables

    The inverse problem:

    Given output observations and the model, infer input parameters

    Ill-posed: observations are usually sparse; many different parameter values may be consistent with the data

    Non-causal: the inverse operator couples earlier-time solutions with later-time ones

    Global: the inverse operator couples solution values across all of space and time


  • The inverse problem: The quest for knowledge from data and models

    Input parameters, computational model, and output observables

    Uncertainty is a fundamental feature of ill-posed inverse problems:

    Deterministic approach to ill-posedness: employ regularization to penalize unwanted solution features, guarantee a unique solution

    Bayesian approach to ill-posedness: describe the probability of all models that are consistent with the data, the model, and any prior knowledge of the parameters


  • Global seismology inverse problem: Observational data


    Figure: Left: USArray network of 400 broadband seismic stations with 70 km spacing over 1000 km aperture. Past/present stations in green, future stations in blue. Right: Shear waves from a deep South American earthquake plotted on top of a map of arrival time. Early arrivals in blue, late arrivals in red. (Courtesy D. Helmberger, Caltech)


  • Global seismology inverse problem: Parameter field


    Maps of shear velocity heterogeneity at different depths and resolutions (source: J. Ritsema et al., Geophysical Journal International, 2011)


  • Bayesian inference framework for inverse problem

    Given:

    π_pr(m) := prior p.d.f. of model parameters m

    π_obs(d) := prior p.d.f. of the observables d

    π_model(d|m) := conditional p.d.f. relating d and m

    Then the posterior p.d.f. of the model parameters is given by:

    \pi_{\mathrm{post}}(m) \stackrel{\mathrm{def}}{=} \pi_{\mathrm{post}}(m \mid d_{\mathrm{obs}}) \;\propto\; \pi_{\mathrm{pr}}(m) \int_D \frac{\pi_{\mathrm{obs}}(d)\, \pi_{\mathrm{model}}(d \mid m)}{\mu(d)}\, \mathrm{d}d \;\propto\; \pi_{\mathrm{pr}}(m)\, \pi(d_{\mathrm{obs}} \mid m)

    From A. Tarantola, Inverse Problem Theory, SIAM, 2005


  • Markov chain Monte Carlo method

    Explore the Bayesian posterior probability density π_post(m)

    m are model parameters; f(m) is the parameter-to-observable map; d_obs are data; Σ_pr and Σ_noise are prior and noise covariances

    \pi_{\mathrm{post}}(m) \;\propto\; \exp\Big( -\tfrac{1}{2}\, \| f(m) - d_{\mathrm{obs}} \|^2_{\Sigma_{\mathrm{noise}}^{-1}} \;-\; \tfrac{1}{2}\, \| m - m_{\mathrm{pr}} \|^2_{\Sigma_{\mathrm{pr}}^{-1}} \Big)

    Example Probability Density

    Given a probability density π(m):

    How do we explore the distribution?

    Often high dimensional

    Computationally expensive

    The MCMC Approach

    Replace π(m) by a sample chain {m_k}; compute using ergodic averages:

    \mathbb{E}[f(M)] \;=\; \int_{\mathbb{R}^n} f(m)\, \pi(\mathrm{d}m) \;\approx\; \frac{1}{N} \sum_{k=1}^{N} f(m_k)
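    As a toy illustration of these two slides (not from the talk), the sketch below evaluates the negative log posterior for an assumed linear parameter-to-observable map f(m) = F m with Gaussian prior and noise, and estimates a posterior expectation by an ergodic average of samples; for this linear-Gaussian toy the exact posterior is available for comparison.

```python
import numpy as np

rng = np.random.default_rng(0)
n, q = 4, 3                                   # toy parameter and data dimensions

# Assumed toy ingredients: a linear map f(m) = F m, Gaussian prior and noise.
F = rng.standard_normal((q, n))
m_pr = np.zeros(n)
Sigma_pr = np.eye(n)
Sigma_noise = 0.1 * np.eye(q)
d_obs = F @ rng.standard_normal(n) + 0.1 * rng.standard_normal(q)

def neg_log_post(m):
    """-log pi_post(m) up to a constant: data misfit term plus prior term."""
    r = F @ m - d_obs
    dm = m - m_pr
    return 0.5 * r @ np.linalg.solve(Sigma_noise, r) \
         + 0.5 * dm @ np.linalg.solve(Sigma_pr, dm)

# For this linear-Gaussian toy the posterior is Gaussian, so exact samples are
# available and the ergodic average (1/N) sum_k g(m_k) can be checked directly.
H = F.T @ np.linalg.solve(Sigma_noise, F) + np.linalg.inv(Sigma_pr)   # posterior precision
Sigma_post = np.linalg.inv(H)
m_post = Sigma_post @ (F.T @ np.linalg.solve(Sigma_noise, d_obs))      # posterior mean
samples = rng.multivariate_normal(m_post, Sigma_post, size=20000)
print("ergodic-average estimate of E[m]:", samples.mean(axis=0))
print("exact posterior mean:            ", m_post)
print("neg log post at mean vs. perturbed:", neg_log_post(m_post), neg_log_post(m_post + 0.5))
```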


  • High dimensional space: the final frontier

    The curse of dimensionality: Consider a hypersphere inscribed in a hypercube; what is the probability that a random sample will lie in the hypersphere as the dimension increases?

    dimension     hypersphere/hypercube
            1     1.00
            2     0.785
            3     0.536
            4     0.308
            5     0.164
           10     0.00249
          100     1.87 × 10^-70
          158     5.76 × 10^-126
        1,000     2.87 × 10^-1187
       10,000     6.65 × 10^-16,851
    1,000,000     8.53 × 10^-2,684,797
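    The entries in this table follow from the volume of the unit d-ball, V_d = π^{d/2} / Γ(d/2 + 1), divided by the volume 2^d of the circumscribing hypercube. A minimal sketch (not from the talk) that reproduces the trend, working in log10 to avoid underflow:

```python
import math

def log10_ball_to_cube_ratio(d):
    """log10 of (volume of the unit d-ball) / (volume of its circumscribing cube, 2^d)."""
    return ((d / 2) * math.log(math.pi) - d * math.log(2.0)
            - math.lgamma(d / 2 + 1)) / math.log(10.0)

for d in [1, 2, 3, 5, 10, 100, 1000]:
    print(f"d = {d:>5d}:  P(sample lands in hypersphere) ~ 10^{log10_ball_to_cube_ratio(d):.1f}")
```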


  • Metropolis-Hastings algorithm to sample π(m)

    1 m_k ← m_0

    2 k ← 0

    3 Choose a point y from the proposal density q(m_k, ·)

    4 α ← min( 1, [π(y) q(y, m_k)] / [π(m_k) q(m_k, y)] )

    5 If α > rand([0, 1]) Then

        Accept: m_{k+1} = y

      Otherwise

        Reject: m_{k+1} = m_k

      End If

    6 k ← k + 1

    7 Repeat from step 3
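    A minimal generic implementation of the algorithm above might look like the following sketch; the Gaussian target, random-walk proposal, and step size in the usage example are illustrative assumptions, not the talk's.

```python
import numpy as np

def metropolis_hastings(log_pi, propose, log_q, m0, n_steps, seed=0):
    """Generic Metropolis-Hastings sampler for a target density pi(m).

    log_pi(m)   -> log of the (unnormalized) target density
    propose(m)  -> draw y from the proposal density q(m, .)
    log_q(m, y) -> log q(m, y), the density of proposing y from m
    """
    rng = np.random.default_rng(seed)
    m = np.array(m0, dtype=float)
    chain = [m.copy()]
    for _ in range(n_steps):
        y = propose(m)
        # alpha = min(1, pi(y) q(y, m) / (pi(m) q(m, y))), computed in log space
        log_alpha = log_pi(y) + log_q(y, m) - log_pi(m) - log_q(m, y)
        if np.log(rng.uniform()) < min(0.0, log_alpha):
            m = y                      # accept
        chain.append(m.copy())         # on rejection, m_{k+1} = m_k
    return np.array(chain)

# Illustrative use: a 2-D standard Gaussian target with a symmetric
# random-walk proposal (so log_q cancels and can be taken as 0).
rng = np.random.default_rng(1)
chain = metropolis_hastings(
    log_pi=lambda m: -0.5 * float(m @ m),
    propose=lambda m: m + 0.5 * rng.standard_normal(m.size),
    log_q=lambda a, b: 0.0,
    m0=np.zeros(2),
    n_steps=5000,
)
print("sample mean:", chain.mean(axis=0))
```

    Because the random-walk proposal is symmetric, log_q cancels in the acceptance ratio; the asymmetric Langevin and stochastic Newton proposals introduced later do not have this property.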


  • Challenges in large-scale Bayesian inversion

    Method of choice is to sample the posterior density using Markov chain Monte Carlo (MCMC); its growth in the 1980s transformed Bayesian inference

    For inverse problems characterized by high-dimensional parameter spaces and expensive forward simulations, conventional MCMC is prohibitive

    Conventional MCMC methods view the parameter-to-observable map as a black box

    Goals: overcome bottlenecks of MCMC:

    avoid black-box MCMC (might be embarrassingly parallel, but algorithmic scaling is embarrassingly poor!)

    develop specialized MCMC algorithms that reduce the effective problem dimension by exploiting the infinite-dimensional structure of the Hessian

    structure-exploiting algorithms must map well onto extreme-scale systems, and scale independently of parameter dimension, state dimension, data dimension, and number of cores


  • References for ∞-D Bayesian inversion

    J. Martin, L.C. Wilcox, C. Burstedde, and O. Ghattas, A stochastic Newton MCMC method for large-scale statistical inverse problems with application to seismic inversion, SIAM Journal on Scientific Computing, 34(3):A1460–A1487, 2012.

    T. Bui-Thanh, O. Ghattas, J. Martin, and G. Stadler, A computational framework for infinite-dimensional Bayesian inverse problems. Part I: The linearized case, with applications to global seismic inversion, SIAM Journal on Scientific Computing, submitted, 2012.

    T. Bui-Thanh and O. Ghattas, A scalable MAP solver for Bayesian inverse problems with Besov priors, Inverse Problems, submitted, 2012.

    T. Bui-Thanh and O. Ghattas, Analysis of the Hessian for inverse scattering problems. Part III: Inverse medium scattering of electromagnetic waves in three dimensions, Inverse Problems, submitted, 2012.

    T. Bui-Thanh and O. Ghattas, An analysis of infinite dimensional Bayesian inverse shape acoustic scattering and its numerical approximation, SIAM Journal on Numerical Analysis, submitted, 2012.

    T. Bui-Thanh and O. Ghattas, A scaled stochastic Newton algorithm for Markov chain Monte Carlo simulations, SIAM Journal on Uncertainty Quantification, submitted, 2012.

    T. Bui-Thanh and O. Ghattas, Analysis of the Hessian for inverse scattering problems. Part II: Inverse medium scattering of acoustic waves, Inverse Problems, 28(5):055002, 2012.

    T. Bui-Thanh and O. Ghattas, Analysis of the Hessian for inverse scattering problems. Part I: Inverse shape scattering of acoustic waves, Inverse Problems, 28(5):055001, 2012.


  • Outline

    1 Background, motivation, and goals

    2 Langevin MCMC methods and stochastic Newton

    3 Low rank Hessian approximation and scalability

    4 Example: Full waveform global seismic inversion


  • Langevin MCMC (Grenander & Miller, 1994)

    Given the target density π(m), the associated Langevin SDE is given by:

    \mathrm{d}m_t = -A\, \nabla_m(-\log \pi)\, \mathrm{d}t + \sqrt{2}\, A^{1/2}\, \mathrm{d}W_t

    Discretize with timestep Δt to derive a proposal for Metropolis-Hastings:

    m^{\mathrm{prop}}_{k+1} = m_k - A\, \nabla_m(-\log \pi)\, \Delta t + \sqrt{2\Delta t}\, A^{1/2}\, \mathcal{N}(0, I)

    Notes:

    Preconditioner A must be symmetric positive definite

    Process is ergodic (convergence of time averages)

    W_t is an i.i.d. vector of standard Brownian motions

    W_t has independent increments given by W(t + Δt) − W(t) ~ N(0, Δt I)

    See work by A. Stuart, Y. Efendiev, ...
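    A sketch of how the preconditioned Langevin proposal and its density might be coded (assumed helper names, not the talk's software); the proposal density is needed because the proposal is asymmetric, so it does not cancel in the Metropolis-Hastings ratio:

```python
import numpy as np

def langevin_proposal(m, grad_neg_log_pi, A_half, dt, rng):
    """Draw y from the preconditioned Langevin proposal
       y = m - A grad(-log pi)(m) dt + sqrt(2 dt) A^{1/2} N(0, I),
       where A = A_half @ A_half.T is the SPD preconditioner."""
    A = A_half @ A_half.T
    mean = m - dt * (A @ grad_neg_log_pi(m))
    return mean + np.sqrt(2.0 * dt) * (A_half @ rng.standard_normal(m.size))

def log_q_langevin(m_from, m_to, grad_neg_log_pi, A_half, dt):
    """log q(m_from, m_to): the proposal is a Gaussian centered at the drifted
       point with covariance 2 dt A; since it is asymmetric, this term stays
       in the Metropolis-Hastings acceptance ratio."""
    A = A_half @ A_half.T
    mean = m_from - dt * (A @ grad_neg_log_pi(m_from))
    cov = 2.0 * dt * A
    diff = m_to - mean
    return -0.5 * diff @ np.linalg.solve(cov, diff) \
           - 0.5 * np.linalg.slogdet(2.0 * np.pi * cov)[1]
```

    With A = I this reduces to the unpreconditioned Langevin proposal of the Rosenbrock illustration below; with A = H^{-1} and Δt = 1 it becomes the stochastic Newton proposal of the next slide.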


  • Stochastic Newton's method

    Langevin Metropolis-Hastings MCMC proposal given by:

    m^{\mathrm{prop}}_{k+1} = m_k - A\, \nabla_m(-\log \pi)\, \Delta t + \sqrt{2\Delta t}\, A^{1/2}\, \mathcal{N}(0, I)

    Take A to be the inverse of the (local) Hessian and set Δt = 1:

    A = H(m)^{-1} \equiv \left[ \nabla^2_m(-\log \pi(m)) \right]^{-1} = \left( F^T \Sigma_{\mathrm{noise}}^{-1} F + \Sigma_{\mathrm{pr}}^{-1} \right)^{-1} \quad \text{(local covariance matrix)}

    Then we have the stochastic equivalent of Newton's method:

    m^{\mathrm{prop}}_{k+1} = m_k - H^{-1}\, \nabla_m(-\log \pi) + \mathcal{N}(0, H^{-1})

    Often leads to several orders of magnitude reduction in number of samples.

    Details in: J. Martin, L.C. Wilcox, C. Burstedde, and O. Ghattas, A stochastic Newton MCMC method for large-scale statistical inverse problems with application to seismic inversion, SIAM Journal on Scientific Computing, 34(3):A1460–A1487, 2012.

    T. Bui-Thanh and O. Ghattas, A scaled stochastic Newton algorithm for Markov chain Monte Carlo simulations, submitted.
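    A dense toy sketch of one stochastic Newton proposal for the Gaussian-noise/Gaussian-prior posterior above, using the Gauss-Newton Hessian H = F^T Σ_noise^{-1} F + Σ_pr^{-1}; forming and factoring H like this is only feasible in small dimensions (the low-rank machinery later in the talk avoids it), and the linear map F is an assumption for illustration only.

```python
import numpy as np

def stochastic_newton_proposal(m, F, Sigma_noise_inv, Sigma_pr_inv, d_obs, m_pr, rng):
    """One stochastic Newton proposal
         m_prop = m - H^{-1} grad(-log pi)(m) + N(0, H^{-1}),
       with the Gauss-Newton Hessian H = F^T Sigma_noise^{-1} F + Sigma_pr^{-1}.
       F is taken as a fixed Jacobian (i.e. a linear toy map); the dense solve
       and Cholesky factor below are only feasible for small dimensions."""
    grad = F.T @ (Sigma_noise_inv @ (F @ m - d_obs)) + Sigma_pr_inv @ (m - m_pr)
    H = F.T @ Sigma_noise_inv @ F + Sigma_pr_inv
    L = np.linalg.cholesky(np.linalg.inv(H))         # a square root of H^{-1}
    return m - np.linalg.solve(H, grad) + L @ rng.standard_normal(m.size)

# Illustrative use on a small assumed problem.
rng = np.random.default_rng(0)
n, q = 6, 4
F = rng.standard_normal((q, n))
m_prop = stochastic_newton_proposal(np.zeros(n), F, 10.0 * np.eye(q), np.eye(n),
                                    rng.standard_normal(q), np.zeros(n), rng)
print(m_prop)
```

    For a linear map the posterior is exactly Gaussian, the proposal is an independent draw from it, and every proposal is accepted; for nonlinear maps the Metropolis-Hastings accept/reject step corrects for the error of the local Gaussian approximation.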


  • Rosenbrock illustration: Gaussian random walk

    (Figure: random-walk samples on the Rosenbrock density, x-y plane.)

    m^{\mathrm{prop}}_{k+1} = m_k + \mathcal{N}(0, I)


  • Rosenbrock illustration: Unpreconditioned Langevin

    (Figure: unpreconditioned Langevin samples on the Rosenbrock density, x-y plane.)

    m^{\mathrm{prop}}_{k+1} = m_k - \Delta t\, \nabla_m(-\log \pi) + \sqrt{2\Delta t}\, \mathcal{N}(0, I)


  • Rosenbrock illustration: Hessian-preconditioned Langevin

    (Figure: Hessian-preconditioned Langevin samples on the Rosenbrock density, x-y plane.)

    m^{\mathrm{prop}}_{k+1} = m_k - H^{-1}\, \nabla_m(-\log \pi) + \mathcal{N}(0, H^{-1})


  • Convergence comparison: different MCMC methods

    Multivariate potential scale reduction factor (MPSRF) convergence statistic for a 65-parameter problem

    unpreconditioned Langevin vs. stochastic Newton vs. Adaptive Metropolis

  • Outline

    1 Background, motivation, and goals

    2 Langevin MCMC methods and stochastic Newton

    3 Low rank Hessian approximation and scalability

    4 Example: Full waveform global seismic inversion


  • Large-scale local covariance estimates

    Stochastic Newton requires a (local) Gaussian approximation whose covariance is given by the inverse of the Hessian, which is formally a dense operator. Key idea: never form H (every column would require a forward solve); instead:

    recognize that H is the sum of a data misfit term, which is often equivalent to a compact operator, and (the inverse of) a prior, which is often equivalent to a differential operator:

    H = F^T \Sigma_{\mathrm{noise}}^{-1} F + \Sigma_{\mathrm{pr}}^{-1}

    invoke a low rank (truncated spectral decomposition) approximation of the data misfit operator using randomized SVD; often requires a constant number of forward/adjoint solves, independent of problem size

    combine with Sherman-Morrison-Woodbury to invert/factor

    Details in: H.P. Flath, L.C. Wilcox, V. Akcelik, J. Hill, B. van Bloemen Waanders, and O. Ghattas, Fast algorithms for Bayesian uncertainty quantification in large-scale linear inverse problems based on low-rank partial Hessian approximations, SIAM Journal on Scientific Computing, 33(1):407–432, 2011.
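    In the spirit of the randomized SVD mentioned above, a low-rank approximation can be extracted from Hessian-vector products alone; the sketch below (toy synthetic operator, not the talk's implementation) shows the basic single-pass randomized scheme.

```python
import numpy as np

def randomized_eig(apply_H, n, r, p=10, rng=None):
    """Randomized low-rank eigendecomposition of a symmetric PSD operator.

    apply_H(X) must return H @ X for an (n, k) block X, so only matrix-vector
    (here block) products are needed and H is never formed.  Returns the
    leading r eigenpairs; p is oversampling."""
    rng = rng or np.random.default_rng(0)
    Omega = rng.standard_normal((n, r + p))      # random probe block
    Y = apply_H(Omega)                           # one block of operator applies
    Q, _ = np.linalg.qr(Y)                       # orthonormal basis for the range
    T = Q.T @ apply_H(Q)                         # small (r+p) x (r+p) projection
    lam, S = np.linalg.eigh(T)
    lam, S = lam[::-1][:r], S[:, ::-1][:, :r]
    return lam, Q @ S

# Toy check on an exactly rank-5 PSD matrix (assumed, not the talk's Hessian).
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 5))
H = A @ A.T
lam, V = randomized_eig(lambda X: H @ X, n=200, r=5, rng=rng)
print(np.allclose(np.sort(lam), np.sort(np.linalg.eigvalsh(H))[-5:]))   # True
```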


  • Low rank approximation of data misfit Hessian

    \Sigma_{\mathrm{post}} = H^{-1} = \left( F^T \Sigma_{\mathrm{noise}}^{-1} F + \Sigma_{\mathrm{pr}}^{-1} \right)^{-1}

    = \Sigma_{\mathrm{pr}}^{1/2} \left( \Sigma_{\mathrm{pr}}^{1/2} F^T \Sigma_{\mathrm{noise}}^{-1} F\, \Sigma_{\mathrm{pr}}^{1/2} + I \right)^{-1} \Sigma_{\mathrm{pr}}^{1/2}

    \approx \Sigma_{\mathrm{pr}}^{1/2} \left( V_r \Lambda_r V_r^T + I \right)^{-1} \Sigma_{\mathrm{pr}}^{1/2}

    = \Sigma_{\mathrm{pr}}^{1/2} \left[ I - V_r D_r V_r^T + O\!\left( \sum_{i=r+1}^{n} \frac{\lambda_i}{\lambda_i + 1} \right) \right] \Sigma_{\mathrm{pr}}^{1/2}

    where V_r, Λ_r are the truncated eigenvectors/eigenvalues of the prior-preconditioned data misfit Hessian, and D_r = diag(λ_i / (λ_i + 1))


  • Computations required for stochastic Newton

    Never need to form dense Hessian:

    H = \Sigma_{\mathrm{pr}}^{-1/2} \left[ V_r \Lambda_r V_r^T + I \right] \Sigma_{\mathrm{pr}}^{-1/2}

    H^{-1} g = \Sigma_{\mathrm{pr}}^{1/2} \left\{ V_r \left[ (\Lambda_r + I_r)^{-1} - I_r \right] V_r^T + I \right\} \Sigma_{\mathrm{pr}}^{1/2}\, g \quad \text{(Newton step)}

    H^{-1/2} x = \Sigma_{\mathrm{pr}}^{1/2} \left\{ V_r \left[ (\Lambda_r + I_r)^{-1/2} - I_r \right] V_r^T + I \right\} x \quad \text{(drawing a sample)}

    \det(H^{-1/2}) = (\det \Sigma_{\mathrm{pr}})^{1/2} \prod_{i=1}^{r} (\lambda_i + 1)^{-1/2} \quad \text{(accept/reject criterion of M-H)}

    Complexity of these operations is scalable (i.e., requires a number of forward PDE solves that is independent of the parameter dimension) when:

    prior-preconditioned data misfit Hessian is compact with mesh-independent dominant spectrum (theoretical results)

    dominant spectrum is captured in a number of matvecs that is a constant multiple of the number of dominant eigenvalues (e.g., using Lanczos or randomized SVD)

    Hessian-vector products carried out matrix-free using adjoint methods

    square root of the prior, Σ_pr^{1/2}, taken as the inverse of an elliptic operator; fast elliptic solver for computing its action Σ_pr^{1/2} z
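    The identities on the last two slides can be checked on a small dense toy problem (assumed sizes and random data, not the talk's setup): build the prior-preconditioned data misfit Hessian, keep its dominant eigenpairs, and apply H^{-1} g and H^{-1/2} x through the factored expressions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, q, r = 50, 5, 5                      # parameters, observations, retained eigenpairs

# Assumed toy ingredients so everything can be verified densely.
F = rng.standard_normal((q, n))                      # linear parameter-to-observable map
Sigma_noise_inv = 10.0 * np.eye(q)
L = rng.standard_normal((n, n))
Sigma_pr = L @ L.T / n + np.eye(n)                   # SPD prior covariance

# Symmetric square root of the prior (dense here; an elliptic solve at scale).
w, U = np.linalg.eigh(Sigma_pr)
Spr_half = U @ np.diag(np.sqrt(w)) @ U.T

# Prior-preconditioned data misfit Hessian and its dominant eigenpairs.
Hmis = Spr_half @ F.T @ Sigma_noise_inv @ F @ Spr_half
lam, V = np.linalg.eigh(Hmis)
lam, V = lam[::-1][:r], V[:, ::-1][:, :r]            # top-r eigenvalues/vectors

# Newton step H^{-1} g and sample direction H^{-1/2} x via the factored formulas.
g = rng.standard_normal(n)
x = rng.standard_normal(n)
D_inv  = np.diag(1.0 / (lam + 1.0) - 1.0)            # (Lambda_r + I)^{-1}   - I
D_half = np.diag(1.0 / np.sqrt(lam + 1.0) - 1.0)     # (Lambda_r + I)^{-1/2} - I
Hinv_g  = Spr_half @ (V @ (D_inv  @ (V.T @ (Spr_half @ g))) + Spr_half @ g)
Hhalf_x = Spr_half @ (V @ (D_half @ (V.T @ x)) + x)

# Check the Newton step against the dense Hessian; exact here because the data
# misfit Hessian has rank q = r, so the truncation drops only zero eigenvalues.
H = F.T @ Sigma_noise_inv @ F + np.linalg.inv(Sigma_pr)
print(np.allclose(Hinv_g, np.linalg.solve(H, g)))    # True
```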


  • Outline

    1 Background, motivation, and goals

    2 Langevin MCMC methods and stochastic Newton

    3 Low rank Hessian approximation and scalability

    4 Example: Full waveform global seismic inversion


  • Elastic/acoustic wave equations: Governing equations in velocity-strain form

    \frac{\partial E}{\partial t} = \frac{1}{2}\left( \nabla v + \nabla v^T \right) \quad \text{in } B

    \rho \frac{\partial v}{\partial t} = \nabla \cdot \left( \lambda\, \mathrm{tr}(E)\, I + 2\mu E \right) + \rho f \quad \text{in } B

    S n = t_{\mathrm{bc}}(t) \quad \text{on } \partial B

    v = v_0(x) \quad \text{at } t = 0

    E = E_0(x) \quad \text{at } t = 0

    E    strain tensor
    S    stress tensor
    ρ    mass density
    v    displacement velocity
    f    body force per unit mass
    λ, μ  Lamé parameters
    I    identity tensor
    t_bc  traction boundary condition
    v_0, E_0  initial conditions
    t    time
    x    point in the body
    B    solution body

  • Forward discontinuous Galerkin wave propagation solver

    (Figure: non-conforming hexahedral elements coupled through seven mortars M1–M7.)

    nonconforming hexahedral elements with Kopriva's mortar approach for hyperbolic equations; same convergence rate as conforming elements

    tensor product Lagrange basis on the Legendre-Gauss-Lobatto (LGL) nodes

    LGL quadrature (diagonal mass matrix)

    time integration by classical 4-stage RK4

    integrated parallel mesh generation/adaptivity

    L.C. Wilcox, G. Stadler, C. Burstedde, and O. Ghattas, A high-order discontinuous Galerkin method for wave propagation through coupled elastic-acoustic media, Journal of Computational Physics, 229(24):9373–9396, 2010.

    T. Bui-Thanh and O. Ghattas, Analysis of an hp-non-conforming discontinuous Galerkin spectral element method for wave propagation, SIAM Journal on Numerical Analysis, 50(3):1801–1826, 2012.


  • Point source approximation of M9 Tohoku earthquake

    Animation by Greg Abram, TACC


  • Strong scalability of global seismic wave propagation

    Excellent strong scalability on Jaguar for meshing + wave propagation

    # cores    meshing time (s)    wave prop per step (s)    par eff    Tflops (wave)
     32,640          6.32                  12.76               1.00          25.6
     65,280          6.78                   6.30               1.01          52.2
    130,560         17.76                   3.12               1.02         105.5
    223,752

  • Extreme granularity limits for strong scaling of forward DG wave propagation solver on ORNL Cray XK6

    # cores    cpu per step (ms)    elem/core    efficiency (%)
        256          1630.80            4712          100.0
        512           832.46            2356           98.0
       1024           411.54            1178           99.1
       8192            61.69             148           82.6
      65536            11.79              19           54.0
     131072             7.09              10           44.9
     262144             4.07               5           39.2

    table shows wall clock time per time step in ms, elements per core, and parallel efficiency for 3 orders of magnitude increase in core count

    just 1.21 million 3rd order DG elements (694 million unknowns)

    parallel efficiency remains at 39% with just 4 or 5 elements/core
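    For reference, the efficiency column above is just time-per-step scaled by core count relative to the 256-core baseline; a few lines reproduce it:

```python
# Strong-scaling parallel efficiency relative to the 256-core run:
# eff(P) = (t_256 * 256) / (t_P * P); this reproduces the last column above.
runs = [(256, 1630.80), (512, 832.46), (1024, 411.54), (8192, 61.69),
        (65536, 11.79), (131072, 7.09), (262144, 4.07)]
base_cores, base_t = runs[0]
for cores, t in runs:
    print(f"{cores:>7d} cores: efficiency = {100.0 * base_t * base_cores / (t * cores):5.1f} %")
```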

    T. Bui-Thanh, C. Burstedde, O. Ghattas, J. Martin, G. Stadler, and L.C. Wilcox, Extreme-scale UQ for Bayesian inverse problems governed by PDEs, Proceedings of IEEE/ACM SC12, 2012 (Gordon Bell Prize Finalist).


  • Gradient and Hessian for full waveform seismic inversion

    Would like to compute gradients and Hessian actions w.r.t. c of

    J(c) := \frac{1}{2} \int_0^T \!\!\int_\Omega \left( Bv(c) - v_{\mathrm{obs}} \right)^T \Sigma_{\mathrm{noise}}^{-1} \left( Bv(c) - v_{\mathrm{obs}} \right) \mathrm{d}x\, \mathrm{d}t + R_{\mathrm{pr}}(c)

    where the dependence of v on c is given by solving the forward wave propagation equations:

    \rho\, v_t - \nabla(\rho c^2 e) = g \quad \text{in } \Omega \times (0, T)
    e_t - \nabla \cdot v = 0 \quad \text{in } \Omega \times (0, T)
    v = 0,\; e = 0 \quad \text{in } \{t = 0\}
    e = 0 \quad \text{on } \partial\Omega \times (0, T)

    v, e are velocity and strain dilation

    c is the uncertain local wave speed parameter

    ρ and g are known density and seismic source

    v_obs are observations at receivers, B(x) is an observation operator

    Σ_noise is the noise covariance

    R_pr is the prior term involving Σ_prior^{-1}


  • The gradient computation

    Gradient expression with respect to c given by

    G(c) := 2\rho c \int_0^T e\, (\nabla \cdot w)\, \mathrm{d}t + \nabla R_{\mathrm{pr}}(c)

    where v, e satisfy the forward wave propagation equations

    \rho\, v_t - \nabla(\rho c^2 e) = g \quad \text{in } \Omega \times (0, T)
    e_t - \nabla \cdot v = 0 \quad \text{in } \Omega \times (0, T)
    v = 0,\; e = 0 \quad \text{in } \{t = 0\}
    e = 0 \quad \text{on } \partial\Omega \times (0, T)

    and w, d (adjoint velocity, dilation) satisfy the adjoint wave propagation equations

    \rho\, w_t + \nabla(\rho c^2 d) = B^T \Sigma_{\mathrm{noise}}^{-1} (Bv - v_{\mathrm{obs}}) \quad \text{in } \Omega \times (0, T)
    d_t + \nabla \cdot w = 0 \quad \text{in } \Omega \times (0, T)
    w = 0,\; d = 0 \quad \text{in } \{t = T\}
    d = 0 \quad \text{on } \partial\Omega \times (0, T)


  • Computation of action of Hessian in given direction

    Action of the Hessian operator in direction c̃ at a point c given by

    H(c)\tilde{c} := 2\rho \int_0^T \left[ \tilde{c}\, e\, (\nabla \cdot w) + c\, \tilde{e}\, (\nabla \cdot w) + c\, e\, (\nabla \cdot \tilde{w}) \right] \mathrm{d}t + \nabla^2 R_{\mathrm{pr}}(c)\, \tilde{c},

    where ṽ, ẽ satisfy the incremental forward wave propagation equations

    \rho\, \tilde{v}_t - \nabla(\rho c^2 \tilde{e}) = \nabla(2\rho c\, \tilde{c}\, e) \quad \text{in } \Omega \times (0, T)
    \tilde{e}_t - \nabla \cdot \tilde{v} = 0 \quad \text{in } \Omega \times (0, T)
    \tilde{v} = 0,\; \tilde{e} = 0 \quad \text{in } \{t = 0\}
    \tilde{e} = 0 \quad \text{on } \partial\Omega \times (0, T)

    and w̃, d̃ satisfy the incremental adjoint wave propagation equations

    \rho\, \tilde{w}_t + \nabla(\rho c^2 \tilde{d}) = \nabla(2\rho c\, \tilde{c}\, d) - B^T \Sigma_{\mathrm{noise}}^{-1} B \tilde{v} \quad \text{in } \Omega \times (0, T)
    \tilde{d}_t + \nabla \cdot \tilde{w} = 0 \quad \text{in } \Omega \times (0, T)
    \tilde{w} = 0,\; \tilde{d} = 0 \quad \text{in } \{t = T\}
    \tilde{d} = 0 \quad \text{on } \partial\Omega \times (0, T)


  • Application to synthetic global seismic inversion

    invert for anomaly from the radially-varying PREM model (left)

    observations: from the laterally-varying S20RTS model (right)


  • The prior

    (Figure: prior sample fields and the ground truth field.)

    Prior is defined by the square of a generalized anisotropic Poisson operator A := -\nabla \cdot (\Theta \nabla) + \alpha I, with

    \Theta = I_3 - \vartheta(r)\, r r^T, \qquad \vartheta(r) := \frac{1}{r^2}\left( \frac{2}{r} - \frac{1}{r^2} \right) \text{ if } r \neq 0, \qquad \vartheta(r) := 0 \text{ if } r = 0

  • Samples from prior and Gaussianized posterior distributions

    1.07 million uncertain acoustic wave speed parameters

    0.07 Hz maximum frequency, 3rd order DG elements, 630 million wave propagation unknowns, 2400 time steps (1000 s inversion time window)

    up to 100K cores on Jaguar XK6 (single forward solve is 1 minute on 64K cores)

    2000× reduction in problem dimension (488 dominant eigenvectors)

    Top row: Samples from prior

    Bottom row: Samples from the posterior

    Right: true earth model (black dots=5 sources, white dots=100 receivers)


  • Comparison of true model (S20RTS, left) with MAP solution (right)

    black dots = 3 earthquake sources; white dots = 130 receivers



  • MCMC for posterior distribution

    Solving the full UQ problem:

    Repeated Hessian evaluations too expensive for this problem

    Use Gaussian approximation at MAP as a proposal for MCMC

    Accept/Reject framework corrects for errors in approximation

    Sampling performance for a coarser problem (with 78k parameters):

    15,587 MCMC samples (each requires 1 forward PDE solve)

    4399 samples accepted (28%)

    Integrated autocorrelation time of about 16 to 20, giving an effective sample size of about 800 (estimated as in the sketch below)

    Total runtime of about 96 hours on 2048 cores
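    One common way to estimate the integrated autocorrelation time and effective sample size quoted above is from the empirical autocorrelation function of a scalar quantity along the chain; a rough sketch (fixed summation window, which is a simplification of the adaptive windowing used in practice):

```python
import numpy as np

def integrated_autocorrelation_time(x, max_lag=200):
    """Crude estimate of the integrated autocorrelation time tau = 1 + 2*sum_k rho_k
    of a scalar chain; the fixed window max_lag is a simplification of the
    adaptive windowing normally used."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    var = x @ x / x.size
    rho = [(x[:-k] @ x[k:]) / (x.size * var) for k in range(1, max_lag)]
    return 1.0 + 2.0 * sum(rho)

# Synthetic AR(1) chain with correlation 0.9; its exact tau is (1 + 0.9)/(1 - 0.9) = 19.
rng = np.random.default_rng(0)
x = np.zeros(50000)
for k in range(1, x.size):
    x[k] = 0.9 * x[k - 1] + rng.standard_normal()
tau = integrated_autocorrelation_time(x)
print(f"tau ~ {tau:.1f},  effective sample size ~ N / tau ~ {x.size / tau:.0f}")
```

    The synthetic AR(1) chain is only an assumed stand-in, but its autocorrelation time of 19 is in the same range as the 16 to 20 reported on this slide.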


  • Samples and point marginals

    (Figure panels: sample no. 1 from the posterior distribution; pointwise prior variance; pointwise posterior variance.)

  • Samples and point marginals

    (Figure panels: sample no. 2 from the posterior distribution; pointwise prior variance; pointwise posterior variance.)

  • Samples and point marginals

    (Figure panels: sample no. 3 from the posterior distribution; pointwise prior variance; pointwise posterior variance.)

  • Samples and point marginals

    (Figure panels: sample no. 4 from the posterior distribution; pointwise prior variance; pointwise posterior variance.)

  • Spectral decay for refined parameter meshes

    Spectrum of the prior-preconditioned misfit Hessian for the global seismic inversion problem

    (Figure: eigenvalue magnitude vs. index on a logarithmic scale, for discretizations with 40,842, 67,770, and 431,749 parameters.)

    largest 700 eigenvalues of the prior-preconditioned data misfit Hessian for different discretizations


  • Summary and conclusions

    Stochastic Newton MCMC sampling algorithm reduces the number of samples versus conventional MCMC by several orders of magnitude, and makes UQ for Bayesian inverse problems tractable

    Compactness of the local data-misfit Hessian operator provides several orders of magnitude effective dimension reduction without introducing bias

    Randomized SVD extracts a low rank approximation of the data-misfit Hessian in a dimension-independent number of matvecs

    Matrix-free Hessian matvecs implemented through consistent first- and second-order adjoints

    Adaptive discontinuous Galerkin forward/adjoint wave propagation solver scales to 262K cores with a small number of elements per core

    Scalability of the elliptic solve for the action of the prior operator assured by hybrid GMG-AMG on forests of octrees; scalability to 262K cores

    Stochastic Newton MCMC applied to a synthetic inverse problem in 3D global seismology with 1M earth model parameters and 630M forward unknowns, on up to 100K cores, leading to 3 orders of magnitude dimension reduction


  • References

    A. Alexanderian, N. Petra, G. Stadler, and O. Ghattas, A-optimal design for infinite-dimensional Bayesian linear inverse problems with regularized ℓ0-sparsification, 2013.

    N. Petra, J. Martin, G. Stadler, and O. Ghattas, A computational framework for infinite-dimensional Bayesian inverse problems. Part II: Stochastic Newton MCMC with application to ice sheet flow inverse problems, 2013.

    T. Bui-Thanh and O. Ghattas, A scalable MAP solver for Bayesian inverse problems with Besov priors, submitted.

    T. Bui-Thanh and O. Ghattas, A scaled stochastic Newton algorithm for Markov chain Monte Carlo simulations, submitted.

    H. Sundar, G. Biros, C. Burstedde, J. Rudi, and G. Stadler, Parallel geometric-algebraic multigrid on unstructured forests of octrees, Proceedings of IEEE/ACM SC12, 2012.

    T. Bui-Thanh, C. Burstedde, O. Ghattas, J. Martin, G. Stadler, and L.C. Wilcox, Extreme-scale UQ for Bayesian inverse problems governed by PDEs, Proceedings of IEEE/ACM SC12, 2012 (2012 Gordon Bell Prize Finalist).

    T. Bui-Thanh and O. Ghattas, Analysis of the Hessian for inverse scattering problems. Part II: Inverse medium scattering of acoustic waves, Inverse Problems, 28(5):055002, 2012.

    T. Bui-Thanh and O. Ghattas, Analysis of the Hessian for inverse scattering problems. Part I: Inverse shape scattering of acoustic waves, Inverse Problems, 28(5):055001, 2012.

    J. Martin, L.C. Wilcox, C. Burstedde, and O. Ghattas, A stochastic Newton MCMC method for large-scale statistical inverse problems with application to seismic inversion, SIAM Journal on Scientific Computing, 34(3):A1460–A1487, 2012.

    T. Bui-Thanh and O. Ghattas, Analysis of an hp-non-conforming discontinuous Galerkin spectral element method for wave propagation, SIAM Journal on Numerical Analysis, 50(3):1801–1826, 2012.

    T. Isaac, C. Burstedde, and O. Ghattas, Low-cost parallel algorithms for 2:1 octree balance, Proceedings of IPDPS 2012.

    H.P. Flath, L.C. Wilcox, V. Akcelik, J. Hill, B. van Bloemen Waanders, and O. Ghattas, Fast algorithms for Bayesian uncertainty quantification in large-scale linear inverse problems based on low-rank partial Hessian approximations, SIAM Journal on Scientific Computing, 33(1):407–432, 2011.

    C. Burstedde, L.C. Wilcox, and O. Ghattas, p4est: Scalable algorithms for parallel adaptive mesh refinement on forests of octrees, SIAM Journal on Scientific Computing, 33(3):1103–1133, 2011.

    T. Bui-Thanh, O. Ghattas, and D. Higdon, Adaptive Hessian-based non-stationary Gaussian process response surface method for probability density approximation with application to Bayesian solution of large-scale inverse problems, SIAM Journal on Scientific Computing, submitted, 2011.

    L.C. Wilcox, G. Stadler, C. Burstedde, and O. Ghattas, A high-order discontinuous Galerkin method for wave propagation through coupled elastic-acoustic media, Journal of Computational Physics, 229(24):9373–9396, 2010.

    C. Burstedde, O. Ghattas, M. Gurnis, T. Isaac, G. Stadler, T. Warburton, and L.C. Wilcox, Extreme-scale AMR, Proceedings of ACM/IEEE SC10, 2010.


  • Acknowledgements

    Research program supported by:

    NSF CMMI-1028889 (CDI), ARC-0941678 (CDI)

    AFOSR grant FA9550-12-1-0484 (Computational Math)

    DOE grants DE-SC0009286 (MMICCs), DE-SC0006656 (SciDAC), DE-FG02-08ER25860 (ASCR), DE-SC0002710 (SciDAC)

    Resources on ORNL Jaguar Cray XT-5/XK-6 supercomputer provided through ALCC award at ORNL Leadership Computing Facility

    Resources on TACC Lonestar, Longhorn, and Stampede systems provided through awards from TACC and XSEDE


  • Discussion questions

    1 How can big data and big models be integrated to produce betterpredictive models?

    2 What are promising new ideas for exploring high-dimensional space?

    3 What are promising new ideas for quantifying uncertainties inmodeling and simulation?

    4 How can we adapt/reinvent the important algorithms of CS&E so they better map onto high-throughput accelerators? Onto systems with massive numbers of cores?

    5 How can we transform our universities and federal agencies to become more hospitable to cross-cutting research/education at the interfaces of science/engineering, mathematics, statistics, and computing?

