
Transcript of Big Data Meets Big Models: Solution of Large-Scale Bayesian Inverse Problems

  • Big Data Meets Big Models: Solution of Large-Scale Bayesian Inverse Problems

    Omar Ghattas

    joint work with:

    Tan Bui-Thanh, Carsten Burstedde, James Martin, Noemi Petra, Georg Stadler, Hari Sundar, Lucas Wilcox

    Institute for Computational Engineering & Sciences
    Departments of Geological Sciences and Mechanical Engineering

    The University of Texas at Austin

    NSF Cyberbridges 2013, Arlington, VA, July 15, 2013


  • Outline

    1 Background, motivation, and goals

    2 Langevin MCMC methods and stochastic Newton

    3 Low rank Hessian approximation and scalability

    4 Example: Full waveform global seismic inversion



  • The inverse problem: The quest for knowledge from data and models

    Input parameters, computational model, and output observables

    The forward problem:

    Given input parameters, solve the model to yield output observables

    Well-posed: solution exists, is unique, and is stable to perturbations in inputs

    Causal: later-time solutions depend only on earlier-time solutions

    Local: the forward operator includes derivatives that couple nearby solutions in space and time


  • The inverse problem: The quest for knowledge from data and models

    Input parameters, computational model, and output observables

    The inverse problem:

    Given output observations and the model, infer input parameters

    Ill-posed: observations are usually sparse; many different parameter values may be consistent with the data

    Non-causal: the inverse operator couples earlier-time solutions with later-time ones

    Global: the inverse operator couples solution values across all of space and time


  • The inverse problem: The quest for knowledge from data and models

    Input parameters, computational model, and output observables

    Uncertainty is a fundamental feature of ill-posed inverse problems:

    Deterministic approach to ill-posedness: employ regularization to penalize unwanted solution features, guarantee a unique solution

    Bayesian approach to ill-posedness: describe the probability of all models that are consistent with the data, the model, and any prior knowledge of the parameters


  • Global seismology inverse problem: Observational data


    Figure: Left: USArray network of 400 broadband seismic stations with 70 km spacing over 1000 km aperture. Past/present stations in green, future stations in blue. Right: Shear waves from a deep South American earthquake plotted on top of a map of arrival time. Early arrivals in blue, late arrivals in red. (Courtesy D. Helmberger, Caltech)


  • Global seismology inverse problem: Parameter field


    Maps of shear velocity heterogeneity at different depths and resolutions (source: J. Ritsema et al., Geophysical Journal International, 2011)


  • Bayesian inference framework for inverse problem

    Given:

    π_pr(m) := prior p.d.f. of model parameters m

    π_obs(d) := prior p.d.f. of the observables d

    π_model(d|m) := conditional p.d.f. relating d and m

    Then the posterior p.d.f. of the model parameters is given by:

    \pi_{\mathrm{post}}(m) \stackrel{\mathrm{def}}{=} \pi_{\mathrm{post}}(m \mid d_{\mathrm{obs}}) \;\propto\; \pi_{\mathrm{pr}}(m) \int_D \frac{\pi_{\mathrm{obs}}(d)\, \pi_{\mathrm{model}}(d \mid m)}{\mu(d)}\, \mathrm{d}d \;\propto\; \pi_{\mathrm{pr}}(m)\, \pi(d_{\mathrm{obs}} \mid m)

    From A. Tarantola, Inverse Problem Theory, SIAM, 2005


  • Markov chain Monte Carlo method

    Explore the Bayesian posterior probability density π_post(m)

    m are model parameters; f(m) is the parameter-to-observable map; d_obs are data; Σ_pr and Σ_noise are prior and noise covariances

    \pi_{\mathrm{post}}(m) \;\propto\; \exp\Big( -\tfrac{1}{2}\, \| f(m) - d_{\mathrm{obs}} \|^2_{\Sigma_{\mathrm{noise}}^{-1}} \;-\; \tfrac{1}{2}\, \| m - m_{\mathrm{pr}} \|^2_{\Sigma_{\mathrm{pr}}^{-1}} \Big)

    Example Probability Density

    Given a probability density π(m):

    How do we explore the distribution?

    Often high dimensional

    Computationally expensive

    The MCMC Approach

    Replace π(m) by a sample chain {m_k}; compute using ergodic averages:

    \mathbb{E}[f(M)] \;=\; \int_{\mathbb{R}^n} f(m)\, \pi(\mathrm{d}m) \;\approx\; \frac{1}{N} \sum_{k=1}^{N} f(m_k)
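    As a toy illustration of these two slides (not from the talk), the sketch below evaluates the negative log posterior for an assumed linear parameter-to-observable map f(m) = F m with Gaussian prior and noise, and estimates a posterior expectation by an ergodic average of samples; for this linear-Gaussian toy the exact posterior is available for comparison.

```python
import numpy as np

rng = np.random.default_rng(0)
n, q = 4, 3                                   # toy parameter and data dimensions

# Assumed toy ingredients: a linear map f(m) = F m, Gaussian prior and noise.
F = rng.standard_normal((q, n))
m_pr = np.zeros(n)
Sigma_pr = np.eye(n)
Sigma_noise = 0.1 * np.eye(q)
d_obs = F @ rng.standard_normal(n) + 0.1 * rng.standard_normal(q)

def neg_log_post(m):
    """-log pi_post(m) up to a constant: data misfit term plus prior term."""
    r = F @ m - d_obs
    dm = m - m_pr
    return 0.5 * r @ np.linalg.solve(Sigma_noise, r) \
         + 0.5 * dm @ np.linalg.solve(Sigma_pr, dm)

# For this linear-Gaussian toy the posterior is Gaussian, so exact samples are
# available and the ergodic average (1/N) sum_k g(m_k) can be checked directly.
H = F.T @ np.linalg.solve(Sigma_noise, F) + np.linalg.inv(Sigma_pr)   # posterior precision
Sigma_post = np.linalg.inv(H)
m_post = Sigma_post @ (F.T @ np.linalg.solve(Sigma_noise, d_obs))      # posterior mean
samples = rng.multivariate_normal(m_post, Sigma_post, size=20000)
print("ergodic-average estimate of E[m]:", samples.mean(axis=0))
print("exact posterior mean:            ", m_post)
print("neg log post at mean vs. perturbed:", neg_log_post(m_post), neg_log_post(m_post + 0.5))
```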


  • High dimensional space: the final frontier

    The curse of dimensionality: Consider a hypersphere inscribed in a hypercube; what is the probability that a random sample will lie in the hypersphere as the dimension increases?

    dimension     hypersphere/hypercube
            1     1.00
            2     0.785
            3     0.536
            4     0.308
            5     0.164
           10     0.00249
          100     1.87 × 10^-70
          158     5.76 × 10^-126
        1,000     2.87 × 10^-1187
       10,000     6.65 × 10^-16,851
    1,000,000     8.53 × 10^-2,684,797
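    The entries in this table follow from the volume of the unit d-ball, V_d = π^{d/2} / Γ(d/2 + 1), divided by the volume 2^d of the circumscribing hypercube. A minimal sketch (not from the talk) that reproduces the trend, working in log10 to avoid underflow:

```python
import math

def log10_ball_to_cube_ratio(d):
    """log10 of (volume of the unit d-ball) / (volume of its circumscribing cube, 2^d)."""
    return ((d / 2) * math.log(math.pi) - d * math.log(2.0)
            - math.lgamma(d / 2 + 1)) / math.log(10.0)

for d in [1, 2, 3, 5, 10, 100, 1000]:
    print(f"d = {d:>5d}:  P(sample lands in hypersphere) ~ 10^{log10_ball_to_cube_ratio(d):.1f}")
```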


  • Metropolis-Hastings algorithm to sample π(m)

    1 m_k ← m_0

    2 k ← 0

    3 Choose a point y from the proposal density q(m_k, ·)

    4 α ← min( 1, [π(y) q(y, m_k)] / [π(m_k) q(m_k, y)] )

    5 If α > rand([0, 1]) Then

        Accept: m_{k+1} = y

      Otherwise

        Reject: m_{k+1} = m_k

      End If

    6 k ← k + 1

    7 Repeat from step 3
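    A minimal generic implementation of the algorithm above might look like the following sketch; the Gaussian target, random-walk proposal, and step size in the usage example are illustrative assumptions, not the talk's.

```python
import numpy as np

def metropolis_hastings(log_pi, propose, log_q, m0, n_steps, seed=0):
    """Generic Metropolis-Hastings sampler for a target density pi(m).

    log_pi(m)   -> log of the (unnormalized) target density
    propose(m)  -> draw y from the proposal density q(m, .)
    log_q(m, y) -> log q(m, y), the density of proposing y from m
    """
    rng = np.random.default_rng(seed)
    m = np.array(m0, dtype=float)
    chain = [m.copy()]
    for _ in range(n_steps):
        y = propose(m)
        # alpha = min(1, pi(y) q(y, m) / (pi(m) q(m, y))), computed in log space
        log_alpha = log_pi(y) + log_q(y, m) - log_pi(m) - log_q(m, y)
        if np.log(rng.uniform()) < min(0.0, log_alpha):
            m = y                      # accept
        chain.append(m.copy())         # on rejection, m_{k+1} = m_k
    return np.array(chain)

# Illustrative use: a 2-D standard Gaussian target with a symmetric
# random-walk proposal (so log_q cancels and can be taken as 0).
rng = np.random.default_rng(1)
chain = metropolis_hastings(
    log_pi=lambda m: -0.5 * float(m @ m),
    propose=lambda m: m + 0.5 * rng.standard_normal(m.size),
    log_q=lambda a, b: 0.0,
    m0=np.zeros(2),
    n_steps=5000,
)
print("sample mean:", chain.mean(axis=0))
```

    Because the random-walk proposal is symmetric, log_q cancels in the acceptance ratio; the asymmetric Langevin and stochastic Newton proposals introduced later do not have this property.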


  • Challenges in large-scale Bayesian inversion

    Method of choice is to sample the posterior density using Markov chain Monte Carlo (MCMC); its growth in the 1980s transformed Bayesian inference

    For inverse problems characterized by high-dimensional parameter spaces and expensive forward simulations, conventional MCMC is prohibitive

    Conventional MCMC methods view the parameter-to-observable map as a black box

    Goals: overcome bottlenecks of MCMC:

    avoid black-box MCMC (might be embarrassingly parallel, but algorithmic scaling is embarrassingly poor!)

    develop specialized MCMC algorithms that reduce the effective problem dimension by exploiting the infinite-dimensional structure of the Hessian

    structure-exploiting algorithms must map well onto extreme-scale systems, and scale independently of parameter dimension, state dimension, data dimension, and number of cores


  • References for ∞-D Bayesian inversion

    J. Martin, L.C. Wilcox, C. Burstedde, and O. Ghattas, A stochastic Newton MCMC method for large-scale statistical inverse problems with application to seismic inversion, SIAM Journal on Scientific Computing, 34(3):A1460–A1487, 2012.

    T. Bui-Thanh, O. Ghattas, J. Martin, and G. Stadler, A computational framework for infinite-dimensional Bayesian inverse problems. Part I: The linearized case, with applications to global seismic inversion, SIAM Journal on Scientific Computing, submitted, 2012.

    T. Bui-Thanh and O. Ghattas, A scalable MAP solver for Bayesian inverse problems with Besov priors, Inverse Problems, submitted, 2012.

    T. Bui-Thanh and O. Ghattas, Analysis of the Hessian for inverse scattering problems. Part III: Inverse medium scattering of electromagnetic waves in three dimensions, Inverse Problems, submitted, 2012.

    T. Bui-Thanh and O. Ghattas, An analysis of infinite dimensional Bayesian inverse shape acoustic scattering and its numerical approximation, SIAM Journal on Numerical Analysis, submitted, 2012.

    T. Bui-Thanh and O. Ghattas, A scaled stochastic Newton algorithm for Markov chain Monte Carlo simulations, SIAM Journal on Uncertainty Quantification, submitted, 2012.

    T. Bui-Thanh and O. Ghattas, Analysis of the Hessian for inverse scattering problems. Part II: Inverse medium scattering of acoustic waves, Inverse Problems, 28(5):055002, 2012.

    T. Bui-Thanh and O. Ghattas, Analysis of the Hessian for inverse scattering problems. Part I: Inverse shape scattering of acoustic waves, Inverse Problems, 28(5):055001, 2012.


  • Outline

    1 Background, motivation, and goals

    2 Langevin MCMC methods and stochastic Newton

    3 Low rank Hessian approximation and scalability

    4 Example: Full waveform global seismic inversion


  • Langevin MCMC (Grenander & Miller, 1994)

    Given the target density π(m), the associated Langevin SDE is given by:

    \mathrm{d}m_t = -A\, \nabla_m(-\log \pi)\, \mathrm{d}t + \sqrt{2}\, A^{1/2}\, \mathrm{d}W_t

    Discretize with timestep Δt to derive a proposal for Metropolis-Hastings:

    m^{\mathrm{prop}}_{k+1} = m_k - A\, \nabla_m(-\log \pi)\, \Delta t + \sqrt{2\Delta t}\, A^{1/2}\, \mathcal{N}(0, I)

    Notes:

    Preconditioner A must be symmetric positive definite

    Process is ergodic (convergence of time averages)

    W_t is an i.i.d. vector of standard Brownian motions

    W_t has independent increments given by W(t + Δt) − W(t) ~ N(0, Δt I)

    See work by A. Stuart, Y. Efendiev, ...
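    A sketch of how the preconditioned Langevin proposal and its density might be coded (assumed helper names, not the talk's software); the proposal density is needed because the proposal is asymmetric, so it does not cancel in the Metropolis-Hastings ratio:

```python
import numpy as np

def langevin_proposal(m, grad_neg_log_pi, A_half, dt, rng):
    """Draw y from the preconditioned Langevin proposal
       y = m - A grad(-log pi)(m) dt + sqrt(2 dt) A^{1/2} N(0, I),
       where A = A_half @ A_half.T is the SPD preconditioner."""
    A = A_half @ A_half.T
    mean = m - dt * (A @ grad_neg_log_pi(m))
    return mean + np.sqrt(2.0 * dt) * (A_half @ rng.standard_normal(m.size))

def log_q_langevin(m_from, m_to, grad_neg_log_pi, A_half, dt):
    """log q(m_from, m_to): the proposal is a Gaussian centered at the drifted
       point with covariance 2 dt A; since it is asymmetric, this term stays
       in the Metropolis-Hastings acceptance ratio."""
    A = A_half @ A_half.T
    mean = m_from - dt * (A @ grad_neg_log_pi(m_from))
    cov = 2.0 * dt * A
    diff = m_to - mean
    return -0.5 * diff @ np.linalg.solve(cov, diff) \
           - 0.5 * np.linalg.slogdet(2.0 * np.pi * cov)[1]
```

    With A = I this reduces to the unpreconditioned Langevin proposal of the Rosenbrock illustration below; with A = H^{-1} and Δt = 1 it becomes the stochastic Newton proposal of the next slide.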


  • Stochastic Newton's method

    Langevin Metropolis-Hastings MCMC proposal given by:

    m^{\mathrm{prop}}_{k+1} = m_k - A\, \nabla_m(-\log \pi)\, \Delta t + \sqrt{2\Delta t}\, A^{1/2}\, \mathcal{N}(0, I)

    Take A to be the inverse of the (local) Hessian and set Δt = 1:

    A = H(m)^{-1} \equiv \left[ \nabla^2_m(-\log \pi(m)) \right]^{-1} = \left( F^T \Sigma_{\mathrm{noise}}^{-1} F + \Sigma_{\mathrm{pr}}^{-1} \right)^{-1} \quad \text{(local covariance matrix)}

    Then we have the stochastic equivalent of Newton's method:

    m^{\mathrm{prop}}_{k+1} = m_k - H^{-1}\, \nabla_m(-\log \pi) + \mathcal{N}(0, H^{-1})

    Often leads to several orders of magnitude reduction in number of samples.

    Details in: J. Martin, L.C. Wilcox, C. Burstedde, and O. Ghattas, A stochastic Newton MCMC method for large-scale statistical inverse problems with application to seismic inversion, SIAM Journal on Scientific Computing, 34(3):A1460–A1487, 2012.

    T. Bui-Thanh and O. Ghattas, A scaled stochastic Newton algorithm for Markov chain Monte Carlo simulations, submitted.
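    A dense toy sketch of one stochastic Newton proposal for the Gaussian-noise/Gaussian-prior posterior above, using the Gauss-Newton Hessian H = F^T Σ_noise^{-1} F + Σ_pr^{-1}; forming and factoring H like this is only feasible in small dimensions (the low-rank machinery later in the talk avoids it), and the linear map F is an assumption for illustration only.

```python
import numpy as np

def stochastic_newton_proposal(m, F, Sigma_noise_inv, Sigma_pr_inv, d_obs, m_pr, rng):
    """One stochastic Newton proposal
         m_prop = m - H^{-1} grad(-log pi)(m) + N(0, H^{-1}),
       with the Gauss-Newton Hessian H = F^T Sigma_noise^{-1} F + Sigma_pr^{-1}.
       F is taken as a fixed Jacobian (i.e. a linear toy map); the dense solve
       and Cholesky factor below are only feasible for small dimensions."""
    grad = F.T @ (Sigma_noise_inv @ (F @ m - d_obs)) + Sigma_pr_inv @ (m - m_pr)
    H = F.T @ Sigma_noise_inv @ F + Sigma_pr_inv
    L = np.linalg.cholesky(np.linalg.inv(H))         # a square root of H^{-1}
    return m - np.linalg.solve(H, grad) + L @ rng.standard_normal(m.size)

# Illustrative use on a small assumed problem.
rng = np.random.default_rng(0)
n, q = 6, 4
F = rng.standard_normal((q, n))
m_prop = stochastic_newton_proposal(np.zeros(n), F, 10.0 * np.eye(q), np.eye(n),
                                    rng.standard_normal(q), np.zeros(n), rng)
print(m_prop)
```

    For a linear map the posterior is exactly Gaussian, the proposal is an independent draw from it, and every proposal is accepted; for nonlinear maps the Metropolis-Hastings accept/reject step corrects for the error of the local Gaussian approximation.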


  • Rosenbrock illustration: Gaussian random walk

    (Figure: random-walk samples on the Rosenbrock density, x-y plane.)

    m^{\mathrm{prop}}_{k+1} = m_k + \mathcal{N}(0, I)


  • Rosenbrock illustration: Unpreconditioned Langevin

    (Figure: unpreconditioned Langevin samples on the Rosenbrock density, x-y plane.)

    m^{\mathrm{prop}}_{k+1} = m_k - \Delta t\, \nabla_m(-\log \pi) + \sqrt{2\Delta t}\, \mathcal{N}(0, I)


  • Rosenbrock illustration: Hessian-preconditioned Langevin

    (Figure: Hessian-preconditioned Langevin samples on the Rosenbrock density, x-y plane.)

    m^{\mathrm{prop}}_{k+1} = m_k - H^{-1}\, \nabla_m(-\log \pi) + \mathcal{N}(0, H^{-1})


  • Convergence comparison: different MCMC methods

    Multivariate potential scale reduction factor (MPSRF) convergence statistic for a 65-parameter problem

    unpreconditioned Langevin vs. stochastic Newton vs. Adaptive Metropolis

  • Outline

    1 Background, motivation, and goals

    2 Langevin MCMC methods and stochastic Newton

    3 Low rank Hessian approximation and scalability

    4 Example: Full waveform global seismic inversion


  • Large-scale local covariance estimates

    Stochastic Newton requires a (local) Gaussian approximation whose covariance is given by the inverse of the Hessian, which is formally a dense operator. Key idea: never form H (every column would require a forward solve); instead:

    recognize that H is the sum of a data misfit term, which is often equivalent to a compact operator, and (the inverse of) a prior, which is often equivalent to a differential operator:

    H = F^T \Sigma_{\mathrm{noise}}^{-1} F + \Sigma_{\mathrm{pr}}^{-1}

    invoke a low rank (truncated spectral decomposition) approximation of the data misfit operator using randomized SVD; often requires a constant number of forward/adjoint solves, independent of problem size

    combine with Sherman-Morrison-Woodbury to invert/factor

    Details in: H.P. Flath, L.C. Wilcox, V. Akcelik, J. Hill, B. van Bloemen Waanders, and O. Ghattas, Fast algorithms for Bayesian uncertainty quantification in large-scale linear inverse problems based on low-rank partial Hessian approximations, SIAM Journal on Scientific Computing, 33(1):407–432, 2011.
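    In the spirit of the randomized SVD mentioned above, a low-rank approximation can be extracted from Hessian-vector products alone; the sketch below (toy synthetic operator, not the talk's implementation) shows the basic single-pass randomized scheme.

```python
import numpy as np

def randomized_eig(apply_H, n, r, p=10, rng=None):
    """Randomized low-rank eigendecomposition of a symmetric PSD operator.

    apply_H(X) must return H @ X for an (n, k) block X, so only matrix-vector
    (here block) products are needed and H is never formed.  Returns the
    leading r eigenpairs; p is oversampling."""
    rng = rng or np.random.default_rng(0)
    Omega = rng.standard_normal((n, r + p))      # random probe block
    Y = apply_H(Omega)                           # one block of operator applies
    Q, _ = np.linalg.qr(Y)                       # orthonormal basis for the range
    T = Q.T @ apply_H(Q)                         # small (r+p) x (r+p) projection
    lam, S = np.linalg.eigh(T)
    lam, S = lam[::-1][:r], S[:, ::-1][:, :r]
    return lam, Q @ S

# Toy check on an exactly rank-5 PSD matrix (assumed, not the talk's Hessian).
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 5))
H = A @ A.T
lam, V = randomized_eig(lambda X: H @ X, n=200, r=5, rng=rng)
print(np.allclose(np.sort(lam), np.sort(np.linalg.eigvalsh(H))[-5:]))   # True
```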


  • Low rank approximation of data misfit Hessian

    \Sigma_{\mathrm{post}} = H^{-1} = \left( F^T \Sigma_{\mathrm{noise}}^{-1} F + \Sigma_{\mathrm{pr}}^{-1} \right)^{-1}

    = \Sigma_{\mathrm{pr}}^{1/2} \left( \Sigma_{\mathrm{pr}}^{1/2} F^T \Sigma_{\mathrm{noise}}^{-1} F\, \Sigma_{\mathrm{pr}}^{1/2} + I \right)^{-1} \Sigma_{\mathrm{pr}}^{1/2}

    \approx \Sigma_{\mathrm{pr}}^{1/2} \left( V_r \Lambda_r V_r^T + I \right)^{-1} \Sigma_{\mathrm{pr}}^{1/2}

    = \Sigma_{\mathrm{pr}}^{1/2} \left[ I - V_r D_r V_r^T + O\!\left( \sum_{i=r+1}^{n} \frac{\lambda_i}{\lambda_i + 1} \right) \right] \Sigma_{\mathrm{pr}}^{1/2}

    where V_r, Λ_r are the truncated eigenvectors/eigenvalues of the prior-preconditioned data misfit Hessian, and D_r = diag(λ_i / (λ_i + 1))


  • Computations required for stochastic Newton

    Never need to form dense Hessian:

    H = \Sigma_{\mathrm{pr}}^{-1/2} \left[ V_r \Lambda_r V_r^T + I \right] \Sigma_{\mathrm{pr}}^{-1/2}

    H^{-1} g = \Sigma_{\mathrm{pr}}^{1/2} \left\{ V_r \left[ (\Lambda_r + I_r)^{-1} - I_r \right] V_r^T + I \right\} \Sigma_{\mathrm{pr}}^{1/2}\, g \quad \text{(Newton step)}

    H^{-1/2} x = \Sigma_{\mathrm{pr}}^{1/2} \left\{ V_r \left[ (\Lambda_r + I_r)^{-1/2} - I_r \right] V_r^T + I \right\} x \quad \text{(drawing a sample)}

    \det(H^{-1/2}) = (\det \Sigma_{\mathrm{pr}})^{1/2} \prod_{i=1}^{r} (\lambda_i + 1)^{-1/2} \quad \text{(accept/reject criterion of M-H)}

    Complexity of these operations is scalable (i.e., requires a number of forward PDE solves that is independent of the parameter dimension) when:

    prior-preconditioned data misfit Hessian is compact with mesh-independent dominant spectrum (theoretical results)

    dominant spectrum is captured in a number of matvecs that is a constant multiple of the number of dominant eigenvalues (e.g., using Lanczos or randomized SVD)

    Hessian-vector products carried out matrix-free using adjoint methods

    square root of the prior, Σ_pr^{1/2}, taken as the inverse of an elliptic operator; fast elliptic solver for computing its action Σ_pr^{1/2} z
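    The identities on the last two slides can be checked on a small dense toy problem (assumed sizes and random data, not the talk's setup): build the prior-preconditioned data misfit Hessian, keep its dominant eigenpairs, and apply H^{-1} g and H^{-1/2} x through the factored expressions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, q, r = 50, 5, 5                      # parameters, observations, retained eigenpairs

# Assumed toy ingredients so everything can be verified densely.
F = rng.standard_normal((q, n))                      # linear parameter-to-observable map
Sigma_noise_inv = 10.0 * np.eye(q)
L = rng.standard_normal((n, n))
Sigma_pr = L @ L.T / n + np.eye(n)                   # SPD prior covariance

# Symmetric square root of the prior (dense here; an elliptic solve at scale).
w, U = np.linalg.eigh(Sigma_pr)
Spr_half = U @ np.diag(np.sqrt(w)) @ U.T

# Prior-preconditioned data misfit Hessian and its dominant eigenpairs.
Hmis = Spr_half @ F.T @ Sigma_noise_inv @ F @ Spr_half
lam, V = np.linalg.eigh(Hmis)
lam, V = lam[::-1][:r], V[:, ::-1][:, :r]            # top-r eigenvalues/vectors

# Newton step H^{-1} g and sample direction H^{-1/2} x via the factored formulas.
g = rng.standard_normal(n)
x = rng.standard_normal(n)
D_inv  = np.diag(1.0 / (lam + 1.0) - 1.0)            # (Lambda_r + I)^{-1}   - I
D_half = np.diag(1.0 / np.sqrt(lam + 1.0) - 1.0)     # (Lambda_r + I)^{-1/2} - I
Hinv_g  = Spr_half @ (V @ (D_inv  @ (V.T @ (Spr_half @ g))) + Spr_half @ g)
Hhalf_x = Spr_half @ (V @ (D_half @ (V.T @ x)) + x)

# Check the Newton step against the dense Hessian; exact here because the data
# misfit Hessian has rank q = r, so the truncation drops only zero eigenvalues.
H = F.T @ Sigma_noise_inv @ F + np.linalg.inv(Sigma_pr)
print(np.allclose(Hinv_g, np.linalg.solve(H, g)))    # True
```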


  • Outline

    1 Background, motivation, and goals

    2 Langevin MCMC methods and stochastic Newton

    3 Low rank Hessian approximation and scalability

    4 Example: Full waveform global seismic inversion


  • Elastic/acoustic wave equations: Governing equations in velocity-strain form

    \frac{\partial E}{\partial t} = \frac{1}{2}\left( \nabla v + \nabla v^T \right) \quad \text{in } B

    \rho \frac{\partial v}{\partial t} = \nabla \cdot \left( \lambda\, \mathrm{tr}(E)\, I + 2\mu E \right) + \rho f \quad \text{in } B

    S n = t_{\mathrm{bc}}(t) \quad \text{on } \partial B

    v = v_0(x) \quad \text{at } t = 0

    E = E_0(x) \quad \text{at } t = 0

    E    strain tensor
    S    stress tensor
    ρ    mass density
    v    displacement velocity
    f    body force per unit mass
    λ, μ  Lamé parameters
    I    identity tensor
    t_bc  traction boundary condition
    v_0, E_0  initial conditions
    t    time
    x    point in the body
    B    solution body

  • Forward discontinuous Galerkin wave propagation solver

    (Figure: non-conforming hexahedral elements coupled through seven mortars M1–M7.)

    nonconforming hexahedral elements with Kopriva's mortar approach for hyperbolic equations; same convergence rate as conforming elements

    tensor product Lagrange basis on the Legendre-Gauss-Lobatto (LGL) nodes

    LGL quadrature (diagonal mass matrix)

    time integration by classical 4-stage RK4

    integrated parallel mesh generation/adaptivity

    L.C. Wilcox, G. Stadler, C. Burstedde, and O. Ghattas, A high-order discontinuous Galerkin method for wave propagation through coupled elastic-acoustic media, Journal of Computational Physics, 229(24):9373–9396, 2010.

    T. Bui-Thanh and O. Ghattas, Analysis of an hp-non-conforming discontinuous Galerkin spectral element method for wave propagation, SIAM Journal on Numerical Analysis, 50(3):1801–1826, 2012.


  • Point source approximation of M9 Tohoku earthquake

    Animation by Greg Abram, TACC


  • Strong scalability of global seismic wave propagation

    Excellent strong scalability on Jaguar for meshing + wave propagation

    # cores    meshing time (s)    wave prop per step (s)    par eff    Tflops (wave)
     32,640          6.32                  12.76               1.00          25.6
     65,280          6.78                   6.30               1.01          52.2
    130,560         17.76                   3.12               1.02         105.5
    223,752

  • Extreme granularity limits for strong scaling of forward DG wave propagation solver on ORNL Cray XK6

    # cores    cpu per step (ms)    elem/core    efficiency (%)
        256          1630.80            4712          100.0
        512           832.46            2356           98.0
       1024           411.54            1178           99.1
       8192            61.69             148           82.6
      65536            11.79              19           54.0
     131072             7.09              10           44.9
     262144             4.07               5           39.2

    table shows wall clock time per time step in ms, elements per core, and parallel efficiency for 3 orders of magnitude increase in core count

    just 1.21 million 3rd order DG elements (694 million unknowns)

    parallel efficiency remains at 39% with just 4 or 5 elements/core
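    For reference, the efficiency column above is just time-per-step scaled by core count relative to the 256-core baseline; a few lines reproduce it:

```python
# Strong-scaling parallel efficiency relative to the 256-core run:
# eff(P) = (t_256 * 256) / (t_P * P); this reproduces the last column above.
runs = [(256, 1630.80), (512, 832.46), (1024, 411.54), (8192, 61.69),
        (65536, 11.79), (131072, 7.09), (262144, 4.07)]
base_cores, base_t = runs[0]
for cores, t in runs:
    print(f"{cores:>7d} cores: efficiency = {100.0 * base_t * base_cores / (t * cores):5.1f} %")
```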

    T. Bui-Thanh, C. Burstedde, O. Ghattas, J. Martin, G. Stadler, and L.C. Wilcox, Extreme-scale UQ for Bayesian inverse problems governed by PDEs, Proceedings of IEEE/ACM SC12, 2012 (Gordon Bell Prize Finalist).


  • Gradient and Hessian for full waveform seismic inversion

    Would like to compute gradients and Hessian actions w.r.t. c of

    J(c) := \frac{1}{2} \int_0^T \!\!\int_\Omega \left( Bv(c) - v_{\mathrm{obs}} \right)^T \Sigma_{\mathrm{noise}}^{-1} \left( Bv(c) - v_{\mathrm{obs}} \right) \mathrm{d}x\, \mathrm{d}t + R_{\mathrm{pr}}(c)

    where the dependence of v on c is given by solving the forward wave propagation equations:

    \rho\, v_t - \nabla(\rho c^2 e) = g \quad \text{in } \Omega \times (0, T)
    e_t - \nabla \cdot v = 0 \quad \text{in } \Omega \times (0, T)
    v = 0,\; e = 0 \quad \text{in } \{t = 0\}
    e = 0 \quad \text{on } \partial\Omega \times (0, T)

    v, e are velocity and strain dilation

    c is the uncertain local wave speed parameter

    ρ and g are known density and seismic source

    v_obs are observations at receivers, B(x) is an observation operator

    Σ_noise is the noise covariance

    R_pr is the prior term involving Σ_prior^{-1}


  • The gradient computation

    Gradient expression with respect to c given by

    G(c) := 2\rho c \int_0^T e\, (\nabla \cdot w)\, \mathrm{d}t + \nabla R_{\mathrm{pr}}(c)

    where v, e satisfy the forward wave propagation equations

    \rho\, v_t - \nabla(\rho c^2 e) = g \quad \text{in } \Omega \times (0, T)
    e_t - \nabla \cdot v = 0 \quad \text{in } \Omega \times (0, T)
    v = 0,\; e = 0 \quad \text{in } \{t = 0\}
    e = 0 \quad \text{on } \partial\Omega \times (0, T)

    and w, d (adjoint velocity, dilation) satisfy the adjoint wave propagation equations

    \rho\, w_t + \nabla(\rho c^2 d) = B^T \Sigma_{\mathrm{noise}}^{-1} (Bv - v_{\mathrm{obs}}) \quad \text{in } \Omega \times (0, T)
    d_t + \nabla \cdot w = 0 \quad \text{in } \Omega \times (0, T)
    w = 0,\; d = 0 \quad \text{in } \{t = T\}
    d = 0 \quad \text{on } \partial\Omega \times (0, T)


  • Computation of action of Hessian in given direction

    Action of the Hessian operator in direction c̃ at a point c given by

    H(c)\tilde{c} := 2\rho \int_0^T \left[ \tilde{c}\, e\, (\nabla \cdot w) + c\, \tilde{e}\, (\nabla \cdot w) + c\, e\, (\nabla \cdot \tilde{w}) \right] \mathrm{d}t + \nabla^2 R_{\mathrm{pr}}(c)\, \tilde{c},

    where ṽ, ẽ satisfy the incremental forward wave propagation equations

    \rho\, \tilde{v}_t - \nabla(\rho c^2 \tilde{e}) = \nabla(2\rho c\, \tilde{c}\, e) \quad \text{in } \Omega \times (0, T)
    \tilde{e}_t - \nabla \cdot \tilde{v} = 0 \quad \text{in } \Omega \times (0, T)
    \tilde{v} = 0,\; \tilde{e} = 0 \quad \text{in } \{t = 0\}
    \tilde{e} = 0 \quad \text{on } \partial\Omega \times (0, T)

    and w̃, d̃ satisfy the incremental adjoint wave propagation equations

    \rho\, \tilde{w}_t + \nabla(\rho c^2 \tilde{d}) = \nabla(2\rho c\, \tilde{c}\, d) - B^T \Sigma_{\mathrm{noise}}^{-1} B \tilde{v} \quad \text{in } \Omega \times (0, T)
    \tilde{d}_t + \nabla \cdot \tilde{w} = 0 \quad \text{in } \Omega \times (0, T)
    \tilde{w} = 0,\; \tilde{d} = 0 \quad \text{in } \{t = T\}
    \tilde{d} = 0 \quad \text{on } \partial\Omega \times (0, T)


  • Application to synthetic global seismic inversion

    invert for anomaly from the radially-varying PREM model (left)

    observations: from the laterally-varying S20RTS model (right)


  • The prior

    (Figure: prior sample fields and the ground truth field.)

    Prior is defined by the square of a generalized anisotropic Poisson operator A := -\nabla \cdot (\Theta \nabla) + \alpha I, with

    \Theta = I_3 - \vartheta(r)\, r r^T, \qquad \vartheta(r) := \frac{1}{r^2}\left( \frac{2}{r} - \frac{1}{r^2} \right) \text{ if } r \neq 0, \qquad \vartheta(r) := 0 \text{ if } r = 0

  • Samples from prior and Gaussianized posterior distributions

    1.07 million uncertain acoustic wave speed parameters

    0.07 Hz maximum frequency, 3rd order DG elements, 630 million wave propagation unknowns, 2400 time steps (1000 s inversion time window)

    up to 100K cores on Jaguar XK6 (single forward solve is 1 minute on 64K cores)

    2000× reduction in problem dimension (488 dominant eigenvectors)

    Top row: Samples from prior

    Bottom row: Samples from the posterior

    Right: true earth model (black dots=5 sources, white dots=100 receivers)


  • Comparison of true model (S20RTS, left) with MAP solution (right)

    black dots = 3 earthquake sources; white dots = 130 receivers



  • MCMC for posterior distribution

    Solving the full UQ problem:

    Repeated Hessian evaluations too expensive for this problem

    Use Gaussian approximation at MAP as a proposal for MCMC

    Accept/Reject framework corrects for errors in approximation

    Sampling performance for a coarser problem (with 78k parameters):

    15,587 MCMC samples (each requires 1 forward PDE solve)

    4399 samples accepted (28%)

    Integrated autocorrelation time of about 16 to 20, giving an effective sample size of about 800 (estimated as in the sketch below)

    Total runtime of about 96 hours on 2048 cores
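    One common way to estimate the integrated autocorrelation time and effective sample size quoted above is from the empirical autocorrelation function of a scalar quantity along the chain; a rough sketch (fixed summation window, which is a simplification of the adaptive windowing used in practice):

```python
import numpy as np

def integrated_autocorrelation_time(x, max_lag=200):
    """Crude estimate of the integrated autocorrelation time tau = 1 + 2*sum_k rho_k
    of a scalar chain; the fixed window max_lag is a simplification of the
    adaptive windowing normally used."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    var = x @ x / x.size
    rho = [(x[:-k] @ x[k:]) / (x.size * var) for k in range(1, max_lag)]
    return 1.0 + 2.0 * sum(rho)

# Synthetic AR(1) chain with correlation 0.9; its exact tau is (1 + 0.9)/(1 - 0.9) = 19.
rng = np.random.default_rng(0)
x = np.zeros(50000)
for k in range(1, x.size):
    x[k] = 0.9 * x[k - 1] + rng.standard_normal()
tau = integrated_autocorrelation_time(x)
print(f"tau ~ {tau:.1f},  effective sample size ~ N / tau ~ {x.size / tau:.0f}")
```

    The synthetic AR(1) chain is only an assumed stand-in, but its autocorrelation time of 19 is in the same range as the 16 to 20 reported on this slide.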


  • Samples and point marginals

    (Figure panels: sample no. 1 from the posterior distribution; pointwise prior variance; pointwise posterior variance.)

  • Samples and point marginals

    (Figure panels: sample no. 2 from the posterior distribution; pointwise prior variance; pointwise posterior variance.)

  • Samples and point marginals

    (Figure panels: sample no. 3 from the posterior distribution; pointwise prior variance; pointwise posterior variance.)

  • Samples and point marginals

    (Figure panels: sample no. 4 from the posterior distribution; pointwise prior variance; pointwise posterior variance.)

  • Spectral decay for refined parameter meshes

    Spectrum of the prior-preconditioned misfit Hessian for the global seismic inversion problem

    (Figure: eigenvalue magnitude vs. index on a logarithmic scale, for discretizations with 40,842, 67,770, and 431,749 parameters.)

    largest 700 eigenvalues of the prior-preconditioned data misfit Hessian for different discretizations


  • Summary and conclusions

    Stochastic Newton MCMC sampling algorithm reduces the number of samples versus conventional MCMC by several orders of magnitude, and makes UQ for Bayesian inverse problems tractable

    Compactness of the local data-misfit Hessian operator provides several orders of magnitude effective dimension reduction without introducing bias

    Randomized SVD extracts a low rank approximation of the data-misfit Hessian in a dimension-independent number of matvecs

    Matrix-free Hessian matvecs implemented through consistent first- and second-order adjoints

    Adaptive discontinuous Galerkin forward/adjoint wave propagation solver scales to 262K cores with a small number of elements per core

    Scalability of the elliptic solve for the action of the prior operator assured by hybrid GMG-AMG on forests of octrees; scalability to 262K cores

    Stochastic Newton MCMC applied to a synthetic inverse problem in 3D global seismology with 1M earth model parameters and 630M forward unknowns, on up to 100K cores, leading to 3 orders of magnitude dimension reduction


  • References

    A. Alexanderian, N. Petra, G. Stadler, and O. Ghattas, A-optimal design for infinite-dimensional Bayesian linear inverse problems with regularized ℓ0-sparsification, 2013.

    N. Petra, J. Martin, G. Stadler, and O. Ghattas, A computational framework for infinite-dimensional Bayesian inverse problems. Part II: Stochastic Newton MCMC with application to ice sheet flow inverse problems, 2013.

    T. Bui-Thanh and O. Ghattas, A scalable MAP solver for Bayesian inverse problems with Besov priors, submitted.

    T. Bui-Thanh and O. Ghattas, A scaled stochastic Newton algorithm for Markov chain Monte Carlo simulations, submitted.

    H. Sundar, G. Biros, C. Burstedde, J. Rudi, and G. Stadler, Parallel geometric-algebraic multigrid on unstructured forests of octrees, Proceedings of IEEE/ACM SC12, 2012.

    T. Bui-Thanh, C. Burstedde, O. Ghattas, J. Martin, G. Stadler, and L.C. Wilcox, Extreme-scale UQ for Bayesian inverse problems governed by PDEs, Proceedings of IEEE/ACM SC12, 2012 (2012 Gordon Bell Prize Finalist).

    T. Bui-Thanh and O. Ghattas, Analysis of the Hessian for inverse scattering problems. Part II: Inverse medium scattering of acoustic waves, Inverse Problems, 28(5):055002, 2012.

    T. Bui-Thanh and O. Ghattas, Analysis of the Hessian for inverse scattering problems. Part I: Inverse shape scattering of acoustic waves, Inverse Problems, 28(5):055001, 2012.

    J. Martin, L.C. Wilcox, C. Burstedde, and O. Ghattas, A stochastic Newton MCMC method for large-scale statistical inverse problems with application to seismic inversion, SIAM Journal on Scientific Computing, 34(3):A1460–A1487, 2012.

    T. Bui-Thanh and O. Ghattas, Analysis of an hp-non-conforming discontinuous Galerkin spectral element method for wave propagation, SIAM Journal on Numerical Analysis, 50(3):1801–1826, 2012.

    T. Isaac, C. Burstedde, and O. Ghattas, Low-cost parallel algorithms for 2:1 octree balance, Proceedings of IPDPS 2012.

    H.P. Flath, L.C. Wilcox, V. Akcelik, J. Hill, B. van Bloemen Waanders, and O. Ghattas, Fast algorithms for Bayesian uncertainty quantification in large-scale linear inverse problems based on low-rank partial Hessian approximations, SIAM Journal on Scientific Computing, 33(1):407–432, 2011.

    C. Burstedde, L.C. Wilcox, and O. Ghattas, p4est: Scalable algorithms for parallel adaptive mesh refinement on forests of octrees, SIAM Journal on Scientific Computing, 33(3):1103–1133, 2011.

    T. Bui-Thanh, O. Ghattas, and D. Higdon, Adaptive Hessian-based non-stationary Gaussian process response surface method for probability density approximation with application to Bayesian solution of large-scale inverse problems, SIAM Journal on Scientific Computing, submitted, 2011.

    L.C. Wilcox, G. Stadler, C. Burstedde, and O. Ghattas, A high-order discontinuous Galerkin method for wave propagation through coupled elastic-acoustic media, Journal of Computational Physics, 229(24):9373–9396, 2010.

    C. Burstedde, O. Ghattas, M. Gurnis, T. Isaac, G. Stadler, T. Warburton, and L.C. Wilcox, Extreme-scale AMR, Proceedings of ACM/IEEE SC10, 2010.


  • Acknowledgements

    Research program supported by:

    NSF CMMI-1028889 (CDI), ARC-0941678 (CDI)

    AFOSR grant FA9550-12-1-0484 (Computational Math)

    DOE grants DE-SC0009286 (MMICCs), DE-SC0006656 (SciDAC), DE-FG02-08ER25860 (ASCR), DE-SC0002710 (SciDAC)

    Resources on ORNL Jaguar Cray XT-5/XK-6 supercomputer provided through ALCC award at ORNL Leadership Computing Facility

    Resources on TACC Lonestar, Longhorn, and Stampede systems provided through awards from TACC and XSEDE


  • Discussion questions

    1 How can big data and big models be integrated to produce betterpredictive models?

    2 What are promising new ideas for exploring high-dimensional space?

    3 What are promising new ideas for quantifying uncertainties inmodeling and simulation?

    4 How can we adapt/reinvent the important algorithms of CS&E so they better map onto high-throughput accelerators? Onto systems with massive numbers of cores?

    5 How can we transform our universities and federal agencies to become more hospitable to cross-cutting research/education at the interfaces of science/engineering, mathematics, statistics, and computing?

