Brief Introduction to (Hierarchical) Bayesian Methods ... Introduction to (Hierarchical) Bayesian...
Transcript of Brief Introduction to (Hierarchical) Bayesian Methods ... Introduction to (Hierarchical) Bayesian...
Brief Introduction to (Hierarchical) Bayesian Methods
Christopher K. WikleDepartment of Statistics
University of Missouri-Columbia
Outline
• Bayesian Statistics
• Hierarchical Bayesian Models
• Examples
– Blending Tropical Surface Winds
– Long-Lead Prediction of SST
– Tornado Report Climatology
• Conclusion
AMS Short Course, January, 2004
1
Geophysical Analysis
Two types of geophysical models?
• Physical:
– Laws of classical physics
– Uncertainties: neglecting higher order terms; choice of space-time averaging (subgrid-scale parameterizations); linking sys-tems (coupling biological processes to weather), etc.
• Statistical:
– Descriptive
– Uncertainties: inefficient in complex settings; data not repre-sentative of full dynamics; application under system changes;sampling; estimation
2
“Physical/Statistical Models”
Better to consider a “spectrum of models” (e.g., Berliner, 2003)
Deterministic ⇐⇒ Stochastic
Bayesian Approach:
• Natural for combining information sources while managing theiruncertainties.
– Multiple data sources
– Uncertain physics
– Physical models with stochastic parameters
– Expert opinion
• Predictive distributions of quantities of interest, conditioned on
observations
3
Distributional Notation
We use the short-hand notation [ ] for probability distribution.
Given random variables A and B, represent:
• Joint distribution of A and B: [A, B]
• Conditional distribution of A given B: [A|B]
• Marginal distribution of A: [A]
Also, note the following are equivalent:
Y |µ, σ2 ∼ N(µ, σ2)
Y = µ + ε, ε ∼ N(0, σ2)
where “∼” means “is distributed as”.
4
Bayesian Viewpoint
Bayes’ Theorem:
[Y |X ] =[Y,X ]
[X ]=
[X|Y ][Y ]∫[X|Y ][Y ]dY
For random variables with known distributions, this is a “fact” fromprobability theory. It is not controversial!
Bayesian Statistics:
Modeling unknown distributions and updating those models based ondata. Might be controversial!
5
Bayesian Statistics
[Y |X ] =[X|Y ][Y ]∫
[X|Y ][Y ]dY∝ [X|Y ][Y ]
• Want to make inference about Y , but we only observe X
• We update our uncertainty about Y after observing X
• [Y ] reflects our prior knowledge of Y
• [X|Y ] is the “likelihood”
• [Y |X ] is the posterior distribution
Bayesian Modeling:
Treat all unknowns as if they are random and evaluateprobabilistically
[Process|Data] ∝ [Data|Process][Process]
6
Simple Example 1
• Temperature observations: T (s1), T (s2), . . . , T (sn)
• These are observations of the true (unknown) process: θ
• We have prior information about θ, say θ0 from a “numerical model”but we don’t believe our model is perfect.
Probabilistically, we may believe:
[Data|Process] : T (si)|θ, σ2 ∼ N(θ, σ2)
[Process] : θ|θ0, τ2 ∼ N(θ0, τ
2)
7
We want the posterior: [Process|Data], i.e.,
[θ|T (s1), . . . , T (sn)] ∝ [T (s1), . . . , T (sn)|θ, σ2][θ|θ0, τ2]
where for simplicity, we assume θ0, τ2, σ2 are known. Then,
θ|T (s1), . . . , T (sn) ∼ N(wT + (1− w)θ0,σ2τ 2
σ2 + nτ 2)
where
T =1
n
n∑i=1
T (si)
w =τ 2
τ 2 + σ2/n.
Note: this is a weighted average of the sample mean and the priormean, where the weights are related to our certainty about the obser-vations and prior.
8
Basic Bayes is Not New to Meteorology
• Epstein, early 1960’s used Bayesian decision making in applied me-teorology
• Weather Modification Research in 1970’s (e.g., Simpson et al. 1973,JAS)
• Data Assimilation can be viewed from Bayesian viewpoint (e.g.,Lorenc 1986, Q. J. Roy. Met. Soc.)
– Kalman filter is Bayesian (e.g., Meinhold and Singpurwalla, 1983)
– Ensemble Kalman Filters (Sequential Monte Carlo); e.g., Evensenand van Leeuwen, 2000, MWR
• Climate Change (e.g., Solow 1988; J. Clim.; Leroy 1998, J. Clim.)
• Satellite Retrieval (many!)
Hierarchical Bayes is new to meteorology!
9
Hierarchical Bayesian Viewpoint
Cornerstone of hierarchical modeling is conditional thinking.
Joint distribution can be represented as products of conditionals. e.g.,
[A, B, C] = [A|B, C][B|C][C]
NOTE: Our choice for this decomposition is based on what we knowabout the process, and what assumptions we are willing and able tomake for simplification:
e.g., conditional independence: [A|B,C]=[A|B].
Often easier to express conditional models than full joint models.
10
Hierarchical Bayesian Modeling of Geophysical Systems
Separate unknowns into two groups:
• process variables: actual physical quantities of interest (e.g., tem-perature, pressure, wind)
• model parameters: quantities introduced in model development(e.g., propagator matrices, measurement error and sub-grid scalevariances, unknown physical constants)
Basic Hierarchical Model
1. [data|process, parameters]
2. [process|parameters]
3. [parameters]
The posterior [process,parameters|data] is proportional to theproduct of these three distributions!
11
Simple Example 1 (cont.)
Say that in Example 1, we believe that the numerical model valueof θ is biased. We do not know exactly the bias, but suspect it isrelated to known factors x ≡ (x1, x2, . . . , xk)
′ (e.g., MOS). Then, ourhierarchical model has the following stages:
• Data Model:
T (si)|θ, σ2 ∼ N(θ, σ2)
• Process Model:
θ|θ0,x, β, τ 2 ∼ N(θ0 + x′β, τ 2)
• Parameter Model:
β|β0,Σ ∼ N(β0,Σ)
We then seek the posterior distribution: [θ, β|T (s1), . . . , T (sn)]
12
Complicated Processes
Usually, we can’t find analytically the normalizing constant in Bayes’theorem. Thus we can’t easily get the posterior distribution.
• Numerical integration (o.k. for low dimensions)
• Use Monte Carlo methods to sample from the posterior. Specifi-cally,
– Markov Chain Monte Carlo (MCMC)
∗ Sample from a Markov Chain that has the same ergodic dis-tribution as the posterior
∗ Don’t need to know normalizing constant
∗ e.g., Metropolis, Gibbs sampler algorithms
∗ revolutionized Bayesian statistics in the late 1980’s
– Very computationally intensive
13
Example 2: Tropical Ocean Winds
Problem: Blending tropical surface winds given high-resolution satel-lite scatterometer observations and low-resolution assimilated modeloutput. (Wikle, Milliff, Nychka, Berliner, 2001; JASA)
14
Wind Problem: Process Model Motivation I
Consider the linear shallow water equations (on an equatorial beta-plane):
∂u
∂t− β0yv + g
∂h
∂x= 0
∂v
∂t+ β0yu + g
∂h
∂y= 0
∂h
∂t+ h(
∂u
∂x+
∂v
∂y) = 0
• β0: constant related to rotation of earth
• g: gravitational acceleration
This linear system can be solved analytically, giving a series of travelingwaves (equatorial normal modes) - Observed in the tropics!
15
Wind Problem: Process/Parameter Model Motivation II
Empirical results suggest turbulent scaling behavior for near surfacetropical winds (e.g., Wikle, Milliff and Large, 1999; JAS)
In particular:
Sv(ω) ∝ σ2v
|ω|κ
where κ = 5/3, and ω is the spatial frequency.
16
Wind Model: Hierarchical Model Sketch (v-Component)
• Data Model: Observations at different resolutions
[Zs′t Za
′t]′ = Ktvt + εt, εt ∼ N(0,Σε)
• Process Model:vt = Φat + γt,
Φ - matrix containing “important” normal mode basis functions(importance determined from empirical studies, e.g., Wheeler andKiladis, 1999)
at - time-varying spectral coefficients
γt - multiresolution spatio-temporal process
• Parameter Models:
at = H(θ)at−1 + ηt, ηt ∼ N(0,Ση)
θ - priors based on equatorial wave theory and empirical studies
γt - priors to suggest turbulent power-law relationships
17
Wind Problem Results: Posterior Decomposition
18
Wind Problem Results: Tropical Cyclone Dale
19
Wind Blending: “Operational”
Tim Hoar, Geophysical Statistics Project, NCAR has made thismodel “operational” on QSCAT data (1/2 deg)
20
Example 3: Long Lead Prediction of SST
Goal: Predict Pacific SST anomalies (2◦ × 2◦ resolution) 7 months inadvance while realistically accounting for uncertainty. [Berliner,Wikle,Cressie, 2000, J. Climate]
Consider data {Z(si; t)} to be anomalies from monthly means.
Want to predict Z(s0; T + τ ); for τ = 7 months, from space-time dataD(T ) ≡ {ZT ,ZT−1, . . . ,Z1}.
21
Data Model
Dimension Reduction:
Zt = Φat + νt,
where
• νt ∼ Gau(0,Σν); measurement error and small-scale spatial vari-ability that is uncorrelated in time
• Φ; truncated empirical orthogonal function basis set, ob-tained from SVD of Z ≡ (Z1, . . . ,ZT ) .
22
Process Model
at+τ = µt + Htat + ηt+τ ,
where, ηt ∼ Gau(0,Ση) for all t.
Critical modeling assumption: Let Ht and µt be dependent on boththe current and future climate regimes:
Ht = H(It, Jt)
µt = µ(It, Jt),
where,
• It - classifies the current regime as “warm” (2), “normal” (1), or“cold” (0)
• Jt - anticipates a transition to one of the three regimes at time t+τ
23
Climate States
Current, It: [Threshold Model, e.g., Tong 1990]
It = 0, if SOIt > low threshold
= 1, if otherwise
= 2, if SOIt < upper threshold,
where SOIt is the Southern Oscillation Index (assumed “known”).
Future, Jt: [Latent (hidden) Process Model]
Jt = 0, if Wt > low threshold
= 1, if otherwise
= 2, if Wt < upper threshold,
where Wt is a latent process which anticipates the future climateregime.
24
Latent Process to Assess “Future”
Wt|βw, σ2w ∼ Gau(x′tβw, σ2
w)
where,
xt = (1, Ut, Utsin(2πmt
12), Utcos(
2πmt
12), U 2
t )′,
where
• Ut - the lowpass filtered E-W component of the wind at 10 metersabove the surface at 5◦ N and 157◦ E. (This is absolutely critical!)
• mt - index of month (0 - 11) at time t
25
Priors on Model Parameters
At next stage of hierarchy:
βw ∼ Gau(βSOI, cβvar(βSOI)),
σ2w ∼ IG(q, r)
where the mean and variance and IG (“Inverse Gamma”) parametersare obtained from fitting
SOIt+τ = x′tβSOI + et
(gives R2 ≈ .7 !)
• vec(Hj), j = 1, 2, 3 - Multivariate Normal distributions
• µj, j = 1, 2, 3 - Multivariate Normal distributions
• Covariance matrices - Wishart distributions
(“Empirical Bayes” priors)
26
Bayesian Prediction
MCMC Implementation
• Probability weighted mixtures:
E(aT+τ |D(T )) =2∑
j=0Pr{JT = j|D(T )}E(µ(IT , j) + H(IT , j)aT |D(T ))
• Pixelwise percentiles of [ΦaT+τ |D(T )]
• Summary measures of El Nino/La Nina (e.g., Nino 3.4 Index) andtheir posterior distributions
27
Results: Nino 3.4 Prediction of 10/97 from 3/97
28
Results: Posterior Predictions for 10/97 and 10/98
29
Results: Component Predictions for 10/98
30
Results: Out of Sample Nino 3.4 Posterior Predictions
31
Example 4: Tornado Counts (Wikle and Anderson, 2003; JGR-A)
Investigate possible dependence between observed tornado counts andclimate indices (ENSO, NAO, NPI).
NWS Storm Reports 1953-2001
32
Modeling Issues
Complicated Variability
• Non-normal data (counts)
• Observational uncertainty in counts
– Uncertainty in procedure for assigning Fujita scale (reliance onhuman observation and variable rating procedure)
– Demographic and technological influences
• Spatial variability in long-term frequency of tornado counts
• Temporal trends that may vary with space
• Exogenous climatological factors with effects that may vary
spatially (e.g., regionally)
33
Data Model
Y (si; t) - tornado count in grid box si in year t
Poisson Data Model
Y (si; t)|λ(si; t) ∼ Poisson(λ(si; t))
assumes conditional independence (i.e., counts are independent condi-tioned on the true Poisson intensity process, λ)
NOTE: In paper, we consider a “zero-inflated Poisson model”.
34
Process Model
log(λ(si; t) = β1(si)X1(t) + . . . + β6(si)X6(t) + η(si; t) + ε(si; t)
where,
• X1 - time
• X2 - ENSO index
• X3 - NPI index
• X4 - NAO index
• X5 - ENSO by NPI interaction
• X6 - ENSO by NAO interaction
• η(si; t) - spatially correlated temporal process (large-scale) [account-ing for other potential climate effects]
• ε(si; t) - uncorrelated random effect [ε(si; t) ∼ N(0, σ2ε )]
35
Parameter Models
βk ∼ N(0,Σ(θk)) - Spatially correlated random fields specified byvariance and dependence parameters (θk).
ηt ∼ N(µ,Σ(θη)) - spatial process, with mean µ and variance anddependence parameters given by θη.
Distributions must also be specified for the “hyper-parameters”: θk,θη, µ, and σ2
ε (see Wikle and Anderson (2003) for details)
36
Posterior Summary: Time Trend (β1) Coefficients
Posterior Mean and Posterior 95% “Credibility”
37
Posterior Summary: ENSO (β2) Coefficients
Posterior Mean and Posterior 95% “Credibility”
38
Posterior Summary: NPI (β3) Coefficients
Posterior Mean and Posterior 95% “Credibility”
39
Some Other Recent Hierarchical Bayesian Examples
• Air-Sea Interaction: (Berliner, Milliff, Wikle, 2003, JGR-O)
• Climate Change: (Berliner, Levine, Shea, 2000, J. Clim.),
• Stochastic Boundary Value Problems: (Wikle, Berliner, Milliff,2003, MWR)
• Nowcasting Radar: (Xu, Wikle, Fox, 2003)
• Hurricane Climatology: (Elsner and Bossak, 2001, J. Clim.)
• Analysis of Rainfall Data: (Sanso and Guenni, 1999, Applied Statis-tics)
40
Conclusion
• Bayesian methods allow one to quantify uncertainty in all aspects ofthe problem (data, process, parameters) and distributional outputreflects the quantification of uncertainty
• Hierarchical Bayesian methods allow one to decompose the probleminto a series of simpler conditional models
– Can accommodate data, model, and parameter uncertainty
– Can accommodate data of different resolution/alignment (e.g.,GIS layers)
– Can accommodate complicated spatial, temporal, and spatio-temporal dependence
– Can accommodate physics (e.g., shallow water PDE; turbulence)
– Can accommodate expert opinion
• Downside: Complicated and computationally intense (canned soft-ware available for free, winBUGS)
41
* Most of the papers discussed here can be found on myweb page:
http://www.stat.missouri.edu/~wikle/
42