Intermediate Bayes
Kerrie Mengersen, QUT Brisbane
ACEMS
2016
Collaborative Centre for Data Analysis,
Modelling and Computation
QUT, GPO Box 2434
Brisbane 4001, Australia
Course Outline
1. Foundations
2. Linear and hierarchical modelling
3. Computational methods
4. Software
5. Case Study
6. Latent variable models
7. Spatial models
8. Bayesian networks
Acknowledgement to Dr Clair Alston, Griffith University, Australia, for some of these course notes.
1. Foundations
• A Bayesian and a Frequentist were to be executed.
The judge asked them what were their last wishes.
The Bayesian replied that he would like to give the
Frequentist one more lecture. The judge granted the
Bayesian's wish and then turned to the Frequentist
for his last wish. The Frequentist quickly responded
that he wished to hear the lecture again and again
and again and again... (Xiao-Li Meng)
p(q|y) = p(y|q) p(q) / p(y)
A brief timeline:
1763 Bayes · 1812 Laplace · 1838 Boole, Venn · 1930s Fisher, Neyman · 1950s Jeffreys · 1980s Geman & Geman · 1990s Gelfand & Smith · 2000s today's Bayesians
(era labels on the original slide: "Probability Theory", "Inverse Probability", "Bayesian Analysis")
Recall Bayes’ Rule
p(A|B) = p(B|A)P(A) / p(B)
p(A | B) = p(A and B) / p(B)
p(B | A) = p(B and A) / p(A)
p(A and B) = p(B and A)
p(B and A) = p(B | A) p(A)
p(A | B) = p(B | A) p(A) / p(B)
Bayes’ Theorem
p(A|B) = p(B|A) p(A) / p(B)
Think of: A = q (unknown parameters, etc.), B = y (known/observed 'data')
So: p(q|y) = p(y|q) p(q) / p(y)
The Reverend Thomas Bayes (1701-61) studied how to compute a distribution for the probability parameter of a binomial distribution (in modern terminology).
Bayesian Modelling
Frequentist approach to modelling
We have some data y, and want to know about q given y
q can be unknown parameters, missing data, latent variables, etc.
Eg 1: observe y "successes" from n trials. What is Pr(success), q?
Eg 2: sample y from N(q,1). What is population mean q?
Frequentist: estimate q through the likelihood: p(y|q)
How likely is y for specified values of q?
Eg: prob. of observing y if y~Bin(n,q=0.3) or y~N(q=1,1)
Solved using moment estimators or maximum likelihood.
But we really want to know about p(q|y)
Bayesian approach to modelling
Example: Estimating a proportion
• From an ecologist: I want to know where koalas might be present. I surveyed 29 sites and 22 have koalas. What is the probability that a koala will be present at a different site in the same area, given this information?
• From a clinician: I want to know about the safety of a medical procedure. I treated 29 patients and 22 survived. What is the probability of survival, given this information?
• What is unobserved? q = probability of success (presence of koalas, survival)
Likelihood
Prior
DAG: Binomial model
Model: y ~ Binomial(n, q), q ~ Beta(a, b)
[DAG: a, b → q → y ← n]
Posterior
Your turn
Binomial example with 22 successes out of 29 trials:
Consider the following priors for q:
Beta(1,1) Beta(9,1) Beta(100,100)
Choose one of these priors:
1. What is the prior mean for q?
2. What is the posterior distribution for q?
3. What is the posterior mean for q?
4. What general conclusions can you make
about the influence of priors and sample size?
Answers
Sample proportion = 22/29 = 0.76
Beta(1,1):
Prior mean = 1/(1+1) = 0.5
Posterior mean = (22+1)/(22+1+7+1) = (22+1)/(29+1+1) = 0.74
Beta(9,1):
Prior mean = 9/(9+1) = 0.90
Posterior mean = (22+9)/(22+9+7+1) = 0.79
Beta(100,100):
Prior mean = (100)/(100+100) = 0.5
Posterior mean = (22+100)/(22+100+7+100) = 0.53
Your turn
Simulate and plot the density for the likelihood and these sets of
priors and posteriors.
Sample code
# calculate the likelihood
# p(y=22|theta) = Bin(n=29, theta)
y=22
n=29
theta=c(0,0.001,seq(0.01,0.99,0.01),0.999,1)
lik=dbinom(x=y,size=n,prob=theta)
lik=lik/sum(lik)
plot(theta,lik,type="l",ylab="prob",ylim=c(0,0.2))
# calculate the prior
# p(theta) = Beta(1,1)
# change to Beta(9,1), Beta(100,100) later
a1=1; b1=1
prior1=dbeta(theta,a1,b1)
prior1=prior1/sum(prior1)
lines(theta,prior1,col=2,lty=2)
# calculate the corresponding posterior
post1=dbeta(theta,y+a1,n-y+b1)
post1=post1/sum(post1)
lines(theta,post1,col=2,lty=1)
Influences on posterior
• The posterior mean is a compromise between the prior mean and the data.
• The stronger the prior, the more weight the prior has in the posterior.
• The larger the sample size, the more weight the likelihood has in the posterior.
Conjugate priors
• It might be reasonable to expect the posterior distribution to be of the same form as the prior distribution. This is the principle of conjugacy.
• A conjugate prior for a Binomial likelihood is a Beta distribution: the posterior is then also a Beta distribution.
Conjugate priors
Dynamic Updating
If we obtain more data, we do not have to redo all of the analysis: our posterior from the first analysis simply becomes our prior for the next analysis.
Binomial example:
Stage 0. Prior p(q) ~ Beta(1,1); ie E(q)=0.5.
Stage 1. Observe y=22 presences from 29 sites.
Likelihood: p(y|q)~Bin(n=29, q)
Posterior: p(q|y)~Beta(23,8); ie E(q|y) = 0.74
Stage 2: Observe 5 more presences from 10 sites.
Likelihood: p(y|q)~Bin(n=10, q);
Prior p(q)~Beta(23,8);
Posterior p(q|y)~Beta(28,13); ie E(q|y) = 0.68.
Your turn
Confirm that the dynamic updating method described in the previous slide gives the same outcome as analysing all of the data together.
Data: y = 22 + 5 = 27 presences from n = 29 + 10 = 39 sites (all data pooled)
Prior: p(q) ~ Beta(1,1)
Posterior: p(q|y) ~ Beta(1+27, 1+12) = Beta(28,13)
Posterior mean: 28/41 = 0.68, the same as the two-stage analysis
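A minimal R check of this equivalence (not part of the original notes):
# stage-wise updating: Beta(1,1) -> Beta(23,8) -> Beta(28,13)
a <- 1; b <- 1
a <- a + 22; b <- b + (29 - 22)   # after stage 1: Beta(23, 8)
a <- a + 5;  b <- b + (10 - 5)    # after stage 2: Beta(28, 13)
c(a, b, a / (a + b))              # 28 13 0.68
# all data at once: Beta(1 + 27, 1 + 12) = Beta(28, 13) -- identical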
Example: Estimating a normal mean
n observations Y = (y1,..,yn) from a normal distribution,
unknown mean m, known variance s2
Normal Model
Normal model: unknown mean, unknown variance
s2 ~ Inverse Gamma(a,b)
s ~ Uniform(a,b)
s ~ Half Cauchy(a,b)
What do these ‘look like’?
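One way to see what they look like is to plot them in R; a minimal sketch (the parameter values a = 2, b = 1 and the plotting ranges are illustrative choices, not from the notes):
x <- seq(0.01, 10, by = 0.01)
# Inverse Gamma(a, b) density for s2, computed directly
a <- 2; b <- 1
inv_gamma <- b^a / gamma(a) * x^(-a - 1) * exp(-b / x)
# Uniform(0, 10) for s
unif <- dunif(x, 0, 10)
# Half-Cauchy(0, scale = 1) for s: fold a Cauchy at zero
half_cauchy <- 2 * dcauchy(x, location = 0, scale = 1)
plot(x, inv_gamma, type = "l", ylim = c(0, 1), ylab = "density")
lines(x, unif, col = 2, lty = 2)
lines(x, half_cauchy, col = 4, lty = 3)
legend("topright", c("Inverse Gamma(2,1) on s2", "Uniform(0,10) on s",
                     "Half-Cauchy(0,1) on s"), col = c(1, 2, 4), lty = 1:3)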
Linear regression
Linear regression: priors
Linear regression: Posterior
Model Comparison
• Bayes factors, posterior odds, BIC, DIC
• Reversible jump MCMC, Birth and death MCMC
• Model averaging
Bayes factors
• Consider models M1 and M2 (not necessarily nested)
• Choose a model based on its posterior probability given the data. This is proportional to the prior probability of the model multiplied by the likelihood of the model given the data. So we consider:
p(M2|y) ∝ p(M2) p(y|M2)
Bayes factors
To compare M2 versus M1:
p(M2|y) / p(M1|y) = {p(M2) / p(M1)} × {p(y|M2) / p(y|M1)}
• The second term (the ratio of marginal likelihoods) is termed the Bayes factor B21. This is similar to a likelihood ratio, but p(y|M) is integrated over the parameters instead of maximised: e.g., p(y|M1) = ∫ p(y|M1,q1) p(q1) dq1
• 2log(B21) gives same scale as usual deviance and LR statistics.
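For the beta-binomial model the marginal likelihood is available in closed form, so a Bayes factor can be computed exactly. A small R sketch for the koala data, treating two of the earlier priors as competing "models" (an illustrative choice, not from the notes):
# marginal likelihood p(y | M) for y successes in n trials under a Beta(a, b) prior:
# p(y | M) = choose(n, y) * B(y + a, n - y + b) / B(a, b)
marg_lik <- function(y, n, a, b) {
  choose(n, y) * beta(y + a, n - y + b) / beta(a, b)
}
y <- 22; n <- 29
B21 <- marg_lik(y, n, 9, 1) / marg_lik(y, n, 1, 1)  # M2: Beta(9,1) vs M1: Beta(1,1)
B21            # the Bayes factor
2 * log(B21)   # on the same scale as the deviance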
Guidelines for Bayes Factors (arbitrary!)
B21       | 2log(B21) | Interpretation
< 1       | negative  | Supports M1
1 to 3    | 0 to 2    | Weak support for M2
3 to 20   | 2 to 6    | Supports M2
20 to 150 | 6 to 10   | Strong evidence for M2
> 150     | > 10      | Very strong support for M2
Bayesian Information Criterion (BIC)
• Approximate the Bayes factor
• Under some assumptions, if p is the dimension of the model and n is the number of observations:
BIC = log p(y | q*, M) − (p/2) log n
where q* is the maximum likelihood estimate. For a linear regression this can be rewritten (up to constants) as
BIC = n log(1 − R²) + p log(n)
Discussion of BIC
• BIC penalises models which improve fit at the expense of more parameters (encourages parsimony).
• A problem is that the true dimensionality (number of parameters p) of the model is often not known, and also that the number of parameters may increase with sample size n.
• Can approximate using the effective number of parameters (Spiegelhalter et al., 1999).
• Alternatives are DIC (deviance information criterion, calculated in WinBUGS), conditional posterior predictive probabilities, etc.
Markov chain Monte Carlo
• “Decompose” joint posterior distribution into a sequence of conditional distributions – these are often much simpler (eg, simple univariate normals, etc)
• Simulate from each conditional distribution in turn, using a simulation method that forms a Markov chain (each new simulated value depends only on the previous value). This gives a sequence of simulated values q(1), q(2), …, q(i), … which converges to the target, so the resulting simulations come from the required joint distribution.
• We can use Markov chain theory to make statements about behaviour and convergence of the chain
Computational Algorithms
• Gibbs sampling: sample from the conditionals themselves
• Metropolis-Hastings: sample from an “easy” distribution and accept those values that conform to the conditional distribution
• Lots of variations: reversible jump, slice sampling, particle filters, perfect sampling, adaptive rejection sampling, etc
• Need to ensure conditions, eg detailed balance, reversibility
• Approximations: Variational Bayes (VB), Approximate Bayesian Computation (ABC), Sequential Monte Carlo (SMC)
Gibbs sampling
Suppose we have a joint posterior p(q1, q2 | y, …)
0. Choose starting values q1(0), q2(0)
1. At iteration i:
   Sample q1(i) from p(q1 | q2(i-1), y, ...)
   Sample q2(i) from p(q2 | q1(i), y, ...)
2. Repeat step 1 many times
3. Make inferences based on simulated values
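A minimal Gibbs sketch in R for a normal sample with unknown mean and variance, using semi-conjugate priors (the data and prior settings are illustrative, not from the notes):
set.seed(1)
y <- rnorm(50, mean = 2, sd = 1.5)   # simulated data
n <- length(y)
# priors: mu ~ N(m0, v0), sigma2 ~ Inverse-Gamma(a0, b0)
m0 <- 0; v0 <- 100; a0 <- 1; b0 <- 1
niter <- 5000
mu <- numeric(niter); sig2 <- numeric(niter)
mu[1] <- mean(y); sig2[1] <- var(y)
for (i in 2:niter) {
  # mu | sigma2, y is normal
  v1 <- 1 / (1 / v0 + n / sig2[i - 1])
  m1 <- v1 * (m0 / v0 + sum(y) / sig2[i - 1])
  mu[i] <- rnorm(1, m1, sqrt(v1))
  # sigma2 | mu, y is inverse gamma
  sig2[i] <- 1 / rgamma(1, a0 + n / 2, b0 + sum((y - mu[i])^2) / 2)
}
c(mean(mu[-(1:1000)]), mean(sig2[-(1:1000)]))  # posterior means after burn-in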
Exercise
Data yi|l ~ Poisson(l), i=1,…,m
yi|f ~ Poisson(f), i=m+1,…,n
Priors l ~ Gamma(a,b)
f ~ Gamma(c,d)
m is discrete over {1,…, n}
a, b, c, d known constants.
1. What is the posterior distribution of interest?
2. What are the conditional distributions of interest?
3. Design and implement a Gibbs sampler for this problem.
(Alston et al. 2013, Case Studies in Bayesian Statistical Modelling and Analysis, p. 19)
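One possible sketch of such a Gibbs sampler in R (simulated data and unit values for a, b, c, d are illustrative; m is restricted to 1,…,n−1 so both segments contain data). The conditionals are l | … ~ Gamma(a + Σ_{i≤m} yi, b + m), f | … ~ Gamma(c + Σ_{i>m} yi, d + n − m), and m | … is sampled from its discrete conditional:
set.seed(2)
n <- 60; true_m <- 25
y <- c(rpois(true_m, 3), rpois(n - true_m, 8))  # simulated data with a changepoint
a <- b <- c <- d <- 1                           # known prior constants
niter <- 5000
lam <- phi <- mm <- numeric(niter)
mm[1] <- n %/% 2
for (t in 2:niter) {
  m <- mm[t - 1]
  lam[t] <- rgamma(1, a + sum(y[1:m]), b + m)
  phi[t] <- rgamma(1, c + sum(y[-(1:m)]), d + (n - m))
  # discrete conditional for m (computed on the log scale for stability)
  logp <- sapply(1:(n - 1), function(k)
    sum(dpois(y[1:k], lam[t], log = TRUE)) +
    sum(dpois(y[-(1:k)], phi[t], log = TRUE)))
  p <- exp(logp - max(logp))
  mm[t] <- sample(1:(n - 1), 1, prob = p)
}
c(mean(lam[-(1:1000)]), mean(phi[-(1:1000)]), median(mm[-(1:1000)]))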
2. Linear and Hierarchical Modelling
Exercise
Video: Bayesian Methods Interpret Data Better
Monday, November 12, 2012
(Based on the book "Doing Bayesian Data Analysis" by John Kruschke)
http://doingbayesiandataanalysis.blogspot.com.au/2012/11/video-bayesian-methods-interpret-data.html
Goal: estimate the underlying probability of getting a hit by each player, based on their hits H, at bats AB, and primary position.
Linear regression: US crime data
Likelihood
Prior
Prior
Gibbs sampler
Weakly informative prior
Posterior estimates
Non-informative Priors
Posterior estimates
Posterior distributions
Model checks
Non-informative Priors
Model comparison
Prior predictive density
Model comparison
Linear mixed models
Linear mixed models
Linear mixed models
Bayesian analysis
Priors
Gibbs sampler
Computational caution
Frequentist analysis using lm() in R
Bayesian analysis using MCMCglmm() in R
MCMC output
Posterior distributions for fixed effects
Difference in maternal heights?
Posterior distributions for random child effects
Hierarchical normal model
Posterior distribution
Example: Maths scores from students at 100 schools
Results
Spiralling Whitefly (Aleurodicus dispersus)
Countries where spiralling whitefly has been detected. Administrative regions within some countries are shown when documented. Sources: CABI 2004, Monteiro et al. 2005, CABI 2006; personal communications J.H. Martin 2008, B.M. Waterhouse 2008.
The Problem
Major tropical plant pest
Lives on more than 100 host plants
Restricts market access to other states
Information
Literature: Characteristics,growth, spread
Detectability (inspectors)
Surveillance data (> 30 000 records)
Scope of modelling
Local, district and statewide
• Data Model: Pr(data | incursion process and data parameters) – How data is observed given underlying pest extent
• Process Model: Pr(incursion process | process parameters) – Potential extent given epidemiology / ecology
• Parameter Model: Pr(data and process parameters)– Prior distribution to describe uncertainty in detectability, exposure, growth …
• The posterior distribution of the incursion process (and parameters) is related to the prior distribution and data by:
Pr(process, parameters | data) ∝ Pr(data | process, parameters) × Pr(process | parameters) × Pr(parameters)
Hierarchical Bayesian model
Early Warning Surveillance
Priors
Surveillance data
Posterior learning:
• modest reduction in area freedom
• large reduction in estimated extent
• residual "risk" maps to target surveillance
Invasion Parameter Estimates
Useful for local management
Observation parameter estimates
Also learn about:
• Host suitability
• Inspector efficiency
3. Computational Methods
Computational Methods
• MCMC algorithms: Gibbs sampling, Metropolis-Hastings, slice sampling
• Approximations: INLA, Variational Bayes, ABC
Metropolis sampling
Often we can't simulate from the target distribution p directly. Instead, simulate from an "easy" (proposal) distribution q and accept the values based on the correct distribution p.
• Conditional (target) distribution p(q | ...)
• Proposal distribution q(q), symmetric around q, e.g. U(q-1, q+1) or N(q, 1)
• Suppose we have q(i-1) and we want q(i)
• Simulate a candidate q* from the proposal distribution q
• Accept q(i) = q* with probability
  a = min{ 1, p(q* | ...) / p(q(i-1) | ...) }
  Otherwise let q(i) = q(i-1)
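A minimal random-walk Metropolis sketch in R (not from the notes). It targets the normal-mean posterior used in the "Your turn" exercise later in this section, assuming the prior N(1, 2) is parameterised by its variance:
set.seed(3)
# unnormalised log posterior: N(1, 2) prior x N(theta, 5) likelihood (variances assumed)
y <- c(1, 3, 2, 0, 2)
log_post <- function(theta)
  dnorm(theta, 1, sqrt(2), log = TRUE) + sum(dnorm(y, theta, sqrt(5), log = TRUE))
niter <- 10000
theta <- numeric(niter)
theta[1] <- 0
for (i in 2:niter) {
  prop <- rnorm(1, theta[i - 1], 1)            # symmetric N(theta, 1) proposal
  log_a <- log_post(prop) - log_post(theta[i - 1])
  if (log(runif(1)) < log_a) theta[i] <- prop  # accept
  else theta[i] <- theta[i - 1]                # reject: keep the current value
}
mean(theta[-(1:1000)])   # posterior mean estimate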
Hastings sampler
If the proposal q(q) is not symmetric, the acceptance probability becomes:
Accept q* with probability:
a = min{ 1, [ p(q* | ...) q(q(i-1) | q*) ] / [ p(q(i-1) | ...) q(q* | q(i-1)) ] }
Acceptance Rate
For a Metropolis-Hastings algorithm:
• High acceptance rates are desirable if the proposal density approximates the target closely enough to ensure uniform ergodicity.
• Lower acceptance rates are preferable if a random walk is adopted (e.g. 40-70% acceptance, or less for higher-dimensional problems).
Expectation-Maximisation (EM) Algorithm
• Start with a plausible guess q(0) for q.
• At stage t, suppose the current guess is q(t).
  E-step: compute the expectation of the log-likelihood, Q = E[log p(q|y)], at q = q(t)
  M-step: find the value q(t+1) that maximises Q
Slice Sampling
Sample from a distribution by sampling uniformly from the region under the plot of its density function. We alternately sample uniformly in the vertical direction given the horizontal position, then sample a new horizontal position from the horizontal 'slice' at the new vertical position.
Radford Neal (Annals of Statistics, 31, 705-767)
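A minimal univariate slice sampler sketch in R. For simplicity the slice is found by rejection within the known support rather than Neal's stepping-out procedure, which is valid when the support is bounded (the target density is an illustrative choice):
set.seed(4)
# target: unnormalised Beta(3, 2) density on (0, 1)
f <- function(x) x^2 * (1 - x)
niter <- 10000
x <- numeric(niter); x[1] <- 0.5
for (i in 2:niter) {
  u <- runif(1, 0, f(x[i - 1]))   # vertical step: uniform under the density
  repeat {                        # horizontal step: uniform on the slice {x : f(x) > u},
    xp <- runif(1, 0, 1)          # found by rejection within the known support (0, 1)
    if (f(xp) > u) break
  }
  x[i] <- xp
}
mean(x)   # compare with the Beta(3, 2) mean of 0.6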
Why slice sampling?
• Slice sampling methods can be more efficient than Gibbs, are easily implemented for univariate distributions, and can be used to sample from a multivariate distribution by updating each variable in turn.
• Slice sampling can adaptively choose the magnitude of the changes made, which makes it attractive for routine and automated use.
• Methods that update all variables simultaneously are also possible.
Hybrid Methods
Employ combinations of MCMC algorithms in a single analysis:
• different MCMC algorithms for different parameters
• insert an MH step with larger dispersion or acceptance probability every nth iteration
• mode-jumping proposals
• methods based on tempering
• methods based on regeneration
These can be almost automatically constructed to ensure uniform convergence to the target distribution.
Delayed rejection
Tierney and Mira (1999); Green and Mira (2001)
A. Propose a move for qk.
B. Accept with the usual M-H probability.
C. If rejected, propose a new move for qk and accept with a probability that accounts for the fact that the first move was rejected.
D. If rejected, repeat C as required or until a stopping rule is met.
Sequential Methods
Particle filters are used in sequential settings or for processing and analysis of large datasets. A particle filter describes a dynamic state-space model of a process with an underlying state of interest that evolves over time. The posterior distribution of the state is approximated by a set of weighted particles, with weights proportional to the posterior probability mass. Numerous such algorithms have been proposed, with good convergence properties.
Your turn
• Suppose that you want to estimate the mean q of a Normal distribution with known variance s2=5.
• The prior distribution on q is N(1,2).
• You observe data y=(1,3,2,0,2)
• Design a Metropolis algorithm to estimate the posterior mean.
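Since this model is conjugate, the exact posterior is available to check your sampler against. A minimal sketch, again assuming the "2" in N(1, 2) is a variance:
y <- c(1, 3, 2, 0, 2)
n <- length(y); s2 <- 5         # known data variance
m0 <- 1; v0 <- 2                # prior mean and (assumed) prior variance
post_prec <- 1 / v0 + n / s2    # precisions add
post_var  <- 1 / post_prec
post_mean <- post_var * (m0 / v0 + sum(y) / s2)
c(post_mean, post_var)          # 1.4 and 2/3: compare with the Metropolis output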
Approximations
• INLA: http://www.math.ntnu.no/inla/r-inla.org/papers/jss/lindgren.pdf
Approximations
• Variational Bayes: https://en.wikipedia.org/wiki/Variational_Bayesian_methods
Approximations
• ABC: difficult or impossible (intractable) likelihood, but you can simulate from it? No problem…
• Assume we have some observed data Y
• Simulate a value of q from a prior distribution
• Simulate "pseudo-data" Y* from p(Y|q)
• Accept q if Y* is "sufficiently close" to the observed data Y
https://www.youtube.com/watch?v=nKCT-Cdk0xY
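A minimal ABC rejection sketch in R for the earlier binomial example (the tolerance and number of simulations are illustrative choices):
set.seed(5)
y_obs <- 22; n <- 29
N <- 100000
theta <- runif(N)                           # prior draws: theta ~ U(0,1) = Beta(1,1)
y_sim <- rbinom(N, size = n, prob = theta)  # pseudo-data from p(y | theta)
keep <- abs(y_sim - y_obs) <= 1             # accept if "sufficiently close"
mean(theta[keep])                           # approximates the Beta(23, 8) posterior mean 0.74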
4. Software
Bayesian Software
BUGS: Bayesian inference Using Gibbs Sampling
• WinBUGS: Windows version of BUGS
http://www.mrc-bsu.cam.ac.uk/software/bugs/
• OpenBUGS: open-source BUGS
http://www.openbugs.net
Can use WinBUGS or OpenBUGS directly or call from R, Matlab, etc
Bayesian Software
Interfacing R and WinBUGS
• R2WinBUGS
• BRugs
Bayesian analysis via R
Many packages are now available: see http://cran.r-project.org/web/views/Bayesian.html
Example: bayesm
rhierLinearModel: Gibbs sampler for a hierarchical linear model
runireg: Gibbs sampler for a univariate linear model
Example: MCMCpack
MCMCregress: MCMC for regression
MCMClogit: MCMC for logistic regression
Example: mcsm
gibbsmix: mixture modelling
INLA:
http://www.r-inla.org/
Bayesian Software
Other stand-alone software:
• JAGS
http://mcmc-jags.sourceforge.net/
• Stan: can interface with R, Matlab, etc.
http://mc-stan.org/
• First Bayes, etc
Bayesian Software
Routines in other software such as SAS, Stata
Bespoke software using R, Fortran, C, Python, etc
In-class introductions
• WinBUGS
• R to WinBUGS
• R packages
• STAN
Your turn
Using the software of your choice, analyse the student maths scores data.
Final exercise:
How many testimonials of a miracle would I need to have sufficient evidence to claim “It’s a miracle”?
5. Case Study
Case Study
Notes will be distributed in class
6. Latent Variable Models
Mixture Models
The observed values are observations from a mixture of distributions.
E.g., for a mixture of K = 3 Normals with q = (m, s):
y ~ p1 N(m1, s1²) + p2 N(m2, s2²) + p3 N(m3, s3²)
Eg, phenotypes from
3 genotypes: qq, qQ, QQ
Bayesian normal mixture model
y ~ Σj=1:K pj N(mj, sj²)
m ~ Normal
s ~ Uniform
p ~ Dirichlet(a1,.., aK)
The p's are the 'weights' assigned to each component. A conjugate prior for the p's is a Dirichlet distribution (an extension of the Beta):
f(p; a) ∝ Πj pj^(aj − 1)
where Π is a 'product' sign, i.e. multiply over j = 1,…,K; setting aj = 1 for all j gives the Uniform. The a's are usually set and read in as part of the data.
Latent variable approach
• Associate with each yi another variable Ti that identifies the component of the mixture to which that yi belongs. (Note that we don't observe the T's.)
• We can then 'break down' the likelihood: given the allocations, it is just a univariate problem,
yi | Ti = k ~ N(mk, sk²)
• A typical prior for Ti is the multinomial or categorical distribution, Ti ~ Cat(p1,…,pK)
Gibbs sampling for mixtures
0. Initialisation: Choose p(0) and q(0) arbitrarily
For t=1,…
1.1 Allocate observations to components:
Generate T(t) for each observation
1.2 Generate new weights for the components:
Generate p(t)
1.3 Generate new parameters for each component:
Generate q(t)
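A minimal Gibbs sketch in R for a two-component normal mixture. For brevity the component variances are fixed at 1 and label switching is ignored; the data and priors are illustrative:
set.seed(6)
y <- c(rnorm(100, 0, 1), rnorm(100, 4, 1))  # simulated two-component data
n <- length(y); K <- 2
niter <- 2000
mu <- matrix(0, niter, K); p <- matrix(0.5, niter, K)
mu[1, ] <- c(-1, 5)
for (t in 2:niter) {
  # 1.1 allocate observations to components
  d1 <- p[t - 1, 1] * dnorm(y, mu[t - 1, 1], 1)
  d2 <- p[t - 1, 2] * dnorm(y, mu[t - 1, 2], 1)
  z <- rbinom(n, 1, d2 / (d1 + d2)) + 1       # T_i in {1, 2}
  # 1.2 new weights: Dirichlet(1 + n1, 1 + n2) conditional, via gammas
  nk <- tabulate(z, K)
  g <- rgamma(K, 1 + nk); p[t, ] <- g / sum(g)
  # 1.3 new component means: N(0, 100) prior, known sigma = 1
  for (k in 1:K) {
    v <- 1 / (1 / 100 + nk[k])
    m <- v * sum(y[z == k])
    mu[t, k] <- rnorm(1, m, sqrt(v))
  }
}
colMeans(mu[-(1:500), ])   # posterior means of the component means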
Deep brain stimulation for Parkinson's Disease
Placement of electrodes in the subthalamic nucleus. Electrical current improves symptoms, in particular motor function.
Example: Parkinson’s Disease
With Zoe van Havre, Nicole White and Judith Rousseau
Microelectrode recordings
Identify spikes and assign them to an unknown number of source neurons. Compare clusters between segments within a recording, and between recordings at different locations of the brain (3 depths).
Extracted waveforms
Waveforms were analysed at 3 depths (Y1, Y2, Y3).
Each recording was divided into 2.5sec segments.
Discriminating features were found via PCA.
Finite mixture models (FMM)
For a FMM with K components, with data y1:n = {y1,…,yn}
yi | p, q ~ Σk=1:K pk f(yi | qk)
Prior on p
p ~ Dirichlet(a1,…,aK)
Hidden Markov model (HMM)
Prior on the rows of the transition matrix Q:
Q(k, ·) ~ Dirichlet(a1,…,aK), with ak > 0, k = 1,…,K
Yt | Xt = j, with Xt ∈ {1,…,K*}
Back to Parkinson’s Disease
Extracted waveforms
Two Mixture Models
• Multivariate Normal
For P PCs, yi = (yi1,…,yiP) follows a MVN distribution:
p(y | q, p) = Πi=1:n Σk=1:K* pk N_P(mk, Σk)
• Dirichlet Process Mixture
yi | qi ~ f(yi | qi)
qi | G ~ G, with G ~ DP(m G0)
m ~ Gamma(1,1)
Results: How many clusters?
Results: What do the groups look like?
https://arxiv.org/pdf/1602.01915.pdf
Changes in waveforms over time
https://arxiv.org/pdf/1602.01915.pdf
Mixture modelling in R
• mcsm
• bayesm
• LaplacesDemon (Byron Hall)
• bayesmix (uses JAGS)
• Bmix – for stick-breaking mixtures
• rjags – interface to JAGS MCMC, has module for mixtures
7. Spatial Models
Spatial Models
• Disease mapping: estimate the true relative risk of a disease of interest across a geographical study area
• Disease clustering: assess whether a disease map is clustered and where the clusters are located; disease incidence around a putative hazard source (focused clustering)
• Ecological analysis: analysis of the geographical distribution of disease in relation to explanatory covariates, usually at an aggregated spatial level
Basic Model: assumptions
• The distribution of observed counts of disease within an area:
yi ~ Poisson(eiqi), qi = constant RR
• Can add fixed effects, e.g. spatial trend or long-range variation over the study area; fit to area centroids with x1 = easting, x2 = northing:
qi = exp(b0 + b1 x1i + b2 x2i), i.e. qi = exp(Xi b)
• Can add random effects, e.g. variation in individual susceptibility (frailty); variation due to unmodelled covariates (overdispersion); error in interpolation of a spatial covariate to the locations of case events or area centroids; spatial autocorrelation
• Add priors to parameters (e.g. qi)
Basic Bayesian Model
• Likelihood
yi ~ Poisson(eiqi)
• Prior
qi ~ Gamma(a,b)
• Posterior
qi | yi, a, b ~ Gamma(yi + a, ei + b)
E(qi | yi, a, b) = (yi + a) / (ei + b)
(could map these expectations)
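A small R sketch of the resulting smoothing (the a, b, yi and ei values are illustrative, loosely echoing the Scottish lip cancer example later in this section):
# Gamma(a, b) prior on relative risk; posterior mean (y + a) / (e + b)
a <- 1; b <- 1
y <- c(9, 39, 0)        # illustrative observed counts
e <- c(1.4, 8.7, 1.8)   # illustrative expected counts
smr       <- y / e                 # raw SMRs: 6.43, 4.48, 0.00
post_mean <- (y + a) / (e + b)     # smoothed: 4.17, 4.12, 0.36
rbind(smr, post_mean)              # extreme SMRs shrink towards the prior mean a/b = 1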
Multilevel Model
Model with area-level random effect:
yi ~ Poisson(mi)
mi = ei qi
qi = exp(b0 + ni), with random effects ni ~ N(0, sn²)

Extend to areas (ij) within regions (j):
yij ~ Poisson(mij)
log(mij) = log(eij) + b0 + nj + uij
nj ~ N(0, sn²), uij ~ N(0, su²)
Spatial analysis using GeoBUGS
Spatial modelling in WinBUGS
Introduction to GeoBUGS
Example: spatial mapping
Example: environmental epidemiology
Example: forest ecology
Example: spatial regression
GeoBUGS Inputs
• Regions are numbered sequentially from 1 to n
• Each region is defined as a polygon in a map file
• Each region is associated with a unique index
• GeoBUGS can import map files from ArcInfo, EpiMap, S-Plus.
Displaying a Map with GeoBUGS
• Compile model, load data, load initial conditions
• Set sample monitor on desired variables
• Set trace
• Set sample monitor on map variable OR set summary monitor on map variable
• Run chain
• Activate map tool, load appropriate map
• Set cut points, colour spectrum as desired.
Example: Spatial disease modelling
Rates of lip cancer in 56 counties in Scotland.
The data include the observed and expected cases (expected numbers based on the population and its age and sex distribution in the county), a covariate measuring the percentage of the population engaged in agriculture, fishing or forestry, and the "position" of each county expressed as a list of adjacent counties.
Example: Spatial disease modelling
County | Obs. cases Oi | Exp. cases Ei | % in agric. xi | SMR   | Adjacent counties
1      | 9             | 1.4           | 16             | 652.2 | 5, 9, 11, 19
2      | 39            | 8.7           | 16             | 450.3 | 7, 10
...    | ...           | ...           | ...            | ...   | ...
56     | 0             | 1.8           | 10             | 0.0   | 18, 24, 30, 33, 45, 55
The CAR Model
Smooth the raw SMRs by fitting a random-effects Poisson model:
Oi ~ Poisson(mi)
log mi = log Ei + a0 + a1 xi + bi
• a0: intercept, the baseline (log) relative risk of disease across the study region
• xi: percentage of the population engaged in agriculture, fishing or forestry in district i, with associated regression coefficient a1
• bi: random effect representing latent (unobserved) risk factors
The CAR Model
To allow for spatial dependence between the random effects bi in nearby areas, assume a CAR prior:
bi | b(-i) ~ N(mi, si²)
mi = weighted sum of neighbouring b's / number of neighbours
si² = s² / number of neighbours
(weights set equal to 1)
Use the car.normal distribution to fit this model.
car.normal(adj[], weights[], num[], prec[])
model {
  # Likelihood
  for (i in 1:N) {
    O[i] ~ dpois(mu[i])
    log(mu[i]) <- log(E[i]) + alpha0 + alpha1 * X[i]/10 + b[i]
    RR[i] <- exp(alpha0 + alpha1 * X[i]/10 + b[i])  # area-specific relative risk (for maps)
  }
  # CAR prior distribution for random effects:
  b[1:N] ~ car.normal(adj[], weights[], num[], tau)
  for (k in 1:sumNumNeigh) {
    weights[k] <- 1
  }
  # Other priors:
  alpha0 ~ dflat()
  alpha1 ~ dnorm(0.0, 1.0E-5)
  tau ~ dgamma(0.5, 0.0005)  # prior on precision
  sigma <- sqrt(1 / tau)     # standard deviation
}
Example: Kriging in GeoBUGS
Diggle and Ribeiro (2000)
The data file contains the variables height, x and y, giving surface elevation at each of 52 locations (x, y) within a 310-foot square. The unit of distance is 50 feet; the unit of height is 10 feet.
Example: Kriging in GeoBUGS
A Gaussian kriging model can be fitted to these data using either the spatial.exp or spatial.disc distributions.
The data file also contains a set of locations x.pred and y.pred representing a 15 x 15 grid of points at which we wish to predict surface elevation. Predictions can be obtained using either the spatial.pred or spatial.unipred predictive distributions in WinBUGS 1.4
model {
  # Spatially structured multivariate normal likelihood
  height[1:N] ~ spatial.exp(mu[], x[], y[], tau, phi, kappa)  # exponential correlation function
  # height[1:N] ~ spatial.disc(mu[], x[], y[], tau, alpha)    # disc correlation function
  for (i in 1:N) {
    mu[i] <- beta
  }
  # Priors
  beta ~ dflat()
  tau ~ dgamma(0.001, 0.001)
  sigma2 <- 1 / tau
  # priors for spatial.exp parameters
  phi ~ dunif(0.05, 20)      # prior range for correlation at min distance (0.2 x 50 ft) is 0.02 to 0.99;
                             # at max distance (8.3 x 50 ft) it is 0 to 0.66
  kappa ~ dunif(0.05, 1.95)
  # prior for spatial.disc parameter
  # alpha ~ dunif(0.25, 48)  # prior range for correlation at min distance (0.2 x 50 ft) is 0.07 to 0.96;
                             # at max distance (8.3 x 50 ft) it is 0 to 0.63
  # Spatial prediction
  # Single-site prediction
  for (j in 1:M) {
    height.pred[j] ~ spatial.unipred(beta, x.pred[j], y.pred[j], height[])
  }
  # Only use joint prediction for a small subset of points, due to the length of time it takes to run
  for (j in 1:10) {
    mu.pred[j] <- beta
  }
  height.pred.multi[1:10] ~ spatial.pred(mu.pred[], x.pred[1:10], y.pred[1:10], height[])
}
Example: Mapping cancer
Does "place" impact on cancer survival?
Bayesian spatial modelling
Poisson likelihood for the number of cases per SLA:
yi ~ Poisson(li)
log(li) = Xi b + ui + vi
ui ~ spatial CAR prior
vi ~ Normal(0, s²)
What did we learn?
Your turn
1. Open the GeoBUGS manual (in the Map menu).
2. Choose the Examples option and read the Scottish lip cancer example.
3. Run this example and plot your results on the map provided in GeoBUGS.
4. Choose the Lung Cancer example or Kriging example and run this. Write a short summary of your results.
8. Bayesian Networks
Bayesian Networks From science to management
The policy questions
What is the overall scientific consensus about the drivers of lyngbya?
What management actions should be taken to reduce lyngbya in Moreton Bay, Australia?
Scientific drivers of lyngbya: a Bayesian Network approach
• Bring together disparate scientific knowledge
• Create a ‘conceptual map’ of the scientific drivers
• Quantify the map with data, model outputs, expert knowledge, etc
• Identify key drivers
• Explore scenarios of change
• Understand impact of management and policy decisions
From concept to quantification
[Figure: conceptual BN linking factors F1–F7 to a Target node. Example node states: open / moderate cover / thick cover; 0-10 / 10-20 / 20-30; true / false; high / medium / low.]
Constructing a BN - CPTs
[Figure: quantified BN with marginal probability tables for each node. Nodes and state probabilities include: Temperature (Low 49.5 / High 50.5; 19.6 ± 9), Light Quantity (Optimal 20.0 / SubOptimal 80.0), Light Quality (Poor 10.0 / Borderline 40.0 / High 50.0), Wind Direction (North 21.0 / SE 24.0 / Other 55.0), Wind Speed (Low 59.9 / High 40.1), Ground Water Amount (Low 73.1 / High 26.9), Rain - Present (Low 62.0 / Medium 26.0 / High 12.0; 142 ± 190), Dissolved Fe Concentration (Low 56.7 / High 43.3), Dissolved P Concentration (Low 62.1 / High 37.9; 199 ± 300), Dissolved N Concentration (Low 49.6 / High 50.4), Dissolved Organics (Low 51.0 / High 49.0), Sediment Nutrient Climate (NonReducing 58.4 / Reducing 41.6), Available Nutrient Pool, dissolved (Enough 33.6 / Not enough 66.4), Land Run-off Load (Low 51.6 / High 48.4), Tide (Spring 50.0 / Neap 50.0), Bottom Current Climate (Low 48.0 / High 52.0), Turbidity (Low 45.4 / High 54.6), Light Climate (Inadequate 71.3 / Adequate 28.7; 20.7 ± 12), Point Sources (Low 26.3 / Medium 30.1 / High 43.7), No. of previous dry days (Low 10.0 / Medium 50.0 / High 40.0; 75.6 ± 110), Air (Low 57.4 / High 42.6), Particulates, Nutr (Low 45.1 / High 54.9; 2.8 ± 3.3). INITIATION MODEL target node, Bloom Initiation: No 76.4 / Yes 23.6.]
Most influential factors
1. Available Nutrient Pool
2. Bottom Current Climate
3. Sediment Nutrients
4. Dissolved Iron
5. Dissolved Phosphorus
6. Light
7. Temperature
MANAGEMENT ACTIONS
“What-if” scenarios
Factor | Change in P(Bloom) (%)
Available Nutrient Pool | 77 (3% - 80%)
Bottom Current Climate | 28 (15% - 43%)
Sediment Nutrient Climate | 17 (21% - 38%)
Dissolved Fe | 16 (21% - 37%)
Dissolved P | 15 (23% - 38%)
Light Climate | 14 (18% - 32%)
Temperature | 14 (21% - 35%)
Dissolved N | 13 (22% - 35%)
Rain – present | 10 (25% - 35%)
Light Quantity | 9 (21% - 30%)
From Science to Management
[Diagram: the Science BN Model produces P(Bloom Initiation), feeding evaluation of factors and scenario assessment into the Management Model; integration of information and adaptive updates flow back to the science model.]
Other Applications of BNs
• Cheetah conservation in Southern Africa
• Airports
• Integrated asset management
• Resource management
• Recycled water and health
• Import risk
• Hospital infection
• PhD completion
Viability of wild cheetah population in Namibia
Namibia (Marker, 2002)
Biological Factors Sub-network
Human Factors Sub-network
Ecological Factors Sub-network
The airport as a complex system
– Development of a complex-systems modelling framework for airport planning, design and operations decision support
– Multiple stakeholders
– Multiple objectives: security, efficiency, passenger experience
– Uncertainty
Airport surveillance – BN based on customs documentation
BN subnetworks
BN quantification
Node | Meeting or exceeding target | Below target
Arrival Concourse Dwell Time | 0.59 | 0.41
Secondary Examination Area Dwell Time | 0.83 | 0.17
Entry Control Point Dwell Time | 0.78 | 0.22
Baggage Hall Dwell Time | 0.81 | 0.19
Overall Passenger Facilitation Time | 0.77 | 0.23
Hospital infections
• Huge cost to human life
• Cost millions of dollars per annum
• Increasing in virulence and location
• Many partial mathematical and simulation models
• Complex network of interacting factors involved
Data-based Bayesian Network
[Figure: data-based BN with nodes including Ward Outliers, Operating Theatre Cancellations/Deferrals, Overcrowding, Screening, MRO Isolates, Handwashing Compliance, MRSA Isolates, Isolation Ward Overflow, MRO Prevalence, Staffing, MRSA Prevalence, January/February, Staff per 1000 OBD, and % Casual staff.]
Five most important factors influencing Pr(infection)
• Isolates
• Overcrowding
• Handwashing Compliance
• Isolation Ward Overflow
• MRSA Prevalence
Counter-intuitive results! New insights
Last example! What factors affect
successful on-time completion of PhDs?
• Bair & Hanworth (2005, meta-analysis of 160 references): funding,
socialization, positive and supportive mentor relationships.
• Maher, Ford and Thompson (2004): availability of funding resources, the nature
of the advising relationship, the extent to which students receive research
preparation and opportunities, student concerns about marital, family or health
problems.
• Seagram, Gould and Pyke (1998): gender, discipline, supportive relationship,
financial situation and enrolment status.
• Council of Graduate Schools Ph.D. Completion Program (2009, conducted in
the USA and Canada): selection, mentoring, financial support, program
environment, research mode of the field, and processes and procedures
Other contributing factors
• Muszynski and Akamatsu (1991, A Procrastination Inventory): demographic and situational variables, including a supportive advisor, finding a topic of interest, making the dissertation a top priority and living close to the university
• Kearns, Gardiner and Marshall (2009): psychological factors, e.g. self-sabotaging behaviour due to over-committing, procrastination and perfectionism.
• Cohort partnerships and groups and peer-to-peer support (Devenish, Dyer and Jefferson, 2009); race (Ellis, 2001); type of attendance (Rodwell, 2008); gender (Maher, Ford and Thompson, 2004)
PhD completion is a complex process
Bayesian Network approach – statistics students:
• A1 – former domestic PhD students
• A2 – current domestic PhD students
• A3 – current international PhD students
• B1 – current PhD supervisors
Three questions
1. What is the overall perceived probability of timely completion of a PhD in Statistics at QUT?
2. What factors were most influential in timely completion, and how do these differ between the four groups?
3. What is the change in the probability of timely completion under specified scenarios?
Factors
Time Management Skills, Discipline Expertise, Maths Skills, Writing Skills, English Skills, Incoming Skills, Domestic Circumstance, Emotional State, Continuity of Study, Personal Circumstance, Financial Circumstance, Attitude, Personal Aspects, Other PhD Students, Researchers, Peers, Enrolment, Study Location, Research Environment, Library Access, Physical, General Research Experience, Computer Access, Resources Available, Interest, Written Research Type, Expertise in Topic, Student History, Access, Supervision Experience, Student-Supervisor History, Supervisor, Research Niche, Previous Experience, Topic, Research Project
Results
Most important factors:
Across all participants, the four factors that were considered to most directly influence timely completion were: personal aspects, the research environment, the research project, incoming skills.
Overall probability of on-time completion:
Network | Probability of completion in 3.5 years
B1 | 70%
A1 | 68%
A3 | 72%
A2 | 79%
Project 2: International students
ALTC project (Prasad Yarlagadda, Karen Woodman): A holistic model for research supervision of international students in engineering and information technology disciplines.
• Over 12% of international and NESB postgraduate research students in Australian universities enrol in engineering and IT. They face technical and scientific challenges; cultural, social and religious isolation; and linguistic barriers. Existing supervisory frameworks are not fully assisting these students and their supervisors.
BN: Student survey
Results: Student surveyOverall probabilities
• Overall student perception of supervisor: 0.79
• Overall Supervisor attributes: 0.83
• Overall Student obligations: 0.91.
Most influential variables
• Supervisor attributes and Student obligations influencing the Overall perception of supervisor
• Demographics and Course preparation influencing the Personal profile
• Qualifications affecting Course preparation
• Age and Sex affecting Personal Demographics
BN: supervisor survey
Results: supervisor survey
• The overall probability of a High score on the target node, Supervisor’s Perception of a Successful Student, was 0.46.
• The overall probability of a High score for the node Successful Student, General was 0.58
• The overall probability of a High score for the node Successful Student, CALD was 0.48
• The overall probability of a High score for the node Supervisor’s Attitudes was 0.33
• The most influential variables were the factors leading to Supervisor’s Perception of Successful CALD Students, Supervisor’s Experience with CALD Students, and Supervisor’s International Experience.
So...
Almost 30 years ago, Abedi and Benkin (1984) described research into reasons contributing to timely completion of degrees as “charitably sparse” (p.4). Twenty years later, Maher, Ford and Thompson (2004) argued that empirical research in this field could still be described as such. There has been considerable literature on the topic in the intervening years, and we hoped that our study would contribute to our growing understanding of timely completion as a complex system.
So: why use Bayesian Networks to model complex systems?
1. Flexible way of conceptualising and quantifying complex systems
2. Able to assess relative impact of factors and evaluate scenarios (‘what-if’)
3. Can incorporate diverse sources of information
4. Can be updated as new information is gathered
5. Transparency and consistency
Your turn
• Install and load the Bayesian Network software package GeNIe
• Use GeNIe to create a BN of your own choice.
Conclusion: advantages of Bayesian Models
• Give probabilistic inferences about unknown variables based directly on their (posterior) distributions
• Allow formal combination of diverse sources of information through priors
• Address Uncertainty and Complexity
• Modular/hierarchical model structure facilitates description of complex systems
• Can facilitate iterative updating of opinion based on new information
Conclusion: caveats
• Bayesian modelling is not a panacea for bad data: ‘garbage in, garbage out’ still holds.
• Bayesian models are not built overnight. They require care with planning, inputs, sensitivity, outputs.
• Bayesian models do depend on prior information: this is good and bad. (Be careful that they are not simply self-fulfilling or a replacement for quality data.)
• Understand what the models do and can do!
Course Feedback
Please comment and provide a score on a scale of 1 (poor) – 5 (great) on the following:
1. Course administration / organisation
2. Course presentation
3. Course materials
4. Course content
5. Course pitch – too high? too low?
Please comment on the following general questions:
1. What was your overall opinion of the course?
2. How might the course have been improved?
3. Has the course motivated you to use Bayesian statistics in practice?