Intermediate Bayes
Kerrie Mengersen, QUT Brisbane
ACEMS
2016
Collaborative Centre for Data Analysis,
Modelling and Computation
QUT, GPO Box 2434
Brisbane 4001, Australia
Course Outline
1. Foundations
2. Linear and hierarchical modelling
3. Computational methods
4. Software
5. Case Study
6. Latent variable models
7. Spatial models
8. Bayesian networks
Acknowledgement to Dr Clair Alston, Griffith University, Australia, for some of these course notes.
1. Foundations
• A Bayesian and a Frequentist were to be executed.
The judge asked them what were their last wishes.
The Bayesian replied that he would like to give the
Frequentist one more lecture. The judge granted the
Bayesian's wish and then turned to the Frequentist
for his last wish. The Frequentist quickly responded
that he wished to hear the lecture again and again
and again and again... (Xiao-Li Meng)
p(q|y) = p(y|q) p(q) / p(y)
A brief timeline:
1763 Bayes · 1812 Laplace · 1838 Boole, Venn · 1930s Fisher, Neyman · 1950s Jeffreys · 1980s Geman & Geman · 1990s Gelfand & Smith · 2000s today's Bayesians
(era labels on the original slide: "Probability Theory", "Inverse Probability", "Bayesian Analysis")
Recall Bayes’ Rule
p(A|B) = p(B|A)P(A) / p(B)
p(A | B) = p(A and B) / p(B)
p(B | A) = p(B and A) / p(A)
p(A and B) = p(B and A)
p(B and A) = p(B | A) p(A)
p(A | B) = p(B | A) p(A) / p(B)
Bayes’ Theorem
p(A|B) = p(B|A) p(A) / p(B)
Think of: A = q (unknown parameters, etc.), B = y (known/observed 'data')
So: p(q|y) = p(y|q) p(q) / p(y)
The Reverend Thomas Bayes (1701-61) studied how to compute a distribution for the probability parameter of a binomial distribution (in modern terminology).
Bayesian Modelling
Frequentist approach to modelling
We have some data y, and want to know about q given y
q can be unknown parameters, missing data, latent variables, etc.
Eg 1: observe y "successes" from n trials. What is Pr(success), q?
Eg 2: sample y from N(q,1). What is population mean q?
Frequentist: estimate q through the likelihood: p(y|q)
How likely is y for specified values of q?
Eg: prob. of observing y if y~Bin(n,q=0.3) or y~N(q=1,1)
Solved using moment estimators or maximum likelihood.
But we really want to know about p(q|y)
Bayesian approach to modelling
Example: Estimating a proportion
• From an ecologist: I want to know where koalas might be present. I surveyed 29 sites and 22 have koalas. What is the probability that a koala will be present at a different site in the same area, given this information?
• From a clinician: I want to know about the safety of a medical procedure. I treated 29 patients and 22 survived. What is the probability of survival, given this information?
• What is unobserved? q = probability of success (presence of koalas, survival)
Likelihood
Prior
DAG: Binomial model
Model: y ~ Binomial(n, q), q ~ Beta(a, b)
[DAG: a, b → q → y ← n]
Posterior
Your turn
Binomial example with 22 successes out of 29 trials:
Consider the following priors for q:
Beta(1,1) Beta(9,1) Beta(100,100)
Choose one of these priors:
1. What is the prior mean for q?
2. What is the posterior distribution for q?
3. What is the posterior mean for q?
4. What general conclusions can you make
about the influence of priors and sample size?
Answers
Sample proportion = 22/29 = 0.76
Beta(1,1):
Prior mean = 1/(1+1) = 0.5
Posterior mean = (22+1)/(22+1+7+1) = (22+1)/(29+1+1) = 0.74
Beta(9,1):
Prior mean = 9/(9+1) = 0.90
Posterior mean = (22+9)/(22+9+7+1) = 0.79
Beta(100,100):
Prior mean = (100)/(100+100) = 0.5
Posterior mean = (22+100)/(22+100+7+100) = 0.53
Your turn
Simulate and plot the density for the likelihood and these sets of
priors and posteriors.
Sample code
# calculate the likelihood
# p(y=22|theta) = Bin(n=29, theta)
y=22
n=29
theta=c(0,0.001,seq(0.01,0.99,0.01),0.999,1)
lik=dbinom(x=y,size=n,prob=theta)
lik=lik/sum(lik)
plot(theta,lik,type="l",ylab="prob",ylim=c(0,0.2))
# calculate the prior
# p(theta) = Beta(1,1)
# change to Beta(9,1), Beta(100,100) later
a1=1; b1=1
prior1=dbeta(theta,a1,b1)
prior1=prior1/sum(prior1)
lines(theta,prior1,col=2,lty=2)
# calculate the corresponding posterior
post1=dbeta(theta,y+a1,n-y+b1)
post1=post1/sum(post1)
lines(theta,post1,col=2,lty=1)
Influences on posterior
• The posterior mean is a compromise between the prior mean and the data.
• The stronger the prior, the more weight the prior has in the posterior.
• The larger the sample size, the more weight the likelihood has in the posterior.
Conjugate priors
• It might be reasonable to expect the posterior distribution to be of the same form as the prior distribution. This is the principle of conjugacy.
• A conjugate prior for a Binomial likelihood is a Beta distribution: the posterior is then also a Beta distribution.
Conjugate priors
Dynamic Updating
If we obtain more data, we do not have to redo all of the analysis: our posterior from the first analysis simply becomes our prior for the next analysis.
Binomial example:
Stage 0. Prior p(q) ~ Beta(1,1); ie E(q)=0.5.
Stage 1. Observe y=22 presences from 29 sites.
Likelihood: p(y|q)~Bin(n=29, q)
Posterior: p(q|y)~Beta(23,8); ie E(q|y) = 0.74
Stage 2: Observe 5 more presences from 10 sites.
Likelihood: p(y|q)~Bin(n=10, q);
Prior p(q)~Beta(23,8);
Posterior p(q|y)~Beta(28,13); ie E(q|y) = 0.68.
Your turn
Confirm that the dynamic updating method described in the previous slide gives the same outcome as analysing all of the data together.
Data: y = 22 + 5 = 27 presences from n = 29 + 10 = 39 sites (all data pooled)
Prior: p(q) ~ Beta(1,1)
Posterior: p(q|y) ~ Beta(1+27, 1+12) = Beta(28,13)
Posterior mean: 28/41 = 0.68, the same as the two-stage analysis
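A minimal R check of this equivalence (not part of the original notes):
# stage-wise updating: Beta(1,1) -> Beta(23,8) -> Beta(28,13)
a <- 1; b <- 1
a <- a + 22; b <- b + (29 - 22)   # after stage 1: Beta(23, 8)
a <- a + 5;  b <- b + (10 - 5)    # after stage 2: Beta(28, 13)
c(a, b, a / (a + b))              # 28 13 0.68
# all data at once: Beta(1 + 27, 1 + 12) = Beta(28, 13) -- identical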
Example: Estimating a normal mean
n observations Y = (y1,..,yn) from a normal distribution,
unknown mean m, known variance s2
Normal Model
Normal model: unknown mean, unknown variance
s2 ~ Inverse Gamma(a,b)
s ~ Uniform(a,b)
s ~ Half Cauchy(a,b)
What do these ‘look like’?
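One way to see what they look like is to plot them in R; a minimal sketch (the parameter values a = 2, b = 1 and the plotting ranges are illustrative choices, not from the notes):
x <- seq(0.01, 10, by = 0.01)
# Inverse Gamma(a, b) density for s2, computed directly
a <- 2; b <- 1
inv_gamma <- b^a / gamma(a) * x^(-a - 1) * exp(-b / x)
# Uniform(0, 10) for s
unif <- dunif(x, 0, 10)
# Half-Cauchy(0, scale = 1) for s: fold a Cauchy at zero
half_cauchy <- 2 * dcauchy(x, location = 0, scale = 1)
plot(x, inv_gamma, type = "l", ylim = c(0, 1), ylab = "density")
lines(x, unif, col = 2, lty = 2)
lines(x, half_cauchy, col = 4, lty = 3)
legend("topright", c("Inverse Gamma(2,1) on s2", "Uniform(0,10) on s",
                     "Half-Cauchy(0,1) on s"), col = c(1, 2, 4), lty = 1:3)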
Linear regression
Linear regression: priors
Linear regression: Posterior
Model Comparison
• Bayes factors, posterior odds, BIC, DIC
• Reversible jump MCMC, Birth and death MCMC
• Model averaging
Bayes factors
• Consider models M1 and M2 (not necessarily nested)
• Choose a model based on its posterior probability given the data. This is proportional to the prior probability of the model multiplied by the likelihood of the model given the data. So we consider:
p(M2|y) ∝ p(M2) p(y|M2)
Bayes factors
To compare M2 versus M1:
p(M2|y) / p(M1|y) = {p(M2) / p(M1)} × {p(y|M2) / p(y|M1)}
• The second term (the ratio of marginal likelihoods) is termed the Bayes factor B21. This is similar to a likelihood ratio, but p(y|M) is integrated over the parameters instead of maximised: e.g., p(y|M1) = ∫ p(y|M1,q1) p(q1) dq1
• 2log(B21) gives same scale as usual deviance and LR statistics.
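For the beta-binomial model the marginal likelihood is available in closed form, so a Bayes factor can be computed exactly. A small R sketch for the koala data, treating two of the earlier priors as competing "models" (an illustrative choice, not from the notes):
# marginal likelihood p(y | M) for y successes in n trials under a Beta(a, b) prior:
# p(y | M) = choose(n, y) * B(y + a, n - y + b) / B(a, b)
marg_lik <- function(y, n, a, b) {
  choose(n, y) * beta(y + a, n - y + b) / beta(a, b)
}
y <- 22; n <- 29
B21 <- marg_lik(y, n, 9, 1) / marg_lik(y, n, 1, 1)  # M2: Beta(9,1) vs M1: Beta(1,1)
B21            # the Bayes factor
2 * log(B21)   # on the same scale as the deviance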
Guidelines for Bayes Factors (arbitrary!)
B21       | 2log(B21) | Interpretation
< 1       | negative  | Supports M1
1 to 3    | 0 to 2    | Weak support for M2
3 to 20   | 2 to 6    | Supports M2
20 to 150 | 6 to 10   | Strong evidence for M2
> 150     | > 10      | Very strong support for M2
Bayesian Information Criterion (BIC)
• Approximate the Bayes factor
• Under some assumptions, if p is the dimension of the model and n is the number of observations:
BIC = log p(y | q*, M) − (p/2) log n
where q* is the maximum likelihood estimate. For a linear regression this can be rewritten (up to constants) as
BIC = n log(1 − R²) + p log(n)
Discussion of BIC
• BIC penalises models which improve fit at the expense of more parameters (encourages parsimony).
• A problem is that the true dimensionality (number of parameters p) of the model is often not known, and also that the number of parameters may increase with sample size n.
• Can approximate using the effective number of parameters (Spiegelhalter et al., 1999).
• Alternatives are DIC (deviance information criterion, calculated in WinBUGS), conditional posterior predictive probabilities, etc.
Markov chain Monte Carlo
• “Decompose” joint posterior distribution into a sequence of conditional distributions – these are often much simpler (eg, simple univariate normals, etc)
• Simulate from each conditional distribution in turn, using a simulation method that forms a Markov chain (each new simulated value depends only on the previous value). This gives a sequence of simulated values q(1), q(2), …, q(i), … which converges to the target, so the resulting simulations come from the required joint distribution.
• We can use Markov chain theory to make statements about behaviour and convergence of the chain
Computational Algorithms
• Gibbs sampling: sample from the conditionals themselves
• Metropolis-Hastings: sample from an “easy” distribution and accept those values that conform to the conditional distribution
• Lots of variations: reversible jump, slice sampling, particle filters, perfect sampling, adaptive rejection sampling, etc
• Need to ensure conditions, eg detailed balance, reversibility
• Approximations: Variational Bayes (VB), Approximate Bayesian Computation (ABC), Sequential Monte Carlo (SMC)
Gibbs sampling
Suppose we have a joint posterior p(q1, q2 | y, …)
0. Choose starting values q1(0), q2(0)
1. At iteration i:
   Sample q1(i) from p(q1 | q2(i-1), y, ...)
   Sample q2(i) from p(q2 | q1(i), y, ...)
2. Repeat step 1 many times
3. Make inferences based on simulated values
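A minimal Gibbs sketch in R for a normal sample with unknown mean and variance, using semi-conjugate priors (the data and prior settings are illustrative, not from the notes):
set.seed(1)
y <- rnorm(50, mean = 2, sd = 1.5)   # simulated data
n <- length(y)
# priors: mu ~ N(m0, v0), sigma2 ~ Inverse-Gamma(a0, b0)
m0 <- 0; v0 <- 100; a0 <- 1; b0 <- 1
niter <- 5000
mu <- numeric(niter); sig2 <- numeric(niter)
mu[1] <- mean(y); sig2[1] <- var(y)
for (i in 2:niter) {
  # mu | sigma2, y is normal
  v1 <- 1 / (1 / v0 + n / sig2[i - 1])
  m1 <- v1 * (m0 / v0 + sum(y) / sig2[i - 1])
  mu[i] <- rnorm(1, m1, sqrt(v1))
  # sigma2 | mu, y is inverse gamma
  sig2[i] <- 1 / rgamma(1, a0 + n / 2, b0 + sum((y - mu[i])^2) / 2)
}
c(mean(mu[-(1:1000)]), mean(sig2[-(1:1000)]))  # posterior means after burn-in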
Exercise
Data yi|l ~ Poisson(l), i=1,…,m
yi|f ~ Poisson(f), i=m+1,…,n
Priors l ~ Gamma(a,b)
f ~ Gamma(c,d)
m is discrete over {1,…, n}
a, b, c, d known constants.
1. What is the posterior distribution of interest?
2. What are the conditional distributions of interest?
3. Design and implement a Gibbs sampler for this problem.
(Alston et al. 2013, Case Studies in Bayesian Statistical Modelling and Analysis, p. 19)
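One possible sketch of such a Gibbs sampler in R (simulated data and unit values for a, b, c, d are illustrative; m is restricted to 1,…,n−1 so both segments contain data). The conditionals are l | … ~ Gamma(a + Σ_{i≤m} yi, b + m), f | … ~ Gamma(c + Σ_{i>m} yi, d + n − m), and m | … is sampled from its discrete conditional:
set.seed(2)
n <- 60; true_m <- 25
y <- c(rpois(true_m, 3), rpois(n - true_m, 8))  # simulated data with a changepoint
a <- b <- c <- d <- 1                           # known prior constants
niter <- 5000
lam <- phi <- mm <- numeric(niter)
mm[1] <- n %/% 2
for (t in 2:niter) {
  m <- mm[t - 1]
  lam[t] <- rgamma(1, a + sum(y[1:m]), b + m)
  phi[t] <- rgamma(1, c + sum(y[-(1:m)]), d + (n - m))
  # discrete conditional for m (computed on the log scale for stability)
  logp <- sapply(1:(n - 1), function(k)
    sum(dpois(y[1:k], lam[t], log = TRUE)) +
    sum(dpois(y[-(1:k)], phi[t], log = TRUE)))
  p <- exp(logp - max(logp))
  mm[t] <- sample(1:(n - 1), 1, prob = p)
}
c(mean(lam[-(1:1000)]), mean(phi[-(1:1000)]), median(mm[-(1:1000)]))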
2. Linear and Hierarchical Modelling
Exercise
Video: Bayesian Methods Interpret Data Better
Monday, November 12, 2012
(Based on the book "Doing Bayesian Data Analysis" by John Kruschke)
http://doingbayesiandataanalysis.blogspot.com.au/2012/11/video-bayesian-methods-interpret-data.html
Goal: estimate the underlying probability of getting a hit by each player, based on their hits H, at bats AB, and primary position.
Linear regression: US crime data
Likelihood
Prior
Prior
Gibbs sampler
Weakly informative prior
Posterior estimates
Non-informative Priors
Posterior estimates
Posterior distributions
Model checks
Non-informative Priors
Model comparison
Prior predictive density
Model comparison
Linear mixed models
Linear mixed models
Linear mixed models
Bayesian analysis
Priors
Gibbs sampler
Computational caution
Frequentist analysis using lm() in R
Bayesian analysis using MCMCglmm() in R
MCMC output
Posterior distributions for fixed effects
Difference in maternal heights?
Posterior distributions for random child effects
Hierarchical normal model
Posterior distribution
Example: Maths scores from students at 100 schools
Results
Spiralling Whitefly (Aleurodicus dispersus)
Countries where spiralling whitefly has been detected. Administrative regions within some countries are shown when documented. Sources: CABI 2004, Monteiro et al. 2005, CABI 2006; personal communications J.H. Martin 2008, B.M. Waterhouse 2008.
The Problem
Major tropical plant pest
Lives on more than 100 host plants
Restricts market access to other states
Information
Literature: Characteristics,growth, spread
Detectability (inspectors)
Surveillance data (> 30 000 records)
Scope of modelling
Local, district and statewide
• Data Model: Pr(data | incursion process and data parameters) – How data is observed given underlying pest extent
• Process Model: Pr(incursion process | process parameters) – Potential extent given epidemiology / ecology
• Parameter Model: Pr(data and process parameters)– Prior distribution to describe uncertainty in detectability, exposure, growth …
• The posterior distribution of the incursion process (and parameters) is related to the prior distribution and data by:
Pr(process, parameters | data) ∝ Pr(data | process, parameters) × Pr(process | parameters) × Pr(parameters)
Hierarchical Bayesian model
Early Warning Surveillance
Priors
Surveillance data
Posterior learning:
• modest reduction in area freedom
• large reduction in estimated extent
• residual "risk" maps to target surveillance
Invasion Parameter Estimates
Useful for local management
Observation parameter estimates
Also learn about:
• Host suitability
• Inspector efficiency
3. Computational Methods
Computational Methods
• MCMC algorithms: Gibbs sampling, Metropolis-Hastings, slice sampling
• Approximations: INLA, Variational Bayes, ABC
Metropolis sampling
Often we can't simulate from the target distribution p directly. Instead, simulate from an "easy" (proposal) distribution q and accept the values based on the correct distribution p.
• Conditional (target) distribution p(q | ...)
• Proposal distribution q(q), symmetric around q, e.g. U(q-1, q+1) or N(q, 1)
• Suppose we have q(i-1) and we want q(i)
• Simulate a candidate q* from the proposal distribution q
• Accept q(i) = q* with probability
  a = min{ 1, p(q* | ...) / p(q(i-1) | ...) }
  Otherwise let q(i) = q(i-1)
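A minimal random-walk Metropolis sketch in R (not from the notes). It targets the normal-mean posterior used in the "Your turn" exercise later in this section, assuming the prior N(1, 2) is parameterised by its variance:
set.seed(3)
# unnormalised log posterior: N(1, 2) prior x N(theta, 5) likelihood (variances assumed)
y <- c(1, 3, 2, 0, 2)
log_post <- function(theta)
  dnorm(theta, 1, sqrt(2), log = TRUE) + sum(dnorm(y, theta, sqrt(5), log = TRUE))
niter <- 10000
theta <- numeric(niter)
theta[1] <- 0
for (i in 2:niter) {
  prop <- rnorm(1, theta[i - 1], 1)            # symmetric N(theta, 1) proposal
  log_a <- log_post(prop) - log_post(theta[i - 1])
  if (log(runif(1)) < log_a) theta[i] <- prop  # accept
  else theta[i] <- theta[i - 1]                # reject: keep the current value
}
mean(theta[-(1:1000)])   # posterior mean estimate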
Hastings sampler
If the proposal q(q) is not symmetric, the acceptance probability becomes:
Accept q* with probability:
a = min{ 1, [ p(q* | ...) q(q(i-1) | q*) ] / [ p(q(i-1) | ...) q(q* | q(i-1)) ] }
Acceptance Rate
For a Metropolis-Hastings algorithm:
• High acceptance rates are desirable if the proposal density approximates the target closely enough to ensure uniform ergodicity.
• Lower acceptance rates are preferable if a random walk is adopted (e.g. 40-70% acceptance, or less for higher-dimensional problems).
Expectation-Maximisation (EM) Algorithm
• Start with a plausible guess q(0) for q.
• At stage t, suppose the current guess is q(t).
  E-step: compute the expectation of the log-likelihood, Q = E[log p(q|y)], at q = q(t)
  M-step: find the value q(t+1) that maximises Q
Slice Sampling
Sample from a distribution by sampling uniformly from the region under the plot of its density function. We alternately sample uniformly in the vertical direction given the horizontal position, then sample a new horizontal position from the horizontal 'slice' at the new vertical position.
Radford Neal (Annals of Statistics, 31, 705-767)
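A minimal univariate slice sampler sketch in R. For simplicity the slice is found by rejection within the known support rather than Neal's stepping-out procedure, which is valid when the support is bounded (the target density is an illustrative choice):
set.seed(4)
# target: unnormalised Beta(3, 2) density on (0, 1)
f <- function(x) x^2 * (1 - x)
niter <- 10000
x <- numeric(niter); x[1] <- 0.5
for (i in 2:niter) {
  u <- runif(1, 0, f(x[i - 1]))   # vertical step: uniform under the density
  repeat {                        # horizontal step: uniform on the slice {x : f(x) > u},
    xp <- runif(1, 0, 1)          # found by rejection within the known support (0, 1)
    if (f(xp) > u) break
  }
  x[i] <- xp
}
mean(x)   # compare with the Beta(3, 2) mean of 0.6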
Why slice sampling?
• Slice sampling methods can be more efficient than Gibbs, are easily implemented for univariate distributions, and can be used to sample from a multivariate distribution by updating each variable in turn.
• Slice sampling can adaptively choose the magnitude of the changes made, which makes it attractive for routine and automated use.
• Methods that update all variables simultaneously are also possible.
Hybrid Methods
Employ combinations of MCMC algorithms in a single analysis:
• different MCMC algorithms for different parameters
• insert an MH step with larger dispersion or acceptance probability every nth iteration
• mode-jumping proposals
• methods based on tempering
• methods based on regeneration
These can be almost automatically constructed to ensure uniform convergence to the target distribution.
Delayed rejection
Tierney and Mira (1999); Green and Mira (2001)
A. Propose a move for qk.
B. Accept with the usual M-H probability.
C. If rejected, propose a new move for qk and accept with a probability that accounts for the fact that the first move was rejected.
D. If rejected, repeat C as required or until a stopping rule is met.
Sequential Methods
Particle filters are used in sequential settings or for processing and analysis of large datasets. A particle filter describes a dynamic state-space model of a process with an underlying state of interest that evolves over time. The posterior distribution of the state is approximated by a set of weighted particles, with weights proportional to the posterior probability mass. Numerous such algorithms have been proposed, with good convergence properties.
Your turn
• Suppose that you want to estimate the mean q of a Normal distribution with known variance s2=5.
• The prior distribution on q is N(1,2).
• You observe data y=(1,3,2,0,2)
• Design a Metropolis algorithm to estimate the posterior mean.
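Since this model is conjugate, the exact posterior is available to check your sampler against. A minimal sketch, again assuming the "2" in N(1, 2) is a variance:
y <- c(1, 3, 2, 0, 2)
n <- length(y); s2 <- 5         # known data variance
m0 <- 1; v0 <- 2                # prior mean and (assumed) prior variance
post_prec <- 1 / v0 + n / s2    # precisions add
post_var  <- 1 / post_prec
post_mean <- post_var * (m0 / v0 + sum(y) / s2)
c(post_mean, post_var)          # 1.4 and 2/3: compare with the Metropolis output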
Approximations
• INLA: http://www.math.ntnu.no/inla/r-inla.org/papers/jss/lindgren.pdf
Approximations
• Variational Bayes: https://en.wikipedia.org/wiki/Variational_Bayesian_methods
Approximations
• ABC: difficult or impossible (intractable) likelihood, but you can simulate from it? No problem…
• Assume we have some observed data Y
• Simulate a value of q from a prior distribution
• Simulate "pseudo-data" Y* from p(Y|q)
• Accept q if Y* is "sufficiently close" to the observed data Y
https://www.youtube.com/watch?v=nKCT-Cdk0xY
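A minimal ABC rejection sketch in R for the earlier binomial example (the tolerance and number of simulations are illustrative choices):
set.seed(5)
y_obs <- 22; n <- 29
N <- 100000
theta <- runif(N)                           # prior draws: theta ~ U(0,1) = Beta(1,1)
y_sim <- rbinom(N, size = n, prob = theta)  # pseudo-data from p(y | theta)
keep <- abs(y_sim - y_obs) <= 1             # accept if "sufficiently close"
mean(theta[keep])                           # approximates the Beta(23, 8) posterior mean 0.74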
4. Software
Bayesian Software
BUGS: Bayesian inference Using Gibbs Sampling
• WinBUGS: Windows version of BUGS
http://www.mrc-bsu.cam.ac.uk/software/bugs/
• OpenBUGS: open-source BUGS
http://www.openbugs.net
Can use WinBUGS or OpenBUGS directly or call from R, Matlab, etc
Bayesian Software
Interfacing R and WinBUGS
• R2WinBUGS
• BRugs
Bayesian analysis via R
Many packages are now available: see http://cran.r-project.org/web/views/Bayesian.html
Example: bayesm
rhierLinearModel: Gibbs sampler for a hierarchical linear model
runireg: Gibbs sampler for a univariate linear model
Example: MCMCpack
MCMCregress: MCMC for regression
MCMClogit: MCMC for logistic regression
Example: mcsm
gibbsmix: mixture modelling
INLA:
http://www.r-inla.org/
Bayesian Software
Other stand-alone software:
• JAGS
http://mcmc-jags.sourceforge.net/
• Stan: can interface with R, Matlab, etc.
http://mc-stan.org/
• First Bayes, etc
Bayesian Software
Routines in other software such as SAS, Stata
Bespoke software using R, Fortran, C, Python, etc
In-class introductions
• WinBUGS
• R to WinBUGS
• R packages
• STAN
Your turn
Using the software of your choice, analyse the student maths scores data.
Final exercise:
How many testimonials of a miracle would I need to have sufficient evidence to claim “It’s a miracle”?
5. Case Study
Case Study
Notes will be distributed in class
6. Latent Variable Models
Mixture Models
The observed values are observations from a mixture of distributions.
E.g., for a mixture of K = 3 Normals with q = (m, s):
y ~ p1 N(m1, s1²) + p2 N(m2, s2²) + p3 N(m3, s3²)
Eg, phenotypes from
3 genotypes: qq, qQ, QQ
Bayesian normal mixture model
y ~ Σj=1:K pj N(mj, sj²)
m ~ Normal
s ~ Uniform
p ~ Dirichlet(a1,.., aK)
The p's are the 'weights' assigned to each component. A conjugate prior for the p's is a Dirichlet distribution (an extension of the Beta):
f(p; a) ∝ Πj pj^(aj − 1)
where Π is a 'product' sign, i.e. multiply over j = 1,…,K; setting aj = 1 for all j gives the Uniform. The a's are usually set and read in as part of the data.
Latent variable approach
• Associate with each yi another variable Ti that identifies the component of the mixture to which that yi belongs. (Note that we don't observe the T's.)
• We can then 'break down' the likelihood: given the allocations, it is just a univariate problem,
yi | Ti = k ~ N(mk, sk²)
• A typical prior for Ti is the multinomial or categorical distribution, Ti ~ Cat(p1,…,pK)
Gibbs sampling for mixtures
0. Initialisation: Choose p(0) and q(0) arbitrarily
For t=1,…
1.1 Allocate observations to components:
Generate T(t) for each observation
1.2 Generate new weights for the components:
Generate p(t)
1.3 Generate new parameters for each component:
Generate q(t)
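A minimal Gibbs sketch in R for a two-component normal mixture. For brevity the component variances are fixed at 1 and label switching is ignored; the data and priors are illustrative:
set.seed(6)
y <- c(rnorm(100, 0, 1), rnorm(100, 4, 1))  # simulated two-component data
n <- length(y); K <- 2
niter <- 2000
mu <- matrix(0, niter, K); p <- matrix(0.5, niter, K)
mu[1, ] <- c(-1, 5)
for (t in 2:niter) {
  # 1.1 allocate observations to components
  d1 <- p[t - 1, 1] * dnorm(y, mu[t - 1, 1], 1)
  d2 <- p[t - 1, 2] * dnorm(y, mu[t - 1, 2], 1)
  z <- rbinom(n, 1, d2 / (d1 + d2)) + 1       # T_i in {1, 2}
  # 1.2 new weights: Dirichlet(1 + n1, 1 + n2) conditional, via gammas
  nk <- tabulate(z, K)
  g <- rgamma(K, 1 + nk); p[t, ] <- g / sum(g)
  # 1.3 new component means: N(0, 100) prior, known sigma = 1
  for (k in 1:K) {
    v <- 1 / (1 / 100 + nk[k])
    m <- v * sum(y[z == k])
    mu[t, k] <- rnorm(1, m, sqrt(v))
  }
}
colMeans(mu[-(1:500), ])   # posterior means of the component means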
Deep brain stimulation for Parkinson's Disease
Placement of electrodes in the subthalamic nucleus. Electrical current improves symptoms, in particular motor function.
Example: Parkinson’s Disease
With Zoe van Havre, Nicole White and Judith Rousseau
Microelectrode recordings
Identify spikes and assign them to an unknown number of source neurons. Compare clusters between segments within a recording, and between recordings at different locations of the brain (3 depths).
Extracted waveforms
Waveforms were analysed at 3 depths (Y1, Y2, Y3).
Each recording was divided into 2.5sec segments.
Discriminating features were found via PCA.
Finite mixture models (FMM)
For a FMM with K components, with data y1:n = {y1,…,yn}
yi | p, q ~ Σk=1:K pk f(yi | qk)
Prior on p
p ~ Dirichlet(a1,…,aK)
Hidden Markov model (HMM)
Prior on the rows of the transition matrix Q:
Q(k, ·) ~ Dirichlet(a1,…,aK), with ak > 0, k = 1,…,K
Yt | Xt = j, with Xt ∈ {1,…,K*}
Back to Parkinson’s Disease
Extracted waveforms
Two Mixture Models
• Multivariate Normal
For P PCs, yi = (yi1,…,yiP) follows a MVN distribution:
p(y | q, p) = Πi=1:n Σk=1:K* pk N_P(mk, Σk)
• Dirichlet Process Mixture
yi | qi ~ f(yi | qi)
qi | G ~ G, with G ~ DP(m G0)
m ~ Gamma(1,1)
Results: How many clusters?
Results: What do the groups look like?
https://arxiv.org/pdf/1602.01915.pdf
Changes in waveforms over time
https://arxiv.org/pdf/1602.01915.pdf
Mixture modelling in R
• mcsm
• bayesm
• LaplacesDemon (Byron Hall)
• bayesmix (uses JAGS)
• Bmix – for stick-breaking mixtures
• rjags – interface to JAGS MCMC, has module for mixtures
7. Spatial Models
Spatial Models
• Disease mapping: estimate the true relative risk of a disease of interest across a geographical study area
• Disease clustering: assess whether a disease map is clustered and where the clusters are located; disease incidence around a putative hazard source (focused clustering)
• Ecological analysis: analysis of the geographical distribution of disease in relation to explanatory covariates, usually at an aggregated spatial level
Basic Model: assumptions
• The distribution of observed counts of disease within an area:
yi ~ Poisson(eiqi), qi = constant RR
• Can add fixed effects, e.g. spatial trend or long-range variation over the study area; fit to area centroids with x1 = easting, x2 = northing:
qi = exp(b0 + b1 x1i + b2 x2i), i.e. qi = exp(Xi b)
• Can add random effects, e.g. variation in individual susceptibility (frailty); variation due to unmodelled covariates (overdispersion); error in interpolation of a spatial covariate to the locations of case events or area centroids; spatial autocorrelation
• Add priors to parameters (e.g. qi)
Basic Bayesian Model
• Likelihood
yi ~ Poisson(eiqi)
• Prior
qi ~ Gamma(a,b)
• Posterior
qi | yi, a, b ~ Gamma(yi + a, ei + b)
E(qi | yi, a, b) = (yi + a) / (ei + b)
(could map these expectations)
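A small R sketch of the resulting smoothing (the a, b, yi and ei values are illustrative, loosely echoing the Scottish lip cancer example later in this section):
# Gamma(a, b) prior on relative risk; posterior mean (y + a) / (e + b)
a <- 1; b <- 1
y <- c(9, 39, 0)        # illustrative observed counts
e <- c(1.4, 8.7, 1.8)   # illustrative expected counts
smr       <- y / e                 # raw SMRs: 6.43, 4.48, 0.00
post_mean <- (y + a) / (e + b)     # smoothed: 4.17, 4.12, 0.36
rbind(smr, post_mean)              # extreme SMRs shrink towards the prior mean a/b = 1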
Multilevel Model
Model with area-level random effect:
yi ~ Poisson(mi)
mi = ei qi
qi = exp(b0 + ni), with random effects ni ~ N(0, sn²)

Extend to areas (ij) within regions (j):
yij ~ Poisson(mij)
log(mij) = log(eij) + b0 + nj + uij
nj ~ N(0, sn²), uij ~ N(0, su²)
Spatial analysis using GeoBUGS
Spatial modelling in WinBUGS
Introduction to GeoBUGS
Example: spatial mapping
Example: environmental epidemiology
Example: forest ecology
Example: spatial regression
GeoBUGS Inputs
• Regions are numbered sequentially from 1 to n
• Each region is defined as a polygon in a map file
• Each region is associated with a unique index
• GeoBUGS can import map files from ArcInfo, EpiMap, S-Plus.
Displaying a Map with GeoBUGS
• Compile model, load data, load initial conditions
• Set sample monitor on desired variables
• Set trace
• Set sample monitor on map variable OR set summary monitor on map variable
• Run chain
• Activate map tool, load appropriate map
• Set cut points, colour spectrum as desired.
Example: Spatial disease modelling
Rates of lip cancer in 56 counties in Scotland.
The data include the observed and expected cases (expected numbers based on the population and its age and sex distribution in the county), a covariate measuring the percentage of the population engaged in agriculture, fishing or forestry, and the "position" of each county expressed as a list of adjacent counties.
Example: Spatial disease modelling
County | Obs. cases Oi | Exp. cases Ei | % in agric. xi | SMR   | Adjacent counties
1      | 9             | 1.4           | 16             | 652.2 | 5, 9, 11, 19
2      | 39            | 8.7           | 16             | 450.3 | 7, 10
...    | ...           | ...           | ...            | ...   | ...
56     | 0             | 1.8           | 10             | 0.0   | 18, 24, 30, 33, 45, 55
The CAR Model
Smooth the raw SMRs by fitting a random-effects Poisson model:
Oi ~ Poisson(mi)
log mi = log Ei + a0 + a1 xi + bi
• a0: intercept, the baseline (log) relative risk of disease across the study region
• xi: percentage of the population engaged in agriculture, fishing or forestry in district i, with associated regression coefficient a1
• bi: random effect representing latent (unobserved) risk factors
The CAR Model
To allow for spatial dependence between the random effects bi in nearby areas, assume a CAR prior:
bi | b(-i) ~ N(mi, si²)
mi = weighted sum of neighbouring b's / number of neighbours
si² = s² / number of neighbours
(weights set equal to 1)
Use the car.normal distribution to fit this model.
car.normal(adj[], weights[], num[], prec[])
model {
  # Likelihood
  for (i in 1:N) {
    O[i] ~ dpois(mu[i])
    log(mu[i]) <- log(E[i]) + alpha0 + alpha1 * X[i]/10 + b[i]
    RR[i] <- exp(alpha0 + alpha1 * X[i]/10 + b[i])  # area-specific relative risk (for maps)
  }
  # CAR prior distribution for random effects:
  b[1:N] ~ car.normal(adj[], weights[], num[], tau)
  for (k in 1:sumNumNeigh) {
    weights[k] <- 1
  }
  # Other priors:
  alpha0 ~ dflat()
  alpha1 ~ dnorm(0.0, 1.0E-5)
  tau ~ dgamma(0.5, 0.0005)  # prior on precision
  sigma <- sqrt(1 / tau)     # standard deviation
}
Example: Kriging in GeoBUGS
Diggle and Ribeiro (2000)
The data file contains the variables height, x and y, giving surface elevation at each of 52 locations (x, y) within a 310-foot square. The unit of distance is 50 feet; the unit of height is 10 feet.
Example: Kriging in GeoBUGS
A Gaussian kriging model can be fitted to these data using either the spatial.exp or spatial.disc distributions.
The data file also contains a set of locations x.pred and y.pred representing a 15 x 15 grid of points at which we wish to predict surface elevation. Predictions can be obtained using either the spatial.pred or spatial.unipred predictive distributions in WinBUGS 1.4
model {
  # Spatially structured multivariate normal likelihood
  height[1:N] ~ spatial.exp(mu[], x[], y[], tau, phi, kappa)  # exponential correlation function
  # height[1:N] ~ spatial.disc(mu[], x[], y[], tau, alpha)    # disc correlation function
  for (i in 1:N) {
    mu[i] <- beta
  }
  # Priors
  beta ~ dflat()
  tau ~ dgamma(0.001, 0.001)
  sigma2 <- 1 / tau
  # priors for spatial.exp parameters
  phi ~ dunif(0.05, 20)      # prior range for correlation at min distance (0.2 x 50 ft) is 0.02 to 0.99;
                             # at max distance (8.3 x 50 ft) it is 0 to 0.66
  kappa ~ dunif(0.05, 1.95)
  # prior for spatial.disc parameter
  # alpha ~ dunif(0.25, 48)  # prior range for correlation at min distance (0.2 x 50 ft) is 0.07 to 0.96;
                             # at max distance (8.3 x 50 ft) it is 0 to 0.63
  # Spatial prediction
  # Single-site prediction
  for (j in 1:M) {
    height.pred[j] ~ spatial.unipred(beta, x.pred[j], y.pred[j], height[])
  }
  # Only use joint prediction for a small subset of points, due to the length of time it takes to run
  for (j in 1:10) {
    mu.pred[j] <- beta
  }
  height.pred.multi[1:10] ~ spatial.pred(mu.pred[], x.pred[1:10], y.pred[1:10], height[])
}
Example: Mapping cancer
Does "place" impact on cancer survival?
Bayesian spatial modelling
Poisson likelihood for the number of cases per SLA:
yi ~ Poisson(li)
log(li) = Xi b + ui + vi
ui ~ spatial CAR prior
vi ~ Normal(0, s²)
What did we learn?
Your turn
1. Open the GeoBUGS manual (in the Map menu).
2. Choose the Examples option and read the Scottish lip cancer example.
3. Run this example and plot your results on the map provided in GeoBUGS.
4. Choose the Lung Cancer example or Kriging example and run this. Write a short summary of your results.
8. Bayesian Networks
Bayesian Networks From science to management
The policy questions
What is the overall scientific consensus about the drivers of lyngbya?
What management actions should be taken to reduce lyngbya in Moreton Bay, Australia?
Scientific drivers of lyngbya: a Bayesian Network approach
• Bring together disparate scientific knowledge
• Create a ‘conceptual map’ of the scientific drivers
• Quantify the map with data, model outputs, expert knowledge, etc
• Identify key drivers
• Explore scenarios of change
• Understand impact of management and policy decisions
From concept to quantification
[Figure: conceptual BN linking factors F1–F7 to a Target node. Example node states: open / moderate cover / thick cover; 0-10 / 10-20 / 20-30; true / false; high / medium / low.]
Constructing a BN - CPTs
[Figure: quantified BN with marginal probability tables for each node. Nodes and state probabilities include: Temperature (Low 49.5 / High 50.5; 19.6 ± 9), Light Quantity (Optimal 20.0 / SubOptimal 80.0), Light Quality (Poor 10.0 / Borderline 40.0 / High 50.0), Wind Direction (North 21.0 / SE 24.0 / Other 55.0), Wind Speed (Low 59.9 / High 40.1), Ground Water Amount (Low 73.1 / High 26.9), Rain - Present (Low 62.0 / Medium 26.0 / High 12.0; 142 ± 190), Dissolved Fe Concentration (Low 56.7 / High 43.3), Dissolved P Concentration (Low 62.1 / High 37.9; 199 ± 300), Dissolved N Concentration (Low 49.6 / High 50.4), Dissolved Organics (Low 51.0 / High 49.0), Sediment Nutrient Climate (NonReducing 58.4 / Reducing 41.6), Available Nutrient Pool, dissolved (Enough 33.6 / Not enough 66.4), Land Run-off Load (Low 51.6 / High 48.4), Tide (Spring 50.0 / Neap 50.0), Bottom Current Climate (Low 48.0 / High 52.0), Turbidity (Low 45.4 / High 54.6), Light Climate (Inadequate 71.3 / Adequate 28.7; 20.7 ± 12), Point Sources (Low 26.3 / Medium 30.1 / High 43.7), No. of previous dry days (Low 10.0 / Medium 50.0 / High 40.0; 75.6 ± 110), Air (Low 57.4 / High 42.6), Particulates, Nutr (Low 45.1 / High 54.9; 2.8 ± 3.3). INITIATION MODEL target node, Bloom Initiation: No 76.4 / Yes 23.6.]
Most influential factors
1. Available Nutrient Pool
2. Bottom Current Climate
3. Sediment Nutrients
4. Dissolved Iron
5. Dissolved Phosphorus
6. Light
7. Temperature
MANAGEMENT ACTIONS
“What-if” scenarios
Factor | Change in P(Bloom) (%)
Available Nutrient Pool | 77 (3% - 80%)
Bottom Current Climate | 28 (15% - 43%)
Sediment Nutrient Climate | 17 (21% - 38%)
Dissolved Fe | 16 (21% - 37%)
Dissolved P | 15 (23% - 38%)
Light Climate | 14 (18% - 32%)
Temperature | 14 (21% - 35%)
Dissolved N | 13 (22% - 35%)
Rain – present | 10 (25% - 35%)
Light Quantity | 9 (21% - 30%)
From Science to Management
[Diagram: the Science BN Model produces P(Bloom Initiation), feeding evaluation of factors and scenario assessment into the Management Model; integration of information and adaptive updates flow back to the science model.]
Other Applications of BNs
• Cheetah conservation in Southern Africa
• Airports
• Integrated asset management
• Resource management
• Recycled water and health
• Import risk
• Hospital infection
• PhD completion
Viability of wild cheetah population in Namibia
Namibia (Marker, 2002)
Biological Factors Sub-network
Human Factors Sub-network
Ecological Factors Sub-network
The airport as a complex system
– Development of a complex-systems modelling framework for airport planning, design and operations decision support
– Multiple stakeholders
– Multiple objectives: security, efficiency, passenger experience
– Uncertainty
Airport surveillance – BN based on customs documentation
BN subnetworks
BN quantification
Node | Meeting or exceeding target | Below target
Arrival Concourse Dwell Time | 0.59 | 0.41
Secondary Examination Area Dwell Time | 0.83 | 0.17
Entry Control Point Dwell Time | 0.78 | 0.22
Baggage Hall Dwell Time | 0.81 | 0.19
Overall Passenger Facilitation Time | 0.77 | 0.23
Hospital infections
• Huge cost to human life
• Cost millions of dollars per annum
• Increasing in virulence and location
• Many partial mathematical and simulation models
• Complex network of interacting factors involved
Data-based Bayesian Network
[Figure: data-based BN with nodes including Ward Outliers, Operating Theatre Cancellations/Deferrals, Overcrowding, Screening, MRO Isolates, Handwashing Compliance, MRSA Isolates, Isolation Ward Overflow, MRO Prevalence, Staffing, MRSA Prevalence, January/February, Staff per 1000 OBD, and % Casual staff.]
Five most important factors influencing Pr(infection)
• Isolates
• Overcrowding
• Handwashing Compliance
• Isolation Ward Overflow
• MRSA Prevalence
Counter-intuitive results! New insights
Last example! What factors affect
successful on-time completion of PhDs?
• Bair & Hanworth (2005, meta-analysis of 160 references): funding,
socialization, positive and supportive mentor relationships.
• Maher, Ford and Thompson (2004): availability of funding resources, the nature
of the advising relationship, the extent to which students receive research
preparation and opportunities, student concerns about marital, family or health
problems.
• Seagram, Gould and Pyke (1998): gender, discipline, supportive relationship,
financial situation and enrolment status.
• Council of Graduate Schools Ph.D. Completion Program (2009, conducted in
the USA and Canada): selection, mentoring, financial support, program
environment, research mode of the field, and processes and procedures
Other contributing factors
• Muszynski and Akamatsu (1991, A Procrastination Inventory): demographic and situational variables, including a supportive advisor, finding a topic of interest, making the dissertation a top priority and living close to the university
• Kearns, Gardiner and Marshall (2009): psychological factors, e.g. self-sabotaging behaviour due to over-committing, procrastination and perfectionism.
• Cohort partnerships and groups and peer-to-peer support (Devenish, Dyer and Jefferson, 2009); race (Ellis, 2001); type of attendance (Rodwell, 2008); gender (Maher, Ford and Thompson, 2004)
PhD completion is a complex process
Bayesian Network approach – statistics students:
• A1 – former domestic PhD students
• A2 – current domestic PhD students
• A3 – current international PhD students
• B1 – current PhD supervisors
Three questions
1. What is the overall perceived probability of timely completion of a PhD in Statistics at QUT?
2. What factors were most influential in timely completion, and how do these differ between the four groups?
3. What is the change in the probability of timely completion under specified scenarios?
Factors
Time Management Skills, Discipline Expertise, Maths Skills, Writing Skills, English Skills, Incoming Skills, Domestic Circumstance, Emotional State, Continuity of Study, Personal Circumstance, Financial Circumstance, Attitude, Personal Aspects, Other PhD Students, Researchers, Peers, Enrolment, Study Location, Research Environment, Library Access, Physical, General Research Experience, Computer Access, Resources Available, Interest, Written Research Type, Expertise in Topic, Student History, Access, Supervision Experience, Student-Supervisor History, Supervisor, Research Niche, Previous Experience, Topic, Research Project
Results
Most important factors:
Across all participants, the four factors that were considered to most directly influence timely completion were: personal aspects, the research environment, the research project, incoming skills.
Overall probability of on-time completion:
Network | Probability of completion in 3.5 years
B1 | 70%
A1 | 68%
A3 | 72%
A2 | 79%
Project 2: International students
ALTC project (Prasad Yarlagadda, Karen Woodman): A holistic model for research supervision of international students in engineering and information technology disciplines.
• Over 12% of international and NESB postgraduate research students in Australian universities enrol in engineering and IT. They face technical and scientific challenges; cultural, social and religious isolation; and linguistic barriers. Existing supervisory frameworks are not fully assisting these students and their supervisors.
BN: Student survey
Results: Student surveyOverall probabilities
• Overall student perception of supervisor: 0.79
• Overall Supervisor attributes: 0.83
• Overall Student obligations: 0.91.
Most influential variables
• Supervisor attributes and Student obligations influencing the Overall perception of supervisor
• Demographics and Course preparation influencing the Personal profile
• Qualifications affecting Course preparation
• Age and Sex affecting Personal Demographics
BN: supervisor survey
Results: supervisor survey
• The overall probability of a High score on the target node, Supervisor’s Perception of a Successful Student, was 0.46.
• The overall probability of a High score for the node Successful Student, General was 0.58
• The overall probability of a High score for the node Successful Student, CALD was 0.48
• The overall probability of a High score for the node Supervisor’s Attitudes was 0.33
• The most influential variables were the factors leading to Supervisor’s Perception of Successful CALD Students, Supervisor’s Experience with CALD Students, and Supervisor’s International Experience.
So...
Almost 30 years ago, Abedi and Benkin (1984) described research into reasons contributing to timely completion of degrees as “charitably sparse” (p.4). Twenty years later, Maher, Ford and Thompson (2004) argued that empirical research in this field could still be described as such. There has been considerable literature on the topic in the intervening years, and we hoped that our study would contribute to our growing understanding of timely completion as a complex system.
So: why use Bayesian Networks to model complex systems?
1. Flexible way of conceptualising and quantifying complex systems
2. Able to assess relative impact of factors and evaluate scenarios (‘what-if’)
3. Can incorporate diverse sources of information
4. Can be updated as new information is gathered
5. Transparency and consistency
Your turn
• Install and load the Bayesian Network software package GeNIe
• Use GeNIe to create a BN of your own choice.
Conclusion: advantages of Bayesian Models
• Give probabilistic inferences about unknown variables based directly on their (posterior) distributions
• Allow formal combination of diverse sources of information through priors
• Address Uncertainty and Complexity
• Modular/hierarchical model structure facilitates description of complex systems
• Can facilitate iterative updating of opinion based on new information
Conclusion: caveats
• Bayesian modelling is not a panacea for bad data: ‘garbage in, garbage out’ still holds.
• Bayesian models are not built overnight. They require care with planning, inputs, sensitivity, outputs.
• Bayesian models do depend on prior information: this is good and bad. (Be careful that they are not simply self-fulfilling or a replacement for quality data.)
• Understand what the models do and can do!
Course Feedback
Please comment and provide a score on a scale of 1 (poor) – 5 (great) on the following:
1. Course administration / organisation
2. Course presentation
3. Course materials
4. Course content
5. Course pitch – too high? too low?
Please comment on the following general questions:
1. What was your overall opinion of the course?
2. How might the course have been improved?
3. Has the course motivated you to use Bayesian statistics in practice?