
Page 1:

COMP STAT WEEK 6 DAY 2: More Bayes and start of Metropolis Hastings

Dave Campbell, www.stat.sfu.ca/~dac5

Page 2:

Basics of Computational Bayesian Methods

MCMC: how to use it and what you need to know

Page 3:

Thomas Bayes

Page 4:

A Markov chain is a sequence of random variables {X_t, t ≥ 0} where:

The probability of moving from state A_{t-1} to state A_t is constant.

Conditional on one previous time step, the chain is independent of all events before that:

$$P(X_t \in A_t \mid X_{t-1} \in A_{t-1}, X_{t-2} \in A_{t-2}, \ldots, X_0 \in A_0) = P(X_t \in A_t \mid X_{t-1} \in A_{t-1}) = P(X_s \in A_t \mid X_{s-1} \in A_{t-1}) = P_{A_{t-1}A_t}$$
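To make the definition concrete, here is a minimal added R sketch (my illustration, not from the slides) that simulates a two-state chain with a constant transition matrix; the long-run proportion of time spent in each state approximates the chain's stationary distribution.

# Added sketch: simulate a two-state Markov chain with constant
# transition matrix P, where P[i,j] = probability of moving from state i to state j
P = matrix(c(0.9, 0.1,
             0.4, 0.6), nrow = 2, byrow = TRUE)
niter = 10000
x = numeric(niter)
x[1] = 1  # start in state 1
for(t in 2:niter){
  # the next state depends only on the current state (Markov property)
  x[t] = sample(1:2, size = 1, prob = P[x[t-1], ])
}
table(x)/niter  # long-run proportions, here approximately (0.8, 0.2)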

Page 5:

Let Ω_t be a random variable (a stochastic process).

We want to evaluate

$$\theta = E[h(\Omega)] = \sum_{j=1}^{\infty} h(\omega_j)\, P(\Omega = \omega_j)$$

We use dependent realizations from a Markov chain to approximate θ with θ̂.

We just set up a Markov chain with the desired state space and let it step ahead for a long time.
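As a small added illustration (not from the slides), the ergodic average of dependent draws from a chain approximates θ. Here the chain is an AR(1) whose limiting distribution is N(0, 1), so for h(ω) = ω² we have E[h(Ω)] = 1.

# Added sketch: approximate theta = E[h(Omega)] by averaging dependent
# draws from a Markov chain whose limiting distribution is that of Omega.
# An AR(1) with coefficient phi and innovation sd sqrt(1 - phi^2) has
# limiting distribution N(0, 1).
set.seed(1)
niter = 100000
phi = 0.5
x = numeric(niter)
for(t in 2:niter){
  x[t] = phi*x[t-1] + rnorm(1, sd = sqrt(1 - phi^2))
}
h = function(w){ w^2 }
mean(h(x))  # ergodic average; close to E[h(Omega)] = 1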

Page 6:

In practice we typically use the Metropolis-Hastings (MH) algorithm, which uses a sample from one nice, well-behaved Markov chain to give us a sample from our target distribution P(β | Y = y).

We have a good way of getting

$$P(Y = y \mid \beta)\, P(\beta)$$

but we don't have the scaling factor P(Y = y).

Page 7:

So what we have is

$$P(\beta = b \mid Y = y) = C\, P(Y = y \mid \beta = b)\, P(\beta = b)$$

where the unknown normalizing constant is

$$C = \frac{1}{\sum_{j=1}^{\infty} P(Y = y \mid \beta = b_j)\, P(\beta = b_j)}$$

Page 8:

Given β_t = i, we propose a value X = j as a candidate for β_{t+1} from the proposal distribution (transition distribution) Q_ij.

For example, propose X from Uniform(β_t − δ, β_t + δ).

Make a probabilistic decision about setting β_{t+1} = X or keeping β_{t+1} = β_t.

We make the decision such that {β_t, t ≥ 0} has the correct limiting distribution: P(β | Y = y).
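A minimal added sketch of that uniform random-walk proposal (the value of δ below is only illustrative, not from the slides):

# Added sketch of the Uniform(beta_t - delta, beta_t + delta) proposal
delta = 0.01  # illustrative tuning value, an assumption
propose = function(beta_t){ runif(1, min = beta_t - delta, max = beta_t + delta) }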

Page 9:

Let’s be clear about notation:

P(β_{t+1} = j | β_t = i) = P_ij

So P_ij is the probability that the random walk leading to the target (posterior) distribution moves from state i to state j.

P(X = j | β_t = i) = Q_ij

Q_ij is the probability that the random walk from an easy-to-sample yet arbitrary distribution proposes a move from state i to state j.

Page 10:

If we could sample from β_t directly, it would have the transition distribution P_ij.

The probability of accepting the value X is α_ij.

To get the right target distribution we need, when i ≠ j,

$$P_{ij} = Q_{ij}\,\alpha_{ij}$$

and we must fulfill the detailed balance condition

$$P(\beta = i)\, P_{ij} = P(\beta = j)\, P_{ji}$$

Page 11:

1. Start with β_{t-1} = i.
2. Propose a value X = j | β_{t-1} = i from the transition probability matrix Q_ij as a candidate for β_t.
3. Compute

$$\alpha_{ij} = \min\!\left(\frac{P(Y = y \mid \beta = j)\, P(\beta = j)\, P(X = i \mid \beta = j)}{P(Y = y \mid \beta = i)\, P(\beta = i)\, P(X = j \mid \beta = i)},\ 1\right)$$

4. Sample u ~ Unif(0,1).
5. If u < α_ij then accept the proposal and set β_t = X, and if not then set β_t = β_{t-1}.
6. Repeat (N times) until you obtain a sufficient sample from the distribution of β | Y = y.
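The steps above translate directly into a generic function. Here is an added sketch (my illustration; the course's actual file appears on the later pages) that assumes a symmetric proposal, so the Q terms cancel from α:

# Added sketch of the algorithm above, assuming a symmetric proposal
# (Q_ij = Q_ji), so alpha reduces to the ratio of un-normalized posteriors.
# log_target: log of the un-normalized posterior
# propose: draws a candidate X given the current state
mh = function(log_target, propose, beta0, niter){
  beta = numeric(niter)
  beta[1] = beta0
  for(t in 2:niter){
    X = propose(beta[t-1])
    log_alpha = log_target(X) - log_target(beta[t-1])  # log of the MH ratio
    if(!is.na(log_alpha) && runif(1) < exp(log_alpha)){
      beta[t] = X
    }else{
      beta[t] = beta[t-1]
    }
  }
  return(beta)
}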

Page 12:

The Markov chain applet "rwm" at http://www.probability.ca/jeff/java/ illustrates a random-walk Metropolis-Hastings algorithm.

"Check out my awesome applets!"

Jeffrey S. Rosenthal, University of Toronto, author of Struck by Lightning: The Curious World of Probabilities (a book for the general public; HarperCollins Canada, 272 pages, 2005) and heaps of MCMC theory papers.

Page 13:

Simple Example

Page 14:

We will use the example from the cervical cancer vaccination data.

The parameter β is the probability of getting cervical cancer when someone is not vaccinated.

Page 15:

Without any data: I don't think I know anyone with cervical cancer, but I admit I know very little about its prevalence.

But 0 < β < 1, and I will assume β has a density that is higher at low values and decreases linearly to 0 density at β = 1.
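This triangular prior has density p(β) = 2 − 2β on (0, 1). As a quick added check (not on the slide), it integrates to 1 and can be plotted:

# Added sketch: the triangular prior density p(beta) = 2 - 2*beta on (0,1)
curve(2 - 2*x, from = 0, to = 1,
      xlab = expression(beta), ylab = "prior density")
integrate(function(b){ 2 - 2*b }, lower = 0, upper = 1)  # = 1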

Page 16:

Data: The study showed that Y = 36 women got cancer out of N = 5766.

We will use a Binomial statistical model for Y.

The likelihood is our statistical model: P(Y | β) with Y ~ Binomial(N, β).

We are interested in updating our belief about the value of the parameter using the data, which suggests that Bayesian methods are appropriate.

Page 17:

Given the data, our belief about β is

$$\begin{aligned} P(\beta \mid Y = y) &\propto P(Y = y \mid \beta)\, P(\beta) \\ &= P(Y = 36 \mid \beta)\, P(\beta) \\ &= \binom{5766}{36}\, \beta^{36}\, (1 - \beta)^{5766 - 36}\, (2 - 2\beta) \\ &\propto \beta^{36}\, (1 - \beta)^{5730}\, (2 - 2\beta) \end{aligned}$$

Let’s get a point and interval estimate for P(β | Y = y) using MCMC.
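As an added sanity check (not on the slide), the un-normalized posterior can be evaluated on a grid and plotted; it concentrates near 36/5766 ≈ 0.006.

# Added sketch: plot the un-normalized posterior on a grid
b = seq(0.001, 0.02, length.out = 500)
post = b^36 * (1 - b)^5730 * (2 - 2*b)
plot(b, post/max(post), type = "l",
     xlab = expression(beta), ylab = "un-normalized posterior (rescaled)")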

Page 18:

Week6_Day2_Basic_MCMC.R

#######################
# This file runs basic Metropolis Hastings for the Merck Vaccination data
# Parameter Beta is the probability of getting cervical cancer when someone is not vaccinated
#######################

# The prior is the simple triangle function.
# This function is numerically quite stable within the (0,1) interval for Beta
logprior = function(beta){
  if(beta > 0 && beta < 1){
    return(log(2 - 2*beta))
  }else{
    return(-Inf)
  }
}

Page 19:

# Set up the MCMC with niter iterations
niter = 100000
stepvar = .002
beta = rep(0, niter)

# The data
y = 36
N = 5766

# keep track of the acceptance rate
accepts = 0

Page 20:

# Initialize and run the MCMC
beta[1] = y/N

for(iter in 2:niter){
  # propose a value from an easy distribution
  Betaprop = rnorm(n = 1, mean = beta[iter-1], sd = stepvar);

  # the ratio of un-normalized posteriors (computed on the log scale).
  # Note that my proposal distribution is symmetric so Q_{ij} = Q_{ji}
  alpha = dbinom(y, N, Betaprop, log = TRUE) + logprior(Betaprop) -
          dbinom(y, N, beta[iter-1], log = TRUE) - logprior(beta[iter-1]);

  # make a decision
  if(!is.na(alpha) && runif(1) < exp(alpha)){
    accepts = accepts + 1;
    beta[iter] = Betaprop;
  }else{
    beta[iter] = beta[iter-1];
  }
}
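After the loop finishes, a quick added check (not in the original file) is the acceptance rate; the step size stepvar is commonly tuned so that this lands roughly between 0.2 and 0.5 for random-walk MH.

# Added check: proportion of proposals accepted
accepts/niter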

Page 21:

The Markov chain for β (trace plot of the sampled values).
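An added one-line sketch of code that would reproduce this slide's trace plot (the slide shows only the figure):

# Added sketch: trace plot of the chain
plot(beta, type = "l", xlab = "iteration", ylab = expression(beta))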

Page 22:

hist(beta, 100)

The distribution of β, the probability of getting cancer without getting vaccinated.

Page 23:

Use the sampled values of β to compute

$$E[h(\beta)] = \sum_{j=1}^{\infty} h(b_j)\, P(\beta = b_j) \approx \frac{1}{N} \sum_{j=1}^{N} h(b_j)$$

We often use the sampled values to get an approximation for the mean, median, modes, variance, interval estimates, quantiles...

Bayesian statistics uses MCMC to give an approximation to the full posterior distribution.

> summary(beta)
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
0.002896 0.005749 0.006404 0.006450 0.007034 0.011720
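For the interval estimate promised earlier, an added one-liner (not on the slide) takes empirical quantiles of the draws:

# Added sketch: 95% equal-tailed credible interval for beta
quantile(beta, probs = c(0.025, 0.975))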

Page 24:

Taking it a step further:

The cancer is rare.

Statisticians are skeptical of everything.

What should we use as a prior for the probability of getting cancer given that we have been vaccinated?

Page 25:

Let’s see if the vaccine actually works

Page 26:

Switch to RStudio

Page 27:

Page 28:

Interpret my prior for the second example.

Interpret my prior for the first example.

What is the frequentist analog to the second analysis?

Page 29:

The second analysis used the posterior from the “no vaccine” group as the prior. This says that we start by assuming the vaccine doesn’t work, i.e., that the probability of getting cancer is the same with or without vaccination.
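A hypothetical sketch of how that second analysis could be coded (the actual RStudio code was shown in class and is not in these notes): moment-match a Beta density to the "no vaccine" MCMC draws and reuse it as the prior for the vaccinated group.

# Added, hypothetical sketch: reuse the "no vaccine" posterior as a prior.
# Moment-match a Beta(a, b) density to the MCMC draws stored in beta.
m = mean(beta)
v = var(beta)
a = m * (m*(1 - m)/v - 1)
b = (1 - m) * (m*(1 - m)/v - 1)
logprior2 = function(beta){
  dbeta(beta, shape1 = a, shape2 = b, log = TRUE)
}
# Then rerun the same Metropolis-Hastings loop with logprior2 in place of
# logprior, and the vaccinated group's counts in place of y = 36, N = 5766.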