MCMC: Gibbs Sampler
Building Complex Models: Conditional Sampling
● Consider a more complex model
● When sampling each parameter iteratively, we only need to consider the distributions that contain that parameter
$p(b, \theta_1, \theta_2 \mid y) \propto p(y \mid b)\, p(b \mid \theta_1, \theta_2)\, p(\theta_1)\, p(\theta_2)$

$p(b \mid \theta_1, \theta_2, y) \propto p(y \mid b)\, p(b \mid \theta_1, \theta_2)$
$p(\theta_1 \mid b, \theta_2, y) \propto p(b \mid \theta_1, \theta_2)\, p(\theta_1)$
$p(\theta_2 \mid b, \theta_1, y) \propto p(b \mid \theta_1, \theta_2)\, p(\theta_2)$

In one MCMC step:

$p(b^{(g+1)} \mid \theta_1^{(g)}, \theta_2^{(g)}, y) \propto p(y \mid b^{(g+1)})\, p(b^{(g+1)} \mid \theta_1^{(g)}, \theta_2^{(g)})$
$p(\theta_1^{(g+1)} \mid b^{(g+1)}, \theta_2^{(g)}, y) \propto p(b^{(g+1)} \mid \theta_1^{(g+1)}, \theta_2^{(g)})\, p(\theta_1^{(g+1)})$
$p(\theta_2^{(g+1)} \mid b^{(g+1)}, \theta_1^{(g+1)}, y) \propto p(b^{(g+1)} \mid \theta_1^{(g+1)}, \theta_2^{(g+1)})\, p(\theta_2^{(g+1)})$
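Written generically, one MCMC step simply cycles through the parameters in this way. A minimal Python sketch (not from the slides; the per-parameter draw functions are assumed to be supplied by the user):

    # One sweep of conditional sampling: update each parameter in turn by
    # drawing from its full conditional, holding all others at their
    # current values. `conditionals` maps a parameter name to a function
    # that draws from that parameter's conditional given the current state.
    def gibbs_sweep(state, conditionals):
        for name, draw in conditionals.items():
            state[name] = draw(state)   # e.g. p(b | theta1, theta2, y)
        return state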
Gibbs Sampling
● If we can solve a conditional distribution analytically
– We can sample from that conditional directly
– Even if we can't solve for the full joint distribution analytically
● Example
$p(\mu, \sigma^2 \mid y) \propto N(y \mid \mu, \sigma^2)\, N(\mu \mid \mu_0, V_0)\, IG(\sigma^2 \mid \alpha, \beta)$

$p(\mu^{(g+1)} \mid \sigma^{2(g)}, y) \propto N(y \mid \mu, \sigma^{2(g)})\, N(\mu \mid \mu_0, V_0)$
$= N\!\left(\mu \,\Big|\, \left(\frac{n}{\sigma^{2(g)}}\bar{y} + \frac{\mu_0}{V_0}\right)\Big/\left(\frac{n}{\sigma^{2(g)}} + \frac{1}{V_0}\right),\ \left(\frac{n}{\sigma^{2(g)}} + \frac{1}{V_0}\right)^{-1}\right)$

$p(\sigma^{2(g+1)} \mid \mu^{(g+1)}, y) \propto N(y \mid \mu^{(g+1)}, \sigma^2)\, IG(\sigma^2 \mid \alpha, \beta)$
$= IG\!\left(\alpha + \frac{n}{2},\ \beta + \frac{1}{2}\sum_i \left(y_i - \mu^{(g+1)}\right)^2\right)$
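A minimal NumPy sketch of this two-parameter Gibbs sampler (not part of the original slides; the prior settings and iteration count here are illustrative assumptions):

    import numpy as np

    def gibbs_normal(y, mu0=0.0, V0=100.0, alpha=0.1, beta=0.1, n_iter=5000):
        # Gibbs sampler for p(mu, sigma2 | y) with a N(mu0, V0) prior on mu
        # and an IG(alpha, beta) prior on sigma2 (all settings illustrative).
        n = len(y)
        ybar = y.mean()
        mu, sigma2 = ybar, y.var()                    # starting values
        samples = np.zeros((n_iter, 2))
        for g in range(n_iter):
            # Draw mu from its Normal conditional
            V = 1.0 / (n / sigma2 + 1.0 / V0)         # conditional variance
            v = n / sigma2 * ybar + mu0 / V0
            mu = np.random.normal(V * v, np.sqrt(V))
            # Draw sigma2 from its Inverse Gamma conditional via 1/Gamma
            u1 = alpha + n / 2.0
            u2 = beta + 0.5 * np.sum((y - mu) ** 2)
            sigma2 = 1.0 / np.random.gamma(u1, 1.0 / u2)
            samples[g] = mu, sigma2
        return samples

For example, samples = gibbs_normal(np.random.normal(5, 2, size=200)) should concentrate around mu near 5 and sigma2 near 4.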
Trade-offs of Gibbs Sampling
● 100% acceptance rate
● Quicker mixing, quicker convergence
● No need to specify a Jump distribution
● No need to tune a Jump variance
● Requires that we know the analytical solution for the conditional
– Conjugate distributions
● Can mix different samplers within an MCMC
Gibbs Sampler
1) Start from some initial parameter value $\theta_c$
2) Evaluate the unnormalized posterior
3) Propose a new parameter value: a random draw from the conditional distribution given the current values of all other parameters
4) Evaluate the new unnormalized posterior
5) Decide whether or not to accept the new value: accept the new value 100% of the time
6) Repeat 3-5
Metropolis Algorithm
1) Start from some initial parameter value $\theta_c$
2) Evaluate the unnormalized posterior $p(\theta_c)$
3) Propose a new parameter value: a random draw from a “jump” distribution centered on the current parameter value
4) Evaluate the new unnormalized posterior $p(\theta_{new})$
5) Decide whether or not to accept the new value: accept the new value with probability
$a = p(\theta_{new}) / p(\theta_c)$
6) Repeat 3-5
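For comparison, a minimal random-walk Metropolis sketch in Python (not from the original slides; log_post, the jump standard deviation, and the starting value are user-supplied assumptions):

    import numpy as np

    def metropolis(log_post, theta0, jump_sd=1.0, n_iter=5000):
        # Random-walk Metropolis: propose from a Normal "jump" distribution
        # centered on the current value; accept with probability
        # a = p(theta_new) / p(theta_c), computed on the log scale.
        theta = theta0
        samples = np.zeros(n_iter)
        for g in range(n_iter):
            prop = np.random.normal(theta, jump_sd)       # step 3: propose
            log_a = log_post(prop) - log_post(theta)      # steps 2 & 4
            if np.log(np.random.uniform()) < log_a:       # step 5: accept?
                theta = prop
            samples[g] = theta                            # step 6: repeat
        return samples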
Bayesian Regression via Gibbs Sampling
● Consider the standard regression model
– $y$ = response variable
– $x_j$ = covariates
– $b_j$ = parameters

$y_i = b_0 + b_1 x_{i,1} + b_2 x_{i,2} + \cdots + b_p x_{i,p} + \epsilon_i$
$\epsilon_i \sim N(0, \sigma^2)$
Matrix version
● As noted earlier, this can be expressed as
$y_i = b_0 + b_1 x_{i,1} + b_2 x_{i,2} + \cdots + b_p x_{i,p} + \epsilon_i$
$y_i = X_i b + \epsilon_i$
● The full data set can be expressed as
$y = Xb + \epsilon$
Quick review of matrix math
● Vector
$b = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}$  (3 × 1)

● Transpose
$b^T = \begin{bmatrix} b_1 & b_2 & b_3 \end{bmatrix}$  (3 × 1 → 1 × 3)

● Vector × vector
$b^T b = \begin{bmatrix} b_1 & b_2 & b_3 \end{bmatrix} \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix} = b_1^2 + b_2^2 + b_3^2$  ((1 × 3)(3 × 1) = 1 × 1)
See Appendix C for more details
Quick review of matrix math
● Vector × vector (outer product)
$b\, b^T = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix} \begin{bmatrix} b_1 & b_2 & b_3 \end{bmatrix} = \begin{bmatrix} b_1^2 & b_1 b_2 & b_1 b_3 \\ b_2 b_1 & b_2^2 & b_2 b_3 \\ b_3 b_1 & b_3 b_2 & b_3^2 \end{bmatrix}$  ((3 × 1)(1 × 3) = 3 × 3)

● Matrix × vector
$Xb$: (n × 3)(3 × 1) = n × 1

● Matrix × matrix
$X X^T = Z$: (n × 3)(3 × n) = n × n
$X^T X = \begin{bmatrix} \sum x_{i,1}^2 & \sum x_{i,1} x_{i,2} & \sum x_{i,1} x_{i,3} \\ \sum x_{i,2} x_{i,1} & \sum x_{i,2}^2 & \sum x_{i,2} x_{i,3} \\ \sum x_{i,3} x_{i,1} & \sum x_{i,3} x_{i,2} & \sum x_{i,3}^2 \end{bmatrix}$  ((3 × n)(n × 3) = 3 × 3)

● Identity matrix
$I = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$

● Matrix inversion
$X^{-1} X = X X^{-1} = I$
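The same operations in NumPy (an illustrative sketch, not from the original slides; the random X stands in for a real n × 3 design matrix):

    import numpy as np

    b = np.array([1.0, 2.0, 3.0])         # a 3 x 1 vector
    inner = b @ b                         # b^T b, a scalar: b1^2 + b2^2 + b3^2
    outer = np.outer(b, b)                # b b^T, a 3 x 3 matrix
    X = np.random.randn(10, 3)            # an n x 3 matrix (here n = 10)
    Xb = X @ b                            # (n x 3)(3 x 1) = n x 1
    XtX = X.T @ X                         # (3 x n)(n x 3) = 3 x 3 cross-products
    I = np.eye(3)                         # 3 x 3 identity
    inv_check = np.linalg.inv(XtX) @ XtX  # recovers I up to round-off error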
...back to regression
$y = Xb + \epsilon \qquad \epsilon \sim N_n(0, \sigma^2 I)$

Likelihood (data model + process model):
$p(y \mid b, \sigma^2, X) = N_n(y \mid Xb, \sigma^2 I)$

Priors (parameter model):
$p(b) = N_p(b \mid b_0, V_b)$
$p(\sigma^2) = IG(\sigma^2 \mid s_1, s_2)$
General pattern
● Write down the likelihood
– Data model
– Process model
● Write down priors for the free parameters in the likelihood
– Parameter model
● Solve for the posterior distribution of the model parameters
– Analytical
– Numerical
Posterior
$p(b, \sigma^2 \mid X, y) \propto N_n(y \mid Xb, \sigma^2 I) \times N_p(b \mid b_0, V_b)\, IG(\sigma^2 \mid s_1, s_2)$

Sample in terms of conditionals:

$p(b \mid \sigma^2, X, y) \propto N_n(y \mid Xb, \sigma^2 I)\, N_p(b \mid b_0, V_b)$
$p(\sigma^2 \mid b, X, y) \propto N_n(y \mid Xb, \sigma^2 I)\, IG(\sigma^2 \mid s_1, s_2)$
Regression Parameters

$p(b \mid \sigma^2, X, y) \propto N_n(y \mid Xb, \sigma^2 I)\, N_p(b \mid b_0, V_b)$

This will have a posterior that is multivariate Normal...
$N_p(b \mid Vv, V) = \frac{1}{(2\pi)^{p/2}\, |V|^{1/2}} \exp\left[-\frac{1}{2}(b - Vv)^T V^{-1} (b - Vv)\right]$

What are V and v?

Expand the quadratic form of the multivariate Normal (using the symmetry of $V$):

$(b - Vv)^T V^{-1} (b - Vv) = b^T V^{-1} b - b^T V^{-1} V v - v^T V V^{-1} b + v^T V V^{-1} V v$
$= b^T V^{-1} b - b^T v - v^T b + v^T V v$
$= b^T V^{-1} b - 2 b^T v + v^T V v$

Expand the corresponding quadratic form of the likelihood times the prior:

$(y - Xb)^T (\sigma^2 I)^{-1} (y - Xb) + (b - b_0)^T V_b^{-1} (b - b_0)$
$= \sigma^{-2} y^T y - \sigma^{-2} y^T X b - \sigma^{-2} b^T X^T y + \sigma^{-2} b^T X^T X b + b^T V_b^{-1} b - b^T V_b^{-1} b_0 - b_0^T V_b^{-1} b + b_0^T V_b^{-1} b_0$

Match the terms that are quadratic in $b$:

$b^T V^{-1} b \leftrightarrow \sigma^{-2} b^T X^T X b + b^T V_b^{-1} b \quad\Rightarrow\quad V^{-1} = \sigma^{-2} X^T X + V_b^{-1}$

Match the terms that are linear in $b$:

$-2 b^T v \leftrightarrow -\sigma^{-2} y^T X b - \sigma^{-2} b^T X^T y - b^T V_b^{-1} b_0 - b_0^T V_b^{-1} b \quad\Rightarrow\quad v = \sigma^{-2} X^T y + V_b^{-1} b_0$
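A NumPy sketch of the resulting conditional draw for b (illustrative; b0 and Vb are the prior mean and covariance defined above):

    import numpy as np

    def draw_b(X, y, sigma2, b0, Vb):
        # Draw b from its multivariate Normal conditional N(Vv, V), where
        # V^-1 = X'X / sigma2 + Vb^-1  and  v = X'y / sigma2 + Vb^-1 b0.
        Vb_inv = np.linalg.inv(Vb)
        V = np.linalg.inv(X.T @ X / sigma2 + Vb_inv)
        v = X.T @ y / sigma2 + Vb_inv @ b0
        return np.random.multivariate_normal(V @ v, V)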
Variance

$p(\sigma^2 \mid b, X, y) \propto N_n(y \mid Xb, \sigma^2 I)\, IG(\sigma^2 \mid s_1, s_2)$

This will have a posterior that is Inverse Gamma...
$IG(\sigma^2 \mid u_1, u_2) \propto (\sigma^2)^{-u_1 - 1} \exp\left[-\frac{u_2}{\sigma^2}\right]$

$p(\sigma^2 \mid b, X, y) \propto (\sigma^2)^{-n/2} \exp\left[-\frac{1}{2\sigma^2}(y - Xb)^T (y - Xb)\right] (\sigma^2)^{-s_1 - 1} \exp\left[-\frac{s_2}{\sigma^2}\right]$

$u_1 = s_1 + \frac{n}{2}$
$u_2 = s_2 + \frac{1}{2}(y - Xb)^T (y - Xb)$
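A matching sketch of the conditional draw for sigma^2, using the standard fact that if sigma^2 ~ IG(u1, u2) then 1/sigma^2 ~ Gamma(shape u1, scale 1/u2):

    import numpy as np

    def draw_sigma2(X, y, b, s1, s2):
        # Draw sigma^2 from its Inverse Gamma conditional IG(u1, u2)
        # by drawing a Gamma variate and taking its reciprocal.
        resid = y - X @ b
        u1 = s1 + len(y) / 2.0
        u2 = s2 + 0.5 * (resid @ resid)
        return 1.0 / np.random.gamma(u1, 1.0 / u2)

Alternating draw_b and draw_sigma2 within a loop is the Gibbs sampler that the BUGS code below implements.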
BUGS implementation
model{
  ## prior distributions
  b[1] ~ dnorm(0, 1.0E-5)
  b[2] ~ dnorm(0, 1.0E-5)
  tau ~ dgamma(0.1, 0.001)
  ## likelihood
  for(i in 1:n){
    mu[i] <- b[1] + x[i] * b[2]
    y[i] ~ dnorm(mu[i], tau)
  }
}

(Note that BUGS parameterizes dnorm with a precision, tau = 1/σ², rather than a variance.)
[Figure: MCMC output for this model. Panels: posterior densities P(b[1]), P(b[2]), and P(tau); joint samples of b[2] versus b[1]; autocorrelation of the b[1] chain versus lag (0-100).]
Centering the data
model{
  ## prior distributions
  b[1] ~ dnorm(0, 1.0E-5)
  b[2] ~ dnorm(0, 1.0E-5)
  tau ~ dgamma(0.1, 0.001)
  mX <- mean(x[])
  ## likelihood
  for(i in 1:n){
    mu[i] <- b[1] + (x[i] - mX) * b[2]
    y[i] ~ dnorm(mu[i], tau)
  }
}
[Figure: joint samples of b[2] versus b[1] for the centered model.]
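Why centering helps (a brief note, not spelled out on the slides): the off-diagonal entries of $X^T X$ are sums of cross-products involving the covariate, which are zero once x has mean zero, so the intercept and slope become approximately independent in the posterior. A small NumPy check (illustrative):

    import numpy as np

    x = np.random.uniform(0, 10, size=100)
    X  = np.column_stack([np.ones_like(x), x])             # uncentered design
    Xc = np.column_stack([np.ones_like(x), x - x.mean()])  # centered design
    print(X.T @ X)    # large off-diagonals -> correlated b[1], b[2]
    print(Xc.T @ Xc)  # off-diagonals ~ 0 -> b[1], b[2] nearly uncorrelated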
Outline
● Different numerical techniques for sampling from the posterior
– Rejection Sampling
– Inverse Distribution Sampling
– Markov Chain Monte Carlo (MCMC)
  ● Metropolis
  ● Metropolis-Hastings
  ● Gibbs sampling
● Sampling conditionals vs. the full model
● Flexibility to specify complex models
Where things stand...
● We've now filled our “toolbox” with the basic tools that will enable us to explore more advanced topics
– Probability rules & distributions
– Data models & Likelihoods
– Process models
– Parameter models & Prior distributions
– Analytical and Numerical Methods in ML and Bayes
General pattern - Bayes
● Write down the likelihood
– Data model
– Process model
● Write down priors for the free parameters in the likelihood
– Parameter model
● Solve for the posterior distribution of the model parameters
– Analytical
– Numerical
General Pattern - Maximum Likelihood
● Step 1: Write a likelihood function describing the likelihood of the observations
● Step 2: Find the value of the model parameter that maximizes the likelihood
– Calculate $-\ln L$
– Analytical
  ● Take the derivative with respect to EACH parameter
  ● Set $\frac{dL}{d\theta} = 0$ and solve for the parameter
– Numerical
  ● Find the minimum of the $-\ln L$ surface
Where we're going...
● Section 2 is the “meat” of the course:
– Interval estimation
  ● Confidence and Credible intervals
  ● Predictive intervals
– Model selection and hypothesis testing
– GLMs
– Nonlinear models
– Hierarchical models and random effects
– Measurement error and missing data
● Section 3 is more advanced applications