Metropolis-Hastings Algorithm for Mixture Model and its Weak Convergence (slides, 2010-08-31)
Metropolis-Hastings Algorithm for Mixture Model and its Weak Convergence
Kengo KAMATANI
University of Tokyo, Japan
KAMATANI (University of Tokyo, Japan) Metropolis-Hastings Algorithm for Mixture Model and its Weak Convergence 1 / 22
1 The Gibbs sampler usually works well.
2 However, in certain settings it works poorly, e.g. for the mixture model.
3 Fortunately, we found an alternative MCMC method which works better in simulation.

Problem
The methods in 2 and 3 are both uniformly ergodic. Therefore, to compare them we would have to calculate their convergence rates, which is very difficult!
Hence the comparison is difficult in the Harris recurrence approach. We take another approach.
Summary of the talk
Sec. 1 I show a bad behavior of the Gibbs sampler.
Sec. 2 Define efficiency (consistency) of MCMC. Prove that the Gibbs sampler has a bad convergence property.
Sec. 3 Propose a new MCMC. Prove that the new MCMC is better than the Gibbs sampler.
Note that...
The Harris recurrence property is also very important for our approach. Without this property, our approach is useless.
Another motivation of our approach is to separate two different convergence issues: 1) convergence to the local area and 2) consistency.
Only the mixture model is considered here. However, the approach may be useful for other models.
Outline
1 Bad behavior of the Gibbs sampler
  Model description
  Gibbs sampler
2 Efficiency of MCMC
  What is MCMC?
  Consistency
  Degeneracy
3 MH algorithm converges faster
  MH proposal construction
  MH performance
Bad behavior of the Gibbs sampler: Model description
1 Consider the model
  p_{X|Θ}(dx|θ) = (1 − θ)F_0(dx) + θF_1(dx).
2 Flip a coin with head probability θ. If the coin shows heads, generate x from F_1; otherwise, from F_0.
3 We observe x, but not the coin.
4 Observations x^n = (x_1, x_2, ..., x_n), x_i ~ p_{X|Θ}(dx|θ_0). Prior distribution p_Θ = Beta(α_1, α_0).

We want to calculate the posterior distribution.
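As a concrete illustration, the data-generating step above can be sketched as follows. The component choices F_0 = N(0, 1) and F_1 = N(2, 1), and the value θ_0 = 0.3, are ours; the slides leave F_0 and F_1 abstract.

```python
import random

def sample_mixture(n, theta0, f0_sampler, f1_sampler, seed=0):
    """Draw n observations from (1 - theta0) F0 + theta0 F1.

    Each draw flips a coin with head probability theta0; heads -> F1,
    tails -> F0.  The coin is discarded: only x is observed.
    """
    rng = random.Random(seed)
    xs = []
    for _ in range(n):
        src = f1_sampler if rng.random() < theta0 else f0_sampler
        xs.append(src(rng))
    return xs

# Hypothetical components: F0 = N(0, 1), F1 = N(2, 1).
xs = sample_mixture(1000, theta0=0.3,
                    f0_sampler=lambda r: r.gauss(0.0, 1.0),
                    f1_sampler=lambda r: r.gauss(2.0, 1.0))
```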
Bad behavior of the Gibbs sampler: Gibbs sampler
1 Set θ(0) ∈ Θ.
2 Generate y_i ~ Bi(1, p_i) (i = 1, 2, ..., n), where
  p_i = θ(0) f_1(x_i) / [(1 − θ(0)) f_0(x_i) + θ(0) f_1(x_i)]
  and F_i(dx) = f_i(x)dx. Count m = Σ_{i=1}^n y_i.
3 Generate θ(1) ~ Beta(α_1 + m, α_0 + n − m).
4 The empirical measure of (θ(0), θ(1), ..., θ(N − 1)) is an estimator of the posterior distribution.
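A minimal sketch of one such sweep. The densities f_0 (standard normal) and f_1 (N(2, 1)), the data size, and the prior Beta(1, 1) are illustrative choices of ours, not taken from the slides.

```python
import math
import random

def norm_pdf(x, mu):
    """Density of N(mu, 1)."""
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2.0 * math.pi)

def gibbs_sweep(theta, xs, f0, f1, a1, a0, rng):
    """One iteration of the Gibbs sampler on the slide:
    draw y_i ~ Bi(1, p_i) with
      p_i = theta f1(x_i) / ((1 - theta) f0(x_i) + theta f1(x_i)),
    count m = sum_i y_i, then draw theta' ~ Beta(a1 + m, a0 + n - m).
    """
    m = 0
    for x in xs:
        p = theta * f1(x) / ((1.0 - theta) * f0(x) + theta * f1(x))
        if rng.random() < p:
            m += 1
    return rng.betavariate(a1 + m, a0 + len(xs) - m)

rng = random.Random(1)
xs = [rng.gauss(0.0, 1.0) for _ in range(200)]   # true model F0, i.e. theta0 = 0
theta = 0.5
chain = []
for _ in range(500):
    theta = gibbs_sweep(theta, xs,
                        f0=lambda x: norm_pdf(x, 0.0),
                        f1=lambda x: norm_pdf(x, 2.0),
                        a1=1.0, a0=1.0, rng=rng)
    chain.append(theta)
```

Since the data come from F_0 alone, the chain should drift toward small θ, matching the θ_0 = 0 setting of the next figure.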
The next figure is a path of the Gibbs sampler when the true model is F0,that is, θ0 = 0.
Bad behavior of the Gibbs sampler: Gibbs sampler
[Figure omitted: "Path of MCMC", deviance against iteration.]
Figure: Plot of paths of MCMC methods for n = 10^4. The dashed line is a path from the Gibbs sampler and the solid line is the MH algorithm.
Bad behavior of the Gibbs sampler: How to define efficiency
1 MCMC methods produce complicated Markov chains.
2 We make an approximation of the MCMC method.

We observe the behavior of MCMC methods as the sample size n → ∞.
Weak convergence of MCMC: What is MCMC?
Write s instead of θ.
1 For each observation x, the Gibbs sampler produces paths s = (s(0), s(1), ...) in S^∞.
2 In other words, for x ∈ X, the Gibbs sampler defines a law G^x ∈ P(S^∞).
3 Therefore, a Gibbs sampler is a set of probability measures G = (G^x; x ∈ X). (Later, we will consider G as a random variable G(x) = G^x.)

Let ν̂_m(s) be the empirical measure of s(0), ..., s(m − 1). Let ν^x be the target distribution for each x.
We expect that d(ν̂_m(s), ν^x) → 0 in a certain sense.
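One concrete choice for the distance d (ours; the slides leave d abstract) is the Kolmogorov-Smirnov distance between the empirical measure ν̂_m(s) and a target distribution given by its CDF:

```python
def ks_distance(samples, target_cdf):
    """Kolmogorov-Smirnov distance between the empirical measure of
    `samples` and a distribution given by its CDF `target_cdf`."""
    xs = sorted(samples)
    m = len(xs)
    d = 0.0
    for k, x in enumerate(xs):
        c = target_cdf(x)
        # The empirical CDF jumps from k/m to (k+1)/m at x.
        d = max(d, abs((k + 1) / m - c), abs(k / m - c))
    return d

# Example: 100 equally spaced points against the Uniform(0, 1) CDF.
d = ks_distance([i / 100 for i in range(1, 101)], lambda t: t)
```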
Weak convergence of MCMC: Consistency
1 We expect that, as m → ∞,
  E_G(d(ν̂_m(s), ν)) → 0.
  But G and ν depend on x!
2 We expect that, as m → ∞,
  E_{G^x}(d(ν̂_m(s), ν^x)) = o_P(1).
  But G^x and ν^x may depend on n!
Weak convergence of MCMC: Consistency
Definition
(M_n = (M^x_n); n ∈ N): a sequence of MCMC methods.
We call (M_n; n ∈ N) consistent for ν_n = (ν^x_n) if for any m(n) → ∞,
  E_{M^x_n}(d(ν̂_{m(n)}(s), ν^x_n)) = o_{P_n}(1).

For a regular model, the Gibbs sampler has consistency with the scaling θ ↦ n^{1/2}(θ − θ_0).
Weak convergence of MCMC: Degeneracy
Definition
1 If a measure ω ∈ P(S^∞) satisfies the following, we call it degenerate:
  ω({s; s(0) = s(1) = s(2) = ···}) = 1.  (1)
2 We also call M degenerate (in P) if M^x is degenerate for a.s. x.
3 If M_n ⇒ M and M is degenerate, we call M_n degenerate in the limit.

The Gibbs sampler G_n for the mixture model is degenerate with the scaling θ ↦ n^{1/2}θ if θ_0 = 0 as n → ∞.
Weak convergence of MCMC: Degeneracy
In fact, G_n tends to a diffusion-process-type variable with time scaling 0, 1, 2, ... ↦ 0, n^{−1/2}, 2n^{−1/2}, ...!
Under both space and time scaling, G^x_n is similar to the law of
  dS_t = (α_1 + S_t Z_n − S_t^2 I) dt + S_t dB_t,
where Z_n ⇒ N(0, I) and I is the Fisher information matrix.
If we take m(n)n^{−1/2} → ∞, the empirical measure converges to the posterior distribution.
We call G_n n^{1/2}-weakly consistent.
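The limiting diffusion can be simulated with a simple Euler-Maruyama scheme. All numerical choices here (scalar case with I = 1, Z_n frozen at a fixed value z, step size, starting point) are ours, for illustration only.

```python
import math
import random

def euler_maruyama(s0, a1, z, fisher_i, dt, steps, seed=0):
    """Euler-Maruyama discretisation of
      dS_t = (a1 + S_t z - S_t^2 I) dt + S_t dB_t
    in the scalar case (I = fisher_i)."""
    rng = random.Random(seed)
    s, path = s0, [s0]
    for _ in range(steps):
        db = rng.gauss(0.0, math.sqrt(dt))   # Brownian increment
        s = s + (a1 + s * z - s * s * fisher_i) * dt + s * db
        path.append(s)
    return path

path = euler_maruyama(s0=0.5, a1=1.0, z=0.0, fisher_i=1.0,
                      dt=1e-3, steps=10_000)
```

The −S_t² I drift term keeps the path from escaping to infinity, consistent with the sticky near-boundary behavior the slides attribute to G_n.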
MH algorithm converges faster: MH proposal construction
Construct a posterior distribution for another parametric family:
1 Fix Q ⊂ P(X).
2 For each θ, set
  q_{X|Θ}(dx|θ) := argmin_{q ∈ Q} d(p_{X|Θ}(dx|θ), q),
  where d is a certain metric, e.g. the Kullback-Leibler divergence.
3 Calculate the posterior q^n_{Θ|X^n}(dθ|x^n).
Remark
We assume that we can generate θ ~ q^n_{Θ|X^n}(dθ|x^n) on a PC.

This construction is similar to
1 the quasi-Bayes method (see e.g. Smith and Makov 1978)
2 the variational Bayes method (see e.g. Humphreys and Titterington 2000).
MH algorithm converges faster: MH proposal construction

Construct an independent-type Metropolis-Hastings algorithm with target distribution p^n_{Θ|X^n}(dθ|x^n).
Step 0 Generate θ(0) ~ q^n_{Θ|X^n}(dθ|x^n). Go to Step 1.
Step i Generate θ*(i) ~ q^n_{Θ|X^n}(dθ|x^n). Then

  θ(i) = θ*(i) with probability α(θ(i − 1), θ*(i)),
  θ(i) = θ(i − 1) with probability 1 − α(θ(i − 1), θ*(i)).

Go to Step i + 1.
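A sketch of this independence sampler with the standard acceptance probability α = min(1, π(θ*)q(θ)/(π(θ)q(θ*))). The concrete target (unnormalised Beta(2, 5)) and proposal (Uniform(0, 1)) are stand-ins of ours for the p^n and q^n of the slides.

```python
import math
import random

def independence_mh(log_target, sample_q, log_q, n_iter, seed=0):
    """Independence Metropolis-Hastings: propose theta* ~ q, accept with
      alpha = min(1, target(theta*) q(theta) / (target(theta) q(theta*))).
    """
    rng = random.Random(seed)
    theta = sample_q(rng)                      # Step 0: theta(0) ~ q
    lt, lq = log_target(theta), log_q(theta)
    chain = [theta]
    for _ in range(n_iter):
        prop = sample_q(rng)                   # Step i: theta*(i) ~ q
        lt_p, lq_p = log_target(prop), log_q(prop)
        log_alpha = (lt_p + lq) - (lt + lq_p)
        if rng.random() < math.exp(min(0.0, log_alpha)):
            theta, lt, lq = prop, lt_p, lq_p   # accept
        chain.append(theta)                    # else keep theta(i-1)
    return chain

def log_beta25(t):
    """Unnormalised log density of Beta(2, 5): t * (1 - t)^4."""
    if t <= 0.0 or t >= 1.0:
        return float("-inf")
    return math.log(t) + 4.0 * math.log(1.0 - t)

chain = independence_mh(log_beta25,
                        sample_q=lambda r: r.random(),  # Uniform(0, 1)
                        log_q=lambda t: 0.0,
                        n_iter=5000)
```

Because the proposal does not depend on the current state, the chain mixes quickly whenever q is a good approximation of the target, which is the point of the construction on the previous slide.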
MH algorithm converges faster: MH performance (Normal): Mean squared error
[Two figures omitted: "MCMC standard error", sd against mcmc iteration (0-10000).]
Figure (left): The dashed line is a path from the Gibbs sampler and the solid line is the MH algorithm, for n = 10.
Figure (right): The same figure as the left; the sample size is 10^2.
MH algorithm converges faster: Remarks
If F_0 and F_1 are similar, the Gibbs sampler becomes even worse. This can be verified by setting F_ε and letting ε → 0 instead of ε ≡ 1. In this case, we have to take m(n)ε n^{−1/2} → ∞ (G_n is ε^{−1}n^{1/2}-weakly consistent).
For other models, there are cases in which m(n)n^{−1} → ∞ is required.
MH algorithm converges faster
Thank you!