Parallel Adaptive Wang Landau - GDR November 2011


http://arxiv.org/abs/1109.3829
http://cran.r-project.org/web/packages/PAWL/index.html
http://statisfaction.wordpress.com/2011/09/21/density-exploration-and-wang-landau-algorithms-with-r-package/

Transcript of Parallel Adaptive Wang Landau - GDR November 2011

Page 1: Parallel Adaptive Wang Landau - GDR November 2011

Wang–Landau algorithm | Improvements | 2D Ising model | Conclusion

Parallel Adaptive Wang–Landau Algorithm

Pierre E. Jacob

CEREMADE - Université Paris Dauphine & CREST, funded by AXA Research

15 November 2011

joint work with Luke Bornn (UBC), Arnaud Doucet (Oxford), Pierre Del Moral (INRIA & Université de Bordeaux)

Pierre E. Jacob PAWL 1/ 18

Page 2

Outline

1 Wang–Landau algorithm

2 Improvements: Automatic Binning, Parallel Interacting Chains, Adaptive proposals

3 2D Ising model

4 Conclusion

Page 3

Wang–Landau

Context

unnormalized target density π

on a state space X

A kind of adaptive MCMC algorithm

It iteratively generates a sequence X_t.

The stationary distribution is not π itself.

At each iteration a different stationary distribution is targeted.

Page 4

Wang–Landau

Partition the space

The state space X is cut into d bins:

X = \bigcup_{i=1}^{d} X_i \quad \text{and} \quad \forall i \neq j,\; X_i \cap X_j = \emptyset

Goal

The generated sequence spends the same time in each bin X_i,

within each bin X_i the sequence is asymptotically distributed according to the restriction of π to X_i.

Page 5

Wang–Landau

Stationary distribution

Define the mass of π over X_i by:

\psi_i = \int_{X_i} \pi(x)\,dx

The stationary distribution of the WL algorithm is:

\pi_\psi(x) \propto \pi(x) \times \frac{1}{\psi_{J(x)}}

where J(x) is the index such that x \in X_{J(x)}.

Page 6

Wang–Landau

Example with a bimodal, univariate target density: π and two π_ψ corresponding to different partitions.

[Figure: three log-density panels over x ∈ (−5, 15): Original Density with partition lines; Biased by X; Biased by Log Density.]

Page 7

Wang–Landau

Plugging estimates

In practice we cannot compute ψ_i analytically. Instead we plug in estimates θ_t(i) of ψ_i at iteration t, and define the distribution π_{θ_t} by:

\pi_{\theta_t}(x) \propto \pi(x) \times \frac{1}{\theta_t(J(x))}

Metropolis–Hastings

The algorithm performs a Metropolis–Hastings step targeting π_{θ_t} at iteration t, generating a new point X_t.

Page 8

Wang–Landau

Estimate of the bias

The update of the estimated bias θ_t(i) is done according to:

\theta_t(i) \leftarrow \theta_{t-1}(i)\left[1 + \gamma_t\left(\mathbb{1}_{X_t \in X_i} - d^{-1}\right)\right]

with γ_t a decreasing sequence of step sizes, e.g. γ_t = 1/t.
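This update rule is easy to sketch in a few lines; the function name and NumPy representation below are illustrative, not taken from the PAWL package:

```python
import numpy as np

def update_bias(theta, visited_bin, gamma, d):
    """Wang-Landau bias update:
    theta_t(i) <- theta_{t-1}(i) * [1 + gamma_t * (1{X_t in X_i} - 1/d)].
    Boosts the estimate of the bin just visited, shrinks the others."""
    indicator = np.zeros(d)
    indicator[visited_bin] = 1.0
    return theta * (1.0 + gamma * (indicator - 1.0 / d))

# With gamma = 0.5 and d = 4, visiting bin 2 multiplies theta(2) by 1.375
# and every other theta(i) by 0.875.
theta = update_bias(np.ones(4), visited_bin=2, gamma=0.5, d=4)
```

Because the multiplier is 1 + γ_t(·) with |γ_t(·)| < 1, the estimates stay positive throughout.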

Page 9

Wang–Landau

Result

In the end we get:

a sequence X_t asymptotically following π_ψ,

as well as estimates θ_t(i) of ψ_i.
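Putting the pieces together, a minimal single-chain Wang–Landau sampler might look as follows. This is only a sketch, with an illustrative bimodal target, sign-based binning, and a Gaussian random-walk proposal; it is not the PAWL implementation:

```python
import numpy as np

def wang_landau(logpi, bin_of, d, n_iter, x0, step=1.0, seed=0):
    """Single-chain Wang-Landau: Metropolis-Hastings targeting
    pi(x)/theta(J(x)), with theta updated after every iteration."""
    rng = np.random.default_rng(seed)
    log_theta = np.zeros(d)  # log-scale estimates of psi_i, for stability
    x = x0
    chain = np.empty(n_iter)
    for t in range(1, n_iter + 1):
        y = x + step * rng.standard_normal()
        # MH log-ratio for the biased target pi_theta
        log_ratio = (logpi(y) - log_theta[bin_of(y)]) \
                  - (logpi(x) - log_theta[bin_of(x)])
        if np.log(rng.uniform()) < log_ratio:
            x = y
        gamma = 1.0 / t  # decreasing step size
        indicator = np.zeros(d)
        indicator[bin_of(x)] = 1.0
        log_theta += np.log1p(gamma * (indicator - 1.0 / d))
        chain[t - 1] = x
    return chain, np.exp(log_theta)

# Bimodal example: equal mixture of N(-3, 1) and N(3, 1), binned by sign.
def logpi(x):
    return np.logaddexp(-0.5 * (x + 3.0) ** 2, -0.5 * (x - 3.0) ** 2)

chain, theta = wang_landau(logpi, bin_of=lambda x: 0 if x < 0.0 else 1,
                           d=2, n_iter=5000, x0=0.0)
```

The flattening effect means both modes get visited even though a plain random walk would tend to stick near one of them.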

Page 10

Automatic Binning | Parallel Interacting Chains | Adaptive proposals

Automatic Binning

Easily move from one bin to another

Maintain some kind of uniformity within bins. If non-uniform, split the bin.

[Figure: histograms of log density within a bin, (a) before the split and (b) after the split.]
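One crude way to detect non-uniformity is to check whether the within-bin samples pile up on one side of the bin. The heuristic below is my own illustration of the idea, not PAWL's exact rule:

```python
import numpy as np

def should_split(values, lower, upper, threshold=0.75):
    """Split a bin [lower, upper] when more than `threshold` of its
    samples fall on one side of the midpoint (crude uniformity check)."""
    frac_left = np.mean(np.asarray(values) < 0.5 * (lower + upper))
    return bool(frac_left > threshold or frac_left < 1.0 - threshold)

# Samples concentrated near the lower edge trigger a split;
# roughly uniform samples do not.
split = should_split([0.05, 0.10, 0.15, 0.20, 0.90], 0.0, 1.0)
keep = should_split([0.10, 0.35, 0.60, 0.85], 0.0, 1.0)
```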

Page 11

Parallel Interacting Chains

N chains (X_t^{(1)}, \ldots, X_t^{(N)}) instead of one,

targeting the same biased distribution π_{θ_t} at iteration t,

sharing the same estimated bias θ_t at iteration t.

The update of the estimated bias becomes:

\theta_t(i) \leftarrow \theta_{t-1}(i)\left[1 + \gamma_t\left(\frac{1}{N}\sum_{j=1}^{N} \mathbb{1}_{X_t^{(j)} \in X_i} - d^{-1}\right)\right]
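The parallel update simply replaces the single indicator by the fraction of chains currently in each bin; a sketch (names and representation are mine):

```python
import numpy as np

def update_bias_parallel(theta, states, bin_of, gamma, d):
    """Parallel-chains Wang-Landau bias update: the indicator 1{X_t in X_i}
    is replaced by the proportion of the N chains currently in bin i."""
    occupancy = np.zeros(d)
    for x in states:
        occupancy[bin_of(x)] += 1.0
    occupancy /= len(states)
    return theta * (1.0 + gamma * (occupancy - 1.0 / d))

sign_bin = lambda x: 0 if x < 0 else 1
# Chains split evenly between the two bins: occupancy matches 1/d,
# so the estimates are left unchanged.
balanced = update_bias_parallel(np.ones(2), [-1.0, -2.0, 3.0, 4.0],
                                sign_bin, gamma=0.1, d=2)
# Three of four chains in bin 0: its estimate grows, the other shrinks.
skewed = update_bias_parallel(np.ones(2), [-1.0, -2.0, -3.0, 4.0],
                              sign_bin, gamma=0.1, d=2)
```

Averaging over N chains reduces the variance of the occupancy estimate at each update, which is one reason the interacting version behaves better in practice.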

Page 12

Adaptive proposals

For continuous state spaces

We can use an adaptive random-walk proposal whose variance σ_t is learned along the iterations to target an acceptance rate.

Robbins-Monro stochastic approximation update

\sigma_{t+1} = \sigma_t + \rho_t\left(2\,\mathbb{1}(A > 0.234) - 1\right)

Or alternatively

\Sigma_t = \delta \times \mathrm{Cov}(X_1, \ldots, X_t)
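The Robbins–Monro rule nudges the scale up when the acceptance rate A exceeds 0.234 and down otherwise; a minimal sketch (the choice ρ_t = ρ_0/t is illustrative):

```python
def adapt_scale(sigma, accept_rate, t, target=0.234, rho0=1.0):
    """Robbins-Monro scale update:
    sigma_{t+1} = sigma_t + rho_t * (2 * 1(A > target) - 1)."""
    rho = rho0 / t
    return sigma + rho * (2.0 * (accept_rate > target) - 1.0)

up = adapt_scale(1.0, accept_rate=0.50, t=10)    # too many acceptances -> widen
down = adapt_scale(1.0, accept_rate=0.10, t=10)  # too few -> narrow
```

Note that with an additive update the scale can in principle go negative; adapting log σ instead is a common safeguard.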

Page 13

2D Ising model

Higdon (1998), JASA 93(442)

Target density

Consider a 2D Ising model, with posterior density

\pi(x \mid y) \propto \exp\left( \alpha \sum_i \mathbb{1}[y_i = x_i] + \beta \sum_{i \sim j} \mathbb{1}[x_i = x_j] \right)

with α = 1, β = 0.7.

The first term (likelihood) encourages states x which are similar to the original image y.

The second term (prior) favors states x for which neighbouring pixels are equal, like a Potts model.
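Both terms can be computed directly on a binary pixel grid; a sketch of the unnormalized log posterior (the array representation is an assumption, with each 4-neighbour pair counted once):

```python
import numpy as np

def log_posterior(x, y, alpha=1.0, beta=0.7):
    """Unnormalized log posterior: alpha * #{pixels of x matching y}
    + beta * #{equal 4-neighbour pairs in x}."""
    likelihood = alpha * np.sum(x == y)
    prior = beta * (np.sum(x[:, 1:] == x[:, :-1])     # horizontal pairs
                    + np.sum(x[1:, :] == x[:-1, :]))  # vertical pairs
    return float(likelihood + prior)

# A 2x2 all-zero state against an identical image: 4 matching pixels and
# 4 equal neighbour pairs, so the log posterior is 4*1 + 4*0.7 = 6.8.
lp = log_posterior(np.zeros((2, 2)), np.zeros((2, 2)))
```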

Page 14

2D Ising models

(a) Original Image (b) Focused Region of Image

Page 15

2D Ising models

[Figure: binary pixel maps (on/off) at iterations 300,000; 350,000; 400,000; 450,000; 500,000.]

Figure: Spatial model example: states explored over 200,000 iterations for Metropolis–Hastings (top) and proposed algorithm (bottom).

Page 16

2D Ising models

[Figure: average pixel values, colour scale from 0.4 to 1.0.]

Figure: Spatial model example: average state explored with Metropolis–Hastings (left) and Wang–Landau after importance sampling (right).

Page 17

Conclusion

Automatic binning

We still have to define a range.

Parallel Chains

In practice it is more efficient to use N chains for T iterationsinstead of 1 chain for N × T iterations.

Adaptive Proposals

Convergence results with fixed proposals are already challenging,and making the proposal adaptive might add a layer of complexity.

Page 18

Bibliography

Article: An Adaptive Interacting Wang-Landau Algorithm for Automatic Density Exploration, L. Bornn, P.E. Jacob, P. Del Moral, A. Doucet, available on arXiv.

Software: PAWL, an R package, available on CRAN:

install.packages("PAWL")

References:

F. Wang and D. Landau, Physical Review E, 64(5):56101.

Y. Atchadé and J. Liu, Statistica Sinica, 20:209-233.
