Parallel Adaptive Wang Landau - GDR November 2011


http://arxiv.org/abs/1109.3829
http://cran.r-project.org/web/packages/PAWL/index.html
http://statisfaction.wordpress.com/2011/09/21/density-exploration-and-wang-landau-algorithms-with-r-package/

Transcript of Parallel Adaptive Wang Landau - GDR November 2011

Page 1: Parallel Adaptive Wang Landau - GDR November 2011

Wang–Landau algorithm | Improvements | 2D Ising model | Conclusion

Parallel Adaptive Wang–Landau Algorithm

Pierre E. Jacob

CEREMADE - Université Paris Dauphine & CREST, funded by AXA Research

15 November 2011

joint work with Luke Bornn (UBC), Arnaud Doucet (Oxford), Pierre Del Moral (INRIA & Université de Bordeaux)

Pierre E. Jacob PAWL 1/ 18

Page 2

Outline

1 Wang–Landau algorithm

2 Improvements: Automatic Binning, Parallel Interacting Chains, Adaptive proposals

3 2D Ising model

4 Conclusion

Page 3

Wang–Landau

Context

unnormalized target density π

on a state space X

A kind of adaptive MCMC algorithm

It iteratively generates a sequence X_t.

The stationary distribution is not π itself.

At each iteration a different stationary distribution is targeted.

Page 4

Wang–Landau

Partition the space

The state space X is cut into d bins:

X = \bigcup_{i=1}^{d} X_i \quad \text{and} \quad \forall i \neq j,\; X_i \cap X_j = \emptyset

Goal

The generated sequence spends the same time in each bin X_i,

within each bin X_i the sequence is asymptotically distributed according to the restriction of π to X_i.

Page 5

Wang–Landau

Stationary distribution

Define the mass of π over X_i by:

\psi_i = \int_{X_i} \pi(x)\,dx

The stationary distribution of the WL algorithm is:

\pi_\psi(x) \propto \pi(x) \times \frac{1}{\psi_{J(x)}}

where J(x) is the index such that x \in X_{J(x)}.

Page 6

Wang–Landau

Example with a bimodal, univariate target density: π and two π_ψ corresponding to different partitions.

[Figure: three log-density panels over x ∈ (−5, 15): Original Density with partition lines; Biased by X; Biased by Log Density.]

Page 7

Wang–Landau

Plugging estimates

In practice we cannot compute ψ_i analytically. Instead we plug in estimates θ_t(i) of ψ_i at iteration t, and define the distribution π_{θ_t} by:

\pi_{\theta_t}(x) \propto \pi(x) \times \frac{1}{\theta_t(J(x))}

Metropolis–Hastings

The algorithm performs a Metropolis–Hastings step targeting π_{θ_t} at iteration t, generating a new point X_t.

Page 8

Wang–Landau

Estimate of the bias

The update of the estimated bias θ_t(i) is done according to:

\theta_t(i) \leftarrow \theta_{t-1}(i)\left[1 + \gamma_t\left(\mathbb{1}_{X_t \in X_i} - d^{-1}\right)\right]

with γ_t a decreasing sequence of step sizes, e.g. γ_t = 1/t.
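This update rule is easy to sketch in a few lines; the function name and NumPy representation below are illustrative, not taken from the PAWL package:

```python
import numpy as np

def update_bias(theta, visited_bin, gamma, d):
    """Wang-Landau bias update:
    theta_t(i) <- theta_{t-1}(i) * [1 + gamma_t * (1{X_t in X_i} - 1/d)].
    Boosts the estimate of the bin just visited, shrinks the others."""
    indicator = np.zeros(d)
    indicator[visited_bin] = 1.0
    return theta * (1.0 + gamma * (indicator - 1.0 / d))

# With gamma = 0.5 and d = 4, visiting bin 2 multiplies theta(2) by 1.375
# and every other theta(i) by 0.875.
theta = update_bias(np.ones(4), visited_bin=2, gamma=0.5, d=4)
```

Because the multiplier is 1 + γ_t(·) with |γ_t(·)| < 1, the estimates stay positive throughout.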

Page 9

Wang–Landau

Result

In the end we get:

a sequence X_t asymptotically following π_ψ,

as well as estimates θ_t(i) of ψ_i.
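Putting the pieces together, a minimal single-chain Wang–Landau sampler might look as follows. This is only a sketch, with an illustrative bimodal target, sign-based binning, and a Gaussian random-walk proposal; it is not the PAWL implementation:

```python
import numpy as np

def wang_landau(logpi, bin_of, d, n_iter, x0, step=1.0, seed=0):
    """Single-chain Wang-Landau: Metropolis-Hastings targeting
    pi(x)/theta(J(x)), with theta updated after every iteration."""
    rng = np.random.default_rng(seed)
    log_theta = np.zeros(d)  # log-scale estimates of psi_i, for stability
    x = x0
    chain = np.empty(n_iter)
    for t in range(1, n_iter + 1):
        y = x + step * rng.standard_normal()
        # MH log-ratio for the biased target pi_theta
        log_ratio = (logpi(y) - log_theta[bin_of(y)]) \
                  - (logpi(x) - log_theta[bin_of(x)])
        if np.log(rng.uniform()) < log_ratio:
            x = y
        gamma = 1.0 / t  # decreasing step size
        indicator = np.zeros(d)
        indicator[bin_of(x)] = 1.0
        log_theta += np.log1p(gamma * (indicator - 1.0 / d))
        chain[t - 1] = x
    return chain, np.exp(log_theta)

# Bimodal example: equal mixture of N(-3, 1) and N(3, 1), binned by sign.
def logpi(x):
    return np.logaddexp(-0.5 * (x + 3.0) ** 2, -0.5 * (x - 3.0) ** 2)

chain, theta = wang_landau(logpi, bin_of=lambda x: 0 if x < 0.0 else 1,
                           d=2, n_iter=5000, x0=0.0)
```

The flattening effect means both modes get visited even though a plain random walk would tend to stick near one of them.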

Page 10

Automatic Binning | Parallel Interacting Chains | Adaptive proposals

Automatic Binning

Easily move from one bin to another

Maintain some kind of uniformity within bins. If non-uniform, split the bin.

[Figure: histograms of log density within a bin, (a) before the split and (b) after the split.]
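One crude way to detect non-uniformity is to check whether the within-bin samples pile up on one side of the bin. The heuristic below is my own illustration of the idea, not PAWL's exact rule:

```python
import numpy as np

def should_split(values, lower, upper, threshold=0.75):
    """Split a bin [lower, upper] when more than `threshold` of its
    samples fall on one side of the midpoint (crude uniformity check)."""
    frac_left = np.mean(np.asarray(values) < 0.5 * (lower + upper))
    return bool(frac_left > threshold or frac_left < 1.0 - threshold)

# Samples concentrated near the lower edge trigger a split;
# roughly uniform samples do not.
split = should_split([0.05, 0.10, 0.15, 0.20, 0.90], 0.0, 1.0)
keep = should_split([0.10, 0.35, 0.60, 0.85], 0.0, 1.0)
```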

Page 11

Parallel Interacting Chains

N chains (X_t^{(1)}, \ldots, X_t^{(N)}) instead of one,

targeting the same biased distribution π_{θ_t} at iteration t,

sharing the same estimated bias θ_t at iteration t.

The update of the estimated bias becomes:

\theta_t(i) \leftarrow \theta_{t-1}(i)\left[1 + \gamma_t\left(\frac{1}{N}\sum_{j=1}^{N} \mathbb{1}_{X_t^{(j)} \in X_i} - d^{-1}\right)\right]
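The parallel update simply replaces the single indicator by the fraction of chains currently in each bin; a sketch (names and representation are mine):

```python
import numpy as np

def update_bias_parallel(theta, states, bin_of, gamma, d):
    """Parallel-chains Wang-Landau bias update: the indicator 1{X_t in X_i}
    is replaced by the proportion of the N chains currently in bin i."""
    occupancy = np.zeros(d)
    for x in states:
        occupancy[bin_of(x)] += 1.0
    occupancy /= len(states)
    return theta * (1.0 + gamma * (occupancy - 1.0 / d))

sign_bin = lambda x: 0 if x < 0 else 1
# Chains split evenly between the two bins: occupancy matches 1/d,
# so the estimates are left unchanged.
balanced = update_bias_parallel(np.ones(2), [-1.0, -2.0, 3.0, 4.0],
                                sign_bin, gamma=0.1, d=2)
# Three of four chains in bin 0: its estimate grows, the other shrinks.
skewed = update_bias_parallel(np.ones(2), [-1.0, -2.0, -3.0, 4.0],
                              sign_bin, gamma=0.1, d=2)
```

Averaging over N chains reduces the variance of the occupancy estimate at each update, which is one reason the interacting version behaves better in practice.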

Page 12

Adaptive proposals

For continuous state spaces

We can use an adaptive random-walk proposal whose variance σ_t is learned along the iterations to target an acceptance rate.

Robbins-Monro stochastic approximation update

\sigma_{t+1} = \sigma_t + \rho_t\left(2\,\mathbb{1}(A > 0.234) - 1\right)

Or alternatively

\Sigma_t = \delta \times \mathrm{Cov}(X_1, \ldots, X_t)
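The Robbins–Monro rule nudges the scale up when the acceptance rate A exceeds 0.234 and down otherwise; a minimal sketch (the choice ρ_t = ρ_0/t is illustrative):

```python
def adapt_scale(sigma, accept_rate, t, target=0.234, rho0=1.0):
    """Robbins-Monro scale update:
    sigma_{t+1} = sigma_t + rho_t * (2 * 1(A > target) - 1)."""
    rho = rho0 / t
    return sigma + rho * (2.0 * (accept_rate > target) - 1.0)

up = adapt_scale(1.0, accept_rate=0.50, t=10)    # too many acceptances -> widen
down = adapt_scale(1.0, accept_rate=0.10, t=10)  # too few -> narrow
```

Note that with an additive update the scale can in principle go negative; adapting log σ instead is a common safeguard.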

Page 13

2D Ising model

Higdon (1998), JASA 93(442)

Target density

Consider a 2D Ising model, with posterior density

\pi(x \mid y) \propto \exp\left( \alpha \sum_i \mathbb{1}[y_i = x_i] + \beta \sum_{i \sim j} \mathbb{1}[x_i = x_j] \right)

with α = 1, β = 0.7.

The first term (likelihood) encourages states x which are similar to the original image y.

The second term (prior) favors states x for which neighbouring pixels are equal, like a Potts model.
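Both terms can be computed directly on a binary pixel grid; a sketch of the unnormalized log posterior (the array representation is an assumption, with each 4-neighbour pair counted once):

```python
import numpy as np

def log_posterior(x, y, alpha=1.0, beta=0.7):
    """Unnormalized log posterior: alpha * #{pixels of x matching y}
    + beta * #{equal 4-neighbour pairs in x}."""
    likelihood = alpha * np.sum(x == y)
    prior = beta * (np.sum(x[:, 1:] == x[:, :-1])     # horizontal pairs
                    + np.sum(x[1:, :] == x[:-1, :]))  # vertical pairs
    return float(likelihood + prior)

# A 2x2 all-zero state against an identical image: 4 matching pixels and
# 4 equal neighbour pairs, so the log posterior is 4*1 + 4*0.7 = 6.8.
lp = log_posterior(np.zeros((2, 2)), np.zeros((2, 2)))
```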

Page 14

2D Ising models

(a) Original Image (b) Focused Region of Image

Page 15

2D Ising models

[Figure: binary pixel maps (on/off) at iterations 300,000; 350,000; 400,000; 450,000; 500,000.]

Figure: Spatial model example: states explored over 200,000 iterations for Metropolis–Hastings (top) and proposed algorithm (bottom).

Page 16

2D Ising models

[Figure: average pixel values, colour scale from 0.4 to 1.0.]

Figure: Spatial model example: average state explored with Metropolis–Hastings (left) and Wang–Landau after importance sampling (right).

Page 17

Conclusion

Automatic binning

We still have to define a range.

Parallel Chains

In practice it is more efficient to use N chains for T iterationsinstead of 1 chain for N × T iterations.

Adaptive Proposals

Convergence results with fixed proposals are already challenging,and making the proposal adaptive might add a layer of complexity.

Page 18

Bibliography

Article: An Adaptive Interacting Wang-Landau Algorithm for Automatic Density Exploration, L. Bornn, P.E. Jacob, P. Del Moral, A. Doucet, available on arXiv.

Software: PAWL, an R package, available on CRAN:

install.packages("PAWL")

References:

F. Wang and D. Landau, Physical Review E, 64(5):56101.

Y. Atchadé and J. Liu, Statistica Sinica, 20:209-233.
