
Approximate Bayesian Computation: a simulation based approach to inference

Richard Wilkinson¹ and Simon Tavaré²

¹ Department of Probability and Statistics, University of Sheffield

² Department of Applied Mathematics and Theoretical Physics, University of Cambridge

Workshop on Approximate Inference in Stochastic Processes and Dynamical Systems

R.D. Wilkinson (University of Sheffield) Approximate Bayesian Computation PASCAL 2008 1 / 19


Stochastic Computation: Implicit Statistical Models

Two types of statistical model:

Prescribed models - likelihood function is specified.

Implicit models - mechanism to simulate observations.

Implicit models give scientists more freedom to model the phenomenon under consideration accurately. The increase in computer power has made their use more practicable, and they are popular in many disciplines.

[Figure: diagram with a time axis, labelled t]


Fitting to data

Most models are forward models: we specify the parameters θ and the initial conditions, and the model generates output D. Usually, however, we are interested in the inverse problem: we observe data and want to estimate the parameter values. This goes by several names:

Calibration

Data assimilation

Parameter estimation

Inverse problem

Bayesian inference


Monte Carlo Inference

Aim to sample from the posterior distribution:

π(θ|D) ∝ prior × likelihood = π(θ)P(D|θ).

Monte Carlo methods enable Bayesian inference to be done in more complex models.

MCMC can be difficult or impossible in many stochastic models, e.g., if

◮ P(D|θ) is unknown, which is true for many stochastic models;
◮ there are convergence or mixing problems, often caused by highly dependent data arising from an underlying tree or graphical structure.

⋆ Population Genetics
⋆ Epidemiology
⋆ Evolutionary Biology


Likelihood-Free Inference

Rejection Algorithm

Draw θ from prior π(·)

Accept θ with probability P(D | θ)

Accepted θ are independent draws from the posterior distribution, π(θ | D).

If the likelihood, P(D|θ), is unknown:

‘Mechanical’ Rejection Algorithm

Draw θ from π(·)

Simulate D′ ∼ P(· | θ)

Accept θ if D = D′

The acceptance rate is P(D), so the number of runs needed to obtain n accepted values is negative binomial, with mean n/P(D).
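A minimal Python sketch of the ‘mechanical’ rejection algorithm, assuming a toy model that is not from the talk: θ ∼ U(0, 1), D ∼ Binomial(10, θ), observed D = 7. Because the data are discrete, exact matching D′ = D is feasible. The accepted θ should follow the Beta(8, 4) posterior (mean 2/3), and since P(D) = 1/11 under a uniform prior, about 11 runs should be needed per accepted value. The helper names are hypothetical.

```python
import random


def simulate_binomial(n_trials, theta, rng):
    """Hypothetical simulator: D' ~ Binomial(n_trials, theta), by counting successes."""
    return sum(rng.random() < theta for _ in range(n_trials))


def mechanical_rejection(observed, n_trials, n_accept, seed=0):
    """Exact-match rejection sampler for the toy Binomial model.

    Returns the accepted thetas and the total number of simulator runs.
    """
    rng = random.Random(seed)
    accepted, runs = [], 0
    while len(accepted) < n_accept:
        runs += 1
        theta = rng.random()                             # draw theta from the U(0, 1) prior
        d_sim = simulate_binomial(n_trials, theta, rng)  # simulate D' ~ P(. | theta)
        if d_sim == observed:                            # accept only if D' == D
            accepted.append(theta)
    return accepted, runs


samples, runs = mechanical_rejection(observed=7, n_trials=10, n_accept=5000)
print(f"posterior mean ~ {sum(samples) / len(samples):.3f} (exact: {8 / 12:.3f})")
print(f"runs per acceptance ~ {runs / len(samples):.1f} (expected 1/P(D) = 11)")
```

The runs-per-acceptance figure is the empirical counterpart of the negative binomial mean n/P(D) divided by n.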


Approximate Bayesian Computation I

If P(D) is small, we will rarely accept any θ. Instead, there is an approximate version:

Approximate Rejection Algorithm

Draw θ from π(θ)

Simulate D′ ∼ P(· | θ)

Accept θ if ρ(D,D′) ≤ ε

This generates observations from π(θ | ρ(D,D′) ≤ ε):

As ε → ∞, we get observations from the prior, π(θ).

If ε = 0, we generate observations from π(θ | D).

The choice of ε reflects the tension between computability and accuracy.
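A minimal Python sketch of the approximate rejection algorithm, written generically over a prior sampler, a simulator and a distance ρ. The toy model is a hypothetical assumption (θ ∼ U(0, 1), D ∼ Binomial(10, θ), observed D = 7, ρ(D,D′) = |D − D′|); it is chosen so that both limits above can be checked: with ε = 0 the accepted θ match the exact Beta(8, 4) posterior, while with ε large enough every draw is accepted and we recover the prior mean of 0.5.

```python
import random


def abc_rejection(prior_sample, simulate, distance, observed, eps, n_draws, rng):
    """Approximate rejection algorithm: keep theta whenever rho(D, D') <= eps.

    Returns the accepted thetas; the acceptance rate is len(result) / n_draws.
    """
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)                 # draw theta from the prior
        d_sim = simulate(theta, rng)              # simulate D' ~ P(. | theta)
        if distance(observed, d_sim) <= eps:      # accept if rho(D, D') <= eps
            accepted.append(theta)
    return accepted


# Hypothetical toy model (not from the talk): theta ~ U(0, 1), D ~ Binomial(10, theta).
def prior_sample(rng):
    return rng.random()


def simulate(theta, rng):
    return sum(rng.random() < theta for _ in range(10))


def distance(d_obs, d_sim):
    return abs(d_obs - d_sim)


rng = random.Random(1)
n_draws = 20_000
results = {}
for eps in (0, 2, 10):
    kept = abc_rejection(prior_sample, simulate, distance, 7, eps, n_draws, rng)
    rate, mean = len(kept) / n_draws, sum(kept) / len(kept)
    results[eps] = (rate, mean)
    print(f"eps={eps:>2}: acceptance rate {rate:.3f}, mean theta {mean:.3f}")
```

As ε grows, the acceptance rate rises towards 1 and the mean of the accepted θ drifts from the posterior mean (about 0.667) towards the prior mean (0.5).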


Approximate Bayesian Computation II

If the data are too high dimensional, we never observe simulations that are ‘close’ to the field data. Reduce the dimension using summary statistics, S(D).

Approximate Rejection Algorithm With Summaries

Draw θ from π(θ)

Simulate D′ ∼ P(· | θ)

Accept θ if ρ(S(D),S(D′)) ≤ ε

If S is sufficient, then π(θ | S(D)) = π(θ | D), so as ε → 0 this targets the same posterior as the previous algorithm.
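A minimal Python sketch of why summaries help, assuming hypothetical data (not from the talk): 20 binary outcomes with 14 successes, a U(0, 1) prior on the success probability θ, and S(D) = number of successes, which is sufficient here. Matching the raw 20-dimensional vector exactly almost never happens, whereas matching S(D) accepts roughly 1 draw in 21, and the accepted θ should follow the exact Beta(15, 7) posterior.

```python
import random

# Hypothetical observed data (not from the talk): 20 binary outcomes, 14 successes.
observed = [1] * 14 + [0] * 6


def simulate(theta, n, rng):
    """Simulate D' as n independent Bernoulli(theta) outcomes."""
    return [1 if rng.random() < theta else 0 for _ in range(n)]


def summary(data):
    """S(D): the number of successes, which is sufficient for theta in this model."""
    return sum(data)


def abc_with_summaries(observed, eps, n_draws, seed=0):
    rng = random.Random(seed)
    s_obs = summary(observed)
    kept_raw, kept_summary = [], []
    for _ in range(n_draws):
        theta = rng.random()                          # theta ~ U(0, 1) prior (assumption)
        d_sim = simulate(theta, len(observed), rng)   # D' ~ P(. | theta)
        if d_sim == observed:                         # raw-data match: almost never happens
            kept_raw.append(theta)
        if abs(summary(d_sim) - s_obs) <= eps:        # rho(S(D), S(D')) <= eps
            kept_summary.append(theta)
    return kept_raw, kept_summary


raw, summ = abc_with_summaries(observed, eps=0, n_draws=20_000)
post_mean = sum(summ) / len(summ)
print(f"raw-data matches: {len(raw)}, summary matches: {len(summ)}")
print(f"summary-based posterior mean ~ {post_mean:.3f} (exact Beta(15, 7): {15 / 22:.3f})")
```

Under the uniform prior the chance of reproducing one specific 20-outcome sequence is about 1.2 × 10⁻⁶, so the raw-data sampler is effectively useless here, while the sufficient summary loses nothing.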


Error Structure

Example (Gaussian Distribution)

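Suppose X₁, …, Xₙ ∼ N(µ, σ²) independently, with σ² known, and give µ an improper flat prior distribution, π(µ) = 1 for µ ∈ ℝ.

Suppose we observe data with sample mean x̄ = 0.

Pick µ ∼ U(−∞,∞)

Simulate Xᵢ ∼ N(µ, σ²), i = 1, …, n

Accept µ if |x̄′| ≤ ε, where x̄′ is the mean of the simulated Xᵢ.

Then

$$\pi(\mu \mid |\bar{x}'| \le \varepsilon) = \frac{1}{2\varepsilon}\left[\Phi\left(\frac{\varepsilon-\mu}{\sqrt{\sigma^2/n}}\right) - \Phi\left(\frac{-\varepsilon-\mu}{\sqrt{\sigma^2/n}}\right)\right]$$

and

$$\mathrm{Var}(\mu \mid |\bar{x}'| \le \varepsilon) = \mathrm{Var}(\mu \mid \bar{x} = 0) + \frac{\varepsilon^2}{3} = \frac{\sigma^2}{n} + \frac{\varepsilon^2}{3}.$$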

[Figure: densities of 1000 samples of µ, one panel each for ε = 0.1, 0.5, 1 and 5; x-axis µ from −10 to 10, y-axis Density]
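A quick Python check of the variance formula, under stated assumptions: the improper flat prior cannot be sampled, so it is replaced by a wide U(−20, 20) prior; σ = n = 1; and the simulated mean x̄′ is drawn directly from N(µ, σ²/n), which is equivalent here because the sample mean is sufficient for µ. The function names are hypothetical.

```python
import math
import random


def abc_gaussian(eps, sigma=1.0, n=1, half_width=20.0, n_draws=200_000, seed=0):
    """ABC for the Gaussian-mean example with observed mean x_bar = 0.

    The flat prior is approximated by U(-half_width, half_width) (an assumption),
    and x_bar' ~ N(mu, sigma^2 / n) is simulated directly.
    """
    rng = random.Random(seed)
    se = sigma / math.sqrt(n)
    accepted = []
    for _ in range(n_draws):
        mu = rng.uniform(-half_width, half_width)   # wide uniform stand-in for the flat prior
        x_bar_sim = rng.gauss(mu, se)               # simulated sample mean
        if abs(x_bar_sim) <= eps:                   # rho = |x_bar' - 0| <= eps
            accepted.append(mu)
    return accepted


def sample_variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


results = {}
for eps in (0.1, 0.5, 1.0, 5.0):
    mus = abc_gaussian(eps)
    theory = 1.0 + eps ** 2 / 3                     # sigma^2 / n + eps^2 / 3 with sigma = n = 1
    results[eps] = (sample_variance(mus), theory)
    print(f"eps={eps}: accepted {len(mus)}, var {results[eps][0]:.3f}, theory {theory:.3f}")
```

The extra ε²/3 is the variance of a U(−ε, ε) variable: the accepted x̄′ is spread uniformly over the tolerance window, and that spread is added to the exact posterior variance σ²/n.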


Approximate MCMC

Rejection sampling is inefficient, as θ is repeatedly sampled from its prior distribution.

The idea behind MCMC is that by correlating observations, more time is spent in regions of high likelihood.

Approximate Metropolis-Hastings Algorithm

Suppose we are currently at θ. Propose θ′ from density q(θ, θ′).

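Simulate D′ from P(· | θ′). If ρ(D,D′) ≤ ε, calculate

$$h(\theta, \theta') = \min\left(1, \frac{\pi(\theta')\, q(\theta', \theta)}{\pi(\theta)\, q(\theta, \theta')}\right).$$

Accept the move to θ′ with probability h(θ, θ′); otherwise (including when ρ(D,D′) > ε) stay at θ.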

Adaptive tolerance choices.

Sisson et al. and Robert et al. proposed an approximate sequential importance sampling algorithm.
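A minimal Python sketch of the approximate Metropolis-Hastings algorithm above, in the spirit of Marjoram et al. (2003). Everything model-specific is an assumption for illustration: θ ∼ U(0, 1), D ∼ Binomial(10, θ) with observed D = 7, ε = 0, and a Gaussian random-walk proposal. Because that proposal is symmetric, q(θ′, θ)/q(θ, θ′) = 1 and h reduces to the prior ratio; proposals outside (0, 1) have zero prior density and are rejected without simulating.

```python
import random


def simulate(theta, rng, n_trials=10):
    """Hypothetical simulator: D' ~ Binomial(n_trials, theta)."""
    return sum(rng.random() < theta for _ in range(n_trials))


def prior_density(theta):
    """U(0, 1) prior density (an assumption for this toy example)."""
    return 1.0 if 0.0 < theta < 1.0 else 0.0


def abc_mcmc(observed, eps, n_iter, step=0.1, theta0=0.5, seed=0):
    rng = random.Random(seed)
    theta, chain = theta0, []
    for _ in range(n_iter):
        proposal = theta + rng.gauss(0.0, step)      # symmetric q, so the q terms cancel
        if prior_density(proposal) > 0.0:
            d_sim = simulate(proposal, rng)          # D' ~ P(. | theta')
            if abs(d_sim - observed) <= eps:         # only then compute h(theta, theta')
                h = min(1.0, prior_density(proposal) / prior_density(theta))
                if rng.random() < h:
                    theta = proposal                 # accept the move to theta'
        chain.append(theta)                          # otherwise stay at theta
    return chain


chain = abc_mcmc(observed=7, eps=0, n_iter=50_000)
kept = chain[5_000:]                                 # discard burn-in
print(f"posterior mean ~ {sum(kept) / len(kept):.3f} (exact Beta(8, 4): {8 / 12:.3f})")
```

Unlike the rejection sampler, successive values in `chain` are correlated, so a longer run is needed for the same Monte Carlo accuracy, but far fewer simulations are wasted in regions of low posterior mass.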


ABC-within-MCMC

Problem: a low acceptance rate leads to slow convergence.

Suppose θ = (θ₁, θ₂) with

π(θ₁ | D, θ₂) known,

π(θ₂ | D, θ₁) unknown.

We can combine Gibbs update steps (or any M-H update) with ABC.

ABC-within-Gibbs Algorithm

Suppose we are at θᵗ = (θ₁ᵗ, θ₂ᵗ).

1. Draw θ₁ᵗ⁺¹ ∼ π(θ₁ | D, θ₂ᵗ).

2. Draw θ₂* ∼ π(θ₂), the prior for θ₂.

◮ Simulate D′ ∼ P(· | θ₁ᵗ⁺¹, θ₂*)

◮ If ρ(D,D′) ≤ ε, set θ₂ᵗ⁺¹ = θ₂*. Else return to step 2.

This is often the case for models with a hidden tree structure generating highly dependent data.
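A minimal Python sketch of ABC-within-Gibbs on a hypothetical toy model (not from the talk), chosen so that θ₁ has a Gaussian full conditional while θ₂ does not: θ₁ ∼ N(0, 10²) and θ₂ ∼ U(0, 1) independently a priori; y₁, …, y₂₀ ∼ N(θ₁ + θ₂, 1), summarised by the sufficient mean ȳ = 1.0; and k ∼ Binomial(10, θ₂) with k = 7. The assumed distance is ρ = |k′ − k| + |ȳ′ − ȳ|, so any ε < 1 forces k′ = k. Step 1 is an exact Gibbs draw and step 2 is an ABC rejection draw from the prior of θ₂.

```python
import math
import random

# Hypothetical toy model and data (not from the talk).
M, N_TRIALS, TAU = 20, 10, 10.0
Y_BAR_OBS, K_OBS = 1.0, 7


def draw_theta1(theta2, rng):
    """Step 1: exact draw from the Gaussian full conditional pi(theta1 | D, theta2)."""
    precision = M + 1.0 / TAU ** 2
    mean = M * (Y_BAR_OBS - theta2) / precision
    return rng.gauss(mean, 1.0 / math.sqrt(precision))


def draw_theta2_abc(theta1, eps, rng):
    """Step 2: ABC rejection draw of theta2 given theta1, proposing from its prior."""
    while True:
        theta2 = rng.random()                                       # theta2* ~ U(0, 1)
        k_sim = sum(rng.random() < theta2 for _ in range(N_TRIALS))
        y_bar_sim = rng.gauss(theta1 + theta2, 1.0 / math.sqrt(M))  # simulated mean of y'
        rho = abs(k_sim - K_OBS) + abs(y_bar_sim - Y_BAR_OBS)       # eps < 1 forces k' == k
        if rho <= eps:
            return theta2                                           # else return to step 2


def abc_within_gibbs(n_iter, eps=0.1, seed=0):
    rng = random.Random(seed)
    theta2, draws = 0.5, []
    for _ in range(n_iter):
        theta1 = draw_theta1(theta2, rng)
        theta2 = draw_theta2_abc(theta1, eps, rng)
        draws.append((theta1, theta2))
    return draws


draws = abc_within_gibbs(4000)[500:]                                # discard burn-in
mean_t2 = sum(t2 for _, t2 in draws) / len(draws)
mean_sum = sum(t1 + t2 for t1, t2 in draws) / len(draws)
print(f"mean theta2 ~ {mean_t2:.3f}, mean theta1 + theta2 ~ {mean_sum:.3f}")
```

With the wide prior on θ₁, the y data mainly pin down θ₁ + θ₂ (near ȳ = 1) while k pins down θ₂ (near 2/3), which gives a rough check on the output.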


Example From Population Biology: Inferring ancestral divergence times

[Figure: diagram with a time axis, labelled t]


Choosing summary statistics and metrics

We need

◮ summaries S(D) which are sensitive to changes in θ, but robust to random variations in D;
◮ a definition of approximate sufficiency (Le Cam 1963): what is the distance between π(θ | D) and π(θ | S(D))?
◮ a systematic, implementable approach for finding good summary statistics.

[Figure: scatter plot with axes D1 and D2]

Complex dependence structures can be accounted for.


ABC Approach

Data can be thought of in two parts:

the observed number of fossils, Dᵢ, found in the ith interval, i = 1, …, k;

the total number of fossils found, D₊.

D′ denotes simulated data. A suitable metric might be

$$\rho(D, D') = \sum_{i=1}^{k} \left| \frac{D_i}{D_+} - \frac{D'_i}{D'_+} \right| + \left| \frac{D'_+}{D_+} - 1 \right|$$

Note: no data summaries here
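A minimal Python sketch of this metric, assuming it is the sum of absolute differences between the observed and simulated interval proportions plus the relative difference in the fossil totals. The optional `n0_obs`/`n0_sim` arguments add the |N′₀/N₀ − 1| term of the tweaked metric used later in the talk. Returning infinity when a simulation produces no fossils at all is an assumption, made so that such runs are always rejected; the counts in the example are hypothetical.

```python
import math


def fossil_metric(d_obs, d_sim, n0_obs=None, n0_sim=None):
    """rho(D, D') = sum_i |D_i/D_+ - D'_i/D'_+| + |D'_+/D_+ - 1|.

    If both n0_obs and n0_sim are given, |N'_0/N_0 - 1| is added as well.
    """
    if len(d_obs) != len(d_sim):
        raise ValueError("D and D' must cover the same k intervals")
    total_obs, total_sim = sum(d_obs), sum(d_sim)
    if total_obs <= 0:
        raise ValueError("the observed total D_+ must be positive")
    if total_sim == 0:
        return math.inf                     # no simulated fossils: proportions undefined
    rho = sum(abs(a / total_obs - b / total_sim) for a, b in zip(d_obs, d_sim))
    rho += abs(total_sim / total_obs - 1)   # penalise a different total number of fossils
    if n0_obs is not None and n0_sim is not None:
        rho += abs(n0_sim / n0_obs - 1)     # extra term comparing extant species counts
    return rho


print(fossil_metric([10, 20, 30, 40], [5, 10, 15, 20]))                          # 0.5
print(fossil_metric([10, 20, 30, 40], [5, 10, 15, 20], n0_obs=376, n0_sim=188))  # 1.0
```

In the example the simulated proportions match exactly, so the whole distance comes from the simulated total being half the observed one (and, in the second call, from N′₀ being half of N₀).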


Not going so well

[Figure: trace plot of Extant Population Size (roughly 50 to 300) against Iteration Number (0 to 1000)]


Tweak the metric

The simulated values of N₀, the number of extant species, are too small: there are 376 modern species.

Easy to combine different types of information with ABC

Change the metric

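$$\rho(D, D') = \sum_{i=1}^{k} \left| \frac{D_i}{D_+} - \frac{D'_i}{D'_+} \right| + \left| \frac{D'_+}{D_+} - 1 \right| + \left| \frac{N'_0}{N_0} - 1 \right|$$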

This gives approximate samples from

π(θ | D, N₀ = 376) ∝ P(D, N₀ = 376 | θ)π(θ)


Results

[Figure: density plot; x-axis Divergence Time (My), 60 to 140; y-axis Density]


Extensions

Model selection:

◮ Ratio of acceptance rates:

$$\frac{\pi_{M_1}(S' \approx S)}{\pi_{M_2}(S' \approx S)} \approx \text{Bayes factor}.$$

Relative acceptance rates give posterior model probabilities.

◮ Hopeless in practice, as it is too sensitive to the tolerance ε.

Raftery and Lewis (1992) and Chib (1995) give computational schemes to calculate Bayes factors. Neither works.
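A minimal Python sketch of the acceptance-rate ratio as a Bayes factor estimate, in a deliberately easy hypothetical case (not from the talk): discrete data D = 7 out of 10 trials, ε = 0 (exact matching), model M₁ with θ ∼ U(0, 1) and model M₂ with θ fixed at 0.5. Here each acceptance rate estimates that model's marginal likelihood P(D | Mⱼ), so the ratio estimates the exact Bayes factor (1/11)/(120/1024) ≈ 0.776. This toy case sidesteps the sensitivity to ε noted above rather than resolving it.

```python
import math
import random


def simulate_binomial(theta, rng, n_trials=10):
    """Hypothetical simulator: D' ~ Binomial(n_trials, theta)."""
    return sum(rng.random() < theta for _ in range(n_trials))


def acceptance_rate(draw_theta, observed, n_draws, rng):
    """Fraction of prior-predictive simulations that reproduce the data exactly."""
    hits = sum(simulate_binomial(draw_theta(rng), rng) == observed for _ in range(n_draws))
    return hits / n_draws


rng = random.Random(0)
observed, n_draws = 7, 100_000
rate_m1 = acceptance_rate(lambda r: r.random(), observed, n_draws, rng)  # M1: theta ~ U(0, 1)
rate_m2 = acceptance_rate(lambda r: 0.5, observed, n_draws, rng)         # M2: theta = 0.5
exact_bf = (1 / 11) / (math.comb(10, 7) / 2 ** 10)
print(f"ABC Bayes factor ~ {rate_m1 / rate_m2:.3f}, exact {exact_bf:.3f}")
```

With ε > 0 or continuous data, each rate instead estimates P(ρ ≤ ε | Mⱼ), and the ratio can change markedly with ε, which is the difficulty the slide points out.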

Expensive Simulators:

Emulate the stochastic model with a Gaussian process emulator (Richard Boys, Darren Wilkinson et al.).


Pros and cons of ABC

Pros
◮ Likelihood is not needed
◮ Easy to code
◮ Easy to adapt
◮ Generates independent observations (parallel computation)

Cons
◮ Hard to anticipate the effect of summary statistics (needs intuition)
◮ Overdispersion of the posterior due to ρ(D,D′) ≤ ε
◮ For complex problems, sampling from the prior does not make good use of the observations

Issues
◮ One run or many?
◮ How to choose good summary statistics?
◮ How good an approximation do we get?


References

M. A. Beaumont, W. Zhang and D. J. Balding, Approximate Bayesian Computation in Population Genetics, Genetics, 2002.

P. Marjoram, J. Molitor, V. Plagnol and S. Tavaré, Markov Chain Monte Carlo without likelihoods, PNAS, 2003.

S. A. Sisson, Y. Fan and M. M. Tanaka, Sequential Monte Carlo without likelihoods, PNAS, 2007.

C. P. Robert, M. A. Beaumont, J. Marin and J. Cornuet, Adaptivity for ABC algorithms: the ABC-PMC scheme, arXiv, 2008.
