Savage-Dickey paradox

On resolving the Savage–Dickey paradox. Christian P. Robert, Université Paris Dauphine & CREST-INSEE, http://www.ceremade.dauphine.fr/~xian. Frontiers... San Antonio, March 19, 2010. Joint work with J.-M. Marin

description

This is my talk in San Antonio, March 19, 2010

Transcript of Savage-Dickey paradox

Page 1: Savage-Dickey paradox

On resolving the Savage–Dickey paradox

Christian P. Robert

Université Paris Dauphine & CREST-INSEE

http://www.ceremade.dauphine.fr/~xian

Frontiers... San Antonio, March 19, 2010

Joint work with J.-M. Marin

Page 3: Savage-Dickey paradox

Outline

1 Importance sampling solutions compared

2 The Savage–Dickey ratio

Happy B’day, Jim!!!

Page 4: Savage-Dickey paradox

Evidence

Bayesian model choice and hypothesis testing rely on a similar quantity, the evidence

$$Z_k = \int_{\Theta_k} \pi_k(\theta_k)\,L_k(\theta_k)\,\mathrm{d}\theta_k, \qquad k = 1, 2, \ldots$$

aka the marginal likelihood. [Jeffreys, 1939]

Page 5: Savage-Dickey paradox

Importance sampling solutions

1 Importance sampling solutions compared
   Regular importance
   Bridge sampling
   Harmonic means
   Chib's representation

2 The Savage–Dickey ratio

Page 6: Savage-Dickey paradox

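Regular importance

Bayes factor approximation

When approximating the Bayes factor

$$B_{01} = \frac{\int_{\Theta_0} f_0(x|\theta_0)\,\pi_0(\theta_0)\,\mathrm{d}\theta_0}{\int_{\Theta_1} f_1(x|\theta_1)\,\pi_1(\theta_1)\,\mathrm{d}\theta_1}$$

use of importance functions $\varphi_0$ and $\varphi_1$ and

$$\widehat{B}_{01} = \frac{n_0^{-1}\sum_{i=1}^{n_0} f_0(x|\theta_0^i)\,\pi_0(\theta_0^i)/\varphi_0(\theta_0^i)}{n_1^{-1}\sum_{i=1}^{n_1} f_1(x|\theta_1^i)\,\pi_1(\theta_1^i)/\varphi_1(\theta_1^i)}$$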
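The ratio of importance sampling averages above can be sketched on a toy problem where the answer is known. Everything in this block is an assumption made for illustration, not the Pima Indian analysis of the talk: a single observation x | θ ~ N(θ, 1), two models that differ only in the prior variance of θ, and a normal importance function centred at the exact posterior mean with twice the posterior variance.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def is_evidence(x, prior_var, n, rng):
    """Importance sampling estimate of the evidence of the toy model
    x | theta ~ N(theta, 1), theta ~ N(0, prior_var)."""
    post_var = prior_var / (1.0 + prior_var)  # exact posterior variance
    post_mean = post_var * x                  # exact posterior mean
    prop_var = 2.0 * post_var                 # importance function wider than the posterior
    total = 0.0
    for _ in range(n):
        theta = rng.gauss(post_mean, math.sqrt(prop_var))
        # likelihood * prior / importance density, as in the sums above
        total += (normal_pdf(x, theta, 1.0) * normal_pdf(theta, 0.0, prior_var)
                  / normal_pdf(theta, post_mean, prop_var))
    return total / n


def is_bayes_factor(x, prior_var0, prior_var1, n, rng):
    """Ratio of the two importance sampling evidence estimates."""
    return is_evidence(x, prior_var0, n, rng) / is_evidence(x, prior_var1, n, rng)


def exact_bayes_factor(x, prior_var0, prior_var1):
    """Closed form: under model k the marginal of x is N(0, 1 + prior_var_k)."""
    return normal_pdf(x, 0.0, 1.0 + prior_var0) / normal_pdf(x, 0.0, 1.0 + prior_var1)


if __name__ == "__main__":
    rng = random.Random(2010)
    print("importance sampling:", is_bayes_factor(1.5, 1.0, 100.0, 20000, rng))
    print("exact              :", exact_bayes_factor(1.5, 1.0, 100.0))
```

Doubling the posterior variance in the importance function keeps the weights bounded, which is the usual fatter-tails requirement for regular importance sampling.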

Page 7: Savage-Dickey paradox

Probit modelling on Pima Indian women

Example (R benchmark)

200 Pima Indian women with observed variables

plasma glucose concentration in oral glucose tolerance test

diastolic blood pressure

diabetes pedigree function

presence/absence of diabetes

Probability of diabetes as a function of the above variables

$$P(y = 1|x) = \Phi(x_1\beta_1 + x_2\beta_2 + x_3\beta_3)$$

Test of $H_0: \beta_3 = 0$ for 200 observations of Pima.tr based on a g-prior modelling:

$$\beta \sim \mathcal{N}_3\left(0, n(X^{\mathsf T}X)^{-1}\right)$$

Use of an importance function inspired by the MLE estimate distribution

$$\beta \sim \mathcal{N}(\hat\beta, \hat\Sigma)$$

Page 11: Savage-Dickey paradox

Diabetes in Pima Indian women

Comparison of the variation of the Bayes factor approximations based on 100 replicas for 20,000 simulations from the prior and the above MLE importance sampler

[Figure: Bayes factor approximations for Monte Carlo and Importance sampling; vertical axis ticks from 2 to 5]

Page 12: Savage-Dickey paradox

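Bridge sampling

General identity, with $\tilde\pi_j(\theta|x)$ the unnormalised posterior density under model $j$:

$$B_{12} = \frac{\int \tilde\pi_2(\theta|x)\,\alpha(\theta)\,\pi_1(\theta|x)\,\mathrm{d}\theta}{\int \tilde\pi_1(\theta|x)\,\alpha(\theta)\,\pi_2(\theta|x)\,\mathrm{d}\theta} \qquad \forall\,\alpha(\cdot)$$

$$\approx \frac{\dfrac{1}{n_1}\sum_{i=1}^{n_1} \tilde\pi_2(\theta_{1i}|x)\,\alpha(\theta_{1i})}{\dfrac{1}{n_2}\sum_{i=1}^{n_2} \tilde\pi_1(\theta_{2i}|x)\,\alpha(\theta_{2i})}, \qquad \theta_{ji} \sim \pi_j(\theta|x)$$

Back later!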

Page 13: Savage-Dickey paradox

Optimal bridge sampling

The optimal choice of auxiliary function is

$$\alpha^\star \propto \frac{1}{n_1\pi_1(\theta|x) + n_2\pi_2(\theta|x)}$$

leading to

$$B_{12} \approx \frac{\dfrac{1}{n_1}\sum_{i=1}^{n_1} \dfrac{\tilde\pi_2(\theta_{1i}|x)}{n_1\pi_1(\theta_{1i}|x) + n_2\pi_2(\theta_{1i}|x)}}{\dfrac{1}{n_2}\sum_{i=1}^{n_2} \dfrac{\tilde\pi_1(\theta_{2i}|x)}{n_1\pi_1(\theta_{2i}|x) + n_2\pi_2(\theta_{2i}|x)}}$$
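Because the optimal auxiliary function involves the normalised densities, it depends on the very ratio being estimated, so in practice the estimate is iterated to a fixed point. The sketch below is a minimal illustration under assumptions of its own: two unnormalised Gaussian densities `q1` and `q2` (hypothetical stand-ins for the unnormalised posteriors) with known normalising constants Z1 = sqrt(2π) and Z2 = sqrt(8π), exact draws from each, and the quantity estimated is r = Z1/Z2 = 0.5.

```python
import math
import random


def q1(theta):
    """Unnormalised N(0, 1) density; normalising constant sqrt(2*pi)."""
    return math.exp(-0.5 * theta ** 2)


def q2(theta):
    """Unnormalised N(1, 4) density; normalising constant sqrt(8*pi)."""
    return math.exp(-0.125 * (theta - 1.0) ** 2)


def bridge_ratio(draws1, draws2, n_iter=50):
    """Iterated bridge sampling estimate of r = Z1 / Z2.

    draws1 are simulated from the normalised q1, draws2 from the normalised q2.
    The auxiliary function is taken proportional to 1 / (s1*q1 + s2*r*q2),
    which is the optimal choice once r is replaced by its current estimate.
    """
    n1, n2 = len(draws1), len(draws2)
    s1, s2 = n1 / (n1 + n2), n2 / (n1 + n2)
    l1 = [q1(t) / q2(t) for t in draws1]  # density ratios at draws from density 1
    l2 = [q1(t) / q2(t) for t in draws2]  # density ratios at draws from density 2
    r = 1.0                               # arbitrary starting value
    for _ in range(n_iter):
        num = sum(l / (s1 * l + s2 * r) for l in l2) / n2
        den = sum(1.0 / (s1 * l + s2 * r) for l in l1) / n1
        r = num / den
    return r


if __name__ == "__main__":
    rng = random.Random(2010)
    d1 = [rng.gauss(0.0, 1.0) for _ in range(5000)]
    d2 = [rng.gauss(1.0, 2.0) for _ in range(5000)]
    print("bridge estimate of Z1/Z2:", bridge_ratio(d1, d2), "(exact value 0.5)")
```

Working with the ratios q1/q2 rather than the two densities separately keeps every term of the sums between 0 and a moderate constant, which is one reason the bridge estimate is stable when the two densities overlap.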

Page 14: Savage-Dickey paradox

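Extension to varying dimensions

When $\dim(\Theta_1) \neq \dim(\Theta_2)$, e.g. $\theta_2 = (\theta_1, \psi)$, introduction of a pseudo-posterior density, $\omega(\psi|\theta_1, x)$, augmenting $\pi_1(\theta_1|x)$ into joint distribution

$$\pi_1(\theta_1|x) \times \omega(\psi|\theta_1, x)$$

on $\Theta_2$ so that

$$B_{12} = \frac{\int \tilde\pi_1(\theta_1|x)\,\omega(\psi|\theta_1, x)\,\alpha(\theta_1, \psi)\,\pi_2(\theta_1, \psi|x)\,\mathrm{d}\theta_1\,\mathrm{d}\psi}{\int \tilde\pi_2(\theta_1, \psi|x)\,\alpha(\theta_1, \psi)\,\pi_1(\theta_1|x)\,\omega(\psi|\theta_1, x)\,\mathrm{d}\theta_1\,\mathrm{d}\psi}$$

$$= \mathbb{E}^{\pi_2}\left[\frac{\tilde\pi_1(\theta_1)\,\omega(\psi|\theta_1)}{\tilde\pi_2(\theta_1, \psi)}\right] = \frac{\mathbb{E}^{\varphi}\left[\tilde\pi_1(\theta_1)\,\omega(\psi|\theta_1)/\varphi(\theta_1, \psi)\right]}{\mathbb{E}^{\varphi}\left[\tilde\pi_2(\theta_1, \psi)/\varphi(\theta_1, \psi)\right]}$$

for any conditional density $\omega(\psi|\theta_1)$ and any joint density $\varphi$.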

Page 17: Savage-Dickey paradox

Illustration for the Pima Indian dataset

Use of the MLE induced conditional of $\beta_3$ given $(\beta_1, \beta_2)$ as a pseudo-posterior and mixture of both MLE approximations on $\beta_3$ in bridge sampling estimate

[Figure: Bayes factor approximations for MC, Bridge and IS; vertical axis ticks from 2 to 5]

Page 18: Savage-Dickey paradox

Harmonic means

The original harmonic mean estimator

When $\theta_{kt} \sim \pi_k(\theta|x)$,

$$\frac{1}{T}\sum_{t=1}^{T} \frac{1}{L(\theta_{kt}|x)}$$

is an unbiased estimator of $1/m_k(x)$ [Newton & Raftery, 1994]

Highly dangerous: Most often leads to an infinite variance!!!

Page 20: Savage-Dickey paradox

“The Worst Monte Carlo Method Ever”

“The good news is that the Law of Large Numbers guarantees that this estimator is consistent ie, it will very likely be very close to the correct answer if you use a sufficiently large number of points from the posterior distribution.

The bad news is that the number of points required for this estimator to get close to the right answer will often be greater than the number of atoms in the observable universe. The even worse news is that it’s easy for people to not realize this, and to naïvely accept estimates that are nowhere close to the correct value of the marginal likelihood.”

[Radford Neal’s blog, Aug. 23, 2008]

Page 21: Savage-Dickey paradox

Approximating Zk from a posterior sample

Use of the [harmonic mean] identity

$$\mathbb{E}^{\pi_k}\left[\left.\frac{\varphi(\theta_k)}{\pi_k(\theta_k)L_k(\theta_k)}\,\right|\,x\right] = \int \frac{\varphi(\theta_k)}{\pi_k(\theta_k)L_k(\theta_k)}\,\frac{\pi_k(\theta_k)L_k(\theta_k)}{Z_k}\,\mathrm{d}\theta_k = \frac{1}{Z_k}$$

no matter what the proposal $\varphi(\cdot)$ is. [Gelfand & Dey, 1994; Bartolucci et al., 2006]

Direct exploitation of the MCMC output

Page 23: Savage-Dickey paradox

Comparison with regular importance sampling

Harmonic mean: the constraint is the opposite of the usual importance sampling constraint: $\varphi(\theta)$ must have lighter (rather than fatter) tails than $\pi_k(\theta_k)L_k(\theta_k)$ for the approximation

$$\widehat{Z}_{1k} = 1 \bigg/ \frac{1}{T}\sum_{t=1}^{T} \frac{\varphi(\theta_k^{(t)})}{\pi_k(\theta_k^{(t)})\,L_k(\theta_k^{(t)})}$$

to have a finite variance.

E.g., use finite support kernels (like Epanechnikov’s kernel) for $\varphi$
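A finite support choice of ϕ can be sketched as follows. The setting is an assumption for illustration only: a toy model x | θ ~ N(θ, 1) with prior θ ~ N(0, prior_var), exact posterior draws standing in for MCMC output, and ϕ taken as the uniform density on the interquartile range of the draws (a hypothetical stand-in for the kernels mentioned above).

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def gelfand_dey_evidence(x, draws, prior_var=1.0):
    """Evidence estimate 1 / mean(phi / (prior * likelihood)) from posterior draws.

    phi is uniform on [lower quartile, upper quartile] of the draws, so it has
    bounded support and lighter tails than prior * likelihood.
    """
    ordered = sorted(draws)
    T = len(ordered)
    lo, hi = ordered[T // 4], ordered[(3 * T) // 4]
    phi = 1.0 / (hi - lo)                 # uniform density on [lo, hi]
    total = 0.0
    for theta in draws:
        if lo <= theta <= hi:             # phi vanishes outside its support
            total += phi / (normal_pdf(theta, 0.0, prior_var) * normal_pdf(x, theta, 1.0))
    return T / total


if __name__ == "__main__":
    rng = random.Random(2010)
    x = 1.0
    # exact posterior for prior_var = 1 is N(x / 2, 1 / 2)
    draws = [rng.gauss(0.5 * x, math.sqrt(0.5)) for _ in range(20000)]
    print("harmonic-type estimate:", gelfand_dey_evidence(x, draws))
    print("exact evidence        :", normal_pdf(x, 0.0, 2.0))
```

Since ϕ is zero where prior times likelihood is small, every term of the average is bounded, which is what guarantees the finite variance. Choosing the support from the same draws is a convenience of this sketch and introduces a small dependence that a more careful version would avoid.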

Page 25: Savage-Dickey paradox

HPD indicator as ϕ

Use the convex hull of MCMC simulations corresponding to the 10% HPD region (easily derived!) and $\varphi$ as indicator:

$$\varphi(\theta) = \frac{10}{T}\sum_{t \in \mathrm{HPD}} \mathbb{I}_{d(\theta, \theta^{(t)}) \le \epsilon}$$

Page 26: Savage-Dickey paradox

Diabetes in Pima Indian women (cont’d)

Comparison of the variation of the Bayes factor approximations based on 100 replicas for 20,000 simulations from the above harmonic mean sampler and importance samplers

[Figure: Bayes factor approximations for Harmonic mean and Importance sampling; vertical axis ticks from 3.102 to 3.116]

Page 27: Savage-Dickey paradox

Chib’s representation

Direct application of Bayes’ theorem: given $x \sim f_k(x|\theta_k)$ and $\theta_k \sim \pi_k(\theta_k)$,

$$Z_k = m_k(x) = \frac{f_k(x|\theta_k)\,\pi_k(\theta_k)}{\pi_k(\theta_k|x)}$$

Use of an approximation to the posterior

$$\widehat{Z}_k = \widehat{m}_k(x) = \frac{f_k(x|\theta_k^*)\,\pi_k(\theta_k^*)}{\hat\pi_k(\theta_k^*|x)}$$
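The identity holds for every value of θ, which is easy to check when the posterior is known in closed form. The following sketch assumes a toy conjugate model chosen for illustration (x | θ ~ N(θ, 1), θ ~ N(0, prior_var)); in realistic problems the posterior density in the denominator has to be approximated, which is where the estimation error comes from.

```python
import math


def normal_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def chib_evidence(x, theta_star, prior_var=1.0):
    """Likelihood times prior over posterior density, all evaluated at theta_star."""
    post_var = prior_var / (1.0 + prior_var)  # exact posterior variance
    post_mean = post_var * x                  # exact posterior mean
    return (normal_pdf(x, theta_star, 1.0) * normal_pdf(theta_star, 0.0, prior_var)
            / normal_pdf(theta_star, post_mean, post_var))


def exact_evidence(x, prior_var=1.0):
    """Closed-form marginal: x ~ N(0, 1 + prior_var)."""
    return normal_pdf(x, 0.0, 1.0 + prior_var)


if __name__ == "__main__":
    x = 1.3
    for theta_star in (-1.0, 0.0, 0.65, 2.0):
        print(theta_star, chib_evidence(x, theta_star), exact_evidence(x))
```

With an approximate posterior density, a high-density value of θ* (for instance near the posterior mode) is the usual choice, since the approximation tends to be most accurate there.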

Page 29: Savage-Dickey paradox

Case of latent variables

For missing variable $z$ as in mixture models, natural Rao-Blackwell estimate

$$\hat\pi_k(\theta_k^*|x) = \frac{1}{T}\sum_{t=1}^{T} \pi_k(\theta_k^*|x, z_k^{(t)}),$$

where the $z_k^{(t)}$’s are Gibbs sampled latent variables

Page 30: Savage-Dickey paradox

Compensation for label switching

For mixture models, $z_k^{(t)}$ usually fails to visit all configurations in a balanced way, despite the symmetry predicted by the theory

$$\pi_k(\theta_k|x) = \pi_k(\sigma(\theta_k)|x) = \frac{1}{k!}\sum_{\sigma \in \mathfrak{S}_k} \pi_k(\sigma(\theta_k)|x)$$

for all $\sigma$’s in $\mathfrak{S}_k$, set of all permutations of $\{1, \ldots, k\}$.

Consequences on numerical approximation, biased by an order $k!$

Recover the theoretical symmetry by using

$$\tilde\pi_k(\theta_k^*|x) = \frac{1}{T\,k!}\sum_{\sigma \in \mathfrak{S}_k}\sum_{t=1}^{T} \pi_k(\sigma(\theta_k^*)|x, z_k^{(t)}).$$

[Berkhof, Mechelen, & Gelman, 2003]

Page 32: Savage-Dickey paradox

Case of the probit model

For the completion by $z$,

$$\hat\pi(\theta|x) = \frac{1}{T}\sum_{t} \pi(\theta|x, z^{(t)})$$

is a simple average of normal densities

[Figure: approximations for Chib's method and importance sampling; vertical axis ticks from 0.0240 to 0.0255]

Page 33: Savage-Dickey paradox

The Savage–Dickey ratio

1 Importance sampling solutions compared

2 The Savage–Dickey ratio
   Measure-theoretic aspects
   Computational implications

Page 35: Savage-Dickey paradox

Measure-theoretic aspects

The Savage–Dickey ratio representation

Special representation of the Bayes factor used for simulation

Original version (Dickey, AoMS, 1971)

Page 37: Savage-Dickey paradox

Savage’s density ratio theorem

Given a test $H_0: \theta = \theta_0$ in a model $f(x|\theta, \psi)$ with a nuisance parameter $\psi$, under priors $\pi_0(\psi)$ and $\pi_1(\theta, \psi)$ such that

$$\pi_1(\psi|\theta_0) = \pi_0(\psi)$$

then

$$B_{01} = \frac{\pi_1(\theta_0|x)}{\pi_1(\theta_0)},$$

with the obvious notations

$$\pi_1(\theta) = \int \pi_1(\theta, \psi)\,\mathrm{d}\psi, \qquad \pi_1(\theta|x) = \int \pi_1(\theta, \psi|x)\,\mathrm{d}\psi,$$

[Dickey, 1971; Verdinelli & Wasserman, 1995]
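The density ratio can be checked numerically in the simplest possible setting. The sketch below makes strong simplifying assumptions that are not part of the theorem: there is no nuisance parameter at all (so the condition on the priors is vacuous), the model is x | θ ~ N(θ, 1), and the prior under the alternative is θ ~ N(0, prior_var). Both the posterior-to-prior ratio and the direct ratio of marginal likelihoods are then available in closed form.

```python
import math


def normal_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def savage_dickey_b01(x, theta0, prior_var):
    """Posterior density over prior density at theta0, under the alternative."""
    post_var = prior_var / (1.0 + prior_var)  # exact posterior variance
    post_mean = post_var * x                  # exact posterior mean
    return normal_pdf(theta0, post_mean, post_var) / normal_pdf(theta0, 0.0, prior_var)


def direct_b01(x, theta0, prior_var):
    """Ratio of marginal likelihoods: f(x | theta0) over the N(0, 1 + prior_var) marginal."""
    return normal_pdf(x, theta0, 1.0) / normal_pdf(x, 0.0, 1.0 + prior_var)


if __name__ == "__main__":
    for x in (0.0, 0.8, 2.5):
        print(x, savage_dickey_b01(x, 0.0, 4.0), direct_b01(x, 0.0, 4.0))
```

Both functions use the continuous normal densities, which is itself a choice of version of the prior and posterior densities at the single point θ0.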

Page 38: Savage-Dickey paradox

Rephrased

“Suppose that $f_0(\theta) = f_1(\theta|\phi = \phi_0)$. As $f_0(x|\theta) = f_1(x|\theta, \phi = \phi_0)$,

$$f_0(x) = \int f_1(x|\theta, \phi = \phi_0)\,f_1(\theta|\phi = \phi_0)\,\mathrm{d}\theta = f_1(x|\phi = \phi_0),$$

i.e., the numerator of the Bayes factor is the value of $f_1(x|\phi)$ at $\phi = \phi_0$, while the denominator is an average of the values of $f_1(x|\phi)$ for $\phi \neq \phi_0$, weighted by the prior distribution $f_1(\phi)$ under the augmented model. Applying Bayes’ theorem to the right-hand side of [the above] we get

$$f_0(x) = f_1(\phi_0|x)\,f_1(x)\big/f_1(\phi_0)$$

and hence the Bayes factor is given by

$$B = f_0(x)\big/f_1(x) = f_1(\phi_0|x)\big/f_1(\phi_0),$$

the ratio of the posterior to prior densities at $\phi = \phi_0$ under the augmented model.”

[O’Hagan & Forster, 1996]

Page 40: Savage-Dickey paradox

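Measure-theoretic difficulty

Representation depends on the choice of versions of conditional densities:

$$\begin{aligned}
B_{01} &= \frac{\int \pi_0(\psi)\,f(x|\theta_0, \psi)\,\mathrm{d}\psi}{\int \pi_1(\theta, \psi)\,f(x|\theta, \psi)\,\mathrm{d}\psi\,\mathrm{d}\theta} && \text{[by definition]} \\
&= \frac{\int \pi_1(\psi|\theta_0)\,f(x|\theta_0, \psi)\,\mathrm{d}\psi\;\pi_1(\theta_0)}{\int \pi_1(\theta, \psi)\,f(x|\theta, \psi)\,\mathrm{d}\psi\,\mathrm{d}\theta\;\pi_1(\theta_0)} && \text{[specific version of $\pi_1(\psi|\theta_0)$ and arbitrary version of $\pi_1(\theta_0)$]} \\
&= \frac{\int \pi_1(\theta_0, \psi)\,f(x|\theta_0, \psi)\,\mathrm{d}\psi}{m_1(x)\,\pi_1(\theta_0)} && \text{[specific version of $\pi_1(\theta_0, \psi)$]} \\
&= \frac{\pi_1(\theta_0|x)}{\pi_1(\theta_0)} && \text{[version dependent]}
\end{aligned}$$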

Page 42: Savage-Dickey paradox

Choice of density version

© Dickey’s (1971) condition is not a condition:

If

$$\frac{\pi_1(\theta_0|x)}{\pi_1(\theta_0)} = \frac{\int \pi_0(\psi)\,f(x|\theta_0, \psi)\,\mathrm{d}\psi}{m_1(x)}$$

is chosen as a version, then Savage–Dickey’s representation holds

Page 44: Savage-Dickey paradox

Savage–Dickey paradox

Verdinelli-Wasserman extension:

$$B_{01} = \frac{\pi_1(\theta_0|x)}{\pi_1(\theta_0)}\;\mathbb{E}^{\pi_1(\psi|x, \theta_0)}\left[\frac{\pi_0(\psi)}{\pi_1(\psi|\theta_0)}\right]$$

similarly depends on choices of versions...

...but Monte Carlo implementation relies on specific versions of all densities without making mention of it

[Chen, Shao & Ibrahim, 2000]

Page 46: Savage-Dickey paradox

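Computational implications

A computational exploitation

Starting from the (instrumental) prior

$$\tilde\pi_1(\theta, \psi) = \pi_1(\theta)\,\pi_0(\psi)$$

define the associated posterior

$$\tilde\pi_1(\theta, \psi|x) = \pi_0(\psi)\,\pi_1(\theta)\,f(x|\theta, \psi)\big/\tilde m_1(x)$$

and impose the choice of version

$$\frac{\tilde\pi_1(\theta_0|x)}{\pi_1(\theta_0)} = \frac{\int \pi_0(\psi)\,f(x|\theta_0, \psi)\,\mathrm{d}\psi}{\tilde m_1(x)}$$

Then

$$B_{01} = \frac{\tilde\pi_1(\theta_0|x)}{\pi_1(\theta_0)}\;\frac{\tilde m_1(x)}{m_1(x)}$$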

Page 48: Savage-Dickey paradox

First ratio

If $(\theta^{(1)}, \psi^{(1)}), \ldots, (\theta^{(T)}, \psi^{(T)}) \sim \tilde\pi_1(\theta, \psi|x)$, then

$$\frac{1}{T}\sum_{t} \tilde\pi_1(\theta_0|x, \psi^{(t)})$$

converges to $\tilde\pi_1(\theta_0|x)$ provided the right version is used in $\theta_0$

$$\tilde\pi_1(\theta_0|x, \psi) = \frac{\pi_1(\theta_0)\,f(x|\theta_0, \psi)}{\int \pi_1(\theta)\,f(x|\theta, \psi)\,\mathrm{d}\theta}$$

Page 49: Savage-Dickey paradox

Rao–Blackwellisation with latent variables

When $\tilde\pi_1(\theta_0|x, \psi)$ unavailable, replace with

$$\frac{1}{T}\sum_{t=1}^{T} \tilde\pi_1(\theta_0|x, z^{(t)}, \psi^{(t)})$$

via data completion by latent variable $z$ such that

$$f(x|\theta, \psi) = \int f(x, z|\theta, \psi)\,\mathrm{d}z$$

and that $\tilde\pi_1(\theta, \psi, z|x) \propto \pi_0(\psi)\,\pi_1(\theta)\,f(x, z|\theta, \psi)$ available in closed form, including the normalising constant, based on version

$$\frac{\tilde\pi_1(\theta_0|x, z, \psi)}{\pi_1(\theta_0)} = \frac{f(x, z|\theta_0, \psi)}{\int f(x, z|\theta, \psi)\,\pi_1(\theta)\,\mathrm{d}\theta}.$$

Page 51: Savage-Dickey paradox

Bridge revival (1)

Since $\tilde m_1(x)/m_1(x)$ is unknown, apparent failure!

Use of the bridge identity

$$\mathbb{E}^{\tilde\pi_1(\theta, \psi|x)}\left[\frac{\pi_1(\theta, \psi)\,f(x|\theta, \psi)}{\pi_0(\psi)\,\pi_1(\theta)\,f(x|\theta, \psi)}\right] = \mathbb{E}^{\tilde\pi_1(\theta, \psi|x)}\left[\frac{\pi_1(\psi|\theta)}{\pi_0(\psi)}\right] = \frac{m_1(x)}{\tilde m_1(x)}$$

to (biasedly) estimate $\tilde m_1(x)/m_1(x)$ by

$$T \bigg/ \sum_{t=1}^{T} \frac{\pi_1(\psi^{(t)}|\theta^{(t)})}{\pi_0(\psi^{(t)})}$$

based on the same sample from $\tilde\pi_1$.

Page 53: Savage-Dickey paradox

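Bridge revival (2)

Alternative identity

$$\mathbb{E}^{\pi_1(\theta, \psi|x)}\left[\frac{\pi_0(\psi)\,\pi_1(\theta)\,f(x|\theta, \psi)}{\pi_1(\theta, \psi)\,f(x|\theta, \psi)}\right] = \mathbb{E}^{\pi_1(\theta, \psi|x)}\left[\frac{\pi_0(\psi)}{\pi_1(\psi|\theta)}\right] = \frac{\tilde m_1(x)}{m_1(x)}$$

suggests using a second sample $(\bar\theta^{(t)}, \bar\psi^{(t)}, \bar z^{(t)}) \sim \pi_1(\theta, \psi|x)$ and the ratio estimate

$$\frac{1}{T}\sum_{t=1}^{T} \pi_0(\bar\psi^{(t)})\big/\pi_1(\bar\psi^{(t)}|\bar\theta^{(t)})$$

Resulting unbiased estimate:

$$\widehat{B}_{01} = \frac{\dfrac{1}{T}\sum_{t} \tilde\pi_1(\theta_0|x, z^{(t)}, \psi^{(t)})}{\pi_1(\theta_0)}\;\frac{1}{T}\sum_{t=1}^{T} \frac{\pi_0(\bar\psi^{(t)})}{\pi_1(\bar\psi^{(t)}|\bar\theta^{(t)})}$$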
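The two-sample estimate can be tried on a toy Gaussian problem where Dickey's condition fails and the exact Bayes factor is known. All modelling choices below are assumptions made for this sketch, not the probit example of the talk: a single observation x | θ, ψ ~ N(θ + ψ, 1), the null θ = 0 with π0(ψ) = N(0, 0.25), and under the alternative θ ~ N(0, 1) with ψ | θ ~ N(θ, 1). No latent variable z is needed here, and both posteriors are Gaussian, so exact draws replace MCMC output.

```python
import math
import random

S2 = 0.25     # variance of pi_0(psi) = N(0, S2), assumed for the sketch
THETA0 = 0.0  # tested value of theta


def normal_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def savage_dickey_estimate(x, T, rng):
    """Product of the Rao-Blackwellised first ratio and the bridge correction."""
    # First sample: psi from the instrumental posterior (prior pi_1(theta) * pi_0(psi)).
    psi_mean = S2 * x / (2.0 + S2)
    psi_sd = math.sqrt(2.0 * S2 / (2.0 + S2))
    first = 0.0
    for _ in range(T):
        psi = rng.gauss(psi_mean, psi_sd)
        # conditional posterior of theta given (x, psi) is N((x - psi) / 2, 1 / 2)
        first += normal_pdf(THETA0, 0.5 * (x - psi), 0.5)
    first /= T * normal_pdf(THETA0, 0.0, 1.0)  # divide by pi_1(theta0)

    # Second sample: (theta, psi) from the genuine posterior under the alternative,
    # which factorises here as N(x / 3, 1 / 3) times N(x / 2, 1 / 2).
    second = 0.0
    for _ in range(T):
        theta = rng.gauss(x / 3.0, math.sqrt(1.0 / 3.0))
        psi = rng.gauss(x / 2.0, math.sqrt(0.5))
        second += normal_pdf(psi, 0.0, S2) / normal_pdf(psi, theta, 1.0)
    second /= T
    return first * second


def exact_b01(x):
    """m_0(x) / m_1(x): x ~ N(theta0, 1 + S2) under the null, N(0, 6) under the alternative."""
    return normal_pdf(x, THETA0, 1.0 + S2) / normal_pdf(x, 0.0, 6.0)


if __name__ == "__main__":
    rng = random.Random(2010)
    print("two-sample estimate:", savage_dickey_estimate(1.0, 50000, rng))
    print("exact Bayes factor :", exact_b01(1.0))
```

The variance of π0(ψ) is deliberately smaller than that of π1(ψ | θ) in this sketch: with the reverse ordering the weights π0/π1 of the second average would have heavier tails and could lose their finite variance.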

Page 55: Savage-Dickey paradox

Difference with Verdinelli–Wasserman representation

The above leads to the representation

$$B_{01} = \frac{\tilde\pi_1(\theta_0|x)}{\pi_1(\theta_0)}\;\mathbb{E}^{\pi_1(\theta, \psi|x)}\left[\frac{\pi_0(\psi)}{\pi_1(\psi|\theta)}\right]$$

which shows how our approach differs from Verdinelli and Wasserman’s

$$B_{01} = \frac{\pi_1(\theta_0|x)}{\pi_1(\theta_0)}\;\mathbb{E}^{\pi_1(\psi|x, \theta_0)}\left[\frac{\pi_0(\psi)}{\pi_1(\psi|\theta_0)}\right]$$

[for referees only!!]

Page 56: Savage-Dickey paradox

Diabetes in Pima Indian women (cont’d)

Comparison of the variation of the Bayes factor approximations based on 100 replicas for 20,000 simulations from the above importance, Chib’s, Savage–Dickey’s and bridge samplers

[Figure: Bayes factor approximations for IS, Chib, Savage–Dickey and Bridge; vertical axis ticks from 2.8 to 3.4]