Savage-Dickey paradox
On resolving the Savage–Dickey paradox
Christian P. Robert
Université Paris Dauphine & CREST-INSEE
http://www.ceremade.dauphine.fr/~xian
Frontiers..., San Antonio, March 19, 2010
Joint work with J.-M. Marin
Outline
1 Importance sampling solutions compared
2 The Savage–Dickey ratio
Happy B’day, Jim!!!
Evidence
Bayesian model choice and hypothesis testing rely on a similar quantity, the evidence
$$Z_k = \int_{\Theta_k} \pi_k(\theta_k)\,L_k(\theta_k)\,\mathrm{d}\theta_k, \qquad k = 1, 2, \ldots$$
aka the marginal likelihood. [Jeffreys, 1939]
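As a concrete instance of the evidence integral, here is a minimal sketch on a toy conjugate model that is not part of the slides: one observation x ~ N(θ, 1) with prior θ ~ N(0, 1), so that the evidence is the N(0, 2) density at x. The function names and the value of `x_obs` are illustrative assumptions. The estimator simply averages the likelihood over draws from the prior.

```python
import math
import random

def normal_pdf(y, mean, var):
    """Density of N(mean, var) evaluated at y."""
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def evidence_from_prior(x, n_draws=50_000, seed=0):
    """Estimate Z = integral of prior(theta) * L(theta) by averaging L over prior draws."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        theta = rng.gauss(0.0, 1.0)          # theta ~ prior N(0, 1) (toy assumption)
        total += normal_pdf(x, theta, 1.0)   # likelihood L(theta) = N(x; theta, 1)
    return total / n_draws

x_obs = 1.0                                  # illustrative observation
z_hat = evidence_from_prior(x_obs)
z_exact = normal_pdf(x_obs, 0.0, 2.0)        # closed form: marginally x ~ N(0, 2)
print(f"Monte Carlo: {z_hat:.4f}   closed form: {z_exact:.4f}")
```

Sampling from the prior works here because prior and likelihood overlap well; with a diffuse prior most draws would contribute almost nothing to the average.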
Importance sampling solutions compared
Importance sampling solutions
1 Importance sampling solutions compared
    Regular importance
    Bridge sampling
    Harmonic means
    Chib’s representation
2 The Savage–Dickey ratio
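Regular importance
Bayes factor approximation
When approximating the Bayes factor
$$B_{01} = \frac{\int_{\Theta_0} f_0(x|\theta_0)\,\pi_0(\theta_0)\,\mathrm{d}\theta_0}{\int_{\Theta_1} f_1(x|\theta_1)\,\pi_1(\theta_1)\,\mathrm{d}\theta_1}$$
use of importance functions $\varphi_0$ and $\varphi_1$ and
$$\widehat{B}_{01} = \frac{n_0^{-1}\sum_{i=1}^{n_0} f_0(x|\theta_0^i)\,\pi_0(\theta_0^i)/\varphi_0(\theta_0^i)}{n_1^{-1}\sum_{i=1}^{n_1} f_1(x|\theta_1^i)\,\pi_1(\theta_1^i)/\varphi_1(\theta_1^i)}$$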
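The ratio of importance sampling averages can be sketched on a toy pair of models, assumed here for illustration only: both have likelihood x ~ N(θ, 1), with prior N(0, 1) under model 0 and N(0, 100) under model 1, so each evidence is known in closed form. The importance function for each model is taken, as an assumption, to be a normal centred at the posterior mean with twice the posterior variance, which gives it fatter tails than the target.

```python
import math
import random

def normal_pdf(y, mean, var):
    """Density of N(mean, var) evaluated at y."""
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def evidence_is(x, prior_var, n_draws, rng):
    """Importance sampling estimate of the evidence for x ~ N(theta, 1), theta ~ N(0, prior_var)."""
    post_var = prior_var / (1.0 + prior_var)   # conjugate posterior variance
    post_mean = x * post_var                   # conjugate posterior mean
    prop_var = 2.0 * post_var                  # inflated variance: fatter tails than the target
    prop_sd = math.sqrt(prop_var)
    total = 0.0
    for _ in range(n_draws):
        theta = rng.gauss(post_mean, prop_sd)  # theta ~ importance function
        total += (normal_pdf(x, theta, 1.0) * normal_pdf(theta, 0.0, prior_var)
                  / normal_pdf(theta, post_mean, prop_var))
    return total / n_draws

rng = random.Random(0)
x_obs = 1.0                                    # illustrative observation
b01_hat = evidence_is(x_obs, 1.0, 20_000, rng) / evidence_is(x_obs, 100.0, 20_000, rng)
b01_exact = normal_pdf(x_obs, 0.0, 2.0) / normal_pdf(x_obs, 0.0, 101.0)
print(f"importance sampling: {b01_hat:.3f}   closed form: {b01_exact:.3f}")
```

Because each weight is a bounded multiple of the evidence in this toy setting, the variance of both averages stays small.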
Probit modelling on Pima Indian women
Example (R benchmark)
200 Pima Indian women with observed variables
plasma glucose concentration in oral glucose tolerance test
diastolic blood pressure
diabetes pedigree function
presence/absence of diabetes
Probability of diabetes as a function of the above variables
$$P(y = 1|x) = \Phi(x_1\beta_1 + x_2\beta_2 + x_3\beta_3),$$
Test of $H_0: \beta_3 = 0$ for 200 observations of Pima.tr based on a g-prior modelling:
$$\beta \sim \mathcal{N}_3\left(0,\, n\,(X^{\mathrm{T}}X)^{-1}\right)$$
Use of an importance function inspired by the MLE estimate distribution
$$\beta \sim \mathcal{N}(\hat\beta, \hat\Sigma)$$
Diabetes in Pima Indian women
Comparison of the variation of the Bayes factor approximations based on 100 replicas for 20,000 simulations from the prior and the above MLE importance sampler
[Figure: Bayes factor approximations over the replicas; panels: Monte Carlo, Importance sampling; vertical axis from 2 to 5]
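Bridge sampling
General identity, writing $\tilde\pi_j(\theta|x) = \pi_j(\theta) f_j(x|\theta)$ for the unnormalised posterior of model $j$:
$$B_{12} = \frac{\int \tilde\pi_1(\theta|x)\,\alpha(\theta)\,\pi_2(\theta|x)\,\mathrm{d}\theta}{\int \tilde\pi_2(\theta|x)\,\alpha(\theta)\,\pi_1(\theta|x)\,\mathrm{d}\theta} \qquad \forall\,\alpha(\cdot)$$
$$\approx \frac{\dfrac{1}{n_2}\sum_{i=1}^{n_2} \tilde\pi_1(\theta_{2i}|x)\,\alpha(\theta_{2i})}{\dfrac{1}{n_1}\sum_{i=1}^{n_1} \tilde\pi_2(\theta_{1i}|x)\,\alpha(\theta_{1i})}, \qquad \theta_{ji} \sim \pi_j(\theta|x)$$
Back later!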
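Optimal bridge sampling
The optimal choice of auxiliary function is
$$\alpha^\star \propto \frac{1}{n_1\pi_1(\theta|x) + n_2\pi_2(\theta|x)}$$
leading to, with $\tilde\pi_j$ the unnormalised posterior of model $j$,
$$\widehat{B}_{12} \approx \frac{\dfrac{1}{n_2}\sum_{i=1}^{n_2} \dfrac{\tilde\pi_1(\theta_{2i}|x)}{n_1\pi_1(\theta_{2i}|x) + n_2\pi_2(\theta_{2i}|x)}}{\dfrac{1}{n_1}\sum_{i=1}^{n_1} \dfrac{\tilde\pi_2(\theta_{1i}|x)}{n_1\pi_1(\theta_{1i}|x) + n_2\pi_2(\theta_{1i}|x)}}$$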
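The optimal auxiliary function involves the normalised posteriors, hence the unknown ratio itself. One common workaround, sketched below as an assumption rather than as the method used for the slides, is a fixed-point iteration: plug the current estimate of the ratio into the weights and recompute. The two unnormalised densities `q1` and `q2` are toy stand-ins for the unnormalised posteriors, chosen so that the true ratio of normalising constants is 0.5.

```python
import math
import random

def q1(t):
    """Unnormalised N(0, 1) density; normalising constant sqrt(2*pi)."""
    return math.exp(-0.5 * t * t)

def q2(t):
    """Unnormalised N(1, 4) density; normalising constant sqrt(8*pi)."""
    return math.exp(-0.5 * (t - 1.0) ** 2 / 4.0)

def optimal_bridge(draws1, draws2, n_iter=30):
    """Iterate r = E_2[q1 * alpha] / E_1[q2 * alpha] with alpha = 1 / (n1*q1/r + n2*q2)."""
    n1, n2 = len(draws1), len(draws2)
    vals1 = [(q1(t), q2(t)) for t in draws1]   # densities at draws from density 1
    vals2 = [(q1(t), q2(t)) for t in draws2]   # densities at draws from density 2
    r = 1.0                                    # initial guess for Z1 / Z2
    for _ in range(n_iter):
        num = sum(a / (n1 * a / r + n2 * b) for a, b in vals2) / n2
        den = sum(b / (n1 * a / r + n2 * b) for a, b in vals1) / n1
        r = num / den
    return r

rng = random.Random(0)
draws1 = [rng.gauss(0.0, 1.0) for _ in range(10_000)]   # sample from normalised q1
draws2 = [rng.gauss(1.0, 2.0) for _ in range(10_000)]   # sample from normalised q2
r_hat = optimal_bridge(draws1, draws2)
r_exact = math.sqrt(2.0 * math.pi) / math.sqrt(8.0 * math.pi)   # equals 0.5
print(f"bridge estimate: {r_hat:.3f}   exact ratio: {r_exact:.3f}")
```

The weights are written with `n1*q1/r + n2*q2`, which is proportional to the denominator of the optimal auxiliary function once both posteriors are expressed through their unnormalised versions.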
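Extension to varying dimensions
When $\dim(\Theta_1) \neq \dim(\Theta_2)$, e.g. $\theta_2 = (\theta_1, \psi)$, introduction of a pseudo-posterior density, $\omega(\psi|\theta_1, x)$, augmenting $\pi_1(\theta_1|x)$ into the joint distribution
$$\pi_1(\theta_1|x) \times \omega(\psi|\theta_1, x)$$
on $\Theta_2$ so that, with $\tilde\pi_j$ denoting unnormalised posteriors,
$$B_{12} = \frac{\int \tilde\pi_1(\theta_1|x)\,\alpha(\theta_1,\psi)\,\pi_2(\theta_1,\psi|x)\,\mathrm{d}\theta_1\,\omega(\psi|\theta_1,x)\,\mathrm{d}\psi}{\int \tilde\pi_2(\theta_1,\psi|x)\,\alpha(\theta_1,\psi)\,\pi_1(\theta_1|x)\,\mathrm{d}\theta_1\,\omega(\psi|\theta_1,x)\,\mathrm{d}\psi}$$
$$= E^{\pi_2}\left[\frac{\tilde\pi_1(\theta_1)\,\omega(\psi|\theta_1)}{\tilde\pi_2(\theta_1,\psi)}\right] = \frac{E^{\varphi}\left[\tilde\pi_1(\theta_1)\,\omega(\psi|\theta_1)/\varphi(\theta_1,\psi)\right]}{E^{\varphi}\left[\tilde\pi_2(\theta_1,\psi)/\varphi(\theta_1,\psi)\right]}$$
for any conditional density $\omega(\psi|\theta_1)$ and any joint density $\varphi$.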
Illustration for the Pima Indian dataset
Use of the MLE-induced conditional of $\beta_3$ given $(\beta_1, \beta_2)$ as a pseudo-posterior and mixture of both MLE approximations on $\beta_3$ in the bridge sampling estimate
[Figure: Bayes factor approximations over the replicas; panels: MC, Bridge, IS; vertical axis from 2 to 5]
Harmonic means
The original harmonic mean estimator
When $\theta_{kt} \sim \pi_k(\theta|x)$,
$$\frac{1}{T}\sum_{t=1}^{T} \frac{1}{L(\theta_{kt}|x)}$$
is an unbiased estimator of $1/m_k(x)$ [Newton & Raftery, 1994]
Highly dangerous: Most often leads to an infinite variance!!!
“The Worst Monte Carlo Method Ever”
“The good news is that the Law of Large Numbers guarantees that this estimator is consistent ie, it will very likely be very close to the correct answer if you use a sufficiently large number of points from the posterior distribution.
The bad news is that the number of points required for this estimator to get close to the right answer will often be greater than the number of atoms in the observable universe. The even worse news is that it’s easy for people to not realize this, and to naïvely accept estimates that are nowhere close to the correct value of the marginal likelihood.”
[Radford Neal’s blog, Aug. 23, 2008]
Approximating $Z_k$ from a posterior sample
Use of the [harmonic mean] identity
$$E^{\pi_k}\left[\left.\frac{\varphi(\theta_k)}{\pi_k(\theta_k)L_k(\theta_k)}\,\right|\,x\right] = \int \frac{\varphi(\theta_k)}{\pi_k(\theta_k)L_k(\theta_k)}\;\frac{\pi_k(\theta_k)L_k(\theta_k)}{Z_k}\,\mathrm{d}\theta_k = \frac{1}{Z_k}$$
no matter what the proposal $\varphi(\cdot)$ is. [Gelfand & Dey, 1994; Bartolucci et al., 2006]
Direct exploitation of the MCMC output
Comparison with regular importance sampling
Harmonic mean: Constraint opposed to usual importance sampling constraints: $\varphi(\theta)$ must have lighter (rather than fatter) tails than $\pi_k(\theta_k)L_k(\theta_k)$ for the approximation
$$\widehat{Z}_{1k} = 1\Bigg/\frac{1}{T}\sum_{t=1}^{T} \frac{\varphi(\theta_k^{(t)})}{\pi_k(\theta_k^{(t)})\,L_k(\theta_k^{(t)})}$$
to have a finite variance.
E.g., use finite support kernels (like Epanechnikov’s kernel) for $\varphi$
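To make the tail constraint concrete, the sketch below compares two choices of ϕ on a toy conjugate model assumed for illustration (x ~ N(θ, 1), θ ~ N(0, 1), exact posterior N(x/2, 1/2)). Taking ϕ equal to the prior gives the original harmonic mean estimator, whose variance is infinite in this model. Taking ϕ as an Epanechnikov kernel centred at the posterior mean gives a finite-support alternative. The bandwidth `h = 1.0` and all names are assumptions.

```python
import math
import random

def normal_pdf(y, mean, var):
    """Density of N(mean, var) evaluated at y."""
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def epanechnikov(theta, center, h):
    """Epanechnikov kernel density with support [center - h, center + h]."""
    u = (theta - center) / h
    return 0.75 * (1.0 - u * u) / h if abs(u) <= 1.0 else 0.0

def harmonic_estimates(x, n_draws=20_000, seed=1, h=1.0):
    """Return (naive harmonic mean, finite-support phi) estimates of the evidence."""
    rng = random.Random(seed)
    post_mean, post_sd = x / 2.0, math.sqrt(0.5)   # exact posterior in the toy model
    naive_sum = 0.0
    kernel_sum = 0.0
    for _ in range(n_draws):
        theta = rng.gauss(post_mean, post_sd)      # theta ~ posterior
        lik = normal_pdf(x, theta, 1.0)
        prior = normal_pdf(theta, 0.0, 1.0)
        naive_sum += 1.0 / lik                     # phi = prior: original harmonic mean
        kernel_sum += epanechnikov(theta, post_mean, h) / (prior * lik)
    # Both sums estimate n_draws / Z, so invert them to get Z.
    return n_draws / naive_sum, n_draws / kernel_sum

x_obs = 1.0                                        # illustrative observation
z_naive, z_kernel = harmonic_estimates(x_obs)
z_exact = normal_pdf(x_obs, 0.0, 2.0)              # closed form: marginally x ~ N(0, 2)
print(f"naive: {z_naive:.4f}   Epanechnikov phi: {z_kernel:.4f}   exact: {z_exact:.4f}")
```

With the kernel, the ratio ϕ/(πL) is bounded on its support, which is what keeps the variance finite; the naive version has no such bound.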
HPD indicator as $\varphi$
Use the convex hull of MCMC simulations corresponding to the 10% HPD region (easily derived!) and $\varphi$ as indicator:
$$\varphi(\theta) = \frac{10}{T}\sum_{t\in\mathrm{HPD}} \mathbb{I}_{d(\theta,\theta^{(t)})\le\epsilon}$$
Diabetes in Pima Indian women (cont’d)
Comparison of the variation of the Bayes factor approximations based on 100 replicas for 20,000 simulations from the above harmonic mean sampler and importance samplers
[Figure: Bayes factor approximations over the replicas; panels: Harmonic mean, Importance sampling; vertical axis from 3.102 to 3.116]
Chib’s representation
Direct application of Bayes’ theorem: given $x \sim f_k(x|\theta_k)$ and $\theta_k \sim \pi_k(\theta_k)$,
$$Z_k = m_k(x) = \frac{f_k(x|\theta_k)\,\pi_k(\theta_k)}{\pi_k(\theta_k|x)}$$
Use of an approximation to the posterior
$$\widehat{Z}_k = \widehat{m}_k(x) = \frac{f_k(x|\theta_k^*)\,\pi_k(\theta_k^*)}{\hat\pi_k(\theta_k^*|x)}.$$
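The identity holds at every value of θ, which is easy to check when the posterior is known exactly. The sketch below does so on a toy conjugate model assumed for illustration (x ~ N(θ, 1), θ ~ N(0, 1), posterior N(x/2, 1/2)); in practice the posterior density in the denominator is replaced by an approximation, and the choice of θ* matters for the quality of that approximation.

```python
import math

def normal_pdf(y, mean, var):
    """Density of N(mean, var) evaluated at y."""
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def chib_evidence(x, theta_star):
    """Evidence as likelihood * prior / posterior, all evaluated at theta_star."""
    lik = normal_pdf(x, theta_star, 1.0)          # f(x | theta*)
    prior = normal_pdf(theta_star, 0.0, 1.0)      # pi(theta*)
    post = normal_pdf(theta_star, x / 2.0, 0.5)   # exact pi(theta* | x) in the toy model
    return lik * prior / post

x_obs = 1.0                                       # illustrative observation
z_exact = normal_pdf(x_obs, 0.0, 2.0)             # closed form: marginally x ~ N(0, 2)
chib_values = [chib_evidence(x_obs, t) for t in (-1.0, 0.0, 0.5, 2.0)]
print(chib_values, z_exact)
```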
Case of latent variables
For missing variable $z$ as in mixture models, natural Rao-Blackwell estimate
$$\hat\pi_k(\theta_k^*|x) = \frac{1}{T}\sum_{t=1}^{T} \pi_k(\theta_k^*|x, z_k^{(t)}),$$
where the $z_k^{(t)}$’s are Gibbs sampled latent variables
Compensation for label switching
For mixture models, $z_k^{(t)}$ usually fails to visit all configurations in a balanced way, despite the symmetry predicted by the theory
$$\pi_k(\theta_k|x) = \pi_k(\sigma(\theta_k)|x) = \frac{1}{k!}\sum_{\sigma\in\mathfrak{S}_k} \pi_k(\sigma(\theta_k)|x)$$
for all $\sigma$’s in $\mathfrak{S}_k$, set of all permutations of $\{1, \ldots, k\}$.
Consequences on numerical approximation, biased by an order $k!$
Recover the theoretical symmetry by using
$$\tilde\pi_k(\theta_k^*|x) = \frac{1}{T\,k!}\sum_{\sigma\in\mathfrak{S}_k}\sum_{t=1}^{T} \pi_k(\sigma(\theta_k^*)|x, z_k^{(t)}).$$
[Berkhof, Mechelen, & Gelman, 2003]
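The permutation average can be sketched independently of any sampler. Below, `one_mode_density` is a hypothetical stand-in for a density estimate built from a Gibbs run that stayed in a single labelling of a two-component mixture; it covers only the sum over permutations, not the sum over Gibbs iterations. Averaging over all relabellings restores the symmetry and, at a point in the visited mode, divides the estimate by k! = 2.

```python
import itertools
import math

def normal_pdf(y, mean, var):
    """Density of N(mean, var) evaluated at y."""
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def one_mode_density(theta):
    """Hypothetical estimate concentrated on one labelling: means near (-1, 2)."""
    return normal_pdf(theta[0], -1.0, 0.25) * normal_pdf(theta[1], 2.0, 0.25)

def symmetrise(density, theta):
    """Average density(sigma(theta)) over all permutations sigma of the components."""
    perms = list(itertools.permutations(theta))
    return sum(density(p) for p in perms) / len(perms)

theta_star = (-1.0, 2.0)
raw = one_mode_density(theta_star)
sym = symmetrise(one_mode_density, theta_star)
sym_swapped = symmetrise(one_mode_density, (2.0, -1.0))
print(f"unsymmetrised: {raw:.4f}   symmetrised: {sym:.4f}")
```

The cost grows as k!, so enumerating every permutation is only practical for a small number of components.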
Case of the probit model
For the completion by $z$,
$$\hat\pi(\theta|x) = \frac{1}{T}\sum_{t} \pi(\theta|x, z^{(t)})$$
is a simple average of normal densities
[Figure: approximations over the replicas; panels: Chib's method, importance sampling; vertical axis from 0.0240 to 0.0255]
The Savage–Dickey ratio
1 Importance sampling solutions compared
2 The Savage–Dickey ratio
    Measure-theoretic aspects
    Computational implications
Measure-theoretic aspects
The Savage–Dickey ratio representation
Special representation of the Bayes factor used for simulation
Original version (Dickey, AoMS, 1971)
Savage’s density ratio theorem
Given a test $H_0: \theta = \theta_0$ in a model $f(x|\theta, \psi)$ with a nuisance parameter $\psi$, under priors $\pi_0(\psi)$ and $\pi_1(\theta, \psi)$ such that
$$\pi_1(\psi|\theta_0) = \pi_0(\psi)$$
then
$$B_{01} = \frac{\pi_1(\theta_0|x)}{\pi_1(\theta_0)},$$
with the obvious notations
$$\pi_1(\theta) = \int \pi_1(\theta, \psi)\,\mathrm{d}\psi, \qquad \pi_1(\theta|x) = \int \pi_1(\theta, \psi|x)\,\mathrm{d}\psi,$$
[Dickey, 1971; Verdinelli & Wasserman, 1995]
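A quick numerical check of the representation, on a toy case assumed for illustration and without nuisance parameter: x ~ N(θ, 1), with θ ~ N(0, τ²) under the alternative. Both the direct Bayes factor and the posterior-to-prior density ratio at θ0 are then available in closed form. The values of `x`, `tau2` and `theta0` below are arbitrary.

```python
import math

def normal_pdf(y, mean, var):
    """Density of N(mean, var) evaluated at y."""
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def bayes_factor_direct(x, tau2, theta0):
    """B01 as the ratio of the two marginal likelihoods."""
    m0 = normal_pdf(x, theta0, 1.0)        # under H0, theta is fixed at theta0
    m1 = normal_pdf(x, 0.0, 1.0 + tau2)    # under H1, marginally x ~ N(0, 1 + tau2)
    return m0 / m1

def bayes_factor_savage_dickey(x, tau2, theta0):
    """B01 as posterior density over prior density at theta0, both under H1."""
    post_var = tau2 / (1.0 + tau2)
    post_mean = x * post_var
    return normal_pdf(theta0, post_mean, post_var) / normal_pdf(theta0, 0.0, tau2)

for x, tau2, theta0 in [(1.0, 1.0, 0.0), (2.5, 10.0, 0.0), (-0.7, 4.0, 0.3)]:
    direct = bayes_factor_direct(x, tau2, theta0)
    ratio = bayes_factor_savage_dickey(x, tau2, theta0)
    print(f"x={x}, tau2={tau2}, theta0={theta0}: direct={direct:.6f}, ratio={ratio:.6f}")
```

Here every density is continuous and uniquely defined at θ0, so the question of which version of the density to pick does not show up.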
Rephrased
“Suppose that $f_0(\theta) = f_1(\theta|\phi = \phi_0)$. As $f_0(x|\theta) = f_1(x|\theta, \phi = \phi_0)$,
$$f_0(x) = \int f_1(x|\theta, \phi = \phi_0)\,f_1(\theta|\phi = \phi_0)\,\mathrm{d}\theta = f_1(x|\phi = \phi_0),$$
i.e., the numerator of the Bayes factor is the value of $f_1(x|\phi)$ at $\phi = \phi_0$, while the denominator is an average of the values of $f_1(x|\phi)$ for $\phi \neq \phi_0$, weighted by the prior distribution $f_1(\phi)$ under the augmented model. Applying Bayes’ theorem to the right-hand side of [the above] we get
$$f_0(x) = f_1(\phi_0|x)\,f_1(x)\big/f_1(\phi_0)$$
and hence the Bayes factor is given by
$$B = f_0(x)\big/f_1(x) = f_1(\phi_0|x)\big/f_1(\phi_0),$$
the ratio of the posterior to prior densities at $\phi = \phi_0$ under the augmented model.”
[O’Hagan & Forster, 1996]
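Measure-theoretic difficulty
Representation depends on the choice of versions of conditional densities:
$$\begin{aligned}
B_{01} &= \frac{\int \pi_0(\psi)\,f(x|\theta_0, \psi)\,\mathrm{d}\psi}{\int \pi_1(\theta, \psi)\,f(x|\theta, \psi)\,\mathrm{d}\psi\,\mathrm{d}\theta} && \text{[by definition]} \\
&= \frac{\int \pi_1(\psi|\theta_0)\,f(x|\theta_0, \psi)\,\mathrm{d}\psi\;\pi_1(\theta_0)}{\int \pi_1(\theta, \psi)\,f(x|\theta, \psi)\,\mathrm{d}\psi\,\mathrm{d}\theta\;\pi_1(\theta_0)} && \text{[specific version of } \pi_1(\psi|\theta_0) \text{ and arbitrary version of } \pi_1(\theta_0)\text{]} \\
&= \frac{\int \pi_1(\theta_0, \psi)\,f(x|\theta_0, \psi)\,\mathrm{d}\psi}{m_1(x)\,\pi_1(\theta_0)} && \text{[specific version of } \pi_1(\theta_0, \psi)\text{]} \\
&= \frac{\pi_1(\theta_0|x)}{\pi_1(\theta_0)} && \text{[version dependent]}
\end{aligned}$$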
Choice of density version
© Dickey’s (1971) condition is not a condition:
If
$$\frac{\pi_1(\theta_0|x)}{\pi_1(\theta_0)} = \frac{\int \pi_0(\psi)\,f(x|\theta_0, \psi)\,\mathrm{d}\psi}{m_1(x)}$$
is chosen as a version, then Savage–Dickey’s representation holds
Savage–Dickey paradox
Verdinelli-Wasserman extension:
$$B_{01} = \frac{\pi_1(\theta_0|x)}{\pi_1(\theta_0)}\;E^{\pi_1(\psi|x,\theta_0)}\left[\frac{\pi_0(\psi)}{\pi_1(\psi|\theta_0)}\right]$$
similarly depends on choices of versions...
...but Monte Carlo implementation relies on specific versions of all densities without making mention of it
[Chen, Shao & Ibrahim, 2000]
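Computational implications
A computational exploitation
Starting from the (instrumental) prior
$$\tilde\pi_1(\theta, \psi) = \pi_1(\theta)\,\pi_0(\psi)$$
define the associated posterior
$$\tilde\pi_1(\theta, \psi|x) = \pi_0(\psi)\,\pi_1(\theta)\,f(x|\theta, \psi)\big/\tilde m_1(x)$$
and impose the choice of version
$$\frac{\tilde\pi_1(\theta_0|x)}{\pi_1(\theta_0)} = \frac{\int \pi_0(\psi)\,f(x|\theta_0, \psi)\,\mathrm{d}\psi}{\tilde m_1(x)}$$
Then
$$B_{01} = \frac{\tilde\pi_1(\theta_0|x)}{\pi_1(\theta_0)}\;\frac{\tilde m_1(x)}{m_1(x)}$$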
First ratio
If $(\theta^{(1)}, \psi^{(1)}), \ldots, (\theta^{(T)}, \psi^{(T)}) \sim \tilde\pi(\theta, \psi|x)$, then
$$\frac{1}{T}\sum_{t} \tilde\pi_1(\theta_0|x, \psi^{(t)})$$
converges to $\tilde\pi_1(\theta_0|x)$ provided the right version is used in $\theta_0$
$$\tilde\pi_1(\theta_0|x, \psi) = \frac{\pi_1(\theta_0)\,f(x|\theta_0, \psi)}{\int \pi_1(\theta)\,f(x|\theta, \psi)\,\mathrm{d}\theta}$$
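The Rao-Blackwell average can be sketched on a small normal model, assumed here for illustration: observations x_i ~ N(θ, ψ), prior θ ~ N(0, τ²) independent of ψ ~ inverse-gamma(a, b), and θ0 = 0. Since the prior already factorises, the instrumental and actual priors coincide and the second ratio equals one, so the average divided by the prior density at θ0 directly estimates the Bayes factor. The data, hyperparameters and function names are all assumptions.

```python
import math
import random

X = [0.3, -0.5, 1.2, 0.8, 0.1]   # illustrative data (assumption)
TAU2 = 1.0                        # prior variance of theta under the alternative
A, B = 2.0, 2.0                   # inverse-gamma(A, B) prior on the variance psi
THETA0 = 0.0                      # tested value of theta

def normal_pdf(y, mean, var):
    """Density of N(mean, var) evaluated at y."""
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def theta_conditional(psi):
    """Mean and variance of the normal conditional of theta given x and psi."""
    var = 1.0 / (len(X) / psi + 1.0 / TAU2)
    return var * sum(X) / psi, var

def savage_dickey_rb(n_iter=20_000, burn_in=1_000, seed=2):
    """Gibbs sampler on (theta, psi), averaging the conditional density of theta at THETA0."""
    rng = random.Random(seed)
    n = len(X)
    psi = 1.0
    total = 0.0
    for it in range(n_iter + burn_in):
        mean, var = theta_conditional(psi)
        theta = rng.gauss(mean, math.sqrt(var))             # theta | x, psi
        rate = B + 0.5 * sum((xi - theta) ** 2 for xi in X)
        # gammavariate takes (shape, scale); the precision 1/psi is Gamma(shape, rate)
        psi = 1.0 / rng.gammavariate(A + 0.5 * n, 1.0 / rate)
        if it >= burn_in:
            mean, var = theta_conditional(psi)
            total += normal_pdf(THETA0, mean, var)          # density of theta0 given x, psi
    posterior_at_theta0 = total / n_iter
    return posterior_at_theta0 / normal_pdf(THETA0, 0.0, TAU2)

b01_hat = savage_dickey_rb()
print(f"Rao-Blackwellised Savage-Dickey estimate of B01: {b01_hat:.3f}")
```

The conditional density used inside the loop is the conjugate normal one, which is exactly the version singled out on the slide: prior times likelihood at θ0, divided by its integral over θ.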
Rao–Blackwellisation with latent variables
When $\tilde\pi_1(\theta_0|x, \psi)$ unavailable, replace with
$$\frac{1}{T}\sum_{t=1}^{T} \tilde\pi_1(\theta_0|x, z^{(t)}, \psi^{(t)})$$
via data completion by latent variable $z$ such that
$$f(x|\theta, \psi) = \int f(x, z|\theta, \psi)\,\mathrm{d}z$$
and that $\tilde\pi_1(\theta, \psi, z|x) \propto \pi_0(\psi)\,\pi_1(\theta)\,f(x, z|\theta, \psi)$ available in closed form, including the normalising constant, based on version
$$\frac{\tilde\pi_1(\theta_0|x, z, \psi)}{\pi_1(\theta_0)} = \frac{f(x, z|\theta_0, \psi)}{\int f(x, z|\theta, \psi)\,\pi_1(\theta)\,\mathrm{d}\theta}.$$
Bridge revival (1)
Since $\tilde m_1(x)/m_1(x)$ is unknown, apparent failure!
Use of the bridge identity
$$E^{\tilde\pi_1(\theta,\psi|x)}\left[\frac{\pi_1(\theta, \psi)\,f(x|\theta, \psi)}{\pi_0(\psi)\,\pi_1(\theta)\,f(x|\theta, \psi)}\right] = E^{\tilde\pi_1(\theta,\psi|x)}\left[\frac{\pi_1(\psi|\theta)}{\pi_0(\psi)}\right] = \frac{m_1(x)}{\tilde m_1(x)}$$
to (biasedly) estimate $\tilde m_1(x)/m_1(x)$ by
$$T\Bigg/\sum_{t=1}^{T} \frac{\pi_1(\psi^{(t)}|\theta^{(t)})}{\pi_0(\psi^{(t)})}$$
based on the same sample from $\tilde\pi_1$.
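Bridge revival (2)
Alternative identity
$$E^{\pi_1(\theta,\psi|x)}\left[\frac{\pi_0(\psi)\,\pi_1(\theta)\,f(x|\theta, \psi)}{\pi_1(\theta, \psi)\,f(x|\theta, \psi)}\right] = E^{\pi_1(\theta,\psi|x)}\left[\frac{\pi_0(\psi)}{\pi_1(\psi|\theta)}\right] = \frac{\tilde m_1(x)}{m_1(x)}$$
suggests using a second sample $(\bar\theta^{(t)}, \bar\psi^{(t)}, \bar z^{(t)}) \sim \pi_1(\theta, \psi|x)$ and the ratio estimate
$$\frac{1}{T}\sum_{t=1}^{T} \pi_0(\bar\psi^{(t)})\big/\pi_1(\bar\psi^{(t)}|\bar\theta^{(t)})$$
Resulting unbiased estimate:
$$\widehat{B}_{01} = \frac{\dfrac{1}{T}\sum_{t} \tilde\pi_1(\theta_0|x, z^{(t)}, \psi^{(t)})}{\pi_1(\theta_0)}\;\frac{1}{T}\sum_{t=1}^{T} \frac{\pi_0(\bar\psi^{(t)})}{\pi_1(\bar\psi^{(t)}|\bar\theta^{(t)})}$$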
Difference with Verdinelli–Wasserman representation
The above leads to the representation
$$B_{01} = \frac{\tilde\pi_1(\theta_0|x)}{\pi_1(\theta_0)}\;E^{\pi_1(\theta,\psi|x)}\left[\frac{\pi_0(\psi)}{\pi_1(\psi|\theta)}\right]$$
which shows how our approach differs from Verdinelli and Wasserman’s
$$B_{01} = \frac{\pi_1(\theta_0|x)}{\pi_1(\theta_0)}\;E^{\pi_1(\psi|x,\theta_0)}\left[\frac{\pi_0(\psi)}{\pi_1(\psi|\theta_0)}\right]$$
[for referees only!!]
Diabetes in Pima Indian women (cont’d)
Comparison of the variation of the Bayes factor approximations based on 100 replicas for 20,000 simulations from the above importance, Chib’s, Savage–Dickey’s and bridge samplers
[Figure: Bayes factor approximations over the replicas; panels: IS, Chib, Savage−Dickey, Bridge; vertical axis from 2.8 to 3.4]