Talk on the design of non-negative unbiased estimators, useful to perform exact inference for intractable target distributions. Corresponds to the article http://arxiv.org/abs/1309.6473

Pierre E. Jacob @ University of Oxford

& Alexandre H. Thiery @ National University of Singapore

Banff – March 2014

1 Exact inference

2 Unbiased estimators and sign problem

3 Existence and non-existence results

4 Discussion

Exact inference

For a target probability distribution with unnormalised density π, a numerical method is “exact” if, for any test function φ, the integral

$$\frac{\int \varphi(\theta)\,\pi(\theta)\,d\theta}{\int \pi(\theta)\,d\theta}$$

can be approximated with arbitrary precision, at the cost of more computational effort.

⇒ In this sense MCMC is exact.

Exact inference

With exact methods, no systematic error.

No guarantee that, for a fixed computational budget, exact methods should be preferred over approximate methods.

Still important to know in which settings exact methods are available.

Exact inference using Metropolis-Hastings

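Metropolis-Hastings algorithm.

1: Set some θ(1).
2: for i = 2 to Nθ do
3: Propose θ⋆ ∼ q(·|θ(i−1)).
4: Compute the ratio:

$$\alpha = \min\left(1,\ \frac{\pi(\theta^\star)}{\pi(\theta^{(i-1)})}\,\frac{q(\theta^{(i-1)} \mid \theta^\star)}{q(\theta^\star \mid \theta^{(i-1)})}\right).$$

5: Set θ(i) = θ⋆ with probability α, otherwise set θ(i) = θ(i−1).
6: end for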
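The steps above can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the slides: a one-dimensional parameter, a Gaussian random-walk proposal (symmetric, so the q-ratio cancels), and a hypothetical standard normal target; the function name `metropolis_hastings` is made up for this example.

```python
import math
import random

def metropolis_hastings(log_target, theta0, n_iter, step=1.0, seed=0):
    """Random-walk Metropolis-Hastings for a 1-d unnormalised log-density."""
    rng = random.Random(seed)
    chain = [theta0]
    for _ in range(n_iter - 1):
        current = chain[-1]
        # Assumption: Gaussian random-walk proposal, so q is symmetric
        # and the ratio q(current | proposal) / q(proposal | current) is 1.
        proposal = current + step * rng.gauss(0.0, 1.0)
        log_alpha = min(0.0, log_target(proposal) - log_target(current))
        # Accept with probability alpha; 1 - random() lies in (0, 1].
        if math.log(1.0 - rng.random()) < log_alpha:
            chain.append(proposal)
        else:
            chain.append(current)
    return chain

# Hypothetical target: standard normal, known only up to a constant.
chain = metropolis_hastings(lambda t: -0.5 * t * t, theta0=0.0, n_iter=20000)
posterior_mean = sum(chain) / len(chain)  # approximates the integral with phi(theta) = theta
```

Working with log-densities avoids numerical underflow; the acceptance rule is otherwise the one in step 4.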

Exact inference with unbiased estimators

Pseudo-marginal Metropolis-Hastings algorithm.
For each θ, we can sample Z(θ) with E(Z(θ)) = π(θ).

1: Set some θ(1) and sample Z(θ(1)).
2: for i = 2 to Nθ do
3: Propose θ⋆ ∼ q(·|θ(i−1)) and sample Z(θ⋆).
4: Compute the ratio:

$$\alpha = \min\left(1,\ \frac{Z(\theta^\star)}{Z(\theta^{(i-1)})}\,\frac{q(\theta^{(i-1)} \mid \theta^\star)}{q(\theta^\star \mid \theta^{(i-1)})}\right).$$

5: Set θ(i) = θ⋆, Z(θ(i)) = Z(θ⋆) with probability α, otherwise set θ(i) = θ(i−1), Z(θ(i)) = Z(θ(i−1)).
6: end for
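A minimal sketch of the pseudo-marginal scheme, under assumptions chosen only for illustration: a symmetric Gaussian random-walk proposal, and a hypothetical estimator `noisy_gaussian` that multiplies the unnormalised standard normal density by non-negative noise with mean one. The key point the code tries to show is that the estimate attached to the current state is stored and reused, never refreshed.

```python
import math
import random

def pseudo_marginal_mh(estimate_target, theta0, n_iter, step=1.0, seed=0):
    """Pseudo-marginal MH with a symmetric random-walk proposal (assumption)."""
    rng = random.Random(seed)
    theta, z = theta0, estimate_target(theta0, rng)
    chain = [theta]
    for _ in range(n_iter - 1):
        proposal = theta + step * rng.gauss(0.0, 1.0)
        z_prop = estimate_target(proposal, rng)
        # Accept with probability min(1, z_prop / z), written without a division.
        if rng.random() * z < z_prop:
            theta, z = proposal, z_prop  # keep the estimate with the state
        chain.append(theta)
    return chain

def noisy_gaussian(theta, rng, n_draws=5):
    """Hypothetical non-negative unbiased estimator of exp(-theta^2 / 2)."""
    noise = sum(rng.expovariate(1.0) for _ in range(n_draws)) / n_draws  # mean 1
    return math.exp(-0.5 * theta * theta) * noise

chain = pseudo_marginal_mh(noisy_gaussian, theta0=0.0, n_iter=30000)
second_moment = sum(t * t for t in chain) / len(chain)  # should be near 1
```

Because the noise is non-negative, the acceptance probability is always well defined; this is exactly what fails when Z(θ) can take negative values.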

Exact inference with unbiased estimators

Game-changer when one has access to efficient unbiased estimators of the target density.

Especially in parameter inference for hidden Markov models, with particle MCMC methods based on particle filters to estimate the likelihood.

What if one doesn’t have access to straightforward unbiased estimators? Are there general schemes to obtain those unbiased estimators?

Exact inference with unbiased estimators

Example: big data

Observations yi iid ∼ fθ for i = 1, …, n, and n is very large.
Can we do exact inference without computing the full likelihood every time we try a new parameter value?

Unbiased estimator of the log-likelihood

$$\hat{\ell}(\theta) = \frac{n}{m} \sum_{i=1}^{m} \log f(y_{\sigma_i} \mid \theta)$$

for m < n and σi corresponding to some subsampling scheme.

It doesn’t directly provide an unbiased estimator of the likelihood.
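The reason exponentiating does not help can be made explicit with Jensen's inequality. In the sketch below, the hat notation for the subsampled estimator is introduced here for clarity, and the subsampling scheme is assumed to make it unbiased for the full log-likelihood.

```latex
\mathbb{E}\big[\exp \hat{\ell}(\theta)\big]
\;\ge\; \exp\big(\mathbb{E}[\hat{\ell}(\theta)]\big)
= \exp\Big(\sum_{i=1}^{n} \log f(y_i \mid \theta)\Big)
= \prod_{i=1}^{n} f(y_i \mid \theta),
```

with strict inequality unless the estimator is almost surely constant, since exp is strictly convex. So the exponentiated estimator overestimates the likelihood on average.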

Exact inference with unbiased estimators

Doubly intractable distributions

Posterior density decomposable into

$$\pi(\theta \mid y) = \frac{1}{C(\theta)}\, f(y; \theta)\, p(\theta).$$

One can typically get an unbiased estimator of C(θ) using importance sampling.

It doesn’t directly provide an unbiased estimator of 1/C(θ).
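The bias of the plug-in inverse can be seen numerically. The sketch below does not use importance sampling; as a stand-in, it assumes a hypothetical unbiased estimator of C given by the average of m exponential draws with mean C, for which the expectation of the inverse is known in closed form.

```python
import random

def inverse_of_estimate(C, m, rng):
    """1 / C_hat, where C_hat is a stand-in unbiased estimator of C."""
    # Assumption: C_hat is the mean of m exponential draws with mean C,
    # so E[C_hat] = C and C_hat follows a Gamma(m, C / m) distribution.
    c_hat = sum(rng.expovariate(1.0 / C) for _ in range(m)) / m
    return 1.0 / c_hat

rng = random.Random(0)
C, m = 2.0, 5
draws = [inverse_of_estimate(C, m, rng) for _ in range(50000)]
mean_inverse = sum(draws) / len(draws)
# For this stand-in, E[1 / C_hat] = m / ((m - 1) * C) = 0.625, whereas 1 / C = 0.5.
```

The gap between 0.625 and 0.5 is the Jensen bias of x ↦ 1/x; it shrinks as m grows but never vanishes for finite m.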

Exact inference with unbiased estimators

Reference priors

Starting from an arbitrary prior π⋆, define

$$f_k(\theta) = \exp\left\{ \int p(y_1, \ldots, y_k \mid \theta)\, \log \pi^\star(\theta \mid y_1, \ldots, y_k)\, dy_1 \ldots dy_k \right\}$$

and the reference prior is, for any θ0 in the interior of Θ,

$$f(\theta) = \lim_{k \to \infty} \frac{f_k(\theta)}{f_k(\theta_0)}.$$

(Berger, Bernardo, Sun 2009.)

Unbiased estimators of f(θ)?

Pierre Jacob Non-negative unbiased estimators 10/ 28

Unbiased estimators

Von Neumann & Ulam (∼ 1950), Kuti (∼ 1980), Rychlik (∼ 1990), McLeish, Rhee & Glynn (∼ 2010) …

Removing the bias of consistent estimators

Introduce
a random variable S with E(S) = λ ∈ R,
a sequence (Sn)n≥0 converging to S in L2,
an integer-valued random variable N, with wn = 1/P(N ≥ n) < ∞ for all n ≥ 0.

Then, with the convention S−1 = 0,

$$Y = \sum_{n=0}^{N} w_n \times (S_n - S_{n-1})$$

has expectation E(Y) = E(S) = λ.
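The construction can be sketched as follows. The choices here are assumptions made for the example and not part of the slides: N is geometric with P(N ≥ n) = r^n and drawn independently of the increments, S−1 is taken to be 0, and the sequence Sn = 1 − 2^−(n+1) is a hypothetical deterministic sequence converging to λ = 1.

```python
import random

def debiased_draw(increment, r=0.5, rng=None):
    """One draw of Y = sum_{n=0}^{N} w_n (S_n - S_{n-1}), with S_{-1} = 0.

    Assumption: N is geometric with P(N >= n) = r**n, so w_n = r**(-n).
    `increment(n, rng)` must return a realisation of S_n - S_{n-1}.
    """
    rng = rng or random.Random()
    total, n = 0.0, 0
    while True:
        total += increment(n, rng) / r**n
        if rng.random() >= r:  # stop here: N = n
            return total
        n += 1

# Hypothetical sequence S_n = 1 - 2**-(n+1), so S_n - S_{n-1} = 2**-(n+1).
rng = random.Random(0)
draws = [debiased_draw(lambda n, g: 2.0 ** -(n + 1), rng=rng) for _ in range(20000)]
estimate = sum(draws) / len(draws)  # Monte Carlo average of Y, targeting lambda = 1
```

Each single draw only ever evaluates finitely many terms, yet the average targets the limit of the sequence rather than any fixed Sn.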


Unbiased estimators

If

$$\sum_{n=1}^{\infty} w_n \times \mathbb{E}\left(|S - S_{n-1}|^2\right) < \infty, \qquad (1)$$

then the variance of Y is finite.

Denote by tn the expected computing time to obtain Sn − Sn−1.
Then the computing time of Y, denoted by τ, should preferably satisfy

$$\mathbb{E}(\tau) = \sum_{n=0}^{\infty} w_n^{-1} \times t_n < \infty. \qquad (2)$$

Success story in multi-level Monte Carlo.
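To see how conditions (1) and (2) pull against each other, here is a worked case under assumptions that are not in the slides: geometric tails for N, a mean-square error decaying geometrically, and a cost per increment growing geometrically (the constants r, ρ, γ, c, t0 are introduced only for this illustration).

```latex
\begin{align*}
&\mathbb{P}(N \ge n) = r^{n}, \quad w_n = r^{-n}, \quad
 \mathbb{E}|S - S_{n-1}|^2 \le c\,\rho^{\,n-1}, \quad t_n = t_0\,\gamma^{n},
 \qquad 0 < \rho < 1 < \gamma, \\
&\text{(1):}\quad \sum_{n \ge 1} w_n\, \mathbb{E}|S - S_{n-1}|^2
   \le \frac{c}{\rho} \sum_{n \ge 1} \Big(\frac{\rho}{r}\Big)^{n} < \infty
   \quad \text{if } \rho < r, \\
&\text{(2):}\quad \mathbb{E}(\tau) = \sum_{n \ge 0} \mathbb{P}(N \ge n)\, t_n
   = t_0 \sum_{n \ge 0} (r\gamma)^{n} < \infty
   \quad \text{iff } r < 1/\gamma .
\end{align*}
```

Both can be met by picking ρ < r < 1/γ, which is possible when ργ < 1: the error must decay faster than the cost grows, the usual multi-level Monte Carlo trade-off.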

Unbiased estimators

Even if the consistent estimators Sn are each almost-surely non-negative, Y is not in general almost-surely non-negative:

$$Y = \sum_{n=0}^{N} w_n \times (S_n - S_{n-1})$$

unless we manage to construct ordered consistent estimators, i.e.:

P(Sn−1 ≤ Sn) = 1.

Direct implementation of the pseudo-marginal approach is difficult in the presence of possibly negative acceptance probabilities.
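A two-line example shows how negative values arise even from non-negative Sn. The numbers are hypothetical, and the convention S−1 = 0 is assumed.

```latex
\begin{align*}
&S_0 = 3, \quad S_n = 1 \ (n \ge 1), \quad \lambda = 1, \quad
 \mathbb{P}(N \ge 0) = 1, \quad \mathbb{P}(N \ge 1) = \tfrac12
 \ \Rightarrow\ w_0 = 1,\ w_1 = 2, \\
&Y = w_0 S_0 + \mathbf{1}\{N \ge 1\}\, w_1 (S_1 - S_0) = 3 - 4 \cdot \mathbf{1}\{N \ge 1\}
 \in \{3, -1\}, \qquad
 \mathbb{E}(Y) = \tfrac12 \cdot 3 + \tfrac12 \cdot (-1) = 1 .
\end{align*}
```

The estimator is unbiased, but it equals −1 half of the time because S1 < S0; with an ordered sequence every increment would be non-negative.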

Dealing with negative values

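One can still perform exact inference by noting that, since E(Z(θ)) = π(θ) and Z(θ) = σ(Z(θ)) |Z(θ)| with σ(z) ∈ {−1, +1} the sign of z,

$$\frac{\int \varphi(\theta)\,\pi(\theta)\,d\theta}{\int \pi(\theta)\,d\theta} = \frac{\int \varphi(\theta)\,\mathbb{E}\left[\sigma(Z(\theta))\,|Z(\theta)|\right] d\theta}{\int \mathbb{E}\left[\sigma(Z(\theta))\,|Z(\theta)|\right] d\theta},$$

which suggests using the absolute values of Z(θ) in the MH acceptance ratio.
The integral is recovered using the importance sampling estimator:

$$\frac{\sum_{i=1}^{N} \sigma(Z(\theta^{(i)}))\,\varphi(\theta^{(i)})}{\sum_{i=1}^{N} \sigma(Z(\theta^{(i)}))}.$$

As an importance sampler, it deteriorates with the dimension.
Called the sign problem in lattice quantum chromodynamics.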
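The sign-corrected estimator is simple to compute once a chain has been run with |Z(θ)| in the acceptance ratio. The sketch below only covers that final step; the function name and the toy numbers are made up for illustration and do not come from an actual chain.

```python
import math

def sign_corrected_estimate(thetas, zs, phi):
    """Ratio of sign-weighted sums over the states of a chain.

    `thetas` are the chain states and `zs` the stored estimates Z(theta),
    which may be negative; the chain itself is assumed to have been
    generated using |Z| in the acceptance ratio.
    """
    signs = [math.copysign(1.0, z) for z in zs]  # +1 or -1 (zero counts as +1)
    numerator = sum(s * phi(t) for s, t in zip(signs, thetas))
    denominator = sum(signs)  # close to zero when signs cancel: the sign problem
    return numerator / denominator

# Toy values, not taken from a real run.
thetas = [0.0, 1.0, 2.0, 3.0]
zs = [1.2, -0.3, 0.7, 0.5]
value = sign_corrected_estimate(thetas, zs, phi=lambda t: t)  # (0 - 1 + 2 + 3) / (1 - 1 + 1 + 1) = 2.0
```

When positive and negative signs occur in nearly equal proportion, the denominator is a small difference of large counts and the estimator becomes very noisy.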

Avoiding the sign problem

Can we avoid the sign problem by directly designing non-negative unbiased estimators?

Given an unbiased estimator of λ > 0, can I generate a non-negative unbiased estimator of λ?

Let f be any function f : R → R+. Given an unbiased estimator of λ ∈ R, can I generate a non-negative unbiased estimator of f(λ)?


f -factory

Let X be a subset of R and f : conv(X ) → R+ a function.

Definition
An X-algorithm A is an f-factory if, given as inputs

any i.i.d. sequence X = (Xk)k≥1 with expectation λ ∈ R,
an auxiliary random variable U ∼ Uniform(0, 1) independent of (Xk)k≥1,

then Y = A(U, X) is a non-negative unbiased estimator of f(λ).


X -algorithm

Let X be a subset of R.

Definition
An X-algorithm A is a pair (T, φ) where

T = (Tn)n≥0 is a sequence of Tn : (0, 1) × X^n → {0, 1},
φ = (φn)n≥0 is a sequence of φn : (0, 1) × X^n → R+.

A ≡ (T, φ) takes u ∈ (0, 1) and x = (xi)i≥1 ∈ X^∞ as inputs and produces as output

exit time: τ = τ(u, x) = inf{n ≥ 0 : Tn(u, x1, …, xn) = 1}
exit value: A(u, x) = φτ(u, x1, …, xτ)

Set A(u, x) = ∞ if Tn never gives 1.
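The definition reads naturally as a loop: look at the inputs seen so far, decide whether to stop, and if so output a non-negative value. A minimal sketch follows; the names `run_x_algorithm`, `stop_after_one` and `first_input` are hypothetical, and the `max_steps` cap is a practical truncation added here, not part of the definition.

```python
import itertools
import math

def run_x_algorithm(T, phi, u, xs, max_steps=10_000):
    """Run the pair (T, phi) on u and the input stream xs.

    T(n, u, seen) returns 0 or 1; phi(n, u, seen) returns a non-negative
    number. Returns (exit time, exit value), or (inf, inf) if T has not
    returned 1 within max_steps inputs.
    """
    seen = []
    stream = iter(xs)
    for n in range(max_steps + 1):
        if T(n, u, seen) == 1:       # T_n only looks at x_1, ..., x_n
            return n, phi(n, u, seen)
        seen.append(next(stream))    # reveal x_{n+1}
    return math.inf, math.inf

# Hypothetical example: stop after one input and output it unchanged.
stop_after_one = lambda n, u, seen: 1 if n >= 1 else 0
first_input = lambda n, u, seen: seen[0]

tau, value = run_x_algorithm(stop_after_one, first_input, u=0.3, xs=[3.0, 5.0])
never = run_x_algorithm(lambda n, u, seen: 0, first_input, 0.3,
                        itertools.repeat(1.0), max_steps=50)
```

With inputs restricted to non-negative values, this trivial pair is an f-factory for f(λ) = λ; the non-existence results say that nothing of the kind survives once the inputs can be any real number.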


General non-existence of f -factories

Theorem
For any non-constant function f : R → R+, no f-factory exists.

Lemma
Given i.i.d. copies of an unbiased estimator of λ > 0 and a uniform random variable U, there is no algorithm producing a non-negative unbiased estimator of λ.

The lemma is not directly implied by the theorem but the proof is very similar.

For the sake of contradiction, introduce
a non-constant function f : R → R+, and λ1, λ2 ∈ R with f(λ1) > f(λ2),
an f-factory (T, φ).

Consider an i.i.d sequence X = (Xn)n≥1 with expectation λ1.Then

A(U , X) = φτX (U , X1, . . . , XτX )

has expectation f (λ1), and

τX = inf{n : Tn(U , X1, . . . , Xn) = 1}

is almost surely finite.

An f-factory should work for any input sequence.
Introduce Bernoulli variables (Bn)n≥1, with P(Bn = 0) = ε and

$$Y_n = B_n X_n + \frac{\lambda_2 - \lambda_1 (1 - \varepsilon)}{\varepsilon}\,(1 - B_n)$$

so that E(Yn) = λ2. Then

A(U , Y ) = φτY (U , Y1, . . . , YτY )

has expectation f (λ2) < f (λ1), and

τY = inf{n : Tn(U , Y1, . . . , Yn) = 1}

is almost surely finite.
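The claim E(Yn) = λ2 used in this construction can be checked in one line, assuming (as the construction implicitly requires) that Bn is independent of Xn:

```latex
\mathbb{E}(Y_n)
= \mathbb{P}(B_n = 1)\,\mathbb{E}(X_n)
  + \mathbb{P}(B_n = 0)\,\frac{\lambda_2 - \lambda_1 (1 - \varepsilon)}{\varepsilon}
= (1 - \varepsilon)\,\lambda_1 + \lambda_2 - \lambda_1 (1 - \varepsilon)
= \lambda_2 .
```

The constant taken by Yn when Bn = 0 grows like 1/ε in magnitude as ε → 0, which is why the argument needs inputs that are allowed to range over all of R.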

By construction we can tune the probability (1 − ε)^n of

Mn = {(Y1, …, Yn) = (X1, …, Xn)},

by changing ε. On the complementary events

{(Y1, …, Yn) ≠ (X1, …, Xn)}

the algorithm has to “compensate”, so that

f(λ2) = E[A(U, Y)] < E[A(U, X)] = f(λ1).

But the algorithm cannot output values lower than zero
⇒ for ε small enough it leads to a contradiction.
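The contradiction can be spelled out as follows. This is a sketch that assumes, beyond what the slides state, that (Bn) is independent of U and of (Xn); the event Mn is taken here to be {B1 = … = Bn = 1}, on which (Y1, …, Yn) = (X1, …, Xn) and which has probability exactly (1 − ε)^n.

```latex
% On M_n \cap \{\tau_X \le n\} the two input sequences agree up to time n,
% hence \tau_Y = \tau_X and A(U, Y) = A(U, X).
\begin{align*}
f(\lambda_2) = \mathbb{E}[A(U, Y)]
 &\ge \mathbb{E}\big[A(U, Y)\,\mathbf{1}_{M_n}\,\mathbf{1}\{\tau_X \le n\}\big]
   && \text{since } A \ge 0 \\
 &= \mathbb{E}\big[A(U, X)\,\mathbf{1}_{M_n}\,\mathbf{1}\{\tau_X \le n\}\big] \\
 &= (1 - \varepsilon)^n\, \mathbb{E}\big[A(U, X)\,\mathbf{1}\{\tau_X \le n\}\big]
   && \text{by independence.}
\end{align*}
% Since \tau_X < \infty a.s. and A \ge 0, monotone convergence gives
% \mathbb{E}[A(U, X)\,\mathbf{1}\{\tau_X \le n\}] \uparrow f(\lambda_1) as n \to \infty.
% Fix n with \mathbb{E}[A(U, X)\,\mathbf{1}\{\tau_X \le n\}] \ge (f(\lambda_1) + f(\lambda_2))/2,
% then let \varepsilon \to 0:
\[
f(\lambda_2) \;\ge\; \lim_{\varepsilon \to 0} (1 - \varepsilon)^n\, \frac{f(\lambda_1) + f(\lambda_2)}{2}
= \frac{f(\lambda_1) + f(\lambda_2)}{2} \;>\; f(\lambda_2),
\]
% which is impossible.
```

The only property of the algorithm used is non-negativity of its output, in the first inequality.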

Other cases

By putting more restrictions on X we get different results.

The case where X ⊂ R+ and f is decreasing also leads to a non-existence result.

The case where X ⊂ R+ and f is increasing allows some constructions, for instance for real analytic functions.
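As one concrete instance of such a construction, f(λ) = exp(λ) has a power series with non-negative coefficients, so truncating the series gives ordered estimators when the inputs are non-negative. The sketch below is an illustration rather than the construction from the article; the geometric truncation with P(N ≥ n) = r^n and the name `exp_factory` are assumptions made here.

```python
import math
import random

def exp_factory(draw_x, r=0.5, rng=None):
    """Non-negative unbiased estimate of exp(lambda) from i.i.d. draws X_k >= 0.

    Uses S_n = sum_{k<=n} (X_1 ... X_k) / k!, non-decreasing in n, and a
    geometric truncation (assumption): P(N >= n) = r**n, i.e. w_n = r**(-n).
    """
    rng = rng or random.Random()
    total, product, n = 1.0, 1.0, 0   # n = 0 term: w_0 * S_0 = 1
    while rng.random() < r:           # move on to level n + 1 with probability r
        n += 1
        x = draw_x(rng)
        if x < 0:
            raise ValueError("inputs must be non-negative")
        product *= x / (n * r)        # now equals X_1 ... X_n / (n! * r**n)
        total += product              # adds w_n * (S_n - S_{n-1}) >= 0
    return total

# Hypothetical input: X ~ Uniform(0, 1), so lambda = 0.5 and f(lambda) = exp(0.5).
rng = random.Random(1)
draws = [exp_factory(lambda g: g.random(), rng=rng) for _ in range(20000)]
estimate = sum(draws) / len(draws)
```

Every term added is non-negative because the inputs are, which is the ordering condition P(Sn−1 ≤ Sn) = 1 in action; the same code would produce negative values if the inputs could be negative.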

Other cases

No full characterisation of increasing functions allowing f-factories for X ⊂ R+, yet.

The case where X = [a, b] is related to the Bernoulli factory.
Necessary and sufficient condition: f continuous and there exist n, m ∈ N and ε > 0 such that

$$\forall x \in [a, b] \quad f(x) \ge \varepsilon \min\left((x - a)^m, (b - x)^n\right).$$
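Two hypothetical functions on [a, b] = [0, 1] illustrate what the condition allows and excludes; they are chosen here for illustration and do not appear in the slides.

```latex
\begin{align*}
&f(x) = x(1 - x): \quad
 \max(x, 1 - x) \ge \tfrac12 \ \Rightarrow\
 f(x) \ge \tfrac12 \min(x,\, 1 - x),
 \quad \text{so } m = n = 1,\ \varepsilon = \tfrac12 \text{ works.} \\
&f(x) = e^{-1/x} \ (x > 0),\ f(0) = 0: \quad
 \frac{f(x)}{x^{m}} \xrightarrow[x \to 0^+]{} 0 \ \text{ for every } m,
 \quad \text{so no } (m, n, \varepsilon) \text{ works.}
\end{align*}
```

Both functions are continuous and non-negative; the second fails only because it vanishes at the endpoint faster than any polynomial.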

Case X = [a, b]


$$\forall x \in [a, b] \quad f(x) \ge \varepsilon \min\left((x - a)^m, (b - x)^n\right).$$

Introduce

$$g : x \mapsto f(x) / \{(x - a)^m (b - x)^n\}$$

bounded away from zero.
Hence g can be approximated from below by polynomials.
Introduce

$$P_1(x) = \sum_{i,j} \alpha^{(1)}_{i,j}\, (x - a)^i (b - x)^j$$

with non-negative coefficients, and such that P1(x) ≤ g(x).
Then approximate g − P1 from below by P2, g − P1 − P2 by P3, etc.


Case X = [a, b]

We obtain a sum of polynomials

$$\sum_{k=0}^{n} P_k(x)$$

converging to g(x) when n → ∞.
We multiply by (x − a)^m (b − x)^n to estimate f(x) instead.
This leads to a sequence of estimators

$$S_n = \sum_{k=0}^{n} \sum_{i,j} \alpha^{(k)}_{i,j} \left\{ \prod_{p=1}^{i} (X_p - a) \right\} \left\{ \prod_{q=1}^{j} (b - X_{i+q}) \right\}$$

for which P(Sn−1 ≤ Sn) = 1, yielding a non-negative unbiased estimator of f(λ).

Back to statistics

No f-factory for X = R and any non-constant f.
⇒ without lower bounds on the log-likelihood estimator, no non-negative unbiased likelihood estimators.

No f-factory for decreasing functions f and X = R+.
⇒ without lower and upper bounds on the estimator of C(θ), no non-negative unbiased estimators of 1/C(θ).

For the reference prior, it seems hopeless unless X = [a, b].


No answer for the case f “slowly” increasing and X ⊂ R+.

We only considered the transformation of an unbiased estimator of λ to an unbiased estimator of f(λ).

Should we tolerate negative values and come up with appropriate methodology?

Should we aim for exact inference?


