
Prior-Posterior Analysis and Conjugacy

Econ 690

Purdue University

Justin L. Tobias (Purdue)

Outline

1 Review

2 Conjugate Bernoulli Trials
   Examples and Prior Sensitivity
   Marginal likelihoods

3 Conjugate Exponential Analysis

4 Conjugate Poisson Analysis


Review of Basic Framework

Quantities to become known under sampling are denoted by the T-dimensional vector y.

The remaining unknown quantities are denoted by the k-dimensional vector θ ∈ Θ ⊆ R^k.

Standard manipulations show:

p(y, θ) = p(θ)p(y|θ) = p(y)p(θ|y),

where p(θ) is the prior density, p(θ|y) is the posterior density, and p(y|θ) is the likelihood function.

We also note that

p(y) = ∫_Θ p(θ)L(θ) dθ

is the marginal density of the observed data, also known as the marginal likelihood.


Bayes’ Theorem

Bayes’ theorem for densities follows immediately:

p(θ|y) = p(θ)L(θ)/p(y) ∝ p(θ)L(θ).

The shape of the posterior can be learned by plotting the right-hand side of this expression when k = 1 or k = 2.

Obtaining moments or quantiles, however, requires the integrating constant, i.e., the marginal likelihood p(y).

In most situations, the required integration cannot be performed analytically.

In simple examples, however, this integration can be carried out. Many of these cases arise in conjugate situations. By “conjugacy,” we mean that the functional forms of the prior and posterior are the same.
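For k = 1, this can be sketched numerically: evaluate the unnormalized posterior p(θ)L(θ) on a grid and approximate the integrating constant by quadrature. The prior, likelihood, and data in the following Python sketch are illustrative stand-ins, not taken from these notes.

```python
# A minimal numerical sketch (illustrative, not from the notes): evaluate the
# unnormalized posterior p(theta)*L(theta) on a grid, approximate the integrating
# constant p(y), and recover posterior moments by quadrature.
import numpy as np

def prior(theta):
    # Illustrative Beta(2, 2) prior on (0, 1)
    return 6.0 * theta * (1.0 - theta)

def likelihood(theta, y):
    # Bernoulli likelihood for a vector of 0/1 outcomes y
    m, T = y.sum(), y.size
    return theta**m * (1.0 - theta)**(T - m)

y = np.array([1, 0, 0, 1, 0, 1, 0, 0])            # illustrative data
grid = np.linspace(1e-6, 1 - 1e-6, 2001)
dtheta = grid[1] - grid[0]

kernel = prior(grid) * likelihood(grid, y)        # shape of the posterior
p_y = kernel.sum() * dtheta                       # numerical integrating constant
posterior = kernel / p_y                          # normalized posterior density
post_mean = (grid * posterior).sum() * dtheta     # posterior mean by quadrature
print(p_y, post_mean)
```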


Conjugate Bernoulli Trials

Given a parameter θ where 0 < θ < 1, consider T iid Bernoulli random variables Yt (t = 1, 2, . . . , T), each with probability mass function (p.m.f.):

p(yt|θ) = θ if yt = 1 and 1 − θ if yt = 0; that is, p(yt|θ) = θ^(yt) (1 − θ)^(1−yt).

The likelihood function associated with these data is

L(θ) = ∏_{t=1}^{T} θ^(yt) (1 − θ)^(1−yt) = θ^m (1 − θ)^(T−m),

where m = T ȳ is the number of successes (i.e., yt = 1) in T trials.


Conjugate Bernoulli Trials

Suppose prior beliefs concerning θ are represented by a Beta distribution with p.d.f.

p(θ) = [B(α, δ)]^(−1) θ^(α−1) (1 − θ)^(δ−1), 0 < θ < 1,

where α > 0 and δ > 0 are known, and B(α, δ) = Γ(α)Γ(δ)/Γ(α + δ) is the Beta function, defined in terms of the Gamma function Γ(α) = ∫_0^∞ t^(α−1) exp(−t) dt.


Conjugate Bernoulli Trials

Note that the Beta is a reasonable choice of prior, since it incorporates the necessary constraint that θ ∈ (0, 1).

Also note that α and δ are chosen by you!

Some guidance in this regard can be obtained by noting:

E(θ) = α/(α + δ) and Var(θ) = αδ/[(α + δ)^2 (α + δ + 1)].


Conjugate Bernoulli Trials

By Bayes’ Theorem: p(θ|y) ∝ p(θ)p(y|θ).

Putting the previous parts together, we obtain

p(θ|y) ∝ θ^(α+m−1) (1 − θ)^(δ+T−m−1) = θ^(ᾱ−1) (1 − θ)^(δ̄−1),

where

ᾱ ≡ α + m = α + T ȳ and δ̄ ≡ δ + T − m = δ + T(1 − ȳ).

Thus, the posterior distribution for θ is also of the Beta form, θ|y ∼ B(ᾱ, δ̄), so that the Beta density is a conjugate prior for the Bernoulli sampling model.
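A minimal Python sketch of this conjugate update (the data vector and the hyperparameters α, δ below are illustrative choices, not from the slides):

```python
# A sketch of the Beta-Bernoulli conjugate update; the data and the hyperparameters
# (alpha, delta) below are illustrative choices.
import numpy as np
from scipy import stats

y = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 1])      # T Bernoulli outcomes
alpha, delta = 2.0, 2.0                           # Beta prior hyperparameters

T, m = y.size, int(y.sum())                       # trials and successes
alpha_bar, delta_bar = alpha + m, delta + (T - m) # posterior parameters

posterior = stats.beta(alpha_bar, delta_bar)      # theta | y ~ Beta(alpha_bar, delta_bar)
print(posterior.mean())                           # (alpha + m) / (alpha + delta + T)
print(posterior.interval(0.95))                   # equal-tailed 95% credible interval
```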


Conjugate Bernoulli Trials

From our handout on “special” distributions, we know that

E(θ|y) = ᾱ/(ᾱ + δ̄) = (α + T ȳ)/(α + δ + T).

Similarly, the prior mean is

E(θ) ≡ µ = α/(α + δ).

Expanding the posterior mean a bit further, we find:


Conjugate Bernoulli Trials

E(θ|y) = wT ȳT + (1 − wT)µ,

a weighted average of the sample mean ȳT and the prior mean µ. What happens as T → ∞? Note that

wT = T/(α + δ + T),

and thus, as T → ∞, wT → 1, so the posterior mean E(θ|y) approaches the sample mean ȳT. This is sensible, and illustrates that, in large samples, information from the data dominates information in the prior (provided the prior is not dogmatic).
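A small numerical illustration of this shrinkage (the prior and the fixed sample mean are illustrative values):

```python
# How the weight w_T and the posterior mean behave as T grows, holding the sample
# mean fixed; the prior and ybar below are illustrative.
alpha, delta = 2.0, 2.0              # illustrative Beta prior; prior mean mu = 0.5
mu = alpha / (alpha + delta)
ybar = 0.25                          # hypothetical sample mean, held fixed

for T in (10, 100, 1000, 10000):
    w_T = T / (alpha + delta + T)
    post_mean = w_T * ybar + (1 - w_T) * mu
    print(f"T={T:6d}  w_T={w_T:.4f}  E(theta|y)={post_mean:.4f}")
# As T grows, w_T -> 1 and the posterior mean approaches ybar.
```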


Conjugate Bernoulli Analysis: Example

Consider the 2011 record for the Purdue football team (T = 12, ȳ = .5):

y = [1 0 1 0 1 0 1 0 0 1 0 1]′.

As a “neutral” fan, before the season started, you had little prior information about Purdue’s success probability θ. You summarized this lack of information by choosing

α = δ = 1,

and thus

p(θ) = I(0 < θ < 1),

i.e., a uniform prior over the unit interval.
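A sketch of how the prior and posterior displayed on the next two slides could be computed and plotted (the uniform prior is Beta(1, 1), so the posterior is Beta(7, 7)):

```python
# Purdue 2011 example: a uniform Beta(1, 1) prior and 6 wins in 12 games,
# so theta | y ~ Beta(7, 7).  Plotting choices are illustrative.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

y = np.array([1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1])
alpha, delta = 1.0, 1.0
T, m = y.size, int(y.sum())

prior = stats.beta(alpha, delta)
posterior = stats.beta(alpha + m, delta + T - m)

grid = np.linspace(0.001, 0.999, 500)
plt.plot(grid, prior.pdf(grid), label="Prior")
plt.plot(grid, posterior.pdf(grid), label="Posterior")
plt.xlabel(r"$\theta$")
plt.ylabel("Density")
plt.legend()
plt.show()
```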


Conjugate Bernoulli Analysis: Example

Your prior over θ can be graphed as follows:

[Figure: the uniform prior density for θ on (0, 1).]


Conjugate Bernoulli Analysis: Example

Your posterior beliefs, after observing all 12 games, are as follows:

[Figure: prior and posterior densities of θ after observing all 12 games.]


Conjugate Bernoulli Analysis: Example

Now suppose, instead of having “no” prior information, you expected that Purdue would win 80 percent of its games this season.

You incorporate these beliefs by choosing the prior hyperparameters α and δ accordingly.

Note that this implies E(θ) = α/(α + δ) = .8.

The prior and posterior under this scenario are as follows:


Conjugate Bernoulli Analysis, Example

[Figure: prior and posterior densities of θ under the optimistic prior.]
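A sketch of how such a comparison could be produced; the hyperparameters α = 8, δ = 2 are a hypothetical choice with prior mean .8, not necessarily the values used above:

```python
# An assumed "optimistic" prior with mean .8 (alpha = 8, delta = 2 is a hypothetical
# choice, not necessarily the one used in the slides), combined with the 2011 record.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

T, m = 12, 6                        # 6 wins in 12 games
alpha, delta = 8.0, 2.0             # hypothetical hyperparameters with E(theta) = .8

prior = stats.beta(alpha, delta)
posterior = stats.beta(alpha + m, delta + T - m)
print(posterior.mean())             # lies between ybar = .5 and the prior mean .8

grid = np.linspace(0.001, 0.999, 500)
plt.plot(grid, prior.pdf(grid), label="Prior")
plt.plot(grid, posterior.pdf(grid), label="Posterior")
plt.xlabel(r"$\theta$")
plt.ylabel("Density")
plt.legend()
plt.show()
```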


Conjugate Bernoulli Analysis, Example

To illustrate the impact of the sample size on the posterior, let us conduct an experiment.

Using θ = .25 as the “true” probability of the data generating process, let’s generate y vectors of length N = 25, 100, 1,000, where yi = 1 with probability .25 and 0 otherwise, for all i.

Keep the same “optimistic” prior.

Examine how the posterior changes as the sample size increases.
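A sketch of this experiment (the random seed and the assumed optimistic hyperparameters α = 8, δ = 2 are illustrative):

```python
# Prior-sensitivity experiment: simulate Bernoulli(.25) data of increasing size and
# overlay the resulting posteriors on an assumed "optimistic" Beta(8, 2) prior.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(0)
alpha, delta = 8.0, 2.0                      # assumed optimistic hyperparameters
theta_true = 0.25
grid = np.linspace(0.001, 0.999, 500)

plt.plot(grid, stats.beta(alpha, delta).pdf(grid), "r", label="Prior")
for N in (25, 100, 1000):
    y = rng.binomial(1, theta_true, size=N)
    m = y.sum()
    posterior = stats.beta(alpha + m, delta + N - m)
    plt.plot(grid, posterior.pdf(grid), label=f"Posterior, N={N}")
# As N grows, the posterior concentrates around theta_true = .25 despite the prior.
plt.xlabel(r"$\theta$")
plt.ylabel("Density")
plt.legend()
plt.show()
```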


Conjugate Bernoulli Analysis, Example: N = 25

[Figure: prior (red) and posterior (blue) densities of θ, N = 25.]


Conjugate Bernoulli Analysis, Example: N = 100

[Figure: prior (red) and posterior (blue) densities of θ, N = 100.]


Conjugate Bernoulli Analysis, Example: N = 1, 000

[Figure: prior (red) and posterior (blue) densities of θ, N = 1,000.]


Conjugate Bernoulli Analysis: Marginal Likelihood

Consider, for this problem, determining the marginal likelihood p(y):

p(y) = ∫_Θ p(θ)p(y|θ) dθ.

Here the integration is reasonably straightforward:

p(y) = ∫_0^1 [B(α, δ)]^(−1) θ^(α−1) (1 − θ)^(δ−1) θ^m (1 − θ)^(T−m) dθ
     = [B(ᾱ, δ̄)/B(α, δ)] ∫_0^1 [B(ᾱ, δ̄)]^(−1) θ^(ᾱ−1) (1 − θ)^(δ̄−1) dθ
     = B(ᾱ, δ̄)/B(α, δ),

where the last integral equals unity because the integrand is a Beta p.d.f. for θ.
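A sketch of this calculation in code, comparing the log marginal likelihood of the 2011 record under the uniform prior and under a hypothetical optimistic prior:

```python
# Log marginal likelihood for the Beta-Bernoulli model:
# log p(y) = log B(alpha + m, delta + T - m) - log B(alpha, delta).
import numpy as np
from scipy.special import betaln

y = np.array([1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1])   # the 2011 record
T, m = y.size, int(y.sum())

def log_marglik(alpha, delta):
    return betaln(alpha + m, delta + T - m) - betaln(alpha, delta)

print(log_marglik(1.0, 1.0))   # uniform prior
print(log_marglik(8.0, 2.0))   # a hypothetical "optimistic" prior
```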


Conjugate Exponential Analysis

Suppose Yt (t = 1, 2, . . . , T) is a random sample from an Exponential distribution, fEXP(yt|θ) = θ exp(−θ yt), which has mean θ^(−1).

In addition, suppose that the prior distribution of θ is the Gamma distribution G(α, β), where α > 0 and β > 0:

p(θ) ∝ θ^(α−1) exp(−θ/β).

What is the posterior distribution of θ?


Conjugate Exponential Analysis

The likelihood function is

L(θ) = ∏_{t=1}^{T} θ exp(−θ yt) = θ^T exp(−θ T ȳ).

Define ᾱ = α + T and β̄ = (β^(−1) + T ȳ)^(−1). Using Bayes’ Theorem, the posterior density is

p(θ|y) ∝ θ^(α−1) exp(−θ/β) θ^T exp(−θ T ȳ) = θ^(ᾱ−1) exp(−θ/β̄).

Therefore, θ|y ∼ G(ᾱ, β̄). Thus the Gamma prior is a conjugate prior for the exponential sampling model.
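A sketch of this update in code; scipy’s Gamma distribution uses the same shape/scale parameterization as G(α, β) above, and the data and prior are illustrative:

```python
# Gamma-Exponential conjugate update:
# theta | y ~ G(alpha + T, (1/beta + T*ybar)^(-1)).  Data and prior are illustrative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
theta_true = 0.1
y = rng.exponential(scale=1.0 / theta_true, size=50)   # exponential data, mean 1/theta

alpha, beta = 2.0, 1.0                                  # illustrative Gamma prior G(alpha, beta)
T, ybar = y.size, y.mean()

alpha_bar = alpha + T
beta_bar = 1.0 / (1.0 / beta + T * ybar)

posterior = stats.gamma(alpha_bar, scale=beta_bar)      # theta | y ~ G(alpha_bar, beta_bar)
print(posterior.mean(), 1.0 / ybar)                     # posterior mean vs. the MLE 1/ybar
```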


Conjugate Exponential Analysis

Using properties of the Gamma distribution, we know:

E(θ|y) = ᾱ β̄ = (α + T)/(β^(−1) + T ȳ) and Var(θ|y) = ᾱ β̄^2.

Note, in this parameterization of the exponential,

E(y|θ) = 1/θ,

and the MLE is

θ̂MLE = 1/ȳT,

so that, as T grows, the posterior mean approaches the MLE.


Conjugate Exponential Analysis: Example

Assume that the duration of the life of a lightbulb is described by an exponential density,

p(yi|θ) = θ^(−1) exp(−θ^(−1) yi).

We parameterize the exponential in this way to work in terms of the mean of y.

You obtain data on 10 continuously running light bulbs and find that they last 25, 20, 40, 75, 15, 30, 30, 10, 20 and 40 days, respectively. Using an inverse Gamma prior for θ of the form

p(θ) ∝ θ^(−(α+1)) exp(−θ^(−1) β^(−1)),

derive the posterior distribution of θ and plot it alongside the prior.


Conjugate Exponential Analysis: Example

Note that the likelihood function is

p(y|θ) = ∏_{i=1}^{T} θ^(−1) exp(−θ^(−1) yi) = θ^(−T) exp(−θ^(−1) T ȳ).

Combining this with our prior, we obtain

p(θ|y) ∝ θ^(−(α+T+1)) exp(−θ^(−1)[T ȳ + β^(−1)]).

This is in the form of an inverse Gamma density, IG(ᾱ, β̄), with ᾱ = α + T and β̄ = (β^(−1) + T ȳ)^(−1).
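A sketch of this calculation for the lightbulb data, using the hyperparameters from the next slide (α = 3, β = 1/40):

```python
# Inverse-Gamma posterior for the lightbulb example.  scipy's invgamma(a, scale=s)
# has density proportional to theta^(-(a+1)) * exp(-s/theta), so the prior above
# corresponds to shape alpha and scale 1/beta.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

y = np.array([25, 20, 40, 75, 15, 30, 30, 10, 20, 40], dtype=float)
alpha, beta = 3.0, 1.0 / 40.0          # the hyperparameters used on the next slide
T, ybar = y.size, y.mean()

prior = stats.invgamma(alpha, scale=1.0 / beta)
posterior = stats.invgamma(alpha + T, scale=1.0 / beta + T * ybar)
print(posterior.mean())                # (1/beta + T*ybar) / (alpha + T - 1)

grid = np.linspace(0.1, 80, 500)
plt.plot(grid, prior.pdf(grid), label="Prior")
plt.plot(grid, posterior.pdf(grid), label="Posterior")
plt.xlabel(r"$\theta$")
plt.ylabel("Density")
plt.legend()
plt.show()
```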


Conjugate Exponential Analysis, Example: α = 3, β = 1/40.

[Figure: prior and posterior densities of θ for the lightbulb example.]


Conjugate Exponential Analysis: Example

In this example, our choice of prior hyperparameters produced a prior that had a mean and standard deviation equal to 20. To see this, note (from the distributional catalog notes):

E(θ) = [β(α − 1)]^(−1) = [(1/40)(2)]^(−1) = 20

and

Std(θ) = E(θ)/√(α − 2) = 20/√1 = 20.
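A quick numerical check of these values (using the invgamma shape/scale mapping noted earlier):

```python
# Quick check: the prior above corresponds to scipy's invgamma with shape 3, scale 40.
from scipy import stats

prior = stats.invgamma(3.0, scale=40.0)
print(prior.mean(), prior.std())   # both equal 20
```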


Conjugate Exponential Analysis: Example

Think about what the output represents and what kinds of questions you can answer:

What is the (posterior) probability that a light bulb has an average lifespan of more than 30 days?
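One way to answer this with the posterior derived above (shape 13, scale 345 for these data) is sketched below:

```python
# Posterior probability that the mean life span theta exceeds 30 days, using the
# inverse-Gamma posterior derived above (shape alpha + T = 13, scale 1/beta + T*ybar = 345).
from scipy import stats

posterior = stats.invgamma(13.0, scale=345.0)
print(posterior.sf(30.0))          # Pr(theta > 30 | y)
```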


Conjugate Exponential Analysis: Example

Suppose I intend to purchase a light bulb tomorrow. Based on the data that I have observed (as well as my own prior beliefs), what is the probability that the light bulb I purchase will last at least 30 days?

Let yf denote the future, as yet unobserved duration of our light bulb. We would first seek to recover

p(yf|y) = ∫_Θ p(yf|θ) p(θ|y) dθ,

the posterior predictive density. We can do this (see future notes on prediction); integrating θ out of the exponential sampling density against the inverse Gamma posterior yields a density of the Pareto (Lomax) form,

p(yf|y) = ᾱ β̄^(−ᾱ) (yf + β̄^(−1))^(−(ᾱ+1)), yf > 0.

(Note that the posterior predictive density and the θ posterior distribution are not the same thing!)
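A Monte Carlo sketch of the corresponding predictive probability Pr(yf ≥ 30 | y): draw θ from its posterior, then yf from the exponential sampling density, and average.

```python
# Monte Carlo check of Pr(y_f >= 30 | y): draw theta from its inverse-Gamma posterior,
# then a future duration y_f from the exponential sampling density, and average.
import numpy as np

rng = np.random.default_rng(2)
alpha_bar, scale = 13.0, 345.0                        # posterior: IG(shape 13, scale 345)

# If theta ~ IG(a, s), then 1/theta ~ Gamma(a, scale = 1/s).
theta_draws = 1.0 / rng.gamma(alpha_bar, 1.0 / scale, size=200_000)
yf_draws = rng.exponential(scale=theta_draws)         # one future duration per theta draw
print((yf_draws >= 30.0).mean())                      # approximate predictive probability
```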


Conjugate Poisson Analysis

Suppose Yt (t = 1, 2, . . . , T) is a random sample from a Poisson distribution with mean θ, i.e.,

p(yt|θ) = θ^(yt) exp(−θ)/yt!, yt = 0, 1, 2, . . . ,

and that the prior distribution of θ is the Gamma distribution G(α, β):

p(θ) ∝ θ^(α−1) exp(−θ/β).

Find the posterior distribution of θ.


Conjugate Poisson Analysis

The likelihood function is

L(θ) = ∏_{t=1}^{T} θ^(yt) exp(−θ)/yt! ∝ θ^(T ȳ) exp(−Tθ).

Define ᾱ = α + T ȳ and β̄ = (β^(−1) + T)^(−1). Using Bayes’ Theorem, the posterior density is proportional to:

p(θ|y) ∝ θ^(α−1) exp(−θ/β) θ^(T ȳ) exp(−Tθ) = θ^(ᾱ−1) exp(−θ/β̄).

Therefore, θ|y ∼ G(ᾱ, β̄). Thus, the Gamma prior is a conjugate prior for the Poisson sampling model.
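A sketch of this update in code, with simulated Poisson counts and an illustrative Gamma prior:

```python
# Gamma-Poisson conjugate update:
# theta | y ~ G(alpha + T*ybar, (1/beta + T)^(-1)).  Data and prior are illustrative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
y = rng.poisson(lam=4.0, size=60)        # simulated counts with true mean 4

alpha, beta = 2.0, 2.0                   # illustrative Gamma prior G(alpha, beta)
T, ybar = y.size, y.mean()

posterior = stats.gamma(alpha + T * ybar, scale=1.0 / (1.0 / beta + T))
print(posterior.mean(), ybar)            # posterior mean is close to ybar for large T
print(posterior.var())                   # shrinks toward zero as T grows
```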


Conjugate Poisson Analysis

As before, note

E(θ|y) = ᾱ β̄ = (α + T ȳ)/(β^(−1) + T).

Therefore, the posterior mean converges to ȳT as T → ∞. Similarly,

Var(θ|y) = ᾱ β̄^2 → 0 as T → ∞.
