# Conjugate families


Contents

- 6 Conjugate families
  - 6.1 Binomial - beta prior
    - 6.1.1 Uninformative priors
  - 6.2 Gaussian (unknown mean, known variance)
    - 6.2.1 Uninformative prior
  - 6.3 Gaussian (known mean, unknown variance)
    - 6.3.1 Uninformative prior
  - 6.4 Gaussian (unknown mean, unknown variance)
    - 6.4.1 Completing the square
    - 6.4.2 Marginal posterior distributions
    - 6.4.3 Uninformative priors
  - 6.5 Multivariate Gaussian (unknown mean, known variance)
    - 6.5.1 Completing the square
    - 6.5.2 Uninformative priors
  - 6.6 Multivariate Gaussian (unknown mean, unknown variance)
    - 6.6.1 Completing the square
    - 6.6.2 Inverted-Wishart kernel
    - 6.6.3 Marginal posterior distributions
    - 6.6.4 Uninformative priors
  - 6.7 Bayesian linear regression
    - 6.7.1 Known variance
    - 6.7.2 Unknown variance
    - 6.7.3 Uninformative priors
  - 6.8 Bayesian linear regression with general error structure
    - 6.8.1 Known variance
    - 6.8.2 Unknown variance
    - 6.8.3 (Nearly) uninformative priors
  - 6.9 Appendix: summary of conjugacy


## 6 Conjugate families

Conjugate families arise when the likelihood times the prior produces a recognizable posterior kernel,

$$p\left(\theta \mid y\right) \propto \ell\left(\theta \mid y\right) p\left(\theta\right)$$

where the kernel is the characteristic part of the distribution function that depends on the random variable(s), i.e., the part excluding any normalizing constants. For example, the density function for a univariate Gaussian or normal is

$$\frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{1}{2\sigma^2}\left(x-\mu\right)^2\right]$$

and its kernel (for $\sigma$ known) is

$$\exp\left[-\frac{1}{2\sigma^2}\left(x-\mu\right)^2\right]$$

since $\frac{1}{\sqrt{2\pi}\,\sigma}$ is a normalizing constant. Now, we discuss a few common conjugate family results¹ and uninformative prior results to connect with classical results.

¹ A more complete set of conjugate families is summarized in chapter 7 of *Accounting and Causal Effects: Econometric Challenges* as well as tabulated in an appendix at the end of the chapter.


### 6.1 Binomial - beta prior

A binomial likelihood with unknown success probability, $\theta$,

$$\ell\left(\theta \mid s; n\right) = \binom{n}{s} \theta^{s} \left(1-\theta\right)^{n-s}, \qquad s = \sum_{i=1}^{n} y_i, \quad y_i \in \{0, 1\}$$

combines with a beta$\left(\theta; a, b\right)$ prior (i.e., with parameters $a$ and $b$)

$$p\left(\theta\right) = \frac{\Gamma\left(a+b\right)}{\Gamma\left(a\right)\Gamma\left(b\right)}\, \theta^{a-1} \left(1-\theta\right)^{b-1}$$

to yield

$$p\left(\theta \mid y\right) \propto \theta^{s} \left(1-\theta\right)^{n-s} \theta^{a-1} \left(1-\theta\right)^{b-1} \propto \theta^{s+a-1} \left(1-\theta\right)^{n-s+b-1}$$

which is the kernel of a beta distribution with parameters $\left(a+s\right)$ and $\left(b+n-s\right)$, beta$\left(\theta \mid y; a+s, b+n-s\right)$.

#### 6.1.1 Uninformative priors

Suppose priors for $\theta$ are uniform over the interval zero to one or, equivalently, beta$\left(1, 1\right)$.² Then, the likelihood determines the posterior distribution for $\theta$:

$$p\left(\theta \mid y\right) \propto \theta^{s} \left(1-\theta\right)^{n-s}$$

which is beta$\left(\theta \mid y; 1+s, 1+n-s\right)$.
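As a quick numerical sketch of the update rule above (the function name and the example counts are ours, not from the text), binomial-beta conjugacy reduces to simple parameter arithmetic:

```python
# Binomial-beta conjugate update: a beta(a, b) prior combined with
# s successes in n trials yields a beta(a + s, b + n - s) posterior.

def beta_binomial_update(a, b, s, n):
    """Return the posterior beta parameters (a + s, b + n - s)."""
    return a + s, b + (n - s)

# Informative prior beta(2, 3), s = 7 successes in n = 10 trials.
print(beta_binomial_update(2, 3, 7, 10))   # -> (9, 6)

# Uniform (uninformative) prior beta(1, 1) gives beta(1 + s, 1 + n - s).
print(beta_binomial_update(1, 1, 7, 10))   # -> (8, 4)
```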

### 6.2 Gaussian (unknown mean, known variance)

A single draw from a Gaussian likelihood with unknown mean, $\mu$, and known standard deviation, $\sigma$,

$$\ell\left(\mu \mid y, \sigma\right) \propto \exp\left[-\frac{1}{2} \frac{\left(y-\mu\right)^2}{\sigma^2}\right]$$

combines with a Gaussian or normal prior for $\mu$ given $\sigma^2$ with prior mean $\mu_0$ and prior variance $\sigma_0^2$

$$p\left(\mu \mid \sigma^2; \mu_0, \sigma_0^2\right) \propto \exp\left[-\frac{1}{2} \frac{\left(\mu-\mu_0\right)^2}{\sigma_0^2}\right]$$

² Some would utilize Jeffreys' prior, $p\left(\theta\right) \propto$ beta$\left(\theta; \frac{1}{2}, \frac{1}{2}\right)$, which is invariant to transformation, as the uninformative prior.


or, writing $\sigma_0^2 \equiv \sigma^2/\kappa_0$, we have

$$p\left(\mu \mid \sigma^2; \mu_0, \sigma^2/\kappa_0\right) \propto \exp\left[-\frac{1}{2} \frac{\kappa_0 \left(\mu-\mu_0\right)^2}{\sigma^2}\right]$$

to yield

$$p\left(\mu \mid y, \sigma, \mu_0, \sigma^2/\kappa_0\right) \propto \exp\left[-\frac{1}{2} \left\{\frac{\left(y-\mu\right)^2}{\sigma^2} + \frac{\kappa_0 \left(\mu-\mu_0\right)^2}{\sigma^2}\right\}\right]$$

Expansion and rearrangement gives

$$p\left(\mu \mid y, \sigma, \mu_0, \sigma^2/\kappa_0\right) \propto \exp\left[-\frac{1}{2\sigma^2} \left\{y^2 + \kappa_0 \mu_0^2 - 2\mu\left(\kappa_0 \mu_0 + y\right) + \left(\kappa_0 + 1\right)\mu^2\right\}\right]$$

Any terms not involving $\mu$ are constants and can be discarded as they are absorbed on normalization of the posterior:

$$p\left(\mu \mid y, \sigma, \mu_0, \sigma^2/\kappa_0\right) \propto \exp\left[-\frac{1}{2\sigma^2} \left\{\left(\kappa_0 + 1\right)\mu^2 - 2\mu\left(\kappa_0 \mu_0 + y\right)\right\}\right]$$

Completing the square (add and subtract $\frac{\left(\kappa_0 \mu_0 + y\right)^2}{\kappa_0 + 1}$), dropping the term subtracted (as it is all constants), and factoring out $\left(\kappa_0 + 1\right)$ gives

$$p\left(\mu \mid y, \sigma, \mu_0, \sigma^2/\kappa_0\right) \propto \exp\left[-\frac{\kappa_0 + 1}{2\sigma^2} \left(\mu - \frac{\kappa_0 \mu_0 + y}{\kappa_0 + 1}\right)^2\right]$$

Finally, we have

$$p\left(\mu \mid y, \sigma, \mu_0, \sigma^2/\kappa_0\right) \propto \exp\left[-\frac{1}{2} \frac{\left(\mu - \mu_1\right)^2}{\sigma_1^2}\right]$$

where

$$\mu_1 = \frac{\kappa_0 \mu_0 + y}{\kappa_0 + 1} = \frac{\frac{1}{\sigma_0^2}\mu_0 + \frac{1}{\sigma^2}y}{\frac{1}{\sigma_0^2} + \frac{1}{\sigma^2}} \qquad \text{and} \qquad \sigma_1^2 = \frac{\sigma^2}{\kappa_0 + 1} = \frac{1}{\frac{1}{\sigma_0^2} + \frac{1}{\sigma^2}}$$

That is, the posterior distribution of the mean given the data and priors is Gaussian or normal. Notice the posterior mean, $\mu_1$, weights the data and prior beliefs by their relative precisions.

For a sample of $n$ exchangeable draws, the likelihood is

$$\ell\left(\mu \mid y, \sigma\right) \propto \prod_{i=1}^{n} \exp\left[-\frac{1}{2} \frac{\left(y_i-\mu\right)^2}{\sigma^2}\right]$$

which, combined with the above prior, yields

$$p\left(\mu \mid y, \sigma, \mu_0, \sigma^2/\kappa_0\right) \propto \exp\left[-\frac{1}{2} \frac{\left(\mu - \mu_n\right)^2}{\sigma_n^2}\right]$$


where

$$\mu_n = \frac{\kappa_0 \mu_0 + n\bar{y}}{\kappa_0 + n} = \frac{\frac{1}{\sigma_0^2}\mu_0 + \frac{n}{\sigma^2}\bar{y}}{\frac{1}{\sigma_0^2} + \frac{n}{\sigma^2}}, \qquad \sigma_n^2 = \frac{\sigma^2}{\kappa_0 + n} = \frac{1}{\frac{1}{\sigma_0^2} + \frac{n}{\sigma^2}}$$

and $\bar{y}$ is the sample mean. That is, the posterior distribution of the mean, $\mu$, given the data and priors is again Gaussian or normal, and the posterior mean, $\mu_n$, weights the data and priors by their relative precisions.
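The precision-weighted update for the posterior mean and variance can be sketched numerically (a minimal illustration; names and values are ours):

```python
# Posterior for a Gaussian mean with known variance: the prior
# N(mu0, sigma0_sq) and n draws with sample mean ybar combine into
# N(mu_n, sigma_n_sq), weighting prior and data by their precisions.

def normal_known_variance_update(mu0, sigma0_sq, ybar, sigma_sq, n):
    prior_precision = 1.0 / sigma0_sq          # 1 / sigma_0^2
    data_precision = n / sigma_sq              # n / sigma^2
    mu_n = (prior_precision * mu0 + data_precision * ybar) / (
        prior_precision + data_precision)
    sigma_n_sq = 1.0 / (prior_precision + data_precision)
    return mu_n, sigma_n_sq

# Illustrative values: diffuse prior centered at 0, ten draws averaging 2.
mu_n, var_n = normal_known_variance_update(
    mu0=0.0, sigma0_sq=4.0, ybar=2.0, sigma_sq=1.0, n=10)
print(mu_n, var_n)   # posterior mean is pulled nearly all the way to ybar
```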

#### 6.2.1 Uninformative prior

An uninformative prior for the mean, $\mu$, is the (improper) uniform, $p\left(\mu \mid \sigma^2\right) \propto 1$. Hence, the likelihood

$$\begin{aligned}
\ell\left(\mu \mid y, \sigma\right) &\propto \prod_{i=1}^{n} \exp\left[-\frac{1}{2} \frac{\left(y_i-\mu\right)^2}{\sigma^2}\right] \\
&\propto \exp\left[-\frac{1}{2\sigma^2} \left(\sum_{i=1}^{n} y_i^2 - 2n\mu\bar{y} + n\mu^2\right)\right] \\
&\propto \exp\left[-\frac{1}{2\sigma^2} \left(\sum_{i=1}^{n} y_i^2 - n\bar{y}^2 + n\left(\mu - \bar{y}\right)^2\right)\right] \\
&\propto \exp\left[-\frac{1}{2\sigma^2}\, n\left(\mu - \bar{y}\right)^2\right]
\end{aligned}$$

determines the posterior

$$p\left(\mu \mid \sigma^2, y\right) \propto \exp\left[-\frac{n}{2\sigma^2}\left(\mu - \bar{y}\right)^2\right]$$

which is the kernel for a Gaussian or $N\left(\mu \mid \sigma^2, y; \bar{y}, \frac{\sigma^2}{n}\right)$, the classical result.

### 6.3 Gaussian (known mean, unknown variance)

For a sample of $n$ exchangeable draws with known mean, $\mu$, and unknown variance, $\sigma^2$, the Gaussian or normal likelihood

$$\ell\left(\sigma^2 \mid y, \mu\right) \propto \prod_{i=1}^{n} \left(\sigma^2\right)^{-\frac{1}{2}} \exp\left[-\frac{1}{2\sigma^2}\left(y_i - \mu\right)^2\right]$$

combines with an inverted-gamma$\left(a, b\right)$ prior

$$p\left(\sigma^2; a, b\right) \propto \left(\sigma^2\right)^{-\left(a+1\right)} \exp\left[-\frac{b}{\sigma^2}\right]$$

to yield an inverted-gamma$\left(\frac{n+2a}{2}, b + \frac{1}{2}t\right)$ posterior distribution where

$$t = \sum_{i=1}^{n} \left(y_i - \mu\right)^2$$


Alternatively and conveniently (but equivalently), we could parameterize the prior as an inverted-chi square$\left(\nu_0, \sigma_0^2\right)$³

$$p\left(\sigma^2; \nu_0, \sigma_0^2\right) \propto \left(\sigma^2\right)^{-\left(\frac{\nu_0}{2}+1\right)} \exp\left[-\frac{\nu_0 \sigma_0^2}{2\sigma^2}\right]$$

and combine with the above likelihood to yield

$$p\left(\sigma^2 \mid y\right) \propto \left(\sigma^2\right)^{-\left(\frac{n+\nu_0}{2}+1\right)} \exp\left[-\frac{1}{2\sigma^2}\left(\nu_0 \sigma_0^2 + t\right)\right]$$

an inverted chi-square$\left(\nu_0 + n, \frac{\nu_0 \sigma_0^2 + t}{\nu_0 + n}\right)$.
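The inverted chi-square update can likewise be sketched (illustrative names and data; `t` is the known-mean sum of squares from above):

```python
# Inverted chi-square update for a Gaussian variance with known mean:
# prior inv-chi2(nu0, sigma0_sq) plus t = sum (y_i - mu)^2 yields
# posterior inv-chi2(nu0 + n, (nu0 * sigma0_sq + t) / (nu0 + n)).

def inv_chi_square_update(nu0, sigma0_sq, y, mu):
    n = len(y)
    t = sum((yi - mu) ** 2 for yi in y)
    nu_n = nu0 + n
    sigma_n_sq = (nu0 * sigma0_sq + t) / nu_n
    return nu_n, sigma_n_sq

# Illustrative data with known mean mu = 2.
print(inv_chi_square_update(nu0=2, sigma0_sq=1.0, y=[1.0, 3.0, 2.0], mu=2.0))
# -> (5, 0.8)
```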

#### 6.3.1 Uninformative prior

An uninformative prior for scale is

$$p\left(\sigma^2\right) \propto \frac{1}{\sigma^2}$$

Hence, the posterior distribution for scale is

$$p\left(\sigma^2 \mid y\right) \propto \left(\sigma^2\right)^{-\left(\frac{n}{2}+1\right)} \exp\left[-\frac{t}{2\sigma^2}\right]$$

which is the kernel of an inverted-chi square$\left(\sigma^2; n, \frac{t}{n}\right)$.

### 6.4 Gaussian (unknown mean, unknown variance)

For a sample of $n$ exchangeable draws, a normal likelihood with unknown mean, $\mu$, and unknown (but constant) variance, $\sigma^2$, is

$$\ell\left(\mu, \sigma^2 \mid y\right) \propto \prod_{i=1}^{n} \sigma^{-1} \exp\left[-\frac{1}{2} \frac{\left(y_i-\mu\right)^2}{\sigma^2}\right]$$

Expanding and rewriting the likelihood gives

$$\ell\left(\mu, \sigma^2 \mid y\right) \propto \sigma^{-n} \exp\left[-\sum_{i=1}^{n} \frac{1}{2} \frac{y_i^2 - 2\mu y_i + \mu^2}{\sigma^2}\right]$$

Since $\sum_{i=1}^{n} 2 y_i \bar{y} = 2n\bar{y}^2$, we write

$$\ell\left(\mu, \sigma^2 \mid y\right) \propto \left(\sigma^2\right)^{-\frac{n}{2}} \exp\left[-\frac{1}{2\sigma^2} \sum_{i=1}^{n} \left\{\left(y_i^2 - 2 y_i \bar{y} + \bar{y}^2\right) + \left(\bar{y}^2 - 2\bar{y}\mu + \mu^2\right)\right\}\right]$$

³ $\frac{\nu_0 \sigma_0^2}{X}$ is a scaled, inverted-chi square$\left(\nu_0, \sigma_0^2\right)$ with scale $\sigma_0^2$ where $X$ is a chi square$\left(\nu_0\right)$ random variable.


or

$$\ell\left(\mu, \sigma^2 \mid y\right) \propto \left(\sigma^2\right)^{-\frac{n}{2}} \exp\left[-\frac{1}{2\sigma^2} \sum_{i=1}^{n} \left\{\left(y_i - \bar{y}\right)^2 + \left(\bar{y} - \mu\right)^2\right\}\right]$$

which can be rewritten as

$$\ell\left(\mu, \sigma^2 \mid y\right) \propto \left(\sigma^2\right)^{-\frac{n}{2}} \exp\left[-\frac{1}{2\sigma^2} \left\{\left(n-1\right)s^2 + n\left(\bar{y} - \mu\right)^2\right\}\right]$$

where $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} \left(y_i - \bar{y}\right)^2$. The above likelihood combines with a

Gaussian or normal$\left(\mu \mid \sigma^2; \mu_0, \sigma^2/\kappa_0\right)$ times inverted-chi square$\left(\sigma^2; \nu_0, \sigma_0^2\right)$ prior⁴

$$\begin{aligned}
p\left(\mu \mid \sigma^2; \mu_0, \sigma^2/\kappa_0\right) p\left(\sigma^2; \nu_0, \sigma_0^2\right) &\propto \sigma^{-1} \exp\left[-\frac{\kappa_0\left(\mu - \mu_0\right)^2}{2\sigma^2}\right] \left(\sigma^2\right)^{-\left(\frac{\nu_0}{2}+1\right)} \exp\left[-\frac{\nu_0 \sigma_0^2}{2\sigma^2}\right] \\
&\propto \left(\sigma^2\right)^{-\left(\frac{\nu_0+3}{2}\right)} \exp\left[-\frac{\nu_0 \sigma_0^2 + \kappa_0\left(\mu - \mu_0\right)^2}{2\sigma^2}\right]
\end{aligned}$$

to yield a normal$\left(\mu \mid \sigma^2; \mu_n, \sigma_n^2/\kappa_n\right)$ times inverted-chi square$\left(\sigma^2; \nu_n, \sigma_n^2\right)$ joint posterior distribution⁵ where

$$\kappa_n = \kappa_0 + n$$

$$\nu_n = \nu_0 + n$$

$$\nu_n \sigma_n^2 = \nu_0 \sigma_0^2 + \left(n-1\right)s^2 + \frac{\kappa_0 n}{\kappa_0 + n}\left(\mu_0 - \bar{y}\right)^2$$

That is, the joint posterior is

$$p\left(\mu, \sigma^2 \mid y; \mu_0, \sigma^2/\kappa_0, \nu_0, \sigma_0^2\right) \propto \left(\sigma^2\right)^{-\left(\frac{n+\nu_0+3}{2}\right)} \exp\left[-\frac{1}{2\sigma^2}\left\{\nu_0 \sigma_0^2 + \left(n-1\right)s^2 + \kappa_0\left(\mu - \mu_0\right)^2 + n\left(\mu - \bar{y}\right)^2\right\}\right]$$
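The four hyperparameter updates can be sketched numerically (a minimal illustration; function and variable names are ours):

```python
# Joint normal-inverted-chi-square update for a Gaussian with unknown
# mean and variance: prior (mu0, kappa0, nu0, sigma0_sq) plus data y
# yields posterior hyperparameters (mu_n, kappa_n, nu_n, sigma_n_sq).

def normal_inv_chi_square_update(mu0, kappa0, nu0, sigma0_sq, y):
    n = len(y)
    ybar = sum(y) / n
    ss = sum((yi - ybar) ** 2 for yi in y)     # (n - 1) * s^2
    kappa_n = kappa0 + n
    nu_n = nu0 + n
    mu_n = (kappa0 * mu0 + n * ybar) / kappa_n
    sigma_n_sq = (nu0 * sigma0_sq + ss
                  + kappa0 * n / kappa_n * (mu0 - ybar) ** 2) / nu_n
    return mu_n, kappa_n, nu_n, sigma_n_sq

# Illustrative prior and data.
print(normal_inv_chi_square_update(0.0, 1.0, 1.0, 1.0, [1.0, 2.0, 3.0]))
# -> (1.5, 4.0, 4.0, 1.5)
```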

#### 6.4.1 Completing the square

The expression for the joint posterior is written by completing the square. Completing the weighted square for $\mu$ centered around

$$\mu_n = \frac{1}{\kappa_0 + n}\left(\kappa_0 \mu_0 + n\bar{y}\right)$$

⁴ The prior for the mean, $\mu$, is conditional on the scale of the data, $\sigma^2$.
⁵ The product of normal or Gaussian kernels produces a Gaussian kernel.


where $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$ gives

$$\begin{aligned}
\left(\kappa_0 + n\right)\left(\mu - \mu_n\right)^2 &= \left(\kappa_0 + n\right)\mu^2 - 2\left(\kappa_0 + n\right)\mu \mu_n + \left(\kappa_0 + n\right)\mu_n^2 \\
&= \left(\kappa_0 + n\right)\mu^2 - 2\mu\left(\kappa_0 \mu_0 + n\bar{y}\right) + \left(\kappa_0 + n\right)\mu_n^2
\end{aligned}$$

while expanding the exponent includes the square plus additional terms as follows:

$$\begin{aligned}
\kappa_0\left(\mu - \mu_0\right)^2 + n\left(\mu - \bar{y}\right)^2 &= \kappa_0\left(\mu^2 - 2\mu\mu_0 + \mu_0^2\right) + n\left(\mu^2 - 2\mu\bar{y} + \bar{y}^2\right) \\
&= \left(\kappa_0 + n\right)\mu^2 - 2\mu\left(\kappa_0 \mu_0 + n\bar{y}\right) + \kappa_0 \mu_0^2 + n\bar{y}^2
\end{aligned}$$

Add and subtract $\left(\kappa_0 + n\right)\mu_n^2$ and simplify:

$$\begin{aligned}
\kappa_0\left(\mu - \mu_0\right)^2 + n\left(\mu - \bar{y}\right)^2 &= \left(\kappa_0 + n\right)\mu^2 - 2\left(\kappa_0 + n\right)\mu\mu_n + \left(\kappa_0 + n\right)\mu_n^2 \\
&\quad - \left(\kappa_0 + n\right)\mu_n^2 + \kappa_0 \mu_0^2 + n\bar{y}^2 \\
&= \left(\kappa_0 + n\right)\left(\mu - \mu_n\right)^2 \\
&\quad + \frac{1}{\kappa_0 + n}\left[\left(\kappa_0 + n\right)\left(\kappa_0 \mu_0^2 + n\bar{y}^2\right) - \left(\kappa_0 \mu_0 + n\bar{y}\right)^2\right]
\end{aligned}$$

Expand and simplify the last term:

$$\kappa_0\left(\mu - \mu_0\right)^2 + n\left(\mu - \bar{y}\right)^2 = \left(\kappa_0 + n\right)\left(\mu - \mu_n\right)^2 + \frac{\kappa_0 n}{\kappa_0 + n}\left(\mu_0 - \bar{y}\right)^2$$
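This completing-the-square identity is easy to verify numerically (illustrative values, not from the text):

```python
# Check: kappa0*(mu - mu0)^2 + n*(mu - ybar)^2
#      = (kappa0 + n)*(mu - mu_n)^2 + kappa0*n/(kappa0 + n)*(mu0 - ybar)^2
# with mu_n = (kappa0*mu0 + n*ybar) / (kappa0 + n).

kappa0, n = 2.0, 5.0
mu0, ybar, mu = 1.0, 3.0, 0.7

mu_n = (kappa0 * mu0 + n * ybar) / (kappa0 + n)
lhs = kappa0 * (mu - mu0) ** 2 + n * (mu - ybar) ** 2
rhs = ((kappa0 + n) * (mu - mu_n) ** 2
       + kappa0 * n / (kappa0 + n) * (mu0 - ybar) ** 2)
print(abs(lhs - rhs) < 1e-9)   # -> True
```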

Now, the joint posterior can be rewr