Conjugate families

Contents

6 Conjugate families
    6.1 Binomial - beta prior
        6.1.1 Uninformative priors
    6.2 Gaussian (unknown mean, known variance)
        6.2.1 Uninformative prior
    6.3 Gaussian (known mean, unknown variance)
        6.3.1 Uninformative prior
    6.4 Gaussian (unknown mean, unknown variance)
        6.4.1 Completing the square
        6.4.2 Marginal posterior distributions
        6.4.3 Uninformative priors
    6.5 Multivariate Gaussian (unknown mean, known variance)
        6.5.1 Completing the square
        6.5.2 Uninformative priors
    6.6 Multivariate Gaussian (unknown mean, unknown variance)
        6.6.1 Completing the square
        6.6.2 Inverted-Wishart kernel
        6.6.3 Marginal posterior distributions
        6.6.4 Uninformative priors
    6.7 Bayesian linear regression
        6.7.1 Known variance
        6.7.2 Unknown variance
        6.7.3 Uninformative priors
    6.8 Bayesian linear regression with general error structure
        6.8.1 Known variance
        6.8.2 Unknown variance
        6.8.3 (Nearly) uninformative priors
    6.9 Appendix: summary of conjugacy

6 Conjugate families

Conjugate families arise when the likelihood times the prior produces a recognizable posterior kernel

\[ p(\theta \mid y) \propto \ell(\theta \mid y)\, p(\theta) \]

where the kernel is the characteristic part of the distribution function that depends on the random variable(s) (the part excluding any normalizing constants). For example, the density function for a univariate Gaussian or normal is

\[ \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{1}{2\sigma^2}(x-\mu)^2\right] \]

and its kernel (for $\sigma$ known) is

\[ \exp\left[-\frac{1}{2\sigma^2}(x-\mu)^2\right] \]

as $\frac{1}{\sqrt{2\pi}\,\sigma}$ is a normalizing constant. Now, we discuss a few common conjugate family results¹ and uninformative prior results to connect with classical results.

¹A more complete set of conjugate families is summarized in chapter 7 of Accounting and Causal Effects: Econometric Challenges as well as tabulated in an appendix at the end of the chapter.
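The kernel idea is easy to check numerically. The sketch below (ours, not from the text, assuming NumPy and SciPy are available) renormalizes the Gaussian kernel on a grid and recovers the full density, confirming that $\frac{1}{\sqrt{2\pi}\,\sigma}$ is only a constant of proportionality.

```python
import numpy as np
from scipy.integrate import trapezoid
from scipy.stats import norm

mu, sigma = 1.0, 2.0
x = np.linspace(mu - 10 * sigma, mu + 10 * sigma, 20001)

# The Gaussian kernel: the density with its normalizing constant dropped.
kernel = np.exp(-(x - mu) ** 2 / (2 * sigma ** 2))

# Renormalizing the kernel numerically recovers the full density.
density = kernel / trapezoid(kernel, x)
max_err = np.max(np.abs(density - norm.pdf(x, loc=mu, scale=sigma)))
```

Any positive rescaling of the kernel yields the same normalized density, which is why constants can be discarded throughout the derivations below.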


6.1 Binomial - beta prior

A binomial likelihood with unknown success probability, $\theta$,

\[ \ell(\theta \mid s; n) = \binom{n}{s} \theta^s (1-\theta)^{n-s}, \qquad s = \sum_{i=1}^n y_i, \quad y_i \in \{0, 1\} \]

combines with a beta$(\theta; a, b)$ prior (i.e., with parameters $a$ and $b$)

\[ p(\theta) = \frac{\Gamma(a+b)}{\Gamma(a)\,\Gamma(b)}\, \theta^{a-1} (1-\theta)^{b-1} \]

to yield

\[ p(\theta \mid y) \propto \theta^s (1-\theta)^{n-s}\, \theta^{a-1} (1-\theta)^{b-1} = \theta^{s+a-1} (1-\theta)^{n-s+b-1} \]

which is the kernel of a beta distribution with parameters $(a+s)$ and $(b+n-s)$, beta$(\theta \mid y;\ a+s, b+n-s)$.
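As a minimal numerical sketch of this update (our illustration, with made-up hyperparameters and data), the brute-force grid posterior, likelihood times prior renormalized, matches the closed-form beta result:

```python
import numpy as np
from scipy.integrate import trapezoid
from scipy.stats import beta, binom

# Hypothetical prior and data, chosen for illustration only.
a, b = 2.0, 3.0      # beta prior parameters
n, s = 10, 6         # trials and successes

# Conjugate update: beta(a, b) prior + binomial data -> beta(a + s, b + n - s).
a_post, b_post = a + s, b + n - s

# Brute-force check: likelihood * prior on a grid, renormalized.
theta = np.linspace(1e-6, 1 - 1e-6, 20001)
unnorm = binom.pmf(s, n, theta) * beta.pdf(theta, a, b)
grid_post = unnorm / trapezoid(unnorm, theta)
max_err = np.max(np.abs(grid_post - beta.pdf(theta, a_post, b_post)))
```

The binomial coefficient and the beta normalizing constant drop out of the grid computation on renormalization, exactly as in the kernel argument above.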

6.1.1 Uninformative priors

Suppose priors for $\theta$ are uniform over the interval zero to one or, equivalently, beta$(1, 1)$.² Then, the likelihood determines the posterior distribution for $\theta$,

\[ p(\theta \mid y) \propto \theta^s (1-\theta)^{n-s} \]

which is beta$(\theta \mid y;\ 1+s, 1+n-s)$.

6.2 Gaussian (unknown mean, known variance)

A single draw from a Gaussian likelihood with unknown mean, $\mu$, and known standard deviation, $\sigma$,

\[ \ell(\mu \mid y, \sigma) \propto \exp\left[-\frac{1}{2}\frac{(y-\mu)^2}{\sigma^2}\right] \]

combines with a Gaussian or normal prior for $\mu$ given $\sigma^2$ with prior mean $\mu_0$ and prior variance $\sigma_0^2$

\[ p(\mu \mid \sigma^2; \mu_0, \sigma_0^2) \propto \exp\left[-\frac{1}{2}\frac{(\mu-\mu_0)^2}{\sigma_0^2}\right] \]

²Some would utilize Jeffreys' prior, $p(\theta) \propto$ beta$(\theta; \frac{1}{2}, \frac{1}{2})$, which is invariant to transformation, as the uninformative prior.


or, writing $\sigma_0^2 \equiv \sigma^2/\kappa_0$, we have

\[ p(\mu \mid \sigma^2; \mu_0, \sigma^2/\kappa_0) \propto \exp\left[-\frac{1}{2}\frac{\kappa_0(\mu-\mu_0)^2}{\sigma^2}\right] \]

to yield

\[ p(\mu \mid y, \sigma, \mu_0, \sigma^2/\kappa_0) \propto \exp\left[-\frac{1}{2}\left(\frac{(y-\mu)^2}{\sigma^2} + \frac{\kappa_0(\mu-\mu_0)^2}{\sigma^2}\right)\right] \]

Expansion and rearrangement gives

\[ p(\mu \mid y, \sigma, \mu_0, \sigma^2/\kappa_0) \propto \exp\left[-\frac{1}{2\sigma^2}\left(y^2 - 2\mu y + \mu^2 + \kappa_0\mu^2 - 2\kappa_0\mu\mu_0 + \kappa_0\mu_0^2\right)\right] \]

Any terms not involving $\mu$ are constants and can be discarded as they are absorbed on normalization of the posterior

\[ p(\mu \mid y, \sigma, \mu_0, \sigma^2/\kappa_0) \propto \exp\left[-\frac{1}{2\sigma^2}\left(\mu^2(\kappa_0+1) - 2\mu(\kappa_0\mu_0 + y)\right)\right] \]

Completing the square (add and subtract $\frac{(\kappa_0\mu_0 + y)^2}{\kappa_0+1}$), dropping the term subtracted (as it's all constants), and factoring out $(\kappa_0+1)$ gives

\[ p(\mu \mid y, \sigma, \mu_0, \sigma^2/\kappa_0) \propto \exp\left[-\frac{\kappa_0+1}{2\sigma^2}\left(\mu - \frac{\kappa_0\mu_0 + y}{\kappa_0+1}\right)^2\right] \]

Finally, we have

\[ p(\mu \mid y, \sigma, \mu_0, \sigma^2/\kappa_0) \propto \exp\left[-\frac{1}{2}\frac{(\mu-\mu_1)^2}{\sigma_1^2}\right] \]

where $\mu_1 = \frac{\kappa_0\mu_0 + y}{\kappa_0+1} = \frac{\frac{1}{\sigma_0^2}\mu_0 + \frac{1}{\sigma^2}y}{\frac{1}{\sigma_0^2} + \frac{1}{\sigma^2}}$ and $\sigma_1^2 = \frac{\sigma^2}{\kappa_0+1} = \frac{1}{\frac{1}{\sigma_0^2} + \frac{1}{\sigma^2}}$; that is, the posterior distribution of the mean given the data and priors is Gaussian or normal. Notice the posterior mean, $\mu_1$, weights the data and prior beliefs by their relative precisions.

For a sample of $n$ exchangeable draws, the likelihood is

\[ \ell(\mu \mid y, \sigma) \propto \prod_{i=1}^n \exp\left[-\frac{1}{2}\frac{(y_i-\mu)^2}{\sigma^2}\right] \]

which combined with the above prior yields

\[ p(\mu \mid y, \sigma, \mu_0, \sigma^2/\kappa_0) \propto \exp\left[-\frac{1}{2}\frac{(\mu-\mu_n)^2}{\sigma_n^2}\right] \]


where $\mu_n = \frac{\kappa_0\mu_0 + n\bar{y}}{\kappa_0+n} = \frac{\frac{1}{\sigma_0^2}\mu_0 + \frac{n}{\sigma^2}\bar{y}}{\frac{1}{\sigma_0^2} + \frac{n}{\sigma^2}}$, $\bar{y}$ is the sample mean, and $\sigma_n^2 = \frac{\sigma^2}{\kappa_0+n} = \frac{1}{\frac{1}{\sigma_0^2} + \frac{n}{\sigma^2}}$; that is, the posterior distribution of the mean, $\mu$, given the data and priors is again Gaussian or normal and the posterior mean, $\mu_n$, weights the data and priors by their relative precisions.
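The precision-weighted update can be sketched as follows; the helper name and example numbers are ours, not from the text.

```python
import numpy as np

def posterior_mean_variance(y, sigma2, mu0, sigma02):
    """Posterior N(mu_n, sigma_n^2) for a Gaussian mean with known variance
    sigma2 and a normal prior N(mu0, sigma02).  (Helper name is ours.)"""
    n = len(y)
    ybar = float(np.mean(y))
    prior_prec = 1.0 / sigma02       # 1 / sigma_0^2
    data_prec = n / sigma2           # n / sigma^2
    sigma_n2 = 1.0 / (prior_prec + data_prec)
    mu_n = sigma_n2 * (prior_prec * mu0 + data_prec * ybar)
    return mu_n, sigma_n2

# With sigma02 = sigma2 / kappa0, this reproduces
# mu_n = (kappa0*mu0 + n*ybar)/(kappa0 + n) and sigma_n2 = sigma2/(kappa0 + n).
mu_n, sigma_n2 = posterior_mean_variance([1.0, 2.0, 3.0], sigma2=4.0,
                                         mu0=0.0, sigma02=2.0)
```

Written this way, the posterior precision is simply the sum of the prior and data precisions, and the posterior mean is the precision-weighted average of the prior mean and the sample mean.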

6.2.1 Uninformative prior

An uninformative prior for the mean, $\mu$, is the (improper) uniform, $p(\mu \mid \sigma^2) \propto 1$. Hence, the likelihood

\[ \ell(\mu \mid y, \sigma) \propto \prod_{i=1}^n \exp\left[-\frac{1}{2}\frac{(y_i-\mu)^2}{\sigma^2}\right] \propto \exp\left[-\frac{1}{2\sigma^2}\left(\sum_{i=1}^n y_i^2 - 2n\mu\bar{y} + n\mu^2\right)\right] \]
\[ \propto \exp\left[-\frac{1}{2\sigma^2}\left(\sum_{i=1}^n y_i^2 - n\bar{y}^2 + n(\mu-\bar{y})^2\right)\right] \propto \exp\left[-\frac{1}{2\sigma^2}\, n(\mu-\bar{y})^2\right] \]

determines the posterior

\[ p(\mu \mid \sigma^2, y) \propto \exp\left[-\frac{n}{2}\frac{(\mu-\bar{y})^2}{\sigma^2}\right] \]

which is the kernel for a Gaussian or $N\!\left(\mu \mid \sigma^2, y;\ \bar{y}, \frac{\sigma^2}{n}\right)$, the classical result.

6.3 Gaussian (known mean, unknown variance)

For a sample of $n$ exchangeable draws with known mean, $\mu$, and unknown variance, $\sigma^2$, a Gaussian or normal likelihood

\[ \ell(\sigma^2 \mid y, \mu) \propto \prod_{i=1}^n (\sigma^2)^{-\frac{1}{2}} \exp\left[-\frac{1}{2\sigma^2}(y_i-\mu)^2\right] \]

combines with an inverted-gamma$(a, b)$ prior

\[ p(\sigma^2; a, b) \propto (\sigma^2)^{-(a+1)} \exp\left[-\frac{b}{\sigma^2}\right] \]

to yield an inverted-gamma$\left(\frac{n+2a}{2},\ b + \frac{1}{2}t\right)$ posterior distribution where

\[ t = \sum_{i=1}^n (y_i-\mu)^2 \]

Alternatively and conveniently (but equivalently), we could parameterize the prior as an inverted chi-square$(\nu_0, \sigma_0^2)$³

\[ p(\sigma^2; \nu_0, \sigma_0^2) \propto (\sigma^2)^{-\left(\frac{\nu_0}{2}+1\right)} \exp\left[-\frac{\nu_0\sigma_0^2}{2\sigma^2}\right] \]

and combine with the above likelihood to yield

\[ p(\sigma^2 \mid y) \propto (\sigma^2)^{-\left(\frac{n+\nu_0}{2}+1\right)} \exp\left[-\frac{1}{2\sigma^2}\left(\nu_0\sigma_0^2 + t\right)\right] \]

an inverted chi-square$\left(\nu_0+n,\ \frac{\nu_0\sigma_0^2+t}{\nu_0+n}\right)$.
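A minimal sketch of this variance update (our illustration, with made-up prior parameters and data) uses the fact that a scaled inverted chi-square$(\nu, s^2)$ is an inverted-gamma$\left(\frac{\nu}{2}, \frac{\nu s^2}{2}\right)$, which matches SciPy's `invgamma` shape/scale parameterization:

```python
import numpy as np
from scipy.stats import invgamma

# Hypothetical prior and data (illustration only).
nu0, s02 = 5.0, 2.0                  # inverted chi-square prior parameters
mu = 0.0                             # known mean
y = np.array([0.5, -1.2, 2.0, 0.3, -0.7])
n = len(y)
t = float(np.sum((y - mu) ** 2))     # sum of squared deviations from mu

# Conjugate update: inverted chi-square(nu0, s02) ->
# inverted chi-square(nu0 + n, (nu0*s02 + t)/(nu0 + n)).
nu_n = nu0 + n
s_n2 = (nu0 * s02 + t) / nu_n

# invgamma(a, scale=b) has density proportional to x^-(a+1) * exp(-b/x),
# so the posterior is invgamma(nu_n/2, scale=nu_n*s_n2/2).
posterior = invgamma(nu_n / 2.0, scale=nu_n * s_n2 / 2.0)
```

The posterior object then gives the density, moments, and quantiles of $\sigma^2$ directly.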

6.3.1 Uninformative prior

An uninformative prior for scale is

\[ p(\sigma^2) \propto \frac{1}{\sigma^2} \]

Hence, the posterior distribution for scale is

\[ p(\sigma^2 \mid y) \propto (\sigma^2)^{-\left(\frac{n}{2}+1\right)} \exp\left[-\frac{t}{2\sigma^2}\right] \]

which is the kernel of an inverted chi-square$\left(\sigma^2; n, \frac{t}{n}\right)$.

6.4 Gaussian (unknown mean, unknown variance)

For a sample of $n$ exchangeable draws, a normal likelihood with unknown mean, $\mu$, and unknown (but constant) variance, $\sigma^2$, is

\[ \ell(\mu, \sigma^2 \mid y) \propto \prod_{i=1}^n \sigma^{-1} \exp\left[-\frac{1}{2}\frac{(y_i-\mu)^2}{\sigma^2}\right] \]

Expanding and rewriting the likelihood gives

\[ \ell(\mu, \sigma^2 \mid y) \propto \sigma^{-n} \exp\left[-\sum_{i=1}^n \frac{1}{2}\frac{y_i^2 - 2\mu y_i + \mu^2}{\sigma^2}\right] \]

Adding and subtracting $\sum_{i=1}^n 2y_i\bar{y} = 2n\bar{y}^2$, we write

\[ \ell(\mu, \sigma^2 \mid y) \propto (\sigma^2)^{-\frac{n}{2}} \exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^n\left(y_i^2 - 2y_i\bar{y} + \bar{y}^2 + \bar{y}^2 - 2\mu\bar{y} + \mu^2\right)\right] \]

³$\frac{\nu_0\sigma_0^2}{X}$ is a scaled, inverted chi-square$(\nu_0, \sigma_0^2)$ with scale $\sigma_0^2$ where $X$ is a chi-square$(\nu_0)$ random variable.


or

\[ \ell(\mu, \sigma^2 \mid y) \propto (\sigma^2)^{-\frac{n}{2}} \exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^n\left((y_i-\bar{y})^2 + (\bar{y}-\mu)^2\right)\right] \]

which can be rewritten as

\[ \ell(\mu, \sigma^2 \mid y) \propto (\sigma^2)^{-\frac{n}{2}} \exp\left[-\frac{1}{2\sigma^2}\left((n-1)s^2 + n(\bar{y}-\mu)^2\right)\right] \]

where $s^2 = \frac{1}{n-1}\sum_{i=1}^n (y_i-\bar{y})^2$. The above likelihood combines with a Gaussian or normal$(\mu \mid \sigma^2; \mu_0, \sigma^2/\kappa_0)$ - inverted chi-square$(\sigma^2; \nu_0, \sigma_0^2)$ prior⁴

\[ p(\mu \mid \sigma^2; \mu_0, \sigma^2/\kappa_0)\, p(\sigma^2; \nu_0, \sigma_0^2) \propto \sigma^{-1} \exp\left[-\frac{\kappa_0(\mu-\mu_0)^2}{2\sigma^2}\right] (\sigma^2)^{-\left(\frac{\nu_0}{2}+1\right)} \exp\left[-\frac{\nu_0\sigma_0^2}{2\sigma^2}\right] \]
\[ \propto (\sigma^2)^{-\left(\frac{\nu_0+3}{2}\right)} \exp\left[-\frac{\nu_0\sigma_0^2 + \kappa_0(\mu-\mu_0)^2}{2\sigma^2}\right] \]

to yield a normal$(\mu \mid \sigma^2; \mu_n, \sigma^2/\kappa_n)$ - inverted chi-square$(\sigma^2; \nu_n, \sigma_n^2)$ joint posterior distribution⁵ where

\[ \kappa_n = \kappa_0 + n \]
\[ \nu_n = \nu_0 + n \]
\[ \nu_n\sigma_n^2 = \nu_0\sigma_0^2 + (n-1)s^2 + \frac{\kappa_0 n}{\kappa_0+n}(\mu_0-\bar{y})^2 \]

That is, the joint posterior is

\[ p(\mu, \sigma^2 \mid y; \mu_0, \sigma^2/\kappa_0, \nu_0, \sigma_0^2) \propto (\sigma^2)^{-\frac{n+\nu_0+3}{2}} \exp\left[-\frac{1}{2\sigma^2}\left(\nu_0\sigma_0^2 + (n-1)s^2 + \kappa_0(\mu-\mu_0)^2 + n(\mu-\bar{y})^2\right)\right] \]
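The four-parameter update above can be sketched compactly; the helper name and the example numbers are ours, not from the text.

```python
import numpy as np

def normal_invchi2_update(y, mu0, kappa0, nu0, s02):
    """Update (mu0, kappa0, nu0, s02) to (mu_n, kappa_n, nu_n, s_n2) for the
    normal - inverted chi-square conjugate prior.  (Helper name is ours.)"""
    y = np.asarray(y, dtype=float)
    n = len(y)
    ybar = y.mean()
    s2 = y.var(ddof=1)                 # (n-1)-denominator sample variance
    kappa_n = kappa0 + n
    nu_n = nu0 + n
    mu_n = (kappa0 * mu0 + n * ybar) / kappa_n
    # nu_n*s_n2 = nu0*s02 + (n-1)*s2 + kappa0*n/(kappa0+n) * (mu0 - ybar)^2
    s_n2 = (nu0 * s02 + (n - 1) * s2
            + kappa0 * n / kappa_n * (mu0 - ybar) ** 2) / nu_n
    return mu_n, kappa_n, nu_n, s_n2

mu_n, kappa_n, nu_n, s_n2 = normal_invchi2_update(
    [1.0, 2.0, 3.0], mu0=0.0, kappa0=1.0, nu0=1.0, s02=1.0)
```

The sufficient statistics $(\bar{y}, s^2, n)$ are all the function needs, which is one practical payoff of conjugacy.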

6.4.1 Completing the square

The expression for the joint posterior is written by completing the square. Completing the weighted square for $\mu$ centered around

\[ \mu_n = \frac{1}{\kappa_0+n}\left(\kappa_0\mu_0 + n\bar{y}\right) \]

⁴The prior for the mean, $\mu$, is conditional on the scale of the data, $\sigma^2$.
⁵The product of normal or Gaussian kernels produces a Gaussian kernel.


where $\bar{y} = \frac{1}{n}\sum_{i=1}^n y_i$ gives

\[ (\kappa_0+n)(\mu-\mu_n)^2 = (\kappa_0+n)\mu^2 - 2(\kappa_0+n)\mu\mu_n + (\kappa_0+n)\mu_n^2 \]
\[ = (\kappa_0+n)\mu^2 - 2\mu(\kappa_0\mu_0 + n\bar{y}) + (\kappa_0+n)\mu_n^2 \]

Meanwhile, expanding the exponent yields the square plus additional terms:

\[ \kappa_0(\mu-\mu_0)^2 + n(\mu-\bar{y})^2 = \kappa_0\mu^2 - 2\kappa_0\mu\mu_0 + \kappa_0\mu_0^2 + n\mu^2 - 2n\mu\bar{y} + n\bar{y}^2 \]
\[ = (\kappa_0+n)\mu^2 - 2\mu(\kappa_0\mu_0 + n\bar{y}) + \kappa_0\mu_0^2 + n\bar{y}^2 \]

Add and subtract $(\kappa_0+n)\mu_n^2$ and simplify:

\[ \kappa_0(\mu-\mu_0)^2 + n(\mu-\bar{y})^2 = (\kappa_0+n)\mu^2 - 2(\kappa_0+n)\mu\mu_n + (\kappa_0+n)\mu_n^2 - (\kappa_0+n)\mu_n^2 + \kappa_0\mu_0^2 + n\bar{y}^2 \]
\[ = (\kappa_0+n)(\mu-\mu_n)^2 + \frac{1}{\kappa_0+n}\left[(\kappa_0+n)\left(\kappa_0\mu_0^2 + n\bar{y}^2\right) - (\kappa_0\mu_0 + n\bar{y})^2\right] \]

Expand and simplify the last term:

\[ \kappa_0(\mu-\mu_0)^2 + n(\mu-\bar{y})^2 = (\kappa_0+n)(\mu-\mu_n)^2 + \frac{\kappa_0 n}{\kappa_0+n}(\mu_0-\bar{y})^2 \]
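The completing-the-square identity is easy to spot-check numerically; the values below are arbitrary (ours, for illustration only).

```python
import numpy as np

# Spot-check: kappa0*(mu - mu0)^2 + n*(mu - ybar)^2
#   == (kappa0 + n)*(mu - mu_n)^2 + kappa0*n/(kappa0 + n)*(mu0 - ybar)^2
rng = np.random.default_rng(0)
kappa0, n = 2.5, 7
mu0, ybar = 0.8, -1.3
mu_n = (kappa0 * mu0 + n * ybar) / (kappa0 + n)

mus = rng.normal(size=5)              # arbitrary test points for mu
lhs = kappa0 * (mus - mu0) ** 2 + n * (mus - ybar) ** 2
rhs = ((kappa0 + n) * (mus - mu_n) ** 2
       + kappa0 * n / (kappa0 + n) * (mu0 - ybar) ** 2)
max_gap = float(np.max(np.abs(lhs - rhs)))
```

Since the identity holds for every $\mu$, the $\mu$-free remainder term is exactly the $(\mu_0-\bar{y})^2$ penalty that appears in $\nu_n\sigma_n^2$.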

    Now, the joint posterior can be rewr