§❷ An Introduction to Bayesian inference



Transcript of §❷ An Introduction to Bayesian inference

Page 1: §❷ An Introduction to Bayesian inference

Applied Bayesian Inference, KSU, April 29, 2012


§❷ An Introduction to Bayesian inference

Robert J. Tempelman

Page 2: §❷ An Introduction to Bayesian inference


Bayes Theorem

• Recall the basic axiom of probability:
  $f(\theta, y) = f(y \mid \theta)\, f(\theta)$
• Also:
  $f(\theta, y) = f(\theta \mid y)\, f(y)$
• Combine both expressions to get:
  $f(\theta \mid y) = \dfrac{f(y \mid \theta)\, f(\theta)}{f(y)}$  or  $f(\theta \mid y) \propto f(y \mid \theta)\, f(\theta)$
  Posterior $\propto$ Likelihood $\times$ Prior

Page 3: §❷ An Introduction to Bayesian inference


Prior densities/distributions
• What can we specify for the prior $f(\theta)$?
  – Anything that reflects our prior beliefs.
  – Common choice: a "conjugate" prior; $f(\theta)$ is chosen such that the posterior $f(\theta \mid y) \propto f(y \mid \theta)\, f(\theta)$ is recognizable and of the same distributional form as the prior.
  – "Flat" prior: $f(\theta) \propto \text{constant}$. Then $f(\theta \mid y) \propto f(y \mid \theta)$.
  – Flat priors can be dangerous: they can lead to an improper posterior $f(\theta \mid y)$; i.e., $\int f(y \mid \theta)\, f(\theta)\, d\theta = \infty$, so the posterior cannot be normalized.

Page 4: §❷ An Introduction to Bayesian inference


Prior information / Objective?

• Introducing prior information may somewhat "bias" the sample information; nevertheless, ignoring existing prior information is inconsistent with
  1) human rational behavior, and
  2) the nature of the scientific method.
  – Memory property: past inference (the posterior) can be used as an updated prior in future inference.
• Nevertheless, many applied Bayesian data analysts try to be as "objective" as possible by using diffuse (e.g., flat) priors.

Page 5: §❷ An Introduction to Bayesian inference


Example of conjugate prior

• Recall the binomial distribution:
  $\mathrm{Prob}(Y = y \mid p, n) = \dfrac{n!}{y!\,(n-y)!}\, p^{y} (1-p)^{n-y}$
• Suppose we express prior belief on $p$ using a beta distribution:
  $f(p \mid \alpha, \beta) = \dfrac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\, p^{\alpha-1} (1-p)^{\beta-1}$
  – Denoted as Beta($\alpha$, $\beta$)

Page 6: §❷ An Introduction to Bayesian inference


Examples of different beta densities

[Figure: example beta densities on $p \in (0, 1)$ for ($\alpha=9$, $\beta=1$), ($\alpha=1$, $\beta=1$), and ($\alpha=2$, $\beta=18$).]

Mean and variance of a Beta($\alpha$, $\beta$) random variable:
  $E(p \mid \alpha, \beta) = \dfrac{\alpha}{\alpha+\beta}$,  $\mathrm{var}(p \mid \alpha, \beta) = \dfrac{\alpha\beta}{(\alpha+\beta)^2 (\alpha+\beta+1)}$

The Beta(1,1) density is a diffuse (flat) bounded prior (but it is proper since it is bounded!).
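Not part of the original slides: a minimal SAS sketch of one way to trace these three beta densities over a grid of p values, using the built-in PDF function; the dataset and variable names (betadens, dens1, dens2, dens3) are hypothetical.

data betadens;
  do p = 0.005 to 0.995 by 0.005;
    dens1 = pdf('BETA', p, 9, 1);     /* Beta(9,1)                   */
    dens2 = pdf('BETA', p, 1, 1);     /* Beta(1,1): flat but proper  */
    dens3 = pdf('BETA', p, 2, 18);    /* Beta(2,18)                  */
    output;
  end;
run;

proc sgplot data=betadens;            /* overlay the three densities */
  series x=p y=dens1;
  series x=p y=dens2;
  series x=p y=dens3;
run;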

Page 7: §❷ An Introduction to Bayesian inference


Posterior density of p

• Posterior $\propto$ Likelihood $\times$ Prior:
  $f(p \mid n, y, \alpha, \beta) \propto \mathrm{Prob}(Y = y \mid p, n)\, f(p \mid \alpha, \beta)$
  $\quad \propto p^{y} (1-p)^{n-y} \cdot p^{\alpha-1} (1-p)^{\beta-1}$
  $\quad = p^{y+\alpha-1} (1-p)^{n-y+\beta-1}$
• i.e., $p \mid y \sim$ Beta($y+\alpha$, $n-y+\beta$)
• The beta prior is conjugate to the binomial likelihood.

Page 8: §❷ An Introduction to Bayesian inference


Suppose we observe data

• $y = 10$, $n = 15$.
• Consider three alternative priors:
  – Beta(1,1)
  – Beta(9,1)
  – Beta(2,18)
• Posterior densities: Beta($y+\alpha$, $n-y+\beta$)

[Figure: the three posterior densities on $p \in (0, 1)$: Beta(11,6), Beta(19,6), and Beta(12,23).]
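Not in the original slides: a minimal SAS sketch that computes the posterior Beta parameters and posterior means for these three priors (the dataset and variable names are hypothetical).

data postbeta;
  input alpha beta;                          /* one prior per data line       */
  y = 10; n = 15;                            /* observed data from this slide */
  post_a = y + alpha;                        /* posterior Beta parameters     */
  post_b = n - y + beta;
  post_mean = post_a / (post_a + post_b);    /* posterior mean of p           */
  datalines;
1 1
9 1
2 18
;

proc print data=postbeta;
  var alpha beta post_a post_b post_mean;
run;

The resulting posterior means are roughly 0.65, 0.76, and 0.34, compared with the sample proportion 10/15 ≈ 0.67.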

Page 9: §❷ An Introduction to Bayesian inference


Suppose we observed a larger dataset

• $y = 100$, $n = 150$.
• Consider the same alternative priors:
  – Beta(1,1)
  – Beta(9,1)
  – Beta(2,18)
• Posterior densities: Beta($y+\alpha$, $n-y+\beta$)

[Figure: the three posterior densities on $p \in (0, 1)$: Beta(101,51), Beta(109,51), and Beta(102,68).]
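A quick check (not in the original slides) of how the prior's influence fades as the data grow, using the posterior mean $E(p \mid y) = \dfrac{y+\alpha}{n+\alpha+\beta}$:
  $y=10,\ n=15$: $11/17 \approx 0.65$ (Beta(1,1) prior), $19/25 = 0.76$ (Beta(9,1)), $12/35 \approx 0.34$ (Beta(2,18))
  $y=100,\ n=150$: $101/152 \approx 0.66$ (Beta(1,1)), $109/160 \approx 0.68$ (Beta(9,1)), $102/170 = 0.60$ (Beta(2,18))
The sample proportion is $2/3 \approx 0.67$ in both cases; with the larger dataset all three posteriors concentrate much closer to it.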

Page 10: §❷ An Introduction to Bayesian inference


Posterior information
• Given:
  $f(\theta \mid y) \propto f(y \mid \theta)\, f(\theta)$
  $\ln f(\theta \mid y) = \text{constant} + \ln f(y \mid \theta) + \ln f(\theta)$
• Posterior information = likelihood information + prior information:
  $\dfrac{\partial^2 \ln f(\theta \mid y)}{\partial \theta\, \partial \theta'} = \dfrac{\partial^2 \ln f(y \mid \theta)}{\partial \theta\, \partial \theta'} + \dfrac{\partial^2 \ln f(\theta)}{\partial \theta\, \partial \theta'}$
• One option for a point estimate: the joint posterior mode of $\theta$, obtained using Newton-Raphson.
  – Also called the MAP (maximum a posteriori) estimate of $\theta$.
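For reference (not written out on the slide), the Newton-Raphson update used to locate the posterior mode is
  $\theta^{[t+1]} = \theta^{[t]} - \left[ \dfrac{\partial^2 \ln f(\theta \mid y)}{\partial \theta\, \partial \theta'} \right]^{-1} \dfrac{\partial \ln f(\theta \mid y)}{\partial \theta}$, with both derivatives evaluated at $\theta = \theta^{[t]}$,
iterated until convergence; this is exactly the update implemented in the SAS program on the plant-linkage slide below.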

Page 11: §❷ An Introduction to Bayesian inference


Recall the plant genetic linkage example

• Recall
  $p(\mathbf{y} \mid \theta) = \dfrac{n!}{y_1!\, y_2!\, y_3!\, y_4!} \left( \dfrac{2+\theta}{4} \right)^{y_1} \left( \dfrac{1-\theta}{4} \right)^{y_2} \left( \dfrac{1-\theta}{4} \right)^{y_3} \left( \dfrac{\theta}{4} \right)^{y_4}$
• Suppose
  $f(\theta \mid \alpha, \beta) = \dfrac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\, \theta^{\alpha-1} (1-\theta)^{\beta-1}$
• Then
  $f(\theta \mid y_1, y_2, y_3, y_4, \alpha, \beta) \propto p(\mathbf{y} \mid \theta)\, f(\theta \mid \alpha, \beta)$
  $\quad \propto (2+\theta)^{y_1} (1-\theta)^{y_2+y_3}\, \theta^{y_4} \cdot \theta^{\alpha-1} (1-\theta)^{\beta-1}$
  $\quad = (2+\theta)^{y_1} (1-\theta)^{y_2+y_3+\beta-1}\, \theta^{y_4+\alpha-1}$
Almost as if you increased the number of plants in genotypes 2 and 3 by $\beta-1$ ... and in genotype 4 by $\alpha-1$.
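The SAS program on the next slide works with the log posterior; up to an additive constant, it and its derivatives are
  $\ln f(\theta \mid \mathbf{y}, \alpha, \beta) = \text{const} + y_1 \ln(2+\theta) + (y_2+y_3+\beta-1)\ln(1-\theta) + (y_4+\alpha-1)\ln\theta$
  $\dfrac{\partial \ln f}{\partial \theta} = \dfrac{y_1}{2+\theta} - \dfrac{y_2+y_3+\beta-1}{1-\theta} + \dfrac{y_4+\alpha-1}{\theta}$
  $\dfrac{\partial^2 \ln f}{\partial \theta^2} = -\dfrac{y_1}{(2+\theta)^2} - \dfrac{y_2+y_3+\beta-1}{(1-\theta)^2} - \dfrac{y_4+\alpha-1}{\theta^2}$
which correspond to the variables logpost, firstder, and secndder in the code.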

Page 12: §❷ An Introduction to Bayesian inference


Plant linkage example cont'd.
Suppose the prior is $\theta \sim$ Beta($\alpha = 50$, $\beta = 500$).

data newton;
  y1 = 1997; y2 = 906; y3 = 904; y4 = 32;
  alpha = 50; beta = 500;
  theta = 0.01;  /* try starting value of 0.50 too */
  do iterate = 1 to 10;
    logpost  = y1*log(2+theta) + (y2+y3+beta-1)*log(1-theta) + (y4+alpha-1)*log(theta);
    firstder = y1/(2+theta) - (y2+y3+beta-1)/(1-theta) + (y4+alpha-1)/theta;
    secndder = (-y1/(2+theta)**2 - (y2+y3+beta-1)/(1-theta)**2 - (y4+alpha-1)/theta**2);
    theta = theta + firstder/(-secndder);
    output;
  end;
  asyvar = 1/(-secndder);  /* asymptotic variance of theta_hat at convergence */
  poststd = sqrt(asyvar);
  call symputx("poststd", poststd);
  output;
run;
title "Posterior Standard Error = &poststd";
proc print;
  var iterate theta logpost;
run;

Posterior standard error:
  $sd(\theta \mid \mathbf{y}) \approx \left[ -\dfrac{\partial^2 \ln f(\theta \mid \mathbf{y})}{\partial \theta^2} \bigg|_{\theta = \hat{\theta}} \right]^{-1/2}$

Page 13: §❷ An Introduction to Bayesian inference


Output
Posterior Standard Error = 0.0057929339

Obs   iterate     theta      logpost
  1      1       0.018318     997.95
  2      2       0.030841    1035.74
  3      3       0.044771    1060.65
  4      4       0.053261    1071.06
  5      5       0.054986    1072.79
  6      6       0.055037    1072.84
  7      7       0.055037    1072.84
  8      8       0.055037    1072.84
  9      9       0.055037    1072.84
 10     10       0.055037    1072.84
 11     11       0.055037    1072.84
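A quick sanity check (not on the original slide): plugging the converged value $\hat{\theta} \approx 0.0550$ into the second-derivative formula gives
  $-\dfrac{\partial^2 \ln f}{\partial \theta^2}\bigg|_{\hat{\theta}} = \dfrac{1997}{(2.055)^2} + \dfrac{2309}{(0.945)^2} + \dfrac{81}{(0.055037)^2} \approx 473 + 2586 + 26741 \approx 29800$,
so $sd(\theta \mid \mathbf{y}) \approx \sqrt{1/29800} \approx 0.0058$, matching the reported posterior standard error.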

Page 14: §❷ An Introduction to Bayesian inference


Additional elements of Bayesian inference
• Suppose that $\theta$ can be partitioned into two components, a $p \times 1$ vector $\theta_1$ and a $q \times 1$ vector $\theta_2$:
  $\theta = \begin{bmatrix} \theta_1 \\ \theta_2 \end{bmatrix}$
• If we want to make probability statements about $\theta$, we use probability calculus, e.g. for a region $R$:
  $\mathrm{Prob}(\theta \in R \mid \mathbf{y}) = \int_{R} p(\theta \mid \mathbf{y})\, d\theta$
• There is NO repeated-sampling concept.
  – We condition on the one observed dataset.
  – However, Bayes estimators typically do have very good frequentist properties!

Page 15: §❷ An Introduction to Bayesian inference


Marginal vs. conditional inference

• Suppose you're primarily interested in $\theta_1$:
  $p(\theta_1 \mid \mathbf{y}) = \int p(\theta_1, \theta_2 \mid \mathbf{y})\, d\theta_2 = \int p(\theta_1 \mid \theta_2, \mathbf{y})\, p(\theta_2 \mid \mathbf{y})\, d\theta_2 = E_{\theta_2 \mid \mathbf{y}} \left[ p(\theta_1 \mid \theta_2, \mathbf{y}) \right]$
  – i.e., average over the uncertainty on $\theta_2$ (nuisance parameters).
• Of course, if $\theta_2$ were known, you would condition your inference about $\theta_1$ on it accordingly:
  $p(\theta_1 \mid \theta_2, \mathbf{y})$

Page 16: §❷ An Introduction to Bayesian inference


Two-stage model example

• Given $\mathbf{y}' = \begin{bmatrix} y_1 & y_2 & \cdots & y_n \end{bmatrix}$ with $y_i \sim$ NIID($\mu$, $\sigma^2$), where $\sigma^2$ is known. We wish to infer $\mu$. From Bayes theorem:
  $f(\mu \mid \mathbf{y}, \sigma^2, \mu_a, \sigma_a^2) \propto f(\mathbf{y} \mid \mu, \sigma^2)\, f(\mu \mid \mu_a, \sigma_a^2)$
• Suppose $\mu \sim N(\mu_a, \sigma_a^2)$, i.e.
  $f(\mu \mid \mu_a, \sigma_a^2) = \dfrac{1}{\sqrt{2\pi}\,\sigma_a} \exp\left( -\dfrac{1}{2\sigma_a^2} (\mu - \mu_a)^2 \right)$

Page 17: §❷ An Introduction to Bayesian inference


Simplify likelihood

$f(\mathbf{y} \mid \mu, \sigma^2) = \prod_{i=1}^{n} f(y_i \mid \mu, \sigma^2) = \left( 2\pi\sigma^2 \right)^{-n/2} \exp\left( -\dfrac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - \mu)^2 \right)$

Writing $y_i - \mu = (y_i - \bar{y}) + (\bar{y} - \mu)$ with $\bar{y} = \dfrac{\sum_{i=1}^{n} y_i}{n}$, the cross term vanishes because $\sum_{i=1}^{n} (y_i - \bar{y}) = 0$, so

$f(\mathbf{y} \mid \mu, \sigma^2) \propto \exp\left( -\dfrac{1}{2\sigma^2} \left[ \sum_{i=1}^{n} (y_i - \bar{y})^2 + n(\bar{y} - \mu)^2 \right] \right) \propto \exp\left( -\dfrac{n}{2\sigma^2} (\bar{y} - \mu)^2 \right)$

Page 18: §❷ An Introduction to Bayesian inference


Posterior density

$f(\mu \mid \mathbf{y}, \sigma^2, \mu_a, \sigma_a^2) \propto f(\mathbf{y} \mid \mu, \sigma^2)\, f(\mu \mid \mu_a, \sigma_a^2) \propto \exp\left( -\dfrac{n}{2\sigma^2} (\bar{y} - \mu)^2 \right) \exp\left( -\dfrac{1}{2\sigma_a^2} (\mu - \mu_a)^2 \right)$

• Consider the following limit:
  $\lim_{\sigma_a^2 \to \infty} f(\mu \mid \mathbf{y}, \sigma^2, \mu_a, \sigma_a^2) \propto \exp\left( -\dfrac{n}{2\sigma^2} (\mu - \bar{y})^2 \right)$

• Consistent with $f(\mu) \propto \text{constant}$ or $f(\mu) = 1$; i.e.,
  $\mu \mid \mathbf{y}, \sigma^2, \mu_a, \sigma_a^2 \sim N\left( \bar{y}, \dfrac{\sigma^2}{n} \right)$

Page 19: §❷ An Introduction to Bayesian inference


Interpretation of Posterior Density with Flat Prior

• So
  $f(\mu \mid \mathbf{y}, \sigma^2) \propto f(\mathbf{y} \mid \mu, \sigma^2)\, f(\mu) \propto f(\mathbf{y} \mid \mu, \sigma^2)$
• Then
  $\underset{\mu}{\mathrm{ArgMax}}\ f(\mu \mid \mathbf{y}, \sigma^2) = \underset{\mu}{\mathrm{ArgMax}}\ f(\mathbf{y} \mid \mu, \sigma^2)$
• i.e.
  Posterior mode($\mu \mid \mathbf{y}, \sigma^2$) = ML($\mu \mid \mathbf{y}, \sigma^2$) = $\bar{y}$

Page 20: §❷ An Introduction to Bayesian inference


Posterior density with informative prior

• Now
  $f(\mu \mid \mathbf{y}, \sigma^2, \mu_a, \sigma_a^2) \propto \exp\left( -\dfrac{n}{2\sigma^2} (\bar{y} - \mu)^2 \right) \exp\left( -\dfrac{1}{2\sigma_a^2} (\mu - \mu_a)^2 \right)$
• After algebraic simplification:
  $\mu \mid \mathbf{y}, \sigma^2, \mu_a, \sigma_a^2 \sim N\!\left( \tilde{\mu}, \tilde{\sigma}^2 \right)$, where $\tilde{\mu} = \dfrac{\dfrac{n}{\sigma^2}\, \bar{y} + \dfrac{1}{\sigma_a^2}\, \mu_a}{\dfrac{n}{\sigma^2} + \dfrac{1}{\sigma_a^2}}$ and $\tilde{\sigma}^2 = \left( \dfrac{n}{\sigma^2} + \dfrac{1}{\sigma_a^2} \right)^{-1}$
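The "algebraic simplification" is a completing-the-square step (not shown on the slide); a sketch: combining the two exponents,
  $-\dfrac{n}{2\sigma^2}(\mu - \bar{y})^2 - \dfrac{1}{2\sigma_a^2}(\mu - \mu_a)^2 = -\dfrac{1}{2}\left[ \left( \dfrac{n}{\sigma^2} + \dfrac{1}{\sigma_a^2} \right)\mu^2 - 2\left( \dfrac{n}{\sigma^2}\bar{y} + \dfrac{1}{\sigma_a^2}\mu_a \right)\mu \right] + \text{const} = -\dfrac{1}{2\tilde{\sigma}^2}(\mu - \tilde{\mu})^2 + \text{const}$,
which is the kernel of the normal density stated above.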

Page 21: §❷ An Introduction to Bayesian inference


• Note that the posterior mean is a weighted average of the data mean and the prior mean,
  $\tilde{\mu} = \dfrac{\dfrac{n}{\sigma^2}}{\dfrac{n}{\sigma^2} + \dfrac{1}{\sigma_a^2}}\, \bar{y} + \dfrac{\dfrac{1}{\sigma_a^2}}{\dfrac{n}{\sigma^2} + \dfrac{1}{\sigma_a^2}}\, \mu_a$
  and the posterior precision is the sum of the prior precision and the sample (likelihood) precision:
  $\dfrac{1}{\tilde{\sigma}^2} = \dfrac{n}{\sigma^2} + \dfrac{1}{\sigma_a^2}$
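A small numeric illustration (values assumed for illustration, not from the slides): with $n = 10$, $\bar{y} = 5$, $\sigma^2 = 4$, $\mu_a = 0$, and $\sigma_a^2 = 1$,
  $\dfrac{1}{\tilde{\sigma}^2} = \dfrac{10}{4} + \dfrac{1}{1} = 3.5$, so $\tilde{\sigma}^2 \approx 0.29$ and $\tilde{\mu} = \dfrac{2.5 \times 5 + 1 \times 0}{3.5} \approx 3.57$;
the posterior mean is pulled from the data mean 5 toward the prior mean 0 in proportion to the prior precision.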

Page 22: §❷ An Introduction to Bayesian inference


Hierarchical models

• Given $\theta = \begin{bmatrix} \theta_1 \\ \theta_2 \end{bmatrix}$
• Two stage:
  $p(\theta_1 \mid \theta_2, \mathbf{y}) \propto p(\mathbf{y} \mid \theta_1)\, p(\theta_1 \mid \theta_2)$
• Three stage:
  $p(\theta_1, \theta_2 \mid \mathbf{y}) \propto p(\mathbf{y} \mid \theta_1)\, p(\theta_1 \mid \theta_2)\, p(\theta_2)$
  – What's the difference? When do you consider one over another?

Page 23: §❷ An Introduction to Bayesian inference


Simple hierarchical model

• Random effects model:
  $Y_{ij} = \mu + a_i + e_{ij}$
  $\mu$: overall mean; $a_i \sim$ NIID(0, $\tau^2$); $e_{ij} \sim$ NIID(0, $\sigma^2$).
• Suppose we knew $\mu$, $\sigma^2$, and $\tau^2$. Then, writing $\theta_i = \mu + a_i$:
  $E(\theta_i \mid \mathbf{y}) = (1 - B)\, \bar{y}_i + B\, \mu$
  $\mathrm{Var}(\theta_i \mid \mathbf{y}) = (1 - B)\, \dfrac{\sigma^2}{n}$
  where $B = \dfrac{\sigma^2/n}{\sigma^2/n + \tau^2}$ is the shrinkage factor.
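A quick numeric illustration (assumed values, not from the slides): with $\sigma^2 = 4$, $\tau^2 = 1$, and $n = 4$ records per group,
  $B = \dfrac{4/4}{4/4 + 1} = 0.5$, so $E(\theta_i \mid \mathbf{y}) = 0.5\, \bar{y}_i + 0.5\, \mu$;
each group mean is shrunk halfway toward the overall mean.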

Page 24: §❷ An Introduction to Bayesian inference


What if we don't know $\mu$, $\sigma^2$, or $\tau^2$?

• Option 1: Estimate them, e.g., by the method of moments:
  $\hat{\mu} = \dfrac{\sum_{i=1}^{k} \bar{y}_i}{k}$,  $\hat{\sigma}^2 = \dfrac{\sum_{i}\sum_{j} \left( y_{ij} - \bar{y}_i \right)^2}{k(n-1)}$,  $\hat{\tau}^2 = \dfrac{\sum_{i=1}^{k} \left( \bar{y}_i - \hat{\mu} \right)^2}{k} - \dfrac{\hat{\sigma}^2}{n}$
• Then "plug them in":
  $E(\theta_i \mid \mathbf{y}) \approx (1 - \hat{B})\, \bar{y}_i + \hat{B}\, \hat{\mu}$,  $\mathrm{Var}(\theta_i \mid \mathbf{y}) \approx (1 - \hat{B})\, \dfrac{\hat{\sigma}^2}{n}$
• Not truly Bayesian.
  – Empirical Bayesian (EB) (next section).
  – Most of us using PROC MIXED/GLIMMIX are EB!
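As a pointer (not part of the original slides), a minimal PROC MIXED call of this flavor, assuming a hypothetical dataset mydata with a response y and a grouping variable group, might look like:

proc mixed data=mydata;        /* mydata, y, group are hypothetical names            */
  class group;
  model y = / solution;        /* intercept-only fixed part: the overall mean        */
  random group / solution;     /* random group effects; "shrunken" (EBLUP) solutions */
run;

The variance components are estimated (by REML by default) rather than known, and the random-effect solutions plug those estimates in, which is why such analyses are empirical Bayes in the sense described above.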

Page 25: §❷ An Introduction to Bayesian inference


A truly Bayesian approach

• 1) $Y_{ij} \mid \theta_i \sim N(\theta_i, \sigma^2)$ for all $i, j$
• 2) $\theta_1, \theta_2, \ldots, \theta_k$ are iid $N(\mu, \tau^2)$
  o Structural prior (exchangeable entities)
• 3) $\mu \sim p(\mu)$; $\tau^2 \sim p(\tau^2)$; $\sigma^2 \sim p(\sigma^2)$
  o Subjective prior

Joint posterior:
  $p(\theta_1, \theta_2, \ldots, \theta_k, \sigma^2, \mu, \tau^2 \mid \mathbf{y}) \propto \prod_{i=1}^{k} \prod_{j=1}^{n_i} p(y_{ij} \mid \theta_i, \sigma^2) \prod_{i=1}^{k} p(\theta_i \mid \mu, \tau^2)\, p(\mu)\, p(\tau^2)\, p(\sigma^2)$

Marginal posterior for a single $\theta_i$:
  $p(\theta_i \mid \mathbf{y}) = \int \cdots \int p(\theta_1, \ldots, \theta_k, \sigma^2, \mu, \tau^2 \mid \mathbf{y})\, d\theta_1 \cdots d\theta_{i-1}\, d\theta_{i+1} \cdots d\theta_k\, d\sigma^2\, d\mu\, d\tau^2$

Fully Bayesian inference (next section after that!)