Transcript of Applied Bayesian Inference, KSU, April 29, 2012: §❸ Empirical Bayes (Robert J. Tempelman)

Page 1:

§❸Empirical Bayes

Robert J. Tempelman

1

Page 2:

Origins of hierarchical modeling

• A homework question from Mood (1950, p. 164, exercise 23), recounted by Searle et al. (1992): "Suppose intelligence quotients for students in a particular age group are normally distributed about a mean of 100 with standard deviation 15. The IQ, say $Y_1$, of a particular student is to be estimated by a test on which he scores 130. It is further given that test scores are normally distributed about the true IQ as a mean with standard deviation 5. What is the maximum likelihood estimate of the student's IQ? (The answer is not 130.)"

2

Page 3:

Answer provided by one student (C.R. Henderson)

• The model: $Y_i = \mu + a_i + e_i$, where $\mu + a_i$ is the true IQ and $Y_i$ is the observed IQ score, so that jointly

$$\begin{bmatrix} \mu + a_i \\ Y_i \end{bmatrix} \sim N\!\left(\begin{bmatrix} 100 \\ 100 \end{bmatrix}, \begin{bmatrix} 15^2 & 15^2 \\ 15^2 & 15^2 + 5^2 \end{bmatrix}\right)$$

This is not really ML, but it does maximize the posterior density of $(\mu + a_i) \mid y_i$:

$$E(\mu + a_i \mid y_i = 130) = E(\mu + a_i) + \frac{\mathrm{cov}(\mu + a_i,\, y_i)}{\mathrm{var}(y_i)}\big(y_i - E(y_i)\big) = 100 + \frac{15^2}{15^2 + 5^2}(130 - 100) = 127$$

The observed score of 130 is "shrunk" toward the prior mean of 100.
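A quick numerical check of this shrinkage calculation (a minimal SAS data-step sketch; the data set and variable names are illustrative, not from the original slides):

data iq_shrinkage;
  prior_mean = 100;       /* mean of true IQs in the age group   */
  prior_var  = 15**2;     /* variance of true IQs                */
  error_var  = 5**2;      /* test (measurement) error variance   */
  y          = 130;       /* observed test score                 */
  shrink     = prior_var / (prior_var + error_var);   /* = 0.9   */
  post_mean  = prior_mean + shrink*(y - prior_mean);  /* = 127   */
run;

proc print data=iq_shrinkage; run;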

Page 4:

• Later versions of Mood’s textbooks (1963, 1974) were revised:

“What is the maximum likelihood estimate?” replaced by “What is the Bayes estimator?”

This homework problem was the inspiration for C. R. Henderson's work on best linear unbiased prediction (BLUP), which has subsequently also been referred to as empirical Bayes prediction for linear models.

4

Page 5:

What is empirical Bayes ???

5

An excellent primer:

Casella, G. (1985). An introduction to empirical Bayes analysis. The American Statistician 39(2): 83-87

Page 6:

Casella’s problem specified hierarchically

• Suppose we observe $t$ sample means $\bar X_i$, each a random draw from a normal distribution with its own mean $\theta_i$:

$$\bar X_i \mid \theta_i \sim N\!\left(\theta_i, \tfrac{\sigma^2}{n}\right), \quad i = 1, 2, \ldots, t$$

• Suppose it is further known (believed) that

$$\theta_i \sim N(\mu, \tau^2), \quad i = 1, 2, \ldots, t$$

• i.e., a "random effects model"

6

$\mu$, $\tau^2$: hyperparameters

Page 7:

• "ML" solution:

$$\hat\theta_i = \bar X_i, \quad i = 1, 2, \ldots, t$$

• Bayes estimator of $\theta_i$:

$$\hat\theta_i^{Bayes} = E(\theta_i \mid \bar X_i, \mu, \tau^2, \sigma^2) = \frac{n\tau^2}{n\tau^2 + \sigma^2}\,\bar X_i + \frac{\sigma^2}{n\tau^2 + \sigma^2}\,\mu$$

As $\tau^2 \to 0$, $\hat\theta_i^{Bayes} \to \mu$; as $n \to \infty$, $\hat\theta_i^{Bayes} \to \bar X_i$.

Page 8:

What is empirical Bayes?

• Empirical Bayes = Bayes with the unknown hyperparameters $(\mu, \tau^2, \sigma^2)$ replaced by estimates:

$$\hat\theta_i^{EBayes} = \frac{n\hat\tau^2}{n\hat\tau^2 + \hat\sigma^2}\,\bar X_i + \frac{\hat\sigma^2}{n\hat\tau^2 + \hat\sigma^2}\,\hat\mu$$

• Does it work?

8
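To make the plug-in step concrete, here is a small PROC IML sketch of the EB shrinkage estimator above. All of the numbers (the class means, n, and the hyperparameter estimates) are hypothetical placeholders, not values from the slides:

proc iml;
  xbar    = {0.395, 0.211, 0.232, 0.265, 0.158, 0.316, 0.184}; /* hypothetical class means */
  n       = 45;        /* observations per class                          */
  sigma2  = 0.004;     /* assumed within-class (sampling) variance        */
  tau2hat = 0.002;     /* assumed estimate of the between-class variance  */
  muhat   = xbar[:];   /* grand mean used as the estimate of mu           */
  w       = n*tau2hat / (n*tau2hat + sigma2);   /* shrinkage weight       */
  ebayes  = w*xbar + (1-w)*muhat;               /* EB estimates of theta  */
  print xbar ebayes;
quit;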

Page 9:

From Casella (1985)

Observed data based on first 45 at-bats for 7 NY Yankees in 1971.

“known” batting average

MONEYBALL!

9

Page 10:

From Casella (1985)

“Stein effect” estimates can be improved by using information from all coordinates when estimating each coordinate (Stein, 1981)

Stein ≡ shrinkage based estimators

[Figure: batting averages (vertical axis, roughly 0.200 to 0.300) for each player i, comparing the ML estimates with the EB (shrinkage) estimates.]

10

$$\hat\theta_i^{EBayes} = \frac{n\hat\tau^2}{n\hat\tau^2 + \hat\sigma^2}\,\bar X_i + \frac{\hat\sigma^2}{n\hat\tau^2 + \hat\sigma^2}\,\hat\mu$$

Page 11:

When might Bayes/Stein-type estimation be particularly useful?

• When the number of classes (t) is large
• When the number of observations (n) per class is small
• When the ratio of τ² to σ² is small

“Shrinkage is a good thing” Allison et al. (2006)

11

Page 12:

Microarray experiments.

• A wonderful application of the power of empirical Bayes methods.

• Microarray analysis in a nutshell:
– …conducting t-tests on differential gene expression between two (or more) groups for thousands of different genes.

– multiple comparison issues obvious (inspired research on FDR control).

$$t_g = \frac{\text{estimated difference}_g}{\sqrt{2\hat\sigma_g^2/n}}\,; \qquad g = 1, 2, \ldots, G$$

12
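In SAS, gene-by-gene t-tests of this sort could be run along the following lines (a sketch only; the data set expr and its variables gene, trt and y are hypothetical names, not from the slides):

proc sort data=expr; by gene; run;

ods output ttests=gene_tests;   /* collect one t-test result per gene */
proc ttest data=expr;
  by gene;
  class trt;                    /* two treatment groups               */
  var y;
run;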

Page 13:

Can we do better than t-tests?

• “By sharing the variance estimate across multiple genes, can form a better estimate for the true residual variance of a given gene, and effectively boost the residual degrees of freedom”

Wright and Simon (2003)

13

Page 14:

Hierarchical model formulation

• Data stage:

$$Y_{gij} = \mu_{gi} + e_{gij}; \quad g = 1, 2, \ldots, G; \; i = 1, 2, \ldots, T; \; j = 1, 2, \ldots, n; \qquad e_{gij} \sim NID(0, \sigma_g^2)$$

• Second stage:

$$\sigma_g^{-2} \sim \text{Gamma}(a, b): \quad p\!\left(\sigma_g^{-2} \mid a, b\right) = \frac{b^a}{\Gamma(a)}\left(\sigma_g^{-2}\right)^{a-1}\exp\!\left(-b\,\sigma_g^{-2}\right), \qquad E\!\left(\sigma_g^{-2}\right) = a\,b^{-1}$$

14

Page 15:

Empirical Bayes (EB) estimation

• The Bayes estimate of $\sigma_g^2$, with $\nu = T(n-1)$ the residual degrees of freedom for each gene:

$$\tilde\sigma_g^2 = E\!\left(\sigma_g^2 \mid \hat\sigma_g^2, a, b\right) = \frac{\nu\,\hat\sigma_g^2/2 + b}{\nu/2 + a - 1}$$

• Empirical Bayes = Bayes with estimates of a and b:
– Marginal ML estimation of a and b advocated by Wright and Simon (2003)
– Method of moments might be good if G is large.

• Modify the t-test statistic accordingly,

$$\tilde t_g = \frac{\text{estimated difference}_g}{\sqrt{2\tilde\sigma_g^2/n}}$$

including the posterior degrees of freedom: $\nu + 2a$.
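A minimal data-step sketch of this shrinkage calculation (the input data set genewise with variables diff, s2, n and t_trt, and the plug-in hyperparameter values a_hat and b_hat, are all assumed/hypothetical):

data moderated (drop=a_hat b_hat);
  set genewise;
  a_hat  = 2.5;  b_hat = 0.8;                 /* plug-in hyperparameter estimates      */
  df_res = t_trt*(n - 1);                     /* residual df for this gene             */
  s2_eb  = (df_res*s2/2 + b_hat) / (df_res/2 + a_hat - 1);  /* shrunken variance       */
  df_eb  = df_res + 2*a_hat;                  /* posterior degrees of freedom          */
  t_mod  = diff / sqrt(2*s2_eb/n);            /* moderated t statistic                 */
  p_mod  = 2*(1 - probt(abs(t_mod), df_eb));  /* two-sided p-value                     */
run;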

Page 16:

Observed Type I error rates (from Wright and Simon, 2003)

Pooled: As if the residual variance was the same for all genes

16

Page 17:

Power for P < 0.001 and n = 5(Wright and Simon, 2003)

17

Page 18:

Less of a need for shrinkage with larger n.

Power for P < 0.001 and n = 10(Wright and Simon, 2003)

Page 19:

“Shrinkage is a good thing” (David Allison, UAB)(Microarray data from MSU)

[Volcano plots: -log10(p-value) vs. estimated trt effect on the log2 scale for a simple design (no subsampling). Left panel: regular linear model (regular contrast t-test); right panel: shrinkage (on variance) analysis (shrinkage-based contrast t-test).]

$$t_g = \frac{\text{estimated difference}_g}{\sqrt{c\,\hat\sigma_g^2}} \qquad \text{vs.} \qquad \tilde t_g = \frac{\text{estimated difference}_g}{\sqrt{c\,\tilde\sigma_g^2}}$$

(c: a contrast-dependent constant; the first statistic uses the gene-specific variance estimate, the second the shrunken EB variance.)

19

Page 20:

Bayesian inference in the linear mixed model (Lindley and Smith, 1972; Sorensen and Gianola, 2002)

• First stage: $\mathbf{y} \mid \boldsymbol\beta, \mathbf{u} \sim N(\mathbf{X}\boldsymbol\beta + \mathbf{Z}\mathbf{u}, \mathbf{R})$ (let $\mathbf{R} = \mathbf{I}\sigma^2_e$):

$$p(\mathbf{y} \mid \boldsymbol\beta, \mathbf{u}) \propto |\mathbf{R}|^{-1/2}\exp\!\left(-\tfrac{1}{2}(\mathbf{y} - \mathbf{X}\boldsymbol\beta - \mathbf{Z}\mathbf{u})'\mathbf{R}^{-1}(\mathbf{y} - \mathbf{X}\boldsymbol\beta - \mathbf{Z}\mathbf{u})\right)$$

• Second stage priors:
– Subjective: $p(\boldsymbol\beta) \propto \exp\!\left(-\tfrac{1}{2}(\boldsymbol\beta - \boldsymbol\beta_o)'\mathbf{V}_\beta^{-1}(\boldsymbol\beta - \boldsymbol\beta_o)\right)$
– Structural: $p(\mathbf{u}) \propto \exp\!\left(-\tfrac{1}{2}\mathbf{u}'\mathbf{G}^{-1}\mathbf{u}\right)$ (let $\mathbf{G} = \mathbf{A}\sigma^2_u$; $\mathbf{A}$ is known)

• Third stage priors: $p(\sigma^2_e)$, $p(\sigma^2_u)$

20

Page 21:

Starting point for Bayesian inference

• Write joint posterior density:

• Want to make fully Bayesian probability statements on $\boldsymbol\beta$ and $\mathbf{u}$?
– Integrate out uncertainty on all other unknowns.

$$p(\boldsymbol\beta, \mathbf{u}, \sigma^2_e, \sigma^2_u \mid \mathbf{y}, \boldsymbol\beta_o, \mathbf{V}_\beta) \propto (\sigma^2_e)^{-n/2}\exp\!\left(-\frac{(\mathbf{y}-\mathbf{X}\boldsymbol\beta-\mathbf{Z}\mathbf{u})'(\mathbf{y}-\mathbf{X}\boldsymbol\beta-\mathbf{Z}\mathbf{u})}{2\sigma^2_e}\right)\exp\!\left(-\tfrac{1}{2}(\boldsymbol\beta-\boldsymbol\beta_o)'\mathbf{V}_\beta^{-1}(\boldsymbol\beta-\boldsymbol\beta_o)\right)(\sigma^2_u)^{-q/2}\exp\!\left(-\frac{\mathbf{u}'\mathbf{A}^{-1}\mathbf{u}}{2\sigma^2_u}\right)p(\sigma^2_e)\,p(\sigma^2_u)$$

$$p(\boldsymbol\beta, \mathbf{u} \mid \mathbf{y}, \boldsymbol\beta_o, \mathbf{V}_\beta) = \int\!\!\int p(\boldsymbol\beta, \mathbf{u}, \sigma^2_e, \sigma^2_u \mid \mathbf{y}, \boldsymbol\beta_o, \mathbf{V}_\beta)\; d\sigma^2_e\, d\sigma^2_u$$

21

Page 22:

Let’s suppose (for now) that variance components are known

• i.e. no third stage prior necessary…conditional inference

• Rewrite, with $\mathbf{W} = [\mathbf{X} \;\; \mathbf{Z}]$ and $\boldsymbol\theta' = [\boldsymbol\beta' \;\; \mathbf{u}']$:
– Likelihood:

$$p(\mathbf{y} \mid \mathbf{R}, \boldsymbol\theta) \propto \exp\!\left(-\tfrac{1}{2}(\mathbf{y}-\mathbf{W}\boldsymbol\theta)'\mathbf{R}^{-1}(\mathbf{y}-\mathbf{W}\boldsymbol\theta)\right)$$

– Prior:

$$\boldsymbol\theta \mid \boldsymbol\theta_o, \boldsymbol\Sigma \sim N(\boldsymbol\theta_o, \boldsymbol\Sigma), \qquad \boldsymbol\theta_o = \begin{bmatrix}\boldsymbol\beta_o \\ \mathbf{0}\end{bmatrix}, \qquad \boldsymbol\Sigma = \begin{bmatrix}\mathbf{V}_\beta & \mathbf{0} \\ \mathbf{0} & \mathbf{G}\end{bmatrix}$$

i.e., $p(\boldsymbol\beta, \mathbf{u} \mid \mathbf{R}, \mathbf{G}, \boldsymbol\beta_o, \mathbf{V}_\beta) \propto \exp\!\left(-\tfrac{1}{2}(\boldsymbol\beta-\boldsymbol\beta_o)'\mathbf{V}_\beta^{-1}(\boldsymbol\beta-\boldsymbol\beta_o)\right)\exp\!\left(-\tfrac{1}{2}\mathbf{u}'\mathbf{G}^{-1}\mathbf{u}\right)$

22

Page 23:

Bayesian inference with known VC

$$p(\boldsymbol\theta \mid \boldsymbol\theta_o, \boldsymbol\Sigma, \mathbf{R}, \mathbf{y}) \propto \exp\!\left(-\tfrac{1}{2}(\mathbf{y}-\mathbf{W}\boldsymbol\theta)'\mathbf{R}^{-1}(\mathbf{y}-\mathbf{W}\boldsymbol\theta)\right)\exp\!\left(-\tfrac{1}{2}(\boldsymbol\theta-\boldsymbol\theta_o)'\boldsymbol\Sigma^{-1}(\boldsymbol\theta-\boldsymbol\theta_o)\right) \propto \exp\!\left(-\tfrac{1}{2}(\boldsymbol\theta-\hat{\boldsymbol\theta})'\left(\mathbf{W}'\mathbf{R}^{-1}\mathbf{W} + \boldsymbol\Sigma^{-1}\right)(\boldsymbol\theta-\hat{\boldsymbol\theta})\right)$$

where

$$\hat{\boldsymbol\theta} = \begin{bmatrix}\hat{\boldsymbol\beta}\\ \hat{\mathbf{u}}\end{bmatrix} = \left(\mathbf{W}'\mathbf{R}^{-1}\mathbf{W} + \boldsymbol\Sigma^{-1}\right)^{-1}\left(\mathbf{W}'\mathbf{R}^{-1}\mathbf{y} + \boldsymbol\Sigma^{-1}\boldsymbol\theta_o\right)$$

In other words:

$$\boldsymbol\beta, \mathbf{u} \mid \boldsymbol\theta_o, \boldsymbol\Sigma, \mathbf{R}, \mathbf{y} \sim N\!\left(\begin{bmatrix}\hat{\boldsymbol\beta}\\ \hat{\mathbf{u}}\end{bmatrix}, \begin{bmatrix}\mathbf{X}'\mathbf{R}^{-1}\mathbf{X} + \mathbf{V}_\beta^{-1} & \mathbf{X}'\mathbf{R}^{-1}\mathbf{Z}\\ \mathbf{Z}'\mathbf{R}^{-1}\mathbf{X} & \mathbf{Z}'\mathbf{R}^{-1}\mathbf{Z} + \mathbf{G}^{-1}\end{bmatrix}^{-1}\right)$$

23

Page 24:

Flat prior on $\boldsymbol\beta$:

$$p(\boldsymbol\beta) \propto \exp\!\left(-\tfrac{1}{2}(\boldsymbol\beta-\boldsymbol\beta_o)'\,[\mathbf{0}]\,(\boldsymbol\beta-\boldsymbol\beta_o)\right) \propto \text{constant}$$

Note that as the diagonal elements of $\mathbf{V}_\beta \to \infty$, the diagonal elements of $\mathbf{V}_\beta^{-1} \to 0$. Hence

$$\boldsymbol\beta, \mathbf{u} \mid \mathbf{y}, \mathbf{R}, \boldsymbol\Sigma, \boldsymbol\theta_o \sim N\!\left(\begin{bmatrix}\hat{\boldsymbol\beta}\\ \hat{\mathbf{u}}\end{bmatrix}, \begin{bmatrix}\mathbf{X}'\mathbf{R}^{-1}\mathbf{X} & \mathbf{X}'\mathbf{R}^{-1}\mathbf{Z}\\ \mathbf{Z}'\mathbf{R}^{-1}\mathbf{X} & \mathbf{Z}'\mathbf{R}^{-1}\mathbf{Z} + \mathbf{G}^{-1}\end{bmatrix}^{-1}\right)$$

where

$$\begin{bmatrix}\mathbf{X}'\mathbf{R}^{-1}\mathbf{X} & \mathbf{X}'\mathbf{R}^{-1}\mathbf{Z}\\ \mathbf{Z}'\mathbf{R}^{-1}\mathbf{X} & \mathbf{Z}'\mathbf{R}^{-1}\mathbf{Z} + \mathbf{G}^{-1}\end{bmatrix}\begin{bmatrix}\hat{\boldsymbol\beta}\\ \hat{\mathbf{u}}\end{bmatrix} = \begin{bmatrix}\mathbf{X}'\mathbf{R}^{-1}\mathbf{y}\\ \mathbf{Z}'\mathbf{R}^{-1}\mathbf{y}\end{bmatrix}$$

Henderson's MME (Robinson, 1991)

24
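To make the mixed model equations concrete, here is a small PROC IML sketch that builds and solves them for given X, Z and variance components. The toy dimensions, the data values and the use of A = I are all assumptions for illustration only:

proc iml;
  /* toy design: 2 fixed-effect levels, 3 random-effect levels, 6 records */
  X = {1 0, 1 0, 1 0, 0 1, 0 1, 0 1};
  Z = {1 0 0, 0 1 0, 0 0 1, 1 0 0, 0 1 0, 0 0 1};
  y = {10, 12, 11, 14, 15, 13};
  sig2e = 50;  sig2u = 20;
  Rinv = I(nrow(y)) / sig2e;            /* R = I*sig2e              */
  Ginv = I(ncol(Z)) / sig2u;            /* G = I*sig2u (A = I here) */
  LHS = ((X`*Rinv*X) || (X`*Rinv*Z)) //
        ((Z`*Rinv*X) || (Z`*Rinv*Z + Ginv));
  RHS = (X`*Rinv*y) // (Z`*Rinv*y);
  sol = solve(LHS, RHS);                /* [beta_hat; u_hat]        */
  print sol;
quit;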

Page 25:

Inference

• Write:

$$\mathbf{C} = \begin{bmatrix}\mathbf{C}^{\beta\beta} & \mathbf{C}^{\beta u}\\ \mathbf{C}^{u\beta} & \mathbf{C}^{uu}\end{bmatrix} = \begin{bmatrix}\mathbf{X}'\mathbf{R}^{-1}\mathbf{X} & \mathbf{X}'\mathbf{R}^{-1}\mathbf{Z}\\ \mathbf{Z}'\mathbf{R}^{-1}\mathbf{X} & \mathbf{Z}'\mathbf{R}^{-1}\mathbf{Z} + \mathbf{G}^{-1}\end{bmatrix}^{-1}$$

Posterior density of $\mathbf{K}'\boldsymbol\beta + \mathbf{M}'\mathbf{u}$:

$$\mathbf{K}'\boldsymbol\beta + \mathbf{M}'\mathbf{u} \mid \mathbf{y}, \mathbf{R}, \boldsymbol\Sigma, \boldsymbol\theta_o \sim N\!\left(\mathbf{K}'\hat{\boldsymbol\beta} + \mathbf{M}'\hat{\mathbf{u}},\; [\mathbf{K}' \;\; \mathbf{M}']\,\mathbf{C}\,[\mathbf{K}' \;\; \mathbf{M}']'\right)$$

With $\mathbf{M} = \mathbf{0}$:

$$\mathbf{K}'\boldsymbol\beta \mid \mathbf{y}, \mathbf{R}, \boldsymbol\Sigma, \boldsymbol\theta_o \sim N\!\left(\mathbf{K}'\hat{\boldsymbol\beta},\; \mathbf{K}'\mathbf{C}^{\beta\beta}\mathbf{K}\right)$$

25

Page 26:

RCBD example with random blocks

• Weight gains of pigs in feeding trial (Gill, 1978). Block on litters

Litter  Diet 1  Diet 2  Diet 3  Diet 4  Diet 5
1       79.5    80.9    79.1    88.6    95.9
2       70.9    81.8    70.9    88.6    85.9
3       76.8    86.4    90.5    89.1    83.2
4       75.9    75.5    62.7    91.4    87.7
5       77.3    77.3    69.5    75.0    74.5
6       66.4    73.2    86.4    79.5    72.7
7       59.1    77.7    72.7    85.0    90.9
8       64.1    72.3    73.6    75.9    60.0
9       74.5    81.4    64.5    75.5    83.6
10      67.3    82.3    65.9    70.5    63.2

26

Page 27:

data rcbd;
  input litter diet1-diet5;
  datalines;
1 79.5 80.9 79.1 88.6 95.9
2 70.9 81.8 70.9 88.6 85.9
3 76.8 86.4 90.5 89.1 83.2
4 75.9 75.5 62.7 91.4 87.7
5 77.3 77.3 69.5 75.0 74.5
6 66.4 73.2 86.4 79.5 72.7
7 59.1 77.7 72.7 85.0 90.9
8 64.1 72.3 73.6 75.9 60.0
9 74.5 81.4 64.5 75.5 83.6
10 67.3 82.3 65.9 70.5 63.2
;

data rcbd_2 (drop=diet1-diet5);
  set rcbd;
  diet = 1; gain=diet1; output;
  diet = 2; gain=diet2; output;
  diet = 3; gain=diet3; output;
  diet = 4; gain=diet4; output;
  diet = 5; gain=diet5; output;
run;

27

Page 28:

RCBD model

• Linear Model:

– Fixed diet effects $\tau_j$
– Random litter effects $u_i$

$$Y_{ij} = \mu + \tau_j + u_i + e_{ij}; \qquad e_{ij} \sim NIID(0, \sigma^2_e)$$

• Prior on random effects:

$$u_i \sim NIID(0, \sigma^2_u)$$

28

Page 29:

Posterior inference on and u conditional on known VC.

title 'Posterior inference conditional on known VC';
proc mixed data=rcbd_2;
  class litter diet;
  model gain = diet / covb solution;
  random litter;
  parms (20) (50) / hold = 1,2;
  lsmeans diet / diff;
  estimate 'diet 1 lsmean' intercept 1 diet 1 0 0 0 0 / df=10000;
  estimate 'diet 2 lsmean' intercept 1 diet 0 1 0 0 0 / df=10000;
  estimate 'diet1 vs diet2 dif' intercept 0 diet 1 -1 0 0 0 / df=10000;
run;

Here $\sigma^2_u = 20$ and $\sigma^2_e = 50$ are held fixed; the COVB option requests $\mathbf{C}^{\beta\beta}$, the posterior covariance matrix of $\boldsymbol\beta$, and each ESTIMATE statement is a $\mathbf{k}'\boldsymbol\beta$.

29

“Known” variance so tests based on normal (arbitrarily large df) rather than Student t.

Page 30:

Portions of output

Solution for Fixed Effects

Effect      diet   Estimate
Intercept          79.76
diet        1      -8.58
diet        2      -0.88
diet        3      -6.18
diet        4      2.15
diet        5      0

Covariance Matrix for Fixed Effects (= $\mathbf{C}^{\beta\beta}$)

Row  Effect     diet   Col1   Col2   Col3   Col4   Col5   Col6
1    Intercept         7      -5     -5     -5     -5
2    diet       1      -5     10     5      5      5
3    diet       2      -5     5      10     5      5
4    diet       3      -5     5      5      10     5
5    diet       4      -5     5      5      5      10
6    diet       5

$$\boldsymbol\beta \mid \mathbf{y}, \mathbf{R}, \boldsymbol\Sigma, \boldsymbol\theta_o \sim N\!\left(\hat{\boldsymbol\beta}, \mathbf{C}^{\beta\beta}\right)$$

30


Page 31:

Posterior densities of marginal means

• contrasts

Label                Estimate   Standard Error   DF    t Value   Pr > |t|
diet 1 lsmean        71.1800    2.6458           1E4   26.90     <.0001
diet 2 lsmean        78.8800    2.6458           1E4   29.81     <.0001
diet1 vs diet2 dif   -7.7000    3.1623           1E4   -2.43     0.0149

$$\mathbf{k}'\boldsymbol\beta \mid \mathbf{y}, \mathbf{R}, \boldsymbol\Sigma, \boldsymbol\theta_o \sim N\!\left(\mathbf{k}'\hat{\boldsymbol\beta},\; \mathbf{k}'\mathbf{C}^{\beta\beta}\mathbf{k}\right)$$

(The Estimate and Standard Error columns are $\mathbf{k}'\hat{\boldsymbol\beta}$ and $\sqrt{\mathbf{k}'\mathbf{C}^{\beta\beta}\mathbf{k}}$.)

31
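These standard errors can be checked directly against the covariance matrix on the previous slide:

$$\operatorname{var}(\widehat{\text{diet 1 lsmean}}) = \operatorname{var}(\text{intercept}) + \operatorname{var}(\text{diet 1}) + 2\,\operatorname{cov} = 7 + 10 + 2(-5) = 7, \qquad \sqrt{7} = 2.6458$$

$$\operatorname{var}(\widehat{\text{diet1 vs diet2 dif}}) = 10 + 10 - 2(5) = 10, \qquad \sqrt{10} = 3.1623$$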

Page 32:

Two stage generalized linear models

• Consider again the probit model for binary data.
– Likelihood function:

$$p(\mathbf{y} \mid \boldsymbol\beta, \mathbf{u}) = \prod_{i=1}^{n} \Phi(\mathbf{x}_i'\boldsymbol\beta + \mathbf{z}_i'\mathbf{u})^{y_i}\left[1 - \Phi(\mathbf{x}_i'\boldsymbol\beta + \mathbf{z}_i'\mathbf{u})\right]^{1-y_i}$$

– Priors:

$$p(\mathbf{u} \mid \sigma^2_u) \propto (\sigma^2_u)^{-q/2}\exp\!\left(-\frac{\mathbf{u}'\mathbf{A}^{-1}\mathbf{u}}{2\sigma^2_u}\right), \qquad p(\boldsymbol\beta) \propto \text{constant}$$

– Third stage prior (if $\sigma^2_u$ were not known): $p(\sigma^2_u)$

32

Page 33:

Joint Posterior Density

• For all parameters (3 stage model):

$$p(\boldsymbol\beta, \mathbf{u}, \sigma^2_u \mid \mathbf{y}) \propto \underbrace{p(\mathbf{y} \mid \boldsymbol\beta, \mathbf{u})}_{\text{Stage 1}}\;\underbrace{p(\mathbf{u} \mid \sigma^2_u)\,p(\boldsymbol\beta)}_{\text{Stage 2}}\;\underbrace{p(\sigma^2_u)}_{\text{Stage 3}} \propto \prod_{i=1}^{n} \Phi(\mathbf{x}_i'\boldsymbol\beta + \mathbf{z}_i'\mathbf{u})^{y_i}\left[1 - \Phi(\mathbf{x}_i'\boldsymbol\beta + \mathbf{z}_i'\mathbf{u})\right]^{1-y_i}(\sigma^2_u)^{-q/2}\exp\!\left(-\frac{\mathbf{u}'\mathbf{A}^{-1}\mathbf{u}}{2\sigma^2_u}\right)p(\sigma^2_u)$$

• Let's condition on $\sigma^2_u$ being known (for now) (2 stage model):

$$p(\boldsymbol\beta, \mathbf{u} \mid \mathbf{y}, \sigma^2_u) \propto p(\mathbf{y} \mid \boldsymbol\beta, \mathbf{u})\,p(\mathbf{u} \mid \sigma^2_u)\,p(\boldsymbol\beta) \propto \prod_{i=1}^{n} \Phi(\mathbf{x}_i'\boldsymbol\beta + \mathbf{z}_i'\mathbf{u})^{y_i}\left[1 - \Phi(\mathbf{x}_i'\boldsymbol\beta + \mathbf{z}_i'\mathbf{u})\right]^{1-y_i}\exp\!\left(-\frac{\mathbf{u}'\mathbf{A}^{-1}\mathbf{u}}{2\sigma^2_u}\right)$$

33


Page 34:

Log Joint Posterior Density

• Let's write:

$$L = \log p(\boldsymbol\beta, \mathbf{u} \mid \mathbf{y}, \sigma^2_u) = \sum_{i=1}^{n}\left\{y_i \log \Phi(\mathbf{x}_i'\boldsymbol\beta + \mathbf{z}_i'\mathbf{u}) + (1-y_i)\log\left[1 - \Phi(\mathbf{x}_i'\boldsymbol\beta + \mathbf{z}_i'\mathbf{u})\right]\right\} - \frac{1}{2\sigma^2_u}\mathbf{u}'\mathbf{A}^{-1}\mathbf{u} + \text{constant}$$

• Log joint posterior = log likelihood + log prior
• $L = L_1 + L_2$, where

$$L_1 = \sum_{i=1}^{n}\left\{y_i \log \Phi(\mathbf{x}_i'\boldsymbol\beta + \mathbf{z}_i'\mathbf{u}) + (1-y_i)\log\left[1 - \Phi(\mathbf{x}_i'\boldsymbol\beta + \mathbf{z}_i'\mathbf{u})\right]\right\}, \qquad L_2 = -\frac{1}{2\sigma^2_u}\mathbf{u}'\mathbf{A}^{-1}\mathbf{u}$$

34

Page 35:

Maximize joint posterior density w.r.t. $\boldsymbol\theta' = [\boldsymbol\beta' \;\; \mathbf{u}']$

• i.e., compute the joint posterior mode of $\boldsymbol\theta' = [\boldsymbol\beta' \;\; \mathbf{u}']$
– Analogous to pseudo-likelihood (PL) inference in PROC GLIMMIX (also penalized likelihood)

• Fisher scoring/Newton Raphson:

$$-E\!\left[\frac{\partial^2 L}{\partial\boldsymbol\theta\,\partial\boldsymbol\theta'}\right] = \begin{bmatrix}\mathbf{X}'\mathbf{W}\mathbf{X} & \mathbf{X}'\mathbf{W}\mathbf{Z}\\ \mathbf{Z}'\mathbf{W}\mathbf{X} & \mathbf{Z}'\mathbf{W}\mathbf{Z} + \mathbf{G}^{-1}\end{bmatrix}, \qquad \frac{\partial L}{\partial\boldsymbol\theta} = \begin{bmatrix}\mathbf{X}'\mathbf{v}\\ \mathbf{Z}'\mathbf{v} - \mathbf{G}^{-1}\mathbf{u}\end{bmatrix}$$

$$\hat{\boldsymbol\theta}^{[t+1]} = \hat{\boldsymbol\theta}^{[t]} + \left(-E\!\left[\frac{\partial^2 L}{\partial\boldsymbol\theta\,\partial\boldsymbol\theta'}\right]\right)^{-1}\left.\frac{\partial L}{\partial\boldsymbol\theta}\right|_{\boldsymbol\theta = \hat{\boldsymbol\theta}^{[t]}}$$

Refer to §❶ for details on v and W.

35

$$v_i = \frac{\partial \log L_1}{\partial \eta_i}, \qquad w_{ii} = -E\!\left[\frac{\partial^2 \log L_1}{\partial \eta_i^2}\right], \qquad \eta_i = \mathbf{x}_i'\boldsymbol\beta + \mathbf{z}_i'\mathbf{u}$$
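For the probit link used here, $v_i$ and $w_{ii}$ take the familiar closed forms below (standard results for binary probit models, stated here for completeness rather than taken from this slide):

$$v_i = \frac{\left(y_i - \Phi(\eta_i)\right)\phi(\eta_i)}{\Phi(\eta_i)\left[1-\Phi(\eta_i)\right]}, \qquad w_{ii} = \frac{\phi(\eta_i)^2}{\Phi(\eta_i)\left[1-\Phi(\eta_i)\right]}$$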

Page 36:

PL (or approximate EB)

• Then the Fisher scoring iterations solve

$$\begin{bmatrix}\mathbf{X}'\mathbf{W}\mathbf{X} & \mathbf{X}'\mathbf{W}\mathbf{Z}\\ \mathbf{Z}'\mathbf{W}\mathbf{X} & \mathbf{Z}'\mathbf{W}\mathbf{Z} + \mathbf{G}^{-1}\end{bmatrix}_{\hat{\boldsymbol\beta}^{[t]},\,\hat{\mathbf{u}}^{[t]}}\begin{bmatrix}\hat{\boldsymbol\beta}^{[t+1]} - \hat{\boldsymbol\beta}^{[t]}\\ \hat{\mathbf{u}}^{[t+1]} - \hat{\mathbf{u}}^{[t]}\end{bmatrix} = \begin{bmatrix}\mathbf{X}'\mathbf{v}\\ \mathbf{Z}'\mathbf{v} - \mathbf{G}^{-1}\hat{\mathbf{u}}^{[t]}\end{bmatrix}_{\hat{\boldsymbol\beta}^{[t]},\,\hat{\mathbf{u}}^{[t]}}$$

At convergence, writing

$$\mathbf{C} = \begin{bmatrix}\mathbf{C}^{\beta\beta} & \mathbf{C}^{\beta u}\\ \mathbf{C}^{u\beta} & \mathbf{C}^{uu}\end{bmatrix} = \begin{bmatrix}\mathbf{X}'\mathbf{W}\mathbf{X} & \mathbf{X}'\mathbf{W}\mathbf{Z}\\ \mathbf{Z}'\mathbf{W}\mathbf{X} & \mathbf{Z}'\mathbf{W}\mathbf{Z} + \mathbf{G}^{-1}\end{bmatrix}^{-1},$$

the approximate (PL / empirical Bayes) posterior is

$$\boldsymbol\beta, \mathbf{u} \mid \mathbf{y}, \sigma^2_u \;\overset{approx}{\sim}\; N\!\left(\begin{bmatrix}\hat{\boldsymbol\beta}\\ \hat{\mathbf{u}}\end{bmatrix}, \mathbf{C}\right)$$

and, for linear combinations,

$$\mathbf{K}'\boldsymbol\beta + \mathbf{M}'\mathbf{u} \mid \mathbf{y}, \sigma^2_u \;\overset{approx}{\sim}\; N\!\left(\mathbf{K}'\hat{\boldsymbol\beta} + \mathbf{M}'\hat{\mathbf{u}},\; [\mathbf{K}' \;\; \mathbf{M}']\,\mathbf{C}\,[\mathbf{K}' \;\; \mathbf{M}']'\right)$$

36

Page 37:

How software typically sets this up:

That is, the Fisher scoring update can be rewritten in terms of "pseudo-variates" $\mathbf{y}^{*}$, as shown below.

37

$$\begin{bmatrix}\mathbf{X}'\mathbf{W}\mathbf{X} & \mathbf{X}'\mathbf{W}\mathbf{Z}\\ \mathbf{Z}'\mathbf{W}\mathbf{X} & \mathbf{Z}'\mathbf{W}\mathbf{Z} + \mathbf{G}^{-1}\end{bmatrix}\begin{bmatrix}\hat{\boldsymbol\beta}^{[t+1]}\\ \hat{\mathbf{u}}^{[t+1]}\end{bmatrix} = \begin{bmatrix}\mathbf{X}'\mathbf{W}\mathbf{X} & \mathbf{X}'\mathbf{W}\mathbf{Z}\\ \mathbf{Z}'\mathbf{W}\mathbf{X} & \mathbf{Z}'\mathbf{W}\mathbf{Z} + \mathbf{G}^{-1}\end{bmatrix}\begin{bmatrix}\hat{\boldsymbol\beta}^{[t]}\\ \hat{\mathbf{u}}^{[t]}\end{bmatrix} + \begin{bmatrix}\mathbf{X}'\mathbf{v}\\ \mathbf{Z}'\mathbf{v} - \mathbf{G}^{-1}\hat{\mathbf{u}}^{[t]}\end{bmatrix}$$

can be written as

$$\begin{bmatrix}\mathbf{X}'\mathbf{W}\mathbf{X} & \mathbf{X}'\mathbf{W}\mathbf{Z}\\ \mathbf{Z}'\mathbf{W}\mathbf{X} & \mathbf{Z}'\mathbf{W}\mathbf{Z} + \mathbf{G}^{-1}\end{bmatrix}\begin{bmatrix}\hat{\boldsymbol\beta}^{[t+1]}\\ \hat{\mathbf{u}}^{[t+1]}\end{bmatrix} = \begin{bmatrix}\mathbf{X}'\mathbf{W}\mathbf{y}^{*[t]}\\ \mathbf{Z}'\mathbf{W}\mathbf{y}^{*[t]}\end{bmatrix}$$

with $\mathbf{y}^{*[t]} = \mathbf{X}\hat{\boldsymbol\beta}^{[t]} + \mathbf{Z}\hat{\mathbf{u}}^{[t]} + \mathbf{W}^{-1}\mathbf{v}^{[t]}$ being the "pseudo-variates".

Page 38:

Application

• Go back to same RCBD….suppose we binarize the data

data binarize; set rcbd_2; y = (gain>75);run;

38

Page 39:

RCBD example with random blocks

• Weight gains of pigs in feeding trial. Block on litters

Litter  Diet 1     Diet 2     Diet 3     Diet 4     Diet 5
1       79.5 >75   80.9 >75   79.1 >75   88.6 >75   95.9 >75
2       70.9       81.8 >75   70.9       88.6 >75   85.9 >75
3       76.8 >75   86.4 >75   90.5 >75   89.1 >75   83.2 >75
4       75.9 >75   75.5 >75   62.7       91.4 >75   87.7 >75
5       77.3 >75   77.3 >75   69.5       75.0       74.5
6       66.4       73.2       86.4 >75   79.5 >75   72.7
7       59.1       77.7 >75   72.7       85.0 >75   90.9 >75
8       64.1       72.3       73.6       75.9 >75   60.0
9       74.5       81.4 >75   64.5       75.5 >75   83.6 >75
10      67.3       82.3 >75   65.9       70.5       63.2

39

Page 40:

PL inference using GLIMMIX code(Known VC)

title 'Posterior inference conditional on known VC';
proc glimmix data=binarize;
  class litter diet;
  model y = diet / covb solution dist=bin link=probit;
  random litter;
  parms (0.5) / hold = 1;
  lsmeans diet / diff ilink;
  estimate 'diet 1 lsmean' intercept 1 diet 1 0 0 0 0 / df=10000 ilink;
  estimate 'diet 2 lsmean' intercept 1 diet 0 1 0 0 0 / df=10000 ilink;
  estimate 'diet1 vs diet2 dif' intercept 0 diet 1 -1 0 0 0 / df=10000;
run;

40

$\mathbf{k}'\hat{\boldsymbol\beta}$: estimate on the underlying normal (probit) scale; $\Phi(\mathbf{k}'\hat{\boldsymbol\beta})$: estimated probability of success (from the ILINK option).

Page 41:

Solutions for Fixed Effects

Effect diet Estimate Standard Error

Intercept 0.3097 0.4772

diet 1 -0.5935 0.5960

diet 2 0.6761 0.6408

diet 3 -0.9019 0.6104

diet 4 0.6775 0.6410

diet 5 0 .

$$\mathbf{k}'\hat{\boldsymbol\beta} = \begin{bmatrix}1 & 1 & 0 & 0 & 0 & 0\end{bmatrix}\begin{bmatrix}0.3097\\ -0.5935\\ 0.6761\\ -0.9019\\ 0.6775\\ 0\end{bmatrix} = -0.2838$$

Page 42:

Covariance Matrix for Fixed Effects

Effect diet Row Col1 Col2 Col3 Col4 Col5 Col6

Intercept

1 0.2277 -0.1778 -0.1766 -0.1782 -0.1766

diet 1 2 -0.1778 0.3552 0.1760 0.1787 0.1760

diet 2 3 -0.1766 0.1760 0.4107 0.1755 0.1784

diet 3 4 -0.1782 0.1787 0.1755 0.3725 0.1755

diet 4 5 -0.1766 0.1760 0.1784 0.1755 0.4109

diet 5 6

This is $\mathbf{C}^{\beta\beta}$.

Page 43:

Delta method:How well does this generally work?

43

Estimates

Label                Estimate   Standard Error   DF      t Value   Pr > |t|   Mean      Standard Error Mean
diet 1 lsmean        -0.2838    0.4768           10000   -0.60     0.5517     0.3883    0.1827
diet 2 lsmean        0.9858     0.5341           10000   1.85      0.0650     0.8379    0.1311
diet1 vs diet2 dif   -1.2697    0.6433           10000   -1.97     0.0484     Non-est   .

$$\widehat{\Pr}(\text{diet 1}) = \Phi\!\left(\mathbf{k}'\hat{\boldsymbol\beta}\right) = \Phi(-0.2838) = 0.3883$$

$$se\!\left(\widehat{\Pr}(\text{diet 1})\right) \approx \phi\!\left(\mathbf{k}'\hat{\boldsymbol\beta}\right) se\!\left(\mathbf{k}'\hat{\boldsymbol\beta}\right) = \phi(-0.2838)\times 0.4768 = 0.1827$$

($\Phi$: standard normal cdf; $\phi$: standard normal density; the Estimate and Standard Error columns are $\mathbf{k}'\hat{\boldsymbol\beta}$ and $\sqrt{\mathbf{k}'\mathbf{C}^{\beta\beta}\mathbf{k}}$.)
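The delta-method standard error reported by GLIMMIX can be verified in a couple of lines (a data-step sketch using the numbers above; the data set and variable names are just for illustration):

data delta_check;
  eta    = -0.2838;  se_eta = 0.4768;       /* estimate and SE on the probit scale    */
  p_hat  = probnorm(eta);                   /* = 0.3883                               */
  se_p   = pdf('normal', eta) * se_eta;     /* delta method: phi(eta)*se(eta) = 0.1827 */
run;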

Page 44:

What if variance components are not known?

• Given

$$\boldsymbol\theta_1 = (\boldsymbol\beta, \mathbf{u}): \text{ fixed and random effects}, \qquad \boldsymbol\theta_2 = (\sigma^2_u, \sigma^2_e): \text{ variance components}$$

• Two stage (known VC):

$$p(\boldsymbol\theta_1 \mid \mathbf{y}, \boldsymbol\theta_2) \propto p(\mathbf{y} \mid \boldsymbol\theta_1)\, p(\boldsymbol\theta_1 \mid \boldsymbol\theta_2)$$

• Three stage (unknown VC):

$$p(\boldsymbol\theta_1, \boldsymbol\theta_2 \mid \mathbf{y}) \propto p(\mathbf{y} \mid \boldsymbol\theta_1)\, p(\boldsymbol\theta_1 \mid \boldsymbol\theta_2)\, p(\boldsymbol\theta_2)$$

• Fully Bayesian inference on $\boldsymbol\theta_1$?

$$p(\boldsymbol\theta_1 \mid \mathbf{y}) = \int_{\boldsymbol\theta_2} p(\boldsymbol\theta_1 \mid \boldsymbol\theta_2, \mathbf{y})\, p(\boldsymbol\theta_2 \mid \mathbf{y})\, d\boldsymbol\theta_2$$

NOT POSSIBLE PRE-1990s

44

Page 45:

Approximate empirical Bayes (EB) option

• Goal: Approximate $p(\boldsymbol\theta_1 \mid \mathbf{y})$
• Note:

$$p(\boldsymbol\theta_1 \mid \mathbf{y}) = \int_{R_{\theta_2}} p(\boldsymbol\theta_1 \mid \boldsymbol\theta_2, \mathbf{y})\, p(\boldsymbol\theta_2 \mid \mathbf{y})\, d\boldsymbol\theta_2 = E_{\boldsymbol\theta_2 \mid \mathbf{y}}\!\left[p(\boldsymbol\theta_1 \mid \boldsymbol\theta_2, \mathbf{y})\right]$$

• i.e., $p(\boldsymbol\theta_1 \mid \mathbf{y})$ can be viewed as a "weighted" average of the conditional densities $p(\boldsymbol\theta_1 \mid \boldsymbol\theta_2, \mathbf{y})$, the weight function being $p(\boldsymbol\theta_2 \mid \mathbf{y})$.

45

Page 46:

Bayesian justification of REML

• With flat priors on $\boldsymbol\beta$, maximizing $p(\boldsymbol\theta_2 \mid \mathbf{y})$ with respect to $\boldsymbol\theta_2$ yields the REML estimates $\hat{\boldsymbol\theta}_2$ of $\boldsymbol\theta_2$ (Harville, 1974).

• If $p(\boldsymbol\theta_2 \mid \mathbf{y})$ is (nearly) symmetric, then

$$p(\boldsymbol\theta_1 \mid \mathbf{y}) \approx p(\boldsymbol\theta_1 \mid \hat{\boldsymbol\theta}_2, \mathbf{y})$$

is a reasonable approximation, with perhaps one important exception:

$$\operatorname{var}(\boldsymbol\theta_1 \mid \mathbf{y}) \geq \operatorname{var}(\boldsymbol\theta_1 \mid \hat{\boldsymbol\theta}_2, \mathbf{y})$$

Plugging in $\hat{\boldsymbol\theta}_2$, i.e., using $p(\boldsymbol\theta_1 \mid \hat{\boldsymbol\theta}_2, \mathbf{y})$, is what PROC MIXED (GLIMMIX) essentially does by default (REML/RSPL/RMPL)!

46

Page 47:

Back to Linear Mixed Model• Suppose three stage density with flat priors on , 2

u and 2e .

• Then is the posterior marginal density of interest. – Maximize w.r.t 2

u and 2e to get REML

estimates

• Empirical Bayes strategy: Plug in REML variance component estimates to approximate:

2 2 2 2| , |, ,,β u

β βy u y uu e u

R

e

R

p dp d

11 12 2

1 1 1

ˆ ' ', | , , ~ ,

ˆ ' '

X R X X R Zββ u y

Z R X Z R Z Gue u N

2 2, | yu ep

47

$\mathbf{R} = \mathbf{I}\sigma^2_e$, $\mathbf{G} = \mathbf{A}\sigma^2_u$
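In practice this just means rerunning the earlier PROC MIXED analysis without the PARMS/HOLD statement, so that REML estimates of the variance components are plugged in (a sketch using the same rcbd_2 data set):

title 'Approximate empirical Bayes: REML variance components plugged in';
proc mixed data=rcbd_2;
  class litter diet;
  model gain = diet / solution;   /* REML is the default estimation method */
  random litter;                  /* no PARMS ... / HOLD= statement now     */
  lsmeans diet / diff;
run;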

Page 48:

What is ML estimation of VC from a Bayesian perspective?

• Determine the “marginal likelihood”

• Maximize this with respect to $\boldsymbol\beta$, $\sigma^2_u$ and $\sigma^2_e$ to get ML estimates of $\boldsymbol\beta$, $\sigma^2_u$ and $\sigma^2_e$

– ….assuming flat priors on all three.

• This is essentially what PROC MIXED(GLIMMIX) does with ML (MSPL/MMPL)!

48

$$p(\mathbf{y} \mid \boldsymbol\beta, \sigma^2_u, \sigma^2_e) = \int_{\mathbf{u}} p(\mathbf{y} \mid \boldsymbol\beta, \mathbf{u}, \sigma^2_e)\, p(\mathbf{u} \mid \sigma^2_u)\, d\mathbf{u}$$

Page 49:

Approximate empirical Bayes inference

Given

$$\boldsymbol\beta, \mathbf{u} \mid \hat\sigma^2_e, \hat\sigma^2_u, \mathbf{y} \;\overset{approx}{\sim}\; N\!\left(\begin{bmatrix}\hat{\boldsymbol\beta}\\ \hat{\mathbf{u}}\end{bmatrix}, \begin{bmatrix}\mathbf{C}^{\beta\beta} & \mathbf{C}^{\beta u}\\ \mathbf{C}^{u\beta} & \mathbf{C}^{uu}\end{bmatrix}\right)$$

where

$$\begin{bmatrix}\mathbf{C}^{\beta\beta} & \mathbf{C}^{\beta u}\\ \mathbf{C}^{u\beta} & \mathbf{C}^{uu}\end{bmatrix} = \begin{bmatrix}\mathbf{X}'\hat{\mathbf{R}}^{-1}\mathbf{X} & \mathbf{X}'\hat{\mathbf{R}}^{-1}\mathbf{Z}\\ \mathbf{Z}'\hat{\mathbf{R}}^{-1}\mathbf{X} & \mathbf{Z}'\hat{\mathbf{R}}^{-1}\mathbf{Z} + \hat{\mathbf{G}}^{-1}\end{bmatrix}^{-1} \qquad (\hat{\mathbf{R}} = \mathbf{I}\hat\sigma^2_e,\; \hat{\mathbf{G}} = \mathbf{A}\hat\sigma^2_u)$$

then

$$\mathbf{k}'\boldsymbol\beta \mid \mathbf{y}, \hat{\mathbf{R}}, \hat{\mathbf{G}} \;\overset{approx}{\sim}\; N\!\left(\mathbf{k}'\hat{\boldsymbol\beta},\; \mathbf{k}'\mathbf{C}^{\beta\beta}\mathbf{k}\right)$$

49

Page 50:

Approximate “REML” (PQL) Analysis for GLMM

• Then $p(\boldsymbol\theta_2 \mid \mathbf{y})$ is not known analytically but must be approximated
– e.g. the residual pseudo-likelihood method (RSPL/MSPL) in SAS PROC GLIMMIX

• First approximations proposed by Stiratelli et al. (1984) and Harville and Mee (1984)


50

$$\boldsymbol\theta_1 = (\boldsymbol\beta, \mathbf{u}): \text{ fixed and random effects}, \qquad \boldsymbol\theta_2 = (\sigma^2_u, \sigma^2_e): \text{ variance components}$$

Page 51:

Other methods for estimating VC in PROC GLIMMIX

• Based on maximizing marginal likelihood:

• Method = QUAD:
– Adaptive quadrature: exact but useful only for simple models.
• Method = LAPLACE:
– Generally a better approximation than MMPL/MSPL to the marginal likelihood shown below, and computationally more efficient than QUAD.

51

$$p(\mathbf{y} \mid \boldsymbol\beta, \sigma^2_u, \sigma^2_e) = \int_{\mathbf{u}} p(\mathbf{y} \mid \boldsymbol\beta, \mathbf{u}, \sigma^2_e)\, p(\mathbf{u} \mid \sigma^2_u)\, d\mathbf{u}$$

More "ML-like" rather than "REML-like"!

Page 52:

Could we have considered a “residual/restricted” Laplace instead?

• i.e., maximize an approximation of $p(\sigma^2_u, \sigma^2_e \mid \mathbf{y})$ (shown below) with respect to the variance components; i.e., maximize:

52

$$p(\sigma^2_u, \sigma^2_e \mid \mathbf{y}) \propto \int_{\boldsymbol\beta}\int_{\mathbf{u}} p(\mathbf{y} \mid \boldsymbol\beta, \mathbf{u}, \sigma^2_e)\, p(\mathbf{u} \mid \sigma^2_u)\, d\boldsymbol\beta\, d\mathbf{u}$$

$$\log p_{PL}(\sigma^2_u, \sigma^2_e \mid \mathbf{y}) = \log p\!\left(\mathbf{y}, \tilde{\boldsymbol\beta}, \tilde{\mathbf{u}} \mid \sigma^2_u, \sigma^2_e\right) - 0.5\,\log\left|-\frac{\partial^2 \log p(\mathbf{y}, \boldsymbol\beta, \mathbf{u} \mid \sigma^2_u, \sigma^2_e)}{\partial(\boldsymbol\beta, \mathbf{u})\,\partial(\boldsymbol\beta, \mathbf{u})'}\right|_{(\boldsymbol\beta, \mathbf{u}) = (\tilde{\boldsymbol\beta}, \tilde{\mathbf{u}})}$$

Tempelman and Gianola (1993, 1996); Tempelman (1998); Wolfinger (1993). Premise: "REML" is generally less biased than "ML".

Page 53:

Ordinal Categorical Data

• Recall threshold concept

• Let’s extend it to mixed model:

$$y_i = m \;\text{ if }\; \tau_{m-1} < \ell_i \le \tau_m, \qquad m = 1, 2, \ldots$$

$$\boldsymbol\ell = \mathbf{X}\boldsymbol\beta + \mathbf{Z}\mathbf{u} + \mathbf{e}, \qquad \mathbf{u} \sim N(\mathbf{0}, \mathbf{A}\sigma^2_u), \qquad \mathbf{e} \mid \sigma^2_e \sim N(\mathbf{0}, \mathbf{I}\sigma^2_e)$$

($\ell_i$: the underlying liability for observation i; $\tau_m$: thresholds)

53

Page 54:

Joint posterior density

• Likelihood:

$$p(\mathbf{y} \mid \boldsymbol\beta, \mathbf{u}, \boldsymbol\tau) = \prod_{i=1}^{n}\prod_{k=1}^{m}\left[\Phi\!\left(\tau_k - \mathbf{x}_i'\boldsymbol\beta - \mathbf{z}_i'\mathbf{u}\right) - \Phi\!\left(\tau_{k-1} - \mathbf{x}_i'\boldsymbol\beta - \mathbf{z}_i'\mathbf{u}\right)\right]^{I(Y_i = k)}$$

where $\mathbf{x}_i'$ and $\mathbf{z}_i'$ are the rows of $\mathbf{X}$ and $\mathbf{Z}$.

• Priors:

$$p(\mathbf{u} \mid \mathbf{A}, \sigma^2_u) \propto (\sigma^2_u)^{-q/2}\exp\!\left(-\frac{\mathbf{u}'\mathbf{A}^{-1}\mathbf{u}}{2\sigma^2_u}\right), \qquad p(\boldsymbol\beta) \propto \text{constant}, \qquad p(\boldsymbol\tau) \propto \text{constant} \;\; (\tau_1 < \tau_2 < \cdots < \tau_{m-1}), \qquad p(\sigma^2_u)$$

54

Page 55:

Inference given "known" $\sigma^2_u$

• Then

$$p(\boldsymbol\beta, \mathbf{u}, \boldsymbol\tau \mid \mathbf{y}, \sigma^2_u) \propto p(\mathbf{y} \mid \boldsymbol\beta, \mathbf{u}, \boldsymbol\tau)\, p(\mathbf{u} \mid \sigma^2_u)\, p(\boldsymbol\beta)\, p(\boldsymbol\tau) \propto \prod_{i=1}^{n}\prod_{k=1}^{m}\left[\Phi\!\left(\tau_k - \mathbf{x}_i'\boldsymbol\beta - \mathbf{z}_i'\mathbf{u}\right) - \Phi\!\left(\tau_{k-1} - \mathbf{x}_i'\boldsymbol\beta - \mathbf{z}_i'\mathbf{u}\right)\right]^{I(Y_i = k)}\exp\!\left(-\frac{\mathbf{u}'\mathbf{A}^{-1}\mathbf{u}}{2\sigma^2_u}\right)$$

$$L = \log p(\boldsymbol\beta, \mathbf{u}, \boldsymbol\tau \mid \mathbf{y}, \sigma^2_u) = \sum_{i=1}^{n}\sum_{k=1}^{m} I(Y_i = k)\,\log\!\left[\Phi\!\left(\tau_k - \mathbf{x}_i'\boldsymbol\beta - \mathbf{z}_i'\mathbf{u}\right) - \Phi\!\left(\tau_{k-1} - \mathbf{x}_i'\boldsymbol\beta - \mathbf{z}_i'\mathbf{u}\right)\right] - \frac{1}{2\sigma^2_u}\mathbf{u}'\mathbf{A}^{-1}\mathbf{u} + \text{constant}$$

• Use Fisher's scoring to estimate $\boldsymbol\theta' = [\boldsymbol\tau' \;\; \boldsymbol\beta' \;\; \mathbf{u}']$:

$$\hat{\boldsymbol\theta}^{[t+1]} = \hat{\boldsymbol\theta}^{[t]} + \left(-E\!\left[\frac{\partial^2 \log p(\boldsymbol\theta \mid \mathbf{y})}{\partial\boldsymbol\theta\,\partial\boldsymbol\theta'}\right]\right)^{-1}\left.\frac{\partial \log p(\boldsymbol\theta \mid \mathbf{y})}{\partial\boldsymbol\theta}\right|_{\boldsymbol\theta = \hat{\boldsymbol\theta}^{[t]}}$$

55

Page 56:

Joint Posterior Mode

• Let

$$L_1 = \sum_{i=1}^{n}\sum_{k=1}^{m} I(Y_i = k)\,\log\!\left[\Phi\!\left(\tau_k - \eta_i\right) - \Phi\!\left(\tau_{k-1} - \eta_i\right)\right], \qquad L_2 = -\frac{1}{2\sigma^2_u}\mathbf{u}'\mathbf{A}^{-1}\mathbf{u}$$

• First derivatives:

$$\frac{\partial L}{\partial\boldsymbol\beta} = \frac{\partial L_1}{\partial\boldsymbol\beta} + \frac{\partial L_2}{\partial\boldsymbol\beta} = \mathbf{X}'\mathbf{v} + \mathbf{0}_{p\times 1} = \mathbf{X}'\mathbf{v}$$

56

$$\frac{\partial L}{\partial\mathbf{u}} = \mathbf{Z}'\mathbf{v} - \mathbf{G}^{-1}\mathbf{u}, \qquad \frac{\partial L}{\partial\boldsymbol\tau} = \mathbf{p} + \mathbf{0}_{(m-1)\times 1} = \mathbf{p}$$

with elements

$$p_k = \sum_{i=1}^{n}\phi(\tau_k - \eta_i)\left[\frac{I(Y_i = k)}{P_{ik}} - \frac{I(Y_i = k+1)}{P_{i,k+1}}\right], \qquad v_i = \sum_{k=1}^{m} I(Y_i = k)\,\frac{\phi(\tau_{k-1} - \eta_i) - \phi(\tau_k - \eta_i)}{P_{ik}}, \qquad P_{ik} = \Phi(\tau_k - \eta_i) - \Phi(\tau_{k-1} - \eta_i)$$

See §❶ for details on p and v

where $\eta_i = \mathbf{x}_i'\boldsymbol\beta + \mathbf{z}_i'\mathbf{u}$.

Page 57:

Second derivatives

• Now

57

$$-E\!\left[\frac{\partial^2 L}{\partial\boldsymbol\theta\,\partial\boldsymbol\theta'}\right] = -E\!\left[\frac{\partial^2 L_1}{\partial\boldsymbol\theta\,\partial\boldsymbol\theta'}\right] - E\!\left[\frac{\partial^2 L_2}{\partial\boldsymbol\theta\,\partial\boldsymbol\theta'}\right]$$

$$= \begin{bmatrix}\mathbf{T} & \mathbf{L}'\mathbf{X} & \mathbf{L}'\mathbf{Z}\\ \mathbf{X}'\mathbf{L} & \mathbf{X}'\mathbf{W}\mathbf{X} & \mathbf{X}'\mathbf{W}\mathbf{Z}\\ \mathbf{Z}'\mathbf{L} & \mathbf{Z}'\mathbf{W}\mathbf{X} & \mathbf{Z}'\mathbf{W}\mathbf{Z}\end{bmatrix} + \begin{bmatrix}\mathbf{0} & \mathbf{0} & \mathbf{0}\\ \mathbf{0} & \mathbf{0} & \mathbf{0}\\ \mathbf{0} & \mathbf{0} & \mathbf{G}^{-1}\end{bmatrix} = \begin{bmatrix}\mathbf{T} & \mathbf{L}'\mathbf{X} & \mathbf{L}'\mathbf{Z}\\ \mathbf{X}'\mathbf{L} & \mathbf{X}'\mathbf{W}\mathbf{X} & \mathbf{X}'\mathbf{W}\mathbf{Z}\\ \mathbf{Z}'\mathbf{L} & \mathbf{Z}'\mathbf{W}\mathbf{X} & \mathbf{Z}'\mathbf{W}\mathbf{Z} + \mathbf{G}^{-1}\end{bmatrix}$$

where the partitions correspond to $\boldsymbol\tau$, $\boldsymbol\beta$ and $\mathbf{u}$.

See §❶ for details on T, L, and W.

Page 58:

Fisher’s scoring

• So:

58

$$\begin{bmatrix}\mathbf{T} & \mathbf{L}'\mathbf{X} & \mathbf{L}'\mathbf{Z}\\ \mathbf{X}'\mathbf{L} & \mathbf{X}'\mathbf{W}\mathbf{X} & \mathbf{X}'\mathbf{W}\mathbf{Z}\\ \mathbf{Z}'\mathbf{L} & \mathbf{Z}'\mathbf{W}\mathbf{X} & \mathbf{Z}'\mathbf{W}\mathbf{Z} + \mathbf{G}^{-1}\end{bmatrix}_{\hat{\boldsymbol\theta}^{[t]}}\begin{bmatrix}\hat{\boldsymbol\tau}^{[t+1]} - \hat{\boldsymbol\tau}^{[t]}\\ \hat{\boldsymbol\beta}^{[t+1]} - \hat{\boldsymbol\beta}^{[t]}\\ \hat{\mathbf{u}}^{[t+1]} - \hat{\mathbf{u}}^{[t]}\end{bmatrix} = \begin{bmatrix}\mathbf{p}\\ \mathbf{X}'\mathbf{v}\\ \mathbf{Z}'\mathbf{v} - \mathbf{G}^{-1}\hat{\mathbf{u}}^{[t]}\end{bmatrix}_{\hat{\boldsymbol\theta}^{[t]}}$$

• At convergence:

$$\widehat{\operatorname{var}}\!\left(\begin{bmatrix}\boldsymbol\tau\\ \boldsymbol\beta\\ \mathbf{u}\end{bmatrix}\,\middle|\,\mathbf{y}\right) \approx \begin{bmatrix}\mathbf{T} & \mathbf{L}'\mathbf{X} & \mathbf{L}'\mathbf{Z}\\ \mathbf{X}'\mathbf{L} & \mathbf{X}'\mathbf{W}\mathbf{X} & \mathbf{X}'\mathbf{W}\mathbf{Z}\\ \mathbf{Z}'\mathbf{L} & \mathbf{Z}'\mathbf{W}\mathbf{X} & \mathbf{Z}'\mathbf{W}\mathbf{Z} + \mathbf{G}^{-1}\end{bmatrix}^{-1}_{\hat{\boldsymbol\theta}}$$

Full details in GF(1983)

Page 59:

Recall GF83 Data

H A G S Y   H A G S Y   H A G S Y
1 2 M 1 1   1 2 F 1 1   1 3 M 1 1
1 2 F 2 2   1 3 M 2 1   1 3 M 2 3
1 3 F 2 1   1 3 F 2 1   1 3 F 2 1
1 2 M 3 1   1 2 M 3 2   1 3 F 3 2
1 3 M 3 1   2 2 F 1 1   2 2 F 1 1
2 2 M 1 1   2 3 M 1 3   2 2 F 2 1
2 2 F 2 3   2 3 M 2 1   2 2 F 3 2
2 3 M 3 3   2 2 M 4 2   2 2 F 4 1
2 3 F 4 1   2 3 F 4 1   2 3 M 4 1
2 3 M 4 1

H: Herd (1 or 2)
A: Age of Dam (2 = Young heifer, 3 = Older cow)
G: Gender or sex (M and F)
S: Sire of calf (1, 2, 3, or 4)
Y: Ordinal Response (1, 2, or 3)

59

Page 60:

SAS data step:

data gf83;
  input herdyear dam_age calfsex $ sire y @@;
  if herdyear = 2 then hy = 1;   /* create dummy variables */
  else hy = 0;
  if dam_age = 3 then age = 1; else age = 0;
  if calfsex = 'F' then sex = 1; else sex = 0;
datalines;
1 2 M 1 1  1 2 F 1 1

etc.

60

Page 61:

Reproducing analyses in GF83(based on created dummy variables)

ods select parameterestimates Estimates;
proc glimmix data=gf83;
  class sire;   /* sire treated as a classification variable for RANDOM sire */
  model y = hy age sex / dist = mult link = cumprobit solution;
  random sire / solution;
  parms (0.05263158) / noiter;
  estimate 'fem marginal mean cat. 1' intercept 1 0 hy 0.5 age 0.5 sex 1 / ilink;
  estimate 'fem marginal mean cat. 2' intercept 0 1 hy 0.5 age 0.5 sex 1 / ilink;
  estimate 'male marginal mean cat. 1' intercept 1 0 hy 0.5 age 0.5 sex 0 / ilink;
  estimate 'male marginal mean cat. 2' intercept 0 1 hy 0.5 age 0.5 sex 0 / ilink;
run;

61

$\sigma^2_u$ = 1/19 (as chosen by GF83)

$$\widehat{\Pr}(Y \le c) = \Phi\!\left(\tau_c - \mathbf{k}'\boldsymbol\beta\right)$$

Females: $\mathbf{k}'$ puts 0.5 on hy, 0.5 on age, 1 on sex; Males: 0.5 on hy, 0.5 on age, 0 on sex.

Page 62:

Solutions for Fixed Effects

Effect y Estimate Standard Error

DF t Value Pr > |t|

Intercept 1 0.3755 0.5580 3 0.67 0.5492

Intercept 2 1.0115 0.5789 3 1.75 0.1789

hy -0.2975 0.4950 20 -0.60 0.5546

age 0.1269 0.4987 20 0.25 0.8017

sex 0.3906 0.4967 20 0.79 0.4409

Estimates

Label Estimate Standard Error DF t Value Pr > |t| Mean StandardErrorMean

female marginal mean cat. 1

0.6808 0.3829 20 1.78 0.0906 0.7520 0.1212

female marginal mean cat 2

1.3168 0.4249 20 3.10 0.0057 0.9060 0.07123

male marginal mean cat 1

0.2902 0.3607 20 0.80 0.4305 0.6142 0.1380

male marginal mean cat 2

0.9262 0.3902 20 2.37 0.0277 0.8228 0.1014

REPRODUCED IN GIANOLA AND FOULLEY (1983)

Page 63:

Reproducing analyses in GF83(alternative using less than full rank classification model)

ods select parameterestimates estimates;
proc glimmix data=gf83;
  class sire herdyear dam_age calfsex;
  model y = herdyear dam_age calfsex / dist = mult link = cumprobit solution;
  random sire / solution;
  parms (0.05263158) / noiter;
  estimate 'female marginal mean cat. 1' intercept 1 0 herdyear 0.5 0.5 dam_age 0.5 0.5 calfsex 1 / ilink;
  estimate 'female marginal mean cat. 2' intercept 0 1 herdyear 0.5 0.5 dam_age 0.5 0.5 calfsex 1 / ilink;
  estimate 'male marginal mean cat. 1' intercept 1 0 herdyear 0.5 0.5 dam_age 0.5 0.5 calfsex 0 1 / ilink;
  estimate 'male marginal mean cat. 2' intercept 0 1 herdyear 0.5 0.5 dam_age 0.5 0.5 calfsex 0 1 / ilink;
run;

63

$$\widehat{\Pr}(Y \le c) = \Phi\!\left(\tau_c - \mathbf{k}'\boldsymbol\beta\right)$$

Females: $\mathbf{k}' = [0.5 \;\; 0.5 \;\; 0.5 \;\; 0.5 \;\; 1 \;\; 0]$; Males: $\mathbf{k}' = [0.5 \;\; 0.5 \;\; 0.5 \;\; 0.5 \;\; 0 \;\; 1]$

Page 64:

Solutions for Fixed Effects

Effect y calfsex herdyear dam_age Estimate Standard Error

DF t Value Pr > |t|

Intercept 1 0.2050 0.4734 3 0.43 0.6943

Intercept 2 0.8409 0.4946 3 1.70 0.1876

herdyear 1 0.2975 0.4950 20 0.60 0.5546

herdyear 2 0 . . . .

dam_age 2 -0.1269 0.4987 20 -0.25 0.8017

dam_age 3 0 . . . .

calfsex F 0.3906 0.4967 20 0.79 0.4409

calfsex M 0 . . . .

64

Estimates

Label Estimate Standard Error DF t Value Pr > |t| Mean StandardErrorMean

female marginal mean category 1

0.6808 0.3829 20 1.78 0.0906 0.7520 0.1212

female marginal mean category 2

1.3168 0.4249 20 3.10 0.0057 0.9060 0.07123

male marginal mean category 1

0.2902 0.3607 20 0.80 0.4305 0.6142 0.1380

male marginal mean category 2

0.9262 0.3902 20 2.37 0.0277 0.8228 0.1014

Page 65:

Conditional versus marginal (“population-averaged”) probabilities:

• Conditional (on u):

• Marginal (on u):

Marginal probably matters just as much….also…there is no corresponding closed form for (cumulative) logistic mixed models.

65

$$\text{Conditional: } \Pr(Y \le c \mid \mathbf{u}) = \Phi\!\left(\tau_c - \mathbf{k}'\boldsymbol\beta\right)$$

$$\text{Marginal: } \Pr\nolimits_{marginal}(Y \le c) = E_{\mathbf{u}}\!\left[\Pr(Y \le c \mid \mathbf{u})\right] = \Phi\!\left(\frac{\tau_c - \mathbf{k}'\boldsymbol\beta}{\sqrt{1 + \sigma^2_u}}\right)$$
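As an illustration with the numbers from this example ($\sigma^2_u$ = 1/19 and the female category-1 estimate of 0.6808 on the probit scale from the earlier output):

$$\Pr(Y \le 1 \mid \mathbf{u} = \mathbf{0}) = \Phi(0.6808) \approx 0.752, \qquad \Pr\nolimits_{marginal}(Y \le 1) = \Phi\!\left(\frac{0.6808}{\sqrt{1 + 1/19}}\right) = \Phi(0.664) \approx 0.747$$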

Page 66:

Accounting for unknown $\sigma^2_u$?

ods html select covparms;
title "Default RSPL";
proc glimmix data=gf83;
  class sire;
  model y = hy age sex / dist = mult link = cumprobit;
  random intercept / subject=sire;
run;

title "Quadrature";
proc glimmix data=gf83 method=quad;
  class sire;
  model y = hy age sex / dist = mult link = cumprobit;
  random intercept / subject=sire;
run;

title "Laplace";
proc glimmix data=gf83 method=laplace;
  class sire;
  model y = hy age sex / dist = mult link = cumprobit;
  random intercept / subject=sire;
run;

66

Some alternative methods available in SAS PROC GLIMMIX.

Page 67:

Which one should I pick?

67

Covariance Parameter Estimates (RSPL)
Cov Parm    Subject   Estimate   Standard Error
Intercept   sire      0.2700     0.4837

Covariance Parameter Estimates (QUAD)
Cov Parm    Subject   Estimate   Standard Error
Intercept   sire      0.02568    0.2947

Covariance Parameter Estimates (LAPLACE)
Cov Parm    Subject   Estimate   Standard Error
Intercept   sire      0.02488    0.2898

An "ML" vs. "REML" thing?

Page 68:

Yet another option:“Residual” Laplace

(Tempelman and Gianola, 1993)

68

Rather than using a point estimate, one might also weight inferences on $\boldsymbol\beta$ and $\mathbf{u}$ based on $p(\sigma^2_u \mid \mathbf{y})$:

[Figure: $\log p(\sigma^2_u \mid \mathbf{y})$ plotted against $\sigma^2_u$, with mode $\hat\sigma^2_u = 0.40$.]

$$p(\boldsymbol\beta, \mathbf{u} \mid \mathbf{y}) = E_{\sigma^2_u \mid \mathbf{y}}\!\left[p(\boldsymbol\beta, \mathbf{u} \mid \sigma^2_u, \mathbf{y})\right]$$

Page 69:

Summary of GLMM as conventionally done today.

• Some issues
– 1. Approximate

$$\boldsymbol\beta, \mathbf{u} \mid \mathbf{y}, \sigma^2_u \;\overset{approx}{\sim}\; N(\cdot, \cdot)$$

• Is that wise?
– 2. MML point estimates of $\sigma^2_u$ are often badly biased.
• Upwards or downwards??? Unpredictable.
– 3. Uncertainty in MML estimates not accounted for.
– 4. Marginal versus conditional inference on treatment probabilities? (applies to other distributions, e.g. Poisson)
– Implications?
• We'll see later with a comparison between empirical Bayes and fully Bayes (using MCMC).
• Obvious dependency on n, q, etc.

69