Applied Bayesian Inference, KSU, April 29, 2012
§❸ Empirical Bayes
Robert J. Tempelman
Origins of hierarchical modeling

• A homework question from Mood (1950, p. 164, exercise 23), recounted by Searle et al. (1992): "Suppose intelligence quotients for students in a particular age group are normally distributed about a mean of 100 with standard deviation 15. The IQ, say $Y_1$, of a particular student is to be estimated by a test on which he scores 130. It is further given that test scores are normally distributed about the true IQ as a mean with standard deviation 5. What is the maximum likelihood estimate of the student's IQ? (The answer is not 130.)"
Answer provided by one student (C. R. Henderson)

• The model: $y_i = \mu + a_i + e_i$, where the true IQ $\mu + a_i \sim N(100, 15^2)$ and the observed score $y_i \mid a_i \sim N(\mu + a_i, 5^2)$.

$$E(\mu + a_i \mid y_i) = \mu + \frac{\mathrm{cov}(a_i, y_i)}{\mathrm{var}(y_i)}(y_i - \mu) = 100 + \frac{15^2}{15^2 + 5^2}(130 - 100) = 127$$

The estimate is "shrunk" from 130 toward the prior mean of 100. This is not really ML, but it does maximize the posterior density of $(\mu + a_i) \mid y_i$.
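The slide's arithmetic can be checked with a short script (a minimal sketch; the variable names are mine, not from the slides):

```python
from fractions import Fraction

# Prior: true IQ ~ N(100, 15^2); test score | true IQ ~ N(true IQ, 5^2)
prior_mean, prior_var = 100, Fraction(15 ** 2)
obs, obs_var = 130, Fraction(5 ** 2)

# Posterior mean shrinks the observation toward the prior mean
shrink = prior_var / (prior_var + obs_var)           # 225/250 = 0.9
post_mean = prior_mean + shrink * (obs - prior_mean)

print(float(shrink), float(post_mean))  # 0.9 127.0
```

The shrinkage factor 0.9 reflects how much more diffuse the prior (variance 225) is than the measurement error (variance 25).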
• Later editions of Mood's textbook (1963, 1974) were revised: "What is the maximum likelihood estimate?" was replaced by "What is the Bayes estimator?"

• This homework problem was the inspiration for C. R. Henderson's work on best linear unbiased prediction (BLUP), which was subsequently also referred to as empirical Bayes prediction for linear models.
What is empirical Bayes?

An excellent primer: Casella, G. (1985). An introduction to empirical Bayes analysis. The American Statistician 39(2): 83-87.
Casella's problem specified hierarchically

• Suppose we observe $t$ normal sample means $\bar{X}_i$, each based on $n$ observations, with $\bar{X}_i \sim N(\theta_i, \sigma^2/n)$, $i = 1, 2, \ldots, t$, each a random draw from a normal distribution with its own mean $\theta_i$.
• Suppose it is known (or believed) that $\theta_i \sim N(\mu, \tau^2)$, $i = 1, 2, \ldots, t$, i.e., a "random effects model".

$\mu, \tau^2$: hyperparameters
• "ML" solution: $\hat{\theta}_i = \bar{X}_i,\; i = 1, 2, \ldots, t$.
• Bayes estimator of $\theta_i$:

$$\hat{\theta}_i^{Bayes} = E(\theta_i \mid \bar{X}_i, \mu, \sigma^2, \tau^2) = \frac{\sigma^2}{\sigma^2 + n\tau^2}\,\mu + \frac{n\tau^2}{\sigma^2 + n\tau^2}\,\bar{X}_i$$

As $\tau^2 \to 0$, $\hat{\theta}_i \to \mu$; as $n\tau^2 \to \infty$, $\hat{\theta}_i \to \bar{X}_i$.
What is empirical Bayes?

• Empirical Bayes = Bayes with the hyperparameters $\mu, \sigma^2, \tau^2$ replaced by estimates:

$$\hat{\theta}_i^{EBayes} = \frac{\hat{\sigma}^2}{\hat{\sigma}^2 + n\hat{\tau}^2}\,\hat{\mu} + \frac{n\hat{\tau}^2}{\hat{\sigma}^2 + n\hat{\tau}^2}\,\bar{X}_i$$

• Does it work?
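A simulation sketch of this estimator, using one simple method-of-moments choice for the hyperparameter estimates (the data, seed, and the decision to treat $\sigma^2$ as known are my own assumptions, not from the slides):

```python
import random
import statistics

random.seed(1)
t, n = 50, 5
mu_true, tau, sigma = 0.0, 1.0, 2.0

# Simulate Casella's hierarchy: theta_i ~ N(mu, tau^2), X_ij ~ N(theta_i, sigma^2)
theta = [random.gauss(mu_true, tau) for _ in range(t)]
xbar = [statistics.mean(random.gauss(th, sigma) for _ in range(n)) for th in theta]

# Method-of-moments hyperparameter estimates (one simple choice, not the only one):
mu_hat = statistics.mean(xbar)
# var(xbar_i) = tau^2 + sigma^2/n; sigma^2 is taken as known here for simplicity
tau2_hat = max(statistics.variance(xbar) - sigma ** 2 / n, 0.0)

# Empirical Bayes: shrink each sample mean toward the estimated grand mean
w = n * tau2_hat / (sigma ** 2 + n * tau2_hat)
eb = [mu_hat + w * (x - mu_hat) for x in xbar]

mse_ml = statistics.mean((x - th) ** 2 for x, th in zip(xbar, theta))
mse_eb = statistics.mean((e - th) ** 2 for e, th in zip(eb, theta))
print(mse_ml, mse_eb)  # EB typically has smaller total estimation error
```

With many classes and few observations per class, the EB estimates usually have smaller summed squared error than the raw means, which is exactly the Stein effect discussed next.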
From Casella (1985)

Observed data based on first 45 at-bats for 7 NY Yankees in 1971, compared against each player's "known" (full-season) batting average. MONEYBALL!
From Casella (1985)

"Stein effect": estimates can be improved by using information from all coordinates when estimating each coordinate (Stein, 1981). Stein ≡ shrinkage-based estimators.

[Figure: batting averages (roughly 0.200 to 0.300) for each player $i$, plotting the ML estimates $\bar{X}_i$ against the EB estimates $\hat{\theta}_i^{EBayes} = \frac{\hat{\sigma}^2}{\hat{\sigma}^2 + n\hat{\tau}^2}\,\hat{\mu} + \frac{n\hat{\tau}^2}{\hat{\sigma}^2 + n\hat{\tau}^2}\,\bar{X}_i$; the EB estimates are shrunk toward the overall mean.]
When might Bayes/Stein-type estimation be particularly useful?

• When the number of classes ($t$) is large
• When the number of observations ($n$) per class is small
• When the ratio of $\tau^2$ to $\sigma^2$ is small

"Shrinkage is a good thing." Allison et al. (2006)
Microarray experiments

• A wonderful application of the power of empirical Bayes methods.
• Microarray analysis in a nutshell: conducting t-tests on differential gene expression between two (or more) groups for thousands of different genes,

$$t_g = \frac{(\text{estimated difference})_g}{\sqrt{2\hat{\sigma}_g^2/n}};\quad g = 1, 2, \ldots, G.$$

• Multiple comparison issues are obvious (this inspired research on FDR control).
Can we do better than t-tests?

• "By sharing the variance estimate across multiple genes, [we] can form a better estimate for the true residual variance of a given gene, and effectively boost the residual degrees of freedom." Wright and Simon (2003)
Hierarchical model formulation

• Data stage:

$$Y_{gij} = \mu_{gi} + e_{gij};\quad g = 1, \ldots, G;\; i = 1, \ldots, T;\; j = 1, \ldots, n;\qquad e_{gij} \sim NID(0, \sigma_g^2)$$

• Second stage: the gene-specific variances are drawn from a common distribution,

$$\frac{1}{\sigma_g^2} \sim Gamma(a, b), \qquad p(\sigma_g^2) \propto (\sigma_g^2)^{-(a+1)} \exp\left(-\frac{1}{b\,\sigma_g^2}\right), \qquad E(\sigma_g^2) = \frac{1}{b(a-1)}$$

$a, b$: hyperparameters
Empirical Bayes (EB) estimation

• The Bayes estimate of $\sigma_g^2$ (its posterior mean, with $\nu = T(n-1)$ residual degrees of freedom):

$$\tilde{\sigma}_g^2 = \frac{\nu\hat{\sigma}_g^2 + 2/b}{\nu + 2a - 2}$$

• Empirical Bayes = Bayes with estimates of $a$ and $b$:
– Marginal ML estimation of $a$ and $b$ advocated by Wright and Simon (2003)
– Method of moments might be good if $G$ is large.

• Modify the t-test statistic accordingly,

$$t_g = \frac{(\text{estimated difference})_g}{\sqrt{2\tilde{\sigma}_g^2/n}},$$

including the posterior degrees of freedom, $\nu + 2a$.
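A numeric sketch of this kind of variance shrinkage, derived as the posterior mean under a Gamma prior on the precision (the function name and example numbers are mine; the exact constants used by Wright and Simon (2003) should be checked against the paper):

```python
def shrunken_variance(s2, nu, a, b):
    """Posterior-mean variance under a Gamma(a, scale=b) prior on 1/sigma^2.

    If SS = nu*s2 is the residual sum of squares, the posterior of the
    precision is Gamma(a + nu/2, ...), which gives
    E(sigma^2 | data) = (nu*s2 + 2/b) / (nu + 2*a - 2).
    """
    return (nu * s2 + 2.0 / b) / (nu + 2.0 * a - 2.0)

# A gene with s2 = 4 and only 3 residual df is pulled toward the prior
# mean 1/(b*(a-1)) = 2; with huge df the data estimate dominates.
print(shrunken_variance(s2=4.0, nu=3, a=2.0, b=0.5))      # 3.2
print(shrunken_variance(s2=4.0, nu=10 ** 6, a=2.0, b=0.5))
```

The first call averages the gene-specific estimate with the prior mean; the second shows that shrinkage vanishes as residual degrees of freedom grow, matching the "less need for shrinkage with larger n" point below.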
Observed Type I error rates (from Wright and Simon, 2003)

Pooled: as if the residual variance were the same for all genes.
Power for P < 0.001 and n = 5 (Wright and Simon, 2003)

Power for P < 0.001 and n = 10 (Wright and Simon, 2003): less of a need for shrinkage with larger n.
"Shrinkage is a good thing" (David Allison, UAB) (microarray data from MSU)

[Volcano plots: $-\log_{10}$(p-value) vs. estimated treatment effect on the $\log_2$ scale, for a simple design (no subsampling). Left panel: regular linear model, using the regular contrast t-test $t_g = (\text{estimated difference})_g / \sqrt{c\,\hat{\sigma}_g^2}$. Right panel: shrinkage (on variance) based contrast t-test, with $\tilde{\sigma}_g^2$ in place of $\hat{\sigma}_g^2$. The shrinkage analysis yields larger $-\log_{10}$(p-values) for the same effect sizes.]
Bayesian inference in the linear mixed model (Lindley and Smith, 1972; Sorensen and Gianola, 2002)

• First stage: $\mathbf{y} \mid \boldsymbol\beta, \mathbf{u} \sim N(\mathbf{X}\boldsymbol\beta + \mathbf{Z}\mathbf{u}, \mathbf{R})$ (let $\mathbf{R} = \mathbf{I}\sigma_e^2$):

$$p(\mathbf{y} \mid \boldsymbol\beta, \mathbf{u}) \propto |\mathbf{R}|^{-1/2} \exp\left(-\tfrac{1}{2}(\mathbf{y}-\mathbf{X}\boldsymbol\beta-\mathbf{Z}\mathbf{u})'\mathbf{R}^{-1}(\mathbf{y}-\mathbf{X}\boldsymbol\beta-\mathbf{Z}\mathbf{u})\right)$$

• Second stage priors:
– Subjective: $p(\boldsymbol\beta) \propto \exp\left(-\tfrac{1}{2}(\boldsymbol\beta-\boldsymbol\beta_o)'\mathbf{V}_\beta^{-1}(\boldsymbol\beta-\boldsymbol\beta_o)\right)$
– Structural: $p(\mathbf{u}) \propto \exp\left(-\tfrac{1}{2}\mathbf{u}'\mathbf{G}^{-1}\mathbf{u}\right)$ (let $\mathbf{G} = \mathbf{A}\sigma_u^2$; $\mathbf{A}$ is known)

• Third stage priors: $p(\sigma_e^2)$, $p(\sigma_u^2)$
Starting point for Bayesian inference

• Write the joint posterior density:

$$p(\boldsymbol\beta, \mathbf{u}, \sigma_e^2, \sigma_u^2 \mid \mathbf{y}) \propto (\sigma_e^2)^{-n/2}\exp\left(-\frac{(\mathbf{y}-\mathbf{X}\boldsymbol\beta-\mathbf{Z}\mathbf{u})'(\mathbf{y}-\mathbf{X}\boldsymbol\beta-\mathbf{Z}\mathbf{u})}{2\sigma_e^2}\right) \exp\left(-\tfrac{1}{2}(\boldsymbol\beta-\boldsymbol\beta_o)'\mathbf{V}_\beta^{-1}(\boldsymbol\beta-\boldsymbol\beta_o)\right) (\sigma_u^2)^{-q/2}\exp\left(-\frac{\mathbf{u}'\mathbf{A}^{-1}\mathbf{u}}{2\sigma_u^2}\right) p(\sigma_e^2)\,p(\sigma_u^2)$$

• Want to make fully Bayesian probability statements on $\boldsymbol\beta$ and $\mathbf{u}$? Integrate out uncertainty on all other unknowns:

$$p(\boldsymbol\beta, \mathbf{u} \mid \mathbf{y}) = \int\!\!\int p(\boldsymbol\beta, \mathbf{u}, \sigma_e^2, \sigma_u^2 \mid \mathbf{y})\, d\sigma_e^2\, d\sigma_u^2$$
Let's suppose (for now) that variance components are known

• i.e., no third stage prior is necessary: conditional inference.
• Rewrite, with $\boldsymbol\theta = [\boldsymbol\beta' \; \mathbf{u}']'$ and $\mathbf{W} = [\mathbf{X} \; \mathbf{Z}]$:
– Likelihood: $p(\mathbf{y} \mid \mathbf{R}, \boldsymbol\theta) \propto \exp\left(-\tfrac{1}{2}(\mathbf{y}-\mathbf{W}\boldsymbol\theta)'\mathbf{R}^{-1}(\mathbf{y}-\mathbf{W}\boldsymbol\theta)\right)$
– Prior: $\boldsymbol\theta \mid \boldsymbol\theta_o, \boldsymbol\Sigma \sim N(\boldsymbol\theta_o, \boldsymbol\Sigma)$, where $\boldsymbol\theta_o = \begin{bmatrix}\boldsymbol\beta_o \\ \mathbf{0}\end{bmatrix}$ and $\boldsymbol\Sigma = \begin{bmatrix}\mathbf{V}_\beta & \mathbf{0} \\ \mathbf{0} & \mathbf{G}\end{bmatrix}$
Bayesian inference with known VC

$$p(\boldsymbol\theta \mid \boldsymbol\theta_o, \boldsymbol\Sigma, \mathbf{R}, \mathbf{y}) \propto \exp\left(-\tfrac{1}{2}\left[(\mathbf{y}-\mathbf{W}\boldsymbol\theta)'\mathbf{R}^{-1}(\mathbf{y}-\mathbf{W}\boldsymbol\theta) + (\boldsymbol\theta-\boldsymbol\theta_o)'\boldsymbol\Sigma^{-1}(\boldsymbol\theta-\boldsymbol\theta_o)\right]\right) \propto \exp\left(-\tfrac{1}{2}(\boldsymbol\theta-\hat{\boldsymbol\theta})'(\mathbf{W}'\mathbf{R}^{-1}\mathbf{W} + \boldsymbol\Sigma^{-1})(\boldsymbol\theta-\hat{\boldsymbol\theta})\right)$$

where

$$\hat{\boldsymbol\theta} = \begin{bmatrix}\hat{\boldsymbol\beta} \\ \hat{\mathbf{u}}\end{bmatrix} = (\mathbf{W}'\mathbf{R}^{-1}\mathbf{W} + \boldsymbol\Sigma^{-1})^{-1}(\mathbf{W}'\mathbf{R}^{-1}\mathbf{y} + \boldsymbol\Sigma^{-1}\boldsymbol\theta_o)$$

In other words:

$$\boldsymbol\beta, \mathbf{u} \mid \mathbf{R}, \mathbf{G}, \mathbf{y} \sim N\left(\begin{bmatrix}\hat{\boldsymbol\beta} \\ \hat{\mathbf{u}}\end{bmatrix}, \begin{bmatrix}\mathbf{X}'\mathbf{R}^{-1}\mathbf{X} + \mathbf{V}_\beta^{-1} & \mathbf{X}'\mathbf{R}^{-1}\mathbf{Z} \\ \mathbf{Z}'\mathbf{R}^{-1}\mathbf{X} & \mathbf{Z}'\mathbf{R}^{-1}\mathbf{Z} + \mathbf{G}^{-1}\end{bmatrix}^{-1}\right)$$
Flat prior on $\boldsymbol\beta$

$$p(\boldsymbol\beta) \propto \exp\left(-\tfrac{1}{2}(\boldsymbol\beta-\boldsymbol\beta_o)'\,\mathbf{0}\,(\boldsymbol\beta-\boldsymbol\beta_o)\right) \propto 1$$

Note that as the diagonal elements of $\mathbf{V}_\beta \to \infty$, the diagonal elements of $\mathbf{V}_\beta^{-1} \to \mathbf{0}$. Hence

$$\boldsymbol\beta, \mathbf{u} \mid \mathbf{y}, \mathbf{R}, \boldsymbol\Sigma, \boldsymbol\theta_o \sim N\left(\begin{bmatrix}\hat{\boldsymbol\beta} \\ \hat{\mathbf{u}}\end{bmatrix}, \begin{bmatrix}\mathbf{X}'\mathbf{R}^{-1}\mathbf{X} & \mathbf{X}'\mathbf{R}^{-1}\mathbf{Z} \\ \mathbf{Z}'\mathbf{R}^{-1}\mathbf{X} & \mathbf{Z}'\mathbf{R}^{-1}\mathbf{Z} + \mathbf{G}^{-1}\end{bmatrix}^{-1}\right)$$

where

$$\begin{bmatrix}\mathbf{X}'\mathbf{R}^{-1}\mathbf{X} & \mathbf{X}'\mathbf{R}^{-1}\mathbf{Z} \\ \mathbf{Z}'\mathbf{R}^{-1}\mathbf{X} & \mathbf{Z}'\mathbf{R}^{-1}\mathbf{Z} + \mathbf{G}^{-1}\end{bmatrix}\begin{bmatrix}\hat{\boldsymbol\beta} \\ \hat{\mathbf{u}}\end{bmatrix} = \begin{bmatrix}\mathbf{X}'\mathbf{R}^{-1}\mathbf{y} \\ \mathbf{Z}'\mathbf{R}^{-1}\mathbf{y}\end{bmatrix}$$

Henderson's mixed model equations (MME) (Robinson, 1991).
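Henderson's mixed model equations can be checked on a toy data set (the numbers, balanced two-litter layout, and variance components below are invented for illustration):

```python
# Toy check of Henderson's mixed model equations.
# Model: y_ij = mu + u_i + e_ij, u_i ~ N(0, sigma2_u), e_ij ~ N(0, sigma2_e).
y = [3.0, 5.0, 7.0, 9.0]          # two observations in each of two "litters"
X = [[1.0], [1.0], [1.0], [1.0]]  # fixed part: overall mean only
Z = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
sigma2_e, sigma2_u = 4.0, 2.0
lam = sigma2_e / sigma2_u         # variance ratio added to the Z'Z block

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

W = [xr + zr for xr, zr in zip(X, Z)]   # W = [X Z]
C = matmul(transpose(W), W)             # [X'X X'Z; Z'X Z'Z] (R = I sigma2_e cancels)
for i in range(1, 3):
    C[i][i] += lam                      # Z'Z + lambda*I for the u block
rhs = [sum(w[i] * yi for w, yi in zip(W, y)) for i in range(3)]

def solve(A, b):                        # naive Gauss-Jordan with partial pivoting
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(M[r][i]))
        M[i], M[p] = M[p], M[i]
        for r in range(n):
            if r != i:
                f = M[r][i] / M[i][i]
                M[r] = [a - f * c for a, c in zip(M[r], M[i])]
    return [M[i][n] / M[i][i] for i in range(n)]

mu_hat, u1_hat, u2_hat = solve(C, rhs)
print(mu_hat, u1_hat, u2_hat)  # 6.0 -1.0 1.0
```

The raw litter means (4 and 8) deviate from the overall mean by ∓2; the BLUPs ∓1 are shrunk by the factor $n/(n+\lambda) = 2/4$, the same shrinkage seen in the IQ example.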
Inference

• Write:

$$\begin{bmatrix}\mathbf{C}_{\beta\beta} & \mathbf{C}_{\beta u} \\ \mathbf{C}_{u\beta} & \mathbf{C}_{uu}\end{bmatrix} = \begin{bmatrix}\mathbf{X}'\mathbf{R}^{-1}\mathbf{X} & \mathbf{X}'\mathbf{R}^{-1}\mathbf{Z} \\ \mathbf{Z}'\mathbf{R}^{-1}\mathbf{X} & \mathbf{Z}'\mathbf{R}^{-1}\mathbf{Z} + \mathbf{G}^{-1}\end{bmatrix}^{-1}$$

• Posterior density of $\mathbf{K}'\boldsymbol\beta + \mathbf{M}'\mathbf{u}$:

$$\mathbf{K}'\boldsymbol\beta + \mathbf{M}'\mathbf{u} \mid \mathbf{y}, \mathbf{R}, \boldsymbol\Sigma, \boldsymbol\theta_o \sim N\left(\mathbf{K}'\hat{\boldsymbol\beta} + \mathbf{M}'\hat{\mathbf{u}},\; [\mathbf{K}' \; \mathbf{M}']\begin{bmatrix}\mathbf{C}_{\beta\beta} & \mathbf{C}_{\beta u} \\ \mathbf{C}_{u\beta} & \mathbf{C}_{uu}\end{bmatrix}\begin{bmatrix}\mathbf{K} \\ \mathbf{M}\end{bmatrix}\right)$$

With $\mathbf{M} = \mathbf{0}$:

$$\mathbf{K}'\boldsymbol\beta \mid \mathbf{y}, \mathbf{R}, \boldsymbol\Sigma, \boldsymbol\theta_o \sim N(\mathbf{K}'\hat{\boldsymbol\beta},\; \mathbf{K}'\mathbf{C}_{\beta\beta}\mathbf{K})$$
RCBD example with random blocks

• Weight gains of pigs in a feeding trial (Gill, 1978). Block on litters.

Litter  Diet 1  Diet 2  Diet 3  Diet 4  Diet 5
1       79.5    80.9    79.1    88.6    95.9
2       70.9    81.8    70.9    88.6    85.9
3       76.8    86.4    90.5    89.1    83.2
4       75.9    75.5    62.7    91.4    87.7
5       77.3    77.3    69.5    75.0    74.5
6       66.4    73.2    86.4    79.5    72.7
7       59.1    77.7    72.7    85.0    90.9
8       64.1    72.3    73.6    75.9    60.0
9       74.5    81.4    64.5    75.5    83.6
10      67.3    82.3    65.9    70.5    63.2
data rcbd;
  input litter diet1-diet5;
  datalines;
1 79.5 80.9 79.1 88.6 95.9
2 70.9 81.8 70.9 88.6 85.9
3 76.8 86.4 90.5 89.1 83.2
4 75.9 75.5 62.7 91.4 87.7
5 77.3 77.3 69.5 75.0 74.5
6 66.4 73.2 86.4 79.5 72.7
7 59.1 77.7 72.7 85.0 90.9
8 64.1 72.3 73.6 75.9 60.0
9 74.5 81.4 64.5 75.5 83.6
10 67.3 82.3 65.9 70.5 63.2
;

data rcbd_2 (drop=diet1-diet5);
  set rcbd;
  diet = 1; gain=diet1; output;
  diet = 2; gain=diet2; output;
  diet = 3; gain=diet3; output;
  diet = 4; gain=diet4; output;
  diet = 5; gain=diet5; output;
run;
RCBD model

• Linear model: $Y_{ij} = \mu + u_i + \tau_j + e_{ij}$
– Fixed diet effects $\tau_j$
– Random litter effects $u_i$
• Priors on random effects: $u_i \sim NIID(0, \sigma_u^2)$; $e_{ij} \sim NIID(0, \sigma_e^2)$
Posterior inference on $\boldsymbol\beta$ and $\mathbf{u}$ conditional on known VC ($\sigma_e^2 = 50$, $\sigma_u^2 = 20$)

title 'Posterior inference conditional on known VC';
proc mixed data=rcbd_2;
  class litter diet;
  model gain = diet / covb solution;
  random litter;
  parms (20) (50) / hold = 1,2;
  lsmeans diet / diff;
  estimate 'diet 1 lsmean' intercept 1 diet 1 0 0 0 0 / df=10000;
  estimate 'diet 2 lsmean' intercept 1 diet 0 1 0 0 0 / df=10000;
  estimate 'diet1 vs diet2 dif' intercept 0 diet 1 -1 0 0 0 / df=10000;
run;

Each estimate statement returns $\mathbf{k}'\hat{\boldsymbol\beta}$ with variance $\mathbf{k}'\mathbf{C}_{\beta\beta}\mathbf{k}$. "Known" variance, so tests are based on the normal distribution (arbitrarily large df) rather than Student's t.
Portions of output

Solution for Fixed Effects
Effect     diet  Estimate
Intercept        79.76
diet       1     -8.58
diet       2     -0.88
diet       3     -6.18
diet       4      2.15
diet       5      0

Covariance Matrix for Fixed Effects ($\mathbf{C}_{\beta\beta}$; row/column 6, for reference level diet 5, is zero):
Row  Effect     Col1  Col2  Col3  Col4  Col5  Col6
1    Intercept    7.   -5.   -5.   -5.   -5.
2    diet 1      -5.   10.    5.    5.    5.
3    diet 2      -5.    5.   10.    5.    5.
4    diet 3      -5.    5.    5.   10.    5.
5    diet 4      -5.    5.    5.    5.   10.
6    diet 5

$$\boldsymbol\beta \mid \mathbf{y}, \mathbf{R}, \boldsymbol\Sigma, \boldsymbol\theta_o \sim N(\hat{\boldsymbol\beta}, \mathbf{C}_{\beta\beta})$$
Posterior densities of marginal means and contrasts

Label               Estimate  Standard Error  DF   t Value  Pr > |t|
diet 1 lsmean        71.1800  2.6458          1E4  26.90    <.0001
diet 2 lsmean        78.8800  2.6458          1E4  29.81    <.0001
diet1 vs diet2 dif   -7.7000  3.1623          1E4  -2.43    0.0149

Each row reports $\mathbf{k}'\hat{\boldsymbol\beta}$ with standard error $\sqrt{\mathbf{k}'\mathbf{C}_{\beta\beta}\mathbf{k}}$, i.e.

$$\mathbf{k}'\boldsymbol\beta \mid \mathbf{y}, \mathbf{R}, \boldsymbol\Sigma, \boldsymbol\theta_o \sim N(\mathbf{k}'\hat{\boldsymbol\beta},\; \mathbf{k}'\mathbf{C}_{\beta\beta}\mathbf{k})$$
Two stage generalized linear models

• Consider again the probit model for binary data.
– Likelihood function:

$$p(\mathbf{y} \mid \boldsymbol\beta, \mathbf{u}) = \prod_{i=1}^{n} \Phi(\mathbf{x}_i'\boldsymbol\beta + \mathbf{z}_i'\mathbf{u})^{y_i}\left[1 - \Phi(\mathbf{x}_i'\boldsymbol\beta + \mathbf{z}_i'\mathbf{u})\right]^{1-y_i}$$

– Priors:

$$p(\mathbf{u} \mid \sigma_u^2) \propto (\sigma_u^2)^{-q/2}\exp\left(-\frac{1}{2\sigma_u^2}\mathbf{u}'\mathbf{A}^{-1}\mathbf{u}\right), \qquad p(\boldsymbol\beta) \propto \text{constant}$$

– Third stage prior (if $\sigma_u^2$ were not known): $p(\sigma_u^2)$
Joint Posterior Density

• For all parameters (3 stage model: stage 1 × stage 2 × stage 3):

$$p(\boldsymbol\beta, \mathbf{u}, \sigma_u^2 \mid \mathbf{y}) \propto p(\mathbf{y} \mid \boldsymbol\beta, \mathbf{u})\, p(\mathbf{u} \mid \sigma_u^2)\, p(\boldsymbol\beta)\, p(\sigma_u^2) \propto \prod_{i=1}^{n}\Phi(\mathbf{x}_i'\boldsymbol\beta + \mathbf{z}_i'\mathbf{u})^{y_i}\left[1-\Phi(\mathbf{x}_i'\boldsymbol\beta + \mathbf{z}_i'\mathbf{u})\right]^{1-y_i}(\sigma_u^2)^{-q/2}\exp\left(-\frac{\mathbf{u}'\mathbf{A}^{-1}\mathbf{u}}{2\sigma_u^2}\right)p(\sigma_u^2)$$

• Let's condition on $\sigma_u^2$ being known (for now) (2 stage model):

$$p(\boldsymbol\beta, \mathbf{u} \mid \mathbf{y}, \sigma_u^2) \propto \prod_{i=1}^{n}\Phi(\mathbf{x}_i'\boldsymbol\beta + \mathbf{z}_i'\mathbf{u})^{y_i}\left[1-\Phi(\mathbf{x}_i'\boldsymbol\beta + \mathbf{z}_i'\mathbf{u})\right]^{1-y_i}\exp\left(-\frac{\mathbf{u}'\mathbf{A}^{-1}\mathbf{u}}{2\sigma_u^2}\right)$$
Log Joint Posterior Density

• Log joint posterior = log likelihood + log prior: $L = L_1 + L_2$, where

$$L_1 = \sum_{i=1}^{n}\left\{y_i \log \Phi(\mathbf{x}_i'\boldsymbol\beta + \mathbf{z}_i'\mathbf{u}) + (1-y_i)\log\left[1-\Phi(\mathbf{x}_i'\boldsymbol\beta + \mathbf{z}_i'\mathbf{u})\right]\right\}$$

$$L_2 = -\frac{1}{2\sigma_u^2}\mathbf{u}'\mathbf{A}^{-1}\mathbf{u}$$
Maximize the joint posterior density w.r.t. $\boldsymbol\theta = [\boldsymbol\beta' \; \mathbf{u}']'$

• i.e., compute the joint posterior mode of $\boldsymbol\theta$; analogous to pseudo-likelihood (PL) inference in PROC GLIMMIX (also penalized likelihood).
• Fisher scoring/Newton-Raphson, writing $\eta_i = \mathbf{x}_i'\boldsymbol\beta + \mathbf{z}_i'\mathbf{u}$, $v_i = \partial \log p(y_i \mid \eta_i)/\partial\eta_i$, and $w_i = -E\left[\partial^2 \log p(y_i \mid \eta_i)/\partial\eta_i^2\right]$ (refer to §❶ for details on $\mathbf{v}$ and $\mathbf{W}$):

$$\frac{\partial L}{\partial\boldsymbol\theta} = \begin{bmatrix}\mathbf{X}'\mathbf{v} \\ \mathbf{Z}'\mathbf{v} - \mathbf{G}^{-1}\mathbf{u}\end{bmatrix}, \qquad E\left[-\frac{\partial^2 L}{\partial\boldsymbol\theta\,\partial\boldsymbol\theta'}\right] = \begin{bmatrix}\mathbf{X}'\mathbf{W}\mathbf{X} & \mathbf{X}'\mathbf{W}\mathbf{Z} \\ \mathbf{Z}'\mathbf{W}\mathbf{X} & \mathbf{Z}'\mathbf{W}\mathbf{Z} + \mathbf{G}^{-1}\end{bmatrix}$$

$$\hat{\boldsymbol\theta}^{[t+1]} = \hat{\boldsymbol\theta}^{[t]} + \left(E\left[-\frac{\partial^2 L}{\partial\boldsymbol\theta\,\partial\boldsymbol\theta'}\right]\right)^{-1}\frac{\partial L}{\partial\boldsymbol\theta}\bigg|_{\boldsymbol\theta = \hat{\boldsymbol\theta}^{[t]}}$$
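The scoring iteration can be sketched for a plain fixed-effects probit (my own toy data; the GLMM version adds the $\mathbf{Z}'\mathbf{W}\mathbf{Z} + \mathbf{G}^{-1}$ block and the $-\mathbf{G}^{-1}\mathbf{u}$ term in the score, exactly as above):

```python
import math

def Phi(x):  # standard normal cdf
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def phi(x):  # standard normal density
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

# Tiny probit fit by Fisher scoring (intercept + one covariate).
x = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0]
y = [0, 0, 0, 1, 0, 1, 1]
beta = [0.0, 0.0]

for _ in range(25):
    eta = [beta[0] + beta[1] * xi for xi in x]
    p = [min(max(Phi(e), 1e-10), 1 - 1e-10) for e in eta]
    # v and W as in the scoring equations: score = X'v, info = X'WX
    v = [(yi - pi) * phi(e) / (pi * (1 - pi)) for yi, pi, e in zip(y, p, eta)]
    w = [phi(e) ** 2 / (pi * (1 - pi)) for pi, e in zip(p, eta)]
    s0, s1 = sum(v), sum(vi * xi for vi, xi in zip(v, x))
    i00 = sum(w)
    i01 = sum(wi * xi for wi, xi in zip(w, x))
    i11 = sum(wi * xi * xi for wi, xi in zip(w, x))
    det = i00 * i11 - i01 * i01        # invert the 2x2 information matrix
    beta = [beta[0] + (i11 * s0 - i01 * s1) / det,
            beta[1] + (-i01 * s0 + i00 * s1) / det]

print(beta)  # converged probit estimates; the score is ~0 here
```

At convergence the score vector vanishes, which is the fixed-effects analogue of the stationarity condition for the joint posterior mode.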
PL (or approximate EB)

• Then each iterate solves

$$\begin{bmatrix}\mathbf{X}'\mathbf{W}\mathbf{X} & \mathbf{X}'\mathbf{W}\mathbf{Z} \\ \mathbf{Z}'\mathbf{W}\mathbf{X} & \mathbf{Z}'\mathbf{W}\mathbf{Z} + \mathbf{G}^{-1}\end{bmatrix}_{\hat{\boldsymbol\beta}^{[t]},\hat{\mathbf{u}}^{[t]}}\begin{bmatrix}\hat{\boldsymbol\beta}^{[t+1]} - \hat{\boldsymbol\beta}^{[t]} \\ \hat{\mathbf{u}}^{[t+1]} - \hat{\mathbf{u}}^{[t]}\end{bmatrix} = \begin{bmatrix}\mathbf{X}'\mathbf{v} \\ \mathbf{Z}'\mathbf{v} - \mathbf{G}^{-1}\hat{\mathbf{u}}^{[t]}\end{bmatrix}_{\hat{\boldsymbol\beta}^{[t]},\hat{\mathbf{u}}^{[t]}}$$

• At convergence,

$$\boldsymbol\beta, \mathbf{u} \mid \mathbf{y}, \sigma_u^2 \overset{approx}{\sim} N\left(\begin{bmatrix}\hat{\boldsymbol\beta} \\ \hat{\mathbf{u}}\end{bmatrix}, \begin{bmatrix}\mathbf{C}_{\beta\beta} & \mathbf{C}_{\beta u} \\ \mathbf{C}_{u\beta} & \mathbf{C}_{uu}\end{bmatrix}\right), \qquad \begin{bmatrix}\mathbf{C}_{\beta\beta} & \mathbf{C}_{\beta u} \\ \mathbf{C}_{u\beta} & \mathbf{C}_{uu}\end{bmatrix} = \begin{bmatrix}\mathbf{X}'\mathbf{W}\mathbf{X} & \mathbf{X}'\mathbf{W}\mathbf{Z} \\ \mathbf{Z}'\mathbf{W}\mathbf{X} & \mathbf{Z}'\mathbf{W}\mathbf{Z} + \mathbf{G}^{-1}\end{bmatrix}^{-1}$$

and hence

$$\mathbf{K}'\boldsymbol\beta + \mathbf{M}'\mathbf{u} \mid \mathbf{y}, \sigma_u^2 \overset{approx}{\sim} N\left(\mathbf{K}'\hat{\boldsymbol\beta} + \mathbf{M}'\hat{\mathbf{u}},\; [\mathbf{K}' \; \mathbf{M}']\begin{bmatrix}\mathbf{C}_{\beta\beta} & \mathbf{C}_{\beta u} \\ \mathbf{C}_{u\beta} & \mathbf{C}_{uu}\end{bmatrix}\begin{bmatrix}\mathbf{K} \\ \mathbf{M}\end{bmatrix}\right)$$
How software typically sets this up: the update above can be written as

$$\begin{bmatrix}\mathbf{X}'\mathbf{W}\mathbf{X} & \mathbf{X}'\mathbf{W}\mathbf{Z} \\ \mathbf{Z}'\mathbf{W}\mathbf{X} & \mathbf{Z}'\mathbf{W}\mathbf{Z} + \mathbf{G}^{-1}\end{bmatrix}\begin{bmatrix}\hat{\boldsymbol\beta}^{[t+1]} \\ \hat{\mathbf{u}}^{[t+1]}\end{bmatrix} = \begin{bmatrix}\mathbf{X}'\mathbf{W}\mathbf{y}^* \\ \mathbf{Z}'\mathbf{W}\mathbf{y}^*\end{bmatrix}$$

with $\mathbf{y}^* = \mathbf{X}\hat{\boldsymbol\beta}^{[t]} + \mathbf{Z}\hat{\mathbf{u}}^{[t]} + \mathbf{W}^{-1}\mathbf{v}$ being "pseudo-variates".
Application

• Go back to the same RCBD; suppose we binarize the data:

data binarize;
  set rcbd_2;
  y = (gain>75);
run;
RCBD example with random blocks

• Weight gains of pigs binarized at 75 (y = 1 if gain > 75, else 0). Block on litters.

Litter  Diet 1  Diet 2  Diet 3  Diet 4  Diet 5
1       1       1       1       1       1
2       0       1       0       1       1
3       1       1       1       1       1
4       1       1       0       1       1
5       1       1       0       0       0
6       0       0       1       1       0
7       0       1       0       1       1
8       0       0       0       1       0
9       0       1       0       1       1
10      0       1       0       0       0
PL inference using GLIMMIX code (known VC)

title 'Posterior inference conditional on known VC';
proc glimmix data=binarize;
  class litter diet;
  model y = diet / covb solution dist=bin link=probit;
  random litter;
  parms (0.5) / hold = 1;
  lsmeans diet / diff ilink;
  estimate 'diet 1 lsmean' intercept 1 diet 1 0 0 0 0 / df=10000 ilink;
  estimate 'diet 2 lsmean' intercept 1 diet 0 1 0 0 0 / df=10000 ilink;
  estimate 'diet1 vs diet2 dif' intercept 0 diet 1 -1 0 0 0 / df=10000;
run;

$\mathbf{k}'\hat{\boldsymbol\beta}$: estimate on the underlying normal scale; $\Phi(\mathbf{k}'\hat{\boldsymbol\beta})$: estimated probability of success.
Solutions for Fixed Effects
Effect     diet  Estimate  Standard Error
Intercept         0.3097   0.4772
diet       1     -0.5935   0.5960
diet       2      0.6761   0.6408
diet       3     -0.9019   0.6104
diet       4      0.6775   0.6410
diet       5      0        .

$$\mathbf{k}'\hat{\boldsymbol\beta} = \begin{bmatrix}1 & 1 & 0 & 0 & 0 & 0\end{bmatrix}\begin{bmatrix}0.3097 \\ -0.5935 \\ 0.6761 \\ -0.9019 \\ 0.6775 \\ 0\end{bmatrix} = -0.2838$$
Covariance Matrix for Fixed Effects ($\mathbf{C}_{\beta\beta}$; row/column 6, for reference level diet 5, is zero):
Effect     diet  Row  Col1     Col2     Col3     Col4     Col5     Col6
Intercept        1    0.2277   -0.1778  -0.1766  -0.1782  -0.1766
diet       1     2   -0.1778    0.3552   0.1760   0.1787   0.1760
diet       2     3   -0.1766    0.1760   0.4107   0.1755   0.1784
diet       3     4   -0.1782    0.1787   0.1755   0.3725   0.1755
diet       4     5   -0.1766    0.1760   0.1784   0.1755   0.4109
diet       5     6
Delta method: how well does this generally work?

Estimates
Label               Estimate  Standard Error  DF     t Value  Pr > |t|  Mean     Standard Error Mean
diet 1 lsmean       -0.2838   0.4768          10000  -0.60    0.5517    0.3883   0.1827
diet 2 lsmean        0.9858   0.5341          10000   1.85    0.0650    0.8379   0.1311
diet1 vs diet2 dif  -1.2697   0.6433          10000  -1.97    0.0484    Non-est  .

Each estimate is $\mathbf{k}'\hat{\boldsymbol\beta}$ with standard error $\sqrt{\mathbf{k}'\mathbf{C}_{\beta\beta}\mathbf{k}}$, and, with $\Phi$ the standard normal cdf and $\phi$ its density,

$$\widehat{\text{Prob}}(\text{diet 1}) = \Phi(-0.2838) = 0.3883$$

$$se\left(\Phi(\mathbf{k}'\hat{\boldsymbol\beta})\right) \approx \phi(\mathbf{k}'\hat{\boldsymbol\beta})\, se(\mathbf{k}'\hat{\boldsymbol\beta}) = \phi(-0.2838)\times 0.4768 = 0.1827$$
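The back-transformed mean and its delta-method standard error can be reproduced directly from the output above (a minimal sketch using only the standard library):

```python
import math

Phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))  # normal cdf
phi = lambda x: math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)  # density

est, se = -0.2838, 0.4768            # diet 1 lsmean and its SE from the output
prob = Phi(est)                      # back-transformed probability
se_prob = phi(est) * se              # delta method: se(Phi(x)) ~= phi(x) * se(x)
print(round(prob, 4), round(se_prob, 4))  # 0.3883 0.1827
```

This matches the Mean and Standard Error Mean columns reported by GLIMMIX's ilink option.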
What if variance components are not known?

• Given $\boldsymbol\theta_1 = (\boldsymbol\beta, \mathbf{u})$ (fixed and random effects) and $\boldsymbol\theta_2 = (\sigma_u^2, \sigma_e^2)$ (variance components):
• Two stage (known VC): $p(\boldsymbol\theta_1 \mid \boldsymbol\theta_2, \mathbf{y})$
• Three stage (unknown VC): $p(\boldsymbol\theta_1, \boldsymbol\theta_2 \mid \mathbf{y}) \propto p(\mathbf{y} \mid \boldsymbol\theta_1, \boldsymbol\theta_2)\, p(\boldsymbol\theta_1 \mid \boldsymbol\theta_2)\, p(\boldsymbol\theta_2)$
• Fully Bayesian inference on $\boldsymbol\theta_1$?

$$p(\boldsymbol\theta_1 \mid \mathbf{y}) = \int_{\boldsymbol\theta_2} p(\boldsymbol\theta_1 \mid \boldsymbol\theta_2, \mathbf{y})\, p(\boldsymbol\theta_2 \mid \mathbf{y})\, d\boldsymbol\theta_2$$

NOT POSSIBLE PRE-1990s.
Approximate empirical Bayes (EB) option

• Goal: approximate $p(\boldsymbol\theta_1 \mid \mathbf{y})$.
• Note:

$$p(\boldsymbol\theta_1 \mid \mathbf{y}) = \int_{R_{\boldsymbol\theta_2}} p(\boldsymbol\theta_1 \mid \boldsymbol\theta_2, \mathbf{y})\, p(\boldsymbol\theta_2 \mid \mathbf{y})\, d\boldsymbol\theta_2 = E_{\boldsymbol\theta_2 \mid \mathbf{y}}\left[p(\boldsymbol\theta_1 \mid \boldsymbol\theta_2, \mathbf{y})\right]$$

• i.e., $p(\boldsymbol\theta_1 \mid \mathbf{y})$ can be viewed as a "weighted" average of the conditional densities $p(\boldsymbol\theta_1 \mid \boldsymbol\theta_2, \mathbf{y})$, the weight function being $p(\boldsymbol\theta_2 \mid \mathbf{y})$.
Bayesian justification of REML

• With flat priors on $\boldsymbol\theta_2$, maximizing $p(\boldsymbol\theta_2 \mid \mathbf{y})$ with respect to $\boldsymbol\theta_2$ yields the REML estimate $\hat{\boldsymbol\theta}_2$ of $\boldsymbol\theta_2$ (Harville, 1974).
• If $p(\boldsymbol\theta_2 \mid \mathbf{y})$ is (nearly) symmetric, then $p(\boldsymbol\theta_1 \mid \mathbf{y}) \approx p(\boldsymbol\theta_1 \mid \hat{\boldsymbol\theta}_2, \mathbf{y})$ is a reasonable approximation, with perhaps one important exception: $\mathrm{var}(\boldsymbol\theta_1 \mid \mathbf{y}) \geq \mathrm{var}(\boldsymbol\theta_1 \mid \hat{\boldsymbol\theta}_2, \mathbf{y})$.
• This is what PROC MIXED (GLIMMIX) essentially does by default (REML/RSPL/RMPL)!
Back to the Linear Mixed Model

• Suppose a three stage density with flat priors on $\boldsymbol\beta$, $\sigma_u^2$, and $\sigma_e^2$.
• Then

$$p(\sigma_u^2, \sigma_e^2 \mid \mathbf{y}) = \int_{R_{\boldsymbol\beta}}\int_{R_{\mathbf{u}}} p(\boldsymbol\beta, \mathbf{u}, \sigma_u^2, \sigma_e^2 \mid \mathbf{y})\, d\mathbf{u}\, d\boldsymbol\beta$$

is the posterior marginal density of interest; maximize it w.r.t. $\sigma_u^2$ and $\sigma_e^2$ to get REML estimates.
• Empirical Bayes strategy: plug the REML estimates into $\hat{\mathbf{R}} = \mathbf{I}\hat{\sigma}_e^2$ and $\hat{\mathbf{G}} = \mathbf{A}\hat{\sigma}_u^2$ to approximate:

$$\boldsymbol\beta, \mathbf{u} \mid \hat{\sigma}_e^2, \hat{\sigma}_u^2, \mathbf{y} \sim N\left(\begin{bmatrix}\hat{\boldsymbol\beta} \\ \hat{\mathbf{u}}\end{bmatrix}, \begin{bmatrix}\mathbf{X}'\hat{\mathbf{R}}^{-1}\mathbf{X} & \mathbf{X}'\hat{\mathbf{R}}^{-1}\mathbf{Z} \\ \mathbf{Z}'\hat{\mathbf{R}}^{-1}\mathbf{X} & \mathbf{Z}'\hat{\mathbf{R}}^{-1}\mathbf{Z} + \hat{\mathbf{G}}^{-1}\end{bmatrix}^{-1}\right)$$
What is ML estimation of VC from a Bayesian perspective?

• Determine the "marginal likelihood"

$$p(\mathbf{y} \mid \boldsymbol\beta, \sigma_u^2, \sigma_e^2) = \int_{R_{\mathbf{u}}} p(\mathbf{y} \mid \boldsymbol\beta, \mathbf{u}, \sigma_e^2)\, p(\mathbf{u} \mid \sigma_u^2)\, d\mathbf{u}$$

• Maximize this with respect to $\boldsymbol\beta$, $\sigma_u^2$, and $\sigma_e^2$ to get ML estimates of all three, assuming flat priors on all three.
• This is essentially what PROC MIXED (GLIMMIX) does with ML (MSPL/MMPL)!
Approximate empirical Bayes inference

Given

$$\boldsymbol\beta, \mathbf{u} \mid \hat{\sigma}_e^2, \hat{\sigma}_u^2, \mathbf{y} \overset{approx}{\sim} N\left(\begin{bmatrix}\hat{\boldsymbol\beta} \\ \hat{\mathbf{u}}\end{bmatrix}, \begin{bmatrix}\mathbf{C}_{\beta\beta} & \mathbf{C}_{\beta u} \\ \mathbf{C}_{u\beta} & \mathbf{C}_{uu}\end{bmatrix}\right)$$

where

$$\begin{bmatrix}\mathbf{C}_{\beta\beta} & \mathbf{C}_{\beta u} \\ \mathbf{C}_{u\beta} & \mathbf{C}_{uu}\end{bmatrix} = \begin{bmatrix}\mathbf{X}'\hat{\mathbf{R}}^{-1}\mathbf{X} & \mathbf{X}'\hat{\mathbf{R}}^{-1}\mathbf{Z} \\ \mathbf{Z}'\hat{\mathbf{R}}^{-1}\mathbf{X} & \mathbf{Z}'\hat{\mathbf{R}}^{-1}\mathbf{Z} + \hat{\mathbf{G}}^{-1}\end{bmatrix}^{-1}$$

then

$$\mathbf{k}'\boldsymbol\beta \mid \mathbf{y}, \hat{\mathbf{R}}, \hat{\mathbf{G}} \overset{approx}{\sim} N(\mathbf{k}'\hat{\boldsymbol\beta},\; \mathbf{k}'\mathbf{C}_{\beta\beta}\mathbf{k})$$
Approximate "REML" (PQL) analysis for GLMMs

• With $\boldsymbol\theta_1 = (\boldsymbol\beta, \mathbf{u})$ (fixed and random effects) and $\boldsymbol\theta_2 = (\sigma_u^2, \sigma_e^2)$ (variance components), $p(\boldsymbol\theta_2 \mid \mathbf{y})$ is not known analytically for a GLMM and must be approximated, e.g., by the residual pseudo-likelihood method (RSPL/MSPL) in SAS PROC GLIMMIX.
• First approximations were proposed by Stiratelli et al. (1984) and Harville and Mee (1984).
Other methods for estimating VC in PROC GLIMMIX

• Based on maximizing the marginal likelihood

$$p(\mathbf{y} \mid \boldsymbol\beta, \sigma_u^2, \sigma_e^2) = \int_{R_{\mathbf{u}}} p(\mathbf{y} \mid \boldsymbol\beta, \mathbf{u}, \sigma_e^2)\, p(\mathbf{u} \mid \sigma_u^2)\, d\mathbf{u}$$

(more ML-like rather than REML-like!)
• Method = QUAD: adaptive quadrature; exact, but useful only for simple models.
• Method = LAPLACE: generally a better approximation to the marginal likelihood than MMPL/MSPL, and computationally more efficient than QUAD.
Could we have considered a "residual/restricted" Laplace instead?

• i.e., maximize an approximation of

$$p(\sigma_u^2, \sigma_e^2 \mid \mathbf{y}) = \int_{R_{\boldsymbol\beta}}\int_{R_{\mathbf{u}}} p(\boldsymbol\beta, \mathbf{u}, \sigma_u^2, \sigma_e^2 \mid \mathbf{y})\, d\mathbf{u}\, d\boldsymbol\beta$$

with respect to the variance components, i.e., maximize the Laplace approximation

$$\log p_{PL}(\sigma_u^2, \sigma_e^2 \mid \mathbf{y}) = \log p(\hat{\boldsymbol\beta}, \hat{\mathbf{u}}, \sigma_u^2, \sigma_e^2 \mid \mathbf{y}) - 0.5 \log\left|-\frac{\partial^2 \log p(\boldsymbol\beta, \mathbf{u}, \sigma_u^2, \sigma_e^2 \mid \mathbf{y})}{\partial(\boldsymbol\beta, \mathbf{u})\,\partial(\boldsymbol\beta, \mathbf{u})'}\right|_{\boldsymbol\beta = \hat{\boldsymbol\beta},\, \mathbf{u} = \hat{\mathbf{u}}}$$

Tempelman and Gianola (1993; 1996); Tempelman (1998); Wolfinger (1993). Premise: "REML" is generally less biased than "ML".
Ordinal Categorical Data

• Recall the threshold concept; let's extend it to the mixed model. With latent liabilities $\boldsymbol\ell = \mathbf{X}\boldsymbol\beta + \mathbf{Z}\mathbf{u} + \mathbf{e}$, the observed category is

$$y_i = m \quad \text{if} \quad \tau_{m-1} < \ell_i \leq \tau_m, \qquad m = 1, 2, \ldots, M\ \ (\tau_0 = -\infty,\ \tau_M = +\infty)$$

with $\mathbf{u} \sim N(\mathbf{0}, \mathbf{A}\sigma_u^2)$ and $\mathbf{e} \mid \sigma_e^2 \sim N(\mathbf{0}, \mathbf{I}\sigma_e^2)$.
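The threshold mapping can be demonstrated by simulation (thresholds, mean, and sample size here are my own illustrative choices; the residual variance is fixed at 1, as is conventional for the probit threshold model):

```python
import bisect
import random

random.seed(2)
thresholds = [-0.5, 0.8]          # tau_1 < tau_2 for a 3-category trait

def categorize(liability):
    # y = m if tau_{m-1} < liability <= tau_m, with tau_0 = -inf, tau_m = +inf
    return bisect.bisect_left(thresholds, liability) + 1

# Latent liability: l = x'beta + z'u + e, with linear predictor 0.3 and e ~ N(0, 1)
sample = [categorize(random.gauss(0.3, 1.0)) for _ in range(10000)]
counts = [sample.count(c) / len(sample) for c in (1, 2, 3)]
print(counts)  # close to Phi(-0.8), Phi(0.5)-Phi(-0.8), 1-Phi(0.5)
```

The empirical category frequencies approach the normal-cdf probabilities implied by the thresholds, which is exactly the likelihood used on the next slide.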
Joint posterior density

• Likelihood:

$$p(\mathbf{y} \mid \boldsymbol\beta, \mathbf{u}, \boldsymbol\tau) = \prod_{i=1}^{n} \mathrm{Prob}(Y_i = y_i \mid \boldsymbol\beta, \mathbf{u}, \boldsymbol\tau) = \prod_{i=1}^{n}\prod_{k=1}^{m}\left[\Phi(\tau_k - \mathbf{x}_i'\boldsymbol\beta - \mathbf{z}_i'\mathbf{u}) - \Phi(\tau_{k-1} - \mathbf{x}_i'\boldsymbol\beta - \mathbf{z}_i'\mathbf{u})\right]^{I(y_i = k)}$$

where $\mathbf{X} = [\mathbf{x}_1 \; \mathbf{x}_2 \; \cdots \; \mathbf{x}_n]'$ and $\mathbf{Z} = [\mathbf{z}_1 \; \mathbf{z}_2 \; \cdots \; \mathbf{z}_n]'$.

• Priors:

$$p(\mathbf{u} \mid \sigma_u^2) \propto (\sigma_u^2)^{-q/2}\exp\left(-\frac{1}{2\sigma_u^2}\mathbf{u}'\mathbf{A}^{-1}\mathbf{u}\right), \qquad p(\boldsymbol\beta) \propto \text{constant}, \qquad p(\boldsymbol\tau) \propto \text{constant},\ \tau_1 < \tau_2 < \cdots < \tau_{m-1}, \qquad p(\sigma_u^2)$$
Inference given "known" $\sigma_u^2$

• Then

$$p(\boldsymbol\beta, \mathbf{u}, \boldsymbol\tau \mid \mathbf{y}, \sigma_u^2) \propto p(\mathbf{y} \mid \boldsymbol\beta, \mathbf{u}, \boldsymbol\tau)\, p(\mathbf{u} \mid \sigma_u^2) \propto \prod_{i=1}^{n}\prod_{k=1}^{m}\left[\Phi(\tau_k - \mathbf{x}_i'\boldsymbol\beta - \mathbf{z}_i'\mathbf{u}) - \Phi(\tau_{k-1} - \mathbf{x}_i'\boldsymbol\beta - \mathbf{z}_i'\mathbf{u})\right]^{I(y_i = k)}\exp\left(-\frac{\mathbf{u}'\mathbf{A}^{-1}\mathbf{u}}{2\sigma_u^2}\right)$$

• Use Fisher scoring to estimate $\boldsymbol\theta = [\boldsymbol\tau' \; \boldsymbol\beta' \; \mathbf{u}']'$, with $L$ the log joint posterior:

$$\hat{\boldsymbol\theta}^{[t+1]} = \hat{\boldsymbol\theta}^{[t]} + \left(E\left[-\frac{\partial^2 L(\boldsymbol\theta \mid \mathbf{y})}{\partial\boldsymbol\theta\,\partial\boldsymbol\theta'}\right]\right)^{-1}\frac{\partial L(\boldsymbol\theta \mid \mathbf{y})}{\partial\boldsymbol\theta}\bigg|_{\boldsymbol\theta = \hat{\boldsymbol\theta}^{[t]}}$$
Joint Posterior Mode

• Let $L = L_1 + L_2$, where

$$L_1 = \sum_{i=1}^{n}\sum_{k=1}^{m} I(y_i = k)\log\left[\Phi(\tau_k - \mathbf{x}_i'\boldsymbol\beta - \mathbf{z}_i'\mathbf{u}) - \Phi(\tau_{k-1} - \mathbf{x}_i'\boldsymbol\beta - \mathbf{z}_i'\mathbf{u})\right], \qquad L_2 = -\frac{1}{2\sigma_u^2}\mathbf{u}'\mathbf{A}^{-1}\mathbf{u}$$

• First derivatives, with $\eta_i = \mathbf{x}_i'\boldsymbol\beta + \mathbf{z}_i'\mathbf{u}$:

$$\frac{\partial L}{\partial\boldsymbol\beta} = \frac{\partial L_1}{\partial\boldsymbol\beta} + \mathbf{0}_{p\times 1} = \mathbf{X}'\mathbf{v}, \qquad \frac{\partial L}{\partial\mathbf{u}} = \frac{\partial L_1}{\partial\mathbf{u}} + \frac{\partial L_2}{\partial\mathbf{u}} = \mathbf{Z}'\mathbf{v} - \mathbf{G}^{-1}\mathbf{u}, \qquad \frac{\partial L}{\partial\boldsymbol\tau} = \frac{\partial L_1}{\partial\boldsymbol\tau} + \mathbf{0}_{(m-1)\times 1} = \mathbf{p}$$

See §❶ for details on $\mathbf{p}$ and $\mathbf{v}$.
Second derivatives

• Now the expected negative Hessian has the partitioned form

$$E\left[-\frac{\partial^2 L}{\partial\boldsymbol\theta\,\partial\boldsymbol\theta'}\right] = E\left[-\frac{\partial^2 L_1}{\partial\boldsymbol\theta\,\partial\boldsymbol\theta'}\right] + E\left[-\frac{\partial^2 L_2}{\partial\boldsymbol\theta\,\partial\boldsymbol\theta'}\right] = \begin{bmatrix}\mathbf{T} & \mathbf{L}'\mathbf{X} & \mathbf{L}'\mathbf{Z} \\ \mathbf{X}'\mathbf{L} & \mathbf{X}'\mathbf{W}\mathbf{X} & \mathbf{X}'\mathbf{W}\mathbf{Z} \\ \mathbf{Z}'\mathbf{L} & \mathbf{Z}'\mathbf{W}\mathbf{X} & \mathbf{Z}'\mathbf{W}\mathbf{Z}\end{bmatrix} + \begin{bmatrix}\mathbf{0} & \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} & \mathbf{G}^{-1}\end{bmatrix} = \begin{bmatrix}\mathbf{T} & \mathbf{L}'\mathbf{X} & \mathbf{L}'\mathbf{Z} \\ \mathbf{X}'\mathbf{L} & \mathbf{X}'\mathbf{W}\mathbf{X} & \mathbf{X}'\mathbf{W}\mathbf{Z} \\ \mathbf{Z}'\mathbf{L} & \mathbf{Z}'\mathbf{W}\mathbf{X} & \mathbf{Z}'\mathbf{W}\mathbf{Z} + \mathbf{G}^{-1}\end{bmatrix}$$

See §❶ for details on $\mathbf{T}$, $\mathbf{L}$, and $\mathbf{W}$.
Fisher's scoring

• So

$$\begin{bmatrix}\mathbf{T} & \mathbf{L}'\mathbf{X} & \mathbf{L}'\mathbf{Z} \\ \mathbf{X}'\mathbf{L} & \mathbf{X}'\mathbf{W}\mathbf{X} & \mathbf{X}'\mathbf{W}\mathbf{Z} \\ \mathbf{Z}'\mathbf{L} & \mathbf{Z}'\mathbf{W}\mathbf{X} & \mathbf{Z}'\mathbf{W}\mathbf{Z} + \mathbf{G}^{-1}\end{bmatrix}_{\hat{\boldsymbol\theta}^{[t]}}\begin{bmatrix}\hat{\boldsymbol\tau}^{[t+1]} - \hat{\boldsymbol\tau}^{[t]} \\ \hat{\boldsymbol\beta}^{[t+1]} - \hat{\boldsymbol\beta}^{[t]} \\ \hat{\mathbf{u}}^{[t+1]} - \hat{\mathbf{u}}^{[t]}\end{bmatrix} = \begin{bmatrix}\mathbf{p} \\ \mathbf{X}'\mathbf{v} \\ \mathbf{Z}'\mathbf{v} - \mathbf{G}^{-1}\hat{\mathbf{u}}^{[t]}\end{bmatrix}_{\hat{\boldsymbol\theta}^{[t]}}$$

• At convergence:

$$\widehat{\mathrm{var}}\begin{bmatrix}\hat{\boldsymbol\tau} \\ \hat{\boldsymbol\beta} \\ \hat{\mathbf{u}}\end{bmatrix} \approx \begin{bmatrix}\mathbf{T} & \mathbf{L}'\mathbf{X} & \mathbf{L}'\mathbf{Z} \\ \mathbf{X}'\mathbf{L} & \mathbf{X}'\mathbf{W}\mathbf{X} & \mathbf{X}'\mathbf{W}\mathbf{Z} \\ \mathbf{Z}'\mathbf{L} & \mathbf{Z}'\mathbf{W}\mathbf{X} & \mathbf{Z}'\mathbf{W}\mathbf{Z} + \mathbf{G}^{-1}\end{bmatrix}^{-1}_{\hat{\boldsymbol\theta}}$$

Full details in GF (1983).
Recall GF83 Data

H A G S Y    H A G S Y    H A G S Y
1 2 M 1 1    1 2 F 1 1    1 3 M 1 1
1 2 F 2 2    1 3 M 2 1    1 3 M 2 3
1 3 F 2 1    1 3 F 2 1    1 3 F 2 1
1 2 M 3 1    1 2 M 3 2    1 3 F 3 2
1 3 M 3 1    2 2 F 1 1    2 2 F 1 1
2 2 M 1 1    2 3 M 1 3    2 2 F 2 1
2 2 F 2 3    2 3 M 2 1    2 2 F 3 2
2 3 M 3 3    2 2 M 4 2    2 2 F 4 1
2 3 F 4 1    2 3 F 4 1    2 3 M 4 1
2 3 M 4 1

H: Herd (1 or 2)
A: Age of dam (2 = young heifer, 3 = older cow)
G: Gender or sex (M and F)
S: Sire of calf (1, 2, 3, or 4)
Y: Ordinal response (1, 2, or 3)
SAS data step:

data gf83;
  input herdyear dam_age calfsex $ sire y @@;
  if herdyear = 2 then hy = 1;  /* create dummy variables */
  else hy = 0;
  if dam_age = 3 then age = 1; else age = 0;
  if calfsex = 'F' then sex = 1; else sex = 0;
  datalines;
1 2 M 1 1  1 2 F 1 1
etc.
Reproducing analyses in GF83 (based on created dummy variables), with $\sigma_u^2 = 1/19 = 0.05263158$ (as chosen by GF83):

ods select parameterestimates estimates;
proc glimmix data=gf83;
  model y = hy age sex / dist = mult link = cumprobit solution;
  random sire / solution;
  parms (0.05263158) / noiter;
  estimate 'fem marginal mean cat. 1' intercept 1 0 hy 0.5 age 0.5 sex 1 / ilink;
  estimate 'fem marginal mean cat. 2' intercept 0 1 hy 0.5 age 0.5 sex 1 / ilink;
  estimate 'male marginal mean cat. 1' intercept 1 0 hy 0.5 age 0.5 sex 0 / ilink;
  estimate 'male marginal mean cat. 2' intercept 0 1 hy 0.5 age 0.5 sex 0 / ilink;
run;

The estimate statements compute $\mathrm{Prob}(Y \leq c) = \Phi(\tau_c - \mathbf{k}'\boldsymbol\beta)$, with $\mathbf{k}' = [0.5\ \ 0.5\ \ 1]$ for females and $\mathbf{k}' = [0.5\ \ 0.5\ \ 0]$ for males.
Solutions for Fixed Effects
Effect     y  Estimate  Standard Error  DF  t Value  Pr > |t|
Intercept  1   0.3755   0.5580          3    0.67    0.5492
Intercept  2   1.0115   0.5789          3    1.75    0.1789
hy            -0.2975   0.4950          20  -0.60    0.5546
age            0.1269   0.4987          20   0.25    0.8017
sex            0.3906   0.4967          20   0.79    0.4409

Estimates
Label                        Estimate  Standard Error  DF  t Value  Pr > |t|  Mean    Standard Error Mean
female marginal mean cat. 1  0.6808    0.3829          20  1.78     0.0906    0.7520  0.1212
female marginal mean cat. 2  1.3168    0.4249          20  3.10     0.0057    0.9060  0.07123
male marginal mean cat. 1    0.2902    0.3607          20  0.80     0.4305    0.6142  0.1380
male marginal mean cat. 2    0.9262    0.3902          20  2.37     0.0277    0.8228  0.1014

REPRODUCED IN GIANOLA AND FOULLEY (1983)
Reproducing analyses in GF83 (alternative using a less than full rank classification model):

ods select parameterestimates estimates;
proc glimmix data=gf83;
  class sire herdyear dam_age calfsex;
  model y = herdyear dam_age calfsex / dist = mult link = cumprobit solution;
  random sire / solution;
  parms (0.05263158) / noiter;
  estimate 'female marginal mean cat. 1' intercept 1 0 herdyear 0.5 0.5 dam_age 0.5 0.5 calfsex 1 0 / ilink;
  estimate 'female marginal mean cat. 2' intercept 0 1 herdyear 0.5 0.5 dam_age 0.5 0.5 calfsex 1 0 / ilink;
  estimate 'male marginal mean cat. 1' intercept 1 0 herdyear 0.5 0.5 dam_age 0.5 0.5 calfsex 0 1 / ilink;
  estimate 'male marginal mean cat. 2' intercept 0 1 herdyear 0.5 0.5 dam_age 0.5 0.5 calfsex 0 1 / ilink;
run;

$\mathrm{Prob}(Y \leq c) = \Phi(\tau_c - \mathbf{k}'\boldsymbol\beta)$, with $\mathbf{k}' = [0.5\ \ 0.5\ \ 0.5\ \ 0.5\ \ 1\ \ 0]$ for females and $\mathbf{k}' = [0.5\ \ 0.5\ \ 0.5\ \ 0.5\ \ 0\ \ 1]$ for males.
Solutions for Fixed Effects
Effect       Level  Estimate  Standard Error  DF  t Value  Pr > |t|
Intercept 1          0.2050   0.4734          3    0.43    0.6943
Intercept 2          0.8409   0.4946          3    1.70    0.1876
herdyear     1       0.2975   0.4950          20   0.60    0.5546
herdyear     2       0        .               .    .       .
dam_age      2      -0.1269   0.4987          20  -0.25    0.8017
dam_age      3       0        .               .    .       .
calfsex      F       0.3906   0.4967          20   0.79    0.4409
calfsex      M       0        .               .    .       .

Estimates
Label                            Estimate  Standard Error  DF  t Value  Pr > |t|  Mean    Standard Error Mean
female marginal mean category 1  0.6808    0.3829          20  1.78     0.0906    0.7520  0.1212
female marginal mean category 2  1.3168    0.4249          20  3.10     0.0057    0.9060  0.07123
male marginal mean category 1    0.2902    0.3607          20  0.80     0.4305    0.6142  0.1380
male marginal mean category 2    0.9262    0.3902          20  2.37     0.0277    0.8228  0.1014
Conditional versus marginal ("population-averaged") probabilities

• Conditional (on $\mathbf{u}$): $\mathrm{Prob}(Y \leq c) = \Phi(\tau_c - \mathbf{k}'\boldsymbol\beta)$
• Marginal (over $\mathbf{u}$):

$$\mathrm{Prob}_{marginal}(Y \leq c) = E_u\left[\mathrm{Prob}(Y \leq c \mid u)\right] = \Phi\left(\frac{\tau_c - \mathbf{k}'\boldsymbol\beta}{\sqrt{1 + \sigma_u^2}}\right)$$

The marginal probability arguably matters just as much; note also that there is no corresponding closed form for (cumulative) logistic mixed models.
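The two probabilities can be compared numerically (the threshold and $\sigma_u^2 = 1/19$ below are illustrative values, the latter borrowed from the GF83 example):

```python
import math

Phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))  # normal cdf

def conditional_prob(tau_c, eta):
    # Prob(Y <= c | u = 0) at a given linear predictor eta = k'beta
    return Phi(tau_c - eta)

def marginal_prob(tau_c, eta, sigma2_u):
    # E_u[Phi(tau_c - eta - u)] for u ~ N(0, sigma2_u) has this closed probit form
    return Phi((tau_c - eta) / math.sqrt(1.0 + sigma2_u))

tau, eta, s2 = 0.68, 0.0, 1.0 / 19.0
print(conditional_prob(tau, eta), marginal_prob(tau, eta, s2))
```

Marginalizing over the random effect pulls the probability toward 0.5, and the two coincide only when $\sigma_u^2 = 0$; the attenuation grows with $\sigma_u^2$.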
Accounting for unknown $\sigma_u^2$? Some alternative methods available in SAS PROC GLIMMIX:

ods html select covparms;
title "Default RSPL";
proc glimmix data=gf83;
  class sire;
  model y = hy age sex / dist = mult link = cumprobit;
  random intercept / subject=sire;
run;

title "Quadrature";
proc glimmix data=gf83 method=quad;
  class sire;
  model y = hy age sex / dist = mult link = cumprobit;
  random intercept / subject=sire;
run;

title "Laplace";
proc glimmix data=gf83 method=laplace;
  class sire;
  model y = hy age sex / dist = mult link = cumprobit;
  random intercept / subject=sire;
run;
Which one should I pick?

Covariance Parameter Estimates
Method   Cov Parm   Subject  Estimate  Standard Error
RSPL     Intercept  sire     0.2700    0.4837
QUAD     Intercept  sire     0.02568   0.2947
LAPLACE  Intercept  sire     0.02488   0.2898

An "ML" vs. "REML" thing?
Yet another option: "residual" Laplace (Tempelman and Gianola, 1993)

• Rather than using a point estimate of $\sigma_u^2$, one might also weight inferences on $\boldsymbol\beta$ and $\mathbf{u}$ by $p(\sigma_u^2 \mid \mathbf{y})$:

$$p(\boldsymbol\beta, \mathbf{u} \mid \mathbf{y}) = E_{\sigma_u^2 \mid \mathbf{y}}\left[p(\boldsymbol\beta, \mathbf{u} \mid \sigma_u^2, \mathbf{y})\right]$$

[Figure: $\log p(\sigma_u^2 \mid \mathbf{y})$ plotted against $\sigma_u^2$, with mode $\hat{\sigma}_u^2 = 0.40$.]
Summary of GLMM as conventionally done today

• Some issues:
1. The approximation $p(\boldsymbol\beta, \mathbf{u} \mid \sigma_u^2, \mathbf{y}) \approx$ normal. Is that wise?
2. MML point estimates of $\sigma_u^2$ are often badly biased. Upwards or downwards? Unpredictable.
3. Uncertainty in MML estimates is not accounted for.
4. Marginal versus conditional inference on treatment probabilities? (applies to other distributions, e.g., Poisson)
• Implications? We'll see later with a comparison between empirical Bayes and fully Bayes (using MCMC). There is an obvious dependency on n, q, etc.