Applied Bayesian Inference, KSU, April 29, 2012
§❸ Empirical Bayes
Robert J. Tempelman
Origins of hierarchical modeling

• A homework question from Mood (1950, p. 164, exercise 23), recounted by Searle et al. (1992): "Suppose intelligence quotients for students in a particular age group are normally distributed about a mean of 100 with standard deviation 15. The IQ, say $Y_1$, of a particular student is to be estimated by a test on which he scores 130. It is further given that test scores are normally distributed about the true IQ as a mean with standard deviation 5. What is the maximum likelihood estimate of the student's IQ? (The answer is not 130.)"
Answer provided by one student (C. R. Henderson)

• The model: $y_i = \mu + a_i + e_i$, where the true IQ $\mu + a_i \sim N(100, 15^2)$ and the observed score $y_i \mid a_i \sim N(\mu + a_i, 5^2)$.

$$E(\mu + a_i \mid y_i) = \mu + \frac{\mathrm{cov}(a_i, y_i)}{\mathrm{var}(y_i)}(y_i - \mu) = 100 + \frac{15^2}{15^2 + 5^2}(130 - 100) = 127$$

The estimate is "shrunk" from 130 toward the prior mean of 100. This is not really ML, but it does maximize the posterior density of $(\mu + a_i) \mid y_i$.
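The slide's arithmetic can be checked with a short script (a minimal sketch; the variable names are mine, not from the slides):

```python
from fractions import Fraction

# Prior: true IQ ~ N(100, 15^2); test score | true IQ ~ N(true IQ, 5^2)
prior_mean, prior_var = 100, Fraction(15 ** 2)
obs, obs_var = 130, Fraction(5 ** 2)

# Posterior mean shrinks the observation toward the prior mean
shrink = prior_var / (prior_var + obs_var)           # 225/250 = 0.9
post_mean = prior_mean + shrink * (obs - prior_mean)

print(float(shrink), float(post_mean))  # 0.9 127.0
```

The shrinkage factor 0.9 reflects how much more diffuse the prior (variance 225) is than the measurement error (variance 25).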
• Later editions of Mood's textbook (1963, 1974) were revised: "What is the maximum likelihood estimate?" was replaced by "What is the Bayes estimator?"

• This homework problem was the inspiration for C. R. Henderson's work on best linear unbiased prediction (BLUP), which was subsequently also referred to as empirical Bayes prediction for linear models.
What is empirical Bayes?

An excellent primer: Casella, G. (1985). An introduction to empirical Bayes analysis. The American Statistician 39(2): 83-87.
Casella's problem specified hierarchically

• Suppose we observe $t$ normal sample means $\bar{X}_i$, each based on $n$ observations, with $\bar{X}_i \sim N(\theta_i, \sigma^2/n)$, $i = 1, 2, \ldots, t$, each a random draw from a normal distribution with its own mean $\theta_i$.
• Suppose it is known (or believed) that $\theta_i \sim N(\mu, \tau^2)$, $i = 1, 2, \ldots, t$, i.e., a "random effects model".

$\mu, \tau^2$: hyperparameters
• "ML" solution: $\hat{\theta}_i = \bar{X}_i,\; i = 1, 2, \ldots, t$.
• Bayes estimator of $\theta_i$:

$$\hat{\theta}_i^{Bayes} = E(\theta_i \mid \bar{X}_i, \mu, \sigma^2, \tau^2) = \frac{\sigma^2}{\sigma^2 + n\tau^2}\,\mu + \frac{n\tau^2}{\sigma^2 + n\tau^2}\,\bar{X}_i$$

As $\tau^2 \to 0$, $\hat{\theta}_i \to \mu$; as $n\tau^2 \to \infty$, $\hat{\theta}_i \to \bar{X}_i$.
What is empirical Bayes?

• Empirical Bayes = Bayes with the hyperparameters $\mu, \sigma^2, \tau^2$ replaced by estimates:

$$\hat{\theta}_i^{EBayes} = \frac{\hat{\sigma}^2}{\hat{\sigma}^2 + n\hat{\tau}^2}\,\hat{\mu} + \frac{n\hat{\tau}^2}{\hat{\sigma}^2 + n\hat{\tau}^2}\,\bar{X}_i$$

• Does it work?
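A simulation sketch of this estimator, using one simple method-of-moments choice for the hyperparameter estimates (the data, seed, and the decision to treat $\sigma^2$ as known are my own assumptions, not from the slides):

```python
import random
import statistics

random.seed(1)
t, n = 50, 5
mu_true, tau, sigma = 0.0, 1.0, 2.0

# Simulate Casella's hierarchy: theta_i ~ N(mu, tau^2), X_ij ~ N(theta_i, sigma^2)
theta = [random.gauss(mu_true, tau) for _ in range(t)]
xbar = [statistics.mean(random.gauss(th, sigma) for _ in range(n)) for th in theta]

# Method-of-moments hyperparameter estimates (one simple choice, not the only one):
mu_hat = statistics.mean(xbar)
# var(xbar_i) = tau^2 + sigma^2/n; sigma^2 is taken as known here for simplicity
tau2_hat = max(statistics.variance(xbar) - sigma ** 2 / n, 0.0)

# Empirical Bayes: shrink each sample mean toward the estimated grand mean
w = n * tau2_hat / (sigma ** 2 + n * tau2_hat)
eb = [mu_hat + w * (x - mu_hat) for x in xbar]

mse_ml = statistics.mean((x - th) ** 2 for x, th in zip(xbar, theta))
mse_eb = statistics.mean((e - th) ** 2 for e, th in zip(eb, theta))
print(mse_ml, mse_eb)  # EB typically has smaller total estimation error
```

With many classes and few observations per class, the EB estimates usually have smaller summed squared error than the raw means, which is exactly the Stein effect discussed next.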
From Casella (1985)

Observed data based on first 45 at-bats for 7 NY Yankees in 1971, compared against each player's "known" (full-season) batting average. MONEYBALL!
From Casella (1985)

"Stein effect": estimates can be improved by using information from all coordinates when estimating each coordinate (Stein, 1981). Stein ≡ shrinkage-based estimators.

[Figure: batting averages (roughly 0.200 to 0.300) for each player $i$, plotting the ML estimates $\bar{X}_i$ against the EB estimates $\hat{\theta}_i^{EBayes} = \frac{\hat{\sigma}^2}{\hat{\sigma}^2 + n\hat{\tau}^2}\,\hat{\mu} + \frac{n\hat{\tau}^2}{\hat{\sigma}^2 + n\hat{\tau}^2}\,\bar{X}_i$; the EB estimates are shrunk toward the overall mean.]
When might Bayes/Stein-type estimation be particularly useful?

• When the number of classes ($t$) is large
• When the number of observations ($n$) per class is small
• When the ratio of $\tau^2$ to $\sigma^2$ is small

"Shrinkage is a good thing." Allison et al. (2006)
Microarray experiments

• A wonderful application of the power of empirical Bayes methods.
• Microarray analysis in a nutshell: conducting t-tests on differential gene expression between two (or more) groups for thousands of different genes,

$$t_g = \frac{(\text{estimated difference})_g}{\sqrt{2\hat{\sigma}_g^2/n}};\quad g = 1, 2, \ldots, G.$$

• Multiple comparison issues are obvious (this inspired research on FDR control).
Can we do better than t-tests?

• "By sharing the variance estimate across multiple genes, [we] can form a better estimate for the true residual variance of a given gene, and effectively boost the residual degrees of freedom." Wright and Simon (2003)
Hierarchical model formulation

• Data stage:

$$Y_{gij} = \mu_{gi} + e_{gij};\quad g = 1, \ldots, G;\; i = 1, \ldots, T;\; j = 1, \ldots, n;\qquad e_{gij} \sim NID(0, \sigma_g^2)$$

• Second stage: the gene-specific variances are drawn from a common distribution,

$$\frac{1}{\sigma_g^2} \sim Gamma(a, b), \qquad p(\sigma_g^2) \propto (\sigma_g^2)^{-(a+1)} \exp\left(-\frac{1}{b\,\sigma_g^2}\right), \qquad E(\sigma_g^2) = \frac{1}{b(a-1)}$$

$a, b$: hyperparameters
Empirical Bayes (EB) estimation

• The Bayes estimate of $\sigma_g^2$ (its posterior mean, with $\nu = T(n-1)$ residual degrees of freedom):

$$\tilde{\sigma}_g^2 = \frac{\nu\hat{\sigma}_g^2 + 2/b}{\nu + 2a - 2}$$

• Empirical Bayes = Bayes with estimates of $a$ and $b$:
– Marginal ML estimation of $a$ and $b$ advocated by Wright and Simon (2003)
– Method of moments might be good if $G$ is large.

• Modify the t-test statistic accordingly,

$$t_g = \frac{(\text{estimated difference})_g}{\sqrt{2\tilde{\sigma}_g^2/n}},$$

including the posterior degrees of freedom, $\nu + 2a$.
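A numeric sketch of this kind of variance shrinkage, derived as the posterior mean under a Gamma prior on the precision (the function name and example numbers are mine; the exact constants used by Wright and Simon (2003) should be checked against the paper):

```python
def shrunken_variance(s2, nu, a, b):
    """Posterior-mean variance under a Gamma(a, scale=b) prior on 1/sigma^2.

    If SS = nu*s2 is the residual sum of squares, the posterior of the
    precision is Gamma(a + nu/2, ...), which gives
    E(sigma^2 | data) = (nu*s2 + 2/b) / (nu + 2*a - 2).
    """
    return (nu * s2 + 2.0 / b) / (nu + 2.0 * a - 2.0)

# A gene with s2 = 4 and only 3 residual df is pulled toward the prior
# mean 1/(b*(a-1)) = 2; with huge df the data estimate dominates.
print(shrunken_variance(s2=4.0, nu=3, a=2.0, b=0.5))      # 3.2
print(shrunken_variance(s2=4.0, nu=10 ** 6, a=2.0, b=0.5))
```

The first call averages the gene-specific estimate with the prior mean; the second shows that shrinkage vanishes as residual degrees of freedom grow, matching the "less need for shrinkage with larger n" point below.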
Observed Type I error rates (from Wright and Simon, 2003)

Pooled: as if the residual variance were the same for all genes.
Power for P < 0.001 and n = 5 (Wright and Simon, 2003)

Power for P < 0.001 and n = 10 (Wright and Simon, 2003): less of a need for shrinkage with larger n.
"Shrinkage is a good thing" (David Allison, UAB) (microarray data from MSU)

[Volcano plots: $-\log_{10}$(p-value) vs. estimated treatment effect on the $\log_2$ scale, for a simple design (no subsampling). Left panel: regular linear model, using the regular contrast t-test $t_g = (\text{estimated difference})_g / \sqrt{c\,\hat{\sigma}_g^2}$. Right panel: shrinkage (on variance) based contrast t-test, with $\tilde{\sigma}_g^2$ in place of $\hat{\sigma}_g^2$. The shrinkage analysis yields larger $-\log_{10}$(p-values) for the same effect sizes.]
Bayesian inference in the linear mixed model (Lindley and Smith, 1972; Sorensen and Gianola, 2002)

• First stage: $\mathbf{y} \mid \boldsymbol\beta, \mathbf{u} \sim N(\mathbf{X}\boldsymbol\beta + \mathbf{Z}\mathbf{u}, \mathbf{R})$ (let $\mathbf{R} = \mathbf{I}\sigma_e^2$):

$$p(\mathbf{y} \mid \boldsymbol\beta, \mathbf{u}) \propto |\mathbf{R}|^{-1/2} \exp\left(-\tfrac{1}{2}(\mathbf{y}-\mathbf{X}\boldsymbol\beta-\mathbf{Z}\mathbf{u})'\mathbf{R}^{-1}(\mathbf{y}-\mathbf{X}\boldsymbol\beta-\mathbf{Z}\mathbf{u})\right)$$

• Second stage priors:
– Subjective: $p(\boldsymbol\beta) \propto \exp\left(-\tfrac{1}{2}(\boldsymbol\beta-\boldsymbol\beta_o)'\mathbf{V}_\beta^{-1}(\boldsymbol\beta-\boldsymbol\beta_o)\right)$
– Structural: $p(\mathbf{u}) \propto \exp\left(-\tfrac{1}{2}\mathbf{u}'\mathbf{G}^{-1}\mathbf{u}\right)$ (let $\mathbf{G} = \mathbf{A}\sigma_u^2$; $\mathbf{A}$ is known)

• Third stage priors: $p(\sigma_e^2)$, $p(\sigma_u^2)$
Starting point for Bayesian inference

• Write the joint posterior density:

$$p(\boldsymbol\beta, \mathbf{u}, \sigma_e^2, \sigma_u^2 \mid \mathbf{y}) \propto (\sigma_e^2)^{-n/2}\exp\left(-\frac{(\mathbf{y}-\mathbf{X}\boldsymbol\beta-\mathbf{Z}\mathbf{u})'(\mathbf{y}-\mathbf{X}\boldsymbol\beta-\mathbf{Z}\mathbf{u})}{2\sigma_e^2}\right) \exp\left(-\tfrac{1}{2}(\boldsymbol\beta-\boldsymbol\beta_o)'\mathbf{V}_\beta^{-1}(\boldsymbol\beta-\boldsymbol\beta_o)\right) (\sigma_u^2)^{-q/2}\exp\left(-\frac{\mathbf{u}'\mathbf{A}^{-1}\mathbf{u}}{2\sigma_u^2}\right) p(\sigma_e^2)\,p(\sigma_u^2)$$

• Want to make fully Bayesian probability statements on $\boldsymbol\beta$ and $\mathbf{u}$? Integrate out uncertainty on all other unknowns:

$$p(\boldsymbol\beta, \mathbf{u} \mid \mathbf{y}) = \int\!\!\int p(\boldsymbol\beta, \mathbf{u}, \sigma_e^2, \sigma_u^2 \mid \mathbf{y})\, d\sigma_e^2\, d\sigma_u^2$$
Let's suppose (for now) that variance components are known

• i.e., no third stage prior is necessary: conditional inference.
• Rewrite, with $\boldsymbol\theta = [\boldsymbol\beta' \; \mathbf{u}']'$ and $\mathbf{W} = [\mathbf{X} \; \mathbf{Z}]$:
– Likelihood: $p(\mathbf{y} \mid \mathbf{R}, \boldsymbol\theta) \propto \exp\left(-\tfrac{1}{2}(\mathbf{y}-\mathbf{W}\boldsymbol\theta)'\mathbf{R}^{-1}(\mathbf{y}-\mathbf{W}\boldsymbol\theta)\right)$
– Prior: $\boldsymbol\theta \mid \boldsymbol\theta_o, \boldsymbol\Sigma \sim N(\boldsymbol\theta_o, \boldsymbol\Sigma)$, where $\boldsymbol\theta_o = \begin{bmatrix}\boldsymbol\beta_o \\ \mathbf{0}\end{bmatrix}$ and $\boldsymbol\Sigma = \begin{bmatrix}\mathbf{V}_\beta & \mathbf{0} \\ \mathbf{0} & \mathbf{G}\end{bmatrix}$
Bayesian inference with known VC

$$p(\boldsymbol\theta \mid \boldsymbol\theta_o, \boldsymbol\Sigma, \mathbf{R}, \mathbf{y}) \propto \exp\left(-\tfrac{1}{2}\left[(\mathbf{y}-\mathbf{W}\boldsymbol\theta)'\mathbf{R}^{-1}(\mathbf{y}-\mathbf{W}\boldsymbol\theta) + (\boldsymbol\theta-\boldsymbol\theta_o)'\boldsymbol\Sigma^{-1}(\boldsymbol\theta-\boldsymbol\theta_o)\right]\right) \propto \exp\left(-\tfrac{1}{2}(\boldsymbol\theta-\hat{\boldsymbol\theta})'(\mathbf{W}'\mathbf{R}^{-1}\mathbf{W} + \boldsymbol\Sigma^{-1})(\boldsymbol\theta-\hat{\boldsymbol\theta})\right)$$

where

$$\hat{\boldsymbol\theta} = \begin{bmatrix}\hat{\boldsymbol\beta} \\ \hat{\mathbf{u}}\end{bmatrix} = (\mathbf{W}'\mathbf{R}^{-1}\mathbf{W} + \boldsymbol\Sigma^{-1})^{-1}(\mathbf{W}'\mathbf{R}^{-1}\mathbf{y} + \boldsymbol\Sigma^{-1}\boldsymbol\theta_o)$$

In other words:

$$\boldsymbol\beta, \mathbf{u} \mid \mathbf{R}, \mathbf{G}, \mathbf{y} \sim N\left(\begin{bmatrix}\hat{\boldsymbol\beta} \\ \hat{\mathbf{u}}\end{bmatrix}, \begin{bmatrix}\mathbf{X}'\mathbf{R}^{-1}\mathbf{X} + \mathbf{V}_\beta^{-1} & \mathbf{X}'\mathbf{R}^{-1}\mathbf{Z} \\ \mathbf{Z}'\mathbf{R}^{-1}\mathbf{X} & \mathbf{Z}'\mathbf{R}^{-1}\mathbf{Z} + \mathbf{G}^{-1}\end{bmatrix}^{-1}\right)$$
Flat prior on $\boldsymbol\beta$

$$p(\boldsymbol\beta) \propto \exp\left(-\tfrac{1}{2}(\boldsymbol\beta-\boldsymbol\beta_o)'\,\mathbf{0}\,(\boldsymbol\beta-\boldsymbol\beta_o)\right) \propto 1$$

Note that as the diagonal elements of $\mathbf{V}_\beta \to \infty$, the diagonal elements of $\mathbf{V}_\beta^{-1} \to \mathbf{0}$. Hence

$$\boldsymbol\beta, \mathbf{u} \mid \mathbf{y}, \mathbf{R}, \boldsymbol\Sigma, \boldsymbol\theta_o \sim N\left(\begin{bmatrix}\hat{\boldsymbol\beta} \\ \hat{\mathbf{u}}\end{bmatrix}, \begin{bmatrix}\mathbf{X}'\mathbf{R}^{-1}\mathbf{X} & \mathbf{X}'\mathbf{R}^{-1}\mathbf{Z} \\ \mathbf{Z}'\mathbf{R}^{-1}\mathbf{X} & \mathbf{Z}'\mathbf{R}^{-1}\mathbf{Z} + \mathbf{G}^{-1}\end{bmatrix}^{-1}\right)$$

where

$$\begin{bmatrix}\mathbf{X}'\mathbf{R}^{-1}\mathbf{X} & \mathbf{X}'\mathbf{R}^{-1}\mathbf{Z} \\ \mathbf{Z}'\mathbf{R}^{-1}\mathbf{X} & \mathbf{Z}'\mathbf{R}^{-1}\mathbf{Z} + \mathbf{G}^{-1}\end{bmatrix}\begin{bmatrix}\hat{\boldsymbol\beta} \\ \hat{\mathbf{u}}\end{bmatrix} = \begin{bmatrix}\mathbf{X}'\mathbf{R}^{-1}\mathbf{y} \\ \mathbf{Z}'\mathbf{R}^{-1}\mathbf{y}\end{bmatrix}$$

Henderson's mixed model equations (MME) (Robinson, 1991).
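Henderson's mixed model equations can be checked on a toy data set (the numbers, balanced two-litter layout, and variance components below are invented for illustration):

```python
# Toy check of Henderson's mixed model equations.
# Model: y_ij = mu + u_i + e_ij, u_i ~ N(0, sigma2_u), e_ij ~ N(0, sigma2_e).
y = [3.0, 5.0, 7.0, 9.0]          # two observations in each of two "litters"
X = [[1.0], [1.0], [1.0], [1.0]]  # fixed part: overall mean only
Z = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
sigma2_e, sigma2_u = 4.0, 2.0
lam = sigma2_e / sigma2_u         # variance ratio added to the Z'Z block

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

W = [xr + zr for xr, zr in zip(X, Z)]   # W = [X Z]
C = matmul(transpose(W), W)             # [X'X X'Z; Z'X Z'Z] (R = I sigma2_e cancels)
for i in range(1, 3):
    C[i][i] += lam                      # Z'Z + lambda*I for the u block
rhs = [sum(w[i] * yi for w, yi in zip(W, y)) for i in range(3)]

def solve(A, b):                        # naive Gauss-Jordan with partial pivoting
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(M[r][i]))
        M[i], M[p] = M[p], M[i]
        for r in range(n):
            if r != i:
                f = M[r][i] / M[i][i]
                M[r] = [a - f * c for a, c in zip(M[r], M[i])]
    return [M[i][n] / M[i][i] for i in range(n)]

mu_hat, u1_hat, u2_hat = solve(C, rhs)
print(mu_hat, u1_hat, u2_hat)  # 6.0 -1.0 1.0
```

The raw litter means (4 and 8) deviate from the overall mean by ∓2; the BLUPs ∓1 are shrunk by the factor $n/(n+\lambda) = 2/4$, the same shrinkage seen in the IQ example.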
Inference

• Write:

$$\begin{bmatrix}\mathbf{C}_{\beta\beta} & \mathbf{C}_{\beta u} \\ \mathbf{C}_{u\beta} & \mathbf{C}_{uu}\end{bmatrix} = \begin{bmatrix}\mathbf{X}'\mathbf{R}^{-1}\mathbf{X} & \mathbf{X}'\mathbf{R}^{-1}\mathbf{Z} \\ \mathbf{Z}'\mathbf{R}^{-1}\mathbf{X} & \mathbf{Z}'\mathbf{R}^{-1}\mathbf{Z} + \mathbf{G}^{-1}\end{bmatrix}^{-1}$$

• Posterior density of $\mathbf{K}'\boldsymbol\beta + \mathbf{M}'\mathbf{u}$:

$$\mathbf{K}'\boldsymbol\beta + \mathbf{M}'\mathbf{u} \mid \mathbf{y}, \mathbf{R}, \boldsymbol\Sigma, \boldsymbol\theta_o \sim N\left(\mathbf{K}'\hat{\boldsymbol\beta} + \mathbf{M}'\hat{\mathbf{u}},\; [\mathbf{K}' \; \mathbf{M}']\begin{bmatrix}\mathbf{C}_{\beta\beta} & \mathbf{C}_{\beta u} \\ \mathbf{C}_{u\beta} & \mathbf{C}_{uu}\end{bmatrix}\begin{bmatrix}\mathbf{K} \\ \mathbf{M}\end{bmatrix}\right)$$

With $\mathbf{M} = \mathbf{0}$:

$$\mathbf{K}'\boldsymbol\beta \mid \mathbf{y}, \mathbf{R}, \boldsymbol\Sigma, \boldsymbol\theta_o \sim N(\mathbf{K}'\hat{\boldsymbol\beta},\; \mathbf{K}'\mathbf{C}_{\beta\beta}\mathbf{K})$$
RCBD example with random blocks

• Weight gains of pigs in a feeding trial (Gill, 1978). Block on litters.

Litter  Diet 1  Diet 2  Diet 3  Diet 4  Diet 5
1       79.5    80.9    79.1    88.6    95.9
2       70.9    81.8    70.9    88.6    85.9
3       76.8    86.4    90.5    89.1    83.2
4       75.9    75.5    62.7    91.4    87.7
5       77.3    77.3    69.5    75.0    74.5
6       66.4    73.2    86.4    79.5    72.7
7       59.1    77.7    72.7    85.0    90.9
8       64.1    72.3    73.6    75.9    60.0
9       74.5    81.4    64.5    75.5    83.6
10      67.3    82.3    65.9    70.5    63.2
data rcbd;
  input litter diet1-diet5;
  datalines;
1 79.5 80.9 79.1 88.6 95.9
2 70.9 81.8 70.9 88.6 85.9
3 76.8 86.4 90.5 89.1 83.2
4 75.9 75.5 62.7 91.4 87.7
5 77.3 77.3 69.5 75.0 74.5
6 66.4 73.2 86.4 79.5 72.7
7 59.1 77.7 72.7 85.0 90.9
8 64.1 72.3 73.6 75.9 60.0
9 74.5 81.4 64.5 75.5 83.6
10 67.3 82.3 65.9 70.5 63.2
;

data rcbd_2 (drop=diet1-diet5);
  set rcbd;
  diet = 1; gain=diet1; output;
  diet = 2; gain=diet2; output;
  diet = 3; gain=diet3; output;
  diet = 4; gain=diet4; output;
  diet = 5; gain=diet5; output;
run;
RCBD model

• Linear model: $Y_{ij} = \mu + u_i + \tau_j + e_{ij}$
– Fixed diet effects $\tau_j$
– Random litter effects $u_i$
• Priors on random effects: $u_i \sim NIID(0, \sigma_u^2)$; $e_{ij} \sim NIID(0, \sigma_e^2)$
Posterior inference on $\boldsymbol\beta$ and $\mathbf{u}$ conditional on known VC ($\sigma_e^2 = 50$, $\sigma_u^2 = 20$)

title 'Posterior inference conditional on known VC';
proc mixed data=rcbd_2;
  class litter diet;
  model gain = diet / covb solution;
  random litter;
  parms (20) (50) / hold = 1,2;
  lsmeans diet / diff;
  estimate 'diet 1 lsmean' intercept 1 diet 1 0 0 0 0 / df=10000;
  estimate 'diet 2 lsmean' intercept 1 diet 0 1 0 0 0 / df=10000;
  estimate 'diet1 vs diet2 dif' intercept 0 diet 1 -1 0 0 0 / df=10000;
run;

Each estimate statement returns $\mathbf{k}'\hat{\boldsymbol\beta}$ with variance $\mathbf{k}'\mathbf{C}_{\beta\beta}\mathbf{k}$. "Known" variance, so tests are based on the normal distribution (arbitrarily large df) rather than Student's t.
Portions of output

Solution for Fixed Effects
Effect     diet  Estimate
Intercept        79.76
diet       1     -8.58
diet       2     -0.88
diet       3     -6.18
diet       4      2.15
diet       5      0

Covariance Matrix for Fixed Effects ($\mathbf{C}_{\beta\beta}$; row/column 6, for reference level diet 5, is zero):
Row  Effect     Col1  Col2  Col3  Col4  Col5  Col6
1    Intercept    7.   -5.   -5.   -5.   -5.
2    diet 1      -5.   10.    5.    5.    5.
3    diet 2      -5.    5.   10.    5.    5.
4    diet 3      -5.    5.    5.   10.    5.
5    diet 4      -5.    5.    5.    5.   10.
6    diet 5

$$\boldsymbol\beta \mid \mathbf{y}, \mathbf{R}, \boldsymbol\Sigma, \boldsymbol\theta_o \sim N(\hat{\boldsymbol\beta}, \mathbf{C}_{\beta\beta})$$
Posterior densities of marginal means and contrasts

Label               Estimate  Standard Error  DF   t Value  Pr > |t|
diet 1 lsmean        71.1800  2.6458          1E4  26.90    <.0001
diet 2 lsmean        78.8800  2.6458          1E4  29.81    <.0001
diet1 vs diet2 dif   -7.7000  3.1623          1E4  -2.43    0.0149

Each row reports $\mathbf{k}'\hat{\boldsymbol\beta}$ with standard error $\sqrt{\mathbf{k}'\mathbf{C}_{\beta\beta}\mathbf{k}}$, i.e.

$$\mathbf{k}'\boldsymbol\beta \mid \mathbf{y}, \mathbf{R}, \boldsymbol\Sigma, \boldsymbol\theta_o \sim N(\mathbf{k}'\hat{\boldsymbol\beta},\; \mathbf{k}'\mathbf{C}_{\beta\beta}\mathbf{k})$$
Two stage generalized linear models

• Consider again the probit model for binary data.
– Likelihood function:

$$p(\mathbf{y} \mid \boldsymbol\beta, \mathbf{u}) = \prod_{i=1}^{n} \Phi(\mathbf{x}_i'\boldsymbol\beta + \mathbf{z}_i'\mathbf{u})^{y_i}\left[1 - \Phi(\mathbf{x}_i'\boldsymbol\beta + \mathbf{z}_i'\mathbf{u})\right]^{1-y_i}$$

– Priors:

$$p(\mathbf{u} \mid \sigma_u^2) \propto (\sigma_u^2)^{-q/2}\exp\left(-\frac{1}{2\sigma_u^2}\mathbf{u}'\mathbf{A}^{-1}\mathbf{u}\right), \qquad p(\boldsymbol\beta) \propto \text{constant}$$

– Third stage prior (if $\sigma_u^2$ were not known): $p(\sigma_u^2)$
Joint Posterior Density

• For all parameters (3 stage model: stage 1 × stage 2 × stage 3):

$$p(\boldsymbol\beta, \mathbf{u}, \sigma_u^2 \mid \mathbf{y}) \propto p(\mathbf{y} \mid \boldsymbol\beta, \mathbf{u})\, p(\mathbf{u} \mid \sigma_u^2)\, p(\boldsymbol\beta)\, p(\sigma_u^2) \propto \prod_{i=1}^{n}\Phi(\mathbf{x}_i'\boldsymbol\beta + \mathbf{z}_i'\mathbf{u})^{y_i}\left[1-\Phi(\mathbf{x}_i'\boldsymbol\beta + \mathbf{z}_i'\mathbf{u})\right]^{1-y_i}(\sigma_u^2)^{-q/2}\exp\left(-\frac{\mathbf{u}'\mathbf{A}^{-1}\mathbf{u}}{2\sigma_u^2}\right)p(\sigma_u^2)$$

• Let's condition on $\sigma_u^2$ being known (for now) (2 stage model):

$$p(\boldsymbol\beta, \mathbf{u} \mid \mathbf{y}, \sigma_u^2) \propto \prod_{i=1}^{n}\Phi(\mathbf{x}_i'\boldsymbol\beta + \mathbf{z}_i'\mathbf{u})^{y_i}\left[1-\Phi(\mathbf{x}_i'\boldsymbol\beta + \mathbf{z}_i'\mathbf{u})\right]^{1-y_i}\exp\left(-\frac{\mathbf{u}'\mathbf{A}^{-1}\mathbf{u}}{2\sigma_u^2}\right)$$
Log Joint Posterior Density

• Log joint posterior = log likelihood + log prior: $L = L_1 + L_2$, where

$$L_1 = \sum_{i=1}^{n}\left\{y_i \log \Phi(\mathbf{x}_i'\boldsymbol\beta + \mathbf{z}_i'\mathbf{u}) + (1-y_i)\log\left[1-\Phi(\mathbf{x}_i'\boldsymbol\beta + \mathbf{z}_i'\mathbf{u})\right]\right\}$$

$$L_2 = -\frac{1}{2\sigma_u^2}\mathbf{u}'\mathbf{A}^{-1}\mathbf{u}$$
Maximize the joint posterior density w.r.t. $\boldsymbol\theta = [\boldsymbol\beta' \; \mathbf{u}']'$

• i.e., compute the joint posterior mode of $\boldsymbol\theta$; analogous to pseudo-likelihood (PL) inference in PROC GLIMMIX (also penalized likelihood).
• Fisher scoring/Newton-Raphson, writing $\eta_i = \mathbf{x}_i'\boldsymbol\beta + \mathbf{z}_i'\mathbf{u}$, $v_i = \partial \log p(y_i \mid \eta_i)/\partial\eta_i$, and $w_i = -E\left[\partial^2 \log p(y_i \mid \eta_i)/\partial\eta_i^2\right]$ (refer to §❶ for details on $\mathbf{v}$ and $\mathbf{W}$):

$$\frac{\partial L}{\partial\boldsymbol\theta} = \begin{bmatrix}\mathbf{X}'\mathbf{v} \\ \mathbf{Z}'\mathbf{v} - \mathbf{G}^{-1}\mathbf{u}\end{bmatrix}, \qquad E\left[-\frac{\partial^2 L}{\partial\boldsymbol\theta\,\partial\boldsymbol\theta'}\right] = \begin{bmatrix}\mathbf{X}'\mathbf{W}\mathbf{X} & \mathbf{X}'\mathbf{W}\mathbf{Z} \\ \mathbf{Z}'\mathbf{W}\mathbf{X} & \mathbf{Z}'\mathbf{W}\mathbf{Z} + \mathbf{G}^{-1}\end{bmatrix}$$

$$\hat{\boldsymbol\theta}^{[t+1]} = \hat{\boldsymbol\theta}^{[t]} + \left(E\left[-\frac{\partial^2 L}{\partial\boldsymbol\theta\,\partial\boldsymbol\theta'}\right]\right)^{-1}\frac{\partial L}{\partial\boldsymbol\theta}\bigg|_{\boldsymbol\theta = \hat{\boldsymbol\theta}^{[t]}}$$
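The scoring iteration can be sketched for a plain fixed-effects probit (my own toy data; the GLMM version adds the $\mathbf{Z}'\mathbf{W}\mathbf{Z} + \mathbf{G}^{-1}$ block and the $-\mathbf{G}^{-1}\mathbf{u}$ term in the score, exactly as above):

```python
import math

def Phi(x):  # standard normal cdf
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def phi(x):  # standard normal density
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

# Tiny probit fit by Fisher scoring (intercept + one covariate).
x = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0]
y = [0, 0, 0, 1, 0, 1, 1]
beta = [0.0, 0.0]

for _ in range(25):
    eta = [beta[0] + beta[1] * xi for xi in x]
    p = [min(max(Phi(e), 1e-10), 1 - 1e-10) for e in eta]
    # v and W as in the scoring equations: score = X'v, info = X'WX
    v = [(yi - pi) * phi(e) / (pi * (1 - pi)) for yi, pi, e in zip(y, p, eta)]
    w = [phi(e) ** 2 / (pi * (1 - pi)) for pi, e in zip(p, eta)]
    s0, s1 = sum(v), sum(vi * xi for vi, xi in zip(v, x))
    i00 = sum(w)
    i01 = sum(wi * xi for wi, xi in zip(w, x))
    i11 = sum(wi * xi * xi for wi, xi in zip(w, x))
    det = i00 * i11 - i01 * i01        # invert the 2x2 information matrix
    beta = [beta[0] + (i11 * s0 - i01 * s1) / det,
            beta[1] + (-i01 * s0 + i00 * s1) / det]

print(beta)  # converged probit estimates; the score is ~0 here
```

At convergence the score vector vanishes, which is the fixed-effects analogue of the stationarity condition for the joint posterior mode.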
PL (or approximate EB)

• Then each iterate solves

$$\begin{bmatrix}\mathbf{X}'\mathbf{W}\mathbf{X} & \mathbf{X}'\mathbf{W}\mathbf{Z} \\ \mathbf{Z}'\mathbf{W}\mathbf{X} & \mathbf{Z}'\mathbf{W}\mathbf{Z} + \mathbf{G}^{-1}\end{bmatrix}_{\hat{\boldsymbol\beta}^{[t]},\hat{\mathbf{u}}^{[t]}}\begin{bmatrix}\hat{\boldsymbol\beta}^{[t+1]} - \hat{\boldsymbol\beta}^{[t]} \\ \hat{\mathbf{u}}^{[t+1]} - \hat{\mathbf{u}}^{[t]}\end{bmatrix} = \begin{bmatrix}\mathbf{X}'\mathbf{v} \\ \mathbf{Z}'\mathbf{v} - \mathbf{G}^{-1}\hat{\mathbf{u}}^{[t]}\end{bmatrix}_{\hat{\boldsymbol\beta}^{[t]},\hat{\mathbf{u}}^{[t]}}$$

• At convergence,

$$\boldsymbol\beta, \mathbf{u} \mid \mathbf{y}, \sigma_u^2 \overset{approx}{\sim} N\left(\begin{bmatrix}\hat{\boldsymbol\beta} \\ \hat{\mathbf{u}}\end{bmatrix}, \begin{bmatrix}\mathbf{C}_{\beta\beta} & \mathbf{C}_{\beta u} \\ \mathbf{C}_{u\beta} & \mathbf{C}_{uu}\end{bmatrix}\right), \qquad \begin{bmatrix}\mathbf{C}_{\beta\beta} & \mathbf{C}_{\beta u} \\ \mathbf{C}_{u\beta} & \mathbf{C}_{uu}\end{bmatrix} = \begin{bmatrix}\mathbf{X}'\mathbf{W}\mathbf{X} & \mathbf{X}'\mathbf{W}\mathbf{Z} \\ \mathbf{Z}'\mathbf{W}\mathbf{X} & \mathbf{Z}'\mathbf{W}\mathbf{Z} + \mathbf{G}^{-1}\end{bmatrix}^{-1}$$

and hence

$$\mathbf{K}'\boldsymbol\beta + \mathbf{M}'\mathbf{u} \mid \mathbf{y}, \sigma_u^2 \overset{approx}{\sim} N\left(\mathbf{K}'\hat{\boldsymbol\beta} + \mathbf{M}'\hat{\mathbf{u}},\; [\mathbf{K}' \; \mathbf{M}']\begin{bmatrix}\mathbf{C}_{\beta\beta} & \mathbf{C}_{\beta u} \\ \mathbf{C}_{u\beta} & \mathbf{C}_{uu}\end{bmatrix}\begin{bmatrix}\mathbf{K} \\ \mathbf{M}\end{bmatrix}\right)$$
How software typically sets this up: the update above can be written as

$$\begin{bmatrix}\mathbf{X}'\mathbf{W}\mathbf{X} & \mathbf{X}'\mathbf{W}\mathbf{Z} \\ \mathbf{Z}'\mathbf{W}\mathbf{X} & \mathbf{Z}'\mathbf{W}\mathbf{Z} + \mathbf{G}^{-1}\end{bmatrix}\begin{bmatrix}\hat{\boldsymbol\beta}^{[t+1]} \\ \hat{\mathbf{u}}^{[t+1]}\end{bmatrix} = \begin{bmatrix}\mathbf{X}'\mathbf{W}\mathbf{y}^* \\ \mathbf{Z}'\mathbf{W}\mathbf{y}^*\end{bmatrix}$$

with $\mathbf{y}^* = \mathbf{X}\hat{\boldsymbol\beta}^{[t]} + \mathbf{Z}\hat{\mathbf{u}}^{[t]} + \mathbf{W}^{-1}\mathbf{v}$ being "pseudo-variates".
Application

• Go back to the same RCBD; suppose we binarize the data:

data binarize;
  set rcbd_2;
  y = (gain>75);
run;
RCBD example with random blocks

• Weight gains of pigs binarized at 75 (y = 1 if gain > 75, else 0). Block on litters.

Litter  Diet 1  Diet 2  Diet 3  Diet 4  Diet 5
1       1       1       1       1       1
2       0       1       0       1       1
3       1       1       1       1       1
4       1       1       0       1       1
5       1       1       0       0       0
6       0       0       1       1       0
7       0       1       0       1       1
8       0       0       0       1       0
9       0       1       0       1       1
10      0       1       0       0       0
PL inference using GLIMMIX code (known VC)

title 'Posterior inference conditional on known VC';
proc glimmix data=binarize;
  class litter diet;
  model y = diet / covb solution dist=bin link=probit;
  random litter;
  parms (0.5) / hold = 1;
  lsmeans diet / diff ilink;
  estimate 'diet 1 lsmean' intercept 1 diet 1 0 0 0 0 / df=10000 ilink;
  estimate 'diet 2 lsmean' intercept 1 diet 0 1 0 0 0 / df=10000 ilink;
  estimate 'diet1 vs diet2 dif' intercept 0 diet 1 -1 0 0 0 / df=10000;
run;

$\mathbf{k}'\hat{\boldsymbol\beta}$: estimate on the underlying normal scale; $\Phi(\mathbf{k}'\hat{\boldsymbol\beta})$: estimated probability of success.
Solutions for Fixed Effects
Effect     diet  Estimate  Standard Error
Intercept         0.3097   0.4772
diet       1     -0.5935   0.5960
diet       2      0.6761   0.6408
diet       3     -0.9019   0.6104
diet       4      0.6775   0.6410
diet       5      0        .

$$\mathbf{k}'\hat{\boldsymbol\beta} = \begin{bmatrix}1 & 1 & 0 & 0 & 0 & 0\end{bmatrix}\begin{bmatrix}0.3097 \\ -0.5935 \\ 0.6761 \\ -0.9019 \\ 0.6775 \\ 0\end{bmatrix} = -0.2838$$
Covariance Matrix for Fixed Effects ($\mathbf{C}_{\beta\beta}$; row/column 6, for reference level diet 5, is zero):
Effect     diet  Row  Col1     Col2     Col3     Col4     Col5     Col6
Intercept        1    0.2277   -0.1778  -0.1766  -0.1782  -0.1766
diet       1     2   -0.1778    0.3552   0.1760   0.1787   0.1760
diet       2     3   -0.1766    0.1760   0.4107   0.1755   0.1784
diet       3     4   -0.1782    0.1787   0.1755   0.3725   0.1755
diet       4     5   -0.1766    0.1760   0.1784   0.1755   0.4109
diet       5     6
Delta method: how well does this generally work?

Estimates
Label               Estimate  Standard Error  DF     t Value  Pr > |t|  Mean     Standard Error Mean
diet 1 lsmean       -0.2838   0.4768          10000  -0.60    0.5517    0.3883   0.1827
diet 2 lsmean        0.9858   0.5341          10000   1.85    0.0650    0.8379   0.1311
diet1 vs diet2 dif  -1.2697   0.6433          10000  -1.97    0.0484    Non-est  .

Each estimate is $\mathbf{k}'\hat{\boldsymbol\beta}$ with standard error $\sqrt{\mathbf{k}'\mathbf{C}_{\beta\beta}\mathbf{k}}$, and, with $\Phi$ the standard normal cdf and $\phi$ its density,

$$\widehat{\text{Prob}}(\text{diet 1}) = \Phi(-0.2838) = 0.3883$$

$$se\left(\Phi(\mathbf{k}'\hat{\boldsymbol\beta})\right) \approx \phi(\mathbf{k}'\hat{\boldsymbol\beta})\, se(\mathbf{k}'\hat{\boldsymbol\beta}) = \phi(-0.2838)\times 0.4768 = 0.1827$$
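The back-transformed mean and its delta-method standard error can be reproduced directly from the output above (a minimal sketch using only the standard library):

```python
import math

Phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))  # normal cdf
phi = lambda x: math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)  # density

est, se = -0.2838, 0.4768            # diet 1 lsmean and its SE from the output
prob = Phi(est)                      # back-transformed probability
se_prob = phi(est) * se              # delta method: se(Phi(x)) ~= phi(x) * se(x)
print(round(prob, 4), round(se_prob, 4))  # 0.3883 0.1827
```

This matches the Mean and Standard Error Mean columns reported by GLIMMIX's ilink option.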
What if variance components are not known?

• Given $\boldsymbol\theta_1 = (\boldsymbol\beta, \mathbf{u})$ (fixed and random effects) and $\boldsymbol\theta_2 = (\sigma_u^2, \sigma_e^2)$ (variance components):
• Two stage (known VC): $p(\boldsymbol\theta_1 \mid \boldsymbol\theta_2, \mathbf{y})$
• Three stage (unknown VC): $p(\boldsymbol\theta_1, \boldsymbol\theta_2 \mid \mathbf{y}) \propto p(\mathbf{y} \mid \boldsymbol\theta_1, \boldsymbol\theta_2)\, p(\boldsymbol\theta_1 \mid \boldsymbol\theta_2)\, p(\boldsymbol\theta_2)$
• Fully Bayesian inference on $\boldsymbol\theta_1$?

$$p(\boldsymbol\theta_1 \mid \mathbf{y}) = \int_{\boldsymbol\theta_2} p(\boldsymbol\theta_1 \mid \boldsymbol\theta_2, \mathbf{y})\, p(\boldsymbol\theta_2 \mid \mathbf{y})\, d\boldsymbol\theta_2$$

NOT POSSIBLE PRE-1990s.
Approximate empirical Bayes (EB) option

• Goal: approximate $p(\boldsymbol\theta_1 \mid \mathbf{y})$.
• Note:

$$p(\boldsymbol\theta_1 \mid \mathbf{y}) = \int_{R_{\boldsymbol\theta_2}} p(\boldsymbol\theta_1 \mid \boldsymbol\theta_2, \mathbf{y})\, p(\boldsymbol\theta_2 \mid \mathbf{y})\, d\boldsymbol\theta_2 = E_{\boldsymbol\theta_2 \mid \mathbf{y}}\left[p(\boldsymbol\theta_1 \mid \boldsymbol\theta_2, \mathbf{y})\right]$$

• i.e., $p(\boldsymbol\theta_1 \mid \mathbf{y})$ can be viewed as a "weighted" average of the conditional densities $p(\boldsymbol\theta_1 \mid \boldsymbol\theta_2, \mathbf{y})$, the weight function being $p(\boldsymbol\theta_2 \mid \mathbf{y})$.
Bayesian justification of REML

• With flat priors on $\boldsymbol\theta_2$, maximizing $p(\boldsymbol\theta_2 \mid \mathbf{y})$ with respect to $\boldsymbol\theta_2$ yields the REML estimate $\hat{\boldsymbol\theta}_2$ of $\boldsymbol\theta_2$ (Harville, 1974).
• If $p(\boldsymbol\theta_2 \mid \mathbf{y})$ is (nearly) symmetric, then $p(\boldsymbol\theta_1 \mid \mathbf{y}) \approx p(\boldsymbol\theta_1 \mid \hat{\boldsymbol\theta}_2, \mathbf{y})$ is a reasonable approximation, with perhaps one important exception: $\mathrm{var}(\boldsymbol\theta_1 \mid \mathbf{y}) \geq \mathrm{var}(\boldsymbol\theta_1 \mid \hat{\boldsymbol\theta}_2, \mathbf{y})$.
• This is what PROC MIXED (GLIMMIX) essentially does by default (REML/RSPL/RMPL)!
Back to the Linear Mixed Model

• Suppose a three stage density with flat priors on $\boldsymbol\beta$, $\sigma_u^2$, and $\sigma_e^2$.
• Then

$$p(\sigma_u^2, \sigma_e^2 \mid \mathbf{y}) = \int_{R_{\boldsymbol\beta}}\int_{R_{\mathbf{u}}} p(\boldsymbol\beta, \mathbf{u}, \sigma_u^2, \sigma_e^2 \mid \mathbf{y})\, d\mathbf{u}\, d\boldsymbol\beta$$

is the posterior marginal density of interest; maximize it w.r.t. $\sigma_u^2$ and $\sigma_e^2$ to get REML estimates.
• Empirical Bayes strategy: plug the REML estimates into $\hat{\mathbf{R}} = \mathbf{I}\hat{\sigma}_e^2$ and $\hat{\mathbf{G}} = \mathbf{A}\hat{\sigma}_u^2$ to approximate:

$$\boldsymbol\beta, \mathbf{u} \mid \hat{\sigma}_e^2, \hat{\sigma}_u^2, \mathbf{y} \sim N\left(\begin{bmatrix}\hat{\boldsymbol\beta} \\ \hat{\mathbf{u}}\end{bmatrix}, \begin{bmatrix}\mathbf{X}'\hat{\mathbf{R}}^{-1}\mathbf{X} & \mathbf{X}'\hat{\mathbf{R}}^{-1}\mathbf{Z} \\ \mathbf{Z}'\hat{\mathbf{R}}^{-1}\mathbf{X} & \mathbf{Z}'\hat{\mathbf{R}}^{-1}\mathbf{Z} + \hat{\mathbf{G}}^{-1}\end{bmatrix}^{-1}\right)$$
What is ML estimation of VC from a Bayesian perspective?

• Determine the "marginal likelihood"

$$p(\mathbf{y} \mid \boldsymbol\beta, \sigma_u^2, \sigma_e^2) = \int_{R_{\mathbf{u}}} p(\mathbf{y} \mid \boldsymbol\beta, \mathbf{u}, \sigma_e^2)\, p(\mathbf{u} \mid \sigma_u^2)\, d\mathbf{u}$$

• Maximize this with respect to $\boldsymbol\beta$, $\sigma_u^2$, and $\sigma_e^2$ to get ML estimates of all three, assuming flat priors on all three.
• This is essentially what PROC MIXED (GLIMMIX) does with ML (MSPL/MMPL)!
Approximate empirical Bayes inference

Given

$$\boldsymbol\beta, \mathbf{u} \mid \hat{\sigma}_e^2, \hat{\sigma}_u^2, \mathbf{y} \overset{approx}{\sim} N\left(\begin{bmatrix}\hat{\boldsymbol\beta} \\ \hat{\mathbf{u}}\end{bmatrix}, \begin{bmatrix}\mathbf{C}_{\beta\beta} & \mathbf{C}_{\beta u} \\ \mathbf{C}_{u\beta} & \mathbf{C}_{uu}\end{bmatrix}\right)$$

where

$$\begin{bmatrix}\mathbf{C}_{\beta\beta} & \mathbf{C}_{\beta u} \\ \mathbf{C}_{u\beta} & \mathbf{C}_{uu}\end{bmatrix} = \begin{bmatrix}\mathbf{X}'\hat{\mathbf{R}}^{-1}\mathbf{X} & \mathbf{X}'\hat{\mathbf{R}}^{-1}\mathbf{Z} \\ \mathbf{Z}'\hat{\mathbf{R}}^{-1}\mathbf{X} & \mathbf{Z}'\hat{\mathbf{R}}^{-1}\mathbf{Z} + \hat{\mathbf{G}}^{-1}\end{bmatrix}^{-1}$$

then

$$\mathbf{k}'\boldsymbol\beta \mid \mathbf{y}, \hat{\mathbf{R}}, \hat{\mathbf{G}} \overset{approx}{\sim} N(\mathbf{k}'\hat{\boldsymbol\beta},\; \mathbf{k}'\mathbf{C}_{\beta\beta}\mathbf{k})$$
Approximate "REML" (PQL) analysis for GLMMs

• With $\boldsymbol\theta_1 = (\boldsymbol\beta, \mathbf{u})$ (fixed and random effects) and $\boldsymbol\theta_2 = (\sigma_u^2, \sigma_e^2)$ (variance components), $p(\boldsymbol\theta_2 \mid \mathbf{y})$ is not known analytically for a GLMM and must be approximated, e.g., by the residual pseudo-likelihood method (RSPL/MSPL) in SAS PROC GLIMMIX.
• First approximations were proposed by Stiratelli et al. (1984) and Harville and Mee (1984).
Other methods for estimating VC in PROC GLIMMIX

• Based on maximizing the marginal likelihood

$$p(\mathbf{y} \mid \boldsymbol\beta, \sigma_u^2, \sigma_e^2) = \int_{R_{\mathbf{u}}} p(\mathbf{y} \mid \boldsymbol\beta, \mathbf{u}, \sigma_e^2)\, p(\mathbf{u} \mid \sigma_u^2)\, d\mathbf{u}$$

(more ML-like rather than REML-like!)
• Method = QUAD: adaptive quadrature; exact, but useful only for simple models.
• Method = LAPLACE: generally a better approximation to the marginal likelihood than MMPL/MSPL, and computationally more efficient than QUAD.
Could we have considered a "residual/restricted" Laplace instead?

• i.e., maximize an approximation of

$$p(\sigma_u^2, \sigma_e^2 \mid \mathbf{y}) = \int_{R_{\boldsymbol\beta}}\int_{R_{\mathbf{u}}} p(\boldsymbol\beta, \mathbf{u}, \sigma_u^2, \sigma_e^2 \mid \mathbf{y})\, d\mathbf{u}\, d\boldsymbol\beta$$

with respect to the variance components, i.e., maximize the Laplace approximation

$$\log p_{PL}(\sigma_u^2, \sigma_e^2 \mid \mathbf{y}) = \log p(\hat{\boldsymbol\beta}, \hat{\mathbf{u}}, \sigma_u^2, \sigma_e^2 \mid \mathbf{y}) - 0.5 \log\left|-\frac{\partial^2 \log p(\boldsymbol\beta, \mathbf{u}, \sigma_u^2, \sigma_e^2 \mid \mathbf{y})}{\partial(\boldsymbol\beta, \mathbf{u})\,\partial(\boldsymbol\beta, \mathbf{u})'}\right|_{\boldsymbol\beta = \hat{\boldsymbol\beta},\, \mathbf{u} = \hat{\mathbf{u}}}$$

Tempelman and Gianola (1993; 1996); Tempelman (1998); Wolfinger (1993). Premise: "REML" is generally less biased than "ML".
Ordinal Categorical Data

• Recall the threshold concept; let's extend it to the mixed model. With latent liabilities $\boldsymbol\ell = \mathbf{X}\boldsymbol\beta + \mathbf{Z}\mathbf{u} + \mathbf{e}$, the observed category is

$$y_i = m \quad \text{if} \quad \tau_{m-1} < \ell_i \leq \tau_m, \qquad m = 1, 2, \ldots, M\ \ (\tau_0 = -\infty,\ \tau_M = +\infty)$$

with $\mathbf{u} \sim N(\mathbf{0}, \mathbf{A}\sigma_u^2)$ and $\mathbf{e} \mid \sigma_e^2 \sim N(\mathbf{0}, \mathbf{I}\sigma_e^2)$.
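The threshold mapping can be demonstrated by simulation (thresholds, mean, and sample size here are my own illustrative choices; the residual variance is fixed at 1, as is conventional for the probit threshold model):

```python
import bisect
import random

random.seed(2)
thresholds = [-0.5, 0.8]          # tau_1 < tau_2 for a 3-category trait

def categorize(liability):
    # y = m if tau_{m-1} < liability <= tau_m, with tau_0 = -inf, tau_m = +inf
    return bisect.bisect_left(thresholds, liability) + 1

# Latent liability: l = x'beta + z'u + e, with linear predictor 0.3 and e ~ N(0, 1)
sample = [categorize(random.gauss(0.3, 1.0)) for _ in range(10000)]
counts = [sample.count(c) / len(sample) for c in (1, 2, 3)]
print(counts)  # close to Phi(-0.8), Phi(0.5)-Phi(-0.8), 1-Phi(0.5)
```

The empirical category frequencies approach the normal-cdf probabilities implied by the thresholds, which is exactly the likelihood used on the next slide.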
Joint posterior density

• Likelihood:

$$p(\mathbf{y} \mid \boldsymbol\beta, \mathbf{u}, \boldsymbol\tau) = \prod_{i=1}^{n} \mathrm{Prob}(Y_i = y_i \mid \boldsymbol\beta, \mathbf{u}, \boldsymbol\tau) = \prod_{i=1}^{n}\prod_{k=1}^{m}\left[\Phi(\tau_k - \mathbf{x}_i'\boldsymbol\beta - \mathbf{z}_i'\mathbf{u}) - \Phi(\tau_{k-1} - \mathbf{x}_i'\boldsymbol\beta - \mathbf{z}_i'\mathbf{u})\right]^{I(y_i = k)}$$

where $\mathbf{X} = [\mathbf{x}_1 \; \mathbf{x}_2 \; \cdots \; \mathbf{x}_n]'$ and $\mathbf{Z} = [\mathbf{z}_1 \; \mathbf{z}_2 \; \cdots \; \mathbf{z}_n]'$.

• Priors:

$$p(\mathbf{u} \mid \sigma_u^2) \propto (\sigma_u^2)^{-q/2}\exp\left(-\frac{1}{2\sigma_u^2}\mathbf{u}'\mathbf{A}^{-1}\mathbf{u}\right), \qquad p(\boldsymbol\beta) \propto \text{constant}, \qquad p(\boldsymbol\tau) \propto \text{constant},\ \tau_1 < \tau_2 < \cdots < \tau_{m-1}, \qquad p(\sigma_u^2)$$
Inference given "known" $\sigma_u^2$

• Then

$$p(\boldsymbol\beta, \mathbf{u}, \boldsymbol\tau \mid \mathbf{y}, \sigma_u^2) \propto p(\mathbf{y} \mid \boldsymbol\beta, \mathbf{u}, \boldsymbol\tau)\, p(\mathbf{u} \mid \sigma_u^2) \propto \prod_{i=1}^{n}\prod_{k=1}^{m}\left[\Phi(\tau_k - \mathbf{x}_i'\boldsymbol\beta - \mathbf{z}_i'\mathbf{u}) - \Phi(\tau_{k-1} - \mathbf{x}_i'\boldsymbol\beta - \mathbf{z}_i'\mathbf{u})\right]^{I(y_i = k)}\exp\left(-\frac{\mathbf{u}'\mathbf{A}^{-1}\mathbf{u}}{2\sigma_u^2}\right)$$

• Use Fisher scoring to estimate $\boldsymbol\theta = [\boldsymbol\tau' \; \boldsymbol\beta' \; \mathbf{u}']'$, with $L$ the log joint posterior:

$$\hat{\boldsymbol\theta}^{[t+1]} = \hat{\boldsymbol\theta}^{[t]} + \left(E\left[-\frac{\partial^2 L(\boldsymbol\theta \mid \mathbf{y})}{\partial\boldsymbol\theta\,\partial\boldsymbol\theta'}\right]\right)^{-1}\frac{\partial L(\boldsymbol\theta \mid \mathbf{y})}{\partial\boldsymbol\theta}\bigg|_{\boldsymbol\theta = \hat{\boldsymbol\theta}^{[t]}}$$
Joint Posterior Mode

• Let $L = L_1 + L_2$, where

$$L_1 = \sum_{i=1}^{n}\sum_{k=1}^{m} I(y_i = k)\log\left[\Phi(\tau_k - \mathbf{x}_i'\boldsymbol\beta - \mathbf{z}_i'\mathbf{u}) - \Phi(\tau_{k-1} - \mathbf{x}_i'\boldsymbol\beta - \mathbf{z}_i'\mathbf{u})\right], \qquad L_2 = -\frac{1}{2\sigma_u^2}\mathbf{u}'\mathbf{A}^{-1}\mathbf{u}$$

• First derivatives, with $\eta_i = \mathbf{x}_i'\boldsymbol\beta + \mathbf{z}_i'\mathbf{u}$:

$$\frac{\partial L}{\partial\boldsymbol\beta} = \frac{\partial L_1}{\partial\boldsymbol\beta} + \mathbf{0}_{p\times 1} = \mathbf{X}'\mathbf{v}, \qquad \frac{\partial L}{\partial\mathbf{u}} = \frac{\partial L_1}{\partial\mathbf{u}} + \frac{\partial L_2}{\partial\mathbf{u}} = \mathbf{Z}'\mathbf{v} - \mathbf{G}^{-1}\mathbf{u}, \qquad \frac{\partial L}{\partial\boldsymbol\tau} = \frac{\partial L_1}{\partial\boldsymbol\tau} + \mathbf{0}_{(m-1)\times 1} = \mathbf{p}$$

See §❶ for details on $\mathbf{p}$ and $\mathbf{v}$.
Second derivatives

• Now the expected negative Hessian has the partitioned form

$$E\left[-\frac{\partial^2 L}{\partial\boldsymbol\theta\,\partial\boldsymbol\theta'}\right] = E\left[-\frac{\partial^2 L_1}{\partial\boldsymbol\theta\,\partial\boldsymbol\theta'}\right] + E\left[-\frac{\partial^2 L_2}{\partial\boldsymbol\theta\,\partial\boldsymbol\theta'}\right] = \begin{bmatrix}\mathbf{T} & \mathbf{L}'\mathbf{X} & \mathbf{L}'\mathbf{Z} \\ \mathbf{X}'\mathbf{L} & \mathbf{X}'\mathbf{W}\mathbf{X} & \mathbf{X}'\mathbf{W}\mathbf{Z} \\ \mathbf{Z}'\mathbf{L} & \mathbf{Z}'\mathbf{W}\mathbf{X} & \mathbf{Z}'\mathbf{W}\mathbf{Z}\end{bmatrix} + \begin{bmatrix}\mathbf{0} & \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} & \mathbf{G}^{-1}\end{bmatrix} = \begin{bmatrix}\mathbf{T} & \mathbf{L}'\mathbf{X} & \mathbf{L}'\mathbf{Z} \\ \mathbf{X}'\mathbf{L} & \mathbf{X}'\mathbf{W}\mathbf{X} & \mathbf{X}'\mathbf{W}\mathbf{Z} \\ \mathbf{Z}'\mathbf{L} & \mathbf{Z}'\mathbf{W}\mathbf{X} & \mathbf{Z}'\mathbf{W}\mathbf{Z} + \mathbf{G}^{-1}\end{bmatrix}$$

See §❶ for details on $\mathbf{T}$, $\mathbf{L}$, and $\mathbf{W}$.
Fisher's scoring

• So

$$\begin{bmatrix}\mathbf{T} & \mathbf{L}'\mathbf{X} & \mathbf{L}'\mathbf{Z} \\ \mathbf{X}'\mathbf{L} & \mathbf{X}'\mathbf{W}\mathbf{X} & \mathbf{X}'\mathbf{W}\mathbf{Z} \\ \mathbf{Z}'\mathbf{L} & \mathbf{Z}'\mathbf{W}\mathbf{X} & \mathbf{Z}'\mathbf{W}\mathbf{Z} + \mathbf{G}^{-1}\end{bmatrix}_{\hat{\boldsymbol\theta}^{[t]}}\begin{bmatrix}\hat{\boldsymbol\tau}^{[t+1]} - \hat{\boldsymbol\tau}^{[t]} \\ \hat{\boldsymbol\beta}^{[t+1]} - \hat{\boldsymbol\beta}^{[t]} \\ \hat{\mathbf{u}}^{[t+1]} - \hat{\mathbf{u}}^{[t]}\end{bmatrix} = \begin{bmatrix}\mathbf{p} \\ \mathbf{X}'\mathbf{v} \\ \mathbf{Z}'\mathbf{v} - \mathbf{G}^{-1}\hat{\mathbf{u}}^{[t]}\end{bmatrix}_{\hat{\boldsymbol\theta}^{[t]}}$$

• At convergence:

$$\widehat{\mathrm{var}}\begin{bmatrix}\hat{\boldsymbol\tau} \\ \hat{\boldsymbol\beta} \\ \hat{\mathbf{u}}\end{bmatrix} \approx \begin{bmatrix}\mathbf{T} & \mathbf{L}'\mathbf{X} & \mathbf{L}'\mathbf{Z} \\ \mathbf{X}'\mathbf{L} & \mathbf{X}'\mathbf{W}\mathbf{X} & \mathbf{X}'\mathbf{W}\mathbf{Z} \\ \mathbf{Z}'\mathbf{L} & \mathbf{Z}'\mathbf{W}\mathbf{X} & \mathbf{Z}'\mathbf{W}\mathbf{Z} + \mathbf{G}^{-1}\end{bmatrix}^{-1}_{\hat{\boldsymbol\theta}}$$

Full details in GF (1983).
Recall GF83 Data

H A G S Y    H A G S Y    H A G S Y
1 2 M 1 1    1 2 F 1 1    1 3 M 1 1
1 2 F 2 2    1 3 M 2 1    1 3 M 2 3
1 3 F 2 1    1 3 F 2 1    1 3 F 2 1
1 2 M 3 1    1 2 M 3 2    1 3 F 3 2
1 3 M 3 1    2 2 F 1 1    2 2 F 1 1
2 2 M 1 1    2 3 M 1 3    2 2 F 2 1
2 2 F 2 3    2 3 M 2 1    2 2 F 3 2
2 3 M 3 3    2 2 M 4 2    2 2 F 4 1
2 3 F 4 1    2 3 F 4 1    2 3 M 4 1
2 3 M 4 1

H: Herd (1 or 2)
A: Age of dam (2 = young heifer, 3 = older cow)
G: Gender or sex (M and F)
S: Sire of calf (1, 2, 3, or 4)
Y: Ordinal response (1, 2, or 3)
SAS data step:

data gf83;
  input herdyear dam_age calfsex $ sire y @@;
  if herdyear = 2 then hy = 1;  /* create dummy variables */
  else hy = 0;
  if dam_age = 3 then age = 1; else age = 0;
  if calfsex = 'F' then sex = 1; else sex = 0;
  datalines;
1 2 M 1 1  1 2 F 1 1
etc.
Reproducing analyses in GF83 (based on created dummy variables), with $\sigma_u^2 = 1/19 = 0.05263158$ (as chosen by GF83):

ods select parameterestimates estimates;
proc glimmix data=gf83;
  model y = hy age sex / dist = mult link = cumprobit solution;
  random sire / solution;
  parms (0.05263158) / noiter;
  estimate 'fem marginal mean cat. 1' intercept 1 0 hy 0.5 age 0.5 sex 1 / ilink;
  estimate 'fem marginal mean cat. 2' intercept 0 1 hy 0.5 age 0.5 sex 1 / ilink;
  estimate 'male marginal mean cat. 1' intercept 1 0 hy 0.5 age 0.5 sex 0 / ilink;
  estimate 'male marginal mean cat. 2' intercept 0 1 hy 0.5 age 0.5 sex 0 / ilink;
run;

The estimate statements compute $\mathrm{Prob}(Y \leq c) = \Phi(\tau_c - \mathbf{k}'\boldsymbol\beta)$, with $\mathbf{k}' = [0.5\ \ 0.5\ \ 1]$ for females and $\mathbf{k}' = [0.5\ \ 0.5\ \ 0]$ for males.
Solutions for Fixed Effects
Effect     y  Estimate  Standard Error  DF  t Value  Pr > |t|
Intercept  1   0.3755   0.5580          3    0.67    0.5492
Intercept  2   1.0115   0.5789          3    1.75    0.1789
hy            -0.2975   0.4950          20  -0.60    0.5546
age            0.1269   0.4987          20   0.25    0.8017
sex            0.3906   0.4967          20   0.79    0.4409

Estimates
Label                        Estimate  Standard Error  DF  t Value  Pr > |t|  Mean    Standard Error Mean
female marginal mean cat. 1  0.6808    0.3829          20  1.78     0.0906    0.7520  0.1212
female marginal mean cat. 2  1.3168    0.4249          20  3.10     0.0057    0.9060  0.07123
male marginal mean cat. 1    0.2902    0.3607          20  0.80     0.4305    0.6142  0.1380
male marginal mean cat. 2    0.9262    0.3902          20  2.37     0.0277    0.8228  0.1014

REPRODUCED IN GIANOLA AND FOULLEY (1983)
Reproducing analyses in GF83 (alternative using a less than full rank classification model):

ods select parameterestimates estimates;
proc glimmix data=gf83;
  class sire herdyear dam_age calfsex;
  model y = herdyear dam_age calfsex / dist = mult link = cumprobit solution;
  random sire / solution;
  parms (0.05263158) / noiter;
  estimate 'female marginal mean cat. 1' intercept 1 0 herdyear 0.5 0.5 dam_age 0.5 0.5 calfsex 1 0 / ilink;
  estimate 'female marginal mean cat. 2' intercept 0 1 herdyear 0.5 0.5 dam_age 0.5 0.5 calfsex 1 0 / ilink;
  estimate 'male marginal mean cat. 1' intercept 1 0 herdyear 0.5 0.5 dam_age 0.5 0.5 calfsex 0 1 / ilink;
  estimate 'male marginal mean cat. 2' intercept 0 1 herdyear 0.5 0.5 dam_age 0.5 0.5 calfsex 0 1 / ilink;
run;

$\mathrm{Prob}(Y \leq c) = \Phi(\tau_c - \mathbf{k}'\boldsymbol\beta)$, with $\mathbf{k}' = [0.5\ \ 0.5\ \ 0.5\ \ 0.5\ \ 1\ \ 0]$ for females and $\mathbf{k}' = [0.5\ \ 0.5\ \ 0.5\ \ 0.5\ \ 0\ \ 1]$ for males.
Solutions for Fixed Effects
Effect       Level  Estimate  Standard Error  DF  t Value  Pr > |t|
Intercept 1          0.2050   0.4734          3    0.43    0.6943
Intercept 2          0.8409   0.4946          3    1.70    0.1876
herdyear     1       0.2975   0.4950          20   0.60    0.5546
herdyear     2       0        .               .    .       .
dam_age      2      -0.1269   0.4987          20  -0.25    0.8017
dam_age      3       0        .               .    .       .
calfsex      F       0.3906   0.4967          20   0.79    0.4409
calfsex      M       0        .               .    .       .

Estimates
Label                            Estimate  Standard Error  DF  t Value  Pr > |t|  Mean    Standard Error Mean
female marginal mean category 1  0.6808    0.3829          20  1.78     0.0906    0.7520  0.1212
female marginal mean category 2  1.3168    0.4249          20  3.10     0.0057    0.9060  0.07123
male marginal mean category 1    0.2902    0.3607          20  0.80     0.4305    0.6142  0.1380
male marginal mean category 2    0.9262    0.3902          20  2.37     0.0277    0.8228  0.1014
Conditional versus marginal ("population-averaged") probabilities

• Conditional (on $\mathbf{u}$): $\mathrm{Prob}(Y \leq c) = \Phi(\tau_c - \mathbf{k}'\boldsymbol\beta)$
• Marginal (over $\mathbf{u}$):

$$\mathrm{Prob}_{marginal}(Y \leq c) = E_u\left[\mathrm{Prob}(Y \leq c \mid u)\right] = \Phi\left(\frac{\tau_c - \mathbf{k}'\boldsymbol\beta}{\sqrt{1 + \sigma_u^2}}\right)$$

The marginal probability arguably matters just as much; note also that there is no corresponding closed form for (cumulative) logistic mixed models.
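The two probabilities can be compared numerically (the threshold and $\sigma_u^2 = 1/19$ below are illustrative values, the latter borrowed from the GF83 example):

```python
import math

Phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))  # normal cdf

def conditional_prob(tau_c, eta):
    # Prob(Y <= c | u = 0) at a given linear predictor eta = k'beta
    return Phi(tau_c - eta)

def marginal_prob(tau_c, eta, sigma2_u):
    # E_u[Phi(tau_c - eta - u)] for u ~ N(0, sigma2_u) has this closed probit form
    return Phi((tau_c - eta) / math.sqrt(1.0 + sigma2_u))

tau, eta, s2 = 0.68, 0.0, 1.0 / 19.0
print(conditional_prob(tau, eta), marginal_prob(tau, eta, s2))
```

Marginalizing over the random effect pulls the probability toward 0.5, and the two coincide only when $\sigma_u^2 = 0$; the attenuation grows with $\sigma_u^2$.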
Accounting for unknown $\sigma_u^2$? Some alternative methods available in SAS PROC GLIMMIX:

ods html select covparms;
title "Default RSPL";
proc glimmix data=gf83;
  class sire;
  model y = hy age sex / dist = mult link = cumprobit;
  random intercept / subject=sire;
run;

title "Quadrature";
proc glimmix data=gf83 method=quad;
  class sire;
  model y = hy age sex / dist = mult link = cumprobit;
  random intercept / subject=sire;
run;

title "Laplace";
proc glimmix data=gf83 method=laplace;
  class sire;
  model y = hy age sex / dist = mult link = cumprobit;
  random intercept / subject=sire;
run;
Which one should I pick?

Covariance Parameter Estimates
Method   Cov Parm   Subject  Estimate  Standard Error
RSPL     Intercept  sire     0.2700    0.4837
QUAD     Intercept  sire     0.02568   0.2947
LAPLACE  Intercept  sire     0.02488   0.2898

An "ML" vs. "REML" thing?
Yet another option: "residual" Laplace (Tempelman and Gianola, 1993)

• Rather than using a point estimate of $\sigma_u^2$, one might also weight inferences on $\boldsymbol\beta$ and $\mathbf{u}$ by $p(\sigma_u^2 \mid \mathbf{y})$:

$$p(\boldsymbol\beta, \mathbf{u} \mid \mathbf{y}) = E_{\sigma_u^2 \mid \mathbf{y}}\left[p(\boldsymbol\beta, \mathbf{u} \mid \sigma_u^2, \mathbf{y})\right]$$

[Figure: $\log p(\sigma_u^2 \mid \mathbf{y})$ plotted against $\sigma_u^2$, with mode $\hat{\sigma}_u^2 = 0.40$.]
Summary of GLMM as conventionally done today

• Some issues:
1. The approximation $p(\boldsymbol\beta, \mathbf{u} \mid \sigma_u^2, \mathbf{y}) \approx$ normal. Is that wise?
2. MML point estimates of $\sigma_u^2$ are often badly biased. Upwards or downwards? Unpredictable.
3. Uncertainty in MML estimates is not accounted for.
4. Marginal versus conditional inference on treatment probabilities? (applies to other distributions, e.g., Poisson)
• Implications? We'll see later with a comparison between empirical Bayes and fully Bayes (using MCMC). There is an obvious dependency on n, q, etc.