Bayesian shrinkage approach in variable selection for...

Bayesian shrinkage approach in variableselection for mixed effects models

Mingan Yang

GGI Statistics Conference, Florence, 2015Bayesian Variable Selection

June 22-26, 2015

Mingan Yang Bayesian shrinkage approach in variable selection for mixed effects models

Outline

1 Introduction

2 model

3 Simulation

4 Application


Introductionmodel

SimulationApplication

Outline

1 Introduction

2 model

3 Simulation

4 Application


Introductionmodel


Variable selection problem draws considerable attention.Penalized regression with shrinkage estimation andsimultaneous variable selection seems promisingFrequentist approaches: Tibshirani (1996), Fan and Li(2001), Zou (2006), Zou and Li (2008) etc.Bayesian approaches: Park and Casella (2008), Hans(2009), Carvalho et. al. (2009, 2010), Griffin and Brown(2007), and Armagan et al. (2013)None proposed for mixed effects models (LME). Evenmore challenging for LME coupled with random effects.


Introductionmodel


Linear mixed effects (LME) model:

yi = Xiβ + Ziζi + εi

ζi ∼ N(0,Ψ)

εi ∼ N(0, σ2Λ)

Challenge: standard methods such as AIC, BIC, Bayesfactors etc. do not work well different number of randomeffects. (4p models for p × 1 fixed and random vector)Very few approaches proposed so far: Chen and Dunson(2003), Cai and Dunson (2006), Kinney and Dunson(2007), Bondell et al. (2010), Ibrahim et al. (2010), Yang(2012, 2013).


Introductionmodel


Some current methods for random effects

Chen and Dunson (2003): Stochastic search variableselection (SSVS) approach to the LMM.Kinney and Dunson (2007):Application of dataaugmentation algorithm, parameter expansion (Gelman,2004, 2005) and model approximation for the GLMM.Ibrahim et. al. (2010):Maximum penalized likelihood (MPL)and smoothly clipped absolute deviation (SCAD) andALASSOBondell et. al. (2010): adaptive LASSO and EMCai and Dunson (2006):Taylor series expansion for GLMM.All these models are parametric ones.


Introductionmodel


We use Bayesian shrinkage priors for joint selection ofboth fixed and random effectsDesirable properties of such priors: spikes at zero, studentt-like tails, simple characterization as a scale mixture ofnormals to facilitate computatonSuch shrinkage priors can shrink small coefficients to zerowithout over-shrink large coefficients.


Introductionmodel


Outline

1 Introduction

2 model

3 Simulation

4 Application


Introductionmodel


Shrinkage priors for fixed effects

Stochastic search variable selection (SSVS, George andMcCulloch 1993): Introduce latent variable J, βJ

k = (βk , jk = 1)′

We use generalized double Pareto, horse shoe prior, andnormal-exponential-gamma priors for the fixed effects.


Introductionmodel


Stochastic search variable selection:SSVS

Proposed by George and McCulloch (1993), originally forsubset selection in linear regression, searching for modelswith highest posterior probabilitiesstarting with full models containing all candidate predictorschoosing mixture priors that allow predictors to drop byzeroing the coefficientsrunning Gibbs sampler with conditional conjugacy from theposterior distribution.


Introductionmodel


Mixed effects

β = (β1, · · · , βp)′, J = (j1, · · · , jp)′

jk = 0 indicates βk = 0jk 6= 0 for βk 6= 0

We take the Zellner g-prior (Zellner and Siow, 1980)βJ ∼ N(0, σ2(X J′

X J)−1/g), g ∼ Gamma(.) and Bernoullifor jkGenerally ζ ∼ N(0,Ω). Application of Choleskydecomposition Ω = ΛΓΓ′Λ where Λ is a positive diagonalmatrix with diagonal elements λl , l = 1, · · · ,q proportionalto the random effects standard deviations, and Γ is a lowertriangular matrix with diagonal elements equal to 1.


Introductionmodel


The generalized double Pareto (GDP) density is defined asf (x |a,b) = 1/(2a)(1 + |x |/(ab))−b+1, x ∼ GDP(a,b).The GDP resembles the Laplace density with similarsparse shrinkage properties, also has Cauchy-like tails,which avoids over-shrinkageHowever the posterior form is very complicated and theparameters are not in closed forms. Instead, it can beexpressed as a scale mixture of normal distribution.x ∼ N(0, τ), τ ∼ Exp(λ2/2), and λ ∼ Ga(α, η), thenx ∼ GDP(a = η/b,b).


Introductionmodel


β jk ∼ N(0, σ2/gκk ), κk ∼ exp(ϕ2

k/2), ϕk ∼ Gamma(ν1, ϑ1),

β jk ∼ N(0, σ2/gκ2

k ), κk ∼ Ca+(0, ν2), ν2 ∼ Ca+(0, ϑ2),

β jk ∼ N(0, σ2/gκk ), κk ∼ exp(ς2/2), ς2 ∼ Gamma(ν3, ϑ3),


Introductionmodel


Model specification

yij = X ′ijβ + Z ′ijζi + εij , V (ζi) = Ω

i = 1, · · · ,n, j = 1, · · · ,ni

Xij : q × 1

Cholesky decomposition

yij = X ′ijβ + Z ′ijΛΓbi + εij , ΛΓV (bi)Γ′Λ′ = Ω

Λ : Diag(λ1, · · · , λq)

Γ: lower triangular matrix q × q with diagonal element 1.Mingan Yang Bayesian shrinkage approach in variable selection for mixed effects models

Introductionmodel


Prior for random effects

The random effect with the generalized shrinkage prior isspecified as follows:

bik ∼ Gk

Gk ∼ DP(αGk0)

Gk0 ∼ GDP(1.0,1.0)

The horse shoe prior and normal-exponential-gamma priorcan be specified similarly.Λ : Diag(λ1, · · · , λq) Γ: lower triangular matrix q × q withdiagonal element 1.λk ∼ pk0δ0 + (1− pk0)N+(0, φ2

k ), φ2k ∼ IG(1/2,1/2)


Introductionmodel


Parameter expansion (PX)

Liu and Wu (1999), Gelman (2004, 2006)y |θ, z ∼ N(θ + z,1), z|θ ∼ N(0,D)θ unknown parameter, z missingy |θ, α,w ∼ N(θ − α + w ,1), w |θ, α ∼ N(α,D)z −→ α,w

PX breaks association, speeds up convergence.


Introductionmodel


Outline

1 Introduction

2 model

3 Simulation

4 Application


Introductionmodel


yij = Xijβ + Zijζi + εij , εij ∼ N(0, σ2)

Xij = (1, xij1, · · · , xij15)′, xijk ∼ Unif (−2,2)

zijk = xijk , k = 1, · · · ,4

Case 1:%i ∼ N(0,Ω),Ω =

0.00 0.00 0.00 0.000.00 0.9 0.48 0.060.00 0.48 0.40 0.100.00 0.06 0.10 0.10

.

β = (β1, · · · , β16)′, β1 = β2 = β3 = 1.5, βk = 0, k = 4, ...,16Case 2:q = 8, (%i1, · · · %i4)′ ∼ N(0,Ω1),Ω = Ω1 same as Case 1

Case 3:β4 = · · · = β9 = 0.01,everything else same as Case 2.


Introductionmodel


Case Method PF PR PM MSE S.E.1 GDPP 1.00 0.63 0.63 0.12 0.18

HSPP 1.00 0.64 0.63 0.11 0.19NEGP 0.98 0.59 0.58 0.13 0.19KDNP 0.86 0.57 0.46 0.18 0.25

2 GDPP 1.00 0.57 0.55 0.11 0.011HSPP 0.97 0.56 0.54 0.12 0.013NEGP 0.96 0.54 0.53 0.13 0.014KDNP 0.82 0.53 0.44 0.16 0.39

3 GDPP 1.00 0.60 0.60 0.18 0.10HSPP 0.99 0.61 0.59 0.20 0.11NEGP 0.93 0.58 0.56 0.26 0.15KDNP 0.40 0.55 0.23 0.35 0.24


Introductionmodel


captionComparisons of parameters estimates in the simulationstudy. S.E. is the standard error of the MSE.PM is thepercentage of times the true model was selected, PF and PRare the percentage of times the correct fixed and randomeffects were selected respectively.


Introductionmodel


Table: Models with highest posterior probability in the simulationstudy for Case 3: ∗ is the true model

Model GDPP HSPP NEGP KDNPxij1, xij2, xij3, zij2, zij3, z∗

ij4 0.62 0.61 0.59 0.23xij1, xij2, xij3, zij1, zij2, zij3, zij4 0.37 0.38 0.39 0.15xij1, xij2, xij3, xij6, xij9, zij2, zij3, zij4 0.082xij1, xij2, xij3, xij5, xij7, xij9, zij2, zij3, zij4 0.060xij1, xij3, xij5, zij2, zij3, zij4 0.051xij1, xij2, xij3, xij9, zij2, zij3, zij4 0.031xij1, xij2, xij3, xij6, zij1, zij2, zij3, zij4 0.027xij1, xij2, xij3, xij8, xij9, zij2, zij3, zij4 0.025xij1, xij2, xij3, xij8, zij1, zij2, zij3, zij4 0.023xij1, xij2, xij3, xij8, zij2, zij3, zij4 0.022


Introductionmodel


The shrinkage approaches provide better results than theKDNP.Based on extensive simulations, we find that the KDNP isnot stable and suffers from numerical issues and failseasily.Of all the approaches, GDPP and HSPP perform betterthan the others in variable selection and parameterestimation. GDPP can be implemented in a more efficientGibbs sampling algorithm than HSPP, which usesMetropolis-Hasting algorithm.


Introductionmodel


Outline

1 Introduction

2 model

3 Simulation

4 Application


Introductionmodel


A bioassay study conducted to identify the in vivoestrogenic responses to chemicals.Data consisted of 2681 female rats, experimented by 19participating labs in 8 countries from OECD.2 protocols on immature female rats with administration ofdoses by oral gavage (protocol A) and subcutaneousinjection (protocol B).2 other protocols on adult female rats with oral gavage(protocol C) and subcutaneous injection (protocol D).Each protocol was comprised of 11 groups, each with 6rats. 1 untreated group, 1 control group, 7 groupsadministered with varied doses of ethinyl estradiol (EE)and ZM.


Introductionmodel


Goal: how the uterus weights interacts with the in vivoactivity of chemical compounds under differentexperimental conditions.There is concerns about heterogeneity across differentlabs.Kannol et. al (2001) analyzed the data with simpleparametric methods.The number of possible competing models is over 4,000.


Introductionmodel


Table: Top 10 models with highest posterior probabilities by GDPP,HSPP and NEGP for real data

Model GDPP HSPP NEGPxij1, xij3, xij4, xij5, xij6, zij1, zij3, zij4 0.098 0.093 0.072xij1, xij3, xij5, xij6, zij1, zij2 0.063 0.049 0.097xij1, xij3, xij4, xij5, xij6, zij1, zij2, zij4 0.047 0.039 0.038xij1, xij3, xij4, xij5, xij6, zij1, zij4 0.038 0.023 0.033xij1, xij3, xij4, xij5, xij6, zij1, zij3, zij4, zij6 0.026xij1, xij3, xij4, xij5, xij6, zij1, zij2, zij6 0.023xij1, xij3, xij4, xij5, xij6, zij1, zij2, zij5 0.020 0.018 0.034xij1, xij3, xij4, xij5, xij6, zij1 0.020 0.013 0.046xij1, xij3, xij4, xij5, xij6, zij1, zij2, zij3, zij4 0.020 0.025xij1, xij3, xij4, xij5, xij6, zij1, zij3, zij4, zij5 0.019xij1, xij3, xij4, xij5, xij6, zij1, zij3, zij4, zij6 0.015xij1, xij3, xij5, xij6, zij1, zij3, zij4, zij5 0.032xij1, xij3, xij4, xij5, xij6, zij1, zij2 0.028xij1, xij3, xij4, xij5, xij6, zij1, zij2, zij4, zij5 0.015xij1, xij3, xij5, xij6, zij1, zij3, zij4 0.078xij1, xij3, xij5, xij6, zij1, zij4 0.055xij1, xij3, xij5, xij6, zij1, zij2, zij4 0.046


Introductionmodel


Table: Comparisons of parameter estimates in the real data study

Parameter GDPP HSPP NEGP LMERβ1 3.61a(3.31, 3.76)b 3.60a(3.35, 3.87)b 3.60a(3.28, 3.96)b 3.61a(3.16, 4.10)c

β2 NAd NAd NAd 0.06 (0.02, 0.12)β3 1.21 (1.04, 1.47) 1.18 (1.06, 1.53) 1.22 (0.97, 1.49) 1.20 (1.03, 1.69)β4 1.20 (0.89, 1.64) 1.21 (0.87, 1.70) 1.19 (0.79, 1.68) 1.27 (0.85, 1.64)β5 0.14 (0.13, 0.16) 0.14 (0.13, 0.16) 0.14 (0.13, 0.16) 0.14 (0.13, 0.15)β6 -0.48 (-0.57, -0.35) -0.47 (-0.56, -0.39) -0.47(-0.59, -0.24) -0.49 (-0.57, -0.37)a Meanb 95% credible intervalc 95% confidence intervald Not selected in the model


Introductionmodel


We try to fit the model with KDNP but it fails with too manynumerical issues.Fixed effect of protocol B is not selected for any model, butthe corresponding random effects is, which indicatesheterogeneity.Both fixed and random effects of protocol C and D areselected in the leading models.The fixed effects of EE dose and ZM dose are selected inthe leading models, but seldom for the correspondingrandom effects.


Introductionmodel


References

Chen and Dunson Biometrics, 2003Cai and Dunson Biometrics, 2006Kinney and Dunson Biometrics, 2007Bondell et al. Biometrics, 2010Ibrahim et al. Biometrics, 2010Yang, M, Computational Statistics and Data Analysis, 2012Yang, M, Biometrical Journal, 2013Yang, M, Under review, 2015


Bayesian shrinkage approach in variable selection for...

Documents

Transcript of Bayesian shrinkage approach in variable selection for...