Generalized linear models, and extensions, in R
Ben Bolker
Departments of Mathematics & Statistics and Biology, McMaster University
7 January 2011
Ben Bolker (McMaster University) GLMs in R 7 January 2011 1 / 25
1 Introduction
2 Example
3 Challenges, tricks, extensions
4 (Extended examples)
What are generalized linear models?
Modeling framework to solve two common statistical problems:
  Non-normal data
  Non-linearity (continuous predictors)
. . . superset of, and often confused with, “general” linear models (i.e. ANOVA/ANCOVA/regression: SAS PROC GLM)
GLMs: technical details
Constraints:
  Distributions from exponential family (Normal, Poisson, binomial, Gamma, inverse Gaussian)
  Invertible nonlinearities, i.e. there exists a link function that would make the relationship linear (log, logit, probit, inverse, square root, “cauchit”, . . . )
Efficient, stable algorithm: iteratively re-weighted least squares (IRLS) / Fisher scoring
standard methods (methods(class="glm")): coef, summary, plot, predict, residuals, vcov, profile, update, confint, simulate, anova, add1/drop1, logLik, AIC, . . .
logistic and Poisson regression probably make up 99% of GLMs . . .
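A minimal sketch of the IRLS idea for a binomial GLM with logit link, using base R only. The data are simulated stand-ins (not the reed frog data), and irls_logit is a hypothetical helper written for illustration, not the code inside glm():

```r
## Simulated stand-in data (hypothetical design and parameter values)
set.seed(1)
n_trials <- rep(c(5, 10, 15, 20, 30, 50, 75, 100), each = 2)
x <- n_trials
y <- rbinom(length(x), size = n_trials, prob = plogis(-0.1 - 0.008 * x))

## Hypothetical helper: IRLS for y successes out of n trials, logit link
irls_logit <- function(X, y, n, tol = 1e-10, maxit = 25) {
  beta <- rep(0, ncol(X))
  for (i in seq_len(maxit)) {
    eta <- drop(X %*% beta)                       # linear predictor
    mu  <- plogis(eta)                            # fitted probabilities
    w   <- n * mu * (1 - mu)                      # working weights
    z   <- eta + (y / n - mu) / (mu * (1 - mu))   # working response
    ## weighted least squares step: solve (X'WX) beta = X'Wz
    beta_new <- drop(solve(crossprod(X, w * X), crossprod(X, w * z)))
    done <- max(abs(beta_new - beta)) < tol
    beta <- beta_new
    if (done) break
  }
  beta
}

X <- cbind("(Intercept)" = 1, x = x)
beta_irls <- irls_logit(X, y, n_trials)
beta_glm  <- coef(glm(cbind(y, n_trials - y) ~ x, family = binomial))
print(rbind(irls = beta_irls, glm = beta_glm))
```

Each pass is an ordinary weighted least-squares fit to a working response, which is why the algorithm is fast and stable for standard links.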
Google scholar scraping
Google Scholar hits (Ghits) per search term, plotted on a log10 axis from 10^4 to 10^6:
binomial+regression          13500
generalized+linear+model     28700
Poisson+regression           39300
logistic+regression         580000
Example: reed frog predation data
[Figure: fraction killed vs. initial density (x-axis 20 to 100, y-axis 0.0 to 1.0)]
Vonesh and Bolker (2005):
> library(emdbook)
> data(ReedfrogFuncresp)
> ## response is a proportion, so the number of trials goes in weights=
> glm1 <- glm(Killed/Initial ~ Initial,
              weights=Initial,
              family=binomial,
              data=ReedfrogFuncresp)
Summary
> summary(glm1)
Call:
glm(formula = Killed/Initial ~ Initial, family = binomial, data = ReedfrogFuncresp,
    weights = Initial)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-4.4132  -0.7275   0.4347   1.0120   1.8172

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.094563   0.188952   -0.50  0.61675
Initial     -0.008416   0.002697   -3.12  0.00181 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 47.518  on 15  degrees of freedom
Residual deviance: 37.717  on 14  degrees of freedom
AIC: 98.639

Number of Fisher Scoring iterations: 4
Diagnostics
[Figure: four diagnostic panels for glm1: Residuals vs Fitted, Normal Q-Q, Scale-Location, Residuals vs Leverage (with Cook's distance contours at 0.5 and 1); observations 11, 13 and 16 are among the labelled points]
diagnostics inherit from plot.lm
overdispersion: residual deviance is approximately χ² distributed with n−p degrees of freedom (Venables and Ripley, 2002, p. 209): sum(residuals(glm1, type="pearson")^2) = 34.3: p ≪ 0.05
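A sketch of the overdispersion check, assuming simulated stand-in data (the data frame sim and its parameter values are hypothetical; extra between-tank noise is added on purpose so the data are overdispersed). The last line plugs in the slide's own numbers (34.3 on 16 − 2 = 14 residual df):

```r
## Simulated stand-in data with extra-binomial variation (hypothetical values)
set.seed(2)
Initial <- rep(c(5, 10, 15, 20, 30, 50, 75, 100), each = 2)
p <- plogis(-0.1 - 0.008 * Initial + rnorm(length(Initial), sd = 0.5))
sim <- data.frame(Initial = Initial,
                  Killed = rbinom(length(Initial), size = Initial, prob = p))

fit <- glm(cbind(Killed, Initial - Killed) ~ Initial,
           family = binomial, data = sim)

## Pearson chi-square statistic and its residual degrees of freedom
pearson_chisq <- sum(residuals(fit, type = "pearson")^2)
resid_df <- df.residual(fit)

## Estimated dispersion and chi-square tail probability
phi_hat <- pearson_chisq / resid_df
p_value <- pchisq(pearson_chisq, df = resid_df, lower.tail = FALSE)

## Same calculation with the values reported on the slide
p_slide <- pchisq(34.3, df = 14, lower.tail = FALSE)
```

A ratio phi_hat well above 1 (with a small tail probability) suggests the binomial variance assumption is too tight.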
Inference
Coefficients: may be hard to communicate (reflect differences on the scale of linear predictor, e.g. logit/log-odds differences)
Wald statistics: beware the Hauck-Donner effect (Venables and Ripley, 2002, p. 198). Wald CI of slope: stats:::confint.lm(glm1) (-0.0142,-0.0026)
Likelihood ratio test, via anova:
> anova(glm1,test="Chisq") ## OR
> glm0 <- update(glm1, . ~ . - Initial)  ## drop Initial: intercept-only model
> anova(glm1,glm0,test="Chisq")
Likelihood profiles (via MASS::profile.glm), profile confidence intervals: MASS:::confint.glm(glm1) (-0.0137,-0.0031)
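A sketch comparing the three approaches on simulated stand-in data (sim is hypothetical, not the reed frog data). The last lines redo the slide's Wald interval from the reported estimate and standard error, assuming confint.lm uses t quantiles on the 14 residual degrees of freedom:

```r
## Simulated stand-in data (hypothetical values)
set.seed(3)
Initial <- rep(c(5, 10, 15, 20, 30, 50, 75, 100), each = 2)
sim <- data.frame(Initial = Initial,
                  Killed = rbinom(length(Initial), size = Initial,
                                  prob = plogis(-0.1 - 0.008 * Initial)))
fit <- glm(cbind(Killed, Initial - Killed) ~ Initial,
           family = binomial, data = sim)

## Wald interval by hand (normal quantiles) and via confint.default
est <- coef(fit)["Initial"]
se  <- sqrt(diag(vcov(fit)))["Initial"]
wald_ci      <- est + c(-1, 1) * qnorm(0.975) * se
wald_default <- confint.default(fit)["Initial", ]

## Profile-likelihood interval
profile_ci <- confint(fit)["Initial", ]

## Likelihood ratio test against the intercept-only model
fit0    <- update(fit, . ~ . - Initial)
lrt     <- anova(fit0, fit, test = "Chisq")
lr_stat <- deviance(fit0) - deviance(fit)
p_lrt   <- pchisq(lr_stat, df = 1, lower.tail = FALSE)

## Slide values: t-based interval from the reported estimate and SE
slide_ci <- -0.008416 + c(-1, 1) * qt(0.975, df = 14) * 0.002697
```

The Wald and profile intervals agree closely when the log-likelihood is nearly quadratic; large differences are a warning sign for Wald inference.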
Estimation issues
Convergence difficulties, especially with non-standard links: set starting values, center/scale variables (?)
Complete separation: brglm, logistf, arm (bayesglm)
Big data: biglm (bigglm)
Many predictors (penalized regression): glmnet, glmpath, penalized (Machine learning task view)
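A small illustration of what complete separation looks like in plain glm(), using a made-up toy data set in which x perfectly predicts y. The warning handler is only there to record the messages; remedies such as the packages listed above are not shown:

```r
## Toy data (hypothetical): y is 0 whenever x < 4.5 and 1 otherwise
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(0, 0, 0, 0, 1, 1, 1, 1)

## Fit an ordinary logistic regression, collecting any warnings
msgs <- character(0)
fit_sep <- withCallingHandlers(
  glm(y ~ x, family = binomial),
  warning = function(w) {
    msgs <<- c(msgs, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

print(msgs)                    # warnings raised during fitting
print(coef(summary(fit_sep)))  # slope and standard error are not trustworthy here
```

The maximum likelihood estimate of the slope does not exist in this case, so the reported coefficient simply reflects where the iterations stopped.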
Tricks (within GLM framework)
non-standard link functions:
  fitting hyperbolic models of predator attack rates (Michaelis-Menten) via binomial/inverse link (http://emdbolker.wikidot.com/voneshglm)
  exponential survivorship models via binomial/log link (Strong et al., 1999; Tiwari et al., 2006)
  Gaussian family with log link: fit exponential growth models with constant variance
subtleties with Gamma GLMs and dispersion parameter: V&R MASS online complements, Paul Johnson’s notes
offsets: variation in sampling area/intensity (e.g. strict proportionality)
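A sketch of the offset trick for counts collected over unequal sampling areas, on simulated data (area, x and the parameter values are all hypothetical). With a log link, offset(log(area)) forces the expected count to be strictly proportional to area:

```r
## Simulated counts whose mean is proportional to sampling area (hypothetical)
set.seed(4)
area   <- runif(50, min = 0.5, max = 5)
x      <- rnorm(50)
counts <- rpois(50, lambda = area * exp(0.3 + 0.5 * x))

## Strict proportionality: log(area) enters with its coefficient fixed at 1
fit_off <- glm(counts ~ x + offset(log(area)), family = poisson)

## Relaxed version: estimate the coefficient of log(area) instead
fit_free <- glm(counts ~ x + log(area), family = poisson)

## Fitted means under the offset model equal area * exp(b0 + b1 * x)
mu_hand <- area * exp(coef(fit_off)[1] + coef(fit_off)[2] * x)
```

Comparing fit_off with fit_free (where the log(area) coefficient should be near 1 if proportionality holds) is one way to check the assumption.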
Overdispersion
Quasilikelihood models:
> glmQ <- update(glm1,family="quasibinomial")
> anova(glmQ,test="F")
(φ̂ = 2.45). No likelihood: qAIC requires some contortions
extended GLMs
  negative binomial: MASS (glm.nb)
  beta-binomial:
    aod (betabin)
    gnlm (gnlr)
    VGAM (vglm)
    bbmle (mle2)
GLMMs: lognormal-Poisson, logit-normal-binomial
robust estimation (lmtest, sandwich):
> coeftest(glm1,vcov=sandwich)
See also the vignette for the pscl package.
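A sketch of what the quasi-likelihood fit changes, on simulated stand-in data (sim is hypothetical). Point estimates are unchanged and standard errors are inflated by sqrt(φ̂). The qAIC line uses one common convention (binomial log-likelihood divided by φ̂, plus one extra parameter for the dispersion); treat that formula as an assumption rather than the slide's method:

```r
## Simulated overdispersed stand-in data (hypothetical values)
set.seed(5)
Initial <- rep(c(5, 10, 15, 20, 30, 50, 75, 100), each = 2)
p <- plogis(-0.1 - 0.008 * Initial + rnorm(length(Initial), sd = 0.5))
sim <- data.frame(Initial = Initial,
                  Killed = rbinom(length(Initial), size = Initial, prob = p))

fit  <- glm(cbind(Killed, Initial - Killed) ~ Initial,
            family = binomial, data = sim)
fitQ <- update(fit, family = quasibinomial)

## Estimated dispersion (Pearson chi-square / residual df)
phi <- summary(fitQ)$dispersion

## Ratio of quasi-binomial to binomial standard errors
se_ratio <- coef(summary(fitQ))[, "Std. Error"] /
            coef(summary(fit))[, "Std. Error"]

## qAIC under the assumed convention: likelihood from the binomial fit
k    <- length(coef(fit))
qaic <- -2 * as.numeric(logLik(fit)) / phi + 2 * (k + 1)
```

The log-likelihood has to come from the ordinary binomial fit because logLik() is not defined in a useful way for quasi families.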
Extensions
Generalized additive models (Wood, 2006): mgcv, gamlss
Zero-inflated/altered/hurdle models: pscl, VGAM
Beta regression: betareg
Generalized regression models: bbmle, VGAM, gnlm
Random effects (generalized linear mixed models): lme4 and other packages (http://glmm.wikidot.com/faq)
References
Strong, D.R., Whipple, A.V., et al., 1999. Ecology, 80:2750–2761.
Tiwari, M., Bjorndal, K.A., et al., 2006. Marine Ecology Progress Series, 326:283–293.
Venables, W. and Ripley, B.D., 2002. Modern Applied Statistics with S. Springer, New York, 4th edition.
Vonesh, J.R. and Bolker, B.M., 2005. Ecology, 86(6):1580–1591.
Wood, S.N., 2006. Generalized Additive Models: An Introduction with R. Chapman & Hall/CRC.
Basic ggplot code
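> library(ggplot2)
> ## current ggplot2 passes glm arguments through method.args
> ggplot(ReedfrogFuncresp, aes(Initial, Killed/Initial)) +
    geom_point() +
    geom_smooth(method="glm", method.args=list(family=binomial),
                aes(weight=Initial))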
Confidence intervals on # killed, by hand
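> ## predict on the link (logit) scale, with standard errors
> pframe <- data.frame(Initial=1:100)
> pp <- predict(glm1,newdata=pframe,se.fit=TRUE)
> ## back-transform fit and approximate 95% limits to the probability scale
> pmat <- with(pp,plogis(cbind(fit,
                               fit-1.96*se.fit,
                               fit+1.96*se.fit)))
> par(bty="l",las=1)
> with(ReedfrogFuncresp,plot(Initial,Killed/Initial,
                             xlim=c(0,100),ylim=c(0,1),
                             pch=16))
> ## solid line: fitted curve; dashed lines: confidence limits
> matlines(pframe$Initial,pmat,lty=c(1,2,2),col=1,type="l")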
Prediction intervals
[Figure: Killed/Initial (0.0 to 1.0) vs. Initial (0 to 100), observed points with simulated prediction intervals]
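> simhack <- function(params) {
    glmnew <- glm1
    glmnew$coefficients <- params
    ## simulate() draws from the stored fitted values, so update them too
    eta <- drop(model.matrix(glm1) %*% params)
    glmnew$linear.predictors <- eta
    glmnew$fitted.values <- plogis(eta)
    ## simulates on PROBABILITY scale
    simulate(glmnew)[[1]]
  }
> set.seed(101)
> ## draw parameter sets from the approximate sampling distribution
> params <- MASS::mvrnorm(1000,mu=coef(glm1),
                          Sigma=vcov(glm1))
> sims <- apply(params,1,simhack)
> ## median and 95% limits of the simulated fractions, per observation
> qmat <- t(apply(sims,1,quantile,
                  c(0.5,0.025,0.975)))
(Constructing the simulated values at Initial densities from 1 to 100 is a bit more work; ideally all simulate methods would have newdata and newparam arguments . . . )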
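One possible way to do that extra work, sketched on simulated stand-in data (sim is hypothetical): build the design matrix for the new densities by hand, draw coefficients from their approximate sampling distribution, then draw binomial outcomes. It assumes each new tank starts with exactly Initial individuals:

```r
## Simulated stand-in data and fit (hypothetical values)
set.seed(6)
Initial <- rep(c(5, 10, 15, 20, 30, 50, 75, 100), each = 2)
sim <- data.frame(Initial = Initial,
                  Killed = rbinom(length(Initial), size = Initial,
                                  prob = plogis(-0.1 - 0.008 * Initial)))
fit <- glm(cbind(Killed, Initial - Killed) ~ Initial,
           family = binomial, data = sim)

## New densities and their design matrix
newdat <- data.frame(Initial = 1:100)
X_new  <- model.matrix(~ Initial, data = newdat)

## Parameter uncertainty: draws from a multivariate normal approximation
nsim <- 1000
beta_draws <- MASS::mvrnorm(nsim, mu = coef(fit), Sigma = vcov(fit))

## Probabilities for each new density (rows) and each draw (columns)
prob_draws <- plogis(X_new %*% t(beta_draws))

## Sampling variability: simulated fraction killed for each draw
sim_frac <- apply(prob_draws, 2, function(p) {
  rbinom(length(p), size = newdat$Initial, prob = p) / newdat$Initial
})

## Median and 95% prediction limits at each density
pred_int <- t(apply(sim_frac, 1, quantile, probs = c(0.5, 0.025, 0.975)))
```

The columns of pred_int can be passed to matlines() against newdat$Initial in the same way as pmat in the confidence-interval example.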
Alternative display (display, coefplot from arm package)
[Figure: coefplot of the Initial coefficient, x-axis −0.015 to 0.000]
> display(glm1)
glm(formula = Killed/Initial ~ Initial, family = binomial, data = ReedfrogFuncresp,
    weights = Initial)
            coef.est coef.se
(Intercept) -0.09     0.19
Initial     -0.01     0.00
---
n = 16, k = 2
residual deviance = 37.7, null deviance = 47.5 (difference = 9.8)
Beta-binomial with aod
> library(aod)
> glmBB1 <- betabin(cbind(Killed, Initial-Killed)~Initial,
                    random=~1,
                    data=ReedfrogFuncresp)
Beta-binomial with bbmle
> library(bbmle)
> glmBB3 <- mle2(Killed~dbetabinom(prob=plogis(logitp),
                                   theta=exp(logtheta),size=Initial),
                 parameters=list(logitp~Initial),
                 data=ReedfrogFuncresp,
                 start=list(logitp=0,logtheta=0))
Beta-binomial with VGAM
> library(VGAM)
> glmBB4 <- vglm(cbind(Killed,Initial-Killed)~Initial,
                 betabinomial,
                 data=ReedfrogFuncresp)
> coef(glmBB4,matrix=TRUE)
Beta-binomial with gnlm
> library(gnlm)
> attach(ReedfrogFuncresp) ## no data= argument!
> glmBB2 <- gnlr(cbind(Killed,Initial-Killed),
                 dist="beta binomial",
                 pmu=c(0,0),pshape=0,
                 mu=function(p,linear) plogis(linear),
                 linear=~Initial)
> detach(ReedfrogFuncresp)
> detach("package:gnlm")
> detach("package:rmutil")
Logit-normal-binomial with lme4
> library(lme4)
> ## observation-level random effect: one ID per row
> ReedfrogFuncresp$ID <- 1:nrow(ReedfrogFuncresp)
> glmLNP <- glmer(cbind(Killed,Initial-Killed)~Initial+(1|ID),
                  family=binomial,
                  data=ReedfrogFuncresp)
> summary(glmLNP)
Alternate link functions for reed frog data
[Figure: fraction killed vs. initial density (x-axis 20 to 100, y-axis 0.0 to 1.0)]
Comparing overdispersion estimates
[Figure: estimates of the initial density effect (x-axis −0.015 to 0.000) by model: binomial Wald, binomial profile, q-binom Wald, sandwich, beta-binomial, LN-binomial]