GeneralizedLinearModels* - Columbia Universitymadigan/W2025/notes/GLM.pdf · 2013. 2. 25. ·...

Generalized Linear Models

based on Zuur et al.

Kuhnert and Venables

Normal Distribu.on example: weights of 1280 sparrows

Sparrows<-‐read.table("/Users/dbm/Documents/W2025/ZuurDataMixedModelling/Sparrows.txt",header=TRUE) par(mfrow = c(2, 2)) hist(Sparrows$wt, nclass = 15, xlab = "Weight", main = "Observed data",xlim=c(10,30)) hist(Sparrows$wt, nclass = 15, xlab = "Weight", main = "Observed data", freq = FALSE,xlim=c(10,30)) Y <-‐ rnorm(1281, mean = mean(Sparrows$wt), sd = sd(Sparrows$wt)) hist(Y, nclass = 15, main = "Simulated data",xlab = "Weight",xlim=c(10,30)) X <-‐seq(from = 0,to = 30,length = 200) Y <-‐ dnorm(X, mean = mean(Sparrows$wt), sd = sd(Sparrows$wt)) plot(X, Y, type = "l", xlab = "Weight", ylab = "Probabli]es", ylim = c(0, 0.25), xlim = c(10, 30), main = "Normal density curve") par(mfrow = c(1, 1))

Observed data

Weight

Frequency

10 15 20 25 30

Observed data

Weight

Density

10 15 20 25 30

Simulated data

Weight

Frequency

10 15 20 25 30

100150200250

10 15 20 25 30

Normal density curve

Weight

Probablities

Poisson Distribu.on

In prac]ce the variance is o`en bigger than the mean -‐> overdispersion

Gamma Distribu.on

Binomial Distribu.on

Natural Exponen.al Family

to get, e.g., Poisson:

Generalized Linear Models

Generalized Linear Models: Poisson regression

x <-‐ 0:100 y <-‐ rpois(101,exp(0.01+0.03*x)) MyData <-‐ data.frame(x,y) MyModel <-‐ glm(y~x,data=MyData,family=poisson) summary(MyModel) coefs <-‐ coef(MyModel) plot(y~x,MyData,ylim=c(0,26)) par(new=TRUE) plot(x,exp(0.01+0.03*x),ylim=c(0,26),type="l") par(new=TRUE) plot(x,exp(coefs[1]+coefs[2]*x),ylim=c(0,26),type="l",col="red")

Es]mates parameters by maximizing the likelihood:

L = µi × e−µi

yi!i∏ , µi = exp(β0 + β1x1 +…+ βd xd )

RoadKills <-‐ read.table("/Users/dbm/Documents/W2025/ZuurDataMixedModelling/RoadKills.txt",header=TRUE) RK <-‐ RoadKills #Saves some space in the code plot(RK$D.PARK, RK$TOT.N, xlab = "Distance to park", ylab = "Road kills") M1 <-‐ glm(TOT.N ~ D.PARK, family=poisson, data=RK) summary(M1) M2 <-‐ glm(TOT.N ~ D.PARK, family=gaussian, data=RK) summary(M2)

Call: glm(formula = TOT.N ~ D.PARK, family = poisson, data = RK) Deviance Residuals: Min 1Q Median 3Q Max -‐8.1100 -‐1.6950 -‐0.4708 1.4206 7.3337 Coefficients: Es]mate Std. Error z value Pr(>|z|) (Intercept) 4.316e+00 4.322e-‐02 99.87 <2e-‐16 *** D.PARK -‐1.059e-‐04 4.387e-‐06 -‐24.13 <2e-‐16 *** -‐-‐-‐ Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 (Dispersion parameter for poisson family taken to be 1) Null deviance: 1071.4 on 51 degrees of freedom Residual deviance: 390.9 on 50 degrees of freedom AIC: 634.29 Number of Fisher Scoring itera]ons: 4

Residual Deviance: Difference in G2 = -‐2 log L between a maximal model where every observa]on has Poisson parameter equal to the observed value and your model Null Deviance: Difference in G2 = -‐2 log L between a maximal model where every observa]on has Poisson parameter equal to the observed value and the model with just an intercept Some]mes useful to compare residual deviances for two models. Differences are chi-‐square distributed with degrees of freedom equal to the change in the number of parameters

MyData <-‐ data.frame(D.PARK = seq(from = 0, to = 25000, by = 1000)) G <-‐ predict(M1, newdata = MyData, type = "link", se = TRUE) F <-‐ exp(G$fit) FSEUP <-‐ exp(G$fit + 1.96 * G$se.fit) FSELOW <-‐ exp(G$fit -‐ 1.96 * G$se.fit) lines(MyData$D.PARK, F, lty = 1) lines(MyData$D.PARK, FSEUP, lty = 2) lines(MyData$D.PARK, FSELOW, lty = 2)

0 5000 10000 15000 20000 25000

Distance to park

Model Selec.on in Generalized Linear Models

235000

240000

245000

250000

255000

258500

library(AED) library(lawce) xyplot(Y ∼ X, aspect = "iso", col = 1, pch = 16, data = RoadKills)

Histogram of RK$POLIC

RK$POLIC

Frequency

Histogram of RK$WAT.RES

RK$WAT.RES

Frequency

0 2 4 6

Histogram of RK$URBAN

RK$URBAN

Frequency

0 10 20 30

Histogram of RK$OLIVE

RK$OLIVE

Frequency

0 20 40 60

Histogram of RK$L.P.ROAD

RK$L.P.ROAD

Frequency

0.5 1.5 2.5

Histogram of RK$SHRUB

RK$SHRUB

Frequency

0.0 0.5 1.0 1.5

Histogram of RK$D.WAT.COUR

RK$D.WAT.COUR

Frequency

0 400 800

How to select variables for inclusion in the model

•  Hypothesis tes]ng •  Use the z-‐sta]s]c produced by summary •  drop1(M2,test=“Chi”) •  anova(M2,M3,test=“Chi”)

•  AIC, BIC, etc. -‐ use step(M2) etc.

Overdispersion

Recall that in the Poisson model we assume the mean = variance

With no overdispersion, residual mean deviance should be close to 1: 270.23 / 42 = 6.43…not close to 1

Or, consider mean squared residuals:

sum(resid(M2, type=“pearson”)^2) / 42 = 5.93 (suggests standard errors should be increased by sqrt(5.93) = 2.44) Quasi-‐Poisson model accounts for overdispersion:

Overdispersion

AIC not defined for quasi-‐Poisson model Use drop1(M4, test=“F”) is approximately correct or, be~er, do cross-‐valida]on

0 5000 10000 15000 20000 25000

Distance to park

Residuals for Poisson Model

Because variance increases with the mean, standard residual doesn’t make much sense. Can use “Pearson residuals”

For quasi-‐Poisson should also divide by the square root of the overdispersion parameter

Nega.ve Binomial GLM

M7 <-‐ glm.nb(formula = TOT.N ~ OPEN.L + L.WAT.C + SQ.LPROAD + D.PARK, data = RK) plot(M7)

1.5 2.0 2.5 3.0 3.5 4.0 4.5

Predicted values

Residuals

Residuals vs Fitted

-2 -1 0 1 2

Theoretical Quantiles

Normal Q-Q

1.5 2.0 2.5 3.0 3.5 4.0 4.5

Predicted values

Std. deviance resid.

Scale-Location21

0.00 0.05 0.10 0.15 0.20 0.25

Leverage

Cook's distance

0.5Residuals vs Leverage

1.5 2.0 2.5 3.0 3.5 4.0 4.5

Predicted values

Residuals

Residuals vs Fitted

-2 -1 0 1 2

Theoretical Quantiles

Normal Q-Q

1.5 2.0 2.5 3.0 3.5 4.0 4.5

Predicted values

Std. deviance resid.

Scale-Location2

0.0 0.1 0.2 0.3 0.4 0.5 0.6

Leverage

Cook's distance

Residuals vs Leverage

Contrast with quasi-‐Poisson model M4

What if zero is impossible?

Library(VGAM) data(Snakes) M3A <-‐ vglm(N_days~PDayRain + Tot_Rain + Road_Loc + PDayRain:Tot_Rain, family = posnegbinomial, data = Snakes) M4A <-‐ vglm(N_days~PDayRain + Tot_Rain + Road_Loc + PDayRain:Tot_Rain, family = pospoisson, data = Snakes)

Too many zeroes much more common

try plot(table(rpois(10000,lambda)))

Zero-‐altered Models

Get the es.mates of γ and β via maximum likelihood es.ma.on

library(pscl) f1 <-‐ formula(Intensity~fArea*fYear + Length | fArea * fYear + Length) H1A <-‐ hurdle(f1, dist = "poisson", link = "logit", data = ParasiteCod2) H1B <-‐ hurdle(f1, dist = "negbin", link = "logit", data = ParasiteCod2) AIC(H1A,H1B)

logis]c part count part

Can s]ll have overdispersion in ZIP/ZAP models so check ZINB/ZANB

Zero-‐inflated Models

Generalized*Linear*Models* - Columbia Universitymadigan/W2025/notes/GLM.pdf · 2013. 2. 25. ·...

Documents

Transcript of Generalized*Linear*Models* - Columbia Universitymadigan/W2025/notes/GLM.pdf · 2013. 2. 25. ·...

Parametric inference for diffusion processes observed …survey).pdf · Parametric inference for diffusion processes observed at ... diffusion processes observed at discrete points

TextMining:%An%Overview% - Columbia Universitymadigan/W2025/notes/IntroTextMining.pdf · To: model370email@bigco.com Dear Sir or Madam, My drier made smoke and a big whoooshie noise

OBSERVED COVID-19 FRAUD SCHEMES Title of Observed COVID …

Basic Analysis of Variance and the General Linear Modelata20315/psy420/Basic ANOVA and GLM.pdf · 2007-09-06 · Basic Analysis of Variance and the General Linear Model ... The average

GIA model – GPS observed

A Grief Observed - Freeditorial

OBSERVED DIFFERENCES IN RHYTHM BETWEEN PERFORMANCES …smcnetwork.org/system/files/OBSERVED DIFFERENCES IN RHYTHM B… · OBSERVED DIFFERENCES IN RHYTHM BETWEEN PERFORMANCES OF CLASSICAL

Applied Regression Modeling for Cross-Section Dataimai.princeton.edu/teaching/files/glm.pdf · Applied Regression Modeling for Cross-Section Data Kosuke Imai Princeton University

High-dimensional generalized linear models and the lassopages.stat.wisc.edu/~shao/stat992/lasso-glm.pdf · 2010. 2. 20. · alized linear models. Let Y ∈Y ⊂R be a real-valued

The Prophet observed:

Measuring Observed Race: Preliminary Findings from the Observed Measures Supplement

Observer Observed

RECENTLY OBSERVED CHANGES

Energy Storage Materials - University of Oregonhomework.uoregon.edu/pub/class/es202/Canvas/glm.pdf · resistance, a lower number of air entrapment and a lower viscosity, resulting

Non-Observed Economy

model selection in linear regression - Columbiamadigan/W2025/notes/linear.pdf · model selection in linear regression basic problem: how to choose between competing ... Quadratic

Observed interview fall 2014

Generalized Linear Models: The Big Pictureguerzhoy/303/lec/lec8/glm.pdf · Generalized Linear Models: The Big Picture STA303/STA1002: Methods of Data Analysis II, Summer 2016 Michael

A Grief Observed. - samizdat

Linking Indigenous Knowledge and Observed Climate …Linking Indigenous Knowledge and Observed Climate Change Studies ... Key Words: observed impacts, climate change, indigenous knowledge,

GeneralizedLinearModels* - Columbia Universitymadigan/W2025/notes/GLM.pdf · 2013. 2. 25. ·...

Transcript of GeneralizedLinearModels* - Columbia Universitymadigan/W2025/notes/GLM.pdf · 2013. 2. 25. ·...