© 2014 Department of Biostatistics & Demography, Faculty of Public Health, Khon Kaen University
Poisson Regression Model
& Others Count
Asst. Prof. Nikom Thanomsieng
Department of Epidemiology & Biostatistics
Faculty of Public Health, Khon Kaen University
Email: [email protected] Web: http://home.kku.ac.th/nikom
Poisson Regression Model: Goal
- Describe the relationship between a count response (dependent) variable and the predictor variables through a regression model.
- Estimate incidence rates and incidence rate ratios (Frome & Checkoway 1985).
- Estimate hazard rate ratios (Taulbee 1979; Laird & Olivier 1981; McCullagh and Nelder 2000).
Poisson Regression Model: Real Example
- Relationship of asthma management, socioeconomic status, and medication insurance characteristics to exacerbation frequency in children with asthma (Wendy J. Ungar, et al. Ann Allergy Asthma Immunol. 2011;106:17–23.)
Poisson Regression Model: Real Example
- Increased mortality in COPD among construction workers exposed to inorganic dust (Bergdahl, I.A. et al. (2004). European Respiratory Journal.)
Poisson Regression Model: Real Example
- Ong, K.C. & Lu, S.J. (2005). A Multidimensional Grading System (BODE Index) as Predictor of Hospitalization for COPD. Chest; 128:3810–3816.
Poisson Regression Model: Generalized Linear Model (GLM)
Components of a GLM
- Random component: Poisson family
- Systematic component: a linear predictor built from categorical or continuous explanatory variables
- Link function: log link, ln(μ), which is the canonical link for the Poisson family
$$\ln(\mu) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p$$
Stata command (glm):
glm [dep] [ind…], family(poisson) link(log) [lnoffset(varname)] [eform]
Stata Poisson standard (exposure() and offset() are alternatives; specify only one):
poisson [dep] [ind…], [exposure(varname) | offset(ln_varname)] [irr]
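Poisson Regression Model: Goal
Poisson log-linear model
$$\ln(\mu) = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p$$
For this model, the mean satisfies the exponential relationship
$$\mu = \exp(\beta_0 + \beta_1 x_1 + \dots + \beta_p x_p) = e^{\beta_0} (e^{\beta_1})^{x_1} \cdots (e^{\beta_p})^{x_p}$$
A 1-unit increase in x_j has a multiplicative impact of e^{β_j} on μ: the mean at x_j + 1 equals the mean at x_j multiplied by e^{β_j}, with the other predictors unchanged.
Poisson Regression for Rate
A response count Y_i has an index t_i (time, space, or another index of size: population at risk, person-years, etc.).
Many texts call this the "Poisson Regression Model".
$$\ln(\mu / t) = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p$$
$$\ln(\mu) - \ln(t) = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p$$
$$\ln(\mu) = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p + \ln(t)$$
ln(t_i) is called the "offset".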
Poisson Regression Model: Estimated Parameter & Inference
Poisson regression estimates the parameters by ML or IRLS
Newton-Raphson Method
  Initialize β                  # provide initial or starting values for estimates
  WHILE (ABS(βn - βo) > tol & ABS(Ln - Lo) > tol) {
    g = ∂L/∂β                   # gradient: 1st derivative of log-likelihood wrt β
    H = ∂²L/∂β²                 # Hessian: 2nd derivative of log-likelihood wrt β
    βo = βn
    βn = βo - H⁻¹g              # updated maximum likelihood estimates
    Lo = Ln
    Ln = L(βn)                  # new log-likelihood value
  }
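The Newton-Raphson loop above can be sketched in a few lines of Python. This is only an illustrative sketch, not the routine Stata uses: it assumes a Poisson model with a log link, one binary predictor and an offset, and it borrows the Framingham CHD counts (650 cases in 61773 person-years for women, 823 in 42688 for men) that appear in the person-time example of these slides. The function name `newton_poisson` and the starting values are assumptions made for the illustration.

```python
import math

# (male, CHD cases, person-years) from the Framingham example in these slides
data = [(0, 650, 61773.0), (1, 823, 42688.0)]

def newton_poisson(data, tol=1e-10, max_iter=50):
    """Newton-Raphson for ln(mu) = ln(t) + b0 + b1*x (illustrative sketch)."""
    total_y = sum(y for _, y, _ in data)
    total_t = sum(t for _, _, t in data)
    b0, b1 = math.log(total_y / total_t), 0.0   # assumed starting values
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y, t in data:
            mu = t * math.exp(b0 + b1 * x)
            g0 += y - mu              # gradient of the log-likelihood
            g1 += x * (y - mu)
            h00 -= mu                 # Hessian of the log-likelihood
            h01 -= x * mu
            h11 -= x * x * mu
        det = h00 * h11 - h01 * h01
        s0 = (h11 * g0 - h01 * g1) / det   # step = H^{-1} g
        s1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 - s0, b1 - s1          # beta_new = beta_old - H^{-1} g
        if abs(s0) < tol and abs(s1) < tol:
            break
    return b0, b1

b0, b1 = newton_poisson(data)
print(b0, b1, math.exp(b1))
```

Because this two-group model is saturated, the estimates can be checked by hand: b0 should equal ln(650/61773) and exp(b1) should equal the ratio of the two crude rates.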
Poisson Regression Model: Estimated Parameter & Inference
Algorithm: Iteratively Reweighted Least Squares (IRLS)
Standard GLM estimating algorithm (expected information matrix)
  Dev = 0
  μ = (y + 0.5)/(m + 1)         // binomial
  μ = (y + mean(y))/2           // non-binomial (Poisson)
  η = g(μ)                      // linear predictor
  WHILE (abs(ΔDev) > tolerance) {
    w = 1/(Vg'²)
    z = η + (y - μ)g' - offset
    β = (X'wX)⁻¹X'wz
    η = Xβ + offset
    μ = g⁻¹(η)
    Dev0 = Dev
    Dev = Deviance function
    ΔDev = Dev - Dev0
  }
  Chi2 = Σ(y - μ)²/V(μ)
  AIC = (-2LL + 2p)/n           // AIC at times defined w/o n
  BIC = Dev - (dof)ln(n)        // alternative def. exist
Where p = number of model predictors + const
      n = number of observations in model
      dof = degrees of freedom (n - p)
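As a rough illustration of the IRLS steps above, the following Python sketch fits a Poisson model with a log link, one predictor and an offset. It assumes the smoking and lung cancer data listed in the continuous-predictor example of these slides; the function names are hypothetical and the 2x2 weighted least squares step is solved by hand instead of with a matrix library. For the Poisson family with log link, V = μ and g' = 1/μ, so the weight 1/(Vg'²) reduces to μ.

```python
import math

# (cigarettes per day, person-years, lung cancer cases) from the smoking example
rows = [(0.0, 1421, 0), (5.2, 927, 0), (11.2, 988, 2), (15.2, 849, 2),
        (20.4, 1567, 9), (27.4, 1409, 10), (40.8, 556, 7)]

def poisson_deviance(y, mu):
    # 2 * sum[y*ln(y/mu) - (y - mu)], with y*ln(y/mu) taken as 0 when y == 0
    return 2.0 * sum((yi * math.log(yi / mi) if yi > 0 else 0.0) - (yi - mi)
                     for yi, mi in zip(y, mu))

def irls_poisson(rows, tol=1e-10, max_iter=100):
    x = [r[0] for r in rows]
    offset = [math.log(r[1]) for r in rows]
    y = [float(r[2]) for r in rows]
    ybar = sum(y) / len(y)
    mu = [(yi + ybar) / 2.0 for yi in y]      # non-binomial starting values
    eta = [math.log(m) for m in mu]           # eta = g(mu)
    dev = 0.0
    b0 = b1 = 0.0
    for _ in range(max_iter):
        w = mu[:]                             # 1/(V g'^2) = mu here
        z = [e + (yi - m) / m - o for e, yi, m, o in zip(eta, y, mu, offset)]
        # weighted least squares of z on (1, x) with weights w
        sw = sum(w)
        swx = sum(wi * xi for wi, xi in zip(w, x))
        swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
        swz = sum(wi * zi for wi, zi in zip(w, z))
        swxz = sum(wi * xi * zi for wi, xi, zi in zip(w, x, z))
        det = sw * swxx - swx * swx
        b0 = (swxx * swz - swx * swxz) / det
        b1 = (sw * swxz - swx * swz) / det
        eta = [b0 + b1 * xi + o for xi, o in zip(x, offset)]
        mu = [math.exp(e) for e in eta]
        dev0, dev = dev, poisson_deviance(y, mu)
        if abs(dev - dev0) < tol:             # stop when the deviance settles
            break
    return b0, b1, dev

b0, b1, dev = irls_poisson(rows)
print(b0, b1, dev)
```

If the sketch is right, the results should agree with the Stata glm output shown for this data set (coefficient of smk about .0746, constant about -7.128, deviance about 6.878).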
Poisson Regression Model: Estimated Parameter & Inference
Algorithm: Iteratively Reweighted Least Squares (IRLS)
Standard GLM estimating algorithm (observed information matrix)
  Dev = 0
  μ = (y + 0.5)/(m + 1)         // binomial
  μ = (y + mean(y))/2           // non-binomial
  η = g(μ)                      // g; linear predictor
  WHILE (abs(ΔDev) > tolerance) {
    V = V(μ)
    V' = 1st derivative of V
    g' = 1st derivative of g
    g" = 2nd derivative of g
    w = 1/(Vg'²)
    z = η + (y - μ)g' - offset
    Wo = w + (y - μ)(Vg" + V'g')/(V²g'³)
    β = (X'WoX)⁻¹X'Woz
    η = Xβ + offset
    μ = g⁻¹(η)
    Dev0 = Dev
    Dev = Deviance function
    ΔDev = Dev - Dev0
  }
  Chi2 = Σ(y - μ)²/V(μ)
  AIC = (-2LL + 2p)/n
  BIC = -2LL + ln(n)*k          // original ver: Dev - (dof)ln(n)
Where p = number of model predictors + const
      k = # predictors
      dof = degrees of freedom (n - p)
      n = number of observations in model
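Poisson Regression Model: Estimated Parameter & Inference
Poisson regression estimates the parameters by ML or IRLS; hypotheses about the parameters are then tested.
Inference about model parameters
$$H_0: \beta_i = 0$$
Wald statistic:
$$Z = \hat{\beta}_i / SE(\hat{\beta}_i)$$
95% CI:
$$\hat{\beta}_i \pm Z_{\alpha/2}\, SE(\hat{\beta}_i)$$
Likelihood ratio statistic, where ℓ0 and ℓ1 are the maximized likelihoods of the reduced and full models and L0, L1 are their logarithms:
$$LR = -2\ln(\ell_0 / \ell_1) = -2[\ln(\ell_0) - \ln(\ell_1)] = -2[L_0 - L_1]$$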
Poisson Regression Model: Interpretation
Poisson Coefficient
The response has a log-count increase of β_i for a one-unit increase in the value of the predictor x_i. Likewise, the response has a log-count decrease of β_i for a one-unit decrease in the value of the predictor. Other predictors are held constant (for example, at their mean values).
Rate Ratio
- Incidence Rate Ratio (IRR): the ratio of the rates of counts between two ascending contiguous levels of the predictor
- Obtained by exponentiating the coefficient, e^{β_i}
$$IRR(x_i) = \frac{\exp(\beta_0 + \beta_1 x_1 + \dots + \beta_i (x_i + 1) + \dots + \beta_p x_p)}{\exp(\beta_0 + \beta_1 x_1 + \dots + \beta_i x_i + \dots + \beta_p x_p)} = \exp(\beta_i)$$
Poisson Regression Model: Poisson log Linear Model, Example
Example: BODE Index (body mass index, airflow obstruction, dyspnea, and exercise capacity) as Predictor of Hospitalization for COPD (simulated data)
id    y   gender       id    y   gender
 1   12     1           6    9     1
 2    9     0           7    5     0
 3    4     0           8   10     1
 4   13     1           9    6     1
 5    6     0          10    6     1
bode (two column blocks as extracted; the row-to-id alignment is not recoverable from this transcript):
5 5 5 3 4
4 7 1 5 6
Poisson Regression Model: Stata Example: GLM
. glm y bode gender, fam(poisson) link(log)
Iteration 0:   log likelihood = -20.842287
Iteration 1:   log likelihood = -20.821526
Iteration 2:   log likelihood = -20.821519
Iteration 3:   log likelihood = -20.821519
Generalized linear models                          No. of obs      =        10
Optimization     : ML                              Residual df     =         7
                                                   Scale parameter =         1
Deviance         =  2.907852367                    (1/df) Deviance =  .4154075
Pearson          =  2.798074022                    (1/df) Pearson  =  .3997249
Variance function: V(u) = u                        [Poisson]
Link function    : g(u) = ln(u)                    [Log]
                                                   AIC             =  4.764304
Log likelihood   = -20.82151906                    BIC             = -13.21024
------------------------------------------------------------------------------
             |                 OIM
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        bode |   .2029338   .1013855     2.00   0.045     .0042219    .4016457
      gender |   .0429816   .3085102     0.14   0.889    -.5616873    .6476506
       _cons |   1.089469    .423402     2.57   0.010     .2596164    1.919322
------------------------------------------------------------------------------
Poisson Regression Model: Stata Example: GLM
. glm y bode gender, fam(poisson) link(log) ef
Iteration 0:   log likelihood = -20.842287
Iteration 1:   log likelihood = -20.821526
Iteration 2:   log likelihood = -20.821519
Iteration 3:   log likelihood = -20.821519
Generalized linear models                          No. of obs      =        10
Optimization     : ML                              Residual df     =         7
                                                   Scale parameter =         1
Deviance         =  2.907852367                    (1/df) Deviance =  .4154075
Pearson          =  2.798074022                    (1/df) Pearson  =  .3997249
Variance function: V(u) = u                        [Poisson]
Link function    : g(u) = ln(u)                    [Log]
                                                   AIC             =  4.764304
Log likelihood   = -20.82151906                    BIC             = -13.21024
------------------------------------------------------------------------------
             |                 OIM
           y |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        bode |   1.224991   .1241964     2.00   0.045     1.004231    1.494282
      gender |   1.043919   .3220596     0.14   0.889     .5702461    1.911046
       _cons |   2.972695   1.258645     2.57   0.010     1.296433    6.816334
------------------------------------------------------------------------------
Poisson Regression Model: Inference & Interpretation
. glm y bode gender, fam(poisson) link(log)
…
Variance function: V(u) = u                        [Poisson]
Link function    : g(u) = ln(u)                    [Log]
                                                   AIC             =  4.764304
Log likelihood   = -20.82151906                    BIC             = -13.21024
------------------------------------------------------------------------------
             |                 OIM
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        bode |   .2029338   .1013855     2.00   0.045     .0042219    .4016457
      gender |   .0429816   .3085102     0.14   0.889    -.5616873    .6476506
       _cons |   1.089469    .423402     2.57   0.010     .2596164    1.919322
------------------------------------------------------------------------------
This indicates a positive association between the BODE index and hospitalization for COPD.
$$H_0: \beta_i = 0$$
$$Z = \hat{\beta}_i / SE = .2029 / .1014 = 2.00; \quad p\text{-value} = .045$$
$$95\%\ CI = \hat{\beta}_i \pm Z_{\alpha/2}\, SE = (0.0042,\ 0.402)$$
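The Wald test and confidence interval for the bode coefficient can be reproduced by hand. The short Python sketch below assumes only the coefficient and standard error printed in the Stata table above; the variable names are illustrative.

```python
from statistics import NormalDist

# bode coefficient and OIM standard error copied from the Stata output
b, se = 0.2029338, 0.1013855

z = b / se                                            # Wald statistic
p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))      # two-sided p-value
z_crit = NormalDist().inv_cdf(0.975)                  # about 1.96
ci = (b - z_crit * se, b + z_crit * se)               # 95% CI on the log scale

print(round(z, 2), round(p_value, 3), ci)
```

The interval is on the log-count scale; exponentiating both limits gives the interval for the rate ratio reported with the `ef` option.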
Poisson Regression Model: coefficient & rate ratio Interpretation
Poisson Coefficient:
bode: For each one-point increase in the BODE score, the expected log-number of hospitalizations increases by 0.203, holding gender constant.
gender: Being male (coded 1, as elsewhere in these slides) increases the expected log-number of hospitalizations by 0.043 compared with being female, holding BODE constant.
Rate Ratio
- Male patients had 1.04 times the hospitalization rate of female patients, with BODE held constant (not statistically significant, p = 0.889).
- For each one-point increase in the BODE score, the hospitalization rate increases by 22.50%, with gender held constant.
Basic of Incidence Rate Ratio: Person-time
Example: number of patients with coronary heart disease (Levy, 1999)
Occurrence of coronary heart disease in men versus women when person-years are known (Framingham Heart Study)
                           Male     Female      Total
Coronary heart disease      823        650       1473
Person-years              42688      61773     104461
Basic of Incidence Rate Ratio: Person-time
Notation for the coronary heart disease example (Levy, 1999; Framingham Heart Study):
                           Male             Female           Total
Coronary heart disease      823 (n11)         650 (n12)       1473
Person-years              42688 (n1)        61773 (n2)      104461
$$\text{incidence rate: } ir_{1i} = \frac{n_{1i}}{n_i}$$
$$\text{incidence rate ratio: } irr = \frac{ir_{11}}{ir_{12}}$$
Basic of Incidence Rate Ratio: Person-time
$$ir_{11} = \frac{823}{42688} = .0192794 \ \text{(men)}, \qquad ir_{12} = \frac{650}{61773} = .0105224 \ \text{(women)}$$
$$irr = \frac{.0192794}{.0105224} = 1.832227$$
IRR = 1.83 means "the incidence rate of coronary heart disease in men is 1.83 times that in women".
Basic of Incidence Rate Ratio: Person-time
$$95\%\ ci = irr \times \exp[\pm 1.96\, s_{\log(irr)}]$$
$$s^2_{\log(irr)} = \frac{1}{n_{11}} + \frac{1}{n_{12}}$$
. iri 823 650 42688 61773
                 |   Exposed   Unexposed  |      Total
-----------------+------------------------+------------
           Cases |       823         650  |       1473
     Person-time |     42688       61773  |     104461
-----------------+------------------------+------------
                 |                        |
  Incidence Rate |  .0192794    .0105224  |    .014101
                 |                        |
                 |      Point estimate    |    [95% Conf. Interval]
                 |------------------------+------------------------
 Inc. rate diff. |          .008757       |    .0072113    .0103028
 Inc. rate ratio |         1.832227       |    1.651137    2.033836 (exact)
 Attr. frac. ex. |         .4542162       |    .3943566    .5083183 (exact)
 Attr. frac. pop |         .2537814       |
                 +-------------------------------------------------
                    (midp)   Pr(k>=823) =                    0.0000 (exact)
                    (midp) 2*Pr(k>=823) =                    0.0000 (exact)
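The rate ratio and its approximate interval can also be computed directly from the four numbers in the table. The sketch below follows the two formulas on this slide (with 1.96 as the normal quantile); the variable names are illustrative. Note that it yields a Wald-type interval, which is why it differs slightly from the exact interval printed by `iri`.

```python
import math

n11, n12 = 823, 650        # CHD cases: men, women
n1, n2 = 42688, 61773      # person-years: men, women

ir11 = n11 / n1            # incidence rate, men
ir12 = n12 / n2            # incidence rate, women
irr = ir11 / ir12          # incidence rate ratio

s = math.sqrt(1 / n11 + 1 / n12)                 # SE of log(irr)
ci = (irr * math.exp(-1.96 * s), irr * math.exp(1.96 * s))

print(ir11, ir12, irr, s, ci)
```

The standard error of log(irr) computed this way (about .0525) and the resulting interval (about 1.653 to 2.031) match what the Poisson regression of chd on male reports later in these slides.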
Basic of Incidence Rate Ratio: Person-time
. clear
. input male chd per_yrs
          male        chd    per_yrs
  1. 0 650 61773
  2. 1 823 42688
  3. end
. ir chd male per_yrs
                 | male                   |
                 |   Exposed   Unexposed  |      Total
-----------------+------------------------+------------
             chd |       823         650  |       1473
         per_yrs |     42688       61773  |     104461
-----------------+------------------------+------------
                 |                        |
  Incidence rate |  .0192794    .0105224  |    .014101
                 |                        |
                 |      Point estimate    |    [95% Conf. Interval]
                 |------------------------+------------------------
 Inc. rate diff. |          .008757       |    .0072113    .0103028
 Inc. rate ratio |         1.832227       |    1.651137    2.033836 (exact)
 Attr. frac. ex. |         .4542162       |    .3943566    .5083183 (exact)
 Attr. frac. pop |         .2537814       |
                 +-------------------------------------------------
                    (midp)   Pr(k>=823) =                    0.0000 (exact)
                    (midp) 2*Pr(k>=823) =                    0.0000 (exact)
Incidence Rate Ratio: interpretation
If X is a continuous variable, e.g. age (years):
IRR = 0.95 means "for every 1-year increase in age, the rate (or risk) of the event decreases by 5%"
IRR = 1.05 means "for every 1-year increase in age, the rate (or risk) of the event increases by 5%"
IRR = 2.05 means "for every 1-year increase in age, the rate (or risk) of the event is 2.05 times its previous value", i.e. the rate is multiplied by 2.05
For continuous variables such as age or systolic BP, a change of 1 year (or 1 mmHg) is often too small to be of interest; a step of 5 or 10 years may be used instead.
*** If a continuous variable x ranges over 0-1, a change of 1 unit is too large; a step of 0.01 may be used instead.
Incidence Rate Ratio: interpretation
If X is a categorical variable, e.g. sex (1 = male, 0 = female):
IRR = 0.95 means "men have a rate (or risk) of the event 5% lower than women"
IRR = 1.05 means "men have a rate (or risk) of the event 5% higher than women"
IRR = 2.05 means "men have a rate (or risk) of the event 2.05 times that of women"
Poisson Regression for Rate: GLM with Stata
. glm chd male, family(poisson) link(log) lnoffset(per_yrs)
Iteration 0:   log likelihood = -8.4353186
Iteration 1:   log likelihood = -8.4330708
Iteration 2:   log likelihood = -8.4330708
Generalized linear models                          No. of obs      =         2
Optimization     : ML                              Residual df     =         0
                                                   Scale parameter =         1
Deviance         =  6.52811e-14                    (1/df) Deviance =         .
Pearson          =  3.23538e-21                    (1/df) Pearson  =         .
Variance function: V(u) = u                        [Poisson]
Link function    : g(u) = ln(u)                    [Log]
                                                   AIC             =  10.43307
Log likelihood   = -8.433070809                    BIC             =  6.53e-14
------------------------------------------------------------------------------
             |                 OIM
         chd |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        male |   .6055324   .0524741    11.54   0.000     .5026851    .7083797
       _cons |  -4.554249   .0392232  -116.11   0.000    -4.631125   -4.477373
     per_yrs | (exposure)
------------------------------------------------------------------------------
Poisson Regression for Rate: IRR (GLM Stata)
. glm chd male, family(poisson) link(log) lnoffset(per_yrs) eform
Iteration 0:   log likelihood = -8.4353186
Iteration 1:   log likelihood = -8.4330708
Iteration 2:   log likelihood = -8.4330708
Generalized linear models                          No. of obs      =         2
Optimization     : ML                              Residual df     =         0
                                                   Scale parameter =         1
Deviance         =  6.52811e-14                    (1/df) Deviance =         .
Pearson          =  3.23538e-21                    (1/df) Pearson  =         .
Variance function: V(u) = u                        [Poisson]
Link function    : g(u) = ln(u)                    [Log]
                                                   AIC             =  10.43307
Log likelihood   = -8.433070809                    BIC             =  6.53e-14
------------------------------------------------------------------------------
             |                 OIM
         chd |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        male |   1.832227   .0961444    11.54   0.000     1.653154    2.030698
     per_yrs | (exposure)
------------------------------------------------------------------------------
Poisson Regression for Rate: poisson (Stata)
. poisson chd male, exposure(per_yrs)
Iteration 0:   log likelihood = -8.4330708
Iteration 1:   log likelihood = -8.4330708
Poisson regression                                 Number of obs   =         2
                                                   LR chi2(1)      =    134.30
                                                   Prob > chi2     =    0.0000
Log likelihood = -8.4330708                        Pseudo R2       =    0.8884
------------------------------------------------------------------------------
         chd |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        male |   .6055324   .0524741    11.54   0.000     .5026851    .7083797
       _cons |  -4.554249   .0392232  -116.11   0.000    -4.631125   -4.477373
 ln(per_yrs) |          1  (exposure)
------------------------------------------------------------------------------
Poisson Regression for Rate: IRR poisson (Stata)
. poisson chd male, exposure(per_yrs) irr
Iteration 0:   log likelihood = -8.4330708
Iteration 1:   log likelihood = -8.4330708
Poisson regression                                 Number of obs   =         2
                                                   LR chi2(1)      =    134.30
                                                   Prob > chi2     =    0.0000
Log likelihood = -8.4330708                        Pseudo R2       =    0.8884
------------------------------------------------------------------------------
         chd |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        male |   1.832227   .0961444    11.54   0.000     1.653154    2.030698
       _cons |   .0105224   .0004127  -116.11   0.000     .0097438    .0113632
 ln(per_yrs) |          1  (exposure)
------------------------------------------------------------------------------
Poisson Regression for Rate: Inference & Interpretation
$$H_0: \beta_i = 0$$
Sex is statistically significantly associated with the occurrence of heart disease (Z = 11.54; p-value < 0.001).
For a categorical predictor (0 = female, 1 = male), the result is interpreted as an Incidence Rate Ratio:
$$\exp(\beta_1) = \exp(.60553236) = 1.832227 = IRR$$
- The incidence rate of CHD in men is exp(.60553236) = 1.832227 times that in women, or
- Men have 1.83 times the risk of CHD of women.
Basic of Incidence Rate Ratio: Continuous Explanatory variable
Example: smoking and the occurrence of lung cancer (From 1983)
id    smk   pyear   calung
 1      0    1421        0
 2    5.2     927        0
 3   11.2     988        2
 4   15.2     849        2
 5   20.4    1567        9
 6   27.4    1409       10
 7   40.8     556        7
/* Data Input (Stata) */
clear
input id smk pyear calung
1 0 1421 0
2 5.2 927 0
3 11.2 988 2
4 15.2 849 2
5 20.4 1567 9
6 27.4 1409 10
7 40.8 556 7
end
Basic of Incidence Rate Ratio: Continuous Explanatory variable (glm)
. glm calung smk, family(poisson) link(log) lnoffset(pyear)
Iteration 0:   log likelihood = -12.155888
...
Iteration 3:   log likelihood = -12.062056
Generalized linear models                          No. of obs      =         7
Optimization     : ML                              Residual df     =         5
                                                   Scale parameter =         1
Deviance         =  6.878384154                    (1/df) Deviance =  1.375677
Pearson          =  4.866088691                    (1/df) Pearson  =  .9732177
Variance function: V(u) = u                        [Poisson]
Link function    : g(u) = ln(u)                    [Log]
                                                   AIC             =   4.01773
Log likelihood   = -12.06205596                    BIC             = -2.851167
------------------------------------------------------------------------------
             |                 OIM
      calung |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         smk |   .0745704   .0155644     4.79   0.000     .0440648     .105076
       _cons |  -7.128196   .4515324   -15.79   0.000    -8.013183   -6.243209
       pyear | (exposure)
------------------------------------------------------------------------------
Basic of Incidence Rate Ratio: Continuous Explanatory variable (glm)
. glm calung smk, family(poisson) link(log) lnoffset(pyear) eform
Iteration 0:   log likelihood = -12.155888
...
Iteration 3:   log likelihood = -12.062056
Generalized linear models                          No. of obs      =         7
Optimization     : ML                              Residual df     =         5
                                                   Scale parameter =         1
Deviance         =  6.878384154                    (1/df) Deviance =  1.375677
Pearson          =  4.866088691                    (1/df) Pearson  =  .9732177
Variance function: V(u) = u                        [Poisson]
Link function    : g(u) = ln(u)                    [Log]
                                                   AIC             =   4.01773
Log likelihood   = -12.06205596                    BIC             = -2.851167
------------------------------------------------------------------------------
             |                 OIM
      calung |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         smk |   1.077421   .0167694     4.79   0.000      1.04505    1.110795
       pyear | (exposure)
------------------------------------------------------------------------------
Basic of Incidence Rate Ratio: Continuous Explanatory variable (poisson)
. poisson calung smk, exposure(pyear)
Iteration 0:   log likelihood = -12.062141
Iteration 1:   log likelihood = -12.062056
Iteration 2:   log likelihood = -12.062056
Poisson regression                                 Number of obs   =         7
                                                   LR chi2(1)      =     24.02
                                                   Prob > chi2     =    0.0000
Log likelihood = -12.062056                        Pseudo R2       =    0.4990
------------------------------------------------------------------------------
      calung |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         smk |   .0745704   .0155644     4.79   0.000     .0440648     .105076
       _cons |  -7.128196   .4515324   -15.79   0.000    -8.013183   -6.243209
   ln(pyear) |          1  (exposure)
------------------------------------------------------------------------------
Basic of Incidence Rate Ratio: Continuous Explanatory variable (poisson)
. poisson calung smk, exposure(pyear) irr
Iteration 0:   log likelihood = -12.062141
Iteration 1:   log likelihood = -12.062056
Iteration 2:   log likelihood = -12.062056
Poisson regression                                 Number of obs   =         7
                                                   LR chi2(1)      =     24.02
                                                   Prob > chi2     =    0.0000
Log likelihood = -12.062056                        Pseudo R2       =    0.4990
------------------------------------------------------------------------------
      calung |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         smk |   1.077421   .0167694     4.79   0.000      1.04505    1.110795
       _cons |   .0008022   .0003622   -15.79   0.000     .0003311    .0019436
   ln(pyear) |          1  (exposure)
------------------------------------------------------------------------------
Basic of Incidence Rate Ratio: Continuous Explanatory variable (poisson)
. poisson calung smk, exposure(pyear)
---omit---
. poisson calung smk, exposure(pyear) irr
---omit---
------------------------------------------------------------------------------
      calung |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         smk |   1.077421   .0167694     4.79   0.000      1.04505    1.110795
       _cons |   .0008022   .0003622   -15.79   0.000     .0003311    .0019436
   ln(pyear) |          1  (exposure)
------------------------------------------------------------------------------
. lincom 20*smk, irr
 ( 1)  20*[calung]smk = 0
------------------------------------------------------------------------------
      calung |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |   4.443348   1.383159     4.79   0.000     2.414026    8.178595
------------------------------------------------------------------------------
For each additional cigarette smoked per day, the lung cancer incidence rate is multiplied by exp(.0745704) = 1.077421.
(For an increase of 20 cigarettes per day, the lung cancer incidence rate is multiplied by exp(20 x 0.0745704) = 4.443348.)
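What `lincom 20*smk, irr` reports can be checked with a few lines of arithmetic. The Python sketch below assumes only the smk coefficient and standard error from the output above; the delta-method line for the standard error of the rescaled IRR is an assumption about how that column is obtained, and it happens to reproduce the printed value.

```python
import math
from statistics import NormalDist

# smk coefficient and standard error copied from the Stata output
b, se = 0.0745704, 0.0155644
k = 20                                   # cigarettes per day

z_crit = NormalDist().inv_cdf(0.975)
irr_k = math.exp(k * b)                  # rate ratio for a k-unit increase
ci = (math.exp(k * (b - z_crit * se)),   # CI computed on the log scale,
      math.exp(k * (b + z_crit * se)))   # then exponentiated
se_irr_k = irr_k * k * se                # delta-method standard error

print(irr_k, se_irr_k, ci)
```

The same pattern works for any clinically meaningful step, for example k = 5 or k = 10 units of a continuous predictor.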
. qui poisson calung smk, exposure(pyear) irr
. listcoef, percent
poisson (N=7): Percentage Change in Expected Count
Observed SD: 4.2706083
----------------------------------------------------------------------
      calung |        b         z     P>|z|      %    %StdX      SDofX
-------------+--------------------------------------------------------
         smk |  0.07457     4.791    0.000     7.7    180.9    13.8508
Basic of Incidence Rate Ratio: Continuous Explanatory variable (poisson)
Interpreting the result as a percentage:
Each additional cigarette smoked per day increases the lung cancer incidence rate by 7.74%.
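Poisson Regression Model: Multiple Poisson Regression
The explanatory variables may be categorical or continuous, and there is more than one of them.
Poisson regression for count:
$$\ln(\mu) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p$$
Poisson regression for rate, where ln(t) is the offset:
$$\ln(\mu) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p + \ln(t)$$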
Poisson Regression Model: Real Data (Multiple Poisson Regression)
Data are from the Canadian National Cardiovascular Disease registry, called FASTRAK. Years covered: 1996-1998 (Hilbe, 2011).
die: number died from MI
cases: number of cases with the same covariate pattern
anterior: 1 = anterior site MI; 0 = inferior site MI
hcabg: 1 = history of CABG; 0 = no history of CABG
age75: 1 = age > 75; 0 = age <= 75
killip: Killip level of cardiac event severity (1-4)
kk1 (1/0): non-symptomatic; stress; tightness in left shoulder; not MI
kk2 (1/0): moderate severity cardiac event; angina
kk3 (1/0): severe cardiac event; severe chest pains
kk4 (1/0): severe cardiac event; death
Poisson Regression Model: Real Data (Multiple Poisson Regression)
Data: 15 observations on the following 9 variables.
     +-----------------------------------------------------------------+
     | die   cases   anterior   hcabg   killip   kk1   kk2   kk3   kk4 |
     |-----------------------------------------------------------------|
  1. |   5      19          0       0        4     0     0     0     1 |
  2. |  10      83          0       0        3     0     0     1     0 |
  3. |  15     412          0       0        2     0     1     0     0 |
  4. |  28    1864          0       0        1     1     0     0     0 |
  5. |   1       1          0       1        4     0     0     0     1 |
     |-----------------------------------------------------------------|
  6. |   0       3          0       1        3     0     0     1     0 |
  7. |   1      18          0       1        2     0     1     0     0 |
  8. |   2      70          0       1        1     1     0     0     0 |
  9. |  10      28          1       0        4     0     0     0     1 |
 10. |   9     139          1       0        3     0     0     1     0 |
     |-----------------------------------------------------------------|
 11. |  39     443          1       0        2     0     1     0     0 |
 12. |  50    1374          1       0        1     1     0     0     0 |
 13. |   1       6          1       1        3     0     0     1     0 |
 14. |   3      16          1       1        2     0     1     0     0 |
 15. |   2      27          1       1        1     1     0     0     0 |
     +-----------------------------------------------------------------+
Poisson Regression Model: Real Data (Multiple Poisson Regression)
. xi: glm die anterior hcabg i.killip, family(poisson) link(log) lnoffset(cases) nolog
i.killip          _Ikillip_1-4        (naturally coded; _Ikillip_1 omitted)
Generalized linear models                          No. of obs      =        15
Optimization     : ML                              Residual df     =         9
                                                   Scale parameter =         1
Deviance         =  10.93195914                    (1/df) Deviance =  1.214662
Pearson          =  12.60791065                    (1/df) Pearson  =  1.400879
Variance function: V(u) = u                        [Poisson]
Link function    : g(u) = ln(u)                    [Log]
                                                   AIC             =   4.93278
Log likelihood   = -30.99584752                    BIC             = -13.44049
------------------------------------------------------------------------------
             |                 OIM
         die |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    anterior |   .6748639   .1595707     4.23   0.000     .3621111    .9876168
       hcabg |   .6613804   .3267006     2.02   0.043      .021059    1.301702
  _Ikillip_2 |   .9020431   .1723519     5.23   0.000     .5642396    1.239847
  _Ikillip_3 |   1.113287   .2513246     4.43   0.000        .6207    1.605874
  _Ikillip_4 |    2.51264   .2743041     9.16   0.000     1.975014    3.050266
       _cons |   -4.06977   .1459076   -27.89   0.000    -4.355743   -3.783796
   ln(cases) |          1  (exposure)
------------------------------------------------------------------------------
Poisson Regression Model: Real Data (Multiple Poisson Regression)
. xi: glm die anterior hcabg i.killip, family(poisson) link(log) lnoffset(cases) ef nolog
i.killip          _Ikillip_1-4        (naturally coded; _Ikillip_1 omitted)
Generalized linear models                          No. of obs      =        15
Optimization     : ML                              Residual df     =         9
                                                   Scale parameter =         1
Deviance         =  10.93195914                    (1/df) Deviance =  1.214662
Pearson          =  12.60791065                    (1/df) Pearson  =  1.400879
Variance function: V(u) = u                        [Poisson]
Link function    : g(u) = ln(u)                    [Log]
                                                   AIC             =   4.93278
Log likelihood   = -30.99584752                    BIC             = -13.44049
------------------------------------------------------------------------------
             |                 OIM
         die |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    anterior |   1.963766   .3133595     4.23   0.000     1.436359    2.684828
       hcabg |   1.937465   .6329708     2.02   0.043     1.021282    3.675546
  _Ikillip_2 |   2.464633   .4247842     5.23   0.000      1.75811    3.455083
  _Ikillip_3 |   3.044349   .7651196     4.43   0.000      1.86023    4.982213
  _Ikillip_4 |   12.33746   3.384215     9.16   0.000     7.206717    21.12096
       _cons |   .0170813   .0024923   -27.89   0.000     .0128329    .0227362
   ln(cases) |          1  (exposure)
------------------------------------------------------------------------------
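Poisson Regression Model: Real Data (Multiple Poisson Regression)
Inference and model checking
After fitting the Poisson regression model
$$\ln(\mu) = \beta_0 + \beta x_i$$
the fitted Poisson regression equation for the example, including the offset ln(cases) used in the model, is
$$\ln(\hat{\mu}_i) = \ln(\text{cases}_i) - 4.06977 + .6748639\,(\text{anterior}) + .6613804\,(\text{hcabg}) + .9020431\,(\text{\_Ikillip\_2}) + 1.113287\,(\text{\_Ikillip\_3}) + 2.51264\,(\text{\_Ikillip\_4})$$
The hypothesis tested to decide whether an explanatory variable is associated with the response variable is
$$H_0: \beta_i = 0$$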
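To see what the fitted equation implies, the expected number of deaths for a covariate pattern can be computed from the coefficients and the number of cases. The Python sketch below assumes the coefficients printed in the Stata output for this model and treats ln(cases) as the offset; the dictionary keys and the function name `expected_deaths` are illustrative only.

```python
import math

# Coefficients (log scale) copied from the Stata output for this model
coef = {"_cons": -4.06977, "anterior": 0.6748639, "hcabg": 0.6613804,
        "killip2": 0.9020431, "killip3": 1.113287, "killip4": 2.51264}

def expected_deaths(cases, anterior, hcabg, killip):
    """Expected deaths = cases * exp(linear predictor); Killip 1 is the reference."""
    eta = coef["_cons"] + coef["anterior"] * anterior + coef["hcabg"] * hcabg
    if killip > 1:
        eta += coef[f"killip{killip}"]
    return cases * math.exp(eta)

# Observation 4 in the data listing: 1864 cases, inferior MI, no CABG, Killip 1
mu_row4 = expected_deaths(1864, 0, 0, 1)
# Observation 9: 28 cases, anterior MI, no CABG, Killip 4
mu_row9 = expected_deaths(28, 1, 0, 4)

print(round(mu_row4, 2), round(mu_row9, 2))
```

These fitted counts can be set beside the observed deaths for the same rows (28 and 10) as an informal check of fit.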
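Poisson Regression Model: Real Data (Multiple Poisson Regression)
Inference and model checking
$$H_0: \beta_i = 0$$
- The test uses the Wald statistic
$$z = \frac{\hat{\beta} - 0}{ASE} = \frac{\hat{\beta}}{ASE} \sim N(0, 1)$$
- or
$$z^2 = \left(\frac{\hat{\beta}}{ASE}\right)^2 \sim \chi^2_{df=1}$$
- Confidence interval, 100(1 - α)%:
$$\hat{\beta} \pm z_{\alpha/2}\,(ASE)$$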
Poisson Regression Model: Real Data (Multiple Poisson Regression)
. xi: glm die anterior hcabg i.killip, family(poisson) link(log) lnoffset(cases) nolog
i.killip          _Ikillip_1-4        (naturally coded; _Ikillip_1 omitted)
Generalized linear models                          No. of obs      =        15
Optimization     : ML                              Residual df     =         9
                                                   Scale parameter =         1
Deviance         =  10.93195914                    (1/df) Deviance =  1.214662
Pearson          =  12.60791065                    (1/df) Pearson  =  1.400879
Variance function: V(u) = u                        [Poisson]
Link function    : g(u) = ln(u)                    [Log]
                                                   AIC             =   4.93278
Log likelihood   = -30.99584752                    BIC             = -13.44049
------------------------------------------------------------------------------
             |                 OIM
         die |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    anterior |   .6748639   .1595707     4.23   0.000     .3621111    .9876168
       hcabg |   .6613804   .3267006     2.02   0.043      .021059    1.301702
  _Ikillip_2 |   .9020431   .1723519     5.23   0.000     .5642396    1.239847
  _Ikillip_3 |   1.113287   .2513246     4.43   0.000        .6207    1.605874
  _Ikillip_4 |    2.51264   .2743041     9.16   0.000     1.975014    3.050266
       _cons |   -4.06977   .1459076   -27.89   0.000    -4.355743   -3.783796
   ln(cases) |          1  (exposure)
------------------------------------------------------------------------------
$$H_0: \beta = 0; \quad \text{Wald test: } z = \frac{\hat{\beta}}{ASE}$$
$$100(1 - \alpha)\%\ CI: \ \hat{\beta} \pm z_{\alpha/2}\,(ASE)$$
Poisson Regression Model: Real Data (Multiple Poisson Regression)
An anterior site heart attack, a history of having a CABG procedure, and Killip level 2-4 status are each significantly associated with the number of deaths from MI.
$$H_0: \beta_i = 0$$
. xi: glm die anterior hcabg i.killip, family(poisson) link(log) lnoffset(cases) nolog
…
------------------------------------------------------------------------------
             |                 OIM
         die |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    anterior |   .6748639   .1595707     4.23   0.000     .3621111    .9876168
       hcabg |   .6613804   .3267006     2.02   0.043      .021059    1.301702
  _Ikillip_2 |   .9020431   .1723519     5.23   0.000     .5642396    1.239847
  _Ikillip_3 |   1.113287   .2513246     4.43   0.000        .6207    1.605874
  _Ikillip_4 |    2.51264   .2743041     9.16   0.000     1.975014    3.050266
       _cons |   -4.06977   .1459076   -27.89   0.000    -4.355743   -3.783796
   ln(cases) |          1  (exposure)
------------------------------------------------------------------------------
Poisson Regression Model: Real Data (Multiple Poisson Regression)
. xi: glm die anterior hcabg i.killip, family(poisson) link(log) lnoffset(cases) ef nolog
…
------------------------------------------------------------------------------
             |                 OIM
         die |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    anterior |   1.963766   .3133595     4.23   0.000     1.436359    2.684828
       hcabg |   1.937465   .6329708     2.02   0.043     1.021282    3.675546
  _Ikillip_2 |   2.464633   .4247842     5.23   0.000      1.75811    3.455083
  _Ikillip_3 |   3.044349   .7651196     4.43   0.000      1.86023    4.982213
  _Ikillip_4 |   12.33746   3.384215     9.16   0.000     7.206717    21.12096
       _cons |   .0170813   .0024923   -27.89   0.000     .0128329    .0227362
   ln(cases) |          1  (exposure)
------------------------------------------------------------------------------
Patients having an anterior site heart attack are about twice (1.96) as likely to die as patients whose damage was to another area of the heart.
Patients with a history of having a CABG procedure are about twice (1.94) as likely to die as patients who did not have such a procedure.
Poisson Regression Model: Real Data (Multiple Poisson Regression)
. xi: glm die anterior hcabg i.killip, family(poisson) link(log) lnoffset(cases) ef nolog
------------------------------------------------------------------------------
             |                 OIM
         die |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    anterior |   1.963766   .3133595     4.23   0.000     1.436359    2.684828
       hcabg |   1.937465   .6329708     2.02   0.043     1.021282    3.675546
  _Ikillip_2 |   2.464633   .4247842     5.23   0.000      1.75811    3.455083
  _Ikillip_3 |   3.044349   .7651196     4.43   0.000      1.86023    4.982213
  _Ikillip_4 |   12.33746   3.384215     9.16   0.000     7.206717    21.12096
       _cons |   .0170813   .0024923   -27.89   0.000     .0128329    .0227362
   ln(cases) |          1  (exposure)
------------------------------------------------------------------------------
Patients with Killip level 2 status are about two-and-a-half times (2.46) as likely to die as patients with Killip level 1 status (no perceived problem). Those at level 3 are about 3 times (3.04) as likely to die, and those at level 4, who are experiencing a massive heart attack, are about 12 times (12.34) as likely to die as those with no apparent heart problems.
Poisson Regression Model
Poisson Regression Model: Real Data (Multiple Poisson Regression)
. qui xi: poisson die anterior hcabg i.killip, exposure(cases)
. listcoef, percent

poisson (N=15): Percentage Change in Expected Count

Observed SD: 15.331884

----------------------------------------------------------------------
         die |         b        z    P>|z|        %    %StdX     SDofX
-------------+--------------------------------------------------------
    anterior |   0.67486    4.229    0.000     96.4     41.7    0.5164
       hcabg |   0.66138    2.024    0.043     93.7     40.7    0.5164
  _Ikillip_2 |   0.90204    5.234    0.000    146.5     51.1    0.4577
  _Ikillip_3 |   1.11329    4.430    0.000    204.4     66.5    0.4577
  _Ikillip_4 |   2.51264    9.160    0.000   1133.7    183.0    0.4140
----------------------------------------------------------------------
Patients having an anterior site heart attack are 96.4% more likely to die
than if the damage was to another area of the heart.
Patients with a history of having a CABG procedure are 93.7% more likely
to die than if they did not have such a procedure.
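The link between the coefficient column (b), the incidence rate ratio, and the percentage column can be checked by hand: IRR = exp(b) and % = 100 × (exp(b) − 1). The following is a minimal Python sketch of that arithmetic, using the b values from the listcoef output; the function names are illustrative and not part of Stata.

```python
import math

# Coefficients (column b) copied from the listcoef output.
coefs = {
    "anterior": 0.67486,
    "hcabg": 0.66138,
    "_Ikillip_2": 0.90204,
    "_Ikillip_3": 1.11329,
    "_Ikillip_4": 2.51264,
}

def irr(b):
    """Incidence rate ratio implied by a log-link coefficient."""
    return math.exp(b)

def percent_change(b):
    """Percentage change in the expected count for a one-unit increase."""
    return 100.0 * (math.exp(b) - 1.0)

for name, b in coefs.items():
    print(f"{name:<11} IRR = {irr(b):7.3f}   pct change = {percent_change(b):7.1f}")
```

An IRR of 1.96 and a change of 96.4% describe the same effect, which is why "96.4% more likely" and "about twice as likely" are both valid readings.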
Poisson Regression Model: Real Data (Multiple Poisson Regression)
Patients with Killip level 2 status are 146.5% more likely to die than
those with Killip level 1 status (no perceived problem). Those at level 3
are 204.4% more likely to die, and those at level 4, which is experiencing
a massive heart attack, are 1,133.7% more likely to die than those with
no apparent heart problems.
Poisson Regression Model: Basic Poisson Assumptions
Basic Poisson Assumptions (Hilbe, 2014)
1. The distribution is discrete with a single parameter, the mean, which is
usually symbolized as either λ (lambda) or μ (mu). The mean is also
understood as a rate parameter. It is the expected number of times that an
item or event occurs per unit of time, area, or volume.
2. The response terms, or y values, are nonnegative integers; i.e., the
distribution allows for the possibility of counts where Y ≥ 0.
3. Observations are independent of one another.
4. No cell of observed counts has substantially more or less than what is
expected based on the mean of the empirical distribution. For example,
the data should not have more zero counts than is expected based on
a Poisson distribution with a given mean. As the value of μ increases,
the probability of zero (0) counts is reduced.
5. The mean and variance of the model are identical, or at least nearly
the same; i.e., Poisson distributions with higher mean values have
correspondingly greater variability.
6. The Pearson Chi2 dispersion statistic has a value approximating 1.0.
A value of 1.0 results when the observed and predicted variances of
the response are the same.
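Assumptions 4 and 5 can be illustrated numerically. The sketch below evaluates the Poisson probability function for a few hypothetical means (0.5, 2 and 5 are arbitrary choices, not values from the data) and shows that the probability of a zero count falls as the mean rises, while the mean and variance stay equal.

```python
import math

def poisson_pmf(y, mu):
    """P(Y = y) for a Poisson variable with mean mu (log form avoids overflow)."""
    return math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))

def moments(mu, upper=200):
    """Mean and variance computed directly from the probability function."""
    probs = [poisson_pmf(y, mu) for y in range(upper)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mean, var

for mu in (0.5, 2.0, 5.0):  # hypothetical means, for illustration only
    mean, var = moments(mu)
    print(f"mu = {mu}: P(Y=0) = {poisson_pmf(0, mu):.4f}, mean = {mean:.4f}, variance = {var:.4f}")
```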
Poisson Regression Model: Basics of Count Model Fit Statistics
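Goodness-of-fit (GOF) tests
  Deviance statistic
  Pearson GOF
H0: The model fits the data

Pearson GOF:
$$\chi^2 = \sum_{i=1}^{N} \frac{(y_i - \hat{\mu}_i)^2}{\hat{\mu}_i}$$

Deviance statistic:
$$G^2 = 2 \sum_{i=1}^{N} y_i \log\left(\frac{y_i}{\hat{\mu}_i}\right)$$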
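As a rough illustration of the two fit statistics, the sketch below computes the Pearson chi-square and the deviance G² exactly as written above. The observed counts and fitted means are hypothetical toy values (chosen so that the fitted means sum to the observed total, as they do for a Poisson model with an intercept); they are not taken from the heart-attack data.

```python
import math

def pearson_chi2(y, mu):
    """Pearson statistic: sum of (y - mu)^2 / mu."""
    return sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))

def deviance_g2(y, mu):
    """Deviance in the short form 2 * sum(y * log(y / mu)); a zero count contributes 0."""
    return 2.0 * sum(yi * math.log(yi / mi) for yi, mi in zip(y, mu) if yi > 0)

# Hypothetical observed counts and fitted means (both sum to 10).
y = [2, 0, 5, 3]
mu = [2.5, 0.5, 4.0, 3.0]

print("Pearson chi2 =", pearson_chi2(y, mu))
print("Deviance G2  =", deviance_g2(y, mu))
```

Either statistic is then compared with a chi-square distribution on the residual degrees of freedom.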
Poisson Regression Model: Basics of Count Model Fit Statistics
. qui xi: poisson die anterior hcabg, exposure(cases)
. estat gof

         Deviance goodness-of-fit =  84.94489
         Prob > chi2(12)          =    0.0000

         Pearson goodness-of-fit  =  170.7135
         Prob > chi2(12)          =    0.0000

. qui xi: poisson die anterior hcabg i.killip, exposure(cases)
. estat gof

         Deviance goodness-of-fit =    10.932
         Prob > chi2(9)           =    0.2804

         Pearson goodness-of-fit  =  12.60791
         Prob > chi2(9)           =    0.1812

. qui xi: glm die anterior hcabg, family(poisson) link(log) lnoffset(cases) nolog
. gof

         Deviance Goodness-of-fit chi2  =  84.94484
         Prob > chi2(12)                =   0.00000

         Pearson Goodness-of-fit chi2   = 170.71347
         Prob > chi2(12)                =   0.00000

. qui xi: glm die anterior hcabg i.killip, family(poisson) link(log) lnoffset(cases) nolog
. gof

         Deviance Goodness-of-fit chi2  =  10.93196
         Prob > chi2(9)                 =   0.28040

         Pearson Goodness-of-fit chi2   =  12.60791
         Prob > chi2(9)                 =   0.18117
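The p-values in the goodness-of-fit output are upper-tail chi-square probabilities on the residual degrees of freedom. The sketch below reproduces them with a small closed-form tail function for integer degrees of freedom; `chi2_tail` is a helper written for this illustration (Stata's own function is `chi2tail()`).

```python
import math

def chi2_tail(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2-1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: normal tail plus a finite correction sum.
    total = 0.0
    term = math.sqrt(x)                      # x^(1/2) / 1
    for k in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * k + 1)              # next: x^(k+1/2) / (1*3*...*(2k+1))
    return math.erfc(math.sqrt(half)) + math.sqrt(2.0 / math.pi) * math.exp(-half) * total

# Statistics and degrees of freedom copied from the estat gof output.
print("Model without killip, deviance:", chi2_tail(84.94489, 12))
print("Model with killip, deviance:   ", chi2_tail(10.932, 9))
print("Model with killip, Pearson:    ", chi2_tail(12.60791, 9))
```

With H0 "the model fits the data", the small p-value for the model without killip rejects adequate fit, while the model with killip is not rejected.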
Poisson Regression Model: Model Selection AIC, BIC
Akaike information criterion (AIC)
p = number of predictors (including the intercept); n = number of observations; L(M_k) = log likelihood of model k
A smaller AIC indicates a better-fitting model.
$$AIC = \frac{-2L(M_k) + 2p}{n}$$
----------------------------------------------
Difference between        Decision
Models A and B            if A < B
----------------------------------------------
>0.0 & <= 2.5             No difference in models
>2.5 & <= 6.0             Prefer A if n > 256
>6.0 & <= 9.0             Prefer A if n > 64
>9.0                      Prefer A
----------------------------------------------
Interpretation of AIC values (Hilbe, 2009)
Poisson Regression Model: Model Selection AIC, BIC
Bayesian information criterion (BIC)
D(M_k) = deviance of model k
$$BIC = -2L(M_k) - df \cdot \ln(n)$$
$$BIC = D(M_k) - df \cdot \ln(n)$$
|difference|     Degree of preference
-----------------------------------------------------
0-2              Weak
2-6              Positive
6-10             Strong
>10              Very strong
-----------------------------------------------------
Interpretation of BIC values (Raftery, 1996)
Comparing two models (A & B):
If BIC_A - BIC_B < 0, choose model A.
If BIC_A - BIC_B > 0, choose model B.
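The BIC decision rule and Raftery's strength-of-preference grades can be written as a small lookup. This is a sketch only: the function name is illustrative, the BIC values in the calls are hypothetical, and because the published ranges share endpoints (0-2, 2-6, 6-10) the code assumes each upper bound is inclusive.

```python
def compare_bic(bic_a, bic_b):
    """Return (preferred model, strength of preference, BIC_A - BIC_B)."""
    diff = bic_a - bic_b
    if diff < 0:
        preferred = "A"
    elif diff > 0:
        preferred = "B"
    else:
        preferred = "neither"
    size = abs(diff)
    # Assumption: upper bounds are inclusive, since the table's ranges share endpoints.
    if size <= 2:
        strength = "Weak"
    elif size <= 6:
        strength = "Positive"
    elif size <= 10:
        strength = "Strong"
    else:
        strength = "Very strong"
    return preferred, strength, diff

# Hypothetical BIC values, chosen only to exercise the branches.
print(compare_bic(100.0, 103.5))
print(compare_bic(40.0, 25.0))
```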
Poisson Regression Model: Analysis of fit
Model A
. xi: glm die anterior hcabg, family(poisson) link(log) lnoffset(cases) nolog
…
Variance function: V(u) = u                       [Poisson]
Link function    : g(u) = ln(u)                   [Log]

                                                  AIC             =   9.466972
Log likelihood   = -68.00228961                   BIC             =   52.44824
------------------------------------------------------------------------------
             |                 OIM
         die |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    anterior |   .8170084   .1580137     5.17   0.000     .5073071     1.12671
       hcabg |   .7125801   .3260537     2.19   0.029     .0735267    1.351634
       _cons |  -3.722817   .1292188   -28.81   0.000    -3.976082   -3.469553
   ln(cases) |          1  (exposure)
------------------------------------------------------------------------------

Model B
. xi: glm die anterior hcabg i.killip, family(poisson) link(log) lnoffset(cases) nolog
…
Variance function: V(u) = u                       [Poisson]
Link function    : g(u) = ln(u)                   [Log]

                                                  AIC             =    4.93278
Log likelihood   = -30.99584752                   BIC             =  -13.44049
------------------------------------------------------------------------------
             |                 OIM
         die |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    anterior |   .6748639   .1595707     4.23   0.000     .3621111    .9876168
       hcabg |   .6613804   .3267006     2.02   0.043      .021059    1.301702
  _Ikillip_2 |   .9020431   .1723519     5.23   0.000     .5642396    1.239847
  _Ikillip_3 |   1.113287   .2513246     4.43   0.000        .6207    1.605874
  _Ikillip_4 |    2.51264   .2743041     9.16   0.000     1.975014    3.050266
       _cons |   -4.06977   .1459076   -27.89   0.000    -4.355743   -3.783796
   ln(cases) |          1  (exposure)
------------------------------------------------------------------------------
Poisson Regression Model: Analysis of fit
Comparing two models: Model A & Model B

BIC difference:
$$BIC_A - BIC_B = 52.44824 - (-13.44049) = 65.88873 \quad \text{(Very strong)}$$
BIC_A - BIC_B > 0, so choose model B.

AIC difference:
$$AIC_A - AIC_B = 9.466972 - 4.93278 = 4.534192$$
AIC_A > AIC_B, so choose model B.
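The AIC and BIC values that glm prints for models A and B can be recomputed from the log likelihood and the deviance. The sketch below assumes n = 15 observations, k = 3 and k = 6 estimated coefficients (intercept included), and the deviances 84.94489 and 10.932 reported by estat gof for the same two models; the function names are illustrative.

```python
import math

N_OBS = 15  # number of observations in the die models

# Log likelihood, deviance and number of estimated coefficients (incl. intercept).
models = {
    "A": {"ll": -68.00228961, "deviance": 84.94489, "k": 3},
    "B": {"ll": -30.99584752, "deviance": 10.932, "k": 6},
}

def aic(ll, k, n):
    """AIC scaled by n, as printed by glm: (-2 ll + 2 k) / n."""
    return (-2.0 * ll + 2.0 * k) / n

def bic(deviance, k, n):
    """Deviance-based BIC: D - (residual df) * ln(n)."""
    return deviance - (n - k) * math.log(n)

for name, m in models.items():
    print(name, "AIC =", round(aic(m["ll"], m["k"], N_OBS), 6),
          "BIC =", round(bic(m["deviance"], m["k"], N_OBS), 4))

aic_diff = aic(models["A"]["ll"], 3, N_OBS) - aic(models["B"]["ll"], 6, N_OBS)
bic_diff = bic(models["A"]["deviance"], 3, N_OBS) - bic(models["B"]["deviance"], 6, N_OBS)
print("AIC_A - AIC_B =", round(aic_diff, 6))
print("BIC_A - BIC_B =", round(bic_diff, 4))
```

Both differences are positive, so both criteria point to model B.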
Poisson Regression Model: Analysis of fit
Comparing two models: Likelihood Ratio Test
$$LR = -2(L_R - L_F)$$
L_R = log likelihood of the reduced model
L_F = log likelihood of the full model
$$LR = -2[-68.00228961 - (-30.99584752)] = 74.012884$$
. di -2*(-68.00228961-(-30.99584752))
74.012884
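The same calculation, together with its p-value, can be sketched in Python. The degrees of freedom are taken to be 3 because the full model adds the three killip indicators to the reduced model; `chi2_tail_3df` is a helper written for this example using the closed form of the chi-square tail for 3 degrees of freedom.

```python
import math

def lr_statistic(ll_reduced, ll_full):
    """Likelihood ratio statistic: -2 * (L_R - L_F)."""
    return -2.0 * (ll_reduced - ll_full)

def chi2_tail_3df(x):
    """P(X > x) for a chi-square variable with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)

lr = lr_statistic(-68.00228961, -30.99584752)
p_value = chi2_tail_3df(lr)
print(f"LR = {lr:.6f}")
print(f"p-value = {p_value:.3g}")
```

The p-value is far below 0.05, so the full model (with killip) fits significantly better than the reduced model.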
Poisson Regression Model: Analysis of fit
. xi: glm die anterior hcabg, family(poisson) link(log) lnoffset(cases) nolog
...
                                                  AIC             =   9.466972
Log likelihood   = -68.00228961                   BIC             =   52.44824
------------------------------------------------------------------------------
             |                 OIM
         die |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    anterior |   .8170084   .1580137     5.17   0.000     .5073071     1.12671
       hcabg |   .7125801   .3260537     2.19   0.029     .0735267    1.351634
       _cons |  -3.722817   .1292188   -28.81   0.000    -3.976082   -3.469553
   ln(cases) |          1  (exposure)
------------------------------------------------------------------------------
. est store A
. xi: glm die anterior hcabg i.killip, family(poisson) link(log) lnoffset(cases) nolog
...
                                                  AIC             =    4.93278
Log likelihood   = -30.99584752                   BIC             =  -13.44049
------------------------------------------------------------------------------
             |                 OIM
         die |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    anterior |   .6748639   .1595707     4.23   0.000     .3621111    .9876168
       hcabg |   .6613804   .3267006     2.02   0.043      .021059    1.301702
  _Ikillip_2 |   .9020431   .1723519     5.23   0.000     .5642396    1.239847
  _Ikillip_3 |   1.113287   .2513246     4.43   0.000        .6207    1.605874
  _Ikillip_4 |    2.51264   .2743041     9.16   0.000     1.975014    3.050266
       _cons |   -4.06977   .1459076   -27.89   0.000    -4.355743   -3.783796
   ln(cases) |          1  (exposure)
------------------------------------------------------------------------------
. lrtest A

Likelihood-ratio test                                 LR chi2(3)  =     74.01
(Assumption: A nested in .)                           Prob > chi2 =    0.0000
Poisson Regression Model: Pseudo R2
McFadden's pseudo R2:
$$R^2_p = 1 - \frac{L_F}{L_0}$$
L_F = log likelihood of the full model; L_0 = log likelihood of the intercept-only model

. xi: poisson die anterior hcabg i.killip, exposure(cases)
i.killip          _Ikillip_1-4        (naturally coded; _Ikillip_1 omitted)
...
Log likelihood = -30.995848                       Pseudo R2       =     0.6294
------------------------------------------------------------------------------
         die |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    anterior |   .6748639   .1595707     4.23   0.000     .3621111    .9876168
       hcabg |   .6613804   .3267006     2.02   0.043      .021059    1.301702
  _Ikillip_2 |   .9020431   .1723519     5.23   0.000     .5642396    1.239847
  _Ikillip_3 |   1.113287   .2513246     4.43   0.000        .6207    1.605874
  _Ikillip_4 |    2.51264   .2743041     9.16   0.000     1.975014    3.050266
       _cons |   -4.06977   .1459076   -27.89   0.000    -4.355743   -3.783796
   ln(cases) |          1  (exposure)
------------------------------------------------------------------------------

. fitstat

Measures of Fit for poisson of die

Log-Lik Intercept Only:      -83.646     Log-Lik Full Model:         -30.996
D(9):                         61.992     LR(5):                      105.300
                                         Prob > LR:                    0.000
McFadden's R2:                 0.629     McFadden's Adj R2:            0.558
Maximum Likelihood R2:         0.999     Cragg & Uhler's R2:           0.999
AIC:                           4.933     AIC*n:                       73.992
BIC:                          37.619     BIC':                       -91.760
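Several of the fitstat entries follow directly from the two log likelihoods. The sketch below recomputes McFadden's R2 and the LR statistic from the reported values. The adjusted R2 uses the form 1 − (L_F − K)/L_0 with K = 6 estimated coefficients; that form is an assumption of this example, chosen because it reproduces the reported 0.558.

```python
LL_NULL = -83.646   # Log-Lik Intercept Only (from fitstat)
LL_FULL = -30.996   # Log-Lik Full Model (from fitstat)
K_PARAMS = 6        # coefficients in the full model, intercept included

def mcfadden_r2(ll_full, ll_null):
    """McFadden's pseudo R2: 1 - L_F / L_0."""
    return 1.0 - ll_full / ll_null

def mcfadden_adj_r2(ll_full, ll_null, k):
    """Adjusted version (assumed form): 1 - (L_F - K) / L_0."""
    return 1.0 - (ll_full - k) / ll_null

def lr_vs_null(ll_full, ll_null):
    """LR statistic of the full model against the intercept-only model."""
    return 2.0 * (ll_full - ll_null)

print("McFadden's R2     =", round(mcfadden_r2(LL_FULL, LL_NULL), 3))
print("McFadden's Adj R2 =", round(mcfadden_adj_r2(LL_FULL, LL_NULL, K_PARAMS), 3))
print("LR(5)             =", round(lr_vs_null(LL_FULL, LL_NULL), 3))
```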
Poisson Regression Model: Count Model Residual: Pearson, etc
$$R^{Pearson} = \sum_{i=1}^{n} \frac{(y_i - \hat{\mu}_i)^2}{Variance}$$
Poisson Regression Model: link test
When the squared linear prediction is statistically significant, the
link function is misspecified; this may also mean that the systematic
component or the explanatory variables are misspecified.
- Fit a regression of the response variable on two explanatory
variables: the linear prediction and the squared linear prediction.
y = f(Xβ)
X = matrix of the linear prediction and the squared linear prediction
y = response variable
Poisson Regression Model: link test
. qui xi: glm die anterior hcabg i.killip, family(poisson) link(log) lnoffset(cases) nolog
. linktest, family(poisson) link(log)
…
Generalized linear models                         No. of obs      =         15
Optimization     : ML                             Residual df     =         12
                                                  Scale parameter =          1
Deviance         =  10.14389538                   (1/df) Deviance =   .8453246
Pearson          =  11.29756699                   (1/df) Pearson  =   .9414639

Variance function: V(u) = u                       [Poisson]
Link function    : g(u) = ln(u)                   [Log]

                                                  AIC             =   4.480242
Log likelihood   = -30.60181564                   BIC             =  -22.35271
------------------------------------------------------------------------------
             |                 OIM
         die |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        _hat |   .7361077   .2874108     2.56   0.010     .1727929    1.299422
      _hatsq |   .0513088   .0621576     0.83   0.409    -.0705178    .1731354
       _cons |   .2809098   .3284797     0.86   0.392    -.3628985    .9247181
------------------------------------------------------------------------------
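The row that matters in the link test is _hatsq. Its z statistic and p-value can be checked from the coefficient and standard error in the table, as in this short sketch (the helper names are illustrative):

```python
import math

def wald_z(coef, se):
    """Wald z statistic: coefficient divided by its standard error."""
    return coef / se

def two_sided_p(z):
    """Two-sided p-value from the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))

z_hatsq = wald_z(0.0513088, 0.0621576)   # _hatsq row of the linktest output
p_hatsq = two_sided_p(z_hatsq)
print(f"_hatsq: z = {z_hatsq:.2f}, p = {p_hatsq:.3f}")
```

Because the squared linear prediction is not significant (p = 0.409), the test gives no evidence that the log link is misspecified for this model.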
Poisson Regression Model: Testing Overdispersion
Why is overdispersion a problem?
Overdispersion may cause the standard errors of the estimates to
be deflated or underestimated, so a variable may appear to be a
significant predictor when it is in fact not significant.
How is overdispersion recognized?
A model may be overdispersed if the value of the Pearson
or Deviance statistic divided by the degrees of freedom (n-p)
is greater than 1.0.
The quotient of either is called the dispersion.
$$Phi = \frac{1}{n-p} \sum_{i=1}^{n} \frac{(y_i - \hat{\mu}_i)^2}{\hat{\mu}_i}$$
$$Phi = \chi^2_{Pearson} / df$$
Poisson Regression Model: Testing Overdispersion
How is overdispersion recognized?
Small amounts of overdispersion are of little concern;
however, if the dispersion statistic is greater than 1.25
for moderate-sized models, then a correction may be
warranted. Models with large numbers of observations may be
overdispersed with a dispersion statistic of 1.05.
If the dispersion statistic is greater than 2.0, then an adjustment
to the standard errors may be required.
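The dispersion statistic is a single division, so it is easy to verify against the "(1/df) Pearson" value that Stata prints. The sketch below uses Pearson statistics reported in this document's Stata output: 12.60791 for the die model with killip (n = 15, 6 coefficients) and 9327.983215 for the length-of-stay model (n = 1,495, 5 coefficients). The function name is illustrative.

```python
def dispersion(pearson_chi2, n_obs, n_params):
    """Pearson dispersion: chi-square divided by the residual df (n - p)."""
    return pearson_chi2 / (n_obs - n_params)

phi_die = dispersion(12.60791, 15, 6)        # die ~ anterior hcabg i.killip
phi_los = dispersion(9327.983215, 1495, 5)   # los ~ hmo race i.type

for label, phi in (("die model", phi_die), ("los model", phi_los)):
    flag = "above 2.0" if phi > 2.0 else "above 1.25" if phi > 1.25 else "near 1"
    print(f"{label}: dispersion = {phi:.6f} ({flag})")
```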
Poisson Regression Model: Testing Overdispersion
What is apparent overdispersion; how may it be corrected?
Apparent overdispersion occurs when:
(a) the model omits important explanatory predictors;
(b) the data include outliers;
(c) the model fails to include a sufficient number of interaction terms;
(d) a predictor needs to be transformed to another scale;
(e) the assumed linear relationship between the response and the
link function and predictors is mistaken, i.e. the link is
misspecified.
Poisson Regression Model: Statistical Tests for Overdispersion
Score test (regression-based test)
Lagrange multiplier test
Likelihood ratio test
Poisson Regression Model: Testing Overdispersion
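Score test:
Obtain the fitted values $\hat{\mu}_i$
Calculate
$$z_i = \frac{(y_i - \hat{\mu}_i)^2 - y_i}{\hat{\mu}_i \sqrt{2}}$$
Regress z as a constant-only model
The test of the hypothesis
$$H_0: Var(y_i) = E(y_i) \quad \text{or} \quad H_0: \sigma^2 = \mu$$
$$H_A: Var(y_i) = E(y_i) + \alpha\, g[E(y_i)] \quad \text{or} \quad H_A: \sigma^2 = \mu + \alpha\mu \quad (NB1)$$
$$H_A: \sigma^2 = \mu + \alpha\mu^2 \quad (NB2)$$
/*Stata code*/
glm dep [ind…], family(poisson) link(log) eform nolog noheader
predict double mu, mu
generate z = ((dep-mu)^2 - dep)/(mu*sqrt(2))
regress z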
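The steps of the score test can be sketched outside Stata as follows. The observed counts and fitted means below are hypothetical toy values, not the output of a fitted model; regressing z on a constant only is the same as testing whether the mean of z differs from zero, so the sketch reports that mean with its standard error and t statistic.

```python
import math

def score_test(y, mu):
    """Return (z values, mean of z, standard error, t statistic)."""
    z = [((yi - mi) ** 2 - yi) / (mi * math.sqrt(2.0)) for yi, mi in zip(y, mu)]
    n = len(z)
    mean = sum(z) / n
    var = sum((zi - mean) ** 2 for zi in z) / (n - 1)
    se = math.sqrt(var / n)   # standard error of the constant in a constant-only regression
    return z, mean, se, mean / se

# Hypothetical observed counts and fitted Poisson means.
y = [0, 1, 2, 3, 10]
mu = [2.0, 2.0, 3.0, 3.0, 4.0]

z, mean, se, t = score_test(y, mu)
print("z values:", [round(v, 4) for v in z])
print(f"constant = {mean:.4f}, std. err. = {se:.4f}, t = {t:.2f}")
```

A significantly positive constant indicates overdispersion relative to the Poisson variance.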
Poisson Regression Model: Testing Overdispersion
Example: data consist of 1991 Arizona Medicare in-patient (hospital)
data collected for a particular disease.
Response: los length of stay
Predictors:
hmo 1=member of a Health Maintenance Organization (HMO);
0=private pay
race 1=identifies as Caucasian (white); 0=other
type 1=elective admission (reference level)
2=urgent admission
3=emergency admission
Poisson Regression Model: Testing Overdispersion
. clear
. use "J:\516707_2559\data\medpar.dta", clear
. xi: glm los hmo race i.type, family(poisson) link(log) eform nolog
…
Deviance         =  8142.666001                   (1/df) Deviance =   5.464877
Pearson          =  9327.983215                   (1/df) Pearson  =   6.260391
…
------------------------------------------------------------------------------
             |                 OIM
         los |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         hmo |   .9309504   .0222906    -2.99   0.003     .8882708    .9756806
       white |   .8573826   .0235032    -5.61   0.000     .8125327     .904708
    _Itype_2 |   1.248137   .0262756    10.53   0.000     1.197685    1.300713
    _Itype_3 |   2.032927   .0531325    27.15   0.000     1.931412    2.139778
       _cons |   10.30813   .2804654    85.74   0.000      9.77283    10.87275
------------------------------------------------------------------------------
. predict double mu, mu
. generate z=((los-mu)^2-los)/(mu*sqrt(2))
. regress z

      Source |       SS           df       MS      Number of obs   =     1,495
-------------+----------------------------------   F(0, 1494)      =      0.00
       Model |           0         0           .   Prob > F        =         .
    Residual |  348013.947     1,494  232.941062   R-squared       =    0.0000
-------------+----------------------------------   Adj R-squared   =    0.0000
       Total |  348013.947     1,494  232.941062   Root MSE        =    15.262

------------------------------------------------------------------------------
           z |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |   3.704561   .3947321     9.39   0.000     2.930273    4.478849
------------------------------------------------------------------------------
Poisson Regression Model: Testing Overdispersion
Lagrange Multiplier test:
$$LM = \frac{\left(\sum_{i=1}^{n} \hat{\mu}_i^2 - n\bar{y}\right)^2}{2 \sum_{i=1}^{n} \hat{\mu}_i^2}$$
/*Stata code*/
glm dep [ind…], family(poisson) link(log) eform nolog
predict double mu, mu
sum dep, meanonly
scalar nybar = r(sum)
gen double musq = mu*mu
sum musq, meanonly
scalar mu2 = r(sum)
scalar chi2 = (mu2-nybar)^2/(2*mu2)
display as txt "LM-Test =" as res chi2 _n as txt "P-Value = " ///
    as res %8.5f chiprob(1,chi2)
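For readers without Stata, the LM statistic can be sketched in a few lines. The counts and fitted means are hypothetical toy values; the sum of the observed counts plays the role of the scalar nybar, and the p-value uses the chi-square distribution with 1 degree of freedom, as in the Stata code.

```python
import math

def lm_overdispersion(y, mu):
    """LM statistic: (sum(mu^2) - n*ybar)^2 / (2 * sum(mu^2))."""
    sum_mu_sq = sum(m * m for m in mu)
    n_ybar = sum(y)                      # n times the mean of y
    return (sum_mu_sq - n_ybar) ** 2 / (2.0 * sum_mu_sq)

def chi2_tail_1df(x):
    """P(X > x) for a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

# Hypothetical observed counts and fitted Poisson means.
y = [0, 2, 7]
mu = [1.0, 2.0, 3.0]

lm = lm_overdispersion(y, mu)
print(f"LM-Test = {lm:.6f}")
print(f"P-Value = {chi2_tail_1df(lm):.5f}")
```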
Poisson Regression Model: Testing Overdispersion
. clear
. use "J:\516707_2559\data\medpar.dta", clear
. xi: glm los hmo race i.type, family(poisson) link(log) eform nolog
…
Deviance         =  8142.666001                   (1/df) Deviance =   5.464877
Pearson          =  9327.983215                   (1/df) Pearson  =   6.260391
…
------------------------------------------------------------------------------
             |                 OIM
         los |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         hmo |   .9309504   .0222906    -2.99   0.003     .8882708    .9756806
       white |   .8573826   .0235032    -5.61   0.000     .8125327     .904708
    _Itype_2 |   1.248137   .0262756    10.53   0.000     1.197685    1.300713
    _Itype_3 |   2.032927   .0531325    27.15   0.000     1.931412    2.139778
       _cons |   10.30813   .2804654    85.74   0.000      9.77283    10.87275
------------------------------------------------------------------------------
. predict double mu, mu
. sum los, meanonly
. scalar nybar=r(sum)
. gen double musq = mu*mu
. sum musq, meanonly
. scalar mu2=r(sum)
. scalar chi2=(mu2-nybar)^2/(2*mu2)
. display as txt "LM-Test =" as res chi2 _n as txt "P-Value = " ///
>     as res %8.5f chiprob(1,chi2)
LM-Test =62987.844
P-Value =  0.00000
Poisson Regression Model: Testing Overdispersion
Likelihood ratio statistic
Poisson regression vs. negative binomial regression
/*Stata code*/
use "…", clear
glm dep [ind…], family(poisson) link(log) eform nolog
xi: nbreg dep [ind…]
scalar llnb = e(ll)
xi: poisson dep [ind…], [exposure]
scalar llp = e(ll)
scalar LR = 2*(llnb-llp)
di "LR = " LR
di "P-value = " as res %8.5f chi2tail(1, LR)
Poisson Regression Model: Testing Overdispersion
. clear
. use "J:\516707_2559\data\medpar.dta", clear
. xi: nbreg los hmo race i.type

Fitting Poisson model:

Iteration 0:   log likelihood = -6929.2112
…
Iteration 3:   log likelihood = -4797.4766

Negative binomial regression                    Number of obs     =      1,495
                                                LR chi2(4)        =     118.03
Dispersion     = mean                           Prob > chi2       =     0.0000
Log likelihood = -4797.4766                     Pseudo R2         =     0.0122
------------------------------------------------------------------------------
         los |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         hmo |  -.0679552   .0532613    -1.28   0.202    -.1723455    .0364351
       white |  -.1290654   .0685418    -1.88   0.060    -.2634049     .005274
    _Itype_2 |    .221249   .0505925     4.37   0.000     .1220894    .3204085
    _Itype_3 |   .7061588   .0761311     9.28   0.000     .5569446    .8553731
       _cons |   2.310279   .0679474    34.00   0.000     2.177105    2.443453
-------------+----------------------------------------------------------------
    /lnalpha |   -.807982   .0444542                     -.8951107   -.7208533
-------------+----------------------------------------------------------------
       alpha |   .4457567   .0198158                      .4085624    .4863371
------------------------------------------------------------------------------
LR test of alpha=0: chibar2(01) = 4262.86              Prob >= chibar2 = 0.000

. scalar llnb=e(ll)
Poisson Regression Model: Testing Overdispersion
. xi: poisson los hmo race i.type
…
Iteration 0:   log likelihood = -6929.2112
…
Log likelihood = -6928.9078                       Pseudo R2       =     0.0519
------------------------------------------------------------------------------
         los |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         hmo |  -.0715493    .023944    -2.99   0.003    -.1184786     -.02462
       white |   -.153871   .0274128    -5.61   0.000    -.2075991    -.100143
    _Itype_2 |   .2216518   .0210519    10.53   0.000     .1803908    .2629127
    _Itype_3 |   .7094767    .026136    27.15   0.000     .6582512    .7607022
       _cons |   2.332933   .0272082    85.74   0.000     2.279606     2.38626
------------------------------------------------------------------------------
. scalar llp=e(ll)
. scalar LR = 2*(llnb-llp)
. di "LR = " LR
LR = 4262.8624
. di "P-value = " as res %8.5f chi2tail(1, LR)
P-value =  0.00000
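The LR statistic for Poisson versus negative binomial can be reproduced from the two log likelihoods reported in the output. This is a minimal sketch of the arithmetic; the p-value helper covers only the 1-degree-of-freedom case used here.

```python
import math

LL_NB = -4797.4766        # negative binomial log likelihood (nbreg output)
LL_POISSON = -6928.9078   # Poisson log likelihood (poisson output)

def lr_poisson_vs_nb(ll_nb, ll_poisson):
    """LR statistic: 2 * (L_NB - L_Poisson)."""
    return 2.0 * (ll_nb - ll_poisson)

def chi2_tail_1df(x):
    """P(X > x) for a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

lr = lr_poisson_vs_nb(LL_NB, LL_POISSON)
p_value = chi2_tail_1df(lr)
print(f"LR = {lr:.4f}")
print(f"P-value = {p_value:.5f}")
```

The statistic matches the chibar2(01) value of 4262.86 that nbreg prints for its test of alpha = 0. Because alpha = 0 lies on the boundary of the parameter space, nbreg reports that test as a chibar2 rather than a plain chi2(1); with a statistic this large the conclusion (overdispersion) is the same either way.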
Poisson Regression Model: Handling Overdispersion
Scaling Standard Errors: Quasi-count Models
Quasi-likelihood Models
Sandwich or Robust Variance Estimators*
Bootstrapped Standard Errors*
Negative Binomial (Next…)
Scaling standard errors:
$$SE_{adj} = \sqrt{\chi^2_{Pearson}/df} \times SE$$
Quasi-likelihood models:
$$SE_{adj} = SE / \sqrt{\chi^2_{Pearson}/df}$$
Poisson Regression Model: Scaling Standard Error
Standard Model
. xi: glm los hmo race i.type, family(poisson) nolog
Generalized linear models                         No. of obs      =      1,495
Optimization     : ML                             Residual df     =      1,490
                                                  Scale parameter =          1
Deviance         =  8142.666001                   (1/df) Deviance =   5.464877
Pearson          =  9327.983215                   (1/df) Pearson  =   6.260391

Variance function: V(u) = u                       [Poisson]
Link function    : g(u) = ln(u)                   [Log]

                                                  AIC             =   9.276131
Log likelihood   = -6928.907786                   BIC             =  -2749.057
------------------------------------------------------------------------------
             |                 OIM
         los |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         hmo |  -.0715493    .023944    -2.99   0.003    -.1184786     -.02462
        race |   -.153871   .0274128    -5.61   0.000    -.2075991    -.100143
    _Itype_2 |   .2216518   .0210519    10.53   0.000     .1803908    .2629127
    _Itype_3 |   .7094767    .026136    27.15   0.000     .6582512    .7607022
       _cons |   2.332933   .0272082    85.74   0.000     2.279606     2.38626
------------------------------------------------------------------------------
(1/df) Pearson is the dispersion statistic $\chi^2_{Pearson}/df$.
The Std. Err. column holds the model standard errors.
Poisson Regression Model: Scaling Standard Error
. xi: glm los hmo race i.type, family(poisson) nolog scale(x2)
...
Generalized linear models                         No. of obs      =      1,495
Optimization     : ML                             Residual df     =      1,490
                                                  Scale parameter =          1
Deviance         =  8142.666001                   (1/df) Deviance =   5.464877
Pearson          =  9327.983215                   (1/df) Pearson  =   6.260391

Variance function: V(u) = u                       [Poisson]
Link function    : g(u) = ln(u)                   [Log]

                                                  AIC             =   9.276131
Log likelihood   = -6928.907786                   BIC             =  -2749.057
------------------------------------------------------------------------------
             |                 OIM
         los |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         hmo |  -.0715493   .0599097    -1.19   0.232    -.1889701    .0458715
        race |   -.153871   .0685889    -2.24   0.025    -.2883028   -.0194393
    _Itype_2 |   .2216518   .0526735     4.21   0.000     .1184137    .3248899
    _Itype_3 |   .7094767   .0653942    10.85   0.000     .5813064     .837647
       _cons |   2.332933   .0680769    34.27   0.000     2.199505    2.466361
------------------------------------------------------------------------------
(Standard errors scaled using square root of Pearson X2-based dispersion.)
$$SE_{adj} = \sqrt{\chi^2_{Pearson}/df} \times SE$$
$$SE_{adj}(hmo) = \sqrt{\chi^2_{Pearson}/df} \times SE = \sqrt{6.260391} \times .023944 = .0599097$$
- quick & dirty method
- useful for models with little to moderate overdispersion
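The scaling can be verified for every coefficient, not only hmo. The sketch below multiplies each model standard error from the standard Poisson fit by the square root of the Pearson dispersion (6.260391) and compares the result with the standard errors printed by the scale(x2) fit; the dictionary names are illustrative.

```python
import math

DISPERSION = 6.260391   # (1/df) Pearson from the standard Poisson fit

# Model standard errors from the unscaled fit.
model_se = {
    "hmo": 0.023944,
    "race": 0.0274128,
    "_Itype_2": 0.0210519,
    "_Itype_3": 0.026136,
    "_cons": 0.0272082,
}

# Standard errors printed by glm ..., scale(x2).
reported_scaled_se = {
    "hmo": 0.0599097,
    "race": 0.0685889,
    "_Itype_2": 0.0526735,
    "_Itype_3": 0.0653942,
    "_cons": 0.0680769,
}

def scale_se(se, dispersion):
    """Scaled standard error: sqrt(dispersion) * model SE."""
    return math.sqrt(dispersion) * se

for name, se in model_se.items():
    print(f"{name:<9} model SE = {se:.7f}   scaled SE = {scale_se(se, DISPERSION):.7f}")
```

Every standard error is inflated by the same factor, sqrt(6.260391) ≈ 2.50, which is why the method is quick but crude: it cannot reflect overdispersion that differs across predictors.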
Poisson Regression Model: Quasi likelihood Poisson Standard Error
. xi: glm los hmo race i.type, family(poisson) nolog irls disp(6.260391)
Generalized linear models                         No. of obs      =      1,495
Optimization     : MQL Fisher scoring             Residual df     =      1,490
                   (IRLS EIM)                     Scale parameter =   6.260391
Deviance         =  1300.664128                   (1/df) Deviance =   .8729289
Pearson          =   1490.00008                   (1/df) Pearson  =          1

Variance function: V(u) = u                       [Poisson]
Link function    : g(u) = ln(u)                   [Log]

Quasi-likelihood model with dispersion: 6.260391  BIC             =  -9591.059
------------------------------------------------------------------------------
             |                 EIM
         los |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         hmo |  -.0715493   .0095696    -7.48   0.000    -.0903054   -.0527932
        race |   -.153871    .010956   -14.04   0.000    -.1753444   -.1323977
    _Itype_2 |   .2216518   .0084138    26.34   0.000     .2051611    .2381424
    _Itype_3 |   .7094767   .0104457    67.92   0.000     .6890035    .7299499
       _cons |   2.332933   .0108742   214.54   0.000      2.31162    2.354246
------------------------------------------------------------------------------
$$SE_{adj}(hmo) = SE / \sqrt{\chi^2_{Pearson}/df} = .023944 / \sqrt{6.260391} = .0095696$$
- The SEs are not based on a correct model-based Hessian matrix.
Poisson Regression Model: Sandwich or Robust Variance Estimators
. xi: glm los hmo race i.type, family(poisson) vce(robust) nolog
Generalized linear models                         No. of obs      =      1,495
Optimization     : ML                             Residual df     =      1,490
                                                  Scale parameter =          1
Deviance         =  8142.666001                   (1/df) Deviance =   5.464877
Pearson          =  9327.983215                   (1/df) Pearson  =   6.260391

Variance function: V(u) = u                       [Poisson]
Link function    : g(u) = ln(u)                   [Log]

                                                  AIC             =   9.276131
Log pseudolikelihood = -6928.907786               BIC             =  -2749.057
------------------------------------------------------------------------------
             |               Robust
         los |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         hmo |  -.0715493   .0517323    -1.38   0.167    -.1729427    .0298441
        race |   -.153871   .0833013    -1.85   0.065    -.3171386    .0093965
    _Itype_2 |   .2216518   .0528824     4.19   0.000     .1180042    .3252993
    _Itype_3 |   .7094767   .1158289     6.13   0.000     .4824562    .9364972
       _cons |   2.332933   .0787856    29.61   0.000     2.178516     2.48735
------------------------------------------------------------------------------

. bootstrap, reps(1000): glm los hmo race type2 type3, family(poisson)
(running glm on estimation sample)

Bootstrap replications (1000)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50
…
..................................................  1000

Generalized linear models                         No. of obs      =      1,495
Optimization     : ML                             Residual df     =      1,490
                                                  Scale parameter =          1
Deviance         =  8142.666001                   (1/df) Deviance =   5.464877
Pearson          =  9327.983215                   (1/df) Pearson  =   6.260391

Variance function: V(u) = u                       [Poisson]
Link function    : g(u) = ln(u)                   [Log]

                                                  AIC             =   9.276131
Log likelihood   = -6928.907786                   BIC             =  -2749.057
------------------------------------------------------------------------------
             |   Observed   Bootstrap                         Normal-based
         los |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         hmo |  -.0715493    .053066    -1.35   0.178    -.1755567    .0324581
        race |   -.153871   .0827678    -1.86   0.063    -.3160929    .0083508
       type2 |   .2216518   .0522548     4.24   0.000     .1192341    .3240694
       type3 |   .7094767   .1166441     6.08   0.000     .4808585    .9380949
       _cons |   2.332933   .0799406    29.18   0.000     2.176252    2.489614
------------------------------------------------------------------------------
Poisson Regression Model: Bootstrap Standard Error
If the values of bootstrapped or robust standard errors differ
substantially from model standard errors, this is evidence
that the count model is extradispersed.
Use the bootstrapped or robust standard errors for reporting
your model,
but check for reasons why the data are overdispersed
and identify an appropriate model to estimate parameters.
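One simple way to see how far the robust standard errors depart from the model standard errors is to take their ratio, coefficient by coefficient. The sketch below uses the standard errors reported for the length-of-stay model (model-based and vce(robust)); it is an illustration of the comparison, not a formal test.

```python
# Standard errors from the Poisson fits of los on hmo, race and type.
model_se = {"hmo": 0.023944, "race": 0.0274128, "_Itype_2": 0.0210519,
            "_Itype_3": 0.026136, "_cons": 0.0272082}
robust_se = {"hmo": 0.0517323, "race": 0.0833013, "_Itype_2": 0.0528824,
             "_Itype_3": 0.1158289, "_cons": 0.0787856}

def se_ratio(robust, model):
    """Ratio of robust to model-based standard error."""
    return robust / model

ratios = {name: se_ratio(robust_se[name], model_se[name]) for name in model_se}
for name, ratio in ratios.items():
    print(f"{name:<9} robust / model SE = {ratio:.2f}")
```

All ratios are well above 1 (between roughly 2 and 4.5), which agrees with the dispersion statistic of 6.26 in signalling that the Poisson model for these data is overdispersed.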
Poisson Regression Model: Bootstrap Standard Error
Hilbe (2014, p. 106)
. xi: glm los hmo race i.type, family(nb ml) nolog
i.type            _Itype_1-3          (naturally coded; _Itype_1 omitted)
Generalized linear models                         No. of obs      =      1,495
Optimization     : ML                             Residual df     =      1,490
                                                  Scale parameter =          1
Deviance         =   1568.14286                   (1/df) Deviance =   1.052445
Pearson          =  1624.538251                   (1/df) Pearson  =   1.090294

Variance function: V(u) = u+(.4458)u^2            [Neg. Binomial]
Link function    : g(u) = ln(u)                   [Log]

                                                  AIC             =   6.424718
Log likelihood   = -4797.476603                   BIC             =  -9323.581
------------------------------------------------------------------------------
             |                 OIM
         los |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         hmo |  -.0679552   .0532613    -1.28   0.202    -.1723455    .0364351
        race |  -.1290654   .0685416    -1.88   0.060    -.2634046    .0052737
    _Itype_2 |    .221249   .0505925     4.37   0.000     .1220894    .3204085
    _Itype_3 |   .7061588   .0761311     9.28   0.000     .5569446    .8553731
       _cons |   2.310279   .0679472    34.00   0.000     2.177105    2.443453
------------------------------------------------------------------------------
Note: Negative binomial parameter estimated via ML and treated as fixed once
      estimated.
Negative Binomial Regression Analysis & Other Count Models
Outline:
Negative Binomial regression
Problem of Zero Counts
Zero inflated Poisson (zip)
Zero inflated negative Binomial (zinb)
Comparison of Models
Test of Comparative Fit
Other count data models
Negative Binomial Regression Analysis
Negative Binomial Regression (NB)
The earliest definitions of the negative binomial are based on
the binomial PDF.
NB2 (Cameron and Trivedi, 1986) is derived from a Poisson-gamma
mixture distribution.
NB1 can also be derived as a form of Poisson-gamma mixture, but with
different properties resulting in a linear variance.
The negative binomial model, as a Poisson-gamma mixture model,
is appropriate to use when the overdispersion in an otherwise Poisson
model is thought to take the form of a gamma shape or distribution.
A more general class of negative binomial models has mean μ_i and
variance function μ_i + αμ_i^p: NB2 has p = 2 and NB1 has p = 1.
Negative Binomial Regression (NB2)
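NB2 (Cameron and Trivedi, 1986) is derived from a Poisson-gamma
mixture distribution.
The NB2 model, with p = 2, is the standard formulation of the
negative binomial model.
NB2 variance function: μ + αμ²
It has density
$$f(y \mid \mu, \alpha) = \frac{\Gamma(y + \alpha^{-1})}{\Gamma(y + 1)\,\Gamma(\alpha^{-1})} \left(\frac{\alpha^{-1}}{\alpha^{-1} + \mu}\right)^{\alpha^{-1}} \left(\frac{\mu}{\alpha^{-1} + \mu}\right)^{y}, \quad \alpha \ge 0, \; y = 0, 1, 2, ...$$
This reduces to the Poisson if α = 0.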
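The NB2 density can be checked numerically. The sketch below evaluates it on the log scale with `math.lgamma`, then confirms that the probabilities sum to 1, that the mean is μ, that the variance is μ + αμ², and that the density approaches the Poisson as α shrinks toward 0. The values μ = 3 and α = 0.5 are hypothetical, chosen only for illustration.

```python
import math

def nb2_pmf(y, mu, alpha):
    """NB2 probability of count y with mean mu and dispersion alpha > 0."""
    inv = 1.0 / alpha
    log_p = (math.lgamma(y + inv) - math.lgamma(y + 1) - math.lgamma(inv)
             + inv * math.log(inv / (inv + mu))
             + y * math.log(mu / (inv + mu)))
    return math.exp(log_p)

def poisson_pmf(y, mu):
    """Poisson probability of count y with mean mu."""
    return math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))

MU, ALPHA = 3.0, 0.5   # hypothetical parameter values
probs = [nb2_pmf(y, MU, ALPHA) for y in range(500)]
total = sum(probs)
mean = sum(y * p for y, p in enumerate(probs))
variance = sum((y - mean) ** 2 * p for y, p in enumerate(probs))

print(f"sum of probabilities = {total:.6f}")
print(f"mean = {mean:.6f} (mu = {MU})")
print(f"variance = {variance:.6f} (mu + alpha*mu^2 = {MU + ALPHA * MU**2})")
print(f"NB2 with alpha near 0: {nb2_pmf(2, MU, 1e-6):.6f}  Poisson: {poisson_pmf(2, MU):.6f}")
```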
Negative Binomial Regression (NB2)
The log-likelihood function for NB2
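$$\ln L(\alpha, \beta) = \sum_{i=1}^{n} \left\{ \left( \sum_{j=0}^{y_i - 1} \ln(j + \alpha^{-1}) \right) - \ln y_i! - (y_i + \alpha^{-1}) \ln\left(1 + \alpha \exp(x_i'\beta)\right) + y_i \ln \alpha + y_i x_i'\beta \right\}$$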
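The NB2 log-likelihood is the sum of the log densities with μ_i = exp(x_i'β). The sketch below codes the log-likelihood term by term and checks it against the log of the NB2 density written with gamma functions. The counts, linear predictors and α are hypothetical toy values.

```python
import math

def nb2_loglik(y, xb, alpha):
    """NB2 log-likelihood, written term by term as in the formula."""
    inv = 1.0 / alpha
    total = 0.0
    for yi, xbi in zip(y, xb):
        total += sum(math.log(j + inv) for j in range(yi))      # sum_{j=0}^{y-1} ln(j + 1/alpha)
        total -= math.lgamma(yi + 1)                             # ln(y!)
        total -= (yi + inv) * math.log(1.0 + alpha * math.exp(xbi))
        total += yi * math.log(alpha) + yi * xbi
    return total

def nb2_log_pmf(yi, mu, alpha):
    """Log of the NB2 density written with gamma functions."""
    inv = 1.0 / alpha
    return (math.lgamma(yi + inv) - math.lgamma(yi + 1) - math.lgamma(inv)
            + inv * math.log(inv / (inv + mu)) + yi * math.log(mu / (inv + mu)))

# Hypothetical counts, linear predictors x'b, and dispersion parameter.
y = [0, 1, 4]
xb = [0.2, 0.5, 1.1]
alpha = 0.4

ll_formula = nb2_loglik(y, xb, alpha)
ll_density = sum(nb2_log_pmf(yi, math.exp(xbi), alpha) for yi, xbi in zip(y, xb))
print(f"log-likelihood from the formula: {ll_formula:.8f}")
print(f"sum of log densities:            {ll_density:.8f}")
```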
Negative Binomial Regression (NB2): Example
A comparison of financial performance, organizational characteristics
and management strategy among rural & urban facilities. (Smith, HL.,
Piland, NF. & Fisher, N. J. Rural Health, 27-40, 1992)
Sample: licensed nursing facilities, n = 52
bed = number of beds in home,
tdays = annual total patient days (in hundreds)
pcrev = annual total patient care revenue(in $ millions)
nsal = annual nursing salaries(in $ millions)
fexp = annual facilities expenditures(in $ millions)
rural = (1 = rural; 0 = nonrural)
Negative Binomial Regression (NB2): nbreg
. nbreg bed pcrev nsal fexp rural pn pf nf, exp(tdays) d(mean)
Fitting Poisson model:
…
Negative binomial regression                      Number of obs   =         52
                                                  LR chi2(7)      =      17.60
Dispersion     = mean                             Prob > chi2     =     0.0139
Log likelihood = -223.23966                       Pseudo R2       =     0.0379
------------------------------------------------------------------------------
         bed |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       pcrev |  -.3868934   .1543459    -2.51   0.012    -.6894058   -.0843809
        nsal |   .1556637   .9194312     0.17   0.866    -1.646388    1.957716
        fexp |   1.429801    .511777     2.79   0.005     .4267365    2.432866
       rural |  -.1193119   .0704735    -1.69   0.090    -.2574375    .0188137
          pn |   .3323483   .2933881     1.13   0.257    -.2426818    .9073784
          pf |   .7531993   .5164349     1.46   0.145    -.2589945    1.765393
          nf |   -4.56582    2.00498    -2.28   0.023    -8.495509   -.6361308
       _cons |  -.9103272   .1988939    -4.58   0.000    -1.300152   -.5205023
       tdays | (exposure)
-------------+----------------------------------------------------------------
    /lnalpha |  -3.505601   .2714876                     -4.037707   -2.973495
-------------+----------------------------------------------------------------
       alpha |   .0300287   .0081524                      .0176379    .0511243
------------------------------------------------------------------------------
Likelihood-ratio test of alpha=0:  chibar2(01) =   82.36 Prob>=chibar2 = 0.000
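The `alpha` row of this output is the `/lnalpha` row transformed back from the log scale. A minimal check of that arithmetic, assuming the usual normal-approximation interval (critical value 1.959964) and a delta-method standard error:

```python
import math

lnalpha, se = -3.505601, 0.2714876   # /lnalpha row of the nbreg output
z = 1.959964                          # 97.5th percentile of N(0, 1)

alpha = math.exp(lnalpha)
ci = (math.exp(lnalpha - z * se), math.exp(lnalpha + z * se))
se_alpha = alpha * se                 # delta-method standard error
print(alpha, ci, se_alpha)
```

The interval for alpha is asymmetric because it is built on the log scale and then exponentiated, which also keeps it positive.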
Negative Binomial Regression Analysis
Negative Binomial Regression (NB2): glm
. glm bed pcrev nsal fexp rural pn pf nf, exp(tdays) f(nb .0300287) l(log)
Iteration 0:   log likelihood = -223.40458
Iteration 1:   log likelihood = -223.23965
Iteration 2:   log likelihood = -223.23965
Generalized linear models                          No. of obs      =        52
Optimization     : ML                              Residual df     =        44
                                                   Scale parameter =         1
Deviance         =  52.37224156                    (1/df) Deviance =  1.190278
Pearson          =   57.5930065                    (1/df) Pearson  =  1.308932
Variance function: V(u) = u+(.0300287)u^2          [Neg. Binomial]
Link function    : g(u) = ln(u)                    [Log]
                                                   AIC             =  8.893833
Log likelihood   = -223.239651                     BIC             = -121.4825
------------------------------------------------------------------------------
             |                 OIM
         bed |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       pcrev |  -.3868933   .1543257    -2.51   0.012    -.6893661   -.0844204
        nsal |   .1556692   .9194152     0.17   0.866    -1.646352     1.95769
        fexp |   1.429802   .5116407     2.79   0.005     .4270048    2.432599
       rural |  -.1193121   .0704696    -1.69   0.090    -.2574299    .0188057
          pn |   .3323467   .2933803     1.13   0.257    -.2426681    .9073615
          pf |   .7532008   .5163957     1.46   0.145    -.2589161    1.765318
          nf |  -4.565827   2.004979    -2.28   0.023    -8.495514   -.6361409
       _cons |  -.9103282   .1988345    -4.58   0.000    -1.300037   -.5206197
       tdays | (exposure)
------------------------------------------------------------------------------
Negative Binomial Regression Analysis
Negative Binomial Regression (NB2): glm (Stata 11+)
. glm bed pcrev nsal fexp rural pn pf nf, exp(tdays) f(nb ml) l(log)
Iteration 0:   log likelihood = -223.40459
Iteration 1:   log likelihood = -223.23966
Iteration 2:   log likelihood = -223.23966
Generalized linear models                          No. of obs      =        52
Optimization     : ML                              Residual df     =        44
                                                   Scale parameter =         1
Deviance         =   52.3722233                    (1/df) Deviance =  1.190278
Pearson          =  57.59299049                    (1/df) Pearson  =  1.308932
Variance function: V(u) = u+(.03)u^2               [Neg. Binomial]
Link function    : g(u) = ln(u)                    [Log]
                                                   AIC             =  8.893833
Log likelihood   = -223.239656                     BIC             = -121.4825
------------------------------------------------------------------------------
             |                 OIM
         bed |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       pcrev |   -.386893   .1543258    -2.51   0.012     -.689366   -.0844201
        nsal |   .1556643   .9194159     0.17   0.866    -1.646358    1.957686
        fexp |   1.429801   .5116407     2.79   0.005     .4270039    2.432599
       rural |   -.119312   .0704696    -1.69   0.090    -.2574298    .0188059
          pn |   .3323478   .2933805     1.13   0.257    -.2426674     .907363
          pf |   .7531989    .516396     1.46   0.145    -.2589187    1.765316
          nf |  -4.565819    2.00498    -2.28   0.023    -8.495507   -.6361303
       _cons |  -.9103275   .1988346    -4.58   0.000    -1.300036   -.5206188
   ln(tdays) |          1  (exposure)
------------------------------------------------------------------------------
Note: Negative binomial parameter estimated via ML and treated as fixed once estimated.
Negative Binomial Regression Analysis
Negative Binomial Regression (NB2): Interpretation using the rate
Methods of interpretation based on E(y|x)
The interpretation
For a change of $\delta$ in $x_k$, the expected count changes by a factor of $\exp(\beta_k \times \delta)$, holding all other variables constant:
\[
\frac{E(y \mid \mathbf{x}, x_k + \delta)}{E(y \mid \mathbf{x}, x_k)} = e^{\beta_k \delta} = \mathrm{IRR}
\]
For specific values of $\delta$:
Factor change. For a unit change in $x_k$, the expected count changes by a factor of $\exp(\beta_k)$, holding all other variables constant.
Standardized factor change. For a standard deviation change in $x_k$, the expected count changes by a factor of $\exp(\beta_k \times s_k)$, holding all other variables constant.
Negative Binomial Regression Analysis
Negative Binomial Regression (NB2): Interpretation using percentage
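Alternatively, the percentage change in the expected count for a change of $\delta$ in $x_k$ (a unit change when $\delta = 1$), holding other variables constant, is
\[
100 \times \frac{E(y \mid \mathbf{x}, x_k + \delta) - E(y \mid \mathbf{x}, x_k)}{E(y \mid \mathbf{x}, x_k)} = 100 \times \bigl[\exp(\beta_k \times \delta) - 1\bigr]
\]
Methods of interpretation based on E(y|x)
The interpretation
For a unit change in factor $x_k$, the expected count increases (decreases) by n% = $[\exp(\beta_k) - 1] \times 100$, holding all other variables constant.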
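The factor-change and percentage-change formulas can be tried on the `pcrev` coefficient from the `nbreg` output earlier in the example. This is a small sketch: the function names are hypothetical, and the standard deviation 0.6974 is the `SDofX` value that `listcoef` reports for `pcrev` further on.

```python
import math

def factor_change(b, delta=1.0):
    """exp(b * delta): factor change in the expected count."""
    return math.exp(b * delta)

def percent_change(b, delta=1.0):
    """100 * [exp(b * delta) - 1]: percentage change in the expected count."""
    return 100.0 * (math.exp(b * delta) - 1.0)

b_pcrev = -0.3868934   # pcrev coefficient from the nbreg output
sd_pcrev = 0.6974      # SDofX for pcrev as reported by listcoef

unit_factor = factor_change(b_pcrev)            # unit change in x_k
unit_percent = percent_change(b_pcrev)
sd_factor = factor_change(b_pcrev, sd_pcrev)    # one-SD change in x_k
print(round(unit_factor, 4), round(unit_percent, 1), round(sd_factor, 4))
```

Setting `delta` to the standard deviation gives the standardized factor change, so one function covers all three interpretations.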
Negative Binomial Regression Analysis
Negative Binomial Regression (NB2): Interpretation using the rate
. nbreg bed pcrev nsal fexp rural pn pf nf, exp(tdays) d(mean) irr
…
Negative binomial regression                      Number of obs   =         52
                                                  LR chi2(7)      =      17.60
Dispersion     = mean                             Prob > chi2     =     0.0139
Log likelihood = -223.23965                       Pseudo R2       =     0.0379
------------------------------------------------------------------------------
         bed |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       pcrev |   .6791633   .1048261    -2.51   0.012     .5018741    .9190808
        nsal |   1.168439   1.074299     0.17   0.866     .1927459    7.083157
        fexp |   4.177871   2.138139     2.79   0.005      1.53225    11.39149
       rural |   .8875309   .0625474    -1.69   0.090     .7730299    1.018992
          pn |   1.394237   .4090522     1.13   0.257     .7845205    2.477814
          pf |   2.123788   1.096798     1.46   0.145     .7718291    5.843878
          nf |   .0104013   .0208543    -2.28   0.023     .0002044    .5293314
       tdays | (exposure)
-------------+----------------------------------------------------------------
    /lnalpha |  -3.505601   .2714876                     -4.037707   -2.973495
-------------+----------------------------------------------------------------
       alpha |   .0300287   .0081524                      .0176379    .0511243
------------------------------------------------------------------------------
Likelihood-ratio test of alpha=0:  chibar2(01) =   82.36 Prob>=chibar2 = 0.000
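The IRR column is the exponentiated coefficient column from the earlier `nbreg` run, and the z and P>|z| columns are unchanged. The sketch below reproduces the `pcrev` row, assuming the IRR standard error comes from the delta method and the interval from exponentiating the log-scale limits:

```python
import math

b, se = -0.3868934, 0.1543459        # pcrev row of the coefficient table
lo, hi = -0.6894058, -0.0843809      # its 95% interval on the log scale

irr = math.exp(b)
irr_se = irr * se                    # delta-method standard error
irr_ci = (math.exp(lo), math.exp(hi))
print(irr, irr_se, irr_ci)
```

Because the test statistic is still computed on the coefficient scale, dividing the IRR by its standard error does not reproduce z.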
Negative Binomial Regression Analysis
Negative Binomial Regression (NB2): Interpretation using the rate
. nbreg bed pcrev nsal fexp rural pn pf nf, exp(tdays) d(mean) irr
…
. listcoef, help
nbreg (N=52): Factor Change in Expected Count
Observed SD: 40.852732
----------------------------------------------------------------------
         bed |      b         z     P>|z|    e^b    e^bStdX      SDofX
-------------+--------------------------------------------------------
       pcrev |  -0.38689   -2.507   0.012   0.6792   0.7635     0.6974
        nsal |   0.15567    0.169   0.866   1.1684   1.0262     0.1659
        fexp |   1.42980    2.794   0.005   4.1779   1.3214     0.1949
       rural |  -0.11931   -1.693   0.090   0.8875   0.9443     0.4804
          pn |   0.33235    1.133   0.257   1.3942   1.1790     0.4954
          pf |   0.75320    1.458   0.145   2.1238   1.3894     0.4366
          nf |  -4.56583   -2.277   0.023   0.0104   0.5918     0.1149
-------------+--------------------------------------------------------
    ln alpha |  -3.50560
       alpha |   0.03003   SE(alpha) = 0.00815
----------------------------------------------------------------------
LR test of alpha=0: 82.36   Prob>=LRX2 = 0.000
----------------------------------------------------------------------
       b = raw coefficient
       z = z-score for test of b=0
   P>|z| = p-value for z-test
     e^b = exp(b) = factor change in expected count for unit increase in X
 e^bStdX = exp(b*SD of X) = change in expected count for SD increase in X
   SDofX = standard deviation of X
Negative Binomial Regression Analysis
Negative Binomial Regression (NB2): Interpretation using the rate
. nbreg bed pcrev nsal fexp rural pn pf nf, exp(tdays) d(m) irr
…
. listcoef, help percent
nbreg (N=52): Percentage Change in Expected Count
Observed SD: 40.852732
----------------------------------------------------------------------
         bed |      b         z     P>|z|      %      %StdX      SDofX
-------------+--------------------------------------------------------
       pcrev |  -0.38689   -2.507   0.012    -32.1    -23.6     0.6974
        nsal |   0.15567    0.169   0.866     16.8      2.6     0.1659
        fexp |   1.42980    2.794   0.005    317.8     32.1     0.1949
       rural |  -0.11931   -1.693   0.090    -11.2     -5.6     0.4804
          pn |   0.33235    1.133   0.257     39.4     17.9     0.4954
          pf |   0.75320    1.458   0.145    112.4     38.9     0.4366
          nf |  -4.56583   -2.277   0.023    -99.0    -40.8     0.1149
-------------+--------------------------------------------------------
    ln alpha |  -3.50560
       alpha |   0.03003   SE(alpha) = 0.00815
----------------------------------------------------------------------
LR test of alpha=0: 82.36   Prob>=LRX2 = 0.000
----------------------------------------------------------------------
       b = raw coefficient
       z = z-score for test of b=0
   P>|z| = p-value for z-test
       % = percent change in expected count for unit increase in X
   %StdX = percent change in expected count for SD increase in X
   SDofX = standard deviation of X
Negative Binomial Regression Analysis
Negative Binomial Regression (NB2): Interpretation using the rate
Interpretation based on incidence rate ratio
A one-unit ($1 million) increase in annual total patient care revenue changes the expected number of beds in the home by a factor of 0.6792 for a given exposure, holding all other variables constant.
Interpretation based on percentage
A one-unit ($1 million) increase in annual total patient care revenue decreases the expected number of beds in the home by 32.1% for a given exposure, holding all other variables constant.
Negative Binomial Regression Analysis
Negative Binomial Regression (NB1)
NB1: The NB1 model can also be derived as a form of Poisson–gamma mixture, but with different properties, resulting in a linear variance.
The NB1 model, which sets p = 1, is also of interest because it has the same variance function, $(1 + \alpha)\mu_i = \phi\mu_i$, as that used in the GLM approach.
The NB1 log-likelihood function is
\[
\ln L(\alpha, \beta) = \sum_{i=1}^{n} \left\{ \left( \sum_{j=0}^{y_i - 1} \ln\bigl(j + \alpha^{-1}\exp(x_i'\beta)\bigr) \right) - \ln y_i!
- \bigl(y_i + \alpha^{-1}\exp(x_i'\beta)\bigr) \ln(1 + \alpha) + y_i \ln \alpha \right\}
\]
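To see the linear variance of NB1 numerically, the sketch below codes one observation's contribution to the NB1 log-likelihood as a log density (with $\mu = \exp(x'\beta)$) and checks that the mean is $\mu$ and the variance is $(1 + \alpha)\mu$. The values of `mu` and `alpha` are illustrative assumptions and the function name is hypothetical.

```python
import math

def nb1_logpmf(y, mu, alpha):
    """Log NB1 density: one term of the NB1 log-likelihood, mu = exp(x'b)."""
    r = mu / alpha   # alpha^{-1} * mu: the shape parameter grows with mu
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            - (y + r) * math.log(1.0 + alpha) + y * math.log(alpha))

mu, alpha = 4.0, 2.0            # illustrative values only
ys = range(600)                 # the tail beyond 600 is negligible here
probs = [math.exp(nb1_logpmf(y, mu, alpha)) for y in ys]
total = sum(probs)
mean = sum(y * p for y, p in zip(ys, probs))
var = sum((y - mean) ** 2 * p for y, p in zip(ys, probs))
print(total, mean, var)         # var should be close to (1 + alpha) * mu
```

The difference from NB2 is that the gamma-function arguments depend on the mean, so the variance-to-mean ratio stays constant at 1 + alpha.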
Negative Binomial Regression Analysis
Negative Binomial Regression (NB1): nbreg
. nbreg bed pcrev nsal fexp rural pn pf nf, exp(tdays) d(c)
Fitting Poisson model:
Iteration 0:   log likelihood = -264.43404
...
Iteration 4:   log likelihood = -223.70024
Negative binomial regression                      Number of obs   =         52
                                                  LR chi2(7)      =      14.50
Dispersion     = constant                         Prob > chi2     =     0.0430
Log likelihood = -223.70024                       Pseudo R2       =     0.0314
------------------------------------------------------------------------------
         bed |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       pcrev |  -.3177338   .1380812    -2.30   0.021     -.588368   -.0470996
        nsal |   .2634129   .9281847     0.28   0.777    -1.555796    2.082622
        fexp |   1.345714   .5563743     2.42   0.016     .2552406    2.436188
       rural |  -.1166414   .0692708    -1.68   0.092    -.2524096    .0191268
          pn |   .2374021   .2853126     0.83   0.405    -.3218002    .7966045
          pf |    .628185   .4937371     1.27   0.203    -.3395219    1.595892
          nf |  -4.031638   1.836357    -2.20   0.028    -7.630831   -.4324443
       _cons |  -.9878807   .2124139    -4.65   0.000    -1.404204   -.5715572
       tdays | (exposure)
-------------+----------------------------------------------------------------
    /lndelta |   1.014998   .2637996                      .4979601    1.532035
-------------+----------------------------------------------------------------
       delta |   2.759357   .7279173                      1.645361    4.627587
------------------------------------------------------------------------------
Likelihood-ratio test of delta=0:  chibar2(01) =   81.44 Prob>=chibar2 = 0.000
Negative Binomial Regression Analysis
Negative Binomial Regression (NB2): glm … , f(nb 1)
. glm bed pcrev nsal fexp rural pn pf nf, exp(tdays) f(nb 1) l(log)
Iteration 0:   log likelihood = -284.66051
Iteration 1:   log likelihood = -284.65619
Iteration 2:   log likelihood = -284.65619
Generalized linear models                          No. of obs      =        52
Optimization     : ML                              Residual df     =        44
                                                   Scale parameter =         1
Deviance         =  2.219059843                    (1/df) Deviance =  .0504332
Pearson          =  2.461198101                    (1/df) Pearson  =  .0559363
Variance function: V(u) = u+(1)u^2                 [Neg. Binomial]
Link function    : g(u) = ln(u)                    [Log]
                                                   AIC             =  11.25601
Log likelihood   = -284.6561904                    BIC             = -171.6357
------------------------------------------------------------------------------
             |                 OIM
         bed |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       pcrev |  -.3972157   .7698816    -0.52   0.606    -1.906156    1.111724
        nsal |   .1331111   4.408084     0.03   0.976    -8.506575    8.772797
        fexp |   1.350175   2.449433     0.55   0.581    -3.450627    6.150976
       rural |  -.1159449   .3485077    -0.33   0.739    -.7990075    .5671176
          pn |   .3367189   1.443701     0.23   0.816    -2.492884    3.166321
          pf |    .788123   2.510264     0.31   0.754    -4.131904     5.70815
          nf |   -4.53271   9.898894    -0.46   0.647    -23.93419    14.86876
       _cons |   -.885872   .9548721    -0.93   0.354    -2.757387    .9856429
       tdays | (exposure)
------------------------------------------------------------------------------
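The AIC and (1/df) Deviance lines of the `glm` outputs can be reproduced from the log likelihood and deviance. The sketch below assumes the per-observation form AIC = (-2 ln L + 2k)/n with k = 8 estimated coefficients (alpha is held fixed, so it is not counted); this matches the printed values but is inferred from them rather than stated on the slides.

```python
n, k = 52, 8                     # observations; coefficients incl. _cons

def glm_aic(loglik, k, n):
    """Per-observation AIC, assumed form: (-2 ln L + 2k) / n."""
    return (-2.0 * loglik + 2.0 * k) / n

aic_alpha_ml = glm_aic(-223.239651, k, n)     # f(nb .0300287)
aic_alpha_one = glm_aic(-284.6561904, k, n)   # f(nb 1)

dispersion_ml = 52.37224156 / 44              # deviance / residual df
dispersion_one = 2.219059843 / 44
print(aic_alpha_ml, aic_alpha_one, dispersion_ml, dispersion_one)
```

Fixing alpha at 1 instead of its ML value of about 0.03 costs roughly 61 log-likelihood points and pushes the deviance dispersion far below 1, which suggests that alpha = 1 is much too large for these data.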
Problem of Zero in Counts Model
Problem of Zero counts
Count responses with far more zeros than expected under the distributional assumptions of the Poisson and negative binomial models give incorrect and biased results:
Incorrect parameter estimates
Biased standard errors
A cause of overdispersion
Zero Inflated Poisson Regression Model
Zero Inflated Poisson (ZIP)
Zero-inflated count models were first introduced by Lambert (1992)
to provide another method of accounting for excessive zero counts.
ZIP models are two-part models, consisting of a binary section and a count section; zero counts are modeled using both the binary and the count processes.
Let the response $Y_i$ denote a non-negative integer count for the ith observation, i = 1, · · · , N.
Zero Inflated Poisson Model
Probability of Zero Inflated Poisson
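The probability of an excess zero is denoted by $\pi_i$, $0 \le \pi_i \le 1$. The random variable $Y_i$ follows a ZIP distribution if
\[
\Pr(Y_i = y_i) =
\begin{cases}
\pi_i + (1 - \pi_i)\,e^{-\lambda_i}, & y_i = 0 \\[4pt]
(1 - \pi_i)\,\dfrac{e^{-\lambda_i}\lambda_i^{y_i}}{y_i!}, & y_i = 1, 2, \ldots
\end{cases}
\qquad 0 \le \pi_i \le 1
\]
\[
E(Y_i) = (1 - \pi_i)\lambda_i = \mu_i; \qquad Var(Y_i) = \mu_i + \frac{\pi_i}{1 - \pi_i}\,\mu_i^{2}
\]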
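A short numerical sketch of the ZIP distribution: it evaluates the two-branch probability function and checks the mean and variance formulas on this slide. The values of `lam` and `pi` are illustrative assumptions and the function name is hypothetical.

```python
import math

def zip_pmf(y, lam, pi):
    """ZIP probability: excess-zero mass pi plus (1 - pi) times a Poisson."""
    pois = math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))
    return pi + (1.0 - pi) * pois if y == 0 else (1.0 - pi) * pois

lam, pi = 2.5, 0.3              # illustrative values only
ys = range(200)
probs = [zip_pmf(y, lam, pi) for y in ys]
total = sum(probs)
mean = sum(y * p for y, p in zip(ys, probs))
var = sum((y - mean) ** 2 * p for y, p in zip(ys, probs))

mu = (1.0 - pi) * lam
var_formula = mu + pi / (1.0 - pi) * mu ** 2
print(total, mean, var, var_formula)
```

The variance exceeds the mean whenever pi is positive, which is why excess zeros show up as overdispersion in a plain Poisson fit.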
Zero Inflated Negative Binomial Model
Zero Inflated Negative Binomial (ZINB)
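Let the response $Y_i$ denote a non-negative integer count for the ith observation, i = 1, · · · , N. Then the ZINB distribution is
\[
\Pr(Y_i = y_i) =
\begin{cases}
\pi_i + (1 - \pi_i)\left(\dfrac{1}{1 + \kappa\lambda_i}\right)^{\kappa^{-1}}, & y_i = 0 \\[8pt]
(1 - \pi_i)\,\dfrac{\Gamma(y_i + \kappa^{-1})}{y_i!\,\Gamma(\kappa^{-1})}\,
\dfrac{(\kappa\lambda_i)^{y_i}}{(1 + \kappa\lambda_i)^{y_i + \kappa^{-1}}}, & y_i = 1, 2, \ldots
\end{cases}
\qquad 0 \le \pi_i \le 1
\]
$E(Y_i) = (1 - \pi_i)\lambda_i$ and $Var(Y_i) = (1 - \pi_i)\lambda_i\bigl(1 + (\kappa + \pi_i)\lambda_i\bigr)$,
where $\kappa$ is an overdispersion parameter.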
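The same kind of check works for the ZINB distribution, with a negative binomial in place of the Poisson component. The sketch below uses illustrative values for `lam`, `pi` and `kappa`, with a hypothetical function name, and compares the numerical moments with the mean and variance stated on this slide.

```python
import math

def zinb_pmf(y, lam, pi, kappa):
    """ZINB probability: excess-zero mass pi plus (1 - pi) times an NB2."""
    r = 1.0 / kappa
    nb = math.exp(math.lgamma(y + r) - math.lgamma(y + 1) - math.lgamma(r)
                  + y * math.log(kappa * lam)
                  - (y + r) * math.log(1.0 + kappa * lam))
    return pi + (1.0 - pi) * nb if y == 0 else (1.0 - pi) * nb

lam, pi, kappa = 2.5, 0.3, 0.5  # illustrative values only
ys = range(800)
probs = [zinb_pmf(y, lam, pi, kappa) for y in ys]
total = sum(probs)
mean = sum(y * p for y, p in zip(ys, probs))
var = sum((y - mean) ** 2 * p for y, p in zip(ys, probs))

mean_formula = (1.0 - pi) * lam
var_formula = (1.0 - pi) * lam * (1.0 + (kappa + pi) * lam)
print(total, mean, var, var_formula)
```

Both kappa and pi inflate the variance, so a ZINB fit has to separate overdispersion from excess zeros, which can be hard when only one of the two is present.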
Zero Inflated Negative Binomial Model
Zero Inflated Negative Binomial (ZINB)
Example: Synthetic NB2 data, Stata (Hilbe, 2011)
. tab1 y1
-> tabulation of y1
         y1 |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |     20,596       41.19       41.19
          1 |     12,657       25.31       66.51
          2 |      7,126       14.25       80.76
          3 |      4,012        8.02       88.78
          4 |      2,270        4.54       93.32
          5 |      1,335        2.67       95.99
          6 |        781        1.56       97.55
          7 |        479        0.96       98.51
          8 |        278        0.56       99.07
          9 |        175        0.35       99.42
         10 |        106        0.21       99.63
         11 |         60        0.12       99.75
         12 |         38        0.08       99.83
         13 |         29        0.06       99.88
         14 |         20        0.04       99.92
         15 |         15        0.03       99.95
          …
         24 |          1        0.00      100.00
------------+-----------------------------------
      Total |     50,000      100.00
ZIP & ZINB Model
ZIP & ZINB: example
. di exp(- 1.40606)* 1.40606^0/exp(lnfactorial(0))
.24510711
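The `di` command evaluates the Poisson probability of a zero count at a mean of 1.40606 (presumably the sample mean of `y1`; the slide does not say). The sketch below repeats that calculation and compares it with the observed share of zeros from the tabulation.

```python
import math

mean_y1 = 1.40606                         # value used in the di command
p0_poisson = math.exp(-mean_y1) * mean_y1 ** 0 / math.factorial(0)

observed_zero_share = 20596 / 50000       # zero row of the tabulation
excess = observed_zero_share - p0_poisson
print(p0_poisson, observed_zero_share, excess)
```

A Poisson model with this mean expects about 24.5% zeros while the data contain about 41.2%, which is the excess that motivates the zero-inflated models that follow.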
Zero Inflated Poisson Model
Zero Inflated Poisson Example: zip
. zip y1 x1 x2, inflate(x1 x2)
Fitting constant-only model:
Iteration 0:   log likelihood = -93719.413
…
Iteration 4:   log likelihood = -84524.083
Fitting full model:
Iteration 0:   log likelihood = -84524.083
…
Iteration 4:   log likelihood = -81687.514
Zero-inflated Poisson regression                  Number of obs   =      50000
                                                  Nonzero obs     =      29404
                                                  Zero obs        =      20596
Inflation model = logit                           LR chi2(2)      =    5673.14
Log likelihood  = -81687.51                       Prob > chi2     =     0.0000
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
y1           |
          x1 |   .6277332   .0160432    39.13   0.000     .5962891    .6591774
          x2 |  -1.069268   .0169615   -63.04   0.000    -1.102512   -1.036024
       _cons |   .8106343   .0120212    67.43   0.000     .7870732    .8341954
-------------+----------------------------------------------------------------
inflate      |
          x1 |  -.4551536   .0487874    -9.33   0.000    -.5507751   -.3595321
          x2 |   .7108517   .0498155    14.27   0.000      .613215    .8084883
       _cons |  -1.036955    .036488   -28.42   0.000    -1.108471   -.9654402
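The two equations of this `zip` fit can be combined into fitted quantities for a given covariate pattern. The sketch below assumes the usual ZIP parameterization (log link for the count mean, logit link for the probability of an excess zero) and uses a hypothetical helper name; the covariate values are arbitrary illustrations.

```python
import math

count_b = {"x1": 0.6277332, "x2": -1.069268, "_cons": 0.8106343}    # y1 equation
infl_b = {"x1": -0.4551536, "x2": 0.7108517, "_cons": -1.036955}    # inflate equation

def zip_predict(x1, x2):
    """Fitted Poisson mean, P(always zero), P(y = 0) and overall mean."""
    lam = math.exp(count_b["_cons"] + count_b["x1"] * x1 + count_b["x2"] * x2)
    eta = infl_b["_cons"] + infl_b["x1"] * x1 + infl_b["x2"] * x2
    pi = 1.0 / (1.0 + math.exp(-eta))          # inverse logit
    p_zero = pi + (1.0 - pi) * math.exp(-lam)
    return lam, pi, p_zero, (1.0 - pi) * lam

lam0, pi0, p_zero0, mean0 = zip_predict(0.0, 0.0)   # both covariates at zero
lam1, pi1, p_zero1, mean1 = zip_predict(0.0, 1.0)   # x2 raised by one unit
print(lam0, pi0, p_zero0, mean0)
print(lam1, pi1, p_zero1, mean1)
```

Raising `x2` lowers the count mean and raises the probability of belonging to the always-zero group, so both equations push the predicted share of zeros upward.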
Zero Inflated Negative Binomial Model
Zero Inflated Negative Binomial Example: zinb
. zinb y1 x1 x2, inflate(x1 x2)
…
Zero-inflated negative binomial regression        Number of obs   =      50000
                                                  Nonzero obs     =      29404
                                                  Zero obs        =      20596
Inflation model = logit                           LR chi2(2)      =    3733.39
Log likelihood  = -78723.31                       Prob > chi2     =     0.0000
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
y1           |
          x1 |   .7371282   .0232153    31.75   0.000     .6916272    .7826293
          x2 |  -1.254607   .0230818   -54.35   0.000    -1.299847   -1.209368
       _cons |   .5108007   .0166787    30.63   0.000     .4781111    .5434904
-------------+----------------------------------------------------------------
inflate      |
          x1 |  -4.334255   3.705392    -1.17   0.242    -11.59669    2.928179
          x2 |   3.058956   2.039212     1.50   0.134    -.9378257    7.055738
       _cons |  -5.402738   1.821935    -2.97   0.003    -8.973665   -1.831811
-------------+----------------------------------------------------------------
    /lnalpha |  -.2915168   .0183391   -15.90   0.000    -.3274608   -.2555728
-------------+----------------------------------------------------------------
       alpha |   .7471295   .0137017                      .7207516    .7744728
------------------------------------------------------------------------------
Zero Inflated Poisson Regression Model
Zero inflated Poisson Model (ZIP): Interpretation
Interpretation based on Poisson Model
The Poisson model contains coefficients for the factor change in the expected count for those in the Not Always Zero group.
The coefficients can be interpreted in the same way as coefficients from the Poisson regression model.
Interpretation based on Binary Logit Model
The binary logit model contains coefficients for the factor change in the odds of being in the Always Zero group compared with the Not Always Zero group.
The coefficients are interpreted in the same way as coefficients from a binary logit model.
Zero Inflated Negative Binomial Model
Zero inflated Negative Binomial Model (ZINB): Interpretation
Interpretation based on Negative Binomial Model
The NB model contains coefficients for the factor change in the expected count for those in the Not Always Zero group.
The coefficients can be interpreted in the same way as coefficients from the negative binomial model.
Interpretation based on Binary Logit Model
The binary logit model contains coefficients for the factor change in the odds of being in the Always Zero group compared with the Not Always Zero group.
The coefficients are interpreted in the same way as coefficients from a binary logit model.
Zero Inflated Poisson Regression Model
Zero inflated Poisson Model (ZIP): Example Interpretation
. zip y1 x1 x2, inflate(x1 x2)
…
. listcoef, help
zip (N=50000): Factor Change in Expected Count
Observed SD: 1.8759835
Count Equation: Factor Change in Expected Count for Those Not Always 0
----------------------------------------------------------------------
          y1 |      b         z     P>|z|    e^b    e^bStdX      SDofX
-------------+--------------------------------------------------------
          x1 |   0.62773   39.128   0.000   1.8734   1.1990     0.2892
          x2 |  -1.06927  -63.041   0.000   0.3433   0.7336     0.2898
----------------------------------------------------------------------
       b = raw coefficient
       z = z-score for test of b=0
   P>|z| = p-value for z-test
     e^b = exp(b) = factor change in expected count for unit increase in X
 e^bStdX = exp(b*SD of X) = change in expected count for SD increase in X
   SDofX = standard deviation of X
Binary Equation: Factor Change in Odds of Always 0
----------------------------------------------------------------------
     Always0 |      b         z     P>|z|    e^b    e^bStdX      SDofX
-------------+--------------------------------------------------------
          x1 |  -0.45515   -9.329   0.000   0.6344   0.8767     0.2892
          x2 |   0.71085   14.270   0.000   2.0357   1.2287     0.2898
----------------------------------------------------------------------
       b = raw coefficient
       z = z-score for test of b=0
   P>|z| = p-value for z-test
     e^b = exp(b) = factor change in odds for unit increase in X
 e^bStdX = exp(b*SD of X) = change in odds for SD increase in X
   SDofX = standard deviation of X
Zero Inflated Poisson Regression Model
Zero inflated Poisson Model (ZIP): Example Interpretation
. listcoef, help percent
zip (N=50000): Percentage Change in Expected Count
Observed SD: 1.8759835
Count Equation: Percentage Change in Expected Count for Those Not Always 0
----------------------------------------------------------------------
          y1 |      b         z     P>|z|      %      %StdX      SDofX
-------------+--------------------------------------------------------
          x1 |   0.62773   39.128   0.000     87.3     19.9     0.2892
          x2 |  -1.06927  -63.041   0.000    -65.7    -26.6     0.2898
----------------------------------------------------------------------
       b = raw coefficient
       z = z-score for test of b=0
   P>|z| = p-value for z-test
       % = percent change in expected count for unit increase in X
   %StdX = percent change in expected count for SD increase in X
   SDofX = standard deviation of X
Binary Equation: Factor Change in Odds of Always 0
----------------------------------------------------------------------
     Always0 |      b         z     P>|z|      %      %StdX      SDofX
-------------+--------------------------------------------------------
          x1 |  -0.45515   -9.329   0.000    -36.6    -12.3     0.2892
          x2 |   0.71085   14.270   0.000    103.6     22.9     0.2898
----------------------------------------------------------------------
       b = raw coefficient
       z = z-score for test of b=0
   P>|z| = p-value for z-test
       % = percent change in odds for unit increase in X
   %StdX = percent change in odds for SD increase in X
   SDofX = standard deviation of X
Zero Inflated Negative Binomial Model
Zero Inflated Negative Binomial Model (ZINB): Example Interpretation
. zinb y1 x1 x2, inflate(x1 x2)
...
. listcoef, help
zinb (N=50000): Factor Change in Expected Count
Observed SD: 1.8759835
Count Equation: Factor Change in Expected Count for Those Not Always 0
----------------------------------------------------------------------
          y1 |      b         z     P>|z|    e^b    e^bStdX      SDofX
-------------+--------------------------------------------------------
          x1 |   0.73713   31.752   0.000   2.0899   1.2376     0.2892
          x2 |  -1.25461  -54.355   0.000   0.2852   0.6952     0.2898
-------------+--------------------------------------------------------
    ln alpha |  -0.29152
       alpha |   0.74713   SE(alpha) = 0.01370
----------------------------------------------------------------------
       b = raw coefficient
       z = z-score for test of b=0
   P>|z| = p-value for z-test
     e^b = exp(b) = factor change in expected count for unit increase in X
 e^bStdX = exp(b*SD of X) = change in expected count for SD increase in X
   SDofX = standard deviation of X
Binary Equation: Factor Change in Odds of Always 0
----------------------------------------------------------------------
     Always0 |      b         z     P>|z|    e^b    e^bStdX      SDofX
-------------+--------------------------------------------------------
          x1 |  -4.33425   -1.170   0.242   0.0131   0.2855     0.2892
          x2 |   3.05896    1.500   0.134  21.3053   2.4263     0.2898
----------------------------------------------------------------------
       b = raw coefficient
       z = z-score for test of b=0
   P>|z| = p-value for z-test
     e^b = exp(b) = factor change in odds for unit increase in X
 e^bStdX = exp(b*SD of X) = change in odds for SD increase in X
   SDofX = standard deviation of X
Zero Inflated Negative Binomial Model
Zero Inflated Negative Binomial Model (ZINB): Example Interpretation
. zinb y1 x1 x2, inflate(x1 x2)
...
. listcoef, help percent
zinb (N=50000): Percentage Change in Expected Count
Observed SD: 1.8759835
Count Equation: Percentage Change in Expected Count for Those Not Always 0
----------------------------------------------------------------------
          y1 |      b         z     P>|z|      %      %StdX      SDofX
-------------+--------------------------------------------------------
          x1 |   0.73713   31.752   0.000    109.0     23.8     0.2892
          x2 |  -1.25461  -54.355   0.000    -71.5    -30.5     0.2898
-------------+--------------------------------------------------------
    ln alpha |  -0.29152
       alpha |   0.74713   SE(alpha) = 0.01370
----------------------------------------------------------------------
       b = raw coefficient
       z = z-score for test of b=0
   P>|z| = p-value for z-test
       % = percent change in expected count for unit increase in X
   %StdX = percent change in expected count for SD increase in X
   SDofX = standard deviation of X
Binary Equation: Factor Change in Odds of Always 0
----------------------------------------------------------------------
     Always0 |      b         z     P>|z|      %      %StdX      SDofX
-------------+--------------------------------------------------------
          x1 |  -4.33425   -1.170   0.242    -98.7    -71.4     0.2892
          x2 |   3.05896    1.500   0.134   2030.5    142.6     0.2898
----------------------------------------------------------------------
       b = raw coefficient
       z = z-score for test of b=0
   P>|z| = p-value for z-test
       % = percent change in odds for unit increase in X
   %StdX = percent change in odds for SD increase in X
   SDofX = standard deviation of X
Test of Comparative Fit
Test comparative: Vuong test
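The standard fit test for zero-inflated models is the Vuong test (Vuong, 1989):
- Comparison of standard Poisson & ZIP
- Comparison of standard negative binomial & ZINB
\[
V = \frac{\sqrt{n}\,\bar{u}}{SD(u_i)}, \qquad
u_i = \ln\!\left[\frac{P_1(y_i \mid x_i)}{P_2(y_i \mid x_i)}\right]
\]
where $\bar{u}$ and $SD(u_i)$ are the mean and the standard deviation of $u_i$, $P_1$ is the predicted probability of the observed count under the zero-inflated model (ZIP or ZINB), and $P_2$ is the same quantity under its standard counterpart (Poisson or negative binomial).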
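Given each observation's log predicted probability under the two competing models, the Vuong statistic is a scaled mean of the log ratios. The sketch below is a minimal version with made-up inputs; the function name is hypothetical, and using the sample (n - 1) standard deviation is an assumption, since software may use the 1/n form instead.

```python
import math
import statistics

def vuong_statistic(logp_model1, logp_model2):
    """V = sqrt(n) * mean(u) / SD(u), with u_i = ln P1(y_i|x_i) - ln P2(y_i|x_i)."""
    u = [a - b for a, b in zip(logp_model1, logp_model2)]
    n = len(u)
    return math.sqrt(n) * statistics.mean(u) / statistics.stdev(u)

# Made-up log probabilities for four observations (illustration only).
logp_zero_inflated = [-1.0, -1.2, -0.8, -2.0]
logp_standard = [-1.5, -1.1, -1.4, -2.2]

v = vuong_statistic(logp_zero_inflated, logp_standard)
print(v)
```

V is compared with the standard normal distribution: large positive values favor the first model, large negative values favor the second, and values near zero favor neither.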
Test of Comparative fit
Comparative test: Zero Inflated Poisson (ZIP) vs. standard Poisson
. zip y1 x1 x2, inflate(x1 x2) vuong
Fitting constant-only model:
...
Zero-inflated Poisson regression                  Number of obs   =      50000
                                                  Nonzero obs     =      29404
                                                  Zero obs        =      20596
Inflation model = logit                           LR chi2(2)      =    5673.14
Log likelihood  = -81687.51                       Prob > chi2     =     0.0000
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
y1           |
          x1 |   .6277332   .0160432    39.13   0.000     .5962891    .6591774
          x2 |  -1.069268   .0169615   -63.04   0.000    -1.102512   -1.036024
       _cons |   .8106343   .0120212    67.43   0.000     .7870732    .8341954
-------------+----------------------------------------------------------------
inflate      |
          x1 |  -.4551536   .0487874    -9.33   0.000    -.5507751   -.3595321
          x2 |   .7108517   .0498155    14.27   0.000      .613215    .8084883
       _cons |  -1.036955    .036488   -28.42   0.000    -1.108471   -.9654402
------------------------------------------------------------------------------
Vuong test of zip vs. standard Poisson:            z =    39.10  Pr>z = 0.0000
Test of Comparative fit
Comparative test: Zero Inflated Negative Binomial VS NB
. zinb y1 x1 x2, inflate(x1 x2) vuong zip
...
Zero-inflated negative binomial regression        Number of obs   =      50000
                                                  Nonzero obs     =      29404
                                                  Zero obs        =      20596
Inflation model = logit                           LR chi2(2)      =    3733.39
Log likelihood  = -78723.31                       Prob > chi2     =     0.0000
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
y1           |
          x1 |   .7371282   .0232153    31.75   0.000     .6916272    .7826293
          x2 |  -1.254607   .0230818   -54.35   0.000    -1.299847   -1.209368
       _cons |   .5108007   .0166787    30.63   0.000     .4781111    .5434904
-------------+----------------------------------------------------------------
inflate      |
          x1 |  -4.334255   3.705392    -1.17   0.242    -11.59669    2.928179
          x2 |   3.058956   2.039212     1.50   0.134    -.9378257    7.055738
       _cons |  -5.402738   1.821935    -2.97   0.003    -8.973665   -1.831811
-------------+----------------------------------------------------------------
    /lnalpha |  -.2915168   .0183391   -15.90   0.000    -.3274608   -.2555728
-------------+----------------------------------------------------------------
       alpha |   .7471295   .0137017                      .7207516    .7744728
------------------------------------------------------------------------------
Likelihood-ratio test of alpha=0:  chibar2(01) = 5928.42 Pr>=chibar2 = 0.0000
Vuong test of zinb vs. standard negative binomial: z =     0.86  Pr>z = 0.1954
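The two test lines at the bottom of this output can be roughly reproduced from numbers shown on the slides. The sketch below assumes the likelihood-ratio statistic is twice the gap between the ZINB and ZIP log likelihoods, and that the Vuong p-value is the upper tail of a standard normal. Small differences from the printed values are expected because the inputs are rounded.

```python
import math

ll_zinb = -78723.31              # zinb log likelihood
ll_zip = -81687.51               # zip log likelihood from the earlier fit

lr_alpha = 2.0 * (ll_zinb - ll_zip)     # LR test of alpha = 0 (ZINB vs ZIP)

def upper_tail_normal(z):
    """Pr(Z > z) for a standard normal, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

p_vuong = upper_tail_normal(0.86)       # Vuong test, zinb vs. standard NB
print(lr_alpha, p_vuong)
```

The LR test strongly rejects alpha = 0, so ZINB is preferred over ZIP, while the Vuong test does not favor ZINB over the standard negative binomial. That pattern is consistent with data simulated from an NB2 model without excess zeros.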
Comparison of Models
Comparison model: Graph & statistics across models
Summary statistics across models: BIC, AIC, likelihood ratio test, Vuong test
Graph: difference between the observed and predicted probability for the PRM, NB2, ZIP & ZINB models
(Long & Freese, 2006)
Comparison of Models
Comparison model: countfit (Graph & statistics across models)
Summary statistics across models: BIC, AIC, likelihood ratio test, Vuong test
Graph: difference between the observed and predicted probability for the PRM, NB2, ZIP & ZINB models
. countfit y1 x1 x2, gen(Base_) inflate(x1 x2) maxcount(10) ///
      prm nbreg zip zinb nodash
…
Comparison of Mean Observed and Predicted Count

                 Maximum       At      Mean
Model           Difference    Value   |Diff|
---------------------------------------------
Base_PRM           0.124        0      0.029
Base_NBRM         -0.014        2      0.005
Base_ZIP           0.069        1      0.016
Base_ZINB         -0.014        2      0.005
…
Tests and Fit Statistics
Comparison of Models
Comparison model: countfit (Graph & statistics across models)
Tests and Fit Statistics
Base_PRM        BIC= -1311.572  AIC=     3.566  Prefer  Over  Evidence
-------------------------------------------------------------------------
  vs Base_NBRM  BIC= -1466.037  dif=   154.465  NBRM    PRM   Very strong
                AIC=     3.249  dif=     0.317  NBRM    PRM
                LRX2=  160.680  prob=    0.000  NBRM    PRM   p=0.000
-------------------------------------------------------------------------
  vs Base_ZIP   BIC= -1387.037  dif=    75.466  ZIP     PRM   Very strong
                AIC=     3.390  dif=     0.176  ZIP     PRM
                Vuong=   3.963  prob=    0.000  ZIP     PRM   p=0.000
-------------------------------------------------------------------------
  vs Base_ZINB  BIC= -1448.970  dif=   137.399  ZINB    PRM   Very strong
                AIC=     3.258  dif=     0.309  ZINB    PRM
-------------------------------------------------------------------------
Base_NBRM       BIC= -1466.037  AIC=     3.249  Prefer  Over  Evidence
-------------------------------------------------------------------------
  vs Base_ZIP   BIC= -1387.037  dif=   -78.999  NBRM    ZIP   Very strong
                AIC=     3.390  dif=    -0.141  NBRM    ZIP
-------------------------------------------------------------------------
  vs Base_ZINB  BIC= -1448.970  dif=   -17.067  NBRM    ZINB  Very strong
                AIC=     3.258  dif=    -0.009  NBRM    ZINB
                Vuong=   0.520  prob=    0.302  ZINB    NBRM  p=0.302
-------------------------------------------------------------------------
Base_ZIP        BIC= -1387.037  AIC=     3.390  Prefer  Over  Evidence
-------------------------------------------------------------------------
  vs Base_ZINB  BIC= -1448.970  dif=    61.933  ZINB    ZIP   Very strong
                AIC=     3.258  dif=     0.132  ZINB    ZIP
                LRX2=   68.147  prob=    0.000  ZINB    ZIP   p=0.000
-------------------------------------------------------------------------
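The `dif`, `Prefer` and `Evidence` columns of the BIC rows follow from the four BIC values alone. The sketch below recomputes them with a hypothetical helper. The evidence thresholds (absolute difference above 10 = very strong, 6 to 10 = strong, 2 to 6 = positive, below 2 = weak) are an assumption based on the commonly used Raftery guidelines; only the "Very strong" label is confirmed by the output.

```python
bic = {"PRM": -1311.572, "NBRM": -1466.037, "ZIP": -1387.037, "ZINB": -1448.970}

def compare_bic(first, second):
    """Return (dif, preferred model, evidence label) for a pair of models."""
    dif = bic[first] - bic[second]
    preferred = second if dif > 0 else first    # the smaller BIC is preferred
    size = abs(dif)
    if size > 10:
        evidence = "Very strong"                # assumed thresholds, see lead-in
    elif size > 6:
        evidence = "Strong"
    elif size > 2:
        evidence = "Positive"
    else:
        evidence = "Weak"
    return dif, preferred, evidence

pairs = [("PRM", "NBRM"), ("PRM", "ZIP"), ("PRM", "ZINB"),
         ("NBRM", "ZIP"), ("NBRM", "ZINB"), ("ZIP", "ZINB")]
results = {pair: compare_bic(*pair) for pair in pairs}
for pair, res in results.items():
    print(pair, round(res[0], 3), res[1], res[2])
```

On BIC the negative binomial model is preferred over every alternative, including ZINB, which agrees with the non-significant Vuong test of ZINB against NBRM.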
Comparison of Models
Comparison model: countfit (Graph & statistics across models)
Comparison of Models
Comparison model: zinb (Voung test)
. zinb y1 x1 x2, inflate(x1 x2) vuong zip
Fitting zip model:
…
Zero-inflated negative binomial regression        Number of obs   =        500
                                                  Nonzero obs     =        304
                                                  Zero obs        =        196
Inflation model = logit                           LR chi2(2)      =      41.68
Log likelihood  = -807.4158                       Prob > chi2     =     0.0000
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
y1           |
          x1 |   .7905583   .1924543     4.11   0.000     .4133548    1.167762
          x2 |  -1.352218   .1952302    -6.93   0.000    -1.734862   -.9695734
       _cons |   .5679291   .1385531     4.10   0.000     .2963701    .8394882
-------------+----------------------------------------------------------------
inflate      |
          x1 |    24.1426   23.66368     1.02   0.308    -22.23736    70.52257
          x2 |  -18.07713   19.19718    -0.94   0.346    -55.70292    19.54865
       _cons |  -23.10625   22.25758    -1.04   0.299    -66.73031    20.51781
-------------+----------------------------------------------------------------
    /lnalpha |  -.3324529   .1445162    -2.30   0.021    -.6156994   -.0492064
-------------+----------------------------------------------------------------
       alpha |   .7171625   .1036416                      .5402629    .9519846
------------------------------------------------------------------------------
Likelihood-ratio test of alpha=0:  chibar2(01) =   68.15 Pr>=chibar2 = 0.0000
Vuong test of zinb vs. standard negative binomial: z =     0.52  Pr>z = 0.3016
Other Count Data Models
Zero & other count data models
Zero-truncated Poisson & zero-truncated negative binomial
Truncated Poisson & truncated negative binomial
Hurdle model (Mullahy, 1986) or zero-altered model (ZAP & ZANB)
Censored Poisson & censored negative binomial
Generalized Poisson regression
Generalized negative binomial
etc.
Reference
Reference: Negative Binomial & other Count Models
Agresti, A. (2002). Categorical Data Analysis. John Wiley & Sons. New York.
Cameron, A.C. and Trivedi, P.K. (1998). Regression Analysis of Count Data. Cambridge University Press. New York.
Cameron, A.C. and Trivedi, P.K. (1990). Regression-based tests for overdispersion in the Poisson model. J. Econometrics, 46, 347-364.
Dean, C.B. (1992). Testing for overdispersion in Poisson and binomial regression models. J. Am. Statist. Assoc., 87, 451-457.
Dean, C. and Lawless, J.F. (1989). Tests for detecting overdispersion in Poisson regression models. J. Am. Statist. Assoc., 84, 467-472.
Fleiss, J.L., Levin, B., & Paik, M.C. (2003). Statistical Methods for Rates and Proportions. 3rd edition. John Wiley & Sons. New York.
Greene, W.H. (2003). Econometric Analysis. 5th edition. Prentice Hall. New Jersey.
Hilbe, J.M. (2007). Negative Binomial Regression. Cambridge University Press. New York.
Hilbe, J.M. (2014). Modeling Count Data. Cambridge University Press. New York.
Thanomsieng, N. (2007). overtest.ado STATA ado file: Overdispersion test. Available at http://home.kku.ac.th/nikom