THE POISSON & NEGATIVE BINOMIAL MODELS By: ALVARD AYRAPETYAN.
OUTLINE OF PRESENTATION
• Poisson Regression: model assumptions, assessment, and interpretation; applications in SAS and R; quick programming in SPSS and MINITAB
• Negative Binomial Regression: model assumptions, assessment, and interpretation; applications in SAS and R; quick programming in SPSS
ASSUMPTIONS FOR POISSON MODEL
• Events are counted over a fixed period of time (or other fixed unit of exposure)
• Events occur at a constant rate
• Events are independent of one another
• The dependent variable's conditional mean and variance are equal
• The dependent variable is a non-negative integer (a count)
THE POISSON MODEL
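Random Component: Poisson distribution for the count Y (e.g., the # of lead changes)
Systematic Component: the linear predictor $\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3$
Mass Function, with $E(Y) = \mu$ and $V(Y) = \mu$:
$$P(Y = y \mid X_1, X_2, X_3) = \frac{e^{-\mu(X)}\,\mu(X)^{y}}{y!}, \qquad y = 0, 1, 2, \ldots$$
Link Function, $g(\mu) = \log(\mu)$:
$$g(\mu(X)) = \log(\mu(X)) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3, \qquad \mu(X) = e^{\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3}$$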
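To make the mass function and the log link concrete, here is a minimal standard-library Python sketch (not part of the original slides). The helper names `poisson_pmf`, `mean_from_linear_predictor` and `poisson_moments` are hypothetical, and the coefficients in the demo are made up for illustration. It evaluates the Poisson mass function in log space and checks numerically that the mean and the variance both come out equal to µ.

```python
import math


def poisson_pmf(y, mu):
    """P(Y = y) for a Poisson count with mean mu > 0, computed in log space."""
    return math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))


def mean_from_linear_predictor(betas, xs):
    """Inverse of the log link: mu(X) = exp(b0 + b1*x1 + ... + bp*xp)."""
    eta = betas[0] + sum(b * x for b, x in zip(betas[1:], xs))
    return math.exp(eta)


def poisson_moments(mu, upper=500):
    """Total probability, mean and variance, summing the PMF over y = 0..upper."""
    probs = [poisson_pmf(y, mu) for y in range(upper + 1)]
    total = sum(probs)
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return total, mean, var


if __name__ == "__main__":
    # Hypothetical coefficients and predictor values, chosen only for illustration.
    mu = mean_from_linear_predictor([1.0, 0.5, -0.2, 0.1], [2.0, 1.0, 3.0])
    print(mu, poisson_moments(mu))  # mean and variance both match mu
```

Truncating the sum at `upper` is harmless as long as `upper` sits far in the right tail of the distribution.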
EXAMPLES OF POISSON DISTRIBUTION
• Number of earthquakes in a region
• Number of accidents on a highway in a certain area in a specified time
• Number of telephone calls received in one hour
• Number of customers that enter a bank in one hour
• Number of times an elderly person will fall in a month
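INTERPRETING COEFFICIENTS
CONTINUOUS PREDICTOR
• Keeping all other predictors constant, when $x_1$ increases by one unit, the log of the expected count of Y increases (+) or decreases (−) by $\hat{\beta}_1$
• Keeping all other predictors constant, when $x_1$ increases by one unit, the expected count of Y goes up (+) or down (−) by $(e^{\hat{\beta}_1} - 1) \times 100\%$
CATEGORICAL PREDICTOR (indicator $x_1$, reference level $x_1 = 0$)
• Keeping all other predictors constant, when $x_1 = 1$, the log of the expected count of Y is higher (+) or lower (−) by $\hat{\beta}_1$ than at the reference level
• Keeping all other predictors constant, when $x_1 = 1$, the expected count of Y is $e^{\hat{\beta}_1}$ times (i.e., $e^{\hat{\beta}_1} \times 100\%$ of) its value at the reference level, so it goes up (+) or down (−) by $(e^{\hat{\beta}_1} - 1) \times 100\%$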
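The two interpretations above differ only by exponentiation. A small Python sketch of the conversion follows. The function names are hypothetical, and the estimates in the demo are the negative binomial coefficients for GP, TOV and FTM reported in the results of this deck.

```python
import math


def percent_change(beta_hat, delta_x=1.0):
    """Percent change in the expected count for a delta_x-unit increase: (exp(b*dx) - 1) * 100."""
    return (math.exp(beta_hat * delta_x) - 1.0) * 100.0


def rate_ratio(beta_hat):
    """Multiplicative effect exp(beta_hat), e.g. level x1 = 1 versus reference x1 = 0."""
    return math.exp(beta_hat)


if __name__ == "__main__":
    # Negative binomial estimates for the significant NBA predictors.
    for name, b in [("GP", 0.0426), ("TOV", 0.0060), ("FTM", 0.0042)]:
        print(f"{name}: {percent_change(b):.2f}% per one-unit increase")
```

This prints 4.35%, 0.60% and 0.42%. For coefficients this small, $(e^{\hat{\beta}} - 1) \times 100\%$ is close to $\hat{\beta} \times 100\%$. The exact form matters for larger coefficients.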
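POTENTIAL PROBLEM WITH POISSON
• OVERDISPERSION: the conditional variance of Y is much larger than its conditional mean. This violates the Poisson assumption that the two are equal and makes the Poisson standard errors too small
• Negative binomial regression is a common solution!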
THE DATA
• Goal: predict the number of field goal attempts (FGA) in the NBA
• Data: the top 100 highest-scoring players in the NBA for the 2013-2014 season
• Predictors: number of games played (GP), defensive rebounds (DREB), assists (AST), steals (STL), turnovers (TOV), and free throws made (FTM)
• Number of blocks (BLK) was also listed as a predictor, but it is not in the sample data and is not included in any of the fitted models
SAMPLE OF THE DATA
Rank Player GP FGA DREB AST STL FTM TOV
1 Kevin Love (MIN) 15 268 146 68 13 95 41
2 Kevin Durant (OKC) 12 209 72 62 17 131 45
3 Monta Ellis (DAL) 14 235 42 76 22 85 55
4 Blake Griffin (LAC) 15 242 129 47 19 59 40
5 LeBron James (MIA) 13 201 67 88 12 71 49
6 Evan Turner (PHI) 15 272 85 53 15 71 56
7 Kevin Martin (MIN) 14 248 48 33 18 71 20
8 Paul George (IND) 13 231 72 41 23 70 33
9 LaMarcus Aldridge (POR) 14 285 105 35 19 54 34
10 Carmelo Anthony (NYK) 12 264 79 33 15 72 36
11 Kyrie Irving (CLE) 14 268 40 89 14 55 47
12 Klay Thompson (GSW) 14 212 30 22 12 30 20
13 Dirk Nowitzki (DAL) 14 206 82 33 16 74 25
14 James Harden (HOU) 12 195 45 65 21 91 52
15 Chris Paul (LAC) 15 208 65 188 36 81 44
16 Arron Afflalo (ORL) 13 197 62 61 10 56 33
17 Damian Lillard (POR) 14 225 54 85 11 64 31
18 DeMarcus Cousins (SAC) 13 230 103 31 22 65 36
POISSON-EXAMPLE WITH SAS
proc genmod data = nba;
   model FGA = GP DREB AST STL TOV FTM / dist = poisson;
run;

/* check goodness of fit: deviance = 511.6210 on 93 DF (from the GENMOD output) */
data pvalue;
   df = 93;
   chisq = 511.6210;
   pvalue = 1 - probchi(chisq, df);
run;

proc print data = pvalue noobs;
run;
/* p-value < .0001 IS significant, so the test rejects an adequate fit.
   Deviance/DF = 5.5013 >> 1 indicates major overdispersion. */
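In SAS, `probchi(x, df)` is the chi-square CDF, so `1 - probchi(chisq, df)` is the upper-tail probability of the deviance goodness-of-fit test. The same quantity can be computed without SAS, as sketched below. This is a dependency-free Python sketch, not the author's code, and the helper names are hypothetical. It computes the regularized incomplete gamma function with its standard power series (small x) and continued fraction (large x).

```python
import math


def _gamma_p_series(a, x, eps=1e-14, max_iter=10_000):
    """Lower regularized gamma P(a, x) via its power series (used when x < a + 1)."""
    term = 1.0 / a
    total = term
    n = a
    for _ in range(max_iter):
        n += 1.0
        term *= x / n
        total += term
        if abs(term) < abs(total) * eps:
            break
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))


def _gamma_q_cf(a, x, eps=1e-14, max_iter=10_000):
    """Upper regularized gamma Q(a, x) via a Lentz continued fraction (used when x >= a + 1)."""
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, max_iter + 1):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h * math.exp(-x + a * math.log(x) - math.lgamma(a))


def chisq_upper_tail(stat, df):
    """P(X > stat) for X ~ chi-square(df): the same quantity as SAS 1 - probchi(stat, df)."""
    if stat <= 0:
        return 1.0
    a, x = df / 2.0, stat / 2.0
    if x < a + 1.0:
        return 1.0 - _gamma_p_series(a, x)
    return _gamma_q_cf(a, x)


if __name__ == "__main__":
    print(chisq_upper_tail(511.6210, 93))  # Poisson deviance on 93 DF
    print(chisq_upper_tail(99.3405, 93))   # negative binomial deviance on 93 DF
```

For the Poisson deviance, the upper tail is far below 0.0001, which indicates significant lack of fit. For the negative binomial deviance reported later in the deck, it is well above 0.05.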
EXAMPLE RESULTS-GOODNESS OF FIT
The GENMOD Procedure

Model Information
Data Set                      WORK.NBA
Distribution                  Poisson
Link Function                 Log
Dependent Variable            FGA
Number of Observations Read   100
Number of Observations Used   100

Criteria For Assessing Goodness Of Fit
Criterion                  DF       Value    Value/DF
Deviance                   93    511.6210      5.5013
Scaled Deviance            93    511.6210      5.5013
Pearson Chi-Square         93    518.3345      5.5735
Scaled Pearson X2          93    518.3345      5.5735
Log Likelihood                 72301.7048
Full Log Likelihood             -604.2412
AIC (smaller is better)         1222.4824
AICC (smaller is better)        1223.6998
BIC (smaller is better)         1240.7186
RESULTS: Analysis of Maximum Likelihood Parameter Estimates
Parameter  DF  Estimate  Standard Error  Wald 95% Confidence Limits  Wald Chi-Square  Pr > ChiSq
Intercept   1    4.1864          0.0749  (4.0396, 4.3332)                    3125.02      <.0001
GP          1    0.0422          0.0057  (0.0310, 0.0534)                      54.93      <.0001
DREB        1    0.0004          0.0003  (-0.0002, 0.0010)                      1.55      0.2131
AST         1   -0.0002          0.0003  (-0.0008, 0.0005)                      0.28      0.5995
STL         1    0.0028          0.0012  (0.0004, 0.0052)                       5.17      0.0230
TOV         1    0.0057          0.0010  (0.0038, 0.0077)                      33.53      <.0001
FTM         1    0.0040          0.0004  (0.0032, 0.0048)                      98.23      <.0001
Scale       0    1.0000          0.0000  (1.0000, 1.0000)
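ASSESSMENT OF RESULTS
• Ratio Deviance/DF = 5.5013 >> 1: major overdispersion
• Deviance = 511.6210 on 93 DF: the goodness-of-fit p-value, 1 − probchi(511.6210, 93), is < .0001. Because it IS significant, we reject the hypothesis that the Poisson model fits adequately
• Every term is significant at the 0.05 level except AST and DREB
• These results may be false: under overdispersion the Poisson standard errors are too small, so p-values look more significant than they should
• A NEGATIVE BINOMIAL model should be fit instead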
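One way to see how overdispersion can produce false significance is to inflate each Poisson standard error by $\sqrt{\text{Pearson } \chi^2 / \text{DF}}$. This is the Pearson-scaled ("quasi-Poisson") adjustment. It is not used in the slides and is shown only as an illustration, not as the author's method. The Python sketch below uses the Pearson χ²/DF from the GENMOD output and the estimates and standard errors from R's `summary(poiss)` for the same Poisson model. The names `PHI`, `POISSON_FIT` and `scaled_tests` are hypothetical.

```python
import math


def two_sided_p(z):
    """Two-sided normal p-value P(|Z| > |z|)."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Pearson chi-square / DF from the Poisson GENMOD output.
PHI = 518.3345 / 93

# (estimate, standard error) pairs for the Poisson fit, as printed by R.
POISSON_FIT = {
    "GP": (0.0422013, 0.0056940),
    "DREB": (0.0003719, 0.0002987),
    "AST": (-0.0001778, 0.0003387),
    "STL": (0.0027777, 0.0012221),
    "TOV": (0.0057220, 0.0009882),
    "FTM": (0.0040405, 0.0004077),
}


def scaled_tests(fit, phi):
    """Return {name: (naive z, scaled z, scaled p)} with SEs inflated by sqrt(phi)."""
    out = {}
    for name, (est, se) in fit.items():
        z_naive = est / se
        z_scaled = est / (se * math.sqrt(phi))
        out[name] = (z_naive, z_scaled, two_sided_p(z_scaled))
    return out


if __name__ == "__main__":
    for name, (zn, zs, p) in scaled_tests(POISSON_FIT, PHI).items():
        print(f"{name:5s} z = {zn:7.3f} -> scaled z = {zs:6.3f}, p = {p:.4f}")
```

With these inputs, GP, TOV and FTM stay significant, while STL (scaled z ≈ 0.96) does not. This matches the negative binomial conclusion.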
POISSON-EXAMPLE WITH R
nba <- read.csv("F:/STATS544/nba.csv",header=TRUE)
poiss<-glm(FGA ~GP+DREB+AST+STL+TOV+FTM, family = "poisson", data = nba)
summary(poiss)
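R's `glm()` fits this model by iteratively reweighted least squares (IRLS, equivalent to Fisher scoring for the canonical log link). The Python sketch below shows the core of that algorithm from scratch. The names `solve` and `fit_poisson_irls` are hypothetical, and the sketch skips the safeguards R adds, such as step halving and rank checks. It is not R's implementation.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (small systems only)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_poisson_irls(X, y, max_iter=50, tol=1e-10):
    """Log-link Poisson regression by IRLS / Fisher scoring.

    X is a list of rows that already include a leading 1.0 for the intercept;
    y must have a positive mean.
    """
    p = len(X[0])
    beta = [math.log(sum(y) / len(y))] + [0.0] * (p - 1)  # intercept-only start
    for _ in range(max_iter):
        eta = [sum(b * x for b, x in zip(beta, row)) for row in X]
        mu = [math.exp(e) for e in eta]
        # Working response z and weights w = mu for the canonical log link.
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        xtwx = [[sum(m * row[i] * row[j] for m, row in zip(mu, X)) for j in range(p)]
                for i in range(p)]
        xtwz = [sum(m * row[i] * zi for m, row, zi in zip(mu, X, z)) for i in range(p)]
        new_beta = solve(xtwx, xtwz)
        converged = max(abs(nb - b) for nb, b in zip(new_beta, beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta


if __name__ == "__main__":
    # Tiny made-up data set (not the NBA data): one predictor plus an intercept.
    X = [[1.0, float(x)] for x in range(10)]
    y = [1, 0, 2, 3, 2, 5, 4, 7, 6, 9]
    print(fit_poisson_irls(X, y))
```

At convergence, the score equations $\sum_i (y_i - \hat{\mu}_i) x_{ij} = 0$ hold for every column. That property is a quick way to check any Poisson fit.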
R-GOODNESS OF FIT
Deviance Residuals:
Min 1Q Median 3Q Max
-5.5397 -1.2614 -0.1643 1.2650 6.2786
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 926.60 on 99 degrees of freedom
Residual deviance: 511.62 on 93 degrees of freedom
AIC: 1222.5
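The "Residual deviance" and "Null deviance" lines are both Poisson deviances, $D = 2\sum_i \left[y_i \log(y_i/\hat{\mu}_i) - (y_i - \hat{\mu}_i)\right]$. The null deviance uses $\hat{\mu}_i = \bar{y}$ for every row. The following is a minimal Python sketch of that formula. The function names are hypothetical, and the $y \log(y/\mu)$ term is taken as 0 when $y = 0$.

```python
import math


def poisson_deviance(y, mu):
    """Poisson deviance 2 * sum[y*log(y/mu) - (y - mu)], with y*log(y/mu) = 0 when y = 0."""
    total = 0.0
    for yi, mi in zip(y, mu):
        term = yi * math.log(yi / mi) if yi > 0 else 0.0
        total += term - (yi - mi)
    return 2.0 * total


def null_deviance(y):
    """Deviance of the intercept-only model, whose fitted mean is mean(y) for every row."""
    ybar = sum(y) / len(y)
    return poisson_deviance(y, [ybar] * len(y))


if __name__ == "__main__":
    # Residual deviance / residual DF, as printed by R for the Poisson fit.
    print(511.62 / 93)  # about 5.50, far above 1
```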
R-ANALYSIS OF PARAMETER ESTIMATES
Call:
glm(formula = FGA ~ GP + DREB + AST + STL + TOV + FTM, family = "poisson",
data = nba)
Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept)  4.1864100  0.0748885  55.902  < 2e-16 ***
GP           0.0422013  0.0056940   7.411 1.25e-13 ***
DREB         0.0003719  0.0002987   1.245    0.213
AST         -0.0001778  0.0003387  -0.525    0.600
STL          0.0027777  0.0012221   2.273    0.023 *
TOV          0.0057220  0.0009882   5.790 7.02e-09 ***
FTM          0.0040405  0.0004077   9.911  < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
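The R and SAS tables report the same Wald test in two forms. R's z value is Estimate / Std. Error. SAS's Wald Chi-Square is z², and its 95% limits are Estimate ± 1.96 × SE. The Python sketch below uses the hypothetical helper name `wald_summary` and reproduces those columns from the estimate and standard error alone.

```python
import math


def wald_summary(estimate, std_error, z_crit=1.959964):
    """z value, Wald chi-square (= z**2), two-sided p-value and 95% Wald limits."""
    z = estimate / std_error
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # P(|Z| > |z|) for standard normal Z
    limits = (estimate - z_crit * std_error, estimate + z_crit * std_error)
    return z, z * z, p_value, limits


if __name__ == "__main__":
    # Estimate and Std. Error for GP and DREB, copied from the R table.
    for name, est, se in [("GP", 0.0422013, 0.0056940), ("DREB", 0.0003719, 0.0002987)]:
        z, chisq, p, (low, high) = wald_summary(est, se)
        print(f"{name}: z = {z:.3f}, chi-square = {chisq:.2f}, p = {p:.4f}, "
              f"CI = ({low:.4f}, {high:.4f})")
```

For GP, this reproduces R's z = 7.411 and SAS's chi-square of 54.93 with limits (0.0310, 0.0534). For DREB, it reproduces the p-value of 0.2131.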
POISSON WITH SPSS & MINITAB
SPSS
genlin FGA with GP DREB AST STL TOV FTM
  /model GP DREB AST STL TOV FTM INTERCEPT=YES DISTRIBUTION=POISSON LINK=LOG
  /print FIT SUMMARY SOLUTION.
MINITAB
Stat > Regression > Poisson Regression > Fit Poisson Model.
Detecting overdispersion with SAS
In the Poisson regression, the ratio Deviance/DF is well above 1.
proc genmod data = nba;
   model FGA = GP DREB AST STL TOV FTM / dist = poisson;
run;
PROC MEANS: the variance of FGA (Y) is much higher than its mean.
proc means data = nba n mean var min max;
   var FGA;
run;
Detecting overdispersion with R
In the Poisson regression, the ratio Residual Deviance/DF is well above 1.
poiss <- glm(FGA ~ GP + DREB + AST + STL + TOV + FTM, family = "poisson", data = nba)
summary(poiss)
The variance of FGA (Y) is much higher than its mean:
mean(nba$FGA)
[1] 173.47
var(nba$FGA)
[1] 1684.858
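The mean/variance comparison reduces to a single number, the variance-to-mean ratio. It is about 1 for Poisson-like counts. The following is a minimal Python sketch with hypothetical helper names, using the standard `statistics` module. One caveat: this ratio uses the marginal variance of Y, which also contains variation explained by the predictors. It is therefore only a rough screen. The Deviance/DF ratio from the fitted model is the check on the conditional variance.

```python
from statistics import mean, variance


def dispersion_ratio(values):
    """Sample variance / sample mean of a count variable (about 1 for Poisson-like data)."""
    return variance(values) / mean(values)


def ratio_from_summary(sample_mean, sample_var):
    """The same ratio when only the printed mean and variance are available."""
    return sample_var / sample_mean


if __name__ == "__main__":
    # mean(nba$FGA) and var(nba$FGA) as printed by R.
    print(ratio_from_summary(173.47, 1684.858))  # roughly 9.7
```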
NEGATIVE BINOMIAL REGRESSION
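Generalization of Poisson regression
Used for overdispersed count data
PMF:
$$P(Y = y \mid X_1, X_2, X_3) = \frac{\Gamma(y + 1/k)}{\Gamma(1/k)\, y!} \left(\frac{1}{1 + k\mu}\right)^{1/k} \left(\frac{k\mu}{1 + k\mu}\right)^{y}, \qquad y = 0, 1, 2, \ldots$$
$E(Y) = \mu$, $V(Y) = \mu + k\mu^2$, where k is the dispersion parameter. As $k \to 0$, $V(Y) \to \mu$ and the NB approaches the Poisson, with $V(Y) = E(Y) = \mu$
Link function same as Poisson: $g(\mu) = \log(\mu)$
Equation: $\log(\mu(X)) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_{p-1} X_{p-1}$
Goodness-of-fit test: same as Poisson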
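A quick numerical check of the PMF above is sketched below in Python. The helper names are hypothetical, and the values µ = 5 and k = 0.5 are illustrative, not estimates from the NBA data. The sketch confirms that the probabilities sum to 1, that the mean is µ and the variance is µ + kµ², and that a tiny k gives essentially Poisson probabilities.

```python
import math


def nb_pmf(y, mu, k):
    """NB P(Y = y) with mean mu and dispersion k > 0, so that V(Y) = mu + k * mu**2."""
    r = 1.0 / k
    log_p = (
        math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
        - r * math.log1p(k * mu)                       # (1/k) * log(1 / (1 + k*mu))
        + y * (math.log(k * mu) - math.log1p(k * mu))  # y * log(k*mu / (1 + k*mu))
    )
    return math.exp(log_p)


def moments(pmf, upper):
    """Total probability, mean and variance of a PMF summed over y = 0..upper."""
    probs = [pmf(y) for y in range(upper + 1)]
    total = sum(probs)
    m = sum(y * p for y, p in enumerate(probs))
    v = sum((y - m) ** 2 * p for y, p in enumerate(probs))
    return total, m, v


if __name__ == "__main__":
    mu, k = 5.0, 0.5  # illustrative values only
    print(moments(lambda y: nb_pmf(y, mu, k), 2000))  # variance near 5 + 0.5 * 25 = 17.5
```

Working in log space with `math.lgamma` and `math.log1p` keeps the computation stable when 1/k is very large, which is exactly the near-Poisson case.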
NEGATIVE BINOMIAL-EXAMPLE WITH SAS
proc genmod data = nba;
   model FGA = GP DREB AST STL TOV FTM / dist = negbin;   /* only difference from the Poisson fit */
run;

/* check goodness of fit: deviance = 99.3405 on 93 DF */
data pvalue;
   df = 93;
   chisq = 99.3405;
   pvalue = 1 - probchi(chisq, df);
run;

proc print data = pvalue noobs;
run;
/* p-value is well above 0.05 (NOT significant): no evidence of lack of fit */
EXAMPLE RESULTS-GOODNESS OF FIT

Model Information
Data Set                      WORK.NBA
Distribution                  Negative Binomial
Link Function                 Log
Dependent Variable            FGA
Number of Observations Read   100
Number of Observations Used   100

Criteria For Assessing Goodness Of Fit
Criterion                  DF       Value    Value/DF
Deviance                   93     99.3405      1.0682
Scaled Deviance            93     99.3405      1.0682
Pearson Chi-Square         93    100.7383      1.0832
Scaled Pearson X2          93    100.7383      1.0832
Log Likelihood                 72428.1189
Full Log Likelihood             -477.8271
AIC (smaller is better)          971.6543
AICC (smaller is better)         973.2367
BIC (smaller is better)          992.4957
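The information criteria in both GENMOD outputs follow from the Full Log Likelihood. The Python sketch below uses a hypothetical helper name and assumes 7 parameters for the Poisson model (intercept plus 6 slopes) and 8 for the negative binomial (the same plus the dispersion k). Under those assumptions, the standard AIC, AICC and BIC formulas reproduce the printed values up to rounding. Because the Poisson model is the NB model with k fixed at 0, the large drop in AIC and BIC is another sign that the extra dispersion parameter is worth its cost.

```python
import math


def info_criteria(full_loglik, n_params, n_obs):
    """AIC, AICC (small-sample corrected) and BIC from a full log likelihood."""
    aic = -2.0 * full_loglik + 2.0 * n_params
    aicc = aic + 2.0 * n_params * (n_params + 1) / (n_obs - n_params - 1)
    bic = -2.0 * full_loglik + n_params * math.log(n_obs)
    return aic, aicc, bic


if __name__ == "__main__":
    print("Poisson:", info_criteria(-604.2412, 7, 100))  # intercept + 6 slopes
    print("NegBin: ", info_criteria(-477.8271, 8, 100))  # ... plus dispersion k
```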
RESULTS: Analysis of Maximum Likelihood Parameter Estimates
Parameter   DF  Estimate  Standard Error  Wald 95% Confidence Limits  Wald Chi-Square  Pr > ChiSq
Intercept    1    4.1742          0.1641  (3.8525, 4.4958)                     647.01      <.0001
GP           1    0.0426          0.0125  (0.0181, 0.0671)                      11.62      0.0007
DREB         1    0.0003          0.0007  (-0.0011, 0.0016)                      0.15      0.7028
AST          1   -0.0001          0.0008  (-0.0017, 0.0014)                      0.03      0.8619
STL          1    0.0024          0.0027  (-0.0029, 0.0077)                      0.78      0.3756
TOV          1    0.0060          0.0023  (0.0015, 0.0105)                       6.95      0.0084
FTM          1    0.0042          0.0010  (0.0023, 0.0061)                      19.32      <.0001
Dispersion   1    0.0230          0.0040  (0.0163, 0.0325)
Assessment of Results
• Ratio Deviance/DF = 1.0682 ≈ 1 (the overdispersion is now accounted for)
• Deviance = 99.3405 on 93 DF: the goodness-of-fit p-value, 1 − probchi(99.3405, 93), is well above 0.05. Because it is NOT significant, there is no evidence of lack of fit, so the model now fits well
• The "Analysis of Maximum Likelihood Parameter Estimates" has an extra parameter called "Dispersion" (aka ALPHA, the k in V(Y) = µ + kµ²). It accounts for the overdispersion found in the Poisson regression
• The dispersion estimate is 0.0230, with a 95% Wald confidence interval of (0.0163, 0.0325). Since this interval excludes 0, the dispersion is significantly different from 0, which supports the negative binomial model as more appropriate
• GP, TOV, and FTM are the only significant predictors; STL, which was significant in the Poisson fit, no longer is
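The slides test the dispersion through its confidence interval. A common alternative, not used in the slides, is a likelihood-ratio test of the Poisson model (k = 0) against the negative binomial, built from the two Full Log Likelihood values in the GENMOD outputs. Because k = 0 lies on the boundary of the parameter space, the χ²(1) p-value is often halved. Treat that convention as an assumption of this sketch. The helper names are hypothetical.

```python
import math


def chisq1_upper_tail(stat):
    """P(X > stat) for X ~ chi-square with 1 DF, via P(|Z| > sqrt(stat))."""
    return math.erfc(math.sqrt(stat / 2.0))


def lr_test_overdispersion(loglik_poisson, loglik_negbin):
    """LR statistic and boundary-adjusted p-value for H0: k = 0 (Poisson) vs k > 0 (NB)."""
    lr = 2.0 * (loglik_negbin - loglik_poisson)
    # k = 0 is on the boundary, so the naive chi-square(1) p-value is halved.
    return lr, 0.5 * chisq1_upper_tail(lr)


if __name__ == "__main__":
    # Full log likelihoods from the Poisson and negative binomial GENMOD outputs.
    print(lr_test_overdispersion(-604.2412, -477.8271))
```

The statistic is about 252.8, so the p-value is vanishingly small. This agrees with the interval-based conclusion that the dispersion differs from 0.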
NEGATIVE BINOMIAL-EXAMPLE WITH R
nba <- read.csv("F:/STATS544/nba.csv", header = TRUE)
install.packages("MASS")   # only needed once
library(MASS)              # provides glm.nb()
nb <- glm.nb(FGA ~ GP + DREB + AST + STL + TOV + FTM, data = nba)
summary(nb)
EXAMPLE RESULTS-GOODNESS OF FIT
(Dispersion parameter for Negative Binomial(43.4291) family taken to be 1)
Null deviance: 182.54 on 99 degrees of freedom
Residual deviance: 99.34 on 93 degrees of freedom
AIC: 971.65
Number of Fisher Scoring iterations: 1
Deviance Residuals:
Min 1Q Median 3Q Max
-2.36322 -0.60467 -0.06083 0.55227 2.72053
Theta: 43.43
Std. Err.: 7.62
2 x log-likelihood: -955.654
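R and SAS report the same dispersion on different scales. `glm.nb` reports θ, with V(Y) = µ + µ²/θ, while GENMOD reports k, with V(Y) = µ + kµ², so k = 1/θ. The Python sketch below converts between them. The helper names are hypothetical, and the standard error uses the delta-method approximation SE(k) ≈ SE(θ)/θ², which is an assumption of this sketch rather than something either program prints.

```python
# R's glm.nb reports theta:      V(Y) = mu + mu**2 / theta
# SAS GENMOD reports dispersion: V(Y) = mu + k * mu**2, so k = 1 / theta
THETA_R = 43.4291  # "Theta" from summary(nb)
THETA_SE_R = 7.62  # "Std. Err." from summary(nb)


def theta_to_k(theta, theta_se):
    """Convert theta and its SE to k = 1/theta, with delta-method SE(theta) / theta**2."""
    return 1.0 / theta, theta_se / theta ** 2


def k_interval_to_theta(k_low, k_high):
    """Map an interval for k onto theta = 1/k; the endpoints swap because 1/k is decreasing."""
    return 1.0 / k_high, 1.0 / k_low


if __name__ == "__main__":
    print(theta_to_k(THETA_R, THETA_SE_R))      # close to SAS: 0.0230 and 0.0040
    print(k_interval_to_theta(0.0163, 0.0325))  # SAS interval for k, on the theta scale
```

Both the point estimate and the standard error line up with the SAS Dispersion row (0.0230, 0.0040), which confirms that the two programs fit the same model.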
RESULTS: Analysis of Maximum Likelihood Parameter Estimates
Call:
glm.nb(formula = FGA ~ GP + DREB + AST + STL + TOV + FTM, data = nba,
init.theta = 43.42912732, link = log)
Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept)  4.1741833  0.1626544  25.663  < 2e-16 ***
GP           0.0425988  0.0123895   3.438 0.000585 ***
DREB         0.0002619  0.0006835   0.383 0.701571
AST         -0.000139   0.0007904  -0.176 0.860433
STL          0.0023962  0.0027055   0.886 0.375794
TOV          0.0060360  0.0022760   2.652 0.008001 **
FTM          0.0042121  0.0009430   4.467 7.95e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
INTERPRETATION OF SIGNIFICANT COEFFICIENTS
GP: Holding all other variables constant, each additional game played increases the log of the expected number of field goal attempts by 0.0426. Equivalently, each additional game played multiplies the expected number of field goal attempts by exp(0.0426) ≈ 1.0435, a 4.35% increase.
TOV: Holding all other variables constant, each additional turnover increases the log of the expected number of field goal attempts by 0.0060. Equivalently, each additional turnover increases the expected number of field goal attempts by (exp(0.0060) − 1) × 100% ≈ 0.60%.
FTM: Holding all other variables constant, each additional free throw made increases the log of the expected number of field goal attempts by 0.0042. Equivalently, each additional free throw made increases the expected number of field goal attempts by (exp(0.0042) − 1) × 100% ≈ 0.42%.
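The interpretations above give point estimates only. A confidence interval for each multiplicative effect can be obtained by exponentiating the Wald limits on the log scale. The Python sketch below takes the estimates and standard errors from `summary(nb)`. The names `NB_FIT` and `rate_ratio_ci` are hypothetical, and the intervals are Wald-type, like the SAS limits, not profile-likelihood intervals.

```python
import math

Z_95 = 1.959964  # standard normal 97.5% quantile

# (estimate, standard error) for the significant predictors, from summary(nb).
NB_FIT = {
    "GP": (0.0425988, 0.0123895),
    "TOV": (0.0060360, 0.0022760),
    "FTM": (0.0042121, 0.0009430),
}


def rate_ratio_ci(estimate, std_error, z=Z_95):
    """exp(estimate) with a Wald interval built on the log scale and then exponentiated."""
    return (
        math.exp(estimate),
        math.exp(estimate - z * std_error),
        math.exp(estimate + z * std_error),
    )


if __name__ == "__main__":
    for name, (est, se) in NB_FIT.items():
        rr, low, high = rate_ratio_ci(est, se)
        print(f"{name}: ratio {rr:.4f}, 95% CI ({low:.4f}, {high:.4f})")
```

All three lower limits stay above 1, which is consistent with these predictors being significant.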
NEGATIVE BINOMIAL WITH SPSS & MINITAB
SPSS
genlin FGA with GP DREB AST STL TOV FTM
  /model GP DREB AST STL TOV FTM INTERCEPT=YES DISTRIBUTION=NEGBIN(MLE) LINK=LOG
  /print FIT SUMMARY SOLUTION.
MINITAB
NA
SUMMARY
• Use Poisson regression when dealing with COUNT data
• If there is overdispersion, switch to negative binomial regression
• The assumptions for Poisson and NB are the same, except that NB drops the requirement that the conditional mean and variance be equal
• Coefficients in both models are interpreted in the same manner
• Both regressions can be performed in SAS, R, and SPSS