Section Count Data Models


Page 1: Section

Section

Count Data Models

Page 2: Section

Introduction

• Many outcomes of interest are integer counts
  – Doctor visits
  – Lost work days
  – Cigarettes smoked per day
  – Missed school days

• OLS models can easily handle some integer-valued outcomes

Page 3: Section

• Example
  – SAT scores are essentially integer values
  – Few observations at the ‘tails’
  – Distribution is fairly continuous
  – OLS models these well

• In contrast, suppose the outcome has
  – A high fraction of zeros
  – Small positive values

Page 4: Section

• OLS models will
  – Predict negative values
  – Do a poor job of predicting the mass of observations at zero

• Example
  – Dr. visits in past year, Medicare patients (65+)
  – 1987 National Medical Expenditure Survey
  – Top code (for now) at 10
  – 17% have no visits
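The point about negative predictions can be illustrated with a minimal sketch. The six data points below are hypothetical (they are not from the survey); they simply mimic an outcome with many zeros and a few small positive counts, fitted by closed-form simple OLS.

```python
# Hypothetical toy data: many zeros and a few small positive counts.
x = [0, 1, 2, 3, 4, 5]
y = [0, 0, 0, 1, 3, 8]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Closed-form simple OLS: slope = Sxy / Sxx, intercept = y_bar - slope * x_bar.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

# Fitted values at the observed x; the first ones fall below zero,
# which is impossible for a count.
fitted = [intercept + slope * xi for xi in x]
print([round(f, 2) for f in fitted])
```

In this toy sample the fitted line is negative at the low end of x, even though every observed count is zero or positive.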

Page 5: Section

     visits |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        915       17.18       17.18
          1 |        601       11.28       28.46
          2 |        533       10.01       38.46
          3 |        503        9.44       47.91
          4 |        450        8.45       56.35
          5 |        391        7.34       63.69
          6 |        319        5.99       69.68
          7 |        258        4.84       74.53
          8 |        216        4.05       78.58
          9 |        192        3.60       82.19
         10 |        949       17.81      100.00
------------+-----------------------------------
      Total |      5,327      100.00
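As a quick check on the tabulation, the sketch below recomputes the share of zeros, the share at the top code, and the mean and variance of the top-coded variable from the listed frequencies. Because visits are capped at 10 here, these moments describe the top-coded variable only and need not match moments of the uncensored counts.

```python
# Frequencies copied from the tabulation (visits top-coded at 10).
freq = {0: 915, 1: 601, 2: 533, 3: 503, 4: 450, 5: 391,
        6: 319, 7: 258, 8: 216, 9: 192, 10: 949}

n = sum(freq.values())
share_zero = freq[0] / n    # fraction of patients with no visits
share_top = freq[10] / n    # fraction at the top code

# Mean and sample variance of the top-coded variable.
mean_topcoded = sum(v * c for v, c in freq.items()) / n
var_topcoded = sum(c * (v - mean_topcoded) ** 2 for v, c in freq.items()) / (n - 1)

print(n, round(100 * share_zero, 2), round(100 * share_top, 2))
print(round(mean_topcoded, 2), round(var_topcoded, 2))
```

Even after top-coding, the variance exceeds the mean, and large masses sit at both 0 and 10.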

Page 6: Section

Poisson Model

• yi is drawn from a Poisson distribution

• Poisson parameter varies across observations

• $f(y_i; \lambda_i) = \dfrac{e^{-\lambda_i}\,\lambda_i^{y_i}}{y_i!}$, for $\lambda_i > 0$

• E[yi]= Var[yi] = λi = f(xi, β)
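A small numerical sketch of the Poisson probability function, checking that the probabilities sum to one and that the mean equals the variance. The value λ = 4 and the truncation of the support at 200 are illustrative assumptions, not values from the slides.

```python
import math

def poisson_pmf(y: int, lam: float) -> float:
    """Poisson probability f(y; lam) = exp(-lam) * lam**y / y!, for lam > 0."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    # Work in logs to avoid overflow in lam**y and y! for large y.
    return math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))

lam = 4.0                 # illustrative value
support = range(0, 200)   # truncated support; the omitted tail is negligible here
probs = [poisson_pmf(y, lam) for y in support]

total = sum(probs)
mean = sum(y * p for y, p in zip(support, probs))
var = sum((y - mean) ** 2 * p for y, p in zip(support, probs))
print(round(total, 6), round(mean, 6), round(var, 6))
```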

Page 7: Section

• λi must be positive at all times

• Therefore, we CANNOT let λi = xiβ

• Let λi = exp(xiβ)

• ln(λi) = (xiβ)

Page 8: Section

• d ln(λi)/dxi = β

• Remember that d ln(λi) = dλi/λi

• Interpret β as the proportional change in the mean outcome for a one-unit change in x (100·β is approximately the percentage change)
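The interpretation can be written out as a short derivation. This is a sketch for a single scalar regressor; Δx_i denotes a hypothetical change in x_i, and the approximation in the second line holds for small β Δx_i.

```latex
\begin{align*}
\lambda_i &= \exp(x_i \beta) \\
\frac{\partial \ln \lambda_i}{\partial x_i} &= \beta
  \quad\Longrightarrow\quad
  \frac{\Delta \lambda_i}{\lambda_i} \approx \beta\,\Delta x_i
  \quad \text{(small changes)} \\
\frac{\lambda_i(x_i + \Delta x_i) - \lambda_i(x_i)}{\lambda_i(x_i)}
  &= \exp(\beta\,\Delta x_i) - 1
  \quad \text{(exact)}
\end{align*}
```

For example, with β = 0.10 and Δx_i = 1, the approximation gives 10% while the exact change is exp(0.10) − 1, about 10.5%.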

Page 9: Section

Problems with Poisson

• Variance grows with the mean
  – E[yi] = Var[yi] = λi = f(xi, β)

• Most data sets exhibit overdispersion, where the variance exceeds the mean

• In the dr. visits sample, mean = 5.6, s = 6.7

• Imposing Mean = Var is a severe restriction, and it tends to understate standard errors
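The degree of overdispersion in the visits sample follows from simple arithmetic on the two reported numbers. The sketch assumes that 5.6 is the sample mean and that s is the sample standard deviation.

```python
mean = 5.6   # sample mean reported on the slide (assumed interpretation)
s = 6.7      # assumed to be the sample standard deviation

variance = s ** 2
dispersion_ratio = variance / mean   # equals 1 under a Poisson model

print(round(variance, 2), round(dispersion_ratio, 2))
```

A ratio this far above 1 is hard to reconcile with the Poisson restriction that the mean equals the variance.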

Page 10: Section

Negative Binomial Model

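• $\Pr(Y_i = y_i) = \dfrac{\Gamma(y_i + \gamma_i)}{\Gamma(y_i + 1)\,\Gamma(\gamma_i)} \left(\dfrac{1}{1+\delta}\right)^{\gamma_i} \left(\dfrac{\delta}{1+\delta}\right)^{y_i}$

• Where γi = exp(xiβ) and δ ≥ 0

• E[yi] = δγi = δexp(xiβ)

• Var[yi] = δ (1+δ) γi

• Var[yi]/ E[yi] = (1+δ)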
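The moments listed for the negative binomial can be checked numerically. The sketch below assumes the probability function Pr(Y = y) = Γ(y + γ) / [Γ(y + 1) Γ(γ)] · (1/(1+δ))^γ · (δ/(1+δ))^y, which is the form consistent with E[y] = δγ and Var[y] = δ(1+δ)γ. The parameter values and the truncated support are illustrative assumptions.

```python
import math

def negbin_pmf(y: int, gamma_i: float, delta: float) -> float:
    """Negative binomial probability with E[y] = delta*gamma_i and
    Var[y] = delta*(1 + delta)*gamma_i."""
    if gamma_i <= 0 or delta <= 0:
        raise ValueError("gamma_i and delta must be positive")
    # Log of the probability, using lgamma for the gamma-function ratio.
    log_p = (math.lgamma(y + gamma_i) - math.lgamma(y + 1) - math.lgamma(gamma_i)
             + gamma_i * math.log(1.0 / (1.0 + delta))
             + y * math.log(delta / (1.0 + delta)))
    return math.exp(log_p)

gamma_i, delta = 2.0, 1.5    # illustrative values
support = range(0, 2000)     # truncated support; the omitted tail is negligible here
probs = [negbin_pmf(y, gamma_i, delta) for y in support]

total = sum(probs)
mean = sum(y * p for y, p in zip(support, probs))
var = sum((y - mean) ** 2 * p for y, p in zip(support, probs))
print(round(total, 6), round(mean, 6), round(var, 6), round(var / mean, 6))
```

With these values the mean should come out at δγ = 3 and the variance-to-mean ratio at 1 + δ = 2.5.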

Page 11: Section

• δ must always be ≥ 0
• In this case, the variance grows faster than the mean
• If δ=0, the model collapses into the Poisson
• Always estimate negative binomial
• If you cannot reject the null that δ=0, report the Poisson estimates

Page 12: Section

• Notice that ln(E[yi]) = ln(δ) + ln(γi), so

• d ln(E[yi]) /dxi = β

• Parameters have the same interpretation as in the Poisson model

Page 13: Section

In STATA

• poisson estimates a Poisson model by maximum likelihood
  – Syntax: poisson y independent variables

• nbreg estimates a negative binomial model by maximum likelihood
  – Syntax: nbreg y independent variables
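To make concrete what maximum likelihood estimation of the Poisson model involves, here is a minimal Python sketch that fits ln(λi) = b0 + b1·xi by Newton-Raphson. It is not Stata's implementation; the function name `poisson_newton` and the eight data points are hypothetical. With a binary regressor the MLE has a closed form (b0 is the log of the mean of y when x = 0, and b1 is the log of the ratio of group means), which gives a simple check.

```python
import math

def poisson_newton(x, y, tol=1e-10, max_iter=50):
    """Fit ln(lambda_i) = b0 + b1 * x_i by Newton-Raphson on the Poisson log-likelihood."""
    b0 = math.log(sum(y) / len(y))   # start at the intercept-only MLE
    b1 = 0.0
    for _ in range(max_iter):
        lam = [math.exp(b0 + b1 * xi) for xi in x]
        # Score (gradient of the log-likelihood): sum of (y_i - lambda_i) * (1, x_i).
        g0 = sum(yi - li for yi, li in zip(y, lam))
        g1 = sum((yi - li) * xi for xi, yi, li in zip(x, y, lam))
        # Information matrix: sum of lambda_i * (1, x_i)(1, x_i)'.
        a00 = sum(lam)
        a01 = sum(li * xi for xi, li in zip(x, lam))
        a11 = sum(li * xi * xi for xi, li in zip(x, lam))
        det = a00 * a11 - a01 * a01
        # Newton step: solve the 2x2 system A d = g.
        d0 = (a11 * g0 - a01 * g1) / det
        d1 = (a00 * g1 - a01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

# Hypothetical data: group means are 1 (x = 0) and 3 (x = 1).
x = [0, 0, 0, 0, 1, 1, 1, 1]
y = [1, 0, 2, 1, 3, 4, 2, 3]
b0, b1 = poisson_newton(x, y)
print(round(b0, 6), round(b1, 6), round(math.log(3), 6))
```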

Page 14: Section

Interpret results for Poisson

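• Those with a CHRONIC condition have about 50% more mean MD visits (coefficient 0.500, read as 100·β percent)

• Those in EXCELlent health have 78% fewer MD visits by the same approximation (coefficient −0.784; the exact change, exp(−0.784) − 1, is about 54% fewer)

• BLACKS have about 33% fewer visits than whites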

• Income elasticity is 0.021: a 10% increase in income generates a 0.21% increase in visits
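Reading a coefficient directly as a percentage change is an approximation that works well only for small coefficients. The sketch below computes the exact change, exp(β) − 1, for two of the Poisson coefficients reported in the results table of this section, along with the effect of a 10% rise in income implied by an elasticity of 0.021.

```python
import math

# Poisson coefficients from the results table in this section.
coef = {"Chronic": 0.500, "Excel": -0.784}

# Exact percent change in mean visits when a dummy variable switches from 0 to 1.
exact_pct = {name: 100.0 * (math.exp(b) - 1.0) for name, b in coef.items()}

# Ln(Inc) enters in logs, so its coefficient is an elasticity:
# a 10% rise in income scales mean visits by 1.10 ** 0.021.
income_elasticity = 0.021
income_pct = 100.0 * (1.10 ** income_elasticity - 1.0)

for name, pct in exact_pct.items():
    print(f"{name}: approx {100 * coef[name]:.1f}%, exact {pct:.1f}%")
print(f"10% more income: {income_pct:.2f}% more visits")
```

The approximation and the exact figure are close for the income effect but diverge noticeably for the larger dummy-variable coefficients.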

Page 15: Section

Negative Binomial

• Interpret results the same way as Poisson
• Look at the coefficient/standard error on delta
• Ho: delta = 0 (Poisson model is correct)
• In this case, delta = 5.21 and its standard error is 0.15, so we easily reject the null
• Var/Mean = 1 + delta = 6.21, so the Poisson is misspecified; expect standard errors that are too small in the wrong model
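The test on delta amounts to a ratio of the estimate to its standard error. The sketch below uses the two reported numbers; comparing the ratio with 1.96 (the usual two-sided 5% normal critical value) is an assumption about the intended test, since the slides only say the null is easily rejected.

```python
delta_hat = 5.21   # estimate reported on the slide
se_delta = 0.15    # its standard error

z = delta_hat / se_delta       # ratio of coefficient to standard error
var_to_mean = 1 + delta_hat    # implied Var/Mean under the negative binomial

critical = 1.96   # assumed two-sided 5% normal critical value
print(round(z, 1), round(var_to_mean, 2), z > critical)
```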

Page 16: Section

Selected Results, Count Models
Parameter (Standard Error)

Variable    Poisson           Negative Binomial
Age65        0.214 (0.026)     0.103 (0.055)
Age70        0.787 (0.026)     0.204 (0.054)
Chronic      0.500 (0.014)     0.509 (0.029)
Excel       -0.784 (0.031)    -0.527 (0.059)
Ln(Inc)      0.021 (0.007)     0.038 (0.016)
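One way to see how much the Poisson model understates uncertainty is to compare standard errors across the two columns. The sketch below copies the reported (coefficient, standard error) pairs and computes the ratio of negative binomial to Poisson standard errors for each variable.

```python
# (coefficient, standard error) pairs copied from the results table.
poisson = {"Age65": (0.214, 0.026), "Age70": (0.787, 0.026),
           "Chronic": (0.500, 0.014), "Excel": (-0.784, 0.031),
           "Ln(Inc)": (0.021, 0.007)}
negbin = {"Age65": (0.103, 0.055), "Age70": (0.204, 0.054),
          "Chronic": (0.509, 0.029), "Excel": (-0.527, 0.059),
          "Ln(Inc)": (0.038, 0.016)}

# Ratio of negative binomial to Poisson standard errors, variable by variable.
se_ratio = {name: negbin[name][1] / poisson[name][1] for name in poisson}

for name, ratio in se_ratio.items():
    print(f"{name:8s} NB SE / Poisson SE = {ratio:.2f}")
```

For every variable listed, the negative binomial standard error is roughly twice the Poisson one.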