Modeling Big Count Data: An IRLS Framework for COM-Poisson Regression and GAM

Modeling Big Count Data An IRLS framework for COM-Poisson regression and GAM Suneel Chatla Galit Shmueli November 12, 2016 Institute of Service Science National Tsing Hua University, Taiwan (R.O.C)


Table of contents

1. Speed Dating Experiment: Count Data Models
2. Motivation
3. An IRLS Framework
4. Simulation Study: Comparison of IRLS with MLE
5. A CMP Generalized Additive Model
6. Results & Conclusions

Speed Dating Experiment: Count Data Models

Speed dating experiment

Fisman et al. (2006) conducted a speed dating experiment to evaluate gender differences in mate selection [1].

Total sessions: 14
Decision: 1 or 0
Attractiveness: 1-10
Intelligence: 1-10
Ambition: 1-10
......
Control variables

[1] https://www.kaggle.com/annavictoria/speed-dating-experiment

Outcome/Count variables

Matches: when both persons decide Yes
Tot.Yes: total number of Yes decisions for each subject in a particular session

Summary Statistics

Statistic        N     Mean     St. Dev.   Min      Max
matches          531   2.524    2.304      0        14
Tot.Yes          531   6.433    4.361      0        21
Tot.partner      531   15.311   4.967      5        22
age              531   26.303   3.735      18       55
perc.samerace    531   0.391    0.242      0.000    0.833
avg.intcor       531   0.190    0.167      −0.298   0.569
attr             531   6.195    1.122      1.818    10.000
sinc             531   7.205    1.108      2.773    10.000
intel            531   7.381    0.988      3.409    10.000
func             531   6.438    1.103      2.682    10.000
amb              531   6.812    1.133      3.091    10.000
shar             531   5.511    1.333      1.409    10.000
like             531   6.157    1.072      1.682    10.000
prob             531   5.234    1.525      0.778    10.000
mean.agep        531   26.314   1.674      20.444   31.667
attr_o           531   6.200    1.186      2.333    8.688
sinc_o           531   7.224    0.690      4.167    9.000
intel_o          531   7.410    0.614      4.875    9.150
fun_o            531   6.438    1.015      2.625    8.615
amb_o            531   6.827    0.756      4.600    8.842
shar_o           531   5.498    0.942      1.375    7.700
like_o           531   6.161    0.873      2.333    8.300
prob_o           531   5.256    0.736      3.200    7.200
Tot.part.Yes     531   6.420    4.128      0        20

Tools:

• Poisson Regression
• Negative Binomial Regression
• Conway-Maxwell Poisson (CMP) Regression

The CMP distribution

From Shmueli et al. (2005), Y ∼ CMP(λ, ν) implies

$$P(Y = y) = \frac{\lambda^{y}}{(y!)^{\nu}\, Z(\lambda, \nu)}, \qquad y = 0, 1, 2, \ldots$$

$$Z(\lambda, \nu) = \sum_{s=0}^{\infty} \frac{\lambda^{s}}{(s!)^{\nu}}$$

for λ > 0, ν ≥ 0.

The CMP distribution includes three well-known distributions as special cases:

• Poisson (ν = 1),
• Geometric (ν = 0, λ < 1),
• Bernoulli (ν → ∞, with probability λ/(1 + λ)).
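The probability function above can be evaluated numerically by truncating the infinite series for Z(λ, ν). The following is a minimal sketch, not the authors' implementation; the function names `cmp_log_z` and `cmp_pmf`, the relative tolerance `tol`, and the cap `max_terms` are assumptions made for illustration.

```python
import math


def cmp_log_z(lam, nu, tol=1e-10, max_terms=10000):
    """Approximate log Z(lam, nu) by summing the series until terms are negligible.

    Assumption: the series is cut once a term is decreasing and smaller than
    `tol` times the largest term seen so far (a heuristic, not a strict bound).
    """
    log_terms = []
    running_max = -math.inf
    for s in range(max_terms):
        # log of lam^s / (s!)^nu
        t = s * math.log(lam) - nu * math.lgamma(s + 1)
        log_terms.append(t)
        running_max = max(running_max, t)
        if s > 0 and t < log_terms[-2] and t - running_max < math.log(tol):
            break
    # log-sum-exp for numerical stability
    return running_max + math.log(sum(math.exp(t - running_max) for t in log_terms))


def cmp_pmf(y, lam, nu):
    """P(Y = y) for Y ~ CMP(lam, nu)."""
    return math.exp(y * math.log(lam) - nu * math.lgamma(y + 1) - cmp_log_z(lam, nu))
```

With ν = 1 the values should agree with the Poisson probability function, and with ν = 0 and λ < 1 with the geometric one, which gives a quick sanity check of the three special cases listed above.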

CMP distribution for different (λ, ν) combinations

[Figure: grid of CMP density plots for λ ∈ {2, 8, 15} and ν ∈ {0.5, 0.75, 1, 3}; vertical axis: Density.]

CMP Regression

CMP regression models can be formulated as follows:

$$\log(\lambda) = X\beta \qquad (1)$$

$$\log(\nu) = Z\gamma \qquad (2)$$

Maximizing the log-likelihood with respect to the parameters β and γ yields the following normal equations (Sellers and Shmueli, 2010):

$$U = \frac{\partial \log L}{\partial \beta} = X^{T}\,(y - E(y)) \qquad (3)$$

$$V = \frac{\partial \log L}{\partial \gamma} = \nu Z^{T}\,(-\log(y!) + E(\log(y!))) \qquad (4)$$
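The score vectors U and V in equations (3) and (4) need the expectations E(y) and E(log(y!)), which have no closed form under CMP. A minimal sketch of how they could be computed follows. It assumes the support is truncated at a hypothetical `max_y`, and it reads ν in equation (4) as a per-observation value multiplying each row of Z. The function names are illustrative only.

```python
import math


def cmp_expectations(lam, nu, max_y=500):
    """Return (E[y], E[log(y!)]) under CMP(lam, nu) on the truncated support 0..max_y."""
    log_w = [s * math.log(lam) - nu * math.lgamma(s + 1) for s in range(max_y + 1)]
    m = max(log_w)
    w = [math.exp(t - m) for t in log_w]  # unnormalized probabilities
    z = sum(w)
    mean_y = sum(s * ws for s, ws in enumerate(w)) / z
    mean_logfact = sum(math.lgamma(s + 1) * ws for s, ws in enumerate(w)) / z
    return mean_y, mean_logfact


def cmp_scores(X, Z, y, beta, gamma):
    """Score vectors U (for beta) and V (for gamma), following equations (3) and (4).

    X and Z are lists of rows; y is a list of counts.
    """
    U = [0.0] * len(beta)
    V = [0.0] * len(gamma)
    for xi, zi, yi in zip(X, Z, y):
        lam = math.exp(sum(b * x for b, x in zip(beta, xi)))   # log(lambda) = X beta
        nu = math.exp(sum(g * z for g, z in zip(gamma, zi)))   # log(nu) = Z gamma
        mean_y, mean_logfact = cmp_expectations(lam, nu)
        for j, x in enumerate(xi):
            U[j] += x * (yi - mean_y)
        for j, z in enumerate(zi):
            V[j] += nu * z * (-math.lgamma(yi + 1) + mean_logfact)
    return U, V
```

The truncation point should be chosen well above the bulk of the distribution; small ν combined with large λ pushes the mass far to the right, so `max_y` may need to be raised in that regime.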

Motivation

Exploration of Speed Dating data

[Figure: four scatter plots of Tot.Yes (log) against Sincerity (Others), Intelligence (Others), Sincerity, and Fun seeking.]

More flexibility?

Generalized Additive Models

• Smoothing Splines
• Penalized Splines

Both implementations depend on the Iteratively Reweighted Least Squares (IRLS) estimation framework.

At present, there is no IRLS framework available for CMP!
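For readers unfamiliar with IRLS, the sketch below shows the standard algorithm for an ordinary Poisson log-linear model, the setting in which GAM software usually applies it. It is a textbook illustration written for this transcript, not code from the talk; the function name `poisson_irls` and its defaults are assumptions.

```python
import numpy as np


def poisson_irls(X, y, n_iter=25, tol=1e-10):
    """Fit log(mu) = X beta for Poisson counts y by iteratively reweighted least squares."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)
        # Working response for the log link; the working weights are W = diag(mu).
        z = eta + (y - mu) / mu
        WX = X * mu[:, None]
        # Weighted least-squares solve: (X^T W X) beta = X^T W z
        beta_new = np.linalg.solve(X.T @ WX, WX.T @ z)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta
```

Each pass is a weighted least-squares fit, which is why smoothers can be substituted for the linear fit inside the loop once an IRLS formulation exists for a given distribution.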

An IRLS framework

Update for each iteration

$$I \begin{bmatrix} \beta \\ \gamma \end{bmatrix}^{(m)} = I \begin{bmatrix} \beta \\ \gamma \end{bmatrix}^{(m-1)} + \begin{bmatrix} U \\ V \end{bmatrix}$$

Here I denotes the information matrix, and U and V are the score vectors from equations (3) and (4). This implies the following equations:

$$X^{T} \Sigma_{y} X \beta^{(m)} - X^{T} \Sigma_{y,\log(y!)} \nu Z \gamma^{(m)} = X^{T} \Sigma_{y} X \beta^{(m-1)} - X^{T} \Sigma_{y,\log(y!)} \nu Z \gamma^{(m-1)} + X^{T}(y - E(y))$$

and

$$-\nu Z^{T} \Sigma_{y,\log(y!)} X \beta^{(m)} + \nu^{2} Z^{T} \Sigma_{\log(y!)} Z \gamma^{(m)} = -\nu Z^{T} \Sigma_{y,\log(y!)} X \beta^{(m-1)} + \nu^{2} Z^{T} \Sigma_{\log(y!)} Z \gamma^{(m-1)} + \nu Z^{T}(-\log(y!) + E(\log(y!)))$$

Holding one set of parameters fixed while the other is updated, the cross terms cancel and the equations reduce to

$$X^{T} \Sigma_{y} X \beta^{(m)} = X^{T} \Sigma_{y} X \beta^{(m-1)} + X^{T}(y - E(y)) \qquad (5)$$

$$\nu^{2} Z^{T} \Sigma_{\log(y!)} Z \gamma^{(m)} = \nu^{2} Z^{T} \Sigma_{\log(y!)} Z \gamma^{(m-1)} + \nu Z^{T}(-\log(y!) + E(\log(y!))) \qquad (6)$$
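Equations (5) and (6) can be read as two alternating weighted least-squares steps. The sketch below implements one such pass under several assumptions that are not stated on the slides: Σ_y and Σ_log(y!) are taken to be diagonal matrices of per-observation variances, ν is treated as a per-observation vector, the moments are computed on a support truncated at a hypothetical `max_y`, and the γ step uses the freshly updated β. The function names are illustrative; the authors' implementation is the one linked in the summary.

```python
import math

import numpy as np


def cmp_moments(lam, nu, max_y=500):
    """Means and variances of y and log(y!) under CMP, on the truncated support 0..max_y."""
    s = np.arange(max_y + 1)
    logfact = np.concatenate(([0.0], np.cumsum(np.log(s[1:]))))   # log(s!)
    log_w = np.outer(np.log(lam), s) - np.outer(nu, logfact)      # one row per observation
    w = np.exp(log_w - log_w.max(axis=1, keepdims=True))
    p = w / w.sum(axis=1, keepdims=True)
    mean_y = p @ s
    var_y = p @ s**2 - mean_y**2
    mean_lf = p @ logfact
    var_lf = p @ logfact**2 - mean_lf**2
    return mean_y, var_y, mean_lf, var_lf


def two_step_update(X, Z, y, beta, gamma):
    """One pass of the decoupled updates in equations (5) and (6)."""
    lam, nu = np.exp(X @ beta), np.exp(Z @ gamma)
    mean_y, var_y, _, _ = cmp_moments(lam, nu)
    # Equation (5): step for beta with gamma held fixed.
    A = X.T @ (X * var_y[:, None])
    beta_new = beta + np.linalg.solve(A, X.T @ (y - mean_y))
    # Equation (6): step for gamma with beta held fixed at its new value (assumption).
    lam = np.exp(X @ beta_new)
    _, _, mean_lf, var_lf = cmp_moments(lam, nu)
    logfact_y = np.array([math.lgamma(v + 1) for v in y])
    B = Z.T @ (Z * (nu**2 * var_lf)[:, None])
    gamma_new = gamma + np.linalg.solve(B, Z.T @ (nu * (mean_lf - logfact_y)))
    return beta_new, gamma_new
```

In practice such a pass would be repeated until a stopping criterion is met, with a safeguard on the step size, since a raw alternating step is not guaranteed to improve the likelihood.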

Algorithm

https://arxiv.org/abs/1610.08244

Practical issues

Initial Values

• For λ: (y + 0.1)^ν
• For ν: 0.2

Calculation of Cumulants

• Bounding error 10^−8 or 10^−10
• Asymptotic expressions

Stopping Criterion

• Based on −2 ∑ l(y_i; λ̂_i, ν̂_i)

Step size

• Step halving
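Two of the items above, initial values and step halving, can be sketched in a few lines. This is an illustrative reading of the slide, not the authors' code. The helper names are hypothetical, and `deviance` stands for any function returning −2 times the log-likelihood for a parameter vector.

```python
def initial_lambda(y, nu0=0.2):
    """Starting values for lambda: (y + 0.1)^nu, with nu started at 0.2 as on the slide."""
    return [(yi + 0.1) ** nu0 for yi in y]


def step_halving(current, proposed, deviance, max_halvings=20):
    """Shrink the step from `current` toward `proposed` until the deviance decreases.

    Returns the first candidate that strictly lowers the deviance, or the
    current values unchanged if no such candidate is found.
    """
    d0 = deviance(current)
    step = [p - c for c, p in zip(current, proposed)]
    factor = 1.0
    for _ in range(max_halvings):
        candidate = [c + factor * s for c, s in zip(current, step)]
        if deviance(candidate) < d0:
            return candidate
        factor /= 2.0
    return list(current)
```

For example, with a toy deviance `(v[0] - 1) ** 2`, a current value of 0 and a proposed value of 4, the full step and the half step are both rejected and the quarter step is accepted.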

Simulation Study: Comparison of IRLS with MLE

Study design

We compare our IRLS algorithm with the existing implementation, which is based on maximizing the likelihood function (through optim in R).

(a) Set sample size n = 100
(b) Generate x1 ∼ U(0, 1) and x2 ∼ N(0, 1)
(c) Calculate x3 = 0.2x1 + U(0, 0.3) and x4 = 0.3x2 + N(0, 0.1) (to create correlated variables)
(d) Generate y ∼ CMP(log(λ) = 0.05 + 0.5x1 − 0.5x2 + 0.25x3 − 0.25x4, ν), where ν ∈ {0.5, 2, 5}
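Steps (a) to (d) can be reproduced roughly as follows. This is a sketch under stated assumptions: the second argument of N(0, 0.1) is read as a standard deviation (the slide does not say whether it is a variance), CMP draws use inverse-CDF sampling on a support truncated at a hypothetical `max_y`, and the function names are made up for illustration.

```python
import math
import random


def cmp_sample(lam, nu, rng, max_y=1000):
    """Draw one CMP(lam, nu) value by inverse-CDF sampling on the support 0..max_y."""
    log_w = [s * math.log(lam) - nu * math.lgamma(s + 1) for s in range(max_y + 1)]
    m = max(log_w)
    w = [math.exp(t - m) for t in log_w]  # unnormalized probabilities
    u = rng.random() * sum(w)
    acc = 0.0
    for s, ws in enumerate(w):
        acc += ws
        if u <= acc:
            return s
    return max_y


def simulate(n=100, nu=0.5, seed=0):
    """Generate rows (x1, x2, x3, x4, y) following steps (a) to (d)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x1 = rng.uniform(0, 1)
        x2 = rng.gauss(0, 1)
        x3 = 0.2 * x1 + rng.uniform(0, 0.3)   # correlated with x1
        x4 = 0.3 * x2 + rng.gauss(0, 0.1)     # correlated with x2; 0.1 taken as the sd
        lam = math.exp(0.05 + 0.5 * x1 - 0.5 * x2 + 0.25 * x3 - 0.25 * x4)
        rows.append((x1, x2, x3, x4, cmp_sample(lam, nu, rng)))
    return rows
```

Calling `simulate(100, nu)` for each ν in {0.5, 2, 5} gives one data set per dispersion level, covering over-dispersed (ν < 1) and under-dispersed (ν > 1) counts.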

Results

[Figure: box plots comparing IRLS (IR) and MLE estimates of the coefficients of x1, x2, x3, x4 and of log(ν), for ν = 0.5, ν = 2, and ν = 5.]

A CMP Generalized Additive Model

Additive Model

$$\log(\lambda) = \alpha + \sum_{j=1}^{p} f_j(X_j)$$

$$\log(\nu) = Z\gamma$$

where f_j (j = 1, 2, . . . , p) are the smooth functions for the p variables.

Backfitting

Based on Hastie and Tibshirani (1990) and Wood (2006), the algorithm is as follows:

1. Initialize: f_j = f_j^(0), j = 1, . . . , p
2. Cycle: j = 1, . . . , p, 1, . . . , p, . . .

$$f_j = S_j\Big(y - \sum_{k \neq j} f_k \,\Big|\, x_j\Big)$$

3. Continue (2) until the individual functions don't change.

One more nested loop inside the IRLS framework!
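The three backfitting steps can be sketched for a plain additive model with a continuous response. Inside the CMP GAM the same loop would be applied to a working response within each IRLS iteration. Two things here are assumptions made for illustration: a cubic polynomial fit is used as a stand-in for the smoother S_j (the talk uses splines), and each fitted function is centered so that the intercept stays identifiable.

```python
import numpy as np


def poly_smoother(x, r, degree=3):
    """Stand-in smoother S_j: least-squares polynomial fit of r on x, evaluated at x."""
    coef = np.polyfit(x, r, degree)
    return np.polyval(coef, x)


def backfit(X, y, smoother=poly_smoother, max_iter=100, tol=1e-8):
    """Backfitting for y = alpha + sum_j f_j(x_j); returns alpha and the n x p matrix of f_j values."""
    n, p = X.shape
    alpha = y.mean()
    f = np.zeros((n, p))                     # step 1: initialize f_j = 0
    for _ in range(max_iter):
        f_old = f.copy()
        for j in range(p):                   # step 2: cycle over j = 1, ..., p
            partial = y - alpha - f.sum(axis=1) + f[:, j]   # residual excluding f_j
            f[:, j] = smoother(X[:, j], partial)
            f[:, j] -= f[:, j].mean()        # center each function
        if np.max(np.abs(f - f_old)) < tol:  # step 3: stop when the functions settle
            break
    return alpha, f
```

Swapping `poly_smoother` for a spline smoother changes only the `smoother` argument; the cycling logic stays the same.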

Results & Conclusions

Comparison of Regression models on Tot.Yes

                  Poisson       Negative Binomial   CMP
(Intercept)       0.49          0.59                0.14
                  (0.43)        (0.55)              (0.33)
GenderMale        0.05          0.05                0.03
                  (0.04)        (0.06)              (0.03)
age               −0.01         −0.01               −0.004
                  (0.01)        (0.01)              (0.004)
Tot.partner       0.07***       0.07***             0.04***
                  (0.00)        (0.01)              (0.003)
avg.intcor        −0.04         −0.04               −0.02
                  (0.11)        (0.15)              (0.09)
attr              0.19***       0.18***             0.11***
                  (0.03)        (0.04)              (0.02)
sinc              −0.06         −0.05               −0.04
                  (0.03)        (0.04)              (0.02)
intel             0.05          0.06                0.03
                  (0.04)        (0.05)              (0.03)
func              0.03          0.04                0.02
                  (0.04)        (0.05)              (0.03)
amb               −0.12***      −0.13**             −0.07**
                  (0.03)        (0.04)              (0.02)
shar              0.10***       0.10***             0.06***
                  (0.02)        (0.03)              (0.02)
mean.agep         −0.01         −0.01               −0.007
                  (0.01)        (0.02)              (0.009)
attr_o            −0.10***      −0.10***            −0.06***
                  (0.02)        (0.03)              (0.02)
sinc_o            0.02          0.02                0.01
                  (0.04)        (0.05)              (0.03)
intel_o           0.08          0.08                0.05
                  (0.05)        (0.07)              (0.04)
fun_o             −0.01         −0.01               −0.003
                  (0.03)        (0.04)              (0.02)
amb_o             −0.00         −0.01               0.0005
                  (0.04)        (0.05)              (0.03)
shar_o            0.02          0.03                0.01
                  (0.03)        (0.04)              (0.02)
ν                                                   0.53***
AIC               2844.92       2777.24             2751.7
BIC               3011.64       2948.23             2922.66
Log Likelihood    −1383.46      −1348.62            −1335.33
Deviance          970.04        637.25
Num. obs.         531           531                 531

***p < 0.001, **p < 0.01, *p < 0.05

Comparison of Additive Models on Tot.Yes

Dependent variable: Tot.Yes

              CMP (Chi.Sq)   Poisson (Chi.Sq)
s(sinc)       7.16           11.53**
s(func)       7.51           11.40**
s(sinc_o)     13.96**        29.30***
s(intel_o)    14.06**        13.26***
ν             0.56
AIC           2737.03        2804.77

Note: *p<0.1; **p<0.05; ***p<0.01

It is more the behavior of the other person that guides us to select her/him.

Summary

• The IRLS framework is far more efficient than the existing likelihood-based method and provides more flexibility.
• Since CMP is computationally heavier than other GLMs, we could parallelize some matrix computations in order to increase the speed.
• The IRLS framework allows CMP to have other modeling extensions, such as LASSO.

The full paper is available from https://arxiv.org/abs/1610.08244 and the source code is available from https://github.com/SuneelChatla/cmp

Suggestions and Questions?

References

Fisman, R., Iyengar, S. S., Kamenica, E., and Simonson, I. (2006). Gender differences in mate selection: Evidence from a speed dating experiment. The Quarterly Journal of Economics, pages 673–697.

Hastie, T. J. and Tibshirani, R. J. (1990). Generalized additive models, volume 43. CRC Press.

Sellers, K. F. and Shmueli, G. (2010). A flexible regression model for count data. Annals of Applied Statistics, 4(2):943–961.

Shmueli, G., Minka, T. P., Kadane, J. B., Borle, S., and Boatwright, P. (2005). A useful distribution for fitting discrete data: revival of the Conway–Maxwell–Poisson distribution. Journal of the Royal Statistical Society: Series C (Applied Statistics), 54(1):127–142.

Wood, S. (2006). Generalized additive models: an introduction with R. CRC Press.