Practical Model Selection and Multi-model Inference using R

Modified from a presentation by Eric Stolen and Dan Hunt

Theory

This is the link with science, which is about understanding how the world works.


Page 3: Practical Model Selection and Multi-model Inference using R

Indigo Snake Habitat Selection

David R. Breininger, M. Rebecca Bolt, Michael L. Legare, John H. Drese, and Eric D. Stolen

Source: Journal of Herpetology, 45(4):484-490. 2011.

– Animal perception
– Evolutionary Biology
– Population Demography

http://www.seaworld.org/animal-info/animal-bytes/spooky-safari/eastern-indigo-snake.htm

Page 4: Practical Model Selection and Multi-model Inference using R

Hypotheses

• To use the information-theoretic toolbox, we must be able to state a hypothesis as a statistical model (or, more precisely, an equation that allows us to calculate the maximum likelihood of the hypothesis)


Page 5: Practical Model Selection and Multi-model Inference using R

Multiple Working Hypotheses

• We operate with a set of multiple alternative hypotheses (models)

• The many advantages include safeguarding objectivity and allowing rigorous inference.

Chamberlain (1890)
Strong Inference - Platt (1964)
Karl Popper (ca. 1960) – Bold Conjectures

Page 6: Practical Model Selection and Multi-model Inference using R

Deriving the model set

• This is the tough part (but also the creative part)
• Much thought is needed, so don’t rush
• Collaborate, seek outside advice, read the literature, go to meetings…
• How and When hypotheses are better than What hypotheses (strive to predict rather than describe)

Page 7: Practical Model Selection and Multi-model Inference using R

Models – Indigo Snake example

David R. Breininger, M. Rebecca Bolt, Michael L. Legare, John H. Drese, and Eric D. Stolen

Source: Journal of Herpetology, 45(4):484-490. 2011.

• Study of indigo snake habitat use
• Response variable: home range size ln(ha)
• SEX
• Land cover – 2-3 levels (lc2)
• weeks = effort/exposure
• Science question: “Is there a seasonal difference in habitat use between sexes?”

Page 8: Practical Model Selection and Multi-model Inference using R

Models – Indigo Snake example

SEX
land cover type (lc2)
weeks
SEX + lc2
SEX + weeks
lc2 + weeks
SEX + lc2 + weeks
SEX + lc2 + SEX * lc2
SEX + lc2 + weeks + SEX * lc2
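The candidate set above can be written down as R formulas and fitted in one pass. This is a minimal sketch, not the authors' code: the data frame `snakes` is simulated (the real indigo snake data are not part of this deck), the sample size and effect sizes are invented, and the labels `mod8` and `mod9` for the two interaction models are hypothetical names added here.

```r
# Simulated stand-in data; column names follow the slides (ha.ln, SEX, lc2, weeks).
set.seed(1)
n <- 45  # ASSUMPTION: arbitrary sample size for the simulation
snakes <- data.frame(
  SEX   = factor(sample(c("F", "M"), n, replace = TRUE)),
  lc2   = factor(sample(c("natural", "altered"), n, replace = TRUE)),
  weeks = round(runif(n, 10, 100))
)
# Invented effects, used only so the models have something to fit.
snakes$ha.ln <- 3 + 1.0 * (snakes$SEX == "M") +
  0.6 * (snakes$lc2 == "natural") + rnorm(n, sd = 0.8)

# One formula per hypothesis, in the order listed on the slide.
candidate_formulas <- list(
  mod1 = ha.ln ~ SEX,
  mod2 = ha.ln ~ lc2,
  mod3 = ha.ln ~ weeks,
  mod4 = ha.ln ~ SEX + lc2,
  mod5 = ha.ln ~ SEX + weeks,
  mod6 = ha.ln ~ lc2 + weeks,
  mod7 = ha.ln ~ SEX + lc2 + weeks,
  mod8 = ha.ln ~ SEX + lc2 + SEX:lc2,          # hypothetical label
  mod9 = ha.ln ~ SEX + lc2 + weeks + SEX:lc2   # hypothetical label
)

# Fit every candidate by ordinary least squares.
fits <- lapply(candidate_formulas, lm, data = snakes)

# Number of regression coefficients (intercept included) in each model.
sapply(fits, function(m) length(coef(m)))
```

Keeping the formulas in a named list means the whole set is defined before any model is fitted, which matches the idea of stating the hypotheses up front.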

http://www.herpnation.com/hn-blog/indigo-snake-survival-demographics/?simple_nav_category=john-c-murphy


Page 10: Practical Model Selection and Multi-model Inference using R

Modeling

• Trade-off between precision and bias
• Trying to derive knowledge / advance learning; not “fit the data”
• Relationship between data (quantity and quality) and sophistication of the model

Page 11: Practical Model Selection and Multi-model Inference using R

Precision-Bias Trade-off

[Figure: bias² and variance plotted against model complexity (increasing number of parameters)]

Page 14: Practical Model Selection and Multi-model Inference using R

Kullback-Leibler Information

• Basic concept from information theory
• The information lost when a model is used to represent full reality
• Can also think of it as the distance between a model and full reality

Page 15: Practical Model Selection and Multi-model Inference using R

Kullback-Leibler Information

[Figure: truth / reality compared with the candidate models G1 (best model in set), G2, and G3]

The relative difference between models is constant
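Why the relative difference does not depend on knowing the truth can be sketched in a few lines. The notation is an assumption introduced here, not taken from the slides: $f$ stands for full reality and $g_i(x \mid \theta)$ for candidate model $i$.

```latex
\begin{aligned}
I(f, g_i) &= \int f(x)\,\log_e \frac{f(x)}{g_i(x \mid \theta)}\,dx \\
          &= \underbrace{E_f[\log_e f(x)]}_{C} \;-\; E_f[\log_e g_i(x \mid \theta)] \\
I(f, g_1) - I(f, g_2) &= E_f[\log_e g_2(x \mid \theta)] - E_f[\log_e g_1(x \mid \theta)]
\end{aligned}
```

The term $C$ involves only the unknown truth and is the same for every model, so it cancels when two models are compared.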

Page 19: Practical Model Selection and Multi-model Inference using R

Akaike’s Contributions

• Figured out how to estimate the relative Kullback-Leibler distance between models in a set of models

• Figured out how to link maximum likelihood estimation theory with expected K-L information

• An (Akaike’s) Information Criterion
• $\mathrm{AIC} = -2 \log_e(\mathcal{L}(\text{model}_i \mid \text{data})) + 2K$

Page 20: Practical Model Selection and Multi-model Inference using R

$\mathrm{AICc}_i = -2 \log_e(\mathcal{L}(\text{model}_i \mid \text{data})) + 2K \cdot \frac{n}{n - K - 1}$

or, equivalently, $\mathrm{AICc}_i = \mathrm{AIC}_i + \frac{2K(K+1)}{n - K - 1}$

(where K = the number of parameters estimated and n = the sample size)
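The two AICc expressions can be checked numerically against R's own log-likelihood. This is a small sketch using only base R; the helper name `aicc()` and the use of the built-in `mtcars` data are assumptions made for illustration and have nothing to do with the snake study.

```r
# AICc from a fitted model's log-likelihood; K counts every estimated parameter.
aicc <- function(fit) {
  ll  <- logLik(fit)
  K   <- attr(ll, "df")   # for lm(): regression coefficients + residual variance
  n   <- nobs(fit)
  aic <- -2 * as.numeric(ll) + 2 * K
  aic + 2 * K * (K + 1) / (n - K - 1)
}

fit <- lm(mpg ~ wt + hp, data = mtcars)   # built-in data, illustration only
K <- attr(logLik(fit), "df")
n <- nobs(fit)

aic_by_hand <- -2 * as.numeric(logLik(fit)) + 2 * K
aicc_form1  <- -2 * as.numeric(logLik(fit)) + 2 * K * (n / (n - K - 1))
aicc_form2  <- aicc(fit)

all.equal(aic_by_hand, AIC(fit))    # hand calculation matches stats::AIC()
all.equal(aicc_form1, aicc_form2)   # the two AICc expressions agree
```

The correction term grows as K approaches n, which is why AICc matters most for small samples.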

Page 21: Practical Model Selection and Multi-model Inference using R

$\mathrm{AICc}_{\min}$ = AICc for the model with the lowest AICc value

$\Delta_i = \mathrm{AICc}_i - \mathrm{AICc}_{\min}$

Page 22: Practical Model Selection and Multi-model Inference using R

$w_i = \mathrm{Prob}\{g_i \mid \text{data}\}$ (model probability)

$w_i = \dfrac{\exp(-0.5\,\Delta_i)}{\sum_{r=1}^{R} \exp(-0.5\,\Delta_r)}$

evidence ratio of model i to model j = $w_i / w_j$
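The weight calculation takes only a few lines of base R. The sketch below applies it to the AICc values reported in the "Comparing Models" table of this deck; the function name `akaike_weights()` is a hypothetical helper introduced here.

```r
akaike_weights <- function(aicc) {
  delta   <- aicc - min(aicc)    # Delta_i = AICc_i - AICc_min
  rel_lik <- exp(-0.5 * delta)   # relative likelihood of each model
  rel_lik / sum(rel_lik)         # normalise so the weights sum to 1
}

# AICc values copied from the "Comparing Models" table in this deck.
aicc <- c(mod4 = 112.98, mod7 = 114.89, mod1 = 121.52, mod5 = 122.27,
          mod2 = 125.93, mod6 = 128.34, mod3 = 141.26)

w <- akaike_weights(aicc)
round(w, 2)

# Evidence ratio of the top model (mod4) to the runner-up (mod7).
evidence_ratio <- unname(w["mod4"] / w["mod7"])
evidence_ratio
```

Because the normalising sum cancels, the evidence ratio depends only on the difference in AICc between the two models: here exp(0.5 × 1.91), or about 2.6.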

Page 23: Practical Model Selection and Multi-model Inference using R

Least Squares Regression

$\mathrm{AICc} = n \log_e(\hat{\sigma}^2) + 2K \cdot \frac{n}{n - K - 1}$

where $\hat{\sigma}^2 = \mathrm{RSS} / n$
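The least-squares form can be compared with the likelihood-based value that R reports. In this sketch (base R only, built-in `mtcars` data used purely for illustration) the two differ by the additive constant n(log(2π) + 1), which is identical for every model fitted to the same data and therefore leaves the differences between models unchanged.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)   # illustration only
n   <- nobs(fit)
K   <- length(coef(fit)) + 1              # coefficients plus the residual variance
rss <- sum(residuals(fit)^2)
sigma2_hat <- rss / n                     # ML estimate: divides by n, not n - p

aic_ls  <- n * log(sigma2_hat) + 2 * K
aicc_ls <- n * log(sigma2_hat) + 2 * K * (n / (n - K - 1))

# Constant dropped by the least-squares shortcut.
const <- n * (log(2 * pi) + 1)
all.equal(aic_ls + const, AIC(fit))
```

Mixing the two forms within one model set would give wrong rankings, so one convention should be used for every model being compared.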

Page 24: Practical Model Selection and Multi-model Inference using R

Counting Parameters:

K = number of parameters estimated

Least Squares Regression: K = number of predictor coefficients + 2 (for the intercept and the residual variance σ²)

Page 25: Practical Model Selection and Multi-model Inference using R

Counting Parameters:

K = number of parameters estimated

Logistic Regression: K = number of predictor coefficients + 1 (for the intercept)
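These counting rules can be cross-checked against the parameter count R attaches to a fitted model's log-likelihood. A brief sketch with base R and the built-in `mtcars` data (an illustration only, unrelated to the snake study):

```r
# Linear model: K = coefficients (intercept + slopes) + 1 for the residual variance.
fit_lm <- lm(mpg ~ wt + hp, data = mtcars)
K_lm   <- length(coef(fit_lm)) + 1
K_lm == attr(logLik(fit_lm), "df")    # R's own count agrees

# Logistic regression: no variance parameter, so K = coefficients only.
fit_glm <- glm(am ~ wt + hp, data = mtcars, family = binomial)
K_glm   <- length(coef(fit_glm))
K_glm == attr(logLik(fit_glm), "df")
```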

Page 26: Practical Model Selection and Multi-model Inference using R

Comparing Models

Model selection based on AICc :

      K    AICc  Delta_AICc  AICcWt  Cum.Wt      LL
mod4  4  112.98        0.00    0.71    0.71  -51.99
mod7  5  114.89        1.91    0.27    0.98  -51.67
mod1  3  121.52        8.54    0.01    0.99  -57.47
mod5  4  122.27        9.29    0.01    1.00  -56.64
mod2  3  125.93       12.95    0.00    1.00  -59.67
mod6  4  128.34       15.36    0.00    1.00  -59.67
mod3  3  141.26       28.28    0.00    1.00  -67.34

Model 1 = "ha.ln ~ SEX",
Model 2 = "ha.ln ~ lc2",
Model 3 = "ha.ln ~ weeks",
Model 4 = "ha.ln ~ SEX + lc2",
Model 5 = "ha.ln ~ SEX + weeks",
Model 6 = "ha.ln ~ lc2 + weeks",
Model 7 = "ha.ln ~ SEX + lc2 + weeks"
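The remaining columns of the table can be rebuilt from K and LL alone. The sketch below does this in base R. One input is an assumption: the deck does not state the sample size, and n = 45 is a value back-solved here from the reported AICc column, so small rounding differences (up to about 0.01) are expected.

```r
# K and log-likelihood (LL) copied from the table above.
tab <- data.frame(
  model = c("mod4", "mod7", "mod1", "mod5", "mod2", "mod6", "mod3"),
  K     = c(4, 5, 3, 4, 3, 4, 3),
  LL    = c(-51.99, -51.67, -57.47, -56.64, -59.67, -59.67, -67.34)
)
n <- 45  # ASSUMPTION: not stated in the deck; back-solved from the AICc column

tab$AICc       <- -2 * tab$LL + 2 * tab$K * (n / (n - tab$K - 1))
tab            <- tab[order(tab$AICc), ]          # best model first
tab$Delta_AICc <- tab$AICc - min(tab$AICc)
rel_lik        <- exp(-0.5 * tab$Delta_AICc)
tab$AICcWt     <- rel_lik / sum(rel_lik)
tab$Cum.Wt     <- cumsum(tab$AICcWt)

print(cbind(tab["model"], round(tab[-1], 2)), row.names = FALSE)
```

Note that mod2 and mod6 share the same log-likelihood, so adding weeks to lc2 costs one parameter without improving the fit.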

Page 27: Practical Model Selection and Multi-model Inference using R

Model Averaging Predictions

$\hat{\bar{Y}} = \sum_{i=1}^{R} w_i \hat{Y}_i$

where $\hat{\bar{Y}}$ is the model-averaged prediction, $\hat{Y}_i$ is the prediction from model $i$, and $w_i$ is the weight of model $i$.
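A model-averaged prediction is just a weighted sum of the per-model predictions. The following sketch uses base R and the built-in `mtcars` data; the three candidate models and the new observation are arbitrary choices made for illustration.

```r
# Three candidate models on a built-in data set (illustration only).
fits <- list(
  m1 = lm(mpg ~ wt,      data = mtcars),
  m2 = lm(mpg ~ wt + hp, data = mtcars),
  m3 = lm(mpg ~ hp,      data = mtcars)
)

# AICc and Akaike weights for the set.
aicc <- sapply(fits, function(m) {
  K <- attr(logLik(m), "df")
  n <- nobs(m)
  AIC(m) + 2 * K * (K + 1) / (n - K - 1)
})
w <- exp(-0.5 * (aicc - min(aicc)))
w <- w / sum(w)

# Prediction from each model for one new observation.
newdata <- data.frame(wt = 3, hp = 120)
preds <- vapply(fits, function(m) unname(predict(m, newdata = newdata)),
                numeric(1))

# Model-averaged prediction: weighted sum of the per-model predictions.
y_avg <- sum(w * preds)
y_avg
```

Since the weights are non-negative and sum to 1, the averaged prediction always lies between the smallest and largest individual prediction.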

Page 31: Practical Model Selection and Multi-model Inference using R

Model Averaging Parameters

$\hat{\bar{\theta}} = \sum_{i=1}^{R} w_i \hat{\theta}_i$

where $\hat{\bar{\theta}}$ is the model-averaged parameter estimate.
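The same weighted sum applies to a regression coefficient. In this sketch (base R, built-in `mtcars` data, illustration only) every candidate model contains the coefficient being averaged, `wt`, so the formula can be applied directly; how to treat models that lack the parameter is a separate choice that this example does not address.

```r
# Candidate models that all include the predictor wt (illustration only).
fits <- list(
  m1 = lm(mpg ~ wt,             data = mtcars),
  m2 = lm(mpg ~ wt + hp,        data = mtcars),
  m3 = lm(mpg ~ wt + hp + qsec, data = mtcars)
)

# Akaike weights from AICc.
aicc <- sapply(fits, function(m) {
  K <- attr(logLik(m), "df")
  n <- nobs(m)
  AIC(m) + 2 * K * (K + 1) / (n - K - 1)
})
w <- exp(-0.5 * (aicc - min(aicc)))
w <- w / sum(w)

# Estimate of the wt coefficient under each model.
theta <- vapply(fits, function(m) unname(coef(m)["wt"]), numeric(1))

# Model-averaged estimate: sum over models of w_i * theta_i.
theta_avg <- sum(w * theta)
theta_avg
```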

Page 32: Practical Model Selection and Multi-model Inference using R

Unconditional Variance Estimator

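$\widehat{\mathrm{var}}(\hat{\bar{\theta}}) = \left[ \sum_{i=1}^{R} w_i \sqrt{\widehat{\mathrm{var}}(\hat{\theta}_i \mid g_i) + (\hat{\theta}_i - \hat{\bar{\theta}})^2} \right]^2$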

Page 33: Practical Model Selection and Multi-model Inference using R

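Unconditional Variance Estimator

$\widehat{\mathrm{SE}}(\hat{\bar{\theta}}) = \sqrt{\widehat{\mathrm{var}}(\hat{\bar{\theta}})}$

$95\%\ \mathrm{CI} = \hat{\bar{\theta}} \pm 1.96 \times \widehat{\mathrm{SE}}(\hat{\bar{\theta}})$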
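The unconditional standard error and the 95% interval can be computed from three vectors: the per-model estimates, their conditional standard errors, and the model weights. The function name `model_avg_ci()` and all input numbers below are hypothetical, chosen only to show the arithmetic.

```r
model_avg_ci <- function(theta, se, w, z = 1.96) {
  theta_bar <- sum(w * theta)                               # model-averaged estimate
  # Unconditional SE: each model's variance plus its squared distance from the average.
  se_uncond <- sum(w * sqrt(se^2 + (theta - theta_bar)^2))
  c(estimate = theta_bar,
    se       = se_uncond,
    lower    = theta_bar - z * se_uncond,
    upper    = theta_bar + z * se_uncond)
}

# Hypothetical inputs for three models.
theta <- c(1.10, 0.90, 1.30)   # estimate under each model
se    <- c(0.20, 0.25, 0.30)   # conditional SE under each model
w     <- c(0.6, 0.3, 0.1)      # Akaike weights (sum to 1)

res <- model_avg_ci(theta, se, w)
round(res, 3)
```

The extra term (θ_i − θ̄)² adds the disagreement among models to the uncertainty, so the unconditional SE is never smaller than the weighted average of the conditional SEs.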
