Test fit logit Lecture

  • 7/27/2019 Test fit logit Lecture

    1/45

    Lecture 5: ANOVA and Regression II: Model Selection and Model Checking

    or, How to choose a model, and then find out it's wrong

    Bob O'Hara

    Model Selection

    We could fit all effects into a model

    But this would be difficult to understand

    which factors are important?

    Instead, we want to remove the effects which are not important, to leave the interesting ones

    How do we do this?

    What's a Good Model?

    Should fit to the data

    obvious?

    Simple

    easier to understand

    Trade-off between model fit and complexity

    Also: Interpretability

    importance scientifically

    depends on the purpose of the model

    Criteria for Comparing Models

    F-tests, from ANOVA table

    test individual effects

    can have problems with order of terms

    Information Criteria

    AIC, BIC

    Made up of two terms:

    xIC = Deviance + Complexity

    Deviance = -2 × log-likelihood = Goodness of Fit

    Complexity: penalises for number of parameters

    Information Criteria

    Try to minimise xIC

    Better model fit, lower deviance

    More parameters, higher the penalisation

    For n observations, p parameters

    AIC = Deviance + 2p

    tends to overestimate number of parameters

    BIC = Deviance + (ln n)p

    leads to smaller models - perhaps too small?

    can overpenalise factors with many levels
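
    As a small illustration of the two formulas above, the following Python sketch computes AIC and BIC from a deviance. The deviances, parameter counts and n are made-up numbers, not values from the lecture's example; they are chosen so that AIC prefers the larger model while BIC prefers the smaller one.

```python
import math


def aic(deviance: float, p: int) -> float:
    """AIC = Deviance + 2p."""
    return deviance + 2 * p


def bic(deviance: float, p: int, n: int) -> float:
    """BIC = Deviance + (ln n) p."""
    return deviance + math.log(n) * p


# Hypothetical numbers for illustration only (not from the lecture data).
n = 100
models = {
    "small": {"deviance": 210.0, "p": 3},
    "large": {"deviance": 203.0, "p": 6},
}

for name, m in models.items():
    print(name,
          "AIC =", round(aic(m["deviance"], m["p"]), 2),
          "BIC =", round(bic(m["deviance"], m["p"], n), 2))

# The model with the lowest value of each criterion is preferred.
best_by_aic = min(models, key=lambda k: aic(models[k]["deviance"], models[k]["p"]))
best_by_bic = min(models, key=lambda k: bic(models[k]["deviance"], models[k]["p"], n))
print("AIC picks:", best_by_aic, "| BIC picks:", best_by_bic)
```

    With n = 100 the BIC penalty per parameter is ln(100), about 4.6, compared with 2 for AIC, which is why BIC leans towards the smaller model here.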

    Selection

    Forward selection

    Start with no factors

    add the best unselected factor until the present model is the best

    use AIC, BIC, F-ratios to decide the best

    Backward selection

    Start with all factors in the model

    eliminate the worst covariates one by one until all remaining covariates are good

    again use AIC etc.

    Stepwise Selection

    Start with full model

    Use backward selection

    try and remove a term

    Use forward selection

    try and add a term

    Iterate, trying to remove and add terms

    Stop when the model doesn't change
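
    The stepwise procedure can be sketched in Python as a greedy search. This is a minimal sketch, not the lecture's implementation: the `stepwise` function, the term names A to D and the `toy_aic` scoring function are all hypothetical. `toy_aic` assumes each term lowers the deviance by a fixed amount, which real data will not do; in practice the score would come from refitting the model.

```python
from typing import Callable, FrozenSet, Iterable

Model = FrozenSet[str]


def stepwise(start: Iterable[str],
             terms: Iterable[str],
             score: Callable[[Model], float]) -> Model:
    """Greedy stepwise search on a criterion where lower is better (e.g. AIC).

    At each step, try removing each term in the model and adding each term
    not in it; move to the best neighbour, and stop when no move improves.
    """
    current = frozenset(start)
    all_terms = frozenset(terms)
    best = score(current)
    while True:
        neighbours = [current - {t} for t in current]               # backward moves
        neighbours += [current | {t} for t in all_terms - current]  # forward moves
        if not neighbours:
            return current
        candidate = min(neighbours, key=score)
        if score(candidate) >= best:
            return current  # the model doesn't change: stop
        current, best = candidate, score(candidate)


# Hypothetical deviance reduction for each term (illustration only).
GAIN = {"A": 10.0, "B": 3.0, "C": 0.5, "D": 1.0}


def toy_aic(model: Model) -> float:
    """Toy AIC = Deviance + 2p, with p = intercept + number of terms."""
    deviance = 100.0 - sum(GAIN[t] for t in model)
    return deviance + 2 * (1 + len(model))


from_nothing = stepwise([], GAIN, toy_aic)
from_full = stepwise(GAIN, GAIN, toy_aic)
print(sorted(from_nothing), sorted(from_full))
```

    In this toy example both starting points end at the same model. With real data the search is only locally optimal, so different starting points can end at different models.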

    Then...

    Do the more automatic stuff

    Stepwise Selection

    F-stats

    If you use the ANOVA table:

    be careful about the order of the effects

    try different orders

    Always keep main effects if you have an interaction

    unless you have a good reason not to

    Automatic Model Selection

    Use AIC as a criterion

    Try 2 starting points

    just a constant

    full model (all terms and interactions)

    Can do automatically in R

    Starting from Nothing

    Initial AIC (just a constant): -43.66

    Step 1 (0 = no change to the model):

    Change    AIC
    + Eth     -57.1
    + Age     -44.3
    0         -43.7
    + Sex     -42.5
    + Lrn     -41.7

    Add Eth to the model

    Step 2:

    Change    AIC
    + Age     -57.4
    0         -57.1
    + Sex     -56.0
    + Lrn     -55.1
    - Eth     -43.7

    Carry on...

    Add Age to the model

    Step 3:

    Change       AIC
    + Eth.Age    -61.1
    0            -57.4
    - Age        -57.1
    + Lrn        -56.2
    + Sex        -55.9
    - Eth        -44.3

    Add Eth.Age interaction

    Step 4:

    Change       AIC
    0            -61.1
    + Lrn        -60.1
    + Sex        -59.6
    - Eth.Age    -57.4

    Stop Here!
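
    The decision made at each step above is just "take the change with the lowest AIC". The short Python sketch below replays the four steps using the AIC values quoted on the slides. Reading "0" as "leave the model unchanged" is an interpretation of the slides, and the variable names are illustrative.

```python
# AIC values copied from Steps 1 to 4 above.
# "0" is read here as "no change to the current model" (an interpretation).
steps = [
    {"+ Eth": -57.1, "+ Age": -44.3, "0": -43.7, "+ Sex": -42.5, "+ Lrn": -41.7},
    {"+ Age": -57.4, "0": -57.1, "+ Sex": -56.0, "+ Lrn": -55.1, "- Eth": -43.7},
    {"+ Eth.Age": -61.1, "0": -57.4, "- Age": -57.1, "+ Lrn": -56.2,
     "+ Sex": -55.9, "- Eth": -44.3},
    {"0": -61.1, "+ Lrn": -60.1, "+ Sex": -59.6, "- Eth.Age": -57.4},
]

# At each step, choose the change with the lowest AIC.
choices = [min(step, key=step.get) for step in steps]
print(choices)  # ['+ Eth', '+ Age', '+ Eth.Age', '0']
```

    The search stops at Step 4 because the best option is "0": no single addition or removal lowers the AIC any further.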

    Try from different starting points

    Start from a constant in the model

    end with Eth + Age + Eth.Age

    Start from all main effects in the model

    end with All Main effects + Eth.Age + Sex.Age + Age.Lrn

    Start from full model

    end with All Main effects + All First Order interactions + Eth.Sex.Lrn + Eth.Age.Lrn

    Last one has lowest AIC

    A Good Fit

    [Figure: two panels. Left: y against x. Right: residuals against predicted values.]

    An Outlier

    [Figure: two panels. Left: y against x. Right: residuals against predicted values.]

    Curved Relationship

    y = a + bx^2 + e

    [Figure: two panels. Left: y against x. Right: residuals against predicted values.]
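
    To see why a curved relationship shows up in the residuals, the Python sketch below fits a straight line to data generated from y = a + bx^2 and lists the residuals. It is a minimal sketch: a = 10 and b = 2 are hypothetical values, the error term e is left out so the pattern is easy to see, and `fit_line` is an illustrative helper for ordinary least squares with one covariate.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    return intercept, slope


# Hypothetical curved data: y = a + b * x^2 with a = 10, b = 2, no noise.
xs = list(range(0, 51, 5))
ys = [10 + 2 * x ** 2 for x in xs]

intercept, slope = fit_line(xs, ys)
predicted = [intercept + slope * x for x in xs]
residuals = [y - f for y, f in zip(ys, predicted)]

for f, r in zip(predicted, residuals):
    print(f"predicted = {f:8.1f}   residual = {r:8.1f}")
```

    The residuals are positive at both ends of the range and negative in the middle. A U-shape like this in a plot of residuals against predicted values suggests that the straight-line model is missing curvature.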

    The Example (again)

    We've already found a good model, but does it fit?

    Look at some figures...

    Cook's D

    [Figure: Cook's distance plot. Cook's distance against observation number; points 32, 14 and 98 are labelled.]

    Female, Aborigine, Slow learner, Primary Age. Only One. (6 days off, mean 16.4)
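
    The slides do not give the formula, so as an assumption this sketch uses the standard definition of Cook's distance for a linear model with p parameters: D_i = e_i^2 / (p s^2) × h_i / (1 - h_i)^2, where e_i is the residual, h_i the leverage and s^2 the residual variance. The Python code below applies it to a straight-line fit (p = 2). The data are made up, with one deliberately shifted point, and are not the lecture's example.

```python
def cooks_distance(xs, ys):
    """Cook's distance for each point in a straight-line least-squares fit.

    Uses D_i = e_i^2 / (p * s^2) * h_i / (1 - h_i)^2 with p = 2 parameters
    (intercept and slope), s^2 = SSE / (n - p), and leverage
    h_i = 1/n + (x_i - xbar)^2 / Sxx.
    """
    n = len(xs)
    p = 2
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    resid = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    s2 = sum(e * e for e in resid) / (n - p)
    leverage = [1 / n + (x - xbar) ** 2 / sxx for x in xs]
    return [e * e / (p * s2) * h / (1 - h) ** 2
            for e, h in zip(resid, leverage)]


# Hypothetical data: a straight line with the last point shifted upwards.
xs = list(range(1, 11))
ys = [2.0 * x + 1.0 for x in xs]
ys[-1] += 15.0

distances = cooks_distance(xs, ys)
most_influential = distances.index(max(distances)) + 1  # 1-based obs. number
print("Most influential observation:", most_influential)
```

    The shifted point has both a large residual and high leverage, because it sits at the end of the x range, so it gets the largest Cook's distance. Points that stand out in this way are the ones worth inspecting individually, as the slide does.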
