Extending the discrete-time hazard model ALDA, Chapter Twelve

Judith D. Singer & John B. Willett, Harvard Graduate School of Education

“Some departure from the norm will occur as time grows more open about it” (John Ashbery)


Page 1: Extending the discrete-time hazard model ALDA, Chapter Twelve

Judith D. Singer & John B. Willett, Harvard Graduate School of Education

Extending the discrete-time hazard model ALDA, Chapter Twelve

“Some departure from the norm will occur as time grows more open about it”

John Ashbery

Page 2: Extending the discrete-time hazard model ALDA, Chapter Twelve

Chapter 12: Extending the discrete-time hazard model

• Alternative specifications for TIME in the discrete-time hazard model (§12.1): must we always use the TIME indicators, or might a more parsimonious representation for TIME be nearly as good?
• Including time-varying predictors (§12.3): as in growth modeling, the use of the person-period data set makes them easy to include (although be careful with interpretations).
• Evaluating the assumptions of the discrete-time hazard model: like all statistical models, these invoke important assumptions that should be examined (and if necessary relaxed):
  • Linear additivity assumption (§12.4): must all predictors operate only as “main effects” or can there be interactions?
  • Proportionality assumption (§12.5): must the effects of all predictors be constant over time?

Page 3: Extending the discrete-time hazard model ALDA, Chapter Twelve

Pros and cons of the dummy specification for the “main effect of TIME”?

(ALDA, Section 12.1, pp 408-409)

\[ \operatorname{logit} h(t_j) = [\alpha_1 D_1 + \cdots + \alpha_J D_J] + [\beta_1 X_1 + \cdots + \beta_k X_k] \]

PRO

The dummy specification for TIME is:
• Completely general, placing no constraints on the shape of the baseline (logit) hazard function;
• Easily interpretable: each associated parameter represents logit hazard in time period j for the baseline group;
• Consistent with life-table estimates.

CON

The dummy specification for TIME is also:
• Nothing more than an analytic decision, not a requirement of the discrete-time hazard model;
• Completely lacking in parsimony: if J is large, it requires the inclusion of many unknown parameters;
• A problem when it yields fitted functions that fluctuate erratically across time periods because of nothing more than sampling variation.

Three reasons for considering an alternative specification:
• Your study involves many discrete time periods (because data collection is long or time is less coarsely discretized).
• Hazard is expected to be near 0 in some time periods (causing convergence problems).
• Some time periods have small risk sets (because either the initial sample is small or hazard and censoring dramatically diminish the risk set over time).

The variable PERIOD in the person-period data set can be treated as continuous TIME
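The dummy specification can be sketched in a few lines of Python. This is a minimal illustration, not code from ALDA: the function name `time_dummies` and the toy PERIOD values are hypothetical. It expands PERIOD into the J indicators D1 to DJ, which makes the cost in parameters visible.

```python
def time_dummies(periods, n_periods):
    """Expand PERIOD values (1..J) into rows of J indicator variables D1..DJ."""
    rows = []
    for period in periods:
        if not 1 <= period <= n_periods:
            raise ValueError(f"PERIOD {period} is outside 1..{n_periods}")
        # Exactly one indicator equals 1 in each person-period record.
        rows.append([1 if j == period else 0 for j in range(1, n_periods + 1)])
    return rows


# Hypothetical person observed in periods 1 through 4 of a 9-period study.
periods = [1, 2, 3, 4]
dummies = time_dummies(periods, n_periods=9)

for period, row in zip(periods, dummies):
    print(period, row)
```

The dummy specification estimates one parameter per column (nine here), whereas treating PERIOD as continuous TIME needs only as many parameters as the chosen polynomial has terms.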

Page 4: Extending the discrete-time hazard model ALDA, Chapter Twelve

An ordered set of smooth polynomial representations for TIME

Not necessarily “the best,” but practically speaking a very good place to start

(ALDA, Section 12.1.1, pp 409-412)

Completely general spec: always the “best fitting” model (lowest Deviance)

Constant spec: always the “worst fitting” model (highest Deviance)

Use of ONE facilitates programming

Polynomial specifications

As in growth modeling, a systematic set of choices

Choose centering constant “c” to ease interpretation

Because each lower order model is nested within each higher order model, Deviance statistics can be directly compared to help make analytic decisions

The 4th and 5th order polynomials are rarely adopted, but give you a sense of whether you should stick with the completely general specification.
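The ordered set of polynomial specifications can be sketched as follows. This is an illustrative Python helper, not code from ALDA: the name `polynomial_time_terms` and the centering constant c = 5 are assumptions. It builds the columns ONE, (TIME − c), (TIME − c)², and so on for one person-period record.

```python
def polynomial_time_terms(time, order, c=0):
    """Return [ONE, (TIME-c), (TIME-c)**2, ..., (TIME-c)**order] for one record."""
    if order < 0:
        raise ValueError("order must be non-negative")
    centered = time - c
    # Order 0 is the constant specification: only the column ONE.
    return [1] + [centered ** power for power in range(1, order + 1)]


# Hypothetical centering constant c = 5 with a quadratic specification.
for time in (1, 5, 9):
    print(time, polynomial_time_terms(time, order=2, c=5))
```

Each lower-order list is a prefix of the next higher-order list. That prefix property is what makes the models nested, so their deviance statistics can be compared directly.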

Page 5: Extending the discrete-time hazard model ALDA, Chapter Twelve

Illustrative example: Time to tenure in colleges and universities

Sample: 260 faculty members (who had received a National Academy of Education/Spencer Foundation Post-Doctoral Fellowship)

Research design:
• Each was tracked for up to 9 years after taking his/her first academic job.
• By the end of data collection, n=166 (63.8%) had received tenure; the other 36.2% were censored (because they might eventually receive tenure somewhere).

For simplicity, we won’t include any substantive predictors (although the study itself obviously did)

Data source: Beth Gamse and Dylan Conger (1997) Abt Associates Report

(ALDA, Section 12.1.1 p 412)

Page 6: Extending the discrete-time hazard model ALDA, Chapter Twelve

Examining alternative polynomial specifications for TIME: Deviance statistics and fitted logit hazard functions

(ALDA, Section 12.1.1, pp 412-419)

The quadratic looks reasonably good, but can we test whether it’s “good enough”?

As expected, deviance declines as model becomes more general

[Figure: fitted logit(hazard), from -6.0 to 0.0, versus years after hire, from 0 to 9, for the General, Constant, Linear, Quadratic, and Cubic specifications]

Page 7: Extending the discrete-time hazard model ALDA, Chapter Twelve

Testing alternative polynomial specifications for TIME: Comparing deviance statistics (and AIC and BIC statistics) across nested models

(ALDA, Section 12.1.1, pp 412-419)

Two comparisons always worth making

Is the added polynomial term necessary?

Is this polynomial as good as the general spec?

Lousy

Better, but not as good as general

As good as general, better than linear

No better than linear

Clear preference for quadratic (although cubic has some appeal)
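Both comparisons rest on the difference in deviance between two nested models, which is referred to a chi-square distribution with degrees of freedom equal to the number of added parameters. The sketch below uses only the Python standard library. The function names are hypothetical and the two deviances at the end are made-up values; the other test statistics are ones quoted elsewhere in these slides.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(chi-square with df degrees of freedom > x)."""
    if df < 1 or x < 0:
        raise ValueError("need df >= 1 and x >= 0")
    z = x / 2.0
    if df % 2 == 1:
        k, q = 1, math.erfc(math.sqrt(z))  # closed form for df = 1
    else:
        k, q = 2, math.exp(-z)             # closed form for df = 2
    # Recurrence Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1), a = k / 2.
    while k < df:
        a = k / 2.0
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        k += 2
    return q


def compare_nested(deviance_reduced, deviance_full, extra_params):
    """Difference in deviance between nested models and its chi-square p-value."""
    diff = deviance_reduced - deviance_full
    return diff, chi2_sf(diff, extra_params)


# Test statistics quoted in these slides (difference in deviance, df).
print(round(chi2_sf(6.50, 1), 4))   # 0.0108
print(chi2_sf(5.83, 1) < 0.05)      # True
print(chi2_sf(34.51, 32) > 0.25)    # True

# Hypothetical deviances: a linear model versus a quadratic with 1 extra parameter.
diff, p_value = compare_nested(850.0, 840.0, extra_params=1)
print(diff, p_value < 0.01)
```

A small p-value for "added term versus one order lower" says the extra polynomial term is needed. A large p-value for "polynomial versus general" says the polynomial fits about as well as the TIME dummies.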

Page 8: Extending the discrete-time hazard model ALDA, Chapter Twelve

Including time-varying predictors: Age of onset of psychiatric disorder

Sample: 1,393 adults ages 17 to 57 (drawn randomly through a phone survey in metropolitan Toronto)

Research design:
• Each was asked whether and, if so, at what age (in years) he or she had first experienced a depressive episode.
• n=387 (27.8%) reported a first onset between ages 4 and 39.

Time-varying question predictor: PD, first parental divorce
• n=145 (10.4%) had experienced a parental divorce while still at risk of first depression onset.
• PD is time-varying, indicating whether the parents of individual i divorced during, or before, time period j:
  • PDij=0 in periods before the divorce
  • PDij=1 in periods coincident with or subsequent to the divorce

Additional time-invariant predictors:
• FEMALE, which we’ll use now
• NSIBS (total number of siblings), which we’ll use in a few minutes

Data source: Blair Wheaton and colleagues (1997) Stress & adversity across the life course

(ALDA, Section 12.3, p 428)

Page 9: Extending the discrete-time hazard model ALDA, Chapter Twelve

Including a time-varying predictor in the person-period data set

(ALDA, Section 12.3, p 428)

ID  PERIOD  PD  FEMALE  NSIBS  EVENT
40       4   0       1      4      0
40       5   0       1      4      0
40       6   0       1      4      0
40       7   0       1      4      0
40       8   0       1      4      0
40       9   1       1      4      0
40      10   1       1      4      0
40      11   1       1      4      0
40      12   1       1      4      0
40      13   1       1      4      0
40      14   1       1      4      0
40      15   1       1      4      0
40      16   1       1      4      0
40      17   1       1      4      0
40      18   1       1      4      0
40      19   1       1      4      0
40      20   1       1      4      0
40      21   1       1      4      0
40      22   1       1      4      0
40      23   1       1      4      1

ID 40: Reported first depression onset at 23; first parental divorce at age 9
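The records for ID 40 can be generated programmatically. The following Python sketch is illustrative only (the function name `person_period_rows` and its arguments are hypothetical); it shows how a time-varying predictor such as PD changes value across a person's records while FEMALE and NSIBS stay fixed.

```python
def person_period_rows(person_id, start_age, end_age, onset, divorce_age, female, nsibs):
    """Build one record per year of age from start_age through end_age.

    onset: True if the event occurred at end_age, False if censored then.
    divorce_age: age at first parental divorce, or None if there was none.
    """
    rows = []
    for age in range(start_age, end_age + 1):
        rows.append({
            "ID": person_id,
            "PERIOD": age,
            # Time-varying: switches to 1 in the divorce year and stays 1.
            "PD": int(divorce_age is not None and age >= divorce_age),
            # Time-invariant: identical in every record.
            "FEMALE": female,
            "NSIBS": nsibs,
            # EVENT is 1 only in the last record, and only if onset occurred.
            "EVENT": int(onset and age == end_age),
        })
    return rows


# ID 40: first onset at age 23, first parental divorce at age 9.
rows = person_period_rows(40, 4, 23, onset=True, divorce_age=9, female=1, nsibs=4)
for row in rows:
    print(row)
```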

Many periods per person (because annual data from age 4 to respondent’s current age, up to age 39)

In fact, there are 36,997 records in this PP data set and only 387 events. Would we really want to include 36 TIME dummies?

First depression onset at age 23

PD is time-varying: Her parents divorced when she was 9

\[ \operatorname{logit} h(t_{ij}) = \alpha_0 ONE + \alpha_1 (AGE_{ij} - 18) + \alpha_2 (AGE_{ij} - 18)^2 + \alpha_3 (AGE_{ij} - 18)^3 \]

Turns out that a cubic function of TIME fits nearly as well as the completely general specification (χ²=34.51, 32 df, p>.25) and measurably better than a quadratic (χ²=5.83, 1 df, p<.05)

FEMALE and NSIBS are time-invariant predictors that we’ll soon use

Page 10: Extending the discrete-time hazard model ALDA, Chapter Twelve

Including a time-varying predictor in the discrete-time hazard model

(ALDA, Section 12.3.1, p 428-434)

\[ \operatorname{logit} h(t_{ij}) = \alpha_0 ONE + \alpha_1 (AGE_{ij} - 18) + \alpha_2 (AGE_{ij} - 18)^2 + \alpha_3 (AGE_{ij} - 18)^3 + \beta_1 PD_{ij} \]

What does β1 tell us?
• It contrasts the population logit hazard for people who have experienced a parental divorce with those who have not.
• But because PDij is time-varying, membership in the parental divorce group changes over time, so we’re not always comparing the same people.
• The predictor effectively compares different groups of people at different times!
• But we’re still assuming that the effect of the time-varying predictor is constant over time.

[Figure: logit(proportion experiencing event), from -8.00 to -2.00, versus age, from 0 to 40, plotted separately for PD = 0 and PD = 1]

Sample logit(proportions) of people experiencing first depression onset at each age, by PD status at that age

Hypothesized population model (note constant effect of PD)

Implicit particular realization of population model (for those whose parents divorce when they’re age 20)

Page 11: Extending the discrete-time hazard model ALDA, Chapter Twelve

Interpreting a fitted DT hazard model that includes a TV predictor

(ALDA, Section 12.3.2, pp 434-440)

\[ \operatorname{logit} \hat{h}(t_{ij}) = -4.5866\,ONE + 0.0596 (AGE_{ij} - 18) - 0.0074 (AGE_{ij} - 18)^2 + 0.0002 (AGE_{ij} - 18)^3 + 0.4151\,PD_{ij} + 0.5455\,FEMALE_i \]

\( e^{0.4151} = 1.51 \): Controlling for gender, at every age from 4 to 39, the estimated odds of first depression onset are about 50% higher for individuals who experienced a concurrent, or previous, parental divorce

\( e^{0.5455} = 1.73 \): Controlling for parental divorce, the estimated odds of first depression onset are 73% higher for women

What about a woman whose parents divorced when she was 20?
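The question above can be explored numerically. The Python sketch below is an illustration under stated assumptions, not the authors' code: the coefficient magnitudes are those shown on this slide, but the signs of the intercept and polynomial terms (negative, positive, negative, positive) are assumed here, and the function names are hypothetical. It reproduces the two odds ratios and then traces fitted hazard for a woman whose PD value switches from 0 to 1 at age 20.

```python
import math

# Coefficient magnitudes from the slide. The signs of ONE and of the AGE terms
# are an assumption; PD and FEMALE are positive because they raise the odds.
COEF = {"ONE": -4.5866, "AGE1": 0.0596, "AGE2": -0.0074, "AGE3": 0.0002,
        "PD": 0.4151, "FEMALE": 0.5455}


def fitted_logit_hazard(age, pd, female):
    """Fitted logit hazard for one person-period record."""
    a = age - 18  # AGE is centered at 18
    return (COEF["ONE"] + COEF["AGE1"] * a + COEF["AGE2"] * a ** 2
            + COEF["AGE3"] * a ** 3 + COEF["PD"] * pd + COEF["FEMALE"] * female)


def fitted_hazard(age, pd, female):
    """Convert fitted logit hazard back to a probability."""
    return 1.0 / (1.0 + math.exp(-fitted_logit_hazard(age, pd, female)))


print(round(math.exp(COEF["PD"]), 2))      # 1.51
print(round(math.exp(COEF["FEMALE"]), 2))  # 1.73

# Woman whose parents divorced when she was 20: PD = 0 before 20, 1 from 20 on.
for age in (18, 19, 20, 21):
    pd = int(age >= 20)
    print(age, pd, round(fitted_hazard(age, pd, female=1), 4))
```

Because the model assumes a constant effect of PD, her fitted logit hazard follows the PD = 0 curve until age 19 and then jumps by 0.4151 onto the PD = 1 curve.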

Page 12: Extending the discrete-time hazard model ALDA, Chapter Twelve

Using time-varying predictors to test competing hypotheses about a predictor’s effect: The long term vs short term effects of parental death on first depression onset

[Figure: two panels of fitted hazard versus age, one for each coding of parental death]

Parental death treated as a short-term effect: Odds of onset are 462% higher in the year a parent dies

Parental death treated as a long-term effect: Odds of onset are 33% higher among people whose parents have died

ID  PERIOD  PDEATH1  PDEATH2
40       4        0        0
40       5        0        0
40       6        0        0
40       7        0        0
40       8        0        0
40       9        1        1
40      10        1        0
40      11        1        0
40      12        1        0
40      13        1        0
40      14        1        0
40      15        1        0
40      16        1        0
40      17        1        0
40      18        1        0
40      19        1        0
40      20        1        0
40      21        1        0
40      22        1        0
40      23        1        0

PDEATH1 is the long term effect

PDEATH2 is the short term effect
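The two codings differ only in how long the indicator stays switched on. A minimal Python sketch (the function name `parental_death_codings` is hypothetical) makes the distinction explicit for the illustrative records of ID 40, with the death occurring at age 9.

```python
def parental_death_codings(ages, death_age):
    """Return (PDEATH1, PDEATH2) values for each age.

    PDEATH1 (long-term): 1 in the year of the death and in every later year.
    PDEATH2 (short-term): 1 only in the year of the death.
    """
    long_term = [int(age >= death_age) for age in ages]
    short_term = [int(age == death_age) for age in ages]
    return long_term, short_term


ages = list(range(4, 24))
pdeath1, pdeath2 = parental_death_codings(ages, death_age=9)

for age, long_value, short_value in zip(ages, pdeath1, pdeath2):
    print(age, long_value, short_value)
```

Fitting one model with each coding lets the data speak to which hypothesis is better supported.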

Page 13: Extending the discrete-time hazard model ALDA, Chapter Twelve

The linear additivity assumption: Uncovering violations and simple solutions

(ALDA, Section 12.4, pp 443)

Linear additivity assumption: Unit differences in a predictor (time-invariant or time-varying) correspond to fixed differences in logit-hazard.

Data source: Nina Martin & Margaret Keiley (2002)

Sample: 1,553 adolescents (n=887, 57.1%, had been abused as children)

Research design:
• Incarceration history from age 8 to 18
• n=342 (22.0%) had been arrested.

RQs:
• What’s the effect of abuse on the risk of arrest?
• What’s the effect of race?
• Does the effect of abuse differ by race (or conversely, does the effect of race differ by abuse status)?

Non-linear effects of substantive predictors

Interactions among substantive predictors

Page 14: Extending the discrete-time hazard model ALDA, Chapter Twelve

Evidence of an interaction between ABUSE and RACE

(ALDA, Section 12.4.1, pp 444-447)

[Figure: two panels, White and Black, of sample logit(hazard), from -7.0 to -2.0, versus age, from 7 to 19, each with separate lines for Abused and Not abused]

What is the shape of the logit hazard functions? For all groups, risk of 1st arrest is low during childhood, accelerates during the teen years, and peaks between ages 14 and 17.

How does the level differ across groups? Abused children appear to be consistently at greater risk of 1st arrest, but the differential is especially pronounced among Blacks.

As in regular regression, when the effect of one predictor differs by the levels of another, we need to include a statistical interaction

Page 15: Extending the discrete-time hazard model ALDA, Chapter Twelve

Interpreting the interaction between ABUSE and RACE

(ALDA, Section 12.4.1, pp 444-447)

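\[ \operatorname{logit} \hat{h}(t_j) = [\hat{\alpha}_8 D_8 + \cdots + \hat{\alpha}_{18} D_{18}] + 0.3600\,ABUSED + 0.2455\,BLACK + 0.4787\,ABUSED \times BLACK \]

[Figure: fitted logit(hazard), from -8.0 to -2.0, versus age, from 7 to 19, with separate Abused and Not abused lines for Blacks (B) and Whites (W)]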

Estimated odds ratios for the 4 possible prototypical individuals

In comparison to a White child who had not been abused, the odds of 1st arrest are:

28% higher for Blacks who had not been abused (note: this is not stat sig.)

43% higher for Whites who had been abused (this is stat sig.)

Nearly 3 times higher for Blacks who had been abused.
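These odds ratios follow directly from the three fitted coefficients (0.3600 for ABUSED, 0.2455 for BLACK, 0.4787 for their product). The Python sketch below shows the arithmetic; the function name is hypothetical.

```python
import math

B_ABUSED = 0.3600
B_BLACK = 0.2455
B_INTERACTION = 0.4787


def odds_ratio_vs_reference(abused, black):
    """Odds of first arrest relative to a White child who was not abused."""
    logit_difference = (B_ABUSED * abused + B_BLACK * black
                        + B_INTERACTION * abused * black)
    return math.exp(logit_difference)


for abused, black in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(abused, black, round(odds_ratio_vs_reference(abused, black), 2))
# Prints 1.0, 1.28, 1.43 and 2.96 for the four prototypical individuals.
```

For the abused Black child all three coefficients contribute, so the odds ratio is exp(0.3600 + 0.2455 + 0.4787), which exceeds the product of the two separate main-effect odds ratios.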

This is not the only way to violate the linear additivity assumption…

Page 16: Extending the discrete-time hazard model ALDA, Chapter Twelve

Checking the linear additivity assumption: Is the effect of NSIBS on depression onset linear?

(ALDA, Section 12.4.2, pp 447-451)

Use all your usual strategies for checking non-linearity: transform the predictors, use polynomials, re-bin the predictor, …

6.31 (4) ns

All models include a cubic effect of TIME, and the main effects of FEMALE and PD
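One of the strategies named above, re-binning, can be sketched in Python. The cutpoints below are hypothetical, chosen only for illustration and not taken from ALDA, and the function name is an assumption. The function turns a count such as NSIBS into category indicators that can replace the single linear term.

```python
def bin_indicators(value, upper_bounds):
    """Return 0/1 indicators for bins defined by inclusive upper bounds.

    With upper_bounds = [0, 2, 4] the bins are: 0, 1-2, 3-4, and 5 or more.
    """
    indicators = [0] * (len(upper_bounds) + 1)
    for index, bound in enumerate(upper_bounds):
        if value <= bound:
            indicators[index] = 1
            return indicators
    indicators[-1] = 1  # value exceeds every bound: open-ended top bin
    return indicators


HYPOTHETICAL_BOUNDS = [0, 2, 4]
for nsibs in (0, 1, 4, 7):
    print(nsibs, bin_indicators(nsibs, HYPOTHETICAL_BOUNDS))
```

Comparing the deviance of a model that uses the indicators with one that uses the linear term indicates whether the linear specification is adequate.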

Page 17: Extending the discrete-time hazard model ALDA, Chapter Twelve

The proportionality assumption: Is a predictor’s effect constant over time or might it vary?

(ALDA, Section 12.5.1, pp 451-456)

[Figure: four panels of logit hazard, from -5.00 to -1.00, versus time period, from 0 to 8]

• Predictor’s effect is constant over time
• Predictor’s effect increases over time
• Predictor’s effect decreases over time
• Predictor’s effect is particularly pronounced in certain time periods

Page 18: Extending the discrete-time hazard model ALDA, Chapter Twelve

Discrete-time hazard models that do not invoke the proportionality assumption

(ALDA, Section 12.5.1, pp 454-456)

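A completely general representation: The predictor has a unique effect in each period

\[ \operatorname{logit} \hat{h}(t_j) = [\alpha_1 D_1 + \cdots + \alpha_J D_J] + [\beta_1 X_1 D_1 + \cdots + \beta_J X_1 D_J] \]

In time period 1: \( \operatorname{logit} \hat{h}(t_1) = \alpha_1 + \beta_1 X_1 \)

In time period 2: \( \operatorname{logit} \hat{h}(t_2) = \alpha_2 + \beta_2 X_1 \), and so on...

A more parsimonious representation: The predictor’s effect changes linearly with time

\[ \operatorname{logit} \hat{h}(t_j) = [\alpha_1 D_1 + \cdots + \alpha_J D_J] + \beta_1 X_1 + \beta_2 X_1 (TIME - c) \]

• β1 assesses the effect of X1 in time period c
• β2 describes how this effect linearly increases (if positive) or decreases (if negative)

Another parsimonious representation: The predictor’s effect differs across epochs

\[ \operatorname{logit} \hat{h}(t_j) = [\alpha_1 D_1 + \cdots + \alpha_J D_J] + \beta_1 X_1 + \beta_2 X_1 \times LATE \]

• β2 assesses the additional effect of X1 during those time periods declared to be “later” in time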
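The three representations differ only in which interaction columns are added to the person-period data set. The Python sketch below is illustrative (all function names and the example record are hypothetical); it builds those columns for a single record.

```python
def general_interaction(x, period, n_periods):
    """X*D1 ... X*DJ: a separate effect of x in every time period."""
    return [x if j == period else 0 for j in range(1, n_periods + 1)]


def linear_interaction(x, period, c):
    """X and X*(TIME - c): the effect of x changes linearly with time."""
    return [x, x * (period - c)]


def epoch_interaction(x, period, first_late_period):
    """X and X*LATE: the effect of x shifts in periods declared 'later'."""
    late = 1 if period >= first_late_period else 0
    return [x, x * late]


# Hypothetical record: x = 1 in period 4 of a 5-period study.
print(general_interaction(1, period=4, n_periods=5))        # [0, 0, 0, 1, 0]
print(linear_interaction(1, period=4, c=1))                 # [1, 3]
print(epoch_interaction(1, period=4, first_late_period=3))  # [1, 1]
```

The general form costs J parameters for the predictor, while each parsimonious form costs two. A difference-in-deviance comparison against the main-effects model can then be used to decide between them.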

Page 19: Extending the discrete-time hazard model ALDA, Chapter Twelve

The proportionality assumption: Uncovering violations and simple solutions

(ALDA, Section 12.4, pp 443)

Data source: Suzanne Graham (1997) dissertation

Sample: 3,790 high school students who participated in the Longitudinal Survey of American Youth (LSAY)

Research design:
• Tracked from 10th grade through 3rd semester of college, a total of 5 periods
• Only n=132 (3.5%) took a math class for all of the 5 periods!

RQs:
• When are students most at risk of dropping out of math?
• What’s the effect of gender?
• Does the gender differential vary over time?

[Figure: sample logit(hazard), from -2.0 to 0.0, versus term (HS 11, HS 12, C 1, C 2, C 3)]

Risk of dropping out zig-zags over time: it peaks in 12th grade and in the 2nd semester of college

Magnitude of the gender differential varies over time: it is smallest in 11th grade and increases over time

Suggests that the proportionality assumption is being violated

Page 20: Extending the discrete-time hazard model ALDA, Chapter Twelve

Checking the proportionality assumption: Is the effect of FEMALE constant over time?

(ALDA, Section 12.5.2, pp 456-460)

All models include a completely general specification for TIME using 5 time dummies: HS11, HS12, COLL1, COLL2, and COLL3

Model C: Interaction between FEMALE and time

[Figure: fitted logit(hazard), from -2.0 to 0.0, versus term (HS 11, HS 12, C 1, C 2, C 3)]

8.04 (4) ns

6.50 (1) p=0.0108