
Statistical Analysis in the Lexis Diagram:
Age-Period-Cohort models

Bendix Carstensen
Steno Diabetes Center, Gentofte, Denmark
http://staff.pubhealth.ku.dk/~bxc/

University of Lisboa, 19–21 September
www.bendixcarstensen.com/APC/Lisboa.2011

Introduction
Monday 19th, morning

Welcome

• Purpose of the course:
  - Knowledge about APC-models
  - Technical knowledge of handling them
  - Insight into the basic survival concepts

• Remedies of the course:
  - Lectures with handouts (BxC)
  - Practicals with suggested solutions (BxC + EG)

Introduction (intro) 1/ 238

Scope of the course

• Rates as observed in populations, for example in disease registers.

• An understanding of survival analysis (the statistical analysis of rates) is needed; that is the content of much of the first day.

• Besides the concepts, the practical understanding of the actual computations (in R) is emphasized.

• There is a section in the practicals: “Probability concepts in follow-up studies”.

Introduction (intro) 2/ 238

About the lectures

• Please interrupt: most likely I made a mistake or left out a crucial argument.

• The handouts are not perfect; please comment on them, prospective students would benefit from it.

• There is a time-schedule in the practicals. It might need revision as we go.

Introduction (intro) 3/ 238

About the practicals

• You should use your preferred R environment.

• The Epi package for R is needed.

• Data are all on the net, but we also have them on USB-sticks.

• Try to make a text version of the answers to the exercises; it is more rewarding than just looking at output. The latter is soon forgotten.

• (An opportunity to learn R-weave?)

• A minor bug in apc.fit, apc.plot and apc.frame is fixed in Epi 1.0.11.

Introduction (intro) 4/ 238

Rates and Survival
Monday 19th, morning

Survival data

Persons enter the study at some date.

Persons exit at a later date, either dead or alive.

Observation:

• Actual time span to death (“event”), or

• Some time alive (“at least this long”)

Rates and Survival (surv-rate) 5/ 238

Examples of time-to-event measurements

• Time from diagnosis of cancer to death.

• Time from randomisation to death in a cancer clinical trial.

• Time from HIV infection to AIDS.

• Time from marriage to 1st child birth.

• Time from marriage to divorce.

• Time to re-offending after being released from jail.

Rates and Survival (surv-rate) 6/ 238

[Figure: follow-up of each patient along calendar time, 1993–2003. Each line a person; each blob a death; study ended at 31 Dec. 2003.]

Rates and Survival (surv-rate) 7/ 238

[Figure: the same follow-up lines ordered by date of entry, most likely the order in your database. Horizontal axis: calendar time, 1993–2003.]

Rates and Survival (surv-rate) 8/ 238

[Figure: timescale changed to “Time since diagnosis”. Horizontal axis: time since diagnosis, 0–10.]

Rates and Survival (surv-rate) 9/ 238

[Figure: patients ordered by survival time. Horizontal axis: time since diagnosis, 0–10.]

Rates and Survival (surv-rate) 10/ 238

[Figure: survival times grouped into bands of survival. Horizontal axis: year of follow-up, 1–10.]

Rates and Survival (surv-rate) 11/ 238

[Figure: patients ordered by survival status within each band. Horizontal axis: year of follow-up, 1–10.]

Rates and Survival (surv-rate) 12/ 238

Survival after Cervix cancer

            Stage I            Stage II
Year     N    D    L        N    D    L
  1    110    5    5      234   24    3
  2    100    7    7      207   27   11
  3     86    7    7      169   31    9
  4     72    3    8      129   17    7
  5     61    0    7      105    7   13
  6     54    2   10       85    6    6
  7     42    3    6       73    5    6
  8     33    0    5       62    3   10
  9     28    0    4       49    2   13
 10     24    1    8       34    4    6

N: women alive at the start of the year; D: deaths during the year; L: lost to follow-up during the year.

Estimated risk in year 1 for Stage I women is 5/(110 − 5/2) = 5/107.5 = 0.0465

Estimated 1 year survival is 1 − 0.0465 = 0.9535

Life-table estimator.

Rates and Survival (surv-rate) 13/ 238
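The life-table computation for the Stage I women can be sketched in base R as below. This is only an illustration: the function name `life_table` and the column names are made up here, and the estimator used is the one from the slide, where persons lost during a year count as half a person at risk.

```r
# Stage I cervix cancer data from the table:
# N = alive at start of year, D = deaths, L = lost to follow-up
stage1 <- data.frame(
  year = 1:10,
  N = c(110, 100, 86, 72, 61, 54, 42, 33, 28, 24),
  D = c(5, 7, 7, 3, 0, 2, 3, 0, 0, 1),
  L = c(5, 7, 7, 8, 7, 10, 6, 5, 4, 8)
)

# Life-table estimator (hypothetical helper): persons lost during
# the year count as half a person at risk
life_table <- function(tab) {
  risk <- tab$D / (tab$N - tab$L / 2)   # death probability per year
  surv <- cumprod(1 - risk)             # cumulative survival
  cbind(tab, risk = risk, surv = surv)
}

lt1 <- life_table(stage1)
print(round(lt1, 4))
```

The first row reproduces the risk 5/107.5 and the 1 year survival quoted on the slide; the remaining rows give the cumulative survival for later years.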

Survival function

Persons enter at time 0:

• Date of birth
• Date of randomization
• Date of diagnosis.

How long do they survive? The survival time T is a stochastic variable.

Distribution is characterized by the survival function:

\[ S(t) = P\{\text{survival at least till } t\} = P\{T > t\} = 1 - P\{T \le t\} = 1 - F(t) \]

Rates and Survival (surv-rate) 14/ 238

Intensity or rate

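\[
\begin{aligned}
\lambda(t) &= P\{\text{event in } (t, t+h] \mid \text{alive at } t\} / h \\
           &= \frac{F(t+h) - F(t)}{S(t) \times h} \\
           &= -\frac{S(t+h) - S(t)}{S(t)\,h} \;\xrightarrow[h \to 0]{}\; -\frac{d \log S(t)}{dt}
\end{aligned}
\]

This is the intensity or hazard function for the distribution. It characterizes the survival distribution, as does f or F.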

Theoretical counterpart of a rate.

Rates and Survival (surv-rate) 15/ 238

Relationships

\[ -\frac{d \log S(t)}{dt} = \lambda(t) \quad\Leftrightarrow\quad S(t) = \exp\left( -\int_0^t \lambda(u)\,du \right) = \exp(-\Lambda(t)) \]

$\Lambda(t) = \int_0^t \lambda(s)\,ds$ is called the integrated intensity. It is not an intensity: it is dimensionless.

\[ \lambda(t) = -\frac{d \log(S(t))}{dt} = -\frac{S'(t)}{S(t)} = \frac{F'(t)}{1 - F(t)} = \frac{f(t)}{S(t)} \]

Rates and Survival (surv-rate) 16/ 238

Rate and survival

\[ S(t) = \exp\left( -\int_0^t \lambda(s)\,ds \right) \qquad \lambda(t) = -\frac{S'(t)}{S(t)} \]

Survival is a cumulative measure, the rate is an instantaneous measure.

Note: A cumulative measure requires an origin!

Rates and Survival (surv-rate) 17/ 238
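The two relationships between rate and survival can be checked numerically. The sketch below assumes a Weibull hazard with arbitrarily chosen shape and scale (not from the slides); it integrates the hazard to get S(t) and differentiates S(t) to get the hazard back.

```r
# Hypothetical example: Weibull hazard with shape k and scale s
k <- 1.5
s <- 4
hazard <- function(t) (k / s) * (t / s)^(k - 1)

# S(t) = exp(-Lambda(t)), where Lambda(t) integrates the hazard from 0 to t
surv_from_hazard <- function(t) {
  Lambda <- integrate(hazard, lower = 0, upper = t)$value
  exp(-Lambda)
}

times <- c(0.5, 1, 2, 5)
S_num <- sapply(times, surv_from_hazard)
S_exact <- pweibull(times, shape = k, scale = s, lower.tail = FALSE)
print(cbind(times, S_num, S_exact))

# lambda(t) = -S'(t) / S(t), checked at t = 2 with a central difference
h <- 1e-5
S <- function(t) pweibull(t, shape = k, scale = s, lower.tail = FALSE)
rate_num <- -(S(2 + h) - S(2 - h)) / (2 * h) / S(2)
print(c(numerical = rate_num, exact = hazard(2)))
```

Note that the integral needs an origin (here time 0), whereas the rate at t = 2 only uses the survival function in a small neighbourhood of 2.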

Observed survival and rate

• Survival studies: observation of (right censored) survival time:

  \[ X = \min(T, Z), \quad \delta = 1\{X = T\} \]

  sometimes conditional on $T > t_0$ (left truncated).

• Epidemiological studies: observation of (components of) a rate:

  \[ D / Y \]

  D: no. of events, Y: no. of person-years, in a prespecified time-frame.

Rates and Survival (surv-rate) 18/ 238

Empirical rates for individuals

At the individual level we introduce the empirical rate: (d, y), the no. of events (d ∈ {0, 1}) during y risk time.

Each person contributes several observations of (d, y).

Empirical rates are responses in survival analysis.

The timescale is a covariate; it varies across empirical rates from one individual: age, calendar time, time since diagnosis.

Don’t confuse the timescale with y, which is the difference between two points on any timescale we may choose.

Rates and Survival (surv-rate) 19/ 238

[Figure: empirical rates by calendar time. Horizontal axis: calendar time, 1993–2003.]

Rates and Survival (surv-rate) 20/ 238

[Figure: empirical rates by time since diagnosis. Horizontal axis: time since diagnosis, 0–10.]

Rates and Survival (surv-rate) 21/ 238

Two timescales

Note that we actually have two timescales:

• Time since diagnosis (i.e. since entry into the study)

• Calendar time.

These can be shown simultaneously in a Lexis diagram.

Rates and Survival (surv-rate) 22/ 238

[Figure: follow-up by calendar time (1994–2004) and time since diagnosis (0–12): a Lexis diagram!]

Rates and Survival (surv-rate) 23/ 238

[Figure: empirical rates by calendar time (1994–2004) and time since diagnosis (0–12).]

Rates and Survival (surv-rate) 24/ 238

Likelihood for rates
Monday 19th, morning

Likelihood from one person

The likelihood from several empirical rates from one individual is a product of conditional probabilities:

\[
\begin{aligned}
P\{\text{event in } (t_3, t_4)\} = {} & P\{\text{event in } (t_3, t_4) \mid \text{alive at } t_3\} \\
 & \times P\{\text{survive } (t_2, t_3) \mid \text{alive at } t_2\} \\
 & \times P\{\text{survive } (t_1, t_2) \mid \text{alive at } t_1\} \\
 & \times P\{\text{survive } (t_0, t_1) \mid \text{alive at } t_0\}
\end{aligned}
\]

Log-likelihood from one individual is a sum of terms.

Each term refers to one empirical rate (d, y), with $y = t_i - t_{i-1}$ and mostly d = 0.

Likelihood for rates (likelihood) 25/ 238

Likelihood for an empirical rate

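Model: the rate is constant in the interval we are looking at. The interval should be sufficiently small for this assumption to be reasonable.

If $\pi = 1 - e^{-\lambda y}$ is the death probability:

\[
\begin{aligned}
L(\lambda) = P\{d \text{ events during } y \text{ time}\} &= \pi^d (1 - \pi)^{1-d} \\
 &= (1 - e^{-\lambda y})^d \, (e^{-\lambda y})^{1-d} \\
 &= \left( \frac{1 - e^{-\lambda y}}{e^{-\lambda y}} \right)^d e^{-\lambda y} \approx (\lambda y)^d e^{-\lambda y}
\end{aligned}
\]

since the fraction in the first factor is equal to $e^{\lambda y} - 1 \approx \lambda y$.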

Likelihood for rates (likelihood) 26/ 238
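How good the approximation is for short intervals can be looked at numerically. The values of λ and y below are assumptions chosen for illustration; for d = 0 the two expressions are identical, so only d = 1 is compared.

```r
# Hypothetical values: a rate of 0.014 per year over intervals of length y
lambda <- 0.014
y <- c(1, 0.5, 0.1, 0.01)   # interval lengths in years

exact_d1  <- 1 - exp(-lambda * y)            # pi, the death probability
approx_d1 <- lambda * y * exp(-lambda * y)   # (lambda * y)^1 * exp(-lambda * y)

# Relative error of the approximation, roughly lambda * y / 2
rel_err <- (exact_d1 - approx_d1) / exact_d1
print(cbind(y, exact_d1, approx_d1, rel_err))
```

The relative error shrinks with the interval length, which is the reason for requiring small intervals.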

Log-likelihood:

\[ \ell(\lambda) = d \log(\lambda y) - \lambda y = d \log(\lambda) + d \log(y) - \lambda y \]

The term $d \log(y)$ does not include λ, so the relevant part of the log-likelihood is:

\[ \ell(\lambda) = d \log(\lambda) - \lambda y \]

Likelihood for rates (likelihood) 27/ 238

Likelihood

Probability of the data, seen as a function of the parameter:

Assuming the rate (intensity) is constant, λ, the probability of observing 7 deaths in the course of 500 person-years is:

\[ P\{D = 7, Y = 500 \mid \lambda\} = \lambda^{D} e^{-\lambda Y} \times K = \lambda^{7} e^{-500\lambda} \times K = L(\lambda \mid \text{data}) \]

The best guess of λ is where this function is as large as possible.

A confidence interval is where it is not too far from the maximum.

Likelihood for rates (likelihood) 28/ 238
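The likelihood for 7 deaths in 500 person-years can be explored with a few lines of base R. This is a sketch only; in particular the cut-off used for “not too far from the maximum” (a drop of qchisq(0.95, 1)/2, about 1.92, in log-likelihood) is an assumption made here and is not stated on the slide.

```r
D <- 7
Y <- 500

# Log-likelihood for a constant rate, up to an additive constant
loglik <- function(lambda) D * log(lambda) - lambda * Y

# Numerical maximum; the analytical maximum is D / Y
fit <- optimize(loglik, interval = c(1e-4, 0.1), maximum = TRUE, tol = 1e-8)
lambda_hat <- fit$maximum

# Log-likelihood ratio relative to the maximum
llr <- function(lambda) loglik(lambda) - loglik(D / Y)

# Assumed cut-off: interval where llr is above -qchisq(0.95, 1) / 2
cut <- -qchisq(0.95, df = 1) / 2
lower <- uniroot(function(l) llr(l) - cut, c(1e-4, D / Y))$root
upper <- uniroot(function(l) llr(l) - cut, c(D / Y, 0.1))$root

# Rates per 1000 person-years
print(1000 * c(estimate = lambda_hat, lower = lower, upper = upper))
```

Plotting `exp(llr(x))` or `llr(x)` over a grid of rates gives a likelihood-ratio curve and a log-likelihood-ratio curve for the same data.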

Likelihood function

[Figure: likelihood as a function of the rate parameter λ, for λ from 0.00 to 0.05.]

Likelihood for rates (likelihood) 29/ 238

Likelihood-ratio function

[Figure: likelihood ratio (0 to 1) as a function of the rate parameter λ, for λ from 0.00 to 0.05.]

Likelihood for rates (likelihood) 30/ 238

Log-likelihood ratio

[Figure: log-likelihood ratio (−3 to 0) as a function of the rate parameter λ, for λ from 0.00 to 0.05.]

Likelihood for rates (likelihood) 31/ 238

Log-likelihood ratio, θ = log(λ)

[Figure: log-likelihood ratio (−3 to 0) as a function of the log-rate parameter θ, for θ from −7 to −3.]

Likelihood for rates (likelihood) 32/ 238

Log-likelihood ratio

[Figure: log-likelihood ratio (−3 to 0) as a function of the rate parameter λ per 1000, on a logarithmic axis from 0.5 to 50.]

Likelihood for rates (likelihood) 33/ 238


Log-likelihood ratio

[Figure: log-likelihood ratio (−3 to 0) as a function of the rate parameter λ per 1000, on a logarithmic axis from 0.5 to 50, with the estimate and confidence interval shown.]

\[ \hat\lambda = 7/500 = 14 \text{ per 1000}, \qquad \hat\lambda \overset{\times}{\div} \exp(1.96/\sqrt{7}) = (6.7, 29.4) \]

Likelihood for rates (likelihood) 36/ 238

Poisson likelihood

The contributions from one individual:

\[ d_t \log(\lambda(t)) - \lambda(t) y_t, \quad t = 1, \ldots, T \]

are like the log-likelihood from several independent Poisson observations with mean $\lambda(t) y_t$, i.e. log-mean $\log(\lambda(t)) + \log(y_t)$.

Analysis of the rates (λ) can be based on a Poisson model with log-link applied to empirical rates, where:

• d is the response variable.

• log(y) is the offset variable.

Likelihood for rates (likelihood) 37/ 238
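A minimal sketch of such a Poisson model in base R follows. The small dataset of empirical rates is invented for illustration; with only an intercept, the fitted rate should agree with the total number of events divided by the total risk time.

```r
# Hypothetical empirical rates (d, y) from follow-up of a few persons
er <- data.frame(
  d = c(0, 0, 1, 0, 0, 0, 1, 0),
  y = c(1.0, 1.0, 0.4, 1.0, 1.0, 1.0, 0.7, 0.3)
)

# Poisson model with log-link: d is the response, log(y) the offset
m <- glm(d ~ 1, family = poisson, offset = log(y), data = er)

# The intercept is the log-rate; its MLE is log(sum(d) / sum(y))
rate_glm <- exp(coef(m)[["(Intercept)"]])
rate_raw <- sum(er$d) / sum(er$y)
se_log_rate <- sqrt(vcov(m)[1, 1])   # equals 1 / sqrt(sum(d)) in this model
print(c(glm = rate_glm, raw = rate_raw, se = se_log_rate))
```

Covariates, including the timescale, would enter on the right-hand side of the model formula in place of the `1`.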

Likelihood for follow-up of many subjects

Adding empirical rates over the follow-up of persons:

\[ D = \sum d, \qquad Y = \sum y \quad\Rightarrow\quad D \log(\lambda) - \lambda Y \]

• Persons are assumed independent.

• Contributions from the same person are conditionally independent, hence give separate contributions to the log-likelihood.

Likelihood for rates (likelihood) 38/ 238

The log-likelihood is maximal for:

\[ \frac{d\ell(\lambda)}{d\lambda} = \frac{D}{\lambda} - Y = 0 \quad\Leftrightarrow\quad \hat\lambda = \frac{D}{Y} \]

Information about θ = log(λ):

\[ \ell(\theta \mid D, Y) = D\theta - e^{\theta} Y, \qquad \ell'_{\theta} = D - e^{\theta} Y, \qquad \ell''_{\theta} = -e^{\theta} Y \]

so $I(\hat\theta) = e^{\hat\theta} Y = \hat\lambda Y = D$, hence $\mathrm{var}(\hat\theta) = 1/D$.

Standard error of log-rate: $1/\sqrt{D}$.

Note that this only depends on the number of events, not on the follow-up time.

Likelihood for rates (likelihood) 39/ 238

Confidence interval for a rate

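A 95% confidence interval for the log of a rate is:

\[ \hat\theta \pm 1.96/\sqrt{D} = \log(\hat\lambda) \pm 1.96/\sqrt{D} \]

Take the exponential to get the confidence interval for the rate:

\[ \hat\lambda \overset{\times}{\div} \underbrace{\exp(1.96/\sqrt{D})}_{\text{error factor, erf}} \]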

Likelihood for rates (likelihood) 40/ 238

Exercise

Suppose we have 17 deaths during 843.7 years of follow-up.

Calculate the mortality rate with a 95% c.i.

Likelihood for rates (likelihood) 41/ 238

Exercise – solution

The rate is computed as:

\[ \hat\lambda = D/Y = 17/843.7 = 0.0201 = 20.1 \text{ per 1000 years} \]

The confidence interval is computed as:

\[ \hat\lambda \overset{\times}{\div} \mathrm{erf} = 20.1 \overset{\times}{\div} \exp(1.96/\sqrt{17}) = (12.5, 32.4) \]

per 1000 person-years.

Likelihood for rates (likelihood) 41/ 238
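The same computation can be wrapped in a small R function. The name `rate_ci` is hypothetical (it is not a function from the Epi package); the figures are those used in the solution above.

```r
# Rate with a confidence interval based on the error factor
rate_ci <- function(D, Y, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)     # 1.96 for a 95% interval
  rate <- D / Y
  erf <- exp(z / sqrt(D))             # error factor
  c(rate = rate, lower = rate / erf, upper = rate * erf)
}

ci <- 1000 * rate_ci(D = 17, Y = 843.7)   # per 1000 person-years
print(round(ci, 1))
```

Multiplying and dividing by the error factor gives an interval that is symmetric on the log scale and never goes below zero.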

Ratio of two rates

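If we have observations of two rates λ1 and λ0, based on (D1, Y1) and (D0, Y0), the variance of the log of the ratio of the rates, RR (i.e. of the difference of the log-rates), is:

\[
\begin{aligned}
\mathrm{var}(\log(\mathrm{RR})) &= \mathrm{var}(\log(\lambda_1/\lambda_0)) \\
 &= \mathrm{var}(\log(\lambda_1)) + \mathrm{var}(\log(\lambda_0)) \\
 &= 1/D_1 + 1/D_0
\end{aligned}
\]

As before, a 95% c.i. for the RR is then:

\[ \mathrm{RR} \overset{\times}{\div} \underbrace{\exp\left( 1.96 \sqrt{\frac{1}{D_1} + \frac{1}{D_0}} \right)}_{\text{error factor}} \]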

Likelihood for rates (likelihood) 42/ 238

Exercise

Suppose we have 17 deaths during 843.7 years of follow-up in group 0, and 28 deaths during 632.3 years in group 1.

Calculate the rate-ratio between group 1 and 0 with a 95% c.i.

Likelihood for rates (likelihood) 43/ 238

Exercise – solution

The rate-ratio is computed as:

\[ \mathrm{RR} = \hat\lambda_1/\hat\lambda_0 = (D_1/Y_1)/(D_0/Y_0) = (28/632.3)/(17/843.7) = 0.0443/0.0201 = 2.198 \]

The 95% confidence interval is computed as:

\[ \mathrm{RR} \overset{\times}{\div} \mathrm{erf} = 2.198 \overset{\times}{\div} \exp\left( 1.96 \sqrt{1/17 + 1/28} \right) = 2.198 \overset{\times}{\div} 1.827 = (1.20, 4.02) \]

Likelihood for rates (likelihood) 43/ 238
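A matching sketch for the rate-ratio, again with a hypothetical helper name (`rr_ci`), using the numbers from the solution above:

```r
# Rate-ratio with a confidence interval based on the error factor
rr_ci <- function(D1, Y1, D0, Y0, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)
  rr <- (D1 / Y1) / (D0 / Y0)
  erf <- exp(z * sqrt(1 / D1 + 1 / D0))   # error factor
  c(RR = rr, erf = erf, lower = rr / erf, upper = rr * erf)
}

res <- rr_ci(D1 = 28, Y1 = 632.3, D0 = 17, Y0 = 843.7)
print(round(res, 3))
```

Only the numbers of events enter the error factor; the person-years affect the point estimate alone.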

Lifetables
Monday 19th, morning

The life table method

The simplest analysis is by the “life-table method”:

interval   alive   dead   cens.
    i       n_i     d_i    l_i     p_i
    1        77      5      2      5/(77 − 2/2) = 0.066
    2        70      7      4      7/(70 − 4/2) = 0.103
    3        59      8      1      8/(59 − 1/2) = 0.137

\[ p_i = P\{\text{death in interval } i\} = d_i / (n_i - l_i/2) \]

\[ S(t) = (1 - p_1) \times \cdots \times (1 - p_t) \]

Lifetables (lifetable) 44/ 238

The life table method

The life-table method computes survival probabilities for each time interval, in demography normally one year.

The rate is the number of deaths $d_i$ divided by the risk time $(n_i - d_i/2 - l_i/2) \times \ell_i$:

\[ \lambda_i = \frac{d_i}{(n_i - d_i/2 - l_i/2) \times \ell_i} \]

and hence the death probability:

\[ p_i = 1 - \exp(-\lambda_i \ell_i) = 1 - \exp\left( -\frac{d_i}{n_i - d_i/2 - l_i/2} \right) \]

The modified life-table estimator.

Lifetables (lifetable) 45/ 238
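The classical and the modified estimator can be compared on the three intervals from the previous slide. The sketch assumes an interval length of 1 (the slides do not state the unit), and the variable names are chosen here for illustration.

```r
lt <- data.frame(
  n = c(77, 70, 59),   # alive at start of interval
  d = c(5, 7, 8),      # deaths
  l = c(2, 4, 1)       # censored
)
len <- 1               # interval length (assumed to be 1 time unit)

# Classical estimator: censored persons count as half a person at risk
p_classic <- lt$d / (lt$n - lt$l / 2)

# Modified estimator: rate from the risk time, then p = 1 - exp(-rate * length)
risk_time <- (lt$n - lt$d / 2 - lt$l / 2) * len
rate <- lt$d / risk_time
p_modified <- 1 - exp(-rate * len)

res <- cbind(lt, p_classic, rate, p_modified,
             S_classic = cumprod(1 - p_classic),
             S_modified = cumprod(1 - p_modified))
print(round(res, 3))
```

For intervals with few deaths relative to the number at risk, the two death probabilities are very close.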

Population life table, DK 1997–98

              Men                           Women
  a     S(a)    λ(a)  E[ℓ_res(a)]     S(a)    λ(a)  E[ℓ_res(a)]
  0   1.00000   567     73.68       1.00000   474     78.65
  1   0.99433    67     73.10       0.99526    47     78.02
  2   0.99366    38     72.15       0.99479    21     77.06
  3   0.99329    25     71.18       0.99458    14     76.08
  4   0.99304    25     70.19       0.99444    14     75.09
  5   0.99279    21     69.21       0.99430    11     74.10
  6   0.99258    17     68.23       0.99419     6     73.11
  7   0.99242    14     67.24       0.99413     3     72.11
  8   0.99227    15     66.25       0.99410     6     71.11
  9   0.99213    14     65.26       0.99404     9     70.12
 10   0.99199    17     64.26       0.99395    17     69.12
 11   0.99181    19     63.28       0.99378    15     68.14
 12   0.99162    16     62.29       0.99363    11     67.15
 13   0.99147    18     61.30       0.99352    14     66.15
 14   0.99129    25     60.31       0.99338    11     65.16
 15   0.99104    45     59.32       0.99327    10     64.17
 16   0.99059    50     58.35       0.99317    18     63.18
 17   0.99009    52     57.38       0.99299    29     62.19
 18   0.98957    85     56.41       0.99270    35     61.21
 19   0.98873    79     55.46       0.99235    30     60.23
 20   0.98795    70     54.50       0.99205    35     59.24
 21   0.98726    71     53.54       0.99170    31     58.27

λ(a) is given per 100,000 person-years.

Lifetables (lifetable) 46/ 238

[Figure: Danish life tables 1997–98. Mortality per 100,000 person-years by age, 0–100.]

log2(mortality per 10^5), ages 40–85 years:
Men: −14.244 + 0.135 age
Women: −14.877 + 0.135 age

Lifetables (lifetable) 47/ 238

[Figure: Swedish life tables 1997–98. Mortality per 100,000 person-years by age, 0–100.]

log2(mortality per 10^5), ages 40–85 years:
Men: −15.453 + 0.146 age
Women: −16.204 + 0.146 age

Lifetables (lifetable) 48/ 238

Practical

Based on the previous slides, answer the following for both the Danish and the Swedish lifetables:

• What is the doubling time for mortality?

• What is the rate-ratio between males and females?

• How much older should a woman be in order to have the same mortality as a man?

Lifetables (lifetable) 49/ 238

Denmark            Males                    Females
log2(λ(a))         −14.244 + 0.135 age      −14.877 + 0.135 age
Doubling time      1/0.135 = 7.41 years
M/F rate-ratio     2^(−14.244+14.877) = 2^0.633 = 1.55
Age-difference     (−14.244 + 14.877)/0.135 = 4.69 years

Sweden             Males                    Females
log2(λ(a))         −15.453 + 0.146 age      −16.204 + 0.146 age
Doubling time      1/0.146 = 6.85 years
M/F rate-ratio     2^(−15.453+16.204) = 2^0.751 = 1.68
Age-difference     (−15.453 + 16.204)/0.146 = 5.14 years

Lifetables (lifetable) 49/ 238
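The three answers follow directly from the intercepts and the common slope of the fitted lines. A short R sketch, with column names made up for the purpose:

```r
# Fitted lines for log2(mortality) from the slides (ages 40-85)
fits <- data.frame(
  country   = c("Denmark", "Sweden"),
  int_men   = c(-14.244, -15.453),
  int_women = c(-14.877, -16.204),
  slope     = c(0.135, 0.146)
)

# One unit on the log2 scale is a doubling, so doubling time is 1 / slope
fits$doubling_time <- 1 / fits$slope
# Difference in intercepts is the log2 of the M/F rate-ratio
fits$rate_ratio <- 2^(fits$int_men - fits$int_women)
# Years of age needed to make up the difference in intercepts
fits$age_diff <- (fits$int_men - fits$int_women) / fits$slope
print(fits)
```

Because the lines are parallel on the log scale, the rate-ratio and the age-difference do not depend on age within the fitted range.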

Observations for the lifetable

[Figure: Lexis diagram of age (50–65) by calendar time (1995–2000).]

The life table is based on person-years and deaths accumulated in a short period.

Age-specific rates: cross-sectional!

Survival function:

\[ S(t) = e^{-\int_0^t \lambda(a)\,da} = e^{-\sum_0^t \lambda(a)} \]

This assumes stability of rates to be interpretable for actual persons.

Lifetables (lifetable) 50/ 238

Life table approach

The observation of interest is not the survival time of the individual.

It is the population experience:

D: Deaths (events).

Y: Person-years (risk time).

The classical lifetable analysis compiles these for prespecified intervals of age, and computes age-specific mortality rates.

Data are collected cross-sectionally, but interpreted longitudinally.

Lifetables (lifetable) 51/ 238

Rates vary over time:

[Figures, slides 52–70: Finnish life tables 1986–2003, one plot per year. Mortality per 100,000 person-years by age, 0–100.]

Fitted lines for log2(mortality per 10^5), ages 40–85 years: intercepts for men and women, and the common slope per year of age.

Year    Men        Women      Slope
1986   −14.061    −15.266    0.138
1987   −14.216    −15.384    0.140
1988   −14.043    −15.211    0.137
1989   −14.007    −15.156    0.136
1990   −13.996    −15.139    0.136
1991   −14.093    −15.247    0.136
1992   −14.122    −15.265    0.137
1993   −14.300    −15.418    0.139
1994   −14.275    −15.412    0.137
1995   −14.224    −15.351    0.136
1996   −14.235    −15.388    0.136
1997   −14.151    −15.235    0.134
1998   −14.192    −15.337    0.135
1999   −14.234    −15.374    0.135
2000   −14.205    −15.271    0.133
2001   −14.268    −15.329    0.133
2002   −14.391    −15.448    0.135
2003   −14.339    −15.412    0.134

Lifetables (lifetable) 70/ 238

Who needs the Cox-model anyway?
Monday 19th, afternoon

Empirical rates

At the individual level we introduce the empirical rate: (d, y), the number of events (d ∈ {0, 1}) during y risk time.

Each individual contributes several observations.

Empirical rates are responses in survival analysis.

The timescale is a covariate; it varies across empirical rates from one individual.

Who needs the Cox-model anyway? (WntCma) 71/ 238

Likelihood

The likelihood from several empirical rates from one individual is a product of conditional probabilities. Hence, the log-likelihood contribution from one individual is a sum of terms.

The log-likelihood from one empirical rate (d, y), assuming the event rate λ is constant, is:

\[ d \log(\lambda) - \lambda y \]

so the contribution from one individual is like the contribution from several independent Poisson observations.

Who needs the Cox-model anyway? (WntCma) 72/ 238

The proportional hazards model

\[ \lambda(t, x) = \lambda_0(t) \times \exp(x'\beta) \]

A model for the rate as a function of t and x.

The covariate t has a special status:

• Computationally, because all individuals contribute to (some of) the range of t.

• Conceptually it is less clear: t is but a covariate that varies within individuals.

Who needs the Cox-model anyway? (WntCma) 73/ 238

Cox-likelihood

The partial likelihood for the regression parameters:

\[ \ell(\beta) = \sum_{\text{death times}} \log\left( \frac{e^{\eta_{\text{death}}}}{\sum_{i \in R_t} e^{\eta_i}} \right) \]

is also a profile likelihood in the model where observation time has been subdivided into small pieces (empirical rates) and each small piece is provided with its own parameter:

\[ \log(\lambda(t, x)) = \log(\lambda_0(t)) + x'\beta = \alpha_t + \eta \]

Who needs the Cox-model anyway? (WntCma) 74/ 238

The Cox-likelihood as profile likelihood

• Regression parameters describing the effect of covariates (other than the chosen underlying timescale).

• One parameter per death time to describe the effect of time (i.e. the chosen timescale).

\[ \log(\lambda(t, x_i)) = \log(\lambda_0(t)) + \beta_1 x_{1i} + \cdots + \beta_p x_{pi} = \alpha_t + \eta_i \]

Suppose the timescale has been divided into small intervals with at most one death in each.

Assume w.l.o.g. that the ys in the empirical rates are all 1.

Who needs the Cox-model anyway? (WntCma) 75/ 238

Log-likelihood contributions that contain information on a specific timescale parameter $\alpha_t$ will be from:

• the (only) empirical rate (d, y) = (1, 1) from the person that died at time t.

• all other empirical rates (d, y) = (0, 1) from those who were at risk at time t.

Who needs the Cox-model anyway? (WntCma) 76/ 238

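Note: There is one contribution from each person at risk to this part of the log-likelihood:

\[
\begin{aligned}
\ell_t(\alpha_t, \beta) &= \sum_{i \in R_t} \left\{ d_i \log(\lambda_i(t)) - \lambda_i(t) y_i \right\} \\
 &= \sum_{i \in R_t} \left\{ d_i (\alpha_t + \eta_i) - e^{\alpha_t + \eta_i} \right\} \\
 &= \alpha_t + \eta_{\text{death}} - e^{\alpha_t} \sum_{i \in R_t} e^{\eta_i}
\end{aligned}
\]

where $\eta_{\text{death}}$ is the linear predictor for the person that died.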

Who needs the Cox-model anyway? (WntCma) 77/ 238

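The derivative w.r.t. $\alpha_t$ is:

\[ D_{\alpha_t} \ell_t(\alpha_t, \beta) = 1 - e^{\alpha_t} \sum_{i \in R_t} e^{\eta_i} = 0 \quad\Leftrightarrow\quad e^{\alpha_t} = \frac{1}{\sum_{i \in R_t} e^{\eta_i}} \]

If this estimate is fed back into the log-likelihood for $\alpha_t$, we get the profile likelihood (with $\alpha_t$ “profiled out”):

\[ \log\left( \frac{1}{\sum_{i \in R_t} e^{\eta_i}} \right) + \eta_{\text{death}} - 1 = \log\left( \frac{e^{\eta_{\text{death}}}}{\sum_{i \in R_t} e^{\eta_i}} \right) - 1 \]

which, apart from the constant −1, is the same as the contribution from time t to Cox’s partial likelihood.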

Who needs the Cox-model anyway? (WntCma) 78/ 238
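The profiling argument can be verified numerically for a single risk set. In the sketch below the linear predictors are invented values, and the first person in the risk set is taken to be the one who dies; the Poisson log-likelihood is maximized over $\alpha_t$ and compared with the closed-form expression.

```r
# Hypothetical linear predictors for the persons in one risk set R_t;
# the first person is the one who dies at time t
eta <- c(0.3, -0.2, 0.8, 0.0, -0.5)
eta_death <- eta[1]

# Poisson log-likelihood contribution at time t (all y equal to 1)
ll_t <- function(alpha) alpha + eta_death - exp(alpha) * sum(exp(eta))

# Profile out alpha_t numerically and compare with the closed form
opt <- optimize(ll_t, interval = c(-20, 20), maximum = TRUE, tol = 1e-10)
alpha_hat <- -log(sum(exp(eta)))   # closed-form maximizer

profile_ll <- opt$objective
cox_term <- log(exp(eta_death) / sum(exp(eta)))
print(c(profile = profile_ll, cox_minus_1 = cox_term - 1))
```

The constant −1 does not involve β, so it plays no role when the regression parameters are estimated.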

What the Cox-model really is

Taking the life-table approach ad absurdum by:

• dividing time as finely as possible,

• modelling one covariate, the timescale, with one parameter per distinct value,

• profiling these parameters out by maximizing the profile likelihood.

Subsequently, one may recover the effect of the timescale by smoothing an estimate of the cumulative sum of these.

Who needs the Cox-model anyway? (WntCma) 79/ 238

Sensible modelling

Replace the $\alpha_t$s by a parametric function f(t) with a limited number of parameters, for example:

• Piecewise constant

• Splines (linear, quadratic or cubic)

• Fractional polynomials

Use Poisson modelling software on a dataset of empirical rates for small intervals (ys).

Note that the intervals need not be derived from the death times.

They should just be small enough to make the constant rate assumption reasonable.

Who needs the Cox-model anyway? (WntCma) 80/ 238

Splitting the dataset

The Poisson approach needs a dataset of empirical rates with small values of y.

It is larger than the original: each individual contributes many empirical rates. From each empirical rate we get:

• Poisson-response d

• Risk time y

• Covariate value for the timescale (time since entry, current age, current date, . . . )

• other covariates

Who needs the Cox-model anyway? (WntCma) 81/ 238
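What splitting does can be sketched in base R without any packages. The follow-up data, the helper `split_followup` and the variable name `tfe` (time from entry at the start of the interval) are all invented for this illustration; it is a stand-in for the idea, not the Epi implementation.

```r
# Hypothetical follow-up data: time since entry (years) and event indicator
fu <- data.frame(id = 1:5,
                 time = c(2.3, 0.8, 4.0, 1.5, 3.2),
                 event = c(1, 1, 1, 1, 0))
breaks <- c(0, 1, 2, 3, 4)   # interval boundaries on the timescale

# Split one person's follow-up into empirical rates (d, y), one per interval
split_followup <- function(id, time, event, breaks) {
  lo <- breaks[-length(breaks)]
  hi <- breaks[-1]
  y <- pmin(time, hi) - lo            # risk time in each interval
  keep <- y > 0                       # drop intervals after exit
  d <- as.numeric(event == 1 & time > lo & time <= hi)
  data.frame(id = id, tfe = lo[keep], y = y[keep], d = d[keep])
}

er <- do.call(rbind, Map(split_followup, fu$id, fu$time, fu$event,
                         MoreArgs = list(breaks = breaks)))
print(er)

# Piecewise constant rates: Poisson model with log(y) as offset
m <- glm(d ~ factor(tfe) - 1, family = poisson, offset = log(y), data = er)
print(exp(coef(m)))   # events / risk time in each interval
```

Splitting changes neither the total risk time nor the number of events; it only distributes them over intervals so that the timescale can enter the model as a covariate.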

Example: Mayo Clinic lung cancer

  time status age sex
1  306      2  74   1
2  455      2  68   1

> Lx <- Lexis( exit=list( tfd=time), exit.status=(status==2), da
NOTE: entry is assumed to be 0 on the tfd timescale.

> tab(Lx,scale=365.25)
States:
#records:
       To
From    FALSE TRUE Sum #events: #risk time: Rate (95
FALSE      63  165 228      165    190.5352 0.8659815 0.743432

> dx <- splitLexis( Lx, "tfd", breaks=c(0,unique(Lx$time)) )
> tab( dx, scale=365.25 )
States:
#records:
       To
From    FALSE TRUE   Sum #events: #risk time: Rate (
FALSE   19857  165 20022      165    190.5352 0.8659815 0.7434

Who needs the Cox-model anyway? (WntCma) 82/ 238

The baseline hazard and survival functions

Using a parametric function to model the baseline hazard gives the possibility to plot it with confidence intervals for a given set of covariate values, $x_0$.

The survival function in a multiplicative Poisson model has the form:

\[ S(t) = \exp\left( -\sum_{\tau < t} \exp(g(\tau) + x_0'\gamma) \right) \]

This is just a non-linear function of the parameters in the model, g and γ. So the variance can be computed using the δ-method.

Who needs the Cox-model anyway? (WntCma) 83/ 238

δ-method for survival function

1. Select timepoints t_i (fairly close).

2. Get estimates of log-rates f(t_i) = g(t_i) + x_0′γ for these points:

$$ f(t_i) = B \beta $$

where β is the total parameter vector in the model.

3. Variance-covariance matrix of β: Σ.

4. Variance-covariance of f(t_i): BΣB′.

5. Transformation to the rates is the coordinate-wise exponential function, with derivative diag[exp(f(t_i))].

Who needs the Cox-model anyway? (WntCma) 84/ 238


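6. Variance-covariance matrix of the rates at the points t_i:

$$ \operatorname{diag}\bigl(e^{f(t_i)}\bigr)\, B \Sigma B' \operatorname{diag}\bigl(e^{f(t_i)}\bigr)' $$

7. Transformation to cumulative hazard (ℓ is interval length):

$$
\ell \times
\begin{bmatrix} 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 1 & 1 & 0 \\ 1 & 1 & 1 & 1 \end{bmatrix}
\begin{bmatrix} e^{f(t_1)} \\ e^{f(t_2)} \\ e^{f(t_3)} \\ e^{f(t_4)} \end{bmatrix}
= L
\begin{bmatrix} e^{f(t_1)} \\ e^{f(t_2)} \\ e^{f(t_3)} \\ e^{f(t_4)} \end{bmatrix}
$$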

Who needs the Cox-model anyway? (WntCma) 85/ 238

8. Variance-covariance matrix for the cumulative hazard is:

$$ L \operatorname{diag}\bigl(e^{f(t_i)}\bigr)\, B \Sigma B' \operatorname{diag}\bigl(e^{f(t_i)}\bigr)' L' $$

This is all implemented in the ci.cum() function in Epi.
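The δ-method steps can be traced in a few lines of base R. This is only a minimal sketch with made-up inputs: the design matrix `B`, the estimates `beta`, their covariance `Sigma` and the interval length `ell` are hypothetical values chosen for illustration, not results from the lung cancer data. In practice ci.cum() would normally be used instead.

```r
# Hypothetical inputs (not taken from the course data)
B     <- cbind(1, c(0, 1, 2, 3))                    # design matrix at 4 time points
beta  <- c(-2, 0.1)                                 # parameter estimates
Sigma <- matrix(c(0.04, -0.01, -0.01, 0.01), 2, 2)  # their variance-covariance
ell   <- 0.5                                        # interval length

f      <- as.vector(B %*% beta)                     # step 2: log-rates
V.f    <- B %*% Sigma %*% t(B)                      # step 4: variance of the log-rates
rate   <- exp(f)                                    # step 5: rates
V.rate <- diag(rate) %*% V.f %*% diag(rate)         # step 6: variance of the rates
L      <- ell * lower.tri(diag(4), diag = TRUE)     # step 7: cumulative-sum matrix
cumhaz <- as.vector(L %*% rate)                     # cumulative hazard
V.cum  <- L %*% V.rate %*% t(L)                     # step 8: its variance
se.cum <- sqrt(diag(V.cum))                         # standard errors
surv   <- exp(-cumhaz)                              # survival function

round(cbind(cumhaz, se.cum, surv), 4)
```

Confidence limits for the survival function could then be obtained by transforming limits for the cumulative hazard; which scale to build the limits on is a design choice not fixed by this sketch.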

Who needs the Cox-model anyway? (WntCma) 86/ 238

Mayo clinic lung cancer data

Smoothing by natural splines with 7 parameters; knots at 0, 25, 75, 150, 250, 500, 1000 days.

[Figure: panels plotted against days since diagnosis (0 to 800): mortality rate per year (axis 1 to 5) and survival (axis 0.0 to 1.0).]

Who needs the Cox-model anyway? (WntCma) 87/ 238


Computational tools for time-splitting

R: The function splitLexis, written by Martyn Plummer, in the package Epi, available at CRAN.

Stata: The function stsplit (part of standard Stata). Descendant of stlexis, written by Michael Hills & David Clayton.

SAS: A macro %Lexis, available at http://wwww.biostat.ku.dk/~bxc/Lexis.

Who needs the Cox-model anyway? (WntCma) 88/ 238

Conclusion

• One-to-one correspondence between life tables and classical survival analysis.

• The Cox model (and the Kaplan-Meier estimator) is the life table taken ad absurdum, estimating one probability per interval defined by events/censorings.

• The natural modification is to use the modified life table estimator as the basis for Poisson modelling.

Who needs the Cox-model anyway? (WntCma) 89/ 238

Follow-up data
Monday 19th, afternoon



Follow-up and rates

• Follow-up studies:
  - D: events, deaths
  - Y: person-years
  - λ = D/Y: rates

• Rates differ between persons.

• Rates differ within persons:
  - Along age
  - Along calendar time

• Multiple timescales.

Follow-up data (FU-rep-Lexis) 90/ 238

Representation of follow-up data

In a cohort study we have records of: Events and Risk time.

Follow-up data for each individual must have (at least) three variables:

• Date of entry: date variable.

• Date of exit: date variable.

• Status at exit: indicator variable (0/1).

Specific for each type of outcome.

Follow-up data (FU-rep-Lexis) 91/ 238

Aim of dividing time into bands:

Put
  D: events
  Y: risk time
in intervals on the timescale.

Origin: the date where the timescale is 0:

• Age: 0 at date of birth

• Disease duration: 0 at date of diagnosis

• Occupational exposure: 0 at date of hire

Intervals: how should the timescale be subdivided?

• 1-year classes? 5-year classes?

• Equal length?

Follow-up data (FU-rep-Lexis) 92/ 238


Cohort with 3 persons:

Id  Bdate       Entry       Exit        St
1   14/07/1952  04/08/1965  27/06/1997  1
2   01/04/1954  08/09/1972  23/05/1995  0
3   10/06/1987  23/12/1991  24/07/1998  1

• Define strata: 10-year intervals of current age.

• Split Y for every subject accordingly.

• Treat each segment as a separate unit of observation.

• Keep track of exit status in each interval.

Follow-up data (FU-rep-Lexis) 93/ 238

Splitting the follow up

                  subj. 1  subj. 2  subj. 3
Age at Entry:       13.06    18.44     4.54
Age at eXit:        44.95    41.14    11.12
Status at exit:      Dead    Alive     Dead
Y                   31.89    22.70     6.58
D                       1        0        1

Follow-up data (FU-rep-Lexis) 94/ 238

        subj. 1      subj. 2      subj. 3         ∑
Age      Y    D       Y    D       Y    D       Y    D
 0–   0.00    0    0.00    0    5.46    0    5.46    0
10–   6.94    0    1.56    0    1.12    1    9.62    1
20–  10.00    0   10.00    0    0.00    0   20.00    0
30–  10.00    0   10.00    0    0.00    0   20.00    0
40–   4.95    1    1.14    0    0.00    0    6.09    1
 ∑   31.89    1   22.70    0    6.58    1   61.17    2
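The table can be recomputed directly from the ages at entry and exit. The following base R sketch is an illustration only (the variable names are not from the course material): the risk time a person contributes to an age band is the overlap between the band and the person's follow-up interval. With these inputs the 10– band sums to 6.94 + 1.56 + 1.12 = 9.62 person-years and the grand total to 61.17.

```r
# Ages at entry and exit and status at exit for the three persons
entry <- c(13.06, 18.44, 4.54)
exit  <- c(44.95, 41.14, 11.12)
dead  <- c(1, 0, 1)

# 10-year age bands: [0,10), [10,20), ..., [40,50)
lo <- seq(0, 40, 10)
hi <- lo + 10

# Risk time: overlap between each band and each person's follow-up
Y <- sapply(seq_along(entry),
            function(i) pmax(0, pmin(exit[i], hi) - pmax(entry[i], lo)))
# Events: a death is counted in the band where follow-up ends
D <- sapply(seq_along(entry),
            function(i) as.numeric(dead[i] == 1 & exit[i] > lo & exit[i] <= hi))
dimnames(Y) <- dimnames(D) <- list(Age = paste0(lo, "-"), subj = 1:3)

cbind(Y, sum.Y = rowSums(Y), sum.D = rowSums(D))
colSums(Y)
```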

Follow-up data (FU-rep-Lexis) 95/ 238


Splitting the follow-up

id  Bdate       Entry       Exit        St     risk  int
1   14/07/1952  03/08/1965  14/07/1972   0   6.9432   10
1   14/07/1952  14/07/1972  14/07/1982   0  10.0000   20
1   14/07/1952  14/07/1982  14/07/1992   0  10.0000   30
1   14/07/1952  14/07/1992  27/06/1997   1   4.9528   40
2   01/04/1954  08/09/1972  01/04/1974   0   1.5606   10
2   01/04/1954  01/04/1974  31/03/1984   0  10.0000   20
2   01/04/1954  31/03/1984  01/04/1994   0  10.0000   30
2   01/04/1954  01/04/1994  23/05/1995   0   1.1417   40
3   10/06/1987  23/12/1991  09/06/1997   0   5.4634    0
3   10/06/1987  09/06/1997  24/07/1998   1   1.1211   10

But what if we want to keep track of calendar time too?

Follow-up data (FU-rep-Lexis) 96/ 238

Timescales

• A timescale is a variable that varies deterministically within each person during follow-up:
  - Age
  - Calendar time
  - Time since treatment
  - Time since relapse

• All timescales advance at the same pace (1 year per year ...).

• Note: Cumulative exposure is not a timescale.

Follow-up data (FU-rep-Lexis) 97/ 238

Representation of follow-up on several timescales

• The time followed is the same on all timescales.

• Only use the entry point on each timescale:
  - Age at entry.
  - Date of entry.
  - Time since treatment at entry (if time of treatment is the entry, this is 0 for all).

Follow-up data (FU-rep-Lexis) 98/ 238


Follow-up data in Epi: Lexis objects

A follow-up study:

> round( th, 2 )
    id sex birthdat contrast injecdat volume exitdat exitstat
1    1   2  1916.61        1  1938.79     22 1976.79        1
2  640   2  1896.23        1  1945.77     20 1964.37        1
3 3425   1  1886.97        2  1955.18      0 1956.59        1
4 4017   2  1936.81        2  1957.61      0 1992.14        2

Timescales of interest:

• Age

• Calendar time

• Time since injection

Follow-up data (FU-rep-Lexis) 99/ 238

Definition of Lexis object

> thL <- Lexis( entry = list( age=injecdat-birthdat,
+                             per=injecdat,
+                             tfi=0 ),
+                exit = list( per=exitdat ),
+         exit.status = (exitstat==1)*1,
+                data = th )

Entry is defined on three timescales, but exit is only defined on one timescale: follow-up time is the same on all timescales.

Follow-up data (FU-rep-Lexis) 100/ 238

The looks of a Lexis object

> round( thL[,c(1:8,14,15)], 2 )
    age     per tfi lex.dur lex.Cst lex.Xst lex.id   id
1 22.18 1938.79   0   38.00       0       1      1    1
2 49.55 1945.77   0   18.60       0       1      2  640
3 68.21 1955.18   0    1.40       0       1      3 3425
4 20.80 1957.61   0   34.52       0       0      4 4017
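Because all timescales advance at the same pace, the exit point on every timescale follows from the entry point plus the follow-up time. A small base R sketch of that bookkeeping, using the rounded values printed for the four persons (so the results agree with the Lexis object only up to rounding; the object names `entry` and `exit` are illustrative, not part of Epi):

```r
# Rounded values as printed in the follow-up study listing
birthdat <- c(1916.61, 1896.23, 1886.97, 1936.81)
injecdat <- c(1938.79, 1945.77, 1955.18, 1957.61)
exitdat  <- c(1976.79, 1964.37, 1956.59, 1992.14)

# Follow-up time: defined from one timescale (per), valid for all
dur <- exitdat - injecdat

# Entry point on three timescales: age, calendar time, time from injection
entry <- data.frame(age = injecdat - birthdat, per = injecdat, tfi = 0)

# Exit point on each timescale is entry plus the common duration
exit <- data.frame(age = entry$age + dur,
                   per = entry$per + dur,
                   tfi = entry$tfi + dur)

round(cbind(entry, lex.dur = dur), 2)
round(exit, 2)
```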

Follow-up data (FU-rep-Lexis) 101/ 238


[Figure: Lexis diagram of the four persons, with age (20 to 70) on the x-axis and per (1940 to 1990) on the y-axis.]

> plot( thL, lwd=3 )

Follow-up data (FU-rep-Lexis) 102/ 238

[Figure: Lexis diagram of the four persons, with per (1940 to 1990) on the x-axis and age (20 to 70) on the y-axis.]

> plot( thL, 2:1, lwd=5, col=c("red","blue")[thL$contrast], grid=T )

> points( thL, 2:1, pch=c(NA,3)[thL$lex.Xst+1],lwd=3, cex=1.5 )

Follow-up data (FU-rep-Lexis) 103/ 238

[Figure: Lexis diagram of the four persons with a grid, per (1930 to 2000) on the x-axis and age (10 to 80) on the y-axis.]

> plot( thL, 2:1, lwd=5, col=c("red","blue")[thL$contrast],
+       grid=TRUE, lty.grid=1, col.grid=gray(0.7),
+       xlim=1930+c(0,70), xaxs="i", ylim= 10+c(0,70), yaxs="i", las=1 )
> points( thL, 2:1, pch=c(NA,3)[thL$lex.Xst+1],lwd=3, cex=1.5 )

Follow-up data (FU-rep-Lexis) 104/ 238


Splitting follow-up time

> spl1 <- splitLexis( thL, "age", breaks=seq(0,100,20) )
> round( spl1, 2 )
  lex.id   age     per   tfi lex.dur lex.Cst lex.Xst   id sex bi
1      1 22.18 1938.79  0.00   17.82       0       0    1   2  1
2      1 40.00 1956.61 17.82   20.00       0       0    1   2  1
3      1 60.00 1976.61 37.82    0.18       0       1    1   2  1
4      2 49.55 1945.77  0.00   10.45       0       0  640   2  1
5      2 60.00 1956.23 10.45    8.14       0       1  640   2  1
6      3 68.21 1955.18  0.00    1.40       0       1 3425   1  1
7      4 20.80 1957.61  0.00   19.20       0       0 4017   2  1
8      4 40.00 1976.81 19.20   15.33       0       0 4017   2  1

Follow-up data (FU-rep-Lexis) 105/ 238

Split on a second timescale

> # Split further on tfi:
> spl2 <- splitLexis( spl1, "tfi", breaks=c(0,1,5,20,100) )
> round( spl2, 2 )
   lex.id   age     per   tfi lex.dur lex.Cst lex.Xst   id sex b
1       1 22.18 1938.79  0.00    1.00       0       0    1   2
2       1 23.18 1939.79  1.00    4.00       0       0    1   2
3       1 27.18 1943.79  5.00   12.82       0       0    1   2
4       1 40.00 1956.61 17.82    2.18       0       0    1   2
5       1 42.18 1958.79 20.00   17.82       0       0    1   2
6       1 60.00 1976.61 37.82    0.18       0       1    1   2
7       2 49.55 1945.77  0.00    1.00       0       0  640   2
8       2 50.55 1946.77  1.00    4.00       0       0  640   2
9       2 54.55 1950.77  5.00    5.45       0       0  640   2
10      2 60.00 1956.23 10.45    8.14       0       1  640   2
11      3 68.21 1955.18  0.00    1.00       0       0 3425   1
12      3 69.21 1956.18  1.00    0.40       0       1 3425   1
13      4 20.80 1957.61  0.00    1.00       0       0 4017   2
14      4 21.80 1958.61  1.00    4.00       0       0 4017   2
15      4 25.80 1962.61  5.00   14.20       0       0 4017   2
16      4 40.00 1976.81 19.20    0.80       0       0 4017   2

Follow-up data (FU-rep-Lexis) 106/ 238

The Poisson likelihood for time-split data

One record per person-interval (i, t):

$$ D\ln(\lambda) - \lambda Y = \sum_{i,t} \bigl( d_{it}\ln(\lambda) - \lambda y_{it} \bigr) $$

Assuming that the death indicator (d_it ∈ {0, 1}) is Poisson, with log(y_it) as offset, will give the same result.

The model assumes that rates are constant.

But the split data allow relaxing this to models that assume different rates for different (d_it, y_it).

Where are the (d_it, y_it) in the split data?

Follow-up data (FU-rep-Lexis) 107/ 238


The Poisson likelihood for time-split data

If d ∼ Poisson(λy), i.e. with mean λy, then the log-likelihood is

d log(λy) − λy

If we assume a multiplicative model for the rates, i.e. an additive model for the log-rates, we can use a Poisson model which is multiplicative in the mean, μ, i.e. linear in log(μ):

log(μ) = log(λy) = log(λ) + log(y)

The regression model must include log(y) as a covariate with its coefficient fixed to 1, that is, as an offset variable.

Follow-up data (FU-rep-Lexis) 108/ 238

[Figure: follow-up in the split data, with age (20 to 70) on the x-axis and tfi (0 to 50) on the y-axis.]

plot( spl2, c(1,3), col="black", lwd=2 )

Follow-up data (FU-rep-Lexis) 109/ 238

Where is (dit, yit) in the split data?

> round( spl2, 2 )
   lex.id   age     per   tfi lex.dur lex.Cst lex.Xst   id sex b
1       1 22.18 1938.79  0.00    1.00       0       0    1   2
2       1 23.18 1939.79  1.00    4.00       0       0    1   2
3       1 27.18 1943.79  5.00   12.82       0       0    1   2
4       1 40.00 1956.61 17.82    2.18       0       0    1   2
5       1 42.18 1958.79 20.00   17.82       0       0    1   2
6       1 60.00 1976.61 37.82    0.18       0       1    1   2
7       2 49.55 1945.77  0.00    1.00       0       0  640   2
8       2 50.55 1946.77  1.00    4.00       0       0  640   2
9       2 54.55 1950.77  5.00    5.45       0       0  640   2
10      2 60.00 1956.23 10.45    8.14       0       1  640   2
11      3 68.21 1955.18  0.00    1.00       0       0 3425   1
12      3 69.21 1956.18  1.00    0.40       0       1 3425   1
13      4 20.80 1957.61  0.00    1.00       0       0 4017   2
14      4 21.80 1958.61  1.00    4.00       0       0 4017   2
15      4 25.80 1962.61  5.00   14.20       0       0 4017   2
16      4 40.00 1976.81 19.20    0.80       0       0 4017   2
17      4 40.80 1977.61 20.00   14.52       0       0 4017   2

Follow-up data (FU-rep-Lexis) 110/ 238


Analysis of results

• d_i: events, in the variable lex.Xst.

• y_i: risk time, in lex.dur (duration). Enters the model via log(y) as offset.

• Covariates are:
  - timescales (age, period, time in study)
  - other variables for this person (constant or assumed constant in each interval).

• Model rates using the covariates in glm: there is no difference between timescales and other covariates.

Follow-up data (FU-rep-Lexis) 111/ 238

Poisson model for split data

• Each interval contributes −λY to the log-likelihood.

• All intervals with the same set of covariate values (age, exposure, ...) have the same λ.

• The log-likelihood contribution from these is −λ∑Y, the same as from aggregated data.

• The event intervals each contribute D log λ.

• The log-likelihood contribution from those with the same λ is ∑D log λ, the same as from aggregated data.

• The log-likelihood is the same for split data and aggregated data, so there is no need to tabulate first.
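The equivalence of split and aggregated data can be checked numerically. The sketch below uses the ten split records of the three-person cohort (risk time rounded to two decimals) and a hypothetical two-level covariate, age below 20 or not, introduced only for illustration. Fitting the same Poisson model to the split records and to the records aggregated by that covariate should give the same estimates.

```r
# Split records for the three-person cohort: risk time, event, age band
rec <- data.frame(
  y    = c(6.94, 10, 10, 4.95, 1.56, 10, 10, 1.14, 5.46, 1.12),
  d    = c(0, 0, 0, 1, 0, 0, 0, 0, 0, 1),
  band = c(10, 20, 30, 40, 10, 20, 30, 40, 0, 10)
)
# Hypothetical covariate for illustration: current age below 20 or not
rec$grp <- factor(ifelse(rec$band < 20, "<20", "20+"), levels = c("<20", "20+"))

# Aggregate events and risk time over records with the same covariate value
agg <- aggregate(cbind(d, y) ~ grp, data = rec, FUN = sum)

# Same model, once on split records and once on aggregated records
m.split <- glm(d ~ grp + offset(log(y)), family = poisson, data = rec)
m.agg   <- glm(d ~ grp + offset(log(y)), family = poisson, data = agg)

rbind(split = coef(m.split), aggregated = coef(m.agg))
exp(cumsum(coef(m.agg)))   # rates per person-year in the two groups
```

The fitted rates are simply D/Y within each group, which is what the aggregated table would give directly.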

Follow-up data (FU-rep-Lexis) 112/ 238

Age at entry
Monday 19th, afternoon



Age at entry as covariate

t: time since entry
e: age at entry
a = e + t: current age

$$ \log\bigl(\lambda(a,t)\bigr) = f(t) + \beta e = \bigl(f(t) - \beta t\bigr) + \beta a $$

It is immaterial whether a or e is used as the (log-)linear covariate, as long as t is in the model.

In a Cox model with time since entry as the timescale, only the baseline hazard will change if age at entry is replaced by current age (a time-dependent variable).

Age at entry (Age-at-entry) 113/ 238

Non-linear effects of time-scales

Arbitrary effects of the three variables t, a and e: ⟹ genuine extension of the model.

$$ \log\bigl(\lambda(a,t,x_i)\bigr) = f(t) + g(a) + h(e) + \eta_i $$

Three quantities can be arbitrarily moved between the three functions:

$$
\begin{aligned}
\tilde f(t) &= f(t) - \mu_a - \mu_e + \gamma t \\
\tilde g(a) &= g(a) + \mu_a - \gamma a \\
\tilde h(e) &= h(e) + \mu_e + \gamma e
\end{aligned}
$$

because t − a + e = 0. This is the age-period-cohort modelling problem again.

Age at entry (Age-at-entry) 114/ 238

“Controlling for age”

is not a well-defined statement.

Mostly it means that age at entry is included in the model.

But ideally one would check whether there are non-linear effects of age at entry and current age.

This would require modelling of multiple timescales.

Which is best accomplished by splitting time.

Age at entry (Age-at-entry) 115/ 238


Models for tabulated data
Monday 19th, afternoon


Conceptual set-up

Follow-up of the entire (male) population from 1943–2006 w.r.t. occurrence of testis cancer:

• Split follow-up time for all of the about 4 million men in 1-year classes by age and calendar time (y).

• Allocate testis cancer event (d = 0, 1) to each.

• Analyse all 200,000,000 records by a Poisson model.

Models for tabulated data (tab-mod) 116/ 238

Realistic set-up

• Tabulate the follow-up time and events by age and period.

• 100 age-classes.

• 65 periods (single calendar years).

• 6500 aggregate records of (D, Y).

• Analyse by a Poisson model.

Models for tabulated data (tab-mod) 117/ 238


Practical set-up

• Tabulate only events (as obtained from the cancer registry) by age and period.

• 100 age-classes.

• 65 periods (single calendar years).

• 6500 aggregate records of D.

• Estimate the population follow-up based on census data from Statistics Denmark. Or get it from the Human Mortality Database.

• Analyse by a Poisson model.

Models for tabulated data (tab-mod) 118/ 238

Lexis diagram¹

[Figure: Lexis diagram with calendar time (1940 to 1980) on the x-axis and age (0 to 40) on the y-axis.]

Disease registers record events.

Official statistics collect population data.

¹ Named after the German statistician and economist William Lexis (1837–1914), who devised this diagram in the book “Einleitung in die Theorie der Bevölkerungsstatistik” (Karl J. Trübner, Strassburg, 1875).

Models for tabulated data (tab-mod) 119/ 238

Lexis diagram

[Figure: Lexis diagram with calendar time (1943 to 1993) on the x-axis and age (15 to 55) on the y-axis.]

Registration of:

cases (D)

risk time, person-years (Y)

in subsets of the Lexis diagram.

Models for tabulated data (tab-mod) 120/ 238


Lexis diagram

[Figure: Lexis diagram with calendar time (1943 to 1993) on the x-axis and age (15 to 55) on the y-axis, with two events marked.]

Registration of:

cases (D)

risk time, person-years (Y)

in subsets of the Lexis diagram.

Rates available in each subset.

Models for tabulated data (tab-mod) 121/ 238

Register data

Classification of cases (D_ap) by age at diagnosis and date of diagnosis, and population (Y_ap) by age at risk and date at risk, in compartments of the Lexis diagram, e.g.:

      Seminoma cases         Person-years
Age   1943 1948 1953 1958    1943    1948    1953    1958
15       2    3    4    1  773812  744217  794123  972853
20       7    7   17    8  813022  744706  721810  770859
25      28   23   26   35  790501  781827  722968  698612
30      28   43   49   51  799293  774542  769298  711596
35      36   42   39   44  769356  782893  760213  760452
40      24   32   46   53  694073  754322  768471  749912

Models for tabulated data (tab-mod) 122/ 238

Reshape data to analysis form:

   A    P  D      Y
1 15 1943  2 773812
2 20 1943  7 813022
3 25 1943 28 790501
4 30 1943 28 799293
5 35 1943 36 769356
6 40 1943 24 694073
1 15 1948  3 744217
2 20 1948  7 744706
3 25 1948 23 781827
4 30 1948 43 774542
5 35 1948 42 782893
6 40 1948 32 754322
1 15 1953  4 794123
2 20 1953 17 721810
3 25 1953 26 722968
4 30 1953 49 769298
5 35 1953 39 760213
6 40 1953 46 768471
1 15 1958  1 972853
2 20 1958  8 770859
3 25 1958 35 698612
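One way to get from the two-way tables to the analysis form is sketched below in base R. The matrices hold the seminoma cases and person-years from the register table; the object name `ts` follows the name used for the data frame in the model calls, while the construction itself is an illustration rather than the course's own code.

```r
A <- seq(15, 40, 5)        # lower limits of the age classes
P <- seq(1943, 1958, 5)    # first year of each period

# Rows are age classes, columns are periods (as in the register table)
D <- matrix(c( 2,  3,  4,  1,
               7,  7, 17,  8,
              28, 23, 26, 35,
              28, 43, 49, 51,
              36, 42, 39, 44,
              24, 32, 46, 53), nrow = 6, byrow = TRUE)
Y <- matrix(c(773812, 744217, 794123, 972853,
              813022, 744706, 721810, 770859,
              790501, 781827, 722968, 698612,
              799293, 774542, 769298, 711596,
              769356, 782893, 760213, 760452,
              694073, 754322, 768471, 749912), nrow = 6, byrow = TRUE)

# as.vector() stacks the columns, so age varies fastest within period
ts <- data.frame(A = rep(A, times = length(P)),
                 P = rep(P, each  = length(A)),
                 D = as.vector(D),
                 Y = as.vector(Y))
head(ts)
```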

Models for tabulated data (tab-mod) 123/ 238


Tabulated data

Once data are in tabular form, models are restricted:

• Rates must be assumed constant in each cell of the table / subset of the Lexis diagram.

• With large cells it is customary to put a separate parameter on each cell or on each level of the classifying factors.

• Output from the model will be rates and rate-ratios.

• Since we use a multiplicative Poisson model, usually the log-rates and the log-RRs are reported.

Models for tabulated data (tab-mod) 124/ 238

Simple model for the testis cancer rates:

> m0 <- glm( D ~ factor(A) + factor(P) + offset( log(Y/10^5) ),
+            family=poisson, data=ts )
> summary( m0 )

Call:
glm(formula = D ~ factor(A) + factor(P) + offset(log(Y/10^5)),
    family = poisson, data = ts)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.5991  -0.6974   0.1284   0.6671   1.8904

Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept)    -1.4758     0.3267  -4.517 6.26e-06
factor(A)20     1.4539     0.3545   4.101 4.11e-05
factor(A)25     2.5321     0.3301   7.671 1.71e-14
factor(A)30     2.9327     0.3254   9.013  < 2e-16
factor(A)35     2.8613     0.3259   8.779  < 2e-16
factor(A)40     2.8521     0.3263   8.741  < 2e-16
factor(P)1948   0.1753     0.1211   1.447  0.14778
factor(P)1953   0.3822     0.1163   3.286  0.00102

Models for tabulated data (tab-mod) 125/ 238

ci.lin() from the Epi package extracts coefficients and computes confidence intervals:

> round( ci.lin( m0 ), 3 )
              Estimate StdErr      z     P   2.5%  97.5%
(Intercept)     -1.476  0.327 -4.517 0.000 -2.116 -0.836
factor(A)20      1.454  0.354  4.101 0.000  0.759  2.149
factor(A)25      2.532  0.330  7.671 0.000  1.885  3.179
factor(A)30      2.933  0.325  9.013 0.000  2.295  3.570
factor(A)35      2.861  0.326  8.779 0.000  2.223  3.500
factor(A)40      2.852  0.326  8.741 0.000  2.213  3.492
factor(P)1948    0.175  0.121  1.447 0.148 -0.062  0.413
factor(P)1953    0.382  0.116  3.286 0.001  0.154  0.610
factor(P)1958    0.466  0.115  4.052 0.000  0.241  0.691

Models for tabulated data (tab-mod) 126/ 238


Subsets of parameter estimates are accessed via a character string that is matched (by grep) against the parameter names.

> round( ci.lin( m0, subset="P" ), 3 )
              Estimate StdErr     z     P   2.5% 97.5%
factor(P)1948    0.175  0.121 1.447 0.148 -0.062 0.413
factor(P)1953    0.382  0.116 3.286 0.001  0.154 0.610
factor(P)1958    0.466  0.115 4.052 0.000  0.241 0.691

Models for tabulated data (tab-mod) 127/ 238

Rates / rate-ratios are computed on the fly by Exp=TRUE:

> round( ci.lin( m0, subset="P", Exp=TRUE ), 3 )
              Estimate StdErr     z     P exp(Est.)  2.5% 97.5%
factor(P)1948    0.175  0.121 1.447 0.148     1.192 0.940 1.511
factor(P)1953    0.382  0.116 3.286 0.001     1.466 1.167 1.841
factor(P)1958    0.466  0.115 4.052 0.000     1.593 1.272 1.996

Models for tabulated data (tab-mod) 128/ 238

Linear combinations of the parameters can be computed using the ctr.mat option:

> CM <- rbind( c( 0,-1, 0),
+              c( 1,-1, 0),
+              c( 0, 0, 0),
+              c( 0,-1, 1) )
> round( ci.lin( m0, subset="P", ctr.mat=CM, Exp=TRUE ), 3 )
     Estimate StdErr      z     P exp(Est.)  2.5% 97.5%
[1,]   -0.382  0.116 -3.286 0.001     0.682 0.543 0.857
[2,]   -0.207  0.110 -1.874 0.061     0.813 0.655 1.010
[3,]    0.000  0.000    NaN   NaN     1.000 1.000 1.000
[4,]    0.084  0.104  0.808 0.419     1.087 0.887 1.332
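The point estimates in this output can be reproduced by hand, which shows what the contrast matrix does: each row subtracts the 1953 parameter, so the rows are the log rate-ratios for 1943, 1948, 1953 and 1958 relative to 1953. The sketch below starts from the rounded period estimates printed above, so it matches the ci.lin() output only to rounding; the standard errors need the covariance matrix of the estimates and are not reproduced here.

```r
# Period estimates copied from the rounded ci.lin() output (reference: 1943)
b <- c("1948" = 0.175, "1953" = 0.382, "1958" = 0.466)

# Same contrast matrix as above
CM <- rbind(c(0, -1, 0),
            c(1, -1, 0),
            c(0,  0, 0),
            c(0, -1, 1))

est <- as.vector(CM %*% b)     # log rate-ratios relative to 1953
round(cbind(Estimate = est, RR = exp(est)), 3)
```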

Models for tabulated data (tab-mod) 129/ 238


Age-Period and Age-Cohort models
Tuesday 24th, morning


Register data - rates

Rates in “tiles” of the Lexis diagram:

$$ \lambda(a,p) = D_{ap} / Y_{ap} $$

Descriptive epidemiology based on disease registers: how do the rates vary by age and time?

• Age-specific rates for a given period.

• Age-standardized rates as a function of calendar time (weighted averages of the age-specific rates).

Age-Period and Age-Cohort models (AP-AC) 130/ 238

Synthetic cohorts

[Figure: Lexis diagram with calendar time (1940 to 1980) on the x-axis and age (0 to 40) on the y-axis; cells along the diagonals are labelled 1940, 1945 and 1950.]

Events and risk time in cells along the diagonals are among persons with roughly the same date of birth.

Successively overlapping 10-year periods.

Age-Period and Age-Cohort models (AP-AC) 131/ 238


Lexis diagram: data

[Figure: Lexis diagram with calendar time (1943 to 1993) on the x-axis and age (15 to 55) on the y-axis; each cell is labelled with the number of cases and the person-years in 1000s.]

Testis cancer cases in Denmark.

Male person-years in Denmark.

Age-Period and Age-Cohort models (AP-AC) 132/ 238

Data matrix: Testis cancer cases

Number of cases

       Date of diagnosis (year − 1900)
Age    48–52 53–57 58–62 63–67 68–72 73–77 78–82 83–87 8
15–19      7    13    13    15    33    35    37    49
20–24     31    46    49    55    85   110   140   151
25–29     62    63    82    87   103   153   201   214
30–34     66    82    88   103   124   164   207   209
35–39     56    56    67    99   124   142   152   188
40–44     47    65    64    67    85   103   119   121
45–49     30    37    54    45    64    63    66    92
50–54     28    22    27    46    36    50    49    61
55–59     14    16    25    26    29    28    43    42

Age-Period and Age-Cohort models (AP-AC) 133/ 238

Data matrix: Male risk time

1000 person-years

       Date of diagnosis (year − 1900)
Age     48–52  53–57  58–62  63–67  68–72  73–77  78–82  83–87
15–19   744.2  794.1  972.9 1051.5  961.0  952.5 1011.1 1005.0
20–24   744.7  721.8  770.9  960.3 1053.8  967.5  953.0 1019.7
25–29   781.8  723.0  698.6  764.8  962.7 1056.1  960.9  956.2
30–34   774.5  769.3  711.6  700.1  769.9  960.4 1045.3  955.0
35–39   782.9  760.2  760.5  711.6  702.3  767.5  951.9 1035.7
40–44   754.3  768.5  749.9  756.5  709.8  696.5  757.8  940.3
45–49   676.7  737.9  753.5  738.1  746.4  698.2  682.4  743.1
50–54   600.3  653.9  715.4  732.7  718.3  724.2  675.5  660.8
55–59   512.8  571.1  622.5  680.8  698.2  683.8  686.4  640.9

Age-Period and Age-Cohort models (AP-AC) 134/ 238


Data matrix: Empirical rates

Rate per 1,000,000 person-years

       Date of diagnosis (year − 1900)
Age    48–52 53–57 58–62 63–67 68–72 73–77 78–82 83–87 8
15–19    9.4  16.4  13.4  14.3  34.3  36.7  36.6  48.8
20–24   41.6  63.7  63.6  57.3  80.7 113.7 146.9 148.1
25–29   79.3  87.1 117.4 113.8 107.0 144.9 209.2 223.8 2
30–34   85.2 106.6 123.7 147.1 161.1 170.8 198.0 218.8 2
35–39   71.5  73.7  88.1 139.1 176.6 185.0 159.7 181.5 2
40–44   62.3  84.6  85.3  88.6 119.8 147.9 157.0 128.7
45–49   44.3  50.1  71.7  61.0  85.7  90.2  96.7 123.8
50–54   46.6  33.6  37.7  62.8  50.1  69.0  72.5  92.3
55–59   27.3  28.0  40.2  38.2  41.5  40.9  62.6  65.5
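The empirical rates are the cell-wise ratio of cases to person-years. As a small check in base R, using only the first period column (48–52) of the case and risk-time matrices (the risk time is given in 1000 person-years, hence the factor 1000 to get rates per 1,000,000):

```r
# First period column (48-52) of the two data matrices
D <- c(7, 31, 62, 66, 56, 47, 30, 28, 14)                               # cases
Y <- c(744.2, 744.7, 781.8, 774.5, 782.9, 754.3, 676.7, 600.3, 512.8)  # 1000 PY

rate <- D / Y * 1000     # cases per 1,000,000 person-years
names(rate) <- paste(seq(15, 55, 5), seq(19, 59, 5), sep = "-")
round(rate, 1)
```

With full matrices of cases and person-years the same division applies element by element.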

Age-Period and Age-Cohort models (AP-AC) 135/ 238

The classical plots

Given a table of rates classified by age and period, we can do 4 “classical” plots:

• Rates versus age at diagnosis: rates in the same period connected.

• Rates versus age at diagnosis: rates in the same birth cohort connected.

• Rates versus date of diagnosis: rates in the same age class connected.

• Rates versus date of birth: rates in the same age class connected.

These plots can be produced by the R function rateplot.

Age-Period and Age-Cohort models (AP-AC) 136/ 238

[Figure: rates (axis ticks 0.01 to 0.20) against age at diagnosis (20 to 60).]

Age-Period and Age-Cohort models (AP-AC) 137/ 238


[Figure: rates (axis ticks 0.01 to 0.20) against age at diagnosis (20 to 60).]

Age-Period and Age-Cohort models (AP-AC) 138/ 238

[Figure: two panels of rates (axis ticks 0.01 to 0.20) against age at diagnosis (20 to 60).]

Age-Period and Age-Cohort models (AP-AC) 139/ 238

[Figure: rates (axis ticks 0.01 to 0.20) against date of diagnosis (1950 to 1990).]

Age-Period and Age-Cohort models (AP-AC) 140/ 238


[Figure: rates (axis ticks 0.01 to 0.20) against date of birth (1900 to 1960).]

Age-Period and Age-Cohort models (AP-AC) 141/ 238

Age-period model

Rates are proportional between periods:

$$ \lambda(a,p) = a_a \times b_p \quad\text{or}\quad \log[\lambda(a,p)] = \alpha_a + \beta_p $$

Choose p_0 as reference period, where β_{p_0} = 0:

$$ \log[\lambda(a,p_0)] = \alpha_a + \beta_{p_0} = \alpha_a $$

Age-Period and Age-Cohort models (AP-AC) 142/ 238

Fitting the model in R

Reference period is the 5th period (1970.5 ∼ 1968–72):

> ap <- glm( D ~ factor( A ) - 1 + relevel( factor( P ), 5 ) +
+            offset( log( Y ) ),
+            family=poisson )
> summary( ap )

Call:
glm(formula = D ~ factor(A) - 1 + relevel(factor(P), 5) + offset

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-3.0925  -0.8784   0.1148   0.9790   2.7653

Coefficients:
              Estimate Std. Error z value Pr(>|z|)
factor(A)17.5 -3.56605    0.07249 -49.194  < 2e-16
factor(A)22.5 -2.38447    0.04992 -47.766  < 2e-16
factor(A)27.5 -1.94496    0.04583 -42.442  < 2e-16
factor(A)32.5 -1.85214    0.04597 -40.294  < 2e-16
factor(A)37.5 -1.99308    0.04770 -41.787  < 2e-16
factor(A)42.5 -2.23017    0.05057 -44.104  < 2e-16
factor(A)47.5 -2.58125    0.05631 -45.839  < 2e-16

Age-Period and Age-Cohort models (AP-AC) 143/ 238


Graph of estimates with confidence intervals

[Figure: two panels: incidence rate per 1000 PY (1970) against age (20 to 50), axis ticks 0.01 to 0.20; rate ratio (1 to 4) against date of diagnosis (1950 to 1990).]

Age-Period and Age-Cohort models (AP-AC) 144/ 238

Age-cohort model

Rates are proportional between cohorts:

$$ \lambda(a,c) = a_a \times c_c \quad\text{or}\quad \log[\lambda(a,c)] = \alpha_a + \gamma_c $$

Choose c_0 as reference cohort, where γ_{c_0} = 0:

$$ \log[\lambda(a,c_0)] = \alpha_a + \gamma_{c_0} = \alpha_a $$

Age-Period and Age-Cohort models (AP-AC) 145/ 238

Fit the model in R

Reference cohort is the 9th cohort (1933 ∼ 1928–38):

> ac <- glm( D ~ factor( A ) - 1 + relevel( factor( C ), 9 ) +
+            offset( log( Y ) ),
+            family=poisson )
> summary( ac )

Call:
glm(formula = D ~ factor(A) - 1 + relevel(factor(C), 9) + offset

Deviance Residuals:
     Min        1Q    Median        3Q       Max
-1.92700  -0.72364  -0.02422   0.59623   1.87770

Coefficients:
              Estimate Std. Error z value Pr(>|z|)
factor(A)17.5 -4.07597    0.08360 -48.753  < 2e-16
factor(A)22.5 -2.72942    0.05683 -48.031  < 2e-16
factor(A)27.5 -2.15347    0.05066 -42.505  < 2e-16
factor(A)32.5 -1.90118    0.04878 -38.976  < 2e-16
factor(A)37.5 -1.89404    0.04934 -38.387  < 2e-16
factor(A)42.5 -1.98846    0.05178 -38.399  < 2e-16
factor(A)47.5 -2.23047    0.05775 -38.626  < 2e-16

Age-Period and Age-Cohort models (AP-AC) 146/ 238


Graph of estimates with confidence intervals

[Figure: two panels: incidence rate per 1000 PY (1933 cohort) against age (20 to 50), axis ticks 0.01 to 0.20; rate ratio (1 to 4) against an axis running from 1900 to 1960, labelled “Date of diagnosis”.]

Age-Period and Age-Cohort models (AP-AC) 147/ 238

Age-drift model
Tuesday 24th, morning


Linear effect of period:

$$ \log[\lambda(a,p)] = \alpha_a + \beta_p = \alpha_a + \beta(p - p_0) $$

that is, β_p = β(p − p_0).

Linear effect of cohort:

$$ \log[\lambda(a,p)] = \alpha_a + \gamma_c = \alpha_a + \gamma(c - c_0) $$

that is, γ_c = γ(c − c_0).

Age-drift model (Ad) 148/ 238


Age and linear effect of period:

> apd <- glm( D ~ factor( A ) - 1 + I(P-1970.5) +
+             offset( log( Y ) ),
+             family=poisson )
> summary( apd )

Call:
glm(formula = D ~ factor(A) - 1 + I(P - 1970.5) + offset(log(Y))

Deviance Residuals:
     Min        1Q    Median        3Q       Max
-2.97593  -0.77091   0.02809   0.95914   2.93076

Coefficients:
              Estimate Std. Error z value Pr(>|z|)
factor(A)17.5 -3.58065    0.06306  -56.79   <2e-16
...
factor(A)57.5 -3.17579    0.06256  -50.77   <2e-16
I(P - 1970.5)  0.02653    0.00100   26.52   <2e-16

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 89358.53  on 81  degrees of freedom
Residual deviance:   126.07  on 71  degrees of freedom

Age-drift model (Ad) 149/ 238

Age and linear effect of cohort:

> acd <- glm( D ~ factor( A ) - 1 + I(C-1933) +
+             offset( log( Y ) ),
+             family=poisson )
> summary( acd )

Call:
glm(formula = D ~ factor(A) - 1 + I(C - 1933) + offset(log(Y)),

Deviance Residuals:
     Min        1Q    Median        3Q       Max
-2.97593  -0.77091   0.02809   0.95914   2.93076

Coefficients:
              Estimate Std. Error z value Pr(>|z|)
factor(A)17.5 -4.11117    0.06760  -60.82   <2e-16
...
factor(A)57.5 -2.64527    0.06423  -41.19   <2e-16
I(C - 1933)    0.02653    0.00100   26.52   <2e-16

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 89358.53  on 81  degrees of freedom
Residual deviance:   126.07  on 71  degrees of freedom

Age-drift model (Ad) 150/ 238

What goes on?

$$
\alpha_a + \beta(p - p_0) = \alpha_a + \beta\bigl(a + c - (a_0 + c_0)\bigr)
= \underbrace{\alpha_a + \beta(a - a_0)}_{\text{cohort age-effect}} + \beta(c - c_0)
$$

The two models are the same. The parametrization is different.

The age-curve refers either
• to a period (cross-sectional rates) or
• to a cohort (longitudinal rates).
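The equivalence of the two parametrizations can be verified on any age-period table. The sketch below uses a small artificial dataset, generated deterministically from a hypothetical rate function (it is not the testis cancer data), and fits the drift model both ways. The deviance and the drift estimate should coincide, and the two sets of age parameters should differ by β(a − a_0), with a_0 = p_0 − c_0 = 1970.5 − 1933 = 37.5.

```r
# Artificial age-period table (hypothetical rates, not the course data)
ages <- seq(17.5, 57.5, 5)
d <- expand.grid(A = ages, P = seq(1950.5, 1990.5, 5))
d$C <- d$P - d$A                   # cohort = period - age
d$Y <- 1000                        # person-years in each cell
d$D <- round(d$Y * exp(-3 + 0.03 * (d$A - 40) - 0.002 * (d$A - 40)^2 +
                        0.025 * (d$P - 1970.5)))

# Drift model with period and with cohort as the linear term
apd <- glm(D ~ factor(A) - 1 + I(P - 1970.5) + offset(log(Y)),
           family = poisson, data = d)
acd <- glm(D ~ factor(A) - 1 + I(C - 1933) + offset(log(Y)),
           family = poisson, data = d)

beta <- unname(coef(apd)[10])      # drift parameter (9 age parameters come first)
c(deviance.apd = deviance(apd), deviance.acd = deviance(acd))
c(drift.apd = beta, drift.acd = unname(coef(acd)[10]))

# Cohort age-effects equal period age-effects plus beta * (a - 37.5)
cbind(acd = unname(coef(acd)[1:9]),
      apd.shifted = unname(coef(apd)[1:9]) + beta * (ages - 37.5))
```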

Age-drift model (Ad) 151/ 238


[Figure: two panels: incidence rate per 1000 PY (1933 cohort) against age (20 to 50), axis ticks 0.01 to 0.20; rate ratio (1 to 4) against an axis running from 1900 to 1980, labelled “Date of diagnosis”.]

Which age-curve is period and which is cohort?

Age-drift model (Ad) 152/ 238

Age-Period-Cohort model
Tuesday 24th, afternoon


The age-period-cohort model

$$ \log[\lambda(a,p)] = \alpha_a + \beta_p + \gamma_c $$

• Three effects:
  - Age (at diagnosis)
  - Period (of diagnosis)
  - Cohort (of birth)

• Modelled on the same scale.

• No assumptions about the shape of effects.

Age-Period-Cohort model (APC-cat) 153/ 238


Fitting the model in R

> c1933.p <- glm( D ~ factor( A ) - 1 +
+                 relevel( factor( C ), "1933" ) +
+                 factor( P ) + offset( log( Y ) ), family=p
> summary( c1933.p )
Coefficients: (1 not defined because of singularities)
                                Estimate Std. Error z value Pr(>|
factor(A)17.5                   -4.27754    0.10479 -40.819  < 2e
...
factor(A)57.5                   -2.75892    0.08380 -32.922  < 2e
relevel(factor(C), "1933")1893  -0.84187    0.28009  -3.006  0.00
...
relevel(factor(C), "1933")1928  -0.17922    0.05965  -3.005  0.00
relevel(factor(C), "1933")1938   0.07540    0.05592   1.348  0.17
...
relevel(factor(C), "1933")1973   1.37438    0.17490   7.858 3.90e
factor(P)1955.5                  0.04793    0.07022   0.683  0.49
...
factor(P)1985.5                  0.09276    0.04091   2.267  0.02
factor(P)1990.5                       NA         NA      NA

(Dispersion parameter for poisson family taken to be 1)

Null deviance: 89358.526 on 81 degrees of freedom
Residual deviance: 35.459 on 49 degrees of freedom

No. of parameters

A has 9 levels
P has 9 levels
C has 17 levels

Age-drift model has A + 1 = 10 parameters
Age-period model has A + P − 1 = 17 parameters
Age-cohort model has A + C − 1 = 25 parameters
Age-period-cohort model has A + P + C − 3 = 32 parameters

The missing parameter is because of the identifiability problem.
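
The parameter count can be read off the rank of the design matrix. A minimal sketch, assuming a complete 9 × 9 age-by-period grid with the same class midpoints as in the output above (the data frame `d` is hypothetical; no outcome is needed):

```r
# Hypothetical complete age-by-period grid; cohort is defined as C = P - A
d <- expand.grid( A = seq( 17.5, 57.5, 5 ), P = seq( 1950.5, 1990.5, 5 ) )
d$C <- d$P - d$A

# Design matrix of the factor APC model: 9 age + 16 cohort + 8 period columns
X <- model.matrix( ~ factor( A ) - 1 + factor( C ) + factor( P ), data = d )
n.col <- ncol( X )
n.par <- qr( X )$rank        # number of parameters that can actually be estimated
c( columns = n.col, rank = n.par )
```

The design has one column more than its rank, which corresponds to the one coefficient that `glm` reports as `NA`.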


Relationship of models

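Testis cancer, Denmark. Residual deviance / d.f. for each model:

Age                   865.08 / 72
Age-drift             126.07 / 71
Age-Period            117.7  / 64
Age-Cohort             65.47 / 56
Age-Period-Cohort      35.46 / 49

Differences between nested models (deviance / d.f.):

Age vs. Age-drift                   739.01 / 1    p=0.0000
Age-drift vs. Age-Cohort             60.6  / 15   p=0.0000
Age-Cohort vs. Age-Period-Cohort     30.01 / 7    p=0.0001
Age-drift vs. Age-Period              8.37 / 7    p=0.3010
Age-Period vs. Age-Period-Cohort     82.24 / 15   p=0.0000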


Test for effects

Model                   Deviance   d.f.   p-value

Age - drift               126.07     71
   Δ                       60.60     15     0.000
Age - cohort               65.47     56
   Δ                       30.01      7     0.000
Age - period - cohort      35.46     49
   Δ                       82.24     15     0.000
Age - period              117.70     64
   Δ                        8.37      7     0.301
Age - drift               126.07     71
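
The p-values in the table are tail probabilities of the χ² distribution for the deviance differences. A short sketch using only the differences and degrees of freedom quoted above (the vector names are chosen here for illustration):

```r
# Deviance differences and their degrees of freedom, as tabulated above
dev <- c( "Ad vs AC" = 60.60, "AC vs APC" = 30.01, "AP vs APC" = 82.24, "Ad vs AP" = 8.37 )
df  <- c( 15, 7, 15, 7 )

# Likelihood-ratio test: upper tail of the chi-squared distribution
p <- pchisq( dev, df, lower.tail = FALSE )
round( p, 4 )
```

Only the last comparison (adding the non-linear period effect to the age-drift model) gives a p-value that is not small.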


How to choose a parametrization

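Standard programs: Put extremes of periods or cohorts to 0, and choose a reference for the other.

Holford: Extract linear effects by regression:

\[
\begin{aligned}
\log[\lambda(a,p)] &= \alpha_a + \beta_p + \gamma_c \\
  &= \tilde\alpha_a + \mu_a + \delta_a a \\
  &\quad + \tilde\beta_p + \mu_p + \delta_p p \\
  &\quad + \tilde\gamma_c + \mu_c + \delta_c c
\end{aligned}
\]

where $\tilde\alpha_a$, $\tilde\beta_p$ and $\tilde\gamma_c$ are the residuals from the regressions of the effects on age, period and cohort.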


Putting it together again

Assumptions are needed, e.g.:

• Age is the major time scale.

• Cohort is the secondary time scale (the major secular trend).

• $c_0$ is the reference cohort.

• Period is the residual time scale: 0 on average, 0 slope.


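Period effect, on average 0, slope is 0:

\[
g(p) = \tilde\beta_p = \beta_p - \mu_p - \delta_p p
\]

Cohort effect, absorbing all time-trend ($\delta_p p = \delta_p(a + c)$) and risk relative to $c_0$:

\[
h(c) = \gamma_c - \gamma_{c_0} + \delta_p(c - c_0)
\]

The rest is the age-effect:

\[
f(a) = \alpha_a + \mu_p + \delta_p a + \delta_p c_0 + \gamma_{c_0}
\]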


How it adds up:

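\[
\begin{aligned}
\log[\lambda(a,p)] &= \alpha_a + \beta_p + \gamma_c \\
  &= \alpha_a + \gamma_{c_0} + \mu_p + \delta_p(a + c_0) \\
  &\quad + \beta_p - \mu_p - \delta_p(a + c) \\
  &\quad + \gamma_c - \gamma_{c_0} + \delta_p(c - c_0)
\end{aligned}
\]

Only the regression on period is needed! (For this model...)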
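
The bookkeeping can be verified numerically. The sketch below uses made-up effect estimates on a 3 × 3 age-by-period grid (all numbers are hypothetical), extracts the period trend by an unweighted regression on the distinct periods, and checks that the reparametrized effects reproduce every log-rate.

```r
# Hypothetical effect estimates on a 3 x 3 age-by-period grid
a  <- c( 30, 40, 50 )
p  <- c( 1960, 1970, 1980 )
cc <- sort( unique( as.vector( outer( p, a, "-" ) ) ) )   # cohorts 1910, ..., 1950
alpha <- c( -9.0, -8.5, -8.2 )
beta  <- c(  0.0,  0.3,  0.4 )
gamma <- c( -0.5, -0.2, 0.0, 0.1, 0.4 )
c0 <- 1930                                                # reference cohort

# Regress the period effects on period: beta_p = mu_p + delta_p * p + residual
rg      <- unname( coef( lm( beta ~ p ) ) )
mu.p    <- rg[ 1 ]
delta.p <- rg[ 2 ]

g <- beta - mu.p - delta.p * p                            # period: mean 0, slope 0
h <- gamma - gamma[ cc == c0 ] + delta.p * ( cc - c0 )    # cohort: 0 at c0
f <- alpha + mu.p + delta.p * a + delta.p * c0 + gamma[ cc == c0 ]

# Compare the log-rates under the two parametrizations in every cell
grid <- expand.grid( i = 1:3, j = 1:3 )
k    <- match( p[ grid$j ] - a[ grid$i ], cc )
old  <- alpha[ grid$i ] + beta[ grid$j ] + gamma[ k ]
new  <- f[ grid$i ] + g[ grid$j ] + h[ k ]
all.equal( old, new )
```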

[Figure: estimated effects; incidence rate per 1000 person-years against age, and rate ratio against calendar time, log scale.]

A simple practical approach

First fit the age-cohort model, with cohort $c_0$ as reference, and get estimates $\hat\alpha_a$ and $\hat\gamma_c$:

\[
\log[\lambda(a,p)] = \hat\alpha_a + \hat\gamma_c
\]

Now consider the full APC-model with age and cohort effects as estimated:

\[
\log[\lambda(a,p)] = \hat\alpha_a + \hat\gamma_c + \beta_p
\]


The residual period effect can be estimated if we note that for the number of cases we have:

\[
\log(\text{expected cases}) = \log[\lambda(a,p)Y]
  = \underbrace{\hat\alpha_a + \hat\gamma_c + \log(Y)}_{\text{``known''}} + \beta_p
\]

This is analogous to the expression for a Poisson model in general, but now the offset is not just $\log(Y)$ but $\hat\alpha_a + \hat\gamma_c + \log(Y)$, the log of the fitted values from the age-cohort model.

The $\beta_p$s are estimated in a Poisson model with this as offset.

Advantage: We get the standard errors for free.
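
The two-step approach can be sketched as follows, on simulated data (the grid, the rates and the object names `ac`, `pr` and `lfit` are assumptions made for this illustration):

```r
# Hypothetical 9 x 9 age-by-period table with simulated counts
set.seed( 1 )
d <- expand.grid( A = seq( 17.5, 57.5, 5 ), P = seq( 1950.5, 1990.5, 5 ) )
d$C <- d$P - d$A
d$Y <- 1e6                                    # assumed person-years per cell
d$D <- rpois( nrow( d ), d$Y * exp( -9 + 0.05 * ( d$A - 40 ) +
                                    0.02 * ( d$C - 1933 ) ) )

# Step 1: age-cohort model with cohort 1933 as reference
ac <- glm( D ~ factor( A ) - 1 + relevel( factor( C ), "1933" ) +
               offset( log( Y ) ),
           family = poisson, data = d )

# Step 2: period effects with the log fitted values from step 1 as offset
d$lfit <- log( fitted( ac ) )
pr <- glm( D ~ factor( P ) - 1 + offset( lfit ), family = poisson, data = d )

# Residual period log-RRs with their standard errors
round( cbind( Estimate = coef( pr ),
              StdErr   = sqrt( diag( vcov( pr ) ) ) ), 3 )
```

Note that the standard errors from the second step treat the age and cohort effects as known.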

[Figure, shown in two builds: incidence rate per 1000 person-years against age, and rate ratio against calendar time, log scale.]

Tabulation in the Lexis diagram
Tuesday 24th, morning


Tabulation of register data

Testis cancer cases in Denmark / male person-years in Denmark (in 1000s),
by age class (rows, lower limit) and calendar period (columns, first year):

Age   1943       1948       1953       1958       1963       1968       1973       1978       1983       1988       1993
15    10/773.8   7/744.2    13/794.1   13/972.9   15/1051.5  33/961.0   35/952.5   37/1011.1  49/1005.0  51/929.8   41/670.2
20    30/813.0   31/744.7   46/721.8   49/770.9   55/960.3   85/1053.8  110/967.5  140/953.0  151/1019.7 150/1017.3 112/760.9
25    55/790.5   62/781.8   63/723.0   82/698.6   87/764.8   103/962.7  153/1056.1 201/960.9  214/956.2  268/1031.6 194/835.7
30    56/799.3   66/774.5   82/769.3   88/711.6   103/700.1  124/769.9  164/960.4  207/1045.3 209/955.0  258/957.1  251/821.2
35    53/769.4   56/782.9   56/760.2   67/760.5   99/711.6   124/702.3  142/767.5  152/951.9  188/1035.7 209/948.6  199/763.9
40    35/694.1   47/754.3   65/768.5   64/749.9   67/756.5   85/709.8   103/696.5  119/757.8  121/940.3  155/1023.7 126/754.5
45    29/622.1   30/676.7   37/737.9   54/753.5   45/738.1   64/746.4   63/698.2   66/682.4   92/743.1   86/923.4   96/817.8
50    16/539.4   28/600.3   22/653.9   27/715.4   46/732.7   36/718.3   50/724.2   49/675.5   61/660.8   64/721.1   51/701.5
55    6/471.0    14/512.8   16/571.1   25/622.5   26/680.8   29/698.2   28/683.8   43/686.4   42/640.9   34/627.7   45/544.8

Tabulation in the Lexis diagram (Lexis-tab) 167/ 238


Tabulation of register data

[Figure: Lexis diagram, calendar time 1983–1988 by age 30–35, showing the single cell with 209 testis cancer cases in Denmark and 955.0 male person-years in Denmark.]

Tabulation of register data

Testis cancer cases in Denmark / male person-years in Denmark,
by one-year age class (rows) and calendar year (columns):

Age   1983      1984      1985      1986      1987
30    7/38.0    5/38.0    9/38.1    10/38.2   8/38.3
31    6/38.0    7/38.0    9/38.1    11/38.2   10/38.3
32    12/38.1   7/37.9    13/38.0   8/38.1    8/38.2
33    8/38.7    4/38.0    6/37.9    11/38.0   11/38.1
34    12/40.2   5/38.7    5/38.0    11/37.9   6/38.0
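
As a check on the finer tabulation, the one-year cells for ages 30–34 in 1983–1987 should add up to the single five-year cell (209 cases, 955.0 person-years). A small sketch; the arrangement of the figures into rows (age) and columns (year) is a reconstruction from the diagram, but the totals do not depend on it.

```r
ages  <- 30:34
years <- 1983:1987

# Cases (D) and person-years (Y) in the 25 one-year cells
D <- matrix( c(  7,  5,  9, 10,  8,
                 6,  7,  9, 11, 10,
                12,  7, 13,  8,  8,
                 8,  4,  6, 11, 11,
                12,  5,  5, 11,  6 ), nrow = 5, byrow = TRUE,
             dimnames = list( Age = ages, Year = years ) )
Y <- matrix( c( 38.0, 38.0, 38.1, 38.2, 38.3,
                38.0, 38.0, 38.1, 38.2, 38.3,
                38.1, 37.9, 38.0, 38.1, 38.2,
                38.7, 38.0, 37.9, 38.0, 38.1,
                40.2, 38.7, 38.0, 37.9, 38.0 ), nrow = 5, byrow = TRUE,
             dimnames = list( Age = ages, Year = years ) )

sum( D )               # total cases in the 5 x 5 year cell
sum( Y )               # total person-years in the 5 x 5 year cell
sum( D ) / sum( Y )    # overall rate, cases per unit of Y
```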

Tabulation of register data

[Figure: the same one-year cells (ages 30–34, 1983–1987), each subdivided by year of birth (cohort) into an upper and a lower triangle, with testis cancer cases in Denmark and male person-years in Denmark shown in each triangle.]

Major sets in the Lexis diagram

A-sets: Classification by age and period.

B-sets: Classification by age and cohort.

C-sets: Classification by cohort and period.

The mean age, period and cohort for these sets is just the mean of the tabulation interval.

The mean of the third variable is found by using a = p − c.


Analysis of rates from a complete observation in a Lexis diagram need not be restricted to these classical sets classified by two factors.

We may classify cases and risk time by all three factors:

Upper triangles: Classification by age and period, earliest born cohort.

Lower triangles: Classification by age and cohort, last born cohort.


Mean time in triangles

Modelling requires that each set (= observation in the dataset) be assigned a value of age, period and cohort. So for each triangle we need:

• mean age at risk.

• mean date at risk.

• mean cohort at risk.


Means in upper (A) and lower (B) triangles:

[Figure: the unit square in the (p, a) plane, split by the diagonal a = p into the upper triangle A (a > p) and the lower triangle B (a < p).]

Upper triangles, A:

\[
\begin{aligned}
E_A(a) &= \int_{p=0}^{1}\int_{a=p}^{1} a \times 2 \,da\,dp = \int_{p=0}^{1} (1 - p^2)\,dp = \tfrac{2}{3} \\
E_A(p) &= \int_{a=0}^{1}\int_{p=0}^{a} p \times 2 \,dp\,da = \int_{a=0}^{1} a^2\,da = \tfrac{1}{3} \\
E_A(c) &= \tfrac{1}{3} - \tfrac{2}{3} = -\tfrac{1}{3}
\end{aligned}
\]

Lower triangles, B:

\[
\begin{aligned}
E_B(a) &= \int_{p=0}^{1}\int_{a=0}^{p} a \times 2 \,da\,dp = \int_{p=0}^{1} p^2\,dp = \tfrac{1}{3} \\
E_B(p) &= \int_{a=0}^{1}\int_{p=a}^{1} p \times 2 \,dp\,da = \int_{a=0}^{1} (1 - a^2)\,da = \tfrac{2}{3} \\
E_B(c) &= \tfrac{2}{3} - \tfrac{1}{3} = \tfrac{1}{3}
\end{aligned}
\]
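
The means in the two triangles can also be checked by simulation. A minimal sketch, assuming risk time is spread uniformly over the unit square (the sample size and seed are arbitrary):

```r
set.seed( 1 )
n  <- 1e6
p  <- runif( n )            # date within the unit period
a  <- runif( n )            # age within the unit age class
up <- a > p                 # upper triangle A; the rest is the lower triangle B

# Mean age, period and cohort (c = p - a) in each triangle
m <- rbind( A = c( a = mean( a[ up ] ),  p = mean( p[ up ] ),
                   c = mean( p[ up ]  - a[ up ] ) ),
            B = c( a = mean( a[ !up ] ), p = mean( p[ !up ] ),
                   c = mean( p[ !up ] - a[ !up ] ) ) )
round( m, 3 )
```

The simulated means should be close to 2/3, 1/3, −1/3 for A and 1/3, 2/3, 1/3 for B.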

Tabulation by age, period and cohort

[Figure: Lexis diagram for periods 1982–1985 and ages 0–3, subdivided into triangles. The axes are marked with the mean ages 1/3, 2/3, 1 1/3, 1 2/3, 2 1/3, 2 2/3, the mean periods 1982 1/3, 1982 2/3, ..., 1984 2/3 and the mean cohorts 1979 2/3, 1980 1/3, ..., 1984 1/3.]

Gives triangular sets with differing mean age, period and cohort.

These correct midpoints for age, period and cohort must be used in modelling.

Population figures

Population figures in the form of the size of the population at certain dates are available from most statistical bureaux.

This corresponds to population sizes along the vertical lines indicated in the diagram. We want risk time figures for the population in the squares and triangles in the diagram.

[Figure: Lexis diagram, calendar time 1990–2000 by age 0–10, with the population figures marked on the vertical lines at the start of each year.]

Prevalent population figures

$L_{a,p}$ is the number of persons in age class a alive at the beginning of period (= year) p.

The aim is to compute person-years for the triangles A and B, respectively.

[Figure: Lexis diagram for year p and ages a to a+2, with the population figures $L_{a,p}$ and $L_{a+1,p}$ at the start of the year, $L_{a,p+1}$ and $L_{a+1,p+1}$ at the end of the year, and the triangles A and B marked.]

A person dying in age a at date p in A contributes p risk time, so the average will be:

\[
\int_{p=0}^{1}\int_{a=p}^{1} 2p \,da\,dp
  = \int_{p=0}^{1} 2p(1 - p)\,dp
  = \Bigl[\,p^2 - \tfrac{2}{3}p^3\Bigr]_{p=0}^{p=1}
  = \tfrac{1}{3}
\]

A person dying in age a at date p in B contributes p − a risk time in A, so the average will be:

\[
\int_{p=0}^{1}\int_{a=0}^{p} 2(p - a) \,da\,dp
  = \int_{p=0}^{1} \bigl[\,2pa - a^2\bigr]_{a=0}^{a=p}\,dp
  = \int_{p=0}^{1} p^2\,dp
  = \tfrac{1}{3}
\]

A person dying in age a at date p in B contributes a risk time in B, so the average will be:

\[
\int_{p=0}^{1}\int_{a=0}^{p} 2a \,da\,dp
  = \int_{p=0}^{1} p^2\,dp
  = \tfrac{1}{3}
\]

Contributions to risk time in A and B:

             A:                                       B:

Survivors:   L_{a+1,p+1} × 1/2 y                      L_{a+1,p+1} × 1/2 y

Dead in A:   1/2 (L_{a,p} − L_{a+1,p+1}) × 1/3 y

Dead in B:   1/2 (L_{a,p} − L_{a+1,p+1}) × 1/3 y      1/2 (L_{a,p} − L_{a+1,p+1}) × 1/3 y

Σ            (1/3 L_{a,p} + 1/6 L_{a+1,p+1}) × 1 y    (1/6 L_{a,p} + 1/3 L_{a+1,p+1}) × 1 y

Population as of 1 January from Statistics Denmark:

               Men                       Women
Age    2000    2001    2002      2000    2001    2002
22    33435   33540   32272     32637   32802   31709
23    35357   33579   33742     34163   32853   33156
24    38199   35400   33674     37803   34353   33070
25    37958   38257   35499     37318   37955   34526
26    38194   38048   38341     37292   37371   38119
27    39891   38221   38082     39273   37403   37525
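
The risk time in a triangle follows from two neighbouring population figures: (1/3 L_{a,p} + 1/6 L_{a+1,p+1}) × 1 y for the upper triangle in age class a, and (1/6 L_{a−1,p} + 1/3 L_{a,p+1}) × 1 y for the lower triangle in age class a. A minimal sketch for the men's figures above; the helper names `upper` and `lower` are made up for this illustration.

```r
# Men, population as of 1 January (rows: age, columns: year), from the table above
L <- matrix( c( 33435, 33540, 32272,
                35357, 33579, 33742,
                38199, 35400, 33674,
                37958, 38257, 35499,
                38194, 38048, 38341,
                39891, 38221, 38082 ), ncol = 3, byrow = TRUE,
             dimnames = list( Age = 22:27, Year = 2000:2002 ) )

# Look up L[a,p] by age and year rather than by position
Lap <- function( a, p ) L[ as.character( a ), as.character( p ) ]

# Person-years in the upper (A) and lower (B) triangle of age class a in year p
upper <- function( a, p ) Lap( a, p ) / 3 + Lap( a + 1, p + 1 ) / 6
lower <- function( a, p ) Lap( a - 1, p ) / 6 + Lap( a, p + 1 ) / 3

yA <- upper( 22, 2000 )     # upper triangle, age 22 in 2000
yB <- lower( 23, 2000 )     # lower triangle, age 23 in 2000
c( A = yA, B = yB )
```

Triangles whose formula needs a figure outside the table (for example the lower triangles at age 22, or any triangle in 2002) cannot be filled in.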


Exercise:

Fill in the risk time figures in as many triangles as possible from the previous table for men and women, respectively.

[Figure: two blank Lexis diagrams, calendar time 2000–2002 by age 22–27, one for men and one for women.]

Summary:

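Population risk time:

A: (1/3 L_{a,p} + 1/6 L_{a+1,p+1}) × 1 y

B: (1/6 L_{a−1,p} + 1/3 L_{a,p+1}) × 1 y

Mean age, period and cohort: 1/3 into the interval.

[Figure: Lexis diagram, calendar time 1980–1983 by age 60–63, showing an upper triangle A and a lower triangle B, the population figures L_{60,1980}, L_{61,1980}, L_{61,1981} and L_{62,1981}, and axis marks at the mean ages 61 1/3 and 61 2/3, the mean cohorts 1920 2/3 and 1921 1/3 and the mean periods 1982 1/3 and 1982 2/3.]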

APC-model for triangular data
Wednesday 25th, morning


Model for triangular data

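• One parameter per distinct value on each timescale.

• Example: 3 age-classes and 3 periods:
    • 6 age parameters
    • 6 period parameters
    • 10 cohort parameters

• Model:
\[
\log\lambda_{ap} = \alpha_a + \beta_p + \gamma_c
\]

APC-model for triangular data (APC-tri) 188/ 238

Problem: Disconnected design!

Log-likelihood contribution from one triangle:

\[
D_{ap}\log(\lambda_{ap}) - \lambda_{ap}Y_{ap}
  = D_{ap}(\alpha_a + \beta_p + \gamma_c) - e^{\alpha_a + \beta_p + \gamma_c}\,Y_{ap}
\]

The log-likelihood can be separated:

\[
\sum_{(a,p)\,\in\,\text{upper}} \bigl\{ D_{ap}\log(\lambda_{ap}) - \lambda_{ap}Y_{ap} \bigr\}
+ \sum_{(a,p)\,\in\,\text{lower}} \bigl\{ D_{ap}\log(\lambda_{ap}) - \lambda_{ap}Y_{ap} \bigr\}
\]

No common parameters between terms: the mean age, period and cohort of an upper triangle never coincide with those of a lower triangle. We have two separate models:
one for upper triangles, one for lower.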


Illustration by lung cancer data

> library( Epi )
> data( lungDK )
> lungDK[1:10,]
   A5   P5   C5 up       Ax       Px       Cx  D        Y
1  40 1943 1898  1 43.33333 1944.667 1901.333 52 336233.8
2  40 1943 1903  0 41.66667 1946.333 1904.667 28 357812.7
3  40 1948 1903  1 43.33333 1949.667 1906.333 51 363783.7
4  40 1948 1908  0 41.66667 1951.333 1909.667 30 390985.8
5  40 1953 1908  1 43.33333 1954.667 1911.333 50 391925.3
6  40 1953 1913  0 41.66667 1956.333 1914.667 23 377515.3
7  40 1958 1913  1 43.33333 1959.667 1916.333 56 365575.5
8  40 1958 1918  0 41.66667 1961.333 1919.667 43 383689.0
9  40 1963 1918  1 43.33333 1964.667 1921.333 44 385878.5
10 40 1963 1923  0 41.66667 1966.333 1924.667 38 371361.5
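
The columns Ax, Px and Cx are the triangle means: 2/3 (upper) or 1/3 (lower) into the age interval, 1/3 (upper) or 2/3 (lower) into the period interval, and Cx = Px − Ax. A small sketch that reproduces the first two rows of the listing from A5, P5 and up; the data frame `tri` and the width `w` are set up here for illustration only.

```r
# First two rows of the listing: same age class and period, upper and lower triangle
tri <- data.frame( A5 = c( 40, 40 ), P5 = c( 1943, 1943 ), up = c( 1, 0 ) )
w <- 5                                        # width of the tabulation intervals

tri$Ax <- tri$A5 + w * ( 1 + tri$up ) / 3     # mean age:    2/3 in for upper, 1/3 for lower
tri$Px <- tri$P5 + w * ( 2 - tri$up ) / 3     # mean period: 1/3 in for upper, 2/3 for lower
tri$Cx <- tri$Px - tri$Ax                     # mean cohort
tri
```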


Fill in the number of cases (D) and person-years (Y) from the previous slide.

Indicate birth cohorts on the axes for upper and lower triangles.

Mark mean date of birth for these.

[Figure: blank Lexis diagram, calendar time 1943–1963 by age 40–60.]

Filled in (cases / person-years in 1000s):

Age   Period   Upper triangle   Lower triangle
40    1943     52 / 336.2       28 / 357.8
40    1948     51 / 363.8       30 / 391.0
40    1953     50 / 391.9       23 / 377.5
40    1958     56 / 365.6       43 / 383.7
45    1943     70 / 301.4       65 / 320.9
45    1948     77 / 327.8       86 / 349.0
45    1953     115 / 355.0      93 / 383.3
45    1958     124 / 383.7      102 / 370.7
50    1943     84 / 255.3       113 / 283.7
50    1948     152 / 290.3      140 / 310.2
50    1953     235 / 315.7      207 / 338.2
50    1958     282 / 343.0      226 / 372.8
55    1943     106 / 227.6      155 / 243.4
55    1948     196 / 242.5      208 / 269.9
55    1953     285 / 274.4      311 / 296.9
55    1958     389 / 299.1      383 / 323.3

APC-model with “synthetic” cohorts

> mc <- glm( D ~ factor(A5) - 1 +
+            factor(P5-A5) +
+            factor(P5) + offset( log( Y ) ),
+            family=poisson )
> summary( mc )

...

Null deviance: 1.0037e+08 on 220 degrees of freedom
Residual deviance: 8.8866e+02 on 182 degrees of freedom

No. parameters: 220 − 182 = 38.

A = 10, P = 11, C = 20 ⇒ A + P + C − 3 = 38


APC-model with “correct” cohorts

> mx <- glm( D ~ factor(Ax) - 1 +
+            factor(Cx) +
+            factor(Px) + offset( log( Y ) ),
+            family=poisson )
> summary( mx )
...

Null deviance: 1.0037e+08 on 220 degrees of freedom
Residual deviance: 2.8473e+02 on 144 degrees of freedom

No. parameters: 220 − 144 = 76 (= 38 × 2).

A = 20, P = 22, C = 40 ⇒ A + P + C − 3 = 79 ≠ 76

We have fitted two age-period-cohort models separately to upper and lower triangles.
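
The two parameter counts can be reproduced from the design matrices alone. The sketch below sets up a hypothetical triangular layout with 10 age classes, 11 periods and 2 triangles per cell (220 units), with all time scales coded in integer units of one third of the interval width so that the factor levels are exact.

```r
# Hypothetical layout: i = age class, j = period, up = 1 for upper triangles
tri <- expand.grid( up = 1:0, i = 0:9, j = 0:10 )
tri$A5 <- tri$i
tri$P5 <- tri$j
tri$Ax <- 3 * tri$i + 1 + tri$up      # mean age:    1/3 (lower) or 2/3 (upper) into the interval
tri$Px <- 3 * tri$j + 2 - tri$up      # mean period: 2/3 (lower) or 1/3 (upper) into the interval
tri$Cx <- tri$Px - tri$Ax             # mean cohort

# Design matrices with synthetic cohorts (P5 - A5) and with the triangle means
Xs <- model.matrix( ~ factor( A5 ) - 1 + factor( P5 - A5 ) + factor( P5 ), data = tri )
Xc <- model.matrix( ~ factor( Ax ) - 1 + factor( Cx ) + factor( Px ), data = tri )

rk <- c( synthetic = qr( Xs )$rank, correct = qr( Xc )$rank )
rk
```

With the triangle means, three parameters are lost within each of the two disconnected sets of triangles, so the rank is 2 × 38 rather than 20 + 22 + 40 − 3.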

[Figure, shown in two builds: estimates from the model with “correct” cohorts; rate per 100,000 against age, RR against cohort and RR against period, log scale.]

Now, explicitly fit models for upper and lower triangles:

> mx.u <- glm( D ~ factor(Ax) - 1 +
+              factor(Cx) +
+              factor(Px) + offset( log( Y/10^5 ) ), family=po
+              data=lungDK[lungDK$up==1,] )
> mx.l <- glm( D ~ factor(Ax) - 1 +
+              factor(Cx) +
+              factor(Px) + offset( log( Y/10^5 ) ), family=po
+              data=lungDK[lungDK$up==0,] )
> mx$deviance
[1] 284.7269
> mx.l$deviance
[1] 134.4566
> mx.u$deviance
[1] 150.2703
> mx.l$deviance+mx.u$deviance
[1] 284.7269

[Figure: estimates from the separate fits to upper and lower triangles; rate per 100,000 against age, RR against cohort and RR against period, log scale.]

APC-model: Parametrization
Wednesday 25th, afternoon


What’s the problem?

One parameter is assigned to each distinct value of the timescales; the ordering of the values is not used.

The solution is to “tie together” the points on each scale with smooth functions of the mean age, period and cohort:

\[
\log\lambda_{ap} = f(a) + g(p) + h(c)
\]

The practical problem is how to choose a reasonable parametrization of these functions, and how to get estimates.

APC-model: Parametrization (APC-par) 199/ 238

The identifiability problem still exists:

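\[
c = p - a \iff p - a - c = 0
\]

\[
\begin{aligned}
\log\lambda_{ap} &= f(a) + g(p) + h(c) \\
  &= f(a) + g(p) + h(c) + \gamma(p - a - c) \\
  &= f(a) - \mu_a - \gamma a \\
  &\quad + g(p) + \mu_a + \mu_c + \gamma p \\
  &\quad + h(c) - \mu_c - \gamma c
\end{aligned}
\]

A decision on parametrization is needed.
It must be external to the model.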


Smooth functions

\[
\log[\lambda(a,p)] = f(a) + g(p) + h(c)
\]

Possible choices for parametric functions describing the effect of the three continuous variables:

• Polynomials / fractional polynomials.

• Linear / quadratic / cubic splines.

• Natural splines.

All of these contain the linear effect as a special case.

Parametrization of effects

There are still three “free” parameters:

\[
\begin{aligned}
\tilde f(a) &= f(a) - \mu_a - \gamma a \\
\tilde g(p) &= g(p) + \mu_a + \mu_c + \gamma p \\
\tilde h(c) &= h(c) - \mu_c - \gamma c
\end{aligned}
\]

Choose $\mu_a$, $\mu_c$ and $\gamma$ according to some criterion for the functions.

Parametrization principle

1. The age-function should be interpretable as log age-specific rates in cohort $c_0$ after adjustment for the period effect.

2. The cohort function is 0 at a reference cohort $c_0$, interpretable as log-RR relative to cohort $c_0$.

3. The period function is 0 on average with 0 slope, interpretable as log-RR relative to the age-cohort prediction (residual log-RR).

Longitudinal or cohort age-effects.

Biologically interpretable: what happens during the lifespan of a cohort?

Alternatively, the period function could be constrained to be 0 at a reference date, $p_0$.

Then, age-effects at $a_0 = p_0 - c_0$ would equal the fitted rate for period $p_0$ (and cohort $c_0$), and the period effects would be residual log-RRs relative to $p_0$.

Cross-sectional or period age-effects?

Bureaucratically interpretable: what is seen at a particular date?

Implementation:

1. Obtain any set of parameters $\hat f(a)$, $\hat g(p)$, $\hat h(c)$.

2. Extract the trend from the period effect:

\[
\tilde g(p) = \hat g(p) - (\mu + \beta p)
\]

3. Use the functions:

\[
\begin{aligned}
\tilde f(a) &= \hat f(a) + \mu + \beta a + \hat h(c_0) + \beta c_0 \\
\tilde g(p) &= \hat g(p) - \mu - \beta p \\
\tilde h(c) &= \hat h(c) + \beta c - \hat h(c_0) - \beta c_0
\end{aligned}
\]

These functions fulfill the criteria.

“Extract the trend”

Not a well-defined concept:

• Regress $\hat g(p)$ on p for all units in the dataset.

• Regress $\hat g(p)$ on p for all different values of p.

• Weighted regression?

How do we get the standard errors?

Matrix-algebra! Projections!

Parametric function

Suppose that g(p) is parametrized using the design matrix M, with the estimated parameters π.

Example: 2nd order polynomial:

\[
M = \begin{bmatrix}
1 & p_1 & p_1^2 \\
1 & p_2 & p_2^2 \\
\vdots & \vdots & \vdots \\
1 & p_n & p_n^2
\end{bmatrix}
\qquad
\pi = \begin{bmatrix} \pi_0 \\ \pi_1 \\ \pi_2 \end{bmatrix}
\qquad
g(p) = M\pi
\]

nrow(M) is the number of observations in the dataset.

Extract the trend from g:

\[
\langle \tilde g(p) \,|\, 1 \rangle = 0, \qquad \langle \tilde g(p) \,|\, p \rangle = 0
\]

i.e. $\tilde g$ is orthogonal to $[1|p]$.
Suppose $\tilde g(p) = \tilde M\pi$, then for any parameter vector π:

\[
\langle \tilde M\pi \,|\, 1 \rangle = 0, \quad \langle \tilde M\pi \,|\, p \rangle = 0
\;\Longrightarrow\; \tilde M \perp [1|p]
\]

Thus we just need to be able to produce $\tilde M$ from M: Projection on the orthogonal space of span([1|p]).
(Orthogonality requires an inner product!)

Practical parametrization

1. Set up model matrices for age, period and cohort, $M_a$, $M_p$ and $M_c$. Intercept in all three.

2. Extract the linear trend from $M_p$ and $M_c$, by projecting their columns onto the orthogonal complement of $[1|p]$ and $[1|c]$, giving $\tilde M_p$ and $\tilde M_c$.

3. Center the cohort effect around $c_0$: Take the row from $\tilde M_c$ corresponding to $c_0$, replicate it to the dimension of $\tilde M_c$, and subtract it from $\tilde M_c$ to form $\tilde M_{c_0}$.

4. Use:
   $M_a$ for the age-effects,
   $\tilde M_p$ for the period effects and
   $[c - c_0 \,|\, \tilde M_{c_0}]$ for the cohort effects.

5. Value of $\hat f(a)$ is $M_a\hat\beta_a$, similarly for the other two effects. Variance is found by $M_a \Sigma_a M_a'$, where $\Sigma_a$ is the variance-covariance matrix of $\hat\beta_a$.
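
Step 5 can be sketched for the age effect alone. The example below uses simulated data and a natural spline from the `splines` package; the data, the knots implied by `df = 4` and the object names are assumptions made for this illustration, not the course data.

```r
library( splines )

# Hypothetical data: 5 units at each of 21 ages, simulated counts
set.seed( 2 )
d <- data.frame( A = rep( seq( 20, 60, 2 ), each = 5 ) )
d$Y <- 1e4
d$D <- rpois( nrow( d ), d$Y * exp( -7 + 0.04 * ( d$A - 40 ) -
                                    0.001 * ( d$A - 40 )^2 ) )

# Model matrix for age (with intercept) and the Poisson fit
Ma <- ns( d$A, df = 4, intercept = TRUE )
m  <- glm( D ~ Ma - 1 + offset( log( Y ) ), family = poisson, data = d )

# One row of the model matrix per distinct age
A.pt <- sort( unique( d$A ) )
Pa   <- Ma[ match( A.pt, d$A ), , drop = FALSE ]

# Estimated log-rates and their standard errors: M beta and sqrt( diag( M Sigma M' ) )
f.hat <- as.vector( Pa %*% coef( m ) )
f.se  <- sqrt( diag( Pa %*% vcov( m ) %*% t( Pa ) ) )
rates <- exp( cbind( Estimate = f.hat,
                     lo = f.hat - 1.96 * f.se,
                     hi = f.hat + 1.96 * f.se ) )
head( round( rates * 1e5, 1 ) )       # rates per 100,000 with 95% c.i.
```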


Information in the data and inner product

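Log-lik for an observation (D, Y), log-rate θ:

\[
l(\theta|D,Y) = D\theta - e^{\theta}Y, \qquad
l'_\theta = D - e^{\theta}Y, \qquad
l''_\theta = -e^{\theta}Y
\]

so $I(\hat\theta) = e^{\hat\theta}Y = \hat\lambda Y = D$.

Two relevant inner products:

\[
\langle m_j | m_k \rangle = \sum_i m_{ij}m_{ik}
\qquad
\langle m_j | m_k \rangle = \sum_i m_{ij}w_i m_{ik}
\]

The weights could be chosen as $w_i = D_i$, i.e. proportional to the information content in the units of the dataset.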
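
Extracting the trend is a projection on the orthogonal complement of span([1|p]) with respect to the chosen inner product. A minimal sketch of such a projection; the function name `detrend.sketch` and the example data are made up here, and this is not the code of the `detrend` function in the Epi package.

```r
# Remove the projection on span([1|t]) from every column of M,
# using the inner product <x|y> = sum( x * w * y )
detrend.sketch <- function( M, t, w = rep( 1, nrow( M ) ) ) {
  X <- cbind( 1, t )
  # Weighted projection: X (X'WX)^{-1} X'W M
  M - X %*% solve( crossprod( X, w * X ), crossprod( X, w * M ) )
}

set.seed( 3 )
p  <- rep( seq( -2, 2, 0.5 ), each = 4 )   # hypothetical (centred) period values
w  <- rpois( length( p ), 20 ) + 1         # hypothetical weights, e.g. number of cases
M  <- cbind( p, p^2, p^3 )                 # any design matrix for g(p)
Mt <- detrend.sketch( M, p, w )

# Weighted inner products of the detrended columns with 1 and p
round( crossprod( cbind( 1, p ), w * Mt ), 10 )
```

All inner products are 0, so any function of the form `Mt %*% pi` has zero weighted mean and zero weighted slope. A column of M that lies in span([1|p]) itself, such as the first one here, is projected to 0.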


How to?
Implemented in apc.fit:

m1 <- apc.fit( A=lungDK$Ax,
               P=lungDK$Px,
               D=lungDK$D,
               Y=lungDK$Y/10^5,
               ref.c=1900 )
apc.plot( m1 )

Consult the help page for:
apc.fit to see options for weights in inner product, type of function, variants of parametrization etc.
apc.plot, apc.lines and apc.frame to see how to plot the results.

[Figure, shown in three builds: rate against age and rate ratio against calendar time, log scale, comparing three parametrizations: weighted i.p., naïve i.p., and AC then P.]

[Figure, shown in two builds: rate per 100,000 person-years against age and rate ratio against calendar time, log scale. Weighted drift: 2.58 ( 2.42 − 2.74 ) %/year]

APC-models for several datasets
Wednesday 25th, afternoon


Two sets of data
Example: Testis cancer in Denmark, Seminoma and non-Seminoma cases.

> stat.table( list( Histology=hist ),
+             list( D=sum(d), Y=sum(y/10^6) ),
+             margins = TRUE )
----------------------------
Histology         D        Y
----------------------------
1           4708.00   127.53
2           3632.00   127.53
3            466.00   127.53

Total       8806.00   382.58
----------------------------

First step is separate analyses for each subtype.

APC-models for several datasets (APC-2) 218/ 238

[Figure: rate per 100,000 person-years against age and rate ratio against calendar time, log scale, for Seminoma and Non-seminoma.]

[Figure: rate ratio (ref=1935) against date of birth.]

[Figure: rate ratio (ref=1960) against date of diagnosis.]

Analysis of two rates: Formal tests

> Ma <- ns( A, df=15, intercept=TRUE )
> Mp <- ns( P, df=15 )
> Mc <- ns( P-A, df=20 )
> Mp <- detrend( Mp, P, weight=D )
> Mc <- detrend( Mc, P-A, weight=D )
>
> m.apc <- glm( D ~ -1 + Ma:type + Mp:type + Mc:type + offset( l
> m.ap <- update( m.apc, . ~ . - Mc:type + Mc )
> m.ac <- update( m.apc, . ~ . - Mp:type + Mp )
> m.a <- update( m.ap , . ~ . - Mp:type + Mp )
>
> anova( m.a, m.ac, m.apc, m.ap, m.a, test="Chisq")
Analysis of Deviance Table

Model 1: D ~ Mc + Mp + Ma:type + offset(log(Y)) - 1
Model 2: D ~ Mp + Ma:type + type:Mc + offset(log(Y)) - 1
Model 3: D ~ -1 + Ma:type + Mp:type + Mc:type + offset(log(Y))
Model 4: D ~ Mc + Ma:type + type:Mp + offset(log(Y)) - 1
Model 5: D ~ Mc + Mp + Ma:type + offset(log(Y)) - 1

  Resid. Df Resid. Dev  Df Deviance P(>|Chi|)
1     10737    10553.7
2     10718    10367.9  19    185.7 2.278e-29
3     10704    10199.6  14    168.3 1.513e-28
4     10723    10508.6 -19   -309.0 2.832e-54
5     10737    10553.7 -14    -45.0 4.042e-05

APC-model: Interactions
Friday 27th, morning


Analysis of DM-rates: Age×sex interaction I

• 10 centres

• 2 sexes

• Age: 0-15

• Period 1989–1999

• Is the sex-effect the same between all centres?

• How are the time trends?

APC-model: Interactions (APC-int) 224/ 238

Analysis of DM-rates: Age×sex interactionII

library( Epi )library( splines )load( file="c:/Bendix/Artikler/A_P_C/IDDM/Eurodiab/data/tri.Rdatdm <- dm[dm$cen=="D1: Denmark",]

# Define knots and points of predictionn.A <- 5n.C <- 8n.P <- 5pA <- seq(1/(3*n.A),1-1/(3*n.A),,n.A )pC <- seq(1/(3*n.C),1-1/(3*n.C),,n.P )pP <- seq(1/(3*n.P),1-1/(3*n.P),,n.C )c0 <- 1985attach( dm, warn.conflicts=FALSE )A.kn <- quantile( rep( A, D ), probs=pA[-c(1,n.A)] )A.ok <- quantile( rep( A, D ), probs=pA[ c(1,n.A)] )A.pt <- sort( A[match( unique(A), A )] )C.kn <- quantile( rep( C, D ), probs=pC[-c(1,n.C)] )C.ok <- quantile( rep( C, D ), probs=pC[ c(1,n.C)] )C.pt <- sort( C[match( unique(C), C )] )

APC-model: Interactions (APC-int) 225/ 238

Analysis of DM-rates: Age×sex interaction III

P.kn <- quantile( rep( P, D ), probs=pP[-c(1,n.P)] )
P.ok <- quantile( rep( P, D ), probs=pP[ c(1,n.P)] )
P.pt <- sort( P[match( unique(P), P )] )

# Age-cohort model with age-sex interaction
# The model matrices for the ML fit
# (the three lines ending in … are cut off at the slide edge)
Ma <- ns( A, kn=A.kn, Bo=A.ok, intercept=T…
Mc <- cbind( C-c0, detrend( ns( C, kn=C.kn, Bo=C.ok ), C, weight…
Mp <- detrend( ns( P, kn=P.kn, Bo=P.ok ), P, weight…
# The prediction matrices
Pa <- Ma[match(A.pt,A),,drop=F]
Pc <- Mc[match(C.pt,C),,drop=F]
Pp <- Mp[match(P.pt,P),,drop=F]

# Fit the apc model by ML
apcs <- glm( D ~ Ma:sex - 1 + Mc + Mp +
                 offset( log (Y/10^5) ),
             family=poisson,
             data=dm )
summary( apcs )
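The structure of this fit (a separate age spline within each sex, entered as `Ma:sex - 1`, with person-years as an offset) can be tried out without the course data. The following is a minimal sketch on simulated data, assuming only base R and the `splines` package. The data frame `sim`, the rates used to generate it and the resulting knots are made up for illustration; the cohort and period terms are left out to keep it short.

```r
library( splines )

set.seed( 1 )
# Hypothetical tabulated data: one record per age class and sex
sim <- expand.grid( A=seq(0.5,14.5,1), sex=factor(c("M","F")) )
sim$Y <- 50000                                  # person-years per cell (made up)
true.rate <- exp( -9 + 0.15*sim$A - 0.006*sim$A^2 +
                  ifelse( sim$sex=="M", 0.1, 0 ) )
sim$D <- rpois( nrow(sim), true.rate*sim$Y )    # simulated number of cases

# Knots at quantiles of age among the cases (each age repeated D times)
n.A  <- 5
pA   <- seq( 1/(3*n.A), 1-1/(3*n.A), length.out=n.A )
A.kn <- quantile( rep( sim$A, sim$D ), probs=pA[-c(1,n.A)] )
A.ok <- quantile( rep( sim$A, sim$D ), probs=pA[ c(1,n.A)] )

# Age spline with intercept, so that Ma:sex gives one rate curve per sex
Ma <- ns( sim$A, knots=A.kn, Boundary.knots=A.ok, intercept=TRUE )

fit <- glm( D ~ Ma:sex - 1 + offset( log(Y/10^5) ),
            family=poisson, data=sim )
length( coef(fit) )    # number of spline parameters, both sexes together
```

Because `Ma` appears only inside the interaction, `sex` is coded with one indicator per level, which is why the coefficient names contain both `sexF` and `sexM`.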


Analysis of DM-rates: Age×sex interaction IV

ci.lin( apcs )
ci.lin( apcs, subset="sexF", Exp=T)
ci.lin( apcs, subset="sexF", ctr.mat=Pa, Exp=T)

# Extract the effects
# (the line ending in … is cut off at the slide edge)
F.inc <- ci.lin( apcs, subset="sexF", ctr.mat=Pa, Exp=T)[,5:7]
M.inc <- ci.lin( apcs, subset="sexM", ctr.mat=Pa, Exp=T)[,5:7]
MF.RR <- ci.lin( apcs, subset=c("sexM","sexF"), ctr.mat=cbind(Pa…
c.RR <- ci.lin( apcs, subset="Mc", ctr.mat=Pc, Exp=T)[,5:7]
p.RR <- ci.lin( apcs, subset="Mp", ctr.mat=Pp, Exp=T)[,5:7]

# plt( paste( "DM-DK" ), width=11 )
par( mar=c(4,4,1,4), mgp=c(3,1,0)/1.6, las=1 )
# The frame for the effects
fr <- apc.frame( a.lab=c(0,5,10,15),
                 a.tic=c(0,5,10,15),
                 r.lab=c(c(1,1.5,3,5),c(1,1.5,3,5)*10),
                 r.tic=c(c(1,1.5,2,5),c(1,1.5,2,5)*10),
                 cp.lab=seq(1980,2000,10),
                 cp.tic=seq(1975,2000,5),


Analysis of DM-rates: Age×sex interaction V

                 rr.ref=5,
                 gap=1,
                 col.grid=gray(0.9),
                 a.txt="",
                 cp.txt="",
                 r.txt="",
                 rr.txt="" )

# Draw the estimates
matlines( A.pt, M.inc, lwd=c(3,1,1), lty=1, col="blue" )
matlines( A.pt, F.inc, lwd=c(3,1,1), lty=1, col="red" )
matlines( C.pt - fr[1], c.RR * fr[2],
          lwd=c(3,1,1), lty=1, col="black" )
matlines( P.pt - fr[1], p.RR * fr[2],
          lwd=c(3,1,1), lty=1, col="black" )
matlines( A.pt, MF.RR * fr[2],
          lwd=c(3,1,1), lty=1, col=gray(0.6) )
abline(h=fr[2])
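The calls to `ci.lin` with `ctr.mat` turn spline coefficients into curves with confidence limits. As a rough sketch of the arithmetic presumably involved (a linear combination of the coefficients, its standard error from the covariance matrix, then exponentiation), the same can be done by hand in base R. Everything here is illustrative: the simulated data, the model `m` and the helper `lin.ci` are hypothetical names and not part of the `Epi` package.

```r
library( splines )

set.seed( 2 )
# Made-up data: cases D and person-years Y by age
A <- seq( 0.5, 14.5, 1 )
Y <- rep( 80000, length(A) )
D <- rpois( length(A), exp( -9 + 0.1*A )*Y )

# Age spline with intercept, so the coefficients describe log-rates
Ma <- ns( A, df=4, intercept=TRUE )
m  <- glm( D ~ Ma - 1 + offset( log(Y/10^5) ), family=poisson )

# Hypothetical helper: estimate and 95% c.i. for exp( ctr.mat %*% beta )
lin.ci <- function( fit, ctr.mat, alpha=0.05 )
{
est <- drop( ctr.mat %*% coef(fit) )
se  <- sqrt( diag( ctr.mat %*% vcov(fit) %*% t(ctr.mat) ) )
z   <- qnorm( 1-alpha/2 )
exp( cbind( Estimate=est, lower=est-z*se, upper=est+z*se ) )
}

# Rows of Ma at the ages where the curve is wanted (here: all ages)
inc <- lin.ci( m, Ma )
round( inc, 2 )
```

The three columns of `inc` play the role of the columns picked out by `[,5:7]` above, and could be drawn with `matlines( A, inc, ... )` in the same way.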


[Figure, shown in two versions: one panel per centre (A1: Austria, C1: Hungary, D1: Denmark, M1: Germany, N1: Norway, S1: Spain, U1: UK−Belfast, W3: Poland, X1: Sweden, Z2: Czech). Horizontal axis: Age / Cohort; left axis: Rates per 100,000 person-years; right axis: Rate ratio. No. parameters for each curve: 4 in the first version, 5 in the second.]


[Figure, shown on two consecutive slides: curves for Poland, Austria, Norway, Hungary, Czech, Spain, Germany, Denmark, UK−Belfast and Sweden in one frame. Horizontal axis: Age (0–15) and Date of birth (1980–1995); left axis: Rate per 100,000; right axis: Rate ratio.]


Predicting future rates

Friday 27th, morning


Prediction of future rates

Model:

\[
\log(\lambda(a,p)) = f(a) + g(p) + h(c)
\]

• Why not just extend the estimated functions into the future?

• The parametrization curse: the model as stated is not uniquely parametrized.

• Prediction must be invariant under reparametrization.


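Identifiability

Predictions based on the three functions f(a), g(p) and h(c) must give the same prediction also for the reparametrized version:

\[
\log(\lambda(a,p)) = f(a) + g(p) + h(c)
  = \bigl(f(a) - \gamma a\bigr) + \bigl(g(p) + \gamma p\bigr) + \bigl(h(c) - \gamma c\bigr)
\]

The two versions agree for any value of γ because c = p − a, so the three added linear terms cancel.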
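That the two parametrizations give identical rates can be checked numerically. A small sketch in base R, with made-up functions `f`, `g`, `h` and an arbitrary value `gam` for γ; all of these are hypothetical and chosen only for illustration.

```r
# Arbitrary (made-up) age, period and cohort effects on the log-rate scale
f <- function( a ) -8 + 0.05*a
g <- function( p ) 0.01*( p-1990 ) + 0.1*sin( (p-1990)/5 )
h <- function( cc ) -0.02*( cc-1940 )

# A grid of ages and periods; cohort is determined by c = p - a
gr <- expand.grid( a=seq(30,80,10), p=seq(1980,2000,5) )
gr$c <- gr$p - gr$a

gam <- 0.3    # any value of gamma gives the same log-rates

lr.1 <- f(gr$a) + g(gr$p) + h(gr$c)
lr.2 <- ( f(gr$a) - gam*gr$a ) +
        ( g(gr$p) + gam*gr$p ) +
        ( h(gr$c) - gam*gr$c )

max( abs( lr.1 - lr.2 ) )   # zero up to rounding error
```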


Prediction of the future course of g and h must preserve addition of a linear term in the argument:

\[
\mathrm{pred}\bigl(g(p) + \gamma p\bigr) = \mathrm{pred}\bigl(g(p)\bigr) + \gamma p
\]

\[
\mathrm{pred}\bigl(h(c) - \gamma c\bigr) = \mathrm{pred}\bigl(h(c)\bigr) - \gamma c
\]

If this is met, the predictions made will not depend on the parametrization chosen.

If one of the conditions does not hold, the prediction will depend on the parametrization chosen.

Any linear combination of (known) function values of g(p) and h(c) will work.
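To make the condition pred(g(p) + γp) = pred(g(p)) + γp concrete, here is a small sketch in base R that checks it for two simple prediction rules: extending the straight line through the last two estimated points, and carrying the last value forward. The functions `pred.lin` and `pred.last`, the values in `g.obs` and the choice of `gam` are all hypothetical, for illustration only.

```r
p.obs <- seq( 1980, 2000, 5 )                 # observed periods
g.obs <- c( -0.20, -0.05, 0.00, 0.08, 0.10 )  # made-up estimates of g(p)
p.new <- 2010                                 # future period
gam   <- 0.3                                  # arbitrary reparametrization

# Rule 1: extend the line through the last two points
pred.lin <- function( p, g, p.new )
{
n <- length( p )
slope <- ( g[n]-g[n-1] ) / ( p[n]-p[n-1] )
g[n] + slope*( p.new-p[n] )
}

# Rule 2: carry the last value forward
pred.last <- function( p, g, p.new ) g[length(g)]

# Difference between pred( g + gam*p ) and pred( g ) + gam*p.new
d.lin  <- pred.lin ( p.obs, g.obs+gam*p.obs, p.new ) -
        ( pred.lin ( p.obs, g.obs, p.new ) + gam*p.new )
d.last <- pred.last( p.obs, g.obs+gam*p.obs, p.new ) -
        ( pred.last( p.obs, g.obs, p.new ) + gam*p.new )

c( linear=d.lin, last.value=d.last )
```

The first difference is zero up to rounding, so the linear extension meets the condition. The second equals `gam*(2000-2010)`, so a prediction of g by its last value, taken on its own, would change with the parametrization.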


Identifiability

• Any linear combination of function values of g(p) and h(c) will work.

• Coefficients in the linear combinations used for g and h must be the same; otherwise the prediction will depend on the specific parametrization.

• What works best in reality is difficult to say: it depends on the subject matter.


Example: Breast cancer in Denmark

[Figure: horizontal axis: Age (30–90) and Calendar time (1860–2020); left axis: Rate per 100,000 person-years; right axis: Rate ratio.]


Practicalities

• Long-term predictions are notoriously unstable.

• Decreasing slopes are possible; the requirement is that at any future point changes in the parametrization should cancel out in the predictions.


Breast cancer prediction

[Figure: horizontal axis: Age (30–90); vertical axis: Predicted rates per 100,000 (10–500).]

Predicted age-specific breast cancer rates at 2020 (black), in the 1950 cohort (blue), and the estimated age-effects (red).
