Lecture 6: Repeated Measures Analyses Elizabeth Garrett [email protected] Child Psychiatry Research...

Lecture 6: Repeated Measures Analyses

Elizabeth Garrett

[email protected]

Child Psychiatry Research Methods Lecture Series

Outline for Today

Overview

ANOVA models

Repeated Measures ANOVA

Longitudinal Data Analysis

Overview

• Linear and logistic regression thus far:
– assume each individual has one observation
– e.g. one exposure, one outcome
– can’t go back and “unexpose” the individual and see what happens

• Practically,
– useful to do “experiments” with more than one exposure and more than one outcome per individual
– each individual serves as his own control
– much “tighter” design in terms of variability
– this is called “repeated measures”: the outcome is observed on the same individual at multiple times under different conditions.

• Generalization of repeated measures: longitudinal analysis
– observe multiple outcomes on the same individual at different times
– might be observational or experimental
– exposure/treatment may or may not vary at different times.

ANOVA models

• ANOVA = analysis of variance (bad name!)

• A simple case of linear regression
– continuous outcome
– categorical independent variable(s)

• Why do we hear about ANOVA so often if it is just a special case of linear regression?
– Historically, very popular because… easy to perform WITHOUT a computer!

– Very prevalent in psychometrics

– Interpretation is nice and simple

– In its simplest form, an ANOVA represents a generalization of the two sample t-test. It allows for the testing of more than two groups.

– Tests to see if means in all groups are equal.

– Instead of t-statistic, we look at F-statistic

ANOVA for Independent Observations

Example: Drug Study of Hyperactivity in Children under Age 10

• 180 observations on children with hyperactivity.

• Hyperactivity (H) measured by a “scale” instrument
– range is 0 to 30
– child is designated as “hyperactive” if score > 15
– to enter the study, must score > 20

• 3 Treatments: 60 placebo, 60 ritalin, 60 “new” drug.

• Evaluation based on hyperactivity score (H) measured at study end (2 weeks).

• Questions:
– Do all three treatments have approximately the same effect?
– Is the new drug better than placebo?
– Is the new drug as good as ritalin?

[Figure: Hyperactivity Score (axis 10 to 30) by treatment group: Placebo, Ritalin, New Drug]

Intuitive approach

• Estimate mean H in each group:
– $\mu_p$ = mean of H in the placebo group
– $\mu_r$ = mean of H in the ritalin group
– $\mu_n$ = mean of H in the new drug group

• Test if the means are the same or different
– H0: group means are all the same
– H1: at least one group mean is different from some other group mean.

$$H_0: \mu_p = \mu_r = \mu_n$$

$$H_1: \text{not all of } \mu_p,\ \mu_r,\ \mu_n \text{ are equal}$$

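Nice thing about ANOVA models…

$$H_i = \beta_0 + \beta_1 I(\text{ritalin}_i) + \beta_2 I(\text{new drug}_i) + \varepsilon_i$$

$$\hat{H}_i \mid \text{placebo} = \mu_p = \beta_0$$

$$\hat{H}_i \mid \text{ritalin} = \mu_r = \beta_0 + \beta_1$$

$$\hat{H}_i \mid \text{new drug} = \mu_n = \beta_0 + \beta_2$$

$\beta_0$ is the estimated score for kids on placebo

$\beta_1$ is the “treatment” effect of ritalin

$\beta_2$ is the “treatment” effect of the new drug

$\beta_1 - \beta_2$ is the difference in effect between ritalin and the new drug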

Hyperactivity Example Results

  Source |       SS       df       MS              Number of obs =     180
---------+------------------------------           F(  2,   177) =  121.53
   Model |  2213.37778     2  1106.68889           Prob > F      =  0.0000
Residual |  1611.86667   177  9.10659134           R-squared     =  0.5786
---------+------------------------------           Adj R-squared =  0.5739
   Total |  3825.24444   179  21.3700807           Root MSE      =  3.0177
------------------------------------------------------------------------------
       H |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
  Itrt_2 |  -8.366667   .5509565    -15.186   0.000      -9.453956   -7.279378
  Itrt_3 |  -5.866667   .5509565    -10.648   0.000      -6.953956   -4.779378
   _cons |       23.1   .3895851     59.294   0.000       22.33117    23.86883
------------------------------------------------------------------------------

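$$\hat{H}_i = 23.1 - 8.37\, I(\text{ritalin}_i) - 5.87\, I(\text{new drug}_i)$$

$$\hat{H}_i \mid \text{placebo} = \mu_p = \beta_0 = 23.1$$

$$\hat{H}_i \mid \text{ritalin} = \mu_r = \beta_0 + \beta_1 = 14.7$$

$$\hat{H}_i \mid \text{new drug} = \mu_n = \beta_0 + \beta_2 = 17.2$$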
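The arithmetic behind the three fitted group means can be checked with a few lines of Python. This is only a sketch: the coefficients are copied from the regression output above, and the dictionary keys and the helper name `predicted_mean` are illustrative choices, not part of the original analysis.

```python
# Coefficients copied from the regression output (placebo is the reference group).
coef = {"_cons": 23.1, "Itrt_2": -8.366667, "Itrt_3": -5.866667}

def predicted_mean(group):
    """Fitted mean H for one treatment group under indicator (dummy) coding."""
    effect = {"placebo": 0.0, "ritalin": coef["Itrt_2"], "new drug": coef["Itrt_3"]}
    return coef["_cons"] + effect[group]

means = {g: round(predicted_mean(g), 1) for g in ("placebo", "ritalin", "new drug")}

# Difference in effect between the new drug and ritalin (beta_2 - beta_1).
diff_new_vs_ritalin = round(coef["Itrt_3"] - coef["Itrt_2"], 2)

print(means)
print(diff_new_vs_ritalin)
```

With indicator coding the intercept is the placebo mean, and each coefficient is a difference from placebo, so every group mean is the intercept plus at most one coefficient.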

ANOVA Table

                  Number of obs =     180     R-squared     =  0.5786
                  Root MSE      = 3.01771     Adj R-squared =  0.5739

    Source |  Partial SS    df       MS           F     Prob > F
-----------+----------------------------------------------------
     Model |  2213.37778     2  1106.68889     121.53     0.0000
           |
       trt |  2213.37778     2  1106.68889     121.53     0.0000
           |
  Residual |  1611.86667   177  9.10659134
-----------+----------------------------------------------------
     Total |  3825.24444   179  21.3700807
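As a rough check on how the table fits together, the F-statistic, R-squared and Root MSE can all be recomputed from the sums of squares and degrees of freedom. The sketch below copies those numbers from the ANOVA table; the variable names are illustrative.

```python
# Sums of squares and degrees of freedom copied from the ANOVA table.
ss_model, df_model = 2213.37778, 2
ss_resid, df_resid = 1611.86667, 177

ms_model = ss_model / df_model                  # mean square for treatment
ms_resid = ss_resid / df_resid                  # mean square error
f_stat = ms_model / ms_resid                    # F(2, 177)
r_squared = ss_model / (ss_model + ss_resid)    # share of variability explained
root_mse = ms_resid ** 0.5

print(round(f_stat, 2), round(r_squared, 4), round(root_mse, 4))
```

The F-statistic is just the ratio of between-group variability per degree of freedom to within-group variability per degree of freedom, which is why ANOVA was practical to do by hand.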

Answers to scientific questions

1. Do all three treatments have approximately the same effect?
No. There is evidence that the intercept alone is not sufficient for describing variability. Why? P-value on F-statistic < 0.001

2. Is new drug better than placebo?
Yes. Treatment effect is -5.9. Why? P-value on $\beta_2$ is less than 0.001

3. Is the new drug as good as ritalin?
No. The treatment effect difference is 2.5. Why? P-value on $\beta_2 - \beta_1$ is less than 0.001***

Repeated Measures

What happens when we have more than one treatment per individual?

• Most often “experiments” and not “observational” studies

• Need special methods
– each individual is considered more than once
– observations from the same person are likely to be correlated

• Example:
– Consider two kids: one has a placebo score of 20 and the other has a placebo score of 30.
– Child with the LOW placebo score also likely to have a LOW ritalin score.
– Child with the HIGH placebo score also likely to have a HIGH ritalin score.

Observations from the same child are CORRELATED. Independence assumption of linear regression is violated.

Repeated Measures

Example: Drug Study of Hyperactivity in Children under Age 10

• 180 observations on 60 children with hyperactivity.

• 3 Treatments: placebo, ritalin, “new” drug.

• Each child receives each of the three treatments, one at each of times 1, 2, and 3.

• Order of treatments is random

• There is sufficient “wash out” period between treatments to minimize “carry over” effects

• Evaluation based on hyperactivity score (H) measured at the end of each treatment period (2 weeks).

• Questions:
– Do all three treatments have approximately the same effect?
– Is the new drug better than placebo?
– Is the new drug as good as ritalin?

[Figure: Hyperactivity Score (axis 10 to 30) for each child, in four panels: children 1 - 15, 16 - 30, 31 - 45, 46 - 60]

Repeated Measures ANOVA Results

  Source |       SS       df       MS              Number of obs =     180
---------+------------------------------           F( 61,   118) =   10.37
   Model |  3223.95556    61  52.8517304           Prob > F      =  0.0000
Residual |  601.288889   118  5.09566855           R-squared     =  0.8428
---------+------------------------------           Adj R-squared =  0.7616
   Total |  3825.24444   179  21.3700807           Root MSE      =  2.2574
------------------------------------------------------------------------------
       H |      Coef.   Std. Err.       z     P>|z|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
  Itrt_2 |  -8.366667   .4121355    -20.301   0.000      -9.174437   -7.558896
  Itrt_3 |  -5.866667   .4121355    -14.235   0.000      -6.674437   -5.058896
   _cons |       23.1   .3895851     59.294   0.000       22.33643    23.86357
---------+--------------------------------------------------------------------

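$$\hat{H}_i = 23.1 - 8.37\, I(\text{ritalin}_i) - 5.87\, I(\text{new drug}_i)$$

$$\hat{H}_i \mid \text{placebo} = \mu_p = \beta_0 = 23.1$$

$$\hat{H}_i \mid \text{ritalin} = \mu_r = \beta_0 + \beta_1 = 14.7$$

$$\hat{H}_i \mid \text{new drug} = \mu_n = \beta_0 + \beta_2 = 17.2$$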

ANOVA Table

    Source |  Partial SS    df       MS           F     Prob > F
-----------+----------------------------------------------------
     Model |  3223.95556    61  52.8517304      10.37     0.0000
           |
        id |  1010.57778    59  17.1284369       3.36     0.0000
       trt |  2213.37778     2  1106.68889     217.18     0.0000
           |
  Residual |  601.288889   118  5.09566855
-----------+----------------------------------------------------
     Total |  3825.24444   179  21.3700807
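One way to see what the `id` row does is to compare this table with the ANOVA table for independent observations. The sketch below uses only numbers printed in the two tables (the variable names are illustrative): the between-child sum of squares is taken out of the old residual, and the treatment F-statistic is recomputed against the smaller error term.

```python
# Values copied from the two ANOVA tables (independent observations vs. repeated measures).
resid_ss_independent = 1611.86667   # residual SS when the child is ignored
id_ss = 1010.57778                  # between-child SS (59 df)
resid_ss_repeated = 601.288889      # residual SS after removing child effects (118 df)
trt_ms = 1106.68889                 # treatment mean square, the same in both tables

# The old residual splits into a between-child part and a within-child part.
gap = resid_ss_independent - (id_ss + resid_ss_repeated)

mse_repeated = resid_ss_repeated / 118
f_trt = trt_ms / mse_repeated

print(abs(gap) < 1e-4, round(f_trt, 2))
```

The treatment sum of squares is unchanged; only the yardstick it is compared against gets smaller, which is where the gain in the F-statistic (121.53 to 217.18) comes from.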

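So where is the difference?

           |        ANOVA         | Repeated Measures ANOVA
           |  estimate     se     |  estimate     se
-----------+----------------------+------------------------
   beta_0  |    23.1      0.39    |    23.1      0.39
   beta_1  |   -8.37      0.55    |   -8.37      0.41
   beta_2  |   -5.87      0.55    |   -5.87      0.41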
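The smaller standard errors for the treatment effects follow from the smaller residual mean square. As a sketch, assuming the usual balanced-design formula for the standard error of a difference between two group means with n scores each, sqrt(2 * MSE / n), the two treatment standard errors in the outputs can be reproduced. The function name `se_contrast` is illustrative.

```python
import math

n_per_group = 60                 # 60 scores per treatment in both analyses
mse_independent = 9.10659134     # residual MS, ordinary ANOVA
mse_repeated = 5.09566855        # residual MS, repeated measures ANOVA

def se_contrast(mse, n):
    """Standard error of a difference between two group means, n scores each."""
    return math.sqrt(2 * mse / n)

se_anova = se_contrast(mse_independent, n_per_group)
se_rm = se_contrast(mse_repeated, n_per_group)

print(round(se_anova, 4), round(se_rm, 4))
```

The estimates are the same in both analyses; what changes is how much unexplained variability is left to divide by, once each child serves as his own control.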





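1. Do all three treatments have approximately the same effect?
No. There is evidence that the intercept alone is not sufficient for describing variability. Why? P-value on Model F-statistic < 0.001

3. Is the new drug as good as ritalin?
No. The treatment effect difference is 2.5. Why? P-value on $\beta_2 - \beta_1$ is less than 0.001***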

Other issues in repeated measures ANOVA

• “Period” Effects
Example:
– Children screened into study if H > 20.
– It is likely that, if we didn’t give them anything, on average, the H scores would go down
– This phenomenon is called “regression to the mean”

• Why is this an issue?
– We might expect all kids at time 1 to be “worse” than at other time periods.
– We need to adjust for the “period” in which drug was given.

• Related issue: “Carry over” effects
– In many studies, the treatment might be curative or at least long-lasting.
– If an individual is cured by a treatment at time 1, we would not want to attribute his effect to placebo at time 2.
– In addition to adjustment (as we will see in a minute), it is important to consider building a “wash out” period into cross-over designs.
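Regression to the mean is easy to see in a small simulation. The sketch below is purely illustrative: the population mean of 18, both standard deviations of 3, and the sample size are made-up assumptions, not values from the study. Only the entry rule (score > 20) comes from the example.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

def noisy_score(true_level):
    """One measurement = stable true level + occasion-specific noise (assumed SD 3)."""
    return true_level + random.gauss(0, 3)

# Hypothetical population of true hyperactivity levels, centred at 18.
true_levels = [random.gauss(18, 3) for _ in range(20000)]

screened = []
for level in true_levels:
    first = noisy_score(level)
    if first > 20:                      # entry criterion from the study
        second = noisy_score(level)     # re-measured with NO treatment at all
        screened.append((first, second))

mean_first = sum(f for f, _ in screened) / len(screened)
mean_second = sum(s for _, s in screened) / len(screened)

print(round(mean_first, 1), round(mean_second, 1))
```

Children selected because of a high first score tend to score lower the second time even though nothing was done to them, which is why period has to be adjusted for.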

Period Adjustment

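$$H_{ij} = \beta_0 + \alpha_i + \beta_1 I(\text{ritalin}_{ij}) + \beta_2 I(\text{new drug}_{ij}) + \beta_3 I(j=2) + \beta_4 I(j=3) + \varepsilon_{ij}$$

where $\alpha_i$ is the effect of child $i$ and $j$ indexes the time period.

Time 1:

$$\hat{H}_i \mid \text{placebo},\ \text{time}=1 \;=\; \beta_0$$

$$\hat{H}_i \mid \text{ritalin},\ \text{time}=1 \;=\; \beta_0 + \beta_1$$

$$\hat{H}_i \mid \text{new drug},\ \text{time}=1 \;=\; \beta_0 + \beta_2$$

Time 2:

$$\hat{H}_i \mid \text{placebo},\ \text{time}=2 \;=\; \beta_0 + \beta_3$$

$$\hat{H}_i \mid \text{ritalin},\ \text{time}=2 \;=\; \beta_0 + \beta_1 + \beta_3$$

$$\hat{H}_i \mid \text{new drug},\ \text{time}=2 \;=\; \beta_0 + \beta_2 + \beta_3$$

Time 3:

$$\hat{H}_i \mid \text{placebo},\ \text{time}=3 \;=\; \beta_0 + \beta_4$$

$$\hat{H}_i \mid \text{ritalin},\ \text{time}=3 \;=\; \beta_0 + \beta_1 + \beta_4$$

$$\hat{H}_i \mid \text{new drug},\ \text{time}=3 \;=\; \beta_0 + \beta_2 + \beta_4$$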

Repeated Measures ANOVA Results

  Source |       SS       df       MS              Number of obs =     180
---------+------------------------------           F( 63,   116) =   11.69
   Model |  3304.89281    63  52.4586161           Prob > F      =  0.0000
Residual |   520.35163   116  4.48578991           R-squared     =  0.8640
---------+------------------------------           Adj R-squared =  0.7901
   Total |  3825.24444   179  21.3700807           Root MSE      =   2.118
------------------------------------------------------------------------------
       H |      Coef.   Std. Err.       z     P>|z|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
  Itrt_2 |  -8.289234   .3878228    -21.374   0.000      -9.049352   -7.529115
  Itrt_3 |  -5.876126   .3892903    -15.094   0.000      -6.639121   -5.113131
 Itime_2 |  -.8925736   .3888018     -2.296   0.022      -1.654611   -.1305361
 Itime_3 |  -1.643255   .3873324     -4.242   0.000      -2.402413   -.8840976
   _cons |   23.92262   .4452343     53.730   0.000       23.04998    24.79526
---------+--------------------------------------------------------------------

$$\hat{H}_{ij} = 23.9 - 8.3\, I(\text{ritalin}_{ij}) - 5.9\, I(\text{new drug}_{ij}) - 0.89\, I(j=2) - 1.64\, I(j=3)$$

Time 1:

$$\hat{H}_i \mid \text{placebo},\ \text{time}=1 \;=\; 23.9$$

$$\hat{H}_i \mid \text{ritalin},\ \text{time}=1 \;=\; 15.6$$

$$\hat{H}_i \mid \text{new drug},\ \text{time}=1 \;=\; 18.0$$

Time 2:

$$\hat{H}_i \mid \text{placebo},\ \text{time}=2 \;=\; 23.0$$

$$\hat{H}_i \mid \text{ritalin},\ \text{time}=2 \;=\; 14.7$$

$$\hat{H}_i \mid \text{new drug},\ \text{time}=2 \;=\; 17.2$$

Time 3:

$$\hat{H}_i \mid \text{placebo},\ \text{time}=3 \;=\; 22.3$$

$$\hat{H}_i \mid \text{ritalin},\ \text{time}=3 \;=\; 14.0$$

$$\hat{H}_i \mid \text{new drug},\ \text{time}=3 \;=\; 16.4$$
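The nine fitted values above are just sums of coefficients. A short sketch, with coefficients copied from the period-adjusted output and illustrative dictionary names, rebuilds the full treatment-by-period grid:

```python
# Coefficients copied from the period-adjusted output.
b = {"_cons": 23.92262, "ritalin": -8.289234, "new drug": -5.876126,
     "time2": -0.8925736, "time3": -1.643255}

trt_effect = {"placebo": 0.0, "ritalin": b["ritalin"], "new drug": b["new drug"]}
time_effect = {1: 0.0, 2: b["time2"], 3: b["time3"]}

# Additive model: fitted mean = intercept + treatment effect + period effect.
table = {(trt, t): round(b["_cons"] + trt_effect[trt] + time_effect[t], 1)
         for trt in trt_effect for t in time_effect}

for trt in trt_effect:
    print(trt, [table[(trt, t)] for t in (1, 2, 3)])
```

Because the model is additive, the gap between any two treatments is the same in every period; only the overall level shifts down over time.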

ANOVA Table

    Source |  Partial SS    df       MS           F     Prob > F
-----------+----------------------------------------------------
     Model |  3304.89281    63  52.4586161      11.69     0.0000
           |
        id |  1010.57778    59  17.1284369       3.82     0.0000
       trt |  2163.63726     2  1081.81863     241.17     0.0000
      time |  80.9372588     2  40.4686294       9.02     0.0002
           |
  Residual |   520.35163   116  4.48578991
-----------+----------------------------------------------------
     Total |  3825.24444   179  21.3700807







3. Is the new drug as good as ritalin?
No. The treatment effect difference is 2.4. Why? P-value on $\beta_2 - \beta_1$ is less than 0.001****

Is that it? Not quite…

• Interactions!

• Is it possible that the effect of treatment is different at different times?

• Current model: forces treatment effects to be the same across all time periods.

• Why might this not be okay?
– What if the kids would get “better” by time period 3 anyway? (Think about diseases/disorders which have “flares”, e.g. depression, herpes).

• Interactions allow more flexibility in the model

• They allow the treatment effects to be different at different times.

“Full blown” model

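$$
\begin{aligned}
H_{ij} = {} & \beta_0 + \alpha_i + \beta_1 I(\text{ritalin}_{ij}) + \beta_2 I(\text{new drug}_{ij}) \\
& + \beta_3 I(j=2) + \beta_4 I(j=3) \\
& + \beta_5 I(j=2)\, I(\text{ritalin}_{ij}) + \beta_6 I(j=2)\, I(\text{new drug}_{ij}) \\
& + \beta_7 I(j=3)\, I(\text{ritalin}_{ij}) + \beta_8 I(j=3)\, I(\text{new drug}_{ij}) + \varepsilon_{ij}
\end{aligned}
$$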

Including Interactions

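Time 1:

$$\hat{H}_i \mid \text{placebo},\ \text{time}=1 \;=\; \beta_0$$

$$\hat{H}_i \mid \text{ritalin},\ \text{time}=1 \;=\; \beta_0 + \beta_1$$

$$\hat{H}_i \mid \text{new drug},\ \text{time}=1 \;=\; \beta_0 + \beta_2$$

Time 2:

$$\hat{H}_i \mid \text{placebo},\ \text{time}=2 \;=\; \beta_0 + \beta_3$$

$$\hat{H}_i \mid \text{ritalin},\ \text{time}=2 \;=\; \beta_0 + \beta_1 + \beta_3 + \beta_5$$

$$\hat{H}_i \mid \text{new drug},\ \text{time}=2 \;=\; \beta_0 + \beta_2 + \beta_3 + \beta_6$$

Time 3:

$$\hat{H}_i \mid \text{placebo},\ \text{time}=3 \;=\; \beta_0 + \beta_4$$

$$\hat{H}_i \mid \text{ritalin},\ \text{time}=3 \;=\; \beta_0 + \beta_1 + \beta_4 + \beta_7$$

$$\hat{H}_i \mid \text{new drug},\ \text{time}=3 \;=\; \beta_0 + \beta_2 + \beta_4 + \beta_8$$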

How do we measure treatment effects?

• Now that we have interactions, $\beta_1$ is not the “treatment effect” of ritalin and $\beta_2$ is not the “treatment effect” of new drug

• For ritalin:
– $\beta_1$ is the treatment effect of ritalin at time 1
– $\beta_1 + \beta_5$ is the treatment effect of ritalin at time 2
– $\beta_1 + \beta_7$ is the treatment effect of ritalin at time 3

• For new drug:
– $\beta_2$ is the treatment effect of new drug at time 1
– $\beta_2 + \beta_6$ is the treatment effect of new drug at time 2
– $\beta_2 + \beta_8$ is the treatment effect of new drug at time 3

Are they significant?

  Source |       SS       df       MS              Number of obs =     180
---------+------------------------------           F( 67,   112) =   11.72
   Model |  3347.79611    67  49.9671062           Prob > F      =  0.0000
Residual |  477.448331   112  4.26293153           R-squared     =  0.8752
---------+------------------------------           Adj R-squared =  0.8005
   Total |  3825.24444   179  21.3700807           Root MSE      =  2.0647
------------------------------------------------------------------------------
       H |      Coef.   Std. Err.       z     P>|z|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
  Itrt_2 |  -9.486124   .8174333    -11.605   0.000      -11.08826   -7.883985
  Itrt_3 |  -6.954453   .7701922     -9.030   0.000      -8.464002   -5.444904
 Itime_2 |  -1.788544   .7639526     -2.341   0.019      -3.285864   -.2912243
 Itime_3 |  -3.104565     .82816     -3.749   0.000      -4.727729   -1.481401
ItXt_2_2 |   1.682091   1.195857      1.407   0.160      -.6617452    4.025926
ItXt_2_3 |   1.904287   1.207546      1.577   0.115      -.4624599    4.271034
ItXt_3_2 |   .9351192   1.203284      0.777   0.437      -1.423273    3.293512
ItXt_3_3 |   2.305485    1.20026      1.921   0.055      -.0469805     4.65795
   _cons |   24.69504   .6075533     40.647   0.000       23.50426    25.88583

Need another F-test

    Source |  Partial SS    df       MS           F     Prob > F
-----------+----------------------------------------------------
     Model |  3347.79611    67  49.9671062      11.72     0.0000
           |
        id |  1049.02243    59  17.7800412       4.17     0.0000
       trt |  2110.38776     2  1055.19388     247.53     0.0000
      time |  89.5774582     2  44.7887291      10.51     0.0001
  trt*time |  42.9032988     4  10.7258247       2.52     0.0454
           |
  Residual |  477.448331   112  4.26293153
-----------+----------------------------------------------------
     Total |  3825.24444   179  21.3700807
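The F-test for the interaction can be read as a comparison of two nested models: the period-adjusted model without interactions and the full model with them. The sketch below copies the residual sums of squares and degrees of freedom from the two ANOVA tables (variable names are illustrative) and recovers the `trt*time` row. It stops at the F-statistic; the p-value would need an F distribution function, which the standard library does not provide.

```python
# Residual SS and df from the period-adjusted model (no interaction) and the full model.
rss_reduced, df_reduced = 520.35163, 116
rss_full, df_full = 477.448331, 112

extra_ss = rss_reduced - rss_full     # SS explained by the four interaction terms
extra_df = df_reduced - df_full       # number of interaction terms
f_interaction = (extra_ss / extra_df) / (rss_full / df_full)

print(round(extra_ss, 4), extra_df, round(f_interaction, 2))
```

None of the four interaction coefficients is individually significant at 0.05, but they are tested jointly here, which is why a separate F-test is needed.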

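Time 1:

$$\hat{H}_i \mid \text{placebo},\ \text{time}=1 \;=\; 24.7$$

$$\hat{H}_i \mid \text{ritalin},\ \text{time}=1 \;=\; 15.2$$

$$\hat{H}_i \mid \text{new drug},\ \text{time}=1 \;=\; 17.7$$

Time 2:

$$\hat{H}_i \mid \text{placebo},\ \text{time}=2 \;=\; 22.9$$

$$\hat{H}_i \mid \text{ritalin},\ \text{time}=2 \;=\; 15.1$$

$$\hat{H}_i \mid \text{new drug},\ \text{time}=2 \;=\; 16.9$$

Time 3:

$$\hat{H}_i \mid \text{placebo},\ \text{time}=3 \;=\; 21.6$$

$$\hat{H}_i \mid \text{ritalin},\ \text{time}=3 \;=\; 14.0$$

$$\hat{H}_i \mid \text{new drug},\ \text{time}=3 \;=\; 16.9$$

                         |   time=1    |   time=2    |   time=3
-------------------------+-------------+-------------+-------------
 Ritalin Effect | time   |    -9.5     |    -7.8     |    -7.6
 New Drug Effect | time  |    -7.0     |    -6.0     |    -4.6
 difference              |    -2.53    |    -1.78    |    -2.93
                         | (p < 0.001) | (p = 0.03)  | (p < 0.001)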
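The time-specific treatment effects are sums of a main effect and, for times 2 and 3, one interaction coefficient. A minimal sketch, using coefficients copied from the interaction-model output (`ItXt_2_2` is read here as ritalin at time 2, and so on; the dictionary layout and the helper `effect` are illustrative):

```python
# Coefficients copied from the interaction-model output.
main = {"ritalin": -9.486124, "new drug": -6.954453}
interaction = {("ritalin", 2): 1.682091, ("ritalin", 3): 1.904287,
               ("new drug", 2): 0.9351192, ("new drug", 3): 2.305485}

def effect(trt, time):
    """Treatment effect versus placebo in a given period."""
    return main[trt] + interaction.get((trt, time), 0.0)

for t in (1, 2, 3):
    rit, new = effect("ritalin", t), effect("new drug", t)
    print(t, round(rit, 1), round(new, 1), round(rit - new, 2))
```

The p-values for the differences cannot be reproduced this way, since they also depend on the covariances between the coefficient estimates, which the output does not show.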






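2. Is the new drug better than placebo?
Yes. Treatment effects are -7.0, -6.0, -4.6 at times 1, 2, and 3. Why? P-value on $\beta_2$ is less than 0.001***

3. Is the new drug as good as ritalin?
No. The treatment effect differences are -2.5, -1.8, -2.9. Why? P-values on the differences are less than 0.05.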

Longitudinal Analyses

• Repeated measures ANOVA is a simple case of a longitudinal analysis

• In longitudinal analysis:
– can be observational or experimental study
– the observations can be at random times (need not be at time 1, time 2, and time 3 as previously.)
– Generally, time is an important component of the study.

New example:

• Depression in adults

• Due to the episodic nature of depression, if an individual is in a depressive episode today, s/he is likely to not be in one in 8 weeks

• Evaluation of anti-depressants can be difficult for that reason.

• In evaluation of treatments, we care about not only IF a treatment works, but HOW SOON it works.

Clinical Trial of Paroxetine (hypothetical)

• 200 individuals blindly randomized to receive either paroxetine or placebo beginning at week 0.

• Subjects are screened at week -1 and have to score at least 22 on Hamilton D depression scale.

• Subjects are followed for 8 weeks with evaluations at weeks 0 (baseline), 1, 2, 4, 6, 8.

• Outcome measure is Hamilton D score.

• Questions:
– Is paroxetine more effective than placebo?
– Do individuals tend to improve more quickly on paroxetine versus placebo?

Change in HamD score from week 0 to week 6

[Figure: Change in Hamilton D Score (axis -12 to -6) for Paroxetine and Placebo]

If we stopped here, we would conclude that the drug was useless!

Look at data over time…

[Figure: Hamilton D score (axis 15 to 30) against Time in Weeks (0 to 8), for Paroxetine and Placebo]

Random Effects Models

• Assumes that an individual has his/her own “intercept”/“effect”.

• Observations within individuals are correlated.

• The model estimates an intercept for each person, but assumes that individuals have the same slope (within covariate groups)

$$Y_{ij} = \beta_{0i} + \beta_1\, \text{time}_j + e_{ij}$$

Notice the “i” subscript on $\beta_{0i}$.

[Figure: Y plotted against Time]

Covariates

• Time: we know that HamD changes over time. To get “curvy” line, we need to include more than just a linear time variable.

• Paroxetine: we want to see if the paroxetine group differs from the placebo group

$$Y_{ij} = \beta_{0i} + \beta_1\, \text{trt}_i + \beta_2\, \text{week}_j + \beta_3\, \text{week}_j^2 + e_{ij}$$

Results

xtreg y week week2 trt, i(id)

Random-effects GLS regression                   Number of obs      =      1200
Group variable (i) : id                         Number of groups   =       200

R-sq:  within  = 0.7288                         Obs per group: min =         6
       between = 0.4242                                        avg =       6.0
       overall = 0.6090                                        max =         6

Random effects u_i ~ Gaussian                   Wald chi2(3)       =   2828.24
corr(u_i, X)       = 0 (assumed)                Prob > chi2        =    0.0000

------------------------------------------------------------------------------
       y |      Coef.   Std. Err.       z     P>|z|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
    week |  -1.112177   .0740175    -15.026   0.000      -1.257249   -.9671058
   week2 |   .0126631   .0089831      1.410   0.159      -.0049435    .0302697
     trt |    -3.4973   .2895713    -12.078   0.000      -4.064849    -2.92975
   _cons |   26.22405    .226547    115.755   0.000       25.78003    26.66808
---------+--------------------------------------------------------------------
 sigma_u |  1.8942426
 sigma_e |  1.9043451
     rho |  .49734047   (fraction of variance due to u_i)
------------------------------------------------------------------------------
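The `rho` line in the output can be recomputed from the two standard deviations reported just above it. A short sketch, with values copied from the output:

```python
sigma_u = 1.8942426   # SD of the person-specific intercepts
sigma_e = 1.9043451   # SD of the within-person residuals

# Share of total variance due to between-person differences. Under a random
# intercept model this is also the correlation between two scores from one person.
rho = sigma_u**2 / (sigma_u**2 + sigma_e**2)

print(round(rho, 4))
```

A value near 0.5 says that about half of the variability in HamD scores (after accounting for time and treatment) is stable between-person difference, which is the correlation that an analysis assuming independence would ignore.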

Results

[Figure: HamD Score (axis 16 to 26) against Time in Weeks (0 to 8), for Paroxetine and Placebo]

$$\hat{Y}_{ij} = 26.2 - 3.5\, \text{trt}_i - 1.1\, \text{week}_j + 0.01\, \text{week}_j^2$$

$$[Y_{ij} \mid \text{trt}_i = 0] = 26.2 - 1.1\, \text{week}_j + 0.01\, \text{week}_j^2$$

$$[Y_{ij} \mid \text{trt}_i = 1] = 22.7 - 1.1\, \text{week}_j + 0.01\, \text{week}_j^2$$

[Figure: Hamilton D score (axis 15 to 30) against Time in Weeks (0 to 8), for Paroxetine and Placebo]

Something is wrong with our model!

Problem: We need to let the treatment VARY over time!

• The approach above simply ADJUSTS for time.

• We want to see how the relationship differs between treatment groups over time.

• We need interactions again!
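To see concretely what "simply adjusts for time" means, the sketch below evaluates the fitted model without interactions at each evaluation week and prints the gap between the two groups. The coefficients are copied from the `xtreg` output for that model; the function name `fitted` is illustrative.

```python
# Coefficients copied from the model without interactions.
b0, b_trt, b_week, b_week2 = 26.22405, -3.4973, -1.112177, 0.0126631
weeks = [0, 1, 2, 4, 6, 8]   # evaluation schedule in the trial

def fitted(trt, week):
    """Fitted HamD score under the additive (no-interaction) model."""
    return b0 + b_trt * trt + b_week * week + b_week2 * week**2

# Gap between the trt = 1 and trt = 0 curves at every evaluation week.
gaps = [round(fitted(1, w) - fitted(0, w), 2) for w in weeks]
print(gaps)
```

The gap is the `trt` coefficient at every week, baseline included, so the two fitted curves are forced to be parallel and the model cannot describe one group improving sooner than the other.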

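$$Y_{ij} = \beta_{0i} + \beta_1\, \text{trt}_i + \beta_2\, \text{week}_j + \beta_3\, \text{week}_j^2 + \beta_4\, \text{trt}_i\, \text{week}_j + \beta_5\, \text{trt}_i\, \text{week}_j^2 + e_{ij}$$

$$[Y_{ij} \mid \text{trt}_i = 0] = \beta_{0i} + \beta_2\, \text{week}_j + \beta_3\, \text{week}_j^2 + e_{ij}$$

$$[Y_{ij} \mid \text{trt}_i = 1] = (\beta_{0i} + \beta_1) + (\beta_2 + \beta_4)\, \text{week}_j + (\beta_3 + \beta_5)\, \text{week}_j^2 + e_{ij}$$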

Results

. xi: xtreg y i.trt*week i.trt*week2 , i(id)

------------------------------------------------------------------------------
       y |      Coef.   Std. Err.       z     P>|z|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
  Itrt_1 |  -.0729227   .1007545     -0.724   0.469      -.2703978    .1245525
    week |   .8142721   .0546507     14.900   0.000       .7071588    .9213855
   week2 |  -.2294649   .0066327    -34.596   0.000      -.2424647   -.2164651
ItXwee_1 |  -3.852899   .0772877    -49.851   0.000       -4.00438   -3.701418
ItXweea1 |   .4842559     .00938     51.626   0.000       .4658714    .5026404
   _cons |   24.36439   .2169079    112.326   0.000       23.93926    24.78952
---------+--------------------------------------------------------------------

Results

$$[Y_{ij} \mid \text{trt}_i = 0] = 24.4 + 0.81\, \text{week}_j - 0.23\, \text{week}_j^2$$

$$[Y_{ij} \mid \text{trt}_i = 1] = 24.3 - 3.0\, \text{week}_j + 0.25\, \text{week}_j^2$$

[Figure: HamD Score (axis 16 to 24) against Time in Weeks (0 to 8), for Paroxetine and Placebo]

[Figure: Hamilton D score (axis 15 to 30) against Time in Weeks (0 to 8), for Paroxetine and Placebo]
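The two fitted curves can be tabulated directly from the interaction-model coefficients. The sketch below copies the coefficients from the `xi: xtreg` output; the dictionary keys are illustrative, and reading `trt = 1` as the paroxetine group is an assumption, since the output does not label the coding.

```python
# Coefficients copied from the interaction-model output.
b = {"_cons": 24.36439, "trt": -0.0729227, "week": 0.8142721,
     "week2": -0.2294649, "trt_x_week": -3.852899, "trt_x_week2": 0.4842559}

def fitted(trt, week):
    """Fitted HamD score; trt is 0 or 1 as coded in the output."""
    return (b["_cons"] + b["trt"] * trt
            + (b["week"] + b["trt_x_week"] * trt) * week
            + (b["week2"] + b["trt_x_week2"] * trt) * week**2)

# One row per evaluation week: week, fitted score for trt = 0, fitted score for trt = 1.
for w in (0, 1, 2, 4, 6, 8):
    print(w, round(fitted(0, w), 1), round(fitted(1, w), 1))
```

The two groups start at nearly the same level and end at nearly the same level at week 8, but the `trt = 1` curve drops much earlier. That is a difference in HOW SOON the scores improve, which a comparison of start-to-end change alone would miss.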

References

• Diggle, Liang, and Zeger (1994) Analysis of Longitudinal Data.

• B.S. Everitt (1995) The analysis of repeated measures: a practical review with examples. The Statistician, 44, pp. 113-135.

• Crowder and Hand (1990) Analysis of Repeated Measures.

• D. Elkstrom (1990) Statistical analysis of repeated measures in psychiatric research. Archives of General Psychiatry, 47, pp. 770-772.

[ANOVA (for non-repeated measures) is covered in most basic stats books.]
