Interrupted Time Series Designs - Northwestern University


  • Interrupted Time Series Designs

  • Overview

    • Role of ITS in the history of WSC

    – Two classes of ITS for WSCs

    • Two examples of WSC comparing ITS to RE

    • Issues in ITS vs RE WSCs

    – Methodological and logistical

    – Analytical

  • ITS

    • A series of observations on a dependent variable over time

    – N = 100 observations is the desirable standard

    – N < 100 observations is still helpful, even with very few observations, and is by far the most common case

    • Interrupted by the introduction of an intervention.

    • The time series should show an “effect” at the time of the interruption.
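One common way to look for an "effect" at the interruption is segmented regression, which estimates a change in level and a change in slope at the intervention point. The following is a minimal sketch under stated assumptions, not the method used in these slides: the model form, the simulated data, the intervention time `t0` and the function names are all hypothetical, and the fit ignores autocorrelation.

```python
import random

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def segmented_fit(y, t0):
    """OLS fit of y_t = b0 + b1*t + b2*post_t + b3*(t - t0)*post_t.

    b2 is the level change at the interruption, b3 the slope change.
    """
    X = [[1.0, float(t), float(t >= t0), float(max(t - t0, 0))]
         for t in range(len(y))]
    k = 4
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)]
           for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)

# Simulated series (hypothetical): 100 observations, intervention at t = 50,
# true level change of 5, no slope change, noise SD of 1.
rng = random.Random(0)
series = [10 + 0.1 * t + (5 if t >= 50 else 0) + rng.gauss(0, 1)
          for t in range(100)]

b0, b1, level_change, slope_change = segmented_fit(series, t0=50)
print(f"estimated level change: {level_change:.2f}")
print(f"estimated slope change: {slope_change:.3f}")
```

With a short series the same model can still be fit, but the level-change estimate becomes much less precise.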

  • Two Classes of ITS for WSC

    • Large scale ITS on aggregates

    • Single-case (SCD) and N-of-1 designs in social science and medicine

    • These two classes turn out to have very different advantages and disadvantages in the context of WSCs.

    • Consider examples of the two classes:

  • Large Scale ITS on Aggregates: The effects of an alcohol warning label on prenatal drinking









    [Figure: time series plot of prenatal drinking (y-axis) by Month of First Prenatal Visit (x-axis); x-axis ticks at Sep, Jan and May, with year labels 87 through 91.]






  • Large Scale ITS on Aggregates

    • Advantages:

    – Very high policy interest.

    – Sometimes very long time series, which makes analysis easier.

    • Disadvantages:

    – Typically very simple, with only a few design elements (perhaps a control group, little chance to introduce and remove treatment, rarely even implemented with multiple baseline designs).

    – Usual problems with uncontrolled and unknown attrition and treatment implementation.

    – We have yet to find a really strong example in education.

    • Formidable logistical problems in designing WSCs that are well enough controlled to meet the criteria we outlined on Day 1 for good WSCs.

    – We are not aware of any WSCs comparing RE to this kind of ITS.

  • Single-case (SCD) and N-of-1 designs in social science and medicine

    • Each time series is done on a single person, though a study usually includes multiple SCDs

    • Advantages:

    – Very well controlled, with many opportunities to introduce design elements (treatment withdrawal, multiple baseline and more), low attrition, excellent treatment implementation.

    – Plentiful in certain parts of education and psychology.

    • Disadvantages:

    – Of less general policy interest except in some circles (e.g., special education), but

      • IES now allows them for both treatment development studies and for impact studies under some conditions.

      • Increasing interest in medicine (e.g., CENT reporting standards).

    – Typically short time series, which makes analysis more difficult.

      • Much work currently being done on this.

    • Should be applicable to short time series in schools or classes.

    • Has proven somewhat more amenable to WSC.

  • Two Examples of WSC of RE vs ITS

    • Roifman et al (1987)

    – WSC Method: A longitudinal randomized crossover design

    – Medical example

    – One study that can be analyzed simultaneously as

      • Randomized experiment

      • 6 single-case designs

    • Pivotal Response Training

    – WSC Method: Meta-analytic comparison of RE vs ITS

    – Educational example on treatment of autism

    – Multiple studies with multiple outcomes

    • No claim that these two examples are optimal

    – But they do illustrate some possibilities, and the design, analytical and logistical issues that arise.

  • Roifman et al (1987)

    • High-dose versus low-dose intravenous immunoglobulin in hypogammaglobulinaemia and chronic lung disease

    • 12 patients in a longitudinal randomized cross-over design. After one baseline (no IgG) observation:

    – Group A: 6 receive high dose for 6 sessions, then low dose for 6 sessions.

    – Group B: 6 receive low dose for 6 sessions, then high dose for 6 sessions.

    • Outcome is serum IgG levels

    • Here is a graph of results

  • Even though this example uses individual people for each time series, one can imagine this kind of study being implemented using schools or classrooms. How many time points are needed is an interesting question.

  • Analysis Strategy

    • To compare RE to SCD results, we analyze the data two ways:

    – As an N-of-1 Trial: Analyze Group B only as if it were six single-case designs.

    – As a RE: Analyze Time 6 data as a randomized experiment comparing Group A and Group B.

    • Analyst blinding

    – I analyzed the RE

    – David Rindskopf analyzed SCD

  • Analytic Methods

    • The RCT is easy

    – Usual regression (or ANOVA) to get group mean difference and SE.

      • We did run ANCOVA covarying pretest, but results were essentially the same.

    – Or a usual d-statistic (or bias-corrected g)

    • The SCD analysis needs to produce a result that is in a comparable metric

    – Used a multilevel model in WinBUGS to adjust for nonlinearity and get a group mean difference at time 6 (or 12, but with potential carryover effects)

    – d-statistic (or g) for SCD that is in the same metric as the usual between-groups d (Hedges, Pustejovsky & Shadish, in press).

      • But current incarnation assumes linearity

  • Analysis: RCT

    • If we analyze as a randomized experiment with the endpoint at the last observation before the crossover (time 6):

    – Group A (M = 794.93, SD = 90.48)

    – Group B (M = 283.89, SD = 71.10)

    – MD = 511.05 (SE = 46.98) (t = 10.88, df = 10, p < .001)

    • d = 6.28, g = 5.80, V(g) = 1.98 (se = 1.41)
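The summary statistics above are enough to recompute the mean difference, its standard error, the t statistic and the standardized effect sizes. This is a minimal sketch, assuming 6 patients per group (as in the design), a pooled-SD d, and the common small-sample correction J = 1 - 3/(4·df - 1) for g. The slides do not state which formulas were used, and the function name is hypothetical.

```python
import math

def two_group_summary(m1, sd1, n1, m2, sd2, n2):
    """Mean difference, SE, t, pooled-SD d and bias-corrected g
    from group means, SDs and sample sizes."""
    md = m1 - m2
    # With equal group sizes this matches the pooled-variance SE.
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    df = n1 + n2 - 2
    sd_pooled = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df)
    d = md / sd_pooled
    g = d * (1 - 3 / (4 * df - 1))  # small-sample correction (assumed form)
    return {"md": md, "se": se, "t": md / se, "df": df, "d": d, "g": g}

# Time 6 summary statistics from the slide: Group A vs Group B.
result = two_group_summary(794.93, 90.48, 6, 283.89, 71.10, 6)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

Because the two groups are the same size, the unpooled and pooled standard errors coincide, so the choice between them does not matter here.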

  • Analysis 2: SCD

    • If we analyze only Group B (6 cases) using a d-estimator¹:

    – g = 4.59, V(g) = 1.43 (se = 1.196)

    – Close to RE estimate g = 5.80, V(g) = 1.98 (se = 1.41)

    • We also have a WinBUGS analysis² taking trend into account:

    – MD = 495, SE = 54, “t” = 495/54 = 9.2

    – Very close to the best estimate from the randomized experiment of MD = 511.05, SE = 46.98

    ¹ Hedges, Pustejovsky and Shadish, in press, Research Synthesis Methods

    ² Script and data input available on request

  • Comparing Results RE vs SCD

    • Means and d in same direction

    • Means and d of similar magnitude

    • It is not clear that the standard errors from previous slides are really comparable, but treating them as if they were:

    – Test overlap using 84% confidence intervals, which simulates a z-test¹

      • For g, they are 3.82 < 5.80 < 7.76 for RE

      • 2.91 < 4.59 < 6.27 for SCD

      • For the group mean difference, 419.13 < 495 < 570.87 for SCD

      • 445.04 < 511 < 577.06 for RE

    – That is, no significant difference between the SCD and RE.

    • Another option would be to bootstrap the standard errors.

    ¹ Julious, 2004, Pharmaceutical Statistics
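The overlap check can be sketched directly from the estimates and standard errors reported on the earlier analysis slides (g = 5.80, se = 1.41 and MD = 511.05, SE = 46.98 for the RE; g = 4.59, se = 1.196 and MD = 495, SE = 54 for the SCD). The sketch assumes normal-theory intervals, estimate ± z·SE with z taken at the 84% level, and, as the slide cautions, treats the two sets of standard errors as comparable. Function names are hypothetical.

```python
from statistics import NormalDist

def interval(estimate, se, level=0.84):
    """Two-sided normal-theory confidence interval."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return estimate - z * se, estimate + z * se

def overlaps(a, b):
    """True if intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

g_re = interval(5.80, 1.41)
g_scd = interval(4.59, 1.196)
md_re = interval(511.05, 46.98)
md_scd = interval(495, 54)

print("g intervals overlap:", overlaps(g_re, g_scd))
print("MD intervals overlap:", overlaps(md_re, md_scd))
```

Overlapping 84% intervals correspond to a non-significant difference at roughly the .05 level, which is the sense in which the overlap test stands in for a z-test.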

  • Comments on This WSC Method

    • Using randomized crossover designs with longitudinal observations is a promising method.

    • Statistical problems:

    – How to compare results from RE and SCD when they clearly are not independent.

    – Did not deal with autocorrelation

      • Should be possible to do in several ways

      • But correcting would likely make SEs larger, and so make RE-ITS differences less significant

    – Need to explore further the effects of trend and nonlinearities
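To illustrate why correcting for autocorrelation would tend to enlarge the standard errors, the sketch below estimates the lag-1 autocorrelation of a series and applies the large-sample AR(1) inflation factor for the standard error of a mean, sqrt((1 + ρ)/(1 - ρ)). This is only one of the several possible approaches and was not used in the analyses above. The example series and function names are hypothetical.

```python
import math

def lag1_autocorrelation(y):
    """Sample lag-1 autocorrelation of a series."""
    n = len(y)
    mean = sum(y) / n
    dev = [v - mean for v in y]
    denom = sum(d * d for d in dev)
    num = sum(dev[t] * dev[t + 1] for t in range(n - 1))
    return num / denom

def ar1_se_inflation(rho):
    """Large-sample factor by which the SE of a mean grows under AR(1)
    errors with autocorrelation rho, relative to independent errors."""
    return math.sqrt((1 + rho) / (1 - rho))

# Hypothetical short series with positive serial dependence.
example = [1, 2, 3, 4, 5]
rho = lag1_autocorrelation(example)
print(f"lag-1 autocorrelation: {rho:.2f}")
print(f"SE inflation factor:   {ar1_se_inflation(rho):.2f}")
```

For positive ρ the factor exceeds 1, so the corrected standard errors are larger and the difference between the two designs looks even less significant. With very short series, the estimate of ρ is itself imprecise.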

  • Example: PRT

    • Pivotal Response Training (PRT) for Childhood Autism

    • This WSC method does a meta-analytic comparison of results from SCDs to results from an RE.

    • Meta-analytic WSCs have a long history but also have significant flaws in that many unknown variables may be confounded with the designs.

    – But those flaws may often be no more than in the usual 3-arm nonrandomized WSC

    – Big difference is the latter usually has raw data but meta-analysis does not. In the case of SCDs, however, we do have the raw data (digitized).

  • The PRT Data Set

    • Pivotal Response Training (PRT) for Childhood Autism

    • 18 studies containing 91 cases.

    • We used only the 14 studies with at least 3 cases (66 cases total).

    • If there were only one outcome measure per study, this would result in 14 effect sizes.

    • But each study measures multiple outcomes on cases, so the total number of effect sizes is 54

  • Histogram of Effect Sizes

    Statistics for G

    – N Valid: 54

    – N Missing: 0

    – Mean: 1.311520

    – Median: 1.041726

    – Std. Deviation