An introduction to the stepped wedge cluster randomised trial

05/03/2023

The stepped wedge trial (SW-CRT): Recommendations for research methods and reporting

1: University of Birmingham, UK

2: University of Warwick, UK

3: University of Ottawa, Canada

Karla Hemming1

Alan Girling1, James Martin1, Celia Brown2, Richard Lilford2, Peter Chilton1, Monica Taljaard3

What is a SW-CRT

• Modification of cross-over design:

- All clusters start in control

- Clusters (or groups of clusters) cross to intervention at randomly assigned times until all have received intervention

- Outcome typically observed at each time point

[Figure: stepped wedge schematic. Rows: steps (cluster or group of clusters) 1 to 5; columns: time points 1 to 6; cells marked as exposed or unexposed to intervention.]
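The roll-out described on this slide can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `stepped_wedge_schedule`, the fixed seed and the assumption of equal-sized groups of clusters are not from the slides.

```python
import random

def stepped_wedge_schedule(n_clusters, n_steps, seed=0):
    """Return {cluster: [0/1 exposure flag per period]}.

    Periods run 0..n_steps; period 0 is the all-control baseline.
    Assumes (for this sketch) equal-sized groups of clusters per step.
    """
    if n_clusters % n_steps != 0:
        raise ValueError("this sketch assumes equal-sized groups of clusters")
    rng = random.Random(seed)
    clusters = list(range(1, n_clusters + 1))
    rng.shuffle(clusters)  # the random order decides who crosses over first
    per_step = n_clusters // n_steps
    schedule = {}
    for position, cluster in enumerate(clusters):
        step = position // per_step + 1  # period at which this cluster crosses
        schedule[cluster] = [1 if period >= step else 0
                             for period in range(n_steps + 1)]
    return schedule

# Five clusters, five steps, six time points, as in the schematic
schedule = stepped_wedge_schedule(n_clusters=5, n_steps=5)
for cluster in sorted(schedule):
    print(cluster, schedule[cluster])
```

Every row starts at 0 (control), ends at 1 (intervention) and never switches back, which is what distinguishes the design from a conventional cross-over.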

Cross-sectional designs only

• Assume all participants at each step or time point are different

• Other types of designs include:

– Cohort design – where individuals have repeated measures

– Open cohort – where individuals have repeated measures and new individuals can join the study over its duration

Example: the EPOCH trial

• Intervention:

– Service delivery intervention to improve care of patients undergoing emergency laparotomy

• Setting:

– Includes 90 hospitals

– Rolled out to 15 geographically close hospitals at a time

• Outcome:

– 90 day mortality

• Sample size:

– TSS is 27,500

– 90% power to detect a change from 25% to 22%

Example: The EPOCH study

Systematic review of SW-CRTs: rapid update

[Figure: cumulative number of SW-CRTs published (0 to 25) by year (1985 to 2015). Two series: cumulative completed; cumulative protocols (excludes subsequently completed and published studies).]

Quality of reporting of cluster effects

                                           Protocols published    Results papers published
                                           in journals (N=14)     in journals (N=18)
SS calculation reported                    14 (100%)              12 (67%)
ICC stated in methods                      12 (86%)               5 (28%)
Uncertainty of ICC considered in methods   3 (21%)                1 (6%)
Results fully accounted for clustering     N/A                    6 (33%)
ICC given in results                       N/A                    1 (6%)

Some unresolved (or debated) issues:

• Design:

– How to determine sample size (number of clusters, cluster size)?

– Which is the most efficient design?

• Analysis: how to analyse these studies:

– Temporal confounding

Design

Conventional representation of designs


This representation leads us to believe…

• The SW-CRT is of longer duration than the PCRT and CRT-BA

• The cluster sizes in the SW-CRT are larger than those in the PCRT

• The SW-CRT allows staggered roll-out and the PCRT does not

This representation leads us to believe…

• The SW-CRT is of longer duration than the PCRT and CRT-BA

– We claim: false

• The cluster sizes in the SW-CRT are larger than those in the PCRT

– We claim: false

• The SW-CRT allows staggered roll-out and the PCRT does not

– We claim: false

Alternative representation (unified framework)

Motivating example…

• PCRT in primary care

• GP practices are clusters

• Patients presenting with a new diagnosis of diabetes are the individuals

• These patients won't all present at a fixed point in time

• Rather they will become eligible for the study over a prolonged period of time

[Figure: axis label Time]

Using this representation:

• The SW-CRT is of the same duration as the PCRT and CRT-BA

• The cluster sizes in the SW-CRT are the same as those in the PCRT

• The SW-CRT allows staggered roll-out and so does the PCRT

The staggered PCRT

• One often cited reason for the SW-CRT is the phased implementation

• This is possible under parallel design

• Balanced on time, so no time effects

[Figure: staggered cluster study. Rows: clusters 1 to 6; columns: time (step) 0 to 5; clusters allocated to intervention or control; cells marked as unexposed to intervention, in transition period, or exposed to intervention.]

Efficiency

How to determine which design to use?

Efficiency – depends on ICC

                              ICC=0.01                     ICC=0.1
                              PCRT   PCRT-BA   SW-CRT      PCRT   PCRT-BA   SW-CRT
Number of clusters            20     20        20          20     20        20
Cluster size                  50     50        50          50     50        50
Total sample size             1000   1000      1000        1000   1000      1000
Number of steps               0      1         4           0      1         4
Number of clusters per step   -      10        5           -      10        5
Power                         0.97   0.87      0.88        0.50   0.77      0.82

Study to detect a moderate effect size of 0.3 (SD 1) at 5% significance
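The PCRT power figures can be approximated with the usual design effect 1 + (cluster size - 1) × ICC. The sketch below is not the authors' calculation: the function name `parallel_crt_power` and the normal approximation are assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist

def parallel_crt_power(effect, sd, n_clusters, cluster_size, icc, alpha=0.05):
    """Approximate power of a two-arm parallel CRT with a continuous outcome."""
    design_effect = 1 + (cluster_size - 1) * icc      # variance inflation
    n_per_arm = n_clusters * cluster_size / 2
    se = sd * sqrt(2 * design_effect / n_per_arm)     # SE of the mean difference
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(abs(effect) / se - z_crit)

for icc in (0.01, 0.1):
    power = parallel_crt_power(0.3, 1.0, n_clusters=20, cluster_size=50, icc=icc)
    print(icc, round(power, 2))
```

With 20 clusters of 50 this gives about 0.97 at ICC=0.01 and about 0.50 at ICC=0.1, in line with the PCRT column: the parallel design loses power quickly as the ICC rises.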

Sometimes the CRT is infeasible

• Example:

– NI is 788: effect size 0.2, 80% power and 5% significance

– ICC is 0.10

– 30 clusters

• Can't run this trial using a parallel design:

– minimum number of clusters is (p*NI)

– i.e. 788*0.1=79

• Under SW-CRT with 4 steps:

– Need 75 observations in each of 30 clusters

– TSS of 2250
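The floor of p*NI clusters follows from the parallel-design formula: clusters needed = NI × (1 + (m - 1) × p) / m, which tends to NI × p as the cluster size m grows. A minimal sketch, with the hypothetical helper names `clusters_needed` and `minimum_clusters`:

```python
from math import ceil

def clusters_needed(n_individual, icc, cluster_size):
    """Clusters needed by a parallel CRT: NI * design effect / cluster size."""
    design_effect = 1 + (cluster_size - 1) * icc
    return ceil(n_individual * design_effect / cluster_size)

def minimum_clusters(n_individual, icc):
    """Limit of clusters_needed as the cluster size grows without bound."""
    return ceil(n_individual * icc)

# Increasing the cluster size never gets the requirement down to 30 clusters
for m in (10, 100, 1000, 100000):
    print(m, clusters_needed(788, 0.10, m))
print("floor:", minimum_clusters(788, 0.10))
```

However large the clusters become, the requirement stays at 79 or above, so a parallel trial with only 30 clusters cannot reach the target power.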

Efficiency comparisons

Minimise total sample size or clusters?

• Design constraints

– NI is 6,426; 10% to 8%; ICC 0.05

– Cluster trial

– Over each year possible to recruit M=800 per cluster

• SW-CRT:

– 4 steps, 26 clusters, 1 year, TSS=20,800

• CRT:

– M=800, 330 clusters, 1 year, TSS=264,000!!!!

– M=200, 354 clusters, 3 months, TSS=70,000

– M=20, 630 clusters, Random Sample, TSS=12,500

Sample size and power

How do we work it out?

Simple notation

Notation

Sample size for RCT       NI
ICC                       p
Number of clusters        k
Number of steps           t
Cluster size per step     m
Total cluster size        M

Sample size calculations

• Accommodate:

– Clustering

– Time effects

• Seminal paper by Hussey and Hughes

– Power for fixed design

• Algebraically complicated, BUT:

– Stata Function
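For readers without Stata, the closed-form variance from Hussey and Hughes (2007) for a cross-sectional design with fixed period effects and a random cluster effect can be sketched in Python. The function name `hussey_hughes_power`, the argument names and the normal approximation are assumptions of this sketch; it is not the Stata function referred to on the slide.

```python
from math import sqrt
from statistics import NormalDist

def hussey_hughes_power(design, n_per_period, effect, sd, icc, alpha=0.05):
    """Power for a cross-sectional stepped wedge design.

    design: one row per cluster, each a list of 0/1 exposure flags per period.
    """
    n_clusters = len(design)
    n_periods = len(design[0])
    tau2 = icc * sd ** 2                          # between-cluster variance
    sigma2 = (1 - icc) * sd ** 2 / n_per_period   # variance of a cluster-period mean
    u = sum(sum(row) for row in design)             # total exposed cluster-periods
    w = sum(sum(col) ** 2 for col in zip(*design))  # squared column (period) sums
    v = sum(sum(row) ** 2 for row in design)        # squared row (cluster) sums
    numerator = n_clusters * sigma2 * (sigma2 + n_periods * tau2)
    denominator = ((n_clusters * u - w) * sigma2
                   + (u ** 2 + n_clusters * n_periods * u
                      - n_periods * w - n_clusters * v) * tau2)
    se = sqrt(numerator / denominator)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(abs(effect) / se - z_crit)

# 20 clusters, 4 steps, 5 clusters per step, 5 periods of 10 observations each
design = [[1 if period >= step else 0 for period in range(5)]
          for step in range(1, 5) for _ in range(5)]
for icc in (0.01, 0.1):
    print(icc, round(hussey_hughes_power(design, 10, 0.3, 1.0, icc), 2))
```

With an effect size of 0.3 (SD 1) this gives about 0.88 at ICC=0.01 and about 0.82 at ICC=0.1, matching the SW-CRT column of the efficiency table.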

[Figure: conventional stepped wedge study. Rows: clusters 1 to 6; columns: time (step) 0 to 5; cells marked as intervention or control.]

Stata power function

Extensions allow:

• Two levels

– i.e. wards within hospitals

• Transition periods

– i.e. training periods

• Varying cluster size

– (work in progress)

Hemming K, Girling A. A menu driven facility for sample size for power and detectable difference calculations in stepped wedge randomised trials. STATA Journal. 2014

Determining number of clusters:

• Design effect (Woertman, 2013):

• Sample size needed:

N = TSS_RCT * DE_SW * (t+1)

Hemming K, Girling A. The efficiency of stepped wedge vs. cluster randomized trials: stepped wedge studies do not always require a smaller sample size. J Clin Epidemiol. 2013;66(12):1427-8.

Need this! The (t+1) multiplier in the sample size formula must not be left out.
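As a worked check, the sketch below applies the sample size formula using the design effect as published by Woertman et al., specialised here (an assumption of this sketch) to one baseline measurement and one measurement after each step, written in the slides' notation (ICC p, steps t, cluster size per step m). The function names are hypothetical.

```python
from math import ceil

def sw_design_effect(icc, steps, m):
    """Stepped wedge design effect, one baseline and one measurement per step."""
    clustering = ((1 + icc * (steps * m + m - 1))
                  / (1 + icc * (steps * m / 2 + m - 1)))
    return clustering * 3 * (1 - icc) / (2 * (steps - 1 / steps))

def sw_clusters(n_individual, icc, steps, m):
    """Return (clusters, total sample size) for a cross-sectional SW-CRT."""
    total = n_individual * sw_design_effect(icc, steps, m) * (steps + 1)
    per_cluster = m * (steps + 1)           # total cluster size M
    clusters = ceil(total / per_cluster)
    return clusters, clusters * per_cluster

# NI = 788, ICC = 0.10, 4 steps, 15 observations per cluster per period (M = 75)
print(sw_clusters(788, 0.10, steps=4, m=15))
```

For these inputs the design effect is 0.56, giving 30 clusters and a TSS of 2250, the same figures as the earlier infeasible-CRT example. Dropping the (t+1) factor would understate the total by a factor of five.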

What cluster size do I need?

Setting straight the sample size determination for stepped wedge and cluster randomised trials: design effects and illustrative examples Karla Hemming and Monica Taljaard Submitted to J Clin Epi

Analysis

• Summarise key characteristics by exposed / unexposed status

– Identify selection biases

• Analysis either GEE or mixed models

– Clustering

– Time effects

• Imbalance of calendar time between exposed / unexposed:

– The majority of the control observations will be before the majority of the intervention observations

– Time is a confounder!

• Unadjusted effect meaningless

Hemming K., Haines T.P., Chilton P.J., Girling A.J., Lilford R.J. The stepped wedge cluster randomised trial: rationale, design, analysis and reporting. The BMJ, in press
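Why time confounds the unadjusted comparison can be shown with a noise-free toy calculation. This is a sketch, not the GEE or mixed-model analysis recommended above: it assumes a linear secular trend and no true intervention effect, and contrasts the naive exposed-versus-unexposed difference with a simple within-period comparison. All names are hypothetical.

```python
def cell_means(n_steps, trend, effect):
    """Expected outcome in each (sequence, period) cell of a stepped wedge."""
    cells = []
    for step in range(1, n_steps + 1):        # this sequence crosses at `step`
        for period in range(n_steps + 1):     # period 0 is the baseline
            exposed = period >= step
            cells.append((period, exposed, trend * period + effect * exposed))
    return cells

def mean(values):
    return sum(values) / len(values)

def naive_difference(cells):
    """Exposed minus unexposed, ignoring calendar time."""
    return (mean([y for _, x, y in cells if x])
            - mean([y for _, x, y in cells if not x]))

def within_period_difference(cells):
    """Average of exposed minus unexposed differences taken within each period."""
    diffs = []
    for period in sorted({p for p, _, _ in cells}):
        exposed = [y for p, x, y in cells if p == period and x]
        unexposed = [y for p, x, y in cells if p == period and not x]
        if exposed and unexposed:             # skip periods with only one arm
            diffs.append(mean(exposed) - mean(unexposed))
    return mean(diffs)

cells = cell_means(n_steps=4, trend=0.1, effect=0.0)
print(naive_difference(cells), within_period_difference(cells))
```

The naive difference is about 0.2 even though the true effect is zero, because exposed observations sit later in calendar time; comparing within periods removes the trend and returns a difference of essentially zero.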


Example 1: Maternity sweeping

• Objective: evaluate a training scheme to improve the rate of membrane sweeping in post-term pregnancies

– Primary outcome:

• Proportion of women having a membrane sweep

– Cluster design:

• 10 teams (clusters)

• Pragmatic design – rolled out when possible

• Transition period to allow training


Example 1: Maternity sweeping (transition period)

Example 1: Underlying trend

[Figure: proportion of women swept (95% CI), 0 to 0.8, by week commencing, 05/03/12 to 13/08/12. P-value for trend <0.05]

Example 1: results

                             Unexposed to    Exposed to
                             intervention    intervention    Relative Risk        P-value
                             n=1417          n=1356

Number of women offered and accepting membrane sweeping
  Number (%)                 629 (44.4%)     634 (46.8%)
  Cluster adjusted                                           1.06 (0.97, 1.16)    0.21
  Time and cluster adjusted
    Fixed effects time                                       0.88 (0.69, 1.12)    0.30
    Linear time effect                                       0.90 (0.73, 1.11)    0.34

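Going up! Without adjustment for time, the comparison favours the intervention (46.8% vs 44.4%; cluster-adjusted relative risk 1.06).

Going down! Once time is adjusted for, the relative risk drops below 1 (0.88 with fixed time effects, 0.90 with a linear time effect).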

Explanations

• Rising tide

– General move towards improving care, perhaps due to the very initiative that prompted study investigators to do this study

• Contamination

– Unexposed clusters became exposed before their randomisation date

• Lack of precision

– Intervention wasn't ruled out as being effective

Recommendations

• The SW-CRT is a pragmatic study design which reconciles the need for robust evaluations with political or logistical constraints.

– But, can have a staggered parallel CRT

• The SW-CRT design is recommended when:

– The ICC is higher (process outcomes)

– There is a limited number of clusters

– Outcome data are routinely collected

• Design and analysis

– Appropriate consideration of time effects in power and analysis

Next steps …

• Published a set of recommendations for reporting in BMJ, out soon

• Updating the systematic review of quality of reporting

– Look at quality of reporting of SS calculations

– Looking at ethical issues around recruitment and concealment of allocation

• Consort Extension for SW-CRTs

• Alan and James – numerical work on varying cluster size

Acknowledgements

We acknowledge financial support from:

• The National Institute for Health Research (NIHR) Collaborations for Leadership in Applied Health Research and Care for West Midlands (CLAHRC WM).

• The Medical Research Council Midland Hub for Trials Methodology Research [grant number G0800808].

References

• Hemming K, Girling A. The efficiency of stepped wedge vs. cluster randomized trials: stepped wedge studies do not always require a smaller sample size. J Clin Epidemiol. 2013;66(12):1427-8.

• Hussey MA, Hughes JP. Design and analysis of stepped wedge cluster randomized trials. Contemp Clin Trials. 2007;28(2):182-91.

• Hemming K, Girling A. A menu driven facility for sample size for power and detectable difference calculations in stepped wedge randomised trials. STATA Journal. 2014;[In Press]
