Dr Caroline Sabin, Royal Free Hospital UK-CAB - 13 August 2004- Session 5: The design of RCTs of...

Dr Caroline Sabin, Royal Free Hospital
UK-CAB - 13 August 2004 - www.i-Base.info

Session 5: The design of RCTs of treatments for HIV infection

Caroline Sabin

Reader in Medical Statistics and Epidemiology
Department of Primary Care and Population Sciences, RF&UCMS

A beginner's guide to some of the methodological and statistical issues in HIV research


What is a clinical trial?

Any form of planned experiment which involves patients and is designed to find the most appropriate treatment for a particular medical condition


Types of trials (clinical)

Phase I studies
- Focus on safety rather than efficacy
- Dose-escalation studies, studies of drug metabolism and bioavailability
- Usually based on small numbers of subjects, often healthy volunteers

Phase II studies
- Initial investigation for clinical effect
- Small-scale studies into effectiveness and safety of drug

Phase III studies
- Full-scale treatment evaluation
- Comparison to standard therapy (if one exists) or placebo

Phase IV trials
- Post-marketing surveillance
- Monitoring for adverse effects
- Long-term studies of morbidity and mortality
- Promotion exercises


Topics already covered in first session

Control groups

Randomisation

Blinding

Parallel vs. cross-over trials

The limitations of RCTs


Topics to be covered today

Why do we need a control group?

Why do we need randomisation?

The protocol

Defining endpoints (primary and secondary endpoints, clinical vs surrogate endpoints)

How to deal with ‘protocol violations’ (patients who drop out of the study and missing data)

Approaches to analysis (ITT, as treated)

Subgroup and interim analyses


Why do we need a control group

Early medical developments were usually so great that controls weren’t always needed (eg. trials of anaesthetics, first trials of antibiotics etc.)

However, most developments these days are more modest and some form of control group is now essential



Silverman, 1985 – Epidemic of retrolental fibroplasia in babies

Uncontrolled trials suggested that treatment with adrenocorticotrophic hormone had a 75% success rate

After controlled trials were finally carried out, it was found that 75% of infants return to normal without treatment

Identification of true cause of epidemic (oxygen to premature babies) was delayed



Uncontrolled trials may give a distorted view of a new therapy

Patients may improve over time, even without treatment – thus, any improvement cannot necessarily be attributed to treatment

Patients selected for treatment may be less seriously ill than those not selected for treatment, which may overestimate the benefits of new therapy

Patients in clinical trials generally do better than patients on same treatment who are not in trials


Which control group? - the use of historical or non-randomised controls

Controls less likely to have clearly defined criteria for inclusion/exclusion

May have been a change in the type of patient eligible for treatment, or prognosis may have changed over time

Investigator may have been more restrictive in choice of patients for the trial than when treating patients in the past

Characteristics of patients


Which control group? - the use of historical or non-randomised controls

Quality of recorded data may not be as good

Definitions of response may differ between groups (eg. viral load endpoints)

Ancillary care may improve in a trial (eg. adherence support, support for toxicities etc.)

Experimental environment

Thus, treatment and control groups may differ with respect to many features other than treatment, and so we cannot attribute any difference in outcome to the new treatment


What is randomisation?

Allocation of patients to treatments is determined by chance

Randomised trials provide most efficient trial design (ie. they are the most powerful) as they ensure that any factors that may affect outcome will be distributed equally between the treatment groups

This allows any difference in treatment response to be attributed to the treatment

Removes impact of known confounding factors as well as unknown ones


Why do trials need to be randomised

Non-randomised trials have the potential to be seriously biased

If there are systematic differences between the patients in the treatment groups at the outset of the trial, then any differences in treatment response cannot necessarily be attributed to the new treatment

Eg. treatment comparisons in cohort studies
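
A small simulation can make the point about systematic baseline differences concrete. The sketch below is illustrative only: the severity score, the 70%/30% selection probabilities and the function names are all assumptions invented for the example, not taken from any trial. It compares the average baseline severity of the two arms when allocation is by chance and when a clinician tends to give the new treatment to less ill patients.

```python
import random


def mean(values):
    return sum(values) / len(values)


def severity_gap(allocate, n_patients=2000, seed=1):
    """Mean baseline severity in the new-treatment arm minus the control arm."""
    rng = random.Random(seed)
    new_arm, control_arm = [], []
    for _ in range(n_patients):
        # Hypothetical prognostic score: 0 = well, 1 = very ill.
        severity = rng.random()
        if allocate(severity, rng):
            new_arm.append(severity)
        else:
            control_arm.append(severity)
    return mean(new_arm) - mean(control_arm)


def randomised(severity, rng):
    # Allocation by chance alone: severity plays no part.
    return rng.random() < 0.5


def clinician_choice(severity, rng):
    # Assumed selection bias: less ill patients are more likely
    # to be given the new treatment.
    return rng.random() < (0.7 if severity < 0.5 else 0.3)


random_gap = severity_gap(randomised)
biased_gap = severity_gap(clinician_choice)
print(f"randomised allocation: gap in mean severity = {random_gap:+.3f}")
print(f"clinician's choice:    gap in mean severity = {biased_gap:+.3f}")
```

Under these assumptions the randomised arms start out with nearly identical average severity, while the clinician-selected new-treatment arm is clearly less ill before any treatment is given, so a better outcome in that arm could not be attributed to the treatment.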


When can a randomised trial be done

New treatment better than standard

New treatment worse than standard

‘Equipoise’

Who should have equipoise?
- The doctors recruiting patients
- The patients entering the trial

(is this true in reality?)


Other benefits of randomisation

Helps with blinding of trial (see later)

Prevents any conscious or subconscious selection bias, whereby doctor tends to put more (or less) severely ill patients in a particular treatment group

Beware of any approach to randomisation whereby clinicians may be able to establish treatment allocation prior to entry to the trial (eg. systematic allocation by date of birth, alternate allocation)


Selection of patients for a trial

Discuss trial with patient and assess eligibility

Obtain informed consent

Formally enter patient into trial

Randomise


Other benefits of randomisation (cont.)

Example: Trial of anticoagulant therapy (Wright 1948)

Patients admitted on odd days – anticoagulants
Patients admitted on even days – placebo

Anticoagulant therapy – n=589
Placebo – n=442
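
The imbalance in these group sizes can be checked with a rough calculation. The sketch below assumes that, had admission day been unrelated to allocation, each patient would have had about a 50% chance of arriving on an odd day (a simplification, since odd dates are slightly more common), and uses a normal approximation to the binomial distribution.

```python
import math

n_anticoagulant = 589   # admitted on odd days
n_placebo = 442         # admitted on even days
n_total = n_anticoagulant + n_placebo

# Under the assumed 50:50 chance, the expected count and its standard deviation.
expected = n_total / 2
sd = math.sqrt(n_total * 0.5 * 0.5)

# How many standard deviations the observed split lies from an even split.
z = (n_anticoagulant - expected) / sd
p_two_sided = math.erfc(abs(z) / math.sqrt(2))

print(f"z = {z:.2f}, two-sided p = {p_two_sided:.1e}")
```

A split this uneven is very unlikely to arise by chance, which is consistent with the warning that a predictable allocation scheme can influence who enters each arm.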


The protocol

The ‘workshop manual’ for the trial. Will contain many or all of the following:

- Background, aims and objectives
- Trial design
- Patient selection – inclusion/exclusion criteria
- Treatment schedules
- Monitoring
- Registration, randomisation and blinding
- Methods of patient evaluation
- Patient consent
- Size of study
- Plans for dealing with protocol deviations
- Plans for statistical analysis
- Ethical approval and administrative matters


Selection of patients for a trial

A trial should have explicit inclusion criteria and exclusion criteria – precise definitions of who can be included in the study

Patients should be broadly representative of some future group of patients to whom the trial may be applied

BUT – patients in trials are not necessarily a random selection of all HIV+ve individuals (unlikely to be the case)


Evaluation of response – the primary endpoint

In any trial we need to define (preferably) a single primary endpoint that captures the key effects of treatment on the patient

Primary endpoint is usually related to efficacy

If results from different endpoints are inconsistent, the primary endpoint will be the one on which any decisions about the value of the drug will be mainly based


Evaluation of response – secondary endpoints

In addition to the primary endpoint, we may also define any number of secondary endpoints

These are often related to toxicity or quality of life, or may be other measures of efficacy not captured by the primary endpoint


Definitions of endpoints – example

Abacavir substitution for nucleoside analogs in patients with HIV lipoatrophy. Carr A et al. JAMA (2002); 288: 207-215.

Primary endpoint:
Mean change in limb fat mass measured by DXA at week 24

Secondary endpoints:
- Adverse events
- Anthropometry
- Total and central fat mass
- Biochemical, lipid, and glycemic measurements
- Viral load
- CD4 count
- Quality of life


Defining an endpoint

In most trials patients are monitored very regularly (eg. every 4 weeks after randomisation)

Tempting to compare treatments at each time point - however, this is not advisable because of problems with multiple testing and the fact that the tests are not independent

Thus, must select a single time point for assessment of the primary endpoint (eg. 24 weeks or 48 weeks)

Treatments should be formally compared at that time point only
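
The multiple-testing problem can be given a rough size. The sketch below assumes visits every 4 weeks up to week 48 and a 5% significance level at each visit, and, purely for simplicity, treats the tests as independent. As noted above they are not, so the true figure will differ (with positively correlated tests it is typically lower), but the direction of the problem is the same.

```python
alpha = 0.05                              # significance level used at each visit
time_points = list(range(4, 49, 4))       # weeks 4, 8, ..., 48 (assumed schedule)
k = len(time_points)

# Chance of at least one 'significant' result when there is no true difference,
# under the simplifying assumption that the k tests are independent.
chance_of_false_positive = 1 - (1 - alpha) ** k

print(f"{k} time points: chance of at least one false positive = "
      f"{chance_of_false_positive:.0%}")
```

Comparing the arms at a single pre-specified time point keeps the chance of a false positive at the nominal 5%.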


Clinical vs. surrogate endpoints

We are usually most interested in the effect of a new treatment on a clinical outcome (eg. new AIDS events or death)

However, currently, trials of HAART that use clinical endpoints generally have to be extremely large and follow patients for very long periods of time in order to have sufficient power to detect a difference between treatment regimens

Thus, we often consider the effect of the treatment regimen on a surrogate endpoint (eg. change in CD4, HIV RNA etc.)
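
To see why clinical endpoints demand such large trials, the sketch below applies the standard normal-approximation formula for the number of patients per arm needed to compare two proportions. The event rates are hypothetical figures chosen for illustration (a rare clinical event at 5% vs. 3%, and a common surrogate response at 70% vs. 85%); 80% power and a two-sided 5% significance level are also assumptions.

```python
import math
from statistics import NormalDist


def n_per_arm(p1, p2, alpha=0.05, power=0.80):
    """Approximate patients per arm to detect p1 vs. p2 (two-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)


# Hypothetical rates, for illustration only.
n_clinical = n_per_arm(0.05, 0.03)    # eg. new AIDS event or death
n_surrogate = n_per_arm(0.70, 0.85)   # eg. viral load below the limit of detection

print(f"clinical endpoint:  about {n_clinical} patients per arm")
print(f"surrogate endpoint: about {n_surrogate} patients per arm")
```

With these assumed rates the clinical-endpoint trial needs more than ten times as many patients per arm as the surrogate-endpoint trial.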


Surrogate endpoints

“A laboratory measurement or a physical sign used as a substitute for a clinically meaningful endpoint that measures directly how a patient feels, functions or survives.”

Temple RJ. A regulatory authority’s opinion about surrogate endpoints. In: Nimmo WS, Tucker GT, eds. Clinical measurement in drug evaluation. New York, NY: John Wiley & Sons Inc. 1995.


Surrogate endpoints (cont.)

In order for a laboratory marker to be a good surrogate endpoint for a clinical outcome, it has to fulfill two criteria

Surrogate must be on the causal pathway of the disease process

Entire effect of the intervention on clinical outcome should be captured by changes in the surrogate

Treatment → Changes in surrogate → Improved clinical outcome


Surrogate endpoints (cont.)

Pre-HAART, CD4 count was established as reliable surrogate endpoint for AIDS/death

Most trials now use HIV RNA as a surrogate endpoint (eg. viral load <50 copies/ml)

BUT – not all of the effect of the treatment (eg. toxicities) may act through changes in the CD4 count or HIV RNA level

Many combinations have similar virological efficacy – other outcomes may now be more important


‘Protocol violations’

For a number of reasons, patients included and randomised in the trial may not ‘behave’ as stated in the protocol

Ineligible patients – may be recruited by mistake

Non-adherent – may forget to take some or all of their drugs, may not attend for follow-up visits, may take alternative treatments

Patient withdrawals – not able to tolerate drugs, may switch treatments

QUESTION: how should these be dealt with in any analysis?


Analysis by Intention-to-treat (ITT)

All patients randomised to treatment should be included in the analysis in the groups to which they were randomised


Analysis by Intention-to-treat (ITT)

Provides a measure of the real-life effect of treatment

Is the only unbiased estimate of the treatment’s effect

Most major journals require analysis by ITT

All presentations should include analysis by ITT as the primary analysis unless there is a strong justification for not doing this


On-treatment analyses

Only include those patients who complete a full course of treatment to which they were randomised

Suggested that this shows the optimal effect of treatment when taken as recommended

However, has potential to provide extremely biased estimates of treatment effect as those with the worse responses to treatment are likely to be the ones who drop out or switch treatments

Approach will give an overly positive estimate of effect of new treatment


On-treatment analyses - example

RCT with primary endpoint of virological failure at week 48. Patients are allowed to switch therapy once failure has occurred.

[Figure: viral load status (> 50 copies/ml or < 50 copies/ml) for patients 1 to 5 by weeks after randomisation (0 to 48), with the points at which patients changed treatment marked]


On-treatment analyses - example

RCT with primary endpoint of virological failure at week 48. Patients are allowed to switch therapy once failure has occurred.

[Figure: patients 1 to 5 by weeks after randomisation (0 to 48)]

Primary endpoint at week 48 = 1/1 (100%)



Those remaining on randomised treatment at 48 weeks will, by definition, be those who have not experienced virological failure

Anyone with virological failure prior to week 48 will change treatment and will be excluded from the denominator

Primary event rate will always be close to 100% (depending on how quickly treatments are changed after virological failure)

FOR THIS REASON, ON-TREATMENT ANALYSES SHOULD NOT BE USED FOR THE PRIMARY ANALYSIS OF A TRIAL
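
The arithmetic behind this warning can be sketched with made-up numbers. The figures below (100 patients randomised to an arm, 40 of whom fail virologically and switch before week 48) are hypothetical and chosen only to show how the denominator drives the result.

```python
# Hypothetical arm of a trial, for illustration only.
n_randomised = 100
n_failed_and_switched = 40     # failed virologically and changed treatment before week 48

# Patients still on their randomised treatment at week 48 are, by definition,
# those who have not failed.
n_on_randomised_treatment = n_randomised - n_failed_and_switched
n_suppressed = n_on_randomised_treatment

on_treatment_rate = n_suppressed / n_on_randomised_treatment   # switchers dropped
itt_rate = n_suppressed / n_randomised                         # everyone randomised

print(f"on-treatment: {on_treatment_rate:.0%}, intention-to-treat: {itt_rate:.0%}")
```

However many patients fail, the on-treatment figure stays at 100% in this sketch, whereas the intention-to-treat figure falls with every failure.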


Problems when analysing by ITT with surrogate endpoints

If patients are lost to follow-up or drop out of a trial, they are unlikely to attend for follow-up visits and blood tests

Whilst it may be possible to obtain information on clinical endpoints from other sources, information on CD4 counts or HIV RNA levels may be unavailable

Where data are missing, it is difficult to run an ITT analysis in which all patients are included in the analysis


Alternative methods of ITT analyses

Where data on surrogate markers are missing, a number of alternative strategies have been proposed:

ITT Missing=Failure (ITT M=F)

All missing values are treated as failures in the analysis irrespective of most recent value – ensures that all patients are included in the denominator. If anything, this gives the most pessimistic view of the new treatment.



ITT last observation carried forward (LOCF)

The last available measurement for each person is used in the analysis (irrespective of how long before the endpoint it was measured). This is an ITT analysis as all patients are included in the denominator but it is not favoured by regulatory bodies (eg. FDA)




ITT missing=excluded

All patients with missing surrogate values are excluded from the analyses – this is NOT an ITT analysis as the denominator does not include all patients recruited to the trial. Essentially this is an on-treatment analysis



Examples of different approaches

[Figure: patients 1 to 20 by weeks 0 to 48, with the primary endpoint marked; repeated for each approach]

Responder status at the primary endpoint under each approach (1 = responder, 0 = non-responder, - = excluded):

Patient   On treatment/           ITT               ITT
          ITT missing=excluded    missing=failure   missing=LOCF
1         1                       1                 1
2         1                       1                 1
3         1                       1                 1
4         -                       0                 0
5         -                       0                 1
6         0                       0                 0
7         -                       0                 0
8         -                       0                 1
9         1                       1                 1
10        -                       0                 1
11        0                       0                 0
12        1                       1                 1
13        -                       0                 0
14        -                       0                 0
15        1                       1                 1
16        -                       0                 0
17        1                       1                 1
18        -                       0                 0
19        1                       1                 1
20        0                       0                 0

Response rate:
On treatment/ITT missing=excluded = 8/11 = 73%
ITT missing=failure = 8/20 = 40%
ITT missing=LOCF = 11/20 = 55%


Examples of different approaches - summary

Approach                             Response rate
On treatment/ITT missing=excluded    73%
ITT missing = failure                40%
ITT missing = LOCF                   55%
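
The three response rates can be recomputed from the per-patient responder rows in the example. In the sketch below, `week48` holds each patient's status at the primary endpoint (`None` where the value is missing) and `last_seen` holds the last available value for the patients with missing data, both read off the example; the variable and function names are illustrative.

```python
# Status at week 48 for patients 1 to 20: 1 = responder, 0 = non-responder,
# None = missing.
week48 = [1, 1, 1, None, None, 0, None, None, 1, None,
          0, 1, None, None, 1, None, 1, None, 1, 0]

# Last available value for each patient whose week-48 value is missing
# (keyed by patient number).
last_seen = {4: 0, 5: 1, 7: 0, 8: 1, 10: 1, 13: 0, 14: 0, 16: 0, 18: 0}


def rate_missing_excluded(values):
    """On treatment / ITT missing=excluded: drop patients with missing values."""
    observed = [v for v in values if v is not None]
    return sum(observed) / len(observed)


def rate_missing_failure(values):
    """ITT missing=failure: count every missing value as a non-responder."""
    return sum(v or 0 for v in values) / len(values)


def rate_locf(values, carried_forward):
    """ITT LOCF: replace each missing value with the last available one."""
    filled = [v if v is not None else carried_forward[patient]
              for patient, v in enumerate(values, start=1)]
    return sum(filled) / len(filled)


excluded = rate_missing_excluded(week48)
failure = rate_missing_failure(week48)
locf = rate_locf(week48, last_seen)

print(f"missing=excluded: {excluded:.0%}")
print(f"missing=failure:  {failure:.0%}")
print(f"missing=LOCF:     {locf:.0%}")
```

The same 20 patients give three different answers purely because of how the nine missing values are handled.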


Subgroup analyses

It is often tempting to consider the effect of the treatment regimen in a number of subgroups of patients

For example, consider the effect of the regimen in the following groups:

- Males/females
- Low/high viral load at baseline
- Low/high CD4 count at baseline
- ARV-naïve/ARV-experienced at start of trial

There are a number of dangers inherent in performing too many subgroup analyses

The increased number of tests being performed means that there are problems of multiple testing (ie. some of these comparisons are likely to be significant due to chance)

Although the study will have sufficient power to detect a difference, the subgroups will often be based on a much smaller sample size and so will not be sufficiently powered
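
The loss of power in a subgroup can be approximated with a short calculation. The sketch below uses a normal approximation for comparing two proportions; the response rates (70% vs. 85%) and the group sizes (138 vs. 136 for the whole trial, 28 vs. 30 for a subgroup) are assumed values chosen for illustration, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def approx_power(p_a, p_b, n_a, n_b, alpha=0.05):
    """Approximate power of a two-sided test comparing two proportions."""
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(abs(p_a - p_b) / se - z_crit)


# Assumed true response rates: 70% on regimen A, 85% on regimen B.
power_whole_trial = approx_power(0.70, 0.85, n_a=138, n_b=136)
power_subgroup = approx_power(0.70, 0.85, n_a=28, n_b=30)

print(f"whole trial: {power_whole_trial:.0%} power")
print(f"subgroup:    {power_subgroup:.0%} power")
```

With the same true difference, the whole trial would detect it most of the time, while the small subgroup would miss it more often than not.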


Subgroup analyses – example 1

                      Males                 Females               Total
                      A          B          A          B          A          B
No. of patients       110        106        28         30         138        136
No. (%) responding    77 (70%)   90 (85%)   20 (71%)   25 (83%)   97 (70%)   115 (85%)
P-value (A vs. B)     0.01                  0.44                  0.007

Although the difference between regimens A and B is similar in women as it is in men, it is not significant due to the small number of women in the study

This does not provide evidence that there is no benefit of regimen B in women
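
The p-values in example 1 can be approximated from the counts alone. The slide does not say which test was used; the sketch below assumes a chi-squared test on a 2x2 table with Yates' continuity correction (an assumption, not something stated in the source), which gives values that round to those quoted.

```python
import math


def yates_chi2_p(resp_a, n_a, resp_b, n_b):
    """P-value from a 2x2 chi-squared test with Yates' continuity correction."""
    a, b = resp_a, n_a - resp_a     # regimen A: responders, non-responders
    c, d = resp_b, n_b - resp_b     # regimen B: responders, non-responders
    n = n_a + n_b
    numerator = n * max(abs(a * d - b * c) - n / 2, 0) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    chi2 = numerator / denominator
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


p_males = yates_chi2_p(77, 110, 90, 106)
p_females = yates_chi2_p(20, 28, 25, 30)
p_total = yates_chi2_p(97, 138, 115, 136)

print(f"males: p = {p_males:.2f}, females: p = {p_females:.2f}, "
      f"total: p = {p_total:.3f}")
```

The response rates in women (71% vs. 83%) are almost the same as in men (70% vs. 85%); only the sample size, and therefore the p-value, differs.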


Subgroup analyses – example 2

                      Males                 Females               Total
                      A          B          A          B          A          B
No. of patients       110        106        28         30         138        136
No. (%) responding    77 (70%)   86 (81%)   20 (71%)   29 (97%)   97 (70%)   115 (85%)
P-value (A vs. B)     0.08                  0.01                  0.007

Although regimen B now looks better in females than males, a formal test of the interaction between sex and treatment group (p=0.11) suggests that these results are likely to have arisen by chance



In any trial analysis, if subgroup analyses are thought to be important then they should be specified a priori in the protocol

The study should be sufficiently large that these subgroup analyses will be large enough to detect important differences

Evidence of a subgroup effect should never be based on a comparison of p-values in the individual subgroups, but should be based on formal tests of interaction between the factors of interest


Interim analyses

In any trial there is always a concern that one of the treatment arms may be inferior in some way to the others (eg. one regimen may be far more efficacious or may be associated with a greater rate of serious toxicity than the others)

If so, it may be considered to be ethically unsound to continue to place patients at risk of the serious toxicity or of treatment failure

As a result, one or more interim analyses may be planned at pre-specified time points to monitor the progress of the trial


Interim analyses

However, there is always the chance that initial findings, particularly on small numbers of patients, may have arisen by chance

If the trial is allowed to continue to completion, these trends may disappear

Have to be very careful about stopping the trial early based on results of interim analyses

If interim analyses are to be performed, then it is usually recommended that the trial is only stopped if evidence for a difference between the arms is very strong (e.g. p<0.0001)
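The reason for such a strict stopping threshold can be illustrated with a small simulation. This is a sketch under stated assumptions, not a description of any particular trial: there is no true difference between the arms, there are five equally spaced looks at the data, and the function name, number of looks and number of simulated trials are all illustrative choices.

```python
import random
from statistics import NormalDist


def false_positive_rate(interim_p, final_p=0.05, looks=5, n_sims=20000, seed=1):
    """Share of simulated 'no difference' trials that declare a difference
    at any look, given the p-value thresholds used at interim and final looks."""
    rng = random.Random(seed)
    z_interim = NormalDist().inv_cdf(1 - interim_p / 2)
    z_final = NormalDist().inv_cdf(1 - final_p / 2)
    hits = 0
    for _ in range(n_sims):
        total = 0.0
        for look in range(1, looks + 1):
            # Each look adds one equally sized block of patients; under the
            # null hypothesis its standardised contribution is N(0, 1).
            total += rng.gauss(0.0, 1.0)
            z = total / look ** 0.5  # test statistic on all data so far
            threshold = z_final if look == looks else z_interim
            if abs(z) > threshold:
                hits += 1
                break
    return hits / n_sims


naive = false_positive_rate(interim_p=0.05)     # stop at any look with p<0.05
strict = false_positive_rate(interim_p=0.0001)  # stop early only if p<0.0001
print(naive, strict)
```

Under these assumptions, testing at p<0.05 at every look produces a chance finding far more often than the nominal 5%, whereas a very strict interim threshold keeps the overall rate close to 5%.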


Interim analyses – the role of a DSMB

Often a Data Safety and Monitoring Board (DSMB) may be convened

Will include a number of independent ‘experts’ in the area, usually including a statistician

The DSMB will evaluate safety data on a regular basis (this information will not usually be blinded) and will report back to the trial Steering Committee

The DSMB may recommend that a trial be stopped early if necessary


Interim analyses (cont)

If interim results suggest superiority of one of the arms, but the DSMB does not recommend stopping the trial, presentation of results could hinder the successful completion of the trial

Patients already randomised may switch to the superior arm, resulting in high levels of drop-out

New patients will not wish to be randomised to the inferior arm


Interim analyses (cont)

If data from interim analyses are to be released, it is important that either blinding is maintained, or results are not presented separately for the groups

In some cases, even blinded or combined data may give an indication of the effect of the new drug/combination (e.g. in a placebo controlled trial)

If this is the case, then no results concerning the primary endpoint of the trial (even blinded or combined) should be presented


Tests of superiority

In a standard trial we usually test the null hypothesis that there is no difference between the treatment arms, against an alternative hypothesis that there is a difference between treatment arms

Note that no direction is specified for this difference (i.e. drug A could be worse or better than drug B)

This is known as a test of superiority, even though we don’t specify which drug is superior
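A two-sided test of this kind can be sketched as follows. The example assumes a binary outcome (e.g. responded or not) and uses a normal-approximation test for two proportions, which is only one of several possible tests; the function name and the counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p(x_a, n_a, x_b, n_b):
    """Two-sided p-value for 'no difference' in response rate between arms.

    x_a, x_b are the numbers responding; n_a, n_b are the arm sizes.
    """
    p_a, p_b = x_a / n_a, x_b / n_b
    pooled = (x_a + x_b) / (n_a + n_b)  # response rate if the arms do not differ
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # abs(z): no direction is specified, so A better or A worse both count
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Same observed rates (60% vs 45%) in a small and in a larger hypothetical trial
p_small = two_sided_p(12, 20, 9, 20)
p_large = two_sided_p(60, 100, 45, 100)
print(p_small, p_large)
```

Swapping the two arms gives the same p-value, which is what "no direction is specified" means in practice. With these made-up counts the same observed difference is non-significant in the small trial and significant in the larger one.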


Tests of equivalence

Sometimes, however, we may not want to test whether one drug is better than another, but may simply want to show that the two drugs are equivalent

This is usually the case when a new drug appears to have similar efficacy but may have a better toxicity profile, be easier to take, or be cheaper

Designing a study to show equivalence requires a different emphasis to a study of superiority


[Figure: Tests of superiority – the effect of increasing the sample size. Vertical axis: difference in percentage (-50 to 30), running from "A less effective" to "A more effective". Two estimates with confidence intervals are annotated: "Non-significant difference, but huge uncertainty" and "Similar difference but significant, due to increased power of study".]


Tests of equivalence (cont.)

In a test of equivalence we focus more strongly on the confidence interval for the treatment effect

The confidence interval around the treatment effect must be narrow to exclude even a moderate difference

In order to do this, we usually require a much larger sample size than we would need to show that one is superior to the other

Have to decide a priori on what can be deemed as ‘equivalent’


[Figure: Example – difference in percentage undetectable at 24 weeks with confidence interval (regimen A vs regimen B). Vertical axis: difference in percentage (-30 to 20), with the EQUIVALENCE RANGE marked. Two estimates are annotated: "Non-significant difference, but huge uncertainty" and "Non-significant difference but less uncertainty".]


Testing for equivalence (cont.)

Need to specify the maximum amount by which it is thought that the two treatments could differ even when thought to be equivalent

If the lower (or upper) limit of the CI of the treatment effect does not exceed this value, then the two drugs are deemed equivalent

Sample size is chosen to ensure that the confidence interval around the treatment effect is narrow

Usually requires approximately twice as large a sample as a test of non-equivalence
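The equivalence rule can be sketched in a few lines. This is an illustration only: it assumes a binary outcome, a simple normal-approximation 95% confidence interval for the difference in proportions, and an equivalence range of plus or minus 10 percentage points. The margin, the function names and the counts are all assumptions made for the example.

```python
from math import sqrt
from statistics import NormalDist


def difference_ci(x_a, n_a, x_b, n_b, level=0.95):
    """Approximate confidence interval for the difference in proportions (A - B)."""
    p_a, p_b = x_a / n_a, x_b / n_b
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    diff = p_a - p_b
    return diff - z * se, diff + z * se


def is_equivalent(ci, margin):
    """Equivalent only if the whole interval lies inside (-margin, +margin)."""
    lower, upper = ci
    return -margin < lower and upper < margin


MARGIN = 0.10  # hypothetical equivalence range, decided a priori

ci_small = difference_ci(70, 100, 68, 100)      # 70% vs 68%, 100 per arm
ci_large = difference_ci(700, 1000, 680, 1000)  # same rates, 1000 per arm
print(ci_small, is_equivalent(ci_small, MARGIN))
print(ci_large, is_equivalent(ci_large, MARGIN))
```

With these made-up numbers the observed difference is the same in both trials, but only the larger trial has an interval narrow enough to sit entirely inside the equivalence range.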


Tests of non-inferiority

Conceptually similar to tests of equivalence

New drug may be expected to be slightly inferior to standard but at the same time offers other benefits (e.g. easier to take, fewer toxicities)

Need to show that the effect of the new treatment is not below some pre-stated non-inferiority margin

Confidence intervals again need to be narrow, and sample sizes may be larger than in a superiority trial
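A non-inferiority check differs from an equivalence check in that only one end of the confidence interval matters. The sketch below assumes a binary outcome, a normal-approximation 95% interval for the difference (new minus standard), and a non-inferiority margin of 10 percentage points; the margin, function names and counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def lower_limit(x_new, n_new, x_std, n_std, level=0.95):
    """Lower confidence limit for the difference in proportions (new - standard)."""
    p_new, p_std = x_new / n_new, x_std / n_std
    se = sqrt(p_new * (1 - p_new) / n_new + p_std * (1 - p_std) / n_std)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return (p_new - p_std) - z * se


def is_non_inferior(lower, margin):
    """Non-inferior if even the lower limit stays above -margin."""
    return lower > -margin


MARGIN = 0.10  # hypothetical pre-stated non-inferiority margin

# New drug 68% vs standard 70% undetectable, in a small and a larger trial
lower_small = lower_limit(34, 50, 35, 50)
lower_large = lower_limit(340, 500, 350, 500)
print(lower_small, is_non_inferior(lower_small, MARGIN))
print(lower_large, is_non_inferior(lower_large, MARGIN))
```

In this example the new drug does slightly worse in both trials, but only the larger trial can rule out a shortfall bigger than the margin.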


[Figure: Example – difference in percentage undetectable at 24 weeks with confidence interval (regimen A vs regimen B). Vertical axis: difference in percentage (-30 to 20), running from "A less effective" to "A more effective". Regions marked: "DRUG CONSIDERED INFERIOR" and "DRUG CONSIDERED NON-INFERIOR".]


Group work


Session 6: Critically appraising research


Why do we need to appraise research?

Many people, particularly pharma companies, have vested interests in some pieces of research

Although it is unlikely that anyone would deliberately falsify research for their own interests, the way in which results are presented may be misleading

Even if a study is perfectly carried out and presented appropriately, the results may not be applicable to your situation

Thus, we need to consider any piece of research carefully before acting on its recommendations


The peer-review process – journal articles

Most major journals use a process known as peer-review

Each submitted article is usually sent to two or more experts in the field so that they may give their opinion on the design and conduct of the study, the analytical methods used and importance of the results

On the basis of their reports, journals may either reject the article, ask for a resubmission with changes, or accept the article


Problems with the peer-review process

Hard to make the process truly blind – therefore personal biases may be introduced

May not be sent to someone who fully understands or knows the area

Relies largely on goodwill of reviewers (payment, if any, is minimal) and some put more time and effort into the task than others

Thus, the peer review system isn’t perfect and poor papers may be published

Difficult to improve the system though - having some system in place is better than none at all!


What is and isn’t published

Large studies are more likely to be published than small ones, irrespective of study quality

If the study is small then those studies that show a significant result are more likely to be published (publication bias)

There is a perception that if you are part of an established group with a known track record, it is easier to get things published – not sure whether this is backed up by evidence!


Peer-review of other material

Very limited

Conference presentations are often selected on the basis of peer review of the abstract only

However, the abstract is often very short, very vague and may not contain all the information required to make a valid decision on its quality

Conference abstracts are also selected to fit in with the programme and planned sessions


The main questions when appraising research

Do I believe the results?

YES

Are the results important or new?

YES

Are the results applicable to me or to other people in a similar situation?

YES


Do I believe the results – are the results valid? (RCTs)

1. Was the assignment of patients to treatment groups randomised?

a) How was the assignment list produced?

b) How was the assignment list concealed from the doctors?

2. Were all the patients who entered properly accounted for?

a) How “complete” was the follow-up?

b) How did the authors deal with patients who did not receive assigned treatment or who deviated from the protocol?

Was an intention-to-treat (ITT) analysis performed?

3. To what extent was blinding carried out?

a) Patients?

b) Doctors?

c) Other study personnel?

4. How similar were groups at start of trial?

5. Aside from the experimental intervention, were the two groups treated equally?


Assessing validity – some words of caution!

Remember that it is very easy to criticise a paper, but it is not always as easy to carry out the research in the first place

It is very difficult to write the perfect paper (someone, somewhere will always find something wrong with it)

You have to decide if, putting all your criticisms together, they are enough to make you seriously doubt the validity of the findings




YES



Are the results important – what are the results? (RCTs)

1. How large was the treatment effect?

- Are the results clinically significant?

- Relative risk?

- Difference in risks?

- Number needed to treat? (an estimate of the number of patients who need to receive the treatment in order to prevent one ‘bad’ event)

2. How precise was the treatment effect?

- How wide was the confidence interval?

- What interpretation do you make of the confidence interval?
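The measures listed above can be worked through on a small made-up example. The sketch assumes a ‘bad’ event that is counted in each arm and a normal-approximation 95% confidence interval for the difference in risks; the function name and the counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def effect_measures(events_treated, n_treated, events_control, n_control, level=0.95):
    """Relative risk, difference in risks (with CI) and number needed to treat."""
    risk_t = events_treated / n_treated
    risk_c = events_control / n_control
    relative_risk = risk_t / risk_c
    risk_difference = risk_c - risk_t  # events avoided per patient treated
    se = sqrt(risk_t * (1 - risk_t) / n_treated + risk_c * (1 - risk_c) / n_control)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    ci = (risk_difference - z * se, risk_difference + z * se)
    nnt = 1 / risk_difference  # meaningful only when risk_difference > 0
    return relative_risk, risk_difference, ci, nnt


# Hypothetical trial: 10 of 100 treated and 20 of 100 control patients have the event
rr, rd, rd_ci, nnt = effect_measures(10, 100, 20, 100)
print(rr, rd, rd_ci, nnt)
```

Here the risk is halved and about ten patients need to be treated to prevent one event, but the confidence interval for the difference in risks runs from almost zero to almost 20 percentage points, so the size of the benefit is estimated very imprecisely.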




YES


Are the results applicable to me or to other people in a similar situation?

YES


Are the results applicable to me? (RCTs)

1. Were patients in the trial similar to my own situation?

2. Did the authors consider all clinically important outcomes?

3. Are the likely benefits of the new treatment worth the potential harms/costs?


Critically appraising cohort studies

General points are the same as for RCTs, although clearly the issues of randomisation, blinding etc. do not apply

How representative is the cohort:

Who is included in the cohort?

Who is excluded?

Are there any differences between those included and excluded that could limit the generalisability of the findings?

Follow-up:

How is this maintained?

How many patients are lost to follow-up?

How are these patients dealt with in the analysis?

Temporal changes:

If the authors are considering changes over time (or some treatment whose use may change over time) then has anything else changed which could explain the findings?

Possible bias:

Have all other factors that could explain any differences been considered?


Group work
