Addressing differences in rigour and relevance of evidence –
a review of existing methods
Rebecca Turner, David Spiegelhalter, Simon Thompson
MRC Biostatistics Unit, Cambridge
Outline
Why address rigour and relevance?
Review of methods for addressing rigour and relevance
Bias modelling and using external information on sources of bias
Ongoing work and issues for discussion
Differences in rigour (internal bias)
Examples of internal bias
Inadequacy of randomisation/allocation concealment in RCTs
Non-compliance and drop-out in RCTs
Selection bias and non-response bias in observational studies
Confounding in observational studies
Misclassification of exposure in case-control studies
Evidence synthesis: usual approach
Choose a minimum standard
Include studies which achieve the standard, making no allowance for further differences between them
Exclude studies failing to reach the standard
Problems with usual approach to rigour
Relevant evidence, in some cases the majority of available studies, is discarded
undesirable effects on precision (and bias?)
No allowance for differences in rigour between studies included in combined analysis
once minimum criteria achieved, less rigorous studies given equal influence in analysis
policy decisions may be based on misleading results?
Differences in relevance (external bias)
Examples of external bias
Study population different from target population
Outcome similar but not identical to target outcome
Interventions different from target interventions, e.g. dose
Evidence synthesis: usual approach
Similar to that for rigour: studies achieving a minimum standard are included, with no allowance for further differences
Sometimes separate analyses carried out for different types of population/intervention
Degrees of relevance are specific to target setting, so decisions on relevance are necessarily rather subjective
Example 1: donepezil for treatment of dementia due to Alzheimer's disease
Analysis reported by Birks and Harvey (2003) included
17 double-blind placebo-controlled RCTs only.
40 relevant comparative studies identified:
17 double-blind placebo-controlled RCTs
1 single-blind placebo-controlled RCT
14 non-randomised and/or open-label studies
8 donepezil vs. active comparisons, 1 randomised
Example 2: modified donepezil example
25 relevant comparative studies identified:
1 double-blind placebo-controlled RCT
2 single-blind placebo-controlled RCTs
14 non-randomised and/or open-label studies
8 donepezil vs. active comparisons, 1 randomised
– Include only randomised studies? Allow for degree of blinding?
– Include all studies? Allow for degree of blinding and randomisation?
– Allow for additional sources of bias, and deviations from target population, outcome, details of intervention?
Methods for addressing differences in rigour and relevance
Existing approaches:
Methods based on quality scores
Random effects modelling of bias
Full bias modelling using external information on specific sources of bias
Methods based on quality scores
Exclude studies below a quality score threshold
Weight the analysis by quality
Examine relationship between effect size and quality score
Cumulative meta-analysis according to quality score
Problems include:
Difficult to capture quality in a single score
Quality items represented may be irrelevant to bias
No allowance for direction of individual biases
Random effects modelling of bias
Assume that each study i estimates a biased parameter θ + δ_i rather than the target parameter θ
Choose a distribution to describe the plausible size (and direction) of the bias δ_i for each study
Standard random-effects analysis is equivalent to assuming E[δ_i] = 0, V[δ_i] = τ²
This assumption of common uncertainty about study biases seems rather strong
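This assumption can be made concrete in a few lines: under E[δ_i] = 0 and V[δ_i] = τ², each study's sampling variance is simply inflated by the common bias variance τ² before inverse-variance pooling. A minimal sketch (the effect estimates and standard errors are illustrative, not data from the talk):

```python
import numpy as np

def random_effects_pool(y, se, tau2):
    """Inverse-variance pooling where each study's sampling variance is
    inflated by tau2, the assumed variance of its bias delta_i.
    E[delta_i] = 0 and V[delta_i] = tau2, common to all studies."""
    y, se = np.asarray(y, float), np.asarray(se, float)
    w = 1.0 / (se**2 + tau2)
    est = np.sum(w * y) / np.sum(w)
    se_pool = np.sqrt(1.0 / np.sum(w))
    return est, se_pool

# Hypothetical log hazard ratios and standard errors for three studies
y = [0.40, 0.10, 0.80]
se = [0.10, 0.25, 0.30]
print(random_effects_pool(y, se, tau2=0.0))  # no allowance for bias
print(random_effects_pool(y, se, tau2=0.2))  # common bias variance assumed
```

Setting tau2 > 0 both widens the pooled interval and reduces the dominance of the most precise (but possibly biased) studies.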
Hip replacements example
Comparison of hip replacements (Charnley vs. Stanmore)
Endpoint: patient requires revision operation
Three studies available: Registry data
RCT
Case series
Assumptions: RCT evidence unbiased
bias in case series > bias in registry data
Spiegelhalter and Best, 2003
Hip replacements example: allowing for bias
[Figure: forest plot of hazard ratios for revision operation (scale 0–4) for the Registry, RCT and Case series evidence, under an unweighted analysis and two weighted analyses]
Values are assigned to the variance of each bias, which controls the extent to which that evidence is downweighted.
Problems:
How to choose the values which control the weighting?
No separation of internal and external bias.
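Giving each source its own bias variance, rather than a common τ², is what the weighted analyses above do. A sketch with made-up numbers that encode the slide's ordering of assumptions (RCT unbiased; registry data less biased than the case series); the estimates and the bias variances are illustrative choices, not the Spiegelhalter and Best values:

```python
# Hypothetical log hazard ratios and standard errors for the three sources
studies = {"RCT": (0.3, 0.25), "Registry": (0.5, 0.10), "Case series": (1.0, 0.20)}

# Bias variances encoding the assumed ordering: RCT unbiased,
# bias in case series > bias in registry data (values are assumptions)
bias_var = {"RCT": 0.0, "Registry": 0.1, "Case series": 0.4}

# Inverse-variance weights after inflating by each source's bias variance
w = {k: 1.0 / (se**2 + bias_var[k]) for k, (y, se) in studies.items()}
pooled = sum(w[k] * studies[k][0] for k in studies) / sum(w.values())
print(f"pooled log HR = {pooled:.3f}")
```

The choice of the bias-variance values is exactly the "how to choose the weighting" problem noted above, which motivates sensitivity analysis over a range of values.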
Full bias modelling
Identify sources of potential bias in available evidence
Obtain external information on the likely form of each bias
Construct a model to correct the data analysis for multiple biases, on the basis of external information
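The three steps above can be sketched as a Monte Carlo correction: sample each bias from its externally informed distribution, subtract the sampled biases from the observed log odds ratio, and read off the corrected interval. All numbers and distributions below are placeholders for illustration, not the priors used in Greenland's analysis:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Observed log odds ratio and its standard error (illustrative values)
log_or_obs, se_obs = np.log(1.68), 0.14

# Step 2: external information expressed as prior draws for each bias
# (these normal distributions are hypothetical placeholders)
bias_confounding = rng.normal(0.0, 0.3, n)  # log-scale confounding bias
bias_nonresponse = rng.normal(0.1, 0.2, n)  # log-scale selection bias

# Step 3: subtract sampled biases and add the study's random error
log_or_corrected = (log_or_obs
                    - bias_confounding
                    - bias_nonresponse
                    + rng.normal(0.0, se_obs, n))
lo, med, hi = np.exp(np.percentile(log_or_corrected, [2.5, 50, 97.5]))
print(f"bias-corrected OR {med:.2f} (95% interval {lo:.2f}, {hi:.2f})")
```

Note how the corrected interval is far wider than the conventional one: the bias uncertainty dominates the random error, which is the central point of the approach.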
Example (Greenland, 2005):
14 case-control studies of association between residential magnetic fields and childhood leukaemia.
Potential biases identified:
Non-response
Confounding
Misclassification of exposure
Magnetic fields example: allowing for bias
Bias corrected for    OR    95% CI         P-value
Non-response          1.45  (0.94, 2.28)   0.05
Confounding           1.69  (1.32, 2.33)   0.002
Misclassification     2.92  (1.42, 35.1)   0.011
All three biases      2.70  (0.99, 32.5)   0.026

Conventional analysis:
Odds ratio for leukaemia, fields >3 mG vs. ≤3 mG:
1.68 (1.27, 2.22), P-value of 0.0001
Choosing values for the bias parameters
Bias due to an unknown confounder U:
Need to express prior beliefs for:
OR relating U to magnetic fields (within leukaemia strata)
OR relating U to leukaemia (within magnetic fields strata)
Greenland (2005) chooses wide distributions giving 5th and 95th percentiles of 1/6 and 6.
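The stated percentiles pin down the spread of such a prior: for a lognormal prior on an OR centred at 1, with 5th and 95th percentiles at 1/6 and 6, the log-scale standard deviation is σ = ln 6 / 1.645 ≈ 1.09. A quick check:

```python
import math

z95 = 1.6449  # 95th percentile of the standard normal distribution

# Lognormal prior on an OR, centred at 1 (log-mean 0), with
# 5th and 95th percentiles at 1/6 and 6 as stated above
sigma = math.log(6) / z95
print(f"sigma on log scale = {sigma:.3f}")

# Check: percentiles of the implied prior
p5, p95 = math.exp(-z95 * sigma), math.exp(z95 * sigma)
print(f"5th = {p5:.3f}, 95th = {p95:.3f}")
```

A σ this large says the analyst is genuinely uncertain even about the direction of the confounding, which is what "wide distributions" means here.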
Multiple studies
Greenland expects degree of confounding to vary according to study location and method of measuring magnetic fields
Uses location, measurement method as predictors for log ORs
Example: ETS and lung cancer
Wolpert and Mengersen, 2004:
29 case-control and cohort studies of association between ETS and lung cancer.
Potential biases identified:
Eligibility violations
Misclassification of exposure
Misclassification of cases
Penalty points represent each study’s control of each bias
Error rates assumed to increase with each penalty point
E.g. eligibility violation: rate of 5% for typical studies, doubling with each penalty point
Arguments against bias modelling
Impossible to identify all sources of bias
Little information on the likely effects of bias, even for known sources
Bias modelling requires external (subjective) input, rather than letting the data “speak for themselves”
Increases complexity of analysis, leading to problems with presentation and interpretation
Arguments for bias modelling
Assumption of zero bias is extremely implausible in most analyses (although zero expected bias may be reasonable)
Uncertainty due to potential biases may be much larger than uncertainty due to random error
Informal discussion of the possible effects of bias is not sufficient
Preferable to include all relevant data and model bias, rather than throwing much of the data away?
Aims of planned work
Allow for both rigour and relevance (internal & external bias)
Consider potential sources of bias, and available evidence on plausible sizes of biases
Construct simple models for adjustment
Develop elicitation strategy for obtaining judgements on reasonable size of unmodelled sources of bias
Develop strategy for sensitivity analysis
Simple models for bias
Require 4 bias parameters for each study:
μ_RIG, σ_RIG control rigour (internal bias)
μ_REL, σ_REL control relevance (external bias)
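One plausible reading of "4 bias parameters for each study" is a mean and a variance for each kind of bias: shift each study's estimate by its expected internal and external bias, inflate its variance by the two bias variances, then pool. A sketch under that assumption (all parameter names and numbers are hypothetical):

```python
import numpy as np

def pool_with_bias(y, se, mu_rig, var_rig, mu_rel, var_rel):
    """Pool study estimates after shifting each by its expected
    internal (rigour) and external (relevance) bias, and inflating
    its variance by the two per-study bias variances."""
    y = np.asarray(y, float) - np.asarray(mu_rig, float) - np.asarray(mu_rel, float)
    v = np.asarray(se, float)**2 + np.asarray(var_rig, float) + np.asarray(var_rel, float)
    w = 1.0 / v
    return np.sum(w * y) / np.sum(w), np.sqrt(1.0 / np.sum(w))

# Two hypothetical studies: an RCT in the target population (little bias)
# and an open-label study in a somewhat different population
est, se_pool = pool_with_bias(
    y=[0.5, 0.9], se=[0.2, 0.15],
    mu_rig=[0.0, 0.2], var_rig=[0.0, 0.05],   # rigour (internal bias)
    mu_rel=[0.0, 0.1], var_rel=[0.02, 0.05])  # relevance (external bias)
print(f"pooled estimate {est:.3f} (SE {se_pool:.3f})")
```

Keeping internal and external bias as separate parameter pairs is what distinguishes this from the single-variance random-effects approach criticised earlier.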
Challenges
Problem of multiple biases is complex, but approach for correction must be simple and accessible.
Otherwise evidence synthesis will, in general, continue to exclude some studies and make no allowance for differences between others.
When correcting for multiple biases, important to determine a strategy for sensitivity analysis.
Issues for discussion
Credibility of findings which incorporate external information in addition to data
More acceptable when available evidence is scarce and expected to be biased than when many RCTs available?
Greenland and others argue that analysis corrected for biases should be treated as definitive analysis (i.e. not only sensitivity analysis) – is this a realistic aim?
References
Eddy DM, Hasselblad V, Shachter R. Meta-analysis by the Confidence Profile Method. Academic Press: San Diego, 1992.
Greenland S. Multiple-bias modelling for analysis of observational data. Journal of the Royal Statistical Society Series A 2005; 168: 267-291.
Spiegelhalter DJ, Best NG. Bayesian approaches to multiple sources of evidence and uncertainty in complex cost-effectiveness modelling. Statistics in Medicine 2003; 22: 3687-3709.
Wolpert RL, Mengersen KL. Adjusted likelihoods for synthesizing empirical evidence from studies that differ in quality and design: effects of environmental tobacco smoke. Statistical Science 2004; 19: 450-471.