Diagnostic test accuracy reviews. Advanced Meta-analysis ...

The University of Sydney School of Public Health Diagnostic test accuracy reviews. Advanced Meta-analysis: dealing with heterogeneity and test comparisons. Petra Macaskill Screening and Test Evaluation Program School of Public Health University of Sydney Co-convenor, Cochrane Screening and Diagnostic Tests Methods Group

Transcript of Diagnostic test accuracy reviews. Advanced Meta-analysis ...

Page 1: Diagnostic test accuracy reviews. Advanced Meta-analysis ...

The University of Sydney

School of Public Health

Diagnostic test accuracy reviews.

Advanced Meta-analysis: dealing with

heterogeneity and test comparisons.

Petra Macaskill

Screening and Test Evaluation Program

School of Public Health

University of Sydney

Co-convenor, Cochrane Screening and Diagnostic Tests

Methods Group

Page 2: Diagnostic test accuracy reviews. Advanced Meta-analysis ...

Outline

• Background

• Descriptive Analyses (available in RevMan)

– Graphical displays

– Summary ROC

– Exploring heterogeneity

• Hierarchical Models (not available in RevMan)

– Rationale for using hierarchical models

– Choice of model:

• Bivariate

• HSROC (Rutter and Gatsonis model)

– Investigating heterogeneity

– Index test comparisons

Requires statistical expertise

Page 3: Diagnostic test accuracy reviews. Advanced Meta-analysis ...

Major steps covered in:

Cochrane Handbook for Systematic Reviews of

Diagnostic Test Accuracy

Objective of the review (e.g. performance of a single test,

exploring heterogeneity in test performance, test comparisons)

Locating and selecting studies

Assessing study quality – QUADAS2 updates in preparation

Extracting data – to be updated

Meta-analysis

Interpretation of the results – in preparation

Chapter 10: Analysing and Presenting Results Petra Macaskill, Constantine Gatsonis, Jonathan Deeks, Roger Harbord, Yemisi

Takwoingi.

Systematic Review of

Diagnostic Test Performance

Page 4: Diagnostic test accuracy reviews. Advanced Meta-analysis ...

http://srdta.cochrane.org/handbook-dta-reviews

Page 5: Diagnostic test accuracy reviews. Advanced Meta-analysis ...

Single index test:

Remains a common form of systematic review

Heterogeneity in test performance between studies is likely to be present, and reasons for it should be explored.

Test comparisons:

Increasing in importance and relevance

Methods for investigating heterogeneity can be applied

Ideally, test comparisons should focus on studies that directly compare the tests of interest

Systematic Review of

Diagnostic Test Performance

Page 6: Diagnostic test accuracy reviews. Advanced Meta-analysis ...

Reference test (binary)

“true” disease status, i.e. target condition

Index test (continuous, ordinal or binary)

Test threshold

Sensitivity and specificity

Likelihood ratios

ROC curve

Underlying Concepts
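As a brief reminder of how the concepts listed on this slide relate to one another, the standard textbook definitions are written out below. They are not taken from the slides themselves; "index test +" denotes a positive index test result at the chosen threshold.

```latex
\begin{align*}
\text{sensitivity} &= P(\text{index test}+ \mid \text{target condition present}) \\
\text{specificity} &= P(\text{index test}- \mid \text{target condition absent}) \\
LR^{+} &= \frac{\text{sensitivity}}{1 - \text{specificity}}, \qquad
LR^{-} = \frac{1 - \text{sensitivity}}{\text{specificity}}
\end{align*}
```

Each point on the ROC curve is the pair (1 - specificity, sensitivity) obtained at one threshold.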

Page 7: Diagnostic test accuracy reviews. Advanced Meta-analysis ...

Test threshold: Individual Study Level

A plot of sensitivity against 1-specificity across the range of thresholds

results in a receiver operating characteristic (ROC) curve.

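[Figure, a single study: overlapping distributions of the test measurement in the non-diseased and diseased groups, with the threshold marked. Moving the threshold in one direction increases both TP and FP; moving it in the other direction decreases both TP and FP.]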

Page 8: Diagnostic test accuracy reviews. Advanced Meta-analysis ...

ROC curves: Individual Study Level

[Figure, four panels: each panel shows the distributions of the test measurement (axis 0 to 120) in the non-diseased and diseased groups, together with the corresponding ROC curve (sensitivity, 0.0 to 1.0, plotted against specificity).]

Page 9: Diagnostic test accuracy reviews. Advanced Meta-analysis ...

Data extraction

Most studies report test sensitivity and specificity at a threshold(s), or provide sufficient information to construct the following 2 x 2 table at the threshold(s):

                     “true” disease status
                        +        -
test result     +      TP       FP
                -      FN       TN

From this table we can compute

True positive rate (tpr): $\text{sensitivity} = \dfrac{TP}{TP + FN} = \dfrac{TP}{n_D}$

False positive rate (fpr): $1 - \text{specificity} = \dfrac{FP}{FP + TN} = \dfrac{FP}{n_{\bar{D}}}$

where $n_D$ and $n_{\bar{D}}$ are the numbers of diseased and non-diseased participants.
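The calculation from the 2 x 2 table can be sketched as a short SAS data step. This is a minimal illustration only: the dataset name acc2x2 and the variable names are made up for the example, and the three rows are the first three RF studies from the forest plot later in these slides.

```sas
/* Illustrative sketch: dataset and variable names are hypothetical. */
data acc2x2;
  input study_id :$20. tp fp fn tn;
  n_dis    = tp + fn;      /* number with the target condition      */
  n_nondis = fp + tn;      /* number without the target condition   */
  sens = tp / n_dis;       /* true positive rate (tpr)              */
  fpr  = fp / n_nondis;    /* false positive rate = 1 - specificity */
  spec = 1 - fpr;
  datalines;
Aho 64 16 27 153
Anuradha 482 2 82 153
Banchuin 36 6 41 313
;
run;

proc print data=acc2x2 noobs;
  var study_id n_dis n_nondis sens spec fpr;
  format sens spec fpr 5.2;
run;
```

For Aho 1999 this gives 64/91 and 153/169, which round to the 0.70 and 0.91 shown in the forest plot.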

Page 10: Diagnostic test accuracy reviews. Advanced Meta-analysis ...

Reasons for variability in test accuracy

between studies

• Random sampling error

For each study, the estimated sensitivity and specificity are subject to sampling error. The larger the sample size, the smaller the sampling error, as shown by the confidence intervals in a forest plot.

Because the sensitivity and specificity are both proportions, the

within study sampling error is straightforward to estimate using

the binomial distribution.
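One way to obtain a binomial confidence interval for a single study's sensitivity is the exact (Clopper-Pearson) interval, which can be computed from beta quantiles. The sketch below assumes this method; the slides do not say which interval RevMan reports, and the dataset and variable names are hypothetical.

```sas
/* Illustrative sketch: exact (Clopper-Pearson) 95% interval for a proportion. */
/* x = number of true positives, n = number of diseased (TP + FN).             */
data sens_ci;
  input study_id :$20. x n;
  alpha = 0.05;
  p = x / n;
  /* Boundary cases are set explicitly because the beta quantile */
  /* is not defined when a shape parameter is zero.              */
  if x = 0 then lower = 0;
  else lower = quantile('BETA', alpha/2, x, n - x + 1);
  if x = n then upper = 1;
  else upper = quantile('BETA', 1 - alpha/2, x + 1, n - x);
  datalines;
Aho 64 91
deBois 8 8
;
run;

proc print data=sens_ci noobs;
  var study_id x n p lower upper;
  format p lower upper 5.2;
run;
```

For de Bois 1996 (8 of 8 detected) the lower limit reduces to 0.025^(1/8), about 0.63, which agrees with the interval [0.63, 1.00] shown for that study in the RF forest plot.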

Page 11: Diagnostic test accuracy reviews. Advanced Meta-analysis ...

Reasons for variability in test accuracy

between studies

• True underlying differences between studies

– In diagnostic reviews, sampling error is unlikely to account for all

of the variability (scatter) between studies.

– Additional heterogeneity in test performance between studies is

likely to occur for other reasons, including differences in:

• Cut-point chosen to define a positive test (threshold effect)

• Spectrum of disease

• Clinical setting

• Study design

• etc…

Even if all studies use the same cut-point, sensitivity and

specificity are expected to vary between studies

Page 12: Diagnostic test accuracy reviews. Advanced Meta-analysis ...

Graphical Displays

Descriptive plots should include:

– Forest plot showing sensitivity and specificity for each study and

the numbers on which these estimates are based for each study

– Scatter plot showing (1-specificity, sensitivity) pair for each study

in ROC space. The size of each marker should ideally reflect the

numbers in both the diseased and non-diseased groups.

RevMan provides facilities for:

• graphical displays (improvements made in version 5.2).

• summary ROC curve estimation based on Moses-Littenberg method

• Descriptive exploration of heterogeneity using subgroup analyses

Page 13: Diagnostic test accuracy reviews. Advanced Meta-analysis ...

50 studies taken from the review conducted by Nishimura (2007) of

Rheumatoid factor (RF) as a marker for rheumatoid arthritis (RA)

The cut-point for test positivity for RF varied between studies, ranging from 3 to 100 U/ml (not all studies reported the cut-point)

The reference standard was based on the 1987 revised American

College of Rheumatology (ACR) criteria or clinical diagnosis.

Note: RF contributes to the ACR criteria so there is some risk of bias in

this analysis.

Example: Rheumatoid Factor as a marker

for Rheumatoid Arthritis

Page 14: Diagnostic test accuracy reviews. Advanced Meta-analysis ...

Study                      TP   FP   FN    TN  Method               Sensitivity        Specificity
Aho 1999                   64   16   27   153  LA                   0.70 [0.60, 0.79]  0.91 [0.85, 0.94]
Anuradha 2005             482    2   82   153  LA                   0.85 [0.82, 0.88]  0.99 [0.95, 1.00]
Banchuin 1992              36    6   41   313  ELISA                0.47 [0.35, 0.58]  0.98 [0.96, 0.99]
Bas 2003                  143   43   53   196  ELISA                0.73 [0.66, 0.79]  0.82 [0.77, 0.87]
Berthelot 1995             80   50   39    45  LA                   0.67 [0.58, 0.76]  0.47 [0.37, 0.58]
Bizzaro 2001               61   36   37   196  Nephelometry         0.62 [0.52, 0.72]  0.84 [0.79, 0.89]
Bombardieri 2004           27    6    3    33  Nephelometry         0.90 [0.73, 0.98]  0.85 [0.69, 0.94]
Carpenter 1989             60    8   20   119  ELISA                0.75 [0.64, 0.84]  0.94 [0.88, 0.97]
Choi 2005                 261   54   63   197  LA                   0.81 [0.76, 0.85]  0.78 [0.73, 0.83]
Cordonnier 1996            20    2   29    18  LA                   0.41 [0.27, 0.56]  0.90 [0.68, 0.99]
Das 2004                   42   46   14   127  Nephelometry         0.75 [0.62, 0.86]  0.73 [0.66, 0.80]
Davis 1989                 18    3   31    25  ELISA                0.37 [0.23, 0.52]  0.89 [0.72, 0.98]
de Bois 1996                8    8    0    31  ELISA                1.00 [0.63, 1.00]  0.79 [0.64, 0.91]
De Rycke 2004              93   28   25   118  LA                   0.79 [0.70, 0.86]  0.81 [0.73, 0.87]
Despres 1994              143   39   63   130  LA                   0.69 [0.63, 0.76]  0.77 [0.70, 0.83]
Dubucquoi 2004             84   41   56    90  ELISA                0.60 [0.51, 0.68]  0.69 [0.60, 0.77]
Fernandez-Suarez 2005      30    2   23    73  Nephelometry         0.57 [0.42, 0.70]  0.97 [0.91, 1.00]
Girelli 2004               32   29    3    13  Nephelometry         0.91 [0.77, 0.98]  0.31 [0.18, 0.47]
Goldbach-Mansky 2000       70   39   36    93  Nephelometry         0.66 [0.56, 0.75]  0.70 [0.62, 0.78]
Gomes-Daudrix 1994         48    1   40    99  ELISA                0.55 [0.44, 0.65]  0.99 [0.95, 1.00]
Greiner 2005               75   42   12   191  Nephelometry         0.86 [0.77, 0.93]  0.82 [0.76, 0.87]
Grootenboer-Mignot 2004    64   18   29    73  Nephelometry         0.69 [0.58, 0.78]  0.80 [0.71, 0.88]
Hitchon 2004               32   10    9    13  Nephelometry         0.78 [0.62, 0.89]  0.57 [0.34, 0.77]
Jansen 2003               130    8  128   113  Nephelometry         0.50 [0.44, 0.57]  0.93 [0.87, 0.97]
Jonsson 1998               50   14   20   191  ELISA                0.71 [0.59, 0.82]  0.93 [0.89, 0.96]
Kamali 2005                20   32   26    25  LA                   0.43 [0.29, 0.59]  0.44 [0.31, 0.58]
Kwok 2005                  77   16   52    52  Nephelometry         0.60 [0.51, 0.68]  0.76 [0.65, 0.86]
Lee 2003                   73   22   29    90  LA                   0.72 [0.62, 0.80]  0.80 [0.72, 0.87]
Lopez-Hoyos 2004           36    3    5    70  Nephelometry         0.88 [0.74, 0.96]  0.96 [0.88, 0.99]
Nell 2005                  56   11   46    87  Not reported         0.55 [0.45, 0.65]  0.89 [0.81, 0.94]
Quinn 2006                115   53   67    63  Not reported         0.63 [0.56, 0.70]  0.54 [0.45, 0.64]
Rantapaa-Dahlqvist 2003    49   23   28   359  ELISA                0.64 [0.52, 0.74]  0.94 [0.91, 0.96]
Raza 2005                  22    2   20    80  LA                   0.52 [0.36, 0.68]  0.98 [0.91, 1.00]
Saraux 1995                 8    8   31    91  LA                   0.21 [0.09, 0.36]  0.92 [0.85, 0.96]
Saraux 2003                35    8   51   149  ELISA                0.41 [0.30, 0.52]  0.95 [0.90, 0.98]
Sauerland 2005            161   89    7   360  Nephelometry         0.96 [0.92, 0.98]  0.80 [0.76, 0.84]
Schellekens 2000           80   28   69   284  ELISA                0.54 [0.45, 0.62]  0.91 [0.87, 0.94]
Soderlin 2004               5    4   11    49  LA                   0.31 [0.11, 0.59]  0.92 [0.82, 0.98]
Spiritus 2004              57    9   33    93  Nephelometry         0.63 [0.53, 0.73]  0.91 [0.84, 0.96]
Suzuki 2003               383   38  166   170  Nephelometry         0.70 [0.66, 0.74]  0.82 [0.76, 0.87]
Swedler                    89    3    9    39  Nephelometry         0.91 [0.83, 0.96]  0.93 [0.81, 0.99]
Thammanichanond 2005       57   25    6   111  LA                   0.90 [0.80, 0.96]  0.82 [0.74, 0.88]
Vallbracht 2004           196   75   99   345  ELISA                0.66 [0.61, 0.72]  0.82 [0.78, 0.86]
van Leeuwen 1988          163   10   28   140  ELISA                0.85 [0.80, 0.90]  0.93 [0.88, 0.97]
Vasiliauskiene 2001        75   21   21   106  ELISA                0.78 [0.69, 0.86]  0.83 [0.76, 0.89]
Visser 1996               157  287   78  1466  ELISA                0.67 [0.60, 0.73]  0.84 [0.82, 0.85]
Vittecoq 2001              26    1   32    29  LA                   0.45 [0.32, 0.58]  0.97 [0.83, 1.00]
Vittecoq 2004              62   11  114   127  ELISA                0.35 [0.28, 0.43]  0.92 [0.86, 0.96]
Winkles 1989              113   19   29   481  LA                   0.80 [0.72, 0.86]  0.96 [0.94, 0.98]
Young 1991                 25    1   14    20  RA hemagglutination  0.64 [0.47, 0.79]  0.95 [0.76, 1.00]

cutoff (reported by 39 of the 50 studies; the transcript does not preserve which study each value belongs to, so the values are listed in their original order): 8.0, 100.0, 15.0, 87.0, 9.0, 40.0, 16.3, 3.0, 3.125, 100.0, 20.0, 50.0, 20.0, 20.0, 20.0, 20.0, 30.0, 40.0, 20.0, 15.0, 80.0, 22.0, 40.0, 20.0, 30.0, 40.0, 20.0, 3.0, 20.0, 20.0, 15.0, 20.0, 20.0, 15.0, 17.0, 9.0, 80.0, 16.0, 40.0

[Forest plot: sensitivity and specificity for each study, each on an axis from 0 to 1.]

Page 15: Diagnostic test accuracy reviews. Advanced Meta-analysis ...

Forest plot – sorted by specificity

Page 16: Diagnostic test accuracy reviews. Advanced Meta-analysis ...

Example: Rheumatoid Factor as a marker

for Rheumatoid Arthritis

Page 17: Diagnostic test accuracy reviews. Advanced Meta-analysis ...

Moses LE, Shapiro D, Littenberg B Stat Med 1993; 12:1293-1316.

Moses-Littenberg SROC regression

For each study i

Compute accuracy (log diagnostic odds ratio, lnDOR):

$D_i = \operatorname{logit}(tpr_i) - \operatorname{logit}(fpr_i)$

and proxy for threshold (based on overall positivity rate):

$S_i = \operatorname{logit}(tpr_i) + \operatorname{logit}(fpr_i)$

Page 18: Diagnostic test accuracy reviews. Advanced Meta-analysis ...

SROC regression: model specification

The relationship between test accuracy and test threshold is modelled to estimate a summary ROC curve:

$D = a + bS$

This fixed effect model is generally fitted using linear regression (unweighted or weighted by inverse variance of lnDOR).

b ≠ 0: Accuracy depends on threshold, resulting in an asymmetric SROC

b = 0: Accuracy is independent of threshold, resulting in a symmetric SROC

The SROC is produced by using the estimates of a and b to compute the expected sensitivity (tpr) across a range of values for 1-specificity (fpr).
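The Moses-Littenberg steps can be sketched in SAS as follows. Rearranging D = a + bS with D = logit(tpr) - logit(fpr) and S = logit(tpr) + logit(fpr) gives logit(tpr) = a/(1 - b) + [(1 + b)/(1 - b)] logit(fpr), which is what the last data step evaluates. This is an unweighted, illustrative version: the dataset names are hypothetical, the 0.5 continuity correction for zero cells is an assumption of this sketch rather than something stated on the slides, and only ten of the RF studies are included.

```sas
/* Illustrative sketch of Moses-Littenberg SROC regression (unweighted). */
data ml;
  input study_id :$20. tp fp fn tn;
  /* Assumption: add 0.5 to all cells of a study with a zero cell, */
  /* so that both logits are defined.                              */
  if tp = 0 or fp = 0 or fn = 0 or tn = 0 then do;
    tp = tp + 0.5; fp = fp + 0.5; fn = fn + 0.5; tn = tn + 0.5;
  end;
  logit_tpr = log(tp / fn);      /* logit(tpr) = log(TP/FN) */
  logit_fpr = log(fp / tn);      /* logit(fpr) = log(FP/TN) */
  D = logit_tpr - logit_fpr;     /* lnDOR                   */
  S = logit_tpr + logit_fpr;     /* proxy for threshold     */
  datalines;
Bizzaro 61 36 37 196
Bombardieri 27 6 3 33
Das 42 46 14 127
Suzuki 383 38 166 170
Swedler 89 3 9 39
Aho 64 16 27 153
Berthelot 80 50 39 45
Choi 261 54 63 197
Cordonnier 20 2 29 18
DeRycke 93 28 25 118
;
run;

/* Fit D = a + b*S by ordinary least squares and keep the estimates. */
proc reg data=ml outest=ml_est;
  model D = S;
run;
quit;

/* Expected tpr over a grid of fpr values, using the fitted a and b. */
data sroc;
  set ml_est(keep=Intercept S rename=(Intercept=a S=b));
  do fpr = 0.01 to 0.99 by 0.01;
    logit_tpr = a / (1 - b) + ((1 + b) / (1 - b)) * log(fpr / (1 - fpr));
    tpr = 1 / (1 + exp(-logit_tpr));
    output;
  end;
run;
```

In practice the curve is usually drawn only over the range of fpr values observed in the included studies, and the result should be read as descriptive.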

Page 19: Diagnostic test accuracy reviews. Advanced Meta-analysis ...

SROC regression:

properties and summary measures

Page 20: Diagnostic test accuracy reviews. Advanced Meta-analysis ...

Example: Rheumatoid Factor as a marker

for Rheumatoid Arthritis

Moses-Littenberg SROC

Page 21: Diagnostic test accuracy reviews. Advanced Meta-analysis ...

Moses-Littenberg SROC regression: comments

Historically, this has been the most commonly used method:

• easy to implement

• uses standard regression methods / software

• can use regression diagnostics to identify influential studies

but

• does not take proper account of within and between study variability

• confidence intervals and P-values are likely to be inaccurate

• should be regarded as a descriptive/exploratory analysis

Hence:

RevMan 5 will provide only exploratory analyses based on SROC regression. Statistical inference will require more complex analyses using multilevel (hierarchical) models fitted in other software.

Page 22: Diagnostic test accuracy reviews. Advanced Meta-analysis ...


Multilevel (hierarchical) models have the advantage that they take

proper account of both:

(i) within study variability (sampling error)

(ii) between study variability not accounted for by (i), through the

inclusion of random effects

Moses-Littenberg SROC regression:

comments

Page 23: Diagnostic test accuracy reviews. Advanced Meta-analysis ...

Hierarchical (Mixed) models have the advantage that

they take account of both:

(i) within study variability (sampling error)

(ii) between study variability (heterogeneity) not

accounted for by (i), through the inclusion of random

effects

Hierarchical models provide a more rigorous method that allows statistical inferences to be made.

Hierarchical (Mixed) models

Page 24: Diagnostic test accuracy reviews. Advanced Meta-analysis ...

Two hierarchical models most commonly used for the

meta-analysis of studies of diagnostic accuracy:

Bivariate model: the primary objective is to obtain a

summary estimate of sensitivity and specificity

and

HSROC model: the primary objective is to fit a

summary ROC

The two models are mathematically equivalent when no

covariates are included in the model

Hierarchical (Mixed) models

Page 25: Diagnostic test accuracy reviews. Advanced Meta-analysis ...

Estimating a summary operating point:

• This is appropriate if there is a common cut-point or criterion for

test positivity between studies

• If studies use different criteria for test positivity the summary

operating point will be difficult to interpret.

Estimating a summary curve:

• This is appropriate if there is variation in the cut-point or criterion

for test positivity between studies

• If studies use the same criterion for test positivity, there will be

very limited information to inform the shape of the curve.

Which method to use?

Page 26: Diagnostic test accuracy reviews. Advanced Meta-analysis ...

If no covariates included in the model, the Bivariate

and HSROC methods are mathematically equivalent:

• The parameter estimates from the HSROC model can be used to

derive the summary point and corresponding confidence region

• The parameter estimates from the Bivariate model can be used

to obtain the HSROC

If covariates are included in the model to explore

reasons for heterogeneity in test performance, the

choice will be guided jointly by:

The research question: Whether we want to make inferences about (i)

the summary curve or (ii) the summary point

Whether or not there is a common criterion for test positivity.

Which method to use?

Page 27: Diagnostic test accuracy reviews. Advanced Meta-analysis ...

Bivariate model:

Models the relationship between sensitivity and specificity directly (after

logit transformation), including random effects for both and allowing

for correlation between them.

The focus is on estimating the expected sensitivity and specificity (i.e.

expected operating point).

An underlying SROC can be derived from the estimated model

parameters (the HSROC is one of the possible SROC curves).

HSROC (Rutter and Gatsonis) model:

Includes random effects for test accuracy and for the proxy for test threshold.

The focus is on estimating a summary ROC.

The expected sensitivity for a given specificity, expected operating

point, etc can be derived from the estimated model parameters.

Multilevel (hierarchical) models

Page 28: Diagnostic test accuracy reviews. Advanced Meta-analysis ...

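Bivariate model

LEVEL 1

For each study (i), the number testing positive is assumed to follow a Binomial distribution

$y_{ij} \sim B(n_{ij}, \pi_{ij})$

where j=1 represents diseased group

j=2 represents non-diseased group

$n_{ij}$ represents the number in group j

$\pi_{ij}$ represents the probability of a positive test result in group j

LEVEL 2

$\begin{pmatrix} \mu_{Ai} \\ \mu_{Bi} \end{pmatrix} \sim N\left( \begin{pmatrix} \mu_A \\ \mu_B \end{pmatrix}, \Sigma_{AB} \right), \qquad \Sigma_{AB} = \begin{pmatrix} \sigma_A^2 & \sigma_{AB} \\ \sigma_{AB} & \sigma_B^2 \end{pmatrix}$

with $\mu_{Ai} = \operatorname{logit}(sens_i)$ and $\mu_{Bi} = \operatorname{logit}(1 - spec_i)$. (The Proc NLMIXED code later in these slides models the number of true negatives, so its mspec parameter is on the logit(specificity) scale; logit(spec) = -logit(1 - spec).)

Model can be fitted using random effects logistic regression (e.g. SAS, Stata, R, ...)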

Page 29: Diagnostic test accuracy reviews. Advanced Meta-analysis ...

37 studies taken from the review conducted by Nishimura (2007) of

anti-cyclic citrullinated peptide antibody (anti-CCP).

the anti-CCP test is deemed positive if any anti-CCP antibody is

detected. Hence, detection may be considered a common threshold

the reference standard was based on the 1987 revised American

College of Rheumatology (ACR) criteria or clinical diagnosis.

if we can assume a common threshold (cut-point or criterion for test

positivity) across studies, it is appropriate to focus on summary

estimate(s) for sensitivity and specificity.

Bivariate Model Example :

Anti-CCP for the diagnosis of rheumatoid arthritis.

Page 30: Diagnostic test accuracy reviews. Advanced Meta-analysis ...

Example: Anti-CCP for the diagnosis of rheumatoid arthritis.

Study                      TP   FP   FN    TN  Generation  Sensitivity        Specificity
Bas 2003                  110   24   86   215  CCP1        0.56 [0.49, 0.63]  0.90 [0.85, 0.93]
Bizzaro 2001               40    5   58   227  CCP1        0.41 [0.31, 0.51]  0.98 [0.95, 0.99]
Goldbach-Mansky 2000       43    1   63   120  CCP1        0.41 [0.31, 0.51]  0.99 [0.95, 1.00]
Jansen 2003               110    3  148   118  CCP1        0.43 [0.37, 0.49]  0.98 [0.93, 0.99]
Saraux 2003                40   11   46   146  CCP1        0.47 [0.36, 0.58]  0.93 [0.88, 0.96]
Schellekens 2000           72   14   77   298  CCP1        0.48 [0.40, 0.57]  0.96 [0.93, 0.98]
Vincent 2002              139    7  101   464  CCP1        0.58 [0.51, 0.64]  0.99 [0.97, 0.99]
Zeng 2003                  90    7  101   313  CCP1        0.47 [0.40, 0.54]  0.98 [0.96, 0.99]
Aotsuka 2005              115   17   16    73  CCP2        0.88 [0.81, 0.93]  0.81 [0.71, 0.89]
Bombardieri 2004           23    0    7    39  CCP2        0.77 [0.58, 0.90]  1.00 [0.91, 1.00]
Choi 2005                 236   20   88   231  CCP2        0.73 [0.68, 0.78]  0.92 [0.88, 0.95]
Correa 2004                74   11    8   130  CCP2        0.90 [0.82, 0.96]  0.92 [0.86, 0.96]
De Rycke 2004              89    4   29   142  CCP2        0.75 [0.67, 0.83]  0.97 [0.93, 0.99]
Dubucquoi 2004             90    2   50   129  CCP2        0.64 [0.56, 0.72]  0.98 [0.95, 1.00]
Fernandez-Suarez 2005      31    0   22    75  CCP2        0.58 [0.44, 0.72]  1.00 [0.95, 1.00]
Garcia-Berrocal 2005       69    8   18    38  CCP2        0.79 [0.69, 0.87]  0.83 [0.69, 0.92]
Girelli 2004               25    2   10    40  CCP2        0.71 [0.54, 0.85]  0.95 [0.84, 0.99]
Greiner 2005               70    5   17   228  CCP2        0.80 [0.71, 0.88]  0.98 [0.95, 0.99]
Grootenboer-Mignot 2004   167    8   98    88  CCP2        0.63 [0.57, 0.69]  0.92 [0.84, 0.96]
Hitchon 2004               26    8   15    15  CCP2        0.63 [0.47, 0.78]  0.65 [0.43, 0.84]
Kamali 2005                26    1   20    56  CCP2        0.57 [0.41, 0.71]  0.98 [0.91, 1.00]
Kumagai 2004               64   14   15   293  CCP2        0.81 [0.71, 0.89]  0.95 [0.92, 0.97]
Kwok 2005                  71    2   58    66  CCP2        0.55 [0.46, 0.64]  0.97 [0.90, 1.00]
Lee 2003                   68   14   35   132  CCP2        0.66 [0.56, 0.75]  0.90 [0.84, 0.95]
Lopez-Hoyos 2004           38    3    0    73  CCP2        1.00 [0.91, 1.00]  0.96 [0.89, 0.99]
Nell 2005                  42    2   60    96  CCP2        0.41 [0.32, 0.51]  0.98 [0.93, 1.00]
Nielen 2005               149    7  109   114  CCP2        0.58 [0.51, 0.64]  0.94 [0.88, 0.98]
Quinn 2006                147   10   35   106  CCP2        0.81 [0.74, 0.86]  0.91 [0.85, 0.96]
Rantapaa-Dahlqvist 2003    47    7   20   375  CCP2        0.70 [0.58, 0.81]  0.98 [0.96, 0.99]
Raza 2005                  24    3   18    79  CCP2        0.57 [0.41, 0.72]  0.96 [0.90, 0.99]
Sauerland 2005            171   26   60   443  CCP2        0.74 [0.68, 0.80]  0.94 [0.92, 0.96]
Soderlin 2004               7    2    9    51  CCP2        0.44 [0.20, 0.70]  0.96 [0.87, 1.00]
Suzuki 2003               481   23   68   185  CCP2        0.88 [0.85, 0.90]  0.89 [0.84, 0.93]
Vallbracht 2004           190   12  105   408  CCP2        0.64 [0.59, 0.70]  0.97 [0.95, 0.99]
van Gaalen 2005            82   13   71   301  CCP2        0.54 [0.45, 0.62]  0.96 [0.93, 0.98]
van Venrooij 2004         865   79  252  2218  CCP2        0.77 [0.75, 0.80]  0.97 [0.96, 0.97]
Vittecoq 2004              69    5  107   133  CCP2        0.39 [0.32, 0.47]  0.96 [0.92, 0.99]

[Forest plot: sensitivity and specificity for each study, each on an axis from 0 to 1.]

Page 31: Diagnostic test accuracy reviews. Advanced Meta-analysis ...

Proc NLMIXED for Bivariate Model

data accp (keep=study_id sens spec true n);
  input study_id $ generation tp fp fn tn;
  /* first record: numerator and denominator for sensitivity */
  sens=1; spec=0; true=tp; n=tp+fn; output;
  /* second record: numerator and denominator for specificity */
  sens=0; spec=1; true=tn; n=tn+fp; output;
  cards;
Bas 1 110 24 86 215
Bizzaro 1 40 5 58 227
Goldbach-Mansky 1 43 1 63 120
Jansen 1 110 3 148 118
Saraux 1 40 11 46 146
Schellekens 1 72 14 77 298
Vincent 1 139 7 101 464
Zeng 1 90 7 101 313
Aotsuka 2 115 17 16 73
Bombardieri 2 23 0 7 39
.
.
;

The resulting SAS dataset accp will have two records per study:

the first contains the numerator and denominator for sensitivity

the second contains the numerator and denominator for specificity

Page 32: Diagnostic test accuracy reviews. Advanced Meta-analysis ...

Proc NLMIXED for Bivariate Model

proc nlmixed data=accp cov ecov;
  parms msens=2 mspec=2 s2usens=0.5 s2uspec=0.5 covsesp=0;
  logitp = (msens + usens)*sens + (mspec + uspec)*spec;
  p = exp(logitp)/(1+exp(logitp));
  model true ~ binomial(n,p);
  random usens uspec ~ normal([0 , 0],[s2usens,covsesp,s2uspec])
         subject=study_id out=randeffs;
run;

msens: summary estimate of logit(sensitivity)

mspec: summary estimate of logit(specificity)

Page 33: Diagnostic test accuracy reviews. Advanced Meta-analysis ...

Proc NLMIXED for Bivariate Model

In the same Proc NLMIXED code:

Random effects: usens and uspec

Distribution of the random effects: the random statement specifies a bivariate normal distribution with mean [0, 0], variances s2usens and s2uspec, and covariance covsesp.

Page 34: Diagnostic test accuracy reviews. Advanced Meta-analysis ...

Proc NLMIXED for Bivariate Model

Fit Statistics

-2 Log Likelihood          545.6
AIC (smaller is better)    555.6
AICC (smaller is better)   556.4
BIC (smaller is better)    563.6

Parameter Estimates

Parameter  Estimate  Standard Error  DF  t Value  Pr > |t|  Alpha    Lower     Upper  Gradient
msens        0.6534          0.1275  35     5.13    <.0001   0.05   0.3946    0.9122  0.000013
mspec        3.1090          0.1459  35    21.31    <.0001   0.05   2.8128    3.4051  -0.00015
s2usens      0.5426          0.1463  35     3.71    0.0007   0.05   0.2455    0.8397  0.000222
s2uspec      0.5717          0.1873  35     3.05    0.0043   0.05   0.1914    0.9520  0.000039
covsesp     -0.2704          0.1199  35    -2.26    0.0304   0.05  -0.5137  -0.02710  0.000036

Covariance Matrix of Parameter Estimates

Row  Parameter     msens     mspec   s2usens   s2uspec   covsesp
1    msens       0.01625  -0.00741  0.000890  -0.00004  -0.00004
2    mspec      -0.00741   0.02128  -0.00006  0.004286  -0.00116
3    s2usens    0.000890  -0.00006   0.02142  0.003997  -0.00874
4    s2uspec    -0.00004  0.004286  0.003997   0.03509  -0.01184
5    covsesp    -0.00004  -0.00116  -0.00874  -0.01184   0.01436

Page 35: Diagnostic test accuracy reviews. Advanced Meta-analysis ...

Input of Model Results to RevMan

Page 36: Diagnostic test accuracy reviews. Advanced Meta-analysis ...

The specificities appear to be relatively homogeneous but there is considerable variation in the sensitivities. (This is evident in the size of the prediction region on the SROC plot.)

The summary estimate of sensitivity

and specificity is shown by the solid

black dot. (The sensitivity and specificity at

this point can be computed by inverse

transformation of the logit estimates to give

0.66 and 0.96 respectively.)

Bivariate Model Example :

Anti-CCP for the diagnosis of rheumatoid arthritis.
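The inverse transformation mentioned on this slide can be sketched with a short data step. The estimates and confidence limits are copied from the Proc NLMIXED parameter estimates shown earlier (msens and mspec); the dataset and variable names are hypothetical.

```sas
/* Illustrative sketch: back-transform logit estimates to proportions. */
data bivar_summary;
  input parameter :$8. estimate lower upper;
  /* inverse logit: p = exp(x) / (1 + exp(x)) */
  p       = exp(estimate) / (1 + exp(estimate));
  p_lower = exp(lower)    / (1 + exp(lower));
  p_upper = exp(upper)    / (1 + exp(upper));
  datalines;
msens 0.6534 0.3946 0.9122
mspec 3.1090 2.8128 3.4051
;
run;

proc print data=bivar_summary noobs;
  var parameter p p_lower p_upper;
  format p p_lower p_upper 5.2;
run;
```

The point estimates come out at 0.66 for sensitivity and 0.96 for specificity, matching the values quoted above. Because the inverse logit is monotone, transforming the confidence limits in the same way gives limits on the proportion scale.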

Page 37: Diagnostic test accuracy reviews. Advanced Meta-analysis ...

Rutter and Gatsonis HSROC model

LEVEL 1

For each study (i), the number testing positive is assumed to follow a Binomial distribution

$y_{ij} \sim B(n_{ij}, \pi_{ij})$

where j=1 represents diseased group

j=2 represents non-diseased group

$n_{ij}$ represents the number in group j

$\pi_{ij}$ represents the probability of a positive test result in group j

The model takes the form:

$\operatorname{logit}(\pi_{ij}) = (\theta_i + \alpha_i \, dis_{ij}) \exp(-\beta \, dis_{ij})$

where $dis_{ij}$ represents the “true” disease status (coded as -0.5 for the non-diseased and 0.5 for the diseased)

Page 38: Diagnostic test accuracy reviews. Advanced Meta-analysis ...

Rutter and Gatsonis HSROC model

LEVEL 1 cont.

$\operatorname{logit}(\pi_{ij}) = (\theta_i + \alpha_i \, dis_{ij}) \exp(-\beta \, dis_{ij})$

$\theta_i$: threshold (random effect)

$\alpha_i$: accuracy (random effect)

$\beta$: dependence of accuracy on threshold (fixed effect)

The model is based on the ordinal logistic regression proposed by McCullagh.

When $\beta = 0$, the model reduces to a logistic regression model and

$\theta_i$ is estimated by $(\operatorname{logit}(tpr_i) + \operatorname{logit}(fpr_i))/2$ $(= S_i/2)$

$\alpha_i$ is estimated by $\operatorname{logit}(tpr_i) - \operatorname{logit}(fpr_i)$ $(= lnDOR_i)$

Page 39: Diagnostic test accuracy reviews. Advanced Meta-analysis ...

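Rutter and Gatsonis HSROC model

LEVEL 2

The random effects are assumed to be independent and normally distributed:

$\theta_i \sim N(\Theta, \sigma_\theta^2)$

$\alpha_i \sim N(\Lambda, \sigma_\alpha^2)$

The SROC curve is computed using

$E(tpr) = \dfrac{1}{1 + \exp\{-[\Lambda e^{-0.5\beta} + e^{-\beta} \operatorname{logit}(fpr)]\}}$

for chosen values of fpr

When $\beta = 0$, $\Lambda$ provides a global estimate of the expected test accuracy (lnDOR) and the resulting SROC is symmetric.

The expected tpr and fpr are given by

$\dfrac{1}{1 + \exp\{-(\Theta + 0.5\Lambda) e^{-0.5\beta}\}}$ and $\dfrac{1}{1 + \exp\{-(\Theta - 0.5\Lambda) e^{0.5\beta}\}}$

respectively.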

Page 40: Diagnostic test accuracy reviews. Advanced Meta-analysis ...

The Rutter and Gatsonis HSROC model is a generalised non-linear

random effects model and hence requires more specialised software

to fit it.

It is often fitted using SAS Proc NLMIXED, or using Bayesian (MCMC)

methods.

Notes:

Metandi (macro available for Stata) exploits the relationship between

the Bivariate model and the HSROC model to fit the summary curve.

This software cannot accommodate covariates.

The METADAS macro for SAS creates code for Proc NLMIXED and provides output suitable for input to RevMan

Fitting the HSROC model

Page 41: Diagnostic test accuracy reviews. Advanced Meta-analysis ...

50 studies taken from the review conducted by Nishimura (2007) of

Rheumatoid factor (RF) as a marker for rheumatoid arthritis (RA)

The cut-point for test positivity for RF varied between studies, ranging from 3 to 100 U/ml (not all studies reported the cut-point)

The reference standard was based on the 1987 revised American

College of Rheumatology (ACR) criteria or clinical diagnosis.

Note: RF contributes to the ACR criteria so there is some risk of bias in

this analysis.

Example: Rheumatoid Factor as a marker

for Rheumatoid Arthritis

Page 42: Diagnostic test accuracy reviews. Advanced Meta-analysis ...

Proc NLMIXED for HSROC Model

data rf (keep=study_id dis pos n);
  input study_id $ tp fp fn tn method $;
  /* first record: numerator and denominator for sensitivity */
  dis=0.5; pos=tp; n=tp+fn; output;
  /* second record: numerator and denominator for 1-specificity */
  dis=-0.5; pos=fp; n=tn+fp; output;
  cards;
Bizzaro 61 36 37 196 N
Bombardieri 27 6 3 33 N
Das 42 46 14 127 N
Suzuki 383 38 166 170 N
Swedler 89 3 9 39 N
Aho 64 16 27 153 LA
Berthelot 80 50 39 45 LA
Choi 261 54 63 197 LA
Cordonnier 20 2 29 18 LA
DeRycke 93 28 25 118 LA
.
.
;

The resulting SAS dataset rf will have two records per study:

the first contains the numerator and denominator for sensitivity

the second contains the numerator and denominator for 1-specificity

Page 43: Diagnostic test accuracy reviews. Advanced Meta-analysis ...

Proc NLMIXED for HSROC Model

proc nlmixed data=rf ecov cov;
  parms alpha=2 theta=0 beta=0 s2ua=0 s2ut=0 ;
  logitp = (theta + ut + (alpha + ua)*dis) * exp(-(beta)*dis);
  p = exp(logitp)/(1+exp(logitp));
  model pos ~ binomial(n,p);
  random ut ua ~ normal([0,0],[s2ut,0,s2ua])
         subject=study_id out=randeffs;
run;

theta: summary estimate for “threshold”

alpha: summary estimate for “accuracy”

beta: shape parameter estimate

Page 44: Diagnostic test accuracy reviews. Advanced Meta-analysis ...

Proc NLMIXED for HSROC Model

In the same Proc NLMIXED code:

Random effects: ut (threshold) and ua (accuracy)

Distribution of the random effects: the random statement specifies independent normal distributions with mean [0,0], variances s2ut and s2ua, and the covariance fixed at 0.

Page 45: Diagnostic test accuracy reviews. Advanced Meta-analysis ...

Proc NLMIXED for HSROC Model

Parameter Estimates

Parameter  Estimate  Standard Error  DF  t Value  Pr > |t|  Alpha     Lower    Upper  Gradient
alpha        2.6016          0.1862  48    13.97    <.0001   0.05    2.2273   2.9759  2.227E-6
theta       -0.4370          0.1469  48    -2.98    0.0046   0.05   -0.7323  -0.1417  4.573E-6
beta         0.2267          0.1624  48     1.40    0.1691   0.05  -0.09978   0.5532  -1.16E-6
s2ua         1.3014          0.3046  48     4.27    <.0001   0.05    0.6890   1.9137  -6.42E-7
s2ut         0.5423          0.1237  48     4.39    <.0001   0.05    0.2937   0.7909  -6.99E-6
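As an illustration of how these estimates feed into the summary curve, the data step below evaluates the HSROC formulas for the expected operating point and for the curve over a grid of fpr values. It assumes that alpha, theta and beta in the output correspond to Λ, Θ and β in the model specification, as the code comments on the earlier slide suggest; the dataset and variable names are hypothetical.

```sas
/* Illustrative sketch: summary ROC curve from the HSROC estimates above. */
data hsroc_curve;
  Lambda = 2.6016;    /* alpha: summary accuracy  */
  Theta  = -0.4370;   /* theta: summary threshold */
  beta   = 0.2267;    /* shape parameter          */

  /* Expected operating point */
  exp_tpr = 1 / (1 + exp(-(Theta + 0.5*Lambda) * exp(-0.5*beta)));
  exp_fpr = 1 / (1 + exp(-(Theta - 0.5*Lambda) * exp( 0.5*beta)));

  /* Expected tpr for chosen values of fpr */
  do fpr = 0.01 to 0.99 by 0.01;
    tpr = 1 / (1 + exp(-(Lambda*exp(-0.5*beta)
                         + exp(-beta)*log(fpr / (1 - fpr)))));
    output;
  end;
run;
```

With these inputs the expected operating point works out to roughly tpr 0.68 and fpr 0.12. Since cut-points vary between studies in this example, that point is hard to interpret and the curve is the more natural summary.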

Page 46: Diagnostic test accuracy reviews. Advanced Meta-analysis ...

Input of Model Results to RevMan

Page 47: Diagnostic test accuracy reviews. Advanced Meta-analysis ...

Example: RF for the diagnosis of

rheumatoid arthritis.

The summary curve shows the

expected trade-off between sensitivity

and specificity as threshold varies.

Notes:

Since RF constitutes part of the ACR

criteria, diagnostic accuracy may be

overestimated.

The impact of potentially influential studies

should be investigated.

Page 48: Diagnostic test accuracy reviews. Advanced Meta-analysis ...

Reasons for variability in test accuracy

between studies

• True underlying differences between studies

– In diagnostic reviews, sampling error is unlikely to account for all

of the variability (scatter) between studies.

– Additional heterogeneity in test performance between studies is

likely to occur for other reasons, including differences in:

• Cut-point chosen to define a positive test (threshold effect)

• Spectrum of disease

• Clinical setting

• Study design

• etc…

Covariates can be included in both the Bivariate and

HSROC models to investigate factors that may be

associated with heterogeneity.

Page 49: Diagnostic test accuracy reviews. Advanced Meta-analysis ...

37 studies taken from the review conducted by Nishimura (2007) of

anti-cyclic citrullinated peptide antibody (anti-CCP).

the anti-CCP test is deemed positive if any anti-CCP antibody is

detected. Hence, detection may be considered a common threshold

the reference standard was based on the 1987 revised American

College of Rheumatology (ACR) criteria or clinical diagnosis.

two generations of CCP are included in the analysis, CCP1 and CCP2

Bivariate Model Example :

Anti-CCP for the diagnosis of rheumatoid arthritis:

generation of CCP.

Page 51: Diagnostic test accuracy reviews. Advanced Meta-analysis ...

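Bivariate model with a covariate

LEVEL 1

For each study (i), the number testing positive is assumed to follow a Binomial distribution

$$y_{ij} \sim B(n_{ij}, \pi_{ij})$$

where j=1 represents the diseased group

j=2 represents the non-diseased group

$n_{ij}$ represents the number in group j

$\pi_{ij}$ represents the probability of a positive test result in group j

LEVEL 2

Assuming a study level covariate Z (assumed to have a fixed effect)

$$\begin{pmatrix} \operatorname{logit}(\pi_{i1}) \\ \operatorname{logit}(1-\pi_{i2}) \end{pmatrix} \sim N\left( \begin{pmatrix} \mu_A + \nu_A Z_i \\ \mu_B + \nu_B Z_i \end{pmatrix}, \begin{pmatrix} \sigma_A^2 & \sigma_{AB} \\ \sigma_{AB} & \sigma_B^2 \end{pmatrix} \right)$$

where $\pi_{i1}$ is the sensitivity (sens) and $1-\pi_{i2}$ is the specificity (spec) in study i, so that A refers to logit(sens) and B to logit(spec).

Model can be fitted using random effects logistic regression (e.g. SAS, Stata, R, ...)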

Page 52: Diagnostic test accuracy reviews. Advanced Meta-analysis ...

Proc NLMIXED for Bivariate Model

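data accp (keep=study_id sens spec true n ccpg);
  length study_id $ 20;  /* avoid truncating long study names to the default 8 characters */
  input study_id $ generation tp fp fn tn;
  /* covariate for generation of CCP: CCP1 = 0 (referent), CCP2 = 1 */
  if generation eq 1 then ccpg=0;
  if generation eq 2 then ccpg=1;
  /* record for sensitivity: true positives out of all diseased */
  sens=1; spec=0; true=tp; n=tp+fn; output;
  /* record for specificity: true negatives out of all non-diseased */
  sens=0; spec=1; true=tn; n=tn+fp; output;
cards;
Bas 1 110 24 86 215
Bizzaro 1 40 5 58 227
Goldbach-Mansky 1 43 1 63 120
Jansen 1 110 3 148 118
Saraux 1 40 11 46 146
Schellekens 1 72 14 77 298
Vincent 1 139 7 101 464
Zeng 1 90 7 101 313
Aotsuka 2 115 17 16 73
Bombardieri 2 23 0 7 39
.
.
;

CCP1 is the referent category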

Page 53: Diagnostic test accuracy reviews. Advanced Meta-analysis ...

Proc NLMIXED for Bivariate Model

proc nlmixed data=accp cov ecov;
  /* starting values */
  parms msens=2 mspec=2 s2usens=0.5 s2uspec=0.5 covsesp=0
        se1=0 sp1=0;
  /* logit(sensitivity) on the sens records, logit(specificity) on the spec records */
  logitp = (msens + se1*ccpg + usens)*sens + (mspec + sp1*ccpg + uspec)*spec;
  p = exp(logitp)/(1+exp(logitp));
  model true ~ binomial(n,p);
  random usens uspec ~ normal([0, 0],[s2usens,covsesp,s2uspec]) subject=study_id out=randeffs;
  /* Estimate logit(sensitivity) and logit(specificity) for CCP2 */
  estimate 'logitsens CCP2' msens + se1;
  estimate 'logitspec CCP2' mspec + sp1;
run;

Notes:

The variances of the random effects are assumed to be the same for CCP1 and CCP2.

Page 54: Diagnostic test accuracy reviews. Advanced Meta-analysis ...

Proc NLMIXED for Bivariate Model

Random effects estimates common to both CCP1 and CCP2

Fit Statistics

-2 Log Likelihood 533.4

AIC (smaller is better) 547.4

AICC (smaller is better) 549.1

BIC (smaller is better) 558.6

Parameter Estimates

Parameter   Estimate   Standard Error   DF   t Value   Pr > |t|   Alpha   Lower      Upper      Gradient
msens       -0.09653   0.2203           35   -0.44     0.6640     0.05    -0.5438    0.3507     -0.00024
mspec       3.4467     0.2982           35   11.56     <.0001     0.05    2.8412     4.0522     -0.00002
s2usens     0.3598     0.1022           35   3.52      0.0012     0.05    0.1524     0.5673     0.000479
s2uspec     0.5399     0.1802           35   3.00      0.0050     0.05    0.1742     0.9057     -0.00002
covsesp     -0.1968    0.09836          35   -2.00     0.0532     0.05    -0.3965    0.002825   0.000213
se1         0.9626     0.2513           35   3.83      0.0005     0.05    0.4523     1.4728     -0.00025
sp1         -0.4302    0.3377           35   -1.27     0.2111     0.05    -1.1158    0.2554     0.000046

Covariance Matrix of Parameter Estimates

Row Parameter msens mspec s2usens s2uspec covsesp se1 sp1

1 msens 0.04854 -0.02464 -0.00012 -0.00001 -0.00003 -0.04855 0.02465

2 mspec -0.02464 0.08895 -0.00002 0.004771 -0.00065 0.02463 -0.08834

3 s2usens -0.00012 -0.00002 0.01044 0.002118 -0.00440 0.000693 -0.00005

4 s2uspec -0.00001 0.004771 0.002118 0.03246 -0.00860 -0.00007 -0.00039

5 covsesp -0.00003 -0.00065 -0.00440 -0.00860 0.009674 0.000100 -0.00091

6 se1 -0.04855 0.02463 0.000693 -0.00007 0.000100 0.06317 -0.03160

7 sp1 0.02465 -0.08834 -0.00005 -0.00039 -0.00091 -0.03160 0.1140

Page 55: Diagnostic test accuracy reviews. Advanced Meta-analysis ...

Proc NLMIXED for Bivariate Model

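In the Parameter Estimates table, msens and mspec are the estimates of logit(sensitivity) and logit(specificity) for CCP1 (the referent category).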

Page 56: Diagnostic test accuracy reviews. Advanced Meta-analysis ...

Proc NLMIXED for Bivariate Model

Additional Estimates

Label            Estimate   Standard Error   DF   t Value   Pr > |t|   Alpha   Lower    Upper
logitsens CCP2   0.8660     0.1209           35   7.16      <.0001     0.05    0.6206   1.1114
logitspec CCP2   3.0165     0.1622           35   18.59     <.0001     0.05    2.6871   3.3459

Covariance Matrix of Additional Estimates

Row   Label            Cov1       Cov2
1     logitsens CCP2   0.01461    -0.00697
2     logitspec CCP2   -0.00697   0.02632

The ESTIMATE statement is used to obtain the corresponding values for CCP2.
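As a check on where the ESTIMATE output comes from, the Python sketch below recomputes the CCP2 values from the Parameter Estimates and the Covariance Matrix of Parameter Estimates. It assumes only that the variance of a sum of two estimates is var(a) + var(b) + 2 cov(a, b); the function name `sum_estimate` is illustrative and is not part of SAS.

```python
import math

def sum_estimate(est_a, est_b, var_a, var_b, cov_ab):
    """Estimate and standard error of (a + b) from fitted values."""
    estimate = est_a + est_b
    std_error = math.sqrt(var_a + var_b + 2.0 * cov_ab)
    return estimate, std_error

# Values copied from the NLMIXED output (CCP1 is the referent category).
# msens + se1, with var(msens), var(se1) and cov(msens, se1)
logit_sens_ccp2, se_sens = sum_estimate(-0.09653, 0.9626, 0.04854, 0.06317, -0.04855)
# mspec + sp1, with var(mspec), var(sp1) and cov(mspec, sp1)
logit_spec_ccp2, se_spec = sum_estimate(3.4467, -0.4302, 0.08895, 0.1140, -0.08834)

print(f"logitsens CCP2: {logit_sens_ccp2:.4f} (SE {se_sens:.4f})")
print(f"logitspec CCP2: {logit_spec_ccp2:.4f} (SE {se_spec:.4f})")
```

The results agree with the Additional Estimates table to within the rounding of the printed inputs.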

Page 57: Diagnostic test accuracy reviews. Advanced Meta-analysis ...

The change in -2logLikelihood when

the two covariates were added to the

model was 12.2 (a chi-squared statistic

with 2 df, P=0.002).

Hence, there is strong statistical

evidence that sensitivity and/or

specificity vary by generation.

The confidence regions show that

sensitivity varies by generation, but not

specificity.

Further models may be fitted to formally test

the effect of removing the covariate for

specificity from the model.

Bivariate Model Example :

Anti-CCP for the diagnosis of rheumatoid arthritis:

generation of CCP.
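The P value quoted for the likelihood ratio test can be reproduced by hand. For a chi-squared distribution with 2 degrees of freedom the upper tail probability has the closed form exp(-x/2), so no statistics library is needed. The Python sketch below relies on that identity; the function name is illustrative.

```python
import math

def chi2_sf_2df(x):
    """Upper-tail probability of a chi-squared variable with 2 df.

    Valid only for 2 degrees of freedom, where the survival
    function reduces to exp(-x / 2).
    """
    if x < 0:
        raise ValueError("chi-squared statistic must be non-negative")
    return math.exp(-x / 2.0)

# Change in -2 log likelihood when the two covariate terms are added.
p_value = chi2_sf_2df(12.2)
print(f"P = {p_value:.3f}")
```

The same function gives about 0.9 for a statistic of 0.2 and about 0.74 for a statistic of 0.6, the values quoted for the 2 df tests in the HSROC example.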

Page 58: Diagnostic test accuracy reviews. Advanced Meta-analysis ...

Summary estimates for specificity:

0.97 (95%CI 0.94, 0.98) for CCP1 and

0.95 (95%CI 0.94, 0.97) for CCP2.

Summary estimates for sensitivity:

0.48 (95%CI 0.37, 0.59) for CCP1 and

0.70 (95% CI 0.65, 0.75) for CCP2.

These results indicate an improvement

in sensitivity, without loss of specificity

for CCP2 compared with CCP1.

Bivariate Model Example :

Anti-CCP for the diagnosis of rheumatoid arthritis:

generation of CCP.
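The summary estimates on the probability scale are obtained by applying the inverse logit to the estimates and confidence limits reported on the logit scale. A minimal Python sketch, using the values from the NLMIXED output; the helper name `expit` and the dictionary layout are illustrative choices.

```python
import math

def expit(x):
    """Inverse logit: maps a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

# (estimate, lower, upper) on the logit scale, copied from the NLMIXED output.
logit_estimates = {
    "sensitivity CCP1": (-0.09653, -0.5438, 0.3507),
    "sensitivity CCP2": (0.8660, 0.6206, 1.1114),
    "specificity CCP1": (3.4467, 2.8412, 4.0522),
    "specificity CCP2": (3.0165, 2.6871, 3.3459),
}

# Back-transform each value and round to two decimal places.
summary = {
    label: tuple(round(expit(v), 2) for v in values)
    for label, values in logit_estimates.items()
}

for label, (est, lower, upper) in summary.items():
    print(f"{label}: {est:.2f} (95% CI {lower:.2f}, {upper:.2f})")
```

Because the confidence limits are transformed rather than the standard errors, the intervals on the probability scale are not symmetric about the estimate.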

Page 59: Diagnostic test accuracy reviews. Advanced Meta-analysis ...

50 studies taken from the review conducted by Nishimura (2007) of

Rheumatoid factor (RF) as a marker for rheumatoid arthritis (RA)

The cut-point for test positivity for RF varied between studies, ranging from 3 to 100 U/ml (not all studies reported the cut-point)

The reference standard was based on the 1987 revised American

College of Rheumatology (ACR) criteria or clinical diagnosis.

Method of measurement of RF:

15 studies used nephelometry (N), 16 used latex agglutination (LA),

16 used ELISA (E)

(3 studies were excluded: 2 did not specify the method and 1 used RA hemagglutination)

Example: Rheumatoid Factor as a marker for

Rheumatoid Arthritis:

Method of measurement of RF

Page 61: Diagnostic test accuracy reviews. Advanced Meta-analysis ...

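Rutter and Gatsonis HSROC model with a covariate

LEVEL 1

For each study (i), the number testing positive is assumed to follow a Binomial distribution

$$y_{ij} \sim B(n_{ij}, \pi_{ij})$$

where j=1 represents the diseased group

j=2 represents the non-diseased group

$n_{ij}$ represents the number in group j

$\pi_{ij}$ represents the probability of a positive test result in group j

Assuming a study level covariate Z (assumed to have a fixed effect)

$$\operatorname{logit}(\pi_{ij}) = \left(\theta_i + \gamma Z_i + (\alpha_i + \lambda Z_i)\,dis_{ij}\right)\exp\left(-(\beta + \delta Z_i)\,dis_{ij}\right)$$

where $dis_{ij}$ represents the “true” disease status (coded as -0.5 for the non-diseased and 0.5 for the diseased)

Here $\theta_i$ and $\alpha_i$ are the threshold and accuracy parameters for study i, $\beta$ is the shape parameter, and $\gamma$, $\lambda$ and $\delta$ denote the effects of Z on threshold, accuracy and shape respectively.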
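To see how this model generates a summary ROC curve, it helps to write out the two disease groups separately. The derivation below drops the covariate terms (equivalently, Z = 0) and writes θ, α and β for the threshold, accuracy and shape parameters, matching theta, alpha and beta in the NLMIXED code. It is a sketch of the standard algebra rather than a formula taken from the slides.

```latex
\begin{align*}
\operatorname{logit}(\text{sens}_i) &= \left(\theta_i + \tfrac{1}{2}\alpha_i\right) e^{-\beta/2}
  && (dis_{ij} = 0.5) \\
\operatorname{logit}(1-\text{spec}_i) &= \left(\theta_i - \tfrac{1}{2}\alpha_i\right) e^{\beta/2}
  && (dis_{ij} = -0.5) \\
\intertext{Eliminating $\theta_i$ and replacing $\alpha_i$ by its mean $\alpha$ gives the summary curve}
\operatorname{logit}(\text{sens}) &= \alpha\, e^{-\beta/2} + e^{-\beta}\,\operatorname{logit}(1-\text{spec}).
\end{align*}
```

When β = 0 the curve is symmetric and the log diagnostic odds ratio, logit(sens) − logit(1 − spec), is constant and equal to α.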

Page 62: Diagnostic test accuracy reviews. Advanced Meta-analysis ...

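Proc NLMIXED for HSROC Model

data rf (keep=study_id dis pos n rfm1 rfm2);
  length study_id $ 20;  /* avoid truncating long study names to the default 8 characters */
  input study_id $ tp fp fn tn method $;
  /* indicator variables for method of measurement; coded so that LA is the
     referent, matching the ESTIMATE labels (a1, t1 = ELISA; a2, t2 = nephelometry) */
  rfm1=0; if method eq 'E' then rfm1=1;
  rfm2=0; if method eq 'N' then rfm2=1;
  /* record for the diseased group */
  dis=0.5; pos=tp; n=tp+fn; output;
  /* record for the non-diseased group */
  dis=-0.5; pos=fp; n=tn+fp; output;
cards;
Bizzaro 61 36 37 196 N
Bombardieri 27 6 3 33 N
Das 42 46 14 127 N
Suzuki 383 38 166 170 N
Swedler 89 3 9 39 N
Aho 64 16 27 153 LA
Berthelot 80 50 39 45 LA
Choi 261 54 63 197 LA
Cordonnier 20 2 29 18 LA
DeRycke 93 28 25 118 LA
.
.
;

LA is the referent category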

Page 63: Diagnostic test accuracy reviews. Advanced Meta-analysis ...

proc nlmixed data=rf ecov cov;

parms alpha=2 theta=0 beta=0 s2ua=0 s2ut=0

a1=0 a2=0 t1=0 t2=0 b1=0 b2=0 ;

logitp = (theta + t1*rfm1 +t2*rfm2 + ut +

(alpha + a1*rfm1 +a2*rfm2 + ua)*dis)

* exp(-(beta + b1*rfm1 + b2*rfm2)*dis);

p = exp(logitp)/(1+exp(logitp));

model pos ~ binomial(n,p);

random ut ua ~ normal([0,0],[s2ut,0,s2ua])

subject=study_id out=randeffs;

run;

Proc NLMIXED for HSROC Model

This model assumes the SROC curves differ in shape.

Removing b1*rfm1 + b2*rfm2 from the model changed the -2 logLikelihood by only 0.2 (a chi-squared statistic with 2 df, P=0.9). Hence there is no statistical evidence that the curves differ in shape.

Page 64: Diagnostic test accuracy reviews. Advanced Meta-analysis ...

proc nlmixed data=rf ecov cov;

parms alpha=2 theta=0 beta=0 s2ua=0 s2ut=0

a1=0 a2=0 t1=0 t2=0;

logitp = (theta + t1*rfm1 +t2*rfm2 + ut +

(alpha + a1*rfm1 +a2*rfm2 + ua)*dis)

* exp(-(beta)*dis);

p = exp(logitp)/(1+exp(logitp));

model pos ~ binomial(n,p);

random ut ua ~ normal([0,0],[s2ut,0,s2ua])

subject=study_id out=randeffs;

/* parameter estimates for the methods of RF measurement; */

estimate 'alpha ELISA' alpha + a1;

estimate 'theta ELISA' theta + t1;

estimate 'alpha Nephelometry' alpha + a2;

estimate 'theta Nephelometry' theta + t2;

run;

Proc NLMIXED for HSROC Model

This model assumes the SROC curves all have the same asymmetric shape

Page 65: Diagnostic test accuracy reviews. Advanced Meta-analysis ...

Proc NLMIXED for HSROC Model

Fit Statistics

-2 Log Likelihood          753.1
AIC (smaller is better)    771.1
AICC (smaller is better)   773.2
BIC (smaller is better)    787.7

Parameter Estimates

Parameter   Estimate   Standard Error   DF   t Value   Pr > |t|   Alpha   Lower      Upper     Gradient
alpha       2.4552     0.3245           45   7.57      <.0001     0.05    1.8017     3.1087    -0.0004
theta       -0.5490    0.2137           45   -2.57     0.0136     0.05    -0.9794    -0.1186   0.000139
beta        0.1995     0.1702           45   1.17      0.2472     0.05    -0.1432    0.5423    -0.00018
s2ua        1.2865     0.3109           45   4.14      0.0002     0.05    0.6603     1.9128    -0.00038
s2ut        0.4786     0.1139           45   4.20      0.0001     0.05    0.2492     0.7080    0.00062
a1          0.2483     0.4408           45   0.56      0.5760     0.05    -0.6395    1.1361    -0.00038
a2          0.3328     0.4439           45   0.75      0.4573     0.05    -0.5612    1.2269    0.000093
t1          -0.1962    0.2614           45   -0.75     0.4568     0.05    -0.7227    0.3303    -0.00017
t2          0.4960     0.2627           45   1.89      0.0654     0.05    -0.03301   1.0250    0.000366

Additional Estimates

Label                Estimate   Standard Error   DF   t Value   Pr > |t|   Alpha   Lower     Upper
alpha ELISA          2.7035     0.3278           45   8.25      <.0001     0.05    2.0433    3.3637
theta ELISA          -0.7452    0.2103           45   -3.54     0.0009     0.05    -1.1687   -0.3217
alpha Nephelometry   2.7880     0.3067           45   9.09      <.0001     0.05    2.1704    3.4057
theta Nephelometry   -0.05297   0.2125           45   -0.25     0.8043     0.05    -0.4810   0.3750

The shape parameter common to all 3 curves is given by beta.
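The fitted accuracy and shape parameters determine one summary ROC curve per category. The Python sketch below evaluates the standard HSROC curve, logit(sens) = alpha·exp(-beta/2) + exp(-beta)·logit(1 - spec), at a few specificities using the estimates above. The function names and the choice of specificities are illustrative assumptions, and the row labelled `alpha` in the output is treated simply as the referent category.

```python
import math

def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))

def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))

def sroc_sensitivity(specificity, alpha, beta):
    """Sensitivity on the HSROC summary curve at a given specificity."""
    fpr = 1.0 - specificity
    return expit(alpha * math.exp(-beta / 2.0) + math.exp(-beta) * logit(fpr))

BETA = 0.1995  # shape parameter common to all curves
ALPHAS = {
    "referent": 2.4552,      # row 'alpha'
    "ELISA": 2.7035,         # row 'alpha ELISA'
    "Nephelometry": 2.7880,  # row 'alpha Nephelometry'
}

for spec in (0.80, 0.90, 0.95):
    cells = ", ".join(
        f"{name} {sroc_sensitivity(spec, alpha, BETA):.2f}"
        for name, alpha in ALPHAS.items()
    )
    print(f"specificity {spec:.2f}: {cells}")
```

Because the curves share one beta, they differ only through alpha, so at any given specificity the category with the larger alpha has the higher sensitivity.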

Page 67: Diagnostic test accuracy reviews. Advanced Meta-analysis ...

Example: Rheumatoid Factor as a marker for

Rheumatoid Arthritis:

Method of measurement of RF

LA appears to be less accurate than N and E, whose curves show very similar accuracy.

Removing a1*rfm1 + a2*rfm2 from the model gave a chi-squared statistic of 0.6 (2 df, P=0.74). Hence, there is no statistical evidence that the method of measurement of RF is associated with accuracy.

The effect of potentially influential

studies should be investigated.

Page 69: Diagnostic test accuracy reviews. Advanced Meta-analysis ...

Index Test Comparisons

Comparison based on all studies that evaluate one or both tests:

Methods of analysis follow the same approach as already outlined for investigation of heterogeneity

It may be necessary to allow variances of random effects to vary by test.

Such comparisons may be biased due to confounding arising from heterogeneity among studies in terms of design, study quality, setting, etc

Adjusting for potential confounders is often not feasible because the required information is typically missing or poorly reported.

Page 70: Diagnostic test accuracy reviews. Advanced Meta-analysis ...

Index Test Comparisons

Comparison restricted to studies that evaluate both tests:

Restricting the analysis to studies that evaluated both tests in the same patients (truly “paired” studies), or randomized patients to receive each test, removes the need to adjust for confounders.

Methods of analysis for investigation of heterogeneity are extended to model sensitivity and specificity for both tests within each study (i.e. 2 records for sensitivity and 2 records for specificity per study, with a covariate for test type):

– all studies are analysed as if they were randomised

– this approach is generally conservative

– methods for dealing with the pairing of test results within studies are under development

The cross-classification of test results within disease groups for truly paired studies is generally not reported.
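The data layout for a comparative study (two sensitivity records and two specificity records, with a covariate for test type) can be sketched as follows in Python. The study name and 2x2 counts are invented purely for illustration. The field names mirror the variables used in the bivariate NLMIXED example (sens, spec, true, n), and `test_b` is a hypothetical indicator for the second test.

```python
def paired_study_records(study_id, counts_a, counts_b):
    """Build four records for a study that evaluated two tests.

    counts_a and counts_b are (tp, fp, fn, tn) tuples for test A and test B.
    """
    records = []
    for test_b, (tp, fp, fn, tn) in enumerate((counts_a, counts_b)):
        # Sensitivity record: true positives out of all diseased.
        records.append({"study_id": study_id, "test_b": test_b,
                        "sens": 1, "spec": 0, "true": tp, "n": tp + fn})
        # Specificity record: true negatives out of all non-diseased.
        records.append({"study_id": study_id, "test_b": test_b,
                        "sens": 0, "spec": 1, "true": tn, "n": tn + fp})
    return records

# Hypothetical counts, not taken from any study in the review.
rows = paired_study_records("StudyX", (90, 5, 10, 95), (80, 10, 20, 90))
for row in rows:
    print(row)
```

Each record keeps the same study identifier, so the study-level random effects are shared across the two tests while the indicator carries the test comparison.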

Page 71: Diagnostic test accuracy reviews. Advanced Meta-analysis ...

Example: Comparison of Computed Tomography (CT)

and Ultrasonography (US) for the diagnosis of

appendicitis.

22 studies were included in the review by Terasawa (2004)

12 studies evaluated CT

14 studies evaluated US

4 studies evaluated both CT and US.

Page 72: Diagnostic test accuracy reviews. Advanced Meta-analysis ...

Example: Comparison of Computed Tomography (CT)

and Ultrasonography (US) for the diagnosis of

appendicitis.

Analysis based on all studies:

Strong statistical evidence of a difference

in sensitivity and specificity between the

tests (P<0.001)

CT has higher sensitivity and specificity

than US.

Page 73: Diagnostic test accuracy reviews. Advanced Meta-analysis ...

Example: Comparison of Computed Tomography (CT)

and Ultrasonography (US) for the diagnosis of

appendicitis.

Analysis based on comparative studies:

CT consistently shows higher sensitivity

than US

Specificity for CT is equal to or greater

than for US

Only 4 studies available for this model.

Convergence is an issue, and simplifying

assumptions may be necessary.

Page 74: Diagnostic test accuracy reviews. Advanced Meta-analysis ...

Analyses in RevMan are designed to be descriptive and exploratory.

Hierarchical models provide a more rigorous approach. The Bivariate model

and Rutter and Gatsonis HSROC models are the most commonly used.

The choice of model must be informed by the research question and whether a

common threshold for test positivity is used across studies.

Covariates can be included in hierarchical models to investigate heterogeneity.

The results can be input to RevMan for graphical display.

Modelling of test comparisons follows the approach used for investigation of heterogeneity.

Ideally, comparative meta-analysis should focus on studies that compare tests

directly.

A comprehensive list of references is provided in Chapter 10 of the Handbook

for DTA Reviews.

Concluding Remarks

Page 75: Diagnostic test accuracy reviews. Advanced Meta-analysis ...

Small number of studies

Convergence issues

Model checking

Data reported at multiple thresholds per study:

• choosing a cutpoint for each study

• methods for analysing multiple 2x2 tables per study:

Hamza TH, Arends LR, van Houwelingen HC, Stijnen T. Multivariate random effects meta-analysis of diagnostic tests with multiple thresholds. BMC Medical Research Methodology 2009; 9: 73. DOI: 10.1186/1471-2288-9-73. Published 10 Nov 2009.

Other?

Discussion Points (Methods continue to be extended and refined!)