Design and assessment of diagnostic test
LI Jibin, MD, PhD
Department of Clinical Research, Sun Yat-sen University Cancer Center
Email: [email protected]
Content
1 Definition
2 Study design
3 Assessment
4 Application and clinical significance
1 Definition
• Diagnostic or screening tests are performed to obtain information that can guide a health care provider's decision to initiate or continue a therapeutic intervention.
– Tests performed in persons with a symptom or sign of an illness are usually termed diagnostic tests.
– Tests performed in individuals without such symptoms or signs are referred to as screening tests.
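Flowchart of screening and diagnostic test: a negative (normal) screening result leads to prevention and screening again; a positive screening result leads to a diagnostic test, whose result is either normal or positive (diagnosed with disease), the latter leading to treatment.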
• Diagnostic tests can be:
– medical history
– physical examination
– laboratory test
– imaging examination (X-ray, CT, MRI, etc.)
– recognized diagnostic criteria
– ……
Diagnostic tests can be used for
• Screening
• Determining severity
• Selecting optimal therapy
• Prognosis
• Monitoring
• ……
Example
• Carotid ultrasound can diagnose the severity of the patient’s carotid stenosis.
• Carotid ultrasound can indicate the patient’s prognosis for stroke.
• Carotid ultrasound can predict the efficacy of a given therapy in the patient.
2 Study design
(1) Gold standard
• Gold standard: the most widely recognized standard used by clinicians to diagnose the target disease.
• Biopsy
• Surgical operation
• Pathological anatomy or autopsy
• Special imaging detection (X-ray film, CT scan)
• Long-term follow-up
• Other convincing tests
• ……
New diagnostic test vs. Gold standard
• Apply the gold standard to confirm whether or not each participant has the target disease.
• Apply the new diagnostic test to determine whether each participant has a positive or negative result.
• The results can be summarized in a 2×2 table.
• Construct a 2×2 table
Table 1 A 2×2 table of a diagnostic test
                                 Gold standard
New diagnostic test result       Disease          Normal           Total
Positive                         a (true +)       b (false +)      a+b
Negative                         c (false −)      d (true −)       c+d
Total                            a+c              b+d              n
(2) Participant selection
• Representativeness
– Case and control participants should be recruited from people with and without the target disease, respectively, and should be representative of the corresponding populations.
– A broad spectrum of the disease
  • Case group: different types of the disease, such as typical and atypical, from mild to severe, etc.
  • Control group: a broad spectrum of competing conditions
(3) Blinding method
• Blinding is important to avoid observer bias.
• Observers should interpret the results of the diagnostic test while blinded to the participants' disease status.
(4) Sample size determination
• Statistical significance level: α
• Allowable error: δ
• Estimates of sensitivity and specificity
Case group: $n = \dfrac{Z_{1-\alpha/2}^{2}\,Sen(1-Sen)}{\delta^{2}}$
Control group: $n = \dfrac{Z_{1-\alpha/2}^{2}\,Spe(1-Spe)}{\delta^{2}}$
Example 1: Assume that ultrasonography has a sensitivity of 80% and a specificity of 60% for the diagnosis of cholecystolithiasis. Estimate the required sample size, with α = 0.05 ($Z_{1-\alpha/2}$ = 1.96, two-sided), Sen = 0.80, Spe = 0.60 and δ = 0.10.
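Case group: $n = \dfrac{1.96^{2} \times 0.80 \times (1-0.80)}{0.10^{2}} \approx 62$
Control group: $n = \dfrac{1.96^{2} \times 0.60 \times (1-0.60)}{0.10^{2}} \approx 93$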
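A minimal Python sketch of the same calculation, assuming the normal-approximation formula above and rounding up to whole participants. The critical value is taken from `statistics.NormalDist` rather than the tabulated 1.96, and the function name `sample_size` is illustrative, not from the slides.

```python
import math
from statistics import NormalDist


def sample_size(p: float, delta: float, alpha: float = 0.05) -> int:
    """Participants needed to estimate a proportion p (Sen or Spe) within
    an allowable error delta: n = Z^2 * p * (1 - p) / delta^2."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value, about 1.96
    n = z ** 2 * p * (1 - p) / delta ** 2
    return math.ceil(n)  # round up to a whole participant


# Example 1: ultrasonography for cholecystolithiasis
n_case = sample_size(p=0.80, delta=0.10)     # case group, driven by Sen
n_control = sample_size(p=0.60, delta=0.10)  # control group, driven by Spe
print(n_case, n_control)  # 62 93
```

Rounding up rather than to the nearest integer keeps the allowable error at or below δ.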
3 Assessment
Measures of assessment
Measures                              Formula
Sensitivity (Sen)                     a/(a+c)
Specificity (Spe)                     d/(b+d)
Youden’s index (J)                    Sen-(1-Spe)
Accuracy (Acc)                        (a+d)/(a+b+c+d)
Positive predictive value (+PV)       a/(a+b)
Negative predictive value (-PV)       d/(c+d)
Positive likelihood ratio (+LR)       Sen/(1-Spe)
Negative likelihood ratio (-LR)       (1-Sen)/Spe
Prevalence (Prev)                     (a+c)/(a+b+c+d)
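The formulas in this table translate directly into code. The sketch below is an illustration rather than part of the original material: the function name `diagnostic_measures` is hypothetical, and the cells a, b, c, d follow the layout of Table 1. Fed the cells of Table 2 (Example 2), it should reproduce the values worked out in the following slides.

```python
def diagnostic_measures(a: int, b: int, c: int, d: int) -> dict:
    """Measures of a diagnostic test from the cells of a 2x2 table.

    a = true positives, b = false positives,
    c = false negatives, d = true negatives.
    """
    n = a + b + c + d
    sen = a / (a + c)
    spe = d / (b + d)
    return {
        "Sen": sen,
        "Spe": spe,
        "J": sen - (1 - spe),  # Youden's index
        "Acc": (a + d) / n,
        "+PV": a / (a + b),
        "-PV": d / (c + d),
        "+LR": sen / (1 - spe),
        "-LR": (1 - sen) / spe,
        "Prev": (a + c) / n,
    }


# Cells of Table 2 (CPK test for MI)
for name, value in diagnostic_measures(a=215, b=16, c=15, d=114).items():
    print(f"{name}: {value:.3f}")
```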
Example 2: 360 subjects received an independent, blind CPK (creatine phosphokinase) test for the diagnosis of myocardial infarction (MI). The results are shown in Table 2.
Table 2 The 2×2 table of the CPK diagnostic test
                      Gold standard
CPK test result       MI           No MI        Total
CPK + (≥80)           215 (a)      16 (b)       231
CPK − (<80)           15 (c)       114 (d)      129
Total                 230          130          360
(1) Sensitivity (Sen)
• The proportion of those with the target disease who have a positive test.
• Sen=a/(a+c)=215/230=0.935
• False negative rate=1-Sen
(2) Specificity (Spe)
• The proportion of those without the target disease who have a negative test.
• Spe=d/(b+d)=114/130=0.877
• False positive rate=1-Spe
Figure: (A) Ideal distribution of normal and abnormal populations; (B) actual distribution of normal and abnormal populations, in which the two overlap (x-axis: diagnostic test result; y-axis: population)
Relationship between sensitivity and specificity
Figure: overlapping distributions of the normal and ill populations, divided by a cut-off point into − test and + test results; the areas correspond to the false negative rate (β), false positive rate (α), Spe (1−α) and Sen (1−β).
• Sen ↑, false − ↓
• A negative result carries more clinical significance (it helps rule out the disease).
• Application: when a high rate of missed diagnosis (false negatives) would have serious consequences.
Figure: normal and illness distributions divided by the cut-off point (labels: Spe (1−α), Sen (1−β), − test, + test).
• Spe ↑, false + ↓
• A positive result carries more clinical significance (it helps confirm the disease).
• Application: when a high rate of misdiagnosis (false positives) would have serious consequences.
Relationship between sensitivity and specificity
Table 3 Sen and Spe at different cut-off points of the blood glucose test
Blood glucose (mg/100 ml)    Sen (%)    Spe (%)
80                           100.0      1.2
90                           98.6       7.3
100                          97.1       25.3
110                          92.9       48.4
120                          88.6       68.2
130                          81.4       82.4
140                          74.3       91.2
150                          64.3       96.1
160                          55.7       98.6
170                          52.9       99.6
180                          50.0       99.8
190                          44.3       99.8
200                          37.1       100.0
The optimal cut-off point is chosen by weighing sensitivity against specificity.
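The slide does not say how the optimal cut-off is chosen. One common criterion, assumed here, is to pick the cut-off that maximizes Youden's index J = Sen + Spe − 1 (defined in the measures table). A short sketch applying it to Table 3; the names `TABLE_3` and `youden` are illustrative.

```python
# Table 3: cut-off (mg/100 ml) -> (Sen %, Spe %)
TABLE_3 = {
    80: (100.0, 1.2), 90: (98.6, 7.3), 100: (97.1, 25.3),
    110: (92.9, 48.4), 120: (88.6, 68.2), 130: (81.4, 82.4),
    140: (74.3, 91.2), 150: (64.3, 96.1), 160: (55.7, 98.6),
    170: (52.9, 99.6), 180: (50.0, 99.8), 190: (44.3, 99.8),
    200: (37.1, 100.0),
}


def youden(sen_pct: float, spe_pct: float) -> float:
    """Youden's index J = Sen + Spe - 1, with Sen and Spe given in percent."""
    return (sen_pct + spe_pct) / 100 - 1


# Cut-off with the largest J (assumed optimality criterion)
best_cutoff = max(TABLE_3, key=lambda cut: youden(*TABLE_3[cut]))
sen, spe = TABLE_3[best_cutoff]
print(best_cutoff, sen, spe, round(youden(sen, spe), 3))
```

Under this criterion the 140 mg/100 ml cut-off (Sen 74.3%, Spe 91.2%, J ≈ 0.655) is selected; weighting false negatives and false positives differently could favor another cut-off.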
Which is better?
Test A                 Gold standard
Test result            Cancer     Other      Total
Cancer                 160        40         200
Other                  40         360        400
Total                  200        400        600
Sensitivity: 160/200=80%
Specificity: 360/400=90%

Test B                 Gold standard
Test result            Cancer     Other      Total
Cancer                 170        60         230
Other                  30         340        370
Total                  200        400        600
Sensitivity: 170/200=85%
Specificity: 340/400=85%
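To compare tests A and B numerically, a small sketch can compute Sen, Spe, Youden's index and accuracy from each table. The helper `summarize` is hypothetical; rows labelled "Cancer" are treated as positive test results.

```python
def summarize(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Sen, Spe, Youden's index and accuracy from the cells of a 2x2 table."""
    sen = tp / (tp + fn)
    spe = tn / (fp + tn)
    return {
        "Sen": sen,
        "Spe": spe,
        "J": sen + spe - 1,
        "Acc": (tp + tn) / (tp + fp + fn + tn),
    }


test_a = summarize(tp=160, fp=40, fn=40, tn=360)
test_b = summarize(tp=170, fp=60, fn=30, tn=340)
for label, result in (("A", test_a), ("B", test_b)):
    print(label, {k: round(v, 3) for k, v in result.items()})
```

Both tests have the same Youden's index (0.70), so the summary index alone does not settle the question: test B misses fewer cancers (higher Sen), whereas test A gives fewer false positives (higher Spe) and slightly higher accuracy (86.7% versus 85.0%). Which test is better depends on whether a missed diagnosis or a misdiagnosis has the more serious consequence.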
(3) Accuracy (Acc)
• The proportion of all subjects, with and without the disease, whose test results are correct.
• Acc=(a+d)/n=(215+114)/360=91.4%
(4) Youden’s index
• The difference between the true positive rate (Sen) and the false positive rate (1-Spe).
• J=Sen-(1-Spe)
• Ranges from 0 to 1 for any test that performs better than chance; the closer J is to 1, the more accurate the diagnostic test.
• In Example 2, J=0.935-(1-0.877)=0.812
(5) Predictive values (posttest probability)
• Positive predictive value (+PV): the proportion of those with a positive test who have the disease.
• +PV=a/(a+b)=215/231=0.931
• Negative predictive value (-PV): the proportion of those with a negative test who do not have the disease.
• -PV=d/(c+d)=114/129=0.884
(6) Prevalence (pretest probability)
• The proportion of those with the disease in the population.
• Prevalence may vary widely between populations.
• Prev=(a+c)/n=230/360=0.639=63.9%
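• Based on Bayes’ theorem of conditional probability, the predictive values can be written in terms of Sen, Spe and Prev:
$+PV = \dfrac{Sen \times Prev}{Sen \times Prev + (1-Spe)(1-Prev)}$
$-PV = \dfrac{Spe \times (1-Prev)}{Spe \times (1-Prev) + (1-Sen) \times Prev}$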
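Because the predictive values depend on prevalence, they can be recomputed for any population once Sen and Spe are known, using +PV = Sen·Prev / [Sen·Prev + (1−Spe)(1−Prev)] and −PV = Spe(1−Prev) / [Spe(1−Prev) + (1−Sen)Prev]. A minimal sketch (the function name `predictive_values` is illustrative); with the prevalence taken from Table 2 itself it returns exactly a/(a+b) and d/(c+d), which serves as a check.

```python
def predictive_values(sen: float, spe: float, prev: float) -> tuple[float, float]:
    """+PV and -PV from sensitivity, specificity and prevalence (Bayes' theorem)."""
    ppv = sen * prev / (sen * prev + (1 - spe) * (1 - prev))
    npv = spe * (1 - prev) / (spe * (1 - prev) + (1 - sen) * prev)
    return ppv, npv


# Table 2: Sen = 215/230, Spe = 114/130, Prev = 230/360
ppv, npv = predictive_values(sen=215 / 230, spe=114 / 130, prev=230 / 360)
print(round(ppv, 3), round(npv, 3))

# Same test in a population with 10% prevalence
ppv_10, npv_10 = predictive_values(sen=215 / 230, spe=114 / 130, prev=0.10)
print(round(ppv_10, 3), round(npv_10, 3))
```

At 10% prevalence the sketch gives +PV ≈ 0.458 and −PV ≈ 0.992. The general-hospital table of Example 3 gives +PV = 46.4% because its counts imply a specificity of 1822/2070 ≈ 0.880 rather than 0.877.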
4 Application and clinical significance
Stability of the indices
• Stable indices: sensitivity, specificity, +LR, -LR
• Relatively stable index: accuracy
• Unstable indices: +PV, -PV
Figure 5 Relationship between prevalence, Sen, Spe and PPV (labels: prevalence, PPV, Sen/Spe)
• Prevalence ↑, PPV ↑
• Prevalence ↑, NPV ↓
Figure 6 Illustration of the relationship between predictive values and prevalence (x-axis: prevalence; y-axis: PV; curves: +PV and -PV)
Example 3: Predictive values under different prevalence of MI
• CPK test to diagnose MI in the ICU (Pre=64%)
                  MI       No MI     Total
CPK + (≥80)       215      16        231
CPK − (<80)       15       114       129
Total             230      130       360
Sen=93.5%, Spe=87.7%, +PV=93.1%, -PV=88.4%, +LR=7.6, -LR=0.07
• CPK test to diagnose MI in a general hospital (Pre=10%)
                  MI       No MI     Total
CPK + (≥80)       215      248       463
CPK − (<80)       15       1822      1837
Total             230      2070      2300
Sen=93.5%, Spe=87.7%, +PV=46.4%, -PV=99.2%, +LR=7.6, -LR=0.07
Likelihood ratio and its application
• Positive likelihood ratio (+LR)
  • The ratio of the true positive rate to the false positive rate: +LR=Sen/(1-Spe)=7.6
  • The larger the value, the stronger the ability of the test to confirm the disease.
• Negative likelihood ratio (-LR)
  • The ratio of the false negative rate to the true negative rate: -LR=(1-Sen)/Spe=0.07
  • The smaller the value, the stronger the ability of the test to exclude the disease.
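$\text{Posttest odds} = \text{Pretest odds} \times LR$
$\text{Pretest odds} = \dfrac{\text{Pretest probability}}{1 - \text{Pretest probability}}, \qquad \text{Posttest probability} = \dfrac{\text{Posttest odds}}{1 + \text{Posttest odds}}$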
CPK test (U)     MI: Yes    MI: No     LR
≥280             97         1          (97/230)/(1/130)=55
80–279           118        15         (118/230)/(15/130)=4.4
40–79            13         26         (13/230)/(26/130)=0.3
1–39             2          88         (2/230)/(88/130)=0.01
Total            230        130
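Each LR in this table is the proportion of MI patients whose CPK falls in a stratum divided by the proportion of non-MI subjects in the same stratum. A sketch that recomputes the column (names such as `STRATA` are illustrative):

```python
# CPK stratum (U) -> (MI cases, non-MI subjects)
STRATA = {
    ">=280": (97, 1),
    "80-279": (118, 15),
    "40-79": (13, 26),
    "1-39": (2, 88),
}

total_mi = sum(mi for mi, _ in STRATA.values())           # 230
total_no_mi = sum(no_mi for _, no_mi in STRATA.values())  # 130

lrs = {}
for stratum, (mi, no_mi) in STRATA.items():
    # P(result in this stratum | MI) / P(result in this stratum | no MI)
    lrs[stratum] = (mi / total_mi) / (no_mi / total_no_mi)
    print(f"{stratum:>6} U: LR = {lrs[stratum]:.2f}")
```

Stratum-specific LRs keep more information than a single ≥80 U cut-off: a CPK above 280 U is far stronger evidence for MI (LR ≈ 55) than one between 80 and 279 U (LR ≈ 4.4).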
• Example 4: A 60-year-old male patient has a CPK level of 120 U. Estimate the probability that he has MI.
– Based on the clinical information, the pretest probability of MI is estimated to be 60%.
– LR=4.4 for CPK=120 U (80–279 U stratum)
Pretest odds=0.6/(1-0.6)=1.5
Posttest odds=1.5×4.4=6.6
Posttest probability=6.6/(1+6.6)=0.868
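The three steps of Example 4 (probability to odds, multiply by the LR, odds back to probability) can be wrapped in one function. A minimal sketch; `posttest_probability` is an illustrative name.

```python
def posttest_probability(pretest_prob: float, lr: float) -> float:
    """Update a pretest probability with a likelihood ratio via odds."""
    pretest_odds = pretest_prob / (1 - pretest_prob)
    posttest_odds = pretest_odds * lr
    return posttest_odds / (1 + posttest_odds)


# Example 4: pretest probability 60%, LR = 4.4 for CPK = 120 U
print(round(posttest_probability(0.60, 4.4), 3))
```

An LR of 1 leaves the probability unchanged, which is a quick sanity check on the conversion.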
ROC curve
• Receiver operating characteristic curve, originally developed in the early days of radar to distinguish real signals from noise.
• Widely used to assess the accuracy of a diagnostic test.
• ROC curves display the trade-off between sensitivity and specificity across multiple cut-off points of a diagnostic test.
Example 5: a diagnostic test of CPK level for MI
CPK level (U)    MI group    No MI group
480              35          0
440              8           0
400              7           0
360              15          0
320              19          0
280              13          1
240              18          1
200              19          1
160              21          0
120              30          5
80               30          8
40               13          26
1                2           88
Total            230         130
• Cut-off ≥480: a=35, b=0, c=195, d=130; Sen=35/230=15.2%, Spe=130/130=100.0%
• Cut-off ≥280: a=97, b=1, c=133, d=129; Sen=97/230=42.2%, Spe=129/130=99.2%
• Cut-off ≥80: a=215, b=16, c=15, d=114; Sen=215/230=93.5%, Spe=114/130=87.7%
Using 1, 40, 80, 280 and 480 U of CPK as cut-off points for diagnosing myocardial infarction (MI), the corresponding sensitivity and specificity are:
Cut-off    ≥480     ≥280     ≥80      ≥40      ≥1
Sen        15.2%    42.2%    93.5%    99.1%    100%
Spe        100%     99.2%    87.7%    67.7%    0%
Figure 7 ROC curve for CPK diagnostic test of MI (x-axis: 1-Specificity; y-axis: Sensitivity)
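The ROC curve is obtained by treating every CPK level as a candidate cut-off and plotting Sen against 1 − Spe. A sketch using the Example 5 frequency table, assuming each listed CPK level is the lower bound of its interval (so the rule is "CPK ≥ level is positive"); with that reading the cumulative counts reproduce the Sen and Spe given for ≥480, ≥280 and ≥80 U. The names `LEVELS`, `MI`, `NO_MI` and `roc_points` are illustrative.

```python
# Example 5: lower bound of each CPK interval (U), highest first,
# with the number of MI patients and non-MI subjects in that interval
LEVELS = [480, 440, 400, 360, 320, 280, 240, 200, 160, 120, 80, 40, 1]
MI = [35, 8, 7, 15, 19, 13, 18, 19, 21, 30, 30, 13, 2]
NO_MI = [0, 0, 0, 0, 0, 1, 1, 1, 0, 5, 8, 26, 88]


def roc_points(levels, cases, controls):
    """Return (cut-off, 1 - Spe, Sen) for the rule 'CPK >= cut-off is positive'."""
    n_cases, n_controls = sum(cases), sum(controls)
    points = []
    tp = fp = 0
    for level, n_case, n_control in zip(levels, cases, controls):
        tp += n_case     # cases at or above this cut-off test positive
        fp += n_control  # and so do controls at or above it
        points.append((level, fp / n_controls, tp / n_cases))
    return points


for level, fpr, sen in roc_points(LEVELS, MI, NO_MI):
    print(f">={level:>3} U: 1-Spe = {fpr:.3f}, Sen = {sen:.3f}")
```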
Area under ROC curve
• The area under the ROC curve (AUC) reflects the overall accuracy of a diagnostic test.
• For a useful test, AUC ranges from 0.5 to 1.0: a worthless test has AUC=0.5 and a perfect test has AUC=1.0.
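With only a few cut-offs, the AUC can be approximated by the trapezoidal rule: join the ROC points with straight lines and add up the trapezoids beneath them. The sketch below is one such approximation (an assumption, not a method given on the slides), applied to the five cut-offs of Example 5; the name `trapezoid_auc` is illustrative.

```python
def trapezoid_auc(points):
    """Area under an ROC curve by the trapezoidal rule.

    points: (1 - Spe, Sen) pairs; (0, 0) and (1, 1) are added if missing.
    """
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2  # area of one trapezoid
    return area


# (1 - Spe, Sen) at cut-offs >=480, >=280, >=80, >=40 and >=1 U (Example 5)
cpk_points = [
    (0 / 130, 35 / 230),
    (1 / 130, 97 / 230),
    (16 / 130, 215 / 230),
    (42 / 130, 228 / 230),
    (130 / 130, 230 / 230),
]
print(round(trapezoid_auc(cpk_points), 3))
```

These five cut-offs give an AUC of about 0.947; using every CPK interval, or a smooth fitted curve, would give a somewhat different value.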
Figure: ROC curves of a worthless test (lying on the chance line, AUC=0.5) and an ideal perfect test (AUC=1.0)
AUC and accuracy
AUC          Accuracy
0.5–0.7      Poor
0.7–0.9      Good
>0.9         Excellent
Figure: example ROC curves with (a) poor, (b) good and (c) excellent accuracy
• AUC can help decide which of two competing tests for the same target disease is the better one.
Figure 8 ROC curve for CPK and EKG diagnostic test of MI (curves: CPK, labelled CK, and EKG, with the chance line; x-axis: 1-Specificity; y-axis: Sensitivity)
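Combination of multiple diagnostic tests
• Parallel test (the combined result is positive if either test is positive)
A test    B test    A+B
+         −         +
−         +         +
+         +         +
−         −         −
$Sen = Sen_A + (1-Sen_A) \times Sen_B$
$Spe = Spe_A \times Spe_B$
• Reduces the missed-diagnosis (omission) rate.
• When prevalence is low, a parallel test can be used as the primary screening method.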
• Sen ↑, Spe ↓
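A sketch of the parallel-test formulas. They assume the two tests err independently given disease status, an assumption not stated on the slide, and the Sen/Spe values below are hypothetical. The function name `parallel` is illustrative.

```python
def parallel(sen_a: float, spe_a: float, sen_b: float, spe_b: float) -> tuple[float, float]:
    """Parallel testing: positive if either test is positive.

    Assumes conditional independence of the two tests given disease status.
    """
    sen = sen_a + (1 - sen_a) * sen_b  # missed only if both tests miss
    spe = spe_a * spe_b                # negative only if both tests are negative
    return sen, spe


# Hypothetical tests: A (Sen 0.80, Spe 0.90) and B (Sen 0.70, Spe 0.95)
sen, spe = parallel(0.80, 0.90, 0.70, 0.95)
print(round(sen, 3), round(spe, 3))  # Sen rises, Spe falls
```

Real tests are often correlated: with the urine and blood glucose results of Example 6 the formula would predict a parallel Sen of about 91.6%, whereas the observed value is 82.4%.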
• Serial test (the combined result is positive only if both tests are positive)
A test    B test    A+B
+         +         +
(any other combination gives a negative combined result)
$Sen = Sen_A \times Sen_B$
$Spe = Spe_A + (1-Spe_A) \times Spe_B$
• Suitable when a misdiagnosis (false positive) may cause harm
• Used for confirmatory diagnosis
• Sen ↓, Spe ↑
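The serial-test formulas can be sketched the same way, again assuming conditional independence of the two tests (not stated on the slide) and using hypothetical Sen/Spe values; `serial` is an illustrative name.

```python
def serial(sen_a: float, spe_a: float, sen_b: float, spe_b: float) -> tuple[float, float]:
    """Serial testing: positive only if both tests are positive.

    Assumes conditional independence of the two tests given disease status.
    """
    sen = sen_a * sen_b                # detected only if both tests detect
    spe = spe_a + (1 - spe_a) * spe_b  # a false positive on A can be corrected by B
    return sen, spe


# Hypothetical tests: A (Sen 0.80, Spe 0.90) and B (Sen 0.70, Spe 0.95)
sen, spe = serial(0.80, 0.90, 0.70, 0.95)
print(round(sen, 3), round(spe, 3))  # Sen falls, Spe rises
```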
Example 6: Combined tests for diagnosing diabetes using urine glucose and blood glucose tests
Urine glucose   Blood glucose   Diabetes: Yes   Diabetes: No   Parallel test result   Serial test result
+               −               14              10             +                      −
−               +               33              11             +                      −
+               +               117             21             +                      +
−               −               35              7599           −                      −
Total                           199             7641
Table 4 Results of single and combined tests for diabetes
Diagnostic test   Result   Diabetes   No diabetes   Sen (%)   Spe (%)   False − (%)   False + (%)
Urine glucose     +        131        31            65.8      99.6      34.2          0.4
                  −        68         7610
Blood glucose     +        150        32            75.4      99.6      24.6          0.4
                  −        49         7609
Serial test       +        117        21            58.8      99.7      41.2          0.3
                  −        82         7620
Parallel test     +        164        42            82.4      99.5      17.6          0.5
                  −        35         7599
Total                      199        7641
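Table 4 can be rebuilt from the cross-classified counts of Example 6 by applying each decision rule (single test, serial, parallel) to the four result combinations. A sketch; the names `COUNTS` and `RULES` are illustrative.

```python
# Example 6: (urine glucose, blood glucose) -> (diabetes, no diabetes)
COUNTS = {
    ("+", "-"): (14, 10),
    ("-", "+"): (33, 11),
    ("+", "+"): (117, 21),
    ("-", "-"): (35, 7599),
}

# Decision rules: when is the combined result positive?
RULES = {
    "Urine glucose": lambda urine, blood: urine == "+",
    "Blood glucose": lambda urine, blood: blood == "+",
    "Serial test": lambda urine, blood: urine == "+" and blood == "+",
    "Parallel test": lambda urine, blood: urine == "+" or blood == "+",
}

total_dm = sum(dm for dm, _ in COUNTS.values())  # 199
total_no = sum(no for _, no in COUNTS.values())  # 7641

results = {}
for name, is_positive in RULES.items():
    tp = sum(dm for key, (dm, _) in COUNTS.items() if is_positive(*key))
    fp = sum(no for key, (_, no) in COUNTS.items() if is_positive(*key))
    sen = tp / total_dm
    spe = (total_no - fp) / total_no
    results[name] = (tp, fp, sen, spe)
    print(f"{name:<14} +: {tp:>3} {fp:>3}  Sen {sen:.1%}  Spe {spe:.1%}")
```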
Figure 9 Test and treatment thresholds in the diagnostic test process