ROC Analysis
Emily Kistner-Griffin, PhD
Amy Wahlquist, MS
Cancer Prevention and Control Statistics Tutorial
August 13, 2009
Outline
I. Motivating Example: Chest CT
II. Classification
III. Sensitivity and Specificity
IV. ROC curve and AUC estimation
    a. Nonparametric Curve
    b. Parametric Curve
V. ROC and Logistic Regression
VI. Comparing ROC curves
I. Motivating Example: Chest CT
Evaluating the probability of malignancy in pulmonary nodules seen on chest CT in 213 MUSC patients from two cohorts
Sample of 194 subjects seen in pulmonary clinic and 19 subjects with CT previous to an unrelated surgical intervention
Develop a prediction model from clinical data and radiological characteristics of lung nodules
Chest CT
A model of P (malignancy) of pulmonary nodules has been described in the literature (Swensen SJ et al., 1997)
Model included three demographic characteristics: patient age, smoking status (ever vs. never), any history of cancer
Model included three radiological characteristics: diameter, upper lobe location, and spiculation
Chest CT
Swensen et al. reported an area under the receiver operating characteristic (ROC) curve of 0.8014 ± 0.0360 in a validation sample, using a logistic regression approach.
Interested in how well Swensen’s model performs in the MUSC cohort.
Interested in evaluating whether we can improve the prediction model by including other patient characteristics
II. Classification
• Consider medical tests that are measured on a continuous or ordinal scale
• Goal: to describe the performance of the medical test in classifying subjects into individuals with and without disease
• Examples: PSA and CA-125 as biomarkers of prostate and ovarian cancer; BI-RADS for breast imaging (radiologist determined probability of malignancy)
Classification from CT
• Consider the diameter of the nodule as measured on the CT scan (range: 3.3mm-15mm)
• Larger nodules are more likely to be malignant (OR: 1.34, 95% CI: 1.20-1.49)
• How well can we predict malignancy from nodule diameter?
Classification Tables
• Choose a cut-point on continuous or ordinal scale in order to assign disease status
                  Truth D=1   Truth D=0
Classified D=1       TP          FP
Classified D=0       FN          TN
III. Sensitivity & Specificity
• For a selected cut-point, determine the sensitivity and specificity of the medical test (or prediction model)
• Sensitivity = Pr ( classified D=1 | truth D=1 ) = TP / (TP+FN) = TPF (true positive fraction)
• Specificity = Pr ( classified D=0 | truth D=0 ) = TN / (TN+FP) = TNF (true negative fraction)
• To summarize test characteristics, sensitivity and specificity must be computed at multiple cut-points
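The definitions above can be sketched in a few lines of Python. This is a minimal illustration, assuming that a result at or above the cut-point is classified as diseased; the function name `sens_spec` and the toy data are hypothetical and are not the MUSC cohort.

```python
def sens_spec(values, diseased, cutpoint):
    """Classify value >= cutpoint as positive; return (sensitivity, specificity)."""
    tp = fn = fp = tn = 0
    for value, is_diseased in zip(values, diseased):
        positive = value >= cutpoint
        if is_diseased and positive:
            tp += 1
        elif is_diseased:
            fn += 1
        elif positive:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical toy data: nodule diameters (mm) and true disease status.
toy_diameter = [4, 6, 7, 9, 10, 12, 13, 14]
toy_malignant = [0, 0, 1, 0, 1, 0, 1, 1]

for cut in (6, 8, 10, 12):
    se, sp = sens_spec(toy_diameter, toy_malignant, cut)
    print(f"cut-point {cut}: sensitivity={se:.3f} specificity={sp:.3f}")
```

Raising the cut-point in the loop trades sensitivity for specificity, which is why a single cut-point cannot summarize the test.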
Sensitivity & Specificity Example
Cut-point Sensitivity Specificity
6 0.972 0.291
8 0.833 0.504
10 0.653 0.709
12 0.458 0.830
From Metz CE (1978) Basic Principles of ROC Analysis. Seminars in Nuclear Medicine; 8 (4): 283 – 297.
Decision Threshold
• Lowering the threshold increases the TPF (sensitivity) and the FPF (1-specificity)
• Raising the threshold decreases the TPF and the FPF
• Points representing all possible TPF and FPF lie on a curve – passing through the lower (0,0) corner when all tests are called negative and the upper (1,1) corner when all the tests are called positive
• If the test is informative then all other points on the curve must be above the diagonal (TP more likely than FP)
• The curve describing the compromises between TPF and FPF is called the ROC curve
[Figure: ROC curve for nodule diameter. y-axis: Sensitivity; x-axis: 1 - Specificity. Area under ROC curve = 0.7411]
Detailed report of Sensitivity and Specificity
------------------------------------------------------------------------------
                                           Correctly
Cutpoint      Sensitivity   Specificity   Classified          LR+          LR-
------------------------------------------------------------------------------
( >= 3.3 )        100.00%         0.00%       33.80%       1.0000
( >= 4 )          100.00%         1.42%       34.74%       1.0144       0.0000
( >= 5 )           97.22%        13.48%       41.78%       1.1236       0.2061
( >= 6 )           97.22%        29.08%       52.11%       1.3708       0.0955
( >= 7 )           93.06%        39.72%       57.75%       1.5436       0.1749
( >= 8 )           83.33%        50.35%       61.50%       1.6786       0.3310
( >= 9.1 )         70.83%        61.70%       64.79%       1.8495       0.4727
( >= 10 )          65.28%        70.92%       69.01%       2.2449       0.4896
( >= 11 )          56.94%        78.72%       71.36%       2.6764       0.5469
( >= 12 )          45.83%        82.98%       70.42%       2.6927       0.6528
( >= 13 )          25.00%        90.78%       68.54%       2.7115       0.8262
( >= 14 )          13.89%        95.04%       67.61%       2.7976       0.9061
( >= 15 )           0.00%        98.58%       65.26%       0.0000       1.0144
( > 15 )            0.00%       100.00%       66.20%                    1.0000
------------------------------------------------------------------------------
. roctab malignant diameter, detail graph

                      ROC                    -Asymptotic Normal--
           Obs       Area     Std. Err.      [95% Conf. Interval]
         --------------------------------------------------------
           213     0.7411       0.0347        0.67317     0.80900
Likelihood Ratios
LR+ = sensitivity / (1-specificity) = TPF / FPF
LR- = (1-sensitivity) / specificity = (1-TPF) / (1-FPF)
LR+ is the slope between the origin and the point on the ROC curve, and LR- is the slope between the point on the curve and the (1,1) point (Choi 1998)
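As a quick check of these formulas, the sketch below recomputes both likelihood ratios from the sensitivity and specificity reported at the ( >= 10 ) cut-point in the detailed report (65.28% and 70.92%). The function name `likelihood_ratios` is illustrative only, and the function assumes 0 < specificity < 1 so that neither denominator is zero.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) = (TPF / FPF, (1 - TPF) / (1 - FPF))."""
    tpf = sensitivity
    fpf = 1.0 - specificity
    lr_pos = tpf / fpf
    lr_neg = (1.0 - tpf) / (1.0 - fpf)
    return lr_pos, lr_neg

# Sensitivity and specificity at the ( >= 10 ) cut-point of the detailed report.
lr_pos, lr_neg = likelihood_ratios(0.6528, 0.7092)
print(f"LR+ = {lr_pos:.4f}, LR- = {lr_neg:.4f}")
```

The results agree with the reported LR+ of 2.2449 and LR- of 0.4896 up to the rounding of the inputs.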
IV. ROC curve and AUC estimation
• ROC: Receiver Operating Characteristic
• Developed in signal detection theory to illustrate how the receiver distinguishes signal from noise (1960s)
• Illustration of two test characteristics: sensitivity and specificity at selected cut-points (decision thresholds)
• Popularized in medical testing in the field of Radiology (1980s)
ROC curve and Thresholds
• ROC curve describes disease detection independent of disease prevalence (as do sensitivity and specificity)
• Prevalence may help determine the operating threshold:
– Low prevalence suggests reducing FPF (higher specificity, higher threshold, lower part of the curve)
– High prevalence suggests increasing TPF (higher sensitivity, lower threshold, higher part of the curve)
• In practice, must consider costs and consequences of FP and FN before selecting the desirable cut-off:
– Consequence of FN: death?
– Consequence of FP: stressful, costly work-up or treatment
Area Under the ROC Curve
• Summarizes the performance of the test
• Probability that the result of the test for a randomly selected abnormal subject will be greater than the result of the test for a randomly selected normal subject
• Average TPF: averaged across whole range of FPF in (0,1)
• Perfect test gives AUC = 1.0 and an uninformative test gives AUC=0.50
• Parametric and non-parametric approaches to constructing the ROC curve and calculating the area under the curve (AUC)
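The probability interpretation of the AUC can be made concrete by comparing every diseased subject with every non-diseased subject. The sketch below assumes that ties count one half; the function name `auc_pairwise` and the toy values are hypothetical and do not come from the MUSC data.

```python
def auc_pairwise(diseased, healthy):
    """Fraction of (diseased, healthy) pairs in which the diseased result is
    larger, counting ties as one half."""
    total = 0.0
    for x in diseased:
        for y in healthy:
            if x > y:
                total += 1.0
            elif x == y:
                total += 0.5
    return total / (len(diseased) * len(healthy))

# Hypothetical test results for four diseased and four non-diseased subjects.
toy_diseased = [7, 10, 13, 14]
toy_healthy = [4, 6, 9, 12]
print(auc_pairwise(toy_diseased, toy_healthy))  # 13 of 16 pairs are ordered correctly
```

A test that separates the groups perfectly gives 1.0 under this definition, and a test whose results are identical in both groups gives 0.5.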
a. Nonparametric ROC Curve
• Constructed by plotting sensitivity and (1 – specificity) at each possible cut-point
• Area under the curve (AUC) constructed using the trapezoidal rule
• Variance estimators have been derived by DeLong et al. (1988), Hanley and McNeil (1982), and Bamber (1975)
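A minimal sketch of the nonparametric construction follows: it sweeps every observed cut-point, records (1 - specificity, sensitivity), and applies the trapezoidal rule. The function names and toy data are hypothetical, and a result at or above the cut-point is assumed to be called positive.

```python
def roc_points(values, diseased):
    """Empirical ROC points (FPF, TPF), ordered from (0, 0) to (1, 1)."""
    n_pos = sum(diseased)
    n_neg = len(diseased) - n_pos
    points = [(0.0, 0.0)]  # cut-point above the maximum: nothing called positive
    for cut in sorted(set(values), reverse=True):
        tp = sum(1 for v, d in zip(values, diseased) if d and v >= cut)
        fp = sum(1 for v, d in zip(values, diseased) if not d and v >= cut)
        points.append((fp / n_neg, tp / n_pos))
    return points

def trapezoid_auc(points):
    """Area under the piecewise-linear curve through the ordered points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# Hypothetical toy data: test values and true disease status.
toy_values = [4, 6, 7, 9, 10, 12, 13, 14]
toy_status = [0, 0, 1, 0, 1, 0, 1, 1]
toy_points = roc_points(toy_values, toy_status)
print(trapezoid_auc(toy_points))
```

On these toy data the trapezoidal area is 0.8125, which equals the share of diseased/non-diseased pairs that the test orders correctly.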
Variance of AUC
• Specifically for Delong et al. (1988) variance estimate:
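Let $X_i$, $i = 1, \dots, m$, be the test results for the $m$ subjects with disease and $Y_j$, $j = 1, \dots, n$, the test results for the $n$ subjects without disease.

\[
\hat{\theta} = \frac{1}{mn} \sum_{j=1}^{n} \sum_{i=1}^{m} \psi(X_i, Y_j),
\quad \text{where } \psi(X, Y) =
\begin{cases}
1 & \text{if } Y < X \\
0.5 & \text{if } Y = X \\
0 & \text{if } Y > X
\end{cases}
\]

\[
V_{10}(X_i) = \frac{1}{n} \sum_{j=1}^{n} \psi(X_i, Y_j)
\quad \text{and} \quad
V_{01}(Y_j) = \frac{1}{m} \sum_{i=1}^{m} \psi(X_i, Y_j)
\]

\[
S_{10} = \frac{1}{m-1} \sum_{i=1}^{m} \{ V_{10}(X_i) - \hat{\theta} \}^2
\quad \text{and} \quad
S_{01} = \frac{1}{n-1} \sum_{j=1}^{n} \{ V_{01}(Y_j) - \hat{\theta} \}^2
\]

\[
\widehat{\mathrm{var}}(\hat{\theta}) = \frac{1}{m} S_{10} + \frac{1}{n} S_{01}
\]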
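The DeLong et al. (1988) estimate can be sketched in plain Python as follows. Each diseased subject receives the fraction of non-diseased results below its own (V10), each non-diseased subject the fraction of diseased results above its own (V01), and the variance combines the sample variances of the two sets. The function names and toy data are hypothetical; this is an illustration, not Stata's implementation.

```python
def psi(x, y):
    """Kernel: 1 if Y < X, 0.5 if tied, 0 if Y > X."""
    if y < x:
        return 1.0
    if y == x:
        return 0.5
    return 0.0

def delong_auc_variance(diseased, healthy):
    """Return (AUC estimate, DeLong variance estimate)."""
    m, n = len(diseased), len(healthy)
    v10 = [sum(psi(x, y) for y in healthy) / n for x in diseased]
    v01 = [sum(psi(x, y) for x in diseased) / m for y in healthy]
    theta = sum(v10) / m                      # equals the mean of v01 as well
    s10 = sum((v - theta) ** 2 for v in v10) / (m - 1)
    s01 = sum((v - theta) ** 2 for v in v01) / (n - 1)
    return theta, s10 / m + s01 / n

# Hypothetical test results for four diseased and four non-diseased subjects.
theta_hat, var_hat = delong_auc_variance([7, 10, 13, 14], [4, 6, 9, 12])
print(f"AUC = {theta_hat:.4f}, SE = {var_hat ** 0.5:.4f}")
```

The square root of the returned variance plays the role of the "Std. Err." column in the `roctab` output.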
Confidence Intervals for AUC
• Must consider distribution of AUC estimate: asymptotically normal or binomial assumption
• Must select standard error estimate (Delong et al. approach is the default):

                      ROC                    -Asymptotic Normal--
           Obs       Area     Std. Err.      [95% Conf. Interval]
         --------------------------------------------------------
           213     0.7411       0.0347        0.67317     0.80900

. roctab malignant diameter, binomial

                      ROC                    -- Binomial Exact --
           Obs       Area     Std. Err.      [95% Conf. Interval]
         --------------------------------------------------------
           213     0.7411       0.0347        0.67754     0.79916
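The asymptotic normal interval can be reproduced approximately from the reported area and standard error as estimate ± z × SE. The sketch below uses only the Python standard library; the helper name `normal_ci` is illustrative. Because the inputs are rounded to four decimals, the result matches the Stata interval only to about three decimals.

```python
from statistics import NormalDist

def normal_ci(estimate, std_err, level=0.95):
    """Two-sided normal-approximation confidence interval."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for level=0.95
    return estimate - z * std_err, estimate + z * std_err

# Area and standard error as reported by roctab for nodule diameter.
ci_low, ci_high = normal_ci(0.7411, 0.0347)
print(f"[{ci_low:.4f}, {ci_high:.4f}]")
```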
b. Parametric ROC Curve
• Assumes a binormal model
• A monotone transformation of the test results exists to give results that are normally distributed in the diseased and non-diseased populations
• Method involves fitting a straight line to the empirical ROC points by plotting using normal probability scales on each axis (plot inverse of the standard normal cumulative distribution function for sensitivity and specificity)
• Intercept of the line is the standardized difference in the continuous variable between the two populations; slope is a ratio of the standard deviations
Parametric AUC Estimation
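AUC is a function of the slope and intercept of the estimated line, using the standard normal cumulative distribution function $\Phi$:

\[
\text{Let } a = \frac{\mu_D - \mu_{\bar{D}}}{\sigma_D} \text{ and } b = \frac{\sigma_{\bar{D}}}{\sigma_D}; \text{ then }
\mathrm{AUC} = \Phi\left( \frac{a}{\sqrt{1 + b^2}} \right)
\]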
Nonparametric vs. Parametric
• Parametric approaches assume a binormal distribution to make inferences (obtain the MLE): the estimators are unbiased only when the assumption is true
• With continuous data a nonparametric approach is recommended
• With discrete ratings a parametric approach is recommended as nonparametric approaches tend to underestimate the true AUC
• Note standard error of the AUC is smaller using a continuous scale
. rocfit malignant diameter, cont(10)
Fitting binormal model:
Binormal model of malignant on diameter           Number of obs   =        213
Goodness-of-fit chi2(7) =      8.52
Prob > chi2             =    0.2894
Log likelihood          = -456.16837

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   intercept |   0.997803   0.181857     5.49   0.000     0.641370    1.354236
   slope (*) |   1.170680   0.139487     1.22   0.221     0.897290    1.444070
-------------+----------------------------------------------------------------
       /cut1 |  -1.296367   0.141750    -9.15   0.000    -1.574192   -1.018542
       /cut2 |  -0.668255   0.110960    -6.02   0.000    -0.885733   -0.450777
       /cut3 |  -0.222392   0.102919    -2.16   0.031    -0.424110   -0.020674
       /cut4 |   0.202507   0.101135     2.00   0.045     0.004286    0.400729
       /cut5 |   0.499186   0.103559     4.82   0.000     0.296214    0.702159
       /cut6 |   0.756664   0.109249     6.93   0.000     0.542539    0.970788
       /cut7 |   1.040925   0.119741     8.69   0.000     0.806237    1.275614
       /cut8 |   1.541544   0.150124    10.27   0.000     1.247307    1.835781
       /cut9 |   2.369036   0.244933     9.67   0.000     1.888975    2.849096
------------------------------------------------------------------------------
------------------------------------------------------------------------------
             |                   Indices from binormal fit
       Index |   Estimate   Std. Err.                     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    ROC area |   0.741532   0.034471                      0.673970    0.809094
    delta(m) |   0.852328   0.144007                      0.570080    1.134576
        d(e) |   0.919346   0.151542                      0.622329    1.216364
        d(a) |   0.916517   0.150751                      0.621050    1.211985
------------------------------------------------------------------------------
(*) z test for slope==1
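The "ROC area" index in this output can be recovered from the fitted intercept and slope. The sketch below assumes the usual binormal result, AUC = Φ(a / sqrt(1 + b²)), with a the intercept and b the slope; the function name `binormal_auc` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def binormal_auc(intercept, slope):
    """AUC implied by a binormal ROC line with the given intercept and slope."""
    return NormalDist().cdf(intercept / sqrt(1.0 + slope * slope))

# Intercept and slope taken from the rocfit output above.
fitted_auc = binormal_auc(0.997803, 1.170680)
print(f"{fitted_auc:.4f}")
```

The value agrees with the reported ROC area of 0.741532 to within rounding of the printed coefficients.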
. rocplot, confband
[Figure: fitted binormal ROC curve with confidence band. y-axis: Sensitivity; x-axis: 1 - Specificity. Area under curve = 0.7415, se(area) = 0.0345]
V. ROC and Logistic Regression
• Prediction Model from Chest CT
• Use logistic regression to create probabilities of malignancy (represent diagnostic results from multiple predictors)
• Compare two logistic models of malignancy: one from the previous literature and one with variables selected from the MUSC data
• Variables suggested in Swensen SJ et al. + surgical cohort (variable describing collection of samples)
• Variables selected using backward selection in the MUSC data
. logistic malignant surgical_cohort patient_age any_non_lung_cancer_history lung_cancer_history smoker_ever diameter upper_lobe spiculated
Logistic regression                               Number of obs   =        207
                                                  LR chi2(8)      =      73.20
                                                  Prob > chi2     =     0.0000
Log likelihood = -94.454613                       Pseudo R2       =     0.2793

------------------------------------------------------------------------------
   malignant | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
surgical_c~t |   7.045799    4.91585     2.80   0.005     1.794929    27.65751
 patient_age |   .9933921   .0184868    -0.36   0.722     .9578115    1.030294
any_non_lu~y |   4.017493   1.537066     3.63   0.000     1.897978     8.50392
lung_cance~y |   10.43958   8.011157     3.06   0.002     2.319987     46.9765
 smoker_ever |   1.026627   .5138138     0.05   0.958     .3849437    2.737967
    diameter |   1.233463   .0787204     3.29   0.001     1.088433    1.397817
  upper_lobe |   1.483983   .5613942     1.04   0.297     .7069965    3.114874
  spiculated |   2.094564   .8488535     1.82   0.068     .9465232    4.635065
------------------------------------------------------------------------------
. predict swensen
. lsens, gensens(sensitivity) genspec(specificity) genpr(cutoffs)
[Figure: sensitivity and specificity plotted against the probability cutoff. y-axis: Sensitivity/Specificity; x-axis: Probability cutoff]
. lroc
[Figure: ROC curve for the fitted logistic model. y-axis: Sensitivity; x-axis: 1 - Specificity. Area under ROC curve = 0.8344]
Use saved predicted probabilities from logistic model:
. roctab malignant swensen
                      ROC                    -Asymptotic Normal--
           Obs       Area     Std. Err.      [95% Conf. Interval]
         --------------------------------------------------------
           207     0.8344       0.0294        0.77682     0.89203
. roctab malignant swensen, graph
Postestimation: 95% CI
Again use saved predicted probabilities from logistic model:
. roccomp malignant diameter swensen
                             ROC                    -Asymptotic Normal--
                  Obs       Area     Std. Err.      [95% Conf. Interval]
-------------------------------------------------------------------------
diameter          207     0.7351       0.0357        0.66518     0.80499
swensen           207     0.8344       0.0294        0.77682     0.89203
-------------------------------------------------------------------------
Ho: area(diameter) = area(swensen)
    chi2(1) =     9.52       Prob>chi2 =   0.0020
VI. Comparing ROC curves
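Testing AUC Equality
Using quantities defined by Delong et al. for variance estimation to define a chi-squared test statistic. For two tests $r$ and $s$ measured on the same subjects, with AUC estimates $\hat{\theta}^r$ and $\hat{\theta}^s$:

\[
s_{10}^{r,s} = \frac{1}{m-1} \sum_{i=1}^{m} \{ V_{10}^{r}(X_i) - \hat{\theta}^{r} \} \{ V_{10}^{s}(X_i) - \hat{\theta}^{s} \}
\]

\[
s_{01}^{r,s} = \frac{1}{n-1} \sum_{j=1}^{n} \{ V_{01}^{r}(Y_j) - \hat{\theta}^{r} \} \{ V_{01}^{s}(Y_j) - \hat{\theta}^{s} \}
\]

\[
S = \frac{1}{m} S_{10} + \frac{1}{n} S_{01}
\]

where $S_{10}$ and $S_{01}$ are the matrices with $(r,s)$ elements $s_{10}^{r,s}$ and $s_{01}^{r,s}$. For a contrast matrix $L$,

\[
(\hat{\theta} - \theta) L' \, [ L S L' ]^{-1} \, L (\hat{\theta} - \theta)' \sim \chi^2
\]

with degrees of freedom equal to the rank of $L S L'$.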
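For two tests measured on the same subjects, the contrast L = (1, -1) reduces the statistic to the squared AUC difference divided by its estimated variance, with one degree of freedom. The sketch below works under that assumption, using toy paired data and hypothetical function names; it does not reproduce Stata's `roccomp` code, and it assumes the estimated variance of the difference is positive.

```python
from math import erfc, sqrt

def psi(x, y):
    """Kernel: 1 if Y < X, 0.5 if tied, 0 if Y > X."""
    if y < x:
        return 1.0
    if y == x:
        return 0.5
    return 0.0

def components(diseased, healthy):
    """Return (AUC, V10 per diseased subject, V01 per non-diseased subject)."""
    m, n = len(diseased), len(healthy)
    v10 = [sum(psi(x, y) for y in healthy) / n for x in diseased]
    v01 = [sum(psi(x, y) for x in diseased) / m for y in healthy]
    return sum(v10) / m, v10, v01

def cross(u, u_mean, w, w_mean):
    """Sample cross-product of deviations with divisor (size - 1)."""
    return sum((a - u_mean) * (b - w_mean) for a, b in zip(u, w)) / (len(u) - 1)

def delong_two_tests(dis_r, hea_r, dis_s, hea_s):
    """Paired design: dis_r[i] and dis_s[i] belong to the same diseased subject,
    hea_r[j] and hea_s[j] to the same non-diseased subject."""
    m, n = len(dis_r), len(hea_r)
    th_r, v10_r, v01_r = components(dis_r, hea_r)
    th_s, v10_s, v01_s = components(dis_s, hea_s)

    def s(a10, a01, ta, b10, b01, tb):
        # One element of S = S10 / m + S01 / n
        return cross(a10, ta, b10, tb) / m + cross(a01, ta, b01, tb) / n

    var_r = s(v10_r, v01_r, th_r, v10_r, v01_r, th_r)
    var_s = s(v10_s, v01_s, th_s, v10_s, v01_s, th_s)
    cov_rs = s(v10_r, v01_r, th_r, v10_s, v01_s, th_s)
    var_diff = var_r + var_s - 2.0 * cov_rs   # L S L' for L = (1, -1)
    chi2 = (th_r - th_s) ** 2 / var_diff
    p_value = erfc(sqrt(chi2 / 2.0))          # upper tail of chi-squared, 1 df
    return th_r, th_s, chi2, p_value

# Hypothetical paired results for four diseased and four non-diseased subjects.
auc_r, auc_s, chi2_stat, p_val = delong_two_tests(
    [7, 10, 13, 14], [4, 6, 9, 12],   # test r
    [5, 3, 8, 6], [4, 2, 7, 1],       # test s
)
print(f"AUC r={auc_r:.4f} s={auc_s:.4f} chi2={chi2_stat:.4f} p={p_val:.4f}")
```

The covariance term is what distinguishes this paired comparison from treating the two AUCs as independent estimates.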
Models with Multiple Predictors
. logistic malignant diameter any_non_lung_cancer_history surgical_cohort lung_cancer_history pet_positive pack
Logistic regression                               Number of obs   =        206
                                                  LR chi2(6)      =     112.09
                                                  Prob > chi2     =     0.0000
Log likelihood = -75.983489                       Pseudo R2       =     0.4245

------------------------------------------------------------------------------
   malignant | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    diameter |   1.218243     .08847     2.72   0.007      1.05662    1.404588
any_non_lu~y |   3.830492   1.693158     3.04   0.002     1.610666    9.109691
surgical_c~t |   6.996053   5.682876     2.39   0.017     1.423719    34.37811
lung_cance~y |   10.16367   8.299078     2.84   0.005     2.051197    50.36092
pet_positive |   11.38458   5.025505     5.51   0.000      4.79259    27.04355
        pack |   1.007755   .0046908     1.66   0.097     .9986032    1.016991
------------------------------------------------------------------------------
. predict musc
. roccomp malignant diameter swensen musc, graph summary
[Figure: ROC curves for the three predictors with reference line. y-axis: Sensitivity; x-axis: 1-Specificity. Legend: diameter ROC area: 0.741; swensen ROC area: 0.8344; musc ROC area: 0.8987; Reference]
. roccomp malignant diameter swensen musc
                             ROC                    -Asymptotic Normal--
                  Obs       Area     Std. Err.      [95% Conf. Interval]
-------------------------------------------------------------------------
diameter          202     0.7410       0.0359        0.67062     0.81131
swensen           202     0.8344       0.0298        0.77605     0.89272
musc              202     0.8987       0.0230        0.85374     0.94372
-------------------------------------------------------------------------
Ho: area(diameter) = area(swensen) = area(musc)
    chi2(2) =    22.81       Prob>chi2 =   0.0000
. rocgold malignant swensen diameter musc
-------------------------------------------------------------------------------
                       ROC                                          Bonferroni
                      Area     Std. Err.       chi2    df   Pr>chi2    Pr>chi2
-------------------------------------------------------------------------------
swensen (standard)  0.8344       0.0298
diameter            0.7410       0.0359      8.2690     1    0.0040     0.0081
musc                0.8987       0.0230      8.6304     1    0.0033     0.0066
-------------------------------------------------------------------------------
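The two p-value columns can be checked by hand. The sketch below assumes the Bonferroni column is the unadjusted p-value multiplied by the number of comparisons against the standard (two here) and capped at 1, and it uses the identity that the upper tail of a chi-squared variable with 1 degree of freedom is erfc(sqrt(x / 2)). The function names are illustrative.

```python
from math import erfc, sqrt

def chi2_df1_pvalue(chi2):
    """Upper-tail probability of a chi-squared statistic with 1 df."""
    return erfc(sqrt(chi2 / 2.0))

def bonferroni(p_value, n_comparisons):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p_value * n_comparisons)

# chi2 statistics from the rocgold output: diameter and musc versus swensen.
p_diameter = chi2_df1_pvalue(8.2690)
p_musc = chi2_df1_pvalue(8.6304)
adj_diameter = bonferroni(p_diameter, 2)
adj_musc = bonferroni(p_musc, 2)
print(f"diameter: p={p_diameter:.4f} adjusted={adj_diameter:.4f}")
print(f"musc:     p={p_musc:.4f} adjusted={adj_musc:.4f}")
```

Both predictors remain significantly different from the Swensen model at the 0.05 level after the adjustment.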
b. Lorenz Curves
• ROC curve represents a monotone increasing function of the FPF (1-specificity)
• If the risk of disease does not vary monotonically with the diagnostic test then the ROC may not be convex
• Lee (1999) suggested a Lorenz curve (used commonly in economics) for such data
• The methodology involves reordering the test results to ensure that the ratio of disease subjects / no disease subjects in each category is increasing
• Must consider whether reordering makes practical sense (usually sensible on an ordinal scale but not necessarily on a continuous scale)
[Figure: Lorenz curve. y-axis: cumulative % of malignant=1; x-axis: cumulative % of malignant=0]
Defining Lorenz Curves
• Plot cumulative percent of individuals with disease against the cumulative percent of individuals without the disease at each cut-point
• Examples when a Lorenz curve might be appropriate:
– Test has similar means but different variances across populations with and without disease
– Bimodal distribution of test in either population
– Skewed distribution in population with disease and symmetric distribution in population without the disease
• A flatter Lorenz curve suggests a worse diagnostic test
• Two summary indices describe the curvature
– Gini index: twice the area between the Lorenz curve and the diagonal line
– Pietra index: twice the area of the largest triangle inscribed between the diagonal line and the curve
Lorenz Curves and ROC
. roctab malignant diameter, lorenz graph
. roctab malignant diameter, lorenz
Lorenz curve
---------------------------
Pietra index  =    0.2322
Gini index    =    0.3301
• If the at-risk probabilities increase (or decrease) with increasing values of the test results then Gini = 2(AUC)-1
• Larger Pietra and Gini indices describe better diagnostic tests
• Gini index is related to average difference in post-test probabilities for two randomly selected subjects and Pietra index is related to average absolute change between pre and post test probabilities of disease