Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic, Jackknife, Bootstrap and other Statistical Methodologies David G. Brown and Frank Samuelson Center for Devices and Radiological Health, FDA 6 July 2014

Page 1: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,

Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic, Jackknife,

Bootstrap and other Statistical Methodologies

David G. Brown and Frank Samuelson

Center for Devices and Radiological Health, FDA

6 July 2014

Page 2: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,

Course Outline

I. Performance measures for Computational Intelligence (CI) observers
1. Accuracy
2. Prevalence dependent measures
3. Prevalence independent measures
4. Maximization of performance: Utility analysis/Cost functions

II. Receiver Operating Characteristic (ROC) analysis
1. Sensitivity and specificity
2. Construction of the ROC curve
3. Area under the ROC curve (AUC)

III. Error analysis for CI observers
1. Sources of error
2. Parametric methods
3. Nonparametric methods
4. Standard deviations and confidence intervals

IV. Bootstrap methods
1. Theoretical foundation
2. Practical use

V. References

Page 3: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,

What’s the problem?

• Emphasis on algorithm innovation to the exclusion of performance assessment

• Use of subjective measures of performance – “beauty contest”

• Use of “accuracy” as a measure of success

• Lack of error bars: “My CIO is .01 better than yours (+/- ?)”

• Flawed methodology: training and testing on the same data

• Lack of appreciation for the many different sources of error that can be taken into account

Page 4: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,

Original image

Lena. Courtesy of the Signal and Image Processing Institute at the University of Southern California.

Page 5: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,

CI improved image

Baboon. Courtesy of the Signal and Image Processing Institute at the University of Southern California.

Page 6: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,

Panel of experts (image: funnymonkeysite.com)

Page 7: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,

I. Performance measures for computational intelligence (CI) observers

• Task based: (binary) discrimination task
  – Two populations involved: “normal” and “abnormal”

• Accuracy: intuitive but incomplete
  – Different consequences for success or failure for each population

• Some measures depend on the prevalence (Pr), some do not, where
  Pr = (Number of abnormals in population) / (Total number of all subjects in population)
  – Prevalence dependent: accuracy, positive predictive value, negative predictive value
  – Prevalence independent: sensitivity, specificity, ROC, AUC

• True optimization of performance requires knowledge of cost functions or utilities for successes and failures in both populations

Page 8: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,

How to make a CIO with >99% accuracy

• Medical problem: Screening mammography (“screening” means testing in an asymptomatic population)

• Prevalence of breast cancer in the screening population Pr = 0.5%

• My CIO always says “normal”

• Accuracy (Acc) is 99.5% (accuracy of accepted present-day systems ~75%)

• Accuracy in a diagnostic setting (Pr ~ 20%) is 80%, since Acc = 1 - Pr (for my CIO)

Page 9: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,

CIO operates on two different populations

[Figure: probability distributions p(t|0) for normal cases and p(t|1) for abnormal cases plotted along the t-axis, with the threshold at t = T]

Page 10: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,

Must consider effects on normal and abnormal populations separately

• CIO output t

• p(t|0): probability distribution of t for the population of normals

• p(t|1): probability distribution of t for the population of abnormals

• Threshold T: everything to the right of T is called abnormal, and everything to the left of T is called normal

• Area of p(t|0) to the left of T is the true negative fraction (TNF = specificity), and to the right the false positive fraction (FPF = type 1 error). TNF + FPF = 1

• Area of p(t|1) to the left of T is the false negative fraction (FNF = type 2 error), and to the right the true positive fraction (TPF = sensitivity). FNF + TPF = 1

• TNF, FPF, FNF, TPF are all prevalence independent, since each is some fraction of one of our two probability distributions

• Accuracy = Pr x TPF + (1-Pr) x TNF
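The four fractions and the accuracy formula above can be sketched numerically. This is only an illustration: the slide does not specify the shapes of p(t|0) and p(t|1), so unit-variance Gaussians with means 0 and 2 are assumed here, and the function names are hypothetical.

```python
import math


def gaussian_cdf(t, mean, sd):
    """Cumulative Gaussian distribution evaluated at t."""
    return 0.5 * (1.0 + math.erf((t - mean) / (sd * math.sqrt(2.0))))


def operating_point(threshold, mean0=0.0, sd0=1.0, mean1=2.0, sd1=1.0):
    """Fractions on each side of the threshold (Gaussian shapes are an assumption)."""
    tnf = gaussian_cdf(threshold, mean0, sd0)  # area of p(t|0) left of T
    fpf = 1.0 - tnf                            # area of p(t|0) right of T
    fnf = gaussian_cdf(threshold, mean1, sd1)  # area of p(t|1) left of T
    tpf = 1.0 - fnf                            # area of p(t|1) right of T
    return {"TNF": tnf, "FPF": fpf, "FNF": fnf, "TPF": tpf}


def accuracy(pr, tpf, tnf):
    """Accuracy = Pr x TPF + (1 - Pr) x TNF."""
    return pr * tpf + (1.0 - pr) * tnf


point = operating_point(threshold=1.0)
always_normal_accuracy = accuracy(pr=0.005, tpf=0.0, tnf=1.0)
```

With TPF = 0 and TNF = 1 the accuracy reduces to 1 - Pr, which is the "always says normal" CIO from the earlier slide.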

Page 11: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,

[Figure: distributions of normal cases and abnormal cases on the t-axis with threshold T. Abnormal cases: TPF (.95), FNF (.05). Normal cases: TNF (.5), FPF (.5).]

Page 12: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,

Prevalence dependent measures

• Accuracy (Acc) Acc = Pr x TPF + (1-Pr) x TNF

• Positive predictive value (PPV): fraction of positives that are true positives PPV = TPF x Pr / (TPF x Pr + FPF x (1-Pr))

• Negative predictive value (NPV): fraction of negatives that are true negatives
  NPV = TNF x (1-Pr) / (TNF x (1-Pr) + FNF x Pr)

• Using Pr = .05 and the previous TPF, TNF, FNF, FPF values:
  Pr = .05, TPF = .95, TNF = 0.5, FNF = .05, FPF = 0.5
  Acc = .05x.95 + .95x.5 = .52
  PPV = .95x.05/(.95x.05 + .5x.95) = .09
  NPV = .5x.95/(.5x.95 + .05x.05) = .995

Page 13: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,

Prevalence dependent measures

• Accuracy (Acc) Acc = Pr x TPF + (1-Pr) x TNF

• Positive predictive value (PPV): fraction of positives that are true positives PPV = TPF x Pr / (TPF x Pr + FPF x (1-Pr))

• Negative predictive value (NPV): fraction of negatives that are true negatives
  NPV = TNF x (1-Pr) / (TNF x (1-Pr) + FNF x Pr)

• Using the mammography screening Pr and previous TPF, TNF, FNF, FPF values:
  Pr = .005, TPF = .95, TNF = 0.5, FNF = .05, FPF = 0.5
  Acc = .005x.95 + .995x.5 = .50
  PPV = .95x.005/(.95x.005 + .5x.995) = .01
  NPV = .5x.995/(.5x.995 + .05x.005) = .9995
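The three prevalence dependent measures can be evaluated directly from Pr, TPF and TNF. The sketch below simply transcribes the formulas on this slide; the function name is an assumption made for illustration.

```python
def prevalence_dependent_measures(pr, tpf, tnf):
    """Return (Acc, PPV, NPV) for prevalence pr and operating point (tpf, tnf)."""
    fpf = 1.0 - tnf
    fnf = 1.0 - tpf
    acc = pr * tpf + (1.0 - pr) * tnf
    ppv = tpf * pr / (tpf * pr + fpf * (1.0 - pr))
    npv = tnf * (1.0 - pr) / (tnf * (1.0 - pr) + fnf * pr)
    return acc, ppv, npv


# Screening prevalence and operating point used on this slide.
acc, ppv, npv = prevalence_dependent_measures(pr=0.005, tpf=0.95, tnf=0.5)
```

Calling the function over a range of `pr` values reproduces the kind of curves plotted against prevalence.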

Page 14: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,

Acc, PPV, NPV as functions of prevalence (screening mammography)

• TPF = .95
• FNF = .05
• TNF = 0.5
• FPF = 0.5

Page 15: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,

Acc = NPV as function of prevalence (forced “normal” response CIO)

Page 16: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,

Prevalence independent measures

• Sensitivity = TPF

• Specificity = TNF = 1 – FPF

• Receiver Operating Characteristic (ROC) = TPF as a function of FPF (Sensitivity as a function of 1 – Specificity)

• Area under the ROC curve (AUC) = Sensitivity averaged over all values of Specificity

Page 17: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,

[Figure: distributions of normal / class 0 subjects and abnormal / class 1 subjects with a threshold, alongside the entire ROC curve (TPF, sensitivity versus FPF, 1-specificity); the ROC slope at the operating point is marked]

Page 18: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,

Empirical ROC data for mammography screening in the US

[Figure: empirical ROC plot. Axes: False Positive Fraction and True Negative Fraction (horizontal), True Positive Fraction and False Negative Fraction (vertical), each running from 0.0 to 1.0. Source: Craig Beam et al.]

Page 19: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,

Maximization of performance

• Need to know utilities or costs of each type of decision outcome, but these are very hard to estimate accurately. You don’t just maximize accuracy.

• Need prevalence

• For mammography example
  – TPF: prolongation of life minus treatment cost
  – FPF: diagnostic work-up cost, anxiety
  – TNF: peace of mind
  – FNF: delay in treatment => shortened life

• Hypothetical assignment of utilities for some decision threshold T:
  – Utility_T = U(TPF) x TPF x Pr + U(FPF) x FPF x (1-Pr) + U(TNF) x TNF x (1-Pr) + U(FNF) x FNF x Pr
  – U(TPF) = 100, U(FPF) = -10, U(TNF) = 4, U(FNF) = -20
  – Utility_T = 100 x .95 x .05 – 10 x .50 x .95 + 4 x .50 x .95 – 20 x .05 x .05 = 1.85

• Now if we only knew how to trade off TPF versus FPF, we could optimize (?) medical performance.
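The hypothetical utility calculation can be written as a small function. The sketch below uses the slide's own utilities and operating point; the function and argument names are assumptions for illustration.

```python
def expected_utility(pr, tpf, fpf, u_tp, u_fp, u_tn, u_fn):
    """Expected utility at one decision threshold, following the slide's formula."""
    tnf = 1.0 - fpf
    fnf = 1.0 - tpf
    return (u_tp * tpf * pr
            + u_fp * fpf * (1.0 - pr)
            + u_tn * tnf * (1.0 - pr)
            + u_fn * fnf * pr)


# Values from the hypothetical assignment: Pr = .05, TPF = .95, FPF = .50.
utility = expected_utility(pr=0.05, tpf=0.95, fpf=0.50,
                           u_tp=100, u_fp=-10, u_tn=4, u_fn=-20)
```

Evaluating this function at each point of an ROC curve is one way to locate the threshold with the highest expected utility.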

Page 20: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,

Utility maximization (mammography example)

Page 21: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,

Choice of ROC operating point through utility analysis—screening mammography

Page 22: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,

Utility maximization (mammography example)

Page 23: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,

Utility maximization calculation

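$$U = (U_{TPF}\,TPF + U_{FNF}\,FNF)\,PR + (U_{TNF}\,TNF + U_{FPF}\,FPF)(1-PR)$$

$$U = (U_{TPF}\,TPF + U_{FNF}(1-TPF))\,PR + (U_{TNF}(1-FPF) + U_{FPF}\,FPF)(1-PR)$$

$$\frac{dU}{dFPF} = (U_{FPF}-U_{TNF})(1-PR) + (U_{TPF}-U_{FNF})\,PR\,\frac{dTPF}{dFPF} = 0$$

$$\frac{dTPF}{dFPF} = \frac{(U_{TNF}-U_{FPF})(1-PR)}{(U_{TPF}-U_{FNF})\,PR}$$

PR = .005: dTPF/dFPF = 23

PR = .05: dTPF/dFPF = 2.2

(U_TPF = 100, U_FNF = -20, U_TNF = 4, U_FPF = -10)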
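The optimal ROC slope can be checked numerically. The sketch below assumes the utilities from the earlier hypothetical assignment, with U(FPF) = -10, since that value reproduces the quoted slopes of about 23 and 2.2. The function name is hypothetical.

```python
def optimal_roc_slope(pr, u_tp, u_fn, u_tn, u_fp):
    """Slope dTPF/dFPF at which expected utility is stationary."""
    return (u_tn - u_fp) * (1.0 - pr) / ((u_tp - u_fn) * pr)


slope_screening = optimal_roc_slope(pr=0.005, u_tp=100, u_fn=-20, u_tn=4, u_fp=-10)
slope_higher_pr = optimal_roc_slope(pr=0.05, u_tp=100, u_fn=-20, u_tn=4, u_fp=-10)
```

A lower prevalence pushes the optimal operating point toward the steep, low-FPF part of the ROC curve.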

Page 24: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,

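[Figure: distributions of normal cases and abnormal cases with a threshold, alongside the entire ROC curve (TPF, sensitivity versus FPF, 1-specificity); the ROC slope at the operating point is marked]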

Page 25: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,

Estimators

• TPF, FPF, TNF, FNF, Accuracy, the ROC curve, and AUC are all fractions or probabilities.

• Normally we have a finite sample of subjects on which to test our CIO. From this finite sample we try to estimate the above fractions
  – These estimates will vary depending upon the sample selected (statistical variation).
  – Estimates can be nonparametric or parametric

Page 26: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,

Estimators

• TPF = (Number of abnormals that would be selected by CIO in the population) / (Number of abnormals in the population)

• TPF (estimate) = (Number of abnormals that were selected by CIO in the sample) / (Number of abnormals in the sample)

• Number in sample << Number in population (at least in theory)

Page 27: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,

II. Receiver Operating Characteristic (ROC)

• Receiver Operating Characteristic

• Binary Classification

• Test result is compared to a threshold

Page 28: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,

Distribution of CIO Output for all Subjects

Threshold

Computational intelligence observer output

Page 29: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,

[Figure: computational intelligence observer output on the t-axis with a threshold; distribution of output for normal / class 0 subjects, p(t|0), and distribution of output for abnormal / class 1 subjects, p(t|1)]

Page 30: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,

[Figure: threshold on the distribution of output for normal / class 0 subjects, p(t|0), and the distribution for abnormal / class 1 subjects]

Page 31: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,

[Figure: threshold on the distribution of output for normal / class 0 subjects, p(t|0), and the distribution for abnormal / class 1 subjects]

Sensitivity = True Positive Fraction = TPF

Specificity = True Negative Fraction = TNF

Page 32: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,

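[Figure: threshold on the normal / class 0 and abnormal / class 1 distributions, with a truth (H0, H1) versus decision (D0, D1) table]

Truth H0, decision D0: specificity, TNF = 0.50

Truth H1, decision D1: sensitivity, TPF = 0.95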

Page 33: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,

[Figure: threshold on the normal / class 0 and abnormal / class 1 distributions]

1 - Specificity = False Positive Fraction = FPF

1 - Sensitivity = False Negative Fraction = FNF

Page 34: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,

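[Figure: threshold on the normal / class 0 and abnormal / class 1 distributions, with the full truth versus decision table]

Truth \ Decision              D0            D1
H1 (abnormal / class 1)       FNF = 0.05    TPF = 0.95
H0 (normal / class 0)         TNF = 0.50    FPF = 0.50

FPF = 1 - Specificity; FNF = 1 - Sensitivity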

Page 35: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,

[Figure: threshold on the normal / class 0 and abnormal / class 1 distributions and the corresponding ROC operating point (TPF, sensitivity versus FPF, 1-specificity): high sensitivity]

Page 36: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,

[Figure: threshold on the normal / class 0 and abnormal / class 1 distributions and the corresponding ROC operating point (TPF, sensitivity versus FPF, 1-specificity): sensitivity = specificity]

Page 37: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,

[Figure: threshold on the normal / class 0 and abnormal / class 1 distributions and the corresponding ROC operating point (TPF, sensitivity versus FPF, 1-specificity): high specificity]

Page 38: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,

[Figure: operating points of CIO #1, CIO #2 and CIO #3 plotted as TPF, sensitivity versus FPF, 1-specificity, with the threshold shown on the normal / class 0 and abnormal / class 1 distributions]

Which CIO is best?

          TPF     FPF
CIO #1    0.50    0.07
CIO #2    0.78    0.22
CIO #3    0.93    0.50

Page 39: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,

[Figure: operating points of CIO #1, CIO #2 and CIO #3 plotted as TPF, sensitivity versus FPF, 1-specificity, with the threshold shown on the normal / class 0 and abnormal / class 1 distributions]

Do not compare rates of one class, e.g. TPF, at different rates of the other class (FPF).

          TPF     FPF
CIO #1    0.50    0.07
CIO #2    0.78    0.23
CIO #3    0.93    0.50

Page 40: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,

[Figure: threshold on the normal / class 0 and abnormal / class 1 distributions, alongside the entire ROC curve (TPF, sensitivity versus FPF, 1-specificity)]

Page 41: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,

[Figure: entire ROC curves (TPF, sensitivity versus FPF, 1-specificity) illustrating discriminability, or CIO performance: the chance line with AUC = 0.5, and curves with AUC = 0.85 and AUC = 0.98]

Page 42: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,

AUC (Area under ROC Curve)

• AUC is a separation probability

• AUC = probability that
  – CIO output for abnormal > CIO output for normal
  – CIO correctly tells which of 2 subjects is normal

• Estimating AUC from finite sample
  – Select abnormal subject score = x_i
  – Select normal subject score = y_k
  – Is x_i > y_k ?
  – Average over all x, y:

$$\widehat{AUC} = \frac{1}{n_1 n_0} \sum_{i=1}^{n_1} \sum_{k=1}^{n_0} I(x_i > y_k)$$
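The pairwise estimate described on this slide can be sketched in a few lines: every abnormal score is compared with every normal score, and the fraction of pairs in which the abnormal score is larger is the AUC estimate. Counting ties as one half is an assumption added here (a common convention), not something stated on the slide; the function name and the scores are hypothetical.

```python
def empirical_auc(abnormal_scores, normal_scores):
    """Fraction of (abnormal, normal) pairs in which the abnormal score is larger."""
    wins = 0.0
    for x in abnormal_scores:
        for y in normal_scores:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5  # assumption: a tie counts as half a correct ordering
    return wins / (len(abnormal_scores) * len(normal_scores))


# Made-up scores for illustration: 8 of the 9 pairs are correctly ordered.
auc = empirical_auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])
```

The double loop costs n1 x n0 comparisons, which is fine for small test sets; a rank-based computation would be preferable for large ones.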

Page 43: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,
Page 44: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,
Page 45: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,
Page 46: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,
Page 47: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,
Page 48: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,
Page 49: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,
Page 50: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,
Page 51: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,
Page 52: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,
Page 53: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,
Page 54: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,
Page 55: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,
Page 56: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,
Page 57: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,
Page 58: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,
Page 59: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,
Page 60: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,
Page 61: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,
Page 62: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,
Page 63: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,
Page 64: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,
Page 65: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,
Page 66: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,
Page 67: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,
Page 68: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,
Page 69: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,
Page 70: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,
Page 71: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,
Page 72: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,
Page 73: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,
Page 74: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,
Page 75: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,
Page 76: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,
Page 77: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,
Page 78: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,
Page 79: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,
Page 80: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,
Page 81: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,
Page 82: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,
Page 83: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,
Page 84: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,
Page 85: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,
Page 86: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,
Page 87: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,
Page 88: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,
Page 89: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,
Page 90: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,
Page 91: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,
Page 92: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,

ROC as a Q-Q plot

• ROC plots in probability space

• ROC plots in quantile space

Page 93: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,

Linear Likelihood Ratio Observer for Gaussian Data

• When the input features of the data are distributed as Gaussians with equal variance,
  – The optimal discriminant, the log-likelihood ratio, is a linear function,
  – That linear discriminant is also distributed as a Gaussian,
  – The signal to noise ratio (SNR) is easily calculated from the input data distributions and is a monotonic function of AUC.

• Can serve as a benchmark against which to measure CIO performance

Page 94: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,

Linear Ideal Observer

• p(x|0) is the probability distribution of data x for the population of normals and p(x|1) is the probability distribution of x for the population of abnormals, with components x_i independent and Gaussian distributed with means 0 and μ_i respectively and identical variances σ_i²

$$p(x|0) = \prod_{i=1}^{D} (2\pi\sigma_i^2)^{-1/2} \exp\left(-\frac{x_i^2}{2\sigma_i^2}\right)$$

$$p(x|1) = \prod_{i=1}^{D} (2\pi\sigma_i^2)^{-1/2} \exp\left(-\frac{(x_i-\mu_i)^2}{2\sigma_i^2}\right)$$

Page 95: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,

Maximum Likelihood CIO

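$$L = \frac{p(1|x)}{p(0|x)} = \frac{p(x|1)\,p(1)}{p(x|0)\,p(0)} = k\,\frac{p(x|1)}{p(x|0)}$$

$$L = k \exp\left(-\sum_{i=1}^{D} \frac{\mu_i^2}{2\sigma_i^2}\right) \exp\left(\sum_{i=1}^{D} \frac{\mu_i x_i}{\sigma_i^2}\right)$$

$$t_1 = \exp\left(\sum_{i=1}^{D} \frac{\mu_i x_i}{\sigma_i^2}\right)$$

$$t = \ln(t_1) = \sum_{i=1}^{D} \frac{\mu_i x_i}{\sigma_i^2}$$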

Page 96: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,

Linear Ideal Observer ROC

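$$p(t|0) = \mathrm{Gauss}\left(0,\ \sum_{i=1}^{D} \frac{\mu_i^2}{\sigma_i^2}\right)$$

$$p(t|1) = \mathrm{Gauss}\left(\sum_{i=1}^{D} \frac{\mu_i^2}{\sigma_i^2},\ \sum_{i=1}^{D} \frac{\mu_i^2}{\sigma_i^2}\right)$$

$$SNR = \frac{\mathrm{mean}(p(t|1)) - \mathrm{mean}(p(t|0))}{\left[\mathrm{var}(p(t|1))/2 + \mathrm{var}(p(t|0))/2\right]^{1/2}} = \left(\sum_{i=1}^{D} \frac{\mu_i^2}{\sigma_i^2}\right)^{1/2}$$

$$AUC = \Phi\left(SNR/\sqrt{2}\right); \quad \Phi = \text{cumulative Gaussian distribution}$$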
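For the linear ideal observer, the SNR follows from the class means and variances of the features, and the AUC follows from the SNR through the cumulative Gaussian. The sketch below assumes the standard relation AUC = Φ(SNR/√2) and the independent Gaussian features of the preceding slides; the function names and the example feature values are hypothetical.

```python
import math


def gaussian_cdf(z):
    """Standard cumulative Gaussian distribution, written with math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ideal_observer_snr(means, variances):
    """SNR of t = sum(mu_i * x_i / sigma_i^2) for independent Gaussian features."""
    return math.sqrt(sum(m * m / v for m, v in zip(means, variances)))


def auc_from_snr(snr):
    """AUC of two equal-variance Gaussian decision-variable distributions."""
    return gaussian_cdf(snr / math.sqrt(2.0))


# Two hypothetical features, each with mean shift 1 and variance 1.
snr = ideal_observer_snr(means=[1.0, 1.0], variances=[1.0, 1.0])
auc = auc_from_snr(snr)
```

An SNR of zero gives AUC = 0.5 (chance), and the AUC rises monotonically toward 1 as the SNR grows, which is what makes this observer usable as a benchmark.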

Page 97: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,

Likelihood Ratio = Slope of ROC

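• The likelihood ratio of the decision variable t is the slope of the ROC curve:

• ROC = TPF(FPF); TPF = 1 - P(t|1); FPF = 1 - P(t|0), where P is the cumulative distribution of t

$$\text{slope} = \frac{d\,TPF}{d\,FPF} = \frac{d\,P(t|1)}{d\,P(t|0)} = \frac{dP(t|1)/dt}{dP(t|0)/dt} = \frac{p(t|1)}{p(t|0)} = L(t)$$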
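The identity between the ROC slope and the likelihood ratio can be checked numerically. The sketch below assumes unit-variance Gaussian distributions for t in the two classes (class 0 centred at 0), which is an illustrative choice rather than anything fixed by the slide, and compares a finite-difference ROC slope with p(t|1)/p(t|0). All names are hypothetical.

```python
import math


def pdf(t, mean):
    """Unit-variance Gaussian density."""
    return math.exp(-0.5 * (t - mean) ** 2) / math.sqrt(2.0 * math.pi)


def cdf(t, mean):
    """Unit-variance cumulative Gaussian distribution."""
    return 0.5 * (1.0 + math.erf((t - mean) / math.sqrt(2.0)))


def roc_slope_numeric(t, mean1, h=1e-5):
    """Central-difference estimate of dTPF/dFPF at threshold t."""
    d_tpf = (1.0 - cdf(t + h, mean1)) - (1.0 - cdf(t - h, mean1))
    d_fpf = (1.0 - cdf(t + h, 0.0)) - (1.0 - cdf(t - h, 0.0))
    return d_tpf / d_fpf


def likelihood_ratio(t, mean1):
    """p(t|1) / p(t|0) for class means mean1 and 0."""
    return pdf(t, mean1) / pdf(t, 0.0)


slope = roc_slope_numeric(t=0.5, mean1=1.5)
ratio = likelihood_ratio(t=0.5, mean1=1.5)
```

The two quantities agree up to finite-difference error, and both shrink as the threshold moves left along the t-axis.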

Page 98: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,

III. Error analysis for CI observers

• Sources of error

• Parametric methods

• Nonparametric methods

• Standard deviations and confidence intervals

• Hazards

Page 99: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,

Sources of error

• Test error: limited number of samples in the test set

• Training error: limited number of samples in the training set
  – Incorrect parameters
  – Incorrect feature selection, etc.

• Human observer error (when applicable)
  – Intraobserver
  – Interobserver

Page 100: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,

Parametric methods

• Use known underlying probability distribution – may be exact for simulated data

• Assume Gaussian distribution

• Other parameterization – e.g., binomial, or ROC linearity in z-transformation coordinates
  – ($\Phi^{-1}$(TPF) versus $\Phi^{-1}$(FPF), where $\Phi$ is the cumulative Gaussian distribution)

Page 101: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,

Binomial Estimates of Variance

• For single population measures, f= TPF, FPF, FNF, TNF

• Var(f) = f (1-f) / N

• For AUC (back of envelope calculation)

Var(AUC) = AUC (1-AUC) / N
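The binomial variance gives a quick error bar for any single-population fraction. The sketch below uses the formula from this slide and adds a normal-approximation interval with k = 1.96 for 95%; that interval and the function names are assumptions made for illustration.

```python
import math


def binomial_variance(f, n):
    """Var(f) = f (1 - f) / N for a fraction f estimated from N cases."""
    return f * (1.0 - f) / n


def binomial_confidence_interval(f, n, k=1.96):
    """Normal-approximation interval f +/- k * sqrt(Var(f)) (an assumption)."""
    half_width = k * math.sqrt(binomial_variance(f, n))
    return f - half_width, f + half_width


variance = binomial_variance(f=0.5, n=100)
low, high = binomial_confidence_interval(f=0.5, n=100)
```

The normal approximation becomes unreliable when f is close to 0 or 1 or when N is small, so this should be read as a rough estimate only.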

Page 102: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,

Data rich case

• Repeat experiment M times

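• Estimate distribution parameters, e.g., for a Gaussian distributed performance measure f, G(μ, σ²):

$$\hat{\mu} = \hat{f} = \frac{1}{M} \sum_{i=1}^{M} f_i$$

$$\hat{\sigma}^2 = \frac{1}{M-1} \sum_{i=1}^{M} (f_i - \hat{f})^2$$

$$\hat{\sigma}_{\hat{f}}^2 = \hat{\sigma}^2 / M$$

• Find error bars or confidence limits

$$\hat{f} \pm \hat{\sigma}_{\hat{f}}$$

$$\hat{f} \pm k\,\hat{\sigma}_{\hat{f}} \quad (95\%:\ k = 1.96)$$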
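In the data rich case the mean, the distribution variance, the variance of the mean and the confidence limits all follow from the M repeated measurements. The sketch below is a minimal illustration with made-up AUC values; the function name is hypothetical, and the sample variance uses the M - 1 divisor.

```python
import math


def summarize_repeats(values, k=1.96):
    """Mean, distribution variance, variance of the mean, and k-sigma limits."""
    m = len(values)
    mean = sum(values) / m
    variance = sum((v - mean) ** 2 for v in values) / (m - 1)
    variance_of_mean = variance / m
    half_width = k * math.sqrt(variance_of_mean)
    return mean, variance, variance_of_mean, (mean - half_width, mean + half_width)


# Hypothetical AUC values from M = 5 repetitions of an experiment.
mean, variance, variance_of_mean, limits = summarize_repeats(
    [0.80, 0.82, 0.84, 0.86, 0.88])
```

With k = 1.96 the limits correspond to a 95% interval under the Gaussian assumption stated on the slide.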

Page 103: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,

Example: AUC

• Mean AUC

• “Distribution” variance

• Variance of mean

• Error bars, confidence interval

Page 104: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,

Probability distribution for calculation of AUC from 40 values

Page 105: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,

Probability distribution for calculation of SNR from 40 values

Page 106: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,

But what’s a poor boy to do?

• Reuse the data you have: Resubstitution, Resampling

• Two common approaches:
  – Jackknife
  – Bootstrap

Page 107: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,

Resampling

Page 115: Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic,

Jackknife

• Have N observations
• Leave out m of these, then have M subsets of the N observations to calculate μ, σ²

$$M = \binom{N}{m}$$

• N=10, m=5: M=252; N=10, m=1: M=10
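
The subset counts quoted on this slide follow directly from the binomial coefficient. A small check, with the hypothetical helper name `n_jackknife_subsets`:

```python
import math


def n_jackknife_subsets(n: int, m: int) -> int:
    """Number of distinct subsets obtained by leaving m of n observations out."""
    return math.comb(n, m)


print(n_jackknife_subsets(10, 5))  # leave-5-out
print(n_jackknife_subsets(10, 1))  # leave-one-out
```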


Round-robin jackknife bias derivation and variance

• Given N datasets

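$$AUC_N = AUC_\infty + k/N$$

$$\overline{AUC}_{(N-1)} = AUC_\infty + k/(N-1)$$

$$k = N(N-1)\left(\overline{AUC}_{(N-1)} - AUC_N\right)$$

$$\widehat{AUC}_J = AUC_N - k/N = N\,AUC_N - (N-1)\,\overline{AUC}_{(N-1)}$$

$$\hat{\sigma}_J^2 = \frac{N-1}{N}\sum_{i=1}^{N}\left(AUC_{(N-1)i} - \overline{AUC}_{(N-1)}\right)^2$$

where AUC_{(N-1)i} is the estimate with case i left out and \overline{AUC}_{(N-1)} is the mean of the N leave-one-out estimates.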
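
The round-robin (leave-one-out) jackknife can be sketched as follows, assuming the decision function is fixed and only its output scores are resampled. The Mann-Whitney form of the AUC, the function names, and the example scores are assumptions made for illustration.

```python
def auc(neg, pos):
    """Mann-Whitney (nonparametric) AUC; ties count one half."""
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def jackknife_auc(neg, pos):
    """Leave-one-out jackknife over the pooled N = len(neg) + len(pos) cases."""
    n_total = len(neg) + len(pos)
    auc_n = auc(neg, pos)
    # One estimate per left-out case, normal cases first, then abnormal cases.
    loo = [auc(neg[:i] + neg[i + 1:], pos) for i in range(len(neg))]
    loo += [auc(neg, pos[:i] + pos[i + 1:]) for i in range(len(pos))]
    mean_loo = sum(loo) / n_total
    auc_j = n_total * auc_n - (n_total - 1) * mean_loo       # bias-corrected
    var_j = (n_total - 1) / n_total * sum((a - mean_loo) ** 2 for a in loo)
    return auc_n, auc_j, var_j


# Hypothetical classifier scores for 4 normal and 4 abnormal cases.
neg_scores = [0.10, 0.40, 0.35, 0.80]
pos_scores = [0.45, 0.70, 0.90, 0.60]
auc_n, auc_j, var_j = jackknife_auc(neg_scores, pos_scores)
print(auc_n, auc_j, var_j)
```

With fixed scores the mean of the leave-one-out values equals AUC_N, so the bias correction vanishes up to rounding. The correction becomes informative when the classifier is retrained on each reduced dataset, which this sketch does not attempt.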


Fukunaga-Hayes bias derivation

• Divide both the normal and abnormal classes in half, yielding 4 possible pairings

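$$AUC_N = AUC_\infty + k/N$$

$$\overline{AUC}_{N/2} = AUC_\infty + 2k/N$$

$$\widehat{AUC}_{FH} = 2\,AUC_N - \overline{AUC}_{N/2}$$

where \overline{AUC}_{N/2} is the mean over the 4 half-size pairings.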
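
The extrapolation step itself is a one-liner once the full-sample and half-sample results are in hand. The sketch below assumes those AUC values have already been computed elsewhere; the function name and the numbers are hypothetical.

```python
def fukunaga_hayes(auc_full, auc_halves):
    """Extrapolate to infinite sample size assuming AUC_N = AUC_inf + k/N."""
    if not auc_halves:
        raise ValueError("need at least one half-sample estimate")
    mean_half = sum(auc_halves) / len(auc_halves)
    return 2.0 * auc_full - mean_half


# Hypothetical training (resubstitution) AUCs: the full set and the
# 4 pairings of half-normal with half-abnormal cases.
estimate = fukunaga_hayes(0.90, [0.94, 0.95, 0.93, 0.94])
print(round(estimate, 4))
```

Because optimistic training bias grows as the sample shrinks, the half-sample mean here exceeds the full-sample value and the extrapolated estimate moves downward.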


Jackknife bias correction example: training error

• AUC estimates as a function of number of cases N. Solid line is the multilayer perceptron result. Open circles: jackknife; closed circles: Fukunaga-Hayes. The horizontal dotted line is the asymptotic ideal result.


IV. Bootstrap methods

• Theoretical foundation

• Practical use


Bootstrap variance

• What you have is what you’ve got: the data is your best estimate of the probability distribution
– Sampling with replacement, M times
– Adequate number of samples M>N

Simple bootstrap

$$\hat{\sigma}_B^2 = \frac{1}{M-1}\sum_{i=1}^{M}\left(AUC_{B(i)} - \overline{AUC}_B\right)^2$$
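
A minimal sketch of the simple bootstrap variance for the AUC of a fixed decision function. Resampling each class separately (so neither class can come up empty), the Mann-Whitney AUC, the function names, and the example scores are all assumptions of this sketch rather than details given on the slide.

```python
import random


def auc(neg, pos):
    """Mann-Whitney (nonparametric) AUC; ties count one half."""
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_variance(neg, pos, m=1000, seed=0):
    """Draw m bootstrap samples with replacement and return (mean, variance)."""
    rng = random.Random(seed)
    aucs = []
    for _ in range(m):
        b_neg = rng.choices(neg, k=len(neg))   # resample normal cases
        b_pos = rng.choices(pos, k=len(pos))   # resample abnormal cases
        aucs.append(auc(b_neg, b_pos))
    mean_b = sum(aucs) / m
    var_b = sum((a - mean_b) ** 2 for a in aucs) / (m - 1)
    return mean_b, var_b


# Hypothetical scores for 6 normal and 6 abnormal cases.
neg_scores = [0.10, 0.40, 0.35, 0.80, 0.20, 0.55]
pos_scores = [0.45, 0.70, 0.90, 0.60, 0.30, 0.85]
mean_b, var_b = bootstrap_auc_variance(neg_scores, pos_scores, m=500, seed=1)
print(mean_b, var_b)
```

Fixing the seed makes the run reproducible; m is chosen well above the number of cases, in line with the M>N guidance above.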


Bootstrap and jackknife error estimates

• Standard deviation of AUC: Solid line simulation results, open circles jackknife estimate, closed circles bootstrap estimate. Note how much larger the jackknife error bars are than those provided by the bootstrap method.


Comparison of s.d. estimates: 2 Gaussian dist., 20 normal, 20 abnormal, pop. AUC=.936

• Actual s.d. .0380
• Binomial approx. .0380
• Bootstrap .0388
• Jackknife .0396

• Mean bootstrap AUC est. .936
• Mean jackknife bias est. 2x10^-17


.632 bootstrap for classifier performance evaluation

• Have N cases, draw M samples of size N with replacement for training (Have on average .632 x N unique cases in each sample of size N)

• Test on the unused (~.368 x N) cases for each sample

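$$p(\text{case}_i \in \text{testing}) = (1 - 1/N)^N \approx e^{-1} = 1 - .632\ldots$$

$$p(\text{case}_i \in \text{training}) = 1 - (1 - 1/N)^N \approx 1 - e^{-1} = .632\ldots$$

N                    5      10     20     100    Infinity
(1 - 1/N)^N          .328   .349   .358   .366   .368
1 - (1 - 1/N)^N      .672   .651   .642   .634   .632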
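
The tabulated probabilities can be reproduced in a few lines. The helper name `p_left_out` is hypothetical; it evaluates the chance that a given case never appears in a bootstrap sample of size N.

```python
import math


def p_left_out(n: int) -> float:
    """Probability that a given case is absent from N draws with replacement."""
    return (1.0 - 1.0 / n) ** n


for n in (5, 10, 20, 100):
    p = p_left_out(n)
    print(f"N={n:>3}: testing {p:.3f}, training {1.0 - p:.3f}")

limit = math.exp(-1.0)  # large-N limit
print(f"N=inf: testing {limit:.3f}, training {1.0 - limit:.3f}")
```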


.632 bootstrap for classifier performance evaluation 2

• Have N cases, draw M samples of size N with replacement for training (Have on average .632 x N unique cases in each sample of size N)

• Test on the unused (~.368 x N) cases for each sample

• Get bootstrap average result AUC_B
• Get resubstitution result (testing on training set) AUC_R
• AUC_.632 = .632 x AUC_B + .368 x AUC_R
• As variance take the AUC_B variance
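
The bookkeeping for the .632 bootstrap can be sketched as below, leaving the classifier training itself out. The helpers `bootstrap_split` and `auc_632`, and the example AUC values, are hypothetical names and numbers chosen for illustration.

```python
import random


def bootstrap_split(n, rng):
    """Indices drawn with replacement for training; unused indices for testing."""
    train = [rng.randrange(n) for _ in range(n)]
    in_train = set(train)
    test = [i for i in range(n) if i not in in_train]
    return train, test


def auc_632(auc_b, auc_r):
    """Weighted combination of bootstrap (out-of-sample) and resubstitution AUC."""
    return 0.632 * auc_b + 0.368 * auc_r


rng = random.Random(1)
train_idx, test_idx = bootstrap_split(50, rng)
print(len(set(train_idx)), "unique training cases,", len(test_idx), "test cases")

# Hypothetical results: AUC_B = 0.80 (bootstrap average), AUC_R = 0.95.
combined = auc_632(0.80, 0.95)
print(round(combined, 4))
```

In a full evaluation the split would be repeated M times, with the classifier trained on `train_idx` and scored on `test_idx` each time, and AUC_B taken as the average of those test results.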


Traps for the unwary: Overparameterization

• Cover’s theorem:

• For N<2(d+1) a hyperplane exists that will perfectly separate more than half of the possible dichotomies of N points in d space (almost all of them when d is large)

$$C(N,d) = 2\sum_{k=0}^{d}\binom{N-1}{k}$$
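
Cover's counting function is easy to evaluate directly. The sketch below assumes points in general position and defines the separable fraction as C(N,d) divided by the 2^N possible dichotomies; the function names are hypothetical.

```python
import math


def cover_count(n: int, d: int) -> int:
    """C(N, d): number of dichotomies of N points in d space separable by a hyperplane."""
    return 2 * sum(math.comb(n - 1, k) for k in range(d + 1))


def separable_fraction(n: int, d: int) -> float:
    """Fraction of all 2^N dichotomies that are linearly separable."""
    return cover_count(n, d) / 2 ** n


for d in (1, 5, 25):
    n_half = 2 * (d + 1)
    print(d, separable_fraction(d + 1, d), separable_fraction(n_half, d))
```

For N up to d+1 every dichotomy is separable, and the fraction passes through one half at N = 2(d+1), which is why a classifier with many parameters relative to N can look perfect on its training data.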


f_d(N) for d=1,5,25,125, and the limit of large d. The abscissa x=N/2(d+1) is scaled so that the values of f_d(N)=0.5 lie superposed at x=1 for all d.


Poor data hygiene

• Reporting on training data results / testing on training data

• Carrying out any part of the training process on data later used for testing
– e.g., using all of the data to select a manageable feature set from among a large number of features, and then dividing the data into training and test sets.


Overestimate of AUC from poor data hygiene

[Figure. Left panel: Number of experiments (out of 900) against Area Under the ROC curve. Right panel: True Positive Fraction against False Positive Fraction, with curves for Method 1, AUC=0.52; Method 2, AUC=0.62; Method 3, AUC=0.82; Method 4, AUC=0.91.]

Distributions of AUC values in 900 simulation experiments (on the left) and the mean ROC curves (on the right) for four validation methods:
– Method 1: feature selection and classifier training on one dataset and classifier testing on another independent dataset
– Method 2: given perfect feature selection, classifier training on one dataset and classifier testing on another independent dataset
– Method 3: feature selection using the entire dataset, after which the dataset is partitioned into two, one for training and one for testing the classifier
– Method 4: feature selection, classifier training, and testing using the same dataset


Correct feature selection is hard to do

[Figure. Left panel: Number of experiments (out of 900) against Feature Index (10^0 to 10^3). Right panel: Number of experiments (out of 900) against Number of useful features (out of 30).]

Insight into feature selection performance in Method 1. The left panel plots the number of experiments (out of 900) in which each feature is selected. By design of the simulation population, the first 30 features are useful for classification and the remaining features are useless. The right panel plots the distribution of the number of useful features selected (out of 30) across the 900 experiments.


Conclusions

• Accuracy and other prevalence dependent measures are inadequate

• ROC/AUC provide good measures of performance

• Uncertainty must be quantified

• Bootstrap and jackknife techniques are useful methods


V. References

• [1] K. Fukunaga, Statistical Pattern Recognition, 2nd Edition. Boston: Harcourt Brace Jovanovich, 1990.
• [2] K. Fukunaga and R. R. Hayes, “Effects of sample size in classifier design,” IEEE Trans. Pattern Anal. Machine Intell., vol. PAMI-11, pp. 873–885, 1989.
• [3] D. M. Green and J. A. Swets, Signal Detection Theory and Psychophysics. New York: John Wiley & Sons, 1966.
• [4] J. P. Egan, Signal Detection Theory and ROC Analysis. New York: Academic Press, 1975.
• [5] C. E. Metz, “Basic principles of ROC analysis,” Seminars in Nuclear Medicine, vol. VIII, no. 4.
• [6] H. H. Barrett and K. J. Myers, Foundations of Image Science. Hoboken: John Wiley & Sons, 2004, ch. 13 Statistical Decision Theory.
• [7] B. Efron and R. J. Tibshirani, Introduction to the Bootstrap. Boca Raton: Chapman & Hall/CRC, 1993.
• [8] B. Efron, The Jackknife, the Bootstrap and Other Resampling Plans. Philadelphia: Society for Industrial and Applied Mathematics, 1982.
• [9] A. C. Davison and D. V. Hinkley, Bootstrap Methods and their Applications. Cambridge: Cambridge University Press, 1997.
• [10] B. Efron, “Estimating the error rate of a prediction rule: Some improvements on cross-validation,” Journal of the American Statistical Association, vol. 78, pp. 316–331, 1983.
• [11] B. Efron and R. J. Tibshirani, “Improvements on cross-validation: The .632+ bootstrap method,” Journal of the American Statistical Association, vol. 92, no. 438, pp. 548–560, 1997.
• [12] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning, 2nd Edition. New York: Springer, 2009.
• [13] C. M. Bishop, Pattern Recognition and Machine Learning. New York: Springer, 2006.
• [14] ——, Neural Networks for Pattern Recognition. Oxford: Oxford University Press, 1995.
• [15] R. F. Wagner, D. G. Brown, J.-P. Guedon, K. J. Myers, and K. A. Wear, “Multivariate Gaussian pattern classification: effects of finite sample size and the addition of correlated or noisy features on summary measures of goodness,” in Information Processing in Medical Imaging, Proceedings of IPMI ’93, 1993, pp. 507–524.
• [16] ——, “On combining a few diagnostic tests or features,” in Proceedings of the SPIE, Image Processing, vol. 2167, 1994.
• [17] D. G. Brown, A. C. Schneider, M. P. Anderson, and R. F. Wagner, “Effects of finite sample size and correlated noisy input features on neural network pattern classification,” in Proceedings of the SPIE, Image Processing, vol. 2167, 1994.
• [18] C. A. Beam, “Analysis of clustered data in receiver operating characteristic studies,” Statistical Methods in Medical Research, vol. 7, pp. 324–336, 1998.
• [19] W. A. Yousef, et al., “Assessing Classifiers from Two Independent Data Sets Using ROC Analysis: A Nonparametric Approach,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 11, pp. 1809–1817, 2006.
• [20] F. W. Samuelson and D. G. Brown, “Application of Cover’s theorem to the evaluation of the performance of CI observers,” in Proceedings of the IJCNN 2011, 2011.
• [21] W. Chen and D. G. Brown, “Optimistic bias in the assessment of high dimensional classifiers with a limited dataset,” in Proceedings of the IJCNN 2011, 2011.


Appendix I


Searching suitcases


Previous class results
