Principles of Diagnostic Testing and ROC 2016

Post on 15-Apr-2017


Principles of Diagnostic Testing

Statistics for Research

William F. Auffermann, MD/PhD

Department of Radiology and Imaging Sciences

Emory University School of Medicine

Learning Objectives

• Provide an overview of the basic statistical concepts needed to critically appraise and perform research

Diagnostic Testing

• Diagnostic tests are designed to answer specific medical questions.

• When there is concern for a medical disease, appropriate diagnostic testing can be used to better risk stratify patients

• The probability of a disease after testing is a function of both pre-test probability and the results of the test.

Diagnostic Testing

• Diagnostic testing may be thought of as a way of refining the estimate for the probability of a patient having a particular disease.

• Understanding the principles of diagnostic testing requires an understanding of probability and statistics.

Probability and Statistics: Two Sides of the Same Coin

• Probability: assumes you know the underlying laws of a process, and can be used to predict outcomes

• Statistics: used to compare data with theory/model and look at how well they agree

Hypotheses

Hypothesis

• A proposed explanation for a phenomenon‡

• A key aspect of diagnostic testing and statistics is formulation of a good hypothesis

‡ http://en.wikipedia.org/wiki/Hypothesis Accessed 2014-11-13

Hypothesis

• Hypotheses are often paired with their logical opposites

• The null hypothesis (H0) is considered the default hypothesis

• The alternative hypothesis (HA) is its logical complement

Hypothesis

• H0: the medication does not reduce blood pressure

• HA: the medication does reduce blood pressure

Hypothesis

• Hypotheses should address the question of interest and be testable

• Clear statement of the hypothesis is critical for appropriate statistical testing

Hypothesis

• H0: mean blood pressure in the treatment group is the same as in the control group (MBP2 = MBP1)

• HA: mean blood pressure in the treatment group is lower than in the control group (MBP2 < MBP1)

Probability

Probability

• Probability relates to the likelihood of a particular event occurring

• There is an assumption we know the laws governing the behavior of the process being examined

• For example if we have a fair coin where the probability of heads/tails are both 0.5 (equal), then we can estimate the probability of flipping a coin and obtaining: HHTH
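
As a minimal sketch of the coin example, assuming the flips are independent, the probability of one specific sequence is the product of the per-flip probabilities. The function name `sequence_probability` is illustrative only.

```python
def sequence_probability(sequence, p_heads=0.5):
    """Probability of observing one specific sequence of independent flips."""
    prob = 1.0
    for flip in sequence:
        # Multiply by P(heads) for "H" and by P(tails) for anything else.
        prob *= p_heads if flip == "H" else 1.0 - p_heads
    return prob


# For a fair coin every 4-flip sequence has probability 0.5 ** 4.
print(sequence_probability("HHTH"))  # 0.0625
```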

Pre/Post Test Probability

• Diagnostic testing is useful because it affects the post-test probability of a diagnosis.

• Diagnostic testing which does not significantly affect the post-test probability may not be clinically useful

Pre/Post Test Probability

• Let ‘p’ represent the probability of a disease and ‘t’ the result of a diagnostic test

p2 = LR(t) * p1

• Where p1 and p2 are the pre- and post-test probabilities respectively, and LR(t) is the likelihood ratio for the test.

• Strictly, this relation is exact for odds, p / (1 - p), and holds only approximately for probabilities when they are small.

• LR(t) takes a different value for a positive and for a negative test result.
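
The update is usually carried out on odds rather than on probabilities directly. The sketch below assumes the standard definitions LR+ = sensitivity / (1 - specificity) and LR- = (1 - sensitivity) / specificity. The function names and the example sensitivity, specificity, and pretest values are hypothetical, chosen only for illustration.

```python
def post_test_probability(pretest_probability, likelihood_ratio):
    """Convert probability to odds, apply the likelihood ratio, convert back."""
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    posttest_odds = likelihood_ratio * pretest_odds
    return posttest_odds / (1.0 + posttest_odds)


def likelihood_ratios(sensitivity, specificity):
    """Return (LR for a positive result, LR for a negative result)."""
    lr_positive = sensitivity / (1.0 - specificity)
    lr_negative = (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative


# Hypothetical test: sensitivity 0.90, specificity 0.80, pretest probability 0.20.
lr_pos, lr_neg = likelihood_ratios(0.90, 0.80)
print(round(post_test_probability(0.20, lr_pos), 3))  # after a positive result
print(round(post_test_probability(0.20, lr_neg), 3))  # after a negative result
```

A likelihood ratio of 1 leaves the probability unchanged, which is one way to see why such a test result carries no diagnostic information.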

Pre/Post Test Probability

p2 = LR(t) * p1

Fagan nomogram

http://mcmasterevidence.wordpress.com/2013/02/20/what-are-pre-test-probability-post-test-probability-and-likelihood-ratios/ Accessed 2014-11-13

V/Q Scan

• Consider a patient with symptoms concerning for pulmonary embolism.

• Based on the patient's clinical symptoms, we can risk stratify them for probability of pulmonary embolism, corresponding to the pretest probability (p1)

V/Q Scan

• A V/Q test is performed to better risk stratify the patient.

• The various patterns of findings on V/Q scan correlate with the probability of pulmonary embolism

V/Q Scan

• The post-test probability is derived from both the pretest probability and the results of the test.

V/Q Scan

p(test) \ p(pretest)    0.2     0.42    0.8
0.1                     0.2     0.06
0.19                    0.04    0.16    0.4
0.5                     0.16    0.28    0.66
0.8                     0.56    0.88    0.96

Pretest for Wells scores; posttest for V/Q

http://www.auntminnie.com/index.aspx?sec=ser&sub=def&pag=dis&ItemID=54625 Accessed 2014-11-13

V/Q Scan

J Nucl Med 2013; 54:1–5

Pre/Post Test Probability

p2 = LR(t) * p1

http://www.healthknowledge.org.uk/public-health-textbook/disease-causation-diagnostic/2c-diagnosis-screening/ratios

Accessed 2014-11-13

Probability Distributions

Probability Distribution

• A probability distribution function p(x) gives the probability (or probability density) of observing a value x, as a function of x

[Figure: probability distribution, p(x) plotted against x]

Probability Distributions

Probability Distributions

• There are several different probability distributions

• Different physical and biological phenomena can be modeled using different distributions

• One of the most common naturally occurring distributions is the normal (Gaussian) distribution

Normal Distribution

[Figure: normal distribution curve; x axis from -4 to 4, y axis from 0 to 0.4]

Probability Distributions

• Based on the knowledge of a probability distribution, it is possible to estimate the probability of observing a range of values
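
For a normal distribution, the probability of a range of values is the difference of the cumulative distribution function at the two endpoints. The sketch below uses only the standard library; `normal_cdf` and `probability_between` are illustrative names, not part of the original slides.

```python
import math


def normal_cdf(x, mean=0.0, sd=1.0):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def probability_between(low, high, mean=0.0, sd=1.0):
    """Probability that a normally distributed value falls in [low, high]."""
    return normal_cdf(high, mean, sd) - normal_cdf(low, mean, sd)


# Standard normal: about 68% within 1 SD, about 95% within 1.96 SD.
print(round(probability_between(-1.0, 1.0), 4))
print(round(probability_between(-1.96, 1.96), 4))
```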

Probability Distributions

[Figure: probability distribution plot; x axis from -4 to 4, y axis from 0 to 0.4]

Probability Distributions

• When performing or evaluating research, it is very important that the data being modeled can actually be represented by the proposed distribution

• Graphical displays of data can be helpful to confirm this is true (frequency polygon, histogram)

[Figure: probability distribution plot; x axis from -10 to 10, y axis from 0 to 0.5]

Statistics

Statistics

The science that deals with the collection, classification, analysis, and interpretation of numerical facts or data, and that, by use of mathematical theories of probability, imposes order and regularity on aggregates of more or less disparate elements.

http://dictionary.reference.com/ Accessed 2014-11-13

Why Does Statistics Matter?

• Statistics provides a means of summarizing a data set and making inferential statements

• Appropriate application can highlight important aspects of the data

• Incorrect application can be confusing at best, and misleading at worst

• Statistics do not ‘lie’, but they may be misleading

Statistic

• A mathematical summary of a data set

• Examples include the mean, median, mode, and standard deviation
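
As a small illustration, these summary statistics can be computed with Python's standard `statistics` module. The data set below is hypothetical and deliberately right-skewed, so the mean, median, and mode differ.

```python
import statistics

# Hypothetical right-skewed sample (not from the slides).
data = [1, 2, 2, 3, 4, 7, 16]

summary = {
    "mean": statistics.mean(data),      # arithmetic average
    "median": statistics.median(data),  # middle value of the sorted data
    "mode": statistics.mode(data),      # most frequent value
    "stdev": statistics.stdev(data),    # sample standard deviation
}

for name, value in summary.items():
    print(name, round(value, 3))
```

In a skewed sample like this one the mean is pulled toward the long tail, which is why the choice of statistic should reflect the shape of the underlying distribution.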

Statistic

[Figure: frequency plot of a Gamma(2,3) distribution with the mean, median, and mode marked; x axis from 0 to 20, y axis (Frequency) from 0 to 0.14]

Statistic

• The selection of a statistic for representing data should be based on the nature of the process underlying the observations

• The statistic should be based on the model which best represents the data

Statistics

• Qualitative: specific summary measures of the data (statistics) may provide greater clarity than the data set as a whole.

• Quantitative: Based on the underlying theory of the process being measured, inferential statements may be made regarding whether the data and theory agree

Example - Qualitative

[Figure: x axis from -4 to 4, y axis from 0 to 0.09]

Example - Qualitative

[Figure: x axis from -4 to 4, y axis from 0 to 0.09; mean(x1) and mean(x2) marked]

Quantitative

• Based on known properties of the statistical test in question and the distribution of the data, it is possible to make statements about the significance of a result

Example - Qualitative

[Figure: x axis from -4 to 4, y axis from 0 to 0.09; mean(x1) and mean(x2) marked]

P-values

• A p-value is the probability that a value drawn from the proposed distribution (the model under H0) is at least as far from the expected value as the observed value.

P-values

[Figure: probability distribution plot; x axis from -4 to 4, y axis from 0 to 0.4]

P-values

• The lower the p-value, the less likely that the observed statistical value can be explained by the model under H0

P-values

• Assume you want to know if a coin is a fair coin (equal probability of H/T after flipping)

• You flip the coin 100 times and get H 60 times. Is the coin fair?
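
One way to answer this is an exact one-sided binomial tail probability under H0 (a fair coin). The sketch below is an assumption about how such a value could be computed, not necessarily the method used for the slides. It reports both the tail that includes 60 heads and the tail for strictly more than 60 heads; the latter appears to be the one closest to the 0.0176 quoted with the figure.

```python
from math import comb


def binomial_upper_tail(n, k, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k, n + 1))


# 100 flips of a fair coin under H0.
p_at_least_60 = binomial_upper_tail(100, 60)   # 60 or more heads
p_more_than_60 = binomial_upper_tail(100, 61)  # strictly more than 60 heads

print(round(p_at_least_60, 4))
print(round(p_more_than_60, 4))
```

Both values fall below 0.05 for a one-sided test; a two-sided test, which also counts results of 40 or fewer heads, would double them.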

P-values

[Figure: pdf plotted over 0 to 100 (y axis from 0 to 0.08) with the observed value marked; area under curve = p-value = 0.0176]

P-values

• By convention, p-values less than or equal to 0.05 are generally considered statistically significant

• Note that other thresholds can be and are used

• Type I error (often denoted by α) is the probability of rejecting the null hypothesis based on the result of a test if H0 is in fact true.

Multiple Comparisons

Multiple Comparisons

• P-values give the probability of a value at least as extreme as the one observed for a single test.

• What happens if there are multiple tests?

• Does this affect our decision to consider p-values less than 0.05 statistically significant?

Multiple Comparisons

• Consider we are looking at a set of anti-hypertensive medications for effect on blood pressure

• A p-value of 0.05 corresponds to a 1/20 probability

Multiple Comparisons

• If we examine 20 medications, we would expect 1 to have a p-value of 0.05 or lower by chance alone even if there were no therapeutic effect
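
The arithmetic can be sketched as follows, assuming the 20 tests are independent and that H0 is true for every medication. The function names are illustrative.

```python
def expected_false_positives(n_tests, alpha=0.05):
    """Expected number of tests with p <= alpha when every H0 is true."""
    return n_tests * alpha


def probability_any_false_positive(n_tests, alpha=0.05):
    """Chance of at least one p <= alpha among independent tests under H0."""
    return 1.0 - (1.0 - alpha) ** n_tests


print(expected_false_positives(20))                  # 1.0
print(round(probability_any_false_positive(20), 3))  # roughly 0.64
```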

Multiple Comparisons

[Figure: 20 panels, each with x axis from -5 to 5 and y axis from 0 to 1]

Multiple Comparisons

Mean      T         P-value
-0.0996   -0.6684   0.7461
-0.1300   -0.9387   0.8232
-0.1740   -1.0768   0.8560
 0.0172    0.1023   0.4595
 0.2228    1.4224   0.0813
-0.0330   -0.2339   0.5919
-0.0737   -0.4641   0.6774
 0.3357    2.6773   0.0054
 0.0493    0.3540   0.3626
 0.1828    1.3001   0.1005
-0.0341   -0.1953   0.5769
 0.3751    2.5683   0.0070
 0.1226    0.6835   0.2491
 0.0789    0.6016   0.2754
 0.0108    0.0631   0.4750
-0.1832   -1.1043   0.8620
-0.1618   -1.0581   0.8518
 0.0209    0.1269   0.4498
-0.1519   -1.1910   0.8797
-0.1685   -1.2920   0.8981

α = 0.05

Multiple Comparisons

• It is possible to correct for multiple comparisons

• There are several ways to perform this correction

• Several are dependent on knowledge of the correlation between variables

Bonferroni Correction

• A conservative correction assuming each test is independent

• The threshold for significance is changed to the overall desired significance level (often 0.05) divided by the number of comparisons

• New threshold = 0.05/20 = 0.0025

Bonferroni Correction

• This correction adjusts the per-test type I error so that the overall probability of at least one positive result across all tests, if H0 is true for every test, is at most α.
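
A minimal sketch of the correction, applied to the 20 p-values listed in the Multiple Comparisons table. The helper name `significant_indices` is illustrative.

```python
# P-values from the Multiple Comparisons table, in row order.
p_values = [
    0.7461, 0.8232, 0.8560, 0.4595, 0.0813, 0.5919, 0.6774, 0.0054,
    0.3626, 0.1005, 0.5769, 0.0070, 0.2491, 0.2754, 0.4750, 0.8620,
    0.8518, 0.4498, 0.8797, 0.8981,
]


def significant_indices(values, threshold):
    """Zero-based positions of the p-values at or below the threshold."""
    return [i for i, p in enumerate(values) if p <= threshold]


alpha = 0.05
bonferroni_threshold = alpha / len(p_values)  # 0.05 / 20 = 0.0025

uncorrected = significant_indices(p_values, alpha)
corrected = significant_indices(p_values, bonferroni_threshold)

print(uncorrected)  # two tests pass the uncorrected threshold
print(corrected)    # none pass the Bonferroni threshold
```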

Multiple Comparisons

Mean      T         P-value
-0.0996   -0.6684   0.7461
-0.1300   -0.9387   0.8232
-0.1740   -1.0768   0.8560
 0.0172    0.1023   0.4595
 0.2228    1.4224   0.0813
-0.0330   -0.2339   0.5919
-0.0737   -0.4641   0.6774
 0.3357    2.6773   0.0054
 0.0493    0.3540   0.3626
 0.1828    1.3001   0.1005
-0.0341   -0.1953   0.5769
 0.3751    2.5683   0.0070
 0.1226    0.6835   0.2491
 0.0789    0.6016   0.2754
 0.0108    0.0631   0.4750
-0.1832   -1.1043   0.8620
-0.1618   -1.0581   0.8518
 0.0209    0.1269   0.4498
-0.1519   -1.1910   0.8797
-0.1685   -1.2920   0.8981

α = 0.0025

Diagnostic Testing

Diagnostic Testing

• Diagnostic tests are designed to answer specific medical questions.

• When there is concern for a medical disease, appropriate diagnostic testing can be used to better risk stratify patients

• Recognize that diagnostic tests are not perfect, and even the best may misclassify patients.

Confusion Table

                   Test Prediction Positive    Test Prediction Negative
Actual Positive    TP                          FN
Actual Negative    FP                          TN

Confusion Table Derivations

• Sensitivity = TP / (TP + FN)

• Specificity = TN / (FP + TN)

• Positive Predictive Value: PPV = TP / (TP + FP)

• Negative Predictive Value: NPV = TN / (TN + FN)

                   Prediction Positive    Prediction Negative
Actual Positive    TP                     FN
Actual Negative    FP                     TN

Confusion Table Derivations

• Sensitivity = the probability of a positive case being marked positive

• Specificity = the probability of a negative case being marked negative

• PPV = the probability that a case with a positive test result is actually positive

• NPV = the probability that a case with a negative test result is actually negative
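
The four measures follow directly from the cells of the confusion table. The sketch below uses hypothetical counts chosen only for illustration; `confusion_metrics` is not a name from the slides.

```python
def confusion_metrics(tp, fn, fp, tn):
    """Sensitivity, specificity, PPV, and NPV from confusion-table counts."""
    return {
        "sensitivity": tp / (tp + fn),  # positives correctly marked positive
        "specificity": tn / (fp + tn),  # negatives correctly marked negative
        "ppv": tp / (tp + fp),          # positive results that are truly positive
        "npv": tn / (tn + fn),          # negative results that are truly negative
    }


# Hypothetical study: 100 diseased and 200 non-diseased patients.
metrics = confusion_metrics(tp=90, fn=10, fp=30, tn=170)
for name, value in metrics.items():
    print(name, round(value, 3))
```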

Confusion Table Derivations

• Sensitivity and specificity: not affected by the prevalence of disease in a population

• PPV and NPV: affected by the prevalence of disease in a population
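
The dependence on prevalence can be seen by holding sensitivity and specificity fixed and varying prevalence. The sketch below assumes a hypothetical test with sensitivity and specificity both equal to 0.90; the function name is illustrative.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    tp = sensitivity * prevalence                  # true positive fraction
    fn = (1.0 - sensitivity) * prevalence          # false negative fraction
    fp = (1.0 - specificity) * (1.0 - prevalence)  # false positive fraction
    tn = specificity * (1.0 - prevalence)          # true negative fraction
    return tp / (tp + fp), tn / (tn + fn)


# Same hypothetical test, three different prevalences.
for prevalence in (0.01, 0.10, 0.50):
    ppv, npv = predictive_values(0.90, 0.90, prevalence)
    print(prevalence, round(ppv, 3), round(npv, 3))
```

With the test characteristics unchanged, PPV rises and NPV falls as prevalence increases.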

Sensitivity and Specificity

• Diagnostic Testing is a compromise between sensitivity and specificity

• Most tests offer a compromise between these two measures

• Very often two or more tests may complement each other (one may be high sensitivity, the other may be high specificity)

Sensitivity and Specificity

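• Sensitive tests: useful for screening; the test is usually positive if disease is present, so a negative result helps rule the disease out

• Specific tests: useful for confirming a diagnosis; the test is usually negative if disease is absent, so a positive result helps rule the disease in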

Diagnostic Testing

• It is important to note that there are instances where diagnostic testing will not significantly alter the posttest probability relative to the pretest probability.

Diagnostic Testing

• Diagnostic testing may be less useful in instances of very low or very high probability.

• Diagnostic tests may be thought of as most useful in instances of intermediate probability.
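
This can be illustrated by applying the same likelihood ratio at different pretest probabilities, with the update done on odds. The likelihood ratio of 5 and the pretest values below are hypothetical, and the function names are illustrative.

```python
def post_test_probability(pretest_probability, likelihood_ratio):
    """Convert probability to odds, apply the likelihood ratio, convert back."""
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    posttest_odds = likelihood_ratio * pretest_odds
    return posttest_odds / (1.0 + posttest_odds)


def probability_shift(pretest_probability, likelihood_ratio):
    """Absolute change in probability produced by the test result."""
    return post_test_probability(pretest_probability, likelihood_ratio) - pretest_probability


# Same positive result (hypothetical LR = 5) at low, intermediate, and high pretest probability.
for pretest in (0.01, 0.50, 0.99):
    print(pretest, round(probability_shift(pretest, 5.0), 3))
```

The shift is largest at the intermediate pretest probability and small at both extremes.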

V/Q Scan

J Nucl Med 2013; 54:1–5

Questions?