Section II Descriptive stats for continuous data
Topics: descriptive stats for continuous data, descriptive stats for binary data, and bivariate associations in binary data
Types of data

- Numerical:
  - Continuous: age, SBP, glucose
  - Interval: parity, number of infections
- Ordinal (ranks): cancer stage, Apgar score
- Nominal (no order): gender, ethnicity, treatment
Dataset used to illustrate some statistics in this section

Stomach cancer survival times in controls (Cameron & Pauling, PNAS, Oct 1976), in days from end of treatment to death:

4, 6, 8, 8, 12, 14, 15, 17, 19, 22, 24, 34, 45 (n = 13 subjects)
Measures of central tendency (middle)

Data: 4, 6, 8, 8, 12, 14, 15, 17, 19, 22, 24, 34, 45

- mean = 17.5 days
- median = 15 days
- mode = 8 days
- geometric mean: $GM = \sqrt[13]{4 \times 6 \times 8 \times 8 \times \cdots \times 45} \approx 14.26$ days

If we delete the most extreme value, 45, the mean becomes 15.25, the median 14.5 and the GM about 13.0. The median changes least.
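A minimal sketch, using only Python's standard `statistics` module (Python 3.8+ for `geometric_mean`), that recomputes these summaries with and without the extreme value 45. The helper name `summarize` is hypothetical, chosen for illustration.

```python
import statistics

# Stomach cancer survival times (days) in controls, from the dataset above
days = [4, 6, 8, 8, 12, 14, 15, 17, 19, 22, 24, 34, 45]

def summarize(values):
    """Return mean, median, mode and geometric mean of positive values."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),          # most frequent value
        "gm": statistics.geometric_mean(values),  # n-th root of the product
    }

full = summarize(days)
trimmed = summarize(days[:-1])  # drop the most extreme value, 45

for name in ("mean", "median", "gm"):
    print(f"{name:6s} all: {full[name]:6.2f}  without 45: {trimmed[name]:6.2f}")
```

The median moves by 0.5 day, the GM by about 1.3 days and the mean by about 2.3 days.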
Mean versus median (lesson #1 in how to lie with statistics)

Yearly income data (in dollars) from n = 11 persons: one income is for Dr Brilliant, the other 10 are from her 10 graduate students.

950, 960, 970, 980, 990, 1010, 1020, 1030, 1040, 1050, 100,000 (total $110,000)

mean = 110,000/11 = $10,000; median = $1010 (the sixth ordered value). Which is the better summary of a “typical” value?
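A short Python sketch (standard library only) contrasting the two summaries with and without Dr Brilliant's income; the variable names are illustrative.

```python
import statistics

# Yearly incomes in dollars: 10 graduate students, then Dr Brilliant
students = [950, 960, 970, 980, 990, 1010, 1020, 1030, 1040, 1050]
everyone = students + [100_000]

for label, incomes in (("students only", students), ("all 11 persons", everyone)):
    mean = statistics.mean(incomes)
    median = statistics.median(incomes)
    print(f"{label}: mean = {mean}, median = {median}")
```

One extreme value moves the mean from $1000 to $10,000, while the median only moves from $1000 to $1010.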
Example: survival times in women with advanced breast cancer

Survival time in days after end of radiotherapy:

| Woman | After 275 days f/u | After 305 days f/u |
|---|---|---|
| 1 | 14 | 14 |
| 2 | 26 | 26 |
| 3 | 43 | 43 |
| 4 | 45 | 45 |
| 5 | 50 | 50 |
| 6 | 58 | 58 |
| 7 | 60 | 60 |
| 8 | 62 | 62 |
| 9 | 70 | 70 |
| 10 | 70 | 70 |
| 11 | 83 | 83 |
| 12 | 98* | 128* |
| 13 | 104* | 134* |
| 14 | 124* | 154* |
| 15 | 125* | 155* |
| 16 | 275* | 305* |
| mean | 75.6 | 83.1 |
| median | 66.0 | 66.0 |
| SD | 55.8 | 66.3 |

Values marked * are still alive (censored).

The median is still a valid measure when less than half the data are censored.
Cumulative frequencies & survival

| Days | Num dead | Pct dead | Cum dead | Cum pct dead | Cum pct alive = S |
|---|---|---|---|---|---|
| 1-10 | 4 | 30.8 | 4 | 30.8 | 69.2 |
| 11-20 | 5 | 38.5 | 9 | 69.2 | 30.8 |
| 21-30 | 2 | 15.4 | 11 | 84.6 | 15.4 |
| 31-40 | 1 | 7.7 | 12 | 92.3 | 7.7 |
| 41-50 | 1 | 7.7 | 13 | 100.0 | 0 |
| Total | 13 | | | | |
Stomach cancer survival time in days

| Day | Cum dead | Cum incidence | Survival |
|---|---|---|---|
| 4 | 1 | 7.7% | 92.3% |
| 6 | 2 | 15.4% | 84.6% |
| 8 | 4 | 30.8% | 69.2% |
| 12 | 5 | 38.5% | 61.5% |
| 14 | 6 | 46.2% | 53.8% |
| 15 | 7 | 53.8% | 46.2% |
| 17 | 8 | 61.5% | 38.5% |
| 19 | 9 | 69.2% | 30.8% |
| 22 | 10 | 76.9% | 23.1% |
| 24 | 11 | 84.6% | 15.4% |
| 34 | 12 | 92.3% | 7.7% |
| 45 | 13 | 100.0% | 0.0% |
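Both tables can be rebuilt from the raw survival times with a few lines of Python. This is a sketch that relies on every subject being followed to death (no censoring), which is what lets survival be taken as 1 minus the cumulative incidence; the bin edges match the 10-day intervals used above.

```python
from collections import Counter

days = [4, 6, 8, 8, 12, 14, 15, 17, 19, 22, 24, 34, 45]
n = len(days)

# Binned table: 10-day intervals 1-10, 11-20, ..., 41-50
bins = [(1, 10), (11, 20), (21, 30), (31, 40), (41, 50)]
cum_dead = 0
binned = []
for lo, hi in bins:
    dead = sum(lo <= d <= hi for d in days)  # deaths in this interval
    cum_dead += dead
    cum_pct = 100 * cum_dead / n
    binned.append((lo, hi, dead, 100 * dead / n, cum_dead, cum_pct, 100 - cum_pct))

for row in binned:
    print("{}-{}: dead={} ({:.1f}%), cum={} ({:.1f}%), S={:.1f}%".format(*row))

# Per-day table: cumulative incidence and survival at each distinct death day
deaths_by_day = Counter(days)
cum_dead = 0
per_day = []
for day in sorted(deaths_by_day):
    cum_dead += deaths_by_day[day]
    per_day.append((day, cum_dead, cum_dead / n, 1 - cum_dead / n))

for day, cum, inc, surv in per_day:
    print(f"day {day:2d}: cum dead {cum:2d}, cum incidence {inc:.1%}, survival {surv:.1%}")
```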
[Figure: Stomach cancer cum incidence & survival; x-axis: days (0 to 54); y-axis: 0% to 100%; curves: cum incidence, survival]
Bevacizumab & Ovarian Cancer (Berger et al., NEJM, Dec 2011)
Why survival curves?

[Figure: percent alive (0% to 100%) vs day (0 to 10)]
Summarizing mortality: hazard rates

Hazard rate = h = (number of persons with the outcome) / (total person-time of follow-up in all at risk)

This is a rate per unit of person-time. It is NOT a probability (not a risk).

In the stomach cancer data, n = 13 with 13 deaths, so total follow-up is 4 + 6 + 8 + 8 + 12 + 14 + 15 + 17 + 19 + 22 + 24 + 34 + 45 = 228 person-days.

Hazard rate = mortality rate = 13/228 = 0.057, or 5.7 deaths per 100 person-days of follow-up. Do NOT report this as 5.7% (wrong).
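A minimal Python sketch of the person-time calculation for the stomach cancer data. It assumes, as stated above, that all 13 subjects were followed to death, which is also why 1/h equals the mean survival time here.

```python
days = [4, 6, 8, 8, 12, 14, 15, 17, 19, 22, 24, 34, 45]
deaths = 13                 # every subject was followed to death
person_days = sum(days)     # total person-time at risk
hazard = deaths / person_days

print(f"person-days = {person_days}")
print(f"h = {hazard:.3f} per person-day = {100 * hazard:.1f} deaths per 100 person-days")
# With no censoring, 1/h equals the mean time to event
print(f"1/h = {1 / hazard:.2f} days")
```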
Example: Why hazard rates?

| Group | n | Num dead | Mean f/u | Total f/u | Rate per 1000 |
|---|---|---|---|---|---|
| A | 100 | 7 | 36 | 3600 | 7/3600 = 1.94 |
| B | 100 | 2 | 3 | 300 | 2/300 = 6.67 |

The mortality rate is higher for B than for A, even though both groups have the same number of persons and more people died in group A.

The hazard rate ratio for A vs B is 1.94/6.67 = 0.291.

When ALL patients are followed to the endpoint (no censoring), mean time to event = 1/hazard.
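The same comparison as a small Python sketch. The time unit of follow-up is not stated in the table, so rates are reported per 1000 units of person-time.

```python
# Group summaries from the table: (number dead, total follow-up time)
groups = {"A": (7, 3600), "B": (2, 300)}

# Hazard rate = deaths / total person-time, per group
rates = {g: dead / follow_up for g, (dead, follow_up) in groups.items()}
for g, h in rates.items():
    print(f"group {g}: {1000 * h:.2f} deaths per 1000 person-time units")

hr_a_vs_b = rates["A"] / rates["B"]
print(f"HR (A vs B) = {hr_a_vs_b:.3f}")
```

Working from unrounded rates gives HR = 7/24 ≈ 0.292; the 0.291 above comes from dividing the rounded rates.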
Hazard rates & survival curves

[Figure, two panels for hazard rates h = 0.2 and h = 0.4, t = 0 to 12. Panel 1: survival S (0% to 100%) vs t. Panel 2: log survival, loge(S) (0 to −6), vs t.]

$\log_e(S) = -\text{cumulative hazard} = -h\,t$, so $h$ is the (average) slope of $-\log_e(S)$ vs $t$.
Hazard rate ratios & survival curves

$h_a$ = hazard rate in group A, $h_b$ = hazard rate in group B. The hazard rate ratio (HR) for A compared to B is $HR = h_a / h_b$.

If HR is constant over time, one can compute the survival in group A from the survival in group B:

$S_a = S_b^{HR}$

Example: HR = 0.291 and survival at t = 12 months is 90% in group B ($S_b = 0.90$), so $S_a = 0.90^{0.291} = 0.970$, or 97.0% in group A at t = 12 months.

A “protective” HR < 1 increases survival; HR > 1 decreases survival.
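A Python sketch of $S_a = S_b^{HR}$, plus a check against the constant-hazard model $S(t) = e^{-h t}$. The group B hazard of 0.10 per month is a hypothetical value chosen only for the check, and `survival_from_hr` is an illustrative helper name.

```python
import math

def survival_from_hr(s_reference: float, hr: float) -> float:
    """Survival in group A from survival in reference group B at the same time,
    assuming a constant hazard ratio HR = hA / hB."""
    return s_reference ** hr

print(f"S_A(12 months) = {survival_from_hr(0.90, 0.291):.3f}")  # example above

# Check with constant hazards: S(t) = exp(-h t), so -ln S(t) = h t
h_b = 0.10                  # hypothetical hazard in group B, per month
h_a = 0.291 * h_b           # hazard in group A implied by HR = 0.291
t = 12.0
s_b = math.exp(-h_b * t)
s_a = math.exp(-h_a * t)
print(f"direct: {s_a:.6f}, via S_B**HR: {survival_from_hr(s_b, 0.291):.6f}")
```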
Cumulative hazard rate

$-\log_e S(t) = \text{cumulative hazard} = \sum_{t_i \le t} h_i = \int_0^t h(u)\,du$

If h is constant over time, cumulative hazard = $h\,T$, where T is the follow-up time. In this case $h$ = cumulative hazard / T, so h is the slope of the cumulative hazard vs t plot.
From: Risks and Benefits of Estrogen Plus Progestin in Healthy Postmenopausal Women: Principal Results From the Women's Health Initiative Randomized Controlled Trial. JAMA. 2002;288(3):321-333.

HR indicates hazard ratio; nCI, nominal confidence interval; and aCI, adjusted confidence interval. Global index = first occurrence of CHD, cancer, stroke, pulmonary embolism, hip fracture or death.
Distribution skewness

Long right-tailed distribution: median < mean (common for survival data)

[Figure: right-skewed density (x from 0 to 9) with median and mean marked]
Example: ICU length of stay (Howard)

n = 94, mean = 11.3 days, median = 6 days, min = 1 day, max = 80 days

[Figure: histogram of LOS_ICU (days, 6 to 78); y-axis: percent]
Skewness

Long left-tailed distribution: median > mean (not as common in biology/medicine)

[Figure: left-skewed density (x from 0 to 9) with median and mean marked]
Symmetric (common in biology)

[Figure: symmetric density (x from −3.5 to 3.5); mean and median coincide]

A distribution can be symmetric (with one mode) without being bell-curve shaped. When data have a skewed distribution, “nonparametric” methods must be used.
Measures of variation, spread: IQR (interquartile range)

[Figure: density on 0 to 20 divided by Q1, median and Q3 into four areas of 25% each]

IQR = Q3 − Q1
Box-whisker plot

[Figure: box-whisker plot on a 0 to 50 scale marking min, Q1, median, Q3, max and the mean]
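A box-whisker plot is drawn from the five-number summary. This Python sketch computes it for the stomach cancer survival times as an illustration; the data behind the plot are not stated. It uses `statistics.quantiles` (Python 3.8+) with its default "exclusive" method, and other software may use a different quartile convention and give slightly different Q1 and Q3.

```python
import statistics

days = [4, 6, 8, 8, 12, 14, 15, 17, 19, 22, 24, 34, 45]

# Quartiles via Python's default "exclusive" interpolation method
q1, median, q3 = statistics.quantiles(days, n=4)
five_number = (min(days), q1, median, q3, max(days))

print("min, Q1, median, Q3, max =", five_number)
print("IQR =", q3 - q1)
print("mean =", round(statistics.mean(days), 2))
```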
Variation: variance & SD

Mean = $\bar{Y}$ = 17.54 days

| $Y$ | $Y - \bar{Y}$ | $(Y - \bar{Y})^2$ |
|---|---|---|
| 4 | −13.54 | 183.3 |
| 6 | −11.54 | 133.2 |
| 8 | −9.54 | 91.0 |
| 8 | −9.54 | 91.0 |
| 12 | −5.54 | 30.7 |
| 14 | −3.54 | 12.5 |
| 15 | −2.54 | 6.5 |
| 17 | −0.54 | 0.3 |
| 19 | 1.46 | 2.1 |
| 22 | 4.46 | 19.9 |
| 24 | 6.46 | 41.7 |
| 34 | 16.46 | 270.9 |
| 45 | 27.46 | 754.1 |
| sum | 0 | 1637.2 |

$\text{Variance} = \frac{\sum_i (Y_i - \bar{Y})^2}{n - 1}$

Var = 1637.2/12 = 136.4

SD = $\sqrt{\text{Variance}} = \sqrt{136.4}$ = 11.7 days
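The table's arithmetic as a Python sketch, cross-checked against `statistics.variance` and `statistics.stdev`, which use the same n − 1 divisor.

```python
import math
import statistics

days = [4, 6, 8, 8, 12, 14, 15, 17, 19, 22, 24, 34, 45]
n = len(days)
mean = sum(days) / n

# Sum of squared deviations from the mean, divided by n - 1
ss = sum((y - mean) ** 2 for y in days)
variance = ss / (n - 1)
sd = math.sqrt(variance)

print(f"mean = {mean:.2f}, SS = {ss:.1f}, Var = {variance:.1f}, SD = {sd:.1f}")

# Library versions use the same n - 1 divisor
assert math.isclose(variance, statistics.variance(days))
assert math.isclose(sd, statistics.stdev(days))
```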
Variation: interpreting the SD

Rule of thumb from Gaussian (“Normal”) theory (will study more shortly); the rule is OK if the data have a unimodal, symmetric distribution:

- Range of the middle 2/3 (about 68%) of the data: mean ± SD
- Range of the middle 95% of the data: mean ± 2 SD
- This implies SD ≈ range/4 (after extreme values are removed from the range)
SD of differences: paired data (chol in mmol/L)

| Person | Chol at start | Chol at end | Difference |
|---|---|---|---|
| 1 | 12.6 | 10.0 | 2.6 |
| 2 | 8.5 | 7.5 | 1.0 |
| 3 | 7.0 | 5.8 | 1.2 |
| 4 | 6.9 | 4.9 | 2.0 |
| 5 | 5.8 | 4.0 | 1.8 |
| 6 | 4.1 | 3.8 | 0.3 |
| mean | 7.48 | 6.00 | 1.48 |
| SD | 2.90 | 2.38 | 0.82 |

Correlation of start vs end: r = 0.971
If authors only report (mmol/L):

| | Start | End | Change |
|---|---|---|---|
| mean | 7.48 | 6.00 | ?? |
| SD | 2.90 | 2.38 | ?? |

It is easy to get the mean difference: 7.48 − 6.00 = 1.48.
But we can't get the SD of the differences: 2.90 − 2.38 = 0.52 ≠ 0.82.

The 1.48 mean difference is the average response; the 0.82 SD of the differences is the variation in response.

$SD_{diff} = \sqrt{SD_{start}^2 + SD_{end}^2 - 2\,r\,SD_{start}\,SD_{end}}$, where r = correlation coefficient (so r must also be known).
SD of differences: two independent groups

Comparing ages in groups A vs B:

| | Group A | Group B | B − A | B + A |
|---|---|---|---|---|
| Data | 30, 35, 77, 41 | 50, 51, 55 | | |
| n | 4 | 3 | | |
| mean | 45.75 | 52.00 | 6.25 | 97.75 |
| SD | 18.46 | 2.16 | 18.58 | 18.58 |
| Var | 340.69 | 4.67 | 345.35 | 345.35 |

In this example, Var and SD are computed with divisor n (not n − 1).
All possible differences, B − A

| A / B | 50 | 51 | 55 |
|---|---|---|---|
| 30 | 20 | 21 | 25 |
| 35 | 15 | 16 | 20 |
| 77 | −27 | −26 | −22 |
| 41 | 9 | 10 | 14 |

mean = 6.25, SD = 18.58, Var = 345.35

All possible sums, B + A

| A / B | 50 | 51 | 55 |
|---|---|---|---|
| 30 | 80 | 81 | 85 |
| 35 | 85 | 86 | 90 |
| 77 | 127 | 128 | 132 |
| 41 | 91 | 92 | 96 |

mean = 97.75, SD = 18.58, Var = 345.35
Rule for SD of differences: two independent groups

If X and Y are independent:

$\text{Var}(Y - X) = \text{Var}(Y) + \text{Var}(X)$, $\quad \text{Var}(Y + X) = \text{Var}(Y) + \text{Var}(X)$

$SD(Y - X) = \sqrt{SD^2(Y) + SD^2(X)}$, $\quad SD(Y + X) = \sqrt{SD^2(Y) + SD^2(X)}$

[Figure: diagram relating SD(X), SD(Y) and SD(Y − X)]
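A Python sketch that enumerates all 12 (B, A) pairings from the age example and confirms that the variance of the differences and of the sums both equal Var(A) + Var(B). It uses population variances (divisor n), matching the numbers in the tables.

```python
import itertools
import math
import statistics

group_a = [30, 35, 77, 41]
group_b = [50, 51, 55]

# Every (B, A) pairing: 3 x 4 = 12 differences and sums
diffs = [b - a for b, a in itertools.product(group_b, group_a)]
sums = [b + a for b, a in itertools.product(group_b, group_a)]

# Population variances (divisor n), as in the tables
var_a = statistics.pvariance(group_a)
var_b = statistics.pvariance(group_b)

print(f"Var(A) = {var_a:.2f}, Var(B) = {var_b:.2f}, sum = {var_a + var_b:.2f}")
print(f"Var(B - A) = {statistics.pvariance(diffs):.2f}, "
      f"Var(B + A) = {statistics.pvariance(sums):.2f}")
print(f"SD(B - A) = {math.sqrt(var_a + var_b):.2f}")
```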
BINARY DATA statistics
Associations for binary data

| | Disease | No disease | Total |
|---|---|---|---|
| Exposed (e) | a | b | a+b |
| Unexposed (u) | c | d | c+d |

| | Risk = P | Odds = O |
|---|---|---|
| Exposed | Pe = a/(a+b) | Oe = a/b |
| Unexposed | Pu = c/(c+d) | Ou = c/d |
| Ratio | RR = Pe/Pu | OR = Oe/Ou |
Risk vs odds (P = risk, O = odds)

- O = P/(1 − P) and P = O/(1 + O). For example, P = 1/10 gives O = 1/9.
- Risk = number sick/total; odds = number sick/number not sick.
- RR = OR/(1 − Pu + OR·Pu). When Pu is small, RR ≈ OR.
- In general, the OR is more extreme (farther from 1) than the RR.
Oral contraceptive exposure vs cancer: prospective study (unbiased estimate of population)

| | Disease | No disease | Total | Risk | Odds |
|---|---|---|---|---|---|
| Exposed | 50 | 950 | 1000 | 0.050 | 0.053 |
| Unexposed | 200 | 8550 | 8750 | 0.0229 | 0.0234 |
| Total | 250 | 9500 | 9750 | | |
| OC use (P) | 20% | 10% | | | |

RR = 2.188, OR = 2.250
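A Python sketch computing risks, odds, RR and OR from a 2x2 table laid out as above (a, b exposed; c, d unexposed), then recovering the RR from the OR with RR = OR/(1 − Pu + OR·Pu). The helper name `two_by_two` is hypothetical.

```python
def two_by_two(a: int, b: int, c: int, d: int) -> dict:
    """Risks, RR and OR from a 2x2 table:
    exposed: a diseased, b not; unexposed: c diseased, d not."""
    p_e, p_u = a / (a + b), c / (c + d)  # risks
    o_e, o_u = a / b, c / d              # odds
    return {"Pe": p_e, "Pu": p_u, "RR": p_e / p_u, "OR": o_e / o_u}

res = two_by_two(50, 950, 200, 8550)  # OC exposure vs cancer table
print({k: round(v, 4) for k, v in res.items()})

# Convert OR back to RR: RR = OR / (1 - Pu + OR * Pu)
rr_from_or = res["OR"] / (1 - res["Pu"] + res["OR"] * res["Pu"])
print(f"RR from OR = {rr_from_or:.4f}")
```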
Ratios and differences

For rare events or diseases, e.g. Pe = 1/10,000 and Pu = 1/100,000: RR = 10, but the risk difference is only 9/100,000. It is misleading to report only the ratio and not the actual risks.
Odds: case-control study

| | Cancer | No cancer |
|---|---|---|
| OC | 100 | 5 |
| No OC | 400 | 45 |
| Total | 500 | 50 |
| OC use (P) | 20% | 10% |
| Odds (O) | 0.25 | 0.11 |

OR = 2.25
Why use ORs?

1. In a prospective study, we usually quote disease risk and the risk ratio (RR). In a case-control study, we always quote the OR, not the RR. The case-control OR of exposure in disease/no disease equals the prospective OR of disease in exposed/unexposed in the population, provided the cases and controls have the same probability of exposure as cases and non-cases in the target population. (This is not necessarily true if there is confounding or bias.)
2. The OR is more “stable” (universal) across studies. If the unexposed risk = 20% and RR = 2, the exposed risk = 40%. If the unexposed risk = 60%, RR can't be 2, since the exposed risk would be 120%.
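A Python sketch using the case-control counts above. The exposure OR reproduces the prospective OR of 2.25, whereas a "risk ratio" taken straight from the case-control table does not estimate anything useful, because the 500:50 case:control ratio was fixed by the study design.

```python
# Case-control table: exposure (OC use) counts by disease status
cases = {"OC": 100, "no OC": 400}      # 500 with cancer
controls = {"OC": 5, "no OC": 45}      # 50 without cancer

# Exposure odds ratio: odds of OC use in cases vs controls
exposure_or = (cases["OC"] / cases["no OC"]) / (controls["OC"] / controls["no OC"])
print(f"exposure OR = {exposure_or:.2f}")

# "Risk" of cancer by exposure within this sample: depends on the design's
# case:control ratio, so it does not estimate the population RR
naive_rr = (cases["OC"] / (cases["OC"] + controls["OC"])) / (
    cases["no OC"] / (cases["no OC"] + controls["no OC"]))
print(f"naive 'RR' = {naive_rr:.2f} (not the population RR of about 2.19)")
```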
Independence rule for ORs

ORs for heart attack (MI): for smokers vs non-smokers, OR = 4; for alcohol vs no alcohol, OR = 2.

If the two effects are independent, the OR for those who smoke AND drink alcohol is 4 × 2 = 8 (relative to those who neither smoke nor drink). This is only true if smoking and drinking are independent influences on MI. However, smoking and drinking can be correlated with each other.
NNT: number needed to treat (or harm) (clinical trials)

- Pc (like Pu) = proportion with disease in the control group
- Pt (like Pe) = proportion with disease in the treated group
- ARR = absolute risk reduction = risk difference = RD = Pc − Pt
- RRR = relative risk reduction = (Pc − Pt)/Pc = ARR/Pc = 1 − RR
- NNT = number needed to treat = 1/ARR
NNT example

Pc = 0.36 = 36%, Pt = 0.34 = 34%

- ARR = RD = 0.02 = 2%
- RRR = 0.02/0.36 = 5.6% (a percent of a percent)
- NNT = 1/0.02 = 50

So 50 patients must be given the treatment to avoid one additional case of disease. NNT can be extended to more complex statistics.
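The definitions above as a small Python function, applied to the example proportions; `trial_summary` is an illustrative name, not from any library.

```python
def trial_summary(p_control: float, p_treat: float) -> dict:
    """ARR, RR, RRR and NNT from disease proportions in control and treated groups."""
    arr = p_control - p_treat          # absolute risk reduction (risk difference)
    rr = p_treat / p_control           # risk ratio, treated vs control
    return {"ARR": arr, "RR": rr, "RRR": arr / p_control, "NNT": 1 / arr}

s = trial_summary(0.36, 0.34)
print(f"ARR = {s['ARR']:.3f}, RRR = {s['RRR']:.1%}, NNT = {s['NNT']:.0f}")
```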
NNT: ovarian cancer screening

“Tests commonly recommended to screen healthy
women for ovarian cancer do more harm than good and should not be performed, a panel of medical experts said on Monday. The screenings —blood tests for a substance linked to cancer and ultrasound scans to examine the ovaries — do not lower the death rate from the disease, and they yield many false-positive results that lead to unnecessary operations with high complication rates, the panel said.
…“To find one case of ovarian cancer, 20 women had to undergo surgery.” (NY Times, 10 Sept 2012)
Summary: ratios

| | Risk | Odds | Hazard |
|---|---|---|---|
| Measure | P | O | h |
| Ratio | RR = Pe/Pu | OR = Oe/Ou | HR = he/hu |

All have the null value of 1.0 when there is no association. From study to study, the logs of these ratios are usually distributed in a bell-curve shape around the true log-scale value.
Sensitivity and specificity

| | True disease | True no disease |
|---|---|---|
| Test positive | a | b |
| Test negative | c | d |
| Total | a+c | b+d |

- Sensitivity = a/(a+c); false negative rate = c/(a+c)
- Specificity = d/(b+d); false positive rate = b/(b+d)
- Positive predictive value = PPV = a/(a+b) (*)
- Negative predictive value = NPV = d/(c+d) (*)

(*) Depends on disease prevalence, not just on attributes of the test.
Sensitivity, specificity, accuracy

Accuracy = W × Sensitivity + (1 − W) × Specificity, where 0 < W < 1. Often W = 0.5 (unweighted accuracy).

We wish to maximize accuracy, which is the same as minimizing misclassification = 1 − Accuracy. Choose W depending on the “costs” of each type of error.
ROC curve: choose the continuous-data cutpoint (threshold) with the highest accuracy, i.e. the best “separation”.

[Figure: two overlapping distributions of a continuous test value Y on a 0 to 1 scale]
“Modern” format for ROC

[Figure, two panels. Panel 1: sensitivity, specificity and accuracy (0% to 100%) vs Yc = threshold (cutpoint), 0.0 to 1.0. Panel 2: unweighted accuracy vs Yc.]

Highest accuracy is NOT necessarily where sens = spec (only when SD1 = SD2).
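A Python sketch of the “modern” approach: scan every observed value as a cutpoint, compute sensitivity, specificity and unweighted accuracy, and keep the best. The test values are hypothetical (not from the source), and a result is called positive when the value is at or above the cutpoint.

```python
# Hypothetical continuous test values (NOT from the source)
diseased = [0.45, 0.55, 0.60, 0.62, 0.70, 0.75, 0.80, 0.90]
healthy = [0.10, 0.20, 0.25, 0.30, 0.35, 0.40, 0.58, 0.65]

def metrics_at(cut: float, w: float = 0.5) -> tuple:
    """Sensitivity, specificity and weighted accuracy at cutpoint Yc = cut."""
    sens = sum(y >= cut for y in diseased) / len(diseased)
    spec = sum(y < cut for y in healthy) / len(healthy)
    return sens, spec, w * sens + (1 - w) * spec

# Candidate cutpoints: every observed value
cutpoints = sorted(set(diseased + healthy))
table = [(c, *metrics_at(c)) for c in cutpoints]
for c, sens, spec, acc in table:
    print(f"Yc = {c:.2f}: sens = {sens:.3f}, spec = {spec:.3f}, accuracy = {acc:.3f}")

best = max(table, key=lambda row: row[3])
print("highest unweighted accuracy at Yc =", best[0])
```

With these made-up values the best cutpoint (0.45, accuracy 0.875) has sens = 1.0 and spec = 0.75, while the cutpoint where sens = spec (0.58) reaches only 0.75.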
“Traditional” ROC (not recommended: hard to label cutpoints)

[Figure: traditional ROC curve, sensitivity (0% to 100%) vs false positive rate = 1 − specificity (0% to 100%)]
C (concordance) statistic for ROC

C = area under the “traditional” ROC curve: 0.5 (bad, no better than chance) < C < 1.0 (good).

Let nd = a + c be the true number with disease and nnd = b + d the true number without disease. From all possible nd × nnd pairs with one diseased and one non-diseased person, call a pair “concordant” if the diseased person tests positive and the non-diseased person tests negative (for a continuous test, if the diseased person has the higher value, when higher values indicate disease). C is the proportion of the pairs that are concordant, with tied pairs counting as one half.
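A brute-force Python sketch of the pair definition, with ties counted as one half. The continuous values are hypothetical (not from the source), and `c_statistic` is an illustrative name.

```python
from itertools import product

# Hypothetical test values (NOT from the source); higher = more suggestive of disease
diseased = [0.45, 0.55, 0.60, 0.62, 0.70, 0.75, 0.80, 0.90]
healthy = [0.10, 0.20, 0.25, 0.30, 0.35, 0.40, 0.58, 0.65]

def c_statistic(diseased, healthy):
    """Proportion of (diseased, non-diseased) pairs that are concordant,
    i.e. the diseased member has the higher value; ties count 1/2."""
    score = 0.0
    for x, y in product(diseased, healthy):
        if x > y:
            score += 1
        elif x == y:
            score += 0.5
    return score / (len(diseased) * len(healthy))

print(f"C = {c_statistic(diseased, healthy):.3f}")
```

For a binary test (1 = positive, 0 = negative) this pair count reduces to (sens + spec)/2. For example, the 95/5 and 20/1980 counts from the example table give C = 0.97.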
Positive and negative predictive value

Positive predictive value (PPV) and negative predictive value (NPV) depend on sensitivity (sens), specificity (spec) and disease prevalence (P). Sensitivity and specificity do NOT depend on disease prevalence.

PPV = a/(a+b) and NPV = d/(c+d) can be computed directly from the table only when the table's disease prevalence, (a+c)/(a+b+c+d) = (a+c)/n, equals the disease prevalence P in the population of interest.

Bayes formulas for PPV and NPV, with P = prevalence of disease:

PPV = true positives / (true positives + false positives) = sens × P / [sens × P + (1 − spec) × (1 − P)]

NPV = true negatives / (true negatives + false negatives) = spec × (1 − P) / [spec × (1 − P) + (1 − sens) × P]

But don't use these formulas: there is an easier way.
Example

| | Disease | No disease | Total |
|---|---|---|---|
| Test positive | 95 | 20 | 115 |
| Test negative | 5 | 1980 | 1985 |
| Total | 100 | 2000 | 2100 |

Sens = 95/100 = 0.95, Spec = 1980/2000 = 0.99, disease prevalence = P = 100/2100 = 0.0476

PPV = (0.95 × 0.0476) / [0.95 × 0.0476 + 0.01 × 0.9524] = 0.826; directly from the table, PPV = 95/115 = 0.826

NPV = (0.99 × 0.9524) / [0.99 × 0.9524 + 0.05 × 0.0476] = 0.9975; directly from the table, NPV = 1980/1985 = 0.9975
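A Python sketch of the Bayes formulas, checked against the direct table calculation. The last line applies the same test to a hypothetical population with 1% prevalence to show how much PPV depends on prevalence.

```python
def ppv_npv(sens: float, spec: float, prevalence: float) -> tuple:
    """Bayes formulas for predictive values from sens, spec and prevalence."""
    p = prevalence
    ppv = sens * p / (sens * p + (1 - spec) * (1 - p))
    npv = spec * (1 - p) / (spec * (1 - p) + (1 - sens) * p)
    return ppv, npv

a, b, c, d = 95, 20, 5, 1980          # example table
sens, spec = a / (a + c), d / (b + d)
prev = (a + c) / (a + b + c + d)
ppv, npv = ppv_npv(sens, spec, prev)
print(f"Bayes: PPV = {ppv:.3f}, NPV = {npv:.4f}")
print(f"Table: PPV = {a / (a + b):.3f}, NPV = {d / (c + d):.4f}")

# Same test in a hypothetical population with 1% prevalence
print("PPV at 1% prevalence = {:.3f}".format(ppv_npv(sens, spec, 0.01)[0]))
```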
Bayesian “paradigm” for PPV

| | Odds of disease | Probability of disease |
|---|---|---|
| Prior | 100/2000 = 0.05 | 100/2100 = 0.0476 = 4.76% |
| Positive test “data”: likelihood ratio (LR) | sensitivity/false pos = 0.95/0.01 = 95 | (not applicable) |
| Posterior given positive test = prior × LR = PPV | 0.05 × 95 = 4.75 | 4.75/(1 + 4.75) = 0.826 = 82.6% |

LR = Prob(+ test | disease) / Prob(+ test | no disease)

Posterior odds = prior odds × LR

Bayes: prior → data → posterior. The prior probability is updated with the data (LR) to get a posterior probability (PPV).
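The odds form of the update as a Python sketch: convert the prior probability to odds, multiply by the LR, and convert back. The helper name `posterior_probability` is illustrative.

```python
def posterior_probability(prior_prob: float, lr: float) -> float:
    """Update a prior probability with a likelihood ratio via odds:
    posterior odds = prior odds x LR, then convert odds back to probability."""
    prior_odds = prior_prob / (1 - prior_prob)
    post_odds = prior_odds * lr
    return post_odds / (1 + post_odds)

sens, false_pos = 0.95, 0.01
lr_positive = sens / false_pos          # LR of a positive test
prior = 100 / 2100                      # prevalence in the example (odds 0.05)
print(f"LR+ = {lr_positive:.0f}, PPV = {posterior_probability(prior, lr_positive):.3f}")
```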
Bayes paradigm (algebra): prior → (test) data → posterior

| | Disease | No disease | Total |
|---|---|---|---|
| Test positive | a | b | a+b |
| Test negative | c | d | c+d |
| Total | a+c | b+d | n |

- Prior disease risk = (a+c)/n, where n = a+b+c+d; prior disease odds = (a+c)/(b+d)
- Test data: LR for a positive test = sens/false pos = [a/(a+c)] / [b/(b+d)], i.e. the RR of a positive test (disease vs no disease)
- Posterior odds of disease = prior odds × LR(positive test) = a/b
- Posterior disease risk = a/(a+b) = PPV
Ex: FASTER Trial (NEJM 353:19, 10 Nov 2005)

Prior odds of Down's syndrome (varies with gestational age)
↓ LR from biochemical markers (& other factors/data)
↓
Posterior odds of Down's syndrome