Multiple Comparisons, Outliers
Arthur Berg, Pennsylvania State University
Multiple Comparisons Traps sequential trap GWAS outliers
The difficult and ubiquitous problems of multiplicities
Most scientists are oblivious to the problems of multiplicities. Yet they are everywhere. In one or more of its forms, multiplicities are present in every statistical application. They may be out in the open or hidden. And even if they are out in the open, recognizing them is but the first step in a difficult process of inference. Problems of multiplicities are the most difficult that we statisticians face. They threaten the validity of every statistical conclusion.
D. Berry (2007)
Arthur Berg Multiple Comparisons, Outliers 2 / 23
- Analyzing data without a plan
- Multiple time points – sequential analysis
- Multiple subgroups
- Combining groups
- Post-hoc analyses: coincidences and disease clusters
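Why each of these traps matters can be seen from a short calculation. As a minimal sketch, assume m independent tests, each run at level alpha = 0.05 with every null hypothesis true; the chance of at least one false positive is then 1 - (1 - alpha)^m. The independence assumption and the chosen values of m are illustrative and not taken from the slides.

```r
# Family-wise error rate for m independent tests at level alpha,
# assuming every null hypothesis is true (illustrative assumption).
alpha <- 0.05
m <- c(1, 5, 10, 20)              # hypothetical numbers of tests
fwer <- 1 - (1 - alpha)^m         # P(at least one false positive)
print(data.frame(tests = m, fwer = round(fwer, 3)))
```

Under these assumptions, 20 looks at the data give roughly a 64% chance of at least one spurious "significant" result.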
> set.seed(2)
> x <- rnorm(3)
> y <- rnorm(3)
> t.test(x, y)
Welch Two Sample t-test
data: x and y
t = 0.7959, df = 3.084, p-value = 0.4828
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-1.913428 3.216086
sample estimates:
mean of x mean of y
0.2919267 -0.3594024
> x <- append(x, rnorm(1))
> y <- append(y, rnorm(1))
> p <- t.test(x, y)$p.val
> p
[1] 0.2772826
> counter <- 0
> while (p > 0.05) {
+     x <- append(x, rnorm(1))
+     y <- append(y, rnorm(1))
+     p <- t.test(x, y)$p.val
+     counter <- counter + 1
+ }
> counter
[1] 4
> length(x)
[1] 8
> p
[1] 0.04106854
> counter <- 0
> while (p > 0.01) {
+     x <- append(x, rnorm(1))
+     y <- append(y, rnorm(1))
+     p <- t.test(x, y)$p.val
+     counter <- counter + 1
+ }
> counter
[1] 3
> length(x)
[1] 11
> p
[1] 0.005338842
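The session above is a single run. To see how often the "keep sampling until significant" strategy succeeds when there is truly no difference, the procedure can be repeated many times. The following is a rough sketch, not part of the original slides: the function name `stop_when_significant`, the cap of 100 observations per group, the seed, and the number of simulated studies are all assumptions made for illustration.

```r
# Simulate optional stopping under a true null: both groups are N(0, 1).
# stop_when_significant(), n_max, n_sim and the seed are illustrative choices.
set.seed(2011)

stop_when_significant <- function(alpha = 0.05, n_start = 3, n_max = 100) {
  x <- rnorm(n_start)
  y <- rnorm(n_start)
  repeat {
    # Test after every new pair of observations, as in the slides.
    if (t.test(x, y)$p.value < alpha) return(TRUE)
    if (length(x) >= n_max) return(FALSE)
    x <- c(x, rnorm(1))
    y <- c(y, rnorm(1))
  }
}

n_sim <- 500
false_positive_rate <- mean(replicate(n_sim, stop_when_significant()))
print(false_positive_rate)
```

Each individual test is run at the 5% level, but the proportion of simulated studies that reach "significance" at some point is the quantity to compare with 0.05.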
20 Questions
20 Questions (cont.)
> d <- read.csv("genetics.csv")
> p <- rep(NA, 16)
> for (i in 1:16) {
+     p[i] <- prop.test(c(d[i, 2], d[i, 3]), c(9, 11))$p.val
+ }
> p
[1] 0.1944598 0.7415373 0.9178719 1.0000000 0.5401940
[6] 0.6192568 1.0000000 0.1944598 0.6192568 1.0000000
[11] 0.2392036 0.1614038 0.7415373 0.2846443 1.0000000
[16] 1.0000000
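When 16 tests are run on the same subjects, the p-values are usually adjusted before any of them is interpreted. As a sketch of the mechanics, the block below copies the 16 p-values printed above (the file genetics.csv is not reproduced here) and applies base R's `p.adjust`. The choice of the Bonferroni, Holm, and Benjamini-Hochberg methods is illustrative and not taken from the slides.

```r
# p-values copied from the prop.test output above.
p <- c(0.1944598, 0.7415373, 0.9178719, 1.0000000, 0.5401940,
       0.6192568, 1.0000000, 0.1944598, 0.6192568, 1.0000000,
       0.2392036, 0.1614038, 0.7415373, 0.2846443, 1.0000000,
       1.0000000)

p_bonf <- p.adjust(p, method = "bonferroni")  # controls family-wise error
p_holm <- p.adjust(p, method = "holm")        # step-down Bonferroni
p_bh   <- p.adjust(p, method = "BH")          # controls false discovery rate

print(round(cbind(raw = p, bonferroni = p_bonf, holm = p_holm, BH = p_bh), 3))
```

In this data set no raw p-value is below 0.05 to begin with, so the adjustment does not change any conclusion; the block only shows how the correction would be applied.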
Normality Tests
fBasics
  dagoTest          D'Agostino
  jarqueberaTest    Jarque–Bera
  shapiroTest       Shapiro–Wilk
  ksnormTest        Kolmogorov–Smirnov
nortest
  adTest            Anderson–Darling
  cvmTest           Cramér–von Mises
  lillieTest        Lilliefors
  pchiTest          Pearson chi-square
  sfTest            Shapiro–Francia
> set.seed(1)
> x <- round(-rnorm(20), 3)
> hist(x)
[Figure: "Histogram of x"; x-axis: x, from −2 to 2; y-axis: Frequency, from 0 to 5]
> library(fBasics)
> dagoTest(x)
Title:
D'Agostino Normality Test
Test Results:
STATISTIC:
Chi2 | Omnibus: 3.923
Z3 | Skewness: 1.5728
Z4 | Kurtosis: 1.2039
P VALUE:
Omnibus Test: 0.1406
Skewness Test: 0.1158
Kurtosis Test: 0.2286
Description:
Thu Feb 24 16:01:28 2011 by user:
> jarqueberaTest(x)
Title:
Jarque - Bera Normalality Test
Test Results:
STATISTIC:
X-squared: 2.0773
P VALUE:
Asymptotic p Value: 0.3539
Description:
Thu Feb 24 16:01:28 2011 by user:
> shapiroTest(x)
Title:
Shapiro - Wilk Normality Test
Test Results:
STATISTIC:
W: 0.9532
P VALUE:
0.4188
Description:
Thu Feb 24 16:01:28 2011 by user:
> ksnormTest(x)
Title:
One-sample Kolmogorov-Smirnov test
Test Results:
STATISTIC:
D: 0.1821
P VALUE:
Alternative Two-Sided: 0.4670
Alternative Less: 0.8524
Alternative Greater: 0.2363
Description:
Thu Feb 24 16:01:28 2011 by user:
> library(nortest)
> print(adTest(x))
Title:
Anderson - Darling Normality Test
Test Results:
STATISTIC:
A: 0.2919
P VALUE:
0.57
Description:
Thu Feb 24 16:01:28 2011 by user:
> cvmTest(x)
Title:
Cramer - von Mises Normality Test
Test Results:
STATISTIC:
W: 0.0403
P VALUE:
0.6583
Description:
Thu Feb 24 16:01:28 2011 by user:
> lillieTest(x)
Title:
Lilliefors (KS) Normality Test
Test Results:
STATISTIC:
D: 0.1107
P VALUE:
0.7513
Description:
Thu Feb 24 16:01:28 2011 by user:
> pchiTest(x)
Title:
Pearson Chi-Square Normality Test
Test Results:
PARAMETER:
Number of Classes: 7
STATISTIC:
P: 1.7
P VALUE:
Adhusted: 0.7907
Not adjusted: 0.9451
Description:
Thu Feb 24 16:01:28 2011 by user:
> sfTest(x)
Title:
Shapiro - Francia Normality Test
Test Results:
STATISTIC:
W: 0.9475
P VALUE:
0.2813
Description:
Thu Feb 24 16:01:28 2011 by user:
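Nine normality tests have now been run on the same 20 observations. To compare them side by side, the sketch below collects the p-values reported in the outputs above. Which p-value to take from the multi-valued outputs is an assumption made here: the omnibus value for `dagoTest`, the two-sided value for `ksnormTest`, and the adjusted value for `pchiTest`.

```r
# p-values copied from the nine test outputs above
# (omnibus / two-sided / adjusted variants chosen for illustration).
p_norm <- c(dagoTest = 0.1406, jarqueberaTest = 0.3539, shapiroTest = 0.4188,
            ksnormTest = 0.4670, adTest = 0.57, cvmTest = 0.6583,
            lillieTest = 0.7513, pchiTest = 0.7907, sfTest = 0.2813)

print(sort(p_norm))    # smallest to largest
print(range(p_norm))   # spread across tests on identical data

# Treating the nine tests as one family; Bonferroni is conservative here
# because the tests are computed on the same data and are strongly dependent.
print(p.adjust(p_norm, method = "bonferroni"))
```

The same data yield p-values spread over a wide range, so an analyst free to choose the test after seeing the results is making a multiple comparison, even though only one p-value ends up being reported.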
> library(outliers)
> chisq.out.test(x)
chi-squared test for outlier
data: x
X-squared = 6.9387, p-value = 0.008435
alternative hypothesis: highest value 2.215 is an outlier
> dixon.test(x)
Dixon test for outliers
data: x
Q = 0.4177, p-value = 0.1605
alternative hypothesis: highest value 2.215 is an outlier
> grubbs.test(x)
Grubbs test for one outlier
data: x
G = 2.6341, U = 0.6156, p-value = 0.03544
alternative hypothesis: highest value 2.215 is an outlier
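The three outlier tests disagree sharply about the same point (p = 0.008, 0.16, and 0.035). Part of the gap can be traced by hand using base R only. In the sketch below, the chi-squared statistic is the square of the Grubbs statistic G, and its p-value comes from a chi-squared distribution with 1 degree of freedom, which makes no allowance for the fact that the most extreme of 20 values was selected. The Bonferroni line is a crude allowance added here for illustration; it is not the method used by `grubbs.test` or `dixon.test`.

```r
# Recompute the outlier statistics for the slide data using base R only.
set.seed(1)
x <- round(-rnorm(20), 3)          # same data as in the slides
n <- length(x)

G  <- (max(x) - mean(x)) / sd(x)   # Grubbs statistic for the largest value
X2 <- G^2                          # statistic reported by chisq.out.test

# Reference distribution chi-squared(1): ignores selection of the maximum.
p_naive <- pchisq(X2, df = 1, lower.tail = FALSE)

# Crude Bonferroni allowance for having picked the most extreme of n values
# (illustrative assumption, not from the slides).
p_bonf <- min(1, n * p_naive)

print(c(G = G, X2 = X2, p_naive = p_naive, p_bonf = p_bonf))
```

The values of G, X2, and p_naive can be checked against the slide outputs (G = 2.6341, X-squared = 6.9387, p-value = 0.008435). Looking for the most extreme observation and then testing it is itself a multiple comparison, which is why the unadjusted chi-squared p-value is the smallest of the three.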