Multiple Comparisons, Outliers

Arthur Berg, Pennsylvania State University

Topics: multiple comparisons traps; sequential trap; GWAS; outliers

The difficult and ubiquitous problems of multiplicities

Most scientists are oblivious to the problems of multiplicities. Yet they are everywhere. In one or more of its forms, multiplicities are present in every statistical application. They may be out in the open or hidden. And even if they are out in the open, recognizing them is but the first step in a difficult process of inference. Problems of multiplicities are the most difficult that we statisticians face. They threaten the validity of every statistical conclusion.

D. Berry (2007)



• Analyzing data without a plan
• Multiple time points: sequential analysis
• Multiple subgroups
• Combining groups
• Post-hoc analyses: coincidences and disease clusters



> set.seed(2)
> x <- rnorm(3)
> y <- rnorm(3)
> t.test(x, y)

        Welch Two Sample t-test

data:  x and y
t = 0.7959, df = 3.084, p-value = 0.4828
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -1.913428  3.216086
sample estimates:
 mean of x  mean of y
 0.2919267 -0.3594024



> x <- append(x, rnorm(1))
> y <- append(y, rnorm(1))
> p <- t.test(x, y)$p.val
> p
[1] 0.2772826



> counter <- 0
> # Keep adding one observation per group until the test reaches p <= 0.05
> while (p > 0.05) {
+   x <- append(x, rnorm(1))
+   y <- append(y, rnorm(1))
+   p <- t.test(x, y)$p.value
+   counter <- counter + 1
+ }
> counter
[1] 4
> length(x)
[1] 8
> p
[1] 0.04106854



> counter <- 0
> # Continue from the same data, now sampling until p <= 0.01
> while (p > 0.01) {
+   x <- append(x, rnorm(1))
+   y <- append(y, rnorm(1))
+   p <- t.test(x, y)$p.value
+   counter <- counter + 1
+ }
> counter
[1] 3
> length(x)
[1] 11
> p
[1] 0.005338842
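The session above is a single run. To see how often "sample until significant" produces a false positive, the strategy can be repeated many times on data with no true difference. This is a minimal sketch, not part of the original slides: the function name `peek_until_significant`, the cap of 30 observations per group, the seed, and the 1000 replications are all assumptions chosen for illustration.

```r
# Simulate the "keep sampling until p <= alpha" strategy under a true null.
# Assumptions (not from the slides): start with 3 observations per group,
# stop at n_max = 30 per group, alpha = 0.05, 1000 simulated studies.
set.seed(123)

peek_until_significant <- function(n_start = 3, n_max = 30, alpha = 0.05) {
  x <- rnorm(n_start)
  y <- rnorm(n_start)
  repeat {
    p <- t.test(x, y)$p.value
    if (p <= alpha) return(TRUE)            # stopped early and declared a difference
    if (length(x) >= n_max) return(FALSE)   # reached the cap without rejecting
    x <- c(x, rnorm(1))                     # add one observation to each group
    y <- c(y, rnorm(1))
  }
}

n_sim <- 1000
rejected <- replicate(n_sim, peek_until_significant())

# Proportion of null studies that end with a "significant" result
false_positive_rate <- mean(rejected)
false_positive_rate
```

Both groups are drawn from the same distribution, so every rejection is a false positive. Under these assumptions the rate should come out well above the nominal 0.05, and it grows as the cap `n_max` is raised.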



20 Questions



20 Questions (cont.)

> d <- read.csv("genetics.csv")
> p <- rep(NA, 16)
> # One two-sample proportion test per row; group sizes are 9 and 11
> for (i in 1:16) {
+   p[i] <- prop.test(c(d[i, 2], d[i, 3]), c(9, 11))$p.value
+ }
> p
 [1] 0.1944598 0.7415373 0.9178719 1.0000000 0.5401940
 [6] 0.6192568 1.0000000 0.1944598 0.6192568 1.0000000
[11] 0.2392036 0.1614038 0.7415373 0.2846443 1.0000000
[16] 1.0000000
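With 16 tests on one data set, it helps to ask how likely at least one "significant" result would be by chance alone, and what the p-values look like after a standard adjustment. The sketch below retypes the 16 p-values from the output above, since `genetics.csv` is not available here. The family-wise calculation assumes independent tests with exactly uniform null p-values, which is only an approximation for small-sample proportion tests.

```r
# p-values copied from the prop.test loop output above
p <- c(0.1944598, 0.7415373, 0.9178719, 1.0000000, 0.5401940,
       0.6192568, 1.0000000, 0.1944598, 0.6192568, 1.0000000,
       0.2392036, 0.1614038, 0.7415373, 0.2846443, 1.0000000,
       1.0000000)

m <- length(p)   # number of tests
alpha <- 0.05

# Chance of at least one false positive among m independent true-null tests
# (assumption: independence and uniform null p-values)
fwer <- 1 - (1 - alpha)^m
round(fwer, 2)

# Standard multiplicity adjustments from base R
p_bonferroni <- p.adjust(p, method = "bonferroni")
p_holm <- p.adjust(p, method = "holm")

# Smallest raw and adjusted p-values
c(raw = min(p), bonferroni = min(p_bonferroni), holm = min(p_holm))
```

Under the independence assumption the chance of at least one false positive in 16 tests is a little over one half. Here no raw p-value is below 0.05 to begin with, and after either adjustment every p-value is capped at 1.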



Normality Tests

fBasics
  dagoTest         D'Agostino
  jarqueberaTest   Jarque-Bera
  shapiroTest      Shapiro-Wilk
  ksnormTest       Kolmogorov-Smirnov

nortest
  adTest           Anderson-Darling
  cvmTest          Cramér-von Mises
  lillieTest       Lilliefors
  pchiTest         Pearson chi-square
  sfTest           Shapiro-Francia



> set.seed(1)
> x <- round(-rnorm(20), 3)
> hist(x)

[Figure: "Histogram of x"; horizontal axis x from −2 to 2, vertical axis Frequency from 0 to 5]



> library(fBasics)
> dagoTest(x)

Title:
 D'Agostino Normality Test

Test Results:
  STATISTIC:
    Chi2 | Omnibus: 3.923
    Z3 | Skewness: 1.5728
    Z4 | Kurtosis: 1.2039
  P VALUE:
    Omnibus Test: 0.1406
    Skewness Test: 0.1158
    Kurtosis Test: 0.2286

Description:
 Thu Feb 24 16:01:28 2011 by user:



> jarqueberaTest(x)

Title:
 Jarque - Bera Normalality Test

Test Results:
  STATISTIC:
    X-squared: 2.0773
  P VALUE:
    Asymptotic p Value: 0.3539

Description:
 Thu Feb 24 16:01:28 2011 by user:



> shapiroTest(x)

Title:
 Shapiro - Wilk Normality Test

Test Results:
  STATISTIC:
    W: 0.9532
  P VALUE:
    0.4188

Description:
 Thu Feb 24 16:01:28 2011 by user:



> ksnormTest(x)

Title:
 One-sample Kolmogorov-Smirnov test

Test Results:
  STATISTIC:
    D: 0.1821
  P VALUE:
    Alternative Two-Sided: 0.4670
    Alternative Less: 0.8524
    Alternative Greater: 0.2363

Description:
 Thu Feb 24 16:01:28 2011 by user:



> library(nortest)
> print(adTest(x))

Title:
 Anderson - Darling Normality Test

Test Results:
  STATISTIC:
    A: 0.2919
  P VALUE:
    0.57

Description:
 Thu Feb 24 16:01:28 2011 by user:



> cvmTest(x)

Title:
 Cramer - von Mises Normality Test

Test Results:
  STATISTIC:
    W: 0.0403
  P VALUE:
    0.6583

Description:
 Thu Feb 24 16:01:28 2011 by user:



> lillieTest(x)

Title:
 Lilliefors (KS) Normality Test

Test Results:
  STATISTIC:
    D: 0.1107
  P VALUE:
    0.7513

Description:
 Thu Feb 24 16:01:28 2011 by user:



> pchiTest(x)

Title:
 Pearson Chi-Square Normality Test

Test Results:
  PARAMETER:
    Number of Classes: 7
  STATISTIC:
    P: 1.7
  P VALUE:
    Adhusted: 0.7907
    Not adjusted: 0.9451

Description:
 Thu Feb 24 16:01:28 2011 by user:



> sfTest(x)

Title:
 Shapiro - Francia Normality Test

Test Results:
  STATISTIC:
    W: 0.9475
  P VALUE:
    0.2813

Description:
 Thu Feb 24 16:01:28 2011 by user:
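All nine tests above were run on the same 20 simulated normal values, yet their p-values differ considerably. The sketch below collects the headline p-value from each output so the spread is visible at a glance. The values are retyped from the outputs above, the element names are labels chosen here for illustration, and for tests that report several p-values the omnibus, two-sided, or adjusted one was picked.

```r
# Headline p-values copied from the nine normality test outputs above
normality_p <- c(
  dago_omnibus           = 0.1406,
  jarque_bera            = 0.3539,
  shapiro_wilk           = 0.4188,
  ks_two_sided           = 0.4670,
  anderson_darling       = 0.57,
  cramer_von_mises       = 0.6583,
  lilliefors             = 0.7513,
  pearson_chisq_adjusted = 0.7907,
  shapiro_francia        = 0.2813
)

# Spread of p-values obtained from one and the same sample
p_range <- range(normality_p)
p_range

# The test an analyst hoping to reject normality would be tempted to report
most_extreme <- names(which.min(normality_p))
most_extreme

# Bonferroni adjustment for having tried nine tests (conservative here,
# because tests computed on the same data are strongly dependent)
smallest_adjusted <- min(p.adjust(normality_p, method = "bonferroni"))
smallest_adjusted
```

Choosing which normality test to report after seeing all nine results is itself a multiplicity, even though only one data set was analyzed.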



> library(outliers)
> chisq.out.test(x)

        chi-squared test for outlier

data:  x
X-squared = 6.9387, p-value = 0.008435
alternative hypothesis: highest value 2.215 is an outlier

> dixon.test(x)

        Dixon test for outliers

data:  x
Q = 0.4177, p-value = 0.1605




> grubbs.test(x)

        Grubbs test for one outlier

data:  x
G = 2.6341, U = 0.6156, p-value = 0.03544
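The data here are 20 draws from a normal distribution, so there is no real outlier, yet two of the three tests give p < 0.05. The reported X-squared = 6.9387 equals, up to rounding, the square of G = 2.6341. This suggests the chi-squared test refers the most extreme standardized value to a chi-squared distribution with 1 degree of freedom, as if that value had been chosen in advance. The base-R sketch below is built on that reading, which is an assumption rather than a documented fact. It compares such a naive p-value with a two-sided Bonferroni-type p-value for the Grubbs statistic on clean normal samples. It does not call the `outliers` package and is not guaranteed to reproduce its results; the function name `one_sample`, the seed, and the 5000 replications are choices made for illustration.

```r
# How often is an "outlier" flagged in clean normal samples of size 20?
# Assumptions (not from the slides): 5000 replications, alpha = 0.05,
# p-values computed by hand in base R rather than with the outliers package.
set.seed(42)
n <- 20
n_sim <- 5000
alpha <- 0.05

one_sample <- function(n) {
  x <- rnorm(n)
  g <- max(abs(x - mean(x))) / sd(x)   # largest standardized deviation

  # Naive: treat the most extreme value as if it had been chosen in advance
  p_naive <- 1 - pchisq(g^2, df = 1)

  # Grubbs-type: convert g to a t statistic, then multiply the two-sided
  # tail probability by n to account for picking the maximum of n values
  t_stat <- sqrt(n * (n - 2) * g^2 / ((n - 1)^2 - n * g^2))
  p_grubbs <- min(1, 2 * n * (1 - pt(t_stat, df = n - 2)))

  c(naive = p_naive, grubbs = p_grubbs)
}

p_vals <- replicate(n_sim, one_sample(n))   # 2 x n_sim matrix of p-values

# Proportion of clean samples in which each approach declares an outlier
flag_rate <- rowMeans(p_vals <= alpha)
flag_rate
```

Under these assumptions the naive approach flags an outlier in most clean samples, while the selection-adjusted version stays near or below the nominal 5%. Picking the most extreme of 20 values is a hidden multiplicity.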


