Multiple Comparisons, Outliers

Multiple Comparisons, Outliers Arthur Berg Pennsylvania State University

Transcript of Multiple Comparisons, Outliers

Page 1: Multiple Comparisons, Outliers

Multiple Comparisons, Outliers

Arthur Berg, Pennsylvania State University

Page 2: Multiple Comparisons, Outliers

Multiple Comparisons Traps sequential trap GWAS outliers

The difficult and ubiquitous problems of multiplicities

Most scientists are oblivious to the problems of multiplicities. Yet they are everywhere. In one or more of its forms, multiplicities are present in every statistical application. They may be out in the open or hidden. And even if they are out in the open, recognizing them is but the first step in a difficult process of inference. Problems of multiplicities are the most difficult that we statisticians face. They threaten the validity of every statistical conclusion.

D. Berry (2007)

Arthur Berg Multiple Comparisons, Outliers 2 / 23

Page 3: Multiple Comparisons, Outliers

- Analyzing data without a plan

- Multiple time points (sequential analysis)

- Multiple subgroups

- Combining groups

- Post-hoc analyses: coincidences and disease clusters
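To put a number on why these traps matter, here is a minimal sketch of how the chance of at least one false positive grows with the number of tests. It assumes the tests are independent and each is run at level 0.05; the names `alpha`, `m` and `fwer` are illustrative and do not come from the slides.

```r
# Family-wise error rate for m independent tests at level alpha:
# P(at least one false positive) = 1 - (1 - alpha)^m
alpha <- 0.05
m <- c(1, 5, 10, 20, 100)   # hypothetical numbers of tests
fwer <- 1 - (1 - alpha)^m

# One row per scenario: number of tests and the resulting error rate
data.frame(tests = m, fwer = round(fwer, 3))
```

With dependent tests the exact figure changes, but the rate still climbs as more comparisons are made.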

Page 4: Multiple Comparisons, Outliers

> set.seed(2)

> x <- rnorm(3)

> y <- rnorm(3)

> t.test(x, y)

Welch Two Sample t-test

data: x and y

t = 0.7959, df = 3.084, p-value = 0.4828

alternative hypothesis: true difference in means is not equal to 0

95 percent confidence interval:

-1.913428 3.216086

sample estimates:

mean of x mean of y

0.2919267 -0.3594024

Page 5: Multiple Comparisons, Outliers

> x <- append(x, rnorm(1))

> y <- append(y, rnorm(1))

> p <- t.test(x, y)$p.val

> p

[1] 0.2772826

Page 6: Multiple Comparisons, Outliers

> counter <- 0

> while (p > 0.05) {

x <- append(x, rnorm(1))

y <- append(y, rnorm(1))

p <- t.test(x, y)$p.val

counter <- counter + 1

}

> counter

[1] 4

> length(x)

[1] 8

> p

[1] 0.04106854

Page 7: Multiple Comparisons, Outliers

> counter <- 0

> while (p > 0.01) {

x <- append(x, rnorm(1))

y <- append(y, rnorm(1))

p <- t.test(x, y)$p.val

counter <- counter + 1

}

> counter

[1] 3

> length(x)

[1] 11

> p

[1] 0.005338842
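The loops above show a single run in which sampling continues until the p-value drops below the threshold, even though both groups come from the same distribution. A rough way to see how often this happens is to repeat the procedure many times. The sketch below is one possible simulation, not the author's: the function name `peek_until_significant`, the cap of 50 observations per group, the seed and the number of repetitions are all assumptions made for illustration.

```r
# Simulate "add one observation per group and re-test" under a true null.
set.seed(123)  # arbitrary seed for this sketch

peek_until_significant <- function(n_start = 3, n_max = 50, alpha = 0.05) {
  x <- rnorm(n_start)
  y <- rnorm(n_start)
  repeat {
    # Stop as soon as the Welch t-test looks "significant"
    if (t.test(x, y)$p.value < alpha) return(TRUE)
    # Give up once the (hypothetical) sample-size cap is reached
    if (length(x) >= n_max) return(FALSE)
    x <- c(x, rnorm(1))
    y <- c(y, rnorm(1))
  }
}

n_sim <- 500
false_positive_rate <- mean(replicate(n_sim, peek_until_significant()))
false_positive_rate  # share of null experiments that ended with p < alpha
```

Because the test is repeated after every new pair of observations, the resulting rate is expected to sit well above the nominal 0.05, and it grows further as the cap `n_max` is raised.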

Page 8: Multiple Comparisons, Outliers

Page 9: Multiple Comparisons, Outliers

20 Questions

Page 10: Multiple Comparisons, Outliers

20 Questions (cont.)

> d <- read.csv("genetics.csv")

> p <- rep(NA, 16)

> for (i in 1:16) {

p[i] <- prop.test(c(d[i, 2], d[i, 3]), c(9, 11))$p.val

}

> p

[1] 0.1944598 0.7415373 0.9178719 1.0000000 0.5401940

[6] 0.6192568 1.0000000 0.1944598 0.6192568 1.0000000

[11] 0.2392036 0.1614038 0.7415373 0.2846443 1.0000000

[16] 1.0000000
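When 16 tests are run on the same data set, the p-values are usually adjusted before any of them is called significant. The sketch below applies base R's `p.adjust` to the 16 values printed above. The choice of Bonferroni, Holm and Benjamini-Hochberg is an assumption made for illustration; the slides do not say which correction, if any, was used.

```r
# The 16 unadjusted p-values, copied from the output above
p <- c(0.1944598, 0.7415373, 0.9178719, 1.0000000, 0.5401940,
       0.6192568, 1.0000000, 0.1944598, 0.6192568, 1.0000000,
       0.2392036, 0.1614038, 0.7415373, 0.2846443, 1.0000000,
       1.0000000)

# Three common adjustments available in base R
p_bonf <- p.adjust(p, method = "bonferroni")  # controls family-wise error
p_holm <- p.adjust(p, method = "holm")        # step-down, never weaker than Bonferroni
p_bh   <- p.adjust(p, method = "BH")          # controls false discovery rate

# Compare the smallest p-value before and after adjustment
c(raw = min(p), bonferroni = min(p_bonf), holm = min(p_holm), BH = min(p_bh))
```

Here even the smallest unadjusted p-value is above 0.05, so no question stands out before or after correction.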

Page 11: Multiple Comparisons, Outliers

Normality Tests

fBasics

dagoTest D’Agostino

jarqueberaTest Jarque–Bera

shapiroTest Shapiro-Wilk

ksnormTest Kolmogorov-Smirnov

nortest

adTest Anderson–Darling

cvmTest Cramer-von Mises

lillieTest Lilliefors

pchiTest Pearson chi-square

sfTest Shapiro–Francia

Page 12: Multiple Comparisons, Outliers

> set.seed(1)

> x <- round(-rnorm(20), 3)

> hist(x)

[Figure: Histogram of x. Horizontal axis: x, from −2 to 2. Vertical axis: Frequency, from 0 to 5.]

Page 13: Multiple Comparisons, Outliers

> library(fBasics)

> dagoTest(x)

Title:

D'Agostino Normality Test

Test Results:

STATISTIC:

Chi2 | Omnibus: 3.923

Z3 | Skewness: 1.5728

Z4 | Kurtosis: 1.2039

P VALUE:

Omnibus Test: 0.1406

Skewness Test: 0.1158

Kurtosis Test: 0.2286

Description:

Thu Feb 24 16:01:28 2011 by user:

Page 14: Multiple Comparisons, Outliers

> jarqueberaTest(x)

Title:

Jarque - Bera Normalality Test

Test Results:

STATISTIC:

X-squared: 2.0773

P VALUE:

Asymptotic p Value: 0.3539

Description:

Thu Feb 24 16:01:28 2011 by user:

Page 15: Multiple Comparisons, Outliers

> shapiroTest(x)

Title:

Shapiro - Wilk Normality Test

Test Results:

STATISTIC:

W: 0.9532

P VALUE:

0.4188

Description:

Thu Feb 24 16:01:28 2011 by user:

Page 16: Multiple Comparisons, Outliers

> ksnormTest(x)

Title:

One-sample Kolmogorov-Smirnov test

Test Results:

STATISTIC:

D: 0.1821

P VALUE:

Alternative Two-Sided: 0.4670

Alternative Less: 0.8524

Alternative Greater: 0.2363

Description:

Thu Feb 24 16:01:28 2011 by user:

Page 17: Multiple Comparisons, Outliers

> library(nortest)

> print(adTest(x))

Title:

Anderson - Darling Normality Test

Test Results:

STATISTIC:

A: 0.2919

P VALUE:

0.57

Description:

Thu Feb 24 16:01:28 2011 by user:

Page 18: Multiple Comparisons, Outliers

> cvmTest(x)

Title:

Cramer - von Mises Normality Test

Test Results:

STATISTIC:

W: 0.0403

P VALUE:

0.6583

Description:

Thu Feb 24 16:01:28 2011 by user:

Page 19: Multiple Comparisons, Outliers

> lillieTest(x)

Title:

Lilliefors (KS) Normality Test

Test Results:

STATISTIC:

D: 0.1107

P VALUE:

0.7513

Description:

Thu Feb 24 16:01:28 2011 by user:

Page 20: Multiple Comparisons, Outliers

> pchiTest(x)

Title:

Pearson Chi-Square Normality Test

Test Results:

PARAMETER:

Number of Classes: 7

STATISTIC:

P: 1.7

P VALUE:

Adhusted: 0.7907

Not adjusted: 0.9451

Description:

Thu Feb 24 16:01:28 2011 by user:

Page 21: Multiple Comparisons, Outliers

> sfTest(x)

Title:

Shapiro - Francia Normality Test

Test Results:

STATISTIC:

W: 0.9475

P VALUE:

0.2813

Description:

Thu Feb 24 16:01:28 2011 by user:
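Running several normality tests on the same sample is itself a multiplicity: the more tests are tried, the more likely at least one of them rejects. The sketch below simulates this with truly normal data. To stay in base R it uses only three tests (Shapiro-Wilk, Kolmogorov-Smirnov against the standard normal, and a hand-written Jarque-Bera with its asymptotic chi-squared p-value) rather than the full list above. The helper `jarque_bera_p`, the seed and the number of repetitions are assumptions made for illustration.

```r
# Asymptotic Jarque-Bera p-value from sample skewness and kurtosis
jarque_bera_p <- function(x) {
  n  <- length(x)
  d  <- x - mean(x)
  m2 <- mean(d^2)
  skew <- mean(d^3) / m2^1.5
  kurt <- mean(d^4) / m2^2
  stat <- n * (skew^2 / 6 + (kurt - 3)^2 / 24)
  pchisq(stat, df = 2, lower.tail = FALSE)
}

set.seed(1)   # arbitrary seed for this sketch
n_sim <- 2000
alpha <- 0.05

# Each row: did each test reject normality for one N(0, 1) sample of size 20?
rejects <- t(replicate(n_sim, {
  x <- rnorm(20)
  c(shapiro = shapiro.test(x)$p.value,
    ks      = ks.test(x, "pnorm")$p.value,
    jb      = jarque_bera_p(x)) < alpha
}))

per_test_rate <- colMeans(rejects)            # rejection rate of each test alone
any_test_rate <- mean(rowSums(rejects) > 0)   # rate when any rejection counts
per_test_rate
any_test_rate
```

The rate for "any test rejects" can never be lower than the rate of the most rejection-prone single test, which is why picking the test after seeing the results inflates the error rate.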

Page 22: Multiple Comparisons, Outliers

> library(outliers)

> chisq.out.test(x)

chi-squared test for outlier

data: x

X-squared = 6.9387, p-value = 0.008435

alternative hypothesis: highest value 2.215 is an outlier

> dixon.test(x)

Dixon test for outliers

data: x

Q = 0.4177, p-value = 0.1605

alternative hypothesis: highest value 2.215 is an outlier

Page 23: Multiple Comparisons, Outliers

> grubbs.test(x)

Grubbs test for one outlier

data: x

G = 2.6341, U = 0.6156, p-value = 0.03544

alternative hypothesis: highest value 2.215 is an outlier
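The three outlier tests disagree (p-values of roughly 0.008, 0.16 and 0.035 for the same point), so choosing which one to report is another multiplicity. To make two of the statistics less of a black box, the sketch below recomputes them in base R. The formulas are my understanding of the defaults in the `outliers` package (a chi-squared statistic equal to the squared standardized deviation, and a one-sided Grubbs p-value based on a t distribution with a factor of n); treat them as assumptions rather than as the package's documented internals.

```r
# Same data as on the slides
set.seed(1)
x <- round(-rnorm(20), 3)
n <- length(x)

# Grubbs statistic: largest absolute deviation from the mean, in SD units
G <- max(abs(x - mean(x))) / sd(x)

# Chi-squared outlier statistic: squared deviation over the sample variance
chi2    <- G^2
p_chisq <- pchisq(chi2, df = 1, lower.tail = FALSE)

# One-sided Grubbs p-value via the t distribution (assumed formula)
t_stat   <- sqrt(n * (n - 2) * G^2 / ((n - 1)^2 - n * G^2))
p_grubbs <- min(1, n * pt(t_stat, df = n - 2, lower.tail = FALSE))

c(G = G, chi2 = chi2, p_chisq = p_chisq, p_grubbs = p_grubbs)
```

If the assumed formulas are right, these values should land close to the ones printed above. The chi-squared p-value is smaller mainly because it treats the most extreme point as if it had been chosen in advance, while the Grubbs p-value carries the factor of n for having picked the largest of n deviations.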
