ChungS166projectreport

7/31/2019 ChungS166projectreport

The R Project for Comparisons of Several Multivariate Means

Chu-yu Chung Hang Du Yi Su Xiangmin Zhang

December 7, 2009

Abstract

Comparisons of multivariate means involve hypothesis testing, constructing simultaneous confidence intervals (SCI), and decomposing variances under certain conditions. In this project, we write five individual R functions to perform such tasks: paired comparison, a repeated measure design for comparing treatments, comparing mean vectors from two multivariate populations, and comparing several multivariate population means. Our R functions are designed to facilitate the computation and to produce as much information as needed in practice.

1 Introduction

Multivariate hypothesis testing differs from univariate testing in many ways. It makes use of the multivariate normal assumption, which is more appropriate in many practical settings, and it allows many possible alternatives. The advantages of multivariate testing include preserving the overall $\alpha$-level and testing with greater power.

In Sections 1.1 to 1.5, we introduce how to formulate the tests when comparing multivariate means. Either the critical value for rejection or the simultaneous confidence interval is presented. The notation is adapted from Chapter 6 of Applied Multivariate Statistical Analysis (6th ed.) by Johnson, R. A. and Wichern, D. W.

1.1 Paired Comparison

Paired comparison is used to analyze measurements taken under different sets of experimental conditions and to assess whether the responses differ significantly between these sets.

In the multivariate paired comparison procedure, we label the responses $X_{111}$ (variable 1 under treatment 1 in the first unit), ..., $X_{2np}$ (variable $p$ under treatment 2 in the $n$th unit) to distinguish among $p$ responses, two treatments, and $n$ experimental units. Hence the $p$ paired-difference random variables for the $j$th unit become $D_{jk} = X_{1jk} - X_{2jk}$, $k = 1, \ldots, p$.


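Assume the population follows $N_q(\mu, \Sigma_x)$. Let $C$ be a contrast matrix. An $\alpha$-level test of $H_0: C\mu = \mathbf{0}$ (equal treatment means) versus $H_1: C\mu \neq \mathbf{0}$ is: reject $H_0$ if

$$T^2 = n\,(C\bar{x})'\,(C S C')^{-1}\,(C\bar{x}) > \frac{(n-1)(q-1)}{n-q+1}\,F_{q-1,\,n-q+1}(\alpha) \qquad (9)$$

where $F_{q-1,\,n-q+1}(\alpha)$ is the upper $(100\alpha)$th percentile of an $F$-distribution with $q-1$ and $n-q+1$ d.f., $\bar{x} = \frac{1}{n}\sum_{j=1}^{n} x_j$ is the sample mean vector, and $S = \frac{1}{n-1}\sum_{j=1}^{n} (x_j - \bar{x})(x_j - \bar{x})'$ is the sample covariance matrix.

The $100(1-\alpha)\%$ simultaneous confidence intervals for a single contrast $c'\mu$, for any contrast vector $c$ of interest, are

$$c'\bar{x} \pm \sqrt{\frac{(n-1)(q-1)}{n-q+1}\,F_{q-1,\,n-q+1}(\alpha)}\;\sqrt{\frac{c'Sc}{n}}$$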
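As a rough illustration of the test in (9) and the intervals that follow it, the sketch below computes both for a given contrast matrix. The function name `repmeasure_sketch`, its arguments, and the simulated data are assumptions made for this example; this is not the authors' `repmeasure` function.

```r
# Hypothetical helper: T^2 test of H0: C mu = 0 and simultaneous
# confidence intervals for the contrasts given by the rows of C.
repmeasure_sketch <- function(x, C, level = 0.95) {
  x <- as.matrix(x)
  n <- nrow(x)
  q <- ncol(x)
  xbar <- colMeans(x)              # sample mean vector
  S <- cov(x)                      # sample covariance matrix (divisor n - 1)
  Cx <- C %*% xbar                 # estimated contrasts
  CSC <- C %*% S %*% t(C)
  T2 <- n * drop(t(Cx) %*% solve(CSC, Cx))
  # qf(level, ...) is the upper 100 * (1 - level)th percentile of F
  crit <- (n - 1) * (q - 1) / (n - q + 1) * qf(level, q - 1, n - q + 1)
  half <- sqrt(crit) * sqrt(diag(CSC) / n)
  list(T2 = T2, crit = crit, reject = T2 > crit,
       sci = data.frame(Estimate = drop(Cx),
                        LowerCI = drop(Cx) - half,
                        UpperCI = drop(Cx) + half))
}

# Made-up data: 20 units, 4 treatments, treatment 4 shifted upward by 5.
set.seed(1)
x <- matrix(rnorm(80, mean = 10), nrow = 20, ncol = 4)
x[, 4] <- x[, 4] + 5
C <- rbind(c(-1, 1, 0, 0), c(0, -1, 1, 0), c(0, 0, -1, 1))
res <- repmeasure_sketch(x, C)
print(res$reject)
print(res$sci)
```

Any full-rank contrast matrix with $q-1$ rows gives the same $T^2$, so the choice of `C` affects only which intervals are reported.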

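1.3 Comparison of Two Multivariate Population Means

In this part we compare the responses from one set of experimental settings (population 1) with independent responses from another set of experimental settings (population 2). If $X_{11}, X_{12}, \ldots, X_{1n_1}$ is a random sample of size $n_1$ from $N_p(\mu_1, \Sigma)$ and $X_{21}, X_{22}, \ldots, X_{2n_2}$ is an independent random sample of size $n_2$ from $N_p(\mu_2, \Sigma)$, the likelihood ratio test of

$$H_0: \mu_1 - \mu_2 = \delta_0 \qquad (10)$$

(with $\delta_0 = \mathbf{0}$ when testing equality of the mean vectors) is based on the statistic

$$T^2 = \left[\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)\right]' \left[\left(\frac{1}{n_1} + \frac{1}{n_2}\right) S_{pooled}\right]^{-1} \left[\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)\right] \qquad (11)$$

which is distributed as

$$\frac{(n_1 + n_2 - 2)\,p}{n_1 + n_2 - p - 1}\,F_{p,\,n_1+n_2-p-1} \qquad (12)$$

where

$$S_{pooled} = \frac{n_1 - 1}{n_1 + n_2 - 2}\,S_1 + \frac{n_2 - 1}{n_1 + n_2 - 2}\,S_2 \qquad (13)$$

and $(n_1 - 1)S_1$ is distributed as $W_{n_1-1}(\Sigma)$ and $(n_2 - 1)S_2$ is distributed as $W_{n_2-1}(\Sigma)$.

The $100(1-\alpha)\%$ simultaneous confidence interval for $\mu_{1i} - \mu_{2i}$ is

$$\bar{X}_{1i} - \bar{X}_{2i} \pm c\,\sqrt{\left(\frac{1}{n_1} + \frac{1}{n_2}\right) s_{ii,pooled}} \qquad (14)$$

where $c^2 = \frac{(n_1 + n_2 - 2)\,p}{n_1 + n_2 - p - 1}\,F_{p,\,n_1+n_2-p-1}(\alpha)$.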
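The two-sample procedure above can be sketched in a few lines of R. The function name `twopop_sketch` and the simulated inputs are assumptions for illustration, and this is not the authors' `twopop` function. The Bonferroni intervals are included only because they appear in the example output later in the report; they use the pooled-variance $t$ quantile with $n_1 + n_2 - 2$ d.f.

```r
# Hypothetical helper: two-sample Hotelling T^2 test of equal mean vectors
# with T^2-based and Bonferroni simultaneous confidence intervals.
twopop_sketch <- function(x1, x2, level = 0.95) {
  x1 <- as.matrix(x1)
  x2 <- as.matrix(x2)
  n1 <- nrow(x1)
  n2 <- nrow(x2)
  p <- ncol(x1)
  d <- colMeans(x1) - colMeans(x2)
  # Pooled covariance matrix with divisor n1 + n2 - 2
  Sp <- ((n1 - 1) * cov(x1) + (n2 - 1) * cov(x2)) / (n1 + n2 - 2)
  V <- (1 / n1 + 1 / n2) * Sp
  T2 <- drop(t(d) %*% solve(V, d))
  c2 <- (n1 + n2 - 2) * p / (n1 + n2 - p - 1) *
    qf(level, p, n1 + n2 - p - 1)
  se <- sqrt(diag(V))
  tb <- qt(1 - (1 - level) / (2 * p), n1 + n2 - 2)   # Bonferroni multiplier
  list(T2 = T2, c2 = c2, reject = T2 > c2,
       scit = data.frame(Estimate = d,
                         LowerCI = d - sqrt(c2) * se,
                         UpperCI = d + sqrt(c2) * se),
       scib = data.frame(Estimate = d,
                         LowerCI = d - tb * se,
                         UpperCI = d + tb * se))
}

# Made-up data: two samples of size 15 with p = 3 responses.
set.seed(2)
x1 <- matrix(rnorm(45), nrow = 15, ncol = 3)
x2 <- matrix(rnorm(45, mean = 1), nrow = 15, ncol = 3)
out <- twopop_sketch(x1, x2)
print(out$scit)
print(out$scib)
```

With a single response ($p = 1$), `T2` reduces to the square of the usual pooled two-sample $t$ statistic, which gives a quick sanity check.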


1.4 Comparison of Several Multivariate Population Means

Multivariate analysis of variance (MANOVA) is a generalized form of univariate analysis of variance (ANOVA). The MANOVA table decomposes the total sum of squares and cross products into a treatment component and a residual component. We will not delve into details here. A complete explanation of the variance decomposition and the summary tables of MANOVA can be found in Chapter 6 of Applied Multivariate Statistical Analysis (6th ed.) by Johnson, R. A. and Wichern, D. W.
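As a minimal sketch of the decomposition that the MANOVA table reports, the code below computes the treatment, error, and total sum of squares and cross products (SS&CP) matrices and checks that the first two add up to the third. The function name `sscp_sketch` and the simulated data are assumptions for illustration; this is not the authors' `MANOVA` function.

```r
# Hypothetical helper: one-way decomposition of the total SS&CP matrix
# into treatment (between-group) and error (within-group) parts.
sscp_sketch <- function(Y, group) {
  Y <- as.matrix(Y)
  group <- factor(group)
  ybar <- colMeans(Y)                        # overall mean vector
  Tot <- crossprod(sweep(Y, 2, ybar))        # total SS&CP
  B <- matrix(0, ncol(Y), ncol(Y))           # treatment SS&CP
  W <- matrix(0, ncol(Y), ncol(Y))           # error SS&CP
  for (g in levels(group)) {
    Yg <- Y[group == g, , drop = FALSE]
    mg <- colMeans(Yg)                       # group mean vector
    B <- B + nrow(Yg) * tcrossprod(mg - ybar)
    W <- W + crossprod(sweep(Yg, 2, mg))
  }
  list(treatment = B, error = W, total = Tot)
}

# Made-up data: 3 groups of 10 observations on 2 variables.
set.seed(3)
Y <- matrix(rnorm(60), nrow = 30, ncol = 2)
group <- rep(c("a", "b", "c"), each = 10)
dec <- sscp_sketch(Y, group)
print(max(abs(dec$treatment + dec$error - dec$total)))  # numerically zero
```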

1.5 Treatment Effect Comparison

In treatment effect comparison, we first test whether the treatment effects are the same. When the hypothesis of equal treatment effects is rejected, we construct simultaneous confidence intervals for the components of the differences of the mean vectors. Treatment effect comparison is closely related to MANOVA. Again, a complete discussion can be found in Chapter 6 of Applied Multivariate Statistical Analysis (6th ed.) by Johnson, R. A. and Wichern, D. W.
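For reference, the Bonferroni intervals for treatment-effect differences given in Chapter 6 of Johnson and Wichern take the form below. The symbols are not defined elsewhere in this report and are introduced here as assumptions: $g$ is the number of treatments, $n_k$ the size of sample $k$, $n = \sum_k n_k$, $\tau_{ki}$ the $i$th component of the $k$th treatment effect, and $w_{ii}$ the $i$th diagonal element of the error SS&CP matrix.

```latex
% Simultaneous intervals with overall confidence at least 1 - alpha,
% for all components i = 1, ..., p and all treatment pairs l < k.
\tau_{ki} - \tau_{li} \;\in\;
\bar{x}_{ki} - \bar{x}_{li}
\;\pm\; t_{n-g}\!\left(\frac{\alpha}{p\,g\,(g-1)}\right)
\sqrt{\frac{w_{ii}}{n-g}\left(\frac{1}{n_k} + \frac{1}{n_l}\right)}
```

The divisor $p\,g\,(g-1)$ arises because there are $p\,g\,(g-1)/2$ two-sided intervals, each using tail probability $\alpha / (2m)$ with $m = p\,g\,(g-1)/2$.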

2 Examples

For illustrative purposes, we run our functions on several datasets, all of which accompany Chapter 6 of the book Applied Multivariate Statistical Analysis (6th ed.).


2.1.1 Example 1 (T6-1.dat)

Sample  x11j  x12j  x21j  x22j
1          6    27    25    15
2          6    23    28    13
3         18    64    36    22
4          8    44    35    29
5         11    30    15    31
6         34    75    44    64
7         28    26    42    30
8         71   124    54    64
9         43    54    34    56
10        33    30    29    20
11        20    14    39    21

Table 1: T6-1.dat


In the above table, the first two columns are from treatment 1 and the last two columns are from treatment 2.

The R output from running our function paired on this dataset is as follows,

reject null hypothesis, nonzero mean difference exists

T Squared Based Simultaneous CI for difference
   Estimate    LowerCI   UpperCI
1 -9.363636 -22.453272  3.726000
2 13.272727  -5.700119 32.245574

Bonferroni Based Simultaneous CI for difference
   Estimate    LowerCI   UpperCI
1 -9.363636 -20.573107  1.845835
2 13.272727  -2.974903 29.520358
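The output above can be checked with a short, self-contained sketch that applies the paired $T^2$ procedure to the Table 1 data. This is an illustrative reimplementation under the standard formulas in Chapter 6 of Johnson and Wichern, not the authors' `paired` function; the variable names are chosen for this example.

```r
# Table 1 (T6-1.dat): x1 holds treatment 1, x2 holds treatment 2.
x1 <- cbind(c(6, 6, 18, 8, 11, 34, 28, 71, 43, 33, 20),
            c(27, 23, 64, 44, 30, 75, 26, 124, 54, 30, 14))
x2 <- cbind(c(25, 28, 36, 35, 15, 44, 42, 54, 34, 29, 39),
            c(15, 13, 22, 29, 31, 64, 30, 64, 56, 20, 21))
level <- 0.95

d <- x1 - x2                       # paired differences, one row per unit
n <- nrow(d)
p <- ncol(d)
dbar <- colMeans(d)                # mean difference vector
Sd <- cov(d)                       # covariance matrix of the differences

# Hotelling T^2 statistic and its critical value
T2 <- n * drop(t(dbar) %*% solve(Sd, dbar))
crit <- (n - 1) * p / (n - p) * qf(level, p, n - p)

se <- sqrt(diag(Sd) / n)
# T^2-based simultaneous intervals
scit <- data.frame(Estimate = dbar,
                   LowerCI = dbar - sqrt(crit) * se,
                   UpperCI = dbar + sqrt(crit) * se)
# Bonferroni simultaneous intervals
tb <- qt(1 - (1 - level) / (2 * p), n - 1)
scib <- data.frame(Estimate = dbar,
                   LowerCI = dbar - tb * se,
                   UpperCI = dbar + tb * se)

print(T2 > crit)
print(scit)
print(scib)
```

The test rejects even though every individual interval contains zero, because the $T^2$ region accounts for the correlation between the two differences while the component-wise intervals do not.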

2.2 Repeated Measure Design Comparison


Sample   x1   x2   x3   x4
1       426  609  556  600
2       253  236  392  395
3       359  433  349  357
4       432  431  522  600
5       405  426  513  513
6       324  438  507  539
7       310  312  410  456
8       326  326  350  504
9       375  447  547  548
10      286  286  403  422
11      349  382  473  497
12      429  410  488  547
13      348  377  447  514
14      412  473  472  446
15      347  326  455  468
16      434  458  637  524
17      364  367  432  469
18      420  395  508  531
19      397  556  645  625

Table 2: T6-2.dat

In the above table, each column represents data from an individual treatment.


The R output from running our function repmeasure on this dataset is as follows,

reject null hypothesis of equal treatment means

contrast matrix

[,1] [,2] [,3] [,4]

[1,] -1 1 -1 1

[2,] -1 -1 1 1

[3,] -1 1 1 -1

Simultaneous CI for contrasts


1 -206.32812 -282.19953 -130.4567

2 -306.92188 -415.73637 -198.1074

3 22.42188 -31.82305 76.6668



Sample   x1   x2   x3  gender
1        98   81   38  female
2       103   84   38  female
...     ...  ...  ...  ...
23      162  124   61  female
24      177  132   67  female
25       93   74   37  male
26       94   78   35  male
...     ...  ...  ...  ...
47      131   95   46  male
48      135  106   47  male

Table 3: T6-9.dat

In the above table, the first 24 rows are data from population one (gender = female), and the last 24 rows are data from population two (gender = male).

The R output from running our function twopop on this dataset is as follows,


mean vector of population one
4.900659 4.622909 3.940286

mean vector of population two
4.725444 4.477574 3.703186

reject equality of mean vectors

The coeffcient of the linear combination
of most responsible for rejection is
-43.72677 -8.710687 67.54641

T Squared Based Simultaneous CI for the difference
   Estimate    LowerCI   UpperCI
1 0.1752157 0.05776762 0.2926638
2 0.1453352 0.05411666 0.2365537
3 0.2371000 0.12906223 0.3451377

Bonferroni Based Simultaneous CI for the difference
   Estimate    LowerCI   UpperCI
1 0.1752157 0.07702893 0.2734025
2 0.1453352 0.06907636 0.2215940
3 0.2371000 0.14678026 0.3274197


Sample    x1    x2    x3     x4  gender
1       0.34  3.71  2.87  30.87  male
2       0.39  5.08  3.38  43.85  male
...      ...   ...   ...    ...  ...
24      0.34  4.27  4.00  50.35  male
25      0.40  4.58  2.82  32.48  male
26      0.29  5.04  1.93  33.85  female
27      0.28  3.95  2.51  35.82  female
...      ...   ...   ...    ...  ...
49      0.37  5.23  2.48  34.86  female
50      0.35  5.37  2.25  35.07  female

Table 4: T6-12.dat

In the above table, the first 25 rows are data from population one (gender = male), and the last 25 rows are data from population two (gender = female).

The R output from running our function twopop on this dataset is as follows,

mean vector of population one
0.3136 5.1788 2.3152 38.1548

mean vector of population two
0.3972 5.3296 3.6876 49.4204

reject equality of mean vectors

The coeffcient of the linear combination
of most responsible for rejection is
-99.39898 6.375999 6.228141 -0.7908238

T Squared Based Simultaneous CI for the difference
  Estimate     LowerCI      UpperCI
1  -0.0836  -0.1697234  0.002523361
2  -0.1508  -1.4650835  1.163483457
3  -1.3724  -1.8760572 -0.868742824
4 -11.2656 -17.1438597 -5.387340281

Bonferroni Based Simultaneous CI for the difference
  Estimate     LowerCI     UpperCI
1  -0.0836  -0.1509852 -0.01621484
2  -0.1508  -1.1791296  0.87752962
3  -1.3724  -1.7664745 -0.97832550
4 -11.2656 -15.8649035 -6.66629645



Continuing Example 4, we now demonstrate the results of running MANOVA on the same dataset, T6-12.dat. The R output from running our function MANOVA is as follows,

Overall mean vector
       [,1]   [,2]   [,3]    [,4]
[1,] 0.3554 5.2542 3.0014 43.7876

Treatment sample size
     [,1] [,2]
[1,]   25   25

Treatment effect matrix
        [,1]   [,2]
[1,] -0.0418 0.0418
[2,] -0.0754 0.0754
[3,] -0.6862 0.6862
[4,] -5.6328 5.6328

One-Way MANOVA Table

Treatment SS&CP matrix
          [,1]      [,2]       [,3]       [,4]
[1,]  0.087362  0.157586   1.434158   11.77255
[2,]  0.157586  0.284258   2.586974   21.23566
[3,]  1.434158  2.586974  23.543522  193.26137
[4,] 11.772552 21.235656 193.261368 1586.42179

Error SS&CP matrix
         [,1]       [,2]       [,3]        [,4]
[1,] 0.404480   5.378180   0.854764    4.328096
[2,] 5.378180  94.196160   2.597532  113.078548
[3,] 0.854764   2.597532  13.833280  105.750500
[4,] 4.328096 113.078548 105.750500 1884.311320

Total SS&CP matrix
          [,1]       [,2]       [,3]       [,4]
[1,]  0.491842   5.535766   2.288922   16.10065
[2,]  5.535766  94.480418   5.184506  134.31420
[3,]  2.288922   5.184506  37.376802  299.01187
[4,] 16.100648 134.314204 299.011868 3470.73311

Degrees of Freedom
  Treatment Error Total
1         1    46    49

  Trt.1 Trt.2 Estimate LowerCI UpperCI
1     1    -1   -0.084  -0.151  -0.016
2     1    -1   -0.151  -1.179   0.878
3     1    -1   -1.372  -1.766  -0.978
4     1    -1  -11.266 -15.865  -6.666

Sample   x1   x2   x3   x4  Group
1       131  138   89   49  1
2       125  131   92   48  1
...     ...  ...  ...  ...  ...
29      131  136  114   54  1
30      124  138  101   46  1
31      124  138  101   48  2
32      133  134   97   48  2
...     ...  ...  ...  ...  ...
59      135  132   98   54  2
60      130  128  101   51  2
61      137  141   96   52  3
62      129  133   93   47  3
...     ...  ...  ...  ...  ...
89      138  133  100   55  3
90      138  133   91   46  3

Table 5: T6-13.dat

In the above table, the first 30 rows are data from group 1, the next 30 rows are data from group 2, and the last 30 rows are data from group 3.

The R output from running our function MANOVA on this dataset is as follows,

Overall mean vector
         [,1]     [,2]     [,3]     [,4]
[1,] 132.7528 133.3146 98.19101 50.46067

Treatment sample size
     [,1] [,2] [,3]
[1,]   29   30   30

Treatment effect matrix
           [,1]       [,2]       [,3]
[1,] -1.3734986 -0.3861423  1.7138577
[2,]  0.1336691 -0.6146067  0.4853933
[3,]  1.3262301  0.8756554 -2.1576779
[4,]  0.1255327 -0.2273408  0.1059925

One-Way MANOVA Table

Treatment SS&CP matrix
            [,1]       [,2]        [,3]      [,4]
[1,]  147.300878  26.752383 -173.908098  3.083107
[2,]   26.752383  18.918597  -42.424177  6.221813
[3,] -173.908098 -42.424177  213.678096 -8.005024
[4,]    3.083107   6.221813   -8.005024  2.344543

Error SS&CP matrix
          [,1]      [,2]       [,3]      [,4]
[1,] 1785.2609  174.1690  125.11034 289.05172
[2,]  174.1690 1904.2724  225.07586 178.87931
[3,]  125.1103  225.0759 2046.07471 -17.82644
[4,]  289.0517  178.8793  -17.82644 837.76782

Total SS&CP matrix
           [,1]      [,2]       [,3]      [,4]
[1,] 1932.56180  200.9213  -48.79775 292.13483
[2,]  200.92135 1923.1910  182.65169 185.10112
[3,]  -48.79775  182.6517 2259.75281 -25.83146
[4,]  292.13483  185.1011  -25.83146 840.11236

Degrees of Freedom
  Treatment Error Total
1         2    80    88

Bonferroni Based Simultaneous CI for Treatments Difference
   Trt.1 Trt.2 Trt.3 Estimate LowerCI UpperCI
1      1    -1     0   -0.987  -4.451   2.476
2      1     0    -1   -3.087  -6.551   0.376
3      0     1    -1   -2.100  -5.563   1.363
4      1    -1     0    0.748  -2.829   4.325
5      1     0    -1   -0.352  -3.929   3.225
6      0     1    -1   -1.100  -4.677   2.477
7      1    -1     0    0.451  -3.257   4.158
8      1     0    -1    3.484  -0.224   7.191
9      0     1    -1    3.033  -0.674   6.741
10     1    -1     0    0.353  -2.020   2.725
11     1     0    -1    0.020  -2.353   2.392
12     0     1    -1   -0.333  -2.706   2.039

2.5 Treatment Effect Comparison

Example 7 and Example 8 use the datasets T6-12.dat and T6-13.dat (the same as in the last subsection).


The R output from running our function trt.effect is as follows,


Trt.1 Trt.2 Estimate LowerCI UpperCI

1 1 -1 -0.084 -0.151 -0.016

2 1 -1 -0.151 -1.179 0.878

3 1 -1 -1.372 -1.766 -0.978

4 1 -1 -11.266 -15.865 -6.666



The R output from running our function trt.effect is as follows,

Bonferroni Based Simultaneous CI for Treatments Difference

Trt.1 Trt.2 Trt.3 Estimate LowerCI UpperCI

1 1 -1 0 -1.000 -4.442 2.442

2 1 0 -1 -3.100 -6.542 0.342

3 0 1 -1 -2.100 -5.542 1.342

4 1 -1 0 0.900 -2.674 4.474

5 1 0 -1 -0.200 -3.774 3.374

6 0 1 -1 -1.100 -4.674 2.474

7 1 -1 0 0.100 -3.680 3.880

8 1 0 -1 3.133 -0.647 6.913

9 0 1 -1 3.033 -0.747 6.813

10 1 -1 0 0.300 -2.061 2.661

11 1 0 -1 -0.033 -2.395 2.328

12 0 1 -1 -0.333 -2.695 2.028

3 Appendix (R code)


In this part, x1 is an n × p numeric matrix or data frame of responses under treatment 1, where n is the number of experimental units and p is the number of responses; x2 is an n × p numeric matrix or data frame of responses under treatment 2, with the same n and p; and the input level is the confidence level of the intervals.

paired



1) * sqrt(s[i, i]/n)

}

scit




In this part, x1 is an n1 × p numeric matrix or data frame of data from population one, where n1 is the sample size and p is the number of responses. x2 is an n2 × p numeric matrix or data frame of data from population two, where n2 is the sample size and p is the number of responses. The input level is the confidence level of the intervals.

twopop



2], UpperCI = scib[, 3])
cat("\n\n T Squared Based Simultaneous CI for the difference \n")
print(scit)
cat("\n\n Bonferroni Based Simultaneous CI for the difference \n")
print(scib)
}


In this part, Y is an N × p numeric matrix or data frame of data, where N is the total sample size and p is the number of variables. X is an N × 1 numeric matrix or data frame of data, where N is the total sample size; the input level is the confidence level of the intervals. C is the contrast used to test treatment-effect differences.

MANOVA



prob



treatment-effects differences.

trt.effect
