7/31/2019 ChungS166projectreport
The R project for Comparisons of Several
Multivariate Means
Chu-yu Chung, Hang Du, Yi Su, Xiangmin Zhang
December 7, 2009
Abstract
Comparisons of multivariate means involve hypothesis testing, constructing simultaneous confidence intervals (SCIs), and decomposing variances under certain conditions. In this project, we write five R functions to perform these tasks: paired comparison, a repeated measures design for comparing treatments, comparison of mean vectors from two multivariate populations, comparison of several multivariate population means, and comparison of treatment effects. Our R functions are designed to facilitate the computation and to report as much of the information needed in practice as possible.
1 Introduction
Multivariate hypothesis testing differs from univariate testing in many ways. It relies on the multivariate normal assumption, which is more appropriate in many practical settings, and it allows many possible alternatives.
The advantages of multivariate testing include preserving the overall α-level and testing with greater power.

In Sections 1.1 to 1.5, we introduce how to formulate the tests used when comparing multivariate means. For each, we present either the critical value for rejection or the simultaneous confidence intervals. The notation is adapted from Chapter 6 of Applied Multivariate Statistical Analysis (6th ed.) by Johnson R. A. and Wichern D. W.
1.1 Paired Comparison
Paired comparison is used to analyze measurements taken on the same experimental units under different sets of experimental conditions, in order to determine whether the responses differ significantly across these sets.
In the multivariate paired comparison procedure, we label the responses $X_{111}$ (variable 1 under treatment 1 in the first unit), ..., $X_{2np}$ (variable $p$ under treatment 2 in the $n$th unit), covering $p$ responses, two treatments, and $n$ experimental units. Hence the $p$ paired-difference random variables for the $j$th unit become

$$D_{j1} = X_{1j1} - X_{2j1},\quad D_{j2} = X_{1j2} - X_{2j2},\quad \dots,\quad D_{jp} = X_{1jp} - X_{2jp}.$$
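Assume the population follows $N_q(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, and let $\mathbf{C}$ be a contrast matrix. An $\alpha$-level test of $H_0: \mathbf{C}\boldsymbol{\mu} = \mathbf{0}$ (equal treatment means) versus $H_1: \mathbf{C}\boldsymbol{\mu} \neq \mathbf{0}$ is: reject $H_0$ if

$$T^2 = n\,(\mathbf{C}\bar{\mathbf{x}})'\,(\mathbf{C}\mathbf{S}\mathbf{C}')^{-1}\,(\mathbf{C}\bar{\mathbf{x}}) > \frac{(n-1)(q-1)}{n-q+1}\,F_{q-1,\,n-q+1}(\alpha) \qquad (9)$$

where $F_{q-1,n-q+1}(\alpha)$ is the upper $(100\alpha)$th percentile of an $F$-distribution with $q-1$ and $n-q+1$ d.f.,

$$\bar{\mathbf{x}} = \frac{1}{n}\sum_{j=1}^{n}\mathbf{x}_j \quad\text{and}\quad \mathbf{S} = \frac{1}{n-1}\sum_{j=1}^{n}(\mathbf{x}_j - \bar{\mathbf{x}})(\mathbf{x}_j - \bar{\mathbf{x}})'.$$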
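The $100(1-\alpha)\%$ simultaneous confidence intervals for single contrasts $\mathbf{c}'\boldsymbol{\mu}$, for any contrast vectors $\mathbf{c}$ of interest, are

$$\mathbf{c}'\bar{\mathbf{x}} \pm \sqrt{\frac{(n-1)(q-1)}{n-q+1}\,F_{q-1,\,n-q+1}(\alpha)}\,\sqrt{\frac{\mathbf{c}'\mathbf{S}\mathbf{c}}{n}}$$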
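The test (9) and the intervals above translate directly into a few lines of R. The following is a minimal sketch, not the repmeasure function from our appendix: the name rm_t2 and its arguments are hypothetical, and it assumes the rows of x are the $n$ units, the columns are the $q$ treatments, and C is a $(q-1) \times q$ contrast matrix with linearly independent rows.

```r
# Hypothetical helper (not the appendix function repmeasure): repeated-measures
# T^2 test of H0: C mu = 0, eq. (9), with simultaneous CIs for each row of C.
rm_t2 <- function(x, C, alpha = 0.05) {
  x <- as.matrix(x)
  n <- nrow(x); q <- ncol(x)
  stopifnot(ncol(C) == q, nrow(C) == q - 1)
  xbar <- colMeans(x)
  S <- cov(x)                                   # divisor n - 1, as in S above
  Cx <- drop(C %*% xbar)                        # estimated contrasts C xbar
  CSC <- C %*% S %*% t(C)
  T2 <- n * sum(Cx * solve(CSC, Cx))            # n (C xbar)' (C S C')^{-1} (C xbar)
  crit <- (n - 1) * (q - 1) / (n - q + 1) * qf(1 - alpha, q - 1, n - q + 1)
  half <- sqrt(crit) * sqrt(diag(CSC) / n)      # diag(CSC) holds c'Sc for each row c
  list(T2 = T2, crit = crit, reject = T2 > crit,
       ci = data.frame(Estimate = Cx, LowerCI = Cx - half, UpperCI = Cx + half))
}

# Example usage on simulated data (19 units, 4 treatments):
set.seed(1)
x <- matrix(rnorm(76, mean = 400, sd = 50), nrow = 19, ncol = 4)
C <- rbind(c(-1, 1, -1, 1), c(-1, -1, 1, 1), c(-1, 1, 1, -1))
rm_t2(x, C)
```

With $q = 2$ and $\mathbf{C} = (-1, 1)$, $T^2$ reduces to the square of the univariate paired $t$ statistic, which is a convenient check on any implementation.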
1.3 Comparison of Two Multivariate Population Means
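In this part, we compare the responses from one set of experimental settings (population 1) with independent responses from another set of experimental settings (population 2). We test

$$H_0: \boldsymbol{\mu}_1 - \boldsymbol{\mu}_2 = \mathbf{0}. \qquad (10)$$

If $\mathbf{X}_{11}, \mathbf{X}_{12}, \dots, \mathbf{X}_{1n_1}$ is a random sample of size $n_1$ from $N_p(\boldsymbol{\mu}_1, \boldsymbol{\Sigma})$ and $\mathbf{X}_{21}, \mathbf{X}_{22}, \dots, \mathbf{X}_{2n_2}$ is an independent random sample of size $n_2$ from $N_p(\boldsymbol{\mu}_2, \boldsymbol{\Sigma})$, then

$$T^2 = \left[\bar{\mathbf{X}}_1 - \bar{\mathbf{X}}_2 - (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)\right]' \left[\left(\frac{1}{n_1} + \frac{1}{n_2}\right)\mathbf{S}_{\text{pooled}}\right]^{-1} \left[\bar{\mathbf{X}}_1 - \bar{\mathbf{X}}_2 - (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)\right] \qquad (11)$$

is distributed as

$$\frac{(n_1 + n_2 - 2)p}{n_1 + n_2 - p - 1}\,F_{p,\,n_1+n_2-p-1} \qquad (12)$$

where

$$\mathbf{S}_{\text{pooled}} = \frac{n_1 - 1}{n_1 + n_2 - 2}\,\mathbf{S}_1 + \frac{n_2 - 1}{n_1 + n_2 - 2}\,\mathbf{S}_2 \qquad (13)$$

and $(n_1 - 1)\mathbf{S}_1$ is distributed as $W_{n_1-1}(\boldsymbol{\Sigma})$ and $(n_2 - 1)\mathbf{S}_2$ is distributed as $W_{n_2-1}(\boldsymbol{\Sigma})$. The likelihood ratio test of (10) rejects $H_0$ at level $\alpha$ when $T^2$, evaluated at $\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2 = \mathbf{0}$, exceeds $\frac{(n_1+n_2-2)p}{n_1+n_2-p-1}F_{p,\,n_1+n_2-p-1}(\alpha)$.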
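The $100(1-\alpha)\%$ simultaneous confidence interval for $\mu_{1i} - \mu_{2i}$ is

$$\bar{X}_{1i} - \bar{X}_{2i} \pm c\,\sqrt{\left(\frac{1}{n_1} + \frac{1}{n_2}\right)s_{ii,\text{pooled}}} \qquad (14)$$

where $c^2 = \dfrac{(n_1 + n_2 - 2)p}{n_1 + n_2 - p - 1}\,F_{p,\,n_1+n_2-p-1}(\alpha)$.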
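Equations (11) to (14) can be sketched in R as follows. The helper twopop_t2 is hypothetical (it is not the twopop function in our appendix); it assumes x1 and x2 are numeric matrices with the same $p$ columns and evaluates (11) at $\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2 = \mathbf{0}$.

```r
# Hypothetical helper (not the appendix function twopop): two-sample Hotelling
# T^2 with pooled covariance and T^2-based simultaneous CIs, eqs. (11)-(14).
twopop_t2 <- function(x1, x2, alpha = 0.05) {
  x1 <- as.matrix(x1); x2 <- as.matrix(x2)
  n1 <- nrow(x1); n2 <- nrow(x2); p <- ncol(x1)
  d <- colMeans(x1) - colMeans(x2)                                  # xbar1 - xbar2
  Sp <- ((n1 - 1) * cov(x1) + (n2 - 1) * cov(x2)) / (n1 + n2 - 2)   # eq. (13)
  k <- 1 / n1 + 1 / n2
  T2 <- sum(d * solve(k * Sp, d))                                   # eq. (11) under H0
  c2 <- (n1 + n2 - 2) * p / (n1 + n2 - p - 1) * qf(1 - alpha, p, n1 + n2 - p - 1)
  half <- sqrt(c2) * sqrt(k * diag(Sp))                             # eq. (14)
  list(T2 = T2, c2 = c2, reject = T2 > c2,
       ci = data.frame(Estimate = d, LowerCI = d - half, UpperCI = d + half))
}

# Example usage on simulated data:
set.seed(2)
x1 <- matrix(rnorm(30, mean = 5), nrow = 10, ncol = 3)
x2 <- matrix(rnorm(36, mean = 4.5), nrow = 12, ncol = 3)
twopop_t2(x1, x2)
```

For $p = 1$, $T^2$ equals the square of the pooled two-sample $t$ statistic and (14) becomes the usual pooled $t$ interval, so the univariate case is a simple sanity check.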
1.4 Comparison of Several Multivariate Population Means
MANOVA (multivariate analysis of variance) is the generalization of the univariate analysis of variance (ANOVA) to vector-valued responses. The MANOVA table decomposes the total sum of squares and cross products (SS&CP) matrix into a treatment SS&CP matrix and a residual (error) SS&CP matrix. We will not delve into the details here; a complete explanation of the variance decomposition and the MANOVA summary tables can be found in Chapter 6 of Applied Multivariate Statistical Analysis (6th ed.) by Johnson R. A. and Wichern D. W.
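A minimal sketch of the one-way decomposition behind the MANOVA table, using Johnson and Wichern's definitions of the treatment matrix $\mathbf{B} = \sum_l n_l(\bar{\mathbf{x}}_l - \bar{\mathbf{x}})(\bar{\mathbf{x}}_l - \bar{\mathbf{x}})'$ and the error matrix $\mathbf{W} = \sum_l \sum_j (\mathbf{x}_{lj} - \bar{\mathbf{x}}_l)(\mathbf{x}_{lj} - \bar{\mathbf{x}}_l)'$. The helper manova_sscp is hypothetical and is not the MANOVA function in our appendix; it also returns Wilks' lambda $|\mathbf{W}|/|\mathbf{B} + \mathbf{W}|$.

```r
# Hypothetical helper (not the appendix function MANOVA): one-way MANOVA
# SS&CP decomposition, total = treatment (B) + error (W).
manova_sscp <- function(Y, group) {
  Y <- as.matrix(Y); group <- factor(group)
  xbar <- colMeans(Y)                                  # overall mean vector
  B <- W <- matrix(0, ncol(Y), ncol(Y))
  for (lev in levels(group)) {
    Yl <- Y[group == lev, , drop = FALSE]
    ml <- colMeans(Yl)                                 # treatment mean vector
    B <- B + nrow(Yl) * tcrossprod(ml - xbar)          # n_l (xbar_l - xbar)(xbar_l - xbar)'
    W <- W + crossprod(sweep(Yl, 2, ml))               # within-treatment cross products
  }
  Tot <- crossprod(sweep(Y, 2, xbar))                  # total SS&CP
  g <- nlevels(group); N <- nrow(Y)
  list(B = B, W = W, Total = Tot,
       df = c(Treatment = g - 1, Error = N - g, Total = N - 1),
       wilks = det(W) / det(B + W))
}

# Example usage on simulated data: N = 20 observations, p = 3, g = 4 groups.
set.seed(3)
Y <- matrix(rnorm(60), nrow = 20, ncol = 3)
grp <- rep(1:4, each = 5)
m <- manova_sscp(Y, grp)
all.equal(m$B + m$W, m$Total)   # TRUE: total = treatment + error
m$df
m$wilks
```

The error matrix $\mathbf{W}$ is also the cross-product matrix of residuals from a multivariate linear model with the group as a factor, which gives an independent check.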
1.5 Treatment Effect Comparison
In treatment effect comparison, we first test whether the treatment effects are the same. When the hypothesis of equal treatment effects is rejected, we construct simultaneous confidence intervals for the components of the differences of the mean vectors. Treatment effect comparison is closely related to MANOVA. Again, a complete discussion can be found in Chapter 6 of Applied Multivariate Statistical Analysis (6th ed.) by Johnson R. A. and Wichern D. W.
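One standard construction of these intervals, given in Johnson and Wichern's Chapter 6, is Bonferroni: with $N$ total observations, $g$ treatments and $p$ variables, $\tau_{ki} - \tau_{li}$ is covered by $\bar{x}_{ki} - \bar{x}_{li} \pm t_{N-g}\!\left(\frac{\alpha}{pg(g-1)}\right)\sqrt{\frac{w_{ii}}{N-g}\left(\frac{1}{n_k} + \frac{1}{n_l}\right)}$, where $w_{ii}$ is the $i$th diagonal element of the error SS&CP matrix $\mathbf{W}$. The sketch below assumes this formula; the helper name trt_bonf is hypothetical and it is not our trt.effect function.

```r
# Hypothetical helper (not the appendix function trt.effect): Bonferroni
# simultaneous CIs for tau_ki - tau_li, all variables i and pairs k < l.
trt_bonf <- function(Y, group, alpha = 0.05) {
  Y <- as.matrix(Y); group <- factor(group)
  lev <- levels(group); g <- length(lev); p <- ncol(Y); N <- nrow(Y)
  means <- sapply(lev, function(l) colMeans(Y[group == l, , drop = FALSE]))  # p x g
  if (p == 1) means <- matrix(means, nrow = 1)
  nl <- as.vector(table(group))                                   # treatment sizes
  W <- crossprod(Y - t(means)[as.integer(group), , drop = FALSE])  # error SS&CP
  tq <- qt(1 - alpha / (p * g * (g - 1)), N - g)                   # Bonferroni t quantile
  out <- NULL
  for (i in seq_len(p)) for (k in 1:(g - 1)) for (l in (k + 1):g) {
    est <- means[i, k] - means[i, l]
    half <- tq * sqrt(W[i, i] / (N - g) * (1 / nl[k] + 1 / nl[l]))
    out <- rbind(out, data.frame(var = i, trt = k, vs = l, Estimate = est,
                                 LowerCI = est - half, UpperCI = est + half))
  }
  out
}

# Example usage on simulated data: p = 3 variables, g = 3 treatments of 4.
set.seed(4)
Y <- matrix(rnorm(36), nrow = 12, ncol = 3)
grp <- rep(c("a", "b", "c"), each = 4)
trt_bonf(Y, grp)   # p * g * (g - 1) / 2 = 9 intervals
```

With $g = 2$ and $p = 1$ the multiplier is $t_{N-2}(\alpha/2)$, so each interval coincides with the pooled two-sample $t$ interval.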
2 Examples
For illustrative purposes, we run our functions on several datasets, all of which accompany Chapter 6 of Applied Multivariate Statistical Analysis (6th ed.).
2.1 Paired Comparison
2.1.1 Example 1 (T6-1.dat)
Sample x11j x12j x21j x22j
1 6 27 25 15
2 6 23 28 13
3 18 64 36 22
4 8 44 35 29
5 11 30 15 31
6 34 75 44 64
7 28 26 42 30
8 71 124 54 64
9 43 54 34 56
10 33 30 29 20
11 20 14 39 21
Table 1: T6-1.dat
In the above table, the first two columns are from treatment 1 and the last two columns are from treatment 2.
The R output from running our function paired on this dataset is as follows,
reject null hypothesis, nonzero mean difference exists
T Squared Based Simultaneous CI for difference
Estimate LowerCI UpperCI
1 -9.363636 -22.453272 3.726000
2 13.272727 -5.700119 32.245574
Bonferroni Based Simultaneous CI for difference
Estimate LowerCI UpperCI
1 -9.363636 -20.573107 1.845835
2 13.272727 -2.974903 29.520358
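As a cross-check, the following sketch recomputes the paired analysis directly from Table 1. It is not the paired function from our appendix; it assumes the standard Chapter 6 paired-comparison formulas: reject $H_0: \boldsymbol{\delta} = \mathbf{0}$ when $T^2 = n\bar{\mathbf{d}}'\mathbf{S}_d^{-1}\bar{\mathbf{d}}$ exceeds $\frac{(n-1)p}{n-p}F_{p,n-p}(\alpha)$, with $T^2$-based half-widths $\sqrt{\frac{(n-1)p}{n-p}F_{p,n-p}(\alpha)}\sqrt{s_{d,ii}/n}$ and Bonferroni half-widths $t_{n-1}(\alpha/(2p))\sqrt{s_{d,ii}/n}$. Up to rounding, it should reproduce the estimates and intervals printed above.

```r
# Table 1 (T6-1.dat): x11j, x12j under treatment 1; x21j, x22j under treatment 2.
x11 <- c(6, 6, 18, 8, 11, 34, 28, 71, 43, 33, 20)
x12 <- c(27, 23, 64, 44, 30, 75, 26, 124, 54, 30, 14)
x21 <- c(25, 28, 36, 35, 15, 44, 42, 54, 34, 29, 39)
x22 <- c(15, 13, 22, 29, 31, 64, 30, 64, 56, 20, 21)

D <- cbind(x11 - x21, x12 - x22)          # paired differences, one row per unit
n <- nrow(D); p <- ncol(D); alpha <- 0.05
dbar <- colMeans(D); Sd <- cov(D)        # mean difference and its covariance (divisor n - 1)
T2 <- n * sum(dbar * solve(Sd, dbar))    # n dbar' Sd^{-1} dbar
crit <- (n - 1) * p / (n - p) * qf(1 - alpha, p, n - p)

half_t2 <- sqrt(crit) * sqrt(diag(Sd) / n)                       # T^2-based
half_bf <- qt(1 - alpha / (2 * p), n - 1) * sqrt(diag(Sd) / n)   # Bonferroni
print(cbind(Estimate = dbar, LowerCI = dbar - half_t2, UpperCI = dbar + half_t2))
print(cbind(Estimate = dbar, LowerCI = dbar - half_bf, UpperCI = dbar + half_bf))
cat("T2 =", T2, " critical value =", crit, "\n")
```

Here $T^2 \approx 13.6$ exceeds the critical value $\approx 9.46$, consistent with the rejection reported above.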
2.2 Repeated Measure Design Comparison
2.2.1 Example 2 (T6-2.dat)
Sample x1 x2 x3 x4
1 426 609 556 600
2 253 236 392 395
3 359 433 349 357
4 432 431 522 600
5 405 426 513 513
6 324 438 507 539
7 310 312 410 456
8 326 326 350 504
9 375 447 547 548
10 286 286 403 422
11 349 382 473 497
12 429 410 488 547
13 348 377 447 514
14 412 473 472 446
15 347 326 455 468
16 434 458 637 524
17 364 367 432 469
18 420 395 508 531
19 397 556 645 625
Table 2: T6-2.dat
In the above table, each column represents data from an individual treatment.
The R output from running our function repmeasure on this dataset is as follows,
reject null hypothesis of equal treatment means
contrast matrix
[,1] [,2] [,3] [,4]
[1,] -1 1 -1 1
[2,] -1 -1 1 1
[3,] -1 1 1 -1
Simultaneous CI for contrasts
Estimate LowerCI UpperCI
1 -206.32812 -282.19953 -130.4567
2 -306.92188 -415.73637 -198.1074
3 22.42188 -31.82305 76.6668
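Because any two full-rank sets of $q-1$ contrasts span the same space, the statistic in (9) does not depend on which contrast matrix is used. The sketch below checks that the printed matrix is a valid contrast matrix (each row sums to zero, rank $q - 1 = 3$) and, on simulated data, that it gives the same $T^2$ as successive-difference contrasts. The helper t2_contrast is hypothetical.

```r
# Contrast matrix printed above: one contrast per row.
C <- rbind(c(-1,  1, -1,  1),
           c(-1, -1,  1,  1),
           c(-1,  1,  1, -1))
rowSums(C)        # all zero, so every row is a contrast
qr(C)$rank        # 3 = q - 1 linearly independent contrasts
C %*% t(C)        # 4 * identity: these rows also happen to be orthogonal

# Hypothetical helper: the T^2 statistic of eq. (9) for a given contrast matrix.
t2_contrast <- function(x, C) {
  n <- nrow(x); xbar <- colMeans(x); S <- cov(x)
  Cx <- drop(C %*% xbar)
  n * sum(Cx * solve(C %*% S %*% t(C), Cx))
}

# Successive differences mu2 - mu1, mu3 - mu2, mu4 - mu3 span the same space.
C2 <- cbind(-diag(3), 0) + cbind(0, diag(3))
set.seed(6)
x <- matrix(rnorm(80, mean = 50, sd = 10), nrow = 20, ncol = 4)   # simulated data
c(printed = t2_contrast(x, C), successive = t2_contrast(x, C2))   # equal up to rounding
```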
2.3 Comparison of Two Multivariate Population Means
2.3.1 Example 3 (T6-9.dat)
Sample x1 x2 x3 gender
1 98 81 38 female
2 103 84 38 female
. . . . .
. . . . .
. . . . .
23 162 124 61 female
24 177 132 67 female
25 93 74 37 male
26 94 78 35 male
. . . . .
. . . . .
. . . . .
47 131 95 46 male
48 135 106 47 male
Table 3: T6-9.dat
In the above table, the first 24 rows are data from population one (gender = female), and the last 24 rows are data from population two (gender = male).
The R output from running our function twopop on this dataset is as follows,
mean vector of population one
4.900659 4.622909 3.940286
mean vector of population two
4.725444 4.477574 3.703186
reject equality of mean vectors
The coeffcient of the linear combination
of most responsible for rejection is
-43.72677 -8.710687 67.54641
T Squared Based Simultaneous CI for the difference
Estimate LowerCI UpperCI
1 0.1752157 0.05776762 0.2926638
2 0.1453352 0.05411666 0.2365537
3 0.2371000 0.12906223 0.3451377
Bonferroni Based Simultaneous CI for the difference
Estimate LowerCI UpperCI
1 0.1752157 0.07702893 0.2734025
2 0.1453352 0.06907636 0.2215940
3 0.2371000 0.14678026 0.3274197
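In Johnson and Wichern's treatment, the linear combination $\mathbf{a}'\mathbf{x}$ most responsible for rejecting $H_0$ has coefficient vector $\hat{\mathbf{a}} \propto \mathbf{S}_{\text{pooled}}^{-1}(\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2)$, and the squared univariate two-sample $t$ statistic for that combination equals $T^2$. The sketch below assumes this is the quantity reported above (the appendix listing does not show how it is computed); the helper most_responsible is hypothetical, and any rescaling of $\hat{\mathbf{a}}$ is equivalent.

```r
# Hypothetical helper: coefficient vector of the linear combination a'x that
# maximizes the squared two-sample t statistic; the maximum equals Hotelling's T^2.
most_responsible <- function(x1, x2) {
  x1 <- as.matrix(x1); x2 <- as.matrix(x2)
  n1 <- nrow(x1); n2 <- nrow(x2)
  d <- colMeans(x1) - colMeans(x2)
  Sp <- ((n1 - 1) * cov(x1) + (n2 - 1) * cov(x2)) / (n1 + n2 - 2)  # eq. (13)
  a <- solve(Sp, d)                              # a-hat = Sp^{-1} (xbar1 - xbar2)
  # squared two-sample t statistic for the univariate combination v'x
  t2_of <- function(v) sum(v * d)^2 / ((1 / n1 + 1 / n2) * sum(v * (Sp %*% v)))
  list(a = a, T2 = t2_of(a), t2_of = t2_of)
}

set.seed(5)                                      # simulated data for illustration
x1 <- matrix(rnorm(30), 10, 3); x2 <- matrix(rnorm(36, mean = 0.5), 12, 3)
r <- most_responsible(x1, x2)
r$a                                              # coefficients of the combination
r$T2                                             # equals Hotelling's T^2 for these data
```

By the Cauchy-Schwarz inequality, no other coefficient vector gives a larger squared $t$ statistic.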
2.3.2 Example 4 (T6-12.dat)
Sample x1 x2 x3 x4 gender
1 0.34 3.71 2.87 30.87 male
2 0.39 5.08 3.38 43.85 male
. . . . . .
. . . . . .
. . . . . .
24 0.34 4.27 4.00 50.35 male
25 0.40 4.58 2.82 32.48 male
26 0.29 5.04 1.93 33.85 female
27 0.28 3.95 2.51 35.82 female
. . . . . .
. . . . . .
. . . . . .
49 0.37 5.23 2.48 34.86 female
50 0.35 5.37 2.25 35.07 female
Table 4: T6-12.dat
In the above table, the first 25 rows are data from population one (gender = male), and the last 25 rows are data from population two (gender = female).
The R output from running our function twopop on this dataset is as follows,
mean vector of population one
0.3136 5.1788 2.3152 38.1548
mean vector of population two
0.3972 5.3296 3.6876 49.4204
reject equality of mean vectors
The coeffcient of the linear combination
of most responsible for rejection is
-99.39898 6.375999 6.228141 -0.7908238
T Squared Based Simultaneous CI for the difference
Estimate LowerCI UpperCI
1 -0.0836 -0.1697234 0.002523361
2 -0.1508 -1.4650835 1.163483457
3 -1.3724 -1.8760572 -0.868742824
4 -11.2656 -17.1438597 -5.387340281
Bonferroni Based Simultaneous CI for the difference
Estimate LowerCI UpperCI
1 -0.0836 -0.1509852 -0.01621484
2 -0.1508 -1.1791296 0.87752962
3 -1.3724 -1.7664745 -0.97832550
4 -11.2656 -15.8649035 -6.66629645
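In both two-population examples the Bonferroni intervals are narrower than the $T^2$-based ones. When only the $p$ component intervals are of interest this is expected: the multiplier $c$ in (14) protects all linear combinations, whereas the Bonferroni multiplier $t_{n_1+n_2-2}(\alpha/(2p))$ protects only $p$ of them. Both multiply the same $\sqrt{(1/n_1 + 1/n_2)s_{ii,\text{pooled}}}$, so comparing the multipliers is enough. The helper multipliers below is hypothetical.

```r
# Hypothetical helper: half-width multipliers for the two kinds of simultaneous
# intervals on the p components of mu1 - mu2 (both multiply sqrt((1/n1 + 1/n2) s_ii)).
multipliers <- function(n1, n2, p, alpha = 0.05) {
  c2 <- (n1 + n2 - 2) * p / (n1 + n2 - p - 1) * qf(1 - alpha, p, n1 + n2 - p - 1)
  c(T2 = sqrt(c2),                                     # c in eq. (14)
    Bonferroni = qt(1 - alpha / (2 * p), n1 + n2 - 2))  # t_{n1+n2-2}(alpha / (2p))
}

m <- multipliers(25, 25, 4)   # setting of Example 4: 25 + 25 observations, p = 4
m
m["T2"] / m["Bonferroni"]     # ratio of T^2-based to Bonferroni half-widths
multipliers(10, 12, 1)        # p = 1: both reduce to the pooled t quantile
```

The ratio is about 1.28, matching the ratio of the half-widths printed for the first variable in Example 4.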
2.4 Comparison of Several Multivariate Population Means
2.4.1 Example 5 (T6-12.dat)
Continuing Example 4, we now demonstrate the results of performing MANOVA on the same dataset, T6-12.dat. The R output from running our function MANOVA is as follows,
Overall mean vector
[,1] [,2] [,3] [,4]
[1,] 132.7528 133.3146 98.19101 50.46067
Treatment sample size
[,1] [,2] [,3]
[1,] 29 30 30
Treatment effect matrix
[,1] [,2] [,3]
[1,] -1.3734986 -0.3861423 1.7138577
[2,] 0.1336691 -0.6146067 0.4853933
[3,] 1.3262301 0.8756554 -2.1576779
[4,] 0.1255327 -0.2273408 0.1059925
One-Way MANOVA Table
Treatment SS&CP matrix
[,1] [,2] [,3] [,4]
[1,] 147.300878 26.752383 -173.908098 3.083107
[2,] 26.752383 18.918597 -42.424177 6.221813
[3,] -173.908098 -42.424177 213.678096 -8.005024
[4,] 3.083107 6.221813 -8.005024 2.344543
Error SS&CP matrix
[,1] [,2] [,3] [,4]
[1,] 1785.2609 174.1690 125.11034 289.05172
[2,] 174.1690 1904.2724 225.07586 178.87931
[3,] 125.1103 225.0759 2046.07471 -17.82644
[4,] 289.0517 178.8793 -17.82644 837.76782
Total SS&CP matrix
[,1] [,2] [,3] [,4]
[1,] 1932.56180 200.9213 -48.79775 292.13483
[2,] 200.92135 1923.1910 182.65169 185.10112
[3,] -48.79775 182.6517 2259.75281 -25.83146
[4,] 292.13483 185.1011 -25.83146 840.11236
Degrees of Freedom
Treatment Error Total
1 2 80 88
Bonferroni Based Simultaneous CI for Treatments Difference
Trt.1 Trt.2 Trt.3 Estimate LowerCI UpperCI
1 1 -1 0 -0.987 -4.451 2.476
2 1 0 -1 -3.087 -6.551 0.376
3 0 1 -1 -2.100 -5.563 1.363
4 1 -1 0 0.748 -2.829 4.325
5 1 0 -1 -0.352 -3.929 3.225
6 0 1 -1 -1.100 -4.677 2.477
7 1 -1 0 0.451 -3.257 4.158
8 1 0 -1 3.484 -0.224 7.191
9 0 1 -1 3.033 -0.674 6.741
10 1 -1 0 0.353 -2.020 2.725
11 1 0 -1 0.020 -2.353 2.392
12 0 1 -1 -0.333 -2.706 2.039
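The printed matrices can be checked by hand: the treatment and error SS&CP matrices should add up to the total matrix, and together they give Wilks' lambda $\Lambda^* = |\mathbf{W}|/|\mathbf{B} + \mathbf{W}|$. The sketch below also computes Bartlett's large-sample statistic $-(n - 1 - (p+g)/2)\ln\Lambda^*$, which is compared with $\chi^2_{p(g-1)}(\alpha)$. It takes $n = 89$ from the printed treatment sample sizes (29 + 30 + 30) and is an illustration only, not part of our MANOVA function.

```r
# Treatment (B), error (W) and total SS&CP matrices as printed above.
B <- matrix(c(147.300878, 26.752383, -173.908098, 3.083107,
              26.752383, 18.918597, -42.424177, 6.221813,
              -173.908098, -42.424177, 213.678096, -8.005024,
              3.083107, 6.221813, -8.005024, 2.344543), 4, 4, byrow = TRUE)
W <- matrix(c(1785.2609, 174.1690, 125.11034, 289.05172,
              174.1690, 1904.2724, 225.07586, 178.87931,
              125.1103, 225.0759, 2046.07471, -17.82644,
              289.0517, 178.8793, -17.82644, 837.76782), 4, 4, byrow = TRUE)
Tot <- matrix(c(1932.56180, 200.9213, -48.79775, 292.13483,
                200.92135, 1923.1910, 182.65169, 185.10112,
                -48.79775, 182.6517, 2259.75281, -25.83146,
                292.13483, 185.1011, -25.83146, 840.11236), 4, 4, byrow = TRUE)

max(abs(B + W - Tot))                    # zero up to printing precision
wilks <- det(W) / det(B + W)             # Wilks' lambda
n <- 89; p <- 4; g <- 3                  # sizes 29 + 30 + 30 as printed
bartlett <- -(n - 1 - (p + g) / 2) * log(wilks)
c(wilks = wilks, statistic = bartlett, chisq_crit = qchisq(0.95, p * (g - 1)))
```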
2.4.2 Example 6 (T6-13.dat)
Sample x1 x2 x3 x4 Group
1 131 138 89 49 1
2 125 131 92 48 1
. . . . . .
. . . . . .
. . . . . .
29 131 136 114 54 1
30 124 138 101 46 1
31 124 138 101 48 2
32 133 134 97 48 2
. . . . . .
. . . . . .
. . . . . .
59 135 132 98 54 2
60 130 128 101 51 2
61 137 141 96 52 3
62 129 133 93 47 3
. . . . . .
. . . . . .
. . . . . .
89 138 133 100 55 3
90 138 133 91 46 3
Table 5: T6-13.dat
In the above table, the first 30 rows are data from group 1, the next 30 rows are data from group 2, and the last 30 rows are data from group 3. The R output from running our function MANOVA on this dataset is as follows,
Overall mean vector
[,1] [,2] [,3] [,4]
[1,] 0.3554 5.2542 3.0014 43.7876
Treatment sample size
[,1] [,2]
[1,] 25 25
Treatment effect matrix
[,1] [,2]
[1,] -0.0418 0.0418
[2,] -0.0754 0.0754
[3,] -0.6862 0.6862
[4,] -5.6328 5.6328
One-Way MANOVA Table
Treatment SS&CP matrix
[,1] [,2] [,3] [,4]
[1,] 0.087362 0.157586 1.434158 11.77255
[2,] 0.157586 0.284258 2.586974 21.23566
[3,] 1.434158 2.586974 23.543522 193.26137
[4,] 11.772552 21.235656 193.261368 1586.42179
Error SS&CP matrix
[,1] [,2] [,3] [,4]
[1,] 0.404480 5.378180 0.854764 4.328096
[2,] 5.378180 94.196160 2.597532 113.078548
[3,] 0.854764 2.597532 13.833280 105.750500
[4,] 4.328096 113.078548 105.750500 1884.311320
Total SS&CP matrix
[,1] [,2] [,3] [,4]
[1,] 0.491842 5.535766 2.288922 16.10065
[2,] 5.535766 94.480418 5.184506 134.31420
[3,] 2.288922 5.184506 37.376802 299.01187
[4,] 16.100648 134.314204 299.011868 3470.73311
Degrees of Freedom
Treatment Error Total
1 1 46 49
Bonferroni Based Simultaneous CI for Treatments Difference
Trt.1 Trt.2 Estimate LowerCI UpperCI
1 1 -1 -0.084 -0.151 -0.016
2 1 -1 -0.151 -1.179 0.878
3 1 -1 -1.372 -1.766 -0.978
4 1 -1 -11.266 -15.865 -6.666
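With $g = 2$ treatments, the treatment SS&CP matrix $\mathbf{B} = \sum_l n_l\hat{\boldsymbol{\tau}}_l\hat{\boldsymbol{\tau}}_l'$ has rank one. With equal sample sizes $\hat{\boldsymbol{\tau}}_2 = -\hat{\boldsymbol{\tau}}_1$, and each effect is half of the corresponding mean difference estimated in Example 4. The short sketch below rebuilds $\mathbf{B}$ from the printed effect matrix and sample sizes (25 and 25) as a consistency check only.

```r
# Treatment effects printed above (first column), with n1 = n2 = 25.
tau1 <- c(-0.0418, -0.0754, -0.6862, -5.6328)
tau2 <- -tau1
B <- 25 * tcrossprod(tau1) + 25 * tcrossprod(tau2)   # sum_l n_l tau_l tau_l'
round(B, 6)                                          # compare with the printed matrix
eigen(B, symmetric = TRUE)$values                    # one nonzero eigenvalue: rank one
```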
2.5 Treatment Effect Comparison
Examples 7 and 8 use the datasets T6-12.dat and T6-13.dat, respectively (the same as in the previous subsection).
2.5.1 Example 7 (T6-12.dat)
The R output from running our function trt.effect is as follows,
Bonferroni Based Simultaneous CI for Treatments Difference
Trt.1 Trt.2 Estimate LowerCI UpperCI
1 1 -1 -0.084 -0.151 -0.016
2 1 -1 -0.151 -1.179 0.878
3 1 -1 -1.372 -1.766 -0.978
4 1 -1 -11.266 -15.865 -6.666
2.5.2 Example 8 (T6-13.dat)
The R output from running our function trt.effect is as follows,
Bonferroni Based Simultaneous CI for Treatments Difference
Trt.1 Trt.2 Trt.3 Estimate LowerCI UpperCI
1 1 -1 0 -1.000 -4.442 2.442
2 1 0 -1 -3.100 -6.542 0.342
3 0 1 -1 -2.100 -5.542 1.342
4 1 -1 0 0.900 -2.674 4.474
5 1 0 -1 -0.200 -3.774 3.374
6 0 1 -1 -1.100 -4.674 2.474
7 1 -1 0 0.100 -3.680 3.880
8 1 0 -1 3.133 -0.647 6.913
9 0 1 -1 3.033 -0.747 6.813
10 1 -1 0 0.300 -2.061 2.661
11 1 0 -1 -0.033 -2.395 2.328
12 0 1 -1 -0.333 -2.695 2.028
3 Appendix (R code)
3.1 Paired Comparison
In this part, x1 is an n × p numeric matrix or data frame of responses under treatment 1, where n is the number of experimental units and p is the number of responses; x2 is an n × p numeric matrix or data frame of responses under treatment 2, with n and p as before; and the input level is the confidence level of the intervals.
paired
1) * sqrt(s[i, i]/n)
}
scit
3.3 Comparison of Two Multivariate Population Means
In this part, x1 is an n1 × p numeric matrix or data frame of data from population one, where n1 is the sample size and p is the number of responses. x2 is an n2 × p numeric matrix or data frame of data from population two, where n2 is the sample size and p is the number of responses. The input level is the confidence level of the intervals.
twopop
2], UpperCI = scib[, 3])
cat("\n\n T Squared Based Simultaneous CI for the difference \n")
print(scit)
cat("\n\n Bonferroni Based Simultaneous CI for the difference \n")
print(scib)
}
3.4 Comparison of Several Multivariate Population Means
In this part, Y is an N × p numeric matrix or data frame of data, where N is the total sample size and p is the number of variables. X is an N × 1 numeric matrix or data frame of data, where N is the total sample size; the input level is the confidence level of the intervals. C is the contrast used to test treatment-effects differences.
MANOVA
prob
treatment-effects differences.
trt.effect