ChungS166projectreport


  • 7/31/2019 ChungS166projectreport


    The R project for Comparisons of Several Multivariate Means

    Chu-yu Chung, Hang Du, Yi Su, Xiangmin Zhang

    December 7, 2009

    Abstract

    Comparisons of multivariate means involve hypothesis testing, constructing simultaneous confidence intervals (SCI), and decomposing variances under certain conditions. In this project, we write five individual R functions to perform such tasks, including paired comparison, a repeated measures design for comparing treatments, comparison of mean vectors from two multivariate populations, and comparison of several multivariate population means. Our R functions are designed to facilitate the computation and to produce as much information as is needed in practice.

    1 Introduction

    Multivariate hypothesis testing differs from univariate testing in many ways. It relies on the multivariate normal assumption, which is more appropriate in many practical settings, and it allows many possible alternatives.

    The advantages of multivariate testing include preserving the $\alpha$-value (the overall significance level) and testing with greater power. In Sections 1.1 to 1.5, we describe how to formulate the tests used to compare multivariate means. For each test, either the critical value for rejection or the simultaneous confidence interval is presented. The notation is adapted from Chapter 6 of Applied Multivariate Statistical Analysis (6th ed.) by R. A. Johnson and D. W. Wichern.

    1.1 Paired Comparison

    Paired comparison is used to analyze measurements taken under different sets of experimental conditions, in order to assess whether the responses differ significantly between these sets.

    In the multivariate paired comparison procedure, we label the responses $X_{111}$ (variable 1 under treatment 1 in the first unit), ..., $X_{2np}$ (variable $p$ under treatment 2 in the $n$th unit) to distinguish $p$ responses, two treatments, and $n$ experimental units. Hence the $p$ paired-difference random variables for the $j$th unit become


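    Assume all the populations follow $N_q(\mu, \Sigma)$. Let $C$ be a contrast matrix. An $\alpha$-level test of $H_0: C\mu = 0$ (equal treatment means) versus $H_1: C\mu \neq 0$ is: reject $H_0$ if

    \[ T^2 = n\,(C\bar{x})'\,(C S C')^{-1}\,(C\bar{x}) > \frac{(n-1)(q-1)}{n-q+1}\, F_{q-1,\,n-q+1}(\alpha) \tag{9} \]

    where $F_{q-1,\,n-q+1}(\alpha)$ is the upper $(100\alpha)$th percentile of an F-distribution with $q-1$ and $n-q+1$ d.f.,

    \[ \bar{x} = \frac{1}{n}\sum_{j=1}^{n} x_j \quad\text{and}\quad S = \frac{1}{n-1}\sum_{j=1}^{n} (x_j - \bar{x})(x_j - \bar{x})'. \]

    The $100(1-\alpha)\%$ simultaneous confidence intervals for single contrasts $c'\mu$, for any contrast vectors $c$ of interest, are

    \[ c'\bar{x} \pm \sqrt{\frac{(n-1)(q-1)}{n-q+1}\, F_{q-1,\,n-q+1}(\alpha)}\; \sqrt{\frac{c'Sc}{n}} \]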
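The test in (9) and the contrast intervals can be sketched in a few lines of R. This is a minimal illustration under stated assumptions, not the authors' repmeasure function: the name rm_contrast_test, its arguments, and the simulated data are all hypothetical and are used only to show the arithmetic.

```r
# Sketch of the repeated measures T^2 test with a contrast matrix C.
# rm_contrast_test is a hypothetical helper, not the report's repmeasure.
rm_contrast_test <- function(X, C, level = 0.95) {
  X <- as.matrix(X)
  n <- nrow(X)                     # number of experimental units
  q <- ncol(X)                     # number of treatments
  xbar <- colMeans(X)              # sample mean vector
  S <- cov(X)                      # sample covariance matrix (divisor n - 1)
  Cx <- C %*% xbar                 # estimated contrasts
  CSC <- C %*% S %*% t(C)
  T2 <- n * drop(t(Cx) %*% solve(CSC, Cx))
  crit <- (n - 1) * (q - 1) / (n - q + 1) * qf(level, q - 1, n - q + 1)
  half <- sqrt(crit) * sqrt(diag(CSC) / n)   # half-widths of the intervals
  est <- drop(Cx)
  list(T2 = T2, crit = crit, reject = T2 > crit,
       sci = cbind(Estimate = est, LowerCI = est - half, UpperCI = est + half))
}

# Simulated data (assumption): 20 units, 4 treatments with unequal means.
set.seed(1)
X <- matrix(rnorm(20 * 4, mean = rep(c(10, 10, 12, 12), each = 20)), nrow = 20)
C <- rbind(c(-1, -1, 1, 1),
           c( 1, -1, 1, -1),
           c( 1, -1, -1, 1))
res <- rm_contrast_test(X, C)
print(res$sci)
```

The value of T2 does not depend on which full-rank contrast matrix is chosen, so only the individual intervals change when C is replaced by another set of q - 1 linearly independent contrasts.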

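    1.3 Comparison of Two Multivariate Population Means

    In this part, we compare the responses from one set of experimental settings (population 1) with independent responses from another set of experimental settings (population 2). If $X_{11}, X_{12}, \ldots, X_{1n_1}$ is a random sample of size $n_1$ from $N_p(\mu_1, \Sigma)$ and $X_{21}, X_{22}, \ldots, X_{2n_2}$ is an independent random sample of size $n_2$ from $N_p(\mu_2, \Sigma)$, the likelihood ratio test of

    \[ H_0: \mu_1 - \mu_2 = 0 \tag{10} \]

    is based on the statistic

    \[ T^2 = \left[\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)\right]' \left[\left(\frac{1}{n_1} + \frac{1}{n_2}\right) S_{pooled}\right]^{-1} \left[\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)\right] \tag{11} \]

    which is distributed as

    \[ \frac{(n_1 + n_2 - 2)\,p}{n_1 + n_2 - p - 1}\, F_{p,\,n_1+n_2-p-1} \tag{12} \]

    where

    \[ S_{pooled} = \frac{n_1 - 1}{n_1 + n_2 - 2}\, S_1 + \frac{n_2 - 1}{n_1 + n_2 - 2}\, S_2 \tag{13} \]

    and $(n_1 - 1)S_1$ is distributed as $W_{n_1-1}(\Sigma)$ and $(n_2 - 1)S_2$ is distributed as $W_{n_2-1}(\Sigma)$.

    The $100(1-\alpha)\%$ simultaneous confidence interval for $\mu_{1i} - \mu_{2i}$ is

    \[ (\bar{X}_{1i} - \bar{X}_{2i}) \pm c\, \sqrt{\left(\frac{1}{n_1} + \frac{1}{n_2}\right) s_{ii,pooled}} \tag{14} \]

    where $c^2 = \frac{(n_1+n_2-2)\,p}{n_1+n_2-p-1}\, F_{p,\,n_1+n_2-p-1}(\alpha)$.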
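The two-sample statistic (11), the pooled covariance (13), and the intervals (14) translate directly into R. The sketch below is illustrative only and is not the authors' twopop function: the name two_sample_T2 and the simulated samples are assumptions. It tests the hypothesis of equal mean vectors, so the term $\mu_1 - \mu_2$ in (11) is set to zero.

```r
# Sketch of the two-sample Hotelling T^2 test with a pooled covariance matrix.
# two_sample_T2 is a hypothetical helper, not the report's twopop.
two_sample_T2 <- function(x1, x2, level = 0.95) {
  x1 <- as.matrix(x1)
  x2 <- as.matrix(x2)
  n1 <- nrow(x1); n2 <- nrow(x2); p <- ncol(x1)
  d <- colMeans(x1) - colMeans(x2)                    # difference of mean vectors
  Sp <- ((n1 - 1) * cov(x1) + (n2 - 1) * cov(x2)) / (n1 + n2 - 2)
  V <- (1 / n1 + 1 / n2) * Sp                         # covariance of the difference
  T2 <- drop(t(d) %*% solve(V, d))
  c2 <- (n1 + n2 - 2) * p / (n1 + n2 - p - 1) * qf(level, p, n1 + n2 - p - 1)
  half <- sqrt(c2) * sqrt(diag(V))                    # T^2-based half-widths
  list(T2 = T2, c2 = c2, reject = T2 > c2,
       sci = cbind(Estimate = d, LowerCI = d - half, UpperCI = d + half))
}

# Simulated samples (assumption): p = 3 responses, shifted means in population 2.
set.seed(2)
x1 <- matrix(rnorm(30 * 3), nrow = 30)
x2 <- matrix(rnorm(25 * 3, mean = 0.5), nrow = 25)
res <- two_sample_T2(x1, x2)
print(res$sci)
```

With a single response (p = 1) the statistic reduces to the square of the usual pooled-variance t statistic, which gives a quick sanity check against t.test(..., var.equal = TRUE).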


    1.4 Comparison of Several Multivariate Population Means

    MANOVA (multivariate analysis of variance) is a generalized form of univariate analysis of variance (ANOVA) that summarizes the output of a multivariate comparison of several populations. The MANOVA table decomposes the total sum of squares and cross products into a treatment component and a residual component. We will not delve into details here. A complete explanation of the variance decomposition and the summary tables of MANOVA can be found in Chapter 6 of Applied Multivariate Statistical Analysis (6th ed.) by R. A. Johnson and D. W. Wichern.
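The decomposition behind the one-way MANOVA table can be illustrated with a short R sketch. It is not the authors' MANOVA function, and it uses R's built-in iris data rather than any dataset from this report; the variable names are illustrative. It computes the treatment (between), error (within), and total SS&CP matrices and checks that the first two add up to the third.

```r
# Sketch of the one-way MANOVA decomposition: Total = Treatment + Error.
# Uses the built-in iris data as a stand-in (assumption), not the report's data.
Y <- as.matrix(iris[, 1:4])     # N x p response matrix
g <- iris$Species               # group labels
grand <- colMeans(Y)            # overall mean vector
groups <- split(as.data.frame(Y), g)

# Total SS&CP: deviations of every observation from the overall mean.
Tot <- crossprod(sweep(Y, 2, grand))

# Error (within) SS&CP: deviations from each group's own mean, summed over groups.
W <- Reduce(`+`, lapply(groups, function(d) {
  d <- as.matrix(d)
  crossprod(sweep(d, 2, colMeans(d)))
}))

# Treatment (between) SS&CP: group size times outer product of the treatment effect.
B <- Reduce(`+`, lapply(groups, function(d) {
  d <- as.matrix(d)
  nrow(d) * tcrossprod(colMeans(d) - grand)
}))

# Wilks' lambda, one common test statistic built from the two matrices.
wilks <- det(W) / det(B + W)
print(all.equal(B + W, Tot, check.attributes = FALSE))
```

The same error matrix can be obtained from crossprod(residuals(lm(Y ~ g))), which is a convenient cross-check on the hand-rolled sums.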

    1.5 Treatment Effect Comparison

    In treatment effect comparison, we first test whether the treatment effects are the same. When the hypothesis of equal treatment effects is rejected, we construct simultaneous confidence intervals for the components of the differences of the mean vectors. Treatment effect comparison is closely related to MANOVA. Again, a complete discussion can be found in Chapter 6 of Applied Multivariate Statistical Analysis (6th ed.) by R. A. Johnson and D. W. Wichern.
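The report does not state the interval formula, so the sketch below assumes the standard Bonferroni intervals from Chapter 6 of Johnson and Wichern: for response $i$ and treatments $k$ and $l$, the interval is $\bar{x}_{ki} - \bar{x}_{li} \pm t_{n-g}\!\left(\frac{\alpha}{pg(g-1)}\right)\sqrt{\frac{w_{ii}}{n-g}\left(\frac{1}{n_k} + \frac{1}{n_l}\right)}$, where $w_{ii}$ is the $i$th diagonal element of the error SS&CP matrix. The function name bonf_trt_ci and its arguments are hypothetical, and whether trt.effect is implemented this way is an assumption.

```r
# Sketch of one Bonferroni simultaneous interval for a treatment difference.
# bonf_trt_ci is a hypothetical helper; the formula is the textbook one (assumption).
bonf_trt_ci <- function(diff, w_ii, n_k, n_l, n, g, p, level = 0.95) {
  m <- p * g * (g - 1) / 2                 # number of simultaneous statements
  alpha <- 1 - level
  tcrit <- qt(1 - alpha / (2 * m), df = n - g)
  half <- tcrit * sqrt(w_ii / (n - g) * (1 / n_k + 1 / n_l))
  c(Estimate = diff, LowerCI = diff - half, UpperCI = diff + half)
}

# Inputs read off the two-group output reported in Section 2:
# first response, estimate -0.0836, error SS&CP element [1,1] = 0.404480,
# two groups of 25 observations, four responses.
ci <- bonf_trt_ci(diff = -0.0836, w_ii = 0.404480,
                  n_k = 25, n_l = 25, n = 50, g = 2, p = 4)
print(round(ci, 3))
```

With these inputs the interval should agree, up to rounding, with the first row of the Bonferroni table reported for the two-group dataset in Section 2.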

    2 Examples

    For illustrative purposes, we run our functions on several datasets, all of which accompany Chapter 6 of the book Applied Multivariate Statistical Analysis (6th ed.).

    2.1 Paired Comparison

    2.1.1 Example 1 (T6-1.dat)

    Sample  x11j  x12j  x21j  x22j
    1          6    27    25    15
    2          6    23    28    13
    3         18    64    36    22
    4          8    44    35    29
    5         11    30    15    31
    6         34    75    44    64
    7         28    26    42    30
    8         71   124    54    64
    9         43    54    34    56
    10        33    30    29    20
    11        20    14    39    21

    Table 1: T6-1.dat


    In the above table, the first two columns are from treatment 1 and the last two columns are from treatment 2.

    The R output from running our function paired on this dataset is as follows:

    reject null hypothesis, nonzero mean difference exists

    T Squared Based Simultaneous CI for difference

    Estimate LowerCI UpperCI

    1 -9.363636 -22.453272 3.726000

    2 13.272727 -5.700119 32.245574

    Bonferroni Based Simultaneous CI for difference

    Estimate LowerCI UpperCI

    1 -9.363636 -20.573107 1.845835

    2 13.272727 -2.974903 29.520358
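The output above can be cross-checked with a short, self-contained R sketch that works directly from Table 1. It is not the authors' paired function; the variable names are illustrative, and it assumes the usual paired-comparison formulas from Chapter 6 of Johnson and Wichern (T^2 on the differences, with T^2-based and Bonferroni intervals at the 95% level).

```r
# Sketch of the paired comparison on the Table 1 (T6-1.dat) values.
x1 <- cbind(c(6, 6, 18, 8, 11, 34, 28, 71, 43, 33, 20),      # x11j
            c(27, 23, 64, 44, 30, 75, 26, 124, 54, 30, 14))  # x12j
x2 <- cbind(c(25, 28, 36, 35, 15, 44, 42, 54, 34, 29, 39),   # x21j
            c(15, 13, 22, 29, 31, 64, 30, 64, 56, 20, 21))   # x22j

d <- x1 - x2                        # paired differences, one row per unit
n <- nrow(d); p <- ncol(d)
dbar <- colMeans(d)                 # mean difference vector
Sd <- cov(d)                        # covariance matrix of the differences
level <- 0.95

# Hotelling T^2 and its critical value.
T2 <- n * drop(t(dbar) %*% solve(Sd, dbar))
crit <- (n - 1) * p / (n - p) * qf(level, p, n - p)

# T^2-based and Bonferroni simultaneous intervals for each mean difference.
se <- sqrt(diag(Sd) / n)
half_t2 <- sqrt(crit) * se
half_bon <- qt(1 - (1 - level) / (2 * p), df = n - 1) * se
scit <- cbind(Estimate = dbar, LowerCI = dbar - half_t2, UpperCI = dbar + half_t2)
scib <- cbind(Estimate = dbar, LowerCI = dbar - half_bon, UpperCI = dbar + half_bon)

cat("T2 =", T2, " critical value =", crit, "\n")
print(scit)
print(scib)
```

Here T2 exceeds the critical value even though every individual interval contains zero, which is why the null hypothesis is rejected while no single component difference is flagged.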

    2.2 Repeated Measure Design Comparison

    2.2.1 Example 2 (T6-2.dat)

    Sample   x1   x2   x3   x4
    1       426  609  556  600
    2       253  236  392  395
    3       359  433  349  357
    4       432  431  522  600
    5       405  426  513  513
    6       324  438  507  539
    7       310  312  410  456
    8       326  326  350  504
    9       375  447  547  548
    10      286  286  403  422
    11      349  382  473  497
    12      429  410  488  547
    13      348  377  447  514
    14      412  473  472  446
    15      347  326  455  468
    16      434  458  637  524
    17      364  367  432  469
    18      420  395  508  531
    19      397  556  645  625

    Table 2: T6-2.dat

    In the above table, each column represents data from an individual treatment.


    The R output from running our function repmeasure on this dataset is as follows:

    reject null hypothesis of equal treatment means

    contrast matrix

    [,1] [,2] [,3] [,4]

    [1,] -1 1 -1 1

    [2,] -1 -1 1 1

    [3,] -1 1 1 -1

    Simultaneous CI for contrasts

    Estimate LowerCI UpperCI

    1 -206.32812 -282.19953 -130.4567

    2 -306.92188 -415.73637 -198.1074

    3 22.42188 -31.82305 76.6668

    2.3 Comparison of Two Multivariate Population Means

    2.3.1 Example 3 (T6-9.dat)

    Sample   x1   x2   x3   gender
    1        98   81   38   female
    2       103   84   38   female
    .         .    .    .   .
    23      162  124   61   female
    24      177  132   67   female
    25       93   74   37   male
    26       94   78   35   male
    .         .    .    .   .
    47      131   95   46   male
    48      135  106   47   male

    Table 3: T6-9.dat

    In the above table, the first 24 rows are data from population one (gender = female), and the last 24 rows are data from population two (gender = male).

    The R output from running our function twopop on this dataset is as follows:


    mean vector of population one

    4.900659 4.622909 3.940286

    mean vector of population two

    4.725444 4.477574 3.703186

    reject equality of mean vectors

    The coeffcient of the linear combination

    of most responsible for rejection is

    -43.72677 -8.710687 67.54641

    T Squared Based Simultaneous CI for the difference

    Estimate LowerCI UpperCI

    1 0.1752157 0.05776762 0.2926638

    2 0.1453352 0.05411666 0.2365537

    3 0.2371000 0.12906223 0.3451377

    Bonferroni Based Simultaneous CI for the difference

    Estimate LowerCI UpperCI

    1 0.1752157 0.07702893 0.2734025

    2 0.1453352 0.06907636 0.2215940

    3 0.2371000 0.14678026 0.3274197

    2.3.2 Example 4 (T6-12.dat)

    Sample    x1    x2    x3     x4   gender
    1       0.34  3.71  2.87  30.87   male
    2       0.39  5.08  3.38  43.85   male
    .          .     .     .      .   .
    24      0.34  4.27  4.00  50.35   male
    25      0.40  4.58  2.82  32.48   male
    26      0.29  5.04  1.93  33.85   female
    27      0.28  3.95  2.51  35.82   female
    .          .     .     .      .   .
    49      0.37  5.23  2.48  34.86   female
    50      0.35  5.37  2.25  35.07   female

    Table 4: T6-12.dat

    In the above table, the first 25 rows are data from population one (gender = male), and the last 25 rows are data from population two (gender = female).

    The R output from running our function twopop on this dataset is as follows:

    mean vector of population one

    0.3136 5.1788 2.3152 38.1548

    mean vector of population two

    0.3972 5.3296 3.6876 49.4204

    reject equality of mean vectors

    The coeffcient of the linear combination

    of most responsible for rejection is

    -99.39898 6.375999 6.228141 -0.7908238

    T Squared Based Simultaneous CI for the difference

    Estimate LowerCI UpperCI

    1 -0.0836 -0.1697234 0.002523361

    2 -0.1508 -1.4650835 1.163483457

    3 -1.3724 -1.8760572 -0.868742824

    4 -11.2656 -17.1438597 -5.387340281

    Bonferroni Based Simultaneous CI for the difference

    Estimate LowerCI UpperCI

    1 -0.0836 -0.1509852 -0.01621484

    2 -0.1508 -1.1791296 0.87752962

    3 -1.3724 -1.7664745 -0.97832550

    4 -11.2656 -15.8649035 -6.66629645

    2.4 Comparison of Several Multivariate Population Means

    2.4.1 Example 5 (T6-12.dat)

    Continuing Example 4, we now demonstrate the results of running MANOVA on the same dataset, T6-12.dat. The R output from running our function MANOVA is as follows:

    Overall mean vector

    [,1] [,2] [,3] [,4]

    [1,] 132.7528 133.3146 98.19101 50.46067

    Treatment sample size

    [,1] [,2] [,3]

    [1,] 29 30 30


    Treatment effect matrix

    [,1] [,2] [,3]

    [1,] -1.3734986 -0.3861423 1.7138577

    [2,] 0.1336691 -0.6146067 0.4853933

    [3,] 1.3262301 0.8756554 -2.1576779

    [4,] 0.1255327 -0.2273408 0.1059925

    One-Way MANOVA Table

    Treatment SS&CP matrix

    [,1] [,2] [,3] [,4]

    [1,] 147.300878 26.752383 -173.908098 3.083107

    [2,] 26.752383 18.918597 -42.424177 6.221813

    [3,] -173.908098 -42.424177 213.678096 -8.005024

    [4,] 3.083107 6.221813 -8.005024 2.344543

    Error SS&CP matrix

    [,1] [,2] [,3] [,4]

    [1,] 1785.2609 174.1690 125.11034 289.05172

    [2,] 174.1690 1904.2724 225.07586 178.87931

    [3,] 125.1103 225.0759 2046.07471 -17.82644

    [4,] 289.0517 178.8793 -17.82644 837.76782

    Total SS&CP matrix

    [,1] [,2] [,3] [,4]

    [1,] 1932.56180 200.9213 -48.79775 292.13483

    [2,] 200.92135 1923.1910 182.65169 185.10112

    [3,] -48.79775 182.6517 2259.75281 -25.83146

    [4,] 292.13483 185.1011 -25.83146 840.11236

    Degrees of Freedom

    Treatment Error Total

    1 2 80 88

    Bonferroni Based Simultaneous CI for Treatments Difference

    Trt.1 Trt.2 Trt.3 Estimate LowerCI UpperCI

    1 1 -1 0 -0.987 -4.451 2.476

    2 1 0 -1 -3.087 -6.551 0.376

    3 0 1 -1 -2.100 -5.563 1.363

    4 1 -1 0 0.748 -2.829 4.325

    5 1 0 -1 -0.352 -3.929 3.225

    6 0 1 -1 -1.100 -4.677 2.477

    7 1 -1 0 0.451 -3.257 4.158

    8 1 0 -1 3.484 -0.224 7.191

    9 0 1 -1 3.033 -0.674 6.741

    10 1 -1 0 0.353 -2.020 2.725

    11 1 0 -1 0.020 -2.353 2.392

    12 0 1 -1 -0.333 -2.706 2.039


    2.4.2 Example 6 (T6-13.dat)

    Sample   x1   x2   x3   x4   Group
    1       131  138   89   49   1
    2       125  131   92   48   1
    .         .    .    .    .   .
    29      131  136  114   54   1
    30      124  138  101   46   1
    31      124  138  101   48   2
    32      133  134   97   48   2
    .         .    .    .    .   .
    59      135  132   98   54   2
    60      130  128  101   51   2
    61      137  141   96   52   3
    62      129  133   93   47   3
    .         .    .    .    .   .
    89      138  133  100   55   3
    90      138  133   91   46   3

    Table 5: T6-13.dat

    In the above table, the first 30 rows are data from group 1, the next 30 rows are data from group 2, and the last 30 rows are data from group 3.

    The R output from running our function MANOVA on this dataset is as follows:

    Overall mean vector

    [,1] [,2] [,3] [,4]

    [1,] 0.3554 5.2542 3.0014 43.7876

    Treatment sample size

    [,1] [,2]

    [1,] 25 25

    Treatment effect matrix

    [,1] [,2]

    [1,] -0.0418 0.0418

    [2,] -0.0754 0.0754

    [3,] -0.6862 0.6862

    [4,] -5.6328 5.6328


    One-Way MANOVA Table

    Treatment SS&CP matrix

    [,1] [,2] [,3] [,4]

    [1,] 0.087362 0.157586 1.434158 11.77255

    [2,] 0.157586 0.284258 2.586974 21.23566

    [3,] 1.434158 2.586974 23.543522 193.26137

    [4,] 11.772552 21.235656 193.261368 1586.42179

    Error SS&CP matrix

    [,1] [,2] [,3] [,4]

    [1,] 0.404480 5.378180 0.854764 4.328096

    [2,] 5.378180 94.196160 2.597532 113.078548

    [3,] 0.854764 2.597532 13.833280 105.750500

    [4,] 4.328096 113.078548 105.750500 1884.311320

    Total SS&CP matrix

    [,1] [,2] [,3] [,4]

    [1,] 0.491842 5.535766 2.288922 16.10065

    [2,] 5.535766 94.480418 5.184506 134.31420

    [3,] 2.288922 5.184506 37.376802 299.01187

    [4,] 16.100648 134.314204 299.011868 3470.73311

    Degrees of Freedom

    Treatment Error Total

    1 1 46 49

    Bonferroni Based Simultaneous CI for Treatments Difference

    Trt.1 Trt.2 Estimate LowerCI UpperCI

    1 1 -1 -0.084 -0.151 -0.016

    2 1 -1 -0.151 -1.179 0.878

    3 1 -1 -1.372 -1.766 -0.978

    4 1 -1 -11.266 -15.865 -6.666

    2.5 Treatment Effect Comparison

    Example 7 and Example 8 use the datasets T6-12.dat and T6-13.dat (the same as in the last subsection).

    2.5.1 Example 7 (T6-12.dat)

    The R output from running our function trt.effect is as follows,

    Bonferroni Based Simultaneous CI for Treatments Difference

    Trt.1 Trt.2 Estimate LowerCI UpperCI

    1 1 -1 -0.084 -0.151 -0.016

    2 1 -1 -0.151 -1.179 0.878

    3 1 -1 -1.372 -1.766 -0.978

    4 1 -1 -11.266 -15.865 -6.666


    2.5.2 Example 8 (T6-13.dat)

    The R output from running our function trt.effect is as follows,

    Bonferroni Based Simultaneous CI for Treatments Difference

    Trt.1 Trt.2 Trt.3 Estimate LowerCI UpperCI

    1 1 -1 0 -1.000 -4.442 2.442

    2 1 0 -1 -3.100 -6.542 0.342

    3 0 1 -1 -2.100 -5.542 1.342

    4 1 -1 0 0.900 -2.674 4.474

    5 1 0 -1 -0.200 -3.774 3.374

    6 0 1 -1 -1.100 -4.674 2.474

    7 1 -1 0 0.100 -3.680 3.880

    8 1 0 -1 3.133 -0.647 6.913

    9 0 1 -1 3.033 -0.747 6.813

    10 1 -1 0 0.300 -2.061 2.661

    11 1 0 -1 -0.033 -2.395 2.328

    12 0 1 -1 -0.333 -2.695 2.028

    3 Appendix (R code)

    3.1 Paired Comparison

    In this part, x1 is an n × p numeric matrix or data frame of responses under treatment 1, where n is the number of experimental units and p is the number of responses; x2 is an n × p numeric matrix or data frame of responses under treatment 2, with n and p defined in the same way; and the input level is the confidence level of the intervals.

    paired


    1) * sqrt(s[i, i]/n)

    }

    scit


    3.3 Comparison of Two Multivariate Population Means

    In this part, x1 is an n1 × p numeric matrix or data frame of data from population one, where n1 is the sample size and p is the number of responses. x2 is an n2 × p numeric matrix or data frame of data from population two, where n2 is the sample size and p is the number of responses. The input level is the confidence level of the intervals.

    twopop


    2], UpperCI = scib[, 3])

    cat("\n\n T Squared Based Simultaneous CI for the difference \n")

    print(scit)

    cat("\n\n Bonferroni Based Simultaneous CI for the difference \n")

    print(scib)

    }

    3.4 Comparison of Several Multivariate Population Means

    In this part, Y is an N × p numeric matrix or data frame of data, where N is the total sample size and p is the number of variables. X is an N × 1 numeric matrix or data frame of data, where N is the total sample size; the input level is the confidence level of the intervals. C is the contrast used to test treatment-effects differences.

    MANOVA


    prob


    treatment-effects differences.

    trt.effect