Master thesis, Department of Statistics
Effects of unbalancedness and heteroscedasticity on two-way MANOVA tests
Patrik Zetterberg
Master thesis, 30 higher education credits, spring term 2013
Supervisor: Tatjana von Rosen
Abstract
Multivariate analysis of variance is a widely used multivariate method that is generally robust to minor deviations from normality and homoscedasticity. When data is balanced, the standard multivariate tests for factor effects are exact. However, these tests can be biased when data is unbalanced and covariance matrices are heteroscedastic, which emphasizes the need for proper methods. This master thesis investigates how some newly proposed modified tests, which take unbalancedness and heteroscedasticity into account, perform in relation to standard tests for two-way multivariate analysis of variance models with interactions. Two numerical examples are set up to compare the performance of the modified and standard tests. The results show that differences between these tests are marginal when data is balanced. When data is unbalanced, the modified tests are overall less prone than the standard tests to yield significant results. The main implications are that further studies of the testing procedure are needed, but that the modified tests are a useful statistical tool in the presence of unbalancedness and heteroscedasticity.
Table of Contents
1 Introduction
2 Background
3 Univariate analysis of variance models
3.1 One-way ANOVA
3.2 Two-way ANOVA with interactions
3.3 Matrix formulation of the ANOVA model
3.4 Estimation in the linear model
3.4.1 Estimation in the two-way ANOVA with interactions
3.5 Hypothesis testing in the linear model
3.5.1 Hypothesis testing in the one-way ANOVA
3.5.2 Hypothesis testing in the two-way ANOVA with interactions
4 Multivariate analysis of variance models
4.1 Heteroscedasticity of covariance matrices
4.1.1 Box’s M test
4.2 One-way MANOVA
4.3 Two-way MANOVA with interactions
4.4 Estimation in the two-way MANOVA with interactions
4.5 Hypothesis testing in the MANOVA model
4.5.1 Hypothesis testing in the two-way MANOVA with interactions
4.5.2 Wilks’ Λ
4.5.3 Hotelling-Lawley Trace
4.5.4 Pillai’s Trace
4.5.5 Characteristics of the multivariate tests
5 Unbalanced data
5.1 Unbalanced two-way ANOVA with interactions
5.1.1 Estimation
5.1.2 Hypothesis testing
6 Unbalanced two-way MANOVA with interactions
6.1 Estimation
6.2 Hypothesis testing using modified test statistics
7 Numerical examples
7.1 Real-life data example
7.1.1 Data
7.1.2 The two-way MANOVA model with interactions
7.1.3 Structure of the numerical example
7.1.4 Testing model assumptions
7.1.5 Results from the testing procedure
7.2 Simulation study
7.2.1 Construction of the simulated data
7.2.2 Testing model assumptions
7.2.3 Results from the testing procedure
7.3 Summary of test results
8 Discussion
References
Appendices
A Matrix algebra
B Summary statistics for the real-life data
C Summary statistics for the simulated data
D Univariate models for the 2 real-life data
E Univariate models for the 2 simulated data
F Codes
F.1 SAS codes
F.2 MATLAB codes
F.3 R code
1 Introduction
Analysis of variance (ANOVA) models have proven useful and applicable in a large variety of disciplines ranging from biostatistics to economics, especially as a tool for experimental design. The models have several advantages: they are generally robust and produce powerful tests (Littell et al., 2002; Hill & Lewicki, 2007). ANOVA-models belong to a class of linear models suitable for modeling a continuous response variable against one or several qualitative explanatory variables, generally called factors, that are measured on either a nominal or an ordinal scale. The main purpose of fitting ANOVA-models is to determine how the value of the response variable is altered by manipulating the factors, and foremost to study differences in means between factor levels (Sawyer, 2009).
In many situations, it is further of interest how changing the combination of factor levels can explain variation in not only one, but several response variables simultaneously. This multivariate extension of ANOVA-models is generally referred to as MANOVA-models. There are several advantages of using one MANOVA-model instead of many separate univariate ANOVA-models. With MANOVA, it is possible to test joint hypotheses of differences between factor level means. MANOVA also takes into account the correlation between response variables and thus makes better use of the information in the data (Littell et al., 2002).
ANOVA and MANOVA-models rely on a set of assumptions that need to be fulfilled. These models require data to be normally distributed, homoscedastic and balanced. However, these assumptions are seldom fair representations of real-life data. In many situations it is crucial for an experimenter, a company or a scientist to provide solutions and possible corrections when standard assumptions are violated, for instance when data is unbalanced.
In this master thesis it is shown how the MANOVA-model can be affected when important model assumptions are violated. The aim is to study the effects of unbalanced data and heteroscedastic covariance matrices on the testing procedure in the two-way MANOVA-model with interactions. In particular, the performance of the multivariate tests newly proposed in Zhang & Xiao (2012) is evaluated and compared to the most commonly used multivariate tests provided by statistical software.
The structure of this master thesis is as follows. Section 2 gives a brief overview of the literature related to ANOVA and MANOVA models. Section 3 provides a short introduction to univariate balanced ANOVA-models: matrix notation is introduced for the model specification, and estimation and hypothesis testing in the linear model are reviewed. Section 4 extends the balanced ANOVA-model to a balanced MANOVA-model by considering several response variables, and presents the multivariate model, estimation and multivariate tests. Section 5 is devoted to unbalanced data and its effects on model specification, estimation and hypothesis testing in the ANOVA-model. Section 6 further investigates the consequences of unbalanced data for the MANOVA-model. Thorough numerical examples implementing the two-way MANOVA-model with interactions are presented and analyzed in Section 7. Finally, the results from the numerical examples are discussed in Section 8, together with some concluding remarks and suggestions for future studies.
2 Background
Sir Ronald A. Fisher first developed ANOVA in the 1920s as a method for analyzing agricultural and biological data. Since then, it has been used extensively in various applications (Rutherford, 2012). ANOVA was initially applied to balanced data until Frank Yates presented methods for analyzing unbalanced data in the 1930s (Herr, 1986). Following in the footsteps of Yates, numerous authors have addressed unbalanced data in ANOVA-models, some of the more recent being Fujikoshi (1993), Weber & Skillings (2000), Rencher (2000), Bao & Ananda (2001) and Langsrud (2003).
After Theodore W. Anderson published his famous book "An Introduction to Multivariate Statistical Analysis" in 1958, methods of multivariate statistics, including MANOVA, were rapidly established (Sen, 1986). Searle (1987) was one of the first authors to present methods for estimating parameters and testing hypotheses in MANOVA-models with unbalanced data. As he also notes, "It is preferable by far to think of the analysis of unbalanced data as quite separate from that of balanced data", thus emphasizing the need for specific methods when data is unbalanced (Searle, 1987).
Many authors, such as Shaw & Mitchell-Olds (1993), Bao & Ananda (2001) and Zhang & Xiao (2012), point out that univariate and multivariate tests for main and interaction effects in ANOVA and MANOVA models are exact when data is balanced and covariance matrices are homoscedastic. When data is unbalanced and covariances are heteroscedastic, these tests for main and interaction effects are only approximate (Searle, 1987; Shaw & Mitchell-Olds, 1993; Littell et al., 2002). In addition, these tests become too conservative and have low power, which further highlights the importance of using modified test statistics (Ananda & Weerahandi, 1997; Zhang & Xiao, 2012). Ways of adjusting tests for heteroscedasticity in ANOVA-models have been presented by, for example, Ananda & Weerahandi (1997) and Bao & Ananda (2001), who use generalized p-values to obtain exact F-statistics. For MANOVA-models, modified multivariate tests have been presented by Harrar & Bathke (2008) and Zhang & Xiao (2012).
In Harrar & Bathke (2008), non-parametric alternatives to Wilks’ Λ, the Hotelling-Lawley trace and Pillai’s trace are proposed. Zhang & Xiao (2012) further propose two other modifications of these tests, using matching of covariance matrix variance components and affine-invariant covariance matrix transformations. In a simulation study as well as in a real-data example, Zhang & Xiao (2012) show that the two modified tests are indeed less conservative and have higher power than the standard multivariate tests and the modified test proposed by Harrar & Bathke (2008).
Solutions to the problem of unbalanced data and heteroscedastic covariances in MANOVA-models have not been well addressed, despite the vast literature on multivariate methods. As mentioned above, Harrar & Bathke (2008) and Zhang & Xiao (2012) have recently presented solutions to these issues, but broader studies of their results are missing and a thorough assessment of their methodology is needed. This master thesis therefore begins to fill this gap by investigating the performance of these newly proposed methods.
3 Univariate analysis of variance models
ANOVA is a tool for estimating the effects of factors on a continuous response variable, with the goal of detecting differences in means between the categories of a factor, called levels (Sawyer, 2009). To estimate the factor level means, it is necessary to observe several outcomes, called replicates, for a given combination of factor levels. If the number of replicates for each factor combination is equal, the data is referred to as balanced. However, it is often the case that the number of replicates varies over factor level combinations. In this case, data is said to be unbalanced. As will be shown in more detail in Sections 5 and 6, the distinction between balanced and unbalanced data is of major importance for model specification, estimation and hypothesis testing in ANOVA and MANOVA-models.
Throughout this master thesis, the focus is entirely on fixed effects ANOVA-models. Fixed effects models are part of a larger class of linear models that also includes random effects models and mixed models. Since factors are assumed to be fixed, the levels of factors are not considered to be random samples from a larger population of levels. Hence, inference from fixed effects models is only valid within the specific population and factor levels included in the model (Sawyer, 2009).
The ANOVA-model relies on several assumptions:
1. (Normality). The observed sample is assumed to be drawn from a normally
distributed population.
2. (Independence). Observations in the observed sample are independent of each
other.
3. (Homoscedasticity). The error variance is equal across all factor levels (in the multivariate case, the variance-covariance matrices are equal).
A brief overview of one-way and two-way ANOVA models will be given in the following
subsections.
3.1 One-way ANOVA
The simplest form within the set of ANOVA-models is the one-way ANOVA means model. In the means model, a single response variable is related to the level means of a single factor so that

$$y_{ik} = \mu_i + \varepsilon_{ik}, \qquad (1)$$

where $y_{ik}$ is the value of the response variable for the $k$th replicate at the $i$th level of a factor A, $\mu_i$ is the mean of the $i$th level of factor A, and $\varepsilon_{ik}$ is a random error, $i = 1, 2, \ldots, a$, $k = 1, 2, \ldots, n$. Further, it is assumed that the error terms are independent and normally distributed with zero mean and constant variance, $\varepsilon_{ik} \overset{iid}{\sim} N(0, \sigma^2_\varepsilon)$. Thus, $E(y_{ik}) = \mu_i$ and $V(y_{ik}) = V(\varepsilon_{ik}) = \sigma^2_\varepsilon$.

In model (1), each $\mu_i$ is treated as an unknown fixed parameter. The random error is assumed to vary over both replicates and factor levels, representing the difference between the observations in each sample and the corresponding population means (?).

By expressing each factor level mean as a deviation from the overall population mean, i.e. $\mu_i = \mu + \alpha_i$ where $\alpha_i = \mu_i - \mu$, model (1) can be formulated as the factor effects model

$$y_{ik} = \mu + \alpha_i + \varepsilon_{ik}, \qquad (2)$$

where $\mu$ is the overall mean and $\alpha_i$ is the effect of the $i$th level of factor A, $i = 1, 2, \ldots, a$. Due to the re-parametrization, $E(y_{ik}) = \mu + \alpha_i$, while the variance equals that in model (1). In the factor effects model, $\alpha_i$ represents the difference between the mean of factor level $i$ and the overall mean $\mu$.
3.2 Two-way ANOVA with interactions
A natural extension of model (2) is to consider the effects of two factors on the response variable. Given two factors in the model, there are several types of effects to investigate. The main effect of a factor level is defined as the difference between the mean at that level and the overall population mean, averaged over the levels of the second factor. It is often the case that there is a combined effect on the response variable which depends on the combination of levels of the two factors. Hence, one may define the interaction effect as the extent to which the effect of one factor on the response variable changes across the levels of the second factor (Littell et al., 2002). The presence of interaction effects can be discovered by plotting the means of the response variable for the two factors. Figure 1 shows an example of interaction and no interaction effects between two factors A and B, which have 2 and 3 levels, respectively. In the right plot in Figure 1, it can be seen that the pattern of the level means of factor B differs between the levels of factor A, showing that factors A and B influence each other. Hence, there are interaction effects between factors A and B.
Figure 1: Examples of interaction and no interaction effects between factors A and B
[Figure: two panels, "No Interaction" (left) and "Interaction" (right), each plotting the mean values of y (axis 0 to 10) against factor B levels 1, 2, 3, with one profile per factor A level (level 1, level 2).]
If, on the other hand, the profiles of the level means of factor B are parallel across the levels of factor A, there are no interaction effects. This situation is exemplified in the left plot in Figure 1. The two-way model with interactions is the following:

$$y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}, \qquad (3)$$

where $y_{ijk}$ is the response of the $k$th replicate at the $i$th level of A and the $j$th level of B, $i = 1, 2, \ldots, a$, $j = 1, 2, \ldots, b$, $k = 1, 2, \ldots, n$. Further, $\mu$ is the overall mean, $\alpha_i$ and $\beta_j$ are the main effects of the $i$th and $j$th levels of factors A and B, respectively, and $(\alpha\beta)_{ij}$ is the interaction effect of the $i$th level of A and the $j$th level of B. In (3), it is assumed that $\varepsilon_{ijk} \overset{iid}{\sim} N(0, \sigma^2_\varepsilon)$, so that $E(y_{ijk}) = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij}$ and $Var(y_{ijk}) = \sigma^2_\varepsilon$.
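The distinction illustrated in Figure 1 can also be checked numerically. The Python sketch below is a hypothetical illustration (the function names `cell_means` and `interaction_contrasts` and all parameter values are made up for this example, not taken from the thesis): it builds the cell means $\mu + \alpha_i + \beta_j + (\alpha\beta)_{ij}$ of model (3) and computes the contrasts $\mu_{ij} - \mu_{1j} - \mu_{i1} + \mu_{11}$, which are zero for every cell exactly when the mean profiles are parallel, i.e. when there is no interaction.

```python
def cell_means(mu, alpha, beta, ab):
    """Cell means mu + alpha_i + beta_j + (alpha beta)_ij of model (3)."""
    return [[mu + alpha[i] + beta[j] + ab[i][j] for j in range(len(beta))]
            for i in range(len(alpha))]


def interaction_contrasts(m):
    """Contrasts m[i][j] - m[0][j] - m[i][0] + m[0][0] relative to the first cell.

    All contrasts are zero exactly when the mean profiles are parallel (no interaction).
    """
    return [[m[i][j] - m[0][j] - m[i][0] + m[0][0] for j in range(len(m[0]))]
            for i in range(len(m))]


# Hypothetical parameters: a = 2 levels of A, b = 3 levels of B, sum-to-zero effects.
mu, alpha, beta = 5.0, [-1.0, 1.0], [-2.0, 0.0, 2.0]
no_interaction = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
interaction = [[1.5, 0.0, -1.5], [-1.5, 0.0, 1.5]]

print(interaction_contrasts(cell_means(mu, alpha, beta, no_interaction)))
# [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
print(interaction_contrasts(cell_means(mu, alpha, beta, interaction)))
# [[0.0, 0.0, 0.0], [0.0, 3.0, 6.0]]
```

With non-zero interaction parameters the profiles are no longer parallel, which shows up as non-zero contrasts, the numerical counterpart of the right panel in Figure 1.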
3.3 Matrix formulation of the ANOVA model
Generally, models (1)–(3) can be written as a linear model in matrix form:

$$y = X\beta + \varepsilon, \qquad (4)$$

where $y : n \times 1$ represents the vector of responses, $X : n \times p$ is a known design matrix, $\beta : p \times 1$ is a vector of fixed effects to be estimated, and $\varepsilon : n \times 1$ is a vector of random errors such that $\varepsilon \sim N_n(0_n, \sigma^2_\varepsilon I_n)$. Hence, $E(y) = X\beta$ and $Var(y) = \sigma^2_\varepsilon I_n$. Here, $0_n : n \times 1$ is a vector with all components equal to zero, and $I_n$ denotes the identity matrix of size $n$.
Matrix notation facilitates the analysis of factor effects models because it expresses computations in a compact format. Matrix calculations in this section mainly focus on inference for model (4). All necessary definitions are given in Appendix A.
The estimation in a linear model with fixed effects is briefly discussed in the next
section.
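To see why the factor effects formulation leads to a design matrix that is not of full column rank, the following Python sketch builds $X$ in (4) for model (3) with one column per parameter and computes its rank with exact rational arithmetic. The helper names, the column ordering and the chosen sizes ($a = 2$, $b = 3$, $n = 3$) are assumptions made for illustration only.

```python
from fractions import Fraction


def design_matrix(a, b, n):
    """Overparameterized design matrix X of model (3) for balanced data.

    Columns (an illustrative ordering): mu, alpha_1..alpha_a, beta_1..beta_b,
    (alpha beta)_11..(alpha beta)_ab. Rows are ordered by i, then j, then replicate k.
    """
    X = []
    for i in range(a):
        for j in range(b):
            row = [1]
            row += [int(r == i) for r in range(a)]
            row += [int(c == j) for c in range(b)]
            row += [int((r, c) == (i, j)) for r in range(a) for c in range(b)]
            X.extend([row[:] for _ in range(n)])
    return X


def rank(matrix):
    """Rank by Gauss-Jordan elimination in exact rational arithmetic."""
    m = [[Fraction(x) for x in row] for row in matrix]
    r = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][col] != 0:
                factor = m[i][col] / m[r][col]
                m[i] = [x - factor * y for x, y in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r


X = design_matrix(a=2, b=3, n=3)
print(len(X), len(X[0]), rank(X))  # 18 12 6
```

There are $p = 1 + a + b + ab = 12$ parameters but only $ab = 6$ linearly independent columns, one per cell, so $X'X$ is singular for the factor effects parametrization.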
3.4 Estimation in the linear model
A two-way ANOVA with interactions, which is of interest in this master thesis, belongs to the class of fixed effects models. This model can also be written as $y = X\beta + \varepsilon$, where $\beta = (\mu, \alpha_i, \beta_j, (\alpha\beta)_{ij})'$. There are several approaches to estimating the parameter vector $\beta$ in the model $y = X\beta + \varepsilon$. A common approach is the method of least squares, i.e. to find an estimator $\hat{\beta}$ that minimizes the error sum of squares:

$$\varepsilon'\varepsilon = (y - X\beta)'(y - X\beta) \rightarrow \min.$$

Observe that

$$\varepsilon'\varepsilon = (y - X\beta)'(y - X\beta) = y'y - 2\beta'X'y + \beta'X'X\beta, \qquad (5)$$

since the transpose of a scalar is the scalar itself, i.e. $y'X\beta = \beta'X'y$. Using the matrix differentiation rules presented in e.g. Harville (2008), the minimization problem reduces to solving the equation

$$\frac{\partial\, \varepsilon'\varepsilon}{\partial \beta} = -2X'y + 2X'X\beta = 0, \qquad (6)$$

which gives the system of normal equations

$$X'X\hat{\beta} = X'y. \qquad (7)$$

If $X$ has full column rank, the unique solution to (7) is $\hat{\beta} = (X'X)^{-1}X'y$, where $(X'X)^{-1}$ is the inverse of $X'X$. This value indeed minimizes $\varepsilon'\varepsilon$, since the matrix of second derivatives of (5), $2X'X$, is positive definite (Rencher, 2000). The least squares estimator $\hat{\beta}$ has many useful properties. For instance, as shown in Weber & Skillings (2000) and Rencher (2000), among others, $\hat{\beta}$ is the best linear unbiased estimator (BLUE).
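As a minimal sketch of the normal equations (7) in a full-rank case, the Python code below (standard library only; the data and helper names `transpose`, `matmul` and `solve` are hypothetical) solves $X'X\hat{\beta} = X'y$ for the means model (1), where $X$ contains one indicator column per factor level. Solving the system directly, rather than forming $(X'X)^{-1}$, is a common numerical choice; here $X'X$ is diagonal and $\hat{\beta}$ reduces to the level means.

```python
from fractions import Fraction


def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*B)] for row in A]


def solve(A, rhs):
    """Solve the square system A x = rhs by Gauss-Jordan elimination (A nonsingular)."""
    n = len(A)
    M = [[Fraction(v) for v in A[i]] + [Fraction(rhs[i])] for i in range(n)]
    for col in range(n):
        pivot = next(i for i in range(col, n) if M[i][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        piv = M[col][col]
        M[col] = [v / piv for v in M[col]]
        for i in range(n):
            if i != col:
                f = M[i][col]
                M[i] = [v - f * w for v, w in zip(M[i], M[col])]
    return [row[-1] for row in M]


# Means model (1): a = 3 levels, n = 2 replicates per level (hypothetical data).
levels = [0, 0, 1, 1, 2, 2]
y = [4, 6, 7, 9, 1, 3]
X = [[int(g == level) for level in range(3)] for g in levels]

XtX = matmul(transpose(X), X)                                     # diag(2, 2, 2)
Xty = [row[0] for row in matmul(transpose(X), [[v] for v in y])]  # [10, 16, 4]
beta_hat = solve(XtX, Xty)
print([float(b) for b in beta_hat])  # [5.0, 8.0, 2.0], the level means
```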
3.4.1 Estimation in the two-way ANOVA with interactions
Since factor effects models (2) and (3) are expressed in terms of deviations from an overall mean $\mu$, these models include more parameters than there are linearly independent equations in the system of normal equations. In other words, these models are less than full rank. As a result, when (2) and (3) are expressed in terms of (4), $X'X$ is singular and there are infinitely many solutions to the normal equations (7), implying that $\beta$ is not estimable in the factor effects model.

One way to solve the system of normal equations (7) uniquely is to impose restrictions on the parameters in $\beta$ using linear constraints. The parameter restrictions in the two-way ANOVA model with interactions (3) can be expressed as the following linear constraints:

$$\sum_{i=1}^{a} \alpha_i = 0, \qquad \sum_{j=1}^{b} \beta_j = 0, \qquad \sum_{i=1}^{a} (\alpha\beta)_{ij} = 0 \ \text{for each } j, \qquad \sum_{j=1}^{b} (\alpha\beta)_{ij} = 0 \ \text{for each } i. \qquad (8)$$

The least squares estimators subject to the constraints (8) can then be derived as:

$$\hat{\mu} = \bar{y}_{...},$$
$$\hat{\alpha}_i = \bar{y}_{i..} - \hat{\mu} = \bar{y}_{i..} - \bar{y}_{...}, \quad i = 1, \ldots, a,$$
$$\hat{\beta}_j = \bar{y}_{.j.} - \hat{\mu} = \bar{y}_{.j.} - \bar{y}_{...}, \quad j = 1, \ldots, b,$$
$$\widehat{(\alpha\beta)}_{ij} = \bar{y}_{ij.} - \bar{y}_{i..} - \bar{y}_{.j.} + \bar{y}_{...}.$$
In the above expressions, the dot notation, for instance $\bar{y}_{i..}$, denotes the average of $y$ at level $i$ of factor A taken over all $j = 1, 2, \ldots, b$ and $k = 1, 2, \ldots, n$, so that

$$\bar{y}_{i..} = \frac{1}{bn} \sum_{j=1}^{b} \sum_{k=1}^{n} y_{ijk}.$$
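The constrained estimators above only require cell, row, column and grand means. The following Python sketch (the function name `two_way_estimates` and the data are hypothetical) computes them for balanced data stored as a nested list `y[i][j][k]`; because the data are balanced, averaging the cell means gives the same marginal means as averaging the raw observations.

```python
def two_way_estimates(y):
    """Least squares estimates under constraints (8) for balanced data y[i][j][k]."""
    a, b, n = len(y), len(y[0]), len(y[0][0])
    cell = [[sum(y[i][j]) / n for j in range(b)] for i in range(a)]   # ybar_ij.
    row = [sum(cell[i]) / b for i in range(a)]                         # ybar_i..
    col = [sum(cell[i][j] for i in range(a)) / a for j in range(b)]    # ybar_.j.
    grand = sum(row) / a                                               # ybar_...
    alpha = [row[i] - grand for i in range(a)]
    beta = [col[j] - grand for j in range(b)]
    ab = [[cell[i][j] - row[i] - col[j] + grand for j in range(b)] for i in range(a)]
    return grand, alpha, beta, ab


# Hypothetical balanced data: a = 2, b = 2, n = 2.
y = [[[3, 5], [6, 8]],
     [[5, 7], [12, 14]]]
print(two_way_estimates(y))
# (7.5, [-2.0, 2.0], [-2.5, 2.5], [[1.0, -1.0], [-1.0, 1.0]])
```

The estimated effects satisfy the constraints (8): the $\hat{\alpha}_i$, the $\hat{\beta}_j$, and every row and column of the $\widehat{(\alpha\beta)}_{ij}$ sum to zero.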
3.5 Hypothesis testing in the linear model
Hypothesis tests concerning parameters in linear models of the form $y = X\beta + \varepsilon$ can be expressed in terms of the general null hypothesis

$$H_0: C\beta = 0, \qquad (9)$$

where $C : m \times p$ is a coefficient matrix, $\beta : p \times 1$ is the vector of unknown parameters, and $m$ is the number of linearly independent estimable functions in $C\beta$. The general hypothesis is a convenient way of expressing the subsets of hypotheses that one may be interested in for a given linear model. Assuming that $y \sim N_n(X\beta, \sigma^2 I_n)$ and that $X$ has full column rank $p$, it can be shown that under the null hypothesis (9) the test statistic

$$F = \frac{(C\hat{\beta})'[C(X'X)^{-1}C']^{-1}C\hat{\beta}/m}{SSE/(n - p)} \overset{H_0}{\sim} F(m, n - p). \qquad (10)$$

In equation (10), $(C\hat{\beta})'[C(X'X)^{-1}C']^{-1}C\hat{\beta}$ is the sum of squares corresponding to the null hypothesis (9), and $SSE$ denotes the error sum of squares. The general hypothesis can be used for testing null hypotheses about parameters of interest in specific models, for instance ANOVA and MANOVA-models.
3.5.1 Hypothesis testing in the one-way ANOVA
The classic null hypothesis in the one-way ANOVA-model (2) can be stated as follows:

$$H_0: \alpha_1 = \alpha_2 = \ldots = \alpha_a = 0, \qquad (11)$$

i.e. under the null hypothesis there is no effect of factor A. As noted by, for instance, Casella & Berger (2002) and Sawyer (2009), the idea of ANOVA is to partition the total variation into components. It can be shown that the total variation can be partitioned as

$$SST = SSA + SSE,$$

where $SST$ is the total sum of squares, $SSA$ is the sum of squares of factor A and $SSE$ is the error sum of squares; under (11), $MSA/MSE$ follows an $F(a-1, a(n-1))$ distribution. The one-way ANOVA table can then be expressed as:
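Table 1: One-way ANOVA table

| Source | df | Sum of Squares | MS | F |
|---|---|---|---|---|
| A | $a-1$ | $\sum_{i=1}^{a}\sum_{k=1}^{n}(\bar{y}_{i.} - \bar{y}_{..})^2$ | $\frac{SSA}{a-1}$ | $\frac{MSA}{MSE}$ |
| Error | $a(n-1)$ | $\sum_{i=1}^{a}\sum_{k=1}^{n}(y_{ik} - \bar{y}_{i.})^2$ | $\frac{SSE}{a(n-1)}$ | |
| Total | $an-1$ | $\sum_{i=1}^{a}\sum_{k=1}^{n}(y_{ik} - \bar{y}_{..})^2$ | | |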
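The sums of squares presented in Table 1 can equivalently be written in matrix form:

Table 2: Matrix notation for the one-way ANOVA table

| Source | df | Sum of Squares | MS | F |
|---|---|---|---|---|
| A | $a-1$ | $y'(H - \frac{1}{N}J_N)y$ | $\frac{SSA}{a-1}$ | $\frac{MSA}{MSE}$ |
| Error | $a(n-1)$ | $y'(I_N - H)y$ | $\frac{SSE}{a(n-1)}$ | |
| Total | $an-1$ | $y'(I_N - \frac{1}{N}J_N)y$ | | |

Here, $I_N$ is the identity matrix, $J_N$ is a matrix of ones, and $H = X(X'X)^{-1}X'$ denotes the hat matrix, all matrices being of size $N \times N$, where $N = an$ is the total number of observations.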
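The sums of squares in Table 1 can be computed directly from the data. The Python sketch below (the function name `one_way_anova` and the data are hypothetical; a balanced design is assumed) returns SSA, SSE, SST, the degrees of freedom and the F statistic, which makes it easy to check that $SST = SSA + SSE$.

```python
def one_way_anova(groups):
    """Sums of squares, degrees of freedom and F for balanced one-way data (Table 1)."""
    a, n = len(groups), len(groups[0])
    means = [sum(g) / n for g in groups]                 # ybar_i.
    grand = sum(means) / a                               # ybar_..
    ssa = n * sum((m - grand) ** 2 for m in means)
    sse = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    sst = sum((y - grand) ** 2 for g in groups for y in g)
    df_a, df_e = a - 1, a * (n - 1)
    f = (ssa / df_a) / (sse / df_e)                      # MSA / MSE
    return {"SSA": ssa, "SSE": sse, "SST": sst, "df": (df_a, df_e), "F": f}


res = one_way_anova([[4, 6], [7, 9], [1, 3]])
print(res)
# {'SSA': 36.0, 'SSE': 6.0, 'SST': 42.0, 'df': (2, 3), 'F': 9.0}
```

Here $36 + 6 = 42$, and the observed $F = 9$ would be compared with the $F(2, 3)$ distribution.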
3.5.2 Hypothesis testing in the two-way ANOVA with interactions
In the two-way ANOVA model with interactions (3), the general hypothesis (9) under the constraints (8) can be written as the following three null hypotheses:

$$H_{0A}: \alpha_1 = \alpha_2 = \ldots = \alpha_a = 0,$$
$$H_{0B}: \beta_1 = \beta_2 = \ldots = \beta_b = 0, \qquad (12)$$
$$H_{0AB}: (\alpha\beta)_{11} = \ldots = (\alpha\beta)_{1b} = \ldots = (\alpha\beta)_{a1} = \ldots = (\alpha\beta)_{ab} = 0.$$

The first two null hypotheses test for the presence of main effects of factors A and B, respectively, and the third tests for the presence of an interaction effect between factors A and B. As with the test statistic for the general linear hypothesis (10), the test statistics under the three null hypotheses follow F-distributions. The total sum of squares of all observations in the two-way ANOVA model can be partitioned into independent sources of variation in the following way:

$$SST = SSA + SSB + SSAB + SSE. \qquad (13)$$

Since $SSA$, $SSB$ and $SSAB$ are independent when data is balanced, the three hypotheses concerning factor effects can be tested separately. A summary of the decomposition of sums of squares in the two-way ANOVA is given in Table 3.
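Table 3: Two-way ANOVA table

| Source | df | Sum of Squares | MS | F |
|---|---|---|---|---|
| A | $a-1$ | $\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n}(\bar{y}_{i..} - \bar{y}_{...})^2$ | $\frac{SSA}{a-1}$ | $\frac{MSA}{MSE}$ |
| B | $b-1$ | $\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n}(\bar{y}_{.j.} - \bar{y}_{...})^2$ | $\frac{SSB}{b-1}$ | $\frac{MSB}{MSE}$ |
| AB | $(a-1)(b-1)$ | $\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n}(\bar{y}_{ij.} - \bar{y}_{i..} - \bar{y}_{.j.} + \bar{y}_{...})^2$ | $\frac{SSAB}{(a-1)(b-1)}$ | $\frac{MSAB}{MSE}$ |
| Error | $ab(n-1)$ | $\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n}(y_{ijk} - \bar{y}_{ij.})^2$ | $\frac{SSE}{ab(n-1)}$ | |
| Total | $abn-1$ | $\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n}(y_{ijk} - \bar{y}_{...})^2$ | | |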
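A corresponding Python sketch for Table 3, again assuming balanced data stored as `y[i][j][k]` (the function name `two_way_anova` and the data are hypothetical), computes the four sums of squares in (13) and the three F statistics.

```python
def two_way_anova(y):
    """Sums of squares (13) and F statistics of Table 3 for balanced data y[i][j][k]."""
    a, b, n = len(y), len(y[0]), len(y[0][0])
    cell = [[sum(y[i][j]) / n for j in range(b)] for i in range(a)]   # ybar_ij.
    row = [sum(cell[i]) / b for i in range(a)]                         # ybar_i..
    col = [sum(cell[i][j] for i in range(a)) / a for j in range(b)]    # ybar_.j.
    grand = sum(row) / a                                               # ybar_...
    ssa = b * n * sum((r - grand) ** 2 for r in row)
    ssb = a * n * sum((c - grand) ** 2 for c in col)
    ssab = n * sum((cell[i][j] - row[i] - col[j] + grand) ** 2
                   for i in range(a) for j in range(b))
    sse = sum((v - cell[i][j]) ** 2 for i in range(a) for j in range(b) for v in y[i][j])
    sst = sum((v - grand) ** 2 for i in range(a) for j in range(b) for v in y[i][j])
    mse = sse / (a * b * (n - 1))
    F = {"A": ssa / (a - 1) / mse,
         "B": ssb / (b - 1) / mse,
         "AB": ssab / ((a - 1) * (b - 1)) / mse}
    return {"SSA": ssa, "SSB": ssb, "SSAB": ssab, "SSE": sse, "SST": sst, "F": F}


res = two_way_anova([[[3, 5], [6, 8]], [[5, 7], [12, 14]]])
print(res["F"])  # {'A': 16.0, 'B': 25.0, 'AB': 4.0}
```

For these data $SSA + SSB + SSAB + SSE = 32 + 50 + 8 + 8 = 98 = SST$, as required by (13).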
4 Multivariate analysis of variance models
The multivariate analysis of variance (MANOVA) is an extension of ANOVA in which the effects of factors are assessed on a linear combination of several response variables. A multivariate generalization of the ANOVA-model was first addressed by Wilks (1932). Nowadays, the MANOVA methodology is well established and widely used in many research areas, ranging from biology to psychology (Casella & Berger, 2002; Zhang & Xiao, 2012).
The MANOVA-model has many advantages over separate estimation of several ANOVA-models:
• MANOVA tests whether there are significant differences between combinations of factor levels on several response variables simultaneously. Thus, using MANOVA, one is able to test joint hypotheses for all univariate ANOVA models and is more likely to detect differences between factor levels. For instance, two factors may have no main or interaction effects on two response variables analyzed separately, but have effects when the variables are analyzed jointly.
• Fitting one MANOVA-model instead of several ANOVA-models controls the experimentwise Type I error probability. As a simple example, suppose that α = 5% for each F-test in 6 separate ANOVA-models. Then the experimentwise Type I error probability can be as large as 30% (6 × 5%), whereas an overall test for the included response variables in the MANOVA-model keeps the Type I error probability at 5% (Littell et al., 2002).
• Several ANOVA-models estimated separately do not take into account the covariance pattern among response variables. The MANOVA-model, on the other hand, is sensitive not only to mean differences between factor levels but also to the covariation between response variables. When response variables are studied together, they are likely to be correlated to at least some extent, and by conducting several separate ANOVA analyses this correlation would be lost (Littell et al., 2002).
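The experimentwise error rate for $m$ separate tests, each at level $\alpha$, is often described in two ways: the simple upper bound $m\alpha$, and the rate $1 - (1 - \alpha)^m$ that holds if the tests are independent. The short Python sketch below (the function name `experimentwise_error` is hypothetical) evaluates both for six tests at 5%.

```python
def experimentwise_error(alpha, m):
    """Return (rate under independent tests, Bonferroni upper bound) for m tests."""
    independent = 1 - (1 - alpha) ** m
    bound = min(1.0, m * alpha)
    return independent, bound


indep, bound = experimentwise_error(0.05, 6)
print(f"{indep:.3f} {bound:.2f}")  # 0.265 0.30
```

The 30% figure thus corresponds to the upper bound $6 \times 5\%$; under independence the rate is about 26.5%. Because correlated response variables make the separate tests dependent, the actual rate depends on the data, but it exceeds the nominal 5% in either case.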
As for the univariate ANOVA-model, the complexity of the MANOVA-model increases rapidly with the number of factors included in the model. The model specification is in many ways similar to its univariate analogues presented in Section 3. The assumptions for the MANOVA-model are the same as for the ANOVA-model, but extended to multivariate normality. Equality of the covariance matrices across factor level combinations is still assumed, so that

$$\Sigma_{11} = \Sigma_{12} = \ldots = \Sigma_{1b} = \ldots = \Sigma_{a1} = \ldots = \Sigma_{ab} = \Sigma,$$

where $\Sigma : p \times p$ is an unknown covariance matrix.
4.1 Heteroscedasticity of covariance matrices
For the standard ANOVA and MANOVA-models, it is assumed that the investigated samples are independent, follow a normal distribution, and have constant covariance matrices over factor level combinations. Balanced data does not in itself imply that covariances are equal, and for a given sample the covariances may in fact differ between factor combinations. It has been shown that estimation in balanced ANOVA and MANOVA-models is robust to minor deviations from the assumption of equal covariance matrices (Timm, 2002; Rencher, 2003). This is, however, not the case when data is unbalanced, which is discussed in Section 6.
4.1.1 Box’s M test
One multivariate test of equality of covariance matrices is Box's M test, named after Box (1949), who first developed it. In a similar manner to Levene's test for the univariate model, Box's M tests the equality of covariance matrices across factor levels in the MANOVA-model.
Thus, with Box's M, one is interested in testing the null hypothesis:

$$H_0: \Sigma_1 = \Sigma_2 = \cdots = \Sigma_i = \cdots = \Sigma_k = \Sigma, \qquad (14)$$

where $\Sigma_i$ : p × p is the covariance matrix of the ith combination of factors, i = 1, 2, ..., k, in the MANOVA-model with p response variables. Let $n_i$ be the number of replicates on the ith factor combination, $n = \sum_{i=1}^{k} n_i$, $v_i = n_i - 1$, and let $S_i$ be an unbiased estimator of $\Sigma_i$. Under the null hypothesis (14), the pooled estimator of the common covariance matrix is:

$$S = \frac{\sum_{i=1}^{k} v_i S_i}{n - k}.$$

A generalized likelihood ratio test statistic can then be calculated as:

$$M = (n - k)\log|S| - \sum_{i=1}^{k} v_i \log|S_i|.$$

With suitable scale factors, Box's M can be approximated by either a χ² or an F distribution. For both approximations, the null hypothesis of homoscedasticity is rejected for large values of the scaled test statistic (Box, 1949). As Timm (2002) notes, the χ² approximation is preferred when $n_i > 20$, p < 6 and k < 6. Otherwise, an F approximation is recommended.
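The statistic M and its χ² scaling can be sketched in a few lines of NumPy. This is a minimal illustration, not the SAS implementation used later in the thesis; it assumes the usual χ² scale factor u = [Σ 1/v_i − 1/Σ v_i](2p² + 3p − 1)/(6(p + 1)(k − 1)) with (1 − u)M approximately χ² on p(p + 1)(k − 1)/2 degrees of freedom. The function name `box_m` is hypothetical.

```python
import numpy as np


def box_m(groups):
    """Box's M test of H0: Sigma_1 = ... = Sigma_k (chi-square version).

    groups: list of k arrays, each of shape (n_i, p), one per factor combination.
    Returns (M, scaled chi-square statistic, degrees of freedom).
    """
    k = len(groups)
    p = groups[0].shape[1]
    v = np.array([g.shape[0] - 1 for g in groups], dtype=float)  # v_i = n_i - 1
    # unbiased S_i; atleast_2d keeps the p = 1 case as a 1 x 1 matrix
    covs = [np.atleast_2d(np.cov(g, rowvar=False, ddof=1)) for g in groups]

    def logdet(A):
        # numerically stable log|A| for a positive definite matrix
        return np.linalg.slogdet(A)[1]

    S = sum(vi * Si for vi, Si in zip(v, covs)) / v.sum()  # pooled S; v.sum() = n - k
    M = v.sum() * logdet(S) - sum(vi * logdet(Si) for vi, Si in zip(v, covs))

    # chi-square scale factor (Box, 1949)
    u = (np.sum(1.0 / v) - 1.0 / v.sum()) * (2 * p**2 + 3 * p - 1) / (6 * (p + 1) * (k - 1))
    df = p * (p + 1) * (k - 1) // 2
    return M, (1.0 - u) * M, df


rng = np.random.default_rng(0)
groups = [rng.normal(size=(15, 3)) for _ in range(8)]  # 8 cells, p = 3, n_i = 15
M, stat, df = box_m(groups)
print(df)  # 42
```

With p = 3 responses and k = 8 factor combinations the degrees of freedom are 3·4·7/2 = 42. A p-value can then be obtained from the χ² survival function, for example with `scipy.stats.chi2.sf(stat, df)`.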
4.2 One-way MANOVA
In the one-way MANOVA-model, a single factor explains the variation in a set of response variables. The factor effects one-way MANOVA-model is the following:

$$y_{ik} = \mu + \alpha_i + \varepsilon_{ik}, \qquad (15)$$

where $y_{ik}$ : p × 1 is the vector of p response variables for the kth replicate on the ith level of factor A, i = 1, 2, ..., a, and $\alpha_i$ : p × 1 is the vector of effects for level i of factor A. More specifically, $\alpha_i = \mu_i - \mu$, so the vector of effects can be interpreted as the deviation of the level means from the vector of overall means. Further, it is assumed that the errors are independently normally distributed with zero mean and constant covariance matrix, $\varepsilon_{ik} \sim N_p(0, \Sigma)$. Thus,

$$E(y_{ik}) = \mu + \alpha_i \quad\text{and}\quad V(y_{ik}) = V(\varepsilon_{ik}) = \Sigma.$$

Using matrix notation, one could express the one-way MANOVA-model (15) as:

$$y = \mathbf{1}_{ak} \otimes \mu + I_a \otimes \mathbf{1}_k \otimes \alpha + I_{ak} \otimes \varepsilon,$$

where ⊗ denotes the Kronecker product of two matrices (see Appendix A).
4.3 Two-way MANOVA with interactions
Similarly to the univariate model (3), the two-way MANOVA model with interactions is expressed as:

$$y_{ijk} = \mu + \alpha_i + \beta_j + \alpha\beta_{ij} + \varepsilon_{ijk}, \qquad (16)$$

where $y_{ijk}$ : p × 1 is a vector of p response variables for the kth replicate on the ith level of factor A and the jth level of factor B, i = 1, 2, ..., a, j = 1, 2, ..., b, k = 1, 2, ..., n. In the two-way MANOVA-model, the vectors $\alpha_i$ and $\beta_j$ represent main effects and $\alpha\beta_{ij}$ represents interaction effects. Also, it is assumed that $\varepsilon_{ijk} \overset{iid}{\sim} N_p(0, \Sigma)$, so that:

$$E(y_{ijk}) = \mu + \alpha_i + \beta_j + \alpha\beta_{ij} \quad\text{and}\quad V(y_{ijk}) = V(\varepsilon_{ijk}) = \Sigma.$$
The matrix notation for model (16) is the following:
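$$y = \mathbf{1}_{abk} \otimes \mu + I_a \otimes \mathbf{1}_{bk} \otimes \alpha + \mathbf{1}_a \otimes I_b \otimes \mathbf{1}_k \otimes \beta + I_{ab} \otimes \mathbf{1}_k \otimes \alpha\beta + I_{abk} \otimes \varepsilon.$$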
4.4 Estimation in the two-way MANOVA with interactions
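As for the univariate models described in Section 3, the effects in model (16) are not estimable due to over-parametrization of the model. As a result, constraints must be imposed on the parameters $\alpha_i$, $\beta_j$, $\alpha\beta_{ij}$. Following the notation in, e.g., Zhang & Xiao (2012), the constraints can be expressed as:

$$\sum_{i=1}^{a}\alpha_i = 0, \qquad \sum_{j=1}^{b}\beta_j = 0, \qquad \sum_{i=1}^{a}\alpha\beta_{ij} = 0 \ \text{for each } j, \qquad \sum_{j=1}^{b}\alpha\beta_{ij} = 0 \ \text{for each } i. \qquad (17)$$

Given the above constraints, the estimators in the two-way MANOVA model with interactions are obtained as the solutions to the system of normal equations given in Section 3.2:

$$\hat\mu = \bar y_{\cdot\cdot\cdot}, \qquad \hat\alpha_i = \bar y_{i\cdot\cdot} - \bar y_{\cdot\cdot\cdot}, \qquad \hat\beta_j = \bar y_{\cdot j\cdot} - \bar y_{\cdot\cdot\cdot}, \qquad \widehat{\alpha\beta}_{ij} = \bar y_{ij\cdot} - \bar y_{i\cdot\cdot} - \bar y_{\cdot j\cdot} + \bar y_{\cdot\cdot\cdot},$$

where the dot notation indicates averaging over the dotted indices. For example, $\bar y_{\cdot\cdot\cdot}$ is the average of $y_{ijk}$ over all possible (i, j, k):

$$\bar y_{\cdot\cdot\cdot} = \frac{1}{abn}\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n} y_{ijk}.$$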
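The estimators above can be sketched with NumPy by storing balanced data as an array of shape (a, b, n, p). This is an illustrative sketch under that storage assumption (the layout and function name are hypothetical); it also lets one check numerically that the estimates satisfy the constraints (17).

```python
import numpy as np


def twoway_effects(y):
    """Effect estimates under the sum-to-zero constraints (17), balanced data.

    y: array of shape (a, b, n, p) with y[i, j, k] holding the vector y_ijk.
    """
    cell = y.mean(axis=2)              # ybar_ij.  shape (a, b, p)
    grand = cell.mean(axis=(0, 1))     # ybar_...  shape (p,)
    row = cell.mean(axis=1)            # ybar_i..  shape (a, p)
    col = cell.mean(axis=0)            # ybar_.j.  shape (b, p)
    alpha = row - grand
    beta = col - grand
    alphabeta = cell - row[:, None, :] - col[None, :, :] + grand
    return grand, alpha, beta, alphabeta


rng = np.random.default_rng(0)
y = rng.normal(size=(2, 4, 5, 3))      # a = 2, b = 4, n = 5, p = 3
mu_hat, alpha_hat, beta_hat, ab_hat = twoway_effects(y)
print(np.allclose(alpha_hat.sum(axis=0), 0))  # True: constraint on alpha holds
```

Because every cell has the same n, the mean of the cell means equals the overall sample mean, which is what makes this simple averaging valid in the balanced case.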
4.5 Hypothesis testing in the MANOVA model
For the MANOVA-model, testing hypotheses based on the partitioning of sums of squares becomes more complex because of the interrelationships between the p included ANOVA-models. Unlike in the univariate models, one must now consider not only sums of squares but also cross products for the factors in the MANOVA-model. In the resulting matrices, called sums of squares and cross products (SSCP) matrices, the diagonal elements correspond to the usual sums of squares for each of the p response variables, whereas the off-diagonal elements correspond to the cross products for each pair of response variables.
When data is balanced, the partitioning of SSCP matrices is independent, in analogy with the ANOVA-models described in Section 3.5. For example, in the one-way MANOVA-model:

$$T = H + E,$$

where T : p × p is the total SSCP matrix, H : p × p is the hypothesis SSCP matrix and E : p × p is the error SSCP matrix.
4.5.1 Hypothesis testing in the two-way MANOVA with interactions
In this master thesis, the main focus will be on the two-way MANOVA-model with
interactions. As for the univariate model, it is possible to set up hypotheses about
vectors of parameters in the MANOVA-model using the general hypothesis in equation
(9). Partitioning the associated SSCP matrices makes it possible to conduct multivari-
ate tests of both main effects of A and B as well as the interaction effects between the
two factors A and B. Using the notation proposed by Zhang & Xiao (2012), the general
hypothesis under the two-way MANOVA with interaction (16) can be written as the
following set of hypotheses:
$$H_{0A}: \alpha_1 = \alpha_2 = \cdots = \alpha_a = 0,$$
$$H_{0B}: \beta_1 = \beta_2 = \cdots = \beta_b = 0, \qquad (18)$$
$$H_{0AB}: \alpha\beta_{11} = \cdots = \alpha\beta_{1b} = \cdots = \alpha\beta_{a1} = \cdots = \alpha\beta_{ab} = 0,$$

where $H_{0A}$ states that there are no main effects of factor A, $H_{0B}$ that there are no main effects of factor B, and $H_{0AB}$ that there are no interaction effects between A and B. The independent partitioning of SSCP matrices associated with the hypotheses in (18) can be written as:

$$T = H_A + H_B + H_{AB} + E.$$
In order to test hypotheses about several response variables simultaneously in MANOVA-
models, the standard F-tests for main and interaction effects in the ANOVA-models
have to be generalized. The multivariate tests concerning the effects in the linear
model are in many ways similar to univariate F-tests except that the sums of squares
for effects are replaced, due to the covariance between responses, by SSCP matrices
(Littell et al., 2002).
The SSCP matrices HA, HB, HAB, T and E are expressed explicitly in Table 4:
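Table 4: Multivariate analysis of variance table
Source   df               Sums of squares and cross products matrix
A        a − 1            $H_A = nb\sum_{i=1}^{a}(\bar y_{i\cdot\cdot} - \bar y_{\cdot\cdot\cdot})(\bar y_{i\cdot\cdot} - \bar y_{\cdot\cdot\cdot})'$
B        b − 1            $H_B = na\sum_{j=1}^{b}(\bar y_{\cdot j\cdot} - \bar y_{\cdot\cdot\cdot})(\bar y_{\cdot j\cdot} - \bar y_{\cdot\cdot\cdot})'$
AB       (a − 1)(b − 1)   $H_{AB} = n\sum_{i=1}^{a}\sum_{j=1}^{b}(\bar y_{ij\cdot} - \bar y_{i\cdot\cdot} - \bar y_{\cdot j\cdot} + \bar y_{\cdot\cdot\cdot})(\bar y_{ij\cdot} - \bar y_{i\cdot\cdot} - \bar y_{\cdot j\cdot} + \bar y_{\cdot\cdot\cdot})'$
Error    ab(n − 1)        $E = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n}(y_{ijk} - \bar y_{ij\cdot})(y_{ijk} - \bar y_{ij\cdot})'$
Total    abn − 1          $T = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n}(y_{ijk} - \bar y_{\cdot\cdot\cdot})(y_{ijk} - \bar y_{\cdot\cdot\cdot})'$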
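The rows of Table 4 translate directly into matrix products. The sketch below, which assumes balanced data stored as an array of shape (a, b, n, p) (a hypothetical layout chosen for illustration), computes each SSCP matrix and makes it possible to verify the independent partitioning T = H_A + H_B + H_AB + E numerically.

```python
import numpy as np


def twoway_sscp(y):
    """SSCP matrices of Table 4 for balanced data y of shape (a, b, n, p)."""
    a, b, n, p = y.shape
    cell = y.mean(axis=2)                  # ybar_ij.
    row = cell.mean(axis=1)                # ybar_i..
    col = cell.mean(axis=0)                # ybar_.j.
    grand = cell.mean(axis=(0, 1))         # ybar_...

    dA = row - grand                                        # (a, p)
    dB = col - grand                                        # (b, p)
    dAB = (cell - row[:, None] - col[None] + grand).reshape(-1, p)
    within = (y - cell[:, :, None, :]).reshape(-1, p)       # y_ijk - ybar_ij.
    total = (y - grand).reshape(-1, p)                      # y_ijk - ybar_...

    HA = n * b * (dA.T @ dA)   # dA.T @ dA equals sum_i d_i d_i'
    HB = n * a * (dB.T @ dB)
    HAB = n * (dAB.T @ dAB)
    E = within.T @ within
    T = total.T @ total
    return HA, HB, HAB, E, T


rng = np.random.default_rng(0)
HA, HB, HAB, E, T = twoway_sscp(rng.normal(size=(2, 4, 5, 3)))
print(np.allclose(T, HA + HB + HAB + E))  # True for balanced data
```

The equality holds exactly (up to rounding) only because all cells have the same n; with unbalanced cells the cross terms no longer cancel.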
The multivariate tests in MANOVA are based on the relation between the hypothesis
SSCP matrix H and the error SSCP matrix E. A basis for these tests is the matrix
E−1H , showing that H corresponds to the numerator of the test and E to the denom-
inator of the test. Three commonly used multivariate tests, all being functions of the
matrix E−1H , are Wilks’ Λ, Hotelling-Lawley Trace and Pillai’s Trace.
It should be noted that in Sections 4.5.2–4.5.4, H denotes the hypothesis SSCP matrix for the effect being tested in the MANOVA-model. For instance, H = H_AB when testing for interaction effects.
4.5.2 Wilks’ Λ
Under the null hypothesis of no effects of a factor γ:

$$H_0: \gamma = 0, \qquad (19)$$

the likelihood ratio test statistic

$$\Lambda = \frac{|E|}{|E + H|}$$

is generally known as Wilks' Λ after Wilks (1932). The null hypothesis is rejected for small values of Λ, that is, when E is small compared to the total SSCP matrix E + H.
4.5.3 Hotelling-Lawley Trace
The test statistic

$$U = \operatorname{tr}(E^{-1}H)$$

is often referred to as the Hotelling-Lawley Trace after Lawley (1938) and Hotelling (1947), who took part in developing the statistic. A large H relative to E yields a large trace and indicates evidence against the null hypothesis. Hence, the null hypothesis (19) of no effects is rejected for large values of U.
4.5.4 Pillai’s Trace
Pillai (1955) developed the following statistic:

$$V = \operatorname{tr}\big((E + H)^{-1}H\big),$$

which is commonly known as Pillai's Trace. As with the Hotelling-Lawley Trace, the null hypothesis (19) is rejected for large values of V, indicating a large H relative to E.
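All three statistics are functions of the eigenvalues λ_s of E⁻¹H: Λ = Π 1/(1 + λ_s), U = Σ λ_s and V = Σ λ_s/(1 + λ_s). The sketch below computes the statistics directly from their definitions and, for comparison, from the eigenvalues. The function names are hypothetical; `np.linalg.solve(E, H)` is used to form E⁻¹H without an explicit inverse.

```python
import numpy as np


def manova_statistics(H, E):
    """Wilks' Lambda, Hotelling-Lawley Trace U and Pillai's Trace V."""
    wilks = np.linalg.det(E) / np.linalg.det(E + H)
    hlt = np.trace(np.linalg.solve(E, H))          # tr(E^{-1} H)
    pillai = np.trace(np.linalg.solve(E + H, H))   # tr((E + H)^{-1} H)
    return wilks, hlt, pillai


def via_eigenvalues(H, E):
    """Same three statistics from the eigenvalues of E^{-1} H."""
    lam = np.linalg.eigvals(np.linalg.solve(E, H)).real  # real and >= 0 here
    return np.prod(1.0 / (1.0 + lam)), lam.sum(), np.sum(lam / (1.0 + lam))


rng = np.random.default_rng(0)
X, Z = rng.normal(size=(30, 3)), rng.normal(size=(4, 3))
E, H = X.T @ X, Z.T @ Z   # positive definite "error" and "hypothesis" matrices
print(np.allclose(manova_statistics(H, E), via_eigenvalues(H, E)))  # True
```

The eigenvalue form makes the rejection rules visible: larger λ_s (a larger H relative to E) decrease Λ and increase both U and V.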
4.5.5 Characteristics of the multivariate tests
Wilks' Lambda, the Hotelling-Lawley Trace and Pillai's Trace are all exact tests, meaning that the probability of rejecting H0 in (19) when H0 is true exactly equals α (Rencher, 2003). However, these tests have different probabilities of rejection when H0 is false, that is, they have different power for a given sample. In general, none of these multivariate tests is uniformly better than the other two, although there might be situations where one test is preferred (Littell et al., 2002; Harrar & Bathke, 2008). All three tests are also robust when data is balanced (Timm, 2002; Rencher, 2003).
Wilks' Lambda, the Hotelling-Lawley Trace and Pillai's Trace are usually approximated by the F-distribution (see, e.g., Rencher (2003) for details on these approximations).
5 Unbalanced data
So far in this master thesis, the analysis has only considered the case when the data is balanced, i.e. when there are equally many observations for each factor level combination. However, observed data is not always balanced, due to either a designed imbalance or missing observations (Searle, 1987; Shaw & Mitchell-Olds, 1993). In these cases, data is said to be unbalanced. The difference between balanced and unbalanced data might seem trivial, but the analyses of balanced and unbalanced data have few similarities. As Searle (1987) points out, one should rather consider unbalanced data as a separate setting than as a special case of balanced data.
5.1 Unbalanced two-way ANOVA with interactions
Since the first papers on the analysis of unbalanced data were published in the mid-1930s, there has been a long and fruitful debate on how unbalanced linear models should be expressed and how inference should best be made. Following the notation introduced for balanced ANOVA models in Section 3, the two-way unbalanced ANOVA-model with interactions can be expressed as a factor effects model:

$$y_{ijk} = \mu + \alpha_i + \beta_j + \alpha\beta_{ij} + \varepsilon_{ijk}, \qquad (20)$$

where the model assumptions and notation are equal to those for model (3), except that k = 1, 2, ..., n_ij, where n_ij is the number of replicates at factor level combination (i, j), i = 1, 2, ..., a, j = 1, 2, ..., b.
5.1.1 Estimation
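Similar to balanced models, one must impose the constraints defined in equation (8) on the over-parameterized model (20) in order to obtain solutions to the system of normal equations and unique values of the estimators $\hat\mu$, $\hat\alpha_i$, $\hat\beta_j$, $\widehat{\alpha\beta}_{ij}$. These estimators in model (20) can be written as follows:

$$\hat\mu = \bar y_{\cdot\cdot\cdot}, \qquad \hat\alpha_i = \bar y_{i\cdot\cdot} - \bar y_{\cdot\cdot\cdot}, \quad i = 1, \dots, a,$$
$$\hat\beta_j = \bar y_{\cdot j\cdot} - \bar y_{\cdot\cdot\cdot}, \qquad \widehat{\alpha\beta}_{ij} = \bar y_{ij\cdot} - \bar y_{i\cdot\cdot} - \bar y_{\cdot j\cdot} + \bar y_{\cdot\cdot\cdot}, \quad j = 1, \dots, b, \qquad (21)$$

where

$$\bar y_{\cdot\cdot\cdot} = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n_{ij}} (ab\,n_{ij})^{-1} y_{ijk}, \qquad \bar y_{i\cdot\cdot} = \sum_{j=1}^{b}\sum_{k=1}^{n_{ij}} (b\,n_{ij})^{-1} y_{ijk},$$
$$\bar y_{\cdot j\cdot} = \sum_{i=1}^{a}\sum_{k=1}^{n_{ij}} (a\,n_{ij})^{-1} y_{ijk}, \qquad \bar y_{ij\cdot} = \sum_{k=1}^{n_{ij}} n_{ij}^{-1} y_{ijk}. \qquad (22)$$

As can be seen in expressions (21) and (22), each observation is weighted by the inverse of its cell size n_ij, so that $\bar y_{\cdot\cdot\cdot}$, $\bar y_{i\cdot\cdot}$ and $\bar y_{\cdot j\cdot}$ are unweighted averages of the cell means $\bar y_{ij\cdot}$. The resulting estimates therefore differ from those of the balanced model. For instance, $\bar y_{\cdot\cdot\cdot}$ no longer equals the overall mean of the sample, as it does when data is balanced.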
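A small hand-checkable example shows why the estimator of µ in (22) differs from the overall sample mean under imbalance. The sketch stores each cell as a separate array in a nested list (a hypothetical layout) and computes the means in (22) as averages of cell means; the toy data are invented purely for illustration.

```python
import numpy as np


def unbalanced_means(cells):
    """Means of (22) for unbalanced data.

    cells: nested list, cells[i][j] is an (n_ij, p) array of replicates in cell (i, j).
    Returns (ybar_..., ybar_i.., ybar_.j., ybar_ij.).
    """
    cell = np.array([[c.mean(axis=0) for c in row] for row in cells])  # (a, b, p)
    grand = cell.mean(axis=(0, 1))   # each observation weighted by 1/(ab n_ij)
    row_means = cell.mean(axis=1)    # ybar_i..
    col_means = cell.mean(axis=0)    # ybar_.j.
    return grand, row_means, col_means, cell


# a = b = 2, p = 1; cell (1, 1) has three replicates, the others one each
cells = [
    [np.array([[0.0], [0.0], [0.0]]), np.array([[4.0]])],
    [np.array([[2.0]]), np.array([[2.0]])],
]
grand, _, _, _ = unbalanced_means(cells)
pooled = np.concatenate([c for row in cells for c in row]).mean(axis=0)
print(grand, pooled)  # [2.] [1.33333333]
```

The unweighted estimate (0 + 4 + 2 + 2)/4 = 2 treats every cell equally, whereas the pooled mean 8/6 is pulled towards the over-represented cell.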
5.1.2 Hypothesis testing
Testing the hypotheses of main and interaction effects (12) in the ANOVA-model is not as straightforward when data is unbalanced as when it is balanced. Searle (1987) notes that the usual partitioning of sums of squares in (13) is not possible when data is unbalanced, because the main effect and interaction sums of squares are no longer independent. As a consequence, the test statistics under the null hypotheses of no main and interaction effects (12) will not be exactly F-distributed (Shaw & Mitchell-Olds, 1993). Modified methods for computing effect sums of squares are therefore required when data is unbalanced (Langsrud, 2003).
To adjust for the fact that F-tests for effects in the model are not exact when data is unbalanced, three methods to partition sums of squares for factors in the ANOVA-model have been implemented: Type I, Type II and Type III. For details on Type I, Type II and Type III sums of squares, see e.g. Littell et al. (2002) or Langsrud (2003).
In this master thesis, Type III sums of squares will be used for calculations. The Type III sum of squares for an effect is calculated so that it is adjusted for all other effects in the ANOVA-model, regardless of the order in which they are included. For instance, in the two-way ANOVA with interaction, the Type III sum of squares for factor A is calculated conditionally on factor B and the interaction between factors A and B already being included in the model (Littell et al., 2002). Expressed symbolically, the partitioning of Type III sums of squares in the unbalanced two-way ANOVA is shown in Table 5.
Table 5: Partitioning of Type III sums of squares in the two-way ANOVA table
Source df Type III Sum of Squares
A a− 1 SS(α|µ, β, αβ)
B b− 1 SS(β|µ, α, αβ)
AB (a− 1)(b− 1) SS(αβ|µ, α, β)
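One common way to obtain the Type III sums of squares in Table 5 is to fit the full model with sum-to-zero (effect) coding of both factors and record how much the residual sum of squares increases when the columns of a single effect are dropped. The univariate sketch below assumes that coding; it is an illustration, not the SAS procedure used in the thesis, and all function names are hypothetical.

```python
import numpy as np


def effect_code(levels, k):
    """Sum-to-zero coding of a factor with levels 0..k-1: k-1 columns,
    the last level coded -1 in every column."""
    X = np.zeros((len(levels), k - 1))
    for r, lev in enumerate(levels):
        if lev == k - 1:
            X[r, :] = -1.0
        else:
            X[r, lev] = 1.0
    return X


def rss(X, y):
    """Residual sum of squares of the least-squares fit of y on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)


def type3_ss(A, B, y, a, b):
    """Type III SS: increase in RSS when one effect is dropped from the full model."""
    XA, XB = effect_code(A, a), effect_code(B, b)
    XAB = np.einsum("ni,nj->nij", XA, XB).reshape(len(y), -1)  # interaction columns
    one = np.ones((len(y), 1))
    rss_full = rss(np.hstack([one, XA, XB, XAB]), y)
    return {
        "A": rss(np.hstack([one, XB, XAB]), y) - rss_full,   # SS(alpha | mu, beta, alphabeta)
        "B": rss(np.hstack([one, XA, XAB]), y) - rss_full,   # SS(beta | mu, alpha, alphabeta)
        "AB": rss(np.hstack([one, XA, XB]), y) - rss_full,   # SS(alphabeta | mu, alpha, beta)
    }
```

In a balanced design these values coincide with the classical sums of squares; with unequal n_ij they test hypotheses about unweighted marginal means. The multivariate analogue replaces y by an n × p response matrix and each residual sum of squares by a residual SSCP matrix.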
6 Unbalanced two-way MANOVA with interactions
The methodology in this section is mainly based on the article by Zhang & Xiao (2012).
Hence, a similar notation and structure will be used throughout this Section. The unbal-
anced and heteroscedastic two-way MANOVA-model with interactions could formally
be expressed as follows:
$$y_{ijk} = \mu + \alpha_i + \beta_j + \alpha\beta_{ij} + \varepsilon_{ijk}, \qquad \varepsilon_{ijk} \sim N_p(0, \Sigma_{ij}), \qquad (23)$$
where Σij : p×p is the covariance matrix of the (i, j)th combination of levels for factors
A and B, k = 1, 2, . . . , nij, where nij is the number of replicates at each factor level
combination (i, j), i = 1, 2, . . . , a, j = 1, 2, . . . , b, and other notation is the same as for
the balanced two-way MANOVA-model with interactions in Section 4.3.
As mentioned in Section 4, a balanced design results in robust estimation of the MANOVA-model even with minor deviations from the assumption of covariance homoscedasticity. However, when covariance heteroscedasticity is severe, and in particular when it is combined with unbalanced data, the standard multivariate tests become biased (Zhang & Xiao, 2012). In this case, it is necessary to use modifications of the standard multivariate tests that protect against the bias. Zhang & Xiao (2012) propose ways to modify Wilks' Λ, the Hotelling-Lawley Trace and Pillai's Trace in order to obtain reliable tests.
6.1 Estimation
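In analogy with unbalanced ANOVA-models, estimation in unbalanced MANOVA-models is affected by the fact that the weights for different factor combinations are no longer equal. Under the constraints in equation (17), the vectors of effect estimators can be uniquely derived as:

$$\hat\mu = \bar y_{\cdot\cdot\cdot}, \qquad \hat\alpha_i = \bar y_{i\cdot\cdot} - \bar y_{\cdot\cdot\cdot}, \quad i = 1, \dots, a,$$
$$\hat\beta_j = \bar y_{\cdot j\cdot} - \bar y_{\cdot\cdot\cdot}, \qquad \widehat{\alpha\beta}_{ij} = \bar y_{ij\cdot} - \bar y_{i\cdot\cdot} - \bar y_{\cdot j\cdot} + \bar y_{\cdot\cdot\cdot}, \quad j = 1, \dots, b,$$

where the dot notation is the same as in Zhang & Xiao (2012):

$$\bar y_{\cdot\cdot\cdot} = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n_{ij}} (ab\,n_{ij})^{-1} y_{ijk}, \qquad \bar y_{i\cdot\cdot} = \sum_{j=1}^{b}\sum_{k=1}^{n_{ij}} (b\,n_{ij})^{-1} y_{ijk},$$
$$\bar y_{\cdot j\cdot} = \sum_{i=1}^{a}\sum_{k=1}^{n_{ij}} (a\,n_{ij})^{-1} y_{ijk}, \qquad \bar y_{ij\cdot} = \sum_{k=1}^{n_{ij}} n_{ij}^{-1} y_{ijk}.$$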
6.2 Hypothesis testing using modified test statistics
Zhang & Xiao (2012) propose three types of modifications to the standard test statistics which adjust for unbalanced data and heteroscedastic covariance matrices. Under the null hypotheses of no main or interaction effects stated in equation (18), one may define SSCP matrices for the unbalanced two-way MANOVA-model with interactions in the following way:

$$H_A = \frac{1}{a-1}\sum_{i=1}^{a}\sum_{j=1}^{b}(\bar y_{i\cdot\cdot} - \bar y_{\cdot\cdot\cdot})(\bar y_{i\cdot\cdot} - \bar y_{\cdot\cdot\cdot})',$$
$$H_B = \frac{1}{b-1}\sum_{i=1}^{a}\sum_{j=1}^{b}(\bar y_{\cdot j\cdot} - \bar y_{\cdot\cdot\cdot})(\bar y_{\cdot j\cdot} - \bar y_{\cdot\cdot\cdot})',$$
$$H_{AB} = \frac{1}{(a-1)(b-1)}\sum_{i=1}^{a}\sum_{j=1}^{b}(\bar y_{ij\cdot} - \bar y_{i\cdot\cdot} - \bar y_{\cdot j\cdot} + \bar y_{\cdot\cdot\cdot})(\bar y_{ij\cdot} - \bar y_{i\cdot\cdot} - \bar y_{\cdot j\cdot} + \bar y_{\cdot\cdot\cdot})',$$
where HA,HB and HAB are the SSCP matrices associated with the hypothesis of no
main effects for factor A, factor B and the interaction effect between factors A and B,
respectively. It should be noted that throughout the rest of this section, H = HA,HB
or HAB depending on the hypothesis tested.
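Given the cell means $\bar y_{ij\cdot}$ from (22), the matrices H_A, H_B and H_AB defined above can be sketched as follows. The input layout (an (a, b, p) array of cell means) and the function name are assumptions made for illustration; because the summands of H_A do not depend on j, the inner sum simply contributes a factor b (and a factor a for H_B).

```python
import numpy as np


def modified_H(cell_means):
    """H_A, H_B, H_AB for the unbalanced two-way MANOVA.

    cell_means: array (a, b, p) of cell means ybar_ij. (unweighted, as in (22)).
    """
    a, b, p = cell_means.shape
    grand = cell_means.mean(axis=(0, 1))   # ybar_...
    row = cell_means.mean(axis=1)          # ybar_i..
    col = cell_means.mean(axis=0)          # ybar_.j.
    dA = row - grand
    dB = col - grand
    dAB = (cell_means - row[:, None] - col[None] + grand).reshape(-1, p)
    HA = b * (dA.T @ dA) / (a - 1)         # inner sum over j gives a factor b
    HB = a * (dB.T @ dB) / (b - 1)         # inner sum over i gives a factor a
    HAB = (dAB.T @ dAB) / ((a - 1) * (b - 1))
    return HA, HB, HAB


rng = np.random.default_rng(0)
additive = rng.normal(size=(2, 1, 3)) + rng.normal(size=(1, 4, 3))  # no interaction
HA, HB, HAB = modified_H(additive)
print(np.allclose(HAB, 0))  # True: purely additive cell means give H_AB = 0
```

Unlike Table 4, these matrices are built from unweighted cell means and carry no n_ij weights; the cell sizes enter the analysis through G and the degrees of freedom instead.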
When deriving the modified test statistics, Zhang & Xiao (2012) depart from the relationship between H and a natural unbiased estimator of the covariance matrix Σ:

$$G = (ab)^{-1}\sum_{i=1}^{a}\sum_{j=1}^{b} n_{ij}^{-1}\hat\Sigma_{ij},$$

where

$$\hat\Sigma_{ij} = (n_{ij} - 1)^{-1}\sum_{k=1}^{n_{ij}}(y_{ijk} - \bar y_{ij\cdot})(y_{ijk} - \bar y_{ij\cdot})'$$

is the unbiased estimator of $\Sigma_{ij}$. It can be shown that under model (23) and the null hypothesis, H and G are approximately Wishart-distributed:

$$H \sim W_p(f_H, \Sigma/f_H), \qquad G \sim W_p(f_G, \Sigma/f_G),$$

where f_H and f_G are unknown approximate degrees of freedom of the distributions of H and G, respectively. Then, by defining $W_1 = f_H H$ and $W_2 = f_G G$, modifications of Wilks' Λ (WL), the Hotelling-Lawley Trace (HLT) and Pillai's Trace (PT) can be derived as:
$$T_{WL} = -\log\left(\frac{|W_2|}{|W_1 + W_2|}\right), \qquad T_{HLT} = \operatorname{tr}\left(W_1 W_2^{-1}\right), \qquad T_{PT} = \operatorname{tr}\left(W_1 (W_1 + W_2)^{-1}\right), \qquad (24)$$

which relies on W_1 and W_2 being independent (Harrar & Bathke, 2008; Zhang & Xiao, 2012).
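Assuming f_H and f_G have already been estimated, the modified statistics follow the same pattern as the standard ones, with W_1 = f_H H playing the role of the hypothesis matrix and W_2 = f_G G the role of the error matrix. In line with the standard Wilks' Λ = |E|/|E + H|, the Wilks-type statistic below uses −log(|W_2|/|W_1 + W_2|). This is an illustrative sketch with a hypothetical function name, not code from Zhang & Xiao (2012); the traces use the identity tr(AB) = tr(BA) so that `np.linalg.solve` can replace explicit inverses.

```python
import numpy as np


def modified_statistics(H, G, fH, fG):
    """Modified WL, HLT and PT statistics built from W1 = fH*H and W2 = fG*G."""
    W1 = fH * H
    W2 = fG * G

    def logdet(A):
        return np.linalg.slogdet(A)[1]

    t_wl = -(logdet(W2) - logdet(W1 + W2))          # -log(|W2| / |W1 + W2|)
    t_hlt = np.trace(np.linalg.solve(W2, W1))        # tr(W2^{-1} W1) = tr(W1 W2^{-1})
    t_pt = np.trace(np.linalg.solve(W1 + W2, W1))    # = tr(W1 (W1 + W2)^{-1})
    return t_wl, t_hlt, t_pt


rng = np.random.default_rng(0)
X, Z = rng.normal(size=(20, 3)), rng.normal(size=(5, 3))
G, H = X.T @ X / 20, Z.T @ Z / 5
print(modified_statistics(H, G, fH=3.0, fG=50.0))
```

Note that if f_H = f_G the scaling cancels and the three statistics reduce to −log Λ, U and V computed with E = G; the degrees of freedom only matter when they differ, which is exactly the unbalanced, heteroscedastic case.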
By looking at the modified MANOVA test statistics in equation (24), one can clearly
see the resemblance to the standard MANOVA test statistics in sections 4.5.2–4.5.4.
However, unlike the standard multivariate test statistics, the modified test statistics
depend on the unknown quantities Σ, fH and fG. Zhang & Xiao (2012) derive three
sets of expressions for fH and fG using the information contained in the matrices H
and G:
1. fH and fG are obtained as proposed by Harrar & Bathke (2008).
2. fH and fG are obtained by matching total variances.
3. fH and fG are obtained under an affine invariant transformation of the MANOVA
model.
It can be shown that the three sets of approximate degrees of freedom are the following:

$$f_H = \frac{\operatorname{tr}(\Sigma^2)}{\sum_{i,j}\sum_{\alpha,\beta}\frac{c_{ij,\alpha\beta}^2}{n_{ij}n_{\alpha\beta}}\operatorname{tr}(\Sigma_{ij}\Sigma_{\alpha\beta})}, \qquad f_G = \frac{\operatorname{tr}(\Sigma^2)}{(ab)^{-2}\sum_{i,j}(n_{ij}-1)^{-1}n_{ij}^{-2}\operatorname{tr}(\Sigma_{ij}^2)}, \qquad (25)$$

$$f_H = \frac{\operatorname{tr}(\Sigma^2) + \operatorname{tr}^2(\Sigma)}{\sum_{i,j}\sum_{\alpha,\beta}\frac{c_{ij,\alpha\beta}^2}{n_{ij}n_{\alpha\beta}}\left[\operatorname{tr}(\Sigma_{ij}\Sigma_{\alpha\beta}) + \operatorname{tr}(\Sigma_{ij})\operatorname{tr}(\Sigma_{\alpha\beta})\right]}, \qquad f_G = \frac{\operatorname{tr}(\Sigma^2) + \operatorname{tr}^2(\Sigma)}{(ab)^{-2}\sum_{i,j}(n_{ij}-1)^{-1}n_{ij}^{-2}\left[\operatorname{tr}(\Sigma_{ij}^2) + \operatorname{tr}^2(\Sigma_{ij})\right]}, \qquad (26)$$

$$f_H = \frac{p(p+1)}{\sum_{i,j}\sum_{\alpha,\beta}\frac{c_{ij,\alpha\beta}^2}{n_{ij}n_{\alpha\beta}}\left\{\operatorname{tr}(\Sigma_{ij}\Sigma^{-1}\Sigma_{\alpha\beta}\Sigma^{-1}) + \operatorname{tr}(\Sigma_{ij}\Sigma^{-1})\operatorname{tr}(\Sigma_{\alpha\beta}\Sigma^{-1})\right\}}, \qquad f_G = \frac{p(p+1)}{(ab)^{-2}\sum_{i,j}(n_{ij}-1)^{-1}n_{ij}^{-2}\left\{\operatorname{tr}\left([\Sigma_{ij}\Sigma^{-1}]^2\right) + \operatorname{tr}^2(\Sigma_{ij}\Sigma^{-1})\right\}}, \qquad (27)$$

where $\sum_{i,j}$ denotes summation over all i's and j's, $c_{ij,\alpha\beta}^2$ are design weights related to the hypothesis tested, and α = 1, 2, ..., a, β = 1, 2, ..., b are additional indices used for convenience of calculation. To estimate f_G and f_H in equations (25)–(27) in real data analysis, the unknown quantities Σ and Σ_ij are replaced by their estimates.
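As a sketch of this plug-in step, the code below computes G and the first expression for f_G in (25), replacing Σ by G and Σ_ij by the cell covariance estimates. It assumes every cell has at least two replicates and stores the cells in a nested list (a hypothetical layout); f_H is left out because it additionally requires the design weights c_ij,αβ of the tested hypothesis.

```python
import numpy as np


def g_and_fg(cells):
    """Plug-in G and the f_G of (25).

    cells: nested list, cells[i][j] is an (n_ij, p) array; every n_ij >= 2.
    """
    a, b = len(cells), len(cells[0])
    G = 0.0
    denom = 0.0
    for row in cells:
        for Y in row:
            n = Y.shape[0]
            S = np.atleast_2d(np.cov(Y, rowvar=False, ddof=1))  # unbiased Sigma_ij estimate
            G = G + S / n
            denom += np.trace(S @ S) / ((n - 1) * n**2)
    G = G / (a * b)
    denom = denom / (a * b) ** 2
    f_G = np.trace(G @ G) / denom   # Sigma and Sigma_ij replaced by G and S
    return G, f_G


rng = np.random.default_rng(0)
base = rng.normal(size=(6, 2))
cells = [[base + i + j for j in range(3)] for i in range(2)]  # equal cell covariances
G, f_G = g_and_fg(cells)
print(round(float(f_G), 6))  # 30.0
```

As a sanity check, when all a·b cells have n replicates and identical sample covariances, the expression reduces to ab(n − 1), the error degrees of freedom of the balanced model (here 2·3·5 = 30).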
7 Numerical examples
This section presents two numerical examples: one applies the methodology to two real-life data sets and the other to two simulated data sets. The main purpose of the numerical examples is to examine the performance of the modified MANOVA tests (Wilks' Λ, the Hotelling-Lawley Trace and Pillai's Trace) when the assumption of covariance homoscedasticity is unlikely to hold and the data is unbalanced. In both examples, the performance of the modified MANOVA tests is compared with the results of the standard MANOVA tests to allow a broader discussion.
The two numerical examples are presented in Sections 7.1–7.2, and the main results of the standard and modified MANOVA tests are presented in Section 7.3.
7.1 Real-life data example
In this section, the methodology presented in Sections 3–6 will be illustrated using a
synthetic real-life data set. The data used for this numerical example will be discussed
briefly in Section 7.1.1. As part of the numerical example, assumptions regarding
distributional properties and covariance matrices of the response variables will be in-
vestigated. The results from these diagnostic tests will be presented in Section 7.1.4.
Ultimately, the results from the multivariate testing procedure will be presented in
Section 7.1.5.
7.1.1 Data
The data for the numerical example is collected from the database Integrated Public Use
Microdata Series (IPUMS). IPUMS comprises data for various samples of the American population, drawn from federal censuses and the American Community Surveys (ACS) between the years 2000 and 2011 (Ruggles et al., 2010). In this master thesis, the ACS 2011 sample data is used. The ACS 2011 sample is collected using mixed modes, including e-mail, phone, mail and personal interviews (Ruggles et al., 2010). From the original ACS sample, a subset with a density of 1%, containing 251,215 individuals, is extracted. Finally, using a second round of random sampling, two final samples of 120 observations are obtained. Three variables from the ACS are chosen as a continuous multivariate response in the MANOVA-model:
• Total personal income – the total pre-tax income or losses for each individual
during the previous calendar year, measured in dollars.
• Hauser and Warren Socioeconomic Index – an index score assigned to each in-
dividual based on occupation. The index measures occupational status based on
earnings and educational attainment for each category.
• Occupational Income Score – an income score assigned to each individual based on
occupation. The score is measured as the weighted average occupational income,
thereby reflecting relative economic standing of occupations for each individual.
Variable definitions are given in Ruggles et al. (2010). In the numerical example below, all three variables are log-transformed to avoid severe departures from the assumption of normality of the responses.
7.1.2 The two-way MANOVA model with interactions
The two factors studied for each individual are Sex and Census region, with 2 and 4 levels, respectively: males and females from the Northeast, Midwest, South and West census regions in the U.S. In the MANOVA-model, it is assumed that Sex and Census region have main effects as well as interaction effects. Further, Sex represents the first factor A and Census region the second factor B, so that there are 8 combinations of the two factors, as shown in Table 6.
7.1.3 Structure of the numerical example
As mentioned earlier, the performance of the standard MANOVA tests is likely to be affected when data is unbalanced and covariance matrices are heteroscedastic. This numerical example therefore examines two situations: one sample where data is balanced and one sample where n_ij varies over factor combinations. In both samples, $n = \sum_{i=1}^{2}\sum_{j=1}^{4} n_{ij} = 120$. It should be noted that there are countless ways in which n_ij can be altered when studying effects of unbalancedness, and future studies might consider other alternatives.
Table 6: Data for the unbalanced two-way MANOVA with k replicates, k = 1, 2, ..., n_ij.
Sex
Census region Males Females Totals
Northeast y11k y21k y.1k
Midwest y12k y22k y.2k
South y13k y23k y.3k
West y14k y24k y.4k
Totals y1.k y2.k y..k
7.1.4 Testing model assumptions
This section summarizes the results from goodness-of-fit tests of univariate and multivariate normality of the response variables in the MANOVA model, as well as the results from Box's M test of covariance matrix homoscedasticity.
The Shapiro-Wilk goodness-of-fit test is used for testing univariate normality whereas
the Mardia test is used for testing multivariate normality. Additionally, QQ-plots of
multivariate fit and histograms of univariate residuals of the estimated MANOVA model
are constructed. Low p-values of obtained goodness-of-fit test statistics suggest devia-
tion from both univariate and multivariate normality in both samples. However, neither
residual histograms nor QQ-plots suggest that this deviation is severe (Tables and Fig-
ures are presented in Appendix B).
Table 7 summarizes the results from Box's M test for the two samples. The null hypothesis of equal covariance matrices is rejected at the 5% significance level for the unbalanced sample, but not for the balanced sample.
Table 7: Results from Box's M test of $H_0: \Sigma_{11} = \Sigma_{12} = \cdots = \Sigma_{24}$
Sample       χ²     df   p
Balanced     54.7   42   0.0910
Unbalanced   72.7   42   0.0023
7.1.5 Results from the testing procedure
Tables 8–9 below show the test results of Wilks' Λ (WL), the Hotelling-Lawley Trace (HLT) and Pillai's Trace (PT) as produced in SAS, as well as the three modifications proposed by Zhang & Xiao (2012). The modified tests are denoted HLTi, PTi and WLi, where i = 1 stands for the modification proposed initially by Harrar & Bathke (2008), i = 2 for the modification based on matching of variance components, and i = 3 for the modification based on an affine invariant transformation of the two-way MANOVA-model with interactions. When data is unbalanced, the standard tests are based on the Type III partitioning of SSCP matrices described in Section 5.1.2.
Table 8 shows the test results for the balanced sample. The null hypotheses of no main and interaction effects for Sex and Census region are not rejected at the 5% significance level. Overall, the test results are nearly identical for all 12 test statistics. The modified MANOVA tests show a slight tendency to be less significant than the standard MANOVA tests produced in SAS, but this difference is so small that it can be neglected.
Table 8: Test results for balanced sample
Statistic  FA    df1,A  df2,A   pA      FB    df1,B   df2,B   pB      FAB   df1,AB  df2,AB  pAB
HLT        2.44  3      110     0.0678  0.86  9       169.7   0.5647  0.85  9       169.7   0.5685
PT         2.44  3      110     0.0678  0.86  9       336     0.5623  0.86  9       336     0.5590
WL         2.44  3      110     0.0678  0.86  9       267.86  0.5652  0.86  9       267.86  0.5655
HLT1       2.44  3      93.235  0.0695  0.85  8.4399  221.6   0.5622  0.85  8.4399  221.6   0.5622
PT1        2.44  3      93.235  0.0695  0.85  8.4399  258.67  0.5662  0.84  8.4399  258.67  0.5698
WL1        2.44  3      93.235  0.0695  0.86  8.4399  267.4   0.5585  0.86  8.4399  267.4   0.5547
HLT2       2.44  3      95.517  0.0692  0.85  8.5088  227.75  0.5625  0.85  8.5088  227.75  0.5625
PT2        2.44  3      95.517  0.0692  0.85  8.5088  267.24  0.5663  0.84  8.5088  267.24  0.5700
WL2        2.44  3      95.517  0.0692  0.86  8.5088  276.12  0.5589  0.86  8.5088  276.12  0.5552
HLT3       2.44  3      98.208  0.0689  0.85  8.742   236.61  0.5647  0.85  8.742   236.61  0.5647
PT3        2.44  3      98.208  0.0689  0.85  8.742   282.35  0.5683  0.85  8.742   282.35  0.5720
WL3        2.44  3      98.208  0.0689  0.86  8.742   291.75  0.5612  0.86  8.742   291.75  0.5575
Notation: In the above table, subscripts declare the tested hypothesis. For instance, FA represents the F-statistic for the null hypothesis of no effects for Sex.
Table 9 shows that the standard MANOVA tests reject the null hypothesis of no main effects for Sex at the 5% significance level, whereas the null hypotheses of no main effect for Census region and no interaction effect between Sex and Census region are clearly not rejected.
Table 9: Test results for the unbalanced sample.
Statistic  FA    df1,A  df2,A   pA      FB    df1,B   df2,B   pB      FAB   df1,AB  df2,AB  pAB
HLT        3.49  3      110     0.0181  0.85  9       169.7   0.5680  1.43  9       169.7   0.1769
PT         3.49  3      110     0.0181  0.86  9       336     0.5636  1.41  9       336     0.1803
WL         3.49  3      110     0.0181  0.85  9       267.86  0.5676  1.42  9       267.86  0.1778
HLT1       2.34  3      39.629  0.0881  0.93  7.6064  90.136  0.4951  1.26  7.6064  90.136  0.2740
PT1        2.34  3      39.629  0.0881  0.92  7.6064  97.407  0.5037  1.27  7.6064  97.407  0.2707
WL1        2.34  3      39.629  0.0881  0.94  7.6064  104.37  0.4874  1.25  7.6064  104.37  0.2785
HLT2       2.34  3      41.464  0.0868  0.93  7.6777  94.71   0.4935  1.27  7.6777  94.71   0.2710
PT2        2.34  3      41.464  0.0868  0.92  7.6777  103     0.5016  1.27  7.6777  103     0.2674
WL2        2.34  3      41.464  0.0868  0.94  7.6777  110.11  0.4863  1.26  7.6777  110.11  0.2757
HLT3       2.37  3      52.568  0.0813  0.94  7.9645  122.02  0.4854  1.29  7.9645  122.02  0.2567
PT3        2.37  3      52.568  0.0813  0.93  7.9645  136.25  0.4912  1.29  7.9645  136.25  0.2523
WL3        2.37  3      52.568  0.0813  0.95  7.9645  143.95  0.4801  1.27  7.9645  143.95  0.2619
Notation: In the above table, subscripts declare the tested hypothesis. For instance, FA represents the F-statistic for the null hypothesis of no effects for Sex.
These results differ from those of the modified tests, which do not reject the null hypothesis of no main effect for Sex at a 5% level of significance (their p-values lie between 0.0813 and 0.0881). Neither the main effect for Census region nor the interaction between Sex and Census region is statistically significant under the modified tests.
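The modified tests refer their statistics to an approximate F distribution whose degrees of freedom need not be integers (for example, df2,A = 39.629 for HLT1 in Table 9). As a sketch, assuming the reported p-values are upper-tail probabilities of that F distribution, a p-value can be recomputed from F and fractional degrees of freedom through the regularized incomplete beta function. The helper names below are hypothetical, and the continued fraction uses the standard modified Lentz scheme; in practice a library routine such as `scipy.stats.f.sf(F, df1, df2)` computes the same quantity.

```python
import math

_TINY = 1e-300


def _beta_cf(a, b, x, max_iter=500, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > _TINY else _TINY)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # even step of the continued fraction
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > _TINY else _TINY)
        c = 1.0 + aa / c
        c = c if abs(c) > _TINY else _TINY
        h *= d * c
        # odd step of the continued fraction
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > _TINY else _TINY)
        c = 1.0 + aa / c
        c = c if abs(c) > _TINY else _TINY
        step = d * c
        h *= step
        if abs(step - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b) for a, b > 0 and 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    # use the symmetry I_x(a, b) = 1 - I_{1-x}(b, a) where it converges faster
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def f_pvalue(f_stat, df1, df2):
    """Upper-tail probability P(F > f_stat) for F ~ F(df1, df2); df may be non-integer."""
    if f_stat <= 0.0:
        return 1.0
    # P(F > f) = I_x(df2/2, df1/2) with x = df2 / (df2 + df1 * f)
    return reg_inc_beta(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f_stat))
```

For instance, `f_pvalue(2.34, 3, 39.629)` should land close to the tabulated pA = 0.0881 for HLT1; exact agreement is not expected because the F values in Table 9 are rounded to two decimals.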
7.2 Simulation study
A simulation study is conducted to corroborate the results obtained from the real-life data. Two data sets, one balanced and one unbalanced, are simulated and analysed in the same way as the real-life data. The aim is to obtain further indications of how the modified MANOVA tests perform relative to the standard MANOVA tests. A short description of the simulation study is given in Section 7.2.1. Assumptions of univariate and multivariate normality, as well as homoscedasticity of the covariance matrices, are tested in Section 7.2.2. Results from the testing procedure are then shown in Section 7.2.3.
7.2.1 Construction of the simulated data
The simulation study is based on the algorithms presented in Zhang & Xiao (2012).
Two data sets of size n = 120 are simulated from a multivariate normal distribution
with 3 response variables y1, y2 and y3, and 2 factors A and B having 2 and 4 levels,
respectively. The layout of the two simulated data sets can therefore be represented as
shown in Table 6. The simulated data sets are generated in the following way:
$$y_{ijk} = \mu_{ij} + \Sigma_{ij}^{1/2}\varepsilon_{ijk},$$
where $k = 1, 2, \ldots, n_{ij}$. The mean vectors are defined as $\mu_{ij} = \mu_{11} + ij\,\delta h/(ab)$, where $\mu_{11} = 0$ and $h = 2.4$. It is further assumed that $\varepsilon_{ijk} \sim N_p(0, I_p)$. The covariance structure for the unbalanced simulated data is assumed to vary over the two levels of factor A. Explicitly, $\Sigma_{1j} = I_3$ and $\Sigma_{2j} = \mathrm{diag}(1.0, 5.0, 0.1)$, $j = 1, 2, 3, 4$.
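The data-generating model above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the code used in the thesis: the shift vector δ and the unbalanced cell sizes are not given in this section, so `DELTA` and `UNBALANCED_SIZES` below are hypothetical placeholders (the actual layout is the one in Table 6). Because both covariance matrices are diagonal, $\Sigma_{ij}^{1/2}$ reduces to elementwise square roots of the variances, and the mean formula is implemented exactly as written in the text.

```python
import random

A_LEVELS, B_LEVELS, P = 2, 4, 3       # levels of factors A and B, number of responses
H = 2.4                                # h in the mean formula
DELTA = (1.0, 1.0, 1.0)                # hypothetical: delta is not specified here
MU_11 = (0.0, 0.0, 0.0)                # mu_11 = 0

BALANCED_SIZES = {(i, j): 15 for i in range(1, 3) for j in range(1, 5)}
# Hypothetical unbalanced cell sizes n_ij (they sum to 120); see Table 6 for the real ones.
UNBALANCED_SIZES = {(1, 1): 5, (1, 2): 10, (1, 3): 15, (1, 4): 20,
                    (2, 1): 25, (2, 2): 20, (2, 3): 15, (2, 4): 10}


def cell_mean(i, j):
    """mu_ij = mu_11 + i*j*delta*h/(a*b), read literally from the text."""
    scale = i * j * H / (A_LEVELS * B_LEVELS)
    return [m + scale * d for m, d in zip(MU_11, DELTA)]


def cell_sd(i, heteroscedastic):
    """Square roots of diag(Sigma_ij): Sigma_1j = I_3, Sigma_2j = diag(1.0, 5.0, 0.1)."""
    if heteroscedastic and i == 2:
        variances = (1.0, 5.0, 0.1)
    else:
        variances = (1.0, 1.0, 1.0)
    return [v ** 0.5 for v in variances]


def simulate(cell_sizes, heteroscedastic=True, seed=2013):
    """Return rows (i, j, [y1, y2, y3]) drawn from y = mu_ij + Sigma_ij^(1/2) eps."""
    rng = random.Random(seed)
    rows = []
    for (i, j), n_ij in sorted(cell_sizes.items()):
        mu = cell_mean(i, j)
        sd = cell_sd(i, heteroscedastic)
        for _ in range(n_ij):
            eps = [rng.gauss(0.0, 1.0) for _ in range(P)]   # eps ~ N_p(0, I_p)
            rows.append((i, j, [m + s * e for m, s, e in zip(mu, sd, eps)]))
    return rows
```

For example, `simulate(UNBALANCED_SIZES)` returns 120 rows; fixing the seed makes a run reproducible.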
7.2.2 Testing model assumptions
Goodness-of-fit tests of univariate and multivariate normality for the two simulated data sets are presented in Appendix C, together with QQ-plots of the multivariate fit and histograms of the univariate residuals from the estimated MANOVA model. While the figures indicate a better fit to univariate and multivariate normality than for the real-life data samples, the goodness-of-fit tests suggest a slight departure from normality in some cases.
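Appendices B and C report Mardia's multivariate skewness and kurtosis tests. The sketch below computes the textbook form of these statistics, assuming the ML covariance estimate (divisor n): skewness $n b_{1,p}/6$, referred to a $\chi^2$ distribution with $p(p+1)(p+2)/6$ degrees of freedom (10 for p = 3), and kurtosis standardized as $(b_{2,p} - p(p+2))/\sqrt{8p(p+2)/n}$, referred to N(0, 1). The software used in the thesis may apply small-sample variants, so the numbers need not match the appendix tables exactly; function names are illustrative.

```python
import math


def _inverse(m):
    """Gauss-Jordan inverse with partial pivoting for a small dense matrix."""
    n = len(m)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[piv][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[piv] = aug[piv], aug[col]
        pivot = aug[col][col]
        aug[col] = [v / pivot for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def mardia(data):
    """Return (skewness statistic, its chi-square df, kurtosis z statistic).

    data is a list of n observations, each a list of p values.
    """
    n, p = len(data), len(data[0])
    mean = [sum(row[k] for row in data) / n for k in range(p)]
    centred = [[row[k] - mean[k] for k in range(p)] for row in data]
    cov = [[sum(r[a] * r[b] for r in centred) / n for b in range(p)] for a in range(p)]
    s_inv = _inverse(cov)
    # w_j = S^{-1}(x_j - xbar), so g_ij = (x_i - xbar)' S^{-1} (x_j - xbar)
    w = [[sum(s_inv[a][b] * r[b] for b in range(p)) for a in range(p)] for r in centred]
    g = [[sum(u * v for u, v in zip(ri, wj)) for wj in w] for ri in centred]
    b1 = sum(g_ij ** 3 for row in g for g_ij in row) / n ** 2
    b2 = sum(g[i][i] ** 2 for i in range(n)) / n
    skew_stat = n * b1 / 6.0
    skew_df = p * (p + 1) * (p + 2) // 6
    kurt_z = (b2 - p * (p + 2)) / math.sqrt(8.0 * p * (p + 2) / n)
    return skew_stat, skew_df, kurt_z
```

Data that are symmetric about their mean give a skewness statistic of (numerically) zero, which is a quick sanity check of the implementation.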
Table 10 summarizes the results from Box's M test for the two simulated data sets. The null hypothesis of equal covariance matrices is clearly rejected at a 5% significance level in both the balanced and the unbalanced case, which indicates heteroscedastic covariance matrices.
Table 10: Results from Box's M test of $H_0: \Sigma_{11} = \Sigma_{12} = \cdots = \Sigma_{24}$
Sample χ² df p
1 306.3 42 <.0001
2 293.9 42 <.0001
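The statistic behind Table 10 can be sketched as follows, assuming the usual chi-square approximation of Box (1949): $M = (N-g)\ln|S_p| - \sum_i (n_i-1)\ln|S_i|$ is scaled by $1 - c_1$, where $c_1 = \left[\sum_i \frac{1}{n_i-1} - \frac{1}{N-g}\right]\frac{2p^2+3p-1}{6(p+1)(g-1)}$, and compared with a $\chi^2$ distribution on $p(p+1)(g-1)/2$ degrees of freedom. With p = 3 responses and g = 8 cells this gives 3·4·7/2 = 42, matching Table 10. The helper names are hypothetical, and the p-value would come from a chi-square upper tail (e.g. `scipy.stats.chi2.sf`).

```python
import math


def _det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(map(float, row)) for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[piv][col] == 0.0:
            return 0.0
        if piv != col:
            a[col], a[piv] = a[piv], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for k in range(col, n):
                a[r][k] -= factor * a[col][k]
    return det


def _sample_cov(rows):
    """Unbiased sample covariance matrix (divisor n - 1)."""
    n, p = len(rows), len(rows[0])
    mean = [sum(r[k] for r in rows) / n for k in range(p)]
    return [[sum((r[a] - mean[a]) * (r[b] - mean[b]) for r in rows) / (n - 1)
             for b in range(p)] for a in range(p)]


def box_m(groups):
    """Box's M chi-square statistic and df for groups = [[obs, ...], ...]."""
    g = len(groups)
    p = len(groups[0][0])
    sizes = [len(grp) for grp in groups]
    n_total = sum(sizes)
    covs = [_sample_cov(grp) for grp in groups]
    # pooled covariance S_p = sum (n_i - 1) S_i / (N - g)
    pooled = [[sum((n_i - 1) * s[a][b] for n_i, s in zip(sizes, covs)) / (n_total - g)
               for b in range(p)] for a in range(p)]
    m_stat = ((n_total - g) * math.log(_det(pooled))
              - sum((n_i - 1) * math.log(_det(s)) for n_i, s in zip(sizes, covs)))
    c1 = ((sum(1.0 / (n_i - 1) for n_i in sizes) - 1.0 / (n_total - g))
          * (2 * p ** 2 + 3 * p - 1) / (6.0 * (p + 1) * (g - 1)))
    chi2 = (1.0 - c1) * m_stat
    df = p * (p + 1) * (g - 1) // 2
    return chi2, df
```

Identical groups give a statistic of (numerically) zero, and any difference between the group covariance matrices makes it positive.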
7.2.3 Results from the testing procedure
Tables 11–12 show the test results for the two simulated data sets. As can be seen in
Table 11, all tests show statistically significant main effects of factors A and B when
data is balanced. The interaction effect between factors A and B is however not sta-
tistically significant. Overall, p-values are marginally higher for modified tests than for
standard tests.
Table 11: Test results for balanced simulated data.
Statistic FA df1,A df2,A pA FB df1,B df2,B pB FAB df1,AB df2,AB pAB
HLT 9.30 3 110 <.0001 6.29 9 169.7 <.0001 1.14 9 169.7 0.3397
PT 9.30 3 110 <.0001 4.96 9 336 <.0001 1.12 9 336 0.3463
WL 9.30 3 110 <.0001 5.66 9 267.86 <.0001 1.13 9 267.86 0.3434
HLT1 9.14 3 54.507 <.0001 5.59 5.6492 104.82 <.0001 1.11 5.6492 104.82 0.3614
PT1 9.14 3 54.507 <.0001 6.11 5.6492 100.88 <.0001 1.10 5.6492 100.88 0.3647
WL1 9.14 3 54.507 <.0001 5.09 5.6492 104.3 0.0002 1.11 5.6492 104.3 0.3584
HLT2 9.17 3 59.467 <.0001 5.59 6.0228 119.22 <.0001 1.11 6.0228 119.22 0.3596
PT2 9.17 3 59.467 <.0001 6.13 6.0228 117.37 <.0001 1.11 6.0228 117.37 0.3621
WL2 9.17 3 59.467 <.0001 5.04 6.0228 121.41 0.0001 1.12 6.0228 121.41 0.3574
HLT3 9.26 3 84.829 <.0001 5.58 8.0237 197.47 <.0001 1.12 8.0237 197.47 0.3512
PT3 9.26 3 84.829 <.0001 6.20 8.0237 223.53 <.0001 1.12 8.0237 223.53 0.3499
WL3 9.26 3 84.829 <.0001 4.87 8.0237 231.36 <.0001 1.12 8.0237 231.36 0.3528
Notation: In the above table, subscripts declare the tested hypothesis. For instance, FA represents the F-statistic for the null hypothesis of no effect for factor A.
For the unbalanced simulated data, all tests show statistically significant main effects for factors A and B, as can be seen in Table 12.
Table 12: Test results for unbalanced simulated data.
Statistic FA df1,A df2,A pA FB df1,B df2,B pB FAB df1,AB df2,AB pAB
HLT 7.20 3 110 0.0002 5.37 9 169.7 <.0001 1.72 9 169.7 0.0880
PT 7.20 3 110 0.0002 4.69 9 336 <.0001 1.62 9 336 0.1071
WL 7.20 3 110 0.0002 5.07 9 267.86 <.0001 1.67 9 267.86 0.0959
HLT1 7.52 3 35.478 0.0005 5.09 5.792 69.365 0.0003 1.68 5.792 69.365 0.1416
PT1 7.52 3 35.478 0.0005 5.38 5.792 66.634 0.0002 1.73 5.792 66.634 0.1305
WL1 7.52 3 35.478 0.0005 4.82 5.792 70.292 0.0004 1.63 5.792 70.292 0.1551
HLT2 7.55 3 38.145 0.0004 5.09 6.093 77.024 0.0002 1.68 6.093 77.024 0.1353
PT2 7.55 3 38.145 0.0004 5.40 6.093 75.41 0.0001 1.74 6.093 75.41 0.1232
WL2 7.55 3 38.145 0.0004 4.76 6.093 79.566 0.0003 1.63 6.093 79.566 0.1498
HLT3 7.69 3 60.757 0.0002 5.13 8.0793 141.88 <.0001 1.72 8.0793 141.88 0.0984
PT3 7.69 3 60.757 0.0002 5.54 8.0793 160.24 <.0001 1.78 8.0793 160.24 0.0840
WL3 7.69 3 60.757 0.0002 4.60 8.0793 168.18 <.0001 1.64 8.0793 168.18 0.1154
Notation: In the above table, subscripts declare the tested hypothesis. For instance, FA represents the F-statistic for the null hypothesis of no effect for factor A.
The interaction effect is not significant at a 5% significance level for any test, but two of the three standard tests (HLT and WL) suggest a significant effect at a 10% level. The modified tests largely do not support this last result: versions 1 and 2 of the modified tests all give p-values above 0.12 for the interaction effect, although HLT3 (0.0984) and PT3 (0.0840) are also below 0.10. In line with previous results, the p-values from the modified tests are higher than those from the standard tests in most cases.
7.3 Summary of test results
Combining the results from the two studies, the standard MANOVA tests overall have lower p-values than the three modified MANOVA tests. These findings are similar to those presented by Zhang & Xiao (2012). In both numerical examples, differences between the tests are small when data is balanced but substantially larger when data is unbalanced. Results for the real-life data samples are generally not as clear-cut as for the simulated data, but the overall tendency is that the p-values of the standard MANOVA tests are lower than those of the modified MANOVA tests. In one case (the test of the main effect for Sex in the unbalanced real-life sample), the standard tests are statistically significant at a 5% level while the modified tests are not.
8 Discussion
Based on the obtained results from the empirical study, it can be debated whether
modified MANOVA tests are a better choice than standard MANOVA tests. It is de-
sirable to adjust tests when heteroscedasticity is severe, since one wants to make valid
inference when the variability in data is large. This property of the modified MANOVA
tests is highlighted by Zhang & Xiao (2012) as one of the main reasons why one should
use these tests.
Looking at the results, the modified tests appear to be less prone to reject the null hypotheses when data is unbalanced, which is in line with the results obtained by Zhang & Xiao (2012). Test results from the modified and standard tests are almost identical when data is balanced, even when the covariance matrices are heteroscedastic. This again highlights the problems with unbalanced data and supports the view that balanced data is preferable in empirical studies.
The results obtained from the conducted studies on MANOVA tests raise interesting questions about what could be learned and improved. Even though the results point in the same direction, namely that the modified MANOVA tests have higher p-values than the standard MANOVA tests, one must be careful not to overstate them. Further, a deeper analysis of the underlying covariance structures in the data is needed in order to generalize these results. For example, the covariance structure for the simulated data is quite simple, so it is questionable whether more complicated covariance structures would yield similar results.
This master thesis aimed to contribute to the study of the performance of newly proposed modified two-way MANOVA tests. The obtained results indicate that further studies with a special emphasis on unbalancedness are needed. Moreover, such studies should include different types of factors (with respect to the number of factor levels) and different covariance structures (with respect to covariance complexity). Heteroscedasticity of the covariance matrices across factor level combinations might affect the testing procedure even for balanced data. Further investigation of the effect of different covariance structures on MANOVA tests in the presence of heteroscedasticity is recommended.
Despite all difficulties when analyzing unbalanced and heteroscedastic data, the results obtained in this master thesis suggest that the modified MANOVA tests are a useful statistical tool. As noted by Zhang & Xiao (2012), these tests are relatively powerful and unbiased, which further supports their wide application.
References
Ananda, M. M. A. & Weerahandi, S. (1997). Two-way ANOVA with unequal cell frequencies and unequal variances. Statistica Sinica, 7, 631–646.
Bao, P. & Ananda, M. M. A. (2001). Performance of two-way ANOVA procedures when cell frequencies and variances are unequal. Communications in Statistics - Simulation and Computation, 30, 805–829.
Box, G. E. P. (1949). A general distribution theory for a class of likelihood criteria. Biometrika, 36, 317–346.
Casella, G. & Berger, R. (2002). Statistical inference. Cengage Learning, Stamford.
Fujikoshi, Y. (1993). Two-way ANOVA models with unbalanced data. Discrete Mathematics, 116, 315–334.
Harrar, S. W. & Bathke, A. C. (2008). Nonparametric methods for unbalanced multivariate data and many factor levels. Journal of Multivariate Analysis, 99, 1635–1664.
Harville, D. (2008). Matrix algebra from a statistician’s perspective. Springer, New York.
Herr, D. G. (1986). On the history of ANOVA in unbalanced, factorial designs: The first 30 years. The American Statistician, 40, 265–270.
Hill, T. & Lewicki, P. (2007). Statistics: Methods and applications. StatSoft.
Hotelling, H. (1947). Multivariate quality control: illustrated by the air testing of sample bombsights. McGraw-Hill, New York.
Langsrud, O. (2003). ANOVA for unbalanced data: Use Type II instead of Type III sums of squares. Statistics and Computing, 13, 163–167.
Lawley, D. N. (1938). A generalization of Fisher’s z-test. Biometrika, 30, 180–187.
Littell, R., Stroup, W. & Freund, R. (2002). SAS for linear models. SAS Institute.
Pillai, K. C. S. (1955). Some new test criteria in multivariate analysis. The Annals of Mathematical Statistics, 26, 117–121.
Rencher, A. (2000). Linear models in statistics. Wiley, New York.
Rencher, A. (2003). Methods of multivariate analysis. Wiley, New York.
Ruggles, S., Sobek, M., Genadek, K., Alexander, J., Schroeder, M. & Goeken, R. (2010). Integrated public use microdata series: Version 5.0.
Rutherford, A. (2012). ANOVA and ANCOVA: A GLM approach. Wiley, New York.
Sawyer, S. (2009). Analysis of variance: The fundamental concepts. The Journal of Manual and Manipulative Therapy, 17, E27–E38.
Schott, J. (2005). Matrix analysis for statistics. Wiley Series in Probability and Statistics. Wiley, New York.
Searle, S. (1987). Linear models for unbalanced data. Wiley, New York.
Sen, P. K. (1986). Contemporary textbooks on multivariate statistical analysis: A panoramic appraisal and critique. Journal of the American Statistical Association, 81, 560–564.
Shaw, R. G. & Mitchell-Olds, T. (1993). ANOVA for unbalanced data: An overview. Ecology, 74, 1638–1645.
Timm, N. (2002). Applied multivariate analysis: methods and case studies. Springer, New York.
Weber, D. & Skillings, J. (2000). A first course in the design of experiments: A linear model approach. CRC Press, New York.
Wilks, S. S. (1932). Certain generalizations in the analysis of variance. Biometrika, 24, 471–494.
Zhang, J.-T. & Xiao, S. (2012). A note on the modified two-way MANOVA tests. Statistics and Probability Letters, 82, 519–527.
Appendices
A Matrix algebra
Definition 1 (Transpose). Let A be an n × m matrix. Then the transpose of A is the m × n matrix A′ such that the element in the ith row and jth column of A′ is the element in the jth row and ith column of A.
Definition 2 (Determinant). The following definition of determinants is found in Schott (2005). Let A be an m × m matrix. Then its determinant |A| is given by
$$|A| = \sum (-1)^{f(i_1,\ldots,i_m)}\, a_{1i_1} a_{2i_2} \cdots a_{mi_m} = \sum (-1)^{f(i_1,\ldots,i_m)}\, a_{i_1 1} a_{i_2 2} \cdots a_{i_m m},$$
where the summation is taken over all permutations $(i_1, \ldots, i_m)$ of the set of integers $(1, \ldots, m)$, and the function $f(i_1, \ldots, i_m)$ equals the number of transpositions necessary to change $(i_1, \ldots, i_m)$ to $(1, \ldots, m)$.
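Definition 2 can be evaluated literally by summing over all m! permutations, which makes a small illustration of the formula (but not a practical algorithm beyond small m). The sketch below uses 0-based indices and counts the swaps needed to sort each permutation; since every sequence of transpositions that sorts a given permutation has the same parity, the resulting sign equals $(-1)^{f}$ in the definition. Function names are illustrative.

```python
from itertools import permutations
from math import prod


def transpositions_to_sort(perm):
    """Count the swaps used to turn perm into (0, 1, ..., m-1).

    Any sequence of transpositions doing this has the same parity,
    so this count yields the same sign (-1)^f as the definition.
    """
    p = list(perm)
    swaps = 0
    for pos in range(len(p)):
        while p[pos] != pos:
            target = p[pos]
            p[pos], p[target] = p[target], p[pos]  # move value `target` to its place
            swaps += 1
    return swaps


def det_leibniz(a):
    """|A| = sum over permutations of (-1)^f * a[0][i0] * a[1][i1] * ... (0-based)."""
    m = len(a)
    total = 0
    for perm in permutations(range(m)):
        sign = -1 if transpositions_to_sort(perm) % 2 else 1
        total += sign * prod(a[row][perm[row]] for row in range(m))
    return total
```

For example, `det_leibniz([[1, 2], [3, 4]])` evaluates 1·4 − 2·3 = −2; the second form in the definition corresponds to applying the same function to the transpose.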
Definition 3 (Trace). The following definition of the trace is found in Schott (2005). Let A be a p × p matrix. Then its trace, tr(A), is defined as the sum of the diagonal elements of A:
$$\mathrm{tr}(A) = \sum_{i=1}^{p} a_{ii}.$$
Definition 4 (Diagonal). Let A = (aij) be a square p× p matrix. Then the diagonal
of A is a p× 1 vector containing the elements a11, a22, . . . , app.
Definition 5 (Invertibility). A square matrix A is said to be invertible (or non-singular) if there exists a matrix A⁻¹ such that AA⁻¹ = I, where I is the identity matrix; A⁻¹ is called the inverse of A.
Definition 6 (Rank). The rank of a matrix A, rank(A), is the maximum number of linearly independent columns of A, which equals the maximum number of linearly independent rows of A.
Definition 7 (Full rank). A matrix A : n × p is said to be of full rank if rank(A) = min(n, p), that is, if all of its rows or all of its columns are linearly independent. In particular, a square matrix A : n × n is of full rank if and only if all rows and all columns of A are linearly independent.
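Definitions 6 and 7 can be checked numerically by row reduction: the number of pivots found during elimination is the rank. The sketch below uses exact rational arithmetic (Python's `fractions` module) so that no floating-point tolerance is needed; the function names are illustrative.

```python
from fractions import Fraction


def rank(matrix):
    """Rank of a matrix (list of rows) via row reduction in exact arithmetic."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    r = 0  # number of pivots found so far
    for col in range(n_cols):
        pivot = next((i for i in range(r, n_rows) if rows[i][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, n_rows):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
        if r == n_rows:
            break
    return r


def is_full_rank(matrix):
    """Definition 7: an n x p matrix is of full rank iff its rank equals min(n, p)."""
    return rank(matrix) == min(len(matrix), len(matrix[0]))
```

For instance, the rows of [[1, 2, 3], [2, 4, 6]] are proportional, so its rank is 1 and it is not of full rank.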
Definition 8 (Kronecker product). Let A be an m × n matrix and B be a p × q matrix. Then the Kronecker product, A ⊗ B, is the mp × nq block matrix
$$A \otimes B = \begin{pmatrix} a_{11}B & \cdots & a_{1n}B \\ \vdots & \ddots & \vdots \\ a_{m1}B & \cdots & a_{mn}B \end{pmatrix}.$$
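Indexing from 0, the element in row r and column s of A ⊗ B lies in block (⌊r/p⌋, ⌊s/q⌋) and equals $a_{\lfloor r/p\rfloor,\lfloor s/q\rfloor}\, b_{r \bmod p,\, s \bmod q}$. The short sketch below builds the block matrix of Definition 8 directly from that rule, using plain nested lists; the function name is illustrative.

```python
def kronecker(a, b):
    """Kronecker product of an m x n matrix a and a p x q matrix b (lists of rows)."""
    m, n = len(a), len(a[0])
    p, q = len(b), len(b[0])
    # Row r of the result lies in block row r // p, row r % p within the block;
    # column s lies in block column s // q, column s % q within the block.
    return [[a[r // p][s // q] * b[r % p][s % q] for s in range(n * q)]
            for r in range(m * p)]
```

For example, `kronecker([[1, 2], [3, 4]], [[0, 5], [6, 7]])` is the 4 × 4 matrix whose top-left 2 × 2 block is 1·B and whose bottom-right block is 4·B.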
Definition 9 (Linear independence). Vectors a1, a2, ..., ap are said to be linearly independent if there exist no scalars c1, c2, ..., cp, with at least one ci ≠ 0 (i = 1, 2, ..., p), such that
$$c_1 a_1 + c_2 a_2 + \cdots + c_p a_p = 0.$$
Definition 10 (Estimable functions). Let y = Xβ + ε with E(ε) = 0, and let λ : p × 1 be a vector of constants. Then the function λ′β is estimable if and only if λ′ is a linear combination of the rows of X, that is, if there exists a vector a such that a′X = λ′.
B Summary statistics for the real-life data
Table 13: Univariate and multivariate tests of normality for balanced data.
Variable Test Statistic Value p
Income Shapiro-Wilk W 0.96 0.0196
H-W Score Shapiro-Wilk W 0.96 0.0151
Occ. Score Shapiro-Wilk W 0.96 0.0097
System Mardia Skewness 27.40 0.0022
System Mardia Kurtosis 0.61 0.5419
Table 14: Univariate and multivariate tests of normality for unbalanced data.
Variable Test Statistic Value p
Income Shapiro-Wilk W 0.93 <.0001
H-W Score Shapiro-Wilk W 0.94 <.0001
Occ. Score Shapiro-Wilk W 0.95 0.0003
System Mardia Skewness 43.86 <.0001
System Mardia Kurtosis 1.35 0.1777
Figure 2: QQ-plots of Squared Mahalanobis distances for balanced & unbalanced data.
Figure 3: Residuals (Income, H-W score and Occ. score) for balanced data
Figure 4: Residuals (Income, H-W score and Occ. score) for unbalanced data
C Summary statistics for the simulated data
Table 15: Univariate and multivariate tests of normality for balanced data.
Variable Test Statistic Value p
y1 Shapiro-Wilk W 0.99 0.8368
y2 Shapiro-Wilk W 0.92 <.0001
y3 Shapiro-Wilk W 0.97 0.1069
System Mardia Skewness 31.18 0.0005
System Mardia Kurtosis 3.71 0.0002
Table 16: Univariate and multivariate tests of normality for unbalanced data.
Variable Test Statistic Value p
y1 Shapiro-Wilk W 0.98 0.6153
y2 Shapiro-Wilk W 0.95 0.0013
y3 Shapiro-Wilk W 0.97 0.1565
System Mardia Skewness 11.13 0.3471
System Mardia Kurtosis 0.53 0.5970
Figure 5: QQ-plots of Squared Mahalanobis distances for balanced & unbalanced data.
Figure 6: Residuals (y1, y2 and y3) for balanced data
Figure 7: Residuals (y1, y2 and y3) for unbalanced data
D Univariate models for the two real-life data samples
Table 17: Test results for balanced data models. The first table uses Income as response variable, the second uses the H-W score and the third uses the Occ. score.
Response: Income
Statistic FA df1,A df2,A pA FB df1,B df2,B pB FAB df1,AB df2,AB pAB
HLT 0.41 1 112 0.5216 0.19 3 112 0.9040 1.18 3 112 0.3219
PT 0.41 1 112 0.5216 0.19 3 112 0.9040 1.18 3 112 0.3219
WL 0.41 1 112 0.5216 0.19 3 112 0.9040 1.18 3 112 0.3219
HLT1 0.41 1 95.829 0.5219 0.19 2.8243 95.829 0.8940 1.18 2.8243 95.829 0.3216
PT1 0.41 1 95.829 0.5219 0.19 2.8243 95.829 0.8940 1.18 2.8243 95.829 0.3216
WL1 0.41 1 95.829 0.5219 0.19 2.8243 95.829 0.8940 1.18 2.8243 95.829 0.3216
HLT2 0.41 1 95.829 0.5219 0.19 2.8243 95.829 0.8940 1.18 2.8243 95.829 0.3216
PT2 0.41 1 95.829 0.5219 0.19 2.8243 95.829 0.8940 1.18 2.8243 95.829 0.3216
WL2 0.41 1 95.829 0.5219 0.19 2.8243 95.829 0.8940 1.18 2.8243 95.829 0.3216
HLT3 0.41 1 95.829 0.5219 0.19 2.8243 95.829 0.8940 1.18 2.8243 95.829 0.3216
PT3 0.41 1 95.829 0.5219 0.19 2.8243 95.829 0.8940 1.18 2.8243 95.829 0.3216
WL3 0.41 1 95.829 0.5219 0.19 2.8243 95.829 0.8940 1.18 2.8243 95.829 0.3216
Response: H-W score
Statistic FA df1,A df2,A pA FB df1,B df2,B pB FAB df1,AB df2,AB pAB
HLT 0.01 1 112 0.9331 0.79 3 112 0.5024 0.69 3 112 0.5577
PT 0.01 1 112 0.9331 0.79 3 112 0.5024 0.69 3 112 0.5577
WL 0.01 1 112 0.9331 0.79 3 112 0.5024 0.69 3 112 0.5577
HLT1 0.01 1 106.95 0.9331 0.79 2.9536 106.95 0.5008 0.69 2.9536 106.95 0.5557
PT1 0.01 1 106.95 0.9331 0.79 2.9536 106.95 0.5008 0.69 2.9536 106.95 0.5557
WL1 0.01 1 106.95 0.9331 0.79 2.9536 106.95 0.5008 0.69 2.9536 106.95 0.5557
HLT2 0.01 1 106.95 0.9331 0.79 2.9536 106.95 0.5008 0.69 2.9536 106.95 0.5557
PT2 0.01 1 106.95 0.9331 0.79 2.9536 106.95 0.5008 0.69 2.9536 106.95 0.5557
WL2 0.01 1 106.95 0.9331 0.79 2.9536 106.95 0.5008 0.69 2.9536 106.95 0.5557
HLT3 0.01 1 106.95 0.9331 0.79 2.9536 106.95 0.5008 0.69 2.9536 106.95 0.5557
PT3 0.01 1 106.95 0.9331 0.79 2.9536 106.95 0.5008 0.69 2.9536 106.95 0.5557
WL3 0.01 1 106.95 0.9331 0.79 2.9536 106.95 0.5008 0.69 2.9536 106.95 0.5557
Response: Occ. score
Statistic FA df1,A df2,A pA FB df1,B df2,B pB FAB df1,AB df2,AB pAB
HLT 3.71 1 112 0.0567 0.80 3 112 0.4987 0.55 3 112 0.6508
PT 3.71 1 112 0.0567 0.80 3 112 0.4987 0.55 3 112 0.6508
WL 3.71 1 112 0.0567 0.80 3 112 0.4987 0.55 3 112 0.6508
HLT1 3.71 1 100.15 0.0570 0.80 2.9246 100.15 0.4962 0.55 2.9246 100.15 0.6464
PT1 3.71 1 100.15 0.0570 0.80 2.9246 100.15 0.4962 0.55 2.9246 100.15 0.6464
WL1 3.71 1 100.15 0.0570 0.80 2.9246 100.15 0.4962 0.55 2.9246 100.15 0.6464
HLT2 3.71 1 100.15 0.0570 0.80 2.9246 100.15 0.4962 0.55 2.9246 100.15 0.6464
PT2 3.71 1 100.15 0.0570 0.80 2.9246 100.15 0.4962 0.55 2.9246 100.15 0.6464
WL2 3.71 1 100.15 0.0570 0.80 2.9246 100.15 0.4962 0.55 2.9246 100.15 0.6464
HLT3 3.71 1 100.15 0.0570 0.80 2.9246 100.15 0.4962 0.55 2.9246 100.15 0.6464
PT3 3.71 1 100.15 0.0570 0.80 2.9246 100.15 0.4962 0.55 2.9246 100.15 0.6464
WL3 3.71 1 100.15 0.0570 0.80 2.9246 100.15 0.4962 0.55 2.9246 100.15 0.6464
Table 18: Test results for unbalanced data models. The first table uses Income as response variable, the second uses the H-W score and the third uses the Occ. score.
Response: Income
Statistic FA df1,A df2,A pA FB df1,B df2,B pB FAB df1,AB df2,AB pAB
HLT 7.49 1 112 0.0072 0.13 3 112 0.9427 3.02 3 112 0.0328
PT 7.49 1 112 0.0072 0.13 3 112 0.9427 3.02 3 112 0.0328
WL 7.49 1 112 0.0072 0.13 3 112 0.9427 3.02 3 112 0.0328
HLT1 5.72 1 41.974 0.0214 0.11 2.5323 41.974 0.9307 1.94 2.5323 41.974 0.1457
PT1 5.72 1 41.974 0.0214 0.11 2.5323 41.974 0.9307 1.94 2.5323 41.974 0.1457
WL1 5.72 1 41.974 0.0214 0.11 2.5323 41.974 0.9307 1.94 2.5323 41.974 0.1457
HLT2 5.72 1 41.974 0.0214 0.11 2.5323 41.974 0.9307 1.94 2.5323 41.974 0.1457
PT2 5.72 1 41.974 0.0214 0.11 2.5323 41.974 0.9307 1.94 2.5323 41.974 0.1457
WL2 5.72 1 41.974 0.0214 0.11 2.5323 41.974 0.9307 1.94 2.5323 41.974 0.1457
HLT3 5.72 1 41.974 0.0214 0.11 2.5323 41.974 0.9307 1.94 2.5323 41.974 0.1457
PT3 5.72 1 41.974 0.0214 0.11 2.5323 41.974 0.9307 1.94 2.5323 41.974 0.1457
WL3 5.72 1 41.974 0.0214 0.11 2.5323 41.974 0.9307 1.94 2.5323 41.974 0.1457
Response: H-W score
Statistic FA df1,A df2,A pA FB df1,B df2,B pB FAB df1,AB df2,AB pAB
HLT 0.07 1 112 0.7980 1.30 3 112 0.2780 2.91 3 112 0.0375
PT 0.07 1 112 0.7980 1.30 3 112 0.2780 2.91 3 112 0.0375
WL 0.07 1 112 0.7980 1.30 3 112 0.2780 2.91 3 112 0.0375
HLT1 0.07 1 59.726 0.7897 1.49 2.6923 59.726 0.2279 3.37 2.6923 59.726 0.0285
PT1 0.07 1 59.726 0.7897 1.49 2.6923 59.726 0.2279 3.37 2.6923 59.726 0.0285
WL1 0.07 1 59.726 0.7897 1.49 2.6923 59.726 0.2279 3.37 2.6923 59.726 0.0285
HLT2 0.07 1 59.726 0.7897 1.49 2.6923 59.726 0.2279 3.37 2.6923 59.726 0.0285
PT2 0.07 1 59.726 0.7897 1.49 2.6923 59.726 0.2279 3.37 2.6923 59.726 0.0285
WL2 0.07 1 59.726 0.7897 1.49 2.6923 59.726 0.2279 3.37 2.6923 59.726 0.0285
HLT3 0.07 1 59.726 0.7897 1.49 2.6923 59.726 0.2279 3.37 2.6923 59.726 0.0285
PT3 0.07 1 59.726 0.7897 1.49 2.6923 59.726 0.2279 3.37 2.6923 59.726 0.0285
WL3 0.07 1 59.726 0.7897 1.49 2.6923 59.726 0.2279 3.37 2.6923 59.726 0.0285
Response: Occ. score
Statistic FA df1,A df2,A pA FB df1,B df2,B pB FAB df1,AB df2,AB pAB
HLT 0.67 1 112 0.4153 1.17 3 112 0.3253 1.97 3 112 0.1229
PT 0.67 1 112 0.4153 1.17 3 112 0.3253 1.97 3 112 0.1229
WL 0.67 1 112 0.4153 1.17 3 112 0.3253 1.97 3 112 0.1229
HLT1 0.82 1 53.615 0.3687 1.70 2.6206 53.615 0.1846 2.26 2.6206 53.615 0.0998
PT1 0.82 1 53.615 0.3687 1.70 2.6206 53.615 0.1846 2.26 2.6206 53.615 0.0998
WL1 0.82 1 53.615 0.3687 1.70 2.6206 53.615 0.1846 2.26 2.6206 53.615 0.0998
HLT2 0.82 1 53.615 0.3687 1.70 2.6206 53.615 0.1846 2.26 2.6206 53.615 0.0998
PT2 0.82 1 53.615 0.3687 1.70 2.6206 53.615 0.1846 2.26 2.6206 53.615 0.0998
WL2 0.82 1 53.615 0.3687 1.70 2.6206 53.615 0.1846 2.26 2.6206 53.615 0.0998
HLT3 0.82 1 53.615 0.3687 1.70 2.6206 53.615 0.1846 2.26 2.6206 53.615 0.0998
PT3 0.82 1 53.615 0.3687 1.70 2.6206 53.615 0.1846 2.26 2.6206 53.615 0.0998
WL3 0.82 1 53.615 0.3687 1.70 2.6206 53.615 0.1846 2.26 2.6206 53.615 0.0998
E Univariate models for the two simulated data sets
Table 19: Test results for balanced data models. The first table uses y1 as response variable, the second uses y2 and the third uses y3.
Response: y1
Statistic FA df1,A df2,A pA FB df1,B df2,B pB FAB df1,AB df2,AB pAB
HLT 0.36 1 112 0.5482 0.81 3 112 0.4901 2.64 3 112 0.0530
PT 0.36 1 112 0.5482 0.81 3 112 0.4901 2.64 3 112 0.0530
WL 0.36 1 112 0.5482 0.81 3 112 0.4901 2.64 3 112 0.0530
HLT1 0.36 1 79.603 0.5486 0.81 2.795 79.603 0.4837 2.64 2.795 79.603 0.0591
PT1 0.36 1 79.603 0.5486 0.81 2.795 79.603 0.4837 2.64 2.795 79.603 0.0591
WL1 0.36 1 79.603 0.5486 0.81 2.795 79.603 0.4837 2.64 2.795 79.603 0.0591
HLT2 0.36 1 79.603 0.5486 0.81 2.795 79.603 0.4837 2.64 2.795 79.603 0.0591
PT2 0.36 1 79.603 0.5486 0.81 2.795 79.603 0.4837 2.64 2.795 79.603 0.0591
WL2 0.36 1 79.603 0.5486 0.81 2.795 79.603 0.4837 2.64 2.795 79.603 0.0591
HLT3 0.36 1 79.603 0.5486 0.81 2.795 79.603 0.4837 2.64 2.795 79.603 0.0591
PT3 0.36 1 79.603 0.5486 0.81 2.795 79.603 0.4837 2.64 2.795 79.603 0.0591
WL3 0.36 1 79.603 0.5486 0.81 2.795 79.603 0.4837 2.64 2.795 79.603 0.0591
Response: y2
Statistic FA df1,A df2,A pA FB df1,B df2,B pB FAB df1,AB df2,AB pAB
HLT 0.29 1 112 0.5928 0.05 3 112 0.9861 0.27 3 112 0.8467
PT 0.29 1 112 0.5928 0.05 3 112 0.9861 0.27 3 112 0.8467
WL 0.29 1 112 0.5928 0.05 3 112 0.9861 0.27 3 112 0.8467
HLT1 0.29 1 56.517 0.5939 0.05 1.8803 56.517 0.9458 0.27 1.8803 56.517 0.7506
PT1 0.29 1 56.517 0.5939 0.05 1.8803 56.517 0.9458 0.27 1.8803 56.517 0.7506
WL1 0.29 1 56.517 0.5939 0.05 1.8803 56.517 0.9458 0.27 1.8803 56.517 0.7506
HLT2 0.29 1 56.517 0.5939 0.05 1.8803 56.517 0.9458 0.27 1.8803 56.517 0.7506
PT2 0.29 1 56.517 0.5939 0.05 1.8803 56.517 0.9458 0.27 1.8803 56.517 0.7506
WL2 0.29 1 56.517 0.5939 0.05 1.8803 56.517 0.9458 0.27 1.8803 56.517 0.7506
HLT3 0.29 1 56.517 0.5939 0.05 1.8803 56.517 0.9458 0.27 1.8803 56.517 0.7506
PT3 0.29 1 56.517 0.5939 0.05 1.8803 56.517 0.9458 0.27 1.8803 56.517 0.7506
WL3 0.29 1 56.517 0.5939 0.05 1.8803 56.517 0.9458 0.27 1.8803 56.517 0.7506
Response: y3
Statistic FA df1,A df2,A pA FB df1,B df2,B pB FAB df1,AB df2,AB pAB
HLT 27.70 1 112 <.0001 18.53 3 112 <.0001 0.59 3 112 0.6233
PT 27.70 1 112 <.0001 18.53 3 112 <.0001 0.59 3 112 0.6233
WL 27.70 1 112 <.0001 18.53 3 112 <.0001 0.59 3 112 0.6233
HLT1 27.70 1 53.264 <.0001 18.53 1.8224 53.264 <.0001 0.59 1.8224 53.264 0.5431
PT1 27.70 1 53.264 <.0001 18.53 1.8224 53.264 <.0001 0.59 1.8224 53.264 0.5431
WL1 27.70 1 53.264 <.0001 18.53 1.8224 53.264 <.0001 0.59 1.8224 53.264 0.5431
HLT2 27.70 1 53.264 <.0001 18.53 1.8224 53.264 <.0001 0.59 1.8224 53.264 0.5431
PT2 27.70 1 53.264 <.0001 18.53 1.8224 53.264 <.0001 0.59 1.8224 53.264 0.5431
WL2 27.70 1 53.264 <.0001 18.53 1.8224 53.264 <.0001 0.59 1.8224 53.264 0.5431
HLT3 27.70 1 53.264 <.0001 18.53 1.8224 53.264 <.0001 0.59 1.8224 53.264 0.5431
PT3 27.70 1 53.264 <.0001 18.53 1.8224 53.264 <.0001 0.59 1.8224 53.264 0.5431
WL3 27.70 1 53.264 <.0001 18.53 1.8224 53.264 <.0001 0.59 1.8224 53.264 0.5431
Table 20: Test results for unbalanced model. The first table uses y1 as response variable,the second uses the y2 and the third uses y3.
Statistic  FA     df1,A  df2,A   pA      FB     df1,B   df2,B   pB      FAB    df1,AB  df2,AB  pAB
HLT        11.97  1      112     0.0008  3.31   3       112     0.0228  0.59   3       112     0.6218
PT         11.97  1      112     0.0008  3.31   3       112     0.0228  0.59   3       112     0.6218
WL         11.97  1      112     0.0008  3.31   3       112     0.0228  0.59   3       112     0.6218
HLT1       12.71  1      74.414  0.0006  3.51   2.869   74.414  0.0208  0.63   2.869   74.414  0.5920
PT1        12.71  1      74.414  0.0006  3.51   2.869   74.414  0.0208  0.63   2.869   74.414  0.5920
WL1        12.71  1      74.414  0.0006  3.51   2.869   74.414  0.0208  0.63   2.869   74.414  0.5920
HLT2       12.71  1      74.414  0.0006  3.51   2.869   74.414  0.0208  0.63   2.869   74.414  0.5920
PT2        12.71  1      74.414  0.0006  3.51   2.869   74.414  0.0208  0.63   2.869   74.414  0.5920
WL2        12.71  1      74.414  0.0006  3.51   2.869   74.414  0.0208  0.63   2.869   74.414  0.5920
HLT3       12.71  1      74.414  0.0006  3.51   2.869   74.414  0.0208  0.63   2.869   74.414  0.5920
PT3        12.71  1      74.414  0.0006  3.51   2.869   74.414  0.0208  0.63   2.869   74.414  0.5920
WL3        12.71  1      74.414  0.0006  3.51   2.869   74.414  0.0208  0.63   2.869   74.414  0.5920

Statistic  FA     df1,A  df2,A   pA      FB     df1,B   df2,B   pB      FAB    df1,AB  df2,AB  pAB
HLT        0.74   1      112     0.3917  0.56   3       112     0.6450  0.17   3       112     0.9163
PT         0.74   1      112     0.3917  0.56   3       112     0.6450  0.17   3       112     0.9163
WL         0.74   1      112     0.3917  0.56   3       112     0.6450  0.17   3       112     0.9163
HLT1       0.73   1      37.441  0.3989  0.55   1.9299  37.441  0.5766  0.17   1.9299  37.441  0.8390
PT1        0.73   1      37.441  0.3989  0.55   1.9299  37.441  0.5766  0.17   1.9299  37.441  0.8390
WL1        0.73   1      37.441  0.3989  0.55   1.9299  37.441  0.5766  0.17   1.9299  37.441  0.8390
HLT2       0.73   1      37.441  0.3989  0.55   1.9299  37.441  0.5766  0.17   1.9299  37.441  0.8390
PT2        0.73   1      37.441  0.3989  0.55   1.9299  37.441  0.5766  0.17   1.9299  37.441  0.8390
WL2        0.73   1      37.441  0.3989  0.55   1.9299  37.441  0.5766  0.17   1.9299  37.441  0.8390
HLT3       0.73   1      37.441  0.3989  0.55   1.9299  37.441  0.5766  0.17   1.9299  37.441  0.8390
PT3        0.73   1      37.441  0.3989  0.55   1.9299  37.441  0.5766  0.17   1.9299  37.441  0.8390
WL3        0.73   1      37.441  0.3989  0.55   1.9299  37.441  0.5766  0.17   1.9299  37.441  0.8390

Statistic  FA     df1,A  df2,A   pA      FB     df1,B   df2,B   pB      FAB    df1,AB  df2,AB  pAB
HLT        4.55   1      112     0.0350  10.89  3       112     <.0001  3.90   3       112     0.0107
PT         4.55   1      112     0.0350  10.89  3       112     <.0001  3.90   3       112     0.0107
WL         4.55   1      112     0.0350  10.89  3       112     <.0001  3.90   3       112     0.0107
HLT1       4.58   1      35.54   0.0392  10.96  1.773   35.54   0.0003  3.93   1.773   35.54   0.0331
PT1        4.58   1      35.54   0.0392  10.96  1.773   35.54   0.0003  3.93   1.773   35.54   0.0331
WL1        4.58   1      35.54   0.0392  10.96  1.773   35.54   0.0003  3.93   1.773   35.54   0.0331
HLT2       4.58   1      35.54   0.0392  10.96  1.773   35.54   0.0003  3.93   1.773   35.54   0.0331
PT2        4.58   1      35.54   0.0392  10.96  1.773   35.54   0.0003  3.93   1.773   35.54   0.0331
WL2        4.58   1      35.54   0.0392  10.96  1.773   35.54   0.0003  3.93   1.773   35.54   0.0331
HLT3       4.58   1      35.54   0.0392  10.96  1.773   35.54   0.0003  3.93   1.773   35.54   0.0331
PT3        4.58   1      35.54   0.0392  10.96  1.773   35.54   0.0003  3.93   1.773   35.54   0.0331
WL3        4.58   1      35.54   0.0392  10.96  1.773   35.54   0.0003  3.93   1.773   35.54   0.0331
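In each block of the tables above, the HLT, PT and WL rows are identical. This is because every analysis uses a single response (p = 1). In that case the three F approximations coded in the PROC IML macro of Appendix F.1 all reduce to the univariate ratio (B/f_H)/(W/f_G) on f_H and f_G degrees of freedom. The three modifications also return the same f_H and f_G when p = 1, so the rows suffixed 1, 2 and 3 agree as well.

The Python sketch below re-expresses those approximations for illustration only. Note that:

- the function and helper names are hypothetical;
- the small list-based matrix routines stand in for PROC IML built-ins;
- unlike the IML code, it assumes the usual convention of setting Rao's t to 1 when p^2 + f_H^2 - 5 <= 0, which avoids a division by zero.

```python
import math


def mat_mul(X, Y):
    """Matrix product for matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def mat_add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def trace(X):
    return sum(X[i][i] for i in range(len(X)))


def inverse_and_det(X):
    """Gauss-Jordan elimination with partial pivoting; returns (inverse, determinant)."""
    n = len(X)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(X)]
    det = 1.0
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if aug[piv][col] == 0.0:
            raise ValueError("matrix is singular")
        if piv != col:
            aug[col], aug[piv] = aug[piv], aug[col]
            det = -det
        pivot = aug[col][col]
        det *= pivot
        aug[col] = [v / pivot for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug], det


def f_approximations(B, W, f_h, f_g):
    """(F, df1, df2) for Wilks' lambda, Hotelling-Lawley trace and Pillai's trace."""
    p = len(W)
    W_inv, det_w = inverse_and_det(W)
    BW_inv, det_bw = inverse_and_det(mat_add(B, W))

    # Wilks' lambda = det(W)/det(B+W) with Rao's F approximation (t = 1 guard assumed)
    lam = det_w / det_bw
    denom = p ** 2 + f_h ** 2 - 5
    t = math.sqrt((p ** 2 * f_h ** 2 - 4) / denom) if denom > 0 else 1.0
    k = f_g - 0.5 * (p - f_h + 1)
    r = p * f_h / 2 - 1
    df1, df2 = p * f_h, k * t - r
    wilks = ((lam ** (-1 / t) - 1) * df2 / df1, df1, df2)

    s = min(p, f_h)
    m = (abs(p - f_h) - 1) / 2
    n = (f_g - p - 1) / 2
    df1 = s * (2 * m + s + 1)

    # Hotelling-Lawley trace U = tr(B W^-1)
    U = trace(mat_mul(B, W_inv))
    df2 = 2 * (n * s + 1)
    hlt = (U * df2 / (s * df1), df1, df2)

    # Pillai's trace V = tr((B + W)^-1 B)
    V = trace(mat_mul(BW_inv, B))
    df2 = s * (2 * n + s + 1)
    pt = (df2 / df1 * V / (s - V), df1, df2)
    return {"WL": wilks, "HLT": hlt, "PT": pt}
```

For example, `f_approximations([[6.0]], [[30.0]], 3, 112)` returns the same F value for all three statistics, 112/15 (about 7.47) on (3, 112) degrees of freedom.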
F Codes
F.1 SAS codes
This SAS program calculates the modified and standard MANOVA test statistics analyzed in this master thesis.
/*--------------------------------------------------------------------
The libname IPUMS is created; all output is saved there. The data from
http://usa.ipums.org/ are extracted as a .csv file
and formatted in Excel. Importing the file "uppsatsdata.xlsx" into SAS.
------------------------------------------------------------------*/
%put ?????---;
libname IPUMS "F:\";
ods listing;
proc import out=ipums.data
datafile="F:\uppsatsdata.xlsx" dbms=excel replace;
getnames=yes;
run;
/*---------------------------------------------------------------
Creating a macro for importing simulated datasets from MATLAB.
Renaming variables.
-----------------------------------------------------------------*/
%macro matlab(in,out=,dbms=);
proc import out=&out datafile=&in dbms=&dbms replace;
getnames=no;
data &out;
set &out;
sex=VAR1;
area=VAR2;
y1=VAR3;
y2=VAR4;
y3=VAR5;
drop VAR1-VAR5;
if VAR1=1 and VAR2=1 then group=1;
else if VAR1=1 and VAR2=2 then group=2;
else if VAR1=1 and VAR2=3 then group=3;
else if VAR1=1 and VAR2=4 then group=4;
else if VAR1=2 and VAR2=1 then group=5;
else if VAR1=2 and VAR2=2 then group=6;
else if VAR1=2 and VAR2=3 then group=7;
else group=8;
run;
%mend matlab;
%matlab("F:\Matlab\myfile1.txt",out=ipums.simdata1,dbms=csv)
%matlab("F:\Matlab\myfile2.txt",out=ipums.simdata2,dbms=csv)
%matlab("F:\Matlab\myfile3.txt",out=ipums.simdata3,dbms=csv)
/*----------------------------------------------------------
Checking the content/variables of the Ipums data.
Variables that are possibly important/relevant are kept.
------------------------------------------------------------*/
proc contents data=ipums.data;
run;
Proc freq data=ipums.data nlevels;
tables sex region sex*region;
proc means data=ipums.data n nmiss min max mean std kurt skew;
var inctot incwage ftotinc occscore sei hwsei;
where inctot<999999 & incwage<999999 & ftotinc<9999999 &
0<occscore & 0<sei & 0<hwsei;
proc univariate data=ipums.data noprint;
var inctot incwage ftotinc occscore sei hwsei;
where inctot<999999 & incwage<999999 & ftotinc<9999999 &
0<occscore & 0<sei & 0<hwsei;
histogram;
run;
/*-----------------------------------------------------
Put constraints on the imported data.
--------------------------------------------------------*/
data ipums.data11;
set ipums.data (keep=age sex region occscore sei hwsei inctot);
where 0<inctot<999999 & 0<occscore & 0<sei & 0<hwsei & age>=16;
*if inctot=0 and incwage=0 and ftotinc=0 then delete;
if region in (11,12,13) then regions="northeast" ;
else if region in (21,22,23) then regions="midwest";
else if region in (31,32,33) then regions="south" ;
else if region in (41,42,43) then regions="west" ;
else regions="Missing"; /* area is set to 0 below */
if region in (11,12,13) then area=1;
else if region in (21,22,23) then area=2;
else if region in (31,32,33) then area=3;
else if region in (41,42,43) then area=4;
else area=0;
if SEX=1 then Gender="Male ";
else Gender="Female";
if sex=1 and region in (21,22,23) then group=1;
else if sex=1 and region in (11,12,13) then group=2;
else if sex=1 and region in (31,32,33) then group=3;
else if sex=1 and region in (41,42,43) then group=4;
else if sex=2 and region in (21,22,23) then group=5;
else if sex=2 and region in (11,12,13) then group=6;
else if sex=2 and region in (31,32,33) then group=7;
else group=8;
drop region;
run;
proc sort data=ipums.data11 out=ipums.data11;
by sex area;
run;
/*------------------------------------------------------------------
Splitting the data based on the 8 factor combinations to create
samples for each combination from which final samples are drawn.
------------------------------------------------------------------*/
%macro datasets(sex=,area=,out=);
data &out;
set ipums.data11;
if sex=&sex and area=&area then do;
output &out;
end;
run;
%mend datasets;
%datasets(sex=1,area=1,out=ipums.data01);
%datasets(sex=1,area=2,out=ipums.data02);
%datasets(sex=1,area=3,out=ipums.data03);
%datasets(sex=1,area=4,out=ipums.data04);
%datasets(sex=2,area=1,out=ipums.data05);
%datasets(sex=2,area=2,out=ipums.data06);
%datasets(sex=2,area=3,out=ipums.data07);
%datasets(sex=2,area=4,out=ipums.data08);
/*----------------------------------------------------------------
Macro for sampling observations from each of the 8 factor
combinations. These obs. are later combined to final samples.
----------------------------------------------------------------*/
%macro c(out1,out2,out3,out4,out5,out6,out7,out8,
a1=,a2=,a3=,a4=,a5=,a6=,a7=,a8=);
%SAMPLE(ipums.data01,&out1,0.5,MRSS=&a1,OVERSAM=0);
%SAMPLE(ipums.data02,&out2,0.5,MRSS=&a2,OVERSAM=0);
%SAMPLE(ipums.data03,&out3,0.5,MRSS=&a3,OVERSAM=0);
%SAMPLE(ipums.data04,&out4,0.5,MRSS=&a4,OVERSAM=0);
%SAMPLE(ipums.data05,&out5,0.5,MRSS=&a5,OVERSAM=0);
%SAMPLE(ipums.data06,&out6,0.5,MRSS=&a6,OVERSAM=0);
%SAMPLE(ipums.data07,&out7,0.5,MRSS=&a7,OVERSAM=0);
%SAMPLE(ipums.data08,&out8,0.5,MRSS=&a8,OVERSAM=0);
%mend c;
/*-------------------------------------------------------------------
The macro below is written by Chang Jiang and is collected
from: http://www.nesug.org/proceedings/nesug00/ps/ps7012.pdf.
It draws a systematic sample (sampling interval EM/FSS, start set by RAND) from a larger dataset.
-------------------------------------------------------------------*/
%MACRO SAMPLE(EMDS,SAMPLE,RAND,MRSS=,OVERSAM=0.05);
DATA _NULL_;
FSS=CEIL(&MRSS*(1+&OVERSAM));
CALL SYMPUT('FSS',LEFT(PUT(FSS,8.)));
RUN;
/* get the number of FSS and store it in &FSS */
DATA _NULL_;
IF 0 THEN SET &EMDS NOBS=EM;
CALL SYMPUT('EM', LEFT(PUT(EM,8.)));
STOP;
RUN;
/* get the number of EM and store it in &EM at compile time */
DATA &EMDS; SET &EMDS;
OBSNUM=_N_;
/*use OBSNUM to track chosen members */
RUN;
DATA _NULL_;
N=FLOOR(&EM/&FSS);
START=MAX(ROUND(&RAND*N),1);
/* round START using .5 rule */
CALL SYMPUT('N', LEFT(PUT(N,8.)));
CALL SYMPUT('START',LEFT(PUT(START,8.)));
RUN;
DATA &SAMPLE(DROP=I);
LENGTH LIST $7;
DO I=1 TO &FSS;
OBSIN=&START+FLOOR((I-1)*(&EM/&FSS));
SET &EMDS POINT=OBSIN;
/*draw members by their observation #*/
IF I <= &MRSS THEN LIST='PRIMARY';
ELSE LIST='AUXILIA';
OUTPUT;
END;
STOP;
RUN;
%PUT EM=&EM MRSS=&MRSS FSS=&FSS N=&N START=&START;
/* output the values of these macro variables to SAS LOG */
%MEND SAMPLE;
/*------------------------------------------------------------------
This macro appends datasets.
-----------------------------------------------------------------*/
%macro append(in,out,a=);
data &out;
set
%do i = 1 %to &a;
&in&i
%end;
;
run;
%mend append;
/*-------------------------------------------------------------------------------
Calculating the log of the variables to give a better fit to a normal distribution.
----------------------------------------------------------------------------------*/
%macro log(in=,out=);
data &out;
set &in;
loginc=log(inctot);
loghw=log(hwsei);
logocc=log(occscore);
keep group sex area loginc loghw logocc;
run;
%mend log;
/*-----------------------------------------------------------------
Run macros c, append and log to obtain 3 samples, n=120. 1 sample is
balanced and 2 are unbalanced. All response variables are logged.
-------------------------------------------------------------------*/
%c(ipums.nytest1,ipums.nytest2,ipums.nytest3,ipums.nytest4,
ipums.nytest5,ipums.nytest6,ipums.nytest7,ipums.nytest8,
a1=15,a2=15,a3=15,a4=15,a5=15,a6=15,a7=15,a8=15)
%append(ipums.nytest,ipums.urval10,a=8)
%log(in=ipums.urval10,out=ipums.urval1)
%c(ipums.nytest1,ipums.nytest2,ipums.nytest3,ipums.nytest4,
ipums.nytest5,ipums.nytest6,ipums.nytest7,ipums.nytest8,
a1=9,a2=7,a3=14,a4=10,a5=18,a6=15,a7=29,a8=18)
%append(ipums.nytest,ipums.urval11,a=8)
%log(in=ipums.urval11,out=ipums.urval2)
%c(ipums.nytest1,ipums.nytest2,ipums.nytest3,ipums.nytest4,
ipums.nytest5,ipums.nytest6,ipums.nytest7,ipums.nytest8,
a1=10,a2=10,a3=10,a4=30,a5=10,a6=10,a7=10,a8=30)
%append(ipums.nytest,ipums.urval12,a=8)
%log(in=ipums.urval12,out=ipums.urval3)
/*-----------------------------------------------------------------------------
A macro with the program for obtaining modified test statistics as proposed by
Zhang and Xiao (2012) using proc iml. All results are collected and outputted.
-----------------------------------------------------------------------------*/
%macro teststatistics(var1,var2,var3,in=,out=);
ods listing;
/* Loading data into proc iml and naming variables */
proc iml;
use &in;
read all var {sex} into x2;
read all var {area} into x3;
read all var {&var1 &var2 &var3} into y;
close &in;
rows=nrow(y);
p=ncol(y); /*dimensions of y*/
a=2; /*number of levels of factor a*/
b=4; /*number of levels of factor b*/
ab=a*b; /*number of combinations*/
ny1=J(rows,1,0);
ny2=J(rows,1,0);
ny3=J(rows,1,0);
ny4=J(rows,1,0);
ny5=J(rows,1,0);
ny6=J(rows,1,0);
ny7=J(rows,1,0);
ny8=J(rows,1,0);
do i=1 to rows;
if x2[i]=1 & x3[i]=1 then ny1[i]=1;
else if x2[i]=1 & x3[i]=2 then ny2[i]=1;
else if x2[i]=1 & x3[i]=3 then ny3[i]=1;
else if x2[i]=1 & x3[i]=4 then ny4[i]=1;
else if x2[i]=2 & x3[i]=1 then ny5[i]=1;
else if x2[i]=2 & x3[i]=2 then ny6[i]=1;
else if x2[i]=2 & x3[i]=3 then ny7[i]=1;
else ny8[i]=1;
end;
n=sum(ny1)//sum(ny2)//sum(ny3)//sum(ny4)//
sum(ny5)//sum(ny6)//sum(ny7)//sum(ny8);
print n; /*number of observations for each ij*/
/* Creating means and covariances for each ij combination.
The row slicing below assumes the data are sorted by sex, then area. */
covij=j(ab*p,p,0);
meanij=j(ab,p,0);
do i=1 to ab;
jn=J(n[i],1);
jnn=J(n[i]);
in=I(n[i]);
s=y[sum(n[1:i])-(n[i]-1):sum(n[1:i]),];
meanij[i,]=((1/n[i])*(jn`*s)`)`;
covij[(i*p)-(p-1):i*p,]=(1/(n[i]-1))*(s`*(in-(1/n[i])*jnn)*s);
end;
/*calculating covariance estimator G.*/
G1=J(p,p,0);
print G1;
do i=1 to ab;
G1=G1+(1/n[i])*covij[(p*i)-(p-1):p*i,];
end;
G=(1/ab)*G1;
print meanij covij G;
/*calculating degrees of freedom fH, fG for 3 modifications*/
traceG2=trace(G*G);
trace2G=trace(G)**2;
sumij1=0;
sumij2=0;
sumij3=0;
fG=J(3,1,0);
do i=1 to ab;
Sij=covij[(p*i)-(p-1):p*i,];
/* method of Harrar and Bathke */
trace1=(1/(n[i]-1))*(n[i]**-2)*trace(Sij**2);
sumij1=sumij1+trace1;
/* method 1 of Zhang and Xiao */
trace2=(1/(n[i]-1))*(n[i]**-2)*(trace(Sij**2)+trace(Sij)**2);
sumij2=sumij2+trace2;
/* method 2 of Zhang and Xiao */
trace3=(1/(n[i]-1))*(n[i]**-2)*(trace((Sij*inv(G))**2)+
trace(Sij*inv(G))**2);
sumij3=sumij3+trace3;
end;
fG[1]=(ab**2)*traceG2/sumij1;
fG[2]=(ab**2)*(traceG2+trace2G)/sumij2;
fG[3]=(ab**2)*p*(p+1)/sumij3;
print sumij1 sumij2 sumij3 fG;
/* calculating design weights C needed for calculation of fH.
C depends on the hypothesis tested.*/
C_A=(1/(a-1))*((I(a)-(J(a)/a))@(J(b)/b));
C_B=(1/(b-1))*((J(a)/a)@(I(b)-(J(b)/b)));
C_AB=(1/((b-1)*(a-1)))*((I(a)-(J(a)/a))@(I(b)-(J(b)/b)));
print C_A C_B C_AB;
/*Defining a covariance matrix in which each S_ij is already divided by n_ij.*/
covar=J(p*ab,p,0);
do i=1 to ab;
covar[(p*i)-(p-1):p*i,]=covij[(p*i)-(p-1):p*i,]/n[i];
end;
print covar;
/* calculating fH for 3 hypotheses and 3 methods.*/
q=3; /* number of hypotheses*/
cov_ab=covar;
fH=J(3,q,0);
do m=1 to q;
summa1=0;
summa2=0;
summa3=0;
if m=1 then C=C_A;
else if m=2 then C=C_B;
else if m=3 then C=C_AB;
do i=1 to ab;
do j=1 to ab;
Sj=cov_ab[(p*j)-(p-1):p*j,];
Si=covar[(p*i)-(p-1):p*i,];
temp1=(C[i,j]**2)*trace(Si*Sj);
temp2=(C[i,j]**2)*(trace(Si*Sj)+
trace(Si)*trace(Sj));
temp3=(C[i,j]**2)*(trace(Si*inv(G)*Sj*inv(G))
+trace(Si*inv(G))*trace(Sj*inv(G)));
summa1=summa1+temp1;
summa2=summa2+temp2;
summa3=summa3+temp3;
end;
end;
/*rows=method. 1=Harrar&Bathke, 2=Zhang&Xiao1, 3=Zhang&Xiao2.
columns=hypothesis. 1=A, 2=B, 3=AB*/
fH[1,m]=traceG2/summa1;
fH[2,m]=(traceG2+trace2G)/summa2;
fH[3,m]=p*(p+1)/summa3;
end;
print fH;
/*calculating modified test statistics. Defining H=vmu`*C*vmu.*/
T_wlr=J(3,q,0); /*modified Wilks Lambda*/
T_lht=J(3,q,0); /*modified Hotelling-Lawley trace*/
T_bnp=J(3,q,0); /*modified Pillai trace*/
W1=J(3*p,q*p,0);
H=J(3,3,0);
W2=fG[1]*G//fG[2]*G//fG[3]*G;
do k=1 to 3; /*method*/
do m=1 to 3; /*hypothesis*/
if m=1 then C=C_A;
else if m=2 then C=C_B;
else if m=3 then C=C_AB;
W1[(p*k)-(p-1):p*k,p*(m-1)+1:p*m]=fH[k,m]*(meanij`*C*meanij);
T_wlr[k,m]=-log(det(W2[(p*k)-(p-1):p*k,])/det(W1[(p*k)
-(p-1):p*k,(p*m)-(p-1):p*m]+W2[(p*k)-(p-1):p*k,]));
/* matrix quotients use *inv(); the / operator divides elementwise */
T_lht[k,m]=trace(W1[(p*k)-(p-1):p*k,(p*m)-(p-1):p*m]*
inv(W2[(p*k)-(p-1):p*k,]));
T_bnp[k,m]=trace(W1[(p*k)-(p-1):p*k,(p*m)-(p-1):p*m]*inv(W1[(p*k)
-(p-1):p*k,(p*m)-(p-1):p*m]+W2[(p*k)-(p-1):p*k,]));
end;
end;
print W1 W2;
/*calculating F-approximations of T_wlr, T_lht and T_bnp*/
tabell=J(9,12,0);
do k=1 to 3; /*method*/
do m=1 to q; /*hypothesis. q=3*/
B=W1[(p*k)-(p-1):p*k,p*(m-1)+1:p*m];
W=W2[(p*k)-(p-1):p*k,];
f_H=fH[k,m];
f_G=fG[k];
teststat=J(3,1,0);
ps=J(3,1,0);
df=J(3,2,0);
do t=1 to 3; /*Test statistic*/
if t=1 then do; /* Wilks Lambda*/
stat=det(W*inv(B+W));
mo=f_H;
no=f_G;
ko=no-0.5*(p-mo+1);
ro=p*mo/2-1;
so=sqrt(((p*mo)**2-4)/(p**2+mo**2-5));
df_1=p*mo;
df_2=ko*so-ro;
f_stat=(stat**(-1/so)-1)*(df_2/df_1);
pval=1-probf(f_stat,df_1,df_2);
end;
if t=2 then do; /*Hotelling-Lawley trace*/
stat1=trace(B*inv(W));
d1=f_H;
d2=f_G;
s1=min(p,d1);
m1=(abs(p-d1)-1)/2;
n1=(d2-p-1)/2;
df_1=s1*(2*m1+s1+1);
df_2=2*(n1*s1+1);
f_stat=(stat1*df_2)/(s1*df_1);
pval=1-probf(f_stat,df_1,df_2);
end;
if t=3 then do; /*Pillai trace*/
stat2=trace(inv(B+W)*B);
d1=f_H;
d2=f_G;
s1=min(p,d1);
m1=(abs(p-d1)-1)/2;
n1=(d2-p-1)/2;
df_1=s1*((2*m1)+s1+1);
df_2=s1*((2*n1)+s1+1);
f_stat=(df_2/df_1)*(stat2/(s1-stat2));
pval=1-probf(f_stat,df_1,df_2);
end;
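/* t=1 Wilks, t=2 HLT, t=3 Pillai; store rows in the order HLT, PT, WL
to match the Statistic labels below and the sorted PROC GLM output */
r=mod(t+1,3)+1;
teststat[r]=f_stat;
ps[r]=pval;
df[r,1]=df_1;
df[r,2]=df_2;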
end;
tab=teststat||df||ps;
tabell[(3*k)-2:3*k,(4*m)-3:4*m]=tab;
end;
end;
print tabell;
/*Creating a table with all results, exporting to a sas data set.*/
b={"FValue_A" "NumDf_A" "DenDf_A" "ProbF_A" "FValue_B" "NumDf_B" "DenDf_B"
"ProbF_B" "FValue_AB" "NumDf_AB" "DenDf_AB" "ProbF_AB"};
create &out from tabell [ colname=b ];
append from tabell;
Statistic={"Hotelling-Lawley Trace", "Pillai's Trace" ,"Wilks' Lambda",
"Hotelling-Lawley Trace", "Pillai's Trace" ,"Wilks' Lambda",
"Hotelling-Lawley Trace", "Pillai's Trace" ,"Wilks' Lambda"};
name={"Statistic"};
create ipums.name from Statistic [ colname=name ];
append from Statistic;
quit; /*end of proc iml program*/
%mend teststatistics;
/*--------------------------------------------------------------------
The below macro runs the 2-way MANOVA model on the samples.
Extracting results from standard test statistics. This is done
for comparison of the modified MANOVA statistics.
--------------------------------------------------------------------*/
%macro manova(var1,var2,var3,in=,in2=,out=);
data ipums.name;
set ipums.name;
n=_n_;
data &in2;
set &in2;
n=_n_;
data &in2;
merge ipums.name &in2;
by n;
drop n;
run;
ods listing close;
ods trace on;
ods output "Multivariate Tests"=ipums.manova;
proc glm data=&in plots=none;
class sex area;
model &var1 &var2 &var3 = sex area sex*area /solution P;
manova h=_all_ / printh;
run;
quit;
ods output close;
ods listing;
data ipums.manova;
set ipums.manova;
if _n_ in (4,8,12) then delete; /* drop the rows for Roy greatest root */
drop hypothesis error pvalue Value;
data ipums.manova1;
set ipums.manova;
if _n_ in (1,2,3);
data ipums.manova2;
set ipums.manova;
if _n_ in (4,5,6);
data ipums.manova3;
set ipums.manova;
if _n_ in (7,8,9);
proc sort data=ipums.manova1 out=ipums.manova1;
by statistic;
proc sort data=ipums.manova2 out=ipums.manova2;
by statistic;
proc sort data=ipums.manova3 out=ipums.manova3;
by statistic;
data ipums.manova;
merge ipums.manova1 (rename=(Fvalue=FValue_A NumDF=NumDF_A
DenDF=DenDF_A ProbF=ProbF_A))
ipums.manova2 (rename=(Fvalue=FValue_B NumDF=NumDF_B
DenDF=DenDF_B ProbF=ProbF_B))
ipums.manova3 (rename=(Fvalue=FValue_AB NumDF=NumDF_AB
DenDF=DenDF_AB ProbF=ProbF_AB));
by statistic;
proc append base=ipums.manova data=&in2;
data &out;
set ipums.manova(rename=(FValue_A=F_a NumDF_A=Df_a1 DenDF_A=Df_a2
ProbF_A=Pa FValue_B=F_b NumDF_B=Df_b1 DenDF_B=Df_b2 ProbF_B=Pb
FValue_AB=F_ab NumDF_AB=Df_ab1 DenDF_AB=Df_ab2 ProbF_AB=Pab));
run;
%mend manova;
/*-------------------------------------------------------------------
Macro extracting the results of Box's M test (exported to LaTeX later by the macro resultat).
---------------------------------------------------------------------*/
%macro boxmtest(var1,var2,var3,out,in=);
ods listing close;
ods output "Homogeneity Test"=&out;
proc discrim data=&in pool=test wcov;
class group;
var &var1 &var2 &var3;
run;
ods output close;
ods listing;
%mend boxmtest;
/*-------------------------------------------------------------------
macro for generating univariate residuals which are later analyzed.
--------------------------------------------------------------------*/
%macro univnorm(var1,var2,var3,in=,out=);
proc glm data=&in PLOTS=none noprint;
class sex area;
model &var1 &var2 &var3=sex area sex*area /solution P;
manova h=_all_ / printh;
output out=&out residual=res1-res3;
run;
quit;
proc univariate data=&out noprint;
var res1 res2 res3;
histogram res1 res2 res3 /normal ;
run;
%mend univnorm;
/*--------------------------------------------------------------------------------
Importing macro %multnorm which was collected from http://www.srce.unizg.hr/
fileadmin/Srce/proizvodi_usluge/referalni_centri/SAS/stat-sasprog/MultNormMacro.sas
---------------------------------------------------------------------------------*/
%inc "F:\multnorm.sas";
/*--------------------------------------------------------------
macro for getting all results from the iml program, univariate
and multivariate data analysis, MANOVA models and Box's M test.
---------------------------------------------------------------*/
%macro resultat(var1,var2,var3,outbox,outuni,in1=,in2=,in3=,
out1=,out2=,out3=,file1=,file2=,file3=);
/*modified and standard MANOVA test results for the 3 samples*/
%teststatistics(&var1,&var2,&var3,in=&in1,out=ipums.teststats)
%manova(&var1,&var2,&var3,in=&in1,in2=ipums.teststats,out=&out1)
%teststatistics(&var1,&var2,&var3,in=&in2,out=ipums.teststats)
%manova(&var1,&var2,&var3,in=&in2,in2=ipums.teststats,out=&out2)
%teststatistics(&var1,&var2,&var3,in=&in3,out=ipums.teststats)
%manova(&var1,&var2,&var3,in=&in3,in2=ipums.teststats,out=&out3)
/*Box M test*/
%boxmtest(&var1,&var2,&var3,ipums.Box1,in=&in1)
%boxmtest(&var1,&var2,&var3,ipums.Box2,in=&in2)
%boxmtest(&var1,&var2,&var3,ipums.Box3,in=&in3)
data &outbox;
set ipums.Box1 ipums.Box2 ipums.Box3;
Chi=round(ChiSq,0.1);
n=_n_;
data &outbox;
merge &outbox(drop=DF ProbChiSq ChiSq)
&outbox(drop=Chi ChiSq);
by n;
drop n;
run;
/*modified and standard ANOVA test results for the 3 samples*/
%teststatistics(&var1,in=&in1,out=ipums.teststats20)
%manova(&var1,in=&in1,in2=ipums.teststats20,out=ipums.sum1)
%teststatistics(&var2,in=&in1,out=ipums.teststats20)
%manova(&var2,in=&in1,in2=ipums.teststats20,out=ipums.sum2)
%teststatistics(&var3,in=&in1,out=ipums.teststats20)
%manova(&var3,in=&in1,in2=ipums.teststats20,out=ipums.sum3)
%teststatistics(&var1,in=&in2,out=ipums.teststats20)
%manova(&var1,in=&in2,in2=ipums.teststats20,out=ipums.sum4)
%teststatistics(&var2,in=&in2,out=ipums.teststats20)
%manova(&var2,in=&in2,in2=ipums.teststats20,out=ipums.sum5)
%teststatistics(&var3,in=&in2,out=ipums.teststats20)
%manova(&var3,in=&in2,in2=ipums.teststats20,out=ipums.sum6)
%teststatistics(&var1,in=&in3,out=ipums.teststats20)
%manova(&var1,in=&in3,in2=ipums.teststats20,out=ipums.sum7)
%teststatistics(&var2,in=&in3,out=ipums.teststats20)
%manova(&var2,in=&in3,in2=ipums.teststats20,out=ipums.sum8)
%teststatistics(&var3,in=&in3,out=ipums.teststats20)
%manova(&var3,in=&in3,in2=ipums.teststats20,out=ipums.sum9)
%append(ipums.sum,&outuni,a=9)
/*generating residual plots and exporting results*/
ods tagsets.simplelatex file=&file2 stylesheet="sas.sty"(url="sas");
%univnorm(&var1,&var2,&var3,in=&in1,out=ipums.residlog1);
%univnorm(&var1,&var2,&var3,in=&in2,out=ipums.residlog2);
%univnorm(&var1,&var2,&var3,in=&in3,out=ipums.residlog3);
ods tagsets.simplelatex close;
/*generating information about multivariate normality and exporting results*/
ods tagsets.simplelatex file=&file3 stylesheet="sas.sty"(url="sas");
%multnorm(data=&in1, var=&var1 &var2 &var3, plot=mult)
%multnorm(data=&in2, var=&var1 &var2 &var3, plot=mult)
%multnorm(data=&in3, var=&var1 &var2 &var3, plot=mult)
ods tagsets.simplelatex close;
*Exporting other results obtained above;
ods tagsets.simplelatex file=&file1 stylesheet="sas.sty"(url="sas");
proc print data=&out1;
proc print data=&out2;
proc print data=&out3;
proc print data=&outbox;
proc print data=&outuni;
run;
ods tagsets.simplelatex close;
%mend resultat; /*End of all macros*/
%put *****;
/*------------------------------------------------------------------
Obtaining results by running above macros. Do this for 2 situations:
1. 3 real life data samples obtained above
2. 3 simulated data sets imported from MATLAB
--------------------------------------------------------------------*/
/*simulated data*/
%resultat(y1,y2,y3,ipums.box01,ipums.sum10,in1=ipums.simdata1,
in2=ipums.simdata2,in3=ipums.simdata3,out1=ipums.manovatab11,
out2=ipums.manovatab12,out3=ipums.manovatab13,
file1="F:\Latex\simulations01.tex",file2="F:\Latex\univar01.tex",
file3="F:\Latex\multivar01.tex")
/*real life data*/
%resultat(loginc,loghw,logocc,ipums.box02,ipums.sum20,in1=ipums.urval1,
in2=ipums.urval2,in3=ipums.urval3,out1=ipums.manovatab21,
out2=ipums.manovatab22,out3=ipums.manovatab23,
file1="F:\Latex\urval01.tex",file2="F:\Latex\univar02.tex",
file3="F:\Latex\multivar02.tex")
/*----------------------------------------------------------------
END OF PROGRAM!
----------------------------------------------------------------*/
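The pooled estimator G = (1/ab) Σ S_ij/n_ij and the three error degrees of freedom f_G, computed at the start of the PROC IML macro above, can be sketched in Python as below. This is an illustrative re-expression rather than the thesis code. `g_and_fg` and the list-based matrix helpers are hypothetical names, and the cell covariance matrices S_ij and sizes n_ij are assumed to be supplied in the same cell order as in the SAS program.

One check on the formulas: suppose all ab = 8 cells shared the same sample covariance matrix and n_ij = 15. Then each method would give f_G = ab(n - 1) = 112, the error degrees of freedom of the standard tests.

```python
def mat_mul(X, Y):
    """Matrix product for matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def trace(X):
    return sum(X[i][i] for i in range(len(X)))


def inverse(X):
    """Gauss-Jordan inverse with partial pivoting (adequate for small p)."""
    n = len(X)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(X)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if aug[piv][col] == 0.0:
            raise ValueError("matrix is singular")
        aug[col], aug[piv] = aug[piv], aug[col]
        pivot = aug[col][col]
        aug[col] = [v / pivot for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def g_and_fg(covs, ns):
    """Return G and [f_G for Harrar-Bathke, Zhang-Xiao 1, Zhang-Xiao 2].

    covs: list of ab sample covariance matrices S_ij (p x p); ns: cell sizes n_ij.
    """
    ab, p = len(covs), len(covs[0])
    # G = (1/ab) * sum_ij S_ij / n_ij
    G = [[sum(S[r][c] / n for S, n in zip(covs, ns)) / ab for c in range(p)]
         for r in range(p)]
    G_inv = inverse(G)
    tr_G2 = trace(mat_mul(G, G))          # tr(G^2)
    tr2_G = trace(G) ** 2                 # tr(G)^2
    sum1 = sum2 = sum3 = 0.0
    for S, n in zip(covs, ns):
        w = 1.0 / ((n - 1) * n ** 2)      # (n_ij - 1)^-1 * n_ij^-2
        tr_S2 = trace(mat_mul(S, S))
        SG = mat_mul(S, G_inv)
        sum1 += w * tr_S2
        sum2 += w * (tr_S2 + trace(S) ** 2)
        sum3 += w * (trace(mat_mul(SG, SG)) + trace(SG) ** 2)
    f_g = [ab ** 2 * tr_G2 / sum1,
           ab ** 2 * (tr_G2 + tr2_G) / sum2,
           ab ** 2 * p * (p + 1) / sum3]
    return G, f_g
```

With a single response (p = 1) the three entries of `f_g` coincide, whatever the cell variances and sizes are.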
F.2 MATLAB codes
This MATLAB program was written to obtain the simulated data sets.
%----------------------------------------------------------
% Simulation of a balanced data set with 120 observations.
% The code is based on simulation methods presented
% in Zhang & Xiao (2012).
%----------------------------------------------------------
nsim=2;
A = repmat(0, [120 5 nsim]);
kkdelta=2.4; %determining differences in means;
hSigma2 = [1,0,0; 0,5,0; 0,0,0.1];
hSigma = [];
hSigma=[hSigma; eye(3,3)]; %defining 2 covariance structures.
hSigma=[hSigma; hSigma2];
hSigma=kron(ones(4,1), hSigma);
Gsize = [15, 15];
gsize=kron(ones(4,1), Gsize);
p=3; %dimensions of y;
u=[1:3]/sqrt(sum([1:3].^2));
data = [];
ij=0;
a=2; %number of levels for factor A;
b=4; %number of levels for factor B;

% generates the data;
for ia=1:2,
    for ib=1:4,
        ij=ij+1;
        ijflag=((ij-1)*p+1):(ij*p);
        nij=gsize(ij);
        if (ia==1)&&(ib==1),
            yij=randn(nij,p)*hSigma(ijflag,:);
        else
            yij=ones(nij,1)*kkdelta*u*ia*ib/b/a+randn(nij,p)*hSigma(ijflag,:);
        end
        ijdata=[ia*ones(nij,1), ib*ones(nij,1), yij];
        data=[data; ijdata];
    end
end
A(:,:,1)=data;

%------------------------------------------------------
% Simulation of an unbalanced data set
% with 120 observations. Covariance structures
% and mean differences are the same as above.
%------------------------------------------------------
kkdelta=2.4;
hSigma2 = [1,0,0; 0,5,0; 0,0,0.1];
hSigma = [];
hSigma=[hSigma; eye(3,3)];
hSigma=[hSigma; hSigma2];
hSigma=kron(ones(4,1), hSigma);
Gsize = [10, 20];
gsize=kron(ones(4,1), Gsize);
p=3;
u=[1:3]/sqrt(sum([1:3].^2));
data = [];
ij=0;
a=2;
b=4;
for ia=1:2,
    for ib=1:4,
        ij=ij+1;
        ijflag=((ij-1)*p+1):(ij*p);
        nij=gsize(ij);
        if (ia==1)&&(ib==1),
            yij=randn(nij,p)*hSigma(ijflag,:);
        else
            yij=ones(nij,1)*kkdelta*u*ia*ib/b/a+randn(nij,p)*hSigma(ijflag,:);
        end
        ijdata=[ia*ones(nij,1), ib*ones(nij,1), yij];
        data=[data; ijdata];
    end
end
A(:,:,2)=data;
data1=A(:,:,1);
data2=A(:,:,2);

%exporting the data to text files;
dlmwrite('myfile1.txt', data1)
dlmwrite('myfile2.txt', data2)

%end of program;
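A Python re-expression of the MATLAB generator may help when reading it. It is a sketch under stated assumptions, not the code used in the thesis. The names `simulate` and `cell_mean` are hypothetical, and `random.gauss` replaces `randn`, so the draws will differ from MATLAB's.

The sketch makes two indexing details explicit:

- **Cell sizes.** `gsize` is a 4 x 2 matrix read with a single (column-major) index. Cells 1-4 (factor A level 1) therefore get `Gsize(1)` observations and cells 5-8 get `Gsize(2)`, so `Gsize = [10, 20]` yields 40 + 80 = 120 rows.
- **Covariance structure.** The 3-row blocks of `hSigma` alternate between `eye(3,3)` and `hSigma2`. Cells with an odd B level therefore have covariance I, and cells with an even B level have covariance hSigma2' * hSigma2 = diag(1, 25, 0.01).

```python
import math
import random

P, A_LEVELS, B_LEVELS = 3, 2, 4
U = [k / math.sqrt(14) for k in (1, 2, 3)]     # u = [1:3]/sqrt(sum([1:3].^2))
ROOTS = [
    [[1, 0, 0], [0, 1, 0], [0, 0, 1]],          # eye(3,3): covariance I
    [[1, 0, 0], [0, 5, 0], [0, 0, 0.1]],        # hSigma2: covariance diag(1, 25, 0.01)
]


def cell_mean(ia, ib, delta=2.4):
    """Mean vector of cell (ia, ib); cell (1, 1) is centred at zero."""
    if (ia, ib) == (1, 1):
        return [0.0] * P
    return [delta * uk * ia * ib / B_LEVELS / A_LEVELS for uk in U]


def simulate(gsize=(15, 15), delta=2.4, seed=None):
    """Rows [ia, ib, y1, y2, y3] ordered by ia, then ib, as in the MATLAB code."""
    rng = random.Random(seed)
    rows, ij = [], 0
    for ia in range(1, A_LEVELS + 1):
        for ib in range(1, B_LEVELS + 1):
            ij += 1
            # gsize(ij) on kron(ones(4,1), Gsize): cells 1-4 get Gsize(1), cells 5-8 Gsize(2)
            n_ij = gsize[0] if ij <= 4 else gsize[1]
            root = ROOTS[(ij - 1) % 2]          # odd cells use I, even cells hSigma2
            mean = cell_mean(ia, ib, delta)
            for _ in range(n_ij):
                z = [rng.gauss(0.0, 1.0) for _ in range(P)]
                # row vector z times root, like randn(nij,p)*hSigma(ijflag,:)
                y = [mean[c] + sum(z[k] * root[k][c] for k in range(P)) for c in range(P)]
                rows.append([ia, ib] + y)
    return rows
```

Calling `simulate(gsize=(10, 20), seed=1)` gives the unbalanced layout: 40 rows with factor A level 1 and 80 rows with level 2.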
F.3 R code
R program for constructing interaction plots in section 3.2.
# Showing one example of interaction effects and
# one of no interaction effects
Y <- c(4,5,9,2,3,7)
A <- c(1,2,3,1,2,3)
B <- c(1,1,1,2,2,2)
fit <- aov(Y ~ A * B) # analysis of variance (not used by the plots below)
par(mfrow=c(1,2)) # Two-way Interaction Plot
A1 <- factor(A)
B1 <- factor(B)
interaction.plot(A1, B1, Y,type="b", col=c("red","blue"), legend=FALSE,lty=c(1,2),
lwd=2, pch=c(1,24),xlab="Factor B levels",ylim=c(0,10),
ylab="Mean values of y",main="No Interaction")
par(family = "")
legend("topleft",c("level 1 ","level 2"),border="black", bty="o",
bg="beige", lty=c(1,2),lwd=2,pch=c(1,24),
col=c("red","blue"), title="Factor A levels",inset = .02)
Y1 <- c(4,3,9,2,5,7)
interaction.plot(A1, B1, Y1,type="b", col=c("red","blue"), legend=FALSE,lty=c(1,2),
lwd=2, pch=c(1,24),xlab="Factor B levels",ylim=c(0,10),
ylab="Mean values of y",main="Interaction")
par(family = "")
legend("topleft",c("level 1 ","level 2"),border="black", bty="o",
bg="beige", lty=c(1,2),lwd=2,pch=c(1,24),
col=c("red","blue"), title="Factor A levels",inset = .02)
#end of program
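The two plots contrast parallel and crossing profiles. Numerically, a two-way layout shows no interaction when every interaction term γ_ij = ȳ_ij - ȳ_i. - ȳ_.j + ȳ_.. is zero.

The Python sketch below computes these terms for the vectors Y and Y1 used in the R example. Its helper name `interaction_effects` is hypothetical, and it assumes one observation per cell, as in that example. All six terms vanish for Y, whereas Y1 gives terms of ±2/3 and ±4/3.

```python
def interaction_effects(y, a, b):
    """Interaction terms for a two-way layout with one observation per cell.

    y, a, b are parallel sequences: response, level of the first factor and
    level of the second factor (the vectors Y, A and B of the R example).
    """
    cells = {(i, j): v for v, i, j in zip(y, a, b)}
    a_levels, b_levels = sorted(set(a)), sorted(set(b))
    grand = sum(cells.values()) / len(cells)
    row = {i: sum(cells[(i, j)] for j in b_levels) / len(b_levels) for i in a_levels}
    col = {j: sum(cells[(i, j)] for i in a_levels) / len(a_levels) for j in b_levels}
    # gamma_ij = y_ij - ybar_i. - ybar_.j + ybar..
    return {(i, j): cells[(i, j)] - row[i] - col[j] + grand
            for i in a_levels for j in b_levels}


if __name__ == "__main__":
    A = [1, 2, 3, 1, 2, 3]
    B = [1, 1, 1, 2, 2, 2]
    print(interaction_effects([4, 5, 9, 2, 3, 7], A, B))   # Y: no interaction
    print(interaction_effects([4, 3, 9, 2, 5, 7], A, B))   # Y1: interaction
```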