Statistical Modelling Chapter III

III. Completely Randomized Design (CRD)

III.A Design of a CRD

III.B Models and estimation for a CRD

III.C Hypothesis testing using the ANOVA method

III.D Diagnostic checking

III.E Treatment differences


III.A Design of a CRD

Definition III.1: An experiment is set up using a CRD when each treatment is applied a specified, possibly unequal, number of times, the particular units to receive a treatment being selected completely at random.

Example III.1 Rat experiment

• Experiment to investigate 3 rat diets with 6 rats:

Diets A, B and C will have 3, 2 and 1 rats, respectively.


Use R to obtain randomized layouts

• How to do this is described in Appendix B, Randomized layouts and sample size computations in R, for all the designs that will be covered in this course, and more besides.


R functions and output to produce randomized layout

> # Obtaining randomized layout for a CRD
> #
> n <- 6
> CRDRat.unit <- list(Rat = n)
> Diet <- factor(rep(c("A","B","C"), times = c(3,2,1)))
> CRDRat.lay <- fac.layout(unrandomized=CRDRat.unit,
+                          randomized=Diet, seed=695)
> CRDRat.lay

• fac.layout from the dae package produces the randomized layout.

• unrandomized gives the single unrandomized factor indexing the units in the experiment.

• randomized specifies the factor, Diet, that is to be randomized.

• seed is used so that the same randomized layout for a particular experiment can be generated at a later date; it takes a value in the range 0–1023.


Randomized layout

  Units Permutation Rat Diet
1     1           4   1    A
2     2           1   2    C
3     3           5   3    B
4     4           3   4    A
5     5           6   5    A
6     6           2   6    B

#remove Diet object in workspace to avoid using it by mistake

remove(Diet)
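The randomization itself can be imitated in base R. The following is only a minimal sketch, not the dae implementation: it permutes the treatment labels over the rats with `sample`, and the object name `layout.base` is made up for illustration. It will not reproduce the layout that `fac.layout` gives for seed 695, because the two functions use the seed differently.

```r
# Base-R sketch of randomizing a CRD (an illustration, not dae::fac.layout)
set.seed(695)                        # arbitrary seed, for reproducibility only
n <- 6
Diet <- factor(rep(c("A", "B", "C"), times = c(3, 2, 1)))

# Randomly permute the systematic treatment labels over the n rats
layout.base <- data.frame(Rat = factor(1:n), Diet = sample(Diet))
print(layout.base)

# The replication is unchanged by the permutation: 3, 2 and 1 rats
print(table(layout.base$Diet))

rm(Diet)                             # avoid using the unrandomized factor by mistake
```

Whatever permutation is drawn, each diet keeps its specified number of rats, which is the defining property in Definition III.1.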


III.B Models and estimation for a CRD

• The analysis of CRD experiments uses:
– least-squares or maximum likelihood estimation of the parameters of a linear model;
– hypothesis testing based on the ANOVA method or maximum likelihood ratio testing.

• Use the rat experiment to investigate linear models and the estimation of their parameters.


a) Maximal model

• Definition III.2: The maximal expectation model is the most complicated model for the expectation that is to be considered in analysing an experiment.

• We first consider the maximal model for the CRD.


Example III.1 Rat experiment (continued)

• Suppose, the experimenter measured the liver weight as a percentage of total body weight at the end of the experiment.

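• The results of the experiment are as follows:

Rat        1    4    5    3    6    2
Diet       A    A    A    B    B    C
Liver wt.  3.3  3.1  2.9  3.2  3.4  2.7

• The analysis is based on a linear model, that is:

$$E[\mathbf{Y}] = \mathbf{X}\boldsymbol{\theta} \quad\text{and}\quad \mathrm{var}[\mathbf{Y}] = \mathbf{V} = \sigma^2\mathbf{I}_n$$

• Our model also involves assuming $\mathbf{Y} \sim N(\mathbf{X}\boldsymbol{\theta}, \mathbf{V})$.

• The trick is: what are $\mathbf{X}$ and $\boldsymbol{\theta}$ going to be?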


Perhaps?

• Note numbering of Y's does not correspond to Rats; does not affect model but neater.

• This model can then be fitted using simple linear regression techniques.

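$$
\begin{bmatrix} E[Y_1]\\ E[Y_2]\\ E[Y_3]\\ E[Y_4]\\ E[Y_5]\\ E[Y_6] \end{bmatrix}
=
\begin{bmatrix} 1 & 1\\ 1 & 1\\ 1 & 1\\ 1 & 2\\ 1 & 2\\ 1 & 3 \end{bmatrix}
\begin{bmatrix} \beta_0\\ \beta_1 \end{bmatrix}
\quad\text{with}\quad
\mathbf{V} =
\begin{bmatrix}
\sigma^2 & 0 & 0 & 0 & 0 & 0\\
0 & \sigma^2 & 0 & 0 & 0 & 0\\
0 & 0 & \sigma^2 & 0 & 0 & 0\\
0 & 0 & 0 & \sigma^2 & 0 & 0\\
0 & 0 & 0 & 0 & \sigma^2 & 0\\
0 & 0 & 0 & 0 & 0 & \sigma^2
\end{bmatrix}
$$

• The fitted equation is: $\hat{E}[Y] = 3.30 - 0.120\,x$

That is, $\hat{\beta}_0 = 3.30$, $\hat{\beta}_1 = -0.12$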
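As a quick check, the straight-line fit can be reproduced with `lm`. This is a sketch that assumes the diets A, B and C are coded as x = 1, 2, 3 (the second column of X above) and that the responses are in the order Rats 1, 4, 5, 3, 6, 2; the object names are made up for illustration.

```r
# Liver weights in the order of the systematic layout (diets A, A, A, B, B, C)
y <- c(3.3, 3.1, 2.9, 3.2, 3.4, 2.7)
# Numeric coding of diet: a single explanatory variable with values 1, 2, 3
x <- c(1, 1, 1, 2, 2, 3)

fit.line <- lm(y ~ x)     # simple linear regression on the coded diet
print(coef(fit.line))     # intercept and slope
```

The intercept and slope agree with the fitted equation, 3.30 and -0.12.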


Using model to predict

$$
\begin{aligned}
\text{Now, } \hat{E}[Y_i] &= \hat{\beta}_0 + \hat{\beta}_1 x_i\\
\text{so that } \hat{E}[Y_i] &= 3.30 - 0.12\,x_i.\\
\text{Hence } \hat{E}[Y_1] &= 3.30 - 0.12 \times 1 = 3.18\\
\text{and } \hat{E}[Y_5] &= 3.30 - 0.12 \times 2 = 3.06.
\end{aligned}
$$

• However, does this make sense?
– It means that, for each unit increase in diet, % liver weight decreases by 0.120.
– This is sensible only if the diet differences are based on equally spaced levels of some component, for example if the diets represent 2, 4 and 6 mg of copper added to each 100 g of food.

• But it is no good if the diets are unequally spaced (2, 4 and 10 mg Cu added) or if the diets differ qualitatively.


Regression on indicator variables

• In this method the explanatory variables are called factors and the possible values they take are called levels.

• Thus, we have a factor Diet with 3 levels: A, B, C.

• Definition III.3: Indicator variables are formed from a factor:
– create a variable for each level of the factor;
– the values of a variable are either 1 or 0: 1 when the unit has the particular level for that variable and 0 otherwise.


Indicator-variable model

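• Hence $E[Y_i] = \alpha_k$, $\mathrm{var}[Y_i] = \sigma^2$, $\mathrm{cov}[Y_i, Y_j] = 0$ $(i \ne j)$

• Can be written as

$$
\begin{bmatrix} E[Y_1]\\ E[Y_2]\\ E[Y_3]\\ E[Y_4]\\ E[Y_5]\\ E[Y_6] \end{bmatrix}
=
\begin{bmatrix} 1 & 0 & 0\\ 1 & 0 & 0\\ 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} \alpha_1\\ \alpha_2\\ \alpha_3 \end{bmatrix},
\quad
\mathbf{V} =
\begin{bmatrix}
\sigma^2 & 0 & 0 & 0 & 0 & 0\\
0 & \sigma^2 & 0 & 0 & 0 & 0\\
0 & 0 & \sigma^2 & 0 & 0 & 0\\
0 & 0 & 0 & \sigma^2 & 0 & 0\\
0 & 0 & 0 & 0 & \sigma^2 & 0\\
0 & 0 & 0 & 0 & 0 & \sigma^2
\end{bmatrix}
$$

that is, $\boldsymbol{\psi}_D = E[\mathbf{Y}] = \mathbf{X}_D\boldsymbol{\alpha}$ and $\mathbf{V} = \sigma^2\mathbf{I}_n$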

• Model suggests 3 different expected (or mean) values for the diets.


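General form of X for CRD

• For the general case of a set of t Treatments, suppose Y is ordered so that all observations for:
– the 1st treatment occur in the first $r_1$ rows,
– the 2nd treatment occur in the next $r_2$ rows,
– and so on, with the last treatment occurring in the last $r_t$ rows,
– i.e. the order of the systematic layout (prior to randomization).

• Then $\mathbf{X}_T$ is given by the following partitioned matrix

$$
\mathbf{X}_T =
\begin{bmatrix}
\mathbf{1}_{r_1} & \mathbf{0}_{r_1} & \cdots & \mathbf{0}_{r_1}\\
\mathbf{0}_{r_2} & \mathbf{1}_{r_2} & \cdots & \mathbf{0}_{r_2}\\
\vdots & \vdots & \ddots & \vdots\\
\mathbf{0}_{r_t} & \mathbf{0}_{r_t} & \cdots & \mathbf{1}_{r_t}
\end{bmatrix}
$$

where $\mathbf{1}_{r_i}$ is the $r_i \times 1$ column vector of ones and $\mathbf{0}_{r_i}$ is the $r_i \times 1$ column vector of zeroes

(1 is only ever a vector, but 0 can be a matrix)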


Still a linear model

• In general, the model for the expected values is still of the general form $E[\mathbf{Y}] = \mathbf{X}\boldsymbol{\theta}$

• and, on assuming Y is $N(\boldsymbol{\psi}_D, \sigma^2\mathbf{I}_n)$,

• can use standard least squares or maximum likelihood estimation


Estimates of expectation parameters

• OLS equation is

$$\hat{\boldsymbol{\alpha}} = (\mathbf{X}_T'\mathbf{X}_T)^{-1}\mathbf{X}_T'\mathbf{Y}$$

Rat        1    4    5    3    6    2
Diet       A    A    A    B    B    C
Liver wt.  3.3  3.1  2.9  3.2  3.4  2.7

• Can be shown, by examining the OLS equation, that the estimates of the elements of $\boldsymbol{\alpha}$ and $\boldsymbol{\psi}$ are the means of the treatments.


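• The estimates of $\boldsymbol{\alpha}$ are:

$$
\hat{\boldsymbol{\alpha}} =
\begin{bmatrix} \hat{\alpha}_1\\ \hat{\alpha}_2\\ \hat{\alpha}_3 \end{bmatrix}
=
\begin{bmatrix} 3.10\\ 3.30\\ 2.70 \end{bmatrix}
$$

• The estimates of the expected values, the fitted values, are given by:

$$
\hat{\boldsymbol{\psi}}_D = \mathbf{X}_D\hat{\boldsymbol{\alpha}}
=
\begin{bmatrix} 1 & 0 & 0\\ 1 & 0 & 0\\ 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} \hat{\alpha}_1\\ \hat{\alpha}_2\\ \hat{\alpha}_3 \end{bmatrix}
=
\begin{bmatrix} 1 & 0 & 0\\ 1 & 0 & 0\\ 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 3.10\\ 3.30\\ 2.70 \end{bmatrix}
=
\begin{bmatrix} 3.10\\ 3.10\\ 3.10\\ 3.30\\ 3.30\\ 2.70 \end{bmatrix}
$$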
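The OLS equation can be evaluated directly for the rat data. The sketch below builds the indicator-variable matrix with `model.matrix` (dropping the intercept so that there is one column per diet) and solves the normal equations; the object names are made up for illustration.

```r
# Rat data in the order of the systematic layout
y <- c(3.3, 3.1, 2.9, 3.2, 3.4, 2.7)
Diet <- factor(c("A", "A", "A", "B", "B", "C"))

# Indicator-variable design matrix X_D: one 0/1 column per level of Diet
X.D <- model.matrix(~ Diet - 1)

# OLS estimates: solve (X'X) alpha = X'y rather than inverting explicitly
alpha.hat <- solve(t(X.D) %*% X.D, t(X.D) %*% y)
# Fitted values: each unit gets the estimate for its diet
psi.hat <- X.D %*% alpha.hat

print(alpha.hat)
print(psi.hat)
```

The estimates are the three diet means, 3.1, 3.3 and 2.7, and the fitted values repeat each mean for the units that received that diet.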


Estimator of the expected values

• In general, $\hat{\boldsymbol{\psi}}_T = \mathbf{X}_T\hat{\boldsymbol{\alpha}} = \overline{\mathbf{T}}$, where $\overline{\mathbf{T}}$ is the n-vector consisting of the treatment means for each unit, and
– being least squares, this estimator can be written as a linear combination of Y;
– that is, it can be obtained as the product of an $n \times n$ matrix and the n-vector Y;
– let us write $\overline{\mathbf{T}} = \mathbf{M}_T\mathbf{Y}$;
– M is for mean because $\mathbf{M}_T$ is the matrix that replaces each value of Y with the mean of the corresponding treatment.


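General form of mean operator

• Can be shown that the general form of $\mathbf{M}_T$ is

$$
\mathbf{M}_T =
\begin{bmatrix}
r_1^{-1}\mathbf{J}_{r_1} & \mathbf{0}_{r_1 \times r_2} & \cdots & \mathbf{0}_{r_1 \times r_t}\\
\mathbf{0}_{r_2 \times r_1} & r_2^{-1}\mathbf{J}_{r_2} & \cdots & \mathbf{0}_{r_2 \times r_t}\\
\vdots & \vdots & \ddots & \vdots\\
\mathbf{0}_{r_t \times r_1} & \mathbf{0}_{r_t \times r_2} & \cdots & r_t^{-1}\mathbf{J}_{r_t}
\end{bmatrix}
$$

where $\mathbf{J}_r$ is the $r \times r$ matrix of ones, so that $\overline{\mathbf{T}} = \mathbf{M}_T\mathbf{Y}$ with

$$\overline{\mathbf{T}}' = \begin{bmatrix} \overline{T}_1 & \cdots & \overline{T}_1 & \overline{T}_2 & \cdots & \overline{T}_2 & \cdots & \overline{T}_t & \cdots & \overline{T}_t \end{bmatrix}$$

• Clear from the above expression that:
– the 1st $r_1$ elements of $\overline{\mathbf{T}}$ are the mean of the $Y_i$s for the 1st treatment,
– the next $r_2$ elements are the mean of those for the 2nd treatment,
– and so on.

• $\mathbf{M}_T$ is a mean operator as it
– computes the treatment means from the vector to which it is applied and
– replaces each element of this vector with its treatment mean.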


Estimator of the errors

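• The estimator of the random errors in the observed values of Y is, as before, the difference between Y and the estimator of its expected values.

• That is, $\hat{\boldsymbol{\varepsilon}} = \mathbf{Y} - \hat{\boldsymbol{\psi}}_T = \mathbf{Y} - \overline{\mathbf{T}}$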


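Example III.1 Rat experiment

Alternative expression for fitted values

$$
\hat{\boldsymbol{\psi}}_D = \bar{\mathbf{t}} = \mathbf{M}_D\mathbf{y}
=
\begin{bmatrix}
\tfrac{1}{3}\mathbf{J}_3 & \mathbf{0}_{3 \times 2} & \mathbf{0}_{3 \times 1}\\
\mathbf{0}_{2 \times 3} & \tfrac{1}{2}\mathbf{J}_2 & \mathbf{0}_{2 \times 1}\\
\mathbf{0}_{1 \times 3} & \mathbf{0}_{1 \times 2} & \tfrac{1}{1}\mathbf{J}_1
\end{bmatrix}\mathbf{y}
=
\begin{bmatrix}
\tfrac13 & \tfrac13 & \tfrac13 & 0 & 0 & 0\\
\tfrac13 & \tfrac13 & \tfrac13 & 0 & 0 & 0\\
\tfrac13 & \tfrac13 & \tfrac13 & 0 & 0 & 0\\
0 & 0 & 0 & \tfrac12 & \tfrac12 & 0\\
0 & 0 & 0 & \tfrac12 & \tfrac12 & 0\\
0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}
\begin{bmatrix} 3.3\\ 3.1\\ 2.9\\ 3.2\\ 3.4\\ 2.7 \end{bmatrix}
=
\begin{bmatrix} 3.10\\ 3.10\\ 3.10\\ 3.30\\ 3.30\\ 2.70 \end{bmatrix}
$$

Note $\bar{\mathbf{t}}$, not $\overline{\mathbf{T}}$, as these are estimates rather than estimators.

• We know that $\hat{\boldsymbol{\psi}}_D = \overline{\mathbf{T}} = \mathbf{M}_D\mathbf{Y}$ where $\overline{\mathbf{T}}' = \begin{bmatrix} \overline{T}_1 & \overline{T}_1 & \overline{T}_1 & \overline{T}_2 & \overline{T}_2 & \overline{T}_3 \end{bmatrix}$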


Residuals

• Fitted values for orthogonal experiments are functions of means.

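• Residuals are differences between observations and fitted values:

$$
\hat{\boldsymbol{\varepsilon}} = \mathbf{e} = \mathbf{y} - \bar{\mathbf{t}}
=
\begin{bmatrix} 3.3\\ 3.1\\ 2.9\\ 3.2\\ 3.4\\ 2.7 \end{bmatrix}
-
\begin{bmatrix} 3.10\\ 3.10\\ 3.10\\ 3.30\\ 3.30\\ 2.70 \end{bmatrix}
=
\begin{bmatrix} 0.2\\ 0.0\\ -0.2\\ -0.1\\ 0.1\\ 0.0 \end{bmatrix}
$$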
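The mean operator for the rat experiment can be built block by block and applied to the data. This is a minimal sketch with made-up object names; it also checks numerically that the operator is symmetric and idempotent.

```r
y <- c(3.3, 3.1, 2.9, 3.2, 3.4, 2.7)   # liver weights, systematic order
r <- c(3, 2, 1)                        # replication of diets A, B, C
n <- sum(r)

# Block-diagonal mean operator: the k-th diagonal block is J_{r_k} / r_k
M.D <- matrix(0, n, n)
end <- cumsum(r)
start <- end - r + 1
for (k in seq_along(r)) {
  idx <- start[k]:end[k]
  M.D[idx, idx] <- 1 / r[k]
}

t.bar <- as.vector(M.D %*% y)   # fitted values: each y replaced by its diet mean
e <- y - t.bar                  # residuals

print(rbind(y = y, fitted = t.bar, residual = e))
print(isTRUE(all.equal(M.D, t(M.D))))          # symmetric
print(isTRUE(all.equal(M.D %*% M.D, M.D)))     # idempotent
```

Applying the operator twice changes nothing, since the vector of diet means is already constant within each diet.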


b) Alternative indicator-variable, expectation models

• For the CRD, two expectation models are considered:

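1. $E[Y_i] = \mu$ or $\boldsymbol{\psi}_G = \mathbf{X}_G\mu$, where $\mathbf{X}_G = \mathbf{1}_n$

2. $E[Y_i] = \alpha_k$ or $\boldsymbol{\psi}_T = \mathbf{X}_T\boldsymbol{\alpha}$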

• First model is minimal expectation model: population mean response is same for all observations, irrespective of diet.

• Second model is the maximal expectation model.


Minimal expectation model

• Definition III.4: The minimal expectation model is the simplest model for the expectation that is to be considered in analysing an experiment.

• The minimal expectation model is the same as the intercept-only model given for the single sample in chapter I, Statistical inference.

• It will be the minimal model for all analyses we consider.

• Now the estimator of the expected values in the intercept-only model is $\hat{\boldsymbol{\psi}}_G = \overline{\mathbf{G}}$, where $\overline{\mathbf{G}}$ is the n-vector each of whose elements is the grand mean.

• For the Rat experiment, $\hat{\boldsymbol{\psi}}_G = \bar{\mathbf{g}} = 3.1\,\mathbf{1}_6$


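Submodels

• In the regression case, submodels have some of the parameters in the full model set to zero.
• Here this is not the case.
• Instead, submodels are marginal to full models.
• Here simply set $\alpha_k = \mu$.
– That is, the intercept-only model is the special case where all the $\alpha_k$s are equal.
– Clear for getting $E[Y_i] = \mu$ from $E[Y_i] = \alpha_k$.
– What about $\boldsymbol{\psi}_G$ from $\boldsymbol{\psi}_T$?
– If we replace each element of $\boldsymbol{\alpha}$ with $\mu$, then $\boldsymbol{\alpha} = \mu\mathbf{1}_t$.
– So $\boldsymbol{\psi}_T = \mathbf{X}_T\boldsymbol{\alpha} = \mathbf{X}_T\mathbf{1}_t\mu = \mathbf{X}_G\mu$.

• Now marginality derives from the relationship between $\mathbf{X}_T$ and $\mathbf{X}_G$, as encapsulated in the following definition.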


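Marginality of models (in general)

• Definition III.5: Let $C(\mathbf{X})$ denote the column space of X.
• For two models, $\boldsymbol{\psi}_1 = \mathbf{X}_1\boldsymbol{\theta}_1$ and $\boldsymbol{\psi}_2 = \mathbf{X}_2\boldsymbol{\theta}_2$, the first model is marginal to the second if $C(\mathbf{X}_1) \subseteq C(\mathbf{X}_2)$ irrespective of the replication of the levels in the columns of the Xs.

• That is, if the columns of $\mathbf{X}_1$ can always be written as linear combinations of the columns of $\mathbf{X}_2$.

• We write $\boldsymbol{\psi}_1 \le \boldsymbol{\psi}_2$.
• Note the marginality relationship is not symmetric: it is directional, like the less-than relation.
• So while $\boldsymbol{\psi}_1 \le \boldsymbol{\psi}_2$, $\boldsymbol{\psi}_2$ is not marginal to $\boldsymbol{\psi}_1$ unless $\boldsymbol{\psi}_1 \equiv \boldsymbol{\psi}_2$.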


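Marginality of models for CRD

• $\boldsymbol{\psi}_G$ is marginal to $\boldsymbol{\psi}_T$, or $\boldsymbol{\psi}_G \le \boldsymbol{\psi}_T$, because $C(\mathbf{X}_G) \subseteq C(\mathbf{X}_T)$
– in that an element from a row of $\mathbf{X}_G$ is the sum of the elements in the corresponding row of $\mathbf{X}_T$
– and this will occur irrespective of the replication of the levels in the columns of $\mathbf{X}_G$ and $\mathbf{X}_T$.

$$
\mathbf{X}_G = \begin{bmatrix} 1\\ 1\\ 1\\ 1\\ 1\\ 1 \end{bmatrix},
\quad
\mathbf{X}_T = \begin{bmatrix} 1 & 0 & 0\\ 1 & 0 & 0\\ 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1 \end{bmatrix}
$$

• So while $\boldsymbol{\psi}_G \le \boldsymbol{\psi}_T$, $\boldsymbol{\psi}_T$ is not marginal to $\boldsymbol{\psi}_G$ as $C(\mathbf{X}_T) \not\subseteq C(\mathbf{X}_G)$, so that $\boldsymbol{\psi}_T \not\le \boldsymbol{\psi}_G$.

• In geometrical terms, $C(\mathbf{X}_T)$ is a three-dimensional space and $C(\mathbf{X}_G)$ is a line, the equiangular line, that is a subspace of $C(\mathbf{X}_T)$.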
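The column-space relationship can be checked numerically for the rat layout. The sketch below uses matrix ranks: appending the column of one matrix to the other leaves the rank unchanged exactly when that column already lies in the column space. It checks a single replication pattern only, whereas Definition III.5 requires the inclusion to hold for every replication; object names are made up for illustration.

```r
Diet <- factor(c("A", "A", "A", "B", "B", "C"))
X.T <- model.matrix(~ Diet - 1)   # 6 x 3 indicator matrix
X.G <- matrix(1, nrow = 6, ncol = 1)

# X_G is the row sum of X_T, that is X_G = X_T 1_t
rowsum.ok <- all(X.T %*% rep(1, 3) == X.G)

# C(X_G) is inside C(X_T): adding X_G to X_T does not increase the rank
G.in.T <- qr(cbind(X.T, X.G))$rank == qr(X.T)$rank
# C(X_T) is not inside C(X_G): adding X_T to X_G does increase the rank
T.in.G <- qr(cbind(X.G, X.T))$rank == qr(X.G)$rank

print(c(rowsum.ok = rowsum.ok, G.in.T = G.in.T, T.in.G = T.in.G))
```

The ranks 3 and 1 are the dimensions of the Treatment space and of the equiangular line mentioned above.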


III.C Hypothesis testing using the ANOVA method

• Are there significant differences between the treatment means?

• This is equivalent to deciding which of our two expectation models best describes the data.

• We now perform the hypothesis test to do this for the example.


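a) Analysis of the rat example

Example III.1 Rat experiment (continued)

Step 1: Set up hypotheses

H0: $\alpha_A = \alpha_B = \alpha_C$ ($\boldsymbol{\psi}_G = \mathbf{X}_G\mu$)

H1: not all population Diet means are equal ($\boldsymbol{\psi}_D = \mathbf{X}_D\boldsymbol{\alpha}$)

Set $\alpha = 0.05$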


Example III.1 Rat experiment (continued)

Step 2: Calculate test statistic

Source        df   SSq    MSq      F      Prob
Rats           5   0.34
  Diets        2   0.24   0.1200   3.60   0.1595
  Residual     3   0.10   0.0333

From the table we can see that the (corrected) total variation amongst the 6 Rats is partitioned into 2 parts:
– the variation due to differences between diet means and
– the left-over (residual) rat variation.

Step 3: Decide between hypotheses

As the probability of exceeding an F of 3.60 with $\nu_1 = 2$ and $\nu_2 = 3$ is 0.1595 > 0.05, there is not much evidence of a diet difference. The expectation model that appears to provide an adequate description of the data is $\boldsymbol{\psi}_G = \mathbf{X}_G\mu$.
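The probability quoted in Step 3 is an upper-tail area of the F distribution and can be obtained with `pf`. A small sketch, using the SSqs and degrees of freedom from the table in Step 2; the variable names are made up for illustration.

```r
# Mean squares from the ANOVA table: SSq / df
msq.diets <- 0.24 / 2
msq.resid <- 0.10 / 3

F.value <- msq.diets / msq.resid
# Probability of exceeding the observed F with 2 and 3 degrees of freedom
p.value <- pf(F.value, df1 = 2, df2 = 3, lower.tail = FALSE)

print(round(c(F = F.value, p = p.value), 4))
```

Since the p-value exceeds 0.05, the null hypothesis is not rejected.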


b) Sums of squares for the analysis of variance

• From chapter I, Statistical inference, an SSq
– is the SSq of the elements of a vector and
– can be written as the product of the transpose of a column vector with the original column vector.

• Estimators of SSqs for the CRD ANOVA are SSqs of the following vectors (cf. ch. I):

Total or Units SSq (corrected):      $\mathbf{D}_G = \mathbf{Y} - \overline{\mathbf{G}}$
Treatments SSq (model difference):   $\mathbf{T}_e = \overline{\mathbf{T}} - \overline{\mathbf{G}}$
Residual SSq (Y minus maximal fit):  $\mathbf{D}_T = \mathbf{Y} - \overline{\mathbf{T}}$

where the Ds are n-vectors of deviations from Y and $\mathbf{T}_e$ is the n-vector of Treatment effects.

Definition III.6: An effect is a linear combination of means with a set of effects summing to zero.


SSqs as quadratic forms

• Want to show estimators of all SSqs can be written as $\mathbf{Y}'\mathbf{Q}\mathbf{Y}$.
– This is the product of a $1 \times n$ vector, an $n \times n$ matrix and an $n \times 1$ vector, so it is $1 \times 1$, that is, a scalar.

• Definition III.7: A quadratic form in a vector Y is a scalar function of Y of the form $\mathbf{Y}'\mathbf{A}\mathbf{Y}$, where A is called the matrix of the quadratic form.


SSqs as quadratic forms (continued)

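• Firstly write

$$
\begin{aligned}
\mathbf{Y} &= \mathbf{I}_n\mathbf{Y} = \mathbf{M}_U\mathbf{Y}\\
\overline{\mathbf{G}} &= n^{-1}\mathbf{J}_n\mathbf{Y} = \mathbf{M}_G\mathbf{Y}\\
\overline{\mathbf{T}} &= \mathbf{M}_T\mathbf{Y}
\end{aligned}
$$

• That is, each of the individual vectors on which the sums of squares are based can be written as an M matrix times Y.

• These M matrices are mean operators that are symmetric and idempotent: $\mathbf{M}' = \mathbf{M}$ and $\mathbf{M}^2 = \mathbf{M}$ in all cases.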

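SSqs as quadratic forms (continued)

• Then

$$
\begin{aligned}
\mathbf{D}_G &= \mathbf{Y} - \overline{\mathbf{G}} = \mathbf{M}_U\mathbf{Y} - \mathbf{M}_G\mathbf{Y} = (\mathbf{M}_U - \mathbf{M}_G)\mathbf{Y} = \mathbf{Q}_U\mathbf{Y} &&\text{with } \mathbf{Q}_U = \mathbf{M}_U - \mathbf{M}_G\\
\mathbf{T}_e &= \overline{\mathbf{T}} - \overline{\mathbf{G}} = \mathbf{M}_T\mathbf{Y} - \mathbf{M}_G\mathbf{Y} = (\mathbf{M}_T - \mathbf{M}_G)\mathbf{Y} = \mathbf{Q}_T\mathbf{Y} &&\text{with } \mathbf{Q}_T = \mathbf{M}_T - \mathbf{M}_G\\
\mathbf{D}_T &= \mathbf{Y} - \overline{\mathbf{T}} = \mathbf{M}_U\mathbf{Y} - \mathbf{M}_T\mathbf{Y} = (\mathbf{M}_U - \mathbf{M}_T)\mathbf{Y} = \mathbf{Q}_{U_{\mathrm{Res}}}\mathbf{Y} &&\text{with } \mathbf{Q}_{U_{\mathrm{Res}}} = \mathbf{M}_U - \mathbf{M}_T
\end{aligned}
$$

• Given that the Ms are symmetric and idempotent, it is relatively straightforward to show that so are the three Q matrices.

• It can also be shown that $\mathbf{Q}_T\mathbf{Q}_{U_{\mathrm{Res}}} = \mathbf{Q}_{U_{\mathrm{Res}}}\mathbf{Q}_T = \mathbf{0}$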


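SSqs as quadratic forms (continued)

• Consequently obtain the following expressions for the SSqs:

$$
\begin{aligned}
\mathbf{D}_G'\mathbf{D}_G &= (\mathbf{Y} - \overline{\mathbf{G}})'(\mathbf{Y} - \overline{\mathbf{G}}) = (\mathbf{Q}_U\mathbf{Y})'\mathbf{Q}_U\mathbf{Y} = \mathbf{Y}'\mathbf{Q}_U'\mathbf{Q}_U\mathbf{Y} = \mathbf{Y}'\mathbf{Q}_U\mathbf{Q}_U\mathbf{Y} = \mathbf{Y}'\mathbf{Q}_U\mathbf{Y}\\
\mathbf{T}_e'\mathbf{T}_e &= (\overline{\mathbf{T}} - \overline{\mathbf{G}})'(\overline{\mathbf{T}} - \overline{\mathbf{G}}) = (\mathbf{Q}_T\mathbf{Y})'\mathbf{Q}_T\mathbf{Y} = \mathbf{Y}'\mathbf{Q}_T'\mathbf{Q}_T\mathbf{Y} = \mathbf{Y}'\mathbf{Q}_T\mathbf{Q}_T\mathbf{Y} = \mathbf{Y}'\mathbf{Q}_T\mathbf{Y}\\
\mathbf{D}_T'\mathbf{D}_T &= (\mathbf{Y} - \overline{\mathbf{T}})'(\mathbf{Y} - \overline{\mathbf{T}}) = (\mathbf{Q}_{U_{\mathrm{Res}}}\mathbf{Y})'\mathbf{Q}_{U_{\mathrm{Res}}}\mathbf{Y} = \mathbf{Y}'\mathbf{Q}_{U_{\mathrm{Res}}}'\mathbf{Q}_{U_{\mathrm{Res}}}\mathbf{Y} = \mathbf{Y}'\mathbf{Q}_{U_{\mathrm{Res}}}\mathbf{Q}_{U_{\mathrm{Res}}}\mathbf{Y} = \mathbf{Y}'\mathbf{Q}_{U_{\mathrm{Res}}}\mathbf{Y}
\end{aligned}
$$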


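SSqs as quadratic forms (continued)

• Theorem III.1: For a completely randomized design, the sums of squares in the analysis of variance for Units, Treatments and Residual are given by the quadratic forms

$$\mathbf{Y}'\mathbf{Q}_U\mathbf{Y},\quad \mathbf{Y}'\mathbf{Q}_T\mathbf{Y} \quad\text{and}\quad \mathbf{Y}'\mathbf{Q}_{U_{\mathrm{Res}}}\mathbf{Y}, \text{ respectively,}$$

where

$$\mathbf{Q}_U = \mathbf{M}_U - \mathbf{M}_G,\quad \mathbf{Q}_T = \mathbf{M}_T - \mathbf{M}_G \quad\text{and}\quad \mathbf{Q}_{U_{\mathrm{Res}}} = \mathbf{M}_U - \mathbf{M}_T,$$

$$
\mathbf{M}_U = \mathbf{I}_n,\quad \mathbf{M}_G = n^{-1}\mathbf{J}_n \quad\text{and}\quad
\mathbf{M}_T =
\begin{bmatrix}
r_1^{-1}\mathbf{J}_{r_1} & \mathbf{0}_{r_1 \times r_2} & \cdots & \mathbf{0}_{r_1 \times r_t}\\
\mathbf{0}_{r_2 \times r_1} & r_2^{-1}\mathbf{J}_{r_2} & \cdots & \mathbf{0}_{r_2 \times r_t}\\
\vdots & \vdots & \ddots & \vdots\\
\mathbf{0}_{r_t \times r_1} & \mathbf{0}_{r_t \times r_2} & \cdots & r_t^{-1}\mathbf{J}_{r_t}
\end{bmatrix}
$$

Proof: follows the argument given above.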


Residual SSq by difference

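• In the notes it is shown that

$$\mathbf{Q}_{U_{\mathrm{Res}}} = \mathbf{Q}_U - \mathbf{Q}_T$$

• so that

$$\mathbf{y}'\mathbf{Q}_{U_{\mathrm{Res}}}\mathbf{y} = \mathbf{y}'(\mathbf{Q}_U - \mathbf{Q}_T)\mathbf{y} = \mathbf{y}'\mathbf{Q}_U\mathbf{y} - \mathbf{y}'\mathbf{Q}_T\mathbf{y}$$

• That is, Residual SSq = Units SSq − Treatments SSq.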


ANOVA table construction

Source         df       SSq                                                     MSq ($s^2$)                                                                                     F                                        p
Units          $n-1$    $\mathbf{Y}'\mathbf{Q}_U\mathbf{Y}$
  Treatments   $t-1$    $\mathbf{Y}'\mathbf{Q}_T\mathbf{Y}$                     $s_T^2 = \mathbf{Y}'\mathbf{Q}_T\mathbf{Y}/(t-1)$                                               $s_T^2 / s_{U_{\mathrm{Res}}}^2$         $p_T$
  Residual     $n-t$    $\mathbf{Y}'\mathbf{Q}_{U_{\mathrm{Res}}}\mathbf{Y}$    $s_{U_{\mathrm{Res}}}^2 = \mathbf{Y}'\mathbf{Q}_{U_{\mathrm{Res}}}\mathbf{Y}/(n-t)$

• As in regression, the Qs are orthogonal projection matrices.

– $\mathbf{Q}_U$ orthogonally projects the data vector into the n−1 dimensional part of the n-dimensional data space that is orthogonal to the equiangular line.

– $\mathbf{Q}_T$ orthogonally projects the data vector into the t−1 dimensional part of the t-dimensional Treatment space that is orthogonal to the equiangular line. (Here the Treatment space is the column space of $\mathbf{X}_T$.)

– Finally, the matrix $\mathbf{Q}_{U_{\mathrm{Res}}}$ orthogonally projects the data vector into the n−t dimensional Residual subspace.

– That is, the Units space is divided into two orthogonal subspaces, the Treatments and Residual subspaces.


Geometric interpretation

• Of course, the SSqs are just the squared lengths of these vectors

• Hence, according to Pythagoras’ theorem, the Treatments and Residual SSqs must sum to the Units SSq.


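Example III.1 Rat experiment (continued)

• Vectors for computing the SSqs are:

– Liver wt.: $\mathbf{y}$
– Grand mean: $\bar{\mathbf{g}} = \mathbf{M}_G\mathbf{y}$
– Total Rat deviations: $\mathbf{d}_G = \mathbf{Q}_R\mathbf{y} = \mathbf{y} - \bar{\mathbf{g}}$
– Diet means: $\bar{\mathbf{t}} = \mathbf{M}_D\mathbf{y}$
– Diet effects: $\mathbf{t}_e = \mathbf{Q}_D\mathbf{y} = \bar{\mathbf{t}} - \bar{\mathbf{g}}$
– Residual Rat deviations: $\mathbf{d}_T = \mathbf{Q}_{R_{\mathrm{Res}}}\mathbf{y} = \mathbf{y} - \bar{\mathbf{t}}$

Diet   Liver wt.   Grand mean   Total Rat     Diet means   Diet effects   Residual Rat
                                deviations                                deviations
A      3.3         3.1           0.2          3.1           0.0            0.2
A      3.1         3.1           0.0          3.1           0.0            0.0
A      2.9         3.1          -0.2          3.1           0.0           -0.2
B      3.2         3.1           0.1          3.3           0.2           -0.1
B      3.4         3.1           0.3          3.3           0.2            0.1
C      2.7         3.1          -0.4          2.7          -0.4            0.0

SSq                              0.34                       0.24           0.10

• Total Rat deviations, Diet effects and Residual Rat deviations are projections into the Rats, Diets and Residual subspaces of dimension 5, 2 and 3, respectively.

• Squared length of projection = SSq
– Rats SSq is $\mathbf{y}'\mathbf{Q}_R\mathbf{y} = 0.34$
– Diets SSq is $\mathbf{y}'\mathbf{Q}_D\mathbf{y} = 0.24$
– Residual SSq is $\mathbf{y}'\mathbf{Q}_{R_{\mathrm{Res}}}\mathbf{y} = 0.10$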

Exercise III.3 is similar example for you to try
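The three sums of squares can be computed as quadratic forms for the rat data. This is a minimal sketch with made-up object names; it obtains the Diets mean operator as X(X'X)^{-1}X' from the indicator-variable matrix, which gives the same matrix as the block-diagonal form.

```r
y <- c(3.3, 3.1, 2.9, 3.2, 3.4, 2.7)
Diet <- factor(c("A", "A", "A", "B", "B", "C"))
n <- length(y)

# Mean operators
X.D <- model.matrix(~ Diet - 1)
M.R <- diag(n)                                    # identity: leaves y unchanged
M.G <- matrix(1 / n, n, n)                        # grand-mean operator, J_n / n
M.D <- X.D %*% solve(t(X.D) %*% X.D) %*% t(X.D)   # diet-mean operator

# Projection matrices of the quadratic forms
Q.R    <- M.R - M.G
Q.D    <- M.D - M.G
Q.RRes <- M.R - M.D

ssq <- c(Rats     = drop(t(y) %*% Q.R %*% y),
         Diets    = drop(t(y) %*% Q.D %*% y),
         Residual = drop(t(y) %*% Q.RRes %*% y))
print(round(ssq, 4))

# The Diets and Residual projectors are orthogonal
print(max(abs(Q.D %*% Q.RRes)))
```

The Diets and Residual SSqs add to the Rats SSq, as Pythagoras' theorem requires for orthogonal projections.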


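c) Expected mean squares

Source         df       SSq                                                     MSq ($s^2$)                                                                                     F                                        p
Units          $n-1$    $\mathbf{Y}'\mathbf{Q}_U\mathbf{Y}$
  Treatments   $t-1$    $\mathbf{Y}'\mathbf{Q}_T\mathbf{Y}$                     $s_T^2 = \mathbf{Y}'\mathbf{Q}_T\mathbf{Y}/(t-1)$                                               $s_T^2 / s_{U_{\mathrm{Res}}}^2$         $p_T$
  Residual     $n-t$    $\mathbf{Y}'\mathbf{Q}_{U_{\mathrm{Res}}}\mathbf{Y}$    $s_{U_{\mathrm{Res}}}^2 = \mathbf{Y}'\mathbf{Q}_{U_{\mathrm{Res}}}\mathbf{Y}/(n-t)$

• Have an ANOVA in which we use F (= ratio of MSqs) to decide between models.

• But why is this ratio appropriate?
• One way of answering this question is to look at what the MSqs measure.
– Use expected values of the MSqs, i.e. E[MSq]s, to do this.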


Expected mean squares (cont’d)

• Need E[MSq]s
– under the maximal model: $\boldsymbol{\psi}_T = E[\mathbf{Y}] = \mathbf{X}_T\boldsymbol{\alpha}$ and $\mathbf{V}_Y = \sigma^2\mathbf{I}_n$
– and the minimal model: $\boldsymbol{\psi}_G = E[\mathbf{Y}] = \mathbf{X}_G\mu$ and $\mathbf{V}_Y = \sigma^2\mathbf{I}_n$

• Similar to asking what is $E[Y_i]$?
• Know answer is $E[Y_i] = \alpha_k$,
– i.e. in the population, under the model, the average value of $Y_i$ is $\alpha_k$.

• So for Treatments, what is E[MSq]?
• The E[MSq]s are the mean values of the MSqs in populations described by the model for which they are derived
– i.e. an E[MSq] is the true mean value;
– it depends on the model parameters.


E[MSq]s under the maximal model

Source         df       MSq ($s^2$)                                                                             E[MSq]                                  F
Units          $n-1$
  Treatments   $t-1$    $s_T^2 = \mathbf{Y}'\mathbf{Q}_T\mathbf{Y}/(t-1)$                                       $\sigma^2 + q_T(\boldsymbol{\psi})$     $s_T^2 / s_{U_{\mathrm{Res}}}^2$
  Residual     $n-t$    $s_{U_{\mathrm{Res}}}^2 = \mathbf{Y}'\mathbf{Q}_{U_{\mathrm{Res}}}\mathbf{Y}/(n-t)$     $\sigma^2$

• So if we had the complete populations for all Treatments and computed the MSqs, the value of
– the Residual MSq would equal $\sigma^2$
– the Treatment MSq would equal $\sigma^2 + q_T(\boldsymbol{\psi})$.

• So the population average value of both MSqs involves $\sigma^2$, the uncontrolled variation amongst units from the same treatment.

• But what about $q_T(\boldsymbol{\psi})$ in the Treatments E[MSq]?


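The $q_T(\boldsymbol{\psi})$ function

• Subscript T indicates the Q matrix on which the function is based:

$$q_T(\boldsymbol{\psi}) = \boldsymbol{\psi}'\mathbf{Q}_T\boldsymbol{\psi}/(t-1)$$

• but there is no subscript on the $\boldsymbol{\psi}$ in $q_T(\boldsymbol{\psi})$,
– because we will determine expressions for it under both the maximal ($\boldsymbol{\psi}_T$) and alternative ($\boldsymbol{\psi}_G$) models.
– That is, $\boldsymbol{\psi}$ in $q_T(\boldsymbol{\psi})$ will vary.

• Numerator is the same as the SSq except that it is a quadratic form in $\boldsymbol{\psi}$ instead of Y.

• To see what this means, want expressions in terms of individual parameters.

• Will show that under the maximal model ($\boldsymbol{\psi}_T$)

$$q_T(\boldsymbol{\psi}_T) = \boldsymbol{\psi}_T'\mathbf{Q}_T\boldsymbol{\psi}_T/(t-1) = \sum_{k=1}^{t} r_k(\alpha_k - \bar{\alpha}_.)^2/(t-1), \quad\text{where } \bar{\alpha}_. = \sum_{k=1}^{t} r_k\alpha_k/n$$

• and under the minimal model ($\boldsymbol{\psi}_G$) that $q_T(\boldsymbol{\psi}_G) = 0$.

• For the rat experiment, $\bar{\alpha}_. = (3\alpha_1 + 2\alpha_2 + \alpha_3)/6$, $\boldsymbol{\psi}_D' = \begin{bmatrix} \alpha_1 & \alpha_1 & \alpha_1 & \alpha_2 & \alpha_2 & \alpha_3 \end{bmatrix}$ and

$$q_T(\boldsymbol{\psi}_D) = \boldsymbol{\psi}_D'\mathbf{Q}_D\boldsymbol{\psi}_D/(t-1) = \sum_{k=1}^{t} r_k(\alpha_k - \bar{\alpha}_.)^2/(t-1) = \left\{3(\alpha_1 - \bar{\alpha}_.)^2 + 2(\alpha_2 - \bar{\alpha}_.)^2 + (\alpha_3 - \bar{\alpha}_.)^2\right\}/(3-1)$$

• $\bar{\alpha}_.$ is just the mean of the elements of $\boldsymbol{\psi}_T$.
• Actually, the quadratic form is the SSq of the elements of the vector

$$\begin{bmatrix} \alpha_1 - \bar{\alpha}_. & \alpha_1 - \bar{\alpha}_. & \alpha_1 - \bar{\alpha}_. & \alpha_2 - \bar{\alpha}_. & \alpha_2 - \bar{\alpha}_. & \alpha_3 - \bar{\alpha}_. \end{bmatrix}'$$

When will the SSq be zero?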


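The $q_T(\boldsymbol{\psi})$ function

• Now want to prove the following result:

$$q_T(\boldsymbol{\psi}_T) = \boldsymbol{\psi}_T'\mathbf{Q}_T\boldsymbol{\psi}_T/(t-1) = \sum_{k=1}^{t} r_k(\alpha_k - \bar{\alpha}_.)^2/(t-1)$$

• As $\mathbf{Q}_T$ is symmetric and idempotent, $\boldsymbol{\psi}'\mathbf{Q}_T\boldsymbol{\psi} = (\mathbf{Q}_T\boldsymbol{\psi})'\mathbf{Q}_T\boldsymbol{\psi}$

• $q_T(\boldsymbol{\psi})$ is the SSq of $\mathbf{Q}_T\boldsymbol{\psi}$, divided by $(t-1)$.
• $\mathbf{Q}_T\boldsymbol{\psi} = (\mathbf{M}_T - \mathbf{M}_G)\boldsymbol{\psi} = \mathbf{M}_T\boldsymbol{\psi} - \mathbf{M}_G\boldsymbol{\psi}$

– $\mathbf{M}_G$ replaces each element of $\boldsymbol{\psi}$ with the grand mean of the elements of $\boldsymbol{\psi}$

– $\mathbf{M}_T$ replaces each element of $\boldsymbol{\psi}$ with the mean of the elements of $\boldsymbol{\psi}$ that received the same treatment as the element being replaced.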


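The $q_T(\boldsymbol{\psi})$ function (continued)

• Under the maximal model ($\boldsymbol{\psi}_T$)

$$\mathbf{M}_G\boldsymbol{\psi}_T = \bar{\alpha}_.\mathbf{1}_n \quad\text{where } \bar{\alpha}_. = \sum_{k=1}^{t} r_k\alpha_k/n, \qquad \mathbf{M}_T\boldsymbol{\psi}_T = \boldsymbol{\psi}_T$$

so that $\boldsymbol{\psi}_T'\mathbf{Q}_T\boldsymbol{\psi}_T$ is the SSq of the elements of $\mathbf{M}_T\boldsymbol{\psi}_T - \mathbf{M}_G\boldsymbol{\psi}_T = \boldsymbol{\psi}_T - \bar{\alpha}_.\mathbf{1}_n$, i.e. of the $(\alpha_k - \bar{\alpha}_.)$, and

$$q_T(\boldsymbol{\psi}_T) = \boldsymbol{\psi}_T'\mathbf{Q}_T\boldsymbol{\psi}_T/(t-1) = \sum_{k=1}^{t} r_k(\alpha_k - \bar{\alpha}_.)^2/(t-1)$$

• Under the minimal model ($\boldsymbol{\psi}_G = \mu\mathbf{1}_n$)
– $\mathbf{M}_G\boldsymbol{\psi}_G = \mathbf{M}_T\boldsymbol{\psi}_G = \boldsymbol{\psi}_G$ so $\boldsymbol{\psi}_G'\mathbf{Q}_T\boldsymbol{\psi}_G = 0$ and $q_T(\boldsymbol{\psi}_G) = 0$;
– or $\alpha_k = \mu$ so that $\bar{\alpha}_. = \sum_{k=1}^{t} r_k\alpha_k/n = \mu\sum_{k=1}^{t} r_k/n = \mu$, and so $\alpha_k - \bar{\alpha}_. = 0$, so that $q_T(\boldsymbol{\psi}_G) = 0$.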


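Example III.1 Rat experiment (continued)

$$
\mathbf{Q}_D\boldsymbol{\psi}_D = (\mathbf{M}_D - \mathbf{M}_G)\boldsymbol{\psi}_D
=
\begin{bmatrix} \alpha_1\\ \alpha_1\\ \alpha_1\\ \alpha_2\\ \alpha_2\\ \alpha_3 \end{bmatrix}
-
\begin{bmatrix} \bar{\alpha}_.\\ \bar{\alpha}_.\\ \bar{\alpha}_.\\ \bar{\alpha}_.\\ \bar{\alpha}_.\\ \bar{\alpha}_. \end{bmatrix}
=
\begin{bmatrix} \alpha_1 - \bar{\alpha}_.\\ \alpha_1 - \bar{\alpha}_.\\ \alpha_1 - \bar{\alpha}_.\\ \alpha_2 - \bar{\alpha}_.\\ \alpha_2 - \bar{\alpha}_.\\ \alpha_3 - \bar{\alpha}_. \end{bmatrix}
$$

so that

$$q_T(\boldsymbol{\psi}_D) = \boldsymbol{\psi}_D'\mathbf{Q}_D\boldsymbol{\psi}_D/(t-1) = \sum_{k=1}^{t} r_k(\alpha_k - \bar{\alpha}_.)^2/(t-1) = \left\{3(\alpha_1 - \bar{\alpha}_.)^2 + 2(\alpha_2 - \bar{\alpha}_.)^2 + (\alpha_3 - \bar{\alpha}_.)^2\right\}/(3-1)$$

$$
\mathbf{Q}_D\boldsymbol{\psi}_G = (\mathbf{M}_D - \mathbf{M}_G)\boldsymbol{\psi}_G
=
\begin{bmatrix} 0\\ 0\\ 0\\ 0\\ 0\\ 0 \end{bmatrix}
$$

so that $q_T(\boldsymbol{\psi}_G) = \boldsymbol{\psi}_G'\mathbf{Q}_D\boldsymbol{\psi}_G/(t-1) = 0$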
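The behaviour of the q function can be explored numerically. The sketch below defines a small helper, `q.T` (a made-up name), that evaluates the weighted sum of squares of the treatment parameters about their weighted mean, divided by t − 1. The parameter values passed to it are purely hypothetical population means chosen for illustration, not estimates of anything.

```r
# q function: sum over k of r_k (alpha_k - alpha.bar)^2, divided by (t - 1),
# where alpha.bar is the replication-weighted mean of the alphas
q.T <- function(alpha, r) {
  alpha.bar <- sum(r * alpha) / sum(r)
  sum(r * (alpha - alpha.bar)^2) / (length(alpha) - 1)
}

r <- c(3, 2, 1)                      # replication in the rat experiment

# Hypothetical population means that differ between diets
q.diff <- q.T(c(3.1, 3.3, 2.7), r)
# Hypothetical population means that are all equal (minimal model)
q.same <- q.T(c(3.1, 3.1, 3.1), r)

print(c(different = q.diff, equal = q.same))
```

With equal parameters the function is zero, and it grows as the parameters spread further from their weighted mean.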


How $q_T(\boldsymbol{\psi}_T)$ depends on the $\alpha$s

• $q_T(\boldsymbol{\psi}_T)$ is a quadratic form and is basically a sum of squares, so it must be nonnegative.

• Indeed the magnitude of

$$q_T(\boldsymbol{\psi}_T) = \boldsymbol{\psi}_T'\mathbf{Q}_T\boldsymbol{\psi}_T/(t-1) = \sum_{k=1}^{t} r_k(\alpha_k - \bar{\alpha}_.)^2/(t-1)$$

depends on the size of the differences between the population treatment means, the $\alpha_k$s

– if all the $\alpha_k$s are similar they will be close to their mean,

– whereas if they are widely scattered several will be some distance from their mean.


E[MSq]s in terms of parameters

• Could compute the population mean of an MSq if we knew the $\alpha_k$s and $\sigma^2$.
• Treatment MSq will on average be at least as large as the Residual MSq
– as it is influenced by both uncontrolled variation and the magnitude of treatment differences.

• The quadratic form $q_T(\boldsymbol{\psi})$ will only be zero when all the $\alpha$s are equal, that is, when the null hypothesis is true.
– Then the E[MSq]s under the minimal model are equal so that the F value will be approximately one.

• Not surprising if we think about a particular experiment.

Source         df       MSq ($s^2$)                                                                             E[MSq]                                  F
Units          $n-1$
  Treatments   $t-1$    $s_T^2 = \mathbf{Y}'\mathbf{Q}_T\mathbf{Y}/(t-1)$                                       $\sigma^2 + q_T(\boldsymbol{\psi})$     $s_T^2 / s_{U_{\mathrm{Res}}}^2$
  Residual     $n-t$    $s_{U_{\mathrm{Res}}}^2 = \mathbf{Y}'\mathbf{Q}_{U_{\mathrm{Res}}}\mathbf{Y}/(n-t)$     $\sigma^2$

where $q_T(\boldsymbol{\psi}_T) = \boldsymbol{\psi}_T'\mathbf{Q}_T\boldsymbol{\psi}_T/(t-1) = \sum_{k=1}^{t} r_k(\alpha_k - \bar{\alpha}_.)^2/(t-1)$



• So what can potentially contribute to the difference in the observed means of 3.1 and 2.7 for diets A and C?

• Answer:
– Obviously, the different diets;
– not so obvious is that differences arising from uncontrolled variation also contribute, as 2 different groups of rats are involved.

• This is then reflected in the E[MSq] in that it involves $\sigma^2$ and the "variance" of the 3 effects.

Diet   A     B     C
       3.3   3.2   2.7
       3.1   3.4
       2.9
Mean   3.1   3.3   2.7


Justification of F-test

• Thus, the F test involves asking the question "Is the variance in the sample treatment means greater than can be expected from uncontrolled variation alone?".

• If the variance is no greater, conclude that $q_T(\boldsymbol{\psi}) = 0$: the minimal model is the correct model, since the expected Treatment MSq under this model is just $\sigma^2$.
• Otherwise, if the variance is greater, $q_T(\boldsymbol{\psi})$ is nonzero: the maximal model is required to describe the data.

• Similar argument to examining dotplots for Example II.2, Paper bag experiment.


d) Summary of the hypothesis test

• see notes


e) Comparison with traditional one-way ANOVA

• Our ANOVA table is essentially the same as the traditional table
– the values of the F statistic from each table are exactly the same.

• Labelling differs and the Total would normally be placed at the bottom of the table, not at the top.

Source         df       Source in traditional one-way ANOVA
Units          n − 1    Total
  Treatments   t − 1    Between Treatments
  Residual     n − t    Within Treatments

• Difference is symbolic:
– Units term explicitly represents a source of uncontrolled variation: differences between Units.
– Our table exhibits the confounding in the experiment.
– Indenting of Treatments under Units signifies that treatment differences are confounded or "mixed-up" with unit differences.


f) Computation of the ANOVA in R

• Begin with data entry (see Appendix A, Introduction to R, and Appendix C, Analysis of designed experiments in R).
• Next an initial graphical exploration using boxplots; defer to a second example with more data.
• ANOVA: while function lm could be used, function aov is preferred for analysing data from a designed experiment.
• Both use a model formula of the form:
– Response variable ~ explanatory variables (and operators)
– So far expressions on the right have been fairly simple: one or two explanatory variables separated by a "+".
• Subtlety with analysis of designed experiments:
– If an explanatory variable is numeric, such as a numeric vector, then R fits just one coefficient for it. For a single explanatory variable, a straight-line relationship is fitted.
– If an explanatory variable is categorical, such as a factor, a coefficient is fitted for each level of the variable; that is, indicator variables are used.
– In analysing a CRD it is important that the treatment factor is stored in a factor object, signalling R to use indicator variables.

f) Computation of the ANOVA in R (continued)

• There are two ways in which this analysis can be obtained using the aov function:
– without and with an Error function in the model formula.
– The Error function is used in the model formula to specify a model for the Error in the experiment (a model for uncontrolled variation).

• The summary function is used and this produces the ANOVA table.

• The model.tables function is used to obtain tables of means.

• The following commands perform the two analyses of the data:

#
# AOV without Error
#
Rat.NoError.aov <- aov(LiverWt ~ Diet, CRDRat.dat)
summary(Rat.NoError.aov)
#
# AOV with Error
#
Rat.aov <- aov(LiverWt ~ Diet + Error(Rat), CRDRat.dat)
summary(Rat.aov)
model.tables(Rat.aov, type = "means")


Output

> # AOV without Error
> #
> Rat.NoError.aov <- aov(LiverWt ~ Diet, CRDRat.dat)
> summary(Rat.NoError.aov)
            Df   Sum Sq  Mean Sq F value Pr(>F)
Diet         2 0.240000 0.120000     3.6 0.1595
Residuals    3 0.100000 0.033333
> #
> # AOV with Error
> #
> Rat.aov <- aov(LiverWt ~ Diet + Error(Rat), CRDRat.dat)
> summary(Rat.aov)

Error: Rat
            Df   Sum Sq  Mean Sq F value Pr(>F)
Diet         2 0.240000 0.120000     3.6 0.1595
Residuals    3 0.100000 0.033333

• Analysis without the Error function parallels the traditional analysis and that with Error is similar to our table.

• In this course we will use the Error function.


Output (continued)

> model.tables(Rat.aov, type="means")
Tables of means
Grand mean
3.1

 Diet
      A   B   C
    3.1 3.3 2.7
rep 3.0 2.0 1.0


III.D Diagnostic checking

• Assumed the following model: Y is distributed $N(\boldsymbol{\psi}, \sigma^2\mathbf{I}_n)$, where $\boldsymbol{\psi} = E[\mathbf{Y}] = \mathbf{X}\boldsymbol{\theta}$ and $\sigma^2$ is the variance of an observation.

• Maximal model is used in diagnostic checking: $\boldsymbol{\psi}_T = E[\mathbf{Y}] = \mathbf{X}_T\boldsymbol{\alpha}$

• For this model to be appropriate requires that:

a) the response is operating additively, that is, the individual deviations in the response to a treatment are similar;

b) the sets of units assigned the treatments are comparable in that the amount of uncontrolled variation exhibited by them is the same for each treatment;

c) each observation is independent of other observations; and

d) the response of the units is normally distributed.


Example III.2 Caffeine effects on students

• Effect of orally ingested caffeine on a physical task was investigated (Draper and Smith, 1981, sec.9.1).

• Thirty healthy male college students were selected and trained in finger tapping.

• Ten men were randomly assigned to each of three doses of caffeine (0, 100 or 200 mg).

• The number of finger taps after ingesting the caffeine was recorded for each student.

Caffeine Dose (mg)
  0   100   200
242   248   246
245   246   248
244   245   250
248   247   252
247   248   248
248   250   250
242   247   246
244   246   248
246   243   245
242   244   250


Entering the data

• Setting up a data frame for data arranged in standard order:
– factor Students with values 1–30,
– factor Dose with levels 0, 100 and 200 and values depending on whether data is entered by rows or columns (can use rep function),
– numeric vector Taps with the 30 observed values of the response variable.


Entering the data (cont'd)

#set up data.frame with factors Students and Dose and response variable Taps
CRDCaff.dat <- data.frame(Students = factor(1:30),
                          Dose = factor(rep(c(0,100,200), times=10)))
CRDCaff.dat$Taps <- c(242,248,246,245,246,248,244,245,250,248,247,252,247,248,248,
                      248,250,250,242,247,246,244,246,248,246,243,245,242,244,250)
CRDCaff.dat

> CRDCaff.dat

Students Dose Taps

1 1 0 242

2 2 100 248

3 3 200 246

4 4 0 245

5 5 100 246

6 6 200 248

7 7 0 244

8 8 100 245

9 9 200 250

10 10 0 248

11 11 100 247

12 12 200 252

13 13 0 247

14 14 100 248

15 15 200 248

16 16 0 248

17 17 100 250

18 18 200 250

19 19 0 242

20 20 100 247

21 21 200 246

22 22 0 244

23 23 100 246

24 24 200 248

25 25 0 246

26 26 100 243

27 27 200 245

28 28 0 242

29 29 100 244

30 30 200 250


Boxplots for each level of Dose

• Use function

boxplot(split(Taps, Dose), xlab="Dose", ylab="Number of taps")

• Average number of taps increasing as dose increases.

• Some evidence dose 0 more variable than dose 100.

[Figure: boxplots of Number of taps (y-axis, 242 to 252) for each Dose (x-axis: 0, 100, 200)]

Analysis of variance for this data

> Caffeine.aov <- aov(Taps ~ Dose + Error(Students), CRDCaff.dat)

> summary(Caffeine.aov)

Error: Students

Df Sum Sq Mean Sq F value Pr(>F)

Dose 2 61.400 30.700 6.1812 0.006163

Residuals 27 134.100 4.967

> model.tables(Caffeine.aov, type="means")

Tables of means

Grand mean

246.5

Dose

Dose

0 100 200

244.8 246.4 248.3


The hypothesis test

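Step 1: Set up hypotheses

H0: $\alpha_0 = \alpha_{100} = \alpha_{200}$ ($\boldsymbol{\psi}_G = \mathbf{X}_G\mu$)

H1: not all population Dose means are equal ($\boldsymbol{\psi}_D = \mathbf{X}_D\boldsymbol{\alpha}$)

Set $\alpha = 0.05$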


The hypothesis test (continued)

Step 2: Calculate test statistic

The ANOVA table for the example is:

Source        df    SSq      MSq     F      Prob
Students      29   195.50
  Doses        2    61.40   30.70   6.18   0.006
  Residual    27   134.10    4.97

Step 3: Decide between hypotheses

$P(F_{2,27} \ge 6.18) = p = 0.006 < 0.05$.

The evidence suggests there is a dose difference and that the expectation model that best describes the data is $\boldsymbol{\psi}_D = \mathbf{X}_D\boldsymbol{\alpha}$.
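The entries in this table can be verified from first principles in base R. This sketch re-enters the data (so that it runs on its own), computes the Doses SSq from the dose means and obtains the Residual SSq by difference; object names other than Taps and Dose are made up for illustration.

```r
Taps <- c(242,248,246,245,246,248,244,245,250,248,247,252,247,248,248,
          248,250,250,242,247,246,244,246,248,246,243,245,242,244,250)
Dose <- factor(rep(c(0, 100, 200), times = 10))

grand <- mean(Taps)
means <- tapply(Taps, Dose, mean)      # dose means
r <- tapply(Taps, Dose, length)        # replication of each dose

ssq.total <- sum((Taps - grand)^2)     # Students (corrected total) SSq
ssq.dose  <- sum(r * (means - grand)^2)  # Doses SSq
ssq.res   <- ssq.total - ssq.dose      # Residual SSq by difference

F.value <- (ssq.dose / 2) / (ssq.res / 27)
p.value <- pf(F.value, df1 = 2, df2 = 27, lower.tail = FALSE)

print(round(c(Total = ssq.total, Doses = ssq.dose, Residual = ssq.res), 2))
print(round(c(F = F.value, p = p.value), 4))
```

The values agree with the aov output: an F of about 6.18 on 2 and 27 degrees of freedom, with p of about 0.006.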


Examination of the residuals, eT

• Use the Residuals-versus-fitted-values plot and the Normal Probability plot.

• In interpreting these plots:
– Note ANOVA is robust to variance heterogeneity, if treatments are equally replicated, and to moderate departures from normality.
– Most commonly find an unusually large or small residual. The cause of such extreme values requires investigation. It may be a recording mistake or a catastrophe for a unit that can be identified. But it may be valid and the result of some unanticipated, but important, effect.


The Normal Probability plot

• Should show a broadly straight line trend.

[Schematic: points scattered closely about a rising straight line]


The Residuals-versus-fitted-values plot

• Generally, the points on the scatter diagram should be spread evenly across the plot.

[Schematic: even scatter of points, no particular pattern]

Problem plots

[Schematic: systematic trend in residuals]

[Schematic: variance increases as level increases]

[Schematic: variance peaks in middle]

Actually, for the CRD, this plot has a vertical scatter of points for each treatment; each should be centred on zero and of the same width.


R functions used in producing these plots

resid.errors: to extract the residuals from an aov object when the Error function is used

fitted.errors: to extract the fitted values from an aov object when the Error function is used

plot: to plot the residuals against the fitted values

qqnorm: to plot the residuals against the normal quantiles

qqline: to add a line to the plot produced by qqnorm.

• The first 2 functions are nonstandard functions from the dae package.
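If the dae package is not available, the same residuals and fitted values can be obtained in base R by fitting the model without the Error function and using the standard extractor functions `residuals` and `fitted`. This is a sketch of that alternative, not the approach used in these notes; the object name `Caffeine.NoError.aov` is made up, and the data are re-entered so that the block runs on its own.

```r
Taps <- c(242,248,246,245,246,248,244,245,250,248,247,252,247,248,248,
          248,250,250,242,247,246,244,246,248,246,243,245,242,244,250)
Dose <- factor(rep(c(0, 100, 200), times = 10))

# Fit without Error(): the result is an ordinary aov object
Caffeine.NoError.aov <- aov(Taps ~ Dose)
res <- residuals(Caffeine.NoError.aov)
fit <- fitted(Caffeine.NoError.aov)

# Residuals-versus-fitted-values plot and normal probability plot
plot(fit, res, pch = 16)
abline(h = 0, lty = 2)
qqnorm(res, pch = 16)
qqline(res)
```

The fitted values are the three dose means, so the first plot shows one vertical strip of residuals per dose.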


Example III.2 Caffeine effects on students (continued)

• A violation of the assumptions would occur if all the students were in the same room and the presence of other students caused anxiety to just the students that had no caffeine.
– The responses of the students are then not independent.
– It may be that the inhibition of this group resulted in less variation in their response, which would be manifest in the plot.

• Another situation that would lead to an unacceptable pattern in the plot is if the effect becomes more variable as the level of the response variable increases.
– For example, caffeine increases the tapping but at higher levels the variability of the increase from student to student is greater.
– That is, there is a lack of additivity in the response.


R output for getting the plots

Note the use of data.frame to produce a printed list of the original data with the residuals and fitted values.

> res <- resid.errors(Caffeine.aov)
> fit <- fitted.errors(Caffeine.aov)
> data.frame(Students,Dose,Taps,res,fit)
   Students Dose Taps  res   fit
1         1    0  242 -2.8 244.8
2         2  100  248  1.6 246.4
3         3  200  246 -2.3 248.3
4         4    0  245  0.2 244.8
5         5  100  246 -0.4 246.4
6         6  200  248 -0.3 248.3
7         7    0  244 -0.8 244.8
8         8  100  245 -1.4 246.4
9         9  200  250  1.7 248.3
10       10    0  248  3.2 244.8
11       11  100  247  0.6 246.4
12       12  200  252  3.7 248.3
13       13    0  247  2.2 244.8
14       14  100  248  1.6 246.4
15       15  200  248 -0.3 248.3
16       16    0  248  3.2 244.8
17       17  100  250  3.6 246.4
18       18  200  250  1.7 248.3
19       19    0  242 -2.8 244.8
20       20  100  247  0.6 246.4
21       21  200  246 -2.3 248.3
22       22    0  244 -0.8 244.8
23       23  100  246 -0.4 246.4
24       24  200  248 -0.3 248.3
25       25    0  246  1.2 244.8
26       26  100  243 -3.4 246.4
27       27  200  245 -3.3 248.3
28       28    0  242 -2.8 244.8
29       29  100  244 -2.4 246.4
30       30  200  250  1.7 248.3
> plot(fit, res, pch = 16)
> qqnorm(res, pch = 16)
> qqline(res)

Plots for the example

• The Residuals-versus-fitted-values plot appears to be fine.

[Figure: res (y-axis, -3 to 3) plotted against fit (x-axis, 245.0 to 248.0)]

Normal Probability plot

• Displaying some curvature at ends.
– Indicates data heavier in tails and flatter in the peak than expected for a normal distribution.
– Given that normality is not crucial and only a few observations are involved, use the analysis we have performed.

[Figure: Normal Q-Q Plot of Sample Quantiles (y-axis, -3 to 3) against Theoretical Quantiles (x-axis, -2 to 2)]

The hypothesis test – Summary

The ANOVA table for the example is:

Source        df    SSq      MSq     F      Prob
Students      29   195.50
  Doses        2    61.40   30.70   6.18   0.006
  Residual    27   134.10    4.97

$P(F_{2,27} \ge 6.18) = p = 0.006 < 0.05$.

The evidence suggests there is a dose difference and that the expectation model that best describes the data is $\boldsymbol{\psi}_D = \mathbf{X}_D\boldsymbol{\alpha}$.

Diagnostic checking indicates model is OK


III.E Treatment differences

• So far all that our analysis has accomplished is that we have decided whether or not there appears to be a difference between the population treatment means.

• Of greater interest to the researcher is how the treatment means differ.

• Two alternatives are available:
  1. Multiple comparisons procedures: used when the treatment factors are all qualitative, so that it is appropriate to test for differences between treatments.
  2. Fitting submodels: when one (or more) of the treatment factors is quantitative, the fitting of smooth curves to the trend in the means is likely to lead to a more appropriate and concise description of the effects of the factors. Often, for reasons explained in chapter II, a low order polynomial will provide an adequate description of the trend.


Note

• Multiple comparison procedures should not be used when the test for treatment differences is not significant.

• Submodels should be fitted irrespective of whether the overall test for treatment differences is significant.

• The difference in usage has to do with one being concerned with mean differences and the other with deciding between models.


a) Multiple comparisons procedures for comparing all treatments

• Multiple comparisons for all treatments = MCA procedures.

• In general, MCA procedures divide into those based on:
  – family-wise error rates: the Type I error rate is specified and controlled for over all comparisons, often at 0.05;
  – comparison-wise error rates: the Type I error rate is specified and controlled for each comparison.

• The problem with the latter is that the probability of an incorrect conclusion gets very high as the number of comparisons increases.

• For a comparison-wise error rate of 0.05, the family-wise error rate is:

No. means   No. of pairs compared   Family-wise error rate
    5                10                    0.40
   10                45                    0.90
   15               105                    0.995

• So we recommend using an MCA procedure based on family-wise error rates; we use just Tukey's HSD procedure.
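
The tabulated family-wise rates can be reproduced with a short calculation. This is a sketch under an assumption not stated in the slides: that the m pairwise comparisons are treated as independent tests, so that the family-wise rate is 1 − (1 − 0.05)^m. The variable names are illustrative.

```r
alpha   <- 0.05                  # comparison-wise Type I error rate
n.means <- c(5, 10, 15)          # numbers of treatment means
n.pairs <- choose(n.means, 2)    # number of pairwise comparisons

# Family-wise rate if the comparisons were independent tests (an approximation)
fwe <- 1 - (1 - alpha)^n.pairs

data.frame(n.means, n.pairs, fwe = round(fwe, 3))
```

The rounded values agree with the table: about 0.40, 0.90 and 0.995 for 10, 45 and 105 comparisons.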


Tukey’s Honestly Significant Difference procedure

• determines if each pair of means is significantly different

• is based on a family-wise error rate.

• is intended primarily for equal numbers of observations for each mean.

An approximate modification will be provided for unequal numbers.


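The procedure

• Each application of the procedure is based on the hypotheses:

H0: $\alpha_A = \alpha_B$

H1: $\alpha_A \ne \alpha_B$

• One calculates the statistic:

$$w(\alpha\%) = \frac{q_{t,\nu,\alpha}}{\sqrt{2}} \times s_{\bar{x}_d}$$

where
  – $q_{t,\nu,\alpha}$ is the studentized range with t = no. of means, $\nu$ = Residual df and $\alpha$ = significance level;
  – $s_{\bar{x}_d} = \sqrt{\text{Residual MSq} \times \left(\dfrac{1}{r_A} + \dfrac{1}{r_B}\right)}$ is the standard error of the difference;
  – $r_A$, $r_B$ are the numbers of replicates for each of a pair of means being compared.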


Notes about replication

• Note that, strictly speaking, $r_A$ and $r_B$ should be equal.

• When $r_A$ and $r_B$ are unequal, the procedure is called the Tukey-Kramer procedure.
  – $w(\alpha\%)$ will depend on which means are being compared.

• If the treatments are all equally replicated with replication r, the formula for $s_{\bar{x}_d}$ reduces to

$$s_{\bar{x}_d} = \sqrt{\text{Residual MSq} \times \frac{2}{r}}$$


In R

• aov has given the Residual MSq and df.
• model.tables produces tables of means.
• qtukey computes $q_{t,\nu,\alpha}$ as follows:

q <- qtukey(1 - α, t, ν)



• Have already concluded evidence suggests that there is a dose difference. But which doses are different?

• Output with means and q:

> model.tables(Caffeine.aov, type = "means")
Tables of means
Grand mean

246.5

 Dose
Dose
    0   100   200
244.8 246.4 248.3
> q <- qtukey(0.95, 3, 27)
> q
[1] 3.506426

$$w(5\%) = \frac{3.506}{\sqrt{2}} \times \sqrt{\frac{4.967 \times 2}{10}} = \frac{3.506}{\sqrt{2}} \times 0.997 = 2.47$$


Decide on differences

• Any two means different by 2.47 or more are significantly different.

Differences between all pairs of Dose means

Dose             0      100     200
        Mean   244.8   246.4   248.3
  0     244.8
100     246.4    1.6
200     248.3    3.5     1.9

w(5%) = 2.47

• Our conclusion is that the means for doses 0 and 200 are different, but that the mean for 100 is somewhat intermediate.
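
The calculation of w(5%) and the comparison of all pairs of means can be sketched in R as follows. The means, Residual MSq, Residual df and replication are the values quoted for the caffeine example; the variable names are illustrative, and equal replication (r = 10 per dose) is assumed.

```r
means   <- c("0" = 244.8, "100" = 246.4, "200" = 248.3)  # Dose means
res.msq <- 4.967    # Residual MSq from the ANOVA table
res.df  <- 27       # Residual df
r       <- 10       # replicates per dose (equal replication assumed)
t.means <- length(means)

q <- qtukey(0.95, t.means, res.df)          # studentized range quantile
w <- q / sqrt(2) * sqrt(res.msq * 2 / r)    # w(5%)

diffs <- abs(outer(means, means, "-"))      # absolute pairwise differences
signif.diff <- diffs >= w                   # TRUE where a pair differs significantly

round(w, 2)
signif.diff
```

Only the pair (0, 200) has a difference of at least w(5%), which matches the conclusion drawn from the table of differences.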


b) Fitting submodels

• For quantitative factors, like Dose, it is often better to examine the relationship between the response and the levels of the factor.

• This is commonly done using polynomials.
• Now, a polynomial of degree t − 1 will fit exactly t points.
  – So a quadratic will fit exactly the three means.
• In practice, polynomials of order 2 are often sufficient.
  – However, more than 3 points may be desirable so that deviations from the fitted curve, or lack of fit, can be tested.


Polynomial models

• To investigate polynomial models up to order 2, the following models for the expectation are investigated:

$$
\begin{aligned}
E[Y_i] &= \mu \\
E[Y_i] &= \mu + \gamma_1 x_k \\
E[Y_i] &= \mu + \gamma_1 x_k + \gamma_2 x_k^2 \\
E[Y_i] &= \alpha_k
\end{aligned}
$$

where
  $x_k$ is the value of the kth level of the treatment factor,
  $\mu$ is the intercept of the fitted equation,
  $\gamma_1$ is the slope of the fitted equation and
  $\gamma_2$ is the quadratic coefficient of the fitted equation.

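In matrix terms

$$
\begin{aligned}
E[\mathbf{Y}] &= \mathbf{X}_G\mu \\
E[\mathbf{Y}] &= \mathbf{X}_1\boldsymbol{\gamma}_1 \quad \text{where } \boldsymbol{\gamma}_1' = [\mu \;\; \gamma_1] \\
E[\mathbf{Y}] &= \mathbf{X}_2\boldsymbol{\gamma}_2 \quad \text{where } \boldsymbol{\gamma}_2' = [\mu \;\; \gamma_1 \;\; \gamma_2] \\
E[\mathbf{Y}] &= \mathbf{X}_T\boldsymbol{\alpha} \quad \text{where } \boldsymbol{\alpha}' = [\alpha_0 \;\; \alpha_{100} \;\; \alpha_{200}]
\end{aligned}
$$

The $\mathbf{X}_1$ and $\mathbf{X}_2$ matrices are made up of columns that consist of the values of the levels of the factor and their powers, not the indicator variables of before.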



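• The X matrices for the example are:

$$
\mathbf{X}_G = \begin{bmatrix} 1\\1\\1\\1\\1\\1\\1\\1\\1 \end{bmatrix},\quad
\mathbf{X}_1 = \begin{bmatrix} 1&0\\1&0\\1&0\\1&100\\1&100\\1&100\\1&200\\1&200\\1&200 \end{bmatrix},\quad
\mathbf{X}_2 = \begin{bmatrix} 1&0&0\\1&0&0\\1&0&0\\1&100&10000\\1&100&10000\\1&100&10000\\1&200&40000\\1&200&40000\\1&200&40000 \end{bmatrix},\quad
\mathbf{X}_D = \begin{bmatrix} 1&0&0\\1&0&0\\1&0&0\\0&1&0\\0&1&0\\0&1&0\\0&0&1\\0&0&1\\0&0&1 \end{bmatrix}
$$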

• The columns of each X matrix in the above list are a linear combination of those of any of the X matrices to its right in the list.


Marginality of models

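• Each model in the sequence $E[\mathbf{Y}] = \mathbf{X}_G\mu$, $E[\mathbf{Y}] = \mathbf{X}_1\boldsymbol{\gamma}_1$, $E[\mathbf{Y}] = \mathbf{X}_2\boldsymbol{\gamma}_2$, $E[\mathbf{Y}] = \mathbf{X}_T\boldsymbol{\alpha}$ is marginal to all models after it, as

$$\mathcal{C}(\mathbf{X}_G) \subseteq \mathcal{C}(\mathbf{X}_1) \subseteq \mathcal{C}(\mathbf{X}_2) \subseteq \mathcal{C}(\mathbf{X}_T)$$

• That is, the columns of each X matrix in the above list are a linear combination of those of any of the X matrices to its right in the list.

• Marginality is not a symmetric relationship, in that
  – if a model is marginal to a second model,
  – the second model is not necessarily marginal to the first.

• For example, $E[\mathbf{Y}] = \mathbf{X}_1\boldsymbol{\gamma}_1$ is marginal to $E[\mathbf{Y}] = \mathbf{X}_2\boldsymbol{\gamma}_2$ but not vice versa, except when t = 2.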


Equivalence of $E[\mathbf{Y}] = \mathbf{X}_2\boldsymbol{\gamma}_2$ and $E[\mathbf{Y}] = \mathbf{X}_D\boldsymbol{\alpha}$

• $\mathcal{C}(\mathbf{X}_2) = \mathcal{C}(\mathbf{X}_D)$, as the 3 columns of one matrix can be written as 3 linearly independent combinations of the columns of the other matrix.
• So the two models are marginal to each other and are equivalent.
• However, while the fitted values are the same, the estimates and interpretation of the parameters are different:
  – those corresponding to $\mathbf{X}_2$ are interpreted as the intercept, slope and curvature coefficient;
  – those corresponding to $\mathbf{X}_D$ are interpreted as the expected (mean) value for that treatment.
• Also, in spite of being marginal, the estimators of the same parameter differ depending on the model that has been fitted:
  – for example, for the model $E[\mathbf{Y}] = \mathbf{X}_G\mu$, $\hat{\mu} = \bar{Y}$,
  – but for $E[\mathbf{Y}] = \mathbf{X}_1\boldsymbol{\gamma}_1$, $\hat{\mu} = \bar{Y} - \hat{\gamma}_1\bar{x}$;
  – that is, the models are not orthogonal.
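
Marginality and equivalence can be checked numerically by comparing column spaces. The sketch below builds the four X matrices for 3 doses with 3 units each and uses matrix ranks: A is taken to be marginal to B when appending the columns of A to B does not increase the rank. The helper functions rk and marginal are hypothetical names introduced for this illustration, not functions from the notes or from the dae package.

```r
x  <- rep(c(0, 100, 200), each = 3)      # dose for each of 9 units
XG <- cbind(rep(1, 9))                   # grand mean model
X1 <- cbind(1, x)                        # linear model
X2 <- cbind(1, x, x^2)                   # quadratic model
XD <- model.matrix(~ factor(x) - 1)      # indicator variables for the doses

# Rank of a matrix via its QR decomposition
rk <- function(X) qr(X)$rank

# A is marginal to B if the columns of A lie in the column space of B
marginal <- function(A, B) rk(cbind(B, A)) == rk(B)

marginal(XG, X1)   # grand mean model is marginal to the linear model
marginal(X1, X2)   # linear is marginal to quadratic
marginal(X2, X1)   # but not the other way round
marginal(X2, XD)   # quadratic and treatment models are
marginal(XD, X2)   #   marginal to each other when t = 3
```

With t = 3 the last two checks both hold, which is the equivalence of the quadratic and treatment models described above.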


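Hypothesis test incorporating submodels

• Test statistics are computed in the ANOVA table.

Step 1: Set up hypotheses

a) H0: $\alpha_k - (\mu + \gamma_1 x_k + \gamma_2 x_k^2) = 0$ for all k
   H1: $\alpha_k - (\mu + \gamma_1 x_k + \gamma_2 x_k^2) \ne 0$ for some k
   (Under H0 the differences from the fitted model, that is the Deviations from quadratic, are zero.)

b) H0: $\gamma_2 = 0$
   H1: $\gamma_2 \ne 0$

c) H0: $\gamma_1 = 0$
   H1: $\gamma_1 \ne 0$

Set $\alpha$.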


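Hypothesis test (continued)

Step 2: Calculate test statistics

The ANOVA table for a CRD is:

$$
\begin{array}{llllll}
\text{Source} & \text{df} & \text{SSq} & \text{MSq} & F & p \\
\hline
\text{Units} & n-1 & \mathbf{Y}'\mathbf{Q}_U\mathbf{Y} & & & \\
\quad\text{Treatments} & t-1 & \mathbf{Y}'\mathbf{Q}_T\mathbf{Y} & \mathbf{Y}'\mathbf{Q}_T\mathbf{Y}/(t-1) & & \\
\quad\quad\text{Linear} & 1 & \mathbf{Y}'\mathbf{Q}_{T_L}\mathbf{Y} & \mathbf{Y}'\mathbf{Q}_{T_L}\mathbf{Y}/1 = s^2_{T_L} & s^2_{T_L}/s^2_{U_{Res}} & p_{T_L} \\
\quad\quad\text{Quadratic} & 1 & \mathbf{Y}'\mathbf{Q}_{T_Q}\mathbf{Y} & \mathbf{Y}'\mathbf{Q}_{T_Q}\mathbf{Y}/1 = s^2_{T_Q} & s^2_{T_Q}/s^2_{U_{Res}} & p_{T_Q} \\
\quad\quad\text{Deviations} & t-3 & \mathbf{Y}'\mathbf{Q}_{T_{Dev}}\mathbf{Y} & \mathbf{Y}'\mathbf{Q}_{T_{Dev}}\mathbf{Y}/(t-3) = s^2_{T_{Dev}} & s^2_{T_{Dev}}/s^2_{U_{Res}} & p_{T_{Dev}} \\
\quad\text{Residual} & n-t & \mathbf{Y}'\mathbf{Q}_{U_{Res}}\mathbf{Y} & \mathbf{Y}'\mathbf{Q}_{U_{Res}}\mathbf{Y}/(n-t) = s^2_{U_{Res}} & &
\end{array}
$$

Note that $\mathbf{Q}_{T_L}$, $\mathbf{Q}_{T_Q}$ and $\mathbf{Q}_{T_{Dev}}$ are not simple linear functions of M, or mean operator, matrices; the other Q matrices are.

The test statistics correspond to the hypothesis pairs in Step 1.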


Hypothesis test (continued)

Step 3: Decide between hypotheses

Begin with the first hypothesis pair, determine its significance and continue down the sequence until a significant result is obtained.
  – A significant Deviations F indicates that the linear and quadratic terms provide an inadequate description. Thus a model based on them would be unsatisfactory, and there is no point in continuing.
  – If the Deviations F is not significant, then a significant Quadratic F indicates that a 2nd degree polynomial is required to adequately describe the trend. As a linear coefficient is necessarily incorporated in a 2nd degree polynomial, there is no point in further testing in this case.
  – If both the Deviations and Quadratic F's are not significant, then a significant Linear F indicates that a linear relationship describes the trend in the treatment means.


Fitting polynomials in R

• Need to realize that the default contrasts for ordered (factor) objects are polynomial contrasts, assuming equally-spaced levels:
  – these are linear transformations of all columns of X, except the first, such that each column
    a) is orthogonal to all other columns in X and
    b) has SSq equal to 1.
• This facilitates the computations.
• So, if a factor is quantitative:
  – set it up as an ordered object from the start;
  – if it was not made an ordered object, then redefine the factor as an ordered.



• R output for setting up ordered (factor):

> # fit polynomials
> #
> Dose.lev <- c(0,100,200)
> CRDCaff.dat$Dose <- ordered(CRDCaff.dat$Dose,
+                             levels=Dose.lev)
> contrasts(CRDCaff.dat$Dose) <- contr.poly(t,
+                             scores=Dose.lev)
> contrasts(CRDCaff.dat$Dose)
               .L         .Q
0   -7.071068e-01  0.4082483
100 -9.073264e-17 -0.8164966
200  7.071068e-01  0.4082483

Notes on the output:
  – Dose.lev: the Dose levels are stored to save repeatedly entering them.
  – The contrasts assignment with scores=Dose.lev is redundant for equally spaced, but not unequally spaced, levels.
  – The .L and .Q columns are coded values of dose and dose squared with sum = 0, SSq = 1 and cross-product = 0.
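
The stated properties of the polynomial contrasts can be verified directly. This is a small sketch using contr.poly from base R (the stats package) with the number of levels written explicitly as 3; the object name P is illustrative.

```r
Dose.lev <- c(0, 100, 200)
P <- contr.poly(3, scores = Dose.lev)   # columns .L and .Q

colSums(P)      # each column sums to (numerically) zero
crossprod(P)    # diagonal = SSq of each column, off-diagonal = cross-product

# For equally spaced levels the scores make no difference
all.equal(P, contr.poly(3), check.attributes = FALSE)
```

The cross-product matrix is the 2 × 2 identity up to rounding error, which is what "SSq 1, cross-product 0" means.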


R output for fitting polynomial submodels

> Caffeine.aov <- aov(Taps ~ Dose + Error(Students),
+                     CRDCaff.dat)
> summary(Caffeine.aov,
+         split = list(Dose = list(L = 1, Q = 2)))
Error: Students
          Df  Sum Sq Mean Sq F value   Pr(>F)
Dose       2  61.400  30.700  6.1812 0.006163
  Dose: L  1  61.250  61.250 12.3322 0.001585
  Dose: Q  1   0.150   0.150  0.0302 0.863331
Residuals 27 134.100   4.967

• Note that 3 treatments will follow a quadratic exactly; this is reflected in $\mathcal{C}(\mathbf{X}_2) = \mathcal{C}(\mathbf{X}_D)$.
• Thus, the Deviations line is redundant in this example.
• When required, the Deviations line is computed by assigning the extra powers to a single line named, say, Dev in the split argument (e.g. Dev = 3:6; see notes).


Hypothesis test incorporating submodels

Step 1: Set up hypotheses

a) H0: $\gamma_2 = 0$
   H1: $\gamma_2 \ne 0$

b) H0: $\gamma_1 = 0$
   H1: $\gamma_1 \ne 0$

Set $\alpha = 0.05$.


Hypothesis test (continued)

Step 2: Calculate test statistics

The aov and summary commands, and their output, are as shown under "R output for fitting polynomial submodels". The resulting ANOVA table is:

Source          df     SSq     MSq       F    Prob
Students        29  195.50
  Doses          2   61.40   30.70    6.18   0.006
    Linear       1   61.25   61.25   12.33   0.002
    Quadratic    1    0.15    0.15    0.03   0.863
  Residual      27  134.10    4.97


Hypothesis test (continued)

Step 3: Decide between hypotheses

The Quadratic source has a probability of 0.863 > 0.05 and so H0 is not rejected in this case.

The Linear source has a probability of 0.002 < 0.05 and so H0 is rejected in this case.

It is clear that the quadratic term is not significant but the linear term is highly significant, so the appropriate model for the expectation is the linear model $E[\mathbf{Y}] = \mathbf{X}_1\boldsymbol{\gamma}_1$ where $\boldsymbol{\gamma}_1' = [\mu \;\; \gamma_1]$.

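
The F statistics and probabilities for the Linear and Quadratic lines can be recomputed from the sums of squares in the ANOVA table. The sketch below uses only the tabulated SSq and df values; the variable names are illustrative.

```r
ssq     <- c(Linear = 61.25, Quadratic = 0.15)  # each has 1 df
res.ssq <- 134.10                               # Residual SSq
res.df  <- 27                                   # Residual df
res.msq <- res.ssq / res.df                     # Residual MSq

F.stat <- (ssq / 1) / res.msq                   # MSq for each line over Residual MSq
p.val  <- pf(F.stat, df1 = 1, df2 = res.df, lower.tail = FALSE)

round(cbind(F = F.stat, p = p.val), 4)
sum(ssq)    # Linear + Quadratic SSq add up to the Doses SSq
```

The two single-df sums of squares add to the Doses SSq of 61.40, so the Linear and Quadratic lines partition the Doses line.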


Fitted equation

• Coefficients can be obtained using the coef function on the aov object, but these are not suitable for obtaining the fitted values.

• The fitted equation is obtained by putting the values of the levels into a numeric vector and using the lm function to fit a polynomial of the order indicated by the hypothesis test.

• For the example, a linear equation was adequate, so the analysis is redone with 1 for the order of the polynomial.


Fitted equation for the example

> D <- as.vector(Dose)
> D <- as.numeric(D)
> Caffeine.lm <- lm(Taps ~ D)
> coef(Caffeine.lm)
(Intercept)           D 
   244.7500      0.0175 

• The fitted equation is Y = 244.75 + 0.0175 X, where Y is the number of taps and X is the dose of caffeine (mg).
• The slope of this equation is 0.0175.
• That is, taps increase by 0.0175 × 100 = 1.75 with each 100 mg of caffeine.
• This conclusion seems a more satisfactory summary of the results than that the response at 200 is significantly greater than at 0 with 100 being intermediate.
• The commands to fit a quadratic would be:

D2 <- D*D
Caffeine.lm <- lm(Taps ~ D + D2)
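
The fitted line can be reproduced without the CRDCaff.dat data frame. This is a self-contained sketch, assuming the data are those printed in the data frame under "R output for getting the plots"; the object names Caff.lin and Caff.quad are illustrative, and I(D^2) is used here as an alternative to creating the separate variable D2.

```r
# Dose and Taps transcribed from the printed data frame (students 1 to 30)
D    <- rep(c(0, 100, 200), times = 10)
Taps <- c(242, 248, 246, 245, 246, 248, 244, 245, 250, 248,
          247, 252, 247, 248, 248, 248, 250, 250, 242, 247,
          246, 244, 246, 248, 246, 243, 245, 242, 244, 250)

# Linear fit: intercept and slope of the fitted equation
Caff.lin <- lm(Taps ~ D)
coef(Caff.lin)

# Fitted taps at each dose level
predict(Caff.lin, newdata = data.frame(D = c(0, 100, 200)))

# Quadratic fit, and an F test of the extra quadratic term
Caff.quad <- lm(Taps ~ D + I(D^2))
anova(Caff.lin, Caff.quad)
```

The F statistic in the anova comparison corresponds to the Quadratic line of the ANOVA table, since both use the Residual MSq from the quadratic (equivalently, treatment) model.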


Plotting means and fitted line

• Details are in the notes.

• The plot produced is as follows:

[Figure: Dose means and fitted line; x-axis: Dose (0 to 200), y-axis: Taps (245.0 to 248.0)]

c) Comparison of treatment parametrizations

• Two alternative bases:

$$
\text{Indicator variables: } \begin{bmatrix} 1&0&0\\0&1&0\\0&0&1 \end{bmatrix}
\qquad
\text{Polynomials: } \begin{bmatrix} 1&x_1&x_1^2\\1&x_2&x_2^2\\1&x_3&x_3^2 \end{bmatrix}
$$

• These lead to different parameter estimates with different interpretations and different partitions of the treatment SSqs.

• The total treatment SSqs and fitted values for treatments remain the same, as long as the contrasts span the treatment space.

• That is, in this case, SSqT = SSqL + SSqQ.


III.G Exercises

• Ex. III.1–2 look at aspects of quadratic forms

• Ex. III.3 investigates the calculations, using an example that can be done with a calculator

• Ex. III.4 involves producing a randomized layout

• Ex. III.5 asks for the complete analysis of a CRD with a qualitative treatment factor

• Ex. III.6 asks for the complete analysis of a CRD with a quantitative treatment factor
