Factorial Experiments Analysis of Variance Experimental Design.
oren-fitzgerald -
Factorial Experiments
Analysis of Variance
Experimental Design
• Dependent variable Y
• k Categorical independent variables A, B, C, … (the Factors)
• Let
– a = the number of categories of A
– b = the number of categories of B
– c = the number of categories of C
– etc.
The Completely Randomized Design
• We form the set of all treatment combinations – the set of all combinations of the k factors
• Total number of treatment combinations
– t = abc…
• In the completely randomized design, n experimental units (test animals, test plots, etc.) are randomly assigned to each treatment combination.
– Total number of experimental units N = nt = nabc…
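The randomization step can be sketched in a few lines of Python. This is only an illustration: the function name `completely_randomized_design`, the `seed` argument and the factor labels are assumptions made for the example, not part of the slides.

```python
import itertools
import random

def completely_randomized_design(levels, n, seed=0):
    """Randomly assign n experimental units to each treatment combination.

    levels: dict mapping factor name -> list of its categories.
    Returns a dict mapping treatment combination (tuple) -> list of unit ids.
    """
    # All t = a*b*c*... treatment combinations of the factors.
    combos = list(itertools.product(*levels.values()))
    # N = n*t experimental units, labelled 1..N, put in random order.
    units = list(range(1, n * len(combos) + 1))
    random.Random(seed).shuffle(units)
    # Consecutive groups of n shuffled units go to consecutive combinations.
    return {combo: sorted(units[i * n:(i + 1) * n])
            for i, combo in enumerate(combos)}

# Two factors with a = 2 and b = 3 categories, n = 10 units per combination.
levels = {"Level": ["High", "Low"], "Source": ["Beef", "Cereal", "Pork"]}
design = completely_randomized_design(levels, n=10)
for combo, assigned in design.items():
    print(combo, assigned)
```

Every unit is used exactly once, so the design has t = 6 groups of size n = 10 and N = 60 units in total.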
The treatment combinations can be thought of as arranged in a k-dimensional rectangular block
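[Figure: the a × b table of treatment combinations for two factors A (levels 1, 2, …, a) and B (levels 1, 2, …, b), and the three-dimensional block for three factors A, B and C]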
Another way of representing the treatment combinations in a factorial experiment
[Figure: diagram of the treatment combinations showing the factors A, B, C, D, …]
Example
In this example we are examining the effect of the level of protein A (High or Low) and the source of protein B (Beef, Cereal, or Pork) on weight gains Y (grams) in rats.
We have n = 10 test animals randomly assigned to each of t = 6 diets.
The t = 6 diets are the 6 = 3×2 Level-Source combinations
1. High - Beef
2. High - Cereal
3. High - Pork
4. Low - Beef
5. Low - Cereal
6. Low - Pork
Table: Gains in weight (grams) for rats under six diets differing in level of protein (High or Low) and source of protein (Beef, Cereal, or Pork)

Level of protein          High protein               Low protein
Source of protein     Beef   Cereal    Pork      Beef   Cereal    Pork
Diet                     1        2       3         4        5       6
                        73       98      94        90      107      49
                       102       74      79        76       95      82
                       118       56      96        90       97      73
                       104      111      98        64       80      86
                        81       95     102        86       98      81
                       107       88     102        51       74      97
                       100       82     108        72       74     106
                        87       77      91        90       67      70
                       117       86     120        95       89      61
                       111       92     105        78       58      82
Mean                 100.0     85.9    99.5      79.2     83.9    78.7
Std. Dev.            15.14    15.02   10.92     13.89    15.71   16.55
Example – Four factor experiment
Four factors are studied for their effect on Y (luster of paint film). The four factors are:
1) Film thickness (1 or 2 mils),
2) Drying conditions (Regular or Special),
3) Length of wash (20, 30, 40 or 60 minutes), and
4) Temperature of wash (92 ˚C or 100 ˚C).
Two observations of film luster (Y) are taken for each treatment combination.
The data are tabulated below:

                      Regular Dry                   Special Dry
Minutes           92 ˚C        100 ˚C           92 ˚C        100 ˚C
1-mil Thickness
20              3.4   3.4    19.6  14.5       2.1   3.8    17.2  13.4
30              4.1   4.1    17.5  17.0       4.0   4.6    13.5  14.3
40              4.9   4.2    17.6  15.2       5.1   3.3    16.0  17.8
60              5.0   4.9    20.9  17.1       8.3   4.3    17.5  13.9
2-mil Thickness
20              5.5   3.7    26.6  29.5       4.5   4.5    25.6  22.5
30              5.7   6.1    31.6  30.2       5.9   5.9    29.2  29.8
40              5.5   5.6    30.5  30.2       5.5   5.8    32.6  27.4
60              7.2   6.0    31.4  29.6       8.0   9.9    33.5  29.5
Notation
Let the single observations be denoted by a single letter and a number of subscripts:
$y_{ijk \ldots l}$
The number of subscripts is equal to (the number of factors) + 1.
• 1st subscript = level of the first factor
• 2nd subscript = level of the second factor
• …
• The last subscript denotes different observations on the same treatment combination.
Notation for Means
When averaging over one or several subscripts we put a "bar" above the letter and replace those subscripts by •
Example:
$\bar{y}_{241\bullet\bullet}$
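As an illustration of the bar-and-dot convention, take a two-factor experiment with observations $y_{ijk}$ (i = 1, …, a; j = 1, …, b; k = 1, …, n). The choice of a two-factor setting is an assumption made here for concreteness. Averaging over the replicate index, then over one factor, then over everything gives:

```latex
\bar{y}_{ij\bullet} = \frac{1}{n}\sum_{k=1}^{n} y_{ijk}, \qquad
\bar{y}_{i\bullet\bullet} = \frac{1}{bn}\sum_{j=1}^{b}\sum_{k=1}^{n} y_{ijk}, \qquad
\bar{y}_{\bullet\bullet\bullet} = \frac{1}{abn}\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n} y_{ijk}
```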
Profile of a Factor
Plot of observation means vs. levels of the factor.
The levels of the other factors may be held constant, or we may average over the other levels.
Definition:
A factor is said to not affect the response if the profile of the factor is horizontal for all combinations of levels of the other factors:
No change in the response when you change the levels of the factor (true for all combinations of levels of the other factors).
Otherwise the factor is said to affect the response.
Definition:
• Two (or more) factors are said to interact if changes in the response when you change the level of one factor depend on the level(s) of the other factor(s).
• Profiles of the factor for different levels of the other factor(s) are not parallel.
• Otherwise the factors are said to be additive.
• Profiles of the factor for different levels of the other factor(s) are parallel.
• If two (or more) factors interact, each factor affects the response.
• If two (or more) factors are additive, it still remains to be determined whether the factors affect the response.
• In factorial experiments we are interested in determining
– which factors affect the response and
– which groups of factors interact.
[Figure: three profile plots with panel titles "Factor A has no effect", "Additive Factors" and "Interacting Factors"]
The testing in factorial experiments
1. Test first the higher order interactions.
2. If an interaction is present there is no need to test lower order interactions or main effects involving those factors. All factors in the interaction affect the response and they interact.
3. The testing continues with lower order interactions and main effects for factors which have not yet been determined to affect the response.
Example: Diet Example
Summary Table of Cell means

                          Source of Protein
Level of Protein     Beef    Cereal     Pork    Overall
High               100.00     85.90    99.50      95.13
Low                 79.20     83.90    78.70      80.60
Overall             89.60     84.90    89.10      87.87
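The summary table can be reproduced from the raw weight gains. The sketch below uses only the Python standard library; the dictionary layout and variable names are choices made for this illustration.

```python
from statistics import mean

# Weight gains (grams) for the six diets; keys are (level, source) of protein.
gains = {
    ("High", "Beef"):   [73, 102, 118, 104, 81, 107, 100, 87, 117, 111],
    ("High", "Cereal"): [98, 74, 56, 111, 95, 88, 82, 77, 86, 92],
    ("High", "Pork"):   [94, 79, 96, 98, 102, 102, 108, 91, 120, 105],
    ("Low", "Beef"):    [90, 76, 90, 64, 86, 51, 72, 90, 95, 78],
    ("Low", "Cereal"):  [107, 95, 97, 80, 98, 74, 74, 67, 89, 58],
    ("Low", "Pork"):    [49, 82, 73, 86, 81, 97, 106, 70, 61, 82],
}
levels = ["High", "Low"]
sources = ["Beef", "Cereal", "Pork"]

# Cell means, then marginal means obtained by pooling the raw observations.
cell_mean = {key: mean(ys) for key, ys in gains.items()}
level_mean = {l: mean(y for s in sources for y in gains[l, s]) for l in levels}
source_mean = {s: mean(y for l in levels for y in gains[l, s]) for s in sources}
overall = mean(y for ys in gains.values() for y in ys)

for l in levels:
    row = "  ".join(f"{cell_mean[l, s]:7.2f}" for s in sources)
    print(f"{l:8s} {row}  {level_mean[l]:7.2f}")
row = "  ".join(f"{source_mean[s]:7.2f}" for s in sources)
print(f"{'Overall':8s} {row}  {overall:7.2f}")
```

Each line of a profile plot is one row (or one column) of this table plotted against the levels of the other factor.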
[Figure: Profiles of Weight Gain for Source and Level of Protein; weight gain plotted against source of protein (Beef, Cereal, Pork), with lines for High Protein, Low Protein and Overall]
[Figure: Profiles of Weight Gain for Source and Level of Protein; weight gain plotted against level of protein (High Protein, Low Protein), with lines for Beef, Cereal, Pork and Overall]
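Models for factorial Experiments
Single Factor: A – a levels
$y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$,  i = 1,2, ... ,a; j = 1,2, ... ,n
$\sum_{i=1}^{a} \alpha_i = 0$
• $\mu$ = overall mean
• $\alpha_i$ = effect on y of factor A when A = i
• $\varepsilon_{ij}$ = random error – Normal, mean 0, std-dev. $\sigma$
• $\mu_i = \mu + \alpha_i$ = the mean of y when A = i
[Figure: the observations $y_{i1}, y_{i2}, y_{i3}, \ldots, y_{in}$ listed under each level i = 1, 2, 3, …, a of A; the observations at level i follow a Normal distribution with mean $\mu + \alpha_i$]
Definitions
$\mu = \frac{1}{a}\sum_{i=1}^{a} \mu_i$ (overall mean)
$\alpha_i = \mu_i - \mu = \mu_i - \frac{1}{a}\sum_{i'=1}^{a} \mu_{i'}$ (effect when A = i)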
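Two Factor: A (a levels), B (b levels)
$y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}$
i = 1,2, ... ,a ; j = 1,2, ... ,b ; k = 1,2, ... ,n
$\sum_{i=1}^{a}\alpha_i = 0,\quad \sum_{j=1}^{b}\beta_j = 0,\quad \sum_{i=1}^{a}(\alpha\beta)_{ij} = 0,\quad \sum_{j=1}^{b}(\alpha\beta)_{ij} = 0$
$\mu_{ij} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij}$ = the mean of y when A = i and B = j
• $\mu$ = overall mean
• $\alpha_i$ = main effect of A
• $\beta_j$ = main effect of B
• $(\alpha\beta)_{ij}$ = interaction effect of A and B
Table of Means
Table of Effects – Overall mean, Main effects, Interaction Effects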
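Three Factor: A (a levels), B (b levels), C (c levels)
$y_{ijkl} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \gamma_k + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk} + (\alpha\beta\gamma)_{ijk} + \varepsilon_{ijkl}$
$\phantom{y_{ijkl}} = \mu + \alpha_i + \beta_j + \gamma_k + (\alpha\beta)_{ij} + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk} + (\alpha\beta\gamma)_{ijk} + \varepsilon_{ijkl}$
i = 1,2, ... ,a ; j = 1,2, ... ,b ; k = 1,2, ... ,c; l = 1,2, ... ,n
$\sum_{i=1}^{a}\alpha_i = 0,\quad \sum_{j=1}^{b}\beta_j = 0,\quad \sum_{k=1}^{c}\gamma_k = 0,\quad \sum_{i=1}^{a}(\alpha\beta)_{ij} = 0,\quad \ldots,\quad \sum_{k=1}^{c}(\alpha\beta\gamma)_{ijk} = 0$
• $\mu$ = overall mean
• $\alpha_i, \beta_j, \gamma_k$ = main effects
• $(\alpha\beta)_{ij}, (\alpha\gamma)_{ik}, (\beta\gamma)_{jk}$ = two factor interactions
• $(\alpha\beta\gamma)_{ijk}$ = three factor interaction
• $\varepsilon_{ijkl}$ = random error
$\mu_{ijk}$ = the mean of y when A = i, B = j, C = k
$\phantom{\mu_{ijk}} = \mu + \alpha_i + \beta_j + \gamma_k + (\alpha\beta)_{ij} + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk} + (\alpha\beta\gamma)_{ijk}$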
[Figure: profile plots of the response against the levels of A, with one line per level of B and one panel per level of C; three cases are shown, titled "No interaction", "A, B interact, No interaction with C" and "A, B, C interact"]
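Four Factor:
$y_{ijklm} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \gamma_k + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk} + (\alpha\beta\gamma)_{ijk} + \delta_l + (\alpha\delta)_{il} + (\beta\delta)_{jl} + (\alpha\beta\delta)_{ijl} + (\gamma\delta)_{kl} + (\alpha\gamma\delta)_{ikl} + (\beta\gamma\delta)_{jkl} + (\alpha\beta\gamma\delta)_{ijkl} + \varepsilon_{ijklm}$
$\phantom{y_{ijklm}} = \mu$ (overall mean)
$\quad + \alpha_i + \beta_j + \gamma_k + \delta_l$ (main effects)
$\quad + (\alpha\beta)_{ij} + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk} + (\alpha\delta)_{il} + (\beta\delta)_{jl} + (\gamma\delta)_{kl}$ (two factor interactions)
$\quad + (\alpha\beta\gamma)_{ijk} + (\alpha\beta\delta)_{ijl} + (\alpha\gamma\delta)_{ikl} + (\beta\gamma\delta)_{jkl}$ (three factor interactions)
$\quad + (\alpha\beta\gamma\delta)_{ijkl} + \varepsilon_{ijklm}$ (four factor interaction, random error)
i = 1,2, ... ,a ; j = 1,2, ... ,b ; k = 1,2, ... ,c; l = 1,2, ... ,d; m = 1,2, ... ,n
where $0 = \sum \alpha_i = \sum \beta_j = \sum (\alpha\beta)_{ij} = \sum \gamma_k = \sum (\alpha\gamma)_{ik} = \sum (\beta\gamma)_{jk} = \sum (\alpha\beta\gamma)_{ijk} = \sum \delta_l = \sum (\alpha\delta)_{il} = \sum (\beta\delta)_{jl} = \sum (\alpha\beta\delta)_{ijl} = \sum (\gamma\delta)_{kl} = \sum (\alpha\gamma\delta)_{ikl} = \sum (\beta\gamma\delta)_{jkl} = \sum (\alpha\beta\gamma\delta)_{ijkl}$
and $\sum$ denotes the summation over any of the subscripts.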
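Estimation of Main Effects and Interactions
• Estimator of Main effect of a Factor
= Mean at level i of the factor - Overall Mean
• Estimator of k-factor interaction effect at a combination of levels of the k factors
= Mean at the combination of levels of the k factors - sum of all means at k-1 combinations of levels of the k factors + sum of all means at k-2 combinations of levels of the k factors - etc.
Example:
• The main effect of factor B at level j in a four factor (A, B, C and D) experiment is estimated by:
$\hat{\beta}_j = \bar{y}_{\bullet j \bullet\bullet\bullet} - \bar{y}_{\bullet\bullet\bullet\bullet\bullet}$
• The two-factor interaction effect between factors B and C when B is at level j and C is at level k is estimated by:
$\widehat{(\beta\gamma)}_{jk} = \bar{y}_{\bullet jk\bullet\bullet} - \bar{y}_{\bullet j\bullet\bullet\bullet} - \bar{y}_{\bullet\bullet k\bullet\bullet} + \bar{y}_{\bullet\bullet\bullet\bullet\bullet}$
• The three-factor interaction effect between factors B, C and D when B is at level j, C is at level k and D is at level l is estimated by:
$\widehat{(\beta\gamma\delta)}_{jkl} = \bar{y}_{\bullet jkl\bullet} - \bar{y}_{\bullet jk\bullet\bullet} - \bar{y}_{\bullet j\bullet l\bullet} - \bar{y}_{\bullet\bullet kl\bullet} + \bar{y}_{\bullet j\bullet\bullet\bullet} + \bar{y}_{\bullet\bullet k\bullet\bullet} + \bar{y}_{\bullet\bullet\bullet l\bullet} - \bar{y}_{\bullet\bullet\bullet\bullet\bullet}$
• Finally the four-factor interaction effect between factors A, B, C and D when A is at level i, B is at level j, C is at level k and D is at level l is estimated by:
$\widehat{(\alpha\beta\gamma\delta)}_{ijkl} = \bar{y}_{ijkl\bullet} - \bar{y}_{ijk\bullet\bullet} - \bar{y}_{ij\bullet l\bullet} - \bar{y}_{i\bullet kl\bullet} - \bar{y}_{\bullet jkl\bullet} + \bar{y}_{ij\bullet\bullet\bullet} + \bar{y}_{i\bullet k\bullet\bullet} + \bar{y}_{i\bullet\bullet l\bullet} + \bar{y}_{\bullet jk\bullet\bullet} + \bar{y}_{\bullet j\bullet l\bullet} + \bar{y}_{\bullet\bullet kl\bullet} - \bar{y}_{i\bullet\bullet\bullet\bullet} - \bar{y}_{\bullet j\bullet\bullet\bullet} - \bar{y}_{\bullet\bullet k\bullet\bullet} - \bar{y}_{\bullet\bullet\bullet l\bullet} + \bar{y}_{\bullet\bullet\bullet\bullet\bullet}$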
Anova Table entries
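• Sum of squares for the interaction (or main) effects being tested = (product of sample size and levels of factors not included in the interaction) × (sum of squares of the effects being tested)
• Degrees of freedom = df = product of (number of levels - 1) of factors included in the interaction.
Analysis of Variance (ANOVA) Table Entries (Two factors – A and B)
$SS_A = nb\sum_{i=1}^{a}\hat{\alpha}_i^2$
$SS_B = na\sum_{j=1}^{b}\hat{\beta}_j^2$
$SS_{AB} = n\sum_{i=1}^{a}\sum_{j=1}^{b}\widehat{(\alpha\beta)}_{ij}^2$
$SS_{Error} = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n}\left(y_{ijk} - \bar{y}_{ij\bullet}\right)^2$
The ANOVA Table
Source SS df
A SSA a-1
B SSB b-1
AB SSAB (a-1)(b-1)
Error SSError ab(n-1)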
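Analysis of Variance (ANOVA) Table Entries (Three factors – A, B and C)
$SS_A = nbc\sum_{i=1}^{a}\hat{\alpha}_i^2$
$SS_B = nac\sum_{j=1}^{b}\hat{\beta}_j^2$
$SS_C = nab\sum_{k=1}^{c}\hat{\gamma}_k^2$
$SS_{AB} = nc\sum_{i=1}^{a}\sum_{j=1}^{b}\widehat{(\alpha\beta)}_{ij}^2$
$SS_{AC} = nb\sum_{i=1}^{a}\sum_{k=1}^{c}\widehat{(\alpha\gamma)}_{ik}^2$
$SS_{BC} = na\sum_{j=1}^{b}\sum_{k=1}^{c}\widehat{(\beta\gamma)}_{jk}^2$
$SS_{ABC} = n\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c}\widehat{(\alpha\beta\gamma)}_{ijk}^2$
$SS_{Error} = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c}\sum_{l=1}^{n}\left(y_{ijkl} - \bar{y}_{ijk\bullet}\right)^2$
The ANOVA Table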
Source SS df
A SSA a-1
B SSB b-1
C SSC c-1
AB SSAB (a-1)(b-1)
AC SSAC (a-1)(c-1)
BC SSBC (b-1)(c-1)
ABC SSABC (a-1)(b-1)(c-1)
Error SSError abc(n-1)
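To make the sum-of-squares rules concrete, here is a minimal sketch that builds the two-factor ANOVA table for the diet data (source of protein as factor A with a = 3, level of protein as factor B with b = 2, n = 10). Variable names are illustrative, and p-values are left out because the F distribution is not in the Python standard library.

```python
from statistics import mean

# Weight gains (grams) from the diet example; keys are (source, level) of protein.
gains = {
    ("Beef", "High"):   [73, 102, 118, 104, 81, 107, 100, 87, 117, 111],
    ("Cereal", "High"): [98, 74, 56, 111, 95, 88, 82, 77, 86, 92],
    ("Pork", "High"):   [94, 79, 96, 98, 102, 102, 108, 91, 120, 105],
    ("Beef", "Low"):    [90, 76, 90, 64, 86, 51, 72, 90, 95, 78],
    ("Cereal", "Low"):  [107, 95, 97, 80, 98, 74, 74, 67, 89, 58],
    ("Pork", "Low"):    [49, 82, 73, 86, 81, 97, 106, 70, 61, 82],
}
sources = ["Beef", "Cereal", "Pork"]   # factor A
levels = ["High", "Low"]               # factor B
a, b, n = len(sources), len(levels), 10

# Cell, marginal and overall means (the design is balanced).
cell = {key: mean(ys) for key, ys in gains.items()}
grand = mean(y for ys in gains.values() for y in ys)
a_mean = {s: mean(cell[s, l] for l in levels) for s in sources}
b_mean = {l: mean(cell[s, l] for s in sources) for l in levels}

# SS = (sample size x levels of factors not involved) x (sum of squared effects).
ss_a = n * b * sum((a_mean[s] - grand) ** 2 for s in sources)
ss_b = n * a * sum((b_mean[l] - grand) ** 2 for l in levels)
ss_ab = n * sum((cell[s, l] - a_mean[s] - b_mean[l] + grand) ** 2
                for s in sources for l in levels)
ss_error = sum((y - cell[key]) ** 2 for key, ys in gains.items() for y in ys)

df_error = a * b * (n - 1)
ms_error = ss_error / df_error
table = [("A (Source)", ss_a, a - 1), ("B (Level)", ss_b, b - 1),
         ("AB", ss_ab, (a - 1) * (b - 1)), ("Error", ss_error, df_error)]
for name, ss, df in table:
    f_ratio = "" if name == "Error" else f"{(ss / df) / ms_error:8.2f}"
    print(f"{name:12s} {ss:10.2f} {df:3d} {ss / df:10.2f} {f_ratio}")
```

Following the testing order described earlier, the AB line would be examined first, and the main-effect lines only if the interaction is not significant.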
• The Completely Randomized Design is called balanced if the number of observations is the same (n) for every treatment combination.
• If the number of observations per treatment combination is unequal, the design is called unbalanced (resulting in a mathematically more complex analysis and computations).
• If for some of the treatment combinations there are no observations, the design is called incomplete (some of the parameters - main effects and interactions - cannot be estimated).
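Example: Diet example
Mean
$\bar{y}_{\bullet\bullet\bullet}$ = 87.867
Main Effects for Factor A (Source of Protein): $\hat{\alpha}_i = \bar{y}_{i\bullet\bullet} - \bar{y}_{\bullet\bullet\bullet}$
Beef     Cereal    Pork
1.733    -2.967    1.233
Main Effects for Factor B (Level of Protein): $\hat{\beta}_j = \bar{y}_{\bullet j\bullet} - \bar{y}_{\bullet\bullet\bullet}$
High     Low
7.267    -7.267
AB Interaction Effects: $\widehat{(\alpha\beta)}_{ij} = \bar{y}_{ij\bullet} - \bar{y}_{i\bullet\bullet} - \bar{y}_{\bullet j\bullet} + \bar{y}_{\bullet\bullet\bullet}$
                       Source of Protein
                       Beef     Cereal    Pork
Level of     High      3.133    -6.267    3.133
Protein      Low      -3.133     6.267   -3.133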
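These estimates follow directly from the table of cell means. A small sketch (names are illustrative); because the design is balanced, the marginal means are plain averages of the cell means:

```python
sources = ["Beef", "Cereal", "Pork"]
levels = ["High", "Low"]

# Cell means from the summary table; keys are (level, source).
cell_mean = {
    ("High", "Beef"): 100.0, ("High", "Cereal"): 85.9, ("High", "Pork"): 99.5,
    ("Low", "Beef"): 79.2, ("Low", "Cereal"): 83.9, ("Low", "Pork"): 78.7,
}

overall = sum(cell_mean.values()) / len(cell_mean)
source_mean = {s: sum(cell_mean[l, s] for l in levels) / len(levels) for s in sources}
level_mean = {l: sum(cell_mean[l, s] for s in sources) / len(sources) for l in levels}

# Main effect = marginal mean - overall mean.
source_effect = {s: source_mean[s] - overall for s in sources}
level_effect = {l: level_mean[l] - overall for l in levels}
# Interaction = cell mean - both marginal means + overall mean.
interaction = {(l, s): cell_mean[l, s] - level_mean[l] - source_mean[s] + overall
               for l in levels for s in sources}

print({s: round(e, 3) for s, e in source_effect.items()})
print({l: round(e, 3) for l, e in level_effect.items()})
print({key: round(e, 3) for key, e in interaction.items()})
```

The main effects of each factor sum to zero, and the interaction effects sum to zero over either subscript, as the side conditions of the model require.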
Example 2
Paint Luster Experiment
Table: Means and Cell Frequencies
Means and Frequencies for the AB Interaction (Temp - Drying)
[Figure: Profiles showing Temp-Dry Interaction; luster plotted against temperature (92, 100), with lines for Regular Dry, Special Dry and Overall]
Means and Frequencies for the AD Interaction (Temp - Thickness)
[Figure: Profiles showing Temp-Thickness Interaction; luster plotted against temperature (92, 100), with lines for 1-mil, 2-mil and Overall]
The Main Effect of C (Length)
[Figure: Profile of Effect of Length on Luster; luster plotted against length]
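The means behind these profiles can be recomputed from the paint luster data. The sketch below assumes the eight readings in each data row are ordered as Regular/92, Regular/92, Regular/100, Regular/100, Special/92, Special/92, Special/100, Special/100, and that the wash lengths are those labelling the data rows (20, 30, 40, 60). Both the layout and the variable names are assumptions of this example.

```python
from collections import defaultdict
from statistics import mean

# rows[thickness][minutes] = the 8 luster readings in that row of the data table.
rows = {
    "1-mil": {
        20: [3.4, 3.4, 19.6, 14.5, 2.1, 3.8, 17.2, 13.4],
        30: [4.1, 4.1, 17.5, 17.0, 4.0, 4.6, 13.5, 14.3],
        40: [4.9, 4.2, 17.6, 15.2, 5.1, 3.3, 16.0, 17.8],
        60: [5.0, 4.9, 20.9, 17.1, 8.3, 4.3, 17.5, 13.9],
    },
    "2-mil": {
        20: [5.5, 3.7, 26.6, 29.5, 4.5, 4.5, 25.6, 22.5],
        30: [5.7, 6.1, 31.6, 30.2, 5.9, 5.9, 29.2, 29.8],
        40: [5.5, 5.6, 30.5, 30.2, 5.5, 5.8, 32.6, 27.4],
        60: [7.2, 6.0, 31.4, 29.6, 8.0, 9.9, 33.5, 29.5],
    },
}
# Assumed (drying, temperature) label of each of the 8 positions in a row.
layout = ([("Regular", 92)] * 2 + [("Regular", 100)] * 2
          + [("Special", 92)] * 2 + [("Special", 100)] * 2)

temp_dry = defaultdict(list)     # (temperature, drying) -> readings
temp_thick = defaultdict(list)   # (temperature, thickness) -> readings
by_length = defaultdict(list)    # minutes -> readings
for thickness, by_minutes in rows.items():
    for minutes, values in by_minutes.items():
        for (drying, temp), y in zip(layout, values):
            temp_dry[temp, drying].append(y)
            temp_thick[temp, thickness].append(y)
            by_length[minutes].append(y)

for label, groups in [("Temp-Dry", temp_dry), ("Temp-Thickness", temp_thick),
                      ("Length", by_length)]:
    for key in sorted(groups, key=str):
        print(f"{label:15s} {str(key):20s} mean={mean(groups[key]):6.2f} n={len(groups[key])}")
```

Non-parallel lines in a profile plot correspond to differences between these cell means that change from one level of the other factor to the next.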
The Randomized Block Design
• Suppose a researcher is interested in how several treatments affect a continuous response variable (Y).
• The treatments may be the levels of a single factor or they may be the combinations of levels of several factors.
• Suppose we have available to us a total of N = nt experimental units to which we are going to apply the different treatments.
The Completely Randomized (CR) design randomly divides the experimental units into t groups of size n and randomly assigns a treatment to each group.
The Randomized Block Design
• divides the group of experimental units into n homogeneous groups of size t.
• These homogeneous groups are called blocks.
• The treatments are then randomly assigned to the experimental units in each block - one treatment to a unit in each block.
Experimental Designs
• In many experiments we are interested in comparing a number of treatments (the treatments may be combinations of levels of several factors).
• The objective of experimental design is to reduce the magnitude of random error, resulting in more powerful tests to detect experimental effects.
The Completely Randomized Design
[Figure: treatments 1, 2, 3, …, t, with experimental units randomly assigned to treatments]
Randomized Block Design
[Figure: blocks, with all treatments appearing once in each block]
The Model for a randomized Block Experiment
$y_{ij} = \mu + \tau_i + \beta_j + \varepsilon_{ij}$
i = 1,2,…, t ; j = 1,2,…, b
$y_{ij}$ = the observation in the jth block receiving the ith treatment
$\mu$ = overall mean
$\tau_i$ = the effect of the ith treatment
$\beta_j$ = the effect of the jth Block
$\varepsilon_{ij}$ = random error
The Anova Table for a randomized Block Experiment
Source   S.S.   d.f.         M.S.   F          p-value
Treat    SST    t-1          MST    MST/MSE
Block    SSB    b-1          MSB    MSB/MSE
Error    SSE    (t-1)(b-1)   MSE
• A randomized block experiment is assumed to be a two-factor experiment.
• The factors are blocks and treatments.
• There is one observation per cell. It is assumed that there is no interaction between blocks and treatments.
• The degrees of freedom for the interaction are used to estimate error.
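A minimal sketch of how the entries of this table could be computed. The function name `rbd_anova` and the small data set are hypothetical, made up purely for illustration; p-values are omitted because the F distribution is not available in the Python standard library.

```python
def rbd_anova(y):
    """ANOVA entries for a randomized block design.

    y[i][j] is the single observation on treatment i in block j.
    """
    t, b = len(y), len(y[0])
    grand = sum(sum(row) for row in y) / (t * b)
    treat_means = [sum(row) / b for row in y]
    block_means = [sum(y[i][j] for i in range(t)) / t for j in range(b)]

    ss_treat = b * sum((m - grand) ** 2 for m in treat_means)
    ss_block = t * sum((m - grand) ** 2 for m in block_means)
    ss_total = sum((y[i][j] - grand) ** 2 for i in range(t) for j in range(b))
    # With one observation per cell, the block x treatment interaction is the error.
    ss_error = ss_total - ss_treat - ss_block

    ss = {"Treat": ss_treat, "Block": ss_block, "Error": ss_error}
    df = {"Treat": t - 1, "Block": b - 1, "Error": (t - 1) * (b - 1)}
    ms = {name: ss[name] / df[name] for name in ss}
    f = {name: ms[name] / ms["Error"] for name in ("Treat", "Block")}
    return {"SS": ss, "df": df, "MS": ms, "F": f}

# Hypothetical data: t = 3 treatments (rows) observed in b = 4 blocks (columns).
y = [[10, 12, 14, 16],
     [11, 14, 15, 18],
     [14, 15, 18, 21]]
result = rbd_anova(y)
for name in ("Treat", "Block", "Error"):
    print(name, round(result["SS"][name], 3), result["df"][name],
          round(result["MS"][name], 3), round(result["F"].get(name, float("nan")), 2))
```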
The matched pair design
An experimental design for comparing two treatments
[Figure: Pairs]
The matched pair design is a randomized block design for t = 2 treatments
Incomplete Block Designs
Randomized Block Design
• We want to compare t treatments.
• Group the N = bt experimental units into b homogeneous blocks of size t.
• In each block we randomly assign the t treatments to the t experimental units in each block.
• The ability to detect treatment to treatment differences is dependent on the within block variability.
Comments
• The within block variability generally increases with block size.
• The larger the block size the larger the within block variability.
• For a larger number of treatments, t, it may not be appropriate or feasible to require the block size, k, to be equal to the number of treatments.
• If the block size, k, is less than the number of treatments (k < t) then all treatments cannot appear in each block. The design is called an Incomplete Block Design.
Comments regarding Incomplete block designs
• When two treatments appear together in the same block it is possible to estimate the difference in treatment effects.
• The treatment difference is estimable.
• If two treatments do not appear together in the same block it may not be possible to estimate the difference in treatment effects.
• The treatment difference may not be estimable.
Example
• Consider the block design with 6 treatments and 6 blocks of size two:
Blocks: {1, 2}, {2, 3}, {1, 3}, {4, 5}, {5, 6}, {4, 6}
• The treatment differences (1 vs 2, 1 vs 3, 2 vs 3, 4 vs 5, 4 vs 6, 5 vs 6) are estimable.
• If one of the treatments is in the group {1,2,3} and the other treatment is in the group {4,5,6}, the treatment difference is not estimable.
Definitions
• Two treatments i and i* are said to be connected if there is a sequence of treatments $i_0 = i, i_1, i_2, \ldots, i_M = i^*$ such that each successive pair of treatments ($i_j$ and $i_{j+1}$) appear in the same block.
• In this case the treatment difference is estimable.
• An incomplete design is said to be connected if all treatment pairs i and i* are connected.
• In this case all treatment differences are estimable.
Example
• Consider the block design with 5 treatments and 5 blocks of size two:
Blocks: {1, 2}, {2, 3}, {1, 3}, {4, 5}, {1, 4}
• This incomplete block design is connected.
• All treatment differences are estimable.
• Some treatment differences are estimated with a higher precision than others.
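Connectedness can be checked mechanically by treating the treatments as nodes of a graph and joining two treatments whenever they share a block. The sketch below uses a small union-find structure; the function name `connected_components` is illustrative, and the two block lists are the designs from the two examples above.

```python
def connected_components(t, blocks):
    """Group treatments 1..t into sets of mutually connected treatments."""
    parent = list(range(t + 1))  # parent[i] == i means i is the root of its group

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    # Treatments that appear in the same block belong to the same group.
    for block in blocks:
        first = block[0]
        for other in block[1:]:
            parent[find(other)] = find(first)

    groups = {}
    for i in range(1, t + 1):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

design_a = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6)]  # 6 treatments
design_b = [(1, 2), (2, 3), (1, 3), (4, 5), (1, 4)]          # 5 treatments

print(connected_components(6, design_a))  # [[1, 2, 3], [4, 5, 6]]
print(connected_components(5, design_b))  # [[1, 2, 3, 4, 5]]
```

A design is connected exactly when a single group comes back; differences between treatments in different groups are not estimable.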
Definition
An incomplete design is said to be a Balanced Incomplete Block Design if
1. all treatments appear in exactly r blocks.
• This ensures that each treatment is estimated with the same precision.
2. all treatment pairs i and i* appear together in exactly λ blocks.
• This ensures that each treatment difference is estimated with the same precision.
• The value of λ is the same for each treatment pair.
Some Identities
Let b = the number of blocks,
t = the number of treatments,
k = the block size,
r = the number of times a treatment appears in the experiment,
λ = the number of times a pair of treatments appears together in the same block.
1. bk = rt
• Both sides of this equation are found by counting the total number of experimental units in the experiment.
2. r(k-1) = λ(t – 1)
• Both sides of this equation are found by counting the total number of experimental units that appear with a specific treatment in the experiment.
BIB Design
A Balanced Incomplete Block Design (b = 15, k = 4, t = 6, r = 10, λ = 6)
Block  Treatments     Block  Treatments     Block  Treatments
1      1 2 3 4        6      3 4 5 6        11     1 3 5 6
2      1 4 5 6        7      1 2 3 6        12     2 3 4 6
3      2 3 5 6        8      1 3 4 5        13     1 2 5 6
4      1 2 3 5        9      2 4 5 6        14     1 3 4 6
5      1 2 4 6        10     1 2 4 5        15     2 3 4 5
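The two defining properties and the two identities can be verified by counting. In the sketch below block 3 is taken to be treatments 2, 3, 5, 6, which is the set tasted by subject 3 in the cereal example (brands B, C, E, F); that reading of the table is an assumption of this example, and the variable names are illustrative.

```python
from collections import Counter
from itertools import combinations

# The 15 blocks of the design, each a set of k = 4 of the t = 6 treatments.
blocks = [
    (1, 2, 3, 4), (1, 4, 5, 6), (2, 3, 5, 6), (1, 2, 3, 5), (1, 2, 4, 6),
    (3, 4, 5, 6), (1, 2, 3, 6), (1, 3, 4, 5), (2, 4, 5, 6), (1, 2, 4, 5),
    (1, 3, 5, 6), (2, 3, 4, 6), (1, 2, 5, 6), (1, 3, 4, 6), (2, 3, 4, 5),
]
t, k, b = 6, 4, len(blocks)

# r: how many blocks each treatment appears in.
replications = Counter(treatment for block in blocks for treatment in block)
# lambda: how many blocks each pair of treatments shares.
pair_counts = Counter(pair for block in blocks
                      for pair in combinations(sorted(block), 2))

r_values = set(replications.values())
lambda_values = set(pair_counts.values())
print("r values:", r_values)            # a single value means property 1 holds
print("lambda values:", lambda_values)  # a single value means property 2 holds

if len(r_values) == 1 and len(lambda_values) == 1:
    r, lam = r_values.pop(), lambda_values.pop()
    print("bk = rt:", b * k == r * t)
    print("r(k-1) = lambda(t-1):", r * (k - 1) == lam * (t - 1))
```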
An Example
A food processing company is interested in comparing the taste of six new brands (A, B, C, D, E and F) of cereal.
For this purpose:
• subjects will be asked to taste and compare these cereals, scoring them on a scale of 0 - 100.
• For practical reasons it is decided that each subject should be asked to taste and compare at most four of the six cereals.
• For this reason it is decided to use b = 15 subjects and a balanced incomplete block design to assess the differences in taste of the six brands of cereal.
The design and the data are tabulated below:
Subject   Taste Scores (Brands)
1         51 (A)   55 (B)   69 (C)   83 (D)
2         48 (A)   87 (D)   56 (E)   22 (F)
3         65 (B)   91 (C)   67 (E)   35 (F)
4         42 (A)   48 (B)   65 (C)   43 (E)
5         36 (A)   58 (B)   69 (D)    7 (F)
6         79 (C)   85 (D)   56 (E)   25 (F)
7         54 (A)   60 (B)   90 (C)   21 (F)
8         62 (A)   92 (C)   94 (D)   63 (E)
9         39 (B)   71 (D)   47 (E)   11 (F)
10        51 (A)   59 (B)   84 (D)   51 (E)
11        39 (A)   74 (C)   61 (E)   25 (F)
12        69 (B)   78 (C)   78 (D)   22 (F)
13        63 (A)   74 (B)   59 (E)   32 (F)
14        55 (A)   74 (C)   78 (D)   34 (F)
15        73 (B)   83 (C)   92 (D)   68 (E)
Analysis of Block Experiments
The purpose of such experiments is to estimate the effects of treatments applied to some material (experimental units, subjects etc.) grouped into relatively homogeneous groups (blocks).
The variability within the groups (blocks) will be considerably less than if the subjects were left ungrouped.
This will lead to a more powerful analysis for comparing the treatments.
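The basic model for block experiments
Suppose we have t treatments and b blocks of size k.
Let $\mu_{ij}$ = the mean response to treatment i in block j.
The Assumption of Additivity
The difference $\mu_{ij} - \mu_{kj}$ between treatments i and k is the same in every block j (independent of blocks).
Hence $\mu_{ij}$ can be written
$\mu_{ij} = \tau_i + \beta_j$ with $\sum_{j} \beta_j = 0$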
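Let $\mathbf{y}_j$ = the $k \times 1$ vector of observations in the jth block.
Let $X_j = (x^{(j)}_{mn})$ be the $k \times t$ matrix where
$x^{(j)}_{mn} = 1$ if treatment n is applied to the mth unit in the jth block, and $x^{(j)}_{mn} = 0$ otherwise.
Thus
$$E[\mathbf{y}_j] = X_j\boldsymbol{\tau} + \beta_j\mathbf{1}_k$$
where $\boldsymbol{\tau} = (\tau_1, \ldots, \tau_t)'$ is $t \times 1$ and $\boldsymbol{\beta} = (\beta_1, \ldots, \beta_b)'$ is $b \times 1$.
If $\mathbf{y} = (\mathbf{y}_1', \mathbf{y}_2', \ldots, \mathbf{y}_b')'$ then
$$E[\mathbf{y}] = \begin{bmatrix} X_1 & \mathbf{1}_k & \mathbf{0} & \cdots & \mathbf{0} \\ X_2 & \mathbf{0} & \mathbf{1}_k & \cdots & \mathbf{0} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ X_b & \mathbf{0} & \mathbf{0} & \cdots & \mathbf{1}_k \end{bmatrix} \begin{bmatrix} \boldsymbol{\tau} \\ \boldsymbol{\beta} \end{bmatrix} \quad \text{with } \mathbf{1}_b'\boldsymbol{\beta} = 0.$$
This design matrix is not of full rank.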
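To handle the side condition $\mathbf{1}_b'\boldsymbol{\beta} = 0$, let
$$U_0 = \left[\tfrac{1}{\sqrt{b}}\mathbf{1}_b,\ U\right] \quad \text{where } U' = [\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_b] \text{ is such that } U_0 \text{ is orthogonal.}$$
Hence
$$U_0'\boldsymbol{\beta} = \begin{bmatrix} \tfrac{1}{\sqrt{b}}\mathbf{1}_b'\boldsymbol{\beta} \\ U'\boldsymbol{\beta} \end{bmatrix} = \begin{bmatrix} 0 \\ \mathbf{v} \end{bmatrix}, \quad \mathbf{v} = U'\boldsymbol{\beta},$$
and
$$\boldsymbol{\beta} = U_0U_0'\boldsymbol{\beta} = \left[\tfrac{1}{\sqrt{b}}\mathbf{1}_b,\ U\right]\begin{bmatrix} 0 \\ \mathbf{v} \end{bmatrix} = U\mathbf{v} = \begin{bmatrix} \mathbf{u}_1'\mathbf{v} \\ \mathbf{u}_2'\mathbf{v} \\ \vdots \\ \mathbf{u}_b'\mathbf{v} \end{bmatrix}.$$
Thus $\beta_j = \mathbf{u}_j'\mathbf{v}$
and
$$E[\mathbf{y}] = \begin{bmatrix} X_1 & \mathbf{1}_k\mathbf{u}_1' \\ X_2 & \mathbf{1}_k\mathbf{u}_2' \\ \vdots & \vdots \\ X_b & \mathbf{1}_k\mathbf{u}_b' \end{bmatrix}\begin{bmatrix} \boldsymbol{\tau} \\ \mathbf{v} \end{bmatrix} = X\boldsymbol{\theta}.$$
Here X is of full rank.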
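Thus $\hat{\boldsymbol{\theta}} = (X'X)^{-1}X'\mathbf{y}$ and $\operatorname{var}(\hat{\boldsymbol{\theta}}) = \sigma^2 (X'X)^{-1}$.
Now
$$X'X = \begin{bmatrix} X_1' & X_2' & \cdots & X_b' \\ \mathbf{u}_1\mathbf{1}_k' & \mathbf{u}_2\mathbf{1}_k' & \cdots & \mathbf{u}_b\mathbf{1}_k' \end{bmatrix}\begin{bmatrix} X_1 & \mathbf{1}_k\mathbf{u}_1' \\ X_2 & \mathbf{1}_k\mathbf{u}_2' \\ \vdots & \vdots \\ X_b & \mathbf{1}_k\mathbf{u}_b' \end{bmatrix} = \begin{bmatrix} \sum_j X_j'X_j & \sum_j X_j'\mathbf{1}_k\mathbf{u}_j' \\ \sum_j \mathbf{u}_j\mathbf{1}_k'X_j & \sum_j \mathbf{u}_j\mathbf{1}_k'\mathbf{1}_k\mathbf{u}_j' \end{bmatrix}$$
and
$$X'\mathbf{y} = \begin{bmatrix} X_1' & X_2' & \cdots & X_b' \\ \mathbf{u}_1\mathbf{1}_k' & \mathbf{u}_2\mathbf{1}_k' & \cdots & \mathbf{u}_b\mathbf{1}_k' \end{bmatrix}\begin{bmatrix} \mathbf{y}_1 \\ \mathbf{y}_2 \\ \vdots \\ \mathbf{y}_b \end{bmatrix} = \begin{bmatrix} \sum_j X_j'\mathbf{y}_j \\ \sum_j \mathbf{u}_j\mathbf{1}_k'\mathbf{y}_j \end{bmatrix} = \begin{bmatrix} \mathbf{T} \\ U'\mathbf{B} \end{bmatrix}$$
with
$\mathbf{T} = \sum_j X_j'\mathbf{y}_j = (T_1, \ldots, T_t)'$ = the $t \times 1$ vector of treatment totals
and
$\mathbf{B} = (\mathbf{1}_k'\mathbf{y}_1, \ldots, \mathbf{1}_k'\mathbf{y}_b)' = (B_1, \ldots, B_b)'$ = the $b \times 1$ vector of block totals.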
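Now define the incidence matrix
$N = (n_{ij})$ ($t \times b$), where $n_{ij}$ = the number of times treatment i appears in block j,
and also
$\mathbf{r} = (r_1, r_2, \ldots, r_t)'$ and $R = \operatorname{diag}(r_1, r_2, \ldots, r_t)$, where $r_i = \sum_j n_{ij}$ = the number of times treatment i is replicated.
Note: $N\mathbf{1}_b = \mathbf{r}$ and $N'\mathbf{1}_t = k\mathbf{1}_b$.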
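Recall $X_j = (x^{(j)}_{mn})$, where $x^{(j)}_{mn} = 1$ if treatment n is applied to the mth unit in the jth block and 0 otherwise.
Now
$$X_j'X_j = \operatorname{diag}(n_{1j}, n_{2j}, \ldots, n_{tj})$$
Thus
$$\sum_j X_j'X_j = \operatorname{diag}(r_1, r_2, \ldots, r_t) = R$$
Also
$$X_j'\mathbf{1}_k = (n_{1j}, n_{2j}, \ldots, n_{tj})' = \text{the } j\text{th column of } N$$
and
$$[X_1'\mathbf{1}_k,\ X_2'\mathbf{1}_k,\ \ldots,\ X_b'\mathbf{1}_k] = N$$
Thus
$$\sum_j X_j'\mathbf{1}_k\mathbf{u}_j' = [X_1'\mathbf{1}_k,\ X_2'\mathbf{1}_k,\ \ldots,\ X_b'\mathbf{1}_k]\begin{bmatrix} \mathbf{u}_1' \\ \mathbf{u}_2' \\ \vdots \\ \mathbf{u}_b' \end{bmatrix} = NU$$
and
$$\sum_j \mathbf{u}_j\mathbf{1}_k'X_j = U'N'$$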
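Finally
$$\sum_j \mathbf{u}_j\mathbf{1}_k'\mathbf{1}_k\mathbf{u}_j' = k\sum_j \mathbf{u}_j\mathbf{u}_j' = k\,[\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_b]\begin{bmatrix} \mathbf{u}_1' \\ \mathbf{u}_2' \\ \vdots \\ \mathbf{u}_b' \end{bmatrix} = kU'U = kI$$
since $U_0 = \left[\tfrac{1}{\sqrt{b}}\mathbf{1}_b,\ U\right]$ is orthogonal.
Therefore
$$X'X = \begin{bmatrix} R & NU \\ U'N' & kI \end{bmatrix}$$
and
$$(X'X)^{-1} = \begin{bmatrix} \Omega & -\tfrac{1}{k}\Omega NU \\ -\tfrac{1}{k}U'N'\Omega & \tfrac{1}{k}I + \tfrac{1}{k^2}U'N'\Omega NU \end{bmatrix}$$
where $\Omega^{-1} = R - \tfrac{1}{k}NUU'N'$.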
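Now
$$I = U_0U_0' = \tfrac{1}{b}\mathbf{1}_b\mathbf{1}_b' + UU' \quad \text{since } U_0 \text{ is orthogonal,}$$
or $UU' = I - \tfrac{1}{b}\mathbf{1}_b\mathbf{1}_b'$,
and
$$\Omega^{-1} = R - \tfrac{1}{k}NUU'N' = R - \tfrac{1}{k}N\left(I - \tfrac{1}{b}\mathbf{1}_b\mathbf{1}_b'\right)N' = R - \tfrac{1}{k}NN' + \tfrac{1}{bk}N\mathbf{1}_b\mathbf{1}_b'N' = R - \tfrac{1}{k}NN' + \tfrac{1}{bk}\mathbf{r}\mathbf{r}'$$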
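Thus
$$\hat{\boldsymbol{\theta}} = \begin{bmatrix} \hat{\boldsymbol{\tau}} \\ \hat{\mathbf{v}} \end{bmatrix} = (X'X)^{-1}X'\mathbf{y} = \begin{bmatrix} \Omega & -\tfrac{1}{k}\Omega NU \\ -\tfrac{1}{k}U'N'\Omega & \tfrac{1}{k}I + \tfrac{1}{k^2}U'N'\Omega NU \end{bmatrix}\begin{bmatrix} \mathbf{T} \\ U'\mathbf{B} \end{bmatrix}$$
or
$$\hat{\boldsymbol{\tau}} = \Omega\mathbf{T} - \tfrac{1}{k}\Omega NUU'\mathbf{B} = \Omega\left(\mathbf{T} - \tfrac{1}{k}NUU'\mathbf{B}\right) = \Omega\left(\mathbf{T} - \tfrac{1}{k}N\left(I - \tfrac{1}{b}\mathbf{1}_b\mathbf{1}_b'\right)\mathbf{B}\right) = \Omega\left(\mathbf{T} - \tfrac{1}{k}N\mathbf{B} + \tfrac{G}{bk}\mathbf{r}\right)$$
where $G = \mathbf{1}_b'\mathbf{B}$ is the grand total (using $N\mathbf{1}_b = \mathbf{r}$).
But $\Omega^{-1} = R - \tfrac{1}{k}NN' + \tfrac{1}{bk}\mathbf{r}\mathbf{r}'$.
Thus
$$\Omega^{-1}\mathbf{1}_t = R\mathbf{1}_t - \tfrac{1}{k}NN'\mathbf{1}_t + \tfrac{1}{bk}\mathbf{r}\mathbf{r}'\mathbf{1}_t = \mathbf{r} - N\mathbf{1}_b + \mathbf{r} = \mathbf{r} - \mathbf{r} + \mathbf{r} = \mathbf{r}$$
(using $N'\mathbf{1}_t = k\mathbf{1}_b$ and $\mathbf{r}'\mathbf{1}_t = bk$), and $\Omega\mathbf{r} = \mathbf{1}_t$.
Hence
$$\hat{\boldsymbol{\tau}} = \Omega\left(\mathbf{T} - \tfrac{1}{k}N\mathbf{B}\right) + \tfrac{G}{bk}\mathbf{1}_t$$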
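Also
$$\hat{\mathbf{v}} = -\tfrac{1}{k}U'N'\Omega\mathbf{T} + \left(\tfrac{1}{k}I + \tfrac{1}{k^2}U'N'\Omega NU\right)U'\mathbf{B}$$
and
$$\hat{\boldsymbol{\beta}} = U\hat{\mathbf{v}} = -\tfrac{1}{k}UU'N'\Omega\mathbf{T} + \tfrac{1}{k}UU'\mathbf{B} + \tfrac{1}{k^2}UU'N'\Omega NUU'\mathbf{B}$$
$$= \tfrac{1}{k}UU'\mathbf{B} - \tfrac{1}{k}UU'N'\Omega\left(\mathbf{T} - \tfrac{1}{k}NUU'\mathbf{B}\right) = \tfrac{1}{k}UU'\left(\mathbf{B} - N'\hat{\boldsymbol{\tau}}\right)$$
$$= \tfrac{1}{k}\left(I - \tfrac{1}{b}\mathbf{1}_b\mathbf{1}_b'\right)\left(\mathbf{B} - N'\hat{\boldsymbol{\tau}}\right) = \tfrac{1}{k}\left(\mathbf{B} - N'\hat{\boldsymbol{\tau}}\right)$$
since $\mathbf{1}_b'\left(\mathbf{B} - N'\hat{\boldsymbol{\tau}}\right) = \mathbf{1}_b'\mathbf{B} - \mathbf{1}_b'N'\hat{\boldsymbol{\tau}} = G - \mathbf{r}'\hat{\boldsymbol{\tau}} = 0$, because
$$\mathbf{r}'\hat{\boldsymbol{\tau}} = \mathbf{r}'\Omega\left(\mathbf{T} - \tfrac{1}{k}N\mathbf{B}\right) + \tfrac{G}{bk}\mathbf{r}'\mathbf{1}_t = \mathbf{1}_t'\left(\mathbf{T} - \tfrac{1}{k}N\mathbf{B}\right) + G = \mathbf{1}_t'\mathbf{T} - \tfrac{1}{k}\mathbf{1}_t'N\mathbf{B} + G = G - \mathbf{1}_b'\mathbf{B} + G = G - G + G = G$$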
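Summary: The Least Squares Estimates
$$\hat{\boldsymbol{\beta}} = \tfrac{1}{k}\left(\mathbf{B} - N'\hat{\boldsymbol{\tau}}\right)$$
$$\hat{\boldsymbol{\tau}} = \Omega\left(\mathbf{T} - \tfrac{1}{k}N\mathbf{B}\right) + \tfrac{G}{bk}\mathbf{1}_t = \Omega\mathbf{Q} + \tfrac{G}{bk}\mathbf{1}_t$$
where $\mathbf{Q} = \mathbf{T} - \tfrac{1}{k}N\mathbf{B}$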
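The Residual Sum of Squares
$$RSS = \mathbf{y}'\mathbf{y} - \hat{\boldsymbol{\theta}}'X'\mathbf{y} = \mathbf{y}'\mathbf{y} - \left[\hat{\boldsymbol{\tau}}',\ \hat{\mathbf{v}}'\right]\begin{bmatrix} \mathbf{T} \\ U'\mathbf{B} \end{bmatrix} = \mathbf{y}'\mathbf{y} - \hat{\boldsymbol{\tau}}'\mathbf{T} - \hat{\mathbf{v}}'U'\mathbf{B} = \mathbf{y}'\mathbf{y} - \hat{\boldsymbol{\tau}}'\mathbf{T} - \hat{\boldsymbol{\beta}}'\mathbf{B}$$
$$= \mathbf{y}'\mathbf{y} - \hat{\boldsymbol{\tau}}'\mathbf{T} - \tfrac{1}{k}\left(\mathbf{B} - N'\hat{\boldsymbol{\tau}}\right)'\mathbf{B} = \mathbf{y}'\mathbf{y} - \hat{\boldsymbol{\tau}}'\left(\mathbf{T} - \tfrac{1}{k}N\mathbf{B}\right) - \tfrac{1}{k}\mathbf{B}'\mathbf{B} = \mathbf{y}'\mathbf{y} - \hat{\boldsymbol{\tau}}'\mathbf{Q} - \tfrac{1}{k}\mathbf{B}'\mathbf{B}$$
$$= \mathbf{y}'\mathbf{y} - \left(\Omega\mathbf{Q} + \tfrac{G}{bk}\mathbf{1}_t\right)'\mathbf{Q} - \tfrac{1}{k}\mathbf{B}'\mathbf{B} = \mathbf{y}'\mathbf{y} - \mathbf{Q}'\Omega\mathbf{Q} - \tfrac{G}{bk}\mathbf{1}_t'\mathbf{Q} - \tfrac{1}{k}\mathbf{B}'\mathbf{B}$$
Hence
$$RSS = \mathbf{y}'\mathbf{y} - \mathbf{Q}'\Omega\mathbf{Q} - \tfrac{1}{k}\mathbf{B}'\mathbf{B}$$
since $\mathbf{1}_t'\mathbf{Q} = \mathbf{1}_t'\mathbf{T} - \tfrac{1}{k}\mathbf{1}_t'N\mathbf{B} = G - G = 0$.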
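Summary: The Least Squares Estimates
$$\hat{\boldsymbol{\beta}} = \tfrac{1}{k}\left(\mathbf{B} - N'\hat{\boldsymbol{\tau}}\right)$$
$$\hat{\boldsymbol{\tau}} = \Omega\mathbf{Q} + \tfrac{G}{bk}\mathbf{1}_t \quad \text{where } \mathbf{Q} = \mathbf{T} - \tfrac{1}{k}N\mathbf{B}$$
The Residual Sum of Squares
$$RSS = \mathbf{y}'\mathbf{y} - \mathbf{Q}'\Omega\mathbf{Q} - \tfrac{1}{k}\mathbf{B}'\mathbf{B}$$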
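As a check on these formulas, the sketch below applies them to the cereal taste data using only the Python standard library. It takes $\Omega^{-1} = R - \tfrac{1}{k}NN' + \tfrac{1}{bk}\mathbf{r}\mathbf{r}'$ and obtains $\Omega\mathbf{Q}$ by solving a linear system rather than inverting. The helper `solve` and all variable names are assumptions of this illustration.

```python
# Taste scores from the cereal example: one dict per subject (block).
blocks = [
    {"A": 51, "B": 55, "C": 69, "D": 83}, {"A": 48, "D": 87, "E": 56, "F": 22},
    {"B": 65, "C": 91, "E": 67, "F": 35}, {"A": 42, "B": 48, "C": 65, "E": 43},
    {"A": 36, "B": 58, "D": 69, "F": 7},  {"C": 79, "D": 85, "E": 56, "F": 25},
    {"A": 54, "B": 60, "C": 90, "F": 21}, {"A": 62, "C": 92, "D": 94, "E": 63},
    {"B": 39, "D": 71, "E": 47, "F": 11}, {"A": 51, "B": 59, "D": 84, "E": 51},
    {"A": 39, "C": 74, "E": 61, "F": 25}, {"B": 69, "C": 78, "D": 78, "F": 22},
    {"A": 63, "B": 74, "E": 59, "F": 32}, {"A": 55, "C": 74, "D": 78, "F": 34},
    {"B": 73, "C": 83, "D": 92, "E": 68},
]
brands = "ABCDEF"
t, b, k = len(brands), len(blocks), 4

# Incidence matrix N (t x b), treatment totals T, block totals B, grand total G.
N = [[1 if br in blk else 0 for blk in blocks] for br in brands]
T = [sum(blk.get(br, 0) for blk in blocks) for br in brands]
B = [sum(blk.values()) for blk in blocks]
G = sum(B)
r = [sum(row) for row in N]

# Q = T - (1/k) N B  (treatment totals adjusted for blocks).
Q = [T[i] - sum(N[i][j] * B[j] for j in range(b)) / k for i in range(t)]

# Omega^{-1} = R - (1/k) N N' + (1/(bk)) r r'.
omega_inv = [[(r[i] if i == h else 0.0)
              - sum(N[i][j] * N[h][j] for j in range(b)) / k
              + r[i] * r[h] / (b * k)
              for h in range(t)] for i in range(t)]

def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for i in range(col + 1, n):
            factor = M[i][col] / M[col][col]
            for c in range(col, n + 1):
                M[i][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][c] * x[c] for c in range(i + 1, n))) / M[i][i]
    return x

# tau_hat = Omega Q + (G/(bk)) 1  and  beta_hat = (1/k)(B - N' tau_hat).
omega_Q = solve(omega_inv, Q)
tau_hat = [x + G / (b * k) for x in omega_Q]
beta_hat = [(B[j] - sum(N[i][j] * tau_hat[i] for i in range(t))) / k for j in range(b)]

# RSS = y'y - Q' Omega Q - (1/k) B'B.
yy = sum(score ** 2 for blk in blocks for score in blk.values())
rss = yy - sum(q * x for q, x in zip(Q, omega_Q)) - sum(Bj ** 2 for Bj in B) / k

for br, est in zip(brands, tau_hat):
    print(f"tau_hat[{br}] = {est:7.3f}")
print(f"RSS = {rss:.3f}")
```

Because this design is balanced, $\Omega\mathbf{Q}$ reduces to $\tfrac{k}{\lambda t}\mathbf{Q}$, so the adjusted treatment estimates are simply rescaled adjusted totals plus the overall mean.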