Chapter 19
Analysis of Variance (ANOVA)
ANOVA
• How to test a null hypothesis that the means of more than two populations are equal.
H0: μ1 = μ2 = μ3
H1: Not all three population means are equal
• Test hypothesis with ANOVA procedure (Analysis of variance)
• ANOVA tests use the F distribution
F Distribution
• The F distribution has 2 degrees-of-freedom (df) numbers -- numerator and denominator.
– EXAMPLE: df = (8,14)
• A change in the numerator df has a greater effect on the shape of the distribution than a change in the denominator df.
• Properties:
– Continuous and skewed to the right
– Has 2 df numbers
– Takes only nonnegative values.
Finding the F value: Example 19.1
SITUATION: Find the F value for 8 degrees of freedom for the numerator, 14 degrees of freedom for the denominator and .05 area in the right tail of the F curve.
• Consult Table V of Appendix A
– Use the table corresponding to .05 area.
– Locate the numerator df on the top row, and the denominator df along the left.
– Find where they intersect.
• This will give the critical value of F.
• Excel: FDIST(x, df1, df2) returns the right-tail area for x; FINV(prob., df1, df2) returns the critical value for a given right-tail area.
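The table lookup can be cross-checked numerically. The following is a minimal Python sketch, not the method used by Excel or by Table V: the helper names `f_pdf`, `f_cdf` and `f_critical` are illustrative, and the integration assumes a numerator df greater than 2 so that the density is bounded at zero. It integrates the F density with Simpson's rule and bisects for the value that leaves .05 area in the right tail.

```python
import math

def f_pdf(x, d1, d2):
    """Density of the F distribution with (d1, d2) degrees of freedom."""
    if x <= 0:
        return 0.0
    log_beta = (math.lgamma(d1 / 2) + math.lgamma(d2 / 2)
                - math.lgamma((d1 + d2) / 2))
    log_pdf = (0.5 * d1 * math.log(d1 / d2)
               + (0.5 * d1 - 1) * math.log(x)
               - 0.5 * (d1 + d2) * math.log(1 + d1 * x / d2)
               - log_beta)
    return math.exp(log_pdf)

def f_cdf(x, d1, d2, steps=2000):
    """P(F <= x) by Simpson's rule on [0, x]; steps must be even."""
    h = x / steps
    total = f_pdf(0.0, d1, d2) + f_pdf(x, d1, d2)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f_pdf(i * h, d1, d2)
    return total * h / 3

def f_critical(alpha, d1, d2, lo=0.0, hi=50.0):
    """Value with right-tail area alpha, found by bisection on [lo, hi]."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if 1 - f_cdf(mid, d1, d2) > alpha:
            lo = mid   # too much area remains to the right: move up
        else:
            hi = mid
    return (lo + hi) / 2

# Example 19.1: numerator df = 8, denominator df = 14, right-tail area .05
f_crit = f_critical(0.05, 8, 14)
print(f"critical F = {f_crit:.2f}")
```

A statistics library would normally replace all three helpers with a single inverse-CDF call; the hand-rolled version is shown only to make the "area in the right tail" idea concrete.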
Assumptions in ANOVA
To test H0: μ1 = μ2 = μ3
H1: Not all three population means are equal
– The following must be true:
• The populations from which the samples are drawn are normally distributed.
• The populations from which the samples are drawn have the same variance (or standard deviation).
• The samples are drawn from different populations and are random and independent.
How does ANOVA work?
• The purpose of ANOVA is to test differences in means (for groups or variables) for statistical significance.
• It does this by partitioning the total variation into a component due to random error (the within-group sum of squares, SSE) and a component due to differences between groups (SSG).
• The between-group component is then tested against the within-group component (F = MSG/MSE); if the result is significant, the null hypothesis of no differences between means is rejected.
• The test is always right-tailed, with the rejection region in the right tail of the F curve.
Types of ANOVA
• One-way ANOVA: Only one factor is considered
• Two-way ANOVA
– Answers the question: do the two categorical variables act together to affect the averages for the various groups?
– If the two factors do not act together, does at least one of the factors affect the averages for the various groups?
• N-way ANOVA
– Looks for interaction among multiple factors.
– Requires more data.
– Always right-tailed with the rejection region in the right tail
ANOVA Notation and Formulas
• $\bar{x}_i$ = sample mean for group (or treatment) i
• k = the number of groups (or treatments)
• $n_i$ = sample size of group i
• $\bar{x}$ = the average (the grand mean) of all of the observations in all groups
• n = sum of the k sample sizes = $n_1 + n_2 + n_3 + \dots + n_k$
• $s_i^2$ = the sample variance for group (or treatment) i
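MSG and MSE
• Sum of squares for groups (SSG)
$$SSG = n_1(\bar{x}_1 - \bar{x})^2 + n_2(\bar{x}_2 - \bar{x})^2 + \dots + n_k(\bar{x}_k - \bar{x})^2$$
• Mean squares for groups (MSG)
$$MSG = \frac{SSG}{k-1}$$
• Sum of squared error (SSE)
$$SSE = (n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + \dots + (n_k - 1)s_k^2$$
• Mean squared error (MSE)
$$MSE = \frac{SSE}{n-k}$$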
SST and relationship among the SSs
• Total sum of squares (SST)
$$SST = \sum (x - \bar{x})^2$$
– SST is the numerator when calculating the sample variance of all observations combined
– Does not include a group distinction
– Dividing SST by its df gives the sample variance
• Relationships
SSG + SSE = SST
      Groups   Error   Total
df    k-1      n-k     n-1
ANOVA Tables
• It is common practice to report results using an ANOVA table:
Source   Sum of Squares   df    Mean Square        F              P
Groups   SSG              k-1   MSG = SSG/(k-1)    F0 = MSG/MSE   P-value
Error    SSE              n-k   MSE = SSE/(n-k)
TOTAL    SST              n-1
ANOVA process by hand: Example 19.2
SITUATION: A soap manufacturer wants to test 3 new machines that fill jugs. Each machine was run for 5 hours, and the number of jugs filled per hour was recorded:
Machine 1   Machine 2   Machine 3
54          53          49
49          56          53
52          57          47
55          51          50
48          59          54
– At the 10% significance level, can we reject the null hypothesis that the mean number of jugs filled per hour is the same for all three machines?
• k = 3
• n1 = n2 = n3 = 5
continued….
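ANOVA process by hand: Example 19.2 continued
• We now need to calculate the ANOVA table
• For machine 1:
$$n_1 = 5, \quad \bar{x}_1 = \frac{\sum x_1}{n_1} = \frac{54 + 49 + 52 + 55 + 48}{5} = 51.6$$
$$s_1^2 = \frac{\sum (x_1 - \bar{x}_1)^2}{n_1 - 1} = 9.3$$
• Now do the same for machine 2 & 3
$$n_2 = 5, \quad \bar{x}_2 = 55.2, \quad s_2^2 = 10.2$$
$$n_3 = 5, \quad \bar{x}_3 = 50.6, \quad s_3^2 = 8.3$$
• Then for 1-3 combined
$$n = 15, \quad k = 3, \quad \bar{x} = 52.4667, \quad SST = \sum (x - \bar{x})^2 = 169.7333$$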
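The per-machine and combined statistics can be reproduced with Python's standard library. This is a small sketch for checking the hand calculation; the variable names (`jugs`, `summary`, `grand_mean`) are illustrative and not part of the original example.

```python
from statistics import mean, variance

# Jugs filled per hour, one list per machine (data from Example 19.2)
jugs = {
    "Machine 1": [54, 49, 52, 55, 48],
    "Machine 2": [53, 56, 57, 51, 59],
    "Machine 3": [49, 53, 47, 50, 54],
}

# (sample size, sample mean, sample variance) for each machine;
# statistics.variance divides by n - 1, matching the formula for s_i^2
summary = {name: (len(x), mean(x), variance(x)) for name, x in jugs.items()}

# Grand mean over all 15 observations, ignoring the group distinction
all_obs = [v for x in jugs.values() for v in x]
grand_mean = mean(all_obs)

for name, (n_i, xbar_i, s2_i) in summary.items():
    print(f"{name}: n = {n_i}, mean = {xbar_i:.1f}, variance = {s2_i:.1f}")
print(f"combined: n = {len(all_obs)}, k = {len(jugs)}, "
      f"grand mean = {grand_mean:.4f}")
```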
ANOVA process by hand: Example 19.2 continued
• Then we can calculate SSG/E/T:
$$SSG = n_1(\bar{x}_1 - \bar{x})^2 + n_2(\bar{x}_2 - \bar{x})^2 + n_3(\bar{x}_3 - \bar{x})^2$$
$$SSG = 5(51.6 - 52.4667)^2 + 5(55.2 - 52.4667)^2 + 5(50.6 - 52.4667)^2 = 58.5335$$
$$SSE = (n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + (n_3 - 1)s_3^2$$
$$SSE = (5 - 1)9.3 + (5 - 1)10.2 + (5 - 1)8.3 = 111.2$$
SST = SSG + SSE = 58.5335 + 111.2 = 169.7335
• Degrees of freedom
– Group df = k-1 = 3-1 = 2
– Error df = n-k = 15-3 = 12
– Total df = n-1 = 15-1 = 14
continued….
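As a check on the sums of squares, the sketch below computes SSG and SSE from the formulas above and SST directly from the raw observations, then confirms that SSG + SSE = SST. The variable names are illustrative. Because it uses the unrounded grand mean, SSG comes out as 58.5333 rather than the 58.5335 obtained by hand with 52.4667.

```python
from statistics import mean, variance

# Example 19.2 data, one list per machine
groups = [
    [54, 49, 52, 55, 48],
    [53, 56, 57, 51, 59],
    [49, 53, 47, 50, 54],
]
all_obs = [v for g in groups for v in g]
grand_mean = mean(all_obs)

# Between-group variation: n_i * (group mean - grand mean)^2, summed
ssg = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)

# Within-group variation: (n_i - 1) * s_i^2, summed
sse = sum((len(g) - 1) * variance(g) for g in groups)

# Total variation, computed without any group distinction
sst = sum((v - grand_mean) ** 2 for v in all_obs)

k, n = len(groups), len(all_obs)
print(f"SSG = {ssg:.4f} (df = {k - 1})")
print(f"SSE = {sse:.4f} (df = {n - k})")
print(f"SST = {sst:.4f} (df = {n - 1})")
print(f"SSG + SSE = {ssg + sse:.4f}")
```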
ANOVA process by hand: Example 19.2 continued
• Now calculate MSG, MSE, and F
$$MSG = \frac{SSG}{k-1} = \frac{58.5335}{2} = 29.26675$$
$$MSE = \frac{SSE}{n-k} = \frac{111.2}{12} = 9.26667$$
$$F_0 = \frac{MSG}{MSE} = \frac{29.26675}{9.26667} = 3.1583$$
• Determine whether the assumption that the three populations have the same population variance is valid. The assumption is reasonable if:
$$\frac{\max(s_i)}{\min(s_i)} \le 2$$
• Now, look in Table V of Appendix A. Use numerator df=2, denominator df=12 ….
continued….
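The remaining steps can be sketched in Python as follows. The sketch takes SSG and SSE from the hand calculation and, instead of a table lookup, relies on a closed form that holds only when the numerator df is 2: for F with (2, d2) degrees of freedom, the right-tail area beyond x is (1 + 2x/d2)^(-d2/2). That shortcut is an assumption specific to this example (k = 3) and does not generalize to other numerator df.

```python
import math

# Values from the hand calculation for Example 19.2
ssg, sse = 58.5335, 111.2
k, n = 3, 15
alpha = 0.10                       # 10% significance level
group_variances = [9.3, 10.2, 8.3]

msg = ssg / (k - 1)                # mean square for groups
mse = sse / (n - k)                # mean squared error
f0 = msg / mse                     # test statistic

# Equal-variance check: ratio of largest to smallest standard deviation
sd_ratio = math.sqrt(max(group_variances) / min(group_variances))

# Closed forms valid ONLY for numerator df = 2 (here k - 1 = 2)
d2 = n - k
p_value = (1 + 2 * f0 / d2) ** (-d2 / 2)
f_crit = (d2 / 2) * (alpha ** (-2 / d2) - 1)

reject = f0 > f_crit               # right-tailed test
print(f"MSG = {msg:.5f}, MSE = {mse:.5f}, F0 = {f0:.4f}")
print(f"max(s)/min(s) = {sd_ratio:.3f}")
print(f"critical F = {f_crit:.4f}, p-value = {p_value:.4f}")
print("reject H0" if reject else "do not reject H0")
```

Under these assumptions F0 exceeds the critical value, so the sketch rejects H0 at the 10% level.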
Example 19.2 ANOVA tables
• Substitute the calculation results into the table below:
Source   Sum of Squares   df    Mean Square        F              P
Groups   SSG              k-1   MSG = SSG/(k-1)    F0 = MSG/MSE   P-value
Error    SSE              n-k   MSE = SSE/(n-k)
TOTAL    SST              n-1
• Do we reject the null hypothesis?
H0: μ1 = μ2 = μ3
H1: Not all three population means are equal
Example 19.2 by Excel
Anova: Single Factor
SUMMARY
Groups   Count   Sum   Average   Variance
M1       5       258   51.6      9.3
M2       5       276   55.2      10.2
M3       5       253   50.6      8.3
ANOVA
Source of Variation   SS         df   MS         F          P-value    F crit
Between Groups        58.53333   2    29.26667   3.158273   0.079073   2.806796
Within Groups         111.2      12   9.266667
Total                 169.7333   14
M1   M2   M3
54   53   49
49   56   53
52   57   47
55   51   50
48   59   54
Example 19.2 by Minitab
One-way ANOVA: P versus M
Source   DF   SS       MS      F      P
M        2    58.53    29.27   3.16   0.079
Error    12   111.20   9.27
Total    14   169.73
S = 3.044 R-Sq = 34.49% R-Sq(adj) = 23.57%
Individual 95% CIs For Mean Based on Pooled StDev
Level   N   Mean     StDev
M1      5   51.600   3.050
M2      5   55.200   3.194
M3      5   50.600   2.881
[Interval plot of the individual 95% CIs for M1 to M3; axis ticks at 48.0, 51.0, 54.0, 57.0]
Pooled StDev = 3.044
Example 19.3 by Excel
Anova: Single Factor
SUMMARY
Groups   Count   Sum   Average   Variance
A        5       108   21.6      11.3
B        6       87    14.5      7.5
C        6       93    15.5      13.1
D        5       110   22        8.5
ANOVA
Source of Variation   SS         df   MS         F          P-value    F crit
Between Groups        255.6182   3    85.20606   8.417723   0.001043   2.416005
Within Groups         182.2      18   10.12222
Total                 437.8182   21
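Because SSG and SSE depend only on the group sizes, means and variances, the ANOVA rows of this output can be rebuilt from the SUMMARY block alone. The sketch below does that for Example 19.3; the function name `anova_from_summary` and its dictionary layout are illustrative choices, not part of Excel or Minitab.

```python
def anova_from_summary(ns, means, variances):
    """One-way ANOVA quantities from per-group n, mean and sample variance."""
    k = len(ns)
    n = sum(ns)
    # Grand mean is the size-weighted average of the group means
    grand_mean = sum(n_i * m_i for n_i, m_i in zip(ns, means)) / n
    ssg = sum(n_i * (m_i - grand_mean) ** 2 for n_i, m_i in zip(ns, means))
    sse = sum((n_i - 1) * v_i for n_i, v_i in zip(ns, variances))
    msg = ssg / (k - 1)
    mse = sse / (n - k)
    return {
        "SSG": ssg, "SSE": sse, "SST": ssg + sse,
        "df": (k - 1, n - k, n - 1),
        "MSG": msg, "MSE": mse, "F": msg / mse,
    }

# SUMMARY values for groups A, B, C, D in Example 19.3
table = anova_from_summary(
    ns=[5, 6, 6, 5],
    means=[21.6, 14.5, 15.5, 22.0],
    variances=[11.3, 7.5, 13.1, 8.5],
)
for key, value in table.items():
    print(key, value)
```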
Example 19.3 by Minitab
One-way ANOVA: Cus. versus Teller
Source   DF   SS      MS     F      P
Teller   3    255.6   85.2   8.42   0.001
Error    18   182.2   10.1
Total    21   437.8
S = 3.182 R-Sq = 58.38% R-Sq(adj) = 51.45%
Individual 95% CIs For Mean Based on Pooled StDev
Level   N   Mean     StDev
A       5   21.600   3.362
B       6   14.500   2.739
C       6   15.500   3.619
D       5   22.000   2.915
[Interval plot of the individual 95% CIs for A to D; axis ticks at 14.0, 17.5, 21.0, 24.5]
Pooled StDev = 3.182
Pairwise Comparisons
• Rejecting the null hypothesis in ANOVA does not identify which group means are significantly different.
• Most software packages include a pairwise comparison for this purpose.
– Calculate a confidence interval for the difference of each unique pair of means.
– Check whether ZERO falls in the interval; if it does not, the two means are significantly different.
Example 19.3 by Minitab
Fisher 95% Individual Confidence Intervals
All Pairwise Comparisons
Simultaneous confidence level = 80.96%
A subtracted from:
    Lower     Center   Upper
B   -11.147   -7.100   -3.053
C   -10.147   -6.100   -2.053
D   -3.827    0.400    4.627
B subtracted from:
    Lower     Center   Upper
C   -2.859    1.000    4.859
D   3.453     7.500    11.547
C subtracted from:
    Lower     Center   Upper
D   2.453     6.500    10.547
[Interval plots of each pairwise difference; axis ticks at -6.0, 0.0, 6.0, 12.0]
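Pairwise Comparisons: Fisher’s Least Significant Difference (LSD) Method
• Null Hypothesis: H0: μi = μj
• Least Significant Difference (LSD):
$$LSD = t_{\alpha/2,\,N-a}\sqrt{MS_E\left(\frac{1}{n_i} + \frac{1}{n_j}\right)} \quad \text{or} \quad LSD = t_{\alpha/2,\,N-a}\sqrt{\frac{2\,MS_E}{n}} \quad (\text{if } n_i = n_j = n)$$
– Here N is the total number of observations and a is the number of groups (n and k in the earlier notation); $MS_E$ is the mean squared error (MSE).
• The pair of means μi and μj is declared significantly different if
$$|\bar{X}_i - \bar{X}_j| > LSD$$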
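Example 19.3 with LSD
$$LSD = t_{\alpha/2,\,N-a}\sqrt{MS_E\left(\frac{1}{n_i} + \frac{1}{n_j}\right)} \quad \text{or} \quad LSD = t_{\alpha/2,\,N-a}\sqrt{\frac{2\,MS_E}{n}} \quad (\text{if } n_i = n_j = n)$$
$$|\bar{X}_i - \bar{X}_j| > LSD$$
           n   Mean
Teller A   5   21.6
Teller B   6   14.5
Teller C   6   15.5
Teller D   5   22.0
$$MS_E = 10.1$$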
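Example 19.3 with LSD
$$LSD = t_{\alpha/2,\,N-a}\sqrt{MS_E\left(\frac{1}{n_i} + \frac{1}{n_j}\right)} \quad \text{or} \quad LSD = t_{\alpha/2,\,N-a}\sqrt{\frac{2\,MS_E}{n}} \quad (\text{if } n_i = n_j = n)$$
             ni   nj   Mean i   Mean j   |Difference|   LSD
Teller A-B   5    6    21.6     14.5     7.1            4.05
Teller A-C   5    6    21.6     15.5     6.1            4.05
Teller A-D   5    5    21.6     22.0     0.4            4.23
Teller B-C   6    6    14.5     15.5     1.0            3.86
Teller B-D   6    5    14.5     22.0     7.5            4.05
Teller C-D   6    5    15.5     22.0     6.5            4.05
– LSD values use α = 0.05, $t_{0.025,\,18} = 2.101$ and $MS_E = 182.2/18$; they equal the half-widths of the Fisher 95% intervals reported by Minitab.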
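The LSD comparison can be sketched in Python as below. Two inputs are assumptions rather than values computed by the code: α = 0.05 (chosen to match the Fisher 95% Minitab output) and the critical value t = 2.101 for 18 df, read from a standard t table. With those inputs the LSD for each pair equals the half-width of the corresponding Minitab interval, for example (−3.053 − (−11.147))/2 = 4.047 for A and B.

```python
import math
from itertools import combinations

# (sample size, sample mean) per teller, from the Example 19.3 summary
tellers = {"A": (5, 21.6), "B": (6, 14.5), "C": (6, 15.5), "D": (5, 22.0)}

mse = 182.2 / 18   # error mean square: SSE / (N - a)
t_crit = 2.101     # assumed: t table value for alpha/2 = 0.025, 18 df

results = {}
for (a, (n_a, m_a)), (b, (n_b, m_b)) in combinations(tellers.items(), 2):
    # General form of the LSD; it reduces to sqrt(2*MSE/n) when n_a == n_b
    lsd = t_crit * math.sqrt(mse * (1 / n_a + 1 / n_b))
    diff = abs(m_a - m_b)
    results[f"{a}-{b}"] = (diff, lsd, diff > lsd)

for pair, (diff, lsd, significant) in results.items():
    verdict = "different" if significant else "not different"
    print(f"Teller {pair}: |diff| = {diff:.1f}, LSD = {lsd:.2f} -> {verdict}")
```

Note that LSD performs each comparison at the individual 95% level; the Minitab output reports the resulting simultaneous confidence level as 80.96%.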
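Welch’s Approach to Heterogeneity of Variance
• If $\max(s_j^2)/\min(s_j^2) > 2$, the assumption of equal variances cannot be used.
• Welch’s approach modifies the F-test with the following steps:
– For each sample j, calculate $w_j$:
$$w_j = \frac{n_j}{s_j^2}$$
– Calculate the sum of the $w_j$ over the k samples:
$$\sum_{j=1}^{k} w_j$$
– Calculate the weighted average of the sample means:
$$\bar{X} = \frac{\sum_{j=1}^{k} w_j \bar{X}_j}{\sum_{j=1}^{k} w_j}$$
– Calculate the test statistic F0 and df:
$$F_0 = \frac{\dfrac{\sum_{j=1}^{k} w_j(\bar{X}_j - \bar{X})^2}{k-1}}{1 + \dfrac{2(k-2)}{k^2-1}\sum_{j=1}^{k}\dfrac{1}{n_j-1}\left(1 - \dfrac{w_j}{\sum_{j=1}^{k} w_j}\right)^2}$$
$$df = \frac{k^2-1}{3\sum_{j=1}^{k}\dfrac{1}{n_j-1}\left(1 - \dfrac{w_j}{\sum_{j=1}^{k} w_j}\right)^2}$$
– F0 is compared with the F distribution using k − 1 numerator df and the df above as the denominator df.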
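The Welch steps translate directly into code. The sketch below is illustrative only: the function name `welch_anova` is hypothetical, and it is run on the Example 19.3 summary statistics purely to have numbers to work with. For those data the variance ratio is 13.1/7.5 ≈ 1.75, below the threshold of 2, so Welch's correction would not actually be required there.

```python
def welch_anova(ns, means, variances):
    """Welch's F statistic and its two df from per-group summary statistics."""
    k = len(ns)
    # Step 1: weight each sample by n_j / s_j^2
    w = [n_j / v_j for n_j, v_j in zip(ns, variances)]
    # Step 2: sum of the weights
    w_sum = sum(w)
    # Step 3: weighted average of the sample means
    x_w = sum(w_j * m_j for w_j, m_j in zip(w, means)) / w_sum
    # Shared term: sum of (1 - w_j / sum w)^2 / (n_j - 1)
    lam = sum((1 - w_j / w_sum) ** 2 / (n_j - 1) for w_j, n_j in zip(w, ns))
    # Step 4: test statistic and degrees of freedom
    numerator = sum(w_j * (m_j - x_w) ** 2 for w_j, m_j in zip(w, means)) / (k - 1)
    denominator = 1 + 2 * (k - 2) / (k ** 2 - 1) * lam
    f0 = numerator / denominator
    df1 = k - 1
    df2 = (k ** 2 - 1) / (3 * lam)
    return f0, df1, df2

# Summary statistics for groups A, B, C, D (Example 19.3)
f0, df1, df2 = welch_anova(
    ns=[5, 6, 6, 5],
    means=[21.6, 14.5, 15.5, 22.0],
    variances=[11.3, 7.5, 13.1, 8.5],
)
print(f"Welch F0 = {f0:.3f}, numerator df = {df1}, denominator df = {df2:.2f}")
```

The denominator df is generally not an integer, so the critical value is usually obtained from software rather than from a printed F table.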