Notes_ANOVA_04-13-08


Transcript of Notes_ANOVA_04-13-08

Page 1: Notes_ANOVA_04-13-08

8/10/2019 Notes_ANOVA_04-13-08

http://slidepdf.com/reader/full/notesanova04-13-08 1/12

One-way Analysis of Variance (available on E-book)

In basic statistics, the F-distribution is used for (1) making inferences about two population variances, i.e., the homogeneity-of-variance test, and (2) analysis of variance (ANOVA). In this class, we will cover only the ANOVA test.

E.g., if samples are drawn of size n1=8 from Population 1 and size n2=5 from Population 2, then F has df= 7, 4 (i.e., (8-1), (5-1)).

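$$s_1^2 = \frac{\sum (x_i - \bar{x}_1)^2}{n_1 - 1}, \qquad df = n_1 - 1 = 8 - 1 = 7$$

$$s_2^2 = \frac{\sum (x_i - \bar{x}_2)^2}{n_2 - 1}, \qquad df = n_2 - 1 = 5 - 1 = 4$$

$$F = \frac{s_1^2}{s_2^2} \quad \text{has } df = 7, 4$$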

Note that the df for F are always stated numerator df first, then denominator df.

Finding critical values of the F-distribution using Table V.

Characteristics of the F-distribution

1. F > 0.

2. The F-distribution is not symmetric; it is skewed to the right.

3. The F-distribution is asymptotic to the horizontal axis on the right-hand side.

4. As the df increase, the high point (peak) of the F-distribution approaches 1.

5. The shape of the F-distribution depends on the degrees of freedom in the numerator and denominator (see Figure 10 above). This is similar to Student’s t-distribution, whose shape depends on its degrees of freedom.

6. The total area under the curve is 1.
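Characteristic 4 can be checked numerically. This is a minimal sketch assuming the standard formula for the mode (high point) of an F-distribution with d1 > 2 numerator df and d2 denominator df, mode = ((d1 − 2)/d1) · (d2/(d2 + 2)); the function name `f_mode` is hypothetical.

```python
def f_mode(d1, d2):
    """Mode (high point) of the F-distribution with d1 numerator and d2 denominator df.

    Assumes the textbook formula for d1 > 2; for d1 <= 2 the density is
    largest at (or near) F = 0, so 0 is returned.
    """
    if d1 <= 2:
        return 0.0
    return ((d1 - 2) / d1) * (d2 / (d2 + 2))


# The peak moves toward 1 as both df grow, but never reaches it.
for df in (5, 10, 50, 500):
    print(df, df, round(f_mode(df, df), 4))
```

Both factors in the formula are below 1 and approach 1 as the df grow, which is why the peak creeps up toward 1 from below.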

Fisher’s F-distribution

If σ1² = σ2², and s1² and s2² are sample variances from independent simple random samples of sizes n1 and n2, respectively, drawn from normal populations, then

$$F = \frac{s_1^2}{s_2^2}$$

follows the F-distribution with n1 − 1 degrees of freedom in the numerator and n2 − 1 degrees of freedom in the denominator.
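As a small illustration of this ratio, the sketch below computes F = s1²/s2² and its (numerator, denominator) df from two raw samples. It borrows the Control and Fenugreek columns of the rat blood glucose data used later in these notes purely as sample input (the notes do not run this particular variance test), and the function name `variance_ratio` is hypothetical. `statistics.variance` divides by n − 1, matching s².

```python
from statistics import variance


def variance_ratio(sample1, sample2):
    """Return F = s1^2 / s2^2 together with (numerator df, denominator df)."""
    s1_sq = variance(sample1)  # sample variance, divisor n1 - 1
    s2_sq = variance(sample2)  # sample variance, divisor n2 - 1
    return s1_sq / s2_sq, (len(sample1) - 1, len(sample2) - 1)


control = [288.1, 296.8, 267.8, 256.7, 292.1, 282.9, 260.3, 283.8]
fenugreek = [229.1, 240.7, 239.4, 207.7, 225.7, 230.8, 206.6, 213.3]

f_value, df = variance_ratio(control, fenugreek)
print(round(f_value, 4), df)
```

With samples of sizes 8 and 5, as in the earlier example, the returned df would be (7, 4).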

Page 2: Notes_ANOVA_04-13-08


Find the critical F-value for a right-tailed test with α = 0.05, degrees of freedom in the numerator = 10, and degrees of freedom in the denominator = 6.

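The critical value is written F0.05, 10, 6, where the first subscript (0.05) is the area in the right tail (i.e., α, the significance level), the second (10) is the df of the numerator, and the third (6) is the df of the denominator.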
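Table V can be cross-checked in code. The sketch below is a stand-in for the table, not part of the course materials: it computes the right-tail area of the F-distribution through the regularized incomplete beta function (evaluated with the standard continued-fraction expansion) and then bisects for the value that leaves α in the right tail. All function names are hypothetical, and only the Python standard library is used. Standard F tables list F0.05, 10, 6 = 4.06, and for df = 3, 28 the Excel output later in these notes reports F crit = 2.946685.

```python
import math


def _beta_cf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    c, d = 1.0, 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((a - 1.0 + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction.
        aa = -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def f_right_tail(f, df_num, df_den):
    """Area to the right of f under the F(df_num, df_den) curve."""
    if f <= 0.0:
        return 1.0
    x = df_num * f / (df_num * f + df_den)
    return 1.0 - reg_inc_beta(df_num / 2.0, df_den / 2.0, x)


def f_critical(alpha, df_num, df_den, tol=1e-10):
    """Critical value F_(alpha, df_num, df_den): the right-tail area equals alpha."""
    lo, hi = 0.0, 1.0
    while f_right_tail(hi, df_num, df_den) > alpha:  # widen until the tail is small enough
        hi *= 2.0
    while hi - lo > tol:  # bisection: the right-tail area decreases as f grows
        mid = (lo + hi) / 2.0
        if f_right_tail(mid, df_num, df_den) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


print(round(f_critical(0.05, 10, 6), 2))
print(round(f_critical(0.05, 3, 28), 4))
```

Bisection is used because the right-tail area is strictly decreasing in f, so it is simple and cannot miss the root once the bracket contains it.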

Page 3: Notes_ANOVA_04-13-08


Analysis of Variance (ANOVA) is an inferential method used to test the equality of three or more population means. ANOVA extends the t-test for two independent samples (Section 10.2) to more than two groups.

H0: µ1 = µ2 = … = µk
H1: not all means are equal

For example, for k = 3 the null and alternative hypotheses are:
H0: µ1 = µ2 = µ3
H1: µ1 = µ2 ≠ µ3, or µ1 ≠ µ2 = µ3, or µ1 = µ3 ≠ µ2, or µ1 ≠ µ2 ≠ µ3

Assumptions of a One-Way ANOVA

1.  There are k simple random samples from k populations.

2.  The k samples are independent of each other; that is, the subjects in one group cannot be related in any way to subjects in a second group.

3.  The populations are normally distributed.

4.  The populations have the same variance; that is, each treatment group has population variance σ².

[Figure: Population 1, Population 2, Population 3]

Page 4: Notes_ANOVA_04-13-08


ANOVA Test using the F-distribution: Hypothesis Test Regarding Three or More Means with σ Unknown

Assumptions:

•  k simple random samples from k populations.

•  k populations are normally distributed.

•  k samples are independent of each other.

•  The populations have the same variance, σ².

Step 1:  A claim is made regarding the means of three or more populations. The null and alternative hypotheses are written as:

H0: µ1 = µ2 = … = µk
H1: not all means are equal

Step 2:  Select a level of significance, α, and find the right-tailed critical value for the F-distribution with df = (k − 1), (n1 + n2 + … + nk − k). The rejection region (or critical region) is the set of all values of the test statistic to the right of the critical F-value.

Step 3: Calculate the test statistic, or calculated F-value:

a.  Calculate the grand mean of the combined data set, x̄, by adding up all the observations and dividing by the number of observations.

b.  Find the sample mean for each population or treatment (x̄1 = sample mean from population 1; x̄2 = sample mean from population 2; and so on).

c.  Find the sample variance for each population (s1² = sample variance from population 1; s2² = sample variance from population 2; and so on).

d.  Calculate the mean square due to treatment. (Another name for mean square is variance, which is the “mean” of the squared deviations about x̄.)

$$MST = \frac{n_1(\bar{x}_1 - \bar{x})^2 + n_2(\bar{x}_2 - \bar{x})^2 + \cdots + n_k(\bar{x}_k - \bar{x})^2}{k - 1},$$

where n1 is the sample size from population 1; n2 is the sample size from population 2; and so on; and k is the number of populations, or treatment levels.

e.  Calculate the mean square due to error:

$$MSE = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + \cdots + (n_k - 1)s_k^2}{(n_1 + n_2 + \cdots + n_k) - k}.$$

f.  Calculate the F test statistic:

$$F = \frac{MST \text{ (mean square due to treatment)}}{MSE \text{ (mean square due to error)}}$$

The calculations in Step 3 are reported in an ANOVA table as shown below.

Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square | F-Statistic
Treatment | SST | k − 1 | MST = SST/(k − 1) | F = MST/MSE
Error | SSE | n1 + n2 + … + nk − k | MSE = SSE/(n1 + n2 + … + nk − k) |
Total | SS | n1 + n2 + … + nk − 1 | |

Step 4:  Draw a conclusion:

•  Compare the calculated F-value (or F test statistic) to the critical F-value and state whether or not H0 is rejected at the specified α.

If F > Fα, (k−1), (n1+n2+…+nk−k), reject H0; otherwise, do not reject H0.

•  Interpret the conclusion in the context of the problem.

[Figure: F-distribution with critical value Fα, k−1, n1+n2+…+nk−k]

The numerator in the computation of MST is called the “sum of squares treatment,” or SST.

The numerator in the computation of MSE is called the “sum of squares error,” or SSE.
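Steps 3a through 3f can be collected into one function. This is a minimal sketch in plain Python (standard library only); the function name `one_way_anova` and the dictionary keys are hypothetical. It keeps the sample means unrounded, so its sums of squares agree with software output such as the Excel ANOVA: Single Factor tool.

```python
from statistics import mean, variance


def one_way_anova(groups):
    """One-way ANOVA quantities for a list of samples (one list per treatment)."""
    k = len(groups)
    sizes = [len(g) for g in groups]
    n_total = sum(sizes)
    grand_mean = sum(sum(g) for g in groups) / n_total      # Step 3a
    means = [mean(g) for g in groups]                        # Step 3b
    variances = [variance(g) for g in groups]                # Step 3c, divisor n_i - 1
    sst = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means))
    sse = sum((n - 1) * v for n, v in zip(sizes, variances))
    mst = sst / (k - 1)                                      # Step 3d
    mse = sse / (n_total - k)                                # Step 3e
    return {
        "SST": sst, "SSE": sse, "SS": sst + sse,
        "df": (k - 1, n_total - k, n_total - 1),
        "MST": mst, "MSE": mse,
        "F": mst / mse,                                      # Step 3f
    }


rats = [
    [288.1, 296.8, 267.8, 256.7, 292.1, 282.9, 260.3, 283.8],  # Control
    [229.1, 240.7, 239.4, 207.7, 225.7, 230.8, 206.6, 213.3],  # Fenugreek
    [177.4, 202.2, 163.1, 184.7, 197.9, 164.6, 193.9, 158.1],  # Garlic
    [299.7, 258.3, 286.8, 244.0, 267.1, 297.1, 249.9, 265.1],  # Onion
]
table = one_way_anova(rats)
print({key: (round(v, 2) if isinstance(v, float) else v) for key, v in table.items()})
```

The rat blood glucose data from the example that follows is used as input; Step 4 still requires comparing table["F"] with the critical value Fα, k−1, n1+n2+…+nk−k.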

Page 5: Notes_ANOVA_04-13-08


ANOVA—Blood Glucose Levels of Rats.

Problem: Researcher Jelodar Gholamali wanted to determine the effectiveness of various treatments on the glucose levels of diabetic rats. He randomly assigned diabetic albino rats to four treatment groups. Group 1 rats served as a control group and were fed a regular diet. Group 2 rats were served a regular diet supplemented with an herb, fenugreek. Group 3 rats were served a regular diet supplemented with garlic. Group 4 rats were served a regular diet supplemented with onion. The basis for the study is that Persian folklore states that diets supplemented with fenugreek, garlic, or onion help to treat diabetes. After 15 days of treatment, blood glucose was measured in milligrams per deciliter (mg/dL). The results are presented in the table below.

Carry out a test of the relevant null hypothesis to test the claim made by Persian folklore that fenugreek, garlic, and onion help treat diabetes. Use α = 0.05. Show all four steps of the hypothesis test.

Blood glucose (mg/dL):
Control Fenugreek Garlic Onion
288.1 229.1 177.4 299.7
296.8 240.7 202.2 258.3
267.8 239.4 163.1 286.8
256.7 207.7 184.7 244.0
292.1 225.7 197.9 267.1
282.9 230.8 164.6 297.1
260.3 206.6 193.9 249.9
283.8 213.3 158.1 265.1

Step 1:  A claim is made regarding the means of the four populations. The null and alternative hypotheses are written as:

H0: µ1 = µ2 = µ3 = µ4
H1: not all means are equal

Step 2:  Select α = 0.05 and find the right-tailed critical value for the F-distribution with df = (k − 1), (n1 + n2 + n3 + n4 − k), or df = 3, 28.

F0.05, 3, 28 = 2.95

Page 6: Notes_ANOVA_04-13-08

Step 3:

a. Calculate the grand mean of the entire data set:

$$\bar{x} = \frac{288.1 + 296.8 + \cdots + 249.9 + 265.1}{32} = \frac{7631.7}{32} = 238.4906$$

b. Find the sample mean of each population, where Control = Population 1, Fenugreek = Population 2, Garlic = Population 3, and Onion = Population 4.

$$\bar{x}_1 = \frac{288.1 + 296.8 + \cdots + 283.8}{8} = \frac{2228.5}{8} = 278.5625$$

$$\bar{x}_2 = \frac{229.1 + 240.7 + \cdots + 213.3}{8} = \frac{1793.3}{8} = 224.1625$$

$$\bar{x}_3 = \frac{177.4 + 202.2 + \cdots + 158.1}{8} = \frac{1441.9}{8} = 180.2375$$

$$\bar{x}_4 = \frac{299.7 + 258.3 + \cdots + 265.1}{8} = \frac{2168.0}{8} = 271.0000$$

c. Find the sample variance for each population.

$$s_1^2 = \frac{(288.1 - 278.5625)^2 + (296.8 - 278.5625)^2 + \cdots + (283.8 - 278.5625)^2}{8 - 1} = 225.7713$$

$$s_2^2 = \frac{(229.1 - 224.1625)^2 + (240.7 - 224.1625)^2 + \cdots + (213.3 - 224.1625)^2}{8 - 1} = 181.9884$$

$$s_3^2 = \frac{(177.4 - 180.2375)^2 + (202.2 - 180.2375)^2 + \cdots + (158.1 - 180.2375)^2}{8 - 1} = 291.0341$$

$$s_4^2 = \frac{(299.7 - 271)^2 + (258.3 - 271)^2 + \cdots + (265.1 - 271)^2}{8 - 1} = 448.5800$$

d. Compute MST:

$$MST = \frac{8(278.5625 - 238.4906)^2 + 8(224.1625 - 238.4906)^2 + 8(180.2375 - 238.4906)^2 + 8(271 - 238.4906)^2}{4 - 1} = \frac{50{,}090.69}{3} = 16{,}696.90$$

e. Compute MSE:

$$MSE = \frac{(8 - 1)225.7713 + (8 - 1)181.9884 + (8 - 1)291.0341 + (8 - 1)448.5800}{32 - 4} = \frac{8{,}031.62}{28} = 286.84$$

f. Compute the F test statistic:

$$F = \frac{\text{Mean square due to treatment}}{\text{Mean square due to error}} = \frac{MST}{MSE} = \frac{16{,}696.90}{286.84} = 58.21$$

ANOVA Table:

Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square | F-Test Statistic
Between | 50,090.69 | k − 1 = 4 − 1 = 3 | MST = 16,696.90 | calc F = 58.21
Within | 8,031.62 | n1 + n2 + n3 + n4 − k = 28 | MSE = 286.84 |
Total | 58,122.31 | n1 + n2 + n3 + n4 − 1 = 31 | |

Step 4:  Conclusion. Because the calculated F-statistic = 58.21 is greater than the critical F = 2.95, reject H0 at the 0.05 significance level. At least one of the population means is different from the others.

Page 7: Notes_ANOVA_04-13-08


Excel: ANOVA—Single Factor

Step 1:  Enter the raw data in columns A, B, C, ... for each sample (or treatment).

Step 2:  From the Windows menu bar, select Tools/Data Analysis/ANOVA: Single Factor.

Step 3:  With the cursor in the “Input Range:” box, highlight the data. Click OK.

Perform the calculations using Excel.

A B C D

1 Control Fenugreek Garlic Onion

2 288.1 229.1 177.4 299.7

3 296.8 240.7 202.2 258.3

4 267.8 239.4 163.1 286.8

5 256.7 207.7 184.7 244.0

6 292.1 225.7 197.9 267.1
7 282.9 230.8 164.6 297.1

8 260.3 206.6 193.9 249.9

9 283.8 213.3 158.1 265.1

10

11

12 Anova: Single Factor

13 SUMMARY

14 Groups Count Sum Average Variance

15 Control 8 2228.5 278.5625 225.7713

16 Fenugreek 8 1793.3 224.1625 181.9884

17 Garlic 8 1441.9 180.2375 291.0341

18 Onion 8 2168.0 271 448.58

19

20 ANOVA

21 Source of Variation SS df MS F P-value F crit

22 Between Groups (SST) 50090.69 3 16696.9 58.2091 3.74E-12 2.946685

23 Within Groups (SSE) 8031.616 28 286.8434

24 Total 58122.31 31

Note: Be sure the Analysis ToolPak is activated. To activate it, select the Tools menu, choose Add-Ins, check the box for the Analysis ToolPak, and select OK.

In the output, F is the F-statistic (or calculated F) and F crit is the critical F-value.

Page 8: Notes_ANOVA_04-13-08


Logic of the ANOVA test

If H0 is TRUE, MST will approximately equal MSE, and the calculated F will be approximately equal to 1. If H0 is FALSE, MST will tend to be greater than MSE, and the calculated F > 1.

•  If the k samples are taken from populations with different means, then MST will be considerably greater than MSE, owing to the wider dispersion of the sample means (x̄i) about the grand mean (x̄); see the figure below.

•  If MST is so large relative to MSE that the calculated F-value exceeds the critical F-value, we conclude that the sample means are significantly different and that at least one pair of samples has means that differ significantly.

[Figure: two panels, “H0 is TRUE” and “H0 is FALSE”, each marking x̄ = grand mean]

The ANOVA test is always a one-tailed (right-tailed) test:

•  A significant result can occur only if MST > MSE, i.e., if the calculated F > 1; thus, a right-tailed test is always used in ANOVA.

•  Whenever MST < MSE (so that F < 1), the result is never considered significant.
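The logic above can be seen in a small simulation. This is a sketch under stated assumptions: normal populations with a common standard deviation of 17 mg/dL (chosen only because it is close to √MSE ≈ 16.9 in the rat example), eight observations per group, and two hypothetical sets of population means, one equal (H0 true) and one spread out like the rat sample means (H0 false). All names and settings are illustrative. When H0 is true, the average F should be near 1 (the mean of F with 3 and 28 df is 28/26 ≈ 1.08) and roughly 5% of the F-values should exceed the critical value 2.946685 from the Excel output; when H0 is false, F should be far larger.

```python
import random


def f_statistic(groups):
    """One-way ANOVA F = MST / MSE for a list of samples."""
    sizes = [len(g) for g in groups]
    k, n_total = len(groups), sum(sizes)
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total
    sst = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means))
    sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (sst / (k - 1)) / (sse / (n_total - k))


def simulate_f(pop_means, n=8, sigma=17.0, reps=2000, seed=1):
    """Draw `reps` sets of k normal samples (one per population mean); return each F."""
    rng = random.Random(seed)
    return [
        f_statistic([[rng.gauss(mu, sigma) for _ in range(n)] for mu in pop_means])
        for _ in range(reps)
    ]


F_CRIT = 2.946685  # F(0.05; 3, 28) as reported in the Excel output

null_f = simulate_f([240.0, 240.0, 240.0, 240.0])  # H0 true: equal population means
alt_f = simulate_f([279.0, 224.0, 180.0, 271.0])   # H0 false: hypothetical unequal means

print("H0 true:  mean F =", round(sum(null_f) / len(null_f), 2),
      " share above F crit =", sum(f > F_CRIT for f in null_f) / len(null_f))
print("H0 false: mean F =", round(sum(alt_f) / len(alt_f), 2))
```

Here SSE is computed directly from squared deviations about each group mean, which equals the (ni − 1)si² sum used in MSE.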

Page 9: Notes_ANOVA_04-13-08


Tukey’s Test Using the Studentized Range Distribution: Hypothesis Test Comparing Two Means (see Section 13.2, available under Course Compass).

Assumptions:

•  k simple random samples from k populations.

•  k populations are normally distributed.

•  k samples are independent of each other.

•  The populations have the same variance, σ².

Step 1:  A claim is made regarding two population means (µi and µj).

Two-Tailed Test

H0: µi = µj
H1: µi ≠ µj (i.e., µi < µj or µi > µj)

Step 2:  Determine the critical value, qα, (n1+n2+…+nk−k), k, where α = significance level, (n1+n2+…+nk−k) = df for error, and k = the number of treatments (populations) being compared.

Step 3:  (a) Compute the pairwise differences, x̄i − x̄j, where x̄i > x̄j.

(b) Compute the test statistic,

$$q = \frac{\bar{x}_i - \bar{x}_j}{\sqrt{\dfrac{s^2}{2}\left(\dfrac{1}{n_i} + \dfrac{1}{n_j}\right)}}.$$

Note that s² is the mean square due to error, MSE, from the ANOVA table; ni is the sample size from population i; and nj is the sample size from population j.

Step 4:  Draw a conclusion:

•  Compare the calculated q (or q statistic) to the critical value, qα, (n1+n2+…+nk−k), k, and state whether or not H0 is rejected at the specified α.

If q ≥ qα, (n1+n2+…+nk−k), k, reject H0; otherwise, do not reject H0.

•  Interpret the conclusion in the context of the problem.

Compare all pairwise differences to identify which population means are considered equal.
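The pairwise part of Steps 3 and 4 can be sketched as follows. This is an illustration only, assuming the group summaries (sample mean and size) and the MSE from the ANOVA table are already known; the function names `tukey_pairs` and `tukey_decisions` are hypothetical. The Python standard library has no studentized range distribution, so the critical value qα, (n1+n2+…+nk−k), k must still be read from a table and passed in. The example uses the rat sample means and MSE = 286.84.

```python
from itertools import combinations
from math import sqrt


def tukey_pairs(summary, mse):
    """Tukey q statistic for every pair of groups.

    summary: dict mapping group name -> (sample mean, sample size)
    mse: mean square due to error (s^2) from the ANOVA table
    Returns (higher group, lower group, difference, q), largest difference first.
    """
    # Arrange sample means from highest to lowest so each difference is positive.
    ordered = sorted(summary.items(), key=lambda item: item[1][0], reverse=True)
    results = []
    for (name_i, (mean_i, n_i)), (name_j, (mean_j, n_j)) in combinations(ordered, 2):
        diff = mean_i - mean_j
        q = diff / sqrt((mse / 2) * (1 / n_i + 1 / n_j))
        results.append((name_i, name_j, diff, q))
    return sorted(results, key=lambda r: r[2], reverse=True)


def tukey_decisions(results, q_crit):
    """Reject H0: mu_i = mu_j when q >= q_crit (q_crit from a studentized range table)."""
    return [(i, j, q >= q_crit) for i, j, _, q in results]


rat_summary = {
    "Control": (278.5625, 8),
    "Fenugreek": (224.1625, 8),
    "Garlic": (180.2375, 8),
    "Onion": (271.0, 8),
}
for name_i, name_j, diff, q in tukey_pairs(rat_summary, mse=286.84):
    print(f"{name_i} - {name_j}: difference = {diff:.2f}, q = {q:.3f}")
```

To finish Step 4, look up qα, (n1+n2+…+nk−k), k (here α = 0.05, error df = 28, and k = 4) in the studentized range table and pass it to `tukey_decisions`.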

Page 10: Notes_ANOVA_04-13-08


Tukey’s Test Using the Studentized Range Distribution—Example

Control Fenugreek Garlic Onion

288.1 229.1 177.4 299.7

296.8 240.7 202.2 258.3

267.8 239.4 163.1 286.8

256.7 207.7 184.7 244.0

292.1 225.7 197.9 267.1
282.9 230.8 164.6 297.1
260.3 206.6 193.9 249.9
283.8 213.3 158.1 265.1
Sample mean: 278.56 224.16 180.24 271.00

Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square | F-Statistic
Treatment | 50,090.69 | 3 | 16,696.90 | 58.21
Error | 8,031.62 | 28 | 286.84 |
Total | 58,122.31 | 31 | |

Step 1:  State the null and alternative hypotheses.

Step 2:  Determine the critical value, qα, (n1+n2+n3+n4−k), k, where α = significance level, (n1+n2+n3+n4−k) = df for error, and k = the number of treatments (populations) being compared.

α = ________

k = ______________

n1+n2+n3+n4-k = __________________________

Step 3: (a) Compute the pairwise difference, x̄i − x̄j, where x̄i > x̄j.

(b) Compute the test statistic,

$$q = \frac{(\bar{x}_i - \bar{x}_j) - (\mu_i - \mu_j)}{\sqrt{\dfrac{s^2}{2}\left(\dfrac{1}{n_i} + \dfrac{1}{n_j}\right)}}.$$

Step 4: Conclusion. Provide a conclusion and the statistical justification for the conclusion, and interpret your conclusion in the context of the problem.

Repeat this procedure for all pairwise differences in sample means.

Page 11: Notes_ANOVA_04-13-08


Comparison | Difference, x̄i − x̄j | H0 and H1 | Test Statistic, q | Critical Value | Conclusion

Summary of Tukey’s Test (arrange sample means from highest to lowest and draw a line under means that are
