GMS MS 700, Lecture 7-2

8/6/2019 GMS MS 700, Lecture 7-2


Hypothesis Testing

Analysis of Variance (ANOVA)

GMS MS 700/GMS AN 704

Elementary Biostatistics

March 23, 2011


Hypothesis Testing

continuous outcomes: z- or t-test

one sample

two samples

paired samples (matched samples)

discrete outcomes: χ² test

one sample (χ² goodness-of-fit test)

two samples (χ² test of independence)


Hypothesis Testing

continuous outcomes: ANOVA

more than two samples/groups

several types of ANOVAs

one-way (one-factor)

extension of two-sample t-test

randomized block (no interaction effects)

multi-factor (possible interaction effects)

repeated measures (extension of paired-samples t-test)


What is ANOVA?

One-way ANOVA allows us to compare the means of 2 or more groups or categories (the independent variable, IV) on one dependent variable (DV) to determine whether the groups differ significantly from one another on the DV.

To use ANOVA, the independent variable must be a categorical (nominal) variable with at least two independent groups (e.g., treatment vs. control, fuel 1 vs. fuel 2), and the dependent variable must be continuous (interval or ratio).

ANOVA is closely related to the t-test: with only 2 groups, the one-way ANOVA F statistic equals the square of the pooled two-sample t statistic, so both tests reach the same conclusion. With 3 or more groups, ANOVA is preferred because it tests all of the group differences at once, at a single fixed Type I error rate.
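To see the two-group equivalence numerically, here is a minimal sketch that runs both tests on Fuel 1 vs. Fuel 2 from Data Set 1 (shown later in this lecture). The helper names `pooled_t` and `one_way_f` are hypothetical, written for illustration; the t-test assumes equal variances (pooled standard error), which is the version that matches one-way ANOVA.

```python
import math

def pooled_t(a, b):
    """Two-sample t statistic using the pooled (equal-variance) standard error."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    ss_a = sum((x - ma) ** 2 for x in a)
    ss_b = sum((x - mb) ** 2 for x in b)
    pooled_var = (ss_a + ss_b) / (na + nb - 2)
    return (ma - mb) / math.sqrt(pooled_var * (1 / na + 1 / nb))

def one_way_f(*groups):
    """One-way ANOVA F = MS_between / MS_within."""
    values = [x for g in groups for x in g]
    n, k = len(values), len(groups)
    grand = sum(values) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

fuel1 = [40, 44, 42, 44, 40]   # Data Set 1, Fuel 1
fuel2 = [50, 54, 52, 52, 52]   # Data Set 1, Fuel 2
t = pooled_t(fuel1, fuel2)
f = one_way_f(fuel1, fuel2)
print(f"t = {t:.3f}, t^2 = {t * t:.3f}, F = {f:.3f}")   # t^2 and F agree
```

The sign of t only reflects which group is listed first; squaring removes it, which is why F (always non-negative) carries the same information for two groups.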


t-Tests vs. ANOVA

t-tests allow us to decide whether the observed difference between two group means is too large to be attributed to chance alone (i.e., whether it is statistically significant).

But the more t-tests we run, the greater the chance of rejecting at least one null hypothesis that is actually true (a Type I error). With 3 groups, for example, there are 3 pairwise t-tests to run.

ANOVA takes into account the number of groups being compared and tests them all at once, so the overall Type I error rate stays at the chosen α when we look at 3 or more groups.

Rather than computing a simple difference between 2 means as in a t-test, ANOVA measures how far the means of several independent groups spread out around the grand mean, using the squared differences between each group mean and the grand mean.
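The growth of the Type I error rate can be illustrated with a small calculation. This sketch assumes each pairwise t-test is run at α = .05 and, as a simplification, treats the tests as independent (they are not exactly independent, since they share groups), so the numbers show the trend rather than exact error rates. The function name `familywise_error` is hypothetical.

```python
from math import comb

def familywise_error(alpha, n_tests):
    """P(at least one Type I error) if n_tests independent tests each use level alpha."""
    return 1 - (1 - alpha) ** n_tests

for k in (3, 4, 5):
    m = comb(k, 2)   # number of pairwise t-tests among k groups
    print(f"k = {k}: {m} t-tests, familywise error about {familywise_error(0.05, m):.3f}")
```

With 3 groups the chance of at least one false rejection is already about .14 rather than .05, which is the problem a single ANOVA F test avoids.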


H0: There is no difference in MPG between fuels.

HA: There is a difference in MPG between fuels.

(What is the IV? What is the DV?)

Data Set 1

Fuel 1    Fuel 2    Fuel 3
40        50        56
44        54        56
42        52        54
44        52        58
40        52        56
M1 = 42   M2 = 52   M3 = 56

Grand M = 50

Data Set 2

Fuel 1    Fuel 2    Fuel 3
36        54        34
48        40        74
34        58        58
44        62        42
48        46        72
M1 = 42   M2 = 52   M3 = 56

Grand M = 50


One-Way (One-Factor) ANOVA (one IV): An Intuitive Decomposition of Sum of Squares/Variance

Variance: the (near) average of the squared deviations of a set of observations around its mean; "near" because the sum of squares is divided by $n - 1$ rather than $n$:

$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$

One-way ANOVA: compare the between-group (between-factor) variance to the within-group (within-factor) variance.

In ANOVA, a variance is referred to as a mean square (a sum of squares divided by its degrees of freedom).

The F statistic is the ratio of these two variances (mean squares).
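As a quick illustration of the variance formula, the sketch below computes $s^2$ for Fuel 1 of Data Set 1 (deviations from the mean of 42 are −2, 2, 0, 2, −2). The function name `sample_variance` is hypothetical; the comments mark which quantity plays the role of the sum of squares, the degrees of freedom, and the mean square.

```python
def sample_variance(xs):
    """s^2 = sum of squared deviations from the mean, divided by n - 1."""
    n = len(xs)
    mean = sum(xs) / n
    ss = sum((x - mean) ** 2 for x in xs)   # sum of squares
    df = n - 1                               # degrees of freedom
    return ss / df                           # a "mean square"

fuel1 = [40, 44, 42, 44, 40]   # Data Set 1, Fuel 1 (mean 42)
print(sample_variance(fuel1))  # 16 / 4 = 4.0
```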


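Hypothesis Testing for More than 2 Means: ANOVA

Continuous outcome

k independent samples, k > 2

$H_0: \mu_1 = \mu_2 = \mu_3 = \cdots = \mu_k$

$H_1$: Means are not all equal

Test statistic:

$$F = \frac{\sum_j n_j (\bar{X}_j - \bar{X})^2 / (k - 1)}{\sum_j \sum_i (X_{ij} - \bar{X}_j)^2 / (N - k)}$$

where $n_j$ and $\bar{X}_j$ are the size and mean of group j, $\bar{X}$ is the grand mean, and N is the total sample size.

Find the critical value in Table 4 (F distribution) with df = (k − 1), (N − k).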
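The F formula translates almost term by term into code. This is a minimal sketch (the name `anova_f` is hypothetical) that takes a list of groups and returns the between-groups mean square divided by the within-groups mean square, applied here to the two fuel data sets.

```python
def anova_f(groups):
    """One-way ANOVA F statistic for a list of groups (lists of numbers)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Numerator: sum_j n_j (mean_j - grand mean)^2, divided by df = k - 1
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Denominator: sum_j sum_i (x_ij - mean_j)^2, divided by df = N - k
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

data_set_1 = [[40, 44, 42, 44, 40], [50, 54, 52, 52, 52], [56, 56, 54, 58, 56]]
data_set_2 = [[36, 48, 34, 44, 48], [54, 40, 58, 62, 46], [34, 74, 58, 42, 72]]
print(round(anova_f(data_set_1), 2), round(anova_f(data_set_2), 2))
```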


An Intuitive Decomposition of Sum of Squares

Data Set 1: Decision Rule

$SS_{TOTAL} = SS_{BETWEEN} + SS_{WITHIN}$

$k - 1 = 3 - 1 = 2$; $N - k = 15 - 3 = 12$

Critical value: F(2, 12) = 3.89 (α = .05; Table 4). Reject H0 if the computed F exceeds 3.89.
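Table 4 is the standard route to the critical value. As a cross-check, a minimal sketch: for the special case of numerator df = 2 (k = 3 groups, as in the fuel example), the upper tail of the F distribution has the closed form $P(F > f) = (1 + 2f/d_2)^{-d_2/2}$, so the critical value can be solved for directly. This shortcut does not apply to other numerator df; those need the table or a statistics library. The function names are hypothetical.

```python
def f2_sf(f, d2):
    """Upper-tail probability P(F > f) for an F(2, d2) distribution (closed form)."""
    return (1 + 2 * f / d2) ** (-d2 / 2)

def f2_critical(alpha, d2):
    """Critical value c with P(F(2, d2) > c) = alpha, obtained by inverting f2_sf."""
    return d2 / 2 * (alpha ** (-2 / d2) - 1)

crit = f2_critical(0.05, 12)
print(round(crit, 2))   # agrees with the tabled F(2, 12) = 3.89
```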



Data Set 1: $SS_{TOTAL} = SS_{BETWEEN} + SS_{WITHIN}$

$SS_T = (40 - 50)^2 + (44 - 50)^2 + \cdots + (58 - 50)^2 + (56 - 50)^2 = 552$ units of variation


An Intuitive Decomposition of Sum of Squares: Data Set 1

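$$SS_B = 5[(42 - 50)^2 + (52 - 50)^2 + (56 - 50)^2] = 5[64 + 4 + 36] = 520 \text{ units of variation}$$

(Each squared deviation of a group mean from the grand mean is weighted by the group size, n = 5.)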


Data Set 1: $SS_{TOTAL} = SS_{BETWEEN} + SS_{WITHIN}$

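$SS_{W1} = (40 - 42)^2 + \cdots + (40 - 42)^2 = 16$ for Fuel 1

$SS_{W2} = (50 - 52)^2 + \cdots + (52 - 52)^2 = 8$ for Fuel 2

$SS_{W3} = (56 - 56)^2 + \cdots + (56 - 56)^2 = 8$ for Fuel 3

$SS_W = 16 + 8 + 8 = 32$ units of variation, so $552 = 520 + 32$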
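The identity $SS_{TOTAL} = SS_{BETWEEN} + SS_{WITHIN}$ can be checked numerically. This sketch (the function name `ss_decomposition` is hypothetical) computes all three sums of squares directly from their definitions for both fuel data sets; the two sets share the same between-groups SS but differ sharply in within-groups SS.

```python
import math

def ss_decomposition(groups):
    """Return (SS_total, SS_between, SS_within) for a one-way layout."""
    values = [x for g in groups for x in g]
    grand = sum(values) / len(values)
    means = [sum(g) / len(g) for g in groups]
    ss_total = sum((x - grand) ** 2 for x in values)
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return ss_total, ss_between, ss_within

data_sets = {
    "Data Set 1": [[40, 44, 42, 44, 40], [50, 54, 52, 52, 52], [56, 56, 54, 58, 56]],
    "Data Set 2": [[36, 48, 34, 44, 48], [54, 40, 58, 62, 46], [34, 74, 58, 42, 72]],
}
results = {}
for name, groups in data_sets.items():
    sst, ssb, ssw = ss_decomposition(groups)
    results[name] = (sst, ssb, ssw)
    print(f"{name}: SST = {sst:g}, SSB = {ssb:g}, SSW = {ssw:g}, "
          f"identity holds: {math.isclose(sst, ssb + ssw)}")
```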




Data Set 1: Conclusion

Sources of Variation   Sum of Squares   df   Mean Square   F      p
Between Groups         520              2    260           97.5   .000
Within Groups/Error    32               12   2.67
Total                  552              14

Reject H0 because F = 97.5 > critical F = 3.89 (α = .05).

Conclude that there is a significant difference between fuels in mean MPG.


An Intuitive Decomposition of Sum of Squares

Data Set 2: Decision Rule

$SS_{TOTAL} = SS_{BETWEEN} + SS_{WITHIN}$

$k - 1 = 3 - 1 = 2$; $N - k = 15 - 3 = 12$

Critical value: F(2, 12) = 3.89 (α = .05; Table 4)




An Intuitive Decomposition of Sum of Squares: Data Set 2

$SS_T = (36 - 50)^2 + (48 - 50)^2 + \cdots + (42 - 50)^2 + (72 - 50)^2 = 2280$ units of variation



Data Set 2

$$SS_B = 5[(42 - 50)^2 + (52 - 50)^2 + (56 - 50)^2] = 5[64 + 4 + 36] = 520 \text{ units of variation}$$

(NOTE: Same as for Data Set 1, because the group means are identical.)




Data Set 2: $SS_{TOTAL} = SS_{BETWEEN} + SS_{WITHIN}$

$SS_{W1} = (36 - 42)^2 + \cdots + (48 - 42)^2 = 176$ for Fuel 1

$SS_{W2} = (54 - 52)^2 + \cdots + (46 - 52)^2 = 320$ for Fuel 2

$SS_{W3} = (34 - 56)^2 + \cdots + (72 - 56)^2 = 1264$ for Fuel 3

$SS_W = 176 + 320 + 1264 = 1760$ units of variation



Data Set 2: Conclusion

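Sources of Variation   Sum of Squares   df   Mean Square   F      p
Between Groups         520              2    260           1.77   .212
Within Groups/Error    1760             12   146.7
Total                  2280             14

Do not reject H0 because F = 1.77 < critical F = 3.89 (α = .05).

Conclude that there is not a significant difference between fuels in mean MPG. The group means are the same as in Data Set 1, but the within-group variation is much larger (1760 vs. 32 units), so the between-group differences are no longer distinguishable from chance.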
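The whole summary table, including the p-value, can be reproduced in a few lines. This sketch reuses the closed-form upper tail that holds only when the numerator df is 2, so it deliberately refuses any k other than 3; `anova_table` and `f2_sf` are hypothetical helper names, and a general p-value would need the F table or a statistics library.

```python
def f2_sf(f, d2):
    """P(F > f) for an F(2, d2) distribution (closed form; numerator df must be 2)."""
    return (1 + 2 * f / d2) ** (-d2 / 2)

def anova_table(groups):
    """One-way ANOVA summary for k = 3 groups: SS, df, MS, F and p."""
    k = len(groups)
    if k != 3:
        raise ValueError("this sketch's closed-form p-value needs k - 1 = 2")
    values = [x for g in groups for x in g]
    n = len(values)
    grand = sum(values) / n
    means = [sum(g) / len(g) for g in groups]
    ss_b = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_w = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_b, df_w = k - 1, n - k
    ms_b, ms_w = ss_b / df_b, ss_w / df_w
    f = ms_b / ms_w
    return {"SS_B": ss_b, "df_B": df_b, "MS_B": ms_b,
            "SS_W": ss_w, "df_W": df_w, "MS_W": ms_w,
            "F": f, "p": f2_sf(f, df_w)}

data_set_2 = [[36, 48, 34, 44, 48], [54, 40, 58, 62, 46], [34, 74, 58, 42, 72]]
row = anova_table(data_set_2)
print(f"MS_W = {row['MS_W']:.1f}, F = {row['F']:.2f}, p = {row['p']:.3f}")
```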


One-Way (One-Factor) ANOVA:

An Intuitive Decomposition of Sum of Squares/Variance

Between-Group Variance   Within-Group Variance   Likely Statistical Outcome
small                    small                   hard to say
small                    large                   factor has little or no effect; do not reject H0
large                    small                   factor has a large effect; reject H0
large                    large                   hard to say


Post-Hoc Tukey HSD Test between Means

$$\text{Tukey HSD} = \frac{|\bar{X}_1 - \bar{X}_2|}{s_{\bar{x}}}, \qquad s_{\bar{x}} = \sqrt{\frac{MS_e}{n_g}} = \sqrt{\frac{2.67}{5}} = .73$$

where $MS_e$ is the within-groups (error) mean square from the ANOVA table and $n_g$ = the number of cases in each group

Tukey(1-2) = |42 − 52| / .73 = 13.7, p < .01

Tukey(1-3) = |42 − 56| / .73 = 19.2, p < .01

Tukey(2-3) = |52 − 56| / .73 = 5.48, p < .01

The critical value of the Tukey statistic (see Table D) is based on the number of groups (3 here) and the df of the error term (12 here): 3.77 for α = .05 and 5.05 for α = .01.

All 3 means differ significantly from one another at the .01 level of significance: mileage for Fuel 3 > mileage for Fuel 2 > mileage for Fuel 1.
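The pairwise Tukey statistics above can be computed in a loop. This sketch assumes equal group sizes (as in Data Set 1) and takes the critical value 5.05 from Table D rather than computing it; the variable names are illustrative only. Values differ slightly from the slide (e.g., 13.69 vs. 13.7) because the slide rounds $s_{\bar{x}}$ to .73.

```python
import math
from itertools import combinations

means = {"Fuel 1": 42, "Fuel 2": 52, "Fuel 3": 56}   # Data Set 1 group means
ms_error = 32 / 12      # within-groups mean square (SS = 32, df = 12)
n_g = 5                 # cases per group (equal group sizes assumed)
q_crit_01 = 5.05        # Table D: 3 groups, 12 error df, alpha = .01

s_xbar = math.sqrt(ms_error / n_g)   # standard error of a group mean, about .73
q_values = {}
for (a, ma), (b, mb) in combinations(means.items(), 2):
    q = abs(ma - mb) / s_xbar
    q_values[(a, b)] = q
    print(f"{a} vs {b}: q = {q:.2f}, significant at .01: {q > q_crit_01}")
```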


SPSS Input for Data Set 1

Fuel   Mileage
1      40
1      44
1      42
1      44
1      40
2      50
2      54
2      52
2      52
2      52
3      56
3      56
3      54
3      58
3      56
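SPSS expects one row per observation, with the group code in its own column, whereas the hand computations arrange the data as one column per fuel. This sketch converts the long layout above into per-fuel groups; the variable names are illustrative and not part of SPSS.

```python
from collections import defaultdict

# Long layout as entered in SPSS: one (Fuel, Mileage) row per observation
rows = [(1, 40), (1, 44), (1, 42), (1, 44), (1, 40),
        (2, 50), (2, 54), (2, 52), (2, 52), (2, 52),
        (3, 56), (3, 56), (3, 54), (3, 58), (3, 56)]

# Regroup into one list of mileages per fuel (the layout used by hand)
groups = defaultdict(list)
for fuel, mileage in rows:
    groups[fuel].append(mileage)

for fuel in sorted(groups):
    g = groups[fuel]
    print(f"Fuel {fuel}: n = {len(g)}, mean = {sum(g) / len(g):g}")
```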


SPSS Output for Data Set 1

Test of Homogeneity of Variances
Mileage
Levene Statistic   df1   df2   Sig.
1.000              2     12    .397

(Levene's test tests the H0 that the error variance of the dependent variable is equal across groups. Here Sig. = .397 > .05, so the equal-variance assumption of ANOVA is not rejected.)

ANOVA
Mileage
                 Sum of Squares   df   Mean Square   F        Sig.
Between Groups   520.000          2    260.000       97.500   .000
Within Groups    32.000           12   2.667
Total            552.000          14
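Levene's statistic is itself a one-way ANOVA, run on the absolute deviations of each observation from its group mean. The sketch below uses this mean-centered version, which reproduces the 1.000 in the output above; the p-value again uses the closed-form upper tail valid only for numerator df = 2. The helper names are hypothetical, and this is an illustration rather than SPSS's own code.

```python
def f2_sf(f, d2):
    """P(F > f) for F(2, d2); this closed form is valid only for numerator df = 2."""
    return (1 + 2 * f / d2) ** (-d2 / 2)

def levene_mean(groups):
    """Mean-centered Levene statistic: one-way ANOVA F on |x - group mean|."""
    dev = [[abs(x - sum(g) / len(g)) for x in g] for g in groups]
    k = len(dev)
    n = sum(len(d) for d in dev)
    grand = sum(sum(d) for d in dev) / n
    d_means = [sum(d) / len(d) for d in dev]
    ss_b = sum(len(d) * (m - grand) ** 2 for d, m in zip(dev, d_means))
    ss_w = sum((z - m) ** 2 for d, m in zip(dev, d_means) for z in d)
    return (ss_b / (k - 1)) / (ss_w / (n - k)), k - 1, n - k

data_set_1 = [[40, 44, 42, 44, 40], [50, 54, 52, 52, 52], [56, 56, 54, 58, 56]]
w, df1, df2 = levene_mean(data_set_1)
print(f"Levene = {w:.3f}, df = ({df1}, {df2}), p = {f2_sf(w, df2):.3f}")
```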


An Intuitive Decomposition of SS: Practice

Decision Rule

Data Set 3

Fuel 1    Fuel 2    Fuel 3
20        25        28
22        27        28
21        26        27
22        26        29
20        26        28
M1 = 21   M2 = 26   M3 = 28

Grand M = 25



Between-Groups Variance

Data Set 3




Within-Groups Variance

Data Set 3




Data Set 3


Sources of Variation   Sum of Squares   df   Mean Square   F   p
Between Groups
Within Groups/Error
Total
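Once the practice table has been filled in by hand, a short script can check the arithmetic. This is a minimal sketch (the function name `one_way_anova` is hypothetical) that builds the same rows as the blank table for Data Set 3 from the definitions of the sums of squares; running it reveals the answers, so it is best used after working the problem.

```python
def one_way_anova(groups):
    """Return {source: (SS, df, MS)} and F for a one-way ANOVA."""
    k = len(groups)
    values = [x for g in groups for x in g]
    n = len(values)
    grand = sum(values) / n
    means = [sum(g) / len(g) for g in groups]
    ss_b = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_w = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ss_t = sum((x - grand) ** 2 for x in values)
    table = {
        "Between Groups": (ss_b, k - 1, ss_b / (k - 1)),
        "Within Groups/Error": (ss_w, n - k, ss_w / (n - k)),
        "Total": (ss_t, n - 1, None),   # MS left blank, as in the fuel tables
    }
    f = table["Between Groups"][2] / table["Within Groups/Error"][2]
    return table, f

data_set_3 = [[20, 22, 21, 22, 20], [25, 27, 26, 26, 26], [28, 28, 27, 29, 28]]
table, f = one_way_anova(data_set_3)
for source, (ss, df, ms) in table.items():
    ms_text = "" if ms is None else f"{ms:.2f}"
    print(f"{source:<20} {ss:8.2f} {df:4d} {ms_text:>8}")
print(f"F = {f:.2f}")
```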


One-Way (One-Factor) ANOVA:

Fisher's Randomized Block Design

In some cases, an extraneous factor is a systematic source of variance that inflates the error term.

The goal of a randomized block design is to block the extraneous source of variance and remove it from the error term, thus increasing the between-groups F value.

In effect, the randomized block design removes unexplained variance from the error term by associating it with an extraneous factor that is affecting the results.

Fisher (from whom we get our F value) developed the block design to account for extraneous variance in crop yield associated with farm location (e.g., northern vs. central vs. southern locales in England) in order to test whether there were real differences in his main experimental factor, fertilizer type.


One-Factor Randomized Block Design


Unblocked Data Set

Fertilizer 1   Fertilizer 2
38             50
42             52
29             38
32             41
18             27
22             28
M1 = 30.17     M2 = 39.33

Grand M = 34.75

$SS_T = (38 - 34.75)^2 + (42 - 34.75)^2 + \cdots + (27 - 34.75)^2 + (28 - 34.75)^2 = 1232.25$ units of variation





Unblocked Data Set

$$SS_B = 6[(30.17 - 34.75)^2 + (39.33 - 34.75)^2] = 252.1 \text{ units of variation}$$





Unblocked Data Set

$SS_{W1} = (38 - 30.17)^2 + \cdots + (22 - 30.17)^2$ for Fertilizer 1

$SS_{W2} = (50 - 39.33)^2 + \cdots + (28 - 39.33)^2$ for Fertilizer 2

$SS_W = SS_{W1} + SS_{W2} = 980.2$ units of variation




Unblocked Data Set

Sources of Variation   Sum of Squares   df   Mean Square   F      p
Between Groups         252.1            1    252.1         2.57   .140
Within Groups/Error    980.2            10   98.02
Total                  1232.3           11   112.03

Without blocking, the difference between fertilizers is not significant (p = .140).



$SS_{TOTAL} = SS_{BETWEEN} + SS_{BLOCK} + SS_{WITHIN}$

Blocked Data Set (blocked variable: sector)

                  Fertilizer 1   Fertilizer 2   Sector Mean
Northern Sector   38             50             M_N = 45.5
                  42             52
Central Sector    29             38             M_C = 35
                  32             41
Southern Sector   18             27             M_S = 23.75
                  22             28
                  M1 = 30.17     M2 = 39.33     Grand M = 34.75

$SS_T = (38 - 34.75)^2 + (42 - 34.75)^2 + \cdots + (27 - 34.75)^2 + (28 - 34.75)^2 = 1232.25$ units of variation (same as for the unblocked data set)




Blocked Data Set

$$SS_B = 6[(30.17 - 34.75)^2 + (39.33 - 34.75)^2] = 252.1 \text{ units of variation}$$

(NOTE: Same as for the unblocked data set.)




Blocked Data Set

$$SS_{BL} = 4[(45.5 - 34.75)^2 + (35 - 34.75)^2 + (23.75 - 34.75)^2] = 946.5 \text{ units of variation}$$

(Each sector mean is based on 4 observations, so each squared deviation is weighted by 4.)




Blocked Data Set

Sources of Variation        Sum of Squares   df   Mean Square   F       p
Blocked/Extraneous Factor   946.5            2    473.25        112.4   .000
Between Groups              252.1            1    252.1         59.9    .000
Within Groups/Error*        33.7             8    4.21
Total                       1232.3           11   112.03

*Was 980.2 unblocked: 980.2 − 946.5 = 33.7. Moving the sector variation out of the error term makes the fertilizer effect significant (F = 59.9, compared with 2.57 unblocked).
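The effect of blocking can be reproduced by computing each factor's sum of squares and obtaining the error term by subtraction. This is a minimal sketch under two stated assumptions: the additive (no-interaction) block model used in the slides, and a balanced design (every fertilizer × sector cell has 2 plots), which is what makes the subtraction valid. `factor_ss` is a hypothetical helper; small differences from the table (e.g., 112.5 vs. 112.4 for the block F) come from the slide rounding the error mean square to 4.21.

```python
# Rows as (fertilizer, sector, bushels); sector 1/2/3 = Northern/Central/Southern
rows = [(1, 1, 38), (1, 1, 42), (1, 2, 29), (1, 2, 32), (1, 3, 18), (1, 3, 22),
        (2, 1, 50), (2, 1, 52), (2, 2, 38), (2, 2, 41), (2, 3, 27), (2, 3, 28)]

def factor_ss(rows, col, grand):
    """Sum over levels of n_level * (level mean - grand mean)^2, and its df."""
    levels = {}
    for row in rows:
        levels.setdefault(row[col], []).append(row[2])
    ss = sum(len(v) * (sum(v) / len(v) - grand) ** 2 for v in levels.values())
    return ss, len(levels) - 1

y = [row[2] for row in rows]
n = len(y)
grand = sum(y) / n
ss_total = sum((v - grand) ** 2 for v in y)
ss_fert, df_fert = factor_ss(rows, 0, grand)     # between groups (fertilizer)
ss_block, df_block = factor_ss(rows, 1, grand)   # blocked factor (sector)

# Unblocked: everything not explained by fertilizer is error.
ss_err_u, df_err_u = ss_total - ss_fert, n - 1 - df_fert
# Blocked (no interaction term): sector variation is moved out of the error.
ss_err_b, df_err_b = ss_err_u - ss_block, df_err_u - df_block

f_unblocked = (ss_fert / df_fert) / (ss_err_u / df_err_u)
ms_err_b = ss_err_b / df_err_b
f_fert = (ss_fert / df_fert) / ms_err_b
f_block = (ss_block / df_block) / ms_err_b
print(f"unblocked: SS_error = {ss_err_u:.1f} (df {df_err_u}), F = {f_unblocked:.2f}")
print(f"blocked:   SS_error = {ss_err_b:.1f} (df {df_err_b}), "
      f"F_fertilizer = {f_fert:.1f}, F_block = {f_block:.1f}")
```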


SPSS Input for Blocked Data Set

Fertilizer   Plot   Bushels
1            1      38
1            1      42
1            2      29
1            2      32
1            3      18
1            3      22
2            1      50
2            1      52
2            2      38
2            2      41
2            3      27
2            3      28

(Plot codes the blocked sector: 1 = Northern, 2 = Central, 3 = Southern.)
