1 Lecture 2 ANALYSIS OF VARIANCE: AN INTRODUCTION.

Lecture 2

ANALYSIS OF VARIANCE: AN INTRODUCTION

Between subjects experiments

• The caffeine experiment had a between subjects design: each participant was tested under only one condition.

• Participants were RANDOMLY ASSIGNED to the conditions, so that there was no basis on which the data could be paired.

• Between subjects experiments result in INDEPENDENT SAMPLES of data.

More than two conditions

• In more complex experiments, there may be three or more conditions.

• For example, we could compare the performance of groups of participants who have ingested four different supposedly performance-enhancing drugs with that of a control or placebo group.

Factors

• In the context of analysis of variance (ANOVA), a FACTOR is a set of related treatments, conditions or categories.

• The ANOVA term ‘factor’ is a synonym for the term ‘independent variable’.

One-factor experiments

• In the drug experiment, there is just ONE set of (drug-related) conditions.

• The experiment therefore has ONE treatment factor.

• The conditions making up a factor are known as its LEVELS. In the drug experiment, the treatment factor has 5 levels.

Results of the experiment

[Table: raw scores for each group, with the grand mean]

Statistics of the results

[Table: group (cell) means, standard deviations and variances]

The null hypothesis

• The null hypothesis states that, in the population, all the means have the same value.

• We cannot test this hypothesis with a single t test, which compares only two means.

The alternative hypothesis

• The alternative hypothesis is that, in the population, the means do NOT all have the same value.

• MANY POSSIBILITIES are implied by H1.

The One-way ANOVA

• The ANOVA of a one-factor between groups experiment is also known as the ONE-WAY ANOVA.

• The one-way ANOVA must be sharply distinguished from the one-factor WITHIN SUBJECTS (or REPEATED MEASURES) ANOVA, which is appropriate when participants are tested at every level of the treatment factor.

• The between subjects and within subjects ANOVA are based upon different statistical models.

There are some large differences among the five treatment means, suggesting that the null hypothesis is false.

Mean square (MS)

• In ANOVA, the numerator of a variance estimate is known as a SUM OF SQUARES (SS). The denominator is known as the DEGREES OF FREEDOM (df). The variance estimate itself is known as a MEAN SQUARE (MS), so that MS = SS/df.

Accounting for variability

• The building block for any variance estimate is a DEVIATION of some sort.

• The TOTAL DEVIATION of any score from the grand mean (GM) can be divided into 2 components:
1. a BETWEEN GROUPS component;
2. a WITHIN GROUPS component.

[Figure: the total deviation of a score from the grand mean, split into a between groups deviation and a within groups deviation]
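
The breakdown can be written as an identity. The symbols are assumptions introduced here for illustration, not the slide's own notation: X_ij is the score of participant i in group j, M_j is the mean of group j, and GM is the grand mean.

```latex
\underbrace{X_{ij} - GM}_{\text{total deviation}}
  = \underbrace{(M_j - GM)}_{\text{between groups deviation}}
  + \underbrace{(X_{ij} - M_j)}_{\text{within groups deviation}}
```

The group mean cancels on the right-hand side, which is why the identity holds for every score.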

Example of the breakdown

• The score, the group mean and the grand mean have been ringed in the table.

• This breakdown holds for each of the fifty scores in the data set.

[Table: the raw scores, with one score, its group mean and the grand mean ringed]

Breakdown (partition) of the total sum of squares

• If you sum the squares of the deviations over all 50 scores, you obtain an expression which breaks down the total variability in the scores into between groups and within groups components.
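
The partition of the total sum of squares can be checked numerically. The sketch below is illustrative only: the scores are hypothetical (three groups of four, not the lecture's fifty scores) and the variable names are assumptions.

```python
from statistics import mean

# Hypothetical scores (NOT the lecture's data): three groups of four.
groups = {
    "Placebo": [8, 10, 9, 9],
    "Drug A": [12, 14, 13, 13],
    "Drug B": [10, 11, 12, 11],
}

all_scores = [x for scores in groups.values() for x in scores]
grand_mean = mean(all_scores)

# Total: squared deviations of every score from the grand mean.
ss_total = sum((x - grand_mean) ** 2 for x in all_scores)
# Between groups: squared deviations of group means from the grand mean,
# counted once for each score in the group.
ss_between = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in groups.values())
# Within groups: squared deviations of every score from its own group mean.
ss_within = sum((x - mean(s)) ** 2 for s in groups.values() for x in s)

print(ss_total, ss_between, ss_within)  # 38 32 6
```

Here 38 = 32 + 6, so the between groups and within groups components account for all of the total variability.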

How ANOVA works

• The variability BETWEEN the treatment means is compared with the average spread of scores around their means WITHIN the treatment groups.

• The comparison is made with a statistic called the F-RATIO.

The variances of the scores in each group around their group mean are averaged to obtain a WITHIN GROUPS MEAN SQUARE.
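
A minimal sketch of this averaging, using hypothetical scores with equal group sizes (an assumption). It compares the simple average of the group variances with the pooled form, SSwithin divided by dfwithin.

```python
from statistics import mean, variance

# Hypothetical scores (NOT the lecture's data): equal group sizes.
groups = [
    [8, 10, 9, 9],
    [12, 14, 13, 13],
    [10, 11, 12, 11],
]

# Route 1: average the group variances (each uses n - 1 in its denominator).
ms_within_avg = mean([variance(g) for g in groups])

# Route 2: pool the within groups sums of squares and divide by N - k.
ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
df_within = sum(len(g) for g in groups) - len(groups)
ms_within_pooled = ss_within / df_within

print(round(ms_within_avg, 4), round(ms_within_pooled, 4))
```

The two routes agree only when the groups are the same size. With unequal group sizes the pooled form is the one to use, because it weights each group's variance by its degrees of freedom.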

From the values of the five treatment means, a BETWEEN GROUPS MEAN SQUARE is calculated.

The statistic F is calculated by dividing the between groups MS by the within groups MS: F = MSbetween / MSwithin.
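
The calculation can be sketched by hand in a few lines. This is an illustrative sketch, not SPSS's own routine: the function name `one_way_f` and the scores are hypothetical.

```python
from statistics import mean


def one_way_f(groups):
    """Return (F, df_between, df_within) for a list of lists of scores."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    k, n_total = len(groups), len(all_scores)

    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between, df_within = k - 1, n_total - k
    ms_between = ss_between / df_between  # MS = SS / df
    ms_within = ss_within / df_within
    return ms_between / ms_within, df_between, df_within


# Hypothetical scores (NOT the lecture's data).
f_ratio, df_b, df_w = one_way_f([[8, 10, 9, 9], [12, 14, 13, 13], [10, 11, 12, 11]])
print(f"F({df_b}, {df_w}) = {f_ratio:.2f}")  # F(2, 9) = 24.00
```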

The F ratio

The value of MSbetween, since it is calculated from the MEANS, reflects random error, plus any real differences among the population means that there may be.

The value of MSwithin, since it is calculated only from the variances of the scores within groups and ignores the values of the group means, reflects ONLY RANDOM ERROR.

What F is measuring

• If there are differences among the population means, the numerator will be inflated and F will increase.

• If there are no differences, F will be close to 1.

F = (error + real differences) / (error only)

Expectations

• If the null hypothesis is true, the values of MSbetween and MSwithin will be similar, because both variance estimates merely reflect individual differences and random variation or ERROR.

• If so, the value of F will be around 1.

• If the null hypothesis is false, real differences among the population means will inflate the value of MSbetween but the value of MSwithin will be unaffected.

• The result will be a LARGE value of F.

Range of variation of F

• The F statistic is the ratio of two sample variances.

• A variance can take only non-negative values.

• So the lower limit for F is zero.

• There is no upper limit for F.

Imagine…

• Suppose the null hypothesis is true.

• Imagine the experiment were to be repeated thousands and thousands of times, with fresh samples of participants each time.

• There would be thousands and thousands of data sets, from each of which a value of F could be calculated.
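
This thought experiment can be run as a small simulation. The sketch below makes several assumptions that are not in the lecture: five groups of ten participants each, scores drawn from a normal population with mean 100 and standard deviation 15, and 5000 repetitions. The function name `f_ratio` is also hypothetical.

```python
import random


def f_ratio(groups):
    """One-way ANOVA F for a list of lists of scores."""
    scores = [x for g in groups for x in g]
    grand_mean = sum(scores) / len(scores)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = len(groups) - 1, len(scores) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


random.seed(1)  # fixed seed so the sketch is repeatable
N_EXPERIMENTS, K, N_PER_GROUP = 5000, 5, 10

f_values = []
for _ in range(N_EXPERIMENTS):
    # Null hypothesis true: every group comes from the same population.
    groups = [[random.gauss(100, 15) for _ in range(N_PER_GROUP)] for _ in range(K)]
    f_values.append(f_ratio(groups))

f_values.sort()
print("mean F:", round(sum(f_values) / N_EXPERIMENTS, 2))
print("95th percentile:", round(f_values[N_EXPERIMENTS * 95 // 100], 2))
```

Under these assumptions the simulated values of F should cluster around 1, with a long tail to the right. The pile of values built up this way approximates the distribution against which an obtained F is judged.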

Sampling distribution

• To test the null hypothesis, you must be able to locate YOUR value of F in the population or PROBABILITY DISTRIBUTION of such values.

• The probability distribution of a statistic is known as its SAMPLING DISTRIBUTION.

• To specify a sampling distribution, you must assign values to properties known as PARAMETERS.

Parameters of F

• Recall that the t distribution has ONE parameter: the DEGREES OF FREEDOM (df).

• The F distribution has TWO parameters: the degrees of freedom of the between groups and within groups mean squares, which we shall denote by dfbetween and dfwithin, respectively.

Rule for finding the degrees of freedom

• There’s a useful rule for finding the degrees of freedom of a statistic.

• Take the number of independent observations and subtract the number of parameters estimated.

• The sample variance of n scores is based upon n independent observations. But to obtain the deviations, we need an estimate of ONE parameter, namely, the mean.

• So the degrees of freedom of the sample variance is n – 1, not n.

Rule for obtaining the df

Degrees of freedom of the two mean squares

• The degrees of freedom of MSbetween is the number of treatment groups minus 1. (One parameter estimated: the grand mean.)

• The degrees of freedom of MSwithin is the total number of scores minus the number of treatment groups. (Five parameters are estimated: the five group means.)
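
Applied to the drug experiment (5 groups, 50 scores), the rule can be sketched as follows. The function name `anova_df` is hypothetical.

```python
def anova_df(n_groups, n_scores):
    """Degrees of freedom for a one-way between subjects ANOVA."""
    df_between = n_groups - 1        # one parameter estimated: the grand mean
    df_within = n_scores - n_groups  # one parameter estimated per group: its mean
    df_total = n_scores - 1
    return df_between, df_within, df_total


df_between, df_within, df_total = anova_df(n_groups=5, n_scores=50)
print(df_between, df_within, df_total)  # 4 45 49
```

The between groups and within groups degrees of freedom add up to the total, just as the sums of squares do.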

The correct F distribution

• We shall specify an F distribution with the notation F(dfbetween, dfwithin).

• We have seen that in our example, dfbetween = 4 and dfwithin = 45.

• The correct F distribution for our test of the null hypothesis is therefore F(4, 45).

The distribution of F(1, 45)

• F distributions are POSITIVELY SKEWED, i.e., they have a long tail to the right.

• However, the shape of F varies quite markedly with the values of the df.

The distribution of F(4, 45)

Distribution of F(4, 45)

• The critical region is in the upper tail of this F distribution.

• If we set the significance level at .05, the value of F must be at least 2.58 to be significant.

• The value 2.58 is the 95th percentile of the distribution F(4, 45).

The F distribution

• An F distribution is asymmetric, with an infinitely long tail to the right.

• The critical region lies above the 95th percentile which, in this F distribution, is 2.58.

[Figure: the distribution F(dfbetween, dfwithin) = F(4, 45), starting at 0, with .95 of the area below the 95th percentile (2.58) and .05 above it]

The ANOVA summary table

• F is large: nine times larger than unity, the value expected under the null hypothesis, and well over the critical value of 2.58.

• The p-value (Sig.) is less than .01, so F is significant beyond the .01 level.

• Write this result as follows: ‘with an alpha-level of .05, F is significant: F(4, 45) = 9.09; p < .01’.

• Do NOT write the p-value as ‘.000’!

• Notice that SStotal = SSbetween groups + SSwithin groups.
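
One way to avoid reporting ‘.000’ is to format the p-value with a floor. The sketch below is a hypothetical helper: the name `format_p` and the default floor of .001 are assumptions (a common reporting convention), not something the lecture prescribes.

```python
def format_p(p, floor=0.001):
    """Format a p-value for a report, never printing it as '.000'."""
    if not 0 <= p <= 1:
        raise ValueError("p must lie between 0 and 1")
    if p < floor:
        # Report an inequality instead of a rounded-down '.000'.
        return "p < " + f"{floor:.3f}".lstrip("0")
    return "p = " + f"{p:.3f}".lstrip("0")


print(format_p(0.00002))  # p < .001
print(format_p(0.0432))   # p = .043
```

A displayed value of ‘.000’ only means that the p-value is smaller than .0005, so an inequality states what is actually known.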

SPSS advice

• A few general points.

• Give close attention to the labels you give to your variables, and to the appearance of your data. Unnecessary decimal places clutter the display.

• It is particularly important to assign VALUE LABELS to the code numbers you choose for any grouping variables.

• Specify also the LEVEL OF MEASUREMENT of each variable.

Start in Variable View

Work in Variable View first, amending the settings so that when you enter Data View, your variables are already labelled, the scores appear without unnecessary decimals and you will have the option of displaying the value labels of your grouping variable.

Graphics

• The latest SPSS graphics require you to specify the level of measurement of the data on each variable.

• The group code numbers are at the NOMINAL level of measurement, because they are merely CATEGORY LABELS.

• Make the appropriate entry in the Measure column.

Grouping variables

• To instruct SPSS to analyse data from between subjects experiments, you must construct a GROUPING VARIABLE consisting of code numbers identifying the treatment condition under which a score was achieved.

• So we could set 1 = Placebo, 2 = Drug A, 3 = Drug B, 4 = Drug C, and 5 = Drug D.
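
The same layout, one row per participant with a group code and a score, can be sketched outside SPSS. The value labels below are those on the slide. The rows are hypothetical, and the names `VALUE_LABELS`, `rows` and `scores_by_group` are assumptions.

```python
# Value labels for the grouping variable, as on the slide.
VALUE_LABELS = {1: "Placebo", 2: "Drug A", 3: "Drug B", 4: "Drug C", 5: "Drug D"}

# Hypothetical rows (NOT the lecture's data): (group code, score),
# one row per participant.
rows = [(1, 9), (1, 11), (2, 14), (3, 12), (4, 15), (5, 10), (2, 13)]

# Collect the scores achieved under each condition, keyed by value label.
scores_by_group = {label: [] for label in VALUE_LABELS.values()}
for code, score in rows:
    scores_by_group[VALUE_LABELS[code]].append(score)

print(scores_by_group)
```

The code numbers carry no quantitative meaning. They only say which condition a score belongs to.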

Data View

• This is what Data View will look like.

• The entry of data for an ANOVA on SPSS is similar to the procedure we followed when making an independent-samples t-test.

• On the right, the VALUE LABELS are displayed, instead of the values themselves. (This option appears in the View menu.)

Assignment of values in Variable View

Variable View completed

• Note the setting of Decimals so that only whole numbers will appear in Data View.

• Note the informative variable LABELS, which will appear in the output.

• Note the VALUE LABELS giving the key to the code numbers you have chosen for your grouping variable. (The ‘values’ themselves are the code numbers you have chosen.)

The One-Way ANOVA dialog box

More statistics

• By clicking Options, you can order more statistics than would normally appear in the ANOVA output.

• Tick the Descriptive box to order the extra statistics and then click Continue to return to the ANOVA dialog box.

A word of warning

• Modern computing packages such as SPSS afford a bewildering variety of attractive graphs and displays to help you bring out the most important features of your results. You should certainly use them.

• But there are pitfalls awaiting the unwary.

• Suppose the drug experiment had turned out rather differently. The researcher proceeds as follows.

Ordering a means plot

A picture of the results

The picture is false!

• The table of means shows minuscule differences among the five group means.

• The value of F is very small indeed.

• The p-value of F is very high: 1.00 to two decimal places.

• The experiment has failed to show that any of the drugs works.

A small scale view

• Only a microscopically small section of the scale is shown on the vertical axis.

• This greatly magnifies even small differences among the group means.
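
The size of the magnification can be put into numbers. In the sketch below the group means and axis limits are hypothetical, and `plot_height_used` is an illustrative helper rather than anything SPSS provides. It gives the fraction of the vertical axis taken up by the spread of the means.

```python
def plot_height_used(means, axis_min, axis_max):
    """Fraction of the vertical axis spanned by the spread of the means."""
    return (max(means) - min(means)) / (axis_max - axis_min)


# Hypothetical group means (NOT the lecture's data), differing only slightly.
means = [10.02, 10.05, 10.01, 10.04, 10.03]

truncated = plot_height_used(means, axis_min=10.00, axis_max=10.06)
from_zero = plot_height_used(means, axis_min=0.0, axis_max=10.06)

print(f"truncated axis: {truncated:.0%} of the plot height")
print(f"axis from zero: {from_zero:.1%} of the plot height")
```

With these made-up numbers, the same differences fill about two thirds of the truncated plot but well under 1% of a plot whose axis starts at zero.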

Putting things right

• Double-click on the image to get into the Graph Editor.

• Double-click on the vertical axis to access the scale specifications.

Putting things right …

• Uncheck the minimum value box and enter zero as the desired minimum point.

• Click Apply.

The true picture!

The true picture …

• The effect is dramatic.

• The profile now reflects the true situation.

• Always be suspicious of graphs that do not show the complete vertical scale.

Summary

• In the one-way ANOVA, we compare two variance estimates, MSbetween and MSwithin, by means of their ratio, which is called the F statistic.

• If F is large enough to fall in the critical region, we conclude that there is at least one significant difference somewhere among the array of treatment means.

Multiple-choice question

Multiple-choice example