Comparing 3 Means- ANOVA
Evaluation Methods & Statistics- Lecture 7
Dr Benjamin Cowan
Research Example- Theory of Planned Behaviour Ajzen & Fishbein (1981)
One of the most prominent models of behaviour
Used a lot in behaviour change research
Has 3 core components:
Beliefs
Intentions
Behaviour
Research Example- Theory of Planned Behaviour Beliefs:
Attitude- a person's attitude towards an action
Subjective norms- what people perceive others to believe about the action
Perceived behavioural control- people's perceptions of how able they are to perform a certain behaviour
Research Example- Theory of Planned Behaviour
Intentions: A person’s internal declaration to act
Behavior: The action
TPB and Pro-Environmental Behaviour- Perceived Behavioural Control
Self-efficacy concept
We want to see whether 2 interventions we design impact pro-environmental behaviour self-efficacy compared to a control group with no intervention.
We therefore have 3 conditions in our experiment
How would we design this experiment?
How would we design this experiment?
IV- Intervention
Level 1- Control Group (No Intervention)
Level 2- Intervention 1 (Generic Information Condition)
Level 3- Intervention 2 (Tailored Information Condition)
DV- Self-Efficacy/Perceived Behavioural Control, measured by questionnaire
One Way ANOVA- The Idea
Compare more than 2 means to identify whether they are significantly different, i.e. whether they come from different populations
We could do 3 t-tests:
Compare control to group 1
Compare control to group 2
Compare group 1 to group 2
Familywise error rate
If we have 3 tests in a family of tests and assume each is independent, and we use Fisher's level of 0.05 as our level of significance…
The probability of no Type I error across all of these tests is 0.95 × 0.95 × 0.95 = 0.857. This is because we would expect to get a chance significant result 5% of the time.
⇒ The probability of at least one Type I error is 1 − 0.857 = 0.143
That is far greater than the Type I error rate for each test separately (0.05)
We therefore use ANOVA rather than lots of t-tests
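The familywise error calculation above can be checked in a few lines of Python (the alpha level and number of tests are taken directly from the slide):

```python
# Familywise error rate for a family of independent tests at alpha = 0.05.
alpha = 0.05
n_tests = 3

# Probability of making no Type I error in any of the tests.
p_no_error = (1 - alpha) ** n_tests      # 0.95^3 ≈ 0.857

# Probability of at least one Type I error somewhere in the family.
familywise_error = 1 - p_no_error        # ≈ 0.143

print(f"P(no Type I error) = {p_no_error:.3f}")
print(f"Familywise error rate = {familywise_error:.3f}")
```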
ANOVA- The Idea Compare 3 (or more) means
to identify whether they are significantly different i.e. whether they come
from different populations
Or more accurately…..we are testing the null hypothesis that the samples come from the same population.
It is what we call an omnibus test: it tells us there is a significant difference, not where it is.
What this means in our example- we are testing whether:
The scores on perceived behavioural control in each condition come from the same population of scores (i.e. our interventions had no effect- H0)
If they don't (i.e. they come from different populations) then we can reject the null hypothesis
The Key: ANOVA & F Ratio
We found a significant effect of condition on perceived behavioural control [F(2, 56) = 11.78, p < 0.001]
The F ratio is the ratio of explained variation (that accounted for by the model we are proposing) to unexplained variation
This is calculated using the Mean Squares
The mindset of “models” Remember the mean is a statistical model, just sometimes not a very good one…….
We want to see whether the statistical model we have proposed explains the variation in our data better
Step 1- Total Sum of Squares
The total amount of variation in our data
This should look familiar (see Lecture 3)

SST = Σ (x_i − x̄_grand)²

Step 1- Graphically: the equation sums each score x_i's squared deviation from the grand mean x̄_grand.
Step 2- Model Sum of Squares
We now need to know how much variation our model can explain
How much of the total variation can be explained by data points coming from different groups in "the perfect model"

SSM = Σ n_k (x̄_k − x̄_grand)²

where n_k is the number of people in condition k and x̄_k is that condition's mean.

Step 2- Graphically: the equation sums each group mean x̄_k's squared deviation from the grand mean x̄_grand, weighted by group size.
Step 3- Residual Sum of Squares
How much of the variation cannot be explained by the model, i.e. what error is there in the model prediction?
Easy way to calculate: SSR = SST − SSM
But here is the real formula:

SSR = Σ (x_ik − x̄_k)²

where x_ik is person i's score in condition k.

Step 3- Graphically: the equation sums each score x_ik's squared deviation from its own group mean x̄_k.
Great… so what next?
These are summed values, and are therefore impacted by the number of scores in the sum (remember variance in Lecture 3)
We can get around this by dividing by the respective degrees of freedom for each SS
Degrees of Freedom for each SS
Degrees of Freedom for SST (dfT): N − 1
Degrees of Freedom for SSM (dfM): Number of conditions (k) − 1
Degrees of Freedom for SSR (dfR): N − k
F Ratio
The F Ratio is calculated using the:
Mean Squares model (MSM): SSM/dfM
Mean Squares residual (error) (MSR): SSR/dfR

F = MSM / MSR
i.e. the variation explained by our model divided by the variation unexplained by our model
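The three sums-of-squares steps and the F ratio can be worked through by hand on a small data set. The scores below are made up for illustration, not from the study:

```python
# Working through SST, SSM, SSR and F on made-up scores:
# three conditions, five participants each.
groups = [
    [4, 5, 3, 4, 5],   # control
    [6, 7, 6, 5, 7],   # generic information
    [8, 9, 7, 8, 9],   # tailored information
]

all_scores = [x for g in groups for x in g]
grand_mean = sum(all_scores) / len(all_scores)

# Step 1: total variation around the grand mean.
sst = sum((x - grand_mean) ** 2 for x in all_scores)

# Step 2: variation explained by group membership.
ssm = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)

# Step 3: unexplained (residual) variation.
ssr = sst - ssm

k, n = len(groups), len(all_scores)
ms_m = ssm / (k - 1)        # Mean Squares model, dfM = k - 1
ms_r = ssr / (n - k)        # Mean Squares residual, dfR = N - k
f_ratio = ms_m / ms_r

print(f"SST = {sst:.1f}, SSM = {ssm:.1f}, SSR = {ssr:.1f}")
print(f"F({k - 1}, {n - k}) = {f_ratio:.2f}")
```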
F Distribution
The observed F ratio is compared against the F distribution for the specific pair of degrees of freedom, using a table of critical values.
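Rather than a printed table of critical values, the F distribution can be queried directly, for example with SciPy. The degrees of freedom here are those from the worked example, F(2, 56):

```python
# Looking up the F distribution instead of a table of critical values.
from scipy import stats

df_model, df_residual = 2, 56

# Critical value at alpha = 0.05: reject H0 if the observed F exceeds this.
f_crit = stats.f.ppf(0.95, df_model, df_residual)

# p-value for an observed F of 11.78 (survival function = 1 - CDF).
p_value = stats.f.sf(11.78, df_model, df_residual)

print(f"Critical F(2, 56) at alpha = .05: {f_crit:.2f}")
print(f"p for F = 11.78: {p_value:.6f}")
```

With F = 11.78 well above the critical value, the p-value comes out below 0.001, matching the reported result.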
One Way Independent ANOVA- Assumptions
Normally distributed data (Shapiro-Wilk test)
Equality of Variance (Levene’s test)
Interval or ratio data
Independent data
Reporting ANOVA F ratio
Degrees of Freedom (dfM, dfR)
P value
Back to the example we saw earlier: F(2, 56)= 11.78, p<0.001
We can therefore state that there is a significant effect of our independent variable on perceived behavioural control
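A one-way independent ANOVA like this can be run with SciPy's `f_oneway`. The condition score lists below are made-up illustration data:

```python
# Running a one-way independent ANOVA with scipy.stats.f_oneway.
from scipy import stats

control  = [4, 5, 3, 4, 5]
generic  = [6, 7, 6, 5, 7]
tailored = [8, 9, 7, 8, 9]

f_stat, p_value = stats.f_oneway(control, generic, tailored)

# Report in the format used above: F(dfM, dfR) = ..., p = ...
k = 3                                            # number of conditions
n = len(control) + len(generic) + len(tailored)  # total N
print(f"F({k - 1}, {n - k}) = {f_stat:.2f}, p = {p_value:.4f}")
```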
One Way Repeated Measures ANOVA Experiment measuring self efficacy at:
1. Time 1 (before Intervention)
2. Time 2 (directly after intervention)
3. Time 3 (1 month after the intervention)
This is a Repeated Measures design (Within Subjects- see last week)
Reduces unsystematic variability due to individual differences
Repeated Measures ANOVA- Sphericity
The assumption of independence of data doesn't hold, as the data come from the same participants
Instead we look for sphericity: the assumption that the variances of the differences between scores in each pair of treatments are equal
Calculate the difference between pairs of scores in all possible combinations of treatment levels, then calculate the variance of these differences
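A minimal sketch of that variance-of-differences calculation, using made-up scores for five participants at three time points:

```python
# The variance-of-differences calculation behind the sphericity assumption.
from itertools import combinations
from statistics import variance  # sample variance (n - 1 denominator)

# Made-up wide-format data: one list of scores per treatment level.
scores = {
    "t1": [3, 4, 2, 5, 3],
    "t2": [6, 6, 5, 8, 7],
    "t3": [5, 6, 5, 7, 6],
}

# For every pair of treatment levels, take per-participant differences
# and compute the variance of those differences.
for a, b in combinations(scores, 2):
    diffs = [x - y for x, y in zip(scores[a], scores[b])]
    print(f"var({a} - {b}) = {variance(diffs):.2f}")

# Sphericity holds (roughly) when these variances are similar;
# Mauchly's test formalises this comparison.
```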
Mauchly’s test of sphericity It tests the hypothesis that the variances of the
differences are equal (H0)
If I got p<0.05 for this test would it be good or bad?
It would be bad, as it tells us the variances of the differences are significantly unequal, i.e. sphericity is violated
Corrections exist if this is the case, usually Greenhouse-Geisser correction is used
One Way Repeated Measures ANOVA- Theory
Within-Participant Variance includes:
The experiment effect (as all people have taken part in all conditions)
Error variance: that not explained by the model
Step 1- Total Sum of Squares The total amount of variation in our data
dfT = N-1
SST = Σ (x_i − x̄_grand)²
Step 2- Within Participant Sum of Squares
How much of the variation is within-participant variance
dfW = Number of participants × (Number of conditions − 1)

SSW = Σ (x_ik − x̄_i)²

where x_ik is person i's score in condition k and x̄_i is that person's mean across the conditions.
So far we know…..
The total amount of variation in our data (SSt)
How much this is caused by individual’s performance under the different experiment conditions (SSw)
Step 3- Model Sum of Squares We now need to know how much variation our
model can explain
In other words how much variation is attributed to our experiment and how much isn’t
This calculation is the same as in Independent ANOVA
SSM = Σ n_k (x̄_k − x̄_grand)²
Step 4- Residual Sum of Squares We now need to know how much of this variation
is noise or error variation
Simplest way to calculate this is: SSR = SSW − SSM
Degrees of freedom are also calculated using: dfR = dfW − dfM
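The repeated measures decomposition (SST, SSW, SSM, SSR and the resulting F) can also be verified by hand. The scores below are made up: rows are participants, columns are the three time points:

```python
# Hand-computing the repeated measures sums of squares on made-up data.
data = [
    [3, 6, 5],
    [4, 6, 6],
    [2, 5, 5],
    [5, 8, 7],
    [3, 7, 6],
]

n_participants = len(data)
k = len(data[0])                      # number of conditions
all_scores = [x for row in data for x in row]
grand_mean = sum(all_scores) / len(all_scores)

# Step 1: total variation.
sst = sum((x - grand_mean) ** 2 for x in all_scores)

# Step 2: variation within each participant across conditions.
ssw = sum((x - sum(row) / k) ** 2 for row in data for x in row)

# Step 3: variation between condition means (same formula as before).
cond_means = [sum(row[j] for row in data) / n_participants for j in range(k)]
ssm = sum(n_participants * (m - grand_mean) ** 2 for m in cond_means)

# Step 4: error variation, with matching degrees of freedom.
ssr = ssw - ssm
df_m = k - 1
df_r = n_participants * (k - 1) - df_m

f_ratio = (ssm / df_m) / (ssr / df_r)
print(f"F({df_m}, {df_r}) = {f_ratio:.2f}")
```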
One Way Repeated Measures ANOVA- Assumptions
Normally distributed data for each condition (Shapiro-Wilk test)
Sphericity
Interval or ratio data
Omnibus test & Post Hoc If we had a significant main effect of intervention
on self efficacy [F(2, 56)= 11.78, p<0.001]
This tells us there is a significant effect of our experiment conditions on self efficacy
i.e. that the means of the conditions are not equal
But how does this break down? Information > Control condition?
Tailored Information > Information & Control conditions?
We then need post hoc tests
Post Hoc Tests
Used when we have no specific a priori predictions about the data
They are used for exploratory data analysis
Pairwise comparisons: like performing t-tests on all the pairs of means in our data
But they control the Type I error rate by correcting the significance level across all tests
Using the Bonferroni correction (0.05/Ntests)
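A sketch of Bonferroni-corrected pairwise comparisons, using independent t-tests against the corrected alpha of 0.05/Ntests (the condition scores are made-up illustration data):

```python
# Bonferroni-corrected pairwise comparisons after a significant ANOVA.
from itertools import combinations
from scipy import stats

conditions = {
    "control":  [4, 5, 3, 4, 5],
    "generic":  [6, 7, 6, 5, 7],
    "tailored": [8, 9, 7, 8, 9],
}

pairs = list(combinations(conditions, 2))
alpha_corrected = 0.05 / len(pairs)    # Bonferroni: 0.05 / Ntests

for a, b in pairs:
    t_stat, p = stats.ttest_ind(conditions[a], conditions[b])
    verdict = "significant" if p < alpha_corrected else "not significant"
    print(f"{a} vs {b}: t = {t_stat:.2f}, p = {p:.4f} ({verdict})")
```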
There are many to choose from……..
A selection of common post hoc tests
LSD (Least Significant Difference): analogous to multiple t-tests
Bonferroni: uses the Bonferroni correction to control Type I error; with multiple comparisons this may be too conservative (increasing the chance of Type II error)
Tukey's test: controls Type I error and is better when testing large numbers of means
Which one to choose? Trade-off between:
Type I error rate likelihood
Statistical power (ability to find an effect if there is one)
Whether assumptions of ANOVA have been violated, although most are robust to minor variations
Lecture Readings and Further concepts to consider
Core: Field (2009) Chapters 8 & 11 (pages 427-454)
Other concepts to consider: Statistical Power: Cohen (1992). A power primer.
Psychological Bulletin
Planned Contrasts: Field (2009), Chapter 8, p.325-339