Comparing 3 Means- ANOVA
Evaluation Methods & Statistics- Lecture 7
Dr Benjamin Cowan
Research Example- Theory of Planned Behaviour Ajzen & Fishbein (1981)
One of the most prominent models of behaviour
Used a lot in behaviour change research
Has 3 core components:
Beliefs
Intentions
Behaviour
Research Example- Theory of Planned Behaviour Beliefs:
Attitude- a person's attitude towards an action
Subjective norms- what people perceive others to believe about the action
Perceived behavioural control- people's perceptions of how able they are to perform a certain behaviour
Research Example- Theory of Planned Behaviour
Intentions: A person’s internal declaration to act
Behavior: The action
TPB and Pro-Environmental Behaviour- Perceived Behavioural Control
Self-efficacy concept
We want to see whether 2 interventions we design impact pro-environmental behaviour self-efficacy compared to a control group with no intervention.
We therefore have 3 conditions in our experiment
How would we design this experiment?
How would we design this experiment?
IV- Intervention
Level 1- Control Group (No Intervention)
Level 2- Intervention 1 (Generic Information Condition)
Level 3- Intervention 2 (Tailored Information Condition)
DV- Self-Efficacy/Perceived Behavioural Control, measured by questionnaire
One Way ANOVA- The Idea
Compare more than 2 means to identify whether they are significantly different, i.e. whether they come from different populations
We could do 3 t-tests:
Compare control to group 1
Compare control to group 2
Compare group 1 to group 2
Familywise error rate
If we have 3 tests in a family of tests and assume each is independent, and we use Fisher's level of 0.05 as our level of significance…
The probability of no Type I error across all of these tests is 0.95 × 0.95 × 0.95 = 0.857. This is because we would expect to get a chance significant result 5% of the time.
⇒ The probability of at least one Type I error is 1 − 0.857 = 0.143
That is far greater than the Type I error rate for each test separately (0.05)
We therefore use ANOVA rather than lots of t-tests
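The familywise error calculation above can be checked in a few lines of Python (the alpha level and number of tests are taken directly from the slide):

```python
# Familywise error rate for a family of independent tests at alpha = 0.05.
alpha = 0.05
n_tests = 3

# Probability of making no Type I error in any of the tests.
p_no_error = (1 - alpha) ** n_tests      # 0.95^3 ≈ 0.857

# Probability of at least one Type I error somewhere in the family.
familywise_error = 1 - p_no_error        # ≈ 0.143

print(f"P(no Type I error) = {p_no_error:.3f}")
print(f"Familywise error rate = {familywise_error:.3f}")
```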
ANOVA- The Idea Compare 3 (or more) means
to identify whether they are significantly different i.e. whether they come
from different populations
Or more accurately…..we are testing the null hypothesis that the samples come from the same population.
It is what we call an omnibus test: it tells us there is a significant difference, not where it is.
What this means in our example- we are testing whether:
The scores on perceived behavioural control in each condition come from the same population of scores (i.e. our interventions had no effect- H0)
If they don't (i.e. they come from different populations) then we can reject the null hypothesis
The Key: ANOVA & F Ratio
We found a significant effect of condition on perceived behavioural control [F(2, 56) = 11.78, p < 0.001]
The F ratio is the ratio of explained variation (that accounted for by the model we are proposing) to unexplained variation
This is calculated using the Mean Squares
The mindset of “models” Remember the mean is a statistical model, just sometimes not a very good one…….
We want to see whether the statistical model we have proposed explains the variation in our data better
Step 1- Total Sum of Squares
The total amount of variation in our data
This should look familiar (see Lecture 3)

SST = Σ (x_i − x̄_grand)²

Step 1- Graphically: the equation sums each score x_i's squared deviation from the grand mean x̄_grand.
Step 2- Model Sum of Squares
We now need to know how much variation our model can explain
How much of the total variation can be explained by data points coming from different groups in "the perfect model"

SSM = Σ n_k (x̄_k − x̄_grand)²

where n_k is the number of people in condition k and x̄_k is that condition's mean.

Step 2- Graphically: the equation sums each group mean x̄_k's squared deviation from the grand mean x̄_grand, weighted by group size.
Step 3- Residual Sum of Squares
How much of the variation cannot be explained by the model, i.e. what error is there in the model prediction?
Easy way to calculate: SSR = SST − SSM
But here is the real formula:

SSR = Σ (x_ik − x̄_k)²

where x_ik is person i's score in condition k.

Step 3- Graphically: the equation sums each score x_ik's squared deviation from its own group mean x̄_k.
Great… so what next?
These are summed values, and are therefore impacted by the number of scores in the sum (remember variance in Lecture 3)
We can get around this by dividing by the respective degrees of freedom for each SS
Degrees of Freedom for each SS
Degrees of Freedom for SST (dfT): N − 1
Degrees of Freedom for SSM (dfM): Number of conditions (k) − 1
Degrees of Freedom for SSR (dfR): N − k
F Ratio
The F Ratio is calculated using the:
Mean Squares model (MSM): SSM/dfM
Mean Squares residual (error) (MSR): SSR/dfR

F = MSM / MSR
i.e. the variation explained by our model divided by the variation unexplained by our model
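The three sums-of-squares steps and the F ratio can be worked through by hand on a small data set. The scores below are made up for illustration, not from the study:

```python
# Working through SST, SSM, SSR and F on made-up scores:
# three conditions, five participants each.
groups = [
    [4, 5, 3, 4, 5],   # control
    [6, 7, 6, 5, 7],   # generic information
    [8, 9, 7, 8, 9],   # tailored information
]

all_scores = [x for g in groups for x in g]
grand_mean = sum(all_scores) / len(all_scores)

# Step 1: total variation around the grand mean.
sst = sum((x - grand_mean) ** 2 for x in all_scores)

# Step 2: variation explained by group membership.
ssm = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)

# Step 3: unexplained (residual) variation.
ssr = sst - ssm

k, n = len(groups), len(all_scores)
ms_m = ssm / (k - 1)        # Mean Squares model, dfM = k - 1
ms_r = ssr / (n - k)        # Mean Squares residual, dfR = N - k
f_ratio = ms_m / ms_r

print(f"SST = {sst:.1f}, SSM = {ssm:.1f}, SSR = {ssr:.1f}")
print(f"F({k - 1}, {n - k}) = {f_ratio:.2f}")
```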
F Distribution
The observed F ratio is compared against the F distribution for the specific pair of degrees of freedom, using a table of critical values.
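Rather than a printed table of critical values, the F distribution can be queried directly, for example with SciPy. The degrees of freedom here are those from the worked example, F(2, 56):

```python
# Looking up the F distribution instead of a table of critical values.
from scipy import stats

df_model, df_residual = 2, 56

# Critical value at alpha = 0.05: reject H0 if the observed F exceeds this.
f_crit = stats.f.ppf(0.95, df_model, df_residual)

# p-value for an observed F of 11.78 (survival function = 1 - CDF).
p_value = stats.f.sf(11.78, df_model, df_residual)

print(f"Critical F(2, 56) at alpha = .05: {f_crit:.2f}")
print(f"p for F = 11.78: {p_value:.6f}")
```

With F = 11.78 well above the critical value, the p-value comes out below 0.001, matching the reported result.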
One Way Independent ANOVA- Assumptions
Normally distributed data (Shapiro-Wilk test)
Equality of Variance (Levene’s test)
Interval or ratio data
Independent data
Reporting ANOVA F ratio
Degrees of Freedom (dfM, dfR)
P value
Back to the example we saw earlier: F(2, 56)= 11.78, p<0.001
We can therefore state that there is a significant effect of our independent variable on perceived behavioural control
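A one-way independent ANOVA like this can be run with SciPy's `f_oneway`. The condition score lists below are made-up illustration data:

```python
# Running a one-way independent ANOVA with scipy.stats.f_oneway.
from scipy import stats

control  = [4, 5, 3, 4, 5]
generic  = [6, 7, 6, 5, 7]
tailored = [8, 9, 7, 8, 9]

f_stat, p_value = stats.f_oneway(control, generic, tailored)

# Report in the format used above: F(dfM, dfR) = ..., p = ...
k = 3                                            # number of conditions
n = len(control) + len(generic) + len(tailored)  # total N
print(f"F({k - 1}, {n - k}) = {f_stat:.2f}, p = {p_value:.4f}")
```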
One Way Repeated Measures ANOVA Experiment measuring self efficacy at:
1. Time 1 (before Intervention)
2. Time 2 (directly after intervention)
3. Time 3 (1 month after the intervention)
This is a Repeated Measures design (Within Subjects- see last week)
Reduces unsystematic variability due to individual differences
Repeated Measures ANOVA- Sphericity
The assumption of independence of data doesn't hold, as the data come from the same participants
Instead we look for sphericity: the assumption that the variances of the differences between scores in each pair of treatments are equal
Calculate the difference between pairs of scores in all possible combinations of treatment levels, then calculate the variance of these differences
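A minimal sketch of that variance-of-differences calculation, using made-up scores for five participants at three time points:

```python
# The variance-of-differences calculation behind the sphericity assumption.
from itertools import combinations
from statistics import variance  # sample variance (n - 1 denominator)

# Made-up wide-format data: one list of scores per treatment level.
scores = {
    "t1": [3, 4, 2, 5, 3],
    "t2": [6, 6, 5, 8, 7],
    "t3": [5, 6, 5, 7, 6],
}

# For every pair of treatment levels, take per-participant differences
# and compute the variance of those differences.
for a, b in combinations(scores, 2):
    diffs = [x - y for x, y in zip(scores[a], scores[b])]
    print(f"var({a} - {b}) = {variance(diffs):.2f}")

# Sphericity holds (roughly) when these variances are similar;
# Mauchly's test formalises this comparison.
```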
Mauchly’s test of sphericity It tests the hypothesis that the variances of the
differences are equal (H0)
If I got p<0.05 for this test would it be good or bad?
It would be bad, as it tells us the variances of the differences are significantly unequal, i.e. sphericity is violated
Corrections exist if this is the case, usually Greenhouse-Geisser correction is used
One Way Repeated Measures ANOVA- Theory
Within-Participant Variance includes:
The experiment effect (as all people have taken part in all conditions)
Error variance: that not explained by the model
Step 1- Total Sum of Squares The total amount of variation in our data
dfT = N-1
SST = Σ (x_i − x̄_grand)²
Step 2- Within Participant Sum of Squares
How much of the variation is within-participant variance
dfW = Number of participants × (Number of conditions − 1)

SSW = Σ (x_ik − x̄_i)²

where x_ik is person i's score in condition k and x̄_i is that person's mean across the conditions.
So far we know…..
The total amount of variation in our data (SSt)
How much this is caused by individual’s performance under the different experiment conditions (SSw)
Step 3- Model Sum of Squares We now need to know how much variation our
model can explain
In other words how much variation is attributed to our experiment and how much isn’t
This calculation is the same as in Independent ANOVA
SSM = Σ n_k (x̄_k − x̄_grand)²
Step 4- Residual Sum of Squares We now need to know how much of this variation
is noise or error variation
Simplest way to calculate this is: SSR = SSW − SSM
Degrees of freedom are also calculated using: dfR = dfW − dfM
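The repeated measures decomposition (SST, SSW, SSM, SSR and the resulting F) can also be verified by hand. The scores below are made up: rows are participants, columns are the three time points:

```python
# Hand-computing the repeated measures sums of squares on made-up data.
data = [
    [3, 6, 5],
    [4, 6, 6],
    [2, 5, 5],
    [5, 8, 7],
    [3, 7, 6],
]

n_participants = len(data)
k = len(data[0])                      # number of conditions
all_scores = [x for row in data for x in row]
grand_mean = sum(all_scores) / len(all_scores)

# Step 1: total variation.
sst = sum((x - grand_mean) ** 2 for x in all_scores)

# Step 2: variation within each participant across conditions.
ssw = sum((x - sum(row) / k) ** 2 for row in data for x in row)

# Step 3: variation between condition means (same formula as before).
cond_means = [sum(row[j] for row in data) / n_participants for j in range(k)]
ssm = sum(n_participants * (m - grand_mean) ** 2 for m in cond_means)

# Step 4: error variation, with matching degrees of freedom.
ssr = ssw - ssm
df_m = k - 1
df_r = n_participants * (k - 1) - df_m

f_ratio = (ssm / df_m) / (ssr / df_r)
print(f"F({df_m}, {df_r}) = {f_ratio:.2f}")
```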
One Way Repeated Measures ANOVA- Assumptions
Normally distributed data for each condition (Shapiro-Wilk test)
Sphericity
Interval or ratio data
Omnibus test & Post Hoc If we had a significant main effect of intervention
on self efficacy [F(2, 56)= 11.78, p<0.001]
This tells us there is a significant effect of our experiment conditions on self efficacy
i.e. that the means of the conditions are not equal
But how does this break down? Information > Control condition?
Tailored Information > Information & Control conditions?
We then need post hoc tests
Post Hoc Tests
Used when we have no specific a priori predictions about the data
They are used for exploratory data analysis
Pairwise comparisons: like performing t-tests on all the pairs of means in our data
But they control the Type I error rate by correcting the significance level across all tests
Using the Bonferroni correction (0.05/Ntests)
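A sketch of Bonferroni-corrected pairwise comparisons, using independent t-tests against the corrected alpha of 0.05/Ntests (the condition scores are made-up illustration data):

```python
# Bonferroni-corrected pairwise comparisons after a significant ANOVA.
from itertools import combinations
from scipy import stats

conditions = {
    "control":  [4, 5, 3, 4, 5],
    "generic":  [6, 7, 6, 5, 7],
    "tailored": [8, 9, 7, 8, 9],
}

pairs = list(combinations(conditions, 2))
alpha_corrected = 0.05 / len(pairs)    # Bonferroni: 0.05 / Ntests

for a, b in pairs:
    t_stat, p = stats.ttest_ind(conditions[a], conditions[b])
    verdict = "significant" if p < alpha_corrected else "not significant"
    print(f"{a} vs {b}: t = {t_stat:.2f}, p = {p:.4f} ({verdict})")
```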
There are many to choose from……..
A selection of common post hoc tests
LSD (Least Significant Difference): analogous to multiple t-tests
Bonferroni: uses the Bonferroni correction to control Type I error; with multiple comparisons this may be too conservative (increasing the chance of Type II error)
Tukey's test: controls Type I error and is better when testing large numbers of means
Which one to choose? Trade-off between:
Type I error rate likelihood
Statistical power (ability to find an effect if there is one)
Whether assumptions of ANOVA have been violated, although most are robust to minor variations
Lecture Readings and Further concepts to consider
Core: Field (2009) Chapters 8 & 11 (pages 427-454)
Other concepts to consider: Statistical Power: Cohen (1992). A power primer.
Psychological Bulletin
Planned Contrasts: Field (2009), Chapter 8, p.325-339