COMP6053 lecture: Multivariate ANOVAmb1a10/stats/FEEG... · Complex scenario: MANOVA...

FEEG6017 lecture: Multivariate ANOVA Dr Brendan Neville [email protected]



FEEG6017 lecture:

Multivariate ANOVA

Dr Brendan Neville

[email protected]


Multiple dependent variables?

• All the analyses we've done so far have had a single dependent or outcome variable.

• E.g., regression, ANOVA.

• These are known as Univariate Tests.

• What if you are interested in more than one outcome variable?

• Then you need a multivariate test.


Multiple analyses

• You could do multiple analyses in turn, focusing on one outcome measure at a time. A simple solution, used often.

• But: every extra test is another chance of a false positive, so an overall analysis made up of many separate tests risks inflating the type-I error rate.

• And we might miss effects that only show up in the relationships between our dependent variables.
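To see how quickly the error rate can grow, here is a rough sketch. It assumes the separate tests are independent, which correlated outcome measures generally are not, so the numbers are illustrative rather than exact. The variable names are ours.

```r
# Familywise type-I error rate for k independent tests, each run at level alpha
alpha <- 0.05
k <- c(1, 2, 4, 10)                 # number of outcome measures tested separately
familywise <- 1 - (1 - alpha)^k     # chance of at least one false positive
print(round(familywise, 3))

# A Bonferroni-style correction would test each outcome at alpha / k instead
print(alpha / k)
```

With four outcome measures the chance of at least one false positive is already about 0.19 under these assumptions.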


Multivariate ANOVA

• Essentially this is ANOVA applied to a vector (list) of dependent variables (DVs), rather than just one.

• The logic is very similar: instead of different means across groups, we look for different locations in dependent-variable-space across groups.


Multivariate ANOVA

• The null hypothesis is that the different groups all have a common centroid in our DV-vector-space.

• The alternative hypothesis is that at least one group has a distinct centroid in DV-space.
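The two hypotheses can be written compactly. The notation is ours rather than the slides': \boldsymbol{\mu}_g stands for the population mean vector (centroid) of group g, and G for the number of groups.

```latex
H_0:\ \boldsymbol{\mu}_1 = \boldsymbol{\mu}_2 = \dots = \boldsymbol{\mu}_G
\qquad\text{versus}\qquad
H_1:\ \boldsymbol{\mu}_g \neq \boldsymbol{\mu}_{g'}
\ \text{for at least one pair } g \neq g'
```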


An example scenario

• Let's say we're trying to compare the effectiveness of two different mathematics textbooks.

• We have a large group of people (N=100) taking a mathematics course, and we randomly allocate them to two textbook groups: A and B.


Our dependent variables

• We care about the average exam performances of the two groups: we want to know if one textbook is better than the other at helping people to learn.

• DV1 = Performance.

• If that were all we wanted to know, this would be an ANOVA of performance on group.

• But we also want to compare the textbooks on how much people enjoyed using them.

• DV2 = Enjoyment.


Our dependent variables

• So our dependent variable is not a scalar quantity but a point in the two-dimensional space (Performance, Enjoyment); each person contributes one point.

• Performance ranges from 0 to 100; Enjoyment ranges from 0 to 10.

• We want to know whether those using textbook A and those using textbook B end up in the same or different parts of this space.
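As a minimal sketch of what "parts of this space" means in practice, the group centroids can be computed directly. The data frame d and its values are invented for illustration; only the column names and the score ranges follow the slides.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible
n <- 100
d <- data.frame(
  Group       = factor(rep(c("A", "B"), each = n / 2)),
  Performance = runif(n, min = 0, max = 100),  # invented scores on the 0 to 100 scale
  Enjoyment   = runif(n, min = 0, max = 10)    # invented ratings on the 0 to 10 scale
)

# One row per group: the mean of each DV, i.e. that group's centroid
centroids <- aggregate(cbind(Performance, Enjoyment) ~ Group, data = d, FUN = mean)
print(centroids)
```

MANOVA asks whether the rows of this table differ by more than sampling noise would explain.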


Scenario 1: No relationship

• Suppose that there is no relationship between performance and enjoyment, and also that group membership has no effect on either score.


Scenario 1: No relationship

• Group A shown in red, group B shown in blue.

• Note the near-complete overlap between the two groups in DV-space.


Scenario 1: DVs by group


Scenario 1: MANOVA

• Step 1 is to link the performance and enjoyment variables in a vector format.

Y = cbind(d1$Performance, d1$Enjoyment)

• The model-fitting command is simple: we fit the DV-vector on the group variable.

m1 = manova(Y ~ d1$Group)

summary(m1)
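The slides do not show how d1 was built, so here is a self-contained sketch with simulated data in the spirit of Scenario 1. The means and standard deviations are assumptions chosen only to give plausible scores; the formula uses a data argument instead of d1$..., which is an equivalent way of writing the same model.

```r
set.seed(42)  # arbitrary seed for reproducibility
n <- 100
d1 <- data.frame(
  Group       = factor(rep(c("A", "B"), each = n / 2)),
  Performance = rnorm(n, mean = 60, sd = 10),   # no group effect (assumed values)
  Enjoyment   = rnorm(n, mean = 5,  sd = 1.5)   # no group effect (assumed values)
)

# Bind the two DVs into a matrix on the left-hand side of the formula
m1 <- manova(cbind(Performance, Enjoyment) ~ Group, data = d1)
s1 <- summary(m1)
print(s1)
```

Naming the columns inside cbind also means later output is labelled with the variable names rather than "Response 1" and "Response 2".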


Scenario 1: MANOVA output

• The output from MANOVA is analogous to an F-test.

• There is no single exact test statistic: four different statistics are available (Pillai, Wilks, Hotelling-Lawley, and Roy), each converted to an approximate F. Wilks' lambda is a reasonably robust option.

summary(m1,test="Wilks")

          Df   Wilks approx F num Df den Df Pr(>F)
d1$Group   1 0.96252   1.8886      2     97 0.1568
Residuals 98
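For readers curious where the Wilks value comes from, the sketch below recomputes it by hand on simulated data (the data are invented, not the lecture's d1). For a one-way design, Wilks' lambda is det(E) / det(H + E), where E is the within-group sums-of-squares-and-cross-products (SSCP) matrix and H + E is the total SSCP about the grand centroid.

```r
set.seed(7)  # arbitrary seed for reproducibility
n <- 100
d <- data.frame(
  Group       = factor(rep(c("A", "B"), each = n / 2)),
  Performance = rnorm(n, mean = 60, sd = 10),
  Enjoyment   = rnorm(n, mean = 5,  sd = 1.5)
)
Y   <- cbind(Performance = d$Performance, Enjoyment = d$Enjoyment)
fit <- manova(Y ~ Group, data = d)

E     <- crossprod(residuals(fit))            # within-group SSCP matrix
Total <- crossprod(scale(Y, scale = FALSE))   # SSCP about the grand centroid (H + E)
wilks_by_hand <- det(E) / det(Total)

# Row 1 of the stats table is the Group term
wilks_from_r <- summary(fit, test = "Wilks")$stats[1, "Wilks"]
print(c(by_hand = wilks_by_hand, from_R = wilks_from_r))

# The other three statistics come from the same summary() call
for (test in c("Pillai", "Hotelling-Lawley", "Roy")) {
  print(summary(fit, test = test)$stats[1, 1:2])
}
```

A lambda near 1 means the groups explain little of the total scatter; a lambda near 0 means the centroids are well separated.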


Scenario 1: MANOVA output

• The p-value (0.1568) is not significant in this case, as expected.

• We can't reject the null hypothesis that groups A and B have the same centroid in the Performance-Enjoyment space.


Scenario 2: Enjoyment advantage for group A

• Suppose that people enjoy textbook A more than B, but that there's no difference in exam performance.


• Note the differing centroid positions.


Scenario 2: DVs by group


Scenario 2: MANOVA output

Y = cbind(d2$Performance,d2$Enjoyment)

m2 = manova(Y~d2$Group)

summary(m2,test="Wilks")

          Df   Wilks approx F num Df den Df    Pr(>F)
d2$Group   1 0.50786   46.999      2     97 5.351e-15 ***
Residuals 98

summary.aov(m2)

Response 1 :
            Df Sum Sq Mean Sq F value Pr(>F)
d2$Group     1   99.8  99.768  1.0703 0.3034
Residuals   98 9135.3  93.218

Response 2 :
            Df Sum Sq Mean Sq F value    Pr(>F)
d2$Group     1 79.162  79.162  93.018 7.152e-16 ***
Residuals   98 83.402   0.851


Scenario 2: MANOVA output

• As expected, we can reject the null hypothesis that both groups share the same centroid in DV-space.

• From MANOVA we know they're different, but not exactly how they're different.

• Univariate analyses confirm that there's a significant difference on enjoyment (Response 2) but not performance (Response 1).
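The two-step workflow (multivariate test first, univariate follow-ups second) can be sketched on simulated data resembling Scenario 2. All numbers below are assumptions for illustration: enjoyment is given a higher mean in group A, and performance is left the same in both groups.

```r
set.seed(2)  # arbitrary seed for reproducibility
n <- 100
Group <- factor(rep(c("A", "B"), each = n / 2))
d2 <- data.frame(
  Group       = Group,
  Performance = rnorm(n, mean = 60, sd = 10),                            # same mean in both groups
  Enjoyment   = rnorm(n, mean = ifelse(Group == "A", 6.5, 4.5), sd = 1)  # A enjoyed more (assumed)
)

m2 <- manova(cbind(Performance, Enjoyment) ~ Group, data = d2)
print(summary(m2, test = "Wilks"))   # step 1: are the centroids different at all?

uni <- summary.aov(m2)               # step 2: one univariate ANOVA per DV
print(uni)

# Pull out the group p-value for the second DV (Enjoyment)
p_enjoy <- uni[[2]][["Pr(>F)"]][1]
print(p_enjoy)
```

Because the response columns are named, the univariate tables are headed "Response Performance" and "Response Enjoyment", which avoids having to remember which response number is which.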


Scenario 3: Performance and enjoyment are related

• Suppose that performance and enjoyment are correlated for our subjects (r = 0.69), but not in a group-specific way.


Scenario 3: Performance and enjoyment are related

• The link between performance and enjoyment doesn't change the fact that the two groups overlap in DV-space.

• So the MANOVA results are not significant.

Y = cbind(d3$Performance,d3$Enjoyment)

m3 = manova(Y~d3$Group)

summary(m3,test="Wilks")

          Df   Wilks approx F num Df den Df Pr(>F)
d3$Group   1 0.95923   2.0613      2     97 0.1328
Residuals 98
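One way to generate data like Scenario 3 is to let a shared factor drive both scores while ignoring group membership. This is only a sketch: the "ability" factor and all coefficients are hypothetical, chosen so that the two DVs correlate at roughly 0.6 to 0.7.

```r
set.seed(3)  # arbitrary seed for reproducibility
n <- 100
ability <- rnorm(n)   # hypothetical shared factor behind both scores
d3 <- data.frame(
  Group       = factor(rep(c("A", "B"), each = n / 2)),   # unrelated to either score
  Performance = 60 + 8   * ability + rnorm(n, sd = 6),
  Enjoyment   = 5  + 1.2 * ability + rnorm(n, sd = 0.9)
)

r <- cor(d3$Performance, d3$Enjoyment)
print(r)   # the DVs are clearly correlated with each other

m3 <- manova(cbind(Performance, Enjoyment) ~ Group, data = d3)
print(summary(m3, test = "Wilks"))
```

Since group plays no part in generating the scores, the MANOVA will usually come out non-significant, however strong the correlation between the DVs.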


Scenario 4: Performance and enjoyment both higher in group A

• Performance and enjoyment are again correlated (r = 0.82), but this time those who used textbook A tended to perform better on the exam and to enjoy themselves more.


Scenario 4: Performance and enjoyment both higher in group A

Y = cbind(d4$Performance,d4$Enjoyment)

m4 = manova(Y~d4$Group)

summary(m4,test="Wilks")

          Df   Wilks approx F num Df den Df    Pr(>F)
d4$Group   1 0.42272   66.234      2     97 < 2.2e-16 ***
Residuals 98

summary.aov(m4)

Response 1 :
            Df Sum Sq Mean Sq F value    Pr(>F)
d4$Group     1  11810   11810  132.65 < 2.2e-16 ***
Residuals   98   8725      89

Response 2 :
            Df  Sum Sq Mean Sq F value    Pr(>F)
d4$Group     1  90.949  90.949  54.483 5.168e-11 ***
Residuals   98 163.591   1.669


Scenario 4: Performance and enjoyment both higher in group A

• Unsurprisingly, the overall MANOVA model is highly significant (because the two groups occupy different parts of DV-space).

• The univariate analyses confirm significant between-group differences on both performance and enjoyment.


A more complex scenario?

• The examples so far have been very simple: two continuous DVs, and a single binary group variable.

• In these cases the results of the analysis are unlikely to surprise us: the labelled scatterplot of enjoyment on performance tells us all we need to know.


A more complex scenario: more predictor variables

• Suppose that instead of 2 textbooks, we look at 4. We now have groups A, B, C, and D.

• We also expand our study to include another factor: whether or not the person had access to a web-based demo that accompanied each textbook ("demo").

• We're also interested in the potential interaction between these two variables. Maybe one text is especially useful with its demo but not without.
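In R's formula notation, two factors and their interaction are requested with the * operator. The short sketch below only inspects the formula, so no data are needed; Y, Group and Demo are placeholder names.

```r
# Group * Demo is shorthand for both main effects plus their interaction
f <- Y ~ Group * Demo
term_labels <- attr(terms(f), "term.labels")
print(term_labels)

# The same model written out in full
f_long <- Y ~ Group + Demo + Group:Demo
print(identical(term_labels, attr(terms(f_long), "term.labels")))
```

The Group:Demo term is the one that would capture a textbook being especially useful with its demo but not without.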


A more complex scenario: more outcome variables

• Previously we had two dependent variables: exam performance and enjoyment of the textbook.

• We expand the study to look at attendance in class (perhaps some textbooks inspire people to come to class more often) and the long-term knowledge imparted by the book (e.g., test scores one year later).


Complex scenario: Descriptive statistics

• The pairs command in R can show us all four DVs and their relationships with the group variable.

• Effect of demo not shown.
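A minimal sketch of such a plot follows. The data are simulated with no real structure, and colouring the points by group is our choice for showing group membership; the original figure may have been drawn differently.

```r
set.seed(5)  # arbitrary seed for reproducibility
n <- 100
d5 <- data.frame(
  Group       = factor(rep(c("A", "B", "C", "D"), each = n / 4)),
  Performance = rnorm(n, mean = 60, sd = 10),   # all values invented for illustration
  Enjoyment   = rnorm(n, mean = 5,  sd = 1.5),
  Attendance  = rnorm(n, mean = 75, sd = 6),
  Longterm    = rnorm(n, mean = 50, sd = 11)
)

dv_names <- c("Performance", "Enjoyment", "Attendance", "Longterm")

# Scatterplot matrix of the four DVs, one colour per textbook group
pairs(d5[, dv_names], col = as.integer(d5$Group), pch = 19,
      main = "Four DVs, coloured by textbook group")
```

Each off-diagonal panel is a two-dimensional slice of the four-dimensional DV-space.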


Complex scenario: MANOVA

• In this case the point of a MANOVA becomes clearer.

• It's not easy to see whether the four groups (and the with-demo and without-demo variants of each) lie far apart in the four-dimensional DV-space of performance, enjoyment, attendance, and long-term knowledge.


Complex scenario: MANOVA

Y = cbind(d5$Performance,d5$Enjoyment,d5$Attendance,d5$Longterm)

m5 = manova(Y~d5$Group*d5$Demo)

summary(m5,test="Wilks")

                 Df   Wilks approx F num Df den Df Pr(>F)
d5$Group          3 0.22864  14.6702     12 235.76 <2e-16 ***
d5$Demo           1 0.97937   0.4686      4  89.00 0.7586
d5$Group:d5$Demo  3 0.92475   0.5896     12 235.76 0.8497
Residuals        92
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

• Groups are significantly separated in DV-space, but the group-demo interaction and the demo variable are not significant.
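A self-contained version of this analysis can be sketched with simulated data. Everything about the data is an assumption: the four groups are given steadily increasing means on each DV and the demo is given no effect at all, simply to mimic the pattern reported above.

```r
set.seed(6)  # arbitrary seed for reproducibility
n <- 100
Group <- factor(rep(c("A", "B", "C", "D"), each = n / 4))
Demo  <- factor(rep(c("no", "yes"), times = n / 2))
step  <- as.integer(Group) - 1   # 0, 1, 2, 3: invented group effect, no demo effect

d5 <- data.frame(
  Group, Demo,
  Performance = rnorm(n, mean = 55 + 4   * step, sd = 9),
  Enjoyment   = rnorm(n, mean = 4  + 0.7 * step, sd = 1.2),
  Attendance  = rnorm(n, mean = 70 + 3   * step, sd = 5.5),
  Longterm    = rnorm(n, mean = 45 + 4   * step, sd = 11)
)

# Four DVs on the left; both factors and their interaction on the right
m5 <- manova(cbind(Performance, Enjoyment, Attendance, Longterm) ~ Group * Demo,
             data = d5)
s5 <- summary(m5, test = "Wilks")
print(s5)
```

With 100 people and 8 group-by-demo cells, the residual degrees of freedom come to 92, matching the output above.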


Complex scenario: MANOVA

summary.aov(m5)

Response 1 (Performance) :
                 Df Sum Sq Mean Sq F value    Pr(>F)
d5$Group          3 3360.4 1120.14 13.2840 2.806e-07 ***
d5$Demo           1   67.9   67.89  0.8051    0.3719
d5$Group:d5$Demo  3  139.0   46.32  0.5493    0.6499
Residuals        92 7757.7   84.32

Response 2 (Enjoyment) :
                 Df  Sum Sq Mean Sq F value    Pr(>F)
d5$Group          3  86.387 28.7957 18.9683 1.165e-09 ***
d5$Demo           1   0.915  0.9147  0.6025    0.4396
d5$Group:d5$Demo  3   1.449  0.4831  0.3182    0.8121
Residuals        92 139.665  1.5181


Complex scenario: MANOVA

Response 3 (Attendance) :
                 Df  Sum Sq Mean Sq F value    Pr(>F)
d5$Group          3 1610.15  536.72 18.2334 2.286e-09 ***
d5$Demo           1   17.96   17.96  0.6100    0.4368
d5$Group:d5$Demo  3   11.35    3.78  0.1285    0.9430
Residuals        92 2708.11   29.44

Response 4 (Long-term knowledge) :
                 Df  Sum Sq Mean Sq F value    Pr(>F)
d5$Group          3  3200.8 1066.94  8.8547 3.237e-05 ***
d5$Demo           1    11.3   11.29  0.0937    0.7602
d5$Group:d5$Demo  3   518.9  172.98  1.4356    0.2375
Residuals        92 11085.4  120.49

• The group-by-demo interaction and the demo main effect can be dropped, as neither is significant in the multivariate test or in any of the univariate analyses.


Complex scenario: reduced model

m5b = manova(Y~d5$Group)

summary(m5b,test="Wilks")

          Df   Wilks approx F num Df den Df    Pr(>F)
d5$Group   3 0.23335   15.053     12 246.35 < 2.2e-16 ***
Residuals 96
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

• The overall analysis by group is highly significant.


Complex scenario: reduced model

summary.aov(m5b)

Response 1 :
          Df Sum Sq Mean Sq F value    Pr(>F)
d5$Group   3 3360.4 1120.14  13.502 2.017e-07 ***

Response 2 :
          Df Sum Sq Mean Sq F value    Pr(>F)
d5$Group   3 86.387 28.7957  19.463 6.132e-10 ***

Response 3 :
          Df Sum Sq Mean Sq F value    Pr(>F)
d5$Group   3 1610.2  536.72  18.823 1.108e-09 ***

Response 4 :
          Df Sum Sq Mean Sq F value    Pr(>F)
d5$Group   3 3200.8  1066.9  8.8179 3.201e-05 ***


Complex scenario: reduced model

• The univariate comparisons indicate that all four dependent variables differ significantly across the four groups.

• If we wanted to go further, we could perform multiple comparisons for each DV by group, using the TukeyHSD method described in the lecture on ANOVA.
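As a sketch of that follow-up step, here is TukeyHSD applied to a single DV. The data are simulated with invented group means; in a real analysis the same call would be repeated for each of the four DVs.

```r
set.seed(8)  # arbitrary seed for reproducibility
n <- 100
Group <- factor(rep(c("A", "B", "C", "D"), each = n / 4))
d5 <- data.frame(
  Group,
  # Invented means: each textbook scores 4 points higher than the previous one
  Performance = rnorm(n, mean = 55 + 4 * (as.integer(Group) - 1), sd = 9)
)

# TukeyHSD works on a univariate aov fit, so fit one DV at a time
fit_perf   <- aov(Performance ~ Group, data = d5)
tukey_perf <- TukeyHSD(fit_perf)

# One row per pair of groups: difference, confidence interval, adjusted p-value
print(tukey_perf$Group)
```

With four groups there are six pairwise comparisons, and the "p adj" column already accounts for having made all six.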