8/17/2019 SPSS in Research Part 2 by Prof. Dr. Ananda Kumar
1/212
Faculty of Education, University of Malaya,
50603 Kuala Lumpur, Malaysia.
Tel: 603-79675046 (Off)
603-79675010 (Fax)
Email: [email protected],
PROF. DR. ANANDA KUMAR PALANIAPPAN, Ph.D
Outline
Brief overview of SPSS Part I workshop
Instrument Validity and Reliability
Factor Analyses and Interpretation
Multiple Regression
Types of Research
Research
  Quantitative
    Experimental
    Non-experimental
      Descriptive
      Causal Comparative
      Correlational
  Qualitative
    Ethnography
    Action Research
    Case Study
    Grounded Theory
    Historical
    Phenomenology
Steps in Educational Research
1) Identify the problem area / the need for investigation
2) Write the statement of the problem in either (a) Question form [e.g. Do children with kindergarten experience perform better at school than children without kindergarten experience?] or (b) Hypothesis form [e.g. There is no significant difference in academic achievement between children with kindergarten experience and children without kindergarten experience]
3) Decide which research design is most appropriate.
4) Review studies on the variables indicated in the research questions / hypotheses (a) to form a conceptual framework for the research and (b) to gather the information required to design instruments
Steps in Educational Research (Contd.)
5) Define the variables involved in operational terms [e.g. Academic achievement is the grades assigned by teachers; Intelligence is the score obtained on Cattell's Culture Fair Intelligence Test]
6) Design instruments to measure the variables involved
7) Pilot test the instruments to ascertain (1) whether they are suitable for the sample under study and (2) internal reliabilities (item analyses), test reliabilities and test validities.
8) Administer the instruments and score based on a predetermined score sheet.
Steps in Educational Research (Contd.)
9) Analyse the data using SPSS
10) Interpret the analyses and answer the research question or reject/accept the hypotheses
11) State any assumptions or limitations in the study.
Pilot Study - Reliability and Validation of Instrument
Ascertain Reliability:
(A) INTERNAL CONSISTENCY: (1) Item Analysis - index of discriminability (2) Split-half reliability (3) Kuder-Richardson reliability (for dichotomous data) (4) Cronbach Alpha (for ordinal data). SPSS: Data Editor - Statistics - Scale - Reliability Analysis - Model (Alpha, Split-half, Guttman, Parallel)
(B) STABILITY: (1) Test-retest reliability (2) Alternate Forms reliability. SPSS: Data Editor - Statistics - Compare Means - Paired-Samples t-test.
Ascertain Validity: (1) Content Validity - use expert testimony (2) Construct Validity - SPSS: Data Editor - Analyze - Data Reduction (3) Criterion-related Validity / Concurrent Validity - use correlation (4) Predictive Validity - use correlation
Validity
Content Validity - if the instrument tests all aspects that should be tested (ascertained using expert testimony)
Construct Validity - if the test measures what it is supposed to measure (ascertained using Factor Analysis)
Criterion-related Validity / Concurrent Validity - if the test scores are closely related to those of another test which measures a similar construct (ascertained using Pearson correlation)
Predictive Validity - if the instrument can correctly predict a particular outcome (ascertained using Pearson correlation)
METHODS OF ESTIMATING RELIABILITY

Method | Type of Reliability Measure | Procedure
Test-retest method | Measure of stability | Give the same test twice to the same group, with any time interval between tests from several minutes to several years
Equivalent-Forms method | Measure of equivalence | Give two forms of the test to the same group in close succession
Test-retest with equivalent forms | Measure of stability and equivalence | Give two forms of the test to the same group with an increased time interval between forms
Split-half method | Measure of internal consistency | Give the test once. Score two equivalent halves of the test (e.g. odd items and even items)
Kuder-Richardson method | Measure of internal consistency | Give the test once. Score the total test and apply the Kuder-Richardson formula
DESIGNING INSTRUMENTS
Should be suitable for the population under study
Should sample the universe of data pertaining to the variable measured
Should be reliable
Should be reliably scored
Outline of SPSS Part 1
Types of Data
How to enter data and examine data
How to explore data for normality
What analyses / statistics to use
How to run these analyses
How to COMPUTE and RECODE
Exercise 1
Start your SPSS for Windows now. You will get the Data Editor window. Study the menu bar and the options available in each menu.
Then,
1. Open the data file called 'PRACTICE'.
2. Run some simple frequency analyses on the following variables:
a) sex
b) race
c) region
d) happy
3. From the results in your Output Navigator, describe the respondents in this study.
Types of Measurement Scales and their Statistical Analyses

Measurement Scale | Characteristics | Type of Data | Statistical Tests
Nominal | Simple classification in categories without any order, e.g. Boy / Girl; Happy / Not Happy; Muslim / Buddhist / Hindu | Non-parametric | Chi-square
Ordinal | Has order or rank ordering, e.g. strongly agree, agree, undecided, disagree, strongly disagree (LIKERT SCALE) | Non-parametric | Spearman's rho, Mann-Whitney, Wilcoxon
Types of Measurement Scales and their Statistical Analyses

Measurement Scale | Characteristics | Type of Data | Statistical Tests
Interval | Does not have a true 0 point. Has order as well as equal distance or interval between judgements (Social Sciences), e.g. an IQ score of 95 is better than IQ 85 by 10 IQ points | Parametric | COMPARISON: t-tests, ANOVA; RELATIONSHIP: Pearson r
Ratio | Has a true 0 point. Has order, equal distance between judgements and a true zero value (Physical Sciences), e.g. age, no. of children; 9 ohm is 3 times 3 ohm and 6 ohm is 3 times 2 ohm. But IQ 120 is more comparable to IQ 100 than to IQ 144, although the ratios IQ 120/100 = IQ 144/120 = 1.2 | Parametric | COMPARISON: t-tests, ANOVA; RELATIONSHIP: Pearson r
Types of Measurement Scales and their
Statistical Analyses
A higher order of measurement can be converted to a lower order, e.g. interval ---> ordinal, nominal;
but not ordinal, nominal ---> interval.
Refer to the handout provided.
Exercise 1
Indicate in the spaces provided in Table 1 the level of measurement of the corresponding variables.
Data Collection
Identify the population to be studied
Choose sample randomly or by stratified
random sampling
The accuracy of the findings of a research study depends greatly on (1) how the sample is chosen, (2) whether the correct instruments are used and (3) the reliability and validity of the instruments.
Entering & Editing Data
Open SPSS by double clicking at the SPSS icon or
‘START’ - ‘PROGRAM’ - ‘SPSS’
Define variable
Enter data
Adding labels for variables and value labels
Inserting new cases
Inserting new variables
Adding Missing Value codes
Examining data by running 'FREQUENCY'
Refer to the handout provided.
Exercise 2:
Enter data given in the handout
then answer the questions
Exploring Data Graphically
To check normality graphically and decide on the appropriate analyses
1) By displaying data
Histogram
Boxplot
Stem-and-leaf Plot
2) By Statistical Analyses
Descriptive Statistics
M-Estimators
Kolmogorov-Smirnov Test
Shapiro-Wilk
Histogram
[Histogram of CHILD REARING PRACTICES scores: Mean = 18.0, Std. Dev = 3.89, N = 41]
Checking Normality - Skewness
Skewness measures the symmetry of the sample distribution.
Skewness ratio = Statistic / Standard Error
If the ratio < -2 or > +2, reject normality.
If -2 < ratio < 2 ---> normal distribution
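The skewness ratio can also be checked outside SPSS. The sketch below (the function name and sample data are mine, not from the slides) computes the adjusted Fisher-Pearson skewness and its standard error, which are the statistics SPSS reports in Explore:

```python
import math
from statistics import mean, stdev

def skewness_ratio(data):
    """Return (skewness, standard error, ratio).
    Normality is rejected when the ratio falls outside -2..+2."""
    n = len(data)
    m = mean(data)
    s = stdev(data)  # sample standard deviation (n - 1 in the denominator)
    # Adjusted Fisher-Pearson skewness, as reported by SPSS
    g1 = (n / ((n - 1) * (n - 2))) * sum(((x - m) / s) ** 3 for x in data)
    se = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    return g1, se, g1 / se
```

A symmetric sample gives a ratio near 0 (normality not rejected); a sample with one large value gives a positive skewness.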
Negatively Skewed
If the skewness ratio is negative
If Mean < Median
[Boxplot of CRA scores by SEX (13 male, 22 female) illustrating negative skew, with the mean below the median]
Checking Normality - Kurtosis
Kurtosis measures the peakedness and tail heaviness of the distribution.
Kurtosis ratio = Statistic / Standard Error
If the ratio < -2 or > +2, reject normality.
If -2 < ratio < 2 ---> normal distribution
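The kurtosis ratio follows the same pattern as the skewness ratio. A sketch (function name mine) using the excess kurtosis and standard error that SPSS reports:

```python
import math
from statistics import mean, stdev

def kurtosis_ratio(data):
    """Return (excess kurtosis, standard error, ratio).
    A ratio outside -2..+2 suggests non-normal tails."""
    n = len(data)
    m, s = mean(data), stdev(data)
    s4 = sum(((x - m) / s) ** 4 for x in data)
    # Sample excess kurtosis with small-sample correction (SPSS-style)
    g2 = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))) * s4 \
         - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    se_skew = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    se = 2 * se_skew * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))
    return g2, se, g2 / se
```

A flat, box-like sample (e.g. evenly spaced values) gives a negative excess kurtosis, matching the "shorter tails" description on the next slide.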
Kurtosis
A negative value of kurtosis indicates shorter tails (a box-like distribution).
[Boxplot of CHILD REARING PRACTICES scores (N = 41), slightly positively skewed]
Reading the boxplot:
Median
75th percentile (top of box)
25th percentile (bottom of box)
Largest observed value that isn't an outlier (end of upper whisker)
Smallest observed value that isn't an outlier (end of lower whisker)
Outliers: values more than 1.5 box-lengths from the 75th percentile
Extremes: values more than 3 box-lengths from the 75th percentile
Fig.1. Boxplot comparisons of the creativity scores of
Malaysian and American students
Elaboration > Fluency > Flexibility > Originality
Descriptive Statistics
Example: Boxplots for more than one variable / time series
http://upload.wikimedia.org/wikipedia/commons/f/fa/Michelsonmorley-boxplot.svg
Stem-and-Leaf Plot
[Stem-and-leaf plot of CHILD REARING PRACTICES scores: frequency, stem and leaf columns]
Testing Normality of Data Collected
All data must be tested for normality before analyzing them statistically.
Normality - if the data sample the population representatively, they will be normally distributed, with the mean and median approximately equal.
The type of analysis depends on the normality of the data and the level of measurement of the data:
- Normally distributed data: use parametric tests like t-tests, ANOVA, Pearson r.
- Non-normally distributed data: use non-parametric tests like Chi-square, Spearman's rho, Mann-Whitney, Wilcoxon.
To show Normality of Data
Not significant at p < .01: the data are normally distributed.
Data Editor - Analyze - Descriptive Statistics - Explore
Boxplot for Male and Female Parents
[Boxplot of CRA scores by SEX (13 male, 22 female); one distribution slightly negatively skewed, the other slightly positively skewed]
[Normal Q-Q plots and detrended normal Q-Q plots of CRA scores, drawn separately for SEX = MALE and SEX = FEMALE]
Exercise
Open the data file 'PRACTICE' and check the normality of the 'Age' data of the respondents using
a) Histogram
b) Boxplot
c) Stem-and-leaf
d) M-Estimators
e) Kolmogorov-Smirnov & Shapiro-Wilk
f) Normal Q-Q Plot
g) Detrended Normal Q-Q Plot
Testing Equality of Variance
Levene's Test (SPSS: Data Editor - Analyze - Explore - Plots - Levene)
If the Levene statistic is highly significant (p < .001), the groups do not have equal variance.
If the Levene statistic is not significant (p > .001), the groups have equality of variance and t-test analyses can be undertaken.
[SPSS output: Levene's test not significant for the comparison of mothers and fathers]
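Levene's statistic itself is built from the absolute deviations of each score from its group mean. A minimal sketch (function name and group data are hypothetical); the resulting W is compared against an F(k - 1, N - k) critical value:

```python
from statistics import mean

def levene_statistic(*groups):
    """Levene's W using absolute deviations from each group mean."""
    k = len(groups)
    # Transform each score to its absolute deviation from its group mean
    z = [[abs(x - mean(g)) for x in g] for g in groups]
    n = [len(g) for g in groups]
    N = sum(n)
    zbar_i = [mean(zi) for zi in z]                 # per-group mean deviation
    zbar = sum(sum(zi) for zi in z) / N             # overall mean deviation
    num = sum(ni * (zb - zbar) ** 2 for ni, zb in zip(n, zbar_i))
    den = sum((x - zb) ** 2 for zi, zb in zip(z, zbar_i) for x in zi)
    return ((N - k) / (k - 1)) * num / den
```

Two groups with identical spread give W = 0 (clearly not significant); a large spread difference inflates W.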
Compute Data
Please try exercise 3.
SPSS Data Editor - Transform - Compute
RECODE
SPSS Data Editor - Transform - Recode - into different variable / into same variable
Recode (contd)
Please try exercise 4
Select Cases
SPSS Data Editor - Data - Select Cases
To Analyze & Report Demographic Data
To Analyze & Report Demographic Data
ANALYZE - DESCRIPTIVE STATISTICS - EXPLORE
Source: Palaniappan, A. K. (2009). Penyelidikan Pendidikan dan SPSS .
Kuala Lumpur, Malaysia: Pearson.
Source: American Psychological Association. (2010). Publication Manual of the American Psychological Association (6th ed.). Washington, DC: Author.
Source:
American Psychological Association. (2010). Publication Manual of the American Psychological Association (6th ed.). Washington, DC: Author.
Sample APA Reporting of Demographic
Sample APA Reporting of Demographic
Information for 4 Subsamples
Parametric Statistical Analyses
Parametric Statistical Analyses
(Degree of Association/ Relationship)
SPSS Data Editor - Statistics - Correlate - Bivariate -
Parametric Statistical Analyses
Parametric Statistical Analyses
(Degree of Association / Relationship)
Pearson Product-Moment Correlation
Presenting a Correlation Table
Table 1
Pearson Product-Moment Correlations between SAM, WKOPAY and CRA Scores

          CRA    SAM    WKOPAY
SAM       .20    1.00   .38*
WKOPAY    .29    .38*   1.00

N of Cases: 165   1-tailed Signif: * - .01  ** - .001
Effect size for correlation
Reporting Product-Moment Correlations
Table 1 presents the inter-correlations among Creative Child Rearing Practices (CRA), Something About Myself (SAM) and What Kind of Person Are You? (WKOPAY) scores. The correlation coefficient between CRA and SAM scores is .20, which is not significant at p < .05, with a small effect size. This indicates that parents who perceive themselves as creative based on their past creative performances do not engage in creative child rearing practices.
The correlation coefficient between CRA and WKOPAY scores is also not significant (r = .29, p > .05), with a small effect size. This indicates that parents who perceive themselves as creative based on their personality characteristics also do not engage in creative child rearing practices.
Report
There is a significant correlation between SAM and WKOPAY (r = .375, p < .05), with a small effect size. The correlation is positive, indicating that higher SAM scores are associated with higher WKOPAY scores. Results also show that 14% (r squared) of the variance of SAM scores is explained by WKOPAY scores; about 86% of the variance in SAM is unaccounted for.
Sample of Correlation Report
Creed, P. A. & Lehmann, K. (2009). The relationship between core self-evaluations, employment commitment and
well-being in the unemployed. Personality and Individual Differences, 47 , 310 – 315.
Sample of Correlation Table
Creed, P. A. & Lehmann, K. (2009). The relationship between core self-evaluations, employment commitment and
well-being in the unemployed. Personality and Individual Differences, 47 , 310 – 315.
Source:
American Psychological Association. (2010). Publication Manual of the American Psychological Association (6th ed.). Washington, DC: Author.
[Table annotations: Means and SDs for the Upper Group; Means and SDs for the Lower Group]
An example of a Scatter Plot (Palaniappan, 2007)
[Scatter plot: Individualism (x-axis, 20-100) against Overall Expressivity Endorsement (y-axis, .39-.51), with data points labelled by country, from Australia to Zimbabwe]
Graphical Representation of the Relationship Between Individualism and Overall Expressivity Endorsement
t - tests
Paired t-tests
Grouped t-tests
Assumptions of t-tests
1) Data must be interval or ratio
2) Data must be obtained via random sampling from the population
3) Data must be normally distributed
Parametric Statistical Analyses
(Comparisons - t-tests)
SPSS Data Editor - Compare Means - Independent-Samples t-test
Parametric Statistical Analyses
Parametric Statistical Analyses
( comparisons - t -tests )
Presentation of t-test results
Table 2
t-test comparisons of CRA scores by gender

            Father (n = 13)   Mother (n = 12)
Mean            15.08             14.36
SD               4.05              3.63

t-value: .54 (NS at p < .05)   Effect Size: .18
Effect Size

Effect Size (Cohen's d) = (X̄1 - X̄2) / √[(s1² + s2²) / 2]

Example:
X̄1 = 15.08, s1 = 4.05
X̄2 = 14.36, s2 = 3.63

Effect Size = (15.08 - 14.36) / √[(4.05² + 3.63²) / 2]
            = 0.72 / 3.84
            = .1875

Result: Effect Size (Cohen's d) = .1875 (small effect size)
Note: effect size ~ .5 (medium); ~ .8 (high)
Effect size (Cohen's d), Eta Squared and Interpretation

Effect Size (Cohen's d) | Eta Squared, η² | Interpretation
0.2                     | .01             | small
0.5                     | .06             | medium
0.8                     | .14             | large
Report
The mean CRA scores of fathers and mothers are 15.08 and 14.36, and the standard deviations are 4.05 and 3.63 respectively. These scores are subjected to t-test analysis. Levene's Test for equality of variance indicates that the variances are similar. The t-value obtained is .54, which is not significant at p < .05. The effect size is .18.
These results indicate that fathers and mothers do not differ in their child rearing practices. The effect size indicates that parents' gender has only a small effect on their creative child-rearing practices.
Palaniappan, A K. (2000). Sex differences in Creative Perceptions of Malaysian
Students, Perceptual and Motor Skills, 91, 970 - 972.
See handout for a clearer page (article page # 971)
Paired t-test
Assumptions
1) Normality of the population difference scores - ascertained by ensuring the normality of each variable separately.
2) The other assumptions are similar to the grouped t-test:
a) Data must be interval or ratio
b) Data must be obtained via random sampling from the population
c) Data must be normally distributed
Exercise
1) Is there a significant difference in the highest year of education between the respondent's mother and father?
2) Is there a significant difference in the highest year of education of the respondent and his/her spouse?
Parametric Statistical Analyses
(Comparisons - One-way ANOVA)
SPSS Data Editor - Compare Means - One-way ANOVA
Parametric Statistical Analyses
( comparisons - Oneway ANOVA )
Understanding the ANOVA table
Understanding the ANOVA table

        Variation among the sample means
F = -------------------------------------------
        Variance within the samples

        Between-groups sum of squares / df1       Between mean square
F = ------------------------------------------- = ----------------------
        Within-groups sum of squares / df2        Within mean square

The between-groups sum of squares is computed by subtracting the mean of all observations (the overall mean) from the mean of each group, squaring each difference, multiplying each square by the number of cases in its group, and adding the results for all groups.
The within-groups sum of squares is computed by multiplying each group variance by the number of cases in the group minus 1 and adding the results for all groups.
The Mean Square column reports each sum of squares divided by its respective degrees of freedom.
The F ratio is the ratio of the two mean squares.
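The procedure just described translates directly into code. This sketch (function name and sample groups are mine) computes the between- and within-groups sums of squares exactly as the slide describes and returns their mean-square ratio:

```python
from statistics import mean

def one_way_anova_f(*groups):
    """F ratio: between mean square over within mean square."""
    k = len(groups)
    all_scores = [x for g in groups for x in g]
    grand = mean(all_scores)  # the overall mean
    # Between-groups SS: squared group-mean deviations, weighted by group size
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    # Within-groups SS: squared deviations of each score from its group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df1, df2 = k - 1, len(all_scores) - k
    return (ss_between / df1) / (ss_within / df2)
```

Groups with identical means give F = 0; well-separated groups give a large F.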
Presentation of One-way ANOVA results
Table 3
One-way ANOVA for CRA scores by WKOPAY groups

Source           df    Sum of Squares    Mean Square    F Ratio    F Probability
Between Groups    2        31.145          15.573         .632         .537
Within Groups    38       936.660          24.649
Total            40       967.805

Multiple Range Test: Scheffe Procedure
No groups are significantly different at the .05 level
Interpreting F
If the F value is significant, then the groups are significantly different.
To ascertain which groups are significantly different, perform the Scheffe test.
F(Groups - 1, No. of Participants - Groups) = F value
Report
Report
Results show that the three groups do not differ significantly on CRA scores (F(2, 38) = .632, p > .05). This represents an effect size of 3.22% [(31.145 / 967.805) x 100], which indicates that only 3.22% of the variance of CRA scores was accounted for by the 3 groups. (Do the same for SAM.)
Effect Size

                  Sum of Squares between Groups
Effect Size = ------------------------------------- x 100
                  Total Sum of Squares

It is the degree to which the phenomenon exists (Cohen, 1988).
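Plugging Table 3's sums of squares into this formula reproduces the 3.22% reported for the WKOPAY groups:

```python
# Sums of squares from Table 3 (one-way ANOVA for CRA scores by WKOPAY groups)
ss_between = 31.145
ss_total = 967.805

effect_size = ss_between / ss_total * 100  # percent of CRA variance explained
```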
Bonferroni Correction for Multiple Comparisons
For multiple comparisons, Bonferroni corrections must be made.
If the overall level of significance is set at p < .05 and the number of comparisons involved is 10, then the level of significance for each comparison must be .05/10, which is .005.
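The corrected per-comparison alpha is a one-line computation (function name mine):

```python
def bonferroni_alpha(overall_alpha, n_comparisons):
    """Per-comparison significance level after a Bonferroni correction."""
    return overall_alpha / n_comparisons

per_test = bonferroni_alpha(0.05, 10)  # the slide's example: .05 / 10 = .005
```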
Table for Post-hoc Comparisons
Power of a test
The power of a statistical test is the probability of observing a treatment effect when it occurs.
It is the probability that the test will correctly lead to the rejection of a false null hypothesis (Green, 2000).
Statistical power is the ability of the test to detect an effect if it actually exists (High, 2000).
Statistical power is denoted by 1 - β, where β is the Type II error, the probability of failing to reject the null hypothesis when it is false.
Conventionally, a test with power greater than .8 (or β < .2) is considered statistically powerful.
α is the probability of rejecting a true null hypothesis (Type I error).
β is the probability of not rejecting a false null hypothesis (Type II error).
There are four components that
influence the power of a test:
1) Sample size, or the number of units (e.g., people) accessible to the study
2) Effect size, the difference between the means divided by the standard deviation (i.e. 'sensitivity')
3) Alpha level (significance level), or the probability that the observed result is due to chance
4) Power, or the probability that you will observe a treatment effect when it occurs
Usually, experimenters can only change the sample size of the study and/or the alpha value.
Other ways to calculate Sample size and
Confidence Interval
To calculate Sample Size or
Power
http://www.stat.ubc.ca/~rollin/stats/ssize/n2.html
http://www.downloadforge.com/Windows/Mathematics/Download/GPower-319.html
Sample size and Effect size Table
ANOVA (1-way)
To compare 3 groups or more on a dependent variable.
Same assumptions as t-tests apply.
Analyze - Compare Means - One-way ANOVA
Do Exercise 10A, page 11.
Sample APA Table for One-way ANOVA
Source: Palaniappan, A. K. (2009). Penyelidikan Pendidikan dan SPSS.
Kuala Lumpur, Malaysia: Pearson.
2-way ANOVA, 3-way ANOVA
Do exercise on p.11
ANCOVA
Try exercise on ANCOVA on page 10.
Presentation of Three-way ANOVA results
Table 4
Analysis of Variance using CRA scores as the dependent variable

Source of Variation           Sum of Squares   DF   Mean Square    F      Signif. of F
Main Effects                      14.916        3      4.972      .318       .812
  Sex                               .192        1       .192      .012       .913
  SAM grps                        12.994        1     12.994      .830       .370
  WK grps                          3.346        1      3.346      .214       .648
2-way Interactions                32.025        3     10.675      .682       .571
  Sex x SAM grps                   8.403        1      8.403      .537       .470
  Sex x WK grps                   15.077        1     15.077      .963       .335
  SAM grps x WK grps              13.149        1     13.149      .840       .367
3-way Interactions
  Sex x SAM grps x WK grps         2.472        1      2.472      .158       .894
Model                             55.588        7      7.941      .507       .821
Residual                         422.583       27     15.651
Total                            478.171       34     14.064
Reporting ANOVA - Simple Factorial
As shown in Table 4, there is no significant difference between fathers and mothers with respect to Child Rearing Practices (F = .01, p > .05). The results also show that SAM groups (F = .83, p > .05) and WK groups (F = .21, p > .05) also do not have significant effects on CRA scores. There are also no significant two-way or three-way interactions between sex, WK groups and SAM groups.
These results indicate that male parents do not differ from female parents in their child rearing practices. Their creative perceptions also do not affect their child rearing practices.
Sample Report of an Experimental Research
Dalton, J. J. & Glenwick, J. S. (2009). Effects of Expressive Writing on Standardized Graduate Entrance Exam Performance and Physical Health Functioning. The Journal of Psychology, 143(3), 279 - 292.
Part II
Factor Analysis
Reliability – Item Analysis
Multiple Regression
One-way Repeated Measures ANOVA
Multivariate ANOVA (MANOVA)
Discriminant Analysis
Testing for Moderating Effects of a Variable
FACTOR ANALYSIS
Factor analysis is undertaken to ascertain how many factors are measured by the items you have constructed. This is sometimes called Data Reduction.
To do this, you need to enter the data item by item in your datafile. Using factor analysis you will be able to tell which items are strongly correlated and lump together to form a factor. By looking at these items you will be able to give a collective name to represent these items, or factor.
SPSS will be able to tell how many factors there are and how many items fall in each factor.
FACTOR ANALYSIS
Data are entered item by item in the datafile.
Factor analysis will tell you which items are strongly correlated and lump together to form a factor. By looking at these items you will be able to give a collective name to represent these items, or factor.
SPSS will indicate how many factors there are and how many items fall in each factor.
Assumptions for Factor Analysis
There must be at least [X variables (items) x 5] respondents, or more than 200 respondents, to run Factor Analysis reliably.
There must be a linear relationship between the variables or items.
There should not be any outliers for each variable.
The correlations among the items must be more than .3 in order to be factorizable.
To be factorizable, the Bartlett's test of sphericity must be significant and large.
To be factorizable, the Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy must be more than .6.
To ensure sampling adequacy, the anti-image correlation matrix is used. Variables with sampling adequacy below .5 (see the diagonal of the anti-image correlation matrix) should be excluded from Factor Analysis.
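The respondent-count rule above is easy to check mechanically. A minimal sketch (the function name is ours, not an SPSS feature):

```python
# Sketch of the sample-size rule above: at least 5 respondents per item,
# or more than 200 respondents overall. Function name is illustrative.
def enough_respondents(n_items, n_respondents):
    return n_respondents >= 5 * n_items or n_respondents > 200

print(enough_respondents(20, 120))  # True: 20 items x 5 = 100 <= 120
print(enough_respondents(40, 150))  # False: would need 200 (or > 200 overall)
```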
FACTOR ANALYSIS
Exercise 19
Using the datafile "Datafile for Item Analysis and Factor Analysis", run a factor analysis of all 20 items and determine how many factors there are. By looking at the items that fall within each factor, can you give a common name to represent all the items in each factor?
Factor Analysis Output
KMO and Bartlett's Test

Kaiser-Meyer-Olkin Measure of Sampling Adequacy      .466
Bartlett's Test of Sphericity   Approx. Chi-Square   7478.285
                                df                   3741
                                Sig.                 .000
The Kaiser-Meyer-Olkin Measure of Sampling Adequacy is less than .6 (it should be more than .6, the higher the better), so the variables are only marginally factorizable.
The Bartlett's Test of Sphericity is significant, p < .05. This indicates that the variables are related and therefore factorizable.
ITEM ANALYSIS
Item analysis is undertaken to ascertain to whatextend the items measuring a certain construct arecorrelated. Items that are closely correlatedindicate high internal consistency or reliability ofthe test. The measure of internal consistency orreliability is given by Cronbach Alpha.
If the items are ordinal (eg likert scale), SPSS willgive the Cronbach Alpha. But if the items are
dichotomous, you will need to use Kuder-Richardson 20 which also obtained by requestingCronbach Alpha.
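The Cronbach Alpha that SPSS reports follows the standard formula α = k/(k − 1) × (1 − Σ item variances / variance of the totals). A minimal plain-Python sketch with invented scores (not the workshop datafile):

```python
# Hedged sketch of the Cronbach Alpha formula; the data are invented.
def cronbach_alpha(items):
    """items: one list of scores per item (columns), all of equal length."""
    k = len(items)
    n = len(items[0])
    def var(xs):  # population variance; only the ratio of variances matters
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)
    totals = [sum(col[i] for col in items) for i in range(n)]
    return (k / (k - 1)) * (1 - sum(var(col) for col in items) / var(totals))

# Three perfectly consistent items -> alpha = 1.0
print(round(cronbach_alpha([[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]), 3))  # 1.0
```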
Item Analysis
From the desktop icon called SPSS WORKSHOP, use the data file called "Datafile for Item Analysis and Factor Analysis", run the Item Analysis and ascertain the best Cronbach Alpha.
Exercise 19
Sample Factor Analysis table
Multiple Regression
Bivariate Multiple Regression
Aca Ach = Constant + b Motivation
Multivariate Multiple Regression
Aca Ach = Constant + b1 Motivation + b2 Creativity + b3 Self-confidence
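For the bivariate case, the Constant and b are the ordinary least-squares intercept and slope. A hedged sketch with invented Motivation / Aca Ach scores:

```python
# Least-squares fit for Aca Ach = Constant + b * Motivation (invented data).
def fit_bivariate(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    a = my - b * mx  # the Constant (intercept)
    return a, b

motivation = [2.0, 3.0, 4.0, 5.0]
aca_ach = [55.0, 60.0, 65.0, 70.0]  # exactly 45 + 5 * Motivation
print(fit_bivariate(motivation, aca_ach))  # (45.0, 5.0)
```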
Multiple Regression – Assumptions
1) Ratio of cases to independent variables:
20 times more cases than predictors
2) Variables must be normally distributed – check graphically or statistically (e.g. box-plot, histogram, skewness and kurtosis, Kolmogorov-Smirnov or Shapiro-Wilk).
3) IVs must be linearly related to the DV (use a scatter-plot for bivariate regression). For multivariate regression, use the residual scatter plot between standardized residuals (Y-axis) and standardized predicted values (X-axis) – if linearly related, points in the scatter plot are evenly distributed on both sides of the 0 value of the standardized predicted value (X-axis).
4) No multicollinearity – IVs must not be significantly correlated (use the Pearson correlation matrix to check / Tolerance = 1 – R² (must be more than .1) / VIF (Variance Inflation Factor) = 1/Tolerance (must be less than 10). [R is the correlation coefficient between the 2 IVs or predictors, which should not be more than .7. If more than .7, omit one of the IVs or combine the IVs.]
5) No multivariate outliers – use Mahalanobis Distance to ascertain this. Use the chi-square value at p < .001 and df (= no. of IVs) from the chi-square table to determine which data point is an outlier in the MAHAL column produced in the datafile.
Residuals are the differences between the predicted DV calculated from the predictors and the obtained DV from the study.
Normality: these residuals must be normally distributed about the predicted DV scores.
Linearity: these residuals should have a straight-line relationship with the predicted DV scores.
Homoscedasticity: the variance of the residuals about predicted DV scores should be the same for all predicted scores.
Normality, linearity and homoscedasticity can be checked using the residuals scatterplots generated by SPSS.
[Scatterplot – Dependent Variable: Highest Year of School Completed; X-axis: Regression Standardized Predicted Value (-4 to 3); Y-axis: Regression Standardized Residual (-4 to 4)]
Example of Scatterplot between Std Residual and Std Predicted Value
Collinearity Statistics – Tolerance
Tolerance is the statistic used to determine how much the independent variables are linearly related to one another (multicollinear).
Tolerance is the proportion of a variable's variance not accounted for by other independent variables in the model and is given by 1 – R², where R is the correlation coefficient between the 2 IVs or predictors.
The tolerance level must be more than .1.
Collinearity Statistics - VIF
VIF – Variance Inflation Factor
- is the reciprocal of the Tolerance
VIF should be less than 10
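Both statistics follow mechanically from the formulas above (Tolerance = 1 − R², VIF = 1/Tolerance). The sketch below plugs in the paeduc–maeduc correlation of .672 that appears in the output later in this section; SPSS actually computes R² by regressing each IV on all the other IVs, so its table values (.548, 1.826) can differ slightly:

```python
# Sketch: tolerance and VIF from the correlation r between two predictors.
def collinearity_stats(r):
    tolerance = 1 - r ** 2  # proportion of variance not shared
    vif = 1 / tolerance     # reciprocal of tolerance
    return tolerance, vif

tol, vif = collinearity_stats(0.672)
print(round(tol, 3), round(vif, 3))  # 0.548 1.823
```

With tol > .1 and vif < 10, these two predictors pass the rule of thumb even though they are substantially correlated.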
Durbin-Watson
Gives a measure of autocorrelation in the residuals (or errors) in the values or observations in the multiple regression analyses.
If the Durbin-Watson value is between 1.5 and 2.5, then the observations or values are independent: there is no systematic trend in the errors of the observations (there should not be a systematic trend in the errors).
Multivariate Outlier – an example
It is usual to find a person who is 15 years old, and this will not be an outlier when you plot a histogram for age (univariate).
It is also common to find a person earning a salary of RM10,000 a month, and this person may not be an outlier when you plot a histogram for salary (univariate).
However, if you combine both age and salary (multivariate), a person who is 15 years old earning RM10,000 may become an outlier, called a multivariate outlier.
You need to get rid of multivariate outliers using Mahalanobis Distance before you run your multiple regression.
What havoc a multivariate outlier can do to your results?
It can change your R from .08 to .88!
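A two-variable sketch of the age/salary example: the distance is measured through the inverse of the 2×2 covariance matrix, so a case that is ordinary on each axis separately can still sit far from the joint cloud. All numbers below are invented for illustration.

```python
import math

# Mahalanobis distance of a point from the centroid of 2-D data (sketch).
def mahalanobis_2d(point, data):
    xs, ys = [p[0] for p in data], [p[1] for p in data]
    n = len(data)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    det = sxx * syy - sxy ** 2  # determinant of the covariance matrix
    dx, dy = point[0] - mx, point[1] - my
    d2 = (syy * dx ** 2 - 2 * sxy * dx * dy + sxx * dy ** 2) / det
    return math.sqrt(d2)

data = [(25, 2500), (35, 5500), (45, 7000), (55, 10000)]  # (age, salary in RM)
d_out = mahalanobis_2d((15, 10000), data)  # 15-year-old earning RM10,000
d_typ = mahalanobis_2d((45, 7000), data)   # an ordinary case
print(round(d_out, 2), round(d_typ, 2))  # 25.25 1.22
```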
Methods for Selecting Variables
Forward Selection – starting from the constant term, a variable is added to the equation or regression model if it results in the largest significant (at p < .05, for example) increase in multiple R².
Backward Selection – all variables are put into the equation or regression model. At each step, a variable is removed if its removal results in only a small, insignificant change in R².
Stepwise Variable Selection – the most commonly used method for model building. It is a combination of Forward Selection and Backward Selection. Variables already in the model can be removed if they are no longer significant predictors when new variables are added to the regression model.
Types of Regression Analyses
Standard Multiple Regression
Sequential / Hierarchical Multiple Regression
Statistical / Stepwise Multiple Regression
Coding for Dummy Variables
Example:
Gender – dichotomous
Male – 1
Female - 2
Need to convert to dummy variable
Male - 1
Female - 0
To study the effect of gender on the DV:
if r = sig +, male has a higher significant effect on the DV;
if r = sig –, female has a higher significant effect on the DV.
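The recode itself is a one-liner; a hedged sketch mirroring the Male → 1, Female → 0 scheme above:

```python
# Recode gender codes (1 = male, 2 = female) into a 0/1 dummy variable.
def to_dummy(gender_codes):
    return [1 if g == 1 else 0 for g in gender_codes]

print(to_dummy([1, 2, 2, 1, 2]))  # [1, 0, 0, 1, 0]
```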
Using PRACTICE data file
Research Questions:
1) To what extent do PAEDU and MAEDU predict EDUC?
2) To what extent do PAEDU, MAEDU and SEX predict EDUC?
3) To what extent do PAEDU, MAEDU, SIBS and SEX predict EDUC?
Results of Mul Reg for Research Question 2
Descriptive Statistics

           Mean   Std. Deviation  N
educ       13.54  2.797           973
paeduc     11.01  4.117           973
maeduc     11.02  3.409           973
sexdummy   .4245  .49452          973

Correlations

                         educ   paeduc  maeduc  sexdummy
Pearson       educ       1.000  .450    .429    .112
Correlation   paeduc     .450   1.000   .672    .102
              maeduc     .429   .672    1.000   .065
              sexdummy   .112   .102    .065    1.000
Sig.          educ       .      .000    .000    .000
(1-tailed)    paeduc     .000   .       .000    .001
              maeduc     .000   .000    .       .021
              sexdummy   .000   .001    .021    .

N = 973 for all variables.
Results of Mul Reg for Research Question 2 (contd)
Model Summary(d)
Model  R        R Square  Adjusted R Square  Std. Error of the Estimate  R Sq Change  F Change  df1  df2  Sig. F Change  Durbin-Watson
1      .450(a)  .203      .202               2.499                       .203         246.937   1    971  .000
2      .481(b)  .232      .230               2.454                       .029         36.704    1    970  .000
3      .486(c)  .236      .234               2.448                       .004         5.670     1    969  .017           1.738

a. Predictors: (Constant), paeduc
b. Predictors: (Constant), paeduc, maeduc
c. Predictors: (Constant), paeduc, maeduc, sexdummy
d. Dependent Variable: educ
ANOVA(d)

Model           Sum of Squares  df   Mean Square  F        Sig.
1  Regression   1541.572        1    1541.572     246.937  .000(a)
   Residual     6061.733        971  6.243
   Total        7603.305        972
2  Regression   1762.582        2    881.291      146.361  .000(b)
   Residual     5840.724        970  6.021
   Total        7603.305        972
3  Regression   1796.560        3    598.853      99.934   .000(c)
   Residual     5806.745        969  5.993
   Total        7603.305        972

a. Predictors: (Constant), paeduc
b. Predictors: (Constant), paeduc, maeduc
c. Predictors: (Constant), paeduc, maeduc, sexdummy
d. Dependent Variable: educ
Multiple Regression Results
Coefficients(a)

Model            B       Std. Error  Beta  t       Sig.  95% CI Lower  95% CI Upper  Tolerance  VIF
1  (Constant)    10.178  .229              44.499  .000  9.729         10.627
   paeduc        .306    .019        .450  15.714  .000  .268          .344          1.000      1.000
2  (Constant)    9.254   .272              34.077  .000  8.721         9.787
   paeduc        .201    .026        .295  7.768   .000  .150          .251          .548       1.826
   maeduc        .189    .031        .230  6.058   .000  .128          .250          .548       1.826
3  (Constant)    9.142   .275              33.250  .000  8.602         9.681
   paeduc        .196    .026        .288  7.574   .000  .145          .246          .544       1.837
   maeduc        .189    .031        .231  6.085   .000  .128          .250          .548       1.826
   sexdummy      .380    .160        .067  2.381   .017  .067          .693          .990       1.011

a. Dependent Variable: educ
Reporting Results of Mul Reg for Research Question 2
Table XX
Standard Multiple Regression of PAEDUC, MAEDUC and SEXDUMMY on EDUC

Variables  EDUC   PAEDUC  MAEDUC  B    β    t     p < .05
PAEDUC     .45                    .20  .29  7.57  Sig
MAEDUC     .43    .67             .19  .23  6.09  Sig
SEXDUMMY   .11    .10     .07     .38  .07  2.38  Sig
Intercept = 9.14
Means      13.54  11.01   11.02   R = .49, R² = .24
SD         2.80   4.12    3.41    Adjusted R² = .23
Reporting Multiple Regression Results
A standard multiple regression was performed with respondents' level of education (EDUC) as the dependent variable and fathers' level of education (PAEDUC), mothers' level of education (MAEDUC) and respondents' gender (SEXDUMMY) as predictors. The assumptions were evaluated using SPSS EXPLORE.
Table XX displays the correlations between the variables, the unstandardized regression coefficients (B) and intercept, the standardized regression coefficients (β), R² and adjusted R².
R for regression was significant, F(3, 969) = 99.93, p < .05, with R² = .24.
The adjusted R² of .23 indicates that more than one-fifth of the variability of EDUC is predicted by the three predictors.
The regression equation is:
EDUC = 9.14 + .20 (PAEDUC) + .19 (MAEDUC) + .38 (SEXDUMMY)
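Applying the reported equation to a hypothetical respondent (the input values below are invented; the coefficients are the rounded B values from Table XX):

```python
# Prediction from EDUC = 9.14 + .20*PAEDUC + .19*MAEDUC + .38*SEXDUMMY.
def predict_educ(paeduc, maeduc, sexdummy):
    return 9.14 + 0.20 * paeduc + 0.19 * maeduc + 0.38 * sexdummy

# Hypothetical male respondent whose parents each completed 12 years of school:
print(round(predict_educ(12, 12, 1), 2))  # 14.2
```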
Multiple Regression
Try exercise on Linear Regression and
Multiple Regression on page 26.
Hierarchical Multiple Regression
Is used when there is a need to control for certain variables.
For example, if we wish to study how PAEDUC and MAEDUC predict EDUC while controlling for the age of the respondent and the number of siblings (SIBS):
We enter Age and SIBS in the first batch of variables and then enter PAEDUC and MAEDUC in the second batch as predictors of EDUC.
Coefficients(a)

Model                                     B       Std. Error  Beta   t       Sig.  Zero-order  Partial  Part   Tolerance  VIF
1  (Constant)                             15.528  .263               59.086  .000
   Age of Respondent                      -.038   .005        -.226  -7.463  .000  -.254       -.233    -.225  .986       1.014
   Number of Brothers and Sisters         -.233   .030        -.238  -7.842  .000  -.264       -.244    -.236  .986       1.014
2  (Constant)                             9.855   .512               19.230  .000
   Age of Respondent                      -.007   .005        -.044  -1.391  .165  -.254       -.045    -.039  .786       1.272
   Number of Brothers and Sisters         -.126   .029        -.128  -4.387  .000  -.264       -.140    -.122  .900       1.111
   Highest Year School Completed, Father  .219    .028        .303   7.825   .000  .463        .244     .217   .516       1.938
   Highest Year School Completed, Mother  .137    .033        .159   4.098   .000  .419        .131     .114   .513       1.948

a. Dependent Variable: Highest Year of School Completed
Model Summary(c)

Model  R        R Square  Adjusted R Square  Std. Error of the Estimate  R Sq Change  F Change  df1  df2  Sig. F Change
1      .347(a)  .120      .118               2.802                       .120         66.311    2    971  .000
2      .502(b)  .252      .249               2.586                       .132         85.238    2    969  .000

a. Predictors: (Constant), Number of Brothers and Sisters, Age of Respondent
b. Predictors: (Constant), Number of Brothers and Sisters, Age of Respondent, Highest Year School Completed, Father, Highest Year School Completed, Mother
c. Dependent Variable: Highest Year of School Completed
APA Report:
Hierarchical Multiple Regression was used to assess the ability of PAEDUC and MAEDUC to predict EDUC while controlling for Age and SIBS.
Age and SIBS were entered at Step 1 (Model 1), explaining 12% of the variance in EDUC. On entering PAEDUC and MAEDUC at Step 2 (Model 2), the total variance explained was 25.2%, F(4, 969) = 81.53, p < .001.
PAEDUC and MAEDUC explained 13.2% of the variance in EDUC after controlling for Age and SIBS, R squared change = .13, F change (2, 969) = 85.24.
In the final model, only SIBS, PAEDUC and MAEDUC were statistically significant, with PAEDUC having a higher sig effect on EDUC than MAEDUC or SIBS.
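The R squared change reported at Step 2 is simply the difference between the two models' R² values in the Model Summary:

```python
# Arithmetic behind the hierarchical-regression report (values from the output).
r2_step1 = 0.120  # Age + SIBS only
r2_step2 = 0.252  # after adding PAEDUC and MAEDUC
r2_change = r2_step2 - r2_step1
print(round(r2_change, 3))  # 0.132
```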
Exercise
1) Are PAEDUC and MAEDUC significant predictors of SIBS if we control for Age and EDUC? Report your findings in the APA format.
Binary Logistic Regression
Used when you want to predict a binary
criterion (dependent) variable.
E.g. of binary dependent variable:
0 – No diabetes, 1 – Has diabetes
0 – No default, 1 – Defaults
0 – Does not graduate, 1 – Graduates
Assumption of binary logistic regression
Dependent variable must be binary (1 for the desired outcome and 0 for the other outcome) for binary logistic regression.
Dependent variable must be ordinal for Ordinal or Multinomial logistic regression.
Does not need to make many of the assumptions of linear regression, e.g. does not need to satisfy conditions of linearity, normality, homoscedasticity and measurement level.
Does not need a linear relationship between the dependent and independent variables.
Can handle all types of relationships because it uses a non-linear log transformation to predict the odds ratio.
E.g. of Research Question: Do EDUC, PAEDUC and MAEDUC predict HAPPYrec (1 = happy, 0 = not happy)?
Recode HAPPY to HAPPYrec (HAPPY 1 and 2 recode to 1, and HAPPY 3 recode to 0).
In SPSS: Analyze > Regression > Binary Logistic.
Enter HAPPYrec into the Dependent box.
Enter EDUC, PAEDUC and MAEDUC into the Covariates box.
Click Save – check Probabilities and Group membership (in the datafile, the respondents will be classified into groups).
Click Options – select Hosmer-Lemeshow goodness-of-fit (to test to what extent the model fits the data) and Iteration History.
In the Output: A) Step 1 is like the test of the null hypothesis when there are no predictors in the equation. The prediction is 90.7% accurate. The predictors are all not sig.
B) Step 2: when the predictors are entered, all 3 predictors are not sig. and are not included in the model. The percentage accuracy is still 90.7%.
One-way Repeated Measures ANOVA
This analysis is used to compare one sample on three or more variables.
Click Analyze > General Linear Model > Repeated Measures.
You will get the Repeated Measures Define Factors dialogue box.
Example of Research Question: Are there significant differences in Health1, Health2 and Health3?
One-way Repeated Measures ANOVA
In the Within-Subject Factor Name box, type Health, which is (assuming) measured at 3 different times. In the Number of Levels box, type 3.
Click Add.
Click Define, and in the Repeated Measures dialogue box click the 3 variables: Health1, Health2 and Health3.
If you want to compare this between male and female, click on the between-subjects variable – in this case Sex – and move it to the Between-Subjects Factors box.
Click on Options, then under Display click Descriptive statistics, Estimates of effect size, Homogeneity tests and Observed power, then Continue.
Click on Plots, then click on the within-group variable (in this case Health) and move it to the box labeled Horizontal Axis.
In the Separate Lines box, click on the grouping variable (i.e. Race).
Click Add, click Continue and OK.
Test Equality of Variance or Sphericity

Mauchly's Test of Sphericity(b) (Measure: MEASURE_1)
Within Subjects Effect: health
Mauchly's W = .666, Approx. Chi-Square = 408.769, df = 2, Sig. = .000
Epsilon(a): Greenhouse-Geisser = .750, Huynh-Feldt = .751, Lower-bound = .500

Tests the null hypothesis that the error covariance matrix of the orthonormalized transformed dependent variables is proportional to an identity matrix.
a. May be used to adjust the degrees of freedom for the averaged tests of significance. Corrected tests are displayed in the Tests of Within-Subjects Effects table.
b. Design: Intercept; Within Subjects Design: health

Mauchly's W is sig at p < .05: there is a sig difference in variance among the 3 measures, so a statistical correction must be made – choose the Huynh-Feldt correction. F value = 810.81, which is sig with df = 1.50 and 1513.40. If Mauchly's W is NOT sig at p < .05, read the F value in the Sphericity Assumed row.
Tests of Within-Subjects Effects (Measure: MEASURE_1)

Source                             Type III SS  df        Mean Square  F        Sig.  Partial Eta Sq.  Noncent. Param.  Observed Power(a)
health        Sphericity Assumed   171.779      2         85.889       810.813  .000  .446             1621.626         1.000
              Greenhouse-Geisser   171.779      1.500     114.546      810.813  .000  .446             1215.939         1.000
              Huynh-Feldt          171.779      1.501     114.413      810.813  .000  .446             1217.347         1.000
              Lower-bound          171.779      1.000     171.779      810.813  .000  .446             810.813          1.000
Error(health) Sphericity Assumed   213.555      2016      .106
              Greenhouse-Geisser   213.555      1511.651  .141
              Huynh-Feldt          213.555      1513.402  .141
              Lower-bound          213.555      1008.000  .212

a. Computed using alpha = .05
Check Assumptions of Equality of Error Variance and Equality of Covariance Matrices
In the output, check the Levene's Test of Equality of Error Variance. If not sig at p = .05, then the assumption of homogeneity of variances is not violated.
Then check Box's Test of Equality of Covariance Matrices. If sig at p = .001, then the assumption of equality of covariance is violated.
Tests of Within-Subjects Effects (Measure: MEASURE_1)

Source                                Type III SS  df        Mean Square  F        Sig.  Partial Eta Sq.  Noncent. Param.  Observed Power(a)
HEALTH         Sphericity Assumed     35.625       2         17.813       169.004  .000  .144             338.008          1.000
               Greenhouse-Geisser     35.625       1.497     23.802       169.004  .000  .144             252.953          1.000
               Huynh-Feldt            35.625       1.501     23.727       169.004  .000  .144             253.749          1.000
               Lower-bound            35.625       1.000     35.625       169.004  .000  .144             169.004          1.000
HEALTH * RACE  Sphericity Assumed     1.496        4         .374         3.548    .007  .007             14.193           .871
               Greenhouse-Geisser     1.496        2.993     .500         3.548    .014  .007             10.621           .788
               Huynh-Feldt            1.496        3.003     .498         3.548    .014  .007             10.655           .789
               Lower-bound            1.496        2.000     .748         3.548    .029  .007             7.096            .660
Error(HEALTH)  Sphericity Assumed     212.059      2012      .105
               Greenhouse-Geisser     212.059      1505.710  .141
               Huynh-Feldt            212.059      1510.448  .140
               Lower-bound            212.059      1006.000  .211

a. Computed using alpha = .05
The Within-Subjects table shows F is sig at p < .05.
Tests of Between-Subjects Effects

Source     Type III SS  df    Mean Square  F          Sig.  Partial Eta Sq.  Noncent. Param.  Observed Power(a)
Intercept  2178.266     1     2178.266     17697.670  .000  .946             17697.670        1.000
RACE       .698         2     .349         2.836      .059  .006             5.672            .557
Error      123.821      1006  .123

a. Computed using alpha = .05

If compared between subjects (Race – White, Black and Others), the RACE line shows F is not sig at p < .05. See the plot to confirm this.
[Plot: Estimated Marginal Means of MEASURE_1 across HEALTH (1–3), with separate lines for Race of Respondent (White, Black, Other); means range from about 1.3 to 2.1.]
Pairwise Comparisons (Measure: MEASURE_1)

(I) health  (J) health  Mean Difference (I-J)  Std. Error  Sig.(a)  95% CI Lower(a)  95% CI Upper(a)
1           2           -.494*                 .017        .000     -.526            -.461
1           3           -.516*                 .016        .000     -.548            -.485
2           1           .494*                  .017        .000     .461             .526
2           3           -.023*                 .009        .016     -.041            -.004
3           1           .516*                  .016        .000     .485             .548
3           2           .023*                  .009        .016     .004             .041

Based on estimated marginal means.
*. The mean difference is significant at the .05 level.
a. Adjustment for multiple comparisons: Least Significant Difference (equivalent to no adjustments).

APA style report:
There are sig differences in the health measures, F (1.50, 1513.40) = 810.81, p < .05, with a moderate effect size (Eta squared = .45). LSD (Least Sig Difference) comparisons reveal that Health3 is significantly higher than Health2 and Health1, while Health2 is significantly higher than Health1.
Exercise: Try exercise 23 on p. 12 (SPSS Module Part 2/Advanced)
Exercise 23 (additional Q)
1. Are there significant differences in EDUC, MAEDU and PAEDU?
2. Are there significant differences in EDUC, PRESTIG80 and OCCAT80?
3. Assuming hlth1, hlth2 and hlth3 are interval data, are there significant differences in these 3 variables?
For each analysis, write a report using the APA style.
Formulate a research question based on your study which will require One-way Repeated Measures ANOVA.
MULTIVARIATE ANOVA
(MANOVA)
MANOVA is used when you wish to compare two or more dependent variables (INTERVAL DATA) across a grouping independent variable (NOMINAL DATA), e.g. REGION.
For example, you wish to check whether respondents in the various locations (REGION) (IV) differ in the levels of EDUC, MAEDU and PAEDU (several DVs).
Assumptions of MANOVA
1) Sample size – each subgroup n > 30.
2) Linearity between DVs. Can be tested using scatter-plots among pairs of the DVs across IV groups. (Click Graph > Legacy Dialogs > Scatter/Dot > Matrix Scatter > Define – send all dependent variables to the Matrix Variables box and the IV to the Rows box > Continue, OK.)
3) Univariate and Multivariate Normality – test univariate normality using skewness and kurtosis (or Kolmogorov-Smirnov) or use EXPLORE in Descriptive Statistics (Box Plot). Test multivariate normality using Mahalanobis Distance in Multiple Regression analysis (use ID as the dependent variable and the predictors as independent variables).
4) Univariate test of equality of variance – use Levene's test in the output to test this. If Levene's test is not significant at p < .05, there is equality of variance for each DV.
5) Homogeneity of variance–covariance matrices – use the Box's M test. If Box's M is not significant at p < .001 (you need to set .001 because Box's M test is very sensitive), it means that there is homogeneity of variance-covariance.
6) Multicollinearity – use Pearson r (consider removing one of the DV pairs with r > .8).
MULTIVARIATE ANOVA (MANOVA)
Analyze > General Linear Model > Multivariate.
Send the DVs to the Dependent Variables box and the independent variable to the Fixed Factor(s) box.
Click Options, click REGION and enter it into Display Means for.
Click Compare main effects and select Bonferroni; check Descriptive statistics and Homogeneity tests.
Click Continue and OK.
Descriptive Statistics

                                        Region of the US  Mean   Std. Deviation  N
Highest Year of School Completed        North East        13.53  2.719           454
                                        South East        13.33  3.060           239
                                        West              13.75  2.679           280
                                        Total             13.54  2.797           973
Highest Year School Completed, Mother   North East        11.20  3.218           454
                                        South East        10.59  3.466           239
                                        West              11.10  3.633           280
                                        Total             11.02  3.409           973
Highest Year School Completed, Father   North East        11.04  3.838           454
                                        South East        10.69  4.421           239
                                        West              11.22  4.282           280
                                        Total             11.01  4.117           973
The Box's M tests the homogeneity of the variance-covariance matrices at p < .001. Box's M is not significant at p < .001, so there is no sig difference in variance-covariance, i.e. homogeneity of variance-covariance holds.

Box's Test of Equality of Covariance Matrices(a)
Box's M  26.711
F        2.215
df1      12
df2      2786265
Sig.     .009

Tests the null hypothesis that the observed covariance matrices of the dependent variables are equal across groups.
a. Design: Intercept+region
Levene's Test of Equality of Error Variances(a)

                                        F      df1  df2  Sig.
Highest Year of School Completed        1.529  2    970  .217
Highest Year School Completed, Mother   4.363  2    970  .013
Highest Year School Completed, Father   5.416  2    970  .005

Tests the null hypothesis that the error variance of the dependent variable is equal across groups.
a. Design: Intercept+region

The univariate tests of homogeneity of variance for each DV show that for EDUC (not sig at p < .05) there is no sig difference in variance, i.e. there is equality of variance. For MAEDU and PAEDU there are sig differences, i.e. no equality of variance, so the F for MAEDU and PAEDU needs to be interpreted at a higher alpha level, say p < .01.
Multivariate Tests

                     Value  F         Hypothesis df  Error df   Sig.
Pillai's trace       .008   1.323     6.000          1938.000   .243
Wilks' lambda        .992   1.322(b)  6.000          1936.000   .244
Hotelling's trace    .008   1.321     6.000          1934.000   .244
Roy's largest root   .006   1.800(c)  3.000          969.000    .146

Each F tests the multivariate effect of Region of the United States. These tests are based on the linearly independent pairwise comparisons among the estimated marginal means.
a. Computed using alpha = .05
b. Exact statistic
c. The statistic is an upper bound on F that yields a lower bound on the significance level.

These Multivariate Tests test whether there is a sig group (REGION) difference on the linear combination of the DVs. Pillai's Trace (the statistic most robust against violation of assumptions) is NOT sig at p < .05, so there is no sig multivariate effect for REGION. There is no need to interpret the univariate between-subjects (REGION) effects.
Tests of Between-Subjects Effects

Source           Dependent Variable                     Type III SS  df   Mean Square  F          Sig.  Partial Eta Sq.  Noncent. Param.  Observed Power(a)
Corrected Model  Highest Year of School Completed       23.697(b)    2    11.848       1.516      .220  .003             3.033            .324
                 Highest Year School Completed, Mother  59.902(c)    2    29.951       2.586      .076  .005             5.172            .517
                 Highest Year School Completed, Father  38.095(d)    2    19.047       1.124      .325  .002             2.248            .249
Intercept        Highest Year of School Completed       165616.187   1    165616.187   21194.722  .000  .956             21194.722        1.000
                 Highest Year School Completed, Mother  108579.532   1    108579.532   9375.498   .000  .906             9375.498         1.000
                 Highest Year School Completed, Father  109037.361   1    109037.361   6433.135   .000  .869             6433.135         1.000
region           Highest Year of School Completed       23.697       2    11.848       1.516      .220  .003             3.033            .324
                 Highest Year School Completed, Mother  59.902       2    29.951       2.586      .076  .005             5.172            .517
                 Highest Year School Completed, Father  38.095       2    19.047       1.124      .325  .002             2.248            .249
Error            Highest Year of School Completed       7579.609     970  7.814
                 Highest Year School Completed, Mother  11233.765    970  11.581
                 Highest Year School Completed, Father  16440.855    970  16.949
Total            Highest Year of School Completed       186109.000   973
                 Highest Year School Completed, Mother  129423.000   973
                 Highest Year School Completed, Father  134366.000   973
Corrected Total  Highest Year of School Completed       7603.305     972
                 Highest Year School Completed, Mother  11293.667    972
                 Highest Year School Completed, Father  16478.950    972

a. Computed using alpha = .05
b. R Squared = .003 (Adjusted R Squared = .001)
c. R Squared = .005 (Adjusted R Squared = .003)
d. R Squared = .002 (Adjusted R Squared = .000)
As shown by the Pillai's Trace test, the multivariate tests are not sig (using Bonferroni correction, alpha = .05/3 = .017). There are no significant EDUC, MAEDU and PAEDU differences by REGION.
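The Bonferroni correction used in this note is plain division of alpha by the number of DVs:

```python
# Bonferroni-corrected alpha for three dependent variables.
alpha, n_dvs = 0.05, 3
corrected = alpha / n_dvs
print(round(corrected, 3))  # 0.017
```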
APA report
MANOVA was undertaken to investigate Region differences in PAEDUC, MAEDUC and EDUC. All assumptions relating to normality, linearity, univariate and multivariate outliers (Mahalanobis Distance within required limits) and homogeneity of variance–covariance matrices (Box's M was not sig at p < .001) were met. The multivariate effect of Region was not sig at p > .05.
Note:
(If F is significant, you will need to state Pillai's trace and effect size – partial eta squared. Check the mean scores of the DV that is significant for the 3 regions to check between which two regions this DV is significantly different.)
Another example of MANOVA output
Statistical assumptions of the analyses are met, and descriptive statistics are reported in Table xx. A one-way between-groups MANOVA partially supported the first hypothesis of there being a difference in procrastination types between students and white-collar workers, Pillai's Trace = .05, F (3, 181) = 3.2, p = .03, ηp² = .05, power = .73.
Another example of a MANOVA table with Tukey
Jin Hwang, YoungHo Kim (2011). Adolescents' physical activity and its related cognitive and behavioral processes. Biology of Sport, 28, 19-22. (ISI TIER 4)
DISCRIMINANT ANALYSIS
Discriminant analysis is used when you wish to find out, for example, which career (dependent, nominal data) students with particular personality characteristics or interests (independent, scale data) will choose.
So the independent variables will be the students' personality characteristics or interests, e.g. extrovert, creative, etc. (scale data), and the dependent variable will be the choice of career, e.g. Medicine or Architecture (nominal data).
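Conceptually, the classification SPSS performs here can be sketched as a two-group Fisher discriminant in Python. All data, group labels and scores below are invented purely for illustration:

```python
import numpy as np

# Hypothetical scores on two scale IVs (e.g. extroversion, creativity) for
# students who chose Medicine (group 0) vs Architecture (group 1).
X0 = np.array([[3.0, 4.0], [2.5, 3.5], [3.5, 4.5], [3.0, 3.0]])  # Medicine
X1 = np.array([[4.5, 2.0], [5.0, 2.5], [4.0, 1.5], [5.5, 2.0]])  # Architecture

# Fisher's rule: w = Sw^-1 (m1 - m0), with Sw the pooled within-group scatter.
m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
Sw = np.cov(X0.T) * (len(X0) - 1) + np.cov(X1.T) * (len(X1) - 1)
w = np.linalg.solve(Sw, m1 - m0)

# Classify by projecting onto w, splitting at the midpoint of the projected means.
midpoint = (m0 @ w + m1 @ w) / 2

def predict(x):
    return int(x @ w > midpoint)

print([predict(x) for x in X0], [predict(x) for x in X1])  # [0, 0, 0, 0] [1, 1, 1, 1]
```

SPSS additionally handles more than two groups, stepwise variable selection and significance tests; this sketch only shows the core idea of finding a weighted combination of the IVs that separates the groups.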
To analyze, click:
ANALYZE → CLASSIFY → DISCRIMINANT
Let's say you wish to find out if you can classify respondents into Very Happy, Pretty Happy and Not Too Happy (a nominal variable, HAPPY) using the information from AGE, EDUC and PRESTIG80.
Move the dependent variable (e.g. Career) to Grouping Variable. Click Define Range to indicate how many different types of Career you wish to study and indicate the Minimum and Maximum numbers.
Move the independent variables (e.g. personality variables) to the Independents box.
Click Use Stepwise Method.
Click STATISTICS and select Means, Univariate ANOVAs, Box's M, Unstandardized Function Coefficients, Total Covariance Matrix and Separate-Groups Covariance. Click Continue.
Click CLASSIFY and select Summary Table, then click Continue.
Click the METHOD button. Wilks' Lambda is selected by default as the statistic used for the addition and removal of variables to and from the discriminant functions. The F criteria set for entry and removal are 3.84 and 2.71 respectively. [Or check the lower radio button to set these using probabilities of F, i.e. .05 and .01]
Click SAVE to get the Discriminant Analysis: Save dialogue box, which adds Discriminant Scores and Predicted Group Membership to the data file.
If you wish to analyze male respondents only, use Selection Variable and enter 1 (male) in the Value box.
Then click OK to execute the Discriminant Analysis.
OUTPUT

Group Statistics

General Happiness                                  Mean   Std. Dev.  Unweighted N  Weighted N
Very Happy     Age of Respondent                   47.28  17.766     441           441.000
               Highest Year of School Completed    13.52  2.987      441           441.000
               R's Occupational Prestige Score
               (1980)                              45.19  12.883     441           441.000
Pretty Happy   Age of Respondent                   44.82  17.422     814           814.000
               Highest Year of School Completed    12.87  2.914      814           814.000
               R's Occupational Prestige Score
               (1980)                              42.22  12.925     814           814.000
Not Too Happy  Age of Respondent                   46.66  17.329     147           147.000
               Highest Year of School Completed    12.28  2.835      147           147.000
               R's Occupational Prestige Score
               (1980)                              40.35  13.653     147           147.000
Total          Age of Respondent                   45.79  17.547     1402          1402.000
               Highest Year of School Completed    13.01  2.952      1402          1402.000
               R's Occupational Prestige Score
               (1980)                              42.96  13.080     1402          1402.000

The Valid N (listwise) columns give the no. of respondents in each group.
Tests of Equality of Group Means

                                        Wilks' Lambda  F       df1  df2   Sig.
Age of Respondent                       .996           3.018   2    1399  .049
Highest Year of School Completed        .983           12.109  2    1399  .000
R's Occupational Prestige Score (1980)  .985           10.823  2    1399  .000

There are significant differences among the 3 groups (Very Happy, Pretty Happy, Not Too Happy) on the 3 IVs (AGE, EDUC, PRESTIG80) at p < .05.
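For a single variable, this equality-of-means test can be reproduced directly from Wilks' Lambda: Lambda = SS_within / SS_total, and F = ((1 - Lambda) / Lambda) * (df2 / df1), with df1 = k - 1 groups and df2 = N - k. A quick Python check against the EDUC row:

```python
# Convert a univariate Wilks' Lambda into its equality-of-means F statistic.
def wilks_to_F(lam, df1, df2):
    return (1 - lam) / lam * (df2 / df1)

# EDUC row above: Lambda = .983, df1 = 2, df2 = 1399
print(round(wilks_to_F(0.983, 2, 1399), 1))  # 12.1, matching F = 12.109 up to rounding
```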
Variables in the Analysis

Step                                    Tolerance  F to Remove  Wilks' Lambda
1     Highest Year of School Completed  1.000      12.109
2     Highest Year of School Completed  .929       15.166       .996
      Age of Respondent                 .929       6.042        .983

A high Tolerance value means the IV can contribute to the discrimination. "F to Remove" tests the significance of the decrease in discrimination if the variable is removed. Since PRESTIG80 has an F to Enter of 1.993, below the entry criterion, it is left out of the prediction.
Variables Not in the Analysis

Step                                          Tolerance  Min. Tolerance  F to Enter  Wilks' Lambda
0     Age of Respondent                       1.000      1.000           3.018       .996
      Highest Year of School Completed        1.000      1.000           12.109      .983
      R's Occupational Prestige Score (1980)  1.000      1.000           10.823      .985
1     Age of Respondent                       .929       .929            6.042       .975
      R's Occupational Prestige Score (1980)  .737       .737            3.146       .979
2     R's Occupational Prestige Score (1980)  .716       .665            1.993       .972
Eigenvalues

Function  Eigenvalue  % of Variance  Cumulative %  Canonical Correlation
1         .024(a)     91.1           91.1          .152
2         .002(a)     8.9            100.0         .048

a. First 2 canonical discriminant functions were used in the analysis.

Function 1 has the highest % of variance.
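The canonical correlation column is tied to each function's eigenvalue by r = sqrt(eigenvalue / (1 + eigenvalue)); a quick Python check on Function 1:

```python
import math

# Canonical correlation from a discriminant function's eigenvalue.
def canonical_corr(eig):
    return math.sqrt(eig / (1 + eig))

# Function 1 above: eigenvalue .024 gives r of about .153 (the table's .152
# differs only because the printed eigenvalue is itself rounded).
print(round(canonical_corr(0.024), 3))  # 0.153
```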
Wilks' Lambda

Test of Function(s)  Wilks' Lambda  Chi-square  df  Sig.
1 through 2          .975           36.038      4   .000
2                    .998           3.244       1   .072

Wilks' Lambda is significant for Functions 1 through 2 combined (p < .001), but not for Function 2 alone (p = .072).
Structure Matrix

                                           Function
                                           1       2
Highest Year of School Completed           .837*   -.547
R's Occupational Prestige Score (1980)(a)  .509*   -.160
Age of Respondent                          .305    .952*

Pooled within-groups correlations between discriminating variables and standardized canonical discriminant functions. Variables ordered by absolute size of correlation within function.
*. Largest absolute correlation between each variable and any discriminant function.
a. This variable is not used in the analysis.
Classification Results(a)

                                         Predicted Group Membership
          General Happiness   Very Happy  Pretty Happy  Not Too Happy  Total
Original  Count
          Very Happy          214         105           148            467
          Pretty Happy        310         227           329            866
          Not Too Happy       47          39            77             163
          Ungrouped cases     5           1             6              12
          %
          Very Happy          45.8        22.5          31.7           100.0
          Pretty Happy        35.8        26.2          38.0           100.0
          Not Too Happy       28.8        23.9          47.2           100.0
          Ungrouped cases     41.7        8.3           50.0           100.0

a. 34.6% of original grouped cases correctly classified.
The success rate of predicting HAPPY using EDUC, AGE and PRESTIG80 is 34.6%. Those in Not Too Happy were most accurately classified (47.2%), followed by those in Very Happy (45.8%). Pretty Happy is least successfully classified (26.2%). Misclassified Very Happy respondents were placed in Not Too Happy (31.7%) somewhat more often than in Pretty Happy (22.5%).
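The 34.6% hit rate can be reproduced from the count table above: the diagonal (correctly classified) counts divided by all grouped cases, with ungrouped cases excluded. A short Python sketch:

```python
# Rows = actual group, columns = predicted group (Very, Pretty, Not Too Happy).
counts = {
    "Very Happy":    [214, 105, 148],
    "Pretty Happy":  [310, 227, 329],
    "Not Too Happy": [47,  39,  77],
}
correct = sum(row[i] for i, row in enumerate(counts.values()))  # diagonal: 214+227+77
total = sum(sum(row) for row in counts.values())                # grouped cases only
print(round(100 * correct / total, 1))  # 34.6
```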
Note: if you click 'Save' and select 'Predicted Group Membership', you will get a column in the data file with the predicted group each respondent belongs to!
Testing for Moderating Effects of a Variable
Use multiple regression with the moderating variable coded as a dummy variable.
E.g. if sex is the moderating variable, RECODE Male = 1 and Female = 0 to s
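A minimal sketch of this approach in Python with invented data (in SPSS you would compute the dummy and the X*dummy product as new variables and enter them into Linear Regression; the coefficients and variable names below are hypothetical):

```python
import numpy as np

# Hypothetical data: Y depends on X, on the sex dummy, and on their product.
rng = np.random.default_rng(0)
x = rng.normal(size=40)
sex = np.repeat([1, 0], 20)            # RECODE: Male = 1, Female = 0
y = 2 + 0.5 * x + 1.0 * sex + 1.5 * x * sex + rng.normal(scale=0.1, size=40)

# Design matrix: intercept, X, dummy, and the X*dummy interaction term.
# A non-zero interaction coefficient indicates a moderating effect of sex.
X = np.column_stack([np.ones_like(x), x, sex, x * sex])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(b, 1))  # coefficients recovered close to [2.0, 0.5, 1.0, 1.5]
```

In practice the significance of the interaction coefficient (its t test in the regression output) is what tells you whether the moderation is present.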