Action Research More Crosstab Measures


Page 1: Action Research More Crosstab Measures

INFO 515 Lecture #9 1

Action Research: More Crosstab Measures

INFO 515

Glenn Booker

Page 2: Action Research More Crosstab Measures

Nominal Crosstab Tests

Four more measures which could apply to nominal data in a crosstab:
- Eta
- Lambda
- Goodman and Kruskal’s tau
- Uncertainty coefficient

Page 3: Action Research More Crosstab Measures

Eta Coefficient

- Used when the dependent variable uses an interval or ratio scale, and the independent variable is nominal or ordinal
- Eta (η) squared is the proportion of the dependent variable’s variance which is explained by the independent variable
- Eta squared is directional (its value depends on which variable is treated as dependent), and ranges from 0 to 1
- This is the same eta from the end of lecture 6
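As a rough illustration of "proportion of variance explained", the sketch below computes eta squared as the between-group sum of squares divided by the total sum of squares. The function name and the sample data are hypothetical and are not taken from the lecture's data sets.

```python
def eta_squared(groups):
    """Eta squared for an interval-scale dependent variable.

    groups maps each category of the nominal/ordinal independent
    variable to the list of dependent-variable values observed in it.
    """
    values = [v for vals in groups.values() for v in vals]
    grand_mean = sum(values) / len(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    if ss_total == 0:
        raise ValueError("dependent variable has no variance")
    # Between-group sum of squares: group size times squared distance
    # of the group mean from the grand mean.
    ss_between = sum(
        len(vals) * (sum(vals) / len(vals) - grand_mean) ** 2
        for vals in groups.values()
    )
    return ss_between / ss_total


# Hypothetical example: years of education by region (made-up numbers).
sample = {"north": [10, 12, 14], "south": [11, 13, 15], "west": [16, 16, 18]}
print(round(eta_squared(sample), 3))
```

Eta itself is the square root of this value.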

Page 4: Action Research More Crosstab Measures

Directional vs Symmetric

- Directional measures give a different answer depending on whether A is dependent on B, or B is dependent on A
- Symmetric measures don’t care which variable is dependent or independent
- Tests indicate whether there is a statistically significant relationship; measures, here, describe the strength of association

Page 5: Action Research More Crosstab Measures

Directional Measures

- Directional measures help determine how much the dependent variable is affected by the independent variable
- Directional measures for nominal data:
  - Lambda (recommended)
  - Goodman and Kruskal’s tau
  - Uncertainty coefficient

Page 6: Action Research More Crosstab Measures

Directional Measures

- Directional measures generally range from 0 to 1
- A value of 0 means the independent variable doesn’t help predict the dependent variable
- A value of 1 means the independent variable perfectly predicts the resulting dependent variable

Page 7: Action Research More Crosstab Measures

Directional Measures

- In this context, either variable can be considered dependent or independent
  - Does A predict B?
  - Does B predict A?
- A “symmetric” value is the weighted average of the two possible selections (A predicts B, or B predicts A)

Page 8: Action Research More Crosstab Measures

Proportional Reduction in Error

- Proportional Reduction in Error (PRE) measures find the fractional reduction in errors due to some factor (such as an independent variable)
  PRE = (Error without X - Error with X) / Error without X
- Two we’ll look at are Lambda, and Goodman and Kruskal’s Tau

Page 9: Action Research More Crosstab Measures

Lambda Coefficient

- Lambda has a symmetric option for output
- Its Value is the proportional reduction in error when the dependent variable is predicted from the independent one
- The Asymptotic Std. Error allows a 95% confidence interval to be made
- “Approx. T” is the Value divided by the Std. Error computed as if the parameter were zero (not the usual definition!)

Page 10: Action Research More Crosstab Measures

Goodman and Kruskal’s Tau

- SPSS note: Goodman and Kruskal’s Tau is not directly selected; it appears only when Lambda is checked!
- Does not have a Symmetric option
- Does not have a T approximation
- Its significance is based on a chi-square approximation
- Otherwise similar to Lambda for interpretation

Page 11: Action Research More Crosstab Measures

Uncertainty Coefficient

- Does have a symmetric dependency option
- Does have a T approximation
- Also based on chi square
- Goodman and Kruskal’s tau and the Uncertainty Coefficient may give results opposite to Lambda’s, so use them cautiously!

Page 12: Action Research More Crosstab Measures

Nominal Example

- Use “GSS91 political.sav” data set
- Use Analyze / Descriptive Statistics / Crosstabs…
- Select “region” for Row(s), and “relig” for Column(s)
- Under “Statistics…” select Lambda, and Uncertainty Coefficient

Page 13: Action Research More Crosstab Measures

Nominal Example: Directional Measures

Nominal by Nominal                                                 Value   Asymp. Std. Error(a)   Approx. T(b)   Approx. Sig.
Lambda                    Symmetric                                .042    .009                   4.336          .000
                          REGION OF INTERVIEW Dependent            .048    .012                   4.083          .000
                          RS RELIGIOUS PREFERENCE Dependent        .028    .017                   1.648          .099
Goodman and Kruskal tau   REGION OF INTERVIEW Dependent            .017    .003                                  .000(c)
                          RS RELIGIOUS PREFERENCE Dependent        .073    .010                                  .000(c)
Uncertainty Coefficient   Symmetric                                .044    .006                   7.720          .000(d)
                          REGION OF INTERVIEW Dependent            .033    .004                   7.720          .000(d)
                          RS RELIGIOUS PREFERENCE Dependent        .070    .009                   7.720          .000(d)

a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.
c. Based on chi-square approximation.
d. Likelihood ratio chi-square probability.

Page 14: Action Research More Crosstab Measures

Nominal Example - Lambda

- Focus on the Lambda (λ) output first
- Lambda measures the percent of error reduction when using the independent variable to predict the dependent variable
  - Calculation based on any desired outcome contributing to lambda
- Lambda ranges from 0 to 1
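To make the error-reduction idea concrete, here is a minimal sketch of directional lambda computed from a table of counts. It assumes the usual definition (errors made by always guessing the modal category, with and without knowledge of the independent variable). The function name and the 2x2 counts are hypothetical, not the region/relig data.

```python
def goodman_kruskal_lambda(table):
    """Lambda with the column variable dependent.

    table[i][j] is the count for row category i and column category j.
    """
    n = sum(sum(row) for row in table)
    col_totals = [sum(col) for col in zip(*table)]
    # Errors when always guessing the most common column category.
    errors_without = n - max(col_totals)
    # Errors when guessing the most common column category within each row.
    errors_with = n - sum(max(row) for row in table)
    return (errors_without - errors_with) / errors_without


hypothetical = [[30, 10], [20, 40]]
print(goodman_kruskal_lambda(hypothetical))  # 0.4 (columns dependent)

# Transposing the table swaps the roles of the two variables.
transposed = [list(col) for col in zip(*hypothetical)]
print(goodman_kruskal_lambda(transposed))  # 0.25 (rows dependent)
```

The two calls give different values for the same table, which is what makes lambda a directional measure. Note that the function divides by zero if every case falls in a single column category.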

Page 15: Action Research More Crosstab Measures

Nominal Example

- As usual, we want Sig. < 0.050 for the meaning of lambda to be statistically significant
- If Region is dependent, then we see that religious preference is a significant (sig. = 0.000) predictor
  - “relig” reduces the error in predicting a person’s region by (Value) 4.8% +/- (Std Error) 1.2%

Page 16: Action Research More Crosstab Measures

Lambda Example

- 95% confidence interval of that contribution is (not shown) 4.8 - 2*1.2 = 2.4% to 4.8 + 2*1.2 = 7.2%
- But “region” is not a significant predictor of “relig” (sig. = 0.099)
  - Ignore the value of lambda if it isn’t significant
- The symmetric value is significant, and its Value is between the other two lambda values
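The rough interval above is just the Value plus or minus two standard errors. A small helper (the name is hypothetical) makes the arithmetic repeatable for any row of the output:

```python
def approx_ci95(value, std_error, z=2.0):
    """Rough 95% interval: value +/- z * std_error (the slides use z = 2)."""
    return value - z * std_error, value + z * std_error


# Lambda with region dependent, from the Directional Measures output.
low, high = approx_ci95(0.048, 0.012)
print(f"{low:.3f} to {high:.3f}")  # 0.024 to 0.072
```

Passing z=1.96 gives the slightly narrower interval based on the normal distribution.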

Page 17: Action Research More Crosstab Measures

G and K Tau Example

- Goodman and Kruskal’s tau (τ) is similar to lambda, but is based on predictions in the same proportion as the marginal totals (individual row or column subtotals)
  - No symmetric value is given; it’s only directional
- Same method for interpretation, but notice that it finds both variables significant as dependent, and ‘relig’ is much stronger!

Still from slide 13
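The "predictions in proportion to the marginal totals" idea can be sketched as follows. This assumes the standard textbook formula for Goodman and Kruskal's tau and is not a reproduction of SPSS's internal code. The function name is hypothetical, and the counts used in the example are the 2x2 table from slide 22 of this lecture.

```python
def goodman_kruskal_tau(table):
    """Tau with the column variable dependent.

    table[i][j] is the count for row category i and column category j.
    """
    n = sum(sum(row) for row in table)
    col_totals = [sum(col) for col in zip(*table)]
    # Expected errors when cases are assigned to columns at random,
    # in proportion to the overall column totals.
    errors_without = n - sum(c * c for c in col_totals) / n
    # Expected errors when the assignment uses the proportions
    # observed within each row instead.
    errors_with = n - sum(
        sum(x * x for x in row) / sum(row) for row in table if sum(row) > 0
    )
    return (errors_without - errors_with) / errors_without


work_by_vote = [[713, 50], [146, 38]]  # counts from slide 22
print(round(goodman_kruskal_tau(work_by_vote), 4))
```

Like lambda, it is a PRE measure, so 0 means no improvement in prediction and 1 means perfect prediction.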

Page 18: Action Research More Crosstab Measures

Uncertainty Coefficient Example

- Is a measure of association that indicates the proportional reduction in error when values of one variable are used to predict values of the other variable
- The program calculates both symmetric and directional versions of it
- Here, gives results similar to G and K Tau
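One way to see what the uncertainty coefficient measures is to compute it from entropies. The sketch below assumes the usual information-theoretic definition (the share of one variable's entropy removed by knowing the other). The function names and the example counts are hypothetical.

```python
import math


def _entropy(counts):
    """Shannon entropy (natural log) of a list of counts; zeros are skipped."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def uncertainty_coefficients(table):
    """Return (U with columns dependent, U with rows dependent, symmetric U)."""
    h_rows = _entropy([sum(row) for row in table])
    h_cols = _entropy([sum(col) for col in zip(*table)])
    h_joint = _entropy([x for row in table for x in row])
    shared = h_rows + h_cols - h_joint  # information the variables share
    return shared / h_cols, shared / h_rows, 2 * shared / (h_rows + h_cols)


# Hypothetical 2x3 table of counts.
u_cols, u_rows, u_sym = uncertainty_coefficients([[30, 10, 5], [10, 25, 20]])
print(round(u_cols, 3), round(u_rows, 3), round(u_sym, 3))
```

The symmetric value always falls between the two directional values, matching the pattern in the output on slide 13.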

Page 19: Action Research More Crosstab Measures

Tests for 2x2 Tables

- Many special measures can be applied to a 2x2 table, including:
  - Relative risk
  - Odds ratio
- Look at these in the context of answering questions like: “Are people who approve of women working more likely to vote for a woman President?”

Page 20: Action Research More Crosstab Measures

Tests for 2x2 Tables

- Use “GSS91 social.sav” data set
- Variables are “should women work” (fework) and “vote for woman president” (fepres)
- Isolate the cases using Data / Select Cases
  - Use the If condition
    (fepres=1 | fepres=2) & (fework=1 | fework=2)
  - ‘|’ means ‘or’; ‘&’ means ‘and’

Page 21: Action Research More Crosstab Measures

Tests for 2x2 Tables

- Use Analyze / Descriptive Statistics / Crosstabs…
- Select “fework” for Row(s), and “fepres” for Column(s)
- For Statistics select Risk
- For Cells select Row percentages
- This gives 947 valid cases

Page 22: Action Research More Crosstab Measures

Tests for 2x2 Tables

SHOULD WOMEN WORK * VOTE FOR WOMAN PRESIDENT Crosstabulation

                                                   VOTE FOR WOMAN PRESIDENT
SHOULD WOMEN WORK                                  YES       NO        Total
APPROVE       Count                                713       50        763
              % within SHOULD WOMEN WORK           93.4%     6.6%      100.0%
DISAPPROVE    Count                                146       38        184
              % within SHOULD WOMEN WORK           79.3%     20.7%     100.0%
Total         Count                                859       88        947
              % within SHOULD WOMEN WORK           90.7%     9.3%      100.0%

Page 23: Action Research More Crosstab Measures

Tests for 2x2 Tables

Risk Estimate

                                                                       95% Confidence Interval
                                                            Value      Lower      Upper
Odds Ratio for SHOULD WOMEN WORK (APPROVE / DISAPPROVE)     3.712      2.348      5.867
For cohort VOTE FOR WOMAN PRESIDENT = YES                   1.178      1.091      1.271
For cohort VOTE FOR WOMAN PRESIDENT = NO                    .317       .215       .469
N of Valid Cases                                            947

‘cohort’ = subset

Page 24: Action Research More Crosstab Measures

Relative Risk

- The relative risk is a ratio of percentages
- It is very directional
- Those who (approve of women working) are 1.178 times as likely to (vote for a woman president) as those who disapprove
  - Based on 93.4%/79.3% = 1.178
- Note the 95% confidence intervals for each ratio are given; roughly 1.09 to 1.27 for this example

Page 25: Action Research More Crosstab Measures

Relative Risk

- Conversely, those who approve of women working are 0.317 times as likely to say they would NOT vote for a woman president as those who disapprove (6.6%/20.7% is about 0.32 from the rounded percentages; the counts give 0.317), with a broader confidence interval of 0.22 to 0.47
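Both cohort ratios can be reproduced directly from the counts in the crosstabulation on slide 22, which avoids the rounding in the percentages. The function name below is hypothetical. Rows are fework (approve, disapprove) and columns are fepres (yes, no).

```python
def relative_risk(table, column):
    """Ratio of the row-1 proportion to the row-2 proportion for one column.

    table is a 2x2 list of counts; column is 0 or 1.
    """
    row1, row2 = table
    return (row1[column] / sum(row1)) / (row2[column] / sum(row2))


counts = [[713, 50], [146, 38]]  # rows: APPROVE, DISAPPROVE; columns: YES, NO
print(round(relative_risk(counts, 0), 3))  # 1.178 (cohort = YES)
print(round(relative_risk(counts, 1), 3))  # 0.317 (cohort = NO)
```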

Page 26: Action Research More Crosstab Measures

Odds Ratio

- The odds of an event are the ratio of (the probability that the event occurs) to (the probability that the event does not occur); the odds ratio compares two such odds
- The odds ratio that someone who (approves of women working) also (would vote for a woman president) has two terms
  - One is the odds of (voting for a woman president) among (those who approve of women working) (93.4/6.6=14.152)...

Page 27: Action Research More Crosstab Measures

Odds Ratio

- Divided by the odds of (voting for a woman president) among (those who do NOT approve of women working) (79.3/20.7=3.831)
- Hence the odds ratio is 14.152/3.831 = 3.694 or (93.4*20.7)/(6.6*79.3)
- Round off error, probably in the 6.6 value, kept us from getting the stated odds ratio of 3.712 (first row of output on slide 23)
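Working from the raw counts instead of the rounded percentages removes the round-off problem. The sketch below also adds the common log-based (Woolf) confidence interval. That choice of interval is an assumption on my part, not something stated in the lecture, although it appears to reproduce the bounds in the Risk Estimate output. The function name is hypothetical.

```python
import math


def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Odds ratio for the 2x2 table [[a, b], [c, d]] with a log-based interval."""
    ratio = (a * d) / (b * c)
    # Standard error of ln(odds ratio) under the usual large-sample formula.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    low = math.exp(math.log(ratio) - z * se_log)
    high = math.exp(math.log(ratio) + z * se_log)
    return ratio, low, high


# Counts from slide 22: APPROVE row (713, 50), DISAPPROVE row (146, 38).
ratio, low, high = odds_ratio_with_ci(713, 50, 146, 38)
print(f"{ratio:.3f} {low:.3f} {high:.3f}")  # 3.712 2.348 5.867
```

The formula fails if any cell count is zero, so a table with empty cells would need separate handling.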

Page 28: Action Research More Crosstab Measures

Square Tables (RxR)

- Tables with the same number of rows as columns (RxR tables) also have special measures
  - Cohen’s Kappa (κ), which measures the strength of agreement (did two people’s measurements match well?)
- Applies for R values of one nominal variable

Page 29: Action Research More Crosstab Measures

Kappa

- Kappa is used only when the rows and columns have the same categories
  - Set of possible diagnoses achieved by two different doctors
  - Two sets of outcomes which are believed to be dependent on each other
- Kappa typically ranges from zero to one (it can fall below zero when agreement is worse than chance); it is one when the diagonal has the only non-zero values

Page 30: Action Research More Crosstab Measures

Kappa Example

- Example here is the educational level of one’s parents (maeduc and paeduc; as in ‘ma and pa education’)
- Use “GSS91 social.sav” data set
- Define new variables madeg and padeg, which are derived from maeduc and paeduc (convert years of education into rough levels of achievement)

Page 31: Action Research More Crosstab Measures

Kappa Example

- New scale for madeg and padeg is
  - Education <12 is code 1, “LT High School”
  - Education 12-15 is code 2, “High School”
  - Education 16 is code 3, “Bachelor degree”
  - Education 17+ is code 4, “Graduate”
- Use Analyze / Descriptive Statistics / Crosstabs…

Page 32: Action Research More Crosstab Measures

Kappa Example

- Select “padeg” for Row(s), and “madeg” for Column(s)
- For Statistics select Kappa
- The basic crosstab just shows the data counts (next slide)
- Then we get the Kappa measure (slide after next)
- As usual, check to make sure the result is significant before going any further

Page 33: Action Research More Crosstab Measures

Kappa Example

Father's education * Mother's education Crosstabulation

Count
                      Mother's education
Father's education    LT High School   High School   Bachelor degree   Graduate   Total
LT High School        284              125           6                 55         470
High School           77               270           23                41         411
Bachelor degree       5                61            24                13         103
Graduate              137              149           26                204        516
Total                 503              605           79                313        1500

Page 34: Action Research More Crosstab Measures

Kappa Example

Symmetric Measures

                                Value   Asymp. Std. Error(a)   Approx. T(b)   Approx. Sig.
Measure of Agreement   Kappa    .325    .018                   20.332         .000
N of Valid Cases                1500

a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.

Page 35: Action Research More Crosstab Measures

Kappa Example

- Here the significance is 0.000, very clearly significant (< 0.050)
- This is confirmed by the approximate T of over 20 - as before, this T is based on the null hypothesis
- The actual value of kappa and its standard error are 0.325 +/- 0.018
- What does this mean?

Page 36: Action Research More Crosstab Measures

Kappa

- Kappa is judged on a fairly fixed scale
  - Kappa below 0.40 indicates poor agreement beyond chance
  - Kappa from 0.40 to 0.75 is fair to good agreement
  - Kappa above 0.75 is strong agreement
- So in this case we are confident there is poor agreement between parents’ education

Scale from J.L. Fleiss, 1981
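The kappa value can be checked by hand from the counts on slide 33. This is a minimal sketch using the standard definition (observed agreement on the diagonal compared with the agreement expected from the marginal totals). The function names are hypothetical.

```python
def cohens_kappa(table):
    """Cohen's kappa for a square table of counts with matching categories."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    # Proportion of cases on the diagonal (the two ratings agree).
    observed = sum(table[i][i] for i in range(len(table))) / n
    # Agreement expected by chance, given the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    return (observed - expected) / (1 - expected)


def fleiss_label(kappa):
    """Verbal label using the Fleiss (1981) scale quoted on the slide."""
    if kappa < 0.40:
        return "poor"
    if kappa <= 0.75:
        return "fair to good"
    return "strong"


# Father's education (rows) by mother's education (columns), slide 33.
parents = [
    [284, 125, 6, 55],
    [77, 270, 23, 41],
    [5, 61, 24, 13],
    [137, 149, 26, 204],
]
k = cohens_kappa(parents)
print(round(k, 3), fleiss_label(k))  # 0.325 poor
```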

Page 37: Action Research More Crosstab Measures

Ordinal Crosstab Measures

- Several association measures can be used for a table with R rows and C columns which contain ordinal data (and presumably R ≠ C)
  - Kendall’s tau-b
  - Kendall’s tau-c
  - (Goodman and Kruskal’s) Gamma (preferred)
  - Somers’ d
  - Spearman’s Correlation Coefficient

Page 38: Action Research More Crosstab Measures

General RxC Table Measures

- Many are based on comparing pairs of cases on the two variables
  - If B increases when A increases, the pair is concordant
  - If B decreases when A increases, the pair is discordant
  - If the two cases have the same value of A, or the same value of B, the pair is tied

Page 39: Action Research More Crosstab Measures

General RxC Table Measures

- The number of concordant pairs is “P”
- The number of discordant pairs is “Q”
- The number of ties on X is “Tx”
- The number of ties on Y is “Ty”
- The smaller of the number of rows R and columns C is called “m”
  m = min(R,C)
- Given this vocabulary, we can define many measures

Page 40: Action Research More Crosstab Measures

General RxC Table Measures

- Kendall’s tau-b is
  tau-b = (P-Q) / sqrt[ (P+Q+Tx)*(P+Q+Ty) ]
- Kendall’s tau-c is
  tau-c = 2m*(P-Q) / [N^2*(m-1)], where N is the number of cases
- Gamma (γ) is
  Gamma = (P-Q) / (P+Q)
- Somers’ d is
  dy = (P-Q) / (P+Q+Ty) or dx = (P-Q) / (P+Q+Tx)
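The formulas above can be applied to any table whose row and column categories are in order. The sketch below counts the pairs by brute force and then evaluates each measure. It assumes rows are X and columns are Y, and that Tx and Ty count pairs tied on that variable only. The function names are hypothetical. Note that some software reports P and Q as twice these pair counts, which leaves gamma, tau-b and Somers' d unchanged.

```python
import math


def pair_counts(table):
    """Return (P, Q, Tx, Ty) for a table with ordered rows (X) and columns (Y)."""
    rows, cols = len(table), len(table[0])
    p = q = tx = ty = 0
    for i in range(rows):
        for j in range(cols):
            for k in range(rows):
                for l in range(cols):
                    product = table[i][j] * table[k][l]
                    if k > i and l > j:
                        p += product   # concordant pairs
                    elif k > i and l < j:
                        q += product   # discordant pairs
                    elif k == i and l > j:
                        tx += product  # same row: tied on X only
                    elif k > i and l == j:
                        ty += product  # same column: tied on Y only
    return p, q, tx, ty


def ordinal_measures(table):
    """Evaluate tau-b, tau-c, gamma and Somers' d from the pair counts."""
    p, q, tx, ty = pair_counts(table)
    n = sum(sum(row) for row in table)
    m = min(len(table), len(table[0]))
    return {
        "tau_b": (p - q) / math.sqrt((p + q + tx) * (p + q + ty)),
        "tau_c": 2 * m * (p - q) / (n ** 2 * (m - 1)),
        "gamma": (p - q) / (p + q),
        "d_y": (p - q) / (p + q + ty),
        "d_x": (p - q) / (p + q + tx),
    }


# Parents' education levels from the Kappa example (ordered categories).
parents = [
    [284, 125, 6, 55],
    [77, 270, 23, 41],
    [5, 61, 24, 13],
    [137, 149, 26, 204],
]
for name, value in ordinal_measures(parents).items():
    print(name, round(value, 3))
```

The four nested loops are fine for small tables. A large table such as the 0-20 years-of-education crosstab would be better served by cumulative sums.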

Page 41: Action Research More Crosstab Measures

General RxC Table Measures

- All of the RxC measures are symmetric except Somers’ d, which has both symmetric and directional values given
- All are evaluated by their significance, which also has an approximate T score
- All are expressed by a Value +/- its Std Error

Page 42: Action Research More Crosstab Measures

RxC Measures Example

- Use “GSS91 social.sav” data set
- Use Analyze / Descriptive Statistics / Crosstabs…
- Select “paeduc” for Row(s), and “maeduc” for Column(s)
- Under “Statistics…” select Eta, Correlations, Gamma, Somers’ d, Kendall’s tau-b and tau-c

Page 43: Action Research More Crosstab Measures

RxC Measures Example

- This compares the number of years of education of one’s mother and father to see how strongly they affect one another
- The crosstab data table is very large, since it ranges from 0 to 20 for each category, with irregular gaps (we’re not using the simplified categories from the Kappa example)
  - Hence we’re not showing it here!

Page 44: Action Research More Crosstab Measures

RxC Measures Example

Both measures show the mother’s education is a slightly better predictor

Directional Measures

                                                                                      Value   Asymp. Std. Error(a)   Approx. T(b)   Approx. Sig.
Ordinal by Ordinal    Somers' d   Symmetric                                           .549    .019                   26.767         .000
                                  HIGHEST YEAR SCHOOL COMPLETED, FATHER Dependent     .568    .019                   26.767         .000
                                  HIGHEST YEAR SCHOOL COMPLETED, MOTHER Dependent     .531    .019                   26.767         .000
Nominal by Interval   Eta         HIGHEST YEAR SCHOOL COMPLETED, FATHER Dependent     .692
                                  HIGHEST YEAR SCHOOL COMPLETED, MOTHER Dependent     .688

a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.

Page 45: Action Research More Crosstab Measures

RxC Measures Example

- Directional measures:
  - Somers’ d is significant
    - It shows that there are about 55% +/- 2% more concordant pairs than discordant ones, excluding ties on the independent variable
  - The Eta measure is around 0.69, so eta squared shows that around 48% (0.69 squared) of the variability of one parent’s education is shared with the other’s

Page 46: Action Research More Crosstab Measures

RxC Measures Example: Symmetric Measures

                                                 Value   Asymp. Std. Error(a)   Approx. T(b)   Approx. Sig.
Ordinal by Ordinal     Kendall's tau-b           .549    .019                   26.767         .000
                       Kendall's tau-c           .486    .018                   26.767         .000
                       Gamma                     .637    .020                   26.767         .000
                       Spearman Correlation      .663    .021                   27.381         .000(c)
Interval by Interval   Pearson's R               .675    .020                   28.314         .000(c)
N of Valid Cases                                 959

a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.
c. Based on normal approximation.

Page 47: Action Research More Crosstab Measures

RxC Measures Example

- All of the symmetric measures are statistically significant, with approximate t values around 27-28
  - The Kendall tau-b and tau-c measures disagree a little on the magnitude of the agreement
  - Gamma and Spearman give fairly strong positive correlations

Page 48: Action Research More Crosstab Measures

RxC Measures Example

- Spearman, like ‘r’, ranges from -1 to +1, and does not require a normal distribution
  - Based on ordered categories, not their values
- Even ‘r’ can be calculated for this case, and it gives results similar to Gamma and Spearman

Page 49: Action Research More Crosstab Measures

Yule’s Q

- A special case of gamma for a 2x2 table is called Yule’s Q
- It is appropriate for ordinal data in 2x2 tables; so values for each variable are Low/High, Yes/No, or similar
- Define Yule’s Q = (a*d - b*c) / (a*d + b*c)
  - See PDF page 59 of Action Research handout for the definition of a, b, c, and d (cell labels)
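A quick sketch of the calculation follows. The handout that defines the cell labels is not reproduced here, so the code assumes the common convention of labeling the cells row by row, [[a, b], [c, d]]. The function name is hypothetical, and the counts come from the 2x2 table on slide 22.

```python
def yules_q(a, b, c, d):
    """Yule's Q for a 2x2 table, assuming cells are labeled [[a, b], [c, d]]."""
    return (a * d - b * c) / (a * d + b * c)


# Should women work (rows) by vote for woman president (columns).
q = yules_q(713, 50, 146, 38)
print(round(q, 4))  # 0.5755
```

Since a*d / (b*c) is the odds ratio, Q can also be written as (OR - 1) / (OR + 1), which ties it back to the 3.712 in the Risk Estimate output.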

Page 50: Action Research More Crosstab Measures

Yule’s Q

- Measures the strength and direction of association from -1 (perfect negative association) to 0 (no association) to +1 (perfect positive association)
- Judge the results for Yule’s Q by the table on page 59 of Action Research handout; and see pages 58-64 for other related discussion