Action Research More Crosstab Measures
INFO 515 Lecture #9 1
Action Research
More Crosstab Measures
INFO 515
Glenn Booker
INFO 515 Lecture #9 2
Nominal Crosstab Tests
Four more measures which could apply to nominal data in a crosstab:
  Eta
  Lambda
  Goodman and Kruskal’s tau
  Uncertainty coefficient
INFO 515 Lecture #9 3
Eta Coefficient
Used when the dependent variable uses an interval or ratio scale, and the independent variable is nominal or ordinal
Eta (η) squared is the proportion of the dependent variable’s variance which is explained by the independent variable
Eta squared is directional (its value depends on which variable is treated as dependent), and ranges from 0 to 1
This is the same eta from the end of lecture 6
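The variance-explained definition of eta squared can be sketched in a few lines of Python. This is a minimal illustration, not SPSS's implementation; the function name `eta_squared` and the sample data are hypothetical and do not come from the GSS91 files.

```python
def eta_squared(groups):
    """Share of the dependent variable's variance explained by group membership.

    `groups` maps each category of the independent variable to the list of
    dependent-variable values observed in that category.
    """
    values = [v for vals in groups.values() for v in vals]
    grand_mean = sum(values) / len(values)
    # Total variation of the dependent variable around its overall mean.
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    # Variation of the group means around the overall mean.
    ss_between = sum(
        len(vals) * (sum(vals) / len(vals) - grand_mean) ** 2
        for vals in groups.values()
    )
    return ss_between / ss_total


# Hypothetical data: years of education grouped by a made-up nominal variable.
sample = {"north": [10, 12, 14], "south": [14, 16, 18]}
print(round(eta_squared(sample), 3))  # 0.6
```

Swapping the roles of the two variables changes the answer, which is why the output later in the lecture reports one eta per choice of dependent variable.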
INFO 515 Lecture #9 4
Directional vs Symmetric
Directional measures give a different answer depending on whether A is dependent on B, or B is dependent on A
Symmetric measures don’t care which variable is dependent or independent
Tests indicate whether there is a statistically significant relationship; measures, here, describe the strength of association
INFO 515 Lecture #9 5
Directional Measures
Directional measures help determine how much the dependent variable is affected by the independent variable
Directional measures for nominal data:
  Lambda (recommended)
  Goodman and Kruskal’s tau
  Uncertainty coefficient
INFO 515 Lecture #9 6
Directional Measures
Directional measures generally range from 0 to 1
A value of 0 means the independent variable doesn’t help predict the dependent variable
A value of 1 means the independent variable perfectly predicts the resulting dependent variable
INFO 515 Lecture #9 7
Directional Measures
In this context, either variable can be considered dependent or independent
  Does A predict B?
  Does B predict A?
A “symmetric” value is the weighted average of the two possible selections (A predicts B, or B predicts A)
INFO 515 Lecture #9 8
Proportional Reduction in Error
Proportional Reduction in Error (PRE) measures find the fractional reduction in errors due to some factor (such as an independent variable)
  PRE = (Error without X – Error with X) / Error without X
Two we’ll look at are Lambda, and Goodman and Kruskal’s Tau
INFO 515 Lecture #9 9
Lambda Coefficient
Lambda has a symmetric option for output
Its Value is the proportional reduction in prediction errors for the dependent variable when the independent one is known
The Asymptotic Std. Error allows a 95% confidence interval to be made
“Approx. T” is the Value divided by the Std. Error if the parameter were zero (not the usual definition!)
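As a sketch of how lambda follows from the PRE idea: without the independent variable we always guess the most common category of the dependent variable; with it, we guess the most common category within each row. The function name `lambda_coefficient` and the 2x2 counts below are hypothetical, chosen only to show that the two directions can differ.

```python
def lambda_coefficient(table, dependent="columns"):
    """Lambda for a crosstab given as a list of rows of counts."""
    if dependent == "rows":
        # Transpose so the dependent variable always runs across the columns.
        table = [list(col) for col in zip(*table)]
    n = sum(sum(row) for row in table)
    col_totals = [sum(col) for col in zip(*table)]
    # Errors when always guessing the overall modal category.
    errors_without = n - max(col_totals)
    # Errors when guessing the modal category within each row.
    errors_with = n - sum(max(row) for row in table)
    return (errors_without - errors_with) / errors_without


# Hypothetical counts, not taken from any GSS91 data set.
table = [[30, 10],
         [20, 40]]
print(lambda_coefficient(table, dependent="columns"))  # 0.4
print(lambda_coefficient(table, dependent="rows"))     # 0.25
```

The sketch does not guard against a table where one category holds every case (errors_without would be zero), and it does not compute the standard error or significance that SPSS reports.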
INFO 515 Lecture #9 10
Goodman and Kruskal’s Tau
SPSS note: Goodman and Kruskal’s Tau is not directly selected; it appears only when Lambda is checked!
Does not have Symmetric option
Does not approximate T
Significance is based on a chi-square approximation
Otherwise similar to Lambda for interpretation
INFO 515 Lecture #9 11
Uncertainty Coefficient
Does have symmetric dependency option
Does have T approximation
Significance is also based on chi square (the likelihood ratio chi-square)
Goodman and Kruskal’s tau and the Uncertainty Coefficient may give results that conflict with Lambda, so use them cautiously!
INFO 515 Lecture #9 12
Nominal Example
Use “GSS91 political.sav” data set
Use Analyze / Descriptive Statistics / Crosstabs…
Select “region” for Row(s), and “relig” for Column(s)
Under “Statistics…” select Lambda, and Uncertainty Coefficient
INFO 515 Lecture #9 13
Nominal Example
Directional Measures (Nominal by Nominal)

Measure                  Dependent variable                 Value  Asymp. Std. Error(a)  Approx. T(b)  Approx. Sig.
Lambda                   Symmetric                          .042   .009                  4.336         .000
                         REGION OF INTERVIEW Dependent      .048   .012                  4.083         .000
                         RS RELIGIOUS PREFERENCE Dependent  .028   .017                  1.648         .099
Goodman and Kruskal tau  REGION OF INTERVIEW Dependent      .017   .003                                .000(c)
                         RS RELIGIOUS PREFERENCE Dependent  .073   .010                                .000(c)
Uncertainty Coefficient  Symmetric                          .044   .006                  7.720         .000(d)
                         REGION OF INTERVIEW Dependent      .033   .004                  7.720         .000(d)
                         RS RELIGIOUS PREFERENCE Dependent  .070   .009                  7.720         .000(d)

a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.
c. Based on chi-square approximation.
d. Likelihood ratio chi-square probability.
INFO 515 Lecture #9 14
Nominal Example - Lambda
Focus on the Lambda (λ) output first
Lambda measures the percent of error reduction when using the independent variable to predict the dependent variable
  Calculation based on any desired outcome contributing to lambda
Lambda ranges from 0 to 1
INFO 515 Lecture #9 15
Nominal Example
As usual, we want Sig. < 0.050 for the meaning of lambda to be statistically significant
If Region is dependent, then we see that religious preference is a significant (sig. = 0.000) predictor
  Knowing “relig” reduces the error in predicting a person’s region by (Value) 4.8% +/- (Std Error) 1.2%
INFO 515 Lecture #9 16
Lambda Example
95% confidence interval of that contribution is (not shown) 4.8 – 2*1.2 = 2.4% to 4.8 + 2*1.2 = 7.2%
But “region” is not a significant predictor of “relig” (sig. = 0.099)
  Ignore the value of lambda if it isn’t significant
The symmetric value is significant, and its Value is between the other two lambda values
INFO 515 Lecture #9 17
G and K Tau Example
Goodman and Kruskal’s tau (τ) is similar to lambda, but is based on predictions in the same proportion as the marginal totals (individual row or column subtotals)
No symmetric value is given – it’s only directional
Same method for interpretation, but notice it finds both variables significant as dependent, and ‘relig’ is much stronger!
Still from slide 13
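The "predictions in proportion to the marginal totals" idea can be sketched as a PRE calculation: instead of always guessing the modal category, each case is assigned a category at random with the marginal proportions, and the expected number of errors is compared with and without the independent variable. The function name `goodman_kruskal_tau` and the counts are hypothetical; the significance test SPSS attaches to tau is not reproduced here.

```python
def goodman_kruskal_tau(table, dependent="columns"):
    """Goodman and Kruskal's tau for a crosstab given as a list of rows."""
    if dependent == "rows":
        # Transpose so the dependent variable always runs across the columns.
        table = [list(col) for col in zip(*table)]
    n = sum(sum(row) for row in table)
    col_totals = [sum(col) for col in zip(*table)]
    # Expected errors when guessing in proportion to the overall column totals.
    errors_without = n - sum(c * c for c in col_totals) / n
    # Expected errors when guessing in proportion to the counts within each row.
    errors_with = n - sum(
        cell * cell / sum(row) for row in table if sum(row) > 0 for cell in row
    )
    return (errors_without - errors_with) / errors_without


# Hypothetical counts, not taken from any GSS91 data set.
table = [[30, 10],
         [20, 40]]
print(round(goodman_kruskal_tau(table), 4))
```

For a table whose rows are exact multiples of one another (no association) the function returns 0, and for a purely diagonal table it returns 1.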
INFO 515 Lecture #9 18
Uncertainty Coefficient Example
Is a measure of association that indicates the proportional reduction in error when values of one variable are used to predict values of the other variable
The program calculates both symmetric and directional versions of it
Here, gives results similar to G and K Tau
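For the uncertainty coefficient, the "error" being reduced is entropy: how uncertain we are about the dependent variable before and after the independent variable is known. A minimal sketch of the directional version follows, with hypothetical function names and made-up counts; it is not SPSS's code and omits the standard error and significance.

```python
import math


def entropy(counts):
    """Shannon entropy (natural log) of a list of counts; empty cells are skipped."""
    total = sum(counts)
    return -sum(c / total * math.log(c / total) for c in counts if c > 0)


def uncertainty_coefficient(table, dependent="columns"):
    """Fraction of the dependent variable's entropy removed by knowing the other."""
    if dependent == "rows":
        # Transpose so the dependent variable always runs across the columns.
        table = [list(col) for col in zip(*table)]
    h_x = entropy([sum(row) for row in table])          # independent variable
    h_y = entropy([sum(col) for col in zip(*table)])    # dependent variable
    h_xy = entropy([cell for row in table for cell in row])
    return (h_x + h_y - h_xy) / h_y


# Hypothetical counts, not taken from any GSS91 data set.
table = [[30, 10],
         [20, 40]]
print(round(uncertainty_coefficient(table, dependent="columns"), 4))
print(round(uncertainty_coefficient(table, dependent="rows"), 4))
```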
INFO 515 Lecture #9 19
Tests for 2x2 Tables
Many special measures can be applied to a 2x2 table, including:
  Relative risk
  Odds ratio
Look at these in the context of answering questions like: “Are people who approve of women working more likely to vote for a woman President?”
INFO 515 Lecture #9 20
Tests for 2x2 Tables
Use “GSS91 social.sav” data set
Variables are “should women work” (fework) and “vote for woman president” (fepres)
Isolate the cases using Data / Select Cases
Use the If condition
  (fepres=1 | fepres=2) & (fework=1 | fework=2)
  ‘|’ means ‘or’; ‘&’ means ‘and’
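The logic of the If condition can be mirrored in Python to see which cases survive the filter. The records below are hypothetical, and the non-substantive codes 0 and 8 are assumptions made for illustration; check the data set's value labels for the real codes.

```python
# Hypothetical cases; only codes 1 and 2 are treated as substantive answers.
cases = [
    {"fepres": 1, "fework": 1},
    {"fepres": 2, "fework": 1},
    {"fepres": 1, "fework": 8},  # assumed "don't know" code, dropped
    {"fepres": 0, "fework": 2},  # assumed "not asked" code, dropped
]


def keep(case):
    """Same logic as (fepres=1 | fepres=2) & (fework=1 | fework=2)."""
    return (case["fepres"] == 1 or case["fepres"] == 2) and (
        case["fework"] == 1 or case["fework"] == 2
    )


selected = [case for case in cases if keep(case)]
print(len(selected))  # 2
```

Both parenthesized groups must be true for a case to be kept, so a missing or "don't know" answer on either question removes the case and leaves a clean 2x2 table.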
INFO 515 Lecture #9 21
Tests for 2x2 Tables
Use Analyze / Descriptive Statistics / Crosstabs…
Select “fework” for Row(s), and “fepres” for Column(s)
For Statistics select Risk
For Cells select Row percentages
This gives 947 valid cases
INFO 515 Lecture #9 22
Tests for 2x2 Tables
SHOULD WOMEN WORK * VOTE FOR WOMAN PRESIDENT Crosstabulation

                                          VOTE FOR WOMAN PRESIDENT
SHOULD WOMEN WORK                         YES     NO      Total
APPROVE     Count                         713     50      763
            % within SHOULD WOMEN WORK    93.4%   6.6%    100.0%
DISAPPROVE  Count                         146     38      184
            % within SHOULD WOMEN WORK    79.3%   20.7%   100.0%
Total       Count                         859     88      947
            % within SHOULD WOMEN WORK    90.7%   9.3%    100.0%
INFO 515 Lecture #9 23
Tests for 2x2 Tables
‘cohort’ = subset
Risk Estimate

                                                                95% Confidence Interval
                                                         Value  Lower  Upper
Odds Ratio for SHOULD WOMEN WORK (APPROVE / DISAPPROVE)  3.712  2.348  5.867
For cohort VOTE FOR WOMAN PRESIDENT = YES                1.178  1.091  1.271
For cohort VOTE FOR WOMAN PRESIDENT = NO                 .317   .215   .469
N of Valid Cases                                         947
INFO 515 Lecture #9 24
Relative Risk
The relative risk is a ratio of percentages
It is very directional
Those who (approve of women working) are 1.178 times as likely to (say they would vote for a woman president) as those who disapprove
  Based on the row percentages 93.4%/79.3% = 1.178
Note the 95% confidence intervals for each ratio are given; roughly 1.09 to 1.27 for this example
INFO 515 Lecture #9 25
Relative Risk
Conversely, those who approve of women working are 0.317 times as likely to say they would not vote for a woman president as those who disapprove (6.6/20.7 is about 0.32 from the rounded percentages; the exact counts give 0.317), with a broader confidence interval of 0.22 to 0.47
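The relative risks and their confidence intervals can be reproduced from the counts on slide 22. The sketch below assumes the usual log-scale interval (an assumption about the method, although it does match the Risk Estimate output on slide 23); the function name `relative_risk` is hypothetical.

```python
import math


def relative_risk(a, b, c, d, z=1.96):
    """Risk of the first-column outcome in row 1 relative to row 2.

    Table layout: row 1 = (a, b), row 2 = (c, d).
    Returns (ratio, lower, upper) for an approximate 95% interval.
    """
    n1, n2 = a + b, c + d
    ratio = (a / n1) / (c / n2)
    # Standard error of the log of the ratio (assumed log-scale method).
    se_log = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n2)
    lower = math.exp(math.log(ratio) - z * se_log)
    upper = math.exp(math.log(ratio) + z * se_log)
    return ratio, lower, upper


# Counts from slide 22: APPROVE row (713 yes, 50 no), DISAPPROVE row (146 yes, 38 no).
rr_yes = relative_risk(713, 50, 146, 38)   # cohort "vote for woman president = YES"
rr_no = relative_risk(50, 713, 38, 146)    # cohort "vote for woman president = NO"
print([round(x, 3) for x in rr_yes])
print([round(x, 3) for x in rr_no])
```

Working from the counts rather than the rounded percentages avoids the small discrepancy seen when dividing 6.6 by 20.7.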
INFO 515 Lecture #9 26
Odds Ratio
The odds of an event are the ratio of (the probability that the event occurs) to (the probability that the event does not occur); the odds ratio compares the odds for two groups
The odds ratio for (voting for a woman president), comparing those who approve of women working with those who disapprove, has two terms
  One is the odds of (voting for a woman president) among (those who approve of women working) (93.4/6.6=14.152)...
INFO 515 Lecture #9 27
Odds Ratio
  Divided by the odds of (voting for a woman president) among (those who do NOT approve of women working) (79.3/20.7=3.831)
Hence the odds ratio is 14.152/3.831 = 3.694, or (93.4*20.7)/(6.6*79.3)
Round off error, probably in the 6.6 value, kept us from getting the stated odds ratio of 3.712 (first row of output on slide 23)
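Using the cell counts from slide 22 instead of the rounded percentages recovers the stated odds ratio. The sketch below also computes a log-scale 95% interval, which is an assumed method that happens to match the output on slide 23; the function name `odds_ratio` is hypothetical.

```python
import math


def odds_ratio(a, b, c, d, z=1.96):
    """Odds ratio for a 2x2 table with rows (a, b) and (c, d).

    Returns (ratio, lower, upper) for an approximate 95% interval.
    """
    ratio = (a * d) / (b * c)
    # Standard error of the log odds ratio (assumed log-scale method).
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(ratio) - z * se_log)
    upper = math.exp(math.log(ratio) + z * se_log)
    return ratio, lower, upper


# Exact counts: APPROVE row (713, 50), DISAPPROVE row (146, 38).
exact = odds_ratio(713, 50, 146, 38)
# Same calculation from the rounded row percentages, as on the slide.
rounded = (93.4 * 20.7) / (6.6 * 79.3)
print([round(x, 3) for x in exact])
print(round(rounded, 3))
```

The gap between the two results comes entirely from rounding the row percentages to one decimal place.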
INFO 515 Lecture #9 28
Square Tables (RxR)
Tables with the same number of rows as columns (RxR tables) also have special measures
  Cohen’s Kappa (κ), which measures the strength of agreement (did two people’s measurements match well?)
Applies for R values of one nominal variable
INFO 515 Lecture #9 29
Kappa
Kappa is used only when the rows and columns have the same categories
  Set of possible diagnoses achieved by two different doctors
  Two sets of outcomes which are believed to be dependent on each other
Kappa is zero when agreement is no better than chance, and one when the diagonal has the only non-zero values (it can go negative if agreement is worse than chance)
INFO 515 Lecture #9 30
Kappa Example
Example here is the educational level of one’s parents (maeduc and paeduc; as in ‘ma and pa education’)
Use “GSS91 social.sav” data set
Define new variables madeg and padeg, which are derived from maeduc and paeduc (convert years of education into rough levels of achievement)
INFO 515 Lecture #9 31
Kappa Example
New scale for madeg and padeg is
  Education <12 is code 1, “LT High School”
  Education 12-15 is code 2, “High School”
  Education 16 is code 3, “Bachelor degree”
  Education 17+ is code 4, “Graduate”
Use Analyze / Descriptive Statistics / Crosstabs…
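The recoding rule for madeg and padeg can be written as a small function. This is only a sketch of the mapping listed above; the function name `education_level` is hypothetical, and how missing-value codes in maeduc and paeduc should be treated is not stated in the slides, so they are not handled here.

```python
def education_level(years):
    """Map years of education to the (code, label) scale used for madeg/padeg."""
    if years < 12:
        return 1, "LT High School"
    if years <= 15:
        return 2, "High School"
    if years == 16:
        return 3, "Bachelor degree"
    return 4, "Graduate"


for years in (8, 12, 15, 16, 20):
    print(years, education_level(years))
```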
INFO 515 Lecture #9 32
Kappa Example
Select “padeg” for Row(s), and “madeg” for Column(s)
For Statistics select Kappa
The basic crosstab just shows the data counts (next slide)
Then we get the Kappa measure (slide after next)
As usual, check to make sure the result is significant before going any further
INFO 515 Lecture #9 33
Kappa Example
Father's education * Mother's education Crosstabulation
Count

                    Mother's education
Father's education  LT High School  High School  Bachelor degree  Graduate  Total
LT High School      284             125          6                55        470
High School         77              270          23               41        411
Bachelor degree     5               61           24               13        103
Graduate            137             149          26               204       516
Total               503             605          79               313       1500
INFO 515 Lecture #9 34
Kappa Example
Symmetric Measures

                             Value  Asymp. Std. Error(a)  Approx. T(b)  Approx. Sig.
Measure of Agreement  Kappa  .325   .018                  20.332        .000
N of Valid Cases             1500

a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.
INFO 515 Lecture #9 35
Kappa Example
Here the significance is 0.000, very clearly significant (< 0.050)
This is confirmed by the approximate T of over 20 - as before, this T is based on the null hypothesis
The actual value of kappa and its standard error are 0.325 +/- 0.018
What does this mean?
INFO 515 Lecture #9 36
Kappa
Kappa is judged on a fairly fixed scale
  Kappa below 0.40 indicates poor agreement beyond chance
  Kappa from 0.40 to 0.75 is fair to good agreement
  Kappa above 0.75 is strong agreement
So in this case we are confident there is poor agreement between parents’ education
Scale from J.L. Fleiss, 1981
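The kappa value can be checked by hand from the crosstab counts on slide 33: compare the observed share of cases on the diagonal with the share expected by chance from the row and column totals. The sketch below uses hypothetical function names and does not reproduce the standard error or significance that SPSS reports.

```python
def cohens_kappa(table):
    """Cohen's kappa for a square table of counts (list of rows)."""
    n = sum(sum(row) for row in table)
    # Observed agreement: share of cases on the diagonal.
    observed = sum(table[i][i] for i in range(len(table))) / n
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    # Agreement expected by chance from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    return (observed - expected) / (1 - expected)


def fleiss_label(kappa):
    """Rough verbal scale quoted on the slide (attributed to Fleiss, 1981)."""
    if kappa < 0.40:
        return "poor"
    if kappa <= 0.75:
        return "fair to good"
    return "strong"


# Father's education (rows) by mother's education (columns), from slide 33.
parents = [
    [284, 125, 6, 55],
    [77, 270, 23, 41],
    [5, 61, 24, 13],
    [137, 149, 26, 204],
]
kappa = cohens_kappa(parents)
print(round(kappa, 3), fleiss_label(kappa))  # 0.325 poor
```

Here about 52% of couples fall on the diagonal, but about 29% would do so by chance alone, which is why kappa is much lower than the raw agreement rate.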
INFO 515 Lecture #9 37
Ordinal Crosstab Measures
Several association measures can be used for a table with R rows and C columns which contain ordinal data (and presumably R ≠ C)
  Kendall’s tau-b
  Kendall’s tau-c
  (Goodman and Kruskal’s) Gamma (preferred)
  Somers’ d
  Spearman’s Correlation Coefficient
INFO 515 Lecture #9 38
General RxC Table Measures
Many are based on comparing pairs of cases on the two variables
  If B increases when A increases, the pair is concordant
  If B decreases when A increases, the pair is discordant
  If the two cases have equal values of A or of B, the pair is tied
INFO 515 Lecture #9 39
General RxC Table Measures
The number of concordant pairs is “P”
The number of discordant pairs is “Q”
The number of pairs tied on X (only) is “Tx”
The number of pairs tied on Y (only) is “Ty”
The smaller of the number of rows R and columns C is called “m”
  m = min(R,C)
Given this vocabulary, we can define many measures
INFO 515 Lecture #9 40
General RxC Table Measures
Kendall’s tau-b is
  tau-b = (P-Q) / sqrt[ (P+Q+Tx)*(P+Q+Ty) ]
Kendall’s tau-c is
  tau-c = 2m*(P-Q) / [N^2*(m-1)], where N is the number of cases
Gamma (γ) is
  Gamma = (P-Q) / (P+Q)
Somers’ d is
  dy = (P-Q) / (P+Q+Ty) or dx = (P-Q) / (P+Q+Tx)
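These formulas can be tried on a small table by counting pairs directly. In the sketch below, rows are the ordered categories of X and columns the ordered categories of Y; Tx and Ty are taken to mean pairs tied on that variable only, which is an assumption about the slide's notation. The 3x3 counts and the function names are hypothetical.

```python
import math


def pair_counts(table):
    """Return (P, Q, Tx, Ty) for a crosstab with ordered rows (X) and columns (Y)."""
    rows, cols = len(table), len(table[0])
    p = q = tx = ty = 0
    for i in range(rows):
        for j in range(cols):
            count = table[i][j]
            for k in range(i, rows):
                for l in range(cols):
                    other = table[k][l]
                    if k == i and l > j:
                        tx += count * other   # same X category, different Y
                    elif k > i and l == j:
                        ty += count * other   # same Y category, different X
                    elif k > i and l > j:
                        p += count * other    # both variables move the same way
                    elif k > i and l < j:
                        q += count * other    # variables move in opposite ways
    return p, q, tx, ty


def ordinal_measures(table):
    """Compute the measures defined on the slide from the pair counts."""
    p, q, tx, ty = pair_counts(table)
    n = sum(sum(row) for row in table)
    m = min(len(table), len(table[0]))
    return {
        "tau_b": (p - q) / math.sqrt((p + q + tx) * (p + q + ty)),
        "tau_c": 2 * m * (p - q) / (n ** 2 * (m - 1)),
        "gamma": (p - q) / (p + q),
        "d_y": (p - q) / (p + q + ty),
        "d_x": (p - q) / (p + q + tx),
    }


# Hypothetical 3x3 table of counts, not taken from any GSS91 data set.
table = [[20, 10, 5],
         [10, 20, 10],
         [5, 10, 20]]
print(pair_counts(table))  # (2200, 625, 1200, 1200)
print({name: round(value, 3) for name, value in ordinal_measures(table).items()})
```

Gamma ignores ties entirely, so it is never smaller in magnitude than tau-b or Somers' d computed from the same table.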
INFO 515 Lecture #9 41
General RxC Table Measures
All of the RxC measures are symmetric except Somers’ d, which has both symmetric and directional values given
All are evaluated by their significance, which also has an approximate T score
All are expressed by a Value +/- its Std Error
INFO 515 Lecture #9 42
RxC Measures Example
Use “GSS91 social.sav” data set
Use Analyze / Descriptive Statistics / Crosstabs…
Select “paeduc” for Row(s), and “maeduc” for Column(s)
Under “Statistics…” select Eta, Correlations, Gamma, Somers’ d, Kendall’s tau-b and tau-c
INFO 515 Lecture #9 43
RxC Measures Example
This compares the number of years of education of one’s mother and father to see how strongly they affect one another
The crosstab data table is very large, since it ranges from 0 to 20 for each category, with irregular gaps (we’re not using the simplified categories from the Kappa example)
  Hence we’re not showing it here!
INFO 515 Lecture #9 44
RxC Measures Example
Both measures show the mother’s education is a slightly better predictor
Directional Measures

                                                                                 Value  Asymp. Std. Error(a)  Approx. T(b)  Approx. Sig.
Ordinal by Ordinal   Somers' d  Symmetric                                        .549   .019                  26.767        .000
                                HIGHEST YEAR SCHOOL COMPLETED, FATHER Dependent  .568   .019                  26.767        .000
                                HIGHEST YEAR SCHOOL COMPLETED, MOTHER Dependent  .531   .019                  26.767        .000
Nominal by Interval  Eta        HIGHEST YEAR SCHOOL COMPLETED, FATHER Dependent  .692
                                HIGHEST YEAR SCHOOL COMPLETED, MOTHER Dependent  .688

a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.
INFO 515 Lecture #9 45
RxC Measures Example
Directional measures:
  Somers’ d is significant
  It shows that the excess of concordant over discordant pairs is about 55% +/- 2% of all pairs not tied on the independent variable
  Eta is about 0.69, so eta squared is about 0.48: around 48% of the variability of one parent’s education is shared with the other’s
INFO 515 Lecture #9 46
RxC Measures Example
Symmetric Measures

                                            Value  Asymp. Std. Error(a)  Approx. T(b)  Approx. Sig.
Ordinal by Ordinal    Kendall's tau-b       .549   .019                  26.767        .000
                      Kendall's tau-c       .486   .018                  26.767        .000
                      Gamma                 .637   .020                  26.767        .000
                      Spearman Correlation  .663   .021                  27.381        .000(c)
Interval by Interval  Pearson's R           .675   .020                  28.314        .000(c)
N of Valid Cases                            959

a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.
c. Based on normal approximation.
INFO 515 Lecture #9 47
RxC Measures Example
All of the symmetric measures are statistically significant, with approximate t values around 27-28
  The Kendall tau-b and tau-c measures disagree a little on the magnitude of the agreement
  Gamma and Spearman give fairly strong positive correlations
INFO 515 Lecture #9 48
RxC Measures Example
Spearman, like ‘r’, ranges from -1 to +1, and does not require a normal distribution
  Based on ordered categories, not their values
Even ‘r’ can be calculated for this case, and it gives results similar to Gamma and Spearman
INFO 515 Lecture #9 49
Yule’s Q
A special case of gamma for a 2x2 table is called Yule’s Q
It is appropriate for ordinal data in 2x2 tables; so values for each variable are Low/High, Yes/No, or similar
Define Yule’s Q = (a*d – b*c) / (a*d + b*c)
See PDF page 59 of Action Research handout for the definition of a, b, c, and d (cell labels)
INFO 515 Lecture #9 50
Yule’s Q
Measures the strength and direction of association from -1 (perfect negative association) to 0 (no association) to +1 (perfect positive association)
Judge the results for Yule’s Q by the table on page 59 of Action Research handout; and see pages 58-64 for other related discussion
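As an illustration only (the slides do not compute it), Yule’s Q can be evaluated for the 2x2 table from slide 22. The sketch assumes that a and b are the top-row cells and c and d the bottom-row cells, which should be checked against the handout; the function name `yules_q` is hypothetical.

```python
def yules_q(a, b, c, d):
    """Yule's Q for a 2x2 table, assuming rows (a, b) and (c, d)."""
    return (a * d - b * c) / (a * d + b * c)


# Counts from slide 22: APPROVE row (713, 50), DISAPPROVE row (146, 38).
q = yules_q(713, 50, 146, 38)
print(round(q, 3))  # 0.576

# Q is a rescaling of the odds ratio: Q = (OR - 1) / (OR + 1).
odds = (713 * 38) / (50 * 146)
print(round((odds - 1) / (odds + 1), 3))  # 0.576
```

Because a*d counts the concordant pairs and b*c the discordant pairs in a 2x2 table, this is the same quantity as gamma computed on that table.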