Describing Association for Discrete Variables


Discrete variables can have one of two different qualities:

1. ordered categories, e.g., “High,” “Medium,” and “Low” [both variables must be ordered]

2. non-ordered categories, e.g., “Yes” and “No”

Relationships between two variables may be either

1. symmetrical or 2. asymmetrical

Symmetrical means that we are only interested in describing the extent to which two variables “hang around together” [non-directional]

Symbolically,

X ↔ Y

Asymmetrical means that we want a measure of association that yields a different description of X’s influence on Y from Y’s influence on X [directional]

Symbolically,

X → Y

Y → X

                              Ordered Categories
Asymmetrical Relationship     No                        Yes
No                            Yule’s Q, Cramer’s V      Gamma (G)
Yes                           Lambda (λ)                Somers’ dyx

For symmetrical relationships between two non-ordered variables, there are two choices:

1. Yule’s Q (for 2x2 tables)

2. Cramer’s V (for larger tables)

Respondents in the 1997 General Social Survey (GSS 1997) were asked: Were they strong supporters of any political party (yes or no)? And did they vote in the 1996 presidential election (yes or no)?

                        Party Identification
Voting Turnout      Not Strong    Strong     Total
Voted                        a         b     a + b
Not Voted                    c         d     c + d
Total                    a + c     b + d     a + b + c + d

Yule's Q = (bc - ad) / (bc + ad)

                        Party Identification
Voting Turnout      Not Strong    Strong     Total
Voted                      615       339       954
Not Voted                  318        59       377
Total                      933       398     1,331

Q = (bc - ad) / (bc + ad)

Q = [(339)(318) - (615)(59)] / [(339)(318) + (615)(59)]

= [(107,802) - (36,285)] / [(107,802) + (36,285)]

= (71,517) / (144,087)

= 0.496
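The hand calculation above can be checked with a few lines of Python. This is only a minimal sketch: the function name `yules_q` and the argument order are illustrative choices, and the cell labels a, b, c, d follow the table layout used in these slides (a = voted/not strong, b = voted/strong, c = not voted/not strong, d = not voted/strong).

```python
def yules_q(a, b, c, d):
    """Yule's Q for a 2 x 2 table, using the slides' cell layout:

                   Not Strong   Strong
      Voted            a          b
      Not Voted        c          d
    """
    numerator = b * c - a * d
    denominator = b * c + a * d
    if denominator == 0:
        raise ValueError("Q is undefined when bc + ad = 0")
    return numerator / denominator


# GSS 1997 voting turnout by party identification
q = yules_q(a=615, b=339, c=318, d=59)
print(round(q, 3))  # 0.496
```

Swapping the two columns (or the two rows) of the table flips the sign of Q, so the sign is meaningful only relative to how the categories are arranged.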

What does this mean?

Yule’s Q varies from –1.00 (perfect inverse association) through 0.00 (statistical independence; no association) to +1.00 (perfect direct association).

Use the following rule of thumb (for now):

0.00 to 0.24 "No relationship"
0.25 to 0.49 "Weak relationship"
0.50 to 0.74 "Moderate relationship"
0.75 to 1.00 "Strong relationship"

Yule’s Q = +0.496, which rounds to 0.50, ". . . represents a moderate positive association between party identification strength and voting turnout."
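The rule of thumb can be written as a small lookup. The sketch below rests on two assumptions that the slides do not state explicitly: the labels are applied to the absolute value of the statistic, and the value is rounded to two decimals first so that numbers falling between the printed ranges (such as 0.496) land in a category. The function name `describe_strength` is illustrative.

```python
def describe_strength(value):
    """Label a measure of association using the slides' rule of thumb."""
    # Assumption: use the magnitude only, rounded to two decimals.
    magnitude = round(abs(value), 2)
    if magnitude < 0.25:
        return "No relationship"
    if magnitude < 0.50:
        return "Weak relationship"
    if magnitude < 0.75:
        return "Moderate relationship"
    return "Strong relationship"


print(describe_strength(0.496))  # Moderate relationship
```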

                        Party Identification
Voting Turnout      Not Strong    Strong     Total
Voted                        0       954       954
Not Voted                  377         0       377
Total                      377       954     1,331

What would be the value of Yule's Q?

Q = [(954)(377) - (0)(0)] / [(954)(377) + (0)(0)]

= [(359,658) - (0)] / [(359,658) + (0)]

= (359,658) / (359,658)

= 1.000

                        Party Identification
Voting Turnout      Not Strong    Strong     Total
Voted                      477       477       954
Not Voted                  189       188       377
Total                      666       665     1,331

In this case, Yule's Q would be:

Q = [(477)(189) - (477)(188)] / [(477)(189) + (477)(188)]

= [(90,153) - (89,676)] / [(90,153) + (89,676)]

= (477) / (179,829)

= 0.003

Yule's Q can be calculated only for 2 x 2 tables. For larger tables (e.g., 3 x 4 tables having three rows and four columns), most statistical programs such as SAS report the Cramer's V statistic. Cramer's V has properties similar to Yule's Q, but since it is computed from χ² it cannot take negative values:

V = sqrt{ χ² / [N x min(R - 1, C - 1)] }

where min(R - 1, C - 1) means either the number of rows less one or the number of columns less one, whichever is smaller, and N is the sample size.

In the example above, χ² = 50.968 and Cramer's V is

V = sqrt{ 50.968 / [(1,331)(1)] }

= sqrt(0.0383)

= 0.196
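As a rough check, the chi-square statistic and Cramer's V can be computed directly from the cell counts. The sketch below assumes the usual Pearson chi-square (observed versus expected counts under independence); the helper names `chi_square` and `cramers_v` are illustrative. Computed from the counts, chi-square comes out very close to the 50.968 quoted above (small differences are presumably due to rounding), and V agrees to three decimals.

```python
import math


def chi_square(table):
    """Pearson chi-square and sample size for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n


def cramers_v(table):
    """Cramer's V = sqrt(chi-square / (N * min(R - 1, C - 1)))."""
    chi2, n = chi_square(table)
    smaller_dim = min(len(table) - 1, len(table[0]) - 1)
    return math.sqrt(chi2 / (n * smaller_dim))


voting = [[615, 339],   # Voted:     Not Strong, Strong
          [318, 59]]    # Not Voted: Not Strong, Strong
print(round(cramers_v(voting), 3))  # 0.196
```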

For asymmetrical relationships between two non-ordered variables, the statistic of choice is:

Lambda (λ)

Lambda is calculated as follows:

λ = [(Non-modal responses on Y) - (Sum of non-modal responses for each category of X)] / (Non-modal responses on Y)

                        Party Identification
Voting Turnout      Not Strong    Strong     Total
Voted                      615       339       954
Not Voted                  318        59       377
Total                      933       398     1,331

In this example,

λ = [(377) - (318 + 59)] / (377)

= [(377) - (377)] / (377)

= (0) / (377)

= 0.00
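The same bookkeeping can be sketched in Python. The function below assumes, as in the example, that the dependent variable (Y) is the row variable and the independent variable (X) is the column variable; the name `lambda_y_given_x` is illustrative.

```python
def lambda_y_given_x(table):
    """Lambda for predicting the row variable (Y) from the column variable (X)."""
    row_totals = [sum(row) for row in table]
    n = sum(row_totals)
    # Non-modal responses on Y: everyone outside the largest row.
    non_modal_y = n - max(row_totals)
    if non_modal_y == 0:
        raise ValueError("lambda is undefined when Y has a single category")
    # Non-modal responses within each category of X: everyone outside
    # the largest cell of each column.
    non_modal_within_x = sum(sum(col) - max(col) for col in zip(*table))
    return (non_modal_y - non_modal_within_x) / non_modal_y


voting = [[615, 339],   # Voted
          [318, 59]]    # Not Voted
print(lambda_y_given_x(voting))  # 0.0
```

Lambda is zero here because "Voted" is the modal category within both party identification groups, so knowing X does not change the best guess for Y, even though the other measures show an association.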

For symmetrical relationships between two variables having ordered categories, the statistic of choice is:

Gamma (G)

G = (ns - nd) / (ns + nd)

where ns is the number of concordant pairs and nd is the number of discordant pairs

The concepts of concordant and discordant pairs are simple and are based on a generalization of the diagonal and off-diagonal in the Yule’s Q statistic.

Q = (bc - ad) / (bc + ad)

To construct concordant pairs: "Starting with the upper right cell (i.e., the first row, last column in the table), add together all frequencies in cells below AND to the left of this cell, then multiply that sum by the cell frequency. Move to the next cell (i.e., still row one, but now one column to the left) and do the same thing. Repeat until there are NO cells to the left AND below the target cell. Then sum up all these products to form the value for the concordant pairs."

To illustrate, take the crosstabulation below which shows the relationship between a measure of social class and respondents' satisfaction with their current financial situation:

                              Social Class
Financially Satisfied    Lower   Working   Middle   Upper    Total
Very well                   10       131      251      36      428
More or less                19       309      343      19      690
Not at all                  43       190       84       7      324
Total                       72       630      678      62    1,442

For this table, the calculations are:

36 x (343 + 309 + 19 + 84 + 190 + 43) = 35,568
251 x (309 + 19 + 190 + 43) = 140,811
131 x (19 + 43) = 8,122
19 x (84 + 190 + 43) = 6,023
343 x (190 + 43) = 79,919
309 x (43) = 13,287

These are NOT the value of the concordant pairs; they are the values that must be added together to determine the value of concordant pairs.

ns = (35,568 + 140,811 + 8,122 + 6,023 + 79,919 + 13,287)

ns = 283,730

To construct discordant pairs: "Starting with the upper left cell (i.e., the first row, first column in the table), add together all frequencies in cells below AND to the right of this cell, then multiply that sum by the cell frequency. Move to the next cell (i.e., still row one, but now one column to the right) and do the same thing. Repeat until there are NO cells to the right AND below the target cell. Then sum up all these products to form the value for the discordant pairs."

For the discordant pairs in this table, the calculations are:

10 x (309 + 343 + 19 + 190 + 84 + 7) = 9,520
131 x (343 + 19 + 84 + 7) = 59,343
251 x (19 + 7) = 6,526
19 x (190 + 84 + 7) = 5,339
309 x (84 + 7) = 28,119
343 x (7) = 2,401

Again, these are NOT the value of the discordant pairs; they are the values that must be added together to determine the value of discordant pairs.

nd = (9,520 + 59,343 + 6,526 + 5,339 + 28,119 + 2,401)

nd = 111,248

G = (ns - nd) / (ns + nd)

G = [(283,730) - (111,248)] / [(283,730) + (111,248)]

= (172,482) / (394,978)

= 0.437
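The counting rules for concordant and discordant pairs translate into a short loop. The sketch below follows the orientation of the table as printed in these slides (rows run from "Very well" down to "Not at all", columns from "Lower" across to "Upper"), so concordant partners of a cell lie below and to the left of it and discordant partners lie below and to the right. The names `count_pairs` and `gamma` are illustrative.

```python
def count_pairs(table):
    """Return (ns, nd) for a table given as a list of rows.

    Uses the slides' orientation: concordant partners are below AND to the
    left of a cell; discordant partners are below AND to the right.
    """
    n_rows = len(table)
    ns = nd = 0
    for i in range(n_rows):
        for j, cell in enumerate(table[i]):
            for k in range(i + 1, n_rows):
                ns += cell * sum(table[k][:j])       # below and to the left
                nd += cell * sum(table[k][j + 1:])   # below and to the right
    return ns, nd


def gamma(table):
    ns, nd = count_pairs(table)
    return (ns - nd) / (ns + nd)


satisfaction = [
    [10, 131, 251, 36],   # Very well
    [19, 309, 343, 19],   # More or less
    [43, 190, 84, 7],     # Not at all
]
ns, nd = count_pairs(satisfaction)
print(ns, nd, round(gamma(satisfaction), 3))  # 283730 111248 0.437
```

If a table is laid out with both variables running in the same direction (low to high down the rows and across the columns), the roles of the two diagonals swap and the sign of G flips accordingly.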

For asymmetrical relationships between two variables having ordered categories, the statistic of choice is:

Somers’ dyx

For this crosstabulation, we specify Social Class (the column variable) as the independent variable (X) and Financial Satisfaction (the row variable) as the dependent variable (Y).

                                Social Class (X)
Financially Satisfied (Y)   Lower   Working   Middle   Upper    Total
Very well                      10       131      251      36      428
More or less                   19       309      343      19      690
Not at all                     43       190       84       7      324
Total                          72       630      678      62    1,442

Somers' dyx statistic is created by adjusting concordant and discordant pairs for tied pairs on the dependent variable (Y).

In the example we have been using, the only asymmetrical relationship that makes sense is one with the dependent variable (Y) as the row variable. Therefore Somers' dyx will be shown only for this situation, that is, for tied pairs on the row variable. (Tied pairs for the column variable follow the identical logic.)

Tied pairs are pairs of respondents who are identical with respect to categories of the dependent variable but who differ on the category of the independent variable to which they belong. In the case of financial satisfaction, these are respondents who express the same satisfaction level but who identify themselves with different social classes. In other words, when the dependent variable is the row variable, the ties for a given cell are all the observations in the other cells in the same row.

The computational rule is: Target the upper left hand cell (in the first row, first column); multiply its value by the sum of the cell frequencies to the right in the same row; move to the cell to the right and multiply its value by the sum of the cell frequencies to the right in the same row; repeat until there are no more cells to the right in the same row; then move to the first cell in the next row (first column) and repeat until there are no more cells in the table having cells to the right. Add up these products.

Here, the products are:

10 x (131 + 251 + 36) = 4,180
131 x (251 + 36) = 37,597
251 x (36) = 9,036
19 x (309 + 343 + 19) = 12,749
309 x (343 + 19) = 111,858
343 x (19) = 6,517
43 x (190 + 84 + 7) = 12,083
190 x (84 + 7) = 17,290
84 x (7) = 588

Thus, tied pairs (Tr) for rows equals

Tr = (4,180 + 37,597 + 9,036 + 12,749 + 111,858 + 6,517 + 12,083 + 17,290 + 588)

= 211,898

Somers' dyx = (ns - nd) / (ns + nd + Tr)

In this example,

Somers' dyx = [(283,730) - (111,248)] / [(283,730) + (111,248) + (211,898)]

= (172,482) / (606,876)

= 0.284
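The tied-pair rule and the final ratio can be combined in one pass over the table. The sketch below assumes the dependent variable (Y) is the row variable, as in the example, and reuses the same table orientation as the Gamma calculation; the function name `somers_d_rows_dependent` is illustrative.

```python
def somers_d_rows_dependent(table):
    """Return (Somers' dyx, Tr) with the row variable as dependent (Y).

    ns and nd follow the slides' orientation (below-left and below-right);
    Tr counts pairs in the same row but in different columns.
    """
    n_rows = len(table)
    ns = nd = tr = 0
    for i in range(n_rows):
        for j, cell in enumerate(table[i]):
            tr += cell * sum(table[i][j + 1:])       # same row, to the right
            for k in range(i + 1, n_rows):
                ns += cell * sum(table[k][:j])       # concordant
                nd += cell * sum(table[k][j + 1:])   # discordant
    return (ns - nd) / (ns + nd + tr), tr


satisfaction = [
    [10, 131, 251, 36],   # Very well
    [19, 309, 343, 19],   # More or less
    [43, 190, 84, 7],     # Not at all
]
d_yx, tied_rows = somers_d_rows_dependent(satisfaction)
print(tied_rows, round(d_yx, 3))  # 211898 0.284
```

Because the tied pairs only add to the denominator, Somers' dyx is never larger in magnitude than Gamma for the same table (0.284 versus 0.437 here).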

                              Ordered Categories
Asymmetrical Relationship     No                        Yes
No                            Yule’s Q, Cramer’s V      Gamma (G)
Yes                           Lambda (λ)                Somers’ dyx

Using SAS to Produce Two-Way Frequency Distributions and Statistics

libname mystuff 'a:\';
libname library 'a:\';
options formchar='|----|+|---+=|-/\< >*' ps=66 nodate nonumber;

/* Crosstabulate CHURCH by MARRIED. EXPECTED adds expected cell counts;
   ALL requests the chi-square tests and measures of association. */
proc freq data=mystuff.marriage;
  tables church*married / expected all;
  title1 'Crosstabulation for Discrete Variables';
run;

Crosstabulation for Discrete Variables

TABLE OF CHURCH BY MARRIED

CHURCH     MARRIED

Frequency|
Expected |
Percent  |
Row Pct  |
Col Pct  |Divorced|Married |Never   |Separate|Widowed |  Total
---------+--------+--------+--------+--------+--------+
Annually |     74 |    269 |    129 |     18 |     43 |    533
         | 62.318 | 290.33 | 101.17 | 18.695 | 60.485 |
         |   5.09 |  18.50 |   8.87 |   1.24 |   2.96 |  36.66
         |  13.88 |  50.47 |  24.20 |   3.38 |   8.07 |
         |  43.53 |  33.96 |  46.74 |  35.29 |  26.06 |
---------+--------+--------+--------+--------+--------+
Monthly  |     30 |    149 |     50 |     10 |     26 |    265
         | 30.983 | 144.35 | 50.303 |  9.295 | 30.072 |
         |   2.06 |  10.25 |   3.44 |   0.69 |   1.79 |  18.23
         |  11.32 |  56.23 |  18.87 |   3.77 |   9.81 |
         |  17.65 |  18.81 |  18.12 |  19.61 |  15.76 |
---------+--------+--------+--------+--------+--------+
Never    |     32 |     85 |     34 |      6 |     16 |    173
         | 20.227 | 94.234 | 32.839 | 6.0681 | 19.632 |
         |   2.20 |   5.85 |   2.34 |   0.41 |   1.10 |  11.90
         |  18.50 |  49.13 |  19.65 |   3.47 |   9.25 |
         |  18.82 |  10.73 |  12.32 |  11.76 |   9.70 |
---------+--------+--------+--------+--------+--------+
Weekly   |     34 |    289 |     63 |     17 |     80 |    483
         | 56.472 | 263.09 | 91.684 | 16.942 | 54.811 |
         |   2.34 |  19.88 |   4.33 |   1.17 |   5.50 |  33.22
         |   7.04 |  59.83 |  13.04 |   3.52 |  16.56 |
         |  20.00 |  36.49 |  22.83 |  33.33 |  48.48 |
---------+--------+--------+--------+--------+--------+
Total         170      792      276       51      165     1454
            11.69    54.47    18.98     3.51    11.35   100.00

Crosstabulation for Discrete Variables

STATISTICS FOR TABLE OF CHURCH BY MARRIED

Statistic                            DF     Value      Prob
------------------------------------------------------------
Chi-Square                           12    57.792     0.000
Likelihood Ratio Chi-Square          12    57.806     0.000
Mantel-Haenszel Chi-Square            1     8.152     0.004
Phi Coefficient                             0.199
Contingency Coefficient                     0.196
Cramer's V                                  0.115

Statistic                                   Value       ASE
------------------------------------------------------------
Gamma                                       0.052     0.033
Kendall's Tau-b                             0.035     0.022
Stuart's Tau-c                              0.031     0.020

Somers' D C|R                               0.033     0.021
Somers' D R|C                               0.037     0.024

Pearson Correlation                         0.075     0.026
Spearman Correlation                        0.041     0.026

Lambda Asymmetric C|R                       0.000     0.000
Lambda Asymmetric R|C                       0.062     0.027
Lambda Symmetric                            0.036     0.016

Uncertainty Coefficient C|R                 0.016     0.004
Uncertainty Coefficient R|C                 0.015     0.004
Uncertainty Coefficient Symmetric           0.016     0.004

Sample Size = 1454

Exercise

Compute values for Lambda (λ), Gamma (G) and Somers' dyx for the following two-way frequency distribution. Assume that the row variable, self-described health, is the dependent (Y) variable.

                                    Education Degree Level
Self-Described Health   Less than H.S.   H.S.   Jr.Co.   Col.   Grad.Sch.   Total
Excellent                           69    227       20     82          37     435
Good                               156    403       26     77          34     696
Fair                               122    111        8     12           5     258
Poor                                50     16        0      3           0      69
Total                              397    757       54    174          76   1,458

Answers

1. The modal response on Y (self-described health) is "Good," with 696 respondents. Therefore, the non-modal responses are 435 + 258 + 69 = 762. Within the categories of X (education degree level), the non-modal responses total 241 + 354 + 28 + 92 + 39 = 754. Therefore,

Lambda = (762 - 754) / 762 = 0.010

2. Concordant pairs (ns) = 320,060 and discordant pairs (nd) = 130,272

Gamma = (320,060 - 130,272) / (320,060 + 130,272) = 189,788 / 450,332 = 0.421

3. Tied pairs (Tr) = 227,737. Therefore,

Somers' dyx = (320,060 - 130,272) / (320,060 + 130,272 + 227,737)

= 189,788 / 678,069 = 0.280