Contingency Tables

• Chapters Seven, Sixteen, and Eighteen

• Chapter Seven

– Definition of Contingency Tables

– Basic Statistics

– SPSS program (Crosstabulation)

• Chapter Sixteen

– Basic Probability Theory Concepts

– Test of Hypothesis of Independence

Basic Empirical Situation

• Unit of data.

• Two nominal scales measured for each unit.

– Example: in an interview study, the sex of the respondent and a variable such as whether or not the subject has a cellular telephone.

– Objective is to compare males and females with respect to what fraction have cellular telephones.

Contingency Table

• One column for each value of the column variable; C is the number of columns.

• One row for each value of the row variable; R is the number of rows.

• R x C contingency table.

Contingency Table

• Each entry is the OBSERVED COUNT O(i,j) of the number of units having the (i,j) contingency.

• Column of marginal totals.

• Row of marginal totals.

Basic Hypothesis

• ASSUME column variable is the independent variable.

• Hypothesis is independence.

• That is, the conditional distribution in any column is the same as the conditional distribution in any other column.

Expected Count

• Basic idea is proportional allocation of observations in a column based on column total.

• Expected count in (i, j) contingency = E(i,j) = (total number in row i × total number in column j) / total number in table.

• Expected count need not be an integer; one expected count for each contingency.
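The proportional-allocation rule can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: it uses the marijuana-use counts that appear later in these notes (rows are use at time 4, columns are use at time 3), and the function name `expected_counts` is a hypothetical helper.

```python
# Observed counts from the marijuana-use table in these notes:
# rows = use at time 4 (no, yes), columns = use at time 3 (no, yes).
observed = [
    [120, 9],
    [95, 142],
]


def expected_counts(table):
    """Return E(i,j) = row total i * column total j / table total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


expected = expected_counts(observed)
for row in expected:
    print([round(e, 2) for e in row])
```

The expected counts come out near 75.78, 53.22, 139.22, and 97.78. None is an integer, and each row and column of expected counts adds up to the same marginal total as the observed table.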

Residual

• Residual in (i,j) contingency = observed count in (i,j) contingency - expected count in (i,j) contingency.

• That is, R(i,j)= O(i,j)-E(i,j)

• One residual for each contingency.

Pearson Chi-squared Component

• Chi-squared component for (i, j) contingency = C(i,j) = (residual in (i, j) contingency)^2 / expected count in (i, j) contingency.

• C(i,j) = (R(i,j))^2 / E(i,j)

Assessing Pearson Component

• Rough guides on whether the (i, j) contingency has an excessively large chi-squared component C(i,j), based on the chi-squared critical values for 1 degree of freedom:

– The observed significance level of 3.84 is about 0.05.

– Of 6.63 is about 0.01.

– Of 10.83 is about 0.001.
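These rough cutoffs can be checked numerically. The sketch below assumes each component is compared with a chi-squared distribution on 1 degree of freedom (an assumption of this example, not a statement from the slides). For 1 degree of freedom the upper-tail probability equals erfc(sqrt(x/2)), so only the standard library is needed. The name `upper_tail_1df` is hypothetical.

```python
import math


def upper_tail_1df(x):
    """P(X > x) for a chi-squared variable X with 1 degree of freedom.

    X is the square of a standard normal Z, so
    P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2.0))


for cutoff in (3.84, 6.63, 10.83):
    print(cutoff, round(upper_tail_1df(cutoff), 4))
```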

Pearson Chi-Squared Test

• Sum C(i,j) over all contingencies.

• Pearson chi-squared test has (R-1)(C-1) degrees of freedom.

• Under null hypothesis

– Expected value of chi-square equals its degrees of freedom.

– Variance is twice its degrees of freedom.
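Putting the pieces together (expected counts, residuals, components, and their sum), the test statistic can be computed as in the following sketch. It is an illustration under the definitions given in these notes, applied to the marijuana-use counts shown in the next table; the function name `pearson_chi_squared` is hypothetical.

```python
def pearson_chi_squared(table):
    """Return (statistic, degrees of freedom, per-cell components)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    components = []
    for i, row in enumerate(table):
        component_row = []
        for j, observed_count in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n   # E(i,j)
            residual = observed_count - expected           # R(i,j)
            component_row.append(residual ** 2 / expected) # C(i,j)
        components.append(component_row)
    statistic = sum(sum(component_row) for component_row in components)
    df = (len(row_totals) - 1) * (len(col_totals) - 1)     # (R-1)(C-1)
    return statistic, df, components


# Rows = use at time 4 (no, yes); columns = use at time 3 (no, yes).
observed = [[120, 9], [95, 142]]
chi2, df, components = pearson_chi_squared(observed)
print(round(chi2, 3), df)
```

For these counts the statistic is about 96.595 on 1 degree of freedom, far above the expected value of 1 under the null hypothesis.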

Marijuana Use at Time 4 by Marijuana Use at Time 3

Use at time 4      No use at time 3   Used at time 3   Total
No use at time 4   120                9                129
Used at time 4     95                 142              237
Total              215                151              366

Contingency Tables

• Chapter Eighteen

– Measures of Association

– For nominal variables

– For ordinal variables

Measures of Association

• A measure of association measures the strength of an association.

– Usually a dimensionless number between 0 and 1 in absolute value.

– Values near 0 indicate no association; values near 1 indicate strong association.

• The correlation coefficient is a measure of association.

• The chi-square test statistic is not, because it depends on the number of observations.

Measures of Association for Nominal Scale Variables

• Chi-square based

– Phi coefficient

– Coefficient of contingency

– Cramér’s V

• Proportional reduction in error

– Lambda, symmetric

– Lambda, not symmetric

Chi-squared Measure: Phi Coefficient

• Definition of the phi coefficient, where χ² is the Pearson chi-squared statistic:

$\phi = \sqrt{\chi^2 / N}$

Phi Coefficient

• Can be greater than one (for tables larger than 2 × 2).

• N is the total number of units in the table.

• For the marijuana at time 3 and 4 data, the Pearson chi-squared statistic is 96.595, so the phi coefficient is (96.595/366)^0.5 = 0.51.

Coefficient of Contingency

• Definition of coefficient of contingency:

$C = \sqrt{\dfrac{\chi^2}{\chi^2 + N}}$

Coefficient of Contingency

• Can never get as large as one.

• Largest value depends on the number of rows and columns in the table.

• For the marijuana example, C = (96.595/(96.595 + 366))^0.5 = 0.46.

Cramér’s V

• Definition of statistic; k is the smaller of the number of rows and the number of columns.

$V = \sqrt{\dfrac{\chi^2}{N(k-1)}}$
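The three chi-square based measures can be compared side by side. The sketch below uses the standard definitions (phi = sqrt(χ²/N), coefficient of contingency = sqrt(χ²/(χ² + N)), Cramér's V = sqrt(χ²/(N(k-1)))) with the values quoted in these notes for the marijuana data, χ² = 96.595 and N = 366. The function name `chi_squared_measures` is a hypothetical helper.

```python
import math


def chi_squared_measures(chi2, n, n_rows, n_cols):
    """Return (phi, coefficient of contingency, Cramer's V)."""
    k = min(n_rows, n_cols)  # smaller of number of rows and columns
    phi = math.sqrt(chi2 / n)
    contingency = math.sqrt(chi2 / (chi2 + n))
    cramers_v = math.sqrt(chi2 / (n * (k - 1)))
    return phi, contingency, cramers_v


phi, contingency, cramers_v = chi_squared_measures(96.595, 366, 2, 2)
print(round(phi, 2), round(contingency, 2), round(cramers_v, 2))
```

For a 2 × 2 table k - 1 = 1, so Cramér's V coincides with phi (about 0.51 here), while the coefficient of contingency is smaller (about 0.46).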

Interpretation of Chi-squared measures of association

• An approximate observed level of significance is given for each measure.

• Use this in the usual way.

Proportional Reduction in Error (PRE) Measures

• Prediction is the modal category.

• Predict overall

– Predict used marijuana at time 4; correct for 237 and wrong for 129.

• Number of misclassified is 129.

• Predict for each condition of the independent variable.

– Predict no use at time 4 for those not using at time 3

  • correct 120 of 215 times

  • misclassify 95 times

– Predict use at time 4 for those using at time 3

  • correct 142 of 151 times

  • misclassify 9 times.


• Using only totals, number of misclassified is 129.

• Using marijuana at time 3, number misclassified is 95 + 9 = 104.

• The lambda measure is λ = (129 - 104)/129 = 0.19

Lambda PRE Measures

• There is a lambda measure using marijuana use at time 4 as the independent variable.

– Total: predict no usage at time 3: 151 errors.

– Conditional

  • no usage at time 4: predict none at time 3, with 9 errors

  • usage at time 4: predict use at time 3, with 95 errors

  • 104 total errors.

– Lambda measure is (151 - 104)/151 = 0.31

Lambda PRE Measures

• There is a symmetric lambda measure.

• [(129-104)+(151-104)]/(129+151)=0.26
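The modal-category error counts behind all three lambda values can be reproduced with a short sketch. It follows the prediction rule described in these notes (always predict the modal category, overall or within each level of the independent variable). The helper names `prediction_errors` and `transpose` are hypothetical.

```python
def prediction_errors(table):
    """Errors when predicting the ROW variable by its modal category.

    Returns (errors using only row totals,
             errors using the modal row within each column).
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    errors_overall = n - max(row_totals)
    errors_conditional = sum(sum(col) - max(col) for col in zip(*table))
    return errors_overall, errors_conditional


def transpose(table):
    """Swap rows and columns so the other variable becomes the row variable."""
    return [list(col) for col in zip(*table)]


# Rows = use at time 4 (no, yes); columns = use at time 3 (no, yes).
observed = [[120, 9], [95, 142]]

e1_time4, e2_time4 = prediction_errors(observed)             # predict time 4
e1_time3, e2_time3 = prediction_errors(transpose(observed))  # predict time 3

lambda_time4 = (e1_time4 - e2_time4) / e1_time4
lambda_time3 = (e1_time3 - e2_time3) / e1_time3
lambda_symmetric = ((e1_time4 - e2_time4) + (e1_time3 - e2_time3)) / (e1_time4 + e1_time3)

print(round(lambda_time4, 2), round(lambda_time3, 2), round(lambda_symmetric, 2))
```

The error counts are 129 and 104 when predicting use at time 4, and 151 and 104 when predicting use at time 3, which gives the three lambda values 0.19, 0.31, and 0.26 quoted in the slides.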

Text Example Data Set

Subject   Life   Degree
Case 1    1      2
Case 2    2      3
Case 3    3      2

Comparing Pairs of Cases

• Concordant pair of cases: the sign of the difference on variable 1 is the same as the sign of the difference on variable 2. A discordant pair has opposite signs; a pair is tied if either difference is zero.

– Case 1 and Case 2: concordant.

– Case 2 and Case 3: discordant.

– Case 1 and Case 3: tied.

• Let P be number of concordant pairs and Q be the number of discordant pairs.

Measures Based on Concordant and Discordant Pairs

• Goodman and Kruskal’s Gamma: (P - Q)/(P + Q)

• Kendall’s Tau-b

• Kendall’s Tau-c

• Somers’ d
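Counting concordant and discordant pairs is mechanical, as the following sketch shows for the three-case text example (Life and Degree scores). It computes only Goodman and Kruskal's gamma; the other measures in the list treat tied pairs differently and are not implemented here. The names `sign` and `count_pairs` are hypothetical helpers.

```python
from itertools import combinations

# (Life, Degree) scores for the three cases in the text example.
cases = {"Case 1": (1, 2), "Case 2": (2, 3), "Case 3": (3, 2)}


def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def count_pairs(values):
    """Return (concordant, discordant, tied) counts over all pairs of cases."""
    concordant = discordant = tied = 0
    for (a1, b1), (a2, b2) in combinations(values, 2):
        product = sign(a2 - a1) * sign(b2 - b1)
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
        else:
            tied += 1  # at least one of the two differences is zero
    return concordant, discordant, tied


p, q, t = count_pairs(list(cases.values()))
gamma = (p - q) / (p + q)  # undefined when every pair is tied (p + q == 0)
print(p, q, t, gamma)
```

With one concordant pair, one discordant pair, and one tied pair, gamma is 0 for this tiny data set.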

Choosing a measure

• Choose a measure “interpretable for the purpose in hand”!

• Avoid data dredging (taking the measure that is largest for the data set that you have).

Other measures

• Correlation based

– Pearson’s correlation

– Spearman correlation: replace values by ranks.

• Measures of agreement

– Cohen’s kappa.

Summary

• Contingency table methods are crucial to the analysis of market research and social science data.

• Hypothesis of independence

• Measures of association describe the strength of the dependence between two variables.
