Contingency Tables For Tests of Independence



Transcript of Contingency Tables For Tests of Independence

Page 1: Contingency Tables For Tests of Independence

Contingency Tables for Tests of Independence

Page 2: Contingency Tables For Tests of Independence

Multinomials Over Various Categories

• Thus far the situation where there are multiple outcomes for the qualitative variable without regard to anything else has been discussed.

• Now we discuss whether two qualitative variables are related, i.e., are they independent?

Page 3: Contingency Tables For Tests of Independence

EXAMPLES

(1) Can it be concluded that cola preference and gender are dependent?

(2) Can it be concluded that cola preference and age are dependent?

Page 4: Contingency Tables For Tests of Independence

RULE OF 5

χ² (chi-squared) is only an approximate distribution for the test statistic.

• To be a “valid” approximation:

ALL ei’s (expected counts) should be ≥ 5

• If the rule of 5 is violated, combine some categories so that the condition is met.

Page 5: Contingency Tables For Tests of Independence

COLA PREFERENCE VS. GENDER

• The 1000 cola drinkers were further classified as to whether they were male or female.

COLA            MALE        FEMALE      ROW TOTAL
Coke            240         170         r1 = 410
Pepsi           200         150         r2 = 350
RC               50          30         r3 = 80
Shasta           35          15         r4 = 50
Jolt             75          35         r5 = 110
COLUMN TOTAL    c1 = 600    c2 = 400    n = 1000

Page 6: Contingency Tables For Tests of Independence

HYPOTHESIS TEST: Can We Conclude Cola Preference and Gender Are Dependent?

H0: (NO) Cola preference and gender are independent
HA: (YES) Cola preference and gender are dependent
α = .05
Reject H0 if χ² > χ²_{.05,DF}

– The correct DF = (r-1)(c-1) = (5-1)(2-1) = (4)(1) = 4, where r = # rows and c = # columns

Reject H0 if χ² > χ²_{.05,4} = 9.48773

Page 7: Contingency Tables For Tests of Independence

HOW DO WE GET THE eij’s?

Let P(A) = Probability a respondent favors Coke
Let P(B) = Probability a respondent is a male

If H0 is true: The classifications are independent
Thus P(A and B) = P(A)P(B)
Best guess for P(A) is 410/1000 = .41
Best guess for P(B) is 600/1000 = .6
Thus P(A and B) ≈ (.41)(.6) = .246
Expected number (Coke and male) = e11 = 1000(.246) = 246

This can be gotten by r1c1/n = (410)(600)/1000 = 246
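The shortcut (row total)(column total)/n can be checked with a short Python sketch. The function name `expected_counts` and the list-of-lists layout are illustrative choices, not something taken from the slides.

```python
def expected_counts(observed):
    """Expected counts under independence: e_ij = (row total)(column total) / n."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Observed counts from the cola vs. gender survey
# (rows: Coke, Pepsi, RC, Shasta, Jolt; columns: male, female)
observed = [[240, 170], [200, 150], [50, 30], [35, 15], [75, 35]]

expected = expected_counts(observed)
print(expected[0][0])  # expected count for (Coke, male): 246.0
```

The values are left as floats on purpose, so that non-integer expected counts are never rounded.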

Page 8: Contingency Tables For Tests of Independence

CONTINGENCY TABLES

• Contingency tables are a convenient way of expressing the results when there are two classifications
– It is the equivalent of a multinomial table for two classifications

• We put the eij’s in parentheses under (or next to) the fij’s in the table; then we calculate:

$$\chi^2 = \sum_{i,j} \frac{(f_{ij} - e_{ij})^2}{e_{ij}}$$
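As a rough illustration, the statistic (the sum over all cells of (fij - eij)²/eij) can be computed as follows. The helper names are hypothetical; the data are the cola vs. gender counts from the survey.

```python
def expected_counts(observed):
    """e_ij = (row total)(column total) / n for every cell."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_squared(observed):
    """Sum of (f_ij - e_ij)^2 / e_ij over all cells of the table."""
    expected = expected_counts(observed)
    return sum(
        (f - e) ** 2 / e
        for f_row, e_row in zip(observed, expected)
        for f, e in zip(f_row, e_row)
    )


# Rows: Coke, Pepsi, RC, Shasta, Jolt; columns: male, female
observed = [[240, 170], [200, 150], [50, 30], [35, 15], [75, 35]]
stat = chi_squared(observed)
print(round(stat, 2))  # 6.92
```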

Page 9: Contingency Tables For Tests of Independence

eij’s for Cola vs. Gender

• Coke/Male     e11 = (410)(600)/1000 = 246
• Coke/Female   e12 = (410)(400)/1000 = 164
• Pepsi/Male    e21 = (350)(600)/1000 = 210
• Pepsi/Female  e22 = (350)(400)/1000 = 140
• RC/Male       e31 = ( 80)(600)/1000 = 48
• RC/Female     e32 = ( 80)(400)/1000 = 32
• Shasta/Male   e41 = ( 50)(600)/1000 = 30
• Shasta/Female e42 = ( 50)(400)/1000 = 20
• Jolt/Male     e51 = (110)(600)/1000 = 66
• Jolt/Female   e52 = (110)(400)/1000 = 44

Page 10: Contingency Tables For Tests of Independence

Notes on Calculating e’s

• The column totals may be set in advance or may be random based on the survey.

• These eij’s were all whole numbers; if they are not, DO NOT ROUND TO WHOLE NUMBERS.

• All these e’s are ≥ 5, but suppose e52 were actually = 3
– We might combine the results from Shasta and Jolt colas.
– This would reduce the number of rows and hence the degrees of freedom.
– e52 is not less than 5 here, so we do not have to do this.
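A small sketch of the rule-of-5 check and of what combining two categories does to the degrees of freedom. The function names are made up for illustration. The survey data already satisfy the rule, so the merge of Shasta and Jolt below is purely hypothetical and only shows the mechanics.

```python
def expected_counts(observed):
    """e_ij = (row total)(column total) / n for every cell."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def rule_of_5_ok(observed):
    """True when every expected count is at least 5."""
    return all(e >= 5 for row in expected_counts(observed) for e in row)


def combine_rows(observed, i, j):
    """New table in which rows i and j are replaced by their cell-wise sum."""
    merged = [a + b for a, b in zip(observed[i], observed[j])]
    kept = [row for k, row in enumerate(observed) if k not in (i, j)]
    return kept + [merged]


def degrees_of_freedom(observed):
    """(r - 1)(c - 1) for a table with r rows and c columns."""
    return (len(observed) - 1) * (len(observed[0]) - 1)


# Rows: Coke, Pepsi, RC, Shasta, Jolt; columns: male, female
observed = [[240, 170], [200, 150], [50, 30], [35, 15], [75, 35]]
print(rule_of_5_ok(observed), degrees_of_freedom(observed))  # True 4

# Hypothetical merge of Shasta (row 3) and Jolt (row 4)
combined = combine_rows(observed, 3, 4)
print(combined[-1], degrees_of_freedom(combined))  # [110, 50] 3
```

After a merge, the expected counts and the χ² statistic have to be recomputed from the combined table.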

Page 11: Contingency Tables For Tests of Independence

CONTINGENCY TABLE FOR COLA vs. GENDER

          Men          Women        Total
Coke      240 (246)    170 (164)     410
Pepsi     200 (210)    150 (140)     350
RC         50 ( 48)     30 ( 32)      80
Shasta     35 ( 30)     15 ( 20)      50
Jolt       75 ( 66)     35 ( 44)     110
Total     600          400          1000

Page 12: Contingency Tables For Tests of Independence

χ² for Cola vs. Gender

χ² = (240-246)²/246 + (170-164)²/164
   + (200-210)²/210 + (150-140)²/140
   + ( 50- 48)²/ 48 + ( 30- 32)²/ 32
   + ( 35- 30)²/ 30 + ( 15- 20)²/ 20
   + ( 75- 66)²/ 66 + ( 35- 44)²/ 44
   = 6.92

• χ² = 6.92 < χ²_{.05,4} = 9.48773
• There is not enough evidence to conclude gender and cola preference are dependent.
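The same decision can be reached by comparing a p-value with α instead of comparing χ² with the critical value. The sketch below relies on one assumption that should be stated loudly: it uses a closed form for the chi-squared upper tail that holds only when the degrees of freedom are even (true for DF = 4 here). For odd DF a statistics library or table is needed. The function name is hypothetical.

```python
import math


def chi2_upper_tail(x, df):
    """P(chi-squared with df degrees of freedom > x).

    Uses exp(-x/2) * sum_{k < df/2} (x/2)^k / k!, valid for EVEN df only.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k  # builds (x/2)^k / k! incrementally
        total += term
    return math.exp(-half) * total


alpha = 0.05
p_value = chi2_upper_tail(6.92, 4)
print(p_value > alpha)  # True: do not reject H0

# Sanity check: the tail area beyond the critical value is alpha
print(round(chi2_upper_tail(9.48773, 4), 4))  # 0.05
```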

Page 13: Contingency Tables For Tests of Independence

COLA PREFERENCE vs. AGE

• Survey results:

          <20    20-40   40-60   >60    TOTAL
Coke      155     140      75     40     410
Pepsi     155      95      75     25     350
RC         30      20      15     15      80
Shasta     20      15      10      5      50
Jolt       40      30      25     15     110
TOTAL     400     300     200    100    1000

Page 14: Contingency Tables For Tests of Independence

HYPOTHESIS TEST

• There are r = 5 rows and c = 4 columns

H0: (NO) Cola preference and age are independent
HA: (YES) Cola preference and age are dependent
α = .05
Reject H0 if χ² > χ²_{.05,DF}

– DF = (r-1)(c-1) = (5-1)(4-1) = (4)(3) = 12

Reject H0 if χ² > χ²_{.05,12} = 21.0261

Page 15: Contingency Tables For Tests of Independence

Sample eij’s

• e34 = (Row 3 Total)(Column 4 Total)/(Grand Total) = (80)(100)/1000 = 8

• e41 = (Row 4 Total)(Column 1 Total)/(Grand Total) = (50)(400)/1000 = 20

Page 16: Contingency Tables For Tests of Independence

CONTINGENCY TABLE FOR COLA vs. AGE

          <20          20-40        40-60       >60         Total
Coke      155 (164)    140 (123)    75 (82)     40 (41)      410
Pepsi     155 (140)     95 (105)    75 (70)     25 (35)      350
RC         30 ( 32)     20 ( 24)    15 (16)     15 ( 8)       80
Shasta     20 ( 20)     15 ( 15)    10 (10)      5 ( 5)       50
Jolt       40 ( 44)     30 ( 33)    25 (22)     15 (11)      110
Total     400          300         200         100          1000

Page 17: Contingency Tables For Tests of Independence

χ² for Cola vs. Age

χ² = (155-164)²/164 + (140-123)²/123 + (75-82)²/82 + (40-41)²/41
   + …
   + ( 40- 44)²/ 44 + ( 30- 33)²/ 33 + (25-22)²/22 + (15-11)²/11
   = 18.72

• χ² = 18.72 < χ²_{.05,12} = 21.0261
• There is not enough evidence to conclude cola preference and age are dependent.
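For a table this size the hand calculation has 20 terms, so a short script is a convenient cross-check. This is a minimal sketch with hypothetical function names. It recomputes the expected counts, the statistic and the degrees of freedom for the cola vs. age survey, and compares against the critical value 21.0261 quoted on the slide.

```python
def expected_counts(observed):
    """e_ij = (row total)(column total) / n for every cell."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_squared_test(observed):
    """Return (statistic, degrees of freedom, expected counts)."""
    expected = expected_counts(observed)
    stat = sum(
        (f - e) ** 2 / e
        for f_row, e_row in zip(observed, expected)
        for f, e in zip(f_row, e_row)
    )
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, df, expected


# Rows: Coke, Pepsi, RC, Shasta, Jolt; columns: <20, 20-40, 40-60, >60
observed_age = [
    [155, 140, 75, 40],
    [155, 95, 75, 25],
    [30, 20, 15, 15],
    [20, 15, 10, 5],
    [40, 30, 25, 15],
]

stat, df, expected = chi_squared_test(observed_age)
min_expected = min(min(row) for row in expected)
print(round(stat, 2), df, min_expected)  # 18.72 12 5.0

critical_value = 21.0261  # chi-squared .05 critical value for DF = 12, from the slide
print(stat > critical_value)  # False: do not reject H0
```

The smallest expected count is exactly 5 (Shasta, >60), so the rule of 5 is just met and no categories need to be combined.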

Page 18: Contingency Tables For Tests of Independence

Excel

• CHITEST gives the p-value for the test
=CHITEST(Observed Values, Expected Values)

• Must first calculate the expected values, eij’s
• See next slide for easy way to calculate these values.

Page 19: Contingency Tables For Tests of Independence

=SUM(B4:C4)
Drag to D5:D8

=SUM(B4:B8)
Drag to C9:D9

=$D4*B$9/$D$9
Drag to C13
Then drag B13:C13 to B17:C17

=CHITEST(B4:C8,B13:C17)

Page 20: Contingency Tables For Tests of Independence

=SUM(B4:E4)
Drag to F5:F8

=SUM(B4:B8)
Drag to C9:F9

=$F4*B$9/$F$9
Drag to E13
Then drag B13:E13 to B17:E17

=CHITEST(B4:E8,B13:E17)

Page 21: Contingency Tables For Tests of Independence

Review

• Contingency tables allow for comparisons to determine if two different classifications are independent
• Excel: CHITEST is used to generate the p-values for the chi-squared test
• Expected Values = (Row Total)(Column Total)/n
• By hand: total degrees of freedom = (r-1)(c-1) and the χ² statistic is calculated by:

$$\chi^2 = \sum_{\text{all } (i,j) \text{ combinations}} \frac{(f_{ij} - e_{ij})^2}{e_{ij}}$$