© 2011 THE CARNEGIE FOUNDATION FOR THE ADVANCEMENT OF TEACHING A PATHWAY THROUGH STATISTICS, VERSION 1.5, STATWAY™ - STUDENT HANDOUT
STATWAY™ STUDENT HANDOUT
Lesson 11.2.1 Introduction to Chi-Square Tests for Two-Way Tables
STUDENT NAME DATE
INTRODUCTION
A survey is conducted of 1,000 randomly selected moviegoers who just saw a new, highly anticipated
science fiction/action film. In the survey, the moviegoers were asked if they liked the film (yes or no) and if
they considered themselves very knowledgeable about science fiction, moderately knowledgeable about
science fiction, or having little to no knowledge about science fiction. It is important for the company that
made the movie to know if the movie was more popular among certain groups. This knowledge might affect
the movie’s future advertising strategy and future marketing campaign.
The company is interested in determining if the moviegoer’s opinion of the film is dependent on the
moviegoer’s self-reported science fiction knowledge level. Recall from Module 6 that two variables are not
independent if the presence of one variable influences the presence of the other.
TRY THESE
1 A table such as the one below might appear in a report summarizing the results of the survey.
Did you like the film?

                                  Response
Self-reported Science Fiction
Knowledge Level                   Yes     No      TOTAL
Very Knowledgeable                ____    ____    400
Moderately Knowledgeable          ____    ____    250
Having Little to No Knowledge     ____    ____    350
TOTAL                             300     700     1000
In the table, notice that a proportion of 0.30 of the entire group liked the film (300/1,000).
Assuming that the variables Self-reported Science Fiction Knowledge Level and Response are
independent, answer the following questions.
A How many of the 400 Very Knowledgeable moviegoers would you expect to answer Yes, and
how many would you expect to answer No? Fill in the appropriate two cells with your answers.
B How many of the 250 Moderately Knowledgeable moviegoers would you expect to answer
Yes, and how many would you expect to answer No? Fill in the appropriate two cells with your answers.
C How many of the 350 Having Little to No Knowledge moviegoers would you expect to
answer Yes, and how many would you expect to answer No? Fill in the appropriate two cells with your answers.
D Verify that the values you just added to the table support (add up to) the appropriate row
and column totals shown in the margins of the table.
Based on sampling variability, you know that it would be unusual to obtain a different sample with the exact
same counts as the table in Question 1, even if the row and column variables were truly independent for the
entire population.
2 For another sample of 1,000 individuals, fill in the following table in such a way that the data
would not lead you to challenge the claim that the variables Response and Self-reported Science
Fiction Knowledge Level are independent. Make sure your entries add up to the correct row and
column totals.
Did you like the film?

                                  Response
Self-reported Science Fiction
Knowledge Level                   Yes     No      TOTAL
Very Knowledgeable                ____    ____    400
Moderately Knowledgeable          ____    ____    250
Having Little to No Knowledge     ____    ____    350
TOTAL                             300     700     1000

3 For another sample of 1,000 individuals, fill in the following table in such a way that the data
would provide strong evidence against the claim that the variables Response and Self-reported
Science Fiction Knowledge Level are independent. Make sure your entries add up to the correct
row and column totals.
Did you like the film?

                                  Response
Self-reported Science Fiction
Knowledge Level                   Yes     No      TOTAL
Very Knowledgeable                ____    ____    400
Moderately Knowledgeable          ____    ____    250
Having Little to No Knowledge     ____    ____    350
TOTAL                             300     700     1000
4 Below are the distributions of three hypothetical random samples of size 1,000. For each sample,
examine the proportions of Yes responses for each category of Self-reported Science Fiction
Knowledge Level and determine whether you think the sample provides strong evidence,
moderate evidence, or weak evidence against the claim that the variables Response and Self-
reported Science Fiction Knowledge Level are independent. Then list the characteristics of the
sample that led you to your decision.
Hypothetical Sample 1: “Did you like the film?”

                                  Response
Self-reported Science Fiction
Knowledge Level                   Yes     No      TOTAL
Very Knowledgeable                300     100     400
Moderately Knowledgeable          200     50      250
Having Little to No Knowledge     100     250     350
TOTAL                             600     400     1000

It appears that Sample 1 provides (circle one)
strong moderate weak
evidence against the claim that the variables Response and Self-reported Science Fiction
Knowledge Level are independent because…
Hypothetical Sample 2: “Did you like the film?”

                                  Response
Self-reported Science Fiction
Knowledge Level                   Yes     No      TOTAL
Very Knowledgeable                102     98      200
Moderately Knowledgeable          140     170     310
Having Little to No Knowledge     220     270     490
TOTAL                             462     538     1000

It appears that Sample 2 provides (circle one)
strong moderate weak
evidence against the claim that the variables Response and Self-reported Science Fiction
Knowledge Level are independent because…
Hypothetical Sample 3: “Did you like the film?”

                                  Response
Self-reported Science Fiction
Knowledge Level                   Yes     No      TOTAL
Very Knowledgeable                180     20      200
Moderately Knowledgeable          190     120     310
Having Little to No Knowledge     80      410     490
TOTAL                             450     550     1000

It appears that Sample 3 provides (circle one)
strong moderate weak
evidence against the claim that the variables Response and Self-reported Science Fiction
Knowledge Level are independent because…
5 For each hypothetical sample, develop a graphical display that visually compares the proportions
of Yes responses for each category of Self-reported Science Fiction Knowledge Level. For each
sample’s graph, state the most notable feature of the graph. Does the information shown in each
graph correspond in any way to the initial “strength of evidence” statements that you made
in Question 4?
Sample 1 graphical display:
Most notable feature:
Sample 2 graphical display:
Most notable feature:
Sample 3 graphical display:
Most notable feature:
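The proportions that Question 5 asks you to compare can be computed with a short program. The Python sketch below is one possible approach, not part of the lesson: the names SAMPLES, yes_proportions, and text_bar are assumptions made for this illustration, and the rows of # characters are only a rough stand-in for a proper bar graph.

```python
# Observed (Yes, No) counts copied from the three hypothetical samples above.
# The dictionary layout and the helper names are illustrative assumptions.
SAMPLES = {
    "Sample 1": {
        "Very Knowledgeable": (300, 100),
        "Moderately Knowledgeable": (200, 50),
        "Having Little to No Knowledge": (100, 250),
    },
    "Sample 2": {
        "Very Knowledgeable": (102, 98),
        "Moderately Knowledgeable": (140, 170),
        "Having Little to No Knowledge": (220, 270),
    },
    "Sample 3": {
        "Very Knowledgeable": (180, 20),
        "Moderately Knowledgeable": (190, 120),
        "Having Little to No Knowledge": (80, 410),
    },
}


def yes_proportions(sample):
    """Return the proportion of Yes responses within each knowledge level."""
    return {level: yes / (yes + no) for level, (yes, no) in sample.items()}


def text_bar(proportion, width=40):
    """Render a proportion as a bar of '#' characters (width marks 1.0)."""
    return "#" * round(proportion * width)


for name, sample in SAMPLES.items():
    print(name)
    for level, proportion in yes_proportions(sample).items():
        print(f"  {level:<30} {proportion:.3f} {text_bar(proportion)}")
```

Bars of similar length within a sample suggest similar Yes proportions across the knowledge levels; bars of very different lengths suggest the opposite.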
6 Based on your work in the previous questions, which sample gives the strongest evidence against
the claim that the variables Response and Self-reported Science Fiction Knowledge Level are
independent? What characteristics of the sample were most important in your decision?
7 It would be useful to have a statistical measure of deviation to determine how much the
distribution of a sample (such as those shown previously) deviates from what is expected. Create a
method (or a statistic) to measure which sample deviates the most from its ideal expected
distribution. How is your method similar to methods discussed in previous lessons? How is it
different?
NEXT STEPS
Quantifying the Strength of the Evidence
Revisiting Independence in the Context of Expected Counts
In Question 1, you filled in a table under the assumption that the variables Self-reported Science Fiction
Knowledge Level and Response were independent. To fill in the count for the number of Very
Knowledgeable moviegoers who said Yes, it was important to account for the fact that 0.30 of the
entire survey said Yes and that 400 individuals (i.e., 0.40 of the entire survey) responded as being Very
Knowledgeable.
Since 0.30 of the entire survey said Yes, under the assumption of independence, the conditional
frequency distribution for each category of Self-reported Science Fiction Knowledge Level was 0.30 Yes
and 0.70 No.
This means that the number of Yes responses for each category of Self-reported Science Fiction
Knowledge Level was:
0.30 of the Very Knowledgeable group = 0.30 × 400 = 120
0.30 of the Moderately Knowledgeable group = 0.30 × 250 = 75
0.30 of the Having Little to No Knowledge group = 0.30 × 350 = 105
The expected count of 120 for the number of moviegoers who were Very Knowledgeable and said Yes is
directly related to three pieces of information: 0.40 of the entire survey responded as being Very
Knowledgeable, 0.30 of the entire survey said Yes, and 1,000 people were surveyed.
Said another way, the expected count for Very Knowledgeable and Yes equals the proportion of those
surveyed who said Very Knowledgeable × proportion of those surveyed who said Yes × total number
surveyed = 0.40 × 0.30 × 1,000 = 120.
Generally speaking, for a two-way table, under the assumption of independence, any expected count for a
cell that represents the combination of a row variable category with a column variable category can be
computed based on the cell’s corresponding row proportion, corresponding column proportion, and the
total number of observations in the table (called the grand total).
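As a minimal sketch of the calculation just described, the Python function below multiplies the row proportion, the column proportion, and the grand total. The function name and its parameters are assumptions made for this illustration; exact fractions are used so that no rounding error enters the arithmetic.

```python
from fractions import Fraction


def expected_count_from_proportions(row_total, column_total, grand_total):
    """Expected count under independence, built from the two proportions.

    The function and parameter names are illustrative assumptions.
    """
    row_proportion = Fraction(row_total, grand_total)        # e.g. 400/1000 = 0.40
    column_proportion = Fraction(column_total, grand_total)  # e.g. 300/1000 = 0.30
    return row_proportion * column_proportion * grand_total


# Very Knowledgeable (400 of 1,000) and Yes (300 of 1,000)
print(expected_count_from_proportions(400, 300, 1000))  # prints 120
```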
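As a formula, this is expressed as follows:
Expected Count = (row proportion) × (column proportion) × (grand total)
Through algebra, there is another form of the above formula that is based on raw counts from the table.
Each proportion is equal to the total for its category divided by the grand total:
Expected Count = (row total / grand total) × (column total / grand total) × (grand total)
Simplifying the above expression yields the final form:
Expected Count = (row total × column total) / grand total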
8 For the table in Question 1, you filled in the six counts under the assumption that
the variables Self-reported Science Fiction Knowledge Level and Response were independent.
Thus, the values you filled in should have been the appropriate expected counts for each
combination of a row variable category and a column variable category if the claim of
independence between the row and column variables were true for the population that the 1,000
randomly selected moviegoers represent. Verify that the counts you developed based on the
respective row and column totals follow the formula above (or change your expected counts as
needed to adhere to the formula and to the row and column totals in the table).
                                  Response
Self-reported Science Fiction
Knowledge Level                   Yes                     No                      TOTAL
Very Knowledgeable                400 * 300/1000 = 120    400 * 700/1000 = 280    400
Moderately Knowledgeable          250 * 300/1000 = 75     250 * 700/1000 = 175    250
Having Little to No Knowledge     350 * 300/1000 = 105    350 * 700/1000 = 245    350
TOTAL                             300                     700                     1000
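The verification requested in Question 8 can also be done by program. The Python sketch below assumes the row and column totals from the Question 1 table, computes every expected count as row total × column total / grand total, and checks that the results add up to the margins. The names ROW_TOTALS, COLUMN_TOTALS, and expected_counts are hypothetical, chosen only for this illustration.

```python
from fractions import Fraction

# Margins of the Question 1 table.
ROW_TOTALS = {
    "Very Knowledgeable": 400,
    "Moderately Knowledgeable": 250,
    "Having Little to No Knowledge": 350,
}
COLUMN_TOTALS = {"Yes": 300, "No": 700}


def expected_counts(row_totals, column_totals):
    """Expected count for every (row, column) cell under independence."""
    grand_total = sum(row_totals.values())
    return {
        (row, column): Fraction(row_total * column_total, grand_total)
        for row, row_total in row_totals.items()
        for column, column_total in column_totals.items()
    }


expected = expected_counts(ROW_TOTALS, COLUMN_TOTALS)

# Each row of expected counts should add up to that row's total ...
for row, row_total in ROW_TOTALS.items():
    assert sum(expected[(row, column)] for column in COLUMN_TOTALS) == row_total

# ... and each column of expected counts to that column's total.
for column, column_total in COLUMN_TOTALS.items():
    assert sum(expected[(row, column)] for row in ROW_TOTALS) == column_total

for (row, column), count in expected.items():
    print(f"{row} / {column}: {float(count)}")
```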
Computing a Chi-Square Statistic
For two-way tables, you can compute a chi-square statistic that is useful in assessing the strength of your
evidence against the claim of independence. The mechanics of computing the chi-square statistic for a
two-way table are very similar to the methods introduced in a previous lesson for developing the chi-square
statistic for a one-way table. The differences are as follows:
• Expected counts are computed for each combination of a row variable category with a column
variable category based on the corresponding row totals and column totals. (Note: The Total
row and Total column presented in a table do not represent a category of the variable of
interest. Any presentation of a row or column total in the table is not considered as a category
of a given variable.)
• Computations comparing observed counts with expected counts are performed for each
combination of a row variable category with a column variable category.
The expected count for a combination of a row variable category with a column variable
category is computed as shown previously:
Expected Count = (row total × column total) / grand total
As before, when calculating expected counts, it is acceptable for the expected counts to be non-integer values, and you
should generally not round expected counts to whole numbers. Also note that the sum of your expected
counts for a given row should equal that row’s total in your sample, and the sum of your expected counts for
a given column should equal that column’s total in your sample.
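To illustrate both points, the Python sketch below starts from the observed counts of Hypothetical Sample 2, derives the row and column totals, and computes the expected counts without rounding. Several of the results are non-integers, yet each row and column of expected counts still adds up to its total. The function name expected_from_observed and the list-of-rows layout are assumptions made for this example.

```python
from fractions import Fraction

# Observed counts for Hypothetical Sample 2; each inner list is [Yes, No].
observed = [
    [102, 98],   # Very Knowledgeable
    [140, 170],  # Moderately Knowledgeable
    [220, 270],  # Having Little to No Knowledge
]


def expected_from_observed(table):
    """Expected counts (row total x column total / grand total), unrounded."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    grand_total = sum(row_totals)
    return [
        [Fraction(row_total * column_total, grand_total)
         for column_total in column_totals]
        for row_total in row_totals
    ]


expected = expected_from_observed(observed)

for row in expected:
    print([float(cell) for cell in row])

# Row sums and column sums of the expected counts match the sample's totals.
print([int(sum(row)) for row in expected])
print([int(sum(column)) for column in zip(*expected)])
```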
The expected counts for Hypothetical Sample 1 would be as follows:
Did you like the film?
                                  Response
Self-reported Science Fiction
Knowledge Level                   Yes                     No                      TOTAL
Very Knowledgeable                400 * 600/1000 = 240    400 * 400/1000 = 160    400
Moderately Knowledgeable          250 * 600/1000 = 150    250 * 400/1000 = 100    250
Having Little to No Knowledge     350 * 600/1000 = 210    350 * 400/1000 = 140    350
TOTAL                             600                     400                     1000
Once you have computed expected counts for each row and column variable combination, the steps for
computing a chi-square value for a two-way table are as follows:
(1) For each combination of row variable category and column variable category, compute the
difference between the actual count for that combination (obtained from the sample) and the
expected count for that combination:
(Observed Count – Expected Count)
(2) For each combination of row variable category and column variable category, compute the
square of the difference obtained in Step 1:
(Observed Count – Expected Count)²
(3) For each combination of row variable category and column variable category, divide the
squared difference obtained in Step 2 by the expected count for the combination:
(Observed Count – Expected Count)² / Expected Count
(4) Add up the Step 3 calculation results from each combination; this will be the chi-square
value.
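Taken together, the four steps can be summarized in a single expression. The LaTeX below is one way to write it, where the sum runs over every combination of a row variable category with a column variable category; the symbol χ² for the chi-square value is a standard notation that this handout does not itself introduce.

```latex
\chi^2 \;=\; \sum_{\text{all cells}}
  \frac{(\text{Observed Count} - \text{Expected Count})^2}{\text{Expected Count}}
```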
Example: Hypothetical Sample 1
(O = Observed Count, from sample; E = Expected Count, based on claim of independence)

Combination                                Observed   Expected   Step 1:   Step 2:     Step 3:
                                           Count      Count      O – E     (O – E)²    (O – E)²/E
"Yes" AND "Very Knowledgeable"             300        240        60        3600        15
"No" AND "Very Knowledgeable"              100        160        -60       3600        22.5
"Yes" AND "Moderately Knowledgeable"       200        150        50        2500        16.667
"No" AND "Moderately Knowledgeable"        50         100        -50       2500        25
"Yes" AND "Little to No Knowledge"         100        210        -110      12100       57.619
"No" AND "Little to No Knowledge"          250        140        110       12100       86.429

Step 4: 15 + 22.5 + 16.667 + 25 + 57.619 + 86.429 = 223.215
For Hypothetical Sample 1, the chi-square value generated is 223.215.
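The four steps for Hypothetical Sample 1 can be traced in code. The Python sketch below is only an illustration: the dictionary layout and variable names are assumptions, and Step 3 is rounded to three decimal places to mirror the worked example.

```python
# Observed and expected counts for Hypothetical Sample 1, keyed by
# (response, knowledge level). The layout is an illustrative assumption.
observed = {
    ("Yes", "Very Knowledgeable"): 300,
    ("No", "Very Knowledgeable"): 100,
    ("Yes", "Moderately Knowledgeable"): 200,
    ("No", "Moderately Knowledgeable"): 50,
    ("Yes", "Little to No Knowledge"): 100,
    ("No", "Little to No Knowledge"): 250,
}
expected = {
    ("Yes", "Very Knowledgeable"): 240,
    ("No", "Very Knowledgeable"): 160,
    ("Yes", "Moderately Knowledgeable"): 150,
    ("No", "Moderately Knowledgeable"): 100,
    ("Yes", "Little to No Knowledge"): 210,
    ("No", "Little to No Knowledge"): 140,
}

step3 = {}
for cell in observed:
    difference = observed[cell] - expected[cell]      # Step 1
    squared = difference ** 2                         # Step 2
    step3[cell] = round(squared / expected[cell], 3)  # Step 3, three decimals

chi_square = round(sum(step3.values()), 3)            # Step 4
print(chi_square)  # prints 223.215
```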
9 Compute the chi-square values for the other two hypothetical samples by filling in the following
tables. Compute the Step 3 calculations to three decimal places as shown in the example above.
Hypothetical Sample 2
(O = Observed Count, from sample; E = Expected Count, based on claim of independence)

Combination                                Observed   Expected   Step 1:   Step 2:     Step 3:
                                           Count      Count      O – E     (O – E)²    (O – E)²/E
"Yes" AND "Very Knowledgeable"             102        92.4       9.6       92.16       0.997
"No" AND "Very Knowledgeable"              98         107.6      -9.6      92.16       0.857
"Yes" AND "Moderately Knowledgeable"       140        143.22     -3.22     10.3684     0.072
"No" AND "Moderately Knowledgeable"        170        166.78     3.22      10.3684     0.062
"Yes" AND "Little to No Knowledge"         220        226.38     -6.38     40.7044     0.180
"No" AND "Little to No Knowledge"          270        263.62     6.38      40.7044     0.154

Step 4: 0.997 + ____ + ____ + 0.062 + ____ + 0.154 = 2.322
For Hypothetical Sample 2, the chi-square value generated is 2.322.

Hypothetical Sample 3
(O = Observed Count, from sample; E = Expected Count, based on claim of independence)

Combination                                Observed   Expected   Step 1:   Step 2:     Step 3:
                                           Count      Count      O – E     (O – E)²    (O – E)²/E
"Yes" AND "Very Knowledgeable"             180        90         90        8100        90
"No" AND "Very Knowledgeable"              20         110        -90       8100        73.636
"Yes" AND "Moderately Knowledgeable"       190        139.5      50.5      2550.25     18.281
"No" AND "Moderately Knowledgeable"        120        170.5      -50.5     2550.25     14.957
"Yes" AND "Little to No Knowledge"         80         220.5      -140.5    19740.25    89.525
"No" AND "Little to No Knowledge"          410        269.5      140.5     19740.25    73.248

Step 4: 90 + ____ + ____ + 14.957 + 89.525 + 73.248 = ____
For Hypothetical Sample 3, the chi-square value generated is __________.
10 Which of the hypothetical samples had the highest chi-square value? Does the size of the chi-
square values generated by these three samples correspond in any way to the initial strength of
evidence statements that you made in Question 4 or the conjectures you made in Question 6?
TAKE IT HOME
1 Based on your work today, which seems to provide greater evidence against the claim of
independence: a high chi-square value or a low chi-square value?
2 A fourth hypothetical sample is as follows:
Did you like the film?

                                  Response
Self-reported Science Fiction
Knowledge Level                   Yes     No      TOTAL
Very Knowledgeable                260     140     400
Moderately Knowledgeable          150     100     250
Having Little to No Knowledge     190     160     350
TOTAL                             600     400     1000

A Develop a graphical display that visually compares the proportions of Yes responses for each
category of Self-reported Science Fiction Knowledge Level.
B Compute the chi-square value and determine if this value provides strong evidence against
the claim that the variables Response and Self-reported Science Fiction Knowledge Level are independent.
(Note: For reasons that will be explained in a future lesson, in a two-way table case such as this where one
categorical variable contains three categories and the other categorical variable contains two categories,
consider a chi-square value of 5.99 or greater to be statistically significant evidence against the claim that
the two variables are independent.)
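As a hedged sketch of how the 5.99 guideline from the note might be applied, the Python code below computes a chi-square value directly from a table of observed counts and compares it with that cutoff. It is demonstrated on Hypothetical Samples 1 and 2 from the lesson rather than on the fourth sample. The names THRESHOLD, chi_square_statistic, and is_significant are assumptions made for this illustration, and because nothing is rounded along the way, the results can differ slightly in the last decimal place from values built up from rounded Step 3 entries.

```python
THRESHOLD = 5.99  # cutoff given in the note for a 3-category by 2-category table


def chi_square_statistic(table):
    """Chi-square value for a two-way table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    grand_total = sum(row_totals)
    total = 0.0
    for i, row in enumerate(table):
        for j, observed_count in enumerate(row):
            expected_count = row_totals[i] * column_totals[j] / grand_total
            total += (observed_count - expected_count) ** 2 / expected_count
    return total


def is_significant(chi_square, threshold=THRESHOLD):
    """True when the chi-square value meets or exceeds the cutoff."""
    return chi_square >= threshold


sample_1 = [[300, 100], [200, 50], [100, 250]]
sample_2 = [[102, 98], [140, 170], [220, 270]]

for name, table in (("Sample 1", sample_1), ("Sample 2", sample_2)):
    value = chi_square_statistic(table)
    print(f"{name}: chi-square = {value:.3f}, significant = {is_significant(value)}")
```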
This lesson is part of STATWAY™, A Pathway Through College Statistics, which is a product of a Carnegie Networked Improvement Community that seeks to advance student success. Version 1.0, A Pathway Through Statistics, Statway™, was created by the Charles A. Dana Center at the University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching. Version 1.5 and all subsequent versions result from the continuous improvement efforts of the Carnegie Networked Improvement Community. The network brings together community college faculty and staff, designers, researchers, and developers. It is an open-resource research and development community that seeks to harvest the wisdom of its diverse participants in systematic and disciplined inquiries to improve developmental mathematics instruction. For more information on the Statway Networked Improvement Community, please visit carnegiefoundation.org. For the most recent version of instructional materials, visit Statway.org/kernel.
STATWAY™ and the Carnegie Foundation logo are trademarks of the Carnegie Foundation for the Advancement of Teaching. A Pathway Through College Statistics may be used as provided in the CC BY license, but neither the Statway trademark nor the Carnegie Foundation logo may be used without the prior written consent of the Carnegie Foundation.