AP Statistics Chi Square Tests Student Handout
Copyright © 2017 National Math + Science Initiative, Dallas, Texas. All rights reserved. Visit us online at www.nms.org
AP Statistics
Chi Square Tests
Student Handout
2017-2018 EDITION
1. The Behavioral Risk Factor Surveillance System is an ongoing health survey system that
tracks health conditions and risk behaviors in the United States. In one of their studies, a random
sample of 8,866 adults answered the question “Do you consume five or more servings of fruits
and vegetables per day?” The data are summarized by response and by age-group in the
frequency table below.
Age-Group (years)    Yes      No       Total
18-34                231      741      972
35-54                669      2,242    2,911
55 or older          1,291    3,692    4,983
Total                2,191    6,675    8,866
Do the data provide convincing statistical evidence that there is an association between age-
group and whether or not a person consumes five or more servings of fruits and vegetables per
day for adults in the United States?
Directions: Below are the elements of a complete response for this question. But in each cell
there is an error. With a partner or small group, find the error and write in the correction needed
for an AP quality response.
Ho: Fruit and vegetable consumption is associated
with age group for the population of adults in
the United States.
Ha: Fruit and vegetable consumption is not
associated with age group for the population of
adults in the United States.
Chi Square Goodness of Fit Test
Random sample of adults given.
Large enough sample since n > 30; 8866 > 30
χ² = Σ (O − E)² / E
χ² = 0.353 + 0.116 + 3.528 + 1.158 + 2.883 + 0.946
= 8.983
df = 6 – 1 = 5
p-value is P(χ² ≤ 8.983) = 0.011
Because the p-value is very small (smaller than
α = 0.05 ), we would fail to reject the null
hypothesis at the 0.05 level and conclude that
the sample data provide strong evidence that
there is not an association between age group
and consumption of fruits and vegetables for
adults in the United States. Older (55+ years)
people were more likely to eat 5 or more
servings of fruits and vegetables, and middle-aged people (35–54 years) were less likely to
eat 5 or more servings of fruits and vegetables.
2. An ornithologist researching four bird species in Zilker Park of Austin, Texas is concerned
that grackles are overtaking the habitat. Previous studies have shown that the proportions of the
population including just these four species contains 8% parakeets, 16% warblers, 35% grackles,
and 41% doves. A team visited the park and randomly selected (yeah, right) 238 bird sightings of
these four species with the results listed below.
parakeets    warblers    grackles    doves
20           33          100         85
Do these results support the ornithologist’s concern that the population proportions of these four
bird species are different than previous studies suggest?
Directions: Below are the elements of a complete response for this question. But in each cell
there is an error. With a partner or small group, find the error and write in the correction needed
for an AP quality response.
Ho: The proportion of each species of bird is equal.
Ha: The proportion of each species of bird is not
equal.
One proportion z-test
Random sample of bird sightings given.
All expected counts ≥ 5
Expected Counts
parakeets warblers grackles doves
59.5 59.5 59.5 59.5
χ² = Σ (O − E)² / E
χ² = 4.297
df = 4 – 1 = 3
p-value is P(χ² ≥ 5.696) = 0.216
Because the p-value is greater than 0.05
(0.216 > 0.05), we would fail to reject the null
hypothesis. We have enough evidence to claim
that the proportion of each species of bird is the same as in previous studies.
Multiple Choice Questions
1. A survey of 75 people with jobs at a mall was conducted to determine whether they worked
in retail. The two-way table below shows the responses by gender.
          Retail    Non-retail    Total
Male      26        18            44
Female    26        5             31
Total     52        23            75
Which of the following best describes the association between type of employment and
gender?
A) There appears to be no association, since the same number of men and women are
working in retail.
B) There appears to be an association, since more than half of the people have jobs in retail.
C) There appears to be an association, since there are more men than women in the survey.
D) There appears to be an association, since the proportion of women having retail jobs is
much larger than the proportion of men having retail jobs.
E) An association cannot be determined from these data.
2. The following two-way table resulted from a random sample of employees from a large
company. Employees were classified as managers or workers and asked if they were satisfied
with their job.
                 Manager    Non-Manager    Total
Satisfied        45         70             115
Not Satisfied    15         25             40
Total            60         95             155
If the null hypothesis of no association between employment status and job satisfaction is
true, which of the following expressions gives the expected number of employees in the
sample who are managers and who are satisfied with their job?
A) (115)(95) / 155
B) (115)(45) / 60
C) (60)(115) / 155
D) (115)(60) / 45
E) (60)(45) / 115
3. A biologist hypothesizes that half of the fish population in a large river will be brown trout
and that the remaining half will be split evenly between brook trout and rainbow trout. In a
random sample of 100 fish from this river, the individual species are distributed as shown in
the table below.
Brown trout    Rainbow trout    Brook trout
51             30               19
What is the value of the χ² statistic for the goodness of fit test on these data?
A) Less than 1
B) At least 1, but less than 5
C) At least 5, but less than 10
D) At least 10, but less than 15
E) At least 15
4. In an experiment, two different species of fruit trees were crossbred. The resulting fruit from
this crossbreeding experiment were classified by color of meat of the fruit and color of the
skin of the fruit, into one of four groups, as shown in the table below.
Fruit Type Resulting From Crossbreeding    Number of Fruit Observed With These Colors
I: Red meat with orange skin               65
II: Red meat with yellow skin              37
III: Yellow meat with orange skin          24
IV: Yellow meat with yellow skin           24
A botanist expected that the ratio of 5:2:2:1 for the color types I: II: III: IV, respectively,
would result from this crossbreeding experiment. Using the data above, a χ² value of
approximately 9.567 was computed. Are the observed results inconsistent with the expected
ratio at the 1 percent level of significance?
A) Yes, because the computed χ² value is greater than the critical value.
B) Yes, because the computed χ² value is less than the critical value.
C) No, because the computed χ² value is less than the critical value.
D) No, because the computed χ² value is greater than the critical value.
E) It cannot be determined because some of the expected counts are not large enough to use
the χ² test.
5. A parent advisory board for a certain university was concerned about the effect of part-time
jobs on the academic achievement of students attending the university. To obtain some
information, the advisory board surveyed a simple random sample of 200 of the more than
20,000 students attending the university. Each student reported the average number of hours
spent working part-time each week and his or her perception of the effect of part-time work
on academic achievement. The data in the table below summarize the students’ responses by
average number of hours worked per week (less than 11, 11 to 20, more than 20) and the
perception of the effect of part-time work on academic achievement (positive, no effect,
negative).
                                Average Time Spent on Part-Time Jobs
Perception of the Effect of
Part-time Work on Academic      Less Than 11      11 to 20          More Than 20
Achievement                     Hours per Week    Hours per Week    Hours per Week
Positive Effect                 21                9                 5
No Effect                       58                32                15
Negative Effect                 18                23                19
A chi-square test was used to determine if there is an association between the effect of
part-time work on academic achievement and the average number of hours per week that
students work. Which of the following is the number of degrees of freedom used for the
appropriate test of significance?
A) df = 2
B) df = 3
C) df = 4
D) df = 6
E) df = 9
6. The head of marketing for a major fast food chain is interested in determining whether a
customer’s age is related to the type of French fries they prefer. Her chain offers three types
of fries: home-style, sweet potato, and cross-cut. A random sample of 100 customers is taken
and each customer’s age group and French fry preference are collected. A chi-square test of
independence is to be used to test the hypothesis. Assuming the conditions for inference have
been met, which of the following statements is true?
A. A t-test for slope would also be appropriate for these data.
B. The test is not valid since only one sample was taken.
C. The null hypothesis for the test is that the proportion of each age group who prefer each
type of French fry is 1/3.
D. A larger sample would increase the value of the χ² critical value of this test.
E. A smaller value of the chi-square statistic indicates a smaller difference in the French fry
preference between age groups.
Additional Free Response Questions
1. A study was done to determine whether field mice found in a specific region of the country
have preference for the habitat where they tend to live. This region was divided into the
following four habitat types:
(1) Fields on farms currently growing crops,
(2) Fields on farms not currently growing crops,
(3) Non-maintained fields close to working farms,
(4) Undeveloped grasslands not close to working farms.
The proportion of total acreage in each of the habitat types was determined for the study area.
Using an eco-friendly catch and release method, mice were trapped and released in each of
the four habitat areas. The results are given in the table below.
Habitat Type    Proportion of Total Acreage    Number of Mice Observed
1               0.550                          57
2               0.120                          12
3               0.115                          8
4               0.215                          40
Total           1.000                          117
(a) The researchers who are conducting the study expect the number of mice observed in a
habitat type to be proportional to the amount of acreage of that type of habitat if mice
have no habitat preference. Do these data provide convincing evidence that mice have a
preference for the type of habitat in which they live? Conduct an appropriate statistical
test to support your conclusion. Assume the conditions for inference are met.
(b) Relative to the proportion of total acreage, which habitat type did the mice seem to
prefer? Explain.
2. A medical research team at a certain university was interested in the effect of daily exercise
on stress level in students. To obtain some information, the team surveyed a simple random
sample of 400 of the more than 35,000 students attending the university. Each student
reported the average number of hours spent exercising each week and his or her perception of
the effect of exercise on stress level. The data in the table below summarize the students’
responses by average number of hours exercising per week (less than 5, 5 to 7, more than 7)
and the perception of the effect of exercise on stress level (positive, no effect, negative).
                                Average Time Spent Exercising
Perception of the Effect of     Less Than 5       5 to 7            More Than 7
Exercise on Stress Level        Hours per Week    Hours per Week    Hours per Week
Positive Effect                 87                102               35
No Effect                       95                36                8
Negative Effect                 20                4                 13
A chi-square test was used to determine if there is an association between the effect of
exercise on stress level and the average number of hours per week that students spent
exercising. Computer output that resulted from performing this test is shown below.
CHI-SQUARE TEST
Expected counts are printed below observed counts

             <5         5-7        >7        Total
Positive     87         102        35        224
             113.12     79.52      31.36
No effect    95         36         8         139
             70.195     49.345     19.46
Negative     20         4          13        37
             18.685     13.135     5.18
Total        202        142        56        400

Chi-Sq = 50.1832, DF = 4, P-Value = 3.306 × 10^-10

(a) State the null and alternative hypotheses for this test.
(b) Discuss whether the conditions for a chi-squared inference procedure are met for these
data.
(c) Given the results from the chi-square test, what should the research team conclude?
(d) Based on your conclusion in part (c), which type of error (Type I or Type II) might the
research team have made? Describe this error in the context of the question.
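The numbers in the computer output for this question can be checked by hand or with a short script. The Python sketch below is one possible way to do it, using only the observed counts from the table; the function names are made up for illustration and are not part of any statistics package.

```python
# Observed counts from the exercise and stress table.
# Rows: Positive, No effect, Negative. Columns: <5, 5-7, >7 hours per week.
observed = [
    [87, 102, 35],
    [95, 36, 8],
    [20, 4, 13],
]


def expected_counts(table):
    """Expected count for each cell: (row total)(column total) / table total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square_statistic(table, expected):
    """Sum of (O - E)^2 / E over every cell of the table."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


expected = expected_counts(observed)
statistic = chi_square_statistic(observed, expected)
# Degrees of freedom for a two-way table: (rows - 1)(columns - 1).
df = (len(observed) - 1) * (len(observed[0]) - 1)

for row in expected:
    print([round(e, 3) for e in row])
print("Chi-Sq =", round(statistic, 4), "DF =", df)
```

The printed expected counts and statistic can be compared line by line with the output shown in the question.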
3. Contestants on a game show spin a wheel like the one shown in the figure below. Each of
four outcomes on the wheel is equally likely and outcomes are independent from one spin to
the next.
• The contestant spins the wheel.
• If the result is a skunk, no money is won and the contestant’s turn is finished.
• If the result is a number, the corresponding amount in dollars is won. The contestant can
then stop with those winnings or can choose to spin again, and his or her turn continues.
• If the contestant spins again and the result is a skunk, all the money earned on that turn is
lost and the turn ends.
• The contestant may continue adding to his or her winnings until he or she chooses to stop
or until a spin results in a skunk.
(a) What is the probability that the result will be a number on all the first three spins of the
wheel?
(b) Suppose a contestant has earned $800 on his or her first three spins and chooses to spin
again. What is the expected value of his or her total winnings for the four spins?
(c) A contestant who lost at this game alleges that the wheel is not fair. In order to check on
the fairness of the wheel, the data in the table below were collected for 100 spins of this
wheel.
Result       Skunk    $100    $200    $500
Frequency    33       21      20      26
Based on these data, can you conclude that the four outcomes on this wheel are not
equally likely? Give appropriate statistical evidence to support your answer.
Chi-square Tests Notes
Requirements for a complete response:
1. Identify the population parameters of interest, the null and the alternative hypotheses, and the
significance level of the test.
• All the population parameters of interest (proportions in this case) must be defined in the
context of the problem.
• The null and alternative hypotheses are different for each type of chi-square test. Both
hypothesis statements must be stated in the context of the situation.
• Chi-square goodness-of-fit test
Ho: The population proportions (in context) are equal to the stated proportions.
Ha: At least one proportion differs from its stated proportion.
• Chi-square test of homogeneity
Ho: The category proportions (in context) are the same for all populations.
Ha: The category proportions (in context) are different for at least one population.
• Chi-square test of independence
Ho: There is no association between (two categorical variables in context). OR
The two categorical variables are independent.
Ha: There is an association between two categorical variables. OR
The two categorical variables are not independent.
2. Name the inferential procedure and state and verify whether the conditions needed for this
procedure are met.
o Observed counts are based on random samples or a randomized experiment.
o All expected counts are at least 5 (or all expected counts are at least 1 with no more
than 20% of the expected counts less than 5). Expected counts must be written.
3. Show the calculations including the formula for the test statistic, the value of the test statistic,
the degrees of freedom, and the p-value, and a picture of the sampling distribution showing
the associated p-value shaded.
• Degrees of freedom for a goodness-of-fit test = number of categories − 1;
Degrees of freedom for chi-square two-way tests = (r − 1)(c − 1), where r is the number of
rows and c is the number of columns.
The formula for χ² is included on the formula sheet but the formula for expected counts is
not.
expected count = (column total)(row total) / table total
4. State the conclusion in context of the situation.
Include a statement linking the p-value to the significance level and the decision about
the null hypothesis (fail to reject Ho or reject Ho) and a second statement about whether or
not there is evidence to support the alternative hypothesis. The alternative should always
be stated in context. Remember: you never accept the null hypothesis!
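Steps 3 and 4 can be illustrated with a short Python sketch that turns a chi-square statistic into a p-value and then into a decision. This is only a sketch under stated assumptions: the closed-form expression used here works only when the degrees of freedom are even, the significance level of 0.05 is an assumed default, and the function names are hypothetical. On the exam, the p-value comes from a calculator or a table.

```python
import math


def chi_square_p_value_even_df(statistic, df):
    """Upper-tail area P(chi-square >= statistic) for an EVEN number of df.

    Uses the closed form exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!,
    which is valid only when df is a positive even integer.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only handles positive even df")
    half = statistic / 2
    return math.exp(-half) * sum(
        half ** k / math.factorial(k) for k in range(df // 2)
    )


def decision(p_value, alpha=0.05):
    """Link the p-value to the significance level (alpha = 0.05 is assumed)."""
    return "reject Ho" if p_value < alpha else "fail to reject Ho"


# Values taken from the computer output in Additional Free Response Question 2.
p_value = chi_square_p_value_even_df(50.1832, 4)
print(p_value)
print(decision(p_value))
```

The decision is only the first statement of a complete conclusion. The second statement, about evidence for the alternative hypothesis in context, still has to be written in words.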
Additional Practice Problems
1. Chi square goodness of fit test
A clothing company claims that the dominant color of the shirts students wear can be predicted
using the proportions below.
35% black
15% white
30% blue
20% other
A random sample of 200 students was selected with the following results for their dominant shirt
color.
Black    White    Blue    Other
64       28       71      37
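For a goodness-of-fit problem like this one, the expected counts come from the claimed proportions, not from row and column totals. The Python sketch below shows one way to compute the expected counts, check the expected-count condition from the notes, and compute the test statistic and degrees of freedom. The variable and function names are made up for illustration.

```python
# Observed shirt colors and the company's claimed percentages.
observed = {"black": 64, "white": 28, "blue": 71, "other": 37}
claimed_percent = {"black": 35, "white": 15, "blue": 30, "other": 20}

n = sum(observed.values())  # sample size
# Expected count for each category: sample size times claimed proportion.
expected = {color: n * pct / 100 for color, pct in claimed_percent.items()}


def conditions_met(expected_counts):
    """All expected counts at least 5, or all at least 1 with
    no more than 20% of the expected counts less than 5."""
    counts = list(expected_counts)
    if all(e >= 5 for e in counts):
        return True
    small = sum(1 for e in counts if e < 5)
    return all(e >= 1 for e in counts) and small <= 0.20 * len(counts)


# Test statistic: sum of (O - E)^2 / E over the categories.
statistic = sum(
    (observed[c] - expected[c]) ** 2 / expected[c] for c in observed
)
df = len(observed) - 1  # number of categories - 1

print(expected)
print("conditions met:", conditions_met(expected.values()))
print("chi-square =", round(statistic, 3), "df =", df)
```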
2. A clothing company claims that the dominant color of the shirts students wear differs for male
and female students.
A random sample of 120 female students and another random sample of 120 male students were
selected with the following results for their dominant shirt color.
           Black    White    Blue    Other    Total
Males      36       14       49      21       120
Females    36       24       27      33       120
Total      72       38       76      54       240