AP Statistics Chi Square Tests Student Handout

2017-2018 EDITION

Copyright © 2017 National Math + Science Initiative, Dallas, Texas. All rights reserved. Visit us online at www.nms.org

1. The Behavioral Risk Factor Surveillance System is an ongoing health survey system that tracks health conditions and risk behaviors in the United States. In one of its studies, a random sample of 8,866 adults answered the question “Do you consume five or more servings of fruits and vegetables per day?” The data are summarized by response and by age-group in the frequency table below.

Age-Group (years)    Yes      No       Total
18-34                231      741      972
35-54                669      2,242    2,911
55 or older          1,291    3,692    4,983
Total                2,191    6,675    8,866

Do the data provide convincing statistical evidence that there is an association between age-group and whether or not a person consumes five or more servings of fruits and vegetables per day for adults in the United States?

Directions: Below are the elements of a complete response to this question, but each cell contains an error. With a partner or small group, find the error and write in the correction needed for an AP-quality response.

Ho: Fruit and vegetable consumption is associated with age group for the population of adults in the United States.

Ha: Fruit and vegetable consumption is not associated with age group for the population of adults in the United States.

Chi Square Goodness of Fit Test

Random sample of adults given.

Large enough sample since n > 30; 8866 > 30

χ² = Σ (O − E)² / E

χ² = 0.353 + 0.116 + 3.528 + 1.158 + 2.883 + 0.946 = 8.983

df = 6 – 1 = 5

p-value is P(χ² ≤ 8.983) = 0.011

Because the p-value is very small (smaller than α = 0.05), we would fail to reject the null hypothesis at the 0.05 level and conclude that the sample data provide strong evidence that there is not an association between age group and consumption of fruits and vegetables for adults in the United States. Older (55+ years) people were more likely to eat 5 or more servings of fruits and vegetables, and middle-aged people (35–54 years) were less likely to eat 5 or more servings of fruits and vegetables.

2. An ornithologist researching four bird species in Zilker Park in Austin, Texas, is concerned that grackles are overtaking the habitat. Previous studies have shown that the population consisting of just these four species is 8% parakeets, 16% warblers, 35% grackles, and 41% doves. A team visited the park and randomly selected (yeah, right) 238 bird sightings of these four species, with the results listed below.

parakeets warblers grackles doves

20 33 100 85

Do these results support the ornithologist’s concern that the population proportions of these four bird species are different from what previous studies suggest?

Directions: Below are the elements of a complete response to this question, but each cell contains an error. With a partner or small group, find the error and write in the correction needed for an AP-quality response.

Ho: The proportion of each species of bird is equal.

Ha: The proportion of each species of bird is not equal.

One proportion z-test

Random sample of bird sightings given.

All expected counts ≥ 5

Expected Counts

parakeets warblers grackles doves

59.5 59.5 59.5 59.5

χ² = Σ (O − E)² / E

χ² = 4.297

df = 4 – 1 = 3

p-value is P(χ² ≥ 5.696) = 0.216

Because the p-value is greater than 0.05 (0.216 > 0.05), we would fail to reject the null hypothesis. We have enough evidence to claim that the proportion of each species of bird is the same as in previous studies.

Multiple Choice Questions

1. A survey of 75 people with jobs at a mall was conducted to determine whether they worked in retail. The two-way table below shows the responses by gender.

          Retail    Non-retail    Total
Male      26        18            44
Female    26        5             31
Total     52        23            75

Which of the following best describes the association between type of employment and gender?

A) There appears to be no association, since the same number of men and women are working in retail.

B) There appears to be an association, since more than half of the people have jobs in retail.

C) There appears to be an association, since there are more men than women in the survey.

D) There appears to be an association, since the proportion of women having retail jobs is much larger than the proportion of men having retail jobs.

E) An association cannot be determined from these data.

2. The following two-way table resulted from a random sample of employees from a large company. Employees were classified as managers or workers and asked if they were satisfied with their job.

                 Manager    Non-Manager    Total
Satisfied        45         70             115
Not Satisfied    15         25             40
Total            60         95             155

If the null hypothesis of no association between employment status and job satisfaction is true, which of the following expressions gives the expected number of employees in the sample who are managers and who are satisfied with their job?

A) (115)(95) / 155

B) (115)(45) / 60

C) (60)(115) / 155

D) (115)(60) / 45

E) (60)(45) / 115

3. A biologist hypothesizes that half of the fish population in a large river will be brown trout and that the remaining half will be split evenly between brook trout and rainbow trout. In a random sample of 100 fish from this river, the individual species are distributed as shown in the table below.

Brown trout    Rainbow trout    Brook trout
51             30               19

What is the value of the χ² statistic for the goodness of fit test on these data?

A) Less than 1

B) At least 1, but less than 5

C) At least 5, but less than 10

D) At least 10, but less than 15

E) At least 15

4. In an experiment, two different species of fruit trees were crossbred. The resulting fruit from this crossbreeding experiment were classified by color of meat of the fruit and color of the skin of the fruit, into one of four groups, as shown in the table below.

Fruit Type Resulting From Crossbreeding    Number of Fruit Observed With These Colors
I: Red meat with orange skin               65
II: Red meat with yellow skin              37
III: Yellow meat with orange skin          24
IV: Yellow meat with yellow skin           24

A botanist expected that the ratio of 5:2:2:1 for the color types I: II: III: IV, respectively, would result from this crossbreeding experiment. Using the data above, a χ² value of approximately 9.567 was computed. Are the observed results inconsistent with the expected ratio at the 1 percent level of significance?

A) Yes, because the computed χ² value is greater than the critical value.

B) Yes, because the computed χ² value is less than the critical value.

C) No, because the computed χ² value is less than the critical value.

D) No, because the computed χ² value is greater than the critical value.

E) It cannot be determined because some of the expected counts are not large enough to use the χ² test.

5. A parent advisory board for a certain university was concerned about the effect of part-time jobs on the academic achievement of students attending the university. To obtain some information, the advisory board surveyed a simple random sample of 200 of the more than 20,000 students attending the university. Each student reported the average number of hours spent working part-time each week and his or her perception of the effect of part-time work on academic achievement. The data in the table below summarize the students’ responses by average number of hours worked per week (less than 11, 11 to 20, more than 20) and the perception of the effect of part-time work on academic achievement (positive, no effect, negative).

                                    Average Time Spent on Part-Time Jobs
Perception of the Effect of         Less Than 11      11 to 20          More Than 20
Part-time Work on Academic          Hours per Week    Hours per Week    Hours per Week
Achievement
Positive Effect                     21                9                 5
No Effect                           58                32                15
Negative Effect                     18                23                19

A chi-square test was used to determine if there is an association between the effect of part-time work on academic achievement and the average number of hours per week that students work. Which of the following is the number of degrees of freedom used for the appropriate test of significance?

A) df = 2

B) df = 3

C) df = 4

D) df = 6

E) df = 9

6. The head of marketing for a major fast food chain is interested in determining whether a customer’s age is related to the type of French fries they prefer. Her chain offers three types of fries: home-style, sweet potato, and cross-cut. A random sample of 100 customers is taken, and each customer’s age group and French fry preference are recorded. A chi-square test of independence is to be used to test the hypothesis. Assuming the conditions for inference have been met, which of the following statements is true?

A. A t-test for slope would also be appropriate for these data.

B. The test is not valid since only one sample was taken.

C. The null hypothesis for the test is that the proportion of each age group who prefer each type of French fry is 1/3.

D. A larger sample would increase the value of the χ² critical value of this test.

E. A smaller value of the chi-square statistic indicates a smaller difference in the French fry preference between age groups.

Additional Free Response Questions

1. A study was done to determine whether field mice found in a specific region of the country have a preference for the habitat where they tend to live. This region was divided into the following four habitat types:

(1) Fields on farms currently growing crops,
(2) Fields on farms not currently growing crops,
(3) Non-maintained fields close to working farms,
(4) Undeveloped grasslands not close to working farms.

The proportion of total acreage in each of the habitat types was determined for the study area. Using an eco-friendly catch and release method, mice were trapped and released in each of the four habitat areas. The results are given in the table below.

Habitat Type    Proportion of Total Acreage    Number of Mice Observed
1               0.550                          57
2               0.120                          12
3               0.115                          8
4               0.215                          40
Total           1.000                          117

(a) The researchers who are conducting the study expect the number of mice observed in a habitat type to be proportional to the amount of acreage of that type of habitat if mice have no habitat preference. Do these data provide convincing evidence that mice have a preference for the type of habitat in which they live? Conduct an appropriate statistical test to support your conclusion. Assume the conditions for inference are met.

(b) Relative to the proportion of total acreage, which habitat type did the mice seem to prefer? Explain.

2. A medical research team at a certain university was interested in the effect of daily exercise on stress level in students. To obtain some information, the team surveyed a simple random sample of 400 of the more than 35,000 students attending the university. Each student reported the average number of hours spent exercising each week and his or her perception of the effect of exercise on stress level. The data in the table below summarize the students’ responses by average number of hours exercising per week (less than 5, 5 to 7, more than 7) and the perception of the effect of exercise on stress level (positive, no effect, negative).

                                    Average Time Spent Exercising
Perception of the Effect of         Less Than 5       5 to 7            More Than 7
Exercise on Stress Level            Hours per Week    Hours per Week    Hours per Week
Positive Effect                     87                102               35
No Effect                           95                36                8
Negative Effect                     20                4                 13

A chi-square test was used to determine if there is an association between the effect of exercise on stress level and the average number of hours per week that students spent exercising. Computer output that resulted from performing this test is shown below.

CHI-SQUARE TEST

Expected counts are printed below observed counts

             <5        5-7       >7       Total
Positive     87        102       35       224
             113.12    79.52     31.36
No effect    95        36        8        139
             70.195    49.345    19.46
Negative     20        4         13       37
             18.685    13.135    5.18
Total        202       142       56       400

Chi-Sq = 50.1832, DF = 4, P-Value = 3.306 × 10^-10

(a) State the null and alternative hypotheses for this test.

(b) Discuss whether the conditions for a chi-square inference procedure are met for these data.

(c) Given the results from the chi-square test, what should the research team conclude?

(d) Based on your conclusion in part (c), which type of error (Type I or Type II) might the research team have made? Describe this error in the context of the question.
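
The figures in the computer output for this question can be checked with a short script. The Python sketch below is only an illustration under stated assumptions: the function names are hypothetical, and the p-value helper relies on a closed form for the chi-square upper tail that holds only when the degrees of freedom are even (as they are here). It recomputes the expected counts, the chi-square statistic, the degrees of freedom, and the p-value from the observed counts in the table.

```python
import math

# Observed counts from the exercise table in this question
# (rows: Positive, No effect, Negative; columns: <5, 5-7, >7 hours per week).
observed = [
    [87, 102, 35],
    [95, 36, 8],
    [20, 4, 13],
]


def expected_counts(table):
    """Expected count for each cell: (row total)(column total) / table total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square_statistic(table, expected):
    """Sum of (O - E)^2 / E over every cell of the table."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


def upper_tail_even_df(x, df):
    """P(chi-square >= x) for EVEN df only (finite-sum closed form)."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this helper only handles positive even df")
    half = x / 2
    return math.exp(-half) * sum(
        half ** j / math.factorial(j) for j in range(df // 2)
    )


expected = expected_counts(observed)
chi_sq = chi_square_statistic(observed, expected)
df = (len(observed) - 1) * (len(observed[0]) - 1)  # (r - 1)(c - 1)
p_value = upper_tail_even_df(chi_sq, df)

print([[round(e, 3) for e in row] for row in expected])
print(round(chi_sq, 4), df, p_value)
```

The printed values can be compared line by line with the expected counts, Chi-Sq, DF, and P-Value reported in the output. For odd degrees of freedom a statistics library or a chi-square table would be needed instead of this helper.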

3. Contestants on a game show spin a wheel like the one shown in the figure below. Each of four outcomes on the wheel is equally likely and outcomes are independent from one spin to the next.

• The contestant spins the wheel.
• If the result is a skunk, no money is won and the contestant’s turn is finished.
• If the result is a number, the corresponding amount in dollars is won. The contestant can then stop with those winnings or can choose to spin again, and his or her turn continues.
• If the contestant spins again and the result is a skunk, all the money earned on that turn is lost and the turn ends.
• The contestant may continue adding to his or her winnings until he or she chooses to stop or until a spin results in a skunk.

(a) What is the probability that the result will be a number on all of the first three spins of the wheel?

(b) Suppose a contestant has earned $800 on his or her first three spins and chooses to spin again. What is the expected value of his or her total winnings for the four spins?

(c) A contestant who lost at this game alleges that the wheel is not fair. In order to check on the fairness of the wheel, the data in the table below were collected for 100 spins of this wheel.

Result       Skunk    $100    $200    $500
Frequency    33       21      20      26

Based on these data, can you conclude that the four outcomes on this wheel are not equally likely? Give appropriate statistical evidence to support your answer.

Chi-square Tests Notes

Requirements for a complete response:

1. Identify the population parameters of interest, the null and the alternative hypotheses, and the significance level of the test.

• All the population parameters of interest (proportions in this case) must be defined in the context of the problem.

• The null and alternative hypotheses are different for each type of chi-square test. Both hypothesis statements must be stated in the context of the situation.

• Chi-square goodness-of-fit test

H0: The population proportions (in context) are equal to the stated proportions.

Ha: At least one proportion differs from its stated proportion.

• Chi-square test of homogeneity

H0: The category proportions (in context) are the same for all populations.

Ha: The category proportions (in context) are different for at least one population.

• Chi-square test of independence

H0: There is no association between (two categorical variables in context). OR The two categorical variables are independent.

Ha: There is an association between two categorical variables. OR The two categorical variables are not independent.

2. Name the inferential procedure, then state the conditions needed for this procedure and verify whether they are met.

o Observed counts are based on random samples or a randomized experiment.

o All expected counts are at least 5 (or all expected counts are at least 1 with no more than 20% of the expected counts less than 5). Expected counts must be written.

3. Show the calculations, including the formula for the test statistic, the value of the test statistic, the degrees of freedom, the p-value, and a picture of the sampling distribution showing the associated p-value shaded.

• Degrees of freedom for a goodness-of-fit test = number of categories − 1;

Degrees of freedom for chi-square two-way tests = (r − 1)(c − 1), where r is the number of rows and c is the number of columns.

The formula for χ² is included on the formula sheet but the formula for expected counts is not.

expected count = (column total)(row total) / table total

4. State the conclusion in context of the situation.

Include a statement linking the p-value to the significance level and the decision about the null hypothesis (fail to reject Ho or reject Ho) and a second statement about whether or not there is evidence to support the alternative hypothesis. The alternative should always be stated in context. Remember: you never accept the null hypothesis!
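
As an illustration of step 3 for a goodness-of-fit test, the Python sketch below recomputes the χ² value of approximately 9.567 that is reported in Multiple Choice Question 4, using the observed fruit counts and the expected 5:2:2:1 ratio from that question. The function names are hypothetical and chosen only for this example; exact fractions are used so the expected counts come out as whole numbers.

```python
from fractions import Fraction

# Observed counts for fruit types I, II, III, IV (Multiple Choice Question 4).
observed = [65, 37, 24, 24]
# Hypothesized ratio for types I : II : III : IV from the same question.
ratio = [5, 2, 2, 1]


def expected_from_ratio(ratio, n):
    """Expected count for each category: n times that category's share."""
    total = sum(ratio)
    return [n * Fraction(part, total) for part in ratio]


def gof_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


n = sum(observed)
expected = expected_from_ratio(ratio, n)
chi_sq = gof_statistic(observed, expected)
df = len(observed) - 1  # number of categories - 1

print([float(e) for e in expected])
print(round(float(chi_sq), 3), df)
```

The statistic is then compared with a χ² distribution on the printed degrees of freedom, either through a p-value or a critical value from a table.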

Additional Practice Problems

1. Chi square goodness of fit test

A clothing company claims that the dominant color of the shirts students wear can be predicted using the proportions below.

35% black
15% white
30% blue
20% other

A random sample of 200 students was selected with the following results for their dominant shirt color.

Black    White    Blue    Other
64       28       71      37

2. A clothing company claims that the dominant color of the shirts students wear differs for male and female students.

A random sample of 120 female students and another random sample of 120 male students were selected with the following results for their dominant shirt color.

           Black    White    Blue    Other    Total
Males      36       14       49      21       120
Females    36       24       27      33       120
Total      72       38       76      54       240
