AP Statistics Unit 4 Concepts (Inference) (Chapter 8, 9, 10, 11, 12) · 2017. 1. 18. · 3 CLIFF...


AP Statistics – Unit 4 Concepts (Inference) (Chapter 8, 9, 10, 11, 12)

I can identify a point estimator to help estimate an unknown parameter.

I can correctly interpret the meaning of the margin of error in context.

I can understand that a confidence interval gives a range of plausible values for the parameter.

I can interpret a confidence level and interval in context.

I can understand why each of the three inference conditions—Random, Normal, and Independent—is important.

I can understand how confidence level or sample size will affect the margin of error.

I can construct and interpret a confidence interval for a population proportion or mean.

I can determine critical values for calculating a confidence interval using a table or your calculator.

I can explain how practical issues like nonresponse, undercoverage, and response bias can affect the interpretation of a confidence interval.

I can carry out the steps in constructing a confidence interval for a population proportion: define the parameter; check conditions; perform calculations; interpret results in context.

I can determine the sample size required to obtain a level C confidence interval for a population proportion or mean with a specified margin of error.

I can understand how the margin of error of a confidence interval changes with the sample size and the level of confidence C.

I can carry out the steps in constructing a confidence interval for a population mean: define the parameter; check conditions; perform calculations; interpret results in context.

I can determine sample statistics from a confidence interval.

I can correctly identify the parameter of interest for a hypothesis test.

I can state correct hypotheses for a significance test about a population proportion or mean.

I can interpret P-values in context.

I can interpret a Type I error and a Type II error in context, and give the consequences of each.

I can understand the relationship between the significance level of a test, P(Type II error), and power.

I can check conditions for carrying out a test about a population proportion or mean.

I can, if conditions are met, conduct a significance test about a population proportion or mean.

I can use a confidence interval to draw a conclusion for a two-sided test about a population proportion.

I can use a confidence interval to draw a conclusion for a two-sided test about a population mean.

I can recognize when a confidence interval is not needed for estimations.

I can recognize paired data and use one-sample t procedures to perform significance tests for such data.

I can describe the characteristics of the sampling distribution of $\hat{p}_1 - \hat{p}_2$.

I can calculate probabilities using the sampling distribution of $\hat{p}_1 - \hat{p}_2$.

I can determine whether the conditions for performing inference are met.

I can construct and interpret a confidence interval to compare two proportions.

I can perform a significance test to compare two proportions.

I can interpret the results of inference procedures in a randomized experiment.

I can describe the characteristics of the sampling distribution of $\bar{x}_1 - \bar{x}_2$.

I can calculate probabilities using the sampling distribution of $\bar{x}_1 - \bar{x}_2$.

I can determine whether the conditions for performing inference are met.

I can use two-sample t procedures to compare two means based on summary statistics.

I can use two-sample t procedures to compare two means from raw data.

I can interpret standard computer output for two-sample t procedures.

I can perform a significance test to compare two means.

I can check conditions for using two-sample t procedures in a randomized experiment.

I can interpret the results of inference procedures in a randomized experiment.

I can compute expected counts, conditional distributions, and contributions to the chi-square statistic.

I can check the Random, Large sample size, and Independent conditions before performing a chi-square test.

I can use a chi-square goodness-of-fit test to determine whether sample data are consistent with a specified distribution of a categorical variable.

I can examine individual components of the chi-square statistic as part of a follow-up analysis.


I can use a chi-square test for homogeneity to determine whether the distribution of a categorical variable differs for several populations or treatments.

I can interpret computer output for a chi-square test based on a two-way table.


I can show that the two-sample z test for comparing two proportions and the chi-square test for a 2-by-2 two-way table give equivalent results.


I can use a chi-square test of association/independence to determine whether there is convincing evidence of an association between two categorical variables.

I can interpret computer output for a chi-square test based on a two-way table.


I can distinguish between the three types of chi-square tests.

I can check conditions for performing inference about the slope 𝜷 of the population regression line.

I can interpret computer output from a least-squares regression analysis.

I can construct and interpret a confidence interval for the slope 𝜷 of the population regression line.

I can perform a significance test about the slope 𝜷 of a population regression line.

I can use transformations involving powers and roots to achieve linearity for a relationship between two variables.

I can make predictions from a least-squares regression line involving transformed data.

I can use transformations involving logarithms to achieve linearity for a relationship between two variables.

I can make predictions from a least-squares regression line involving transformed data.

I can determine which of several transformations does a better job of producing a linear relationship.

I can determine the proper inference procedure to use in a given setting.


CLIFF NOTES: AP Statistics – Exam Review (Inference Review)

Overview of Chapters 8, 9, 10, 11, 12

Some Key Vocabulary:

Understand the difference between a statistic and a parameter
o STATISTICS: $\bar{x}$, $s_x$, $\hat{p}$
o PARAMETERS: μ, σ, p

Null hypothesis versus alternative hypothesis; one- versus two-sided alternative

Major Concepts to be mastered:

#1. You need to know the difference between a population parameter, a sample statistic, and the sampling distribution of a statistic.

a. In sample proportions, what do we represent the population parameter as? ANSWER: p (the sample statistic that estimates it is p-hat)

b. Be familiar with the formulas to find the mean and standard deviation of the sampling distribution of $\hat{p}$ (p-hat).

$\mu_{\hat{p}} = p$ and $\sigma_{\hat{p}} = \sqrt{\dfrac{p(1-p)}{n}}$; NOTE: $1 - p = q$

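c. Know the formula to find the standard deviation of the sampling distribution of a sample mean, and its estimate (the standard error) when σ is unknown.

$\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$, estimated by $s_{\bar{x}} = \dfrac{s_x}{\sqrt{n}}$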
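The two formulas in #1 can be checked with a few lines of Python. This is only a minimal sketch: the function names and the input values (p = 0.5 with n = 100, and s = 10 with n = 25) are made up for illustration and do not come from the review packet.

```python
import math

def sd_sample_proportion(p, n):
    """Standard deviation of the sampling distribution of p-hat."""
    return math.sqrt(p * (1 - p) / n)

def standard_error_mean(s, n):
    """Standard error of x-bar: the sample SD divided by sqrt(n)."""
    return s / math.sqrt(n)

# Hypothetical inputs, chosen so the arithmetic is easy to verify by hand.
sd_phat = sd_sample_proportion(0.5, 100)   # sqrt(0.25 / 100)
se_xbar = standard_error_mean(10, 25)      # 10 / 5

print(sd_phat, se_xbar)
```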

#2. How can you determine whether a sample is large enough to assume that the sampling distribution is approximately normal?

FOR MEANS: If the sample size is at least 30, we can assume an approximately normal distribution (normality) thanks to the Central Limit Theorem.

FOR PROPORTIONS: You must show normality by checking both conditions: n * p ≥ 10; n * q ≥ 10; note: q = 1 - p
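The two rules of thumb in #2 can be written as small checks. This is a sketch under the thresholds stated above (30 for means, 10 for proportions); the function names and the sample inputs are hypothetical.

```python
def normal_ok_for_mean(n, threshold=30):
    """Central Limit Theorem rule of thumb for a sample mean."""
    return n >= threshold

def normal_ok_for_proportion(n, p, threshold=10):
    """Large counts check: both n*p and n*q must reach the threshold."""
    q = 1 - p
    return n * p >= threshold and n * q >= threshold

# Hypothetical examples.
print(normal_ok_for_mean(45))               # sample of 45 for a mean
print(normal_ok_for_proportion(500, 0.55))  # 275 and 225 expected counts
print(normal_ok_for_proportion(50, 0.1))    # only 5 expected successes
```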

Inference Review

You must be able to decide which statistical inference procedure is appropriate in a given setting. Working lots of review problems will help you.

#3. On any hypothesis testing problem:

The textbook refers to this as the FOUR STEP PROCESS; we use the acronym PHANTOMS:

1. P/H: State the parameter of interest in the context of the problem. State hypotheses in words and symbols.
2. A/N: Identify the correct inference procedure and verify assumptions/conditions for using it.
3. T/O: Calculate the test statistic and the P-value (or rejection region).
4. M/S: Draw a conclusion in the context of the problem that is directly linked to your P-value or rejection region.

TIPS:

State your hypotheses in terms of population parameters, NOT ON SAMPLE STATISTICS!!

Use the standard notation in your hypotheses: μ for a population mean, ‘p’ for a population proportion, or 𝜷 for the slope of a regression line.

Don’t reverse the NULL and ALTERNATIVE hypotheses. Remember, the null hypothesis is basically a statement of ‘no effect’ or ‘no difference.’ If you hope to show that there is a difference between TWO POPULATION MEANS, then the null hypothesis should be that the population MEANS ARE EQUAL!

It is not good enough to state the conditions/assumptions for the chosen inference procedure. You must show that the conditions/assumptions are indeed satisfied.


#4. On any confidence interval problem:

STEPS (P.A.N.I.C.):

1. P: Identify the population of interest and the parameter you want to draw conclusions about.
2. A/N: Choose the appropriate inference procedure and verify assumptions/conditions for its use.
3. I: Carry out the inference procedure.
4. C: Interpret your results in the context of the problem.

#5. You need to know the specific conditions required for the validity of each statistical inference procedure -- confidence intervals and significance tests. They are:

RANDOM; INDEPENDENT; NORMAL

Introduction to Inference

#6. Be sure to have a clear understanding as to what a confidence interval tells us.

“We are ________% confident that the interval (_____, _____) captures the true mean/proportion/difference of ___________________.”

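#7. What is z*? ANSWER: The critical value from the standard normal distribution: the z-score that leaves area (1 − C)/2 in the upper tail, so that area C lies between −z* and z*.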

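#8. Understand what margin of error tells us. ANSWER: It shows how accurate we believe our estimate is: at the stated confidence level, the estimate should be within one margin of error of the true parameter value.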

#9. Be able to understand what happens to the margin of error (m.e.) as:

z* (OR t*) decreases: m.e. decreases

σ decreases: m.e. decreases

n decreases: m.e. increases

By how many times must the sample size n increase in order to cut the margin of error in half?

Multiply by 4, because the margin of error is proportional to 1/√n.
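The "multiply by 4" rule in #9 can be confirmed numerically. The sketch below uses the z-interval margin of error z*·σ/√n; the values σ = 10 and n = 100 are assumptions chosen for illustration, and `margin_of_error` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def margin_of_error(z_star, sigma, n):
    """Margin of error for a z-interval for a mean."""
    return z_star * sigma / math.sqrt(n)

# Critical value for 95% confidence (area 0.975 to the left of z*).
z_star = NormalDist().inv_cdf(0.975)

me_n = margin_of_error(z_star, 10, 100)    # original sample size
me_4n = margin_of_error(z_star, 10, 400)   # four times the sample size
ratio = me_4n / me_n                       # should be one half

print(me_n, me_4n, ratio)
```

The same ratio appears for any z*, σ, and n, since only the 1/√n factor changes.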

#10. Know what happens to the null hypothesis when the P-value is large and when it is small. LARGE (P-value ≥ α): Fail to reject Ho. SMALL (P-value < α): Reject Ho.

#11. What does a test statistic measure? ANSWER: It says how far our sample statistic is from the hypothesized parameter value, measured in standard deviation (standard error) units. It’s a standardized distance from the value stated in the null hypothesis.

#12. Be able to understand the difference between a Type I and Type II Error. Be able to calculate the probability of a Type I Error

TYPE I: We reject the null hypothesis when it is actually true. P(Type I) = alpha (α)

TYPE II: We fail to reject the null hypothesis when it is actually false. P(Type II) = beta (β)


#13. Know how to calculate the power of a test.

Power of a test: Probability of correctly rejecting a null hypothesis

Power = 1 - P(Type II error).

You can increase the power of a test by increasing the sample size or increasing the significance level (the probability of a Type I error).
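To see how sample size and significance level affect power, here is a rough sketch for one simplified case: a one-sided z test of Ho: μ = μ0 against Ha: μ > μ0 with σ treated as known. That setting, the function name, and all the numbers (μ0 = 100, true mean 103, σ = 10) are assumptions made for illustration; the packet does not specify a particular power calculation.

```python
import math
from statistics import NormalDist

def power_one_sided_z(mu0, mu_alt, sigma, n, alpha):
    """Power of a one-sided z test (Ha: mu > mu0) when the true mean is mu_alt."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha)                    # rejection cutoff on the z scale
    shift = (mu_alt - mu0) / (sigma / math.sqrt(n))  # true mean in standard-error units
    return 1 - z.cdf(z_crit - shift)                 # P(reject Ho | mu = mu_alt)

# Hypothetical scenario: mu0 = 100, true mean 103, sigma = 10.
power_n25 = power_one_sided_z(100, 103, 10, 25, 0.05)
power_n100 = power_one_sided_z(100, 103, 10, 100, 0.05)    # larger sample
power_alpha10 = power_one_sided_z(100, 103, 10, 25, 0.10)  # larger alpha

print(power_n25, power_n100, power_alpha10)
```

Both the larger sample and the larger α give higher power than the baseline, which matches the statement above. The larger α comes at the cost of a higher Type I error probability.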

t-distributions

#14. When do we use ‘s’ as an estimate of σ?

Virtually always…. We do this when we do not know the true value of standard deviation, which is essentially always.

#15. What is the standard error of the sample mean (x-bar)? $s_{\bar{x}} = \dfrac{s_x}{\sqrt{n}}$

#16. What are some differences between a standard normal distribution and a t-distribution? ANSWER: Both are symmetric, single-peaked, and centered at 0, but a ‘t’ distribution has more spread (more area in the tails) than the standard normal ‘z’ distribution, because using s to estimate σ adds variability.

#17. What will happen to the t-distribution as the degrees of freedom increases? ANSWER: df = n – 1. As the df increases, the t-distribution gets closer and closer to the standard normal distribution, because s becomes a better estimate of σ as the sample size grows.

#18. In a matched pairs t-procedure, what is the parameter of interest (μ)? ANSWER: The parameter of interest is the MEAN OF THE DIFFERENCES within pairs, $\mu_D$ (thus the letter ‘D’). $H_0: \mu_D = 0$ means that, on average, there is NO DIFFERENCE within pairs.

#19. Know the assumptions for a t-distribution.

RANDOM, INDEPENDENCE, NORMAL

#20. How are two-sample problems different from one-sample problems?

You must ask the question about whether the two samples are INDEPENDENT of each other or not. If they are NOT independent, then you are probably dealing with a matched pairs scenario.

ALSO: When you are checking the conditions for proportions, you must check them for BOTH proportions.

Comparing Two Means

#21. Know the assumptions for comparing two means.

RANDOM, INDEPENDENCE, NORMAL

#22. Do the two sample sizes have to be the same for comparing two means?

NO!!!, but they can be. For MATCHED PAIRS, they will be the same. Just because sample sizes are the same, don’t ALWAYS assume it’s matched pairs.

#23. What is the null hypothesis when comparing two means?

MEANS: $H_0: \mu_1 = \mu_2$; PROPORTIONS: $H_0: p_1 = p_2$

#24. How do you standardize for comparing two means?

$t = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$
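The two-sample t statistic in #24 can be computed from summary statistics as in the sketch below. The function name `two_sample_t` and the summary values are hypothetical; under Ho: μ1 = μ2 the hypothesized difference is 0, which is the default here.

```python
import math

def two_sample_t(xbar1, s1, n1, xbar2, s2, n2, null_diff=0.0):
    """Two-sample t statistic from summary statistics (unpooled standard error)."""
    standard_error = math.sqrt(s1**2 / n1 + s2**2 / n2)
    return ((xbar1 - xbar2) - null_diff) / standard_error

# Hypothetical summary statistics: group 1 (mean 10, SD 3, n 9),
# group 2 (mean 8, SD 4, n 16). The standard error is sqrt(1 + 1).
t_stat = two_sample_t(10, 3, 9, 8, 4, 16)

print(t_stat)
```

The P-value still has to come from a t table or calculator, using the degrees of freedom your course specifies for two-sample procedures.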


Inference for Proportions

#25. Know how to calculate the mean and standard deviation of a sample proportion.

Formulas are both featured in the formula booklet provided on all tests and on the AP EXAM.

#26. Know how to calculate the standard error of $\hat{p}$. SEE ABOVE

#27. Know the assumptions that must be met in order to use z-procedures for inference about a proportion.

RANDOM, INDEPENDENCE, NORMAL

#28. Know how to calculate the test statistic and margin of error for a one-sample proportion.

$z = \dfrac{\hat{p} - p}{\sqrt{\dfrac{p(1-p)}{n}}}$; Margin of error = $z^* \sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$
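Note that the two formulas in #28 use different proportions under the square root: the test statistic uses the hypothesized p, while the margin of error uses the sample p-hat. A minimal sketch, with hypothetical function names and made-up inputs (p-hat = 0.55, n = 500, hypothesized p = 0.50, 95% confidence):

```python
import math
from statistics import NormalDist

def one_prop_z(p_hat, p_null, n):
    """z statistic for a one-sample proportion test (uses the hypothesized p)."""
    return (p_hat - p_null) / math.sqrt(p_null * (1 - p_null) / n)

def one_prop_margin(p_hat, n, conf_level):
    """Margin of error for a one-sample z-interval (uses p-hat)."""
    z_star = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    return z_star * math.sqrt(p_hat * (1 - p_hat) / n)

# Hypothetical inputs for illustration only.
z_stat = one_prop_z(0.55, 0.50, 500)
margin = one_prop_margin(0.55, 500, 0.95)

print(z_stat, margin)
```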

Chi-Square Distributions

#29. What is a chi-square distribution? What is the shape of a chi-square distribution?

ANSWER: We use chi-square tests when we are dealing with CATEGORICAL data/variables. A chi-square distribution is ALWAYS skewed to the right. It is NOT a NORMAL distribution.

#30. Know how to state the null and alternative hypotheses for a goodness of fit test. ANSWER: There are a variety of ways, depending on whether it is a:

CHI-SQUARE test for Goodness of Fit
CHI-SQUARE test for Independence
CHI-SQUARE test for Homogeneity

#31. Know the assumptions that must be met in order to use a goodness of fit test.

RANDOM, INDEPENDENCE, LARGE SAMPLE SIZE

The NORMAL condition is replaced by the ‘LARGE SAMPLE SIZE’ condition, which asks whether all the expected counts are at least 5.

#32. Know how to calculate the expected count in any cell of a two-way table when the null hypothesis is true.

$\text{Expected count} = \dfrac{\text{row total} \times \text{column total}}{\text{table total}}$
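The expected-count formula can be applied to every cell of a table at once. The sketch below uses the North/South voter counts from Baseline Question #11 later in this packet as its input; the function names are hypothetical, and the code covers only the arithmetic, not the conditions, P-value, or conclusion.

```python
def expected_counts(observed):
    """Expected counts for a two-way table: row total * column total / table total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    table_total = sum(row_totals)
    return [[r * c / table_total for c in col_totals] for r in row_totals]

def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all cells."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )

# Rows: NORTH, SOUTH. Columns: decided, still undecided.
observed = [[116, 60], [148, 52]]
expected = expected_counts(observed)
chi_sq = chi_square_statistic(observed, expected)
df = (len(observed) - 1) * (len(observed[0]) - 1)   # (r - 1)(c - 1)

print(expected, chi_sq, df)
```

Checking that every entry of `expected` is at least 5 is the Large Sample Size condition from #31.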

#33. Know how to calculate the degrees of freedom for a two-way table in a chi-square test.

df = (r – 1)(c – 1) for a two-way table with r rows and c columns. Otherwise (goodness of fit): df = n – 1, where ‘n’ is the number of categories, NOT the number of individuals in the sample.

Inference for the Regression Setting

#34. When dealing with inference in the regression setting, what are the unknown parameters?

SLOPE: 𝜷, which is a parameter. The sample statistic that estimates the slope is, of course, ‘b’.

#35. How do you calculate the degrees of freedom in the regression setting?

df = n – 2, because TWO parameters (the slope and the y-intercept) are estimated from the data instead of just 1.


#36. What are the assumptions for inference in the regression setting?

L.I.N.E.R.:
LINEAR: the relationship between the variables should be linear
INDEPENDENCE
NORMALITY
EQUAL VARIANCE
RANDOM

#37. How do you calculate the test statistic in the regression setting?

$t = \dfrac{b}{SE_b}$

#38. How do you calculate a confidence interval in the regression setting?

$b \pm t^* \, SE_b$

#39. What is the null hypothesis going to be in the regression setting?

$H_0$: There is NO association/relationship between the two variables ($\beta = 0$).


TEN BASELINE QUESTIONS:

#1. Concept:

I can understand how confidence level or sample size will affect the margin of error.

I can interpret a confidence level and interval in context.

I can correctly interpret the meaning of the margin of error in context.

Based on a survey of a random sample of 500 adults in the United States, a statistician reports that 55 percent of adults in the United States are in favor of increasing the minimum hourly wage.

If the reported percent has a margin of error of 5.7 percentage points, which confidence level is this closest to?
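One way to approach this question is to work the margin of error formula backwards: divide the margin of error by the standard error of p-hat to get z*, then find the area between −z* and z*. The sketch below is one possible route, not an official solution; the variable names are hypothetical.

```python
import math
from statistics import NormalDist

p_hat = 0.55      # reported sample proportion
n = 500           # sample size
margin = 0.057    # reported margin of error

# Margin of error = z* * sqrt(p_hat * (1 - p_hat) / n), so solve for z*.
standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
z_star = margin / standard_error

# Confidence level = area between -z* and z* under the standard normal curve.
confidence_level = 2 * NormalDist().cdf(z_star) - 1

print(round(z_star, 2), round(confidence_level, 3))
```

With these inputs the confidence level comes out close to 99 percent.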

#2. Concept:

I can identify a point estimator to help estimate an unknown parameter.

A large-sample 95 percent confidence interval for the proportion of airline tickets that are canceled on the intended arrival day is (0.028, 0.106). What is the point estimate for the proportion of airline tickets that are canceled from which this interval was constructed?

#3. Concept:

I can carry out the steps in constructing a confidence interval for a population mean: define the parameter; check conditions; perform calculations; interpret results in context.

When using a one-sample t-procedure to construct a confidence interval for the mean of a finite population, there is a condition known as the 10% condition. Explain this condition and what it is designed to ensure.


#4. Concept:

I can construct and interpret a confidence interval for a population proportion or mean.

A random sample of 150 students at a large high school resulted in an 80 percent confidence interval for the mean number of hours of studying per day of (0.75, 2.8). What best summarizes the meaning of this confidence interval?

“We are ________ confident that the interval _______________________ captures the _____________ _________________________________________.”

“About _______ of all _______________________ of size ________________ from this population would result in a(n) __________ confidence interval that would cover the ______________________________________________________________.”

#5. Concept:

I can determine the sample size required to obtain a level C confidence interval for a population proportion or mean with a specified margin of error.

In 2006 a survey of Internet usage found that 68 percent of adults age 18 years and older in the United States use the Internet. A broadband company believes that the percent is greater now than it was in 2006 and will conduct a survey. The company plans to construct a 90 percent confidence interval to estimate the current percent and wants the margin of error to be no more than 4.5 percentage points. Assuming that at least 68 percent of adults use the Internet, what inequality should be used to find the sample size (n) needed?
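A sketch of how such a sample-size inequality can be solved once it is set up: require z*·√(p(1 − p)/n) ≤ margin of error and solve for n. Using p = 0.68 is an assumption that follows the problem's wording, since p(1 − p) is largest at 0.68 when p is at least 0.68. The variable names are hypothetical.

```python
import math
from statistics import NormalDist

conf_level = 0.90
p_guess = 0.68     # planning value for p taken from the problem statement
margin = 0.045     # desired margin of error (4.5 percentage points)

# Critical value for 90% confidence (area 0.95 to the left of z*).
z_star = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)

# z* * sqrt(p(1-p)/n) <= margin  rearranges to  n >= (z*/margin)^2 * p(1-p).
n_exact = (z_star / margin) ** 2 * p_guess * (1 - p_guess)
n_needed = math.ceil(n_exact)   # always round UP to the next whole person

print(round(z_star, 3), n_needed)
```

Rounding up matters here: rounding down would leave the margin of error slightly above the target.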


#6. Concept:

I can recognize paired data and use one-sample t procedures to perform significance tests for such data.

I can determine the proper inference procedure to use in a given setting.

Suppose you want to do a hypothesis test for two sets of data that are independent of each other. What type of test would you do?

#7. Concept:

A university researcher conducted a two-tailed hypothesis test on a set of data and obtained a p-value of 0.32. If the researcher had conducted a one-tailed test on the same set of data, what would be some possible p-value(s) that the researcher could have obtained?

#8. Concept:

I can recognize paired data and use one-sample t procedures to perform significance tests for such data.

I can determine the proper inference procedure to use in a given setting.

What type of inference procedure would be used to determine if there is a relationship between the type of car a person drives and the driver’s gender? What type of inference procedure would be done where a group of randomly selected subjects take a sleeping pill one week and then a week later take another sleeping pill and afterwards the results of the two sleeping pills are compared?


#9. Concept:

I can construct and interpret a confidence interval for the slope 𝜷 of the population regression line.


#10. Concept: (from the 2011 AP Stats Exam)


High cholesterol levels in people can be reduced by exercise, diet, and medication. Twenty middle-aged males with cholesterol readings between 220 and 240 milligrams per deciliter (mg/dL) of blood were randomly selected from the population of such male patients at a large hospital. Ten of the 20 males were randomly assigned to group A, advised on appropriate exercise and diet, and also received a placebo. The other 10 males were assigned to Group B, received the same advice on appropriate exercise and diet, but received a drug intended to reduce cholesterol instead of a placebo. After three months, post treatment cholesterol readings were taken for all 20 males and compared to pretreatment cholesterol readings. The tables below give the reduction in cholesterol level (pretreatment reading minus post treatment reading) for each male in the study.

a. Do the data provide convincing evidence, at the α = 0.01 level, that the cholesterol drug is effective in producing a reduction in mean cholesterol level beyond that produced by exercise and diet?

b. Interpret what this p-value measures in the context of this study.

c. Based on this p-value and study design, what conclusion should be drawn in the context of this study? Use a significance level of α = 0.01.

d. Based on your conclusion in part (c), which type of error, Type I or Type II, could have been made? What is ONE potential consequence of this error?


#11. BASELINE: CONCEPT: I can use a chi-square test to determine whether sample data are consistent with a specified distribution of a categorical variable.

A few weeks before the senatorial election between Senator Smirk and his challenger, former Governor Graft, the senator’s polling organization wants to know where he should concentrate his campaigning. They take simple random samples of potential voters in the southern and northern portions of the state, and ask them if they have decided who to vote for or are still undecided. Here are the results:

REGIONS     Decided on a candidate    Still undecided    TOTALS
NORTH       116                       60                 176
SOUTH       148                       52                 200
TOTALS      264                       112                376

a. Do these data provide convincing evidence that there is a difference in the distribution of voters who have decided or are still undecided in the two regions?

b. The pollsters are concerned that while all 200 people in the ‘south’ sample responded, 24 people (out of the original SRS of 200) in the ‘north’ sample did not respond. Is it possible that the opinions of these people would change the pollsters’ conclusions? What type of error might have been made?


FREE RESPONSE #1: “Women & Time” A New York Times poll on women’s issues interviewed 1025 women randomly selected from the United States, excluding Alaska and Hawaii. The poll found that 47% of the women said they do not get enough time for themselves.

a. Construct and interpret a 90% confidence interval that estimates the proportion of women in the United States who do not feel that they get enough time for themselves.

b. Suppose this poll was conducted by telephone calls made from 9 AM to 5 PM. Explain how using this method might result in biased results, and speculate about the direction of the bias on whether the 47% was an overestimate or underestimate of the true proportion.

c. Which type of error was made in this sampling procedure? Sampling error or non-sampling error?


FREE RESPONSE #2: “Fuel Efficiency” National Fuelsaver Corporation manufactures the Platinum Gasver, a device they claim “may increase gas mileage by 30%.” Here are the percent changes in gas mileage for 15 identical, randomly selected vehicles, as presented in one of the company’s advertisements:

-2.4 6.9 10.4 10.8 24.8

28.7 33.7 34.6 38.5 28.7

40.2 44.6 46.8 46.9 48.3

a. The sample mean is $\bar{x} = 29.43$ and the sample standard deviation is s = 16.23. Calculate and interpret the standard error of the mean for these data.

b. Construct and interpret a 90% confidence interval to estimate the mean change (in percent) in gas mileage. Do these data support the company’s claim? Explain.


FREE RESPONSE #3: “Big Box Electronics” Big Box Electronics, a large national chain store, has one store in the city of Kingston. One factor in deciding whether to build a second store in the city is whether the current store is serving all residents equally well, or whether unequal proportions of residents from different parts of town are using the store because it’s located on one side of town. The national managers of Big Box divide Kingston into four geographical regions and determine the percentage of residents who live in each region. Here’s what they find:

Region                          NORTH    SOUTH    EAST    WEST
Percentage of the population    40%      24%      22%     14%

Then the managers take a simple random sample of shoppers at the store and determine which part of town they come from by asking for their zip code when they are checking out:

Region                          NORTH    SOUTH    EAST    WEST
Number of shoppers              120      48       62      20

a. Is Kingston’s only Big Box store used by a higher proportion of the residents in some parts of the town than others? Support your answer with an appropriate statistical test.

b. Considering the decision that you made in part (a) based on the p-value that you obtained, what type of error might you have made? What would be a possible consequence of this error?


FREE RESPONSE #4: “Distracted Driving” (from 2007 AP Stats Exam Question #5) Researchers want to determine whether drivers are significantly more distracted while driving when using a cell phone than when talking to a passenger in the car. In a study involving 48 people, 24 people were randomly assigned to drive in a driving simulator while using a cell phone. The remaining 24 were assigned to drive in the driving simulator while talking to a passenger in the simulator. Part of the driving simulation for both groups involved asking drivers to exit the freeway at a particular exit. In the study, 7 of the 24 cell phone users missed the exit, while 2 of the 24 talking to a passenger missed the exit.

a. Would this study be classified as an experiment or an observational study? Provide an explanation to support your answer.

b. State the null and alternative hypotheses of interest to the researchers.

c. One test of significance that you might consider using to answer the researchers’ question is a two-sample z-test for proportions. State the conditions required for this test to be appropriate. Then comment on whether each condition is met.

d. Using an advanced statistical method for small samples to test the hypotheses in part (b), the researchers report a p-value of 0.0683. Interpret, in everyday language, what this p-value measures in the context of this study and state what conclusion should be made based on this p-value.


FREE RESPONSE #5: “Estimators” (from 2008 AP Stats Exam Question #2) Four different statistics have been proposed as estimators of a population parameter. To investigate the behavior of these estimators, 500 random samples are selected from a known population and each statistic is calculated for each sample. The true value of the population parameter is 75. The graphs below show the distribution of values for each statistic.

a. Which of the statistics appear to be unbiased estimators of the population parameter? How can you tell?

b. Which of the statistics A or B would be a better estimator of the population parameter? Explain your choice.

c. Which of the statistics C or D would be a better estimator of the population parameter? Explain your choice.


FREE RESPONSE #6: “Name Brands Versus Generic Brands” (from 2001 AP Stats Exam Question #5) A growing number of employers are trying to hold down the costs that they pay for medical insurance for their employees. As part of this effort, many medical insurance companies are now requiring clients to use generic brand medicines when filling prescriptions. An independent consumer advocacy group wanted to determine if there was a difference, in milligrams, in the amount of active ingredient between a certain “name” brand drug and its generic counterpart. Pharmacies may store drugs under different conditions. Therefore, the consumer group randomly selected ten different pharmacies in a large city and filled two prescriptions at each of these pharmacies, one for the “name” brand and the other for the generic brand of the drug. The consumer group’s laboratory then tested a randomly selected pill from each prescription to determine the amount of active ingredient in the pill. The results are given in the following table: ACTIVE INGREDIENT (in milligrams)

Pharmacy         1      2      3      4      5      6      7      8      9      10
Name Brand       245    244    240    250    243    246    246    246    247    250
Generic Brand    246    240    235    237    243    239    241    238    238    234

a. Based on these results, what should the consumer group’s laboratory report about the difference in the active ingredient in the two brands of pills? Give appropriate statistical evidence to support your response.

b. Consider the decision that you made in part (a). What type of error might have been made? What might be a possible consequence of that error?


FREE RESPONSE #7: “Hospitals” (from 2004 AP Stats Exam Question #5) A rural county hospital offers several health services. The hospital administrators conducted a poll to determine whether the residents’ satisfaction with the available services depends on their gender. A random sample of 1,000 adult county residents was selected. The gender of each respondent was recorded and each was asked whether he or she was satisfied with the services offered by the hospital. The resulting data are shown in the table below:

                 Male    Female    Total
Satisfied        384     416       800
Not Satisfied    80      120       200
Total            464     536       1,000

a. Using a significance level of 0.05, conduct an appropriate test to determine if, for adult residents of this county, there is an association between gender and whether or not they were satisfied with services offered by the hospital.

b. Is $\dfrac{800}{1000}$ a reasonable estimate for the proportion of all adult county residents who are satisfied with the services offered by this hospital? Explain why or why not.


FREE RESPONSE #8: “Foot Length” Can we predict the heights of school-aged children from foot length? Below is computer output from a regression analysis of this relationship for 15 randomly-selected Canadian children from 8 to 15 years old, along with a residual plot. The explanatory variable is each child’s foot length (in centimeters), and the response variable is the child’s height (in centimeters).

a. What is the equation of the least-squares regression line based on these data? Define any parameters used. Interpret the slope of the regression line.

b. Assuming all conditions have been met, construct and interpret a 99% confidence interval for the slope of the least-squares regression of height on foot length.

c. If you were to perform a test of the hypotheses $H_0: \beta = 0$ versus $H_a: \beta \neq 0$ at the α = 0.01 level, what would you conclude? Justify your answer by using your result in part (b).


FREE RESPONSE #9: “Flu Vaccine” (from 2011 AP Stats Exam Question #5) During a flu vaccine shortage in the United States, it was believed that 45 percent of vaccine-eligible people received flu vaccine. The results of a survey given to a random sample of 2,350 vaccine-eligible people indicated that 978 of the 2,350 people had received flu vaccine.

a. Construct and interpret a 99 percent confidence interval for the proportion of vaccine-eligible people who had received flu vaccine. Use your confidence interval to comment on the belief that 45 percent of the vaccine eligible people had received flu vaccine.

b. Suppose a similar survey will be given to vaccine eligible people in Canada by Canadian health officials. A 99 percent confidence interval for the proportion of people who will have received the flu vaccine is to be constructed. What is the smallest sample size that can be used to guarantee that the margin of error will be less than or equal to 0.02?


FREE RESPONSE #10: “Customer Satisfaction” (from 2010 AP Stats Exam Question #6) An automobile company wants to learn about customer satisfaction among the owners of five specific car models. Large sales volumes have been recorded for three of the models, but the other two models were recently introduced so their sales volumes are smaller. The number of new cars sold in the last six months for each of the models is shown in the table below:

Car Model                                         A          B         C         D        E        TOTAL
Number of new cars sold in the last six months    112,338    96,174    83,241    3,278    2,323    297,354

The company can obtain a list of all individuals who purchased new cars in the last six months for each of the five models shown in the table. The company wants to sample 2,000 of these owners.

a. For simple random samples of 2,000 new car owners, what is the expected number of owners of model E and the standard deviation of the number of owners of model E?

b. When selecting a simple random sample of 2,000 new car owners, how likely is it that fewer than 12 owners of model E would be included in the sample? Justify your answers.

c. The company is concerned that a simple random sample of 2,000 owners would include fewer than 12 owners of model D or fewer than 12 owners of model E. Briefly describe a sampling method for randomly selecting 2,000 owners that will ensure at least 12 owners will be selected for each of the 5 car models.


FREE RESPONSE #11: “Sewing Machines” (from 2012 AP Stats Exam Question #1) The scatterplot below displays the price in dollars and quality rating for 14 different sewing machines.

a. Describe the nature of the association between price and quality rating for the sewing machines.

b. One of the 14 sewing machines substantially affects the appropriateness of using a linear regression model to predict quality rating based on price. Report the approximate price and quality rating of that machine and explain your choice.

c. Chris is interested in buying one of the 14 sewing machines. He will consider buying only those machines for which there is no other machine that has both higher quality and lower price. On the scatterplot reproduced below, circle all data points corresponding to machines that Chris will consider buying.
