AP Statistics – 4.1 Producing Data

In order to answer questions about a large group of individuals (people or objects), we must produce data that will answer our questions.

Two ways to study populations are:

1. Observational Study

2. Experiment

Two ways to produce data to study the populations are:

1. Census

2. Sample

Most studies use a sample rather than a census of the entire population because of time and cost constraints.

Three types of samples used are:

1. Voluntary Response

2. Convenience

3. Probability

The best type of sample is a __________________ ___ because it avoids bias, which leads to incorrect conclusions about the population of interest.

What is bias?

Three types of probability samples that avoid bias are:

1. Simple Random Sample (SRS)

2. Stratified Random Sample

3. Cluster Sample

Types of bias that might occur are:

1. Undercoverage

2. Nonresponse

3. Response

4. Wording

Example Problems

1. A university’s financial aid office wants to know how much it can expect students to earn from summer employment. This information will be used to set the level of financial aid. The population contains 3478 students who have completed at least one year of study but have not yet graduated. A questionnaire will be sent to an SRS of 100 of these students, drawn from an alphabetized list.

(a) Describe how you will select the sample. Follow the four steps: label; table; stopping rule; and identify sample.

(b) Use the portion of the random digits table below to select the first five students in the sample.

95592 94007 69971 91481 60779 53791 17297 59335

68417 35013 15529 72765 85089 57067 50211 47487
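
One way to check your work on part (b) is to let a short program read the table the same way you would by hand. The sketch below is only an illustration and rests on assumptions that are not stated in the problem: students are labeled 0001 to 3478 in alphabetical order, digits are read four at a time from left to right across the two rows, and labels that are out of range or repeated are skipped. The function name `read_labels` is made up for this example.

```python
def read_labels(digit_rows, label_width, max_label, how_many):
    """Read fixed-width labels from rows of random digits.

    Skips groups that are 0, larger than max_label, or already chosen,
    and stops once how_many labels have been selected (the stopping rule).
    """
    # Remove the spaces: they only make the table easier to read.
    digits = "".join("".join(row.split()) for row in digit_rows)
    chosen = []
    for start in range(0, len(digits) - label_width + 1, label_width):
        label = int(digits[start:start + label_width])
        if 1 <= label <= max_label and label not in chosen:
            chosen.append(label)
        if len(chosen) == how_many:
            break
    return chosen


# The two rows of random digits printed in the problem.
table = [
    "95592 94007 69971 91481 60779 53791 17297 59335",
    "68417 35013 15529 72765 85089 57067 50211 47487",
]

# Assumed labeling: 0001 to 3478, so each label is four digits wide.
first_five = read_labels(table, label_width=4, max_label=3478, how_many=5)
print(first_five)  # [2940, 769, 1481, 2975, 1315]
```

With a different labeling scheme (for example, 0000 to 3477) the selected students would change, so state your labels before you read the table.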

2. You’re in college now, and you want to investigate the attitudes of students at your school toward the faculty’s commitment to teaching. The Student Government will pay the costs of contacting about 500 students.

(a) Specify the exact population for your study; for example, will you include part-time students?

(b) Describe your sampling design. Will you use a stratified sample?

(c) Briefly discuss the practical difficulties that you anticipate; for example, how will you contact the students in your sample?

Why Random Sampling?

1. Use the map to choose a sample of 5 states to estimate the average (mean) land area. You have 15 seconds.

2. Refer to the table of land areas provided. Find the mean land area for your sample.

3. Make a class dotplot of the mean land areas from Step 2.

4. Now choose an SRS of 5 states. Find the mean land area for this sample.

5. Make a class dotplot of the mean land areas from Step 4 right above your dotplot from Step 2.

6. How do the class’s estimates using the two methods compare? What advantage(s) does random sampling provide?
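
The class activity can also be imitated with a small simulation. The sketch below is hedged in two ways. First, the land areas are HYPOTHETICAL values generated by the program, not the real areas from the class table. Second, choosing states "by eye" is modeled by assuming that larger regions are more likely to be picked, which is only a crude guess at how people choose from a map. The function names are invented for this example.

```python
import random
import statistics

rng = random.Random(1)

# HYPOTHETICAL land areas (thousands of square miles) for 50 made-up regions.
# These are NOT the real state areas from the class table.
areas = [round(rng.lognormvariate(3.5, 1.0), 1) for _ in range(50)]
true_mean = statistics.mean(areas)


def srs_mean(n=5):
    """Mean area of a simple random sample of n regions."""
    return statistics.mean(rng.sample(areas, n))


def eye_catching_mean(n=5):
    """Mean area when bigger regions are more likely to be picked.

    ASSUMPTION: the chance of picking a region is proportional to its area.
    """
    chosen = set()
    while len(chosen) < n:
        chosen.add(rng.choices(range(len(areas)), weights=areas)[0])
    return statistics.mean(areas[i] for i in chosen)


# Repeat each method many times, like a very large class dotplot.
srs_means = [srs_mean() for _ in range(2000)]
map_means = [eye_catching_mean() for _ in range(2000)]

print("population mean:     ", round(true_mean, 1))
print("average of SRS means:", round(statistics.mean(srs_means), 1))
print("average of map means:", round(statistics.mean(map_means), 1))
```

Under these assumptions the SRS means center on the population mean, while the "by eye" means are pulled upward. Compare that pattern with your two class dotplots.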

When an SRS might not be best…

A British farmer grows sunflowers for making sunflower oil. Her field is arranged in a grid pattern, with 10 rows and 10 columns as shown. Irrigation ditches run along the top and bottom of the field, as shown. The farmer would like to estimate the number of healthy plants in the field so she can project how much money she’ll make from selling them. It would take too much time to count the plants in all 100 squares, so she’ll accept an estimate based on a sample of 10 squares.

1. Take a simple random sample of 10 grid squares. Use the table provided to calculate your estimate of the mean number of sunflowers per square.

2. Take a stratified random sample using the rows as strata. Use the table provided to calculate your estimate of the mean number of sunflowers per square.

3. Take a stratified random sample using the columns as strata. Use the table provided to calculate your estimate of the mean number of sunflowers per square.
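
To see how the three sampling plans can behave over many repetitions, here is a minimal simulation sketch. The plant counts are HYPOTHETICAL: the program assumes squares nearer a ditch (the top and bottom rows) hold more healthy plants, plus a little random noise. The real table of counts is not reproduced here, so the numbers only illustrate the idea. The function names are made up for this example.

```python
import random
import statistics

rng = random.Random(42)
SIZE = 10

# HYPOTHETICAL counts of healthy plants per square: rows closer to a ditch
# (top or bottom) get more plants, plus small square-to-square noise.
field = [
    [110 - 4 * min(r, SIZE - 1 - r) + rng.randint(-2, 2) for _ in range(SIZE)]
    for r in range(SIZE)
]


def srs_estimate():
    """Mean of a simple random sample of 10 of the 100 squares."""
    squares = rng.sample([(r, c) for r in range(SIZE) for c in range(SIZE)], 10)
    return statistics.mean(field[r][c] for r, c in squares)


def rows_as_strata():
    """Mean of one randomly chosen square from each row."""
    return statistics.mean(field[r][rng.randrange(SIZE)] for r in range(SIZE))


def columns_as_strata():
    """Mean of one randomly chosen square from each column."""
    return statistics.mean(field[rng.randrange(SIZE)][c] for c in range(SIZE))


methods = {
    "SRS": srs_estimate,
    "rows as strata": rows_as_strata,
    "columns as strata": columns_as_strata,
}

# Repeat each plan 1000 times and record how much the estimates vary.
spread = {}
for name, method in methods.items():
    estimates = [method() for _ in range(1000)]
    spread[name] = statistics.stdev(estimates)
    print(f"{name:18s} mean {statistics.mean(estimates):6.2f}  sd {spread[name]:5.2f}")
```

In this made-up field the estimates vary least when the strata match the direction in which the counts change. Whether rows or columns work better for the real field depends on the actual pattern in the table provided.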

4.2: Experimental Design

Observational Study – observes individuals and measures variables of interest but does not attempt to influence responses.

Experiment – deliberately imposes some treatment on individuals in order to observe their responses. This is the only source of understanding cause and effect.

Vocabulary of Experiments:

1. Experimental Units –

2. Subjects –

3. Explanatory Variable –

4. Factors –

5. Levels –

6. Treatment –

7. Control group –

8. Placebo –

9. Replication –

10. Randomization –

11. Principles of Experimental Design

A.

B.

C.

12. Statistically significant –

13. Completely randomized design –

14. Block design –

15. Block –

16. Matched pairs design –

17. Double blind experiment –

18. Lack of realism –

What is the difference?

Two methods of treating wood for preservation are available. One uses chemicals and the other uses heat. After treating the wood, all of it will be exposed to elements such as bugs, water, and severe temperatures. Then the wood will be tested for its resistance to these elements. There are 100 boards of pine wood available, all of equal length, width, and depth. Because there were not enough boards at any one store, the boards were acquired from 4 different building supply stores, 25 from each. You will design 3 different methods of carrying out this experiment. You may draw a diagram or explain in paragraph form but be sure to include all necessary information.

1. Completely Randomized design

2. Randomized Block design

3. Matched Pairs design
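
The random assignment step for two of these designs can be sketched in code. This is only one possible approach, and it makes assumptions that go beyond the problem statement: the store is used as the blocking variable (boards from the same store may be more alike), the store labels "A" to "D" are invented, and the matched pairs version assumes each board can be cut into two halves that form a pair. It is not a complete answer, which also needs the comparison of results and, for the first design, a completely randomized assignment of all 100 boards.

```python
import random
from collections import Counter

rng = random.Random(7)

# Hypothetical labels: stores "A" to "D", boards 1 to 25 within each store.
boards = [(store, number) for store in "ABCD" for number in range(1, 26)]


def randomized_block(boards, rng):
    """Randomly assign the two treatments separately within each store."""
    assignment = {}
    for store in "ABCD":
        block = [b for b in boards if b[0] == store]
        rng.shuffle(block)
        # 25 is odd, so one treatment gets 13 boards in this block;
        # a random ordering of the treatments decides which one.
        first, second = rng.sample(["chemical", "heat"], 2)
        for position, board in enumerate(block):
            assignment[board] = first if position < 13 else second
    return assignment


def matched_pairs(boards, rng):
    """ASSUMES each board is cut in half; the two halves form a pair."""
    assignment = {}
    for board in boards:
        treatments = ["chemical", "heat"]
        rng.shuffle(treatments)  # plays the role of a coin flip
        assignment[board] = {"half_1": treatments[0], "half_2": treatments[1]}
    return assignment


block_assignment = randomized_block(boards, rng)
pair_assignment = matched_pairs(boards, rng)

for store in "ABCD":
    counts = Counter(block_assignment[b] for b in boards if b[0] == store)
    print(store, dict(counts))
```

Each store ends up with 12 boards on one treatment and 13 on the other, and every board in the matched pairs version receives both treatments, one per half.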

1997 Free Response – Question # 2 – Fish Tank

A new type of fish food has become available for salmon raised on fish farms. Your task is to design an experiment to compare the weight gain of salmon raised over a six-month period on the new and the old types of food. The salmon you will use for this experiment have already been randomly placed in eight large tanks in a room that has a considerable temperature gradient. Specifically, tanks on the north side of the room tend to be much colder than those on the south side. The arrangement of tanks is shown below.

Describe a design for this experiment that takes into account the temperature gradient.

2000 – Free Response – Question 5

High cholesterol levels in people can be reduced by exercise or by drug treatment. A pharmaceutical company developed a new cholesterol-reducing drug. Researchers would like to compare its effects to the effects of the cholesterol-reducing drug that is currently available on the market. Volunteers who have a history of high cholesterol and who are currently not on medication will be recruited to participate in a study.

a. Explain how you would carry out a completely randomized experiment for the study.

b. Describe an experimental design that would improve the design in (a) by incorporating blocking.

c. Can the experimental design in (b) be carried out in a double blind manner? Explain.

2004 Free Response (form B) – Dining Hall Study

At a certain university, students who live in the dormitories eat at a common dining hall. Recently, some students have been complaining about the quality of the food served there. The dining hall manager decided to do a survey to estimate the proportion of students living in the dormitories who think that the quality of the food should be improved. One evening, the manager asked the first 100 students entering the dining hall to answer the following question.

Many students believe that the food served in the dining hall needs improvement. Do you think that the quality of food served here needs improvement, even though that would increase the cost of the meal plan?

_______ Yes   ______ No   ______ No opinion

a.) In this setting, explain how bias may have been introduced based on the way this convenience sample was selected and suggest how the sample could have been selected differently to avoid bias.

b.) In this setting, explain how bias may have been introduced based on the way the question was worded and suggest how it could have been worded differently to avoid bias.

2001 Free Response – Trees

Students are designing an experiment to compare the productivity of two varieties of dwarf fruit trees. The site for the experiment is a field that is bordered by a densely forested area on the west (left) side. The field has been divided into eight plots of approximately the same area. The students have decided that the test plots should be blocked. Four trees, two of each of the two varieties, will be assigned at random to the four plots within each block, with one tree planted in each plot.

The two blocking schemes shown below are under consideration. For each scheme, one block is indicated by the white region and the other block is indicated by the gray region in the figures.

a.) Which of the blocking schemes, A or B, is better for this experiment? Explain your answer.

b.) Even though the students have decided to block, they must randomly assign the varieties of trees to the plots within each block. What is the purpose of this randomization in the context of this experiment?

1999 Free Response – Cavities

The dentists in a dental clinic would like to determine if there is a difference between the number of new cavities in people who eat an apple a day and in people who eat less than one apple a week. They are going to conduct a study with 50 people in each group.

Fifty clinic patients who report that they routinely eat an apple a day and 50 clinic patients who report that they eat less than one apple in a week are identified. The dentists will examine the patients and their records to determine the number of new cavities the patients have had over the past two years. They will then compare the new cavities in the two groups.

a.) Explain why this scenario is an observational study and not an experiment.

b.) Explain the concept of confounding in the context of this study. Include an example of a possible confounding variable.

c.) If the mean number of cavities for those who ate an apple a day was statistically significantly smaller than the mean number of new cavities for those who ate less than one apple a week, could one conclude that the lower number of new cavities can be attributed to eating an apple a day?

2002 Free Response – Question 2 – Boots

A manufacturer of boots plans to conduct an experiment to compare a new method of waterproofing to the current method. The appearance of the boots is not changed by either method. The company recruits 100 volunteers in Seattle, where it rains frequently, to wear the boots as they normally would for 6 months. At the end of 6 months, the boots will be returned to the company to be evaluated for water damage.

a.) Describe a design for this experiment that uses the 100 volunteers. Include a few sentences on how it would be implemented.

b.) Could your design be double blind?

2002 # 3 – Magnets

A preliminary study conducted at a medical center in St. Louis has shown that treatment with small, low-intensity magnets reduces the self-reported level of pain in polio patients. During each session, a patient rested on an examining table in the doctor’s office while the magnets, embedded in soft pads, were strapped to the body at the site of pain. Sessions continued for several weeks, after which pain reduction was measured.

A new study is being designed to investigate whether magnets also reduce pain in patients suffering from herniated disks in the lower back. One hundred male patients are available for the new study.

(a) Describe an appropriate design for the new study. Your discussion should briefly address treatments used, methods of treatment assignment, and what variables would be measured. Do not describe how the data would be analyzed.

(b) Would you modify the design above if, instead of 100 male patients, there were 50 female and 50 male patients available for the study? If so, how would you modify your design? If not, why not?

2003 # 4 Coffee and Cholesterol

There have been many studies recently concerning coffee drinking and cholesterol level. While it is known that several coffee-bean components can elevate blood cholesterol level, it is thought that a new type of paper filter may reduce the presence of some of these components in coffee.

The effect of the new filter on cholesterol level will be studied over a 10-week period using 300 nonsmokers who each drink 4 cups of caffeinated coffee per day. Each of these 300 participants will be assigned to one of two groups: the experimental group, who will only drink coffee that has been made with the new filter, or the control group, who will only drink coffee that has been made with the standard filter. Each participant’s cholesterol level will be measured at the beginning and at the end of the study.

(a) Describe an appropriate method for assigning the subjects to the two groups so that each group will have an equal number of subjects.

(b) In this study, the researchers chose to include a group who only drank coffee that was made with the standard filter. Why is it important to include a control group in this study even though cholesterol levels will be measured at the beginning and end of the study?

(c) Why would the researchers choose to use only nonsmokers in the study?
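
For part (a), one common way to get two groups of equal size is to shuffle the list of subjects and split it in half. The sketch below illustrates that idea under the assumption that the 300 participants carry ID numbers 1 to 300. The ID scheme and the variable names are invented for this example, and other methods (such as drawing 150 numbered slips from a hat) would work equally well.

```python
import random

rng = random.Random(2003)

# Hypothetical ID numbers for the 300 participants.
subjects = list(range(1, 301))

# Shuffle a copy so that every split into two groups of 150 is equally likely.
shuffled = subjects[:]
rng.shuffle(shuffled)

new_filter_group = sorted(shuffled[:150])       # experimental group
standard_filter_group = sorted(shuffled[150:])  # control group

print(len(new_filter_group), len(standard_filter_group))  # 150 150
```

A written answer should still describe the procedure in words, including how the random numbers are produced and which group each subject joins.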

2005 #1 – Caloric Intake of Adolescents

The goal of a nutritional study was to compare the caloric intake of adolescents living in rural areas of the United States with the caloric intake of adolescents living in urban areas of the United States. A random sample of ninth-grade students from one high school in a rural area was selected. Another random sample of ninth graders from one high school in an urban area was also selected. Each student in each sample kept records of all the food he or she consumed in one day.

The back-to-back stemplot below displays the number of calories of food consumed per kilogram of body weight for each student on that day.

(a) Write a few sentences comparing the distribution of the daily caloric intake of ninth-grade students in the rural high school with the distribution of daily caloric intake of ninth-grade students in the urban high school.

(b) Is it reasonable to generalize the findings of this study to all rural and urban ninth-grade students in the United States? Explain.

(c) Researchers who want to conduct a similar study are debating which of the following two plans to use.

Plan I: Have each student in the study record all the food he or she consumed in one day. Then researchers would compute the number of calories per kilogram of body weight for each student for that day.

Plan II: Have each student in the study record all the food he or she consumed over the same 7-day period. Then researchers would compute the average daily number of calories of food consumed per kilogram of body weight for each student during the 7-day period.

Assuming that the students keep accurate records, which plan, I or II, would better meet the goal of the study? Justify your answer.

AP Stats: Chapter 4 Practice Test

T4.1. When we take a census, we attempt to collect data from

(a) a stratified random sample.
(b) every individual chosen in a simple random sample.
(c) every individual in the population.
(d) a voluntary response sample.
(e) a convenience sample.

T4.2. You want to take a simple random sample (SRS) of 50 of the 816 students who live in a dormitory on campus. You label the students 001 to 816 in alphabetical order. In the table of random digits, you read the entries

95592   94007   69769   33547   72450   16632   81194   14873

The first three students in your sample have labels

(a) 955, 929, 400.
(b) 400, 769, 769.
(c) 559, 294, 007.
(d) 929, 400, 769.
(e) 400, 769, 335.

T4.3. A study of treatments for angina (pain due to low blood supply to the heart) compared bypass surgery, angioplasty, and use of drugs. The study looked at the medical records of thousands of angina patients whose doctors had chosen one of these treatments. It found that the average survival time of patients given drugs was the highest. What do you conclude?

(a) This study proves that drugs prolong life and should be the treatment of choice.
(b) We can conclude that drugs prolong life because the study was a comparative experiment.
(c) We can’t conclude that drugs prolong life because the patients were volunteers.
(d) We can’t conclude that drugs prolong life because this was an observational study.
(e) We can’t conclude that drugs prolong life because no placebo was used.

T4.4. A simple random sample (SRS) is

(a) any sample selected by using chance.
(b) any sample that gives every individual the same chance to be selected.
(c) a sample that gives every possible sample of the same size the same chance to be selected.
(d) a sample that doesn’t involve strata or clusters.
(e) a sample that is guaranteed to be representative of the population.

T4.5. Consider an experiment to investigate the effectiveness of different insecticides in controlling pests and their impact on the productivity of tomato plants. What is the best reason for randomly assigning treatment levels (spraying or not spraying) to the experimental units (farms)?

(a) Random assignment makes the experiment easier to conduct since we can apply the insecticide in any pattern rather than in a systematic fashion.
(b) Random assignment will tend to average out all other uncontrolled factors such as soil fertility so that they are not confounded with the treatment effects.
(c) Random assignment makes the analysis easier since the data can be collected and entered into the computer in any order.
(d) Random assignment is required by statistical consultants before they will help you analyze the experiment.
(e) Random assignment implies that it is not necessary to be careful during the experiment, during data collection, and during data analysis.

T4.6. The most important advantage of experiments over observational studies is that

(a) experiments are usually easier to carry out.
(b) experiments can give better evidence of causation.
(c) confounding cannot happen in experiments.
(d) an observational study cannot have a response variable.
(e) observational studies cannot use random samples.

T4.7. A TV station wishes to obtain information on the TV viewing habits in its market area. The market area contains one city of population 170,000, another city of 70,000, and four towns of about 5000 inhabitants each. The station suspects that the viewing habits may be different in larger and smaller cities and in the rural areas. Which of the following sampling designs would give the type of information that the station requires?

(a) A cluster sample using the cities and towns as clusters
(b) A convenience sample from the market area
(c) A simple random sample from the whole market area
(d) A stratified sample from the cities and towns in the market area
(e) An online poll that invites all people from the cities and towns in the market area to participate

T4.8. Bias in a sampling method is

(a) any error in the sample result, that is, any deviation of the sample result from the truth about the population.
(b) the random error due to using chance to select a sample.
(c) any error due to practical difficulties such as contacting the subjects selected.
(d) any systematic error that tends to occur in the same direction whenever you use this sampling method.
(e) racism or sexism on the part of those who take the sample.

T4.9. You wonder if TV ads are more effective when they are longer or repeated more often or both. So you design an experiment. You prepare 30-second and 60-second ads for a camera. Your subjects all watch the same TV program, but you assign them at random to four groups. One group sees the 30-second ad once during the program; another sees it three times; the third group sees the 60-second ad once; and the last group sees the 60-second ad three times. You ask all subjects how likely they are to buy the camera.

(a) This is a randomized block design, but not a matched pairs design.
(b) This is a matched pairs design.
(c) This is a completely randomized design with one explanatory variable (factor).
(d) This is a completely randomized design with two explanatory variables (factors).
(e) This is a completely randomized design with four explanatory variables (factors).

T4.10. A researcher wishes to compare the effects of 2 fertilizers on the yield of soybeans. She has 20 plots of land available for the experiment, and she decides to use a matched pairs design with 10 pairs of plots. To carry out the random assignment for this design, the researcher should

(a) use a table of random numbers to divide the 20 plots into 10 pairs and then, for each pair, flip a coin to assign the fertilizers to the 2 plots.
(b) subjectively divide the 20 plots into 10 pairs (making the plots within a pair as similar as possible) and then, for each pair, flip a coin to assign the fertilizers to the 2 plots.
(c) use a table of random numbers to divide the 20 plots into 10 pairs and then use the table of random numbers a second time to decide upon the fertilizer to be applied to each member of the pair.
(d) flip a coin to divide the 20 plots into 10 pairs and then, for each pair, use a table of random numbers to assign the fertilizers to the 2 plots.
(e) use a table of random numbers to assign the 2 fertilizers to the 20 plots and then use the table of random numbers a second time to place the plots into 10 pairs.

T4.11. You want to know the opinions of American high school teachers on the issue of establishing a national proficiency test as a prerequisite for graduation from high school. You obtain a list of all high school teachers belonging to the National Education Association (the country’s largest teachers’ union) and mail a survey to a random sample of 2500 teachers. In all, 1347 of the teachers return the survey. Of those who responded, 32% say that they favor some kind of national proficiency test. Which of the following statements about this situation is true?

(a) Since random sampling was used, we can feel confident that the percent of all American high school teachers who would say they favor a national proficiency test is close to 32%.
(b) We cannot trust these results, because the survey was mailed. Only survey results from face-to-face interviews are considered valid.
(c) Because over half of those who were mailed the survey actually responded, we can feel pretty confident that the actual percent of all American high school teachers who would say they favor a national proficiency test is close to 32%.
(d) The results of this survey may be affected by nonresponse bias.
(e) The results of this survey cannot be trusted due to voluntary response bias.