Bock-Ch05 Problems Describing Distributions …lhvarsitymath.com/Bock-Ch05 Problems Describing...



Describing Distributions Numerically

3. Summaries. Here are costs of 10 electric smoothtop ranges rated very good or excellent by Consumer Reports in August 2002.

$850 900 1400 1200 1050 1000 750 1250 1050 565

Find these statistics by hand (no calculator!):
a) mean
b) median and quartiles
c) range and IQR

4. More summaries. Here are the annual numbers of deaths from tornadoes in the United States from 1990 through 2000. (Source: NOAA)

53 39 39 33 69 30 25 67 130 94 40

Find these statistics by hand (no calculator!):
a) mean
b) median and quartiles
c) range and IQR

5. Mistake. A clerk entering salary data into a company spreadsheet accidentally put an extra "0" in the boss's salary, listing it as $2,000,000 instead of $200,000. Explain how this error will affect these summary statistics for the company payroll:
a) measures of center: median and mean.
b) measures of spread: range, IQR, and standard deviation.

6. Sick days. During contract negotiations a company seeks to change the number of sick days employees may take, saying that the annual "average" is 7 days of absence per employee. The union negotiators counter that the "average" employee misses only 3 days of work each year. Explain how both sides might be correct, identifying the measure of center you think each side is using and why the stated difference might exist.

7. Payroll. A small warehouse employs a supervisor at $1200 a week, an inventory manager at $700 a week, six stock boys at $400 a week, and four drivers at $500 a week.
a) Find the mean and median wage.
b) How many employees earn more than the mean wage?
c) Which measure of center best describes a typical wage at this company, the mean or the median?
d) Which measure of spread would best describe the payroll, the range, the IQR, or the standard deviation? Why?

8. Singers. The frequency table shows the heights (in inches) of 130 members of a choir.

Height  Count    Height  Count
60      2        69      5
61      6        70      11
62      9        71      8
63      7        72      9
64      5        73      4
65      20       74      2
66      18       75      4
67      7        76      1
68      12

a) Find the five-number summary for these data.
b) Display these data with a boxplot.
c) Find the mean and standard deviation.
d) Display these data with a histogram.
e) Write a few sentences describing the distribution of heights.

9. Standard deviation. For each lettered part, a through c, examine the two given sets of numbers. Without doing any calculations, decide which set has the larger standard deviation and explain why. Then check by finding the standard deviations by hand.

     Set 1                   Set 2
a)   3, 5, 6, 7, 9           2, 4, 6, 8, 10
b)   10, 14, 15, 16, 20      10, 11, 15, 19, 20
c)   2, 6, 6, 9, 11, 14      82, 86, 86, 89, 91, 94

10. Standard deviation. For each lettered part, a through c, examine the two given sets of numbers. Without doing any calculations, decide which set has the larger standard deviation and explain why. Then check by finding the standard deviations by hand.


     Set 1                      Set 2
a)   4, 7, 7, 7, 10             4, 6, 7, 8, 10
b)   100, 140, 150, 160, 200    10, 50, 60, 70, 110
c)   10, 16, 18, 20, 22, 28     48, 56, 58, 60, 62, 70
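The by-hand check that Exercises 9 and 10 ask for can be cross-checked with a short script. The sketch below is not part of the original exercise set. It assumes the sample standard deviation (dividing by n - 1), which is what Python's `statistics.stdev` computes, and the names `exercise_10` and `compare_spreads` are illustrative labels only.

```python
from statistics import stdev

# Data pairs copied from the Exercise 10 table: (Set 1, Set 2) for parts a) to c).
exercise_10 = {
    "a": ([4, 7, 7, 7, 10], [4, 6, 7, 8, 10]),
    "b": ([100, 140, 150, 160, 200], [10, 50, 60, 70, 110]),
    "c": ([10, 16, 18, 20, 22, 28], [48, 56, 58, 60, 62, 70]),
}

def compare_spreads(pairs):
    """Return {part: (SD of Set 1, SD of Set 2)} using the n - 1 divisor."""
    return {part: (stdev(s1), stdev(s2)) for part, (s1, s2) in pairs.items()}

if __name__ == "__main__":
    for part, (sd1, sd2) in compare_spreads(exercise_10).items():
        print(f"{part}) SD(set 1) = {sd1:.1f}, SD(set 2) = {sd2:.1f}")
```

In part b), every value in Set 1 is the matching value in Set 2 plus 90, so the two standard deviations should come out equal: shifting data changes the center but not the spread.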

11. Home runs. In 1961, Roger Maris made baseball headlines by hitting 61 home runs, breaking a famous record held by Babe Ruth. Here are Maris's home run totals for his 10 seasons in the American League. Would you consider his record-setting year to be an outlier? Explain.

8, 13, 14, 16, 23, 26, 28, 33, 39, 61

12. Camp sites. Shown below are the histogram and summary statistics for the number of camp sites at public parks in Vermont.

Count     46
Mean      62.78 sites
Median    43.50
StdDev    56.196
Min       0
Max       275
Lower Q   28
Upper Q   78

a) Which statistics would you use to identify the center and spread of this distribution? Why?
b) How many parks would you classify as outliers? Explain.
c) Create a boxplot for these data.
d) Write a few sentences describing the distribution.

13. Marriage age. Do men and women marry at the same age? Here are boxplots of the age at first marriage for a sample of U.S. citizens. Write a brief report discussing what these data show.

14. Fuel economy. Describe what these boxplots tell you about the relationship between the number of cylinders a car's engine has and the car's fuel economy (mpg).


15. Grapes. The boxplots display case prices (in dollars) of varieties of grapes produced by vineyards along three of the Finger Lakes.
a) Which lake region produces the most expensive grape?
b) Which lake region produces the cheapest grape?
c) In which region are the grapes generally more expensive?
d) Write a few sentences describing these grape prices.

16. Ozone. Ozone levels (in parts per million, ppm) were recorded at sites in New Jersey monthly between 1926 and 1971. Here are boxplots of the data for each month (over the 46 years) lined up in order (January = 1).
a) In what month was the highest ozone level ever recorded?
b) Which month has the largest IQR?
c) Which month has the smallest range?
d) Write a brief comparison of the ozone levels in January and June.
e) Write a report on the annual patterns you see in the ozone levels.

17. Wild card Summer Olympics. Seventy-one swimmers finished the qualifying first day of the men's 100-m swim in Sydney. The average time was 52.65 seconds with a standard deviation of 7.66 seconds. The median time was 51.34 seconds and the IQR was 2.58 seconds.
a) Without looking at a graphical display, what shape would you expect the distribution of times to have?
b) What might account for the difference between these two sets of statistics?
c) Here is the histogram of the actual times. Write a couple of sentences summarizing what you see.

18. Unemployment. In May of 2001, the U.S. Bureau of Labor Statistics (BLS) issued a news release that said, in part:
In April, 223 metropolitan areas recorded unemployment rates below the U.S. average of 4.2 percent (not seasonally adjusted), while 99 areas registered higher rates.
Sketch what the distribution of unemployment rates for the 322 metropolitan areas reported on by BLS probably looks like.

19. Test scores. Three Statistics classes all took the same test. Histograms of the scores for each class are shown below.
a) Which class had the highest mean score?
b) Which class had the highest median score?
c) For which class are the mean and median most different? Which is higher? Why?
d) Which class had the smallest standard deviation?
e) Which class had the smallest IQR?

20. Test scores. Look again at the histograms of test scores for the three Statistics classes in Exercise 19.
a) Overall, which class do you think performed better on the test? Why?
b) How would you describe the shape of each distribution?
c) Match each class with the corresponding boxplot below.

21. Still rockin'. On page 60, you read about the 66 deaths attributed to "crowd crush" at rock concerts during the years 1999 and 2000. Here are the histogram and boxplot of the victims' ages that we saw earlier:
a) What features of the distribution can you see in both the histogram and the boxplot?
b) What features of the distribution can you see in the histogram that you could not see in the boxplot?
c) What summary statistic would you choose to summarize the center of this distribution? Why?
d) What summary statistic would you choose to summarize the spread of this distribution? Why?

22. Golf courses. One measure of the difficulty of a golf course is its length: the total distance (in yards) from tee to hole for all 18 holes. Below are the histogram and summary statistics for the lengths of all the golf courses in Vermont.

Mean     5892.91 yd
StdDev   386.591
Min      5185
Q1       5585.75
Median   5928
Q3       6131
Max      6796

a) What is the range of these lengths?
b) Between what lengths do the central 50% of these courses lie?
c) What summary statistics would you use to describe these data?
d) Write a brief description of these data (shape, center, and spread).

23. Graduation? A survey of major universities asked what percentage of incoming freshmen usually graduate in 4 years. Use the summary statistics given to answer these questions.

             % on time
Count        48
Mean         68.35
Median       69.90
StdDev       10.196
Min          43.20
Max          87.40
Range        44.20
25th %tile   59.15
75th %tile   74.75

a) Would you describe this distribution as symmetric or skewed? Explain.
b) Are there any outliers? Explain.
c) Create a boxplot of these data.
d) Write a few sentences about the graduation rates.

24. Wineries. Here are summary statistics for the sizes (in acres) of Finger Lakes wineries.

Count    36
Mean     46.50 acres
StdDev   47.76
Median   33.50
IQR      36.50
Min      6
Q1       18.50
Q3       55
Max      250

a) Would you describe this distribution as symmetric or skewed? Explain.
b) Are there any outliers? Explain.
c) Create a boxplot of these data.
d) Write a few sentences about the sizes of the wineries.

25. Caffeine. A student study of the effects of caffeine asked volunteers to take a memory test 2 hours after drinking soda. Some drank caffeine-free cola, some drank regular cola (with caffeine), and others drank a mixture of the two (getting a half-dose of caffeine). Here are the five-number summaries for each group's scores (number of items recalled correctly) on the memory test:

                n    Min  Q1   Median  Q3   Max
No caffeine     15   16   20   21      24   26
Low caffeine    15   16   18   21      24   27
High caffeine   15   12   17   19      22   24

a) Describe the W's for these data: Who, What, When, Where, Why, How.
b) Name the variables and classify each as categorical or quantitative.
c) Create parallel boxplots to display these results as best you can with this information.
d) Write a few sentences comparing the performances of the three groups.


26. Rainmakers? In an experiment to determine whether seeding clouds with silver iodide increases rainfall, 52 clouds were randomly assigned to be seeded or not. The amount of rain they generated was then measured (in acre-feet).

           n    Mean     Median   SD       IQR      Q1      Q3
Unseeded   26   164.59   44.20    278.43   138.60   24.40   163
Seeded     26   441.98   221.60   650.79   337.60   92.40   430

a) Which of the summary statistics are most appropriate for describing these distributions? Why?
b) Do you see any evidence that seeding clouds may be effective? Explain.

27. States. The stem-and-leaf display shows populations of the 50 states and Washington, D.C., in millions of people, according to the 2000 census.

3 | 4
2 |
2 | 1
1 | 69
1 | 0122
0 | 5555666667888
0 | 111111111111122222333333344444
State Populations (1|2 = 12 million)

a) What measures of center and spread are most appropriate?
b) Without doing any calculations, which must be larger, the median or the mean? Explain how you know.
c) From the stem-and-leaf display, find the median and the interquartile range.
d) Write a few sentences describing this distribution.

28. Population growth. The back-to-back stem-and-leaf display compares the percentage change between the 1990 census and 2000 census in the populations of northeastern and midwestern states with the changes in population of southern and western states. The fastest growing states were Nevada at 66% and Arizona at 40%. Use the data displayed in the stem-and-leaf display to construct comparative boxplots.

6 6 6 5 5 4 4 0 3 3 001 2 6 2 001134 1 578
2100 1 00113444499998876655 0 6999
1344 0 1
NE/MW States    S/W States

29. Derby speeds. How fast do horses run? Kentucky Derby winners top 30 miles per hour, as shown in the graph below. In fact, this graph shows the percentage of Derby winners that have run slower than a given speed. Note that few have won running less than 33 miles per hour, but about 95% of the winning horses have run less than 37 miles per hour. (A cumulative frequency graph like this is called an "ogive.")
a) Estimate the median winning speed.
b) Estimate the quartiles.
c) Estimate the range and the IQR.
d) Create a boxplot of these speeds.
e) Write a few sentences about the speeds of the Kentucky Derby winners.

30. Cholesterol. The Framingham Heart Study recorded the cholesterol levels of more than 1400 men. Here is an ogive of the distribution of these cholesterol measures. (Recall that an ogive shows the percentage of cases at or below a certain value.) Construct a boxplot for these data and write a few sentences describing the distribution.

31. Reading scores. A class of fourth graders takes a diagnostic reading test, and the scores are reported by reading grade level. The five-number summaries for the 14 boys and 11 girls are shown:

Boys:    2.0  3.9  4.3  4.9  6.0
Girls:   2.8  3.8  4.5  5.2  5.9

a) Which group had the highest score?
b) Which group had the greatest range?
c) Which group had the greatest interquartile range?
d) Which group's scores appear to be more skewed? Explain.
e) Which group generally did better on the test? Explain.
f) If the mean reading level for boys was 4.2 and for girls was 4.6, what is the overall mean for the class?

32. SAT scores. Here are the summary statistics for Verbal SAT scores for a high school graduating class.

         n    Mean  Median  SD     Min  Max  Q1   Q3
Male     80   590   600     97.2   310  800  515  650
Female   82   602   625     102.0  360  770  530  680

a) Create parallel boxplots comparing the scores of boys and girls as best you can from the information given.
b) Write a brief report on these results. Be sure to discuss shape, center, and spread of the scores.

33. Phone calls. In an advertisement in USA Today (July 9, 2001), the company Net2Phone listed its long distance rates to 24 of the 250 countries to which it offers service.

Country              Cost per minute (cents)    Country          Cost per minute (cents)
Belgium              7.9                        Italy            9.9
Chile                17                         Japan            7.9
Canada               3.9                        Mexico           16
Colombia             9.9                        Pakistan         49
Dominican Republic   15                         Philippines      21
Finland              9.9                        Puerto Rico      6.9
France               7.9                        Singapore        11
Germany              7.9                        South Korea      9.9
Hong Kong            7.9                        Taiwan           9.9
India                49                         United Kingdom   7.9
Ireland              7.9                        United States    3.9
Israel               8.9                        Venezuela        22

a) Make a display of these rates.
b) Find the mean and the median. Which is a more appropriate measure of center?
c) Find the IQR and the standard deviation. Which is the more appropriate measure of spread?
d) Would you consider any of these to be outliers? Carefully explain how you reached your decision.
e) Write a brief description of these rates. Don't forget to mention shape, center, and spread as well as any unusual features of the distribution.
f) What can you conclude about Net2Phone's rates to the 250 countries the ad says they service?

34. Job growth. In 1996 the firm Standard and Poor's DRI predicted that the cities listed below would experience the fastest growing job markets in the United States over the next 3 years and predicted their growth rates, given here.

City                                   Growth (%)
Las Vegas, NV-AZ                       3.72
Raleigh-Durham-Chapel Hill, NC         2.69
Austin-San Marcos, TX                  2.64
Riverside-San Bernardino, CA           2.62
Boise, ID                              2.61
Orlando, FL                            2.51
Phoenix-Mesa, AZ                       2.44
West Palm Beach-Boca Raton, FL         2.37
Sacramento, CA                         2.26
Atlanta, GA                            2.25
Sarasota-Bradenton, FL                 2.22
Portland-Vancouver, OR-WA              2.16
Fort Lauderdale, FL                    2.13
Charlotte-Gastonia-Rock Hill, NC-SC    2.07
Tucson, AZ                             2.07
Vallejo-Fairfield-Napa, CA             2.02
Omaha, NE-IA                           1.93
Salt Lake City-Ogden, UT               1.90
Albuquerque, NM                        1.87
Fort Worth-Arlington, TX               1.86

a) Make a suitable display of the growth rates.
b) Summarize the central growth rate with a median and mean. Why do they differ?
c) Given what you know about the distribution, which of these measures does the better job of summarizing the growth rates? Why?
d) Summarize the spread of the growth rate distribution with a standard deviation and with an IQR.
e) Given what you know about the distribution, which of these measures does the better job of summarizing the growth rates? Why?
f) Suppose we subtract from each of these growth rates the predicted U.S. average growth rate of 1.20%, so that we could look at how much these growth rates exceed the U.S. rate. How would this change the values of the summary statistics you calculated above? (Hint: You need not recompute any of the summary statistics from scratch.)
g) If we were to omit Las Vegas from the data, how would you expect the mean, median, standard deviation, and IQR to change? Explain your expectations for each.
h) Write a brief report about these growth rates.

35. Math scores. The National Center for Education Statistics reported 1999 average mathematics achievement scores for eighth graders in 38 nations. Singapore led the group, with an average score of 604, while South Africa had the lowest average of 275. The United States scored 502. The average scores for each nation are given below.

604 587 585 582 579 558 540 534 532 531
530 526 525 520 520 519 511 505 502 496
491 482 479 476 472 469 467 466 448 447
429 428 422 403 392 345 337 275

a) Find the five-number summary, the IQR, the mean, and the standard deviation of these national averages.
b) Write a brief summary of the performance of eighth graders worldwide. Be sure to comment on the performance of the United States.

36. Prisons. A report from the U.S. Department of Justice gave the following percent increases in federal prison populations in 20 northeastern and midwestern states during 1999.

5.9, 1.3, 3.0, 5.9, 4.5, 5.6, 2.1, 6.3, 4.8, 6.9, 4.5, 3.5, 7.2, 6.4, 5.5, 5.3, 8.0, 4.4, 7.2, 3.2

a) Graph these data.
b) Calculate appropriate summary statistics.
c) Write a few sentences about these data. (Remember: shape, center, spread, unusual features.)

37. Gasoline usage. The U.S. Department of Transportation collects data on the amount of gasoline sold in each state. The following data show the per capita (gallons used per person) consumption in the year 2000. Using appropriate graphical displays and summary statistics, write a report on the gasoline use by state in the year 2000.

Alabama   544.71    Hawaii    327.27    Mass      438.1     NM        474.28    SD        555.06
Alaska    433.08    Idaho     500.34    Michigan  502.77    New York  551.18    TN        586.58
Arizona   452.82    Illinois  406.66    MN        528.06    NC        296.66    Texas     515.17
Arkansas  532.96    Indiana   518.7     MS        559.29    ND        513.3     Utah      498.66
CA        422.65    Iowa      534.7     Missouri  563.56    Ohio      574.83    Vermont   456.27
Colorado  461.90    Kansas    511.34    Montana   548.5     OK        457.63    Virginia  584.03
CT        431.04    Kentucky  510.9     Nebraska  508.28    Oregon    520.42    WA        506.92
Delaware  481.45    LA        522.12    Nevada    446.17    PA        441.44    WV        450.4
Florida   542.36    Maine     542.36    NH        542.86    RI        410.31    WI        462
Georgia   452.82    Maryland  452.82    NJ        474.28    SC        381.86    Wyoming   462.67

38. Industrial experiment. Engineers at a computer production plant tested two methods for accuracy in drilling holes into a PC board. They tested how fast they could set the drilling machine by running 10 boards at each of two different speeds. To assess the results, they measured the distance (in inches) from the center of a target on the board to the center of the hole. The data and summary statistics are shown in the table:

          Fast        Slow
          0.000101    0.000098
          0.000102    0.000096
          0.000100    0.000097
          0.000102    0.000095
          0.000101    0.000094
          0.000103    0.000098
          0.000104    0.000096
          0.000102    0.975600
          0.000102    0.000097
          0.000100    0.000096
Mean      0.000102    0.097647
StdDev    0.000001    0.308481

Write a report summarizing the findings of the experiment. Include appropriate visual and verbal displays of the distributions, and make a recommendation to the engineers if they are most interested in the accuracy of the method.
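One way to see why the Slow column's summary statistics look so different from the Fast column's is to recompute them with and without its largest reading. The sketch below is an illustration only and is not part of the original exercise. The names `slow`, `slow_without_max`, and `summarize` are hypothetical, the standard deviation uses the n - 1 divisor, and setting aside the 0.975600 reading is done purely for comparison, not as a claim about how the engineers should treat it.

```python
from statistics import mean, median, stdev

# Slow-speed distances (inches), copied from the table in Exercise 38.
slow = [0.000098, 0.000096, 0.000097, 0.000095, 0.000094,
        0.000098, 0.000096, 0.975600, 0.000097, 0.000096]

def summarize(values):
    """Return (mean, median, sample standard deviation) for a list of numbers."""
    return mean(values), median(values), stdev(values)

# Same column with its single largest reading set aside for comparison.
slow_without_max = sorted(slow)[:-1]

if __name__ == "__main__":
    for label, data in [("all 10 boards", slow), ("largest omitted", slow_without_max)]:
        m, med, sd = summarize(data)
        print(f"{label}: mean={m:.6f} median={med:.7f} sd={sd:.6f}")
```

The median barely moves between the two runs, while the mean and standard deviation change by orders of magnitude, which is the contrast a written report on this experiment would need to address.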

39. Customer Database. A philanthropic organization has a database of millions of donors that they contact by mail to raise money for charities. One of the variables in the database, Title, contains the title of the person or persons printed on the address label. The most common are Mr., Ms., Miss, and Mrs., but there are also Ambassador and Mrs., Your Imperial Majesty, and Cardinal, to name a few others. In all there are over 100 different titles, each with a corresponding numeric code. Here are a few of them:

Code   Title
000    MR.
001    MRS.
002    MR. and MRS.
003    MISS
004    DR.
005    MADAME
006    SERGEANT
009    RABBI
010    PROFESSOR
126    PRINCE
127    PRINCESS
128    CHIEF
129    BARON
130    SHEIK
131    PRINCE AND PRINCESS
132    YOUR IMPERIAL MAJESTY
1035   M. ET MME.
1210   PROF.

An intern who was asked to analyze the organization's fundraising efforts presented these summary statistics for the variable Title:

Mean     54.41
StdDev   957.5
Median   1
IQR      2
n        94649

a) What does the mean of 54.41 mean?
b) What are the typical reasons that cause measures of center and spread to be as different as those in this table?
c) Is that why these are so different?

40. Zip codes revisited. Here are some summary statistics to go with the histogram of the zip codes of 500 customers from the Holes R Us Internet Jewelry Salon that we saw in Exercise 23 of Chapter 4.

Count    500
Mean     64970.0
StdDev   23523.0
Median   64871
IQR      44183
Q1       46050
Q3       90233

a) Is the mean or median a "better" summary of the center of the zip code distribution?
b) Is the standard deviation or the IQR a better summary of the spread?
c) What can these statistics tell you about the company's sales?

41. Eye and hair color. A survey of 1021 school-age children was conducted by randomly selecting children from several large urban elementary schools. Two of the questions concerned eye and hair color. In the survey, the following codes were used:

Hair color     Eye color
1 = Blond      1 = Blue
2 = Brown      2 = Green
3 = Black      3 = Brown
4 = Red        4 = Grey
5 = Other      5 = Other

The statistics students analyzing the data were asked to study the relationship between eye and hair color. They produced this plot:

Is their graph appropriate? If so, summarize the findings. If not, explain why not.

42. Stereograms. Stereograms appear to be composed entirely of random dots. However, they contain separate images that a viewer can "fuse" into a three-dimensional (3D) image by staring at the dots while defocusing the eyes. An experiment was performed to determine whether knowledge of the embedded image affected the time required for subjects to fuse the images. One group of subjects (group NV) received no information or just verbal information about the shape of the embedded object. A second group (group VV) received both verbal information and visual information (specifically, a drawing of the object). The experimenters measured how many seconds it took for the subject to report that he or she saw the 3D image.
a) What two variables are discussed in this description?
b) For each variable, is it quantitative or categorical? If quantitative, what are the units?
c) Here are boxplots comparing the fusion times for the two treatment groups. Write a few sentences comparing these distributions. What does the experiment show?

43. Stereograms, revisited. Because of the skewness of the distributions of fusion times, we might consider a re-expression. Here are the boxplots of the log of fusion times. Is it better to analyze the original fusion times or the log fusion times? Explain.

44. Stereograms, yet again. Here are the boxplots of the reciprocal of fusion times. Is it better to analyze the original fusion times, the log fusion times, or the reciprocal? Explain.


CHAPTER 5

3. a) $1001.50
b) 1025, 850, 1200
c) 835, 350

4. a) 56.3
b) 40, 36, 68
c) 105, 32

5. a) Median will probably be unaffected. The mean will be larger.
b) The range and standard deviation will increase; IQR will be unaffected.

6. The company is using the mean of 7 days, while the union negotiators are using the median of 3 days. There may be high outliers, or a distribution of sick days that is generally skewed to the right.

7. a) Mean $525, median $450
b) 2 employees earn more than the mean.
c) The median because of the outlier.
d) The IQR will be least sensitive to the outlier of $1200, so it would be the best to report.

8. a) 60, 65, 66, 70, 76
b) [boxplot not reproduced]
c) Mean 67.12 inches. SD 3.79 inches.
d) [histogram not reproduced]
e) The distribution is centered near 67 in., with median 66 in. and mean 67.12 in. The IQR is 5 in., with the middle 50% of heights between 65 and 70 in. The distribution appears to be bimodal, probably due to differences in heights of men and women.
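The figures in answer 8 can be reproduced from the frequency table in Exercise 8. The following is a sketch under stated assumptions rather than the textbook's own procedure: quartiles are taken as the medians of the lower and upper halves of the sorted data (with the overall median included in both halves when the count is odd), the standard deviation uses the n - 1 divisor, and the helper names `expand` and `five_number_summary` are hypothetical.

```python
from statistics import mean, median, stdev

# Height (inches) -> count, copied from the Exercise 8 frequency table.
height_counts = {
    60: 2, 61: 6, 62: 9, 63: 7, 64: 5, 65: 20, 66: 18, 67: 7, 68: 12,
    69: 5, 70: 11, 71: 8, 72: 9, 73: 4, 74: 2, 75: 4, 76: 1,
}

def expand(table):
    """Turn a value -> count table into a sorted list of raw values."""
    return sorted(v for v, count in table.items() for _ in range(count))

def five_number_summary(values):
    """Min, Q1, median, Q3, max, with quartiles as medians of the two halves."""
    data = sorted(values)
    n = len(data)
    half = (n + 1) // 2  # for odd n the middle value joins both halves
    lower, upper = data[:half], data[n - half:]
    return data[0], median(lower), median(data), median(upper), data[-1]

if __name__ == "__main__":
    heights = expand(height_counts)
    print(len(heights), five_number_summary(heights))
    print(round(mean(heights), 2), round(stdev(heights), 2))
```

Under the same quartile convention, `five_number_summary` applied to the data in Exercises 3 and 4 gives the medians and quartiles listed in answers 3 and 4.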

9. a) The standard deviation will be larger for set 2, since the values are more spread out. SD(set 1) = 2.2, SD(set 2) = 3.2.
b) The standard deviation will be larger for set 2, since 11 and 19 are farther from 15 than are 14 and 16. Other numbers are the same. SD(set 1) = 3.6, SD(set 2) = 4.5.
c) The standard deviation will be the same for both sets, since the values in the second data set are just the values in the first data set + 80. The spread has not changed. SD(set 1) = 4.2, SD(set 2) = 4.2.

10. a) Set 2 will have a larger SD because 6 and 8 in set 2 are farther from 7 than 7 and 7 in set 1. The other numbers are the same. SD(set 1) = 2.1, SD(set 2) = 2.2.
b) The standard deviation will be the same for both sets, since the values in set 1 are just the values in set 2 + 90. The spread remains the same. SD(set 1) = 36.1, SD(set 2) = 36.1.
c) Set 2 has a wider range and will have a larger SD. SD(set 1) = 6.0, SD(set 2) = 7.2.

11. According to the fence rule, 61 is not technically an outlier. The median is 24.5. The two quartiles are 14 and 33, so the IQR is 19. That means 1.5 IQRs is 28.5. Adding 28.5 to the upper quartile of 33 gives a fence of 61.5. So, 61 is just within the fence and not an outlier by that definition. But it certainly looks unusual compared with the other 9 seasons!
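The fence calculation in answer 11 can be written out step by step. This is a minimal sketch, assuming quartiles are the medians of the lower and upper halves of the sorted data (which gives the 14 and 33 quoted in the answer for these ten values); the function name `upper_fence` is hypothetical.

```python
from statistics import median

# Maris's home run totals for his 10 American League seasons (Exercise 11).
home_runs = [8, 13, 14, 16, 23, 26, 28, 33, 39, 61]

def upper_fence(values):
    """Return (Q1, Q3, IQR, upper fence) using the 1.5 x IQR rule."""
    data = sorted(values)
    n = len(data)
    half = (n + 1) // 2  # for odd n the middle value joins both halves
    q1, q3 = median(data[:half]), median(data[n - half:])
    iqr = q3 - q1
    return q1, q3, iqr, q3 + 1.5 * iqr

if __name__ == "__main__":
    q1, q3, iqr, fence = upper_fence(home_runs)
    print(q1, q3, iqr, fence)
    print("61 beyond the fence?", 61 > fence)
```

Other quartile conventions (for example, interpolated percentiles) can give slightly different quartiles and therefore a different fence, so a borderline value like 61 may be classified differently by different software.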

12. a) The distribution is strongly skewed to the right, so use the median and IQR.
b) The IQR is 50, so the upper fence is the upper quartile + 1.5 IQRs; that is, 78 + 75 = 153. There appear to be 4 to 5 parks that should be considered as outliers with more than 153 camp sites.
c) [boxplot not reproduced]
d) The distribution is unimodal with a strong skew to the right. There are several outliers past the 1.5 × IQR upper fence of 153 camp sites. The median number of camp sites is 43.5 sites. The mean is 62.8 sites. The mean is larger than the median because it has been influenced by the strong skew and the outliers.

13. Women appear to marry about 3 years younger than men, but the two distributions are very similar in shape and spread.

14. Both fuel economy and its spread decrease from 4 to 6 to 8 cylinders (not enough data to compare 5-cylinder cars). The lower 50% of MPGs for the 8-cylinder cars corresponds roughly to the bottom 25% of MPGs for the 6-cylinder cars. All of the 8-cylinder cars get less mileage than all of the 4-cylinder cars.

15. a) Seneca. b) Seneca. c) Keuka. d) The Cayuga and Seneca wineries have about the same average price; the boxplots show similar medians and similar IQRs, even though Seneca's range is larger. Keuka has consistently higher prices except for one low outlier, and more consistent pricing, as shown by the smaller IQR. Distributions for all three appear to be roughly symmetric around their centers.

16. a) April b) February c) August d) The median ozone level in June is slightly higher than January, but June's readings are much more consistent. June does show two outliers, one low and one high. e) Strong seasonal pattern with low, consistent ozone concentrations in late summer/early fall and high, variable concentrations in early spring. The medians follow a cyclic pattern rising from January to April, then falling to October and rising again from October to December.

17. a) Because the mean is larger than the median, we might suppose that the distribution is right skewed. An outlier or outliers might explain why the SD is so much larger than the IQR.

b) The differences in the statistics could be due to skewness or to outliers on the high end. c) The distribution of times in the men's 100-m swim in Sydney is unimodal, with most times between 48 and 53 seconds. There is a slight skew to the right, with three times between 58 and 63 seconds and one very extreme outlier at about 113 seconds.

18. Since the mean or average is the balance point for the data, having 223 areas below the average and only 99 above the average indicates that the distribution of unemployment rates must be skewed to the right.

19. a) Class 3 b) Class 3 c) Class 3 because it is skewed. Median would be higher than the mean because it is skewed to the left. d) Class 1 e) Class 1

20. a) Class 3. Highest median and almost 75% of class 3 scored at or above the medians of 1 and 2. b) Fairly symmetric (except class 3). Quite spread out, ranging from about 30 to near 100. c) Class A is 1, class B is 2, and class C is 3.

21. a) Essentially symmetric, very slightly skewed to the right with two high outliers at 36 and 48. Most victims are between the ages of 16 and 24.

b) The slight increase between ages 22 and 24 is apparent in the histogram but not in the boxplot. It may be a second mode.

c) The median would be the most appropriate measure of center because of the slight skew and the extreme outliers.

d) The IQR would be the most appropriate measure of spread because of the slight skew and the extreme outliers.

22. a) 1611 yd b) Between 5585.75 yd and 6131 yd c) Mean and SD, since this is roughly unimodal and symmetric. d) The distribution is fairly symmetric and unimodal. Because of the symmetry, the mean of 5892.91 yd and median of 5928 yd are nearly the same, showing a center near 5900 yd. The standard deviation of the lengths is 386.59 yd. The range is 1611 yd, from a minimum of 5185 yd to a maximum of 6796 yd.

23. a) Probably slightly left skewed. The mean is slightly below the median and the 25th percentile is farther from the median than the 75th percentile.

b) No, all data are within the fences. c)

d) The 48 universities graduate about 68% of freshmen in 4 years on average, with percents ranging from 43% to 87%. The middle 50% of these universities graduate between 59% and 75% of their freshmen in 4 years.

24. a) Skewed to the right, since the mean is much larger than the median and the upper quartile is farther from the median than the lower quartile. b) The IQR is about 36, so fences are below 0 and at 109. Since the range is 244 and the minimum is 6, the maximum is 250, which is certainly an outlier. Without knowing the data points, we are not sure of other outliers, but the standard deviation of 47.8 makes us suspect there are others.

c) We don't know if there are other outliers above the upper fence, but the boxplot may look something like this:

d) The wineries range in size from 6 to 250 acres. The median size of the 36 wineries is 33.5 acres, so half are larger and half are smaller. The middle 50% of the wineries have sizes between 18.5 and 55 acres. The distribution of sizes is skewed to the right, with at least one outlier.
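The reasoning in answer 24 works entirely from summary statistics, and it can be sketched as follows. The quartiles 18.5 and 55 are taken from part d); using them unrounded gives an IQR of 36.5 and an upper fence of 109.75, slightly above the "about 109" obtained in part b) from the rounded IQR of 36. Variable names are illustrative only.

```python
# Summary statistics quoted in the answer to problem 24.
minimum, data_range = 6, 244
q1, q3 = 18.5, 55.0

maximum = minimum + data_range   # range = max - min, so max = min + range
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

print(maximum)                   # 250
print(iqr)                       # 36.5
print(lower_fence, upper_fence)  # -36.25 109.75
print(maximum > upper_fence)     # True
```

Because the lower fence is negative and acreage cannot be, no low outliers are possible; only the high end needs checking.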

25. a) Who: Student volunteers. What: Memory test. Where, when: Not specified. How: Students took memory test 2 hours after drinking caffeine-free, half-dose caffeine, or high-caffeine soda. Why: To see if caffeine makes you more alert and aids memory retention. b) Drink: categorical. Test score: quantitative. c)

d) The participants scored about the same with no caffeine and low caffeine. The medians for both were 21 points, with slightly more variation for the low-caffeine group. The high-caffeine group generally scored lower than the other two groups on all measures of the five-number summary: min, lower quartile, median, upper quartile, and max. It is not clear if the differences are significant.

26. a) The median and IQR, because the means are much larger than the medians and the SDs are much larger than the IQRs, indicating right skewness and/or outliers.

b) Since the median rainfall for seeded clouds is more than 4 times that for unseeded clouds, it appears that seeding clouds may be effective.

27. a) Since these data are strongly skewed to the right, the median and IQR are the best statistics to report. b) The mean will be larger than the median because the data are skewed to the right. c) Median 4 million. The IQR is 4.5 million (UQ = 6 million, LQ = 1.5 million). d) The distribution of populations of the states and Washington, D.C., is unimodal and skewed to the right. The median population is 4 million. One state is an outlier with a population of 34 million.

28. The southern and western states appear to have significantly higher growth rates between 1990 and 2000 than the northeastern and midwestern states.

29. a) About 36 mph b) LQ about 34.5 mph and UQ about 36.5 mph c) The range appears to be 7 mph from about 31 to 38 mph. The IQR is about 2 mph. d) We can't know exactly, but the boxplot may look something like this:

e) The median winning speed has been about 36 mph, with a max of about 38 mph and a min of about 31 mph. The middle 50% of the speeds appear to range from 34.5 to 36.5 mph, for an IQR of 2 mph. The middle 20% of the data appear to be tightly clustered around 36 mph. Only 10% of the Kentucky Derby winners raced at speeds above about 37 mph.

30. Distribution is essentially symmetric with median near 225. The IQR is about 100 points. Extremes are about 80 and 380.

31. a) Boys b) Boys c) Girls d) The boys appeared to have more skew, as their scores were less symmetric between quartiles. The girls' quartiles are the same distance from the median, although the left tail stretches a bit farther to the left. e) Girls. Their median and upper quartiles are larger. The lower quartile is slightly lower, but close. f) [14(4.2) + 11(4.6)]/25 = 4.38

32. a)

b) Median score by females at 625 points is 25 points higher than that by males. Female mean is higher by 12. The middle 50% of females scored between 530 and 680, while the middle 50% of males scored between 515 and 650. The males did have a larger range, from 310 to 800, the highest score for both genders. Both distributions are slightly skewed to the left.
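The combined mean in answer 31 f) is a weighted average of the two group means, with group sizes 14 and 11 as the weights. A minimal sketch of that arithmetic follows; the helper name `combined_mean` is hypothetical.

```python
def combined_mean(groups):
    """Overall mean from a list of (count, group_mean) pairs."""
    total = sum(n * m for n, m in groups)   # sum of all individual scores
    count = sum(n for n, _ in groups)       # total number of individuals
    return total / count

# Group sizes and means as given in answer 31 f).
overall = combined_mean([(14, 4.2), (11, 4.6)])
print(round(overall, 2))  # 4.38
```

Simply averaging 4.2 and 4.6 would give 4.4, which overweights the smaller group.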

33. a)

b) The mean cost per minute for Net2Phone rates is 13.7 cents. The median cost is 9.9 cents per minute. Because of the outliers, the median is the more appropriate measure of center to use. c) The IQR for the cost per minute of the phone calls is 7.6 cents. The standard deviation is 11.8 cents per minute. Again, due to the outliers, the better choice for measure of spread is the IQR. d) Yes, both India and Pakistan are outliers. They each lie more than 1.5 times the IQR above the UQ. e) The distribution of the long-distance rates for Net2Phone is unimodal and symmetric except for two outliers, India and Pakistan at 49 cents per minute. The median cost is 9.9 cents per minute. The middle 50% of costs per minute are between 7.9 and 15.5 cents per minute for an IQR of 7.6 cents per minute.

f) These data represent only 24 of the 250 countries that Net2Phone offers service to. These might not be representative of the entire group.

34. a) Either the boxplot or the histogram would be appropriate displays of the data.

b) Mean 2.317%, median 2.235%. Mean is higher because of the outlier. c) The median because of the outlier. d) IQR 0.515%, std dev 0.427%. e) The standard deviation is also influenced by the outlier. The better measure of spread is the IQR. f) Mean and median would be 1.2% lower. SD and IQR would not change. g) Median and IQR won't change very much. The middle value and the two quartiles will shift at most one data value. The mean and SD will decrease. h) The distribution of growth rates for the cities is unimodal and symmetric except for the one outlier, Las Vegas, at 3.72%. The median growth rate for these cities is 2.235%. The middle 50% of the cities had growth rates between 2.045% and 2.560% for an interquartile range of 0.515%. The median and IQR are the best statistics to report when a distribution has outliers, but if the outlier were omitted, the average growth rate would be 2.24% with a standard deviation of 0.28%.
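The claim in answer 34 f) is that subtracting a constant moves the mean and median by that constant but leaves the SD and IQR alone. The sketch below checks this on made-up numbers. It assumes part f) subtracts 1.2 percentage points from every value, the `rates` list is hypothetical and is not the textbook data, and `statistics.quantiles` may use a different quartile convention than the textbook.

```python
import math
import statistics

def summarize(values):
    """Return (mean, median, sample SD, IQR) for a list of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return (
        statistics.mean(values),
        statistics.median(values),
        statistics.stdev(values),
        q3 - q1,
    )

# Hypothetical growth rates (%), NOT the textbook data.
rates = [1.8, 2.0, 2.1, 2.3, 2.4, 2.6, 3.7]
shifted = [r - 1.2 for r in rates]   # assumed shift of 1.2 points

mean0, med0, sd0, iqr0 = summarize(rates)
mean1, med1, sd1, iqr1 = summarize(shifted)

# Center moves by the shift; spread does not.
print("mean shift:  ", mean0 - mean1)
print("median shift:", med0 - med1)
print("SD same:     ", math.isclose(sd0, sd1))
print("IQR same:    ", math.isclose(iqr0, iqr1))
```

Comparisons use `math.isclose` because floating-point subtraction can leave tiny rounding differences.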

35. a)

b) The distribution is unimodal and skewed to the left. As shown in the boxplot, there is one outlier, an average score of 275. The median was 499 while the mean was 502, slightly above the median score. The middle 50% of the nations scored between 448 and 531 for an IQR of 83 points.

36. a)

b) Mean 5.08%, SD 1.79%. Five-number summary: 1.3, 3.95, 5.4, 6.35, 8 (%). c) The distribution is unimodal and symmetric. There are no outliers. The mean percent increase was 5.08% with a standard deviation of 1.79%.

37. In the year 2000, per capita gasoline use by state in the United States averaged around 500 gallons per person (mean 488, median 502). States varied in their consumption with a standard deviation of 62 gallons, ranging from a min of 297 (NC) to a max of 587 (TN). The two low outliers were NC and HI. The IQR of 82 gallons reflects the fact that 50% of the states varied from 453 to 533 gallons per capita.

38.

There appears to be an outlier! This point should be investigated. We'll proceed by redoing the plots with the outlier omitted:

It appears that slow speed provides much greater accuracy. But the outlier should be investigated. It is possible that slow speed can induce an infrequent very large distance.

39. a) Although numeric codes have been assigned to the different titles, these data are categorical, not quantitative. The mean of 54.41 is meaningless. b) The typical reasons are skewness and/or outliers. c) No. Here the numbers are just codes. Most of the people probably had titles of Mr. or Mrs., making the "median" 1, but these summary statistics are meaningless.

40. a) Neither. These are not quantitative data. b) Neither. These are not quantitative data. c) Not very much, since zip codes are categorical. However, there is some information in the first digit of zip codes. They indicate a general East (0-1) to West (8-9) direction. So, the distribution shows that a large portion of their sales occurs in the West and another in the 32000 area. But a bar chart of the first digits might be a better display of this.

41. No, boxplots are for quantitative data, and these are categorical, although coded as numbers. The numbers used for hair color and eye color are arbitrary, so the boxplot and any accompanying statistics for eye color make no sense.

42. a) Fusion time and group. b) Fusion time is quantitative (units = seconds). Group is categorical. c) Both distributions are skewed to the right with high outliers. The boxplot indicates that verbal information may reduce fusion time. The median for the Verbal/Visual group seems to be about the same as the lower quartile of the No/Verbal group.

43. The analysis would probably be improved by using the log-transformed data. The distributions are more symmetric, and it's easier to compare the groups. The outliers are eliminated.

44. The reciprocal re-expression has gone too far. The distributions for the reciprocals show the same skewness as the original data, but now there are no outliers. Of the three choices, the log re-expression appears to be the best.