56 Chapter 3 Chapter 3 - Green Local Schools 3...Examining Relationships 57 3.10 (a) A scatterplot...


Chapter 3

3.1 (a) The amount of time a student spends studying is the explanatory variable and the grade on the exam is the response variable. (b) Height is the explanatory variable and weight is the response variable. (c) Inches of rain is the explanatory variable and the yield of corn is the response variable. (d) It is more reasonable to explore the relationship between a student’s grades in statistics and French. (e) A family’s income is the explanatory variable and the years of education their eldest child completes is the response variable.

3.2 The explanatory variable is weight of a person, and the response variable is mortality rate (that is, how likely a person is to die over a 10-year period). The other variables that may influence the relationship between weight and survival are the amount of physical activity, perhaps measured by hours of exercise per week, and economic status, which could be measured by annual income of the person, family net worth, amount of savings, or some other financial variable.

3.3 Water temperature is the explanatory variable, and weight change (growth) is the response variable. Both are quantitative.

3.4 The explanatory variable is the type of treatment (removal of the breast, or removal of only the tumor and nearby lymph nodes followed by radiation), and survival time is the response variable. Type of treatment is a categorical variable, and survival time is a quantitative variable.

3.5 (a) The explanatory variable is the number of powerboat registrations. (b) A scatterplot is shown below.

[Figure: scatterplot of manatees killed versus powerboat registrations (1000s).]

The scatterplot shows a positive linear relationship between these variables. (c) There is a positive linear association between powerboat registrations and manatees killed. (d) Yes, the relationship between these variables is linear. (e) The relationship is a strong, positive, linear association. Yes, the number of manatees killed can be predicted accurately from powerboat registrations. For 719,000 powerboat registrations, about 48 manatees would be killed by powerboats. 3.6 (a) A scatterplot is shown below.

[Figure: scatterplot of new adults versus percent returning.]

(b) The scatterplot shows a negative, linear, fairly weak relationship. (Note: direction = negative, form = linear, strength = weak.) (c) Because this association is negative, we conclude that the sparrowhawk is a long-lived territorial species.

3.7 (a) A positive association between IQ and GPA means that students with higher IQs tend to have higher GPAs, and those with lower IQs generally have lower GPAs. The plot does show a positive association. (b) The form of the relationship is roughly linear, because a line through the scatterplot of points would provide a good summary. The positive association is moderately strong (with a few exceptions) because most of the points would be close to the line. (c) The lowest point on the plot is for a student with an IQ of about 103 and a GPA of about 0.5.

3.8 (a) From Figure 3.5, the returns on stocks were about 50% in 1954 and about −28% in 1974. (b) The return on Treasury bills in 1981 was about 15%. (c) The scatterplot shows no clear pattern. The statement that “high treasury bill returns tend to go with low returns on stocks” implies a negative association; there may be some suggestion of such a pattern, but it is extremely weak.

3.9 (a) A scatterplot with speed as the explanatory variable is shown below.

[Figure: scatterplot of fuel used (liters/100 km) versus speed (km/h).]

(b) The relationship is curved or quadratic. High amounts of fuel were used at low and high speeds, and low amounts of fuel were used at moderate speeds. This makes sense because the best fuel efficiency is obtained by driving at moderate speeds. (Note: 60 km/h is about 37 mph.) (c) Poor fuel efficiency (above-average fuel consumption) is found at both high and low speeds, and good fuel efficiency (below-average fuel consumption) is found at moderate speeds. (d) The relationship is very strong, with little deviation from a curve that can be drawn through the points.


3.10 (a) A scatterplot with mass as the explanatory variable is shown below.

[Figure: scatterplot of rate (cal) versus mass (kg).]

(b) The association is positive, and the relationship is linear and moderately strong. (c) The scatterplot below shows that the pattern of the relationship does hold for men. However, the relationship between mass and rate is not as strong for men as it is for women. The group of men has higher lean body masses and metabolic rates than the group of women.

[Figure: scatterplot of rate (cal) versus mass (kg), with points coded by sex (F, M).]

3.11 A scatterplot from a calculator is shown below. As expected, the calculator graph looks the same as the scatterplot in Exercise 3.9 (a).

3.12 A scatterplot from a calculator is shown below. As expected, the calculator graph shows the same relationship as the scatterplot in Exercise 3.10.


3.13 (a) The scatterplot below shows a strong, positive, linear relationship between the two measurements. Thus, all five specimens appear to be from the same species.

[Figure: scatterplot of humerus length (cm) versus femur length (cm).]

(b) The femur measurements have a mean of 58.2 and a standard deviation of 13.2. The humerus measurements have a mean of 66 and a standard deviation of 15.89. The table below shows the standardized measurements (labeled zfemur and zhumerus) obtained by subtracting the mean and dividing by the standard deviation. The column labeled “product” contains the product (zfemur×zhumerus) of the standardized measurements. The sum of the products is 3.97659, so the correlation coefficient is r = (1/4) × 3.97659 = 0.9941.

femur  humerus    zfemur  zhumerus  product
   38       41  -1.53048  -1.57329  2.40789
   56       63  -0.16669  -0.18880  0.03147
   59       70   0.06061   0.25173  0.01526
   64       72   0.43944   0.37759  0.16593
   74       84   1.19711   1.13277  1.35605
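The arithmetic in part (b) can be checked with a short script. The sketch below is a minimal Python illustration, not part of the original solution: it standardizes each measurement with the sample standard deviation, multiplies the paired z-scores, and divides the sum of the products by n − 1. The function names are hypothetical; the data are the five femur and humerus lengths from the table.

```python
from math import sqrt

# Femur and humerus lengths from the table in Exercise 3.13 (b).
femur = [38, 56, 59, 64, 74]
humerus = [41, 63, 70, 72, 84]


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def sample_sd(values):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def correlation(xs, ys):
    """Correlation as the sum of z-score products divided by n - 1."""
    mx, my = mean(xs), mean(ys)
    sx, sy = sample_sd(xs), sample_sd(ys)
    products = [((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)]
    return sum(products) / (len(xs) - 1)


r = correlation(femur, humerus)
print(f"r = {r:.4f}")
```

Rounded to four decimal places, the script's value can be compared with the 0.9941 reported in the solution.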

(c) The correlation coefficient is the same, 0.9941. 3.14 The scatterplot below, with price as the explanatory variable, shows a strong, positive, linear association between price and deforestation percent.

[Figure: scatterplot of deforestation (percent) versus price (cents per pound).]

(b) The prices have a mean of 50 and a standard deviation of 16.32. The deforestation percents have a mean of 1.738% and a standard deviation of 0.928%. The table below shows the standardized values (labeled zprice and zdeforestation) obtained by subtracting the mean and dividing by the standard deviation. The column labeled “product” contains the product (zprice×zdeforestation) of the standardized measurements. The sum of the products is 3.82064, so the correlation coefficient is r = (1/4) × 3.82064 = 0.9552.

price  deforestation    zprice  zdeforestation   product
   29           0.49  −1.28638        −1.34507   1.73028
   40           1.59  −0.61256        −0.15951   0.09771
   54           1.69   0.24503        −0.05173  −0.01268
   55           1.82   0.30628         0.08838   0.02707
   72           3.10   1.34764         1.46794   1.97826

(c) The correlation coefficient is the same, 0.9552. 3.15 (a) The lowest calorie count is about 107 calories and the sodium level for this brand is about 145 mg. The highest calorie count is about 195 calories, and the sodium level for this brand is about 510 mg. (b) The scatterplot shows positive association; high-calorie hot dogs tend to be high in salt, and low-calorie hot dogs tend to have low sodium. (c) The lower left point is an outlier. Ignoring this point, the relationship is linear and moderately strong. 3.16 (a) The correlation r is clearly positive but not near 1. The scatterplot shows that students with high IQs tend to have high grade point averages, but there is more variation in the grade point averages for students with moderate IQs. (b) The correlation r for the data in Figure 3.8 would be closer to one. The overall positive relationship between calories and sodium is stronger than the positive association between IQs and GPAs. (c) The outliers with moderate IQ scores in Figure 3.4 weaken the positive relationship between IQ and GPA, so removing them would increase r. The outlier in the lower left corner of Figure 3.8 strengthens the positive, linear relationship between calories and sodium, so removing this outlier would decrease r. 3.17 (a) A scatterplot is shown below.

[Figure: scatterplot of y versus x.]

(b) The correlation r = 0.2531. (c) The two scatterplots, using the same scale for both variables, are shown below.

[Figures: scatterplots of y versus x and of y* versus x*, drawn on the same scale.]

(d) The correlation between x* and y* is the same as the correlation between x and y, r = 0.2531. Although the variables have been transformed, the distances between the corresponding points and the strength of the association have not changed.

3.18 (a) The correlation between the percent of returning birds and the number of new adults is r = −0.748. (b) A scatterplot with the two new points added is shown below.

[Figure: scatterplot of new adults versus percent returning, with points coded as A, B, and Original.]

The correlation for the original data plus point A is r = −0.807. The correlation for the original data plus point B is r = −0.469. (c) Point A fits in with the negative linear association displayed by the other points, and even emphasizes (strengthens) that association because, when A is included, the points of the scatterplot are less spread out (relative to the length of the apparent line suggested by the points). On the other hand, Point B deviates from the pattern, weakening the association.

3.19 There is a perfect, positive association between the ages of the women and their spouses, so r = 1.

3.20 (a) A scatterplot of mileage versus speed is shown below.

[Figure: scatterplot of mileage (mpg) versus speed (mph).]

(b) The speeds have a mean of 40 mph and a standard deviation of 15.81 mph. The mileages have a mean of 26.8 mpg and a standard deviation of 2.68 mpg. The table below shows the standardized values (labeled zspeed and zmpg) obtained by subtracting the mean and dividing by the standard deviation. The column labeled “product” contains the product (zspeed×zmpg) of the standardized measurements. The sum of the products is 0.0, so the correlation coefficient is also 0.0.

speed  mpg    zspeed      zmpg   product
   20   24  −1.26491  −1.04350   1.31993
   30   28  −0.63246   0.44721  −0.28284
   40   30   0.00000   1.19257   0.00000
   50   28   0.63246   0.44721   0.28284
   60   24   1.26491  −1.04350  −1.31993
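As a rough check of the table above, the following Python sketch (an illustration with hypothetical variable names) computes r for the five speed and mileage pairs from the sums of squared deviations and cross deviations. The positive and negative cross products cancel because mileage rises and then falls symmetrically around 40 mph, so r comes out as zero apart from floating-point rounding.

```python
from math import sqrt

speed = [20, 30, 40, 50, 60]      # mph, from the table above
mileage = [24, 28, 30, 28, 24]    # mpg, from the table above

n = len(speed)
mean_speed = sum(speed) / n
mean_mileage = sum(mileage) / n

# Sums of squared deviations and of cross deviations.
sxx = sum((x - mean_speed) ** 2 for x in speed)
syy = sum((y - mean_mileage) ** 2 for y in mileage)
sxy = sum((x - mean_speed) * (y - mean_mileage) for x, y in zip(speed, mileage))

# Equivalent to the z-score formula: the (n - 1) factors cancel.
r = sxy / sqrt(sxx * syy)
print(f"r = {abs(r):.4f}")
```

A correlation of zero here does not mean the variables are unrelated; it only means there is no linear trend.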

The correlation coefficient r measures the strength of linear association between two quantitative variables; this plot shows a nonlinear relationship between speed and mileage. 3.21 (a) New York’s median household income is about $32,800 and the mean income per person is about $27,500. (b) Both of these variables measure the prosperity of a state, so you would expect an increase on one measure to correspond with an increase in the other measure. Household income will generally be higher than income per person because most households have one primary source of income and at least one other smaller source of income. (c) In the District of Columbia there are a relatively small number of individuals earning a great deal of money. Thus, the income distribution is skewed to the right, which would raise the mean per capita income above the median household income. (d) Alaska’s median household income is about $48,000. (e) Ignoring the outliers, the relationship is positive, linear, and moderately strong.


3.22 (a) A scatterplot, with the explanatory variable time, is shown below.

[Figure: scatterplot of pulse (beats per minute) versus time (minutes).]

(b) The association between time and pulse is negative. The faster Professor Moore swims 2000 yards, the more effort he will have to exert. Thus, a higher speed (lower time) will correspond with a higher pulse, and slower speeds (higher times) will correspond with lower pulses. (c) The negative, linear relationship is moderately strong. (d) The correlation is r = −0.744. The scatterplot shows a negative association between time and pulse. Small times correspond with large pulses and large times correspond with small pulses. (e) The value of r would not change.

3.23 (a) Gender is a categorical variable, and the correlation coefficient r measures the strength of linear association for two quantitative variables. (b) The largest possible value of the correlation coefficient r is 1. (c) The correlation coefficient r has no units.

3.24 The paper’s report is wrong because the correlation (r = 0.0) is interpreted incorrectly. The author incorrectly suggests that a correlation of zero indicates a negative association between research productivity and teaching rating. The psychologist meant that there is no linear association between research productivity and teaching rating. In other words, knowledge of a professor’s research productivity will not help you predict her teaching rating.

3.25 (a) A scatterplot, with the correct calories as the explanatory variable, is shown below.

[Figure: scatterplot of guessed calories versus correct calories.]

(b) There is a positive, linear relationship between the correct and guessed calories. The guessed calories for 5 oz. of spaghetti with tomato sauce and the cream-filled snack cake are unusually high and do not appear to fit the overall pattern displayed for the other foods. (c) The correlation is r = 0.825. This agrees with the positive association observed in the plot; it is not closer to 1 because of the unusual guessed calories for spaghetti and cake. (d) The fact that the guesses are all higher than the true calorie count does not influence the correlation. The correlation r would not change if every guess were 100 calories higher. The correlation does not change if a constant is added to all values of a variable because the standardized values would be unchanged. (e) The correlation without these two foods is r = 0.984. The correlation is closer to 1 because the relationship is much stronger without these two foods.

3.26 (a) Rachel should choose small-cap stocks because small-cap stocks have a lower correlation with municipal bonds. Thus, the weak, positive relationship between small-cap stocks and bonds will provide more diversification than the large-cap stocks, which have a stronger positive relationship with bonds. (b) She should look for a negative correlation, although this would also mean that the return on this investment would tend to decrease when return on bonds increases.

3.27 The correlation is r = 0.481. The one unusual point (10, 1) is responsible for reducing the correlation. Outliers tend to have fairly strong effects on correlation; the effect is very strong here because there are only six observations.
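The claim in Exercise 3.25 (d), that adding a constant leaves the standardized values unchanged, can be illustrated numerically. The Python sketch below uses made-up guessed-calorie values (hypothetical, not the data from the exercise) and a hypothetical helper `z_scores`; it standardizes the guesses before and after adding 100 and compares the results.

```python
from math import isclose, sqrt


def z_scores(values):
    """Standardize values using the mean and the sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


# Hypothetical guessed calories, for illustration only.
guesses = [8, 125, 180, 210, 350]
shifted = [g + 100 for g in guesses]

z_original = z_scores(guesses)
z_shifted = z_scores(shifted)

# Adding a constant moves the mean by that constant and leaves the
# standard deviation alone, so each z-score is unchanged.
same = all(isclose(a, b, abs_tol=1e-12) for a, b in zip(z_original, z_shifted))
print("z-scores unchanged:", same)
```

Since r is computed from the standardized values alone, unchanged z-scores mean an unchanged r.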

3.28 (a) A scatterplot of yield versus plants is shown below.

[Figure: scatterplot of yield (bushels per acre) versus plants (per acre).]

(b) The overall pattern is not linear. The yield tends to be highest for moderate planting rates and smallest for small and large planting rates. There is clearly no positive or negative association between planting rates and yield. (d) The mean yields for the five planting rates are:

Planting rate     Mean
        12000  131.025
        16000  143.150
        20000  146.225
        24000  143.067
        28000  134.750

A scatterplot with the means added is shown below. We would recommend the planting rate with the highest average yield, 20,000 plants per acre.

[Figure: scatterplot of yield (bushels per acre) versus plants (per acre), with the mean yields added.]

3.29 (a) For every one week increase in age, the rat will increase its weight by an average of 40 grams. (b) The y intercept provides an estimate for the birth weight (100 grams) of this male rat. (c) A graph is provided below.

[Figure: graph of weight (grams) versus time (weeks).]

(d) No, we should not use this line to predict the rat’s weight at 104 weeks. This would be extrapolation. This regression line would predict a weight of 4260 grams (about 9.4 lbs) for a 2 year old rat! The regression equation is only reliable for times where data were collected. 3.30 (a) The slope is 0.882; this means that on the average, reading score increases by 0.882 for each one-point increase in IQ. (b) The predicted scores for x = 90 and x = 130 are −33.4 + 0.882×90 = 45.98 and −33.4 + 0.882×130 = 81.26. (c) This is most easily done by plotting the points (90, 45.98) and (130, 81.26), then drawing the line connecting them.

[Figure: graph of reading score versus IQ.]

(d) The intercept (−33.4) would correspond to the expected reading score for a child with an IQ of 0; neither that reading score nor that IQ has any meaningful interpretation.
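The predictions in Exercise 3.30 (b) amount to evaluating the regression line at a given IQ. A minimal Python sketch, with a hypothetical function name, using the slope and intercept stated in the solution:

```python
def predict_reading_score(iq):
    """Predicted reading score from the line y-hat = -33.4 + 0.882 * IQ."""
    return -33.4 + 0.882 * iq


# The two IQ values used in part (b) to draw the line.
for iq in (90, 130):
    print(f"IQ {iq}: predicted reading score {predict_reading_score(iq):.2f}")
```

Evaluating the same function at an IQ of 0 simply returns the intercept, which is the value part (d) describes as having no meaningful interpretation.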


3.31 (a) The slope is 0.0138 minutes per meter. On average, if the depth of the dive is increased by one meter, it adds 0.0138 minutes (about 0.83 seconds) to the time spent underwater. (b) When Depth = 200, the regression line estimates DiveDuration to be 5.45 minutes (5 minutes and 27 seconds). (c) To plot the line, compute DiveDuration = 3.242 minutes when Depth = 40 meters, and DiveDuration = 6.83 minutes when Depth = 300 meters.

[Figure: graph of dive duration (minutes) versus depth of dive (meters).]

(d) The intercept suggests that a dive of no depth would last an average of 2.69 minutes; this obviously does not make any sense. 3.32 (a) The slope is −0.0053; this means that on the average for each additional week of study the pH decreased by 0.0053 units. Thus, the acidity of the precipitation increased over time. (b) To plot the line, compute pH at the beginning (weeks = 0) and end (weeks = 150) of the study. At the beginning of the study pH is 5.43 and at the end of the study pH is 4.635.

[Figure: graph of pH versus weeks.]

(c) Yes, the y intercept provides an estimate for the pH level at the beginning of the study. (d) The regression line predicts the pH to be 4.635 at the end of this study. 3.33 (a) A scatterplot from the calculator is shown below.

(b) Let y = number of manatees killed and x = number of powerboat registrations (in thousands). The least-squares regression equation is ŷ = −41.43 + 0.1249x.

[Figure: scatterplot of manatees killed versus powerboat registrations (1000s).]

(c) When 716,000 powerboats are registered, the predicted number of manatees killed will be −41.43 + 0.1249 × 716 ≈ 48.0, or about 48 manatees. (d) Yes, the measures seem to be succeeding: three of the four new points are below the regression line, indicating that fewer manatees than predicted were killed. Additional evidence of success is provided by the two points for 1992 and 1993; they fall well below the overall pattern.

[Figure: scatterplot of manatees killed versus powerboat registrations (1000s).]

(e) The mean number of manatee deaths for the years with 716,000 powerboat registrations is 42. The prediction of 48 was too high.

3.34 (a) The least-squares regression line is ŷ = 31.9 − 0.304x. The calculator output (and Minitab output) is shown below.

Minitab output
The regression equation is
newadults = 31.9 - 0.304 %returning

Predictor       Coef  SE Coef      T      P
Constant      31.934    4.838   6.60  0.000
%returning  -0.30402  0.08122  -3.74  0.003

S = 3.66689   R-Sq = 56.0%   R-Sq(adj) = 52.0%


(b) The means, standard deviations, and correlation are: x̄ = 58.23%, s_x = 13.03%, ȳ = 14.23 new birds, s_y = 5.29 new birds, and r = −0.748. (c) The slope is b = −0.748 × (5.29/13.03) ≈ −0.304 and the intercept is a = 14.23 − b × 58.23 ≈ 31.9. (d) The slope tells us that as the percent of returning birds increases by one, the number of new birds will decrease by 0.304 on average. The y intercept provides a prediction that we will see 31.9 new adults in a new colony when the percent of returning birds is zero. This value is clearly outside the range of values studied for the 13 colonies of sparrowhawks and has no practical meaning in this situation. (e) Using the unrounded coefficients from the Minitab output, the predicted value for the number of new adults is 31.934 − 0.30402×60 = 13.69, or about 14.

3.35 (a) Let y = Blood Alcohol Content (BAC) and x = Number of Beers. The least-squares regression line is ŷ = −0.0127 + 0.017964x. (b) The slope indicates that on average, the BAC will increase by 0.017964 for each additional beer consumed. The intercept suggests that the average BAC will be −0.01270 if no beers are consumed; this is clearly ridiculous. (c) The predicted BAC for a student who consumed 6 beers is −0.0127 + 0.017964×6 = 0.0951. (d) The prediction error is 0.10 − 0.0951 = 0.0049.
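The slope and intercept in Exercise 3.34 (c) follow from the summary statistics alone, using b = r × s_y / s_x and a = ȳ − b × x̄. The Python sketch below (function and variable names are hypothetical) reproduces that calculation from the values listed in Exercise 3.34 (b).

```python
def line_from_summary(r, s_x, s_y, mean_x, mean_y):
    """Return (intercept, slope) of the least-squares line from summary statistics."""
    slope = r * s_y / s_x
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Summary statistics from Exercise 3.34 (b).
a, b = line_from_summary(r=-0.748, s_x=13.03, s_y=5.29, mean_x=58.23, mean_y=14.23)

print(f"slope b = {b:.3f}")
print(f"intercept a = {a:.1f}")
print(f"predicted new adults at 60% returning = {a + b * 60:.2f}")
```

The same two formulas are used again in Exercises 3.37 and 3.46, so the function can be reused there with the statistics given in those solutions.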

3.36 (a) The relationship between the two variables in Figure 3.15 is positive, linear, and very strong. (b) The regression line predicts that the Sanchez family would average about 500 cubic feet of gas per day in a month that averages 20 degree-days per day. (c) The blue line in Figure 3.15 is called the “least-squares line” because it minimizes the sum of the squared deviations of the observed amounts of gas consumed from the predicted amounts of gas. In other words, the least squares line minimizes the squared vertical distances from the observed amounts of gas consumed to the values predicted by the line. (d) The least squares line provides a very good fit because the prediction errors, the vertical distances from the points to the line, are very small and the linear relationship is very strong.

3.37 The slope is b = 0.894 × (0.044139929/2.1975365) ≈ 0.018 and the intercept is a = 0.07375 − b × 4.8125 ≈ −0.0129, which agrees, up to rounding, with the equation in Exercise 3.35.

3.38 (a) Let y = gas used (in hundreds of cubic feet) and x = degree-days. The least-squares regression line is ŷ = 1.08921 + 0.188999x. (b) The slope tells us that on average the amount of gas used increases by 0.188999 for each one unit increase in degree-days. The y intercept provides a realistic estimate (108.921 cubic feet) for the average amount of gas used when the average number of heating degree-days per day is zero. (c) The predicted value is 1.08921 + 0.188999×20 = 4.8692, which is very close to the rough estimate of 5 from Exercise 3.36 (b). (d) The predicted value for this month is 1.08921 + 0.188999×30 = 6.7592, so the prediction error is 640 − 675.92 = −35.92 cubic feet.
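The prediction and prediction error in Exercise 3.38 can be recomputed directly. This Python sketch assumes, as the intercept interpretation in part (b) suggests, that gas use is recorded in hundreds of cubic feet; the function name is hypothetical, and the observed value of 6.40 (that is, 640 cubic feet) is inferred from the solution text rather than taken from the original data.

```python
def predict_gas(degree_days):
    """Predicted gas use (assumed units: hundreds of cubic feet per day)."""
    return 1.08921 + 0.188999 * degree_days


predicted_20 = predict_gas(20)
predicted_30 = predict_gas(30)

# Observed use for the month in part (d): 640 cubic feet,
# expressed in the assumed units of hundreds of cubic feet.
observed_30 = 6.40
residual = observed_30 - predicted_30  # prediction error = observed - predicted

print(f"prediction at 20 degree-days: {predicted_20:.4f}")
print(f"prediction at 30 degree-days: {predicted_30:.4f}")
print(f"prediction error: {residual * 100:.2f} cubic feet")
```

A negative prediction error means the line overestimated the gas actually used that month.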

3.39 (a) There is a positive, linear association between the two variables. There is more variation in the field measurements for larger laboratory measurements. The values are scattered above and below the line y = x for small and moderate depths, indicating strong agreement, but the field measurements tend to be smaller than the laboratory measurements for large depths. (b) The points for the larger depths fall systematically below the line y = x, showing that the field measurements are too small compared to the laboratory measurements. (c) In order to minimize the sum of the squared distances from the points to the regression line, the top right part of the blue line in Figure 3.20 would need to be pulled down to go through the “middle” of the group of points that are currently below the blue line. Thus, the slope would decrease and the intercept would increase. (d) The residual plot clearly shows that the prediction errors increase for larger laboratory measurements. In other words, the variability in the field measurements increases as the laboratory measurements increase. The least squares line does not provide a great fit, especially for larger depths.

3.40 (a) A scatterplot with the least squares regression line is shown below.

[Figure: scatterplot of fuel consumption (liters/100 km) versus speed (km/hour), with the least-squares regression line.]

(b) We would certainly not use the regression line to predict fuel consumption. The scatterplot shows a nonlinear relationship. (c) The sum of the residuals provided is −0.01, which illustrates a slight roundoff error. (d) The residual plot indicates that the regression line underestimates fuel consumption for slow and fast speeds and overestimates fuel consumption for moderate speeds. The quadratic pattern in the residual plot indicates that the regression model is not appropriate for these data.

[Figure: residual plot, residual versus speed (km/hour).]

3.41 (a) The scatterplot with y = rate and x = mass is shown below.


(b) The least-squares regression line is ŷ = 201.162 + 24.026x. (c) The slope tells us that a female will increase her metabolic rate by a mean of 24.026 calories for each additional kg of lean body mass. The intercept provides an estimate for the average metabolic rate (201 calories) for women, when their lean body mass is zero (clearly unrealistic). (d) The residual plot (shown below) shows no clear pattern, so the least squares line is an adequate model for the data.

(e) The residual plot with the predicted value on the horizontal axis looks exactly like the previous plot of the residuals versus lean body mass.

3.42 (a) The correlations are all approximately the same (to three decimal places, r_A = r_B = r_C = 0.816 and r_D = 0.817), and the regression lines are all approximately ŷ = 3.0 + 0.5x. For all four sets, we predict ŷ ≈ 8 when x = 10. (b) The scatterplots are provided below.

[Figures: scatterplots of y versus x for Data Sets A, B, C, and D.]

(c) The residual plots are shown below.

[Figures: residual plots (residual versus x) for Data Sets A, B, C, and D.]

(d) The regression line should only be used for Data Set A. The variables have a moderate linear association with a fair amount of variability from the regression line and no obvious pattern in the residual plot. For Data Set B, there is an obvious nonlinear relationship which can be seen in both plots; we should fit a parabola or some other curve. For Data Set C, the point (13, 12.74) deviates from the strong linear relationship of the other points, pulling the regression line up. If a data entry error (or some other error) was made for this point, a regression line for the other points would be very useful for prediction. For Data Set D, the data point with x = 19 is a very influential point—the other points alone give no indication of slope for the line. The regression line is not useful in this situation with only two values of the explanatory variable x. 3.43 (a) The scatterplot of the data with the least-squares regression line is below.

[Figure: scatterplot of Response versus Amount (ng), with the least-squares regression line]

(b) The regression equation is $\hat{y} = -14.4 + 46.6x$. (c) The residual plot is below. The residuals for the extreme x-values (x = 0.25 and x = 20.0) are almost all positive; all of the residuals for the middle two x-values are negative.

[Figure: residual plot, Residual versus Amount (ng)]

(d) $r^2 = 0.9997$; 99.97% of the variation in the response is explained by the least-squares regression with the amount of substance. This value suggests that the regression line does a great job predicting gas chromatograph readings.

3.44 $r = \sqrt{0.16} = 0.40$ (high attendance goes with high grades, so the correlation must be positive).

3.45 (a) The regression line is $\hat{y} = 482.28 - 9.76x$, where y = pulse and x = time. (b) The prediction is 147.52 bpm, 4.48 bpm lower than the actual value. (c) With y = time and x = pulse, the regression line is $\hat{y} = 43.0 - 0.0567x$. The predicted time is now 34.39 minutes, only 0.09 minutes (5.4 seconds) too high. (d) The results depend on which variable is viewed as the explanatory variable.
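The arithmetic in 3.45 can be checked with a short sketch. The observed values used here (time 34.30 minutes, pulse 152 bpm) are not quoted in the solution; they are inferred from the stated prediction errors, and the function names are illustrative only.

```python
# Sketch for Exercise 3.45: regressing y on x and x on y gives two different lines.
# Coefficients are the rounded values quoted in the solution.

def predict_pulse(time_min):
    """Predicted pulse (bpm) from time (min): y-hat = 482.28 - 9.76x."""
    return 482.28 - 9.76 * time_min

def predict_time(pulse_bpm):
    """Predicted time (min) from pulse (bpm): y-hat = 43.0 - 0.0567x."""
    return 43.0 - 0.0567 * pulse_bpm

# Assumed observation, inferred from the stated errors:
# 147.52 + 4.48 = 152 bpm and 34.39 - 0.09 = 34.30 minutes.
observed_time, observed_pulse = 34.30, 152.0

pulse_hat = predict_pulse(observed_time)
time_hat = predict_time(observed_pulse)

print(f"predicted pulse: {pulse_hat:.2f} bpm, residual {observed_pulse - pulse_hat:.2f}")
print(f"predicted time:  {time_hat:.2f} min, residual {observed_time - time_hat:.2f}")
```

Because the coefficients are rounded, the results may differ from the printed values in the second decimal place.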

3.46 (a) You should not use the least-squares regression line from Example 3.13 because the roles of the variables are reversed. Fat gain is now the explanatory variable and change in NEA is the response variable. (b) The least-squares line for predicting y = change in NEA from x = fat gain has slope $b = -0.7786\left(\frac{257.66}{1.1389}\right) = -176.1472$ and intercept $a = 324.8 - (-176.1472) \times 2.388 = 745.4395$. Thus, the regression line is $\hat{y} = 745.4395 - 176.1472x$. (c) There will be variability in NEA change from subject to subject, so we want to use the regression line to predict the change in NEA rather than data from one individual. The predicted change in NEA for this new subject is $\hat{y} = 745.4395 - 176.1472 \times 3.0 = 216.9979$, which is much different from the NEA change for the other subject. (d) For the NEA data, $r = -0.7786$ and $r^2 = 0.6062$. About 61% of the variation in NEA change is accounted for by the linear relationship with fat gained. The other 39% is individual variation among subjects that is not explained by the linear relationship. The two $r^2$ values are the same because switching the roles of x and y does not change the correlation coefficient or the square of the correlation coefficient.
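The slope and intercept in 3.46 come from the summary-statistics formulas $b = r(s_y/s_x)$ and $a = \bar{y} - b\bar{x}$. A minimal sketch of that calculation follows; the function name `line_from_summary` is hypothetical, and the inputs are the values that appear in the solution's own arithmetic.

```python
# Sketch: least-squares slope and intercept from summary statistics,
# using b = r * (s_y / s_x) and a = y_bar - b * x_bar.

def line_from_summary(r, s_x, s_y, x_bar, y_bar):
    """Return (intercept a, slope b) of the least-squares line of y on x."""
    b = r * s_y / s_x
    a = y_bar - b * x_bar
    return a, b

# Values read from the solution to 3.46: x = fat gain, y = change in NEA.
a, b = line_from_summary(r=-0.7786, s_x=1.1389, s_y=257.66,
                         x_bar=2.388, y_bar=324.8)

prediction = a + b * 3.0       # predicted NEA change at a fat gain of 3.0
r_squared = (-0.7786) ** 2     # fraction of variation explained

print(f"slope b = {b:.4f}, intercept a = {a:.4f}")
print(f"prediction at x = 3.0: {prediction:.4f}")
print(f"r^2 = {r_squared:.4f}")
```

Swapping the roles of x and y changes `s_x`, `s_y`, and the means that are passed in, so the line changes, while `r_squared` does not.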

3.47 (a) $r^2 = (0.596)^2 = 0.3552$. Thus, the straight-line relationship explains 35.52% of the variation in yearly changes. (b) The regression equation is $\hat{y} = 6.083 + 1.707x$. (c) The predicted change is $\hat{y} = 6.083 + 1.707 \times 1.75 = 9.0703\%$. We could have given the answer without doing calculations because the regression line must pass through $(\bar{x}, \bar{y}) = (1.75, 9.07)$.
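The shortcut in part (c) follows directly from the intercept formula. A brief derivation, using the standard notation $a$ for the intercept and $b$ for the slope:

```latex
\begin{aligned}
a &= \bar{y} - b\,\bar{x}, \\
\hat{y}(\bar{x}) &= a + b\,\bar{x} = (\bar{y} - b\,\bar{x}) + b\,\bar{x} = \bar{y}.
\end{aligned}
```

With $\bar{x} = 1.75$ and $\bar{y} = 9.07$, the prediction at $x = 1.75$ is therefore 9.07%, up to rounding.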

3.48 (a) A scatterplot, with the least-squares regression line, is shown below. The plot shows a strong, positive linear association between the number of beaver-caused stumps and the number of beetle larvae clusters.

[Figure: scatterplot of Number of clusters of beetle larvae versus Number of stumps, with the least-squares regression line]

(b) The least-squares regression line is $\hat{y} = -1.29 + 11.89x$. (c) The residual plot is shown below. The linear model appears to provide a very good fit.

[Figure: residual plot, Residual versus Number of stumps]

(d) About 84% of the variation in the number of beetle larvae clusters is accounted for by the linear relationship with the number of stumps.

3.49 (a) The slope (1.507) says that, on average, BOD rises (falls) by 1.507 mg/L for every 1 mg/L increase (decrease) in TOC. (b) When TOC = 0 mg/L, the predicted BOD level is −55.43 mg/L. The negative value of BOD was obtained because values of TOC near zero were probably not included in the study. This is another example where the intercept does not have a practical interpretation.

3.50 (a) The least-squares line for predicting y = GPA from x = IQ has slope $b = 0.6337\left(\frac{2.1}{13.17}\right) = 0.101$ and intercept $a = 7.447 - 0.101 \times 108.9 = -3.5519$. Thus, the regression line is $\hat{y} = -3.5519 + 0.101x$. (b) $r^2 = (0.6337)^2 = 0.4016$. Thus, 40.16% of the variation in GPA is accounted for by the linear relationship with IQ. (c) The predicted GPA for this student is $\hat{y} = -3.5519 + 0.101 \times 103 = 6.8511$ and the residual (observed minus predicted) is $0.53 - 6.8511 = -6.3211$.

3.51 (a) A scatterplot, with the regression line, is shown below.

[Figure: scatterplot of Weight (kg) versus Age (months), with the regression line]

(b) Clearly, this line does not fit the data very well; the data show a clearly curved pattern. (c) The residuals sum to 0.01 (the result of roundoff error). The residual plot below shows a clear quadratic pattern, with the first two and the last four residuals being negative and those between 3 and 8 months being positive.

[Figure: residual plot, Residual for Weight versus Age (months)]

3.52 (a) A scatterplot, with the regression line, is shown below.

[Figure: scatterplot of Absorbance versus Nitrates (mg/l), with the regression line]

The correlation is 0.99994 > 0.997, so recalibration is not necessary. (b) The regression line for predicting absorbance is $\hat{y} = 1.6571 + 0.1133x$. The average increase in absorbance for a 1 mg/l increase in nitrates is 0.1133. The predicted absorbance when no nitrates are present is 1.6571. Ideally, we should predict no absorbance when nitrates are not present. (c) The predicted absorbance in a specimen with 500 mg/l of nitrates is $\hat{y} = 1.6571 + 0.1133 \times 500 = 58.308$. (d) This prediction should be very accurate since the linear relationship is almost perfect; see the scatterplot above. Almost 100% ($r^2 = 0.9999$) of the variation in absorbance is accounted for by the linear relationship with nitrates.

3.53 (a) A scatterplot, with the regression line, is shown below.

[Figure: scatterplot of Height (centimeters) versus Age (months), with the regression line]

(b) The regression line for predicting y = height from x = age is $\hat{y} = 71.95 + 0.3833x$. (c) When x = 40 months: $\hat{y}$ = 87.28 cm. When x = 60 months: $\hat{y}$ = 94.95 cm. (d) A change of 6 cm in 12 months is 0.5 cm/month. Sarah is growing at about 0.38 cm/month, more slowly than normal.
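The predictions and the growth-rate comparison in 3.53 can be reproduced with a few lines. This is only a sketch built from the rounded coefficients quoted above; the constant and function names are illustrative.

```python
# Sketch for Exercise 3.53: y-hat = 71.95 + 0.3833x,
# with x = age in months and y = height in centimeters.

INTERCEPT_CM = 71.95
SLOPE_CM_PER_MONTH = 0.3833

def predicted_height_cm(age_months):
    """Predicted height (cm) at the given age (months)."""
    return INTERCEPT_CM + SLOPE_CM_PER_MONTH * age_months

height_40 = predicted_height_cm(40)
height_60 = predicted_height_cm(60)

# Normal growth quoted in the solution: 6 cm in 12 months.
normal_rate = 6 / 12

print(f"predicted height at 40 months: {height_40:.2f} cm")
print(f"predicted height at 60 months: {height_60:.2f} cm")
print(f"fitted rate {SLOPE_CM_PER_MONTH:.2f} cm/month vs normal {normal_rate:.2f} cm/month")
```

The slope is the fitted growth rate, so comparing it with 0.5 cm/month is all that part (d) requires.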

3.54 (a) Sarah’s predicted height at 480 months is $\hat{y} = 71.95 + 0.3833 \times 480 = 255.93$ cm. Converting to inches, Sarah’s predicted height is $255.93 \times 0.3937 = 100.7596$ inches, or about 8.4 feet! (b) The prediction is impossibly large because we incorrectly used the least-squares regression line to extrapolate.

3.55 (a) The slope of the regression line for predicting final-exam score from pre-exam totals is $b = 0.6\left(\frac{8}{30}\right) = 0.16$; for every extra point in the pre-exam total, the score on the final exam increases by a mean of 0.16. The intercept of the regression line is $a = 75 - 0.16 \times 280 = 30.2$; if the student had a pre-exam total of 0 points, the predicted score on the final would be 30.2. (b) Julie’s predicted final exam score is $\hat{y} = 30.2 + 0.16 \times 300 = 78.2$. (c) $r^2 = 0.36$, so only 36% of the variability in the final exam scores is accounted for by the linear relationship with pre-exam totals. About 64% of the individual variation is not accounted for by the least-squares regression line, so Julie has a good reason to think this is not a good estimate.
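Julie's prediction in 3.55 can also be read in mean-centered form, $\hat{y} = \bar{y} + b(x - \bar{x})$, which avoids the intercept altogether. The sketch below assumes the summary values that appear in the solution's arithmetic (r = 0.6, standard deviations 30 and 8, means 280 and 75); the variable names are illustrative.

```python
# Sketch for Exercise 3.55: x = pre-exam total, y = final-exam score.
r, s_x, s_y = 0.6, 30.0, 8.0
x_bar, y_bar = 280.0, 75.0

slope = r * s_y / s_x                  # b = r * (s_y / s_x)
intercept = y_bar - slope * x_bar      # a = y_bar - b * x_bar

julie_total = 300.0
# Mean-centered form: start at the mean score and adjust for the
# distance of Julie's pre-exam total from the mean total.
julie_prediction = y_bar + slope * (julie_total - x_bar)

unexplained = 1 - r ** 2               # share of variation the line leaves unexplained

print(f"slope = {slope:.2f}, intercept = {intercept:.1f}")
print(f"Julie's predicted final-exam score: {julie_prediction:.1f}")
print(f"unexplained share of variation: {unexplained:.2f}")
```

Julie is 20 points above the mean pre-exam total, so her predicted score is 0.16 × 20 = 3.2 points above the mean final-exam score.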

3.56 (a) A scatterplot, with the regression line, is shown below.

[Figure: scatterplot of Overseas (% return) versus U.S. (% return), with the regression line]

(b) $r = 0.4641$ and $r^2 = 0.2154$, or 21.54%. There is a positive linear association between U.S. and overseas returns, but it is not very strong. Only 21.54% of the variation in overseas returns is accounted for by the linear relationship with U.S. returns. (c) The regression line for predicting overseas returns is $\hat{y} = 5.694 + 0.6201x$. (d) The overseas return in 1997 is predicted to be $\hat{y} = 5.694 + 0.6201 \times 33.4 = 26.41\%$. Thus, the residual is −24.31%. About 78.5% of the variation in the overseas returns will not be accounted for by the linear relationship, and some of the prediction errors are very large, so predictions based on the regression line will not be very accurate. (e) In 1986, the overseas return was 69.4%, over 50 percentage points higher than would be expected. There are no other unusual points.
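The numbers in parts (b) through (d) of 3.56 fit together as follows. This is a sketch using the rounded coefficients above; the observed 1997 overseas return is not quoted in the solution, so the value computed here is only what the stated residual implies.

```python
# Sketch for Exercise 3.56: x = U.S. return (%), y = overseas return (%).
INTERCEPT, SLOPE = 5.694, 0.6201
R = 0.4641

us_return_1997 = 33.4
predicted_1997 = INTERCEPT + SLOPE * us_return_1997

# residual = observed - predicted, so observed = predicted + residual.
stated_residual = -24.31
implied_observed_1997 = predicted_1997 + stated_residual

unexplained = 1 - R ** 2   # share of variation the line does not explain

print(f"predicted 1997 overseas return: {predicted_1997:.2f}%")
print(f"observed return implied by the residual: {implied_observed_1997:.1f}%")
print(f"unexplained share of variation: {unexplained:.3f}")
```

A residual this large relative to the prediction illustrates why a line with $r^2$ near 0.22 gives unreliable individual predictions.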

3.57 (a) The five-number summaries for the different returns are:

           Minimum   First Quartile   Median   Third Quartile   Maximum
U.S.       −26.4%    5.1%             18.2%    30.4%            37.6%
Overseas   −23.2%    2.1%             11.2%    29.6%            69.4%

These statistics are used to construct the side-by-side boxplots shown below.

[Figure: side-by-side boxplots of Total Return (%) for U.S. and Overseas]

(b) The quartiles and the median of the U.S. five-number summary are higher, but the minimum and maximum overseas returns are higher. (c) Overseas stocks are more volatile. The box is slightly taller for the overseas returns and the range (maximum − minimum) is larger for the overseas returns, as shown by the whiskers in the plot above.

3.58 Since the least-squares regression line must pass through the point of averages, we know that $\bar{y} = 46.6 + 0.41\bar{x}$. Octavio’s predicted final exam score is $\hat{y} = 46.6 + 0.41(\bar{x} + 10) = (46.6 + 0.41\bar{x}) + 0.41 \times 10 = \bar{y} + 4.1$. Thus, he will score 4.1 points above the mean on the final exam.

3.59 (a) The scatterplot is shown below. The guessed values are much higher than expected for two foods (spaghetti and snack cake).

[Figure: scatterplot of Guessed calories versus Correct calories]

(b) The regression line for predicting y = guessed calories from x = actual calories using all points is $\hat{y} = 58.59 + 1.3036x$ ($r^2 = 0.68$). Excluding spaghetti and snack cake: $\hat{y} = 43.88 + 1.14721x$ ($r^2 = 0.968$). (c) A scatterplot with the two separate regression lines is shown below.

[Figure: scatterplot of Guessed calories versus Correct calories, with the two regression lines]

The two removed points could be called influential, in that when they are included, the regression line passes above every other point; after removing them, the new regression line passes through the “middle” of the remaining points.

3.60 (a) Without Child 19, the least-squares regression line for predicting y = score from x = age is $\hat{y} = 109.305 - 1.1933x$.

[Figure: scatterplot of Score versus Age]

The slope and intercept change slightly when Child 19 is removed, so this point does not appear to be extremely influential. (b) With all children, $r^2 = 0.410$; without Child 19, $r^2 = 0.572$. $r^2$ increases because more of the variability in the scores is explained by the stronger linear relationship with age. In other words, with Child 19’s high Gesell score removed, there is less variability around the regression line.

3.61 (a) A scatterplot with the two new points is shown below. Point A is a horizontal outlier; that is, it has a much smaller x-value than the others. Point B is a vertical outlier; it has a higher y-value than the others.

[Figure: scatterplot of Count of new birds versus Percent of returning birds, with Points A and B marked]

(b) The three regression formulas are: $\hat{y} = 31.9 - 0.304x$ (the original data); $\hat{y} = 22.8 - 0.156x$ (with Point A); $\hat{y} = 32.3 - 0.293x$ (with Point B). Adding Point B has little impact. Point A is influential; it pulls the line down, and changes how the line looks relative to the original 13 data points.

3.62 (a) Who? The individuals are 16 couples in their mid-twenties who were married or had been dating for two years. What? The variables are empathy score (a quantitative measure of empathy from a psychological test) and brain activity (a quantitative variable reported as a fraction between −1 and 1). Why? The researchers wanted to see how the brain expresses empathy. In particular, they were interested in checking if women with higher empathy scores have a stronger response when their partner has a painful experience. When, where, how, and by whom? The researchers zapped the hands of the men and women to measure brain activity, presumably in a lab, doctor’s office, or hospital. The results appeared in Science in 2004, so the data were probably collected shortly before publication of the article. (b) Subject 16 is influential on the correlation. With all subjects, $r = 0.515$; without Subject 16, $r = 0.331$. (c) Subject 16 is not influential on the least-squares regression line (see the scatterplot below).

[Figure: scatterplot of Brain activity versus Empathy score]

The regression lines are: $\hat{y} = -0.0578 + 0.0076x$ (with all subjects) and $\hat{y} = -0.0152 + 0.0067x$ (without Subject 16).

3.63 Higher income can cause better health: higher income means more money to pay for medical care, drugs, and better nutrition, which in turn results in better health. Better health can cause higher income: if workers enjoy better health, they are better able to get and hold a job, which can increase their income.

3.64 No, you cannot shorten your stay by choosing a smaller hospital. The positive correlation does not imply a cause-and-effect relationship. Larger hospitals tend to see more patients in poor condition, which means that the patients will tend to require a longer stay.

3.65 (a) A scatterplot, with the regression line, is shown below.

[Figure: scatterplot of Farm population (millions of persons) versus Year, with the regression line]

The least-squares regression line for predicting y = farm population from the explanatory variable x = year is $\hat{y} = 1166.93 - 0.5868x$. (b) The farm population decreased on average by about 0.59 million (590,000) people per year. About 97.7% of the variation in the farm population is accounted for by the linear relationship with year. (c) The predicted farm population for the year 2010 is −12,538,000; clearly impossible, as population must be greater than or equal to zero.
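The impossible prediction in part (c) of 3.65 is a direct consequence of extending a downward-sloping line far enough. A minimal sketch, using the rounded coefficients above (the function name is illustrative):

```python
# Sketch for Exercise 3.65: y-hat = 1166.93 - 0.5868x,
# with x = year and y = farm population in millions of persons.

INTERCEPT = 1166.93
SLOPE = -0.5868

def predicted_farm_population(year):
    """Predicted farm population (millions) for the given year."""
    return INTERCEPT + SLOPE * year

prediction_2010 = predicted_farm_population(2010)

# Year at which the fitted line reaches zero; beyond it every
# prediction is negative and therefore meaningless.
zero_crossing_year = -INTERCEPT / SLOPE

print(f"predicted 2010 farm population: {prediction_2010:.3f} million")
print(f"fitted line crosses zero during: {int(zero_crossing_year)}")
```

Any year past the zero crossing lies well outside the years shown on the scatterplot, which is the usual warning sign of extrapolation.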

3.66 (a) Who? The individuals are students at a large state university. What? The variables are the number of first-year students and the number of students who enroll in elementary mathematics courses. Both variables are quantitative and take on integer values from several hundred to several thousand, depending on the size of the university. Why? The data were collected to try to predict the number of students who will enroll in elementary mathematics courses. When, where, how, and by whom? Faculty members in the mathematics department at a large state university obtained the enrollment data and class sizes from 1993 to 2000. These data were probably extracted from a historical database in the Registrar’s office. A scatterplot, with the regression line, is shown below.

[Figure: scatterplot of Number of elementary math students versus Number of first-year students, with the regression line]

The regression line appears to provide a reasonable fit. About 69.4% of the variation in enrollments for elementary math classes is accounted for by the linear relationship with the number of first-year students. (b) The residual plots are shown below.

[Figure: residual plots, Residual versus Number of first-year students and Residual versus Year]

The plot of the residuals against x shows that a somewhat different line would fit the five lower points well. The three points above the regression line represent a different relation between the number of first-year students and mathematics enrollments. The plot of the residuals against year clearly illustrates that the five negative residuals are from the years 1993 to 1997, and the three positive residuals are from 1998, 1999, and 2000. (c) The change in requirements was not visible on the scatterplot in part (a) or the plot of the residuals against x. However, the change is clearly illustrated (negative residuals before 1998 and positive residuals after 1998) on the plot of the residuals against year. 3.67 The correlation for individual stocks would be lower. Individual stock performances will be more variable weakening the relationship. 3.68 A scatterplot, with both regression lines, is shown below. A scatterplot with a circle around the point from 1986 with the largest residual is shown in the solution to Exercise 3.56.

[Figure: scatterplot of Overseas (% return) versus U.S. (% return), with both regression lines]

As the scatterplot shows, the point from 1986 is not very influential on the regression line. The two regression lines are: $\hat{y} = 5.694 + 0.6201x$ (with all points) and $\hat{y} = 4.141 + 0.5885x$ (without the point in 1986). (b) The residual plot below, for all of the points, does not show any unusual patterns, although the large residual is clearly visible.

[Figure: residual plot, Residual versus Year]

3.69 (a) Yes, but the relationship is not very strong. (b) The mortality rate is extremely variable for those hospitals that treat few heart attacks. As the number of patients treated increases the variability decreases and the mortality rate appears to decrease giving the appearance of an exponentially decreasing pattern of points in the plot. The nonlinearity strengthens the conclusion that heart attack patients should avoid hospitals that treat few heart attacks. 3.70 (a) A scatterplot, with both regression lines, is below.

[Figure: scatterplot of Round 2 versus Round 1, with both regression lines]

The influential observation (circled) is observation 7, (105, 89). (b) The line with the larger slope is the line that omits the influential observation (105, 89). The influential point pulls the regression line with all of the points downward in order to minimize the overall prediction error.


3.71 Age is a lurking variable. We would expect both variables, shoe size and reading comprehension score, to increase as the child ages. 3.72 (a) A scatterplot, with the two unusual observations marked and the three separate regression lines added, is shown below.

[Figure: scatterplot of FPG (mg/ml) versus HbA (%), with Subjects #15 and #18 marked and the three regression lines]

(b) The correlations are: $r_1 = 0.4819$ (all observations); $r_2 = 0.5684$ (without Subject 15); $r_3 = 0.3837$ (without Subject 18). Both outliers change the correlation. Removing Subject 15 increases r, because its presence makes the scatterplot less linear, while removing Subject 18 decreases r, because its presence decreases the relative scatter about the linear pattern. (c) The three regression lines shown in the scatterplot above are: $\hat{y} = 66.4 + 10.4x$ (all observations); $\hat{y} = 69.5 + 8.92x$ (without #15); $\hat{y} = 52.3 + 12.1x$ (without #18). While the equation changes in response to removing either subject, one could argue that neither one is particularly influential, as the line moves very little over the range of x (HbA) values. Subject #15 is an outlier in terms of its y-value; such points are typically not influential. Subject #18 is an outlier in terms of its x-value, but is not particularly influential because it is consistent with the linear pattern suggested by the other points.

3.73 (a) Who? The individuals are land masses. What? The two quantitative variables are the amount of snow cover (in millions of square kilometers) and summer wind stress (in newtons per square meter). Why? The data were collected to explore a possible effect of global warming. When, where, how, and by whom? The data from Europe and Asia appear to have been collected over a 7-year period during the months of May, June, and July. The amount of snow cover may have been estimated from aerial photographs or satellite images, and the summer wind stress measurements may have been collected by meteorologists. The scatterplot below suggests a negative linear association, with correlation $r = -0.9179$.

[Figure: scatterplot of Wind stress (newtons per square meter) versus Snow cover (millions of square kilometers)]

The regression line for predicting y = wind stress from x = snow cover is $\hat{y} = 0.212 - 0.0056x$; $r^2 = 0.843$. The linear relationship explains 84.3% of the variation in wind stress. We have good evidence that decreasing snow cover is strongly associated with increasing wind stress. (b) The graph shows 3 clusters of 7 points.

3.74 The sketch below shows two clusters of points, each with a positive correlation. The top cluster represents economists employed by business firms and the bottom cluster represents economists employed by colleges and universities. When the two clusters are combined into one large group of economists, the overall correlation is negative.

3.75 (a) In the scatterplot below right-hand points are filled circles; left-hand points are open circles.

[Figure: scatterplot of Time (milliseconds) versus Distance]

(b) The right-hand points lie below the left-hand points. (This means the right-hand times are shorter, so the subject is right-handed.) There is no striking pattern for the left-hand points; the pattern for right-hand points is obscured because they are squeezed at the bottom of the plot. (c) The regression line for the right hand is $\hat{y} = 99.4 + 0.0283x$ ($r = 0.305$, $r^2 = 9.3\%$). The regression line for the left hand is $\hat{y} = 172 + 0.262x$ ($r = 0.318$, $r^2 = 10.1\%$). The left-hand regression is slightly better, but neither is very good: distance accounts for only 9.3% (right) and 10.1% (left) of the variation in time.

3.76 The two residual plots are shown below; neither shows a systematic pattern.

[Figure: residual plots versus Time order, Residual for left hand and Residual for right hand]

CASE CLOSED (1) A scatterplot is shown below. The average number of home runs hit per game decreases from 1960 to 1970, then levels off before increasing from about 1980 to 2000. The correlation is 0.466, which indicates a moderate positive association.

[Figure: scatterplot of Average number of HRs per game versus Year, 1960 to 2000]

(2) A scatterplot below, with the regression line, shows a moderately strong linear association between average home runs per game and year after Rawlings became the supplier. The correlation is 0.732.

[Figure: scatterplot of Average number of HRs per game versus Year, 1980 to 2000, with the regression line]

(3) The least-squares regression line is $\hat{y} = -61.09 + 0.0316x$. The slope (0.0316) indicates the average increase in the average number of home runs per game as year increases by one. The intercept has no practical meaning in this setting. (4) The residual plot suggests that the regression line provides a surprisingly reasonable fit. However, all of the residuals after 1995 are positive, so a model with some curvature would fit better.

[Figure: residual plot, Residual versus Year]

(5) $r^2 = 0.536$, which indicates that about 54% of the variation in the average number of home runs per game is accounted for by the linear relationship with year. In other words, about 46% of the variation is not explained by the least-squares regression line. (6) The predicted value for 2001 is about 2.16. This estimate is probably not very accurate. In particular, since the residuals are positive for all years after 1995, this estimate is likely to be too low. (7) The prediction error is 2.092 − 2.16 = −0.068. The estimate is not bad, and it even overestimated the average number of home runs per game. (8) No, these data should not be used to predict the mean number of home runs per game in 2020. This case study has illustrated that patterns can change over time, so we have no data to help us predict what might happen 20 years in the future. We should not use the regression line to extrapolate.

3.77 Seriousness of the fire is a lurking variable: more serious fires need more attention. It would be more accurate to say that a large fire “causes” more firefighters to be sent, rather than vice versa. 3.78 (a) Two mothers are 57 inches tall; their husbands are 66 and 67 inches tall. (b) The tallest fathers are 74 inches tall; there are three of them, and their wives are 62, 64, and 67 inches tall. (c) There is no clear explanatory variable; either could go on the horizontal axis. (d) Positive association means that when one parent is short (tall) the other parent also tends to be short (tall). In other words, there is a direct association between the heights of parents. We say the association is weak because there is a considerable amount of variation (or scatter) in the points. 3.79 (a) A scatterplot, with the regression line, is shown below. There is a negative association between alcohol consumption and heart disease.

[Figure: scatterplot of Heart disease death rate (per 100,000) versus Alcohol from wine (liters per year), with the regression line]

(b) The regression equation for predicting y = heart disease death rate from x = alcohol consumption is $\hat{y} = 260.56 - 22.969x$. The slope provides an estimate for the average decrease (slope is negative) in the heart disease death rate for a one-liter increase in wine consumption. Thus, for every extra liter of alcohol consumed, the heart disease death rate decreases on average by about 23 per 100,000. The intercept provides an estimate for the average death rate (261 per 100,000) when no wine is consumed. (c) The correlation is $r = -0.843$, which indicates a strong negative association between wine consumption and heart disease death rate. $r^2 = 0.71$, so 71% of the variation in death rate is accounted for by the linear relationship with wine consumption. (d) The predicted heart disease death rate is $\hat{y} = 260.56 - 22.969 \times 4 = 168.68$. (e) No. Positive r indicates that the least-squares line must have positive slope; negative r indicates that it must have negative slope. The direction of the association and the slope of the least-squares line must always have the same sign. Recall $b = r\left(\frac{s_y}{s_x}\right)$, and the standard deviations are always nonnegative.

3.80 (a) The point at the far left of the plot (Alaska) and the point at the extreme right (Florida) are unusual. Alaska may be an outlier because its cold temperatures discourage older residents from remaining in the state. Florida is unusual because many individuals choose to retire there. (b) The linear association is positive, but very weak. (c) The outliers tend to suggest a stronger linear trend than the other points and will be influential on the correlation. Thus, the correlation with the outliers is $r = 0.267$, and the correlation without the outliers is $r = 0.067$.

3.81 (a) A scatterplot, with the regression line, is shown below.

[Figure: scatterplot of Steps per second versus Speed (feet per second), with the regression line]

(b) There is a very strong positive linear relationship, $r = 0.999$. (c) The regression line for predicting y = steps per second from x = running speed is $\hat{y} = 1.7661 + 0.0803x$. (d) Yes, $r^2 = 0.998$, so 99.8% of the variation in steps per second is explained by the linear relationship with speed. (e) No, the regression line would change because the roles of x and y are reversed. However, the correlation would stay the same, so $r^2$ would also stay the same.

3.82 The correlation for the individual runners would be lower because there is much more variation among the individuals. The variation in the average number of steps for the group is smaller, so the regression line does a great job for the published data.

3.83 (a) One possible measure of the difference is the mean response: 106.2 spikes/second for pure tones and 176.6 spikes/second for monkey calls, an average of an additional 70.4 spikes/second. (b) A scatterplot, with the regression line $\hat{y} = 93.9 + 0.778x$, is shown below.

[Figure: scatterplot of Monkey call response (spikes per second) versus Pure tone response (spikes/second), with the regression line]

The third point (pure tone 241, call 485 spikes/second) has the largest residual; it is circled. The first point (474 and 500 spikes/second) is an outlier in the x direction; it is marked with a square. (c) The correlation drops only slightly (from 0.6386 to 0.6101) when the third point is removed; it drops more drastically (to 0.4793) without the first point. (d) Without the first point, the regression line is ; without the third point, it isˆ 101 0.693y + x ˆ 98.4 0.679y x+ . 3.84 (a) In the mid-1990s, European and American stocks were only weakly linked, but now it is more common for them to rise and fall together. Thus investing in both types of stocks is not that much different from investing in either type alone. (b) The article is incorrect; a correlation of 0.8 means that a straight-line relationship explains about 64% of the variation in European stock prices.

3.85 The slope is b = 0.5(2.7/2.5) = 0.54. The regression line, shown below, for predicting y = husband’s height from x = wife’s height is ŷ = 33.67 + 0.54x.
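As a quick check of the arithmetic in 3.85, the sketch below recomputes the slope and evaluates the line at a wife's height of 67 inches. It assumes that 0.5 is the correlation r, that 2.7 and 2.5 are the standard deviations of the husbands' and wives' heights, and that the slope formula b = r(s_y/s_x) applies; the variable and function names are illustrative only.

```python
# Assumed reading of the numbers in the solution (the labels are ours).
r = 0.5            # correlation between wife's and husband's heights
s_y = 2.7          # standard deviation of husbands' heights (inches)
s_x = 2.5          # standard deviation of wives' heights (inches)
intercept = 33.67  # intercept of the line as stated in the solution

# Slope of the least-squares line: b = r * s_y / s_x
slope = r * s_y / s_x

def predict_husband_height(wife_height):
    """Predicted husband's height (inches) from the least-squares line."""
    return intercept + slope * wife_height

print(round(slope, 2))                       # 0.54
print(round(predict_husband_height(67), 2))  # 69.85
```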


[Figure: scatterplot of husband's height (inches) versus woman's height (inches), with the regression line.]

The predicted height is ŷ = 33.67 + 0.54 × 67 = 69.85 inches.

3.86 Who? The individuals are the essays provided by students on the new SAT writing test. What? The variables are the word count (length of essay) and score. Both variables are quantitative and take on integer values. Why? The data were collected to investigate the relationship between length of the essay and score. When, where, how, and by whom? The data were collected after the first administration of the new SAT writing test in March 2005. Dr. Perelman may have obtained the data from the Educational Testing Service or from colleagues who scored the essays. Graphs: The scatterplot below, with the regression line included, shows a relationship between length of the essay and score, but the relationship appears to be nonlinear. The residual plot also shows a clear pattern, so using the least-squares regression line to predict score from length of essay is not a good idea.

[Figure: scatterplot of score of essay (0 to 9) versus number of words in essay (0 to 700), with the regression line.]

[Figure: residual plot of residual (-2.5 to 1.0) versus length of essay (number of words, 0 to 700).]

Numerical summaries: The correlation between word count and score is 0.881. The least-squares regression line for predicting y = score from x = word count is ŷ = 1.1728 + 0.0104x. This line accounts for about 77.5% of the variation in score. Interpretation: Even though the scatterplot shows a moderately strong positive association between length of the essay and score, we do not want to jump to conclusions about the nature of this relationship. Better students tend to give more thorough explanations, so there could be another reason why the longer essays tend to get high scores. In fact, a careful look at the scatterplot reveals considerably more variation in the length of the essays for students who received a score of 4, 5, or 6. If Dr. Perelman made his second conclusion about being right over 90% of the time by rounding the correlation coefficient from 0.88 to 0.9, then he made a serious mistake with his interpretation of the correlation coefficient. If scores were assigned by simply sorting the word counts from smallest to largest, the error rate would be much larger than 10%.
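The argument in 3.86 that a patterned residual plot rules out a straight-line model can be illustrated numerically. The sketch below fits a least-squares line to a small, made-up data set that levels off as x grows, then lists the residuals. The data, the function name `least_squares`, and the variable names are hypothetical and are not Dr. Perelman's essay data.

```python
def least_squares(xs, ys):
    """Return (intercept, slope) of the least-squares line for ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data: the response rises quickly, then levels off.
words = [100, 200, 300, 400, 500, 600]
score = [1.0, 3.0, 4.5, 5.5, 6.0, 6.2]

a, b = least_squares(words, score)

# Residual = observed value minus value predicted by the line.
residuals = [y - (a + b * x) for x, y in zip(words, score)]

for x, res in zip(words, residuals):
    print(x, round(res, 2))
```

For these made-up values the residuals are negative at both ends and positive in the middle. That curved pattern is the kind of systematic departure that signals a nonlinear relationship, even when the correlation is high.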
