Copyright © 2013, 2010 and 2007 Pearson Education, Inc. Chapter Inference on the Least-Squares...

43
Copyright © 2013, 2010 and 2007 Pearson Education, Inc. Chapter Inference on the Least- Squares Regression Model and Multiple Regression 14

Transcript of Copyright © 2013, 2010 and 2007 Pearson Education, Inc. Chapter Inference on the Least-Squares...

Page 1: Copyright © 2013, 2010 and 2007 Pearson Education, Inc. Chapter Inference on the Least-Squares Regression Model and Multiple Regression 14.

Copyright © 2013, 2010 and 2007 Pearson Education, Inc.

Chapter

Inference on the Least-Squares Regression Model and Multiple Regression

14

Page 2: Copyright © 2013, 2010 and 2007 Pearson Education, Inc. Chapter Inference on the Least-Squares Regression Model and Multiple Regression 14.

Copyright © 2013, 2010 and 2007 Pearson Education, Inc.

Requirement 1 for Inference on the Least-Squares Regression Model

For any particular value of the explanatory variable x, the mean of the corresponding responses in the population depends linearly on x. That is,

for some numbers β0 and β1, where μy|x represents the population mean response when the value of the explanatory variable is x.

14-2

y|x

1x

0

Page 3: Copyright © 2013, 2010 and 2007 Pearson Education, Inc. Chapter Inference on the Least-Squares Regression Model and Multiple Regression 14.

Copyright © 2013, 2010 and 2007 Pearson Education, Inc.

Requirement 2 for Inference on the Least-Squares Regression Model

The response variables are normally distributed with mean and standard deviation σ.

14-3

y|x

1x

0

Page 4: Copyright © 2013, 2010 and 2007 Pearson Education, Inc. Chapter Inference on the Least-Squares Regression Model and Multiple Regression 14.

Copyright © 2013, 2010 and 2007 Pearson Education, Inc.

“In Other Words”

When doing inference on the least-squares regression model, we require (1) for any explanatory variable, x, the mean of the response variable, y, depends on the value of x through a linear equation, and (2) the response variable, y, is normally distributed with a constant standard deviation, σ. The mean increases/decreases at a constant rate depending on the slope, while the standard deviation remains constant.

14-4

Page 5: Copyright © 2013, 2010 and 2007 Pearson Education, Inc. Chapter Inference on the Least-Squares Regression Model and Multiple Regression 14.

Copyright © 2013, 2010 and 2007 Pearson Education, Inc.14-5

A large value of σ, the population standard deviation, indicates that the data are widely dispersed about the regression line, and a small value of σ indicates that the data lie fairly close to the regression line.

Page 6: Copyright © 2013, 2010 and 2007 Pearson Education, Inc. Chapter Inference on the Least-Squares Regression Model and Multiple Regression 14.

Copyright © 2013, 2010 and 2007 Pearson Education, Inc.

The least-squares regression model is given by

whereyi is the value of the response variable for the ith

individualβ0 and β1 are the parameters to be estimated based on

sample dataxi is the value of the explanatory variable for the ith

individualεi is a random error term with mean 0 an variance

, the error terms are independent.i =1,…,n, where n is the sample size (number of

ordered pairs in the data set)

1 0i i iy x

2 2

i

14-6

Page 7: Copyright © 2013, 2010 and 2007 Pearson Education, Inc. Chapter Inference on the Least-Squares Regression Model and Multiple Regression 14.

Copyright © 2013, 2010 and 2007 Pearson Education, Inc.

The standard error of the estimate, se, is found using the formula

2 2ˆ residuals

2 2i i

e

y ys

n n

14-7

Page 8: Copyright © 2013, 2010 and 2007 Pearson Education, Inc. Chapter Inference on the Least-Squares Regression Model and Multiple Regression 14.

Copyright © 2013, 2010 and 2007 Pearson Education, Inc.14-8

Page 9: Copyright © 2013, 2010 and 2007 Pearson Education, Inc. Chapter Inference on the Least-Squares Regression Model and Multiple Regression 14.

Copyright © 2013, 2010 and 2007 Pearson Education, Inc.

CAUTION!

Be sure to divide by n – 2 when computing the standard error of the estimate.

14-9

Page 10: Copyright © 2013, 2010 and 2007 Pearson Education, Inc. Chapter Inference on the Least-Squares Regression Model and Multiple Regression 14.

Copyright © 2013, 2010 and 2007 Pearson Education, Inc.14-10

Page 11: Copyright © 2013, 2010 and 2007 Pearson Education, Inc. Chapter Inference on the Least-Squares Regression Model and Multiple Regression 14.

Copyright © 2013, 2010 and 2007 Pearson Education, Inc.

Hypothesis Test Regarding the Slope Coefficient, β1

To test whether two quantitative variables are linearly related, we use the following steps provided that

1. the sample is obtained using random sampling.2. the residuals are normally distributed with

constant error variance.14-11

Page 12: Copyright © 2013, 2010 and 2007 Pearson Education, Inc. Chapter Inference on the Least-Squares Regression Model and Multiple Regression 14.

Copyright © 2013, 2010 and 2007 Pearson Education, Inc.

Step 1: Determine the null and alternative hypotheses. The hypotheses can be structured in one of three ways:

Step 2: Select a level of significance, α, depending on the seriousness of making a Type I error.

Two-tailed Left-Tailed Right-Tailed

H0: β1 = 0 H0: β1 = 0 H0: β1 = 0

H1: β1 ≠ 0 H1: β1 < 0 H1: β1 > 0

14-12

Page 13: Copyright © 2013, 2010 and 2007 Pearson Education, Inc. Chapter Inference on the Least-Squares Regression Model and Multiple Regression 14.

Copyright © 2013, 2010 and 2007 Pearson Education, Inc.

Step 3: Compute the test statistic

which follows Student’s t-distribution with n – 2 degrees of freedom. Remember, when computing the test statistic, we assume the null hypothesis to be true. So, we assume that β1 = 0. Use Table VI to determine the critical value using n – 2 degrees of freedom.

14-13

t0

b1

1

sb1

b

1

sb1

Classical Approach

Page 14: Copyright © 2013, 2010 and 2007 Pearson Education, Inc. Chapter Inference on the Least-Squares Regression Model and Multiple Regression 14.

Copyright © 2013, 2010 and 2007 Pearson Education, Inc.

Step 4: Compare the critical value with the test statistic.

Classical Approach

14-14

Page 15: Copyright © 2013, 2010 and 2007 Pearson Education, Inc. Chapter Inference on the Least-Squares Regression Model and Multiple Regression 14.

Copyright © 2013, 2010 and 2007 Pearson Education, Inc.

By Hand Step 3: Compute the test statistic

which follows Student’s t-distribution with n – 2 degrees of freedom. Use Table VI to approximate the P-value.

14-15

t0

b1

1

sb1

b

1

sb1

P-Value Approach

Page 16: Copyright © 2013, 2010 and 2007 Pearson Education, Inc. Chapter Inference on the Least-Squares Regression Model and Multiple Regression 14.

Copyright © 2013, 2010 and 2007 Pearson Education, Inc.

Step 4: If the P-value < α, reject the null hypothesis.

P-Value Approach

14-16

Page 17: Copyright © 2013, 2010 and 2007 Pearson Education, Inc. Chapter Inference on the Least-Squares Regression Model and Multiple Regression 14.

Copyright © 2013, 2010 and 2007 Pearson Education, Inc.

CAUTION!

Before testing H0: β1 = 0, be sure to draw a residual plot to verify that a linear model is appropriate.

14-17

Page 18: Copyright © 2013, 2010 and 2007 Pearson Education, Inc. Chapter Inference on the Least-Squares Regression Model and Multiple Regression 14.

Copyright © 2013, 2010 and 2007 Pearson Education, Inc.

Parallel Example 5: Testing for a Linear Relation

Test the claim that there is a linear relation between drill depth and drill time at the α = 0.05 level of significance using the drilling data.

14-18

Page 19: Copyright © 2013, 2010 and 2007 Pearson Education, Inc. Chapter Inference on the Least-Squares Regression Model and Multiple Regression 14.

Copyright © 2013, 2010 and 2007 Pearson Education, Inc.

Solution

Verify the requirements:• We assume that the experiment was randomized so

that the data can be assumed to represent a random sample.

• In Parallel Example 4 we confirmed that the residuals were normally distributed by constructing a normal probability plot.

• To verify the requirement of constant error variance, we plot the residuals against the explanatory variable, drill depth.

14-19

Page 20: Copyright © 2013, 2010 and 2007 Pearson Education, Inc. Chapter Inference on the Least-Squares Regression Model and Multiple Regression 14.

Copyright © 2013, 2010 and 2007 Pearson Education, Inc.

There is no discernable pattern.

14-20

Page 21: Copyright © 2013, 2010 and 2007 Pearson Education, Inc. Chapter Inference on the Least-Squares Regression Model and Multiple Regression 14.

Copyright © 2013, 2010 and 2007 Pearson Education, Inc.

Solution

Step 1: We want to determine whether a linear relation exists between drill depth and drill time without regard to the sign of the slope. This is a two-tailed test with

H0: β1 = 0 versus H1: β1 ≠ 0

Step 2: The level of significance is α = 0.05.

Step 3: Using technology, we obtained an estimate of β1 in Parallel Example 2, b1= 0.0116. To determine the standard deviation of b1, we compute .

The calculations are on the next slide.

x i x 2

14-21

Page 22: Copyright © 2013, 2010 and 2007 Pearson Education, Inc. Chapter Inference on the Least-Squares Regression Model and Multiple Regression 14.

Copyright © 2013, 2010 and 2007 Pearson Education, Inc.14-22

Page 23: Copyright © 2013, 2010 and 2007 Pearson Education, Inc. Chapter Inference on the Least-Squares Regression Model and Multiple Regression 14.

Copyright © 2013, 2010 and 2007 Pearson Education, Inc.

Solution

Step 3, cont’d: We have

The test statistic is

sb1

se

x i x 2

0.5197

30006.250.0030

t0 b1

sb1

0.0116

0.0033.867

14-23

Page 24: Copyright © 2013, 2010 and 2007 Pearson Education, Inc. Chapter Inference on the Least-Squares Regression Model and Multiple Regression 14.

Copyright © 2013, 2010 and 2007 Pearson Education, Inc.

Solution: Classical Approach

Step 3: cont’d Since this is a two-tailed test, we determine the critical t-values at the α = 0.05 level of significance with n – 2 = 12 – 2 = 10 degrees of freedom to be –t0.025 = –2.228 andt0.025 = 2.228.

Step 4: Since the value of the test statistic, 3.867, is greater than 2.228, we reject the null hypothesis.

14-24

Page 25: Copyright © 2013, 2010 and 2007 Pearson Education, Inc. Chapter Inference on the Least-Squares Regression Model and Multiple Regression 14.

Copyright © 2013, 2010 and 2007 Pearson Education, Inc.

Solution: P-Value Approach

Step 3: Since this is a two-tailed test, the P-value is the sum of the area under the t-distribution with 12 – 2 = 10 degrees of freedom to the left of –t0 = –3.867 and to the right of t0 = 3.867. Using Table VI we find that with 10 degrees of freedom, the value 3.867 is between 3.581 and 4.144 corresponding to right-tail areas of 0.0025 and 0.001, respectively. Thus, the P-value is between 0.002 and 0.005.

Step 4: Since the P-value is less than the level of significance, 0.05, we reject the null hypothesis.

14-25

Page 26: Copyright © 2013, 2010 and 2007 Pearson Education, Inc. Chapter Inference on the Least-Squares Regression Model and Multiple Regression 14.

Copyright © 2013, 2010 and 2007 Pearson Education, Inc.

Solution

Step 5: There is sufficient evidence at the α = 0.05 level of significance to conclude that a linear relation exists between drill depth and drill time.

14-26

Page 27: Copyright © 2013, 2010 and 2007 Pearson Education, Inc. Chapter Inference on the Least-Squares Regression Model and Multiple Regression 14.

Copyright © 2013, 2010 and 2007 Pearson Education, Inc.

Confidence Intervals for the Slope of the Regression Line

A (1 – α)•100% confidence interval for the slope of the true regression line, β1, is given by the following formulas:

Lower bound:

Upper bound:

Here, ta/2 is computed using n – 2 degrees of freedom. 14-27

b1 t 2

s

e

xi x 2

b1 t 2

s

e

xi x 2

Page 28: Copyright © 2013, 2010 and 2007 Pearson Education, Inc. Chapter Inference on the Least-Squares Regression Model and Multiple Regression 14.

Copyright © 2013, 2010 and 2007 Pearson Education, Inc.

Note: The confidence interval formula for β1 can be computed only if the data are randomly obtained, the residuals are normally distributed, and there is constant error variance.

14-28

Page 29: Copyright © 2013, 2010 and 2007 Pearson Education, Inc. Chapter Inference on the Least-Squares Regression Model and Multiple Regression 14.

Copyright © 2013, 2010 and 2007 Pearson Education, Inc.

Parallel Example 7: Constructing a Confidence Interval for the Slope of the True Regression Line

Construct a 95% confidence interval for the slope of the least-squares regression line for the drilling example.

14-29

Page 30: Copyright © 2013, 2010 and 2007 Pearson Education, Inc. Chapter Inference on the Least-Squares Regression Model and Multiple Regression 14.

Copyright © 2013, 2010 and 2007 Pearson Education, Inc.

Solution

The requirements for the usage of the confidence interval formula were verified in previous examples.We also determined• b1 = 0.0116• in previous examples.

sb10.0030

14-30

Page 31: Copyright © 2013, 2010 and 2007 Pearson Education, Inc. Chapter Inference on the Least-Squares Regression Model and Multiple Regression 14.

Copyright © 2013, 2010 and 2007 Pearson Education, Inc.

Solution

Since t0.025 = 2.228 for 10 degrees of freedom, we haveLower bound = 0.0116 – 2.228 • 0.003 = 0.0049Upper bound = 0.0116 + 2.228 • 0.003 = 0.0183.

We are 95% confident that the mean increase in the time it takes to drill 5 feet for each additional foot of depth at which the drilling begins is between 0.005 and 0.018 minutes.

14-31

Page 32: Copyright © 2013, 2010 and 2007 Pearson Education, Inc. Chapter Inference on the Least-Squares Regression Model and Multiple Regression 14.

Copyright © 2013, 2010 and 2007 Pearson Education, Inc.

Section

Confidence and Prediction Intervals

14.2

Page 33: Copyright © 2013, 2010 and 2007 Pearson Education, Inc. Chapter Inference on the Least-Squares Regression Model and Multiple Regression 14.

Copyright © 2013, 2010 and 2007 Pearson Education, Inc.

Confidence intervals for a mean response are intervals constructed about the predicted value of y, at a given level of x, that are used to measure the accuracy of the mean response of all the individuals in the population.

Prediction intervals for an individual response are intervals constructed about the predicted value of y that are used to measure the accuracy of a single individual’s predicted value.

14-33

Page 34: Copyright © 2013, 2010 and 2007 Pearson Education, Inc. Chapter Inference on the Least-Squares Regression Model and Multiple Regression 14.

Copyright © 2013, 2010 and 2007 Pearson Education, Inc.

Confidence Interval for the Mean Response of y, .A (1 – α)•100% confidence interval for , the mean response of y for a specified value of x, is given by

Lower bound:

Upper bound:

ˆ y

ˆ y

ˆ y t 2 se

1

n

x* x 2

x i x 2

ˆ y t 2 se

1

n

x* x 2

x i x 2where x* is the given value of the explanatory variable, n is the number of observations, and tα/2 is the critical value with n – 2 degrees of freedom.

14-34

Page 35: Copyright © 2013, 2010 and 2007 Pearson Education, Inc. Chapter Inference on the Least-Squares Regression Model and Multiple Regression 14.

Copyright © 2013, 2010 and 2007 Pearson Education, Inc.

Parallel Example 1: Constructing a Confidence Interval for a Mean Response

Construct a 95% confidence interval about the predicted mean time to drill 5 feet for all drillings started at a depth of 110 feet.

14-35

Page 36: Copyright © 2013, 2010 and 2007 Pearson Education, Inc. Chapter Inference on the Least-Squares Regression Model and Multiple Regression 14.

Copyright © 2013, 2010 and 2007 Pearson Education, Inc.

Solution

The least squares regression line is .

To find the predicted mean time to drill 5 feet for all

drillings started at 110 feet, let x*=110 in the regression

equation and obtain .

Recall:

• se=0.5197

• t0.025 = 2.228 for 10 degrees of freedom

ˆ y 0.0116x 5.5273

ˆ y 6.8033

x i x 2 30006.25

x 126.25

14-36

Page 37: Copyright © 2013, 2010 and 2007 Pearson Education, Inc. Chapter Inference on the Least-Squares Regression Model and Multiple Regression 14.

Copyright © 2013, 2010 and 2007 Pearson Education, Inc.

Solution

Therefore, Lower bound:

Upper bound:

6.8033 2.2280.51971

12

110 126.25 2

30006.256.45

6.8033 2.2280.51971

12

110 126.25 2

30006.257.15

14-37

Page 38: Copyright © 2013, 2010 and 2007 Pearson Education, Inc. Chapter Inference on the Least-Squares Regression Model and Multiple Regression 14.

Copyright © 2013, 2010 and 2007 Pearson Education, Inc.

Solution

We are 95% confident that the mean time to drill 5feet for all drillings started at a depth of 110 feet isbetween 6.45 and 7.15 minutes.

14-38

Page 39: Copyright © 2013, 2010 and 2007 Pearson Education, Inc. Chapter Inference on the Least-Squares Regression Model and Multiple Regression 14.

Copyright © 2013, 2010 and 2007 Pearson Education, Inc.

Prediction Interval for an Individual Response about

A (1 – α)•100% prediction interval for , the individual response of y, is given by

Lower bound:

Upper bound:

ˆ y t 2 se 11

n

x* x 2

x i x 2

ˆ y t 2 se 11

n

x* x 2

x i x 2

where x* is the given value of the explanatory variable, n is the number of observations, and tα/2 is the critical value with n – 2 degrees of freedom.

ˆ y

14-39

öy

Page 40: Copyright © 2013, 2010 and 2007 Pearson Education, Inc. Chapter Inference on the Least-Squares Regression Model and Multiple Regression 14.

Copyright © 2013, 2010 and 2007 Pearson Education, Inc.

Parallel Example 2: Constructing a Prediction Interval for an Individual Response

Construct a 95% prediction interval about the predicted time to drill 5 feet for a single drilling started at a depth of 110 feet.

14-40

Page 41: Copyright © 2013, 2010 and 2007 Pearson Education, Inc. Chapter Inference on the Least-Squares Regression Model and Multiple Regression 14.

Copyright © 2013, 2010 and 2007 Pearson Education, Inc.

Solution

The least squares regression line is . To find the predictedmean time to drill 5 feet for all drillings started at110 feet, let x*=110 in the regression equation andobtain .

Recall:

• se=0.5197

• •

• t0.025=2.228 for 10 degrees of freedom

ˆ y 0.0116x 5.5273

ˆ y 6.8033

x i x 2 30006.25

x 126.25

14-41

Page 42: Copyright © 2013, 2010 and 2007 Pearson Education, Inc. Chapter Inference on the Least-Squares Regression Model and Multiple Regression 14.

Copyright © 2013, 2010 and 2007 Pearson Education, Inc.

Solution

Therefore, Lower bound:

Upper bound:

6.8033 2.2280.5197 11

12

110 126.25 2

30006.255.59

6.8033 2.2280.5197 11

12

110 126.25 2

30006.258.01

14-42

Page 43: Copyright © 2013, 2010 and 2007 Pearson Education, Inc. Chapter Inference on the Least-Squares Regression Model and Multiple Regression 14.

Copyright © 2013, 2010 and 2007 Pearson Education, Inc.

Solution

We are 95% confident that the time to drill 5feet for a random drilling started at a depth of 110feet is between 5.59 and 8.01 minutes.

14-43