Chapter 11 Regression Analysis


Doing Statistics for Business: Data, Inference, and Decision Making, by Marilyn K. Pelosi and Theresa M. Sandifer. Chapter 11: Regression Analysis.

Transcript of Chapter 11 Regression Analysis

Page 1: Chapter 11 Regression Analysis

Doing Statistics for Business: Data, Inference, and Decision Making

Marilyn K. Pelosi, Theresa M. Sandifer

Chapter 11: Regression Analysis

Page 2: Chapter 11 Regression Analysis


Chapter 11 Objectives

Find the linear regression equation for a dependent variable Y as a function of a single independent variable X

Determine whether a relationship between X and Y exists

Analyze the results of a regression analysis to determine whether the simple linear model is appropriate

Page 3: Chapter 11 Regression Analysis


Figure 11.1 Deterministic Relationship Between Total Order Cost and Number of Items Ordered

Page 4: Chapter 11 Regression Analysis


Figure 11.2 Statistical Relationship Between Revenue and Advertising Expenditures

Page 5: Chapter 11 Regression Analysis

TRY IT NOW! Increasing Capacity

Plotting Data to Look at the Relationship

An oil company is trying to determine how the number of refining sites available for refining crude oil relates to the overall refining capacity. It would use this information to determine whether expansion will provide the increase in capacity that it wants or whether other steps to increase capacity will be necessary. The company collects data on other competing companies and finds the following:

Page 6: Chapter 11 Regression Analysis

TRY IT NOW! Increasing Capacity

Plotting Data to Look at the Relationship (con’t)

Source: The Economist (July 15, 1995)

Oil Company          # Sites   Refining Capacity (m tons per year)
Royal Dutch/Shell    13        81.82
Exxon                10        81.82
Agip                 13        58.18
BP                   8         43.64
Repsol               5         40.00
Total                7         36.36
Turkish Petroleum    4         34.55
Elf                  8         32.73
Mobil                7         29.09
Petrofina            3         25.45

Page 7: Chapter 11 Regression Analysis

TRY IT NOW! Increasing Capacity

Plotting Data to Look at the Relationship (con’t)

Use a grid to create a scatter plot of the data.

Do you think that a linear model is a good one?

Page 8: Chapter 11 Regression Analysis

The true relationship between the variables X and Y, the Simple Linear Regression Model, can be described by the equation

$y = \beta_0 + \beta_1 x + \varepsilon$

Page 9: Chapter 11 Regression Analysis

Figure 11.3 The True Regression Model Showing How Y Varies for a Given Value of X

[Figure labels: true regression line $y = \beta_0 + \beta_1 x$; at $x_1$ the line gives $y_1 = \beta_0 + \beta_1 x_1$, and at $x_2$ it gives $y_2 = \beta_0 + \beta_1 x_2$.]

Page 10: Chapter 11 Regression Analysis


Figure 11.4 Straight Line Approximating the Relationship Between Advertising and Revenue

Page 11: Chapter 11 Regression Analysis

Figure 11.5 A Single Criterion Can Produce Many Different Lines

Page 12: Chapter 11 Regression Analysis

The distance between the predicted value of Y, $\hat{y}$, and the actual value of Y, $y$, is called the deviation or error.

Page 13: Chapter 11 Regression Analysis

Figure 11.6 Deviations Between the Data Points and the Line

[Plot: Least Squares Line, Y (0 to 35) versus X (0 to 6), with an annotation marking the deviation between a data point and the line.]

Page 14: Chapter 11 Regression Analysis

The technique that finds the equation of the line that minimizes the total, or sum, of the squared deviations between the actual data points and the line is called the least squares method.
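In symbols, the least squares method picks the intercept and slope that make the sum of squared deviations as small as possible. The notation $b_0$ and $b_1$ for the sample intercept and slope is an assumption here, since the slide gives only the verbal definition; the closed-form solution below is the standard one for a single independent variable.

```latex
\begin{aligned}
\mathrm{SSE}(b_0, b_1) &= \sum_{i=1}^{n} \bigl(y_i - (b_0 + b_1 x_i)\bigr)^2 \\
b_1 &= \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left(\sum x_i\right)^2} \\
b_0 &= \bar{y} - b_1 \bar{x}
\end{aligned}
```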

Page 15: Chapter 11 Regression Analysis

TRY IT NOW! Increasing Capacity

Finding the Equation of the Least-Squares Regression Line

The oil company that is looking at increasing refining capacity has decided that a linear relationship is appropriate. Fill in the table shown on the following slide or use some other means to find the equation of the least-squares line:

Page 16: Chapter 11 Regression Analysis

TRY IT NOW! Increasing Capacity

Finding the Equation of the Least-Squares Regression Line (con’t)

Obs. #   # Sites (X)   Capacity (Y)   XY   X^2
1        13            81.82
2        10            81.82
3        13            58.18
4        8             43.64
5        5             40.00
6        7             36.36
7        4             34.55
8        8             32.73
9        7             29.09
10       3             25.45
Totals
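As one "other means" of completing the worksheet, the column totals can be computed in a few lines of Python. This is a minimal sketch, not the textbook's procedure: the function name `least_squares_line` is made up for illustration, and the formulas are the standard sums-based ones for a single independent variable.

```python
# Oil company data from the slides: number of refining sites (X) and
# refining capacity in millions of tons per year (Y).
sites = [13, 10, 13, 8, 5, 7, 4, 8, 7, 3]
capacity = [81.82, 81.82, 58.18, 43.64, 40.00, 36.36, 34.55, 32.73, 29.09, 25.45]


def least_squares_line(x, y):
    """Return (intercept, slope) computed from the worksheet column totals."""
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))  # total of the XY column
    sum_x2 = sum(xi ** 2 for xi in x)              # total of the X^2 column
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    intercept = sum_y / n - slope * sum_x / n
    return intercept, slope


b0, b1 = least_squares_line(sites, capacity)
print(f"y-hat = {b0:.2f} + {b1:.2f} x")
```

The rounded result should agree with the equation used on the later slides of this exercise.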

Page 17: Chapter 11 Regression Analysis

TRY IT NOW! Increasing Capacity

Finding the Equation of the Least-Squares Regression Line (con’t)

Interpret the meaning of the estimate of the slope of the line. Does the y intercept make sense for these data?

Page 18: Chapter 11 Regression Analysis

The value of $\hat{y}$ that we find is really a prediction of the mean value of Y for a given value of X.

Page 19: Chapter 11 Regression Analysis

Using the equation to predict values of Y within the range of the X data is called interpolation. Predicting values of Y for values of X outside the observed range is called extrapolation.

Page 20: Chapter 11 Regression Analysis

TRY IT NOW! Increasing Capacity

Using the Regression Equation to Predict the Value of Y

Use the equation of the regression line you found earlier to predict the refining capacity for each of the observed values of X, the number of sites.

Page 21: Chapter 11 Regression Analysis

TRY IT NOW! Increasing Capacity

Using the Regression Equation to Predict the Value of Y (con’t)

Obs. #   # Sites (X)   Capacity (Y)   $\hat{y} = 9.03 + 4.79x$
1        13            81.82
2        10            81.82
3        13            58.18
4        8             43.64
5        5             40.00
6        7             36.36
7        4             34.55
8        8             32.73
9        7             29.09
10       3             25.45
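The prediction column can be filled in programmatically. The sketch below uses the rounded equation from the table header; the helper names `predict` and `is_interpolation` are hypothetical, and the value of 20 sites is an invented example of a point outside the observed range.

```python
sites = [13, 10, 13, 8, 5, 7, 4, 8, 7, 3]


def predict(x, b0=9.03, b1=4.79):
    """Predicted mean capacity from the slide's rounded regression equation."""
    return b0 + b1 * x


def is_interpolation(x, observed):
    """True when x lies inside the observed range of X, False for extrapolation."""
    return min(observed) <= x <= max(observed)


for x in sorted(set(sites)):
    print(x, round(predict(x), 2))

# 8 sites is inside the observed range (3 to 13); 20 sites is not.
print(is_interpolation(8, sites), is_interpolation(20, sites))
```

Predictions flagged as extrapolation deserve extra caution, because the data say nothing about whether the linear pattern continues beyond the observed X values.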

Page 22: Chapter 11 Regression Analysis

The difference between the observed value of Y ($y_i$) and the predicted value of Y from the regression equation ($\hat{y}_i$), for a value of X = $x_i$, is called the ith residual, $e_i$.

Page 23: Chapter 11 Regression Analysis

TRY IT NOW! Increasing Capacity

Calculating the Residuals

The oil company that is looking at the relationship between refining capacity and the number of refining sites wants to get a better idea of how the regression line relates to the actual data. It decides to calculate the residuals for each observed value of X, the number of sites.

Find the residuals and fill in the table found on the following slide:

Page 24: Chapter 11 Regression Analysis

TRY IT NOW! Increasing Capacity

Calculating the Residuals (con’t)

Oil Refineries: The Economist, July 15, 1995

Obs. #   # Sites (X)   Refining Capacity (Y)   $\hat{y}_i = 9.03 + 4.79x_i$   $e_i = y_i - \hat{y}_i$
1        13            81.82                   71.3
2        10            81.82                   56.9
3        13            58.18                   71.3
4        8             43.64                   47.4
5        5             40.00                   33.0
6        7             36.36                   42.6
7        4             34.55                   28.2
8        8             32.73                   47.4
9        7             29.09                   42.6
10       3             25.45                   23.4
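The residual column follows directly from the definition, observed minus predicted. A minimal sketch, using the rounded coefficients 9.03 and 4.79 from the table header (the function name `residuals` is illustrative only):

```python
sites = [13, 10, 13, 8, 5, 7, 4, 8, 7, 3]
capacity = [81.82, 81.82, 58.18, 43.64, 40.00, 36.36, 34.55, 32.73, 29.09, 25.45]


def residuals(x, y, b0, b1):
    """Observed value minus predicted value for each observation."""
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]


e = residuals(sites, capacity, 9.03, 4.79)
for obs, ei in enumerate(e, start=1):
    print(obs, round(ei, 2))
```

Because the coefficients are rounded, these residuals will not sum to exactly zero; with the unrounded least-squares coefficients they would, up to floating-point error.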

Page 25: Chapter 11 Regression Analysis

TRY IT NOW! Increasing Capacity

Calculating the Residuals (con’t)

To get a picture of how the residuals and the regression line fit together, the company also decides to graph the regression line on a plot of the data.

Graph the regression line on the data plot. How well do you think the line represents the data?

Page 26: Chapter 11 Regression Analysis

TRY IT NOW! Increasing Capacity

Calculating the Residuals (con’t)

Page 27: Chapter 11 Regression Analysis

The standard error of the estimate, $s_{yx}$, is a measure of how much the data vary around the regression line.
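For a simple linear regression the standard error of the estimate is usually computed as the square root of SSE divided by n - 2. The sketch below applies that standard formula to the oil company data from the earlier slides; the function name is an assumption made for illustration.

```python
import math

sites = [13, 10, 13, 8, 5, 7, 4, 8, 7, 3]
capacity = [81.82, 81.82, 58.18, 43.64, 40.00, 36.36, 34.55, 32.73, 29.09, 25.45]


def standard_error_of_estimate(x, y):
    """sqrt(SSE / (n - 2)) for the least-squares line through (x, y)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(sse / (n - 2))


s_yx = standard_error_of_estimate(sites, capacity)
print(round(s_yx, 2))
```

The result should match the value of $s_{yx}$ quoted for these 10 observations in the later interval exercises.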

Page 28: Chapter 11 Regression Analysis

Figure 11.8 Computer Output Showing the Standard Error of the Estimate

Excel Output

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.879773847
R Square             0.774002022
Adjusted R Square    0.745752275
Standard Error       0.65297731
Observations         10

Minitab Output

Regression Analysis

The regression equation is
Rev $bn = 0.784 + 1.14 Members (m)

Predictor    Coef      StDev     T       P
Constant     0.7845    0.5050    1.55    0.159
Members      1.1439    0.2185    5.23    0.000

S = 0.6530   R-Sq = 77.4%   R-Sq(adj) = 74.6%

Page 29: Chapter 11 Regression Analysis

Figure 11.9 (a) Line with non-zero slope; (b) Line with zero slope

Page 30: Chapter 11 Regression Analysis

Figure 11.10 t-test Portion of Computer Output (Minitab and Excel)

Page 31: Chapter 11 Regression Analysis

TRY IT NOW! Increasing Capacity

Testing for Significance of the Regression Model

The oil company that is looking at increasing capacity wants to determine whether the relationship between refining capacity and number of refining sites that it calculated is significant.

Write down the hypotheses that the company needs to test.

Page 32: Chapter 11 Regression Analysis

TRY IT NOW! Increasing Capacity

Testing for Significance of the Regression Model (con’t)

The company decides to use a 0.01 level of significance for the test. Find the critical values for the test.

It used a computer software package to run the analysis and obtained the following output:

Page 33: Chapter 11 Regression Analysis

TRY IT NOW! Increasing Capacity

Testing for Significance of the Regression Model (con’t)

From the computer output, find the slope of the regression line, the standard error of the slope, and the value of the t statistic.

Perform the hypothesis test and make a decision about the regression line.

             Coefficients   Standard Error   t Stat        P-value       Lower 95%      Upper 95%
Intercept    9.028295455    11.04587353      0.817345539   0.437393793   -16.44355105   34.50014196
# Sites      4.786628788    1.307226853      3.661666509   0.006385827   1.77215631     7.801101266
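The test of the slope can be checked by hand from the output. The sketch below assumes the usual two-sided hypotheses, H0: slope = 0 against HA: slope not equal to 0, and takes the critical value 3.355 from a standard t table for a two-tailed test at the 0.01 level with n - 2 = 8 degrees of freedom; neither the hypotheses nor the table value appears on the slides.

```python
# Values read from the "# Sites" row of the computer output.
slope = 4.786628788
se_slope = 1.307226853
n = 10

# t statistic for H0: slope = 0.
t_stat = slope / se_slope

# Two-tailed critical value for alpha = 0.01 with n - 2 = 8 degrees of
# freedom, taken from a standard t table (an assumption, not from the slides).
t_critical = 3.355

reject_h0 = abs(t_stat) > t_critical
print(round(t_stat, 3), reject_h0)
```

The p-value route gives the same decision: the reported P-value for # Sites (about 0.0064) is below the 0.01 significance level.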

Page 34: Chapter 11 Regression Analysis

TRY IT NOW! Increasing Capacity

Testing for Significance of the Regression Model (con’t)

Find the p value of the test from the output and explain how you could use the p value on the output to make the same decision.

Once we have determined that the relationship between X and Y is significant, we can perform some additional analyses to see if the predictions we obtain are useful for the purposes of decision making and to determine the strength of the relationship.

Page 35: Chapter 11 Regression Analysis

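Figure 11.11 Components of the Variation in y Value

[Diagram labels: SST, SSR, and SSE marked as vertical distances at an observation $y_i$, relative to the mean $\bar{y}$ and the fitted line $\hat{y} = b_0 + b_1 x$; axes X and Y.]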

Page 36: Chapter 11 Regression Analysis

Figure 11.12 Computer ANOVA Output for Regression Analysis

Excel Output

ANOVA
             df   SS            MS            F             Significance F
Regression   1    37.7124744    37.7124744    182.6413924   1.9909E-12
Residual     23   4.749125596   0.206483722
Total        24   42.4616

Minitab Output

Analysis of Variance

Source       DF   SS       MS       F        P
Regression   1    37.712   37.712   182.64   0.000
Error        23   4.749    0.206
Total        24   42.462
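The entries in the ANOVA table are tied together by a few standard identities: SST = SSR + SSE, each mean square is its sum of squares divided by its degrees of freedom, F = MSR / MSE, and R-squared = SSR / SST. A short sketch that rebuilds the derived columns from the SS and df values in the Excel output (variable names are illustrative):

```python
# Sums of squares and degrees of freedom from the Excel ANOVA output.
ssr, sse = 37.7124744, 4.749125596
df_regression, df_error = 1, 23

sst = ssr + sse               # total variation
msr = ssr / df_regression     # mean square for regression
mse = sse / df_error          # mean square error
f_stat = msr / mse            # F statistic
r_squared = ssr / sst         # share of variation explained by the model

print(round(sst, 4), round(mse, 4), round(f_stat, 2), round(r_squared, 3))
```

Note that the square root of MSE is the standard error of the estimate for the same model.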

Page 37: Chapter 11 Regression Analysis

A Confidence Interval provides an estimate for the mean value of Y ($\mu_{y \mid x}$) at a particular value of X.

Page 38: Chapter 11 Regression Analysis

Figure 11.13 Confidence Interval for the Mean Estimate

[Plot: Regression Line with Confidence Intervals, Y versus X.]

Page 39: Chapter 11 Regression Analysis

TRY IT NOW! Increasing Capacity

Finding Confidence Intervals for the Mean Predicted Value

After calculating the regression model and deciding that the model is significant, the analysts at the oil company would like to know about the accuracy of the estimates from the model. They decide to calculate 95% confidence intervals for X = 8 and 13 sites. They know from previous work that for the set of 10 observations in the model, $s_{yx} = 13.43$, $\sum x = 78$, and $\sum x^2 = 714$.

Page 40: Chapter 11 Regression Analysis

TRY IT NOW! Increasing Capacity

Finding Confidence Intervals for the Mean Predicted Value (con’t)

Find 95% confidence intervals for the mean estimates.

Do you think that these estimates would be useful for planning purposes? Why or why not?
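One way to work the exercise is with the standard interval for the mean response: the predicted value plus or minus t times $s_{yx}$ times the square root of 1/n + (x - x-bar)^2 / Sxx, where Sxx is the sum of x squared minus (sum of x)^2 / n. The sketch below assumes the rounded equation 9.03 + 4.79x from the earlier slides and a t table value of 2.306 for 95% confidence with 8 degrees of freedom; the function name is made up for illustration.

```python
import math

# Summary values given in the exercise.
n, s_yx, sum_x, sum_x2 = 10, 13.43, 78, 714
b0, b1 = 9.03, 4.79   # rounded regression equation from the earlier slides
t_crit = 2.306        # t table value, 95% confidence, 8 df (assumed)

x_bar = sum_x / n
sxx = sum_x2 - sum_x ** 2 / n


def mean_confidence_interval(x):
    """95% confidence interval for the mean of Y at the given x."""
    y_hat = b0 + b1 * x
    margin = t_crit * s_yx * math.sqrt(1 / n + (x - x_bar) ** 2 / sxx)
    return y_hat - margin, y_hat + margin


for x in (8, 13):
    low, high = mean_confidence_interval(x)
    print(x, round(low, 2), round(high, 2))
```

The interval at 13 sites is wider than the one at 8 sites because 13 is farther from the mean of X (7.8).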

Page 41: Chapter 11 Regression Analysis

A Prediction Interval gives an estimate for an individual value of Y at a particular value of X.

Page 42: Chapter 11 Regression Analysis

TRY IT NOW! Increasing Capacity

Calculating Prediction Intervals for Regression Estimates

The oil company analysts decide to calculate 95% prediction intervals for the two X values that they are interested in. The relevant values from the set of 10 observations are $s_{yx} = 13.43$, $\sum x = 78$, and $\sum x^2 = 714$.

Find 95% prediction intervals for X = 8 and X = 13 refining sites.
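A prediction interval uses the same ingredients as a confidence interval for the mean, with an extra 1 under the square root to account for the variation of an individual observation. This sketch makes the same assumptions as any hand calculation would: the rounded equation 9.03 + 4.79x from the earlier slides and a t table value of 2.306 (95%, 8 degrees of freedom); the function name is hypothetical.

```python
import math

# Summary values given in the exercise.
n, s_yx, sum_x, sum_x2 = 10, 13.43, 78, 714
b0, b1 = 9.03, 4.79   # rounded regression equation from the earlier slides
t_crit = 2.306        # t table value, 95% confidence, 8 df (assumed)

x_bar = sum_x / n
sxx = sum_x2 - sum_x ** 2 / n


def prediction_interval(x):
    """95% prediction interval for an individual Y at the given x."""
    y_hat = b0 + b1 * x
    margin = t_crit * s_yx * math.sqrt(1 + 1 / n + (x - x_bar) ** 2 / sxx)
    return y_hat - margin, y_hat + margin


for x in (8, 13):
    low, high = prediction_interval(x)
    print(x, round(low, 2), round(high, 2))
```

At the same X, a prediction interval is always wider than the confidence interval for the mean.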

Page 43: Chapter 11 Regression Analysis

TRY IT NOW! Increasing Capacity

Calculating Prediction Intervals for Regression Estimates (con’t)

Do you think that confidence intervals or prediction intervals would be more appropriate for the oil company’s purpose?

Page 44: Chapter 11 Regression Analysis

The Correlation Coefficient is used as a measure of the strength of a linear relationship. A correlation of -1 corresponds to a perfect negative relationship, a correlation of 0 corresponds to no linear relationship, and a correlation of +1 corresponds to a perfect positive relationship.

Page 45: Chapter 11 Regression Analysis

Figure 11.14 3 Types of Relationships: Perfect Negative, No Relationship, and Perfect Positive

Page 46: Chapter 11 Regression Analysis

TRY IT NOW! Increasing Capacity

Calculating the Correlation Coefficient

The relevant data to calculate the correlation coefficient for the oil company problem are

$n = 10$, $\sum x = 78$, $\sum y = 463.64$, $\sum xy = 4121.86$, $\sum x^2 = 714$, $\sum y^2 = 25{,}359.3224$

Find the correlation coefficient for the data.
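The correlation coefficient can be computed from the same kind of column totals used for the regression line. The sketch below recomputes the totals from the raw oil company data on the earlier slides rather than trusting typed-in sums, and applies the standard computational formula for Pearson's r; the function name is illustrative.

```python
import math

sites = [13, 10, 13, 8, 5, 7, 4, 8, 7, 3]
capacity = [81.82, 81.82, 58.18, 43.64, 40.00, 36.36, 34.55, 32.73, 29.09, 25.45]


def correlation(x, y):
    """Pearson correlation coefficient from sums of x, y, xy, x^2 and y^2."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    sum_y2 = sum(yi ** 2 for yi in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


r = correlation(sites, capacity)
print(round(r, 3), round(r ** 2, 3))
```

In simple linear regression, the square of r equals the R-squared of the fitted model, so the second printed value is the share of the variation in capacity explained by the number of sites.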

Page 47: Chapter 11 Regression Analysis

Figure 11.15 Examples of Residual Plots

[Four residual plots, each showing residuals versus X.]

Page 48: Chapter 11 Regression Analysis

Figure 11.16 Histograms of Residuals

[Four histograms of residuals (frequency versus residual), with panels labeled OK Resids, Std. Nonlin, Funnel, and Bow.]

Page 49: Chapter 11 Regression Analysis

A Normal Probability Plot is a plot of the ordered data against their expected values under a normal distribution. When data are normally distributed, the plot will be approximately a straight line.
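The coordinates of a normal probability plot can be generated without a statistics package. This is a sketch under stated assumptions: it uses Blom's plotting positions, (i - 0.375) / (n + 0.25), which is one common convention among several, and the residual values are invented for illustration rather than taken from the slides.

```python
from statistics import NormalDist


def normal_scores(n):
    """Approximate expected standard normal values for n ordered observations,
    using Blom's plotting positions (i - 0.375) / (n + 0.25)."""
    std_normal = NormalDist()
    return [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]


# Illustrative residual values (assumed for this sketch).
example_residuals = [10.5, 24.9, -13.1, -3.7, 7.0, -6.2, 6.4, -14.6, -13.5, 2.1]

# Each pair is (normal score, ordered residual): the points of the plot.
pairs = list(zip(normal_scores(len(example_residuals)), sorted(example_residuals)))
for score, value in pairs:
    print(round(score, 2), value)
```

Plotting the pairs and checking whether they fall close to a straight line is the visual test the slide describes.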

Page 50: Chapter 11 Regression Analysis

Figure 11.17 Regression Diagnostic Plots

[Four Residual Model Diagnostics panels, each containing a Normal Plot of Residuals (residual versus normal score), an I Chart of Residuals (residual versus observation number), a Histogram of Residuals (frequency versus residual), and a Residuals vs. Fits plot.]

Page 51: Chapter 11 Regression Analysis

Obs   Members   HMORev $   Fit     StDev Fit   Residual   St Resid
1     4.24      5.486      5.631   0.509       -0.146     -0.36 X
2     3.19      4.629      4.432   0.313       0.197      0.34
3     1.83      3.857      2.878   0.215       0.979      1.59
4     1.62      3.600      2.639   0.232       0.961      1.58
5     2.07      3.429      3.153   0.206       0.276      0.45
6     2.30      2.914      3.415   0.210       -0.501     -0.81
7     1.83      2.743      2.878   0.215       -0.136     -0.22
8     2.15      2.400      3.244   0.206       -0.844     -1.37
9     0.97      1.714      1.896   0.323       -0.182     -0.32
10    0.89      1.200      1.805   0.336       -0.605     -1.08

X denotes an observation whose X value gives it large influence.

Obs   # of Sta   RadioRev   Fit      StDev Fit   Residual   St Resid
1     83         1.0500     0.4628   0.1191      0.5872     2.75 R
2     57         0.3143     0.3446   0.0783      -0.0303    -0.13
3     104        0.3143     0.5583   0.1720      -0.2440    -1.40
4     35         0.3048     0.2446   0.0942      0.0602     0.27
5     21         0.2857     0.1809   0.1232      0.1048     0.50
6     63         0.2286     0.3719   0.0831      -0.1433    -0.62
7     67         0.2190     0.3901   0.0882      -0.1710    -0.75
8     41         0.2095     0.2718   0.0852      -0.0623    -0.27
9     38         0.2095     0.2582   0.0894      -0.0487    -0.21
10    20         0.1238     0.1764   0.1256      -0.0526    -0.25

R denotes an observation with a large standardized residual.

Figure 11.18 Warning Output from Minitab
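The St Resid column and the X and R flags can be approximated from the raw data. The sketch below uses the HMO members and revenue values from the first table, the standard formulas for leverage and internally standardized residuals, and two flagging thresholds that are assumptions here: absolute standardized residual above 2 for R, and leverage above 3p/n for X, with p = 2 coefficients. Because the table values are rounded, results may differ slightly from the Minitab output.

```python
import math

# HMO data from the first Minitab table: members (m) and revenue ($bn).
members = [4.24, 3.19, 1.83, 1.62, 2.07, 2.30, 1.83, 2.15, 0.97, 0.89]
revenue = [5.486, 4.629, 3.857, 3.600, 3.429, 2.914, 2.743, 2.400, 1.714, 1.200]


def fit_diagnostics(x, y):
    """Return slope, leverages and standardized residuals for a simple fit."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s = math.sqrt(sum(e ** 2 for e in resid) / (n - 2))
    leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    std_resid = [e / (s * math.sqrt(1 - h)) for e, h in zip(resid, leverage)]
    return b1, leverage, std_resid


b1, leverage, std_resid = fit_diagnostics(members, revenue)

p = 2  # number of estimated coefficients: intercept and slope
flags = [
    ("X" if h > 3 * p / len(members) else "") + ("R" if abs(r) > 2 else "")
    for h, r in zip(leverage, std_resid)
]
for obs, (r, flag) in enumerate(zip(std_resid, flags), start=1):
    print(obs, round(r, 2), flag)
```

Under these assumed thresholds, only the first observation is flagged, which is consistent with the X marker in the first table.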

Page 52: Chapter 11 Regression Analysis

Simple Linear Regression Model in Excel

1. From the list of data analysis tools, select Regression.

2. Position the cursor in the textbox labeled Input Y Range: and highlight the data range for the Y variable, in this case, Revenues.

3. Move the cursor to the textbox for Input X Range: and highlight the data range of the X variable, in this case, Members.

Page 53: Chapter 11 Regression Analysis

Simple Linear Regression Model in Excel

4. If the data ranges contain labels, click on the Labels checkbox. If you want confidence intervals for the regression estimates, click the checkbox for Confidence Level.

5. Specify the location where you want the output to appear, either in the current sheet, in a new worksheet, or in a new workbook.

6. Click the checkbox for Residuals. Do not check the Standardized Residuals checkbox. Excel does not calculate these values correctly.

Page 54: Chapter 11 Regression Analysis


Simple Linear Regression Model in Excel

7. Click the checkboxes for Residual Plots and Line Fit Plots. Do not click the checkbox for Normal Probability Plot. The plot is not created correctly.

8. Click on OK. The output will appear in the location you specified.

Page 55: Chapter 11 Regression Analysis


Figure 11.21 Completed Regression Dialog Box

Page 56: Chapter 11 Regression Analysis


Figure 11.22 Summary Section of Regression Output

Page 57: Chapter 11 Regression Analysis


Figure 11.23 Residual Output

Page 58: Chapter 11 Regression Analysis


Figure 11.24 Plots from Regression Analysis

Page 59: Chapter 11 Regression Analysis


Although Excel does perform linear regression, KaddStat can also be used for the analysis. The basic input is the same, although KaddStat has slightly different output.

From the Kadd menu select Regression and correlation > Single/Multiple. The dialog box shown in Figure 11.25 opens.

Page 60: Chapter 11 Regression Analysis


Figure 11.25 Regression Dialog Box in KaddStat

Page 61: Chapter 11 Regression Analysis


1. Position the cursor in the box labeled Input Range and highlight your data in the Excel worksheet.

Although nothing changes immediately, if you click on the drop down arrow in the box labeled Dependent Variable all of the variable names in the Input Range appear in the boxes for Dependent and Independent Variables as shown in Figure 11.26.

Page 62: Chapter 11 Regression Analysis


Figure 11.26 Variable lists for regression analysis

Page 63: Chapter 11 Regression Analysis

2. From the drop down list, select Rev $bn for the Dependent Variable.

3. Move the cursor over to the box labeled Independent Variable and from the list, click on the variable that you want to use for the independent variable, in this case, Members (m).

4. In the bottom part of the dialog box, indicate which plots you want included in the output.

5. Indicate where you want the output to appear and click OK.

Page 64: Chapter 11 Regression Analysis

The main portion of the output is shown in Figure 11.27.

Page 65: Chapter 11 Regression Analysis

The remainder of the output consists of the graphs requested and the residuals and standardized residuals shown in Figure 11.28.

Page 66: Chapter 11 Regression Analysis


Page 67: Chapter 11 Regression Analysis


Kadd will calculate the predicted values for the data points, or for any other x values.

Click on the box labeled Forecast and the dialog box will open.

Page 68: Chapter 11 Regression Analysis


Page 69: Chapter 11 Regression Analysis

Place the cursor in the Forecast Data Range box and highlight the location of the values of the independent variable for which you want predictions.

Indicate where you want the output located. Click OK.

Page 70: Chapter 11 Regression Analysis


Page 71: Chapter 11 Regression Analysis

Chapter 11 Summary

In this chapter you have learned:

Linear regression analysis is a powerful tool for determining how two variables are related.

The regression equation can be used for:

Description - used when you are simply trying to understand the way that two variables are related.

Page 72: Chapter 11 Regression Analysis

Chapter 11 Summary (con’t)

Control - describes when the model is used to set standards or reduce variability.

Predictability - describes when the model is used to determine what the resulting Y value should be when X takes on certain values.

Although the simple linear model may be significant, it might not be correct.

Page 73: Chapter 11 Regression Analysis


Chapter 11 Summary (con’t)

It is necessary to test the Assumptions of the linear model to see whether the model you obtain is appropriate.