Chapter 13 Multiple Regression


Transcript of Chapter 13 Multiple Regression

Page 1: Chapter 13  Multiple Regression

© 2007 Thomson South-Western. All Rights Reserved

Chapter 13 Multiple Regression

• Multiple Regression Model
• Least Squares Method
• Multiple Coefficient of Determination
• Model Assumptions
• Testing for Significance
• Using the Estimated Regression Equation for Estimation and Prediction
• Qualitative Independent Variables
• Residual Analysis

Page 2: Chapter 13  Multiple Regression

Multiple Regression Model

The equation that describes how the dependent variable $y$ is related to the independent variables $x_1, x_2, \ldots, x_p$ and an error term is called the multiple regression model.

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + \varepsilon$$

where $\beta_0, \beta_1, \beta_2, \ldots, \beta_p$ are the parameters, and $\varepsilon$ is a random variable called the error term.

Page 3: Chapter 13  Multiple Regression

Multiple Regression Equation

The equation that describes how the mean value of $y$ is related to $x_1, x_2, \ldots, x_p$ is called the multiple regression equation.

$$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p$$

Page 4: Chapter 13  Multiple Regression

Estimated Multiple Regression Equation

A simple random sample is used to compute sample statistics $b_0, b_1, b_2, \ldots, b_p$ that are used as the point estimators of the parameters $\beta_0, \beta_1, \beta_2, \ldots, \beta_p$.

The estimated multiple regression equation is:

$$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_p x_p$$

Page 5: Chapter 13  Multiple Regression

Estimation Process

Multiple Regression Model: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + \varepsilon$

Multiple Regression Equation: $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p$

Unknown parameters are $\beta_0, \beta_1, \beta_2, \ldots, \beta_p$.

Sample Data: observed values of $x_1, x_2, \ldots, x_p$ and $y$.

Estimated Multiple Regression Equation: $\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_p x_p$

Sample statistics are $b_0, b_1, b_2, \ldots, b_p$.

$b_0, b_1, b_2, \ldots, b_p$ provide estimates of $\beta_0, \beta_1, \beta_2, \ldots, \beta_p$.

Page 6: Chapter 13  Multiple Regression

Least Squares Method

Least Squares Criterion

$$\min \sum (y_i - \hat{y}_i)^2$$

Computation of Coefficient Values

The formulas for the regression coefficients $b_0, b_1, b_2, \ldots, b_p$ involve the use of matrix algebra. We will rely on computer software packages to perform the calculations.
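As a rough illustration of what such software does, the sketch below solves the normal equations $(X'X)b = X'y$ with plain Gauss-Jordan elimination. The function name `fit_least_squares` and the small data set are hypothetical and chosen only for illustration; real packages use more numerically robust methods.

```python
def fit_least_squares(rows, y):
    """Return [b0, b1, ..., bp] minimizing sum((y_i - yhat_i)**2)."""
    # Design matrix with a leading column of 1s for the intercept b0.
    X = [[1.0] + [float(v) for v in r] for r in rows]
    k = len(X[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    A = [[sum(xr[i] * xr[j] for xr in X) for j in range(k)]
         + [sum(xr[i] * yi for xr, yi in zip(X, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        if abs(A[col][col]) < 1e-12:
            raise ValueError("X'X is singular (perfectly collinear columns)")
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * c for a, c in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Hypothetical data generated exactly from y = 1 + 2*x1 + 3*x2,
# so the fitted coefficients should be close to 1, 2 and 3.
rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
y = [1, 3, 4, 6, 8]
b = fit_least_squares(rows, y)
print([round(v, 6) for v in b])
```

Because the data lie exactly on a plane, the sum of squared residuals at the solution is zero; with real data the same routine returns the coefficients that make that sum as small as possible.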

Page 7: Chapter 13  Multiple Regression

Interpreting the Coefficients

In multiple regression analysis, we interpret each regression coefficient as follows: $b_i$ represents an estimate of the change in $y$ corresponding to a 1-unit increase in $x_i$ when all other independent variables are held constant.
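A small sketch of this interpretation, assuming made-up coefficient values (they are not taken from the slides): raising $x_1$ by one unit while holding $x_2$ fixed changes the prediction by exactly $b_1$.

```python
# Hypothetical coefficients b0, b1, b2 (illustrative values only).
b = [10.0, 1.5, 0.25]


def predict(b, x):
    """Evaluate yhat = b0 + b1*x1 + ... + bp*xp."""
    return b[0] + sum(bi * xi for bi, xi in zip(b[1:], x))


base = predict(b, [4, 80])
bumped = predict(b, [5, 80])  # x1 raised by 1 unit, x2 held constant
change = bumped - base        # equals b1
print(change)
```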

Page 8: Chapter 13  Multiple Regression

Multiple Coefficient of Determination

Relationship Among SST, SSR, SSE

SST = SSR + SSE

$$\sum (y_i - \bar{y})^2 = \sum (\hat{y}_i - \bar{y})^2 + \sum (y_i - \hat{y}_i)^2$$

where:
SST = total sum of squares
SSR = sum of squares due to regression
SSE = sum of squares due to error

Page 9: Chapter 13  Multiple Regression

Multiple Coefficient of Determination

$$R^2 = \text{SSR}/\text{SST}$$

Adjusted Multiple Coefficient of Determination

$$R_a^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}$$
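The two formulas can be sketched in a few lines. The sums of squares and sample size below are hypothetical, and the function names are illustrative rather than part of any particular package.

```python
def r_squared(ssr, sst):
    """Multiple coefficient of determination: R^2 = SSR / SST."""
    return ssr / sst


def adjusted_r_squared(r2, n, p):
    """Adjusted R^2 for n observations and p independent variables."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Hypothetical values: SST = 600, SSR = 500, n = 20 observations, p = 2.
r2 = r_squared(500.0, 600.0)
r2_adj = adjusted_r_squared(r2, n=20, p=2)
print(round(r2, 4), round(r2_adj, 4))
```

The adjustment penalizes each added independent variable, so the adjusted value never exceeds $R^2$ when $p \ge 1$.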

Page 10: Chapter 13  Multiple Regression

Assumptions About the Error Term $\varepsilon$

• The error $\varepsilon$ is a random variable with mean of zero.
• The variance of $\varepsilon$, denoted by $\sigma^2$, is the same for all values of the independent variables.
• The values of $\varepsilon$ are independent.
• The error $\varepsilon$ is a normally distributed random variable reflecting the deviation between the $y$ value and the expected value of $y$ given by $\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p$.

Page 11: Chapter 13  Multiple Regression

Testing for Significance

In simple linear regression, the $F$ and $t$ tests provide the same conclusion.

In multiple regression, the $F$ and $t$ tests have different purposes.

Page 12: Chapter 13  Multiple Regression

Testing for Significance: F Test

The $F$ test is used to determine whether a significant relationship exists between the dependent variable and the set of all the independent variables.

The $F$ test is referred to as the test for overall significance.

Page 13: Chapter 13  Multiple Regression

Testing for Significance: t Test

If the $F$ test shows an overall significance, the $t$ test is used to determine whether each of the individual independent variables is significant.

A separate $t$ test is conducted for each of the independent variables in the model.

We refer to each of these $t$ tests as a test for individual significance.

Page 14: Chapter 13  Multiple Regression

Testing for Significance: F Test

Hypotheses

$H_0$: $\beta_1 = \beta_2 = \cdots = \beta_p = 0$

$H_a$: One or more of the parameters is not equal to zero.

Test Statistic

$$F = \text{MSR}/\text{MSE}$$

Rejection Rule

Reject $H_0$ if $p$-value $< \alpha$ or if $F > F_\alpha$, where $F_\alpha$ is based on an $F$ distribution with $p$ d.f. in the numerator and $n - p - 1$ d.f. in the denominator.
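A minimal sketch of the test statistic and the critical-value form of the rejection rule. It assumes the usual definitions MSR = SSR/$p$ and MSE = SSE/($n - p - 1$), which match the degrees of freedom stated above; the sums of squares are hypothetical, and the critical value is the tabulated $F_{.05}$ for 2 and 17 d.f.

```python
def f_statistic(ssr, sse, n, p):
    """F = MSR / MSE for n observations and p independent variables."""
    msr = ssr / p            # mean square due to regression
    mse = sse / (n - p - 1)  # mean square due to error
    return msr / mse


# Hypothetical sums of squares for n = 20 observations and p = 2 variables.
ssr, sse, n, p = 500.0, 100.0, 20, 2
f_crit = 3.59  # F table value for alpha = .05 with 2 and 17 d.f.

f_value = f_statistic(ssr, sse, n, p)
reject_h0 = f_value > f_crit
print(round(f_value, 2), reject_h0)
```

The $p$-value form of the rule needs the $F$ distribution itself, which is left to statistical software here.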

Page 15: Chapter 13  Multiple Regression

Testing for Significance: t Test

Hypotheses

$H_0$: $\beta_i = 0$

$H_a$: $\beta_i \neq 0$

Test Statistic

$$t = \frac{b_i}{s_{b_i}}$$

Rejection Rule

Reject $H_0$ if $p$-value $< \alpha$ or if $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$, where $t_{\alpha/2}$ is based on a $t$ distribution with $n - p - 1$ degrees of freedom.
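The two-tailed rule can be sketched as follows. The coefficient estimates and standard errors are hypothetical, and the critical value is the tabulated $t_{.025}$ for 17 degrees of freedom (for example $n = 20$, $p = 2$).

```python
def t_statistic(b_i, s_b_i):
    """t = b_i / s_{b_i}: estimate divided by its standard error."""
    return b_i / s_b_i


def reject_two_tailed(t, t_crit):
    """Reject H0: beta_i = 0 if t falls in either tail."""
    return t < -t_crit or t > t_crit


t_crit = 2.110  # t table value for alpha/2 = .025 with 17 d.f.

# Hypothetical (estimate, standard error) pairs for two coefficients.
t1 = t_statistic(1.2, 0.3)
t2 = t_statistic(0.1, 0.2)
decisions = [reject_two_tailed(t1, t_crit), reject_two_tailed(t2, t_crit)]
print(decisions)
```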

Page 16: Chapter 13  Multiple Regression

Testing for Significance: Multicollinearity

The term multicollinearity refers to the correlation among the independent variables.

When the independent variables are highly correlated, it is not possible to determine the separate effect of any particular independent variable on the dependent variable.
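One simple way to screen for this is to compute the sample correlation between pairs of independent variables. The sketch below uses hypothetical data, and the 0.7 cutoff is only an assumed rule of thumb supplied as a parameter, not a threshold stated in the slides.

```python
import math


def sample_correlation(u, v):
    """Pearson sample correlation coefficient between two equal-length lists."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    s_uv = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    s_uu = sum((a - mean_u) ** 2 for a in u)
    s_vv = sum((b - mean_v) ** 2 for b in v)
    return s_uv / math.sqrt(s_uu * s_vv)


# Hypothetical observations of two independent variables.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 4, 5, 4, 5]

r = sample_correlation(x1, x2)
threshold = 0.7  # assumed cutoff for "highly correlated"
highly_correlated = abs(r) > threshold
print(round(r, 4), highly_correlated)
```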

Page 17: Chapter 13  Multiple Regression

Testing for Significance: Multicollinearity

If the estimated regression equation is to be used only for predictive purposes, multicollinearity is usually not a serious problem.

Every attempt should be made to avoid including independent variables that are highly correlated.

Page 18: Chapter 13  Multiple Regression

Using the Estimated Regression Equation for Estimation and Prediction

The procedures for estimating the mean value of $y$ and predicting an individual value of $y$ in multiple regression are similar to those in simple regression.

We substitute the given values of $x_1, x_2, \ldots, x_p$ into the estimated regression equation and use the corresponding value of $\hat{y}$ as the point estimate.

Page 19: Chapter 13  Multiple Regression

Using the Estimated Regression Equation for Estimation and Prediction

The formulas required to develop interval estimates for the mean value of $y$ and for an individual value of $y$ are beyond the scope of the textbook.

Software packages for multiple regression will often provide these interval estimates.

Page 20: Chapter 13  Multiple Regression

Qualitative Independent Variables

In many situations we must work with qualitative independent variables such as gender (male, female), method of payment (cash, check, credit card), etc.

For example, $x_2$ might represent gender where $x_2 = 0$ indicates male and $x_2 = 1$ indicates female.

In this case, $x_2$ is called a dummy or indicator variable.

Page 21: Chapter 13  Multiple Regression

Qualitative Independent Variables

Example: Programmer Salary Survey

As an extension of the problem involving the computer programmer salary survey, suppose that management also believes that the annual salary is related to whether the individual has a graduate degree in computer science or information systems.

The years of experience, the score on the programmer aptitude test, whether the individual has a relevant graduate degree, and the annual salary ($1000) for each of the 20 sampled programmers are shown on the next slide.

Page 22: Chapter 13  Multiple Regression

Qualitative Independent Variables

Exper.  Score  Degr.  Salary      Exper.  Score  Degr.  Salary
4       78     No     24          9       88     Yes    38
7       100    Yes    43          2       73     No     26.6
1       86     No     23.7        10      75     Yes    36.2
5       82     Yes    34.3        5       81     No     31.6
8       86     Yes    35.8        6       74     No     29
10      84     Yes    38          8       87     Yes    34
0       75     No     22.2        4       79     No     30.1
1       80     No     23.1        6       94     Yes    33.9
6       83     No     30          3       70     No     28.2
6       91     Yes    33          3       89     No     30

Page 23: Chapter 13  Multiple Regression

Estimated Regression Equation

$$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3$$

where:
$\hat{y}$ = annual salary ($1000)
$x_1$ = years of experience
$x_2$ = score on programmer aptitude test
$x_3$ = 0 if individual does not have a graduate degree, 1 if individual does have a graduate degree

$x_3$ is a dummy variable.
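A short sketch of how the dummy variable enters the equation. The coefficient values below are hypothetical placeholders, not estimates from the survey data, and `encode_degree` is an illustrative helper name.

```python
def encode_degree(has_degree):
    """Code the Yes/No degree column as the dummy variable x3 (1 or 0)."""
    return 1 if has_degree == "Yes" else 0


def predict_salary(b, experience, score, has_degree):
    """yhat = b0 + b1*x1 + b2*x2 + b3*x3, in $1000s."""
    x3 = encode_degree(has_degree)
    return b[0] + b[1] * experience + b[2] * score + b[3] * x3


# Hypothetical coefficients b0..b3 (illustrative values only).
b = [8.0, 1.0, 0.2, 2.0]

without_degree = predict_salary(b, 5, 80, "No")
with_degree = predict_salary(b, 5, 80, "Yes")
degree_effect = with_degree - without_degree  # equals b3
print(without_degree, with_degree, degree_effect)
```

With experience and test score held fixed, the two predictions differ by exactly $b_3$, so $b_3$ is read as the estimated salary difference associated with a graduate degree.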

Page 24: Chapter 13  Multiple Regression

More Complex Qualitative Variables

If a qualitative variable has $k$ levels, $k - 1$ dummy variables are required, with each dummy variable being coded as 0 or 1.

For example, a variable with levels A, B, and C could be represented by $x_1$ and $x_2$ values of (0, 0) for A, (1, 0) for B, and (0, 1) for C.

Care must be taken in defining and interpreting the dummy variables.

Page 25: Chapter 13  Multiple Regression

More Complex Qualitative Variables

For example, a variable indicating level of education could be represented by $x_1$ and $x_2$ values as follows:

Highest Degree   $x_1$   $x_2$
Bachelor’s       0       0
Master’s         1       0
Ph.D.            0       1
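The coding in this table can be produced mechanically. The sketch below is one possible helper (the name `dummy_code` is hypothetical) that treats the first level as the baseline coded with all zeros and assigns one indicator to each remaining level.

```python
def dummy_code(level, levels):
    """Code `level` with k - 1 indicators; levels[0] is the all-zero baseline."""
    if level not in levels:
        raise ValueError("unknown level: " + repr(level))
    return tuple(1 if level == other else 0 for other in levels[1:])


levels = ["Bachelor's", "Master's", "Ph.D."]
codes = {lvl: dummy_code(lvl, levels) for lvl in levels}
for lvl in levels:
    print(lvl, codes[lvl])
```

Choosing a different baseline level changes which coefficients appear in the model and how each one is interpreted, which is why the definitions need care.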

Page 26: Chapter 13  Multiple Regression

Residual Analysis

For simple linear regression the residual plot against $\hat{y}$ and the residual plot against $x$ provide the same information.

In multiple regression analysis it is preferable to use the residual plot against $\hat{y}$ to determine if the model assumptions are satisfied.

Page 27: Chapter 13  Multiple Regression

Standardized Residual Plot Against $\hat{y}$

Standardized residuals are frequently used in residual plots for purposes of:

• Identifying outliers (typically, standardized residuals < -2 or > +2)
• Providing insight about the assumption that the error term $\varepsilon$ has a normal distribution

The computation of the standardized residuals in multiple regression analysis is too complex to be done by hand.

Excel’s Regression tool can be used.
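Once software has produced the standardized residuals, the outlier rule of thumb above is easy to apply. The residual values in this sketch are hypothetical, and `flag_outliers` is an illustrative name.

```python
def flag_outliers(std_residuals, cutoff=2.0):
    """Return the positions of standardized residuals below -cutoff or above +cutoff."""
    return [i for i, r in enumerate(std_residuals) if r < -cutoff or r > cutoff]


# Hypothetical standardized residuals, as reported by regression software.
std_residuals = [0.4, -1.1, 2.3, 0.9, -2.6, 1.7]
outliers = flag_outliers(std_residuals)
print(outliers)
```

A flagged observation is only a candidate for closer inspection; the cutoff of 2 is the typical value quoted above, not a strict test.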