Chapter 13 Multiple Regression

Chapter 13 Multiple Regression. Topics: Multiple Regression Model; Least Squares Method; Multiple Coefficient of Determination; Model Assumptions; Testing for Significance; Using the Estimated Regression Equation for Estimation and Prediction; Qualitative Independent Variables.

Transcript of Chapter 13 Multiple Regression

Page 1: Chapter 13  Multiple Regression

Chapter 13: Multiple Regression

• Multiple Regression Model

• Least Squares Method

• Multiple Coefficient of Determination

• Model Assumptions

• Testing for Significance

• Using the Estimated Regression Equation for Estimation and Prediction

• Qualitative Independent Variables

Page 2: Chapter 13  Multiple Regression

Multiple Regression Model

The equation that describes how the dependent variable $y$ is related to the independent variables $x_1, x_2, \ldots, x_p$ and an error term is called the multiple regression model.

The multiple regression model is:

$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + \varepsilon$

• $\beta_0, \beta_1, \beta_2, \ldots, \beta_p$ are the parameters.

• $\varepsilon$ is a random variable called the error term.

Page 3: Chapter 13  Multiple Regression

Multiple Regression Equation

The equation that describes how the mean value of $y$ is related to $x_1, x_2, \ldots, x_p$ is called the multiple regression equation.

The multiple regression equation is:

$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p$

Page 4: Chapter 13  Multiple Regression

Estimated Multiple Regression Equation

A simple random sample is used to compute sample statistics $b_0, b_1, b_2, \ldots, b_p$ that are used as the point estimators of the parameters $\beta_0, \beta_1, \beta_2, \ldots, \beta_p$.

The estimated multiple regression equation is:

$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_p x_p$

Page 5: Chapter 13  Multiple Regression

Estimation Process

1. Multiple regression model: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + \varepsilon$

   Multiple regression equation: $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p$

   The unknown parameters are $\beta_0, \beta_1, \beta_2, \ldots, \beta_p$.

2. Sample data: observed values of $x_1, x_2, \ldots, x_p$ and $y$.

3. Estimated multiple regression equation: $\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_p x_p$

   $b_0, b_1, b_2, \ldots, b_p$ are sample statistics.

4. $b_0, b_1, b_2, \ldots, b_p$ provide estimates of $\beta_0, \beta_1, \beta_2, \ldots, \beta_p$.

Page 6: Chapter 13  Multiple Regression

Least Squares Method

Least Squares Criterion

$\min \sum (y_i - \hat{y}_i)^2$

Computation of Coefficient Values

The formulas for the regression coefficients $b_0, b_1, b_2, \ldots, b_p$ involve the use of matrix algebra. We will rely on computer software packages to perform the calculations.
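For reference, the matrix-algebra solution mentioned on this slide can be written compactly. This is the standard normal-equations form, not something the slides state: the symbols $X$ (the $n \times (p+1)$ matrix of observed $x$ values with a leading column of ones), $\mathbf{y}$ (the vector of observed $y$ values) and $\mathbf{b}$ (the vector of coefficients) are notation introduced here, and the formula assumes $X^{\top}X$ is invertible.

```latex
\mathbf{b} =
\begin{pmatrix} b_0 \\ b_1 \\ \vdots \\ b_p \end{pmatrix}
= (X^{\top} X)^{-1} X^{\top} \mathbf{y}
```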

Page 7: Chapter 13  Multiple Regression

Least Squares Method

A Note on Interpretation of Coefficients

$b_i$ represents an estimate of the change in $y$ corresponding to a one-unit change in $x_i$ when all other independent variables are held constant.

Page 8: Chapter 13  Multiple Regression

Multiple Coefficient of Determination

Relationship Among SST, SSR, SSE

SST = SSR + SSE

$\sum (y_i - \bar{y})^2 = \sum (\hat{y}_i - \bar{y})^2 + \sum (y_i - \hat{y}_i)^2$

Page 9: Chapter 13  Multiple Regression

Multiple Coefficient of Determination

$R^2 = \text{SSR}/\text{SST}$

Adjusted Multiple Coefficient of Determination

$R_a^2 = 1 - (1 - R^2)\,\dfrac{n - 1}{n - p - 1}$
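As a quick arithmetic check of the two formulas, here is a minimal sketch. The sums of squares are the ones reported in the programmer salary ANOVA table later in this chapter (SSR = 500.33, SST = 599.79, n = 20, p = 2); the function names are illustrative and not part of the slides.

```python
def r_squared(ssr, sst):
    """Multiple coefficient of determination: R^2 = SSR / SST."""
    return ssr / sst


def adjusted_r_squared(r2, n, p):
    """Adjusted R^2 for n observations and p independent variables."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


# Sums of squares taken from the programmer salary ANOVA table.
ssr, sst = 500.33, 599.79
n, p = 20, 2

r2 = r_squared(ssr, sst)
r2_adj = adjusted_r_squared(r2, n, p)
print(f"R-sq = {r2:.1%}, R-sq(adj) = {r2_adj:.1%}")
```

The adjustment penalizes each added independent variable, so the adjusted value is always at or below the unadjusted one when p is at least 1.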

Page 10: Chapter 13  Multiple Regression

Model Assumptions

Assumptions About the Error Term $\varepsilon$

1. The error $\varepsilon$ is a random variable with mean of zero.

2. The variance of $\varepsilon$, denoted by $\sigma^2$, is the same for all values of the independent variables.

3. The values of $\varepsilon$ are independent.

4. The error $\varepsilon$ is a normally distributed random variable reflecting the deviation between the $y$ value and the expected value of $y$ given by $\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p$.

Page 11: Chapter 13  Multiple Regression

Example: Programmer Salary Survey

A software firm collected data for a sample of 20 computer programmers. A suggestion was made that regression analysis could be used to determine if salary was related to the years of experience and the score on the firm's programmer aptitude test.

The years of experience, score on the aptitude test, and corresponding annual salary ($1000s) for a sample of 20 programmers are shown on the next slide.

Page 12: Chapter 13  Multiple Regression

Example: Programmer Salary Survey

Exper.   Score   Salary      Exper.   Score   Salary
  4        78     24.0          9       88     38.0
  7       100     43.0          2       73     26.6
  1        86     23.7         10       75     36.2
  5        82     34.3          5       81     31.6
  8        86     35.8          6       74     29.0
 10        84     38.0          8       87     34.0
  0        75     22.2          4       79     30.1
  1        80     23.1          6       94     33.9
  6        83     30.0          3       70     28.2
  6        91     33.0          3       89     30.0

Page 13: Chapter 13  Multiple Regression

Example: Programmer Salary Survey

Multiple Regression Model

Suppose we believe that salary ($y$) is related to the years of experience ($x_1$) and the score on the programmer aptitude test ($x_2$) by the following regression model:

$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$

where

$y$ = annual salary (in thousands of dollars)

$x_1$ = years of experience

$x_2$ = score on programmer aptitude test

Page 14: Chapter 13  Multiple Regression

Example: Programmer Salary Survey

Solving for the Estimates of $\beta_0$, $\beta_1$, $\beta_2$

Input Data

x1    x2    y
 4    78    24
 7   100    43
 .     .     .
 3    89    30

The input data are passed to a computer package for solving multiple regression problems.

Least Squares Output

$b_0$, $b_1$, $b_2$, $R^2$, etc.

Page 15: Chapter 13  Multiple Regression

Example: Programmer Salary Survey

Minitab Computer Output

The regression is

Salary = 3.174 + 1.404 Exper + 0.251 Score

Predictor    Coef       Stdev      t-ratio    p
Constant     3.174      6.156      .52        .613
Exper        1.4039     .1986      7.07       .000
Score        .25089     .07735     3.24       .005

s = 2.419    R-sq = 83.4%    R-sq(adj) = 81.5%

Page 16: Chapter 13  Multiple Regression

Example: Programmer Salary Survey

Estimated Regression Equation

SALARY = 3.174 + 1.404(EXPER) + 0.2509(SCORE)

Note: Predicted salary will be in thousands of dollars.
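The slides rely on Minitab for the fit. As a rough illustration of what such a package computes, the sketch below solves the least squares normal equations in plain Python for the 20 observations in the data table. The helper names `solve` and `least_squares` are hypothetical, introduced only for this example, and solving the normal equations directly is one simple approach rather than what Minitab necessarily does internally.

```python
# Programmer salary data: (years of experience, aptitude score, salary in $1000s).
DATA = [
    (4, 78, 24.0), (7, 100, 43.0), (1, 86, 23.7), (5, 82, 34.3),
    (8, 86, 35.8), (10, 84, 38.0), (0, 75, 22.2), (1, 80, 23.1),
    (6, 83, 30.0), (6, 91, 33.0), (9, 88, 38.0), (2, 73, 26.6),
    (10, 75, 36.2), (5, 81, 31.6), (6, 74, 29.0), (8, 87, 34.0),
    (4, 79, 30.1), (6, 94, 33.9), (3, 70, 28.2), (3, 89, 30.0),
]


def solve(a, rhs):
    """Solve the square system a x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def least_squares(rows):
    """Return [b0, b1, ..., bp] for rows of the form (x1, ..., xp, y).

    Builds and solves the normal equations (X'X) b = X'y.
    """
    xs = [[1.0] + list(row[:-1]) for row in rows]  # leading 1 for the intercept
    ys = [row[-1] for row in rows]
    k = len(xs[0])
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * y for x, y in zip(xs, ys)) for i in range(k)]
    return solve(xtx, xty)


b0, b1, b2 = least_squares(DATA)
print(f"Salary = {b0:.3f} + {b1:.3f} Exper + {b2:.3f} Score")
```

The coefficients should agree with the Minitab output to the precision shown there. For larger or poorly conditioned problems a dedicated numerical library would be a safer choice than hand-rolled elimination.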

Page 17: Chapter 13  Multiple Regression

Testing for Significance

In simple linear regression, the $F$ and $t$ tests provide the same conclusion.

In multiple regression, the $F$ and $t$ tests have different purposes.

Page 18: Chapter 13  Multiple Regression

Testing for Significance: F Test

The $F$ test is used to determine whether a significant relationship exists between the dependent variable and the set of all the independent variables.

The $F$ test is referred to as the test for overall significance.

Page 19: Chapter 13  Multiple Regression

Testing for Significance: t Test

If the $F$ test shows an overall significance, the $t$ test is used to determine whether each of the individual independent variables is significant.

A separate $t$ test is conducted for each of the independent variables in the model.

We refer to each of these $t$ tests as a test for individual significance.

Page 20: Chapter 13  Multiple Regression

Testing for Significance: F Test

Hypotheses

$H_0$: $\beta_1 = \beta_2 = \cdots = \beta_p = 0$

$H_a$: One or more of the parameters is not equal to zero.

Test Statistic

$F = \text{MSR}/\text{MSE}$

Rejection Rule

Reject $H_0$ if $F > F_\alpha$

where $F_\alpha$ is based on an $F$ distribution with $p$ d.f. in the numerator and $n - p - 1$ d.f. in the denominator.

Page 21: Chapter 13  Multiple Regression

Testing for Significance: t Test

Hypotheses

$H_0$: $\beta_i = 0$

$H_a$: $\beta_i \neq 0$

Test Statistic

$t = \dfrac{b_i}{s_{b_i}}$

Rejection Rule

Reject $H_0$ if $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$

where $t_{\alpha/2}$ is based on a $t$ distribution with $n - p - 1$ degrees of freedom.

Page 22: Chapter 13  Multiple Regression

Example: Programmer Salary Survey

Minitab Computer Output (continued)

Analysis of Variance

SOURCE        DF    SS        MS        F        P
Regression     2    500.33    250.16    42.76    0.000
Error         17     99.46      5.85
Total         19    599.79

Page 23: Chapter 13  Multiple Regression

Example: Programmer Salary Survey

F Test

• Hypotheses

$H_0$: $\beta_1 = \beta_2 = 0$

$H_a$: One or both of the parameters is not equal to zero.

• Rejection Rule

For $\alpha = .05$ and d.f. = 2, 17: $F_{.05} = 3.59$

Reject $H_0$ if $F > 3.59$.

Page 24: Chapter 13  Multiple Regression

Example: Programmer Salary Survey

F Test

• Test Statistic

$F = \text{MSR}/\text{MSE} = 250.16/5.85 = 42.76$

• Conclusion

$F = 42.76 > 3.59$, so we can reject $H_0$.
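The F test arithmetic can be retraced from the ANOVA table in a few lines. This is a minimal sketch using only the sums of squares and the critical value 3.59 quoted in the slides; the variable names are illustrative.

```python
# Values from the Minitab ANOVA table for the programmer salary example.
ssr, sse = 500.33, 99.46
n, p = 20, 2
f_critical = 3.59  # F_.05 with (2, 17) d.f., as quoted in the slides

msr = ssr / p            # mean square due to regression, p d.f.
mse = sse / (n - p - 1)  # mean square error, n - p - 1 d.f.
f_stat = msr / mse

reject_h0 = f_stat > f_critical
print(f"MSR = {msr:.2f}, MSE = {mse:.2f}, F = {f_stat:.2f}, reject H0: {reject_h0}")
```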

Page 25: Chapter 13  Multiple Regression

Example: Programmer Salary Survey

t Test for Significance of Individual Parameters

• Hypotheses

$H_0$: $\beta_i = 0$

$H_a$: $\beta_i \neq 0$

• Rejection Rule

For $\alpha = .05$ and d.f. = 17: $t_{.025} = 2.11$

Reject $H_0$ if $t < -2.11$ or $t > 2.11$.

Page 26: Chapter 13  Multiple Regression

Example: Programmer Salary Survey

t Test for Significance of Individual Parameters

• Test Statistics

$t = \dfrac{b_1}{s_{b_1}} = \dfrac{1.4039}{.1986} = 7.07$

$t = \dfrac{b_2}{s_{b_2}} = \dfrac{.25089}{.07735} = 3.24$

• Conclusions

Reject $H_0$: $\beta_1 = 0$ and reject $H_0$: $\beta_2 = 0$.

Both independent variables are significant.
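The two t-ratios follow directly from the coefficient and standard deviation columns of the Minitab output. The sketch below repeats that division and applies the two-tailed rule with the critical value 2.11 from the previous slide; the dictionary layout is just a convenience for this example.

```python
# (coefficient, standard deviation) pairs from the Minitab output.
coefficients = {
    "Exper": (1.4039, 0.1986),
    "Score": (0.25089, 0.07735),
}
t_critical = 2.11  # t_.025 with 17 d.f., as quoted in the slides

# t = b_i / s_{b_i} for each independent variable.
t_stats = {name: b / s_b for name, (b, s_b) in coefficients.items()}

# Two-tailed rule: reject H0 when t < -2.11 or t > 2.11.
significant = {name: abs(t) > t_critical for name, t in t_stats.items()}

for name, t in t_stats.items():
    print(f"{name}: t = {t:.2f}, reject H0: {significant[name]}")
```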

Page 27: Chapter 13  Multiple Regression

Testing for Significance: Multicollinearity

The term multicollinearity refers to the correlation among the independent variables.

When the independent variables are highly correlated (say, $|r| > .7$), it is not possible to determine the separate effect of any particular independent variable on the dependent variable.

Page 28: Chapter 13  Multiple Regression

Testing for Significance: Multicollinearity

If the estimated regression equation is to be used only for predictive purposes, multicollinearity is usually not a serious problem.

Every attempt should be made to avoid including independent variables that are highly correlated.
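One way to apply the $|r| > .7$ rule of thumb is to compute the sample correlation between each pair of independent variables before fitting. The slides do not carry out this check for the programmer example; the sketch below does it for experience and aptitude score using the 20 observations from the data table, with an illustrative helper named `correlation`.

```python
import math

# Independent variables from the programmer salary data table.
exper = [4, 7, 1, 5, 8, 10, 0, 1, 6, 6, 9, 2, 10, 5, 6, 8, 4, 6, 3, 3]
score = [78, 100, 86, 82, 86, 84, 75, 80, 83, 91,
         88, 73, 75, 81, 74, 87, 79, 94, 70, 89]


def correlation(xs, ys):
    """Sample (Pearson) correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


r = correlation(exper, score)
highly_correlated = abs(r) > 0.7  # rule-of-thumb threshold from the slides
print(f"r = {r:.3f}, highly correlated: {highly_correlated}")
```

With more than two independent variables, the same function can be applied to every pair.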

Page 29: Chapter 13  Multiple Regression

Using the Estimated Regression Equation for Estimation and Prediction

The procedures for estimating the mean value of $y$ and predicting an individual value of $y$ in multiple regression are similar to those in simple regression.

We substitute the given values of $x_1, x_2, \ldots, x_p$ into the estimated regression equation and use the corresponding value of $\hat{y}$ as the point estimate.

The formulas required to develop interval estimates for the mean value of $y$ and for an individual value of $y$ are beyond the scope of the text.

Software packages for multiple regression will often provide these interval estimates.
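A point estimate is just a substitution into the estimated regression equation. The sketch below uses the equation from the programmer salary example, SALARY = 3.174 + 1.404(EXPER) + 0.2509(SCORE). The inputs of 4 years of experience and a score of 78 are chosen here for illustration, and `predict_salary` is a hypothetical helper name.

```python
def predict_salary(exper, score):
    """Point estimate of annual salary in $1000s from the estimated equation."""
    return 3.174 + 1.404 * exper + 0.2509 * score


# Illustrative inputs: 4 years of experience, aptitude score of 78.
y_hat = predict_salary(4, 78)
print(f"Predicted salary: {y_hat:.1f} thousand dollars")
```

The same number serves as the point estimate of the mean salary for all such programmers and as the prediction for an individual programmer; only the interval estimates differ.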

Page 30: Chapter 13  Multiple Regression

Qualitative Independent Variables

In many situations we must work with qualitative independent variables such as gender (male, female), method of payment (cash, check, credit card), etc.

For example, $x_2$ might represent gender where $x_2 = 0$ indicates male and $x_2 = 1$ indicates female.

In this case, $x_2$ is called a dummy or indicator variable.

Page 31: Chapter 13  Multiple Regression

Qualitative Independent Variables

If a qualitative variable has $k$ levels, $k - 1$ dummy variables are required, with each dummy variable being coded as 0 or 1.

For example, a variable with levels A, B, and C would be represented by $x_1$ and $x_2$ values of (0, 0), (1, 0), and (0, 1), respectively.
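The $k - 1$ coding rule can be sketched as a small function. Treating the first level as the baseline (all zeros) matches the A, B, C example above, but which level serves as the baseline is a modeling choice; `dummy_code` is a hypothetical helper name.

```python
def dummy_code(level, levels):
    """Encode `level` as k - 1 zero/one indicators; the first level is the baseline."""
    if level not in levels:
        raise ValueError(f"unknown level: {level!r}")
    # One indicator per non-baseline level; the baseline maps to all zeros.
    return tuple(1 if level == other else 0 for other in levels[1:])


levels = ["A", "B", "C"]
codes = {level: dummy_code(level, levels) for level in levels}
for level, code in codes.items():
    print(level, code)
```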

Page 32: Chapter 13  Multiple Regression

Example: Programmer Salary Survey (B)

As an extension of the problem involving the computer programmer salary survey, suppose that management also believes that the annual salary is related to whether or not the individual has a graduate degree in computer science or information systems.

The years of experience, the score on the programmer aptitude test, whether or not the individual has a relevant graduate degree, and the annual salary ($1000s) for each of the sampled 20 programmers are shown on the next slide.

Page 33: Chapter 13  Multiple Regression

Example: Programmer Salary Survey (B)

Exp.   Score   Degr.   Salary      Exp.   Score   Degr.   Salary
  4      78     No      24.0         9      88     Yes     38.0
  7     100     Yes     43.0         2      73     No      26.6
  1      86     No      23.7        10      75     Yes     36.2
  5      82     Yes     34.3         5      81     No      31.6
  8      86     Yes     35.8         6      74     No      29.0
 10      84     Yes     38.0         8      87     Yes     34.0
  0      75     No      22.2         4      79     No      30.1
  1      80     No      23.1         6      94     Yes     33.9
  6      83     No      30.0         3      70     No      28.2
  6      91     Yes     33.0         3      89     No      30.0

Page 34: Chapter 13  Multiple Regression

Example: Programmer Salary Survey (B)

Multiple Regression Equation

$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$

Estimated Regression Equation

$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3$

where

$y$ = annual salary (in thousands of dollars)

$x_1$ = years of experience

$x_2$ = score on programmer aptitude test

$x_3$ = 0 if individual does not have a grad. degree, 1 if individual does have a grad. degree

Note: $x_3$ is referred to as a dummy variable.

Page 35: Chapter 13  Multiple Regression

Example: Programmer Salary Survey (B)

Minitab Computer Output

The regression is

Salary = 7.95 + 1.15 Exp + 0.197 Score + 2.28 Deg

Predictor    Coef       Stdev      t-ratio    p
Constant     7.945      7.381      1.08       .298
Exp          1.1476     .2976      3.86       .001
Score        .19694     .0899      2.19       .044
Deg          2.280      1.987      1.15       .268

s = 2.396    R-sq = 84.7%    R-sq(adj) = 81.8%
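To show how the dummy variable enters the fit, the sketch below codes the graduate degree column as 0 or 1 and solves the least squares normal equations in plain Python for the 20 observations in the data table. The helpers `solve` and `least_squares` are hypothetical names for this example, and the method is a simple stand-in for whatever Minitab uses internally.

```python
# (years of experience, aptitude score, degree: 1 = yes / 0 = no, salary in $1000s)
DATA = [
    (4, 78, 0, 24.0), (7, 100, 1, 43.0), (1, 86, 0, 23.7), (5, 82, 1, 34.3),
    (8, 86, 1, 35.8), (10, 84, 1, 38.0), (0, 75, 0, 22.2), (1, 80, 0, 23.1),
    (6, 83, 0, 30.0), (6, 91, 1, 33.0), (9, 88, 1, 38.0), (2, 73, 0, 26.6),
    (10, 75, 1, 36.2), (5, 81, 0, 31.6), (6, 74, 0, 29.0), (8, 87, 1, 34.0),
    (4, 79, 0, 30.1), (6, 94, 1, 33.9), (3, 70, 0, 28.2), (3, 89, 0, 30.0),
]


def solve(a, rhs):
    """Solve the square system a x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def least_squares(rows):
    """Return [b0, b1, ..., bp] for rows of the form (x1, ..., xp, y)."""
    xs = [[1.0] + list(row[:-1]) for row in rows]  # leading 1 for the intercept
    ys = [row[-1] for row in rows]
    k = len(xs[0])
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * y for x, y in zip(xs, ys)) for i in range(k)]
    return solve(xtx, xty)


b0, b1, b2, b3 = least_squares(DATA)
print(f"Salary = {b0:.2f} + {b1:.2f} Exp + {b2:.3f} Score + {b3:.2f} Deg")
```

The dummy variable is handled exactly like a numeric column; only its interpretation differs.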

Page 36: Chapter 13  Multiple Regression

Example: Programmer Salary Survey (B)

Minitab Computer Output (continued)

Analysis of Variance

SOURCE        DF    SS        MS        F        P
Regression     3    507.90    169.30    29.48    0.000
Error         16     91.89      5.74
Total         19    599.79

Page 37: Chapter 13  Multiple Regression

Example: Programmer Salary Survey (B)

Interpreting the Parameters

• $b_1 = 1.15$

Salary is expected to increase by $1,150 for each additional year of experience (when all other independent variables are held constant).

Page 38: Chapter 13  Multiple Regression

Example: Programmer Salary Survey (B)

Interpreting the Parameters

• $b_2 = 0.197$

Salary is expected to increase by $197 for each additional point scored on the programmer aptitude test (when all other independent variables are held constant).

Page 39: Chapter 13  Multiple Regression

Example: Programmer Salary Survey (B)

Interpreting the Parameters

• $b_3 = 2.28$

Salary is expected to be $2,280 higher for an individual with a graduate degree than one without a graduate degree (when all other independent variables are held constant).
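The "held constant" reading of the coefficients can be checked by differencing predictions from the estimated equation. The sketch below uses the coefficients from the Minitab output for example (B). The baseline of 5 years of experience and a score of 80 is an arbitrary choice made here, and `predict_salary` is a hypothetical helper name.

```python
def predict_salary(exp, score, has_degree):
    """Point estimate of salary in $1000s from the fitted equation in example (B)."""
    deg = 1 if has_degree else 0  # dummy variable x3
    return 7.945 + 1.1476 * exp + 0.19694 * score + 2.280 * deg


# Arbitrary baseline: 5 years of experience, score of 80, no graduate degree.
base = predict_salary(5, 80, False)

# Change one variable at a time while holding the others constant.
extra_year = predict_salary(6, 80, False) - base
extra_point = predict_salary(5, 81, False) - base
degree_gap = predict_salary(5, 80, True) - base

print(f"One more year: {extra_year * 1000:.0f} dollars")
print(f"One more test point: {extra_point * 1000:.0f} dollars")
print(f"Graduate degree: {degree_gap * 1000:.0f} dollars")
```

Each difference equals the corresponding coefficient whatever baseline is chosen, because the model is linear in each variable. The Minitab output reports p = .268 for Deg, so the graduate degree effect is not significant at the .05 level in this sample.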