Chapter 13 · Chapter 13 Multiple Regression and Model Building. Multiple Regression Models The...

Chapter 13 Multiple Regression and Model Building


Page 1:

Chapter 13

Multiple Regression and Model Building

Page 2:

Multiple Regression Models

The General Multiple Regression Model

$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \varepsilon$

where

$y$ is the dependent variable

$x_1, x_2, \dots, x_k$ are the independent variables

$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k$ is the deterministic portion of the model

$\beta_i$ determines the contribution of the independent variable $x_i$

Page 3:

Multiple Regression Models

Analyzing a Multiple Regression Model

1. Hypothesize the deterministic component of the model

2. Use sample data to estimate β0,β1,β2,… βk

3. Specify probability distribution of ε and estimate σ

4. Check that assumptions on ε are satisfied

5. Statistically evaluate model usefulness

6. If the model is useful, use it for prediction, estimation, and other purposes

Page 4:

The First-Order Model: Estimating and Interpreting the β-Parameters

For

$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5$

the chosen fitted model

$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \dots + \hat{\beta}_k x_k$

minimizes

$SSE = \sum (y - \hat{y})^2$
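The least-squares idea above can be sketched in a few lines of plain Python. This is only an illustration, not the method used by any particular statistics package: the function names `fit_least_squares` and `sse` and the small data set are made up here, and the fit is obtained by solving the normal equations (X'X)b = X'y with Gaussian elimination.

```python
def fit_least_squares(rows, y):
    """Return [b0, b1, ..., bk] minimizing SSE = sum((y - y_hat)^2).

    rows: list of observations, each a list [x1, ..., xk]
    y:    list of observed responses
    Solves the normal equations (X'X) b = X'y by Gauss-Jordan elimination.
    """
    X = [[1.0] + [float(v) for v in r] for r in rows]  # leading 1 for the intercept
    p = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(p)]
    for col in range(p):
        # Partial pivoting: bring the largest remaining entry to the diagonal
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        if abs(A[col][col]) < 1e-12:
            raise ValueError("X'X is singular; parameters are not estimable")
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [arj - f * acj for arj, acj in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


def sse(rows, y, b):
    """Sum of squared errors for coefficients b = [b0, b1, ..., bk]."""
    return sum((yi - (b[0] + sum(bj * xj for bj, xj in zip(b[1:], r)))) ** 2
               for r, yi in zip(rows, y))


# Made-up data generated exactly from y = 1 + 2*x1 + 3*x2 (no error term)
rows = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1], [1, 3]]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]
b = fit_least_squares(rows, y)
print([round(v, 6) for v in b], round(sse(rows, y, b), 6))
```

Because the made-up responses contain no random error, the fitted coefficients should reproduce the generating values and SSE should be essentially zero; with real data SSE is positive and is the quantity the fit makes as small as possible.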

Page 5:

The First-Order Model: Estimating and Interpreting the β-Parameters

y = β0 + β1x1 + β2x2 + β3x3 + ε

where

y = Sales price (dollars)

x1 = Appraised land value (dollars)

x2 = Appraised improvements (dollars)

x3 = Area (square feet)

Page 6:

The First-Order Model: Estimating and Interpreting the β-Parameters

Plot of data for sample size n=20

Page 7:

The First-Order Model: Estimating and Interpreting the β-Parameters

Fit model to data

Page 8:

The First-Order Model: Estimating and Interpreting the β-Parameters

Interpret β estimates

$\hat{\beta}_1 = .8145$

E(y), the mean sale price of the property, is estimated to increase .8145 dollars for every $1 increase in appraised land value, holding other variables constant

$\hat{\beta}_2 = .8204$

E(y), the mean sale price of the property, is estimated to increase .8204 dollars for every $1 increase in appraised improvements, holding other variables constant

$\hat{\beta}_3 = 13.53$

E(y), the mean sale price of the property, is estimated to increase 13.53 dollars for each additional square foot of living area, holding other variables constant
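As a rough illustration of the "holding other variables constant" reading, the sketch below applies the three slope estimates quoted above to hypothetical changes in the x's. The function name and the comparison scenarios are made up for this example; the intercept is not needed because only differences in the estimated mean are computed.

```python
# Slope estimates quoted on the slide (dollars per unit of each x)
b1_land = 0.8145          # per $1 of appraised land value
b2_improvements = 0.8204  # per $1 of appraised improvements
b3_area = 13.53           # per square foot of living area


def change_in_mean_price(d_land=0.0, d_improvements=0.0, d_area=0.0):
    """Estimated change in E(y) for the given changes in x1, x2, x3."""
    return b1_land * d_land + b2_improvements * d_improvements + b3_area * d_area


# Hypothetical comparison: 100 more square feet, same appraised values
extra_area = change_in_mean_price(d_area=100)
# Hypothetical comparison: $1,000 more appraised land value, all else equal
extra_land = change_in_mean_price(d_land=1000)
print(round(extra_area, 2), round(extra_land, 2))
```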

Page 9:

The First-Order Model: Estimating and Interpreting the β-Parameters

Given the model E(y) = 1 + 2x1 + x2, the effect of x2 on E(y), holding x1 constant, is an increase of 1 in E(y) for every 1-unit increase in x2 (the coefficient of x2 is 1)


Page 11:

Model Assumptions

Assumptions about Random Error ε

1. For any given set of values of x1, x2, …, xk, the random error has a normal probability distribution with mean 0 and variance σ²

2. The random errors are independent

Estimator of σ² for a Multiple Regression Model with k Independent Variables

$s^2 = \dfrac{SSE}{n - (\text{number of estimated } \beta \text{ parameters})} = \dfrac{SSE}{n - (k+1)}$
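A minimal sketch of the estimator above, with hypothetical numbers (the SSE, n, and k values are invented for illustration and the function name is not from the slides):

```python
import math


def estimate_error_variance(sse, n, k):
    """Return (s2, s) where s2 = SSE / (n - (k + 1))."""
    df = n - (k + 1)  # degrees of freedom for error
    if df <= 0:
        raise ValueError("need more observations than estimated beta parameters")
    s2 = sse / df
    return s2, math.sqrt(s2)


# Hypothetical numbers: SSE = 320 from a model with k = 3 x's fit to n = 20 points
s2, s = estimate_error_variance(sse=320.0, n=20, k=3)
print(s2, round(s, 4))
```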

Page 12:

Inferences about the β-Parameters

Two types of inferences can be made, using either confidence intervals or hypothesis testing

For any inferences to be made, the assumptions made about the random error term ε (normal distribution with mean 0 and variance σ², independence of errors) must be met

Page 13:

Inferences about the β-Parameters

A 100(1-α)% Confidence Interval for a β-Parameter

$\hat{\beta}_i \pm t_{\alpha/2}\, s_{\hat{\beta}_i}$

where tα/2 is based on n-(k+1) degrees of freedom and

n = Number of observations

k+1 = Number of β parameters in the model
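The interval formula can be sketched as follows. The standard error 6.59 is an assumed value chosen for illustration, not taken from the slides, and the critical value is passed in from a t table rather than computed, to keep the sketch free of external libraries.

```python
def beta_confidence_interval(beta_hat, s_beta_hat, t_crit):
    """Return (lower, upper) for beta_hat ± t_crit * s_beta_hat.

    t_crit is t_{alpha/2} with n - (k + 1) degrees of freedom, looked up
    in a t table or software; it is passed in rather than computed here.
    """
    half_width = t_crit * s_beta_hat
    return beta_hat - half_width, beta_hat + half_width


# Hypothetical: estimate 13.53 with an assumed standard error of 6.59,
# n = 20 and k = 3, so df = 16 and t_.025 = 2.120 for a 95% interval
lower, upper = beta_confidence_interval(13.53, 6.59, 2.120)
print(round(lower, 2), round(upper, 2))
```

Under these assumed numbers the interval would include 0, which is the kind of result that would leave the contribution of that x in doubt at the 95% level.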

Page 14:

Inferences about the β-Parameters

A Test of an Individual Parameter Coefficient

One-Tailed Test

H0: βi = 0

Ha: βi < 0 (or Ha: βi > 0)

Rejection region: t < -tα (or t > tα when Ha: βi > 0)

Two-Tailed Test

H0: βi = 0

Ha: βi ≠ 0

Rejection region: |t| > tα/2

Test statistic: $t = \dfrac{\hat{\beta}_i}{s_{\hat{\beta}_i}}$

where tα and tα/2 are based on n-(k+1) degrees of freedom
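The rejection rules above can be written as a small helper. This is a sketch under stated assumptions: the estimate, standard error, and the function name `t_test_beta` are hypothetical, and critical values come from a t table with 16 degrees of freedom.

```python
def t_test_beta(beta_hat, s_beta_hat, t_crit, alternative="two-sided"):
    """Return (t, reject) for H0: beta_i = 0.

    alternative: "two-sided" (Ha: beta_i != 0; t_crit is t_{alpha/2}),
                 "less"      (Ha: beta_i < 0;  t_crit is t_alpha),
                 "greater"   (Ha: beta_i > 0;  t_crit is t_alpha).
    """
    t = beta_hat / s_beta_hat
    if alternative == "two-sided":
        reject = abs(t) > t_crit
    elif alternative == "less":
        reject = t < -t_crit
    elif alternative == "greater":
        reject = t > t_crit
    else:
        raise ValueError("unknown alternative")
    return t, reject


# Hypothetical: beta_hat = 0.82, standard error 0.21, df = 16, alpha = .05
t_two, reject_two = t_test_beta(0.82, 0.21, 2.120)            # two-tailed
t_up, reject_up = t_test_beta(0.82, 0.21, 1.746, "greater")   # upper-tailed
print(round(t_two, 3), reject_two, reject_up)
```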

Page 15:

Inferences about the β-Parameters

An Excel Analysis

[Excel regression output with two callouts: "Use for confidence intervals" and "Use for hypotheses about parameter coefficients"]

Page 16:

Checking the Overall Utility of a Model

3 tests:

1. Multiple coefficient of determination R²

2. Adjusted multiple coefficient of determination

3. Global F-test

$R^2 = 1 - \dfrac{SSE}{SS_{yy}} = \dfrac{SS_{yy} - SSE}{SS_{yy}} = \dfrac{\text{Explained variability}}{\text{Total variability}}$

$R_a^2 = 1 - \left[\dfrac{n-1}{n-(k+1)}\right]\left(\dfrac{SSE}{SS_{yy}}\right) = 1 - \left[\dfrac{n-1}{n-(k+1)}\right]\left(1 - R^2\right)$

Test statistic: $F = \dfrac{(SS_{yy} - SSE)/k}{SSE/[n-(k+1)]} = \dfrac{R^2/k}{(1-R^2)/[n-(k+1)]}$
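The three quantities can be computed directly from SSE and SS_yy. The sketch below uses invented sums of squares and a made-up function name, and also checks that the two forms of the F statistic agree.

```python
def model_utility(sse, ss_yy, n, k):
    """Return (R2, adjusted R2, F) for a model with k terms fit to n points."""
    df_error = n - (k + 1)
    r2 = 1 - sse / ss_yy
    r2_adj = 1 - ((n - 1) / df_error) * (sse / ss_yy)
    f_stat = ((ss_yy - sse) / k) / (sse / df_error)
    return r2, r2_adj, f_stat


# Hypothetical sums of squares: SS_yy = 1000, SSE = 200, n = 20, k = 3
r2, r2_adj, f_stat = model_utility(sse=200.0, ss_yy=1000.0, n=20, k=3)
# The F statistic can equally be written in terms of R2
f_from_r2 = (r2 / 3) / ((1 - r2) / (20 - (3 + 1)))
print(round(r2, 4), round(r2_adj, 4), round(f_stat, 4), round(f_from_r2, 4))
```

The adjusted value is never larger than R², because it penalizes the number of β parameters relative to the sample size.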

Page 17:

Checking the Overall Utility of a Model

Testing Global Usefulness of the Model: The Analysis of Variance F-test

H0: β1 = β2 = ... = βk = 0

Ha: At least one βi ≠ 0

Test statistic: $F = \dfrac{(SS_{yy} - SSE)/k}{SSE/[n-(k+1)]} = \dfrac{R^2/k}{(1-R^2)/[n-(k+1)]} = \dfrac{\text{Mean Square (Model)}}{\text{Mean Square (Error)}}$

where n is the sample size and k is the number of terms in the model

Rejection region: F > Fα, with k numerator degrees of freedom and [n-(k+1)] denominator degrees of freedom

Page 18:

Checking the Overall Utility of a Model

Checking the Utility of a Multiple Regression Model

1. Conduct a test of overall model adequacy using the F-test. If H0 is rejected, proceed to step 2

2. Conduct t-tests on β parameters of particular interest

Page 19:

Using the Model for Estimation and Prediction

As in Simple Linear Regression, a prediction interval for an individual value of y will be wider than a confidence interval for the mean E(y) at the same x values

Most statistics packages will print out both confidence and prediction intervals

Page 20:

Model Building: Interaction Models

An Interaction Model Relating E(y) to Two Quantitative Independent Variables

$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2$

where

$(\beta_1 + \beta_3 x_2)$ represents the change in E(y) for every 1-unit increase in x1, holding x2 fixed

$(\beta_2 + \beta_3 x_1)$ represents the change in E(y) for every 1-unit increase in x2, holding x1 fixed
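To see why the slope in x1 depends on x2 in an interaction model, the sketch below compares the formula β1 + β3x2 with a direct 1-unit difference in E(y). The coefficient values are hypothetical and chosen only for illustration.

```python
def mean_response(x1, x2, b0, b1, b2, b3):
    """E(y) = b0 + b1*x1 + b2*x2 + b3*x1*x2."""
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2


def slope_in_x1(x2, b1, b3):
    """Change in E(y) per 1-unit increase in x1 at a fixed x2: b1 + b3*x2."""
    return b1 + b3 * x2


# Hypothetical coefficients chosen only for illustration
b0, b1, b2, b3 = 1.0, 2.0, -1.0, 0.5
for x2 in (0, 2, 4):
    direct = mean_response(3, x2, b0, b1, b2, b3) - mean_response(2, x2, b0, b1, b2, b3)
    print(x2, slope_in_x1(x2, b1, b3), direct)
```

Each printed row shows the same value twice: the slope from the formula and the difference computed directly, and that value changes as x2 changes.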

Page 21:

Model Building: Interaction Models

When the relationship between y and xi is not impacted by a second x (no interaction)

When the linear relationship between y and xi depends on another x (interaction)

Page 22:

Model Building: Interaction Models

Page 23:

Model Building: Quadratic and other Higher-Order Models

A Quadratic (Second-Order) Model

$E(y) = \beta_0 + \beta_1 x + \beta_2 x^2$

where

$\beta_0$ is the y-intercept of the curve

$\beta_1$ is a shift parameter

$\beta_2$ is the rate of curvature

Page 24:

Model Building: Quadratic and other Higher-Order Models

Home Size-Electrical Usage Data

Size of Home, x (sq. ft.)    Monthly Usage, y (kilowatt-hours)
1,290                        1,182
1,350                        1,172
1,470                        1,264
1,600                        1,493
1,710                        1,571
1,840                        1,711
1,980                        1,804
2,230                        1,840
2,400                        1,95
2,930                        1,954

Page 25:

Model Building: Quadratic and other Higher-Order Models

$\hat{y} = -1{,}216.1 + 2.3989x - .00045x^2$
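The fitted curve can be evaluated at a few of the tabulated home sizes. This sketch assumes the equation reads ŷ = -1,216.1 + 2.3989x - .00045x²; the signs did not survive in the transcript and are an assumption here, chosen because they give predictions close to the observed usage values. The function name is made up.

```python
def predicted_usage(size_sq_ft):
    """Fitted quadratic; the minus signs are assumed (lost in the transcript)."""
    return -1216.1 + 2.3989 * size_sq_ft - 0.00045 * size_sq_ft ** 2


# Home size at which the fitted curve peaks: x = -b1 / (2 * b2)
peak_size = 2.3989 / (2 * 0.00045)

for size in (1290, 1980, 2930):
    print(size, round(predicted_usage(size), 1))
print(round(peak_size, 1))
```

With a negative coefficient on x², predicted usage rises with home size at a decreasing rate, which matches the leveling-off seen in the larger homes in the data.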

Page 26:

Model Building: Quadratic and other Higher-Order Models

A Complete Second-Order Model with Two Quantitative Independent Variables

$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \beta_4 x_1^2 + \beta_5 x_2^2$

where

$\beta_0$ is the y-intercept, the value of E(y) when x1 = x2 = 0

Changes in $\beta_1, \beta_2$ cause the surface to shift along the x1 and x2 axes

$\beta_3$ controls the rotation of the surface

$\beta_4, \beta_5$ control the type of surface and the rates of curvature

Page 27:

Model Building: Quadratic and other Higher-Order Models

Page 28:

Model Building: Qualitative (Dummy) Variable Models

Dummy variables – coded, qualitative variables

•Codes are in the form of (1, 0), 1 being the presence of a condition, 0 the absence

•Create dummy variables so that there is one less dummy variable than categories of the qualitative variable of interest

Gender dummy variable coded as x = 1 if male, x = 0 if female

If the model is E(y) = β0 + β1x, β1 captures the effect of being male on the dependent variable (the difference between the mean of y for males and the mean of y for females)
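The coding rule above (one less dummy variable than categories) might be implemented along these lines. The helper name `dummy_code`, the choice of base level, and the three-category example are all hypothetical.

```python
def dummy_code(values, base_level):
    """Code a qualitative variable with c levels as c - 1 (1, 0) dummies.

    base_level is the category represented by all dummies equal to 0.
    Returns (dummy_names, rows), one row of 0/1 codes per observation.
    """
    levels = sorted(set(values))
    if base_level not in levels:
        raise ValueError("base_level must be one of the observed categories")
    coded_levels = [lv for lv in levels if lv != base_level]
    rows = [[1 if v == lv else 0 for lv in coded_levels] for v in values]
    return coded_levels, rows


# Two categories -> one dummy (x = 1 if male, x = 0 if female)
names, rows = dummy_code(["male", "female", "female", "male"], base_level="female")
print(names, rows)

# Hypothetical three-category variable -> two dummies
names3, rows3 = dummy_code(["A", "B", "C", "A"], base_level="A")
print(names3, rows3)
```

The base level gets no dummy of its own: its mean is carried by β0, and each dummy's β measures the difference from that base.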

Page 29:

Model Building: Models with both Quantitative and Qualitative Variables

Start with a first-order model with one quantitative variable, E(y) = β0 + β1x

Adding a qualitative variable with no interaction (x2 and x3 are dummy variables, so the qualitative variable has three categories),

E(y) = β0 + β1x1 + β2x2 + β3x3

Page 30:

Model Building: Models with both Quantitative and Qualitative Variables

Adding interaction terms,

E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x1x2 + β5x1x3

Main effect, x1: β1x1

Main effect, x2 and x3: β2x2 + β3x3

Interaction: β4x1x2 + β5x1x3

Page 31:

Model Building: Comparing Nested Models

Models are nested if one model contains all the terms of the other model and at least one additional term.

Complete (full) model – the more complex model

Reduced model – the simpler model

Page 32:

Model Building: Comparing Nested Models

Complete (full) model: $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \beta_4 x_1^2 + \beta_5 x_2^2$

Reduced model: $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2$

Page 33:

Model Building: Comparing Nested Models

F-Test for Comparing Nested Models

Reduced model: $E(y) = \beta_0 + \beta_1 x_1 + \dots + \beta_g x_g$

Complete model: $E(y) = \beta_0 + \beta_1 x_1 + \dots + \beta_g x_g + \beta_{g+1} x_{g+1} + \dots + \beta_k x_k$

H0: βg+1 = βg+2 = ... = βk = 0

Ha: At least one β under test is nonzero.

Test statistic: $F = \dfrac{(SSE_R - SSE_C)/(k-g)}{SSE_C/[n-(k+1)]} = \dfrac{(SSE_R - SSE_C)/(\text{number of } \beta\text{'s tested in } H_0)}{MSE_C}$

Rejection region: F > Fα, with k-g numerator degrees of freedom and [n-(k+1)] denominator degrees of freedom
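A small sketch of the nested-model F statistic, using invented SSE values for a reduced interaction model (g = 3 terms) and a complete second-order model (k = 5 terms). The function name is made up, and the resulting F would still need to be compared with a tabled Fα value.

```python
def nested_f_statistic(sse_reduced, sse_complete, n, k, g):
    """F for H0: beta_{g+1} = ... = beta_k = 0.

    k: number of terms in the complete model; g: number in the reduced model.
    Returns (F, numerator df, denominator df).
    """
    df_num = k - g                 # number of betas tested in H0
    df_den = n - (k + 1)           # error df of the complete model
    mse_complete = sse_complete / df_den
    f_stat = ((sse_reduced - sse_complete) / df_num) / mse_complete
    return f_stat, df_num, df_den


# Hypothetical: SSE_R = 500 (g = 3 terms), SSE_C = 300 (k = 5 terms), n = 30
f_stat, df_num, df_den = nested_f_statistic(500.0, 300.0, n=30, k=5, g=3)
print(round(f_stat, 3), df_num, df_den)
```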

Page 34:

Model Building: Stepwise Regression

Used when there is a large set of independent variables

Software packages will add in variables in order of explanatory value.

Decisions are based on the largest t-values at each step

Procedure is best used as a screening procedure only

Page 35:

Residual Analysis: Checking the Regression Assumptions

Regression Residual – the difference between an observed y value and its corresponding predicted value

$\hat{\varepsilon} = y - \hat{y}$

Properties of Regression Residuals

•The mean of the residuals equals zero

•The standard deviation of the residuals is equal to the standard deviation of the fitted regression model
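The residual properties listed above can be checked numerically. To stay short, this sketch uses a one-predictor model (k = 1) with made-up data rather than a full multiple regression; the residual definition and the zero-mean property carry over when the model has an intercept.

```python
import math


def fit_line(x, y):
    """Least-squares intercept and slope for one predictor (k = 1)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = ss_xy / ss_xx
    return y_bar - b1 * x_bar, b1


# Made-up data
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]

b0, b1 = fit_line(x, y)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]  # observed minus predicted

mean_residual = sum(residuals) / len(residuals)
sse = sum(e ** 2 for e in residuals)
s = math.sqrt(sse / (len(x) - (1 + 1)))  # s = sqrt(SSE / (n - (k + 1))), k = 1
print(round(mean_residual, 10), round(s, 4))
```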

Page 36:

Residual Analysis: Checking the Regression Assumptions

Analyzing Residuals

Top plot of residuals reveals a non-random pattern with a curved shape

Second plot, based on a second-order term being added to the model, results in a random pattern, indicating a better model

Page 37:

Residual Analysis: Checking the Regression Assumptions

Identifying Outliers

Residual plots can reveal outliers

Outliers need to be checked to try to determine if error is involved

If error is involved, or the observation is not representative, the analysis can be rerun after deleting the data point to assess the effect.

[Residual plot with one point labeled "Outlier"]

Page 38:

Residual Analysis: Checking the Regression Assumptions

Checking for Normal Errors

[Figure panels: With Outlier, Without Outlier]

Page 39:

Residual Analysis: Checking the Regression Assumptions

Checking for Equal Variances

A pattern in the residuals indicates a violation of the equal variance assumption

Can point to the use of a transformation on the dependent variable to stabilize the variance

Page 40:

Residual Analysis: Checking the Regression Assumptions

Steps in Residual Analysis

1. Check for a misspecified model by plotting residuals against quantitative independent variables

2. Examine residual plots for outliers

3. Check for non-normal errors using a frequency distribution of the residuals

4. Check for unequal error variances using plots of residuals against predicted values

Page 41:

Some Pitfalls: Estimability, Multicollinearity, and Extrapolation

Estimability – the number of levels of observed x-values must be at least one more than the order of the polynomial in x that you want to fit (for example, fitting a quadratic requires at least three different x-values)

Multicollinearity – when two or more independent variables are correlated

Page 42:

Some Pitfalls: Estimability, Multicollinearity, and Extrapolation

Multicollinearity – when two or more independent variables are correlated

Leads to confusing, misleading results and incorrect parameter estimate signs.

Can be identified by

–checking correlations among x’s

–non-significant t-tests for most/all x’s

–signs opposite from expected in the estimated β parameters

Can be addressed by

–Dropping one or more of the correlated variables in the model

–Restricting inferences to the range of the sample data, and not making inferences about individual β parameters based on t-tests.

Page 43:

Some Pitfalls: Estimability, Multicollinearity, and Extrapolation

Extrapolation – use of the model to predict outside the range of the sample data is dangerous

Correlated Errors – most common when working with time series data, values of y and x’s observed over a period of time. Solution is to develop a time series model.