Transcript of: Objectives of Multiple Regression (people.umass.edu/biep640w/pdf/Alex Trindade Multiple...15-1...)

15-1

Objectives of Multiple Regression

• Establish the linear equation that best predicts values of a dependent variable Y using more than one explanatory variable from a large set of potential predictors x1, x2, ... xk.

• Find that subset of all possible predictor variables that explains a significant and appreciable proportion of the variance of Y, trading off adequacy of prediction against the cost of measuring more predictor variables.

15-2

Expanding Simple Linear Regression

• Quadratic model.

Y = β0 + β1x1 + β2x1^2 + ε

• General polynomial model.

Y = β0 + β1x1 + β2x1^2 + β3x1^3 + ... + βkx1^k + ε

Any independent variable, xi, which appears in the polynomial regression model as xi^k is called a kth-degree term.

Adding one or more polynomial terms to the model.
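A minimal sketch of fitting the quadratic model, with made-up data and assuming numpy is available; the x1^2 column is simply appended to the design matrix before ordinary least squares.

```python
import numpy as np

# Illustrative (hypothetical) data with curvature in x1.
rng = np.random.default_rng(0)
x1 = np.linspace(0, 10, 40)
y = 2.0 + 1.5 * x1 - 0.1 * x1**2 + rng.normal(scale=0.5, size=x1.size)

# Design matrix for the quadratic model: Y = b0 + b1*x1 + b2*x1^2 + error
X = np.column_stack([np.ones_like(x1), x1, x1**2])

# Least squares estimates of (b0, b1, b2)
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)  # should be roughly (2.0, 1.5, -0.1)
```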

15-3

Polynomial model shapes.

[Figure: linear and quadratic model fits of y against x.]

Adding one more term to the model significantly improves the model fit.

15-4

Incorporating Additional Predictors

Simple additive multiple regression model

y = β0 + β1x1 + β2x2 + β3x3 + ... + βkxk + ε

Additive (Effect) Assumption – The expected change in y per unit increment in xj is constant and does not depend on the value of any other predictor. This change in y is equal to βj.

15-5

Additive regression models: For two independent variables, the response is modeled as a surface.

15-6

Interpreting Parameter Values (Model Coefficients)

• β0 – “Intercept”: value of y when all predictors are 0.

• β1, β2, β3, ..., βk – “Partial slopes”.

βj describes the expected change in y per unit increment in xj when all other predictors in the model are held at a constant value.
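A minimal sketch of fitting an additive model and reading off the intercept and partial slopes, with made-up data and assuming the statsmodels package is available.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical data generated from y = 1 + 2*x1 - 3*x2 + noise
rng = np.random.default_rng(1)
x1 = rng.normal(size=100)
x2 = rng.normal(size=100)
y = 1 + 2 * x1 - 3 * x2 + rng.normal(scale=0.5, size=100)

# Additive model: y = b0 + b1*x1 + b2*x2 + error
X = sm.add_constant(np.column_stack([x1, x2]))
fit = sm.OLS(y, X).fit()

# fit.params holds [intercept b0, partial slope b1, partial slope b2];
# each slope is the expected change in y per unit change in that x,
# holding the other predictor fixed.
print(fit.params)
```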

15-7

Graphical depiction of βj.

β1 – slope in direction of x1.

β2 – slope in direction of x2.

15-8

Multiple Regression with Interaction Terms

Y = β0 + β1x1 + β2x2 + β3x3 + ... + βkxk + β12x1x2 + β13x1x3 + ... + β1kx1xk + ... + βk-1,kxk-1xk + ε

Cross-product terms quantify the interaction among predictors.

Interactive (Effect) Assumption: The effect of one predictor, xi, on the response, y, will depend on the value of one or more of the other predictors.
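A minimal sketch of adding a cross-product (interaction) column, with made-up data and assuming numpy; the interaction term is simply the elementwise product x1·x2.

```python
import numpy as np

# Hypothetical data in which the effect of x1 depends on x2
rng = np.random.default_rng(2)
x1 = rng.uniform(0, 5, size=200)
x2 = rng.uniform(0, 5, size=200)
y = 1 + 2 * x1 + 0.5 * x2 + 1.5 * x1 * x2 + rng.normal(scale=0.5, size=200)

# Design matrix with the cross-product (interaction) column x1*x2
X = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)  # approximately (1, 2, 0.5, 1.5); the last entry estimates beta_12
```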

15-9

Interpreting Interaction

β1 – No longer the expected change in Y per unit increment in X1!

β12 – No easy interpretation! The effect on y of a unit increment in X1 now depends on X2.

Interaction model:

$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_{12} x_{1i} x_{2i} + \varepsilon_i$$

or, defining $x_{3i} = x_{1i} x_{2i}$:

$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_{12} x_{3i} + \varepsilon_i$$

There is no difference between the two forms.

15-10

[Figure: y plotted against x1 for x2 = 0, 1, 2. With no interaction the three lines are parallel, each with slope β1 and intercepts β0, β0 + β2, β0 + 2β2. With interaction the slopes differ with x2 (for example, β1 + 2β12 at x2 = 2).]

15-11

Multiple Regression models with interaction:

Lines move apart

Lines come together

15-12

Effect of the Interaction Term in Multiple Regression

Surface is twisted.

15-13

A Protocol for Multiple Regression

Identify all possible predictors.

Establish a method for estimating model parameters and their standard errors.

Develop tests to determine if a parameter is equal to zero (i.e. no evidence of association).

Reduce number of predictors appropriately.

Develop predictions and associated standard errors.

15-14

Estimating Model Parameters: Least Squares Estimation

Assume a random sample of n observations (yi, xi1, xi2, ..., xik), i = 1, 2, ..., n. The parameter estimates for the best predicting equation

$$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_{i1} + \hat{\beta}_2 x_{i2} + \cdots + \hat{\beta}_k x_{ik}$$

are found by choosing the values $\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_k$ that minimize the expression

$$SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_{i1} - \beta_2 x_{i2} - \cdots - \beta_k x_{ik})^2$$
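A minimal numpy sketch (hypothetical data) of the same criterion: np.linalg.lstsq returns the β̂'s that minimize SSE, and SSE can then be evaluated at the solution.

```python
import numpy as np

# Hypothetical sample: n observations on (y, x1, x2)
rng = np.random.default_rng(3)
n = 50
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 4 + 1.0 * x1 - 2.0 * x2 + rng.normal(scale=0.3, size=n)

X = np.column_stack([np.ones(n), x1, x2])          # columns: 1, x1, x2
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)    # minimizes the sum of squared errors

y_hat = X @ beta_hat
sse = np.sum((y - y_hat) ** 2)                      # SSE evaluated at the least squares solution
print(beta_hat, sse)
```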

15-15

Normal Equations

Take the partial derivatives of the SSE function with respect to β0, β1, …, βk, and equate each to 0. Solve this system of k+1 equations in k+1 unknowns to obtain the equations for the parameter estimates:

$$\sum_{i=1}^{n} y_i = n\hat{\beta}_0 + \hat{\beta}_1 \sum_{i=1}^{n} x_{i1} + \cdots + \hat{\beta}_k \sum_{i=1}^{n} x_{ik}$$

$$\sum_{i=1}^{n} x_{i1} y_i = \hat{\beta}_0 \sum_{i=1}^{n} x_{i1} + \hat{\beta}_1 \sum_{i=1}^{n} x_{i1}^2 + \cdots + \hat{\beta}_k \sum_{i=1}^{n} x_{i1} x_{ik}$$

$$\vdots$$

$$\sum_{i=1}^{n} x_{ik} y_i = \hat{\beta}_0 \sum_{i=1}^{n} x_{ik} + \hat{\beta}_1 \sum_{i=1}^{n} x_{ik} x_{i1} + \cdots + \hat{\beta}_k \sum_{i=1}^{n} x_{ik}^2$$
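In matrix form the normal equations are XᵀXβ̂ = Xᵀy. A minimal sketch (hypothetical data, numpy assumed) that solves them directly and checks the result against np.linalg.lstsq.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 50
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 4 + 1.0 * x1 - 2.0 * x2 + rng.normal(scale=0.3, size=n)
X = np.column_stack([np.ones(n), x1, x2])

# Normal equations: (X'X) beta_hat = X'y -- k+1 equations in k+1 unknowns
beta_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Same answer as the direct least squares routine
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(beta_normal, beta_lstsq))  # True
```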

15-16

An Overall Measure of How Well the Full Model Performs: the Coefficient of Multiple Determination

• Denoted R².

• Defined as the proportion of the variability in the dependent variable y that is accounted for by the independent variables x1, x2, ..., xk through the regression model.

• With only one independent variable (k = 1), R² = r², the square of the simple correlation coefficient.

15-17

Computing the Coefficient of Determination

$$R^2_{y \cdot x_1 x_2 \cdots x_k} = \frac{SSR}{S_{yy}} = 1 - \frac{SSE}{S_{yy}}, \qquad 0 \le R^2 \le 1$$

$$S_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2 = TSS$$

$$SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} y_i^2 - \hat{\beta}_0 \sum_{i=1}^{n} y_i - \hat{\beta}_1 \sum_{i=1}^{n} x_{i1} y_i - \cdots - \hat{\beta}_k \sum_{i=1}^{n} x_{ik} y_i$$
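A minimal sketch (hypothetical data, numpy assumed) computing R² from the quantities above: Syy as the total sum of squares and SSE from the fitted values.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 60
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 2 + 3 * x1 - x2 + rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta_hat

s_yy = np.sum((y - y.mean()) ** 2)   # total sum of squares (Syy = TSS)
sse = np.sum((y - y_hat) ** 2)       # error sum of squares
r_squared = 1 - sse / s_yy           # R^2 = 1 - SSE/Syy
print(r_squared)
```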

15-18

Multicollinearity

A further assumption in multiple regression (absent in SLR) is that the predictors (x1, x2, ..., xk) are statistically uncorrelated; that is, the predictors do not co-vary. When the predictors are significantly correlated (correlation greater than about 0.6), the multiple regression model is said to suffer from problems of multicollinearity.

[Figure: scatterplots of x2 against x1 for predictor correlations r = 0, r = 0.6, and r = 0.8.]

15-19

Effect of Multicollinearity on the Fitted Surface

[Figure: fitted regression surface for y over (x1, x2) under extreme collinearity between the predictors.]

15-20

Multicollinearity leads to

• Numerical instability in the estimates of the regression parameters – wild fluctuations in these estimates if a few observations are added or removed.

• No longer have simple interpretations for the regression coefficients in the additive model.

Ways to detect multicollinearity

• Scatterplots of the predictor variables.

• Correlation matrix for the predictor variables – the higher these correlations the worse the problem.

• Variance Inflation Factors (VIFs) reported by software packages. Values larger than 10 usually signal a substantial amount of collinearity.

What can be done about multicollinearity

• Regression estimates are still OK, but the resulting confidence/prediction intervals are very wide.

• Choose explanatory variables wisely! (E.g. consider omitting one of two highly correlated variables.)

• More advanced solutions: principal components analysis; ridge regression.
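As a rough illustration of the VIF check listed above, a minimal sketch assuming the statsmodels package is available; the data are made up so that one predictor is nearly a copy of another.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Hypothetical predictors: x3 is strongly correlated with x1
rng = np.random.default_rng(6)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + rng.normal(scale=0.1, size=n)

exog = sm.add_constant(np.column_stack([x1, x2, x3]))
# VIF for each predictor column (skip the constant in column 0)
vifs = [variance_inflation_factor(exog, i) for i in range(1, exog.shape[1])]
print(vifs)  # x1 and x3 should show large VIFs (well above 10), flagging collinearity
```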

15-21

Testing in Multiple Regression

• Testing individual parameters in the model.

• Computing predicted values and associated standard errors.

$$Y = \beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2)$$

Overall AOV F-test

H0: None of the explanatory variables is a significant predictor of Y

$$F = \frac{SSR/k}{SSE/(n-k-1)} = \frac{MSR}{MSE}$$

Reject H0 if: $F > F_{k,\,n-k-1,\,\alpha}$
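A minimal sketch of the overall F test (hypothetical data; scipy assumed available for the F quantile): F = MSR/MSE is compared with the critical value on k and n-k-1 degrees of freedom.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n, k = 80, 2
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1 + 0.8 * x1 + 0.5 * x2 + rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta_hat
sse = np.sum((y - y_hat) ** 2)
ssr = np.sum((y_hat - y.mean()) ** 2)

F = (ssr / k) / (sse / (n - k - 1))        # MSR / MSE
F_crit = stats.f.ppf(0.95, k, n - k - 1)   # F_{k, n-k-1, alpha} with alpha = 0.05
print(F, F_crit, F > F_crit)               # reject H0 if F > F_crit
```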

15-22

Standard Error for Partial Slope Estimate

The estimated standard error for β̂j is

$$s_{\hat{\beta}_j} = \hat{\sigma}_\varepsilon \sqrt{\frac{1}{S_{x_j x_j}\left(1 - R^2_{x_j \cdot x_1 \cdots x_{j-1} x_{j+1} \cdots x_k}\right)}}$$

where

$$\hat{\sigma}_\varepsilon = \sqrt{\frac{SSE}{n-(k+1)}}, \qquad S_{x_j x_j} = \sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2,$$

and $R^2_{x_j \cdot x_1 \cdots x_{j-1} x_{j+1} \cdots x_k}$ is the coefficient of determination for the model with xj as the dependent variable and all other x variables as predictors.

What happens if all the predictors are truly independent of each other?

$$R^2_{x_j \cdot x_1 \cdots x_{j-1} x_{j+1} \cdots x_k} \to 0, \qquad s_{\hat{\beta}_j} \to \frac{\hat{\sigma}_\varepsilon}{\sqrt{S_{x_j x_j}}}$$

What if there is high dependency?

$$R^2_{x_j \cdot x_1 \cdots x_{j-1} x_{j+1} \cdots x_k} \to 1, \qquad s_{\hat{\beta}_j} \to \infty$$

15-23

Confidence Interval

100(1-α)% Confidence Interval for βj:

$$\hat{\beta}_j \pm t_{n-(k+1),\,\alpha/2}\, s_{\hat{\beta}_j}$$

The degrees of freedom, n-(k+1), are the df for SSE and reflect the number of data points minus the number of parameters that have to be estimated.
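A minimal sketch of this interval (hypothetical data; statsmodels and scipy assumed available); it builds β̂j ± t·s_β̂j by hand and compares with the packaged conf_int().

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(8)
n, k = 60, 2
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 2 + 1.2 * x1 - 0.7 * x2 + rng.normal(size=n)

X = sm.add_constant(np.column_stack([x1, x2]))
fit = sm.OLS(y, X).fit()

j = 1                                             # slope on x1
t_crit = stats.t.ppf(1 - 0.05 / 2, n - (k + 1))   # t_{n-(k+1), alpha/2}
ci = (fit.params[j] - t_crit * fit.bse[j],        # beta_hat_j +/- t * s_{beta_hat_j}
      fit.params[j] + t_crit * fit.bse[j])
print(ci)
print(fit.conf_int(alpha=0.05)[j])                # statsmodels' built-in interval matches
```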

15-24

Testing whether a partial slope coefficient is equal to zero.

$$H_0: \beta_j = 0$$

Alternatives:

$$H_a: \beta_j > 0, \qquad \beta_j < 0, \qquad \beta_j \ne 0$$

Test Statistic:

$$t = \frac{\hat{\beta}_j}{s_{\hat{\beta}_j}}$$

Rejection Region (for the three alternatives, respectively):

$$t > t_{n-(k+1),\,\alpha}, \qquad t < -t_{n-(k+1),\,\alpha}, \qquad |t| > t_{n-(k+1),\,\alpha/2}$$
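A minimal sketch of the two-sided test of H0: βj = 0 (hypothetical data; statsmodels and scipy assumed); the same t statistic and p-value are reported by the fitted model as tvalues and pvalues.

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(9)
n, k = 60, 2
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 2 + 1.2 * x1 + 0.0 * x2 + rng.normal(size=n)   # x2 has no real effect

X = sm.add_constant(np.column_stack([x1, x2]))
fit = sm.OLS(y, X).fit()

j = 2                                               # test the slope on x2
t_stat = fit.params[j] / fit.bse[j]                 # t = beta_hat_j / s_{beta_hat_j}
t_crit = stats.t.ppf(1 - 0.05 / 2, n - (k + 1))     # two-sided, alpha = 0.05
p_value = 2 * stats.t.sf(abs(t_stat), n - (k + 1))
print(t_stat, abs(t_stat) > t_crit, p_value)        # compare with fit.tvalues[j], fit.pvalues[j]
```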

15-25

Predicting Y

• We use the least squares fitted value, ŷ, as our predictor of a single value of y at a particular value of the explanatory variables (x1, x2, ..., xk).

• The corresponding interval about the predicted value of y is called a prediction interval.

• The least squares fitted value also provides the best predictor of E(y), the mean value of y, at a particular value of (x1, x2, ..., xk). The corresponding interval for the mean prediction is called a confidence interval.

• Formulas for these intervals are much more complicated than in the case of SLR; they cannot be calculated by hand (see the book).

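The book's formulas are not reproduced here, but as a rough illustration of the two kinds of interval, a minimal statsmodels sketch with made-up data: get_prediction() reports both the confidence interval for E(y) and the wider prediction interval for a single new y.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(10)
n = 80
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 3 + 2 * x1 - x2 + rng.normal(size=n)

X = sm.add_constant(np.column_stack([x1, x2]))
fit = sm.OLS(y, X).fit()

# Intervals at a new point (x1, x2) = (1.0, -0.5)
x_new = np.array([[1.0, 1.0, -0.5]])   # leading 1.0 is the constant term
pred = fit.get_prediction(x_new)
frame = pred.summary_frame(alpha=0.05)
# mean_ci_*: confidence interval for E(y); obs_ci_*: prediction interval for a single y
print(frame[["mean", "mean_ci_lower", "mean_ci_upper", "obs_ci_lower", "obs_ci_upper"]])
```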

15-26

Minimum R² for a “Significant” Regression

Since we have formulas for R² and F in terms of n, k, SSE, and TSS, we can relate these two quantities.

We can then ask: what is the minimum R² that will ensure the regression model is declared significant, as measured by the appropriate quantile from the F distribution?

The answer (below) shows that this depends on n, k, and SSE/TSS.

$$R^2_{\min} = \frac{k}{n-k-1} \cdot \frac{SSE}{TSS} \cdot F_{k,\,n-k-1,\,\alpha}$$
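A minimal sketch evaluating this bound; n, k, α, and the SSE/TSS ratio are made-up inputs, and scipy is assumed available for the F quantile.

```python
from scipy import stats

# Hypothetical inputs: sample size, number of predictors, significance level,
# and the fitted model's SSE/TSS ratio (which equals 1 - R^2).
n, k, alpha = 40, 3, 0.05
sse_over_tss = 0.70

F_crit = stats.f.ppf(1 - alpha, k, n - k - 1)          # F_{k, n-k-1, alpha}
r2_min = (k / (n - k - 1)) * sse_over_tss * F_crit     # minimum R^2 for significance
print(r2_min, (1 - sse_over_tss) > r2_min)             # True -> this fit would be declared significant
```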

15-27

Minimum R² for Simple Linear Regression (k=1)