Transcript of Dr. C. Ertuna, Issues Regarding Regression Models (Lesson - 06/C)

Page 1:

Issues Regarding Regression Models

(Lesson - 06/C)

Page 2:

Collinearity

• A perfect linear relationship between two (or more) independent variables is called collinearity (multicollinearity).

• Under this condition, the least-squares regression coefficients cannot be uniquely defined.

Page 3:

Collinearity

• A strong but less-than-perfect linear relationship between the independent variables can cause:

1. Regression coefficients to become unstable,

2. Standard errors of the coefficients to become large; as a result, confidence intervals for the coefficients become wide and the coefficient estimates imprecise.

Page 4:

Collinearity Measurement

• One of the measures to determine the impact of collinearity on the precision of the estimates is called the “Variance Inflation Factor” (VIF).
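
As an illustration (not from the original slides): for predictor j, VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing x_j on the remaining independent variables. A minimal Python sketch using statsmodels, assuming the predictors sit in a pandas DataFrame named X (a hypothetical name):

    import pandas as pd
    import statsmodels.api as sm
    from statsmodels.stats.outliers_influence import variance_inflation_factor

    def vif_table(X: pd.DataFrame) -> pd.Series:
        """Return the VIF for every predictor column in X."""
        Xc = sm.add_constant(X)  # include an intercept in the auxiliary regressions
        vifs = {col: variance_inflation_factor(Xc.values, i)
                for i, col in enumerate(Xc.columns) if col != "const"}
        return pd.Series(vifs, name="VIF")

A common rule of thumb treats VIF values above roughly 5 to 10 as a sign that collinearity is seriously inflating the variance of the corresponding coefficient estimate.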

Page 5:

Collinearity Detection

• Wrong signs for the coefficients

• Drastic changes in the coefficients in terms of size and/or sign as a new variable is added to the equation.

• High VIF

Page 6:

Collinearity: Remedies

• There is no quick fix for collinearity.

• Some strategies:

1. Variable selection for the model: based on the correlation matrix, some of the highly correlated variables could be excluded from the model,

2. Ridge Regression instead of Ordinary Least Squares (OLS) regression (see the sketch below).
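
As a hedged sketch of strategy 2 (not the slides' own procedure), scikit-learn's Ridge estimator can be used; X and y are hypothetical names for the predictor matrix and the response, and the penalty strength alpha would normally be tuned (for example with RidgeCV):

    from sklearn.linear_model import Ridge
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    # Standardize first, because the ridge penalty is sensitive to variable scale.
    ridge = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
    ridge.fit(X, y)
    print(ridge.named_steps["ridge"].coef_)

Unlike OLS, ridge regression shrinks the coefficients toward zero, trading a small amount of bias for a large reduction in variance when predictors are highly correlated.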

Page 7:

Unusual Data

A single observation that is substantially different from all other observations can make a large difference in the results of your regression analysis. 

If a single observation (or small group of observations) substantially changes your results, you would want to know about this and investigate further. 

There are three ways that an observation can be unusual.

Page 8:

Unusual Data

• Outliers: In linear regression, an outlier is an observation with a large residual. In other words, it is an observation whose dependent-variable value is unusual given its values on the predictor variables. An outlier may indicate a sample peculiarity or may indicate a data entry error or other problem.

Page 9:

Unusual Data

• Leverage: An observation with an extreme value on a predictor variable is called a point with high leverage. Leverage is a measure of how far an independent variable deviates from its mean. These leverage points can have an unusually large effect on the estimate of regression coefficients.

Page 10:

Unusual Data

• Influence: An observation is said to be influential if removing the observation substantially changes the estimate of coefficients. Influence can be thought of as the product of leverage and outlierness.

Page 11:

Outliers and Influential Data

• An outlier is an observation whose dependent variable value is unusual given the value of the independent variable

• Not all outliers have an important effect on the intercept and/or slope of the regression.

• For an outlier to be influential, it should be far from the mean of the independent variable.

Page 12:

Influential Data: Diagnosis

Cook’s D:

• If Cook’s distance for a particular observation is greater than a cutoff point, then that observation could be considered influential.

• One such cutoff point is (a computational sketch follows):
– D_i > 4 / (n - k - 1)
– where k = number of independent variables and n = number of observations
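
A minimal sketch of this screen in Python with statsmodels (not part of the slides); X and y are hypothetical names for the predictors and the response:

    import statsmodels.api as sm

    results = sm.OLS(y, sm.add_constant(X)).fit()
    cooks_d = results.get_influence().cooks_distance[0]  # one D_i per observation

    n = len(y)
    k = X.shape[1]                  # number of independent variables
    cutoff = 4 / (n - k - 1)        # the cutoff quoted above
    flagged = [i for i, d in enumerate(cooks_d) if d > cutoff]
    print("Potentially influential cases:", flagged)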

Page 13:

Influential Data Diagnostics on SPSS

Standardized DfBETA(s):

• The change in a regression coefficient that results from the deletion of the i-th case. A standardized DfBETA value is computed for each case for each regression coefficient generated by the model.

• Cut-off points (a computational sketch follows this list):
– > 0 means case i increases the slope
– < 0 means case i decreases the slope
– |DfBETA(s)| > 2 is a strong indication of influence
– |DfBETA(s)| > 2/sqrt(n) might indicate a problem
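
The slide describes the SPSS output; as an illustration only, the same standardized DfBETA values can be obtained in Python with statsmodels (X and y are hypothetical names, as before):

    import numpy as np
    import statsmodels.api as sm

    results = sm.OLS(y, sm.add_constant(X)).fit()
    dfbetas = results.get_influence().dfbetas   # one row per case, one column per coefficient

    threshold = 2 / np.sqrt(len(y))             # the size-adjusted cutoff from the slide
    rows, cols = np.where(np.abs(dfbetas) > threshold)
    for i, j in zip(rows, cols):
        print(f"case {i} may unduly influence coefficient {j}")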

Page 14:

Influential Data: Remedies

• The unusual data need to be investigated:
– For example, they may stem from an error in data entry.

• The model could be re-specified, or robust estimation methods could be used.

• An influential observation should be discarded only if it is truly bad data that cannot be corrected.

Page 15:

Checking the Assumptions

• There are assumptions that need to be met to accept the results of Regression analysis and use the model for future decision making:

• Linearity

• Independence of errors (No autocorrelation),

• Normality of errors,

• Constant Variance of errors (Homoscedasticity).

Page 16:

Tests for Linearity

Linearity:

• Plot the dependent variable against each of the independent variables separately (a plotting sketch follows this list).

• Decide whether linear regression is a “reasonable” description of the tendency in the data:
– Consider curvilinear patterns,
– Consider undue influence of one data point on the regression line, etc.
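
A small plotting sketch for this check (illustrative only; X is a hypothetical pandas DataFrame of predictors and y the dependent variable):

    import matplotlib.pyplot as plt

    # One scatter plot of the dependent variable against each predictor.
    fig, axes = plt.subplots(1, X.shape[1], figsize=(4 * X.shape[1], 3), squeeze=False)
    for ax, col in zip(axes[0], X.columns):
        ax.scatter(X[col], y, s=12)
        ax.set_xlabel(col)
        ax.set_ylabel("y")
    plt.tight_layout()
    plt.show()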

Page 17:

Nonlinear Relationships

[Figure: Diminishing Returns Relationship of Advertising versus Sales; Sales (y-axis) vs. Advertising (x-axis)]


Page 19:

Analysis of Residuals

[Figure: residual plots showing (a) a Nonlinear Pattern and (b) a Linear Pattern]

Page 20:

Tests for Independence

Independence of Errors:

• Plot residuals against time (Residual-Time Plot):
– Residuals form the y-axis, time forms the x-axis.
– If the residuals group alternately into positive and negative clusters, that indicates autocorrelation.

• Ljung-Box Test (note that only the one-lag version is applied here).

Page 21:

Residuals-Time Plot

• Notice the tendency of the residuals to group alternately into positive and negative clusters.

• That is an indication that the residuals are not independent but auto-correlated.

[Figure: Residual-Time Plot; residuals (y-axis) vs. time (x-axis)]

Page 22:

Analysis of Residuals

[Figure: residual-time plots showing (a) Independent Residuals and (b) Residuals Not Independent]

Page 23:

Ljung-Box Test

• Compute the LB test statistic for one lag, Q(1) (a Python sketch follows this list):
– Q(1) = (n(n+2) / (n-1)) * CORREL(Data_Range_1, Data_Range_2)^2
– where Data_Range_1 is the residual series and Data_Range_2 is the same series lagged by one period.

• Compare Q(1) against the chi-square critical value:
– =CHIINV(alpha, 1)

• Ho: no autocorrelation; fail to reject Ho if Q(1) is less than the chi-square critical value.
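
For comparison (outside the slides), the one-lag test can be reproduced in Python; residuals is a hypothetical array of regression residuals:

    import numpy as np
    from scipy import stats
    from statsmodels.stats.diagnostic import acorr_ljungbox

    # Library version: Q statistic and p-value for lag 1.
    print(acorr_ljungbox(residuals, lags=[1]))

    # Hand-rolled version mirroring the spreadsheet recipe above.
    r = np.asarray(residuals, dtype=float)
    n = len(r)
    r1 = np.corrcoef(r[:-1], r[1:])[0, 1]         # lag-1 autocorrelation (CORREL of the series and its lag)
    q1 = n * (n + 2) * r1 ** 2 / (n - 1)
    p_value = stats.chi2.sf(q1, df=1)             # upper-tail chi-square with 1 degree of freedom
    print(q1, p_value)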

Page 24:

Non-Independence: Remedies

• EGLS (Estimated Generalized Least Squares) methods:
– Prais-Winsten
– Cochrane-Orcutt

(Note that these are effective only for first-order autocorrelation.)

Page 25:

Tests for Normality

Normality of Errors:

• Normal-Quantile Plot of Residuals (Errors)

• Compute Skewness

• Compute Kurtosis

• Jarque-Bera Test

Page 26:

Normal-Quantile Plot of Residuals

• Sort Residuals (min => max)

• Create a Rank column

• Compute z-scores:
=NORMINV((rank - 0.5)/N, 0, 1)

• Plot z-scores (x) and residuals (y)

• For normality the plot should be reasonably linear.

[Figure: Normal Quantile Plot of Errors; errors (y-axis) vs. normal quantiles (x-axis)]
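
The same recipe outside a spreadsheet, as an illustrative Python sketch (residuals is a hypothetical array of regression residuals):

    import numpy as np
    from scipy import stats
    import matplotlib.pyplot as plt

    res = np.sort(np.asarray(residuals, dtype=float))   # sort residuals (min => max)
    n = len(res)
    ranks = np.arange(1, n + 1)                          # rank column
    z = stats.norm.ppf((ranks - 0.5) / n)                # same as =NORMINV((rank-0.5)/N, 0, 1)

    plt.scatter(z, res, s=12)                            # z-scores (x) vs. residuals (y)
    plt.xlabel("Normal Quantiles")
    plt.ylabel("Errors")
    plt.title("Normal Quantile Plot of Errors")
    plt.show()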

Page 27:

Jarque-Bera Test (in Excel)

• Compute the JB test statistic (a Python sketch follows this list):
– JB = (n/6)*SKEW(Data_Range)^2 + (n/24)*KURT(Data_Range)^2
– (Excel’s KURT returns excess kurtosis, which is what this form of the formula requires.)

• Compute the p-value by using the formula:
– =CHIDIST(JB, 2)

• Ho: the data are normally distributed.
– Note that JB is very sensitive to sample size, and its p-values are not uniformly distributed; hence there is a danger of committing a Type I error.
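
An illustrative Python check (not from the slides), computing JB by hand with excess kurtosis, as on the slide, and then with the scipy routine; residuals is a hypothetical array:

    import numpy as np
    from scipy import stats

    r = np.asarray(residuals, dtype=float)
    n = len(r)
    skew = stats.skew(r)            # sample skewness
    ex_kurt = stats.kurtosis(r)     # excess kurtosis (0 for a normal distribution)

    jb = (n / 6) * skew ** 2 + (n / 24) * ex_kurt ** 2
    p_value = stats.chi2.sf(jb, df=2)   # same as =CHIDIST(JB, 2)
    print(jb, p_value)

    # Library version for comparison:
    print(stats.jarque_bera(r))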

Page 28:

Non-Normality: Remedies

One of the most frequently used techniques for addressing non-normal errors (and stabilizing the error variance) is data transformation.

• The X and/or Y values could be transformed by raising those variables to a power,

• y (or x) => y^p (or x^p),

where p = -2, -1, -½, ½, 2, 3 (a small sketch follows).
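
A minimal sketch of trying this ladder of powers on y and re-checking residual normality (illustrative only; X and y are hypothetical names, and the code assumes y > 0 so that every power is defined):

    import numpy as np
    import statsmodels.api as sm
    from scipy import stats

    powers = [-2, -1, -0.5, 0.5, 2, 3]      # the ladder of powers listed above
    Xc = sm.add_constant(X)
    for p in powers:
        y_t = np.power(y, p)                # power transformation of the dependent variable
        resid = sm.OLS(y_t, Xc).fit().resid
        jb = stats.jarque_bera(resid)       # re-test normality of the residuals
        print(f"p = {p:>4}: JB = {jb.statistic:.2f}, p-value = {jb.pvalue:.3f}")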

Page 29:

Tests for Constant Variance

Constant Variance of Errors:

• Plot residuals against the estimated y-values (a plotting sketch follows this list):
– Residuals form the y-axis and the estimated y-values form the x-axis.
– If the errors get larger (or smaller) as the estimated y-values increase, that indicates non-constant variance.

• Plot residuals against each x:
– Residuals form the y-axis and the x-values form the x-axis.
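
A small plotting sketch for the first check (illustrative only; results is a hypothetical fitted statsmodels OLS result):

    import matplotlib.pyplot as plt

    plt.scatter(results.fittedvalues, results.resid, s=12)  # estimated y (x-axis) vs. residuals (y-axis)
    plt.axhline(0, linewidth=1)
    plt.xlabel("Estimated y")
    plt.ylabel("Residuals")
    plt.title("Residuals vs. Estimated y")
    plt.show()

A fan or funnel shape in this plot is the visual signature of non-constant variance.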

Page 30:

Analysis of Residuals

[Figure: residuals vs. x1, panel (a) Variance Decreases as x Increases]

Page 31:

Analysis of Residuals

[Figure: residuals vs. x1, panel (b) Variance Increases as x Increases]

Page 32:

Analysis of Residuals

[Figure: residuals vs. x1, panel (c) Constant Variance]

Page 33:

Non-Constant Variance: Remedies

• Transform the dependent variable (y):
– y => y^p, where p = -2, -1, -½, ½, 2, 3

• Weighted Least Squares (WLS) regression (a sketch follows).
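
A hedged sketch of the WLS remedy with statsmodels (X and y are hypothetical names; the weights shown are only one possible choice and should reflect how the error variance actually changes):

    import numpy as np
    import statsmodels.api as sm

    Xc = sm.add_constant(X)
    ols = sm.OLS(y, Xc).fit()

    # Weight each case by the inverse of a rough variance proxy,
    # here the absolute fitted values from an initial OLS fit.
    w = 1.0 / np.maximum(np.abs(ols.fittedvalues), 1e-8)
    wls = sm.WLS(y, Xc, weights=w).fit()
    print(wls.params)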

Page 34:

Next Lesson

(Lesson - 07/A) Qualitative & Judgmental Forecasting Methods