Ka-fu Wong © 2003 Chap 12- 1 Dr. Ka-fu Wong ECON1003 Analysis of Economic Data.

39
Ka-fu Wong © 2003 Chap 12- 1 Dr. Ka-fu Wong ECON1003 Analysis of Economic Data
  • date post

    19-Dec-2015
  • Category

    Documents

  • view

    223
  • download

    5

Transcript of Ka-fu Wong © 2003 Chap 12- 1 Dr. Ka-fu Wong ECON1003 Analysis of Economic Data.

Page 1: Ka-fu Wong © 2003 Chap 12- 1 Dr. Ka-fu Wong ECON1003 Analysis of Economic Data.

Ka-fu Wong © 2003 Chap 12- 1

Dr. Ka-fu Wong

ECON1003Analysis of Economic Data

Page 2: Ka-fu Wong © 2003 Chap 12- 1 Dr. Ka-fu Wong ECON1003 Analysis of Economic Data.

Ka-fu Wong © 2003 Chap 12- 2l

GOALS

1. Describe the relationship between two or more independent variables and the dependent variable using a multiple regression equation.

2. Compute and interpret the multiple standard error of estimate and the coefficient of determination.

3. Interpret a correlation matrix.4. Setup and interpret an ANOVA table.5. Conduct a test of hypothesis to determine if any of

the set of regression coefficients differ from zero.6. Conduct a test of hypothesis on each of the

regression coefficients.

Chapter TwelveMultiple Regression and Correlation Multiple Regression and Correlation AnalysisAnalysis

Page 3: Ka-fu Wong © 2003 Chap 12- 1 Dr. Ka-fu Wong ECON1003 Analysis of Economic Data.

Ka-fu Wong © 2003 Chap 12- 3

Multiple Regression Analysis

For two independent variables, the general form of the multiple regression equation is:

Yi = E(Yi | X1i , X2i )+ i

= 0 + 1 X1i + 2 X2i + i

X1i and X2i are the i-th observation of independent variables.

0 is the Y-intercept. 1 is the net change in Y for each unit change

in X1 holding X2 constant. It is called a partial regression coefficient, a net regression coefficient, or just a regression coefficient.

Page 4: Ka-fu Wong © 2003 Chap 12- 1 Dr. Ka-fu Wong ECON1003 Analysis of Economic Data.

Ka-fu Wong © 2003 Chap 12- 4

y = 0 + 1x

X

y

X2

1

The simple linear regression modelallows for one independent variable, “x”

y =0 + 1x +

The multiple linear regression modelallows for more than one independent variable.Y = 0 +1x1 + 2x2 +

Note how the straight line becomes a plain, and...

Visualize the multiple linear regression in a plot

y = 0 + 1x1+ 2x2

Page 5: Ka-fu Wong © 2003 Chap 12- 1 Dr. Ka-fu Wong ECON1003 Analysis of Economic Data.

Ka-fu Wong © 2003 Chap 12- 5

X

y

1

y= 0+ 1x2

b0

Visualize the multiple non-linear regression in a plot

Page 6: Ka-fu Wong © 2003 Chap 12- 1 Dr. Ka-fu Wong ECON1003 Analysis of Economic Data.

Ka-fu Wong © 2003 Chap 12- 6

X

y

X2

1

… a parabola becomes a parabolic surface

y= 0+ 1x2

y = 0 + 1x12 + 2x2

b0

Visualize the multiple non-linear regression in a plot

Page 7: Ka-fu Wong © 2003 Chap 12- 1 Dr. Ka-fu Wong ECON1003 Analysis of Economic Data.

Ka-fu Wong © 2003 Chap 12- 7

Multiple Regression Analysis

The general multiple regression with k independent variables is given by:

Yi = E(Yi |X1i , X2i , …, Xki) + i

0 + 1 X1i + 2 X2i +…+ k Xki+ i

When k>2, it is impossible to visualize the regression equation in a plot.

The least squares criterion is used to develop an estimation of this equation.

Because determining 1, 2, etc. is very tedious, a software package such as Excel or other statistical software is recommended to estimate them.

Page 8: Ka-fu Wong © 2003 Chap 12- 1 Dr. Ka-fu Wong ECON1003 Analysis of Economic Data.

Ka-fu Wong © 2003 Chap 12- 8

Choosing the line that fits bestOrdinary Least Squares (OLS) Principle

Straight lines can be describe generally by Yi = b0 + b1 X1i + b2 X2i +…+ bk Xki

Finding the best line with smallest sum of squared difference is the same as

∑n

1i

2kik2i21i10ik210

b,...,b,b,b

)]xb...xbxb(b-[y≡)b,...,b,b,S(bmink210

Let b0* , b1

* , b2* … bk

* be the solution of the above

problem. Y* = b0

* + b1*X1+ b2

*X2+ …+bk*Xk

is known as the “average predicted value” (or simply “predicted value”) of y for any vector of (X1, X2, …, Xk).

Page 9: Ka-fu Wong © 2003 Chap 12- 1 Dr. Ka-fu Wong ECON1003 Analysis of Economic Data.

Ka-fu Wong © 2003 Chap 12- 9

Coefficient estimates from the ordinary least squares (OLS) principle

Solving the minimization problem implies the first order conditions:

0))](-xxb...xbxb(b-2[yb

)b,...,b,b,S(b

b,...,b,...,b,btscoeffieienotherallforand

0)](-1)xb...xbxb(b-2[yb

)b,...,b,b,S(b

)]xb...xbxb(b-[y)b,...,b,b,S(b

n

1ijikik2i21i10i

j

k210

kj21

n

1ikik2i21i10i

0

k210

n

1i

2kik2i21i10ik210

∑ ∂

∑ ∂

∑≡

Page 10: Ka-fu Wong © 2003 Chap 12- 1 Dr. Ka-fu Wong ECON1003 Analysis of Economic Data.

Ka-fu Wong © 2003 Chap 12- 10

Coefficient estimates from the ordinary least squares (OLS) principle

Solving the first order conditions impliesThe solution of b0

* , b1* , b2

* … bk*

Y* = b0* + b1

*X1+ b2*X2+ …+bk

*Xk

is known as the “average predicted value” (or simply “predicted value”) of y for any vector of (X1, X2, …, Xk).

Page 11: Ka-fu Wong © 2003 Chap 12- 1 Dr. Ka-fu Wong ECON1003 Analysis of Economic Data.

Ka-fu Wong © 2003 Chap 12- 11

An estimation of the coefficients is too

complicated by hand, let me use some

computational software packages, such as Excel.

Multiple Linear Regression Equations

Page 12: Ka-fu Wong © 2003 Chap 12- 1 Dr. Ka-fu Wong ECON1003 Analysis of Economic Data.

Ka-fu Wong © 2003 Chap 12- 12

1. Slope (k) Estimated Y Changes by k for Each 1 Unit Increase

in Xk Holding All Other Variables ConstantNote Yi = E(Yi |X1i , X2i , …, Xki) + i

=0 + 1 X1i + 2 X2i +…+ k Xki+ i

E(Yi | X1i , X2i , …, Xki=g+1) - E(Yi | X1i , X2i , …, Xki=g)

= 0 + 1 X1i + 2 X2i +…+ k (g+1) – [0 + 1 X1i + 2 X2i +…+ k g ]

= k

Example: If 1 = 2, then sales (Y) is expected to increase by 2 for each 1 unit increase in advertising (X1) given the number of sales rep’s (X2)

2. Y-Intercept (0) Average value of Y when all Xk = 0

Interpretation of Estimated Coefficients

Page 13: Ka-fu Wong © 2003 Chap 12- 1 Dr. Ka-fu Wong ECON1003 Analysis of Economic Data.

Ka-fu Wong © 2003 Chap 12- 13

You work in advertising for the New York Times. You want to find the effect of ad size (sq. in.) & newspaper circulation (000) on the number of ad responses (00).

You’ve collected the You’ve collected the following data:following data:

RespResp SizeSize CircCirc

11 11 2244 88 8811 33 1133 55 7722 66 4444 1010 66

Parameter Estimation Example

Page 14: Ka-fu Wong © 2003 Chap 12- 1 Dr. Ka-fu Wong ECON1003 Analysis of Economic Data.

Ka-fu Wong © 2003 Chap 12- 14

Parameter Estimates

Parameter Standard T for H0:Variable DF Estimate Error Param=0 Prob>|T|

INTERCEP 1 0.0640 0.2599 0.246 0.8214

ADSIZE 1 0.2049 0.0588 3.656 0.0399

CIRC 1 0.2805 0.0686 4.089 0.0264

b2

b0

b1

Parameter Estimation Computer Output

Page 15: Ka-fu Wong © 2003 Chap 12- 1 Dr. Ka-fu Wong ECON1003 Analysis of Economic Data.

Ka-fu Wong © 2003 Chap 12- 15

Interpretation of Coefficients Solution

1. Slope (b1) # Responses to Ad is expected to

increase by .2049 (20.49) for each 1 sq. in. increase in Ad Size Holding Circulation Constant

2. Slope (b2) # Responses to Ad is expected to

increase by .2805 (28.05) for each 1 unit (1,000) increase in circulation Holding Ad Size Constant

Page 16: Ka-fu Wong © 2003 Chap 12- 1 Dr. Ka-fu Wong ECON1003 Analysis of Economic Data.

Ka-fu Wong © 2003 Chap 12- 16

Multiple Standard Error of Estimate

The multiple standard error of estimate is a measure of the effectiveness of the regression equation.

It is measured in the same units as the dependent variable.

It is difficult to determine what is a large value and what is a small value of the standard error.

Page 17: Ka-fu Wong © 2003 Chap 12- 1 Dr. Ka-fu Wong ECON1003 Analysis of Economic Data.

Ka-fu Wong © 2003 Chap 12- 17

Multiple Standard Error of Estimate

The formula is:

1)(k - n)Y-Σ(Y

ss2*

y.12...ke

Interpretation is similar to that in simple linear regression.

Because in computing y*, k+1 parameters must be fixed. Hence, we lost k+1 degree of freedom.

Page 18: Ka-fu Wong © 2003 Chap 12- 1 Dr. Ka-fu Wong ECON1003 Analysis of Economic Data.

Ka-fu Wong © 2003 Chap 12- 18

Multiple Regression and Correlation Assumptions

The independent variables and the dependent variable have a linear relationship.

The dependent variable must be continuous and at least interval-scale.

The variation in (Y-Y*) or residual must be the same for all values of Y. When this is the case, we say the difference exhibits homoscedasticity.

The residuals should follow the normal distributed with mean 0.

Successive values of the dependent variable must be uncorrelated.

Page 19: Ka-fu Wong © 2003 Chap 12- 1 Dr. Ka-fu Wong ECON1003 Analysis of Economic Data.

Ka-fu Wong © 2003 Chap 12- 19

The ANOVA Table

The ANOVA table reports the variation in the dependent variable. The variation is divided into two components.

The Explained Variation is that accounted for by the set of independent variable.

The Unexplained or Random Variation is not accounted for by the independent variables.

Page 20: Ka-fu Wong © 2003 Chap 12- 1 Dr. Ka-fu Wong ECON1003 Analysis of Economic Data.

Ka-fu Wong © 2003 Chap 12- 20

Correlation Matrix

A correlation matrix is used to show all possible simple correlation coefficients among the variables. See which xj are most correlated with y, and

which xj are strongly correlated with each other.

y x1 x2 xk

y 1.00 1x yr

2x yr kx yr

x1 1.00 1 2x xr 1 kx xr

x2 1.00 2 kx xr

xk 1.00

Page 21: Ka-fu Wong © 2003 Chap 12- 1 Dr. Ka-fu Wong ECON1003 Analysis of Economic Data.

Ka-fu Wong © 2003 Chap 12- 21

1. High correlation between X variables2. Multicollinearity makes it difficult to

separate effect of x1 on y from the effect of x2 on y. Leads to unstable coefficients depending on X variables in model

3. Always exists -- matter of degree

4. Example: using both age & height as explanatory variables in same model

Multicollinearity

Page 22: Ka-fu Wong © 2003 Chap 12- 1 Dr. Ka-fu Wong ECON1003 Analysis of Economic Data.

Ka-fu Wong © 2003 Chap 12- 22

1. Examine correlation matrix Correlations between pairs of X

variables are more than with Y variable

2. Few remedies Obtain new sample data Eliminate one correlated X variable

Detecting Multicollinearity

Page 23: Ka-fu Wong © 2003 Chap 12- 1 Dr. Ka-fu Wong ECON1003 Analysis of Economic Data.

Ka-fu Wong © 2003 Chap 12- 23

Correlation Analysis

Pearson Corr Coeff /Prob>|R| under HO:Rho=0/ N=6

RESPONSE ADSIZE CIRC

RESPONSE 1.00000 0.90932 0.93117

0.0 0.0120 0.0069

ADSIZE 0.90932 1.00000 0.74118

0.0120 0.0 0.0918

CIRC 0.93117 0.74118 1.00000

0.0069 0.0918 0.0 rY1: correlation between response and ADSIZE rY2 : correlation between

response and CIRC

All 1’s

r12: correlation between ADSIZE and CIRC

Correlation Matrix Computer Output

Page 24: Ka-fu Wong © 2003 Chap 12- 1 Dr. Ka-fu Wong ECON1003 Analysis of Economic Data.

Ka-fu Wong © 2003 Chap 12- 24

Global Test

The global test is used to investigate whether any of the independent variables have significant coefficients. The hypotheses are:

0 equal s all Not :

0...:

1

210

H

H k

The test statistic follows an F distribution with k (number of independent variables) and n-(k+1) degrees of freedom, where n is the sample size.

Page 25: Ka-fu Wong © 2003 Chap 12- 1 Dr. Ka-fu Wong ECON1003 Analysis of Economic Data.

Ka-fu Wong © 2003 Chap 12- 25

Test for Individual Variables

This test is used to determine which independent variables have nonzero regression coefficients.

The variables that have zero regression coefficients are usually dropped from the analysis.

The test statistic is the t distribution with n-(k+1) degrees of freedom.

Page 26: Ka-fu Wong © 2003 Chap 12- 1 Dr. Ka-fu Wong ECON1003 Analysis of Economic Data.

Ka-fu Wong © 2003 Chap 12- 26

EXAMPLE 1

A market researcher for Super Dollar Super Markets is studying the yearly amount families of four or more spend on food. Three independent variables are thought to be related to yearly food expenditures (Food). Those variables are: total family income (Income) in $00, size of family (Size), and whether the family has children in college (College).

Page 27: Ka-fu Wong © 2003 Chap 12- 1 Dr. Ka-fu Wong ECON1003 Analysis of Economic Data.

Ka-fu Wong © 2003 Chap 12- 27

Example 1 continued

Note the following regarding the regression equation. The variable college is called a dummy or indicator

variable. It can take only one of two possible outcomes. That is a child is a college student or not.

Other examples of dummy variables include gender, the part is acceptable or unacceptable, the voter will or will not vote for the incumbent

governor. We usually code one value of the dummy variable

as “1” and the other “0.”

Page 28: Ka-fu Wong © 2003 Chap 12- 1 Dr. Ka-fu Wong ECON1003 Analysis of Economic Data.

Ka-fu Wong © 2003 Chap 12- 28

EXAMPLE 1 continued

Family Food Income Size Student

1 3900 376 4 0

2 5300 515 5 1

3 4300 516 4 0

4 4900 468 5 0

5 6400 538 6 1

6 7300 626 7 1

7 4900 543 5 0

8 5300 437 4 0

9 6100 608 5 1

10 6400 513 6 1

11 7400 493 6 1

12 5800 563 5 0

Page 29: Ka-fu Wong © 2003 Chap 12- 1 Dr. Ka-fu Wong ECON1003 Analysis of Economic Data.

Ka-fu Wong © 2003 Chap 12- 29

EXAMPLE 1 continued

Use a computer software package, such as Excel, to develop a correlation matrix.

From the analysis provided by Excel, write out the regression equation:

Y*= 954 +1.09X1 + 748X2 + 565X3

What food expenditure would you estimate for a family of 4, with no college students, and an income of $50,000 (which is input as 500)?

Page 30: Ka-fu Wong © 2003 Chap 12- 1 Dr. Ka-fu Wong ECON1003 Analysis of Economic Data.

Ka-fu Wong © 2003 Chap 12- 30

The regression equation is

Food = 954 + 1.09 Income + 748 Size + 565 Student

Predictor Coef SE Coef T P

Constant 954 1581 0.60 0.563

Income 1.092 3.153 0.35 0.738

Size 748.4 303.0 2.47 0.039

Student 564.5 495.1 1.14 0.287

S = 572.7 R-Sq = 80.4% R-Sq(adj) = 73.1%

Analysis of Variance

Source DF SS MS F P

Regression 3 10762903 3587634 10.94 0.003

Residual Error 8 2623764 327970

Total 11 13386667

EXAMPLE 1 continued

Page 31: Ka-fu Wong © 2003 Chap 12- 1 Dr. Ka-fu Wong ECON1003 Analysis of Economic Data.

Ka-fu Wong © 2003 Chap 12- 31

From the regression output we note: The coefficient of determination is 80.4 percent.

This means that more than 80 percent of the variation in the amount spent on food is accounted for by the variables income, family size, and student.

Each additional $100 dollars of income per year will increase the amount spent on food by $109 per year.

An additional family member will increase the amount spent per year on food by $748.

A family with a college student will spend $565 more per year on food than those without a college student.

EXAMPLE 1 continued

Page 32: Ka-fu Wong © 2003 Chap 12- 1 Dr. Ka-fu Wong ECON1003 Analysis of Economic Data.

Ka-fu Wong © 2003 Chap 12- 32

The correlation matrix is as follows: Food Income SizeIncome 0.587

Size 0.876 0.609

Student 0.773 0.491 0.743

The strongest correlation between the dependent variable and an independent variable is between family size and amount spent on food.

None of the correlations among the independent variables should cause problems. All are between –.70 and .70.

EXAMPLE 1 continued

Page 33: Ka-fu Wong © 2003 Chap 12- 1 Dr. Ka-fu Wong ECON1003 Analysis of Economic Data.

Ka-fu Wong © 2003 Chap 12- 33

EXAMPLE 1 continued

The estimated food expenditure for a family of 4 with a $500 (that is $50,000) income and no college student is $4,491.

Y* = 954 + 1.09(500) + 748(4) + 565 (0)

= 4491

Page 34: Ka-fu Wong © 2003 Chap 12- 1 Dr. Ka-fu Wong ECON1003 Analysis of Economic Data.

Ka-fu Wong © 2003 Chap 12- 34

EXAMPLE 1 continued

Conduct a global test of hypothesis to determine if any of the regression coefficients are not zero.

H0 is rejected if F>4.07. From the computer output, the computed

value of F is 10.94. Decision: H0 is rejected. Not all the

regression coefficients are zero

0 equal s all Not :0: 13210 HversusH

Page 35: Ka-fu Wong © 2003 Chap 12- 1 Dr. Ka-fu Wong ECON1003 Analysis of Economic Data.

Ka-fu Wong © 2003 Chap 12- 35

EXAMPLE 1 continued

Conduct an individual test to determine which coefficients are not zero. This is the hypotheses for the independent variable family size.

From the computer output, the only significant variable is SIZE (family size) using the p-values. The other variables can be omitted from the model.

Thus, using the 5% level of significance, reject H0 if the p-value<.05

0 :0: 2120 HversusH

Page 36: Ka-fu Wong © 2003 Chap 12- 1 Dr. Ka-fu Wong ECON1003 Analysis of Economic Data.

Ka-fu Wong © 2003 Chap 12- 36

EXAMPLE 1 continued

We rerun the analysis using only the significant independent family size.

The new regression equation is:

Y* = 340 + 1031X2

The coefficient of determination is 76.8 percent. We dropped two independent variables, and the R-square term was reduced by only 3.6 percent.

Page 37: Ka-fu Wong © 2003 Chap 12- 1 Dr. Ka-fu Wong ECON1003 Analysis of Economic Data.

Ka-fu Wong © 2003 Chap 12- 37

Example 1 continued

Regression Analysis: Food versus Size

The regression equation isFood = 340 + 1031 Size

Predictor Coef SE Coef T PConstant 339.7 940.7 0.36 0.726Size 1031.0 179.4 5.75 0.000

S = 557.7 R-Sq = 76.8% R-Sq(adj) = 74.4%

Analysis of Variance

Source DF SS MS F PRegression 1 10275977 10275977 33.03 0.000Residual Error 10 3110690 311069Total 11 13386667

Page 38: Ka-fu Wong © 2003 Chap 12- 1 Dr. Ka-fu Wong ECON1003 Analysis of Economic Data.

Ka-fu Wong © 2003 Chap 12- 38

Most of the procedures to evaluate the multiple regression model are the same as those discussed in the chapter of simple regression models. Residual analysis Test for linearity

Global F-test Test for the coefficient of correlation

irrelevant. Test for individual slope of regression line not

enough. Non-independence of error variables. Durbin-Watson Statistics Outliers

yi* = b0 +b1x1i+…+bkxki

Evaluating the Model

Evalu

atin

g

the M

od

el

Page 39: Ka-fu Wong © 2003 Chap 12- 1 Dr. Ka-fu Wong ECON1003 Analysis of Economic Data.

Ka-fu Wong © 2003 Chap 12- 39

- END -

Chapter TwelveMultiple Regression and Correlation Multiple Regression and Correlation AnalysisAnalysis