Research Method Lecture 6 (Ch7): Multiple regression with qualitative variables


Page 2:

Dummy variables

Our data often contain qualitative information, such as gender. Such variables are qualitative, not quantitative.

Page 3:

However, such qualitative variables are also important in analyzing data. For example, you may want to answer the following question: “Is there a gender wage gap?”

Page 4:

To incorporate such a qualitative variable into the OLS equation, we first convert the qualitative information into a quantitative variable called a “dummy variable”.

If you would like to incorporate gender information in your model, create the following dummy variable:

Female = 1 if the person is female
       = 0 if the person is male
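As a minimal sketch of this conversion, the Stata lines below build the dummy from a text variable. The variable name sex and its values "female" and "male" are assumptions made for illustration only; WAGE1.dta, used later in this lecture, already contains female coded as 0/1.

```stata
* Hypothetical raw variable: sex is a string equal to "female" or "male"
* (sex == "female") evaluates to 1 when true and 0 when false;
* the if-condition leaves female missing where sex is empty
gen female = (sex == "female") if !missing(sex)

* Cross-check the coding: every "female" row should have female==1
tab sex female
```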

Page 5:

Incorporating a dummy variable as an independent variable

Suppose you are interested in the gender wage gap. Then you include the dummy variable for female as

Log(wage)=β0+δ0(female)+β1(experience)+u

where wage is the hourly wage rate, and experience is in years.

Then δ0 shows the difference in log wage between females and males who have the same experience. To understand this, see the next slide.

Page 6:

For males, the predicted log wage at a given experience is

$$\widehat{\log(wage)}_{male} = \hat\beta_0 + \hat\delta_0 \cdot 0 + \hat\beta_1(\text{experience}) = \hat\beta_0 + \hat\beta_1(\text{experience})$$

For females, the predicted log wage at a given experience is

$$\widehat{\log(wage)}_{female} = \hat\beta_0 + \hat\delta_0 \cdot 1 + \hat\beta_1(\text{experience}) = \hat\beta_0 + \hat\delta_0 + \hat\beta_1(\text{experience})$$

Therefore, the gender difference in wage at a given experience is given by

$$\text{Gender wage gap} = \widehat{\log(wage)}_{female} - \widehat{\log(wage)}_{male} = \hat\delta_0$$

If females earn less than males, $\hat\delta_0$ will be negative.

Page 7:

Using a graph, the gender wage gap is described as an intercept shift because:

$$\text{For males:}\quad \widehat{\log(wage)} = \hat\beta_0 + \hat\beta_1(\text{experience})$$

$$\text{For females:}\quad \widehat{\log(wage)} = (\hat\beta_0 + \hat\delta_0) + \hat\beta_1(\text{experience})$$

The intercept for males is $\hat\beta_0$, and the intercept for females is $\hat\beta_0 + \hat\delta_0$.

Assuming that females earn lower wages (that is, $\hat\delta_0$ is negative), the predicted wage-experience profiles would look like the ones in the next slide.

Page 8:

[Figure: The estimated wage-experience profiles by gender. Vertical axis: Log(wage). Horizontal axis: Experience. Two parallel lines, one labeled Male and one labeled Female, with the female intercept at $\hat\beta_0 + \hat\delta_0$.]

Note that $\hat\delta_0$ is usually negative, so the wage-experience profile for females lies below the profile for males.

Page 9:

The base group

When you include (female), you do not include (male).

The predicted wage for males is given by setting female=0. Thus the wage gap is estimated relative to males.

This means that, in our example, we set males as the base group. This group is also called the benchmark group, the excluded group, or the excluded category.

Page 10:

Example

Using Wage1.dta, estimate the following model. Is there a gender wage gap? How big is it?

Log(wage)=β0+δ0(female)+β1(experience)+u

Page 11:

. reg lwage female exper

      Source |       SS       df       MS              Number of obs =     526
-------------+------------------------------           F(  2,   523) =   45.72
       Model |   22.076198     2   11.038099           Prob > F      =  0.0000
    Residual |  126.253553   523  .241402588           R-squared     =  0.1488
-------------+------------------------------           Adj R-squared =  0.1456
       Total |  148.329751   525   .28253286           Root MSE      =  .49133

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |  -.3929703   .0429205    -9.16   0.000    -.4772881   -.3086526
       exper |   .0037591   .0015813     2.38   0.018     .0006526    .0068656
       _cons |   1.747566   .0406442    43.00   0.000      1.66772    1.827412
------------------------------------------------------------------------------

Females earn about 39% lower wages than males after controlling for experience (100 times the coefficient on female, -0.393).
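Reading the coefficient on female as a percentage is an approximation that is accurate only for small coefficients. As a worked check that uses nothing beyond the estimate reported above, the exact percentage difference implied by a log-wage model is:

```latex
\%\Delta\widehat{wage}
  = 100\,[\exp(\hat\delta_0) - 1]
  = 100\,[\exp(-0.393) - 1]
  \approx 100\,(0.675 - 1)
  = -32.5\%
```

So the exact predicted gap is about 32.5%, somewhat smaller than the 39% given by the approximation.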

Page 12:

Policy analysis using a dummy variable

The State of Michigan provided a job training grant program for manufacturing companies. Did this grant help firms provide more training to their employees?

To answer this question, you may estimate the following model.

(Hours of training per employee)=β0+δ0(grant)+β1log(sales)+u

where (grant) is a dummy variable taking the value 1 if the firm received the grant, and 0 otherwise.

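Page 13:

. reg hrsemp grant lsales

      Source |       SS       df       MS              Number of obs =     320
-------------+------------------------------           F(  2,   317) =   45.48
       Model |  49919.0036     2  24959.5018           Prob > F      =  0.0000
    Residual |  173981.225   317  548.836673           R-squared     =  0.2230
-------------+------------------------------           Adj R-squared =  0.2180
       Total |  223900.229   319  701.881595           Root MSE      =  23.427

------------------------------------------------------------------------------
      hrsemp |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       grant |   33.90265   3.767611     9.00   0.000     26.48997    41.31533
      lsales |  -3.542481   1.167679    -3.03   0.003    -5.839861   -1.245101
       _cons |   64.37193   17.65352     3.65   0.000     29.63906     99.1048
------------------------------------------------------------------------------

The grant appears to have a significant effect on employee training: firms that received it provide about 33.9 more hours of training per employee, holding log(sales) fixed (t = 9.00).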

Page 14:

Using dummy variables for multiple categories

When you compare the gender gap, there are only two groups: males and females.

However, in some situations, there are more than two categories. For example, you may want to examine the wage differences among the following four groups:

Married men
Married women
Single men
Single women

Page 15:

Then, the solution is to create dummy variables for all the categories except one. For example, you estimate

Log(wage)=β0+δ0(Married men)+δ1(Married women)+δ2(Single women)+β1(Education)+β2(experience)+β3(experience)²+u

The excluded group is single men. So the differences in wage among the four groups are estimated relative to single men.

Page 16:

Exercise: using WAGE1.dta, estimate the model on the previous page.

Page 17:

. reg lwage marriedmen marriedwomen singlewomen educ exper expersq

      Source |       SS       df       MS              Number of obs =     526
-------------+------------------------------           F(  6,   519) =   64.30
       Model |  63.2456419     6  10.5409403           Prob > F      =  0.0000
    Residual |  85.0841096   519  .163938554           R-squared     =  0.4264
-------------+------------------------------           Adj R-squared =  0.4198
       Total |  148.329751   525   .28253286           Root MSE      =  .40489

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  marriedmen |   .2459701   .0566721     4.34   0.000     .1346352     .357305
marriedwomen |  -.2178298   .0593555    -3.67   0.000    -.3344364   -.1012231
 singlewomen |  -.1207069   .0573047    -2.11   0.036    -.2332847   -.0081292
        educ |   .0822455    .006866    11.98   0.000     .0687569     .095734
       exper |   .0348695   .0051408     6.78   0.000     .0247701    .0449688
     expersq |  -.0006192   .0001107    -5.59   0.000    -.0008367   -.0004017
       _cons |   .2841051   .1027389     2.77   0.006       .08227    .4859403
------------------------------------------------------------------------------

Married men earn about 24.6% more than single men. Married women earn about 21.8% less than single men. Single women earn about 12.1% less than single men.

Page 18:

Here is the do file I used to obtain the results.

use "D:\My Documents\IUJ_teaching\Research Methodology\Wooldridge Econometrics resources\data\WAGE1.DTA", clear

********************
* Create dummy for
* married men
********************
gen marriedmen=0
replace marriedmen=1 if female==0 & married==1

********************
* Create dummy for
* married women
********************
gen marriedwomen=0
replace marriedwomen=1 if female==1 & married==1

********************
* Create dummy for
* single women
********************
gen singlewomen=0
replace singlewomen=1 if female==1 & married==0

**********************
* Estimate the model
**********************
reg lwage marriedmen marriedwomen singlewomen educ exper expersq
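Because every dummy coefficient is measured relative to single men, a comparison between two included groups is the difference between their coefficients. The sketch below assumes the three dummies from the do file already exist in memory; the choice of married versus single women as the comparison is only an illustration and is not part of the original slides.

```stata
* Re-estimate so that the results are the most recent estimation
reg lwage marriedmen marriedwomen singlewomen educ exper expersq

* Estimated log-wage difference between married and single women,
* reported with its standard error, t statistic and confidence interval
lincom marriedwomen - singlewomen
```

From the coefficients reported earlier, the point estimate is -0.2178 - (-0.1207) = -0.0971. What lincom adds is the standard error, which cannot be read off the regression table.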

Page 19:

Incorporating ordinal information by using dummy variables

Some information is ordinal, like credit ratings or law school rankings.

For concreteness, consider estimating the effect of a municipal credit rating on the municipal bond interest rate.

You have a credit rating variable that takes values from 1 to 5. Rating 1 is the worst rating, and 5 is the best rating.

Page 20:

How do we incorporate this information? One possibility is to estimate

(Municipal bond interest rate)=β0+β1(Credit rating)+(other factors)

Then β1 shows the change in the municipal bond interest rate when the credit rating increases by 1.

Page 21:

But this assumes that the effect of improving the credit rating from 1 to 2 is the same as the effect of improving it from 2 to 3, and so on.

There is no reason why the improvement from 1 to 2 should have the same effect as the improvement from 2 to 3.

In this situation, it is better to create a dummy variable for each rating, excluding one category, and then include them in the model.

Page 22:

That is, create the following 4 dummies:

CR1 = 1 if credit rating=1
    = 0 otherwise
CR2 = 1 if credit rating=2
    = 0 otherwise
CR3 = 1 if credit rating=3
    = 0 otherwise
CR4 = 1 if credit rating=4
    = 0 otherwise

The excluded category is credit rating=5.

Page 23:

(Municipal bond interest rate)=β0+β1CR1+β2CR2+β3CR3+β4CR4+(other factors)

Then β1 shows the effect of having credit rating 1 on the bond interest rate relative to credit rating 5. The other coefficients are interpreted in the same way.
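A minimal Stata sketch of the dummy-variable approach to a credit rating that runs from 1 to 5 follows. The variable names cr (the credit rating) and rate (the municipal bond interest rate) are hypothetical, since no dataset is named for this example, and the other factors are left out for brevity.

```stata
* Hypothetical variables: cr takes values 1,...,5; rate is the bond interest rate

* Option 1: generate one dummy per rating (CR1-CR5),
* then leave CR5 out so that rating 5 is the base group
tabulate cr, generate(CR)
reg rate CR1 CR2 CR3 CR4

* Option 2: factor-variable notation with rating 5 as the base category
reg rate ib5.cr
```

Both options fit the same model; the second avoids creating new variables.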

Page 24:

Exercise

Using beauty.dta, examine whether one's physical attractiveness affects wage. Use the variables for 'below average looks' and 'above average looks'. Include other variables where it makes sense to do so. Also try estimating the model separately for males and females.

Page 25:

Interactions involving dummy variables

Example 1

Suppose that you are interested in the gender wage gap, but you suspect that the gap may change with experience.

Then you would estimate the following.

Log(wage)=β0+δ0(female)+δ1(female)(experience)+β1(experience)+u

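Page 26:

Then the male wage at a given experience is written as

$$\widehat{\log(wage)}_{male} = \hat\beta_0 + \hat\delta_0 \cdot 0 + \hat\delta_1 \cdot 0 \cdot (\text{experience}) + \hat\beta_1(\text{experience}) = \hat\beta_0 + \hat\beta_1(\text{experience})$$

The female wage at a given experience is written as

$$\widehat{\log(wage)}_{female} = \hat\beta_0 + \hat\delta_0 \cdot 1 + \hat\delta_1 \cdot 1 \cdot (\text{experience}) + \hat\beta_1(\text{experience}) = \hat\beta_0 + \hat\delta_0 + \hat\delta_1(\text{experience}) + \hat\beta_1(\text{experience})$$

Thus, the gender gap at a given experience is:

$$\text{Gender wage gap} = \widehat{\log(wage)}_{female} - \widehat{\log(wage)}_{male} = \hat\delta_0 + \hat\delta_1(\text{experience})$$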

Page 27:

Thus $\hat\delta_0$ is the gender wage gap at hiring (i.e., experience=0). Usually it is negative. So, if the coefficient on the interaction term, $\hat\delta_1$, is positive, the gender gap is decreasing with experience. If $\hat\delta_1$ is negative, the gender gap is increasing with experience.

The case where the gender gap is increasing with experience is described in the following slide.

Page 28:

Case where the gender gap is increasing with experience (i.e., $\hat\delta_1$ is negative):

[Figure: Log(wage) on the vertical axis against Experience on the horizontal axis, with one line labeled Male and one labeled Female. The gender gap at a given experience is $\hat\delta_0 + \hat\delta_1(\text{experience})$.]

Page 29:

Exercise

Using Wage1.dta, estimate the following model.

Log(wage)=β0+δ0(female)+δ1(female)(experience)+β1(experience)+u

Q1. Is the gender gap increasing or decreasing with experience?

Q2. What is the gender gap at hiring (exp=0)?

Q3. What is the gender gap at experience equal to 10? Is the gender gap significant at this experience?

Page 30:

. reg lwage female female_exper exper

      Source |       SS       df       MS              Number of obs =     526
-------------+------------------------------           F(  3,   522) =   31.78
       Model |  22.9051677     3  7.63505589           Prob > F      =  0.0000
    Residual |  125.424584   522   .24027698           R-squared     =  0.1544
-------------+------------------------------           Adj R-squared =  0.1496
       Total |  148.329751   525   .28253286           Root MSE      =  .49018

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |  -.2934319   .0685958    -4.28   0.000    -.4281897    -.158674
female_exper |  -.0058634   .0031567    -1.86   0.064    -.0120649     .000338
       exper |   .0066007   .0021976     3.00   0.003     .0022835    .0109179
       _cons |   1.697672   .0486394    34.90   0.000     1.602119    1.793225
------------------------------------------------------------------------------
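The regression above uses the variable female_exper, whose construction is not shown. A minimal sketch of how it could be created, assuming WAGE1.dta is loaded with lwage, female and exper; the factor-variable line is an alternative formulation and not the command used in the slides.

```stata
* Interaction term: equals exper for women and 0 for men
gen female_exper = female*exper
reg lwage female female_exper exper

* Same model via factor-variable notation (no new variable needed);
* ## includes both main effects and their interaction
reg lwage i.female##c.exper
```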

Page 31:

Answer

1. The gender gap is increasing with experience since the coefficient on the interaction term is negative. (Its p-value is 0.064, so the widening is significant at the 10% level but not at the 5% level.)

2. Gender gap at hiring = -0.29

3. Gender gap at experience equal to 10 = -0.293+(-.00586)*10 = -0.35. This gap is significant at the 5% level.

. test female +female_exper*10=0

 ( 1)  female + 10 female_exper = 0

       F(  1,   522) =   53.46
            Prob > F =    0.0000
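The test command reports only the F statistic and its p-value. As a sketch, lincom evaluates the same linear combination and also returns its point estimate, standard error and confidence interval; it assumes female_exper has already been generated.

```stata
reg lwage female female_exper exper

* Gender gap at 10 years of experience: delta0 + 10*delta1
lincom female + 10*female_exper
```

For a single restriction like this one, the square of the t statistic from lincom equals the F statistic from test.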

Page 32:

The interaction between two dummy variables

Suppose that you are interested in whether the gender wage gap is concentrated in a particular group of people. For example, you want to know if the gender wage gap is concentrated among married people.

Page 33:

Then you can estimate the following model.

Log(wage)=β0+δ0(female)+δ1(female)(married)+β1(experience)+β2(married)+u

Then we have the following:

Gender gap for married people = δ0+δ1

Gender gap for single people = δ0
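To see where these two gaps come from, the four group intercepts can be written out by setting (female) and (married) to 0 or 1 while holding experience fixed. This is a worked expansion of the model above and introduces no new assumptions:

```latex
\begin{aligned}
\text{single men:}    &\quad \beta_0 \\
\text{married men:}   &\quad \beta_0 + \beta_2 \\
\text{single women:}  &\quad \beta_0 + \delta_0 \\
\text{married women:} &\quad \beta_0 + \delta_0 + \delta_1 + \beta_2 \\[4pt]
\text{gap (single)}   &= (\beta_0 + \delta_0) - \beta_0 = \delta_0 \\
\text{gap (married)}  &= (\beta_0 + \delta_0 + \delta_1 + \beta_2) - (\beta_0 + \beta_2) = \delta_0 + \delta_1
\end{aligned}
```

The common term β1(experience) cancels in each difference, which is why it does not appear.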

Page 34:

Exercise

Using Wage1.dta, estimate the following model.

Log(wage)=β0+δ0(female)+δ1(female)(married)+β1(experience)+β2(married)+u

What is the gender wage gap among married people? Is it statistically significant?

What is the gender wage gap among single people? Is it statistically significant?

Page 35:

. gen fem_married =female*married

. reg lwage female fem_married exper married

      Source |       SS       df       MS              Number of obs =     526
-------------+------------------------------           F(  4,   521) =   35.42
       Model |  31.7095946     4  7.92739865           Prob > F      =  0.0000
    Residual |  116.620157   521  .223839073           R-squared     =  0.2138
-------------+------------------------------           Adj R-squared =  0.2077
       Total |  148.329751   525   .28253286           Root MSE      =  .47312

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |   -.133259   .0668952    -1.99   0.047    -.2646766   -.0018414
 fem_married |  -.3726252   .0858297    -4.34   0.000    -.5412401   -.2040104
       exper |   .0009918   .0016056     0.62   0.537    -.0021623     .004146
     married |   .4167843   .0636418     6.55   0.000     .2917583    .5418104
       _cons |   1.510186    .053837    28.05   0.000     1.404422    1.615951
------------------------------------------------------------------------------

Page 36:

1. Gender wage gap among married people = (-0.133)+(-0.373) = -0.506. It is significant at the 5% level.

. test female+fem_married=0

 ( 1)  female + fem_married = 0

       F(  1,   521) =   88.64
            Prob > F =    0.0000

2. Gender wage gap among single people = -0.133. It is significant at the 5% level. (This is based on the usual t-test: t = -1.99, p = 0.047.)

Page 37:

Testing for differences in regression functions across groups (The Chow test)

Suppose initially that you are interested in examining the determinants of the GPA of college students. So you have the following equation in mind.

(Cumulative GPA)=β0+β1(SAT)+β2(Hispanic)+β3(total hours)+u

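where SAT is the SAT score, (Hispanic) is the dummy for Hispanics, and (total hours) is the total hours of college courses. (In the Stata output later in this lecture, the second regressor is entered as hsperc, which in GPA3.dta is the high school percentile rank rather than an ethnicity dummy.)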

Page 38:

But suppose that you wonder whether the explanatory variables have different effects on GPA depending on gender.

That is, you wonder if males and females have different coefficients.

We can test if this is the case by estimating the following model.

Page 39:

(Cumulative GPA)=β0+β1(SAT)+β2(Hispanic)+β3(total hours)+δ0(female)+δ1(female)(SAT)+δ2(female)(Hispanic)+δ3(female)(total hours)+u

Then we can test whether males and females have different coefficients by testing the following hypotheses using an F-test.

H0: δ0=0, δ1=0, δ2=0, δ3=0

H1: H0 is not true

Page 40:

This particular F-test is called the Chow test.

Now, using GPA3.dta, conduct the Chow test described above.

Page 41:

. gen female_sat=female*sat

. gen female_hsp=female*hsperc

. gen female_tothrs=female*tothrs

. reg cumgpa sat hsperc tothrs female female_sat female_hsp female_tothrs

      Source |       SS       df       MS              Number of obs =     732
-------------+------------------------------           F(  7,   724) =   35.15
       Model |  181.589407     7  25.9413439           Prob > F      =  0.0000
    Residual |  534.309148   724   .73799606           R-squared     =  0.2537
-------------+------------------------------           Adj R-squared =  0.2464
       Total |  715.898555   731  .979341389           Root MSE      =  .85907

------------------------------------------------------------------------------
      cumgpa |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         sat |   .0006113    .000235     2.60   0.009     .0001499    .0010727
      hsperc |  -.0059675   .0017765    -3.36   0.001    -.0094551   -.0024798
      tothrs |   .0103004   .0010928     9.43   0.000     .0081549    .0124459
      female |  -1.113638    .528539    -2.11   0.035     -2.15129   -.0759859
  female_sat |   .0011167      .0005     2.23   0.026     .0001351    .0020984
  female_hsp |   .0000508   .0041025     0.01   0.990    -.0080035     .008105
female_tot~s |   .0055599   .0020696     2.69   0.007     .0014968     .009623
       _cons |   1.213984   .2648281     4.58   0.000     .6940617    1.733907
------------------------------------------------------------------------------

. test (female=0) (female_sat=0) (female_hsp=0) (female_tothr=0)

 ( 1)  female = 0
 ( 2)  female_sat = 0
 ( 3)  female_hsp = 0
 ( 4)  female_tothrs = 0

       F(  4,   724) =    4.42
            Prob > F =    0.0015

We reject the null hypothesis that males and females have the same coefficients at the 5% significance level (F(4, 724) = 4.42, p = 0.0015).
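As a hedged alternative to generating each interaction by hand, Stata's factor-variable notation can build the interactions inside the regression. This sketch assumes GPA3.dta is loaded; it is not the command used in the slides, though it should fit the same model and test the same four restrictions.

```stata
* female main effect plus female interacted with each continuous regressor
reg cumgpa i.female##c.(sat hsperc tothrs)

* Joint test that the female intercept shift and all three
* female interactions are zero
testparm i.female i.female#c.sat i.female#c.hsperc i.female#c.tothrs
```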

Page 42:

Chow test: What to do when you have a lot of variables

The Chow test is easy when your initial model contains 3 or 4 variables.

But if your model contains many variables, creating the interaction terms takes a lot of time.

Here is another way to do the same Chow test.

Page 43:

The equivalent procedure for the Chow test (explained using the same example):

Step 1: Estimate the initial model using only the male sample.

(Cumulative GPA)=β0+β1(SAT)+β2(Hispanic)+β3(total hours)+u

Then obtain the SSR. Call this SSR1.

Page 44:

Step 2: Estimate the initial model using only the female sample.

(Cumulative GPA)=β0+β1(SAT)+β2(Hispanic)+β3(total hours)+u

Then obtain the SSR. Call this SSR2.

Page 45:

Step 3: Estimate the initial model using the pooled sample (both males and females included).

(Cumulative GPA)=β0+β1(SAT)+β2(Hispanic)+β3(total hours)+u

Then obtain the SSR. Call this SSRp.

Page 46:

Step 4: Compute the following statistic

$$F = \frac{SSR_P - (SSR_1 + SSR_2)}{SSR_1 + SSR_2} \cdot \frac{n - 2(k+1)}{k+1}$$

k is the number of slope parameters in the initial model. Note that k does not include female. So in our example, k=3.

n is the number of observations.

This F-statistic follows the F distribution with degrees of freedom equal to [k+1, n-2(k+1)].

You reject the null hypothesis that males and females have the same coefficients if the F-stat falls in the rejection region.

This particular F-stat is called the Chow statistic. It will be the same as the F-stat you obtain when you include the interaction terms as described before.

Page 47:

Exercise

Conduct the Chow test again using the alternative method described above.

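Page 48:

Male only sample (the residual SS is SSR1 = 390.619421)

. reg cumgpa sat hsperc tothrs if female==0

      Source |       SS       df       MS              Number of obs =     552
-------------+------------------------------           F(  3,   548) =   41.94
       Model |  89.6937042     3  29.8979014           Prob > F      =  0.0000
    Residual |  390.619421   548  .712809162           R-squared     =  0.1867
-------------+------------------------------           Adj R-squared =  0.1823
       Total |  480.313125   551  .871711661           Root MSE      =  .84428

------------------------------------------------------------------------------
      cumgpa |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         sat |   .0006113    .000231     2.65   0.008     .0001576     .001065
      hsperc |  -.0059675   .0017459    -3.42   0.001    -.0093969    -.002538
      tothrs |   .0103004    .001074     9.59   0.000     .0081907    .0124101
       _cons |   1.213984   .2602697     4.66   0.000     .7027359    1.725233
------------------------------------------------------------------------------

Female only sample (the residual SS is SSR2 = 143.689727)

. reg cumgpa sat hsperc tothrs if female==1

      Source |       SS       df       MS              Number of obs =     180
-------------+------------------------------           F(  3,   176) =   34.08
       Model |  83.4816253     3  27.8272084           Prob > F      =  0.0000
    Residual |  143.689727   176  .816418902           R-squared     =  0.3675
-------------+------------------------------           Adj R-squared =  0.3567
       Total |  227.171352   179   1.2691137           Root MSE      =  .90356

------------------------------------------------------------------------------
      cumgpa |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         sat |   .0017281   .0004642     3.72   0.000     .0008119    .0026442
      hsperc |  -.0059167   .0038895    -1.52   0.130    -.0135927    .0017594
      tothrs |   .0158603   .0018485     8.58   0.000     .0122122    .0195085
       _cons |   .1003465   .4810947     0.21   0.835    -.8491105    1.049803
------------------------------------------------------------------------------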

Page 49:

Pooled sample, both male and female (the residual SS is SSRp = 547.364897)

. reg cumgpa sat hsperc tothrs

      Source |       SS       df       MS              Number of obs =     732
-------------+------------------------------           F(  3,   728) =   74.72
       Model |  168.533658     3  56.1778861           Prob > F      =  0.0000
    Residual |  547.364897   728  .751874858           R-squared     =  0.2354
-------------+------------------------------           Adj R-squared =  0.2323
       Total |  715.898555   731  .979341389           Root MSE      =  .86711

------------------------------------------------------------------------------
      cumgpa |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         sat |   .0009028   .0002079     4.34   0.000     .0004947    .0013109
      hsperc |  -.0063791   .0015678    -4.07   0.000    -.0094572   -.0033011
      tothrs |   .0119779   .0009314    12.86   0.000     .0101494    .0138064
       _cons |   .9291105   .2285515     4.07   0.000     .4804118    1.377809
------------------------------------------------------------------------------

$$F = \frac{547.364897 - (390.619421 + 143.689727)}{390.619421 + 143.689727} \cdot \frac{732 - 2(3+1)}{3+1} = 4.42$$

This follows F[3+1, 732-2(3+1)] = F(4, 724).

The cutoff at the 5% significance level is 2.37. Thus we reject the null hypothesis that males and females have the same coefficients.

Also note that this F-stat is the same as the F-stat you obtained by using the other method.
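The four steps of the alternative Chow procedure can be sketched in Stata by storing each residual sum of squares in a scalar. The scalar names (ssr1, ssr2, ssrp, n, k, chowF) are chosen here for illustration; e(rss) and e(N) are the residual sum of squares and sample size that regress leaves behind. GPA3.dta is assumed to be loaded.

```stata
* Step 1: male sample
reg cumgpa sat hsperc tothrs if female==0
scalar ssr1 = e(rss)

* Step 2: female sample
reg cumgpa sat hsperc tothrs if female==1
scalar ssr2 = e(rss)

* Step 3: pooled sample
reg cumgpa sat hsperc tothrs
scalar ssrp = e(rss)
scalar n = e(N)

* k = number of slope parameters in the initial model
scalar k = 3

* Step 4: Chow statistic, its p-value and the 5% critical value
scalar chowF = ((ssrp - (ssr1 + ssr2))/(ssr1 + ssr2)) * ((n - 2*(k+1))/(k+1))
display "Chow F            = " chowF
display "p-value           = " Ftail(k+1, n - 2*(k+1), chowF)
display "5% critical value = " invFtail(k+1, n - 2*(k+1), 0.05)
```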

Page 50:

Always think about whether the policy variable is endogenous or not

Suppose that you are interested in estimating the effects of employee training grants on employee productivity.

Then you may estimate

(Productivity)=β0+β1(grant)+β2(sales)+(Other factors)+u

Page 51:

Using JTRAIN.dta, we estimate the above model.

We use the log of the scrap rate as the measure of productivity.

The lower the scrap rate, the higher the productivity.

Page 52:

. reg lscrap grant lsales lemploy if year==1988

      Source |       SS       df       MS              Number of obs =      50
-------------+------------------------------           F(  3,    46) =    1.18
       Model |   6.8054029     3  2.26846763           Prob > F      =  0.3270
    Residual |  88.2852083    46  1.91924366           R-squared     =  0.0716
-------------+------------------------------           Adj R-squared =  0.0110
       Total |  95.0906112    49  1.94062472           Root MSE      =  1.3854

------------------------------------------------------------------------------
      lscrap |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       grant |  -.0517781   .4312869    -0.12   0.905    -.9199137    .8163574
      lsales |  -.4548425   .3733152    -1.22   0.229    -1.206287    .2966021
     lemploy |   .6394289   .3651366     1.75   0.087     -.095553    1.374411
       _cons |   4.986779   4.655588     1.07   0.290    -4.384433    14.35799
------------------------------------------------------------------------------

•So, we did not find evidence that (grant) reduces the scrap rate (i.e., that the grant increases productivity). But is this estimate the true effect?

•Now, the most important condition for OLS is that the explanatory variables should be uncorrelated with the error term. Let us consider whether (grant) is uncorrelated with the error term.

Page 53:

The answer is that the grant is likely to be correlated with the error term. In other words, (grant) is likely to be endogenous. Thus, the coefficient on (grant) is likely to be biased.

The reason is the following.

This employee training grant is given to firms on a first-come, first-served basis.

Page 54:

It is therefore very likely that the firms with less productive workers saw a greater benefit in this training grant and were quicker to apply. Thus, less productive firms are more likely to have received the grant.

This causes the endogeneity problem.

To clarify the situation, consider a variable, (ability), which is the average ability of workers prior to the grant application.

Page 55:

(Ability) would affect the scrap rate of the firm, but it is unobserved by the researcher.

Thus, this variable is contained in the error term. This can be written as:

$$\log(Scrap) = \beta_0 + \beta_1(grant) + \beta_2\log(sales) + \beta_3\log(employment) + \underbrace{\beta_4(ability) + e}_{u}$$

So the error term u is equal to (β4(ability)+e).

Since firms with low-ability workers are more likely to get the grant, (grant) and (ability) are negatively correlated.

Page 56:

This means that u and (grant) are correlated. Thus (grant) is endogenous, and therefore the coefficient on (grant) will be biased.

Notice that the endogeneity in this example is caused by the fact that the firms with low-ability workers self-selected into the grant program.

Thus, this is often called the self-selection problem.

Page 57:

Therefore, in a policy analysis, if the observations self-select into the program, you should always suspect endogeneity in the policy variable.

Page 58:

Now, the next question is: what is the direction of the bias?

We can use the omitted variable bias framework to guess the direction.

Page 59:

If we had the variable (ability), we could estimate the following model, which satisfies all the OLS assumptions.

$$\log(Scrap) = \beta_0 + \beta_1(grant) + \beta_2\log(sales) + \beta_3\log(employment) + \beta_4(ability) + e \quad \ldots (1)$$

However, since we do not observe (ability), we can only estimate the following model, which omits ability.

$$\log(Scrap) = \beta_0 + \beta_1(grant) + \beta_2\log(sales) + \beta_3\log(employment) + u \quad \ldots (2)$$

Page 60:

If we had the variable ability, we could estimate (1). Let $\hat\beta_1$ be the OLS coefficient for (grant) in equation (1).

Let $\tilde\beta_1$ be the estimated OLS coefficient for (grant) using equation (2).

Then, using the omitted variable bias result in handout 2, the relationship between $\tilde\beta_1$ and $\hat\beta_1$ is given by:

$$\tilde\beta_1 = \hat\beta_1 + \hat\beta_4\tilde\delta_1$$

$\tilde\beta_1$: The actual estimate of the effect of grant (which is unfortunately biased).

$\hat\beta_1$: The true effect of (grant).

$\hat\beta_4\tilde\delta_1$: The bias term. This determines the direction of the bias.

Page 61:

$\tilde\delta_1$ is the OLS coefficient for (grant) in the following regression.

$$ability = \delta_0 + \delta_1(grant) + \delta_2\log(sales) + \delta_3\log(employment) + error$$

Since firms with high-ability workers are more productive (i.e., their scrap rate is low), $\hat\beta_4$ will be negative.

Since (grant) and (ability) are negatively correlated, $\tilde\delta_1$ is negative.

Therefore, we can predict the direction of the bias as follows.

Page 62:

$$\tilde\beta_1 = \underbrace{\hat\beta_1}_{(-)} + \underbrace{\underbrace{\hat\beta_4}_{(-)} \cdot \underbrace{\tilde\delta_1}_{(-)}}_{(+)}$$

$\hat\beta_1$ is negative since the grant is likely to reduce the scrap rate (i.e., increase productivity).

The bias term $\hat\beta_4\tilde\delta_1$ is the product of two negative numbers, so there will be a positive bias (or upward bias).

•Thus, even if the true effect of the grant on the scrap rate is negative, the positive bias term will offset this effect, at least partly.

•Thus, the endogeneity problem biases the coefficient towards not finding an effect of the grant on the scrap rate.
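The direction of the bias can be illustrated with a small simulation. Everything in this sketch is hypothetical: the sample size, the seed, and the coefficient values (a true grant effect of -0.3 and an ability effect of -0.5) are made up for illustration and are unrelated to JTRAIN.dta.

```stata
clear
set seed 12345
set obs 5000

* Unobserved average worker ability
gen ability = rnormal()

* Self-selection: firms with low ability are more likely to take the grant,
* so grant and ability are negatively correlated
gen grant = (ability + rnormal() < 0)

* True model: the grant lowers log scrap (-0.3) and so does ability (-0.5)
gen lscrap = 1 - 0.3*grant - 0.5*ability + rnormal()

* Equation (1): ability observed, coefficient on grant should be near -0.3
reg lscrap grant ability

* Equation (2): ability omitted, coefficient on grant is biased upward
reg lscrap grant
```

Comparing the coefficient on grant across the two regressions shows the upward bias that arises when ability is left in the error term.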