1 Econ 495 - Econometric Review Contents - Faculty of...

Econ 495 - Econometric Review

Contents

1 Linear Regression Analysis

1.1 The Mincer Wage Equation

1.2 Data

1.3 Econometric Model

1.4 Estimation

1.5 Diagnostics - Goodness of Fit

1.6 Inference - Hypothesis Testing

1.7 Reporting the results

1.8 Interpretation of the Estimates

1.9 Multivariate Regression Analysis

1.10 Diagnostics - Goodness of Fit

1.11 Interpretation of the Estimates

1.12 Choosing the Functional Form

1.13 Potential Problems

1.13.1 Multicollinearity

1.13.2 Omitted Variables Bias

1.13.3 Heteroscedasticity


1 Linear Regression Analysis

1.1 The Mincer Wage Equation

• Our first exercise in empirical analysis will focus on the determinants

of wages in a cross-section of individuals, that is, observations on

individuals at a specific point in time.

• A complete wage equation model would include the following human

capital variables

$$\log(wages_i) = \beta_0 + \beta_1 educ_i + \beta_2 exper_i + \beta_3 exper_i^2 + \dots + u_i \quad (1)$$

where the term ui contains factors such as ability, quality of education,

family background and other factors influencing a person’s wage.


• For some specific purpose, we will also include gender and union status.

• We may think of the relationship between wages and their determi-

nants, including institutions and industrial characteristics, as the wage

structure.

• Let’s suppose to begin with that we are interested in the effect of

education, β1, measured in years of schooling, on wages

$$wages_i = \beta_0 + \beta_1 educ_i + u_i \quad (2)$$


1.2 Data

• The Labour Force Survey selects individuals (close to) randomly and asks them about their wage (Yi), education and other characteristics (Xi).

• These data {(Xi, Yi) : i = 1, …, n} will constitute our random sample (A2) of size n from the population.

• A scatter plot of wages and education level indicates a positive rela-

tionship.


[Figure 1: Wages and Years of Schooling. Scatter plot of rwage (vertical axis, 0 to 50) against schooling (horizontal axis, 8 to 18).]

• As do the average wages by education level


Table 1: Average hourly wages by education level

Education Level            ≈ Years of Schooling    Wages

All workers                        13              17.65381
0 to 8 years                        8              14.15454
Some secondary                     10              13.39185
Grade 11 to 13                     12              15.81303
Some post secondary                13              15.45389
Post secondary diploma             14              18.26481
University: bachelors              16              23.58583
Graduate degree                    18              29.11108


• But we may want to know by how much wages increase when schooling increases by one year.

1.3 Econometric Model

• The (population) regression function

$$E(wages_i \mid educ_i) = \beta_0 + \beta_1 educ_i \quad (3)$$

describes wages conditional on a level of schooling as a linear (A1) function of the parameters, under the zero conditional mean assumption (A3), E(ui|educi) = 0.

• For any given value of schooling, the distribution of wages is centered about E(wages|schooling).


[Figure 2: E(wages|schooling) as a linear function of schooling. Vertical axis: rwage (0 to 50); horizontal axis: schooling (8 to 18).]

• Note that E(ui|educi) = 0 implies, by the law of iterated expectations, that E(ui) = 0 and that Cov(ui, educi) = E(ui ∗ educi) = 0.

• This means that ui has a zero mean and is uncorrelated with educ,

which may be farfetched in this case.


• Another typical assumption (A5) is that Var(ui|educ) = σ² is constant, a property called homoskedasticity.

• But it appears problematic here! We will see later how to test for it.

1.4 Estimation

• The objective is to obtain an estimate, called β̂1, of the unknown parameter β1 from the data sample.

• Letting Yi denote wages and Xi denote education, we can write the model as

$$Y_i = \beta_0 + \beta_1 X_i + u_i \quad (4)$$

• Either through the method of moments, which substitutes sample averages for the expectations in the moment conditions

$$E(u_i) = E[Y_i - \beta_0 - \beta_1 X_i] = 0$$

$$E(u_i X_i) = E[(Y_i - \beta_0 - \beta_1 X_i) X_i] = 0$$

• or with the ordinary least squares estimator, which minimizes the sum of squared errors, $SS = \sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 X_i)^2$, we obtain the same estimator

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2} \quad (5)$$

Hutcheson, G. D. (2011). Ordinary Least-Squares Regression. In L. Moutinho and G. D. Hutcheson, The SAGE Dictionary of Quantitative Management Research. Pages 224-228.

Ordinary Least-Squares Regression

Introduction

Ordinary least-squares (OLS) regression is a generalized linear modelling technique that may be used to model a single response variable which has been recorded on at least an interval scale. The technique may be applied to single or multiple explanatory variables and also categorical explanatory variables that have been appropriately coded.

Key Features

At a very basic level, the relationship between a continuous response variable (Y) and a continuous explanatory variable (X) may be represented using a line of best-fit, where Y is predicted, at least to some extent, by X. If this relationship is linear, it may be appropriately represented mathematically using the straight line equation 'Y = α + βx', as shown in Figure 1 (this line was computed using the least-squares procedure; see Ryan, 1997).

The relationship between variables Y and X is described using the equation of the line of best fit with α indicating the value of Y when X is equal to zero (also known as the intercept) and β indicating the slope of the line (also known as the regression coefficient). The regression coefficient β describes the change in Y that is associated with a unit change in X. As can be seen from Figure 1, β only provides an indication of the average expected change (the

observed data are scattered around the line), making it important to also interpret the confidence intervals for the estimate (the large sample 95% two-tailed approximation of the confidence intervals can be calculated as β ± 1.96 s.e. β).

In addition to the model parameters and confidence intervals for β, it is useful to also have an indication of how well the model fits the data. Model fit can be determined by comparing the observed scores of Y (the values of Y from the sample of data) with the expected values of Y (the values of Y predicted by the regression equation). The difference between these two values (the deviation, or residual as it is also called) provides an indication of how well the model predicts each data point. Adding up the deviances for all the data points after they have been squared (this basically removes negative deviations) provides a simple measure of the degree to which the data deviates from the model overall. The sum of all the squared residuals is known as the residual sum of squares (RSS) and provides a measure of model-fit for an OLS regression model. A poorly fitting model will deviate markedly from the data and will consequently have a relatively large RSS, whereas a good-fitting model will not deviate markedly from the data and will consequently have a relatively small RSS (a perfectly fitting model will have an RSS equal to zero, as there will be no deviation between observed and expected values of Y). It is important to understand how the RSS statistic (or the deviance as it is also known; see Agresti,1996, pages 96-97) operates as it is used to determine the significance of individual and groups of variables in a regression model. A graphical illustration of the residuals for a simple regression model is provided in Figure 2. Detailed examples of calculating deviances from residuals for null and simple regression models can be found in Hutcheson and Moutinho, 2008.

The deviance is an important statistic as it enables the contribution made by explanatory variables to the prediction of the response variable to be determined. If by adding a variable to the model, the deviance is greatly reduced, the added variable can be said to have had a large effect on the prediction of Y for that model. If, on the other hand, the deviance is not greatly reduced, the added variable can be said to have had a small effect on the prediction of Y for that model. The change in the deviance that results from the explanatory variable being added to the model is used to determine the significance of that variable's effect on the prediction of Y in that model. To assess the effect that a single explanatory variable has on the prediction of Y, one simply compares the deviance statistics before and after the variable has been added to the model. For a simple OLS regression model, the effect of the explanatory variable can be assessed by comparing the RSS statistic for the full regression model (Y = α + βx) with that for the null model (Y = α). The difference in deviance between the nested models can then be tested for significance using an F-test computed from the following equation.

$$F_{(df_p - df_{p+q}),\, df_{p+q}} = \frac{(RSS_p - RSS_{p+q}) / (df_p - df_{p+q})}{RSS_{p+q} / df_{p+q}}$$

where p represents the null model, Y = α, p+q represents the model Y = α + βx, and df are the degrees of freedom associated with the designated model. It can be seen from this equation that the F-statistic is simply based on the difference in the deviances between the two models as a fraction of the deviance of the full model, whilst taking account of the number of parameters.

In addition to the model-fit statistics, the R-square statistic is also commonly quoted and provides a measure that indicates the percentage of variation in the response variable that is `explained' by the model. R-square, which is also known as the coefficient of multiple determination, is defined as

$$R^2 = \frac{\text{RSS after regression}}{\text{total RSS}}$$

and basically gives the percentage of the deviance in the response variable that can be accounted for by adding the explanatory variable into the model. Although R-square is widely used, it will always increase as variables are added to the model (the deviance can only go down when additional variables are added to a model). One solution to this problem is to calculate an adjusted R-square statistic ($R_a^2$) which takes into account the number of terms entered into the model and does not necessarily increase as more terms are added. Adjusted R-square can be derived using the following equation

$$R_a^2 = R^2 - \frac{k(1 - R^2)}{n - k - 1}$$

where n is the number of cases used to construct the model and k is the number of terms in the model (not including the constant).

An example of simple OLS regression

A simple OLS regression model with a single explanatory variable can be illustrated using the example of predicting ice cream sales given outdoor temperature (Koteswara, 1970). The model for this relationship


which will work provided that $\sum_{i=1}^{n} (X_i - \bar{X})^2 > 0$, that is, provided there is enough sampling variation (A4).

• But at the same time, OLS will be sensitive to outliers, so we do not want too much variation. This can be written as finite fourth moments: $0 < E(X_i^4) < \infty$ and $0 < E(Y_i^4) < \infty$.

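• This makes sense since the population parameter β1 is

$$\beta_1 = \frac{Cov(Y_i, X_i)}{Var(X_i)}$$

when E(ui) = 0 and Cov(ui, Xi) = 0,

• and β0 = E(Yi) − E(Xi)β1 will be estimated by

$$\hat{\beta}_0 = \frac{1}{n}\sum_{i=1}^{n} Y_i - \hat{\beta}_1 \frac{1}{n}\sum_{i=1}^{n} X_i = \bar{Y} - \hat{\beta}_1 \bar{X} \quad (6)$$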

• Predicted wages are obtained from the sample regression function

$$\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X \quad (7)$$

• The residual ûi is an estimate of the error term ui, and is the difference between the sample point and the fitted line (sample regression function)

$$\hat{u}_i = Y_i - \hat{Y}_i$$
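The closed-form expressions (5) to (7) can be checked by hand on a small data set. The following is a minimal sketch, assuming a made-up sample of seven workers (the `schooling` and `wages` values are hypothetical, not Labour Force Survey data) and a helper name `ols_simple` chosen only for illustration.

```python
# Toy data (hypothetical values, for illustration only).
schooling = [8, 10, 12, 12, 14, 16, 18]
wages = [9.5, 13.0, 15.0, 17.5, 18.0, 24.0, 28.5]


def ols_simple(x, y):
    """Return (b0, b1) from the closed-form OLS formulas (5) and (6)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Denominator of (5): must be positive (enough sampling variation, A4).
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("no sampling variation in x (assumption A4 fails)")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = ols_simple(schooling, wages)

# Fitted values (7) and residuals.
fitted = [b0 + b1 * xi for xi in schooling]
residuals = [yi - fi for yi, fi in zip(wages, fitted)]

print("intercept:", b0, "slope:", b1)
print("sum of residuals:", sum(residuals))
```

By construction the residuals satisfy the sample versions of the two moment conditions: they sum to zero and are orthogonal to X, up to floating-point error.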


• Thus intuitively, OLS is fitting a line through the sample points such that the sum of the squared vertical distances between the actual wages and the predicted wages, that is, the sum of squared residuals, is as small as possible.

• Under assumptions (A1)-(A4), our OLS estimates will be unbiased, that is E(β̂1) = β1 and E(β̂0) = β0. Adding assumption (A5), the OLS estimator is BLUE in the sense that it is the minimum variance linear unbiased estimator (Gauss-Markov Theorem).

• In practice, the computer software does the computation for us


. regress rwage schooling

      Source |       SS       df       MS              Number of obs =    9720
-------------+------------------------------           F(  1,  9718) = 1729.94
       Model |  117804.085     1  117804.085           Prob > F      =  0.0000
    Residual |  661769.296  9718  68.0972727           R-squared     =  0.1511
-------------+------------------------------           Adj R-squared =  0.1510
       Total |   779573.38  9719  80.2112749           Root MSE      =  8.2521

------------------------------------------------------------------------------
       rwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   schooling |   1.541137   .0370532    41.59   0.000     1.468505    1.613769
       _cons |  -2.426309    .489984    -4.95   0.000    -3.386779   -1.465838
------------------------------------------------------------------------------

. predict prwage
(option xb assumed; fitted values)

. predict reswage, residuals


1.5 Diagnostics - Goodness of Fit

• The STATA output gives many measures of whether our regression model fits the data well

Model/Explained: $SSE \equiv \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2$

Residual: $SSR \equiv \sum_{i=1}^{n} \hat{u}_i^2$

Total: $SST \equiv \sum_{i=1}^{n} (Y_i - \bar{Y})^2$

and the R², which is the ratio of the explained variation to the total variation

$$R^2 = SSE/SST = 1 - SSR/SST$$


• The R² can also be shown to equal the squared correlation coefficient between the actual Yi and the fitted values Ŷi.

• The adjusted R² takes into account the number of explanatory variables: R²a = 1 − (1 − R²)(n − c)/(n − k), where k is the number of estimated coefficients (including the constant) and c = 1 if there is a constant. (From Section 1.6 onward, k counts only the explanatory variables, so the residual degrees of freedom are written n − k − 1.)

• Here, an R² = 0.15 means that 15% of the variation in wages across individuals is explained by their education level.

• This means that 85% of the variation in wages remains unexplained! We will want to add more variables!









• Yet, typically in cross-sectional data R2 are very low.

• So by these standards, this regression is pretty good, but the R2 is

not the only way to judge the success of a model.

• Also reported are

$$\text{Root MSE} = s = \sqrt{SSR/(n - k)} \quad \text{and} \quad F = \frac{SSE/(k - c)}{s^2}$$

The F-statistic is used to test whether a group of variables should be included in the model.
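The fit statistics in the STATA header can be recomputed from the sums of squares alone. The sketch below copies SSE, SSR and n from the `regress rwage schooling` output above; it assumes that k counts the estimated coefficients including the constant (k = 2, c = 1), which is the convention that reproduces the reported numbers.

```python
import math

# Values copied from the STATA output for `regress rwage schooling`.
sse = 117804.085   # Model (explained) sum of squares
ssr = 661769.296   # Residual sum of squares
n = 9720           # number of observations
k = 2              # assumption: estimated coefficients, including the constant
c = 1              # the model has a constant

sst = sse + ssr                                  # total sum of squares
r2 = sse / sst                                   # R-squared
adj_r2 = 1 - (1 - r2) * (n - c) / (n - k)        # adjusted R-squared
root_mse = math.sqrt(ssr / (n - k))              # s
f_stat = (sse / (k - c)) / root_mse ** 2         # F statistic

print(f"R2 = {r2:.4f}, adj R2 = {adj_r2:.4f}")
print(f"Root MSE = {root_mse:.4f}, F = {f_stat:.2f}")
```

The results agree with the header of the output (R-squared 0.1511, Adj R-squared 0.1510, Root MSE 8.2521, F 1729.94) up to rounding.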









1.6 Inference - Hypothesis Testing

• The success of a model also depends on whether the variables included

in the model belong there, that is, are statistically significant.

• Under the assumption (A6) that the ui are normally distributed with zero mean and variance σ², u ∼ Normal(0, σ²), the estimates β̂ will also be normally distributed, and

(β̂ − β)/se(β̂) ∼ t_DF

will follow the Student-t distribution, where DF = n − k − 1, the degrees of freedom in the model, is equal to the number of observations minus the number of variables minus 1 for the constant.


• We can use the t-statistic reported by STATA to test the null hypothesis H0 : β = 0 against H1 : β ≠ 0.

• If the absolute value of the t-statistic is greater than the critical value corresponding to our degrees of freedom and the desired level of the test (5% or 1%), we can reject the null.

• The rule of thumb is: if |t| ≥ 2.0 then reject H0 : β = 0 at the 5% significance level. For more robustness, sometimes we prefer even higher values.

• But we do not have to look up the critical values in a table since STATA gives us the p-value corresponding to our t-statistic


– If p ≤ 0.01 then the relationship is significant at the 1% level,



• Here with a t-statistic of 41.59, we can say that schooling is a very

significant factor explaining the variation in wages

• It is all very good to know that the coefficient of schooling is different from zero, but we would also like to know how precisely it is estimated.

• The confidence interval tells us that, under the classical OLS assumptions (A1-A6), there is a 95% chance that the true parameter lies in the interval

β̂ ± α · se(β̂)

where α is the 97.5th percentile of the t distribution with n − k − 1 degrees of freedom.

• If the degrees of freedom DF = n − k − 1 > 120, the t distribution is close enough to the normal to use the 97.5th percentile of the standard normal, and the confidence interval will be

[β̂ − 1.96 · se(β̂), β̂ + 1.96 · se(β̂)]

• Thus the rule of thumb: a coefficient is not significant if its magnitude

is less than twice its standard error.


• In our example, this means that there is a 95% chance that the true coefficient of schooling is between 1.47 and 1.61, that is, within a 0.14 range. This is almost too precise!
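The t-statistic and the confidence interval follow directly from the coefficient and its standard error. This is a minimal sketch using the values reported for schooling; it uses the normal approximation 1.96 rather than the exact t percentile, which is an assumption justified here by the very large degrees of freedom (9718).

```python
# Coefficient and standard error of schooling, copied from the STATA output.
coef = 1.541137
se = 0.0370532

# t-statistic for H0: beta = 0.
t_stat = coef / se

# Large-sample 95% confidence interval (normal approximation, z = 1.96).
z = 1.96
ci_low = coef - z * se
ci_high = coef + z * se

# Rule of thumb: significant at the 5% level if |t| >= 2.
significant = abs(t_stat) >= 2.0

print(f"t = {t_stat:.2f}, 95% CI = [{ci_low:.4f}, {ci_high:.4f}]")
print("significant at 5% (rule of thumb):", significant)
```

The interval differs from the one printed by STATA only in the fifth decimal place, because STATA uses the exact t critical value instead of 1.96.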









1.7 Reporting the results

• The results from a STATA output are reported in a table that typically

contains

– estimated coefficients

– standard errors of the coefficients

– number of observations

– R2 or R2a

• In some instances, it may be worthwhile to report other statistics. We will discuss these issues when we cover the readings.


• The custom command outreg used after the regress command han-

dles the formatting of the output. (See the course web site on how to

install custom commands.)

outreg schooling using tableM1, replace bdec(3) se 3aster title("Wage Regression") ctitle("(1)")

The file tableM1.out can then be opened in Excel (by right-clicking on it)

Wage Regression
                  (1)
schooling         0.082
                  [0.002]***
Constant          1.684
                  [0.027]***
Observations      9720
R-squared         0.14
Standard errors in brackets
* significant at 10%; ** significant at 5%; *** significant at 1%









1.8 Interpretation of the Estimates

• In general, the β parameters measure the marginal effect of increasing X by one unit on the predicted wages Ŷ.

• In our example,

∆wage = β̂1∆educ

tells us that the wage value of one additional year of schooling in this sample is $1.54.

• But in this simple regression, we cannot claim to have found a causal

relationship, so we should be cautious in our interpretation


• The value of -2.42 for β0 says that a person with zero years of schooling

has a negative predicted wage, which is silly. This occurs because no

one in our sample has less than 8 years of schooling. For a person

with eight years of schooling, the predicted wage is

wage = −2.42 + 1.54 ∗ 8 = 9.90

which is above the minimum wage.

• If this person completes high school (4 more years), our model predicts that the wage would be higher by 4 × $1.54 = $6.16 per hour, that is, about $16.06! This is more than the average wage of $15.81 for high school graduates in Table 1, which may make us question the linearity in our functional form assumption.
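The predictions from the linear model are simple arithmetic on the estimated coefficients. The sketch below uses the unrounded coefficients from the STATA output, so the results differ slightly from hand calculations with −2.42 and 1.54; the function name `predicted_wage` is only illustrative.

```python
# Estimated coefficients from `regress rwage schooling`.
b0 = -2.426309   # _cons
b1 = 1.541137    # schooling


def predicted_wage(years_of_schooling):
    """Predicted hourly wage from the linear model wage = b0 + b1 * schooling."""
    return b0 + b1 * years_of_schooling


wage_8 = predicted_wage(8)     # eight years of schooling
wage_12 = predicted_wage(12)   # high school completed
gain = wage_12 - wage_8        # equals 4 * b1

# Average wage for "Grade 11 to 13" (about 12 years) from Table 1.
table1_grade_11_13 = 15.81303

print(f"predicted wage at 8 years:  {wage_8:.2f}")
print(f"predicted wage at 12 years: {wage_12:.2f}")
print(f"gain from 4 more years:     {gain:.2f}")
print("prediction above Table 1 average:", wage_12 > table1_grade_11_13)
```

Comparing such predictions with the group averages in Table 1 is a quick, informal check on the linear functional form.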


• Indeed, it is more common to estimate the following log-linear model

$$\log(wages_i) = \beta_0 + \beta_1 educ_i + u_i \quad (8)$$

where log(·) denotes the natural logarithm. Since wages tend to be

lognormal, this reduces the problem of heteroscedasticity.

• This is equivalent to writing wagei = exp(β0 + β1educi + ui), which is consistent with the increasing returns to education that we found in Table 1.

• In this case the interpretation of β1 is

%∆wage ≈ (100 · β1)∆educ

that is multiplying β1 by 100 gives us the percentage change in pre-

dicted wage given an additional year of schooling.


• We run the log wage regression by first taking the log of the dependent

variable

. regress lrwage schooling

       Model |  331.915467     1  331.915467           Prob > F      =  0.0000
    Residual |   2034.6566  9718  .209369891           R-squared     =  0.1403
-------------+------------------------------           Adj R-squared =  0.1402
       Total |  2366.57207  9719  .243499544           Root MSE      =  .45757

------------------------------------------------------------------------------
      lrwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   schooling |    .081804   .0020546    39.82   0.000     .0777767    .0858314
       _cons |   1.684045    .027169    61.98   0.000     1.630788    1.737302
------------------------------------------------------------------------------

• The coefficient on schooling has a percentage interpretation when it


is multiplied by 100. That is, predicted wages increase by 8.2 percent

for every additional year of education.
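The 100 · β1 reading is an approximation that works well for small coefficients. As a hedged illustration, the sketch below compares it with the exact percentage change implied by the log-linear model, 100 · (exp(β1) − 1), using the schooling coefficient from the output above.

```python
import math

# Coefficient of schooling from `regress lrwage schooling`.
b1 = 0.081804

# Approximate percentage effect of one more year of schooling.
approx_pct = 100 * b1

# Exact percentage change in predicted wages implied by the log-linear model:
# wage(educ + 1) / wage(educ) = exp(b1).
exact_pct = 100 * (math.exp(b1) - 1)

print(f"approximate effect: {approx_pct:.2f}%")
print(f"exact effect:       {exact_pct:.2f}%")
```

The two numbers are close here (roughly 8.2% versus 8.5%); the gap widens as the coefficient gets larger.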

• In the human capital interpretation of the wage equation, this means

that the rate of return of one year of schooling is 8.2%, not bad!

• This easy interpretation of the rate of return of schooling is one of the

reasons why the log wage specification is the preferred one.

• The intercept of 1.684 is again not very meaningful, since it gives the

predicted log(wages) when schooling = 0


• The log-linear model imposes a constant percentage effect of schooling

on wages.

• Another important model is the log-log model which is a constant

elasticity model. It would be more meaningful if we had some measure

of output as in

$$\log(salary_i) = \beta_0 + \beta_1 \log(sales_i) + u_i$$

• In this case, the interpretation of β1 is the estimated elasticity of salary with respect to sales

$$\%\Delta salary = \beta_1 \, \%\Delta sales \iff \beta_1 = \frac{\%\Delta salary}{\%\Delta sales}$$


1.9 Multivariate Regression Analysis

• We have already improved our wage equation model by using log(wages),

now we would like to add more variables, in particular labour market

experience

• We can also use a more flexible functional form by adding higher order

terms (polynomial) in the explanatory variables

• For example, here a quadratic in experience can capture diminishing

returns to on-the-job training

$$\log(wages_i) = \beta_0 + \beta_1 educ_i + \beta_2 exper_i + \beta_3 exper_i^2 + \dots + u_i \quad (9)$$


• In the US equivalent of the Canadian Labour Force Survey, the Current Population Survey (CPS), data on years of schooling and age in years are available, but not the number of years of actual labour market experience.

• So a potential experience variable is constructed as exper = age − educ − 6, and the regression results for the US-CPS (for October 1997) are


. regress lwage educ exper exp2 [weight=weight]
(analytic weights assumed)
(sum of wgt is 2.6311e+07)

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .1084982   .0017458    62.15   0.000     .1050761    .1119202
       exper |   .0383817   .0012279    31.26   0.000     .0359747    .0407887
        exp2 |   -.000639   .0000296   -21.57   0.000     -.000697   -.0005809
       _cons |   .5751865   .0250984    22.92   0.000     .5259896    .6243835
------------------------------------------------------------------------------

. test exper= exp2=0

 ( 1)  exper - exp2 = 0
 ( 2)  exper = 0

       F(  2, 11889) =  873.19
            Prob > F =    0.0000

1.10 Diagnostics - Goodness of Fit

• As before, we can use the t-statistic to determine whether each variable

is statistically significant individually


• But we can also use the F-statistic to test the significance of the whole

model, that is the hypothesis that the variables are jointly significant

H0 : β1 = β2 = β3 = 0 vs. H1 : H0 is not true

• Here, we would overwhelmingly reject H0.

• The F-statistic can also be used to test a restricted model against an

unrestricted model.

• The model log(wagesi) = β0 + β1educi can be seen as a restricted

version of the model with experience where H0 : β2 = β3 = 0


• We can test this hypothesis using the R-squared form of the F-statistic

$$F \equiv \frac{(R_{ur}^2 - R_r^2)/q}{(1 - R_{ur}^2)/(n - k - 1)}$$

where q is the number of exclusion restrictions and n − k − 1 = DFur

. regress lwage educ [weight=weight]
(analytic weights assumed)
(sum of wgt is 2.6311e+07)

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .1080619   .0018107    59.68   0.000     .1045127     .111611
       _cons |   .9835394   .0245724    40.03   0.000     .9353734    1.031705
------------------------------------------------------------------------------

• We get F = [(0.3291 − 0.2305)/(1 − 0.3291)] × (11889/2) = 873.644, which is greater than the 5% critical value F(2, 11889) = 3.00, so we reject H0.
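The R-squared form of the F-statistic is easy to evaluate directly. The sketch below wraps the formula in a small helper (the name `f_stat_from_r2` is hypothetical) and plugs in the R² values of the restricted and unrestricted wage regressions quoted above.

```python
def f_stat_from_r2(r2_ur, r2_r, q, df_ur):
    """F-statistic for q exclusion restrictions, from the R-squared form.

    r2_ur: R-squared of the unrestricted model
    r2_r:  R-squared of the restricted model
    q:     number of exclusion restrictions
    df_ur: residual degrees of freedom of the unrestricted model (n - k - 1)
    """
    if not r2_r <= r2_ur < 1:
        raise ValueError("expected r2_r <= r2_ur < 1 for nested models")
    return ((r2_ur - r2_r) / q) / ((1 - r2_ur) / df_ur)


# Restricted model: lwage on educ only; unrestricted: adds exper and exp2.
f_exper = f_stat_from_r2(r2_ur=0.3291, r2_r=0.2305, q=2, df_ur=11889)

critical_5pct = 3.00   # approximate 5% critical value quoted in the notes
print(f"F = {f_exper:.3f}, reject H0: {f_exper > critical_5pct}")
```

The small gap with the 873.19 reported by the `test` command comes from the R² values being rounded to four decimals.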



1.11 Interpretation of the Estimates

• The general model

$$Y_i = \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \dots + \beta_k X_k \quad (10)$$

can be written in terms of changes

$$\Delta Y = \beta_1 \Delta X_1 + \beta_2 \Delta X_2 + \beta_3 \Delta X_3 + \dots + \beta_k \Delta X_k \quad (11)$$

• The coefficient on the variable Xk measures the change in Y due to a one-unit increase in Xk, holding all the other explanatory variables fixed (the so-called ceteris paribus assumption): ∆Y = βk∆Xk

• These effects are sometimes called marginal or partial effects.


• In our example, since the dependent variable is log(wages), the interpretation of β̂1 = 0.108, the coefficient of educ, is a 10.8 percent increase in predicted wages for every additional year of education.

• Since β̂2 = 0.04 > 0 and β̂3 = −0.0006 < 0, there is a concave relationship between log wages and experience.

• With the experience variable, the ceteris paribus assumption does not work directly: when we increase exper, exp2 increases as well, so we have to compute the partial effect at a given level of exper:

$$\frac{\Delta \log(wages)}{\Delta exper} \approx \frac{\partial \log(wages)}{\partial exper} = \beta_2 + 2 \beta_3 \, exper \quad (12)$$


. sum exper

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       exper |     11893    18.67317    11.58664          0         58

. scalar experbar=r(mean)

. di experbar
18.673169

. lincom exper+2*exp2*experbar

 ( 1)  exper + 37.34634 exp2 = 0

-------------+----------------------------------------------------------------
         (1) |   .0145183   .0003718    39.05   0.000     .0137894    .0152471
------------------------------------------------------------------------------

• Here, at the average experience level of 18.67 years, this gives 0.0383817 + 2 × (−0.000639) × 18.67 = 0.0145, or a return of about 1.5% per year of experience on average.

• The turning point exper∗ = |β̂2/(2β̂3)| = 30.03 appears to make sense.
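The partial effect of experience and the turning point of the quadratic can be reproduced from the reported coefficients. This is a small sketch using the rounded estimates from the output, so the marginal effect matches the `lincom` result only to about four decimals; the function name `marginal_effect` is illustrative.

```python
# Estimates from `regress lwage educ exper exp2`.
b_exper = 0.0383817    # coefficient on exper
b_exp2 = -0.000639     # coefficient on exp2 (exper squared)
mean_exper = 18.67317  # sample mean of exper from `sum exper`


def marginal_effect(exper):
    """Derivative of predicted log(wages) with respect to exper."""
    return b_exper + 2 * b_exp2 * exper


# Return to one more year of experience at the sample mean.
me_at_mean = marginal_effect(mean_exper)

# Experience level at which the quadratic peaks (marginal effect = 0).
turning_point = -b_exper / (2 * b_exp2)

print(f"marginal effect at mean experience: {me_at_mean:.4f}")
print(f"turning point: {turning_point:.2f} years")
```

Below the turning point an extra year of experience raises predicted log wages; above it the fitted quadratic implies a decline.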

• But we can compare the plots of the quadratic on experience with a

local polynomial estimate, a more flexible functional form

. regress lwage exper exp2 [weight=weight]

. gen twage=_b[_cons]+_b[exper]*exper+_b[exp2]*exp2

. lpoly lwage exper [aweight=weight], gen(lwagep experp) nograph

. twoway (scatter twage exper) (connected experp lwagep )


[Figure 3: Impact of experience on log(wages). Plotted series: twage and lpoly smooth: lwage; horizontal axis: exper (0 to 60); vertical axis from 1.5 to 2.5.]

• Comparing the univariate regression, log(wages) = β̃0 + β̃1educ, with the multivariate regression, log(wages) = β̂0 + β̂1educ + β̂2exper + β̂3exper², we would generally expect β̃1 ≠ β̂1.

• Here, the estimates are pretty close (0.1081 versus 0.1085)! This suggests that educ and exper are nearly uncorrelated in this sample. This could also happen if β̂2 = 0, that is, if exper was uncorrelated with wages given educ.


1.12 Choosing the Functional Form

• We have already tried a few functional forms

$$wages_i = \beta_0 + \beta_1 educ_i + u_i$$

$$\log(wages_i) = \beta_0 + \beta_1 educ_i + u_i$$

$$\log(wages_i) = \beta_0 + \beta_1 educ_i + \beta_2 exper_i + \beta_3 exper_i^2 + u_i$$

• Perhaps we could soften the curvature of the relationship between exper and log(wages) with a quartic

$$\log(wages_i) = \beta_0 + \beta_1 educ_i + \beta_2 exper_i + \beta_3 exper_i^2 + \beta_4 exper_i^3 + \beta_5 exper_i^4 + u_i$$

. regress lwage educ exper exp2 exp3 exp4 [weight=weight]
(analytic weights assumed)
(sum of wgt is 2.6311e+07)

-------------+----------------------------------------------------------------
        educ |   .1086498   .0017533    61.97   0.000     .1052131    .1120864
       exper |   .0637706     .00501    12.73   0.000     .0539502    .0735909
        exp2 |  -.0023563   .0004593    -5.13   0.000    -.0032567    -.001456
        exp3 |   .0000358   .0000154     2.32   0.020     5.60e-06     .000066
        exp4 |  -1.80e-07   1.70e-07    -1.06   0.289    -5.13e-07    1.53e-07
       _cons |   .4973948   .0269574    18.45   0.000     .4445538    .5502357
------------------------------------------------------------------------------

• Now the model with the quadratic in experience is the restricted model. We get F = [(0.3332 − 0.3291)/(1 − 0.3332)] × (11887/2) = 36.55, which is greater than the critical value F(2, 11887) = 3.00, so we reject H0.

• If the models were not nested (i.e. could not be derived one from the other), we could use the adjusted R²a as a guide to choose our preferred model.

• Using log variables is also often convenient, especially for positive

dollar amounts, and for very large variables such as population

• Variables measured in years and variables that are a proportion or

percent are better used in level form


1.13 Potential Problems

1.13.1 Multicollinearity

• We could be tempted to use

$$\log(wages_i) = \beta_0 + \beta_1 educ_i + \beta_2 \log(exper_i) + \beta_3 \log(exper_i^2) + u_i$$

• But this would not work because log(exper²i) = 2 ∗ log(experi), so that log(exper²i) and log(experi) would be perfectly correlated: we would have a problem of (perfect) multicollinearity.

• In this case, STATA would drop one of the two collinear variables, so you would know that something is wrong.


• We can ask STATA to compute the Variance Inflation Factor, $VIF_k = (1 - R_k^2)^{-1}$, which measures the degree to which the variance has been inflated because regressor k is not orthogonal to the other regressors.

. estat vif

    Variable |       VIF       1/VIF
-------------+----------------------
        exp2 |     11.69    0.085522
       exper |     11.47    0.087159
        educ |      1.07    0.938056
-------------+----------------------
    Mean VIF |      8.08

• A rule of thumb states that there is evidence of collinearity if the

largest VIF is greater than 10.
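To make the link between the VIF and the auxiliary R² concrete, the sketch below backs out the R²k implied by the 1/VIF column of the `estat vif` output and applies the rule of thumb. The helper name `vif` and the dictionary layout are illustrative choices, not STATA features.

```python
def vif(r2_k):
    """Variance inflation factor from the R-squared of regressing
    regressor k on all the other regressors."""
    if not 0 <= r2_k < 1:
        raise ValueError("r2_k must lie in [0, 1)")
    return 1 / (1 - r2_k)


# 1/VIF column copied from the `estat vif` output; 1/VIF = 1 - R2_k.
tolerance = {"exp2": 0.085522, "exper": 0.087159, "educ": 0.938056}

implied_r2 = {name: 1 - tol for name, tol in tolerance.items()}
vifs = {name: vif(r2) for name, r2 in implied_r2.items()}
mean_vif = sum(vifs.values()) / len(vifs)

# Rule of thumb: evidence of collinearity if the VIF exceeds 10.
flagged = [name for name, value in vifs.items() if value > 10]

for name, value in vifs.items():
    print(f"{name:6s} R2_k = {implied_r2[name]:.3f}  VIF = {value:.2f}")
print(f"mean VIF = {mean_vif:.2f}, flagged: {flagged}")
```

An auxiliary R² above 0.9 is what pushes the VIF past 10, which is the case for exper and exp2 but not for educ.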


• Here, it is not too surprising that exper and exper2 are correlated,

but the quadratic in experience provides a better fit.

• More generally, when we are adding explanatory variables to a regres-

sion model to reduce the error variance, we should always try to include

independent variables that affect Y and are uncorrelated with all of

the independent variables of interest.

• Because near-collinearity inflates standard errors, significant coeffi-

cients will become more significant if you include less collinear re-

gressors.

• If we include variables that do not belong, OLS remains unbiased, i.e. E(β̂1) = β1, although the variance of the estimates may increase.


• Here, we could add demographic characteristics such as marital status,

geographic location, etc.

1.13.2 Omitted Variables Bias

• If we omit variables that do belong, then the OLS estimate will likely be biased, E(β̂1) ≠ β1

• For example, suppose that the true wage equation model was

$$wages_i = \beta_0 + \beta_1 educ_i + \beta_2 abil_i + u_i$$

but that since we do not observe ability, we estimate

$$wages_i = \beta_0 + \beta_1 educ_i + v_i \quad (13)$$

where $v_i = \beta_2 abil_i + u_i$

• Then, calling β̃1 the estimate from equation (13), which omits ability, we can show that

$$E[\tilde{\beta}_1] = \beta_1 + \beta_2 \delta_1 \quad \text{where} \quad \delta_1 = \frac{Cov(educ_i, abil_i)}{Var(educ_i)}$$

• More generally, when X1 and X2 are correlated and β2 ≠ 0, the estimate β̃1 will be biased.

• The sign of the bias depends on both the sign of β2 and of δ1

              Corr(X1, X2) > 0    Corr(X1, X2) < 0
β2 > 0        positive bias       negative bias
β2 < 0        negative bias       positive bias

• In the case of the wage equation, because more ability leads to higher productivity and higher wages, β2 > 0. There are also reasons to believe that educ and ability are positively correlated, so we would think that the OLS estimates from equation (13) are too large.

• What to do about it? This is not an easy problem to correct if we do not have some measures of ability in our sample. (One has to take a quasi-experimental approach, using IV for example.)

• However, in terms of reporting results, one should be aware of the possibility of an “omitted variables” bias and qualify the results as likely “upward biased” or “downward biased”.
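The omitted variables formula can be verified on artificial data. In the sketch below all numbers are hypothetical: wages are generated without noise from chosen values of β0, β1 and β2, so the slope of the short regression (wages on educ only) equals β1 + β2·δ1 exactly, up to floating-point error.

```python
def cov(x, y):
    """Sample covariance (dividing by n; the scaling cancels in the ratios)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / n


def var(x):
    return cov(x, x)


# Hypothetical "true" parameters and data, chosen for illustration only.
beta0, beta1, beta2 = 2.0, 1.0, 0.5
educ = [8, 10, 12, 12, 14, 16, 18]
abil = [1.0, 0.5, 2.0, 1.5, 2.5, 2.0, 3.5]   # positively related to educ

# True model without noise: wages = beta0 + beta1*educ + beta2*abil.
wages = [beta0 + beta1 * e + beta2 * a for e, a in zip(educ, abil)]

# Short regression that omits ability: slope of wages on educ.
short_slope = cov(educ, wages) / var(educ)

# delta1: slope from regressing the omitted variable on educ.
delta1 = cov(educ, abil) / var(educ)
bias = beta2 * delta1

print(f"short-regression slope: {short_slope:.4f}")
print(f"beta1 + beta2*delta1:   {beta1 + bias:.4f}")
```

With β2 > 0 and a positive correlation between educ and abil, the short regression overstates the return to education, matching the top-left cell of the sign table.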

1.13.3 Heteroscedasticity

• When the variance of the error terms is not constant across observations, we have a problem of heteroscedasticity: Var(ui|educ) = σ²i = σ²(Xi)

• The OLS estimates are still unbiased and consistent, but the standard

errors of the estimates are biased if we have heteroskedasticity


• If the standard errors are biased, we can not use the usual t statistics

or F statistics or LM statistics for drawing inferences

• But we can test for it using the Breusch-Pagan test, which amounts to testing H0 : t = 0 in Var(ui) = σ² exp(Zt), that is, running a regression using the squared OLS residuals as dependent variable on the fitted values or some explanatory variables

. estat hettest

Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
         Ho: Constant variance
         Variables: fitted values of lrwage

         chi2(1)      =     3.42
         Prob > chi2  =   0.0645


• Here we are happy when we fail to reject H0
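The p-value reported by `estat hettest` can be checked with the standard library alone. The sketch relies on the identity that, for one degree of freedom, the upper-tail probability of a chi-square variable is erfc(√(x/2)); the function name `chi2_1df_pvalue` is illustrative, and the statistic is the rounded value 3.42 from the output.

```python
import math


def chi2_1df_pvalue(stat):
    """Upper-tail probability P(X > stat) for a chi-square(1) variable.

    Uses the identity P(X > x) = erfc(sqrt(x / 2)), valid for 1 degree
    of freedom because X is the square of a standard normal.
    """
    if stat < 0:
        raise ValueError("the test statistic must be non-negative")
    return math.erfc(math.sqrt(stat / 2))


bp_stat = 3.42                      # chi2(1) from `estat hettest`
p_value = chi2_1df_pvalue(bp_stat)
reject_at_5pct = p_value < 0.05     # H0: constant variance

print(f"p-value = {p_value:.4f}, reject H0 at 5%: {reject_at_5pct}")
```

The result is close to the reported 0.0645 (the statistic is rounded), so homoskedasticity is not rejected at the 5% level, although it would be at the 10% level.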

• What to do about this? In STATA, heteroskedasticity-robust standard errors are easily obtained using the robust option of reg

• The resulting White or Huber standard errors will be asymptotically valid in the presence of any form of heteroscedasticity, including homoscedasticity.

• When the form of the heteroskedasticity is known, for example σ²i = σ² ∗ educ, then we can use weighted least squares (vwls in STATA)

. regress lwage educ exper exp2 exp3 exp4 [weight=weight], robust
(analytic weights assumed)
(sum of wgt is 2.6311e+07)

Linear regression                                      Number of obs =   11893
                                                       F(  5, 11887) = 1054.37
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.3332
                                                       Root MSE      =  .45817

------------------------------------------------------------------------------
             |               Robust
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .1086498   .0020186    53.82   0.000     .1046929    .1126066
       exper |   .0637706    .005069    12.58   0.000     .0538344    .0737067
        exp2 |  -.0023563   .0004762    -4.95   0.000    -.0032898   -.0014229
        exp3 |   .0000358   .0000162     2.21   0.027     4.01e-06    .0000676
        exp4 |  -1.80e-07   1.81e-07    -1.00   0.318    -5.34e-07    1.74e-07
       _cons |   .4973948   .0289503    17.18   0.000     .4406475    .5541421
------------------------------------------------------------------------------
