1 Econ 495 - Econometric Review Contents - Faculty of...


Page 1: 1 Econ 495 - Econometric Review Contents - Faculty of …faculty.arts.ubc.ca/asiwan/documents/econ-review-lec1.pdfEcon 495 - Econometric Review 1 ... regression equation). ... A simple

Econ 495 - Econometric Review 1

Contents

1 Linear Regression Analysis
1.1 The Mincer Wage Equation
1.2 Data
1.3 Econometric Model
1.4 Estimation
1.5 Diagnostics - Goodness of Fit
1.6 Inference - Hypothesis Testing
1.7 Reporting the results
1.8 Interpretation of the Estimates
1.9 Multivariate Regression Analysis
1.10 Diagnostics - Goodness of Fit
1.11 Interpretation of the Estimates
1.12 Choosing the Functional Form
1.13 Potential Problems
1.13.1 Multicollinearity
1.13.2 Omitted Variables Bias
1.13.3 Heteroscedasticity


1 Linear Regression Analysis

1.1 The Mincer Wage Equation

• Our first exercise in empirical analysis will focus on the determinants

of wages in a cross-section of individuals, that is, observations on

individuals at a specific point in time.

• A complete wage equation model would include the following human

capital variables

\[ \log(wages_i) = \beta_0 + \beta_1\, educ_i + \beta_2\, exper_i + \beta_3\, exper_i^2 + \dots + u_i \qquad (1) \]

where the term \(u_i\) contains factors such as ability, quality of education, family background and other factors influencing a person’s wage.


• For some specific purposes, we will also include gender and union status.

• We may think of the relationship between wages and their determinants, including institutions and industrial characteristics, as the wage structure.

• To begin, suppose that we are interested in the effect of education, \(\beta_1\), measured in years of schooling, on wages:

\[ wages_i = \beta_0 + \beta_1\, educ_i + u_i \qquad (2) \]


1.2 Data

• The Labour Force Survey selects individuals (close to) randomly and asks them about their wage (\(Y_i\)), education and other characteristics (\(X_i\)).

• These data \(\{(X_i, Y_i) : i = 1, \dots, n\}\) will constitute our random sample (A2) of size \(n\) from the population.

• A scatter plot of wages and education level indicates a positive relationship.


[Figure 1: Wages and Years of Schooling. Scatter plot of rwage (vertical axis, 0 to 50) against schooling (horizontal axis, 8 to 18).]

• So do the average wages by education level:


Table 1: Average hourly wages by education level

Education level            ≈ Years of schooling    Average hourly wage ($)
All workers                        13                   17.65381
0 to 8 years                        8                   14.15454
Some secondary                     10                   13.39185
Grade 11 to 13                     12                   15.81303
Some post secondary                13                   15.45389
Post secondary diploma             14                   18.26481
University: bachelors              16                   23.58583
Graduate degree                    18                   29.11108


• But we may want to know by how much wages increase when schooling increases by one year.

1.3 Econometric Model

• The (population) regression function

\[ E(wages_i \mid educ_i) = \beta_0 + \beta_1\, educ_i \qquad (3) \]

describes expected wages conditional on a level of schooling as a linear function of the parameters (A1), under the zero conditional mean assumption (A3), \(E(u_i \mid educ_i) = 0\).

• For any given value of schooling, the distribution of wages is centered about \(E(wages \mid schooling)\).


[Figure 2: E(wages|schooling) as a linear function of schooling. Vertical axis: rwage (0 to 50); horizontal axis: schooling (8 to 18).]

• Note that \(E(u_i \mid educ_i) = 0\) implies, by the law of iterated expectations, that \(E(u_i) = 0\) and that \(Cov(u_i, educ_i) = E(u_i \cdot educ_i) = 0\).

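• This means that \(u_i\) has a zero mean and is uncorrelated with \(educ_i\), which may be far-fetched in this case: ability, for instance, is part of \(u_i\) and is likely to be correlated with schooling.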
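Written out step by step, both implications follow from the law of iterated expectations; the second line uses the fact that \(educ_i\) can be taken outside an expectation that conditions on \(educ_i\):

```latex
\begin{aligned}
E(u_i) &= E\big[E(u_i \mid educ_i)\big] = E[0] = 0, \\
E(u_i\, educ_i) &= E\big[educ_i\, E(u_i \mid educ_i)\big] = E[educ_i \cdot 0] = 0, \\
Cov(u_i, educ_i) &= E(u_i\, educ_i) - E(u_i)\, E(educ_i) = 0 - 0 \cdot E(educ_i) = 0.
\end{aligned}
```

The converse does not hold: zero correlation alone does not imply \(E(u_i \mid educ_i) = 0\), so (A3) is the stronger assumption.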


• Another typical assumption (A5) is that \(Var(u_i \mid educ_i) = \sigma^2\) is constant, a property called homoskedasticity.

• But it appears problematic here! We will see later how to test for it.

1.4 Estimation

• The objective is to obtain an estimate, denoted \(\hat{\beta}_1\), of the unknown parameter \(\beta_1\) from the data sample.


• Letting \(Y_i\) denote wages and \(X_i\) denote education, we can write the model as

\[ Y_i = \beta_0 + \beta_1 X_i + u_i \qquad (4) \]

• Either through the method of moments, which replaces the expectations in the moment conditions by sample averages,

\[ E(u_i) = E[Y_i - \beta_0 - \beta_1 X_i] = 0 \]
\[ E(u_i X_i) = E[(Y_i - \beta_0 - \beta_1 X_i)\, X_i] = 0 \]

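• or with the ordinary least squares (OLS) estimator, which minimizes the sum of squared errors \(SS = \sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 X_i)^2\), we obtain the same estimator

\[ \hat{\beta}_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2} \qquad (5) \]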
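As a minimal sketch (in Python rather than the STATA used in the course), equation (5) translates directly into a few lines. The function name `ols_slope` and the toy numbers are hypothetical illustrations, not the Labour Force Survey data; the check for zero variation in X reflects the requirement that the denominator of (5) be positive.

```python
from statistics import fmean


def ols_slope(x: list[float], y: list[float]) -> float:
    """OLS slope from equation (5): cross-deviations of X and Y over squared X-deviations."""
    x_bar, y_bar = fmean(x), fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        # Equation (5) is undefined when every X_i takes the same value.
        raise ValueError("X has no sampling variation")
    return sxy / sxx


# Hypothetical toy sample (not the Labour Force Survey data).
schooling = [8, 10, 12, 12, 14, 16, 16, 18]
wage = [10.5, 12.0, 15.5, 14.0, 18.0, 22.5, 24.0, 29.0]
print(round(ols_slope(schooling, wage), 3))
```

On data lying exactly on a line, such as Y = 2 + 3X, the function returns the slope 3.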


Hutcheson, G. D. (2011). Ordinary Least-Squares Regression. In L. Moutinho and G. D. Hutcheson, The SAGE Dictionary of Quantitative Management Research. Pages 224-228.

Ordinary Least-Squares Regression

Introduction

Ordinary least-squares (OLS) regression is a generalized linear modelling technique that may be used to model a single response variable which has been recorded on at least an interval scale. The technique may be applied to single or multiple explanatory variables and also categorical explanatory variables that have been appropriately coded.

Key Features

At a very basic level, the relationship between a continuous response variable (Y) and a continuous explanatory variable (X) may be represented using a line of best-fit, where Y is predicted, at least to some extent, by X. If this relationship is linear, it may be appropriately represented mathematically using the straight line equation 'Y = α + βx', as shown in Figure 1 (this line was computed using the least-squares procedure; see Ryan, 1997).

The relationship between variables Y and X is described using the equation of the line of best fit with α indicating the value of Y when X is equal to zero (also known as the intercept) and β indicating the slope of the line (also known as the regression coefficient). The regression coefficient β describes the change in Y that is associated with a unit change in X. As can be seen from Figure 1, β only provides an indication of the average expected change (the observed data are scattered around the line), making it important to also interpret the confidence intervals for the estimate (the large sample 95% two-tailed approximation of the confidence intervals can be calculated as β ± 1.96 × s.e.(β)).

In addition to the model parameters and confidence intervals for β, it is useful to also have an indication of how well the model fits the data. Model fit can be determined by comparing the observed scores of Y (the values of Y from the sample of data) with the expected values of Y (the values of Y predicted by the regression equation). The difference between these two values (the deviation, or residual as it is also called) provides an indication of how well the model predicts each data point. Adding up the deviances for all the data points after they have been squared (this basically removes negative deviations) provides a simple measure of the degree to which the data deviates from the model overall. The sum of all the squared residuals is known as the residual sum of squares (RSS) and provides a measure of model-fit for an OLS regression model. A poorly fitting model will deviate markedly from the data and will consequently have a relatively large RSS, whereas a good-fitting model will not deviate markedly from the data and will consequently have a relatively small RSS (a perfectly fitting model will have an RSS equal to zero, as there will be no deviation between observed and expected values of Y). It is important to understand how the RSS statistic (or the deviance as it is also known; see Agresti, 1996, pages 96-97) operates as it is used to determine the significance of individual and groups of variables in a regression model. A graphical illustration of the residuals for a simple regression model is provided in Figure 2. Detailed examples of calculating deviances from residuals for null and simple regression models can be found in Hutcheson and Moutinho, 2008.


The deviance is an important statistic as it enables the contribution made by explanatory variables to the prediction of the response variable to be determined. If by adding a variable to the model, the deviance is greatly reduced, the added variable can be said to have had a large effect on the prediction of Y for that model. If, on the other hand, the deviance is not greatly reduced, the added variable can be said to have had a small effect on the prediction of Y for that model. The change in the deviance that results from the explanatory variable being added to the model is used to determine the significance of that variable's effect on the prediction of Y in that model. To assess the effect that a single explanatory variable has on the prediction of Y, one simply compares the deviance statistics before and after the variable has been added to the model. For a simple OLS regression model, the effect of the explanatory variable can be assessed by comparing the RSS statistic for the full regression model (Y = α + βx) with that for the null model (Y = α). The difference in deviance between the nested models can then be tested for significance using an F-test computed from the following equation.

\[ F_{(df_p - df_{p+q}),\; df_{p+q}} = \frac{(RSS_p - RSS_{p+q}) / (df_p - df_{p+q})}{RSS_{p+q} / df_{p+q}} \]

where p represents the null model, Y = α, p+q represents the model Y = α + βx, and df are the degrees of freedom associated with the designated model. It can be seen from this equation that the F-statistic is simply based on the difference in the deviances between the two models as a fraction of the deviance of the full model, whilst taking account of the number of parameters.

In addition to the model-fit statistics, the R-square statistic is also commonly quoted and provides a measure that indicates the percentage of variation in the response variable that is 'explained' by the model. R-square, which is also known as the coefficient of multiple determination, is defined as

\[ R^2 = 1 - \frac{\text{RSS after regression}}{\text{total RSS}} \]

and basically gives the percentage of the deviance in the response variable that can be accounted for by adding the explanatory variable into the model. Although R-square is widely used, it never decreases as variables are added to the model (the deviance can only go down or stay the same when additional variables are added to a model). One solution to this problem is to calculate an adjusted R-square statistic (\(R^2_a\)) which takes into account the number of terms entered into the model and does not necessarily increase as more terms are added. Adjusted R-square can be derived using the following equation

\[ R^2_a = R^2 - \frac{k\,(1 - R^2)}{n - k - 1} \]

where n is the number of cases used to construct the model and k is the number of terms in the model (not including the constant).
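A small Python sketch, assuming the wage regression figures from the STATA output in these notes (\(R^2 = SSE/SST\) with SSE = 117804.085 and SST = 779573.38, n = 9720, one explanatory variable), checks that this adjusted R-square formula agrees with the equivalent form \(1 - (1 - R^2)(n - 1)/(n - k - 1)\) and reproduces the reported Adj R-squared of 0.1510. The function names are hypothetical.

```python
def adjusted_r2_hutcheson(r2: float, n: int, k: int) -> float:
    """R_a^2 = R^2 - k(1 - R^2)/(n - k - 1), with k terms excluding the constant."""
    return r2 - k * (1 - r2) / (n - k - 1)


def adjusted_r2_dof(r2: float, n: int, k: int) -> float:
    """Equivalent degrees-of-freedom form: 1 - (1 - R^2)(n - 1)/(n - k - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Wage regression figures: R^2 = SSE/SST, n = 9720 observations, k = 1 (schooling).
r2 = 117804.085 / 779573.38
print(round(adjusted_r2_hutcheson(r2, n=9720, k=1), 4))
print(round(adjusted_r2_dof(r2, n=9720, k=1), 4))
```

Expanding \(1 - (1 - R^2)(n - 1)/(n - k - 1)\) over the common denominator gives exactly \(R^2 - k(1 - R^2)/(n - k - 1)\), which is why the two functions agree.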

An example of simple OLS regression

A simple OLS regression model with a single explanatory variable can be illustrated using the example of predicting ice cream sales given outdoor temperature (Koteswara, 1970). The model for this relationship


which works provided that \(\sum_{i=1}^{n} (X_i - \bar{X})^2 > 0\), that is, provided that there is enough sampling variation in \(X\) (A4).

• But at the same time, OLS is sensitive to outliers, so we do not want too much variation either. This can be written as the requirement of finite fourth moments: \(0 < E(X_i^4) < \infty\) and \(0 < E(Y_i^4) < \infty\).

• This makes sense since the population parameter is

\[ \beta_1 = \frac{Cov(Y_i, X_i)}{Var(X_i)} \]

when \(E(u_i) = 0\) and \(Cov(u_i, X_i) = 0\).
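One way to see why the population slope takes this form is to substitute the model (4) into the covariance. The short derivation below uses only the bilinearity of covariance, the fact that the constant \(\beta_0\) has zero covariance with \(X_i\), and the moment condition \(Cov(u_i, X_i) = 0\):

```latex
\begin{aligned}
Cov(Y_i, X_i) &= Cov(\beta_0 + \beta_1 X_i + u_i,\; X_i) \\
              &= \beta_1\, Var(X_i) + Cov(u_i, X_i) \\
              &= \beta_1\, Var(X_i) \qquad \text{because } Cov(u_i, X_i) = 0, \\
\Longrightarrow \quad \beta_1 &= \frac{Cov(Y_i, X_i)}{Var(X_i)} \qquad \text{provided } Var(X_i) > 0.
\end{aligned}
```

Equation (5) is the sample analogue of this ratio: dividing its numerator and denominator by \(n - 1\) turns them into the sample covariance of \(X\) and \(Y\) and the sample variance of \(X\). Taking expectations of (4) and using \(E(u_i) = 0\) similarly gives \(\beta_0 = E(Y_i) - \beta_1 E(X_i)\).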


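• and \(\beta_0 = E(Y_i) - E(X_i)\beta_1\) will be estimated by

\[ \hat{\beta}_0 = \frac{1}{n}\sum_{i=1}^{n} Y_i - \hat{\beta}_1 \frac{1}{n}\sum_{i=1}^{n} X_i = \bar{Y} - \hat{\beta}_1 \bar{X} \qquad (6) \]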

• Predicted wages are obtained from the sample regression function

\[ \hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X \qquad (7) \]

• The residual \(\hat{u}_i\) is an estimate of the error term \(u_i\); it is the vertical difference between the sample point and the fitted line (sample regression function):

\[ \hat{u}_i = Y_i - \hat{Y}_i \]
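A self-contained Python sketch (the toy numbers and the function name `ols_fit` are hypothetical) computes the intercept (6), the fitted values (7) and the residuals. Because OLS solves the sample versions of the two moment conditions, the residuals sum to zero and are orthogonal to X, up to floating-point rounding:

```python
from statistics import fmean


def ols_fit(x: list[float], y: list[float]) -> tuple[float, float]:
    """Return (b0_hat, b1_hat): slope from equation (5), intercept from equation (6)."""
    x_bar, y_bar = fmean(x), fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar  # equation (6): Y-bar minus b1_hat times X-bar
    return b0, b1


# Hypothetical toy sample (not the Labour Force Survey data).
schooling = [8, 10, 12, 12, 14, 16, 16, 18]
wage = [10.5, 12.0, 15.5, 14.0, 18.0, 22.5, 24.0, 29.0]

b0, b1 = ols_fit(schooling, wage)
fitted = [b0 + b1 * xi for xi in schooling]                # equation (7)
residuals = [yi - yhat for yi, yhat in zip(wage, fitted)]  # u_hat_i = Y_i - Y_hat_i

# Sample versions of the two moment conditions; both values are zero
# up to floating-point rounding.
print(sum(residuals))
print(sum(u * xi for u, xi in zip(residuals, schooling)))
```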


• Thus, intuitively, OLS fits a line through the sample points such that the sum of the squared vertical distances between actual and predicted wages, that is, the sum of squared residuals, is as small as possible.

• Under assumptions (A1)-(A4), our OLS estimates are unbiased, that is, \(E(\hat{\beta}_1) = \beta_1\) and \(E(\hat{\beta}_0) = \beta_0\). Adding assumption (A5), the OLS estimator is BLUE (best linear unbiased estimator) in the sense that it has the minimum variance among linear unbiased estimators (Gauss-Markov Theorem).

• In practice, the computer software does the computation for us
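The unbiasedness claim can be illustrated with a small Monte Carlo experiment, sketched here in Python under assumed values: the "true" parameters (intercept −2.4, slope 1.5, error standard deviation 8) are hypothetical and only loosely echo the magnitudes in the STATA output. The errors are drawn independently of schooling, so \(E(u_i \mid X_i) = 0\) holds by construction. Any single estimate misses 1.5, but the average over many samples should sit very close to it.

```python
import random
from statistics import fmean


def ols_slope(x: list[float], y: list[float]) -> float:
    """OLS slope from equation (5)."""
    x_bar, y_bar = fmean(x), fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


def simulate_slopes(beta0: float = -2.4, beta1: float = 1.5, sigma: float = 8.0,
                    n: int = 200, reps: int = 500, seed: int = 495) -> list[float]:
    """Draw `reps` samples from a known linear model; return the OLS slope of each."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(reps):
        x = [rng.uniform(8, 18) for _ in range(n)]                  # years of schooling
        y = [beta0 + beta1 * xi + rng.gauss(0, sigma) for xi in x]  # u drawn independently of x
        slopes.append(ols_slope(x, y))
    return slopes


slopes = simulate_slopes()
print(round(min(slopes), 2), round(max(slopes), 2))  # individual estimates vary
print(round(fmean(slopes), 3))                       # their average is close to 1.5
```

The errors here happen to be homoskedastic, but unbiasedness only needs (A1)-(A4); (A5) matters for the efficiency (BLUE) part of the statement.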


. regress rwage schooling

      Source |       SS       df       MS              Number of obs =    9720
-------------+------------------------------           F(  1,  9718) = 1729.94
       Model |  117804.085     1  117804.085           Prob > F      =  0.0000
    Residual |  661769.296  9718  68.0972727           R-squared     =  0.1511
-------------+------------------------------           Adj R-squared =  0.1510
       Total |   779573.38  9719  80.2112749           Root MSE      =  8.2521

------------------------------------------------------------------------------
       rwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   schooling |   1.541137   .0370532    41.59   0.000     1.468505    1.613769
       _cons |  -2.426309    .489984    -4.95   0.000    -3.386779   -1.465838
------------------------------------------------------------------------------

. predict prwage
(option xb assumed; fitted values)
. predict reswage, residuals


1.5 Diagnostics - Goodness of Fit

• The STATA output gives several measures of how well our regression model fits the data:

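\[ \text{Model/Explained:}\quad SSE \equiv \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 \]
\[ \text{Residual:}\quad SSR \equiv \sum_{i=1}^{n} \hat{u}_i^2 \]
\[ \text{Total:}\quad SST \equiv \sum_{i=1}^{n} (Y_i - \bar{Y})^2 \]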

and the \(R^2\), which is the ratio of the explained variation to the total variation:

\[ R^2 = SSE/SST = 1 - SSR/SST \]
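As a quick arithmetic check, the sums of squares in the STATA ANOVA table for `regress rwage schooling` reproduce the reported R-squared both ways (Python, with the values copied from that output):

```python
# Sums of squares copied from the ANOVA table of `regress rwage schooling`.
SSE = 117804.085  # Model (explained) sum of squares
SSR = 661769.296  # Residual sum of squares
SST = 779573.38   # Total sum of squares

r2_explained = SSE / SST     # R^2 = SSE/SST
r2_residual = 1 - SSR / SST  # R^2 = 1 - SSR/SST

# SSE + SSR equals SST up to the rounding in the printed table.
print(SSE + SSR - SST)
print(round(r2_explained, 4), round(r2_residual, 4))  # both 0.1511, the reported R-squared
```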


• The \(R^2\) can also be shown to equal the squared correlation coefficient between the actual \(Y_i\) and the fitted values \(\hat{Y}_i\).

• The adjusted \(R^2\) takes into account the number of explanatory variables: \(R^2_a = 1 - (1 - R^2)(n - c)/(n - k)\), where \(k\) is the number of estimated coefficients (counting the constant, if there is one) and \(c = 1\) if there is a constant.

• Here, an \(R^2\) of 0.15 means that 15% of the variation in wages across individuals is explained by their education level.

• This means that 85% of the variation in wages remains unexplained!

We will want to add more variables!
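A short Python check of the claim that \(R^2\) equals the squared correlation between \(Y_i\) and \(\hat{Y}_i\), using a hypothetical toy sample and helper names of our own choosing. The equality relies on the regression including a constant, as it does here.

```python
from statistics import fmean


def ols_fitted(x: list[float], y: list[float]) -> list[float]:
    """Fitted values Y_hat from a simple regression with an intercept (equations (5)-(7))."""
    x_bar, y_bar = fmean(x), fmean(y)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
        (xi - x_bar) ** 2 for xi in x
    )
    b0 = y_bar - b1 * x_bar
    return [b0 + b1 * xi for xi in x]


def r_squared(y: list[float], fitted: list[float]) -> float:
    """R^2 = 1 - SSR/SST."""
    y_bar = fmean(y)
    ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ssr / sst


def correlation(a: list[float], b: list[float]) -> float:
    """Sample (Pearson) correlation coefficient."""
    a_bar, b_bar = fmean(a), fmean(b)
    cov = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    var_a = sum((ai - a_bar) ** 2 for ai in a)
    var_b = sum((bi - b_bar) ** 2 for bi in b)
    return cov / (var_a * var_b) ** 0.5


# Hypothetical toy sample (not the Labour Force Survey data).
schooling = [8, 10, 12, 12, 14, 16, 16, 18]
wage = [10.5, 12.0, 15.5, 14.0, 18.0, 22.5, 24.0, 29.0]
fitted = ols_fitted(schooling, wage)
print(round(r_squared(wage, fitted), 6), round(correlation(wage, fitted) ** 2, 6))
```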


• Yet \(R^2\) values are typically very low in cross-sectional data.

• So by these standards, this regression is pretty good, but the \(R^2\) is not the only way to judge the success of a model.

• Also reported are

\[ \text{Root MSE} = s = \sqrt{SSR/(n - k)} \quad\text{and}\quad F = \frac{SSE/(k - c)}{s^2} \]

The F-Statistic is used to test whether a group of variables should be

included in the model.
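The remaining header entries of the STATA output can be reproduced from its ANOVA table with these formulas. In this Python sketch, k = 2 counts the estimated coefficients (schooling and the constant) and c = 1 because a constant is included; this reading of k is an assumption, chosen because it matches the reported residual degrees of freedom, 9718 = 9720 − 2.

```python
import math

# Entries from the STATA output of `regress rwage schooling`.
n = 9720          # Number of obs
k = 2             # estimated coefficients: schooling and the constant (assumed reading of k)
c = 1             # the regression includes a constant
SSE = 117804.085  # Model sum of squares
SSR = 661769.296  # Residual sum of squares

s2 = SSR / (n - k)             # residual mean square (MS column, Residual row)
root_mse = math.sqrt(s2)       # Root MSE = s
f_stat = (SSE / (k - c)) / s2  # F = [SSE/(k - c)] / s^2

print(f"s^2 = {s2:.7f}, Root MSE = {root_mse:.4f}, F = {f_stat:.2f}")
```

With a single regressor, this F-statistic tests the same hypothesis as the t-test on schooling, \(H_0 : \beta_1 = 0\).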


1.6 Inference - Hypothesis Testing

• The success of a model also depends on whether the variables included

in the model belong there, that is, are statistically significant.

• Under the assumption (A6) that the \(u_i\) are normally distributed with zero mean and variance \(\sigma^2\), \(u \sim \text{Normal}(0, \sigma^2)\), the estimates \(\hat{\beta}\) will also be normally distributed, and

\[ (\hat{\beta} - \beta)/se(\hat{\beta}) \sim t_{DF} \]

will follow the Student-t distribution, where \(DF = n - k - 1\), the degrees of freedom in the model, is equal to the number of observations minus the number of variables minus 1 for the constant.


• We can use the t-statistic reported by STATA to test the null hypothesis \(H_0 : \beta = 0\) against \(H_1 : \beta \neq 0\).

• If the absolute value of the t-statistic is greater than the critical value corresponding to our degrees of freedom and the desired significance level of the test (5% or 1%), we can reject the null.

• The rule of thumb is: if \(|t| \geq 2.0\), then reject \(H_0 : \beta = 0\) at the 5% significance level. To be more conservative, we sometimes require even larger values.

• But we do not have to look up the critical values in a table, since STATA gives us the p-value corresponding to our t-statistic:


– If p ≤ 0.01 then the relationship is significant at the 1% level,

– If p ≤ 0.05 then the relationship is significant at the 5% level,

– If p ≤ 0.10 then the relationship is significant at the 10% level,

• Here with a t-statistic of 41.59, we can say that schooling is a very

significant factor explaining the variation in wages

• It is all very good to know that the coefficient of schooling is different from zero, but we would also like to know how precisely it is estimated.

• The confidence interval tells us that, under the classical OLS assumptions (A1-A6), intervals constructed as

\[ \hat{\beta} \pm t_{0.975,\,n-k-1} \cdot se(\hat{\beta}) \]

contain the true parameter in 95% of repeated samples, where \(t_{0.975,\,n-k-1}\) is the 97.5th percentile of the \(t_{n-k-1}\) distribution.

• If the degrees of freedom \(DF = n - k - 1 > 120\), the \(t_{n-k-1}\) distribution is close enough to the normal to use the 97.5th percentile of the standard normal (1.96), so the confidence interval becomes

\[ [\hat{\beta} - 1.96 \cdot se(\hat{\beta}),\; \hat{\beta} + 1.96 \cdot se(\hat{\beta})] \]

• Thus the rule of thumb: a coefficient is not significant at the 5% level if its magnitude is less than about twice its standard error.


• In our example, this means that the 95% confidence interval for the coefficient of schooling runs from 1.47 to 1.61, a range of less than 0.15; this is almost too precise!
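The t-statistics, p-values and large-sample confidence intervals can be recomputed from the coefficients and standard errors in the STATA output. This Python sketch uses the normal approximation (1.96 and the normal tail), which is justified here because DF = 9718 > 120; STATA uses the exact t distribution, so the interval endpoints below differ from the reported ones by about 0.0001 or less.

```python
import math

# Coefficients and standard errors from the STATA output of `regress rwage schooling`.
estimates = {
    "schooling": (1.541137, 0.0370532),
    "_cons": (-2.426309, 0.489984),
}


def two_sided_p_normal(t: float) -> float:
    """Two-sided p-value 2 * (1 - Phi(|t|)) under the standard normal approximation."""
    return math.erfc(abs(t) / math.sqrt(2))


results = {}
for name, (coef, se) in estimates.items():
    t = coef / se                              # t-statistic for H0: beta = 0
    p = two_sided_p_normal(t)
    ci = (coef - 1.96 * se, coef + 1.96 * se)  # large-sample 95% confidence interval
    results[name] = (t, p, ci)
    print(f"{name:>9}: t = {t:6.2f}, p = {p:.2g}, 95% CI = [{ci[0]:.4f}, {ci[1]:.4f}]")
```

For schooling the p-value is so small that it underflows to zero in floating point, consistent with STATA reporting 0.000.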


1.7 Reporting the results

• The results from a STATA output are reported in a table that typically

contains

– estimated coefficients

– standard errors of the coefficients

– number of observations

– \(R^2\) or \(R^2_a\)

• In some instances, it may be worthwhile to report other statistics. We will discuss these issues when we cover the readings.


• The custom command outreg, used after the regress command, handles the formatting of the output. (See the course web site on how to install custom commands.)

outreg schooling using tableM1, replace bdec(3) se 3aster title("Wage Regression") ctitle("(1)")

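The file tableM1.out can then be opened in Excel (by right-clicking on it). The example below corresponds to the log-wage regression (regress lrwage schooling) of Section 1.8: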

Wage Regression
                  (1)
schooling        0.082
                [0.002]***
Constant         1.684
                [0.027]***
Observations      9720
R-squared         0.14
Standard errors in brackets
* significant at 10%; ** significant at 5%; *** significant at 1%


1.8 Interpretation of the Estimates

• In general, the \(\beta\) parameters measure the marginal effect of increasing \(X\) by one unit on predicted wages \(\hat{Y}\).

• In our example,

\[ \Delta \widehat{wage} = \hat{\beta}_1\, \Delta educ \]

tells us that the wage value of one additional year of schooling in this sample is $1.54 per hour.

• But in this simple regression, we cannot claim to have found a causal

relationship, so we should be cautious in our interpretation


• The value of −2.43 for \(\hat{\beta}_0\) says that a person with zero years of schooling has a negative predicted wage, which is silly. This occurs because no one in our sample has less than 8 years of schooling, so zero years lies well outside the range of the data. For a person with eight years of schooling, the predicted wage is

\[ \widehat{wage} = -2.426 + 1.541 \times 8 \approx 9.90 \]

which is above the minimum wage.

• If this person completes high school (4 more years), our model predicts that the wage would be higher by 4 × $1.54 ≈ $6.16 per hour, for a predicted wage of about $16.07. This is more than the average wage of $15.81 for high school graduates (Grade 11 to 13) in Table 1, which may make us question the linearity of our functional form assumption.
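The arithmetic in the last two bullets can be redone in Python with the unrounded coefficients from `regress rwage schooling`. Matching the Table 1 rows "0 to 8 years" and "Grade 11 to 13" to exactly 8 and 12 years is an approximation taken from the table's "≈ Years of schooling" column.

```python
# Unrounded coefficients from `regress rwage schooling`.
B0_HAT = -2.426309
B1_HAT = 1.541137

# Table 1 averages, matched to approximate years of schooling.
TABLE1_AVERAGE = {8: 14.15454, 12: 15.81303}  # "0 to 8 years", "Grade 11 to 13"


def predicted_wage(years: float) -> float:
    """Fitted hourly wage from the sample regression function (7)."""
    return B0_HAT + B1_HAT * years


for years in (8, 12):
    print(f"{years} years: predicted {predicted_wage(years):.2f}, "
          f"Table 1 average {TABLE1_AVERAGE[years]:.2f}")

model_gain = predicted_wage(12) - predicted_wage(8)  # equals 4 * B1_HAT
table_gain = TABLE1_AVERAGE[12] - TABLE1_AVERAGE[8]
print(f"8 -> 12 years: model gain {model_gain:.2f}, Table 1 gain {table_gain:.2f}")
```

The linear model implies a gain of about $6.16 between 8 and 12 years, while the Table 1 averages differ by only about $1.66.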


• Indeed, it is more common to estimate the following log-linear model

\[ \log(wages_i) = \beta_0 + \beta_1\, educ_i + u_i \qquad (8) \]

where \(\log(\cdot)\) denotes the natural logarithm. Since wages tend to be approximately lognormally distributed, this also reduces the problem of heteroscedasticity.

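• This is equivalent to writing \(wage_i = \exp(\beta_0 + \beta_1 educ_i + u_i)\), so that the dollar gain from an additional year of schooling grows with the level of schooling, which is consistent with the increasing returns to education that we found in Table 1.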

• In this case the interpretation of \(\beta_1\) is

\[ \%\Delta wage \approx (100 \cdot \beta_1)\, \Delta educ \]

that is, multiplying \(\beta_1\) by 100 gives us the approximate percentage change in predicted wage given an additional year of schooling.


• We run the log wage regression by first taking the log of the dependent variable:

regress lrwage schooling

Source | SS df MS Number of obs = 9720-------------+------------------------------ F( 1, 9718) = 1585.31

Model | 331.915467 1 331.915467 Prob > F = 0.0000Residual | 2034.6566 9718 .209369891 R-squared = 0.1403

-------------+------------------------------ Adj R-squared = 0.1402Total | 2366.57207 9719 .243499544 Root MSE = .45757

------------------------------------------------------------------------------lrwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]

-------------+----------------------------------------------------------------schooling | .081804 .0020546 39.82 0.000 .0777767 .0858314

_cons | 1.684045 .027169 61.98 0.000 1.630788 1.737302------------------------------------------------------------------------------
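The header statistics of a Stata regression table all follow from the sums of squares and degrees of freedom. A minimal Python sketch (standard library only; the helper name `anova_summary` is ours, not Stata's) recomputes them from the Model and Residual rows of the log wage regression above:

```python
import math


def anova_summary(ss_model, ss_resid, df_model, df_resid):
    """Recompute R2, adjusted R2, F and Root MSE from a Stata-style ANOVA table.

    Hypothetical helper: inputs are the SS and df entries of the Model and
    Residual rows (df_model = k regressors, df_resid = n - k - 1).
    """
    ss_total = ss_model + ss_resid
    df_total = df_model + df_resid                 # n - 1
    ms_model = ss_model / df_model
    ms_resid = ss_resid / df_resid                 # estimate of sigma^2
    return {
        "R2": ss_model / ss_total,                 # share of variation explained
        "adjR2": 1 - ms_resid / (ss_total / df_total),
        "F": ms_model / ms_resid,                  # overall significance test
        "RootMSE": math.sqrt(ms_resid),            # estimate of sigma
    }


# Entries from the table for: regress lrwage schooling
stats = anova_summary(331.915467, 2034.6566, 1, 9718)
for name, value in stats.items():
    print(f"{name:8s} {value:.4f}")
```

Up to rounding, the printed values should match the R-squared, Adj R-squared, F and Root MSE entries in the Stata header.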

• The coefficient on schooling has a percentage interpretation when it is multiplied by 100. That is, predicted wages increase by approximately 8.2 percent for every additional year of education.

• In the human capital interpretation of the wage equation, this means that the rate of return to one year of schooling is about 8.2%, not bad!

• This easy interpretation of the rate of return to schooling is one of the reasons why the log wage specification is usually preferred.

• The intercept of 1.684 is again not very meaningful, since it gives the predicted log(wages) when schooling = 0.
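The 100·β1 rule is a first-order approximation: the exact percentage change in the predicted wage for a change Δeduc is 100·[exp(β1·Δeduc) − 1]. A small Python sketch (the helper `pct_effect` is hypothetical) compares the two for the schooling coefficient estimated above:

```python
import math


def pct_effect(beta, delta_x=1.0):
    """Percentage change in the predicted wage when x rises by delta_x
    in log(wage) = b0 + beta * x + u.

    Returns (approximate, exact) = (100 * beta * dx, 100 * (exp(beta * dx) - 1)).
    """
    approx = 100 * beta * delta_x
    exact = 100 * (math.exp(beta * delta_x) - 1)
    return approx, exact


# Coefficient on schooling from the log wage regression above
approx, exact = pct_effect(0.081804)
print(f"1 year:  approx {approx:.2f}%, exact {exact:.2f}%")

# The gap widens for larger changes, e.g. 4 more years of schooling
approx4, exact4 = pct_effect(0.081804, 4)
print(f"4 years: approx {approx4:.1f}%, exact {exact4:.1f}%")
```

For one year the two are close (about 8.2% versus 8.5%), but the approximation deteriorates as β1·Δeduc grows.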

• The log-linear model imposes a constant percentage effect of schooling on wages.

• Another important model is the log-log model, which is a constant elasticity model. It would be more meaningful if we had some measure of output, as in

$$\log(salary_i) = \beta_0 + \beta_1 \log(sales_i) + u_i$$

• In this case, β1 is the elasticity of salary with respect to sales:

$$\%\Delta salary \approx \beta_1 \, \%\Delta sales \iff \beta_1 \approx \frac{\%\Delta salary}{\%\Delta sales}$$
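To see why the log-log model is called a constant elasticity model, a short Python sketch with purely hypothetical coefficients (b0 = 4.8, b1 = 0.25, not estimated from any data) shows that a 1% increase in sales raises predicted salary by roughly b1 percent, whatever the level of sales:

```python
import math


def predicted_salary(sales, b0, b1):
    """Predicted salary from log(salary) = b0 + b1 * log(sales), ignoring u."""
    return math.exp(b0 + b1 * math.log(sales))


# Hypothetical coefficients, for illustration only
b0, b1 = 4.8, 0.25

for sales in (100.0, 1_000.0, 10_000.0):
    base = predicted_salary(sales, b0, b1)
    after = predicted_salary(sales * 1.01, b0, b1)   # sales 1% higher
    pct_change = 100 * (after / base - 1)
    print(f"sales = {sales:>8.0f}: salary rises by {pct_change:.4f}%")
```

The percentage change is identical at every sales level, and close to b1 = 0.25 because a 1% change is small.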

1.9 Multivariate Regression Analysis

• We have already improved our wage equation model by using log(wages); now we would like to add more variables, in particular labour market experience.

• We can also use a more flexible functional form by adding higher-order (polynomial) terms in the explanatory variables.

• For example, here a quadratic in experience can capture diminishing returns to on-the-job training:

$$\log(wages_i) = \beta_0 + \beta_1 educ_i + \beta_2 exper_i + \beta_3 exper_i^2 + \dots + u_i \tag{9}$$

• In the US equivalent of the Canadian Labour Force Survey, the Current Population Survey (CPS), data on years of schooling and age in years are available, but not the number of years of actual labour market experience.

• So a potential experience variable is constructed as exper = age − educ − 6 (years since leaving school, assuming schooling starts at age 6), and the regression results for the US CPS (October 1997) are:

. regress lwage educ exper exp2 [weight=weight]
(analytic weights assumed)
(sum of wgt is   2.6311e+07)

      Source |       SS       df       MS              Number of obs =   11893
-------------+------------------------------           F(  3, 11889) = 1943.58
       Model |  1231.32118     3  410.440394           Prob > F      =  0.0000
    Residual |  2510.68698 11889  .211177305           R-squared     =  0.3291
-------------+------------------------------           Adj R-squared =  0.3289
       Total |  3742.00816 11892  .314666008           Root MSE      =  .45954

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .1084982   .0017458    62.15   0.000     .1050761    .1119202
       exper |   .0383817   .0012279    31.26   0.000     .0359747    .0407887
        exp2 |   -.000639   .0000296   -21.57   0.000     -.000697   -.0005809
       _cons |   .5751865   .0250984    22.92   0.000     .5259896    .6243835
------------------------------------------------------------------------------

. test exper=exp2=0

 ( 1)  exper - exp2 = 0
 ( 2)  exper = 0

       F(  2, 11889) =  873.19
            Prob > F =    0.0000

1.10 Diagnostics - Goodness of Fit

• As before, we can use the t-statistics to determine whether each variable is individually statistically significant.

• But we can also use the F-statistic to test the significance of the whole model, that is, the null hypothesis that the explanatory variables are jointly insignificant:

$$H_0: \beta_1 = \beta_2 = \beta_3 = 0 \quad \text{vs.} \quad H_1: H_0 \text{ is not true}$$

• Here, with F(3, 11889) = 1943.58 and Prob > F = 0.0000, we would overwhelmingly reject H0.

• The F-statistic can also be used to test a restricted model against an unrestricted model.

• The model $\log(wages_i) = \beta_0 + \beta_1 educ_i + u_i$ can be seen as a restricted version of the model with experience, where $H_0: \beta_2 = \beta_3 = 0$.

• We can test this hypothesis using the R-squared form of the F-statistic

$$F \equiv \frac{(R^2_{ur} - R^2_r)/q}{(1 - R^2_{ur})/(n - k - 1)}$$

where q is the number of exclusion restrictions and n − k − 1 = DF_ur, the residual degrees of freedom of the unrestricted model.

. regress lwage educ [weight=weight]
(analytic weights assumed)
(sum of wgt is   2.6311e+07)

      Source |       SS       df       MS              Number of obs =   11893
-------------+------------------------------           F(  1, 11891) = 3561.85
       Model |  862.524703     1  862.524703           Prob > F      =  0.0000
    Residual |  2879.48346 11891  .242156544           R-squared     =  0.2305
-------------+------------------------------           Adj R-squared =  0.2304
       Total |  3742.00816 11892  .314666008           Root MSE      =  .49209

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .1080619   .0018107    59.68   0.000     .1045127     .111611
       _cons |   .9835394   .0245724    40.03   0.000     .9353734    1.031705
------------------------------------------------------------------------------

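• We get F = [(0.3291 − 0.2305)/(1 − 0.3291)] × (11889/2) ≈ 873.64, which is greater than the 5% critical value F(2, 11889) ≈ 3.00, so we reject H0. (The small gap from the 873.19 reported by test comes from rounding the R-squareds.)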
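The same test can be computed from the sums of squared residuals (SSR) of the two regressions, since R² = 1 − SSR/SST and both models share the same dependent variable, weights and sample (the Total SS is 3742.00816 in both tables). A minimal Python sketch (helper names are ours) evaluates both forms with the numbers from the two regression tables:

```python
def f_from_r2(r2_ur, r2_r, q, df_ur):
    """R-squared form of the F statistic for q exclusion restrictions."""
    return ((r2_ur - r2_r) / q) / ((1 - r2_ur) / df_ur)


def f_from_ssr(ssr_r, ssr_ur, q, df_ur):
    """SSR form; equivalent when both models use the same dependent variable and sample."""
    return ((ssr_r - ssr_ur) / q) / (ssr_ur / df_ur)


# Restricted: lwage on educ.  Unrestricted: lwage on educ, exper, exp2.
f_r2 = f_from_r2(0.3291, 0.2305, q=2, df_ur=11889)              # rounded R-squareds
f_ssr = f_from_ssr(2879.48346, 2510.68698, q=2, df_ur=11889)    # Residual SS rows
print(f"R-squared form: {f_r2:.2f}   SSR form: {f_ssr:.2f}")
```

The SSR form, which uses unrounded sums of squares, reproduces the 873.19 reported by test.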


1.11 Interpretation of the Estimates

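• The general model

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \dots + \beta_k X_{ki} + u_i \tag{10}$$

can be written in terms of changes (holding u fixed)

$$\Delta Y = \beta_1 \Delta X_1 + \beta_2 \Delta X_2 + \beta_3 \Delta X_3 + \dots + \beta_k \Delta X_k \tag{11}$$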

• The coefficient on the variable $X_k$ measures the change in $Y$ due to a one-unit increase in $X_k$, holding all the other explanatory variables fixed (the so-called ceteris paribus assumption): $\Delta Y = \beta_k \Delta X_k$.

• These effects are sometimes called marginal or partial effects.

• In our example, since the dependent variable is log(wages), the coefficient on educ, β1 = 0.108, means an increase of approximately 10.8 percent in predicted wages for every additional year of education.

• Since β2 = 0.04 > 0 and β3 = −0.0006 < 0, there is a concave relationship between log wages and experience.

• With the experience variable, the ceteris paribus interpretation does not work directly: when we increase exper, exp2 increases as well, so we have to compute the partial effect:

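$$\frac{\Delta \log(wages)}{\Delta exper} \approx \frac{\partial \log(wages)}{\partial exper} = \beta_2 + 2\beta_3\, exper \tag{12}$$

evaluated at a given level of exper.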
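Equation (12) can be evaluated directly from the estimated coefficients on exper and exp2 in the quadratic regression; the experience level at which it equals zero is the turning point of the fitted profile. A minimal Python sketch (coefficients copied from the regression table above; function names are ours):

```python
# Coefficients on exper and exp2 from: regress lwage educ exper exp2 [weight=weight]
B_EXPER = 0.0383817
B_EXP2 = -0.000639


def marginal_effect(exper):
    """Equation (12): d log(wages) / d exper = beta2 + 2 * beta3 * exper."""
    return B_EXPER + 2 * B_EXP2 * exper


def turning_point():
    """Experience level where the fitted quadratic peaks (marginal effect = 0)."""
    return -B_EXPER / (2 * B_EXP2)


for years in (0, 5, 18.67317, 30):
    print(f"exper = {years:>8}: marginal effect = {marginal_effect(years):.4f}")
print(f"turning point: {turning_point():.2f} years")
```

This only reproduces point estimates; the standard error that lincom reports also needs the covariance between the two coefficients, which the regression table does not show.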

. sum exper

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       exper |     11893    18.67317    11.58664          0         58

. scalar experbar=r(mean)

. di experbar
18.673169

. lincom exper+2*exp2*experbar

 ( 1)  exper + 37.34634 exp2 = 0

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |   .0145183   .0003718    39.05   0.000     .0137894    .0152471
------------------------------------------------------------------------------

• Here, at the average experience level of 18.67 years, this gives 0.0383817 + 2 × (−0.000639) × 18.67 ≈ 0.0145, or a return of about 1.5% per additional year of experience at the mean.

• The turning point $exper^* = |\beta_2/(2\beta_3)| = 0.0383817/(2 \times 0.000639) \approx 30.03$ years appears to make sense.

• But we can compare the plot of the fitted quadratic in experience with a local polynomial estimate, a more flexible functional form:

. regress lwage exper exp2 [weight=weight]

. gen twage=_b[_cons]+_b[exper]*exper+_b[exp2]*exp2

. lpoly lwage exper [aweight=weight], gen(lwagep experp) nograph

. twoway (scatter twage exper) (connected experp lwagep )
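lpoly fits a local polynomial by kernel-weighted least squares; as far as we know, its default is a local-constant (degree 0) fit with an Epanechnikov kernel. A minimal Python sketch of such a local-constant (Nadaraya-Watson) smoother follows. Unlike lpoly, it takes the bandwidth as an argument rather than choosing it by a rule of thumb, and the experience profile in the example is hypothetical, not the CPS data:

```python
def epanechnikov(z):
    """Epanechnikov kernel: 0.75 * (1 - z^2) for |z| < 1, zero otherwise."""
    return 0.75 * (1 - z * z) if abs(z) < 1 else 0.0


def local_mean_smooth(x, y, grid, bandwidth, weights=None):
    """Local-constant (Nadaraya-Watson) kernel regression of y on x.

    At each grid point g, returns the average of y weighted by
    K((x_i - g) / bandwidth) times the optional observation weights
    (playing the role of the aweights in the Stata call above).
    """
    if weights is None:
        weights = [1.0] * len(x)
    smoothed = []
    for g in grid:
        num = den = 0.0
        for xi, yi, wi in zip(x, y, weights):
            k = wi * epanechnikov((xi - g) / bandwidth)
            num += k * yi
            den += k
        # No observations inside the window: leave the point undefined
        smoothed.append(num / den if den > 0 else float("nan"))
    return smoothed


# Hypothetical, noise-free concave profile (not the CPS data)
exper = list(range(59))
lwage = [1.5 + 0.04 * e - 0.0006 * e * e for e in exper]
grid = [0, 10, 20, 30, 40, 50]
print([round(v, 3) for v in local_mean_smooth(exper, lwage, grid, bandwidth=5)])
```

A local-constant fit is biased near the edges of the data (at exper = 0 it only averages points to the right); lpoly's degree() option allows higher-degree local polynomials, which reduce this boundary bias.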

[Figure 3: Impact of experience on log(wages). Legend: twage; lpoly smooth: lwage. Horizontal axis: exper.]

• Denote by $\tilde\beta_1$ the coefficient on educ in the simple regression $\log(wages_i) = \beta_0 + \beta_1 educ_i + u_i$ and by $\hat\beta_1$ the coefficient in the multiple regression $\log(wages_i) = \beta_0 + \beta_1 educ_i + \beta_2 exper_i + \beta_3 exper_i^2 + u_i$. We would generally expect $\tilde\beta_1 \neq \hat\beta_1$.

• Here, the estimates are pretty close (0.1081 vs. 0.1085)! This suggests that educ is nearly uncorrelated with the experience terms in this sample. The two estimates would also be close if β2 = β3 = 0, that is, if experience were unrelated to wages once we control for educ.

1.12 Choosing the Functional Form

• We have already tried a few functional forms:

$$wages_i = \beta_0 + \beta_1 educ_i + u_i$$

$$\log(wages_i) = \beta_0 + \beta_1 educ_i + u_i$$

$$\log(wages_i) = \beta_0 + \beta_1 educ_i + \beta_2 exper_i + \beta_3 exper_i^2 + u_i$$

• Perhaps we could model the curvature of the relationship between exper and log(wages) more flexibly with a quartic:

$$\log(wages_i) = \beta_0 + \beta_1 educ_i + \beta_2 exper_i + \beta_3 exper_i^2 + \beta_4 exper_i^3 + \beta_5 exper_i^4 + u_i$$

. regress lwage educ exper exp2 exp3 exp4 [weight=weight]
(analytic weights assumed)
(sum of wgt is   2.6311e+07)

      Source |       SS       df       MS              Number of obs =   11893
-------------+------------------------------           F(  5, 11887) = 1187.74
       Model |  1246.66628     5  249.333256           Prob > F      =  0.0000
    Residual |  2495.34188 11887  .209921921           R-squared     =  0.3332
-------------+------------------------------           Adj R-squared =  0.3329
       Total |  3742.00816 11892  .314666008           Root MSE      =  .45817

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .1086498   .0017533    61.97   0.000     .1052131    .1120864
       exper |   .0637706     .00501    12.73   0.000     .0539502    .0735909
        exp2 |  -.0023563   .0004593    -5.13   0.000    -.0032567    -.001456
        exp3 |   .0000358   .0000154     2.32   0.020     5.60e-06     .000066
        exp4 |  -1.80e-07   1.70e-07    -1.06   0.289    -5.13e-07    1.53e-07
       _cons |   .4973948   .0269574    18.45   0.000     .4445538    .5502357
------------------------------------------------------------------------------

• Now the model with the quadratic in experience is the restricted model ($H_0: \beta_4 = \beta_5 = 0$), and we get F = [(0.3332 − 0.3291)/(1 − 0.3332)] × (11887/2) ≈ 36.55, which is greater than the 5% critical value F(2, 11887) ≈ 3.00, so we reject H0.

• If the models were not nested (i.e. one could not be derived from the other by imposing restrictions), we can use the adjusted R-squared, $\bar R^2$, as a guide to choose our preferred model.

• Using log variables is also often convenient, especially for positive dollar amounts and for very large variables such as population.

• Variables measured in years and variables that are a proportion or percent are better used in level form.
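The adjusted R-squared penalizes additional regressors: $\bar R^2 = 1 - (1 - R^2)(n - 1)/(n - k - 1)$, with k slope regressors. A short Python sketch (helper name ours) applies it to the three specifications estimated above. Since those models are nested, this only illustrates the formula; the F test remains the formal comparison:

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for n observations and k slope regressors plus an intercept."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


N_OBS = 11893
specs = {
    "educ only": (0.2305, 1),
    "quadratic in exper": (0.3291, 3),
    "quartic in exper": (0.3332, 5),
}
for name, (r2, k) in specs.items():
    print(f"{name:20s} R2 = {r2:.4f}   adj. R2 = {adjusted_r2(r2, N_OBS, k):.4f}")
```

Up to rounding, the results match the Adj R-squared entries reported by Stata (0.2304, 0.3289 and 0.3329).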

1.13 Potential Problems

1.13.1 Multicollinearity

• We could be tempted to use

$$\log(wages_i) = \beta_0 + \beta_1 educ_i + \beta_2 \log(exper_i) + \beta_3 \log(exper_i^2) + u_i$$

• But this would not work because $\log(exper_i^2) = 2\log(exper_i)$, so log(exper²) and log(exper) would be perfectly correlated: we would have a problem of perfect multicollinearity.

• In this case, STATA would drop one of the two variables (noting that it was omitted because of collinearity), so you would know that something is wrong.
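A quick numerical check of why this specification fails: with a few hypothetical positive experience values (log(0) is undefined), log(exper²) is exactly twice log(exper), so the covariance matrix of the two regressors is singular and OLS cannot separate their coefficients. A Python sketch, standard library only:

```python
import math


def sample_cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


# Hypothetical positive experience values (log(0) is undefined)
exper = [1, 2, 5, 10, 20, 30, 40]
log_exper = [math.log(e) for e in exper]
log_exper2 = [math.log(e ** 2) for e in exper]

# log(exper^2) is an exact linear function of log(exper) ...
print(all(math.isclose(b, 2 * a) for a, b in zip(log_exper, log_exper2)))

# ... so the covariance matrix of the two regressors has a zero determinant
s11 = sample_cov(log_exper, log_exper)
s22 = sample_cov(log_exper2, log_exper2)
s12 = sample_cov(log_exper, log_exper2)
det = s11 * s22 - s12 ** 2
print(f"determinant relative to s11*s22: {det / (s11 * s22):.1e}")
```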

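• We can ask STATA to compute the Variance Inflation Factor, $VIF_k = (1 - R_k^2)^{-1}$, where $R_k^2$ is the R-squared from regressing regressor k on all the other regressors. It measures the degree to which the variance of $\hat\beta_k$ has been inflated because regressor k is not orthogonal to the other regressors.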

. estat vif

    Variable |       VIF       1/VIF
-------------+----------------------
        exp2 |     11.69    0.085522
       exper |     11.47    0.087159
        educ |      1.07    0.938056
-------------+----------------------
    Mean VIF |      8.08

• A rule of thumb states that there is evidence of collinearity if the largest VIF is greater than 10.
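Since $VIF_k = 1/(1 - R_k^2)$, each reported VIF can be converted back into its auxiliary R-squared, and with only two regressors $R_k^2$ is simply their squared correlation. A Python sketch (helper names ours); the evenly spread experience values in the second part are hypothetical and only show how strongly a variable and its square are correlated:

```python
def vif_from_r2(r2_k):
    """VIF_k = 1 / (1 - R2_k), R2_k from regressing regressor k on the others."""
    return 1.0 / (1.0 - r2_k)


def r2_from_vif(vif):
    """Auxiliary R-squared implied by a reported VIF."""
    return 1.0 - 1.0 / vif


# Auxiliary R-squareds implied by the estat vif output above
for name, vif in [("exp2", 11.69), ("exper", 11.47), ("educ", 1.07)]:
    print(f"{name:6s} VIF = {vif:5.2f}   implied R2_k = {r2_from_vif(vif):.3f}")


def corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5


# Hypothetical: experience spread evenly over 0..58 years (the observed range)
xs = list(range(59))
r = corr(xs, [x * x for x in xs])
two_regressor_vif = vif_from_r2(r * r)   # with two regressors, R2_k = r^2
print(f"corr(exper, exper^2) = {r:.3f}, VIF = {two_regressor_vif:.1f}")
```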

• Here, it is not too surprising that exper and exper² are correlated, but the quadratic in experience provides a better fit.

• More generally, when we are adding explanatory variables to a regression model to reduce the error variance, we should always try to include independent variables that affect Y and are uncorrelated with all of the independent variables of interest.

• Because near-collinearity inflates standard errors, the coefficients of interest will be estimated more precisely (with larger t statistics) if the additional regressors we include are less collinear with them.

• If we include variables that do not belong, OLS remains unbiased, i.e. $E(\hat\beta_1) = \beta_1$, although the estimates can become less precise if the irrelevant variables are correlated with the included ones.

• Here, we could add demographic characteristics such as marital status, geographic location, etc.

1.13.2 Omitted Variables Bias

• If we omit variables that do belong, then the OLS estimate will likely be biased: $E(\tilde\beta_1) \neq \beta_1$.

• For example, suppose that the true wage equation model was

$$wages_i = \beta_0 + \beta_1 educ_i + \beta_2 abil_i + u_i$$

but that, since we do not observe ability, we estimate

$$wages_i = \beta_0 + \beta_1 educ_i + v_i \tag{13}$$

where $v_i = \beta_2 abil_i + u_i$.

• Then, calling $\tilde\beta_1$ the estimate from equation (13), which omits ability, we can show that

$$E[\tilde\beta_1] = \beta_1 + \beta_2 \delta_1, \qquad \delta_1 = \frac{Cov(educ_i, abil_i)}{Var(educ_i)}$$

• More generally, when X1 and X2 are correlated and β2 ≠ 0, the estimate $\tilde\beta_1$ from the regression that omits X2 will be biased.

• The sign of the bias depends on the signs of both β2 and δ1 (δ1 has the same sign as Corr(X1, X2)):

                Corr(X1, X2) > 0    Corr(X1, X2) < 0
    β2 > 0      positive bias       negative bias
    β2 < 0      negative bias       positive bias

• In the case of the wage equation, more ability leads to higher productivity and higher wages, so β2 > 0. There are also reasons to believe that educ and ability are positively correlated, so we would expect the OLS estimates from equation (13) to be too large (upward biased).

• What to do about it? This is not an easy problem to correct if we do not have some measure of ability in our sample. (One has to take a quasi-experimental approach, using IV for example.)

• However, in terms of reporting results, one should be aware of the possibility of an “omitted variables” bias and qualify the results as likely “upward biased” or “downward biased”.
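A small simulation makes the omitted variable formula concrete. The data-generating process below is entirely hypothetical (chosen so that β2 > 0 and Corr(educ, abil) > 0) and is not estimated from the CPS. In any sample, the OLS algebra gives the exact identity $\tilde\beta_1 = \hat\beta_1 + \hat\beta_2 \tilde\delta_1$, where $\tilde\delta_1$ is the slope from regressing abil on educ; taking expectations gives the formula above. A Python sketch, standard library only:

```python
import random


def mean(v):
    return sum(v) / len(v)


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def simple_slope(y, x):
    """OLS slope from regressing y on x with an intercept."""
    return cov(x, y) / cov(x, x)


def two_regressor_slopes(y, x1, x2):
    """OLS slopes on x1 and x2 (with intercept) from the 2x2 normal equations."""
    s11, s22, s12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
    s1y, s2y = cov(x1, y), cov(x2, y)
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


# Hypothetical data-generating process: ability raises both schooling and wages
random.seed(495)
n = 5000
abil = [random.gauss(0, 1) for _ in range(n)]
educ = [12 + 2 * a + random.gauss(0, 2) for a in abil]
wages = [1.0 + 0.08 * e + 0.05 * a + random.gauss(0, 0.4) for e, a in zip(educ, abil)]

b1_hat, b2_hat = two_regressor_slopes(wages, educ, abil)   # controls for ability
b1_tilde = simple_slope(wages, educ)                       # omits ability, as in (13)
delta1_tilde = simple_slope(abil, educ)                    # Cov(educ, abil) / Var(educ)

print(f"with abil: {b1_hat:.4f}   without abil: {b1_tilde:.4f}")
print(f"b1_hat + b2_hat * delta1_tilde = {b1_hat + b2_hat * delta1_tilde:.4f}")
```

With these settings the short regression overstates the return to education, as the sign table predicts for β2 > 0 and a positive correlation.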

1.13.3 Heteroscedasticity

• When the variance of the error terms is not constant across observations, we have a problem of heteroscedasticity: $Var(u_i \mid educ_i) = \sigma_i^2 = \sigma^2(X_i)$.

• The OLS estimates are still unbiased and consistent, but the usual OLS standard errors of the estimates are biased if we have heteroskedasticity.

• If the standard errors are biased, we cannot use the usual t statistics, F statistics or LM statistics for drawing inferences.

• But we can test for it using the Breusch-Pagan test, which amounts to testing $H_0: t = 0$ in $Var(u_i) = \sigma^2 \exp(Z_i t)$, that is, running a regression of the squared OLS residuals on the fitted values or on some explanatory variables.

. estat hettest

Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
         Ho: Constant variance
         Variables: fitted values of lrwage

         chi2(1)      =     3.42
         Prob > chi2  =   0.0645
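A minimal Python sketch of the n·R² (Koenker) version of the Breusch-Pagan test, with the fitted values as the single auxiliary regressor. To our understanding, Stata's default estat hettest reports the original normal-theory version of the statistic (the n·R² version is available through its iid option), so the two need not coincide; the chi-squared(1) p-value conversion is the same either way. The simulated data are hypothetical:

```python
import math
import random


def ols_simple(y, x):
    """Intercept and slope from an OLS regression of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b1 * mx, b1


def r_squared(y, x):
    """R-squared from a simple regression of y on x."""
    b0, b1 = ols_simple(y, x)
    my = sum(y) / len(y)
    ssr = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - my) ** 2 for yi in y)
    return 1 - ssr / sst


def chi2_1df_pvalue(stat):
    """P(chi2(1) > stat) = P(|Z| > sqrt(stat)) = erfc(sqrt(stat / 2))."""
    return math.erfc(math.sqrt(stat / 2))


def bp_lm_test(y, x):
    """n * R^2 (Koenker) Breusch-Pagan statistic: regress the squared OLS
    residuals on the fitted values; chi2 with 1 df under H0."""
    b0, b1 = ols_simple(y, x)
    fitted = [b0 + b1 * xi for xi in x]
    resid_sq = [(yi - fi) ** 2 for yi, fi in zip(y, fitted)]
    lm = len(y) * r_squared(resid_sq, fitted)
    return lm, chi2_1df_pvalue(lm)


# p-value for the statistic reported above, chi2(1) = 3.42
print(f"p-value for 3.42: {chi2_1df_pvalue(3.42):.4f}")

# Hypothetical heteroskedastic data: the error s.d. grows with x
random.seed(1)
x = [random.uniform(0, 10) for _ in range(2000)]
y = [1 + 2 * xi + random.gauss(0, 0.5 + xi) for xi in x]
lm, p = bp_lm_test(y, x)
print(f"LM = {lm:.1f}, p-value = {p:.2g}")
```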

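• Here we fail to reject H0 at the 5% level (Prob > chi2 = 0.0645), and we are happy when we fail to reject H0: there is no strong evidence of heteroscedasticity.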

• What to do about this? In STATA, heteroskedasticity-robust standard errors are easily obtained using the robust option of reg.

• The resulting White or Huber standard errors will be asymptotically valid in the presence of any form of heteroscedasticity, including homoscedasticity.

• When the form of the heteroskedasticity is known, for example $\sigma_i^2 = \sigma^2 \cdot educ_i$, then we can use weighted least squares, which weights each observation by $1/educ_i$ (vwls in STATA).

. regress lwage educ exper exp2 exp3 exp4 [weight=weight], robust
(analytic weights assumed)
(sum of wgt is   2.6311e+07)

Linear regression                                      Number of obs =   11893
                                                       F(  5, 11887) = 1054.37
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.3332
                                                       Root MSE      =  .45817

------------------------------------------------------------------------------
             |               Robust
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .1086498   .0020186    53.82   0.000     .1046929    .1126066
       exper |   .0637706    .005069    12.58   0.000     .0538344    .0737067
        exp2 |  -.0023563   .0004762    -4.95   0.000    -.0032898   -.0014229
        exp3 |   .0000358   .0000162     2.21   0.027     4.01e-06    .0000676
        exp4 |  -1.80e-07   1.81e-07    -1.00   0.318    -5.34e-07    1.74e-07
       _cons |   .4973948   .0289503    17.18   0.000     .4406475    .5541421
------------------------------------------------------------------------------
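For a simple regression, the White/Huber standard error of the slope replaces the common variance $s^2$ by the individual squared residuals: $\sqrt{\sum_i (x_i - \bar x)^2 e_i^2} \,/\, \sum_i (x_i - \bar x)^2$. To our understanding, Stata's robust option also multiplies the variance by the small-sample factor n/(n − k) (the HC1 variant), which the sketch includes. In the CPS output above, the robust standard errors are somewhat larger than the conventional ones (e.g. .0020186 versus .0017533 for educ). The Python sketch below uses hypothetical simulated data where the error variance is largest far from the mean of x, so the robust standard error should exceed the conventional one:

```python
import math
import random


def conventional_and_robust_se(y, x):
    """Slope and its standard errors from a simple OLS regression of y on x.

    conventional: sqrt(s^2 / Sxx), s^2 = SSR / (n - 2)   (needs homoscedasticity)
    robust:       White/Huber sandwich with the n / (n - k) factor, k = 2 (HC1)
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    b0 = my - b1 * mx
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - 2)
    se_conv = math.sqrt(s2 / sxx)
    meat = sum((xi - mx) ** 2 * e * e for xi, e in zip(x, resid))
    se_robust = math.sqrt(n / (n - 2) * meat) / sxx
    return b1, se_conv, se_robust


# Hypothetical data: error s.d. is largest far from the mean of x
random.seed(2)
x = [random.uniform(0, 10) for _ in range(2000)]
y = [1 + 2 * xi + random.gauss(0, 1 + abs(xi - 5)) for xi in x]
b1, se_conv, se_rob = conventional_and_robust_se(y, x)
print(f"slope {b1:.3f}   conventional s.e. {se_conv:.4f}   robust s.e. {se_rob:.4f}")
```

Under homoscedasticity the two standard errors should be close in large samples, which is why the robust ones remain valid in that case too.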