Econ 495 - Econometric Review 1
Contents
1 Linear Regression Analysis
1.1 The Mincer Wage Equation
1.2 Data
1.3 Econometric Model
1.4 Estimation
1.5 Diagnostics - Goodness of Fit
1.6 Inference - Hypothesis Testing
1.7 Reporting the results
1.8 Interpretation of the Estimates
1.9 Multivariate Regression Analysis
1.10 Diagnostics - Goodness of Fit
1.11 Interpretation of the Estimates
1.12 Choosing the Functional Form
1.13 Potential Problems
1.13.1 Multicollinearity
1.13.2 Omitted Variables Bias
1.13.3 Heteroscedasticity
1 Linear Regression Analysis
1.1 The Mincer Wage Equation
• Our first exercise in empirical analysis will focus on the determinants
of wages in a cross-section of individuals, that is, observations on
individuals at a specific point in time.
• A complete wage equation model would include the following human
capital variables
$$\log(wages_i) = \beta_0 + \beta_1 educ_i + \beta_2 exper_i + \beta_3 exper_i^2 + \ldots + u_i \qquad (1)$$
where the term $u_i$ contains factors such as ability, quality of education,
family background and other factors influencing a person's wage.
• For some specific purposes, we will also include gender and union status.
• We may think of the relationship between wages and their determinants,
including institutions and industrial characteristics, as the wage
structure.
• Let's suppose to begin with that we are interested in the effect of
education, measured in years of schooling, on wages, that is, in $\beta_1$ in
$$wages_i = \beta_0 + \beta_1 educ_i + u_i \qquad (2)$$
1.2 Data
• The Labour Force Survey selects individuals (close to) randomly and
asks them about their wage ($Y_i$), education and other characteristics
($X_i$).
• These data $\{(X_i, Y_i) : i = 1, \ldots, n\}$ will constitute our random sample
(A2) of size n from the population.
• A scatter plot of wages and education level indicates a positive relationship.
[Scatter plot; vertical axis: rwage (0 to 50); horizontal axis: schooling (8 to 18)]
Figure 1: Wages and Years of Schooling
• As do the average wages by education level
Table 1: Average hourly wages by education level
Education Level           ≈ Years of Schooling    Wages
All workers               13                      17.65381
0 to 8 years               8                      14.15454
Some secondary            10                      13.39185
Grade 11 to 13            12                      15.81303
Some post secondary       13                      15.45389
Post secondary diploma    14                      18.26481
University: bachelors     16                      23.58583
Graduate degree           18                      29.11108
• But we may want to know by how much wages increase when schooling increases by one year.
1.3 Econometric Model
• The (population) regression function
$$E(wages_i \mid educ_i) = \beta_0 + \beta_1 educ_i \qquad (3)$$
describes wages conditional on a level of schooling as a linear (A1) function of the parameters, under the zero conditional mean (A3) assumption $E(u_i \mid educ_i) = 0$.
• For any given value of schooling, the distribution of wages is centered about $E(wages \mid schooling)$.
[Plot; vertical axis: rwage (0 to 50); horizontal axis: schooling (8 to 18)]
Figure 2: E(wages|schooling) as a linear function of schooling
• Note that $E(u_i \mid educ_i) = 0$ implies by the law of iterated expectations
that $E(u_i) = 0$ and that $Cov(u_i, educ_i) = E(u_i \cdot educ_i) = 0$.
• This means that $u_i$ has a zero mean and is uncorrelated with educ,
which may be far-fetched in this case.
• Another typical assumption (A5) is that $Var(u_i \mid educ) = \sigma^2$ is constant,
a property called homoskedasticity.
• But it appears problematic here! We will see later how to test for it.
1.4 Estimation
• The objective is to obtain an estimate, called $\hat\beta_1$, of the unknown
parameter $\beta_1$ from the data sample.
• Let $Y_i$ denote wages and $X_i$ denote education; we can write the model
as
$$Y_i = \beta_0 + \beta_1 X_i + u_i \qquad (4)$$
• Either through the method of moments, which substitutes sample
averages in the moment conditions
$$E(u_i) = E[Y_i - \beta_0 - \beta_1 X_i] = 0$$
$$E(u_i X_i) = E[(Y_i - \beta_0 - \beta_1 X_i) X_i] = 0$$
• or with the ordinary least squares estimator, which minimizes the sum
of squared errors, $SS = \sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 X_i)^2$, we obtain the same
estimator
$$\hat\beta_1 = \frac{\sum_{i=1}^{n} (X_i - \bar X)(Y_i - \bar Y)}{\sum_{i=1}^{n} (X_i - \bar X)^2} \qquad (5)$$
Hutcheson, G. D. (2011). Ordinary Least-Squares Regression. In L. Moutinho and G. D. Hutcheson, The SAGE Dictionary of Quantitative Management Research. Pages 224-228.
Ordinary Least-Squares Regression
Introduction
Ordinary least-squares (OLS) regression is a generalized linear modelling technique that may be used to model a single response variable which has been recorded on at least an interval scale. The technique may be applied to single or multiple explanatory variables and also categorical explanatory variables that have been appropriately coded.
Key Features
At a very basic level, the relationship between a continuous response variable (Y) and a continuous explanatory variable (X) may be represented using a line of best-fit, where Y is predicted, at least to some extent, by X. If this relationship is linear, it may be appropriately represented mathematically using the straight line equation 'Y = α + βx', as shown in Figure 1 (this line was computed using the least-squares procedure; see Ryan, 1997).
The relationship between variables Y and X is described using the equation of the line of best fit with α indicating the value of Y when X is equal to zero (also known as the intercept) and β indicating the slope of the line (also known as the regression coefficient). The regression coefficient β describes the change in Y that is associated with a unit change in X. As can be seen from Figure 1, β only provides an indication of the average expected change (the
observed data are scattered around the line), making it important to also interpret the confidence intervals for the estimate (the large sample 95% two-tailed approximation of the confidence intervals can be calculated as β ± 1.96 s.e. β).
In addition to the model parameters and confidence intervals for β, it is useful to also have an indication of how well the model fits the data. Model fit can be determined by comparing the observed scores of Y (the values of Y from the sample of data) with the expected values of Y (the values of Y predicted by the regression equation). The difference between these two values (the deviation, or residual as it is also called) provides an indication of how well the model predicts each data point. Adding up the deviances for all the data points after they have been squared (this basically removes negative deviations) provides a simple measure of the degree to which the data deviates from the model overall. The sum of all the squared residuals is known as the residual sum of squares (RSS) and provides a measure of model-fit for an OLS regression model. A poorly fitting model will deviate markedly from the data and will consequently have a relatively large RSS, whereas a good-fitting model will not deviate markedly from the data and will consequently have a relatively small RSS (a perfectly fitting model will have an RSS equal to zero, as there will be no deviation between observed and expected values of Y). It is important to understand how the RSS statistic (or the deviance as it is also known; see Agresti,1996, pages 96-97) operates as it is used to determine the significance of individual and groups of variables in a regression model. A graphical illustration of the residuals for a simple regression model is provided in Figure 2. Detailed examples of calculating deviances from residuals for null and simple regression models can be found in Hutcheson and Moutinho, 2008.
The deviance is an important statistic as it enables the contribution made by explanatory variables to the prediction of the response variable to be determined. If by adding a variable to the model, the deviance is greatly reduced, the added variable can be said to have had a large effect on the prediction of Y for that model. If, on the other hand, the deviance is not greatly reduced, the added variable can be said to have had a small effect on the prediction of Y for that model. The change in the deviance that results from the explanatory variable being added to the model is used to determine the significance of that variable's effect on the prediction of Y in that model. To assess the effect that a single explanatory variable has on the prediction of Y, one simply compares the deviance statistics before and after the variable has been added to the model. For a simple OLS regression model, the effect of the explanatory variable can be assessed by comparing the RSS statistic for the full regression model (Y = α + βx) with that for the null model (Y = α). The difference in deviance between the nested models can then be tested for significance using an F-test computed from the following equation.
$$F_{(df_p - df_{p+q}),\; df_{p+q}} = \frac{(RSS_p - RSS_{p+q}) / (df_p - df_{p+q})}{RSS_{p+q} / df_{p+q}}$$
where p represents the null model, Y = α, p+q represents the model Y = α + βx, and df are the degrees of freedom associated with the designated model. It can be seen from this equation that the F-statistic is simply based on the difference in the deviances between the two models as a fraction of the deviance of the full model, whilst taking account of the number of parameters.
In addition to the model-fit statistics, the R-square statistic is also commonly quoted and provides a measure that indicates the percentage of variation in the response variable that is `explained' by the model. R-square, which is also known as the coefficient of multiple determination, is defined as
$$R^2 = \frac{\text{RSS after regression}}{\text{total RSS}}$$
and basically gives the percentage of the deviance in the response variable that can be accounted for by adding the explanatory variable into the model. Although R-square is widely used, it will always increase as variables are added to the model (the deviance can only go down when additional variables are added to a model). One solution to this problem is to calculate an adjusted R-square statistic ($R^2_a$) which takes into account the number of terms entered into the model and does not necessarily increase as more terms are added. Adjusted R-square can be derived using the following equation
$$R^2_a = R^2 - \frac{k(1 - R^2)}{n - k - 1}$$
where n is the number of cases used to construct the model and k is the number of terms in the model (not including the constant).
An example of simple OLS regression
A simple OLS regression model with a single explanatory variable can be illustrated using the example of predicting ice cream sales given outdoor temperature (Koteswara, 1970). The model for this relationship
• Estimator (5) works provided that $\sum_{i=1}^{n} (X_i - \bar X)^2 > 0$, that is, that there
is enough sampling variation (A4).
• But at the same time, OLS is sensitive to outliers, so we do
not want too much variation. This can be written as finite fourth
moments: $0 < E(X_i^4) < \infty$ and $0 < E(Y_i^4) < \infty$.
• The estimator makes sense since the population parameter $\beta_1$ satisfies
$$\beta_1 = \frac{Cov(Y_i, X_i)}{Var(X_i)}$$
when $E(u_i) = 0$ and $Cov(u_i, X_i) = 0$.
• and $\beta_0 = E(Y_i) - E(X_i)\beta_1$ will be estimated by
$$\hat\beta_0 = \frac{1}{n}\sum_{i=1}^{n} Y_i - \hat\beta_1 \frac{1}{n}\sum_{i=1}^{n} X_i = \bar Y - \hat\beta_1 \bar X \qquad (6)$$
• Predicted wages are obtained from the sample regression function
$$\hat Y = \hat\beta_0 + \hat\beta_1 X \qquad (7)$$
• The residual $\hat u_i$ is an estimate of the error term $u_i$, and is the difference
between the sample point and the fitted line (sample regression function)
$$\hat u_i = Y_i - \hat Y_i$$
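The estimation formulas (5) to (7) can be sketched in a few lines of Python (used here instead of Stata so that the arithmetic is visible). This is only an illustration: the function name `ols_simple` and the five data points are hypothetical and are not taken from the Labour Force Survey sample.

```python
from statistics import mean


def ols_simple(x, y):
    """Return (beta0_hat, beta1_hat) for the model y = beta0 + beta1 * x + u."""
    x_bar, y_bar = mean(x), mean(y)
    # Denominator of (5): sampling variation in x, which must be positive (A4)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("no sampling variation in x: assumption (A4) fails")
    # Numerator of (5): sample co-variation between x and y
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta1 = sxy / sxx
    beta0 = y_bar - beta1 * x_bar  # equation (6)
    return beta0, beta1


# Hypothetical toy data (years of schooling, hourly wage)
schooling = [8, 10, 12, 14, 16]
wage = [9.0, 13.0, 15.0, 20.0, 23.0]

b0, b1 = ols_simple(schooling, wage)
fitted = [b0 + b1 * x for x in schooling]           # equation (7)
residuals = [y - f for y, f in zip(wage, fitted)]   # u_hat_i = Y_i - Y_hat_i

print(f"beta0_hat = {b0:.2f}, beta1_hat = {b1:.2f}")
print("sum of residuals:", round(sum(residuals), 10))
print("sum of x * residual:", round(sum(x * r for x, r in zip(schooling, residuals)), 10))
```

The last two printed sums are the sample versions of the two moment conditions: both are zero at the OLS estimates, which is why the method of moments and least squares give the same estimator.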
• Thus intuitively, OLS is fitting a line through the sample points such
that the sum of the squared vertical distances between the actual wages and
the predicted wages, that is, the squared residuals, is as small as possible.
• Under assumptions (A1)-(A4), our OLS estimates will be unbiased,
that is $E(\hat\beta_1) = \beta_1$ and $E(\hat\beta_0) = \beta_0$. Adding assumption (A5), the
OLS estimator is BLUE in the sense that it is the minimum variance
linear unbiased estimator (Gauss-Markov Theorem).
• In practice, the computer software does the computation for us
. regress rwage schooling

      Source |       SS       df       MS              Number of obs =    9720
-------------+------------------------------           F(  1,  9718) = 1729.94
       Model |  117804.085     1  117804.085           Prob > F      =  0.0000
    Residual |  661769.296  9718  68.0972727           R-squared     =  0.1511
-------------+------------------------------           Adj R-squared =  0.1510
       Total |   779573.38  9719  80.2112749           Root MSE      =  8.2521

------------------------------------------------------------------------------
       rwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   schooling |   1.541137   .0370532    41.59   0.000     1.468505    1.613769
       _cons |  -2.426309    .489984    -4.95   0.000    -3.386779   -1.465838
------------------------------------------------------------------------------

. predict prwage
(option xb assumed; fitted values)

. predict reswage, residuals
1.5 Diagnostics - Goodness of Fit
• The STATA output gives many measures of whether our regression
model fits the data well
$$\text{Model/Explained}: \quad SSE \equiv \sum_{i=1}^{n} (\hat Y_i - \bar Y)^2$$
$$\text{Residual}: \quad SSR \equiv \sum_{i=1}^{n} \hat u_i^2$$
$$\text{Total}: \quad SST \equiv \sum_{i=1}^{n} (Y_i - \bar Y)^2$$
and the $R^2$, which is the ratio of the explained variation to
the total variation
$$R^2 = SSE/SST = 1 - SSR/SST$$
• The $R^2$ can also be shown to equal the squared correlation coefficient
between the actual $Y_i$ and the fitted values $\hat Y_i$.
• The adjusted $R^2$ takes into account the number of explanatory variables:
$R^2_a = 1 - (1 - R^2)(n - c)/(n - k)$, where k is the number of
estimated parameters (including the constant, so k = 2 here) and c = 1 if
there is a constant.
• Here, an $R^2$ of 0.15 means that 15% of the variation in wages across
individuals is explained by their education level.
• This means that 85% of the variation in wages remains unexplained!
We will want to add more variables!
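As a check on these definitions, the goodness-of-fit figures can be recomputed from the sums of squares printed in the Stata output above. The sketch below is in Python; the variable names are my own, and taking k = 2 (constant plus schooling) is an assumption chosen to match the residual degrees of freedom of 9718 in the output.

```python
# Sums of squares copied from the Stata output for: regress rwage schooling
sse = 117804.085   # Model (explained) sum of squares
ssr = 661769.296   # Residual sum of squares
sst = 779573.38    # Total sum of squares
n = 9720           # number of observations
k = 2              # estimated parameters: constant + schooling (assumption)
c = 1              # the model includes a constant

r2 = sse / sst                               # explained share of total variation
r2_alt = 1 - ssr / sst                       # equivalent form
r2_adj = 1 - (1 - r2) * (n - c) / (n - k)    # adjusted R-squared

print(f"R2 = {r2:.4f}, 1 - SSR/SST = {r2_alt:.4f}, adjusted R2 = {r2_adj:.4f}")
```

Both forms of the $R^2$ agree to four decimals, and the adjusted value is barely lower because n is very large relative to k.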
• Yet $R^2$ values are typically very low in cross-sectional data.
• So by these standards, this regression is pretty good, but the $R^2$ is
not the only way to judge the success of a model.
• Also reported are
$$\text{Root MSE} = s = \sqrt{SSR/(n - k)} \quad \text{and} \quad F = \frac{SSE/(k - c)}{s^2}$$
The F-statistic is used to test whether a group of variables should be
included in the model.
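These two statistics can also be reproduced from the printed sums of squares. A minimal Python sketch, with hypothetical variable names and again assuming k = 2 and c = 1:

```python
import math

# Values copied from the Stata output for: regress rwage schooling
sse = 117804.085   # Model sum of squares
ssr = 661769.296   # Residual sum of squares
n, k, c = 9720, 2, 1

s2 = ssr / (n - k)               # estimated error variance (Residual MS)
root_mse = math.sqrt(s2)         # Root MSE = s
f_stat = (sse / (k - c)) / s2    # F = Model MS / Residual MS

print(f"s^2 = {s2:.4f}, Root MSE = {root_mse:.4f}, F = {f_stat:.2f}")
```

With a single explanatory variable, the F-statistic is the square of the t-statistic on schooling (41.59 squared is about 1729.7, the small gap coming from rounding t to two decimals).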
1.6 Inference - Hypothesis Testing
• The success of a model also depends on whether the variables included
in the model belong there, that is, are statistically significant.
• Under the assumption (A6) that the $u_i$ are normally distributed with
zero mean and variance $\sigma^2$, $u \sim Normal(0, \sigma^2)$, the estimates $\hat\beta$ will
also be normally distributed, and
$$(\hat\beta - \beta)/se(\hat\beta) \sim t_{DF}$$
will follow the Student-t distribution, where DF = n − k − 1, the
degrees of freedom in the model, is equal to the number of observations
minus the number of variables minus 1 for the constant.
• We can use the t-statistic reported by STATA to test the null
hypothesis $H_0: \beta = 0$ against $H_1: \beta \neq 0$.
• If the t-statistic is greater in absolute value than the critical value
corresponding to our degrees of freedom and the desired level of the test
(5% or 1%), we can reject the null.
• The rule of thumb is: if |t| ≥ 2.0 then reject $H_0: \beta = 0$ at the
5% significance level. For more robustness, sometimes we prefer even
higher values.
• But we do not have to look up the critical values in a table since STATA
gives us the p-value corresponding to our t-statistic
– If p ≤ 0.01 then the relationship is significant at the 1% level,
– If p ≤ 0.05 then the relationship is significant at the 5% level,
– If p ≤ 0.10 then the relationship is significant at the 10% level.
• Here with a t-statistic of 41.59, we can say that schooling is a very
significant factor explaining the variation in wages.
• It is all very good to know that the coefficient of schooling is different
from zero, but we would also like to know how precisely it is estimated.
• The confidence interval tells us that, under the classical OLS assumptions
(A1-A6), there is a 95% chance that the true parameter lies within
$$\hat\beta \pm \alpha \cdot se(\hat\beta)$$
where α is the 97.5th percentile of a $t_{n-k-1}$ distribution.
• If the degrees of freedom DF = n − k − 1 > 120, the $t_{n-k-1}$ distribution
is close enough to the normal to use the 97.5th percentile of
the standard normal, and the confidence interval will be
$$[\hat\beta - 1.96 \cdot se(\hat\beta),\; \hat\beta + 1.96 \cdot se(\hat\beta)]$$
• Thus the rule of thumb: a coefficient is not significant if its magnitude
is less than twice its standard error.
• In our example, this means that there is a 95% chance that the true
coefficient of schooling is between 1.47 and 1.61, that is, within a 0.14
range; this is almost too precise!
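The t-statistic and the large-sample confidence interval can be checked by hand from the coefficient and standard error in the Stata output. The following Python sketch uses the normal approximation (critical value 1.96), which is an assumption justified here by the 9718 degrees of freedom; the variable names are hypothetical.

```python
# Coefficient and standard error of schooling from the Stata output
coef = 1.541137
se = 0.0370532

t_stat = coef / se     # t-statistic for H0: beta = 0
crit = 1.96            # 97.5th percentile of the standard normal
lower = coef - crit * se
upper = coef + crit * se

print(f"t = {t_stat:.2f}")
print(f"95% CI = [{lower:.4f}, {upper:.4f}]")
print("reject H0 at the 5% level:", abs(t_stat) >= crit)
```

The bounds agree with the interval Stata reports (1.468505 to 1.613769) to four decimal places; the remaining difference arises because Stata uses the exact t critical value rather than 1.96.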
1.7 Reporting the results
• The results from a STATA output are reported in a table that typically
contains
– estimated coefficients
– standard errors of the coefficients
– number of observations
– $R^2$ or $R^2_a$
• In some instances, it may be worthwhile to report other statistics. We
will discuss these issues when we cover the readings.
• The custom command outreg, used after the regress command, handles
the formatting of the output. (See the course web site on how to
install custom commands.)
outreg schooling using tableM1, replace bdec(3) se 3aster title("Wage Regression") ctitle("(1)")
The file tableM1.out can then be opened in Excel (right-clicking on it)
Wage Regression
                      (1)
schooling           0.082
                  [0.002]***
Constant            1.684
                  [0.027]***
Observations         9720
R-squared            0.14
Standard errors in brackets
* significant at 10%; ** significant at 5%; *** significant at 1%
1.8 Interpretation of the Estimates
• In general, the $\hat\beta$ parameters measure the marginal effect of increasing
X by one unit on the predicted wages $\hat Y$.
• In our example,
$$\Delta \widehat{wage} = \hat\beta_1 \Delta educ$$
tells us that the wage value of one additional year of schooling in
this sample is $1.54.
• But in this simple regression, we cannot claim to have found a causal
relationship, so we should be cautious in our interpretation.
• The value of -2.42 for $\hat\beta_0$ says that a person with zero years of schooling
has a negative predicted wage, which is silly. This occurs because no
one in our sample has less than 8 years of schooling. For a person
with eight years of schooling, the predicted wage is
$$\widehat{wage} = -2.42 + 1.54 \times 8 = 9.90$$
which is above the minimum wage.
• If this person completes high school (4 more years), our model predicts
that the wage would be higher by 4*$1.54=$6.16 per hour, for a predicted
wage of about $16.06. This is more than the average wage of $15.81 for high
school graduates in Table 1, which may make us question the linearity in our
functional form assumption.
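To see where the linear specification fits and where it does not, the fitted line can be evaluated at each schooling level in Table 1 and compared with the reported averages. The Python sketch below uses the unrounded coefficients from the Stata output, so its results can differ from the hand calculations above in the second decimal; the function name `predict_wage` is hypothetical.

```python
# Coefficients from the Stata output for: regress rwage schooling
B0, B1 = -2.426309, 1.541137


def predict_wage(years):
    """Predicted hourly wage from the fitted line at a given schooling level."""
    return B0 + B1 * years


# (education level, approx. years of schooling, average wage) from Table 1
table1 = [
    ("0 to 8 years", 8, 14.15454),
    ("Some secondary", 10, 13.39185),
    ("Grade 11 to 13", 12, 15.81303),
    ("Some post secondary", 13, 15.45389),
    ("Post secondary diploma", 14, 18.26481),
    ("University: bachelors", 16, 23.58583),
    ("Graduate degree", 18, 29.11108),
]

print(f"{'Education level':<24}{'Yrs':>3}{'Pred.':>8}{'Avg.':>8}{'Gap':>8}")
for level, years, avg in table1:
    pred = predict_wage(years)
    print(f"{level:<24}{years:>3}{pred:>8.2f}{avg:>8.2f}{pred - avg:>+8.2f}")
```

The gaps are negative at both ends of the schooling range (8 and 18 years) and positive around 12 to 14 years, a pattern that a straight line in levels cannot capture.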
• Indeed, it is more common to estimate the following log-linear model
$$\log(wages_i) = \beta_0 + \beta_1 educ_i + u_i \qquad (8)$$
where log(·) denotes the natural logarithm. Since wages tend to be
lognormal, this reduces the problem of heteroscedasticity.
• This is equivalent to writing $wages_i = \exp(\beta_0 + \beta_1 educ_i + u_i)$, which
is consistent with the increasing returns to education that we found in
Table 1.
• In this case the interpretation of $\beta_1$ is
$$\%\Delta wage \approx (100 \cdot \beta_1) \Delta educ$$
that is, multiplying $\beta_1$ by 100 gives us the percentage change in
predicted wage given an additional year of schooling.
• We run the log wage regression by first taking the log of the dependent
variable
. regress lrwage schooling

      Source |       SS       df       MS              Number of obs =    9720
-------------+------------------------------           F(  1,  9718) = 1585.31
       Model |  331.915467     1  331.915467           Prob > F      =  0.0000
    Residual |   2034.6566  9718  .209369891           R-squared     =  0.1403
-------------+------------------------------           Adj R-squared =  0.1402
       Total |  2366.57207  9719  .243499544           Root MSE      =  .45757

------------------------------------------------------------------------------
      lrwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   schooling |    .081804   .0020546    39.82   0.000     .0777767    .0858314
       _cons |   1.684045    .027169    61.98   0.000     1.630788    1.737302
------------------------------------------------------------------------------
• The coefficient on schooling has a percentage interpretation when it
is multiplied by 100. That is, predicted wages increase by 8.2 percent
for every additional year of education.
• In the human capital interpretation of the wage equation, this means
that the rate of return of one year of schooling is 8.2%, not bad!
• This easy interpretation of the rate of return of schooling is one of the
reasons why the log wage specification is the preferred one.
• The intercept of 1.684 is again not very meaningful, since it gives the
predicted log(wages) when schooling = 0.
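The "multiply by 100" reading is an approximation that works well for small coefficients. The Python sketch below compares it with the exact percentage change implied by the log-linear model, $100 \cdot (\exp(\hat\beta_1) - 1)$; this exact formula is a standard result that the slides do not state, and the variable names are hypothetical.

```python
import math

b1 = 0.081804   # coefficient on schooling from: regress lrwage schooling

approx_pct = 100 * b1                    # approximation used in the text
exact_pct = 100 * (math.exp(b1) - 1)     # exact change implied by the log model

print(f"approximate return to one year of schooling: {approx_pct:.2f}%")
print(f"exact return to one year of schooling:       {exact_pct:.2f}%")
```

The two figures differ by roughly a third of a percentage point here; the gap widens as the coefficient grows, so the approximation should be used with more care for large effects.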
• The log-linear model imposes a constant percentage effect of schooling
on wages.
• Another important model is the log-log model, which is a constant
elasticity model. It would be more meaningful if we had some measure
of output, as in
$$\log(salary_i) = \beta_0 + \beta_1 \log(sales_i) + u_i$$
• In this case, the interpretation of $\beta_1$ is the estimated elasticity of salary
with respect to sales
$$\%\Delta salary = \beta_1 \%\Delta sales \iff \beta_1 = \frac{\%\Delta salary}{\%\Delta sales}$$
1.9 Multivariate Regression Analysis
• We have already improved our wage equation model by using log(wages);
now we would like to add more variables, in particular labour market
experience.
• We can also use a more flexible functional form by adding higher order
terms (polynomial) in the explanatory variables.
• For example, here a quadratic in experience can capture diminishing
returns to on-the-job training
$$\log(wages_i) = \beta_0 + \beta_1 educ_i + \beta_2 exper_i + \beta_3 exper_i^2 + \ldots + u_i \qquad (9)$$
• In the US equivalent of the Canadian Labour Force Survey, the Current
Population Survey (CPS), data on years of schooling and age in years
are available, but not the number of years of actual labour market
experience.
• So a potential experience variable is constructed as exper = (age − educ − 6), and the regression results for the US CPS (October 1997)
are
. regress lwage educ exper exp2 [weight=weight]
(analytic weights assumed)
(sum of wgt is   2.6311e+07)

      Source |       SS       df       MS              Number of obs =   11893
-------------+------------------------------           F(  3, 11889) = 1943.58
       Model |  1231.32118     3  410.440394           Prob > F      =  0.0000
    Residual |  2510.68698 11889  .211177305           R-squared     =  0.3291
-------------+------------------------------           Adj R-squared =  0.3289
       Total |  3742.00816 11892  .314666008           Root MSE      =  .45954

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .1084982   .0017458    62.15   0.000     .1050761    .1119202
       exper |   .0383817   .0012279    31.26   0.000     .0359747    .0407887
        exp2 |   -.000639   .0000296   -21.57   0.000     -.000697   -.0005809
       _cons |   .5751865   .0250984    22.92   0.000     .5259896    .6243835
------------------------------------------------------------------------------

. test exper= exp2=0

 ( 1)  exper - exp2 = 0
 ( 2)  exper = 0

       F(  2, 11889) =  873.19
            Prob > F =    0.0000
1.10 Diagnostics - Goodness of Fit
• As before, we can use the t-statistic to determine whether each variable
is statistically significant individually.
• But we can also use the F-statistic to test the significance of the whole
model, that is, the hypothesis that the variables are jointly significant
$$H_0: \beta_1 = \beta_2 = \beta_3 = 0 \quad \text{vs.} \quad H_1: H_0 \text{ is not true}$$
• Here, we would overwhelmingly reject $H_0$.
• The F-statistic can also be used to test a restricted model against an
unrestricted model.
• The model $\log(wages_i) = \beta_0 + \beta_1 educ_i$ can be seen as a restricted
version of the model with experience where $H_0: \beta_2 = \beta_3 = 0$.
• We can test this hypothesis using the R-squared form of the F-statistic
$$F \equiv \frac{(R^2_{ur} - R^2_r)/q}{(1 - R^2_{ur})/(n - k - 1)}$$
where q is the number of exclusion restrictions and $n - k - 1 = DF_{ur}$
. regress lwage educ [weight=weight]
(analytic weights assumed)
(sum of wgt is   2.6311e+07)

      Source |       SS       df       MS              Number of obs =   11893
-------------+------------------------------           F(  1, 11891) = 3561.85
       Model |  862.524703     1  862.524703           Prob > F      =  0.0000
    Residual |  2879.48346 11891  .242156544           R-squared     =  0.2305
-------------+------------------------------           Adj R-squared =  0.2304
       Total |  3742.00816 11892  .314666008           Root MSE      =  .49209

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .1080619   .0018107    59.68   0.000     .1045127     .111611
       _cons |   .9835394   .0245724    40.03   0.000     .9353734    1.031705
------------------------------------------------------------------------------
• We get $F = [(0.3291 - 0.2305)/(1 - 0.3291)] \cdot (11889/2) \approx 873.64$, which is
greater than the 5% critical value $F_{2,11889} = 3.00$, so we reject $H_0$.
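The same test can be computed either from the two R-squared values or from the two residual sums of squares. The Python sketch below does both, using only numbers printed in the two Stata outputs; the sum-of-squares form is the standard equivalent of the R-squared form and is added here as an assumption, since the slides show only the R-squared version. Variable names are hypothetical.

```python
# Values copied from the unrestricted (educ, exper, exp2) and
# restricted (educ only) Stata regressions
r2_ur, r2_r = 0.3291, 0.2305              # R-squared, rounded to 4 decimals
ssr_ur, ssr_r = 2510.68698, 2879.48346    # residual sums of squares
q = 2                                     # restrictions: beta2 = beta3 = 0
df_ur = 11889                             # n - k - 1 in the unrestricted model

# R-squared form of the F-statistic
f_from_r2 = ((r2_ur - r2_r) / q) / ((1 - r2_ur) / df_ur)
# Equivalent residual-sum-of-squares form
f_from_ssr = ((ssr_r - ssr_ur) / q) / (ssr_ur / df_ur)

print(f"F from R-squared values:        {f_from_r2:.2f}")
print(f"F from residual sums of squares: {f_from_ssr:.2f}")
```

The second value matches the F(2, 11889) = 873.19 reported by the `test` command above; the first is slightly larger only because the R-squared values are rounded to four decimals. Either way, the statistic is far above the critical value.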
1.11 Interpretation of the Estimates
• The general model
$$Y_i = \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \ldots + \beta_k X_k \qquad (10)$$
can be written in terms of changes
$$\Delta Y = \beta_1 \Delta X_1 + \beta_2 \Delta X_2 + \beta_3 \Delta X_3 + \ldots + \beta_k \Delta X_k \qquad (11)$$
• The coefficient on the variable $X_k$ measures the change in $Y_i$ due to
a one-unit increase in $X_k$, holding all the other explanatory variables
fixed (the so-called ceteris paribus assumption): $\Delta Y = \beta_k \Delta X_k$.
• These effects are sometimes called marginal or partial effects.
• In our example, since the dependent variable is log(wages), the
interpretation of $\hat\beta_1 = 0.108$, the coefficient of educ, is a 10.8 percent
increase in predicted wages for every additional year of education.
• Since $\hat\beta_2 = 0.04 > 0$ and $\hat\beta_3 = -0.0006 < 0$, there is a concave
relationship between log wages and experience.
• With the experience variable, the ceteris paribus assumption does not
work directly: when we increase exper, exp2 will increase as well, so
we have to compute the partial effect, evaluated at a given level of exper:
$$\frac{\Delta \log(wages)}{\Delta exper} \approx \frac{\partial \log(wages)}{\partial exper} = \beta_2 + 2 \beta_3 \, exper \qquad (12)$$
. sum exper

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       exper |     11893    18.67317    11.58664          0         58

. scalar experbar=r(mean)

. di experbar
18.673169

. lincom exper+2*exp2*experbar

 ( 1)  exper + 37.34634 exp2 = 0

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |   .0145183   .0003718    39.05   0.000     .0137894    .0152471
------------------------------------------------------------------------------
• Here, at the average experience level of 18.67 years, this gives: 0.0383817 + 2*(-0.000639)*18.67 = 0.0145, or a return of about 1.5% per year of experience at the mean.
• The turning point exper∗ = |β2/(2β3)| = 30.03 years appears to make sense
• But we can compare the plots of the quadratic on experience with a
local polynomial estimate, a more flexible functional form
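. regress lwage exper exp2 [weight=weight]
. * fitted values from the quadratic in experience
. gen twage=_b[_cons]+_b[exper]*exper+_b[exp2]*exp2
. * gen() stores the smoothing grid first, then the smoothed values
. lpoly lwage exper [aweight=weight], gen(experp lwagep) nograph
. twoway (scatter twage exper) (connected lwagep experp)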
[Plot: twage (quadratic fit) and the lpoly smooth of lwage against exper, 0 to 60 years; vertical axis from 1.5 to 2.5]
Figure 3: Impact of experience on log(wages)
• Comparing the univariate regression, log(wagesi) = β̃0 + β̃1educi, with the multivariate regression, log(wagesi) = β̂0 + β̂1educi + β̂2experi + β̂3exper²i, we would generally expect β̃1 ≠ β̂1
• Here, the estimates are pretty close! This suggests that educ and exper are close to uncorrelated in this sample. This could also happen if β2 = 0, that is, if exper were uncorrelated with wages given educ.
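The interpretation arithmetic in this section can be checked with a short Python sketch. It takes as given the coefficients reported in the Stata output (0.108 for educ, 0.0383817 for exper, -0.000639 for exp2) and the sample mean of exper (18.67317); nothing is re-estimated, and the function names are illustrative only. It also shows the exact percentage effect exp(β) − 1, which is slightly larger than the 100 ∗ β approximation.

```python
import math

# Values copied from the Stata output in the text (assumed, not re-estimated)
B_EDUC = 0.108          # coefficient on educ
B_EXPER = 0.0383817     # coefficient on exper
B_EXP2 = -0.000639      # coefficient on exp2
MEAN_EXPER = 18.67317   # sample mean of exper


def exact_pct_effect(beta):
    """Exact percentage change in wages implied by a change of beta in log(wages)."""
    return 100 * (math.exp(beta) - 1)


def marginal_effect_exper(exper):
    """Partial effect of experience on log(wages), as in equation (12)."""
    return B_EXPER + 2 * B_EXP2 * exper


def turning_point():
    """Experience level at which the quadratic experience profile peaks."""
    return -B_EXPER / (2 * B_EXP2)


print(round(exact_pct_effect(B_EDUC), 1))           # exact % return to a year of education
print(round(marginal_effect_exper(MEAN_EXPER), 4))  # return to experience at the mean
print(round(turning_point(), 2))                    # peak of the experience profile
```

The marginal effect differs from the lincom result in the last digits only because the coefficients above are rounded.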
1.12 Choosing the Functional Form
• We have already tried a few functional forms
wagesi = β0 + β1educi + ui
log(wagesi) = β0 + β1educi + ui
log(wagesi) = β0 + β1educi + β2experi + β3exper²i + ui
• Perhaps, we could soften the curvature of the relationship between exper and log(wages) with a quartic
log(wagesi) = β0 + β1educi + β2experi + β3exper²i + β4exper³i + β5exper⁴i + ui
. regress lwage educ exper exp2 exp3 exp4 [weight=weight]
(analytic weights assumed)
(sum of wgt is 2.6311e+07)
      Source |       SS       df       MS              Number of obs =   11893
-------------+------------------------------           F(  5, 11887) = 1187.74
       Model |  1246.66628     5  249.333256           Prob > F      =  0.0000
    Residual |  2495.34188 11887  .209921921           R-squared     =  0.3332
-------------+------------------------------           Adj R-squared =  0.3329
       Total |  3742.00816 11892  .314666008           Root MSE      =  .45817
------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .1086498   .0017533    61.97   0.000     .1052131    .1120864
       exper |   .0637706     .00501    12.73   0.000     .0539502    .0735909
        exp2 |  -.0023563   .0004593    -5.13   0.000    -.0032567    -.001456
        exp3 |   .0000358   .0000154     2.32   0.020     5.60e-06     .000066
        exp4 |  -1.80e-07   1.70e-07    -1.06   0.289    -5.13e-07    1.53e-07
       _cons |   .4973948   .0269574    18.45   0.000     .4445538    .5502357
------------------------------------------------------------------------------
• Now, taking the model with the quadratic in experience as the restricted model (H0 : β4 = β5 = 0), we get F = [(0.3332-0.3291)/(1-0.3332)](11887/2) = 36.55, which is greater than the 5 percent critical value F2,11887 = 3.00, so we reject H0.
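As a check on the arithmetic, the R-squared form of the F statistic can be written as a small Python function. This is a minimal sketch: the function name and argument names are illustrative, and the inputs are the R-squared values and degrees of freedom quoted in the text.

```python
def f_stat_from_r2(r2_unrestricted, r2_restricted, n_restrictions, df_resid):
    """F statistic for a set of exclusion restrictions, from the two R-squared values."""
    numerator = (r2_unrestricted - r2_restricted) / n_restrictions
    denominator = (1 - r2_unrestricted) / df_resid
    return numerator / denominator


# Quartic (unrestricted) versus quadratic (restricted) model, values from the text
f = f_stat_from_r2(0.3332, 0.3291, n_restrictions=2, df_resid=11887)
print(round(f, 2))
print(f > 3.00)  # compare with the 5 percent critical value quoted in the text
```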
• If the models are not nested (i.e. one cannot be derived from the other), we can use the adjusted R² as a guide to choose our preferred model
• Using log variables is also often convenient, especially for positive
dollar amounts, and for very large variables such as population
• Variables measured in years and variables that are a proportion or
percent are better used in level form
1.13 Potential Problems
1.13.1 Multicollinearity
• We could be tempted to use
log(wagesi) = β0 + β1educi + β2 log(experi) + β3 log(exper²i) + ui
• But this would not work because log(exper²i) = 2 ∗ log(experi), so that log(exper²i) and log(experi) would be perfectly correlated: we would have a problem of (perfect) multicollinearity
• In this case, STATA would drop one of the two collinear regressors, so you would know that something is wrong
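The perfect collinearity is easy to see numerically. The Python sketch below uses a handful of made-up experience values (hypothetical, and all strictly positive so that the log is defined) and a hand-written correlation helper; it compares the log case with the level case, where exper and exper² are highly but not perfectly correlated.

```python
import math


def pearson_corr(xs, ys):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


exper = [1, 2, 5, 10, 20, 40]  # hypothetical experience values, all > 0

corr_logs = pearson_corr([math.log(x) for x in exper],
                         [math.log(x ** 2) for x in exper])
corr_levels = pearson_corr(exper, [x ** 2 for x in exper])

print(corr_logs)    # equal to 1 up to floating-point error
print(corr_levels)  # high, but strictly below 1
```

Note that in the actual sample exper has a minimum of 0, so log(exper) would also be undefined for some observations.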
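• We can ask STATA to compute the Variance Inflation Factor, VIFk = 1/(1 − R²k), where R²k is the R² from regressing regressor k on the other regressors. It measures the degree to which the variance of the coefficient on regressor k has been inflated because regressor k is not orthogonal to the other regressors.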
. estat vif
    Variable |       VIF       1/VIF
-------------+----------------------
        exp2 |     11.69    0.085522
       exper |     11.47    0.087159
        educ |      1.07    0.938056
-------------+----------------------
    Mean VIF |      8.08
• A rule of thumb states that there is evidence of collinearity if the
largest VIF is greater than 10.
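The link between the two columns of the estat vif output can be traced with a few lines of Python. This sketch starts from the reported 1/VIF values, backs out the implied auxiliary R²k, and rebuilds the VIFs; the dictionary and function names are illustrative.

```python
def vif_from_r2(r2_k):
    """Variance inflation factor implied by the auxiliary R-squared of regressor k."""
    return 1.0 / (1.0 - r2_k)


# 1/VIF column from the Stata output above
tolerance = {"exp2": 0.085522, "exper": 0.087159, "educ": 0.938056}

# 1/VIF = 1 - R2_k, so the auxiliary R-squared is 1 minus the tolerance
aux_r2 = {name: 1.0 - tol for name, tol in tolerance.items()}
vifs = {name: vif_from_r2(r2) for name, r2 in aux_r2.items()}
mean_vif = sum(vifs.values()) / len(vifs)

for name in vifs:
    print(name, round(aux_r2[name], 3), round(vifs[name], 2))
print("Mean VIF", round(mean_vif, 2))
```

The implied auxiliary R² for exper and exp2 is above 0.9, while for educ it is only about 0.06.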
• Here, it is not too surprising that exper and exper2 are correlated,
but the quadratic in experience provides a better fit.
• More generally, when we are adding explanatory variables to a regres-
sion model to reduce the error variance, we should always try to include
independent variables that affect Y and are uncorrelated with all of
the independent variables of interest.
• Because near-collinearity inflates standard errors, coefficients are estimated more precisely (and significant coefficients become more significant) when the included regressors are less collinear.
• If we include variables that do not belong, there is no effect on the expected value of our parameter estimate, and OLS remains unbiased, i.e. E(β̂1) = β1
• Here, we could add demographic characteristics such as marital status,
geographic location, etc.
1.13.2 Omitted Variables Bias
• If we omit variables that do belong, then the OLS estimate will likely be biased, E(β̂1) ≠ β1
• For example, suppose that the true wage equation model was
wagesi = β0 + β1educi + β2abili + ui
but that since we do not observe ability, we estimate
wagesi = β0 + β1educi + vi (13)
where vi = β2abili + ui
• Then, calling β̃1 the estimate from equation (13), which omits ability, we can show that
E[β̃1] = β1 + β2δ1
where δ1 = Cov(educi, abili) / Var(educi)
• More generally, when X1 and X2 are correlated and β2 ≠ 0, the estimate of β1 will be biased.
• The sign of the bias depends on both the sign of β2 and of δ1
            Corr(X1, X2) > 0    Corr(X1, X2) < 0
β2 > 0      positive bias       negative bias
β2 < 0      negative bias       positive bias
• In the case of the wage equation, because more ability leads to higher productivity and hence a higher wage, β2 > 0. There are also reasons to believe that educ and ability are positively correlated, so we would expect the OLS estimate of β1 from equation (13) to be too large (biased upward)
• What to do about it? This is not an easy problem to correct if we do not have some measure of ability in our sample. (One has to take a quasi-experimental approach, using instrumental variables (IV) for example.)
• However, in terms of reporting results, one should be aware of the possibility of an “omitted variables” bias and qualify the results as likely “upward biased” or “downward biased”.
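The omitted variables formula can be illustrated with simulated data. The Python sketch below is not based on the course data: the "true" parameters, the way educ is tied to ability, and the sample size are all hypothetical choices made only so that Cov(educ, abil) > 0 and β2 > 0. It compares the slope from the short regression of wages on educ with β1 + β2δ1.

```python
import random


def mean(values):
    return sum(values) / len(values)


def cov(a, b):
    """Sample covariance (dividing by n) between two equal-length lists."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)


random.seed(495)
n = 5000
B0, B1, B2 = 1.0, 0.8, 0.5  # hypothetical true parameters, with B2 > 0

abil = [random.gauss(0, 1) for _ in range(n)]
educ = [12 + 2 * a + random.gauss(0, 2) for a in abil]  # educ rises with ability
u = [random.gauss(0, 1) for _ in range(n)]
wages = [B0 + B1 * e + B2 * a + ui for e, a, ui in zip(educ, abil, u)]

delta1 = cov(educ, abil) / cov(educ, educ)         # delta_1 in the bias formula
short_slope = cov(educ, wages) / cov(educ, educ)   # OLS slope when ability is omitted
predicted = B1 + B2 * delta1                       # beta_1 + beta_2 * delta_1

print(round(short_slope, 3), round(predicted, 3), B1)
```

With both β2 and δ1 positive, the short-regression slope lies above the true β1, which is the upward bias described for the wage equation.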
1.13.3 Heteroscedasticity
• When the variance of the error terms is not constant across observations, we have a problem of heteroscedasticity: Var(ui | educi) = σ²i = σ²(Xi)
• The OLS estimates are still unbiased and consistent, but the standard
errors of the estimates are biased if we have heteroskedasticity
• If the standard errors are biased, we can not use the usual t statistics
or F statistics or LM statistics for drawing inferences
• But we can test for it using the Breusch-Pagan test, which amounts to testing H0 : t = 0 in Var(ui) = σ² exp(Zi t), so that the variance is constant under H0. In practice, this means running a regression of the squared OLS residuals on the fitted values or on some of the explanatory variables
. estat hettest
Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
         Ho: Constant variance
         Variables: fitted values of lrwage
         chi2(1)      =     3.42
         Prob > chi2  =   0.0645
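The mechanics of the auxiliary regression can be sketched in Python. This is a sketch under stated assumptions, not a replica of estat hettest: it uses the n·R² (studentized) form of the Breusch-Pagan statistic with a single regressor, simulated data whose error standard deviation is proportional to x, and hypothetical helper names. The statistic it produces need not match the one Stata prints. The last line recovers the p-value reported above for chi2(1) = 3.42.

```python
import math
import random


def ols_simple(x, y):
    """Intercept and slope of a one-regressor OLS fit."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - slope * mx, slope


def r_squared(x, y):
    """R-squared of the one-regressor OLS fit of y on x."""
    a, b = ols_simple(x, y)
    my = sum(y) / len(y)
    ss_res = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def chi2_1_pvalue(stat):
    """Upper-tail p-value of a chi-squared statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))


def breusch_pagan_nr2(x, y):
    """n * R-squared from regressing squared OLS residuals on x, with its p-value."""
    a, b = ols_simple(x, y)
    resid_sq = [(yi - a - b * xi) ** 2 for xi, yi in zip(x, y)]
    lm = len(x) * r_squared(x, resid_sq)
    return lm, chi2_1_pvalue(lm)


random.seed(1)
x = [random.uniform(8, 20) for _ in range(2000)]
y = [0.5 + 0.1 * xi + random.gauss(0, 0.05 * xi) for xi in x]  # error sd grows with x

lm, p = breusch_pagan_nr2(x, y)
print(round(lm, 2), p)                   # strong evidence against constant variance
print(round(chi2_1_pvalue(3.42), 4))     # p-value for the statistic reported by Stata
```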
• Here, with a p-value of 0.0645, we fail to reject H0 (constant variance) at the 5 percent level, which is the outcome we hope for
• What to do about this? In STATA, heteroskedasticity-robust standard errors are easily obtained using the robust option of reg
• The resulting White or Huber standard errors will be asymptotically valid in the presence of any form of heteroscedasticity, including homoscedasticity.
• When the form of the heteroskedasticity is known, for example σ²i = σ² ∗ educi, then we can use weighted least squares (vwls in STATA)
. regress lwage educ exper exp2 exp3 exp4 [weight=weight],robust
(analytic weights assumed)
(sum of wgt is 2.6311e+07)
Linear regression                                      Number of obs =   11893
                                                       F(  5, 11887) = 1054.37
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.3332
                                                       Root MSE      =  .45817
------------------------------------------------------------------------------
             |               Robust
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .1086498   .0020186    53.82   0.000     .1046929    .1126066
       exper |   .0637706    .005069    12.58   0.000     .0538344    .0737067
        exp2 |  -.0023563   .0004762    -4.95   0.000    -.0032898   -.0014229
        exp3 |   .0000358   .0000162     2.21   0.027     4.01e-06    .0000676
        exp4 |  -1.80e-07   1.81e-07    -1.00   0.318    -5.34e-07    1.74e-07
       _cons |   .4973948   .0289503    17.18   0.000     .4406475    .5541421
------------------------------------------------------------------------------
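To see why the robust and conventional standard errors differ, here is a minimal Python sketch for a one-regressor model. It computes the conventional standard error of the slope and the basic White (HC0) version, which weights each squared residual by its squared deviation of x from the mean. The data are hypothetical and deterministic (error size proportional to x, alternating in sign), and the function name is illustrative. Stata's robust option also applies a small-sample degrees-of-freedom adjustment, so this does not reproduce its numbers exactly.

```python
import math


def ols_with_ses(x, y):
    """Slope of a one-regressor OLS fit with conventional and White (HC0) standard errors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [xi - mx for xi in x]
    sxx = sum(d * d for d in dx)
    slope = sum(d * (yi - my) for d, yi in zip(dx, y)) / sxx
    intercept = my - slope * mx
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    # Conventional: one common error variance for every observation
    s2 = sum(e * e for e in resid) / (n - 2)
    se_conventional = math.sqrt(s2 / sxx)
    # White (HC0): each observation contributes its own squared residual
    se_robust = math.sqrt(sum((d * e) ** 2 for d, e in zip(dx, resid))) / sxx
    return slope, se_conventional, se_robust


# Hypothetical data: the error magnitude is proportional to x
x = [float(i) for i in range(1, 201)]
y = [1.0 + 2.0 * xi + ((-1) ** i) * 0.1 * xi for i, xi in enumerate(x)]

slope, se_conv, se_rob = ols_with_ses(x, y)
print(round(slope, 4), round(se_conv, 5), round(se_rob, 5))
```

In this constructed example the robust standard error is larger than the conventional one, the same direction as the change for educ between the two regression tables (.0017533 versus .0020186).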