MULTIPLE REGRESSION ANALYSIS: ESTIMATION
Huseyin Tastan
Yıldız Technical University, Department of Economics
These presentation notes are based on Introductory Econometrics: A Modern Approach (2nd ed.) by J. Wooldridge.
17 October 2012
Econometrics I: Multiple Regression: Estimation - H. Tastan 1
Multiple Regression Analysis
In simple regression analysis, with only one explanatory variable, the assumption SLR.3 is generally very restrictive and unrealistic.
Recall that the key assumption was SLR.3: all other factors affecting y are uncorrelated with x (ceteris paribus).
Multiple regression analysis allows us to explicitly control for many other factors that simultaneously affect y.
By adding new explanatory variables we can explain more of the variation in y. In other words, we can build more successful models.
An additional advantage: multiple regression can incorporate general functional-form relationships.
Multiple Regression Analysis Examples
The Model with Two Explanatory Variables
y = β0 + β1x1 + β2x2 + u
Wage Equation
wage = β0 + β1educ + β2exper + u
wage: hourly wage (in US dollars); educ: education (in years); exper: experience (in years)
β1: measures the impact of education on wage, holding all other factors fixed.
β2: measures the ceteris paribus effect of experience on wage.
This wage equation allows us to measure the impact of education on wage holding experience fixed. This was not possible in simple regression analysis.
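The two-regressor wage equation can be estimated by OLS from any sample with these three columns. A minimal sketch using simulated (hypothetical) data and illustrative coefficient values, not a real wage dataset:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Hypothetical data: educ and exper are correlated, as in real samples
educ = rng.normal(13, 2, n)
exper = rng.normal(10, 4, n) - 0.3 * (educ - 13)
u = rng.normal(0, 2, n)
wage = 1.0 + 0.8 * educ + 0.2 * exper + u  # "true" coefficients chosen for illustration

# OLS: regress wage on a constant, educ, and exper
X = np.column_stack([np.ones(n), educ, exper])
b0, b1, b2 = np.linalg.lstsq(X, wage, rcond=None)[0]

# b1 estimates the effect of education holding experience fixed
print(b0, b1, b2)
```

Because exper enters the regression, b1 is the education effect with experience held fixed, which a simple regression of wage on educ alone cannot deliver.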
Multiple Regression Analysis Examples
Student Success and Family Income
avgscore = β0 + β1expend + β2avginc + u
avgscore: average standardized test score; expend: education spending per student; avginc: average family income
If avginc is not included in the model directly, its effect will be absorbed into the error term, u.
Because average family income is generally correlated with education expenditures, the key assumption SLR.3 will be invalid: x (expend) will be correlated with u, leading to biased OLS estimators.
Multiple Regression Analysis
Multiple regression analysis allows us to use more general functional forms.
Consider the following quadratic model of consumption:
cons = β0 + β1inc + β2inc² + u
where x1 = inc and x2 = inc².
β1 no longer measures the ceteris paribus effect of inc on cons: we cannot hold inc² fixed while changing inc.
The marginal propensity to consume (MPC) is approximated by:
∆cons/∆inc ≈ β1 + 2β2inc
The MPC depends on the level of income.
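The income-dependent MPC can be evaluated directly from the approximation above. A sketch with hypothetical coefficient values (0.8 and −0.002 are illustrative, not estimates):

```python
# Hypothetical estimated coefficients for cons = b0 + b1*inc + b2*inc^2
b1, b2 = 0.8, -0.002

def mpc(inc):
    """Approximate marginal propensity to consume at income level inc."""
    return b1 + 2 * b2 * inc

# With b2 < 0, the MPC falls as income rises
print(mpc(10), mpc(100))  # 0.76 at inc = 10, 0.4 at inc = 100
```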
Multiple Regression Analysis
The Model with Two Explanatory Variables
y = β0 + β1x1 + β2x2 + u
“Zero conditional mean” assumption:
E(u|x1, x2) = 0
In other words, for all combinations of x1 and x2, the expected value of the unobservables, u, is zero.
E.g., in the wage equation:
E(u|educ, exper) = 0
This means that unobservable factors affecting wage are, on average, unrelated to educ and exper.
If ability is part of u, then average ability levels must be the same across all combinations of education and experience in the population.
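The zero conditional mean assumption can be illustrated by simulation: when u is drawn independently of the regressors, its sample mean and its sample correlation with each regressor are near zero. This is a hypothetical setup showing what the assumption requires, not a test of it on real data:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

educ = rng.normal(13, 2, n)
exper = rng.normal(10, 4, n)
u = rng.normal(0, 1, n)  # drawn independently of educ and exper

# Sample analogues of E(u), Corr(u, educ), Corr(u, exper) are all near zero
print(u.mean())
print(np.corrcoef(u, educ)[0, 1])
print(np.corrcoef(u, exper)[0, 1])
```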
Multiple Regression Analysis
The model with two independent variables
E(u|x1, x2) = 0
Test scores and average family income:
E(u|expend, avginc) = 0
All other factors affecting average test scores (such as school quality, student characteristics, etc.) are, on average, unrelated to education expenditures and family income.
In the quadratic consumption model:
E(u|inc, inc²) = E(u|inc) = 0
Since inc² is automatically known once inc is known, we do not need to write it in the conditional expectation.
The Model with k Independent Variables
Model
y = β0 + β1x1 + β2x2 + … + βkxk + u
In this model we have k explanatory variables and k + 1 unknown β parameters.
The definition of the error term is the same: it represents all other factors affecting y that are not included in the model.
The Model with k Explanatory Variables
βj: measures the change in y in response to a unit change in xj, holding all other x's and the unobservable factors u fixed.
For example:
log(salary) = β0 + β1 log(sales) + β2ceoten + β3ceoten² + u
ceoten: tenure with the current company (years)
Interpretation of β1: the elasticity of salary with respect to sales. Ceteris paribus, salary will change by β1% in response to a 1% change in sales.
β2: does not by itself measure the impact of ceoten; we need to take the quadratic term into account.
The Model with k Explanatory Variables
Zero Conditional Mean Assumption
E(u|x1, x2, . . . , xk) = 0
The error term is unrelated to the explanatory variables.
If u is correlated with any of the x's, then the OLS estimators will in general be biased. Estimation results will be unreliable.
If important variables affecting y are omitted, this assumption may not hold. This can lead to "omitted variable bias".
This assumption also means that the functional form is correctly specified.
The Model with k Explanatory Variables: OLS Estimation
Sample Regression Function - SRF
y = β0 + β1x1 + β2x2 + … + βkxk
Ordinary Least Squares (OLS) estimators minimize the Sum of Squared Residuals (SSR):
OLS Objective Function
∑ᵢ₌₁ⁿ ui² = ∑ᵢ₌₁ⁿ (yi − β0 − β1xi1 − β2xi2 − … − βkxik)²
The OLS estimators, βj, j = 0, 1, 2, …, k, can be found by solving the First Order Conditions (FOC): k + 1 equations in k + 1 unknowns.
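That the OLS solution minimizes the SSR can be checked numerically: perturbing any coefficient away from the OLS estimate increases the sum of squared residuals. A sketch with simulated data (the coefficients 1, 2, −1 are arbitrary illustrative values):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

def ssr(b):
    """Sum of squared residuals at coefficient vector b."""
    e = y - X @ b
    return e @ e

b_hat = np.linalg.lstsq(X, y, rcond=None)[0]

# Any perturbation of the OLS solution raises the SSR
b_pert = b_hat + np.array([0.1, 0.0, 0.0])
print(ssr(b_hat) < ssr(b_pert))  # True
```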
The Model with k Explanatory Variables: OLS Estimation
OLS First Order Conditions
∑ᵢ₌₁ⁿ (yi − β0 − β1xi1 − β2xi2 − … − βkxik) = 0
∑ᵢ₌₁ⁿ xi1(yi − β0 − β1xi1 − β2xi2 − … − βkxik) = 0
∑ᵢ₌₁ⁿ xi2(yi − β0 − β1xi1 − β2xi2 − … − βkxik) = 0
⋮
∑ᵢ₌₁ⁿ xik(yi − β0 − β1xi1 − β2xi2 − … − βkxik) = 0
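In matrix form the k + 1 first-order conditions are the normal equations X'Xβ = X'y, so the estimates can be computed with one linear solve. A numpy sketch on simulated data (coefficient values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 200, 3

X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])  # constant + k regressors
beta_true = np.array([1.0, 0.5, -0.3, 2.0])                 # chosen for illustration
y = X @ beta_true + rng.normal(size=n)

# Solve the k + 1 normal equations X'X beta = X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# The FOCs say each column of X is orthogonal to the residuals
resid = y - X @ beta_hat
print(np.max(np.abs(X.T @ resid)))  # ~0 up to floating-point error
```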
The Model with k Explanatory Variables: OLS Estimation
OLS and Method of Moments
The OLS first-order conditions can also be obtained by the method of moments: they are the sample counterparts of the following population moment conditions:
E(u) = 0
E(x1u) = 0
E(x2u) = 0
⋮
E(xku) = 0
Interpretation of SRF
The model with 2 explanatory variables
y = β0 + β1x1 + β2x2
The slope parameter estimates, βj, give us the predicted change in y given changes in the x variables, ceteris paribus.
β1: holding x2 fixed, i.e. ∆x2 = 0
∆y = β1∆x1
Similarly, holding x1 fixed
∆y = β2∆x2
Example: Determinants of College Success, gpa1.gdt
colGPA Estimation Results
colGPA = 1.29 + 0.453 hsGPA + 0.0094 ACT
n = 141 students; colGPA: university grade point average (GPA, points out of 4); hsGPA: high school grade point average; ACT: achievement test score
The intercept is estimated as β0 = 1.29. This gives the average colGPA when hsGPA = 0 and ACT = 0. Because there is no observation with ACT = 0 and hsGPA = 0, this interpretation is not meaningful.
The slope estimate on hsGPA is 0.453. Holding ACT fixed, a one-point increase in high school GPA is predicted to raise college GPA by 0.453 points, almost half a point.
Example: Determinants of College Success, gpa1.gdt
colGPA Estimation Results
colGPA = 1.29 + 0.453 hsGPA+ 0.0094 ACT
n = 141 students, colGPA: university grade point average (GPA,points out of 4), hsGPA: high school grade point average, ACT :achievement test score
Alternative interpretation of 0.453
If we choose two students, A and B, with the same ACT score, but student A's hsGPA is one point higher than student B's, then we predict that student A's college GPA is 0.453 points higher than student B's.
ACT has a positive sign, but its effect is very small.
Example: Determinants of College Success, gpa1.gdt
Regression with only ACT variable:
colGPA simple regression results
colGPA = 2.4 + 0.0271 ACT
The coefficient estimate on ACT is almost three times as large as the previous estimate.
But this regression does not allow us to compare two students with the same high school GPA.
The impact of ACT decreases considerably after controlling for high school GPA.
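The pattern in this example (the ACT coefficient shrinking once hsGPA is controlled for) can be reproduced in simulation: when two regressors are positively correlated and both matter, the simple-regression slope absorbs part of the omitted variable's effect. A hypothetical sketch, not the gpa1.gdt data:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5000

hs = rng.normal(3.0, 0.4, n)
act = 24 + 5 * (hs - 3.0) + rng.normal(0, 2, n)          # ACT correlated with hsGPA
col = 1.3 + 0.45 * hs + 0.01 * act + rng.normal(0, 0.3, n)

# Simple regression of col on act alone
Xs = np.column_stack([np.ones(n), act])
slope_simple = np.linalg.lstsq(Xs, col, rcond=None)[0][1]

# Multiple regression controlling for hs
Xm = np.column_stack([np.ones(n), act, hs])
slope_multi = np.linalg.lstsq(Xm, col, rcond=None)[0][1]

# The simple slope is noticeably larger: it picks up part of hs's effect
print(slope_simple, slope_multi)
```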
Interpretation of Estimated Regression Function
SRF - Sample Regression Function
ŷ = β̂0 + β̂1x1 + β̂2x2 + . . . + β̂kxk
In terms of changes:
∆ŷ = β̂1∆x1 + β̂2∆x2 + . . . + β̂k∆xk
Interpretation of the coefficient on x1, β̂1: holding all other variables fixed, i.e. ∆x2 = 0, ∆x3 = 0, . . ., ∆xk = 0,
∆ŷ = β̂1∆x1
Holding all other independent variables fixed (i.e. controlling for all other x variables), the change in ŷ given a one-unit change in x1 is β̂1 (in y's units of measurement).
Example: Logarithmic Wage Equation
Estimated equation
log(wage) = 0.284 + 0.092 educ + 0.0041 exper + 0.022 tenure
n = 526 workers.
Functional form: log-level; 100·β̂j gives the approximate percentage change in wage.
For example, holding experience and tenure fixed, another year of education is predicted to increase wage by about 9.2%.
In other words, given two workers with the same levels of experience and tenure, if one worker has one more year of education than the other, then the predicted difference between their wages is 9.2%.
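The 9.2% figure relies on the usual log-level approximation 100·β̂. A quick arithmetic check of how close that approximation is to the exact percentage change implied by the model (only the estimated coefficient from the slide is used):

```python
import numpy as np

# Log-level model: the slide reads 100*beta_j as the % change in wage.
# The exact proportionate change for a one-unit increase is exp(beta_j) - 1.
beta_educ = 0.092
approx_pct = 100 * beta_educ                 # approximation: 9.2%
exact_pct = 100 * (np.exp(beta_educ) - 1)    # exact: about 9.64%
print(round(approx_pct, 2), round(exact_pct, 2))
```

For small coefficients the two agree closely; the gap grows with the magnitude of the coefficient.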
Changing More than One Explanatory Variable Simultaneously
Sometimes we want to change more than one x variable at the same time to find their combined impact on y.
Also, in some cases, when one of the x variables changes, another changes automatically.
For example, in the wage equation, when tenure increases by 1 year, experience also increases by 1 year.
In this case the total effect on wage is 2.61%:
∆log(wage) = 0.0041 ∆exper + 0.022 ∆tenure
= 0.0041 + 0.022 = 0.0261
OLS Fitted Values and Residuals
Fitted (predicted) value for observation i:
ŷi = β̂0 + β̂1xi1 + β̂2xi2 + . . . + β̂kxik
Residual for observation i:
ûi = yi − ŷi
ûi > 0 implies yi > ŷi: underprediction.
ûi < 0 implies yi < ŷi: overprediction.
Algebraic Properties of Residuals
The sum (and also the average) of the OLS residuals is zero:
∑ᵢ₌₁ⁿ ûi = 0, so the sample average of the residuals is zero.
This follows directly from the first moment condition.
The sample covariance between the OLS residuals and each xj is zero:
∑ᵢ₌₁ⁿ xij ûi = 0, j = 1, 2, . . . , k
This also follows directly from the population moment conditions; it is imposed by the OLS first-order conditions.
The point of sample means (x̄1, x̄2, . . . , x̄k, ȳ) is always on the OLS regression line:
ȳ = β̂0 + β̂1x̄1 + β̂2x̄2 + . . . + β̂kx̄k
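These algebraic properties are easy to verify numerically. A minimal numpy sketch on simulated data (the data-generating process, seed, and sample size below are arbitrary illustrations):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 0.5 * x1 - 0.3 * x2 + rng.normal(size=n)

# OLS of y on (1, x1, x2)
X = np.column_stack([np.ones(n), x1, x2])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
uhat = y - X @ beta

# FOC consequences: residuals sum to zero and are orthogonal to each x_j
print(abs(uhat.sum()) < 1e-8)                       # True
print(abs((x1 * uhat).sum()) < 1e-8)                # True
print(abs((x2 * uhat).sum()) < 1e-8)                # True

# The point of sample means lies on the regression line
print(abs(y.mean() - X.mean(axis=0) @ beta) < 1e-10)  # True
```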
An Alternative Derivation of Coefficient Estimates
The model with two explanatory variables:
ŷ = β̂0 + β̂1x1 + β̂2x2
The slope estimator on x1 is:
β̂1 = (∑ᵢ₌₁ⁿ r̂i1 yi) / (∑ᵢ₌₁ⁿ r̂i1²)
where r̂i1 is the residual obtained from the regression of x1 on x2:
xi1 = α̂0 + α̂1xi2 + r̂i1
This means that β̂1 is the slope estimate obtained from regressing y on the residuals r̂i1:
yi = β̂0 + β̂1r̂i1 + residual
Ceteris paribus (partialling out, netting out): β̂1 measures the sample relationship between y and x1 after x2 has been partialled out.
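The partialling-out result can be checked numerically. A sketch with simulated data (the variable names and data-generating process are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300
x2 = rng.normal(size=n)
x1 = 0.6 * x2 + rng.normal(size=n)      # x1 correlated with x2
y = 2.0 + 1.5 * x1 - 0.8 * x2 + rng.normal(size=n)

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Multiple regression of y on (1, x1, x2): coefficient on x1
b_full = ols(np.column_stack([np.ones(n), x1, x2]), y)

# Step 1: residuals r1 from regressing x1 on (1, x2)
Z = np.column_stack([np.ones(n), x2])
r1 = x1 - Z @ ols(Z, x1)

# Step 2: slope of y on r1 reproduces the multiple-regression coefficient
b1_partial = (r1 * y).sum() / (r1 ** 2).sum()
print(abs(b_full[1] - b1_partial) < 1e-8)   # True: the two estimates coincide
```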
Comparison of Simple and Multiple Regression Estimates
Two models:
ỹ = β̃0 + β̃1x1 vs. ŷ = β̂0 + β̂1x1 + β̂2x2
These regressions will generally give different estimates of the slope on x1.
There are two special cases in which they give the same estimate:
The partial effect of x2 on y is zero, i.e. β̂2 = 0.
x1 and x2 are uncorrelated in the sample.
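A simulated illustration of the two regressions (the data-generating process and seed are arbitrary): when the regressors are correlated and x2 matters, the simple and multiple slope estimates differ; forcing the sample correlation between x1 and x2 to be exactly zero makes them coincide.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500

def slope_on_x1(y, *xs):
    # OLS slope on the first regressor, with an intercept
    X = np.column_stack([np.ones(len(y))] + list(xs))
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

# Case 1: x1 correlated with x2 and beta2 != 0 -> estimates differ
x2 = rng.normal(size=n)
x1 = 0.7 * x2 + rng.normal(size=n)
y = 1 + 2 * x1 + 3 * x2 + rng.normal(size=n)
print(slope_on_x1(y, x1), slope_on_x1(y, x1, x2))

# Case 2: make x1 exactly uncorrelated with x2 in the sample -> estimates agree
Z = np.column_stack([np.ones(n), x2])
x1_perp = x1 - Z @ np.linalg.lstsq(Z, x1, rcond=None)[0]
y2 = 1 + 2 * x1_perp + 3 * x2 + rng.normal(size=n)
print(abs(slope_on_x1(y2, x1_perp) - slope_on_x1(y2, x1_perp, x2)) < 1e-8)  # True
```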
Goodness-of-Fit and Sums of Squares
SST: the total variation in y:
SST = ∑ᵢ₌₁ⁿ (yi − ȳ)²
Note that Var(y) = SST/(n − 1).
SSE is the explained variation:
SSE = ∑ᵢ₌₁ⁿ (ŷi − ȳ)²
SSR is the unexplained variation:
SSR = ∑ᵢ₌₁ⁿ ûi²
Thus the total variation in y can be written as:
SST = SSE + SSR
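The decomposition can be verified numerically; a minimal sketch on simulated data (DGP and seed are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 150
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 0.5 + 1.2 * x1 - 0.7 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
yhat = X @ np.linalg.lstsq(X, y, rcond=None)[0]
uhat = y - yhat

SST = ((y - y.mean()) ** 2).sum()
SSE = ((yhat - y.mean()) ** 2).sum()   # with an intercept, mean(yhat) = mean(y)
SSR = (uhat ** 2).sum()
print(abs(SST - (SSE + SSR)) < 1e-8)   # True: SST = SSE + SSR
```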
Goodness-of-fit
Dividing both sides of this expression by SST:
1 = SSE/SST + SSR/SST
Coefficient of determination: the ratio of the explained variation to the total variation:
R² = SSE/SST = 1 − SSR/SST
Since SSE can never be larger than SST, we have 0 ≤ R² ≤ 1.
R² measures the fraction of the sample variation in y that can be explained by the variation in the x variables. It can be used as a goodness-of-fit measure.
R² can also be calculated as the squared correlation between the actual and fitted values: R² = Corr(y, ŷ)².
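The two formulas for R² can be checked side by side on simulated data (DGP and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 120
x = rng.normal(size=n)
y = 1 + 2 * x + rng.normal(size=n)

X = np.column_stack([np.ones(n), x])
yhat = X @ np.linalg.lstsq(X, y, rcond=None)[0]

SST = ((y - y.mean()) ** 2).sum()
SSR = ((y - yhat) ** 2).sum()
r2_ssr = 1 - SSR / SST                       # R^2 = 1 - SSR/SST
r2_corr = np.corrcoef(y, yhat)[0, 1] ** 2    # R^2 = Corr(y, yhat)^2
print(abs(r2_ssr - r2_corr) < 1e-10)         # True: the two formulas agree
```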
Goodness-of-fit
Coefficient of determination:
R² = SSE/SST = 1 − SSR/SST
When a new x variable is added to the regression, R² always increases (or stays the same); it never decreases.
The reason is that adding a new variable can never increase SSR.
For this reason R² may not be a good indicator for model selection.
Instead of R² we will use the adjusted R² for model selection.
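The monotonicity of R² is easy to demonstrate: add a regressor that is pure noise and R² still does not fall. A sketch on simulated data (DGP and seed are illustrative); the adjusted R², shown for comparison, penalizes the extra parameter.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100
x1 = rng.normal(size=n)
y = 1 + x1 + rng.normal(size=n)
junk = rng.normal(size=n)            # irrelevant regressor, unrelated to y

def r2(X, y):
    u = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    return 1 - (u ** 2).sum() / ((y - y.mean()) ** 2).sum()

X1 = np.column_stack([np.ones(n), x1])
X2 = np.column_stack([np.ones(n), x1, junk])
r2_small, r2_big = r2(X1, y), r2(X2, y)
print(r2_big >= r2_small)            # True: R^2 never decreases

# Adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - k - 1) penalizes extra regressors
adj = lambda R2, k: 1 - (1 - R2) * (n - 1) / (n - k - 1)
print(round(adj(r2_small, 1), 3), round(adj(r2_big, 2), 3))
```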
Goodness-of-fit: Example
College GPA
colGPA = 1.29 + 0.453 hsGPA + 0.0094 ACT
n = 141, R² = 0.176
The coefficient of determination is 0.176.
17.6% of the total variation in college GPA can be explained by the variation in hsGPA and ACT.
This is not very large, because there are many relevant factors not included in the model.
Regression through the Origin
Predicted y is 0 when all the xj are 0:
ỹ = β̃1x1 + β̃2x2 + . . . + β̃kxk
Note that the regression does not have a constant term.
R² can be negative in this regression. In that case R² is either treated as 0, or the regression is re-estimated after adding an intercept term.
R² < 0 means that the sample average of y, ȳ, is more successful in explaining the total variation than the model including the x variables.
If the constant term in the PRF is different from 0, then the OLS estimators in the regression through the origin will be biased.
On the other hand, adding a constant term when the true intercept is 0 increases the variances of the OLS estimators.
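A tiny hand-made example of a negative R² in a through-origin regression (the numbers are invented for illustration): y is nearly constant around 10 and far from the origin, so ȳ fits far better than any line forced through (0, 0).

```python
import numpy as np

# y is essentially flat and far from the origin
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([10.0, 10.2, 9.8, 10.1])

b = (x * y).sum() / (x ** 2).sum()   # through-origin OLS slope
SSR = ((y - b * x) ** 2).sum()
SST = ((y - y.mean()) ** 2).sum()
r2 = 1 - SSR / SST
print(r2 < 0)                        # True: the model fits worse than ybar alone
```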
Assumptions for Unbiasedness of OLS Estimators
MLR.1 Linear in Parameters
y = β0 + β1x1 + β2x2 + . . .+ βkxk + u
The population model is linear in the parameters.
MLR.2 Random Sampling
We have a random sample of n observations drawn from the population model defined in MLR.1:
{(xi1, xi2, . . . , xik, yi) : i = 1, 2, . . . , n}
Assumptions for Unbiasedness of OLS Estimators
MLR.3 Zero Conditional Mean
E(u|x1, x2, . . . , xk) = 0
This assumption states that the explanatory variables are strictly exogenous: the error term has zero mean given any values of the explanatory variables, and is therefore uncorrelated with each of them.
There are several ways this assumption can fail:
1. Functional form misspecification: if the functional relationship between the explained and explanatory variables is misspecified, this assumption can fail.
2. Omitting an important variable that is correlated with any of the explanatory variables.
3. Measurement error in the explanatory variables.
If this assumption fails, we have endogenous explanatory variables.
Assumptions for Unbiasedness of OLS Estimators
MLR.4 No Perfect Collinearity
There is no perfect linear relationship among the independent x variables: none of the x variables can be written as an exact linear combination of the others. If this assumption fails, we have perfect collinearity.
If the x variables are perfectly collinear, it is not possible to mathematically determine the OLS estimators.
This assumption allows the independent variables to be correlated; they just cannot be perfectly correlated.
If we did not allow for any correlation among the independent variables, then multiple regression would be of very limited use for econometric analysis.
For example, in the regression of student GPA on education expenditures and family income, we suspect that expenditure and income may be correlated, and so we would like to hold income fixed to find the impact of expenditure on GPA.
Econometrics I: Multiple Regression: Estimation - H. Tastan 32
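The failure of MLR.4 can be seen mechanically: with perfectly collinear regressors the design matrix loses rank, so the OLS normal equations have no unique solution. A minimal numpy sketch (the data-generating process is illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
x2 = 3 * x1 + 2  # exact linear function of x1: perfect collinearity

# design matrix with intercept, x1, and the collinear x2
X = np.column_stack([np.ones(n), x1, x2])

rank = np.linalg.matrix_rank(X)
print(rank)  # 2 < 3: X'X is singular, so the OLS estimators are not uniquely determined
```

With rank 2 instead of 3, X'X cannot be inverted and no software can separate the effect of x1 from that of x2.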
Finite Sample Properties of OLS Estimators
THEOREM: Unbiasedness of OLS
Under Assumptions MLR.1 through MLR.4, the OLS estimators are unbiased:
E(β̂j) = βj , j = 0, 1, 2, . . . , k
The centers of the sampling distributions (i.e., the expectations) of the OLS estimators are equal to the unknown population parameters. Remember that an estimate cannot be unbiased: it is just a number based on a particular sample, and in general we cannot say anything about its proximity to the true value.
Econometrics I: Multiple Regression: Estimation - H. Tastan 33
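The theorem can be illustrated by simulation: draw many random samples from a model satisfying MLR.1 through MLR.4, run OLS on each, and average the estimates. A numpy sketch with made-up true parameters (the DGP is illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(42)
beta = np.array([1.0, 0.5, -2.0])  # illustrative true (beta0, beta1, beta2)
n, reps = 200, 2000

estimates = np.empty((reps, 3))
for r in range(reps):
    x1 = rng.normal(size=n)
    x2 = rng.normal(size=n)
    u = rng.normal(size=n)  # E(u | x1, x2) = 0 holds by construction (MLR.3)
    y = beta[0] + beta[1] * x1 + beta[2] * x2 + u
    X = np.column_stack([np.ones(n), x1, x2])
    estimates[r] = np.linalg.lstsq(X, y, rcond=None)[0]

# the average of the OLS estimates across samples is close to the true betas
print(estimates.mean(axis=0))
```

Any single run of the loop gives an estimate that misses the truth; it is the center of the sampling distribution that equals the true parameters.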
Including Irrelevant Variables in a Regression Model
What happens if we add an irrelevant variable to the model (overspecifying the model)?
Irrelevance of the variable means that its coefficient in the population model is zero.
E.g., suppose that in the regression below the partial effect of x3 is zero, β3 = 0:
y = β0 + β1x1 + β2x2 + β3x3 + u
Taking the conditional expectation we have:
E(y|x1, x2, x3) = E(y|x1, x2) = β0 + β1x1 + β2x2
Even though the true model is the one given above, x3 is added to the estimated model by mistake.
Econometrics I: Multiple Regression: Estimation - H. Tastan 34
Including Irrelevant Variables in a Regression Model
In this case the SRF is given by
ŷ = β̂0 + β̂1x1 + β̂2x2 + β̂3x3
The OLS estimators are still unbiased:
E(β̂0) = β0, E(β̂1) = β1, E(β̂2) = β2, E(β̂3) = β3 = 0
The true parameter value for the irrelevant variable is 0. Since this variable has no explanatory power in the population, the expected value of its OLS estimator is also zero; i.e., the estimator is unbiased.
However, even though the estimators remain unbiased, their sampling variances will generally be larger if the model is overspecified.
Econometrics I: Multiple Regression: Estimation - H. Tastan 35
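Both claims on this slide can be checked by simulation: the estimator of β1 stays centered on the truth whether or not the irrelevant x3 is included, but its spread grows when x3 is correlated with x1. A numpy sketch under an illustrative DGP (all numbers made up):

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 100, 3000
b0, b1 = 1.0, 2.0  # true model: y = b0 + b1*x1 + u, so x3 is irrelevant

b1_short, b1_long = [], []
for _ in range(reps):
    x1 = rng.normal(size=n)
    # irrelevant regressor, deliberately correlated with x1
    x3 = 0.9 * x1 + rng.normal(scale=0.5, size=n)
    y = b0 + b1 * x1 + rng.normal(size=n)
    Xs = np.column_stack([np.ones(n), x1])      # correct specification
    Xl = np.column_stack([np.ones(n), x1, x3])  # overspecified model
    b1_short.append(np.linalg.lstsq(Xs, y, rcond=None)[0][1])
    b1_long.append(np.linalg.lstsq(Xl, y, rcond=None)[0][1])

b1_short, b1_long = np.array(b1_short), np.array(b1_long)
print(b1_short.mean(), b1_long.mean())  # both close to 2.0: still unbiased
print(b1_short.var() < b1_long.var())   # True: overspecification costs precision
```

The variance inflation appears only because x3 is correlated with x1; with an uncorrelated irrelevant regressor the two variances would be essentially the same.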
Omitting a Relevant Variable
What happens if we exclude an important variable?
If a relevant variable is omitted, its parameter is not 0 in the PRF. This is called underspecification of the model.
In this case the OLS estimators will generally be biased.
E.g. suppose that the PRF includes 2 independent variables:
y = β0 + β1x1 + β2x2 + u
Suppose that we omit x2 because, say, it is unobservable. Now the SRF is
ŷ = β̂0 + β̂1x1
Is the OLS estimator of the coefficient on x1, β̂1, still unbiased?
Econometrics I: Multiple Regression: Estimation - H. Tastan 36
Omitting a Relevant Variable
The impact of the omitted variable will be absorbed into the error term:
y = β0 + β1x1 + ν
The true PRF includes x2; thus the error term ν can be written as:
ν = β2x2 + u
The OLS estimator of β1 in the model above is:
β̂1 = Σ(xi1 − x̄1)yi / Σ(xi1 − x̄1)², with sums running over i = 1, . . . , n
To determine the magnitude and sign of the bias, we substitute the expression for y into this formula, rearrange, and take expectations.
Econometrics I: Multiple Regression: Estimation - H. Tastan 37
Omitting a Relevant Variable
β̂1 = Σ(xi1 − x̄1)(β0 + β1xi1 + β2xi2 + ui) / Σ(xi1 − x̄1)²
= β1 + β2 [Σ(xi1 − x̄1)xi2 / Σ(xi1 − x̄1)²] + Σ(xi1 − x̄1)ui / Σ(xi1 − x̄1)²
Taking the (conditional) expectation, the last term vanishes because E(ui) = 0 by MLR.3, and we obtain
E(β̂1) = β1 + β2 [Σ(xi1 − x̄1)xi2 / Σ(xi1 − x̄1)²]
Econometrics I: Multiple Regression: Estimation - H. Tastan 38
Omitting a Relevant Variable
E(β̂1) = β1 + β2 [Σ(xi1 − x̄1)xi2 / Σ(xi1 − x̄1)²]
The expression in brackets multiplying β2 is just the OLS slope from the simple regression of x2 on x1:
x̂2 = δ̂0 + δ̂1x1
Thus
E(β̂1) = β1 + β2δ̂1
bias = E(β̂1) − β1 = β2δ̂1
This is called omitted variable bias.
Econometrics I: Multiple Regression: Estimation - H. Tastan 39
Omitted Variable Bias
bias = E(β̂1) − β1 = β2δ̂1
The bias is zero if and only if δ̂1 = 0 or β2 = 0.
The sign of the bias depends on both the sign of β2 and the correlation between the omitted variable (x2) and the included variable (x1).
This correlation cannot be computed if the omitted variable is unobserved.
The following table summarizes possible cases:
Direction of Bias
              Corr(x1, x2) > 0    Corr(x1, x2) < 0
β2 > 0        positive bias       negative bias
β2 < 0        negative bias       positive bias
Econometrics I: Multiple Regression: Estimation - H. Tastan 40
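The bias formula and the sign table can be checked numerically: regress y on x1 alone when the true model also contains x2, and compare the average estimate with β1 + β2δ̂1. A numpy sketch with illustrative parameters (β2 > 0 and Corr(x1, x2) > 0, so the table predicts positive bias):

```python
import numpy as np

rng = np.random.default_rng(7)
n, reps = 200, 3000
b0, b1, b2 = 0.5, 1.0, 3.0  # illustrative true parameters

est, delta1 = np.empty(reps), np.empty(reps)
for r in range(reps):
    x1 = rng.normal(size=n)
    x2 = 0.6 * x1 + rng.normal(size=n)  # Corr(x1, x2) > 0 by construction
    y = b0 + b1 * x1 + b2 * x2 + rng.normal(size=n)
    X = np.column_stack([np.ones(n), x1])
    est[r] = np.linalg.lstsq(X, y, rcond=None)[0][1]      # short regression: x2 omitted
    delta1[r] = np.linalg.lstsq(X, x2, rcond=None)[0][1]  # slope of x2 on x1

bias = est.mean() - b1
print(bias, b2 * delta1.mean())  # both close to 3.0 * 0.6 = 1.8: positive, as the table predicts
```

Flipping the sign of either the 0.6 (the correlation) or b2 flips the sign of the bias, reproducing the other cells of the table.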
Omitted Variable Bias
bias = E(β̂1) − β1 = β2δ̂1
The size of the bias also matters; it depends on both δ̂1 and β2.
A bias that is small relative to β1 may not be a problem in practice, but in most cases we cannot calculate the size of the bias.
In some cases we may have an idea about the direction of the bias. For example, suppose that in the wage equation the true PRF contains both education and ability.
Suppose also that ability is omitted because it cannot be observed, leading to omitted variable bias.
In this case we can say that the sign of the bias is positive, because it is reasonable to think that people with more ability tend to have higher levels of education, and ability is positively related to wages.
Econometrics I: Multiple Regression: Estimation - H. Tastan 41
Omitted Variable Bias
The effect of the omitted variable ends up in the error term, so MLR.3 fails.
wage = β0 + β1educ+ β2ability + u
Instead of this model we estimate
wage = β0 + β1educ+ ν
ν = β2ability + u
Education will be correlated with the error term ν:
E(ν|educ) ≠ 0
Education is endogenous. If we omit ability, the effect of education on wages will be overestimated: part of the measured effect of education on wages actually comes from ability.
Econometrics I: Multiple Regression: Estimation - H. Tastan 42
Omitted Variable Bias
In models with more than two independent variables, omitting a relevant variable generally causes all of the OLS estimators to be biased.
Suppose the true PRF is given by:
y = β0 + β1x1 + β2x2 + β3x3 + u
x3 is omitted and the following model is estimated:
ŷ = β̂0 + β̂1x1 + β̂2x2
Also suppose that x3 is correlated with x1 but uncorrelated with x2.
In this case both β̂1 and β̂2 will generally be biased; β̂2 is unbiased only if x1 and x2 are also uncorrelated.
Econometrics I: Multiple Regression: Estimation - H. Tastan 43
The Variance of the OLS Estimators
Assumption MLR.5: Homoscedasticity
This assumption states that, conditional on the x variables, the error term has constant variance:
Var(u|x1, x2, . . . , xk) = σ²
If this assumption fails, the model exhibits heteroscedasticity.
This assumption is essential for deriving the variances and standard errors of the OLS estimators and for showing that the OLS estimators are efficient.
We do not need this assumption for unbiasedness.
E.g., in the wage equation this assumption implies that the variance of the unobserved factors does not change with the factors included in the model (education, experience, tenure, etc.).
Econometrics I: Multiple Regression: Estimation - H. Tastan 44
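The point that unbiasedness does not require MLR.5 can also be illustrated by simulation: generate errors whose variance depends on x (heteroscedasticity) and check that the OLS slope remains centered on the truth. A numpy sketch with an illustrative DGP:

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 200, 3000
b0, b1 = 1.0, 0.5  # illustrative true parameters

est = np.empty(reps)
for r in range(reps):
    x = rng.uniform(1.0, 5.0, size=n)
    u = rng.normal(scale=x)  # Var(u | x) = x^2: MLR.5 fails, but E(u | x) = 0 still holds
    y = b0 + b1 * x + u
    X = np.column_stack([np.ones(n), x])
    est[r] = np.linalg.lstsq(X, y, rcond=None)[0][1]

# still close to 0.5: unbiasedness needs only MLR.1 through MLR.4
print(est.mean())
```

What heteroscedasticity breaks is not the center of the sampling distribution but the usual formulas for its spread, which assume a constant σ².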
The Variance of the OLS Estimators
Gauss-Markov Assumptions
Assumptions MLR.1 through MLR.5 are called the Gauss-Markov assumptions:
MLR.1: Linear in parameters,
MLR.2: Random sampling,
MLR.3: Zero conditional mean,
MLR.4: No perfect collinearity,
MLR.5: Homoscedasticity.
These assumptions are stated for cross-sectional data. They need to be modified for time series data.
Assumptions MLR.3 and MLR.5 can be restated in terms of the dependent variable:
E(y|x1, x2, . . . , xk) = β0 + β1x1 + β2x2 + . . .+ βkxk
Var(y|x1, x2, . . . , xk) = Var(u|x1, x2, . . . , xk) = σ2
The Variance of the OLS Estimators
Theorem: Variances of β
Under Gauss-Markov assumptions (MLR.1-MLR.5):
Var(β̂j) = σ² / [SSTj(1 − R²j)],   j = 1, 2, . . . , k

where

SSTj = Σⁿᵢ₌₁ (xij − x̄j)²

is the total sample variation in xj, and R²j is the R-squared from regressing xj on all the other independent variables (including an intercept term).
Var(β̂j) moves in the same direction as σ² and in the opposite direction of SSTj.
To increase SSTj we need to collect more data (increase n).
To reduce σ² we need to find good explanatory variables.
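To make the formula concrete, a small numerical check (NumPy assumed; the data and names are hypothetical): for a model with an intercept and two correlated regressors, σ²/[SST1(1 − R²1)] reproduces the corresponding diagonal element of σ²(X′X)⁻¹.

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma2 = 120, 2.0
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)        # correlated regressors
X = np.column_stack([np.ones(n), x1, x2])

# Matrix route: Var(b1) from sigma^2 * (X'X)^{-1}
var_matrix = sigma2 * np.linalg.inv(X.T @ X)[1, 1]

# Slide's formula for j = 1 (the x1 coefficient):
sst1 = np.sum((x1 - x1.mean()) ** 2)
# R^2_1: regress x1 on the other regressors (intercept and x2)
Z = np.column_stack([np.ones(n), x2])
fit = Z @ np.linalg.lstsq(Z, x1, rcond=None)[0]
r2_1 = 1.0 - np.sum((x1 - fit) ** 2) / sst1
var_formula = sigma2 / (sst1 * (1.0 - r2_1))
```

The two routes agree exactly (up to floating-point error); the formula is just the (j, j) element of σ²(X′X)⁻¹ rewritten in scalar terms.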
The Variance of the OLS Estimators
Theorem: Variances of β
Under Gauss-Markov assumptions (MLR.1-MLR.5):
Var(β̂j) = σ² / [SSTj(1 − R²j)],   j = 1, 2, . . . , k
Var(β̂j) also depends on R²j, which measures the degree of correlation among the x variables.
We did not have this term in simple regression analysis because there was only one explanatory variable.
As the degree of correlation among the x variables increases, the variances of the OLS estimators get larger.
When there is a high level of collinearity among the x variables, the variances of the OLS estimators will be large. This is called the multicollinearity problem.
In the limit R²j = 1, in which case the variance is infinite (note that in this case β̂j is not defined). But MLR.4 rules this case out.
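The blow-up of the variance as R²j approaches 1 can be sketched with the variance inflation factor 1/(1 − R²j) (NumPy assumed; the ρ values and names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
e = rng.normal(size=n)
x_base = rng.normal(size=n)

def vif_x1(rho):
    """VIF of x1 when x2 has (approximate) correlation rho with x1."""
    x1 = x_base
    x2 = rho * x1 + np.sqrt(1.0 - rho**2) * e
    # R^2 from regressing x1 on an intercept and x2
    Z = np.column_stack([np.ones(n), x2])
    fit = Z @ np.linalg.lstsq(Z, x1, rcond=None)[0]
    sst = np.sum((x1 - x1.mean()) ** 2)
    r2 = 1.0 - np.sum((x1 - fit) ** 2) / sst
    return 1.0 / (1.0 - r2)

vif_low, vif_high = vif_x1(0.1), vif_x1(0.99)   # near 1 vs. far above 1
```

With near-uncorrelated regressors the factor is close to 1; at ρ = 0.99 the variance of β̂1 is inflated many times over.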
Relationship between Var(β̂j) and R²j (figure)
The Variance of the OLS Estimators
Estimating Variances
An unbiased estimator of the error variance is:
σ̂² = (1/(n − k − 1)) Σⁿᵢ₌₁ ûᵢ² = SSR/(n − k − 1)
Degrees of freedom:

dof = n − (k + 1) = number of observations − number of parameters.

The dof comes from the OLS first-order conditions. Recall that there were k + 1 such conditions; in other words, k + 1 restrictions are imposed on the residuals.
This means that given n − (k + 1) of the residuals, the remaining k + 1 residuals are known: there are only n − k − 1 degrees of freedom in the residuals.
Notice that the error term u has n degrees of freedom.
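Both claims — the k + 1 restrictions on the residuals and the unbiasedness of SSR/(n − k − 1) — can be checked numerically. A sketch assuming NumPy, with a simulated model (all values illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
n, k, sigma2 = 60, 2, 4.0
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
beta = np.array([1.0, 2.0, -1.0])

# (i) the k+1 first-order conditions: X' u_hat = 0
y = X @ beta + rng.normal(0.0, np.sqrt(sigma2), n)
b = np.linalg.lstsq(X, y, rcond=None)[0]
uhat = y - X @ b
restrictions = X.T @ uhat                  # vector of k+1 (near-)zeros

# (ii) Monte Carlo check that SSR/(n-k-1) is unbiased for sigma^2
reps = 3000
est = np.empty(reps)
for r in range(reps):
    yr = X @ beta + rng.normal(0.0, np.sqrt(sigma2), n)
    ur = yr - X @ np.linalg.lstsq(X, yr, rcond=None)[0]
    est[r] = (ur @ ur) / (n - k - 1)
mean_sigma2_hat = est.mean()               # close to sigma2 = 4.0
```

Dividing by n instead of n − k − 1 would bias the estimator downward, precisely because the residuals are constrained by those k + 1 conditions.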
The Variance of the OLS Estimators
Standard deviations of βjs
sd(β̂j) = σ / √(SSTj(1 − R²j)),   j = 1, 2, . . . , k
Standard errors of βjs
se(β̂j) = σ̂ / √(SSTj(1 − R²j)),   j = 1, 2, . . . , k
σ̂ = √σ̂² is the standard error of the regression (SER).
The SER is an estimator of the standard deviation of the error term. The SER may increase or decrease when a new variable is added to the model.
se(β̂j) is used in constructing confidence intervals and in testing hypotheses.
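As a concrete check (NumPy assumed; data simulated for illustration), the standard-error formula above, with σ̂ the SER, matches the usual matrix expression σ̂ · √[(X′X)⁻¹]jj:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 80
x1 = rng.normal(size=n)
x2 = rng.normal(size=n) + 0.5 * x1
X = np.column_stack([np.ones(n), x1, x2])
y = 1.0 + 0.7 * x1 - 0.3 * x2 + rng.normal(size=n)

b = np.linalg.lstsq(X, y, rcond=None)[0]
uhat = y - X @ b
k = X.shape[1] - 1
ser = np.sqrt(uhat @ uhat / (n - k - 1))   # sigma_hat, the SER

# Formula route for se(b1): ser / sqrt(SST_1 * (1 - R^2_1))
sst1 = np.sum((x1 - x1.mean()) ** 2)
Z = np.column_stack([np.ones(n), x2])
fit = Z @ np.linalg.lstsq(Z, x1, rcond=None)[0]
r2_1 = 1.0 - np.sum((x1 - fit) ** 2) / sst1
se_formula = ser / np.sqrt(sst1 * (1.0 - r2_1))

# Matrix route: ser * sqrt of the (1,1) diagonal element of (X'X)^{-1}
se_matrix = ser * np.sqrt(np.linalg.inv(X.T @ X)[1, 1])
```

The only difference from the sd(β̂j) expression is that the unknown σ has been replaced by its estimate σ̂, which is why se(β̂j) is itself a random quantity.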
Efficiency of OLS
Gauss-Markov Theorem
Under assumptions MLR.1 through MLR.5, the OLS estimators are the best linear unbiased estimators (BLUE) of the unknown population parameters. In other words, under MLR.1-MLR.5, the OLS estimators β̂0, β̂1, β̂2, . . . , β̂k are the BLUEs of β0, β1, β2, . . . , βk.
The Gauss-Markov theorem provides a justification for OLS estimation.
When this theorem holds, we do not need to look for estimation methods other than OLS: the method of OLS gives us the most efficient (best, i.e., minimum-variance) linear unbiased estimators.
If any of the five assumptions fails, the Gauss-Markov theorem does not hold. When MLR.3 fails, the unbiasedness property is lost; when MLR.5 fails, efficiency is lost.
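The "best" part of BLUE can be illustrated by comparing OLS with another linear unbiased estimator. A Monte Carlo sketch (NumPy assumed; the grouping estimator and all parameter values are illustrative): both estimators are centered on the true slope, but OLS has the smaller sampling variance.

```python
import numpy as np

rng = np.random.default_rng(5)
n, reps, beta1 = 100, 4000, 2.0
x = rng.uniform(0.0, 10.0, n)              # fixed regressor
hi = x > np.median(x)                      # split sample at the median of x
X = np.column_stack([np.ones(n), x])

ols = np.empty(reps)
grp = np.empty(reps)
for r in range(reps):
    y = 3.0 + beta1 * x + rng.normal(0.0, 1.0, n)
    ols[r] = np.linalg.lstsq(X, y, rcond=None)[0][1]
    # grouping estimator: also linear in y and unbiased given x
    grp[r] = (y[hi].mean() - y[~hi].mean()) / (x[hi].mean() - x[~hi].mean())

bias_ols, bias_grp = ols.mean() - beta1, grp.mean() - beta1
var_ols, var_grp = ols.var(), grp.var()    # var_ols comes out smaller
```

Unbiasedness alone does not single out OLS; it is the minimum-variance property within the linear unbiased class that does.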
The Meaning of Linearity of Estimators in BLUE
Linearity of Estimators
An estimator β̃j of βj is linear if and only if it can be expressed as a linear function of the data on the dependent variable:

β̃j = Σⁿᵢ₌₁ wij yi

where each wij can be a function of the sample values of all the independent variables.
The OLS estimators can be written in this form:

β̂j = (Σⁿᵢ₌₁ r̂ij yi) / (Σⁿᵢ₌₁ r̂²ij) = Σⁿᵢ₌₁ wij yi,   where wij = r̂ij / Σⁿᵢ₌₁ r̂²ij

and r̂ij is the residual obtained from the regression of xj on all the other explanatory variables.
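The partialled-out representation can be verified directly (NumPy assumed; data simulated for illustration): the coefficient on x1 from the full regression equals Σ r̂i1 yi / Σ r̂²i1, i.e., a linear function of the yi:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 150
x1 = rng.normal(size=n)
x2 = rng.normal(size=n) + 0.4 * x1
X = np.column_stack([np.ones(n), x1, x2])
y = 2.0 - 1.5 * x1 + 0.8 * x2 + rng.normal(size=n)

b_full = np.linalg.lstsq(X, y, rcond=None)[0][1]       # coefficient on x1

# r_i1: residuals from regressing x1 on the other regressors (with intercept)
Z = np.column_stack([np.ones(n), x2])
r1 = x1 - Z @ np.linalg.lstsq(Z, x1, rcond=None)[0]
b_partial = (r1 @ y) / (r1 @ r1)

weights = r1 / (r1 @ r1)                               # the w_ij of the slide
b_weights = weights @ y                                # explicitly linear in y
```

The weights depend only on the regressors, which is exactly what the definition of a linear estimator requires.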