
Multiple Regression Analysis: Estimation
ECONOMETRICS (ECON 360)

BEN VAN KAMMEN, PHD

Outline
Motivation.

Mechanics and Interpretation.

Expected Values and Variances of the Estimators.

Motivation for multiple regression
Consider the following results of a regression of the number of crimes reported in Milwaukee on the search volume (on Google) for the term "ice cream",
◦ which I'm using as a proxy for ice cream sales.
◦ A couple of caveats about these data are noted at the end (see the data slide).

Dep. Var.: Crimes | Coefficient (β̂) | Std. Err. | t
Ice Cream         | 1.5437           | .4306     | 3.58

N = 82; R² = 0.1384

Motivation for multiple regression (continued)
Ice cream is somehow an input into or a reason for committing crimes?
◦ Evidenced by its positive association with crime.

Common sense and past experiences with ice cream, however, argue that there is probably no causal connection between the two.
◦ This is a spurious correlation, one that disappears when we control for a confounding variable that is related to both x and y.

In this case the confounder can be summarized as "good weather",
◦ i.e., people eat more ice cream when it is warm and also go outside more when it is warm (up to a point, at least).
◦ The increase in social interaction occasioned by warm weather, then, creates more opportunities for conflicts and crimes, according to this informal theory, while coincidentally leading to more ice cream sales as well.

Motivation for multiple regression (concluded)
To use regression analysis to disconfirm the theory that ice cream causes more crime, perform a regression that controls for the effect of weather in some way.

Either:
◦ Examine sub-samples of days in which the weather is (roughly) the same but ice cream consumption varies, or
◦ Explicitly control for the weather by including it in a multiple regression model.

Multiple regression defined
Multiple regression expands the regression model using more than 1 regressor / explanatory variable / "independent variable".

For 2 regressors, we would model the following relationship:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u,$$

and estimate separate effects ($\beta_1$ and $\beta_2$) for each explanatory variable ($x_1$ and $x_2$).

Assuming enough data (and reasons for adding additional variables), the model with $k > 1$ regressors,

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \ldots + \beta_k x_k + u,$$

would not be anything fundamentally different from a simple regression.

Multiple OLS and simple OLS
Multiple regression relaxes the assumption that all other factors are held fixed.

Instead of $E(\text{error} \mid x_1) = 0$, one needs only assume that $E(\text{error} \mid x_1, x_2, \ldots, x_k) = 0$.
◦ A more realistic assumption when dealing with non-experimental data.

One other way of thinking about multiple regression:
◦ simple regression is a special case in which $\{x_2, \ldots, x_k\}$ have been relegated to the error term and treated as mean independent of $x_1$.
◦ I.e., simple regression has you estimating:

$$y = \beta_0 + \beta_1 x_1 + \varepsilon; \quad \varepsilon \equiv \beta_2 x_2 + \beta_3 x_3 + \ldots + \beta_k x_k + u.$$

Multiple OLS and simple OLS (continued)
To demonstrate why this could be a bad strategy, consider the ice cream-crime problem again, assuming that the true relationship follows:

๐‘๐‘๐‘’๐‘’๐‘๐‘๐‘๐‘๐‘’๐‘’ = ๐›ฝ๐›ฝ0 + ๐›ฝ๐›ฝ1๐‘๐‘๐‘๐‘๐‘’๐‘’๐‘๐‘๐‘’๐‘’๐‘’๐‘’๐‘–๐‘–๐‘๐‘ + ๐›ฝ๐›ฝ2๐‘ก๐‘ก๐‘’๐‘’๐‘๐‘๐‘ก๐‘ก๐‘’๐‘’๐‘’๐‘’๐‘–๐‘–๐‘ก๐‘ก๐‘ข๐‘ข๐‘’๐‘’๐‘’๐‘’ + ๐‘ข๐‘ข;๐ธ๐ธ ๐‘ข๐‘ข ๐‘๐‘๐‘๐‘๐‘’๐‘’๐‘๐‘๐‘’๐‘’๐‘’๐‘’๐‘–๐‘–๐‘๐‘, ๐‘ก๐‘ก๐‘’๐‘’๐‘๐‘๐‘ก๐‘ก๐‘’๐‘’๐‘’๐‘’๐‘–๐‘–๐‘ก๐‘ก๐‘ข๐‘ข๐‘’๐‘’๐‘’๐‘’ = 0.

My informal theory states that ๐›ฝ๐›ฝ1 = 0 and ๐›ฝ๐›ฝ2 > 0.

Multiple OLS and simple OLS (continued)
Estimating the simple regression between ice cream and crime was as if the model were transformed:

$$crime = \beta_0 + \beta_1\, icecream + \varepsilon; \quad \varepsilon = \beta_2\, temperature + u,$$

which I estimated under the assumption that $E(\varepsilon \mid icecream) = 0$.

However, this is likely a bad assumption, since it is probable that $E(temperature \mid icecream) \neq 0$.

Multiple OLS and simple OLS (concluded)
High (low) values of icecream correspond with high (low) values of temperature.

When the conditional mean independence assumption underlying simple regression analysis is violated, bias is introduced in OLS.
◦ This bias would explain the positive estimate for $\beta_1$ shown above.

We will examine the source of the bias more closely and how to estimate its direction later in this chapter.

First we turn our attention back to the technical aspects of estimating the OLS parameters with multiple regressors.

Results, controlling for temperature
Now the coefficient on ice cream is much smaller (closer to 0) and the effect of temperature is large and significant.
◦ It's not precisely zero, but we can no longer reject the null that it has zero effect on crime.

Dep. Var.: Crimes | Coefficient (β̂) | Std. Err. | t
Ice Cream         | 0.4760           | .4477     | 1.06
Temperature       | 1.9492           | .4198     | 4.64

N = 82; R² = 0.3231
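The Milwaukee data themselves are not reproduced in these notes, but a minimal simulation sketch (assuming a "good weather" confounder like the one described above; all names and values here are illustrative) shows the same qualitative pattern: a positive coefficient on ice cream in the simple regression that shrinks toward zero once temperature is controlled for.

```python
# Minimal sketch: simulate a confounder and compare simple vs. multiple OLS.
import numpy as np

rng = np.random.default_rng(0)
n = 82
temperature = rng.normal(size=n)                   # the confounder ("good weather")
icecream = 0.8 * temperature + rng.normal(size=n)  # driven partly by temperature
crime = 2.0 * temperature + rng.normal(size=n)     # true effect of ice cream is zero

def ols(y, *regressors):
    """OLS coefficients (intercept first) by least squares."""
    X = np.column_stack([np.ones_like(y)] + list(regressors))
    return np.linalg.lstsq(X, y, rcond=None)[0]

print("simple OLS (crime on ice cream):  ", ols(crime, icecream))
print("multiple OLS (adding temperature):", ols(crime, icecream, temperature))
```

In the simple regression the ice cream coefficient picks up the temperature effect; once temperature enters the model, it moves close to zero, mirroring the two tables above.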

Mechanics and interpretation of OLS
Even with $k > 1$ regressors, OLS minimizes the sum of the squares of the residuals.

With ๐‘˜๐‘˜ = 2 regressors:๏ฟฝ๐‘ข๐‘ข๐‘–๐‘– = ๐‘ฆ๐‘ฆ๐‘–๐‘– โˆ’ ๏ฟฝฬ‚๏ฟฝ๐›ฝ0 โˆ’ ๏ฟฝฬ‚๏ฟฝ๐›ฝ1๐‘ฅ๐‘ฅ๐‘–๐‘–1 โˆ’ ๏ฟฝฬ‚๏ฟฝ๐›ฝ2๐‘ฅ๐‘ฅ๐‘–๐‘–2 โ†’ ๏ฟฝ๐‘ข๐‘ข๐‘–๐‘–2 = ๐‘ฆ๐‘ฆ๐‘–๐‘– โˆ’ ๏ฟฝฬ‚๏ฟฝ๐›ฝ0 โˆ’ ๏ฟฝฬ‚๏ฟฝ๐›ฝ1๐‘ฅ๐‘ฅ๐‘–๐‘–1 โˆ’ ๏ฟฝฬ‚๏ฟฝ๐›ฝ2๐‘ฅ๐‘ฅ๐‘–๐‘–2

2.

๐‘†๐‘†๐‘ข๐‘ข๐‘๐‘ ๐‘’๐‘’๐‘œ๐‘œ ๐‘†๐‘†๐‘†๐‘†๐‘ข๐‘ข๐‘–๐‘–๐‘’๐‘’๐‘’๐‘’๐‘†๐‘† ๐‘…๐‘…๐‘’๐‘’๐‘…๐‘…๐‘๐‘๐‘†๐‘†๐‘ข๐‘ข๐‘–๐‘–๐‘…๐‘…๐‘…๐‘… (๐‘†๐‘†๐‘†๐‘†๐‘…๐‘…) = ๏ฟฝ๐‘–๐‘–=1

๐‘›๐‘›

๐‘ฆ๐‘ฆ๐‘–๐‘– โˆ’ ๏ฟฝฬ‚๏ฟฝ๐›ฝ0 โˆ’ ๏ฟฝฬ‚๏ฟฝ๐›ฝ1๐‘ฅ๐‘ฅ๐‘–๐‘–1 โˆ’ ๏ฟฝฬ‚๏ฟฝ๐›ฝ2๐‘ฅ๐‘ฅ๐‘–๐‘–22

Note the change in subscripting: each observation is indexed by "i" as before, but the two "x" variables are now distinguished from one another by a "1" and a "2" in the subscript.

Minimizing SSR
For $k > 1$ regressors:

$$SSR = \sum_{i=1}^{n} \left(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_{i1} - \hat{\beta}_2 x_{i2} - \ldots - \hat{\beta}_k x_{ik}\right)^2.$$

Once again minimization is accomplished by differentiating the above line with respect to each of the $k + 1$ statistics:
◦ $\hat{\beta}_0$ and
◦ all $k$ slope parameters.

The first order conditions for the minimum are:

$$\frac{\partial SSR}{\partial \hat{\beta}_0} = 0 \quad \text{and} \quad \frac{\partial SSR}{\partial \hat{\beta}_j} = 0 \;\; \forall\, j \in \{1, 2, \ldots, k\}.$$

Minimizing SSR (continued)
"Simultaneously choose all the 'betas' to make the regression model fit as closely as possible to all the data points."

The partial derivatives for the problem look like this:

$$(1)\quad \frac{\partial SSR}{\partial \hat{\beta}_0} = -2\sum_{i=1}^{n} \left(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_{i1} - \hat{\beta}_2 x_{i2} - \ldots - \hat{\beta}_k x_{ik}\right)$$

$$(2)\quad \frac{\partial SSR}{\partial \hat{\beta}_1} = -2\sum_{i=1}^{n} x_{i1}\left(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_{i1} - \hat{\beta}_2 x_{i2} - \ldots - \hat{\beta}_k x_{ik}\right)$$

$$\ldots$$

$$(k+1)\quad \frac{\partial SSR}{\partial \hat{\beta}_k} = -2\sum_{i=1}^{n} x_{ik}\left(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_{i1} - \hat{\beta}_2 x_{i2} - \ldots - \hat{\beta}_k x_{ik}\right).$$

Example with ๐‘˜๐‘˜ = 2As a representative (and comparatively easy to solve) example, consider the solution when ๐‘˜๐‘˜ =2.

Setting (1) equal to zero (FOC) and solving for $\hat{\beta}_0$:

$$2n\hat{\beta}_0 - 2\sum_{i=1}^{n} y_i + 2\hat{\beta}_1\sum_{i=1}^{n} x_{i1} + 2\hat{\beta}_2\sum_{i=1}^{n} x_{i2} = 0$$

$$(4)\quad \hat{\beta}_0 = \frac{1}{n}\left(\sum_{i=1}^{n} y_i - \hat{\beta}_1\sum_{i=1}^{n} x_{i1} - \hat{\beta}_2\sum_{i=1}^{n} x_{i2}\right) = \bar{y} - \hat{\beta}_1\bar{x}_1 - \hat{\beta}_2\bar{x}_2.$$

Example with ๐‘˜๐‘˜ = 2 (continued)Doing the same thing with (2) and (3) gives you:

2๐œ•๐œ•๐‘†๐‘†๐‘†๐‘†๐‘…๐‘…๐œ•๐œ•๏ฟฝฬ‚๏ฟฝ๐›ฝ1

= 0 โ†’ ๏ฟฝฬ‚๏ฟฝ๐›ฝ1๏ฟฝ๐‘–๐‘–=1

๐‘›๐‘›

๐‘ฅ๐‘ฅ๐‘–๐‘–12 = ๏ฟฝ๐‘–๐‘–=1

๐‘›๐‘›

๐‘ฅ๐‘ฅ๐‘–๐‘–1๐‘ฆ๐‘ฆ๐‘–๐‘– โˆ’ ๏ฟฝฬ‚๏ฟฝ๐›ฝ0๏ฟฝ๐‘–๐‘–=1

๐‘›๐‘›

๐‘ฅ๐‘ฅ๐‘–๐‘–1 โˆ’ ๏ฟฝฬ‚๏ฟฝ๐›ฝ2๏ฟฝ๐‘–๐‘–=1

๐‘›๐‘›

๐‘ฅ๐‘ฅ๐‘–๐‘–1๐‘ฅ๐‘ฅ๐‘–๐‘–2

3๐œ•๐œ•๐‘†๐‘†๐‘†๐‘†๐‘…๐‘…๐œ•๐œ•๏ฟฝฬ‚๏ฟฝ๐›ฝ2

= 0 โ†’ ๏ฟฝฬ‚๏ฟฝ๐›ฝ2๏ฟฝ๐‘–๐‘–=1

๐‘›๐‘›

๐‘ฅ๐‘ฅ๐‘–๐‘–22 = ๏ฟฝ๐‘–๐‘–=1

๐‘›๐‘›

๐‘ฅ๐‘ฅ๐‘–๐‘–2๐‘ฆ๐‘ฆ๐‘–๐‘– โˆ’ ๏ฟฝฬ‚๏ฟฝ๐›ฝ0๏ฟฝ๐‘–๐‘–=1

๐‘›๐‘›

๐‘ฅ๐‘ฅ๐‘–๐‘–2 โˆ’ ๏ฟฝฬ‚๏ฟฝ๐›ฝ1๏ฟฝ๐‘–๐‘–=1

๐‘›๐‘›

๐‘ฅ๐‘ฅ๐‘–๐‘–1๐‘ฅ๐‘ฅ๐‘–๐‘–2

Example with ๐‘˜๐‘˜ = 2 (continued)Substituting (4):

2 ๏ฟฝฬ‚๏ฟฝ๐›ฝ1๏ฟฝ๐‘–๐‘–=1

๐‘›๐‘›

๐‘ฅ๐‘ฅ๐‘–๐‘–12 = ๏ฟฝ๐‘–๐‘–=1

๐‘›๐‘›

๐‘ฅ๐‘ฅ๐‘–๐‘–1๐‘ฆ๐‘ฆ๐‘–๐‘– โˆ’ ๏ฟฝ๐‘ฆ๐‘ฆ โˆ’ ๏ฟฝฬ‚๏ฟฝ๐›ฝ1๏ฟฝฬ…๏ฟฝ๐‘ฅ1 โˆ’ ๏ฟฝฬ‚๏ฟฝ๐›ฝ2๏ฟฝฬ…๏ฟฝ๐‘ฅ2 ๐‘›๐‘›๏ฟฝฬ…๏ฟฝ๐‘ฅ1 โˆ’ ๏ฟฝฬ‚๏ฟฝ๐›ฝ2๏ฟฝ๐‘–๐‘–=1

๐‘›๐‘›

๐‘ฅ๐‘ฅ๐‘–๐‘–1๐‘ฅ๐‘ฅ๐‘–๐‘–2

3 ๏ฟฝฬ‚๏ฟฝ๐›ฝ2๏ฟฝ๐‘–๐‘–=1

๐‘›๐‘›

๐‘ฅ๐‘ฅ๐‘–๐‘–22 = ๏ฟฝ๐‘–๐‘–=1

๐‘›๐‘›

๐‘ฅ๐‘ฅ๐‘–๐‘–2๐‘ฆ๐‘ฆ๐‘–๐‘– โˆ’ ๏ฟฝ๐‘ฆ๐‘ฆ โˆ’ ๏ฟฝฬ‚๏ฟฝ๐›ฝ1๏ฟฝฬ…๏ฟฝ๐‘ฅ1 โˆ’ ๏ฟฝฬ‚๏ฟฝ๐›ฝ2๏ฟฝฬ…๏ฟฝ๐‘ฅ2 ๐‘›๐‘›๏ฟฝฬ…๏ฟฝ๐‘ฅ2 โˆ’ ๏ฟฝฬ‚๏ฟฝ๐›ฝ1๏ฟฝ๐‘–๐‘–=1

๐‘›๐‘›

๐‘ฅ๐‘ฅ๐‘–๐‘–1๐‘ฅ๐‘ฅ๐‘–๐‘–2

Example with ๐‘˜๐‘˜ = 2 (continued)Solving (3) for ๏ฟฝฬ‚๏ฟฝ๐›ฝ2:

(3) โ†’ ๏ฟฝฬ‚๏ฟฝ๐›ฝ2 ๏ฟฝ๐‘–๐‘–=1

๐‘›๐‘›

๐‘ฅ๐‘ฅ๐‘–๐‘–22 โˆ’ ๐‘›๐‘›๏ฟฝฬ…๏ฟฝ๐‘ฅ22 = ๏ฟฝ๐‘–๐‘–=1

๐‘›๐‘›

๐‘ฅ๐‘ฅ๐‘–๐‘–2๐‘ฆ๐‘ฆ๐‘–๐‘– โˆ’ ๐‘›๐‘›๏ฟฝฬ…๏ฟฝ๐‘ฅ2 ๏ฟฝ๐‘ฆ๐‘ฆ + ๐‘›๐‘›๏ฟฝฬ…๏ฟฝ๐‘ฅ2๏ฟฝฬ‚๏ฟฝ๐›ฝ1๏ฟฝฬ…๏ฟฝ๐‘ฅ1 โˆ’ ๏ฟฝฬ‚๏ฟฝ๐›ฝ1๏ฟฝ๐‘–๐‘–=1

๐‘›๐‘›

๐‘ฅ๐‘ฅ๐‘–๐‘–1๐‘ฅ๐‘ฅ๐‘–๐‘–2

โ‡” ๏ฟฝฬ‚๏ฟฝ๐›ฝ2 = ๏ฟฝ๐‘–๐‘–=1

๐‘›๐‘›

๐‘ฅ๐‘ฅ๐‘–๐‘–22 โˆ’ ๐‘›๐‘›๏ฟฝฬ…๏ฟฝ๐‘ฅ22โˆ’1

๏ฟฝ๐‘–๐‘–=1

๐‘›๐‘›

๐‘ฅ๐‘ฅ๐‘–๐‘–2๐‘ฆ๐‘ฆ๐‘–๐‘– โˆ’ ๐‘›๐‘›๏ฟฝฬ…๏ฟฝ๐‘ฅ2 ๏ฟฝ๐‘ฆ๐‘ฆ + ๐‘›๐‘›๏ฟฝฬ…๏ฟฝ๐‘ฅ2๏ฟฝฬ‚๏ฟฝ๐›ฝ1๏ฟฝฬ…๏ฟฝ๐‘ฅ1 โˆ’ ๏ฟฝฬ‚๏ฟฝ๐›ฝ1๏ฟฝ๐‘–๐‘–=1

๐‘›๐‘›

๐‘ฅ๐‘ฅ๐‘–๐‘–1๐‘ฅ๐‘ฅ๐‘–๐‘–2

โ‡” ๏ฟฝฬ‚๏ฟฝ๐›ฝ2 = ๐‘‰๐‘‰๐‘–๐‘–๐‘’๐‘’ ๐‘ฅ๐‘ฅ2โˆ’1 ๐ถ๐ถ๐‘’๐‘’๐ถ๐ถ ๐‘ฅ๐‘ฅ2,๐‘ฆ๐‘ฆ โˆ’ ๏ฟฝฬ‚๏ฟฝ๐›ฝ1 ๐ถ๐ถ๐‘’๐‘’๐ถ๐ถ ๐‘ฅ๐‘ฅ1, ๐‘ฅ๐‘ฅ2

Example with ๐‘˜๐‘˜ = 2 (continued)Solve it simultaneously with (2) by substituting it in:

2 โ†’ ๏ฟฝฬ‚๏ฟฝ๐›ฝ1๏ฟฝ๐‘–๐‘–=1

๐‘›๐‘›

๐‘ฅ๐‘ฅ๐‘–๐‘–12 = ๐ถ๐ถ๐‘’๐‘’๐ถ๐ถ ๐‘ฅ๐‘ฅ1,๐‘ฆ๐‘ฆ + ๏ฟฝฬ‚๏ฟฝ๐›ฝ1๐‘›๐‘›๏ฟฝฬ…๏ฟฝ๐‘ฅ12 โˆ’ ๏ฟฝฬ‚๏ฟฝ๐›ฝ2 ๐ถ๐ถ๐‘’๐‘’๐ถ๐ถ ๐‘ฅ๐‘ฅ1, ๐‘ฅ๐‘ฅ2 ; now

๏ฟฝฬ‚๏ฟฝ๐›ฝ1๏ฟฝ๐‘–๐‘–=1

๐‘›๐‘›

๐‘ฅ๐‘ฅ๐‘–๐‘–12 = ๐ถ๐ถ๐‘’๐‘’๐ถ๐ถ ๐‘ฅ๐‘ฅ1,๐‘ฆ๐‘ฆ + ๏ฟฝฬ‚๏ฟฝ๐›ฝ1๐‘›๐‘›๏ฟฝฬ…๏ฟฝ๐‘ฅ12 โˆ’๐ถ๐ถ๐‘’๐‘’๐ถ๐ถ ๐‘ฅ๐‘ฅ2,๐‘ฆ๐‘ฆ ๐ถ๐ถ๐‘’๐‘’๐ถ๐ถ ๐‘ฅ๐‘ฅ1, ๐‘ฅ๐‘ฅ2 โˆ’ ๏ฟฝฬ‚๏ฟฝ๐›ฝ1 ๐ถ๐ถ๐‘’๐‘’๐ถ๐ถ ๐‘ฅ๐‘ฅ1, ๐‘ฅ๐‘ฅ2

2

๐‘‰๐‘‰๐‘–๐‘–๐‘’๐‘’ ๐‘ฅ๐‘ฅ2

Example with ๐‘˜๐‘˜ = 2 (continued)Collect all the ๏ฟฝฬ‚๏ฟฝ๐›ฝ1 terms.

๏ฟฝฬ‚๏ฟฝ๐›ฝ1 ๏ฟฝ๐‘–๐‘–=1

๐‘›๐‘›

๐‘ฅ๐‘ฅ๐‘–๐‘–12 โˆ’ ๐‘›๐‘›๏ฟฝฬ…๏ฟฝ๐‘ฅ12 โˆ’๐ถ๐ถ๐‘’๐‘’๐ถ๐ถ ๐‘ฅ๐‘ฅ1, ๐‘ฅ๐‘ฅ2

2

๐‘‰๐‘‰๐‘–๐‘–๐‘’๐‘’ ๐‘ฅ๐‘ฅ2= ๐ถ๐ถ๐‘’๐‘’๐ถ๐ถ ๐‘ฅ๐‘ฅ1,๐‘ฆ๐‘ฆ โˆ’

๐ถ๐ถ๐‘’๐‘’๐ถ๐ถ ๐‘ฅ๐‘ฅ2,๐‘ฆ๐‘ฆ ๐ถ๐ถ๐‘’๐‘’๐ถ๐ถ ๐‘ฅ๐‘ฅ1, ๐‘ฅ๐‘ฅ2๐‘‰๐‘‰๐‘–๐‘–๐‘’๐‘’ ๐‘ฅ๐‘ฅ2

โ‡” ๏ฟฝฬ‚๏ฟฝ๐›ฝ1 ๐‘‰๐‘‰๐‘–๐‘–๐‘’๐‘’ ๐‘ฅ๐‘ฅ1 โˆ’๐ถ๐ถ๐‘’๐‘’๐ถ๐ถ ๐‘ฅ๐‘ฅ1, ๐‘ฅ๐‘ฅ2

2

๐‘‰๐‘‰๐‘–๐‘–๐‘’๐‘’ ๐‘ฅ๐‘ฅ2= ๐ถ๐ถ๐‘’๐‘’๐ถ๐ถ ๐‘ฅ๐‘ฅ1,๐‘ฆ๐‘ฆ โˆ’

๐ถ๐ถ๐‘’๐‘’๐ถ๐ถ ๐‘ฅ๐‘ฅ2,๐‘ฆ๐‘ฆ ๐ถ๐ถ๐‘’๐‘’๐ถ๐ถ ๐‘ฅ๐‘ฅ1, ๐‘ฅ๐‘ฅ2๐‘‰๐‘‰๐‘–๐‘–๐‘’๐‘’ ๐‘ฅ๐‘ฅ2

โ‡” ๏ฟฝฬ‚๏ฟฝ๐›ฝ1๐‘‰๐‘‰๐‘–๐‘–๐‘’๐‘’ ๐‘ฅ๐‘ฅ1 ๐‘‰๐‘‰๐‘–๐‘–๐‘’๐‘’ ๐‘ฅ๐‘ฅ2 โˆ’ ๐ถ๐ถ๐‘’๐‘’๐ถ๐ถ ๐‘ฅ๐‘ฅ1, ๐‘ฅ๐‘ฅ2

2

๐‘‰๐‘‰๐‘–๐‘–๐‘’๐‘’ ๐‘ฅ๐‘ฅ2=๐ถ๐ถ๐‘’๐‘’๐ถ๐ถ ๐‘ฅ๐‘ฅ1,๐‘ฆ๐‘ฆ ๐‘‰๐‘‰๐‘–๐‘–๐‘’๐‘’ ๐‘ฅ๐‘ฅ2 โˆ’ ๐ถ๐ถ๐‘’๐‘’๐ถ๐ถ ๐‘ฅ๐‘ฅ2,๐‘ฆ๐‘ฆ ๐ถ๐ถ๐‘’๐‘’๐ถ๐ถ ๐‘ฅ๐‘ฅ1, ๐‘ฅ๐‘ฅ2

๐‘‰๐‘‰๐‘–๐‘–๐‘’๐‘’ ๐‘ฅ๐‘ฅ2

Example with ๐‘˜๐‘˜ = 2 (concluded)Simplifying gives you:

๏ฟฝฬ‚๏ฟฝ๐›ฝ1 =๐ถ๐ถ๐‘’๐‘’๐ถ๐ถ ๐‘ฅ๐‘ฅ1,๐‘ฆ๐‘ฆ ๐‘‰๐‘‰๐‘–๐‘–๐‘’๐‘’ ๐‘ฅ๐‘ฅ2 โˆ’ ๐ถ๐ถ๐‘’๐‘’๐ถ๐ถ ๐‘ฅ๐‘ฅ2,๐‘ฆ๐‘ฆ ๐ถ๐ถ๐‘’๐‘’๐ถ๐ถ ๐‘ฅ๐‘ฅ1, ๐‘ฅ๐‘ฅ2

๐‘‰๐‘‰๐‘–๐‘–๐‘’๐‘’ ๐‘ฅ๐‘ฅ1 ๐‘‰๐‘‰๐‘–๐‘–๐‘’๐‘’ ๐‘ฅ๐‘ฅ2 โˆ’ ๐ถ๐ถ๐‘’๐‘’๐ถ๐ถ ๐‘ฅ๐‘ฅ1, ๐‘ฅ๐‘ฅ22

Then you can solve for ๏ฟฝฬ‚๏ฟฝ๐›ฝ2 which, after a lot of simplification, is:

๏ฟฝฬ‚๏ฟฝ๐›ฝ2 =๐ถ๐ถ๐‘’๐‘’๐ถ๐ถ ๐‘ฅ๐‘ฅ2,๐‘ฆ๐‘ฆ ๐‘‰๐‘‰๐‘–๐‘–๐‘’๐‘’ ๐‘ฅ๐‘ฅ1 โˆ’ ๐ถ๐ถ๐‘’๐‘’๐ถ๐ถ ๐‘ฅ๐‘ฅ1,๐‘ฆ๐‘ฆ ๐ถ๐ถ๐‘’๐‘’๐ถ๐ถ ๐‘ฅ๐‘ฅ1, ๐‘ฅ๐‘ฅ2

๐‘‰๐‘‰๐‘–๐‘–๐‘’๐‘’ ๐‘ฅ๐‘ฅ1 ๐‘‰๐‘‰๐‘–๐‘–๐‘’๐‘’ ๐‘ฅ๐‘ฅ2 โˆ’ ๐ถ๐ถ๐‘’๐‘’๐ถ๐ถ ๐‘ฅ๐‘ฅ1, ๐‘ฅ๐‘ฅ22 .

Lastly, substitute the above expressions into the solution for the intercept:๏ฟฝฬ‚๏ฟฝ๐›ฝ0 = ๏ฟฝ๐‘ฆ๐‘ฆ โˆ’ ๏ฟฝฬ‚๏ฟฝ๐›ฝ1๏ฟฝฬ…๏ฟฝ๐‘ฅ1 โˆ’ ๏ฟฝฬ‚๏ฟฝ๐›ฝ2๏ฟฝฬ…๏ฟฝ๐‘ฅ2.
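A minimal numerical sketch of these closed-form solutions (simulated data, illustrative values): compute $\hat{\beta}_1$ and $\hat{\beta}_2$ from the variances and covariances and compare with a direct least squares fit. The "Var" and "Cov" in the slide formulas are sums of squared deviations and cross products, but any common scaling cancels out of the ratios, so sample covariances work equally well.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(size=n)

S = np.cov(np.vstack([x1, x2, y]))          # 3x3 sample covariance matrix
v1, v2 = S[0, 0], S[1, 1]
c12, c1y, c2y = S[0, 1], S[0, 2], S[1, 2]

det = v1 * v2 - c12 ** 2
b1 = (c1y * v2 - c2y * c12) / det           # beta_1_hat from the formula above
b2 = (c2y * v1 - c1y * c12) / det           # beta_2_hat from the formula above
b0 = y.mean() - b1 * x1.mean() - b2 * x2.mean()

X = np.column_stack([np.ones(n), x1, x2])
print(np.array([b0, b1, b2]))
print(np.linalg.lstsq(X, y, rcond=None)[0])  # same numbers from direct least squares
```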

Multiple OLS and variance/covariance
Examine the solutions closely.

They depend, as with simple regression, only on the variances and covariances of the regressors and their covariances with $y$.
◦ This is true generally of OLS, even when there are many explanatory variables in the regression.

As this number grows, even to 2, the solution becomes difficult to work out with algebra, but software (like STATA) is very good at performing the calculations and solving for the $k + 1$ estimates, even when $k$ is quite large.
◦ How software works out the solution: matrix algebra.

Multiple regression and "partialling out"
The "Partialling Out" Interpretation of Multiple Regression is revealed by the matrix and non-matrix estimate of $\hat{\beta}_1$.
◦ What goes into $\hat{\beta}_1$ in a multiple regression is the variation in $x_1$ that cannot be "explained" by its relation to the other $x$ variables.
◦ The covariance between this residual variation in $x_1$, not explained by other regressors, and $y$ is what matters.

It also provides a good way to think of $\hat{\beta}_1$ as the partial effect of $x_1$, holding other factors fixed.

It is estimated using variation in $x_1$ that is independent of variation in other regressors.

More.
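A minimal sketch of this partialling-out logic (the Frisch-Waugh-Lovell idea; simulated data and illustrative names): regress $x_1$ on the other regressors, keep the residuals, and use only that leftover variation to estimate the coefficient on $x_1$.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 400
x2 = rng.normal(size=n)
x1 = 0.7 * x2 + rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)

def fit(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

ones = np.ones(n)
# Step 1: the variation in x1 NOT explained by the other regressor(s).
g = fit(np.column_stack([ones, x2]), x1)
r1 = x1 - np.column_stack([ones, x2]) @ g
# Step 2: beta_1_hat = sum(r1 * y) / sum(r1 ** 2).
beta1_partial = (r1 @ y) / (r1 @ r1)
# Full multiple regression for comparison.
beta_full = fit(np.column_stack([ones, x1, x2]), y)
print(beta1_partial, beta_full[1])   # the two estimates coincide
```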

Expected value of the OLS estimators
Now we resume the discussion of a misspecified OLS regression, e.g., crime on ice cream, by considering the assumptions under which OLS yields unbiased estimates and how misspecifying a regression can produce biased estimates.
◦ Also, what is the direction of the bias?

There is a population model that is linear in parameters.

Assumption MLR.1 states this model, which represents the true relationship between $y$ and all the $x$'s. I.e.,

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \ldots + \beta_k x_k + u.$$

OLS under assumptions MLR 1-4
The sample of $n$ observations is assumed random and indexed by $i$ (MLR.2).

From simple regression, we know that there must be variation in $x$ for an estimate to exist.

With multiple regression, each regressor must have (at least some) variation that is not explained by the other regressors.
◦ If the other regressors "partial out" all of the variation in, say, $x_1$, then $\beta_1$ cannot be estimated.
◦ Perfect multicollinearity is ruled out (by Assumption MLR.3).

When you have a set of regressors that violates this assumption, one of them must be removed from the regression.
◦ Software will usually do this automatically for you.

OLS under assumptions MLR 1-4 (continued)
Finally, multiple regression assumes a less restrictive version of mean independence between regressors and error term.

Assumption MLR.4 states that the error term has zero conditional mean, i.e., conditional on all regressors:

$$E(u \mid x_1, x_2, \ldots, x_k) = 0.$$

This can fail if the estimated regression is misspecified, in terms of the functional form of one of the variables or the omission of a relevant regressor.

The model contains exogenous regressors, with the alternative being an endogenous regressor (explanatory variable).
◦ This is when some $x_j$ is correlated with an omitted variable in the error term.

Unbiasedness
Under the four assumptions above, the OLS estimates are unbiased, i.e.,

$$E(\hat{\beta}_j) = \beta_j \;\; \forall\, j.$$

This includes cases in which some variables have no effect on $y$, i.e., in which $\beta_j = 0$.
◦ This is the "overspecified" case in the text, in which an irrelevant variable is included in the regression.
◦ Though this does not affect the unbiasedness of OLS, it does impact the variance of the estimates, sometimes in undesirable ways we will study later.

Expected value and variance of OLS estimators: under MLR assumptions 1-4
Unbiasedness does not apply when you exclude a variable from the population model when estimating it.
◦ This is called underspecifying the model.
◦ It occurs when, because of an empiricist's mistake or failure to observe a variable in data, a relevant regressor is excluded from the estimation.
◦ For the sake of concreteness, consider an example of each.

1. An empiricist regresses crime on ice cream because he has an axe to grind about ice cream or something like that, and he wants to show that it causes crime.

2. An empiricist regresses earnings on education because he cannot observe other determinants of workersโ€™ productivity (โ€œabilityโ€) in data.

Bias from under-specification
So in both cases, there is a population ("true") model that includes two variables:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u,$$

but the empiricist estimates a version that excludes $x_2$, generating estimates different from unbiased OLS, which are denoted with a "tilde" on the next line.

$$\tilde{y} = \tilde{\beta}_0 + \tilde{\beta}_1 x_1; \quad x_2 \text{ is omitted from the estimation. So,}$$

$$\tilde{\beta}_1 = \frac{\sum_{i=1}^{n}(x_{i1} - \bar{x}_1)(y_i - \bar{y})}{\sum_{i=1}^{n}(x_{i1} - \bar{x}_1)^2} = \frac{\sum_{i=1}^{n}(x_{i1} - \bar{x}_1)\,y_i}{\sum_{i=1}^{n}(x_{i1} - \bar{x}_1)^2}.$$

Omitted variable bias
If the true relationship includes $x_2$, however, the bias in this estimate is revealed by substituting the population model for $y_i$:

$$\tilde{\beta}_1 = \frac{\sum_{i=1}^{n}(x_{i1} - \bar{x}_1)\left(\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + u_i\right)}{\sum_{i=1}^{n}(x_{i1} - \bar{x}_1)^2}.$$

Simplify the numerator:

๐›ฝ๐›ฝ0๏ฟฝ๐‘–๐‘–=1

๐‘›๐‘›

๐‘ฅ๐‘ฅ๐‘–๐‘–1 โˆ’ ๏ฟฝฬ…๏ฟฝ๐‘ฅ1 + ๐›ฝ๐›ฝ1๏ฟฝ๐‘–๐‘–=1

๐‘›๐‘›

๐‘ฅ๐‘ฅ๐‘–๐‘–1 ๐‘ฅ๐‘ฅ๐‘–๐‘–1 โˆ’ ๏ฟฝฬ…๏ฟฝ๐‘ฅ1 + ๐›ฝ๐›ฝ2๏ฟฝ๐‘–๐‘–=1

๐‘›๐‘›

๐‘ฅ๐‘ฅ๐‘–๐‘–2 ๐‘ฅ๐‘ฅ๐‘–๐‘–1 โˆ’ ๏ฟฝฬ…๏ฟฝ๐‘ฅ1 + ๏ฟฝ๐‘–๐‘–=1

๐‘›๐‘›

๐‘ฅ๐‘ฅ๐‘–๐‘–1 โˆ’ ๏ฟฝฬ…๏ฟฝ๐‘ฅ1 ๐‘ข๐‘ข๐‘–๐‘– .

Then take expectations:

$$\Leftrightarrow\; E(\tilde{\beta}_1) = \frac{0 + \beta_1\sum_{i=1}^{n}(x_{i1} - \bar{x}_1)^2 + \beta_2\, Cov(x_1, x_2) + 0}{\sum_{i=1}^{n}(x_{i1} - \bar{x}_1)^2},$$

since the regressors are uncorrelated with the error from the correctly specified model.

Omitted variable bias (concluded)
A little more simplifying gives:

$$\Leftrightarrow\; E(\tilde{\beta}_1) = \beta_1 + \beta_2\,\frac{Cov(x_1, x_2)}{\sum_{i=1}^{n}(x_{i1} - \bar{x}_1)^2}.$$

The estimator using the misspecified model is equal to the true parameter ($\beta_1$) plus a second term that captures the bias.

This bias is non-zero unless one of two things is true:

1. $\beta_2$ is zero. There is no bias from excluding an irrelevant variable.

2. $Cov(x_1, x_2)$ is zero. This means that $x_1$ is independent of the error term; varying it does not induce variation in an omitted variable.
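A minimal Monte Carlo sketch of this result (illustrative parameter values): with a true $\beta_1 = 0$, $\beta_2 > 0$, and $Cov(x_1, x_2) > 0$, the "short" regression's coefficient centers on $\beta_1 + \beta_2\delta_1$, where $\delta_1$ is the slope from regressing $x_2$ on $x_1$ (defined on the next slide).

```python
import numpy as np

rng = np.random.default_rng(4)
beta1, beta2, delta1 = 0.0, 2.0, 0.8
n, reps = 200, 2000
short_estimates = []
for _ in range(reps):
    x1 = rng.normal(size=n)
    x2 = delta1 * x1 + rng.normal(size=n)   # Cov(x1, x2) > 0 by construction
    y = beta1 * x1 + beta2 * x2 + rng.normal(size=n)
    # "Short" regression omitting x2: slope = Cov(x1, y) / Var(x1).
    short_estimates.append(np.cov(x1, y)[0, 1] / np.var(x1, ddof=1))

print(np.mean(short_estimates))   # close to beta1 + beta2 * delta1
print(beta1 + beta2 * delta1)     # = 1.6
```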

Direction of omitted variable bias
The textbook calls the fraction in the bias term $\delta_1$,
◦ the regression coefficient you would get if you regressed $x_2$ on $x_1$:

$$\delta_1 \equiv \frac{Cov(x_1, x_2)}{Var(x_1)}.$$

This term, along with $\beta_2$, helps predict the direction of the bias.

Specifically, if $\delta_1$ and $\beta_2$ are the same sign (+/−), the bias is positive, and if they are opposite signs, the bias is negative.

1. $\delta_1, \beta_2$ same sign $\rightarrow E(\tilde{\beta}_1) > \beta_1$ ("upward bias")
2. $\delta_1, \beta_2$ opposite sign $\rightarrow E(\tilde{\beta}_1) < \beta_1$ ("downward bias")

Omitted variable bias (concluded)
One more way to think about bias: the ceteris paribus effect of $x_1$ on $y$.

$\tilde{\beta}_1$ estimates this effect, but it does so by lumping the true effect in with an unknown quantity of misleading effects, e.g., you take the population model,

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u, \quad \text{and estimate} \quad y = \beta_0 + \beta_1 x_1 + v; \quad v \equiv \beta_2 x_2 + u. \text{ So,}$$

$$\frac{\partial E(y \mid x)}{\partial x_1} = \beta_1 + \frac{\partial v}{\partial x_1} = \beta_1 + \beta_2\,\frac{\partial E(x_2 \mid x_1)}{\partial x_1}.$$

The partial derivative of $x_2$ with respect to $x_1$ is the coefficient you would get if you estimated a regression of $x_2$ on $x_1$, i.e., $\delta_1$.

Standard error of the OLS estimator
Why is the variance of an estimator ("standard error") so important?
◦ Inference: next chapter.

Generating and reporting only point estimates is irresponsible.
◦ It leads your (especially an uninitiated) audience to an overly specific conclusion about your results.
◦ This is bad for you (the empiricist) and them (policy makers, customers, other scholars) because they may base decisions on your findings, e.g., choosing a marginal income tax rate depends crucially on the labor supply elasticity.

A point estimate outside the context of its standard error may prompt your audience to rash decisions that they would not make if they knew your estimates were not "pinpoint" accurate.
◦ This is bad for you because if they do the "wrong" thing with your results, they will blame you for the policy's failure.
◦ And empiricists, as a group, will have their credibility diminished a little, as well.

Variance of OLS estimators
Here we show that the standard error of the $j^{th}$ estimate, $\hat{\beta}_j$, is:

$$se(\hat{\beta}_j) \equiv Var(\hat{\beta}_j)^{1/2} = \left[\frac{\sum_{i=1}^{n}\hat{u}_i^2}{(n-k-1)\left(1-R_j^2\right)\sum_{i=1}^{n}\left(x_{ij} - \bar{x}_j\right)^2}\right]^{1/2}, \text{ where}$$

$\hat{u}_i$ are the residuals from the regression,

$k$ is the number of regressors, and

$R_j^2$ is the proportion of $x_j$'s variance explained by the other $x$ variables.

Once again the estimate of the variance assumes homoskedasticity, i.e., $Var(u \mid \text{all } x_j) = Var(u) = \sigma^2$.
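A minimal sketch (simulated, homoskedastic data) of this formula: build $se(\hat{\beta}_1)$ from $\hat{\sigma}^2$, $R_1^2$, and the total variation in $x_1$, then check it against the usual matrix expression $\hat{\sigma}^2 (X'X)^{-1}$, whose diagonal gives the same variances.

```python
import numpy as np

rng = np.random.default_rng(5)
n, k = 300, 2
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])

beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
u_hat = y - X @ beta_hat
sigma2_hat = (u_hat @ u_hat) / (n - k - 1)           # SSR / (n - k - 1)

# R_1^2: how much of x1's variance the other regressor(s) explain.
Z = np.column_stack([np.ones(n), x2])
x1_fit = Z @ np.linalg.lstsq(Z, x1, rcond=None)[0]
r2_1 = 1 - np.sum((x1 - x1_fit) ** 2) / np.sum((x1 - x1.mean()) ** 2)

se_beta1 = np.sqrt(sigma2_hat / ((1 - r2_1) * np.sum((x1 - x1.mean()) ** 2)))
se_matrix = np.sqrt(np.diag(sigma2_hat * np.linalg.inv(X.T @ X)))
print(se_beta1, se_matrix[1])                        # the two agree
```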

Variance of OLS estimators (continued)
Most of the components of the variance are familiar from simple regression.
◦ The subscripts ($j$) and the degrees of freedom ($n - k - 1$ instead of $n - 1 - 1$) have been generalized for $k$ regressors.

The term $(1 - R_j^2)$ is new, reflecting the consequence of (imperfect) multicollinearity.

Perfect multicollinearity has already been ruled out by Assumption MLR.3, but the standard error of a multicollinear variable ($R_j^2 \rightarrow 1$) is very large because the denominator of the expression above will be very small ($(1 - R_j^2) \rightarrow 0$).

Variance of OLS estimators (continued)
Once again the variance of the errors is estimated without bias, i.e., $E(\hat{\sigma}^2) = \sigma^2$, by the (degrees of freedom-adjusted) mean squared error:

$$SSR \equiv \sum_{i=1}^{n}\hat{u}_i^2; \quad \hat{\sigma}^2 = \frac{SSR}{n-k-1}, \quad \text{so} \quad Var(\hat{\beta}_j) = \frac{\hat{\sigma}^2}{\left(1-R_j^2\right)\sum_{i=1}^{n}\left(x_{ij} - \bar{x}_j\right)^2},$$

where the residuals are $\hat{u}_i \equiv y_i - \hat{\beta}_0 - \hat{\beta}_1 x_{i1} - \ldots - \hat{\beta}_k x_{ik}$.

Variance of OLS estimators (continued)
Observations about the standard error:
◦ When adding a (relevant) regressor to a model, the standard error of an existing regressor can increase or decrease.
◦ Generally the sum of squared residuals decreases, since more information is being used to fit the model. But the degrees of freedom decreases as well.
◦ The standard error depends on Assumption MLR.5 (homoskedasticity).
◦ If it is violated, $\hat{\beta}_j$ is still unbiased; however, the above expression becomes a biased estimate of the variance of $\hat{\beta}_j$.
◦ The 8th chapter in the text is devoted to dealing with the problem of heteroskedasticity in regression.

Gauss-Markov Theorem and BLUE
If, however, Assumptions MLR 1-5 hold, the OLS estimators ($\hat{\beta}_0$ through $\hat{\beta}_k$) have minimal variance among the class of linear unbiased estimators,
◦ i.e., OLS is the "best" in the sense of minimal variance.

OLS is "BLUE", which stands for Best Linear Unbiased Estimator.

The 5 Assumptions that lead to this conclusion (MLR 1-5) are collectively known as the Gauss-Markov assumptions.

The BLUE-ness of the OLS estimators under those assumptions is known as the Gauss-Markov Theorem.

Conclusion
Multiple regression solves the problem of omitted variables that are correlated with the dependent variable and regressor.

It is estimated in a manner analogous to simple OLS and has similar properties:
◦ Unbiasedness under MLR 1-4,
◦ BLUE under MLR 1-5.

Its estimates are interpreted as partial effects of changing one regressor and holding the others constant.

The estimates have standard errors that can be used to generate confidence intervals and test hypotheses about the parametersโ€™ values.

Data on crime and ice cream
The unit of observation is daily, i.e., these are 82 days on which ice cream search volume and crime reports have been counted in the City of Milwaukee.

All the variables have had the "day of the week"-specific mean subtracted from them, because all of them fluctuate across the calendar week, and this avoids creating another spurious correlation through the days of the week.

Back.
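A minimal sketch of that day-of-week demeaning (the column names and values here are illustrative, not the actual dataset): subtract each variable's weekday-specific mean so that weekly seasonality cannot create another spurious correlation.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(6)
df = pd.DataFrame({
    "day_of_week": rng.integers(0, 7, size=82),
    "crimes": rng.poisson(50, size=82).astype(float),
    "icecream": rng.normal(100, 10, size=82),
    "temperature": rng.normal(60, 15, size=82),
})

for col in ["crimes", "icecream", "temperature"]:
    df[col] = df[col] - df.groupby("day_of_week")[col].transform("mean")

# Each demeaned column now averages zero within every weekday (up to rounding).
print(df.groupby("day_of_week")[["crimes", "icecream", "temperature"]].mean().round(10))
```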

OLS estimates using matrices
Understanding what software is doing when it puts out regression estimates is one reason for the following exercise.

Another reason is to demonstrate the treatment of OLS you will experience in graduate-level econometrics classes, one in which everything is espoused using matrix algebra.

Imagine the variables in your regression as a spreadsheet, i.e., with $y$ in column 1, $x_1$ in column 2, and each row is a different observation.

With this image in mind, divide the spreadsheet into two matrices,* on which operations may be performed as they are on numbers ("scalars" in matrix jargon).

*"A rectangular array of numbers enclosed in parentheses. It is conventionally denoted by a capital letter."

OLS estimates using matrices (continued)
The dimensions of a matrix are expressed: $rows \times columns$.
◦ I.e., $n$ observations (rows) of a variable $y$ would be an $n \times 1$ matrix ("Y"); $n$ observations (rows) of $k$ explanatory variables $x$ would be an $n \times k$ matrix ("X").

One operation that can be performed on a matrix is called transposition, denoted by an apostrophe behind the matrix's name.
◦ Transposing a matrix just means exchanging its rows and columns. E.g.,

$$X \equiv \begin{pmatrix} 5 & 2 & 1 \\ 3 & 0 & 1 \end{pmatrix} \;\rightarrow\; X' = \begin{pmatrix} 5 & 3 \\ 2 & 0 \\ 1 & 1 \end{pmatrix}.$$

Matrix multiplication
Another operation that can be performed on some pairs of matrices is multiplication, but only if they are conformable.

"Two matrices A and B of dimensions $m \times n$ and $n \times s$ respectively are conformable to form the product matrix AB, since the number of columns in A is equal to the number of rows in B. The product matrix AB is of dimension $m \times s$, and its $ij^{th}$ element, $c_{ij}$, is obtained by multiplying the elements of the $i^{th}$ row of A by the corresponding elements of the $j^{th}$ column of B and adding the resulting products."
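A small sketch of transposition and conformability using the example matrix above:

```python
import numpy as np

X = np.array([[5, 2, 1],
              [3, 0, 1]])          # 2x3
print(X.T)                         # transpose: 3x2, rows and columns exchanged
print((X.T @ X).shape)             # (3, 2) times (2, 3) -> conformable, gives (3, 3)
# X @ X would fail: a 2x3 matrix is not conformable with another 2x3 matrix.
```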

OLS estimates using matrices (continued)
Transposition and multiplication are useful in econometrics for obtaining a matrix of covariances between $y$ and all the $x$, as well as among the $x$ variables.

Specifically, when you multiply Y by the transpose of X, you get a ($k \times 1$) matrix with elements

$$X'Y = \begin{pmatrix} \sum_{i=1}^{n} x_{i1}y_i \\ \sum_{i=1}^{n} x_{i2}y_i \\ \vdots \\ \sum_{i=1}^{n} x_{ik}y_i \end{pmatrix} = \begin{pmatrix} s_{y1} \\ s_{y2} \\ \vdots \\ s_{yk} \end{pmatrix};$$

x and y are expressed as deviations from their means.

OLS estimates using matrices (continued)
Similarly, multiplying X by its own transpose gives you a ($k \times k$) matrix with elements

$$X'X = \begin{pmatrix} \sum_{i=1}^{n} x_{i1}^2 & \ldots & \sum_{i=1}^{n} x_{i1}x_{ik} \\ \vdots & \ddots & \vdots \\ \sum_{i=1}^{n} x_{i1}x_{ik} & \ldots & \sum_{i=1}^{n} x_{ik}^2 \end{pmatrix} = \begin{pmatrix} s_{11} & \ldots & s_{1k} \\ \vdots & \ddots & \vdots \\ s_{k1} & \ldots & s_{kk} \end{pmatrix}.$$

OLS estimates using matrices (continued)
X'X and X'Y are matrices of the variances of the explanatory variables and covariances among all variables. They can be combined to obtain the ($k \times 1$) matrix of coefficient estimates.
◦ This is the part for which computers are really indispensable.

First of all, you invert the matrix X'X. This means solving for the values that make the following hold:

$$(X'X)(X'X)^{-1} = I \;\Leftrightarrow\; \begin{pmatrix} s_{11} & \ldots & s_{1k} \\ \vdots & \ddots & \vdots \\ s_{k1} & \ldots & s_{kk} \end{pmatrix}\begin{pmatrix} a_{11} & \ldots & a_{1k} \\ \vdots & \ddots & \vdots \\ a_{k1} & \ldots & a_{kk} \end{pmatrix} = \begin{pmatrix} 1 & \ldots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \ldots & 1 \end{pmatrix}.$$

OLS estimates using matrices (continued)
To illustrate using the case of $k = 2$, you have to solve for all the "a" terms using simultaneous equations.

$$\begin{pmatrix} s_{11} & s_{12} \\ s_{21} & s_{22} \end{pmatrix}\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \text{ so}$$

$$a_{11}s_{11} + a_{21}s_{12} = 1, \quad a_{11}s_{21} + a_{21}s_{22} = 0,$$

$$a_{12}s_{11} + a_{22}s_{12} = 0, \quad \text{and} \quad a_{12}s_{21} + a_{22}s_{22} = 1.$$

OLS estimates using matrices (continued)
Solving the first two simultaneously gives you:

$$a_{11} = \frac{s_{22}}{s_{11}s_{22} - s_{12}s_{21}} \quad \text{and} \quad a_{21} = \frac{s_{21}}{s_{12}s_{21} - s_{11}s_{22}}.$$

Similarly, the last two give you:

$$a_{12} = \frac{s_{12}}{s_{12}s_{21} - s_{11}s_{22}} \quad \text{and} \quad a_{22} = \frac{s_{11}}{s_{11}s_{22} - s_{12}s_{21}}.$$

To get the coefficient estimates, you do the matrix equivalent of dividing covariance by variance, i.e., you solve for the vector B:

$$B = (X'X)^{-1}(X'Y).$$
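A minimal numpy sketch of $B = (X'X)^{-1}(X'Y)$, using demeaned columns as the slides do and recovering the intercept from the sample means afterward (simulated data, illustrative values):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 250
x1 = rng.normal(size=n)
x2 = 0.4 * x1 + rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(size=n)

Xd = np.column_stack([x1 - x1.mean(), x2 - x2.mean()])   # deviations from means
yd = y - y.mean()

B = np.linalg.inv(Xd.T @ Xd) @ (Xd.T @ yd)               # slope estimates
b0 = y.mean() - B @ np.array([x1.mean(), x2.mean()])     # intercept
print(b0, B)

# Same answer without demeaning, by including a column of ones in X.
X = np.column_stack([np.ones(n), x1, x2])
print(np.linalg.inv(X.T @ X) @ (X.T @ y))
```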

OLS estimates using matrices (continued)
For the $k = 2$ case,

$$B = \begin{pmatrix} \dfrac{s_{22}}{s_{11}s_{22} - s_{12}s_{21}} & \dfrac{s_{12}}{s_{12}s_{21} - s_{11}s_{22}} \\[2ex] \dfrac{s_{21}}{s_{12}s_{21} - s_{11}s_{22}} & \dfrac{s_{11}}{s_{11}s_{22} - s_{12}s_{21}} \end{pmatrix}\begin{pmatrix} s_{y1} \\ s_{y2} \end{pmatrix}.$$

Multiplying this out gives the same solution as minimizing the Sum of Squared Residuals:

$$B = \begin{pmatrix} \dfrac{s_{22}s_{y1} - s_{12}s_{y2}}{s_{11}s_{22} - s_{12}s_{21}} \\[2ex] \dfrac{s_{11}s_{y2} - s_{21}s_{y1}}{s_{11}s_{22} - s_{12}s_{21}} \end{pmatrix} = \begin{pmatrix} \dfrac{Var(x_2)\,Cov(x_1, y) - Cov(x_1, x_2)\,Cov(x_2, y)}{Var(x_1)\,Var(x_2) - Cov(x_1, x_2)^2} \\[2ex] \dfrac{Var(x_1)\,Cov(x_2, y) - Cov(x_1, x_2)\,Cov(x_1, y)}{Var(x_1)\,Var(x_2) - Cov(x_1, x_2)^2} \end{pmatrix}.$$

OLS estimates using matrices (concluded)
Solving OLS for $k > 2$ is not fundamentally different, and the solution will always be a function of the variances and covariances of the variables in the model.
◦ It just gets more difficult (at an accelerating rate) to write down solutions for the individual coefficients.
◦ It is left as an exercise to show that the "partialling out" interpretation of OLS holds:

$$B = (X'X)^{-1}(X'Y) = \begin{pmatrix} \dfrac{\sum_{i=1}^{n}\hat{r}_{i1}y_i}{\sum_{i=1}^{n}\hat{r}_{i1}^2} \\ \vdots \\ \dfrac{\sum_{i=1}^{n}\hat{r}_{ik}y_i}{\sum_{i=1}^{n}\hat{r}_{ik}^2} \end{pmatrix},$$

in which $\hat{r}_{ij}$ is the residual for observation $i$ from a regression of $x_j$ on all the other regressors.
◦ $\hat{r}_{i1} \equiv x_{i1} - \hat{x}_{i1}$; $\hat{x}_{i1}$ is the fitted value.

Back.

Partialling out interpretation of OLS
The expression for $\hat{\beta}_1$ can be derived from the first order condition for $x_1$ from the least squares minimization.

This states that:

$$\frac{\partial SSR}{\partial \hat{\beta}_1} = -2\sum_{i=1}^{n} x_{i1}\left(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_{i1} - \hat{\beta}_2 x_{i2} - \ldots - \hat{\beta}_k x_{ik}\right) = 0.$$

Substituting in the decomposition of $x_{i1}$ into fitted value and residual from the previous slide ($x_{i1} = \hat{x}_{i1} + \hat{r}_{i1}$), you have:

$$\sum_{i=1}^{n}\left(\hat{x}_{i1} + \hat{r}_{i1}\right)\left(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_{i1} - \hat{\beta}_2 x_{i2} - \ldots - \hat{\beta}_k x_{ik}\right) = 0.$$

Partialling out interpretation of OLS (continued)

$$\Leftrightarrow\; \sum_{i=1}^{n}\hat{x}_{i1}\left(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_{i1} - \ldots - \hat{\beta}_k x_{ik}\right) + \sum_{i=1}^{n}\hat{r}_{i1}\left(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_{i1} - \ldots - \hat{\beta}_k x_{ik}\right) = 0.$$

The term in parentheses is the OLS residual from regressing $y$ on $x_1 \ldots x_k$, "$\hat{u}_i$", and $\hat{x}_{i1}$ is a linear function of the other $x$ variables, i.e.,

$$\hat{x}_{i1} = \hat{\gamma}_0 + \hat{\gamma}_1 x_{i2} + \ldots + \hat{\gamma}_k x_{ik},$$

...

Partialling out interpretation of OLS (continued)
...so one may write:

$$\sum_{i=1}^{n}\hat{x}_{i1}\left(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_{i1} - \ldots - \hat{\beta}_k x_{ik}\right) = \sum_{i=1}^{n}\left(\hat{\gamma}_0 + \hat{\gamma}_1 x_{i2} + \ldots + \hat{\gamma}_k x_{ik}\right)\hat{u}_i.$$

Since the OLS residuals are uncorrelated with each $x_j$, and the sum of the residuals is zero, this entire line equals zero and drops out.

Partialling out interpretation of OLS (continued)
Then you have:

$$\sum_{i=1}^{n}\hat{r}_{i1}\left(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_{i1} - \ldots - \hat{\beta}_k x_{ik}\right) = 0,$$

and the residuals ($\hat{r}_{i1}$) are uncorrelated with the other regressors (2 through $k$) and sum to zero, as well. So,

๏ฟฝ๐‘–๐‘–=1

๐‘›๐‘›

๏ฟฝฬ‚๏ฟฝ๐‘’๐‘–๐‘–1๐‘ฆ๐‘ฆ๐‘–๐‘– = ๏ฟฝ๐‘–๐‘–=1

๐‘›๐‘›

๏ฟฝฬ‚๏ฟฝ๐›ฝ1(๏ฟฝฬ‚๏ฟฝ๐‘’๐‘–๐‘–12+๏ฟฝฬ‚๏ฟฝ๐‘’๐‘–๐‘–1 ๏ฟฝ๐‘ฅ๐‘ฅ๐‘–๐‘–1) โ‡”๏ฟฝ๐‘–๐‘–=1

๐‘›๐‘›

๏ฟฝฬ‚๏ฟฝ๐‘’๐‘–๐‘–1๐‘ฆ๐‘ฆ๐‘–๐‘– = ๏ฟฝฬ‚๏ฟฝ๐›ฝ1๏ฟฝ๐‘–๐‘–=1

๐‘›๐‘›

๏ฟฝฬ‚๏ฟฝ๐‘’๐‘–๐‘–12 .

Partialling out interpretation of OLS (concluded)
Then you get the "partialling out" expression:

$$\Leftrightarrow\; \hat{\beta}_1 = \frac{\sum_{i=1}^{n}\hat{r}_{i1}y_i}{\sum_{i=1}^{n}\hat{r}_{i1}^2}.$$

Back.