Multiple Regression Analysis: Estimation
ECONOMETRICS (ECON 360)
BEN VAN KAMMEN, PHD
Outline
Motivation.
Mechanics and Interpretation.
Expected Values and Variances of the Estimators.
Motivation for multiple regression
Consider the following results of a regression of the number of crimes reported in Milwaukee on the search volume (on Google) for the term "ice cream", which I'm using as a proxy for ice cream sales. (A couple of caveats about these data are discussed in the "Data on crime and ice cream" slide near the end of these notes.)

Dep. Var.: Crimes   Coefficient (β̂)   Std. Err.   t
Ice Cream           1.5437             0.4306      3.58
N = 82; R² = 0.1384
Motivation for multiple regression (continued)
Ice cream is somehow an input into, or a reason for, committing crimes? Evidenced by its positive association with crime.
Common sense and past experiences with ice cream, however, argue that there is probably no causal connection between the two.
- This is a spurious correlation that disappears when we control for a confounding variable that is related to both x and y.
In this case the confounder can be summarized as "good weather", i.e., people eat more ice cream when it is warm and also go outside more when it is warm (up to a point, at least).
The increase in social interaction occasioned by warm weather, then, creates more opportunities for conflicts and crimes, according to this informal theory, while coincidentally leading to more ice cream sales as well.
Motivation for multiple regression (concluded)
To use regression analysis to disconfirm the theory that ice cream causes more crime, perform a regression that controls for the effect of weather in some way. Either:
- examine sub-samples of days in which the weather is (roughly) the same but ice cream consumption varies, or
- explicitly control for the weather by including it in a multiple regression model.
Multiple regression defined
Multiple regression expands the regression model to more than 1 regressor / explanatory variable / "independent variable".
For 2 regressors, we would model the relationship
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u,$
and estimate separate effects ($\beta_1$ and $\beta_2$) for each explanatory variable ($x_1$ and $x_2$).
Assuming enough data (and reasons for adding additional variables), the model with $k > 1$ regressors,
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \ldots + \beta_k x_k + u,$
would not be anything fundamentally different from a simple regression.
Multiple OLS and simple OLS
Multiple regression relaxes the assumption that all other factors are held fixed.
Instead of $E(u \mid x_1) = 0$, one needs only assume that $E(u \mid x_1, x_2, \ldots, x_k) = 0$.
- This is a more realistic assumption when dealing with non-experimental data.
One other way of thinking about multiple regression: simple regression is a special case, in which $\{x_2, \ldots, x_k\}$ have been relegated to the error term and treated as mean independent of $x_1$. I.e., simple regression has you estimating:
$y = \beta_0 + \beta_1 x_1 + v; \quad v \equiv \beta_2 x_2 + \beta_3 x_3 + \ldots + \beta_k x_k + u.$
Multiple OLS and simple OLS (continued)
To demonstrate why this could be a bad strategy, consider the ice cream-crime problem again, assuming that the true relationship follows:
$crime = \beta_0 + \beta_1\, icecream + \beta_2\, temperature + u; \quad E(u \mid icecream, temperature) = 0.$
My informal theory states that $\beta_1 = 0$ and $\beta_2 > 0$.
Multiple OLS and simple OLS (continued)
Estimating the simple regression between ice cream and crime was as if the model had been transformed:
$crime = \beta_0 + \beta_1\, icecream + v; \quad v = \beta_2\, temperature + u,$
which I estimated under the assumption that $E(v \mid icecream) = 0$.
However, this is likely a bad assumption, since it is probable that $E(temperature \mid icecream) \neq 0$.
Multiple OLS and simple OLS (concluded)
High (low) values of icecream correspond with high (low) values of temperature.
Under the assumptions of simple regression analysis, violating conditional mean independence introduces bias into OLS.
- This bias would explain the positive estimate of $\beta_1$ shown above.
We will examine the source of the bias more closely, and how to estimate its direction, later in this chapter.
First, we turn our attention back to the technical aspects of estimating the OLS parameters with multiple regressors.
Results, controlling for temperature
Now the coefficient on ice cream is much smaller (closer to 0), and the effect of temperature is large and significant.
- The ice cream coefficient is not precisely zero, but we can no longer reject the null that it has zero effect on crime.

Dep. Var.: Crimes   Coefficient (β̂)   Std. Err.   t
Ice Cream           0.4760             0.4477      1.06
Temperature         1.9492             0.4198      4.64
N = 82; R² = 0.3231
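The comparison between the two tables can be reproduced mechanically. Below is a minimal numpy sketch with made-up stand-in data (the actual Milwaukee series are not reproduced here); all variable names and numbers are hypothetical, chosen only to illustrate how adding the confounder shrinks the ice cream coefficient.

```python
import numpy as np

# Hypothetical illustration only: stand-in arrays for the (unavailable) Milwaukee data.
rng = np.random.default_rng(0)
n = 82
temperature = rng.normal(0, 1, n)                    # de-meaned temperature proxy
icecream = 0.8 * temperature + rng.normal(0, 1, n)   # search volume rises with warmth
crime = 2.0 * temperature + rng.normal(0, 1, n)      # crime responds to weather, not ice cream

def ols(y, *regressors):
    """Return OLS coefficients (intercept first) via least squares."""
    X = np.column_stack([np.ones(len(y))] + list(regressors))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

print("Simple regression (crime on ice cream):   ", ols(crime, icecream))
print("Multiple regression (adding temperature): ", ols(crime, icecream, temperature))
# The ice cream coefficient shrinks toward zero once temperature is controlled for.
```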
Mechanics and interpretation of OLS
Even with $k > 1$ regressors, OLS minimizes the sum of the squares of the residuals.
With $k = 2$ regressors:
$\hat{u}_i = y_i - \hat{\beta}_0 - \hat{\beta}_1 x_{i1} - \hat{\beta}_2 x_{i2} \;\Rightarrow\; \hat{u}_i^2 = (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_{i1} - \hat{\beta}_2 x_{i2})^2,$
$\text{sum of squared residuals } (SSR) = \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_{i1} - \hat{\beta}_2 x_{i2})^2.$
Note the change in subscripting: each observation is indexed by "i" as before, but the two x variables are now distinguished from one another by a "1" and a "2" in the subscript.
Minimizing SSR
For $k > 1$ regressors:
$SSR = \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_{i1} - \hat{\beta}_2 x_{i2} - \ldots - \hat{\beta}_k x_{ik})^2.$
Once again, minimization is accomplished by differentiating the above line with respect to each of the $k + 1$ statistics: $\hat{\beta}_0$ and all $k$ slope parameters.
The first order conditions for the minimum are:
$\frac{\partial SSR}{\partial \hat{\beta}_0} = 0 \quad \text{and} \quad \frac{\partial SSR}{\partial \hat{\beta}_j} = 0 \;\; \forall j \in \{1, 2, \ldots, k\}.$
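Setting all $k + 1$ derivatives to zero is equivalent to solving the "normal equations" $X'X\,b = X'y$ (with a column of ones in $X$ for the intercept). A minimal numpy sketch on synthetic data, confirming that solving those first order conditions gives the same answer as a generic least squares routine:

```python
import numpy as np

# Synthetic-data sketch: the k+1 first order conditions are the normal equations X'X b = X'y.
rng = np.random.default_rng(1)
n, k = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])   # intercept column plus k regressors
y = X @ np.array([1.0, 0.5, -2.0, 3.0]) + rng.normal(size=n)

beta_foc = np.linalg.solve(X.T @ X, X.T @ y)       # solves the first order conditions
beta_lsq, *_ = np.linalg.lstsq(X, y, rcond=None)   # numpy's built-in least squares
print(np.allclose(beta_foc, beta_lsq))             # True: both minimize the SSR
```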
Minimizing SSR (continued)
"Simultaneously choose all the betas to make the regression model fit as closely as possible to all the data points."
The partial derivatives for the problem look like this:
$(1)\;\; \frac{\partial SSR}{\partial \hat{\beta}_0} = -2\sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_{i1} - \hat{\beta}_2 x_{i2} - \ldots - \hat{\beta}_k x_{ik}),$
$(2)\;\; \frac{\partial SSR}{\partial \hat{\beta}_1} = -2\sum_{i=1}^{n} x_{i1}(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_{i1} - \hat{\beta}_2 x_{i2} - \ldots - \hat{\beta}_k x_{ik}),$
$\;\;\ldots$
$(k+1)\;\; \frac{\partial SSR}{\partial \hat{\beta}_k} = -2\sum_{i=1}^{n} x_{ik}(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_{i1} - \hat{\beta}_2 x_{i2} - \ldots - \hat{\beta}_k x_{ik}).$
Example with k = 2
As a representative (and comparatively easy to solve) example, consider the solution when $k = 2$.
Setting (1) equal to zero (FOC) and solving for $\hat{\beta}_0$:
$2n\hat{\beta}_0 - 2\sum_{i=1}^{n} y_i + 2\hat{\beta}_1\sum_{i=1}^{n} x_{i1} + 2\hat{\beta}_2\sum_{i=1}^{n} x_{i2} = 0$
$(4)\;\; \hat{\beta}_0 = \frac{1}{n}\left(\sum_{i=1}^{n} y_i - \hat{\beta}_1\sum_{i=1}^{n} x_{i1} - \hat{\beta}_2\sum_{i=1}^{n} x_{i2}\right) = \bar{y} - \hat{\beta}_1\bar{x}_1 - \hat{\beta}_2\bar{x}_2.$
Example with k = 2 (continued)
Doing the same thing with (2) and (3) gives you:
$(2)\;\; \frac{\partial SSR}{\partial \hat{\beta}_1} = 0 \;\Rightarrow\; \hat{\beta}_1\sum_{i=1}^{n} x_{i1}^2 = \sum_{i=1}^{n} x_{i1} y_i - \hat{\beta}_0\sum_{i=1}^{n} x_{i1} - \hat{\beta}_2\sum_{i=1}^{n} x_{i1} x_{i2},$
$(3)\;\; \frac{\partial SSR}{\partial \hat{\beta}_2} = 0 \;\Rightarrow\; \hat{\beta}_2\sum_{i=1}^{n} x_{i2}^2 = \sum_{i=1}^{n} x_{i2} y_i - \hat{\beta}_0\sum_{i=1}^{n} x_{i2} - \hat{\beta}_1\sum_{i=1}^{n} x_{i1} x_{i2}.$
Example with k = 2 (continued)
Substituting (4):
$(2)\;\; \hat{\beta}_1\sum_{i=1}^{n} x_{i1}^2 = \sum_{i=1}^{n} x_{i1} y_i - \left(\bar{y} - \hat{\beta}_1\bar{x}_1 - \hat{\beta}_2\bar{x}_2\right) n\bar{x}_1 - \hat{\beta}_2\sum_{i=1}^{n} x_{i1} x_{i2},$
$(3)\;\; \hat{\beta}_2\sum_{i=1}^{n} x_{i2}^2 = \sum_{i=1}^{n} x_{i2} y_i - \left(\bar{y} - \hat{\beta}_1\bar{x}_1 - \hat{\beta}_2\bar{x}_2\right) n\bar{x}_2 - \hat{\beta}_1\sum_{i=1}^{n} x_{i1} x_{i2}.$
Example with k = 2 (continued)
Solving (3) for $\hat{\beta}_2$:
$(3) \;\Rightarrow\; \hat{\beta}_2\left(\sum_{i=1}^{n} x_{i2}^2 - n\bar{x}_2^2\right) = \sum_{i=1}^{n} x_{i2} y_i - n\bar{x}_2\bar{y} + n\bar{x}_2\hat{\beta}_1\bar{x}_1 - \hat{\beta}_1\sum_{i=1}^{n} x_{i1} x_{i2}$
$\Rightarrow\; \hat{\beta}_2 = \left(\sum_{i=1}^{n} x_{i2}^2 - n\bar{x}_2^2\right)^{-1}\left(\sum_{i=1}^{n} x_{i2} y_i - n\bar{x}_2\bar{y} + n\bar{x}_2\hat{\beta}_1\bar{x}_1 - \hat{\beta}_1\sum_{i=1}^{n} x_{i1} x_{i2}\right)$
$\Rightarrow\; \hat{\beta}_2 = Var(x_2)^{-1}\left[Cov(x_2, y) - \hat{\beta}_1 Cov(x_1, x_2)\right],$
where $Var$ and $Cov$ denote the sums of squared deviations and cross products, e.g., $Var(x_2) = \sum_{i=1}^{n} x_{i2}^2 - n\bar{x}_2^2$.
Example with k = 2 (continued)
Solve it simultaneously with (2) by substituting it in:
$(2) \;\Rightarrow\; \hat{\beta}_1\sum_{i=1}^{n} x_{i1}^2 = Cov(x_1, y) + \hat{\beta}_1 n\bar{x}_1^2 - \hat{\beta}_2 Cov(x_1, x_2); \text{ now}$
$\hat{\beta}_1\sum_{i=1}^{n} x_{i1}^2 = Cov(x_1, y) + \hat{\beta}_1 n\bar{x}_1^2 - \frac{\left[Cov(x_2, y) - \hat{\beta}_1 Cov(x_1, x_2)\right] Cov(x_1, x_2)}{Var(x_2)}.$
Example with k = 2 (continued)
Collect all the $\hat{\beta}_1$ terms:
$\hat{\beta}_1\left(\sum_{i=1}^{n} x_{i1}^2 - n\bar{x}_1^2 - \frac{Cov(x_1, x_2)^2}{Var(x_2)}\right) = Cov(x_1, y) - \frac{Cov(x_2, y)\, Cov(x_1, x_2)}{Var(x_2)}$
$\Rightarrow\; \hat{\beta}_1\left(Var(x_1) - \frac{Cov(x_1, x_2)^2}{Var(x_2)}\right) = Cov(x_1, y) - \frac{Cov(x_2, y)\, Cov(x_1, x_2)}{Var(x_2)}$
$\Rightarrow\; \hat{\beta}_1\,\frac{Var(x_1)\,Var(x_2) - Cov(x_1, x_2)^2}{Var(x_2)} = \frac{Cov(x_1, y)\,Var(x_2) - Cov(x_2, y)\,Cov(x_1, x_2)}{Var(x_2)}.$
Example with k = 2 (concluded)
Simplifying gives you:
$\hat{\beta}_1 = \frac{Cov(x_1, y)\,Var(x_2) - Cov(x_2, y)\,Cov(x_1, x_2)}{Var(x_1)\,Var(x_2) - Cov(x_1, x_2)^2}.$
Then you can solve for $\hat{\beta}_2$ which, after a lot of simplification, is:
$\hat{\beta}_2 = \frac{Cov(x_2, y)\,Var(x_1) - Cov(x_1, y)\,Cov(x_1, x_2)}{Var(x_1)\,Var(x_2) - Cov(x_1, x_2)^2}.$
Lastly, substitute the above expressions into the solution for the intercept:
$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x}_1 - \hat{\beta}_2\bar{x}_2.$
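These closed-form $k = 2$ formulas are easy to check numerically. A minimal sketch on synthetic data (everything here is invented for illustration), comparing the covariance-based formulas to a generic least squares solver:

```python
import numpy as np

# Synthetic-data check of the closed-form k = 2 solution above.
rng = np.random.default_rng(2)
n = 200
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(size=n)

v1, v2 = np.var(x1), np.var(x2)
c12 = np.cov(x1, x2, bias=True)[0, 1]
c1y = np.cov(x1, y, bias=True)[0, 1]
c2y = np.cov(x2, y, bias=True)[0, 1]

den = v1 * v2 - c12 ** 2
b1 = (c1y * v2 - c2y * c12) / den
b2 = (c2y * v1 - c1y * c12) / den
b0 = y.mean() - b1 * x1.mean() - b2 * x2.mean()

X = np.column_stack([np.ones(n), x1, x2])
print(np.array([b0, b1, b2]))
print(np.linalg.lstsq(X, y, rcond=None)[0])   # agrees with the formulas above
```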
Multiple OLS and variance/covariance
Examine the solutions closely.
They depend, as with simple regression, only on the variances and covariances of the regressors and their covariances with y.
- This is true generally of OLS, even when there are many explanatory variables in the regression.
As this number grows, even to 2, the solution becomes difficult to work out with algebra, but software (like STATA) is very good at performing the calculations and solving for the $k + 1$ estimates, even when $k$ is quite large.
- How software works out the solution: matrix algebra (see the "OLS estimates using matrices" slides below).
Multiple regression and "partialling out"
The "partialling out" interpretation of multiple regression is revealed by both the matrix and the non-matrix estimates of $\hat{\beta}_1$.
- What goes into $\hat{\beta}_1$ in a multiple regression is the variation in $x_1$ that cannot be "explained" by its relation to the other x variables.
- The covariance between this residual variation in $x_1$, not explained by other regressors, and y is what matters.
It also provides a good way to think of $\hat{\beta}_1$ as the partial effect of $x_1$, holding other factors fixed.
It is estimated using variation in $x_1$ that is independent of variation in the other regressors.
(The full derivation appears in the "Partialling out interpretation of OLS" slides at the end of these notes.)
Expected value of the OLS estimators
Now we resume the discussion of a misspecified OLS regression, e.g., crime on ice cream, by considering the assumptions under which OLS yields unbiased estimates and how misspecifying a regression can produce biased estimates.
- Also: what is the direction of the bias?
There is a population model that is linear in parameters. Assumption MLR.1 states this model, which represents the true relationship between y and all of the x variables. I.e.,
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \ldots + \beta_k x_k + u.$
OLS under assumptions MLR 1-4
The sample of n observations is assumed random and indexed by i (MLR.2).
From simple regression, we know that there must be variation in x for an estimate to exist.
With multiple regression, each regressor must have (at least some) variation that is not explained by the other regressors.
- The other regressors cannot "partial out" all of the variation in, say, $x_1$ while still estimating $\beta_1$.
- Perfect multicollinearity is ruled out (by Assumption MLR.3).
When you have a set of regressors that violates this assumption, one of them must be removed from the regression.
- Software will usually do this automatically for you.
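A small numpy sketch (synthetic data) of what goes wrong numerically under perfect multicollinearity: when one regressor is an exact linear function of another, $X'X$ loses rank and the normal equations have no unique solution, which is why software drops one of the offending columns.

```python
import numpy as np

# Perfect multicollinearity: x2 is an exact linear function of x1 (and the constant).
rng = np.random.default_rng(3)
n = 50
x1 = rng.normal(size=n)
x2 = 3.0 * x1 + 1.0                          # exact linear combination
X = np.column_stack([np.ones(n), x1, x2])    # intercept, x1, x2

print(np.linalg.matrix_rank(X))              # 2, not 3: the columns are linearly dependent
print(np.linalg.cond(X.T @ X))               # enormous condition number: X'X is (numerically) singular
# Regression software detects this and drops one of the collinear regressors before estimating.
```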
OLS under assumptions MLR 1-4 (continued)
Finally, multiple regression assumes a less restrictive version of mean independence between regressors and error term.
Assumption MLR.4 states that the error term has zero conditional mean, i.e., conditional on all regressors:
$E(u \mid x_1, x_2, \ldots, x_k) = 0.$
This can fail if the estimated regression is misspecified, in terms of the functional form of one of the variables or the omission of a relevant regressor.
The model contains exogenous regressors, with the alternative being an endogenous regressor (explanatory variable).
- This is when some $x_j$ is correlated with an omitted variable in the error term.
Unbiasedness
Under the four assumptions above, the OLS estimates are unbiased, i.e.,
$E(\hat{\beta}_j) = \beta_j \;\;\forall j.$
This includes cases in which a variable has no effect on y, i.e., in which $\beta_j = 0$.
- This is the "overspecified" case in the text, in which an irrelevant variable is included in the regression.
- Though this does not affect the unbiasedness of OLS, it does impact the variance of the estimates, sometimes in undesirable ways we will study later.
Expected value and variance of OLS estimators under MLR assumptions 1-4
Unbiasedness does not apply when you exclude a variable from the population model when estimating it.
- This is called underspecifying the model.
- It occurs when, because of an empiricist's mistake or failure to observe a variable in data, a relevant regressor is excluded from the estimation.
- For the sake of concreteness, consider an example of each:
1. An empiricist regresses crime on ice cream because he has an axe to grind about ice cream or something like that, and he wants to show that it causes crime.
2. An empiricist regresses earnings on education because he cannot observe other determinants of workers' productivity ("ability") in data.
Bias from under-specification
So in both cases there is a population ("true") model that includes two variables:
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u,$
but the empiricist estimates a version that excludes $x_2$, generating estimates different from unbiased OLS, which are denoted with a tilde below:
$\tilde{y} = \tilde{\beta}_0 + \tilde{\beta}_1 x_1; \quad x_2 \text{ is omitted from the estimation.}$
So,
$\tilde{\beta}_1 = \frac{\sum_{i=1}^{n} (x_{i1} - \bar{x}_1)(y_i - \bar{y})}{\sum_{i=1}^{n} (x_{i1} - \bar{x}_1)^2} = \frac{\sum_{i=1}^{n} (x_{i1} - \bar{x}_1)\, y_i}{\sum_{i=1}^{n} (x_{i1} - \bar{x}_1)^2}.$
Omitted variable bias
If the true relationship includes $x_2$, however, the bias in this estimate is revealed by substituting the population model for $y_i$:
$\tilde{\beta}_1 = \frac{\sum_{i=1}^{n} (x_{i1} - \bar{x}_1)(\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + u_i)}{\sum_{i=1}^{n} (x_{i1} - \bar{x}_1)^2}.$
Simplify the numerator:
$\beta_0\sum_{i=1}^{n} (x_{i1} - \bar{x}_1) + \beta_1\sum_{i=1}^{n} x_{i1}(x_{i1} - \bar{x}_1) + \beta_2\sum_{i=1}^{n} x_{i2}(x_{i1} - \bar{x}_1) + \sum_{i=1}^{n} (x_{i1} - \bar{x}_1)\, u_i.$
Then take expectations:
$\Rightarrow\; E(\tilde{\beta}_1) = \frac{0 + \beta_1\sum_{i=1}^{n} (x_{i1} - \bar{x}_1)^2 + \beta_2\, Cov(x_1, x_2) + 0}{\sum_{i=1}^{n} (x_{i1} - \bar{x}_1)^2},$
since the regressors are uncorrelated with the error from the correctly specified model.
Omitted variable bias (continued)
A little more simplifying gives:
$\Rightarrow\; E(\tilde{\beta}_1) = \beta_1 + \beta_2\,\frac{Cov(x_1, x_2)}{\sum_{i=1}^{n} (x_{i1} - \bar{x}_1)^2}.$
The estimator using the misspecified model is equal to the true parameter ($\beta_1$) plus a second term that captures the bias.
This bias is non-zero unless one of two things is true:
1. $\beta_2$ is zero. There is no bias from excluding an irrelevant variable.
2. $Cov(x_1, x_2)$ is zero. This means that $x_1$ is independent of the error term; varying it does not induce variation in the omitted variable.
Direction of omitted variable bias
The textbook calls the fraction in the bias term $\delta_1$:
- the regression coefficient you would get if you regressed $x_2$ on $x_1$,
$\delta_1 \equiv \frac{Cov(x_1, x_2)}{Var(x_1)}.$
This term, along with $\beta_2$, helps predict the direction of the bias.
Specifically, if $\delta_1$ and $\beta_2$ have the same sign (+/-), the bias is positive, and if they have opposite signs, the bias is negative.
1. $\delta_1, \beta_2$ same sign $\Rightarrow E(\tilde{\beta}_1) > \beta_1$ ("upward bias").
2. $\delta_1, \beta_2$ opposite sign $\Rightarrow E(\tilde{\beta}_1) < \beta_1$ ("downward bias").
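A hedged simulation sketch of this bias formula (all numbers invented, loosely echoing the ice cream example): the true $\beta_1$ is zero, $\beta_2 > 0$, and the regressors are positively correlated, so the short regression should be biased upward by roughly $\beta_2 \delta_1$.

```python
import numpy as np

# Simulated omitted variable bias: true beta1 = 0, beta2 = 2, Corr(x1, x2) > 0.
rng = np.random.default_rng(4)
n = 100_000
x2 = rng.normal(size=n)                      # the omitted variable (think: temperature)
x1 = 0.7 * x2 + rng.normal(size=n)           # included regressor, correlated with x2
y = 0.0 * x1 + 2.0 * x2 + rng.normal(size=n)

delta1 = np.cov(x1, x2, bias=True)[0, 1] / np.var(x1)        # slope from regressing x2 on x1
beta1_tilde = np.cov(x1, y, bias=True)[0, 1] / np.var(x1)    # short-regression slope on x1

print("predicted bias, beta2 * delta1:", 2.0 * delta1)
print("estimated beta1 (true value 0):", beta1_tilde)         # approximately equal: upward bias
```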
Omitted variable bias (concluded)
One more way to think about bias: the ceteris paribus effect of $x_1$ on y.
$\tilde{\beta}_1$ estimates this effect, but it does so by lumping the true effect in with an unknown quantity of misleading effects, e.g., you take the population model,
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u,$
and estimate
$y = \beta_0 + \beta_1 x_1 + v; \quad v \equiv \beta_2 x_2 + u.$
So,
$\frac{\partial E(y \mid x)}{\partial x_1} = \beta_1 + \frac{\partial v}{\partial x_1} = \beta_1 + \beta_2\,\frac{\partial E(x_2 \mid x_1)}{\partial x_1}.$
The partial derivative of $x_2$ with respect to $x_1$ is the output you would get if you estimated a regression of $x_2$ on $x_1$, i.e., $\delta_1$.
Standard error of the OLS estimator
Why is the variance of an estimator ("standard error") so important?
- Inference: next chapter.
Generating and reporting only point estimates is irresponsible.
- It leads your (especially an uninitiated) audience to an overly specific conclusion about your results.
- This is bad for you (empiricist) and them (policy makers, customers, other scholars) because they may base decisions on your findings, e.g., choosing a marginal income tax rate depends crucially on labor supply elasticity.
A point estimate outside the context of its standard error may prompt your audience to rash decisions that they would not make if they knew your estimates were not "pinpoint" accurate.
- This is bad for you because if they do the "wrong" thing with your results, they will blame you for the policy's failure.
- And empiricists, as a group, will have their credibility diminished a little, as well.
Variance of OLS estimators
Here we show that the standard error of the jth estimate, $\hat{\beta}_j$, is:
$se(\hat{\beta}_j) \equiv \left[\widehat{Var}(\hat{\beta}_j)\right]^{1/2} = \left[\frac{\sum_{i=1}^{n} \hat{u}_i^2}{(n - k - 1)\,(1 - R_j^2)\sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2}\right]^{1/2}, \text{ where}$
- $\hat{u}_i$ are the residuals from the regression,
- $k$ is the number of regressors, and
- $R_j^2$ is the proportion of $x_j$'s variance explained by the other x variables.
Once again, the estimate of the variance assumes homoskedasticity, i.e., $Var(u \mid \text{all } x_j) = Var(u) = \sigma^2$.
Variance of OLS estimators (continued)
Most of the components of the variance are familiar from simple regression.
- Subscripts (j) and the degrees of freedom ($n - k - 1$ instead of $n - 1 - 1$) have been generalized for $k$ regressors.
The term $(1 - R_j^2)$ is new, reflecting the consequence of (imperfect) multicollinearity.
Perfect multicollinearity has already been ruled out by Assumption MLR.3, but the standard error of a multicollinear variable ($R_j^2 \to 1$) is very large, because the denominator of the expression above will be very small ($(1 - R_j^2) \to 0$).
Variance of OLS estimators (continued)
Once again the variance of the errors is estimated without bias, i.e., $E(\hat{\sigma}^2) = \sigma^2$, by the (degrees of freedom-adjusted) mean squared error:
$SSR \equiv \sum_{i=1}^{n} \hat{u}_i^2; \quad \hat{\sigma}^2 = \frac{SSR}{n - k - 1}, \quad \text{so} \quad \widehat{Var}(\hat{\beta}_j) = \frac{\hat{\sigma}^2}{(1 - R_j^2)\sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2},$
where the residuals are $\hat{u}_i \equiv y_i - \hat{\beta}_0 - \hat{\beta}_1 x_{i1} - \ldots - \hat{\beta}_k x_{ik}$.
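A sketch of this variance formula on synthetic data with $k = 2$ (names and numbers are illustrative only): compute $\hat{\sigma}^2$, $R_1^2$, and $SST_1$ by hand and compare the result with the classical covariance matrix $\hat{\sigma}^2 (X'X)^{-1}$.

```python
import numpy as np

# Standard error of beta_1 from sigma2_hat / [(1 - R_1^2) * SST_1], synthetic data.
rng = np.random.default_rng(5)
n, k = 500, 2
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
u_hat = y - X @ beta_hat
sigma2_hat = (u_hat @ u_hat) / (n - k - 1)          # SSR / (n - k - 1)

# R_1^2 and SST_1 come from regressing x1 on the other regressors.
Z = np.column_stack([np.ones(n), x2])
x1_fit = Z @ np.linalg.solve(Z.T @ Z, Z.T @ x1)
R2_1 = 1 - np.sum((x1 - x1_fit) ** 2) / np.sum((x1 - x1.mean()) ** 2)
SST_1 = np.sum((x1 - x1.mean()) ** 2)

se_beta1 = np.sqrt(sigma2_hat / ((1 - R2_1) * SST_1))
print(beta_hat[1], se_beta1)                                  # slope on x1 and its std. err.
print(np.sqrt(sigma2_hat * np.linalg.inv(X.T @ X)[1, 1]))     # same value, matrix route
```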
Variance of OLS estimators (continued)
Observations about the standard error:
- When adding a (relevant) regressor to a model, the standard error of an existing regressor can increase or decrease.
- Generally the sum of squared residuals decreases, since more information is being used to fit the model; but the degrees of freedom decrease as well.
- The standard error depends on Assumption MLR.5 (homoskedasticity).
- If it is violated, $\hat{\beta}_j$ is still unbiased; however, the above expression becomes a biased estimate of the variance of $\hat{\beta}_j$.
- The 8th chapter in the text is devoted to dealing with the problem of heteroskedasticity in regression.
Gauss-Markov Theorem and BLUE
If, however, Assumptions MLR 1-5 hold, the OLS estimators ($\hat{\beta}_0$ through $\hat{\beta}_k$) have minimal variance among the class of linear unbiased estimators,
- i.e., OLS is the "best" in the sense of minimal variance.
OLS is "BLUE", which stands for Best Linear Unbiased Estimator.
The 5 Assumptions that lead to this conclusion (MLR 1-5) are collectively known as the Gauss-Markov assumptions.
The BLUE-ness of the OLS estimators under those assumptions is known as the Gauss-Markov Theorem.
Conclusion
Multiple regression solves the problem of omitted variables that are correlated with the dependent variable and regressor.
It is estimated in a manner analogous to simple OLS and has similar properties:
- unbiasedness under MLR 1-4,
- BLUE under MLR 1-5.
Its estimates are interpreted as partial effects of changing one regressor and holding the others constant.
The estimates have standard errors that can be used to generate confidence intervals and test hypotheses about the parameters' values.
Data on crime and ice cream
The unit of observation is daily, i.e., these are 82 days on which ice cream search volume and crime reports have been counted in the City of Milwaukee.
All the variables have had the "day of the week"-specific mean subtracted from them, because all of them fluctuate across the calendar week, and this avoids creating another spurious correlation through the days of the week.
OLS estimates using matrices
Understanding what software is doing when it puts out regression estimates is one reason for the following exercise.
Another reason is to demonstrate the treatment of OLS you will experience in graduate-level econometrics classes, in which everything is espoused using matrix algebra.
Imagine the variables in your regression as a spreadsheet, i.e., with y in column 1, $x_1$ in column 2, and each row a different observation.
With this image in mind, divide the spreadsheet into two matrices,* on which operations may be performed as they are on numbers ("scalars" in matrix jargon).
*"A rectangular array of numbers enclosed in parentheses. It is conventionally denoted by a capital letter."
OLS estimates using matrices (continued)
The dimensions of a matrix are expressed as rows × columns.
- I.e., n observations (rows) of a variable y would be an $n \times 1$ matrix ("Y"); n observations (rows) of k explanatory variables x would be an $n \times k$ matrix ("X").
One operation that can be performed on a matrix is called transposition, denoted by an apostrophe behind the matrix's name.
- Transposing a matrix just means exchanging its rows and columns. E.g.,
$X \equiv \begin{pmatrix} 5 & 2 & 1 \\ 3 & 0 & 1 \end{pmatrix} \;\Rightarrow\; X' = \begin{pmatrix} 5 & 3 \\ 2 & 0 \\ 1 & 1 \end{pmatrix}.$
Matrix multiplication
Another operation that can be performed on some pairs of matrices is multiplication, but only if they are conformable.
"Two matrices A and B of dimensions $m \times n$ and $n \times p$ respectively are conformable to form the product matrix AB, since the number of columns in A is equal to the number of rows in B. The product matrix AB is of dimension $m \times p$, and its $ij$th element, $c_{ij}$, is obtained by multiplying the elements of the $i$th row of A by the corresponding elements of the $j$th column of B and adding the resulting products."
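A quick numpy sketch of these two operations, using the small matrix from the transposition example above (the particular product shown is just for illustration):

```python
import numpy as np

# Transposition and (conformable) multiplication with the 2 x 3 example matrix.
X = np.array([[5, 2, 1],
              [3, 0, 1]])   # 2 x 3
print(X.T)                  # transpose: 3 x 2

A = X                       # 2 x 3
B = X.T                     # 3 x 2: conformable, since columns of A = rows of B
print(A @ B)                # product AB is 2 x 2
```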
OLS estimates using matrices (continued)
Transposition and multiplication are useful in econometrics for obtaining a matrix of covariances between y and all the x variables, as well as among the x variables.
Specifically, when you multiply Y by the transpose of X, you get a ($k \times 1$) matrix with elements
$X'Y = \begin{pmatrix} \sum_{i=1}^{n} x_{i1} y_i \\ \sum_{i=1}^{n} x_{i2} y_i \\ \vdots \\ \sum_{i=1}^{n} x_{ik} y_i \end{pmatrix} = \begin{pmatrix} s_{y1} \\ s_{y2} \\ \vdots \\ s_{yk} \end{pmatrix};$
x and y are expressed as deviations from their means.
OLS estimates using matrices (continued)
Similarly, multiplying X by its own transpose gives you a ($k \times k$) matrix with elements
$X'X = \begin{pmatrix} \sum_{i=1}^{n} x_{i1}^2 & \cdots & \sum_{i=1}^{n} x_{i1} x_{ik} \\ \vdots & \ddots & \vdots \\ \sum_{i=1}^{n} x_{i1} x_{ik} & \cdots & \sum_{i=1}^{n} x_{ik}^2 \end{pmatrix} = \begin{pmatrix} s_{11} & \cdots & s_{1k} \\ \vdots & \ddots & \vdots \\ s_{k1} & \cdots & s_{kk} \end{pmatrix}.$
OLS estimates using matrices (continued)
X'X and X'Y are matrices of the variances of the explanatory variables and the covariances among all variables. They can be combined to obtain the ($k \times 1$) matrix of coefficient estimates.
- This is the part for which computers are really indispensable.
First of all, you invert the matrix X'X. This means solving for the values that make the following hold:
$(X'X)(X'X)^{-1} = I \;\Rightarrow\; \begin{pmatrix} s_{11} & \cdots & s_{1k} \\ \vdots & \ddots & \vdots \\ s_{k1} & \cdots & s_{kk} \end{pmatrix}\begin{pmatrix} a_{11} & \cdots & a_{1k} \\ \vdots & \ddots & \vdots \\ a_{k1} & \cdots & a_{kk} \end{pmatrix} = \begin{pmatrix} 1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 1 \end{pmatrix}.$
OLS estimates using matrices (continued)
To illustrate using the case of $k = 2$, you have to solve for all the "a" terms using simultaneous equations:
$\begin{pmatrix} s_{11} & s_{12} \\ s_{21} & s_{22} \end{pmatrix}\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \text{ so}$
$a_{11} s_{11} + a_{21} s_{12} = 1, \quad a_{11} s_{21} + a_{21} s_{22} = 0,$
$a_{12} s_{11} + a_{22} s_{12} = 0, \quad a_{12} s_{21} + a_{22} s_{22} = 1.$
OLS estimates using matrices (continued)
Solving the first two simultaneously gives you:
$a_{11} = \frac{s_{22}}{s_{11} s_{22} - s_{12} s_{21}} \quad \text{and} \quad a_{21} = \frac{s_{21}}{s_{12} s_{21} - s_{11} s_{22}}.$
Similarly, the last two give you:
$a_{12} = \frac{s_{12}}{s_{12} s_{21} - s_{11} s_{22}} \quad \text{and} \quad a_{22} = \frac{s_{11}}{s_{11} s_{22} - s_{12} s_{21}}.$
To get the coefficient estimates, you do the matrix equivalent of dividing covariance by variance, i.e., you solve for the vector B:
$B = (X'X)^{-1}(X'Y).$
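A minimal numpy sketch of that formula on synthetic, de-meaned data (no intercept column is needed once the variables are expressed as deviations from their means, as in these slides):

```python
import numpy as np

# B = (X'X)^{-1} X'Y with de-meaned synthetic data.
rng = np.random.default_rng(6)
n = 300
x1 = rng.normal(size=n)
x2 = 0.4 * x1 + rng.normal(size=n)
y = 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)

X = np.column_stack([x1 - x1.mean(), x2 - x2.mean()])   # deviations from means
Y = y - y.mean()

B = np.linalg.inv(X.T @ X) @ (X.T @ Y)                  # the textbook formula
print(B)
print(np.linalg.solve(X.T @ X, X.T @ Y))                # numerically preferable equivalent
```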
OLS estimates using matrices (continued)
For the $k = 2$ case,
$B = \begin{pmatrix} \frac{s_{22}}{s_{11} s_{22} - s_{12} s_{21}} & \frac{s_{12}}{s_{12} s_{21} - s_{11} s_{22}} \\ \frac{s_{21}}{s_{12} s_{21} - s_{11} s_{22}} & \frac{s_{11}}{s_{11} s_{22} - s_{12} s_{21}} \end{pmatrix}\begin{pmatrix} s_{y1} \\ s_{y2} \end{pmatrix}.$
Multiplying this out gives the same solution as minimizing the sum of squared residuals:
$B = \begin{pmatrix} \frac{s_{22} s_{y1} - s_{12} s_{y2}}{s_{11} s_{22} - s_{12} s_{21}} \\ \frac{s_{11} s_{y2} - s_{21} s_{y1}}{s_{11} s_{22} - s_{12} s_{21}} \end{pmatrix} = \begin{pmatrix} \frac{Var(x_2)\, Cov(x_1, y) - Cov(x_1, x_2)\, Cov(x_2, y)}{Var(x_1)\, Var(x_2) - Cov(x_1, x_2)^2} \\ \frac{Var(x_1)\, Cov(x_2, y) - Cov(x_1, x_2)\, Cov(x_1, y)}{Var(x_1)\, Var(x_2) - Cov(x_1, x_2)^2} \end{pmatrix}.$
OLS estimates using matrices (concluded)
Solving OLS for $k > 2$ is not fundamentally different, and the solution will always be a function of the variances and covariances of the variables in the model.
- It just gets more difficult (at an accelerating rate) to write down solutions for the individual coefficients.
- It is left as an exercise to show that the "partialling out" interpretation of OLS holds:
$B = (X'X)^{-1}(X'Y) = \begin{pmatrix} \frac{\sum_{i=1}^{n} \hat{r}_{i1} y_i}{\sum_{i=1}^{n} \hat{r}_{i1}^2} \\ \vdots \\ \frac{\sum_{i=1}^{n} \hat{r}_{ik} y_i}{\sum_{i=1}^{n} \hat{r}_{ik}^2} \end{pmatrix},$
in which $\hat{r}_{ij}$ is the residual for observation i from a regression of $x_j$ on all the other regressors.
- $\hat{r}_{i1} \equiv x_{i1} - \hat{x}_{i1}$; $\hat{x}_{i1}$ is the fitted value.
Partialling out interpretation of OLS
The expression for $\hat{\beta}_1$ can be derived from the first order condition for $x_1$ from the least squares minimization. This states that:
$\frac{\partial SSR}{\partial \hat{\beta}_1} = -2\sum_{i=1}^{n} x_{i1}\left(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_{i1} - \hat{\beta}_2 x_{i2} - \ldots - \hat{\beta}_k x_{ik}\right) = 0.$
Substituting in the definition of the residuals from the previous line ($x_{i1} = \hat{x}_{i1} + \hat{r}_{i1}$), you have:
$\sum_{i=1}^{n} \left(\hat{x}_{i1} + \hat{r}_{i1}\right)\left(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_{i1} - \hat{\beta}_2 x_{i2} - \ldots - \hat{\beta}_k x_{ik}\right) = 0.$
Partialling out interpretation of OLS (continued)
$\Rightarrow\; \sum_{i=1}^{n} \hat{x}_{i1}\left(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_{i1} - \hat{\beta}_2 x_{i2} - \ldots - \hat{\beta}_k x_{ik}\right) + \sum_{i=1}^{n} \hat{r}_{i1}\left(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_{i1} - \hat{\beta}_2 x_{i2} - \ldots - \hat{\beta}_k x_{ik}\right) = 0.$
The term in brackets is the OLS residual from regressing y on $x_1 \ldots x_k$, "$\hat{u}_i$", and $\hat{x}_1$ is a linear function of the other x variables, i.e.,
$\hat{x}_{i1} = \hat{\gamma}_0 + \hat{\gamma}_1 x_{i2} + \ldots + \hat{\gamma}_k x_{ik},$
. . .
Partialling out interpretation of OLS (continued)
. . . so one may write:
$\sum_{i=1}^{n} \hat{x}_{i1}\left(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_{i1} - \hat{\beta}_2 x_{i2} - \ldots - \hat{\beta}_k x_{ik}\right) = \sum_{i=1}^{n} \left(\hat{\gamma}_0 + \hat{\gamma}_1 x_{i2} + \ldots + \hat{\gamma}_k x_{ik}\right)\hat{u}_i.$
Since the OLS residuals are uncorrelated with each $x_j$, and the sum of the residuals is zero, this entire line equals zero and drops out.
Partialling out interpretation of OLS (continued)
Then you have:
$\sum_{i=1}^{n} \hat{r}_{i1}\left(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_{i1} - \hat{\beta}_2 x_{i2} - \ldots - \hat{\beta}_k x_{ik}\right) = 0,$
and the residuals ($\hat{r}_{i1}$) are uncorrelated with the other regressors (2 through k) and sum to zero, as well. So,
$\sum_{i=1}^{n} \hat{r}_{i1} y_i = \sum_{i=1}^{n} \hat{\beta}_1\left(\hat{r}_{i1}^2 + \hat{r}_{i1}\hat{x}_{i1}\right) \;\Rightarrow\; \sum_{i=1}^{n} \hat{r}_{i1} y_i = \hat{\beta}_1\sum_{i=1}^{n} \hat{r}_{i1}^2.$
Partialling out interpretation of OLS (concluded)
Then you get the "partialling out" expression:
$\Rightarrow\; \hat{\beta}_1 = \frac{\sum_{i=1}^{n} \hat{r}_{i1} y_i}{\sum_{i=1}^{n} \hat{r}_{i1}^2}.$
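This result is easy to verify numerically. A minimal sketch on synthetic data (all names and numbers invented): residualize $x_1$ on the other regressors, and the ratio $\sum \hat{r}_{i1} y_i / \sum \hat{r}_{i1}^2$ reproduces the multiple-regression coefficient on $x_1$.

```python
import numpy as np

# "Partialling out": the coefficient on x1 equals sum(r1_hat * y) / sum(r1_hat^2).
rng = np.random.default_rng(7)
n = 400
x2 = rng.normal(size=n)
x3 = rng.normal(size=n)
x1 = 0.5 * x2 - 0.3 * x3 + rng.normal(size=n)
y = 1.0 + 2.0 * x1 + 0.7 * x2 - 1.2 * x3 + rng.normal(size=n)

# Full multiple regression of y on (1, x1, x2, x3).
X = np.column_stack([np.ones(n), x1, x2, x3])
beta_full = np.linalg.solve(X.T @ X, X.T @ y)

# Residualize x1 on the other regressors (including the constant).
Z = np.column_stack([np.ones(n), x2, x3])
r1_hat = x1 - Z @ np.linalg.solve(Z.T @ Z, Z.T @ x1)

beta1_partial = np.sum(r1_hat * y) / np.sum(r1_hat ** 2)
print(beta_full[1], beta1_partial)   # the two estimates of beta_1 coincide
```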