Properties of Simple Regression Model
February 6, 2011
Outline
Terminology
Units and Functional Form
Mean of the OLS Estimate
Omitted Variable Bias
Next we will address some properties of the regression model
Forget about the three different motivations for the model; none are relevant for these properties.
Outline
Terminology
Units and Functional Form
Mean of the OLS Estimate
Omitted Variable Bias
Consider the following terminology from Wooldridge
Graphically the model is defined in the following way
Population Model
Sample Model
Also when we have the regression model
yi = β0 + β1xi + ui
we call this a regression of “y on x”
Normal Equations
It must be the case that

0 = (1/N) ∑_{i=1}^N ûi = (1/N) ∑_{i=1}^N (yi − β̂0 − β̂1 xi)

0 = (1/N) ∑_{i=1}^N xi ûi = (1/N) ∑_{i=1}^N xi (yi − β̂0 − β̂1 xi)
This is not approximately true; it is exactly true.

This really is not surprising: these are the equations we started with when we derived β̂0 and β̂1.
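A quick numerical check of the normal equations, using simulated data (the variable names and data generating process here are illustrative, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 1.0 + 2.0 * x + rng.normal(size=200)  # any DGP works here

# OLS estimates for the simple regression of y on x
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

u_hat = y - b0 - b1 * x  # residuals

# Both normal equations hold exactly (up to floating-point error)
mean_resid = np.mean(u_hat)
mean_x_resid = np.mean(x * u_hat)
```

Both means come out at machine zero regardless of how well the model actually fits the data.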
Goodness of Fit
We want to say something about how well our model fits the data.
We will make use of the following three things
Total Sum of Squares (SST)

SST = ∑_{i=1}^N (yi − ȳ)²

Explained Sum of Squares (SSE)

SSE = ∑_{i=1}^N (ŷi − ȳ)²

Residual Sum of Squares (SSR)

SSR = ∑_{i=1}^N ûi²
It turns out that there is a nice relationship between these concepts:

SST = ∑_{i=1}^N (yi − ȳ)²
    = ∑_{i=1}^N ((yi − ŷi) + (ŷi − ȳ))²
    = ∑_{i=1}^N (ûi + (ŷi − ȳ))²
    = ∑_{i=1}^N [ûi² + (ŷi − ȳ)² + 2ûi (ŷi − ȳ)]
    = ∑_{i=1}^N ûi² + ∑_{i=1}^N (ŷi − ȳ)² + ∑_{i=1}^N 2ûi (ŷi − ȳ)
    = SSR + SSE + ∑_{i=1}^N 2ûi (ŷi − ȳ)
So this is really nice except for that last part. Let's deal with it:

∑_{i=1}^N 2ûi (ŷi − ȳ) = 2 ∑_{i=1}^N ûi ({β̂0 + β̂1 xi} − {β̂0 + β̂1 x̄})
                       = 2 ∑_{i=1}^N ûi (β̂1 xi − β̂1 x̄)
                       = 2β̂1 ∑_{i=1}^N ûi xi − 2β̂1 x̄ ∑_{i=1}^N ûi
                       = 0

Thus

SST = SSE + SSR
This gives us a really nice way of describing the goodness of fit of the model:

R² = SSE/SST = 1 − SSR/SST
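The decomposition can be verified numerically as well. A minimal sketch with simulated data (the setup is my own, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=300)
y = 0.5 + 1.5 * x + rng.normal(size=300)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x  # fitted values
u_hat = y - y_hat    # residuals

SST = np.sum((y - y.mean()) ** 2)      # total sum of squares
SSE = np.sum((y_hat - y.mean()) ** 2)  # explained sum of squares
SSR = np.sum(u_hat ** 2)               # residual sum of squares

R2 = SSE / SST  # equals 1 - SSR/SST by the decomposition
```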
Let's look at examples of models that fit well versus those that don't.
Outline
Terminology
Units and Functional Form
Mean of the OLS Estimate
Omitted Variable Bias
Units
What happens when we change the units of measurement of a variable?

Let's think about the smoking example we looked at in the Statistics Review.

Before, we looked at the effect of whether someone smoked on birth weight.

Instead, let's look at the effect of the number of cigarettes per day on birth weight, from the data set bwght.

The units here are sort of arbitrary: we can look at cigarettes per day and weight in ounces.

We could have just as easily looked at packs per day and weight in pounds.

Hopefully this shouldn't change the basic results.
Let's define the following four variables:

O: Weight in ounces
L: Weight in pounds
C: Cigarettes per day
P: Packs per day

Notice that

O = 16L
C = 20P
Next think about the following three conditional expectations:

E(O | C) = β0¹ + β1¹ C
E(L | C) = β0² + β1² C
E(O | P) = β0³ + β1³ P

What is the relationship between these things?
It should be the case that

β0¹ + β1¹ C = E(O | C)
            = E(16L | C)
            = 16 E(L | C)
            = 16β0² + 16β1² C

Thus it should be the case that

β0¹ = 16β0²
β1¹ = 16β1²

It turns out that this is exactly true when you run the regression.
Along similar lines,

β0³ + β1³ p = E(O | P = p)
            = E(O | C = 20p)
            = β0¹ + β1¹ (20p)

Then it should be the case that

β0³ = β0¹
β1³ = 20β1¹

Once again it turns out that this is precisely true in the data.
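Both unit-change results are easy to confirm by simulation. In this sketch the data are made up (the lecture uses the bwght data set), but the coefficient relationships hold exactly for any data:

```python
import numpy as np

rng = np.random.default_rng(2)
C = rng.poisson(5, size=500).astype(float)  # "cigarettes per day" (made up)
O = 120.0 - 0.5 * C + rng.normal(size=500)  # "weight in ounces" (made up)
L = O / 16.0   # weight in pounds
P = C / 20.0   # packs per day

def ols(y, x):
    """Return (intercept, slope) for a simple regression of y on x."""
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    return y.mean() - b1 * x.mean(), b1

b0_OC, b1_OC = ols(O, C)  # ounces on cigarettes
b0_LC, b1_LC = ols(L, C)  # pounds on cigarettes
b0_OP, b1_OP = ols(O, P)  # ounces on packs

# Rescaling y by 16 scales BOTH coefficients by 16;
# rescaling x by 1/20 scales only the SLOPE by 20.
```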
Functional Form
A really important assumption that we have made is that the model is linear.

However, that is not as strong an assumption as you might think, because we can pick y and x to be whatever we want.
For example consider the following figure
This clearly isn't linear: wages are growing exponentially with education.

However, this is easy to fix. If wages are growing exponentially with education, then the logarithm of wages is growing linearly with education.

Rather than regressing wages on schooling, we regress the log of wages on schooling:

log(wagei) = β0 + β1Edi + ui
Figuring out exactly how to do this involves playing with the data.

Here are some specifications and what they are called:

Level-level: y regressed on x
Log-level: log(y) regressed on x
Level-log: y regressed on log(x)
Log-log: log(y) regressed on log(x)

Let's look at a number of different examples.
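As a sketch of the log-level specification, here is simulated wage/education data (the numbers are invented; this is not the figure from the slides). Wages are exponential in education, so regressing log wages on education recovers an approximately constant proportional return per year:

```python
import numpy as np

rng = np.random.default_rng(3)
ed = rng.integers(8, 21, size=1000).astype(float)  # years of schooling
# true proportional return to a year of schooling: about 8%
wage = np.exp(1.0 + 0.08 * ed + 0.3 * rng.normal(size=1000))

# log-level specification: regress log(wage) on ed
x, y = ed, np.log(wage)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
# b1 is read as the approximate proportional change in wages
# per extra year of education
```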
Outline
Terminology
Units and Functional Form
Mean of the OLS Estimate
Omitted Variable Bias
Classical Linear Regression
In this section I will follow section 2.5 of Wooldridge very closely.

Our goal is to derive the mean and variance of the OLS estimator.

In doing so we need to make some assumptions about the population and the sample.

This set of assumptions is often referred to as the Classical Linear Regression Model.
First we need to define the basic model.
Assumption (SLR.1-Linear in Parameters)
In the population model, the dependent variable, y, is related to the independent variable, x, and the error (or disturbance), u, as

y = β0 + β1x + u,

where β0 and β1 are the population intercept and slope parameters, respectively.

Notice that in making this assumption we have really moved to the "structural world." That is, we are really saying that this is the actual data generating process, and our goal is to uncover the true parameters.
Now we need to assume something about the sample
Assumption (SLR.2-Random Sampling)
We have a random sample of size n, (xi, yi), i = 1, ..., n, following the population model defined in SLR.1.
This now defines the basic environment.
Next we need an assumption that allows us to estimate the model.

Assumption (SLR.3-Sample Variation in the Explanatory Variable)

The sample outcomes on x, namely {xi, i = 1, ..., n}, are not all the same value.
Without this assumption we would have real trouble.

Practically, the denominator for β̂1 is ∑_{i=1}^N (xi − x̄)².

This would be zero if there were no variation in xi.
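A one-line numerical illustration of why this matters (the numbers are arbitrary):

```python
import numpy as np

x_const = np.full(50, 12.0)  # every observation has the same x
denom = np.sum((x_const - x_const.mean()) ** 2)
# denom is exactly 0.0, so the slope formula would divide by zero
```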
Further suppose that the data looked like this:
How would you estimate the slope?
Assumption (SLR.4-Zero Conditional Mean)
The error u has an expected value of zero given any value of the explanatory variable. In other words,

E(u | x) = 0.

This will turn out to be the most important assumption for causal work.
Our goal now is to figure out

E(β̂0) and E(β̂1)

I am now going to be explicit about something that Wooldridge is a bit loose about.

We need to use the law of iterated expectations, which states that for random variables W and Z,

E[E(Z | W)] = E[Z]

This can kind of give you a headache to think about. Rather than trying to derive it and get into details, let's just see why it is useful.
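Here is one way to see the law of iterated expectations at work numerically, on a made-up discrete example: average Z within each value of W, then average those conditional means with weights P(W = w), and you get back the unconditional mean of Z:

```python
import numpy as np

rng = np.random.default_rng(4)
W = rng.integers(0, 3, size=100_000)    # W takes the values 0, 1, 2
Z = 2.0 * W + rng.normal(size=100_000)  # Z depends on W plus noise

# E(Z | W = w), estimated within each cell of W
cond_means = np.array([Z[W == w].mean() for w in range(3)])
# weight the conditional means by P(W = w)
weights = np.array([(W == w).mean() for w in range(3)])
outer_mean = np.sum(weights * cond_means)  # E[E(Z | W)]

# outer_mean coincides with the plain sample mean E[Z]
```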
This means that

E[E(β̂1 | x1, ..., xn)] = E(β̂1)

which further means that if we can show that

E(β̂1 | x1, ..., xn) = β1

then

E(β̂1) = β1.

This is what we will do (and what Wooldridge does as well).
First we will make use of the fact that

β̂1 = ∑_{i=1}^N (xi − x̄)(yi − ȳ) / ∑_{i=1}^N (xi − x̄)²
   = ∑_{i=1}^N (xi − x̄)(β0 + β1 xi + ui − β0 − β1 x̄ − ū) / ∑_{i=1}^N (xi − x̄)²
   = ∑_{i=1}^N (xi − x̄)(β1 xi − β1 x̄) / ∑_{i=1}^N (xi − x̄)² + ∑_{i=1}^N (xi − x̄)(ui − ū) / ∑_{i=1}^N (xi − x̄)²
   = β1 ∑_{i=1}^N (xi − x̄)(xi − x̄) / ∑_{i=1}^N (xi − x̄)² + ∑_{i=1}^N (xi − x̄)(ui − ū) / ∑_{i=1}^N (xi − x̄)²
   = β1 + ∑_{i=1}^N (xi − x̄)(ui − ū) / ∑_{i=1}^N (xi − x̄)²
Then notice that

E(β̂1 | x1, ..., xN) = E(β1 + ∑_{i=1}^N (xi − x̄)(ui − ū) / ∑_{i=1}^N (xi − x̄)² | x1, ..., xN)
                    = β1 + ∑_{i=1}^N (xi − x̄) E(ui − ū | x1, ..., xN) / ∑_{i=1}^N (xi − x̄)²
                    = β1

This means that β̂1 is UNBIASED.
This is a really nice property.

Thinking about it, the only assumption that was really important for this was SLR.4.

Let's go through a number of examples, think about causality, and think about whether we believe this assumption.
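The unbiasedness result can be illustrated with a small Monte Carlo sketch (my own setup, with SLR.1-SLR.4 imposed by construction): across many samples, the average of β̂1 sits on top of the true β1:

```python
import numpy as np

rng = np.random.default_rng(5)
beta0, beta1 = 1.0, 2.0  # true population parameters

estimates = []
for _ in range(2000):
    x = rng.normal(size=100)
    u = rng.normal(size=100)  # independent of x, so E(u | x) = 0 (SLR.4)
    y = beta0 + beta1 * x + u
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    estimates.append(b1)

mean_b1 = np.mean(estimates)  # close to the true beta1 = 2
```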
Outline
Terminology
Units and Functional Form
Mean of the OLS Estimate
Omitted Variable Bias
Omitted Variable Bias
In general the problem is that there is some other variable out there that affects y other than x.

(The material I am discussing here is covered in Wooldridge in Chapter 3 rather than Chapter 2.)

To see why this is a problem, suppose that in reality y depends on two things, x and z, so that it is still true that

y = β0 + β1x + u,

but now suppose that

u = β2z + ε

and further we are worried that x and z are related.
Let's think of z as determined by the model

z = δ1x + ξ

Now make the following assumptions about these new error terms:

E(ε | x) = E(ξ | x) = 0.

Does this satisfy the assumptions of the classical linear regression model?

We are assuming all of them except SLR.4.
For that one, notice that

E(u | x) = E(β2z + ε | x)
         = β2 E(z | x) + E(ε | x)
         = β2 E(δ1x + ξ | x)
         = β2δ1x
Let's think about the bias of OLS again:

E(β̂1 | x1, ..., xn) = E(β1 + ∑_{i=1}^n (xi − x̄) ui / ∑_{i=1}^n (xi − x̄)² | x1, ..., xn)
                    = β1 + ∑_{i=1}^n (xi − x̄) E(ui | x1, ..., xn) / ∑_{i=1}^n (xi − x̄)²
                    = β1 + ∑_{i=1}^n (xi − x̄) β2δ1 xi / ∑_{i=1}^n (xi − x̄)²
                    = β1 + β2δ1 ∑_{i=1}^n (xi − x̄) xi / ∑_{i=1}^n (xi − x̄)²
                    = β1 + β2δ1
Let's think about the sign of the bias:

β2   δ1   Bias
+    +    +
+    −    −
−    +    −
−    −    +
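The bias formula can be checked by simulation (parameter values here are arbitrary). With β2 = 3 and δ1 = 0.5, the short regression of y on x should center on β1 + β2δ1 = 3.5 rather than on β1 = 2:

```python
import numpy as np

rng = np.random.default_rng(6)
beta0, beta1, beta2, delta1 = 1.0, 2.0, 3.0, 0.5

estimates = []
for _ in range(2000):
    x = rng.normal(size=200)
    z = delta1 * x + rng.normal(size=200)   # z is related to x
    y = beta0 + beta1 * x + beta2 * z + rng.normal(size=200)
    # "short" regression that omits z
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    estimates.append(b1)

mean_b1 = np.mean(estimates)  # near beta1 + beta2 * delta1 = 3.5, not 2
```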
What is the best solution for omitted variable bias?
Don’t omit the variable.
That is the motivation for the Multiple Regression model that we will consider next.