2.5 Variances of the OLS Estimators
-We have proven that the sampling distributions of the OLS estimates (B0hat and B1hat) are centered around the true values
-How FAR are they distributed around the true values?
-The best estimator will be the one most narrowly distributed around the true values
-To determine the best estimator, we must calculate the OLS variances or standard deviations
-recall that the standard deviation is the square root of the variance
Gauss-Markov Assumption SLR.5 (Homoskedasticity)
The error u has the same variance given any value of the explanatory variable. In other words,

Var(u|x) = σ²
Gauss-Markov Assumption SLR.5 (Homoskedasticity)
-While the variance can be calculated using only assumptions SLR.1 to SLR.4, this is very complicated
-The traditional simplifying assumption is homoskedasticity: the unobservable error, u, has a CONSTANT VARIANCE
-Note that SLR.5 has no impact on unbiasedness-SLR.5 simply simplifies variance calculations and gives OLS certain efficiency properties
Gauss-Markov Assumption SLR.5 (Homoskedasticity)
-While assuming x and u are independent would also simplify matters, independence is too strong an assumption
-Note that since E(u|x) = 0:

Var(u|x) = E(u²|x) − [E(u|x)]² = E(u²|x) = σ²
Gauss-Markov Assumption SLR.5 (Homoskedasticity)
-if the variance is constant given x, it is always constant
-Therefore:

σ² = E(u²) = Var(u)

-σ² is also called the ERROR VARIANCE or DISTURBANCE VARIANCE
Gauss-Markov Assumption SLR.5 (Homoskedasticity)
-SLR.4 and SLR.5 can be rewritten as conditions on y (using the fact that we expect the error to be zero and y only varies due to the error):

E(y|x) = B0 + B1x  (2.55)

Var(y|x) = σ²  (2.56)
Heteroskedastic Example
-Consider the following model:

weighti = 130 + 0.065incomei + ui

-here it is assumed that weight is a function of income
-SLR.5 requires that:

Var(weighti | incomei) = σ²

-but income affects the range of weights-rich people can afford both fatty foods and expensive weight loss
-HETEROSKEDASTICITY is present
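The failure of SLR.5 here means the spread of weight grows with income. A minimal simulation (all numbers assumed purely for illustration, not taken from real data) sketches this by letting the error's standard deviation rise with income and comparing the variance of weight across income groups:

```python
import random

random.seed(42)

# Assumed setup: incomes in $1000s, error sd proportional to income,
# so Var(weight | income) is NOT constant -- SLR.5 fails.
incomes = [random.uniform(10, 100) for _ in range(10_000)]
weights = [130 + 0.065 * inc + random.gauss(0, 0.5 * inc) for inc in incomes]

def group_var(lo, hi):
    """Sample variance of weight for observations with lo <= income < hi."""
    ys = [w for w, inc in zip(weights, incomes) if lo <= inc < hi]
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys) / (len(ys) - 1)

low_var = group_var(10, 30)    # spread of weight among low incomes
high_var = group_var(80, 100)  # spread of weight among high incomes
```

Under homoskedasticity the two group variances would be roughly equal; here `high_var` comes out much larger than `low_var`.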
Theorem 2.2 (Sampling Variances of the OLS Estimators)
Using assumptions SLR.1 through SLR.5,

Var(B1hat) = σ² / Σ(xi − xbar)² = σ²/SSTx  (2.57)

Var(B0hat) = σ²n⁻¹Σxi² / Σ(xi − xbar)²  (2.58)

Where these are conditional on the sample values {x1,…,xn}
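Equation (2.57) can be checked by Monte Carlo: holding the x's fixed, redraw the errors many times and compare the simulated variance of B1hat with σ²/SSTx. A small sketch (true parameter values assumed for illustration):

```python
import random
import statistics

random.seed(0)
beta0, beta1, sigma = 1.0, 2.0, 3.0        # assumed true parameters
x = [float(i) for i in range(1, 31)]       # fixed regressor values
n = len(x)
xbar = sum(x) / n
sst_x = sum((xi - xbar) ** 2 for xi in x)  # total sample variation in x

def ols_slope(y):
    # B1hat = sum (xi - xbar)(yi) / SSTx  (ybar term drops out)
    return sum((xi - xbar) * yi for xi, yi in zip(x, y)) / sst_x

# Redraw the errors 20,000 times and record B1hat each time
slopes = []
for _ in range(20_000):
    y = [beta0 + beta1 * xi + random.gauss(0, sigma) for xi in x]
    slopes.append(ols_slope(y))

mc_var = statistics.variance(slopes)   # simulated sampling variance
formula_var = sigma ** 2 / sst_x       # equation (2.57)
```

The simulated variance lands within simulation noise of the (2.57) value, and the average of the slopes sits near the true B1, consistent with unbiasedness.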
Theorem 2.2 Proof
From OLS's unbiasedness we know

B1hat = B1 + (1/SSTx)Σdiui, where di = xi − xbar  (2.52)

-we know that B1 is a constant and that SSTx and the di are non-random, therefore:

Var(B1hat) = (1/SSTx)²Var(Σdiui)
           = (1/SSTx)²Σdi²Var(ui)
           = (1/SSTx)²Σdi²σ²
           = σ²(1/SSTx)²SSTx
           = σ²/SSTx
Theorem 2.2 Notes
-(2.57) and (2.58) are "standard" OLS variances which are NOT valid if heteroskedasticity is present
1) Larger error variances increase B1hat's variance
-more variance in y's determinants makes B1 harder to estimate
2) Larger x variances decrease B1hat's variance
-more variance in x makes OLS easier to accurately plot
3) A bigger sample size increases x's variation
-therefore a bigger sample size decreases B1hat's variance
Theorem 2.2 Notes
-to get the best estimate of B1, we should choose x to be as spread out as possible
-this is most easily done by obtaining a large sample size-large sample sizes aren't always possible
-In order to conduct tests and create confidence intervals, we need the STANDARD DEVIATIONS of B0hat and B1hat-these are simply the square roots of the variances
2.5 Estimating the Error Variance
Unfortunately, we rarely know σ², although we can use data to estimate it
-recall that the error term comes from the population equation:

yi = B0 + B1xi + ui  (2.48)

-recall that residuals come from the estimated equation:

yi = B0hat + B1hat·xi + uihat  (2.32)

-errors aren't observable, while residuals are calculated from data
2.5 Estimating the Error Variance
These two formulas combine to give us:

uihat = yi − B0hat − B1hat·xi
      = (B0 + B1xi + ui) − B0hat − B1hat·xi
      = ui − (B0hat − B0) − (B1hat − B1)xi  (2.59)
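Equation (2.59) ties the computed residuals to the unobserved errors. Since the identity is pure algebra, it can be verified numerically on any simulated sample (true parameters assumed purely for illustration):

```python
import random

random.seed(7)
beta0, beta1 = 1.0, 2.0                    # assumed true parameters
n = 8
x = [float(i) for i in range(1, n + 1)]
u = [random.gauss(0, 1) for _ in range(n)]          # "unobserved" errors
y = [beta0 + beta1 * xi + ui for xi, ui in zip(x, u)]

# Fit OLS with an intercept
xbar, ybar = sum(x) / n, sum(y) / n
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) \
     / sum((xi - xbar) ** 2 for xi in x)
b0 = ybar - b1 * xbar
resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]

# Right-hand side of (2.59): u_i - (B0hat - B0) - (B1hat - B1) x_i
rhs = [ui - (b0 - beta0) - (b1 - beta1) * xi for xi, ui in zip(x, u)]
max_gap = max(abs(a - b) for a, b in zip(resid, rhs))  # should be ~0
```

The two sides agree to floating-point precision, confirming that residuals differ from errors exactly by the estimation error in B0hat and B1hat.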
-Furthermore, an unbiased estimator of σ² is:

(1/n)Σui²

-But we don't have data on ui, we only have data on uihat
2.5 Estimating the Error Variance
A natural estimator of σ² using uihat is:

(1/n)Σuihat² = SSR/n

-Unfortunately this estimator is biased as it doesn't account for two OLS restrictions:

(1/n)Σuihat = 0, Σxi·uihat = 0  (2.60)
2.5 Estimating the Error Variance
-Since we have two OLS restrictions, given any n−2 of the residuals we can calculate the remaining 2 residuals that satisfy the restrictions
-While the errors have n degrees of freedom, the residuals have n−2 degrees of freedom
-an unbiased estimator of σ² takes this into account:

σhat² = (1/(n−2))Σuihat² = SSR/(n−2)  (2.61)
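The degrees-of-freedom correction in (2.61) can be seen by Monte Carlo: over many simulated samples, SSR/n systematically underestimates σ² while SSR/(n−2) averages to it. A sketch (true values assumed for illustration):

```python
import random

random.seed(1)
beta0, beta1, sigma = 1.0, 0.5, 2.0   # assumed true parameters, sigma^2 = 4
n = 10
x = [float(i) for i in range(n)]
xbar = sum(x) / n
sst_x = sum((xi - xbar) ** 2 for xi in x)

def fit_ssr(y):
    """Fit OLS with an intercept and return the sum of squared residuals."""
    ybar = sum(y) / n
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sst_x
    b0 = ybar - b1 * xbar
    return sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))

biased, unbiased = [], []
for _ in range(20_000):
    y = [beta0 + beta1 * xi + random.gauss(0, sigma) for xi in x]
    ssr = fit_ssr(y)
    biased.append(ssr / n)          # ignores the two OLS restrictions
    unbiased.append(ssr / (n - 2))  # equation (2.61)

mean_biased = sum(biased) / len(biased)       # ~ sigma^2 (n-2)/n = 3.2
mean_unbiased = sum(unbiased) / len(unbiased) # ~ sigma^2 = 4.0
```

With n = 10 the bias is substantial (a factor of 8/10); it shrinks as n grows, which is why the distinction matters most in small samples.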
Theorem 2.3 (Unbiased Estimation of σ²)
Using assumptions SLR.1 through SLR.5,

E(σhat²) = σ²
Theorem 2.3 Proof
If we average (2.59) and remember that OLS residuals average to zero, we get:

0 = ubar − (B0hat − B0) − (B1hat − B1)xbar

Subtracting this from (2.59) we get:

uihat = (ui − ubar) − (B1hat − B1)(xi − xbar)
Theorem 2.3 Proof
Squaring, we get:

uihat² = (ui − ubar)² + (B1hat − B1)²(xi − xbar)² − 2(B1hat − B1)(ui − ubar)(xi − xbar)

Summing, we get:

Σuihat² = Σ(ui − ubar)² + (B1hat − B1)²Σ(xi − xbar)² − 2(B1hat − B1)Σui(xi − xbar)
Theorem 2.3 Proof
Expectations and statistics give us:

E(Σuihat²) = (n − 1)σ² + σ² − 2σ² = (n − 2)σ²

Given (2.61), we now prove the theorem since

E(σhat²) = E(SSR/(n − 2)) = σ²
2.5 Standard Error
-Given (2.57) and (2.58) we now have unbiased estimators of the variances of B1hat and B0hat
-furthermore, we can estimate σ as

σhat = √(σhat²)  (2.62)

-which is called the STANDARD ERROR OF THE REGRESSION (SER) or the STANDARD ERROR OF THE ESTIMATE (Shazam)
-although σhat is not an unbiased estimator of σ, it is consistent and appropriate for our needs
2.5 Standard Error
-since σhat estimates u's standard deviation, it in turn estimates y's standard deviation when x is factored out
-the natural estimator of sd(B1hat) is:

se(B1hat) = σhat/√SSTx = σhat/√(Σ(xi − xbar)²)

-which is called the STANDARD ERROR of B1hat
-note that this is a random variable as σhat varies across samples
-replacing σ with σhat in sd(B0hat) creates se(B0hat)
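Putting (2.61) and (2.62) together, se(B1hat) takes only a few lines to compute. A worked example on a small made-up dataset (values assumed purely for illustration):

```python
import math

# Assumed toy dataset
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sst_x = sum((xi - xbar) ** 2 for xi in x)

# OLS estimates
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sst_x
b0 = ybar - b1 * xbar

# sigma-hat via (2.61) and (2.62), then se(B1hat)
ssr = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
sigma_hat = math.sqrt(ssr / (n - 2))    # standard error of the regression
se_b1 = sigma_hat / math.sqrt(sst_x)    # standard error of B1hat
```

Here b1 = 1.96 with se(B1hat) ≈ 0.055, so the slope is estimated very precisely, as expected from the near-linear data.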
2.6 Regression through the Origin
-sometimes it makes sense to impose the restriction that when x=0, we also expect y=0
Examples:
-Calorie intake and weight gain
-Credit card debt and monthly payments
-Amount of House watched and how cool you are
-Number of classes attended and number of notes taken
2.6 Regression through the Origin
-this restriction sets our typical B0hat equal to zero and creates the following line:

ytilde = B1tilde·x  (2.63)

-where the tildes distinguish this problem from typical OLS estimation
-since this line passes through (0,0), it is called a REGRESSION THROUGH THE ORIGIN
-since the line is forced to go through a point, B1tilde is a BIASED estimator of B1 (unless the true B0 actually is zero)
2.6 Regression through the Origin
-our estimate B1tilde comes from minimizing the sum of our new squared residuals:

min Σ(yi − B1tilde·xi)²  (2.64)

-through the First Order Condition, we get:

Σxi(yi − B1tilde·xi) = 0  (2.65)

-which solves to

B1tilde = Σxiyi / Σxi²  (2.66)

-note that the xi cannot all equal zero, and if xbar equals zero, B1tilde equals B1hat
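Equation (2.66) and the bias it carries are easy to demonstrate. On data generated with a nonzero intercept (toy values assumed for illustration), the through-the-origin slope differs from the ordinary OLS slope:

```python
# Assumed toy data generated exactly by y = 1 + 2x, so B0 = 1 != 0
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]

# Equation (2.66): slope of the line forced through (0, 0)
b1_tilde = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)

# Ordinary OLS slope with an intercept, for comparison
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b1_hat = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) \
         / sum((xi - xbar) ** 2 for xi in x)
```

Ordinary OLS recovers the true slope (b1_hat = 2 exactly), while the through-the-origin estimate is pulled upward (b1_tilde = 7/3) because the line must absorb the nonzero intercept, illustrating the bias noted above.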