8. Instrumental variables regression
Recall:
• In Section 5 we analyzed five sources of estimation bias arising because the regressor is correlated with the error term
−→ Violation of the first OLS assumption
• These threats to internal validity are
Omitted variable bias
Misspecification of the functional form
Measurement error
Sample selection bias
Simultaneous causality
213
Now:
• General technique that helps to obtain a consistent estimator of the unknown coefficients when the regressor $X$ is correlated with the error term $u$
−→ Instrumental variables (IV) regression
Basic idea:
• Think of the variation in X as having two parts:
one part that is correlated with $u$ (the problematic part)
a second part that is uncorrelated with $u$ (the unproblematic part, which can be used for estimation)
214
Issues of this section:
• How can we separate the problematic from the unproblematic part of the variation in $X$?
−→ By the use of instrumental variables (instruments)
• What are good instruments and how can we find them?
215
8.1. The IV estimator with a single regressor and a single instrument
IV model and assumptions:
• We consider the single-regressor model
$Y_i = \beta_0 + \beta_1 X_i + u_i, \quad i = 1, \dots, n,$   (8.1)
• $X_i$ and $u_i$ are assumed to be correlated, that is
$\operatorname{Corr}(X_i, u_i) \neq 0$
• We use the additional instrumental variable $Z$ to isolate that part of $X_i$ that is uncorrelated with $u_i$
216
Terminology:
• We call variables correlated with the error term endogenous
• We call variables uncorrelated with the error term exogenous
Two conditions for a valid instrument Z:
1. Instrument relevance condition:
$\operatorname{Corr}(Z_i, X_i) \neq 0$
(variation in the instrument Zi is related to variation in Xi)
2. Instrument exogeneity condition:
$\operatorname{Corr}(Z_i, u_i) = 0$
(that part of the variation in Xi captured by Zi is exogenous)
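In real data $u_i$ is never observed, so the exogeneity condition cannot be checked by computing a sample correlation. In a simulation it can, and the sketch below makes the three correlations concrete. The data-generating process (standard normal draws, the coefficients 0.6 and 0.8) is an illustrative assumption, not part of the model.

```python
import random

def corr(a, b):
    """Sample correlation coefficient of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

random.seed(1)
n = 20_000
z = [random.gauss(0, 1) for _ in range(n)]  # instrument Z_i
u = [random.gauss(0, 1) for _ in range(n)]  # error term u_i (unobservable in practice)
# X_i loads on Z_i (relevance) and on u_i (endogeneity);
# Z_i is drawn independently of u_i (exogeneity)
x = [0.6 * zi + 0.8 * ui + random.gauss(0, 1) for zi, ui in zip(z, u)]

print("Corr(X, u) =", round(corr(x, u), 3))  # clearly non-zero: X is endogenous
print("Corr(Z, X) =", round(corr(z, x), 3))  # clearly non-zero: Z is relevant
print("Corr(Z, u) =", round(corr(z, u), 3))  # close to zero: Z is exogenous
```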
217
Implication of these conditions:
• The relevant and exogenous instrument $Z$ can capture movements in $X$ that are exogenous
• This exogenous part of $X$ can be used to consistently estimate $\beta_1$
Formalization of this concept:
• Two stage least squares estimation (TSLS)
• First stage:
Decomposition of X into the problematic and the problem-free components
• Second stage:
Use the problem-free component to estimate β1
218
Two stage least squares estimator:
1. Consider the regression equation
$X_i = \underbrace{\pi_0 + \pi_1 Z_i}_{\text{Part \#1}} + \underbrace{v_i}_{\text{Part \#2}}$   (8.2)
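Part #1 is that part of $X_i$ that can be predicted by $Z_i$
Since $Z_i$ is exogenous, and since $\operatorname{Cov}(\pi_0 + \pi_1 Z_i, u_i) = \pi_1 \operatorname{Cov}(Z_i, u_i)$ while the standard deviation of $\pi_0 + \pi_1 Z_i$ is $|\pi_1|$ times that of $Z_i$, it follows that
$\operatorname{Corr}(\pi_0 + \pi_1 Z_i, u_i) = \frac{\pi_1}{|\pi_1|} \operatorname{Corr}(Z_i, u_i) = 0$
(Part #1 is the problem-free part)
Part #2 is $v_i$, for which we have $\operatorname{Corr}(v_i, u_i) \neq 0$, because $\operatorname{Cov}(X_i, u_i) = \operatorname{Cov}(v_i, u_i)$ (Part #2 is the problematic part)
We apply OLS to Eq. (8.2) to obtain the estimates $\hat\pi_0$ and $\hat\pi_1$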
219
Two stage least squares estimator: [continued]
2. We use the predicted values $\hat X_i = \hat\pi_0 + \hat\pi_1 Z_i$ and consider the regression equation
$Y_i = \beta_0 + \beta_1 \hat X_i + u_i$   (8.3)
We apply OLS to Eq. (8.3) and obtain the TSLS estimators $\hat\beta_0^{TSLS}$ of $\beta_0$ and $\hat\beta_1^{TSLS}$ of $\beta_1$
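A minimal sketch of the two stages in Eqs. (8.2) and (8.3), assuming simulated data from an illustrative data-generating process with true $\beta_0 = 1$ and $\beta_1 = 2$. It uses plain Python least squares rather than any particular econometrics package, and compares the TSLS slope with the plain OLS slope.

```python
import random

def fit_line(y, x):
    """OLS intercept and slope of y on a single regressor x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope

random.seed(2)
n = 20_000
BETA0, BETA1 = 1.0, 2.0  # true coefficients of Eq. (8.1) (assumed)
z = [random.gauss(0, 1) for _ in range(n)]
u = [random.gauss(0, 1) for _ in range(n)]
x = [0.6 * zi + 0.8 * ui + random.gauss(0, 1) for zi, ui in zip(z, u)]  # endogenous X
y = [BETA0 + BETA1 * xi + ui for xi, ui in zip(x, u)]

# Stage 1, Eq. (8.2): regress X on Z and keep the predicted values
pi0_hat, pi1_hat = fit_line(x, z)
x_hat = [pi0_hat + pi1_hat * zi for zi in z]

# Stage 2, Eq. (8.3): regress Y on the predicted values
b0_tsls, b1_tsls = fit_line(y, x_hat)

_, b1_ols = fit_line(y, x)  # plain OLS of Y on X, for comparison
print("OLS  slope:", round(b1_ols, 3))
print("TSLS slope:", round(b1_tsls, 3))
```

With this design the OLS slope converges to $2 + \operatorname{Cov}(X_i, u_i)/\operatorname{Var}(X_i) = 2 + 0.8/2 = 2.4$, whereas the TSLS slope settles near the true value 2.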
220
Example:
• Estimation of the demand curve for butter based on data on the quantity of butter consumed ($Q_i^{butter}$) and butter prices ($P_i^{butter}$) sampled over $n$ years ($i = 1, \dots, n$)
• We aim at estimating the butter demand curve
$Y_i = \beta_0 + \beta_1 X_i + u_i,$
where
$Y_i = \ln(Q_i^{butter})$
$X_i = \ln(P_i^{butter})$
$\beta_1$ = price elasticity of butter demand
221
Example: [continued]
• We have a simultaneous causality bias here since there are causal links from $\ln(P_i^{butter})$ to $\ln(Q_i^{butter})$, but also from $\ln(Q_i^{butter})$ to $\ln(P_i^{butter})$ via the interaction between the demand for and the supply of butter
• It follows from Section 5.1.5 (Slides 143–145) that the regressor $\ln(P_i^{butter})$ is likely to be correlated with the error term
−→ OLS estimator of β1 will be inconsistent
222
Equilibrium price and quantity data
[Figures on Slides 223–225]
Example: [continued]
• To circumvent this problem we need an instrumental variable $Z_i$ which shifts the supply curve but leaves the demand curve unaffected
• Such an instrument $Z_i$ could be the variable RAINFALL in the butter-producing region
Relevance condition: Below-average rainfall reduces cattle grazing and thus reduces butter production at a given price, which shifts the supply curve and hence moves the equilibrium price:
$\operatorname{Corr}(\text{RAINFALL}_i, \ln(P_i^{butter})) \neq 0$
Exogeneity condition: Demand for butter does not depend on rainfall:
$\operatorname{Corr}(\text{RAINFALL}_i, u_i) = 0$
226
Example: [continued]
• TSLS estimation:
Stage 1: Regress $\ln(P_i^{butter})$ on $\text{RAINFALL}_i$ and compute the predicted values $\widehat{\ln(P_i^{butter})}$
(Isolation of price changes due to shifts in the supply curve)
Stage 2: Regress $\ln(Q_i^{butter})$ on $\widehat{\ln(P_i^{butter})}$
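The butter example can be mimicked with simulated market data. The sketch below assumes linear log-demand and log-supply curves, a true elasticity of -1.2, and a rainfall effect that shifts only the supply curve; all of these numbers are hypothetical. Each observation is the equilibrium where the two curves intersect, which is exactly what makes $\ln(P_i^{butter})$ correlated with the demand shock.

```python
import random

def fit_line(y, x):
    """OLS intercept and slope of y on a single regressor x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope

random.seed(3)
n = 20_000
ELASTICITY = -1.2   # true demand slope beta_1 (assumed)
SUPPLY_SLOPE = 0.8  # supply response to ln(price) (assumed)
RAIN_EFFECT = 0.5   # more rain -> more butter supplied at a given price (assumed)

rain, ln_p, ln_q = [], [], []
for _ in range(n):
    r = random.gauss(0, 1)    # standardized RAINFALL_i
    u_d = random.gauss(0, 1)  # demand shock = error term u_i
    u_s = random.gauss(0, 1)  # supply shock
    # Demand: ln q = 5 + ELASTICITY * ln p + u_d
    # Supply: ln q = 1 + SUPPLY_SLOPE * ln p + RAIN_EFFECT * r + u_s
    # Setting demand equal to supply gives the equilibrium ln p:
    p = (1 - 5 + RAIN_EFFECT * r + u_s - u_d) / (ELASTICITY - SUPPLY_SLOPE)
    rain.append(r)
    ln_p.append(p)
    ln_q.append(5 + ELASTICITY * p + u_d)

# Stage 1: regress ln(P) on RAINFALL, keep the fitted prices (supply-driven part)
c0, c1 = fit_line(ln_p, rain)
ln_p_hat = [c0 + c1 * r for r in rain]
# Stage 2: regress ln(Q) on the fitted prices
_, elasticity_tsls = fit_line(ln_q, ln_p_hat)
_, elasticity_ols = fit_line(ln_q, ln_p)

print("OLS  estimate:", round(elasticity_ols, 3))
print("TSLS estimate:", round(elasticity_tsls, 3))
```

In this design OLS mixes the demand and supply responses and lands far from -1.2, while TSLS, which only uses the rainfall-driven price movements along the demand curve, recovers it.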
227
Statistical inference for TSLS:
• It can be shown that the TSLS estimator $\hat\beta_1^{TSLS}$ is consistent and, in large samples, approximately normally distributed:
$\hat\beta_1^{TSLS} \sim N\big(\beta_1, \sigma^2_{\hat\beta_1^{TSLS}}\big),$
where
$\sigma^2_{\hat\beta_1^{TSLS}} = \frac{1}{n} \cdot \frac{\operatorname{Var}\{[Z_i - E(Z)] \cdot u_i\}}{[\operatorname{Cov}(Z_i, X_i)]^2}$   (8.4)
• The standard error of $\hat\beta_1^{TSLS}$ can be estimated by estimating the variance and covariance terms appearing on the right-hand side of Eq. (8.4) and taking the square root of the estimate of $\sigma^2_{\hat\beta_1^{TSLS}}$
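A sketch of the standard error from Eq. (8.4): the variance and covariance are replaced by their sample counterparts, and $u_i$ by the TSLS residuals computed from the observed $X_i$. It relies on the standard fact that with a single instrument the TSLS slope equals the ratio of the sample covariances $\widehat{\operatorname{Cov}}(Z_i, Y_i)/\widehat{\operatorname{Cov}}(Z_i, X_i)$. A small Monte Carlo experiment under an assumed data-generating process compares the average estimated standard error with the actual spread of the estimates; the last lines form a 95% confidence interval in the usual way.

```python
import random
import statistics

def tsls_with_se(y, x, z):
    """TSLS slope and its standard error from the sample analogue of Eq. (8.4)."""
    n = len(y)
    mz, mx, my = sum(z) / n, sum(x) / n, sum(y) / n
    cov_zx = sum((a - mz) * (b - mx) for a, b in zip(z, x)) / n
    cov_zy = sum((a - mz) * (b - my) for a, b in zip(z, y)) / n
    b1 = cov_zy / cov_zx          # single instrument: TSLS slope = Cov(Z,Y) / Cov(Z,X)
    b0 = my - b1 * mx
    # Sample counterpart of [Z_i - E(Z)] * u_i; residuals use the observed X_i
    w = [(zi - mz) * (yi - b0 - b1 * xi) for zi, xi, yi in zip(z, x, y)]
    se = (statistics.pvariance(w) / n) ** 0.5 / abs(cov_zx)
    return b1, se

def simulate(n, rng):
    """One sample from an assumed DGP with true beta_1 = 2."""
    z = [rng.gauss(0, 1) for _ in range(n)]
    u = [rng.gauss(0, 1) for _ in range(n)]
    x = [0.6 * zi + 0.8 * ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
    y = [1.0 + 2.0 * xi + ui for xi, ui in zip(x, u)]
    return y, x, z

rng = random.Random(4)
estimates, std_errors = [], []
for _ in range(300):
    b1, se = tsls_with_se(*simulate(500, rng))
    estimates.append(b1)
    std_errors.append(se)

# The average estimated SE should be close to the Monte Carlo spread of the estimates
print("Monte Carlo s.d.:", round(statistics.stdev(estimates), 4))
print("mean SE (8.4):  ", round(statistics.fmean(std_errors), 4))

# 95% confidence interval from the first simulated sample, built "in the usual way"
b1, se = estimates[0], std_errors[0]
print("95% CI:", (round(b1 - 1.96 * se, 3), round(b1 + 1.96 * se, 3)))
```

Because Eq. (8.4) does not assume a constant error variance, this estimate is heteroskedasticity-robust.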
228
Statistical inference for TSLS: [continued]
• These standard errors are routinely computed by econometric software packages like EViews
• Because $\hat\beta_1^{TSLS}$ is approximately normally distributed in large samples, hypothesis tests and confidence intervals about $\beta_1$ can be conducted in the usual way
Attention:
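• The usual OLS standard errors from the Stage 2 regression are not identical to the TSLS standard errors described above and are therefore invalid (since they treat the predicted values $\hat X_i$ as if they were the observed regressor and thus ignore the prediction errors $X_i - \hat X_i$ when computing the residuals)
• One should use the special TSLS routines implemented in the software packages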
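The following sketch illustrates the warning numerically, under an assumed data-generating process and using homoskedasticity-only formulas for simplicity. Both standard errors divide by the variation of $\hat X_i$; they differ only in the residuals. The plain Stage 2 output uses $Y_i - \hat\beta_0 - \hat\beta_1 \hat X_i$, whereas a TSLS routine uses $Y_i - \hat\beta_0 - \hat\beta_1 X_i$.

```python
import random

def fit_line(y, x):
    """OLS intercept and slope of y on a single regressor x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope

random.seed(5)
n = 2_000
z = [random.gauss(0, 1) for _ in range(n)]
u = [random.gauss(0, 1) for _ in range(n)]
x = [0.6 * zi + 0.8 * ui + random.gauss(0, 1) for zi, ui in zip(z, u)]
y = [1.0 + 2.0 * xi + ui for xi, ui in zip(x, u)]

pi0, pi1 = fit_line(x, z)                # Stage 1
x_hat = [pi0 + pi1 * zi for zi in z]
b0, b1 = fit_line(y, x_hat)              # Stage 2 (TSLS point estimates)

mean_x_hat = sum(x_hat) / n
sxx_hat = sum((a - mean_x_hat) ** 2 for a in x_hat)

def homoskedastic_se(residuals):
    """Homoskedasticity-only SE of the slope, given a residual series."""
    s2 = sum(e * e for e in residuals) / (n - 2)
    return (s2 / sxx_hat) ** 0.5

# Plain Stage 2 OLS output: residuals built from the predicted values
naive_se = homoskedastic_se([yi - b0 - b1 * xh for yi, xh in zip(y, x_hat)])
# TSLS routine: residuals built from the observed X_i
tsls_se = homoskedastic_se([yi - b0 - b1 * xi for yi, xi in zip(y, x)])
print("naive Stage 2 SE:", round(naive_se, 4))
print("TSLS SE:         ", round(tsls_se, 4))
```

Here the naive residual equals the TSLS residual plus $\hat\beta_1 (X_i - \hat X_i)$, which inflates it considerably in this design.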
229
8.2. The general IV regression model
Now:
• Generalization of the IV regression model to multiple regressors and instruments
Four types of variables:
• The dependent variable Y
• Problematic endogenous regressors
• Included exogenous regressors
• Instrumental variables
230
Definition 8.1: (General IV regression model)
The general IV regression model is
$Y_i = \beta_0 + \beta_1 X_{1i} + \dots + \beta_k X_{ki} + \beta_{k+1} W_{1i} + \dots + \beta_{k+r} W_{ri} + u_i,$   (8.5)
$i = 1, \dots, n$, where
• $Y_i$ is the dependent variable,
• $\beta_0, \beta_1, \dots, \beta_{k+r}$ are unknown regression coefficients,
• $X_{1i}, \dots, X_{ki}$ are $k$ endogenous regressors potentially correlated with $u_i$,
• $W_{1i}, \dots, W_{ri}$ are $r$ included exogenous regressors which are uncorrelated with $u_i$ or are control variables,
• $u_i$ is the error term,
• $Z_{1i}, \dots, Z_{mi}$ are $m$ instrumental variables.
231
Definition 8.1: (General IV regression model) [continued]
The coefficients are overidentified if there are more instruments than endogenous regressors ($m > k$), they are underidentified if $m < k$, and they are exactly identified if $m = k$. Estimation of the IV regression model requires exact identification or overidentification.
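The counting rule in Definition 8.1 is simple enough to write down directly. The helper names below are hypothetical and only encode the comparison of $m$ with $k$.

```python
def identification_status(m: int, k: int) -> str:
    """Classify IV coefficients given m instruments and k endogenous regressors."""
    if m > k:
        return "overidentified"
    if m == k:
        return "exactly identified"
    return "underidentified"

def estimable(m: int, k: int) -> bool:
    """IV estimation needs exact identification or overidentification."""
    return identification_status(m, k) != "underidentified"

# Butter example: one instrument (RAINFALL), one endogenous regressor (log price)
print(identification_status(1, 1))
print(identification_status(2, 1), "- degree of overidentification:", 2 - 1)
print(identification_status(1, 2), estimable(1, 2))
```

In the butter example there is one instrument and one endogenous regressor, so the coefficients are exactly identified.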
Now:
• Adaptation of the TSLS principle to the general IV model described in Definition 8.1
232
TSLS in the general IV model:
Consider the general IV regression model (8.5) from Slide 231
1. First-stage regression(s):
Regress $X_{1i}$ on the instrumental variables $(Z_{1i}, \dots, Z_{mi})$ and the included exogenous variables $(W_{1i}, \dots, W_{ri})$ using OLS, that is, estimate the following equation via OLS:
$X_{1i} = \pi_0 + \pi_1 Z_{1i} + \dots + \pi_m Z_{mi} + \pi_{m+1} W_{1i} + \dots + \pi_{m+r} W_{ri} + v_i$   (8.6)
Compute the predicted values $\hat X_{1i}$ from this regression
Repeat this for all endogenous regressors $X_{2i}, \dots, X_{ki}$, thereby computing the predicted values $\hat X_{2i}, \dots, \hat X_{ki}$
233
TSLS in the general IV model: [continued]
2. Second-stage regression
Regress $Y_i$ on the predicted values of the endogenous variables $\hat X_{1i}, \dots, \hat X_{ki}$ and the included exogenous variables $(W_{1i}, \dots, W_{ri})$, that is, estimate the following equation via OLS:
$Y_i = \beta_0 + \beta_1 \hat X_{1i} + \dots + \beta_k \hat X_{ki} + \beta_{k+1} W_{1i} + \dots + \beta_{k+r} W_{ri} + u_i$   (8.7)
The TSLS estimators $\hat\beta_0^{TSLS}, \dots, \hat\beta_{k+r}^{TSLS}$ are the OLS estimators from the second-stage regression (8.7)
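A sketch of the general TSLS procedure in Eqs. (8.6) and (8.7), written with a small hand-rolled least-squares solver so that it needs only the Python standard library. The helper names (`solve`, `ols`, `fitted`, `tsls`) and the simulated design with $k = 1$, $r = 1$, $m = 2$ and true coefficients $(1, 2, -1)$ are assumptions for illustration; in practice one would use a dedicated TSLS routine, which also reports valid standard errors.

```python
import random

def solve(a, b):
    """Solve a @ beta = b by Gauss-Jordan elimination with partial pivoting."""
    p = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(p):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, p + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][p] / m[i][i] for i in range(p)]

def ols(y, columns):
    """OLS coefficients [intercept, slopes...] of y on the given regressor columns."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(rows[0])
    xtx = [[sum(row[a] * row[b] for row in rows) for b in range(p)] for a in range(p)]
    xty = [sum(row[a] * yi for row, yi in zip(rows, y)) for a in range(p)]
    return solve(xtx, xty)

def fitted(coef, columns):
    """Predicted values coef[0] + sum_j coef[j] * columns[j-1]."""
    return [coef[0] + sum(c * col[i] for c, col in zip(coef[1:], columns))
            for i in range(len(columns[0]))]

def tsls(y, endog, exog, instruments):
    """TSLS for Eq. (8.5): returns [beta_0, beta_1..beta_k, beta_{k+1}..beta_{k+r}]."""
    first_stage_regressors = instruments + exog          # Z's and W's, Eq. (8.6)
    x_hats = [fitted(ols(x, first_stage_regressors), first_stage_regressors)
              for x in endog]
    return ols(y, x_hats + exog)                          # second stage, Eq. (8.7)

rng = random.Random(6)
n = 5_000
z1 = [rng.gauss(0, 1) for _ in range(n)]
z2 = [rng.gauss(0, 1) for _ in range(n)]
w = [rng.gauss(0, 1) for _ in range(n)]
u = [rng.gauss(0, 1) for _ in range(n)]
x = [0.5 * za + 0.5 * zb + 0.3 * wi + 0.8 * ui + rng.gauss(0, 1)
     for za, zb, wi, ui in zip(z1, z2, w, u)]
y = [1.0 + 2.0 * xi - 1.0 * wi + ui for xi, wi, ui in zip(x, w, u)]

coefs = tsls(y, endog=[x], exog=[w], instruments=[z1, z2])
print("TSLS [beta_0, beta_1, beta_2]:", [round(c, 3) for c in coefs])
```

Solving the normal equations directly is adequate for this small, well-conditioned example, but a QR-based least-squares routine is numerically safer in general.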
Remark:
• The two stages are done automatically within TSLS estimation commands in EViews
234
Now:
• Adaptation of the conditions for a valid instrument $Z$ from Slide 217 (relevance and exogeneity) to the general IV regression model
Intuitively:
• When there are multiple included endogenous variables, the condition for instrument relevance
must be formulated in a way that rules out multicollinearity in the second-stage regression
should reflect that the instruments provide enough information about the exogenous movements in the endogenous variables to sort out their separate effects on $Y$
235
Definition 8.2: (Conditions for valid instruments)
A set of $m$ instruments $Z_{1i}, \dots, Z_{mi}$ must satisfy the following two conditions to be valid:
1. Instrument relevance:
In general, let $X^*_{1i}$ be the predicted value of $X_{1i}$ from the regression of $X_{1i}$ on the instruments $Z_{1i}, \dots, Z_{mi}$ and the included exogenous regressors $W_{1i}, \dots, W_{ri}$, and let $X^*_{2i}, \dots, X^*_{ki}$ be analogously defined. Furthermore, let $\mathbf{1}$ denote the $n$-dimensional vector $\mathbf{1} \equiv (1, \dots, 1)'$. Then $(X^*_1, \dots, X^*_k, W_1, \dots, W_r, \mathbf{1})$ are not perfectly multicollinear.
236
Definition 8.2: (Conditions for valid instruments) [continued]
1. Instrument relevance: [continued]
If there is only one endogenous regressor $X_i$, then for the previous condition to hold, at least one instrument $Z_{ji}$ $(j = 1, \dots, m)$ must have a non-zero coefficient in the regression equation
$X_i = \pi_0 + \pi_1 Z_{1i} + \dots + \pi_m Z_{mi} + \pi_{m+1} W_{1i} + \dots + \pi_{m+r} W_{ri} + v_i.$
2. Instrument exogeneity:
All instruments are uncorrelated with the error term:
$\operatorname{Corr}(Z_{1i}, u_i) = 0, \dots, \operatorname{Corr}(Z_{mi}, u_i) = 0.$
237
Next:
• Under which conditions are the TSLS estimators consistent and have a sampling distribution that is normal in large samples?
• If we can specify such conditions, then the principles of statistical inference for TSLS in the single-regressor case as described on Slides 228–229 carry over to the general case of multiple instruments and multiple endogenous variables (t-statistics, F-statistics, confidence intervals)
238
The IV regression assumptions:
The variables and errors in the IV regression model in Eq. (8.5) should satisfy the following conditions:
1. $E(u_i \mid W_{1i}, \dots, W_{ri}) = 0$
2. $(X_{1i}, \dots, X_{ki}, W_{1i}, \dots, W_{ri}, Z_{1i}, \dots, Z_{mi}, Y_i)$ are i.i.d. draws from their joint distribution
3. Large outliers are unlikely: the $X$'s, $W$'s, $Z$'s, and $Y$ have nonzero finite fourth moments
4. The two conditions for valid instruments stated in Definition 8.2 hold
239
Remarks:
• The calculation of TSLS standard errors is done automatically by software packages like EViews
• One should use heteroskedasticity-robust standard errors for the same reasons as in the conventional multiple linear regression model
240
8.3. Checking instrument validity
Important question:
• Is a given set of instruments valid in a particular application?
Meaning of 'instrument relevance':
• Instrument relevance plays a role akin to the sample size
• A more relevant instrument produces a more accurate estimator, just as a larger sample size produces a more accurate estimator
• The more relevant the instrument, the better the normal approximation to the sampling distribution of the TSLS estimator and its t- and F-statistics
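The analogy with sample size can be checked by simulation. The sketch below, under an assumed data-generating process with true $\beta_1 = 2$ and $n = 200$, draws the single-instrument TSLS estimator many times for a strong and a weak first-stage coefficient $\pi_1$ and compares the spread of the draws. It uses the interquartile range because a weak instrument occasionally produces extreme estimates that would dominate a standard deviation.

```python
import random
import statistics

def tsls_slope(y, x, z):
    """Single-instrument TSLS slope, Cov(Z, Y) / Cov(Z, X)."""
    n = len(z)
    mz, mx, my = sum(z) / n, sum(x) / n, sum(y) / n
    return (sum((a - mz) * (c - my) for a, c in zip(z, y))
            / sum((a - mz) * (c - mx) for a, c in zip(z, x)))

def tsls_draws(pi1, n=200, reps=500, seed=7):
    """Monte Carlo draws of the TSLS slope; pi1 sets instrument strength (assumed DGP)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(reps):
        z = [rng.gauss(0, 1) for _ in range(n)]
        u = [rng.gauss(0, 1) for _ in range(n)]
        x = [pi1 * zi + 0.8 * ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
        y = [1.0 + 2.0 * xi + ui for xi, ui in zip(x, u)]
        draws.append(tsls_slope(y, x, z))
    return draws

def iqr(values):
    """Interquartile range, a spread measure robust to extreme draws."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

strong = tsls_draws(pi1=1.0)
weak = tsls_draws(pi1=0.1)
for label, draws in (("strong", strong), ("weak", weak)):
    print(label, "median:", round(statistics.median(draws), 3),
          "IQR:", round(iqr(draws), 3))
```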
241
Problems with 'weak' instruments:
• If the instruments are 'weak', then the TSLS estimator can be badly biased, and the normal distribution is a poor approximation to the sampling distribution of the TSLS estimator
−→ No justification for performing statistical inference as described, even when the sample size is large
−→ TSLS is no longer reliable
Checking for 'weak' instruments:
• How relevant must instruments be for the normal distribution to provide a good approximation in practice?
• Complicated answer in the general IV model
• Simple rule of thumb in the practically most relevant case of a single endogenous regressor
242
Rule of thumb 8.3: (Checking for weak instruments)
Consider the first-stage $F$-statistic testing the hypothesis that the coefficients on the instruments $Z_{1i}, \dots, Z_{mi}$ in the first-stage regression (8.6) on Slide 233 are all simultaneously equal to zero:
$H_0: \pi_1 = \pi_2 = \dots = \pi_m = 0$ vs.
$H_1:$ At least one $\pi_j \neq 0$ $(j = 1, \dots, m)$.
When there is a single endogenous regressor, a first-stage $F$-statistic less than 10 indicates that the instruments are weak. In this case the TSLS estimator is biased (even in large samples) and the TSLS t-statistics and confidence intervals are unreliable.
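For a single instrument and no included exogenous regressors, the first-stage $F$-statistic of Rule of thumb 8.3 can be computed from the restricted and unrestricted sums of squared residuals; with one instrument it equals the squared homoskedasticity-only t-statistic on $\hat\pi_1$. The simulated first stages and the two values of $\pi_1$ below are assumptions for illustration.

```python
import random

def first_stage_f(x, z):
    """Homoskedasticity-only F-statistic for H0: pi_1 = 0 in X_i = pi_0 + pi_1 Z_i + v_i."""
    n = len(x)
    mz, mx = sum(z) / n, sum(x) / n
    pi1 = (sum((a - mz) * (b - mx) for a, b in zip(z, x))
           / sum((a - mz) ** 2 for a in z))
    pi0 = mx - pi1 * mz
    ssr_unrestricted = sum((b - pi0 - pi1 * a) ** 2 for a, b in zip(z, x))
    ssr_restricted = sum((b - mx) ** 2 for b in x)  # intercept-only model under H0
    q = 1                                          # number of restrictions (instruments)
    return ((ssr_restricted - ssr_unrestricted) / q) / (ssr_unrestricted / (n - q - 1))

rng = random.Random(8)

def draw(pi1, n=200):
    """Draw (x, z) from an assumed first stage X = pi1 * Z + v."""
    z = [rng.gauss(0, 1) for _ in range(n)]
    x = [pi1 * zi + rng.gauss(0, 1) for zi in z]
    return x, z

f_strong = first_stage_f(*draw(0.5))
f_weak = first_stage_f(*draw(0.02))
for label, f in (("pi1 = 0.5 ", f_strong), ("pi1 = 0.02", f_weak)):
    verdict = "weak by the rule of thumb" if f < 10 else "not flagged as weak"
    print(label, "F =", round(f, 1), "->", verdict)
```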
243
Meaning of 'instrument exogeneity':
• If the instruments are not exogenous, then the TSLS estimator is inconsistent
−→ TSLS estimation and inference based on it are unreliable
Statistical tests for exogenous instruments:
• No statistical tests are available when the coefficients are exactly identified (that is, when $m = k$ in the IV model (8.5) on Slide 231)
• If the coefficients are overidentified, that is, when $m > k$ in Eq. (8.5), it is possible to test the hypothesis that the 'extra' instruments are exogenous under the maintained assumption that there are enough valid instruments to identify the coefficients of interest
244
Theorem 8.4: (The overidentifying restrictions test)
Let $\hat u_i^{TSLS}$ be the residuals from TSLS estimation of Eq. (8.5) from Slide 231. Use OLS to estimate the regression coefficients in
$\hat u_i^{TSLS} = \delta_0 + \delta_1 Z_{1i} + \dots + \delta_m Z_{mi} + \delta_{m+1} W_{1i} + \dots + \delta_{m+r} W_{ri} + e_i,$   (8.8)
where $e_i$ is the regression error term. Let $F$ denote the homoskedasticity-only $F$-statistic testing the null hypothesis
$H_0: \delta_1 = \dots = \delta_m = 0.$
The overidentifying restrictions test statistic is
$J = m \cdot F.$
(The J-test.)
245
Theorem 8.4: (The J-test) [continued]
Under the null hypothesis that all instruments are exogenous (suggesting that the instruments should be approximately uncorrelated with $\hat u_i^{TSLS}$), and if $e_i$ is homoskedastic, in large samples $J$ is distributed $\chi^2_{m-k}$, where $m - k$ is the 'degree of overidentification', that is, the number of instruments minus the number of endogenous regressors.
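A sketch of Theorem 8.4 for one endogenous regressor, two instruments and no included exogenous regressors ($k = 1$, $m = 2$, $r = 0$), so $J$ is compared with 3.841, the 5% critical value of $\chi^2_1$. The two simulated designs are assumptions: in the second one $Z_2$ is correlated with $u_i$ and therefore invalid. The least-squares helpers are hypothetical and use the standard library only.

```python
import random

def solve(a, b):
    """Solve a @ beta = b by Gauss-Jordan elimination with partial pivoting."""
    p = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(p):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, p + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][p] / m[i][i] for i in range(p)]

def ols(y, columns):
    """OLS coefficients [intercept, slopes...] of y on the given regressor columns."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(rows[0])
    xtx = [[sum(row[a] * row[b] for row in rows) for b in range(p)] for a in range(p)]
    xty = [sum(row[a] * yi for row, yi in zip(rows, y)) for a in range(p)]
    return solve(xtx, xty)

def j_statistic(y, x, z1, z2):
    """J = m * F (Theorem 8.4) with one endogenous regressor and two instruments."""
    n, m = len(y), 2
    c = ols(x, [z1, z2])                                      # first stage
    x_hat = [c[0] + c[1] * a + c[2] * b for a, b in zip(z1, z2)]
    b0, b1 = ols(y, [x_hat])                                  # second stage
    u_hat = [yi - b0 - b1 * xi for yi, xi in zip(y, x)]       # TSLS residuals use X_i
    d = ols(u_hat, [z1, z2])                                  # regression (8.8)
    ssr_u = sum((e - d[0] - d[1] * a - d[2] * b) ** 2
                for e, a, b in zip(u_hat, z1, z2))
    mean_u = sum(u_hat) / n
    ssr_r = sum((e - mean_u) ** 2 for e in u_hat)             # under H0: delta_1 = delta_2 = 0
    f = ((ssr_r - ssr_u) / m) / (ssr_u / (n - m - 1))         # homoskedasticity-only F
    return m * f

def simulate(z2_loading_on_u, rng, n=1_000):
    """Assumed DGP with true beta_1 = 2; a non-zero loading makes Z2 invalid."""
    z1 = [rng.gauss(0, 1) for _ in range(n)]
    u = [rng.gauss(0, 1) for _ in range(n)]
    z2 = [z2_loading_on_u * ui + rng.gauss(0, 1) for ui in u]
    x = [0.5 * a + 0.5 * b + 0.8 * ui + rng.gauss(0, 1) for a, b, ui in zip(z1, z2, u)]
    y = [1.0 + 2.0 * xi + ui for xi, ui in zip(x, u)]
    return y, x, z1, z2

rng = random.Random(9)
CRITICAL_5PCT = 3.841  # 95% quantile of chi-squared with m - k = 1 degree of freedom
j_valid = j_statistic(*simulate(0.0, rng))
j_invalid = j_statistic(*simulate(0.5, rng))
for label, j in (("valid Z2  ", j_valid), ("invalid Z2", j_invalid)):
    print(label, "J =", round(j, 2), "-> reject" if j > CRITICAL_5PCT else "-> do not reject")
```

With valid instruments the test still rejects about 5% of the time, so a single simulated value of $J$ says little on its own; the invalid design, however, produces a $J$ far above the critical value.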
Remark:
• An application of the J-test is provided in the case study 'The demand for cigarettes'
−→ See class for details
246
8.4. Where do valid instruments come from?
Important question:
• How can we find instrumental variables for a given application that are both relevant and exogenous?
Two main approaches:
1. Use economic theory to suggest instruments
2. Find an exogenous source of variation in $X$ arising from a random phenomenon that induces shifts in the endogenous regressor
247
Example of Approach #1:
• Consider the butter demand example from Section 8.1.
• Understanding of the economics of agricultural markets leads us to look for an instrument that shifts the supply curve but not the demand curve
• This leads us to consider weather conditions in agricultural regions
−→ Instrumental variable: RAINFALL in agricultural regions
248
Example of Approach #2:
• Consider the effect on test scores of class size
• The regressor CLASS SIZE may be correlated with the error term because of omitted variable bias
• In some districts, however, earthquake damage may increase the average class size
• This variation in class size may be unrelated to potentially omitted variables that affect student achievement
−→ Instrumental variable: that portion of CLASS SIZE that accrues to earthquake damage
249
Case studies:
• Three examples of how researchers use their expert knowledge of their empirical problem to find adequate instrumental variables:
Does putting criminals in jail reduce crime?
Does cutting class sizes increase test scores?
Does aggressive treatment of heart attacks prolong lives?
(see class for a thorough discussion)
250