Econometrics Course: Endogeneity & Simultaneity Mark W. Smith.
Overview
Endogeneity
– Sources
– Responses
Omitted Variables
Measurement Error
Proxy Variables
Method of Instrumental Variables
– Properties
– Validity and strength of instruments
Definition of Endogeneity
Suppose we have a regression equation
$y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$
The variable $x_1$ is endogenous if it is correlated with the error term $\varepsilon$.
Note that this is related to, but not identical to, the heuristic definition that “x1 is determined within the model.”
Sources of Endogeneity
1. Omitted variables
If the true model underlying the data is
$y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \varepsilon$
but you estimate the model
$y = \alpha + \beta_1 x_1 + \beta_2 x_2 + u$
then variable $x_1$ will be endogenous if it is correlated with $x_3$. Why? Because the omitted term ends up in the error: $u = \beta_3 x_3 + \varepsilon$, so $u = f(x_3)$.
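A small simulation can make the omitted-variable argument concrete. The sketch below is only an illustration on made-up data: the coefficient values, the sample size, and the helper name `slope` are assumptions, and the second regressor is left out for brevity. The omitted variable is built to be correlated with x1, so the slope from the short regression should land near 2 + 3 × 0.5 = 3.5 rather than the true value of 2.

```python
import random

random.seed(0)
n = 20000
beta1, beta3 = 2.0, 3.0  # hypothetical true coefficients

x1, y = [], []
for _ in range(n):
    a = random.gauss(0, 1)
    b = 0.5 * a + random.gauss(0, 1)  # omitted variable x3, correlated with x1
    x1.append(a)
    y.append(1.0 + beta1 * a + beta3 * b + random.gauss(0, 1))


def slope(x, y):
    """OLS slope from a regression of y on x with an intercept."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


short_slope = slope(x1, y)  # x3 is left out, so it sits in the error term
print(f"true beta1 = {beta1}, estimate with x3 omitted = {short_slope:.2f}")
```

The gap between the estimate and the true coefficient is the omitted coefficient times the slope of x3 on x1, which is why the bias disappears when the two are uncorrelated.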
Sources of Endogeneity
2. Measurement error
Suppose the true model underlying the data is
$y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$
but you estimate the model
$y = \alpha + \beta_1 x_1 + \beta_2 \tilde{x}_2 + u$
where $\tilde{x}_2 = x_2 + \nu$ is the measured value and $\nu$ is the measurement error.
Sources of Endogeneity
2. Measurement error - continued
The measured variable $\tilde{x}_2$ will be endogenous if the measurement error $\nu$ depends on $x_2$.
Example: Suppose that $x_2$ measures hospital size (no. of beds), and that the measurement error is greater for larger hospitals. Then as $x_2$ grows, so does $\nu$. Thus $\nu$ is correlated with $x_2$, causing endogeneity.
Sources of Endogeneity
2. Measurement error - continued
Rearranging the equation, we have
$y = \alpha + \beta_1 x_1 + \beta_2 \tilde{x}_2 + u$
$y = \alpha + \beta_1 x_1 + \beta_2 (x_2 + \nu) + u$
$y = \alpha + \beta_1 x_1 + \beta_2 x_2 + (\beta_2 \nu + u)$
If $\nu = f(x_2)$ then the error term is correlated with $x_2$, causing endogeneity.
Sources of Endogeneity
3. Simultaneity
A system of simultaneous equations occurs when two or more left-hand side variables are functions of each other (there are other ways of stating it, too):
$y_1 = \alpha_1 + \beta_1 x_1 + \gamma_1 y_2 + \varepsilon_1$
$y_2 = \alpha_2 + \beta_2 x_1 + \gamma_2 y_1 + \varepsilon_2$
Sources of Endogeneity
3. Simultaneity
With some algebra you can rewrite these two equations in “reduced form” as a single equation with an endogenous regressor.
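As a sketch of that algebra, suppose the two structural equations are written as $y_1 = \alpha_1 + \beta_1 x_1 + \gamma_1 y_2 + \varepsilon_1$ and $y_2 = \alpha_2 + \beta_2 x_1 + \gamma_2 y_1 + \varepsilon_2$. The coefficient symbols are assumed notation, and $\gamma_1 \gamma_2 \neq 1$ is assumed so that the system can be solved. Substituting the second equation into the first gives:

```latex
\begin{aligned}
y_1 &= \alpha_1 + \beta_1 x_1
      + \gamma_1\,(\alpha_2 + \beta_2 x_1 + \gamma_2 y_1 + \varepsilon_2)
      + \varepsilon_1 \\
(1 - \gamma_1\gamma_2)\, y_1 &= (\alpha_1 + \gamma_1\alpha_2)
      + (\beta_1 + \gamma_1\beta_2)\, x_1
      + (\varepsilon_1 + \gamma_1\varepsilon_2) \\
y_1 &= \frac{(\alpha_1 + \gamma_1\alpha_2)
      + (\beta_1 + \gamma_1\beta_2)\, x_1
      + (\varepsilon_1 + \gamma_1\varepsilon_2)}{1 - \gamma_1\gamma_2} \\[4pt]
y_2 &= \frac{(\alpha_2 + \gamma_2\alpha_1)
      + (\beta_2 + \gamma_2\beta_1)\, x_1
      + (\varepsilon_2 + \gamma_2\varepsilon_1)}{1 - \gamma_1\gamma_2}
\end{aligned}
```

The last line shows that $y_2$ contains $\varepsilon_1$ whenever $\gamma_2 \neq 0$. In the first structural equation, $y_2$ is therefore a regressor correlated with that equation's own error term, which is exactly the definition of an endogenous regressor.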
Pretesting for Endogeneity
The most famous test is Hausman (1978). Many others are described in Nakamura and Nakamura (1998).
Idea: the method of instrumental variables (IV) uses two-stage least squares (2SLS). If there is no endogeneity, it is more efficient to use OLS. If there is endogeneity, OLS is inconsistent and so 2SLS is best.
Pretesting for Endogeneity
Problem: the tests all have low power, particularly when 2SLS would cause a significant loss of efficiency.
In practice, many people use a Hausman test, fail to reject the null hypothesis of no endogeneity, and then use OLS.
A more statistically reliable approach is to base judgments of endogeneity on how the system under study works.
Responses to Endogeneity
What if you are unsure whether a variable is endogenous?
Approach #1: ignore it
Approach #2: use instrumental variables (IV) -- described later -- for every possibly endogenous variable
Approach #3: subtract out the variable using time-series (panel) data
Responses to Endogeneity
Approach #1: ignore it
-- Not advisable: true endogeneity causes OLS to be inconsistent
Approach #2: use IV on every possibly endogenous variable
-- Not advisable: it will cause a loss of efficiency (and hence wider confidence intervals) and may lead to bias.
Responses to Endogeneity
Approach #3: Difference it out
Suppose that the source of endogeneity is fixed over time, such as a time-invariant measurement error or omitted variable. Further, suppose that you observe data in two time periods.
A difference-in-difference (DD) model can be used: subtract values at time 1 (“before”) from values at time 2 (“after”) and the fixed endogenous component will drop out.
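The differencing step can be sketched on simulated two-period data. Everything in this example is an assumption made for illustration: the true effect of 1.5, the fixed unobserved variable q, and the helper name `slope`. Because q is correlated with x and does not change between periods, the regression in levels is biased, while the regression on before/after differences should recover a slope close to 1.5.

```python
import random

random.seed(1)
n = 20000
beta = 1.5  # hypothetical true effect of x on y

x_before, x_after, y_before, y_after = [], [], [], []
for _ in range(n):
    q = random.gauss(0, 1)           # fixed over time, unobserved, correlated with x
    xb = q + random.gauss(0, 1)
    xa = q + random.gauss(0, 1)
    x_before.append(xb)
    x_after.append(xa)
    y_before.append(beta * xb + 2.0 * q + random.gauss(0, 1))
    y_after.append(beta * xa + 2.0 * q + random.gauss(0, 1))


def slope(x, y):
    """OLS slope from a regression of y on x with an intercept."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


# Levels: q is hidden in the error term and is correlated with x.
levels_slope = slope(x_after, y_after)

# Differences: q is identical in both periods, so it is subtracted out.
dx = [a - b for a, b in zip(x_after, x_before)]
dy = [a - b for a, b in zip(y_after, y_before)]
diff_slope = slope(dx, dy)

print(f"levels: {levels_slope:.2f}   differences: {diff_slope:.2f}   true: {beta}")
```

If q were allowed to change between the two periods, the subtraction would no longer remove it, which is the second limitation listed for this approach.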
Responses to Endogeneity
Approach #3: Difference it out -- continued
Limitations:
- DD models will not eliminate selection bias.
- DD models only eliminate fixed variables; sometimes endogenous variables change values over time
Dealing with Omitted Variables
The investigator should have a conceptual model of the process under study. Guided by this understanding, there are a few options for dealing with omitted variables.
1. Find additional data so that every relevant variable is included.
2. Ignore it
- Acceptable only if omitted variable is uncorrelated with all included variables; otherwise the coefficient estimates will be biased up or down.
Dealing with Omitted Variables
3. Find proxy variable
Suppose the following:
y is the outcome
q is the omitted variable
z is the proxy for q
What properties should the proxy z have?
Dealing with Omitted Variables
a. Proxy z should be strongly correlated with q.
b. Proxy z must be redundant (= ignorable)
E (y | x, q, z) = E (y | x, q)
c. Omitted q must be uncorrelated with other regressors conditional on z:
$\mathrm{corr}(q, x_j \mid z) = 0$ for each $x_j$
Dealing with Omitted Variables
The last two mean roughly that q and z provide similar information about the outcome.
You don’t observe q, so how can you prove these conditions are met? Either argue it from theory or test the assumption using other data.
Dealing with Measurement Error
1. Improve measurement
- DSS improved by refusing extreme outlier values
- NPPD improved by requiring more complete data
2. Argue that the degree of error is small
- Use outside data for validation
3. Argue that error is uncorrelated with included variables
Dealing with Proxy Variables
1. What if proxy variable z is correlated with a regressor x?
OLS is inconsistent, but one can hope and argue that the inconsistency is less than if z is omitted.
Dealing with Proxy Variables
2. Consider using a lagged dependent variable as a proxy variable.
Example: If you believe that omitted variable $q_t$ strongly affects outcome $y_t$, then a lagged value of y (such as $y_{t-2}$) is probably correlated with $q_t$ as well.
Problem: $y_{t-2}$ may be correlated with other x’s as well, leading to inconsistency.
Dealing with Proxy Variables
3. Consider using multiple proxy variables for a single omitted variable.
How? Simply put all proxy variables in the equation.
Note: they all must meet the requirements for proxies.
Dealing with Proxy Variables
4. What if omitted variable q interacts with a regressor x?
$y = \alpha + \beta_1 x + \gamma\, q x + \varepsilon$
$\partial y / \partial x = \beta_1 + \gamma q$
The marginal effect of x on y involves q, which is unobserved.
Dealing with Proxy Variables
Demean the proxy z: take every value of z and subtract out the grand (overall) average value. Call it $z_d$.
$y = \alpha + \beta_1 x + \gamma\, z_d x + \varepsilon$
$\partial y / \partial x = \beta_1 + \gamma z_d$
$= \beta_1$ on average, because $E[z_d] = 0$
Method of Instrumental Variables
Often used to deal with simultaneity.
More generally, IV applies whenever a regressor x is correlated with the error term $\varepsilon$.
IV Definition
Model: $y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$
Suppose that $x_2$ is endogenous to y. An instrumental variable is one that
(a) is correlated with the endogenous variable $x_2$
(b) is uncorrelated with error term $\varepsilon$
(c) should not enter the main equation (i.e., does not explain y)
Two-Stage Least Squares
Two-stage least squares (2SLS) approach
Stage 1:
Predict $x_2$ as a function of all other variables plus an IV (call it z):
$x_2 = a + \gamma_1 x_1 + \gamma_2 z + \eta$
Create predicted values of $x_2$ – call them $x_2^p$
Two-Stage Least Squares
Two-stage least squares (2SLS) approach
Stage 2:
Predict y as a function of $x_2^p$ and all other variables (but not z):
$y = a + \beta_1 x_1 + \beta_2 x_2^p + \varepsilon$
Note: adjust the standard errors to account for the fact that $x_2^p$ is predicted.
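The two stages can be sketched with the Python standard library on simulated data. This is a minimal illustration, not a reference implementation: the data-generating values, the unobserved variable u that makes x2 endogenous, and the helper name `ols` are all assumptions. The sketch reports point estimates only and does not apply the standard-error adjustment mentioned in the note on Stage 2.

```python
import random


def ols(columns, y):
    """Least-squares coefficients of y on the given columns (pass a column of 1s for an intercept)."""
    k = len(columns)
    # Normal equations (X'X) b = X'y stored as an augmented matrix.
    a = [[sum(p * q for p, q in zip(columns[i], columns[j])) for j in range(k)]
         + [sum(p * q for p, q in zip(columns[i], y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


random.seed(2)
n = 10000
ones = [1.0] * n
x1 = [random.gauss(0, 1) for _ in range(n)]
z = [random.gauss(0, 1) for _ in range(n)]   # instrument: moves x2, not in the y equation
u = [random.gauss(0, 1) for _ in range(n)]   # unobserved; drives both x2 and y
x2 = [0.5 * a + b + c + random.gauss(0, 1) for a, b, c in zip(x1, z, u)]
y = [1.0 + a + 2.0 * b + 2.0 * c + random.gauss(0, 1) for a, b, c in zip(x1, x2, u)]

# Naive OLS: x2 is correlated with the error term (which contains u).
b_ols = ols([ones, x1, x2], y)

# Stage 1: predict x2 from the other regressor and the instrument.
g = ols([ones, x1, z], x2)
x2p = [g[0] + g[1] * a + g[2] * b for a, b in zip(x1, z)]

# Stage 2: regress y on x1 and the predicted x2 (z itself is excluded).
b_2sls = ols([ones, x1, x2p], y)

print(f"coefficient on x2: OLS {b_ols[2]:.2f}, 2SLS {b_2sls[2]:.2f}, true 2.0")
```

In this setup OLS overstates the coefficient on x2 because u pushes x2 and y in the same direction, while the second-stage coefficient should sit near the true value of 2.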
Two-Stage Residual Inclusion
2SLS is only consistent when the Stage 2 equation is linear.
If Stage 2 is nonlinear, use the two-stage residual inclusion (2SRI) method:
- Stage 1 as in 2SLS, leading to predicted $x_2^p$
- Develop residuals $v = x_2 - x_2^p$
Two-Stage Residual Inclusion
- Stage 2:
Predict y as a function of $x_1$, $x_2$ (not $x_2^p$) and the new residuals v:
$y = f(a + \beta_1 x_1 + \beta_2 x_2 + \beta_3 v) + \varepsilon$
where f(.) is a nonlinear function.
Note that if Stage 2 is linear, then 2SRI yields the same results as 2SLS.
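The equivalence in the linear case can be checked numerically. The sketch below uses simulated data and a hypothetical helper `ols`; the coefficient values are assumptions chosen only for illustration. It runs a linear Stage 2 both ways and compares the coefficient on x2. It does not cover the nonlinear case, where f(.) would be fitted with a nonlinear estimator instead.

```python
import random


def ols(columns, y):
    """Least-squares coefficients of y on the given columns (pass a column of 1s for an intercept)."""
    k = len(columns)
    # Normal equations (X'X) b = X'y stored as an augmented matrix.
    a = [[sum(p * q for p, q in zip(columns[i], columns[j])) for j in range(k)]
         + [sum(p * q for p, q in zip(columns[i], y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


random.seed(5)
n = 5000
ones = [1.0] * n
x1 = [random.gauss(0, 1) for _ in range(n)]
z = [random.gauss(0, 1) for _ in range(n)]   # instrument
u = [random.gauss(0, 1) for _ in range(n)]   # unobserved; makes x2 endogenous
x2 = [0.5 * a + b + c + random.gauss(0, 1) for a, b, c in zip(x1, z, u)]
y = [1.0 + a + 2.0 * b + 2.0 * c + random.gauss(0, 1) for a, b, c in zip(x1, x2, u)]

# Stage 1 (shared by both methods): predicted x2 and first-stage residuals v.
g = ols([ones, x1, z], x2)
x2p = [g[0] + g[1] * a + g[2] * b for a, b in zip(x1, z)]
v = [actual - pred for actual, pred in zip(x2, x2p)]

# 2SLS Stage 2: replace x2 with its prediction.
b_2sls = ols([ones, x1, x2p], y)

# 2SRI Stage 2 (linear f): keep the actual x2 and add the residuals v.
b_2sri = ols([ones, x1, x2, v], y)

print(f"coefficient on x2: 2SLS {b_2sls[2]:.4f}, 2SRI {b_2sri[2]:.4f}")
```

The two coefficients agree up to floating-point error because the first-stage residuals are orthogonal to the intercept, x1, and the predicted x2, so adding v does not change the other estimates.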
Multiple IVs
What if you have multiple endogenous variables?
1. The number of IVs must equal or exceed the number of endogenous variables
2. Estimate a separate 1st-stage regression for each endogenous variable
3. Every 1st-stage regression should contain all IVs
IV Issues
Two issues plague the IV method:
1. No IV is available
2. A potential IV is found, but its quality is uncertain
IV Issues
What if there is no IV?
State that no IV exists and forge ahead anyway, arguing that any bias in OLS is likely to be small.
- Argue that the endogeneity is weak on theoretical grounds.
- Argue that external data indicate that the bias from OLS is likely to be small.
IV Properties
What if you have an IV of unknown quality?
Two characteristics mark a good IV:
1. Validity
2. Strength
IV Validity
Validity has several components:
a. Non-zero correlation with x2
b. Uncorrelated with error term
c. Uncorrelated with y except through x2
d. Monotonicity: as z increases, x2 increases
IV Validity
There are several ways to show validity of an IV:
• Non-zero correlation with the endogenous variable can be shown directly.
• Robustness: do alternative IVs yield similar results?
• Non-correlation with the outcome variable of the 2nd stage. This point must be argued from theory, an understanding of how the system under study works.
IV Validity
Warning: one cannot simply add a candidate IV to the main model (i.e., the 2nd stage) to see whether it is significant. The result is biased.
BUT
If there are multiple IVs, one can use a test of over-identifying restrictions.
IV Validity
Overidentification: number of candidate IVs exceeds number of endogenous variables.
Suppose that
(a) You have one endogenous variable and three candidate IVs
(b) You know that one of the IVs is truly valid.
Use the known-valid IV in the 1st stage and put the remaining two IVs in the 2nd stage.
IV Validity
Over-identification test, continued
If the two remaining IVs are jointly insignificant in the 2nd stage, then this supports their use as alternative IVs.
Problem: this only works if the IV(s) in the 1st stage are truly valid – and you don’t know that!
IV Validity
Over-identification test, continued
Partial solution: use Sargan’s (1984) test, which assumes only that one or more of your IVs are valid; you don’t have to specify which. This method fails only if none of the IVs is valid.
In the end, you must argue for validity on conceptual grounds at a minimum.
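One common textbook form of a Sargan-type over-identification statistic can be sketched as follows: estimate the model by 2SLS with all candidate IVs, regress the 2SLS residuals on every exogenous variable and IV, and multiply the resulting R-squared by the sample size. Under the null that all IVs are valid (and with homoskedastic errors), the statistic is approximately chi-square with degrees of freedom equal to the number of IVs minus the number of endogenous variables. The data, coefficient values, and the helper names `ols`, `fitted`, and `sargan` below are assumptions for illustration, not the procedure of any particular package.

```python
import random


def ols(columns, y):
    """Least-squares coefficients of y on the given columns (pass a column of 1s for an intercept)."""
    k = len(columns)
    a = [[sum(p * q for p, q in zip(columns[i], columns[j])) for j in range(k)]
         + [sum(p * q for p, q in zip(columns[i], y))] for i in range(k)]
    for col in range(k):  # Gauss-Jordan elimination with partial pivoting
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


def fitted(b, columns):
    """Fitted values for coefficients b and the matching columns."""
    return [sum(bj * col[i] for bj, col in zip(b, columns)) for i in range(len(columns[0]))]


def sargan(exog, ivs, x2, y):
    """n * R^2 from regressing 2SLS residuals on all exogenous variables and IVs.

    exog must include the column of 1s; x2 is the single endogenous regressor.
    """
    n = len(y)
    all_exog = exog + ivs
    x2p = fitted(ols(all_exog, x2), all_exog)            # Stage 1
    b = ols(exog + [x2p], y)                             # Stage 2
    resid = [yi - fi for yi, fi in zip(y, fitted(b, exog + [x2]))]  # uses actual x2
    explained = fitted(ols(all_exog, resid), all_exog)
    mean_r = sum(resid) / n
    r2 = sum((e - mean_r) ** 2 for e in explained) / sum((r - mean_r) ** 2 for r in resid)
    return n * r2


random.seed(4)
n = 5000
ones = [1.0] * n
x1 = [random.gauss(0, 1) for _ in range(n)]
z1 = [random.gauss(0, 1) for _ in range(n)]
z2 = [random.gauss(0, 1) for _ in range(n)]
u = [random.gauss(0, 1) for _ in range(n)]   # unobserved; makes x2 endogenous
x2 = [0.5 * a + b + c + d + random.gauss(0, 1) for a, b, c, d in zip(x1, z1, z2, u)]
y_valid = [1.0 + a + 2.0 * b + 2.0 * c + random.gauss(0, 1) for a, b, c in zip(x1, x2, u)]
y_invalid = [yv + 1.0 * c for yv, c in zip(y_valid, z2)]  # z2 affects y directly: invalid IV

stat_valid = sargan([ones, x1], [z1, z2], x2, y_valid)
stat_invalid = sargan([ones, x1], [z1, z2], x2, y_invalid)
print(f"both IVs valid: {stat_valid:.2f}   one IV invalid: {stat_invalid:.2f}")
```

With two IVs and one endogenous variable there is one degree of freedom, so each statistic would be compared with the 5% chi-square critical value of 3.84. A large value signals that at least one IV is suspect, but it does not say which one.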
IV Validity
Conceptual arguments:
1. Explain why z should influence x2
2. Explain why z should not influence y directly
3. Anticipate objections about omitted variables that link z to the error term $\varepsilon$. Show that z is not related to those omitted variables, perhaps using outside data. For example, use data on non-veterans to support a claim about how veterans act.
IV Properties
Two characteristics mark a good instrumental variable:
1. Validity
2. Strength
Strong IVs
A strong instrument has a high correlation with the endogenous variable.
How strong a correlation? The rule of thumb usually attributed to Staiger & Stock (1997) is a 1st-stage partial F statistic of 10 or greater.
- Run the 1st stage with and without the IV.
- Compute the partial F statistic for the excluded IV from the two fits: a value of 10 or more is usually taken as sufficient evidence of strength.
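The comparison of the 1st stage with and without the IV can be sketched as a partial F statistic built from the two residual sums of squares. The data and the helper names `ols`, `rss`, and `partial_f` below are assumptions for illustration. One instrument is constructed to be strong and the other to be nearly irrelevant, so the two statistics should differ by orders of magnitude.

```python
import random


def ols(columns, y):
    """Least-squares coefficients of y on the given columns (pass a column of 1s for an intercept)."""
    k = len(columns)
    a = [[sum(p * q for p, q in zip(columns[i], columns[j])) for j in range(k)]
         + [sum(p * q for p, q in zip(columns[i], y))] for i in range(k)]
    for col in range(k):  # Gauss-Jordan elimination with partial pivoting
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


def rss(columns, y):
    """Residual sum of squares from an OLS fit of y on the columns."""
    b = ols(columns, y)
    fit = [sum(bj * col[i] for bj, col in zip(b, columns)) for i in range(len(y))]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fit))


def partial_f(exog, ivs, x2):
    """F statistic for the joint significance of the IVs in the 1st stage."""
    n, k, q = len(x2), len(exog) + len(ivs), len(ivs)
    rss_without = rss(exog, x2)        # 1st stage without the IVs
    rss_with = rss(exog + ivs, x2)     # 1st stage with the IVs
    return ((rss_without - rss_with) / q) / (rss_with / (n - k))


random.seed(3)
n = 2000
ones = [1.0] * n
x1 = [random.gauss(0, 1) for _ in range(n)]
z = [random.gauss(0, 1) for _ in range(n)]
noise = [random.gauss(0, 1) for _ in range(n)]
x2_strong = [0.5 * a + 1.0 * b + e for a, b, e in zip(x1, z, noise)]   # z matters a lot
x2_weak = [0.5 * a + 0.02 * b + e for a, b, e in zip(x1, z, noise)]    # z barely matters

f_strong = partial_f([ones, x1], [z], x2_strong)
f_weak = partial_f([ones, x1], [z], x2_weak)
print(f"partial F, strong IV: {f_strong:.1f}   weak IV: {f_weak:.1f}")
```

With a single IV the partial F equals the square of that IV's t statistic in the 1st stage, so the same check can be read off a regression table.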
Weak IVs
If the IVs are weak,
• 2SLS and 2SRI are consistent, but there can be considerable bias even in large samples
• standard errors are too small
• 2SLS and 2SRI perform poorly
Weak IVs
What to do if IVs are weak?
If there is a single endogenous variable, use a conditional likelihood ratio (CLR) test:
* perform a regular likelihood ratio test
* adjust the critical values
* available in Stata; see Stata Journal, 3, 57-70 and http://elsa.berkeley.edu/wp/marcelo.pdf by Moreira and Poi
Weak IVs
What if there are multiple endogenous variables and only weak IVs?
A solution has not been developed … yet!
Selected References
JM Wooldridge. Econometric analysis of cross section and panel data. Cambridge, MA: MIT Press, 2002.
A graduate-level econometrics textbook with lengthy textual descriptions of practical issues.
HS Bloom, ed. Learning more from social experiments: evolving analytic approaches. Russell Sage.
A largely non-technical exploration of how instrumental variables are found and used, with examples from welfare reform studies.
MP Murray. Avoiding invalid instruments and coping with weak instruments. Journal of Economic Perspectives 2006;20(4):111-132.
A superb reference with relatively few equations. Has an extensive reference list.
A Nakamura, M Nakamura. Model specification and endogeneity. Journal of Econometrics 1998;83:213-237.
Presents major endogeneity tests, explores approaches to endogeneity testing. Somewhat iconoclastic.
M McClellan, B McNeil, J Newhouse. Does more intensive treatment of acute myocardial infarction in the elderly reduce mortality? Analysis using instrumental variables. JAMA 1994;272(11):859-66.
Classic paper using IV in health, but challenging to read.
J Newhouse, M McClellan. Econometrics in outcomes research: the use of instrumental variables. Ann Rev Pub Health 1998;19:17-34.
Non-technical introduction to IV.
J Terza, A Basu, P Rathouz. Two-stage residual inclusion estimation: Addressing endogeneity in health econometric modeling. Journal of Health Economics 2008;27:531-543.
Explains two-stage residual inclusion models and contrasts them to two-stage least squares. Moderately technical.
Acknowledgements
Much of the content of this presentation is derived from Wooldridge (2002), Murray (2006), and Nakamura and Nakamura (1998).
Helpful comments were also provided by HERC staff.