Estimation taking account of sample selection with Stata
Cheti Nicoletti, ISER, University of Essex
2009
• Estimation commands: truncreg, tobit, heckman, heckprobit, treatreg, ivreg
• Other useful commands: ivprobit, ivtobit
• Useful option in the estimation commands: pweights
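truncreg
• The truncreg command estimates regression models on a truncated sample.
• Ex: health insurance claims are observed only when the amount claimed exceeds a fixed threshold.
truncreg y x1 x2 … xk, ll(c)
$y^* = x\beta + \varepsilon, \quad \varepsilon \sim \text{iid } N(0, \sigma^2)$
We observe $y^*$ only if $y^* > c$, so that
$E(y^* \mid y^* > c) = x\beta + \sigma\lambda(\alpha)$, where $\alpha = \dfrac{c - x\beta}{\sigma}$ and $\lambda(\alpha) = \dfrac{\phi(\alpha)}{1 - \Phi(\alpha)}$.
The observed $y = y^* \mid y^* > c$ follows a truncated normal distribution, so OLS on the observed sample, which omits the term $\sigma\lambda(\alpha)$, is biased.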
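A quick way to see why truncation matters is to simulate it. The Python sketch below (an illustration, not Stata code; the values xβ = 1, σ = 2 and c = 2 are hypothetical) draws y* for a single covariate cell, keeps only y* > c, and compares the mean of the kept draws with xβ + σλ(α). The gap between that mean and xβ is what OLS on the truncated sample would wrongly attribute to the regressors.

```python
import random
from statistics import NormalDist, fmean

STD = NormalDist()  # standard normal: STD.pdf is phi, STD.cdf is Phi


def truncated_mean(xb, sigma, c):
    """E(y* | y* > c) for y* ~ N(xb, sigma^2), i.e. xb + sigma * lambda(alpha)."""
    alpha = (c - xb) / sigma
    mills = STD.pdf(alpha) / (1.0 - STD.cdf(alpha))  # lambda(alpha)
    return xb + sigma * mills


# One covariate cell with hypothetical values: x*beta = 1, sigma = 2, threshold c = 2
random.seed(42)
xb, sigma, c = 1.0, 2.0, 2.0
draws = [xb + random.gauss(0.0, sigma) for _ in range(200_000)]
observed = [y for y in draws if y > c]  # truncation: only y* > c reaches the sample

print(f"mean of observed y:        {fmean(observed):.3f}")
print(f"x*beta + sigma*lambda:     {truncated_mean(xb, sigma, c):.3f}")
print(f"x*beta (untruncated mean): {xb:.3f}")
```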
tobit
• The tobit command estimates regression models with a censored dependent variable (deterministic censoring).
• Three different types of models:
  - Tobit with a fixed censoring value (tobit)
  - Censored regression with varying censoring values (cnreg)
  - Regression with interval data (intreg)
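tobit
• Tobit first type (Ex: consumption of a good)
tobit y x1 x2 … xk, ll(0)
tobit y x1 x2 … xk, ul(c)
$y^* = x\beta + \varepsilon, \quad \varepsilon \sim \text{iid } N(0, \sigma^2)$
With ll(0), censoring from below at 0:
$$y = \begin{cases} y^* & \text{if } y^* > 0 \\ 0 & \text{if } y^* \le 0 \end{cases}$$
With ul(c), censoring from above at c:
$$y = \begin{cases} y^* & \text{if } y^* < c \\ c & \text{if } y^* \ge c \end{cases}$$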
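The censoring rules above translate directly into the likelihood that tobit maximizes: a censored observation contributes the probability of landing at the limit, an uncensored one the normal density of y*. A minimal Python sketch (the function and data are hypothetical; it only evaluates the log-likelihood and leaves the maximization to an optimizer):

```python
import math
from statistics import NormalDist

STD = NormalDist()


def tobit_loglik(beta, sigma, X, y, ll=None, ul=None):
    """Tobit log-likelihood with y* = x'beta + e, e ~ N(0, sigma^2).

    ll: lower censoring value (as in tobit ..., ll(0)); ul: upper one (as in ul(c)).
    """
    total = 0.0
    for xi, yi in zip(X, y):
        xb = sum(b * v for b, v in zip(beta, xi))
        if ll is not None and yi <= ll:
            total += math.log(STD.cdf((ll - xb) / sigma))          # P(y* <= ll)
        elif ul is not None and yi >= ul:
            total += math.log(1.0 - STD.cdf((ul - xb) / sigma))    # P(y* >= ul)
        else:
            total += math.log(STD.pdf((yi - xb) / sigma) / sigma)  # density of y*
    return total


# Usage: consumption censored at zero; each row of X is (constant, x1)
X = [[1.0, 0.5], [1.0, -1.0], [1.0, 2.0]]
y = [0.0, 0.0, 2.7]
print(tobit_loglik([0.2, 1.0], 1.0, X, y, ll=0.0))
```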
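cnreg
• Tobit first type with censoring values that vary across observations. Ex: minimum wage with different levels in different years
• cnreg y x1 x2 … xk, censored(d)
$y_i^* = x_i\beta + \varepsilon_i, \quad \varepsilon_i \sim \text{iid } N(0, \sigma^2)$, where $i$ is the individual index
$$y_i = \begin{cases} y_i^* & \text{if } y_i^* > c_i \\ c_i & \text{if } y_i^* \le c_i \end{cases}$$
$d_i$ flags the censored observations ($y_i^* \le c_i$) and is 0 otherwise. In cnreg's coding of the censored() variable, 0 means uncensored, 1 right-censored and −1 left-censored, so censoring at a minimum wage, as here, is flagged with −1.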
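With varying censoring values each observation carries its own limit, so the likelihood needs the censoring indicator rather than a single cutoff. The Python sketch below (hypothetical function and data) uses the −1/0/1 coding of cnreg's censored() variable and reads the censoring point $c_i$ off $y_i$ itself, since a censored wage is recorded at that year's minimum.

```python
import math
from statistics import NormalDist

STD = NormalDist()


def cnreg_loglik(beta, sigma, X, y, cens):
    """Censored-normal log-likelihood with observation-specific censoring points.

    cens[i] = 0:  y[i] is the exact value of y_i*
    cens[i] = -1: left-censored, y_i* <= y[i] (y[i] is that year's minimum wage)
    cens[i] = 1:  right-censored, y_i* >= y[i]
    """
    total = 0.0
    for xi, yi, ci in zip(X, y, cens):
        xb = sum(b * v for b, v in zip(beta, xi))
        z = (yi - xb) / sigma
        if ci == -1:
            total += math.log(STD.cdf(z))
        elif ci == 1:
            total += math.log(1.0 - STD.cdf(z))
        else:
            total += math.log(STD.pdf(z) / sigma)
    return total


# Usage: two years with different (hypothetical) minimum wages, 4.0 and 4.5
X = [[1.0], [1.0], [1.0], [1.0]]   # constant only
wage = [4.0, 6.2, 4.5, 5.1]
cens = [-1, 0, -1, 0]              # recorded at their year's minimum -> left-censored
print(cnreg_loglik([5.0], 1.0, X, wage, cens))
```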
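intreg
• Interval data regression (Ex: bracket information on income for people refusing to give the exact value)
• When $y_i^*$ is not declared, we observe the range to which $y_i^*$ belongs, e.g. (0, 5000], (5000, 15000], (15000, 30000], (30000, +∞); call it $(a_i, b_i]$
$y_i^* = x_i\beta + \varepsilon_i, \quad \varepsilon_i \sim \text{iid } N(0, \sigma^2)$
$$d_i = \begin{cases} 1 & \text{if the exact value of } y_i^* \text{ is declared} \\ 0 & \text{otherwise} \end{cases}$$
$$L = \prod_{i=1}^{n} \left[ \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(y_i^* - x_i\beta)^2}{2\sigma^2} \right) \right]^{d_i} \left[ \Phi\left( \frac{b_i - x_i\beta}{\sigma} \right) - \Phi\left( \frac{a_i - x_i\beta}{\sigma} \right) \right]^{1 - d_i}$$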
Estimating the regression with interval data in Stata
The command intreg needs two variables to define the dependent variable, say y1 and y2:
intreg y1 y2 x1 x2 … xk

| Individuals giving | Example | y1 | y2 |
|---|---|---|---|
| An exact value of their income | y* = 5980 | y* = 5980 | y* = 5980 |
| A range for their income, y* in (a, b] | (5000, 15000] | a = 5000 | b = 15000 |
| A range for their income, y* in (a, b] | (30000, +∞) | a = 30000 | . |

A missing y2 (".") tells intreg that the interval is open above.
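The two-variable coding maps one-to-one onto the interval-data likelihood: equal y1 and y2 give a density term, a finite pair gives Φ((b − xβ)/σ) − Φ((a − xβ)/σ), and a missing bound leaves the interval open on that side. A minimal Python sketch using the three rows of the table, with None standing in for Stata's missing value (the function, β and σ are hypothetical):

```python
import math
from statistics import NormalDist

STD = NormalDist()


def intreg_loglik(beta, sigma, X, y1, y2):
    """Interval-regression log-likelihood using intreg's two-variable coding.

    y1 == y2            -> exact value (density term)
    y1, y2 both present -> y* in (y1, y2]
    y2 is None          -> y* above y1 (interval open above)
    y1 is None          -> y* at or below y2 (interval open below)
    """
    total = 0.0
    for xi, lo, hi in zip(X, y1, y2):
        xb = sum(b * v for b, v in zip(beta, xi))
        if lo is not None and hi is not None and lo == hi:
            total += math.log(STD.pdf((lo - xb) / sigma) / sigma)
        else:
            upper = 1.0 if hi is None else STD.cdf((hi - xb) / sigma)
            lower = 0.0 if lo is None else STD.cdf((lo - xb) / sigma)
            total += math.log(upper - lower)  # P(a < y* <= b)
    return total


# The three rows of the table: exact 5980, (5000, 15000], (30000, +inf)
X = [[1.0], [1.0], [1.0]]          # constant only
y1 = [5980.0, 5000.0, 30000.0]
y2 = [5980.0, 15000.0, None]
print(intreg_loglik([10000.0], 8000.0, X, y1, y2))
```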
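heckman
• The heckman command estimates the Generalized Tobit (Tobit of the 2nd type) by maximum likelihood (default) or by the two-step estimator (option twostep)
heckman y x1 x2 … xk, select(z1 z2 … zs)
heckman y x1 x2 … xk, select(d = z1 z2 … zs)
heckman y x1 x2 … xk, select(z1 z2 … zs) twostep
(When select() has no "d =", an observation counts as selected if y is not missing.)
$y^* = x\beta + \varepsilon, \quad \varepsilon \sim \text{iid } N(0, \sigma^2)$
$d^* = z\gamma + v, \quad v \sim \text{iid } N(0, 1)$, with $\text{corr}(\varepsilon, v) = \rho$
$$d = \begin{cases} 1 & \text{if } d^* > 0 \text{ (respondents)} \\ 0 & \text{otherwise} \end{cases} \qquad y = \begin{cases} y^* & \text{if } d^* > 0 \\ . & \text{otherwise} \end{cases}$$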
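Why selection on unobservables biases OLS on respondents: if corr(ε, v) = ρ, then E(y | d = 1) = xβ + ρσλ(zγ), with λ(a) = φ(a)/Φ(a). The two-step estimator exploits this by running a probit for d and then regressing y on x and the estimated λ among respondents, so that the coefficient on λ estimates ρσ. The Python simulation below (hypothetical parameter values for a single (x, z) cell; it checks the conditional-mean formula and is not Stata's implementation) draws correlated errors and compares the respondents' mean with that formula:

```python
import math
import random
from statistics import NormalDist, fmean

STD = NormalDist()


def inverse_mills(a):
    """lambda(a) = phi(a) / Phi(a), the Heckman correction term."""
    return STD.pdf(a) / STD.cdf(a)


random.seed(1)
xb, zg, sigma, rho = 2.0, 0.3, 1.5, 0.6   # hypothetical values for one (x, z) cell
selected = []
for _ in range(200_000):
    v = random.gauss(0.0, 1.0)                        # selection error
    w = random.gauss(0.0, 1.0)                        # independent noise
    e = sigma * (rho * v + math.sqrt(1.0 - rho**2) * w)  # var sigma^2, corr(e, v) = rho
    if zg + v > 0:                                    # d* > 0: y is observed
        selected.append(xb + e)

print(f"mean of y among respondents: {fmean(selected):.3f}")
print(f"x*beta + rho*sigma*lambda:   {xb + rho * sigma * inverse_mills(zg):.3f}")
print(f"x*beta:                      {xb:.3f}")
```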
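heckprobit
• The heckprobit command estimates a probit model with sample selection (there is no twostep option, because a two-step procedure would be inconsistent in this model)
heckprobit p x1 x2 … xk, select(z1 z2 … zs)
$y^* = x\beta + \varepsilon, \quad \varepsilon \sim \text{iid } N(0, 1)$, with $p^* = 1$ if $y^* > 0$ and $p^* = 0$ otherwise
$d^* = z\gamma + v, \quad v \sim \text{iid } N(0, 1)$, with $\text{corr}(\varepsilon, v) = \rho$
$$d = \begin{cases} 1 & \text{if the } i\text{-th individual responds } (d^* > 0) \\ 0 & \text{otherwise} \end{cases} \qquad p = \begin{cases} p^* & \text{if } d = 1 \\ . & \text{otherwise} \end{cases}$$
$$L = \prod_{i=1}^{n} \Phi_2(x_i\beta, z_i\gamma, \rho)^{d_i p_i} \, \Phi_2(-x_i\beta, z_i\gamma, -\rho)^{d_i (1 - p_i)} \, \Phi(-z_i\gamma)^{1 - d_i}$$
where $\Phi_2(\cdot, \cdot, \rho)$ is the bivariate standard normal cdf with correlation $\rho$.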
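Evaluating this likelihood requires the bivariate normal cdf Φ₂. The Python sketch below writes out the three kinds of contributions. The helper names are hypothetical, and Φ₂ is computed by Simpson's rule from Φ₂(a, b; ρ) = ∫ from −∞ to a of φ(t) Φ((b − ρt)/√(1 − ρ²)) dt, which is one simple choice and not necessarily what Stata does internally.

```python
import math
from statistics import NormalDist

STD = NormalDist()


def binorm_cdf(a, b, rho, n=2000):
    """Phi_2(a, b; rho): P(X <= a, Y <= b) for standard normals with corr rho (|rho| < 1).

    Uses Y | X = t ~ N(rho*t, 1 - rho^2) and composite Simpson's rule on [-8, a]
    (normal mass below -8 is negligible); n must be even.
    """
    lo, a = -8.0, min(a, 8.0)
    if a <= lo:
        return 0.0
    s = math.sqrt(1.0 - rho * rho)

    def integrand(t):
        return STD.pdf(t) * STD.cdf((b - rho * t) / s)

    h = (a - lo) / n
    total = integrand(lo) + integrand(a)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * integrand(lo + k * h)
    return total * h / 3.0


def heckprobit_loglik(beta, gamma, rho, X, Z, p, d):
    """Log-likelihood of the probit with sample selection; p[i] is ignored when d[i] == 0."""
    total = 0.0
    for xi, zi, pi, di in zip(X, Z, p, d):
        xb = sum(b * v for b, v in zip(beta, xi))
        zg = sum(g * v for g, v in zip(gamma, zi))
        if di == 0:
            total += math.log(STD.cdf(-zg))               # nonrespondent: Phi(-z*gamma)
        elif pi == 1:
            total += math.log(binorm_cdf(xb, zg, rho))    # respondent with p = 1
        else:
            total += math.log(binorm_cdf(-xb, zg, -rho))  # respondent with p = 0
    return total


# Usage with hypothetical data: a constant is the only regressor in each equation
X = Z = [[1.0], [1.0], [1.0]]
p = [1, 0, None]   # None: outcome unobserved for the nonrespondent
d = [1, 1, 0]
print(heckprobit_loglik([0.5], [0.2], 0.3, X, Z, p, d))
```

With ρ = 0 the bivariate terms factor into products of univariate Φ's, which is a handy sanity check on any implementation.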
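Impact of an endogenous dummy: homogeneous treatment effect
y1 = earnings for trained people
y0 = earnings for non-trained people
d = dummy indicating participation in the training program
$y = y_1 d + y_0 (1 - d)$
$y = x\beta + \delta d + \varepsilon$
$d^* = z\gamma + u$, where $d = 1(d^* > 0)$
$$\begin{pmatrix} \varepsilon \\ u \end{pmatrix} \sim \text{iid } N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \sigma^2 & \rho\sigma \\ \rho\sigma & 1 \end{pmatrix} \right)$$
We have a selection problem because of the correlation between u and ε: it implies that d is not independent of ε, so OLS of y on x and d does not estimate δ consistently.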
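A short simulation makes the consequence concrete. With no x and a constant zγ (hypothetical values below), OLS of y on a constant and d is just the difference in mean earnings between trained and non-trained people, and it converges to δ + ρσ[φ(zγ)/Φ(zγ) + φ(zγ)/(1 − Φ(zγ))] rather than to δ. Python sketch:

```python
import math
import random
from statistics import NormalDist, fmean

STD = NormalDist()
random.seed(7)

# Hypothetical values: true effect delta, constant z*gamma, sd of eps, corr(eps, u)
delta, zg, sigma, rho = 1.0, 0.0, 1.0, 0.5
treated, untreated = [], []
for _ in range(200_000):
    u = random.gauss(0.0, 1.0)
    eps = sigma * (rho * u + math.sqrt(1.0 - rho**2) * random.gauss(0.0, 1.0))
    d = 1 if zg + u > 0 else 0          # d = 1(d* > 0)
    y = delta * d + eps                 # y = delta*d + eps (no x)
    (treated if d else untreated).append(y)

naive = fmean(treated) - fmean(untreated)   # equals the OLS coefficient on d here
# E(eps | d=1) - E(eps | d=0) = rho*sigma*[phi/Phi + phi/(1 - Phi)] at z*gamma
lam1 = STD.pdf(zg) / STD.cdf(zg)
lam0 = STD.pdf(zg) / (1.0 - STD.cdf(zg))
bias = rho * sigma * (lam1 + lam0)
print(f"naive: {naive:.3f}  delta + bias: {delta + bias:.3f}  delta: {delta:.3f}")
```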
treatreg
• The treatreg command evaluates the effect of an endogenous binary variable (treatment, program, …) on a continuous variable of interest (model on the previous slide).
treatreg y x1 x2 … xk, treat(d = z1 z2 … zs)
• Ex: sample of graduates with and without a master's degree
• y = log earnings; d = 1 if master's degree, 0 otherwise
• x = age, age squared, sex, type of first degree; d also enters the earnings equation, through treat()
• z = mother's level of education, father's level of education, sex, type of first degree
How to use weights in Stata
• Most Stata commands can deal with weighted data. Stata allows four kinds of weights:
1. fweights, or frequency weights, indicate the number of duplicated observations.
2. pweights, or sampling weights, denote the inverse of the probability that the observation is included because of the sampling design and/or nonresponse.
3. aweights, or analytic weights, are inversely proportional to the variance of an observation; i.e., the variance of the j-th observation is assumed to be σ²/w_j, where w_j are the weights.
4. iweights, or importance weights, indicate the "importance" of the observation in some vague sense.
Option pweights
• Sample surveys usually provide weights that account for the sampling design and nonresponse.
• Let p be the individual weight. Then we can run a regression with weighted observations:
regress y x1 x2 … xk [pweight=p]
• If the sample suffers from a sample selection problem due to observables, we can use propensity score weighting.
• A possible "simplified" way to estimate your own weights:
probit d z1 z2 … zs
predict prop
gen invprop = 1/prop
regress y x1 x2 … xk [pweight=invprop]
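To see what the inverse-propensity weights do, the Python sketch below simulates selection on observables: the response probability depends only on an observed z, which also drives y. For simplicity it uses the true probit probability instead of the one estimated with probit/predict, and the "regression" has a constant only, so the weighted estimate reduces to a weighted mean. All numbers are hypothetical.

```python
import random
from statistics import NormalDist, fmean

STD = NormalDist()
random.seed(3)

population, resp_y, resp_w = [], [], []
for _ in range(200_000):
    z = random.gauss(0.0, 1.0)                      # observed selection variable
    y = 2.0 + 1.5 * z + random.gauss(0.0, 1.0)      # outcome, correlated with z
    prop = STD.cdf(0.5 + 0.5 * z)                   # probit response probability (taken as known)
    population.append(y)
    if random.random() < prop:                      # d = 1: y observed
        resp_y.append(y)
        resp_w.append(1.0 / prop)                   # invprop = 1/prop

unweighted = fmean(resp_y)
weighted = sum(w * y for w, y in zip(resp_w, resp_y)) / sum(resp_w)
print(f"population mean:        {fmean(population):.3f}")
print(f"respondents, no weight: {unweighted:.3f}")
print(f"respondents, 1/prop:    {weighted:.3f}")
```

Respondents over-represent high-z (and hence high-y) individuals; weighting each by 1/prop restores their population share.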
For complex survey designs it is better to declare the design with svyset and use the svy prefix:
• svyset [pweight=p]
• svy: regress y x1 x2 … xk
• svyset has options for cluster sampling and other complex designs; e.g., to declare a stratified survey design for the dataset:
• svyset [pweight=p], strata(stratid)
ivreg
• The ivreg command estimates regression models using instrumental variables for potentially endogenous explanatory variables.
• Ex: evaluation of the impact of years of schooling on earnings
$y = x\beta + \delta d^* + \varepsilon$
Problem: $d^*$ and $\varepsilon$ are correlated.
Solution 1: IV estimation (instruments z: parental interest in the child's education, a bad financial shock to the family when the child is aged 11-16, presence of older siblings; Blundell et al 2003)
ivreg y x1 x2 … xk (d = z1 z2 … zs)
where the variable d holds years of schooling $d^*$ (Stata variable names cannot contain *).
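The logic of IV can be checked in a simulation where an unobserved "ability" raises both schooling and earnings. With one endogenous regressor and one instrument (and no other x, for brevity), the IV estimator is cov(z, y)/cov(z, d), while OLS is cov(d, y)/var(d). Everything below (coefficients, the ability variable, the instrument) is hypothetical, and the sketch is Python rather than ivreg.

```python
import random
from statistics import fmean


def sample_cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = fmean(a), fmean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


random.seed(11)
delta = 0.08                          # hypothetical return to one year of schooling
Z, D, Y = [], [], []
for _ in range(100_000):
    z = random.gauss(0.0, 1.0)        # instrument: moves schooling, unrelated to eps
    ability = random.gauss(0.0, 1.0)  # unobserved, so it ends up in eps
    d = 12.0 + 1.0 * z + 1.5 * ability + random.gauss(0.0, 1.0)   # years of schooling
    y = 1.0 + delta * d + 0.2 * ability + random.gauss(0.0, 0.3)  # log earnings
    Z.append(z)
    D.append(d)
    Y.append(y)

ols = sample_cov(D, Y) / sample_cov(D, D)   # biased: d is correlated with eps
iv = sample_cov(Z, Y) / sample_cov(Z, D)    # just-identified IV estimator
print(f"OLS: {ols:.3f}  IV: {iv:.3f}  true: {delta}")
```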
Stata programs for evaluation
• Abadie A., Drukker D., Herr J.L., Imbens G.W. (2001), Implementing Matching Estimators for Average Treatment Effects in Stata, The Stata Journal, 1, 1-18, http://ksghome.harvard.edu/~.aabadie.academic.ksg/software.html
• Becker S.O., Ichino A. (2002), Estimation of Average Treatment Effects Based on Propensity Scores, The Stata Journal, 2, 358-377, http://www.lrz-muenchen.de/~sobecker/pscore.html
• Sianesi B. (2001), Implementing Propensity Score Matching Estimators with STATA, UK Stata Users Group, VII Meeting, London, http://ideas.repec.org/c/boc/bocode/s432001.html
Textbook references:
• Amemiya T. (1985), Advanced Econometrics, Basil Blackwell, Oxford.
• Gourieroux C. (2000), Econometrics of Qualitative Dependent Variables, Cambridge University Press, Cambridge.
• Greene W.H. (2000), Econometric Analysis, Third edition, Prentice Hall, London.
• Maddala G.S. (1983), Limited-Dependent and Qualitative Variables in Econometrics, Cambridge University Press, Cambridge.
• Wooldridge J.M. (2002), Econometric Analysis of Cross-Section and Panel Data, MIT Press.
• Lee M. (2005), Micro-Econometrics for Policy, Program and Treatment Effects, Advanced Texts in Econometrics, Oxford University Press, Oxford.
Survey articles:
• Angrist J. (2001), Estimation of Limited-Dependent Variable Models with Binary Endogenous Regressors: Simple Strategies for Empirical Practice, Journal of Business and Economic Statistics, 19, 2-28.
• Angrist J.D., Krueger A.B. (1999), Empirical strategies in labor economics, Princeton University working paper 401, and in O. Ashenfelter and D. Card (eds.), Handbook of Labor Economics, Volume 3A, Amsterdam, 1277-1366.
• Blundell R., Costa-Dias M. (2002), Alternative approaches to evaluation in empirical microeconomics, IFS Cemmap working paper 10, and Portuguese Economic Journal, Vol. 1, 91-115, 2002.
• Blundell R., Powell J.L. (2001), Endogeneity in nonparametric and semiparametric regression models, IFS Cemmap working paper CWP09/01; Chapter 8 in M. Dewatripont, L. Hansen and S.J. Turnovsky (eds.), Advances in Economics and Econometrics, Cambridge University Press, ESM 36, 312-357, 2003.
• Heckman J.J., Ichimura H., Smith J.A., Todd P. (1998), Characterization of Selection Bias Using Experimental Data, Econometrica, 66, 1017-1098.
• Heckman J.J., LaLonde R.J., Smith J.A. (2000), The economics and econometrics of active labor market programs, in O. Ashenfelter and D. Card (eds.), Handbook of Labor Economics, vol. 3, North Holland, Amsterdam.
• Moffitt R. (2004), An introduction to the symposium of matching econometrics, Review of Economics and Statistics, vol. 1 (a collection of articles on matching by various authors).
• Vella F. (1998), Estimating models with sample selection bias: a survey, The Journal of Human Resources, vol. 3, 127-169.