1. Descriptive Tools, Regression, Panel Data
Transcript of 1. Descriptive Tools, Regression, Panel Data
![Page 1: 1. Descriptive Tools, Regression, Panel Data](https://reader035.fdocuments.net/reader035/viewer/2022070421/56816323550346895dd39e09/html5/thumbnails/1.jpg)
1. Descriptive Tools, Regression, Panel Data
![Page 2: 1. Descriptive Tools, Regression, Panel Data](https://reader035.fdocuments.net/reader035/viewer/2022070421/56816323550346895dd39e09/html5/thumbnails/2.jpg)
Model Building in Econometrics
• Parameterizing the model
• Nonparametric analysis
• Semiparametric analysis
• Parametric analysis
• Sharpness of inferences follows from the strength of the assumptions
A Model Relating (Log)Wage to Gender and Experience
![Page 3: 1. Descriptive Tools, Regression, Panel Data](https://reader035.fdocuments.net/reader035/viewer/2022070421/56816323550346895dd39e09/html5/thumbnails/3.jpg)
Cornwell and Rupert Panel Data
Cornwell and Rupert Returns to Schooling Data, 595 Individuals, 7 Years
Variables in the file are
EXP = work experience
WKS = weeks worked
OCC = occupation, 1 if blue collar
IND = 1 if manufacturing industry
SOUTH = 1 if resides in south
SMSA = 1 if resides in a city (SMSA)
MS = 1 if married
FEM = 1 if female
UNION = 1 if wage set by union contract
ED = years of education
LWAGE = log of wage = dependent variable in regressions
These data were analyzed in Cornwell, C. and Rupert, P., "Efficient Estimation with Panel Data: An Empirical Comparison of Instrumental Variable Estimators," Journal of Applied Econometrics, 3, 1988, pp. 149-155.
![Page 4: 1. Descriptive Tools, Regression, Panel Data](https://reader035.fdocuments.net/reader035/viewer/2022070421/56816323550346895dd39e09/html5/thumbnails/4.jpg)
![Page 5: 1. Descriptive Tools, Regression, Panel Data](https://reader035.fdocuments.net/reader035/viewer/2022070421/56816323550346895dd39e09/html5/thumbnails/5.jpg)
Nonparametric Regression: Kernel regression of y on x
Semiparametric Regression: Least absolute deviations regression of y on x
Parametric Regression: Least squares – maximum likelihood – regression of y on x
Application: Is there a relationship between Log(wage) and Education?
![Page 6: 1. Descriptive Tools, Regression, Panel Data](https://reader035.fdocuments.net/reader035/viewer/2022070421/56816323550346895dd39e09/html5/thumbnails/6.jpg)
A First Look at the Data
Descriptive Statistics
• Basic Measures of Location and Dispersion
• Graphical Devices
• Box Plots
• Histogram
• Kernel Density Estimator
![Page 7: 1. Descriptive Tools, Regression, Panel Data](https://reader035.fdocuments.net/reader035/viewer/2022070421/56816323550346895dd39e09/html5/thumbnails/7.jpg)
![Page 8: 1. Descriptive Tools, Regression, Panel Data](https://reader035.fdocuments.net/reader035/viewer/2022070421/56816323550346895dd39e09/html5/thumbnails/8.jpg)
Box Plots
![Page 9: 1. Descriptive Tools, Regression, Panel Data](https://reader035.fdocuments.net/reader035/viewer/2022070421/56816323550346895dd39e09/html5/thumbnails/9.jpg)
From Jones and Schurer (2011)
![Page 10: 1. Descriptive Tools, Regression, Panel Data](https://reader035.fdocuments.net/reader035/viewer/2022070421/56816323550346895dd39e09/html5/thumbnails/10.jpg)
Histogram for LWAGE
![Page 11: 1. Descriptive Tools, Regression, Panel Data](https://reader035.fdocuments.net/reader035/viewer/2022070421/56816323550346895dd39e09/html5/thumbnails/11.jpg)
![Page 12: 1. Descriptive Tools, Regression, Panel Data](https://reader035.fdocuments.net/reader035/viewer/2022070421/56816323550346895dd39e09/html5/thumbnails/12.jpg)
The kernel density estimator is a histogram (of sorts).
$$\hat f(x^*_m) = \frac{1}{n}\sum_{i=1}^{n}\frac{1}{B}\,K\!\left[\frac{x_i - x^*_m}{B}\right], \text{ for a set of points } x^*_m$$
B = "bandwidth" chosen by the analyst
K = the kernel function, such as the normal or logistic pdf (or one of several others)
x* = the point at which the density is approximated.
This is essentially a histogram with small bins.
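The estimator above can be sketched in a few lines. This is a minimal illustration, assuming a standard normal kernel and simulated data (not the LWAGE sample); the function name `kernel_density` and the bandwidth value are choices made here for the example.

```python
import numpy as np

def kernel_density(x, grid, bandwidth):
    """Estimate f(x*) at each grid point using a standard normal kernel."""
    x = np.asarray(x, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # Scaled distances (x_i - x*_m) / B, one row per grid point x*_m.
    z = (x[None, :] - grid[:, None]) / bandwidth
    k = np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)
    # Average of (1/B) K(.) over the n observations.
    return k.mean(axis=1) / bandwidth

rng = np.random.default_rng(0)
sample = rng.normal(loc=6.7, scale=0.45, size=2000)  # simulated, not the LWAGE data
grid = np.linspace(4.0, 9.5, 400)
density = kernel_density(sample, grid, bandwidth=0.1)
# Riemann sum over the grid; a proper density estimate integrates to about 1.
area = float(np.sum(density) * (grid[1] - grid[0]))
```

A smaller bandwidth gives a rougher curve and a larger one a smoother curve, which is the sense in which B plays the role of the bin width in a histogram.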
![Page 13: 1. Descriptive Tools, Regression, Panel Data](https://reader035.fdocuments.net/reader035/viewer/2022070421/56816323550346895dd39e09/html5/thumbnails/13.jpg)
Kernel Density Estimator
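$$\hat f(x^*_m) = \frac{1}{n}\sum_{i=1}^{n}\frac{1}{B}\,K\!\left[\frac{x_i - x^*_m}{B}\right], \text{ for a set of points } x^*_m$$
B = "bandwidth"
K = the kernel function
x* = the point at which the density is approximated.
$\hat f(x^*)$ is an estimator of $f(x^*)$
The curse of dimensionality
$$\frac{1}{n}\sum_{i=1}^{n} Q(x_i \mid x^*) = \bar Q(x^*)$$
But, $\mathrm{Var}[\bar Q(x^*)] \ne \frac{1}{N} \times$ Something. Rather, $\mathrm{Var}[\bar Q(x^*)] = \frac{1}{N^{3/5}} \times$ Something.
I.e., $\hat f(x^*)$ does not converge to $f(x^*)$ at the same rate as a mean converges to a population mean.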
![Page 14: 1. Descriptive Tools, Regression, Panel Data](https://reader035.fdocuments.net/reader035/viewer/2022070421/56816323550346895dd39e09/html5/thumbnails/14.jpg)
Kernel Estimator for LWAGE
![Page 15: 1. Descriptive Tools, Regression, Panel Data](https://reader035.fdocuments.net/reader035/viewer/2022070421/56816323550346895dd39e09/html5/thumbnails/15.jpg)
From Jones and Schurer (2011)
![Page 16: 1. Descriptive Tools, Regression, Panel Data](https://reader035.fdocuments.net/reader035/viewer/2022070421/56816323550346895dd39e09/html5/thumbnails/16.jpg)
Objective: Impact of Education on (log) Wage
• Specification: What is the right model to use to analyze this association?
• Estimation
• Inference
• Analysis
![Page 17: 1. Descriptive Tools, Regression, Panel Data](https://reader035.fdocuments.net/reader035/viewer/2022070421/56816323550346895dd39e09/html5/thumbnails/17.jpg)
Simple Linear Regression
LWAGE = 5.8388 + 0.0652*ED
![Page 18: 1. Descriptive Tools, Regression, Panel Data](https://reader035.fdocuments.net/reader035/viewer/2022070421/56816323550346895dd39e09/html5/thumbnails/18.jpg)
Multiple Regression
![Page 19: 1. Descriptive Tools, Regression, Panel Data](https://reader035.fdocuments.net/reader035/viewer/2022070421/56816323550346895dd39e09/html5/thumbnails/19.jpg)
Specification: Quadratic Effect of Experience
![Page 20: 1. Descriptive Tools, Regression, Panel Data](https://reader035.fdocuments.net/reader035/viewer/2022070421/56816323550346895dd39e09/html5/thumbnails/20.jpg)
Partial Effects
Education: .05654
Experience: .04045 - 2*.00068*Exp
FEM: -.38922
![Page 21: 1. Descriptive Tools, Regression, Panel Data](https://reader035.fdocuments.net/reader035/viewer/2022070421/56816323550346895dd39e09/html5/thumbnails/21.jpg)
Model Implication: Effect of Experience and Male vs. Female
![Page 22: 1. Descriptive Tools, Regression, Panel Data](https://reader035.fdocuments.net/reader035/viewer/2022070421/56816323550346895dd39e09/html5/thumbnails/22.jpg)
Hypothesis Test About Coefficients
• Hypothesis
• Null: Restriction on β: Rβ – q = 0
• Alternative: Not the null
• Approaches
• Fitting Criterion: R² decrease under the null?
• Wald: Rb – q close to 0 under the alternative?
![Page 23: 1. Descriptive Tools, Regression, Panel Data](https://reader035.fdocuments.net/reader035/viewer/2022070421/56816323550346895dd39e09/html5/thumbnails/23.jpg)
Hypotheses
All Coefficients = 0?
R = [ 0 | I ], q = [0]
ED Coefficient = 0?
R = [0,1,0,0,0,0,0,0,0,0,0], q = 0
No Experience effect?
R = [0,0,1,0,0,0,0,0,0,0,0
     0,0,0,1,0,0,0,0,0,0,0], q = [0, 0]′
![Page 24: 1. Descriptive Tools, Regression, Panel Data](https://reader035.fdocuments.net/reader035/viewer/2022070421/56816323550346895dd39e09/html5/thumbnails/24.jpg)
Hypothesis Test Statistics
Subscript 0 = the model under the null hypothesis
Subscript 1 = the model under the alternative hypothesis
1. Based on the Fitting Criterion R²
$$F = \frac{(R_1^2 - R_0^2)/J}{(1 - R_1^2)/(N - K_1)} = F[J, N - K_1]$$
2. Based on the Wald Distance: Note, for linear models, W = JF.
$$\text{Chi Squared} = (\mathbf{Rb} - \mathbf{q})'\left[\mathbf{R}\, s_1^2 (\mathbf{X}_1'\mathbf{X}_1)^{-1}\mathbf{R}'\right]^{-1}(\mathbf{Rb} - \mathbf{q})$$
![Page 25: 1. Descriptive Tools, Regression, Panel Data](https://reader035.fdocuments.net/reader035/viewer/2022070421/56816323550346895dd39e09/html5/thumbnails/25.jpg)
Hypothesis: All Coefficients Equal Zero
All Coefficients = 0?
R = [0 | I], q = [0]
$R_1^2$ = .41826
$R_0^2$ = .00000
F = 298.7 with [10,4154]
Wald = $\mathbf{b}_{2\text{-}11}'[\mathbf{V}_{2\text{-}11}]^{-1}\mathbf{b}_{2\text{-}11}$ = 2988.3355
Note that Wald = JF = 10(298.7) (some rounding error)
![Page 26: 1. Descriptive Tools, Regression, Panel Data](https://reader035.fdocuments.net/reader035/viewer/2022070421/56816323550346895dd39e09/html5/thumbnails/26.jpg)
Hypothesis: Education Effect = 0
ED Coefficient = 0?
R = [0,1,0,0,0,0,0,0,0,0,0], q = 0
$R_1^2$ = .41826
$R_0^2$ = .35265 (not shown)
F = 468.29
Wald = (.05654 - 0)² / (.00261)² = 468.29
Note F = t² and Wald = F for a single hypothesis about 1 coefficient.
![Page 27: 1. Descriptive Tools, Regression, Panel Data](https://reader035.fdocuments.net/reader035/viewer/2022070421/56816323550346895dd39e09/html5/thumbnails/27.jpg)
Hypothesis: Experience Effect = 0
No Experience effect?
R = [0,0,1,0,0,0,0,0,0,0,0
     0,0,0,1,0,0,0,0,0,0,0], q = [0, 0]′
$R_0^2$ = .33475, $R_1^2$ = .41826
F = 298.15
Wald = 596.3 (W* = 5.99)
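The F statistics reported for the three hypotheses can be checked from the R² values alone. The sketch below assumes N = 4165 (595 individuals times 7 years) and K₁ = 11 coefficients, which is what the reported [10,4154] degrees of freedom imply; the function name `f_statistic` is introduced here for illustration only.

```python
def f_statistic(r2_alt, r2_null, n_restrictions, n_obs, n_params):
    """F = [(R1^2 - R0^2)/J] / [(1 - R1^2)/(N - K1)]."""
    numerator = (r2_alt - r2_null) / n_restrictions
    denominator = (1.0 - r2_alt) / (n_obs - n_params)
    return numerator / denominator

N, K1 = 4165, 11      # assumed: 595 x 7 observations, 11 coefficients
R2_ALT = 0.41826      # R-squared of the unrestricted model

f_all = f_statistic(R2_ALT, 0.00000, 10, N, K1)  # all slopes zero, J = 10
f_ed = f_statistic(R2_ALT, 0.35265, 1, N, K1)    # ED coefficient zero, J = 1
f_exp = f_statistic(R2_ALT, 0.33475, 2, N, K1)   # no experience effect, J = 2
wald_exp = 2 * f_exp                             # W = JF in the linear model
```

The results agree with the slide values up to rounding in the reported R² figures.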
![Page 28: 1. Descriptive Tools, Regression, Panel Data](https://reader035.fdocuments.net/reader035/viewer/2022070421/56816323550346895dd39e09/html5/thumbnails/28.jpg)
Built In Test
![Page 29: 1. Descriptive Tools, Regression, Panel Data](https://reader035.fdocuments.net/reader035/viewer/2022070421/56816323550346895dd39e09/html5/thumbnails/29.jpg)
Robust Covariance Matrix
• What does robustness mean?
• Robust to: Heteroscedasticity
• Not robust to:
• Autocorrelation
• Individual heterogeneity
• The wrong model specification
• ‘Robust inference’
The White Estimator
$$\text{Est.Var}[\mathbf b] = (\mathbf X'\mathbf X)^{-1}\left[\sum_i e_i^2\,\mathbf x_i\mathbf x_i'\right](\mathbf X'\mathbf X)^{-1}$$
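A minimal numerical sketch of the White estimator, set beside the conventional s²(X′X)⁻¹ matrix. The data are simulated with an error standard deviation that grows with x; the function name `white_covariance` and all numbers are assumptions made for this example, not part of the original application.

```python
import numpy as np

def white_covariance(X, y):
    """OLS coefficients with conventional and White covariance matrices."""
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    b = xtx_inv @ X.T @ y
    e = y - X @ b
    conventional = (e @ e / (n - k)) * xtx_inv
    # Middle term of the sandwich: sum_i e_i^2 x_i x_i'
    meat = (X * e[:, None] ** 2).T @ X
    robust = xtx_inv @ meat @ xtx_inv
    return b, conventional, robust

rng = np.random.default_rng(1)
n = 5000
x = rng.uniform(1.0, 5.0, size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 0.5 * x + rng.normal(size=n) * x   # heteroscedastic: error s.d. grows with x
b, v_ols, v_white = white_covariance(X, y)
robust_se = np.sqrt(np.diag(v_white))
```

Only the standard errors change; the coefficient vector b is the ordinary least squares estimate in both cases.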
![Page 30: 1. Descriptive Tools, Regression, Panel Data](https://reader035.fdocuments.net/reader035/viewer/2022070421/56816323550346895dd39e09/html5/thumbnails/30.jpg)
Robust Covariance Matrix
Uncorrected
![Page 31: 1. Descriptive Tools, Regression, Panel Data](https://reader035.fdocuments.net/reader035/viewer/2022070421/56816323550346895dd39e09/html5/thumbnails/31.jpg)
Bootstrapping and Quantile Regression
![Page 32: 1. Descriptive Tools, Regression, Panel Data](https://reader035.fdocuments.net/reader035/viewer/2022070421/56816323550346895dd39e09/html5/thumbnails/32.jpg)
Estimating the Asymptotic Variance of an Estimator
• Known form of asymptotic variance: Compute from known results
• Unknown form, known generalities about properties: Use bootstrapping
• Root N consistency
• Sampling conditions amenable to central limit theorems
• Compute by resampling mechanism within the sample.
![Page 33: 1. Descriptive Tools, Regression, Panel Data](https://reader035.fdocuments.net/reader035/viewer/2022070421/56816323550346895dd39e09/html5/thumbnails/33.jpg)
Bootstrapping
Method:
1. Estimate parameters using full sample: b
2. Repeat R times: Draw n observations from the n, with replacement. Estimate β with b(r).
3. Estimate variance with V = (1/R) Σ_r [b(r) - b][b(r) - b]′
(Some use mean of replications instead of b. Advocated (without motivation) by original designers of the method.)
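The three steps can be sketched as follows for a least squares regression. This is an illustrative implementation on simulated data; the helper names `ols` and `bootstrap_covariance`, the seeds, and the number of replications are assumptions made here. Deviations are taken around the full-sample b, as in step 3.

```python
import numpy as np

def ols(X, y):
    """Least squares coefficients."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def bootstrap_covariance(X, y, n_reps=200, seed=0):
    """Bootstrap estimate of the covariance matrix of the OLS estimator."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    b = ols(X, y)                                  # step 1: full-sample estimate
    deviations = np.empty((n_reps, X.shape[1]))
    for r in range(n_reps):                        # step 2: resample rows with replacement
        idx = rng.integers(0, n, size=n)
        deviations[r] = ols(X[idx], y[idx]) - b
    V = deviations.T @ deviations / n_reps         # step 3: (1/R) sum of outer products
    return b, V

rng = np.random.default_rng(42)
n = 200
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = 2.0 + 3.0 * x + rng.normal(size=n)             # simulated data
b, V = bootstrap_covariance(X, y, n_reps=200, seed=7)
std_errors = np.sqrt(np.diag(V))
```

Replacing `b` with the mean of the replications inside the loop's deviation gives the alternative form mentioned in the parenthetical note.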
![Page 34: 1. Descriptive Tools, Regression, Panel Data](https://reader035.fdocuments.net/reader035/viewer/2022070421/56816323550346895dd39e09/html5/thumbnails/34.jpg)
Application: Correlation between Age and Education
![Page 35: 1. Descriptive Tools, Regression, Panel Data](https://reader035.fdocuments.net/reader035/viewer/2022070421/56816323550346895dd39e09/html5/thumbnails/35.jpg)
Bootstrap Regression - Replications
namelist;x=one,y,pg$                    Define X
regress;lhs=g;rhs=x$                    Compute and display b
proc                                    Define procedure
regress;quietly;lhs=g;rhs=x$            … Regression (silent)
endproc                                 Ends procedure
execute;n=20;bootstrap=b$               20 bootstrap reps
matrix;list;bootstrp $                  Display replications
![Page 36: 1. Descriptive Tools, Regression, Panel Data](https://reader035.fdocuments.net/reader035/viewer/2022070421/56816323550346895dd39e09/html5/thumbnails/36.jpg)
--------+-------------------------------------------------------------
Variable| Coefficient    Standard Error  t-ratio  P[|T|>t]   Mean of X
--------+-------------------------------------------------------------
Constant|   -79.7535***      8.67255      -9.196    .0000
       Y|      .03692***      .00132      28.022    .0000     9232.86
      PG|   -15.1224***      1.88034      -8.042    .0000     2.31661
--------+-------------------------------------------------------------
Completed 20 bootstrap iterations.
----------------------------------------------------------------------
Results of bootstrap estimation of model.
Model has been reestimated 20 times.
Means shown below are the means of the
bootstrap estimates. Coefficients shown
below are the original estimates based
on the full sample.
bootstrap samples have 36 observations.
--------+-------------------------------------------------------------
Variable| Coefficient    Standard Error  b/St.Er. P[|Z|>z]   Mean of X
--------+-------------------------------------------------------------
    B001|   -79.7535***      8.35512      -9.545    .0000    -79.5329
    B002|      .03692***      .00133      27.773    .0000      .03682
    B003|   -15.1224***      2.03503      -7.431    .0000    -14.7654
--------+-------------------------------------------------------------
Results of Bootstrap Procedure
![Page 37: 1. Descriptive Tools, Regression, Panel Data](https://reader035.fdocuments.net/reader035/viewer/2022070421/56816323550346895dd39e09/html5/thumbnails/37.jpg)
Bootstrap Replications
Full sample result
Bootstrapped sample results
![Page 38: 1. Descriptive Tools, Regression, Panel Data](https://reader035.fdocuments.net/reader035/viewer/2022070421/56816323550346895dd39e09/html5/thumbnails/38.jpg)
Quantile Regression
• Q(y|x,τ) = β_τ′x, τ = quantile
• Estimated by linear programming
• Q(y|x,.50) = β_.50′x; τ = .50 is median regression
• Median regression estimated by LAD (estimates same parameters as mean regression if symmetric conditional distribution)
• Why use quantile (median) regression?
• Semiparametric
• Robust to some extensions (heteroscedasticity?)
• Complete characterization of conditional distribution
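To see what a quantile regression minimizes, the sketch below fits the simplest possible model, a constant only, by minimizing the summed check function ρ_τ(u) = u(τ − 1[u < 0]). It uses a brute-force search over the data points rather than the linear programming algorithm named above, and the function names and the five data values are made up for illustration.

```python
import numpy as np

def check_loss(u, tau):
    """Check function rho_tau(u) = u * (tau - 1[u < 0])."""
    return u * (tau - (u < 0))

def constant_quantile_fit(y, tau):
    """Minimize the summed check loss over a constant.

    Some data point always attains the minimum, so only those are searched.
    """
    y = np.asarray(y, dtype=float)
    losses = [check_loss(y - c, tau).sum() for c in y]
    return y[int(np.argmin(losses))]

y = np.array([1.0, 2.0, 3.0, 4.0, 100.0])
median_fit = constant_quantile_fit(y, 0.50)    # LAD fit of a constant
lower_fit = constant_quantile_fit(y, 0.25)
mean_fit = float(y.mean())                     # least squares fit of a constant
```

With τ = .50 the fitted constant is the sample median, 3, while the least squares fit is the mean, 22, pulled up by the single large observation. This is the robustness property that motivates LAD.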
![Page 39: 1. Descriptive Tools, Regression, Panel Data](https://reader035.fdocuments.net/reader035/viewer/2022070421/56816323550346895dd39e09/html5/thumbnails/39.jpg)
Estimated Variance for Quantile Regression
• Asymptotic Theory
• Bootstrap – an ideal application
![Page 40: 1. Descriptive Tools, Regression, Panel Data](https://reader035.fdocuments.net/reader035/viewer/2022070421/56816323550346895dd39e09/html5/thumbnails/40.jpg)
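Asymptotic Theory Based Estimator of Variance of Q-REG
Model: $y_i = \boldsymbol\beta'\mathbf{x}_i + u_i$, $Q(y_i \mid \mathbf{x}_i, \tau) = \boldsymbol\beta'\mathbf{x}_i$, $Q[u_i \mid \mathbf{x}_i, \tau] = 0$
Residuals: $\hat u_i = y_i - \hat{\boldsymbol\beta}'\mathbf{x}_i$
Asymptotic Variance: $\frac{1}{N}\mathbf{A}^{-1}\mathbf{C}\mathbf{A}^{-1}$
$\mathbf{A} = E[f_u(0)\,\mathbf{x}\mathbf{x}']$, estimated by $\frac{1}{N}\sum_{i=1}^{N}\frac{1}{B}\,\mathbf{1}\!\left[|\hat u_i| < \frac{B}{2}\right]\mathbf{x}_i\mathbf{x}_i'$
Bandwidth B can be Silverman's Rule of Thumb: $B = \frac{1.06}{N^{.2}}\,\text{Min}\!\left[s_u, \frac{Q(\hat u \mid .75) - Q(\hat u \mid .25)}{1.349}\right]$
$\mathbf{C} = \tau(1-\tau)E[\mathbf{x}\mathbf{x}']$, estimated by $\frac{\tau(1-\tau)}{N}\mathbf{X}'\mathbf{X}$
For τ = .5 and normally distributed u, this all simplifies to $\frac{\pi}{2}s_u^2(\mathbf{X}'\mathbf{X})^{-1}$.
But, this is an ideal application for bootstrapping.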
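The rule-of-thumb bandwidth on this slide combines the constants 1.06 and 1.349 with the smaller of two spread measures of the residuals. A short sketch of that calculation, assuming the form B = 1.06 · Min[s_u, IQR/1.349] / N^.2; the function name `silverman_bandwidth` and the simulated residuals are illustrative only.

```python
import numpy as np

def silverman_bandwidth(residuals):
    """B = 1.06 * min(s_u, IQR / 1.349) / N**0.2 for a vector of residuals."""
    u = np.asarray(residuals, dtype=float)
    n = u.size
    s_u = u.std(ddof=1)                                   # sample standard deviation
    iqr = np.quantile(u, 0.75) - np.quantile(u, 0.25)     # interquartile range
    return 1.06 * min(s_u, iqr / 1.349) / n ** 0.2

rng = np.random.default_rng(5)
u_hat = rng.normal(size=1000)          # stand-in for quantile regression residuals
B = silverman_bandwidth(u_hat)
```

For normal residuals the two spread measures nearly coincide, so B is close to 1.06 · s_u / N^.2.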
![Page 41: 1. Descriptive Tools, Regression, Panel Data](https://reader035.fdocuments.net/reader035/viewer/2022070421/56816323550346895dd39e09/html5/thumbnails/41.jpg)
τ = .25
τ = .50
τ = .75
![Page 42: 1. Descriptive Tools, Regression, Panel Data](https://reader035.fdocuments.net/reader035/viewer/2022070421/56816323550346895dd39e09/html5/thumbnails/42.jpg)
OLS vs. Least Absolute Deviations
----------------------------------------------------------------------
Least absolute deviations estimator...............
Residuals  Sum of squares       = 1537.58603
           Standard error of e  = 6.82594
Fit        R-squared            = .98284
           Adjusted R-squared   = .98180
Sum of absolute deviations      = 189.3973484
--------+-------------------------------------------------------------
Variable| Coefficient    Standard Error  b/St.Er. P[|Z|>z]   Mean of X
--------+-------------------------------------------------------------
        |Covariance matrix based on 50 replications.
Constant|   -84.0258***     16.08614      -5.223    .0000
       Y|      .03784***      .00271      13.952    .0000     9232.86
      PG|   -17.0990***      4.37160      -3.911    .0001     2.31661
--------+-------------------------------------------------------------
Ordinary least squares regression ............
Residuals  Sum of squares       = 1472.79834
           Standard error of e  = 6.68059
Fit        R-squared            = .98356
           Adjusted R-squared   = .98256
--------+-------------------------------------------------------------
Variable| Coefficient    Standard Error  t-ratio  P[|T|>t]   Mean of X
--------+-------------------------------------------------------------
Constant|   -79.7535***      8.67255      -9.196    .0000
       Y|      .03692***      .00132      28.022    .0000     9232.86
      PG|   -15.1224***      1.88034      -8.042    .0000     2.31661
--------+-------------------------------------------------------------
Standard errors are based on 50 bootstrap replications
![Page 43: 1. Descriptive Tools, Regression, Panel Data](https://reader035.fdocuments.net/reader035/viewer/2022070421/56816323550346895dd39e09/html5/thumbnails/43.jpg)
![Page 44: 1. Descriptive Tools, Regression, Panel Data](https://reader035.fdocuments.net/reader035/viewer/2022070421/56816323550346895dd39e09/html5/thumbnails/44.jpg)
![Page 45: 1. Descriptive Tools, Regression, Panel Data](https://reader035.fdocuments.net/reader035/viewer/2022070421/56816323550346895dd39e09/html5/thumbnails/45.jpg)
Nonlinear Models
![Page 46: 1. Descriptive Tools, Regression, Panel Data](https://reader035.fdocuments.net/reader035/viewer/2022070421/56816323550346895dd39e09/html5/thumbnails/46.jpg)
Nonlinear Models
• Specifying the model
• Multinomial Choice
• How do the covariates relate to the outcome of interest
• What are the implications of the estimated model?
![Page 47: 1. Descriptive Tools, Regression, Panel Data](https://reader035.fdocuments.net/reader035/viewer/2022070421/56816323550346895dd39e09/html5/thumbnails/47.jpg)
![Page 48: 1. Descriptive Tools, Regression, Panel Data](https://reader035.fdocuments.net/reader035/viewer/2022070421/56816323550346895dd39e09/html5/thumbnails/48.jpg)
Unordered Choices of 210 Travelers
![Page 49: 1. Descriptive Tools, Regression, Panel Data](https://reader035.fdocuments.net/reader035/viewer/2022070421/56816323550346895dd39e09/html5/thumbnails/49.jpg)
Data on Discrete Choices
![Page 50: 1. Descriptive Tools, Regression, Panel Data](https://reader035.fdocuments.net/reader035/viewer/2022070421/56816323550346895dd39e09/html5/thumbnails/50.jpg)
Specifying the Probabilities
• Choice specific attributes (X) vary by choices, multiply by generic coefficients. E.g., TTME = terminal time, GC = generalized cost of travel mode
• Generic characteristics (Income, constants) must be interacted with choice specific constants.
• Estimation by maximum likelihood; $d_{ij}$ = 1 if person i chooses j
$$P[\text{choice} = j \mid \mathbf{x}_{itj}, \mathbf{z}_{it}, i, t] = \text{Prob}[U_{i,t,j} \ge U_{i,t,k}],\quad k = 1,\ldots,J(i,t)$$
$$= \frac{\exp(\alpha_j + \boldsymbol\beta'\mathbf{x}_{itj} + \boldsymbol\gamma_j'\mathbf{z}_{it})}{\sum_{m=1}^{J(i,t)}\exp(\alpha_m + \boldsymbol\beta'\mathbf{x}_{itm} + \boldsymbol\gamma_m'\mathbf{z}_{it})}$$
$$\log L = \sum_{i=1}^{N}\sum_{j=1}^{J(i)} d_{ij}\log P_{ij}$$
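The probabilities and the log likelihood can be computed directly once the utility indexes α_j + β′x_ij + γ_j′z_i are formed. The sketch below uses made-up numbers for three travelers and four modes, with one attribute (terminal time) and one characteristic (income); the parameter values and the function names are hypothetical and are not estimates from the travel mode data.

```python
import numpy as np

def mnl_probabilities(utilities):
    """Row-wise logit probabilities from an (N, J) array of utility indexes."""
    shifted = utilities - utilities.max(axis=1, keepdims=True)  # avoids overflow
    expu = np.exp(shifted)
    return expu / expu.sum(axis=1, keepdims=True)

def mnl_log_likelihood(utilities, chosen):
    """log L = sum_i sum_j d_ij log P_ij, with d_ij = 1 for the chosen alternative."""
    probs = mnl_probabilities(utilities)
    d = np.zeros_like(probs)
    d[np.arange(len(chosen)), chosen] = 1.0
    return float((d * np.log(probs)).sum())

# Hypothetical parameter values and data.
alpha = np.array([0.0, 0.5, 0.2, -0.3])      # choice specific constants
beta = -0.05                                 # generic coefficient on terminal time
gamma = np.array([0.0, 0.01, 0.0, -0.01])    # income interacted with each choice
ttme = np.array([[60.0, 30.0, 35.0, 0.0],
                 [45.0, 20.0, 40.0, 0.0],
                 [70.0, 35.0, 30.0, 0.0]])
income = np.array([35.0, 50.0, 20.0])
utilities = alpha + beta * ttme + income[:, None] * gamma
chosen = np.array([3, 1, 2])                 # index of the alternative each person chose
probs = mnl_probabilities(utilities)
loglik = mnl_log_likelihood(utilities, chosen)
```

Maximum likelihood estimation would search over (α, β, γ) to make `loglik` as large as possible; only the evaluation step is shown here.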
![Page 51: 1. Descriptive Tools, Regression, Panel Data](https://reader035.fdocuments.net/reader035/viewer/2022070421/56816323550346895dd39e09/html5/thumbnails/51.jpg)
Estimated MNL Model
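$$P[\text{choice} = j \mid \mathbf{x}_{itj}, \mathbf{z}_{it}, i, t] = \text{Prob}[U_{i,t,j} \ge U_{i,t,k}],\quad k = 1,\ldots,J(i,t)$$
$$= \frac{\exp(\alpha_j + \boldsymbol\beta'\mathbf{x}_{itj} + \boldsymbol\gamma_j'\mathbf{z}_{it})}{\sum_{m=1}^{J(i,t)}\exp(\alpha_m + \boldsymbol\beta'\mathbf{x}_{itm} + \boldsymbol\gamma_m'\mathbf{z}_{it})}$$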
![Page 52: 1. Descriptive Tools, Regression, Panel Data](https://reader035.fdocuments.net/reader035/viewer/2022070421/56816323550346895dd39e09/html5/thumbnails/52.jpg)
Endogeneity
![Page 53: 1. Descriptive Tools, Regression, Panel Data](https://reader035.fdocuments.net/reader035/viewer/2022070421/56816323550346895dd39e09/html5/thumbnails/53.jpg)
The Effect of Education on LWAGE
$$LWAGE = \beta_1 + \beta_2\,EDUC + \beta_3\,EXP + \beta_4\,EXP^2 + \ldots + \varepsilon$$
What is ε? Ability, Motivation, ... + everything else
$$EDUC = f(GENDER, SMSA, SOUTH, \text{Ability, Motivation}, \ldots)$$
![Page 54: 1. Descriptive Tools, Regression, Panel Data](https://reader035.fdocuments.net/reader035/viewer/2022070421/56816323550346895dd39e09/html5/thumbnails/54.jpg)
What Influences LWAGE?
$$LWAGE = \beta_1 + \beta_2\,EDUC(X, \text{Ability, Motivation}, \ldots) + \beta_3\,EXP + \beta_4\,EXP^2 + \ldots + \varepsilon(\text{Ability, Motivation})$$
Increased Ability is associated with increases in EDUC(X, Ability, Motivation, ...) and ε(Ability, Motivation).
What looks like an effect due to increase in EDUC may be an increase in Ability. The estimate of β₂ picks up the effect of EDUC and the hidden effect of Ability.
![Page 55: 1. Descriptive Tools, Regression, Panel Data](https://reader035.fdocuments.net/reader035/viewer/2022070421/56816323550346895dd39e09/html5/thumbnails/55.jpg)
An Exogenous Influence
$$LWAGE = \beta_1 + \beta_2\,EDUC(X, Z, \text{Ability, Motivation}, \ldots) + \beta_3\,EXP + \beta_4\,EXP^2 + \ldots + \varepsilon(\text{Ability, Motivation})$$
Increased Z is associated with increases in EDUC(X, Z, Ability, Motivation, ...) and not ε(Ability, Motivation).
An effect due to the effect of an increase in Z on EDUC will only be an increase in EDUC. The estimate of β₂ picks up the effect of EDUC only.
Z is an Instrumental Variable
![Page 56: 1. Descriptive Tools, Regression, Panel Data](https://reader035.fdocuments.net/reader035/viewer/2022070421/56816323550346895dd39e09/html5/thumbnails/56.jpg)
Instrumental Variables
• Structure
• LWAGE (ED,EXP,EXPSQ,WKS,OCC, SOUTH,SMSA,UNION)
• ED (MS, FEM)
• Reduced Form: LWAGE[ ED (MS, FEM), EXP,EXPSQ,WKS,OCC, SOUTH,SMSA,UNION ]
![Page 57: 1. Descriptive Tools, Regression, Panel Data](https://reader035.fdocuments.net/reader035/viewer/2022070421/56816323550346895dd39e09/html5/thumbnails/57.jpg)
Two Stage Least Squares Strategy
• Reduced Form: LWAGE[ ED (MS, FEM,X), EXP,EXPSQ,WKS,OCC, SOUTH,SMSA,UNION ]
• Strategy
• (1) Purge ED of the influence of everything but MS, FEM (and the other variables). Predict ED using all exogenous information in the sample (X and Z).
• (2) Regress LWAGE on this prediction of ED and everything else.
• Standard errors must be adjusted for the predicted ED
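The two steps, and the standard error adjustment, can be sketched on simulated data. Everything below is hypothetical: the variable names, the data generating process (an unobserved "ability" that raises both schooling and wages), and the helper functions are assumptions for illustration, not the Cornwell and Rupert application.

```python
import numpy as np

def ols(X, y):
    """Least squares coefficients."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def two_stage_least_squares(y, endog, exog, instruments):
    """2SLS with one endogenous regressor; exog must include the constant."""
    Z = np.column_stack([exog, instruments])     # all exogenous information
    endog_hat = Z @ ols(Z, endog)                # step 1: predict the endogenous regressor
    X_hat = np.column_stack([endog_hat, exog])
    b = ols(X_hat, y)                            # step 2: regress y on the prediction
    # Adjustment: residuals use the actual regressor, not its prediction.
    X = np.column_stack([endog, exog])
    e = y - X @ b
    s2 = e @ e / (len(y) - X.shape[1])
    cov = s2 * np.linalg.inv(X_hat.T @ X_hat)
    return b, cov

rng = np.random.default_rng(2)
n = 20000
ability = rng.normal(size=n)                     # unobserved by the analyst
z = rng.normal(size=n)                           # instrument
x1 = rng.normal(size=n)                          # exogenous control
educ = 12 + 1.0 * z + 0.5 * x1 + ability + rng.normal(size=n)
lwage = 1.0 + 0.06 * educ + 0.1 * x1 + 0.3 * ability + rng.normal(scale=0.3, size=n)
exog = np.column_stack([np.ones(n), x1])

b_iv, cov_iv = two_stage_least_squares(lwage, educ, exog, z)
b_ols = ols(np.column_stack([educ, exog]), lwage)
```

In this simulation the true coefficient on schooling is 0.06. OLS overstates it because schooling is correlated with the omitted ability term, while the 2SLS estimate is close to the true value.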
![Page 58: 1. Descriptive Tools, Regression, Panel Data](https://reader035.fdocuments.net/reader035/viewer/2022070421/56816323550346895dd39e09/html5/thumbnails/58.jpg)
The weird results for the coefficient on ED happened because the instruments, MS and FEM, are dummy variables. There is not enough variation in these variables.
![Page 59: 1. Descriptive Tools, Regression, Panel Data](https://reader035.fdocuments.net/reader035/viewer/2022070421/56816323550346895dd39e09/html5/thumbnails/59.jpg)
Source of Endogeneity
• LWAGE = f(ED, EXP,EXPSQ,WKS,OCC, SOUTH,SMSA,UNION) + ε
• ED = f(MS,FEM, EXP,EXPSQ,WKS,OCC, SOUTH,SMSA,UNION) + u
![Page 60: 1. Descriptive Tools, Regression, Panel Data](https://reader035.fdocuments.net/reader035/viewer/2022070421/56816323550346895dd39e09/html5/thumbnails/60.jpg)
Remove the Endogeneity
• LWAGE = f(ED, EXP,EXPSQ,WKS,OCC, SOUTH,SMSA,UNION) + u + ε
• Strategy: Estimate u. Add u to the equation. ED is uncorrelated with ε when u is in the equation.
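In a linear model, adding the estimated u to the equation gives the same coefficient on ED as two stage least squares. The sketch below checks this numerically on simulated data; the variable names and the data generating process are hypothetical.

```python
import numpy as np

def ols(X, y):
    """Least squares coefficients."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

rng = np.random.default_rng(3)
n = 5000
ability = rng.normal(size=n)                 # unobserved, drives the endogeneity
z = rng.normal(size=n)                       # instrument
ed = 12 + z + ability + rng.normal(size=n)
lwage = 1.0 + 0.06 * ed + 0.3 * ability + rng.normal(scale=0.3, size=n)
const = np.ones(n)

# Auxiliary regression of ED on the exogenous variables; keep the residuals.
Z = np.column_stack([const, z])
u_hat = ed - Z @ ols(Z, ed)

# OLS of LWAGE on ED, the constant, and the residual (control function).
b_cf = ols(np.column_stack([ed, const, u_hat]), lwage)

# 2SLS for comparison: replace ED with its first-stage prediction.
ed_hat = ed - u_hat
b_2sls = ols(np.column_stack([ed_hat, const]), lwage)

difference = abs(b_cf[0] - b_2sls[0])        # zero up to floating point error
```

The coefficient on `u_hat` itself is informative too: a value far from zero indicates that ED is correlated with the disturbance.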
![Page 61: 1. Descriptive Tools, Regression, Panel Data](https://reader035.fdocuments.net/reader035/viewer/2022070421/56816323550346895dd39e09/html5/thumbnails/61.jpg)
Auxiliary Regression for ED to Obtain Residuals
![Page 62: 1. Descriptive Tools, Regression, Panel Data](https://reader035.fdocuments.net/reader035/viewer/2022070421/56816323550346895dd39e09/html5/thumbnails/62.jpg)
OLS with Residual (Control Function) Added
2SLS
![Page 63: 1. Descriptive Tools, Regression, Panel Data](https://reader035.fdocuments.net/reader035/viewer/2022070421/56816323550346895dd39e09/html5/thumbnails/63.jpg)
A Warning About Control Function
![Page 64: 1. Descriptive Tools, Regression, Panel Data](https://reader035.fdocuments.net/reader035/viewer/2022070421/56816323550346895dd39e09/html5/thumbnails/64.jpg)
Endogenous Dummy Variable
• Y = xβ + δT + ε (unobservable factors)
• T = a dummy variable (treatment)
• T = 0/1 depending on:
• x and z
• The same unobservable factors
• T is endogenous – same as ED
![Page 65: 1. Descriptive Tools, Regression, Panel Data](https://reader035.fdocuments.net/reader035/viewer/2022070421/56816323550346895dd39e09/html5/thumbnails/65.jpg)
Application: Health Care Panel Data
German Health Care Usage Data
Data downloaded from Journal of Applied Econometrics Archive. This is an unbalanced panel with 7,293 individuals. They can be used for regression, count models, binary choice, ordered choice, and bivariate binary choice. This is a large data set. There are altogether 27,326 observations. The number of observations ranges from 1 to 7. (Frequencies are: 1=1525, 2=2158, 3=825, 4=926, 5=1051, 6=1000, 7=987).
Variables in the file are
DOCTOR = 1(Number of doctor visits > 0)
HOSPITAL = 1(Number of hospital visits > 0)
HSAT = health satisfaction, coded 0 (low) - 10 (high)
DOCVIS = number of doctor visits in last three months
HOSPVIS = number of hospital visits in last calendar year
PUBLIC = insured in public health insurance = 1; otherwise = 0
ADDON = insured by add-on insurance = 1; otherwise = 0
HHNINC = household nominal monthly net income in German marks / 10000. (4 observations with income=0 were dropped)
HHKIDS = children under age 16 in the household = 1; otherwise = 0
EDUC = years of schooling
AGE = age in years
MARRIED = marital status
![Page 66: 1. Descriptive Tools, Regression, Panel Data](https://reader035.fdocuments.net/reader035/viewer/2022070421/56816323550346895dd39e09/html5/thumbnails/66.jpg)
A study of moral hazard
Riphahn, Wambach, Million: “Incentive Effects in the Demand for Healthcare”
Journal of Applied Econometrics, 2003
Did the presence of the ADDON insurance influence the demand for health care – doctor visits and hospital visits?
For a simple example, we examine the PUBLIC insurance (89%) instead of ADDON insurance (2%).
![Page 67: 1. Descriptive Tools, Regression, Panel Data](https://reader035.fdocuments.net/reader035/viewer/2022070421/56816323550346895dd39e09/html5/thumbnails/67.jpg)
Evidence of Moral Hazard?
![Page 68: 1. Descriptive Tools, Regression, Panel Data](https://reader035.fdocuments.net/reader035/viewer/2022070421/56816323550346895dd39e09/html5/thumbnails/68.jpg)
Regression Study
![Page 69: 1. Descriptive Tools, Regression, Panel Data](https://reader035.fdocuments.net/reader035/viewer/2022070421/56816323550346895dd39e09/html5/thumbnails/69.jpg)
Endogenous Dummy Variable
• Doctor Visits = f(Age, Educ, Health, Presence of Insurance, Other unobservables)
• Insurance = f(Expected Doctor Visits, Other unobservables)
![Page 70: 1. Descriptive Tools, Regression, Panel Data](https://reader035.fdocuments.net/reader035/viewer/2022070421/56816323550346895dd39e09/html5/thumbnails/70.jpg)
Approaches
• (Parametric) Control Function: Build a structural model for the two variables (Heckman)
• (Semiparametric) Instrumental Variable: Create an instrumental variable for the dummy variable (Barnow/Cain/Goldberger, Angrist, current generation of researchers)
• (?) Propensity Score Matching (Heckman et al., Becker/Ichino, Many recent researchers)
![Page 71: 1. Descriptive Tools, Regression, Panel Data](https://reader035.fdocuments.net/reader035/viewer/2022070421/56816323550346895dd39e09/html5/thumbnails/71.jpg)
Heckman’s Control Function Approach
• Y = xβ + δT + E[ε|T] + {ε - E[ε|T]}
• λ = E[ε|T], computed from a model for whether T = 0 or 1
Magnitude = 11.1200 is nonsensical in this context.
![Page 72: 1. Descriptive Tools, Regression, Panel Data](https://reader035.fdocuments.net/reader035/viewer/2022070421/56816323550346895dd39e09/html5/thumbnails/72.jpg)
Instrumental Variable Approach
• Construct a prediction for T using only the exogenous information
• Use 2SLS using this instrumental variable.
Magnitude = 23.9012 is also nonsensical in this context.
![Page 73: 1. Descriptive Tools, Regression, Panel Data](https://reader035.fdocuments.net/reader035/viewer/2022070421/56816323550346895dd39e09/html5/thumbnails/73.jpg)
Propensity Score Matching
• Create a model for T that produces probabilities for T=1: “Propensity Scores”
• Find people with the same propensity score – some with T=1, some with T=0
• Compare number of doctor visits of those with T=1 to those with T=0.
![Page 74: 1. Descriptive Tools, Regression, Panel Data](https://reader035.fdocuments.net/reader035/viewer/2022070421/56816323550346895dd39e09/html5/thumbnails/74.jpg)
Panel Data
![Page 75: 1. Descriptive Tools, Regression, Panel Data](https://reader035.fdocuments.net/reader035/viewer/2022070421/56816323550346895dd39e09/html5/thumbnails/75.jpg)
Benefits of Panel Data
• Time and individual variation in behavior unobservable in cross sections or aggregate time series
• Observable and unobservable individual heterogeneity
• Rich hierarchical structures
• More complicated models
• Features that cannot be modeled with cross section or aggregate time series data alone
• Dynamics in economic behavior
![Page 76: 1. Descriptive Tools, Regression, Panel Data](https://reader035.fdocuments.net/reader035/viewer/2022070421/56816323550346895dd39e09/html5/thumbnails/76.jpg)
![Page 77: 1. Descriptive Tools, Regression, Panel Data](https://reader035.fdocuments.net/reader035/viewer/2022070421/56816323550346895dd39e09/html5/thumbnails/77.jpg)
![Page 78: 1. Descriptive Tools, Regression, Panel Data](https://reader035.fdocuments.net/reader035/viewer/2022070421/56816323550346895dd39e09/html5/thumbnails/78.jpg)
![Page 79: 1. Descriptive Tools, Regression, Panel Data](https://reader035.fdocuments.net/reader035/viewer/2022070421/56816323550346895dd39e09/html5/thumbnails/79.jpg)
![Page 80: 1. Descriptive Tools, Regression, Panel Data](https://reader035.fdocuments.net/reader035/viewer/2022070421/56816323550346895dd39e09/html5/thumbnails/80.jpg)
![Page 81: 1. Descriptive Tools, Regression, Panel Data](https://reader035.fdocuments.net/reader035/viewer/2022070421/56816323550346895dd39e09/html5/thumbnails/81.jpg)
![Page 82: 1. Descriptive Tools, Regression, Panel Data](https://reader035.fdocuments.net/reader035/viewer/2022070421/56816323550346895dd39e09/html5/thumbnails/82.jpg)
![Page 83: 1. Descriptive Tools, Regression, Panel Data](https://reader035.fdocuments.net/reader035/viewer/2022070421/56816323550346895dd39e09/html5/thumbnails/83.jpg)
![Page 84: 1. Descriptive Tools, Regression, Panel Data](https://reader035.fdocuments.net/reader035/viewer/2022070421/56816323550346895dd39e09/html5/thumbnails/84.jpg)
![Page 85: 1. Descriptive Tools, Regression, Panel Data](https://reader035.fdocuments.net/reader035/viewer/2022070421/56816323550346895dd39e09/html5/thumbnails/85.jpg)
![Page 86: 1. Descriptive Tools, Regression, Panel Data](https://reader035.fdocuments.net/reader035/viewer/2022070421/56816323550346895dd39e09/html5/thumbnails/86.jpg)
Application: Health Care Usage
German Health Care Usage Data
This is an unbalanced panel with 7,293 individuals. There are altogether 27,326 observations. The number of observations ranges from 1 to 7. Frequencies are: 1=1525, 2=2158, 3=825, 4=926, 5=1051, 6=1000, 7=987. Downloaded from the JAE Archive.
Variables in the file include
DOCTOR = 1(Number of doctor visits > 0)
HOSPITAL = 1(Number of hospital visits > 0)
HSAT = health satisfaction, coded 0 (low) - 10 (high)
DOCVIS = number of doctor visits in last three months
HOSPVIS = number of hospital visits in last calendar year
PUBLIC = insured in public health insurance = 1; otherwise = 0
ADDON = insured by add-on insurance = 1; otherwise = 0
INCOME = household nominal monthly net income in German marks / 10000. (4 observations with income=0 will sometimes be dropped)
HHKIDS = children under age 16 in the household = 1; otherwise = 0
EDUC = years of schooling
AGE = age in years
MARRIED = marital status
![Page 87: 1. Descriptive Tools, Regression, Panel Data](https://reader035.fdocuments.net/reader035/viewer/2022070421/56816323550346895dd39e09/html5/thumbnails/87.jpg)
Balanced and Unbalanced Panels
• Distinction: Balanced vs. Unbalanced Panels
• A notation to help with mechanics: $z_{i,t}$, i = 1,…,N; t = 1,…,$T_i$
• The role of the assumption
• Mathematical and notational convenience: Balanced, n = NT; Unbalanced: $n = \sum_{i=1}^{N} T_i$
• Is the fixed $T_i$ assumption ever necessary? Almost never.
• Is unbalancedness due to nonrandom attrition from an otherwise balanced panel? This would require special considerations.
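The bookkeeping for an unbalanced panel reduces to counting observations per individual. A small sketch with a made-up identifier column (four people, ten observations); the array `person_id` and its values are hypothetical.

```python
import numpy as np

# Hypothetical person identifiers, one entry per observation in the stacked data.
person_id = np.array([1, 1, 1, 2, 3, 3, 4, 4, 4, 4])

ids, T_i = np.unique(person_id, return_counts=True)   # T_i = observations per person
N = ids.size                                          # number of individuals
n = int(T_i.sum())                                    # total observations, sum of T_i
is_balanced = bool(np.all(T_i == T_i[0]))             # balanced only if every T_i is equal
```

For a balanced panel the same code gives n = N × T, so nothing in the counting depends on the fixed T assumption.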
![Page 88: 1. Descriptive Tools, Regression, Panel Data](https://reader035.fdocuments.net/reader035/viewer/2022070421/56816323550346895dd39e09/html5/thumbnails/88.jpg)
An Unbalanced Panel: RWM’s GSOEP Data on Health Care