Pooled Cross-Section Time Series Data
Wooldridge Chapters 13 and 14

Types of Data
• Pooled Cross Sections: Independent cross section data at different points in time.
• Panel / Longitudinal: Uniquely identified cross section units (i) followed over time.
    • Balanced Panel: All i appear in every period.
    • Unbalanced Panel: Some i are missing for some time periods.
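
The balanced / unbalanced distinction can be checked mechanically. The sketch below is only an illustration: the function name `is_balanced` and the two small index lists are made up for this example and are not part of the course data.

```python
from collections import defaultdict

def is_balanced(rows):
    """Return True if every unit i is observed in every period t."""
    periods_by_unit = defaultdict(set)
    for i, t in rows:
        periods_by_unit[i].add(t)
    all_periods = {t for _, t in rows}
    return all(p == all_periods for p in periods_by_unit.values())

# Hypothetical (i, t) index pairs, made up for illustration.
balanced = [(1, 1), (1, 2), (2, 1), (2, 2)]
unbalanced = [(1, 1), (1, 2), (2, 1)]  # unit 2 is missing in period 2

print(is_balanced(balanced))
print(is_balanced(unbalanced))
```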

Example: Two Period Panel Data (N=4, T=2)

i   t   Consumption (Y)   Income (X)
1   1   72                98
1   2   75                102
2   1   31                40
2   2   26                39
3   1   55                66
3   2   62                70
4   1   41                59
4   2   45                60

$Y_{it} = B_0 + B_1 X_{it} + e_{it}$
$B_1 = 0.72$, but how to interpret?
[Figure: scatter plot of y against x with the fitted values line.]

Interpreting Coefficients
$Y_{it} = B_0 + B_1 X_{it} + e_{it}$

$B_1 = \frac{\Delta Y_{it}}{\Delta X_{it}} = \frac{Y_{it} - Y_{jt}}{X_{it} - X_{jt}}$
Change in $Y_i$ across individuals at time t.

$B_1 = \frac{\Delta Y_{it}}{\Delta X_{it}} = \frac{Y_{i,t} - Y_{i,t-1}}{X_{i,t} - X_{i,t-1}}$
Change in $Y_t$ over time for a given individual.

Use intercept dummies to differentiate between "time" and "type" effects
• Time Dummies: the effect of being in time period 2 vs. time period 1 on the expected value of $Y_{it}$, holding all else constant.
• Type Dummies: the effect of being of type B vs. type A on the expected value of $Y_{it}$, holding all else constant.

Time Dummies
Let $D_{2,t} = 0$ if $t = 1$, and $D_{2,t} = 1$ if $t = 2$.
$Y_{it} = B_0 + \delta_T D_{2,t} + e_{it}$
$\delta_T = (\bar{Y}_2 - \bar{Y}_1)$, where $\bar{Y}_2$ is the mean at time 2 across all i.

Example: Two Period Panel Data with Time Dummy

i   t   D_T   Y_it   X_it
1   1   0     72     98
1   2   1     75     102
2   1   0     31     40
2   2   1     26     39
3   1   0     55     66
3   2   1     62     70
4   1   0     41     59
4   2   1     45     60

Time Dummy Example

sum y if t==1
    Variable |   Obs     Mean
    ---------+----------------
           y |     4    49.75

sum y if t==2
    Variable |   Obs     Mean
    ---------+----------------
           y |     4       52

reg y dt
Coeff = 52 - 49.75 = 2.25
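
The equality between the dummy coefficient and the difference in period means can be verified outside Stata. This is a minimal Python sketch on the eight example observations; the helper names `mean` and `dummy_regression` are chosen here for illustration only.

```python
# (i, t, Y, X) rows from the two-period example panel.
panel = [
    (1, 1, 72, 98), (1, 2, 75, 102),
    (2, 1, 31, 40), (2, 2, 26, 39),
    (3, 1, 55, 66), (3, 2, 62, 70),
    (4, 1, 41, 59), (4, 2, 45, 60),
]

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def dummy_regression(y, d):
    """OLS of y on a constant and a 0/1 dummy d; returns (intercept, slope)."""
    d_bar, y_bar = mean(d), mean(y)
    sxy = sum((di - d_bar) * (yi - y_bar) for di, yi in zip(d, y))
    sxx = sum((di - d_bar) ** 2 for di in d)
    slope = sxy / sxx
    return y_bar - slope * d_bar, slope

y = [row[2] for row in panel]
d2 = [1 if row[1] == 2 else 0 for row in panel]  # time dummy D_{2,t}

intercept, delta_t = dummy_regression(y, d2)
mean_t1 = mean(row[2] for row in panel if row[1] == 1)
mean_t2 = mean(row[2] for row in panel if row[1] == 2)

# The intercept is the period 1 mean; the slope is the gap between period means.
print(intercept, delta_t, mean_t2 - mean_t1)
```

The same function applied to a type dummy would return the difference between the type B and type A means.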

The time dummy represents the shift of the regression line from period 1 to period 2 when $Y_{it}$ is regressed on the dummy along with $X_{it}$:
[Figure: scatter plot of y against x with fitted lines for t=1 and t=2; the vertical shift between the lines is labeled $\delta_T = 2.25$.]

Type Dummies
• Separate the cross-sectional dimension of the sample into qualitative "types" (e.g. male vs. female, rural vs. urban, foreign vs. domestic, treatment vs. control, etc.)
• Let $D_{iB} = 1$ if individual i is Type B, $= 0$ otherwise.
$Y_{it} = B_0 + \delta_B D_{iB} + e_{it}$, with $\delta_B = (\bar{Y}_B - \bar{Y}_A)$
When $X_{it}$ is included in the regression, $\delta_B$ represents the shift in intercept.

Example: Two Period Panel Data with Type Dummy

i   t   Type   D_B   Y_it   X_it
1   1   A      0     72     98
1   2   A      0     75     102
2   1   B      1     31     40
2   2   B      1     26     39
3   1   B      1     55     66
3   2   B      1     62     70
4   1   A      0     41     59
4   2   A      0     45     60

From Simple Example

reg y db
           y |    Coef.   Std. Err.       t
    ---------+------------------------------
          db |   -14.75   12.51582    -1.18
       _cons |    58.25   8.850024     6.58

sum y if db==1
    Variable |   Obs     Mean
           y |     4     43.5

sum y if db==0
    Variable |   Obs     Mean
           y |     4    58.25

Coefficient = difference in means = 43.5 - 58.25 = -14.75

The type dummy represents the shift of the regression line from type A to type B when $Y_{it}$ is regressed on the dummy along with $X_{it}$:
[Figure: scatter plot of y against x with fitted lines for type=A and type=B; the vertical shift between the lines is labeled $\delta_B = -14.25$.]

Difference-in-Differences Estimator
• Estimates the difference across types, and over time, using a simple dummy variable framework.
• Excellent for policy analysis. Takes advantage of the "natural experiment" quality of panel data.
• Can be expanded beyond the two period framework.
• Examples: stadium construction, natural disaster, water treatment facility, tax cuts.
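
Use an interaction term between the type and time dummies.
$Y_{it} = B_0 + B_1 X_{it} + \delta_0 D_{2,t} + \delta_1 D_{B,i} + \delta_{DD} D_{2,t} D_{B,i} + e_{it}$
$\delta_{DD} = (\bar{Y}_{B,2} - \bar{Y}_{A,2}) - (\bar{Y}_{B,1} - \bar{Y}_{A,1})$
The first term is the difference "After"; the second term is the difference "Before".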

Difference Coefficient
Also known as the "Average Treatment Effect".
Can also be written as
$\delta_{DD} = (\bar{Y}_{B,2} - \bar{Y}_{B,1}) - (\bar{Y}_{A,2} - \bar{Y}_{A,1})$
The first term is the treatment impact on the "treated"; the second term is the treatment impact on the control group.

D-in-D example

i   t   Type   D_B   D2_T   D_B*D2_T   Y_it   X_it
1   1   A      0     0      0          72     98
1   2   A      0     1      0          75     102
2   1   B      1     0      0          31     40
2   2   B      1     1      1          26     39
3   1   B      1     0      0          55     66
3   2   B      1     1      1          62     70
4   1   A      0     0      0          41     59
4   2   A      0     1      0          45     60

From simple example

reg y db d2 dd

           y |    Coef.   Std. Err.       t
    ---------+------------------------------
          db |    -13.5    21.6015    -0.62
          d2 |      3.5    21.6015     0.16
          dd |     -2.5   30.54914    -0.08
       _cons |     56.5   15.27457     3.70

Mean of y for type B when t=2: 44.00
Mean of y for type A when t=2: 60.00
Mean of y for type B when t=1: 43.00
Mean of y for type A when t=1: 56.50

Coefficient = (44.00 - 60.00) - (43.00 - 56.50) = (-16) - (-13.5) = -2.5
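
As a cross-check, the four cell means and both ways of writing the difference-in-differences can be computed directly from the example data. This is an illustrative Python sketch, not part of the original Stata session; the helper name `cell_mean` is hypothetical.

```python
# (i, t, type, Y) rows from the D-in-D example.
panel = [
    (1, 1, "A", 72), (1, 2, "A", 75),
    (2, 1, "B", 31), (2, 2, "B", 26),
    (3, 1, "B", 55), (3, 2, "B", 62),
    (4, 1, "A", 41), (4, 2, "A", 45),
]

def cell_mean(group, period):
    """Mean of Y for one type in one period."""
    ys = [y for _, t, g, y in panel if g == group and t == period]
    return sum(ys) / len(ys)

a1, a2 = cell_mean("A", 1), cell_mean("A", 2)
b1, b2 = cell_mean("B", 1), cell_mean("B", 2)

# "After" difference minus "before" difference across types.
did_across_types = (b2 - a2) - (b1 - a1)
# Change for the treated group minus change for the control group.
did_over_time = (b2 - b1) - (a2 - a1)

print(a1, a2, b1, b2)
print(did_across_types, did_over_time)
```

Both expressions are rearrangements of the same four means, so they give the same number as the coefficient on the interaction term.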
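
[Figure: group means $\bar{Y}_{A,1}$, $\bar{Y}_{A,2}$, $\bar{Y}_{B,1}$ and $\bar{Y}_{B,2}$ plotted at t=1 and t=2, with $\delta_{DD} = (\bar{Y}_{B,2} - \bar{Y}_{B,1}) - (\bar{Y}_{A,2} - \bar{Y}_{A,1})$.]
How much more did the treatment group (B) outcome increase than the control group (A) outcome from time 1 to time 2?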

Panel Data Problem! Unobserved Heterogeneity
There exist characteristics of each individual that persist over time which cannot be included in the regression (unobservable in available data), but which nonetheless impact the observed variation in our dependent variable.

Composite Errors
These time-invariant unobserved effects are best modeled as a component in the regression error term.
It is this "composite error" approach that sets apart panel regression from OLS.

Examples
• Unobservable motivational skills of the firm manager in a production function.
• Skills, charisma, connections, nepotism in a wage model.
• Levels of unobserved macro-level institutional corruption or inefficiency in a cross-sectional growth model.

The Composite Error Model
$Y_{it} = B_0 + B_1 X_{it} + v_{it}$
Where $v_{it} = u_{it} + a_i$ is the composite error, and:
• $u_{it}$ is the random, time-varying idiosyncratic error.
• $a_i$ is the time-invariant error component.

The Composite Error Problems
1.) If $COV(a_i, X_{it}) \neq 0$, then OLS estimates will be biased.
Very much like simultaneous equations (endogeneity) bias, but here covariance with the error term will only involve cross sectional variation.
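
Composite Error Bias
$Y_{it} = B_0 + B_1 X_{it} + a_i + u_{it}$
$\hat{B}_1 = B_1 + \frac{\sum (X_{it} - \bar{X})(a_i + u_{it})}{\sum (X_{it} - \bar{X})^2}$
$E(\hat{B}_1) = B_1 + \frac{E[\sum (X_{it} - \bar{X}) a_i]}{\sum (X_{it} - \bar{X})^2} + \frac{E[\sum (X_{it} - \bar{X}) u_{it}]}{\sum (X_{it} - \bar{X})^2}$
$E(\hat{B}_1) = B_1 + \frac{COV(X_{it}, a_i)}{Var(X_{it})} + \frac{COV(X_{it}, u_{it})}{Var(X_{it})}$
The last term is zero when $u_{it}$ is uncorrelated with $X_{it}$, so the bias comes from $COV(X_{it}, a_i) \neq 0$.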
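
A small simulation can make the bias concrete. The sketch below is an illustration under made-up assumptions: the true parameters, the sample size and the way $X_{it}$ is built from $a_i$ are all hypothetical choices, set so that $COV(X_{it}, a_i) = 1$ and $Var(X_{it}) = 2$, which implies a pooled OLS slope that is too large by roughly 0.5.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x with an intercept."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

B0, B1 = 1.0, 2.0  # hypothetical true parameters
N, T = 5000, 2     # hypothetical panel dimensions

x, y = [], []
for i in range(N):
    a_i = random.gauss(0, 1)             # time-invariant unobserved effect
    for t in range(T):
        x_it = a_i + random.gauss(0, 1)  # X is correlated with a_i by construction
        u_it = random.gauss(0, 1)        # idiosyncratic error
        x.append(x_it)
        y.append(B0 + B1 * x_it + a_i + u_it)

pooled = ols_slope(x, y)
print("true B1:", B1, "pooled OLS estimate:", round(pooled, 3))
```

With this design the pooled estimate should land near 2.5 rather than 2, and increasing N does not remove the gap.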

Examples:
1. Manager charisma correlated with firm size in a production function.
2. Nepotism/networking correlated with education in a wage equation.
3. Institutional quality associated with development in a corruption equation.

2.) Since $a_i$ represents a time-invariant component of the error, composite errors will be correlated over time.
Serial Correlation is the result:
$Corr(v_{it}, v_{i,t+s}) \neq 0$
Estimates will not be biased, but goodness of fit and significance of coefficients will be overstated.

How to deal with the Composite Error problem?
• Pooled OLS: do nothing about it.
• First Difference: eliminate $a_i$.
• Dummy Variables: estimate the $a_i$ when N is small.
• Fixed Effects: estimate $a_i$ when N is large.
• Random Effects: account for serial correlation.

First Difference Transformation (two period panel) with Time dummy
$Y_{it} = B_0 + \delta_0 D_{Tt} + B_1 X_{it} + a_i + u_{it}$
For Period 2:
$Y_{i2} = (B_0 + \delta_0) + B_1 X_{i2} + a_i + u_{i2}$
For Period 1:
$Y_{i1} = B_0 + B_1 X_{i1} + a_i + u_{i1}$
First Difference = $\Delta Y_i = Y_{i2} - Y_{i1}$
$\Delta Y_i = \delta_0 + B_1 (X_{i2} - X_{i1}) + (u_{i2} - u_{i1})$
$\Delta Y_i = \delta_0 + B_1 (\Delta X_i) + \Delta u_i$
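
To see the transformation at work, the two-period example panel (N=4) can be differenced by hand and the differenced equation estimated by OLS. This is only a sketch in Python: with four observations the estimates are purely illustrative, and the variable names are chosen here for the example.

```python
# (period 1, period 2) values for each unit i in the example panel.
y = {1: (72, 75), 2: (31, 26), 3: (55, 62), 4: (41, 45)}
x = {1: (98, 102), 2: (40, 39), 3: (66, 70), 4: (59, 60)}

# First differences: the time-invariant a_i drops out of each difference.
d_y = [y[i][1] - y[i][0] for i in sorted(y)]
d_x = [x[i][1] - x[i][0] for i in sorted(x)]

n = len(d_y)
dx_bar = sum(d_x) / n
dy_bar = sum(d_y) / n

# OLS of delta-Y on a constant and delta-X: the constant plays the role
# of the time dummy coefficient, the slope the role of B_1.
b1 = sum((dx - dx_bar) * (dy - dy_bar) for dx, dy in zip(d_x, d_y)) / sum(
    (dx - dx_bar) ** 2 for dx in d_x
)
delta0 = dy_bar - b1 * dx_bar

print(d_y, d_x)
print(round(delta0, 4), round(b1, 4))
```

Note that the differenced data set has one row per unit, which is the loss of the time dimension mentioned for two-period panels.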

First Difference
• Transformation eliminates the $a_i$ terms.
• Corrects for heterogeneity bias and serial correlation.
• Problems:
    1. Eliminates all time invariant variables (type dummies).
    2. Eliminates the time dimension in a two period panel (reduces T by 1 in general).

"Type" Dummy Variables for each i
• If the $a_i$ terms are viewed as coefficients to be estimated, a dummy can be constructed that uniquely identifies each individual in the sample.
• The dummy coefficient will represent the effect of the sum of all unobserved attributes.

Type Dummies
• Solves the "time invariant bias" problem by removing $a_i$ from the error component, and directly estimating the effects.
• The obvious problem is that degrees of freedom are vastly reduced. Requires a large number of time periods relative to cross sectional units.

Example: 4 country panel over 250 months
Step 1: append the separate country data files:

    use c:/stata627/nfa/canada.dta
    append using c:/stata627/nfa/italy.dta
    append using c:/stata627/nfa/japan.dta
    append using c:/stata627/nfa/uk.dta
    tsset code time

Dummy Example: estimates of $a_i$

xi: reg y cpi r er i.code
i.code            _Icode_1-5          (naturally coded; _Icode_1 omitted)

Number of obs = 990
Prob > F      = 0.0000
R-squared     = 0.7464
Adj R-squared = 0.7448
------------------------------------------------------
           y |      Coef.   Std. Err.       t    P>|t|
-------------+----------------------------------------
         cpi |   .3817633   .0117365    32.53    0.000
           r |  -.4944136   .0780945    -6.33    0.000
          er |  -.0196729   .0014589   -13.49    0.000
    _Icode_3 |   26.41765   2.053128    12.87    0.000
    _Icode_4 |  -12.51685   .6298041   -19.87    0.000
    _Icode_5 |  -1.729212   .5753217    -3.01    0.003
       _cons |   67.36739   1.632653    41.26    0.000
------------------------------------------------------
Code 1 = Canada, omitted
Code 3 = Italy, positive estimate of $a_i$
Code 4 = Japan, negative $a_i$
Code 5 = UK, negative $a_i$

Fixed Effects
• Assume $CORR(a_i, X_{it}) \neq 0$, but $CORR(u_{it}, X_{it}) = 0$.
• An alternative to the first difference transformation is the "Time De-meaning" transformation of the fixed effects model.
• Results in a model essentially identical to the Dummy model, without having to estimate N-1 dummy coefficients.
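
Fixed Effects Transformation
(1) $y_{it} = B_0 + B_1 x_{it} + a_i + u_{it}$   (original model)
Average for each individual over time: $\bar{y}_i = \frac{1}{T} \sum_{t=1}^{T} y_{it}$
(2) $\bar{y}_i = B_0 + B_1 \bar{x}_i + a_i + \bar{u}_i$
(2) is the "between" equation; it only shows variation between individuals, taking out the time element.
Subtract the mean from the level each time period:
$(y_{it} - \bar{y}_i) = (B_0 - B_0) + B_1 (x_{it} - \bar{x}_i) + (a_i - a_i) + (u_{it} - \bar{u}_i)$
$(y_{it} - \bar{y}_i) = B_1 (x_{it} - \bar{x}_i) + (u_{it} - \bar{u}_i)$
(3) $\ddot{y}_{it} = B_1 \ddot{x}_{it} + \ddot{u}_{it}$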

Fixed Effects Regression is equivalent to running OLS on Equation 3:
(3) $(y_{it} - \bar{y}_i) = B_1 (x_{it} - \bar{x}_i) + (u_{it} - \bar{u}_i)$
$\ddot{y}_{it} = B_1 \ddot{x}_{it} + \ddot{u}_{it}$
$CORR(\ddot{x}_{it}, \ddot{u}_{it}) = 0$, so $\hat{B}_1^{FE}$ is unbiased.
This is also known as the "within" estimation equation, as it shows the variation within a group over time.

Fixed Effects Coefficients
• Will have the same "two-dimension" interpretation as pooled OLS.
• Variation in the transformed variables is the same as in $Y_{it}$ and $X_{it}$.
$B_1 = \frac{\Delta \ddot{Y}_{it}}{\Delta \ddot{X}_{it}} = \frac{\Delta Y_{it}}{\Delta X_{it}}$
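
Fixed Effects Transformation With Time-Invariant Dummy Independent Variable
(1) $y_{it} = B_0 + B_1 x_{it} + \delta_0 D_i + a_i + u_{it}$, with $D_i \in (0, 1)$ and time invariant for each i
Average for each individual over time: $\bar{D}_i = \frac{1}{T} \sum_{t=1}^{T} D_i = \frac{1}{T} T D_i = D_i$
(2) $\bar{y}_i = B_0 + B_1 \bar{x}_i + \delta_0 D_i + a_i + \bar{u}_i$
$(y_{it} - \bar{y}_i) = (B_0 - B_0) + B_1 (x_{it} - \bar{x}_i) + \delta_0 (D_i - D_i) + (a_i - a_i) + (u_{it} - \bar{u}_i)$
$(y_{it} - \bar{y}_i) = B_1 (x_{it} - \bar{x}_i) + (u_{it} - \bar{u}_i)$
(3) $\ddot{y}_{it} = B_1 \ddot{x}_{it} + \ddot{u}_{it}$
Problem: Both $a_i$ and $D_i$ are eliminated.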

Example: Two Period Panel Data (N=4, T=2)

i   t   Y_it   mean Y_i   (Y_it - mean Y_i)
1   1   72     73.5       -1.5
1   2   75     73.5        1.5
2   1   31     28.5        2.5
2   2   26     28.5       -2.5
3   1   55     58.5       -3.5
3   2   62     58.5        3.5
4   1   41     43         -2
4   2   45     43          2
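
The de-meaned column can be reproduced in a few lines, and adding the income column X from the earlier two-period example gives the "within" slope. This is an illustrative Python sketch with hypothetical helper names; with only four units the slope is a demonstration of the mechanics, not a meaningful estimate.

```python
# (i, t, Y, X) rows; X is the income column from the earlier example table.
panel = [
    (1, 1, 72, 98), (1, 2, 75, 102),
    (2, 1, 31, 40), (2, 2, 26, 39),
    (3, 1, 55, 66), (3, 2, 62, 70),
    (4, 1, 41, 59), (4, 2, 45, 60),
]

def unit_means(col):
    """Mean of column `col` over time, for each unit i."""
    values = {}
    for row in panel:
        values.setdefault(row[0], []).append(row[col])
    return {i: sum(v) / len(v) for i, v in values.items()}

y_bar, x_bar = unit_means(2), unit_means(3)

# Time de-meaning: subtract each unit's own mean from its observations.
y_dd = [row[2] - y_bar[row[0]] for row in panel]
x_dd = [row[3] - x_bar[row[0]] for row in panel]

# De-meaned variables have mean zero, so OLS needs no intercept here.
b_within = sum(xd * yd for xd, yd in zip(x_dd, y_dd)) / sum(xd ** 2 for xd in x_dd)

print(y_dd)
print(round(b_within, 4))
```

Any variable that is constant over time for every unit would become a column of zeros under the same subtraction, which is why time-invariant dummies drop out.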

Goodness of Fit
A fixed effects regression returns three "R-square" measures. They are each actually squared correlations between predicted and observed values:
1. Within $R^2$: fitted de-meaned $y_{it}$
2. Between $R^2$: fitted $\bar{y}_i$
3. Overall $R^2$: fitted $y_{it}$ (pooled OLS)

Panel Regressions in Stata
• XT = cross-section time series.
• "xtreg y x, fe" will run a panel fixed effects regression.
• Must declare your "i" and "t" identifiers:
    • tsset code time, for example.
• Unfortunately, Stata refers to the time-invariant error component (our $a_i$) as u_i.

Fixed Effects Stata Example

xtreg y cpi r er, fe

Fixed-effects (within) regression        Number of obs      =    990
Group variable (i): code                 Number of groups   =      4

R-sq:  within  = 0.7071                  Obs per group: min =    244
       between = 0.0335                                 avg =  247.5
       overall = 0.1827                                 max =    250

                                         F(3,983)           = 791.14
corr(u_i, Xb)  = -0.7495                 Prob > F           = 0.0000

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.       t    P>|t|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
         cpi |   .3817633   .0117365    32.53    0.000    .3587318    .4047948
           r |  -.4944136   .0780945    -6.33    0.000   -.6476647   -.3411625
          er |  -.0196729   .0014589   -13.49    0.000   -.0225358   -.0168101
       _cons |   70.49544   1.529625    46.09    0.000    67.49374    73.49715
-------------+----------------------------------------------------------------
     sigma_u |  16.538008   (std. deviation of time-invariant error)
     sigma_e |  6.3818613   (std. deviation of idiosyncratic error)
         rho |  .87038904   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:  F(3, 983) = 362.02        Prob > F = 0.0000

Random Effects
• Assumes $CORR(a_i, X_{it}) = 0$.
• Therefore, OLS coefficients will not suffer "composite error bias", as was assumed with Fixed Effects.
• So we do not need to eliminate the $a_i$ terms.
• Although the $a_i$ terms do not truly have to be "randomly" assigned, there is no structural relationship between $a_i$ and $X_{it}$ in a correctly specified model.

Random Effects
• Even when $CORR(a_i, X_{it}) = 0$, we still have to account for the serial correlation introduced by the $a_i$ error component.
• A "Quasi-demeaned" data transformation is used to accomplish this, wherein the $a_i$ are altered but not eliminated.
• A bonus is that time-invariant dummies are not eliminated.

Random Effects Assumptions
1. $E(a_i | X_{it}) = E(a_i) = 0$
    • independence of the $a_i$'s and X's: $cov(a_i, X_{it}) = 0$
2. $E(u_{it} | X_{it}, a_i) = 0$
3. $E(u_{it} u_{is}) = cov(u_{it}, u_{is}) = 0$ for all t ≠ s.
4. $E(u_{it}^2 | X_{it}, a_i) = \sigma_u^2$ = constant
5. $E(a_i^2 | X_{it}) = Var(a_i) = \sigma_a^2$

Random Effects
• Under the preceding criteria, the composite error does not violate the OLS assumptions needed for unbiased coefficients.
• Unnecessarily eliminating the $a_i$ terms will cause estimates to be inefficient.
• Don't use Fixed Effects unless warranted.

Random Effects
However, running Pooled OLS will not be appropriate because the composite errors are still serially correlated over time.
It can be shown that:
$corr(v_{it}, v_{is}) = \frac{\sigma_a^2}{\sigma_a^2 + \sigma_u^2}, \quad t \neq s$
Where, again: $v_{it} = u_{it} + a_i$
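
As a numerical illustration, the formula can be evaluated with the sigma_u and sigma_e values reported in the fixed effects Stata output (where Stata's u_i is our $a_i$ and e is our $u_{it}$). This is a sketch in Python; the function name is made up for the example.

```python
def composite_error_corr(sigma_a, sigma_u):
    """Correlation of the composite error across two periods for one unit."""
    return sigma_a ** 2 / (sigma_a ** 2 + sigma_u ** 2)

# Standard deviations taken from the fixed effects output:
# sigma_u (time-invariant) = 16.538008, sigma_e (idiosyncratic) = 6.3818613.
rho = composite_error_corr(16.538008, 6.3818613)
print(round(rho, 6))
```

The result lines up with the "rho" entry in that output, the fraction of variance due to the time-invariant component.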

Random Effects
The random effects transformation is more complicated than FD or FE, but the basic idea is to eliminate serial correlation in the error term by using information on the variances of the time-invariant and idiosyncratic errors.

Random Effects (RE)
The transformation results in a weighted average of the estimates provided by the "within" and "between" estimators.
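
RE Transformation
(1) $y_{it} = B_0 + B_1 x_{it} + a_i + u_{it}$
Define a weighted average:
$\lambda \bar{y}_i = \lambda B_0 + B_1 \lambda \bar{x}_i + \lambda a_i + \lambda \bar{u}_i$
then subtract from (1):
$(y_{it} - \lambda \bar{y}_i) = B_0 (1 - \lambda) + B_1 (x_{it} - \lambda \bar{x}_i) + (1 - \lambda) a_i + (u_{it} - \lambda \bar{u}_i)$
where $\hat{\lambda}_i = 1 - \sqrt{\frac{\hat{\sigma}_u^2}{T_i \hat{\sigma}_a^2 + \hat{\sigma}_u^2}}$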

It can be shown that the composite error term $v_{it}$ augmented by the weighting term $\lambda$ (lambda) will NOT suffer from serial correlation.
$Corr(v_{it}, v_{is}) = 0$
Given $\hat{\lambda}_i = 1 - \sqrt{\frac{\hat{\sigma}_u^2}{T_i \hat{\sigma}_a^2 + \hat{\sigma}_u^2}}$

NOTE:
• If $var(a_i) = 0$, meaning $a_i$ is always zero (no time-invariant effects), then lambda equals 0 and the RE regression is equivalent to the Pooled OLS equation (1): all lambda-weighted terms drop out.
• As $\sigma_a^2$ dominates $\sigma_u^2$, the $a_i$ terms become more important, $\lambda$ goes to 1, and RE→FE.
Given $\hat{\lambda}_i = 1 - \sqrt{\frac{\hat{\sigma}_u^2}{T_i \hat{\sigma}_a^2 + \hat{\sigma}_u^2}}$
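
The two limiting cases can be checked numerically. The sketch below evaluates the lambda formula in Python; the function name is hypothetical, and pairing the sigma values from the fixed effects output with T = 250 is an assumption made only to get a concrete number.

```python
import math

def re_lambda(sigma_a, sigma_u, T):
    """Quasi-demeaning weight: 1 - sqrt(sigma_u^2 / (T*sigma_a^2 + sigma_u^2))."""
    return 1 - math.sqrt(sigma_u ** 2 / (T * sigma_a ** 2 + sigma_u ** 2))

# No time-invariant variance: lambda is 0 and RE collapses to pooled OLS.
lam_pooled = re_lambda(0.0, 6.3818613, 250)

# Time-invariant variance dominates: lambda approaches 1 and RE approaches FE.
lam_near_fe = re_lambda(16.538008, 6.3818613, 250)

print(lam_pooled, round(lam_near_fe, 4))
```

A longer panel (larger T) pushes lambda toward 1 as well, since T multiplies the time-invariant variance in the denominator.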

RE Stata Example (N=4)

xtreg y cpi r er

Random-effects GLS regression            Number of obs      =    990
Group variable (i): code                 Number of groups   =      4

R-sq:  within  = 0.6252                  Obs per group: min =    244
       between = 0.7702                                 avg =  247.5
       overall = 0.4662                                 max =    250

Random effects u_i ~ Gaussian            Wald chi2(3)       = 861.17
corr(u_i, X)   = 0 (assumed)             Prob > chi2        = 0.0000

------------------------------------------------------
           y |      Coef.   Std. Err.       z    P>|z|
-------------+----------------------------------------
         cpi |   .3468475   .0158341    21.91    0.000
           r |   .0072637   .1077631     0.07    0.946
          er |  -.0002592   .0005834    -0.44    0.657
       _cons |   61.17895   2.168505    28.21    0.000
-------------+----------------------------------------
     sigma_u |          0
     sigma_e |  6.3818613
         rho |          0   (fraction of variance due to u_i)

Fixed vs. Random Effects
• As a practical matter, Random Effects is preferred when key explanatory variables are time-invariant.
• The Fixed Effects view is that the unobserved heterogeneity is in itself an explanatory variable that ideally would have a coefficient to be estimated.

Fixed vs. Random Effects
The Random Effects view is that unobserved heterogeneity is "randomly assigned" to each cross sectional entity and not correlated with other explanatory variables.

When to use FE vs. RE? The Hausman Coefficient Test
The logic of the test is the following:
• If $CORR(a_i, X_{it}) \neq 0$, then RE is biased.
• If $CORR(a_i, X_{it}) = 0$, then both RE and FE are unbiased, but it can be shown that RE is more efficient (smaller standard errors of coefficients).
• Therefore, if the FE coefficients are significantly different from the RE coefficients, then RE must be biased, so use FE.
• If the FE coefficients are not significantly different from RE, then neither is biased, so use RE.

General Hausman Test
Test the equality of the vector of coefficients:
$\hat{B}_{FE} = (\hat{B}_1^{FE}, \hat{B}_2^{FE}, \hat{B}_3^{FE})'$, $\hat{B}_{RE} = (\hat{B}_1^{RE}, \hat{B}_2^{RE}, \hat{B}_3^{RE})'$
$V(\hat{B})$ = variance terms for each coefficient (the estimated covariance matrix of the coefficient vector).
Test statistic:
$H = (\hat{B}_{FE} - \hat{B}_{RE})' [V(\hat{B}_{FE}) - V(\hat{B}_{RE})]^{-1} (\hat{B}_{FE} - \hat{B}_{RE})$
H is distributed Chi-square with k degrees of freedom.

Single Coefficient Version
If we are primarily interested in a single parameter, there is a t-statistic version of the Hausman test.
Let $B_1^{FE}$ and $B_1^{RE}$ be the fixed- and random effects coefficients for $X_{1,it}$.
$t = \frac{B_1^{FE} - B_1^{RE}}{[se(B_1^{FE})^2 - se(B_1^{RE})^2]^{1/2}}$
Where t is asymptotically normally distributed.
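
The single-coefficient statistic is easy to compute by hand from the FE and RE Stata outputs shown in this deck. The sketch below is an illustration in Python with a hypothetical function name; it returns None when the variance difference is not positive, in which case the statistic is undefined.

```python
import math

def hausman_t(b_fe, se_fe, b_re, se_re):
    """Single-coefficient Hausman statistic; None if se_fe^2 - se_re^2 <= 0."""
    var_diff = se_fe ** 2 - se_re ** 2
    if var_diff <= 0:
        return None
    return (b_fe - b_re) / math.sqrt(var_diff)

# Coefficients and standard errors copied from the FE and RE outputs.
t_er = hausman_t(-0.0196729, 0.0014589, -0.0002592, 0.0005834)
t_cpi = hausman_t(0.3817633, 0.0117365, 0.3468475, 0.0158341)

print(t_er)   # large in absolute value: FE and RE differ for er
print(t_cpi)  # None: the FE standard error is smaller than the RE one
```

For cpi the RE standard error is larger than the FE standard error in this sample, so the denominator is not defined; this is the finite-sample case in which RE does not turn out to be more efficient.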

Note: Hausman Test Problem
• Most of the time the Hausman test works fine, however…
• The test statistic is based on the assumption that RE is more efficient (estimates have a smaller variance) than FE.

• While this can be shown to be asymptotically true, it may not hold for a given sample.
• If this is the case, then the test statistic is negative, and cannot be interpreted as a Chi-square.
• This is why it is important to type:
    • 'hausman unbiased efficient'
• Where 'unbiased' is the vector of FE coefficients and 'efficient' is the vector of RE coefficients.

Hausman Test Interpretation
$H_0$: $B_{FE} = B_{RE}$ (difference in coefficients is NOT systematic)
$H_A$: $B_{FE} \neq B_{RE}$.
If H > critical value, we reject $H_0$, and
• conclude that since $B_{FE} \neq B_{RE}$,
• Random Effects is biased, therefore
• $CORR(a_i, X_{it}) \neq 0$, and
• Fixed Effects is the appropriate model.

Hausman Test in Stata

xtreg y cpi r er, fe
estimates store fe
xtreg y cpi r er
estimates store re
hausman fe re

                 ---- Coefficients ----
             |      (b)          (B)            (b-B)     sqrt(diag(V_b-V_B))
             |       fe           re         Difference          S.E.
-------------+----------------------------------------------------------------
         cpi |    .3817633     .3468475        .0349158               .
           r |   -.4944136     .0072637       -.5016774               .
          er |   -.0196729    -.0002592       -.0194137        .0013371
------------------------------------------------------------------------------
    b = consistent under Ho and Ha; obtained from xtreg
    B = inconsistent under Ha, efficient under Ho; obtained from xtreg

    Test:  Ho:  difference in coefficients not systematic

        chi2(3) = (b-B)'[(V_b-V_B)^(-1)](b-B) = 162.38
        Prob>chi2 = 0.0000
        (V_b-V_B is not positive definite)

Reject $H_0$ in this case, so go with Fixed Effects.
6565
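For a single coefficient, the Hausman statistic reduces to the squared difference divided by the difference in variances, so the mechanics can be checked by hand. The Python sketch below is only an illustration and is not part of the Stata session: it reuses the er row of the table above, and the function name hausman_one_coef is made up for this example. It is a one-coefficient check, not the joint chi2(3) test that Stata reports.

```python
def hausman_one_coef(diff, se_diff):
    """Hausman statistic for one coefficient: (b - B)^2 / (V_b - V_B)."""
    return (diff / se_diff) ** 2

# 'er' row of the hausman output: (b-B) and sqrt(diag(V_b-V_B))
h_er = hausman_one_coef(-0.0194137, 0.0013371)

# 5% critical value of a chi-square with 1 degree of freedom
CHI2_1_CRIT_5PCT = 3.841

reject_h0 = h_er > CHI2_1_CRIT_5PCT
print(f"H(er) = {h_er:.2f}, reject H0: {reject_h0}")
```

The same comparison (statistic against a chi-square critical value) is what the Prob>chi2 line summarizes for the joint test.
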
Lagrange Multiplier Test for Random Effects
Essentially, this is a derivation of a test for heteroskedasticity in a panel composite error setting, where v_it = a_i + u_it.
Assume var(u_it) is constant, and u_it is not correlated with X_it.
Then any correlation between var(v_it) and X_it must be due to the time-invariant error a_i.

Stata Note for Panel Regressions
You will notice that running FE / RE regressions with large N can be time consuming, which is really annoying during the specification search process.
This is because each regression requires Stata to perform the 'de-meaning' transformation for each observation from the original data.

Stata Note
The 'xtdata' command allows you to create a new data set of the transformed variables.
Running OLS on the transformed variables reproduces the coefficient estimates of the FE/RE regression (for FE, the OLS standard errors do not account for the degrees of freedom used by the group means).
Typing 'xtdata y x1 x2, fe' will replace the data in memory with the fixed effect de-meaned values of the specified variables for each observation; save the result under a new .dta file name to keep the original data.

Extensions to Panel Regression
1.) 2SLS/IV with panel data
    xtivreg y x1 (x2=z), fe
2.) Cluster effects for cross-sectional data.
3.) Auto-correlated idiosyncratic errors (u_it)

Extension 1: IV Panel
When an independent variable is endogenous in a panel regression, each stage of the two stage least squares process must take into account the composite error issue.
I.e., the first stage and second stage will each be either an RE or an FE regression, depending on which is appropriate.

Y_it = B_0 + B_1 X_it + a_i + u_it
The fixed effects transformation will address the issue of
COV(X_it, a_i) ≠ 0.
But what about when
COV(X_it, u_it) ≠ 0?

Panel 2SLS
\[
\begin{aligned}
(1)\quad & y_{it} = \beta_0 + \beta_1 x_{it} + a_i + u_{it} \\
& CORR(x_{it}, a_i) \neq 0 \\
& CORR(x_{it}, u_{it}) \neq 0
\end{aligned}
\]
Define a variable z_it such that
\[
CORR(z_{it}, a_i) \neq 0 \quad \text{but} \quad CORR(z_{it}, u_{it}) = 0 .
\]
Therefore z_it will be exogenous in (1), but will require the fixed effects transformation to be used as an effective instrument.

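First Stage FE
\[
\ddot{x}_{it} = \delta_0 + \delta_1 \ddot{z}_{it} + e_{it}, \quad \text{where } \ddot{z}_{it} = (z_{it} - \bar{z}_i)
\]
\[
\hat{\delta}_1^{FE} \text{ is unbiased.}
\]
Save the fitted, transformed values \(\hat{\ddot{x}}_{it}\).
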
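Second Stage FE
\[
\begin{aligned}
(1')\quad & \ddot{y}_{it} = \beta_1 \hat{\ddot{x}}_{it} + \ddot{u}_{it} \\
& CORR(\hat{\ddot{x}}_{it}, a_i) = 0 \quad \text{(because of the umlaut)} \\
& CORR(\hat{\ddot{x}}_{it}, u_{it}) = 0 \quad \text{(because of the hat)} \\
& \hat{\beta}_1^{FE,2SLS} \text{ will be unbiased.}
\end{aligned}
\]
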
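The two stages can be traced on a tiny panel. The following Python sketch is a hand-built illustration under stated assumptions, not output from any real data set: the numbers are made up (N=2, T=3, true slope 2, a_i = 10 and -5), x is constructed to be correlated with u, and z is constructed to be uncorrelated with the de-meaned u. The helper names demean_by_unit and slope are hypothetical.

```python
def demean_by_unit(panel):
    """Subtract each unit's time mean (the FE 'umlaut' transformation)."""
    out = []
    for series in panel:
        mean = sum(series) / len(series)
        out.extend(v - mean for v in series)
    return out

def slope(x, y):
    """No-intercept OLS slope; adequate here because the data are de-meaned."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

# Made-up panel, one inner list per unit.
# u = [1, -2, 1] and [-1, 2, -1]; x = z + a_i + u; y = 2*x + a_i + u.
z = [[1, 2, 3], [2, 4, 6]]
x = [[12, 10, 14], [-4, 1, 0]]
y = [[35, 28, 39], [-14, -1, -6]]

z_dd, x_dd, y_dd = (demean_by_unit(v) for v in (z, x, y))

delta1 = slope(z_dd, x_dd)            # first stage on transformed data
x_hat = [delta1 * v for v in z_dd]    # fitted, transformed values of x
beta_fe_2sls = slope(x_hat, y_dd)     # second stage
beta_fe_ols = slope(x_dd, y_dd)       # FE alone, ignoring CORR(x, u) != 0

print(beta_fe_2sls, beta_fe_ols)
```

In this constructed example the second stage recovers the slope of 2, while FE without the instrument overstates it. In practice xtivreg performs both stages and reports appropriate standard errors.
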
Extension 2: Cluster Regression
Allows for a Fixed Effects transformation with single period cross-section data.
“Cluster-” or “group-” invariant errors replace “time-invariant” errors (a_i).
For example, there may be “within village effects” that will be the same for all households in Village A that differ from Village B.
Often can be controlled for with “cluster dummy” variables.

Cross Section Cluster Example
Household (i)   Village (j)   Consumption (Y_ij)   Income (X_ij)
      1              1               500                 750
      2              1               650                1000
      3              1               475                 725
      1              2               600                 700
      2              2               625                 750
      3              2               550                 600
      1              3               575                1100
      2              3               625                1200
      3              3               600                1000

Cluster Regression
Model: X_ij = observation for household i in village j
Y_ij = B_0 + B_1 X_ij + a_j + u_ij
The analogy to panel structure is that i acts like the time variable, and j acts like the cross-sectional identifier.
Multiple observations for a given village j.
a_j is the “cluster invariant error” or “village level fixed effect”.

Fixed Effects for Cluster
Again, if there is correlation between the “cluster-invariant” error (a_j) and the independent variables (X_ij), then the coefficient estimates will be biased.
Fixed Effects transformation eliminates the a_j by subtracting the cluster mean from each observation.
\[
\begin{aligned}
(Y_{ij} - \bar{Y}_j) &= B_1 (X_{ij} - \bar{X}_j) + (a_j - a_j) + (u_{ij} - \bar{u}_j) \\
\bar{Y}_j &= \text{village level mean} \\
\ddot{Y}_{ij} &= B_1^{FE} \ddot{X}_{ij} + \ddot{u}_{ij}
\end{aligned}
\]

Cluster Effects Transformation
i   j     y      x     ybar_j   y_umlat_ij   xbar_j   x_umlat_ij
1   1    500    750    541.67     -41.67      825.00     -75.00
2   1    650   1000    541.67     108.33      825.00     175.00
3   1    475    725    541.67     -66.67      825.00    -100.00
1   2    600    700    591.67       8.33      683.33      16.67
2   2    625    750    591.67      33.33      683.33      66.67
3   2    550    600    591.67     -41.67      683.33     -83.33
1   3    575   1100    600.00     -25.00     1100.00       0.00
2   3    625   1200    600.00      25.00     1100.00     100.00
3   3    600   1000    600.00       0.00     1100.00    -100.00

Transformed OLS Regression
reg y_umlat x_umlat

      Source |       SS       df       MS              Number of obs =       9
-------------+------------------------------           F(  1,     7) =   27.86
       Model |  17649.2873     1  17649.2873           Prob > F      =   .0011
    Residual |  4434.04639     7  633.435199           R-squared     =   .7992
-------------+------------------------------           Adj R-squared =   .7705
       Total |  22083.3337     8  2760.41671           Root MSE      =  25.168

------------------------------------------------------
     y_umlat |      Coef.   Std. Err.      t    P>|t|
-------------+----------------------------------------
     x_umlat |   .4759358   .0901646     5.28   0.001
       _cons |   4.09e-07    8.38938     0.00   1.000
------------------------------------------------------

FIXED EFFECTS
tsset j i
       panel variable:  j (strongly balanced)
        time variable:  i, 1 to 3
                delta:  1 unit

xtreg y x, fe

Fixed-effects (within) regression               Number of obs      =         9
Group variable: j                               Number of groups   =         3

R-sq:  within  = 0.7992                         Obs per group: min =         3
       between = 0.0961                                        avg =       3.0
       overall = 0.2517                                        max =         3

                                                F(1,5)             =     19.90
corr(u_i, Xb)  = -0.8365                        Prob > F           =    0.0066

------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|
-------------+----------------------------------------
           x |   .4759358   .1066842     4.46   0.007
       _cons |    163.978   93.28558     1.76   0.139
-------------+----------------------------------------
     sigma_u |  95.865925
     sigma_e |  29.779343
         rho |  .91199744   (fraction of variance due to u_i)
------------------------------------------------------
F test that all u_i=0:     F(2, 5) =     9.34        Prob > F = 0.0205

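The de-meaning step and the two sets of standard errors can be reproduced by hand. The Python sketch below is an illustration rather than Stata's internal routine: it takes the nine household rows from the cluster example, subtracts village means, and computes the slope with two degrees-of-freedom choices. The helper name cluster_mean is made up for this example.

```python
import math

# (village j, consumption y, income x) from the cluster example table
rows = [(1, 500, 750), (1, 650, 1000), (1, 475, 725),
        (2, 600, 700), (2, 625, 750), (2, 550, 600),
        (3, 575, 1100), (3, 625, 1200), (3, 600, 1000)]

villages = sorted({j for j, _, _ in rows})

def cluster_mean(col):
    """Mean of column `col` (1 = y, 2 = x) within each village."""
    return {j: sum(r[col] for r in rows if r[0] == j) /
               sum(1 for r in rows if r[0] == j) for j in villages}

ybar, xbar = cluster_mean(1), cluster_mean(2)
y_dd = [y - ybar[j] for j, y, _ in rows]   # y_umlat
x_dd = [x - xbar[j] for j, _, x in rows]   # x_umlat

sxx = sum(v * v for v in x_dd)
b_fe = sum(a * b for a, b in zip(x_dd, y_dd)) / sxx
rss = sum((yv - b_fe * xv) ** 2 for xv, yv in zip(x_dd, y_dd))

n, k, g = len(rows), 1, len(villages)
se_naive = math.sqrt(rss / (n - k - 1) / sxx)  # OLS on de-meaned data: df = 7
se_fe = math.sqrt(rss / (n - g - k) / sxx)     # also counts 3 village means: df = 5

print(round(b_fe, 7), round(se_naive, 7), round(se_fe, 7))
```

The slope is the same either way; only the residual degrees of freedom differ, which is why the transformed OLS regression and xtreg, fe report different standard errors for the same coefficient.
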
Cluster (village) Dummies
xi: reg y x i.j
i.j               _Ij_1-3             (naturally coded; _Ij_1 omitted)

      Source |       SS       df       MS              Number of obs =       9
-------------+------------------------------           F(  3,     5) =    8.88
       Model |  23621.5092     3   7873.8364           Prob > F      =  0.0191
    Residual |  4434.04635     5  886.809269           R-squared     =  0.8420
-------------+------------------------------           Adj R-squared =  0.7471
       Total |  28055.5556     8  3506.94444           Root MSE      =  29.779

------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|
-------------+----------------------------------------
           x |   .4759358   .1066842     4.46   0.007
       _Ij_2 |   117.4242   28.62912     4.10   0.009    (Vlg2)
       _Ij_3 |  -72.54902   38.10424    -1.90   0.115    (Vlg3)
       _cons |   149.0196     89.678     1.66   0.157    (Vlg1)
------------------------------------------------------

“predict ai, u” to view the estimated a_i
i   j   _Ij_2   _Ij_3        ai
1   1     0       0      -14.9584
2   1     0       0      -14.9584
3   1     0       0      -14.9584
1   2     1       0      102.4658
2   2     1       0      102.4658
3   2     1       0      102.4658
1   3     0       1      -87.5074
2   3     0       1      -87.5074
3   3     0       1      -87.5074

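The dummy coefficients and the predicted village effects are two ways of reporting the same village-specific intercepts. As a rough check, the Python sketch below starts from the reported slope and the village means and recovers both sets of numbers. It assumes balanced data, so that the xtreg constant equals the average village intercept; the variable names are made up for this example.

```python
b_fe = 0.4759358  # slope reported by xtreg, fe and by the dummy regression

# village means of y and x from the transformation table
ybar = {1: 1625 / 3, 2: 1775 / 3, 3: 600.0}
xbar = {1: 825.0, 2: 2050 / 3, 3: 1100.0}

# village-specific intercepts implied by the common slope
intercept = {j: ybar[j] - b_fe * xbar[j] for j in ybar}

# dummy coefficients: each intercept relative to the omitted village 1
dummy = {j: intercept[j] - intercept[1] for j in (2, 3)}

# with balanced data the xtreg constant is the average intercept,
# and the predicted effect a_j is the deviation from it
cons = sum(intercept.values()) / len(intercept)
a = {j: intercept[j] - cons for j in intercept}

print(intercept, dummy, cons, a)
```

Village 1's intercept matches _cons in the dummy regression, the differences match _Ij_2 and _Ij_3, and the deviations from the average match the ai column.
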
Aside. . . “xtdes” command
xtdes

       j:  1, 2, ..., 3                                      n =          3
       i:  1, 2, ..., 3                                      T =          3
           Delta(i) = 1 unit
           Span(i)  = 3 periods
           (j*i uniquely identifies each observation)

Distribution of T_i:   min      5%     25%       50%       75%     95%     max
                         3       3       3         3         3       3       3

     Freq.  Percent    Cum. |  Pattern
 ---------------------------+---------
        3    100.00  100.00 |  111
 ---------------------------+---------
        3    100.00         |  XXX

Extension 3: Autocorrelation of u_it's
Random Effects transformation eliminated autocorrelation amongst composite errors due to presence of a_i.
Fixed Effects eliminated autocorrelation due to a_i by eliminating the time-invariant error.
What if, in addition, u_it is autocorrelated?
RE or FE alone will not address the issue.

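Panel FE Regression with AC
\[
\begin{aligned}
(1)\quad & y_{it} = \beta_0 + \beta_1 x_{it} + a_i + u_{it} \\
& CORR(x_{it}, a_i) \neq 0 \\
& u_{it} = \rho u_{i,t-1} + \varepsilon_{it}, \qquad -1 < \rho < 1 \\
& \varepsilon_{it} \sim N(0, \sigma^2) \\
(2)\quad & y_{it} = \beta_0 + \beta_1 x_{it} + a_i + \rho u_{i,t-1} + \varepsilon_{it} \\
(2.1)\quad & \ddot{y}_{it} = \beta_1 \ddot{x}_{it} + \rho (u_{i,t-1} - \bar{u}_{i,t-1}) + \ddot{\varepsilon}_{it}
\end{aligned}
\]
Here \(\bar{u}_{i,t-1}\) is unit i's time average of the lagged error.
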
Equation (2.1) is now a linear AR(1) model.
To solve, we need to use the Cochrane-Orcutt method of estimating ρ, then use the generalized difference equation to eliminate the term:
\[
\rho (u_{i,t-1} - \bar{u}_{i,t-1})
\]

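One pass of the Cochrane-Orcutt idea has two parts: estimate ρ from residuals, then quasi-difference the data. The Python sketch below shows only those two parts on made-up residuals built so that ρ = 0.5. The function names are hypothetical, and the ρ estimator shown (a pooled no-intercept regression of the residual on its own lag) is one common choice, not necessarily the one Stata uses.

```python
def estimate_rho(resid_by_unit):
    """Pooled no-intercept regression of e_it on e_i,t-1, never crossing units."""
    num = den = 0.0
    for e in resid_by_unit:
        for prev, cur in zip(e, e[1:]):
            num += prev * cur
            den += prev * prev
    return num / den

def quasi_difference(series_by_unit, rho):
    """Generalized difference v_it - rho * v_i,t-1; drops each unit's first period."""
    return [[cur - rho * prev for prev, cur in zip(s, s[1:])]
            for s in series_by_unit]

# Hypothetical FE residuals for two units, each following e_t = 0.5 * e_{t-1}
resid = [[1.0, 0.5, 0.25, 0.125], [-2.0, -1.0, -0.5, -0.25]]

rho_hat = estimate_rho(resid)
resid_star = quasi_difference(resid, rho_hat)

print(rho_hat, resid_star)
```

In a full procedure the same quasi-difference would be applied to y and x, the regression re-estimated on the differenced data, and the two steps repeated until the estimate of ρ settles.
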
STATA to the rescue again!
The command:
“xtregar y x, fe”
will simultaneously transform the data to eliminate the a_i terms AND estimate ρ AND provide consistent standard errors with the generalized difference equation.

Xtregar Example from 4 country panel
xtregar y r cpi er, fe

FE (within) regression with AR(1) disturbances  Number of obs      =       986
Group variable: code                            Number of groups   =         4

R-sq:  within  = 0.0155                         Obs per group: min =       243
       between = 0.5840                                        avg =     246.5
       overall = 0.4567                                        max =       249

                                                F(3,979)           =      5.13
corr(u_i, Xb)  = -0.1308                        Prob > F           =    0.0016

------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|
-------------+----------------------------------------
           r |  -.0362285   .0875633    -0.41   0.679
         cpi |   .2832925    .076438     3.71   0.000
          er |   .0015201   .0029347     0.52   0.605
       _cons |     68.766   .2288196   300.52   0.000
-------------+----------------------------------------
      rho_ar |   .9718915
     sigma_u |  6.3918957
     sigma_e |  1.7246814
     rho_fov |  .93213626   (fraction of variance because of u_i)
------------------------------------------------------
F test that all u_i=0:     F(3,979) =     2.14       Prob > F = 0.094

Stata Note – balancing your panel
It may be useful to use only those “entities” that appear in all time periods. Suppose T=20 – use the following:
sort entity time
by entity: gen count=_N
keep if count==20

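The same count-and-keep logic can be written outside Stata. The Python sketch below is a minimal illustration on made-up rows with T = 3 instead of 20; the function name keep_balanced and the (entity, time, value) layout are assumptions for the example.

```python
from collections import Counter

def keep_balanced(rows, n_periods):
    """Keep rows whose entity is observed in exactly n_periods rows."""
    counts = Counter(entity for entity, _time, _value in rows)
    return [r for r in rows if counts[r[0]] == n_periods]

# Hypothetical (entity, time, value) rows; entity "B" is missing period 2
rows = [("A", 1, 10.0), ("A", 2, 11.0), ("A", 3, 12.0),
        ("B", 1, 20.0), ("B", 3, 22.0),
        ("C", 1, 30.0), ("C", 2, 31.0), ("C", 3, 32.0)]

balanced = keep_balanced(rows, n_periods=3)
kept_entities = sorted({r[0] for r in balanced})
print(kept_entities)
```

Like the Stata version, this counts rows per entity, so it assumes each entity-period pair appears at most once.
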
Panel Data Management in STATA
A common problem is that the original data is stored in “wide” or “rectangular” form, wherein values for a given year are stored in a separate column.
For example, in a cross-country panel, FDI in 2000 has one column, with each row representing a unique country. Likewise for FDI in 2001, etc.

Example of “wide” form data set
Countries     Code   fdi2000    fdi2001    fdi2002
Argentina       1    1.04E+10   2.17E+09   2.15E+09
Australia       2    1.36E+10   8.26E+09   1.77E+10
Austria         3    8.52E+09   5.91E+09   3.19E+08
Bangladesh      4    2.80E+08   7.90E+07   5.20E+07

Problem
In order to run a panel regression in STATA, we need data to be stored in “long” form.
Here, each row is identified by both a time period and country code. A variable like FDI will have a single column.

Example of “long” form data set
code   year   countries    fdi
  1    2000   Argentina    1.040e+10
  1    2001   Argentina    2.170e+09
  1    2002   Argentina    2.150e+09
  2    2000   Australia    1.360e+10
  2    2001   Australia    8.260e+09
  2    2002   Australia    1.770e+10

The “reshape” STATA command
Instead of copying and pasting in Excel, load the data into STATA in “wide” form, then transform.
The “reshape” command will generate the “time” variable for you, and combine separate time periods into a single column.

reshape long fdi, i(code) j(year)
Keys on the specified variable, here “fdi”.
Must declare the cross-section identifier i.
Generates the “within” group identifier j. Put the new varname in parentheses. Typically j will represent time, but not necessarily.

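To see what the command does to the rows, the Python sketch below performs the same wide-to-long conversion on the first two countries of the FDI example. It is an illustration only: reshape_long is a made-up helper, not a Stata or library function, and it treats every column whose name starts with the stub as a year column.

```python
# Wide rows from the FDI example: one column per year
wide = [
    {"countries": "Argentina", "code": 1,
     "fdi2000": 1.04e10, "fdi2001": 2.17e9, "fdi2002": 2.15e9},
    {"countries": "Australia", "code": 2,
     "fdi2000": 1.36e10, "fdi2001": 8.26e9, "fdi2002": 1.77e10},
]

def reshape_long(rows, stub, i, j):
    """Turn columns named <stub><j-value> into one <stub> column plus a j column."""
    long_rows = []
    for row in rows:
        # columns that do not start with the stub are carried along unchanged
        fixed = {k: v for k, v in row.items() if not k.startswith(stub)}
        for key, value in row.items():
            if key.startswith(stub):
                long_rows.append({**fixed, j: int(key[len(stub):]), stub: value})
    return sorted(long_rows, key=lambda r: (r[i], r[j]))

long = reshape_long(wide, stub="fdi", i="code", j="year")
for r in long:
    print(r["code"], r["year"], r["countries"], r["fdi"])
```

Each country row becomes three rows, one per year, with the time-invariant columns (code, countries) repeated on each.
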
Reshape Notes
In general, list all variables that must be combined into a single column.
You do not need to list time-invariant variables, but they will be converted to “long” as well.
Note that “reshape wide” will convert data from long to wide format.
reshape seems to be touchy about year values. '99 for 1999 is ok, but '00 for 2000 is a problem.

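Fixed Effects Logit
\[
\begin{aligned}
y^*_{it} &= X_{it}\beta + a_i + u_{it}, \quad \text{where } y_{it} \in \{0, 1\} \\
y_{it} &= 1 \text{ if } y^*_{it} > 0 \\
y_{it} &= 1 \text{ if } X_{it}\beta + a_i + u_{it} > 0 \\
\Pr(y_{it} = 1) &= \Pr(X_{it}\beta + a_i + u_{it} > 0) \\
&= \Pr(u_{it} > -(X_{it}\beta + a_i)) \\
&= 1 - G(-(X_{it}\beta + a_i)) \\
G(X_{it}\beta + a_i) &= \frac{e^{X_{it}\beta + a_i}}{1 + e^{X_{it}\beta + a_i}} \\
\log L &= y_{it} \log G(X_{it}\beta + a_i) + (1 - y_{it}) \log[1 - G(X_{it}\beta + a_i)]
\end{aligned}
\]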
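The pieces of the derivation above are easy to evaluate numerically. The Python sketch below is a small illustration, not an estimator: it codes the logistic G, checks the symmetry used in the step from 1 - G(-z) to G(z), and evaluates the log-likelihood contribution of a single observation. The index value (X_it*B = 0.3, a_i = -0.1) is made up.

```python
import math

def G(index):
    """Logistic CDF evaluated at the index X_it*B + a_i."""
    return 1.0 / (1.0 + math.exp(-index))

def log_lik(y, index):
    """Contribution of one observation (y is 0 or 1) to the log-likelihood."""
    p = G(index)
    return y * math.log(p) + (1 - y) * math.log(1.0 - p)

# Hypothetical values: X_it*B = 0.3 and a_i = -0.1
index = 0.3 + (-0.1)

# Symmetry of the logistic distribution: 1 - G(-z) equals G(z)
symmetry_gap = abs((1.0 - G(-index)) - G(index))

ll_one = log_lik(1, index)    # log Pr(y_it = 1)
ll_zero = log_lik(0, index)   # log Pr(y_it = 0)
print(symmetry_gap, ll_one, ll_zero)
```

Summing log_lik over all i and t gives the sample log-likelihood that a logit routine maximizes.
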