Pooled Cross-Section Time Series Data

Wooldridge Chapters 13 and 14

Types of Data

Pooled Cross Sections: Independent cross-section data at different points in time.

Panel / Longitudinal: Uniquely identified cross-section units (i) followed over time.
• Balanced Panel: All i appear in every period.
• Unbalanced Panel: Some i are missing for some time periods.

Example: Two Period Panel Data (N=4, T=2)

i    t    Consumption (Y)    Income (X)
1    1    72                 98
1    2    75                 102
2    1    31                 40
2    2    26                 39
3    1    55                 66
3    2    62                 70
4    1    41                 59
4    2    45                 60
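As a small illustration of the balanced/unbalanced distinction, the sketch below stores the example panel as plain tuples and checks whether every unit appears in every period. The helper name `is_balanced` is made up for this example and is not part of any library.

```python
# Rows of the example panel: (i, t, consumption Y, income X).
panel = [
    (1, 1, 72, 98), (1, 2, 75, 102),
    (2, 1, 31, 40), (2, 2, 26, 39),
    (3, 1, 55, 66), (3, 2, 62, 70),
    (4, 1, 41, 59), (4, 2, 45, 60),
]

def is_balanced(rows):
    """Return True if every unit i is observed in every period t."""
    units = {i for i, t, *_ in rows}
    periods = {t for i, t, *_ in rows}
    observed = {(i, t) for i, t, *_ in rows}
    return all((i, t) in observed for i in units for t in periods)

n_units = len({row[0] for row in panel})
n_periods = len({row[1] for row in panel})
balanced = is_balanced(panel)
# Dropping unit 4's second period turns the panel into an unbalanced one.
unbalanced_example = is_balanced(panel[:-1])
print(n_units, n_periods, balanced, unbalanced_example)
```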

$$Y_{it} = B_0 + B_1 X_{it} + e_{it}$$

$B_1$ = 0.72, but how to interpret?

[Figure: scatter plot of y against x with the fitted regression line]

Interpreting Coefficients

$$Y_{it} = B_0 + B_1 X_{it} + e_{it}$$

$$B_1 = \frac{\Delta Y_{it}}{\Delta X_{it}} = \frac{Y_{it} - Y_{jt}}{X_{it} - X_{jt}}$$

Change in $Y_i$ across individuals at time t.

$$B_1 = \frac{\Delta Y_{it}}{\Delta X_{it}} = \frac{Y_{i,t} - Y_{i,t-1}}{X_{i,t} - X_{i,t-1}}$$

Change in $Y_t$ over time for a given individual.
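To make the two readings concrete, the sketch below takes four observations from the example panel and computes one cross-sectional ratio and one over-time ratio. The helper `ratio` is a made-up name, and single pairs of observations are used only for illustration; a pooled regression averages over all such comparisons.

```python
# Example panel from the slides: (i, t) -> (Y, X).
data = {
    (1, 1): (72, 98), (1, 2): (75, 102),
    (2, 1): (31, 40), (2, 2): (26, 39),
}

def ratio(obs_a, obs_b):
    """Change in Y divided by change in X between two observations."""
    (y_a, x_a), (y_b, x_b) = obs_a, obs_b
    return (y_a - y_b) / (x_a - x_b)

# Cross-sectional comparison: individual 1 vs. individual 2 at t = 1.
across = ratio(data[(1, 1)], data[(2, 1)])
# Time comparison: individual 1 at t = 2 vs. the same individual at t = 1.
over_time = ratio(data[(1, 2)], data[(1, 1)])
print(round(across, 3), round(over_time, 3))
```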

Use intercept dummies to differentiate between “time” and “type” effects

Time Dummies: the effect of being in time period 2 vs. time period 1 on the expected value of $Y_{it}$, holding all else constant.

Type Dummies: the effect of being of type B vs. type A on the expected value of $Y_{it}$, holding all else constant.

Time Dummies

Let $D_{2,t}$ = 0 if t = 1
             = 1 if t = 2

$$Y_{it} = B_0 + \delta_T D_{2,t} + e_{it}$$

$$\delta_T = (\bar{Y}_2 - \bar{Y}_1)$$

where $\bar{Y}_2$ is the mean at time 2 across all i.

Example: Two Period Panel Data with Time Dummy

i    t    D_T    Y_it    X_it
1    1    0      72      98
1    2    1      75      102
2    1    0      31      40
2    2    1      26      39
3    1    0      55      66
3    2    1      62      70
4    1    0      41      59
4    2    1      45      60

Time Dummy Example

    sum y if t==1

        Variable |   Obs     Mean
    -------------+----------------
               y |     4    49.75

    sum y if t==2

        Variable |   Obs     Mean
    -------------+----------------
               y |     4       52

    reg y dt

Coeff = 52 - 49.75 = 2.25
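The equality between the dummy coefficient and the difference in period means can be checked by hand. The sketch below is not the Stata run itself: it recomputes the two means from the example data and then the OLS slope of y on the time dummy, using the textbook simple-regression formula. The helper names `mean` and `ols_slope` are made up for this example.

```python
# Consumption (Y) from the example panel, split by period.
y_t1 = [72, 31, 55, 41]
y_t2 = [75, 26, 62, 45]

def mean(values):
    return sum(values) / len(values)

def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x (with an intercept)."""
    x_bar, y_bar = mean(x), mean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return s_xy / s_xx

y = y_t1 + y_t2
dt = [0] * len(y_t1) + [1] * len(y_t2)   # time dummy: 1 in period 2

diff_in_means = mean(y_t2) - mean(y_t1)
dummy_coef = ols_slope(dt, y)
print(mean(y_t1), mean(y_t2), diff_in_means, dummy_coef)
```

With a single 0/1 regressor, the OLS slope is always the difference between the two group means, which is why both numbers agree.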

Time Dummy represents the shift of the regression line from period 1 to period 2 when $Y_{it}$ is regressed on the dummy along with $X_{it}$:

[Figure: scatter plot of y against x with separate fitted lines for t=1 and t=2; annotation: $\delta_T$ = 2.25]

Type Dummies

Separate cross-sectional dimension of sample into qualitative “types” (e.g. male vs. female, rural vs. urban, foreign vs. domestic, treatment vs. control, etc.)

Let $D_{iB}$ = 1 if individual i is Type B
             = 0 otherwise.

$$Y_{it} = B_0 + \delta_B D_{iB} + e_{it}$$

$$\delta_B = (\bar{Y}_B - \bar{Y}_A)$$

When $X_{it}$ is included in the regression, $\delta_B$ represents the shift in intercept.

Example: Two Period Panel Data with Type Dummy

i    t    Type    D_B    Y_it    X_it
1    1    A       0      72      98
1    2    A       0      75      102
2    1    B       1      31      40
2    2    B       1      26      39
3    1    B       1      55      66
3    2    B       1      62      70
4    1    A       0      41      59
4    2    A       0      45      60

Type Dummy represents the shift of the regression line from type A to type B when $Y_{it}$ is regressed on the dummy along with $X_{it}$:

[Figure: scatter plot of y against x with separate fitted lines for type=A and type=B; annotation: $\delta_B$ = -14.25]

Difference-in-Differences Estimator

• Estimates the difference across types, and over time, using simple dummy variable framework.
• Excellent for policy analysis. Takes advantage of “natural experiment” quality of panel data.
• Can be expanded beyond two period framework.
• Examples: stadium construction, natural disaster, water treatment facility, tax cuts.

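Use interaction term between type and time dummies.

$$Y_{it} = B_0 + B_1 X_{it} + \delta_0 D_{2,t} + \delta_1 D_{iB} + \delta_{DD}\,(D_{2,t} \cdot D_{iB}) + e_{it}$$

$$\delta_{DD} = (\bar{Y}_{B,2} - \bar{Y}_{A,2}) - (\bar{Y}_{B,1} - \bar{Y}_{A,1})$$

The first term is the difference “after”; the second term is the difference “before”.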

Difference Coefficient

Also known as “Average Treatment Effect”.

Can also be written as

$$\delta_{DD} = (\bar{Y}_{B,2} - \bar{Y}_{B,1}) - (\bar{Y}_{A,2} - \bar{Y}_{A,1})$$

First term: treatment impact on ‘treated’. Second term: treatment impact on control group.

D-in-D example

i    t    Type    D_B    D2_T    D_B*D2_T    Y_it    X_it
1    1    A       0      0       0           72      98
1    2    A       0      1       0           75      102
2    1    B       1      0       0           31      40
2    2    B       1      1       1           26      39
3    1    B       1      0       0           55      66
3    2    B       1      1       1           62      70
4    1    A       0      0       0           41      59
4    2    A       0      1       0           45      60

From simple example

    reg y db d2 dd

               y |    Coef.   Std. Err.       t
    -------------+------------------------------
              db |    -13.5    21.6015    -0.62
              d2 |      3.5    21.6015     0.16
              dd |     -2.5   30.54914    -0.08
           _cons |     56.5   15.27457     3.70

Mean of y for type b when t=2: 44.00
Mean of y for type a when t=2: 60.00
Mean of y for type b when t=1: 43.00
Mean of y for type a when t=1: 56.50

Coefficient = (44.00 - 60.00) - (43.00 - 56.50)
            = (-16) - (-13.5) = -2.5
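The regression above contains only the two dummies and their interaction, so its four coefficients are just combinations of the four cell means. The sketch below recomputes them from the example data. It does not run a regression, and the helper `cell_mean` is a made-up name.

```python
# Example panel rows: (type, t, Y). Type B is the treated group.
rows = [
    ("A", 1, 72), ("A", 2, 75),
    ("B", 1, 31), ("B", 2, 26),
    ("B", 1, 55), ("B", 2, 62),
    ("A", 1, 41), ("A", 2, 45),
]

def cell_mean(group, period):
    values = [y for g, t, y in rows if g == group and t == period]
    return sum(values) / len(values)

m = {(g, t): cell_mean(g, t) for g in "AB" for t in (1, 2)}

# Coefficients of the dummy-only regression, written as cell means.
cons = m[("A", 1)]                      # control group, before
db = m[("B", 1)] - m[("A", 1)]          # type gap, before
d2 = m[("A", 2)] - m[("A", 1)]          # time change for the control group
dd = (m[("B", 2)] - m[("A", 2)]) - (m[("B", 1)] - m[("A", 1)])
# Same number, grouped as "change for B" minus "change for A".
dd_alt = (m[("B", 2)] - m[("B", 1)]) - (m[("A", 2)] - m[("A", 1)])
print(cons, db, d2, dd, dd_alt)
```

Both groupings of the four means give the same difference-in-differences value, matching the `dd` coefficient in the output.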

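[Figure: mean outcomes $\bar{Y}_{A,1}$, $\bar{Y}_{A,2}$, $\bar{Y}_{B,1}$ and $\bar{Y}_{B,2}$ plotted at t=1 and t=2]

$$\delta_{DD} = (\bar{Y}_{B,2} - \bar{Y}_{B,1}) - (\bar{Y}_{A,2} - \bar{Y}_{A,1})$$

How much more did treatment group (B) outcome increase than control group (A) from time 1 to time 2?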

Panel Data Problem! Unobserved Heterogeneity

There exist characteristics of each individual that persist over time which cannot be included in the regression (unobservable in available data), but which nonetheless impact the observed variation in our dependent variable.

Composite Errors

These time-invariant unobserved effects are best modeled as a component in the regression error term.

It is this “composite error” approach that sets apart panel regression from OLS.

Examples

• Unobservable motivational skills of firm manager in a production function.
• Skills, charisma, connections, nepotism in a wage model.
• Levels of unobserved macro-level institutional corruption or inefficiency in a cross-sectional growth model.

The Composite Error Model

$$Y_{it} = B_0 + B_1 X_{it} + v_{it}$$

Where $v_{it} = u_{it} + a_i$ is the composite error, and…

$u_{it}$ is the random, time-varying idiosyncratic error.

$a_i$ is the time invariant error component.

The Composite Error Problems

1.) If $COV(a_i, X_{it}) \neq 0$, then OLS estimates will be biased.

Very much like simultaneous equations (endogeneity) bias, but here covariance with error term will only involve cross-sectional variation.

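Composite Error Bias

$$Y_{it} = B_0 + B_1 X_{it} + a_i + u_{it}$$

$$\hat{B}_1 = B_1 + \frac{\sum (X_{it} - \bar{X})(a_i + u_{it})}{\sum (X_{it} - \bar{X})^2}$$

$$E(\hat{B}_1) = B_1 + \frac{E[\sum (X_{it} - \bar{X})\,a_i]}{\sum (X_{it} - \bar{X})^2} + \frac{E[\sum (X_{it} - \bar{X})\,u_{it}]}{\sum (X_{it} - \bar{X})^2}$$

$$E(\hat{B}_1) = B_1 + \frac{COV(X_{it}, a_i)}{\sum (X_{it} - \bar{X})^2} + \frac{COV(X_{it}, u_{it})}{\sum (X_{it} - \bar{X})^2}$$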

Examples:
1. Manager charisma correlated with firm size in production function.
2. Nepotism/networking correlated with education in wage equation.
3. Institutional quality associated with development in corruption equation.

2.) Since $a_i$ represents a time-invariant component of the error, composite errors will be correlated over time.

Serial Correlation is the result:

$$Corr(v_{it}, v_{i,t+s}) \neq 0$$

Estimates will not be biased, but goodness of fit and significance of coefficients will be overstated.

How to deal with the Composite Error problem?

• Pooled OLS: do nothing about it.
• First Difference: eliminate $a_i$.
• Dummy Variables: estimate the $a_i$ when N is small.
• Fixed Effects: estimate $a_i$ when N is large.
• Random Effects: account for serial correlation.

First Difference Transformation (two period panel) with Time dummy

$$Y_{it} = B_0 + \delta_0 D_{Tt} + B_1 X_{it} + a_i + u_{it}$$

For Period 2:
$$Y_{i2} = (B_0 + \delta_0) + B_1 X_{i2} + a_i + u_{i2}$$

For Period 1:
$$Y_{i1} = B_0 + B_1 X_{i1} + a_i + u_{i1}$$

First Difference: $\Delta Y_i = Y_{i2} - Y_{i1}$

$$\Delta Y_i = \delta_0 + B_1 (X_{i2} - X_{i1}) + (u_{i2} - u_{i1})$$

$$\Delta Y_i = \delta_0 + B_1 (\Delta X_i) + \Delta u_i$$
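As a rough numerical check of the last equation, the sketch below first-differences the two-period example panel and fits a simple regression of the change in Y on the change in X. The intercept plays the role of the time-dummy coefficient. The variable names `b1_fd` and `delta0_fd` are made up for this example, and with only four units the numbers are purely illustrative.

```python
# Example panel: unit -> ((Y_1, X_1), (Y_2, X_2)).
panel = {
    1: ((72, 98), (75, 102)),
    2: ((31, 40), (26, 39)),
    3: ((55, 66), (62, 70)),
    4: ((41, 59), (45, 60)),
}

# First differences: the time-invariant a_i cancels within each unit.
d_y = [p2[0] - p1[0] for p1, p2 in panel.values()]
d_x = [p2[1] - p1[1] for p1, p2 in panel.values()]

def mean(values):
    return sum(values) / len(values)

# OLS of dY on dX with an intercept; the intercept estimates delta_0.
dx_bar, dy_bar = mean(d_x), mean(d_y)
s_xy = sum((dx - dx_bar) * (dy - dy_bar) for dx, dy in zip(d_x, d_y))
s_xx = sum((dx - dx_bar) ** 2 for dx in d_x)
b1_fd = s_xy / s_xx
delta0_fd = dy_bar - b1_fd * dx_bar
print(d_y, d_x, round(b1_fd, 4), round(delta0_fd, 4))
```

Only one differenced observation per unit is left, which is the loss of the time dimension that comes with this transformation.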

First Difference

Transformation eliminates $a_i$ terms.

Corrects for heterogeneity bias and serial correlation.

Problems:
• 1. Eliminates all time invariant variables (type dummies)
• 2. Eliminates time dimension in two period panel (reduces T by 1 in general)

“Type” Dummy Variables for each i

If $a_i$ terms are viewed as coefficients to be estimated, a dummy can be constructed that uniquely identifies each individual in the sample.

Dummy coefficient will represent effect of the sum of all unobserved attributes.

Type Dummies

Solves ‘time invariant bias’ problem by removing $a_i$ from error component, and directly estimating the effects.

Obvious problem is that degrees of freedom are vastly reduced. Requires a large number of time periods relative to cross sectional units.

Example: 4 country panel over 250 months

Step 1: append the separate country data files:

    use c:/stata627/nfa/canada.dta
    append using c:/stata627/nfa/italy.dta
    append using c:/stata627/nfa/japan.dta
    append using c:/stata627/nfa/uk.dta
    tsset code time

Dummy Example: estimates of $a_i$

    xi: reg y cpi r er i.code
    i.code            _Icode_1-5          (naturally coded; _Icode_1 omitted)

    Number of obs = 990
    Prob > F      = 0.0000
    R-squared     = 0.7464
    Adj R-squared = 0.7448
    ------------------------------------------------------
               y |      Coef.   Std. Err.       t    P>|t|
    -------------+----------------------------------------
             cpi |   .3817633   .0117365    32.53   0.000
               r |  -.4944136   .0780945    -6.33   0.000
              er |  -.0196729   .0014589   -13.49   0.000
        _Icode_3 |   26.41765   2.053128    12.87   0.000
        _Icode_4 |  -12.51685   .6298041   -19.87   0.000
        _Icode_5 |  -1.729212   .5753217    -3.01   0.003
           _cons |   67.36739   1.632653    41.26   0.000
    ------------------------------------------------------

Code 1 = Canada, omitted
Code 3 = Italy, positive estimate of $a_i$
Code 4 = Japan, negative $a_i$
Code 5 = UK, negative $a_i$

Fixed Effects

Assume $CORR(a_i, X_{it}) \neq 0$, but $CORR(u_{it}, X_{it}) = 0$.

An alternative to the first difference transformation is the “Time De-meaning” transformation of the fixed effects model.

Results in a model essentially identical to the Dummy model, without having to estimate N-1 dummy coefficients.

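Fixed Effects Transformation

(1) original model:
$$y_{it} = \beta_0 + \beta_1 x_{it} + a_i + u_{it}$$

Average for each individual over time, with $\bar{y}_i = \frac{1}{T}\sum_{t=1}^{T} y_{it}$:
$$\bar{y}_i = \beta_0 + \beta_1 \bar{x}_i + a_i + \bar{u}_i \qquad (2)$$

(2) is the “between” equation; it only shows variation between individuals, taking out the time element.

Subtract the mean from the level in each time period:
$$(y_{it} - \bar{y}_i) = (\beta_0 - \beta_0) + \beta_1 (x_{it} - \bar{x}_i) + (a_i - a_i) + (u_{it} - \bar{u}_i)$$
$$(y_{it} - \bar{y}_i) = \beta_1 (x_{it} - \bar{x}_i) + (u_{it} - \bar{u}_i)$$
$$\ddot{y}_{it} = \beta_1 \ddot{x}_{it} + \ddot{u}_{it} \qquad (3)$$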

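Fixed Effects Regression is equivalent to running OLS on Equation 3:

$$(y_{it} - \bar{y}_i) = \beta_1 (x_{it} - \bar{x}_i) + (u_{it} - \bar{u}_i)$$
$$\ddot{y}_{it} = \beta_1 \ddot{x}_{it} + \ddot{u}_{it} \qquad (3)$$

$$CORR(\ddot{x}_{it}, \ddot{u}_{it}) = 0 \;\Rightarrow\; \hat{\beta}_1^{FE} \text{ is unbiased.}$$

This is also known as the “within” estimation equation, as it shows the variation within a group over time.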

Fixed Effects Coefficients

Will have same “two-dimension” interpretation as pooled OLS.

Variation in the transformed variables is the same as in $Y_{it}$ and $X_{it}$.

$$B_1 = \frac{\Delta \ddot{Y}_{it}}{\Delta \ddot{X}_{it}} = \frac{\Delta Y_{it}}{\Delta X_{it}}$$

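Fixed Effects Transformation With Time-Invariant Dummy Independent Variable

(1) model with a dummy $D_i \in (0,1)$ that is time invariant for each i:
$$y_{it} = \beta_0 + \beta_1 x_{it} + \delta_0 D_i + a_i + u_{it}$$

Average for each individual over time, with $\bar{D}_i = \frac{1}{T}\sum_{t=1}^{T} D_i = \frac{1}{T}\,T D_i = D_i$:
$$\bar{y}_i = \beta_0 + \beta_1 \bar{x}_i + \delta_0 D_i + a_i + \bar{u}_i \qquad (2)$$

$$(y_{it} - \bar{y}_i) = (\beta_0 - \beta_0) + \beta_1 (x_{it} - \bar{x}_i) + \delta_0 (D_i - D_i) + (a_i - a_i) + (u_{it} - \bar{u}_i)$$
$$(y_{it} - \bar{y}_i) = \beta_1 (x_{it} - \bar{x}_i) + (u_{it} - \bar{u}_i)$$
$$\ddot{y}_{it} = \beta_1 \ddot{x}_{it} + \ddot{u}_{it} \qquad (3)$$

Problem: Both $D_i$ and $a_i$ are eliminated.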

Example: Two Period Panel Data (N=4, T=2)

i    t    Y_it    Ybar_i    (Y_it - Ybar_i)
1    1    72      73.5      -1.5
1    2    75      73.5       1.5
2    1    31      28.5       2.5
2    2    26      28.5      -2.5
3    1    55      58.5      -3.5
3    2    62      58.5       3.5
4    1    41      43        -2
4    2    45      43         2
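The de-meaned column in the table can be reproduced in a few lines. The sketch below applies the same time de-meaning to both Y and X and then runs OLS through the origin on the transformed data, which is what the “within” equation (3) describes when there is no time dummy. The names `demean` and `b1_fe` are made up for this example.

```python
# Example panel: unit -> list of (Y, X) over the two periods.
panel = {
    1: [(72, 98), (75, 102)],
    2: [(31, 40), (26, 39)],
    3: [(55, 66), (62, 70)],
    4: [(41, 59), (45, 60)],
}

def demean(values):
    """Subtract the unit's own time average from each observation."""
    avg = sum(values) / len(values)
    return [v - avg for v in values]

y_dd, x_dd = [], []          # time-demeaned Y and X, stacked over units
for obs in panel.values():
    y_dd += demean([y for y, _ in obs])
    x_dd += demean([x for _, x in obs])

# Within (fixed effects) slope: OLS through the origin on demeaned data.
b1_fe = sum(x * y for x, y in zip(x_dd, y_dd)) / sum(x * x for x in x_dd)
print(y_dd)
print(round(b1_fe, 4))
```

No intercept is needed because every de-meaned variable has mean zero within each unit, and therefore overall.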

Goodness of Fit

A fixed effects regression returns three “R-square” measures. They are each actually squared correlations between predicted and observed values:

1. Within $R^2$: fitted de-meaned $y_{it}$
2. Between $R^2$: fitted $\bar{y}_i$
3. Overall $R^2$: fitted $y_{it}$ (pooled OLS)

Panel Regressions in Stata

• XT = cross-section time series.
• “xtreg y x, fe” will run a panel fixed effects regression.
• Must declare your “i” and “t” identifiers: tsset code time, for example.
• Unfortunately, Stata refers to the time-invariant error component (our $a_i$) as u_i.

Fixed Effects Stata Example

    xtreg y cpi r er, fe

    Fixed-effects (within) regression          Number of obs      =     990
    Group variable (i): code                   Number of groups   =       4

    R-sq:  within  = 0.7071                    Obs per group: min =     244
           between = 0.0335                                   avg =   247.5
           overall = 0.1827                                   max =     250

                                               F(3,983)           =  791.14
    corr(u_i, Xb)  = -0.7495                   Prob > F           =  0.0000

    ------------------------------------------------------------------------------
               y |      Coef.   Std. Err.       t    P>|t|    [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             cpi |   .3817633   .0117365    32.53   0.000    .3587318    .4047948
               r |  -.4944136   .0780945    -6.33   0.000   -.6476647   -.3411625
              er |  -.0196729   .0014589   -13.49   0.000   -.0225358   -.0168101
           _cons |   70.49544   1.529625    46.09   0.000    67.49374    73.49715
    -------------+----------------------------------------------------------------
         sigma_u |  16.538008   (std. dev. of time-invariant error)
         sigma_e |  6.3818613   (std. dev. of idiosyncratic error)
             rho |  .87038904   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------

    F test that all u_i=0:   F(3, 983) = 362.02          Prob > F = 0.0000

Random Effects

Assumes $CORR(a_i, X_{it}) = 0$.

Therefore, OLS coefficients will not suffer “composite error bias”, as was assumed with Fixed Effects.

We do not need to eliminate $a_i$ terms.

Although $a_i$ terms do not truly have to be “randomly” assigned, there is no structural relationship between $a_i$ and $X_{it}$ in a correctly specified model.

Random Effects

Even when $CORR(a_i, X_{it}) = 0$, we still have to account for the serial correlation introduced by the $a_i$ error component.

A “Quasi-demeaned” data transformation is used to accomplish this, wherein $a_i$ are altered but not eliminated.

A bonus is that time-invariant dummies are not eliminated.

Random Effects Assumptions

1. $E(a_i \mid X_{it}) = E(a_i) = 0$
   • independence of the $a_i$’s and X’s: $cov(a_i, X_{it}) = 0$
2. $E(u_{it} \mid X_{it}, a_i) = 0$
3. $E(u_{it} u_{is}) = cov(u_{it}, u_{is}) = 0$ for all t≠s.
4. $E(u_{it}^2 \mid X_{it}, a_i) = \sigma_u^2$ = constant
5. $E(a_i^2 \mid X_{it}) = Var(a_i) = \sigma_a^2$

Random Effects

Under the preceding criteria, the composite error does not violate OLS assumptions.

Unnecessarily eliminating the $a_i$ terms will cause estimates to be inefficient.

Don’t use Fixed Effects unless warranted.

However, running Pooled OLS will not be appropriate because the composite errors are still serially correlated over time.

It can be shown that:

$$corr(v_{it}, v_{is}) = \frac{\sigma_a^2}{\sigma_a^2 + \sigma_u^2}, \quad t \neq s$$

Where, again: $v_{it} = u_{it} + a_i$
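For a sense of magnitude, the sketch below plugs the two standard deviations reported in the fixed effects output earlier in these slides into the correlation formula. The function name `composite_error_corr` is made up for this example. Recall that Stata labels the time-invariant component sigma_u and the idiosyncratic component sigma_e.

```python
# Standard deviations reported in the fixed-effects output in these slides.
sigma_a = 16.538008   # time-invariant component (Stata's sigma_u)
sigma_u = 6.3818613   # idiosyncratic component (Stata's sigma_e)

def composite_error_corr(sd_a, sd_u):
    """corr(v_it, v_is) for t != s when v_it = a_i + u_it."""
    return sd_a ** 2 / (sd_a ** 2 + sd_u ** 2)

rho = composite_error_corr(sigma_a, sigma_u)
print(round(rho, 4))
```

The result agrees with the `rho` line of that output (about 0.87), so most of the composite error variance in that example is time-invariant.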

Random effects transformation is more complicated than FD or FE, but the basic idea is to eliminate serial correlation in the error term by using information on the variances of the time-invariant and idiosyncratic errors.

Random Effects (RE)

Transformation results in a weighted average of the estimates provided by the “within” and “between” estimators.

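RE Transformation

(1) original model:
$$y_{it} = \beta_0 + \beta_1 x_{it} + a_i + u_{it}$$

Define a weighted average:
$$\lambda \bar{y}_i = \lambda \beta_0 + \beta_1 \lambda \bar{x}_i + \lambda a_i + \lambda \bar{u}_i$$

then subtract from (1):
$$(y_{it} - \lambda \bar{y}_i) = (1 - \lambda)\beta_0 + \beta_1 (x_{it} - \lambda \bar{x}_i) + (1 - \lambda) a_i + (u_{it} - \lambda \bar{u}_i)$$

where
$$\hat{\lambda} = 1 - \sqrt{\frac{\hat{\sigma}_u^2}{T \hat{\sigma}_a^2 + \hat{\sigma}_u^2}}$$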

It can be shown that the composite error term $v_{it}$, once quasi-demeaned with the weighting term $\lambda$ (lambda), will NOT suffer from serial correlation:

$$Corr(v_{it} - \hat{\lambda} \bar{v}_i,\; v_{is} - \hat{\lambda} \bar{v}_i) = 0, \quad t \neq s$$

Given
$$\hat{\lambda} = 1 - \sqrt{\frac{\hat{\sigma}_u^2}{T \hat{\sigma}_a^2 + \hat{\sigma}_u^2}}$$

NOTE:

If $var(a_i) = 0$, meaning $a_i$ is always zero (no time-invariant effects), then lambda equals 0 and RE regression is equivalent to Pooled OLS equation (1): all lambda-weighted terms drop out.

As $\sigma_a^2$ dominates $\sigma_u^2$, $a_i$ terms become more important, $\lambda$ goes to 1, and RE → FE.

Given
$$\hat{\lambda} = 1 - \sqrt{\frac{\hat{\sigma}_u^2}{T \hat{\sigma}_a^2 + \hat{\sigma}_u^2}}$$
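The two limiting cases can be checked numerically. The sketch below evaluates the weight for three hypothetical variance settings. The function name `re_lambda` and all input values are made up for illustration, and the formula used is the standard square-root form of the quasi-demeaning weight.

```python
import math

def re_lambda(var_a, var_u, t_periods):
    """Quasi-demeaning weight: 1 - sqrt(var_u / (T * var_a + var_u))."""
    return 1.0 - math.sqrt(var_u / (t_periods * var_a + var_u))

T = 250  # hypothetical number of periods per unit

no_effect = re_lambda(0.0, 1.0, T)       # var(a_i) = 0: pooled OLS
dominant = re_lambda(1000.0, 1.0, T)     # var(a_i) much larger: close to FE
moderate = re_lambda(1.0, 1.0, 2)        # in between: 1 - sqrt(1/3)
print(no_effect, round(moderate, 4), round(dominant, 4))
```

A weight of 0 leaves the data untransformed (pooled OLS), while a weight near 1 subtracts almost the full unit mean, as in the fixed effects transformation.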

RE Stata Example (N=4)

    xtreg y cpi r er

    Random-effects GLS regression              Number of obs      =     990
    Group variable (i): code                   Number of groups   =       4

    R-sq:  within  = 0.6252                    Obs per group: min =     244
           between = 0.7702                                   avg =   247.5
           overall = 0.4662                                   max =     250

    Random effects u_i ~ Gaussian              Wald chi2(3)       =  861.17
    corr(u_i, X)   = 0 (assumed)               Prob > chi2        =  0.0000

    ------------------------------------------------------
               y |      Coef.   Std. Err.       z    P>|z|
    -------------+----------------------------------------
             cpi |   .3468475   .0158341    21.91   0.000
               r |   .0072637   .1077631     0.07   0.946
              er |  -.0002592   .0005834    -0.44   0.657
           _cons |   61.17895   2.168505    28.21   0.000
    -------------+----------------------------------------
         sigma_u |          0
         sigma_e |  6.3818613
             rho |          0   (fraction of variance due to u_i)

Fixed vs. Random Effects

As a practical matter, Random Effects is preferred when key explanatory variables are time-invariant.

The Fixed Effects view is that the unobserved heterogeneity is in itself an explanatory variable that ideally would have a coefficient to be estimated.

Fixed vs. Random Effects

The Random Effects view is that unobserved heterogeneity is “randomly assigned” to each cross sectional entity and not correlated with other explanatory variables.

When to use FE vs. RE? The Hausman Coefficient Test

The logic of the test is the following:
• If $CORR(a_i, X_{it}) \neq 0$, then RE is biased.
• If $CORR(a_i, X_{it}) = 0$, then both RE and FE are unbiased, but it can be shown that RE is more efficient (smaller standard errors of coefficients).
• Therefore, if the FE coefficients are significantly different from the RE coefficients, then RE must be biased, so use FE.
• If FE coefficients are not significantly different from RE, then neither is biased, so use RE.

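General Hausman Test

Test the equality of the vector of coefficients:

$$\hat{\beta}_{FE} = \begin{bmatrix} \hat{\beta}_1^{FE} \\ \hat{\beta}_2^{FE} \\ \hat{\beta}_3^{FE} \end{bmatrix}, \qquad \hat{\beta}_{RE} = \begin{bmatrix} \hat{\beta}_1^{RE} \\ \hat{\beta}_2^{RE} \\ \hat{\beta}_3^{RE} \end{bmatrix}$$

$V(\hat{\beta})$ = variance terms for each coefficient (the variance-covariance matrix of the estimates).

Test Statistic:
$$H = (\hat{\beta}_{FE} - \hat{\beta}_{RE})' \, [V(\hat{\beta}_{FE}) - V(\hat{\beta}_{RE})]^{-1} \, (\hat{\beta}_{FE} - \hat{\beta}_{RE})$$

H is distributed Chi-square with k degrees of freedom.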

Single Coefficient Version

If we are primarily interested in a single parameter, there is a t-statistic version of the Hausman test.

Let $B_1^{FE}$ and $B_1^{RE}$ be the fixed- and random effects coefficients for $X_{1,it}$.

$$t = \frac{B_1^{FE} - B_1^{RE}}{[se(B_1^{FE})^2 - se(B_1^{RE})^2]^{1/2}}$$

Where t is asymptotically normally distributed.
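The sketch below applies this formula to two coefficients, using the estimates and standard errors printed in the fixed effects and random effects outputs earlier in these slides. The function name `hausman_t` is made up for this example, and returning `None` when the variance difference is not positive is a choice made here, not a standard convention.

```python
import math

def hausman_t(b_fe, se_fe, b_re, se_re):
    """Single-coefficient Hausman statistic; None if the variance gap is not positive."""
    var_gap = se_fe ** 2 - se_re ** 2
    if var_gap <= 0:
        return None
    return (b_fe - b_re) / math.sqrt(var_gap)

# Coefficients and standard errors from the FE and RE outputs in these slides.
t_er = hausman_t(-0.0196729, 0.0014589, -0.0002592, 0.0005834)
t_cpi = hausman_t(0.3817633, 0.0117365, 0.3468475, 0.0158341)
print(round(t_er, 2), t_cpi)
```

For `er` the statistic is large in absolute value. For `cpi` the RE standard error is larger than the FE one in this sample, so the denominator is undefined and no statistic is returned.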

Note: Hausman Test Problem

Most of the time the Hausman test works fine, however…

The test statistic is based on the assumption that RE is more efficient (estimates have a smaller variance) than FE.

While this can be shown to be asymptotically true, it may not hold for a given sample.

If this is the case, then the test statistic is negative, and cannot be interpreted as a Chi-square.

This is why it is important to type:
• ‘hausman unbiased efficient’

Where ‘unbiased’ is the vector of FE coefficients and ‘efficient’ is the vector of RE coefficients.

Hausman Test Interpretation

$H_0$: $\beta_{FE} = \beta_{RE}$ (difference in coefficients is NOT systematic)

$H_A$: $\beta_{FE} \neq \beta_{RE}$.

If H > critical value, we reject $H_0$,
• conclude that since $\beta_{FE} \neq \beta_{RE}$
• Random Effects is biased, therefore
• $CORR(a_i, X_{it}) \neq 0$, and
• Fixed Effects is the appropriate model.

Hausman Test in Stata

    xtreg y cpi r er, fe
    estimates store fe
    xtreg y cpi r er
    estimates store re
    hausman fe re

                  ---- Coefficients ----
         |      (b)          (B)          (b-B)     sqrt(diag(V_b-V_B))
         |      fe           re        Difference          S.E.
    -----+--------------------------------------------------------------
     cpi |   .3817633     .3468475      .0349158            .
       r |  -.4944136     .0072637     -.5016774            .
      er |  -.0196729    -.0002592     -.0194137        .0013371
    --------------------------------------------------------------------
         b = consistent under Ho and Ha; obtained from xtreg
         B = inconsistent under Ha, efficient under Ho; obtained from xtreg

    Test:  Ho:  difference in coefficients not systematic

           chi2(3) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                   = 162.38
         Prob>chi2 = 0.0000
         (V_b-V_B is not positive definite)

Reject $H_0$ in this case, so go with Fixed Effects.


Lagrange Multiplier Test for Random Effects

Essentially, this is a derivation of a test for heteroskedasticity in a panel composite error setting, where $v_{it} = a_i + u_{it}$.

Assume var($u_{it}$) is constant, and $u_{it}$ is not correlated with $X_{it}$.

Then any correlation between var($v_{it}$) and $X_{it}$ must be due to the time-invariant error $a_i$.
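
The slides do not write out the test statistic. As a reference point, the usual Breusch-Pagan form for a balanced panel is sketched below; $\hat{e}_{it}$ (pooled OLS residuals) and $\sigma_a^2$ (variance of $a_i$) are notation assumed here. In Stata, the command `xttest0` run after `xtreg, re` reports a test of this kind.

```latex
LM = \frac{NT}{2(T-1)}
     \left[ \frac{\sum_{i=1}^{N} \left( \sum_{t=1}^{T} \hat{e}_{it} \right)^{2}}
                 {\sum_{i=1}^{N} \sum_{t=1}^{T} \hat{e}_{it}^{2}} - 1 \right]^{2}
     \;\sim\; \chi^2_1 \quad \text{under } H_0 : \sigma_a^2 = 0
```

A large LM value rejects H0 and points to Random Effects over pooled OLS.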


Stata Note for Panel Regressions

You will notice that running FE / RE regressions with large N can be time consuming, which is really annoying during the specification search process.

This is because each regression requires Stata to perform the ‘de-meaning’ transformation for each observation from the original data.


Stata Note

The ‘xtdata’ command allows you to create a new data set of the transformed variables.

Running OLS on the transformed variables gives the same coefficient estimates as the FE/RE regression. In the FE case the OLS standard errors still need a degrees-of-freedom correction for the estimated group means.

Typing ‘xtdata y x1 x2, fe’ will replace the data in memory with the fixed effect de-meaned values of the specified variables for each observation; save the result as a new .dta file if you want to keep it.


Extensions to Panel Regression

1.) 2SLS/IV with panel

    xtivreg y x1 (x2=z), fe

2.) Cluster effects for cross-sectional data.

3.) Auto-correlated idiosyncratic errors ($u_{it}$)


Extension 1: IV Panel

When an independent variable is endogenous in a panel regression, each stage of the two stage least squares process must take into account the composite error issue.

i.e. the first stage and second stage will either be RE or FE regression, depending on which is appropriate.


$$Y_{it} = B_0 + B_1 X_{it} + a_i + u_{it}$$

The fixed effects transformation will address the issue of

COV($X_{it}$, $a_i$) ≠ 0.

But what about when

COV($X_{it}$, $u_{it}$) ≠ 0?


Panel 2SLS

$$(1)\quad y_{it} = B_0 + B_1 x_{it} + a_i + u_{it}$$

$$CORR(x_{it}, a_i) \neq 0$$

$$CORR(x_{it}, u_{it}) \neq 0$$

Define variable $z_{it}$ such that

$$CORR(z_{it}, a_i) \neq 0 \quad \text{but}$$

$$CORR(z_{it}, u_{it}) = 0$$

Therefore $z_{it}$ will be exogenous in (1), but will require the fixed effects transformation to be used as an effective instrument.


First Stage FE

$$\ddot{x}_{it} = \pi_0 + \pi_1 \ddot{z}_{it} + e_{it}, \quad \text{where } \ddot{z}_{it} = (z_{it} - \bar{z}_i)$$

$\hat{\pi}_1^{FE}$ is unbiased.

Save fitted, transformed values of $\hat{\ddot{x}}_{it}$.


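Second Stage FE

$$(1')\quad \ddot{y}_{it} = B_1 \hat{\ddot{x}}_{it} + \ddot{u}_{it}$$

$$CORR(\hat{\ddot{x}}_{it}, a_i) = 0 \quad \text{(because of the umlaut)}$$

$$CORR(\hat{\ddot{x}}_{it}, u_{it}) = 0 \quad \text{(because of the hat)}$$

$\hat{B}_1^{FE,2SLS}$ will be unbiased.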
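
The two stages can be traced by hand on a tiny example. The sketch below is in Python rather than Stata so the arithmetic is visible. The panel is hypothetical: it was built so that the true slope on x is 2, x contains the error u, and the instrument z is uncorrelated with u after de-meaning. The names `panel`, `demean`, `stack` and `dot` are illustrative only.

```python
# Hypothetical two-unit panel (T = 3), built so the true slope on x is 2.
# x is endogenous because it contains u; z is the instrument.
panel = {
    1: {"z": [1, 2, 3], "x": [2, 0, 4], "y": [15, 8, 19]},
    2: {"z": [2, 4, 6], "x": [1, 6, 5], "y": [-4, 9, 4]},
}


def demean(values):
    """Within (fixed effects) transformation for one unit."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def stack(name):
    """De-mean variable `name` unit by unit and stack the results."""
    out = []
    for unit in sorted(panel):
        out.extend(demean(panel[unit][name]))
    return out


def dot(a, b):
    """Sum of element-wise products."""
    return sum(p * q for p, q in zip(a, b))


z_dd, x_dd, y_dd = stack("z"), stack("x"), stack("y")

# First stage: regress de-meaned x on de-meaned z and keep the fitted values.
pi_1 = dot(z_dd, x_dd) / dot(z_dd, z_dd)
x_hat = [pi_1 * z for z in z_dd]

# Second stage: regress de-meaned y on the fitted values.
b_fe_2sls = dot(x_hat, y_dd) / dot(x_hat, x_hat)

# Plain FE without the instrument, for comparison.
b_fe = dot(x_dd, y_dd) / dot(x_dd, x_dd)

print(b_fe_2sls, b_fe)
```

On this constructed data the two-stage estimate recovers the slope of 2, while plain FE is pulled upward by the correlation between x and u. The second-stage standard errors are not computed here; a routine such as xtivreg handles those.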


Extension 2: Cluster Regression

Allows for a Fixed Effects transformation with single period cross-section data.

“Cluster-” or “group-” invariant errors replace “time-invariant” errors ($a_i$).

For example, there may be “within village effects” that will be the same for all households in Village A that differ from Village B.

Often can be controlled for with “cluster dummy” variables.


Cross Section Cluster Example

    Household (i)   Village (j)   Consumption (Y_ij)   Income (X_ij)
    1               1             500                   750
    2               1             650                  1000
    3               1             475                   725
    1               2             600                   700
    2               2             625                   750
    3               2             550                   600
    1               3             575                  1100
    2               3             625                  1200
    3               3             600                  1000


Cluster Regression

Model: $X_{ij}$ = observation for household i in village j

$$Y_{ij} = B_0 + B_1 X_{ij} + a_j + u_{ij}$$

The analogy to panel structure is that i acts like the time variable, and j acts like the cross-sectional identifier.

Multiple observations for a given village j.

$a_j$ is the “cluster invariant error” or “village level fixed effect”.


Fixed Effects for Cluster

Again, if there is correlation between the “cluster-invariant” error ($a_j$) and the independent variables ($X_{ij}$), then the coefficient estimates will be biased.

Fixed Effects transformation eliminates the $a_j$ by subtracting the cluster mean from each observation.

$$\bar{Y}_j = \text{village level mean}$$

$$(Y_{ij} - \bar{Y}_j) = B_1 (X_{ij} - \bar{X}_j) + (a_j - a_j) + (u_{ij} - \bar{u}_j)$$

$$\ddot{Y}_{ij} = B_1^{FE} \ddot{X}_{ij} + \ddot{u}_{ij}$$


Cluster Effects Transformation

    i   j     y      x    ybar_j   y_umlat_ij   xbar_j   x_umlat_ij
    1   1   500    750    541.67     -41.67     825.00     -75.00
    2   1   650   1000    541.67     108.33     825.00     175.00
    3   1   475    725    541.67     -66.67     825.00    -100.00
    1   2   600    700    591.67       8.33     683.33      16.67
    2   2   625    750    591.67      33.33     683.33      66.67
    3   2   550    600    591.67     -41.67     683.33     -83.33
    1   3   575   1100    600.00     -25.00    1100.00       0.00
    2   3   625   1200    600.00      25.00    1100.00     100.00
    3   3   600   1000    600.00       0.00    1100.00    -100.00


Transformed OLS Regression

    reg y_umlat x_umlat

          Source |       SS       df       MS              Number of obs =       9
    -------------+------------------------------           F(  1,     7) =   27.86
           Model |  17649.2873     1  17649.2873           Prob > F      =   .0011
        Residual |  4434.04639     7  633.435199           R-squared     =   .7992
    -------------+------------------------------           Adj R-squared =   .7705
           Total |  22083.3337     8  2760.41671           Root MSE      =  25.168

    ------------------------------------------------------
         y_umlat |      Coef.   Std. Err.      t    P>|t|
    -------------+----------------------------------------
         x_umlat |   .4759358   .0901646     5.28   0.001
           _cons |   4.09e-07    8.38938     0.00   1.000
    ------------------------------------------------------


FIXED EFFECTS

    tsset j i
           panel variable:  j (strongly balanced)
            time variable:  i, 1 to 3
                    delta:  1 unit

    xtreg y x, fe

    Fixed-effects (within) regression          Number of obs      =         9
    Group variable: j                          Number of groups   =         3

    R-sq:  within  = 0.7992                    Obs per group: min =         3
           between = 0.0961                                   avg =       3.0
           overall = 0.2517                                   max =         3

                                               F(1,5)             =     19.90
    corr(u_i, Xb)  = -0.8365                   Prob > F           =    0.0066
    ------------------------------------------------------
               y |      Coef.   Std. Err.      t    P>|t|
    -------------+----------------------------------------
               x |   .4759358   .1066842     4.46   0.007
           _cons |    163.978   93.28558     1.76   0.139
    -------------+----------------------------------------
         sigma_u |  95.865925
         sigma_e |  29.779343
             rho |  .91199744   (fraction of variance due to u_i)
    ------------------------------------------------------
    F test that all u_i=0:  F(2, 5) = 9.34      Prob > F = 0.0205
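
The transformed OLS regression and xtreg give the same coefficient on x but different standard errors. The sketch below redoes the arithmetic in Python on the village data from the cluster example to show why. The variable names are illustrative; the only inputs are the nine observations from the slides.

```python
import math

# (household i, village j, consumption y, income x) from the cluster example.
rows = [
    (1, 1, 500, 750), (2, 1, 650, 1000), (3, 1, 475, 725),
    (1, 2, 600, 700), (2, 2, 625, 750), (3, 2, 550, 600),
    (1, 3, 575, 1100), (2, 3, 625, 1200), (3, 3, 600, 1000),
]

villages = sorted({j for _, j, _, _ in rows})


def village_mean(col, j):
    """Mean of column `col` (2 = y, 3 = x) over the households in village j."""
    vals = [r[col] for r in rows if r[1] == j]
    return sum(vals) / len(vals)


ybar = {j: village_mean(2, j) for j in villages}
xbar = {j: village_mean(3, j) for j in villages}

# Within transformation: subtract the village mean from each observation.
y_dd = [y - ybar[j] for _, j, y, _ in rows]
x_dd = [x - xbar[j] for _, j, _, x in rows]

sxx = sum(x * x for x in x_dd)
sxy = sum(x * y for x, y in zip(x_dd, y_dd))
slope = sxy / sxx
rss = sum((y - slope * x) ** 2 for x, y in zip(x_dd, y_dd))

n, k, g = len(rows), 1, len(villages)
# OLS on the transformed data counts one slope and one constant: df = 9 - 2 = 7.
se_naive = math.sqrt(rss / (n - k - 1) / sxx)
# The FE estimator also counts the three village means: df = 9 - 3 - 1 = 5.
se_fe = math.sqrt(rss / (n - g - k) / sxx)

print(round(slope, 7), round(se_naive, 7), round(se_fe, 7))
```

The slope agrees with both Stata outputs. The two standard errors differ only by the factor sqrt(7/5), which is the degrees-of-freedom correction for the estimated village means.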


Cluster (village) Dummies

    xi: reg y x i.j

    i.j               _Ij_1-3             (naturally coded; _Ij_1 omitted)

          Source |       SS       df       MS              Number of obs =       9
    -------------+------------------------------           F(  3,     5) =    8.88
           Model |  23621.5092     3   7873.8364           Prob > F      =  0.0191
        Residual |  4434.04635     5  886.809269           R-squared     =  0.8420
    -------------+------------------------------           Adj R-squared =  0.7471
           Total |  28055.5556     8  3506.94444           Root MSE      =  29.779

    ------------------------------------------------------
               y |      Coef.   Std. Err.      t    P>|t|
    -------------+----------------------------------------
               x |   .4759358   .1066842     4.46   0.007
           _Ij_2 |   117.4242   28.62912     4.10   0.009    (Vlg2)
           _Ij_3 |  -72.54902   38.10424    -1.90   0.115    (Vlg3)
           _cons |   149.0196     89.678     1.66   0.157    (Vlg1)
    ------------------------------------------------------


“predict ai, u” to view the estimated $a_i$

    i   j   _Ij_2   _Ij_3         ai
    1   1     0       0      -14.9584
    2   1     0       0      -14.9584
    3   1     0       0      -14.9584
    1   2     1       0      102.4658
    2   2     1       0      102.4658
    3   2     1       0      102.4658
    1   3     0       1      -87.5074
    2   3     0       1      -87.5074
    3   3     0       1      -87.5074
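
The xtreg constant, the dummy coefficients and the predicted village effects are all the same three village intercepts expressed in different ways. The Python sketch below checks this using the village means from the example and the reported slope. The names `intercept`, `cons`, `a_hat` and `dummy` are illustrative, and the averaging step relies on the panel being balanced.

```python
# Village means of y and x from the cluster example, and the reported FE slope.
ybar = {1: 1625 / 3, 2: 1775 / 3, 3: 600.0}
xbar = {1: 825.0, 2: 2050 / 3, 3: 1100.0}
slope = 0.4759358  # coefficient on x from xtreg and from the dummy regression

# Village-specific intercepts implied by the fitted line through each village mean.
intercept = {j: ybar[j] - slope * xbar[j] for j in ybar}

# With a balanced panel, xtreg's _cons is the average of these intercepts,
# and "predict ai, u" reports each village's deviation from that average.
cons = sum(intercept.values()) / len(intercept)
a_hat = {j: intercept[j] - cons for j in intercept}

# In the dummy regression, _cons is village 1's intercept and each dummy
# coefficient is the difference from village 1 (the omitted category).
dummy = {j: intercept[j] - intercept[1] for j in (2, 3)}

print(round(cons, 3), {j: round(v, 4) for j, v in a_hat.items()})
print({j: round(v, 4) for j, v in dummy.items()})
```

The results line up with the Stata output to within rounding of the slope: a constant near 163.978, village effects near -14.96, 102.47 and -87.51, and dummy coefficients near 117.42 and -72.55.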


Aside. . . “xtdes” command

    xtdes

           j:  1, 2, ..., 3                                  n =          3
           i:  1, 2, ..., 3                                  T =          3
               Delta(i) = 1 unit
               Span(i)  = 3 periods
               (j*i uniquely identifies each observation)

    Distribution of T_i:   min      5%     25%       50%       75%     95%     max
                             3       3       3         3         3       3       3

         Freq.  Percent    Cum. |  Pattern
     ---------------------------+---------
            3    100.00  100.00 |  111
     ---------------------------+---------
            3    100.00         |  XXX


Extension 3: Autocorrelation of $u_{it}$’s

Random Effects transformation eliminated autocorrelation amongst composite errors due to presence of $a_i$.

Fixed Effects eliminated autocorrelation due to $a_i$ by eliminating the time-invariant error.

What if, in addition, $u_{it}$ is autocorrelated?

RE or FE alone will not address the issue.


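Panel FE Regression with AC

$$(1)\quad y_{it} = B_0 + B_1 x_{it} + a_i + u_{it}$$

$$CORR(x_{it}, a_i) \neq 0$$

$$u_{it} = \rho u_{i,t-1} + \varepsilon_{it}, \quad -1 < \rho < 1, \quad \varepsilon_{it} \sim N(0, \sigma^2)$$

$$(2)\quad \ddot{y}_{it} = B_1 \ddot{x}_{it} + \rho u_{i,t-1} + \varepsilon_{it} - \bar{u}_i$$

$$(2.1)\quad \ddot{y}_{it} = B_1 \ddot{x}_{it} + \rho (u_{i,t-1} - \bar{u}_{i,t-1}) + \ddot{\varepsilon}_{it}$$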


Equation (2.1) is now a linear AR(1) Model.

To solve, we need to use the Cochrane-Orcutt method of estimating $\rho$, then use the generalized difference equation to eliminate the term:

$$\rho (u_{i,t-1} - \bar{u}_{i,t-1})$$
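
The two building blocks of the Cochrane-Orcutt idea can be sketched in a few lines of Python. This is a single pass under simplifying assumptions, not the full iterative procedure: the residuals are hypothetical and constructed to follow an exact AR(1) with coefficient 0.5, and the function names `estimate_rho` and `quasi_difference` are illustrative.

```python
def estimate_rho(residuals_by_unit):
    """Pooled AR(1) coefficient: regress e_t on e_{t-1} with no constant,
    pairing observations only within the same unit."""
    num = den = 0.0
    for e in residuals_by_unit.values():
        for prev, cur in zip(e, e[1:]):
            num += cur * prev
            den += prev * prev
    return num / den


def quasi_difference(series, rho):
    """Generalized difference v_t - rho * v_{t-1}; the first period is dropped."""
    return [cur - rho * prev for prev, cur in zip(series, series[1:])]


# Hypothetical FE residuals for two units, built so that u_t = 0.5 * u_{t-1}.
residuals = {1: [1.0, 0.5, 0.25, 0.125], 2: [2.0, 1.0, 0.5]}

rho_hat = estimate_rho(residuals)
print(rho_hat)
print(quasi_difference(residuals[1], rho_hat))
```

In practice the same quasi-difference is applied to y and x, the regression is re-run on the differenced data, and the two steps are repeated until the estimate of rho settles.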


STATA to the rescue again!

The command:

    xtregar y x, fe

will simultaneously transform the data to eliminate the $a_i$ terms AND estimate $\rho$ AND provide consistent standard errors with the generalized difference equation.


Xtregar Example from 4 country panel

    xtregar y r cpi er, fe

    FE (within) regression with AR(1) disturbances   Number of obs      =       986
    Group variable: code                             Number of groups   =         4

    R-sq:  within  = 0.0155                          Obs per group: min =       243
           between = 0.5840                                         avg =     246.5
           overall = 0.4567                                         max =       249

                                                     F(3,979)           =      5.13
    corr(u_i, Xb)  = -0.1308                         Prob > F           =    0.0016
    -------------------------------------------------------
               y |      Coef.   Std. Err.       t    P>|t|
    -------------+-----------------------------------------
               r |  -.0362285   .0875633    -0.41   0.679
             cpi |   .2832925    .076438     3.71   0.000
              er |   .0015201   .0029347     0.52   0.605
           _cons |     68.766   .2288196   300.52   0.000
    -------------+-----------------------------------------
          rho_ar |   .9718915
         sigma_u |  6.3918957
         sigma_e |  1.7246814
         rho_fov |  .93213626   (fraction of variance because of u_i)
    -------------------------------------------------------
    F test that all u_i=0:  F(3,979) = 2.14          Prob > F = 0.094


Stata Note – balancing your panel

It may be useful to use only those “entities” that appear in all time periods. Suppose T=20 – use the following:

    sort entity time

    by entity: gen count=_N

    keep if count==20
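
The logic of those three commands is: count the rows per entity, then keep only entities with a full set of periods. A minimal Python sketch of the same idea follows, on hypothetical (entity, time) pairs with T = 3. The function name `keep_balanced` is illustrative. Like the Stata version, it assumes each entity has at most one row per period.

```python
from collections import Counter


def keep_balanced(rows, n_periods):
    """Keep only rows whose entity appears in exactly n_periods rows."""
    counts = Counter(entity for entity, _ in rows)
    return [(entity, t) for entity, t in rows if counts[entity] == n_periods]


# Hypothetical (entity, time) pairs; entity "B" is missing period 2.
rows = [
    ("A", 1), ("A", 2), ("A", 3),
    ("B", 1), ("B", 3),
    ("C", 1), ("C", 2), ("C", 3),
]

balanced = keep_balanced(rows, 3)
print(balanced)
```

Entity "B" is dropped entirely, leaving a balanced panel of "A" and "C".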


Panel Data Management in STATA

Common problem is that original data is stored in “wide” or “rectangular” form, wherein values for a given year are stored in a separate column.

For example, in a cross-country panel, FDI in 2000 has one column, with each row representing a unique country. Likewise for FDI in 2001, etc.


Example of “wide” form data set

    Countries     Code   fdi2000     fdi2001     fdi2002
    Argentina     1      1.04E+10    2.17E+09    2.15E+09
    Australia     2      1.36E+10    8.26E+09    1.77E+10
    Austria       3      8.52E+09    5.91E+09    3.19E+08
    Bangladesh    4      2.80E+08    7.90E+07    5.20E+07


Problem

In order to run a panel regression in STATA, we need data to be stored in “long” form.

Here, each row is identified by both a time period and country code. A variable like FDI will have a single column.


Example of “long” form data set

    code   year   countries    fdi
    1      2000   Argentina    1.040e+10
    1      2001   Argentina    2.170e+09
    1      2002   Argentina    2.150e+09
    2      2000   Australia    1.360e+10
    2      2001   Australia    8.260e+09
    2      2002   Australia    1.770e+10


The “reshape” STATA command

Instead of copying and pasting in Excel, load the data into STATA as “wide” form, then transform.

The “reshape” command will generate the “time” variable for you, and combine separate time periods into a single column.


    reshape long fdi, i(code) j(year)

Keys on the specified variable, here “fdi”.

Must declare the cross-section identifier i.

Generates the “within” group identifier j. Put the new varname in parentheses. Typically j will represent time, but not necessarily.
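
To make the wide-to-long step concrete, here is a minimal Python sketch of what the command does to the FDI example: each column whose name is the stub "fdi" followed by digits becomes its own row, with the digits stored in a new "year" variable. The function `reshape_long` is an illustrative stand-in, not Stata's implementation.

```python
# The "wide" FDI example: one row per country, one column per year.
wide = [
    {"countries": "Argentina", "code": 1, "fdi2000": 1.04e10, "fdi2001": 2.17e9, "fdi2002": 2.15e9},
    {"countries": "Australia", "code": 2, "fdi2000": 1.36e10, "fdi2001": 8.26e9, "fdi2002": 1.77e10},
    {"countries": "Austria", "code": 3, "fdi2000": 8.52e9, "fdi2001": 5.91e9, "fdi2002": 3.19e8},
    {"countries": "Bangladesh", "code": 4, "fdi2000": 2.80e8, "fdi2001": 7.90e7, "fdi2002": 5.20e7},
]


def reshape_long(rows, stub, i, j):
    """Turn columns named <stub><digits> into rows keyed by (i, j)."""
    def is_stub_column(name):
        return name.startswith(stub) and name[len(stub):].isdigit()

    long_rows = []
    for row in rows:
        # Columns that do not carry the stub are copied to every new row.
        fixed = {k: v for k, v in row.items() if not is_stub_column(k)}
        for name in sorted(k for k in row if is_stub_column(k)):
            long_rows.append({**fixed, j: int(name[len(stub):]), stub: row[name]})
    long_rows.sort(key=lambda r: (r[i], r[j]))
    return long_rows


long_rows = reshape_long(wide, stub="fdi", i="code", j="year")
for r in long_rows[:3]:
    print(r["code"], r["year"], r["countries"], r["fdi"])
```

Four countries with three year columns become twelve rows, one per country-year, matching the “long” layout shown earlier.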


Reshape Notes

• In general, list all variables that must be combined into a single column.

• You do not need to list time-invariant variables, but they will be converted to “long” as well.

• Note that “reshape wide” will convert data from long to wide format.

• Seems to be touchy about year values. ‘99 for 1999 is ok, but ‘00 for 2000 is a problem.


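Fixed Effects Logit

$$y^*_{it} = X_{it}B + a_i + u_{it}, \quad \text{where } y_{it} \in \{0, 1\}$$

$$y_{it} = 1 \text{ if } y^*_{it} > 0$$

$$y_{it} = 1 \text{ if } X_{it}B + a_i + u_{it} > 0$$

$$\Pr(y_{it} = 1) = \Pr(X_{it}B + a_i + u_{it} > 0)$$

$$= \Pr(u_{it} > -(X_{it}B + a_i))$$

$$= 1 - G(-(X_{it}B + a_i))$$

$$G(X_{it}B + a_i) = \frac{e^{X_{it}B + a_i}}{1 + e^{X_{it}B + a_i}}$$

$$\log L = y_{it} \log G(X_{it}B + a_i) + (1 - y_{it}) \log[1 - G(X_{it}B + a_i)]$$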
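
The logistic function G and the log-likelihood contribution of a single observation can be written directly from these formulas. The Python sketch below is a minimal illustration. The argument `index` stands for the value of $X_{it}B + a_i$ and is assumed to be supplied by the caller; nothing is estimated here.

```python
import math


def G(index):
    """Logistic CDF, algebraically equal to exp(index) / (1 + exp(index))."""
    return 1.0 / (1.0 + math.exp(-index))


def log_likelihood(y, index):
    """Log-likelihood contribution of one observation with outcome y in {0, 1}."""
    p = G(index)
    return y * math.log(p) + (1 - y) * math.log(1.0 - p)


# With index = 0 both outcomes are equally likely.
print(G(0.0), log_likelihood(1, 0.0))
```

The symmetry G(v) = 1 - G(-v) is what allows the step from 1 - G(-(X_it B + a_i)) to G(X_it B + a_i) in the derivation.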