Topic1 Panel

  • 8/3/2019 Topic1 Panel

    Panel Data

    Outline

    Panel Data

    Fixed-effects vs. random-effects

    First-differencing or fixed-effects

    Strict Exogeneity Assumption

    Panel Data (or Longitudinal Data)

    A typical panel data set has both a cross-sectional dimension and a time-series dimension. In particular, the same cross-sectional units (e.g. individuals, families, firms, cities, states) are observed over time. Panel data differ from independent cross sections pooled across time (estimated by pooled OLS). Estimating the latter is a simple extension of OLS.

    Large N or Large T?

    N is the number of cross-sectional units and T is the number of time periods.

    Small N and small T (of little use)

    * Large N and small T (Traditional Panel Data)

    N is large enough for the Law of Large Numbers to apply while T is not.

    Convenient to use if cross-sectional units are independent.

    Small N and Large T

    T is large enough for the Law of Large Numbers to apply while N is not.

    Autocorrelation has to be addressed.

    Large N and Large T (Still under exploration)

    Fixed Effects Panel-data Model

    (individual-specific intercepts)

    $y_{it} = \beta_0 + \delta_t + \beta_1 x_{it1} + \beta_2 x_{it2} + a_i + u_{it}$

    Strict Exogeneity Assumption

    $\mathrm{Cov}(x_{it}, u_{is}) = 0$ for all t and s.

    This rules out dynamic models, which have lagged dependent variables (e.g. $y_{i,t-1}$) as explanatory variables. Models with lags of the independent variables as explanatory variables are still fine.

    The effects of time-constant independent variables cannot be directly estimated because they are absorbed into $a_i$.

    The time-specific intercepts $\delta_t$ control for shocks common to all agents in period t.

    Names

    The individual-specific intercept $a_i$ may be called a fixed effect or unobserved heterogeneity.

    The term $u_{it}$ is called the idiosyncratic error.

    The sum $a_i + u_{it}$ is often called the composite error.

    If $\mathrm{Cov}(x_{it}, a_i)$ is nonzero but pooled OLS is used, the estimates of all parameters may be biased. This bias is called heterogeneity bias.

    A balanced panel has observations for the same time periods for all individuals. Otherwise, the data are unbalanced.
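
    The heterogeneity bias described on this slide can be illustrated with a small simulation. This is only a sketch under assumed values: the true slope of 1.0, the sample sizes, and the way x is made to depend on the unobserved effect are all illustrative choices, not taken from the slides. Pooled OLS ignores the individual effect and picks up its correlation with x, while demeaning by individual removes it.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 500, 5            # large N, small T (illustrative sizes)
beta = 1.0               # true slope, assumed for the simulation

a = rng.normal(size=N)                        # unobserved heterogeneity a_i
x = a[:, None] + rng.normal(size=(N, T))      # x_it is correlated with a_i
u = rng.normal(size=(N, T))                   # idiosyncratic error u_it
y = beta * x + a[:, None] + u

# Pooled OLS: regress y on a constant and x, leaving a_i in the error term.
X = np.column_stack([np.ones(N * T), x.ravel()])
b_pooled = np.linalg.lstsq(X, y.ravel(), rcond=None)[0][1]

# Within (fixed-effects) estimator: subtract each unit's time average, then OLS.
xd = x - x.mean(axis=1, keepdims=True)
yd = y - y.mean(axis=1, keepdims=True)
b_fe = (xd * yd).sum() / (xd ** 2).sum()

print(f"pooled OLS slope:  {b_pooled:.3f}")
print(f"within (FE) slope: {b_fe:.3f}")
```

    In this design Cov(x, a) / Var(x) = 0.5, so the pooled slope should settle near 1.5 while the within slope stays near the assumed true value of 1.0.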

    Random Effects Models

    $y_{it} = \beta_0 + \delta_t + \beta_1 x_{it1} + \beta_2 x_{it2} + a_i + u_{it}$

    Key assumption: $a_i$ is uncorrelated with each explanatory variable in all time periods.

    Difference between RE and FE estimators

    In FE, we effectively control for $a_i$ using dummy variables.

    In RE, $a_i$ is left out of the regressors and becomes part of the disturbance.

    RE estimates are more efficient (i.e. more precise) if the RE assumption is valid.

    Random Effects Models

    (continued)

    Difference between RE and pooled OLS

    Since $a_i$ is in the error term, observations over time are correlated for the same individual i.

    In the RE approach, this correlation over time is accounted for using a GLS (generalized least squares) method.

    In pooled OLS, the GLS correction is not used.

    Hausman test

    Compare the RE and FE estimates. If the estimates are very different, the RE assumption is probably invalid, and FE should be used. Otherwise, RE is more efficient.
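
    For reference, the GLS correction and the Hausman comparison on this slide are usually written as below. This is the standard textbook form, not something stated in the slides: the symbols $\theta$, $\sigma_a^2$, $\sigma_u^2$ and $v_{it}$ are notation introduced here, a balanced panel with T periods is assumed, and the time intercepts are omitted for brevity.

```latex
% Quasi-demeaning used by the RE (feasible GLS) estimator
\theta = 1 - \sqrt{\frac{\sigma_u^2}{\sigma_u^2 + T\sigma_a^2}}

y_{it} - \theta\bar{y}_i
  = \beta_0(1-\theta)
  + \beta_1\,(x_{it1} - \theta\bar{x}_{i1})
  + \beta_2\,(x_{it2} - \theta\bar{x}_{i2})
  + (v_{it} - \theta\bar{v}_i),
\qquad v_{it} = a_i + u_{it}

% Hausman statistic comparing the FE and RE slope estimates
H = (\hat\beta_{FE} - \hat\beta_{RE})'
    \left[\widehat{\mathrm{Var}}(\hat\beta_{FE})
        - \widehat{\mathrm{Var}}(\hat\beta_{RE})\right]^{-1}
    (\hat\beta_{FE} - \hat\beta_{RE})
  \;\overset{a}{\sim}\; \chi^2_k
```

    Setting $\theta = 0$ gives pooled OLS and $\theta = 1$ gives the FE (within) estimator, so RE sits between the two. Here k is the number of coefficients on time-varying regressors being compared; a large H is evidence against the RE assumption.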

    Estimation of the Fixed-effect Panel Data Model

    Fixed-effects (or Within) Estimator

    Each variable is demeaned (i.e. its time average for the same cross-sectional unit is subtracted).

    Dummy Variable Regression (i.e. include a dummy variable for each cross-sectional unit, along with the other explanatory variables). This may cause estimation difficulty when N is large.

    First-difference Estimator

    Each variable is differenced once over time, so we are effectively estimating the relationship between changes in the variables.
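
    The three estimation routes listed on this slide can be sketched side by side on simulated data. The data-generating values (slopes 0.5 and -1.0, N = 50, T = 4) and the helper `ols` are illustrative assumptions, and time intercepts are left out to keep the example short. The within estimator and the dummy variable regression should return the same slopes, while first differencing gives a different but also consistent estimate.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T, K = 50, 4, 2
beta = np.array([0.5, -1.0])     # true slopes, assumed for the simulation

a = rng.normal(size=(N, 1))                         # individual effect a_i
x = rng.normal(size=(N, T, K)) + a[:, :, None]      # regressors correlated with a_i
u = rng.normal(size=(N, T))
y = x @ beta + a + u                                # shape (N, T)


def ols(X, y):
    """Least-squares coefficients (hypothetical helper for this sketch)."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


# 1) Within estimator: subtract each unit's time average from y and x.
xd = (x - x.mean(axis=1, keepdims=True)).reshape(N * T, K)
yd = (y - y.mean(axis=1, keepdims=True)).reshape(N * T)
b_within = ols(xd, yd)

# 2) Dummy variable regression: one dummy per cross-sectional unit.
D = np.kron(np.eye(N), np.ones((T, 1)))             # (N*T, N), unit-major rows
X_lsdv = np.hstack([x.reshape(N * T, K), D])
b_lsdv = ols(X_lsdv, y.reshape(N * T))[:K]

# 3) First-difference estimator: regress changes in y on changes in x.
dx = np.diff(x, axis=1).reshape(N * (T - 1), K)
dy = np.diff(y, axis=1).reshape(N * (T - 1))
b_fd = ols(dx, dy)

print("within:", b_within)
print("LSDV:  ", b_lsdv)
print("FD:    ", b_fd)
```

    The dummy matrix here has N columns, which is why the slide warns about estimation difficulty when N is large; the within transformation gets the same slopes without building it.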

    First Differencing or Fixed-Effect?

    Theoretically, when N is large and T is small but greater than 2, FE is more efficient when the $u_{it}$ are serially uncorrelated, while FD is more efficient when $u_{it}$ follows a random walk. (With T = 2 the two estimators are identical.)

    When T is large and N is small

    FD has an advantage for processes with large positive autocorrelation. FE is more sensitive to nonnormality, heteroskedasticity, and serial correlation in the idiosyncratic errors.

    On the other hand, FE is less sensitive to violation of the strict exogeneity assumption, so FE is preferred when the processes are weakly dependent over time.

    With Classical Measurement Errors

    When T > 2, the measurement error bias of the FE estimator may be smaller than that of the FD approach but larger than that of OLS (Griliches and Hausman, 1986).

    Natural IV for measurement error: lagged dependent variables

    Violation of the Strict Exogeneity Assumption

    Parameter estimates are inconsistent; a natural experiment approach (e.g. IV) is needed.

    With Strict Exogeneity and Dependent Observations

    Parameter estimates are consistent

    Standard error estimates could still be biased:

    Cross-sectional correlation or serial correlation (over time) in error terms

    Heteroskedasticity

    Possible Solutions (Need Large N and Zero Cross-Sectional Correlation)

    Heteroskedasticity

    Use White robust standard errors

    Autocorrelation

    Group the sample time dimension into two periods and apply the first-difference estimator (needs large N). (Performs the best with the D-in-D approach; Bertrand et al. 2004)

    Clustered robust errors

    Newey-West standard errors (which also account for heteroskedasticity)

    Cross-sectional Correlations

    Clustered robust errors
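
    As a concrete illustration of the White correction mentioned on this slide, the sketch below computes classical and heteroskedasticity-robust (HC0) standard errors for a simple OLS regression. The simulated data, with error variance that grows with |x|, are an assumption made only to produce visible heteroskedasticity, and no finite-sample adjustment is applied.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000
x = rng.normal(size=n)
u = rng.normal(size=n) * np.abs(x)      # error spread grows with |x| (assumed design)
y = 1.0 + 2.0 * x + u

X = np.column_stack([np.ones(n), x])
XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y                   # OLS coefficients
e = y - X @ b                           # residuals

# Classical standard errors: a single error variance for all observations.
sigma2 = e @ e / (n - X.shape[1])
se_classical = np.sqrt(np.diag(sigma2 * XtX_inv))

# White (HC0) standard errors: sandwich with squared residuals in the middle.
meat = (X * e[:, None] ** 2).T @ X
se_white = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))

print("classical SE:", se_classical)
print("White SE:    ", se_white)
```

    In this design the classical formula understates the uncertainty of the slope, so the White standard error for the slope comes out noticeably larger.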

    Clustered Standard Errors

    Key Assumption

    Correlations within a cluster (a group of firms, a region, different years for the same firm, different years for the same region) are the same for different observations.

    Procedure

    Identify clusters using economic theory (clustered by industry, year, or industry and year)

    Let the computer calculate clustered standard errors

    Try different ways of defining clusters and see how the estimated standard errors are affected.
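
    What the computer does in the second step of the procedure can be sketched with NumPy alone. The example clusters by a hypothetical `firm` identifier and uses simulated data in which both the regressor and the error share a firm-level component; these choices, and the omission of any finite-sample correction, are assumptions of the sketch rather than part of the slides.

```python
import numpy as np

rng = np.random.default_rng(3)
G, T = 200, 6                              # 200 clusters (firms), 6 years each
firm = np.repeat(np.arange(G), T)          # hypothetical cluster identifier

# Regressor and error both contain a firm-level component (assumed design).
x = np.repeat(rng.normal(size=G), T) + rng.normal(size=G * T)
u = np.repeat(rng.normal(size=G), T) + rng.normal(size=G * T)
y = 1.0 + 0.5 * x + u

X = np.column_stack([np.ones(G * T), x])
XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
e = y - X @ b

# Classical standard errors treat all observations as independent.
sigma2 = e @ e / (len(y) - X.shape[1])
se_classical = np.sqrt(np.diag(sigma2 * XtX_inv))

# Cluster-robust sandwich: sum the scores x_i * e_i within each cluster first.
scores = X * e[:, None]
cluster_sums = np.zeros((G, X.shape[1]))
np.add.at(cluster_sums, firm, scores)
meat = cluster_sums.T @ cluster_sums
se_cluster = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))

print("classical SE:", se_classical)
print("clustered SE:", se_cluster)
```

    Swapping `firm` for a different identifier (for example a year index) is one way to carry out the last step of the procedure and see how the standard errors respond to the cluster definition.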

    Unbalanced Panels

    If a panel data set is unbalanced for reasons uncorrelated with $u_{it}$, the consistency of the FE estimator is not affected.

    The attrition problem: if an unbalanced panel is the result of a selection process related to $u_{it}$, then an endogeneity problem is present and needs to be dealt with using a correction method.
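
    The first point can be checked with a short simulation: observations are dropped completely at random, so the missingness is unrelated to the idiosyncratic error, and the within estimator is computed on the resulting unbalanced panel. The true slope of 2.0, the 30% drop rate, and the helper `demean` are assumptions of this sketch. It does not address the attrition case, where a selection correction would be needed.

```python
import numpy as np

rng = np.random.default_rng(4)
N, T = 300, 6
ids = np.repeat(np.arange(N), T)           # unit identifier for each row
a = rng.normal(size=N)                     # individual effect a_i
x = a[ids] + rng.normal(size=N * T)        # x_it correlated with a_i
u = rng.normal(size=N * T)
y = 2.0 * x + a[ids] + u                   # true slope 2.0 (assumed)

# Drop about 30% of rows at random, independently of u_it.
keep = rng.random(N * T) > 0.3
ids_k, x_k, y_k = ids[keep], x[keep], y[keep]

counts = np.bincount(ids_k, minlength=N)   # T_i: observations left per unit


def demean(v):
    """Subtract each unit's own mean, allowing a different T_i per unit."""
    sums = np.bincount(ids_k, weights=v, minlength=N)
    means = sums / np.maximum(counts, 1)   # guard against units with no rows
    return v - means[ids_k]


xd, yd = demean(x_k), demean(y_k)
b_fe = (xd @ yd) / (xd @ xd)

print(f"units with fewer than T rows: {(counts < T).sum()}")
print(f"within (FE) slope on unbalanced panel: {b_fe:.3f}")
```

    Units left with a single observation contribute nothing after demeaning, which is why they drop out of the FE estimate without biasing it.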