Static Panel Data Suitable for Viewing


Transcript of Static Panel Data Suitable for Viewing

  • Slide 1 of 48 Lecture 1: Static panel data modelling L14025 Applied Microeconometrics

    L14205 Applied Microeconometrics Lecture 1:

    Static Panel Data Modelling

    Professor Sourafel Girma

    [email protected]

  • Slide 2 of 48 Lecture 1: Static panel data modelling L14025 Applied Microeconometrics

    Lecture objectives:

    1. Explain the nature of panel data.

    2. Discuss the modelling of time effects and the estimation of robust standard errors.

    3. Discuss the estimation and testing of the random and fixed effects models.

    4. Explain the Hausman test for correlated effects.

    5. Demonstrate the practical estimation of panel data models.

  • Slide 3 of 48 Lecture 1: Static panel data modelling L14025 Applied Microeconometrics

    1. Introduction

    2. The example dataset

    3. Time effects

    4. Robust standard errors

    5. The random effects model

    6. The fixed effects model

    7. Summary

  • Slide 4 of 48 Lecture 1: Static panel data modelling L14025 Applied Microeconometrics

    1

    Introduction

  • Slide 5 of 48 Lecture 1: Static panel data modelling L14025 Applied Microeconometrics

    1. Introduction

            Firm 1            Firm 2            Firm 3            Firm 4
    Year    Invest  Assets    Invest  Assets    Invest  Assets    Invest  Assets
    1980    33.1    1170.6    317.6   3078.5    209.9   1362.4    12.93   191.5
    1981    45      2015.8    391.8   4661.7    355.3   1807.1    25.9    516
    1982    77.2    2803.3    410.6   5387.1    469.9   2673.3    35.05   729
    1983    44.6    2039.7    257.7   2792.2    262.3   1801.9    22.89   560.4
    1984    48.1    2256.2    330.8   4313.2    230.4   1957.3    18.84   519.9
    1985    74.4    2132.2    461.2   4643.9    361.6   2202.9    28.57   628.5
    1986    113     1834.1    512     4551.2    472.8   2380.5    48.51   537.1
    1987    91.9    1588      448     3244.1    445.6   2168.6    43.34   561.2
    1988    61.3    1749.4    499.6   4053.7    361.6   1985.1    37.02   617.2
    1989    56.8    1687.2    547.5   4379.3    288.2   1813.9    37.81   626.7
    1990    93.6    2007.7    561.2   4840.9    258.7   1850.2    39.27   737.2

    We can see the above dataset as 4 separate time series datasets, one for each firm.

    Alternatively, we can see it as 11 separate cross-sectional datasets, one for each year.

    Or we can see it as one big dataset by pooling (combining) the time series and cross-sectional observations. In this case the pooled dataset is called a panel dataset.

  • Slide 6 of 48 Lecture 1: Static panel data modelling L14025 Applied Microeconometrics

    1. Introduction

    Firm  Year  Invest  Assets
    1     1980  33.1    1170.6
    1     1981  45      2015.8
    1     ...
    1     ...
    2     1980  317.6   3078.5
    2     1981  391.8   4661.7
    2     ...
    2     ...
    3     1980  209.9   1362.4
    3     1981  355.3   1807.1
    3     ...
    3     ...
    4     1980  12.93   191.5
    4     1981  25.9    516
    4     ...
    4     ...

    If we pool the data, we have one big dataset.

    We can estimate one regression (y = Invest; x = Assets):

    $$y_{it} = \alpha + \beta x_{it} + e_{it}$$

    where i = 1, 2, 3, 4 (firm) and t = 1980, 1981, ..., 1990 (time period).

    We now have 44 observations in the model (4 x 11).

    Examples of indexing:

    $y_{3,1981} = 355.3$

    $x_{1,1980} = 1170.6$

    If we apply a simple regression technique (OLS) to the model, we call this the Pooled Model.
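
    As a rough illustration of pooling, the Python sketch below stacks the four firms' series from the table on Slide 5 into one long dataset and fits the pooled regression of Invest on Assets by OLS. The names WIDE, to_long and pooled_ols are made up for this example, and the single-regressor OLS formula is used only to keep the sketch free of external libraries; it is not the software used in the lecture.

```python
# Investment and assets for the four firms, 1980-1990 (values from the Slide 5 table).
YEARS = list(range(1980, 1991))
WIDE = {
    1: [(33.1, 1170.6), (45, 2015.8), (77.2, 2803.3), (44.6, 2039.7),
        (48.1, 2256.2), (74.4, 2132.2), (113, 1834.1), (91.9, 1588),
        (61.3, 1749.4), (56.8, 1687.2), (93.6, 2007.7)],
    2: [(317.6, 3078.5), (391.8, 4661.7), (410.6, 5387.1), (257.7, 2792.2),
        (330.8, 4313.2), (461.2, 4643.9), (512, 4551.2), (448, 3244.1),
        (499.6, 4053.7), (547.5, 4379.3), (561.2, 4840.9)],
    3: [(209.9, 1362.4), (355.3, 1807.1), (469.9, 2673.3), (262.3, 1801.9),
        (230.4, 1957.3), (361.6, 2202.9), (472.8, 2380.5), (445.6, 2168.6),
        (361.6, 1985.1), (288.2, 1813.9), (258.7, 1850.2)],
    4: [(12.93, 191.5), (25.9, 516), (35.05, 729), (22.89, 560.4),
        (18.84, 519.9), (28.57, 628.5), (48.51, 537.1), (43.34, 561.2),
        (37.02, 617.2), (37.81, 626.7), (39.27, 737.2)],
}


def to_long(wide, years):
    """Stack the firm-by-firm series into (firm, year, invest, assets) rows."""
    return [(firm, year, invest, assets)
            for firm, series in sorted(wide.items())
            for year, (invest, assets) in zip(years, series)]


def pooled_ols(rows):
    """OLS of invest on assets, ignoring the firm and year labels."""
    n = len(rows)
    y_bar = sum(r[2] for r in rows) / n
    x_bar = sum(r[3] for r in rows) / n
    sxy = sum((r[3] - x_bar) * (r[2] - y_bar) for r in rows)
    sxx = sum((r[3] - x_bar) ** 2 for r in rows)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


panel = to_long(WIDE, YEARS)
intercept, slope = pooled_ols(panel)
print(len(panel), "pooled observations; slope on Assets =", round(slope, 4))
```

    The pooled fit treats all 44 firm-year rows as if they came from one homogeneous sample, which is exactly the assumption the rest of the lecture puts to the test.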

  • Slide 7 of 48 Lecture 1: Static panel data modelling L14025 Applied Microeconometrics

    1. Introduction

    Typically panel data consist of observations for the same units (e.g. firms, countries, or individuals) across time (e.g. yearly or monthly). This is also referred to as longitudinal data.

    However, any data with two or more dimensions can be treated as panel data (e.g. universities-courses; regions-firms; farms-plots).

    Simple regression analysis using OLS may not always be adequate for such complicated datasets. Hence we may need special techniques or estimators.

    Two such estimators are especially useful in the context of linear static panel data models: the fixed effects and the random effects estimators.

    For all intents and purposes one can see linear static panel data modelling as a three-horse race: for the particular model we propose to estimate and the specific data in hand, which of the OLS, fixed effects and random effects estimators is most appropriate?

  • Slide 8 of 48 Lecture 1: Static panel data modelling L14025 Applied Microeconometrics

    1. Introduction

    The good news is that there are formal statistical tests that help identify the winner of this race, and in this lecture we will discuss the practical implementation of these tests via an empirical example. A typical panel data model can be written as

    $$y_{it} = \beta_0 + \beta_1 x_{it} + u_i + e_{it}; \quad i = 1, \dots, N; \; t = 1, \dots, T$$

    $$u_i \sim (0, \sigma_u^2)$$

    $$e_{it} \sim (0, \sigma_e^2)$$

    $e_{it}$ is the usual idiosyncratic random error term, which varies with i and t. The innovation in panel data modelling is the introduction of the term $u_i$, which is time-invariant for each individual unit. It is the permanent effect associated with the individual unit and can be thought of as capturing unobserved individual heterogeneity.

  • Slide 9 of 48 Lecture 1: Static panel data modelling L14025 Applied Microeconometrics

    For example, if y denotes wages and x is education, $u_i$ would capture the impact of time-invariant individual characteristics such as ability or family connections that affect earnings.

    A three-horse race:

    1. There are no individual effects, that is $\sigma_u^2 = 0$ or $u_i = 0$ for all i, in which case OLS will be most appropriate.

    2. There are individual effects and these are not correlated with the regressor x. In this case the random effects estimator will be most appropriate.

    3. There are individual effects and these are correlated with the regressor x. In this case, the fixed effects estimator is the only appropriate method, and the OLS and random effects estimators should not be used.

    1. Introduction

  • Slide 10 of 48 Lecture 1: Static panel data modelling L14025 Applied Microeconometrics

    1. Introduction

  • Slide 11 of 48 Lecture 1: Static panel data modelling L14025 Applied Microeconometrics

    2.

    The example dataset

  • Slide 12 of 48 Lecture 1: Static panel data modelling L14025 Applied Microeconometrics

    2. Example dataset

    To fix ideas, consider the research question "Does advertisement work?"

    A management consultant was asked to write a report on whether advertisement expenditure leads to statistically and economically significant improvements in company profitability.

    The consultant decided to use an econometric analysis based on a panel dataset of 250 companies over the period 2003-2006, and collected the following variables for this purpose:

    1. Log of the company's profitability (profits), which is before-tax profits divided by sales.

    2. Log of the company's market share (mkshare) in its industry (expected to have a positive effect on profitability).

    3. Log of an index of industry competition (expected to have a negative effect on profits).

    4. Log of advertisement (advert) expenditure divided by sales (expected effect: ?).

    5. The panel unit identifier is company and the time identifier is year.

  • Slide 13 of 48 Lecture 1: Static panel data modelling L14025 Applied Microeconometrics

    A screenshot of the dataset (advert.dta)

    2. Example dataset

  • Slide 14 of 48 Lecture 1: Static panel data modelling L14025 Applied Microeconometrics

    Start by declaring which variable identifies panel units (company) and which one indicates time (year):

    We can see that the panel data span the period 2003 to 2006, and it is a balanced panel. That is, all companies were observed over the whole period.

    In an unbalanced panel, different panel units have different numbers of observations.

    - e.g. some companies might be observed from 2003 to 2006, while others are observed for 2004 and 2005 only.

    2. Example dataset
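
    The balanced versus unbalanced distinction can be checked mechanically. The following Python sketch is only an illustration: describe_panel and the toy company labels A, B and C are hypothetical, and the lecture itself relies on the statistical package's panel declaration output for this information.

```python
from collections import defaultdict


def describe_panel(rows):
    """rows: iterable of (unit, period) pairs.

    Returns (number of periods observed per unit, whether the panel is balanced).
    """
    periods = defaultdict(set)
    for unit, period in rows:
        periods[unit].add(period)
    all_periods = set().union(*periods.values())
    # Balanced: every unit is observed in every period that appears in the data.
    balanced = all(p == all_periods for p in periods.values())
    return {unit: len(p) for unit, p in periods.items()}, balanced


# Two companies observed in every year 2003-2006: balanced.
balanced_rows = [(c, y) for c in ("A", "B") for y in range(2003, 2007)]
# A third company observed in 2004 and 2005 only: unbalanced.
unbalanced_rows = balanced_rows + [("C", 2004), ("C", 2005)]

print(describe_panel(balanced_rows))
print(describe_panel(unbalanced_rows))
```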

  • Slide 15 of 48 Lecture 1: Static panel data modelling L14025 Applied Microeconometrics

    Summary statistics by year

    2. Example dataset

  • Slide 16 of 48 Lecture 1: Static panel data modelling L14025 Applied Microeconometrics

    Take note of the within-company versus between-company variability of the variables.

    2. Example dataset
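
    To make the within versus between distinction concrete, here is a small Python sketch that splits the variation of one variable into a between-unit part (variation of the unit means) and a within-unit part (variation around each unit's own mean). The function name within_between and the toy numbers are assumptions for illustration, and the n-1 denominators are one common convention rather than necessarily what the lecture's software reports.

```python
from statistics import mean, stdev


def within_between(panel):
    """panel: dict mapping unit -> list of values over time.

    Returns (overall_sd, between_sd, within_sd).
    """
    all_values = [v for series in panel.values() for v in series]
    unit_means = {unit: mean(series) for unit, series in panel.items()}
    # Within variation: deviations of each observation from its own unit mean.
    deviations = [v - unit_means[unit]
                  for unit, series in panel.items() for v in series]
    overall_sd = stdev(all_values)
    between_sd = stdev(list(unit_means.values()))
    within_sd = stdev(deviations)
    return overall_sd, between_sd, within_sd


# Units differ in level but never change over time: all variation is between.
only_between = {"A": [1.0, 1.0, 1.0], "B": [5.0, 5.0, 5.0]}
# Units are identical but move over time: all variation is within.
only_within = {"A": [1.0, 2.0, 3.0], "B": [1.0, 2.0, 3.0]}

print(within_between(only_between))
print(within_between(only_within))
```

    A variable with almost no within variation is poorly suited to estimators that rely on changes over time within each unit.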

  • Slide 17 of 48 Lecture 1: Static panel data modelling L14025 Applied Microeconometrics

    3.

    Time effects

  • Slide 18 of 48 Lecture 1: Static panel data modelling L14025 Applied Microeconometrics

    3. Time effects

    If we assume that all companies have the same profitability

    function, and this is stable over time, we can use OLS on the

    panel data. This is called the pooled regression.

    It seems that there is a negative albeit statistically insignificant relationship between advertisement and profitability.

  • Slide 19 of 48 Lecture 1: Static panel data modelling L14025 Applied Microeconometrics

    3. Time effects

    The pooled regression model is simple to estimate as it does not require the use of any special techniques. But it does not make full use of the richness of the panel data.

    Is the profit function really stable over the time period 2003-2006?

    Do all of the companies really have the same profit function?

    The pooled model ignores company heterogeneity and time differences, and this might lead to wrong conclusions. It is therefore advisable to check whether pooling is appropriate.

    One possibility is to test for time effects. For example, business cycle effects might be important in determining the overall level of company profitability. We can explore this possibility by re-estimating the model with year dummies.

  • Slide 20 of 48 Lecture 1: Static panel data modelling L14025 Applied Microeconometrics

    3. Time effects

    Testing for time effects involves three simple steps:

    1. Add time dummies to the pooled regression model.

       -- In our case, we have yearly data (2003-2006).

       -- So we include 3 year dummies (say from 2004 to 2006) to avoid the dummy variable trap.

    2. Estimate the model using OLS.

    3. Test if the time dummies are jointly equal to zero.

    If we reject the null hypothesis that the time dummies are equal to zero, we conclude that there are time effects in the data and the pooled regression model is not appropriate.

    Next we demonstrate how to implement these steps in practice.
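
    The three steps above can be sketched in Python as a standard F test comparing the pooled model (restricted) with the model that adds year dummies (unrestricted). Everything here is illustrative: the data are simulated with made-up year shifts, ols_rss and time_effects_f are hypothetical helper names, and the small normal-equations solver stands in for a proper regression routine.

```python
import random


def ols_rss(X, y):
    """Residual sum of squares from OLS of y on the columns of X."""
    k = len(X[0])
    # Augmented normal equations [X'X | X'y], solved by Gauss-Jordan elimination.
    A = [[sum(row[a] * row[b] for row in X) for b in range(k)]
         + [sum(row[a] * yi for row, yi in zip(X, y))] for a in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [v / A[col][col] for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[col])]
    beta = [A[a][k] for a in range(k)]
    return sum((yi - sum(b * v for b, v in zip(beta, row))) ** 2
               for row, yi in zip(X, y))


def time_effects_f(data, years):
    """data: list of (year, x, y). F statistic for H0: all year dummies are zero."""
    dummy_years = years[1:]  # omit the first year to avoid the dummy variable trap
    y = [row[2] for row in data]
    X_restricted = [[1.0, row[1]] for row in data]
    X_unrestricted = [[1.0, row[1]]
                      + [1.0 if row[0] == yr else 0.0 for yr in dummy_years]
                      for row in data]
    rss_r = ols_rss(X_restricted, y)
    rss_u = ols_rss(X_unrestricted, y)
    q = len(dummy_years)                       # number of restrictions
    df = len(data) - len(X_unrestricted[0])    # residual degrees of freedom
    return ((rss_r - rss_u) / q) / (rss_u / df)


random.seed(0)
years = [2003, 2004, 2005, 2006]
shift = {2003: 0.0, 2004: 1.0, 2005: 2.0, 2006: 3.0}  # made-up year effects
sample = []
for _ in range(250):
    for yr in years:
        x = random.gauss(0, 1)
        sample.append((yr, x, 0.5 * x + shift[yr] + random.gauss(0, 1)))

f_stat = time_effects_f(sample, years)
print("F statistic for the 3 year dummies:", round(f_stat, 2))
```

    Because the simulated data contain strong year shifts, the statistic comes out far above the 5% critical value of an F distribution with 3 numerator degrees of freedom (roughly 2.6 in large samples), so the null of no time effects would be rejected here.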

  • Slide 21 of 48 Lecture 1: Static panel data modelling L14025 Applied Microeconometrics

    3. Time effects

  • Slide 22 of 48 Lecture 1: Static panel data modelling L14025 Applied Microeconometrics

    3. Time effects

    Testing the joint significance of the time effects:

    No evidence that time (year) effects are significant. This means that the average level of profits (conditional on the regressors) has not fluctuated much during the sample period.

  • Slide 23 of 48 Lecture 1: Static panel data modelling L14025 Applied Microeconometrics

    3. Time effects

    In the previous regression, the effect of advertisement on profits is assumed to be stable across the years. But it would be useful to explore whether the profits-advertisement relationship has changed over time by interacting the year dummies with advertisement.

  • Slide 24 of 48 Lecture 1: Static panel data modelling L14025 Applied Microeconometrics

    4.

    Robust standard errors

  • Slide 25 of 48 Lecture 1: Static panel data modelling L14025 Applied Microeconometrics

    4. Robust standard errors

    Heteroscedasticity is prevalent in cross-sectional data and serial correlation is widespread in time series data.

    Since panel data is a combination of cross-sectional and time series data, both problems are likely to be present.

    There are many methods of dealing with these problems in panel data, some more complicated than others. Here we consider the simplest but most widely used method.

    This method gives standard errors of regression coefficients that are robust to heteroscedasticity and serial correlation.

    These robust standard errors can be used to test hypotheses about the model parameters and construct confidence intervals.

  • Slide 26 of 48 Lecture 1: Static panel data modelling L14025 Applied Microeconometrics

    4. Robust standard errors

    Serial correlation within each panel unit is sometimes referred to as clustering, and the robust standard errors are also known as clustered standard errors.

    Consider the following pooled panel data model:

    $$y_{it} = x_{it}'\beta + e_{it}, \quad i = 1, \dots, N; \; t = 1, \dots, T.$$

    Given independence over the panel units i, the method allows for heteroscedastic error terms and unrestricted serial correlation within panel units. That is, for all t and s,

    $$\operatorname{Cov}(e_{it}, e_{is}) \neq 0.$$

    Let $\hat{\beta}_{ols}$ and $\hat{e}_{it} = y_{it} - x_{it}'\hat{\beta}_{ols}$ denote the OLS estimator and the estimated residual term.

  • Slide 27 of 48 Lecture 1: Static panel data modelling L14025 Applied Microeconometrics

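    4. Robust standard errors

    Assuming finite T and $N \to \infty$, the panel-robust estimator of the asymptotic variance-covariance matrix is

    $$\hat{V}(\hat{\beta}_{ols}) = W^{-1} \hat{W} W^{-1}$$

    where

    $$W = \sum_{i=1}^{N} \sum_{t=1}^{T} x_{it} x_{it}' \quad \text{and} \quad \hat{W} = \sum_{i=1}^{N} \sum_{t=1}^{T} \sum_{s=1}^{T} x_{it} x_{is}' \hat{e}_{it} \hat{e}_{is}.$$

    Note that if we only wanted to correct for heteroscedasticity (that is, assuming serially uncorrelated errors), the matrix $\hat{W}$ would simplify to

    $$\hat{W} = \sum_{i=1}^{N} \sum_{t=1}^{T} x_{it} x_{it}' \hat{e}_{it}^2.$$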
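
    For a single regressor the sandwich formula reduces to scalars, which makes the difference between the two versions easy to see: the heteroscedasticity-only version squares each observation's contribution, while the panel-robust version first sums the contributions within each unit and then squares. The Python sketch below assumes a model with an intercept (absorbed by demeaning), uses made-up data and a hypothetical function name, and applies no finite-sample correction, so its numbers would differ slightly from what statistical packages report.

```python
def slope_and_robust_se(x, y, cluster):
    """OLS slope of y on x (with intercept) and two sandwich standard errors.

    Returns (slope, se_heteroscedasticity_robust, se_cluster_robust).
    """
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    xd = [xi - x_bar for xi in x]            # demeaning absorbs the intercept
    sxx = sum(v * v for v in xd)             # scalar analogue of sum of x x'
    slope = sum(v * (yi - y_bar) for v, yi in zip(xd, y)) / sxx
    resid = [(yi - y_bar) - slope * v for v, yi in zip(xd, y)]

    # Heteroscedasticity only: sum over observations of x^2 * e^2.
    meat_het = sum((v * e) ** 2 for v, e in zip(xd, resid))

    # Panel robust: add up x * e within each unit, then square the unit totals.
    # This equals the double sum over t and s of x_it x_is e_it e_is.
    totals = {}
    for v, e, g in zip(xd, resid, cluster):
        totals[g] = totals.get(g, 0.0) + v * e
    meat_clu = sum(s * s for s in totals.values())

    return slope, meat_het ** 0.5 / sxx, meat_clu ** 0.5 / sxx


# Made-up data: three units observed twice each.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1, 6.3]
unit = [1, 1, 2, 2, 3, 3]

b, se_het, se_clu = slope_and_robust_se(x, y, unit)
print("slope:", round(b, 4), "het SE:", round(se_het, 4), "cluster SE:", round(se_clu, 4))
```

    If every observation is treated as its own cluster, the two standard errors coincide, which mirrors the simplification of the matrix described above.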

  • Slide 28 of 48 Lecture 1: Static panel data modelling L14025 Applied Microeconometrics

    4. Robust standard errors

    Pooled model with panel robust standard errors.

  • Slide 29 of 48 Lecture 1: Static panel data modelling L14025 Applied Microeconometrics

    5.

    The Random Effects Model

  • Slide 30 of 48 Lecture 1: Static panel data modelling L14025 Applied Microeconometrics

    5. Random effects model

    Recall that:

    1. A panel data model is called a random effects model if $u_i$ is not correlated with the regressors.

    2. By contrast, a panel data model where the individual heterogeneity term is correlated with the regressors is referred to as the fixed effects or correlated effects model.

    3. The most important question in applied static panel data analysis is to determine which of the three contenders (pooled, random effects and fixed effects models) is the most appropriate for the data in hand.

  • Slide 31 of 48 Lecture 1: Static panel data modelling L14025 Applied Microeconometrics

    5. Random effects model

    Consider the following panel data model, in which the individual-specific effects $u_i$ are assumed to be uncorrelated with the regressor x:

    $$y_{it} = \beta_0 + \beta_1 x_{it} + u_i + e_{it}$$

    $$u_i \sim (0, \sigma_u^2) \quad \text{and} \quad e_{it} \sim (0, \sigma_e^2)$$

    $$i = 1, \dots, N; \; t = 1, \dots, T.$$

    It can be shown that the model is best estimated by Generalised Least Squares (GLS). The model is called the random effects model and the GLS estimator is usually called the random effects estimator. To demonstrate the mechanics of the random effects estimator, define the time means of y and x as

    $$\bar{y}_i = \frac{\sum_{t=1}^{T} y_{it}}{T} \quad \text{and} \quad \bar{x}_i = \frac{\sum_{t=1}^{T} x_{it}}{T}.$$

  • Slide 32 of 48 Lecture 1: Static panel data modelling L14025 Applied Microeconometrics

    5. Random effects model

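    The random effects GLS estimator is equivalent to estimating the following transformed model by OLS:

    $$(y_{it} - \lambda \bar{y}_i) = \beta_0 (1 - \lambda) + \beta_1 (x_{it} - \lambda \bar{x}_i) + v_{it}$$

    where

    $$\lambda = 1 - \sqrt{\frac{\sigma_e^2}{\sigma_e^2 + T \sigma_u^2}}$$

    $$v_{it} = (1 - \lambda) u_i + (e_{it} - \lambda \bar{e}_i).$$

    The above transformation is sometimes called the GLS transformation.

    In an unbalanced panel with $i = 1, \dots, N$; $t = 1, \dots, T_i$, the above transformation factor will have to be individual specific, i.e.

    $$\lambda_i = 1 - \sqrt{\frac{\sigma_e^2}{\sigma_e^2 + T_i \sigma_u^2}}.$$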
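
    A short Python sketch of the GLS (quasi-demeaning) transformation may help fix ideas. It assumes the variance components are known, whereas in practice they are estimated first; the function names gls_lambda and quasi_demean and the numbers are invented for this illustration.

```python
from math import sqrt


def gls_lambda(sigma2_e, sigma2_u, T):
    """Transformation factor: 1 - sqrt(sigma_e^2 / (sigma_e^2 + T * sigma_u^2))."""
    return 1.0 - sqrt(sigma2_e / (sigma2_e + T * sigma2_u))


def quasi_demean(series, lam):
    """Subtract lam times the unit's time mean from each observation."""
    unit_mean = sum(series) / len(series)
    return [v - lam * unit_mean for v in series]


y_i = [1.0, 2.0, 3.0]  # one unit observed over T = 3 periods (made-up values)

# No individual heterogeneity: the factor is 0 and the data are left untouched.
lam_pooled = gls_lambda(sigma2_e=1.0, sigma2_u=0.0, T=3)
# Equal variance components with T = 3: the factor is 1 - sqrt(1/4) = 0.5.
lam_half = gls_lambda(sigma2_e=1.0, sigma2_u=1.0, T=3)

print(lam_pooled, quasi_demean(y_i, lam_pooled))
print(lam_half, quasi_demean(y_i, lam_half))
print(1.0, quasi_demean(y_i, 1.0))  # a factor of 1 gives full demeaning
```

    The two limiting cases show why the random effects estimator sits between the other two contenders: a factor of 0 reproduces pooled OLS, and a factor of 1 subtracts the whole unit mean from each observation.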

  • Slide 33 of 48 Lecture 1: Static panel data modelling L14025 Applied Microeconometrics

    5. Random effects model

    Estimating the random effects profitability model

    Advertisement doesn't seem to work?

  • Slide 34 of 48 Lecture 1: Static panel data modelling L14025 Applied Microeconometrics

    5. Random effects model

    Testing for individual heterogeneity in random effects models

    When there is individual heterogeneity, the random effects model is more efficient than the pooled model.

    But if there is no heterogeneity in the panel data, it is better to use the pooled model (apply OLS). So it is advisable to test for the presence of heterogeneity.

    The Breusch-Pagan test can be used for this purpose. The null hypothesis of this test states that there is no heterogeneity.

    Rejection of the null hypothesis can be taken as evidence in favour of the random effects model.

  • Slide 35 of 48 Lecture 1: Static panel data modelling L14025 Applied Microeconometrics

    5. Random effects model

    How to choose between the random effects model (REM) and the pooled regression? Breusch and Pagan (1980) devised a Lagrange Multiplier (LM) test for the REM against the pooled regression based on the OLS residuals.

    The hypotheses are:

    H0: $\sigma_u^2 = 0$ (pooled regression is more appropriate)

    Ha: $\sigma_u^2 \neq 0$ (REM is more appropriate)

    The LM test statistic is based on the OLS (restricted model) residuals and follows a Chi-Square distribution with 1 degree of freedom:

    $$LM = \frac{NT}{2(T-1)} \left[ \frac{\sum_{i=1}^{N} \left( \sum_{t=1}^{T} \hat{e}_{it} \right)^2}{\sum_{i=1}^{N} \sum_{t=1}^{T} \hat{e}_{it}^2} - 1 \right]^2$$

    N is the number of cross sections.

    T is the number of time periods.
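
    The LM statistic is simple enough to compute by hand from the pooled OLS residuals, as the following Python sketch shows for a balanced panel. The function name breusch_pagan_lm and the residual values are made up; the residuals are chosen so that each unit's residuals share a sign, which is the pattern individual effects would produce.

```python
def breusch_pagan_lm(resid):
    """Breusch-Pagan LM statistic for a balanced panel.

    resid: list of N lists, each holding the T pooled-OLS residuals of one unit.
    """
    N, T = len(resid), len(resid[0])
    # Numerator: squared sum of each unit's residuals, added over units.
    num = sum(sum(e_i) ** 2 for e_i in resid)
    # Denominator: sum of all squared residuals.
    den = sum(e * e for e_i in resid for e in e_i)
    return N * T / (2.0 * (T - 1)) * (num / den - 1.0) ** 2


# Made-up residuals: unit 1 always above the line, unit 2 always below it.
persistent = [[1.0, 1.0, 1.0, 1.0], [-1.0, -1.0, -1.0, -1.0]]
lm = breusch_pagan_lm(persistent)
print("LM =", round(lm, 3))
```

    With these toy residuals the statistic is about 12, well above the 5% critical value of 3.84 for a Chi-Square distribution with 1 degree of freedom, so the null of no heterogeneity would be rejected.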

  • Slide 36 of 48 Lecture 1: Static panel data modelling L14025 Applied Microeconometrics

    5. Random effects model

    Testing for individual heterogeneity in random effects models.

    In practice, use the following command right after estimating the

    random effects model

    P-value = 0, so reject the null hypothesis of no random effects. OLS would have been inefficient.

  • Slide 37 of 48 Lecture 1: Static panel data modelling L14025 Applied Microeconometrics

    6.

    The Fixed Effects Model

  • Slide 38 of 48 Lecture 1: Static panel data modelling L14025 Applied Microeconometrics

    6. Fixed effects model

    We saw that the random effects model is preferable to the pooled model if there is individual heterogeneity in the panel data.

    But recall that the random effects model assumes that the individual heterogeneity term is not correlated with the regressors of the model.

    If this assumption is not correct, and the regressors and $u_i$ are indeed correlated, the random effects model would be inappropriate. The fixed effects model, which allows for correlation between the regressors and the heterogeneity term, should be used instead.

    The fixed effects model is sometimes referred to as the Least Squares Dummy Variables model.

  • Slide 39 of 48 Lecture 1: Static panel data modelling L14025 Applied Microeconometrics

    6. Fixed effects model

    Consider our panel data model

    $$y_{it} = \beta_0 + \beta_1 x_{it} + u_i + e_{it}$$

    But now assume that $u_i$ is correlated with x.

    Because of this correlation, OLS will be biased and inconsistent. For this reason, we first eliminate the individual effects through the so-called within transformation of the data and then estimate the transformed model using OLS:

    $$(y_{it} - \bar{y}_i) = \beta_1 (x_{it} - \bar{x}_i) + (e_{it} - \bar{e}_i)$$

    The resulting estimator is called the within estimator, and it is unbiased and consistent.
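
    A small simulation can illustrate why the within transformation matters. In the Python sketch below the regressor is built to be correlated with the individual effect, the true slope is set to 1, and the pooled and within estimates are compared. The data-generating process, the seed and the helper names are all assumptions made for this illustration, not part of the lecture's empirical example.

```python
import random


def slope(x, y):
    """OLS slope of y on x with an intercept."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    return sxy / sxx


def demean_by_unit(values, T):
    """Within transformation: subtract each unit's time mean.

    values are ordered unit by unit, with T consecutive periods per unit.
    """
    out = []
    for start in range(0, len(values), T):
        block = values[start:start + T]
        unit_mean = sum(block) / T
        out.extend(v - unit_mean for v in block)
    return out


random.seed(1)
N, T, beta1 = 1000, 4, 1.0
x, y = [], []
for _ in range(N):
    u_i = random.gauss(0, 1)                 # individual effect
    for _ in range(T):
        x_it = u_i + random.gauss(0, 1)      # regressor correlated with u_i
        x.append(x_it)
        y.append(0.5 + beta1 * x_it + u_i + random.gauss(0, 1))

b_pooled = slope(x, y)
b_within = slope(demean_by_unit(x, T), demean_by_unit(y, T))
print("true slope:", beta1)
print("pooled OLS:", round(b_pooled, 3))
print("within:    ", round(b_within, 3))
```

    Under this design the pooled slope is pulled towards 1.5, because the omitted effect is picked up by the regressor, while the within estimate stays close to the true value of 1.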

  • Slide 40 of 48 Lecture 1: Static panel data modelling L14025 Applied Microeconometrics

    6. Fixed effects model

    An alternative to the within transformation for dealing with regressor-individual effects correlation is the first-differenced transformation:

    $$(y_{it} - y_{i,t-1}) = \beta_1 (x_{it} - x_{i,t-1}) + (e_{it} - e_{i,t-1})$$

    When T = 2, the first-differenced and the within estimators are algebraically equivalent.

    However, a drawback of the within and first-differenced transformations is that they also eliminate all variables that are time-invariant. For example, if the profitability regression model includes the gender of the manager as an explanatory variable, this variable will drop out of the transformed model.
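
    The T = 2 equivalence can be verified numerically. The Python sketch below uses a made-up three-unit panel and estimates the slope once from first differences and once from within-demeaned data, in both cases without an intercept since the transformation has already removed it. The variable and function names are hypothetical.

```python
def slope_no_const(x, y):
    """OLS slope through the origin: sum(x * y) / sum(x^2)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


# Made-up panel: unit -> [(x, y) in period 1, (x, y) in period 2].
panel = {
    "A": [(1.0, 2.1), (2.0, 3.9)],
    "B": [(0.5, 4.0), (1.5, 5.2)],
    "C": [(3.0, 1.0), (2.0, 0.3)],
}

# First differences: one observation per unit.
dx = [obs[1][0] - obs[0][0] for obs in panel.values()]
dy = [obs[1][1] - obs[0][1] for obs in panel.values()]
b_fd = slope_no_const(dx, dy)

# Within transformation: two observations per unit, each minus the unit mean.
wx, wy = [], []
for obs in panel.values():
    x_bar = (obs[0][0] + obs[1][0]) / 2
    y_bar = (obs[0][1] + obs[1][1]) / 2
    for x_it, y_it in obs:
        wx.append(x_it - x_bar)
        wy.append(y_it - y_bar)
b_within = slope_no_const(wx, wy)

print("first-differenced:", b_fd)
print("within:           ", b_within)
```

    With two periods each demeaned value is plus or minus half the first difference, so the factors of one half cancel in the slope formula and the two estimates agree.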

  • Slide 41 of 48 Lecture 1: Static panel data modelling L14025 Applied Microeconometrics

    6. Fixed effects model

    Estimating the fixed effects model

    Advertisement seems to work!

    The fixed effects estimator is also called the least squares dummy variables estimator. Here 249 company dummies are (implicitly) used, and the F-test shows that these are jointly significant: further evidence that there are individual-specific effects.

    For the time being, we are ignoring the possibility of non-i.i.d. errors.

  • Slide 42 of 48 Lecture 1: Static panel data modelling L14025 Applied Microeconometrics

    6. Fixed effects model

    Fixed or random effects model?

    The choice between the fixed and random effects models is an important issue in applied panel data analysis.

    If the individual effects (heterogeneity) and the regressors are uncorrelated, use the random effects model as it is the most efficient (although the fixed effects model is still useful, since it remains consistent).

    If the regressors and the individual effects are correlated, choose the fixed effects model and never use the random effects model.

    The test used to choose between the two models is known as the Hausman test.

    The null hypothesis of this test states that there is no correlation between regressors and individual effects. So rejection of the null favours the fixed effects model.

  • Slide 43 of 48 Lecture 1: Static panel data modelling L14025 Applied Microeconometrics

    6. Fixed effects model

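    The Hausman test

    H0: Regressors and effects (heterogeneity) are not correlated.

    H1: They are correlated.

    Under H0 the Hausman test statistic is distributed as a Chi-Square random variable with degrees of freedom equal to the number of regressors.

    The formula of the Hausman test statistic is

    $$H = (\hat{\beta}_{FEM} - \hat{\beta}_{REM})' \left[ \hat{V}(\hat{\beta}_{FEM}) - \hat{V}(\hat{\beta}_{REM}) \right]^{-1} (\hat{\beta}_{FEM} - \hat{\beta}_{REM})$$

    where the FEM and REM indices are used to denote the fixed and random effects estimators respectively, and $\hat{V}(\cdot)$ denotes the estimated variance-covariance matrix of the corresponding estimator.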
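
    With a single regressor the Hausman statistic is commonly written as the squared difference between the fixed and random effects estimates divided by the difference of their estimated variances. The Python sketch below computes that scalar version; the function name hausman_scalar and the input numbers are invented for illustration and are not taken from the lecture's results.

```python
def hausman_scalar(b_fe, b_re, var_fe, var_re):
    """Hausman statistic for one regressor.

    b_fe, b_re: fixed and random effects estimates of the same coefficient.
    var_fe, var_re: their estimated variances (squared standard errors).
    """
    diff_var = var_fe - var_re
    if diff_var <= 0:
        # Can happen in finite samples; the statistic is then not usable as is.
        raise ValueError("Var(FE) - Var(RE) must be positive")
    return (b_fe - b_re) ** 2 / diff_var


# Made-up estimates: FE slope 0.5 (variance 0.02), RE slope 0.2 (variance 0.01).
h = hausman_scalar(b_fe=0.5, b_re=0.2, var_fe=0.02, var_re=0.01)
print("Hausman statistic:", round(h, 3))
```

    With these made-up inputs the statistic is about 9, which exceeds the 5% critical value of 3.84 for a Chi-Square distribution with 1 degree of freedom, so the null of uncorrelated effects would be rejected in favour of the fixed effects model.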

  • Slide 44 of 48 Lecture 1: Static panel data modelling L14025 Applied Microeconometrics

    6. Fixed effects model

    Going back to our empirical example, results from the random effects (RE) estimator appear to suggest that there is no relationship between advertisement and profitability. By contrast, the fixed effects (FE) estimator would appear to suggest that advertisement works. Which model should we trust more? Enter the Hausman test!

    Reject the null hypothesis that the RE model is best. Discard the RE results and base your analysis on the FE model.

  • Slide 45 of 48 Lecture 1: Static panel data modelling L14025 Applied Microeconometrics

    6. Fixed effects model

    Micro panel data are typically characterised by heteroscedasticity and within-unit serial correlation. So it is advisable to estimate the random and fixed effects models using robust standard errors.

    One important practical implication of doing so is that the standard Hausman test does not work with non-i.i.d. errors (heteroscedastic and serially correlated), for instance when $e_{it} \sim (0, \sigma_i^2)$.

    Instead a robust version of the Hausman test should be used. This involves estimating the following model by OLS with robust standard errors:

    $$(y_{it} - \lambda \bar{y}_i) = \beta_0 (1 - \lambda) + \beta_1 (x_{it} - \lambda \bar{x}_i) + \beta_2 (x_{it} - \bar{x}_i) + v_{it}$$

    where $\lambda$ is as defined on Slide 32.

    Testing the null hypothesis $\beta_2 = 0$ is then equivalent to testing the null hypothesis that the individual effects are not correlated with x.

    One way to practically implement this approach is to use the user-written Stata program xtoverid (you should be able to install this on your machine by typing ssc install xtoverid, replace from within Stata).

  • Slide 46 of 48 Lecture 1: Static panel data modelling L14025 Applied Microeconometrics

    6. Fixed effects model

    Robust version of Hausman test using xtoverid command, which in this case is given by the Sargan-Hansen statistic

    Thus reject the RE model in favour of the FE model.

  • Slide 47 of 48 Lecture 1: Static panel data modelling L14025 Applied Microeconometrics

    7.

    Summary

  • Slide 48 of 48 Lecture 1: Static panel data modelling L14025 Applied Microeconometrics

    7. Summary

    The estimation of static linear panel data models boils down to the choice between three estimators:

    1. The pooled model should be used when there is no individual heterogeneity in the model.

    2. When there is individual heterogeneity and it is not correlated with the independent variables of the model, the random effects model should be preferred.

    3. The Hausman test helps us decide whether this is the case or not. If the individual heterogeneity is correlated with the independent variables, the fixed effects model should be used.

    THANK YOU!