Introduction to Panel Data Regression Using EViews and Stata

Hamrit Mouhcene

University of Khenchela, Algeria

phone +213778080398

Panel data combine observations that vary across both time and cross-section units. This paper
describes the techniques used with such data, namely pooled regression, the fixed effects model
and the random effects model, starting from the following equation:

$y_{it} = \alpha + \beta x_{it} + u_{it}$

The simplest way to deal with such data is to estimate a pooled regression: a single equation is
estimated on all the data, with the cross-sectional and time-series observations stacked in one
column, usually by OLS. However, this method has several limitations. It ignores the common
variation and structure that exist in the series over time and across cross-section units, and
it ignores unobserved heterogeneity. In these circumstances the error term may be correlated with
some of the regressors, so the estimated coefficients will be biased as well as inconsistent.

- The first difference estimator

We can illustrate this model by the following equation:

$y_{it} = b_0 + b_1 x_{it} + v_t + j_i + u_{it}$

where the dependent variable varies across time $t$ and cross-section units $i$, as does the
regressor. $v_t$ is a variable that varies over time but is constant across cross-section units,
for example a time trend in a price that is common to all entities. $j_i$ captures all variables
that affect the dependent variable but are constant over time; these may include gender,
geography, education and so on.

We can use dummy variables to capture the effect of $v_t$ in each period, but we must pay
attention to the problem of perfect multicollinearity that emerges when the number of dummies
equals the number of periods (the dummy variable trap); one period is therefore omitted. The
equation above becomes

$y_{it} = b_0 + b_1 x_{it} + \alpha_1 d2_t + \alpha_2 d3_t + \dots + \alpha_{T-1} dT_t + j_i + u_{it}$

Denote $j_i + u_{it}$ by $\delta_{it}$. Estimating this equation by OLS would yield estimates
that are both biased and inconsistent: even if we assume that $u_{it}$ is uncorrelated with the
regressor, we cannot assume that $j_i$ is uncorrelated with one or more of the regressors.

$cov(\delta_{it}, x_{it}) = cov(j_i + u_{it}, x_{it}) = cov(j_i, x_{it}) \neq 0$

In this case we can use the first-difference estimator: rather than looking at the levels of the
variables, we take first differences as follows:

$\Delta y_{it} = y_{it} - y_{it-1} = b_1 \Delta x_{it} + \alpha_1 \Delta d2_t + \dots + (j_i - j_i) + \Delta u_{it}$

Taking first differences removes the unobserved heterogeneity, because $j_i - j_i = 0$. If in
addition $cov(\Delta x_{it}, \Delta u_{it}) = 0$, and assuming no heteroscedasticity or serial
correlation in the differenced errors, the first-difference estimator is consistent.
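The argument above can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the data-generating process, the parameter values and the helper names `simulate_panel` and `ols_slope` are hypothetical and do not come from the paper. The regressor is built so that $cov(j_i, x_{it}) \neq 0$; pooled OLS then overstates the slope, while the first-difference estimator recovers it.

```python
import numpy as np

def simulate_panel(n_units=500, n_periods=5, b1=2.0, seed=0):
    """Simulate y_it = 1 + b1*x_it + j_i + u_it with cov(j_i, x_it) != 0."""
    rng = np.random.default_rng(seed)
    j = rng.normal(size=(n_units, 1))               # unobserved heterogeneity j_i
    x = j + rng.normal(size=(n_units, n_periods))   # regressor correlated with j_i
    u = rng.normal(size=(n_units, n_periods))       # idiosyncratic error u_it
    y = 1.0 + b1 * x + j + u
    return x, y

def ols_slope(x, y):
    """Slope of a simple regression of y on x (intercept included)."""
    x = np.ravel(x)
    y = np.ravel(y)
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))

x, y = simulate_panel()

# Pooled OLS on levels: the composite error j_i + u_it is correlated with x_it.
pooled_slope = ols_slope(x, y)

# First differences along the time axis: j_i drops out of the equation.
fd_slope = ols_slope(np.diff(x, axis=1), np.diff(y, axis=1))

print(f"true slope = 2.0, pooled OLS = {pooled_slope:.3f}, first difference = {fd_slope:.3f}")
```

In this design the pooled slope converges to $2 + cov(j_i, x_{it})/var(x_{it}) = 2.5$ rather than 2, which is the bias described in the text.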

Taking first differences removes much of the variation in the explanatory variables, over time
as well as across cross-section units. This can cause problems for inference, because the
standard errors may be large even though there is significant variation in levels. Another
drawback of this method is that the effect of time-invariant factors cannot be estimated,
because they are removed together with the unobserved heterogeneity.

Another method of dealing with unobserved heterogeneity is the Least Squares Dummy Variable
(LSDV) estimator, which includes a dummy variable for each cross-section unit, so the equation
looks like:

$y_{it} = b_1 x_{it} + \alpha_1 d2_i + \alpha_2 d3_i + \dots + u_{it}$

By including the dummy variables we allow each cross-section unit to have a different intercept
(a different value of the unobserved heterogeneity). If there is no endogeneity problem and the
errors are homoscedastic and serially uncorrelated, the parameter estimates are consistent and
exactly equivalent to the fixed effects estimates.
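The equivalence between LSDV and the fixed effects (within) estimator can be checked numerically. The sketch below is an illustration on made-up data, not the paper's dataset; it assumes one dummy per unit and no common intercept, and the variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n_units, n_periods = 6, 4
unit = np.repeat(np.arange(n_units), n_periods)   # unit index of each stacked row
alpha = rng.normal(size=n_units)                  # unit-specific intercepts
x = rng.normal(size=n_units * n_periods) + alpha[unit]
y = 0.5 * x + alpha[unit] + rng.normal(size=n_units * n_periods)

# LSDV: regress y on x plus one dummy per unit (no common intercept).
dummies = (unit[:, None] == np.arange(n_units)).astype(float)
X_lsdv = np.column_stack([x, dummies])
coef_lsdv, *_ = np.linalg.lstsq(X_lsdv, y, rcond=None)
b_lsdv = coef_lsdv[0]

def demean(v):
    """Subtract each unit's time mean from its observations."""
    means = np.bincount(unit, weights=v) / np.bincount(unit)
    return v - means[unit]

# Within estimator: regress demeaned y on demeaned x, no intercept.
x_w, y_w = demean(x), demean(y)
b_within = (x_w @ y_w) / (x_w @ x_w)

print(b_lsdv, b_within)  # the two slopes agree up to floating-point error
```

The dummy coefficients in `coef_lsdv[1:]` are the estimated unit intercepts, which is what the within transformation sweeps out.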

One advantage of LSDV is that it provides estimates of the individual effects, that is, of the
combined effect of the time-constant factors. However, if the sample is not very large,
introducing many dummies leaves few degrees of freedom for meaningful statistical analysis,
because every additional dummy costs one degree of freedom.

- The difference between fixed effect and first difference:

Suppose that we have:

1) Strict exogeneity (the errors are uncorrelated not only with the explanatory variable in the
current period but also with its values in all other periods);

2) A random sample in the cross-section;

3) Some variation in the variables.

In this case both the first-difference and the fixed effects estimators are unbiased and
consistent, so we need another criterion to compare them: efficiency, which depends on whether
or not the errors in the original model are serially uncorrelated.

𝑦𝑖𝑡 = 𝑏1𝑥𝑖𝑡 + 𝛼𝑖 + 𝑢𝑖𝑡

Assume that the idiosyncratic errors are serially uncorrelated (not correlated across time);
mathematically, $cov(u_{it}, u_{it-1}) = 0$. We do not have to worry about correlation across
cross-section units, because we have assumed random sampling in the cross-section.

The first difference model will be:

∆𝑦𝑖𝑡 = 𝑏1∆𝑥𝑖𝑡 + ∆𝑢𝑖𝑡

For the first-difference estimator to be efficient we require $\Delta u_{it}$ to be serially
uncorrelated, but under the assumption above

$cov(\Delta u_{it}, \Delta u_{it-1}) = cov(u_{it} - u_{it-1}, u_{it-1} - u_{it-2}) = -var(u_{it-1}) \neq 0$

In this case the standard errors of the first-difference estimator are larger than those of the
fixed effects estimator, so fixed effects is preferred to first differences.
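The covariance of the differenced errors can be expanded term by term. The short derivation below assumes, in addition to serial uncorrelatedness, that $u_{it}$ has a constant variance, written here as $\sigma_u^2$ (a symbol introduced for this illustration):

```latex
\begin{align*}
cov(\Delta u_{it}, \Delta u_{it-1})
  &= cov(u_{it} - u_{it-1},\; u_{it-1} - u_{it-2}) \\
  &= cov(u_{it}, u_{it-1}) - cov(u_{it}, u_{it-2})
     - var(u_{it-1}) + cov(u_{it-1}, u_{it-2}) \\
  &= 0 - 0 - \sigma_u^2 + 0 = -\sigma_u^2 , \\
var(\Delta u_{it}) &= var(u_{it}) + var(u_{it-1}) = 2\sigma_u^2 , \\
corr(\Delta u_{it}, \Delta u_{it-1}) &= \frac{-\sigma_u^2}{2\sigma_u^2} = -\frac{1}{2} .
\end{align*}
```

Under these assumptions the differenced errors are therefore negatively autocorrelated with coefficient $-1/2$, which is why first differencing loses efficiency when the original errors are serially uncorrelated.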

If the idiosyncratic error follows a random walk, $u_{it} = u_{it-1} + \varepsilon_{it}$, where
$\varepsilon_{it}$ is white noise, then $\Delta u_{it} = \varepsilon_{it}$ is serially
uncorrelated. It is then better to use first differences, because the standard errors of the
parameters will be smaller than those obtained by fixed effects.

If the errors in our original model follow an AR(1) process,
$u_{it} = \rho u_{it-1} + \varepsilon_{it}$, it is hard to rank fixed effects and first
differences; the answer depends on the parameter $\rho$:

- If $\rho$ is close to 1, it is better to use first differences.

- If $\rho$ is close to 0, it is better to use fixed effects.

How do we know whether the errors in the original model are serially correlated?

We compute $cov(\Delta u_{it}, \Delta u_{it-1})$ from the first-difference residuals. If it is
different from 0, this indicates that the errors in the original model may be serially
uncorrelated, and it is better to use fixed effects. If it is close to 0, we should use first
differences.

If $T > N$, that is, the number of time periods is much greater than the number of cross-section
units, the fixed effects estimator is quite sensitive to any violation of the assumptions
required for it to be unbiased and consistent. For instance, if the model includes a
non-stationary variable and we use fixed effects, we could be regressing one non-stationary
variable on another, which is likely to produce a spurious regression. In this case it is much
better to use the first-difference estimator.

On the other hand, when $T$ is very large the fixed effects estimator is less sensitive to
violations of strict exogeneity than the first-difference estimator.

- Random effect model

Suppose that $cov(j_i, x_{it}) = 0$, either because we have controlled for all the relevant
time-constant factors by including them in the equation or because the effect of the unobserved
heterogeneity is very small.

If this assumption is true, we could simply use pooled OLS: the unobserved heterogeneity is
uncorrelated with the regressors, so the pooled OLS estimates are consistent, even though pooled
OLS still has some problems.

$y_{it} = b_0 + b_1 x_{it} + v_t + j_i + u_{it}$

If we denote the composite error term by $\delta_{it} = j_i + u_{it}$, then for $t \neq s$:

$cov(\delta_{it}, \delta_{is}) = cov(j_i + u_{it}, j_i + u_{is}) = var(j_i) = \sigma_j^2$

where we have assumed that there is no covariance between $j_i$, $u_{it}$ and $u_{is}$. This is
the covariance between two composite errors of the same cross-section unit at two different
points in time.

In this case, even though we assume no endogeneity in our original model
($cov(j_i, x_{it}) = 0$), the composite errors will be serially correlated if we estimate the
model by pooled OLS. We should therefore look for another method that corrects for this serial
correlation; it is known as Feasible Generalized Least Squares (FGLS), or the random effects
model.

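The random effects model works by transforming the original model: a fraction $\theta$ of the
individual time mean is subtracted from each variable (quasi-demeaning). We can write the model
as follows:

$y_{it} - \theta \bar{y}_i = b_0 - \theta b_0 + b_1 (x_{it} - \theta \bar{x}_i) + \delta_{it} - \theta \bar{\delta}_i$

$\theta$ depends on the variances of the two error components and on the number of time periods
$T$:

$\theta = 1 - \sqrt{\dfrac{\sigma_u^2}{\sigma_u^2 + T \sigma_j^2}}$

It is closely related to the correlation between two composite errors,
$corr(\delta_{it}, \delta_{is}) = \dfrac{\sigma_j^2}{\sigma_j^2 + \sigma_u^2}$: the larger this
correlation, the closer $\theta$ is to 1.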

Two cases of $\theta$ should be noted:

- If $\sigma_j^2 = 0$ then $\theta = 0$, so random effects is equivalent to pooled OLS.

- If $\sigma_j^2 \to \infty$ then $\theta \to 1$, so random effects is equivalent to fixed
effects. In that case we should use fixed effects rather than random effects.

To estimate $\theta$ we must first estimate $\sigma_j^2$ and $\sigma_u^2$, from either pooled
OLS or fixed effects; we then use the estimated value of $\theta$ to transform the model.

The assumptions required for random effects to be consistent are:

- $cov(j_i, x_{it}) = 0$: the unobserved heterogeneity is uncorrelated with the regressors.

- The cross-section units are randomly selected.

- $E(u_{it} \mid x_{is}, j_i) = 0$: strict exogeneity.

- No perfect collinearity.

If these assumptions are satisfied, the random effects estimators are consistent.

One of the major benefits of the random effects model is that it allows us to estimate the
effect on the dependent variable of variables that are constant over time. In our original model:

$y_{it} - \theta \bar{y}_i = b_0 - \theta b_0 + b_1 (x_{it} - \theta \bar{x}_i) + b_2 (geog_i - \theta\, geog_i) + \delta_{it} - \theta \bar{\delta}_i$

Here we include the variable geog, which is constant over time; $j_i$ (inside
$\delta_{it} = j_i + u_{it}$) now contains the remaining time-constant factors.

In general $\theta$ lies between 0 and 1, and as long as $\theta \neq 1$ we can estimate the
effect of such time-constant factors, because $geog_i - \theta\, geog_i = (1 - \theta)\, geog_i$
does not vanish.

From the discussion so far, in order to estimate the random effects model we have to assume
$cov(j_i, x_{it}) = 0$.

This assumption, along with the other assumptions discussed above, is sufficient for the random
effects estimates to be consistent and more efficient than the fixed effects estimates. However,
this assumption does not always hold. When it fails, the random effects estimator is
inconsistent, whereas the fixed effects estimator is consistent whether or not this covariance
is zero. Another difference is that random effects treats $j_i$ as random, whereas fixed effects
treats it as a fixed parameter to be estimated, essentially by including a dummy variable for
each cross-section unit.

There is a particular test that we can use to decide whether to use fixed effects or random
effects, known as the Hausman test.

Random effects essentially assumes that $cov(j_i, x_{it}) = 0$. If that is the case, both random
effects and fixed effects are consistent, but random effects is more efficient; if the
assumption is not true, only fixed effects is consistent.

On the basis of this statement we can look at the Hausman test. The test compares the difference
between the fixed effects and random effects estimates with the gain in efficiency obtained by
using random effects rather than fixed effects. For one explanatory variable the test statistic
is:

$h = \dfrac{(\hat{b}_{fe} - \hat{b}_{re})^2}{var(\hat{b}_{fe}) - var(\hat{b}_{re})} \rightarrow \chi^2_1$

Under the null hypothesis the statistic follows a chi-squared distribution with degrees of
freedom equal to the number of regressors being compared. What is the idea of the test?

The null hypothesis of the test is $cov(j_i, x_{it}) = 0$. If it is true, random effects should
be used; if it is false, fixed effects should be used. What is the intuition behind that?

If the null hypothesis is true, both fixed effects and random effects are consistent, so the
difference between them (the numerator) will be quite small. The variance of the fixed effects
estimator is greater than the variance of the random effects estimator, so the denominator is
positive and comparatively large, and the h statistic will be quite small (we do not reject the
null hypothesis).

If the null hypothesis is false, the numerator will be relatively large, because the point
estimates of fixed effects and random effects will differ: fixed effects is the only consistent
estimator. The statistic will then be far from zero, a value that would be quite unlikely if the
null hypothesis were true, so we reject the null hypothesis.

In these circumstances we can use time-demeaned variables. Define the time mean of the dependent
variable as

$\bar{y}_i = \dfrac{1}{T} \sum_{t=1}^{T} y_{it}$

and define the time means of all the explanatory variables in the same manner. Averaging the
original equation over time gives

$\bar{y}_i = b_0 + b_1 \bar{x}_i + j_i + \bar{u}_i$

Subtracting this equation from the original equation, we get

$y_{it} - \bar{y}_i = b_1 (x_{it} - \bar{x}_i) + (j_i - j_i) + u_{it} - \bar{u}_i$

so the problem of unobserved heterogeneity is removed by eliminating $j_i$. Compared with pooled
estimation, this method gives a consistent estimator provided the condition
$cov(x_{it}, u_{it}) = 0$, known as weak exogeneity, is satisfied. This assumption, along with
the Gauss-Markov assumptions, is sufficient to ensure that the fixed effects estimators are
consistent. They do, however, have a large variance: when variables are expressed as deviations
from their mean values, the variance of the mean-corrected values is much smaller than the
variance of the original values, so the variance of the disturbance term is relatively large,
leading to higher standard errors of the estimated coefficients.

Once we have cancelled the effect of the unobserved heterogeneity, how can we go on to estimate
it? We may need to know how these omitted variables affect the endogenous variable.

First, we estimate the demeaned model by OLS to get the estimated coefficient:

$\tilde{y}_{it} = \hat{b}_1 \tilde{x}_{it} + \tilde{u}_{it}$

where $\tilde{y}_{it} = y_{it} - \bar{y}_i$ is the time-demeaned variable. Then we use the
estimated coefficient to estimate $j_i$:

$\hat{j}_i = \bar{y}_i - \hat{b}_1 \bar{x}_i$

All the variables on the right-hand side are fixed over time, in harmony with the left-hand
side, which also does not vary across time. In this case $\hat{j}_i$ is an unbiased estimator
of the true parameter $j_i$; mathematically, $E(\hat{j}_i) = j_i$.
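The two steps above (within regression, then recovery of the individual effects) can be sketched as follows. The data are a small made-up panel with no idiosyncratic noise and no common intercept, chosen only so that the recovered values can be checked by eye; all names and numbers are hypothetical.

```python
import numpy as np

# Three units observed over four periods, stacked unit by unit.
unit = np.repeat(np.arange(3), 4)
x = np.array([1.0, 2.0, 4.0, 7.0,
              0.0, 3.0, 5.0, 6.0,
              2.0, 2.5, 6.0, 9.0])
j_true = np.array([10.0, -5.0, 3.0])   # individual effects j_i
y = 1.5 * x + j_true[unit]             # y_it = b1*x_it + j_i with b1 = 1.5

# Time means per unit.
counts = np.bincount(unit)
x_bar = np.bincount(unit, weights=x) / counts
y_bar = np.bincount(unit, weights=y) / counts

# Step 1: OLS on the time-demeaned variables gives b1_hat.
x_til = x - x_bar[unit]
y_til = y - y_bar[unit]
b1_hat = (x_til @ y_til) / (x_til @ x_til)

# Step 2: j_hat_i = y_bar_i - b1_hat * x_bar_i.
j_hat = y_bar - b1_hat * x_bar

print(b1_hat, j_hat)
```

If the model also contained a common intercept $b_0$, the quantity recovered in step 2 would be $b_0 + j_i$ rather than $j_i$ alone.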

The pooled OLS regression

The simplest way to deal with such data is to estimate a pooled regression, which involves
estimating a single equation on all the data together.

The panel in this example is a balanced panel, because the number of time observations is the
same for each individual (10). It is also called a short panel, because the number of
cross-section units (47) is greater than the number of time periods. Consider the following
function (the data are obtained from Gujarati, Econometrics by Example):

𝑐𝑖𝑡 = 𝑏1𝑖 + 𝑏2𝑎𝑔𝑒𝑖𝑡 + 𝑏3𝑖𝑛𝑐𝑜𝑚𝑒𝑖𝑡 + 𝑏4𝑝𝑟𝑖𝑐𝑒𝑖𝑡 + 𝑏5𝑚𝑠𝑖𝑡 + 𝑢𝑖𝑡

The variables are defined as follows:

- c (charity): the dependent variable, the sum of cash and other property contributions;

- income: adjusted gross income;

- price: one minus the marginal income tax rate;

- age: dummy variable equal to 1 if the taxpayer is over 64, 0 otherwise;

- ms: dummy variable equal to 1 if the taxpayer is married, 0 otherwise.

Because age and ms are dummy variables, they cannot be log-transformed.

Dependent Variable: CHARITY

Method: Panel Least Squares

Sample: 2000 2009

Periods included: 10

Total panel (balanced) observations: 470

Variable Coefficient Std. Error t-Statistic Prob.

C -4.430906 1.318788 -3.359833 0.0008

AGE 1.376106 0.216538 6.355043 0.0000

INCOME 1.024039 0.131100 7.811139 0.0000

PRICE 0.531058 0.210894 2.518126 0.0121

MS 0.384056 0.161051 2.384683 0.0175

R-squared 0.196220 Mean dependent var 6.577150

Adjusted R-squared 0.189305 S.D. dependent var 1.313659

S.E. of regression 1.182800 Akaike info criterion 3.184228

Sum squared resid 650.5427 Schwarz criterion 3.228406

Log likelihood -743.2936 Hannan-Quinn criter. 3.201609

F-statistic 28.37905 Durbin-Watson stat 0.669074

Prob(F-statistic) 0.000000

From the table we see that all the variables (age, income, price and ms) have a significant
positive effect on the dependent variable. The intercept is negative (-4.43) and is assumed to
be the same for every cross-section unit and time period. Age has the largest coefficient in
this model. However, the values of R-squared and of the Durbin-Watson statistic are very low,
which may indicate a misspecification problem, so the estimated coefficients may be biased as
well as inconsistent.

Fixed effect model

In estimating the fixed effects model we allow each cross-section unit to have its own effect.
The intercept estimates in the fixed effects model are of special interest: they can be used to
analyze the extent of heterogeneity and to examine any particular cross-section unit that may be
of interest.

The estimates of the fixed effects model are presented in the table below.


Method: Panel Least Squares

Sample: 2000 2009


Cross-sections included: 47



C -2.242644 1.129317 -1.985840 0.0477

PRICE 0.349456 0.124105 2.815821 0.0051

AGE 0.146090 0.206655 0.706927 0.4800

INCOME 0.839893 0.111478 7.534161 0.0000

MS 0.110532 0.258523 0.427551 0.6692

Effects Specification

Cross-section fixed (dummy variables)



S.E. of regression 0.678460 Akaike info criterion 2.164176

Sum squared resid 192.8689 Schwarz criterion 2.614792

Log likelihood -457.5815 Hannan-Quinn criter. 2.341460



Note: the error variance must be corrected because N unit means have to be estimated when the
deviations from means are computed. The RSS is therefore divided by NT - N - K, where K is the
number of slope coefficients (here 470 - 47 - 4 = 419), rather than by the degrees of freedom of
the pooled regression.

We can see from the table that both age and ms are insignificant at the 5 percent level. In
this case the constant -2.24 represents the average effect of all 47 cross-section units. The
significance of the individual fixed effect dummies can be shown using Stata.

There we suppress the automatic constant term, as its inclusion would create exact collinearity
(47 dummies + 4 explanatory variables).

Next, it is worth determining whether the fixed effects are necessary by running a redundant
fixed effects test, that is, an F-test between the restricted model (pooled OLS) and the
unrestricted model (fixed effects). We can see that the fixed effects are not redundant: all the
fixed effect coefficients together are highly significant, with a p-value far below 5 percent,
suggesting that pooled OLS hides the heterogeneity among the cross-section units.
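The redundant fixed effects F-statistic can be reproduced by hand from the two residual sums of squares reported in the EViews tables (650.5427 for pooled OLS, 192.8689 for fixed effects). The function below is a minimal sketch of the standard restricted-versus-unrestricted F-test; the function and argument names are hypothetical.

```python
def redundant_fe_f(rss_pooled, rss_fe, n_units, n_obs, n_slopes):
    """F-test of pooled OLS (restricted) against fixed effects (unrestricted)."""
    df_num = n_units - 1                   # number of restrictions: N - 1
    df_den = n_obs - n_units - n_slopes    # residual df of the FE model: NT - N - K
    f_stat = ((rss_pooled - rss_fe) / df_num) / (rss_fe / df_den)
    return f_stat, df_num, df_den

# Values taken from the pooled and fixed effects output in the text.
f_stat, df_num, df_den = redundant_fe_f(650.5427, 192.8689,
                                        n_units=47, n_obs=470, n_slopes=4)

# The corrected standard error of the FE regression uses the same df.
se_reg = (192.8689 / df_den) ** 0.5

print(f"F({df_num},{df_den}) = {f_stat:.4f}, S.E. of regression = {se_reg:.4f}")
```

The result agrees with the Cross-section F of 21.61 with (46, 419) degrees of freedom in the EViews test output, and `se_reg` agrees with the reported S.E. of regression of 0.678460.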

In this example we have allowed the intercepts to differ from one entity to another, which is
called a one-way fixed effects model. If we allow the intercepts to vary across both time and
cross-section units, the model is called a two-way fixed effects model.

     charity |      Coef.   Std. Err.       t    P>|t|     [95% Conf. Interval]
-------------+-----------------------------------------------------------------
          id |
           1 |  -3.143051    1.14176    -2.75   0.006    -5.387342   -.8987591
           2 |  -1.128042   1.149521    -0.98   0.327    -3.387588    1.131504
           3 |  -1.869972   1.177771    -1.59   0.113    -4.185048    .4451036
           4 |   -1.49402   1.137787    -1.31   0.190    -3.730501    .7424607
           5 |   -1.60286   1.154705    -1.39   0.166    -3.872596    .6668768
           6 |  -2.722383   1.157683    -2.35   0.019    -4.997973   -.4467931
           7 |   -4.33547   1.155772    -3.75   0.000    -6.607304   -2.063636
           8 |  -1.794164   1.117063    -1.61   0.109    -3.989909    .4015806
           9 |  -.3076664   1.233163    -0.25   0.803    -2.731624    2.116291
          10 |  -1.818909   1.082568    -1.68   0.094    -3.946849    .3090316
          11 |   -2.41732   1.157689    -2.09   0.037    -4.692922   -.1417177
          12 |  -1.101566   1.190768    -0.93   0.355     -3.44219    1.239058
          13 |  -2.602193   1.094372    -2.38   0.018    -4.753336     -.45105
          14 |  -2.263411   1.204135    -1.88   0.061     -4.63031     .103488
          15 |  -.9621981   1.109222    -0.87   0.386    -3.142531    1.218135
          16 |   -4.18736   1.175838    -3.56   0.000    -6.498636   -1.876083
          17 |  -1.613924   1.173468    -1.38   0.170    -3.920541    .6926938
          18 |  -1.973256   1.180786    -1.67   0.095    -4.294259    .3477469
          19 |  -2.007611   1.188455    -1.69   0.092    -4.343688    .3284654
          20 |   .2031007   1.160328     0.18   0.861    -2.077688    2.483889
          21 |  -3.245301   1.091892    -2.97   0.003     -5.39157   -1.099033
          22 |  -2.908612   1.158181    -2.51   0.012    -5.185181   -.6320429
          23 |    -2.4696   1.139356    -2.17   0.031    -4.709167   -.2300341
          24 |  -2.030193   1.160268    -1.75   0.081    -4.310864    .2504776
          25 |  -2.024759   1.147152    -1.77   0.078    -4.279649    .2301316
          26 |  -3.566211    1.13928    -3.13   0.002    -5.805627   -1.326795
          27 |  -2.571684   1.158269    -2.22   0.027    -4.848425   -.2949425
          28 |  -1.940274   1.228372    -1.58   0.115    -4.354814    .4742659
          29 |  -3.023473   1.095495    -2.76   0.006    -5.176824   -.8701229
          30 |  -1.614372   1.136366    -1.42   0.156    -3.848061    .6193175
          31 |  -1.856989    1.13353    -1.64   0.102    -4.085102    .3711237
          32 |  -1.068284   1.104892    -0.97   0.334    -3.240106    1.103538
          33 |  -2.162814   1.157945    -1.87   0.062    -4.438919    .1132921
          34 |  -1.146941   1.115156    -1.03   0.304    -3.338939    1.045057
          35 |  -2.886284   1.149018    -2.51   0.012    -5.144842    -.627726
          36 |  -1.415387   1.146115    -1.23   0.218    -3.668238    .8374642
          37 |  -1.489404   1.189678    -1.25   0.211    -3.827886    .8490769
          38 |  -5.430712   1.134346    -4.79   0.000    -7.660429   -3.200994
          39 |  -3.244162   1.142531    -2.84   0.005    -5.489969   -.9983541
          40 |  -2.942001   1.179943    -2.49   0.013    -5.261346   -.6226564
          41 |  -2.476052   1.182169    -2.09   0.037    -4.799773   -.1523312
          42 |  -2.098906   1.087024    -1.93   0.054    -4.235607    .0377938
          43 |  -2.586954   1.150926    -2.25   0.025    -4.849263   -.3246444
          44 |  -2.548217   1.191558    -2.14   0.033    -4.890393    -.206041
          45 |  -.8107918   1.185415    -0.68   0.494    -3.140894     1.51931
          46 |  -3.446278   1.131577    -3.05   0.002    -5.670552   -1.222003
          47 |  -3.257338    1.21976    -2.67   0.008    -5.654949   -.8597262
             |
      income |   .8398928   .1114779     7.53   0.000     .6207671    1.059018
       price |   .3494563   .1241046     2.82   0.005     .1055111    .5934015
         age |   .1460899    .206655     0.71   0.480    -.2601197    .5522996
          ms |   .1105318   .2585233     0.43   0.669    -.3976324     .618696

Redundant Fixed Effects Tests

Equation: Untitled

Test cross-section fixed effects

Effects Test Statistic d.f. Prob.

Cross-section F 21.614737 (46,419) 0.0000

Cross-section Chi-square 571.424174 46 0.0000

The coefficient labeled _cons reports the average of the 47 indicator variable coefficients.
The F-statistic at the bottom of the Stata output tests the null hypothesis that there is no
significant difference between the individual intercepts; its value is the same as the one
reported by EViews.

Random Effect Model

Individual effects that were modeled by fixed coefficients in the fixed effects model are
treated as random draws from a large population in the random effects model. For estimation
purposes they become part of the error term: each is a random error with mean 0 and variance
$\sigma_\epsilon^2$. We can write the intercept in this model as $b_{1i} = b_1 + \epsilon_i$:
all cross-section units share a common mean intercept $b_1$, and the differences between them
are reflected in the error term $\epsilon_i$.


Fixed-effects (within) regression               Number of obs      =       470
Group variable: id                              Number of groups   =        47

R-sq:  within  = 0.1296                         Obs per group: min =        10
       between = 0.1548                                        avg =      10.0
       overall = 0.1426                                        max =        10

                                                F(4,419)           =     15.59
corr(u_i, Xb)  = 0.1087                         Prob > F           =    0.0000

     charity |      Coef.   Std. Err.       t    P>|t|     [95% Conf. Interval]
-------------+-----------------------------------------------------------------
      income |   .8398928   .1114779     7.53   0.000     .6207671    1.059018
       price |   .3494563   .1241046     2.82   0.005     .1055111    .5934015
         age |   .1460899    .206655     0.71   0.480    -.2601197    .5522996
          ms |   .1105318   .2585233     0.43   0.669    -.3976324     .618696
       _cons |  -2.242644   1.129317    -1.99   0.048    -4.462478   -.0228103
-------------+-----------------------------------------------------------------
     sigma_u |  1.0499101
     sigma_e |  .67845979
         rho |  .70542548   (fraction of variance due to u_i)

F test that all u_i=0:     F(46, 419) =    21.61             Prob > F = 0.0000

Method: Panel EGLS (Cross-section random effects)

Date: 11/22/14 Time: 03:31

Sample: 2000 2009


Cross-sections included: 47


Swamy and Arora estimator of component variances


C -2.419123 1.103968 -2.191298 0.0289

AGE 0.289275 0.198131 1.460022 0.1450

INCOME 0.853282 0.107906 7.907637 0.0000

MS 0.145714 0.223034 0.653327 0.5139

PRICE 0.361560 0.123757 2.921528 0.0037

Effects Specification

S.D. Rho

Cross-section random 0.967847 0.6705

Idiosyncratic random 0.678460 0.3295

Weighted Statistics



S.E. of regression 0.682115 Sum squared resid 216.3557



Unweighted Statistics


Sum squared resid 689.3364 Durbin-Watson stat 0.579938

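Stata's random effects estimation reports the same variance components: sigma_u = 0.9678 (the
standard deviation of the individual effect, $\sigma_j$ in our notation) and sigma_e = 0.6785
(the standard deviation of the idiosyncratic error, $\sigma_u$ in our notation). From them:

$corr(\delta_{it}, \delta_{is}) = \dfrac{\sigma_j^2}{\sigma_j^2 + \sigma_u^2} = \dfrac{0.9678^2}{0.9678^2 + 0.6785^2} = 0.6705$

This is the correlation between two composite errors of the same cross-section unit; it is the
value reported as Rho for the cross-section random component.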

The Wald test, on the other hand, tests whether all the coefficients are jointly different from
zero.

Interpretation of the coefficients is tricky, since they include both the within-entity and the
between-entity effects: each coefficient represents the average effect of X on Y when X changes
across time and between entities by one unit.

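Theta represents the fraction (or the weight) of the mean that is subtracted from each variable
in the quasi-demeaning transformation of the one-way random effects model:

$\theta = 1 - \dfrac{\sigma_u}{\sqrt{\sigma_u^2 + T \sigma_j^2}}$

where $\sigma_u$ is the standard deviation of the idiosyncratic error, $\sigma_j$ the standard
deviation of the individual effect and $T$ the number of time periods.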
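As a check, theta and the correlation of the composite errors can be recomputed from the variance components in the Stata random effects output (sigma_e = .67845979, sigma_u = .96784693, T = 10). The sketch below assumes the standard quasi-demeaning formula for a balanced panel; the function names are hypothetical, and Stata's sigma_u corresponds to the individual effect ($\sigma_j$ in the text) while sigma_e corresponds to the idiosyncratic error ($\sigma_u$ in the text).

```python
import math

def re_theta(sigma_idio, sigma_unit, n_periods):
    """Quasi-demeaning weight: 1 - sigma_idio / sqrt(sigma_idio^2 + T * sigma_unit^2)."""
    return 1.0 - sigma_idio / math.sqrt(sigma_idio**2 + n_periods * sigma_unit**2)

def composite_error_corr(sigma_idio, sigma_unit):
    """Correlation of two composite errors of the same unit in different periods."""
    return sigma_unit**2 / (sigma_unit**2 + sigma_idio**2)

# Variance components reported in the Stata random effects output.
theta = re_theta(sigma_idio=0.67845979, sigma_unit=0.96784693, n_periods=10)
rho = composite_error_corr(sigma_idio=0.67845979, sigma_unit=0.96784693)

print(f"theta = {theta:.4f}, rho = {rho:.4f}")
```

The results agree with the values Stata reports (theta = .78357834, rho = .67051107). Setting `sigma_unit` to zero gives theta = 0, the pooled OLS case.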

Which should we use, random effects or fixed effects: the Hausman test

To decide between fixed and random effects you can run a Hausman test, where the null hypothesis
is that the preferred model is random effects and the alternative is fixed effects. It
basically tests whether the unique errors ($u_i$) are correlated with the regressors; the null
hypothesis is that they are not.

Random-effects GLS regression                   Number of obs      =       470
Group variable: id                              Number of groups   =        47

R-sq:  within  = 0.1287                         Obs per group: min =        10
       between = 0.1799                                        avg =      10.0
       overall = 0.1575                                        max =        10

                                                Wald chi2(4)       =     70.40
corr(u_i, X)   = 0 (assumed)                    Prob > chi2        =    0.0000
theta          = .78357834

     charity |      Coef.   Std. Err.       z    P>|z|     [95% Conf. Interval]
-------------+-----------------------------------------------------------------
      income |   .8532822   .1084875     7.87   0.000     .6406507    1.065914
       price |   .3615598   .1244239     2.91   0.004     .1176935    .6054261
         age |   .2892753   .1991982     1.45   0.146     -.101146    .6796966
          ms |   .1457141   .2242357     0.65   0.516    -.2937798    .5852079
       _cons |  -2.419123   1.109916    -2.18   0.029    -4.594519    -.243728
-------------+-----------------------------------------------------------------
     sigma_u |  .96784693
     sigma_e |  .67845979
         rho |  .67051107   (fraction of variance due to u_i)

The output of the test shows the coefficients common to both models and their estimated
differences; the column labeled S.E. is the standard error of the difference. For example, the
t-statistic for the coefficient on income is:

$t = \dfrac{\hat{b}_{fe} - \hat{b}_{re}}{\sqrt{var(\hat{b}_{fe}) - var(\hat{b}_{re})}} = \dfrac{0.8399 - 0.8533}{\sqrt{0.11148^2 - 0.10849^2}} = \dfrac{-0.0134}{0.0256} \approx -0.52$

so that $h = t^2 \approx 0.27$.
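The single-coefficient calculation can be written as a small function. This is a sketch of the one-regressor Hausman statistic defined earlier, applied to the income coefficient with the estimates and standard errors from the Stata fixed effects and random effects output; the function name is hypothetical, and the p-value uses the fact that a chi-squared variable with one degree of freedom is a squared standard normal.

```python
import math

def hausman_single(b_fe, se_fe, b_re, se_re):
    """Hausman statistic for one coefficient: (b_fe - b_re)^2 / (var_fe - var_re)."""
    var_diff = se_fe**2 - se_re**2
    if var_diff <= 0:
        # Happens in finite samples; the statistic is then undefined.
        raise ValueError("var(b_fe) - var(b_re) is not positive")
    return (b_fe - b_re) ** 2 / var_diff

# Income coefficient: fixed effects vs random effects estimates from the Stata output.
h_income = hausman_single(b_fe=0.8398928, se_fe=0.1114779,
                          b_re=0.8532822, se_re=0.1084875)

# p-value for a chi-squared(1) statistic: P(|Z| > sqrt(h)) = erfc(sqrt(h / 2)).
p_income = math.erfc(math.sqrt(h_income / 2))

print(f"h = {h_income:.4f}, p-value = {p_income:.3f}")
```

For the price coefficient the fixed effects standard error (.1241046) is smaller than the random effects one (.1244239), so the function raises an error; this corresponds to the missing S.E. for price in the Stata Hausman output.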

Bibliography

Damodar Gujarati, Dawn C. Porter. Basic Econometrics, fifth edition. McGraw-Hill, 2009.

Chris Brooks. Introductory Econometrics for Finance, third edition. The ICMA Centre, Henley
Business School, University of Reading, 2014.

Ben Lambert. YouTube channel, https://www.youtube.com/user/SpartacanUsuals

Hausman test output (Stata):

                 ---- Coefficients ----
             |      (b)          (B)            (b-B)     sqrt(diag(V_b-V_B))
             |       fe           re         Difference          S.E.
-------------+----------------------------------------------------------------
      income |   .8398928     .8532822       -.0133894        .0256477
       price |   .3494563     .3615598       -.0121035               .
         age |   .1460899     .2892753       -.1431853        .0550123
          ms |   .1105318     .1457141       -.0351823        .1286571
------------------------------------------------------------------------------
                           b = consistent under Ho and Ha; obtained from xtreg
            B = inconsistent under Ha, efficient under Ho; obtained from xtreg

    Test:  Ho:  difference in coefficients not systematic

                  chi2(4) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                          =        8.51
                Prob>chi2 =      0.0747
                (V_b-V_B is not positive definite)
