Juselius - Basic Time Series


    Chapter 1

    Introduction

    The purpose of this course is to:

    Teach econometric principles for the analysis of macroeconomic data.

    Discuss the link between theory models and empirical macrodata.

    Provide the tools to read applied (and theoretical) econometrics papers.

Develop the skills to read and use matrix algebra.

Provide the tools to critically evaluate empirical results (PcGive and PcNaive).

    Provide the tools to perform moderately demanding empirical analyses.

The guiding principle is learning by doing! It is, therefore, important to actively participate in the exercises, which are all based on real data sets. Half of the exam questions will be related to the exercises.

1.1 The dynamic linear regression model for time series data


and the covariances over time, some of which cannot be tested because we have just one realization per time period. This is contrary to panel data analysis, where we have several observations available per time period. To test constancy of parameters (i.e. constancy of covariances of the data over time) we often need quite long time series. This means that it is almost impossible to know whether some macroeconomic mechanisms have changed as a result of a change in regime, for example, as a result of adopting the Euro. Thus, when interpreting the results from macroeconometric models it is important to have a realistic sense of the reliability of the results.

A random variable with one observation (realization) per time period.

Consecutive realizations of the random variable are usually strongly time dependent.

Simplifying assumptions regarding the mean, variance, and covariances are necessary for statistical inference.

The notation in econometrics is far from standardized and it is important from the outset to get used to the fact that different authors use different notations for the same concepts. Though there will be occasional exceptions, the following notation will generally be used during the course:

In MV (Marno Verbeek) $Y$ denotes a random variable and $y$ a realization of the random variable. For time series data we often use the notation $y_t$ both for a random variable and its realization. It is also customary to use capital letters (for example $C_t$) to denote a variable before the log transformation and lower case letters (for example $c_t$) to denote $\ln C_t$.

In the following we use the notation:

$y_t$ is the dependent/endogenous variable (or the regressand),
$x_{i,t}$ is an explanatory/exogenous variable (or a regressor),
$\beta_i$ is the theoretical regression coefficient,
$\hat{\beta}_i$ or $b_i$ is an estimate of $\beta_i$, whereas the formula (for example $\hat{\beta} = (X'X)^{-1}X'y$) is called an estimator.

$\sigma^2_y$ or $\sigma_{yy}$ is used to denote the theoretical variance of $y$. The former is


    1.1.1 A single time series process

To begin with we will look at a single variable observed over consecutive time points and discuss its time-series properties. Let $y_{j,t}$, $j = 1,\ldots,N$, $t = 1,\ldots,T$ describe realizations of a variable $y$ observed over $T$ time periods. When $N > 1$ the observations could, for example, come from a study based on panel data, or they could have been generated from a simulation study of a time series process $y_t$, in which the number of replications is $N$. Here we will focus on the case when $N = 1$, i.e. when there is just one realization $(y_1,\ldots,y_T)$ on the index set $T$. Since we have just one realization of the random variable $y_t$, we cannot make inference on the shape of the distribution or its parameter values without making simplifying assumptions. We illustrate the difficulties with two simple examples in Figures 1 and 2.

Figure 1: The time graph of the realizations $y_1,\ldots,y_6$, assuming $E(y_t) = \mu$, $Var(y_t) = \sigma^2_y$, $t = 1,\ldots,6$.


Figure 2: The time graph of the realizations $y_1,\ldots,y_6$, assuming $E(y_t) = \mu_t$, $Var(y_t) = \sigma^2_y$, $t = 1,\ldots,6$.

In the two examples, the line connecting the realizations $y_t$ produces the graph of the time series. For instance, in Figure 1 we have assumed that the distribution, the mean value, and the variance are the same for each $y_t$, $t = 1,\ldots,T$. In Figure 2 the distribution and the variance are identical, but the mean varies with $t$. Note that the observed time graph is the same in both cases, illustrating the fact that we often need rather long time series to be able to statistically distinguish between different hypotheses in time series models.

    To be able to make statistical inference we need:

(i) a probability model for $y_t$, for example the normal model;
(ii) a sampling model for $y_t$, for example dependent or independent drawings.

For the normal distribution, the first two moments around the mean are sufficient to describe the variation in the data. Without simplifying assumptions we have:


\[
E[y] = E\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_T \end{bmatrix}
= \begin{bmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_T \end{bmatrix} = \mu
\]

\[
Cov[y] = E[y - E(y)][y - E(y)]' =
\begin{bmatrix}
\sigma_{11.0} & \sigma_{12.1} & \sigma_{13.2} & \cdots & \sigma_{1T.T-1} \\
\sigma_{21.1} & \sigma_{22.0} & \sigma_{23.1} & \cdots & \sigma_{2T.T-2} \\
\sigma_{31.2} & \sigma_{32.1} & \sigma_{33.0} & \cdots & \sigma_{3T.T-3} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\sigma_{T1.T-1} & \sigma_{T2.T-2} & \sigma_{T3.T-3} & \cdots & \sigma_{TT.0}
\end{bmatrix} = \Sigma
\]

\[
y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_T \end{bmatrix} \sim N(\mu, \Sigma)
\]

Because there is just one realization of the process at each time $t$, there is not enough information to make statistical inference about the underlying functional form of the distribution of each $y_t$, $t \in T$, and we have to make simplifying assumptions to secure that the number of parameters describing the process is smaller than the number of observations available. A typical assumption in time series models is that each $y_t$ has the same distribution and that the functional form is approximately normal. Furthermore, given the normal distribution, it is frequently assumed that the mean is the same, i.e. $E(y_t) = \mu_y$ for $t = 1,\ldots,T$, and that the variance is the same, i.e. $E(y_t - \mu_y)^2 = \sigma^2_y$ for $t = 1,\ldots,T$.

    1.1.2 A vector process

We will now move on to the more interesting case where we observe a variable $y_t$ (the endogenous variable) and $k$ explanatory variables $x_{i,t}$, $i = 1,\ldots,k$. In this case we need to discuss covariances between the variables $\{y_t, x_{i,t}\}$ at different lags. We collect the variables in the vector


\[
z_t = \begin{bmatrix} y_t \\ x_{1,t} \\ x_{2,t} \\ \vdots \\ x_{k,t} \end{bmatrix}, \quad t = 1,\ldots,T,
\]

and introduce the following notation:

\[
E[z_t] = \begin{bmatrix} \mu_{y,t} \\ \mu_{x_1,t} \\ \mu_{x_2,t} \\ \vdots \\ \mu_{x_k,t} \end{bmatrix} = \mu_t,
\]

\[
Cov[z_t, z_{t-h}] =
\begin{bmatrix}
\sigma_{y_t,y_{t-h}} & \sigma_{y_t,x_{1,t-h}} & \sigma_{y_t,x_{2,t-h}} & \cdots & \sigma_{y_t,x_{k,t-h}} \\
\sigma_{x_{1,t},y_{t-h}} & \sigma_{x_{1,t},x_{1,t-h}} & \sigma_{x_{1,t},x_{2,t-h}} & \cdots & \sigma_{x_{1,t},x_{k,t-h}} \\
\sigma_{x_{2,t},y_{t-h}} & \sigma_{x_{2,t},x_{1,t-h}} & \sigma_{x_{2,t},x_{2,t-h}} & \cdots & \sigma_{x_{2,t},x_{k,t-h}} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\sigma_{x_{k,t},y_{t-h}} & \sigma_{x_{k,t},x_{1,t-h}} & \sigma_{x_{k,t},x_{2,t-h}} & \cdots & \sigma_{x_{k,t},x_{k,t-h}}
\end{bmatrix}
= \Sigma_{t.h}, \quad t = 1,\ldots,T,
\]

for the case when no simplifying assumptions have been made. We will now assume that the same distribution applies for all $z_t$ and that it is approximately normal, i.e. $z_t \sim N(\mu_t, \Sigma_t)$. Under the normality assumption the first two moments around the mean (central moments) are sufficient to describe the variation in the data. We introduce the notation:


where $Z = (z_1', z_2', \ldots, z_T')'$ is a $(k+1)T \times 1$ vector. The covariance matrix is given by

\[
E[(Z - \mu)(Z - \mu)'] =
\begin{bmatrix}
\Sigma_{1.0} & \Sigma_{2.1}' & \cdots & \Sigma_{T-1.T-2}' & \Sigma_{T.T-1}' \\
\Sigma_{2.1} & \Sigma_{2.0} & \cdots & & \Sigma_{T.T-2}' \\
\vdots & \vdots & \ddots & & \vdots \\
\Sigma_{T-1.T-2} & & \cdots & \Sigma_{T-1.0} & \Sigma_{T.1}' \\
\Sigma_{T.T-1} & \Sigma_{T.T-2} & \cdots & \Sigma_{T.1} & \Sigma_{T.0}
\end{bmatrix}
= \Sigma_{T(k+1)\times T(k+1)}
\]

where $\Sigma_{t.h} = Cov(z_t, z_{t-h}) = E(z_t - \mu_t)(z_{t-h} - \mu_{t-h})'$. The above notation provides a completely general description of a multivariate vector time series process. Since there are far more parameters than observations available for estimation, it has no meaning from a practical point of view. Therefore, we have to make simplifying assumptions to reduce the number of parameters. Empirical models are typically based on the following assumptions:

\[ \Sigma_{t.h} = \Sigma_h, \quad \text{for all } t \in T,\ h = \ldots,-1,0,1,\ldots \]
\[ \mu_t = \mu \quad \text{for all } t \in T. \]

These two assumptions are needed to secure parameter constancy in the dynamic regression model to be subsequently discussed. When the assumptions are satisfied we can write the mean and the covariances of the data matrix in the simplified form:

\[
\mu = \begin{bmatrix} \mu \\ \vdots \\ \mu \end{bmatrix}, \qquad
\Sigma = \begin{bmatrix}
\Sigma_0 & \Sigma_1' & \Sigma_2' & \cdots & \Sigma_{T-1}' \\
\Sigma_1 & \Sigma_0 & \Sigma_1' & \ddots & \vdots \\
\Sigma_2 & \Sigma_1 & \Sigma_0 & \ddots & \Sigma_2' \\
\vdots & \ddots & \ddots & \ddots & \Sigma_1' \\
\Sigma_{T-1} & \cdots & \Sigma_2 & \Sigma_1 & \Sigma_0
\end{bmatrix}
\]


Definition 1 Let $\{y_t\}$ be a stochastic process (an ordered series of random variables) for $t = \ldots,-1,0,1,2,\ldots$. If
\[ E[y_t] = \mu < \infty \quad \text{for all } t, \]
\[ E[y_t - \mu]^2 = \sigma^2 < \infty \quad \text{for all } t, \]
\[ E[(y_t - \mu)(y_{t+h} - \mu)] = \sigma_{.h} < \infty \quad \text{for all } t \text{ and } h = 1, 2,\ldots \]
then $\{y_t\}$ is said to be weakly stationary. Strict stationarity requires that the distribution of $(y_{t_1},\ldots,y_{t_k})$ is the same as that of $(y_{t_1+h},\ldots,y_{t_k+h})$ for $h = \ldots,-1,1,2,\ldots$.
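As a small illustration of these population moments and their sample analogues, the following sketch (not part of the original text; the AR(1) example and all parameter values are illustrative) simulates a stationary series and computes the sample mean, variance, and autocovariances corresponding to Definition 1.

```python
import numpy as np

# A minimal sketch: simulate a stationary AR(1) series and compute the sample
# analogues of the moments in Definition 1 (mean, variance, autocovariances).
rng = np.random.default_rng(0)
T = 200
eps = rng.normal(0.0, 1.0, T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.5 * y[t - 1] + eps[t]        # stationary because |0.5| < 1

y_bar = y.mean()                           # sample analogue of mu
sigma2_hat = ((y - y_bar) ** 2).mean()     # sample analogue of sigma^2

def autocov(series, h):
    """Sample autocovariance at lag h, the analogue of sigma_.h."""
    m = series.mean()
    return ((series[h:] - m) * (series[:-h] - m)).mean()

print(y_bar, sigma2_hat, [autocov(y, h) for h in (1, 2, 3)])
```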

    1.1.3 An illustration:

The data set is defined by $[c^r_t, y^r_t, w^r_t, R_{b,t}, \Delta p_t, p_{h,t} - p_{c,t}]$, $t = 1973{:}1,\ldots,2003{:}1$, where

$c^r_t = c_t - p_t$ is a measure of real private consumption at time $t$, where $c_t$ is the log of nominal consumption expenditure in Denmark and $p_t$ is the log of the implicit consumption deflator,

$y^r_t$ is the log of real domestic expenditure, GNE,

$R_{b,t}$ is the 10 year government bond rate,

$\Delta p_t$ is the quarterly inflation rate measured by the implicit consumption deflator, and

$p_{h,t} - p_{c,t}$ is the log difference between the house price deflator and the consumption price deflator.


Figure 3: The graphs of real consumption (LrC, DLrC), real income (LrY, DLrY), and real wealth (LrW, DLrW) in levels and differences, 1970-2000.

A visual inspection reveals that neither the assumption of a constant mean nor that of a constant variance seems appropriate for the levels of the variables, whereas the differenced variables look more satisfactory in this respect. If the marginal processes are normal, then the observations should lie symmetrically on both sides of the mean. This seems approximately to be the case for the differenced variables, although most of the variables seem to have a higher variance in the seventies and the beginning of the eighties than in the more recent EMS period. Moreover, there seem to be some outlier observations in most of the series, and the question is whether these observations are too far away from the mean to be considered realizations from a normal distribution. It is generally a good idea to have a look in the economic calendar to find out if the outlier observations can be related to some significant economic interventions or reforms.

For example, the outlier observation in real consumption, real income, and


Figure 4: The graphs of the bond rate (Rb, DRb), inflation rate (DLPc, DDLPc), and relative house-consumption prices (LphLPc, DLphLPc) in levels and differences, 1970-2000.


is related to the lifting of previous restrictions on capital movements and the start of the hard EMS in 1983.

These are realistic examples that point to the need to include additional information on interventions and institutional reforms in the empirical model analysis. This can be done by including new variables measuring the effect of institutional reforms or, if such variables are not available, by using dummy variables as a proxy for the change in institutions.

At the start of the empirical analysis it is not always possible to know whether an intervention was strong enough to produce an extraordinary effect or not. Essentially every single month, quarter, or year is subject to some kind of political intervention; most of them have a minor impact on the data and the model. Thus, if an ordinary intervention does not stick out as an outlier, it will be treated as a random shock for practical reasons. Major interventions, like removing restrictions on capital movements, joining the EMS, etc., are likely to have a much more fundamental impact on economic behavior and, hence, need to be included in the systematic part of the model. Ignoring this problem is likely to seriously bias all estimates of our model and result in invalid inference.

It is always a good idea to start with a visual inspection of the data and their time series properties as a first check of the assumptions of the linear regression model. Based on the graphs we can get a first impression of whether $x_{i,t}$ looks stationary with constant mean and variance, or whether this is the case for $\Delta x_{i,t}$. If the answer is negative to the first question, but positive to the next one, we can solve the problem by respecifying the model in error-correction form, as will be subsequently demonstrated. If the answer is negative to both questions, it is often a good idea to check the economic calendar to find out whether any significant departure from the constant mean and constant variance coincides with specific reforms or interventions. The next step is then to include this information in the model and find out


1.1.4 Descriptive statistics: definitions and estimators

In this section we give the definitions of the statistics calculated by the Descriptive Statistics package in PcGive.

The mean of a random variable, $x_j$, is defined by
\[ E(x_j) = \sum_i f_i x_{j,i} = \mu_j \]
where $f_i$ is the probability density function.

Given a sample of $T$ observations of the variable, $x_j$, the sample mean is calculated as:
\[ \bar{x}_j = \frac{1}{T}\sum_{t=1}^{T} x_{j,t}. \]

In the case of a set of variables, $x_1,\ldots,x_k$, the $(k \times 1)$ vector of sample means is given by:
\[ \bar{x} = \begin{bmatrix} \bar{x}_1 \\ \bar{x}_2 \\ \vdots \\ \bar{x}_k \end{bmatrix} \]

Note that the sum of the deviations from the mean is always zero. Note also that the sample mean is sensitive to outlying observations which do not belong to the assumed probability distribution and can in such cases be misleading.

The variance of $x_i$ is the expected value of the squared deviations from the mean:
\[ Var(x_i) = E(x_i - \mu_i)^2 = E(x_i^2) - E(x_i)^2 = \sum_i f_i (x_i - \mu_i)^2 = \sigma_i^2 \]
An unbiased estimator of the sample variance is given by:


The covariance of two variables, $x_k$ and $x_j$, is defined as:
\[ cov(x_k, x_j) = E(x_k - \mu_k)(x_j - \mu_j) = E(x_k x_j) - \mu_k\mu_j = \sum_i f_i (x_{k,i} - \mu_k)(x_{j,i} - \mu_j) = \sigma_{kj} \]
and an unbiased estimator is given by:
\[ \widehat{cov}(x_k, x_j) = \hat{\sigma}_{kj} = s_{kj} = \frac{1}{T-1}\sum_{t=1}^{T}(x_{k,t} - \bar{x}_k)(x_{j,t} - \bar{x}_j) \]
A positive (i.e., upward-sloping) linear relationship between the variables will give a positive covariance, and a negative (i.e., downward-sloping) linear relationship gives a negative covariance.

The variance-covariance matrix of a set of variables $\{x_i, x_j, x_k\}$ is given by:
\[
\Sigma_{x.0} =
\begin{bmatrix}
var(x_i) & cov(x_i, x_j) & cov(x_i, x_k) \\
cov(x_i, x_j) & var(x_j) & cov(x_j, x_k) \\
cov(x_i, x_k) & cov(x_j, x_k) & var(x_k)
\end{bmatrix}
=
\begin{bmatrix}
\sigma_{ii} & \sigma_{ij} & \sigma_{ik} \\
\sigma_{ji} & \sigma_{jj} & \sigma_{jk} \\
\sigma_{ki} & \sigma_{kj} & \sigma_{kk}
\end{bmatrix}
\]
with the variances on the diagonal and the covariances on the off-diagonal. The subscript 0 in $\Sigma_{x.0}$ shows that the covariances have been calculated based on current values (but not lagged values) of the variables, i.e. $h = 0$.

The sample standard deviation of the variable, $x_i$, is given by:
\[ \hat{\sigma}_i = \sqrt{\hat{\sigma}_i^2} = \sqrt{\frac{1}{T-1}\sum_{t=1}^{T}(x_{i,t} - \bar{x}_i)^2} \]

The sample correlation coefficient between two variables, $x_i$ and $x_j$, is given by:


\[
r_{ij} = \frac{\frac{1}{T-1}\sum_{t=1}^{T}(x_{i,t}-\bar{x}_i)(x_{j,t}-\bar{x}_j)}{\sqrt{\frac{1}{T-1}\sum_t (x_{i,t}-\bar{x}_i)^2}\,\sqrt{\frac{1}{T-1}\sum_t (x_{j,t}-\bar{x}_j)^2}}
\]
which reduces to:
\[ r_{ij} = \frac{\hat{\sigma}_{ij}}{\hat{\sigma}_i \hat{\sigma}_j} \tag{1.2} \]

The correlation coefficient measures the strength of the linear relationship between the variables. Perfect negative and positive linear relationships are indicated by $r = -1$ and $r = 1$, respectively, and a value of $r = 0$ indicates no linear relationship. Its interpretation is strictly limited to linear relationships. The estimated correlation matrix, i.e. the standardized covariance matrix $\hat{\Sigma}_0$, for the Danish consumption data is reported below (output from the PcGive Descriptive Statistics package):

Means, standard deviations and correlations (using consumption.in7)
The sample is 1973 (1) - 2003 (1)

Means
          LrC      LrY      LrW        Rb      DLPc
       6.1337   6.7752   8.2741  0.029070  0.013802

Standard deviations (using T-1)
          LrC      LrY      LrW        Rb      DLPc
      0.11663  0.13811  0.13517  0.011418  0.011389

Correlation matrix:
          LrC      LrY      LrW        Rb      DLPc
LrC    1.0000  0.98299  0.97384  -0.86493  -0.70119
LrY   0.98299   1.0000  0.97986  -0.87192  -0.64968
LrW   0.97384  0.97986   1.0000  -0.88104  -0.67227
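The output above can be reproduced for any data matrix with a few lines of code. The sketch below is illustrative only (the data array is a random stand-in, not the Danish consumption data): it computes column means, standard deviations with a T-1 divisor, and the correlation matrix.

```python
import numpy as np

# A minimal sketch with a stand-in data matrix (5 columns playing the roles of
# LrC, LrY, LrW, Rb, DLPc): column means, standard deviations using T-1, and
# the correlation matrix, as in the PcGive output above.
rng = np.random.default_rng(1)
X = rng.normal(size=(121, 5))              # hypothetical data: 121 quarters, 5 series

means = X.mean(axis=0)
stds = X.std(axis=0, ddof=1)               # divide by T-1, as stated in the output
corr = np.corrcoef(X, rowvar=False)        # standardized covariance matrix

print(means)
print(stds)
print(corr)
```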


1.2 Joint, marginal and conditional probabilities

First we give a repetition of the simple multiplicative rule to calculate joint probabilities, and then the formulas for calculating the conditional and marginal mean and the variance of a multivariate normal vector $z_t$.

Repetition: An illustration of the multiplicative rule for probability calculations based on three dependent events, A, B, and C:
\[ P(A \cap B \cap C) = P(A|B \cap C)P(B \cap C) = P(A|B \cap C)P(B|C)P(C) \]
Note that a multiplicative formulation has been achieved for the conditional events, even if the events themselves are not independent.

The general principle of the multiplicative rule for probability calculations will be applied in the derivation of conditional and marginal distributions.

Consider first two normally distributed random variables $y_t$ and $x_t$ with the joint distribution:
\[ z_t \sim N(\mu, \Sigma) \]
\[
z_t = \begin{bmatrix} y_t \\ x_t \end{bmatrix}, \quad
E[z_t] = \begin{bmatrix} \mu_y \\ \mu_x \end{bmatrix}, \quad
Cov\!\left(\begin{bmatrix} y_t \\ x_t \end{bmatrix}, \begin{bmatrix} y_t \\ x_t \end{bmatrix}'\right)
= \begin{bmatrix} \sigma_{yy} & \sigma_{yx} \\ \sigma_{xy} & \sigma_{xx} \end{bmatrix}
\]
The marginal distributions for $y_t$ and $x_t$ are given by
\[ y_t \sim N(\mu_y, \sigma_{yy}), \qquad x_t \sim N(\mu_x, \sigma_{xx}) \]
The conditional distribution for $y_t|x_t$ is given by
\[ (y_t|x_t) \sim N(\mu_{y.x}, \sigma_{yy.x}) \]


where $\mu_{y.x} = \mu_y + \sigma_{yx}\sigma_{xx}^{-1}(x_t - \mu_x)$ and
\[ \sigma_{yy.x} = \sigma_{yy} - \sigma_{yx}\sigma_{xx}^{-1}\sigma_{xy} \tag{1.4} \]

The joint distribution of $z_t$ can now be expressed as the product of the conditional and the marginal distribution:
\[
\underbrace{P(y_t, x_t; \theta)}_{\text{the joint distribution}} = \underbrace{P(y_t | x_t; \theta_1)}_{\text{the conditional distribution}} \cdot \underbrace{P(x_t; \theta_2)}_{\text{the marginal distribution}} \tag{1.5}
\]
The linear regression model:
\[ y_t = \beta_0 + \beta_1 x_t + \varepsilon_t \]
corresponds to the conditional expectation of $y_t$ for given values of $x_t$ (or, alternatively, when keeping $x_t$ fixed).
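To see this link numerically, the following sketch (an illustration under assumed parameter values, not taken from the text) simulates a bivariate normal sample and compares the population conditional-mean coefficients $\beta_1 = \sigma_{yx}/\sigma_{xx}$ and $\beta_0 = \mu_y - \beta_1\mu_x$ with their OLS estimates.

```python
import numpy as np

# A minimal sketch (assumed mu and Sigma): draw from a bivariate normal and check
# that OLS of y on x recovers the conditional-expectation coefficients
# beta_1 = sigma_yx / sigma_xx and beta_0 = mu_y - beta_1 * mu_x.
rng = np.random.default_rng(2)
mu = np.array([1.0, 2.0])                          # [mu_y, mu_x]
Sigma = np.array([[1.0, 0.6],
                  [0.6, 2.0]])                     # [[s_yy, s_yx], [s_xy, s_xx]]
draws = rng.multivariate_normal(mu, Sigma, size=5000)
y, x = draws[:, 0], draws[:, 1]

beta1 = Sigma[0, 1] / Sigma[1, 1]                  # population slope
beta0 = mu[0] - beta1 * mu[1]                      # population intercept

X = np.column_stack([np.ones_like(x), x])
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)       # (X'X)^{-1} X'y

print(beta0, beta1)                                # population values
print(beta_ols)                                    # OLS estimates, close to the above
```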


Chapter 2

Estimating the Standard Linear Model

2.1 The Assumptions of the Linear Regression Model

The linear regression model, in matrix notation, can either be written as:
\[ y_t = \beta'x_t + \varepsilon_t, \quad t = 1,\ldots,T \]
where $\beta$ is a $k \times 1$ vector of coefficients and $x_t$ is a $k \times 1$ vector of explanatory variables, including a constant, or in more compact form as:
\[ y = X\beta + \varepsilon \tag{2.1} \]
where $y$ is a $(T \times 1)$ vector, $X$ is a $(T \times k)$ matrix of which the first column is 1s, $\beta$ is a $(k \times 1)$ vector, and $\varepsilon$ is a $(T \times 1)$ vector.

Estimation of the standard linear model by the method of ordinary least squares (OLS) is motivated by the Gauss-Markov theorem, which states that the OLS estimators are best linear unbiased estimators (b.l.u.e.). Least-squares estimators are the best in the sense that, among the class of


3. $Var(\varepsilon_t) = \sigma^2_\varepsilon$, $t = 1,\ldots,T$, i.e. the error term has a constant variance.

4. $Cov(\varepsilon_t, \varepsilon_{t-h}) = 0$, $t = 1,\ldots,T$, $h = \ldots,-1,1,\ldots$

5. $\varepsilon_t \sim Nid(0, \sigma^2_\varepsilon)$ or $\varepsilon \sim N(0, \sigma^2_\varepsilon I)$

6. $\frac{1}{T}(X_T'X_T) \xrightarrow{T \to \infty} \Sigma_{xx}$

The normality assumption (5.) is not needed for the Gauss-Markov theorem to deliver estimators of minimum variance amongst the class of all linear estimators, but it is required for tests of hypotheses (inference) in the estimated models unless the sample size is very large.

2.1.1 Derivation of the OLS Estimator

The idea of OLS estimation is to find an estimate of $\beta$, i.e. $\hat{\beta}$, with the property that the sum of the squared residuals from the estimated model:
\[ y_t = \hat{\beta}'x_t + e_t, \quad t = 1,\ldots,T \]
or
\[ y = X\hat{\beta} + e \]
is minimized, i.e. such that $\sum e_t^2 = e'e$ is minimized. Based on the estimated value, $\hat{\beta}$, the estimated, or predicted, value for $y$ is:
\[ \hat{y} = X\hat{\beta} \]
The residuals in the vector $e$ are then seen to be the deviations of the actual $y$s from the estimated $\hat{y}$s: $e = y - \hat{y}$.

In order to find the OLS estimator we first express $e$ as:


To minimize the sum of squared residuals, take the derivative of $e'e$ with respect to $\hat{\beta}$ and set it equal to zero:
\[ \frac{\partial(e'e)}{\partial\hat{\beta}} = -2X'y + 2X'X\hat{\beta} = 0 \]
which yields
\[ (X'X)\hat{\beta} = X'y. \]
If the matrix $X$ has full rank, the design matrix $X'X$ is invertible and we can find the OLS estimator as:
\[ \hat{\beta} = (X'X)^{-1}X'y. \tag{2.2} \]
To derive the variance of the OLS estimate $\hat{\beta}$ we insert the value of $y$ from (2.1) in (2.2):
\[
\hat{\beta} = (X'X)^{-1}X'(X\beta + \varepsilon) = (X'X)^{-1}(X'X)\beta + (X'X)^{-1}X'\varepsilon = \beta + (X'X)^{-1}X'\varepsilon \tag{2.3}
\]
and
\[ E(\hat{\beta}) = \beta + E[(X'X)^{-1}X'\varepsilon] = \beta + (X'X)^{-1}E(X'\varepsilon). \]
Under the assumption A.2:
\[ E(X'\varepsilon) = 0 \]
the OLS estimator is unbiased.

To derive the standard error of estimate, we first express the deviation of the OLS estimate from the true value (see (2.3)) as:


By assumption A.5, $E(\varepsilon\varepsilon') = \sigma^2_\varepsilon I_T$, and we obtain:
\[ Var(\hat{\beta}) = (X'X)^{-1}X'\,\sigma^2_\varepsilon I_T\, X(X'X)^{-1} = \sigma^2_\varepsilon (X'X)^{-1}. \]
The normality assumption implies that $\hat{\beta}$ is a linear combination of normally distributed variables. Since we know its mean and variance, it can be concluded that
\[ \hat{\beta} \sim N(\beta, \sigma^2_\varepsilon (X'X)^{-1}). \]

2.1.2 The Estimated Residual Variance

The OLS residual, $e$, is connected to the error term, $\varepsilon$, in the following way:
\[ e'e = \varepsilon' M \varepsilon, \]
where $M = I - X(X'X)^{-1}X'$ is an idempotent matrix of reduced rank, $(T-k)$.

An unbiased estimator of the residual error variance, $\hat{\sigma}^2_\varepsilon$, is:
\[ \hat{\sigma}^2_\varepsilon = \frac{\varepsilon' M \varepsilon}{T-k} = \frac{e'e}{T-k}, \]
or, equivalently,
\[ \hat{\sigma}^2_\varepsilon = \frac{1}{T-k}\sum_{t=1}^{T} e_t^2 = \frac{RSS}{T-k}, \]
where RSS stands for the residual sum of squares. Thus, note that $RSS = (T-k)\hat{\sigma}^2_\varepsilon$.

The square root of the estimated residual variance is the standard error of the regression estimate, $\hat{\sigma}_\varepsilon$, calculated by
\[ \hat{\sigma}_\varepsilon = \sqrt{\frac{RSS}{T-k}} \]
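A minimal sketch of the formulas above on simulated data (all parameter values are illustrative): it computes $\hat{\beta} = (X'X)^{-1}X'y$, the residuals, $\hat{\sigma}^2_\varepsilon = RSS/(T-k)$, and the standard errors from $\hat{\sigma}^2_\varepsilon(X'X)^{-1}$.

```python
import numpy as np

# A minimal sketch on simulated data: OLS estimator (2.2), residuals, the unbiased
# residual variance RSS/(T-k), and the standard errors from sigma^2 (X'X)^{-1}.
rng = np.random.default_rng(3)
T, k = 100, 3
X = np.column_stack([np.ones(T), rng.normal(size=(T, k - 1))])
beta_true = np.array([1.0, 0.5, -0.25])            # assumed true coefficients
y = X @ beta_true + rng.normal(scale=0.8, size=T)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y                       # (2.2)
e = y - X @ beta_hat                               # OLS residuals
RSS = e @ e
sigma2_hat = RSS / (T - k)                         # unbiased residual variance
se = np.sqrt(np.diag(sigma2_hat * XtX_inv))        # standard errors of beta_hat

print(beta_hat, se, np.sqrt(sigma2_hat))
```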

Finally, as will be shown in the next chapter, the quadratic form, $\varepsilon' M \varepsilon$,


2.1.3 The Analysis of Variance

The total variation of the observed $y$s (TSS) can be decomposed into the explained sum of squares (ESS) and the residual sum of squares (RSS):
\[ \sum_{t=1}^{T}(y_t - \bar{y})^2 = \sum_{t=1}^{T}(\hat{y}_t - \bar{y})^2 + \sum_{t=1}^{T}(y_t - \hat{y}_t)^2 \]
or
\[ TSS = ESS + RSS. \]

Thus:
\[ TSS = \sum_{t=1}^{T}(y_t - \bar{y})^2 \]
\[ ESS = \sum_{t=1}^{T}(\hat{y}_t - \bar{y})^2 = \hat{\beta}\sum_{t=1}^{T}(x_t - \bar{x})(y_t - \bar{y}) \]
\[ RSS = \sum_{t=1}^{T}(y_t - \hat{y}_t)^2 = \sum_{t=1}^{T} e_t^2 \]
Note that the OLS estimation procedure implies that the mean of $e$ is zero by construction.

To show the variance decomposition in matrix notation, we first transform the linear regression model into deviation form, so that the variables are expressed as deviations from their means, using the transformation matrix $A$:
\[ A = I_T - \frac{1}{T} i\, i' \]
where $i$ is a vector of 1s, and $I_T$ is the $(T \times T)$ identity matrix. It is now useful to distinguish between the constant term and the remaining regressors $\beta_1 = [\beta_1,\ldots,\beta_k]'$ in the regression model:


Because the mean of a constant is equal to the constant itself, $A\beta_0 = 0$ and the first term drops out. Furthermore, $Ae = e$, because $\bar{e} = 0$, and the regression model becomes:
\[ Ay = AX\beta_1 + e \]
Squaring the model in deviation form gives:
\[ y'Ay = \beta_1' X'AX\beta_1 + e'e \]
which gives the decomposition into:
\[ TSS = ESS + RSS \]
With the variance decomposition we can form the ratio of the explained sum of squares to the total sum of squares, ESS/TSS, which is a measure of the goodness-of-fit of the linear regression model:
\[ R^2 = \frac{ESS}{TSS} = 1 - \frac{RSS}{TSS} \]
When the assumptions of the regression model are satisfied, a high value of $R^2$ is generally an indication of a high explanatory power of the model. However, $R^2$ can be completely misleading when assumption A.6 no longer holds, i.e. when the data are not stationary. (If the data are nonstationary, i.e. when the data are trending, assumption A.6 is violated, as the design matrix $\frac{1}{T}(X_T'X_T)$ does not converge to a constant matrix. This will be discussed later on in the course.) The reason is that $R^2$ essentially compares the fitted values of the regression model, $\hat{y}_t$, with the average value, $\bar{y}$. Since the average value is a very poor statistic as a description of a trending variable, adding almost any variable to the model is likely to increase ESS substantially. For example, a linear time trend will often give a high $R^2$ even if a time trend does not really explain anything.

When the data are strongly time dependent, $R^2$ can still be relevant as a measure of explanatory power, but only in a model explaining $\Delta y_t$ (for


(Note again that for this measure to be relevant, $\Delta y_t$ needs to be a stationary variable.) When adding more variables to the regression model, the $R^2$ statistic tends to increase, irrespective of whether or not the new variables have significantly improved the explanatory power of the model. The adjusted $R^2$, or $\bar{R}^2$, corrects for the number of explanatory variables in the following way:
\[ \bar{R}^2 = 1 - \frac{RSS/(T-k)}{TSS/(T-1)} = 1 - \frac{\hat{\sigma}^2_\varepsilon}{\hat{\sigma}^2_y}. \]

Both $R^2$ and $\bar{R}^2$ measure only the explanatory power of the information set as a whole and say nothing about the relative contribution of any one individual explanatory variable. The squared partial correlation coefficient $r^2_{yx.z}$ measures the partial effect between the dependent variable, $y$, and one of the explanatory variables, $x$, when the effect of the remaining $k-1$ variables, $z$, has been corrected for.

To illustrate the meaning and the calculation of the partial correlation coefficient we assume the following regression model:
\[ y_t = \beta_{y0.123} + \beta_{y1.23}x_{1t} + \beta_{y2.13}x_{2t} + \beta_{y3.12}x_{3t} + \varepsilon_t. \]
We would now like to know how much $x_{1t}$ is able to improve the explanation of $y_t$ when we have corrected for the influence of $x_{2t}$ and $x_{3t}$. To answer this we need to perform three auxiliary regressions:
\[ y_t = \hat{\beta}_{y0.23} + \hat{\beta}_{y2.3}x_{2t} + \hat{\beta}_{y3.2}x_{3t} + e_{1t}, \]
and
\[ x_{1t} = \hat{\beta}_{x_1 0.23} + \hat{\beta}_{x_1 2.3}x_{2t} + \hat{\beta}_{x_1 3.2}x_{3t} + e_{2t}. \]
The residual from the first regression, $e_{1t}$, measures what is left of the variation of $y_t$ after having removed the variation explained by $x_{2t}$ and $x_{3t}$. In other words, we have cleaned $y_t$ of the effects of the other explanatory variables. The residual from the second regression does the same with $x_{1t}$: it cleans


The squared partial correlation coefficient can now be defined as:
\[ r^2_{yx_1.x_2,x_3} = \frac{ESS_A}{TSS_A} \]
where $ESS_A$ and $TSS_A$ are, respectively, the explained sum of squares and the total sum of squares from the third auxiliary regression (2.4). Adding or removing explanatory variables to the regression model is likely to change the squared partial correlation coefficient. If the coefficient falls when adding a variable, the two variables are probably collinear, i.e. the new variable is a substitute rather than a complement to the variables already included in the model.

The partial $r^2$ can also be calculated from the t-values of the coefficients in the original regression based on the following formula:
\[ \text{part. } r^2 = \frac{t_i^2}{t_i^2 + (T-k)}, \]
where $t_i$ is the t-value of the coefficient $\beta_{yx.z}$ in the original regression model:
\[ t_i = \frac{\hat{\beta}_i}{\hat{\sigma}_{\hat{\beta}_i}}, \]
where $\hat{\beta}_i$ is the OLS estimator of the $i$th coefficient, and $\hat{\sigma}_{\hat{\beta}_i}$ is its estimated standard error.
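The goodness-of-fit measures and the partial $r^2$ from the t-value formula can be computed directly; the sketch below is an illustration on simulated data (the data and the helper function fit_stats are hypothetical, not from the text).

```python
import numpy as np

# A minimal sketch (hypothetical helper and simulated data): R^2, adjusted R^2,
# the t-values, and the partial r^2 computed as t_i^2 / (t_i^2 + (T - k)).
def fit_stats(y, X):
    T, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    e = y - X @ beta
    RSS = e @ e
    TSS = ((y - y.mean()) ** 2).sum()
    R2 = 1.0 - RSS / TSS
    R2_adj = 1.0 - (RSS / (T - k)) / (TSS / (T - 1))
    se = np.sqrt(RSS / (T - k) * np.diag(XtX_inv))
    t = beta / se
    partial_r2 = t ** 2 / (t ** 2 + (T - k))
    return R2, R2_adj, t, partial_r2

rng = np.random.default_rng(4)
T = 100
X = np.column_stack([np.ones(T), rng.normal(size=(T, 2))])
y = X @ np.array([1.0, 0.5, 0.0]) + rng.normal(size=T)
print(fit_stats(y, X))
```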


2.2 Multicollinearity and Omitted Variables

Multicollinearity and omitted variables are both concerned with the structure of the $X$ matrix. Though their effect on the obtained estimates can be serious, in general neither of them violates the classical assumptions of the standard linear model.

2.2.1 Multicollinearity

Initially, the term multicollinearity was used to refer to linear dependencies among three or more regressors, while collinearity meant a high correlation between a pair of them, but the term multicollinearity has come to cover both cases. A perfect linear relationship between some of the $x$ variables is seldom the case (except when certain transformations of the variables result in exact linear dependencies among the variables, or when one has fallen into the dummy variable trap), and the multicollinearity problem is one of degree. Thus, when we speak of multicollinearity it usually means a high linear correlation between two or more of the regressors. One way of investigating multicollinearity is to examine the correlation coefficients of the variables. A correlation matrix can often highlight which (if any) of the regressors are highly collinear.

The effects of multicollinearity are not as serious as other conditions which violate the assumptions of the standard linear model. The coefficient estimates are not biased in the presence of multicollinearity, and the OLS estimators are still b.l.u.e. There is, however, an important effect on inference. The standard errors of the estimates usually increase in the presence of multicollinearity, lowering the t-values and making significant individual coefficient estimates appear insignificant. This raises the probability of too often accepting the null hypothesis of a zero coefficient when it should be rejected. Multicollinearity, therefore, means that the individual effects of the correlated variables cannot be disentangled. Thus, an F-test of the joint significance of the same variables may indicate significance, even when the t-ratios do not. Therefore, one should be cautious about removing variables exclusively on the basis of insignificant t-ratios.


is a problem, the underlying cause is that Assumption A.6 is violated, i.e. the data are nonstationary instead of stationary. In this case the remedy is to transform the model so that stationarity is recovered. This will be discussed later on in the course.

2.2.2 Omitted Relevant Variables

The problem of omitted variables is more fundamental. If the omitted variable(s) are relevant for the explanation of the variation in the dependent variable, their effects will be contained in the residual. This is likely to result in inefficiency (in the sense that the estimated residual variance will be over-estimated) and, if the omitted variables are not orthogonal to the included regressors, in omitted variables bias.

Assume the true model is
\[ y = X\beta_x + Z\beta_z + \varepsilon, \tag{2.5} \]
but the economist estimates the model
\[ y = X\beta_x + \varepsilon_1. \tag{2.6} \]

The OLS estimator is:
\[ \hat{\beta}_x = (X'X)^{-1}X'y = \beta_x + (X'X)^{-1}X'\varepsilon_1. \tag{2.7} \]
We now substitute $y$ in (2.7) with (2.5) and obtain an expression for the OLS estimator $\hat{\beta}_x$ as a function of the omitted variable(s) $Z$:
\[
\hat{\beta}_x = (X'X)^{-1}X'y = (X'X)^{-1}X'X\beta_x + (X'X)^{-1}X'Z\beta_z + (X'X)^{-1}X'\varepsilon = \beta_x + (X'X)^{-1}X'Z\beta_z + (X'X)^{-1}X'\varepsilon,
\]


i.e. even if the assumption A.1, $E(X'\varepsilon) = 0$, holds true, the OLS estimate $\hat{\beta}_x$ will be biased if the omitted variables $Z$ are correlated with the included variables $X$, i.e. unless $E(X'Z) = 0$. This can clearly give rise to serious interpretational problems, as economic models usually only contain a subset of all relevant variables (because of the ceteris paribus assumption, everything else equal) and because economic variables are usually strongly correlated.

To demonstrate the effect of omitted variables on the residual error variance, we first note that the residual sum of squares can be expressed as
\[ RSS = y'My, \]
where $M = I - X(X'X)^{-1}X'$ is defined in terms of the economist's model (2.6). Substituting $y$ with the true model value (2.5) gives
\[ RSS = (X\beta_x + Z\beta_z + \varepsilon)'M(X\beta_x + Z\beta_z + \varepsilon), \]
which, since $MX = 0$, reduces to
\[ RSS = (Z\beta_z + \varepsilon)'M(Z\beta_z + \varepsilon) = \varepsilon'M\varepsilon + \beta_z'Z'MZ\beta_z + 2\beta_z'Z'M\varepsilon, \]
which, upon taking expectations, is
\[ E(RSS) = E\{\varepsilon'M\varepsilon + \beta_z'Z'MZ\beta_z + 2\beta_z'Z'M\varepsilon\}. \]
Because $E(M\varepsilon) = 0$ and $E(\varepsilon'M\varepsilon) = \sigma^2_\varepsilon(T-k)$, where $k$ is the number of variables in $X$,
\[ E(RSS) = \sigma^2_\varepsilon(T-k) + \beta_z'Z'MZ\beta_z, \]
and
\[ E\left[\frac{RSS}{T-k}\right] = \sigma^2_1 = \sigma^2_\varepsilon + \frac{1}{T-k}\,\beta_z'Z'MZ\beta_z, \]

which shows that the residual variance of the economist's model (2.6) is larger than the residual variance of the true model (2.5).
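The omitted-variable bias and the inflated residual variance are easy to verify by simulation. The sketch below is illustrative (the true coefficients and the correlation between X and Z are assumed values): the short regression's slope converges to $\beta_x$ plus the bias term, and its residual variance exceeds the true error variance.

```python
import numpy as np

# A minimal sketch: the true model contains x and z (correlated), the estimated
# model omits z, and the slope on x picks up part of beta_z while the residual
# variance exceeds the true error variance. All numbers are assumed values.
rng = np.random.default_rng(5)
T = 10_000
x = rng.normal(size=T)
z = 0.7 * x + rng.normal(size=T)                   # omitted variable correlated with x
beta_x, beta_z = 1.0, 2.0
y = beta_x * x + beta_z * z + rng.normal(size=T)   # true model (2.5)

X = np.column_stack([np.ones(T), x])               # economist's model (2.6), z omitted
b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b

print(b[1])                 # roughly beta_x + 0.7 * beta_z = 2.4, not 1.0
print((e @ e) / (T - 2))    # clearly above Var(eps) = 1
```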


    Problems typical of time-series econometrics

Only one realization per time unit.
Strong time dependence.
Many simplifying assumptions on the constancy of mean and covariances.
Many observations are needed to check these assumptions.
Economic mechanisms often change as a result of changes in regime.


    The Danish consumption function


    Notation


    Estimation in the linear regression model


The assumptions of the linear regression model

The properties of the OLS estimator


Statistical properties of estimators

Unbiasedness
Consistency (asymptotic property)
Efficiency

Problems: Multicollinearity, Time dependence


    Multicollinearity and omitted variable bias


Chapter 4

Evaluating the Standard Linear Model

The linear regression model is easy to estimate and is, therefore, one of the most used empirical models in economics. Though simplicity is a desirable property, the estimated results are not necessarily reliable unless the basic assumptions underlying the regression model are valid. Therefore, these assumptions should always be checked for their empirical consistency. The final


1. A linear model of the form: $y_t = x_t'\beta + \varepsilon_t$,

2. with disturbances of mean zero, $E(\varepsilon_t) = 0$,

3a. constant (homoscedastic) error variance (important for efficiency), $E(\varepsilon_t^2) = \sigma^2_\varepsilon$,

3b. independent errors (important for correct inference), $E(\varepsilon_t\varepsilon_s) = 0$, $t \neq s$,

4. The $X$ matrix has full rank, $rank(X) = k$ (otherwise estimation breaks down), and $Cov(x_i, x_j)$ is preferably as small as possible (important for precise estimates of $\beta_i$; otherwise multicollinearity),

5. $X$ is non-stochastic, or stochastic but uncorrelated with the $\varepsilon$s (gives unbiased estimates), $E(X'\varepsilon) = 0$,

6. The errors are normal (important for inference), $\varepsilon_t \sim Niid(0, \sigma^2_\varepsilon)$,

7. The $x$ variables are stationary (important for consistency), $\text{Plim}\,\tfrac{1}{T}(X_T'X_T) = \Sigma_{xx}$.

4.1 Specification and Misspecification Tests

It is customary to speak (interchangeably) about specification tests or misspecification tests. Though the distinction between the two is not always clear, one could, nevertheless, claim that it lies in the specificity of the alternative: in specification tests one model is tested against another, i.e. a


of whether the linear form of the model is adequate or not using a battery of different tests belongs to this category. Often a misspecification test may tell us that our model does not satisfy (some of) the basic assumptions, but does not necessarily give us precise information on how to respecify the model.

Clearly, testing for misspecification should logically precede specification testing, because we are examining empirically the adequacy of the assumptions of the standard linear model, on which the validity and power of inference depend. When the model is misspecified, the effect of misspecification can usually be seen in the estimated residuals, and most misspecification tests are about checking whether the residuals are approximately homoscedastic, not autocorrelated, and approximately normal. It should, however, be noted that rejection of normally distributed residuals, for example as a result of outliers, is likely to influence the test of residual heteroscedasticity and the test of no autocorrelation. Therefore, the order of the testing is not irrelevant. It is often a good idea to correct the specification of the model for the most obvious problems, like outliers, regime shifts, and so on, before testing for autocorrelation. This said, there does not necessarily exist an unambiguous order in which to perform the tests. For instance, a test for heteroscedastic disturbances is also a test of misspecification of the functional form, because systematic patterns in the variance of the estimated residuals may be caused by either one. Similarly, Chow tests of recursive residuals, to be discussed in the next chapter, are sensitive not only to instability in the coefficient estimates but also to heteroscedastic disturbances. The Durbin-Watson d statistic, originally constructed to test against first-order autocorrelation, can also be symptomatic of an inappropriate functional form, in particular a static instead of a dynamic formulation.

The reason for the popularity of the regression model can be traced back to the Gauss-Markov theorem stating that the OLS estimators are B.L.U.E. under assumptions 1-3 and 5 above. This chapter is primarily concerned with testing Assumptions 3-4 and 6, whereas Assumption 5 will be discussed later on in connection with the Instrumental Variables approach.


in the DGP or in the gathering of the data which the researcher has failed to model, in which case economic or statistical theory should be called upon to provide a basis for the existence of such a process, and an attempt should be made to identify the form of the heteroscedasticity and to explicitly include the process in the model. On the other hand, heteroscedastic residuals could just as easily arise on account of a misspecified functional form, or other misspecification. In other words, observed heteroscedastic residuals are as likely to mean that the model is misspecified as that a true heteroscedastic process is present.

4.2.1 The Effects of Heteroscedasticity

Under the null hypothesis of homoscedasticity the errors have a constant variance over the sample period, that is
\[ H_0: E\{\varepsilon\varepsilon'\} = \sigma^2_\varepsilon I, \quad t = 1,\ldots,T. \]
Under the alternative hypothesis of heteroscedasticity the variance of the errors varies over time:
\[ E\{\varepsilon\varepsilon'\} = \Omega \]
where $\Omega$ is a diagonal matrix with diagonal elements $\sigma^2_{\varepsilon,t}$, $t = 1,\ldots,T$. Under the null hypothesis it was already shown that the estimated residual variance was derived from the equality
\[ e'e = \varepsilon' M \varepsilon, \]
in which $M$ is the symmetric idempotent projection matrix, $I - X(X'X)^{-1}X'$. In the case of homoscedastic residuals, the derivation of the estimator, after taking expectations, leads to
\[ E\{\varepsilon' M \varepsilon\} = \sigma^2_\varepsilon\, tr M \]


which is an unbiased estimator of $\sigma^2_\varepsilon$:
\[ E(\hat{\sigma}^2_\varepsilon) = \sigma^2_\varepsilon. \]
Under the alternative of heteroscedastic residuals the estimator (4.1) will yield an average residual error variance, $\bar{\sigma}^2_\varepsilon$, which does not account for the fact that in periods of high residual variance the predictions of $y_t$ are less precise than in periods of small residual variance. Therefore, the effect of residual heteroscedasticity is that it renders the OLS estimator, $\hat{\beta}$, inefficient. The variance-covariance matrix of $\hat{\beta}$ is given by
\[ Cov(\hat{\beta}) = (X'X)^{-1}X'\Omega X(X'X)^{-1}, \tag{4.2} \]
where
\[ E\{\varepsilon\varepsilon'\} = \Omega. \]
It is easy to see that only when $\Omega = \sigma^2_\varepsilon I$ does the covariance matrix (4.2) collapse to the usual OLS expression:
\[ Cov(\hat{\beta}) = \sigma^2_\varepsilon (X'X)^{-1}. \tag{4.3} \]
In the case of heteroscedastic errors $\Omega = diag\{\omega_{11},\ldots,\omega_{TT}\}$ and (4.3) will typically underestimate the true variance of $\hat{\beta}$. Although the effect of heteroscedasticity is that the OLS estimator is no longer B.L.U.E., the loss of efficiency is not necessarily very large in many practical situations.

4.2.2 Heteroscedasticity-Consistent Standard Errors

In the presence of heteroscedasticity it is possible to estimate heteroscedasticity-consistent standard errors (HCSE) for the OLS estimators. The method proposes an estimate $\hat{\Omega} = diag\{\hat{\varepsilon}_1^2,\ldots,\hat{\varepsilon}_T^2\}$, obtained by considering the individual elements of the residual vector to be proxies for the unobservable disturbances. That is, given that $\hat{\varepsilon}_t$ is a proxy for $\varepsilon_t$, the former may be considered a sample of size 1 from the distribution of the $t$th residual, and $\hat{\varepsilon}_t^2$ can be viewed as an estimator of $\sigma^2_t$, the true variance of $\varepsilon_t$. Under standard


where $x_i'$ is the $(1 \times k)$ $i$th row of $X$. The true variance of the OLS estimator, $\hat{\beta}$, which is $(X'X)^{-1}X'\Omega X(X'X)^{-1}$, can then be consistently estimated despite the heteroscedasticity by
\[ \widehat{Cov}(\hat{\beta}) = (X'X)^{-1}\left[\sum_{i=1}^{T}\hat{\varepsilon}_i^2\, x_i x_i'\right](X'X)^{-1}. \tag{4.4} \]
Under the assumption of heteroscedastic errors the use of (4.4) instead of (4.2) yields asymptotically valid inference based on Student's t- and F-tests. Moreover, HCSEs act as a general test of heteroscedasticity, insofar as the residuals from a model in which the HCSEs do not significantly differ from the conventional estimates can be assumed to be homoscedastic. Note also that no specific form of heteroscedasticity is needed for the calculation of HCSE.
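As an illustration of formula (4.4), the following sketch (simulated heteroscedastic data; the design is an assumption made for the example) computes the conventional and the heteroscedasticity-consistent covariance matrices and compares the resulting standard errors.

```python
import numpy as np

# A minimal sketch of (4.4) on simulated heteroscedastic data: the "meat" is
# sum_i e_i^2 x_i x_i', and the sandwich (X'X)^{-1} meat (X'X)^{-1} gives HCSE.
rng = np.random.default_rng(6)
T = 500
x = rng.uniform(1.0, 5.0, T)
X = np.column_stack([np.ones(T), x])
eps = rng.normal(scale=0.5 * x)                     # error variance grows with x
y = X @ np.array([1.0, 0.3]) + eps

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
e = y - X @ b

meat = (X * (e ** 2)[:, None]).T @ X                # sum_i e_i^2 x_i x_i'
cov_hc = XtX_inv @ meat @ XtX_inv                   # formula (4.4)
cov_ols = (e @ e) / (T - 2) * XtX_inv               # conventional formula (4.3)

print(np.sqrt(np.diag(cov_ols)))                    # conventional standard errors
print(np.sqrt(np.diag(cov_hc)))                     # HCSE, larger for the slope here
```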

    4.2.3 Testing for Heteroscedasticity

The Breusch-Pagan test and the White LM test for heteroscedastic errors are very similar in nature. They are based on the following very general assumption about the error term:
\[ y_t = \beta'x_t + \varepsilon_t, \quad \varepsilon_t \sim N(0, \sigma^2_{\varepsilon,t}) \tag{4.5} \]
and
\[ \sigma^2_{\varepsilon,t} = \sigma^2 h(z_t'\alpha). \]
The vector $z_t$ consists of known variables which under the alternative hypothesis are supposed to cause the time-dependent variation in $\sigma^2_{\varepsilon,t}$. Under the null hypothesis $\sigma^2_{\varepsilon,t} = \sigma^2$. This is a typical case where the model is very easy to estimate under the null, but quite difficult to estimate under the alternative. Thus, the LM procedure is a convenient test procedure. The practical procedure consists of the following steps:

1. Estimate the model with OLS and obtain the estimated residuals.


where $b_i \simeq (\beta_{(H_1)i} - \beta_{(H_0)i})$ is needed to correct for the error in the OLS estimates, $\beta_{(H_0)i}$, when the alternative is true. (Compare with the result (3.4) in Chapter 3.) This error can be explained by the fact that if we had estimated the true model under the alternative hypothesis, i.e. with a model which properly accounted for the heteroscedasticity in the error term, then the correctly estimated coefficients, $\beta_{(H_1)i}$, would have deviated from the OLS estimates, $\beta_{(H_0)i}$, which were based on the wrong assumption of homoscedastic errors. Even if the OLS estimates are inefficient when the errors are heteroscedastic, they need not be biased and, depending on the assumed type of heteroscedasticity $h(z_t'\alpha)$, they often are not. In the latter case the estimation error, $b_i$, does not behave in a systematic way. For example, if we simulated a model with heteroscedastic errors a large number of times, then the average of the estimated corrections $b_i$ would be approximately zero. This does not exclude the possibility that the correction $b_i$ in one specific model run can be (and often is) very large.

3. Calculate the LM test statistic, $T \cdot R^2$, where $R^2$ is a measure of goodness of fit in the auxiliary regression, which under the null is distributed as $\chi^2(m)$, or, alternatively:

4. Calculate the F-form of the LM test:
\[ H(m, T-k-m-1) = \frac{T-2k-m}{m}\cdot\frac{R^2}{1-R^2}, \]
where $k$ is the number of explanatory variables including the constant. The LM test is then distributed as $F(m, T-k-m-1)$.

In the Breusch-Pagan test $z_t' = [x_{1,t},\ldots,x_{k,t}, x_{1,t}^2,\ldots,x_{k,t}^2]$, that is, $\hat{\varepsilon}_t^2$ is regressed on the explanatory variables and their squares, and in the White test $\hat{\varepsilon}_t^2$ is additionally regressed on their cross-products.

If heteroscedasticity is detected by these tests it is in most cases a sign of model misspecification, rather than a true property of the error term.
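A minimal sketch of the LM procedure described above, in its Breusch-Pagan variant (the data-generating process and the function name are illustrative): the squared OLS residuals are regressed on the regressors and their squares, and $T \cdot R^2$ from this auxiliary regression is compared with a $\chi^2(m)$ critical value.

```python
import numpy as np

# A minimal sketch of the Breusch-Pagan variant of the LM procedure above: the
# squared OLS residuals are regressed on the regressors and their squares, and
# T*R^2 from this auxiliary regression is chi^2(m) under homoscedasticity.
def lm_het_test(y, X):
    T, k = X.shape
    b = np.linalg.solve(X.T @ X, X.T @ y)
    e2 = (y - X @ b) ** 2                       # squared OLS residuals
    Z = np.column_stack([X, X[:, 1:] ** 2])     # regressors and their squares
    g = np.linalg.solve(Z.T @ Z, Z.T @ e2)
    u = e2 - Z @ g
    R2 = 1.0 - (u @ u) / ((e2 - e2.mean()) ** 2).sum()
    m = Z.shape[1] - 1                          # auxiliary regressors excl. constant
    return T * R2, m                            # compare T*R^2 with chi^2(m)

rng = np.random.default_rng(7)
T = 300
x = rng.uniform(1.0, 5.0, T)
X = np.column_stack([np.ones(T), x])
y = 1.0 + 0.3 * x + rng.normal(scale=0.5 * x)
print(lm_het_test(y, X))
```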

    4.2.4 Non-Normality


standard test procedures and caution is needed. It is always a good idea to plot the residuals to get a first visual impression of possible deviations from the normality assumption. If some of the residuals fall outside the interval $\pm 3\hat{\sigma}_\varepsilon$, it might be an indication that an intervention or similar extraordinary event has taken place in that time period. A careful examination of the economic calendar is indispensable when judging whether a big residual is an innovation outlier or only an extraordinarily big residual. The former often needs to be accounted for using a dummy variable, as it corresponds to an actual institutional event, whereas the latter can just be the result of a misspecified model and may disappear when the problem has been corrected.

Similarly, it is useful to plot the frequency distribution of the estimated residuals together with the frequency distribution of a normal random variable. Any large deviations from the normal distribution can then be seen by inspection. Another way of illustrating the deviation of the residuals from the normal distribution is by plotting the QQ graph. In this case normally distributed residuals will lie on the 45° straight line and deviations from this line indicate deviations from normality.

    4.2.5 Skewness and Excess Kurtosis

The deviations from normality can be described by calculating the third and fourth moments of a standardized variable. Skewness yields information about the extent to which a distribution is skewed away from the mean, and kurtosis yields information about the extent to which the distribution has fat tails.

For a normal standardized variable the third central moment, skewness, is defined as:
\[ SK = E\left[\left(\frac{x_t - \mu_x}{\sigma_x}\right)^3\right] \]
and the fourth central moment, kurtosis, as:
\[ K = E\left[\left(\frac{x_t - \mu_x}{\sigma_x}\right)^4\right] \]


can be considered asymptotically $\chi^2(2)$. However, in small and even moderately large samples, the estimated test statistic generally does not follow the $\chi^2$ distribution very closely. This has led to various small-sample corrections of the normality test. As will be discussed below, PcGive reports the asymptotic $\chi^2$ test as well as a small-sample corrected test. The derivation of the latter is quite complicated and we will, therefore, only report the calculations of the asymptotic test statistics here.

Based on a sample of $T$ observations on the normal variable $x_t$, skewness is calculated as:
\[ \widehat{SK} = \frac{1}{T}\sum_{t=1}^{T}\left(\frac{x_t - \bar{x}}{\hat{\sigma}_x}\right)^3 \]
and kurtosis as:
\[ \widehat{K} = \frac{1}{T}\sum_{t=1}^{T}\left(\frac{x_t - \bar{x}}{\hat{\sigma}_x}\right)^4 \]
where the estimated variance of $x_t$ is:
\[ \hat{\sigma}^2_x = \frac{1}{T}\sum_{t=1}^{T}(x_t - \bar{x})^2. \]
The estimated skewness is asymptotically distributed as:
\[ \widehat{SK} \overset{as}{\sim} N\!\left(0, \frac{6}{T}\right) \]
and kurtosis as:
\[ \widehat{K} \overset{as}{\sim} N\!\left(3, \frac{24}{T}\right). \]
The estimated residuals from an OLS regression model sum to zero by construction, given that the model contains a constant term. This can be seen from the OLS result:
\[ X'e = X'(I - X(X'X)^{-1}X')y = 0 \]


Since the first element of $X'e = 0$ corresponds to $\sum_t e_t = 0$, the residuals from an OLS regression model with a constant will always have a zero mean and a zero sample correlation with the explanatory variables.

Using the result that $\bar{e} = 0$, the skewness of the $T$ estimated residuals from an OLS regression model with $k$ explanatory variables and a constant is calculated as:

\[ \widehat{SK} = \frac{1}{T}\sum_{t=1}^{T}\left(\frac{e_t}{\hat{\sigma}_\varepsilon}\right)^3 = \frac{1}{\hat{\sigma}_\varepsilon^3}\left(\frac{1}{T}\sum_{t=1}^{T} e_t^3\right) \]
and the kurtosis as:
\[ \widehat{K} = \frac{1}{T}\sum_{t=1}^{T}\left(\frac{e_t}{\hat{\sigma}_\varepsilon}\right)^4 = \frac{1}{\hat{\sigma}_\varepsilon^4}\left(\frac{1}{T}\sum_{t=1}^{T} e_t^4\right), \]
where the estimated error variance is:
\[ \hat{\sigma}_\varepsilon^2 = \frac{1}{T}\sum_{t=1}^{T} e_t^2. \]
For a reasonably large sample the estimated residual skewness is distributed as:
\[ \widehat{SK} \overset{as}{\sim} N\!\left(0, \frac{6}{T}\right) \]
and the kurtosis as:
\[ \widehat{K} \overset{as}{\sim} N\!\left(3, \frac{24}{T}\right). \]
Instead of kurtosis it is often customary to report excess kurtosis, EK, which is
\[ EK = K - 3. \]

    4.2.6 The Jarque-Bera Normality Test


The test is asymptotically $\chi^2$ with 2 degrees of freedom, that is,
\[ JB_A \overset{as}{\sim} \chi^2(2). \]
This is the normality test in PcGive called the asymptotic test. If the null hypothesis of normally distributed residuals is correct, then the test statistic should be small (for example, less than $\chi^2_{0.95}(2) = 5.99$), otherwise normality is rejected. However, simulation studies have shown that even in moderately large samples the asymptotic test results are often not very good. This is partly due to the fact that we use the estimated instead of the true error variance $\sigma^2_\varepsilon$. A degrees-of-freedom corrected test is calculated as:
\[ JB = \frac{T-k}{T}\, JB_A. \]
However, another problem with the Jarque-Bera test is that in finite samples the skewness and the kurtosis are generally not uncorrelated, which is a violation of the independence assumption underlying the $\chi^2$ distribution. The second test statistic in PcGive contains a degrees-of-freedom correction as well as a correction for the sample dependence between skewness and kurtosis.

Jarque-Bera type normality tests are often standard output in regression packages. It is often hard to know exactly how they have been calculated. If in doubt, one possibility, though a time consuming one, is to try to replicate the results.
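One way to replicate such output is to compute the asymptotic statistic directly from the residual moments defined in Section 4.2.5. The sketch below uses the standard asymptotic Jarque-Bera form $JB_A = T(SK^2/6 + EK^2/24)$; whether a particular package applies further small-sample corrections has to be checked against its documentation.

```python
import numpy as np

# A minimal sketch of the asymptotic Jarque-Bera statistic built from the residual
# skewness and excess kurtosis: JB_A = T * (SK^2/6 + EK^2/24), compared with chi^2(2).
def jarque_bera(e):
    T = e.size
    sigma2 = (e ** 2).mean()                    # residual variance with a 1/T divisor
    SK = (e ** 3).mean() / sigma2 ** 1.5        # skewness
    EK = (e ** 4).mean() / sigma2 ** 2 - 3.0    # excess kurtosis
    JB = T * (SK ** 2 / 6.0 + EK ** 2 / 24.0)
    return SK, EK, JB                           # reject normality if JB > 5.99 (5% level)

e = np.random.default_rng(8).normal(size=200)   # residuals from a well-specified model
print(jarque_bera(e))
```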

Econometrics 2. December 10, 2003

Monte Carlo Simulations

Many results in econometrics are asymptotic, i.e. for $T \to \infty$.

1. Often difficult to get a firm understanding of the results:


An Introduction to Monte Carlo Simulations and PcNaive
Heino Bohn Nielsen
Institute of Economics, University of Copenhagen


How should we think of repeated sampling? What is the exact meaning of a central limit theorem?

2. How does a given estimator work in finite samples?

To answer these questions, simulation methods are often useful:
Get some intuition for asymptotic results.
Graphical representation of convergence, uncertainty etc.
Analyze finite sample properties (often very difficult analytically).


Outline of the Lecture

1. The basic idea in Monte Carlo simulations.
2. Example 1: Sample mean (OLS) of iid normals.
3. Introduction to PcNaive.
4. Example 2: OLS in an AR(1) model: consistency of OLS; finite sample bias.
5. Example 3: Simultaneous equation model: inconsistency of OLS; consistency of IV; the idea of strong and weak instruments.


The Monte Carlo Idea

The Monte Carlo method replaces a difficult deterministic problem with a stochastic problem with the same solution (e.g. due to the LLN). If we can solve the stochastic problem by simulations, labour-intensive work can be replaced by cheap capital-intensive simulations.

What is the probability of success in the game Solitaire? Very difficult analytical problem. But a machine could play M times, and for $M \to \infty$ we could estimate the probability.

What is the finite sample distribution of an estimator? Very difficult in most situations. We could generate M samples and look at the empirical distribution of the estimates.


Note of Caution

The Monte Carlo method is a useful tool in econometrics. But:


Simulations do not replace (asymptotic) theory.
Simulations can illustrate but not prove theorems.
Simulation results are not general: results are specific to the chosen setup and work like good examples.


Ex. 1: Mean of iid Normals

Consider the model
\[ y_t = \mu + \varepsilon_t, \quad \varepsilon_t \sim N(0, \sigma^2), \quad t = 1, 2, \ldots, T. \tag{1} \]
The OLS (and ML) estimator $\hat{\mu}$ of $\mu$ is the sample mean
\[ \hat{\mu} = T^{-1}\sum_{t=1}^{T} y_t. \]
Note that $\hat{\mu}$ is consistent, unbiased, and (exactly) normally distributed, $\hat{\mu} \sim N(\mu, T^{-1}\sigma^2)$.

The standard deviation of the estimate can be calculated as
\[ \widehat{se}(\hat{\mu}) = \sqrt{T^{-1}\hat{\sigma}^2}, \qquad \hat{\sigma}^2 = T^{-1}\sum_{t=1}^{T}(y_t - \hat{\mu})^2. \]
We call this the estimated standard error (ESE).


Ex. 1 (cont.): Illustration by Simulation

We can illustrate the results if we can generate data from (1). We need:

1. A fully specified Data Generating Process (DGP), e.g.
\[ y_t = \mu + \varepsilon_t, \quad \varepsilon_t \sim N(0, \sigma^2), \quad t = 1, 2, \ldots, T \tag{2} \]
with $\mu = 5$ and $\sigma^2 = 1$; an algorithm for drawing random numbers from $N(\mu, \sigma^2)$; and a sample length, e.g. $T = 50$.

2. An estimation model for $y_t$ and an estimator. Consider OLS in
\[ y_t = \mu + u_t. \tag{3} \]
Note that the statistical model (3) and the DGP (2) need not coincide.

    7 of 18

Ex. 1 (cont.): Four Realizations
Suppose we draw ε_1, ..., ε_50 from N(0, 1) and construct a data set, y_1, ..., y_50, from (2). We then estimate the model y_t = μ + u_t using OLS to obtain the sample mean and the standard deviation in one realization, μ̂ = 4.9528, ESE(μ̂) = 0.1477. We can look at more realizations:

   Realization      μ̂         ESE(μ̂)
   1              4.98013     0.1477
   2              5.04104     0.1320
   3              4.99815     0.1479
   4              4.82347     0.1504
   Mean           4.96070     0.1445
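The whole experiment is easy to mimic outside PcNaive. The numpy sketch below (the seed and M = 10,000 replications are illustrative choices, not part of the original setup) generates data from the DGP (2), estimates model (3) by OLS in each replication, and collects the estimates and their ESEs:

```python
import numpy as np

rng = np.random.default_rng(42)
mu, sigma, T, M = 5.0, 1.0, 50, 10_000        # DGP parameters and replications

estimates = np.empty(M)
eses = np.empty(M)
for m in range(M):
    y = mu + sigma * rng.normal(size=T)       # one sample from the DGP (2)
    mu_hat = y.mean()                         # OLS/ML estimate in model (3)
    s2 = np.sum((y - mu_hat) ** 2) / (T - 1)  # residual variance
    estimates[m] = mu_hat
    eses[m] = np.sqrt(s2 / T)                 # estimated standard error (ESE)

print("mean of mu_hat :", estimates.mean())   # close to 5
print("MC std of mu_hat:", estimates.std())   # close to 1/sqrt(50) ~ 0.141
print("mean ESE        :", eses.mean())
```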


PcNaive
PcNaive is a menu-driven module in GiveWin. Technically, PcNaive generates Ox code, which is then executed by Ox. Output is returned in GiveWin.
Idea:
1. Set up the DGP (AR(1), Static, or PcNaive General).
2. Specify the estimation model.
3. Choose estimators and test statistics to analyze.
4. Set specifications: M, T, etc.
5. Select output to generate.
6. Save and run.

Ex. 2: OLS in an AR(1)
Consider the DGP

   y_t = 0.9·y_{t−1} + ε_t,   ε_t ∼ N(0, 1),   t = 1, 2, ..., T.

We specify y_0 = 0 and discard 20 observations. The estimation model is given by

   y_t = θ·y_{t−1} + u_t,

and we estimate θ with OLS. Note that θ̂ is consistent but biased.

   T      MEAN     BIAS     MCSE
   5      0.7590   0.1410   0.00475
   10     0.7949   0.1051   0.00310
   25     0.8410   0.0590   0.00172
   50     0.8673   0.0327   0.00108
   75     0.8779   0.0221   0.00082
   100    0.8833   0.0167   0.00069
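Numbers of this kind can be reproduced without PcNaive along the lines of the numpy sketch below (the number of replications is an illustrative choice; the burn-in of 20 observations mirrors the setup above):

```python
import numpy as np

def ols_ar1_mean(T, M=10_000, theta=0.9, burn=20, seed=1):
    """Monte Carlo mean of the OLS estimator of theta in y_t = theta*y_{t-1} + eps_t,
    with y_0 = 0 and `burn` initial observations discarded."""
    rng = np.random.default_rng(seed)
    est = np.empty(M)
    for m in range(M):
        eps = rng.normal(size=burn + T)
        y = np.empty(burn + T + 1)
        y[0] = 0.0
        for t in range(1, burn + T + 1):
            y[t] = theta * y[t - 1] + eps[t - 1]
        y = y[burn:]                                       # last T+1 observations
        est[m] = np.sum(y[1:] * y[:-1]) / np.sum(y[:-1] ** 2)   # OLS without constant
    return est.mean()

for T in (5, 10, 25, 50, 75, 100):
    m = ols_ar1_mean(T)
    print(T, round(m, 4), round(0.9 - m, 4))   # compare with MEAN and BIAS above
```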


Ex. 3: Simultaneity Bias
Consider as a DGP the system of equations

   y_{a,t} = 0.5·y_{b,t} + 0.75·y_{a,t−1} + ε_{1t}
   y_{b,t} = 0.8·z_t + ε_{2t}
   z_t     = 0.75·z_{t−1} + ε_{3t}

for t = 1, 2, ..., 50, where

   (ε_{1t}, ε_{2t}, ε_{3t})′ ∼ N(0, Ω),   Ω = [ 1    0.2   0
                                               0.2   1    0
                                               0     0    1 ].

We set y_{a,0} = z_0 = 0 and discard 20 observations. Consider the OLS estimator in the estimation equation

   y_{a,t} = β_0 + β_1·y_{b,t} + β_2·y_{a,t−1} + u_t.

Note that cov(ε_{1t}, ε_{2t}) ≠ 0 so that OLS is inconsistent. z_t is a valid instrument for y_{b,t} and IV is consistent.
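The OLS/IV comparison can be simulated along the following lines (a numpy sketch; the y_b and z equations are coded without intercepts, as read from the system above, and the IV estimator is the just-identified (Z′X)^{−1}Z′y with z_t instrumenting y_{b,t}):

```python
import numpy as np

rng = np.random.default_rng(7)
T, burn, M = 50, 20, 5_000
omega = np.array([[1.0, 0.2, 0.0],
                  [0.2, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])           # covariance of (eps1, eps2, eps3)

ols_b1, iv_b1 = np.empty(M), np.empty(M)
for m in range(M):
    eps = rng.multivariate_normal(np.zeros(3), omega, size=burn + T)
    ya = np.zeros(burn + T + 1)
    yb = np.zeros(burn + T + 1)
    z = np.zeros(burn + T + 1)
    for t in range(1, burn + T + 1):
        z[t] = 0.75 * z[t - 1] + eps[t - 1, 2]
        yb[t] = 0.8 * z[t] + eps[t - 1, 1]
        ya[t] = 0.5 * yb[t] + 0.75 * ya[t - 1] + eps[t - 1, 0]
    s = slice(burn + 1, burn + T + 1)                         # estimation sample
    X = np.column_stack([np.ones(T), yb[s], ya[burn:burn + T]])   # 1, yb_t, ya_{t-1}
    y = ya[s]
    ols_b1[m] = np.linalg.lstsq(X, y, rcond=None)[0][1]
    Z = np.column_stack([np.ones(T), z[s], ya[burn:burn + T]])    # instruments
    iv_b1[m] = np.linalg.solve(Z.T @ X, Z.T @ y)[1]

print("OLS mean of b1:", ols_b1.mean())   # biased away from 0.5
print("IV  mean of b1:", iv_b1.mean())    # close to 0.5
```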


Ex. 3: Results, OLS vs. IV: T = 50
[Figure: Monte Carlo distributions of the estimated y_b coefficient; panels: No Simultaneity, OLS; Simultaneity, OLS; Simultaneity, IV.]

Ex. 3: Results, OLS vs. IV: T = 500
[Figure: panels: No simultaneity, OLS; Simultaneity, OLS; Simultaneity, IV.]

Ex. 3: Results, Strength of Instruments
[Figure: panels: IV, strong instrument (Yb = 1.30*Z+e); IV, medium instrument (Yb = 0.80*Z+e); IV, weak instrument (Yb = 0.30*Z+e).]


    Chapter 5

Autocorrelation and Lagged Variables

In the previous chapter we demonstrated that inference in the linear regression model (for example using t-tests and F-tests) can be totally misleading if the error term is not independent of previous errors, i.e. if the error term is autocorrelated. Therefore, it is very important always to test whether the residuals can be assumed to be autocorrelated or not. Such test procedures will be discussed in Section 5.1. Since economic time-series data as a rule are strongly time-dependent, the errors ε_t in the static linear regression model, y_t = β′x_t + ε_t, are almost always autocorrelated. In this case there are two possibilities: (i) one can adjust the OLS estimates by correcting for the detected autocorrelation or (ii) one can reformulate the static regression model as a dynamic regression model. In most cases the latter procedure is likely to yield much better results. Section 5.2 demonstrates how this can be done using lagged variables.

5.1 Autoregressive Residuals

The assumption of independent errors, E(ε_t·ε_s) = 0 for t ≠ s, is seldom satisfied when a static regression model is estimated on time-series data,


i.e. y_t = β′x_t + u_t.

5.1.1 The Autocorrelation and Partial Autocorrelation Function

The combined residual autocorrelation function (autocorrelogram) and the partial autocorrelation function provide a first description of the pattern of time dependence. The correlation coefficient between û_t and û_{t−h} is calculated as

   r_h = [ Σ_{t=h+1}^{T} û_t·û_{t−h} ] / [ Σ_{t=h+1}^{T} û_t² · Σ_{t=h+1}^{T} û_{t−h}² ]^{1/2} = Côv(u_t, u_{t−h}) / (σ̂_{u_t}·σ̂_{u_{t−h}}).

If the residuals are independent over time, then the true autocorrelation coefficients are zero and the estimated values should be small. As a rule of thumb a correlation coefficient is significant if |r_h| > 2·(1/√T), where 1/√T is the approximate standard error for r_h when the true value is zero, i.e. when the errors are uncorrelated at lag h. The autocorrelogram, or the autocorrelation function, consists of r_h for h = 1, 2, .... Plotting the autocorrelation function against the lag, h, gives a first visual impression of the magnitude of the autocorrelation problem. The autocorrelogram provides information about simple correlation coefficients and, when the errors are strongly time dependent, the autocorrelations tend to be fairly large even for large values of h. The residual autoregression of order h is defined from

   û_t = ρ_{0,h} + ρ_{1,h}·û_{t−1} + ... + ρ_{h,h}·û_{t−h} + ε_t,   t = h+1, h+2, ..., T,

and yields the multiple correlation coefficient ρ_{i,h} between the residual at time t and t−i. One can say that it differs from the simple autocorrelation coefficient, r_i, in the same way as a multiple regression coefficient differs from a single regression coefficient.


Table 5.1: Residual autocorrelations and autoregressions

   h              1      2      3      4      5      6      7      8
   The residual autocorrelogram
   r_h          0.69   0.57   0.42   0.31   0.19   0.15   0.06  -0.08
   The residual autoregressions
   ρ_{i,8}      0.48   0.33   0.01  -0.02  -0.16   0.08   0.10  -0.21
   ρ_{i,7}      0.49   0.32   0.03  -0.02  -0.15   0.03  -0.02
   ρ_{i,6}      0.48   0.32   0.03  -0.02  -0.16  -0.02
   ρ_{i,5}      0.48   0.32   0.03  -0.01  -0.14
   ρ_{i,4}      0.49   0.33  -0.01  -0.09
   ρ_{i,3}      0.50   0.30  -0.05
   ρ_{i,2}      0.49   0.28
   ρ_{i,1}      0.68
   The partial autocorrelogram
   ρ_{i,P}      0.68   0.28  -0.05  -0.09  -0.14  -0.02  -0.02  -0.21

Note that the sample estimates can depend on how we treat the first lagged initial values. There are essentially three possibilities: (1) we do all calculations on the same sample t = 1, 2, ..., T and replace the initial values û_1, ..., û_h with zero; (2) we do all calculations on the same sample t = h+1, h+2, ..., T; and (3) we estimate ρ_{1,1} on the sample t = 2, 3, ..., T, ρ_{2,2} on the sample t = 3, 4, ..., T, and so on. The autoregressions and the partial autocorrelogram in Table 5.1 are based on alternative (2), which does not exactly reproduce the output from PcGive.

By running an OLS regression in PcGive and then selecting the test, one can obtain the estimated residual autocorrelation and partial autocorrelation function for a user-defined lag length h.

As an illustration, the residual correlogram and the residual autoregression in Table 5.1, with lag length 8, are calculated for the residuals from regressing real aggregate consumption on real domestic demand and real wealth in Denmark 1973:1–2003:1.


The residuals are thus derived from a misspecified model in the sense that the errors were supposed to be uncorrelated, but obviously were not. When testing for the significance of the residual autoregressions we need to account for this fact, as will be illustrated below.
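If one wants to replicate such numbers, the residual autocorrelations and the residual autoregression of order h can be computed from a vector of OLS residuals along the lines of the numpy sketch below (it uses alternative (2) above, so the output need not match PcGive exactly):

```python
import numpy as np

def autocorrelogram(u, H):
    """Residual autocorrelations r_1,...,r_H, with all sums over t = h+1,...,T."""
    u = np.asarray(u, dtype=float)
    r = []
    for h in range(1, H + 1):
        num = np.sum(u[h:] * u[:-h])
        den = np.sqrt(np.sum(u[h:] ** 2) * np.sum(u[:-h] ** 2))
        r.append(num / den)
    return np.array(r)

def residual_autoregression(u, h):
    """OLS coefficients rho_{1,h},...,rho_{h,h} of the residual autoregression
    of order h; the last coefficient is the h-th partial autocorrelation."""
    u = np.asarray(u, dtype=float)
    T = u.size
    X = np.column_stack([np.ones(T - h)] +
                        [u[h - i:T - i] for i in range(1, h + 1)])
    coefs = np.linalg.lstsq(X, u[h:], rcond=None)[0]
    return coefs[1:]                       # drop the intercept
```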

5.1.2 An LM Test for Residual Autoregressive Errors

When testing the significance of the residual autocorrelations, we need to specify a model describing the dependence of u_t on its lagged values, against which we can test the null hypothesis of no autocorrelation. Although it is possible to choose between various kinds of ARMA (AutoRegressive-Moving Average) models, the simple AR(m) model is by far the most popular in empirical applications. The reason is, besides its simplicity, that the test procedures for autocorrelated errors have been shown to work well for the AR model, independently of whether the true model is an ARMA. We will, therefore, only discuss the AR model here and leave the more detailed treatment of ARMA models to the end of the course. We specify the regression model with autoregressive residuals as:

   y_t = β′x_t + u_t,   (5.1)

where

   u_t = ρ_1·u_{t−1} + ... + ρ_m·u_{t−m} + ε_t.   (5.2)

The null hypothesis is

   H_0: ρ_1 = ... = ρ_m = 0,

which corresponds to the joint insignificance of the autoregression coefficients, and, thus, to the absence of autocorrelation, whereas the alternative hypothesis is that at least one ρ_i ≠ 0. To estimate (5.1) subject to the restriction (5.2) is a highly nonlinear problem. Therefore, the type of test procedure to apply is obviously the LM test procedure. It has been shown that this amounts to running the usual auxiliary regression:


the OLS residuals û_t are regressed on x_t and their own lags û_{t−1}, ..., û_{t−m}. The original regressors x_t must be included to correct for any bias in the original OLS regression due to misspecification of the error term.

From the auxiliary regression we can calculate the LM test statistic:

   LM(m) = T·R²_aux,

which is distributed as χ²(m), or, alternatively, the F-form:

   F(m, T−k−m−1) = (ESS_aux/m) / (RSS_aux/(T−k−m−1)) = [(T−k−m−1)/m]·[R²_aux/(1−R²_aux)],

which is distributed as F(m, T−k−m−1). When deciding about the lag length m in (5.2) one can use the results from the partial autocorrelation function as a coarse guide. Because the residuals are likely to have been influenced by the misspecification bias, it is advisable not to take the results too literally.

As an illustration we calculate the LM test assuming a residual autoregressive process of order five. The estimated coefficients ρ̂_1, ..., ρ̂_5 are given below with their standard errors:

   i            1      2      3      4      5
   ρ̂_i        0.58   0.23   0.03   0.00   0.06
   σ_{ρ̂_i}    0.09   0.11   0.11   0.11   0.09

It appears that the estimated coefficients differ quite substantially from the estimated partial autocorrelation function, and the third autoregressive coefficient is, in fact, more significant than we might have expected. Based on RSS_aux = 0.0302818 and RSS_OLS = 0.06112145 we can derive the LM test statistics, LM(5) = 61.05 [0.0000], and the F-form, F(5, 113) = 23.16 [0.0000].
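The calculations can be replicated with a short auxiliary-regression routine such as the numpy sketch below (the pre-sample lagged residuals are set to zero here, which is one of the treatments discussed in Section 5.1.1, so small numerical differences from PcGive are to be expected):

```python
import numpy as np

def lm_autocorrelation_test(y, X, m):
    """Breusch-Godfrey style LM test for AR(m) residuals: regress the OLS
    residuals on X and m lagged residuals (pre-sample lags set to zero) and
    form T*R^2 and the corresponding F statistic. X should include the constant."""
    y = np.asarray(y, float)
    X = np.asarray(X, float)
    T, k = X.shape
    u = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]          # OLS residuals
    lags = np.column_stack([np.r_[np.zeros(i), u[:-i]] for i in range(1, m + 1)])
    Xaux = np.column_stack([X, lags])
    uaux = u - Xaux @ np.linalg.lstsq(Xaux, u, rcond=None)[0]
    rss_aux, rss_ols = np.sum(uaux ** 2), np.sum(u ** 2)
    r2 = 1.0 - rss_aux / rss_ols
    lm = T * r2                                                # approx. chi^2(m)
    f = (T - k - m - 1) / m * r2 / (1.0 - r2)                  # approx. F(m, T-k-m-1)
    return lm, f
```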

Because the LM test is a joint test of all the coefficients, the power of the test will decrease if we choose m too large. Therefore, the choice of m is a compromise between not including too many insignificant coefficients, but at the same time not excluding any significant coefficients.

5.1.3 The Durbin-Watson Test for AR(1) Residuals

A commonly used test for detecting first-order autocorrelation in the residuals is the Durbin-Watson test.

5.2 Lagged Variables in the Linear Regression Model

Consider the static regression model with first-order autoregressive errors:

   y_t = β_0 + β_1·x_t + u_t,   and
   u_t = ρ·u_{t−1} + ε_t,   ε_t ∼ N(0, σ²_ε).

The parameter ρ is the coefficient of first-order autocorrelation, and ε_t is a normal independent error, also called a white-noise process (a term borrowed from engineering).

Noting that

   u_t = y_t − β_0 − β_1·x_t,

we can replace u_t to obtain

   y_t − β_0 − β_1·x_t = ρ·(y_{t−1} − β_0 − β_1·x_{t−1}) + ε_t

or,

   y_t = ρ·y_{t−1} + β_1·x_t − ρβ_1·x_{t−1} + (1 − ρ)·β_0 + ε_t
   (1 − ρL)·y_t = β_1·(1 − ρL)·x_t + (1 − ρ)·β_0 + ε_t
   y_t = a_1·y_{t−1} + b_0·x_t + b_1·x_{t−1} + c_0 + ε_t,   (5.3)

where the lag operator L is defined by L^m·x_t = x_{t−m}, a_1 = ρ, b_0 = β_1, b_1 = −ρβ_1, and c_0 = (1 − ρ)·β_0. The middle row of (5.3) illustrates what is often called the common factor dynamic regression model. Note that the GLS method of correcting for residual autocorrelation is based on the validity of the common factor model. Before using GLS one should, therefore, first test the validity of the common factor restrictions, which can be done directly with the common factor test in PcGive.

By re-specifying the model with a lagged dependent variable and a lag of the explanatory variable, as given by the third row of (5.3), we obtain a model whose residuals often are much closer to the OLS assumptions and a model that more faithfully reflects the systematic components of the data generating process.
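The point is easily illustrated by simulation. In the numpy sketch below (illustrative parameter values, not the Danish data), data are generated from the common factor model; the static regression leaves residuals with first-order autocorrelation close to ρ, while the dynamic regression in the third row of (5.3) produces residuals with autocorrelation close to zero:

```python
import numpy as np

rng = np.random.default_rng(3)
T, beta0, beta1, rho = 200, 1.0, 0.5, 0.8

x = np.empty(T); u = np.empty(T)
x[0] = rng.normal(); u[0] = rng.normal()
for t in range(1, T):
    x[t] = 0.7 * x[t - 1] + rng.normal()      # a persistent regressor
    u[t] = rho * u[t - 1] + rng.normal()      # AR(1) error
y = beta0 + beta1 * x + u

def first_autocorr(res):
    return np.sum(res[1:] * res[:-1]) / np.sum(res ** 2)

# (i) static regression: residuals inherit the autocorrelation of u_t
Xs = np.column_stack([np.ones(T), x])
rs = y - Xs @ np.linalg.lstsq(Xs, y, rcond=None)[0]

# (ii) dynamic regression y_t = a1*y_{t-1} + b0*x_t + b1*x_{t-1} + c0 + e_t
Xd = np.column_stack([np.ones(T - 1), y[:-1], x[1:], x[:-1]])
rd = y[1:] - Xd @ np.linalg.lstsq(Xd, y[1:], rcond=None)[0]

print("r_1, static regression :", round(first_autocorr(rs), 2))   # close to rho
print("r_1, dynamic regression:", round(first_autocorr(rd), 2))   # close to 0
```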

The general dynamic model is specified with p lags of y_t and q lags of x_t.


If p = 1, q = 1 we obtain the simple dynamic linear model discussed above:

   (1 − a_1L)·y_t = (b_0 + b_1L)·x_t + c_0 + ε_t
   y_t = a_1·y_{t−1} + b_0·x_t + b_1·x_{t−1} + c_0 + ε_t,   t = 2, ..., T,   ε_t ∼ N(0, σ²).

Let

   x_t = (y_{t−1}, x_t, x_{t−1}, 1)′   and   β = (a_1, b_0, b_1, c_0)′.

The OLS estimate of β in y = Xβ + ε is

   β̂_OLS = (X′X)^{−1}X′y
         = (X′X)^{−1}X′(Xβ + ε)
         = (X′X)^{−1}X′Xβ + (X′X)^{−1}X′ε,

so that E(β̂_OLS) = β if E(X′ε) = 0.   (5.5)

The condition E(X′ε) = 0 is generally satisfied if E(εε′) = σ²I, i.e. if ε_t is uncorrelated with ε_{t−h}. Therefore, if the residuals from the dynamic linear model are well-behaved, i.e.

   E(εε′) = σ²I,

the OLS estimates correspond to the maximum likelihood estimates. Therefore, the parameters of a linear model with lagged (dependent and/or explanatory) variables can very well be estimated with OLS, which renders unbiased estimates as long as the residuals are not autocorrelated. However, if the explanatory variables x_t are not stationary, cf. the definition in KJ: Chapter 1, inference can be improved by properly accounting for the nonstationarity.

    This will be discussed in subsequent chapters.

5.2.1 Deriving the Steady-State Solution


The long-run steady-state solution defines the hypothetical value towards which the process would converge if one could switch off the errors. Thus, the solution is found by solving the dynamic model for y_t = y_{t−1} and x_t = x_{t−1}, i.e. by setting L = 1. For model (5.6) this yields the steady-state solution:

   (1 − a_1)·y = (b_0 + b_1)·x + c_0
   y = [(b_0 + b_1)/(1 − a_1)]·x + c_0/(1 − a_1),

i.e.

   y = β_1·x + β_0,   (5.7)

where β_1 = (b_0 + b_1)/(1 − a_1) and β_0 = c_0/(1 − a_1). Using (5.7) it is now easy to see that the steady-state solution of the dynamic linear model is closely related to the static linear model:

   y_t = β_0 + β_1·x_t + u_t,   t = 1, ..., T,   (5.8)

where u_t is an autocorrelated process.
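As a small worked example (with hypothetical coefficient values chosen purely for illustration), the long-run coefficients in (5.7) follow directly from the estimated dynamic coefficients:

```python
# hypothetical estimates of the dynamic model coefficients
a1, b0, b1, c0 = 0.8, 0.3, -0.1, 0.5

beta1 = (b0 + b1) / (1 - a1)   # long-run effect of x on y: here 1.0
beta0 = c0 / (1 - a1)          # long-run level: here 2.5
print(beta0, beta1)
```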

5.2.2 The Dynamic Linear Model in Equilibrium-Correction Form

It is often a good idea to reformulate the dynamic linear model in the equilibrium-correction form.


Chapter 6

Testing for Structural Change


The available test procedures are not very good at finding the time points when the break happens, partly because estimated model parameters can be unstable as a result of other types of model misspecification. For example, the correct model may be a loglinear model, but we have used levels of the variables, or we estimated a static instead of a dynamic model.

6.1 Tests of a structural break at a known time point

In this case the sample is split into two sub-samples, one consisting of the observations up to the suspected break, t = 1, ..., T_1, and another sub-sample, t = T_1 + 1, ..., T, containing the rest of the observations, T − T_1 = T_2.

As an illustration we consider the regression model with two explanatory variables and a constant term,

   y_t = β_0 + β_1·x_{1t} + β_2·x_{2t} + ε_t,   t = 1, ..., T,   ε_t ∼ NI(0, σ²).

The null hypothesis is formulated as no change in the model parameters and the alternative as a change in at least one parameter, i.e.:

   H_0: the parameters (β_0, β_1, β_2) are constant over the entire sample period.
   H_1: the parameters (β_0, β_1, β_2) change at time T_1 to (β_0 + δ_0, β_1 + δ_1, β_2 + δ_2), where at least one δ_i ≠ 0, and remain constant until the end of the sample period.

Under the null of no structural change the sum (RSS_{T_1} + RSS_{T_2}) should not differ significantly from the RSS_T obtained from the full-sample model. This is the idea behind the structural-change Chow test:

   F_Chow = [(RSS_T − (RSS_{T_1} + RSS_{T_2}))/k] / [(RSS_{T_1} + RSS_{T_2})/(T − 2k)] ∼ F(k, T − 2k).

The null hypothesis is rejected when F_Chow is large, that is, when (RSS_{T_1} + RSS_{T_2})


is significantly smaller than RSS_T, suggesting the presence of a structural change. The degrees of freedom are k in the numerator, as we test the stability of k parameters (k = 3 in the example above), whereas in the denominator the degrees of freedom are T − 2k, because the model with k regressors is estimated twice.

The Chow test can easily be generalized to more than two sub-periods, corresponding to multiple structural breaks.
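A sketch of the computation in numpy (it assumes that both sub-samples contain more than k observations, so that both sub-sample regressions can be run):

```python
import numpy as np

def chow_break_F(y, X, T1):
    """Chow test for a structural break after observation T1:
    F = [(RSS_T - (RSS_1 + RSS_2)) / k] / [(RSS_1 + RSS_2) / (T - 2k)]."""
    y = np.asarray(y, float)
    X = np.asarray(X, float)
    T, k = X.shape

    def rss(yy, XX):
        b = np.linalg.lstsq(XX, yy, rcond=None)[0]
        e = yy - XX @ b
        return e @ e

    rss_full = rss(y, X)
    rss_split = rss(y[:T1], X[:T1]) + rss(y[T1:], X[T1:])
    F = ((rss_full - rss_split) / k) / (rss_split / (T - 2 * k))
    return F          # compare with the F(k, T - 2k) distribution
```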

A special case arises when one of the sub-periods has the same or a smaller number of observations than the number of parameters in the model, that is, when T_i ≤ k. In this case the OLS estimator cannot be calculated for the smaller sub-sample and the Chow test is calculated as

   F_Chow = [(RSS_T − RSS_{T_1})/T_2] / [RSS_{T_1}/(T_1 − k)] ∼ F(T_2, T_1 − k).

The Chow test is an overall test that does not distinguish between whether all of the parameters have changed or whether only some subset of the parameters has done so. Another, more informative, way of testing for structural change is to use dummy variables to create new split-sample variables in the following way.

First, define an indicator variable (a shift dummy variable)

   D_{T_1} = 0 for t = 1, ..., T_1 − 1   and   D_{T_1} = 1 for t = T_1, ..., T.

Then create the new variables x*_{i,t} = D_{T_1}·x_{i,t}, i = 1, ..., k. For a regression model with two explanatory variables and a constant term the new data matrix will look as follows:

   X_{H_1} = [ X_{T_1}   0
               X_{T_2}   X_{T_2} ]

           =  1   x_{1,1}        x_{2,1}        0   0           0
              .   .              .              .   .           .
              1   x_{1,T_1−1}    x_{2,T_1−1}    0   0           0
              1   x_{1,T_1}      x_{2,T_1}      1   x_{1,T_1}   x_{2,T_1}
              .   .              .              .   .           .
              1   x_{1,T}        x_{2,T}        1   x_{1,T}     x_{2,T}


It is now easy to test the overall hypothesis of parameter change at time T_1 as the joint hypothesis (δ_0, δ_1, δ_2) = 0. If rejected, at least one of the parameters has changed at time T_1. The testing can, for example, be done as a Wald test of Rβ = r, where

   [R, r] =  0, 0, 0, 1, 0, 0, 0
             0, 0, 0, 0, 1, 0, 0
             0, 0, 0, 0, 0, 1, 0.

The advantage of this procedure is that it is easy to test the significance of each coefficient with a Student's t-test, so that we can find out which coefficients have changed, if any. In most cases we have from the outset a prior hypothesis regarding which coefficients are likely to have changed and which are not. For example, if we expect the regime shift to have changed exclusively the constant term in the model, i.e. to have caused a level shift in y_t at time T_1, then we would only add the shift dummy D_{T_1} to the model and test the hypothesis δ_0 = 0. Thus, the dummy procedure allows us to be more specific regarding which parameters we believe have changed.
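A numpy sketch of the dummy-variable version of the test (X is assumed to contain the constant column, so D_{T_1}·X includes the shift dummy itself):

```python
import numpy as np

def break_dummy_F(y, X, T1):
    """Test for a break at T1 with split-sample dummy variables: X is augmented
    with D_{T1}*x_i and the added coefficients are tested jointly."""
    y = np.asarray(y, float)
    X = np.asarray(X, float)
    T, k = X.shape
    D = (np.arange(1, T + 1) >= T1).astype(float)     # 0 before T1, 1 from T1 on
    Xa = np.column_stack([X, X * D[:, None]])         # augmented regressor matrix

    def rss(XX):
        b = np.linalg.lstsq(XX, y, rcond=None)[0]
        e = y - XX @ b
        return e @ e

    F = ((rss(X) - rss(Xa)) / k) / (rss(Xa) / (T - 2 * k))
    # With both sub-samples longer than k this equals the Chow test numerically;
    # t-tests on the individual added coefficients show which parameters changed.
    return F
```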

6.2 Diagnosing Structural Change

The above Chow tests for a structural change have been designed to detect a regime shift in the economy at a specific point in time. Sometimes we do not know for sure whether there have been any regime shifts or not, or we might suspect such shifts to have occurred but do not know when. By the use of recursive methods, a sample can be successively split and the model re-estimated for each consecutive sample, thus providing a method of locating the point in the sample where the evidence for a structural break is the strongest. That said, it is often very difficult to diagnose a structural change based on the recursive graphs. This is so partly because adding one more observation belonging to a new regime may not be enough to produce a significant test statistic, and partly because a significant test statistic need not reflect a genuine regime shift: one has to distinguish between a true structural change on the one hand and the


instability of the parameters that arises from a mis-specified model on the other.

6.2.1 Analysis of 1-Step Forecasts

In most cases the purpose of empirical modelling is partly to obtain a better understanding of the empirical mechanisms that have generated our data, and partly to use this knowledge to obtain better forecasts for future periods. Thus the task is to estimate a good econometric model and then use it to produce a forecast of the variable, i.e. E_t(y_{t+h} | x_{t+h}), for the next period (h = 1), or possibly for several periods ahead (h = 1, 2, ...). In a realistic forecasting situation we do not know the future value of x_{t+h}. Thus, we should first calculate forecasts for the explanatory variables to be able to calculate forecasts for (y_{t+h} | x_{t+h}). This is called dynamic forecasting or ex ante forecasting.

Most people would believe that an econometric model with more explanatory power would generate better forecasts than a model with less explanatory power. Unfortunately this is not always true. In some cases simple models, like y_t = y_{t−1} + ε_t, with essentially no explanatory content, provide better forecasts than sophisticated explanatory models. When this happens it is often an indication either that our econometric model is misspecified or that there has been an unpredictable change in the model parameters over the forecast period. In the latter case the forecast failure can only be remedied ex post, that is, after the event that caused the forecast failure. This is because it is impossible to include information about unpredictable events in our forecast procedure (such events could, for example, be the tearing down of the Berlin Wall).

But if the forecast failure is due to model misspecification, then the remedy is to improve the specification of our model. To find out whether our model is likely to suffer from forecast failure due to misspecification we can perform an ex post forecasting exercise. The idea is the following: assume that we had estimated our present model on a sample t = 1, ..., T_1, where T_1 < T, and used it to forecast E(y_{T_1+h} | x_{T_1+h}), h = 1, ..., T_2, where T_2 = T − T_1.


Ex post forecasts which are systematically incorrect, either too big or too small, or systematically positive or negative, will be interpreted as a sign of model misspecification.

6.2.2 The 1-Step Forecasts

The purpose of this section is to discuss ex post forecast analysis as a means to detect model misspecification. The idea is to divide the sample into an estimation base period, t = 1, ..., T_1, and a forecast period, t = T_1 + 1, ..., T. The 1-step forecasts, ŷ_{T_1+1}, ..., ŷ_T, are calculated based on a model that has been estimated over the base period t = 1, ..., T_1, for given values of x_t, t = T_1 + 1, ..., T, i.e.:

   ŷ_{T_1+i} = x′_{T_1+i}·β̂,   i = 1, ..., T_2,

where T_2 = T − T_1 and β̂ has been estimated on the first T_1 sample observations. The 1-step forecast error, ẽ_t, is defined as the departure of the prediction, ŷ_t, from the actual value, y_t, i.e.:

   ẽ_t = y_t − ŷ_t = y_t − x′_t·β̂,   t = T_1 + 1, ..., T.   (6.1a)

Under the null of a correctly specified model we can substitute the true value of y_t, y_t = x′_t·β + ε_t, for y_t in (6.1a):

   ẽ_t = x′_t·β + ε_t − x′_t·β̂ = x′_t·(β − β̂) + ε_t,

and it appears that the forecast error is made up of two components, the first of which is due to the deviation of the estimated parameters from their true values, (β̂ − β), and the second, ε_t, is due to an (unpredictable) random shock. The variance of the 1-step forecast error is given as

   var(ẽ_t) = x′_t·Var(β̂)·x_t + Var(ε_t) = σ̂²·x′_t(X′X)^{−1}x_t + σ̂² = σ̂²·(1 + x′_t(X′X)^{−1}x_t),

where σ̂² is the estimated within-sample residual variance.


The standardized forecast error is t-distributed with T_1 − k degrees of freedom:

   ẽ_t / se(ẽ_t) ∼ t(T_1 − k).

This provides a test of the significance of an individual forecast error. The 1-step-ahead forecasts can be evaluated graphically by plotting the actual and predicted values in the same graph. By plotting 95% confidence intervals around the forecasts, ŷ_t ± 1.96·se(ẽ_t), one can illustrate whether the actual realization, y_t, would have fallen within the confidence bands of the model forecast.

6.2.3 Chow Type Forecast Tests

The t-test of individual forecast errors tells us whether our model forecast would have failed at one specific period or not, but it does not say much about the joint forecasting performance of our model. If the model works adequately also in the forecast period, we would expect the forecast errors to be independent and to have approximately the same variance as the base-period residuals. Therefore, a comparison of the base-period residual variance, σ², with the forecast-period residual variance can be used to evaluate the joint performance of the forecasts.

Under the H_0 hypothesis of identically and independently distributed forecast errors, the sum of squared standardized forecast errors is distributed as χ²(T_2). Since we use the estimated error variance, σ̂², instead of the true one, the forecast χ² test is only approximately distributed as χ², i.e.:

   Σ_{t=T_1+1}^{T} ẽ_t² / σ̂²  ∼  approx. χ²(T_2).

When this statistic exceeds the critical value χ²_c(T_2) we reject the null hypothesis that our base-sample model is correctly specified as a description of the behavior in the forecast period. The test can also be computed in an F-form.


A similar test procedure, called the forecast Chow test, compares the difference between RSS_T, calculated for the full period, and RSS_{T_1}, calculated for the baseline period, with the residual variance of the baseline period, RSS_{T_1}/(T_1 − k). The null hypothesis is H_0: β(T_1) = β(T_2) = β, σ²(T_1) = σ²(T_2) = σ², and E(ε_t·ε_{t−h}) = 0, under which the Chow forecast F statistic is

   [(RSS_T − RSS_{T_1})/T_2] / [RSS_{T_1}/(T_1 − k)]  ∼  approx. F(T_2, T_1 − k).

The forecast Chow F-test is essentially a way of checking whether the baseline model would be appropriate over the entire sample, so it is more of a test for a structural break at or near period t = T_1.
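The 1-step forecast errors, their standard errors, the forecast χ² statistic and the forecast Chow statistic can be computed along the following lines (a numpy sketch; T1 denotes the length of the base period):

```python
import numpy as np

def forecast_tests(y, X, T1):
    """Ex post 1-step forecast errors for t = T1+1,...,T based on coefficients
    estimated on t = 1,...,T1, their standard errors, and two summary statistics:
    the forecast chi^2 statistic and the forecast Chow F statistic."""
    y = np.asarray(y, float)
    X = np.asarray(X, float)
    T, k = X.shape
    T2 = T - T1

    XtXinv = np.linalg.inv(X[:T1].T @ X[:T1])
    b = XtXinv @ X[:T1].T @ y[:T1]                   # base-period OLS estimates
    rss_T1 = np.sum((y[:T1] - X[:T1] @ b) ** 2)
    s2 = rss_T1 / (T1 - k)                           # base-period residual variance

    e = y[T1:] - X[T1:] @ b                          # 1-step forecast errors
    se = np.sqrt(s2 * (1.0 + np.einsum('ij,jk,ik->i', X[T1:], XtXinv, X[T1:])))

    chi2_stat = np.sum(e ** 2) / s2                  # approx. chi^2(T2)
    b_full = np.linalg.lstsq(X, y, rcond=None)[0]
    rss_T = np.sum((y - X @ b_full) ** 2)
    chow_f = ((rss_T - rss_T1) / T2) / (rss_T1 / (T1 - k))   # approx. F(T2, T1-k)
    return e, se, chi2_stat, chow_f
```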

6.3 Recursive Methods

Recursive estimation is used as a general diagnostic tool to detect signs of parameter nonconstancy over the sample period. It is primarily meant to be used as a graphical device to detect any nonconstancy problems and, after the problems have been solved, as a way to convince the reader that the empirical results can be trusted. Similarly to the 1-step forecast analysis, the sample needs to be split into a base period, t = 1, ..., T_1, and a recursive test period, t = T_1 + 1, ..., T. However, unlike the 1-step forecast analysis, the model parameters are updated at each new observation in time.

    6.3.1 Recursive parameter estimates

For a given base period the OLS parameter estimates and their variance can be calculated as

   β̂_{T_1} = (X′_{T_1}X_{T_1})^{−1}·X′_{T_1}y_{T_1},
   Côv(β̂_{T_1}) = σ̂²_{T_1}·(X′_{T_1}X_{T_1})^{−1}.


One can then check whether any of the recursive estimates β̂_{i,T_1+i}, i = 1, ..., T_2, falls outside the confidence bands given by the minimum over j of [β̂_{i,T_1+j} + 1.96·√Var(β̂_{i,T_1+j})] and the maximum over j of [β̂_{i,T_1+j} − 1.96·√Var(β̂_{i,T_1+j})]. When this happens it may be an indication of parameter nonconstancy.
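A sketch of the recursive estimates themselves (numpy; for simplicity the model is re-estimated from scratch at every step rather than with the numerically efficient updating formulae employed by PcGive):

```python
import numpy as np

def recursive_ols(y, X, T1):
    """Recursive OLS estimates and standard errors for samples ending at
    t = T1, T1+1, ..., T (brute-force re-estimation at each step)."""
    y = np.asarray(y, float)
    X = np.asarray(X, float)
    T, k = X.shape
    coefs, ses = [], []
    for t in range(T1, T + 1):
        XtXinv = np.linalg.inv(X[:t].T @ X[:t])
        b = XtXinv @ X[:t].T @ y[:t]
        s2 = np.sum((y[:t] - X[:t] @ b) ** 2) / (t - k)
        coefs.append(b)
        ses.append(np.sqrt(s2 * np.diag(XtXinv)))
    return np.array(coefs), np.array(ses)

# Plotting each column of coefs with +/- 1.96 * ses bands gives the usual
# recursive-estimates graphs used to spot parameter nonconstancy.
```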