
3 Econometric Tools and Techniques

This chapter analyzes the properties of estimators and test statistics. It illustrates how their properties differ between stationary and non-stationary processes, both in large and small samples, as well as under particular kinds of model mis-specification. Asymptotic and finite-sample results are tied together to illustrate the usefulness of the former in designing efficient Monte Carlo studies to examine small-sample properties: asymptotic results guide the choice of the region in the parameter space to provide the largest amount of information, and suggest the form of control-variate statistics to construct more efficient simulation methods. We first consider stationarity and invertibility conditions for a moving-average process in §3.1, following up §2.11, then investigate the properties of OLS and instrumental variables (IV) estimators in §3.2, analytically in large samples, and by Monte Carlo in small samples. That sets the scene for a more thorough look at Monte Carlo experimentation in §3.3, and the linkages through control variates of the finite-sample and asymptotic behaviour of OLS in §3.4. We next apply Monte Carlo to investigate the effects of structural breaks on regression estimators in §3.5, and discuss the role of recursive updating in §3.6 as an efficient way to obtain results at many sample sizes. Finally, §3.7 considers the properties of some Wiener processes that arise in the sampling distributions of estimators when there are unit roots in the DGP.

3.1 Moving-average processes

Consider the first-order moving-average process:

\[ y_t = \mu_0 + \varepsilon_t + \mu_1\varepsilon_{t-1} \quad\text{where}\quad \varepsilon_t \sim \mathsf{IN}\left[0,\sigma^2_\varepsilon\right]. \tag{3.1} \]

(1) Derive the mean of $y_t$: are any stationarity conditions required on $\mu_1$ for $\{y_t\}$ to have a constant mean?
(2) Derive $V[y_t]$ – when is it constant over time?
(3) Derive $r_1 = \mathrm{corr}[y_t, y_{t-1}]$, $r_2 = \mathrm{corr}[y_t, y_{t-2}]$, and hence $r_j = \mathrm{corr}[y_t, y_{t-j}]$. Can $r_j$ take any value?
(4) Is $\{y_t\}$ in (3.1) generally stationary? If so, are all $\mu_1$ values admissible? If not, what range of $\mu_1$ is allowed? Let $\mu^*_1 = 1/\mu_1$: what is $r_j(\mu^*_1)$ in comparison to $r_j(\mu_1)$?


(5) Apply the same techniques to the ARMA process:
\[ y_t = \rho y_{t-1} + \varepsilon_t + \mu\varepsilon_{t-1} \quad\text{where}\quad \varepsilon_t \sim \mathsf{IN}\left[0,\sigma^2_\varepsilon\right], \tag{3.2} \]
with $|\mu| < 1$ and $|\rho| < 1$. Obtain the first two moments and the correlogram $r_j$.

3.1.1 Mean of an MA(1) process

Taking expectations:
\[ E[y_t] = \mu_0 + E[\varepsilon_t] + \mu_1 E[\varepsilon_{t-1}] = \mu_0 \quad \forall t. \]
Hence, no conditions on $\mu_1$ are required for $E[y_t]$ to be constant over time.

3.1.2 Variance of an MA(1) process

\[ V[y_t] = E\left[(\varepsilon_t + \mu_1\varepsilon_{t-1})^2\right] = E[\varepsilon_t^2] + \mu_1^2 E[\varepsilon_{t-1}^2] + 2\mu_1 E[\varepsilon_t\varepsilon_{t-1}] = \sigma^2_\varepsilon\left(1 + \mu_1^2\right). \]
Hence, $V[y_t]$ is constant over time $\forall\mu_1$.

3.1.3 Correlogram of an MA(1) process

The autocorrelation of order $j$ is defined as $r_j = C[y_t, y_{t-j}]/V[y_t]$, so let us derive $C[y_t, y_{t-j}]$:
\[ C[y_t, y_{t-j}] = E[(y_t - \mu_0)(y_{t-j} - \mu_0)] = E[(\varepsilon_t + \mu_1\varepsilon_{t-1})(\varepsilon_{t-j} + \mu_1\varepsilon_{t-j-1})] = \begin{cases} \mu_1\sigma^2_\varepsilon & j = 1 \\ 0 & \text{otherwise}, \end{cases} \]
so that:
\[ r_j = \begin{cases} \dfrac{\mu_1}{1+\mu_1^2} & j = 1 \\ 0 & \text{otherwise}. \end{cases} \]

To find the interval in which $r_1$ may vary, notice that the function $\mu_1/(1+\mu_1^2)$ has its stationary points where:
\[ \frac{\partial}{\partial\mu_1}\left(\frac{\mu_1}{1+\mu_1^2}\right) = \frac{1}{1+\mu_1^2} - \frac{2\mu_1^2}{(1+\mu_1^2)^2} = \frac{1-\mu_1^2}{(1+\mu_1^2)^2}, \]
which is zero at $\mu_1 = \pm1$. Hence, $r_1$ reaches its maximum when $\mu_1 = 1$ and its minimum when $\mu_1 = -1$, so $|r_1| \le 0.5$.
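As a quick numerical check of the correlogram just derived, the sketch below (Python; the parameter values, sample size and seed are arbitrary illustrative choices, not from the text) simulates an MA(1) and compares the sample first-order autocorrelation with $\mu_1/(1+\mu_1^2)$.

```python
import numpy as np

rng = np.random.default_rng(0)
mu0, mu1, T = 0.0, 0.5, 100_000          # illustrative values

eps = rng.standard_normal(T + 1)
y = mu0 + eps[1:] + mu1 * eps[:-1]       # y_t = mu0 + e_t + mu1*e_{t-1}

r1_sample = np.corrcoef(y[1:], y[:-1])[0, 1]
r1_theory = mu1 / (1 + mu1**2)
print(f"sample r1 = {r1_sample:.4f}, theory = {r1_theory:.4f}")  # both near 0.4
```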

3.1.4 Stationarity and invertibility conditions for an MA(1) process

Weak stationarity holds without restrictions on $\mu_1$: the mean and variance are both constant and $r_j$ depends only upon the lag. However, we would be able to write (3.1) as an infinite autoregressive process if and only if an invertibility condition is satisfied, that is, if and only if $|\mu_1| < 1$, since then:
\[ (1+\mu_1 L)^{-1} = \sum_{i=0}^{\infty}(-\mu_1)^i L^i. \]
Hence, assuming $|\mu_1| < 1$:
\[ \varepsilon_t = \frac{y_t - \mu_0}{1+\mu_1 L} = -\frac{\mu_0}{1+\mu_1} + \sum_{i=0}^{\infty}(-\mu_1)^i y_{t-i}. \]
Also, $|\mu_1| \le 1$ is a necessary condition for identifying the process. To see that, consider an MA(1) with parameter $\mu^*_1 = \mu_1^{-1}$ for $\mu_1 \ne 0$. Then:
\[ r_j(\mu^*_1) = \frac{\mu^*_1}{1+\mu^{*2}_1} \ \text{for } j = 1, \ \text{and } 0 \ \text{for } j > 1. \]
But:
\[ r_1(\mu^*_1) = \frac{\mu^*_1}{1+\mu^{*2}_1} = \frac{1}{\mu_1}\,\frac{1}{1+(1/\mu_1)^2} = \frac{\mu_1}{1+\mu_1^2} = r_1(\mu_1), \]
so we require $|\mu_1| \le 1$ if the same autocorrelation function is not to correspond to two different processes.

3.1.5 Properties of ARMA(1,1) processes

As $|\rho| < 1$, the ARMA process in (3.2) can be written as the following infinite MA:
\[ y_t = \frac{\varepsilon_t + \mu\varepsilon_{t-1}}{1-\rho L} = \sum_{i=0}^{\infty}\rho^i\left(\varepsilon_{t-i} + \mu\varepsilon_{t-i-1}\right), \]
so its mean is $E[y_t] = 0$. Further, its variance is:
\[ E[y_t^2] = E\left[\sum_{i=0}^{\infty}\rho^i(\varepsilon_{t-i}+\mu\varepsilon_{t-i-1})\right]^2 = E\left[\varepsilon_t + (\mu+\rho)\sum_{i=0}^{\infty}\rho^i\varepsilon_{t-i-1}\right]^2 = E[\varepsilon_t^2] + (\mu+\rho)^2\sum_{i=0}^{\infty}\rho^{2i}E[\varepsilon_{t-i-1}^2] = \frac{\left(1+\mu^2+2\rho\mu\right)\sigma^2_\varepsilon}{1-\rho^2}, \]
and its covariances $E[y_ty_{t-j}]$ are:
\[ E\left[\left(\varepsilon_t + (\mu+\rho)\sum_{i=0}^{\infty}\rho^i\varepsilon_{t-i-1}\right)\left(\varepsilon_{t-j} + (\mu+\rho)\sum_{i=0}^{\infty}\rho^i\varepsilon_{t-i-j-1}\right)\right] = \rho^{j-1}(\mu+\rho)E[\varepsilon_{t-j}^2] + \rho^j(\mu+\rho)^2\sum_{i=0}^{\infty}\rho^{2i}E[\varepsilon_{t-j-i-1}^2] = \frac{\sigma^2_\varepsilon(\mu+\rho)(1+\rho\mu)\rho^{j-1}}{1-\rho^2}, \]
for $j \ge 1$. Finally, its autocorrelations are:
\[ r_j = \frac{(\mu+\rho)(1+\rho\mu)\rho^{j-1}}{1+\mu^2+2\rho\mu} \ \text{for } j \ge 1, \quad\text{so that}\quad r_j = \rho\, r_{j-1} \ \text{for } j \ge 2. \]
Compare §2.11.2.
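A similar numerical check for the ARMA(1,1) correlogram is sketched below (Python; the values of $\rho$, $\mu$, the sample size and the seed are illustrative assumptions). It verifies that the simulated $r_1$ matches the formula above and that $r_2 \simeq \rho\, r_1$.

```python
import numpy as np

rng = np.random.default_rng(1)
rho, mu, T = 0.7, 0.4, 100_000           # illustrative values

eps = rng.standard_normal(T + 1)
y = np.zeros(T)
for t in range(1, T):
    y[t] = rho * y[t - 1] + eps[t + 1] + mu * eps[t]   # ARMA(1,1) recursion

def acf(x, j):
    """Sample autocorrelation at lag j."""
    return np.corrcoef(x[j:], x[:-j])[0, 1]

r1_theory = (mu + rho) * (1 + rho * mu) / (1 + mu**2 + 2 * rho * mu)
print(f"r1: sample {acf(y, 1):.4f}, theory {r1_theory:.4f}")
print(f"r2: sample {acf(y, 2):.4f}, rho*r1 {rho * r1_theory:.4f}")
```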

3.2 Properties of IVE and OLS

Consider the model:
\[ y_t = \beta x_t + \varepsilon_t \tag{3.3} \]
\[ x_t = \pi z_t + \omega_t \tag{3.4} \]
where:
\[ \begin{pmatrix}\varepsilon_t\\ \omega_t\\ z_t\end{pmatrix} \sim \mathsf{IN}_3\left[\begin{pmatrix}0\\0\\0\end{pmatrix}, \begin{pmatrix}\sigma^2_\varepsilon & \alpha & 0\\ \alpha & \sigma^2_\omega & 0\\ 0 & 0 & \sigma^2_z\end{pmatrix}\right]. \tag{3.5} \]
(1) Derive the limiting distributions of the instrumental-variables estimator $\tilde\beta$ of $\beta$ and of the least-squares estimator $\hat\beta$ of $\beta$.
(2) Explain how you would undertake a Monte Carlo study of the behaviour of $\tilde\beta$ and $\hat\beta$ in finite samples. Obtain and use control variates for $\tilde\beta$ and $\hat\beta$ based on §3.2.1.
(3) On what criteria would you base a choice between $\tilde\beta$ and $\hat\beta$ as the 'least worst' estimator of $\beta$ when: (i) $T = 10$; (ii) $T = 100$; (iii) $T = 2000$?
(4) Why might the behaviour of $\tilde\beta$ be erratic at any sample size $T$ when $\sigma_z\pi/\sigma_\omega$ is small?
(Oxford M.Phil., 1986)

3.2.1 Limiting distributions of IVE and OLS

The IVE of $\beta$ can be written as:
\[ \tilde\beta = \frac{T^{-1}\sum_{t=1}^{T} y_tz_t}{T^{-1}\sum_{t=1}^{T} x_tz_t} = \beta + \frac{T^{-1}\sum_{t=1}^{T}\varepsilon_tz_t}{T^{-1}\sum_{t=1}^{T} x_tz_t}. \tag{3.6} \]
By assumption $\{\varepsilon_t, \omega_t, z_t\}$ are jointly IID, so that the continuous functions $z_t\varepsilon_t$, $z_t^2$ and $z_t\omega_t$ are also IID. Hence, Kolmogorov's strong law of large numbers (SLLN) implies that (see e.g., Hendry, 1995a, ch. A4):
\[ T^{-1}\sum_{t=1}^{T} z_t\varepsilon_t \overset{AS}{\to} E[z_t\varepsilon_t] = 0, \tag{3.7} \]
and that:
\[ T^{-1}\sum_{t=1}^{T} z_tx_t = \pi T^{-1}\sum_{t=1}^{T} z_t^2 + T^{-1}\sum_{t=1}^{T} z_t\omega_t \overset{AS}{\to} \pi E[z_t^2] + E[z_t\omega_t] = \pi\sigma^2_z. \]
Finally, by Slutsky's theorem, for $\pi \ne 0$:
\[ \left(T^{-1}\sum_{t=1}^{T} z_tx_t\right)^{-1} \overset{AS}{\to} \frac{1}{\pi\sigma^2_z}, \tag{3.8} \]
and so $\tilde\beta \overset{AS}{\to} \beta + (\pi\sigma^2_z)^{-1}\cdot 0 = \beta$, implying that $\tilde\beta$ is a consistent estimator of $\beta$.
To find the asymptotic distribution, because $z_t\varepsilon_t$ is IID, we can apply the Lindeberg–Lévy theorem. So, from (3.7):
\[ \frac{1}{\sqrt{V[\varepsilon_tz_t]}}\, T^{-\frac12}\sum_{t=1}^{T}\varepsilon_tz_t \overset{D}{\to} N[0,1]. \]
Also, $z_t$ and $\varepsilon_t$ are mutually independent, so that:
\[ V[\varepsilon_tz_t] = E[z_t^2]E[\varepsilon_t^2] = \sigma^2_z\sigma^2_\varepsilon, \]
and so:
\[ \frac{1}{\sigma_z\sigma_\varepsilon}\, T^{-\frac12}\sum_{t=1}^{T}\varepsilon_tz_t \overset{D}{\to} N[0,1]. \tag{3.9} \]
Finally, from (3.8) and (3.9), Cramér's theorem implies that:
\[ \sqrt{T}\left(\tilde\beta - \beta\right) = \frac{T^{-\frac12}\sum_{t=1}^{T}\varepsilon_tz_t}{T^{-1}\sum_{t=1}^{T} x_tz_t} \overset{D}{\to} N\left[0, \frac{\sigma^2_\varepsilon}{\pi^2\sigma^2_z}\right]. \tag{3.10} \]

Let us now look at the properties of the OLS estimator of $\beta$, defined as:
\[ \hat\beta = \beta + \left(\sum_{t=1}^{T} x_t^2\right)^{-1}\sum_{t=1}^{T} x_t\varepsilon_t. \tag{3.11} \]
Although $\{x_t, \varepsilon_t\}$ is IID, the Lindeberg–Lévy theorem cannot be directly applied because $E[x_t\varepsilon_t] = \alpha \ne 0$ implies that $x_t$ is not stochastically independent of $\varepsilon_t$, so deriving $V[\varepsilon_tx_t]$ is not straightforward. Hence, we proceed in the following way. Since $z_t\omega_t$ and $\omega_t^2$ are IID:
\[ T^{-1}\sum_{t=1}^{T} x_t^2 = \pi^2T^{-1}\sum_{t=1}^{T} z_t^2 + 2\pi T^{-1}\sum_{t=1}^{T} z_t\omega_t + T^{-1}\sum_{t=1}^{T}\omega_t^2 \overset{AS}{\to} \pi^2\sigma^2_z + \sigma^2_\omega \tag{3.12} \]
by Kolmogorov's SLLN. Also, Slutsky's theorem implies that:
\[ \left(T^{-1}\sum_{t=1}^{T} x_t^2\right)^{-1} \overset{AS}{\to} \left(\pi^2\sigma^2_z + \sigma^2_\omega\right)^{-1}, \]
so that from (3.7):
\[ \hat\beta - \beta \overset{AS}{\to} \frac{\alpha}{\pi^2\sigma^2_z + \sigma^2_\omega}. \tag{3.13} \]

Write (3.3) as:
\[ y_t = \left(\beta + \frac{\alpha}{\pi^2\sigma^2_z+\sigma^2_\omega}\right)x_t + \eta_t = \delta x_t + \eta_t \tag{3.14} \]
with:
\[ \eta_t = \varepsilon_t - \frac{\alpha}{\pi^2\sigma^2_z+\sigma^2_\omega}\,x_t. \]
Then $\eta_t$ so defined is an independently distributed normal variate, because it is a linear combination of jointly normally distributed variables. Also, $\eta_t$ is uncorrelated with $x_t$ because:
\[ E[x_t\eta_t] = E[x_t\varepsilon_t] - \frac{\alpha}{\pi^2\sigma^2_z+\sigma^2_\omega}\,E[x_t^2] = E[(\pi z_t+\omega_t)\varepsilon_t] - \alpha = 0, \tag{3.15} \]
which, in addition to normality, implies stochastic independence of $\eta_t$ from $x_t$. Thus:
\[ V[\eta_tx_t] = E[x_t^2]E[\eta_t^2] = \left(\pi^2\sigma^2_z+\sigma^2_\omega\right)\left(\sigma^2_\varepsilon - \frac{\alpha^2}{\pi^2\sigma^2_z+\sigma^2_\omega}\right) = \left(\pi^2\sigma^2_z+\sigma^2_\omega\right)\sigma^2_\varepsilon - \alpha^2. \]

The OLS estimator $\hat\delta$ of $\delta$ in the transformed equation (3.14) satisfies:
\[ \hat\delta - \delta = \left(\hat\beta - \beta - \frac{\alpha}{\pi^2\sigma^2_z+\sigma^2_\omega}\right) = \left(\sum_{t=1}^{T} x_t^2\right)^{-1}\sum_{t=1}^{T} x_t\eta_t, \tag{3.16} \]
so that:
\[ \sqrt{T}\left(\hat\beta - \beta - \frac{\alpha}{\pi^2\sigma^2_z+\sigma^2_\omega}\right) = \left(T^{-1}\sum_{t=1}^{T} x_t^2\right)^{-1} T^{-\frac12}\sum_{t=1}^{T} x_t\eta_t. \]
But $\{\eta_t, x_t\}$ is IID, so that $\eta_tx_t$ is as well, and hence the Lindeberg–Lévy theorem implies that:
\[ \left(\left(\pi^2\sigma^2_z+\sigma^2_\omega\right)\sigma^2_\varepsilon - \alpha^2\right)^{-\frac12} T^{-\frac12}\sum_{t=1}^{T}\eta_tx_t \overset{D}{\to} N[0,1]. \]
Applying Cramér's theorem, we finally arrive at the asymptotic distribution of $\hat\beta$:
\[ \sqrt{T}\left(\hat\beta - \beta - \frac{\alpha}{\pi^2\sigma^2_z+\sigma^2_\omega}\right) \overset{D}{\to} N\left[0, \frac{1}{\pi^2\sigma^2_z+\sigma^2_\omega}\left(\sigma^2_\varepsilon - \frac{\alpha^2}{\pi^2\sigma^2_z+\sigma^2_\omega}\right)\right]. \tag{3.17} \]
Note that $\alpha^2/(\sigma^2_\varepsilon\sigma^2_\omega) < 1$.

3.2.2 Monte Carlo study of IVE and OLS

To study the behaviour of $\tilde\beta$ and $\hat\beta$ in finite samples, we conduct a Monte Carlo study which comprises a set of experiments, each of which can be replicated a large number of times. An experiment is defined by a particular choice of the parameters in the DGP and a sample size. In this exercise, having defined a particular experiment by choosing a value for $\beta$, $\pi$, $\sigma^2_\varepsilon$, $\sigma^2_\omega$, $\sigma^2_z$, $\alpha$ and $T$, we obtain $M$ replications by generating $M$ sets of independent random numbers from the trivariate normal distribution in (3.5). Substituting these values of the triplet $(\varepsilon_t, \omega_t, z_t)$ in the system (3.3)+(3.4), we obtain $M$ sets of observations on $(y_t, x_t)$. Equation (3.3) is estimated by OLS and IV for every replication, providing sets of values for $\hat\beta$ and $\tilde\beta$ from which their properties are investigated as follows. In effect, we empirically simulate their behaviour in a controlled setting.
Say we are interested in the bias of $\hat\beta$ and that each replication $i$ consists of $T$ observations, for which we compute:
\[ \hat\beta_i = \frac{\sum_{t=1}^{T} y_{i,t}x_{i,t}}{\sum_{t=1}^{T} x_{i,t}^2}, \qquad i = 1,\ldots,M. \]
From these, we can obtain a Monte Carlo estimator of the bias of $\hat\beta$, namely an estimator of $E[\hat\beta - \beta]$. A naive Monte Carlo estimator of the bias of $\hat\beta$ is defined by:
\[ \overline{\hat\beta} - \beta = M^{-1}\sum_{i=1}^{M}\left(\hat\beta_i - \beta\right). \]

However, more efficient Monte Carlo estimators can be constructed by using an additional statistic called a control variate (CV). A CV is a statistic which is asymptotically equivalent to the econometric estimator whose properties we wish to analyze, whose first two moments are known exactly, and which is positively correlated with the naive Monte Carlo estimator. Denoting the CV for $\hat\beta$ by $\hat\beta^*$, a pooled estimator of the bias is defined by:
\[ \left(\hat\beta^{**} - \beta\right) = \left(\overline{\hat\beta} - \beta\right) - \overline{\hat\beta^*} + E\left[\hat\beta^*\right] \tag{3.18} \]
where:
\[ \overline{\hat\beta^*} - \beta = M^{-1}\sum_{i=1}^{M}\left(\hat\beta^*_i - \beta\right). \]
When the estimator under study possesses moments, the properties of the pooled estimator are:
\[ E\left[\hat\beta^{**} - \beta\right] = E\left[\overline{\hat\beta} - \beta\right] = E\left[\hat\beta - \beta\right], \]
so that $\hat\beta^{**}$ is unbiased for the bias of the econometric estimator. But now:
\[ V\left[\hat\beta^{**}\right] = V\left[\overline{\hat\beta}\right] + V\left[\overline{\hat\beta^*}\right] - 2C\left[\overline{\hat\beta}, \overline{\hat\beta^*}\right] < V\left[\overline{\hat\beta}\right] \tag{3.19} \]
when $V[\overline{\hat\beta^*}] < 2C[\overline{\hat\beta}, \overline{\hat\beta^*}]$.

To apply this result, let us derive the CV for $\hat\beta$. Equations (3.12) and (3.16) suggest using:
\[ \hat\beta^* = \left(\beta + \frac{\alpha}{\pi^2\sigma^2_z+\sigma^2_\omega}\right) + \frac{1}{\pi^2\sigma^2_z+\sigma^2_\omega}\, T^{-1}\sum_{t=1}^{T}\eta_tx_t. \tag{3.20} \]
Its mean and variance are known exactly:
\[ E\left[\hat\beta^*\right] = \beta + \frac{\alpha}{\pi^2\sigma^2_z+\sigma^2_\omega} + \frac{1}{\pi^2\sigma^2_z+\sigma^2_\omega}\, T^{-1}\sum_{t=1}^{T} E[\eta_tx_t] = \beta + \frac{\alpha}{\pi^2\sigma^2_z+\sigma^2_\omega}, \tag{3.21} \]
matching (3.13), and:
\[ V\left[\hat\beta^*\right] = E\left[\hat\beta^* - \beta - \frac{\alpha}{\pi^2\sigma^2_z+\sigma^2_\omega}\right]^2 = T^{-2}\left(\frac{1}{\pi^2\sigma^2_z+\sigma^2_\omega}\right)^2 E\left[\sum_{t=1}^{T}\eta_tx_t\right]^2 = T^{-2}\left(\frac{1}{\pi^2\sigma^2_z+\sigma^2_\omega}\right)^2\left(\sum_{t=1}^{T} E\left[\eta_t^2x_t^2\right] + \sum_{t\ne s}^{T} E[\eta_tx_t\eta_sx_s]\right) = T^{-1}\left(\frac{1}{\pi^2\sigma^2_z+\sigma^2_\omega}\right)^2\left[\left(\pi^2\sigma^2_z+\sigma^2_\omega\right)\sigma^2_\varepsilon - \alpha^2\right], \tag{3.22} \]
matching (3.17). Finally, $\hat\beta^*$ is asymptotically equivalent to $\hat\beta$ because, from (3.12) and (3.15):
\[ \sqrt{T}\left(\hat\beta - \hat\beta^*\right) = \left[\left(T^{-1}\sum_{t=1}^{T} x_t^2\right)^{-1} - \frac{1}{\pi^2\sigma^2_z+\sigma^2_\omega}\right] T^{-\frac12}\sum_{t=1}^{T}\eta_tx_t \overset{AS}{\to} 0. \]

As will be shown in §3.2.3, the finite-sample moments of $\tilde\beta$ do not exist, so that the analysis above does not follow for this estimator. However, intuitively, the CVs reflect aberrant random numbers in the same way as conventional estimators, so deviations between them have less variation than deviations about a fixed value. Hence, the precision of the Monte Carlo will usually be improved by using a CV. Equations (3.6) and (3.8) suggest the following control variate for $\tilde\beta$:
\[ \tilde\beta^* = \beta + \left(\pi\sigma^2_z\right)^{-1} T^{-1}\sum_{t=1}^{T}\varepsilon_tz_t, \tag{3.23} \]
which is asymptotically equivalent to $\tilde\beta$ because, from (3.7) and (3.8):
\[ \sqrt{T}\left(\tilde\beta - \tilde\beta^*\right) = \left[\left(T^{-1}\sum_{t=1}^{T} z_tx_t\right)^{-1} - \frac{1}{\pi\sigma^2_z}\right] T^{-\frac12}\sum_{t=1}^{T}\varepsilon_tz_t \overset{AS}{\to} 0. \]
Its mean and variance are known exactly:
\[ E\left[\tilde\beta^*\right] = \beta + \left(\pi\sigma^2_z\right)^{-1} T^{-1}\sum_{t=1}^{T} E[\varepsilon_tz_t] = \beta, \]
matching the consistency of IV, and:
\[ V\left[\tilde\beta^*\right] = E\left[\tilde\beta^* - \beta\right]^2 = T^{-2}\left(\pi\sigma^2_z\right)^{-2}\sum_{t=1}^{T} E\left[z_t^2\varepsilon_t^2\right] = T^{-1}\frac{\sigma^2_\varepsilon}{\pi^2\sigma^2_z}, \]
again matching (3.10).

To use these control variates simply requires programming the formulae (3.20) and (3.23), then computing their values at each replication and recording the deviations from the uncontrolled estimators. To choose values for the design variables in the Monte Carlo, notice from (3.11) that the inconsistency is the same whatever the value of $\beta$. Hence, we set $\beta = 1$ without loss of generality, so the design variables in the Monte Carlo are $T$, $\pi$, $\alpha$, $\sigma^2_z$, $\sigma^2_\varepsilon$ and $\sigma^2_\omega$. Also:
\[ V\begin{bmatrix}\varepsilon_t\\ x_t\\ z_t\end{bmatrix} = \begin{pmatrix}\sigma^2_\varepsilon & \alpha & 0\\ \alpha & \pi^2\sigma^2_z+\sigma^2_\omega & \pi\sigma^2_z\\ 0 & \pi\sigma^2_z & \sigma^2_z\end{pmatrix}, \]
which determines five equations in five unknowns, implying that the process changes by changing any of the design variables.

To provide an example of these claims, choose the single experiment defined by: $T = 40$, $\pi = 0.5$, $\alpha = 0.8$, $\sigma^2_z = 16$ and $\sigma^2_\varepsilon = \sigma^2_\omega = 1$, replicated 10 times only. From the results of this experiment, and using (3.21), we find that (3.18) provides a pooled estimate of the bias equal to:
\[ \hat\beta^{**} - \beta = \left(\overline{\hat\beta} - \beta\right) - \left(\overline{\hat\beta^*} - \beta\right) + \frac{\alpha}{\pi^2\sigma^2_z+\sigma^2_\omega} = 0.1747 - 0.1690 + 0.16 = 0.1657, \]
as against the inconsistency of 0.16, with variance computed from (3.19) as:
\[ V\left[\hat\beta^{**}\right] = 0.00477 + 0.00451 - 2(0.004534) = 0.00022, \]
showing a 20-fold reduction relative to the variance of 0.00477 for the uncontrolled Monte Carlo estimate of the bias of OLS. The covariance term reveals the very high correlation between the CV and OLS. Thus, the precision is equivalent to having done 200 replications.
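The experiment just described is easy to reproduce. The sketch below is a minimal Python rendering of it (the replication count and seed are our own choices; the text used only 10 replications). It computes the naive and pooled control-variate estimates of the OLS bias from (3.18) and (3.20), and the variance reduction implied by (3.19).

```python
import numpy as np

rng = np.random.default_rng(42)
M, T = 10_000, 40
beta, pi, alpha = 1.0, 0.5, 0.8
s2_z, s2_e, s2_w = 16.0, 1.0, 1.0

incons = alpha / (pi**2 * s2_z + s2_w)               # plim of the OLS bias, eq. (3.13)
cov = np.array([[s2_e, alpha], [alpha, s2_w]])       # joint distribution of (eps, omega)

ols_dev, cv_dev = np.empty(M), np.empty(M)
for i in range(M):
    z = rng.normal(0.0, np.sqrt(s2_z), T)
    e, w = rng.multivariate_normal([0.0, 0.0], cov, T).T
    x = pi * z + w
    y = beta * x + e
    ols_dev[i] = (x @ y) / (x @ x) - beta            # OLS deviation from beta
    eta = e - incons * x                             # eta_t from (3.14)
    cv_dev[i] = incons + (x @ eta) / (T * (pi**2 * s2_z + s2_w))  # CV (3.20) minus beta

naive_bias = ols_dev.mean()
pooled_bias = naive_bias - cv_dev.mean() + incons    # pooled estimator (3.18)
print(f"inconsistency    = {incons:.3f}")
print(f"naive MC bias    = {naive_bias:.4f}")
print(f"pooled (CV) bias = {pooled_bias:.4f}")
print(f"variance ratio   = {ols_dev.var() / (ols_dev - cv_dev).var():.1f}")
```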

3.2.3 Criteria for estimator selection

The choice between estimators must be based on their properties. When exact distributions are known, estimators may be compared in terms of bias and variance, and the robustness of their properties to potential mis-specifications. However, in this particular example, $\tilde\beta$ does not have moments in finite samples because:
\[ E\left[\tilde\beta\right] = \int E\left[\tilde\beta\mid z\right] f(z)\,dz, \]
where:
\[ \tilde\beta - \beta = \frac{z'\varepsilon}{\pi z'z + z'\omega}, \]
with $z'\varepsilon\mid z \sim N[0, \sigma^2_\varepsilon z'z]$ and $z'\omega\mid z \sim N[0, \sigma^2_\omega z'z]$. Hence, conditional upon $z$, $\tilde\beta$ is the ratio of two normal variates, so is distributed as a Cauchy variate. However, Cauchy distributions do not have any moments, so that $E[\tilde\beta\mid z]$ does not exist, implying that $E[\tilde\beta]$ is not finite. However, $\hat\beta$ does have finite first two moments, so would often 'win' on a bias or mean-square error criterion, despite being inconsistent. Hence, alternative criteria must be used. Several possibilities exist.

First, measures of concentration and closeness have been defined. Consider two estimators $\tilde\theta$ and $\hat\theta$ of $\theta$. Then $\tilde\theta$ is said to be more concentrated than $\hat\theta$ if:
\[ P\left[\theta - \lambda < \tilde\theta < \theta + \lambda\right] > P\left[\theta - \lambda < \hat\theta < \theta + \lambda\right], \]
for all $\lambda > 0$ and for each $\theta \in \Theta$. $\tilde\theta$ is most concentrated if that property is satisfied against any other $\hat\theta$. Alternatively, $\tilde\theta$ is Pitman closer than $\hat\theta$ if and only if:
\[ P\left[\left|\tilde\theta - \theta\right| < \left|\hat\theta - \theta\right|\right] \ge \tfrac12, \]
for each $\theta \in \Theta$. $\tilde\theta$ is Pitman closest if that property is satisfied against any $\hat\theta$. In the Monte Carlo, we can estimate the distributions of $\tilde\beta$ and $\hat\beta$, so that the probabilities defining concentration and Pitman closeness can be computed, providing us with information to choose between the two estimators even for samples of size as small as $T = 10$.

Secondly, non-parametric measures might be selected, such as the median and inter-quartile range. The median does exist for $\tilde\beta$ here, and is $\beta$ as the error distribution is symmetric, whereas OLS will have a median close to $\beta + \alpha/(\pi^2\sigma^2_z+\sigma^2_\omega)$. Thus, IV is preferable on that criterion. However, OLS has an asymptotic variance no larger than $\sigma^2_\varepsilon(\pi^2\sigma^2_z+\sigma^2_\omega)^{-1}$, as against $\sigma^2_\varepsilon(\pi^2\sigma^2_z)^{-1}$ for IV, and hence in sufficiently small samples could well win on closeness or inter-quartile range. As $T\to\infty$, we can compare these estimators in terms of their asymptotic properties, so the inconsistency of OLS (an $O(1)$ error) will eventually dominate its variance advantage.

3.2.4 Absence of moments

Although $\tilde\beta$ does not have finite-sample moments (see §3.2.3), $f(\tilde\beta\mid z)$ is a Cauchy distribution with $z$ normally distributed, so that we can derive the joint distribution $f(\tilde\beta, z) = f(\tilde\beta\mid z)f(z)$, and then obtain $f(\tilde\beta) = \int f(\tilde\beta, z)\,dz$. This allows us to compute the probability of $\tilde\beta$ taking any given value, for a set of values of $\pi\sigma_z$ relative to $\sigma_\omega$. The behaviour of $\tilde\beta$ is erratic if it can take almost any value with a high probability. When $\pi\sigma_z/\sigma_\omega$ is small, then $z_t$ is a bad instrument, in that $V[x_t]$ is close to $V[\omega_t]$. Therefore, $\tilde\beta$ is badly determined in the sense that $V[\tilde\beta]$ is large. To illustrate this argument write (3.6) as:
\[ \tilde\beta = \beta + \hat\pi^{-1}\left(\sum_{t=1}^{T} z_t^2\right)^{-1}\sum_{t=1}^{T}\varepsilon_tz_t, \]
and notice that:
\[ \sqrt{T}\left(\hat\pi - \pi\right) \overset{D}{\to} N\left[0, \sigma^2_\omega\sigma^{-2}_z\right], \]
so that $\hat\pi$ is approximately distributed as $N[\pi, \sigma^2_\omega/(T\sigma^2_z)]$. Hence, for large enough $T$:
\[ V\left[\hat\pi^{-1}\right] \simeq \left[\frac{\partial}{\partial\pi}\left(\frac{1}{\pi}\right)\right]^2 V[\hat\pi] \simeq T^{-1}\sigma^2_\omega\sigma^{-2}_z\pi^{-4}, \]
so that, if $\pi\sigma_z/\sigma_\omega$ is small, $V[\hat\pi^{-1}]$ may be large, and so $V[\tilde\beta]$ may be very large. However, $V[\hat\pi^{-1}]$ decreases with $T$ and, in large samples, $V[\tilde\beta]$ does not depend upon $\sigma^2_\omega$.
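A small illustration of this point is sketched below (Python; the parameter values, the simplification $\alpha = 0$ and the seed are our own assumptions). It shows how the spread of the IV estimates explodes as the instrument strength $\pi$ shrinks.

```python
import numpy as np

rng = np.random.default_rng(7)
M, T, beta = 5_000, 50, 1.0

def iv_draws(pi, s_z=1.0, s_w=1.0, s_e=1.0):
    """IV estimates of beta over M replications for instrument strength pi."""
    out = np.empty(M)
    for i in range(M):
        z = rng.normal(0.0, s_z, T)
        e = rng.normal(0.0, s_e, T)
        w = rng.normal(0.0, s_w, T)      # alpha = 0 kept for simplicity
        x = pi * z + w
        y = beta * x + e
        out[i] = (z @ y) / (z @ x)       # IV estimator (3.6)
    return out

for pi in (1.0, 0.1, 0.01):
    d = iv_draws(pi)
    lo, med, hi = np.percentile(d, [1, 50, 99])
    print(f"pi={pi:5.2f}  median={med: .3f}  1%/99% range=({lo: .1f}, {hi: .1f})")
```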

3.3 Monte Carlo experimentation

Design a Monte Carlo study to investigate estimation and inference about $\beta$ in the model:
\[ y_t = \beta x_t + \varepsilon_{1,t}, \qquad x_t = \alpha x_{t-1} + \varepsilon_{2,t}, \tag{3.24} \]
where:
\[ \begin{pmatrix}\varepsilon_{1,t}\\ \varepsilon_{2,t}\end{pmatrix} \sim \mathsf{IN}_2\left[\begin{pmatrix}0\\0\end{pmatrix}, \begin{pmatrix}\sigma_{11} & 0\\ 0 & \sigma_{22}\end{pmatrix}\right]. \]
Would different results ensue if $x_t$ were generated as in (3.24) but held fixed across replications?

3.3.1 Properties of estimators and tests

We consider designing a Monte Carlo study to analyze the finite-sample properties of OLS and of a test statistic for the hypothesis $H_0$: $\beta = 0$. The simplicity of the econometric problem allows us to focus on the Monte Carlo aspects. Because we wish an efficient Monte Carlo estimator, we propose to use a control variate (CV). To derive that CV notice that the OLS estimator is:
\[ \hat\beta = \beta + \left(\sum_{t=1}^{T} x_t^2\right)^{-1}\sum_{t=1}^{T} x_t\varepsilon_{1,t}, \]
and that, because $\{x_t\}$ is stationary and ergodic:
\[ T^{-1}\sum_{t=1}^{T} x_t^2 \overset{AS}{\to} E[x_t^2] = \frac{\sigma_{22}}{1-\alpha^2}. \]
In addition, $\varepsilon_{1,t}$ is IID, so that $\{\varepsilon_{1,t}\}$ is also stationary and ergodic. Hence $\{x_t\varepsilon_{1,t}\}$ is stationary and ergodic, and so:
\[ T^{-1}\sum_{t=1}^{T} x_t\varepsilon_{1,t} \overset{AS}{\to} E[x_t\varepsilon_{1,t}] = 0. \]
Thus, $\hat\beta \overset{AS}{\to} \beta$. To derive its asymptotic distribution notice that the Mann–Wald theorem implies that:
\[ T^{-\frac12}\sum_{t=1}^{T} x_t\varepsilon_{1,t} \overset{D}{\to} N\left[0, \sigma_{11}E[x_t^2]\right] = N\left[0, \frac{\sigma_{11}\sigma_{22}}{1-\alpha^2}\right], \]
so that by Slutsky's and Cramér's theorems:
\[ \sqrt{T}\left(\hat\beta - \beta\right) \overset{D}{\to} N\left[0, \frac{\sigma_{11}(1-\alpha^2)}{\sigma_{22}}\right]. \tag{3.25} \]
These asymptotic results suggest choosing the following CV:
\[ \beta^* = \beta + \frac{1-\alpha^2}{\sigma_{22}}\, T^{-1}\sum_{t=1}^{T} x_t\varepsilon_{1,t}. \]

$\beta^*$ and $\hat\beta$ are asymptotically equivalent because:
\[ \sqrt{T}\left(\hat\beta - \beta^*\right) = \left[\left(T^{-1}\sum_{t=1}^{T} x_t^2\right)^{-1} - \frac{1-\alpha^2}{\sigma_{22}}\right] T^{-\frac12}\sum_{t=1}^{T} x_t\varepsilon_{1,t} \overset{AS}{\to} 0. \]
The mean and variance of the CV are known exactly:
\[ E[\beta^*] = \beta + \frac{1-\alpha^2}{\sigma_{22}}\, T^{-1}\sum_{t=1}^{T} E[x_t\varepsilon_{1,t}] = \beta, \]
and:
\[ V[\beta^*] = E\left[\beta^* - \beta\right]^2 = \left(\frac{1-\alpha^2}{\sigma_{22}}\right)^2 T^{-2} E\left[\sum_{t=1}^{T} x_t\varepsilon_{1,t}\right]^2 = \left(\frac{1-\alpha^2}{\sigma_{22}}\right)^2 T^{-2}\left(\sum_{t=1}^{T} E\left[x_t^2\varepsilon_{1,t}^2\right] + \sum_{t\ne s}^{T} E[x_t\varepsilon_{1,t}x_s\varepsilon_{1,s}]\right) = \left(\frac{1-\alpha^2}{\sigma_{22}}\right)^2\sum_{t=1}^{T} T^{-2} E\left[x_t^2\varepsilon_{1,t}^2\right] = \frac{\sigma_{11}(1-\alpha^2)}{T\sigma_{22}}. \]

We also wish to design a Monte Carlo for testing the hypothesis $H_0$: $\beta = 0$, using a Wald test. The statistic associated with this test is derived by noticing that, from (3.25):
\[ \frac{\sigma_{22}}{(1-\alpha^2)\sigma_{11}}\, T\hat\beta^2 \overset{D}{\to} \chi^2(1) \]
under $H_0$, providing the Wald statistic:
\[ \xi_w = \hat\sigma_{11}^{-1}\hat\beta^2\sum_{t=1}^{T} x_t^2 \overset{D}{\to} \chi^2(1), \]
where $\hat\sigma_{22}$, $\hat\sigma_{11}$ and $\hat\alpha$ are the OLS estimates of their parameters, and hence are consistent for them here. We are interested in analyzing the finite-sample size and power of this test. The size of a test can be estimated by the proportion of times the null is rejected in those experiments in which $\beta = 0$, and its power is estimated by the proportion of times the null hypothesis is rejected in those experiments generated under $\beta \ne 0$, where the critical value is based on the first set to ensure the nominal and actual sizes coincide.

To avoid considering only experiments for which asymptotic power is nearly zero or unity, let us derive the asymptotic distribution of the statistic under the alternative, so that we can control for power. Thus, consider the local alternative $H_1$: $\beta = \delta/\sqrt{T}$, where $\delta$ is fixed. Then:
\[ \sqrt{T}\hat\beta \overset{D}{\to} N\left[\delta, \frac{(1-\alpha^2)\sigma_{11}}{\sigma_{22}}\right] \]
under $H_1$, and so:
\[ \xi_w = \hat\sigma_{11}^{-1}\hat\beta^2\sum_{t=1}^{T} x_t^2 \overset{D}{\to} \chi^2\left(1, \psi^2\right) \]
(which is a squared $t$-ratio), where $\psi^2 = \sigma_{22}\delta^2/(\sigma_{11}(1-\alpha^2))$ is the non-centrality parameter. To find the critical values against which values of this statistic must be compared, we approximate the non-central $\chi^2$ by a central $\chi^2$ and write the asymptotic power function as:
\[ p_a = P\left[\chi^2(1,\psi^2) \ge c\right] \approx P\left[h\chi^2(m) \ge c\right] = P\left[\chi^2(m) \ge c/h\right], \]
where $h = (1+2\psi^2)/(1+\psi^2)$ and $m = (1+\psi^2)/h$ (see ch. 13).

The bias of $\hat\beta$ is the same whatever value is given to $\beta$. However, the non-centrality parameter changes with $\beta$, so the variables in the Monte Carlo design are $\theta = (\beta, \alpha, \sigma_{11}, \sigma_{22}, T)$. To choose values for these variables, notice that the covariance matrix of the process is:
\[ V\begin{bmatrix} y_t\\ x_t\end{bmatrix} = \begin{pmatrix}\dfrac{\beta^2\sigma_{22}}{1-\alpha^2} + \sigma_{11} & \dfrac{\beta\sigma_{22}}{1-\alpha^2}\\ \dfrac{\beta\sigma_{22}}{1-\alpha^2} & \dfrac{\sigma_{22}}{1-\alpha^2}\end{pmatrix}, \]
determining three equations (two variances and one covariance) in four parameters, so one of them can be set to a single value while still being able to generate data from any process of this kind by varying the remaining parameters. Hence, let us choose $\sigma_{22} = 1$ without loss of generality. We wish to consider stationary and nearly non-stationary processes, so we choose $\alpha = (0.3, 0.5, 0.8)$; to examine the effect of increasing sample sizes, we choose $T \in [40, 100]$; and to investigate how easy it is to reject the null for small and large values of $\beta$, we choose $\beta = (0.0, 0.3, 0.6, 1.0)$. Having chosen values for $T$, $\beta$, $\alpha$ and $\sigma_{22}$, the remaining variable $\sigma_{11}$ determines the size of the non-centrality parameter, and the signal-to-noise ratio. So, to control for asymptotic power, $\sigma_{11}$ must be selected accordingly. To illustrate, consider a single experiment defined by $T = 40$, $\beta = 0.3$, $\alpha = 0.3$ and $\sigma_{11}$ such that $m$ is an integer. Hence:

\[ \psi^2 = \frac{T\beta^2}{\sigma_{11}(1-\alpha^2)} = \frac{360}{91\sigma_{11}}, \]
and choosing $m = 4$:
\[ 4 = \frac{(1+\psi^2)^2}{1+2\psi^2} = \frac{(91\sigma_{11}+360)^2}{91\sigma_{11}(91\sigma_{11}+720)} \]
yields $\sigma_{11} = 0.612$ as its positive root. So $\psi^2 = 6.464$ and $h = (1+\psi^2)/m = 1.866$. For a test of size 0.05, the critical value corresponding to a $\chi^2(1)$ is 3.84, and so the approximated asymptotic power is:
\[ p_a \approx P\left[\chi^2(4) \ge \frac{3.84}{1.866}\right] = P\left[\chi^2(4) \ge 2.06\right] = 0.7256, \]
showing that the test rejects the false null 72.56% of the time.
The signal-to-noise ratio for this experiment equals $0.16 < 1$ (so the situation is unfavourable for detecting bias and test power), but even so PcNaive yields the Monte Carlo results for $M = 5000$ replications shown in table 3.1.

Table 3.1 Analysis of 't' statistics around zero using OLS

  mean    standard deviation    H0 rejected    % rejection
  2.560   1.116                 3421           68.42

The Wald test rejects 68.4% of the time, which is the Monte Carlo estimate of the finite-sample power, as the Monte Carlo estimate of the finite-sample size is close to the nominal size of 5% (see table 3.2).

Table 3.2 Analysis of 't' statistics around the true parameter using OLS

  mean     standard deviation    H0 rejected    % rejection
  −0.009   1.029                 251            5.02
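The power calculation and the Monte Carlo above can be reproduced along the following lines (a sketch only: it uses scipy for the $\chi^2$ probabilities, and the replication count and seed are our own choices).

```python
import numpy as np
from scipy import stats

T, beta, alpha, s11, s22 = 40, 0.3, 0.3, 0.612, 1.0
psi2 = T * beta**2 * s22 / (s11 * (1 - alpha**2))    # non-centrality parameter
h = (1 + 2 * psi2) / (1 + psi2)
m = (1 + psi2) / h
crit = stats.chi2.ppf(0.95, 1)                       # 3.84
p_approx = stats.chi2.sf(crit / h, m)                # central chi2 approximation
p_exact = stats.ncx2.sf(crit, 1, psi2)               # exact non-central chi2
print(f"psi2={psi2:.3f}  approx power={p_approx:.4f}  exact={p_exact:.4f}")

# small Monte Carlo of the Wald test's rejection frequency under the false null
rng = np.random.default_rng(3)
M, rej = 5_000, 0
for _ in range(M):
    e1 = rng.normal(0.0, np.sqrt(s11), T)
    e2 = rng.normal(0.0, np.sqrt(s22), T)
    x = np.zeros(T)
    for t in range(1, T):
        x[t] = alpha * x[t - 1] + e2[t]
    y = beta * x + e1
    b = (x @ y) / (x @ x)
    s11_hat = ((y - b * x) ** 2).mean()
    rej += (b**2 * (x @ x) / s11_hat) > crit         # Wald statistic xi_w
print(f"MC rejection frequency = {rej / M:.3f}")
```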

3.3.2 Regressors fixed in repeated samples

The analysis above used asymptotic distributions as the basis of the Monte Carlo, since there was little point in conducting a simulation study when the exact distribution was already known. For stochastic regressors that are strongly exogenous as in (3.24) (see ch. 5), we have:
\[ E\left[\hat\beta\right] = \beta + E\left[\left(\sum_{t=1}^{T} x_t^2\right)^{-1}\sum_{t=1}^{T} x_t\varepsilon_{1,t}\right] = \beta + E_x\left[E\left[\left(\sum_{t=1}^{T} x_t^2\right)^{-1}\sum_{t=1}^{T} x_t\varepsilon_{1,t} \,\Big|\, X\right]\right] = \beta + E_x[0] = \beta. \tag{3.26} \]
Thus, OLS is unbiased at all sample sizes. The corresponding analysis applied to $E[(\hat\beta-\beta)^2]$ leads to:
\[ V\left[\hat\beta\right] = \sigma_{11} E\left[\left(\sum_{t=1}^{T} x_t^2\right)^{-1}\right], \tag{3.27} \]
which differs from $\sigma_{11}\left(E\left[\sum_{t=1}^{T} x_t^2\right]\right)^{-1}$, but is approximately equal to it when the variance of $\sum_{t=1}^{T} x_t^2$ is small.
If $x_t$ is kept fixed in repeated samples then:
\[ \hat\beta\mid x \sim N\left[\beta, \sigma_{11}\left(\sum_{t=1}^{T} x_t^2\right)^{-1}\right], \]
so the values of $\hat\beta_i$, $i = 1,\ldots,M$, equal $\beta$ on average. Also, under $H_0$:
\[ \hat\beta\mid x \sim N\left[0, \sigma_{11}\left(\sum_{t=1}^{T} x_t^2\right)^{-1}\right], \]
and hence the 't'-ratio is distributed as a Student-$t$, so the size of this test is exact. Under the alternative, the $t$-ratio is distributed as a non-central Student-$t$, and so we can derive its power exactly. In effect, this completes the analysis of the problem, as there is no need for Monte Carlo, except to check the derivations. However, for fixed and stochastic $x$s, different results may ensue when the model is mis-specified (see, e.g., Hendry, Neale and Ericsson, 1991).

3.4 Finite-sample and asymptotic behaviour of OLS

Consider the conditional stationary process:
\[ y_t\mid Z_t, Y_{t-1} \sim N\left[\beta'x_t, \sigma^2\right] \quad\text{for } t = 1,\ldots,T, \tag{3.28} \]
where $\beta$ is $2\times1$, $x'_t = (z_t : y_{t-1})$, $E[x_tx'_t] = \Sigma_{xx}\ \forall t$, and $\Sigma_{xx}$ is positive definite.
(1) Derive the limiting distribution of the OLS estimator $\hat\beta$ of $\beta$.
(2) Obtain a control variate $\beta^*$ for $\hat\beta$ such that $E[\beta^*] = \beta$ and $V[\beta^*] = \sigma^2T^{-1}\Sigma^{-1}_{xx}\ \forall T$. Prove that $\beta^*$ has the same limiting distribution as $\hat\beta$.
(3) Explain the role of $\beta^*$ in Monte Carlo studies of the first two moments of $\hat\beta$, briefly describing how to design a study to test the hypothesis $H_0$: $E[\hat\beta] = \beta$.
(Oxford M.Phil., 1983)

3.4.1 Limiting distribution

Because we are conditioning on the complete history, $\varepsilon_t = y_t - E[y_t\mid Z_t, Y_{t-1}]$ is a martingale difference sequence, so that $E[\varepsilon_t] = E[\varepsilon_t\varepsilon_{t-\tau}] = 0$, namely zero expectation and non-autocorrelation (see §2.5). This is sufficient to apply central limit theorems in itself, though normality allows passage from uncorrelatedness to independence anyway. Let:
\[ y_t = \beta'x_t + \varepsilon_t, \]
so that:
\[ \varepsilon_t\mid Z_t, Y_{t-1} \sim \mathsf{IN}[0, \sigma^2]. \tag{3.29} \]
Consider the OLS estimator of $\beta$:
\[ \hat\beta = \beta + \left(\sum_{t=1}^{T} x_tx'_t\right)^{-1}\sum_{t=1}^{T} x_t\varepsilon_t, \]
where $X = (z, y_{-1})$. To find its asymptotic distribution, since $\varepsilon_t$ is serially uncorrelated and stationary by assumption, it is stationary and ergodic. Also, $x_t$ is stationary, so that $x_t\varepsilon_t$ is stationary, with $E[x_t\varepsilon_t] = 0$ by construction. Hence, we can show that $\{x_t\varepsilon_t\}$ is ergodic by establishing that it is serially uncorrelated. This involves the expectation of a four-way product, and under joint normality (zero excess kurtosis), $E[x_tx'_{t-\tau}\varepsilon_t\varepsilon_{t-\tau}]$ is equal to:
\[ E[x_tx'_{t-\tau}]E[\varepsilon_t\varepsilon_{t-\tau}] + E[x_t\varepsilon_t]E[x'_{t-\tau}\varepsilon_{t-\tau}] + E[x_t\varepsilon_{t-\tau}]E[\varepsilon_tx'_{t-\tau}] = 0, \tag{3.30} \]
as every pair has one zero term. Thus, by the ergodic theorem and (3.29):
\[ T^{-1}\sum_{t=1}^{T} x_t\varepsilon_t \overset{AS}{\to} E[x_t\varepsilon_t] = 0 \quad\text{and}\quad T^{-1}\sum_{t=1}^{T} x_tx'_t \overset{AS}{\to} E[x_tx'_t] = \Sigma_{xx}, \]
and so, by Slutsky's theorem:
\[ \hat\beta \overset{AS}{\to} \beta + \Sigma^{-1}_{xx}\cdot 0 = \beta. \]
Also, because $x_t$ is ergodic, $\varepsilon_t$ is IID and $E[x_t\varepsilon_t] = 0$, the Mann–Wald theorem implies that:
\[ T^{-\frac12}\sum_{t=1}^{T} x_t\varepsilon_t \overset{D}{\to} N_2\left[0, \sigma^2E[x_tx'_t]\right] = N_2\left[0, \sigma^2\Sigma_{xx}\right], \]
and by Slutsky's and Cramér's theorems:
\[ \sqrt{T}\left(\hat\beta - \beta\right) \overset{D}{\to} N_2\left[0, \sigma^2\Sigma^{-1}_{xx}\right]. \]

3.4.2 Control variate

The obvious choice of a control variate is:
\[ \beta^* = \beta + T^{-1}\Sigma^{-1}_{xx}X'\varepsilon, \]
because its mean and variance are:
\[ E[\beta^*] = \beta + T^{-1}\Sigma^{-1}_{xx}E\left[\sum_{t=1}^{T} x_t\varepsilon_t\right] = \beta, \]
and:
\[ V[\beta^*] = E\left[(\beta^*-\beta)(\beta^*-\beta)'\right] = T^{-2}\Sigma^{-1}_{xx}E\left[\left(\sum_{t=1}^{T} x_t\varepsilon_t\right)\left(\sum_{t=1}^{T} x'_t\varepsilon_t\right)\right]\Sigma^{-1}_{xx} = T^{-2}\Sigma^{-1}_{xx}\sum_{t,s=1}^{T} E[x_t\varepsilon_tx'_s\varepsilon_s]\,\Sigma^{-1}_{xx} = T^{-2}\Sigma^{-1}_{xx}\sum_{t=1}^{T} E\left[x_tx'_t\varepsilon_t^2\right]\Sigma^{-1}_{xx} = T^{-2}\Sigma^{-1}_{xx}\sum_{t=1}^{T} E\left[x_tx'_tE\left[\varepsilon_t^2\mid x_t\right]\right]\Sigma^{-1}_{xx} = \sigma^2T^{-2}\Sigma^{-1}_{xx}\sum_{t=1}^{T} E[x_tx'_t]\,\Sigma^{-1}_{xx} = \sigma^2T^{-1}\Sigma^{-1}_{xx}. \]
Finally, the asymptotic distribution of the CV is found to be the same as that of OLS by noticing that the CV differs from the OLS estimator by terms of $o_p(1)$. To see that, subtract $\beta^*$ from $\hat\beta$ to give:
\[ \sqrt{T}\left(\hat\beta - \beta^*\right) = \left[\left(T^{-1}\sum_{t=1}^{T} x_tx'_t\right)^{-1} - \Sigma^{-1}_{xx}\right]\left(T^{-\frac12}\sum_{t=1}^{T} x_t\varepsilon_t\right) \overset{AS}{\to} 0. \]

3.4.3 Monte Carlo estimation of moments

The control variate $\beta^*$ allows us to construct more efficient Monte Carlo estimates of the bias and variance of $\hat\beta$ in finite samples. This can be done by defining a pooled Monte Carlo estimator as a function of the naive Monte Carlo estimator and the CV. The pooled estimator of the bias is defined as:
\[ \hat\beta^{**} - \beta = \left(\overline{\hat\beta} - \beta\right) - \left(\overline{\beta^*} - E[\beta^*]\right), \]
where:
\[ \overline{\hat\beta} = \frac{1}{M}\sum_{i=1}^{M}\hat\beta_i \quad\text{and}\quad \overline{\beta^*} = \frac{1}{M}\sum_{i=1}^{M}\beta^*_i \]
are the naive and CV estimators of the expectation of $\hat\beta$, respectively, designed so that $E[\hat\beta^{**}] = E[\hat\beta]$. The pooled estimator of the variance of the $j$th parameter in $\beta$ is defined as:
\[ V\left[\hat\beta_j\right]^{**} = \frac{1}{M-1}\sum_{i=1}^{M}\left[\left(\hat\beta_{i,j} - \overline{\hat\beta}_j\right)^2 - \left(\beta^*_{i,j} - \overline{\beta^*}_j\right)^2\right] + V\left[\beta^*_j\right], \]
which is an unbiased estimator of $V[\hat\beta_j]$.
To design a Monte Carlo study, we need to specify a process for generating data on $z_t$. To simplify the analysis, consider:
\[ z_t = \alpha z_{t-1} + v_t, \]
with $|\alpha| < 1$ and $E[\varepsilon_tv_s] = 0$ for all $t, s$. Hence, the DGP we choose is:
\[ y_t = \beta_1z_t + \beta_2y_{t-1} + \varepsilon_t, \qquad z_t = \alpha z_{t-1} + v_t, \]
with:
\[ \begin{pmatrix}\varepsilon_t\\ v_t\end{pmatrix} \sim \mathsf{IN}_2\left[\begin{pmatrix}0\\0\end{pmatrix}, \begin{pmatrix}\sigma^2_\varepsilon & 0\\ 0 & \sigma^2_v\end{pmatrix}\right]. \tag{3.31} \]
The design variables in this Monte Carlo study are $\theta = (\beta_1, \beta_2, \alpha, \sigma^2_\varepsilon, \sigma^2_v, T)'$, to which we must assign sets of values to generate data on $z$ and $y$. Each particular value of $\theta$ defines an experiment in the Monte Carlo study. We start by generating $M$ sets of $T$ independent random numbers on $(\varepsilon_t, v_t)$ according to the specified bivariate normal distribution (3.31). From each set of those numbers, and a particular value of $\beta_1$, $\beta_2$ and $\alpha$, we compute a set of $T$ observations on $(y_t, z_t)$. For each of those $M$ sets or replications, the pooled estimators of the bias and variance as defined above are computed.

Contrary to the situation in §3.2.2, the statistics we wish to examine are functions of $\beta_1$ and $\beta_2$ through past $y$s. However, it is found that two of the design variables can be set to a single value without loss of generality. To see that, we first derive the variance matrix of $(y_t : z_t)$:
\[ E[y_t^2] = \beta_1^2E[z_t^2] + \beta_2^2E[y_t^2] + \sigma^2_\varepsilon + 2\beta_1\beta_2E[y_{t-1}z_t], \]
but:
\[ E[y_{t-1}z_t] = \alpha E[y_tz_t], \]
and:
\[ E[y_tz_t] = \beta_1E[z_t^2] + \beta_2E[y_{t-1}z_t] = \beta_1E[z_t^2] + \alpha\beta_2E[y_tz_t], \]
so that:
\[ E[y_tz_t] = \frac{\beta_1E[z_t^2]}{1-\alpha\beta_2}, \quad\text{and so}\quad E[y_{t-1}z_t] = \frac{\alpha\beta_1E[z_t^2]}{1-\alpha\beta_2}. \]
Hence:
\[ E[y_t^2] = \frac{1}{1-\beta_2^2}\left(\frac{(1+\alpha\beta_2)\beta_1^2\sigma^2_v}{(1-\alpha\beta_2)(1-\alpha^2)} + \sigma^2_\varepsilon\right), \]
and so the variance matrix is:
\[ V\begin{bmatrix} y_t\\ z_t\end{bmatrix} = \begin{pmatrix} E[y_t^2] & E[y_tz_t]\\ E[y_tz_t] & E[z_t^2]\end{pmatrix} = \begin{pmatrix}\dfrac{1}{1-\beta_2^2}\left(\dfrac{(1+\alpha\beta_2)\beta_1^2\sigma^2_v}{(1-\alpha\beta_2)(1-\alpha^2)}+\sigma^2_\varepsilon\right) & \dfrac{\beta_1\sigma^2_v}{(1-\alpha\beta_2)(1-\alpha^2)}\\ \dfrac{\beta_1\sigma^2_v}{(1-\alpha\beta_2)(1-\alpha^2)} & \dfrac{\sigma^2_v}{1-\alpha^2}\end{pmatrix}. \]
This matrix involves five parameters to be obtained from three equations (the two variances and the covariance), so we must be able to fix two of them to a single value and yet be able to generate data from any process of this kind by just varying the remaining three design variables, with the only restriction that the variance matrix is positive definite. Hence, let us set $\sigma^2_\varepsilon = \sigma^2_v = 1$ without loss of generality, and consider analyzing the effects of: small and large short-run impacts, so choose $\beta_1 = (0.3, 0.6, 1)$; approaching non-stationarity, so choose $\alpha = (0.3, 0.6, 0.8)$ and $\beta_2 = (0.2, 0.5, 0.7)$; and an increasing sample size on the bias and estimated standard errors (ESE), so choose $T \in [40, 100]$. The combinations of all these values lead to 27 experiments for each sample size. For each experiment, we obtain a pooled estimator of the bias of $\hat\beta$ and of its ESE, providing a set of 'data' for each of those variables. Because the bias and standard error are functions of the parameters of the DGP, it is sensible to formulate response surfaces to help in analyzing these Monte Carlo results, to reduce the specificity of the study.

The results are also useful for testing the hypothesis $E[\hat\beta] = \beta$. First, notice that:
\[ \hat\beta - \beta \sim \mathsf{D}\left[E[\hat\beta - \beta], V[\hat\beta]\right] \]
exactly, so that what we wish to test is whether the mean of $\hat\beta - \beta$ is zero. We can construct a statistic based on the mean, since the Monte Carlo provides observations on $(\hat\beta - \beta)_i$ (e.g., the naive estimator of the bias):
\[ \frac{1}{M}\sum_{i=1}^{M}\left(\hat\beta_i - \beta\right), \]
and then find its distribution. All we can do here is to look at its asymptotic distribution. Because, as $T\to\infty$:
\[ T^{\frac12}\left(\hat\beta - \beta\right) \overset{D}{\to} N_2\left[0, \sigma^2_\varepsilon\Sigma^{-1}_{xx}\right], \]
then:
\[ T^{\frac12}\left(\hat\beta_i - \beta\right) \overset{D}{\to} N_2\left[0, \sigma^2_\varepsilon\Sigma^{-1}_{xx}\right], \quad i = 1,\ldots,M, \]
and because those observations are mutually independent:
\[ \frac{1}{M}\sum_{i=1}^{M} T^{\frac12}\left(\hat\beta_i - \beta\right) = T^{\frac12}\left(\overline{\hat\beta} - \beta\right) \overset{D}{\to} N_2\left[0, M^{-1}\sigma^2_\varepsilon\Sigma^{-1}_{xx}\right], \]
so that:
\[ \frac{T^{\frac12}}{\sqrt{M\sigma^2_\varepsilon}}\,\Sigma^{\frac12}_{xx}\left(\overline{\hat\beta} - \beta\right) \overset{D}{\to} N_2\left[0, I_2\right], \]
where $\sigma^2_\varepsilon$ and $\Sigma_{xx}$ can be replaced by $\hat\sigma^2_\varepsilon$ and $\hat\Sigma_{xx}$, the consistent estimates of their population values. This provides the test statistic:
\[ \frac{T}{M\hat\sigma^2_\varepsilon}\left(\overline{\hat\beta} - \beta\right)'\hat\Sigma_{xx}\left(\overline{\hat\beta} - \beta\right) \overset{D}{\to} \chi^2(2), \]
where $\hat\Sigma_{xx}$ and $\hat\sigma^2_\varepsilon$ are the mean values of the estimates in the experiment. The hypothesis is rejected for large values of this statistic relative to a critical value from a $\chi^2(2)$ distribution.
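A minimal sketch of this testing procedure for the DGP (3.31) is given below (Python; the design point, replication count and seed are our own illustrative choices). Because $\hat\beta$ is biased in this dynamic model at moderate $T$, the statistic will typically reject, which is exactly the departure from $E[\hat\beta] = \beta$ that the test is designed to detect.

```python
import numpy as np

rng = np.random.default_rng(11)
M, T = 2_000, 80
beta1, beta2, a = 0.6, 0.5, 0.5          # illustrative design point, s2_e = s2_v = 1

bhat = np.empty((M, 2))
Sxx_bar = np.zeros((2, 2))               # Monte Carlo average of X'X/(T-1)
s2_bar = 0.0                             # Monte Carlo average of sigma^2 estimates
for i in range(M):
    v = rng.standard_normal(T)
    e = rng.standard_normal(T)
    z = np.zeros(T); y = np.zeros(T)
    for t in range(1, T):
        z[t] = a * z[t - 1] + v[t]
        y[t] = beta1 * z[t] + beta2 * y[t - 1] + e[t]
    X = np.column_stack([z[1:], y[:-1]])
    b = np.linalg.solve(X.T @ X, X.T @ y[1:])
    bhat[i] = b
    Sxx_bar += (X.T @ X) / (T - 1) / M
    s2_bar += ((y[1:] - X @ b) ** 2).mean() / M

beta = np.array([beta1, beta2])
d = bhat.mean(axis=0) - beta             # naive Monte Carlo estimate of the bias
stat = (T - 1) * M / s2_bar * (d @ Sxx_bar @ d)
print(f"bias = {d},  chi2(2) statistic = {stat:.2f}  (5% critical value 5.99)")
```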

3.5 Structural breaks

Consider conducting a Monte Carlo experiment to investigate the effects of structural breaks on regression estimators in the process:
\[ y_t = \beta'z_t + \delta_t + \varepsilon_t \quad\text{for } t = 1,\ldots,T, \tag{3.32} \]
where $\varepsilon_t \sim \mathsf{IN}[0, \sigma^2_\varepsilon]$, and $z_t$ is fixed in repeated samples. The variable $\delta_t = 0$ till a break point $S$, $1 < S < T$, and $\delta_t = 1$ from $S$ onwards.
(1) Describe how to formulate an experiment which highlights the relative importance of such factors as the effective size of the break $\delta$, the time $S$ of the break, the goodness of fit of (3.32), and the sample size $T$.
(2) Discuss any one method of improving the efficiency of your experiment over naive Monte Carlo estimates.
(3) Explain how to examine the properties of a test which is claimed to be able to detect structural breaks.
(Oxford M.Phil., 1989)

3.5.1 Effects of structural breaks

The analysis of forecast failure in Clements and Hendry (1998) suggests that deterministic shifts are the primary culprit, so the issue discussed here is of practical relevance, albeit in a simplified context. Let us first look at the effect of the break on the properties of $\hat\beta$ and $\hat\sigma^2_\varepsilon$. To do so, we analyze the properties of these estimators for the periods before and after the break, keeping in mind that $Z$ is to be kept fixed in repeated samples, and so can be regarded as non-stochastic. The OLS estimator of $\beta$ is:
\[ \hat\beta = (Z'Z)^{-1}Z'y, \]
so for $T \le S-1$:
\[ \hat\beta = (Z'Z)^{-1}Z'(Z\beta + \varepsilon), \]
and hence $E[\hat\beta] = \beta$. For $T \ge S$, denoting by $d$ the set of values on $\delta_t$:
\[ \hat\beta = (Z'Z)^{-1}Z'(Z\beta + d + \varepsilon), \]
so that:
\[ E[\hat\beta] = \beta + (Z'Z)^{-1}Z'd. \]
Hence, $\hat\beta$ is biased for $\beta$ when the sample includes observations after the break. However, its variance does not change:
\[ V[\hat\beta] = E\left[\left(\hat\beta - \beta - (Z'Z)^{-1}Z'd\right)\left(\hat\beta - \beta - (Z'Z)^{-1}Z'd\right)'\right] = E\left[(Z'Z)^{-1}Z'\varepsilon\varepsilon'Z(Z'Z)^{-1}\right] = \sigma^2_\varepsilon(Z'Z)^{-1}. \]
Let us now look at $\hat\sigma^2_\varepsilon = \hat\varepsilon'\hat\varepsilon/(T-k)$, where $k$ is the number of variables in $z_t$. For $T \ge S$:
\[ \hat\varepsilon = y - Z\hat\beta = Z\beta + d + \varepsilon - Z\hat\beta = -Z\left(\hat\beta - \beta\right) + d + \varepsilon = M_z(d + \varepsilon), \]
where $M_z = I_T - Z(Z'Z)^{-1}Z'$. So:
\[ \hat\varepsilon'\hat\varepsilon = \varepsilon'M_z\varepsilon + 2\varepsilon'M_zd + d'M_zd, \]
and hence:
\[ E\left[\hat\sigma^2_\varepsilon\right] = \frac{1}{T-k}E\left[\hat\varepsilon'\hat\varepsilon\right] = \sigma^2_\varepsilon + \frac{1}{T-k}\,d'M_zd, \]
so that the estimated error variance becomes upward biased after the break. Of course, the basic set-up here is sufficiently simple that a detailed analysis is possible. More complicated cases, involving dynamic models or simultaneous systems, would necessitate Monte Carlo, but the present example provides a useful framework.

To conduct a Monte Carlo study to analyze the effect of the effective size of the break, the time $S$ of the break, the goodness of fit, and the sample size on the properties of the OLS estimators of $\beta$ and $\sigma^2_\varepsilon$, we will consider $z'_t = (1 : x_t)$ and assume that $x_t$ is generated by:
\[ x_t = \lambda x_{t-1} + v_t, \]
with $|\lambda| < 1$ and $v_t \sim \mathsf{IN}[0, \sigma^2_v]$. Hence, we use the DGP:
\[ y_t = \beta_0 + \beta_1x_t + \gamma\delta_t + \varepsilon_t, \qquad x_t = \lambda x_{t-1} + v_t, \]
with the (scale independent) break indicator:¹
\[ \delta_t = \begin{cases} 0 & t \le S-1\\ 1 & t \ge S, \end{cases} \]
¹ Even when the error variance is normalized to unity, as below, the size of the break is variable, and the new parameter $\gamma$ allows for that.
and we choose to analyze the simple case in which:
\[ \begin{pmatrix}\varepsilon_t\\ v_t\end{pmatrix} \sim \mathsf{IN}_2\left[\begin{pmatrix}0\\0\end{pmatrix}, \begin{pmatrix}\sigma^2_\varepsilon & 0\\ 0 & \sigma^2_v\end{pmatrix}\right]. \]
The model to be estimated is:
\[ y_t = \beta_0 + \beta_1x_t + u_t, \]

and $x_t$ is fixed in repeated samples. For this model, the effect of ignoring the break is:
\[ E[\hat\beta] = \beta + \gamma(Z'Z)^{-1}Z'd = \begin{pmatrix}\beta_0\\ \beta_1\end{pmatrix} + \gamma\begin{pmatrix} T & \sum_{t=1}^{T} x_t\\ \sum_{t=1}^{T} x_t & \sum_{t=1}^{T} x_t^2\end{pmatrix}^{-1}\begin{pmatrix} T-S+1\\ \sum_{s=S}^{T} x_s\end{pmatrix}. \]
Hence $E[\hat\beta_0] - \beta_0$ is given by:
\[ \frac{\gamma}{T\sum_{t=1}^{T} x_t^2 - \left(\sum_{t=1}^{T} x_t\right)^2}\left((T-S+1)\sum_{t=1}^{T} x_t^2 - \sum_{t=1}^{T} x_t\sum_{s=S}^{T} x_s\right) = \frac{\gamma}{T^2V[x_t]}\left((T-S+1)\left[TV[x_t] + T^{-1}\left(\sum_{t=1}^{T} x_t\right)^2\right] - \sum_{t=1}^{T} x_t\sum_{s=S}^{T} x_s\right) = \frac{\gamma(T-S+1)}{T}\left(1 - \frac{\bar{x}\left(\bar{x}_{T-S} - \bar{x}\right)}{V[x_t]}\right), \]
where:
\[ V[x_t] = T^{-1}\sum_{t=1}^{T}(x_t - \bar{x})^2 \quad\text{and}\quad \bar{x}_{T-S} = \frac{1}{T-S+1}\sum_{s=S}^{T} x_s. \]
Thus, the magnitude of the bias increases the earlier the break occurs, which is unsurprising since the model has an intercept close to $\beta_0 + \gamma$ when $S$ is small. Similarly:
\[ E[\hat\beta_1] - \beta_1 = \frac{\gamma}{TV[x_t]}\left(\sum_{s=S}^{T} x_s - (T-S+1)\bar{x}\right) = \frac{\gamma}{TV[x_t]}\left((T-S+1)\bar{x}_{T-S} - (T-S+1)\bar{x}\right) = \frac{\gamma(T-S+1)}{T}\left(\frac{\bar{x}_{T-S} - \bar{x}}{V[x_t]}\right) = \frac{\gamma(T-S+1)(S-1)}{T^2}\left(\frac{\bar{x}_{T-S} - \bar{x}_S}{V[x_t]}\right), \]
as:
\[ \bar{x} = \frac{T-S+1}{T}\,\bar{x}_{T-S} + \frac{S-1}{T}\,\bar{x}_S, \]
so:
\[ \bar{x}_{T-S} - \bar{x} = \frac{S-1}{T}\left(\bar{x}_{T-S} - \bar{x}_S\right), \tag{3.33} \]
where:
\[ \bar{x}_S = \frac{1}{S-1}\sum_{t=1}^{S-1} x_t. \]
The bias in $\hat\beta_1$ is essentially symmetric in the break point, as $(T-S+1)(S-1)$ is maximized at $S-1 = T/2$. However, providing the $x_t$ process has remained stationary, one might anticipate $\bar{x}_{T-S} \simeq \bar{x}_S$, so the bias will be small in any case.

The bias inβ1 is essentially symmetric in the break point, as(T − S + 1) (S − 1)is maximized atS − 1 = T/2. However, providing thext process has remainedstationary, one might anticipatexT−S ' xS , so the bias will be small in any case.

To obtainE[σ2ε ], we use the fact thatMz = M0 − M′

0x (x′M0x)−1 x′M0, whereM0 = IT − ι (ι′ι)−1

ι′, with ι being aT × 1 vector of ones. HenceE[σ2ε ]− σ2

ε equals:

γ2

T − 2d′Mzd

=γ2

T − 2

(d′M0d− d′M0x (x′M0x)−1 x′M0d

)=

γ2 (T − S + 1)T − 2

(S − 1T

− (T − S + 1) (xT−S − x)2

TV [xt]

)

= γ2 (T − S + 1) (S − 1)T (T − 2)

(1 − (T − S + 1)

(S − 1)(xT−S − xS)2

V [xt]

),

using (3.33). Hence, the effect of the break here is relative toσ2ε , andσ2

ε is biased evenif the sub-sample means coincide. Both impacts of the break are symmetric in the breakpoint, and increase with the magnitude of the break.

Because a non-stochastic sequence can be regarded as a sequence of independent but non-identically distributed random variables, provided $x_t$ is uniformly bounded, then $\bar{x} \overset{AS}{\to} E[x_t] = 0$ (see White, 1984, p.34). Hence, when $S$ is independent of the size of the sample, as $T\to\infty$:
\[ T^{-1}\sum_{t=S}^{T} x_t \overset{AS}{\to} 0 \quad\text{and}\quad T^{-1}\sum_{t=1}^{T} x_t^2 \overset{AS}{\to} E[x_t^2], \]
so:
\[ E[\hat\beta_0] \overset{AS}{\to} \beta_0 + \gamma \quad\text{and}\quad E[\hat\beta_1] \overset{AS}{\to} \beta_1, \]
with:
\[ E[\hat\sigma^2_\varepsilon] \overset{AS}{\to} \sigma^2_\varepsilon. \]
Thus, for fixed $S$, the effect of the break on the bias of $\hat\beta_1$ and $\hat\sigma^2_\varepsilon$ wears off as $T\to\infty$. However, when $S = \kappa T$, we have:
\[ E[\hat\beta_0] \overset{AS}{\to} \beta_0 + \gamma(1-\kappa) \tag{3.34} \]
and:
\[ E[\hat\sigma^2_\varepsilon] \overset{AS}{\to} \sigma^2_\varepsilon + \gamma^2\kappa(1-\kappa), \tag{3.35} \]
so the error variance remains affected.

as the residual autocorrelation properties of any model affected by a shift-in-mean

break, where analysis becomes tedious, albeit feasible. If the break is ignored, theresiduals from the regression ofy onZ have the following expression:

εt = yt − z′tβ

= −z′t(β − β) + δtγ + εt

= −z′t(Z′Z)−1Z′dγ + δtγ − z′t(Z

′Z)−1Z′ε + εt

= w′tdγ + w′

tε,

wherew′t = i′ − z′t(Z′Z)−1Z′ wherei is zero except for unity at thetthobservation,

implying the first-order covariance:

C [εt, εt−1] = γ2w′tdd′wt−1 + E [w′

tεε′wt−1]

= γ2w′tdd′wt−1 + σ2

ε z′t(Z

′Z)−1zt−1.

The second term isO(T−1), and can be ignored. The variance is:

V [εt] = γ2w′tdd′wt + E [w′

tεε′wt]

= γ2w′tdd′wt + σ2

ε

(1 + z′t(Z

′Z)−1zt

),

so toO(T−1), the first-order residual autocorrelation is:

r [εt, εt−1] =w′

tdd′wt−1

w′tdd′wt + σ2

ε/γ2,

which is different from zero and increasing inγ/σε.The design variables in this Monte Carlo areθ = (β0, β1, S, λ, γ, σ

2ε , σ

2v , T ).

However, the residual autocorrelations and the biases ofβ0, β1 andσ2ε are independent

of β0 andβ1 (the residual autocorrelation is a function ofγ2/σ2ε ), so thatβ0 andβ1

can be set to a single value without loss of generality. In addition, the process has thefollowing properties:

E [yt] = β0 + β1E [xt] + γδt + E [εt] = β0 + γδt,

with:

V [yt] = E[(yt − β0 − γδt)2

]= E

[(β1xt + εt)2

]= β2

1E[x2

t

]+ σ2

ε

=β2

1σ2v

1 − λ2+ σ2

ε ,

and:

C [yt, xt] = E [(yt − β0 − γδt)xt]

= E [(β1xt + εt)xt]

=β1σ

2v

1 − λ2,

so that:

(yt

xt

)∼ D

( β0 + γδt0

),

β21σ

2v

1 − λ2+ σ2

ε

β1σ2v

1 − λ2

β1σ2v

1 − λ2

σ2v

1 − λ2

.

Hence, the process is a function of six parameters. However, for the purposes of thisMonte Carlo we have already argued thatβ0 andβ1 can be set to a fixed value, sothat we have four equations in four parameters. WhenS = κT , the key variables inthe Monte Carlo areκ, T, γ, λ, σ2

ε andσ2v . We chooseλ = (0.3, 0.5, 0.8) to consider

stationary and nearly non-stationary processes. Also, by selectingκ = (0.3, 0.5, 0.7),we can analyze the effect of the break at different points in time: early, halfway throughand close to the end of the sample.σ2

ε = (0.25, 0.5, 1) andσ2v = (1, 5, 10) allows us

to consider various values of the signal-to-noise ratio. Finally,T ∈ [40, 100] providesrecursive Monte Carlo results to investigate the effect of increasing sample size. Thiswould generate51 experiments for each sample size; takingγ = (0.5, 1, 5) would make153. A small random selection would suffice to highlight the main features, althoughit would ‘confound’ the different effects of the design variables relative to a completefactorial experiment.

We undertake just one of these possible experiments, setting $\sigma^2_\varepsilon = 1$, $\sigma^2_v = 1$, $T = 40$ and $\kappa = 0.7$ with $\gamma = 1$, using $M = 1000$ replications as a 'pilot'. The following biases resulted: for $\hat\beta_0 - \beta_0$, 0.32 (0.01); for $\hat\beta_1 - \beta_1$, −0.01 (0.01); and for $\hat\sigma^2_\varepsilon - \sigma^2_\varepsilon$, 0.22 (0.01), with Monte Carlo standard errors in parentheses. These are precisely determined, and compare to $\gamma(1-\kappa) = 0.3$ in (3.34), zero, and $\gamma^2\kappa(1-\kappa) = 0.21$ in (3.35). Thus, that aspect is closely matched by the theory for $S = \kappa T$. The DW test rejects 21% of the time for a nominal 5% level; when $\gamma = 2$, rejection rises to 83%, so unmodelled breaks can induce considerable residual autocorrelation. Figure 3.1a–c shows the recursively calculated simulation biases (we discuss segment d below). There is almost no impact on $\hat\beta_1$ (the bordering lines are $\hat\beta_1 \pm 2\,\mathrm{MCSE}$, showing the precision of the Monte Carlo), as against a large shift in both $\hat\beta_0$ and $\hat\sigma^2_\varepsilon$ after the break.

[Figure 3.1: Monte Carlo recursively-estimated biases and rejection frequencies (panels a–c: biases; panel d: rejection frequencies).]
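The pilot experiment can be replicated along the following lines (a sketch: $\lambda$ is not stated for the pilot in the text, so the value 0.5 below is our assumption, as are the seed and the choice $\beta_0 = \beta_1 = 1$). The simulated biases are compared with the asymptotic values in (3.34) and (3.35).

```python
import numpy as np

rng = np.random.default_rng(5)
M, T, kappa, gamma, lam = 1_000, 40, 0.7, 1.0, 0.5   # lam is an assumed value
S = int(kappa * T)                                   # break point
d = (np.arange(1, T + 1) >= S).astype(float)         # step dummy delta_t

x = np.zeros(T)
for t in range(1, T):                                # x fixed in repeated samples
    x[t] = lam * x[t - 1] + rng.standard_normal()
Z = np.column_stack([np.ones(T), x])

b0_bias = np.empty(M); b1_bias = np.empty(M); s2_bias = np.empty(M)
for i in range(M):
    e = rng.standard_normal(T)
    y = 1.0 + 1.0 * x + gamma * d + e                # beta0 = beta1 = 1
    b, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ b
    b0_bias[i] = b[0] - 1.0
    b1_bias[i] = b[1] - 1.0
    s2_bias[i] = resid @ resid / (T - 2) - 1.0

print(f"bias(b0) = {b0_bias.mean():.3f}  vs gamma*(1-kappa)        = {gamma * (1 - kappa):.2f}")
print(f"bias(b1) = {b1_bias.mean():.3f}  vs 0")
print(f"bias(s2) = {s2_bias.mean():.3f}  vs gamma^2*kappa*(1-kappa) = {gamma**2 * kappa * (1 - kappa):.2f}")
```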

3.5.2 Efficient Monte Carlo

Because (when $\gamma = 1$):
\[ \hat\beta\mid Z \sim N\left[\beta + (Z'Z)^{-1}Z'd,\ \sigma^2_\varepsilon(Z'Z)^{-1}\right], \]
antithetic variates may be a suitable technique for reducing the variances of the Monte Carlo estimates. Thus, compute $\hat\beta$ both for a sequence of $\varepsilon_t$ and for a sequence of the same random numbers with opposite sign, namely $-\varepsilon_t$, and define a new estimator based on their average, such that:
\[ E\left[\tfrac12\left(\hat\beta_\varepsilon + \hat\beta_{-\varepsilon}\right)\right] = \beta + (Z'Z)^{-1}Z'd. \]

As $V[\hat\beta_\varepsilon] = V[\hat\beta_{-\varepsilon}]$, but $C[\hat\beta_\varepsilon, \hat\beta_{-\varepsilon}] = -V[\hat\beta_\varepsilon]$:
\[ V\left[\tfrac12\left(\hat\beta_\varepsilon + \hat\beta_{-\varepsilon}\right)\right] = \tfrac14\left(V[\hat\beta_\varepsilon] + V[\hat\beta_{-\varepsilon}] + 2C[\hat\beta_\varepsilon, \hat\beta_{-\varepsilon}]\right) = \tfrac14\left(2V[\hat\beta_\varepsilon] - 2V[\hat\beta_\varepsilon]\right) = 0, \]
so that the new estimator is unbiased for the bias of $\hat\beta$, and is dramatically efficient (zero variance). However, as fig. 3.1b shows, the precision is already high compared to the size of the shift under analysis. A control variate could also be constructed with ease here, but cannot improve on antithetic variates for the bias, although it may for estimating variances.

3.5.3 Tests to detect structural breaks

We will consider the Chow (1960) test when the break is at point $S$ in time, and look at its properties. The statistic associated with this test is defined for $k$ regressors as:
\[ F_{\mathrm{Chow}} = \left(\frac{S-1-k}{T-S+1}\right)\frac{\mathrm{RSS}_T - \mathrm{RSS}_{S-1}}{\mathrm{RSS}_{S-1}}, \]
where $\mathrm{RSS}_{S-1}$ and $\mathrm{RSS}_T$ are the residual sums of squares from regressing $y$ on $Z$ over the first $(S-1)$ and over all $T$ observations, respectively. However, $\mathrm{RSS}_{S-1}$ is also the residual sum of squares in the following regression:
\[ y = Z\beta + D\gamma + \varepsilon, \tag{3.36} \]
where $D$ is a matrix of $T-S+1$ impulse dummies defined by:
\[ D = \begin{pmatrix} 0\\ I_{T-S+1}\end{pmatrix}. \]

To see that, write:
\[ \begin{pmatrix}\hat\beta\\ \hat\gamma\end{pmatrix} = (W'W)^{-1}W'y, \]
where $W = (Z : D)$, so that:
\[ W'W = \begin{pmatrix} Z'Z & Z'D\\ D'Z & D'D\end{pmatrix} = \begin{pmatrix} Z'Z & Z'_{T-S+1}\\ Z_{T-S+1} & I_{T-S+1}\end{pmatrix}, \]
where $Z_{T-S+1}$ is the matrix of the last $T-S+1$ rows of $Z$. Hence, denoting:
\[ A = \left(Z'Z - Z'_{T-S+1}Z_{T-S+1}\right)^{-1} = \left(Z'_{S-1}Z_{S-1}\right)^{-1}, \]
then:
\[ (W'W)^{-1} = \begin{pmatrix} A & -AZ'_{T-S+1}\\ -Z_{T-S+1}A & I_{T-S+1} + Z_{T-S+1}AZ'_{T-S+1}\end{pmatrix}, \]
and so:
\[ \begin{pmatrix}\hat\beta\\ \hat\gamma\end{pmatrix} = \begin{pmatrix} AZ' - AZ'_{T-S+1}D'\\ -Z_{T-S+1}AZ' + \left[I_{T-S+1} + Z_{T-S+1}AZ'_{T-S+1}\right]D'\end{pmatrix}y = \begin{pmatrix} AZ'y - AZ'_{T-S+1}y_{T-S+1}\\ -Z_{T-S+1}AZ'y + \left[I_{T-S+1} + Z_{T-S+1}AZ'_{T-S+1}\right]y_{T-S+1}\end{pmatrix}, \tag{3.37} \]
where $y_{T-S+1}$ is the vector of the last $T-S+1$ elements of $y$. Partition the vector of residuals into those associated with the first $(S-1)$ and the last $(T-S+1)$ observations, respectively, i.e., $\hat\varepsilon = (\hat\varepsilon'_{S-1} : \hat\varepsilon'_{T-S+1})'$. Hence, replacing $\hat\beta$ and $\hat\gamma$:
\[ \hat\varepsilon_{T-S+1} = y_{T-S+1} - Z_{T-S+1}\hat\beta - I_{T-S+1}\hat\gamma = 0, \]
so that:
\[ \hat\varepsilon = \begin{pmatrix}\hat\varepsilon_{S-1}\\ 0\end{pmatrix}, \]
and hence:
\[ \hat\varepsilon'\hat\varepsilon = \hat\varepsilon'_{S-1}\hat\varepsilon_{S-1} = \mathrm{RSS}_{S-1}, \tag{3.38} \]

proving our earlier claim that $\mathrm{RSS}_{S-1}$ is also the residual sum of squares in the regression (3.36) using all $T$ observations (see Salkever, 1976). In addition, $\mathrm{RSS}_T$ is the residual sum of squares in the regression $y = Z\beta + \varepsilon$ using all observations, which is regression (3.36) with $\gamma$ restricted to be zero. Hence, the Chow statistic can be written as:
\[ F_{\mathrm{Chow}} = \left(\frac{S-1-k}{T-S+1}\right)\frac{\tilde\varepsilon'\tilde\varepsilon - \hat\varepsilon'\hat\varepsilon}{\hat\varepsilon'\hat\varepsilon}, \]
where $\tilde\varepsilon = y - Z\tilde\beta = M_z\varepsilon$ under the null. The Chow statistic is then the $F$-statistic for testing $\gamma = 0$ in (3.36), since, writing $M_w = M_z - M_zD(D'M_zD)^{-1}D'M_z$:
\[ \tilde\varepsilon'\tilde\varepsilon - \hat\varepsilon'\hat\varepsilon = \varepsilon'M_z\varepsilon - \varepsilon'M_w\varepsilon = \varepsilon'M_zD(D'M_zD)^{-1}D'M_z\varepsilon = (\hat\gamma - \gamma)'(D'M_zD)(\hat\gamma - \gamma), \]
as $\hat\gamma = \gamma + (D'M_zD)^{-1}D'M_z\varepsilon$, and so:
\[ F_{\mathrm{Chow}} = \left(\frac{S-1-k}{T-S+1}\right)\frac{(\hat\gamma - \gamma)'(D'M_zD)(\hat\gamma - \gamma)}{\hat\varepsilon'\hat\varepsilon}. \]

To examine the properties of this test, we derive its distribution under the null and also under the alternative of a single break at time $S$. Conditional upon $z$:
\[ \hat\gamma - \gamma \sim N\left[0, \sigma^2_\varepsilon(D'M_zD)^{-1}\right], \]
and so, under the null of no break, $H_0$: $\gamma = 0$:
\[ \frac{\hat\gamma'(D'M_zD)\hat\gamma}{\sigma^2_\varepsilon} = \frac{\varepsilon'M_zD(D'M_zD)^{-1}D'M_z\varepsilon}{\sigma^2_\varepsilon} \]
is a quadratic form in $\mathsf{IN}[0,1]$ variates, where the matrix in the form is idempotent with a trace of $T-S+1$. Thus:
\[ \frac{\hat\gamma'(D'M_zD)\hat\gamma}{\sigma^2_\varepsilon} \overset{H_0}{\sim} \chi^2(T-S+1). \]
Next, because of (3.38), $\hat\varepsilon'\hat\varepsilon$ can be written as $\hat\varepsilon'_{S-1}\hat\varepsilon_{S-1}$, and from (3.37):
\[ \hat\varepsilon_{S-1} = y_{S-1} - Z_{S-1}\hat\beta = y_{S-1} - Z_{S-1}\left(Z'_{S-1}Z_{S-1}\right)^{-1}\left[Z'y - Z'_{T-S+1}y_{T-S+1}\right] = y_{S-1} - Z_{S-1}\left(Z'_{S-1}Z_{S-1}\right)^{-1}\left[\left(Z'_{S-1} : Z'_{T-S+1}\right)\begin{pmatrix}y_{S-1}\\ y_{T-S+1}\end{pmatrix} - Z'_{T-S+1}y_{T-S+1}\right] = y_{S-1} - Z_{S-1}\left(Z'_{S-1}Z_{S-1}\right)^{-1}Z'_{S-1}y_{S-1} = \left[I_{S-1} - Z_{S-1}\left(Z'_{S-1}Z_{S-1}\right)^{-1}Z'_{S-1}\right]\varepsilon_{S-1}, \]
so that $\hat\varepsilon'\hat\varepsilon/\sigma^2_\varepsilon$ is a quadratic form in $\mathsf{IN}[0,1]$ variables, where the matrix of the form is idempotent with a trace of $S-1-k$, and so the denominator of the Chow statistic is $\hat\varepsilon'\hat\varepsilon/\sigma^2_\varepsilon \sim \chi^2(S-1-k)$. Finally, the two $\chi^2$s are independent because $\hat\varepsilon'\hat\varepsilon = \varepsilon'M_w\varepsilon = \hat\varepsilon'_{S-1}\hat\varepsilon_{S-1}$, so that the matrix in the quadratic form in the denominator is $M_w$, which annihilates the matrix of the quadratic form in the numerator. Hence:
\[ F_{\mathrm{Chow}} = \left(\frac{S-1-k}{T-S+1}\right)\frac{\hat\gamma'(D'M_zD)\hat\gamma}{\hat\varepsilon'\hat\varepsilon} \overset{H_0}{\sim} F(T-S+1, S-1-k). \]

Consider now its distribution under the alternative $H_1$: $\gamma = \gamma_a \ne 0$. Then:
\[ \hat\gamma - \gamma_a \sim N\left[0, \sigma^2_\varepsilon(D'M_zD)^{-1}\right], \]
and so:
\[ \frac{\hat\gamma'(D'M_zD)\hat\gamma}{\sigma^2_\varepsilon} \overset{H_1}{\sim} \chi^2\left(T-S+1, \psi^2\right), \]
where $\psi^2 = \gamma'_a(D'M_zD)\gamma_a/\sigma^2_\varepsilon$. The denominator is the same as for the null, and so:
\[ F_{\mathrm{Chow}} = \left(\frac{S-1-k}{T-S+1}\right)\frac{\hat\gamma'(D'M_zD)\hat\gamma}{\hat\varepsilon'\hat\varepsilon} \overset{H_1}{\sim} F\left(T-S+1, S-1-k; \psi^2\right), \]
where $F(T-S+1, S-1-k; \psi^2)$ is a non-central $F$ with $T-S+1$ and $S-1-k$ degrees of freedom and non-centrality parameter $\psi^2$. Thus, we can compute the power function of this test exactly, and obtain the exact probability of rejecting the null when it is false.

3.6 Multivariate recursive estimation

Consider a linear system of $k > 1$ endogenous and $n > k$ conditioning, jointly-stationary variables $y_t$, $z_t$, where $\Pi$ is $k\times n$:
\[ y_t = \Pi z_t + \varepsilon_t \quad\text{where}\quad \varepsilon_t \sim \mathsf{IN}_k[0, \Omega] \quad\text{for } t = 1,\ldots,T. \tag{3.39} \]
(1) Derive the (multivariate) least-squares estimators $(\hat\Pi, \hat\Omega)$ of $(\Pi, \Omega)$ for $T$ observations and obtain the limiting distribution of $\hat\Pi$ as $T\to\infty$, carefully stating any necessary assumptions and any theorems used.
(2) Explain how to compute $\hat\Pi$ recursively (without repeated inversion) over $t = M,\ldots,T$, where $M > k+n$.
(3) Explain how to recursively update the estimate of the $i$th diagonal element $\omega_{ii}$ in $\hat\Omega$, and hence how to compute a sequence of $F$-tests for the constancy of the coefficients in the $i$th equation.
(Oxford M.Phil., 1988)

3.6.1 Multivariate least-squares

The system in (3.39) can be written as:
\[ Y' = \Pi Z' + \varepsilon', \]
where $Y$ and $Z$ are the matrices of $T$ observations on the $k$ variables in $y_t$ and on the $n$ variables in $z_t$, respectively, and $\varepsilon$ is the corresponding $T\times k$ matrix of disturbances. Hence, the $i$th equation in this system is:
\[ y_i = Z\pi'_i + \varepsilon_i, \]
($i = 1,\ldots,k$), so that we can also write the system as:
\[ y^* = (I_k\otimes Z)\beta + \varepsilon^*, \]
where $\beta = \Pi^v$ and $(\cdot)^v$ denotes vectorizing the argument by rows. Let us now look at the covariance matrix of $\varepsilon^*$:
\[ E[\varepsilon^*\varepsilon^{*\prime}] = \begin{pmatrix} E[\varepsilon_1\varepsilon'_1] & \cdots & E[\varepsilon_1\varepsilon'_k]\\ \vdots & \ddots & \vdots\\ E[\varepsilon_k\varepsilon'_1] & \cdots & E[\varepsilon_k\varepsilon'_k]\end{pmatrix} = \begin{pmatrix}\omega_{11}I_T & \cdots & \omega_{1k}I_T\\ \vdots & \ddots & \vdots\\ \omega_{k1}I_T & \cdots & \omega_{kk}I_T\end{pmatrix} = \Omega\otimes I_T, \]
so that the disturbances have a non-diagonal covariance matrix. Before deriving the estimator, we will transform the system to have disturbances with a diagonal variance matrix. To do so, since $\Omega^{-1}$ is positive definite, it can be written as $\Omega^{-1} = H'H$, where $H$ is a non-singular matrix. Premultiply the system by $(H\otimes I_T)$:
\[ (H\otimes I_T)y^* = (H\otimes Z)\beta + (H\otimes I_T)\varepsilon^*, \]
or:
\[ y^+ = Z^+\beta + \varepsilon^+, \]
in the obvious notation. That $V[\varepsilon^+]$ is diagonal can be shown by writing:
\[ E[\varepsilon^+\varepsilon^{+\prime}] = (H\otimes I_T)E[\varepsilon^*\varepsilon^{*\prime}](H'\otimes I_T) = (H\otimes I_T)(\Omega\otimes I_T)(H'\otimes I_T) = I_{kT}. \]
The multivariate least-squares (MLS) estimator of $\beta$ is obtained by minimizing $\varepsilon^{+\prime}\varepsilon^+$ with respect to $\beta$, which is found to have the following expression:
\[ \hat\beta = \left(Z^{+\prime}Z^+\right)^{-1}Z^{+\prime}y^+ = \beta + \left[(H'\otimes Z')(H\otimes Z)\right]^{-1}(H'\otimes Z')\varepsilon^+ = \beta + \left(\Omega\otimes(Z'Z)^{-1}\right)(H'\otimes Z')\varepsilon^+. \]

To find the asymptotic distribution of $\hat\beta$, we will make the following assumptions: (i) denoting $X_{t-1} = (Z_t : Y_{t-1})$, it is assumed that $\varepsilon_t\mid X_{t-1} \sim \mathsf{IN}[0, \Omega]$; (ii) $z_t$ has moments up to the fourth order which are bounded as $T\to\infty$; and (iii) $z_t$ is ergodic. Hence, write:
\[ (H'\otimes Z')\varepsilon^+ = \begin{pmatrix} h_{11}Z' & \cdots & h_{k1}Z'\\ \vdots & \ddots & \vdots\\ h_{1k}Z' & \cdots & h_{kk}Z'\end{pmatrix}\begin{pmatrix}\varepsilon^+_1\\ \vdots\\ \varepsilon^+_k\end{pmatrix} = \begin{pmatrix}\sum_{i=1}^{k} h_{i1}Z'\varepsilon^+_i\\ \vdots\\ \sum_{i=1}^{k} h_{ik}Z'\varepsilon^+_i\end{pmatrix} = \sum_{t=1}^{T}\begin{pmatrix}\sum_{i=1}^{k} h_{i1}z_t\varepsilon^+_{i,t}\\ \vdots\\ \sum_{i=1}^{k} h_{ik}z_t\varepsilon^+_{i,t}\end{pmatrix} = \sum_{t=1}^{T}(H'\otimes z_t)\varepsilon^+_t, \]
where $z_t = (z_{1t},\ldots,z_{nt})'$. Because of assumption (i), $\varepsilon_t$ is an MDS with respect to $X_{t-1}$, since $E[\varepsilon_t\mid X_{t-1}] = 0$. Also, $E[\varepsilon^+_t\varepsilon^{+\prime}_t\mid X_{t-1}] = I_k$. So, letting $E[z_tz'_t] = \Sigma_{zz}$:
\[ V\left[(H'\otimes z_t)\varepsilon^+_t\right] = E\left[(H'\otimes z_t)E\left[\varepsilon^+_t\varepsilon^{+\prime}_t\mid X_{t-1}\right](H\otimes z'_t)\right] = \Omega^{-1}\otimes\Sigma_{zz} := \Sigma. \]

Define $w_t = \lambda'\Sigma^{-\frac12}(H'\otimes z_t)\varepsilon^+_t$, where $\lambda$ is a $kn\times1$ vector of constants such that $\lambda'\lambda = 1$. Then $w_t$ is also an MDS because $E[w_t\mid X_{t-1}] = 0$. In addition:
\[ V[w_t] = \lambda'\Sigma^{-\frac12}E\left[(H'\otimes z_t)\varepsilon^+_t\varepsilon^{+\prime}_t(H\otimes z'_t)\right]\Sigma^{-\frac12}\lambda = \lambda'\Sigma^{-\frac12}E\left[(H'\otimes z_t)E\left[\varepsilon^+_t\varepsilon^{+\prime}_t\mid X_{t-1}\right](H\otimes z'_t)\right]\Sigma^{-\frac12}\lambda = \lambda'\Sigma^{-\frac12}\left(\Omega^{-1}\otimes E[z_tz'_t]\right)\Sigma^{-\frac12}\lambda = 1. \]
Define:
\[ S^2_T = \sum_{t=1}^{T} V[w_t] = T. \]

Also, writing $\lambda^{*\prime} = \lambda'\Sigma^{-\frac12}(H'\otimes z_t)$, $E[w^4_t]$ is given by:
\[ E\left[\left(\lambda'\Sigma^{-\frac12}(H'\otimes z_t)\varepsilon^+_t\right)^4\right] = E\left[\left(\lambda^{*\prime}\varepsilon^+_t\right)^4\right] = E\left[\left(\sum_{i=1}^{kn}\lambda^*_i\varepsilon^+_{it}\right)^4\right] = \sum_{i=1}^{kn} E\left[(\lambda^*_i)^4E\left[(\varepsilon^+_{it})^4\mid X_{t-1}\right]\right] + 2\sum_{i\ne j}^{kn} E\left[(\lambda^*_i\lambda^*_j)^2E\left[(\varepsilon^+_{it}\varepsilon^+_{jt})^2\mid X_{t-1}\right]\right], \]
collecting the terms with non-vanishing conditional expectations. To derive these expectations, notice that $\varepsilon^+$ is a linear combination of $\varepsilon^*$, so that:
\[ \varepsilon^+\mid X_{t-1} \sim N_{kT}[0, I_{kT}], \]
which implies that $E[(\varepsilon^+_{it})^4\mid X_{t-1}] = 3$ and $E[(\varepsilon^+_{it}\varepsilon^+_{jt})^2\mid X_{t-1}] = 1$, due to the independence of the $\varepsilon^+_{it}$. Hence:
\[ E[w^4_t] = 3\sum_{i=1}^{kn} E\left[(\lambda^*_i)^4\right] + 2\sum_{i\ne j}^{kn} E\left[(\lambda^*_i\lambda^*_j)^2\right] < \infty \]
as $T\to\infty$, because of the definition of $\lambda^*$ and assumption (ii). Hence:
\[ \lim_{T\to\infty} S^{-4}_T\sum_{t=1}^{T} E[w^4_t] = 0, \]
implying that the Lindeberg condition is satisfied.

Also, $w_t$ is ergodic because it is stationary and serially uncorrelated, since:
\[ E[w_tw_{t-\tau}] = \lambda'\Sigma^{-\frac12}E\left[(H'\otimes z_t)\varepsilon^+_t\varepsilon^{+\prime}_{t-\tau}(H\otimes z'_{t-\tau})\right]\Sigma^{-\frac12}\lambda = \lambda'\Sigma^{-\frac12}E\left[(H'\otimes z_t)E\left[\varepsilon^+_t\varepsilon^{+\prime}_{t-\tau}\mid X_{t-1}\right](H\otimes z'_{t-\tau})\right]\Sigma^{-\frac12}\lambda = 0. \]
Hence:
\[ T^{-1}\sum_{t=1}^{T} w^2_t \overset{AS}{\to} E[w^2_t] = 1, \]
and so all the conditions for:
\[ T^{-\frac12}\sum_{t=1}^{T} w_t \overset{D}{\to} N[0,1] \]
are satisfied. In addition, the Cramér–Wold device implies that:
\[ T^{-\frac12}\Sigma^{-\frac12}(H'\otimes Z')\varepsilon^+ = T^{-\frac12}\Sigma^{-\frac12}\sum_{t=1}^{T}(H'\otimes z_t)\varepsilon^+_t \overset{D}{\to} N[0, I_{kn}]. \]

Next, by assumption (iii), $z_t$ is ergodic, so that:
\[ T^{-1}Z'Z = T^{-1}\sum_{t=1}^{T} z_tz'_t \overset{AS}{\to} \Sigma_{zz}, \]
and by Slutsky's theorem:
\[ \Omega\otimes\left(T^{-1}Z'Z\right)^{-1} \overset{AS}{\to} \Omega\otimes\Sigma^{-1}_{zz}. \]
Finally, by Cramér's theorem:
\[ \sqrt{T}\left(\hat\beta - \beta\right) = \left(\Omega\otimes\left(T^{-1}Z'Z\right)^{-1}\right)T^{-\frac12}(H'\otimes Z')\varepsilon^+ \overset{D}{\to} N_{kn}\left[0, \Omega\otimes\Sigma^{-1}_{zz}\right]. \]

Also, $\hat\Omega = T^{-1}\hat\varepsilon'\hat\varepsilon$, where $\hat\varepsilon' = Y' - \hat\Pi Z'$, is a consistent estimator of $\Omega$. To see this, write:
\[ \hat\varepsilon' = Y' - \hat\Pi Z' = -\left(\hat\Pi - \Pi\right)Z' + \varepsilon', \]
so that:
\[ \hat\varepsilon'\hat\varepsilon = \left(\hat\Pi - \Pi\right)Z'Z\left(\hat\Pi - \Pi\right)' - \left(\hat\Pi - \Pi\right)Z'\varepsilon - \varepsilon'Z\left(\hat\Pi - \Pi\right)' + \varepsilon'\varepsilon. \]
But $\hat\beta = \hat\Pi^v$, and we have already shown that under appropriate assumptions $\hat\beta - \beta = O_p(T^{-1/2})$. So $\hat\Pi - \Pi = O_p(T^{-1/2})$. Also:
\[ T^{-1}Z'\varepsilon = T^{-1}\sum_{t=1}^{T} z_t\varepsilon'_t, \]
and because $\varepsilon_t$ is IID, it is stationary and ergodic, and so $z_{i,t}\varepsilon_{j,t}$ is also stationary and ergodic. Hence:
\[ T^{-1}Z'\varepsilon \overset{AS}{\to} E[z_t\varepsilon'_t] = E\left[z_tE[\varepsilon'_t\mid X_{t-1}]\right] = 0. \]
Finally, by Kolmogorov's SLLN:
\[ T^{-1}\hat\varepsilon'\hat\varepsilon = T^{-1}\sum_{t=1}^{T}\varepsilon_t\varepsilon'_t + o_p(1) \overset{AS}{\to} E[\varepsilon_t\varepsilon'_t] = E\left[E[\varepsilon_t\varepsilon'_t\mid X_{t-1}]\right] = \Omega. \]

3.6.2 Recursive computation of Π

Because $\hat\beta = \hat\Pi^v$ and:
\[ \hat\beta = \left(Z^{+\prime}Z^+\right)^{-1}Z^{+\prime}y^+, \]
$\hat\beta_t$ can be recursively computed from the expression:
\[ \hat\beta_{M+1} = \hat\beta_M + \frac{\mu_{M+1}\nu_{M+1}}{1 + z^{+\prime}_{M+1}\mu_{M+1}}, \]
where:
\[ \mu_{M+1} = \left(Z^{+\prime}_MZ^+_M\right)^{-1}z^+_{M+1} \quad\text{and}\quad \nu_{M+1} = y^+_{M+1} - \hat\beta'_Mz^+_{M+1}, \]
with $Z^+_M$, $z^+_{M+1}$ and $y^+_{M+1}$ denoting the first $M$ rows of $Z^+$, the $(M+1)$st row of $Z^+$ and the $(M+1)$st element of $y^+$, respectively (see Hendry, 1995a, pp.78–9).

3.6.3 Recursive update of Ω

Because:
\[ \hat\Omega = \frac{1}{T}\hat\varepsilon'\hat\varepsilon = \frac{1}{T}\sum_{t=1}^{T}\hat\varepsilon_t\hat\varepsilon'_t, \]
then:
\[ \hat\omega_{ii} = \frac{1}{T}\sum_{t=1}^{T}\hat\varepsilon^2_{i,t}. \]
From (3.39), $\hat\varepsilon_t = y_t - \hat\Pi z_t$, so that $\hat\varepsilon_{i,t} = y_{i,t} - \hat\Pi'_iz_t$, where $\hat\Pi'_i = \hat\beta_i$ is the $i$th row of $\hat\Pi$. So, we are back to a single-equation situation in which we can compute the residual sum of squares recursively by:
\[ \mathrm{RSS}_{i,M+1} = \mathrm{RSS}_{i,M} + \frac{\nu^2_{i,M+1}}{1 + z^{+\prime}_{M+1}\mu_{M+1}}. \]
This formula allows us to recursively compute the sequence of Chow statistics for the constancy of the coefficients of the $i$th equation:
\[ F_{\mathrm{Chow}} = \left(\frac{t+N-k}{N}\right)\frac{\mathrm{RSS}_{t+N} - \mathrm{RSS}_{N}}{\mathrm{RSS}_{N}}. \]
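A single-equation sketch of these updating formulae is given below (Python; the data, dimensions and seed are arbitrary, and the variable names are our own). It applies the coefficient, inverse-matrix and RSS recursions one observation at a time, without any re-inversion, and checks the result against full-sample OLS.

```python
import numpy as np

rng = np.random.default_rng(2)
T, n = 60, 3
Z = rng.standard_normal((T, n))
y = Z @ np.array([1.0, -0.5, 0.2]) + rng.standard_normal(T)

M = n + 2                                         # initialisation sample
P = np.linalg.inv(Z[:M].T @ Z[:M])                # (Z_M' Z_M)^{-1}
b = P @ Z[:M].T @ y[:M]
rss = y[:M] @ y[:M] - b @ Z[:M].T @ y[:M]         # RSS on the first M observations

for t in range(M, T):                             # add one observation at a time
    z, yt = Z[t], y[t]
    mu = P @ z                                    # mu_{t+1}
    nu = yt - b @ z                               # recursive (one-step) residual nu_{t+1}
    denom = 1.0 + z @ mu
    b = b + mu * nu / denom                       # coefficient update
    rss = rss + nu**2 / denom                     # RSS update
    P = P - np.outer(mu, mu) / denom              # rank-one update of (Z'Z)^{-1}

b_full = np.linalg.solve(Z.T @ Z, Z.T @ y)
rss_full = ((y - Z @ b_full) ** 2).sum()
print(np.allclose(b, b_full), np.allclose(rss, rss_full))   # both True
```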

3.7 Wiener processes

Consider the data-generation process:
\[ y_t = y_{t-1} + \varepsilon_t, \qquad z_t = z_{t-1} + \omega_t, \quad\text{where}\quad \begin{pmatrix}\varepsilon_t\\ \omega_t\end{pmatrix} \sim \mathsf{IN}_2\left[\begin{pmatrix}0\\0\end{pmatrix}, \begin{pmatrix}1 & 0\\ 0 & 1\end{pmatrix}\right]. \tag{3.40} \]
Let $x_t = (y_t : z_t)'$ where $x_0 = 0$. Defining $v_t = (v_{1,t} : v_{2,t})'$ such that $v_{1,t} \equiv \varepsilon_t$ and $v_{2,t} \equiv \omega_t$, write the system as $x_t = x_{t-1} + v_t$ with $v_t \sim \mathsf{IN}_2[0, I_2]$. Let $B(r) = (B_1(r) : B_2(r))'$, where the $B_i(r)$ are the independent standardized Wiener processes on $[0,1]$ associated with accumulating the $v_{i,t}$. You may use without proof the following results for $x_t$, where $\Rightarrow$ denotes weak convergence and $[Tr]$ is the integer part of $Tr$:
\[ T^{-\frac12}\sum_{t=1}^{[Tr]} v_t \Rightarrow \begin{pmatrix}B_1(r)\\ B_2(r)\end{pmatrix}, \tag{3.41} \]
\[ T^{-2}\sum_{t=1}^{T} x_tx'_t \Rightarrow \begin{pmatrix}\int_0^1 B_1^2(r)dr & \int_0^1 B_1(r)B_2(r)dr\\ \int_0^1 B_1(r)B_2(r)dr & \int_0^1 B_2^2(r)dr\end{pmatrix}, \tag{3.42} \]
and:
\[ T^{-1}\sum_{t=1}^{T} x_{t-1}v'_t \Rightarrow \begin{pmatrix}\int_0^1 B_1(r)dB_1(r) & \int_0^1 B_1(r)dB_2(r)\\ \int_0^1 B_2(r)dB_1(r) & \int_0^1 B_2(r)dB_2(r)\end{pmatrix}. \tag{3.43} \]
Note that:
\[ \int_0^1 B_1(r)dB_1(r) = \tfrac12\left[\chi^2(1) - 1\right], \]
and:
\[ \left(\int_0^1 B_2^2(r)dr\right)^{-\frac12}\int_0^1 B_2(r)dB_1(r) \sim N[0,1]. \]
(1) When the DGP is given by (3.40), show that $u_t = y_t - z_t$ is a random walk. Denote its standardized limiting distribution by $U(r)$ for $r\in[0,1]$, and, using the fact that $u_t = (1 : -1)x_t$, show that:
\[ \int_0^1 U(r)dB_1(r) = \int_0^1 B_1(r)dB_1(r) - \int_0^1 B_2(r)dB_1(r). \tag{3.44} \]
(2) Derive the limiting distributions of $T^{-2}\sum_{t=1}^{T} u^2_t$ and $T^{-1}\sum_{t=1}^{T} u_{t-1}\varepsilon_t$ as functions of the results in (3.41) to (3.43).
(3) Consider the model:
\[ \Delta y_t = \phi(y-z)_{t-1} + e_t. \tag{3.45} \]
Show that the limiting distribution of the OLS estimator $\hat\phi$ of $\phi$ from (3.45) is given by:
\[ T\left(\hat\phi - \phi\right) \Rightarrow \left(\int_0^1 U^2(r)dr\right)^{-1}\int_0^1 U(r)dB_1(r). \tag{3.46} \]
(4) Derive the limiting distribution of the $t$-test, $t_{\phi=0} = \hat\phi/\mathrm{SE}(\hat\phi)$, of $H_0$: $\phi = 0$, and relate it to the Dickey–Fuller and normal distributions.
(Oxford M.Phil., 1993)

3.7.1 Adding random walks

Subtract the second equation of (3.40) from the first:
\[ y_t - z_t = (y_{t-1} - z_{t-1}) + (\varepsilon_t - \omega_t), \]
so that:
\[ u_t = u_{t-1} + e_t, \tag{3.47} \]
where $e_t = \varepsilon_t - \omega_t$ is white noise. Hence, $u_t$ is a random walk. Next, as $x_0 = 0$:
\[ x_t = x_{t-1} + v_t = \sum_{s=1}^{t} v_s = \sum_{s=1}^{[Tr]} v_s + v_t, \]
for $(t-1)/T \le r < t/T$, where $[Tr]$ is the largest integer less than or equal to $Tr$. Thus:
\[ \frac{1}{\sqrt{T}}x_t = \frac{1}{\sqrt{T}}\sum_{s=1}^{[Tr]} v_s + \frac{v_t}{\sqrt{T}} \Rightarrow B(r), \]
as $v_t/\sqrt{T}$ is negligible. Since $u_t = (1 : -1)x_t$:
\[ \frac{1}{\sqrt{T}}(1 : -1)x_t \Rightarrow (1 : -1)\begin{pmatrix}B_1(r)\\ B_2(r)\end{pmatrix} = B_1(r) - B_2(r) := U(r), \]
so (3.44) follows by integrating with respect to $B_1(r)$:
\[ \int_0^1 U(r)dB_1(r) = \int_0^1 B_1(r)dB_1(r) - \int_0^1 B_2(r)dB_1(r). \tag{3.48} \]
That result can also be shown directly as follows:
\[ u_t = \sum_{i=1}^{t} e_i = \sum_{i=1}^{t}\varepsilon_i - \sum_{i=1}^{t}\omega_i = \sum_{s=1}^{[Tr]}\varepsilon_s + \varepsilon_t - \sum_{s=1}^{[Tr]}\omega_s - \omega_t. \]
Hence:
\[ \frac{1}{\sqrt{T}}u_t = \frac{1}{\sqrt{T}}\sum_{t=1}^{[Tr]} e_t + \frac{e_t}{\sqrt{T}} = \frac{1}{\sqrt{T}}\sum_{s=1}^{[Tr]}\varepsilon_s + \frac{\varepsilon_t}{\sqrt{T}} - \frac{1}{\sqrt{T}}\sum_{s=1}^{[Tr]}\omega_s - \frac{\omega_t}{\sqrt{T}}, \]
where $\varepsilon_t/\sqrt{T}$ and $\omega_t/\sqrt{T}$ are negligible, so that from (3.41):
\[ \frac{1}{\sqrt{T}}u_t \Rightarrow B_1(r) - B_2(r). \]

3.7.2 Limiting distributions of sample functions

Let us write:
\[ T^{-2}\sum_{t=1}^{T} u^2_t = T^{-2}\sum_{t=1}^{T}(y_t - z_t)^2 = T^{-2}\sum_{t=1}^{T} y^2_t - 2T^{-2}\sum_{t=1}^{T} y_tz_t + T^{-2}\sum_{t=1}^{T} z^2_t, \]
so that by (3.42):
\[ T^{-2}\sum_{t=1}^{T} u^2_t \Rightarrow \int_0^1\left[B_1(r) - B_2(r)\right]^2dr = \int_0^1 U^2(r)dr. \tag{3.49} \]
Next:
\[ T^{-1}\sum_{t=1}^{T} u_{t-1}\varepsilon_t = T^{-1}\sum_{t=1}^{T}(y_{t-1} - z_{t-1})\varepsilon_t = T^{-1}\sum_{t=1}^{T} y_{t-1}\varepsilon_t - T^{-1}\sum_{t=1}^{T} z_{t-1}\varepsilon_t, \]
so that by (3.43):
\[ T^{-1}\sum_{t=1}^{T} u_{t-1}\varepsilon_t \Rightarrow \int_0^1\left[B_1(r) - B_2(r)\right]dB_1(r) = \int_0^1 U(r)dB_1(r). \tag{3.50} \]

3.7.3 Limiting distribution of the OLS estimator

The OLS estimator of $\phi = 0$ is:
\[ \hat\phi = \frac{\sum_{t=1}^{T} u_{t-1}\Delta y_t}{\sum_{t=1}^{T} u^2_{t-1}} = \phi + \frac{\sum_{t=1}^{T} u_{t-1}\varepsilon_t}{\sum_{t=1}^{T} u^2_{t-1}}, \]
so that:
\[ T\left(\hat\phi - \phi\right) = \frac{T^{-1}\sum_{t=1}^{T} u_{t-1}\varepsilon_t}{T^{-2}\sum_{t=1}^{T} u^2_{t-1}} \Rightarrow \frac{\int_0^1 U(r)dB_1(r)}{\int_0^1 U^2(r)dr}, \]
from (3.49) and (3.50).
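The limiting distribution in (3.46) is straightforward to inspect by simulation. The sketch below (our own illustrative replication count, sample size and seed) draws the DGP (3.40), regresses $\Delta y_t$ on $u_{t-1}$, and summarizes $T\hat\phi$ and the $t$-ratio, whose left tail is noticeably fatter than that of a standard normal.

```python
import numpy as np

rng = np.random.default_rng(13)
M, T = 5_000, 500
Tphi = np.empty(M); tratio = np.empty(M)
for i in range(M):
    e = rng.standard_normal(T); w = rng.standard_normal(T)
    y = e.cumsum(); z = w.cumsum()               # two independent random walks
    u = y - z                                    # u_t is itself a random walk
    u1 = u[:-1]; dy = np.diff(y)                 # regress dy_t on u_{t-1}
    phi = (u1 @ dy) / (u1 @ u1)
    resid = dy - phi * u1
    se = np.sqrt(resid @ resid / (len(dy) - 1) / (u1 @ u1))
    Tphi[i] = T * phi
    tratio[i] = phi / se

print("T*phi_hat: mean", round(Tphi.mean(), 2),
      " 5%/95% quantiles", np.percentile(Tphi, [5, 95]).round(2))
print("t-ratio 5%/95% quantiles", np.percentile(tratio, [5, 95]).round(2),
      " (compare N[0,1]: -1.64, 1.64)")
```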

3.7.4 Limiting distribution of the t-test

The estimated variance of $\hat\phi$ is:
\[ \hat V[\hat\phi] = \hat\sigma^2_e\left(u'_{-1}u_{-1}\right)^{-1} = \frac{\hat e'\hat e}{T-1}\left(\sum_{t=1}^{T} u^2_{t-1}\right)^{-1}, \]
where:
\[ \hat e = \Delta y - \hat\phi u_{-1} = \varepsilon - \hat\phi u_{-1}. \]
Thus, under the null:
\[ \hat e'\hat e = \left(\varepsilon - \hat\phi u_{-1}\right)'\left(\varepsilon - \hat\phi u_{-1}\right) = \varepsilon'\varepsilon - 2\hat\phi u'_{-1}\varepsilon + \hat\phi^2u'_{-1}u_{-1} = \varepsilon'\varepsilon - 2\left[(u'_{-1}u_{-1})^{-1}u'_{-1}\varepsilon\right]u'_{-1}\varepsilon + \left[(u'_{-1}u_{-1})^{-1}u'_{-1}\varepsilon\right]^2u'_{-1}u_{-1} = \varepsilon'\varepsilon - (u'_{-1}u_{-1})^{-1}(u'_{-1}\varepsilon)^2, \]
and because $\varepsilon_t$ is IID with $E[\varepsilon^2_t] = 1$, from (3.49) and (3.50):
\[ \frac{1}{T}\hat e'\hat e = T^{-1}\sum_{t=1}^{T}\varepsilon^2_t - T^{-1}\left(T^{-2}\sum_{t=1}^{T} u^2_{t-1}\right)^{-1}\left(T^{-1}\sum_{t=1}^{T} u_{t-1}\varepsilon_t\right)^2 = T^{-1}\sum_{t=1}^{T}\varepsilon^2_t + O_p\left(T^{-1}\right) \overset{AS}{\to} E[\varepsilon^2_t]. \tag{3.51} \]
From (3.49) and (3.51):
\[ T^2\hat V[\hat\phi] = \frac{T}{T-1}\,\frac{\hat e'\hat e}{T}\left(T^{-2}\sum_{t=1}^{T} u^2_{t-1}\right)^{-1} \Rightarrow \left(\int_0^1 U^2(r)dr\right)^{-1}, \]
and hence:
\[ t_{\phi=0} = \frac{\hat\phi}{\mathrm{ESE}[\hat\phi]} \Rightarrow \left(\int_0^1 U^2(r)dr\right)^{-\frac12}\int_0^1 U(r)dB_1(r). \]
Using (3.48), $t_{\phi=0}$ can also be written as (see Hendry, 1995a, p.110):
\[ \left(\int_0^1 U^2(r)dr\right)^{-\frac12}\left(\left[\int_0^1 B_1^2(r)dr\right]^{\frac12}\frac{\int_0^1 B_1(r)dB_1(r)}{\left(\int_0^1 B_1^2(r)dr\right)^{\frac12}} - \left[\int_0^1 B_2^2(r)dr\right]^{\frac12}\frac{\int_0^1 B_2(r)dB_1(r)}{\left(\int_0^1 B_2^2(r)dr\right)^{\frac12}}\right) = \left(\int_0^1 U^2(r)dr\right)^{-\frac12}\left(\left[\int_0^1 B_1^2(r)dr\right]^{\frac12}\mathrm{DF} - \left[\int_0^1 B_2^2(r)dr\right]^{\frac12}N[0,1]\right), \]
where DF denotes a Dickey–Fuller distribution. Hence, the $t$-ratio converges to a function of a Dickey–Fuller and a normal distribution. In more general cases, the weights in the combination depend on other parameters in the DGP.