7. Autocorrelation (Violation of Assumption #B3) · PDF file · 2014-05-15

7. Autocorrelation (Violation of Assumption #B3)

Assumption #B3:

• The error term $u_i$ is not autocorrelated, i.e.

$$\operatorname{Cov}(u_i, u_j) = 0$$

for all $i = 1, \dots, N$ and $j = 1, \dots, N$ where $i \neq j$

Where do we typically find autocorrelation:

• Time-series data sets (less frequently in cross-sectional data sets)

169

Outlook:

• Strong similarities between heteroskedasticity and autocorrelation with respect to consequences and estimation procedures (GLS, FGLS estimators)

Example: [I]

• Estimation of a price-revenue function (monthly data)

• Variables:

yi = monthly revenue quantity (in 1000 pieces)

xi = selling price (in euros)

170

Month     Obs.   Price   Revenue      Month     Obs.   Price   Revenue
01:2002     1    24.2     1590        01:2003    13    32.2     1700
02:2002     2    25.5     1630        02:2003    14    32.4     1450
03:2002     3    26.8     1570        03:2003    15    33.2     1450
04:2002     4    26.4     1960        04:2003    16    34.0     1450
05:2002     5    25.2     2150        05:2003    17    33.7     1000
06:2002     6    24.4     2450        06:2003    18    32.8     1080
07:2002     7    26.2     2770        07:2003    19    31.3     1270
08:2002     8    26.1     2400        08:2003    20    30.9     1520
09:2002     9    27.4     2200        09:2003    21    30.0     1820
10:2002    10    28.4     1270        10:2003    22    28.3     1660
11:2002    11    29.8     1250        11:2003    23    27.5     1500
12:2002    12    31.3     1500        12:2003    24    26.8     1410

Dependent Variable: REVENUE
Method: Least Squares
Date: 11/17/04 Time: 13:50
Sample: 2002:01 2003:12
Included observations: 24

Variable    Coefficient   Std. Error   t-Statistic   Prob.
C             4262.118     686.0876      6.212207    0.0000
PRICE        -89.58094     23.56704     -3.801111    0.0010

R-squared             0.396408    Mean dependent var      1668.750
Adjusted R-squared    0.368972    S.D. dependent var      445.9802
S.E. of regression    354.2746    Akaike info criterion   14.65768
Sum squared resid     2761231.    Schwarz criterion       14.75585
Log likelihood       -173.8921    F-statistic             14.44845
Durbin-Watson stat    0.760994    Prob(F-statistic)       0.000978

Example: [II]

• Linear model with a single regressor:

$$y_i = \alpha + \beta \cdot x_i + u_i \quad (i = 1, \dots, 24)$$

• A price increase of 1 euro decreases the monthly revenue quantity by about 90,000 pieces (estimated slope −89.58, measured in 1000 pieces)
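The estimates in the regression output can be recomputed directly from the data table. The following is a minimal sketch using NumPy; the variable names (`price`, `revenue`, `alpha_hat`, `beta_hat`) are chosen here for illustration and do not come from the slides.

```python
import numpy as np

# Price (euros) and revenue quantity (1000 pieces), 01:2002 to 12:2003,
# typed in from the data table of the example.
price = np.array([24.2, 25.5, 26.8, 26.4, 25.2, 24.4, 26.2, 26.1, 27.4, 28.4, 29.8, 31.3,
                  32.2, 32.4, 33.2, 34.0, 33.7, 32.8, 31.3, 30.9, 30.0, 28.3, 27.5, 26.8])
revenue = np.array([1590, 1630, 1570, 1960, 2150, 2450, 2770, 2400, 2200, 1270, 1250, 1500,
                    1700, 1450, 1450, 1450, 1000, 1080, 1270, 1520, 1820, 1660, 1500, 1410],
                   dtype=float)

# Design matrix: a column of ones (constant) and the single regressor.
X = np.column_stack([np.ones(len(price)), price])

# Least-squares solution of y_i = alpha + beta * x_i + u_i.
coef, *_ = np.linalg.lstsq(X, revenue, rcond=None)
alpha_hat, beta_hat = coef

# Residuals and their sum of squares.
residuals = revenue - X @ coef
ssr = residuals @ residuals

print(f"alpha_hat = {alpha_hat:.3f}, beta_hat = {beta_hat:.5f}, SSR = {ssr:.0f}")
```

The printed coefficients and the sum of squared residuals can be compared with the entries C, PRICE and "Sum squared resid" of the regression output.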

172

[Figure: revenue (vertical axis, 500 to 3000) plotted against price (horizontal axis, 24 to 36)]

Evidently:

• Line connecting the residuals rarely crosses the regression line

−→ we frequently find that

positive $u_{i-1}$-values are followed by positive $u_i$-values

negative $u_{i-1}$-values are followed by negative $u_i$-values

−→ $\operatorname{Cov}(u_{i-1}, u_i) \neq 0$

(violation of Assumption #B3)

Question:

• Impact on estimation and testing procedures

174

7.1 Consequences

Note:

• We assume an explicit pattern of autocorrelation (alternative patterns are not considered here)

Definition 7.1: (AR(1) process)

Let $u_1, \dots, u_N$ be the error terms of the linear regression model. Furthermore, let $\rho \in \mathbb{R}$ be a (constant) parameter and let $e_1, \dots, e_N$ denote additional error terms that satisfy all B-assumptions (#B1–#B4). If

$$u_i = \rho u_{i-1} + e_i \quad (i = 2, \dots, N)$$

we say that the error term $u_i$ follows a first-order autoregressive process (in symbols: $u_i \sim AR(1)$).

175

Remarks:

• An AR(1) process regresses $u_i$ on its predecessor value $u_{i-1}$ plus the new random shock $e_i$

• For $\rho = -1$ or $\rho = 1$ we have so-called random walks (important stochastic processes)

• For $|\rho| > 1$ processes become explosive

−→ in this lecture: $-1 < \rho < 1$

Now:

• Expected values, (co)variances, correlation coefficients of an AR(1) process

176

Theorem 7.2: (Moments of an AR(1) process)

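Let the error term $u_i$ ($i = 1, \dots, N$) follow an AR(1) process according to Definition 7.1 where $-1 < \rho < 1$. Furthermore, let $\operatorname{Var}(e_i) \equiv \sigma_e^2$ denote the constant variance of all $e_i$. We then have for all (admissible) $i = 1, \dots, N$:

$$E(u_i) = 0,$$

$$\operatorname{Var}(u_i) = \frac{\sigma_e^2}{1 - \rho^2} \equiv \sigma^2,$$

$$\operatorname{Cov}(u_i, u_{i-\tau}) = \rho^{\tau} \frac{\sigma_e^2}{1 - \rho^2} = \rho^{\tau} \sigma^2 \neq 0,$$

$$\operatorname{Corr}(u_i, u_{i-\tau}) = \rho^{\tau}.$$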

(Proof: class)
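The moments in Theorem 7.2 can be illustrated (not proven) by simulation. The sketch below generates one long AR(1) series and compares sample moments with the theoretical values. The parameter values `rho = 0.6` and `sigma_e = 1.0`, the normal distribution of the shocks, the series length and the random seed are all assumptions made for this illustration.

```python
import numpy as np

rho = 0.6        # assumed AR(1) parameter with |rho| < 1
sigma_e = 1.0    # assumed standard deviation of the shocks e_i
n = 200_000      # long series, so sample moments are close to the theory

rng = np.random.default_rng(0)
e = rng.normal(0.0, sigma_e, size=n)

u = np.empty(n)
# Draw the first value from the stationary distribution so that
# Var(u_i) = sigma_e^2 / (1 - rho^2) holds from the start.
u[0] = rng.normal(0.0, sigma_e / np.sqrt(1.0 - rho**2))
for i in range(1, n):
    u[i] = rho * u[i - 1] + e[i]

theory_var = sigma_e**2 / (1.0 - rho**2)
sample_mean = u.mean()
sample_var = u.var()


def sample_corr(series, lag):
    """Sample correlation between u_i and u_{i-lag}."""
    return np.corrcoef(series[lag:], series[:-lag])[0, 1]


corr_1 = sample_corr(u, 1)   # theory: rho
corr_2 = sample_corr(u, 2)   # theory: rho**2

print(f"mean {sample_mean:.3f} (theory 0)")
print(f"variance {sample_var:.3f} (theory {theory_var:.3f})")
print(f"corr lag 1 {corr_1:.3f} (theory {rho:.3f})")
print(f"corr lag 2 {corr_2:.3f} (theory {rho**2:.3f})")
```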

177

Obviously:

• If the error term $u_i$ follows an AR(1) process with $-1 < \rho < 1$, then the Assumptions #B1, #B2 are satisfied whereas #B3 is violated

Now:

• Autocorrelation in matrix notation ($u_i$'s follow an AR(1) process)

Notation:

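$$u = \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_N \end{pmatrix}, \quad u_{-1} = \begin{pmatrix} u_0 \\ u_1 \\ \vdots \\ u_{N-1} \end{pmatrix}, \quad e = \begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_N \end{pmatrix}$$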

178

Matrix representation: [I]

• Linear regression model

y = Xβ + u

with AR(1) error terms (−1 < ρ < 1)

$$u = \rho u_{-1} + e$$

−→ Theorem 7.2 yields Cov(u)

179

Matrix representation: [II]

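• Due to $\sigma^2 \equiv \sigma_e^2 / (1 - \rho^2)$ we obtain

$$\operatorname{Cov}(u) = \begin{pmatrix} \sigma^2 & \operatorname{Cov}(u_1, u_2) & \cdots & \operatorname{Cov}(u_1, u_N) \\ \operatorname{Cov}(u_2, u_1) & \sigma^2 & \cdots & \operatorname{Cov}(u_2, u_N) \\ \vdots & \vdots & \ddots & \vdots \\ \operatorname{Cov}(u_N, u_1) & \operatorname{Cov}(u_N, u_2) & \cdots & \sigma^2 \end{pmatrix}$$

$$= \begin{pmatrix} \sigma^2 & \rho\sigma^2 & \cdots & \rho^{N-1}\sigma^2 \\ \rho\sigma^2 & \sigma^2 & \cdots & \rho^{N-2}\sigma^2 \\ \vdots & \vdots & \ddots & \vdots \\ \rho^{N-1}\sigma^2 & \rho^{N-2}\sigma^2 & \cdots & \sigma^2 \end{pmatrix} = \sigma^2 \Omega$$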

180

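Matrix representation: [III]

where

$$\Omega = \begin{pmatrix} 1 & \rho & \cdots & \rho^{N-1} \\ \rho & 1 & \cdots & \rho^{N-2} \\ \vdots & \vdots & \ddots & \vdots \\ \rho^{N-1} & \rho^{N-2} & \cdots & 1 \end{pmatrix}$$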
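The matrix Ω has the entry $\rho^{|i-j|}$ in row i and column j, which makes it easy to construct numerically. A minimal sketch follows; the function name `ar1_omega` and the example values `rho = 0.5`, `n = 4` are illustrative assumptions.

```python
import numpy as np


def ar1_omega(rho, n):
    """Return the n x n matrix Omega with entries rho**|i - j|."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])


omega = ar1_omega(0.5, 4)
print(omega)
```

Multiplying this matrix by $\sigma^2$ gives Cov(u) for the AR(1) error terms.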

181

Question:

• Is there any transformation of the autocorrelated model so that

– the parameter vector β remains unchanged

– autocorrelation vanishes

– the transformed model

$$y^* = X^*\beta + u^*$$

satisfies all #A-, #B-, #C-assumptions?

(cf. Section 6, Slide 111)

182

Hope:

• If yes, then the OLS estimator of the transformed model (the GLS estimator) would be BLUE (cf. Section 6, Slides 123–126)

Result:

• In analogy to the line of argument given on Slides 117–121 under heteroskedasticity the following result obtains:

there exists a regular matrix P so that the transformed model

$$Py = PX\beta + Pu$$

satisfies all #A-, #B-, #C-assumptions

183

Form of P in the autocorrelated model: [I]

• P has to satisfy the following equations:

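$P'P = \Omega^{-1}$ and $P\Omega P' = I_N$ (see Slides 117, 120)

• First, the inverse of Ω from Slide 181 is given by

$$\Omega^{-1} = \frac{1}{1 - \rho^2} \begin{pmatrix} 1 & -\rho & 0 & \cdots & 0 & 0 \\ -\rho & 1 + \rho^2 & -\rho & \cdots & 0 & 0 \\ 0 & -\rho & 1 + \rho^2 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 1 + \rho^2 & -\rho \\ 0 & 0 & 0 & \cdots & -\rho & 1 \end{pmatrix}$$

(check it)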
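The suggested check can also be done numerically for a particular ρ and N, which does not replace the algebraic verification but catches typing errors. In this sketch the function names `ar1_omega` and `ar1_omega_inv` and the values `rho = 0.7`, `n = 6` are illustrative assumptions.

```python
import numpy as np


def ar1_omega(rho, n):
    """Omega with entries rho**|i - j|."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])


def ar1_omega_inv(rho, n):
    """Tridiagonal closed form of the inverse of Omega."""
    m = np.zeros((n, n))
    np.fill_diagonal(m, 1.0 + rho**2)   # interior diagonal entries
    m[0, 0] = m[-1, -1] = 1.0           # the two corner entries
    idx = np.arange(n - 1)
    m[idx, idx + 1] = -rho              # superdiagonal
    m[idx + 1, idx] = -rho              # subdiagonal
    return m / (1.0 - rho**2)


rho, n = 0.7, 6
product = ar1_omega_inv(rho, n) @ ar1_omega(rho, n)
max_dev = np.abs(product - np.eye(n)).max()
print(f"largest deviation from the identity matrix: {max_dev:.2e}")
```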

184

Form of P in the autocorrelated model: [II]

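• The form of P is given by

$$P = \frac{1}{\sqrt{1 - \rho^2}} \begin{pmatrix} \sqrt{1 - \rho^2} & 0 & 0 & \cdots & 0 & 0 \\ -\rho & 1 & 0 & \cdots & 0 & 0 \\ 0 & -\rho & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 1 & 0 \\ 0 & 0 & 0 & \cdots & -\rho & 1 \end{pmatrix}$$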
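The two conditions $P'P = \Omega^{-1}$ and $P\Omega P' = I_N$ can be checked numerically for the matrix P given above. The sketch below is only a spot check for one choice of ρ and N; the function names and the values `rho = 0.7`, `n = 6` are assumptions made for illustration.

```python
import numpy as np


def ar1_omega(rho, n):
    """Omega with entries rho**|i - j|."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])


def ar1_p(rho, n):
    """Transformation matrix P, including the scalar factor 1/sqrt(1 - rho^2)."""
    p = np.eye(n)
    p[0, 0] = np.sqrt(1.0 - rho**2)   # special first row
    idx = np.arange(1, n)
    p[idx, idx - 1] = -rho            # -rho below the main diagonal
    return p / np.sqrt(1.0 - rho**2)


rho, n = 0.7, 6
P = ar1_p(rho, n)
omega = ar1_omega(rho, n)

check_inverse = np.allclose(P.T @ P, np.linalg.inv(omega))
check_identity = np.allclose(P @ omega @ P.T, np.eye(n))
print(check_inverse, check_identity)
```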

185

Form of P in the autocorrelated model: [III]

−→ transformed model:

y∗ = X∗β + u∗

where

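$$y^* = Py = \begin{pmatrix} \sqrt{1 - \rho^2}\, y_1 \\ y_2 - \rho y_1 \\ \vdots \\ y_N - \rho y_{N-1} \end{pmatrix}, \quad u^* = Pu = \begin{pmatrix} \sqrt{1 - \rho^2}\, u_1 \\ e_2 \\ \vdots \\ e_N \end{pmatrix}$$

$$X^* = PX = \begin{pmatrix} \sqrt{1 - \rho^2} & \sqrt{1 - \rho^2}\, x_{11} & \cdots & \sqrt{1 - \rho^2}\, x_{K1} \\ 1 - \rho & x_{12} - \rho x_{11} & \cdots & x_{K2} - \rho x_{K1} \\ \vdots & \vdots & & \vdots \\ 1 - \rho & x_{1N} - \rho x_{1(N-1)} & \cdots & x_{KN} - \rho x_{K(N-1)} \end{pmatrix}$$

(These expressions are written without the scalar factor $1/\sqrt{1 - \rho^2}$ of P from Slide 185; the factor multiplies every row of $y^*$, $X^*$ and $u^*$ equally and therefore does not affect the estimation of β.)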

186

Remarks:

• The transformed model

y∗ = X∗β + u∗

satisfies all #A-, #B-, #C-assumptions

• The parameter vector β remains unchanged

−→ consequences of autocorrelation parallel those of heteroskedasticity

187

Consequences of autocorrelation: [I]

• The OLS estimator

$$\hat\beta = (X'X)^{-1} X'y$$

is still unbiased, but no longer BLUE (cf. Theorem 6.1, Slide 109)

• The covariance matrix of the OLS estimator is given by

$$\operatorname{Cov}(\hat\beta) = \sigma^2 (X'X)^{-1} X'\Omega X (X'X)^{-1}$$

• The GLS estimator

$$\hat\beta^{GLS} = [X^{*\prime} X^*]^{-1} X^{*\prime} y^* = [X'\Omega^{-1}X]^{-1} X'\Omega^{-1}y$$

is BLUE

188

Consequences of autocorrelation: [II]

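• Its covariance matrix is given by

$$\operatorname{Cov}(\hat\beta^{GLS}) = \sigma^2 [X'\Omega^{-1}X]^{-1}$$

(cf. Theorem 6.3, Slide 123)

• Unbiased estimator of $\sigma^2$:

$$\hat\sigma^2 = \frac{\hat u^{*\prime} \hat u^*}{N - K - 1} = \frac{(P\hat u)' P\hat u}{N - K - 1}$$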
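The two forms of the GLS estimator, OLS on the transformed data and the direct formula with $\Omega^{-1}$, can be compared numerically on the price-revenue data. This is a sketch under the assumption that ρ is known; the value `rho = 0.6` is purely illustrative, and the variable names are chosen here, not taken from the slides.

```python
import numpy as np

# Price-revenue data from the example (24 monthly observations).
price = np.array([24.2, 25.5, 26.8, 26.4, 25.2, 24.4, 26.2, 26.1, 27.4, 28.4, 29.8, 31.3,
                  32.2, 32.4, 33.2, 34.0, 33.7, 32.8, 31.3, 30.9, 30.0, 28.3, 27.5, 26.8])
revenue = np.array([1590, 1630, 1570, 1960, 2150, 2450, 2770, 2400, 2200, 1270, 1250, 1500,
                    1700, 1450, 1450, 1450, 1000, 1080, 1270, 1520, 1820, 1660, 1500, 1410],
                   dtype=float)

n = len(price)
X = np.column_stack([np.ones(n), price])
y = revenue
rho = 0.6   # assumed known here; an illustrative value only

# Omega and its inverse for AR(1) errors.
idx = np.arange(n)
omega = rho ** np.abs(idx[:, None] - idx[None, :])
omega_inv = np.linalg.inv(omega)

# Direct formula: [X' Omega^{-1} X]^{-1} X' Omega^{-1} y.
xtox = X.T @ omega_inv @ X
beta_gls = np.linalg.solve(xtox, X.T @ omega_inv @ y)

# Transformation matrix P (with its scalar factor) and OLS on (Py, PX).
P = np.eye(n)
P[0, 0] = np.sqrt(1.0 - rho**2)
P[idx[1:], idx[:-1]] = -rho
P /= np.sqrt(1.0 - rho**2)
beta_transformed, *_ = np.linalg.lstsq(P @ X, P @ y, rcond=None)

# Variance estimate from the transformed residuals and the implied
# estimate of Cov(beta_gls); K counts the regressors without the constant.
resid_star = P @ (y - X @ beta_gls)
k = X.shape[1] - 1
sigma2_hat = resid_star @ resid_star / (n - k - 1)
cov_gls = sigma2_hat * np.linalg.inv(xtox)

print(beta_gls, beta_transformed)
```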

189

Impact of neglecting autocorrelation: [I]

• OLS estimator of β

$$\hat\beta = (X'X)^{-1} X'y$$

is unbiased, but inefficient

• The estimator $\hat\sigma^2 (X'X)^{-1}$ of the covariance matrix $\operatorname{Cov}(\hat\beta)$ is biased

• The estimator

$$\hat\sigma^2 = \frac{\hat u'\hat u}{N - K - 1}$$

of the error-term variance is biased

190

Impact of neglecting autocorrelation: [II]

−→ test statistics are based on biased estimators

−→ hypothesis tests are likely to be unreliable (t-, F-tests)

191

7.2 Diagnostics

Graphical analysis: [I]

• First, estimation of the model

$$y = X\beta + u$$

by OLS, i.e.

$$\hat\beta = (X'X)^{-1} X'y$$

−→ calculation of the residuals

$$\hat u = y - X\hat\beta$$

192

Graphical analysis: [II]

• Plot of the residuals versus time

– slow swings around zero −→ positive autocorrelation
– fast swings around zero −→ negative autocorrelation

• Scatterplot of $\hat u_{i-1}$ versus $\hat u_i$

– positive slope −→ positive autocorrelation
– negative slope −→ negative autocorrelation

Example: [I]

• Price-revenue function on Slides 170–172

193


OBS    RESID(-1)     RESID
  1                 -504.2594
  2    -504.2594    -347.8042
  3    -347.8042    -291.3490
  4    -291.3490      62.81861
  5      62.81861    145.3215
  6     145.3215     373.6567
  7     373.6567     854.9024
  8     854.9024     475.9443
  9     475.9443     392.3995
 10     392.3995    -448.0195
 11    -448.0195    -342.6062
 12    -342.6062      41.76520
 13      41.76520    322.3880
 14     322.3880      90.30423
 15      90.30423    161.9690
 16     161.9690     233.6337
 17     233.6337    -243.2406
 18    -243.2406    -243.8634
 19    -243.8634    -188.2348
 20    -188.2348      25.93283
 21      25.93283    245.3100
 22     245.3100     -66.97761
 23     -66.97761   -298.6424
 24    -298.6424    -451.3490

[Figure: residuals plotted against time, i = 1, ..., 24 (02:01 to 03:10 on the horizontal axis, −600 to 1000 on the vertical axis)]

[Figure: scatterplot of the residuals $\hat u_i$ and $\hat u_{i-1}$ (both axes from −1000 to 1000)]

Obviously:

• Positive dependence between $\hat u_{i-1}$ and $\hat u_i$

−→ indication of positive autocorrelation

−→ a conceivable specification of an AR(1) error-term process could be

$$u_i = \rho u_{i-1} + e_i$$

with $0 < \rho < 1$

196

Now:

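• Use the pairs of residuals $(\hat u_{i-1}, \hat u_i)$ to estimate ρ

• Model

$$\hat u_i = \rho \hat u_{i-1} + e_i^* \quad (i = 2, \dots, N)$$

• OLS estimator of ρ:

$$\hat\rho = \frac{\sum_{i=2}^{N} (\hat u_{i-1} - \bar{\hat u})(\hat u_i - \bar{\hat u})}{\sum_{i=2}^{N} (\hat u_{i-1} - \bar{\hat u})^2} = \frac{\sum_{i=2}^{N} \hat u_{i-1} \hat u_i}{\sum_{i=2}^{N} \hat u_{i-1}^2}$$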

197

Remarks:

• For the sum of the residuals computed via OLS we always have

$$\sum_{i=1}^{N} \hat u_i = 0 \quad \text{and thus} \quad \bar{\hat u} = \frac{1}{N} \sum_{i=1}^{N} \hat u_i = 0$$

(cf. Von Auer, 2007, p. 57)

• Since for $i = 1$ there is no residual $\hat u_{i-1} = \hat u_0$, we only have the observations $i = 2, \dots, N$

198

OLS estimate of ρ in the price-revenue example:

$$\hat\rho = 0.579310$$
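This estimate can be recomputed from the data with the simplified formula for the OLS estimator of ρ (no intercept, because the OLS residuals sum to zero). The following is a minimal sketch; the variable names are chosen for illustration.

```python
import numpy as np

# Price-revenue data from the example (24 monthly observations).
price = np.array([24.2, 25.5, 26.8, 26.4, 25.2, 24.4, 26.2, 26.1, 27.4, 28.4, 29.8, 31.3,
                  32.2, 32.4, 33.2, 34.0, 33.7, 32.8, 31.3, 30.9, 30.0, 28.3, 27.5, 26.8])
revenue = np.array([1590, 1630, 1570, 1960, 2150, 2450, 2770, 2400, 2200, 1270, 1250, 1500,
                    1700, 1450, 1450, 1450, 1000, 1080, 1270, 1520, 1820, 1660, 1500, 1410],
                   dtype=float)

# Step 1: OLS residuals of the original model.
X = np.column_stack([np.ones(len(price)), price])
beta_hat, *_ = np.linalg.lstsq(X, revenue, rcond=None)
resid = revenue - X @ beta_hat

# Step 2: regress u_hat_i on u_hat_{i-1} for i = 2, ..., N without a constant.
lagged = resid[:-1]    # u_hat_1, ..., u_hat_{N-1}
current = resid[1:]    # u_hat_2, ..., u_hat_N
rho_hat = (lagged @ current) / (lagged @ lagged)

print(f"sum of residuals = {resid.sum():.2e}, rho_hat = {rho_hat:.6f}")
```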

Question:

• Is ρ significantly different from zero?

−→ Durbin-Watson test for autocorrelation

199

Dependent Variable: RESID
Method: Least Squares
Date: 11/20/04 Time: 19:15
Sample(adjusted): 2002:02 2003:12
Included observations: 23 after adjusting endpoints

Variable     Coefficient   Std. Error   t-Statistic   Prob.
RESID(-1)     0.579310      0.171176     3.384285     0.0027

R-squared             0.339455    Mean dependent var      21.92432
Adjusted R-squared    0.339455    S.D. dependent var      336.8232
S.E. of regression    273.7493    Akaike info criterion   14.10481
Sum squared resid     1648652.    Schwarz criterion       14.15418
Log likelihood       -161.2053    Durbin-Watson stat      1.574718

Durbin-Watson test: [I]

• Most popular test for autocorrelation (due to Durbin & Watson, 1950, 1951)

• Tests for both positive ($\rho > 0$) and negative autocorrelation ($\rho < 0$)

• Test statistic:

– calculate the residuals $\hat u_i$ via OLS

– test statistic:

$$DW = \frac{\sum_{i=2}^{N} (\hat u_i - \hat u_{i-1})^2}{\sum_{i=1}^{N} \hat u_i^2}$$

200

Durbin-Watson test: [II]

• Relation to the OLS estimator $\hat\rho$ from Slide 197:

$$DW \approx 2(1 - \hat\rho)$$

• Properties:

Since −1 < ρ < 1 it follows that 0 < DW < 4

no autocorrelation: ρ ≈ 0 ⇒ DW ≈ 2

positive autocorrelation: ρ ≈ 1 ⇒ DW ≈ 0

negative autocorrelation: ρ ≈ −1 ⇒ DW ≈ 4
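For the price-revenue data, the statistic and the approximation $2(1 - \hat\rho)$ can be computed side by side. The sketch below uses illustrative variable names and recomputes the OLS residuals from the data. In this small sample the approximation is rough: the statistic is about 0.76, while $2(1 - 0.579310) \approx 0.84$.

```python
import numpy as np

# Price-revenue data from the example (24 monthly observations).
price = np.array([24.2, 25.5, 26.8, 26.4, 25.2, 24.4, 26.2, 26.1, 27.4, 28.4, 29.8, 31.3,
                  32.2, 32.4, 33.2, 34.0, 33.7, 32.8, 31.3, 30.9, 30.0, 28.3, 27.5, 26.8])
revenue = np.array([1590, 1630, 1570, 1960, 2150, 2450, 2770, 2400, 2200, 1270, 1250, 1500,
                    1700, 1450, 1450, 1450, 1000, 1080, 1270, 1520, 1820, 1660, 1500, 1410],
                   dtype=float)

# OLS residuals of the original model.
X = np.column_stack([np.ones(len(price)), price])
beta_hat, *_ = np.linalg.lstsq(X, revenue, rcond=None)
resid = revenue - X @ beta_hat

# Durbin-Watson statistic: squared successive differences over squared residuals.
dw = np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

# Approximation via the OLS estimate of rho.
rho_hat = (resid[:-1] @ resid[1:]) / (resid[:-1] @ resid[:-1])
dw_approx = 2.0 * (1.0 - rho_hat)

print(f"DW = {dw:.6f}, 2(1 - rho_hat) = {dw_approx:.6f}")
```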

201

Durbin-Watson test: [III]

• Test for positive autocorrelation:

hypotheses:

H0 : ρ ≤ 0 versus H1 : ρ > 0

distribution of DW under H0 depends on

– sample size (N)

– number of exogenous regressors (K)

– specific values of the regressors x1i, . . . , xKi

−→ exact calculation by econometric software

202

Durbin-Watson test: [IV]

distribution under $H_0$ of DW has lower and upper bounds

−→ exact critical values at the α-level ($d_\alpha$) have lower and upper bounds (i.e. $d_\alpha^L \leq d_\alpha \leq d_\alpha^U$)

(for α = 0.05 see Von Auer, 2007, p. 402)

explicit decision rule:

– reject $H_0: \rho \leq 0$ if $DW < d_\alpha^L$

– do not reject $H_0: \rho \leq 0$ if $DW > d_\alpha^U$

– no decision if $d_\alpha^L \leq DW \leq d_\alpha^U$

203

Durbin-Watson test: [V]

• Test for negative autocorrelation:

hypotheses:

H0 : ρ ≥ 0 versus H1 : ρ < 0

explicit decision rule:

– Reject $H_0: \rho \geq 0$ if $DW > 4 - d_\alpha^L$

– Do not reject $H_0: \rho \geq 0$ if $DW < 4 - d_\alpha^U$

– No decision if $4 - d_\alpha^U \leq DW \leq 4 - d_\alpha^L$

204

Example:

• Estimation of the price-revenue function (Slides 170, 171)

• Test for positive autocorrelation at the 5% level:

H0 : ρ ≤ 0 versus H1 : ρ > 0

• We have

$$N = 24, \quad K = 1, \quad DW = 0.760994, \quad d_{0.05}^L = 1.27, \quad d_{0.05}^U = 1.45$$

and thus

$$DW = 0.760994 < 1.27 = d_{0.05}^L$$

−→ reject H0 at the 5%-level
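The bounds-based decision rules for both one-sided tests can be written as a small function. This is a sketch only: the function name `dw_decision`, its argument names and the returned strings are illustrative choices, and the bounds $d^L$ and $d^U$ still have to be looked up in a table.

```python
def dw_decision(dw, d_lower, d_upper, alternative="positive"):
    """Apply the Durbin-Watson bounds test.

    alternative="positive": H0: rho <= 0 versus H1: rho > 0
    alternative="negative": H0: rho >= 0 versus H1: rho < 0
    """
    if alternative == "positive":
        if dw < d_lower:
            return "reject H0"
        if dw > d_upper:
            return "do not reject H0"
        return "no decision"
    if alternative == "negative":
        if dw > 4 - d_lower:
            return "reject H0"
        if dw < 4 - d_upper:
            return "do not reject H0"
        return "no decision"
    raise ValueError("alternative must be 'positive' or 'negative'")


# Price-revenue example: N = 24, K = 1, 5% level.
example = dw_decision(0.760994, d_lower=1.27, d_upper=1.45, alternative="positive")
print(example)
```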

205

Drawbacks of the Durbin-Watson test:

• Frequently there is no decision (e.g. if $DW \in [d_\alpha^L, d_\alpha^U]$ when testing for positive autocorrelation)

• DW-test is unreliable if predecessor values like $y_{i-1}, y_{i-2}, \dots$ are used as regressors (so-called lag-models)

• DW-test only tests for AR(1)-autocorrelation

206

7.3 Feasible Estimation Procedures

Now:

• Estimation of the autocorrelated model

y = Xβ + u

with AR(1) error-terms

$$u = \rho u_{-1} + e \quad (-1 < \rho < 1)$$

207

Problem:

• From the data set (X, y) we do not have direct knowledge about the autocorrelation parameter ρ

−→ FGLS estimation

Two feasible estimation procedures:

• GLS approach (Hildreth & Lu)

• FGLS approach (Cochrane & Orcutt)

208

1. Method by Hildreth & Lu: [I]

• Search algorithm

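• Consider the GLS estimator

$$\hat\beta^{GLS} = [X'\Omega^{-1}X]^{-1} X'\Omega^{-1}y$$

where

$$\Omega^{-1} = \frac{1}{1 - \rho^2} \begin{pmatrix} 1 & -\rho & 0 & \cdots & 0 & 0 \\ -\rho & 1 + \rho^2 & -\rho & \cdots & 0 & 0 \\ 0 & -\rho & 1 + \rho^2 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 1 + \rho^2 & -\rho \\ 0 & 0 & 0 & \cdots & -\rho & 1 \end{pmatrix}$$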

(cf. Slides 184, 188)

209

1. Method by Hildreth & Lu: [II]

• Perform GLS estimation for distinct ρ-values (−1 < ρ < 1)

−→ compute the sum of squared residuals $\hat u^{*\prime} \hat u^*$ for each estimation

• Find the ρ-value with the minimal sum of squared residuals

−→ GLS estimator of β associated with this ρ-value is called Hildreth-Lu estimator
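The search can be sketched as a simple grid search over ρ. The assumptions here are: a grid from −0.99 to 0.99 in steps of 0.01, the price-revenue data as input, and transformed rows of the form $\sqrt{1-\rho^2}\,y_1$ and $y_i - \rho y_{i-1}$, that is, without the scalar factor $1/\sqrt{1-\rho^2}$ (which depends on ρ and would change the minimization criterion). All function and variable names are illustrative.

```python
import numpy as np

# Price-revenue data from the example (24 monthly observations).
price = np.array([24.2, 25.5, 26.8, 26.4, 25.2, 24.4, 26.2, 26.1, 27.4, 28.4, 29.8, 31.3,
                  32.2, 32.4, 33.2, 34.0, 33.7, 32.8, 31.3, 30.9, 30.0, 28.3, 27.5, 26.8])
revenue = np.array([1590, 1630, 1570, 1960, 2150, 2450, 2770, 2400, 2200, 1270, 1250, 1500,
                    1700, 1450, 1450, 1450, 1000, 1080, 1270, 1520, 1820, 1660, 1500, 1410],
                   dtype=float)

X = np.column_stack([np.ones(len(price)), price])
y = revenue


def ar1_transform(a, rho):
    """Row-wise AR(1) transformation of a vector or matrix (no scalar factor)."""
    out = np.empty_like(a, dtype=float)
    out[0] = np.sqrt(1.0 - rho**2) * a[0]   # first observation
    out[1:] = a[1:] - rho * a[:-1]          # quasi-differences for i >= 2
    return out


def gls_fit(rho):
    """OLS on the transformed data; returns (coefficients, transformed SSR)."""
    X_star, y_star = ar1_transform(X, rho), ar1_transform(y, rho)
    beta, *_ = np.linalg.lstsq(X_star, y_star, rcond=None)
    resid_star = y_star - X_star @ beta
    return beta, float(resid_star @ resid_star)


# Candidate values -0.99, -0.98, ..., 0.99 (the grid width is an assumption).
grid = np.round(np.linspace(-0.99, 0.99, 199), 2)
ssr_by_rho = np.array([gls_fit(rho)[1] for rho in grid])

# Keep the rho with the smallest sum of squared residuals.
best_rho = float(grid[np.argmin(ssr_by_rho)])
best_beta, best_ssr = gls_fit(best_rho)

print(f"rho = {best_rho:.2f}, alpha = {best_beta[0]:.2f}, "
      f"beta = {best_beta[1]:.2f}, SSR = {best_ssr:.0f}")
```

A finer grid around the first minimum, or a numerical optimizer, could be used in place of the fixed grid.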

210

Example:

• Data on price-revenue function

−→ Hildreth-Lu estimates:

$$\hat\alpha^{HL} = 3162.48, \quad \hat\beta^{HL} = -53.51$$

211

$\rho$                            0.60       0.68       0.69       0.70       0.80
$\hat u^{*\prime}\hat u^*$     1756.358   1739.757   1739.748   1740.212   1772.438
$\hat\alpha$                    3420.42    3192.98    3162.48    3131.62    2817.53
$\hat\beta$                      −61.97     −54.50     −53.51     −52.50     −42.40

2. Method by Cochrane & Orcutt: [I]

• Iterative multi-step procedure

Procedure:

1. OLS estimation of the model

$$y = X\beta + u$$

2. Save the residuals

$$\hat u = y - X\hat\beta$$

212

2. Method by Cochrane & Orcutt: [II]

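3. Consider the regression

$$\hat u = \rho \hat u_{-1} + e^*$$

and estimate ρ by the OLS estimator

$$\hat\rho = \frac{\sum_{i=2}^{N} \hat u_{i-1} \hat u_i}{\sum_{i=2}^{N} \hat u_{i-1}^2}$$

4. Use $\hat\rho$ to apply the FGLS estimator

$$\hat\beta^{FGLS} = [X'\hat\Omega^{-1}X]^{-1} X'\hat\Omega^{-1}y$$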

213

2. Method by Cochrane & Orcutt: [III]

5. Improvement due to iteration:

• Compute the "new" residuals

$$\hat u^{(2)} = y - X\hat\beta^{FGLS}$$

• Re-estimate ρ as in Step #3

• Find the "new" FGLS estimator $\hat\beta^{FGLS(2)}$

• Repeat Step #5 until the FGLS estimator of β does not exhibit any further (substantial) change
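Steps 1 to 5 can be sketched as a loop. The assumptions in this sketch are: the price-revenue data as input, an FGLS step that keeps the first observation with weight $\sqrt{1-\hat\rho^2}$ (as implied by the matrix P of Section 7.1; many textbook versions of the Cochrane-Orcutt procedure drop the first observation instead), a convergence tolerance of 1e-6 and at most 50 iterations. All names are illustrative.

```python
import numpy as np

# Price-revenue data from the example (24 monthly observations).
price = np.array([24.2, 25.5, 26.8, 26.4, 25.2, 24.4, 26.2, 26.1, 27.4, 28.4, 29.8, 31.3,
                  32.2, 32.4, 33.2, 34.0, 33.7, 32.8, 31.3, 30.9, 30.0, 28.3, 27.5, 26.8])
revenue = np.array([1590, 1630, 1570, 1960, 2150, 2450, 2770, 2400, 2200, 1270, 1250, 1500,
                    1700, 1450, 1450, 1450, 1000, 1080, 1270, 1520, 1820, 1660, 1500, 1410],
                   dtype=float)

X = np.column_stack([np.ones(len(price)), price])
y = revenue


def ar1_transform(a, rho):
    """Row-wise AR(1) transformation of a vector or matrix."""
    out = np.empty_like(a, dtype=float)
    out[0] = np.sqrt(1.0 - rho**2) * a[0]
    out[1:] = a[1:] - rho * a[:-1]
    return out


def estimate_rho(resid):
    """Step 3: OLS estimate of rho from the residuals (no constant)."""
    return (resid[:-1] @ resid[1:]) / (resid[:-1] @ resid[:-1])


def fgls(rho):
    """Step 4: FGLS as OLS on the data transformed with the estimated rho."""
    beta, *_ = np.linalg.lstsq(ar1_transform(X, rho), ar1_transform(y, rho), rcond=None)
    return beta


# Steps 1 and 2: OLS estimate as the starting point.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

rho_path = []
for _ in range(50):                      # step 5: iterate
    resid = y - X @ beta                 # residuals of the original model
    rho = estimate_rho(resid)
    rho_path.append(rho)
    new_beta = fgls(rho)
    converged = np.max(np.abs(new_beta - beta)) < 1e-6
    beta = new_beta
    if converged:
        break

print(f"iterations = {len(rho_path)}, rho = {rho_path[-1]:.4f}, "
      f"alpha = {beta[0]:.2f}, beta = {beta[1]:.2f}")
```

The list `rho_path` records the estimate of ρ at each pass, so the first few entries can be compared with the iteration table of the example.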

214

Example: [I]

• Consider the price-revenue example

−→ Cochrane-Orcutt estimates:

$$\hat\alpha^{CO} = 3264.88, \quad \hat\beta^{CO} = -56.86$$

(possibly further iterations)

215

Estimate         OLS estimate   Iteration #1   Iteration #2   Iteration #3
$\hat\rho$                          0.58           0.64           0.66
$\hat\alpha$       4262.12        3473.74        3310.85        3264.88
$\hat\beta$         −89.58         −63.73         −58.37         −56.86

Example: [II]

• Contrasting both estimation results:

216

Parameter        OLS estimate   Hildreth-Lu   Cochrane-Orcutt
$\hat\alpha$       4262.12        3162.48        3264.88
$\hat\beta$         −89.58         −53.51         −56.86