
Pitfalls and Possibilities in Predictive Regression

Peter C. B. Phillips

SoFiE Conference June 2013


Overview
Pitfalls in Predictive Regressions

Bias + Nonstandard Inference

spurious predictability
difficulties with multiple predictors + strong endogeneity

Confidence Intervals can have Zero Coverage Probability

failure of uniformity (in AR and predictive regression)
prediction tests reject with probability one under null

Quantile Predictive Regression

crossing problems
invalidity of QR for nonstationary data
nonstandard inference for LUR case

Model Validity under Predictability

Meaningful alternatives
Equation balancing in time series characteristics


Overview
Possibilities

IVX regression

use reconstructed regressor as own instrument (Magdalinos-Phillips, 2009)
easy to use in empirical work (Pitarakis & Gonzalo, 2012)
has broad asymptotic validity: correct CI

Quantile Predictive Regression & IVX versions

resolve quantile crossings & validity
provides predictions at all quantiles
superior for tail event prediction

Nonparametric regression

NP regression addresses balance & validity issues
asymptotics are uniform across SM, LM, nonstationary cases


Prediction Model
Predictive regressions: widely used linear predictive regression

Stock returns vs D/P ratio, Future ER vs forward premium, Consumption growth vs lagged Income, etc.

Designed in simple form

$$ y_{t+1} = \beta x_t + u_{0,t+1}, \qquad x_{t+1} = \rho x_t + u_{x,t+1} $$

Simple mds innovation structure often used: $u_t = (u_{0t}, u_{xt})'$

$$ E_{\mathcal{F}_{t-1}} u_t = 0, \qquad E_{\mathcal{F}_{t-1}}\left[u_t u_t'\right] = \begin{bmatrix} \sigma_{00} & \sigma_{0x} \\ \sigma_{x0} & \sigma_{xx} \end{bmatrix} =: \Sigma $$

General WD CHE dependence structure possible/desirable


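Pitfalls: (a) Bias
Finite Sample Bias with Stationary Regressors

Centered OLS decomposition:

$$
\hat\beta - \beta
= \frac{\sum_{t=1}^n x_{t-1}u_{0t}}{\sum_{t=1}^n x_{t-1}^2}
= \frac{\sum_{t=1}^n x_{t-1}u_{0.xt}}{\sum_{t=1}^n x_{t-1}^2}
+ \left(\frac{\sigma_{0x}}{\sigma_{xx}}\right)\frac{\sum_{t=1}^n x_{t-1}u_{xt}}{\sum_{t=1}^n x_{t-1}^2}
= \frac{\sum_{t=1}^n x_{t-1}u_{0.xt}}{\sum_{t=1}^n x_{t-1}^2}
+ \left(\frac{\sigma_{0x}}{\sigma_{xx}}\right)(\hat\rho - \rho),
$$

where $u_{0.xt} = u_{0t} - \frac{\sigma_{0x}}{\sigma_{xx}}u_{xt}$.

Under normality, stationarity ($|\rho| < 1$), intercept - Stambaugh (1999):

$$
E(\hat\beta - \beta) = -\left(\frac{\sigma_{0x}}{\sigma_{xx}}\right)\left(\frac{1+3\rho}{n}\right) + O\left(\frac{1}{n^2}\right).
$$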
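The Stambaugh approximation can be checked by simulation. The sketch below is illustrative only: the function name `stambaugh_bias_sim`, the parameter values, the unit innovation variances and the stationary Gaussian draw for the initial condition are assumptions made here, not part of the slides. OLS is fitted with an intercept, as the formula requires.

```python
import numpy as np

def stambaugh_bias_sim(n=100, rho=0.5, beta=0.0, s0x=-0.9, reps=10000, seed=0):
    """Compare the simulated OLS bias of beta-hat with -(s0x/sxx)(1+3rho)/n.

    Assumes sigma_00 = sigma_xx = 1, so s0x is also the innovation correlation.
    """
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, s0x], [s0x, 1.0]])
    chol = np.linalg.cholesky(cov)
    # u[:, t, 0] = u_0 and u[:, t, 1] = u_x, contemporaneously correlated
    u = rng.standard_normal((reps, n, 2)) @ chol.T

    # AR(1) regressor started from its stationary distribution
    x = np.empty((reps, n + 1))
    x[:, 0] = rng.standard_normal(reps) / np.sqrt(1.0 - rho**2)
    for t in range(n):
        x[:, t + 1] = rho * x[:, t] + u[:, t, 1]

    x_lag = x[:, :-1]
    y = beta * x_lag + u[:, :, 0]

    # OLS slope with intercept = slope on demeaned data
    xd = x_lag - x_lag.mean(axis=1, keepdims=True)
    yd = y - y.mean(axis=1, keepdims=True)
    beta_hat = (xd * yd).sum(axis=1) / (xd**2).sum(axis=1)

    simulated = beta_hat.mean() - beta
    approx = -(s0x / 1.0) * (1.0 + 3.0 * rho) / n
    return simulated, approx

simulated_bias, stambaugh_approx = stambaugh_bias_sim()
print(simulated_bias, stambaugh_approx)
```

With negatively correlated innovations the bias is positive, which is the direction that produces spurious evidence of predictability.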


Pitfalls: (b) Nonstandard Inference
Inference with Persistent Regressors

Consensus that potential explanatory regressors are highly persistent in this model

Local to unity specification $\rho_n = 1 + \frac{c}{n}$ has received much attention:

Campbell and Yogo (2006) and Jansson and Moreira (2006)

Nonstandard (LUR) limit theory for AR coeff. - Phillips (1987, 1988):

$$
n(\hat\rho - \rho) = \frac{\frac{1}{n}\sum_{t=1}^n x_{t-1}u_{0t}}{\frac{1}{n^2}\sum_{t=1}^n x_{t-1}^2}
\Longrightarrow \frac{\int J^x_c(r)\,dB_0(r) + \lambda}{\int J^x_c(r)^2\,dr}
$$

The “finite sample bias” is still present in the limit and not obviously correctable since c is not consistently estimable (but see below)

Not mixed normal and not pivotal



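Nonnormality comes from endogeneity & LUR limit theory

If $u_t \sim \text{mds}(0, \Sigma)$

$$
\zeta_{nt} = \begin{bmatrix} \frac{1}{n}x_{t-1}u_{0t} \\ \frac{1}{\sqrt n}u_{xt} \end{bmatrix},
\qquad
\sum E_{\mathcal{F}_{n,t-1}}\zeta_{nt}\zeta_{nt}' =
\underbrace{\begin{bmatrix}
\left(\frac{1}{n^2}\sum x_{t-1}^2\right)\sigma_{00} & \left(\frac{1}{n\sqrt n}\sum x_{t-1}\right)\sigma_{0x} \\
\left(\frac{1}{n\sqrt n}\sum x_{t-1}\right)\sigma_{x0} & \sigma_{xx}
\end{bmatrix}}_{\text{not diagonal unless } \sigma_{0x} = 0}
$$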



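From the decomposition

$$
n(\hat\beta - \beta) = \frac{\frac{1}{n}\sum_{t=1}^n x_{t-1}u_{0.xt}}{\frac{1}{n^2}\sum_{t=1}^n x_{t-1}^2}
+ \frac{\sigma_{x0}}{\sigma_{xx}}\,\frac{\frac{1}{n}\sum_{t=1}^n x_{t-1}u_{xt}}{\frac{1}{n^2}\sum_{t=1}^n x_{t-1}^2}
$$

The second term is a standard LUR distribution

$$
\frac{\frac{1}{n}\sum_{t=1}^n x_{t-1}u_{xt}}{\frac{1}{n^2}\sum_{t=1}^n x_{t-1}^2}
\Longrightarrow \frac{\int J^x_c(r)\,dB_x(r)}{\int J^x_c(r)^2\,dr},
$$

Giving limit theory for $n(\hat\beta - \beta)$

$$
MN\left(0, \sigma_{00.x}\left(\int J^x_c(r)^2\,dr\right)^{-1}\right)
+ \left(\frac{\sigma_{x0}}{\sigma_{xx}}\right)\frac{\int J^x_c(r)\,dB_x(r)}{\int J^x_c(r)^2\,dr}
$$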



Nonstandard limit theory from endogeneity and persistent regressor

Endogeneity/bias correction methods such as Phillips and Hansen (1990, FM-OLS) are designed for UR case (c = 0) and give

$$
n(\hat\beta_{FM} - \beta) \Longrightarrow MN\left(-c\,\frac{\sigma_{x0}}{\sigma_{xx}},\; \sigma_{00.x}\left(\int J^x_c(r)^2\,dr\right)^{-1}\right)
$$

Without precise information on c, the bias is not correctable



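t test

$$
\frac{\hat\beta - \beta}{\hat\sigma_\beta} \Longrightarrow
\left(\frac{\sigma_{00.x}}{\sigma_{00}}\right)^{1/2} Z
+ \frac{\sigma_{x0}}{(\sigma_{xx}\sigma_{00})^{1/2}}
\left(\frac{\int J^x_c(r)\,dB_x(r)}{\left(\sigma_{xx}\int J^x_c(r)^2\,dr\right)^{1/2}}\right)
= (1-\delta^2)^{1/2} Z + \delta\,\eta_{LUR}(c)
$$

where

$$ \delta = \frac{\sigma_{x0}}{(\sigma_{xx}\sigma_{00})^{1/2}} $$

Unless δ = 0, standard chi-square inference is not valid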


Pitfalls: (b) Nonstandard Inference
Bonferroni Bounds Method

Test has mixture limit theory

$$
t_\beta = \frac{\hat\beta - \beta}{\hat\sigma_\beta} \Longrightarrow \delta\,\eta_{LUR}(c) + (1-\delta^2)^{1/2} Z. \qquad (1)
$$

Cavanagh, Elliott and Stock (1995):

pretest for δ = 0 and t-test is valid
if δ ≁ 0, use a Bonferroni method and most conservative c
construct a 100(1 − α1)% CI for c using Stock (1991) confidence belts
For each c in CI, construct a 100(1 − α2)% CI for β, denoted as $CI_{\beta|c}(\alpha_2)$, using (1).
A CI free of c is obtained as

$$ CI_\beta(\alpha) = \bigcup_{c \in CI_c(\alpha_1)} CI_{\beta|c}(\alpha_2). $$


Pitfalls: constructing a CI for c
Stock's (1991) suggestion

use LUR limit theory for AR $t_\rho$ (Phillips, 1987)

$$
t_\rho = \frac{\hat\rho - 1}{\hat\sigma_\rho} \Longrightarrow
\frac{\int J_c\,dW}{\left(\int J_c(r)^2\,dr\right)^{1/2}} + c\left(\int J_c(r)^2\,dr\right)^{1/2} =: \tau_c, \qquad (2)
$$

invert statistic $t_\rho$ using confidence bands constructed from the limit distribution theory of $\tau_c$

mechanism: use confidence belts (Kendall, 1946) from intensive simulations of (2)


Pitfalls: constructing a CI for c
Stock's confidence belts for c

[Figure: confidence belts (levels 2.5%, 50% and 97.5%) with asymptotic functional approximations - Phillips (2013)]


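Pitfalls: constructing a CI for c
Bonferroni Method - specifics

using $\hat\rho$ find $CI_c(\alpha_1) = [c_l(\alpha_1), c_u(\alpha_1)]$ by inversion - Stock (1991)

using the cv $d_{t_\beta,c}$ of $t_\beta \sim \delta\,\eta_{LUR}(c) + (1-\delta^2)^{1/2} Z$, find

$$
CI_\beta(\alpha_1, \alpha_2) = \left[d^\beta_l(\alpha_1, \alpha_2),\; d^\beta_u(\alpha_1, \alpha_2)\right]
= \left[\min_{c_l \le c \le c_u} d_{t_\beta,c,\frac{1}{2}\alpha_2},\; \max_{c_l \le c \le c_u} d_{t_\beta,c,1-\frac{1}{2}\alpha_2}\right]
$$

Then

$$
\Pr\left(t_\beta \notin \left[d^\beta_l(\alpha_1, \alpha_2), d^\beta_u(\alpha_1, \alpha_2)\right]\right)
\to \Pr\left(\delta\,\eta_{LUR}(c) + (1-\delta^2)^{1/2} Z \notin \left[d^\beta_l(\alpha_1, \alpha_2), d^\beta_u(\alpha_1, \alpha_2)\right]\right)
\le \alpha_1 + \alpha_2. \quad \text{(By Bonferroni)}
$$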
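The min/max step over the confidence set for c can be sketched as follows. This is only an outline of the bookkeeping: `bonferroni_bounds` and `toy_quantile` are hypothetical names, and `toy_quantile` is a deliberately artificial stand-in (a shifted normal quantile), not the LUR mixture distribution, whose quantiles would have to come from simulation.

```python
from statistics import NormalDist

def bonferroni_bounds(c_grid, quantile_fn, alpha2=0.05):
    """Return (d_l, d_u) over a grid of c values.

    quantile_fn(c, p) must return the p-quantile of the limit variate of
    t_beta for localizing coefficient c (user supplied).
    d_l is the smallest alpha2/2 quantile and d_u the largest
    1 - alpha2/2 quantile across the grid.
    """
    c_values = list(c_grid)
    d_l = min(quantile_fn(c, alpha2 / 2) for c in c_values)
    d_u = max(quantile_fn(c, 1 - alpha2 / 2) for c in c_values)
    return d_l, d_u

def toy_quantile(c, p):
    # Toy stand-in only: normal quantile shifted by 0.1 * c.
    return NormalDist().inv_cdf(p) + 0.1 * c

# Hypothetical confidence set CI_c = [-5, 0] on a grid of step 0.5
c_grid = [-5.0 + 0.5 * i for i in range(11)]
d_l, d_u = bonferroni_bounds(c_grid, toy_quantile, alpha2=0.05)
print(d_l, d_u)
```

Because the bounds are the extremes over the whole grid, the resulting interval is at least as wide as any conditional interval, which is the source of the conservativeness.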


Pitfalls: constructing a CI for c
Bonferroni Method - numerically intensive like grid bootstrap computations

Golly, I just ran 5 billion regressions!!

Construct the cv $d_{t_\beta,c}$ of $t_\beta \sim \delta\,\eta_{LUR}(c) + (1-\delta^2)^{1/2} Z$ across a grid

20,000 values of c × 5 values of δ × 50,000 replications


Pitfalls: constructing a CI for c
Bonferroni Method - Campbell & Yogo (2006, CY)

Campbell and Yogo (2006): the same idea with DF-GLS

Often employed in applied work and in other methodological work

Undesirable Properties:

Conservative by construction: numerical size often much less than nominal size
Results in a biased test: there are local alternatives under which the test rejects less often than the nominal size.
No extensions to multivariate systems/multiple regressors.
... plus ...

A Major New Problem!!


Pitfalls: constructing a CI for c
Robustness Failure

Unnoticed Problem:

Stock (1991) confidence intervals have ZERO coverage probability when $|\rho| < 1$ (and more generally when $n^{1/3}(1-\rho) \to \infty$)
UR test statistic $\tau_c$ fails tightness
$CI_c(\alpha_1) = [c_l(\alpha_1), c_u(\alpha_1)]$ diverges
probability mass escapes as $c \to -\infty$ and stationary region $|\rho| < 1$ is approached

Consequences - serious failure of robustness to ρ

CY confidence intervals have zero coverage probability in stationary case
CY Q test indicates predictability with probability unity under the null when $|\rho| < 1$


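Asymptotic analysis - local to unity case
Form of the confidence belts

Theorem
As $c \to -\infty$ the asymptotic form of the distribution of $\tau_c$ is

$$
\tau_c \sim N\left(-\frac{|c|^{1/2}}{2^{1/2}} - \frac{1}{2^{3/2}|c|^{1/2}},\; \frac{1}{4}\right) + O_p\left(\frac{1}{|c|^{1/2}}\right), \qquad (3)
$$

inducing the (2.5%, 97.5%) confidence belts

$$ -\frac{|c|^{1/2}}{2^{1/2}} - \frac{1}{2^{3/2}|c|^{1/2}} \pm \frac{1.96}{2} $$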
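The belt formula is simple enough to evaluate directly. The helper below is a small sketch (the name `asymptotic_belt` is introduced here for illustration); it returns the lower belt, the centre and the upper belt for a given negative c.

```python
import math

def asymptotic_belt(c, z=1.96):
    """Asymptotic (2.5%, 97.5%) belts for tau_c as c -> -infinity.

    Centre: -|c|^(1/2)/2^(1/2) - 1/(2^(3/2)|c|^(1/2)); half-width: z/2,
    since the limiting variance is 1/4.
    """
    if c >= 0:
        raise ValueError("the expansion is for c < 0")
    root = math.sqrt(abs(c))
    center = -root / math.sqrt(2.0) - 1.0 / (2.0**1.5 * root)
    half_width = z / 2.0
    return center - half_width, center, center + half_width

for c in (-10.0, -50.0, -200.0):
    print(c, asymptotic_belt(c))
```

The centre drifts to minus infinity at rate $|c|^{1/2}$ while the width stays fixed, which is the loss of tightness discussed on the following slides.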


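Asymptotic analysis - local to unity case
Expansion of the limit distribution

As $c \to -\infty$, we expand $\tau_c$ as

$$
\begin{aligned}
\tau_c &= \frac{\int J_c\,dW}{\left(\int J_c(r)^2\,dr\right)^{1/2}} + c\left(\int J_c(r)^2\,dr\right)^{1/2} \\
&= \frac{(-2c)^{1/2}\int J_c\,dW}{\left\{(-2c)\int J_c(r)^2\,dr\right\}^{1/2}}
 - \frac{|c|^{1/2}}{2^{1/2}}\left\{(-2c)\int J_c(r)^2\,dr\right\}^{1/2} \\
&\sim \frac{\zeta}{\left\{1 + \frac{2\zeta}{(-2c)^{1/2}}\right\}^{1/2}}
 - \frac{|c|^{1/2}}{2^{1/2}}\left\{1 + \frac{2\zeta}{(-2c)^{1/2}}\right\}^{1/2} \\
&= -\frac{|c|^{1/2}}{2^{1/2}} + \frac{1}{2}\zeta + O_p\left(\frac{1}{|c|^{1/2}}\right), \qquad \zeta \equiv N(0, 1)
\end{aligned}
$$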



Asymptotic confidence belts: $-\frac{|c|^{1/2}}{2^{1/2}} - \frac{1}{2^{3/2}|c|^{1/2}} \pm \frac{1.96}{2}$

[Figure: confidence belts with asymptotic functional approximations]



Family $\tau_c = N\left(-\frac{|c|^{1/2}}{2^{1/2}} - \frac{1}{2^{3/2}|c|^{1/2}}, \frac{1}{4}\right)$ is not tight as $c \to -\infty$.

Probability mass escapes as $c \to -\infty$, i.e. as stationary region $|\rho| < 1$ is approached


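Asymptotic analysis - stationary case
Stock CIs have zero coverage probability in stationary/near stationary case

Theorem
100(1 − α)% CI for $|\rho| < 1$ is

$$
[\rho_L, \rho_U] = \left[1 - 2A_\rho + 2\,\frac{2\zeta \pm z_{\alpha/2}}{\sqrt n}\,A_\rho^{1/2}\right], \qquad A_\rho = \frac{1-\rho}{1+\rho} \qquad (4)
$$

where $\zeta = N(0, 1)$. Coverage probability is

$$
P\{\rho \in [\rho_L, \rho_U]\} = P_\zeta\left\{\zeta \in \left[\frac{\sqrt n}{4}(1-\rho)A_\rho^{1/2} \pm \frac{z_{\alpha/2}}{2}\right]\right\}
$$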
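The step from the interval (4) to the coverage expression is short; the following worked derivation uses only the definitions in the theorem and is added here as a reading aid.

```latex
\begin{aligned}
\rho - (1 - 2A_\rho) &= \frac{(1-\rho)^2}{1+\rho} = (1-\rho)A_\rho, \\
\rho \in [\rho_L, \rho_U]
 &\iff \left| (1-\rho)A_\rho - \frac{4\zeta}{\sqrt n}A_\rho^{1/2} \right| \le \frac{2 z_{\alpha/2}}{\sqrt n}A_\rho^{1/2} \\
 &\iff \left| \frac{\sqrt n}{4}(1-\rho)A_\rho^{1/2} - \zeta \right| \le \frac{z_{\alpha/2}}{2}.
\end{aligned}
```

Since $(1-\rho)A_\rho^{1/2} = (1-\rho)^{3/2}/(1+\rho)^{1/2}$, the centre of the band for ζ diverges exactly when $\sqrt n(1-\rho)^{3/2} \to \infty$.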



$$ [\rho_L, \rho_U] \to_p \frac{3\rho - 1}{1 + \rho} = \bar\rho $$

[Figure: $\bar\rho$ (rho_bar) plotted against ρ (rho)]

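Coverage probability

$$
P\{\rho \in [\rho_L, \rho_U]\} = P_\zeta\left\{\zeta \in \left[\underbrace{\frac{\sqrt n}{4}(1-\rho)A_\rho^{1/2}}_{\to \infty} \pm \frac{z_{\alpha/2}}{2}\right]\right\} \to 0
$$

i.e. if $\sqrt n\,(1-\rho)A_\rho^{1/2} \to \infty$.



Zero coverage probability whenever $\sqrt n\,(1-\rho)^{3/2} \to \infty$

That is: for all fixed $|\rho| < 1$ and for all mildly integrated ρ with

$$ \rho = 1 + \frac{c}{n^\delta}, \qquad \delta < \frac{1}{3}, \qquad c < 0 $$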
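The asymptotic coverage expression can be evaluated numerically to see how quickly it collapses. The function below is a sketch under the stationary-case formula only; the name `stock_ci_coverage` and the chosen values of ρ and n are illustrative assumptions.

```python
from statistics import NormalDist

def stock_ci_coverage(rho, n, alpha=0.05):
    """Asymptotic coverage of the interval (4) for fixed |rho| < 1.

    Evaluates P{ zeta in [ (sqrt(n)/4)(1-rho) A^(1/2) +/- z/2 ] },
    with A = (1-rho)/(1+rho) and zeta standard normal.
    """
    nd = NormalDist()
    z = nd.inv_cdf(1.0 - alpha / 2.0)
    a_rho = (1.0 - rho) / (1.0 + rho)
    center = (n**0.5 / 4.0) * (1.0 - rho) * a_rho**0.5
    return nd.cdf(center + z / 2.0) - nd.cdf(center - z / 2.0)

for n in (100, 1000, 10000):
    print(n, stock_ci_coverage(0.5, n), stock_ci_coverage(0.9, n))
```

For ρ close to one the centre term is small and coverage decays slowly in n, whereas for ρ well inside the stationary region it falls away rapidly.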


Implications for Campbell Yogo Predictive Tests
Complete failure of robustness to AR coefficient

[Figure: coverage probabilities of Campbell-Yogo and stationary confidence intervals for the predictive regression coefficient β]


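Implications for Campbell Yogo Predictive Tests
Prediction Failure

CY Q test employs t ratio $t_\beta(\rho)$ based on regression coefficient $\hat\beta(\rho)$ conditional on ρ:

$$
\hat\beta(\rho) = \frac{\sum_{t=1}^n x_{t-1}\left[y_t - \frac{\sigma_{x0}}{\sigma_{xx}}(x_t - \rho x_{t-1})\right]}{\sum_{t=1}^n x_{t-1}^2}
$$

Confidence interval from Stock inversion giving $[\rho_L, \rho_U]$ and

$$
[\beta_L(\rho_U), \beta_U(\rho_L)] = \left[\hat\beta(\rho_U) - z_{\alpha_2/2}\,\hat\sigma_{\beta(\rho)},\; \hat\beta(\rho_L) + z_{\alpha_2/2}\,\hat\sigma_{\beta(\rho)}\right]
$$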



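CY confidence interval is inconsistent for all $|\rho| < 1$

$$
[\beta_L(\rho_U), \beta_U(\rho_L)] \to \beta + \frac{\sigma_{x0}}{\sigma_{xx}}\,\frac{(\rho - 1)^2}{\rho + 1} \neq \beta
$$

for all $\sigma_{x0} \neq 0$



Under $H_0: \beta = 0$, Size → Unity for $|\rho| < 1$

$$
\hat\beta(\rho) \to_p \frac{\sigma_{x0}}{\sigma_{xx}}\,\frac{(\rho - 1)^2}{\rho + 1} \neq 0
$$

So rejection probability → 1

Always find evidence of predictability with this test!!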


Possibilities: (a) & (b) Correct centering
Correct centring of AR test statistic - then use Bonferroni construction

Hansen, 1999: bootstrap approach; Mikusheva, 2007, 2012.

Idea: do the inversion based on $t_\rho = \frac{\hat\rho - \rho}{\hat\sigma_\rho}$

$$
t_\rho \Longrightarrow \frac{\int J_c\,dW}{\left(\int J_c(r)^2\,dr\right)^{1/2}} \;\Rightarrow_{c \to -\infty}\; N(0, 1) \quad \text{(Phillips, 1987)}
$$

N(0, 1) for $n(1-\rho) \to \infty$ (Giraitis & Phillips, 2006)

CI construction for ρ is then uniform in ρ ∈ (0, 1] (Mikusheva, 2007).

But: numerically intensive, applies only for AR(1), no extension to multivariate regressors


Possibilities: (a) & (b) IVX Inference
IVX approach - Magdalinos and Phillips (MP, 2009)

Idea: self generation - create less persistent IV using own regressor $x_t$

$$
\tilde z_t = \sum_{j=1}^{t} \rho_{nz}^{t-j}\,\Delta x_j, \qquad
\rho_{nz} = 1 + \frac{c_z}{n^\varphi}, \quad \varphi \in (0, 1), \quad c_z < 0,
$$

IVX ⇒ filter the regressor $x_t$ to produce a valid instrument for $x_t$
reduce persistence when $x_t$ is LUR
applicable when $x_t$ is UR, LUR, MI, ME


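Possibilities: (a) & (b) IVX Inference
IVX approach - mechanics

IVX is based on the regressor $x_t$

$$
\tilde z_t = \sum_{j=1}^{t} \rho_{nz}^{t-j}\,\Delta x_j, \qquad \rho_{nz} = 1 + \frac{c_z}{n^\varphi}, \quad \varphi \in (0, 1), \quad c_z < 0
$$

Since $\Delta x_j = \frac{c}{n}x_{j-1} + u_{xj}$, we have the decomposition

$$
\tilde z_t = \sum_{j=1}^{t} \rho_{nz}^{t-j}\left(\frac{c}{n}x_{j-1} + u_{xj}\right)
= \sum_{j=1}^{t} \rho_{nz}^{t-j}u_{xj} + \frac{c}{n}\sum_{j=1}^{t} \rho_{nz}^{t-j}x_{j-1}
= z_t + \frac{c}{n}\psi_{nt}
$$

$z_t = \rho_{nz}z_{t-1} + u_{xt}$ plays the role of a mildly integrated instrument.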


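Possibilities: (a) & (b) IVX Inference
IVX estimation & limit theory

IVX estimator:

$$
\hat\beta_{IVX} = \frac{\sum_{t=1}^n \tilde z_{t-1}y_t}{\sum_{t=1}^n \tilde z_{t-1}x_{t-1}}
= \beta + \frac{\sum_{t=1}^n \tilde z_{t-1}u_{0t}}{\sum_{t=1}^n \tilde z_{t-1}x_{t-1}}
$$

Nice limit theory under the rate restriction $\varphi \in (0, 1)$

$$ n^{\frac{1+\varphi}{2}}(\hat\beta_{IVX} - \beta) \Longrightarrow \Psi $$

where Ψ is a correctly centered mixed normal random variable

The t-ratio:

$$ t_{\beta_{IVX}} = \frac{\hat\beta_{IVX} - \beta}{\hat\sigma_{IVX}} \Longrightarrow Z $$

which is a standard normal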
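The instrument construction and the scalar IVX estimator can be sketched in a few lines. The code below assumes a single regressor with no intercept, takes $x_0$ as the first array element, and uses illustrative tuning values `c_z = -1.0` and `phi = 0.9`; the function names are hypothetical and the standard error $\hat\sigma_{IVX}$ is not implemented because the slides do not give its form.

```python
import numpy as np

def ivx_instrument(x, c_z=-1.0, phi=0.9):
    """Build z_0, ..., z_n from x_0, ..., x_n with z_0 = 0.

    Uses the recursion z_t = rho_nz * z_{t-1} + (x_t - x_{t-1}), which is
    equivalent to z_t = sum_{j<=t} rho_nz^(t-j) * delta x_j.
    """
    n = len(x) - 1
    rho_nz = 1.0 + c_z / n**phi
    dx = np.diff(x)
    z = np.zeros(n + 1)
    for t in range(1, n + 1):
        z[t] = rho_nz * z[t - 1] + dx[t - 1]
    return z

def ivx_estimate(y, x, c_z=-1.0, phi=0.9):
    """Scalar IVX estimate: sum z_{t-1} y_t / sum z_{t-1} x_{t-1}.

    y holds y_1, ..., y_n and x holds x_0, ..., x_n.
    """
    z = ivx_instrument(x, c_z, phi)
    return np.sum(z[:-1] * y) / np.sum(z[:-1] * x[:-1])

# Small demonstration on simulated unit-root data with endogeneity
rng = np.random.default_rng(0)
n = 1000
e = rng.standard_normal((n, 2))
u_x = e[:, 0]
u_0 = -0.9 * e[:, 0] + np.sqrt(1.0 - 0.81) * e[:, 1]
x = np.concatenate(([0.0], np.cumsum(u_x)))
y = 0.2 * x[:-1] + u_0
print(ivx_estimate(y, x))
```

The recursion form avoids recomputing the weighted sum at every t, so the instrument costs one pass through the data.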


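Possibilities: (a) & (b) IVX Inference
IVX approach - key idea

asymptotic independence between MG parts of numerator/denominator.

Compared to earlier discussion

$$
\zeta_{nt} = \begin{bmatrix} \frac{1}{n^{\frac{1+\varphi}{2}}}z_{t-1}u_{0t} \\ \frac{1}{\sqrt n}u_{xt} \end{bmatrix},
\qquad
\sum E_{\mathcal{F}_{n,t-1}}\zeta_{nt}\zeta_{nt}' =
\begin{bmatrix}
\left(\frac{1}{n^{1+\varphi}}\sum z_{t-1}^2\right)\sigma_{00} & \left(\frac{1}{n^{1+\frac{\varphi}{2}}}\sum z_{t-1}\right)\sigma_{0x} \\
\left(\frac{1}{n^{1+\frac{\varphi}{2}}}\sum z_{t-1}\right)\sigma_{0x} & \sigma_{xx}
\end{bmatrix}
\Longrightarrow \text{diagonal}
$$

achieved by reducing order of magnitude of the instrument $z_t$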


Possibilities: (a) & (b) IVX Inference
IVX approach - ongoing research

Ongoing work:

allowing multivariate $x_t$ with mixed orders of persistence
allowing intercepts, deterministic trends, and structural breaks

Desirable properties:

pivotal mixed normal limit theory under general persistent regressors
accommodates multivariate systems + many regressors
applicable when regressors have differing levels of persistence
good size & power properties of test


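Pitfalls: (c) Invalidity of Quantile Predictive Regression
QR Model Invalidity under the Alternative

QR predictive model - conditional quantile formulation:

$$
Q_{y_t}(\tau|\mathcal{F}_{t-1}) = \beta(\tau)x_{t-1} + \sigma_{00}^{1/2}\Phi^{-1}_{u_0}(\tau) \qquad (5)
$$

$$ x_t = x_{t-1} + u_{xt} $$

Φ = cdf of N(0, 1), $Q_{y_t}(\tau|\mathcal{F}_{t-1})$ = quantile function, $u_t = (u_{0t}, u_{xt})' \sim$ mds

Quantile crossings occur for ρ > τ when natural order is reversed:

$$
\{\beta(\rho) - \beta(\tau)\}x_{t-1} + \sigma_{00}^{1/2}\left\{\Phi^{-1}_{u_0}(\rho) - \Phi^{-1}_{u_0}(\tau)\right\} < 0
$$

that is whenever

$$
x_{t-1} < \sigma_{00}^{1/2}\,\frac{\Phi^{-1}_{u_0}(\tau) - \Phi^{-1}_{u_0}(\rho)}{\beta(\rho) - \beta(\tau)}, \quad \text{if } \beta(\rho) - \beta(\tau) > 0 \qquad (6)
$$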


Pitfalls: (c) Invalidity of Quantile Predictive Regression
QR Model Invalidity under the Alternative

Theorem
Quantile crossing frequency given by

$$
n^{-1}\sum_{t=1}^{n} 1\left\{\frac{x_{t-1}}{\sqrt n} < \frac{1}{\sqrt n}\,\sigma_{00}^{1/2}\,\frac{\Phi^{-1}_{u_0}(\tau) - \Phi^{-1}_{u_0}(\rho)}{\beta(\rho) - \beta(\tau)}\right\}
\Rightarrow \int_0^1 1\{B_x(r) < 0\}\,dr,
$$

distribution is the arc sine law with density $\frac{1}{\pi x^{1/2}(1-x)^{1/2}}$ over $x \in [0, 1]$

Most likely many or few crossings for a given trajectory $\{x_t\}_{t=1}^n$.
Quantile prediction requires constant slope coefficient for validity!
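The arc sine limit is easy to see in a small simulation. The sketch below records, for each simulated Gaussian random walk, the fraction of time spent below zero; it uses a zero threshold in place of the $O(n^{-1/2})$ threshold in the theorem, and the function name and sample sizes are assumptions made here for illustration.

```python
import numpy as np

def negative_sojourn_fractions(n=500, paths=4000, seed=0):
    """Fraction of periods with x_t < 0 for each simulated random walk."""
    rng = np.random.default_rng(seed)
    x = np.cumsum(rng.standard_normal((paths, n)), axis=1)
    return (x < 0).mean(axis=1)

frac = negative_sojourn_fractions()
tails = np.mean((frac < 0.1) | (frac > 0.9))     # very few or very many crossings
middle = np.mean((frac > 0.45) & (frac < 0.55))  # crossings about half the time
print(frac.mean(), frac.var(), tails, middle)
```

The arc sine law has mean 1/2 and variance 1/8 with a U-shaped density, so a trajectory is much more likely to spend almost all or almost none of its time in the crossing region than about half of it.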

But ...


Possibilities: (c) Quantile Predictive Regression
QR Model Validity by Marginalizing the Alternative

Local to constant slope $\beta(\tau) = \beta + \frac{b(\tau)}{k_n}$ with $k_n \to \infty$

With localizing coefficient function $b(\tau) \in [b_L, b_U]$

Local QR model

$$
Q_{y_t}(\tau|\mathcal{F}_{t-1}) = \left(\beta + \frac{b(\tau)}{k_n}\right)x_{t-1} + \sigma_{00}^{1/2}\Phi^{-1}_{u_0}(\tau)
$$


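Possibilities: (c) Quantile Predictive Regression
QR Model Validity by Marginalizing the Alternative

Condition for no quantile crossing

$$
\sqrt n\,\frac{b(\rho) - b(\tau)}{k_n}\,\frac{x_{t-1}}{\sqrt n} + \sigma_{00}^{1/2}\left\{\Phi^{-1}_{u_0}(\rho) - \Phi^{-1}_{u_0}(\tau)\right\} > 0
$$

holds if $\frac{\sqrt n}{k_n} \to 0$ as $n \to \infty$.

Probability of quantile crossing tends to zero as $n \to \infty$.

$k_n \to \infty$ fast enough so quantiles do not cross $\left(\frac{\sqrt n}{k_n} \to 0\right)$
$k_n \to \infty$ not too fast $\left(\frac{k_n}{n} \to 0\right)$ so marginal deviations can be consistently estimated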


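Possibilities: (c) Quantile Predictive Regression
FM-QR Estimation in UR case (Xiao, 2009)

FM-QR limit theory for $x_t \equiv$ UR

$$
n\left(\hat\beta^+(\tau) - \beta(\tau)\right) \Rightarrow MN(0, V), \qquad
V = \frac{\sigma^2_{\psi.x}}{f(F^{-1}(\tau))^2}\left(\int_0^1 B_x^2\right)^{-1}
$$

Allowance for marginal alternatives $\beta(\tau) = \beta + \frac{b(\tau)}{k_n}$ with $k_n \to \infty$

$$ \frac{n}{k_n}\left(\hat b^+(\tau) - b(\tau)\right) \Rightarrow MN(0, V) $$

Extensions to IVX-QR for $x_t \equiv$ LUR, UR, MI, ME (Lee, 2013)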


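Pitfalls: (d) Misbalancing in General
Model Invalidity under the Alternative

Predictive model formulation:

$$ y_{t+1} = \beta x_t + u_{0,t+1}; \qquad x_{t+1} = \rho x_t + u_{x,t+1} $$

Equation unbalanced under $\beta_A \neq 0$ as $\beta_A x_{t-1} = O_p(\sqrt n)$ conflicts with $y_t = I(0)$

But if we reject the null and $\beta_A \neq 0$ and $y_t$ is I(0) we have

$$
\hat\beta = \frac{1}{n}\,\frac{\frac{1}{n}\sum_{t=1}^n x_{t-1}y_t}{\frac{1}{n^2}\sum_{t=1}^n x_{t-1}^2} = O_p\left(\frac{1}{n}\right) \to_p 0
$$

so $\hat\beta \to_p 0$ conflicts with the alternative hypothesis $\beta = \beta_A$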


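Possibilities: (d) Achieving Balance under Alternative
Resolving Balance by localization of coefficient

Local to zero predictive regression:

$$ y_{t+1} = \beta_n x_t + u_{0,t+1}, \qquad x_{t+1} = \rho_n x_t + u_{x,t+1} $$

Suppose $\beta_n = \frac{b}{n^\gamma}$ with $b \neq 0$, marginal deviation from null

Then

$$
y_t \sim
\begin{cases}
u_{0t} + b\,n^{\frac{1}{2}-\gamma}J_x\left(\frac{t}{n}\right) \neq I(0) & \gamma < 0.5 \\
u_{0t} + b\,J_x\left(\frac{t}{n}\right) = O_p(1) & \gamma = 0.5 \\
u_{0t} = O_p(1) & \gamma > 0.5
\end{cases}
$$

Test is consistent for γ ∈ [0.5, 1) and inconsistent for γ ≥ 1 since

$$ n\hat\beta_n = n(\hat\beta_n - \beta_n) + n\beta_n = O_p(1) + O_p\left(n^{1-\gamma}\right) $$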


Possibilities: (d) Achieving Balance under Alternative
Resolving Balance by localization of coefficient

Marginal deviations from null are detectable!!


Possibilities: (d) Achieving Balance under Alternative
Resolving Balance by Nonlinearity

Nonlinearity & local linearity

Linear specifications can be satisfactory but require coefficient localization or error localization for balance

localization is a form of nonlinearity where coefficients change with n

Direct nonlinear specifications also offer reasonable solutions

And may also benefit from coefficient localization


Motivation for nonlinear specifications
Resolving Balance by Nonlinearity

Model

$$ y_{t+1} = g(x_t) + u_{0,t+1}, \qquad x_{t+1} = \left(1 + \frac{c}{n}\right)x_t + u_{x,t+1}, $$

Park and Phillips (1998): $y_t$ can be I(0) for some g such as $g \in L_1$
attenuates signal
transforms I(1) process to a process with memory $d \sim \frac{1}{4}$



[Figure: integrable transform $g \in L_1$ of a random walk. Panels: RW vs g(RW) (red); g(RW) + white noise (green)]



Model

$$ y_{t+1} = g(x_t) + u_{0,t+1}, \qquad x_{t+1} = \left(1 + \frac{c}{n}\right)x_t + u_{x,t+1}, $$

Wang and Phillips (2009a,b; WP): kernel estimation gives

$$
(\sqrt n h)^{1/2}\,\hat g = (\sqrt n h)^{1/2}\,[\hat g - g] + (\sqrt n h)^{1/2}\,g = O_p\left((\sqrt n h)^{1/2}\right)
$$

divergent, thereby achieving (2)

Limit theory is pivotal and mixed normal, so (1) and (2) are covered; and effects are small (3) without coefficient localization for $g \in L_1$.


Motivation for nonlinear specifications
Resolving Balance by Nonlinearity with Localization

For more general g (e.g. asymptotically homogeneous) local to zero predictive power may be achieved with

$$ y_{t+1} = \beta_n g(x_t) + u_{0,t+1}, \qquad x_{t+1} = x_t + u_{x,t+1} $$

with some function $g(x_t) = g(x_t, \pi)$ known up to π

Consistent estimation possible for βn local to zero, e.g.

|βn | %bn1/4 , or

bn1/2κ (n1/2,πo )

Asymptotic theory in Shi and Phillips (2012) for weakly identified π

Solutions to (a), (b) and (c) can be achieved in this framework.


Pitfalls + Possibilities in Predictive Regression
Sum up

Pitfalls

bias in estimation, testing, CIs & nonstandard inference

uncertain degrees of persistence - can be fatal as in Stock & CY

need to cope with multiple regressors and marginal predictability

omitted predictors misspecification

issues of model validity under alternative

Possibilities

IVX & Nonparametrics

Application to long horizon prediction (Phillips & Lee, 2013)

Quantile Regression balancing & IVX (Phillips, 2013; Lee, 2013)


Thank You!


Robustness Issues

Triangular system with UR or LUR regressors:

yt = Axt + u0t

xt UR, or Local to Unity (LUR)

Invalid asymptotic inference when xt LUR

Estimators for A are not asymptotically MN(0, ·)
Wald test is not asymptotically $\chi^2$

FM and other correction methods do not work

Inference relies upon knowledge of type of regressor persistence


Feasible IVX Inference

Triangular system with unknown degree of persistence:

$$ y_t = Ax_t + u_{0t}, \qquad x_t = R_n x_{t-1} + u_{xt} $$

$$ R_n = I_K + \frac{C}{n^\alpha}, \qquad C \le 0, \quad \alpha > 0 $$

$x_t$ UR if C = 0 or α > 1
$x_t$ LUR if C < 0 & α = 1
$x_t$ MI if C < 0 & α ∈ (0, 1)


Feasible IVX Inference (instrument construction)

Choose

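$R_{nz} = I_K + C_z/n^\varphi$ for $\varphi \in (2/3, 1)$, $C_z < 0$

Instrument Construction: $\tilde z_t = \sum_{j=1}^{t} R_{nz}^{t-j}\,\Delta x_j$

Since $\Delta x_j = u_{xj} + \frac{C}{n^\alpha}x_{j-1}$

$$
\tilde z_t = \sum_{j=1}^{t} R_{nz}^{t-j}u_{xj} + \frac{C}{n^\alpha}\sum_{j=1}^{t} R_{nz}^{t-j}x_{j-1} = z_t + \text{error}
$$

$z_t$: $R_{nz}$-mildly integrated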

Two Cases: (i) α < ϕ, (ii) ϕ ≤ α ≤ 1



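IVX estimation

$$ \tilde A_n = \left(Y'\tilde Z - n\hat\Delta_{0x}\right)\left(X'\tilde Z\right)^{-1} $$

Theorem
Case (i): When $2/3 < \varphi < \min(\alpha, 1)$,

$$
n^{\frac{1+\varphi}{2}}\,\mathrm{vec}\left(\tilde A_n - A\right) \Rightarrow MN\left(0, \left(\Psi_{xx}^{-1}\right)' C_z V_{zz} C_z \Psi_{xx}^{-1} \otimes \Omega_{00}\right)
$$

as $n \to \infty$, where

$$
\Psi_{xx} =
\begin{cases}
\Omega_{xx} + \int_0^1 B_x\,dB_x' & x_t \text{ UR} \\
\Omega_{xx} + \int_0^1 J_C\,dJ_C' & x_t \text{ LUR} \\
\Omega_{xx} + V_{xx}C & x_t \text{ MI}
\end{cases}
$$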



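Let $V_{xz} = \int_0^\infty e^{rC}V_{xx}e^{rC_z}\,dr$

Theorem
Case (ii): For $\alpha \in (1/3, \varphi]$ and $\varphi \in (2/3, 1)$:

$$
n^{\frac{1+\alpha}{2}}\,\mathrm{vec}\left(\tilde A_n - A\right) \Rightarrow
\begin{cases}
N\left(0, V_{xx}^{-1} \otimes \Omega_{00}\right) & \text{if } \alpha < \varphi \\
N\left(0, V_{xz}^{-1}C^{-1}V_{xx}C^{-1}(V_{xz}')^{-1} \otimes \Omega_{00}\right) & \text{if } \alpha = \varphi
\end{cases}
$$

Asymptotic mixed normality applies for all

$$ 1/3 < \alpha \le \varphi \in (2/3, 1) $$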


Feasible IV Inference (testing)

Testing for linear restrictions:

H0 : Hvec (A) = h, rank (H) = r

Wald test based on IV procedure:

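$$
W_n = \left(H\,\mathrm{vec}\,\tilde A_n - h\right)'\left[H\left\{\left(X'P_Z X\right)^{-1} \otimes \hat\Omega_{00}\right\}H'\right]^{-1}\left(H\,\mathrm{vec}\,\tilde A_n - h\right)
$$

Theorem
If α > 1/3 and φ is chosen in (2/3, 1)

$$ W_n \Rightarrow_{H_0} \chi^2(r) $$

irrespective of whether $x_t$ is UR, or LUR or MI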


Feasible IVX Inference —Remarks

Standard $\chi^2$ inference applies without a priori knowledge of the order of persistence of $x_t$: a full solution to Elliott (1998)

Price paid for robustness: reduction in the rate of convergence (RoC) of $\tilde A_n$ to $O\left(n^{\frac{1+\varphi}{2}}\right)$ from $O(n)$.
Exception - no cost when: $x_t$ mildly integrated with α ≤ φ

IVX method is feasible: instruments constructed from the regressors without extra information.
Hence the requirement α > 1/3: intractable simultaneity
Mildly integrated processes provide good instruments because

$z_t$ stationary: $O_p(1)$; $z_t$ mild: $O_p\left(n^{\varphi/2}\right)$; $z_t$ LUR: $O_p\left(n^{1/2}\right)$
reduced RoC ⇒ eliminates endogeneity
increased RoC ⇒ tractable asymptotic bias


Nonparametric predictive regression
Nonparametric estimation

Model with single lagged regressor

$$ y_t = g(x_{t-\ell}) + u_{0t}, \qquad x_t = \left(1 + \frac{c}{n}\right)x_{t-1} + u_{xt} $$

unknown g, mds $u_{0t}$, and SM, LM, AP $u_{xt}$
Nonparametric NW estimation

$$
\hat g(x) = \frac{\sum_{t=\ell+1}^n K_h(x_{t-\ell} - x)\,y_t}{\sum_{t=\ell+1}^n K_h(x_{t-\ell} - x)}
$$

Limit theory (Wang & Phillips, 2009)

$$
\left(\sum_{t=1+\ell}^{n} K\left(\frac{x_{t-\ell} - x}{h_n}\right)\right)^{1/2}\left(\hat g(x) - g(x)\right) \to_d N\left(0, \sigma_0^2\int_{-\infty}^{\infty} K(x)^2\,dx\right)
$$

Includes SM, LM, AP $u_{xt}$ with $|d| < \frac{1}{2}$ (Kasparis, Andreou & Phillips, 2013)
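The NW estimator at a single point is a kernel-weighted average of the responses. The sketch below assumes a Gaussian kernel and a user-chosen bandwidth; the function name `nw_estimate` and its argument names are introduced here for illustration, and no safeguard is included for points so far from the data that all weights underflow to zero.

```python
import numpy as np

def nw_estimate(x_lag, y, x0, h):
    """Nadaraya-Watson estimate of g(x0).

    x_lag: lagged regressor values x_{t-l}; y: matching responses y_t;
    h: bandwidth. Gaussian kernel weights (normalising constants cancel
    in the ratio).
    """
    w = np.exp(-0.5 * ((x_lag - x0) / h) ** 2)
    return np.sum(w * y) / np.sum(w)

# Usage on a simulated random-walk regressor with g(x) = exp(-5 x^2)
rng = np.random.default_rng(0)
n = 500
x = np.cumsum(rng.standard_normal(n + 1))
y = np.exp(-5.0 * x[:-1] ** 2) + rng.standard_normal(n)
print(nw_estimate(x[:-1], y, x0=0.0, h=n ** -0.2))
```

Only observations with the regressor near `x0` carry weight, so the effective sample size is governed by how often the persistent regressor revisits that neighbourhood.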

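Nonparametric predictive regression
Nonlinear model testing

Test constant regression function $H_0: g(x) = \mu$ using a t-ratio

$$
\hat t(x, \hat\mu) := \left\{\frac{\sum_{t=1+\ell}^n K\left(\frac{x_{t-\ell} - x}{h_n}\right)}{\hat\sigma_0^2\int_{-\infty}^{\infty} K(\lambda)^2\,d\lambda}\right\}^{1/2}\left(\hat g(x) - \hat\mu\right) \to_d N(0, 1)
$$

with $\hat\mu = \sum_{t=1+\ell}^n y_t/n$ and $\hat\sigma_0^2 = n^{-1}\sum_{t=1+\ell}^n (y_t - \hat\mu)^2$ under the null

Test functionals over isolated points $X_s = \{x_1, ..., x_s\}$ for some fixed s

$$
F_{sum} := \sum_{x \in X_s}\left[\hat t(x, \hat\mu)\right]^2 \quad \text{and} \quad F_{max} := \max_{x \in X_s}\left[\hat t(x, \hat\mu)\right]^2.
$$

Limit theory under $H_0$ as $n \to \infty$

$$ F_{sum} \to_d \chi^2_s \quad \text{and} \quad F_{max} \to_d Y \sim_d \max_{i \le s}\left(\chi^2_{1i}\right) $$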
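The two test functionals can be computed from pointwise t-ratios on a grid. The sketch below assumes the standard Gaussian density as kernel, for which $\int K(\lambda)^2 d\lambda = 1/(2\sqrt\pi)$, and uses the plain sample mean and variance of the supplied responses; `np_test_stats` and the grid are hypothetical choices, not the authors' implementation.

```python
import numpy as np

def np_test_stats(x_lag, y, grid, h):
    """Return (F_sum, F_max) from squared kernel t-ratios over grid points."""
    def kernel(u):
        # standard normal density
        return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

    int_k2 = 1.0 / (2.0 * np.sqrt(np.pi))   # integral of K^2 for this kernel
    mu_hat = y.mean()
    s2_hat = np.mean((y - mu_hat) ** 2)

    t_sq = []
    for x0 in grid:
        w = kernel((x_lag - x0) / h)
        g_hat = np.sum(w * y) / np.sum(w)
        t_ratio = np.sqrt(np.sum(w) / (s2_hat * int_k2)) * (g_hat - mu_hat)
        t_sq.append(t_ratio**2)
    t_sq = np.array(t_sq)
    return t_sq.sum(), t_sq.max()

# Usage under the null g(x) = 0 with a random-walk regressor
rng = np.random.default_rng(0)
n = 500
x = np.cumsum(rng.standard_normal(n + 1))
y = rng.standard_normal(n)
grid = np.quantile(x[:-1], [0.25, 0.5, 0.75])
print(np_test_stats(x[:-1], y, grid, h=n ** -0.2))
```

Under the null, `F_sum` would be compared with a $\chi^2_s$ critical value where s is the number of grid points.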


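Nonparametric predictive regression
Simulations - Model design

Model

$$ y_t = g(x_{t-1}) + u_{0t}, \qquad x_t = \left(1 + \frac{c}{n}\right)x_{t-1} + u_{xt}, \qquad x_0 = 0 $$

Error mechanisms

$$ u_{xt} = \rho_x u_{x,t-1} + \eta_t, \qquad \rho_x = 0, 0.3 \quad \text{(SM)} $$

$$ (I - L)^d u_{xt} = \eta_t, \qquad d = -0.25, 0.25 \quad \text{(AP \& LM)} $$

$$ \begin{bmatrix} u_{0t} \\ \eta_t \end{bmatrix} \sim \text{iid } N\left(0, \begin{bmatrix} 1 & r \\ r & 1 \end{bmatrix}\right), \qquad |r| < 1. $$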


Nonparametric predictive regression
Simulations - Model design

Functional forms

$g_0(x) = 0$ (null hypothesis)
$g_1(x) = 0.015x$ (linear)
$g_2(x) = \frac{1}{4}\,\mathrm{sign}(x)\,|x|^{1/4}$ (polynomial)
$g_3(x) = \frac{1}{5}\ln(|x| + 0.1)$ (logarithmic)
$g_4(x) = (1 + e^{-x})^{-1}$ (logistic)
$g_5(x) = (1 + |x|^{0.9})^{-1}$ (reciprocal)
$g_6(x) = e^{-5x^2}$ (integrable)


Nonparametric predictive regression
Simulation Results - Size (Nominal 5%), Tests - NP, FM, JM

$n = 500$, $h = n^{-b}$, $x_t \sim$ local to unity process (c)

[Figure 1: rejection rates plotted against R. Panels: Fig. 1(a) c = 0; Fig. 1(b) c = −50. Curves: NPP b = 0.1, NPP b = 0.2, FMLS, J&M]


Nonparametric predictive regression
Simulation Results - Size (Nominal 5%), Tests - NP, FM, JM

$n = 500$, $h = n^{-b}$, $\Delta x_t \sim$ ARFIMA(d)

[Figure 2: rejection rates plotted against R. Panels: Fig. 2(a) d = −0.25; Fig. 2(b) d = 0.25]



Nonparametric predictive regression
Simulation Results - Power, Tests - NP, FM, JM

$n = 1000$, $h = n^{-b}$, $g(x) = 0.015x$, $\rho_x = 0.3$

[Figure 3: rejection rates plotted against R. Panels: Fig. 3(a) c = 0; Fig. 3(b) c = −50]



Nonparametric predictive regression
Simulation Results - Power, Tests - NP, FM, JM

$n = 1000$, $h = n^{-b}$, $c = 0$, $\rho_x = 0.3$

[Figure 4: rejection rates plotted against R. Panels: Fig. 4(a) $g(x) = (1 + |x|^{0.9})^{-1}$; Fig. 4(b) $g(x) = e^{-x^2}$]



Nonparametric predictive regression
Empirics

Predictability Test Results for SP500 Returns 1926:1 - 2010:12 (n = 1009)

Two common predictors, various grids and bandwidths $h_n = \sigma_x n^{-b}$

Tests   Grid pts   Lag   b: Dividend Price ratio, Log(D/P)   b: Earnings Price ratio, Log(E/P)
Sum     10         1     -                                    0.1
                   4     0.3, 0.4                             0.1
Sum     35         1     -                                    0.1
                   4     0.2, 0.3                             0.1, 0.2
Sum     50         1     0.2, 0.3, 0.4                        0.1, 0.2
                   4     -                                    0.1, 0.2, 0.3, 0.4


Pitfalls + Possibilities in Predictive Regression

Pitfalls

bias & nonstandard inference

uncertain degrees of persistence - can be fatal

zero coverage probability in CIs for stationarity/MI regressors

need to cope with multiple regressors and marginal predictability

omitted predictors misspecification - need to allow for temporal dependence

model validity under alternative



Possibilities - IVX

easy to use, no simulations/integration, standard chi-square inference

holds for local to unity, mildly integrated, mildly explosive, and stationary regressors

handles multiple regressors - many fundamentals, varying degrees of persistence

handles multiple dependent variables, e.g., size sorted, value sorted, momentum sorted returns can all be regressed simultaneously



Possibilities - Nonparametrics

easy to use

copes with predictors of general functional form

assures model validity under alternative

particularly wide applicability regarding regressor persistence

good size and power properties against parametric methods

but challenged by multiple regressors


Motivation for dependent innovations
Scalar regressor misspecification with mds innovations

Many papers impose mds innovations in predictive regression

$$ y_{t+1} = \beta_n x_t + u_{0,t+1}, \qquad E_{\mathcal{F}_t}(u_{0,t+1}) = 0 \qquad (7) $$

Bivariate regression with mds innovations is unconvincing when

Two or more fundamentals $x_{1t}$ and $x_{2t}$ are known to have predictive power
As many authors have found (e.g. Campbell & Yogo, 2006)

Omitted variable misspecification produces contradiction to (7)

$$ y_{t+1} = \beta_n x_{1t} + u_{0,t+1}, \qquad E_{\mathcal{F}_t}(u_{0,t+1}) \neq 0 \text{ since } x_{2t} \in \mathcal{F}_t. $$


Predictive Regression - IVX Limit Theory
The Model

Multivariate system of predictive regressions:

$$ y_{t+1} = Ax_t + u_{0,t+1}, \qquad x_{t+1} = R_n x_t + u_{x,t+1} $$

$$ R_n = I_p + \frac{C}{n^\alpha}, \quad \text{for some } \alpha \in (0, 1] $$

A: m × p coefficient matrix and $C = \mathrm{diag}(c_1, c_2, ..., c_p)$ are unknown.
Each $x_{it}$ can be in the general vicinity of unity

mildly integrated (C < 0, α ∈ (0, 1)),
nearly integrated (C < 0, α = 1),
integrated (C = 0),
locally explosive (C > 0, α = 1),
mildly explosive (C > 0, α ∈ (0, 1)).


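Predictive Regression - IVX Limit Theory
Innovation structure

Weakly dependent errors:

$$
u_t = \begin{bmatrix} u_{0t} \\ u_{xt} \end{bmatrix} = \sum_{j=0}^{\infty} F_j\varepsilon_{t-j}, \qquad \varepsilon_t \sim \text{iid}(0, \Sigma),
$$

$$
\Sigma > 0, \quad E\|\varepsilon_1\|^4 < \infty, \quad F_0 = I_{m+p}, \quad \sum_{j=0}^{\infty} j\,\|F_j\| < \infty,
$$

$$
F(z) = \sum_{j=0}^{\infty} F_j z^j \quad \text{and} \quad F(1) = \sum_{j=0}^{\infty} F_j > 0.
$$

$$
\Omega = \sum_{h=-\infty}^{\infty} E\left(u_t u_{t-h}'\right) = F(1)\Sigma F(1)', \qquad
\Omega = \begin{bmatrix} \Omega_{00} & \Omega_{0x} \\ \Omega_{x0} & \Omega_{xx} \end{bmatrix}.
$$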


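Predictive Regression - IVX Limit Theory
Preliminary tools

FCLT:

$$
\frac{1}{\sqrt n}\sum_{j=1}^{\lfloor ns \rfloor} u_j = \frac{1}{\sqrt n}\sum_{j=1}^{\lfloor ns \rfloor}\begin{bmatrix} u_{0j} \\ u_{xj} \end{bmatrix}
= \begin{bmatrix} B_{0n}(s) \\ B_{xn}(s) \end{bmatrix}
\Longrightarrow \begin{bmatrix} B_0(s) \\ B_x(s) \end{bmatrix}
= BM\begin{bmatrix} \Omega_{00} & \Omega_{0x} \\ \Omega_{x0} & \Omega_{xx} \end{bmatrix}.
$$

Local to unity asymptotics: $\frac{x_t}{\sqrt n} \Longrightarrow J^x_c(r)$ with $\lfloor nr \rfloor = t$, where

$$ J^x_c(r) = \int_0^r e^{(r-s)C}\,dB_x(s), $$

encompassing the unit root case

$$ J^x_c(r) = B_x(r) \text{ if } C = 0. $$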


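Predictive Regression - IVX Limit Theory
Preliminary tools

BN decomposition (Phillips and Solo, 1992):

$$
u_t = F(1)\varepsilon_t - \Delta\tilde\varepsilon_t = \begin{bmatrix} F_0(1) \\ F_x(1) \end{bmatrix}\varepsilon_t - \Delta\tilde\varepsilon_t
$$

$$
\tilde\varepsilon_t = \sum_{j=0}^{\infty} \tilde F_j\varepsilon_{t-j}, \qquad \tilde F_j = \sum_{s=j+1}^{\infty} F_s
$$

in partitioned form

$$ u_{0t} = F_0(1)\varepsilon_t - \Delta\tilde\varepsilon_{0t} $$

$$ u_{xt} = F_x(1)\varepsilon_t - \Delta\tilde\varepsilon_{xt} $$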


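IVX Estimation
IVX estimator - local to unity case

Consider first α = 1 (local to unity $x_t$). We impose, for technical reasons, the rate condition:

$$ \frac{n^{1/2}}{n^\varphi} + \frac{n^\varphi}{k} + \frac{k}{n} \to 0. $$

Reasonable since, e.g., $k = n^{0.6}$ corresponds to 4-5 years among 60-80 years for monthly data.

IVX Estimator:

$$
\hat B_{IVX} - B = \left(\sum_{t=1}^{n-k} u_{0,t+k}\left(\tilde z^k_t\right)'\right)\left(\sum_{t=1}^{n-k} x^k_t\left(\tilde z^k_t\right)'\right)^{-1}.
$$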


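Limit Theory
Numerator

Numerator decomposition

$$
\begin{aligned}
\frac{1}{k^{1/2}n^{\frac{1}{2}+\varphi}}\sum_{t=1}^{n-k} u_{0,t+k}\left(\tilde z^k_t\right)'
&= \frac{1}{k^{1/2}n^{\frac{1}{2}+\varphi}}\sum_{t=1}^{n-k} u_{0,t+k}\left(z^k_t\right)'
 + \frac{1}{k^{1/2}n^{\frac{3}{2}+\varphi}}\sum_{t=1}^{n-k} u_{0,t+k}\left(\psi^k_{nt}\right)'C \\
&= \frac{1}{k^{1/2}n^{\frac{1}{2}+\varphi}}\sum_{t=1}^{n-k} u_{0,t+k}\left(z^k_t\right)' + o_p(1)
\end{aligned}
$$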



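Component limits

Lemma
1. $\frac{C_z}{k^{1/2}n^\varphi}\,z^k_t \Longrightarrow V_x(t) \equiv N(0, \Omega_{xx})$ for any t
2. $\frac{1}{n}\sum_{t=1}^{n-k}\left(\frac{C_z}{k^{1/2}n^\varphi}z^k_t\right)\left(\frac{C_z}{k^{1/2}n^\varphi}z^k_t\right)' \to_p \Omega_{xx} = F_x(1)\Sigma F_x(1)'$
3. $\frac{x^k_{t=\lfloor nr \rfloor}}{k\,n^{1/2}} \Rightarrow J^x_c(r)$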



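Use BN decomposition and

Apply MGCLT to the dominant component of the numerator

Lemma
For $\frac{n^\varphi}{k} + \frac{k}{n} \to 0$,

$$
\frac{1}{k^{1/2}n^{\frac{1}{2}+\varphi}}\sum_{t=1}^{n-k} u_{0,t+k}\left(z^k_t\right)' \Longrightarrow N\left(0, C_z^{-1}\Omega_{xx}C_z^{-1} \otimes \Omega_{00}\right)
$$

Since

$$
\frac{1}{k^{1/2}n^{\frac{1}{2}+\varphi}}\sum_{t=1}^{n-k} u_{0,t+k}\left(z^k_t\right)' \sim \frac{1}{\sqrt n}F_0(1)\sum_{t=1}^{n-k}\varepsilon_{t+k}\left(\frac{z^k_t}{k^{1/2}n^\varphi}\right)'
$$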


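Limit Theory
Denominator

Denominator decomposition

$$
\frac{1}{k^2n^{1+\varphi}}\sum_{t=1}^{n-k} x^k_t\left(\tilde z^k_t\right)'
= \frac{1}{n}\sum_{t=1}^{n-k}\frac{x^k_t}{k\,n^{1/2}}\left(\frac{z^k_t}{k\,n^\varphi}\right)'
+ \frac{1}{n}\sum_{t=1}^{n-k}\frac{x^k_t}{k\,n^{1/2}}\left(\frac{C_z}{k\,n^{\frac{1}{2}+\varphi}}\psi^k_{nt}\right)'C_z^{-1}C
$$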

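Limit Theory
Denominator

Component limits

Lemma
1. $\frac{1}{k^2n^{2+\varphi}}\sum_{t=1}^{n-k} x^k_t\left(\psi^k_{nt}\right)'C \Longrightarrow \left(\int_0^1 J^x_c(r)J^x_c(r)'\,dr\right)C_z^{-1}C$
2. $\frac{1}{k^2n^{1+\varphi}}\sum_{t=1}^{n-k} x^k_t\left(z^k_t\right)'C_z \to_p \frac{1}{2}\Omega_{xx}$

Denominator limit

$$
\frac{1}{k^2n^{1+\varphi}}\sum_{t=1}^{n-k} x^k_t\left(\tilde z^k_t\right)' \Longrightarrow \frac{1}{2}\Omega_{xx}C_z^{-1} + \left(\int_0^1 J^x_c(r)J^x_c(r)'\,dr\right)C_z^{-1}C
$$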


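Limit Theory
IVX Estimator

Combining gives the limit theory for

$$
\hat B_{IVX} - B = \left(\sum_{t=1}^{n-k} u_{0,t+k}\left(\tilde z^k_t\right)'\right)\left(\sum_{t=1}^{n-k} x^k_t\left(\tilde z^k_t\right)'\right)^{-1}.
$$

Theorem

$$ n^{1/2}k^{3/2}\left(\hat B_{IVX} - B\right) \Longrightarrow MN(0, \Sigma_B), $$

where

$$
\Sigma_B = \left(\Psi_{cxz}^{-1}\right)'C_z^{-1}\Omega_{xx}C_z^{-1}\left(\Psi_{cxz}^{-1}\right) \otimes \Omega_{00},
$$

$$
\Psi_{cxz} = \frac{1}{2}\Omega_{xx}C_z^{-1} + \left(\int_0^1 J^x_c(r)J^x_c(r)'\,dr\right)C_z^{-1}C.
$$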


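Limit Theory
IVX Estimator

Mixed normality holds because

$$
\zeta_{n,t+k} := \begin{bmatrix} \frac{1}{k^{1/2}n^{\frac{1}{2}+\varphi}}\,z^k_t \otimes \varepsilon_{t+k} \\ \frac{1}{\sqrt n}\,\varepsilon_{t+k} \end{bmatrix}
$$

has asymptotically orthogonal components:

$$
\sum E_{\mathcal{F}_{n,t+k-1}}\zeta_{n,t+k}\zeta_{n,t+k}'
= \begin{bmatrix}
\frac{1}{n}\sum_{t=1}^{n-k}\left(\frac{1}{k^{1/2}n^\varphi}z^k_t\right)\left(\frac{1}{k^{1/2}n^\varphi}z^k_t\right)' \otimes \Sigma
 & \frac{1}{n}\sum_{t=1}^{n-k}\left(\frac{1}{k^{1/2}n^\varphi}z^k_t\right) \otimes \Sigma \\
\frac{1}{n}\sum_{t=1}^{n-k}\left(\frac{1}{k^{1/2}n^\varphi}z^k_t\right)' \otimes \Sigma & \Sigma
\end{bmatrix}
\to_p \begin{bmatrix} C_z^{-1}\Omega_{xx}C_z^{-1} \otimes \Sigma & 0 \\ 0 & \Sigma \end{bmatrix}.
$$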


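Limit Theory of Tests
Self-normalized IVX Estimator for use in Testing

Lemma
1. $\frac{1}{k\,n^{1+2\varphi}}\sum_{t=1}^{n-k}\left(\tilde z^k_t\right)\left(\tilde z^k_t\right)' = \frac{1}{k\,n^{1+2\varphi}}\sum_{t=1}^{n-k}\left(z^k_t\right)\left(z^k_t\right)' + o_p(1)$.
2. $n k^3\left(X'P_Z X\right)^{-1} \otimes \hat\Omega_{00} \Longrightarrow \left(\Psi_{cxz}^{-1}\right)C_z^{-1}\left(\Omega_{xx}\right)C_z^{-1}\left(\Psi_{cxz}^{-1}\right)' \otimes \Omega_{00}$

where

$$
\left(X'P_Z X\right)^{-1} = \left[\left(\sum_{t=1}^{n-k} x^k_t\left(\tilde z^k_t\right)'\right)\left(\sum_{t=1}^{n-k}\left(\tilde z^k_t\right)\left(\tilde z^k_t\right)'\right)^{-1}\left(\sum_{t=1}^{n-k} x^k_t\left(\tilde z^k_t\right)'\right)'\right]^{-1}.
$$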


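Limit Theory
Self-normalized IVX Estimator for use in Testing

Theorem

$$
\mathrm{Vec}\left(\hat B_{IVX} - B\right)'\left[\left(X'P_Z X\right)^{-1} \otimes \hat\Omega_{00}\right]^{-1}\mathrm{Vec}\left(\hat B_{IVX} - B\right) \Longrightarrow \chi^2(mp),
$$

using a consistent estimator $\hat\Omega_{00}$ for $\Omega_{00}$.

Standard Wald statistic $W_{IVX}$ - easy to compute

Standard chi-squared test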

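Model Validity & Consistency under Alternative
Balance by localization of coefficient

Local to zero predictive regression:

$$ y_{t+k} = B_n x^k_t + u_{0,t+k}, \qquad x^k_t = \sum_{j=1}^{k} x_{t+j-1} $$

Suppose $B_n = \frac{B}{n^\gamma k}$ with $B \neq 0$. Now

$$ \frac{x^k_{t=\lfloor nr \rfloor}}{k\,n^{1/2}} \Rightarrow J^x_c(r) $$

Then

$$
y_{t+k} \sim
\begin{cases}
u_{0,t+k} + B\,n^{\frac{1}{2}-\gamma}J^x_c\left(\frac{t}{n}\right) \neq I(0) & \gamma < 0.5 \\
u_{0,t+k} + B\,J^x_c\left(\frac{t}{n}\right) = O_p(1) & \gamma = 0.5 \\
u_{0,t+k} = O_p(1) & \gamma > 0.5
\end{cases}
$$

Balance for γ ≥ 0.5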


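Model Validity & Consistency under Alternative
Balance by localization of coefficient

Test consistency

$$
n^{1/2}k^{3/2}\hat B_{IVX} = n^{1/2}k^{3/2}\left(\hat B_{IVX} - B_n\right) + n^{1/2}k^{3/2}B_n = O_p(1) + O_p\left(\frac{k^{1/2}}{n^{\gamma-\frac{1}{2}}}\right)
$$

Test is consistent whenever $\frac{n^{1/2}k^{3/2}}{n^\gamma k} = \frac{k^{1/2}}{n^{\gamma-\frac{1}{2}}} \to \infty$

Convergence rate of $\hat B_{IVX}$ is fast enough to distinguish $B_n \neq 0$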


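Long-Horizon IVX
IVX estimator - mildly integrated case

Consider first α < 1, C < 0 (mildly integrated $x_t$). We use the rate condition:

$$ \frac{n^{1/2}}{n^\varphi} + \frac{n^\varphi}{k} + \frac{k}{n^\alpha} + \frac{n^\alpha}{n} \to 0. $$

Set horizon $k \ll n^\alpha \ll n$

IVX limit theory:

Theorem

$$ n^{1/2}k^{3/2}\left(\hat B_{IVX} - B\right) \Longrightarrow N\left(0, 4\Omega_{xx}^{-1} \otimes \Omega_{00}\right) $$

$$ W_{IVX} \Longrightarrow \chi^2(mp) $$

Same convergence rate as for α = 1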


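Long Horizon IVX
IVX estimator - mildly explosive case

Consider first α < 1, C > 0 (mildly explosive $x_t$ with $R_n = I + n^{-\alpha}C$). We use the same rate condition:

$$ \frac{n^{1/2}}{n^\varphi} + \frac{n^\varphi}{k} + \frac{k}{n^\alpha} + \frac{n^\alpha}{n} \to 0. $$

IVX limit theory

Theorem

$$
k\,n^\alpha\left(\hat B_{IVX} - B\right)R_n^n \Longrightarrow MN\left(0, \int_0^\infty e^{-pC}Y_C Y_C' e^{-pC}\,dp \otimes \Omega_{00}\right)
$$

$$ W_{IVX} \Longrightarrow \chi^2(mp) $$

Faster convergence rate than α = 1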


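Localizing Coefficient Estimation
Univariate Model - illustration

Localizing coefficient C and index α in

$$ x_{t+1} = R_n x_t + u_{x,t+1} $$

$$ R_n = 1 + \frac{c}{n^\alpha}, \quad \text{for some } \alpha \in (0, 1] $$

α is consistently estimable

Estimators: $\hat\alpha = -\frac{\log|\hat R_n - 1|}{\log n} \to_p \alpha$; $\tilde\alpha = \frac{\log\left(\frac{2}{n}\sum_{t=1}^n x_t^2\right)}{\log n} \to_p \alpha$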

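Limit distribution

$$
n^{\frac{1-\alpha}{2}}\log n\left\{\hat\alpha - \alpha + \frac{1}{\log n}\log\left|c\,\frac{\sigma_x^2}{\omega_x^2}\right|\right\} \Rightarrow \frac{\omega_x^2}{c\,\sigma_x^2}\,\xi_c, \qquad \alpha \in (0, 1)
$$

$$
(\log n)\{\hat\alpha - 1\} \Rightarrow -\log|c + \xi_c|, \qquad \alpha = 1
$$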


