Identi cation, estimation and applications of a...

Identification, estimation and applications of a bivariate

long-range dependent time series model with general phase ∗†‡

Stefanos KechagiasSAS Institute

Vladas PipirasUniversity of North Carolina

March 27, 2018

Abstract

A new bivariate, two-sided, fractionally integrated time series model that allows for generalphase is proposed. In particular, the focus is on a suitable parametrization under which themodel is identifiable. A simulation study to test the performances of a conditional maximumlikelihood estimation method and of a forecasting approach is carried out, under the proposedmodel. Finally, an application is presented to the U.S. inflation rates in goods and serviceswhere models not allowing for general phase suffer from misspecification.

1 Introduction

In this work, we are interested in modeling bivariate (R2–vector) time series exhibiting long-rangedependence (LRD, in short). In the univariate case, long-range dependent (LRD) time seriesmodels are stationary with the autocovariance function decaying slowly like a power-law functionat large lags, or the spectral density diverging like a power-law function at the zero frequency. Theunivariate LRD is understood well in theory and used widely in applications. See, for example,Park and Willinger (2000), Robinson (2003), Doukhan et al. (2003), Palma (2007), Giraitis et al.(2012), Beran et al. (2013), Pipiras and Taqqu (2017).

Bivariate and, more generally, multivariate (vector-valued) LRD time series models have alsobeen considered by a number of researchers. But theoretical foundations for a general class ofsuch models were laid only recently in Kechagias and Pipiras (2015). In particular, Kechagias andPipiras (2015) stressed the importance of the so-called phase parameters. Turning to the bivariatecase which is the focus of this work, the phase φ appears in the cross-spectrum of a bivariate LRDseries around the zero frequency and controls the (a)symmetry of the series at large time lags.There are currently no parametric models of bivariate LRD with general phase that can be used inestimation and applications. The goal of this work is to introduce such a class of models, and toexamine it through a simulation study and an application to real data. In the rest of this section,we describe our contributions in greater detail.

A common parametric VARFIMA(0, D, 0) (Vector Autoregressive Fractionally Integrated Mov-ing Average) model for a bivariate LRD time series {Xn}n∈Z={(X1,n, X2,n)′}n∈Z is obtained as a

∗AMS subject classification. Primary: 62M10, 62M15. Secondary: 60G22, 42A16.†Keywords and phrases: long-range dependence, bivariate time series, phase parameter, estimation, VARFIMA.‡The second author was supported in part by NSA grant H98230-13-1-0220 and NSF grant DMS-1712966. The

authors would also like to thank Richard Davis (Columbia University) for his comments on an earlier version of thispaper.

natural extension of the univariate ARFIMA(0, d, 0) model by fractionally integrating the compo-nent series of a bivariate white noise series, namely,

Xn =

(X1,n

X2,n

)=

((I −B)−d1 0

0 (I −B)−d2

)(η1,n

η2,n

)= (I −B)−Dηn, (1.1)

where B is the backshift operator, I = B0 is the identity operator, d1, d2 ∈ (0, 1/2) are the LRDparameters of the component series {X1,n}n∈Z and {X2,n}n∈Z, respectively, D = diag(d1, d2) and{ηn}n∈Z= {(η1,n, η2,n)′}n∈Z is a bivariate white noise series with zero mean Eηn = 0 and covarianceEηnη′n = Σ. If Σ = QQ′, note that the model (1.1) can also be written as

Xn = (I −B)−DQεn or (I −B)DXn = ηn = Qεn, (1.2)

where {εn}n∈Z is a bivariate white noise with the identity covariance matrix Eεnε′n = I2. Through-out the paper, the prime indicates the transpose.

The model (1.1) admits a one-sided linear representation of the form

Xn =∑k∈I

Ψkηn−k =∑k∈I

Ψkεn−k, (1.3)

where I = {k ∈ Z : k ≥ 0} and Ψk, Ψk are real-valued 2 × 2 matrices whose entries decay as apower law as k →∞. In the frequency domain, the matrix-valued spectral density function1 f(λ)of the series Xn defined in (1.1) satisfies

f(λ) ∼(ω11|λ|−2d1 ω12|λ|−(d1+d2)e−isign(λ)φ

ω21|λ|−(d1+d2)eisign(λ)φ ω22|λ|−2d2

), as λ→ 0, (1.4)

where ∼ indicates the asymptotic equivalence, ω11, ω12, ω21, ω22 ∈ R and

φ = (d1 − d2)π/2. (1.5)

The asymptotic behavior (1.4) of the spectral density f with general φ ∈ (−π/2, π/2) is takenfor the definition of bivariate LRD in Kechagias and Pipiras (2015). Note that φ ∈ (−π/2, π/2) istaken and the following polar coordinate representation

z =z1

cos(φ)eiφ with φ = arctan

(z2

z1

)(1.6)

of z = z1 + iz2 ∈ C is used throughout. The special form of the phase parameter φ in (1.5) limitsthe type of bivariate LRD behavior that can be captured by the model (1.1). For example, in thecase of time-reversible models satisfying γ(n) = γ(−n), n ∈ Z, the spectral density matrix f hasreal-valued entries and hence φ = 0. Under the model (1.1) and (1.4), however, φ = 0 holds onlywhen d1 = d2. These observations naturally raise the following question:

Can one define a bivariate parametric LRD model with general phase?

1The following convention is used here. The autocovariance function γ is defined as γ(n) = EXnX ′0 and thespectral density f(λ) satisfies γ(n) =

∫ π−π e

inλf(λ)dλ. The convention is different from Kechagias and Pipiras (2015),

where EX0X′n is used as the autocovariance function, but is the same as in Brockwell and Davis (2009), Pipiras and

Taqqu (2017). See also Remark 2.2 below.

2

One solution to the question above is to consider two-sided linear representations with power-law decaying coefficients, that is, representations of the form (1.3) with the index set I now be-ing the set of all integers Z. Specifically, Kechagias and Pipiras (2015) constructed a two-sidedVARFIMA(0, D, 0) model with general phase by taking

Xn = (I −B)−DQ+εn + (I −B−1)−DQ−εn, (1.7)

where Q+, Q− are two real-valued 2× 2 matrices. The reason we refer to (1.7) as two-sided is thepresence of B−1 in the second term of the right-hand side of (1.7), which translates into having theleads of the innovation process εn. Also, the positive and negative powers of the backshift operatormotivate our notation for the subscripts of the matrices Q+ and Q−.

We shall use (1.7) in developing our parametric bivariate LRD model with general phase. Afirst issue that needs to be addressed is finding a suitable parametrization under which this model isidentifiable, while still yielding a general phase parameter. We show in Section 2 that this two-foldgoal can be achieved by taking Q− as

Q− =

(c 00 −c

)Q+ =: CQ+, (1.8)

for some real constant c. Under the relation (1.8) and letting {Zn}n∈Z be a zero mean bivariatewhite noise series with covariance matrix EZnZ

′n = Q+Q

′+ =: Σ, we can rewrite (1.7) in the more

succinct formXn = ∆c(B)−1Zn, (1.9)

where the operator ∆c(B)−1 is defined as

∆c(B)−1 = (I −B)−D + (I −B−1)−DC. (1.10)

Note that when c = 0 (and C = 0), the filter ∆c(B)−1 becomes the one-sided fractional integrationfilter ∆0(B)−1 = (I −B)−D.

The focus of Section 3 will be on extensions of the model (1.9)–(1.10) involving autoregressiveand moving average parts, namely, a general phase VARFIMA(p,D, q) model

Φ(B)Xn = ∆c(B)−1Θ(B)Zn, (1.11)

where Φ(B), Θ(B) are matrix polynomials of finite orders p and q satisfying the usual stationarityand invertibility conditions. In fact, for identifiability and estimation purposes, we shall work withdiagonal AR filters Φ(B) in which case the general phase VARFIMA(p,D, q) model (1.11) can alsobe expressed as

Φ(B)∆c(B)Xn = Θ(B)Zn, (1.12)

since the diagonal filters Φ(B),∆c(B) commute. (In fact, the model (1.12) is usually referred to asFIVARMA – see Section 3.2 below.) The advantage of the model (1.11), as we show, is that theautocovariance function of the right-hand side of (1.11) can be computed explicitly. In estimation,we can then employ a conditional likelihood approach where the Gaussian likelihood is written forΦ(B)Xn (though the maximum is still sought over all unknown parameters).

We should also emphasize that the analysis of this work is limited to the second-order propertiesof time series (that is, autocovariance, cross-correlation, cross-spectrum, and so on). Thus, althoughthe models (1.7) and (1.11)–(1.12) are expressed through two-sided and hence non-causal linearrepresentations, their noncausal nature is irrelevant to the extent that these models are used only

3

as suitable parameterizations of bivariate long-range dependent models allowing for general phasethrough their second-order properties.

Our estimation procedure follows the approach of Tsay (2010) who considered one-sided models(1.12) with c = 0. Still in the case c = 0, Sowell (1986) calculated (numerically) the autocovariancefunction of the model (1.11) and performed exact likelihood estimation. For other approaches (allin the case c = 0), see also Ravishanker and Ray (1997) who considered the Bayesian analysis andPai and Ravishanker (2009a, 2009b) who employed the EM and PCG algorithms, as well as Duekerand Starz (1998), Martin and Wilkins (1999), Sela and Hurvich (2009) and Diongue (2010).

The rest of the paper is structured as follows. General phase VARFIMA(0, D, 0) andVARFIMA(p,D, q) series are presented in Sections 2 and 3. Estimation and other tasks are con-sidered in Section 4. Section 5 contains a simulation study, and Section 6 contains an applicationto the U.S. inflation rates.

2 General phase VARFIMA(0, D, 0) series

In this section, we consider the two-sided bivariate VARFIMA(0, D, 0) model (1.9)–(1.10). Kecha-gias and Pipiras (2015) showed that any phase parameter φ in (1.4) can be obtained with the model(1.7) for an appropriate choice of Q+ and Q−. However, letting the entries of these matrices takeany real value, causes identifiability issues around the zero frequency, as the same phase param-eter can be obtained by more than one choice of Q+ and Q−. Indeed, from the following simplecounting perspective, note that the specification (1.4) has 6 parameters (d1, d2, ω11, ω12, ω22 andφ) whereas the model (1.7) has 10 (d1, d2 and the entries of Q+ and Q−). One might naturallyexpect identifiability up to Q+Q

′+ and Q−Q

′− but this still leaves the number of parameters at 8

(d1, d2 and the 6 different entries of Q+Q′+ and Q−Q

′−). In Proposition 2.1 and Corollary 2.1 below

(see also the discussion following the latter), we show that the parameterization (1.8) addresses theidentifiability and general phase issues. For one, note that the model (1.9)–(1.10) has the requirednumber of 6 parameters (d1, d2, c and the three different entries of Σ = Q+Q

′+).

Proposition 2.1 Let d1, d2 ∈ (0, 1/2) and Q+ be a 2 × 2 matrix with real-valued entries. Let

also {X(c)n }n∈Z be a time series defined by (1.9)–(1.10) where D = diag(d1, d2). For any φc ∈

(−π/2, π/2), there exists a unique constant c ∈ (−1, 1) such that the series {X(c)n }n∈Z has the

phase parameter φ = φc in (1.4). Moreover, the constant c has a closed form given by

c = c(φc) =

{a1−a2a1+a2

, if φc = arctan a1−a2a1+a2

,2(a1+a2)−

√∆

2(a1−a2+tan(φc)(1+a1a2)) , otherwise,(2.1)

where

a1 = tan

(πd1

2

), a2 = tan

(πd2

2

)and ∆ = 16a1a2 + 4(1 + a1a2)2 tan2(φc). (2.2)

The function c = c(φc) in (2.1) is continuous at φc = arctan((a1 − a2)/(a1 + a2)).

Proof: By using Theorem 11.8.3 in Brockwell and Davis (2009), the VARFIMA(0, D, 0) seriesin (1.9)–(1.10) has a spectral density matrix

f(λ) =1

2π∆c(e

−iλ)−1Σ∆c(e−iλ)−1∗, (2.3)

4

where the superscript ∗ denotes the complex conjugate operation. From (1.8) and by using the factthat 1− e±iλ ∼ ∓iλ, as λ→ 0, we have

f(λ) ∼ 1

2π

((iλ)−D + (−iλ)−DC

)Σ((−iλ)−D + C(iλ)−D

), as λ→ 0. (2.4)

Next, by denoting Σ = (σjk)j,k=1,2 and using the relation ±i = e±iπ/2, we get that the (j, k) elementof the spectral density f(λ) satisfies

fjk(λ) ∼ gjkλ−(dj+dk), as λ→ 0+, (2.5)

where the complex constant gjk is given by

gjk =σjk2π

(e−iπdj/2 + (−1)j+1ceiπdj/2) · (eiπdk/2 + (−1)k+1ce−iπdk/2) (2.6)

and (−1)j+1, (−1)k+1 in (2.6) account for the different signs next to c’s in the diagonal matrixC in (1.8). Focusing on the (1, 2) element, and by applying the polar-coordinate representationz = z1

cos(φ)eiφ of z = z1 + iz2 ∈ C with φ = arctan(z2/z1) (see (1.6) above) to the two multiplication

terms below separately, we have

g12 =σ12

2π

(cos(

πd1

2)(1 + c) + i sin(

πd1

2)(c− 1)

)·(

cos(πd2

2)(1− c) + i sin(

πd2

2)(1 + c)

)=

σ12

2π

cos(πd12 ) cos(πd22 )

cos(φc,1) cos(φc,2)(1− c2)e−iφc , (2.7)

where

φc = −(φc,1 + φc,2), φc,1 = arctan

(a1c− 1

1 + c

)and φc,2 = arctan

(a2

1 + c

1− c

)(2.8)

with a1 and a2 given in (2.2).By using the arctangent addition formula arctan(u) + arctan(v) = arctan( u+v

1−uv ) for uv < 1 (inour case uv = −a1a2 < 0), we can rewrite φc as

φc = − arctan

(a1

c−11+c + a2

1+c1−c

1 + a1a2

)=: h(c). (2.9)

For all d1, d2 ∈ (0, 1/2), the function h : (−1, 1)→ (−π/2, π/2) is strictly decreasing (and therefore1-1) and also satisfies

limc↓−1

h(c) =π

2, lim

c↑1h(c) = −π

2.

Since h is continuous, it is also onto its range which completes the existence and uniqueness partof the proof.

To obtain the formula (2.1), we invert the relation (2.9) to get the quadratic equation

(a1 − a2 + tan(φc)(1 + a1a2))c2 − 2(a1 + a2)c+ a1 − a2 − tan(φc)(1 + a1a2) = 0, (2.10)

whose discriminant ∆ is given by

∆ = 16a1a2 + 4(1 + a1a2)2 tan2(φc)

5

and is always positive. The solutions of (2.10) are then given by

c1 =2(a1 + a2) +

√∆

2(a1 − a2 + tan(φc)(1 + a1a2)), c2 =

2(a1 + a2)−√

∆

2(a1 − a2 + tan(φc)(1 + a1a2)).

It can be checked that c1 /∈ (−1, 1) and c2 ∈ (−1, 1). Note that, when a1−a2+tan(φc)(1+a1a2) = 0or φc = arctan((a1 − a2)/(a1 + a2)), the quadratic equation (2.10) becomes a linear equation withthe solution

c =a1 − a2 − tan(φc)(1 + a1a2)

2(a1 + a2)=a1 − a2

a1 + a2,

which always satisfies c ∈ (−1, 1). Finally, the fact that the function c = c(φc) in (2.1) is continuousat φc = arctan((a1 − a2)/(a1 + a2)) can be checked easily. �

The following result is a direct consequence of the proof of Proposition 2.1.

Corollary 2.1 The spectral density of the time series {X(c)n }n∈Z in Proposition 2.1 satisfies the

asymptotic relation (1.4) with φ = φc and

ωjj =σjj2π

(1 + c2 + (−1)j+12c cos(πdj)), j = 1, 2, (2.11)

ω12 =σ12

2π

cos(πd12 ) cos(πd22 )

cos(φc,1) cos(φc,2)(1− c2), (2.12)

where Σ = Q+Q′+ = (σjk)j,k=1,2 and φc,1, φc,2 are given in (2.8).

Proof: The relations (2.11)–(2.12) follow from (2.5)–(2.6) and (2.7)–(2.8). �

Corollary 2.1 shows that the bivariate LRD model (1.9)–(1.10) is identifiable around the zerofrequency when parametrized by d1, d2,Σ = Q+Q

′+ and c. It will be referred to as the general phase

VARFIMA(0, D, 0) series (two-sided VARFIMA(0, D, 0) series).

Remark 2.1 Proposition 2.1 relates the phase φ at the zero frequency and the constant c whichappears in the full model (1.9)–(1.10). For this model, however, the phase function φ(λ) of the fullcross spectral density f12(λ) = g12(λ)e−iφ(λ) or f21(λ) = g12(λ)eiφ(λ), λ ∈ (0, π), is not a constantfunction of the frequency λ. Instead, by using the identity 1− e∓iλ = 2 sin(λ2 )e∓i(λ−π) and arguingas for (2.7)–(2.9) above, it can be shown that

φ(λ) = − arctan

(x1(λ) c−1

1+c + x2(λ)1+c1−c

1 + x1(λ)x2(λ)

), (2.13)

where

x1(λ) = tan

(d1(π − λ)

2

), x2(λ) = tan

(d2(π − λ)

2

). (2.14)

Several plots of the phase function (2.13) are given in Figure 1. We also note that Sela (2010)considers LRD models with phase functions φ(λ) following special power laws, but we will notexpand in this direction.

6

λ0 0.5 1 1.5 2 2.5 3

φ(λ)

-1.5

-1

-0.5

0

0.5

1

1.5

c1=0c2=0.5c3=-0.8c4=-0.4

λ0 0.5 1 1.5 2 2.5 3

φ(λ)

-1.5

-1

-0.5

0

0.5

1

1.5

c1=0c2=-0.7c3=0.7c4=-0.3

Figure 1: Phase functions φ(λ) for the model (1.9)–(1.10) for different parameter values. Left:c = 0, 0.5,−0.8,−0.4 and d1 = 0.2, d2 = 0.4. Right: c = 0,−0.7, 0.7,−0.3, d1 = d2 = 0.3.

-1 -0.8 -0.6 -0.4 -0.2 0 0.2 0.4 0.6 0.8 1

0

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

-1 -0.8 -0.6 -0.4 -0.2 0 0.2 0.4 0.6 0.8 1

-1.6

-1.4

-1.2

-1

-0.8

-0.6

-0.4

-0.2

0

Figure 2: The function bd(c), c ∈ (−1, 1), in (2.15). Left: d > 0. Right: d < 0.

Remark 2.2 The autocovariance function of the model (1.7) has an explicit form given in Propo-sition 5.1 of Kechagias and Pipiras (2015). The same form can obviously be used for the model(1.9)–(1.10). We also note here that, strictly speaking, Proposition 5.1 is incorrect as stated in thecontext of Kechagias and Pipiras (2015). As indicated in a footnote in Section 1 above, Kechagiasand Pipiras (2015) use the convention EX0X

′n for autocovariances. But the proof of Proposition

5.1 is based on Brockwell and Davis (2009), who use the different convention EXnX′0 and which

is also adopted here. The result of Proposition 5.1 is thus correct but using the convention of thispaper, and correct only up to matrix transposition in the context of Kechagias and Pipiras (2015).

Remark 2.3 Technical issues of Proposition 2.1 aside, there is a simple way to see why the pro-posed model will yield a general phase. Note that a generic term e−iπd/2 + ceiπd/2 entering (2.6)can be expressed in polar coordinates as

e−iπd/2 + ceiπd/2 = ad(c)e−ibd(c). (2.15)

The generic shape of the function bd(c), c ∈ (−1, 1), is given in Figure 2, left plot, for d ∈ (0, 1/2),and right plot, for d ∈ (−1/2, 0). When d ∈ (0, 1/2), the range of bd(c), c ∈ (−1, 1), is (0, π/2)and when d ∈ (−1/2, 0), it is (−π/2, 0). When combined into the phase φc of (2.7), this obviouslyleads to the phase φc that covers the whole range (−π/2, π/2). This discussion also shows that, forexample, the choice Q− = cQ+, c ∈ (−1, 1), would not lead to a general phase parameter for the

7

resulting bivariate LRD models. A related note is that we presently do not have an explicit formfor the inverse filter ∆c(B). But this observation also suggests that one could possibly work withthe filter

∆c(B)−1 = ((I −B)D + C(I −B−1)D)−1, (2.16)

if the goal is to have an explicit form of the filter ∆c(B) applied to the series {Xn}n∈Z. Other filtersthan (1.10) and (2.16) with interesting properties might also exist and could also be considered. Inusing (1.10), we aimed to have a general phase and an explicit form of the autocovariance function.

Remark 2.4 We have assumed in Proposition 2.1 that both component series are LRD, that is,d1, d2 ∈ (0, 1/2). In fact, the proposed model is not fully suited to accommodate the case when d’sare allowed to belong to the so-called “principal” range d ∈ (−1/2, 1/2), including the case d = 0associated with short-range dependence (SRD, for short). For example, if d2 = 0, the discussion inRemark 2.3 shows that the phase φc covers the range (0, π/2), and thus excludes φc ∈ (−π/2, 0).Similarly, when d2 < 0, only part of the range (0, π/2) is covered. When d1, d2 ∈ (−1/2, 1/2),a general phase could in fact be obtained by one of the following models (the introduced modelgiven first): with c ∈ (−1, 1),

Case of d1, d2 Same sign Opposite signs

Q+ Q

(c 00 1

)Q =: C+Q

Q−

(c 00 −c

)Q =: CQ

(1 00 −c

)Q =: C−Q

Indeed, this could be seen by using the expression (2.6) (modified accordingly for the model withQ+ = C+Q and Q− = C−Q) and the discussion found in Remark 2.3. When one of the d’s is zero,say d2 = 0, the model with Q+ = Q, Q− = CQ gives a positive phase φc only as discussed inRemark 2.3 but the model with Q+ = C+Q, Q− = C−Q gives a negative phase φc. In practice,the two models could be fitted for the range d1, d2 ∈ (−1/2, 1/2) and the model with the largerlikelihood could be selected (a BIC or other model selection criterion could also be used if thenumbers of parameters in larger models, as those considered below, are different). Whether thetwo models could be combined into a single model remains an open question.

Remark 2.5 We stress again that the case c = 0 corresponds to the phase φ = (d1 − d2)π/2 (andin particular not necessarily φ = 0.) Note also that if the two component series are interchanged (sothat d1 and d2 are interchanged, and φ becomes −φ), then the constant c in (2.1) changes to −c.Finally, we note that the relation (2.1) does not involve the covariance matrix Σ of the innovationterms, but that (2.11) and (2.12) obviously do.

Remark 2.6 Finally, we also note the following important point regarding the boundaries c = ±1of the range c ∈ (−1, 1). As c→ ±1, the phase parameter φc → ∓π/2. In the specification ω12e

−iφc

of the cross-spectrum constant, note that the cases φc = π/2 and φc = −π/2 are equivalent bychanging the sign of ω12, since ω12e

−iπ/2 = (−ω12)e−i(−π/2). In the model (1.9)–(1.10), the sign ofω12 is the same as the sign of σ12. From a practical perpective, this observation means that for themodel (1.9)–(1.10) with c close to 1 (−1, resp.), it would be common to estimate c close to −1 (1,resp.) and σ12 with the opposite sign, since the respective models are not that different. This isalso certainly what we observed in our simulations.

8

3 General phase VARFIMA(p,D, q) series

In this section, we generalize the model (1.9)–(1.10) by introducing autoregressive (AR, for short)and moving average (MA, for short) components to capture potential short-range dependenceeffects. For the one-sided model (1.1), this extension has been achieved in a number of ways.Naturally, we focus on extensions that preserve the general phase and identifiability properties. Wealso consider the problem of computing (theoretically or numerically) the autocovariance functionsof the introduced models, since these functions are used in estimation (see Section 4 below).

3.1 VARFIMA(0, D, q) series

We begin with the case p = 0 (where there is no AR part). Define the general phaseVARFIMA(0, D, q) series (two-sided VARFIMA(0, D, q) series) as

Yn = ∆c(B)−1Θ(B)Zn, (3.1)

where ∆c(B)−1 is the operator given by (1.10) and

Θ(B) = I2 + Θ1B + . . .+ ΘqBq (3.2)

is a matrix polynomial with 2×2 real-valued matrices Θs = (θjk,s)j,k=1,2, s = 1, . . . , q. As through-

out this paper, {Zn}n∈Z is a white noise series with EZnZ ′n = Σ = (σjk)j,k=1,2. In the special casewhere Θ(B) is diagonal or when d1 = d2, the model (3.1) is equivalent to

Yn = Θ(B)∆c(B)−1Zn. (3.3)

The two operators ∆c(B)−1 and Θ(B), however, do not commute in general. In fact, the two modelsin (3.1) and (3.3) are quite different. More specifically, if Θ(B) has at least one nonzero elementon the off diagonal and if d1 6= d2, the series {Yn}n∈Z in (3.3) can be thought to exhibit a formof fractional cointegration, by writing Θ(B)−1Yn = ∆c(B)−1Zn where the reduction of memory inone of the component series of {Yn}n∈Z could occur from linear combination of present and pastvariables of the two component series. On the other hand, fractional cointegration cannot occurunder the model (3.1). In the rest of this work, we will restrict our attention to this simpler case,leaving the investigation of fractional cointegration for future work.

In the next proposition, we compute the autocovariance function of the series in (3.1). Tsay(2010) calculated the autocovariance function of the one-sided analogue of (3.1) using the propertiesof the hypergeometric function. Our approach, which we find less cumbersome for the multivariatecase, is similar to the one used for the two-sided VARFIMA(0, D, 0) series in Proposition 5.1 ofKechagias and Pipiras (2015) (see also Remark 2.2 above).

Proposition 3.1 The (j, k) component γjk(n) of the autocovariance matrix function γ(n) of thebivariate two-sided VARFIMA(0, D, q) series in (3.1) is given by

γjk(n) =1

2π

2∑u,v=1

q∑s,t=0

θju,sθkv,tσuv

(a1,jkγ

(1)st,jk(n) + a2,jγ

(2)st,jk(n) + γ

(3)st,jk(n) + a4,kγ

(4)st,jk(n)

), (3.4)

where Θs = (θjk,s)j,k=1,2,s=1,...,q, Σ = (σjk)j,k=1,2,

a1,jk = c2(−1)j+k, a2,j = c(−1)j+1, a4,k = c(−1)k+1, (3.5)

9

andγ

(1)st,jk(n) = γ

(3)st,kj(n) = 2Γ(1− dj − dk) sin(πdk)

Γ(n+t−s+dk)Γ(n+t−s+1−dj) ,

γ(4)st,jk(n) = γ

(2)ts,jk(−n) =

{2π 1

Γ(dj+dk)Γ(dj+dk+n+t−s)

Γ(1+n+t−s) , n ≥ s− t,0 , n < s− t.

(3.6)

Proof: By using Theorem 11.8.3 in Brockwell and Davis (2009), the VARFIMA(0, D, q) seriesin (3.1) has a spectral density matrix

f(λ) =1

2πG(λ)ΣG(λ)∗, (3.7)

where G(λ) = ∆c(e−iλ)−1Θ(e−iλ). The (j, k) component of the spectral density is given by

fjk(λ) =1

2π

2∑u,v=1

q∑s,t=0

θju,sθkv,tσuve−i(s−t)λ(f1,jk(λ) + f2,jk(λ) + f3,jk(λ) + f4,jk(λ)), (3.8)

where

f1,jk(λ) = a1,jk(1− eiλ)−dj (1− e−iλ)−dk , f2,jk(λ) = a2,j(1− eiλ)−(dj+dk),

f3,jk(λ) = (1− e−iλ)−dj (1− eiλ)−dk , f4,jk(λ) = a4,k(1− e−iλ)−(dj+dk).(3.9)

Consequently, the (j, k) component of the autocovariance matrix satisfies γjk(n) =∫ 2π0 einλfjk(λ)dλ, which in view of the relations (3.8)–(3.9) implies (3.4)–(3.5) with

γ(1)st,jk(n) = γ

(3)st,kj(n) =

∫ 2π

0ei(n−s+t)λ(1− eiλ)−dj (1− e−iλ)−dkdλ,

γ(2)st,jk(n) =

∫ 2π

0ei(n−s+t)λ(1− eiλ)−xjkdλ, γ

(4)st,jk(n) =

∫ 2π

0ei(n−s+t)λ(1− e−iλ)−xjkdλ,

where xjk = dj + dk. The relations (3.6) follow from the evaluation of the integrals above as in theproof of Proposition 5.1 of Kechagias and Pipiras (2015). �

Remark 3.1 Since Θ(e−iλ) ∼ I2+Θ1+. . .+Θq as λ→ 0, and since the relation (2.1) in Proposition2.1 does not involve Σ, the two-sided VARFIMA(0, D, q) model has a general phase at the zerofrequency (with the same relation (2.1) between the phase φc and the parameter c). The parametersof Θ’s are identifiable if and only if they are identifiable for the same VARMA(0, q) model.

3.2 VARFIMA(p,D, q) series

We extend here the model (3.1) to a general phase fractionally integrated model containing bothautoregressive and moving average components. As for the one-sided model (1.1), two possibilitiescan be considered for this extension. Let Φ(B) = I2−Φ1B−. . .−ΦpB

p be the AR polynomial, whereΦr = (φjk,r)j,k=1,2, r = 1, . . . , p, are 2×2 real-valued matrices. Following the terminology of Sela and

Hurvich (2009), define the general phase VARFIMA(p,D, q) series (two-sided VARFIMA(p,D, q)series) {Xn}n∈Z as

Φ(B)Xn = ∆c(B)−1Θ(B)Zn, (3.10)

and the general-phase FIVARMA(p,D, q) series2 (two-sided FIVARMA(p,D, q) series) as

Φ(B)∆c(B)Xn = Θ(B)Zn. (3.11)

2The names VARFIMA and FIVARMA refer to the facts that the fractional integration (FI) operator ∆c(B)−1

is applied to the MA part in (3.10), and after writing Xn = ∆c(B)−1Φ(B)−1Θ(B)Zn, it is applied to the VARMAseries in (3.11).

10

(A priori, it is not clear whether (3.10) and (3.11) have a general phase but we use the terminologyin line with Sections 2 and 3.1.) The one-sided FIVARMA(p,D, q) series (with c = 0 in (3.11))have been more popular in the literature, with Lobato (1997), Sela and Hurvich (2009) and Tsay(2010) being notable exceptions. In particular, Sela and Hurvich (2009) investigated thoroughlythe differences between the one-sided analogues of the models (3.10) and (3.11), focusing on modelswith no MA part.

As expected, the two-sided VARFIMA and FIVARMA series differ if Φ(B) is nondiagonal andif d1 6= d2. Similarly to the discussion around the models (3.1) and (3.3), the VARFIMA modelwith nondiagonal Φ(B) allows for fractional cointegration in the sense discussed following therelation (3.3), which however cannot be produced by the FIVARMA model (3.11) (see Sela andHurvich (2009) for more details in the one-sided case). As indicated earlier, the case of fractionalcointegration will be pursued elsewhere (though we shall also briefly mention some numerical resultsin Section 5).

We will focus on the VARFIMA(p,D, q) series (3.10) with a diagonal AR part, in which casethe two models (3.10) and (3.11) are equivalent. Besides the obvious computational and simpli-fication advantages of this assumption, our consideration is also justified by similar assumptionsrecently used in Dufour and Pelletier (2014) for the construction of identifiable multivariate short-range dependent time series models. More specifically, Dufour and Pelletier (2014) show that anyVARMA(p, q) series can be transformed to have a diagonal AR (or MA) part with the cost ofincreasing the order of the MA (or AR) component. As a consequence, they construct identifiablerepresentations of VARMA(p, q) series where either AR or MA part is diagonal. As in Remark3.1, such two-sided VARFIMA(p,D, q) model has general phases at the zero frequency, and itsparameters are identifiable if and only if they are identifiable for the same VARMA(p, q) model.

The presence of the AR filter on the left-hand side of (3.10) makes it difficult to compute the au-tocovariance function of the series explicitly. Closed form formulas for the autocovariance functionof the one-sided model (3.11) with c = 0 were provided by Sowell (1986), albeit his implementa-tion is computationally inefficient as it requires multiple expensive evaluations of hypergeometricfunctions. The slow performance of Sowell’s approach was also noted by Sela (2010), who proposedfast approximate algorithms for calculating the autocovariance functions of the one-sided models(3.10) and (3.11) with c = 0 when p = 1 and q = 0. Although not exact, Sela’s algorithms arefast with negligible approximation errors. In fact, it is straightforward to extend these algorithmsto calculate the autocovariance function of a two-sided VARFIMA(1, D, q) series. For models withAR components of higher orders, however, this extension seems to require restrictive assumptionson the AR coefficients and therefore we do not pursue this approach.

Remark 3.2 There is yet another reason for making our assumption of a diagonal AR part Φ(B).By using the reparametrizations of Dufour and Pelletier (2014), the FIVARMA model (3.11) cantake the form (3.10) with diagonal Φ(B). Indeed, write first the model (3.11) as

∆c(B)Xn = Φ(B)−1Θ(B)Zn.

Next, by using the relation Φ(B)−1 = |Φ(B)|−1adj(Φ(B)), where | · | and adj(·) denote the deter-minant and adjoint of a matrix respectively, we can write

∆c(B)|Φ(B)|Xn = adj(Φ(B))Θ(B)Zn, (3.12)

where the commutation of ∆c(B) and |Φ(B)| is possible since |Φ(B)| is scalar-valued. LettingΦ(B) = diag(|Φ(B)|) and Θ(B) = adj(Φ(B))Θ(B), the relation (3.12) yields

∆c(B)Φ(B)Xn = Θ(B)Zn. (3.13)

11

Thus, a FIVARMA model with AR component of order p can indeed be written as a VARFIMAmodel with a diagonal AR part whose order will not exceed 2p (the maximum possible order of|Φ(B)|).

4 Estimation and other tasks

In this section, we discuss estimation of the general phase VARFIMA(p,D, q) model (3.10) intro-duced in Section 3.2. Estimation of the parameters of this model can be carried out by adaptingthe CLDL (Conditional Likelihood Durbin Levinson) estimation of Tsay (2010). Tsay’s method isappealing in our case for a number of reasons. First, as discussed in Section 4.1 below, the methodrequires only the knowledge of the autocovariance function of the general phase VARFIMA(0, D, q)series (3.1) for which we have an explicit form. Second, Tsay’s algorithm can be modified easily toyield multiple steps-ahead (finite sample) forecasts of the series. Finally, Tsay’s method has a mildcomputational cost, compared to most alternative estimation methods.

4.1 Estimation

The basic idea of Tsay’s CLDL algorithm is to transform a VARFIMA(p,D, q) series to aVARFIMA(0, D, q) series whose autocovariance function has a closed form. Then, a straightforwardimplementation of the well-known Durbin-Levinson (DL, for short) algorithm allows one to replacethe computationally expensive likelihood calculations of the determinant and the quadratic partwith less time consuming operations. We give next a brief description of the algorithm, startingwith some notation.

Let {Yn}n=1,...,N be the two-sided VARFIMA(0, D, q) series (3.1) and let Γ(k) = EYkY ′0 denoteits autocovariance function. Let also Θ = (vec(Θ1)′, . . . , vec(Θq)

′)′ be the vector containing theentries of the coefficient matrices of the MA polynomial Θ(B). Assuming that the bivariate whitenoise series {Zn} is Gaussian, we can express the likelihood function of {Yn}n=1,...,N with the aid ofthe multivariate DL algorithm (see Brockwell and Davis (2009), p. 422). More specifically, lettingη = (d1, d2, c, σ11, σ12, σ22,Θ

′)′ be the (6 + 4q)–dimensional vector containing all the parameters ofthe model (3.1), we can write the likelihood function as

L(η;Y ) = (2π)−N(N−1∏j=0

Vj

)−1/2exp

{− 1

2

N−1∑j=0

(Yj+1 − Yj+1)′V −1j (Yj+1 − Yj+1)

}, (4.1)

where Yj+1 = E(Yj+1|Y1, . . . , Yj) and Vj , j = 0, . . . , N − 1, are the one-step-ahead finite sam-ple predictors and their corresponding error covariance matrices obtained by the multivariate DLalgorithm. Using the fact that the series {Yn}n=1,...,N satisfies the relation

Φ(B)Xn = Yn, (4.2)

where {Xn}n=1,...,N is the two-sided VARFIMA(p,D, q) series (3.10), we can view{Φ(B)Xn}n=p+1,...,N as a two-sided VARFIMA(0, D, q) series, whose likelihood function conditionalon X1, . . . , Xp and Φ = (vec(Φ1)′, . . . , vec(Φp)

′)′ is given by

L(Φ, η;Xn|X1, . . . , Xp) ≡ L(η; Φ(B)Xn), n = p+ 1, . . . , N. (4.3)

The reason we do not absorb Φ in η, is to emphasize the different roles that these two parametershave in calculating the likelihood function in (4.3). More specifically, Φ is used to transform

12

the available data {Xn}n=1,...,N, to a two-sided VARFIMA(0, D, q) series {Yn}n=1,...,N, while η isnecessary to apply the DL algorithm.

The conditional likelihood estimators of Φ and η are then given by

(Φ, η) = argmaxΦ,η∈S

L(Φ, η;Xn|X1, . . . , Xp), (4.4)

where S = {η ∈ R6+4q : 0 < d1, d2 < 0.5, −1 < c < 1, |Σ| = σ11σ22 − σ212 > 0, (σjj)j=1,2 ≥ 0}

denotes the parameter space for η. Although there is no closed form solution for the estimates Φand η, they can be computed numerically using the quasi-Newton algorithm of Broyden, Fletcher,Goldfarb, and Shanno (BFGS).

4.2 Forecasting

The multivariate DL algorithm used in the estimation above yields the coefficients matricesΦn,1, . . . ,Φn,n in the 1–step-ahead forecast (predictor)

Yn+1 := Yn+1|n := E(Yn+1|Y1, . . . , Yn) = Φn,1Yn + . . .+ Φn,nY1, (4.5)

as well as the associated forecast error matrix Vn = E(Yn+1−Yn+1)(Yn+1−Yn+1)′. The h–step-aheadforecasts, h ≥ 1, on the other hand, are given by

Yn+h|n := E(Yn+h|Y1, . . . , Yn) = F hn,1Yn + . . .+ F hn,nY1, (4.6)

where F hn,k, k = 1, . . . , n, are 2× 2 real-valued coefficient matrices, with the corresponding forecasterror matrix

Wn+h−1|n = E(Yn+h − Yn+h|n)(Yn+h − Yn+h|n)′. (4.7)

The 1-step-ahead forecasts can be used recursively by repeated conditioning to obtain recursiveexpressions for the coefficient matrices F hh,k and an expression for the error matrix Wn+h−1|n, asstated in the next result. The standard proof is omitted for shortness sake.

Proposition 4.1 Let h ≥ 1 and Φn,k, n ≥ 1, k = 1, . . . , n, be as above. Then, the matrices F hn,kin (4.6) satisfy the recursive relation

F hn,k = Φn+h−1,h+k−1 +h−1∑j=1

Φn+h−1,jFh−jn,k , (4.8)

with F 1n,k := Φn,k, n ≥ 1, k = 1, . . . , n. Moreover, the corresponding error matrices Wn+h−1|n in

relation (4.7), are given by

Wn+h−1|n = Γ(0)−n∑j=1

F hn,jΓ(h+ j − 1)′, (4.9)

where Γ(n) = EYnY ′0 is the autocovariance matrix function of {Yn}n∈Z.

Remark 4.1 In the time series literature (e.g. Brockwell and Davis (2009)), it is more succinctand common to express the h–step-ahead forecasts by using the coefficient matrices appearing inthe multivariate Innovations (IN, for short) algorithm. We use the DL algorithm in both estimationand forecasting since it is faster than the IN algorithm: the coefficient matrices Φn,k in (4.5) arecomputed in O(n2) number of steps, whereas the computational complexity for the analogouscoefficients in the IN algorithm is O(n3).

13

The DL algorithm used in the forecasting procedure and formulae above is based on the assump-tion that the autocovariance function of the time series {Yn}n∈Z can readily be computed, as forexample for the two-sided VARFIMA(0, D, q) series. We now turn our attention to the two-sidedVARFIMA(p,D, q) series {Xn}n∈Z defined through {Yn}n∈Z in (4.2). As we do not have an explicitform of the autocovariance function of {Xn}n∈Z, it is not immediately clear how to calculate theh–step-ahead forecasts

Xn+h|n = E(Xn+h|X1, . . . , Xn)

and the corresponding error matrices

Wn+h−1|n = E(Xn+h − Xn+h|n)(Xn+h − Xn+h|n)′,

n ≥ 1, h ≥ 1. In Proposition 4.2 below, we show that Xn+h|n and Wn+h−1|n can be calculated

approximately and recursively from Yn+h|n and Wn+h−1|n. For simplicity and since this order willbe used in the simulations and the application below, we focus on the case p = 1. However, theproposition can be extended for larger values of p.

Proposition 4.2 Let F hn,k, n ≥ 1, k = 1, . . . , n, be as in (4.6). Then, the h–step-ahead forecasts

Xn+h|n = E(Xn+h|X1, . . . , Xn) satisfy

Xn+h|n = X(a)n+h|n +Rn+h|n, (4.10)

where

X(a)n+h|n = Φh

1Xn +h−1∑s=0

Φs1Yn+h−s|n, (4.11)

Rn+h|n =h−1∑s=0

Φs1

(E(Yn+h−s|X1, . . . , Xn)− E(Yn+h−s|Y1, . . . , Yn)

). (4.12)

Moreover, the error matrices W(a)n+h−1|n = E(Xn+h − X

(a)n+h|n)(Xn+h − X

(a)n+h|n)′ can be computed by

W(a)n+h−1|n =

h−1∑s=0

Φs1Wn+h−s−1|n(Φs

1)′ +

h−1∑s,t=0s 6=t

Φs1As,t(n+ h)(Φt

1)′, (4.13)

where

As,t(n+ h) = Γ(t− s)−n∑k=1

Γ(h− s+ k − 1)(F h−tn,k )′ (4.14)

and Γ(n) = EYnY ′0 is the autocovariance matrix function of {Yn}n∈Z.

Proof: By using the relation (4.2) recursively, we can write

Xn+h = Φh1Xn +

h−1∑s=0

Φs1Yn+h−s, h = 1, 2, . . . (4.15)

which implies that

Xn+h|n = Φh1Xn +

h−1∑s=0

Φs1E(Yn+h−s|X1, . . . , Xn). (4.16)

14

Since E(Yn+h−s|Y1, . . . , YN ) = Yn+h−s|n, the relation (4.16) yields (4.11).Next, we subtract (4.11) from (4.15) to get

Xn+h − X(a)n+h|n =

h−1∑s=0

Φs(Yn+h−s − Yn+h−s|n).

The h–step-ahead error matrix W(a)n+h−1|n is then given by

W(a)n+h−1|n = E

( h−1∑s=0

Φs1(Yn+h−s − Yn+h−s|n)

)( h−1∑t=0

Φt1(Yn+h−t − Yn+h−t|n)

)′=

h−1∑s=0

Φs1Wn+h−s−1|n(Φs

1)′ +h−1∑s,t=0s 6=t

Φs1As,t(n+ h)(Φt

1)′, (4.17)

where As,t(u) = E(Yu−s − Yu−s|n)(Yu−t − Yu−t|n)′. To show that As,t(u) satisfies (4.14), note thatfor s, t = 0, . . . , u− n− 1, s 6= t, we have

EYu−s|nY ′u−t = E(E(Yu−s|nY′u−t|Y1, . . . , YN )) = EYu−s|nE(Yu−t|Y1, . . . , YN ) = EYu−s|nY ′u−t|n.

Hence,

As,t(u) = EYu−sY ′u−t − EYu−sY ′u−t|n − EYu−s|nY ′u−t + EYu−s|nY ′u−t|n= Γ(t− s)− EYu−sY ′u−t|n

= Γ(t− s)− EYu−s( n∑k=1

F u−t−nn,k Yn−k+1

)′= Γ(t− s)−

n∑k=1

Γ(u− s− n+ k − 1)(F u−t−nn,k )′, (4.18)

yielding the relations (4.13)–(4.14). �

Since Xn − Φ1Xn−1 = Yn for the VARFIMA(1, D, q) series {Xn} and the VARFIMA(0, D, q)series {Yn}, the approximation error Rn+h|n in (4.12) becomes negligible for large n. For this reason,

in the simulations and the application below, we shall use the approximate forecasts X(a)n+h|n in (4.10)

and their forecast error matrices W(a)n+h−1|n given by (4.13).

5 Simulation study

In this section, we perform a Monte Carlo simulation study to assess the performance of the CLDLalgorithm for the VARFIMA(p,D, q) model (3.10) described in Section 4.1. We examine fourdifferent models with AR and MA components of orders p, q = 0, 1. When either the MA or theAR part is present, we shall consider a non-diagonal coefficient matrix. This is somewhat in contrastto what was stated in Section 3.2 for the AR part but we sought to see what happens when theAR part is non-diagonal as well and the results for the diagonal AR parts (not reported here) werequalitatively similar or better. For each model, we consider three sample sizes N = 200, 400, 1000.The Gaussian time series data are generated using the fast and exact synthesis algorithm of Helgasonet al. (2011), while the number of replications is 100.

15

To solve the maximization problem (4.4), we use the SAS/IML nlpqn function, which im-plements the BFGS quasi-Newton method, a popular iterative optimization algorithm. For ouroptimization scheme, we follow the approach found in Tsay (2010). A first step is to eliminatethe nonlinear inequality constraint |Σ| ≥ 0 in the parameter space S (defined in Section 4.1, byletting Σ = U ′U , where U = (Ujk)j,k=1,2 is an upper triangular matrix (Σ is nonnegative def-inite and such a factorization always exists). Then, the parameter vector θ can be written asθ = (d1, d2, c, U11, U12, U22,Θ

′)′ while the parameter space becomes S = {θ ∈ R6+4q : 0 < d1, d2 <0.5, −1 < c < 1}.

Next, we describe our strategy on selecting initial parameter values (ΦI , θI) for the BFGSmethod. Let

Φ01 = (φ0

jk,1)j,k=1,2, θ0 = (d01, d

02, c

0, U 011, U

012, U

022, (Θ

01)′)′, (5.1)

where Θ01 = (θ0jk,1)j,k=1,2, be the true parameter values. We consider initial values

dIk =2d0

k

1 + 2d0k

, cI =2c0

1 + |c0|, U I

jk = 1, θIjk,1 =eθ

0jk,1 − 1

eθ0jk,1 + 1

, φIjk,1 =eφ

0jk,1 − 1

eφ0jk,1 + 1

, (5.2)

where j, k = 1, 2. Note that the transformations (5.2) are essentially perturbations of the trueparameter values that also retain the range of the parameter space S. For example, the value of dIkwill be zero (or 1/2) when d0

k is also zero (or 1/2). Moreover, even though the parameter space Sdoes not include identifiability (including stability) constraints for the elements of the AR and MApolynomials as discussed in Section 3.2, we did not encounter any cases where the optimizationalgorithm considers such values.

Table 1 and Figure 3 present estimation results for the four models considered. For all simu-lations, we take (dropping the superscript 0 for simplicity) d1 = 0.2, d2 = 0.4, c = 0.6, Σ11 = 3,Σ12 = 0.5, Σ22 = 3, and wherever present, Φ11 = φ11,1 = 0.5, Φ12 = φ12,1 = 0.2, Φ21 = φ21,1 = 0.4,Φ22 = φ22,1 = −0.8, Θ11 = θ11,1 = 0.1, Θ12 = θ12,1 = −0.6, Θ21 = θ21,1 = 0.2, Θ22 = θ22,1 = 0.8.3

We also performed simulations for several other values of these parameters and got similar resultsand therefore we omit them in favor of space economy. Table 1 lists the median differences betweenthe estimates and the corresponding true values, and the respective median absolute deviations.Figure 3, on the other hand, includes the boxplots of the estimates for the various parameters.While the table concerns only the results for the sample sizes N = 200 and 400, the figure alsoincludes the case of N = 1000.

The results in Table 1 and Figure 3 indicate a satisfactory performance of the CLDL algorithmfor most cases considered: the median differences are small overall and tend to decrease with theincreasing sample size, and the decrease with the increasing sample size is also evident for themedian deviations; moreover, many median deviations and box sizes are relatively small as well.We also note that for some cases and smaller samples spaces, we could see bimodality in thehistograms of the corresponding estimates (not included here for shortness sake). For example, thiswas the case for c, (p, q) = (1, 1) when N = 200, as suggested by the larger median difference inTable 1. But bimodality either diminished or completely disappeared as the sample size increased.

Finally, we also comment on the model selection task concerning the one- and two-sided modelswhen using BIC and AIC. Figure 4, the left plot, presents the proportion of times that these informa-tion criteria select the one-sided VARFIMA(0, D, 0) model over the two-sided VARFIMA(0, D, 0),when in fact the latter model is true, for the same parameter values as in Table 1 in the casep = q = 0. The right plot of the figure presents the analogous plot for a different set of values ofthe parameters d1, d2 and c. The performance of the model selection criteria is satisfactory overall.

3For this choice of d1, d2, c, the phase parameter is equal to φ = 1.15. Taking c = −0.1985 with the same d’swould yield zero phase.

16

d1 d2 c U11 U12 U22 d1 d2 c U11 U12 U22 d1 d2 c U11 U12 U22

-0.5

0

0.5

1

d1 d2 c U11 U12 U22 Θ11Θ12Θ21Θ22 d1 d2 c U11 U12 U22 Θ11Θ12Θ21Θ22 d1 d2 c U11 U12 U22 Θ11Θ12Θ21Θ22

-0.5

0

0.5

1

d1 d2 c U11 U12 U22 Φ11 Φ12 Φ21 Φ22 d1 d2 c U11 U12 U22 Φ11 Φ12 Φ21 Φ22 d1 d2 c U11 U12 U22 Φ11 Φ12 Φ21 Φ22

-1

-0.5

0

0.5

1

d1 d2 c U11U12U22Φ11Φ12Φ21Φ22Θ11Θ12Θ21Θ22 d1 d2 c U11U12U22Φ11Φ12Φ21Φ22Θ11Θ12Θ21Θ22 d1 d2 c U11U12U22Φ11Φ12Φ21Φ22Θ11Θ12Θ21Θ22

-1

-0.5

0

0.5

1

Figure 3: The red solid lines are the medians and the blue dashed lines are the true parameter values

(except for U11, U22 which are centered at 0).Top to bottom: (p, q) = (0, 0),(0, 1),(1, 0) and (1, 1). Left to

right: N = 200, 400, 1000.

17

(p, q) (0, 0) (0, 1) (1, 0) (1, 1)

N 200 400 200 400 200 400 200 400

d10.002 −0.002 −0.023 −0.015 −0.067 −0.007 −0.058 −0.0170.030 0.031 0.058 0.040 0.110 0.065 0.117 0.059

d2−0.017 −0.005 −0.011 −0.005 −0.022 −0.026 −0.024 −0.0100.035 0.029 0.047 0.033 0.040 0.032 0.042 0.033

c0.025 0.010 0.083 0.031 0.042 0.035 0.171 0.1800.059 0.041 0.119 0.065 0.066 0.048 0.101 0.101

Φ11/Θ110.024 0.009 0.059 0.013 0.067/0.009 0.031/0.0100.069 0.044 0.081 0.057 0.092/0.066 0.065/0.055

Φ12/Θ120.123 0.043 −0.022 −0.036 −0.049/0.159 −0.026/0.1470.074 0.030 0.080 0.066 0.076/0.191 0.052/0.216

Φ21/Θ210.033 0.021 0.005 0.001 0.029/0.028 0.033/0.0010.113 0.077 0.025 0.016 0.034/0.091 0.026/0.084

Φ22/Θ22−0.061 −0.001 −0.002 −0.004 0.000/−0.242 0.001/−0.2290.061 0.039 0.021 0.012 0.008/0.129 0.008/0.079

U11−0.041 −0.028 −0.114 −0.043 −0.051 −0.051 −0.194 −0.1830.077 0.064 0.126 0.079 0.085 0.068 0.106 0.085

U12−0.009 0.001 0.092 0.033 −0.005 −0.008 0.053 0.0680.093 0.064 0.131 0.092 0.101 0.071 0.103 0.093

U220.064 0.016 0.212 0.057 0.084 0.077 0.209 0.2650.202 0.135 0.296 0.191 0.242 0.145 0.220 0.200

Table 1: Median differences between the estimates and the corresponding true values (top value ineach cell) and median absolute deviations (bottom value in each cell) for the estimated parametersof VARFIMA series with (p, q) = (0, 0), (0, 1), (1, 0), (1, 1).

6 Application

In this section, we apply the CLDL algorithm to analyze inflation rates in the U.S. under thetwo-sided VARFIMA(p,D, q) model discussed in Section 3.2. Evidence of long-range dependencebehavior in inflation rates has been found in a number of works (see, for example, Baillie etal. (1996), Doornik and Ooms (2004), Hurvich and Sela (2009), Baillie and Moreno (2012) andreferences therein). More specifically, Hurvich and Sela (2009) tested the fit of several long- andshort-range dependent models on the annualized monthly inflation rates for goods and servicesin the U.S. during the period of February 1956–January 2008 (N = 624 months) and selected aone-sided VARFIMA model as the best choice. Besides their long memory features, however, thetime series of inflation rates often exhibit asymmetric behavior, and therefore call for multivariateLRD models that allow for general phase.

Following the notation of Sela (2010),4 we denote the Consumer Price Indices series for com-modities as {CPIcn}n=0,...,N and the corresponding series for services as {CPIsn}n=0,...,N . Then, wedefine the annualized monthly inflation rates for goods and services as

gn = 1200CPIcn − CPIcn−1

CPIcn−1

and sn = 1200CPIst − CPIsn−1

CPIsn−1

,

respectively. The two series {gn}n=1,...,N , {sn}n=1,...,N are depicted in Figure 5.5

4See also the accompanying R code.5The consumer price indices (raw) data are available online from the Bureau of Labor Statistics.

18

50 100 150 200 250 300

0

10

20

30

40

50

60

70

80

90

100

50 100 150 200 250 300

0

10

20

30

40

50

60

70

80

90

100

Figure 4: The proportion of times that the considered information criteria select the one-sidedVARFIMA(0, D, 0) model over the two-sided one, when the latter is true.

Year0 100 200 300 400 500 600

Annualizedmonthly

inflationrate–Goods

-20

-10

0

10

20

30

Year0 100 200 300 400 500 600

Annualizedmonthly

inflationrate–Services

-20

-10

0

10

20

30

Figure 5: Annualized monthly inflation rates for goods (left) and services (right) from February1956 to January 2008.

The two plots in Figure 6 provide some motivation for why a general phase model is needed forthis dataset. More specifically, the left plot in Figure 6 depicts the sample cross-correlation functionρ12(h) of the two series for all lags such that |h| < 25. Observe that for negative lags the samplecross-correlation function decays faster than for positive lags suggesting time-non-reversibility ofthe series and hence non-zero phase.

Further evidence for general phase can be obtained from the local Whittle estimation of Robin-son (2008) which can be used to estimate the phase and the LRD parameters directly from the data.The estimation is semiparametric in the sense that it only requires specification of the spectral den-sity at low frequencies. The right plot in Figure 6 depicts two local Whittle estimates of the phaseparameter φ as functions of m – a tuning parameter representing the number of lower frequenciesused in the estimation. The dashed line corresponds to the special phase estimate φ = (d1− d2)π/2of the one-sided VARFIMA model based on the local Whittle estimates of the two d’s. On theother hand, the solid line shows the phase parameter estimated directly from the data. The twolines being visibly different suggest that the special phase parameter and the associated VARFIMAmodel are not appropriate. A more detailed local Whittle analysis of the dataset can be found inBaek et al. (2018).

19

lag-25 -20 -15 -10 -5 0 5 10 15 20 25

γ12(h)

-0.1

0

0.1

0.2

0.3

0.4

0.5

50 100 150 200 250 300 350 400 450−1.5

−1

−0.5

0

0.5

φ

m

Whitt le (data driven)

Mode l (spec ial phase)

Figure 6: Left plot: Sample cross-correlation ρ12(h) of the series {gn}n=1,...,N and {sn}n=1,...,N

depicted in Figure 5 for |h| ≤ 25. Right plot: Local Whittle phase estimates, one corresponding tothe one-sided VARFIMA (dashed line) and one estimated directly from the data (solid line). Bothestimates are plotted as functions of a tuning parameter m = N0.25+0.0125k, k = 1, . . . , 51, whereN = 624 is the sample size of the series.

In the analysis of Sela and Hurvich (2009), the one-sided VARFIMA(1, D, 0) model was selectedas the best choice (based on AIC), amongst vector autoregressive models of both low and high ordersand also amongst one-sided VARFIMA(p,D, 0) and FIVARMA(p,D, 0) models with p ≤ 1. Theestimated VARFIMA(1, D, 0) model, in particular, was

gn = 0.3027gn−1 + 0.4245sn−1 + ε1,n,sn = −0.0237gn−1 − 0.3085sn−1 + ε2,n,

(6.1)

where (ε1,n

(I −B)0.4835ε2,n

)∼ N

(0,

(20.23 0.460.46 7.08

)). (6.2)

We should note here that the SAS optimization algorithm we used produced estimates similar tothose of Sela’s algorithm (implemented in R), for all parameters except d1 which Sela estimates tobe zero while for this model we estimated it to be 0.191. More specifically, the estimated one-sidedVARFIMA(1, D, 0) model is

gn = 0.132 (0.064) gn−1 + 0.076 (0.080) sn−1 + ε1,n,sn = 0.056 (0.023) gn−1 − 0.308 (0.044) sn−1 + ε2,n,

(6.3)

where ((I −B)0.191 (0.052)ε1,n(I −B)0.475 (0.062)ε2,n

)∼ N

(0,

(20.21 0.750.75 7.02

))(6.4)

and the underlying U in Σ = U ′U has U11 = 4.605 (0.131), U12 = 0.164 (0.109) and U22 =2.645 (0.075), with standard errors of the estimates added in the parentheses throughout.

The parameter estimates in (6.1)–(6.2) reveal an interesting feature, noted by Sela (2010).In particular, while the lagged services inflation has a significant influence on goods inflation, thelagged goods inflation seems to have a small effect on services inflation. This behavior is potentiallyrelated to the so-called gap between the prices in services and the prices in goods which was studiedby Peach et al. (2004). More specifically, the term gap refers to the tendency of prices in services

20

to increase faster than prices in goods. But note that this effect is not present in the estimatedmodel (6.3)–(6.4) and, in fact, is reversed.

For comparison, we present next two estimated two-sided VARFIMA(1, D, 0) models. Theestimated two-sided VARFIMA(1, D, 0) model with a diagonal AR component is

gn = 0.106 (0.053) gn−1 + ε1,n,sn = −0.439 (0.080) sn−1 + ε2,n,

(6.5)

where(((I −B)−0.231 (0.043) + 0.344 (0.158)(I −B−1)−0.231)−1ε1,n

((I −B)−0.439 (0.044) − 0.344(I −B−1)−0.439)−1ε2,n

)∼ N

(0,

(12.141 0.9480.948 11.680

))(6.6)

and the corresponding U in Σ = U ′U has U11 = 3.484 (0.408), U12 = 0.272 (0.154) and U22 =3.407 (0.471). The estimated two-sided VARFIMA(1, D, 0) model with general AR component is

gn = 0.139 (0.061) gn−1 + 0.100 (0.084) sn−1 + ε1,n,sn = 0.046 (0.023) gn−1 − 0.416 (0.087) sn−1 + ε2,n,

(6.7)

where(((I −B)−0.190 (0.056) + 0.315 (0.186)(I −B−1)−0.190)−1ε1,n

((I −B)−0.437 (0.046) − 0.315(I −B−1)−0.437)−1ε2,n

)∼ N

(0,

(12.528 0.9550.955 11.130

))(6.8)

and the corresponding U in Σ = U ′U has U11 = 3.539 (0.499), U12 = 0.270 (0.167) and U22 =3.326 (0.524). In terms of AIC, listed from smallest to largest, the estimated models are (6.7)–(6.8), (6.3)–(6.4) and (6.5)–(6.6). In terms of BIC, the models are (6.5)–(6.6), (6.3)–(6.4) and(6.7)–(6.8). In either case, a two-sided model is preferred to the one-sided model (6.3)–(6.4). Wealso note that the models with the smallest AIC and BIC in the class of one- and two-sided models(with the orders up to 1) are VARFIMA(1, D, 1). We focus on VARFIMA(1, D, 0) for shortnesssake and comparison to an earlier work by Sela.

We have a number of interesting observations related to the estimated parameters in (6.5)–(6.8).First, note that the off-diagonal elements of the AR component in (6.7) are close to 0. This suggeststhat the (a)symmetry behavior in the inflation rates is now captured by the parameter c. Indeed,plugging the estimates of d1, d2 and c from the models (6.5)–(6.6) and (6.7)–(6.8) in relation (2.9),we obtain phase estimates around −0.85 which agree with the local Whittle estimate of the phasein the right plot of Figure 6. Note that the local Whittle estimates of d1 and d2 which we did notreport here are close to 0.2 and 0.38 respectively, and hence they are also close to the estimatedparameters of the two two-sided VARFIMA models.

Second, related to the “gap” phenomenon discussed briefly following (6.4), we also note that aswith (6.3)–(6.4), this effects is not present and, in fact, reversed in the model (6.7)–(6.8) in the sensethat the AR coefficient φ12 is non-significant and the AR coefficient φ21 is (marginally) significant.We note that the latter makes sense from the perspective of the estimated LRD parameters in thefollowing sense. As estimated, the driving noise series {ε1,n} and {ε2,n} have respective memoryparameters 0.190 and 0.437. Assuming φ12 = 0 allows the series {gn} to have the smaller memoryparameter 0.190, whereas {sn} has the larger memory parameter 0.437. If φ12 were significant(non-zero), then the series {gn} would be forced to have the larger memory parameter 0.437 aswell, which is not that observed through the local Whittle estimation as noted above.

We conclude this section with a few comments regarding forecasting. Figure 7 presents root-mean-square (out-of-sample) forecasting errors over the horizons h = 1, . . . , 25, obtained by fitting

21

the indicated one- and two-sided models over a number of sliding windows (200 or 400, resp.) ofvarying size (400 and 200, resp.). The two-sided models generally outperform or are comparableto their one-sided counterparts. We included these forecasts for the sake of completeness, realizingthat assessing the role of LRD in forecasting with ARFIMA models is known to be quite delicateeven in the univariate case (e.g. Ray (1993), Ellis and Wilson (2004)).

0 5 10 15 20 25

7.5

8

8.5

9

9.5

10

0 5 10 15 20 25

6.7

6.8

6.9

7

7.1

7.2

7.3

7.4

7.5

0 5 10 15 20 25

7.6

7.8

8

8.2

8.4

8.6

8.8

9

9.2

9.4

0 5 10 15 20 25

6.8

6.9

7

7.1

7.2

7.3

7.4

7.5

Figure 7: Root-mean-square (out-of-sample) forecasting errors over the horizons h = 1, . . . , 25,obtained by fitting the indicated one- and two-sided models over a number of sliding windows (200or 400, resp.) of varying size (400 and 200, resp.).

References

Baek, C., Kechagias, S. & Pipiras, V. (2018), Asymptotics of bivariate local Whittle estimatorswith applications to fractal connectivity, Preprint.

Baillie, R. T. & Morana, C. (2012), ‘Adaptive ARFIMA models with applications to inflation’,Economic Modelling 29(6), 2451–2459.

Baillie, R. T., Chung, C.-F. & Tieslau, M. A. (1996), ‘Analysing inflation by the fractionallyintegrated ARFIMA-GARCH model’, Journal of applied econometrics 11(1), 23–40.

Beran, J., Feng, Y., Ghosh, S. & Kulik, R. (2013), Long-Memory Processes, Springer, Heidelberg.

Brockwell, P. J. & Davis, R. A. (2009), Time Series: Theory and Methods, Springer Series inStatistics, Springer, New York. Reprint of the second (1991) edition.

22

Diongue, A. K. (2010), ‘A multivariate generalized long memory model’, Comptes Rendus Mathe-matique 348(5), 327–330.

Doornik, J. A. & Ooms, M. (2004), ‘Inference and forecasting for ARFIMA models with an appli-cation to US and UK inflation’, Studies in Nonlinear Dynamics & Econometrics.

Doukhan, P., Oppenheim, G. & Taqqu, M. S. (2003), Theory and Applications of Long-RangeDependence, Birkhauser Boston Inc., Boston, MA.

Dueker, M. & Startz, R. (1998), ‘Maximum-likelihood estimation of fractional cointegration with anapplication to US and Canadian bond rates’, Review of Economics and Statistics 80(3), 420–426.

Dufour, J.-M. & Pelletier, D. (2014), ‘Practical methods for modeling weak VARMA processes:identification, estimation and specification with a macroeconomic application’, Preprint.

Ellis, C. & Wilson, P. (2004), ‘Another look at the forecast performance of ARFIMA models’,International Review of Financial Analysis 13(1), 63–81.

Giraitis, L., Koul, H. L. & Surgailis, D. (2012), Large Sample Inference for Long Memory Processes,Imperial College Press, London.

Helgason, H., Pipiras, V. & Abry, P. (2011), ‘Fast and exact synthesis of stationary multivariateGaussian time series using circulant embedding’, Signal Processing 91(5), 1123–1133.

Kechagias, S. & Pipiras, V. (2015), ‘Definitions and representations of multivariate long-rangedependent time series’, Journal of Time Series Analysis 36(1), 1–25.

Lobato, I. N. (1997), ‘Consistency of the averaged cross-periodogram in long memory series’, Journalof Time Series Analysis 18(2), 137–155.

Martin, V. L. & Wilkins, N. P. (1999), ‘Indirect estimation of ARFIMA and VARFIMA models’,Journal of Econometrics 93(1), 149–175.

Pai, J. & Ravishanker, N. (2009a), ‘Maximum likelihood estimation in vector long memory processesvia em algorithm’, Computational Statistics & Data Analysis 53(12), 4133–4142.

Pai, J. & Ravishanker, N. (2009b), ‘A multivariate preconditioned conjugate gradient approachfor maximum likelihood estimation in vector long memory processes’, Statistics & ProbabilityLetters 79(9), 1282–1289.

Palma, W. (2007), Long-Memory Time Series, John Wiley & Sons, Inc., Hoboken, New Jersey,USA.

Park, K. & Willinger, W. (2000), Self-Similar Network Traffic and Performance Evaluation, WileyOnline Library.

Peach, R., Rich, R. & Antoniades, A. (2004), ‘The historical and recent behaviour of goods andservices inflation’, Economic Policy Review pp. 19–31.

Pipiras, V. & Taqqu, M. S. (2017), Long-Range Dependence and Self-Similarity, Cambridge Uni-versity Press, Cambridge.

23

Ravishanker, N. & Ray, B. K. (1997), ‘Bayesian analysis of vector ARFIMA processes’, AustralianJournal of Statistics 39(3), 295–311.

Ray, B. K. (1993), ‘Modeling long-memory processes for optimal long-range prediction’, Journal ofTime Series Analysis 14(5), 511–525.

Robinson, P. M. (2003), Time Series with Long Memory, Advanced Texts in Econometrics, OxfordUniversity Press, Oxford.

Robinson, P. M. (2008), ‘Multiple local Whittle estimation in stationary systems’, The Annals ofStatistics 36(5), 2508–2530.

Sela, R. J. (2010), Three essays in econometrics: multivariate long memory time series and applyingregression trees to longitudinal data, PhD thesis, New York University.

Sela, R. J. & Hurvich, C. M. (2009), ‘Computationally efficient methods for two multivariatefractionally integrated models’, Journal of Time Series Analysis 30(6), 631–651.

Sowell, F. (1986), ‘Fractionally integrated vector time series’, PhD thesis, Duke University.

Tsay, W.-J. (2010), ‘Maximum likelihood estimation of stationary multivariate ARFIMA processes’,Journal of Statistical Computation and Simulation 80(7), 729–745.

24

Identi cation, estimation and applications of a...

Documents

Transcript of Identi cation, estimation and applications of a...