Multivariate Time Series
Consider $n$ time series variables $\{y_{1t}\}, \dots, \{y_{nt}\}$. A multivariate time series is the $(n \times 1)$ vector time series $\{Y_t\}$ where the $i$th element of $Y_t$ is $y_{it}$. That is, for any time $t$, $Y_t = (y_{1t}, \dots, y_{nt})'$.
Multivariate time series analysis is used when one wants to model and explain the interactions and co-movements among a group of time series variables:
• Consumption and income
• Stock prices and dividends
• Forward and spot exchange rates
• Interest rates, money growth, income, and inflation
Stock and Watson state that macroeconometricians do four things with multivariate time series:
1. Describe and summarize macroeconomic data
2. Make macroeconomic forecasts
3. Quantify what we do or do not know about the true structure of the macroeconomy
4. Advise macroeconomic policymakers
Stationary and Ergodic Multivariate Time Series
A multivariate time series Yt is covariance stationary
and ergodic if all of its component time series are
stationary and ergodic.
$$E[Y_t] = \mu = (\mu_1, \dots, \mu_n)'$$
$$\mathrm{var}(Y_t) = \Gamma_0 = E[(Y_t - \mu)(Y_t - \mu)'] =
\begin{pmatrix}
\mathrm{var}(y_{1t}) & \mathrm{cov}(y_{1t}, y_{2t}) & \cdots & \mathrm{cov}(y_{1t}, y_{nt}) \\
\mathrm{cov}(y_{2t}, y_{1t}) & \mathrm{var}(y_{2t}) & \cdots & \mathrm{cov}(y_{2t}, y_{nt}) \\
\vdots & \vdots & \ddots & \vdots \\
\mathrm{cov}(y_{nt}, y_{1t}) & \mathrm{cov}(y_{nt}, y_{2t}) & \cdots & \mathrm{var}(y_{nt})
\end{pmatrix}$$
The correlation matrix of $Y_t$ is the $(n \times n)$ matrix
$$\mathrm{corr}(Y_t) = R_0 = D^{-1}\Gamma_0 D^{-1}$$
where $D$ is an $(n \times n)$ diagonal matrix with $j$th diagonal element $(\gamma_{jj}^0)^{1/2} = \mathrm{var}(y_{jt})^{1/2}$.
The parameters $\mu$, $\Gamma_0$ and $R_0$ are estimated from data $(Y_1, \dots, Y_T)$ using the sample moments
$$\bar{Y} = \frac{1}{T}\sum_{t=1}^{T} Y_t \xrightarrow{p} E[Y_t] = \mu$$
$$\hat{\Gamma}_0 = \frac{1}{T}\sum_{t=1}^{T} (Y_t - \bar{Y})(Y_t - \bar{Y})' \xrightarrow{p} \mathrm{var}(Y_t) = \Gamma_0$$
$$\hat{R}_0 = \hat{D}^{-1}\hat{\Gamma}_0\hat{D}^{-1} \xrightarrow{p} \mathrm{corr}(Y_t) = R_0$$
where $\hat{D}$ is the $(n \times n)$ diagonal matrix with the sample standard deviations of $y_{jt}$ along the diagonal. The Ergodic Theorem justifies convergence of the sample moments to their population counterparts.
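As a concrete illustration, here is a minimal NumPy sketch of these sample moments; the function name sample_moments and the simulated data are illustrative, not part of the notes.

```python
import numpy as np

def sample_moments(Y):
    """Sample mean, covariance matrix Gamma_0_hat (divisor T), and
    correlation matrix R_0_hat for a (T x n) array Y."""
    T, n = Y.shape
    Ybar = Y.mean(axis=0)
    Yc = Y - Ybar                                  # demeaned data
    Gamma0 = Yc.T @ Yc / T                         # sample covariance matrix
    D_inv = np.diag(1.0 / np.sqrt(np.diag(Gamma0)))
    R0 = D_inv @ Gamma0 @ D_inv                    # sample correlation matrix
    return Ybar, Gamma0, R0

# usage on simulated data (illustration only)
rng = np.random.default_rng(0)
mu_hat, Gamma0_hat, R0_hat = sample_moments(rng.standard_normal((500, 3)))
```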
Cross Covariance and Correlation Matrices
With a multivariate time series $Y_t$ each component has autocovariances and autocorrelations, but there are also cross lead-lag covariances and correlations between all possible pairs of components. The autocovariances and autocorrelations of $y_{jt}$ for $j = 1, \dots, n$ are defined as
$$\gamma_{jj}^k = \mathrm{cov}(y_{jt}, y_{j,t-k}), \quad \rho_{jj}^k = \mathrm{corr}(y_{jt}, y_{j,t-k}) = \frac{\gamma_{jj}^k}{\gamma_{jj}^0}$$
and these are symmetric in $k$: $\gamma_{jj}^k = \gamma_{jj}^{-k}$, $\rho_{jj}^k = \rho_{jj}^{-k}$.
The cross lag covariances and cross lag correlations between $y_{it}$ and $y_{jt}$ are defined as
$$\gamma_{ij}^k = \mathrm{cov}(y_{it}, y_{j,t-k}), \quad \rho_{ij}^k = \mathrm{corr}(y_{it}, y_{j,t-k}) = \frac{\gamma_{ij}^k}{\sqrt{\gamma_{ii}^0\gamma_{jj}^0}}$$
and they are not necessarily symmetric in $k$. In general,
$$\gamma_{ij}^k = \mathrm{cov}(y_{it}, y_{j,t-k}) \neq \mathrm{cov}(y_{it}, y_{j,t+k}) = \mathrm{cov}(y_{jt}, y_{i,t-k}) = \gamma_{ij}^{-k}$$
Defn:
• If $\gamma_{ij}^k \neq 0$ for some $k > 0$ then $y_{jt}$ is said to lead $y_{it}$.
• If $\gamma_{ij}^{-k} \neq 0$ for some $k > 0$ then $y_{it}$ is said to lead $y_{jt}$.
• It is possible that $y_{it}$ leads $y_{jt}$ and vice-versa. In this case, there is said to be feedback between the two series.
All of the lag $k$ cross covariances and correlations are summarized in the $(n \times n)$ lag $k$ cross covariance and lag $k$ cross correlation matrices
$$\Gamma_k = E[(Y_t - \mu)(Y_{t-k} - \mu)'] =
\begin{pmatrix}
\mathrm{cov}(y_{1t}, y_{1,t-k}) & \mathrm{cov}(y_{1t}, y_{2,t-k}) & \cdots & \mathrm{cov}(y_{1t}, y_{n,t-k}) \\
\mathrm{cov}(y_{2t}, y_{1,t-k}) & \mathrm{cov}(y_{2t}, y_{2,t-k}) & \cdots & \mathrm{cov}(y_{2t}, y_{n,t-k}) \\
\vdots & \vdots & \ddots & \vdots \\
\mathrm{cov}(y_{nt}, y_{1,t-k}) & \mathrm{cov}(y_{nt}, y_{2,t-k}) & \cdots & \mathrm{cov}(y_{nt}, y_{n,t-k})
\end{pmatrix}$$
$$R_k = D^{-1}\Gamma_k D^{-1}$$
The matrices $\Gamma_k$ and $R_k$ are not symmetric in $k$, but it is easy to show that $\Gamma_{-k} = \Gamma_k'$ and $R_{-k} = R_k'$.
The matrices $\Gamma_k$ and $R_k$ are estimated from data $(Y_1, \dots, Y_T)$ using
$$\hat{\Gamma}_k = \frac{1}{T}\sum_{t=k+1}^{T} (Y_t - \bar{Y})(Y_{t-k} - \bar{Y})'$$
$$\hat{R}_k = \hat{D}^{-1}\hat{\Gamma}_k\hat{D}^{-1}$$
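The following sketch computes $\hat{\Gamma}_k$ and $\hat{R}_k$ directly from these formulas; cross_cov and cross_corr are hypothetical helper names, assuming the data are stored as a (T x n) NumPy array.

```python
import numpy as np

def cross_cov(Y, k):
    """Sample lag-k cross covariance matrix:
    Gamma_k_hat = T^{-1} sum_{t=k+1}^{T} (Y_t - Ybar)(Y_{t-k} - Ybar)'.
    For negative lags use the identity Gamma_{-k} = Gamma_k'."""
    T, n = Y.shape
    Yc = Y - Y.mean(axis=0)
    return Yc[k:].T @ Yc[:T - k] / T    # pairs Y_t with Y_{t-k}

def cross_corr(Y, k):
    """Sample lag-k cross correlation matrix R_k_hat = D^{-1} Gamma_k_hat D^{-1}."""
    D_inv = np.diag(1.0 / np.sqrt(np.diag(cross_cov(Y, 0))))
    return D_inv @ cross_cov(Y, k) @ D_inv
```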
Multivariate Wold Representation
Any $(n \times 1)$ covariance stationary multivariate time series $Y_t$ has a Wold or linear process representation of the form
$$Y_t = \mu + \varepsilon_t + \Psi_1\varepsilon_{t-1} + \Psi_2\varepsilon_{t-2} + \cdots = \mu + \sum_{k=0}^{\infty}\Psi_k\varepsilon_{t-k}$$
$$\Psi_0 = I_n, \quad \varepsilon_t \sim \mathrm{WN}(0, \Sigma)$$
$\Psi_k$ is an $(n \times n)$ matrix with $(i,j)$th element $\psi_{ij}^k$. In lag operator notation, the Wold form is
$$Y_t = \mu + \Psi(L)\varepsilon_t, \quad \Psi(L) = \sum_{k=0}^{\infty}\Psi_k L^k$$
where the elements of $\Psi(L)$ are 1-summable. The moments of $Y_t$ are given by
$$E[Y_t] = \mu, \quad \mathrm{var}(Y_t) = \sum_{k=0}^{\infty}\Psi_k\Sigma\Psi_k'$$
Long Run Variance
Let $Y_t$ be an $(n \times 1)$ stationary and ergodic multivariate time series with $E[Y_t] = \mu$. Anderson's central limit theorem for stationary and ergodic processes states that
$$\sqrt{T}(\bar{Y} - \mu) \xrightarrow{d} N\!\left(0, \sum_{j=-\infty}^{\infty}\Gamma_j\right)$$
or
$$\bar{Y} \overset{A}{\sim} N\!\left(\mu, \frac{1}{T}\sum_{j=-\infty}^{\infty}\Gamma_j\right)$$
Hence, the long-run variance of $Y_t$ is $T$ times the asymptotic variance of $\bar{Y}$:
$$\mathrm{LRV}(Y_t) = T \cdot \mathrm{avar}(\bar{Y}) = \sum_{j=-\infty}^{\infty}\Gamma_j$$
Since $\Gamma_{-j} = \Gamma_j'$, $\mathrm{LRV}(Y_t)$ may alternatively be expressed as
$$\mathrm{LRV}(Y_t) = \Gamma_0 + \sum_{j=1}^{\infty}(\Gamma_j + \Gamma_j')$$
Using the Wold representation of $Y_t$ it can be shown that
$$\mathrm{LRV}(Y_t) = \Psi(1)\Sigma\Psi(1)'$$
where $\Psi(1) = \sum_{k=0}^{\infty}\Psi_k$.
Non-parametric Estimate of the Long-Run Variance
A consistent estimate of $\mathrm{LRV}(Y_t)$ may be computed using non-parametric methods. A popular estimator is the Newey-West weighted autocovariance estimator
$$\widehat{\mathrm{LRV}}_{NW}(Y_t) = \hat{\Gamma}_0 + \sum_{j=1}^{M_T} w_{j,T}\left(\hat{\Gamma}_j + \hat{\Gamma}_j'\right)$$
where the $w_{j,T}$ are weights that decline to zero as $j$ increases and $M_T$ is a truncation lag parameter that satisfies $M_T = O(T^{1/3})$. Usually, the Bartlett weights are used:
$$w_{j,T}^{\mathrm{Bartlett}} = 1 - \frac{j}{M_T + 1}$$
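A minimal implementation of the Newey-West estimator with Bartlett weights follows, assuming a (T x n) NumPy array; the default bandwidth rule $4(T/100)^{1/3}$ is a common choice but is not specified in the text.

```python
import numpy as np

def lrv_newey_west(Y, M=None):
    """Newey-West long-run variance estimate:
    LRV_hat = Gamma_0_hat + sum_{j=1}^{M} w_j (Gamma_j_hat + Gamma_j_hat')
    with Bartlett weights w_j = 1 - j / (M + 1)."""
    T, n = Y.shape
    if M is None:
        M = int(np.floor(4 * (T / 100.0) ** (1.0 / 3.0)))  # common rule of thumb
    Yc = Y - Y.mean(axis=0)
    lrv = Yc.T @ Yc / T                        # Gamma_0_hat
    for j in range(1, M + 1):
        Gj = Yc[j:].T @ Yc[:T - j] / T         # Gamma_j_hat
        lrv += (1.0 - j / (M + 1.0)) * (Gj + Gj.T)
    return lrv
```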
Vector Autoregression Models
The vector autoregression (VAR) model is one of the
most successful, flexible, and easy to use models for
the analysis of multivariate time series.
• Made famous in Chris Sims's paper "Macroeconomics and Reality," ECTA 1980.
• It is a natural extension of the univariate autoregressive model to dynamic multivariate time series.
• Has proven to be especially useful for describing the dynamic behavior of economic and financial time series and for forecasting.
• It often provides superior forecasts to those from univariate time series models and elaborate theory-based simultaneous equations models.
• Used for structural inference and policy analysis. In structural analysis, certain assumptions about the causal structure of the data under investigation are imposed, and the resulting causal impacts of unexpected shocks or innovations to specified variables on the variables in the model are summarized. These causal impacts are usually summarized with impulse response functions and forecast error variance decompositions.
The Stationary Vector Autoregression Model
Let $Y_t = (y_{1t}, y_{2t}, \dots, y_{nt})'$ denote an $(n \times 1)$ vector of time series variables. The basic $p$-lag vector autoregressive (VAR($p$)) model has the form
$$Y_t = c + \Pi_1 Y_{t-1} + \Pi_2 Y_{t-2} + \cdots + \Pi_p Y_{t-p} + \varepsilon_t, \quad \varepsilon_t \sim \mathrm{WN}(0, \Sigma)$$
Example: Bivariate VAR(2)
$$\begin{pmatrix} y_{1t} \\ y_{2t} \end{pmatrix} =
\begin{pmatrix} c_1 \\ c_2 \end{pmatrix} +
\begin{pmatrix} \pi_{11}^1 & \pi_{12}^1 \\ \pi_{21}^1 & \pi_{22}^1 \end{pmatrix}
\begin{pmatrix} y_{1,t-1} \\ y_{2,t-1} \end{pmatrix} +
\begin{pmatrix} \pi_{11}^2 & \pi_{12}^2 \\ \pi_{21}^2 & \pi_{22}^2 \end{pmatrix}
\begin{pmatrix} y_{1,t-2} \\ y_{2,t-2} \end{pmatrix} +
\begin{pmatrix} \varepsilon_{1t} \\ \varepsilon_{2t} \end{pmatrix}$$
or
$$y_{1t} = c_1 + \pi_{11}^1 y_{1,t-1} + \pi_{12}^1 y_{2,t-1} + \pi_{11}^2 y_{1,t-2} + \pi_{12}^2 y_{2,t-2} + \varepsilon_{1t}$$
$$y_{2t} = c_2 + \pi_{21}^1 y_{1,t-1} + \pi_{22}^1 y_{2,t-1} + \pi_{21}^2 y_{1,t-2} + \pi_{22}^2 y_{2,t-2} + \varepsilon_{2t}$$
where $\mathrm{cov}(\varepsilon_{1t}, \varepsilon_{2s}) = \sigma_{12}$ for $t = s$ and $0$ otherwise. (A simulation sketch of this system appears after the remarks below.)
Remarks:
• Each equation has the same regressors: lagged values of $y_{1t}$ and $y_{2t}$.
• Endogeneity is avoided by using lagged values of $y_{1t}$ and $y_{2t}$.
• The VAR($p$) model is just a seemingly unrelated regression (SUR) model with lagged variables and deterministic terms as common regressors.
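To make the bivariate VAR(2) concrete, here is a minimal simulation sketch; the coefficient values, intercepts, error covariance, and seed are illustrative, not from the text.

```python
import numpy as np

# Simulation sketch for the bivariate VAR(2) above.
rng = np.random.default_rng(42)
c = np.array([0.1, 0.2])
Pi1 = np.array([[0.5, 0.1],
                [0.2, 0.3]])
Pi2 = np.array([[0.1, 0.0],
                [0.0, 0.1]])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 1.0]])    # sigma_12 = cov(eps_1t, eps_2t) = 0.3

T = 500
eps = rng.multivariate_normal(np.zeros(2), Sigma, size=T)
Y = np.zeros((T, 2))
for t in range(2, T):
    Y[t] = c + Pi1 @ Y[t - 1] + Pi2 @ Y[t - 2] + eps[t]
```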
In lag operator notation, the VAR($p$) is written as
$$\Pi(L)Y_t = c + \varepsilon_t, \quad \Pi(L) = I_n - \Pi_1 L - \cdots - \Pi_p L^p$$
The VAR($p$) is stable if the roots of
$$\det(I_n - \Pi_1 z - \cdots - \Pi_p z^p) = 0$$
lie outside the complex unit circle (have modulus greater than one), or, equivalently, if the eigenvalues of the companion matrix
$$F = \begin{pmatrix}
\Pi_1 & \Pi_2 & \cdots & \Pi_{p-1} & \Pi_p \\
I_n & 0 & \cdots & 0 & 0 \\
0 & I_n & \cdots & 0 & 0 \\
\vdots & & \ddots & & \vdots \\
0 & 0 & \cdots & I_n & 0
\end{pmatrix}$$
have modulus less than one. A stable VAR($p$) process is stationary and ergodic with time-invariant means, variances, and autocovariances.
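A sketch of the eigenvalue stability check follows; is_stable is a hypothetical helper, shown on illustrative coefficient matrices.

```python
import numpy as np

def is_stable(Pis):
    """Check VAR(p) stability: stack [Pi_1 ... Pi_p] into the companion
    matrix F and verify that every eigenvalue has modulus < 1."""
    p = len(Pis)
    n = Pis[0].shape[0]
    F = np.zeros((n * p, n * p))
    F[:n, :] = np.hstack(Pis)             # top block row: Pi_1, ..., Pi_p
    if p > 1:
        F[n:, :-n] = np.eye(n * (p - 1))  # identity blocks below the top row
    eigvals = np.linalg.eigvals(F)
    return bool(np.all(np.abs(eigvals) < 1)), eigvals

# illustrative coefficient matrices (not from the text)
Pi1 = np.array([[0.5, 0.1], [0.2, 0.3]])
Pi2 = np.array([[0.1, 0.0], [0.0, 0.1]])
stable, eigs = is_stable([Pi1, Pi2])
```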
Example: Stability of bivariate VAR(1) model
$$Y_t = \Pi Y_{t-1} + \varepsilon_t$$
$$\begin{pmatrix} y_{1t} \\ y_{2t} \end{pmatrix} =
\begin{pmatrix} \pi_{11} & \pi_{12} \\ \pi_{21} & \pi_{22} \end{pmatrix}
\begin{pmatrix} y_{1,t-1} \\ y_{2,t-1} \end{pmatrix} +
\begin{pmatrix} \varepsilon_{1t} \\ \varepsilon_{2t} \end{pmatrix}$$
Then $\det(I_n - \Pi z) = 0$ becomes
$$(1 - \pi_{11}z)(1 - \pi_{22}z) - \pi_{12}\pi_{21}z^2 = 0$$
• The stability condition involves the cross terms $\pi_{12}$ and $\pi_{21}$.
• If $\pi_{12} = \pi_{21} = 0$ (diagonal VAR) then the bivariate stability condition reduces to the univariate stability conditions for each equation.
If $Y_t$ is covariance stationary, then the unconditional mean is given by
$$\mu = (I_n - \Pi_1 - \cdots - \Pi_p)^{-1}c$$
The mean-adjusted form of the VAR($p$) is then
$$Y_t - \mu = \Pi_1(Y_{t-1} - \mu) + \Pi_2(Y_{t-2} - \mu) + \cdots + \Pi_p(Y_{t-p} - \mu) + \varepsilon_t$$
The basic VAR($p$) model may be too restrictive to represent sufficiently the main characteristics of the data. The general form of the VAR($p$) model with deterministic terms and exogenous variables is given by
$$Y_t = \Pi_1 Y_{t-1} + \Pi_2 Y_{t-2} + \cdots + \Pi_p Y_{t-p} + \Phi D_t + G X_t + \varepsilon_t$$
where $D_t$ denotes deterministic terms and $X_t$ denotes exogenous variables with $E[X_t\varepsilon_t] = 0$.
Wold Representation
Consider the stationary VAR(p) model
$$\Pi(L)Y_t = c + \varepsilon_t, \quad \Pi(L) = I_n - \Pi_1 L - \cdots - \Pi_p L^p$$
Since $Y_t$ is stationary, $\Pi(L)^{-1}$ exists, so that
$$Y_t = \Pi(L)^{-1}c + \Pi(L)^{-1}\varepsilon_t = \mu + \sum_{k=0}^{\infty}\Psi_k\varepsilon_{t-k}$$
with $\Psi_0 = I_n$ and $\lim_{k\to\infty}\Psi_k = 0$. Note that
$$\Pi(L)^{-1} = \Psi(L) = \sum_{k=0}^{\infty}\Psi_k L^k$$
The Wold coefficients $\Psi_k$ may be determined from the VAR coefficients $\Pi_k$ by solving
$$\Pi(L)\Psi(L) = I_n$$
which implies
$$\Psi_1 = \Pi_1$$
$$\Psi_2 = \Pi_1\Psi_1 + \Pi_2 = \Pi_1\Pi_1 + \Pi_2$$
$$\vdots$$
$$\Psi_s = \Pi_1\Psi_{s-1} + \cdots + \Pi_p\Psi_{s-p}$$
with $\Psi_0 = I_n$ and $\Psi_j = 0$ for $j < 0$.
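A sketch of this recursion in NumPy; wold_psi is a hypothetical helper taking the list $[\Pi_1, \dots, \Pi_p]$.

```python
import numpy as np

def wold_psi(Pis, n_terms):
    """Wold matrices via Psi_s = Pi_1 Psi_{s-1} + ... + Pi_p Psi_{s-p},
    with Psi_0 = I_n and Psi_j = 0 for j < 0."""
    p = len(Pis)
    n = Pis[0].shape[0]
    Psi = [np.eye(n)]
    for s in range(1, n_terms + 1):
        acc = np.zeros((n, n))
        for j in range(1, min(s, p) + 1):
            acc += Pis[j - 1] @ Psi[s - j]
        Psi.append(acc)
    return Psi   # Psi[s] is the (n x n) matrix Psi_s
```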
Since $\Pi(L)^{-1} = \Psi(L)$, the long-run variance of $Y_t$ has the form
$$\mathrm{LRV}_{VAR}(Y_t) = \Psi(1)\Sigma\Psi(1)' = \Pi(1)^{-1}\Sigma\left(\Pi(1)^{-1}\right)' = (I_n - \Pi_1 - \cdots - \Pi_p)^{-1}\Sigma\left[(I_n - \Pi_1 - \cdots - \Pi_p)^{-1}\right]'$$
Estimation
Assume that the VAR($p$) model is covariance stationary and there are no restrictions on the parameters of the model. In SUR notation, each equation in the VAR($p$) may be written as
$$y_i = Z\pi_i + e_i, \quad i = 1, \dots, n$$
• $y_i$ is a $(T \times 1)$ vector of observations on the $i$th equation
• $Z$ is a $(T \times k)$ matrix with $t$th row given by $Z_t' = (1, Y_{t-1}', \dots, Y_{t-p}')$
• $k = np + 1$
• $\pi_i$ is a $(k \times 1)$ vector of parameters and $e_i$ is a $(T \times 1)$ error with covariance matrix $\sigma_i^2 I_T$
Since the VAR($p$) is in the form of a SUR model where each equation has the same explanatory variables, each equation may be estimated separately by ordinary least squares without losing efficiency relative to generalized least squares. Let
$$\hat{\Pi} = [\hat{\pi}_1, \dots, \hat{\pi}_n]$$
denote the $(k \times n)$ matrix of least squares coefficients for the $n$ equations, and let
$$\mathrm{vec}(\hat{\Pi}) = \begin{pmatrix} \hat{\pi}_1 \\ \vdots \\ \hat{\pi}_n \end{pmatrix}$$
Under standard assumptions regarding the behavior of stationary and ergodic VAR models (see Hamilton (1994)), $\mathrm{vec}(\hat{\Pi})$ is consistent and asymptotically normally distributed with estimated asymptotic covariance matrix
$$\widehat{\mathrm{avar}}(\mathrm{vec}(\hat{\Pi})) = \hat{\Sigma} \otimes (Z'Z)^{-1}$$
where
$$\hat{\Sigma} = \frac{1}{T - k}\sum_{t=1}^{T}\hat{\varepsilon}_t\hat{\varepsilon}_t', \quad \hat{\varepsilon}_t = Y_t - \hat{\Pi}'Z_t$$
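A minimal equation-by-equation OLS sketch following the SUR setup above; var_ols is a hypothetical helper and the layout of $Z$ matches the definition of $Z_t'$.

```python
import numpy as np

def var_ols(Y, p):
    """Equation-by-equation OLS for a VAR(p) with intercept.
    Y is (T_full x n); returns Pi_hat (k x n, k = n*p + 1), Sigma_hat, residuals."""
    T_full, n = Y.shape
    T = T_full - p                               # effective sample size
    Z = np.ones((T, n * p + 1))                  # row t: (1, Y_{t-1}', ..., Y_{t-p}')
    for j in range(1, p + 1):
        Z[:, 1 + (j - 1) * n: 1 + j * n] = Y[p - j: T_full - j]
    Yt = Y[p:]
    Pi_hat, *_ = np.linalg.lstsq(Z, Yt, rcond=None)   # column i is pi_i_hat
    resid = Yt - Z @ Pi_hat                      # row t is eps_t_hat'
    k = n * p + 1
    Sigma_hat = resid.T @ resid / (T - k)        # divisor T - k, as in the text
    return Pi_hat, Sigma_hat, resid
```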
Lag Length Selection
The lag length for the VAR($p$) model may be determined using model selection criteria. The general approach is to fit VAR($p$) models with orders $p = 0, \dots, p_{max}$ and choose the value of $p$ which minimizes some model selection criterion
$$\mathrm{MSC}(p) = \ln|\tilde{\Sigma}(p)| + c_T \cdot \varphi(n, p)$$
$$\tilde{\Sigma}(p) = T^{-1}\sum_{t=1}^{T}\hat{\varepsilon}_t\hat{\varepsilon}_t'$$
where $c_T$ is a function of the sample size and $\varphi(n, p)$ is a penalty function. The three most common information criteria are the Akaike (AIC), Schwarz-Bayesian (BIC) and Hannan-Quinn (HQ):
$$\mathrm{AIC}(p) = \ln|\tilde{\Sigma}(p)| + \frac{2}{T}pn^2$$
$$\mathrm{BIC}(p) = \ln|\tilde{\Sigma}(p)| + \frac{\ln T}{T}pn^2$$
$$\mathrm{HQ}(p) = \ln|\tilde{\Sigma}(p)| + \frac{2\ln\ln T}{T}pn^2$$
Remarks:
• The AIC criterion asymptotically overestimates the order with positive probability.
• The BIC and HQ criteria estimate the order consistently under fairly general conditions if the true order $p$ is less than or equal to $p_{max}$.
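A sketch of the selection procedure using these criteria; it reuses the var_ols helper from the Estimation sketch, and fitting every order on a common sample (dropping the first pmax observations) is one standard convention, not mandated by the text.

```python
import numpy as np

def select_lag(Y, pmax):
    """Compute AIC, BIC, HQ for VAR(p), p = 1..pmax, on a common sample."""
    n = Y.shape[1]
    out = {}
    for p in range(1, pmax + 1):
        _, _, resid = var_ols(Y[pmax - p:], p)   # var_ols: see Estimation sketch
        T = resid.shape[0]
        _, logdet = np.linalg.slogdet(resid.T @ resid / T)  # ln|Sigma_tilde(p)|
        pen = p * n ** 2
        out[p] = {"AIC": logdet + 2.0 / T * pen,
                  "BIC": logdet + np.log(T) / T * pen,
                  "HQ":  logdet + 2.0 * np.log(np.log(T)) / T * pen}
    return out
```

In practice, statsmodels reports the same criteria through VAR(Y).select_order(maxlags=pmax).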
Granger Causality
One of the main uses of VAR models is forecasting. The following intuitive notion of a variable's forecasting ability is due to Granger (1969).
• If a variable, or group of variables, $y_1$ is found to be helpful for predicting another variable, or group of variables, $y_2$, then $y_1$ is said to Granger-cause $y_2$; otherwise it is said to fail to Granger-cause $y_2$.
• Formally, $y_1$ fails to Granger-cause $y_2$ if for all $s > 0$ the MSE of a forecast of $y_{2,t+s}$ based on $(y_{2,t}, y_{2,t-1}, \dots)$ is the same as the MSE of a forecast of $y_{2,t+s}$ based on $(y_{2,t}, y_{2,t-1}, \dots)$ and $(y_{1,t}, y_{1,t-1}, \dots)$.
• The notion of Granger causality does not imply true causality. It only implies forecasting ability.
Example: Bivariate VAR Model
In a bivariate VAR($p$), $y_2$ fails to Granger-cause $y_1$ if all of the $p$ VAR coefficient matrices $\Pi_1, \dots, \Pi_p$ are lower triangular:
$$\begin{pmatrix} y_{1t} \\ y_{2t} \end{pmatrix} =
\begin{pmatrix} c_1 \\ c_2 \end{pmatrix} +
\begin{pmatrix} \pi_{11}^1 & 0 \\ \pi_{21}^1 & \pi_{22}^1 \end{pmatrix}
\begin{pmatrix} y_{1,t-1} \\ y_{2,t-1} \end{pmatrix} + \cdots +
\begin{pmatrix} \pi_{11}^p & 0 \\ \pi_{21}^p & \pi_{22}^p \end{pmatrix}
\begin{pmatrix} y_{1,t-p} \\ y_{2,t-p} \end{pmatrix} +
\begin{pmatrix} \varepsilon_{1t} \\ \varepsilon_{2t} \end{pmatrix}$$
Similarly, $y_1$ fails to Granger-cause $y_2$ if all of the coefficients on lagged values of $y_1$ are zero in the equation for $y_2$. Notice that if $y_2$ fails to Granger-cause $y_1$ and $y_1$ fails to Granger-cause $y_2$, then the VAR coefficient matrices $\Pi_1, \dots, \Pi_p$ are diagonal.
The $p$ linear coefficient restrictions implied by Granger non-causality may be tested using the Wald statistic
$$\mathrm{Wald} = (R \cdot \mathrm{vec}(\hat{\Pi}) - r)'\left\{R\left[\widehat{\mathrm{avar}}(\mathrm{vec}(\hat{\Pi}))\right]R'\right\}^{-1}(R \cdot \mathrm{vec}(\hat{\Pi}) - r)$$
Remark: In the bivariate model, testing $H_0$: $y_2$ does not Granger-cause $y_1$ reduces to testing $H_0: \pi_{12}^1 = \cdots = \pi_{12}^p = 0$ in the linear regression
$$y_{1t} = c_1 + \pi_{11}^1 y_{1,t-1} + \cdots + \pi_{11}^p y_{1,t-p} + \pi_{12}^1 y_{2,t-1} + \cdots + \pi_{12}^p y_{2,t-p} + \varepsilon_{1t}$$
The test statistic is a simple F-statistic.
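A sketch of this F-test for the bivariate case, comparing the restricted regression (own lags only) with the unrestricted regression above; granger_f_test is a hypothetical helper for two 1-d series.

```python
import numpy as np

def granger_f_test(y1, y2, p):
    """F-test of H0: y2 does not Granger-cause y1 in a bivariate VAR(p),
    via restricted vs. unrestricted OLS for the y1 equation."""
    T_full = len(y1)
    T = T_full - p
    y = y1[p:]
    own = np.column_stack([y1[p - j: T_full - j] for j in range(1, p + 1)])
    other = np.column_stack([y2[p - j: T_full - j] for j in range(1, p + 1)])
    ones = np.ones((T, 1))

    def rss(X):
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        e = y - X @ beta
        return e @ e

    X_u = np.hstack([ones, own, other])   # unrestricted: own and cross lags
    X_r = np.hstack([ones, own])          # restricted: own lags only
    k_u = X_u.shape[1]                    # 2p + 1 parameters
    F = ((rss(X_r) - rss(X_u)) / p) / (rss(X_u) / (T - k_u))
    return F                              # compare to F(p, T - k_u)
```

With statsmodels, the same hypothesis can be tested via the test_causality method of a fitted VAR results object.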
Example: Trivariate VAR Model
In a trivariate VAR($p$), $y_2$ and $y_3$ fail to Granger-cause $y_1$ if $\pi_{12}^j = \pi_{13}^j = 0$ for all $j$:
$$\begin{pmatrix} y_{1t} \\ y_{2t} \\ y_{3t} \end{pmatrix} =
\begin{pmatrix} c_1 \\ c_2 \\ c_3 \end{pmatrix} +
\begin{pmatrix} \pi_{11}^1 & 0 & 0 \\ \pi_{21}^1 & \pi_{22}^1 & \pi_{23}^1 \\ \pi_{31}^1 & \pi_{32}^1 & \pi_{33}^1 \end{pmatrix}
\begin{pmatrix} y_{1,t-1} \\ y_{2,t-1} \\ y_{3,t-1} \end{pmatrix} + \cdots +
\begin{pmatrix} \pi_{11}^p & 0 & 0 \\ \pi_{21}^p & \pi_{22}^p & \pi_{23}^p \\ \pi_{31}^p & \pi_{32}^p & \pi_{33}^p \end{pmatrix}
\begin{pmatrix} y_{1,t-p} \\ y_{2,t-p} \\ y_{3,t-p} \end{pmatrix} +
\begin{pmatrix} \varepsilon_{1t} \\ \varepsilon_{2t} \\ \varepsilon_{3t} \end{pmatrix}$$
Note: One can also use a simple F-statistic from a linear regression in this situation:
$$y_{1t} = c_1 + \pi_{11}^1 y_{1,t-1} + \cdots + \pi_{11}^p y_{1,t-p} + \pi_{12}^1 y_{2,t-1} + \cdots + \pi_{12}^p y_{2,t-p} + \pi_{13}^1 y_{3,t-1} + \cdots + \pi_{13}^p y_{3,t-p} + \varepsilon_{1t}$$
Example: Trivariate VAR Model
In a trivariate VAR($p$), $y_3$ fails to Granger-cause $y_1$ and $y_2$ if all of the $p$ VAR coefficient matrices $\Pi_1, \dots, \Pi_p$ are lower triangular:
$$\begin{pmatrix} y_{1t} \\ y_{2t} \\ y_{3t} \end{pmatrix} =
\begin{pmatrix} c_1 \\ c_2 \\ c_3 \end{pmatrix} +
\begin{pmatrix} \pi_{11}^1 & 0 & 0 \\ \pi_{21}^1 & \pi_{22}^1 & 0 \\ \pi_{31}^1 & \pi_{32}^1 & \pi_{33}^1 \end{pmatrix}
\begin{pmatrix} y_{1,t-1} \\ y_{2,t-1} \\ y_{3,t-1} \end{pmatrix} + \cdots +
\begin{pmatrix} \pi_{11}^p & 0 & 0 \\ \pi_{21}^p & \pi_{22}^p & 0 \\ \pi_{31}^p & \pi_{32}^p & \pi_{33}^p \end{pmatrix}
\begin{pmatrix} y_{1,t-p} \\ y_{2,t-p} \\ y_{3,t-p} \end{pmatrix} +
\begin{pmatrix} \varepsilon_{1t} \\ \varepsilon_{2t} \\ \varepsilon_{3t} \end{pmatrix}$$
Note: We cannot use a simple F-statistic in this case. We must use the general Wald statistic.
Forecasting Algorithms
Forecasting from a VAR($p$) is a straightforward extension of forecasting from an AR($p$). The multivariate Wold form is
$$Y_t = \mu + \varepsilon_t + \Psi_1\varepsilon_{t-1} + \Psi_2\varepsilon_{t-2} + \cdots$$
$$Y_{t+h} = \mu + \varepsilon_{t+h} + \Psi_1\varepsilon_{t+h-1} + \cdots + \Psi_{h-1}\varepsilon_{t+1} + \Psi_h\varepsilon_t + \cdots$$
$$\varepsilon_t \sim \mathrm{WN}(0, \Sigma)$$
Note that
$$E[Y_t] = \mu$$
$$\mathrm{var}(Y_t) = E[(Y_t - \mu)(Y_t - \mu)'] = E\!\left[\left(\sum_{k=0}^{\infty}\Psi_k\varepsilon_{t-k}\right)\!\left(\sum_{k=0}^{\infty}\Psi_k\varepsilon_{t-k}\right)'\right] = \sum_{k=0}^{\infty}\Psi_k\Sigma\Psi_k'$$
The minimum MSE linear forecast of $Y_{t+h}$ based on the information available at time $t$, $I_t$, is
$$Y_{t+h|t} = \mu + \Psi_h\varepsilon_t + \Psi_{h+1}\varepsilon_{t-1} + \cdots$$
The forecast error is
$$\varepsilon_{t+h|t} = Y_{t+h} - Y_{t+h|t} = \varepsilon_{t+h} + \Psi_1\varepsilon_{t+h-1} + \cdots + \Psi_{h-1}\varepsilon_{t+1}$$
The forecast error MSE is
$$\mathrm{MSE}(\varepsilon_{t+h|t}) = E[\varepsilon_{t+h|t}\varepsilon_{t+h|t}'] = \Sigma + \Psi_1\Sigma\Psi_1' + \cdots + \Psi_{h-1}\Sigma\Psi_{h-1}' = \sum_{s=0}^{h-1}\Psi_s\Sigma\Psi_s'$$
Chain-rule of Forecasting
The best linear predictor, in terms of minimum mean squared error (MSE), of $Y_{T+1}$, or 1-step forecast based on information available at time $T$, is
$$Y_{T+1|T} = c + \Pi_1 Y_T + \cdots + \Pi_p Y_{T-p+1}$$
Forecasts for longer horizons $h$ ($h$-step forecasts) may be obtained using the chain-rule of forecasting:
$$Y_{T+h|T} = c + \Pi_1 Y_{T+h-1|T} + \cdots + \Pi_p Y_{T+h-p|T}$$
where $Y_{T+j|T} = Y_{T+j}$ for $j \leq 0$.
Note: The chain-rule may be derived from the state-space framework (assume $c = 0$):
$$\begin{pmatrix} Y_t \\ Y_{t-1} \\ \vdots \\ Y_{t-p+1} \end{pmatrix} =
\begin{pmatrix}
\Pi_1 & \Pi_2 & \cdots & \Pi_p \\
I_n & 0 & \cdots & 0 \\
\vdots & \ddots & & \vdots \\
0 & \cdots & I_n & 0
\end{pmatrix}
\begin{pmatrix} Y_{t-1} \\ Y_{t-2} \\ \vdots \\ Y_{t-p} \end{pmatrix} +
\begin{pmatrix} \varepsilon_t \\ 0 \\ \vdots \\ 0 \end{pmatrix}$$
or $\xi_t = F\xi_{t-1} + v_t$. Then
$$\xi_{t+1|t} = F\xi_t$$
$$\xi_{t+2|t} = F\xi_{t+1|t} = F^2\xi_t$$
$$\vdots$$
$$\xi_{t+h|t} = F\xi_{t+h-1|t} = F^h\xi_t$$
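A minimal sketch of the chain-rule recursion; forecast_var is a hypothetical helper taking the fitted intercept $c$ and the list $[\Pi_1, \dots, \Pi_p]$.

```python
import numpy as np

def forecast_var(Y, c, Pis, h):
    """Chain-rule forecasts Y_{T+1|T}, ..., Y_{T+h|T} from a fitted VAR(p):
    each step feeds earlier forecasts back in (Y_{T+j|T} = Y_{T+j} for j <= 0)."""
    p = len(Pis)
    hist = [Y[-j] for j in range(1, p + 1)]   # [Y_T, Y_{T-1}, ..., Y_{T-p+1}]
    out = []
    for _ in range(h):
        f = c + sum(Pi @ lag for Pi, lag in zip(Pis, hist))
        out.append(f)
        hist = [f] + hist[:-1]                # shift the forecast into the history
    return np.array(out)
```

With statsmodels, the analogous computation is performed by the forecast method of a fitted VAR results object.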