
Page 1: Large Time-Varying Parameter VAR: A nonparametric approachcepr.org/sites/default/files/events/1830_VENDITTI - Large... · 2015-06-10 · Large Time-Varying Parameter VAR: A nonparametric

Large Time-Varying Parameter VAR: A nonparametric approach

Fabrizio Venditti, Banca d’Italia

George Kapetanios, Queen Mary, University of London

Massimiliano Marcellino, Bocconi University, IGIER and CEPR

First version: August 2014; this version: May 2015

Abstract

In this paper we introduce a nonparametric estimation method for a large Vector Autoregression (VAR) with time varying parameters. The method is computationally efficient and can accommodate information sets as large as those typically handled by factor models and Factor Augmented VARs (FAVAR). When applied to the problem of forecasting key macroeconomic variables, the method outperforms constant parameter benchmarks and large Bayesian VARs with time varying parameters. The tool can also be used for structural analysis. As an example, we study the time varying effects of oil price shocks on sectoral U.S. industrial output. We find that the declining role of oil prices in shaping U.S. business cycle fluctuations stems from lower supply effects in sectors related to Business Equipment and Materials, rather than through weaker demand for vehicles as maintained by part of the literature.

JEL classification: C14, C32, C53, C55

Keywords: Large VARs, Time Varying Parameters, Non Parametric Estimation, Forecasting, Impulse Response Analysis


1 Introduction

The last ten years have seen a growing interest in empirical models of large dimensions and with flexible parameter structures, capable of accommodating breaks in the relationships among economic time series.

On the large model front, various approaches based on data reduction and on parameter shrinkage have been explored. The two approaches rest on very different premises: the former summarises large amounts of data in suitably formed linear combinations (factors); the latter uses all the available information but steers the model parameters towards values that are a priori plausible, thereby trying to avoid the problem of overfitting due to parameter proliferation. Despite different premises, the approaches tend to reach similar conclusions since, as shown by De Mol et al. (2008), both methods stabilise OLS estimation by regularising the covariance matrix of the regressors. Banbura et al. (2010) further bridge these two branches of the literature and study the performance of Vector Autoregressions (VARs) that use rich information sets. They show that the forecasting performance of large VARs can be improved by imposing a Minnesota prior on the parameters and increasing its tightness as the information set increases.

As for time varying parameter (TVP) models, three different approaches have emerged in the literature. In a Bayesian context, Cogley and Sargent (2005) and Primiceri (2005) have separately developed Bayesian estimation procedures for models with time varying coefficients and stochastic volatilities. Both approaches rest on computationally intensive methods that strongly limit the scale of the models. A second strand of the literature assumes that model parameters evolve deterministically over time as a linear function of the likelihood score, rather than being subject to random shocks (Creal et al., 2013). In this context maximum likelihood estimation becomes feasible and more general assumptions on the distribution of the errors can be postulated, an avenue recently explored by Delle Monache and Petrella (2014), who study the implications of introducing Student-t distributed errors in a time varying AR model. Finally, in the nonparametric framework, Giraitis et al. (2014) and Giraitis et al. (2012b) explore the properties of kernel estimators in the context of univariate and multivariate dynamic models in which the coefficients evolve stochastically over time. Building on previous work by Giraitis et al. (2012a) on robust procedures for forecasting time series, they show that time varying parameters can be estimated by suitably discounting distant data, and provide details on how to choose the degree of such discounting.1 Besides being computationally efficient, the estimator proposed by Giraitis et al. (2012b) has desirable frequentist properties such as consistency and asymptotic normality. Connections between large models and time varying parameters have been recently established mainly for Factor Augmented VAR (FAVAR) models, see for example Eickmeier et al. (2015) and Mumtaz and Surico (2012), while they are still scant

1 A related strand of the literature focusses on the choice of an optimal window for forecasting with rolling regressions in the presence of structural breaks, see Rossi et al. (2014), Pesaran and Timmermann (2007) and Pesaran and Pick (2011).


in the VAR literature. One exception is the paper by Koop and Korobilis (2013), where the restricted Kalman filter by Raftery et al. (2010) is used to make a TVP-VAR suitable for large information sets. The model has a parametric nature, i.e. it assumes a specific law of motion for the coefficients, in line with the small TVP-VAR literature. This approach, while solving some issues, presents some shortcomings. First, the curse of dimensionality is only partially addressed since the parametric nature of the model requires the use of the Kalman filter, implicitly limiting the size of the model. In practice this framework can handle neither the large information sets employed in the factor model and FAVAR literature nor the large number of lags that are used when fitting VARs to monthly data, as in Banbura et al. (2010). Second, if the true data generating process is different from the postulated random walk type variation, the model is misspecified and the Kalman filter loses its optimality properties.

In this paper we propose an estimator that addresses both these problems. Our idea is to start from the nonparametric estimator proposed by Giraitis et al. (2012b, henceforth GKY) and adapt it to handle large information sets. To solve the issue of overfitting that arises when the size of the VAR increases, we resort to the mixed estimator by Theil and Goldberger (1960), which imposes stochastic constraints on the model coefficients, thereby mimicking in a classical context the role of the prior in Bayesian models. The resulting estimator mixes sample and non sample (the stochastic constraints) information to shrink the model parameters, and can be seen both as a generalization to a time varying parameter structure of the model by Banbura et al. (2010) and as a penalized regression version of the Giraitis et al. (2012b) estimator. Our estimator depends crucially on two parameters. On the one hand, as in all nonparametric methods, a tuning parameter regulates the width of the kernel window that is used to discount past data when forming the current estimate of the model coefficients. On the other hand, as in penalised regressions, there is a penalty parameter that determines the severity of the constraints imposed to control overfitting, akin to the prior tightness in Bayesian estimators. In the paper we explore a variety of cross-validation techniques to set these two parameters based on past model performance. The proposed method addresses both the issues left unresolved by the parametric estimator of Koop and Korobilis (2013). First, the nonparametric approach is, by nature, robust to changes in the underlying data generating process. Second, for popular shrinkage methods estimation can actually proceed equation by equation. This implies that the estimator can cope with systems as large as those analysed in the FAVAR and factor model literature.

We examine the use of the proposed estimator through a number of applications. First, we explore whether time variation is indeed a necessary feature of the model to successfully forecast a large panel (78 variables) of U.S. monthly macroeconomic time series. The analysis indicates that the introduction of time variation in the model parameters yields an improvement in prediction accuracy over models with constant coefficients, in particular when forecast combination is used to pool forecasts obtained with models with different degrees of time variation and shrinkage.


Having validated our approach against constant coefficient benchmarks, we contrast its performance in terms of forecast accuracy with that of the parametric TVP-VAR proposed by Koop and Korobilis (2013), both in a Monte Carlo experiment and with actual data. The evidence we gather from both studies is quite supportive of our approach. In the Monte Carlo exercise we find that when the data generating process matches exactly the one assumed in the parametric setup, the two estimators give broadly similar results. Yet, as we move away from this assumption, the performance of the parametric estimator deteriorates markedly, while our estimator proves quite robust to changes in the underlying data generating process. Moreover, in the empirical application the forecasts derived from our estimator are either similar to or more accurate than those obtained with the parametric VAR.

Finally, we further illustrate the range of applicability of the model by revisiting, in the context of a large information set, the issue of the changing effects of oil price shocks on economic activity, a question that has spurred a large number of studies in the applied macro literature in recent years, see for example Blanchard and Gali (2007), Blanchard and Riggi (2013) and Baumeister and Peersman (2013), among others. The use of a large information set allows us to take a more granular view and to uncover some interesting findings on the evolving impact of oil price shocks on the output of different sectors of U.S. industry.

The paper is structured as follows. Section 2 describes the estimation method. In Section 3 we present the main forecasting exercise. In Section 4 we compare our nonparametric method with the estimator by Koop and Korobilis (2013). In Section 5 we present an analysis of the time varying impact of oil price shocks on U.S. industrial production. In Section 6 we conclude.

2 Setup of the problem

Write a p-th order TVP-VAR with n variables as:

$$\underset{1\times n}{y_t'} = z_t \Gamma_t + u_t' \qquad (1)$$

$$\underset{1\times k}{z_t} = [y_{t-1}', y_{t-2}', \dots, y_{t-p}', 1] \qquad (2)$$

$$\underset{k\times n}{\Gamma_t} = [\Gamma_{t,1}', \Gamma_{t,2}', \dots, \Gamma_{t,p}', A_t']' \qquad (3)$$

where $k = np + 1$. At each time t there are nk parameters to be estimated. Applying the vec operator to both sides of (1) we obtain

$$\underset{n\times 1}{y_t} = \underset{n\times nk}{(I_n \otimes z_t)}\, \underset{nk\times 1}{\gamma_t} + \underset{n\times 1}{u_t} \qquad (4)$$

where $\gamma_t = \mathrm{vec}(\Gamma_t)$.
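The bookkeeping in (1)-(4) is mechanical; as a minimal illustration (our own Python sketch, not the authors' code), the design matrix whose rows are the vectors $z_t$ can be built as follows:

```python
import numpy as np

def build_regressors(Y, p):
    """Build the (T - p) x k design matrix Z for a VAR(p), k = n*p + 1.

    Row t of Z is z_t = [y_{t-1}', ..., y_{t-p}', 1], so the first
    usable observation is t = p (0-based indexing)."""
    T, n = Y.shape
    k = n * p + 1
    Z = np.ones((T - p, k))
    for lag in range(1, p + 1):
        Z[:, (lag - 1) * n: lag * n] = Y[p - lag: T - lag, :]
    return Z

Y = np.random.default_rng(0).standard_normal((100, 3))  # T = 100, n = 3
Z = build_regressors(Y, p=2)
print(Z.shape)  # (98, 7): k = 3*2 + 1
```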


The kernel-based estimator proposed by Giraitis et al. (2012b) has the following form:

$$\hat{\gamma}_t = \left[I_n \otimes \sum_{j=1}^{T} \omega_{j,t} z_j' z_j\right]^{-1} \left[\sum_{j=1}^{T} \omega_{j,t}\, \mathrm{vec}\!\left(z_j' y_j'\right)\right], \qquad (5)$$

where the generic jth element $\omega_{j,t}(H)$ is a kernel function with bandwidth H, used to discount distant data. Throughout the paper we use a Gaussian kernel:

$$\omega_{j,t}(H) = \frac{K_{j,t}(H)}{\sum_{j=1}^{T} K_{j,t}(H)}, \quad \text{where} \qquad (6)$$

$$K_{j,t}(H) = \frac{1}{\sqrt{2\pi}}\, \exp\!\left[-\frac{1}{2}\left(\frac{j - t}{H}\right)^2\right] \qquad (7)$$

When forecasting, in order to preserve the pseudo real time nature of the exercise, we introduce an indicator function that assigns zero weight to the out of sample observations, so that only in sample information is used to estimate the parameters:

$$K_{j,t}(H) = \frac{1}{\sqrt{2\pi}}\, \exp\!\left[-\frac{1}{2}\left(\frac{j - t}{H}\right)^2\right] I(j \le t) \qquad (8)$$

One appealing feature of the estimator in (5) is that, given the Kronecker structure of the first term, it only requires the inversion of the $k \times k$ matrices $\sum_{j=1}^{T} \omega_{j,t} z_j' z_j$. In other words, the estimation can be performed equation by equation. The estimator can be equivalently cast in the following matrix form:

$$\hat{\Gamma}_t(H) = \left[Z' W_t(H) Z\right]^{-1} \left[Z' W_t(H) Y\right], \qquad (9)$$

where $W_t(H)$ is a diagonal $T \times T$ matrix with elements $\omega_{1,t}(H), \omega_{2,t}(H), \dots, \omega_{T,t}(H)$ and the matrix Z is formed by stacking the vectors $z_t$ over t.
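A minimal Python sketch of (6)-(9) (illustrative code of our own, not the authors'; note that we pass H directly as the bandwidth in (7), while GKY parameterise the bandwidth as a function of the sample size, so any mapping from the H grid used later in the paper to an effective bandwidth is left to the caller):

```python
import numpy as np

def kernel_weights(T, t, H, in_sample_only=True):
    """Normalised Gaussian kernel weights w_{j,t}(H), j = 1, ..., T (eqs 6-8)."""
    j = np.arange(1, T + 1)
    K = np.exp(-0.5 * ((j - t) / H) ** 2) / np.sqrt(2.0 * np.pi)
    if in_sample_only:
        K = K * (j <= t)          # indicator I(j <= t) in (8)
    return K / K.sum()

def gky_estimate(Y, Z, t, H):
    """Gamma_t(H) = [Z' W_t(H) Z]^{-1} [Z' W_t(H) Y], eq (9).

    Only a k x k system is solved, so estimation proceeds
    equation by equation."""
    w = kernel_weights(Z.shape[0], t, H)
    WZ = w[:, None] * Z           # W_t(H) Z without forming the T x T matrix
    return np.linalg.solve(Z.T @ WZ, Z.T @ (w[:, None] * Y))
```

With a very large bandwidth the weights become uniform over the in-sample observations and the estimator collapses to OLS on the data up to t.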

2.1 Shrinkage through stochastic constraints

When the dimension of the system (n) grows, it is desirable to impose some shrinkage on the model parameters to avoid an increase in the estimation variance (Hastie et al., 2003). While in a Bayesian framework this can be achieved through the prior distribution, in a classical framework shrinkage can be performed by using stochastic constraints, that is, using the mixed estimator of Theil and Goldberger (1960).

The basic idea is to introduce non sample information in a regression model by adding, at each time t, m stochastic constraints of the type:

$$\Psi^{-\frac{1}{2}} r = \Psi^{-\frac{1}{2}} R \gamma_t + u \qquad (10)$$


The m dimensional vector r is the set of coordinates in the $\mathbb{R}^m$ space towards which the linear combination of the VAR coefficients $R\gamma_t$ is shrunk, R is an $m \times nk$ matrix, $\Psi^{-\frac{1}{2}}$ is an $m \times m$ diagonal matrix that determines the precision of the constraints and u is a vector of mean zero, unit variance processes. Adding these constraints to the VAR in (1) gives the following regression model:

$$\begin{pmatrix} y_t \\ \Psi^{-\frac{1}{2}} r \end{pmatrix} = \begin{pmatrix} I_n \otimes z_t \\ \Psi^{-\frac{1}{2}} R \end{pmatrix} \gamma_t + \begin{pmatrix} u_t \\ u \end{pmatrix} \qquad (11)$$

The introduction of the stochastic constraints transforms the multivariate regression model analysed by Giraitis et al. (2012b) into a system of penalised regressions, with the penalty being proportional to the matrix $\Psi^{-\frac{1}{2}}$. Straightforward application of their method gives the following estimator for $\gamma_t$:

$$\hat{\gamma}_t(H, \Psi) = \left[I_n \otimes \sum_{j=1}^{T} \omega_{j,t} z_j' z_j + R'\Psi^{-1}R\right]^{-1} \left[\mathrm{vec}\!\left(\sum_{j=1}^{T} \omega_{j,t} z_j' y_j'\right) + R'\Psi^{-1} r\right] \qquad (12)$$

A number of observations are in order. First, by setting $\omega_{j,t} = 1$ the estimator collapses to the posterior mean of the VAR coefficients in a constant parameter Bayesian model with Normal conjugate priors.2 Second, by pushing the precision of the constraints to $\infty$ ($\Psi \to 0$) the constraints dominate the estimation and the coefficients are determined by non-sample information. On the other hand, as the precision approaches 0 ($\Psi \to \infty$) the constraints become irrelevant and the unconstrained original estimator (5) is obtained. Third, the stochastic constraints have a regularization effect on the estimation, as the matrix $R'\Psi^{-1}R$ pushes the eigenvalues of the second moments of the regressors ($I_n \otimes \sum_{j=1}^{T} \omega_{j,t} z_j' z_j$) away from zero. Fourth, note that, in general, the introduction of the constraints does not allow for the estimation of the model parameters equation by equation, and an $nk \times nk$ matrix needs to be inverted at each time t. However, in the next subsections we show that for popular shrinkage methods the covariance matrix of the constraints has a Kronecker structure (i.e. the penalty is symmetric across equations) and the estimator can be further simplified, retaining the computational advantages of its unconstrained counterpart. Finally, since the estimator in (12) is a blend of a kernel and a penalised regression estimator, it depends on a tuning parameter H, determining the width of the kernel, and on the penalty $\Psi$, which in most cases reduces to a single parameter $\lambda$. In Section 2.4 we discuss a number of cross-validation methods that can be used to determine H and $\lambda$.

2.1.1 Time varying volatilities

So far we have not made any specific assumption on the volatility of the VAR errors. One issue arises when both the error variances and the VAR coefficients change over time, in that changes in the parameters and in the variances can be confounded, see Cogley and Sargent

2 For a discussion of the Bayesian interpretation of penalized regressions see also Hastie et al. (2003).


(2005). An important implication is that if changes in the variances of the errors are neglected, then the importance of variation in the VAR coefficients could be overstated. Giraitis et al. (2012b) show that the properties of their estimator are unaffected by the presence of stochastic volatilities. When performing structural analysis, however, they suggest to model time variation in the variance of the disturbances with a two step approach. The method consists of first fitting a homoscedastic VAR, then estimating the time varying volatilities on the residuals obtained in the first stage via the following kernel estimator:

$$\hat{\Psi}_t = \sum_{j=1}^{T} \omega_{j,t}(H_\Psi)\, u_j u_j' \qquad (13)$$

where the bandwidth parameter $H_\Psi$ is not necessarily the same as the one used to estimate the coefficients. Orthogonalisation of the residuals is then based on the time-varying covariance matrix $\hat{\Psi}_t$. Our penalised kernel estimator can be adapted to account for changing volatilities along these lines, using the GLS correction proposed in Theil and Goldberger (1960). In the first step the VAR coefficients are estimated using (12) and the resulting residuals are used to compute $\hat{\Psi}_t$ as in (13). In a second step a GLS correction can be applied:

$$\hat{\gamma}_t = \left[\sum_{j=1}^{T} \omega_{j,t} \left(\Psi_j^{-1} \otimes z_j' z_j\right) + R'\Psi^{-1}R\right]^{-1} \left[\sum_{j=1}^{T} \omega_{j,t}\, \mathrm{vec}\!\left(z_j' y_j' \Psi_j^{-1}\right) + R'\Psi^{-1} r\right] \qquad (14)$$

Notice that this GLS correction requires the inversion of potentially large ($nk \times nk$) matrices, which slows down computation and limits the size of the VAR, even in cases where $\Psi$ has a Kronecker form. However, unreported results suggest that in empirical applications this correction does not yield any gain in forecast accuracy. In practical terms, the estimator in (12) turns out to be sufficiently flexible to capture the behaviour of a large set of macro time series, and it offers a good trade-off between flexibility and computational tractability. We now turn to discuss how popular shrinkage methods can be adapted to our setup.
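The first ingredient of the two step correction, the kernel variance estimator in (13), is straightforward to implement; a sketch (our own illustrative code, with U holding the first stage residuals row by row):

```python
import numpy as np

def tv_covariance(U, t, H_psi):
    """Kernel estimate of the time varying error covariance, eq (13):
    Psi_t = sum_j w_{j,t}(H_psi) u_j u_j', with Gaussian weights."""
    T = U.shape[0]
    j = np.arange(1, T + 1)
    K = np.exp(-0.5 * ((j - t) / H_psi) ** 2)
    w = K / K.sum()
    return (w[:, None] * U).T @ U
```

As the bandwidth grows the weights flatten and the estimate approaches the (uncentered) full sample covariance of the residuals.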

2.2 Ridge type shrinkage

The Ridge regression penalty shrinks all the parameters uniformly towards zero at a given penalty rate $\frac{1}{\lambda}$. A TVP-VAR with Ridge shrinkage can be obtained by setting $R = I_{nk}$, $r = 0$ and $\Psi = \lambda I_{nk}$ in (10), which consists of imposing the following stochastic constraints at each t:

$$0 = \sqrt{\frac{1}{\lambda}}\, \gamma_t + u_t \qquad (15)$$

The resulting estimator takes the form:

$$\hat{\gamma}_t^{Ridge}(\lambda, H) = \left[I_n \otimes \left(\sum_{j=1}^{T} \omega_{j,t} z_j' z_j + \frac{1}{\lambda} I_k\right)\right]^{-1} \left[\mathrm{vec}\!\left(\sum_{j=1}^{T} \omega_{j,t} z_j' y_j'\right)\right] \qquad (16)$$


Notice that, given the Kronecker structure of the covariance matrix $\Psi$ of the constraints, estimation can proceed equation by equation and the estimator can be written in matrix form as:

$$\hat{\Gamma}_t^{Ridge}(\lambda, H) = \left[Z' W_t(H) Z + \frac{1}{\lambda} I_k\right]^{-1} \left[Z' W_t(H) Y\right] \qquad (17)$$

The usual Ridge interpretation of the penalty $\frac{1}{\lambda}$ holds: by increasing it to $\infty$ the VAR coefficients are set to 0, while setting $\frac{1}{\lambda} = 0$ recovers the unconstrained GKY estimator.
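A sketch of (17) (our own illustrative code, not the authors'; here w stands for the column of kernel weights $\omega_{1,t}, \dots, \omega_{T,t}$):

```python
import numpy as np

def ridge_kernel_estimate(Y, Z, w, lam):
    """Gamma_t = [Z' W_t(H) Z + (1/lam) I_k]^{-1} [Z' W_t(H) Y], eq (17).

    1/lam -> infinity shrinks all coefficients to zero;
    1/lam = 0 recovers the unpenalised GKY estimator."""
    k = Z.shape[1]
    WZ = w[:, None] * Z
    return np.linalg.solve(Z.T @ WZ + np.eye(k) / lam, Z.T @ (w[:, None] * Y))
```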

2.3 Litterman type shrinkage

Some of the features of the Ridge penalty can be unappealing in the context of VAR models. First, the fact that all the coefficients are shrunk towards zero imposes a structure of serially uncorrelated data, which is at odds with the strong persistence that characterizes most macroeconomic time series. Second, the same penalty is imposed on all the coefficients (including the intercept). Yet, having some flexibility in the penalization of the different parameters could be desirable. A more general set of stochastic constraints, which produces the same effects that the Litterman prior has in a Bayesian framework,3 is given by setting r, R and $\Psi$ as follows:

$$r = \mathrm{vec}\big([\Gamma_1', \Gamma_2', \dots, \Gamma_p', A']'\big), \quad \text{where} \qquad (18)$$

$$\Gamma_l^{(i,j)} = \begin{cases} \gamma_l^{(i,j)} & \text{if } l = 1 \text{ and } i = j \\ 0 & \text{otherwise,} \end{cases} \quad \text{and} \qquad (19)$$

$$A' = 0, \qquad (20)$$

$$R = I_{nk}, \qquad (21)$$

$$\Psi = I_n \otimes \lambda \begin{pmatrix} \Sigma & 0 \\ 0 & \sigma_c^2 \end{pmatrix}, \quad \text{where} \qquad (22)$$

$$\Sigma = \mathrm{diag}(1, 2^2, 3^2, \dots, p^2) \otimes \mathrm{diag}(\sigma_1^2, \sigma_2^2, \dots, \sigma_n^2) \qquad (23)$$

Comparing equations (18) to (23) with equation (15), two main differences need to be highlighted. First, the vector r towards which the VAR coefficients are driven by the constraints is generally different from zero. In empirical applications, for data in levels, the n values $\gamma_1^{(i,i)}$ are typically set to 1, so that the model is pushed to behave like a multivariate random walk plus noise. Second, the precision of the constraints is not uniform across parameters but is higher for more distant lags, as implied by the decay terms $(1, 2^2, \dots, p^2)$. The scaling factors $\sigma_1^2, \dots, \sigma_n^2$ appearing in $\Sigma$ can be obtained by univariate regressions, and the precision on the intercept $\sigma_c^2$ can be set to an arbitrarily small or large value, depending on the application.4

3 Karlsson (2012) distinguishes the Litterman prior from the more general Minnesota prior based on the assumptions on the covariance matrix of the VAR residuals, which is assumed to be diagonal in the Litterman prior and full in the more general Minnesota prior, see Kadiyala and Karlsson (1993, 1997).

4 For example, in a Bayesian context, Carriero et al. (2009) adopt a very tight prior centred around zero on the intercept in a large VAR, favouring a driftless random walk behaviour, to capture the behaviour of a panel of exchange rates.


The use of these more general constraints leads to the following estimator:

$$\hat{\Gamma}_t^{Lit}(\lambda, H) = \left[Z' W_t(H) Z + \frac{1}{\lambda} \Omega\right]^{-1} \left[Z' W_t(H) Y + \frac{1}{\lambda} \Omega r\right], \quad \text{where} \qquad (24)$$

$$\Omega = \begin{pmatrix} \Sigma^{-1} & 0 \\ 0 & \frac{1}{\sigma_c^2} \end{pmatrix}$$
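The blocks in (18)-(23) can be assembled mechanically; a sketch (our own illustrative code, not the authors') of the matrix $\Omega$ and of the $k \times n$ matrix of shrinkage targets implied by r, with the own first lag of each variable pushed towards its value of $\gamma$:

```python
import numpy as np

def litterman_blocks(sigma2, p, sigma2_c, gamma):
    """Build Omega = blkdiag(Sigma^{-1}, 1/sigma2_c) from eqs (22)-(23), with
    Sigma = diag(1, 2^2, ..., p^2) kron diag(sigma2_1, ..., sigma2_n),
    plus the k x n matrix of shrinkage targets implied by (18)-(20):
    own first lag pushed towards gamma_i, everything else towards zero."""
    sigma2 = np.asarray(sigma2, dtype=float)
    n = sigma2.size
    k = n * p + 1
    decay = np.repeat(np.arange(1, p + 1) ** 2, n)  # 1,...,1, 4,...,4, ..., p^2
    Sigma_diag = decay * np.tile(sigma2, p)          # diagonal of Sigma
    Omega = np.diag(np.concatenate([1.0 / Sigma_diag, [1.0 / sigma2_c]]))
    target = np.zeros((k, n))
    target[:n, :n] = np.diag(np.asarray(gamma, dtype=float))
    return Omega, target
```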

Summarizing, by appropriately penalizing the GKY estimator, some discipline on the VAR coefficients can be imposed through stochastic constraints à la Theil and Goldberger (1960). This makes the GKY method, originally designed for small and medium scale VARs, suitable for handling large-n datasets. For popular shrinkage methods the resulting estimator can be cast in matrix form, with notable computational advantages.

The double nature of the estimator (being both nonparametric and penalised) is captured by its dependence on two constants: H, the bandwidth parameter that determines the weight that each observation has as a function of its distance from t, and $\lambda$, a constant that determines the severity of the penalty. In the next section we discuss alternative solutions to the problem of determining these two parameters jointly in empirical applications.

2.4 Selection of λ and H

We experiment with two different criteria for selecting $\lambda$ and H. The former adapts to our problem the procedure devised by Banbura et al. (2010), and has an “in sample” fit flavour. The latter is inspired by the criterion proposed by Kapetanios et al. (2008) for assigning weights to different models in the context of forecast averaging, and favours models with better out of sample performance. In the remainder of the paper we will refer to these two criteria as Lfit and Lmse.

2.5 The Lfit criterion

Considering that, typically, policy makers monitor more closely a small subset of variables, Banbura et al. (2010) suggest to penalize overfitting of the large VAR up to the point where it achieves the same fit as that of a smaller VAR that only includes these key variables. We adapt their criterion to the problem of choosing $\lambda$ and H simultaneously. Formally, the criterion involves the following steps:

1. Pick a subset of n1 variables of interest out of the n variables in the VAR.

2. Compute the in sample fit of a benchmark VAR with constant coefficients that only includes these n1 variables.



3. Pick $\lambda$ and H to minimize the distance between the in sample fit of the large n-variate VAR (featuring both time varying parameters and shrinkage) and the benchmark VAR.

Formally, the loss function to be minimized is the following:

$$L_{fit}(\lambda, H) = \left| \sum_{i=1}^{n_1} \frac{rss_i^{n}(\lambda, H)}{rss_i^{RW}} - \sum_{i=1}^{n_1} \frac{rss_i^{n_1}}{rss_i^{RW}} \right|$$

where the scaling by $rss_i^{RW}$ (the fit of the random walk model) is needed to account for the different variance of the variables.
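Given the residual sums of squares of the three models for the n1 monitored variables, the criterion itself is a one-liner; a sketch (our own illustrative code, with hypothetical input names):

```python
import numpy as np

def l_fit(rss_large, rss_small, rss_rw):
    """Lfit = | sum_i rss_i(large)/rss_i(RW) - sum_i rss_i(small)/rss_i(RW) |:
    distance between the relative in sample fit of the large penalised
    TVP-VAR and that of the small constant parameter benchmark."""
    rss_large, rss_small, rss_rw = map(np.asarray, (rss_large, rss_small, rss_rw))
    return float(np.abs(np.sum(rss_large / rss_rw) - np.sum(rss_small / rss_rw)))
```

The pair (λ, H) is then chosen by evaluating this loss over a grid and picking the minimiser.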

2.6 The Lmse criterion

As an alternative, $\lambda$ and H can be selected at each point in time based on the predictive performance of the model in the recent past. The method, which has a cross-validation flavour, is similar in spirit to the one used by Kapetanios et al. (2008) to compute model weights in the context of forecast averaging. The necessary steps are the following:

1. Pick a subset of n1 variables of interest out of the n variables in the VAR.

2. At each step t in the forecast exercise and for each forecast horizon h, consider a relatively short window of recent data $t - L - h, t - L - h + 1, \dots, t - 1 - h$ and compute the h step ahead Mean Square Error (MSE) $mse_h^i(\lambda, H)$ for each $i \in n_1$.

3. Pick the values of λ and H that minimize the sum of these n1 MSEs.

Formally, the loss function to be minimized is:

$$L_{mse}(\lambda, H) = \sum_{i=1}^{n_1} \frac{mse_h^i(\lambda, H)}{var(y_{t,i})}$$

where the $mse_h^i$ are scaled by the variance of $y_i$ to account for differences in predictability across the different variables.
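The three steps amount to a grid search; an illustrative skeleton (our own code; `forecast_fn` is a hypothetical placeholder for whatever produces the h step ahead forecast vector for a given (λ, H), not the authors' implementation):

```python
import numpy as np

def select_by_lmse(forecast_fn, Y, idx, t, h, L, lam_grid, H_grid):
    """Grid search for (lam, H): minimise the sum over the monitored
    variables idx of the h-step MSE on the window t-L-h, ..., t-1-h,
    scaled by each variable's variance."""
    var_y = Y[:t, idx].var(axis=0)
    best, best_loss = None, np.inf
    for lam in lam_grid:
        for H in H_grid:
            errs = np.array([Y[s + h, idx] - forecast_fn(Y[:s], lam, H, h)[idx]
                             for s in range(t - L - h, t - h)])
            loss = np.sum((errs ** 2).mean(axis=0) / var_y)
            if loss < best_loss:
                best, best_loss = (lam, H), loss
    return best
```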

3 Forecast exercise

Penalised estimators trade off estimation bias against estimation variance. Their natural application is therefore in prediction problems, where simpler (more tightly penalised) models typically perform better than more sophisticated setups. We therefore explore the performance of our kernel based large time varying VAR in the context of a forecast exercise on U.S. data. Throughout the exercise we use Litterman type constraints. The information set is composed of 78 time series spanning around five decades, from January 1959 to July 2013. Table 1 reports the list of the series used in the exercise together with the value of γ in equation


(19) that is used for each variable. Following the convention in the Bayesian literature, we set γ to 1 for variables that display a trend and to 0 for variables that have a stationary behaviour (typically surveys). We experiment with VARs of two sizes: a medium sized VAR that includes only the 20 indicators that are highlighted in red in Table 1, and a large VAR that makes use of all the available information. In line with previous studies, see e.g. Carriero et al. (2009) and Koop and Korobilis (2013), we apply a grid search approach to the problem of minimising the loss functions described in the previous sections. Since the estimator is not particularly computationally intensive we set a wide (38 element) grid for λ:

$$\lambda_{grid} = \{10^{-10}, 10^{-5}, 10^{-4}, 10^{-3}, 10^{-2}, 10^{-2} + 0.03, 10^{-2} + 2 \times 0.03, 10^{-2} + 3 \times 0.03, \dots, 1\}.$$

Next we need to set the shape of the kernel function $\omega_{j,t}$ and a grid of possible values for H. We work with a six point grid for the tuning parameter H:

$$H_{grid} = \{0.5, 0.6, 0.7, 0.8, 0.9, 1\}.$$
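Assuming the λ increment is 0.03 (the reading consistent with the stated 38 elements), the two grids can be built as follows (our own sketch):

```python
import numpy as np

# 4 small values plus 34 equally spaced values 0.01, 0.04, ..., 1.0
lam_grid = np.concatenate([[1e-10, 1e-5, 1e-4, 1e-3],
                           1e-2 + 0.03 * np.arange(34)])
H_grid = np.linspace(0.5, 1.0, 6)   # 0.5, 0.6, ..., 1.0

print(len(lam_grid))  # 38
```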

We experiment with different model specifications in order to assess the importance of the various elements that characterise the proposed estimator, with a focus on the importance of letting both the tuning constants λ and H vary over time rather than fixing some parameters at a pre-specified value. The different model specifications are obtained by intersecting various options for setting λ and H, as summarised in Table 2. In particular, the first set of models (M1 in Table 2) is obtained by fixing H at a given point in the grid and, conditional on this value of H, setting λ optimally at each t at the value that minimises the Lfit function. The second set of models (M2) is obtained as a variant of M1 by choosing the λ that minimises Lfit in the pre-sample and then keeping it fixed for the rest of the exercise, in line with what is done by Banbura et al. (2010). In the third set of models (M3) the function Lfit is optimised at each t with respect to both λ and H. The fourth case (M4) is obtained as a variant of M3 by choosing λ optimally in the pre-sample and then keeping it fixed for the rest of the exercise. The remaining models (M5 to M8) are obtained by replacing the Lfit with the Lmse criterion.

Since both the Lfit and Lmse criteria require us to select a subset n1 of the variables of interest, we set n1 = 3 and monitor three indicators of particular interest for monetary policy, i.e. the Fed Funds rate (FEDFUNDS), the number of nonfarm payroll employees (PAYEMS) and CPI inflation (CPIAUCSL). We fix the lag length to 13 and retain 10 years of data (120 observations) as the first estimation sample. We then produce 1 to 24 month ahead pseudo real time forecasts, with the first estimation sample ending in January 1970 and the last one ending in July 2011, for a total number of 499 forecasts. Finally, in the case of the Lmse criterion we need to choose L, that is, the width of the short window of data on which to measure the predictive performance of the model. We set L to 36 (corresponding to three years of data).

A natural benchmark model that can be obtained as a constrained version of our estimator is the large Bayesian VAR (BVAR) with a Litterman prior, constant parameters and λ fixed in


the pre-sample with the Lfit. We will therefore compare the performance in terms of forecast accuracy of the time varying VAR with that of this benchmark model.

Before turning to the results of the forecast exercise, we analyse how the two constants H and λ interact with each other when they are both allowed to change over time. In Figure 1 and Figure 2 we show, for both the 20 and the 78 variable VARs, the evolution over the forecast period (1970-2013) of the discounting parameter H in the model specifications in which it is set optimally over time (M3 and M4, M7 and M8). In the top panel we report the results obtained with the Lfit criterion and in the bottom panel those obtained with the Lmse criterion. Graphs on the left refer to model specifications in which λ also changes over time, while those on the right refer to model specifications in which λ is fixed in the pre-sample. Some interesting patterns are evident. First, comparing the top and the bottom panels, it can be noticed that (regardless of the size of the VAR) when the Lfit criterion is used the optimal choice of H is rather stable over time, at relatively high values (0.8, 0.9), corresponding to more stable VAR coefficients. The optimal values of H obtained with the Lmse criterion, on the other hand, change very frequently. This difference reflects the different weight that the two criteria place on past information. The Lfit criterion, being based on the in sample fit of VARs of different dimensions, gives the same importance to all the available information, while the Lmse criterion only weighs the most recent observations, implicitly favouring more frequent switches in the model parameters. Second, the evolution of H does not depend much on the amount of shrinkage imposed on the parameters, as no striking differences emerge when comparing the plots on the left (where λ also changes over time) with those on the right (where λ is kept constant).

Next, in Figure 3 (for the 20 variables VAR) and Figure 4 (for the 78 variables VAR) we show the behaviour of the penalty parameter λ in the specifications where H is also optimised over time (M3 and M7). As mentioned, low values of λ imply that the constraints hold more tightly, so that the VAR coefficients are less informed by the data and are pushed more strongly towards their constrained values. Starting from Figure 3, it can be seen that when the Lfit criterion is used (blue solid line) three distinct phases can be identified. In the first one λ starts from relatively high values and falls smoothly over time. In the 80s and throughout the Great Moderation it stays relatively constant around this lower value, before starting to increase again in the mid 1990s, with a steeper slope at the beginning of the 2000s. These results are broadly in line with those stressed in the literature on the predictability of macro time series before and after the Great Moderation. For example, D'Agostino et al. (2006) find that the predictive content for inflation and economic activity of common factors extracted from large panels weakened significantly during the Great Moderation, while in periods of higher volatility cross-sectional information proved more relevant for forecasting. Given the direct relationship between the relevance of cross-sectional information and λ, the results in Figure 3 send a similar message, as the contribution of cross-sectional information is progressively penalized by lower values of λ in the 1980s. The results obtained with the Lmse criterion (green dashed line) are qualitatively


similar. When the dimension of the VAR increases (Figure 4) the optimal value of λ is generally lower, confirming the theoretical results in De Mol et al. (2008) on the inverse relationship between the optimal level of shrinkage and the cross-sectional dimension in large panels, but a U-shaped evolution of λ can be detected in this case as well.

The forecast accuracy of the time varying models is summarised in Tables 3 and 4 for, respectively, the 20 and the 78 variables VAR. These two tables are organised as follows. The rows correspond to the model specifications described in Table 2, while the columns correspond to a selected number of forecast horizons (6, 12, 18 and 24). Each entry reports the ratio of the Root Mean Squared Error (RMSE) of the time varying models with respect to the BVAR with (i) constant parameters and (ii) fixed (pre-sample optimized) shrinkage. Values below 1, which imply that the introduction of time variation through the kernel estimator improves prediction accuracy, are highlighted in bold. We assess the statistical significance of differences in forecast accuracy through a Diebold-Mariano test and underline the cases in which the null hypothesis can be rejected at the 10% significance level.
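As a rough sketch of how such table entries can be computed, the following compares two forecast-error series through a relative RMSE and a simple Diebold-Mariano statistic on squared-error loss (no HAC correction, so strictly valid only for one-step-ahead forecasts; the function names and the toy error series are ours):

```python
import numpy as np

def relative_rmse(fe_tv, fe_bench):
    """Ratio of RMSEs: values below 1 favour the time-varying model."""
    return np.sqrt(np.mean(fe_tv**2)) / np.sqrt(np.mean(fe_bench**2))

def diebold_mariano(fe_a, fe_b):
    """DM statistic on the squared-error loss differential; asymptotically
    standard normal under the null of equal forecast accuracy."""
    d = fe_a**2 - fe_b**2
    T = d.size
    return d.mean() / np.sqrt(d.var(ddof=1) / T)

# toy forecast errors: the "time-varying" model halves every error
fe_bench = np.array([1.0, -1.0, 2.0, -2.0])
fe_tv = np.array([0.5, -0.5, 1.0, -1.0])
ratio = relative_rmse(fe_tv, fe_bench)    # 0.5: below 1, favours fe_tv
dm = diebold_mariano(fe_tv, fe_bench)     # negative: fe_tv has lower loss
```

For longer horizons the loss differentials are serially correlated and the DM variance would need a HAC estimator; the sketch omits this for brevity.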

The main findings of this exercise are the following. For the 20 variables VAR, in the majority of cases the models with time varying parameters outperform the constant parameter benchmark, as evident from the high number of bold cells in the table. However, the average improvement appears to be small, as in most cases the gain is of the order of 5%. As a consequence, most of the differences in forecast accuracy are not significant according to the Diebold-Mariano test. The model specifications in which both H and λ are optimised over time do not emerge as the best performing models. Instead, the best results are obtained when λ is chosen to minimise the Lmse criterion conditional on a fixed value of H of around 0.7 or 0.8. In this case the improvement in forecast accuracy is higher, around 10-12%, and in the case of the Fed Fund Rates it is also statistically significant.

For the VAR of larger dimension (78 variables) the results are less favourable for the time varying models, as the number of bold cells is markedly lower and there are also cases in which the constant parameter benchmark significantly outperforms the models with time varying parameters.

Summarising, the introduction of time variation in the coefficients leads to a moderate improvement in forecast accuracy in VARs of medium size but not in larger systems. Furthermore, the model specification that turns out to work well ex post is one in which the bandwidth parameter is fixed at a given level rather than optimised over time. This latter finding, however, provides little guidance from an ex ante perspective since, in a truly real time exercise, one would not have known the best performing value of the bandwidth parameter. As an alternative, we therefore investigate a number of model selection and model averaging strategies to deal with model uncertainty.


3.1 Model selection and model averaging

In the presence of a large number of competing models, two alternative strategies for obtaining a single forecast are model selection and pooling (averaging) the forecasts from different models. The two concepts are closely related since, once model weights have been defined through a given criterion, the model with the highest weight can be selected or the weights can be used to pool the forecasts obtained with the different models. The gain from pooling stems from the fact that forecasts might be biased in different directions, so that combining misspecified models may attenuate the bias and improve upon the forecasts obtained with the individual models.

In a Bayesian context, Koop and Korobilis (2013) use Dynamic Model Selection (DMS) and Dynamic Model Averaging (DMA) to select models or pool forecasts from different models, where the selection criterion (and consequently the model weights) is derived from the predictive density. A similar strategy, although based on the inverse MSE obtained with alternative specifications, is followed by Carriero et al. (2009) to select dynamically the prior for a large BVAR. In a frequentist setting, Kapetanios et al. (2008) find that model averaging techniques can be beneficial and that weights derived from frequentist model selection criteria perform very well, dominating in a large number of cases the Bayesian Model Averaging scheme advocated by Wright (2009). Model selection and model pooling are also studied by Kuzin et al. (2012) in the context of nowcasting GDP. The authors compare the performance of many alternative specifications and find that pooling forecasts from different models/estimation methods yields sizeable gains in predictive accuracy, in particular when the forecasts are averaged using inverse MSE weights. From a theoretical standpoint, the benefits of combining forecasts generated from models with different rates of data discounting are analyzed by Pesaran and Pick (2011). They show that averaging forecasts over different estimation windows reduces the forecast bias and the mean squared forecast error, provided that breaks are not too small.

We proceed by using an inverse MSE criterion: for each model specification we compute, (i) at each time t, (ii) for each variable and (iii) for each forecast horizon, the out-of-sample MSE over the available observations. Model weights are therefore given by:

W_{j,h,t} = \frac{1}{\frac{1}{T-h} \sum_{t=1}^{T-h} \left[ FE_{j,h,t} \right]^2} \qquad (25)

where FE_{j,h,t} is the forecast error for variable j at forecast horizon h computed at time t (i.e. based on the forecast formulated at time t−h). We also experiment with a modification of this criterion in which the importance of past forecast errors is progressively discounted through an Exponentially Weighted Moving Average (EWMA) kernel. In this case the model weights are computed as:

W^{EWMA}_{j,h,t} = \frac{1}{\frac{1}{T-h} \sum_{t=1}^{T-h} w_t(\rho) \left[ FE_{j,h,t} \right]^2} \qquad (26)

where w_t(ρ) are the EWMA weights. We set the smoothing parameter ρ = 0.96, in line with


the parametrisation used by Koop and Korobilis (2013) to discount past predictive densities. Having defined the criteria to compute the model weights, we next define the different model

spaces we will be working with. Notice that each point in the double grid Hgrid, λgrid defines a different model, so that the whole model space is spanned by these two grids. In addition we select three subspaces of interest: first, the model specifications obtained on the basis of the Lfit criterion (M1 to M4, following the definitions used in Table 2); second, the model specifications obtained on the basis of the Lmse criterion (M5 to M8 in Table 2); finally, the union of these two (M1 to M8).
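A minimal sketch of the two weighting schemes in equations (25) and (26) might look as follows; the normalisation of the weights to sum to one and the toy forecast errors are our additions for illustration:

```python
import numpy as np

def inverse_mse_weights(fe):
    """fe: (n_models, T-h) array of forecast errors for one variable and
    one horizon. Returns weights proportional to the inverse out-of-sample
    MSE, as in equation (25); normalisation to one is our addition."""
    mse = np.mean(fe**2, axis=1)
    w = 1.0 / mse
    return w / w.sum()

def ewma_mse_weights(fe, rho=0.96):
    """Discounted variant (equation (26)): past squared errors are
    down-weighted through EWMA weights with smoothing parameter rho."""
    T = fe.shape[1]
    wt = rho ** np.arange(T - 1, -1, -1)   # most recent error weighted most
    wt = wt / wt.sum()
    mse = (fe**2 * wt).sum(axis=1)
    w = 1.0 / mse
    return w / w.sum()

fe = np.array([[0.5, 0.5, 0.5, 0.5],     # model 1: small, stable errors
               [2.0, 2.0, 0.1, 0.1]])    # model 2: improved only recently
w_flat = inverse_mse_weights(fe)
w_ewma = ewma_mse_weights(fe, rho=0.5)   # heavy discounting for illustration
```

The discounted scheme shifts weight towards model 2, whose recent errors are small, while the flat scheme keeps punishing it for its early mistakes.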

To justify the interest in such subspaces, notice that one crucial difference between selecting models through the weights (25) and (26) and simply forecasting with VARs specified through the Lfit and the Lmse criteria is that the model weights are both variable specific (through the index j) and horizon specific (through the index h). This means that we are going to let the values of λ and H change depending on the target variable and on the forecast horizon considered, therefore introducing an extra degree of flexibility. We then want to check whether, once this extra degree of flexibility is introduced, either specification strategy (M1 to M4, or M5 to M8) emerges as uniformly superior, or whether specifying the VAR ex ante with either criterion and exploring the full union (M1 to M8) helps at all.

To further facilitate the reading of the results discussed below, the full list of model spaces and sub-spaces is described in Table 5, which clarifies that combining the four model spaces described above with the two different weighting schemes (inverse MSE and inverse EWMA-SE) amounts to 8 possible ways to select models (S1 to S8). Finally, since forecast combination schemes based on equal weights are usually found to perform remarkably well, when pooling forecasts we also use equal weights. The list of pooling methods (W1 to W12) is reported separately in Table 6.

The results obtained with model selection for the 20 and 78 variables VAR are described in Figures 5 and 6. The plots are organised in three panels corresponding to the three different target variables: CPI, employment and the Fed Fund Rates. The eight groups of bar plots, from S1 to S8, show the RMSEs obtained with the corresponding model selection methods relative to those obtained with the benchmark large BVAR with constant parameters. We also report the results obtained with a Random Walk model (RW). Within each group, each individual bar indicates a different forecast horizon, going (leftwards) from 1 to 24 months ahead. Bars in grey identify the forecast horizons for which a Diebold-Mariano test does not reject the null hypothesis of equal forecast accuracy, while those in red denote the cases for which forecast accuracy is significantly different at the 10% significance level.5

Starting from the 20 variables VAR, it can be seen that most selection methods deliver forecasts that improve upon the benchmark. However, uncertainty around these differences remains relatively high, so that for CPI and Employment the Diebold-Mariano test generally does

5The test is two sided, so that red bars higher than 1 indicate that the forecast of the benchmark model is significantly more accurate.


not reject the null hypothesis of equal forecast accuracy. For the Fed Fund rates, on the other hand, in a large number of occurrences the forecasts obtained by the various selection methods are significantly more accurate than those of the large BVAR with constant parameters.

The best performing selection methods are S4 (selection through inverse MSE weights across models specified with the Lmse criterion) and the methods that explore the whole model space (S7 and S8). The latter result is especially noteworthy since it indicates that the specification methods Lfit and Lmse, which force the coefficients to share the same penalty regardless of the target variable to be forecast and of the forecast horizon, can be somewhat restrictive. The improvement in forecast accuracy generally increases with the forecast horizon.

The RW model delivers forecasts that are (often significantly) less accurate than those of the benchmark model and, consequently, than those obtained with the time varying VAR, with the exception of the Fed Fund rates.6

For the 78 variables VAR the results are again less satisfactory for most model selection schemes. However, the use of selection methods that explore all the possible specifications (S7 and S8) allows the time varying models to improve upon the benchmark in this case as well, especially at longer horizons.

The results obtained by pooling forecasts are organised in a similar fashion in Figures 7 and 8, where the groups of bars going from W1 to W12 correspond to the pooling methods listed in Table 6.

In the case of the 20 variables VAR (Figure 7) the forecasts obtained by pooling predictions from the different time varying models prove to be more accurate than those obtained from the benchmark at almost all horizons. Furthermore, according to the Diebold-Mariano test, the improvement is statistically significant at the 10% significance level, as evident from the large prevalence of red bars. There is also a tendency for the relative RMSEs to fall as the forecast horizon increases, which was already apparent in the results obtained for the model selection strategies S6 and S7, especially for Employment and CPI, suggesting that time variation in the VAR coefficients is relatively more important for longer than for shorter horizon forecasts.

The differences across weighting schemes and across model spaces are minor, as the various pooling strategies (including those that assign equal weights, W3, W6, W9 and W12) deliver very similar results. In line with the literature, forecast pooling therefore emerges as a robust method to deal with model uncertainty.

For the 78 variables VAR the use of forecast averaging yields a strong improvement in the performance of the time varying models, as for most combinations the bars are now lower than 1 and the null hypothesis of equal forecast accuracy is rejected in a large number of cases.

6Similar results were found by Banbura et al. (2010), who suggest further constraining the autoregressive coefficients to sum to 1 to obtain more precise forecasts for the Fed Fund Rates. Since the focus of our analysis is to assess the gains from time variation in the coefficients, rather than to compare a multivariate model with a univariate benchmark, we do not pursue this line of investigation further.


4 A comparison with the parametric estimator

An alternative approach to the estimation of a large time-varying parameter Vector Autoregression (TVP-VAR) is the one developed by Koop and Korobilis (2013). The model specification closely follows the literature on (small) TVP-VARs proposed by Cogley and Sargent (2005) and Primiceri (2005), i.e. it assumes a random walk evolution of the VAR coefficients. The model can be cast in state space form, where the VAR equations

y_t = Z_t \beta_t + \varepsilon_t \qquad (27)

serve as measurement equations and the unobserved states, the parameters β_t, evolve as driftless random walks:

\beta_{t+1} = \beta_t + u_{t+1} \qquad (28)

The random disturbances are assumed to be distributed as ε_t ∼ N(0, Σ_t) and u_t ∼ N(0, Q_t); moreover, ε_t and u_t are independent of one another and serially uncorrelated. In principle one could estimate this model by Bayesian methods, using the algorithms developed by Cogley and Sargent (2005) and Primiceri (2005), which involve obtaining posterior draws of the system matrices Σ_t and Q_t and then using a Kalman filter based simulation smoother to obtain an estimate of the parameters β_t. However, even for medium-sized VARs this estimation procedure becomes infeasible due to computational limits. To overcome these difficulties, following Ljung (1992) and Sargent (1999), Koop and Korobilis (2013) make two simplifying assumptions. The first one involves the matrix Q_t, which is specified as follows:

Q_t = \frac{1-\theta}{\theta} \, P_{t-1|t-1} \qquad (29)

where P_{t−1|t−1} is the estimated covariance matrix of the unobserved states β_{t−1} conditional on data up to t−1 and θ is a forgetting factor (0 < θ < 1). Equation (29) states that the amount of time variation in the model parameters at time t is a small fraction of the uncertainty about the unobserved state β_t, so that large uncertainty about the value of the state at time t translates into stronger parameter time variation. A similar simplifying assumption on Σ_t ensures that this matrix can be estimated by suitably discounting past squared one step ahead prediction errors:

\Sigma_t = \kappa \Sigma_{t-1} + (1-\kappa) v_t v_t' \qquad (30)

where v_t = y_t − Z_t β_{t|t−1}. These assumptions make the system matrices Q_t and Σ_t (which are an input to the Kalman filter at time t) a function of the time t−1 output of the Kalman filter itself. This recursive structure implies that, given an initial condition and the two constants θ and κ, an estimate of the coefficients β_t can be obtained through a single Kalman filter pass. Although it is laid out in a Bayesian spirit, the restrictions imposed on the Kalman filter recursions reduce the estimation procedure to a discounted least squares algorithm, as shown


by Delle Monache and Petrella (2014), Section 2. Koop and Korobilis also suggest imposing a prior on the initial value of the parameters, β_1, to discipline the estimation towards values that are a priori plausible.
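A compact sketch of the resulting one-pass algorithm, built around equations (29)-(30), is given below; the scalar-error simplification, the initial conditions and the function name are our assumptions, not the exact Koop and Korobilis implementation:

```python
import numpy as np

def forgetting_factor_tvp(y, Z, theta=0.99, kappa=0.96,
                          beta0=None, P0=None, sigma0=1.0):
    """One Kalman filter pass for y_t = Z_t' beta_t + e_t with
    Q_t = ((1-theta)/theta) * P_{t-1|t-1} and an EWMA estimate of the
    error variance; scalar-error version, illustrative initial values."""
    T, k = Z.shape
    beta = np.zeros(k) if beta0 is None else beta0
    P = np.eye(k) if P0 is None else P0
    sigma2 = sigma0
    betas = np.zeros((T, k))
    for t in range(T):
        # predict: random-walk state, covariance inflated via forgetting
        P_pred = P + ((1 - theta) / theta) * P      # equation (29)
        v = y[t] - Z[t] @ beta                      # one-step-ahead error
        sigma2 = kappa * sigma2 + (1 - kappa) * v**2  # equation (30), scalar
        F = Z[t] @ P_pred @ Z[t] + sigma2           # prediction variance
        K = P_pred @ Z[t] / F                       # Kalman gain
        beta = beta + K * v
        P = P_pred - np.outer(K, Z[t]) @ P_pred
        betas[t] = beta
    return betas

# track a slowly drifting coefficient in a toy regression
rng = np.random.default_rng(1)
T = 300
beta_true = np.linspace(0.2, 0.9, T)
x = rng.normal(size=T)
y = beta_true * x + 0.1 * rng.normal(size=T)
betas = forgetting_factor_tvp(y, x[:, None])
```

Note that P_pred = P/θ, so θ close to 1 implies little added parameter uncertainty per period and hence nearly constant coefficients.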

The use of a parametric model, and the simplifications imposed on the model structure to make the estimation feasible, do not come without costs. One potential pitfall is that the model assumes a very specific evolution of the model parameters. The driftless random walk assumption, widely used in econometrics and macroeconomics, has no grounding other than computational convenience, since it results in a very parsimonious state space model. If the true data generating process (DGP) is, however, very different from the one posited, the model is misspecified and this could result in poor performance. The second issue is that the curse of dimensionality is only partially solved. For 20 variables and 4 lags (a standard application with quarterly data) the stacked vector β_t contains 1620 elements. Larger model sizes (arising from a higher number of series in the system or from a higher number of lags, like the 13 lags conventionally used with monthly data in levels) are intractable in this setup. Finally, since the only source of time variation in the model is the prediction error, it can be shown that this forgetting factor algorithm boils down to an exponential smoothing estimator (Delle Monache and Petrella, 2014). This means that the effect of the prior on the initial condition β_1 dies out relatively quickly. Also, the longer the sample, the lower the effect of the prior on the parameter estimates. In contrast, in the kernel estimator the stochastic constraints are effective at each point in time.

We assess the relative merits of the nonparametric and parametric estimators by comparing their forecast accuracy in a Monte Carlo exercise and on actual data. The controlled environment provided by the Monte Carlo exercise allows us to evaluate how robust the two methods are to different assumptions on the law of motion of the model parameters. The application on actual data will tell us how relevant this is in practical forecasting. For both methods we need to specify a number of constants. In the case of the nonparametric method, given the good performance of forecast pooling, we use model averaging with either inverse RMSE or equal weights. For the parametric estimator we follow the dynamic model selection (DMS) algorithm proposed by Koop and Korobilis (2013), as discussed in Appendix A.1.

4.1 Monte Carlo evidence

We consider three alternative DGPs. In the first (DGP-1) we assume that the coefficients follow a random walk plus noise process:

Y_t = \Lambda_t Y_{t-1} + \varepsilon_t

\Lambda_t = \Lambda_{t-1} + \eta_t


To make the stochastic process of the coefficients consistent with the Litterman prior we bound the first autoregressive parameter to lie between 0.85 and 1.⁷ In the second DGP (DGP-2) we let the coefficients break only occasionally rather than at each time t:

\Lambda_t = (1 - I(\tau))\,\Lambda_{t-1} + I(\tau)\left(\Lambda_{t-1} + \eta_t\right)

The probability of the coefficients breaking equals a constant τ that we set to 0.025, implying that, with quarterly data, we would observe on average a discrete break once every ten years. We also relax the bounds on Λ_t, which we let fluctuate randomly between 0 and 1.⁸ In the third set of simulations (DGP-3) the coefficients evolve as sine functions and are bounded between −1 and 1:

\Lambda_t = \sin(10\pi t / T) + \eta_t

In all DGPs we assume η_t ∼ N(0, 1) and random walk stochastic volatilities for the measurement equations:

\varepsilon_{it} = u_{it} \exp(\lambda_{it})

\lambda_{it} = \lambda_{i,t-1} + \nu_{it}

where u_{it} ∼ N(0, 1) and ν_{it} ∼ N(0, σ_η). We calibrate σ_η = 0.01.⁹
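As an illustration, a scalar version of DGP-1 can be simulated along the following lines; the size of the coefficient innovation (much smaller than a unit variance, which the bounds would otherwise truncate constantly) and the use of clipping to enforce the 0.85-1 constraint are our assumptions for the sketch:

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_dgp1(T, sigma_eta=0.05, lo=0.85, hi=1.0):
    """Scalar DGP-1 sketch: AR(1) coefficient follows a bounded random
    walk; the measurement error has a random-walk log-volatility with
    innovation variance 0.01 (so innovation std 0.1), as calibrated."""
    lam = np.empty(T)
    y = np.empty(T)
    logvol = 0.0
    lam[0], y[0] = 0.9, 0.0
    for t in range(1, T):
        # bounded random-walk coefficient (clipping is our assumption)
        lam[t] = np.clip(lam[t-1] + sigma_eta * rng.normal(), lo, hi)
        # random-walk stochastic volatility, nu ~ N(0, 0.01)
        logvol += np.sqrt(0.01) * rng.normal()
        y[t] = lam[t] * y[t-1] + rng.normal() * np.exp(logvol)
    return y, lam

y, lam = simulate_dgp1(200)
```

The other two DGPs only change the law of motion of the coefficient: an occasional-break indicator for DGP-2 and a bounded sine path for DGP-3.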

The results of the Monte Carlo exercise are shown in Table 7. The methods are compared in terms of 1 step ahead RMSE (averaged across the n variables and relative to that of the parametric estimator) for VARs of different sizes (n = 7 and n = 15) and for different sample sizes (100, 150 and 200). Forecasts are computed on the second half of the sample: when T = 100, forecasts are computed recursively for t = 51 to t = 100; when T = 150, for t = 76 to t = 150; and so forth.

In the case of DGP-1 the performance of the two estimation methods is broadly comparable, with the parametric estimator improving slightly (by at most 2%) on the nonparametric one only for VARs of larger size. Notice that in this context one would expect the parametric estimator to have an edge, given the tight correspondence between the assumptions made by the model and the actual DGP. The gain attained by this method proves, however, negligible.

When we move to DGP-2 and DGP-3 the relative performance of the nonparametric estimator improves steadily, with gains of the order of 15% in the case of DGP-3 and n = 15. Although these DGPs are probably less representative of the typical relationships across macroeconomic time series, they do unveil some fragility of the Kalman filter based method, whose performance rapidly deteriorates as the behaviour of the coefficients moves further and further away from the random walk setting. The nonparametric estimator, on the other hand, proves robust not only to heteroscedastic errors but also to a wide range of different specifications of the coefficients.

7Details on how this is achieved are presented in Appendix A.1.

8With tight boundaries (like the 0.85-1 interval imposed in DGP-1) the difference between coefficients that break only occasionally and coefficients that drift slowly is negligible.

9Notice that this value for σ_η is quite large. Cogley and Sargent (2005), for example, assume that a priori σ_η is distributed as an inverse gamma with a single degree of freedom and scale parameter 0.01². Since the scale parameter can be interpreted as the (prior) sum of squared residuals, this means that a priori they set the variance of the innovations to the log-volatility to 0.01²/T. Assuming T = 100, the prior variance is 10⁻⁶, as opposed to our choice of 10⁻². This means that we are stacking the deck against our nonparametric estimator which, in the version without GLS correction used in this exercise, does not explicitly take changing variances into account.

4.2 Empirical evidence

A comparison of the empirical performance of the nonparametric and parametric TVP-VARs needs to take into account the computational limitations to which the latter is subject. This means that a horse race based on monthly VARs with 13 lags, like those employed in the previous section, is unfeasible. We therefore proceed by taking quarterly transformations of the variables and specify a 20 variables VAR with 4 lags, roughly corresponding to the largest model size considered by Koop and Korobilis (2013). The forecast exercise is similar to the one performed on monthly data, that is, we produce 1 to 8 quarters ahead forecasts of the three key variables in our dataset (CPI, the Fed Fund Rates and payroll employment), with an out-of-sample period ranging from 1970:q1 to 2013:q2 (167 data points).

Figure 9 presents the RMSEs of the kernel based estimator relative to those of the parametric one. Again, we use red bars to highlight the cases where a Diebold-Mariano test rejects the null hypothesis of equal forecast accuracy. Visual inspection of the graph reveals that the nonparametric estimator generates significantly better predictions for inflation and employment, while the parametric estimator is more accurate in forecasting short term interest rates. As for the remaining variables, the only cases in which the Diebold-Mariano test rejects in favour of the parametric estimator are the 10 year rate and M1 at very short horizons, while for the remaining 10 indicators the evidence is either in favour of the nonparametric approach (red bars lower than one) or inconclusive (grey bars).

In sum, this exercise suggests that in the context of forecasting with large VARs the nonparametric estimator coupled with stochastic constraints tends to outperform the Kalman filter based parametric estimator. While this does not constitute conclusive evidence in favour of one approach or the other, we believe that, given its good forecasting performance and computational efficiency, the method we propose in this paper provides a serious benchmark for forecasting with large data sets.

5 Structural analysis

As a final empirical exercise we use the proposed method to estimate the time varying responses of sectoral industrial production indexes to an oil price shock. The changing response


of key macroeconomic variables to unexpected oil price increases has been greatly debated in the past decade. In particular, using structural VARs and different identification assumptions, a number of studies have found that oil price increases are associated with smaller losses in U.S. output in more recent years. While some of these studies have used sample-split approaches, see Blanchard and Gali (2007) and Blanchard and Riggi (2013), others have relied on state-of-the-art Bayesian VARs with drifting coefficients and volatilities, see Baumeister and Peersman (2013) and Hahn and Mestre (2011) for an application to the euro area. A standard VAR approach, however, severely constrains the size of the system to be estimated, so that only a small number of variables can be jointly modeled. As a consequence, the available evidence mainly refers to aggregate output, and these studies have typically attributed the declining impact of oil prices to improved monetary policy, a smaller share of oil in production, or more flexible labor markets. Where a less aggregate perspective is taken, as in Edelstein and Kilian (2009), who study the response of consumption of motor vehicles to energy price shocks, a connection with the remaining sectors of the economy is missing.

The sectoral dimension of the effects of energy price shocks is, however, quite relevant. For industries with a large share of oil in production (like chemicals) oil price shocks mainly affect output through supply/costs, while for others (in particular the automobile industry) the effect works mainly through demand. Consumer demand is curbed by unexpected energy price increases through (i) a reduction of disposable income and (ii) an increase in uncertainty about the future path of energy prices and about the business cycle, which creates an incentive for precautionary savings and for postponing the purchase of energy intensive durable goods. A joint investigation of the time varying response of sectoral industrial production indexes to unexpected oil price increases can then shed some light on whether the importance of these two channels has changed over time. This, in turn, could be an important input for monetary policy which, according to theoretical models, should accommodate supply shocks but lean against demand shocks.

To investigate this question we augment the baseline 20 variables VAR with 8 industrial production series split by market destination.10 The additional series are Business Equipment, Consumer Goods and its two sub-components, Durable (half of which is accounted for by Automotive products) and Nondurable (Food, Clothing, Chemical and Paper products), Final Product goods (Construction and Business Supplies and Defence and Space equipment), Material goods and its two sub-components, Durable (Consumer and Equipment parts) and Nondurable (Textile, Paper and Chemical). A list of the series, together with their weights on the overall index, is reported in Table 8. We follow Blanchard and Gali (2007) and Blanchard and Riggi (2013) and identify an oil price shock by assuming that unexpected changes in the price of oil are exogenous relative to contemporaneous movements in the other variables in the system.11 While much debate has surrounded such an identification scheme, see Lippi and Nobili (2012) for a discussion, we believe this assumption serves the purpose of isolating oil price shocks from other disturbances and maintain it for its simplicity.

10For the structural analysis we follow Giraitis et al. (2012b) and use a two-sided Gaussian kernel with smoothing parameter H = 0.5. The penalty parameter λ is chosen over the full sample through the Lfit criterion.
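In a recursive scheme of this kind the exogeneity assumption amounts to ordering the oil price first and using a Cholesky factorisation of the residual covariance matrix. A minimal constant-parameter sketch of the resulting impulse responses follows; the toy coefficient and covariance matrices are purely illustrative:

```python
import numpy as np

def oil_shock_irf(A, Sigma, horizons=36):
    """IRFs to an oil price shock in a VAR(1) y_t = A y_{t-1} + u_t,
    identified via the Cholesky factor of Sigma with oil ordered first.
    Returns a (horizons, n) array: the response of each variable."""
    n = A.shape[0]
    B = np.linalg.cholesky(Sigma)    # lower triangular impact matrix
    shock = B[:, 0]                  # a unit oil shock loads on column 0
    irf = np.zeros((horizons, n))
    irf[0] = shock
    for h in range(1, horizons):
        irf[h] = A @ irf[h-1]        # propagate through the VAR dynamics
    return irf

# toy 2-variable system: oil price first, industrial output second
A = np.array([[0.5, 0.0],
              [-0.2, 0.8]])
Sigma = np.array([[1.0, 0.0],
                  [0.0, 0.5]])
irf = oil_shock_irf(A, Sigma, horizons=24)
```

In the time-varying case the same computation is repeated at each date with the kernel estimates of the coefficients and covariance, yielding a different IRF per period.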

The effect of an oil price shock on overall industrial output is depicted in Figure 10. Confirming previous findings, an oil price shock generates a pronounced fall in industrial output. Furthermore, in line with the results in Blanchard and Riggi (2013), the recessionary impact of an exogenous oil price disturbance is generally more severe in the 70s than in later decades. Notice, however, that the difference across the two sub-samples is entirely accounted for by the very early 70s, a finding that cannot be uncovered with the simple sample-split strategy considered therein and that validates the use of time-varying coefficient models.

Turning to the sectoral analysis, in Figure 11 we report the effects of the oil price shock on the 8 sub-sectors listed above. The Impulse Response Functions (IRFs) of industrial production in the Business Equipment (BUSEQ), Material output (IPMAT) and Final Products (IPFINAL) sectors all display a similar behaviour. Throughout the sample they generally fall following an energy price shock, and the decrease is stronger in the early 1970s. Moreover, the behaviour of Material goods (IPMAT) is dominated by that of its Durable component (IPDMAT), while the effect of an oil price shock on Nondurable materials (IPNMAT) has, if anything, intensified over time.

The response of Consumer goods (IPCONGD), on the other hand, does not appear to have changed dramatically over time, as it fluctuates within quite a narrow range (between -0.01 and -0.06) three years after the shock. This relative stability mainly reflects the very muted response of the Nondurable component (IPNCONGD, mainly processed food products), which accounts for around 80% of the overall sub-index. On the other hand, the production of Durable consumption goods (IPDCONGD, which includes motor vehicles) is much more reactive to oil prices, but the magnitude of its response has, if anything, increased rather than declined over time. This outcome broadly supports the analysis in Ramey and Vine (2011), who study the automotive industry in detail and find very little change in the response of motor vehicle consumption to oil price shocks.

Overall, based on these results, the declining role of oil prices in shaping U.S. business cycle fluctuations seems to stem from lower supply effects in sectors related to Business Equipment and Materials rather than from weaker demand effects on vehicles.

6 Conclusions

In this paper we propose an estimator for large dimensional VAR models with a flexible parameter structure, capable of accommodating breaks in the relationships among economic time series. Our procedure is based on the mixed estimator of Theil and Goldberger (1960), which imposes stochastic constraints on the model coefficients, and on the nonparametric VAR estimator proposed by Giraitis et al. (2014). The use of stochastic constraints mimics, in a classical context, the role of the prior in Bayesian models, allowing one to bypass the overfitting problem that arises in large dimensional models. We discuss various aspects of the practical implementation of the estimator, based on two alternative (fit and forecasting) criteria. We illustrate the implementation of the proposed estimator in a forecasting exercise and for structural analysis.

11In a recursive framework this amounts to ordering oil prices as the first variable in the VAR. Like Blanchard and Gali (2007) we define an oil price shock as an unexpected increase in the nominal price of oil.
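The core of the procedure, kernel-weighted estimation combined with stochastic constraints, can be sketched as a kernel-weighted ridge regression. This is a minimal illustration under stated assumptions (the bandwidth entering as T^H, a single scalar penalty λ, and shrinkage toward a fixed prior mean), not the estimator's full implementation; all function and variable names are our own.

```python
import numpy as np

def tvp_var_coeffs(Y, X, t, H, lam, prior_mean):
    """Kernel-weighted ridge estimate of the VAR coefficients at time t.

    Y : (T x n) dependent variables; X : (T x k) stacked regressors.
    H : bandwidth exponent (bandwidth = T**H, with H in [0.5, 1]).
    lam : weight on the stochastic constraint (prior mean), ridge-style.
    prior_mean : (k x n) prior coefficient matrix.
    """
    T, k = X.shape
    bw = T ** H
    # two-sided Gaussian kernel weights centred at t, normalised
    w = np.exp(-0.5 * ((np.arange(T) - t) / bw) ** 2)
    W = np.diag(w / w.sum())
    # mixed (Theil-Goldberger style) solution: kernel-weighted least
    # squares shrunk toward prior_mean; lam -> infinity recovers the prior
    A = X.T @ W @ X + lam * np.eye(k)
    B = X.T @ W @ Y + lam * prior_mean
    return np.linalg.solve(A, B)   # (k x n) coefficients at time t
```

Repeating the solve for every t traces out the full time-varying coefficient path; the ridge term is what keeps the system well conditioned when the cross-section is large.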

In the forecasting exercise, we model up to 78 U.S. macroeconomic time series and find that introducing time variation in the VAR parameters yields an improvement in prediction accuracy over models with a constant parameter structure, in particular when forecast combination is used to pool forecasts obtained from models with different degrees of time variation and penalty parameters. The nonparametric estimator also compares well with the alternative parametric approach of Koop and Korobilis (2013).

As an example of structural analysis, we study the changing effects of oil price shocks on economic activity, a question that has spurred a large number of studies in the applied macro literature in recent years. Using a large structural VAR, we find that the declining role of oil prices in shaping U.S. business cycle fluctuations stems from lower supply effects in sectors related to Business Equipment and Materials, rather than from weaker demand effects on vehicles as argued by part of the literature.

Overall, we believe that our findings illustrate how the econometric tool we have proposed opens the door to a number of interesting analyses of forecasting and of the nonlinear transmission of shocks, which have so far been constrained by computational issues.


No. Acronym (FRED database) Description SA Logs Prior mean

1 AAA Interest rates on AAA bonds Not Seasonally Adjusted 0 1

2 AHEMAN Average Hourly Earnings Of Production And Nonsupervisory Employees: Manufacturing Not Seasonally Adjusted 1 1

3 AWHMAN Average Weekly Hours of Production and Nonsupervisory Employees: Manufacturing Seasonally Adjusted 1 0

4 AWOTMAN Average Weekly Overtime Hours of Production and Nonsupervisory Employees: Manufacturing Seasonally Adjusted 1 0

5 BAA Interest rates on BAA bonds Not Seasonally Adjusted 1 1

6 CE16OV Civilian Employment Seasonally Adjusted 1 1

7 CPIAPPSL Consumer Price Index for All Urban Consumers: Apparel Seasonally Adjusted 1 1

8 CPIAUCSL Consumer Price Index for All Urban Consumers: All Items Seasonally Adjusted 1 1

9 CPILFESL Consumer Price Index for All Urban Consumers: All Items Less Food & Energy Seasonally Adjusted 1 1

10 CPIMEDSL Consumer Price Index for All Urban Consumers: Medical Care Seasonally Adjusted 1 1

11 CPITRNSL Consumer Price Index for All Urban Consumers: Transportation Seasonally Adjusted 1 1

12 CPIULFSL Consumer Price Index for All Urban Consumers: All Items Less Food Seasonally Adjusted 1 1

13 DMANEMP All Employees: Durable goods Seasonally Adjusted 1 0

14 DSPIC96 Real Disposable Personal Income Seasonally Adjusted 1 1

15 DPCERA3M086SBEA Real personal consumption expenditures (chain-type quantity index) Seasonally Adjusted 1 1

16 FEDFUNDS Effective Federal Funds Rate Not Seasonally Adjusted 0 1

17 GS1 1-Year Treasury Constant Maturity Rate Not Seasonally Adjusted 0 1

18 GS10 10-Year Treasury Constant Maturity Rate Not Seasonally Adjusted 0 1

19 GS5 5-Year Treasury Constant Maturity Rate Not Seasonally Adjusted 0 1

20 HOUST Housing Starts: Total: New Privately Owned Housing Units Started Seasonally Adjusted Annual Rate 1 0

21 HOUSTMW Housing Starts in Midwest Census Region Seasonally Adjusted Annual Rate 1 0

22 HOUSTNE Housing Starts in Northeast Census Region Seasonally Adjusted Annual Rate 1 0

23 HOUSTS Housing Starts in South Census Region Seasonally Adjusted Annual Rate 1 0

24 HOUSTW Housing Starts in West Census Region Seasonally Adjusted Annual Rate 1 0

25 INDPRO Industrial Production Index Seasonally Adjusted 1 1

26 IPBUSEQ Industrial Production: Business Equipment Seasonally Adjusted 1 1

27 IPCONGD Industrial Production: Consumer Goods Seasonally Adjusted 1 1

28 IPDCONGD Industrial Production: Durable Consumer Goods Seasonally Adjusted 1 1

29 IPDMAT Industrial Production: Durable Materials Seasonally Adjusted 1 1

30 IPFINAL Industrial Production: Final Products (Market Group) Seasonally Adjusted 1 1

31 IPMAT Industrial Production: Materials Seasonally Adjusted 1 1

32 IPNCONGD Industrial Production: Nondurable Consumer Goods Seasonally Adjusted 1 1

33 IPNMAT Industrial Production: Nondurable Materials Seasonally Adjusted 1 1

34 LOANS Loans and Leases in Bank Credit, All Commercial Banks Seasonally Adjusted 1 1

35 M1SL M1 Money Stock Seasonally Adjusted 1 1

36 M2SL M2 Money Stock Seasonally Adjusted 1 1

37 MANEMP All Employees: Manufacturing Seasonally Adjusted 1 0

38 NAPM ISM Manufacturing: PMI Composite Index Seasonally Adjusted 0 0

39 NAPMEI ISM Manufacturing: Employment Index Seasonally Adjusted 0 0

40 NAPMII ISM Manufacturing: Inventories Index Not Seasonally Adjusted 0 0

41 NAPMNOI ISM Manufacturing: New Orders Index Seasonally Adjusted 0 0

42 NAPMPI ISM Manufacturing: Production Index Seasonally Adjusted 0 0

43 NAPMSDI ISM Manufacturing: Supplier Deliveries Index Seasonally Adjusted 0 0

44 NDMANEMP All Employees: Nondurable goods Seasonally Adjusted 1 0

45 OILPRICE [] Not Seasonally Adjusted 1 1

46 PAYEMS All Employees: Total nonfarm Seasonally Adjusted 1 1

47 PCEPI Personal Consumption Expenditures: Chain-type Price Index Seasonally Adjusted 1 1

48 PERMIT New Private Housing Units Authorized by Building Permits Seasonally Adjusted Annual Rate 1 0

49 PERMITMW New Private Housing Units Authorized by Building Permits in the Midwest Census Region Seasonally Adjusted Annual Rate 1 0

50 PERMITNE New Private Housing Units Authorized by Building Permits in the Northeast Census Region Seasonally Adjusted Annual Rate 1 0

51 PERMITS New Private Housing Units Authorized by Building Permits in the South Seasonally Adjusted Annual Rate 1 0

52 PERMITW New Private Housing Units Authorized by Building Permits in the West Seasonally Adjusted Annual Rate 1 0

53 PI Personal Income Seasonally Adjusted Annual Rate 1 1

54 PPIACO Producer Price Index: All Commodities Not Seasonally Adjusted 1 1

55 PPICRM Producer Price Index: Crude Materials for Further Processing Seasonally Adjusted 1 1

56 PPIFCG Producer Price Index: Finished Consumer Goods Seasonally Adjusted 1 1

57 PPIFGS Producer Price Index: Finished Goods Seasonally Adjusted 1 1

58 PPIITM Producer Price Index: Intermediate Materials: Supplies & Components Seasonally Adjusted 1 1

59 SandP S&P 500 Stock Price Index 1 1

60 SRVPRD All Employees: Service-Providing Industries Seasonally Adjusted 1 1

61 TB3MS 3-Month Treasury Bill: Secondary Market Rate Not Seasonally Adjusted 0 1

62 TB6MS 6-Month Treasury Bill: Secondary Market Rate Not Seasonally Adjusted 0 1

63 UEMP15OV Number of Civilians Unemployed for 15 Weeks & Over Seasonally Adjusted 1 0

64 UEMP15T26 Number of Civilians Unemployed for 15 to 26 Weeks Seasonally Adjusted 1 0

65 UEMP27OV Number of Civilians Unemployed for 27 Weeks and Over Seasonally Adjusted 1 0

66 UEMP5TO14 Number of Civilians Unemployed for 5 to 14 Weeks Seasonally Adjusted 1 0

67 UEMPLT5 Number of Civilians Unemployed - Less Than 5 Weeks Seasonally Adjusted 1 0

68 UEMPMEAN Average (Mean) Duration of Unemployment Seasonally Adjusted 1 0

69 UNRATE Civilian Unemployment Rate Seasonally Adjusted 0 0

70 USCONS All Employees: Construction Seasonally Adjusted 1 1

71 USFIRE All Employees: Financial Activities Seasonally Adjusted 1 1

72 USGOOD All Employees: Goods-Producing Industries Seasonally Adjusted 1 0

73 USGOVT All Employees: Government Seasonally Adjusted 1 1

74 USMINE All Employees: Mining and logging Seasonally Adjusted 1 0

75 USPRIV All Employees: Total Private Industries Seasonally Adjusted 1 1

76 USTPU All Employees: Trade, Transportation & Utilities Seasonally Adjusted 1 1

77 USTRADE All Employees: Retail Trade Seasonally Adjusted 1 1

78 USWTRADE All Employees: Wholesale Trade Seasonally Adjusted 1 1

Table 1: Data description


Model   Selection method   λ                                        H
M1      Lfit               Optimized at each t                      0.5, 0.6, 0.7, 0.8, 0.9, 1
M2      Lfit               Optimized in pre-sample and then fixed   0.5, 0.6, 0.7, 0.8, 0.9, 1
M3      Lfit               Optimized at each t                      Optimized at each t
M4      Lfit               Optimized in pre-sample and then fixed   Optimized at each t
M5      Lmse               Optimized at each t                      0.5, 0.6, 0.7, 0.8, 0.9, 1
M6      Lmse               Optimized in pre-sample and then fixed   0.5, 0.6, 0.7, 0.8, 0.9, 1
M7      Lmse               Optimized at each t                      Optimized at each t
M8      Lmse               Optimized in pre-sample and then fixed   Optimized at each t

Table 2: Model specifications


[Four panels plotting the optimal bandwidth H (vertical axis, 0.4 to 1) over 1970-2010: Lfit (λ time varying), Lfit (λ fixed in pre-sample), Lmse (λ time varying), Lmse (λ fixed in pre-sample).]

Figure 1: Optimal H - 20 variables

[Four panels plotting the optimal bandwidth H (vertical axis, 0.4 to 1) over 1970-2010: Lfit (λ time varying), Lfit (λ fixed in pre-sample), Lmse (λ time varying), Lmse (λ fixed in pre-sample).]

Figure 2: Optimal H - 78 variables


[Plots of the optimal penalty λ (vertical axis, 0 to 1) over 1970-2010 under the Lfit and Lmse criteria.]

Figure 3: Optimal λ - 20 variables

[Plots of the optimal penalty λ (vertical axis, 0 to 1) over 1970-2010 under the Lfit and Lmse criteria.]

Figure 4: Optimal λ - 78 variables


Table 3: Forecast accuracy: 20 variables

                        CPI                       EMP                       FFRATE
λ         H        6     12     18     24    6     12     18     24    6     12     18     24

Lfit
Opt.      0.5    1.09   1.01   0.94   0.88  1.36  1.16   1.03   0.90  0.85  0.86   0.87   0.93
Opt.      0.6    1.09   1.04   1.00   0.94  1.34  1.16   1.03   0.90  0.86  0.88   0.89   0.94
Opt.      0.7    0.92   0.95   1.02   1.03  1.19  1.10   1.04   1.00  0.94  0.97   0.99   0.98
Opt.      0.8    0.97   0.96   0.97   0.98  0.95  0.94   0.93   0.92  0.94  0.94   0.93   0.94
Opt.      0.9    0.98   0.98   0.97   0.96  0.96  0.97   0.97   0.96  0.96  0.95   0.93   0.94
Opt.      1      0.98   0.98   0.96   0.95  0.97  0.98   0.98   0.96  0.96  0.95   0.93   0.93
Fixed     0.5    1.09   1.11   1.19   1.30  0.96  1.04   1.08   1.08  0.99  1.05   1.12   1.26
Fixed     0.6    0.95   0.99   1.05   1.09  0.92  0.93   0.92   0.91  0.97  1.02   1.07   1.16
Fixed     0.7    0.98   1.01   1.07   1.09  0.95  0.97   0.97   0.97  1.02  1.06   1.08   1.07
Fixed     0.8    0.97   0.97   0.98   0.99  0.95  0.97   0.97   0.97  0.98  0.97   0.96   0.95
Fixed     0.9    0.99   0.98   0.97   0.96  0.97  0.98   0.98   0.98  0.97  0.95   0.94   0.94
Fixed     1      0.99   0.98   0.96   0.95  0.98  0.99   0.99   0.98  0.96  0.95   0.93   0.94
Opt.      OPT    0.98   0.99   0.98   0.98  0.94  0.96   0.97   0.98  0.98  0.97   0.95   0.94
Fixed     OPT    0.99   1.01   1.03   1.03  0.98  1.02   1.04   1.05  0.98  0.99   1.00   0.96

Lmse
Opt.      0.5    1.09   1.07   1.16   1.37  1.16  1.14   1.12   1.11  0.98  0.97   1.05   1.32
Opt.      0.6    1.00   0.89   0.84   0.82  1.07  1.00   0.96   0.92  0.98  0.91   0.90   0.97
Opt.      0.7    1.03   0.95   0.89   0.86  1.09  0.98   0.89   0.85  0.92  0.88   0.88   0.92
Opt.      0.8    1.06   0.98   0.92   0.89  1.11  0.98   0.88   0.81  0.88  0.84   0.83   0.85
Opt.      0.9    1.07   0.98   0.92   0.88  1.15  1.03   0.94   0.88  0.87  0.84   0.83   0.86
Opt.      1      1.05   0.98   0.92   0.88  1.16  1.04   0.96   0.90  0.87  0.84   0.84   0.86
Fixed     0.5    1.14   1.27   1.49   1.82  1.01  1.09   1.16   1.19  1.02  1.15   1.31   1.72
Fixed     0.6    0.98   1.05   1.14   1.18  0.95  0.98   1.02   1.02  1.03  1.10   1.17   1.24
Fixed     0.7    0.97   0.98   1.00   1.02  0.93  0.92   0.90   0.89  0.99  1.01   1.02   1.03
Fixed     0.8    0.96   0.95   0.94   0.93  0.94  0.93   0.90   0.87  0.94  0.93   0.91   0.91
Fixed     0.9    0.98   0.96   0.94   0.92  0.98  0.97   0.95   0.92  0.94  0.92   0.90   0.91
Fixed     1      0.98   0.95   0.93   0.92  0.99  0.98   0.96   0.92  0.93  0.92   0.90   0.91
Opt.      OPT    1.04   0.97   0.95   0.97  1.05  0.98   0.95   0.92  0.94  0.88   0.87   0.96
Fixed     OPT    0.99   1.05   1.17   1.37  0.97  0.99   1.02   1.03  1.01  0.97   1.01   1.25

Note to Table 3. The table shows the RMSEs obtained by the models with time varying parameters described in Table 2 relative to those obtained with the benchmark large BVAR with constant parameters. Values below 1 (highlighted in bold) imply that the model outperforms the benchmark. Underlined values indicate the cases in which the Diebold-Mariano test rejects the null hypothesis of equal forecast accuracy at the 10% confidence level. The RMSEs are computed on 499 forecast errors, from January 1970 to July 2013.
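For concreteness, the relative RMSEs and the Diebold-Mariano comparison reported in the table can be computed along the following lines. This is a sketch, not the authors' code: the function names and the simple HAC truncation at h-1 lags are our own illustrative choices.

```python
import numpy as np

def relative_rmse(e_model, e_bench):
    """Ratio of the RMSEs of two forecast-error series (below 1
    means the model beats the benchmark)."""
    return np.sqrt(np.mean(e_model ** 2) / np.mean(e_bench ** 2))

def diebold_mariano(e_model, e_bench, h=1):
    """Diebold-Mariano statistic under squared-error loss.

    Uses a simple HAC long-run variance truncated at h-1 lags;
    asymptotically N(0, 1) under the null of equal accuracy.
    """
    d = e_model ** 2 - e_bench ** 2     # loss differential
    T = len(d)
    d_bar = d.mean()
    dc = d - d_bar
    lrv = dc @ dc / T                   # lag-0 autocovariance
    for k in range(1, h):               # add autocovariances up to h-1
        lrv += 2 * (dc[k:] @ dc[:-k]) / T
    return d_bar / np.sqrt(lrv / T)
```

A statistic larger than roughly 1.64 in absolute value corresponds to rejection at the 10% level used in the tables.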


Table 4: Forecast accuracy: 78 variables

                        CPI                       EMP                       FFRATE
λ         H        6     12     18     24    6     12     18     24    6     12     18     24

Lfit
Opt.      0.5    1.16   1.11   1.04   0.97  1.52  1.25   1.10   1.00  0.89  0.86   0.87   0.90
Opt.      0.6    1.13   1.12   1.05   0.98  1.45  1.25   1.12   1.04  0.94  0.97   1.02   1.05
Opt.      0.7    0.96   0.94   0.95   0.96  1.29  1.12   1.03   0.99  0.95  0.96   0.97   1.01
Opt.      0.8    0.99   0.97   0.97   0.97  1.00  0.96   0.94   0.94  0.98  0.98   0.99   1.01
Opt.      0.9    1.00   0.99   0.98   0.97  1.01  0.99   0.99   0.98  0.98  0.99   1.00   1.01
Opt.      1      1.00   0.99   0.98   0.97  1.01  1.00   0.99   0.99  0.98  0.99   0.99   1.00
Fixed     0.5    1.13   1.14   1.11   1.06  1.57  1.40   1.32   1.29  0.98  1.00   1.06   1.13
Fixed     0.6    0.99   0.94   0.89   0.86  1.06  0.99   0.93   0.90  1.00  1.01   1.08   1.18
Fixed     0.7    0.97   0.97   1.02   1.05  0.98  0.99   1.01   1.05  0.99  1.04   1.08   1.13
Fixed     0.8    1.00   0.99   1.01   1.02  1.00  1.01   1.01   1.01  0.99  1.01   1.02   1.04
Fixed     0.9    1.01   1.00   0.99   0.99  1.00  1.00   1.00   1.00  0.99  0.99   0.99   1.00
Fixed     1      1.01   1.00   0.99   0.99  1.00  1.00   1.00   1.00  0.99  0.99   0.99   0.99
Opt.      OPT    1.01   1.00   1.00   1.01  1.00  0.99   0.98   1.00  1.01  1.03   1.04   1.06
Fixed     OPT    1.01   1.01   1.02   1.04  1.01  1.01   1.01   1.03  0.99  1.03   1.03   1.05

Lmse
Opt.      0.5    1.18   1.14   1.17   1.25  1.33  1.30   1.30   1.35  1.00  1.05   1.22   1.43
Opt.      0.6    1.09   1.04   1.05   1.09  1.21  1.15   1.12   1.14  0.92  0.95   1.05   1.23
Opt.      0.7    1.03   1.04   1.12   1.29  1.21  1.12   1.11   1.22  0.94  0.93   1.10   1.48
Opt.      0.8    1.09   1.11   1.12   1.19  1.18  1.06   1.02   1.03  0.93  0.90   0.99   1.20
Opt.      0.9    1.12   1.14   1.14   1.17  1.17  1.04   1.01   1.01  0.92  0.88   0.94   1.09
Opt.      1      1.12   1.15   1.13   1.15  1.18  1.05   1.01   1.01  0.92  0.88   0.93   1.06
Fixed     0.5    1.10   1.11   1.10   1.09  1.12  1.14   1.16   1.20  1.11  1.17   1.31   1.50
Fixed     0.6    1.03   1.07   1.18   1.28  0.96  1.00   1.03   1.08  1.01  1.15   1.31   1.54
Fixed     0.7    1.00   1.03   1.13   1.23  1.01  1.04   1.08   1.14  1.00  1.04   1.13   1.30
Fixed     0.8    1.03   1.08   1.15   1.25  1.03  1.05   1.06   1.10  1.00  1.01   1.12   1.32
Fixed     0.9    1.03   1.08   1.12   1.19  1.02  1.03   1.05   1.07  0.99  0.99   1.06   1.20
Fixed     1      1.03   1.07   1.11   1.17  1.02  1.03   1.04   1.06  0.99  0.99   1.05   1.17
Opt.      OPT    1.13   1.19   1.25   1.34  1.19  1.08   1.05   1.06  0.99  1.00   1.13   1.36
Fixed     OPT    1.01   1.07   1.17   1.27  1.00  1.04   1.08   1.15  0.98  1.00   1.12   1.35

Note to Table 4. The table shows the RMSEs obtained by the models with time varying parameters described in Table 2 relative to those obtained with the benchmark large BVAR with constant parameters. Values below 1 (highlighted in bold) imply that the model outperforms the benchmark. Underlined values indicate the cases in which the Diebold-Mariano test rejects the null hypothesis of equal forecast accuracy at the 10% confidence level. The RMSEs are computed on 499 forecast errors, from January 1970 to July 2013.


Method   Model space                              Model selection criterion
S1       M1 to M4                                 Inverse EWMA of sq. forecast errors
S2       M1 to M4                                 Inverse mean of sq. forecast errors
S3       M5 to M8                                 Inverse EWMA of sq. forecast errors
S4       M5 to M8                                 Inverse mean of sq. forecast errors
S5       M1 to M8                                 Inverse EWMA of sq. forecast errors
S6       M1 to M8                                 Inverse mean of sq. forecast errors
S7       All models on the grids Hgrid, λgrid     Inverse EWMA of sq. forecast errors
S8       All models on the grids Hgrid, λgrid     Inverse mean of sq. forecast errors

Table 5: List of model selection methods

Method   Model space                              Model weights
W1       M1 to M4                                 Inverse EWMA of sq. forecast errors
W2       M1 to M4                                 Inverse mean of sq. forecast errors
W3       M1 to M4                                 Equal weights
W4       M5 to M8                                 Inverse EWMA of sq. forecast errors
W5       M5 to M8                                 Inverse mean of sq. forecast errors
W6       M5 to M8                                 Equal weights
W7       M1 to M8                                 Inverse EWMA of sq. forecast errors
W8       M1 to M8                                 Inverse mean of sq. forecast errors
W9       M1 to M8                                 Equal weights
W10      All models on the grids Hgrid, λgrid     Inverse EWMA of sq. forecast errors
W11      All models on the grids Hgrid, λgrid     Inverse mean of sq. forecast errors
W12      All models on the grids Hgrid, λgrid     Equal weights

Table 6: List of model pooling methods

Note to Tables 5 and 6. In order to preserve the real time nature of the exercise, the forecast errors used to compute the selection/weighting criteria are those actually available at each time t, i.e. those relative to the forecasts made at time t-h. The discounting parameter used for the EWMA method is 0.96.
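As an illustration, the inverse-EWMA weighting used by the selection and pooling schemes (discount 0.96, as stated in the note) can be written as follows; the function name and the array layout are our own.

```python
import numpy as np

def inverse_ewma_weights(sq_errors, delta=0.96):
    """Combination weights from inverse exponentially weighted moving
    averages of past squared forecast errors.

    sq_errors : (T x M) squared forecast errors for M models,
                ordered from oldest to most recent observation
    delta : EWMA discount factor (0.96 in the paper's exercise)
    Returns a length-M weight vector summing to one; models with
    smaller discounted errors receive larger weights.
    """
    T, M = sq_errors.shape
    discounts = delta ** np.arange(T - 1, -1, -1)   # recent errors count most
    ewma = discounts @ sq_errors / discounts.sum()  # (M,) discounted MSEs
    inv = 1.0 / ewma
    return inv / inv.sum()
```

Selection (Tables 5) corresponds to picking the model with the largest weight; pooling (Table 6) averages the forecasts with these weights, or with equal weights.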


[Bar plots of relative RMSEs (vertical axis, 0.4 to 1) for cpi, empl and ffrate across selection methods S1 to S8 and the random walk (RW).]

Figure 5: Forecast accuracy comparison, model selection - 20 variables

[Bar plots of relative RMSEs (vertical axis, 0.4 to 1) for cpi, empl and ffrate across selection methods S1 to S8 and the random walk (RW).]

Figure 6: Forecast accuracy comparison, model selection - 78 variables

Note to Figures 5 and 6. The bar plots show the RMSEs obtained by the models with time varying parameters and by the random walk (plus drift) relative to those obtained with the benchmark large BVAR with constant parameters. Values below 1 therefore imply that the model outperforms the benchmark. Each group of bars, from S1 to S8, indicates a different model selection method as described in Table 5, while each bar within a group indicates a different forecast horizon, going (leftwards) from 1 to 24 months ahead. Bars in grey indicate the forecast horizons for which the Diebold-Mariano test does not reject the null hypothesis of equal forecast accuracy, while those in red denote the cases in which forecast accuracy is significantly different at the 10% confidence level.


[Bar plots of relative RMSEs (vertical axis, 0.4 to 1) for cpi, empl and ffrate across averaging methods W1 to W12 and the random walk (RW).]

Figure 7: Forecast accuracy comparison, model averaging - 20 variables

[Bar plots of relative RMSEs (vertical axis, 0.4 to 1) for cpi, empl and ffrate across averaging methods W1 to W12 and the random walk (RW).]

Figure 8: Forecast accuracy comparison, model averaging - 78 variables

Note to Figures 7 and 8. The bar plots show the RMSEs obtained by the models with time varying parameters and by the random walk (plus drift) relative to those obtained with the benchmark large BVAR with constant parameters. Values below 1 therefore imply that the model outperforms the benchmark. Each group of bars, from W1 to W12, indicates a different model averaging method as described in Table 6, while each bar within a group indicates a different forecast horizon, going (leftwards) from 1 to 24 months ahead. Bars in grey indicate the forecast horizons for which the Diebold-Mariano test does not reject the null hypothesis of equal forecast accuracy, while those in red denote the cases in which forecast accuracy is significantly different at the 10% confidence level.


Table 7: 1 step ahead, relative RMSEs

                         T      Parametric     Nonparametric
                                               Inv. RMSE    Equal weights
DGP-1 (Random walk coefficients)
n=7                     100     1              1.004        1.005
                        150     1              0.999        1.000
                        200     1              0.997        0.997
n=15                    100     1              1.021        1.024
                        150     1              1.012        1.013
                        200     1              1.006        1.007
DGP-2 (Occasionally breaking coefficients)
n=7                     100     1              0.96         0.96
                        150     1              0.96         0.96
                        200     1              0.96         0.96
n=15                    100     1              0.96         0.96
                        150     1              0.95         0.95
                        200     1              0.94         0.94
DGP-3 (Sine function coefficients)
n=7                     100     1              0.95         0.96
                        150     1              0.96         0.98
                        200     1              0.97         0.99
n=15                    100     1              0.87         0.88
                        150     1              0.86         0.87
                        200     1              0.85         0.86

Note to Table 7. The table shows the ratio between the one step ahead RMSEs attained by, respectively, the nonparametric and the parametric model. Forecasts are computed on the second half of the sample: when T=100, forecasts are computed recursively for t=51 to t=100; when T=150, for t=76 to t=150; and when T=200, for t=101 to t=200.


[Sixteen bar plots of relative RMSEs at horizons 1 to 8, one per variable: CPI, Fund rates, Non farm empl., Wages-Manuf., Disp. Income, PCE, 10y Rate, Ind. Prod., Bank Loans, M1, M2, PMI-Comp., PMI-Orders, PPI, Stock Prices, Urate.]

Figure 9: Forecast accuracy, nonparametric and parametric estimators

Note to Figure 9. The bar plots show the ratio between the RMSEs attained by, respectively, the nonparametric and the parametric model. Values below 1 imply that the nonparametric model outperforms the parametric one. Bars in grey indicate that the Diebold-Mariano test does not reject the null hypothesis of equal forecast accuracy, while those in red denote the cases in which forecast accuracy is significantly different at the 10% confidence level.


Market group                                             Acronym     Weight
Industrial Production Index                              INDPRO      100
Industrial Production: Business Equipment                IPBUSEQ     9.18
Industrial Production: Consumer Goods                    IPCONGD     27.2
Industrial Production: Durable Consumer Goods            IPDCONGD    5.59
Industrial Production: Nondurable Consumer Goods         IPNCONGD    21.62
Industrial Production: Final Products (Market Group)     IPFINAL     16.58
Industrial Production: Materials                         IPMAT       47.03
Industrial Production: Durable Materials                 IPDMAT      17.34
Industrial Production: Nondurable Materials              IPNMAT      11.44

Table 8: Industrial production indexes by market group

Note to Table 8. The shares of the market groups refer to 2011 value added in nominal terms. Nondurable consumer goods include Consumer Energy products, which account for 5.7% of total IP. We have excluded from the analysis Industrial Production of Energy Materials, which is part of the Materials (IPMAT) group and accounts for 18.3% of overall output. Source: http://www.federalreserve.gov/releases/g17/g17tab1.txt


Figure 10: Response of Industrial production (overall index) to a 10% oil price shock


Figure 11: Response of Industrial production (sectors) to a 10% oil price shock


A Comparison with the parametric estimator: technical details

A.1 Monte Carlo exercise

In the Monte Carlo exercise the coefficients Λt are obtained through the following algorithm.

1. First, simulate n^2 + n coefficients according to the chosen DGP.

2. The coefficients are then scaled by the largest one in absolute value, so that they are all less than or equal to 1 in absolute value.

3. At each t the first n^2 coefficients are orthogonalised through the Gram-Schmidt procedure. Call the resulting orthonormal matrix Pt.

4. The last n coefficients are used to form a diagonal matrix Lt. At each point in time the elements of Lt (call them lt) are transformed using the following function

lt = 0.5(1 + θL + eps) + 0.5(1− θL − eps) arctan(lt)/ arctan(1)− eps

where lt are the input eigenvalues (since they are scaled by the absolute value of their maximum, they lie between -1 and 1), θL is the desired lower bound and eps is a small constant ensuring that the upper bound is 1 − eps, i.e. strictly below 1. The resulting function is relatively smooth, as can be seen in Figure 12, where the transformed value is plotted against the possible values of lt (on the x axis there are 201 equally spaced points between -1 and 1) and θL = 0.85.

5. Construct Ψt = PtLtP′t
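Steps 1-5 above can be sketched as follows. This is an illustrative implementation: the QR decomposition stands in for the Gram-Schmidt procedure, and the input `raw` is assumed to be already rescaled as in step 2.

```python
import numpy as np

def build_psi(raw, theta_L=0.85, eps=1e-3):
    """Construct a coefficient matrix Psi_t = P L P' with eigenvalues
    bounded in [theta_L, 1 - eps], from n^2 + n raw draws already
    scaled to lie in [-1, 1] (steps 2-5 above).
    """
    n = int((-1 + np.sqrt(1 + 4 * len(raw))) / 2)  # solve n^2 + n = len(raw)
    # step 3: orthonormalise the first n^2 draws (Gram-Schmidt via QR)
    P, _ = np.linalg.qr(raw[:n * n].reshape(n, n))
    # step 4: map the last n draws into eigenvalues in [theta_L, 1 - eps]
    l = raw[n * n:]
    l_bounded = (0.5 * (1 + theta_L + eps)
                 + 0.5 * (1 - theta_L - eps) * np.arctan(l) / np.arctan(1)
                 - eps)
    L = np.diag(l_bounded)
    # step 5: Psi_t = P L P'
    return P @ L @ P.T
```

Because Psi_t is symmetric with eigenvalues equal to the bounded values, its spectrum stays inside [theta_L, 1 - eps] by construction.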

A.2 Dynamic model selection

The estimation method for the parametric model suggested by Koop and Korobilis depends on a number of constants: the so-called forgetting factor θ, the prior tightness for the initial conditions λ, and the constant κ. To select θ and λ we follow their dynamic model selection (DMS) algorithm. First, the forgetting factor θ is made time varying as follows:

θt = θmin + (1 − θmin) L^(ft)    (31)

where ft = −NINT(ε′t−1 εt−1), NINT is the rounding to the nearest integer function, εt−1 are the one step ahead forecast errors, θmin = 0.96 and L = 1.1 (values calibrated to obtain a forgetting factor between 0.96 and 1). As for the prior tightness, we use a grid of J values. Each point in this grid defines a new model. Weights for each model j (denoted πt/t−1,j) are obtained by Koop and Korobilis as a function of the predictive density at time t − 1 through the following recursions:

πt/t−1,j = π^α_{t−1/t−1,j} / ∑_{l=1}^{J} π^α_{t−1/t−1,l}    (32)

πt/t,j = πt/t−1,j pj(yt|yt−1) / ∑_{l=1}^{J} πt/t−1,l pl(yt|yt−1)    (33)

where pj(yt|yt−1) is the predictive likelihood. Since this is a function of the prediction errors and of the prediction error variance, which are part of the output of the Kalman filter, the model weights can be computed at no cost along with the model parameter estimates. Note that here a new forgetting factor appears, α, which discounts past predictive likelihoods and is set to 0.99. The constant κ is set to 0.96 throughout the exercise. At each point in time, forecasts are obtained on the basis of the model with the highest weight πt/t−1,j.
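One step of the recursions (32)-(33) can be sketched as follows (an illustrative implementation; function and variable names are our own).

```python
import numpy as np

def dms_update(pi_prev, pred_lik, alpha=0.99):
    """One step of the dynamic model selection recursions (32)-(33):
    discount last period's posterior model probabilities by alpha,
    then update with the current predictive likelihoods.

    pi_prev  : (J,) model probabilities pi_{t-1|t-1}
    pred_lik : (J,) predictive likelihoods p_j(y_t | y_{t-1})
    Returns the predicted and updated probability vectors.
    """
    pi_pred = pi_prev ** alpha
    pi_pred /= pi_pred.sum()            # eq. (32)
    pi_post = pi_pred * pred_lik
    pi_post /= pi_post.sum()            # eq. (33)
    return pi_pred, pi_post

# forecasts use the model with the highest predicted probability
pi_pred, pi_post = dms_update(np.array([0.7, 0.2, 0.1]),
                              np.array([0.5, 2.0, 1.0]))
best_model = int(np.argmax(pi_pred))
```

Because α < 1 flattens the previous probabilities, the scheme lets the preferred degree of tightness drift over time rather than locking onto one model permanently.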

[Plot of the transformed eigenvalues (vertical axis, 0.85 to 1) against the input values lt between -1 and 1.]

Figure 12: Bounded coefficients


References

Banbura, Marta, Domenico Giannone, and Lucrezia Reichlin (2010), “Large Bayesian vector auto regressions.” Journal of Applied Econometrics, 25, 71–92.

Baumeister, Christiane and Gert Peersman (2013), “Time-Varying Effects of Oil Supply Shocks on the US Economy.” American Economic Journal: Macroeconomics, 5, 1–28.

Blanchard, Olivier J. and Jordi Gali (2007), “The Macroeconomic Effects of Oil Price Shocks: Why are the 2000s so different from the 1970s?” In International Dimensions of Monetary Policy, NBER Chapters, 373–421, National Bureau of Economic Research, Inc.

Blanchard, Olivier J. and Marianna Riggi (2013), “Why are the 2000s so different from the 1970s? A structural interpretation of changes in the macroeconomic effects of oil prices.” Journal of the European Economic Association, 11, 1032–1052.

Carriero, A., G. Kapetanios, and M. Marcellino (2009), “Forecasting exchange rates with a large Bayesian VAR.” International Journal of Forecasting, 25, 400–417.

Cogley, Timothy and Thomas J. Sargent (2005), “Drift and volatilities: Monetary policies and outcomes in the post WWII U.S.” Review of Economic Dynamics, 8, 262–302.

Creal, Drew, Siem Jan Koopman, and André Lucas (2013), “Generalized Autoregressive Score Models With Applications.” Journal of Applied Econometrics, 28, 777–795.

D’Agostino, Antonello, Domenico Giannone, and Paolo Surico (2006), “(Un)Predictability and macroeconomic stability.” Working Paper Series 0605, European Central Bank.

De Mol, Christine, Domenico Giannone, and Lucrezia Reichlin (2008), “Forecasting using a large number of predictors: Is Bayesian shrinkage a valid alternative to principal components?” Journal of Econometrics, 146, 318–328.

Delle Monache, Davide and Ivan Petrella (2014), “Adaptive Models and Heavy Tails.” Birkbeck Working Papers in Economics and Finance 1409, Birkbeck, Department of Economics, Mathematics & Statistics.

Edelstein, Paul and Lutz Kilian (2009), “How sensitive are consumer expenditures to retail energy prices?” Journal of Monetary Economics, 56, 766–779.

Eickmeier, Sandra, Wolfgang Lemke, and Massimiliano Marcellino (2015). Journal of the Royal Statistical Society, Series A, forthcoming.

Giraitis, L., G. Kapetanios, and T. Yates (2014), “Inference on stochastic time-varying coefficient models.” Journal of Econometrics, 179, 46–65.


Giraitis, Liudas, George Kapetanios, and Simon Price (2012a), “Adaptive forecasting in the presence of recent and ongoing structural change.” Working Papers 12/02, Department of Economics, City University London.

Giraitis, Liudas, George Kapetanios, and Tony Yates (2012b), “Inference on multivariate stochastic time varying coefficient and variance models (in progress).” Mimeo.

Hahn, Elke and Ricardo Mestre (2011), “The role of oil prices in the euro area economy since the 1970s.” Working Paper Series 1356, European Central Bank.

Hastie, Trevor, Robert Tibshirani, and Jerome Friedman (2003), The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer Series in Statistics, Springer.

Kapetanios, George, Vincent Labhard, and Simon Price (2008), “Forecasting Using Bayesian and Information-Theoretic Model Averaging: An Application to U.K. Inflation.” Journal of Business & Economic Statistics, 26, 33–41.

Koop, Gary and Dimitris Korobilis (2013), “Large time-varying parameter VARs.” Journal ofEconometrics, 177, 185–198.

Kuzin, Vladimir, Massimiliano Marcellino, and Christian Schumacher (2012), “Pooling versus model selection for nowcasting with many predictors: An application to German GDP.” Journal of Applied Econometrics, XX, XX.

Lippi, Francesco and Andrea Nobili (2012), “Oil and the Macroeconomy: A Quantitative Structural Analysis.” Journal of the European Economic Association, 10, 1059–1083.

Mumtaz, Haroon and Paolo Surico (2012), “Evolving International Inflation Dynamics: World and Country-Specific Factors.” Journal of the European Economic Association, 10, 716–734.

Pesaran, M. Hashem and Andreas Pick (2011), “Forecast Combination Across Estimation Windows.” Journal of Business & Economic Statistics, 29, 307–318.

Pesaran, M. Hashem and Allan Timmermann (2007), “Selection of estimation window in the presence of breaks.” Journal of Econometrics, 137, 134–161.

Primiceri, Giorgio E. (2005), “Time Varying Structural Vector Autoregressions and Monetary Policy.” The Review of Economic Studies, 72, 821–852.

Raftery, Adrian E., Miroslav Kárný, and Pavel Ettler (2010), “Online prediction under model uncertainty via dynamic model averaging: Application to a cold rolling mill.” Technometrics, 52, 52–66.

Ramey, Valerie A. and Daniel J. Vine (2011), “Oil, Automobiles, and the U.S. Economy: How Much Have Things Really Changed?” In NBER Macroeconomics Annual 2010, Volume 25, NBER Chapters, 333–367, National Bureau of Economic Research, Inc.


Rossi, Barbara, Atsushi Inoue, and Liu Jin (2014), “Optimal Window Selection in the Presenceof Possible Instabilities.” Discussion Paper 10168, CEPR.

Theil, Henri and A.S. Goldberger (1960), “On pure and mixed statistical estimation in econometrics.” International Economic Review, 2, 65–78.

Wright, Jonathan H. (2009), “Forecasting US inflation by Bayesian model averaging.” Journal of Forecasting, 28, 131–144.
