Arbitrage and the Empirical Evaluation of Asset-Pricing Models
Zhenyu Wang, Columbia University
Xiaoyan Zhang, Cornell University ∗
September 23, 2003
Abstract
A good asset-pricing model should be arbitrage-free. Consumption-based nonlinear models are arbitrage-free, but linear factor models and their time-varying extensions do not necessarily preclude arbitrage. In this paper, we introduce a simulation-based Bayesian analysis of Hansen and Jagannathan’s two pricing-error measures, of which the second requires the correct models to be arbitrage-free while the first does not. The arbitrage-free requirement is important to the empirical evaluation of time-varying and multi-factor models, especially when they are used to price conditional portfolios. Using the first measure, we are confident that time-varying extensions improve upon a static model, that the Fama-French three-factor model is better than single-factor models, and that the time-varying Fama-French model has substantially smaller pricing errors than the consumption-based nonlinear models. However, we do not have the same confidence in these statements if the second measure is used for comparing models on conditional portfolios, because time-varying and multi-factor models often provide arbitrage opportunities.
∗ The authors thank Andrew Ang, Mike Chernov, Gur Huberman, Paul Glasserman, Bob Hodrick, Michael Johannes, and Tano Santos for helpful discussions. Zhang acknowledges support by the Cornell Theory Center.
Contents
1 Introduction
2 The Econometric Framework
  2.1 Measuring Pricing Errors
  2.2 Computing the Measures of Pricing Errors
  2.3 A Simulation-Based Bayesian Inference
3 Empirical Investigations
  3.1 The Test Assets
  3.2 The Asset-Pricing Models
  3.3 Effects of the Arbitrage-Free Requirement
  3.4 Model Comparison
4 Conclusion
5 References
6 Tables
7 Figures
1 Introduction
The stochastic discount factor (SDF) of an asset-pricing model should always be positive
if prices assigned by the model never provide arbitrage opportunities (Hansen and Richard
(1987) and Harrison and Kreps (1979)). In this case, the asset-pricing model is said to be
arbitrage-free. The SDF of the nonlinear consumption CAPM is always positive. Most non-
linear consumption-based models, as extensions of the consumption CAPM, are arbitrage-
free. In contrast, the SDF of the CAPM, in which the return on the market portfolio is the
single factor, can be negative and may thereby provide arbitrage opportunities. Multi-factor
and time-varying linear asset-pricing models are not arbitrage-free either, because their SDFs
may often take large negative values. Although a number of authors report multi-factor and
time-varying linear models are successful in explaining security returns, the empirical evalu-
ation of those models in the literature does not require a correct model to be arbitrage-free.
Our objective in this paper is to investigate the effects of the arbitrage-free requirement on
the evaluation and comparison of asset-pricing models.
Why is the arbitrage-free requirement important to the empirical evaluation of asset-
pricing models? Given a collection of test assets with observed data, our effort in fitting the
data with models might lead us to the SDFs that take negative values. A negative SDF allows
for arbitrage, especially when it is used for pricing derivative securities. A model that allows
for arbitrage is a wrong model and an arbitrage is an important pricing error. Hansen and
Jagannathan (1997) introduce two measures of pricing errors. The first measure, denoted
by δ, is the distance from a pre-specified SDF to the set, denoted by M, of all SDFs that
price the test assets correctly. The second measure, denoted by δ+, is the distance from the
pre-specified SDF to the set, denoted by M+, of all positive SDFs that price the test assets
correctly. These two measures are referred to as HJ distances. The second distance requires
a correct model to be arbitrage-free. Obviously, the second HJ distance δ+ is generally larger
than the first HJ distance δ, because M+ is a subset of M. We may underestimate pricing
errors if we do not require the correct models to be arbitrage-free.
It is possible that a pre-specified model has a small δ but a large δ+. Figure 1 illustrates
such a case. The pre-specified SDF $y^I_t$ in the figure is much closer to M than to M+, and the
first HJ distance $\delta^I$ is thus much smaller than the second HJ distance $\delta^I_+$. It is also possible
that the two HJ distances are not very different. In Figure 1, another pre-specified SDF $y^J_t$
is such an example. Suppose $y^J_t$ is the SDF of a static single-factor model. Having observed
that this model has large pricing errors, we may introduce a time-varying model that has $y^I_t$
as the SDF. When we ignore arbitrage, the pricing error of $y^I_t$ seems small as measured by its
first HJ distance $\delta^I$. If $y^I_t$ is often negative, the pricing error measured by $\delta^I_+$ could be much
larger. If we require the correct models to be arbitrage-free and use the second HJ distance
to compare models, we might find that the time-varying model $y^I_t$ is not really much better
than the static single-factor model $y^J_t$. Most of the improvement of the time-varying model
is getting closer to M while it does not necessarily get closer to M+. This is the main point
of our paper. The task of the paper is to compare models using the second HJ distance δ+,
as well as the first HJ distance δ.
In the empirical investigation, we find that the arbitrage-free requirement is important
to the empirical evaluation of time-varying linear factor models. Using the first HJ distance,
which does not require the correct models to be arbitrage-free, we are confident that time-
varying extensions improve upon static models and that the Fama-French three-factor model
is better than single-factor models. Using the same measure, we are also confident that the
time-varying Fama-French model has substantially smaller pricing errors than consumption-
based nonlinear models. We no longer have the same confidence, however, if we use the
second HJ distance to measure pricing errors, especially on conditional portfolios. The
reason is that the SDFs in time-varying and multi-factor models take large negative values
frequently and thus provide significant arbitrage opportunities. When we use the second HJ
distance to measure pricing errors these time-varying and multi-factor models are far away
from the set of the correct models that are required to be arbitrage-free. If we ignore the
arbitrage-free requirement and use the first HJ distance, these time-varying and multi-factor
models seem to fit the data well.
Our work builds on the HJ distances, because they have three advantages for analyzing
specification errors in stochastic discount factor models. First, instead of assuming a pre-
specified model is correct, HJ distances examine how wrong the model is. Since all the
models are approximations, no models proposed in the literature are correct. It is natural
to ask whether a model is better than an alternative. For this objective, pricing errors serve
as a natural and practical criterion for comparing models, and HJ distances measure pricing
errors. Second, HJ distances incorporate conditioning information conveniently. Evaluating
a model using conditional information allows us to examine conditional restrictions of an
SDF. Finally and most importantly, the second HJ distance requires the correct models to
be arbitrage-free. The two distances therefore allow us to examine the importance of the
arbitrage-free requirement.
We introduce a simple and straightforward methodology, which uses the simulation-based
Bayesian analysis developed in the statistics literature. This methodology allows us to start
with an incorrectly specified model and to consider the arbitrage-free requirement. We ob-
tain the posterior distributions of HJ distances, which are convenient for formal comparison
of models. We also obtain the posterior distributions of many nonlinear measures of interest,
for which classic sampling distributions are not available or are hard to derive. The Bayesian
analysis has another advantage over the classic sampling theories, which are mostly asymptotic
and therefore require a long history of data. Bias due to the mismatch between an asymptotic
theory and a finite sample has been a great concern in the literature. The Bayesian
analysis we employ in this paper does not require the sample history to be unrealistically
long. Through posterior distributions, uncertainty associated with finite samples is properly
incorporated into the statistical inference of model evaluation and comparison.
The first HJ distance has been applied to a variety of empirical finance problems in the
cross-section of securities.1 Most existing empirical analyses using the first HJ distance are
based on the sampling distribution theory developed by Jagannathan and Wang (1996)2.
Although their sampling theory is very useful, all the studies using the theory ignore the
arbitrage-free requirement. The second HJ distance is largely ignored in the literature. An
important reason is that statistical inference of the second HJ distance is difficult, although
Hansen, Heaton, and Luttmer (1995) present a sampling theory that formulates consistency
and the limiting distribution of sample analogs of the HJ distances.
All studies using the sampling theories developed by Jagannathan and Wang (1996) and
Hansen, Heaton, and Luttmer (1995) miss an important advantage of HJ distances, namely,
the HJ distances are designed for comparing models under the assumption that all models
are wrong. The asymptotic sampling theories are useful for testing whether an HJ distance
is zero but not for comparing models, because in the sampling theories a number (e.g., zero)
for the HJ distance has to be assumed in the null hypothesis. All the studies in the literature
hypothesize that a model is correct and then test whether an HJ distance is significantly
different from zero. There has not been any formal statistical inference of model comparison
using HJ distances.
The remainder of the paper is organized as follows. Section 2 presents the econometric
framework. In this section, we review the measures of pricing errors and discuss the com-
putation of HJ distances. Then we introduce the simulation-based Bayesian analysis of HJ
distances. Section 3 presents our empirical investigation. In this section, we describe the
test assets and the asset-pricing models under our examination. Then we examine the effects
of the arbitrage-free requirement and compare models. Section 4 concludes.
1 For instance, Jagannathan and Wang (1996) apply the first HJ distance to study the conditional CAPM with human capital, Buraschi and Jackwerth (2001) apply the first HJ distance to examine option prices, Lettau and Ludvigson (2002) apply the first HJ distance to study the consumption CAPM, conditioning on the consumption-wealth ratio, and Hodrick and Zhang (2001) apply the first HJ distance to show that all the recently proposed linear asset-pricing models have significant pricing errors. The complete list of applications of the first HJ distance is too long to present here.
2 Although they presented the sampling theory in the context of linear asset-pricing models, their theory applies to nonlinear models with the help of a Taylor’s series expansion, which is often referred to as the delta method in the econometrics literature.
2 The Econometric Framework
2.1 Measuring Pricing Errors
In our empirical investigation, there are n assets, which are referred to as test assets. We
observe returns on the assets and use an n× 1 vector, rt, to denote the asset returns during
period t. We also observe k factors and l state variables. At the end of period t, the
vector of the observed factors is ft, and the vector of the realized state variables is xt. Let
$z_t = (r_t', f_t', x_t')'$. We assume zt follows a stationary stochastic process with finite second
moment. An SDF is denoted by mt. We assume mt ∈ L2, where L2 is the space of random
variables with finite second moments. If the SDF is correct, it satisfies:
\[ E_{t-1}[m_t r_t] = 1_n , \tag{1} \]
where Et−1[·] is the expectation conditioning on the information in period t − 1, and 1n is
an n × 1 vector of 1s.
To incorporate conditional asset pricing restrictions, a common practice is to scale the
returns by some lagged variables before taking the unconditional expectation. In general, we
introduce a matrix, H(zt−1), whose elements are functions of zt−1. Multiplying both sides of
equation (1) by H(zt−1) and taking unconditional expectations, the resulting asset pricing
restriction is
\[ E[m_t H(z_{t-1}) r_t] = E[H(z_{t-1}) 1_n] . \tag{2} \]
The vector, H(zt−1)rt, is referred to as the scaled returns, which can be viewed as payoffs
of conditional portfolios of the test assets. If we normalize each row of H(zt−1) to a vector
of weights that sum up to 1, H(zt−1)rt is the vector of returns on conditional portfolios.
Evaluating a conditional SDF model using conditioning variables is equivalent to evaluating
an unconditional SDF model on conditional portfolios.
If we let H(zt−1) = zt−1 ⊗ In+k+l, then H(zt−1)rt = vec(zt−1 ⊗ rt), which is a common
form of scaled returns.3 Suppose the first element of rt is the return on Treasury Bills and
the other elements are stock returns. If the state variable xt is a scalar and we use lagged
observation xt−1 to scale only the stock returns, the matrix H(zt−1) should be set to
\[
H_x =
\begin{pmatrix}
1 & 0 & 0 & \cdots & 0 \\
0 & x_{t-1} & 0 & \cdots & 0 \\
0 & 0 & x_{t-1} & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & x_{t-1}
\end{pmatrix} . \tag{3}
\]
The matrix H(zt−1) also handles transformations of returns. If we choose H(zt−1) to be
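\[
H_0 =
\begin{pmatrix}
1 & 0 & 0 & \cdots & 0 \\
-1 & 1 & 0 & \cdots & 0 \\
-1 & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
-1 & 0 & 0 & \cdots & 1
\end{pmatrix} , \tag{4}
\]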
then all but the first elements of H(zt−1)rt are excess stock returns over the Treasury Bills.
In this case, we have H(zt−1)1n = H01n = (1, 0, · · · , 0)′. If we are interested in only the
unconditional asset-pricing restriction on the original test assets, we choose H(zt−1) = In.
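As an illustration of the two matrices, the following minimal sketch builds H0 and Hx for a small hypothetical example and applies them to a return vector. The return values, the value of the lagged state variable, and the function names are assumptions made only for this example; the composition Hx H0 is shown because scaled excess returns are formed that way later in the paper.

```python
import numpy as np

def make_H0(n):
    """Matrix of equation (4): keep the first return, subtract it from all others."""
    H0 = np.eye(n)
    H0[1:, 0] = -1.0
    return H0

def make_Hx(n, x_lag):
    """Matrix of equation (3): leave the first element alone, scale the rest by x_{t-1}."""
    d = np.full(n, float(x_lag))
    d[0] = 1.0
    return np.diag(d)

n = 4
r_t = np.array([1.004, 1.020, 0.990, 1.015])  # hypothetical gross returns, T-bill first
x_lag = 0.5                                   # hypothetical lagged state variable

H0 = make_H0(n)
Hx = make_Hx(n, x_lag)

excess = H0 @ r_t          # T-bill return followed by excess stock returns
scaled = Hx @ H0 @ r_t     # excess stock returns scaled by x_{t-1}
prices = H0 @ np.ones(n)   # H0 1_n, the right-hand side of the pricing restriction
```

In this example `prices` equals (1, 0, 0, 0)', which matches the statement that H0 1n = (1, 0, · · · , 0)'.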
Let M be the set of SDFs that correctly price the portfolios on average, that is,
\[ M = \left\{ m_t : m_t \in L^2,\; E[m_t H(z_{t-1}) r_t] = E[H(z_{t-1}) 1_n] \right\} . \tag{5} \]
Although the SDFs in M assign the correct price to the payoffs of the conditional portfolios
of the test assets, they may provide arbitrage opportunities, because M does not require
its members to be nonnegative.4 In an equilibrium model, the SDF should not be negative
because the SDF is the marginal rate of substitution of consumption between today and
3 The idea of scaling asset returns by conditioning variables originated with Hansen and Singleton (1982) and has been explored by numerous researchers. For a discussion on the efficient use of conditioning variables, see Ferson and Siegel (2001).
4 To see how a negative mt provides arbitrage, let vt be the payoff of a security such that vt = 1 if mt < 0 and vt = 0 otherwise. This security is a contingent claim and can be considered as a hypothetical derivative security. Using mt to value vt, the price of the security is Et−1[mtvt]. If the conditional probability of mt < 0 is nonzero, we have Et−1[mtvt] < 0. In this case, the security is an arbitrage because its price is negative while its payoff is never negative. A necessary and sufficient condition for preventing such arbitrage is mt ≥ 0 with probability 1.
tomorrow. Let M+ be the set of nonnegative SDFs that, on average, correctly price the
conditional portfolios:
\[ M_+ = \left\{ m_t : m_t \in L^2,\; m_t \ge 0,\; E[m_t H(z_{t-1}) r_t] = E[H(z_{t-1}) 1_n] \right\} . \tag{6} \]
We assume that M+ is nonempty. This assumption holds if the observed prices of the
conditional portfolios do not provide arbitrage.
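The arbitrage argument in footnote 4 can be checked numerically. The sketch below is only an illustration under stated assumptions: it uses a hypothetical linear SDF with made-up coefficients and simulated market excess returns, and it replaces the conditional expectation by an unconditional sample average.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical market excess returns and linear SDF coefficients (illustration only).
r_mkt = rng.normal(0.005, 0.05, size=200_000)
b0, b1 = 1.0, -15.0
m = b0 + b1 * r_mkt            # linear SDF; negative whenever r_mkt > 1/15

v = (m < 0).astype(float)      # payoff of the contingent claim: 1 if m < 0, else 0
price = np.mean(m * v)         # sample analog of E[m v]
neg_rate = np.mean(m < 0)      # sample frequency of a negative SDF
```

Because the payoff `v` is never negative while `price` is negative, the claim is an arbitrage under this SDF, as the footnote describes.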
Let yt be the SDF of a pre-specified asset-pricing model in our empirical investigation. In
general, the prices assigned by the pre-specified asset-pricing model yt are not consistent with
the observed prices, i.e., yt ∉ M. Hansen and Jagannathan (1997) introduce two measures
of the pricing errors:
\[ \delta = \min_{m_t \in M} \sqrt{E[(y_t - m_t)^2]} \tag{7} \]
\[ \delta_+ = \min_{m_t \in M_+} \sqrt{E[(y_t - m_t)^2]} . \tag{8} \]
The measure δ+ requires a correct model to be arbitrage-free while the measure δ does not.
The measures are the least-square distances from the pre-specified SDF to the set of the
SDFs that we consider to be correct. If we consider M to be the set of correct SDFs, we
measure the pricing errors of yt by δ. If we consider M+ to be the set of correct models,
we measure the pricing errors of yt by δ+. The finance literature refers to δ+ and δ as HJ
distances.5 Hansen and Jagannathan show that δ and δ+ are the maximum pricing errors
over normalized payoffs. Obviously, δ ≤ δ+ because M+ ⊂ M. Since δ and δ+ are different
in general, the arbitrage-free requirement affects our measure of pricing errors. To examine
the importance of the arbitrage-free requirement, we can look at the difference δ+ − δ and
the ratio (δ+ − δ)/δ.
In most empirical analyses of asset-pricing models, the pre-specified model often has
unknown parameters. A general form of the pre-specified SDF is yt = g(θ, ft, zt−1). Note
that we allow the SDF in the specified model to depend on lagged variables. The functional
5 If yt is a constant discount factor, δ and δ+ become the two volatility bounds derived by Hansen and Jagannathan (1991).
form g(·, ·, ·) is pre-specified and the vector of parameters, θ, is unknown. The HJ distances
of yt are therefore functions of θ and should be denoted by δ(θ) and δ+(θ). With free
parameters θ, the function g specifies a class of SDFs. The distance from the class of SDFs
to M+ (or M) is defined as the minimum of δ+(θ) (or δ(θ)) over all possible θ, as suggested
by Hansen and Jagannathan (1997). That is, we define $\delta = \min_\theta \delta(\theta)$ and $\delta_+ = \min_\theta \delta_+(\theta)$.
By $\hat\theta$ and $\hat\theta_+$, we denote the solutions to the two minimization problems, respectively.
When an SDF yt is specified, we can theoretically assess the arbitrage opportunity by
calculating the probability that yt is negative. Let us denote such a probability by π; i.e.,
π = Prob{yt < 0}. We refer to π as the negativity rate of yt. When a set of SDFs is specified
in the form yt = g(θ, ft, zt−1) with free parameters, the negativity rate of yt depends on θ.
Let us denote it by π(θ). A particularly interesting θ is the one that minimizes δ(θ), namely $\hat\theta$. Defining
$\pi \equiv \pi(\hat\theta)$, we refer to π as the negativity rate of yt = g(θ, ft, zt−1) with free parameters θ. The
negativity rate π indicates the probability for yt to be negative after choosing the parameters
θ to minimize yt’s distance to M.
2.2 Computing the Measures of Pricing Errors
We assume that zt = (rt, ft, xt) follows a VAR process.6 That is,
\[ z_t = C + A z_{t-1} + \varepsilon_t , \qquad \varepsilon_t \sim N(0_m, \Omega) , \tag{9} \]
where m = n+k+ l is the dimension of vector zt and Ω is an m×m positive definite matrix.
The noise term εt is independent across time. Using the partitioned vectors and matrices,
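the process zt can be expressed as
\[
\begin{pmatrix} r_t \\ f_t \\ x_t \end{pmatrix}
=
\begin{pmatrix} C_1 \\ C_2 \\ C_3 \end{pmatrix}
+
\begin{pmatrix}
A_{11} & A_{12} & A_{13} \\
A_{21} & A_{22} & A_{23} \\
A_{31} & A_{32} & A_{33}
\end{pmatrix}
\begin{pmatrix} r_{t-1} \\ f_{t-1} \\ x_{t-1} \end{pmatrix}
+
\begin{pmatrix} \varepsilon_{1t} \\ \varepsilon_{2t} \\ \varepsilon_{3t} \end{pmatrix} . \tag{10}
\]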
6 We can also assume a more complicated process like the multivariate GARCH for zt to allow conditional heteroscedasticity. The VAR process is chosen here because it is relatively simple to explain but still general enough to allow returns to be predictable, as evidenced by many studies in the literature. We also conducted our analysis using the multivariate GARCH process. Switching from the VAR process to the multivariate GARCH process does not change our empirical results materially.
The unconditional distribution of zt is a normal distribution, with mean µ and variance Σ,
which are given as
\[ \mu = (I_m - A)^{-1} C \tag{11} \]
\[ \mathrm{vec}(\Sigma) = (I_{m^2} - A \otimes A)^{-1} \mathrm{vec}(\Omega) . \tag{12} \]
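Equations (11) and (12) translate directly into a few lines of linear algebra. The following is a minimal sketch, not the authors' code; the function name and the numerical values of C, A, and Ω are hypothetical. The vec operator stacks columns, which corresponds to Fortran ("F") ordering in NumPy.

```python
import numpy as np

def var1_unconditional_moments(C, A, Omega):
    """Unconditional mean and variance of z_t = C + A z_{t-1} + eps_t (stationary case)."""
    m = A.shape[0]
    mu = np.linalg.solve(np.eye(m) - A, C)
    vec_omega = Omega.reshape(-1, order="F")                  # column-stacking vec
    vec_sigma = np.linalg.solve(np.eye(m * m) - np.kron(A, A), vec_omega)
    Sigma = vec_sigma.reshape(m, m, order="F")
    return mu, Sigma

# Hypothetical two-variable example.
C = np.array([0.01, 0.02])
A = np.array([[0.5, 0.1],
              [0.0, 0.3]])
Omega = np.array([[0.04, 0.01],
                  [0.01, 0.09]])

mu, Sigma = var1_unconditional_moments(C, A, Omega)
```

A quick consistency check is that the results satisfy µ = C + Aµ and Σ = AΣA′ + Ω, which are the fixed-point equations that (11) and (12) solve.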
Given the unconditional distribution of $z_t = (r_t', f_t', x_t')'$, we can derive the unconditional
distribution of $v_t = (r_t', f_t', z_{t-1}')'$, which is necessary for the calculation of HJ distances. The
vector vt is linearly related to zt−1 and εt as follows:
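\[ v_t = \tilde C + \tilde A z_{t-1} + D \varepsilon_t , \tag{13} \]
where the tilde marks quantities for the stacked vector vt, whose last n + k + l elements simply reproduce zt−1:
\[
\tilde C =
\begin{pmatrix} C_1 \\ C_2 \\ 0_{n \times 1} \\ 0_{k \times 1} \\ 0_{l \times 1} \end{pmatrix} ,
\quad
\tilde A =
\begin{pmatrix}
A_{11} & A_{12} & A_{13} \\
A_{21} & A_{22} & A_{23} \\
I_n & 0_{n \times k} & 0_{n \times l} \\
0_{k \times n} & I_k & 0_{k \times l} \\
0_{l \times n} & 0_{l \times k} & I_l
\end{pmatrix} ,
\quad
D =
\begin{pmatrix}
I_n & 0_{n \times k} & 0_{n \times l} \\
0_{k \times n} & I_k & 0_{k \times l} \\
0_{n \times n} & 0_{n \times k} & 0_{n \times l} \\
0_{k \times n} & 0_{k \times k} & 0_{k \times l} \\
0_{l \times n} & 0_{l \times k} & 0_{l \times l}
\end{pmatrix} . \tag{14}
\]
Therefore, the unconditional distribution of vt is normal and the mean and variance are,
respectively,
\[ \tilde\mu = \tilde C + \tilde A \mu \tag{15} \]
\[ \tilde\Sigma = \tilde A \Sigma \tilde A' + D \Omega D' . \tag{16} \]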
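The construction of the moments of vt can be sketched as follows. This is an illustration under stated assumptions: the function name and the numerical VAR parameters are hypothetical, and the intercept of the stacked system is taken to be zero in its last n + k + l rows, since those rows of vt are just zt−1.

```python
import numpy as np

def stacked_moments(C, A, Omega, n, k):
    """Mean and variance of v_t = (r_t', f_t', z_{t-1}')' implied by a stationary VAR(1)."""
    m = A.shape[0]
    nk = n + k
    # Unconditional moments of z_t (column-stacking vec).
    mu = np.linalg.solve(np.eye(m) - A, C)
    vec_sigma = np.linalg.solve(np.eye(m * m) - np.kron(A, A),
                                Omega.reshape(-1, order="F"))
    Sigma = vec_sigma.reshape(m, m, order="F")
    # Stacked system: first n+k rows follow the VAR, last m rows copy z_{t-1}.
    C_tilde = np.concatenate([C[:nk], np.zeros(m)])
    A_tilde = np.vstack([A[:nk, :], np.eye(m)])
    D = np.vstack([np.hstack([np.eye(nk), np.zeros((nk, m - nk))]),
                   np.zeros((m, m))])
    mu_tilde = C_tilde + A_tilde @ mu
    Sigma_tilde = A_tilde @ Sigma @ A_tilde.T + D @ Omega @ D.T
    return mu, Sigma, mu_tilde, Sigma_tilde

# Hypothetical example with n = 1 asset, k = 1 factor, l = 1 state variable.
n, k = 1, 1
C = np.array([0.01, 0.00, 0.02])
A = np.array([[0.1, 0.2, 0.0],
              [0.0, 0.3, 0.1],
              [0.0, 0.0, 0.8]])
Omega = np.array([[0.040, 0.010, 0.000],
                  [0.010, 0.090, 0.005],
                  [0.000, 0.005, 0.010]])

mu, Sigma, mu_tilde, Sigma_tilde = stacked_moments(C, A, Omega, n, k)
```

By stationarity, the block of the stacked variance that corresponds to zt−1 should equal Σ, and the block for (rt, ft) should equal the matching block of Σ; both properties give a simple check on the construction.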
The unknown parameters in the data-generating process (9) are the initial value z0, the
coefficient B = (C, A) in the autoregressive regression, and the variance Ω of the noise term.
Let Ψ = (z′0, vec(B)′, vec(Ω)′)′. It is the vector of parameters in the VAR process of zt. We
have T observations on zt, and the set of observed data is Z = (z1, · · · , zT )′. We treat z0 as
part of the unknown parameters because z0 is not in our observed data Z.
For computing δ, Hansen and Jagannathan (1997) show that the square of the HJ distance
ignoring arbitrage can be written as the weighted average of squared pricing errors. For the
pre-specified SDF yt = g(θ, ft, zt−1), we can apply Hansen and Jagannathan’s formula to
obtain
\[
\delta^2(\theta) = E[y_t H(z_{t-1}) r_t - H(z_{t-1}) 1_n]' \left( E[H(z_{t-1}) r_t r_t' H(z_{t-1})'] \right)^{-1} E[y_t H(z_{t-1}) r_t - H(z_{t-1}) 1_n] . \tag{17}
\]
If g is a linear function, the above formula allows us to calculate δ(θ) analytically for a given
Ψ because we can calculate the expectations in (17) analytically. However, if g is a nonlinear
function, we must calculate the expectations numerically as described later. In order to
calculate the second HJ distance, we can use the following formula, which is also obtained
by applying an equation for δ+ derived by Hansen and Jagannathan:
\[
\delta_+^2(\theta) = \max_{\lambda \in R^n} E\left[ y_t^2 - \left( [y_t - \lambda' H(z_{t-1}) r_t]^+ \right)^2 - 2 \lambda' H(z_{t-1}) 1_n \right] , \tag{18}
\]
where n is the number of rows in H(zt−1), and Rn is the space of n × 1 real vectors. The
function [·]+ is defined as [x]+ = x if x ≥ 0 and [x]+ = 0 if x < 0. In equation (18), we
cannot analytically calculate the expectation for a given Ψ. In addition, we must do the
maximization numerically to obtain δ+(θ).
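To make the numerical maximization concrete, the sketch below computes sample analogs of both distances from draws of the SDF and of the scaled payoffs. It is a minimal illustration, not the authors' implementation: the expectation is replaced by a sample average, the maximization uses a damped Newton iteration chosen here only for simplicity (the objective is concave in λ), and all names and simulated inputs are hypothetical.

```python
import numpy as np

def hj_distances(y, x, q, max_iter=100, tol=1e-10):
    """Sample analogs of delta and delta_+.

    y: (J,) SDF draws; x: (J, n) draws of H(z)r; q: (J, n) draws of H(z)1_n.
    """
    J, n = x.shape
    g = (y[:, None] * x - q).mean(axis=0)      # average pricing errors
    G = x.T @ x / J                            # second-moment matrix of payoffs
    lam = np.linalg.solve(G, g)                # maximizer when positivity is ignored
    delta = np.sqrt(max(g @ lam, 0.0))

    def objective(l):
        resid = np.maximum(y - x @ l, 0.0)     # [y - lambda'x]^+
        return np.mean(y**2 - resid**2 - 2.0 * (q @ l))

    f = objective(lam)
    for _ in range(max_iter):
        resid = np.maximum(y - x @ lam, 0.0)
        grad = 2.0 * (resid[:, None] * x).mean(axis=0) - 2.0 * q.mean(axis=0)
        if np.linalg.norm(grad) < tol:
            break
        active = x[resid > 0.0]
        neg_hess = 2.0 * active.T @ active / J + 1e-12 * np.eye(n)
        step = np.linalg.solve(neg_hess, grad)
        t = 1.0
        while t > 1e-8 and objective(lam + t * step) < f:
            t *= 0.5                           # backtrack until the objective does not fall
        if t <= 1e-8:
            break
        lam = lam + t * step
        f = objective(lam)
    return delta, np.sqrt(max(f, 0.0))

# Hypothetical draws: a T-bill, a market portfolio, and a linear SDF that is often negative.
rng = np.random.default_rng(1)
J = 50_000
mkt = rng.normal(0.006, 0.05, J)
x = np.column_stack([np.full(J, 1.004), 1.004 + mkt])
q = np.ones((J, 2))
y = 1.0 - 15.0 * mkt

d, dp = hj_distances(y, x, q)
```

Starting the iteration at the maximizer for δ guarantees that the reported δ+ is never below δ, in line with M+ being a subset of M.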
We can however calculate the expectations approximately from a given Ψ. This allows
us to obtain approximations of HJ distances. Since the unconditional distribution of vt is
normal, with the mean and variance given in equations (15) and (16), we can generate independent draws of vt from the
unconditional distribution. Then it is easy to compute HJ distances. Let the independent
draws be
\[
v^{(j)} = \begin{pmatrix} r^{(j)} \\ f^{(j)} \\ z^{(j)} \end{pmatrix} , \quad j = 1, \cdots, J . \tag{19}
\]
For a set of given parameters θ and a given function g in the pre-specified model yt =
g(θ, ft, zt−1), we can generate independent draws of y(j) by letting y(j) = g(θ, f (j), z(j)).
We can approximate δ(θ) using the formula
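\[
\delta^2(\theta) = E_J[y^{(j)} H(z^{(j)}) r^{(j)} - H(z^{(j)}) 1_n]' \left( E_J[H(z^{(j)}) r^{(j)} r^{(j)\prime} H(z^{(j)})'] \right)^{-1} E_J[y^{(j)} H(z^{(j)}) r^{(j)} - H(z^{(j)}) 1_n] , \tag{20}
\]
where $E_J[\cdot]$ is defined as $J^{-1} \sum_{j=1}^{J} [\cdot]$ and J is a large integer. Similarly, we can approximate
δ+(θ) using the formula
\[
\delta_+^2(\theta) = \max_{\lambda \in R^n} E_J\left[ (y^{(j)})^2 - \left( [y^{(j)} - \lambda' H(z^{(j)}) r^{(j)}]^+ \right)^2 - 2 \lambda' H(z^{(j)}) 1_n \right] . \tag{21}
\]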
The convergence of the approximation can be established by the Law of Large Numbers, and
the precision of the approximation can be assessed by applying the Central Limit Theorem.
Note that the approximation can be arbitrarily precise by making J large. The two HJ
distances δ and δ+ can then be obtained by minimizing δ(θ) and δ+(θ) over all the choices of
parameters θ. Using the simulation approach, we can also approximate the negativity rate
using the formula
\[
\pi = \lim_{J \to +\infty} E_J\left[ I[g(\hat\theta, f^{(j)}, z^{(j)})] \right] , \tag{22}
\]
where I[x] equals 1 if x < 0 and 0 otherwise, and $\hat\theta$ minimizes δ(θ).
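For a linear SDF the minimization over θ has a closed form, because the squared distance in equation (20) is quadratic in θ. The following sketch assumes, for illustration, an SDF of the form y = F θ with F = (1, f), and solves the resulting generalized least-squares problem; the function names and the simulated draws are hypothetical.

```python
import numpy as np

def hj_delta(y, x, q):
    """Sample analog of the first HJ distance for SDF draws y, payoffs x, and prices q."""
    g = (y[:, None] * x - q).mean(axis=0)
    G = x.T @ x / x.shape[0]
    return float(np.sqrt(max(g @ np.linalg.solve(G, g), 0.0)))

def fit_linear_sdf(F, x, q):
    """Minimize delta(theta) for y = F @ theta; return theta, delta, and the negativity rate."""
    J = x.shape[0]
    D = x.T @ F / J                        # E_J[x F'], an n x (k+1) matrix
    G = x.T @ x / J
    q_bar = q.mean(axis=0)
    Ginv_D = np.linalg.solve(G, D)
    theta = np.linalg.solve(D.T @ Ginv_D, Ginv_D.T @ q_bar)
    y = F @ theta
    return theta, hj_delta(y, x, q), float(np.mean(y < 0.0))

# Hypothetical draws: three assets and one factor.
rng = np.random.default_rng(2)
J = 20_000
mkt = rng.normal(0.006, 0.05, J)
x = np.column_stack([np.full(J, 1.004),
                     1.004 + mkt,
                     1.006 + 0.5 * mkt + rng.normal(0.0, 0.03, J)])
q = np.ones((J, 3))
F = np.column_stack([np.ones(J), mkt])

theta, delta, neg_rate = fit_linear_sdf(F, x, q)
```

The negativity rate is the sample frequency of negative SDF draws at the fitted parameters, which is the finite-J counterpart of equation (22). For a nonlinear g, the closed-form step would have to be replaced by a numerical minimizer.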
2.3 A Simulation-Based Bayesian Inference
This section develops the simulation-based Bayesian inference of HJ distances. The basic
idea of our approach is as follows. We assume zt follows a general stochastic process, which
depends on some unknown parameters Ψ. Since this stochastic process should fit the data,
we specify a non-informative prior distribution for Ψ. The likelihood of the data is the
probability, denoted by p(Z|Ψ), of Z conditioning on Ψ. We want to obtain the posterior
distribution of the parameters given Z, denoted by p(Ψ|Z). For a given set of SDFs in
the form of yt = g(θ, ft, zt−1), the distribution of zt conditioning on parameters Ψ should
determine the negativity rates and HJ distances. That is, Ψ determines π, δ, and δ+ for
the given form of SDFs. Therefore, the posterior distribution of Ψ determines the posterior
distributions of π, δ, and δ+. The latter posterior distributions, denoted by p(π|Z), p(δ|Z),
and p(δ+|Z), respectively, are what we need for our analysis. The rest of this subsection
details the idea just described.
Imposing no prior opinion on parameters in the data-generating process, we assume the
following standard non-informative prior distribution for Ψ = (z′0, vec(B)′, vec(Ω)′)′ in the
data-generating process (9). The prior distributions of the three parts of Ψ are independent;
i.e.,
p(Ψ) = p(z0) p(B) p(Ω) , (23)
where
\[ p(z_0) \propto \text{constant} , \quad p(B) \propto \text{constant} , \quad p(\Omega) \propto |\Omega|^{-(m+1)/2} . \tag{24} \]
The conditional structure of the posterior distribution is
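\[ z_0 \mid B, \Omega, Z \sim N\left( A^{-1}(z_1 - C),\; A^{-1} \Omega (A')^{-1} \right) \tag{25} \]
\[ \Omega \mid z_0, Z \sim IW\left( T\,\Omega(z_0),\; T - 1,\; m \right) \tag{26} \]
\[ \mathrm{vec}(B) \mid \Omega, z_0, Z \sim \text{Truncated } N\left( \mathrm{vec}(B(z_0)),\; \Omega \otimes (X(z_0)' X(z_0))^{-1} \right) , \tag{27} \]
where IW is the inverted Wishart distribution and the functions B(z0), Ω(z0), and X(z0)
are defined as
\[ X(z_0) = \left( (1, z_0'), (1, z_1'), \cdots, (1, z_{T-1}') \right)' \tag{28} \]
\[ B(z_0) = [X(z_0)' X(z_0)]^{-1} X(z_0)' Z \tag{29} \]
\[ \Omega(z_0) = \frac{1}{T} [Z - X(z_0) B(z_0)]' [Z - X(z_0) B(z_0)] . \tag{30} \]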
The normal distribution of vec(B) is truncated because every eigenvalue of A must have
modulus less than 1 for the VAR to be stationary.
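One simple way to draw from the truncated normal distribution of vec(B) is rejection sampling: draw from the untruncated normal and keep the first draw whose A is stationary. The sketch below is an assumption about how this step could be implemented, not a description of the authors' code; the function names and the numerical inputs are hypothetical. It uses the fact that a matrix B with vec(B) ~ N(vec(B̄), Ω ⊗ V) can be generated as B̄ + L_V Z L_Ω′, where L_V and L_Ω are Cholesky factors and Z has independent standard normal entries.

```python
import numpy as np

def is_stationary(A):
    """True when every eigenvalue of A lies strictly inside the unit circle."""
    return bool(np.max(np.abs(np.linalg.eigvals(A))) < 1.0)

def draw_stationary_B(rng, B_mean, Omega, XtX_inv, max_tries=10_000):
    """Rejection-sample (C, A) from the normal for vec(B), truncated to stationary A.

    B_mean is (m+1) x m: its first row holds C' and the remaining rows hold A',
    matching regressors of the form (1, z_{t-1}').
    """
    L_x = np.linalg.cholesky(XtX_inv)
    L_o = np.linalg.cholesky(Omega)
    for _ in range(max_tries):
        B = B_mean + L_x @ rng.standard_normal(B_mean.shape) @ L_o.T
        C, A = B[0], B[1:].T
        if is_stationary(A):
            return C, A
    raise RuntimeError("no stationary draw found within max_tries")

# Hypothetical inputs for a two-variable VAR.
rng = np.random.default_rng(3)
B_mean = np.vstack([[0.01, 0.02],          # C'
                    0.5 * np.eye(2)])      # A'
Omega = 0.01 * np.eye(2)
XtX_inv = np.eye(3) / 200.0

C_draw, A_draw = draw_stationary_B(rng, B_mean, Omega, XtX_inv)
```

Rejection sampling is exact for this kind of truncation but can be slow when much of the untruncated posterior mass lies outside the stationary region.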
It is analytically difficult to derive the posterior distribution of Ψ = (z′0, vec(B)′, vec(Ω)′)′,
and it is impossible to derive the posterior distribution of the HJ distances δ and δ+. The
Markov Chain Monte Carlo (MCMC) simulation method provides a way to estimate the
posterior distributions numerically. To estimate the posterior distributions of negativity
rates and HJ distances, the MCMC procedure is as follows.
1. Start from an arbitrary $z_0^{(0)}$.
2. For $i = 1, \cdots, N_0 + N$, do the following:
   (a) Obtain the ith sample of VAR parameters:
       • Draw $\Omega^{(i)}$ from $IW\left( T\,\Omega(z_0^{(i-1)}),\; T - 1,\; m \right)$;
       • Draw $\mathrm{vec}(B^{(i)})$ from truncated $N\left( \mathrm{vec}(B(z_0^{(i-1)})),\; \Omega^{(i)} \otimes [X(z_0^{(i-1)})' X(z_0^{(i-1)})]^{-1} \right)$;
       • Draw $z_0^{(i)}$ from $N\left( [A^{(i)}]^{-1}(z_1 - C^{(i)}),\; [A^{(i)}]^{-1} \Omega^{(i)} [A^{(i)\prime}]^{-1} \right)$.
   (b) Obtain the ith sample of the unconditional mean and variance of zt:
       \[ \mu^{(i)} = (I_m - A^{(i)})^{-1} C^{(i)} \]
       \[ \mathrm{vec}(\Sigma^{(i)}) = (I_{m^2} - A^{(i)} \otimes A^{(i)})^{-1} \mathrm{vec}(\Omega^{(i)}) . \]
   (c) Obtain the ith sample of the unconditional mean and variance of vt:
       \[ \tilde\mu^{(i)} = \tilde C^{(i)} + \tilde A^{(i)} \mu^{(i)} \]
       \[ \tilde\Sigma^{(i)} = \tilde A^{(i)} \Sigma^{(i)} \tilde A^{(i)\prime} + D \Omega^{(i)} D' , \]
       where $\tilde C^{(i)}$, $\tilde A^{(i)}$, and D are the stacked intercept, coefficient, and noise-loading matrices of vt, constructed in the same way as in equation (14).
   (d) Calculate the ith samples, $\delta^{(i)}$, $\delta_+^{(i)}$, and $\pi^{(i)}$, with the help of equations (20)–(22).
3. Discard the first N0 samples.
4. The posterior distributions of HJ distances, the negativity rates, and the
   model parameters can be approximated by the distributions of the corresponding retained samples, such as
   $\{\delta^{(i)}\}_{i=1}^{N}$, $\{\delta_+^{(i)}\}_{i=1}^{N}$, and $\{\pi^{(i)}\}_{i=1}^{N}$. For example, the posterior cumulative probability distribution
   of δ+ can be estimated by
   \[ \mathrm{Prob}(\delta_+ \le X) \approx \frac{1}{N} \sum_{i=1}^{N} I[\delta_+^{(i)} - X] , \tag{31} \]
   where I[·] is defined in equation (22). The posterior mean of δ+ can be approximated
   by
   \[ E[\delta_+ \mid Z] \approx \frac{1}{N} \sum_{i=1}^{N} \delta_+^{(i)} . \tag{32} \]
   The standard deviation and median of the posterior distribution of δ+ can be estimated
   by their sample analogs. The posterior distributions of π, δ, δ+ − δ, and (δ+ − δ)/δ can
   be approximated similarly.
The approximation of the posterior distributions is more precise if the number of simulations,
N, is larger. We choose N = 5,000 for this paper. Following the usual MCMC practice, we discard
the first N0 simulations to help the distribution of the draws converge to the posterior
distribution. We choose N0 to be 1,000.
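The posterior summaries in equations (31) and (32) are plain sample statistics of the retained draws. A minimal sketch follows; the function names and the toy chain are hypothetical, and the chain is deliberately tiny so that the arithmetic can be followed by hand.

```python
import numpy as np

def posterior_cdf(kept, X):
    """Equation (31): share of retained draws strictly below X, since I[d - X] = 1 when d < X."""
    return float(np.mean(np.asarray(kept) < X))

def summarize(draws, burn_in):
    """Discard the burn-in draws and report mean, standard deviation, and median."""
    kept = np.asarray(draws, dtype=float)[burn_in:]
    return kept, {"mean": float(kept.mean()),
                  "std": float(kept.std(ddof=1)),
                  "median": float(np.median(kept))}

# Toy chain of ten draws with a burn-in of two (illustration only).
chain = np.arange(10.0)
kept, stats = summarize(chain, burn_in=2)
cdf_at_5 = posterior_cdf(kept, 5.0)
```

With the paper's settings, `burn_in` would correspond to N0 = 1,000 and the retained sample to N = 5,000 draws.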
Although HJ distances were introduced by Hansen and Jagannathan (1997) for comparing
pricing errors across models, no application in the literature has formal statistical inference
of model comparison in terms of the distances. In fact, it is straightforward to conduct
Bayesian inference of model comparison using HJ distances. For example, let $\delta^I_+$ and $\delta^J_+$ be
the second HJ distances for the SDFs $y^I_t$ and $y^J_t$, respectively. The question is whether $y^I_t$ has
substantially smaller pricing errors than $y^J_t$. If we think a reduction of pricing errors by 100q
percent is substantial, we are interested in the posterior probability of $\delta^I_+ < (1 - q)\delta^J_+$, which
can be easily calculated using the sample draws from the posterior distributions of $\delta^I_+$ and
$\delta^J_+$. If the question is whether $y^I_t$ is simply better than $y^J_t$, we should set q to zero.
3 Empirical Investigations
3.1 The Test Assets
The asset returns rt considered in our empirical investigation are the monthly return on
Treasury Bills and the monthly returns on nine stock portfolios sorted by firm size and
book-to-market ratio. The sample consists of 456 monthly observations from January 1964
to December 2001. The nine stock portfolios are constructed in the same way as in Fama
and French (1993). We construct the excess returns on the above nine stock portfolios by
subtracting the Treasury Bill rate. This is equivalent to choosing H(zt−1) = H0 as in equation
(4). Table 1 presents the summary statistics for the excess returns on nine stock portfolios.
Our nine portfolios exhibit a large cross-sectional variation in average excess returns, which
is very similar to that of the 25 portfolios constructed by Fama and French. Portfolios with higher
book-to-market ratios have higher average excess returns. We choose nine portfolios rather
than 25 portfolios so that we can quickly explore a variety of model specifications with a large
number of simulations in a short time. We also tried some of the analysis with 25 portfolios, and the
results are similar.
To incorporate conditioning information, we consider the yield spread between 30-year
Treasury bonds and 1-month Treasury Bills. This variable is the term premium and has
been used in the literature as a proxy for the changes of risk in the markets. It is shown to
correlate with the business cycle. To ensure stationarity, we filter the variable as in Hodrick
and Prescott (1997). To avoid forward-looking bias, we use only the information up to time
t when we generate the filtered value of the variable at time t. Figure 2 plots the time series
of the filtered term premium. We use the filtered variable as a state variable and refer to it
as TERM.
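A filter that uses only information up to time t can be sketched by re-running the Hodrick-Prescott filter on an expanding window and keeping the last value each time. The example below is an illustration under several assumptions that the text does not settle: it keeps the cyclical component (the deviation from the HP trend), it uses a smoothing parameter of 129,600, a common convention for monthly data, and it applies the filter to a simulated series rather than the actual yield spread. The function names are hypothetical.

```python
import numpy as np

def hp_trend(x, lam):
    """Standard two-sided HP trend: minimizes sum (x - tau)^2 + lam * sum (second diff of tau)^2."""
    T = len(x)
    D = np.zeros((T - 2, T))
    for i in range(T - 2):
        D[i, i:i + 3] = [1.0, -2.0, 1.0]      # second-difference operator
    return np.linalg.solve(np.eye(T) + lam * D.T @ D, x)

def one_sided_hp_cycle(x, lam=129600.0, min_obs=3):
    """Cycle at time t computed from x[0..t] only, so no future observations are used."""
    x = np.asarray(x, dtype=float)
    out = np.full(len(x), np.nan)             # undefined until min_obs observations exist
    for t in range(min_obs - 1, len(x)):
        out[t] = x[t] - hp_trend(x[:t + 1], lam)[-1]
    return out

# Simulated stand-in for a yield spread (illustration only).
rng = np.random.default_rng(4)
spread = 1.5 + np.cumsum(rng.normal(0.0, 0.1, 60))
term = one_sided_hp_cycle(spread)
```

The expanding-window loop solves one linear system per date, which is adequate for a few hundred monthly observations but would need a more efficient recursion for long series.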
Other variables have been used as state variables in the literature. A popular state
variable is the default premium, which is the yield spread between Aaa-rated corporate
bonds and Baa-rated corporate bonds. We have repeated all of our analysis using this state
variable and the results are similar to the results obtained by using TERM. Therefore, we do
not report the results for the default premium. Another conditioning variable CAY, which
is the consumption-to-wealth ratio, has been suggested by Lettau and Ludvigson (2002).
We do not use CAY because it is observed quarterly while all our analyses use monthly
observations.
Following Hansen and Singleton (1982), we use past realizations of the state variables
to scale the asset returns in order to bring conditioning information into our examination
of pricing equations. For example, we use TERM to scale the excess returns of the stock
portfolios. This is equivalent to choosing H(zt−1) to be HxH0, with xt−1 = TERM, where
Hx and H0 are defined in equations (3) and (4). We could also scale the portfolio returns by
the default premium (DEF). In this paper, we choose TERM because it exhibits larger variation
over time than DEF, as shown in Figure 2. In the rest of the paper, we refer to H(zt−1)rt = H0rt as the non-scaled
asset returns and H(zt−1)rt = HxH0rt as the scaled asset returns.
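The construction of non-scaled and scaled returns can be sketched in Python as follows. This is only an illustration under an assumption: we take "scaling" to mean multiplying each period's vector of excess returns by the previous period's value of the instrument. The exact matrices Hx and H0 are defined in equations (3) and (4) and are not reproduced here; the function names and numbers are hypothetical.

```python
def excess_returns(stock_returns, tbill_returns):
    """Non-scaled returns: each stock portfolio return minus the T-bill return."""
    return [[r - rf for r in row] for row, rf in zip(stock_returns, tbill_returns)]


def scale_by_lagged_instrument(excess, x):
    """Scaled returns (assumed form): row t equals x[t-1] * excess[t].
    The first period is dropped because it has no lagged instrument."""
    return [[x[t - 1] * r for r in excess[t]] for t in range(1, len(excess))]


# Hypothetical data: 3 months, 2 stock portfolios.
stocks = [[0.02, 0.03], [0.01, 0.00], [0.05, -0.01]]
tbill = [0.01, 0.01, 0.01]
term = [2.0, -1.0, 0.5]  # filtered instrument, one value per month

non_scaled = excess_returns(stocks, tbill)
scaled = scale_by_lagged_instrument(non_scaled, term)
```

Using the lagged value x[t-1] keeps the scaling variable in the investor's information set at the time the portfolio is formed.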
3.2 The Asset-Pricing Models
To demonstrate the importance of the arbitrage-free requirement, we choose to investigate a
few basic linear and nonlinear models. The linear models we choose are the CAPM, the linear
consumption CAPM, and the Fama-French model, as well as the time-varying extensions of
these models. The nonlinear models are the consumption CAPM, the Abel model, and the
Epstein-Zin model. In the following, we provide the specifications of these models.
The classic asset-pricing model in finance is the CAPM developed by Sharpe (1964). The
SDF of this model is
$$y^{\text{CAPM}}_t = b_0 + b_1 r_{\text{MKT},t}, \tag{33}$$
where rMKT,t is the excess return on the market portfolio, and b0 and b1 are constant para-
meters in the model. The CAPM is often referred to as the unconditional or static CAPM
because it is derived in a single-period setting. To extend it to a multi-period setting, two
versions of the conditional CAPM have been introduced in the literature. The first version
adds the past realization of a state variable as a potential factor, as done in Jagannathan
and Wang (1996), except that they use the default premium as the state variable. The SDF
in this version of the conditional CAPM is
$$y^{\text{CAPM+IV}}_t = b_0 + b_1 r_{\text{MKT},t} + c_0 x_{t-1}, \tag{34}$$
where xt−1, referred to as the instrument variable (IV), is the past realization of the state
variable. In this paper, we use TERM as the instrument variable x. For convenience, we
denote this model as CAPM+IV. The other version of the conditional CAPM assumes that
b0 and b1 are linear functions of the instrument variable, as suggested by Cochrane (1996).
The SDF of this version of the time-varying CAPM is
$$y^{\text{CAPM*IV}}_t = b_0 + b_1 r_{\text{MKT},t} + c_0 x_{t-1} + c_1 x_{t-1} r_{\text{MKT},t}. \tag{35}$$
For convenience, we denote this model as CAPM*IV. The CAPM and its time-varying
extensions are not arbitrage-free because their SDFs can be negative.
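To see why a linear SDF can turn negative, the following Python sketch evaluates the CAPM*IV specification in equation (35) on a few made-up observations and counts how often the SDF is below zero. It is a sketch under stated assumptions: the coefficient values and data are hypothetical, and the "negativity rate" is taken here to be the fraction of periods with a negative SDF.

```python
def capm_star_iv_sdf(b0, b1, c0, c1, r_mkt, x_lag):
    """SDF of equation (35): b0 + b1*r + c0*x + c1*x*r, period by period.
    Setting c0 = c1 = 0 gives the static CAPM of equation (33)."""
    return [b0 + b1 * r + c0 * x + c1 * x * r for r, x in zip(r_mkt, x_lag)]


def negativity_rate(sdf):
    """Fraction of periods in which the SDF is negative (assumed definition)."""
    return sum(1 for y in sdf if y < 0) / len(sdf)


# Hypothetical market excess returns and lagged instrument values.
r_mkt = [0.02, -0.05, 0.10, 0.30]
x_lag = [0.5, -1.0, 1.5, 2.0]

static_sdf = capm_star_iv_sdf(1.0, -3.0, 0.0, 0.0, r_mkt, x_lag)
varying_sdf = capm_star_iv_sdf(1.0, -3.0, -0.5, -2.0, r_mkt, x_lag)

static_rate = negativity_rate(static_sdf)
varying_rate = negativity_rate(varying_sdf)
```

In this made-up example the static SDF stays positive while the time-varying SDF is negative in half of the periods, because the instrument terms add variation that pushes the SDF below zero.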
After noting the large pricing errors of the CAPM on portfolios sorted by firm size and
book-to-market ratio, Fama and French (1993) propose a three-factor model that specifies
the SDF as
$$y^{\text{FF3}}_t = b_0 + b_1 r_{\text{MKT},t} + b_2 r_{\text{SMB},t} + b_3 r_{\text{HML},t}, \tag{36}$$
where rSMB,t and rHML,t are the factors constructed by Fama and French to mimic the risks
related to firm size and book-to-market ratio. We refer to this model as FF3. The Fama-
French model has some success, but the pricing errors are still substantial (Fama and French
(1997) and Daniel and Titman (1997)), especially on portfolios incorporating conditioning
information about the business cycle (Hodrick and Zhang (2001)). To improve the FF3, we
consider the following time-varying extension:
$$y^{\text{FF3+IV}}_t = b_0 + b_1 r_{\text{MKT},t} + b_2 r_{\text{SMB},t} + b_3 r_{\text{HML},t} + c_0 x_{t-1}, \tag{37}$$
which is referred to as FF3+IV in the rest of this paper. Another version of the time-varying
extension of the FF3 is
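\begin{align}
y^{\text{FF3*IV}}_t &= b_0 + b_1 r_{\text{MKT},t} + b_2 r_{\text{SMB},t} + b_3 r_{\text{HML},t} \tag{38} \\
&\quad + c_0 x_{t-1} + c_1 x_{t-1} r_{\text{MKT},t} + c_2 x_{t-1} r_{\text{SMB},t} + c_3 x_{t-1} r_{\text{HML},t}, \tag{39}
\end{align}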
which is referred to as FF3*IV. This type of extension of the Fama-French model is explored
by Kirby (1997). The FF3 and its time-varying extensions are not arbitrage-free.
The classic arbitrage-free model is the consumption CAPM, which specifies the SDF as
$$y^{\text{CC}}_t = \rho \left( \frac{C_t}{C_{t-1}} \right)^{-\gamma}, \qquad 0 \le \rho < 1, \quad \gamma > 0, \tag{40}$$
where ρ is the parameter for time preference and γ is the parameter for risk aversion. This
model is referred to as CC and its SDF is positive by construction. Since Mehra and Prescott
(1985), it has been well known that this model has difficulties in explaining equity returns.
Numerous studies look for ways of improving upon the CC. We consider two extensions to
the CC. The first is the model proposed by Abel (1990). The SDF of the model is
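$$y^{\text{Abel}}_t = \rho \left( \frac{C_{t-1}}{C_{t-2}} \right)^{\eta(\gamma-1)} \left( \frac{C_t}{C_{t-1}} \right)^{-\gamma}, \qquad 0 \le \rho < 1, \quad \gamma > 0, \quad \eta \ge 0, \tag{41}$$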
where η is the parameter for habit persistence. If η > 0 and γ ≠ 1, $y^{\text{Abel}}_t$ is different from
$y^{\text{CC}}_t$ because $y^{\text{Abel}}_t$ depends on the past growth rate of consumption. The second extension
considered in this paper is the model proposed by Epstein and Zin (1989), which generalizes
the time-additive expected utility function in the CC. In this extension, the SDF is
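$$y^{\text{EZ}}_t = \rho^{\gamma/\alpha} \left( \frac{C_t}{C_{t-1}} \right)^{\gamma\frac{\alpha-1}{\alpha}} \left( 1 + r_{\text{MKT},t} \right)^{\frac{\gamma-\alpha}{\alpha}}, \qquad 0 \le \rho < 1, \quad \gamma > 0, \quad \alpha \ge 0, \tag{42}$$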
where rMKT,t is the real return on the market portfolio. The parameter α is the intertemporal
rate of substitution. The deviation of parameter α from γ controls the model’s deviation
from the CC. If α = γ, the model is equivalent to the CC. In the rest of this paper, we
refer to the first extension as the Abel model and to the second as the Epstein-Zin model
or simply the EZ model. Like the CC, both models are nonlinear and exclude arbitrage
by construction. We use real per-capita purchases of non-durable goods and services as
the proxy for Ct. We obtain the consumption data from Citibase and divide them by the
monthly estimates of population from the Bureau of the Census to derive per-capita figures.
The finance literature sometimes approximates the CC by using a linear factor model with
the growth rate of consumption as the factor. This model is studied in Breeden, Gibbons and
Litzenberger (1989) and Chen, Roll and Ross (1986). We refer to the linear approximation
of the CC as LCC. The SDF of the model is
$$y^{\text{LCC}}_t = b_0 + b_1 \ln(C_t/C_{t-1}). \tag{43}$$
Recent literature has also extended the LCC into time-varying models by adding a condi-
tioning variable and its interaction with the growth rate of consumption (see Hodrick and
Zhang (2001) and Lettau and Ludvigson (2002)). The SDFs of these types of models are
\begin{align}
y^{\text{LCC+IV}}_t &= b_0 + b_1 \ln(C_t/C_{t-1}) + c_0 x_{t-1} \tag{44} \\
y^{\text{LCC*IV}}_t &= b_0 + b_1 \ln(C_t/C_{t-1}) + c_0 x_{t-1} + c_1 x_{t-1} \ln(C_t/C_{t-1}). \tag{45}
\end{align}
As above, we choose xt−1 to be the variable TERM, and we refer to the above two models as
LCC+IV and LCC*IV, respectively. Unlike the nonlinear CC, the linear approximation and
time-varying extensions such as LCC, LCC+IV, and LCC*IV are not arbitrage-free.
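The contrast between the nonlinear consumption-based SDFs and their linear approximation can be illustrated with a short Python sketch of equations (40), (41), and (43). The parameter values and the consumption series below are made up for illustration, and the function names are hypothetical.

```python
import math


def cc_sdf(rho, gamma, c):
    """Equation (40): rho * (C_t / C_{t-1})^(-gamma), for t = 1, ..., T-1."""
    return [rho * (c[t] / c[t - 1]) ** (-gamma) for t in range(1, len(c))]


def abel_sdf(rho, gamma, eta, c):
    """Equation (41): adds the lagged growth term (C_{t-1}/C_{t-2})^(eta*(gamma-1)).
    Needs two lags, so it starts at t = 2."""
    return [
        rho * (c[t - 1] / c[t - 2]) ** (eta * (gamma - 1)) * (c[t] / c[t - 1]) ** (-gamma)
        for t in range(2, len(c))
    ]


def lcc_sdf(b0, b1, c):
    """Equation (43): linear in the log growth rate of consumption."""
    return [b0 + b1 * math.log(c[t] / c[t - 1]) for t in range(1, len(c))]


# Hypothetical per-capita consumption levels.
consumption = [100.0, 101.0, 103.0, 102.0, 110.0]

y_cc = cc_sdf(0.99, 5.0, consumption)
y_abel = abel_sdf(0.99, 5.0, 0.5, consumption)
y_lcc = lcc_sdf(1.0, -20.0, consumption)
```

Because the CC and Abel SDFs are products of positive numbers raised to real powers, they are positive for any positive consumption path, whereas the linear SDF becomes negative whenever consumption growth is large enough relative to b0 and b1 (as in the last period of this made-up series).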
3.3 Effects of the Arbitrage-Free Requirement
To examine the effects of the arbitrage-free requirement on the empirical evaluation of the
CAPM, CAPM+IV, and CAPM*IV, the posterior distributions of HJ distances for these
models are presented in Figure 3. For the non-scaled returns, the posterior distributions of
δ and δ+ are almost the same for the CAPM. This is also true for the CAPM+IV. For the
CAPM*IV, there is a slight difference between the posterior distributions of δ and δ+. For
the scaled returns, the difference between the posterior distributions of δ and δ+ grows
as we move from the CAPM to the CAPM+IV, and then to the CAPM*IV. Table
2 reports the mean, median, and standard deviation of the posterior distributions of HJ
distances and their differences. For the non-scaled returns, the posterior means of (δ+−δ)/δ
are only 1% for the CAPM, 2% for the CAPM+IV, and 6% for the CAPM*IV. For the scaled
returns, the posterior means of (δ+ − δ)/δ are 7% for the CAPM, 10% for the CAPM+IV,
and 20% for the CAPM*IV. Therefore, the arbitrage-free requirement does not substantially
affect the evaluation of the CAPM, but it does moderately affect the evaluation of its time-
varying extensions. Since the conditioning variable TERM is volatile over time as shown in
Figure 2, it is possible that the SDFs of the time-varying models are negative more frequently
than the SDF of the static CAPM. Table 2 reports the summary statistics for the posterior
distributions of the negativity rate π. The posterior mean of π is higher for the CAPM+IV
and the CAPM*IV than the posterior mean of π for the CAPM.
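Summary statistics of the kind reported in Table 2 can be obtained from simulated posterior draws along the following lines. This Python sketch assumes that each draw of the data-generating parameters yields a pair (δ, δ+), so that differences and relative differences are formed draw by draw; the function name and the three draws are hypothetical.

```python
import statistics


def summarize_hj_gap(delta_draws, delta_plus_draws):
    """Summaries of delta_plus - delta and (delta_plus - delta) / delta,
    computed draw by draw from paired posterior draws."""
    gaps = [dp - d for d, dp in zip(delta_draws, delta_plus_draws)]
    rel_gaps = [g / d for g, d in zip(gaps, delta_draws)]
    return {
        "gap_mean": statistics.mean(gaps),
        "gap_median": statistics.median(gaps),
        "gap_stdev": statistics.stdev(gaps),  # sample standard deviation
        "rel_gap_mean": statistics.mean(rel_gaps),
        "rel_gap_median": statistics.median(rel_gaps),
    }


# Hypothetical paired draws of the two HJ distances.
delta = [0.20, 0.25, 0.40]
delta_plus = [0.20, 0.30, 0.50]
summary = summarize_hj_gap(delta, delta_plus)
```

Note that the posterior mean of the relative difference is an average of ratios and generally differs from the ratio of the posterior means of δ+ − δ and δ.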
A concern is whether the concentrated posterior distributions in Figure 3 are results of
data or artifacts of the non-informative prior distribution of Ψ. The posterior distributions
we discussed above contain a large amount of information in the data. To demonstrate
this, we estimate posterior distributions conditioning on observing only the monthly returns
during the last three years. These posterior distributions are presented in Figure 4. With
only 36 observations, the posterior distributions are much more dispersed and thus much less
informative. Unfortunately, we cannot estimate the distributions of HJ distances implied by
the prior distribution of Ψ because the prior is non-informative and improper. However, the
fact that the posterior distributions with 36 observations are far more dispersed indicates
that the distributions of HJ distances implied by the prior are not concentrated on some
finite values. Therefore, the concentrated posterior distributions conditioning on the 456
observations, as in Figure 3, exhibit the information contained in the data. For the CAPM,
the posterior distributions of the two HJ distances in Figure 4 are almost the same for
both the non-scaled and scaled returns. For the CAPM+IV and CAPM*IV, the posterior
distributions of the two HJ distances are also almost the same for the non-scaled returns
but very different for the scaled returns. These patterns are further evidence that the locations of
the posterior distributions are determined by the data and the model but not by the prior
distribution, because all the posterior distributions presented in Figure 4 are derived from
the same prior distribution.
For the LCC, LCC+IV, and LCC*IV, the effects of the arbitrage-free requirement are
larger than the effects for the CAPM, CAPM+IV, and CAPM*IV, respectively. This is
true for both the non-scaled and scaled returns. The posterior distributions of δ and δ+
for the LCC, LCC+IV, and LCC*IV are plotted in Figure 5. The difference between the
posterior distributions of δ and δ+ is clearly visible. Table 3 reports the posterior means of the
negativity rates for LCC, LCC+IV and LCC*IV, which are above zero in most cases. Table
3 also reports the summary statistics for the posterior distributions of HJ distances. For the
non-scaled returns, the posterior mean of (δ+−δ)/δ is 5% for the LCC, 8% for the LCC+IV,
and 15% for the LCC*IV. For the scaled returns, the posterior mean of (δ+− δ)/δ is 11% for
the LCC, 17% for the LCC+IV, and 24% for the LCC*IV. These numbers are larger than
the numbers for the CAPM, CAPM+IV, and CAPM*IV, respectively.
The arbitrage-free requirement has huge effects on the evaluation of the time-varying ex-
tensions of the FF3. For the FF3 and its time-varying extensions, the posterior distributions
of HJ distances are plotted in Figure 6. The summary statistics for the posterior distributions
are reported in Table 4. For the non-scaled returns, the arbitrage-free requirement has
almost no effect on the evaluation of FF3 and FF3+IV but a large effect on FF3*IV. The
posterior mean and median of the negativity rate for the FF3*IV are both about 29%, and
the posterior mean and median of (δ+ − δ)/δ are 53% and 45%, respectively. For the scaled
returns, the arbitrage-free requirement has significant effects on each of the three models.
In particular, the posterior distributions of δ and δ+ for FF3*IV are drastically different, as
shown in Panel F of Figure 6. The posterior mean and median of (δ+−δ)/δ are, respectively,
127% and 129%!
Although the factors scaled by the instrument variable make the SDF vary over time and
fit the data well, they also make the model provide big or frequent arbitrage opportunities,
because the fluctuation of the instrument variable often drags the SDF to the negative region.
To fit the data, economists usually apply the generalized method of moments (GMM) to
the restriction⁷ E[g(θ, ft, zt−1)H(zt−1)rt] = H(zt−1)rt. If θ̂ is the GMM estimate of θ, the
GMM estimate of the SDF at t is ŷt = g(θ̂, ft, zt−1). For the FF3 and its time-varying
extensions, we plot the GMM estimates of the SDFs in Figure 7. It shows that the SDF
of the FF3 estimated for the non-scaled returns is mostly positive while the SDFs of the
FF3*IV, estimated for both the non-scaled and scaled returns, frequently take large negative
values. This is consistent with the large negativity rate of the FF3*IV reported in Table 4.
⁷ In the literature, economists often write linear factor models in terms of betas and estimate them using the regression method. According to Jagannathan and Wang (2002), applying the regression method to a model in terms of betas is equivalent to applying the GMM to the restriction in terms of the model's SDF.
The arbitrage-free requirement has little effect on the evaluation of the consumption-
based nonlinear models, namely, the CC, the Abel model, and the EZ model. In Figure 8,
for each model the posterior distributions of δ and δ+ are almost the same. In Table 5, the
posterior mean and median of (δ+ − δ)/δ are small for all the three models and for both
the non-scaled and scaled returns. Since these models are arbitrage-free by definition, the
negativity rate is always zero and thus not reported in Table 5.
3.4 Model Comparison
The main purpose of HJ distances is model comparison. Table 6 reports the posterior mean
of the HJ-distance ratios and the posterior probability that the ratios are less than 1− q for
q = 0, 10%, and 20%. All the numbers referenced in this subsection are from Table 6.
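The quantities reported in Table 6 can be approximated from simulated posterior draws as in the following Python sketch. It assumes that the draws of the two HJ distances are paired, that is, computed from the same posterior draw of the data-generating parameters; the function names and the four draws are hypothetical.

```python
import statistics


def prob_ratio_below(delta_i, delta_j, q=0.0):
    """Posterior probability that delta_i / delta_j < 1 - q,
    estimated as the share of paired draws with delta_i < (1 - q) * delta_j."""
    hits = sum(1 for di, dj in zip(delta_i, delta_j) if di < (1.0 - q) * dj)
    return hits / len(delta_i)


def mean_ratio(delta_i, delta_j):
    """Posterior mean of the HJ-distance ratio delta_i / delta_j."""
    return statistics.mean(di / dj for di, dj in zip(delta_i, delta_j))


# Hypothetical paired posterior draws for models I and J.
delta_i = [0.18, 0.20, 0.25, 0.30]
delta_j = [0.25, 0.21, 0.24, 0.40]

p_better = prob_ratio_below(delta_i, delta_j, q=0.0)
p_better_10 = prob_ratio_below(delta_i, delta_j, q=0.10)
p_better_20 = prob_ratio_below(delta_i, delta_j, q=0.20)
ratio_mean = mean_ratio(delta_i, delta_j)
```

Setting q = 0 asks only whether model I has smaller pricing errors than model J, while q = 10% or 20% asks whether the reduction is at least that large.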
Let us first compare the CAPM with its time-varying extensions. The posterior mean
of $\delta^{\text{CAPM+IV}}/\delta^{\text{CAPM}}$ is 0.95 for the non-scaled returns and 0.93 for the scaled returns. The
posterior probability that $\delta^{\text{CAPM+IV}}$ is smaller than $\delta^{\text{CAPM}}$ is 1.00 for both the non-scaled and
scaled returns. Therefore, measured by the first HJ distance, which ignores the arbitrage-free
requirement, we are sure that the CAPM+IV improves upon the CAPM. The improvement
still seems clear if measured by the second HJ distance, which requires the correct models to be
arbitrage-free. The posterior mean of $\delta^{\text{CAPM+IV}}_+/\delta^{\text{CAPM}}_+$ is 0.96 for the non-scaled returns and
0.95 for the scaled returns. The posterior probability that $\delta^{\text{CAPM+IV}}_+$ is smaller than $\delta^{\text{CAPM}}_+$
is 0.97 for the non-scaled returns and 0.93 for the scaled returns. Although it is clear that
the CAPM+IV improves upon the CAPM, the improvement is unlikely to be substantial if
we regard a 10% or 20% reduction in pricing errors as substantial. The probability that
$\delta^{\text{CAPM+IV}}$ is smaller by at least 10% than $\delta^{\text{CAPM}}$ is 0.17 for the non-scaled returns and 0.26
for the scaled returns. The probability that $\delta^{\text{CAPM+IV}}_+$ is smaller by at least 10% than $\delta^{\text{CAPM}}_+$
is 0.13 for the non-scaled returns and 0.15 for the scaled returns. The probability of a 20%
improvement is even lower, as shown by the probabilities for q = 20% reported in Table 6.
The results for the comparison of the CAPM and the CAPM*IV are similar.
Let us next compare the LCC and its time-varying extensions. The posterior mean of
$\delta^{\text{LCC+IV}}/\delta^{\text{LCC}}$ is 0.95 for the non-scaled returns and 0.90 for the scaled returns. The ratio of
$\delta^{\text{LCC*IV}}$ to $\delta^{\text{LCC}}$ has an even lower posterior mean. The posterior probability that $\delta^{\text{LCC+IV}}$ is
smaller than $\delta^{\text{LCC}}$ is 1.00 for both the non-scaled and scaled returns. The same is true for the
probability that $\delta^{\text{LCC*IV}}$ is smaller than $\delta^{\text{LCC}}$. These results indicate that the time-varying
extensions improve upon the LCC if the first HJ distance is used for the comparison. However,
if the second HJ distance $\delta_+$ is used, the results of the comparison are different. The posterior
means of $\delta^{\text{LCC+IV}}_+/\delta^{\text{LCC}}_+$ and $\delta^{\text{LCC*IV}}_+/\delta^{\text{LCC}}_+$ are much closer to 1. The posterior probability that
$\delta^{\text{LCC+IV}}_+$ is smaller than $\delta^{\text{LCC}}_+$ is 0.76 for the non-scaled returns and 0.75 for the scaled returns.
The probability that $\delta^{\text{LCC*IV}}_+$ is smaller than $\delta^{\text{LCC}}_+$ is 0.75 for the non-scaled returns and 0.79
for the scaled returns. Therefore, if we require the correct models to be arbitrage-free, we are
not confident that the time-varying extension improves upon the linear approximation of the
consumption CAPM. Moreover, the improvement of the time-varying extension to the LCC
is unlikely to be substantial. No matter which HJ distance is used, the posterior probability
that the LCC+IV has substantially (namely, at least 10% or 20%) smaller pricing errors
than the LCC is well below 0.50 for both the non-scaled and scaled returns. The same is
true for the LCC*IV.
The FF3 was originally suggested by Fama and French (1993) to explain the pricing errors
of the CAPM. It is therefore interesting to compare the FF3 with the CAPM. For the
non-scaled returns, the FF3 clearly has smaller pricing errors than the CAPM no matter which
HJ distance is used to measure pricing errors. The posterior mean of $\delta^{\text{FF3}}/\delta^{\text{CAPM}}$ is 0.77,
and the posterior mean of $\delta^{\text{FF3}}_+/\delta^{\text{CAPM}}_+$ is 0.78. The posterior probability that the FF3 is
better than the CAPM is 1.00 when either $\delta$ or $\delta_+$ is used for the comparison. It is less
certain that the improvement of the FF3 upon the CAPM is substantial. For q = 10%,
the posterior probabilities of $\delta^{\text{FF3}} < (1-q)\delta^{\text{CAPM}}$ and $\delta^{\text{FF3}}_+ < (1-q)\delta^{\text{CAPM}}_+$ are lower than
0.90. For q = 20%, the above two probabilities are lower than 0.60. It is also interesting to
compare the FF3 with the LCC. The results of the comparison between FF3 and LCC are
very similar to the results of the comparison between FF3 and CAPM.
Note that for the non-scaled returns, the above comparisons of the FF3 with the CAPM and
the LCC are not affected by the arbitrage-free requirement. This is not true for the scaled returns.
For example, for the scaled returns, the posterior probability of $\delta^{\text{FF3}} < \delta^{\text{CAPM}}$ is 1.00,
but the posterior probability of $\delta^{\text{FF3}}_+ < \delta^{\text{CAPM}}_+$ is only 0.71. The arbitrage-free requirement
significantly reduces the posterior probability that the pricing error of the FF3 is smaller by
at least 10% than the pricing error of the CAPM. The same is true if we look at the 20% error
reduction or if we compare the FF3 with the LCC. Therefore, when the models are evaluated
by $\delta_+$ for the scaled returns, we are not confident that the FF3 is better or substantially
better than the CAPM or the LCC. A possible reason for this is as follows. Since the
non-scaled returns are sorted by firm size and book-to-market ratio, the SDF with the SMB and
HML factors does not need to take many large negative values to price the returns. However,
because the scaled returns differ from the returns on the portfolios sorted by firm size and
book-to-market ratio, the SDF with the SMB and HML factors needs to be negative in order
to price the returns well. This can be seen by comparing the GMM estimates of the SDFs
plotted in Panels A and D of Figure 7. When the arbitrage-free requirement is in place, the
SDF of the FF3 is thus far away from being a correct model for the scaled returns.
The FF3*IV is the model that clearly outperforms the CAPM and the LCC for both the
non-scaled and scaled returns and in terms of both HJ distances. The posterior probabilities
of $\delta^{\text{FF3*IV}} < \delta^{\text{CAPM}}$, $\delta^{\text{FF3*IV}} < \delta^{\text{LCC}}$, $\delta^{\text{FF3*IV}}_+ < \delta^{\text{CAPM}}_+$, and $\delta^{\text{FF3*IV}}_+ < \delta^{\text{LCC}}_+$ are all equal to
1.00 for the non-scaled returns and above 0.96 for the scaled returns. It is also very likely
that the pricing errors of the FF3*IV are substantially smaller than those of the CAPM and the
LCC, except when the pricing errors are measured by $\delta_+$ for the scaled returns. If we set q = 10%,
the posterior probabilities of $\delta^{\text{FF3*IV}} < (1-q)\delta^{\text{CAPM}}$ and $\delta^{\text{FF3*IV}} < (1-q)\delta^{\text{LCC}}$ are above 0.99
for the non-scaled and scaled returns. The posterior probabilities of $\delta^{\text{FF3*IV}}_+ < (1-q)\delta^{\text{CAPM}}_+$
and $\delta^{\text{FF3*IV}}_+ < (1-q)\delta^{\text{LCC}}_+$ are above 0.98 for the non-scaled returns but below 0.90 for the
scaled returns. If we set q = 20%, these probabilities are above 0.92 except the probabilities
of $\delta^{\text{FF3*IV}}_+ < (1-q)\delta^{\text{CAPM}}_+$ and $\delta^{\text{FF3*IV}}_+ < (1-q)\delta^{\text{LCC}}_+$ for the scaled returns, which are below
0.54. In fact, the FF3*IV is the one that has the smallest pricing errors among the
time-varying and multi-factor models considered in this paper, although we do not present formal
statistical inference of the comparisons.
We present the formal comparison of the FF3*IV with the consumption-based nonlinear
models, which are arbitrage-free. The posterior distributions exhibit strong confidence
that the FF3*IV is better than the three nonlinear models. The posterior probabilities of
$\delta^{\text{FF3*IV}} < \delta^{\text{CC}}$, $\delta^{\text{FF3*IV}} < \delta^{\text{Abel}}$, and $\delta^{\text{FF3*IV}} < \delta^{\text{EZ}}$ are all above 0.99 for both the non-scaled
and scaled returns. Similar probabilities for $\delta_+$ are all above 0.97 for the non-scaled returns
and around 0.90 for the scaled returns. The inference of the magnitude of the difference
between the pricing errors of the FF3*IV and a consumption-based nonlinear model, however,
depends on the arbitrage-free requirement. If the arbitrage-free requirement is ignored, we
are confident that the FF3*IV is substantially better than the consumption-based nonlinear
models, but this confidence may disappear if we require the correct models to be
arbitrage-free. The posterior probability that $\delta^{\text{FF3*IV}}$ is smaller than $\delta^{\text{CC}}$ by at least 20% is above 0.95
for both the non-scaled and scaled returns. This is true if we replace the CC by the Abel or
EZ model, but no longer true if we replace $\delta$ by $\delta_+$. For q = 20%, the posterior probabilities
of $\delta^{\text{FF3*IV}}_+ < (1-q)\delta^{\text{CC}}_+$, $\delta^{\text{FF3*IV}}_+ < (1-q)\delta^{\text{Abel}}_+$, and $\delta^{\text{FF3*IV}}_+ < (1-q)\delta^{\text{EZ}}_+$ are all lower than 0.90 for
the non-scaled returns and 0.50 for the scaled returns. If we set q = 10%, these probabilities
are below 0.80 for the scaled returns.
4 Conclusion
A good asset-pricing model should satisfy three criteria. First, it should have small pricing
errors. Second, it should price conditionally managed portfolios. Third, it should be
arbitrage-free. In this paper, we emphasize the third criterion, which is mostly ignored in
the literature. The third criterion will reduce the pricing errors of an estimated model when
it is used for pricing securities that are not included in the test assets, and it is indispensable
if the estimated model is to be used for pricing derivative securities. We introduce a
straightforward methodology for analyzing the two HJ distances, which allow us to either ignore or
incorporate the arbitrage-free requirement when measuring pricing errors. Our methodology
provides formal statistical inference for model comparison based on HJ distances.
To demonstrate the importance of the arbitrage-free requirement to the empirical evaluation
of asset-pricing models, we focus on the comparison of static models to their time-varying
extensions, the comparison of single-factor models to multi-factor extensions, and the
comparison of the linear factor models to consumption-based nonlinear models. Although the
time-varying and multi-factor models are often successful in explaining asset returns, they
are not arbitrage-free. In contrast, the static single-factor linear models are not successful but
their SDFs are mostly positive. The consumption-based nonlinear models are not successful
but are arbitrage-free. Using the first HJ distance, which ignores the arbitrage-free requirement,
we are confident that time-varying extensions improve upon static models, that the
Fama-French three-factor model is better than single-factor models, and that the time-varying
Fama-French model has substantially smaller pricing errors than the consumption-based
nonlinear models. However, this confidence disappears, especially for conditional portfolios,
if we use the second HJ distance, which requires the correct models to be arbitrage-free.
Therefore, arbitrage is particularly important to the empirical evaluation of time-varying
linear factor models.
Many other important asset-pricing models should be evaluated under the arbitrage-free
requirement but are not considered in this paper. The model proposed by Chen,
Roll and Ross (1986) includes many economic factors. The model proposed by Bansal
and Viswanathan (1993) specifies the SDF as a polynomial function of economic variables.
The model with the momentum factor is suggested by Carhart (1997) based on the findings
of Jegadeesh and Titman (1993). The model proposed by Jagannathan and Wang (1996) is
similar to the CAPM+IV but has the labor-income growth rate as an additional factor. The
model proposed by Lettau and Ludvigson (2002) resembles the LCC*IV but uses a different
conditioning variable, the consumption-wealth ratio, which is measured quarterly. The model
proposed by Campbell and Cochrane (1999) is an extension to the Abel and EZ models.
Motivated by the work of Huberman and Halka (2001), the model proposed by Pastor and
Stambaugh (2003) introduces a systematic liquidity factor. Observing that volatility
is priced in asset returns, Ang, Hodrick, Xing and Zhang (2003) construct a model that
contains a volatility factor. None of these models, except the one proposed by Campbell and
Cochrane, is arbitrage-free. Using HJ distances, especially the second distance, to compare
these models will be interesting. Such a comparison deserves a separate study, because many
special issues related to these models must be addressed and special data must be used
for these models.
5 References
Abel, Andrew B., 1990, Asset prices under habit formation and catching up with the Joneses, American Economic Review 80, 38–42.
Ang, Andrew, Robert Hodrick, Yuhang Xing, and Xiaoyan Zhang, 2003, The cross-section of volatility and expected returns, Working paper, Columbia University.
Bansal, Ravi, and S. Viswanathan, 1993, No arbitrage and arbitrage pricing: A new approach, Journal of Finance 48, 1231–1262.
Breeden, Douglas T., 1979, An intertemporal asset pricing model with stochastic consumption and investment opportunities, Journal of Financial Economics 7, 265–296.
Breeden, Douglas T., Michael R. Gibbons, and Robert H. Litzenberger, 1989, Empirical tests of the consumption-oriented CAPM, Journal of Finance 44, 231–262.
Buraschi, Andrea, and Jens Carsten Jackwerth, 2001, The price of a smile: Hedging and spanning in option markets, Review of Financial Studies 14, 495–527.
Campbell, John Y., and John H. Cochrane, 1999, By force of habit: A consumption-based explanation of aggregate stock market behavior, Journal of Political Economy 107, 205–251.
Carhart, Mark M., 1997, On persistence in mutual fund performance, Journal of Finance 52, 57–82.
Chen, Nai-Fu, Richard Roll, and Stephen A. Ross, 1986, Economic forces and the stock market, Journal of Business 59, 383–403.
Cochrane, John H., 1996, A cross-sectional test of an investment-based asset pricing model, Journal of Political Economy 104, 572–621.
Daniel, Kent, and Sheridan Titman, 1997, Evidence on the characteristics of cross-sectional variation in stock returns, Journal of Finance 52, 1–33.
Epstein, Larry G., and Stanley E. Zin, 1989, Substitution, risk aversion, and the temporal behavior of consumption and asset returns: A theoretical framework, Econometrica 57, 937–969.
Fama, Eugene F., and Kenneth R. French, 1993, Common risk factors in the returns on stocks and bonds, Journal of Financial Economics 33, 3–56.
Fama, Eugene F., and Kenneth R. French, 1997, Industry costs of equity, Journal of Financial Economics 43, 153–193.
Ferson, Wayne E., and Andrew F. Siegel, 2001, The efficient use of conditioning information in portfolios, Journal of Finance 56, 967–982.
Hansen, Lars Peter, John Heaton, and Erzo G. J. Luttmer, 1995, Econometric evaluation of asset pricing models, Review of Financial Studies 8, 237–274.
Hansen, Lars Peter, and Ravi Jagannathan, 1991, Implications of security market data for models of dynamic economies, Journal of Political Economy 99, 225–262.
Hansen, Lars Peter, and Ravi Jagannathan, 1997, Assessing specification errors in stochastic discount factor models, Journal of Finance 52, 557–590.
Hansen, Lars Peter, and Scott F. Richard, 1987, The role of conditioning information in deducing testable restrictions implied by dynamic asset pricing models, Econometrica 55, 587–613.
Hansen, Lars Peter, and Kenneth Singleton, 1982, Generalized instrumental variable estimation of nonlinear rational expectation models, Econometrica 50, 1269–1286.
Harrison, J. Michael, and David M. Kreps, 1979, Martingales and arbitrage in multiperiod securities markets, Journal of Economic Theory 20, 381–408.
Hodrick, Robert J., and Edward C. Prescott, 1997, Postwar U.S. business cycles: An empirical investigation, Journal of Money, Credit and Banking 29, 1–16.
Hodrick, Robert J., and Xiaoyan Zhang, 2001, Evaluating the specification errors of asset-pricing models, Journal of Financial Economics 62, 327–376.
Huberman, Gur, and Dominika Halka, 2001, Systematic liquidity, Journal of Financial Research 24, 161–178.
Jagannathan, Ravi, and Zhenyu Wang, 1996, The conditional CAPM and the cross-section of expected returns, Journal of Finance 51, 3–53.
Jagannathan, Ravi, and Zhenyu Wang, 2002, Empirical evaluation of asset-pricing models: A comparison of the SDF and beta methods, Journal of Finance 57, 2337–2367.
Jegadeesh, Narasimhan, and Sheridan Titman, 1993, Returns to buying winners and selling losers: Implications for stock market efficiency, Journal of Finance 48, 65–91.
Kirby, Chris, 1997, Measuring the predictable variation in stock and bond returns, Review of Financial Studies 10, 579–630.
Lettau, Martin, and Sydney Ludvigson, 2002, Resurrecting the (C)CAPM: A cross-sectional test when risk premia are time-varying, Journal of Political Economy 109, 1238–1287.
Mehra, Rajnish, and Edward C. Prescott, 1985, The equity premium: A puzzle, Journal of Monetary Economics 15, 145–162.
Pastor, Lubos, and Robert F. Stambaugh, 2003, Liquidity risk and expected stock returns, Journal of Political Economy 111, 642–685.
Sharpe, William F., 1964, Capital asset prices: A theory of market equilibrium under conditions of risk, Journal of Finance 19, 425–442.
6 Tables
Table 1: Summary Statistics for the Stock Portfolios
The reported results are the means and standard deviations of the monthly excess returns (in percent) on size and book-to-market (B/M) portfolios. The data are from CRSP and Compustat, and the sample period is 1964-2001. Individual firms are sorted into three size and B/M groups independently, and the portfolios are constructed as intersections of size and B/M groups, as in Fama and French (1993). The ranks are in ascending order.

Rank by Size    Rank by B/M    Mean    Standard Deviation
1               1              0.29    0.33
1               2              0.62    0.28
1               3              0.86    0.27
2               1              0.52    0.29
2               2              0.61    0.25
2               3              0.94    0.23
3               1              0.47    0.23
3               2              0.44    0.21
3               3              0.78    0.20
Table 2: The CAPM and Its Time-Varying Extensions
This table reports the summary statistics for the estimated posterior distributions of π, δ,δ+, δ+ − δ, and (δ+ − δ)/δ for the CAPM and its time-varying extensions. In Panel A, assetreturns are not scaled by any conditioning variables. In Panels B, asset returns are scaledby the variable TERM.
Panel A: For the Non-Scaled Asset Returns
π δ δ+ δ+ − δ (δ+ − δ)/δThe CAPM:mean 0.000 0.316 0.320 0.004 1%median 0.000 0.315 0.317 0.000 0%stdev 0.000 0.064 0.065 0.013The CAPM+IV:mean 0.036 0.300 0.306 0.006 2%median 0.002 0.298 0.303 0.001 0%stdev 0.062 0.065 0.065 0.015The CAPM*IV:mean 0.135 0.276 0.294 0.018 6%median 0.131 0.275 0.291 0.007 2%stdev 0.088 0.067 0.066 0.027
Panel B: For the Scaled Asset Returns
                 π       δ       δ+      δ+ − δ   (δ+ − δ)/δ
The CAPM:
  mean         0.075   0.223   0.239   0.015       7%
  median       0.044   0.222   0.236   0.009       4%
  stdev        0.084   0.051   0.054   0.018
The CAPM+IV:
  mean         0.090   0.207   0.228   0.021      10%
  median       0.055   0.206   0.225   0.015       7%
  stdev        0.095   0.051   0.054   0.022
The CAPM*IV:
  mean         0.105   0.182   0.218   0.036      20%
  median       0.082   0.182   0.216   0.028      15%
  stdev        0.090   0.049   0.053   0.031
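As a rough sketch of how rows like those in Table 2 could be produced from simulated posterior draws, the snippet below summarizes paired draws of δ and δ+. The function name and the toy draws are hypothetical. It also assumes that the last column is the summarized difference divided by the summarized δ, which appears consistent with most reported rows (for example, 0.036/0.182 ≈ 20%) but is not stated in the caption.

```python
import statistics

def summarize(delta, delta_plus):
    """Summarize paired posterior draws of the two HJ distances."""
    diff = [dp - d for d, dp in zip(delta, delta_plus)]
    rows = {}
    for name, stat in (("mean", statistics.mean),
                       ("median", statistics.median),
                       ("stdev", statistics.stdev)):
        d, dp, df = stat(delta), stat(delta_plus), stat(diff)
        # Assumed convention: ratio of the summarized columns; none for stdev.
        ratio = df / d if name != "stdev" else None
        rows[name] = (d, dp, df, ratio)
    return rows

# Toy draws, invented for illustration only.
delta = [0.10, 0.20, 0.30]
delta_plus = [0.10, 0.25, 0.40]
summary = summarize(delta, delta_plus)
```

Each draw of δ+ is paired with the draw of δ from the same simulation step, so the difference δ+ − δ is computed draw by draw before it is summarized.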
Table 3: The LCC and Its Time-Varying Extensions
This table reports the summary statistics for the estimated posterior distributions of π, δ, δ+, δ+ − δ, and (δ+ − δ)/δ for the LCC and its time-varying extensions. In Panel A, asset returns are not scaled by any conditioning variables. In Panel B, asset returns are scaled by the variable TERM.
Panel A: For the Non-Scaled Asset Returns
                 π       δ       δ+      δ+ − δ   (δ+ − δ)/δ
The LCC:
  mean         0.028   0.319   0.334   0.014       5%
  median       0.004   0.317   0.332   0.002       1%
  stdev        0.045   0.066   0.068   0.026
The LCC+IV:
  mean         0.074   0.302   0.327   0.025       8%
  median       0.047   0.300   0.323   0.009       3%
  stdev        0.080   0.066   0.069   0.035
The LCC*IV:
  mean         0.164   0.279   0.322   0.043      15%
  median       0.163   0.277   0.317   0.025       9%
  stdev        0.089   0.068   0.072   0.046
Panel B: For the Scaled Asset Returns
                 π       δ       δ+      δ+ − δ   (δ+ − δ)/δ
The LCC:
  mean         0.088   0.233   0.259   0.026      11%
  median       0.039   0.232   0.257   0.016       7%
  stdev        0.103   0.052   0.060   0.029
The LCC+IV:
  mean         0.114   0.209   0.246   0.036      17%
  median       0.085   0.208   0.242   0.027      13%
  stdev        0.109   0.051   0.058   0.034
The LCC*IV:
  mean         0.138   0.195   0.242   0.047      24%
  median       0.117   0.193   0.238   0.037      19%
  stdev        0.110   0.053   0.059   0.038
Table 4: The FF3 and Its Time-Varying Extensions
This table reports the summary statistics for the estimated posterior distributions of π, δ, δ+, δ+ − δ, and (δ+ − δ)/δ for the FF3 and its time-varying extensions. In Panel A, asset returns are not scaled by any conditioning variables. In Panel B, asset returns are scaled by the variable TERM.
Panel A: For the Non-Scaled Asset Returns
                 π       δ       δ+      δ+ − δ   (δ+ − δ)/δ
The FF3:
  mean         0.001   0.243   0.248   0.005       2%
  median       0.000   0.242   0.246   0.000       0%
  stdev        0.002   0.060   0.060   0.017
The FF3+IV:
  mean         0.056   0.215   0.226   0.011       5%
  median       0.017   0.213   0.223   0.004       2%
  stdev        0.077   0.060   0.061   0.020
The FF3*IV:
  mean         0.287   0.124   0.189   0.065      53%
  median       0.286   0.117   0.185   0.052      45%
  stdev        0.083   0.062   0.065   0.052
Panel B: For the Scaled Asset Returns
                 π       δ       δ+      δ+ − δ   (δ+ − δ)/δ
The FF3:
  mean         0.187   0.188   0.232   0.044      23%
  median       0.189   0.186   0.229   0.035      19%
  stdev        0.103   0.052   0.055   0.034
The FF3+IV:
  mean         0.218   0.168   0.223   0.055      33%
  median       0.226   0.166   0.222   0.047      28%
  stdev        0.107   0.052   0.056   0.038
The FF3*IV:
  mean         0.281   0.088   0.200   0.112     127%
  median       0.291   0.083   0.197   0.107     129%
  stdev        0.104   0.046   0.052   0.053
Table 5: The Consumption-Based Nonlinear Models
This table reports the summary statistics for the estimated posterior distributions of δ, δ+, δ+ − δ, and (δ+ − δ)/δ for the consumption-based nonlinear models. In Panel A, asset returns are not scaled by any conditioning variables. In Panel B, asset returns are scaled by the variable TERM.
Panel A: For the Non-Scaled Asset Returns
                   δ       δ+      δ+ − δ   (δ+ − δ)/δ
The CC:
  mean           0.300   0.311   0.011       4%
  median         0.297   0.308   0.004       1%
  stdev          0.064   0.067   0.021
The Abel Model:
  mean           0.289   0.299   0.011       4%
  median         0.287   0.297   0.004       4%
  stdev          0.066   0.069   0.020
The EZ Model:
  mean           0.303   0.310   0.007       2%
  median         0.302   0.307   0.001       2%
  stdev          0.063   0.065   0.015
Panel B: For the Scaled Asset Returns
                   δ       δ+      δ+ − δ   (δ+ − δ)/δ
The CC:
  mean           0.237   0.244   0.007       3%
  median         0.235   0.240   0.003       1%
  stdev          0.052   0.056   0.012
The Abel Model:
  mean           0.223   0.237   0.015       7%
  median         0.221   0.235   0.007       6%
  stdev          0.053   0.058   0.020
The EZ Model:
  mean           0.219   0.233   0.014       6%
  median         0.217   0.231   0.008       6%
  stdev          0.053   0.056   0.019
Table 6: Model Comparison
This table compares multi-factor and time-varying models with single-factor and nonlinear models. In Panel A, each number in the first part is the posterior mean of the ratio of δ for model I to δ for model J, where model I is listed in the first column and model J is listed in the second row. Each number in the second part is the posterior probability of δI < δJ. Each number in the last two parts is the posterior probability that δ for model I is smaller than δ for model J by at least 100q percent. Panel B is similar to Panel A except using δ+.
Panel A: Comparing Models Using the First HJ Distance

                   Non-Scaled Returns                Scaled Returns
Model J:     CAPM   LCC    CC     Abel   EZ      CAPM   LCC    CC     Abel   EZ
Model I

Posterior Means of δI/δJ
CAPM+IV      0.95   0.95   1.01   1.06   0.99    0.93   0.90   0.88   0.95   0.96
CAPM*IV      0.87   0.87   0.93   0.98   0.92    0.82   0.80   0.78   0.84   0.85
LCC+IV       0.96   0.95   1.01   1.06   1.00    0.95   0.90   0.89   0.96   0.97
LCC*IV       0.89   0.87   0.93   0.98   0.93    0.88   0.84   0.83   0.89   0.91
FF3          0.77   0.77   0.82   0.86   0.81    0.84   0.82   0.80   0.86   0.87
FF3+IV       0.68   0.68   0.73   0.76   0.72    0.75   0.73   0.72   0.77   0.78
FF3*IV       0.39   0.39   0.42   0.44   0.41    0.40   0.39   0.38   0.41   0.41

Posterior Probabilities of δI < δJ
CAPM+IV      1.00   0.71   0.54   0.44   0.61    1.00   0.79   0.87   0.69   0.68
CAPM*IV      1.00   0.84   0.71   0.61   0.79    1.00   0.91   0.94   0.83   0.83
LCC+IV       0.67   1.00   0.51   0.39   0.50    0.67   1.00   0.87   0.66   0.61
LCC*IV       0.82   1.00   0.70   0.59   0.69    0.79   1.00   0.93   0.78   0.75
FF3          1.00   0.95   0.88   0.82   0.92    1.00   0.89   0.94   0.81   0.82
FF3+IV       1.00   0.97   0.95   0.91   0.97    0.99   0.95   0.97   0.89   0.90
FF3*IV       1.00   1.00   1.00   0.99   1.00    1.00   1.00   1.00   0.99   0.99

Posterior Probabilities of δI < (1 − q)δJ for q = 10%
CAPM+IV      0.17   0.31   0.18   0.15   0.15    0.26   0.45   0.49   0.39   0.32
CAPM*IV      0.47   0.54   0.40   0.34   0.40    0.65   0.71   0.75   0.64   0.61
LCC+IV       0.28   0.18   0.13   0.10   0.21    0.34   0.35   0.45   0.34   0.31
LCC*IV       0.50   0.47   0.36   0.30   0.41    0.53   0.57   0.65   0.53   0.48
FF3          0.86   0.83   0.72   0.63   0.75    0.59   0.67   0.72   0.60   0.58
FF3+IV       0.95   0.91   0.85   0.80   0.88    0.81   0.82   0.86   0.75   0.75
FF3*IV       1.00   0.99   0.99   0.98   0.99    1.00   0.99   1.00   0.98   0.99

Posterior Probabilities of δI < (1 − q)δJ for q = 20%
CAPM+IV      0.05   0.12   0.05   0.04   0.04    0.10   0.24   0.26   0.19   0.15
CAPM*IV      0.22   0.29   0.19   0.16   0.18    0.36   0.48   0.51   0.42   0.38
LCC+IV       0.11   0.05   0.04   0.03   0.08    0.16   0.17   0.22   0.17   0.14
LCC*IV       0.26   0.21   0.16   0.14   0.21    0.28   0.31   0.37   0.29   0.26
FF3          0.55   0.57   0.45   0.39   0.47    0.31   0.43   0.46   0.38   0.33
FF3+IV       0.77   0.77   0.68   0.61   0.70    0.55   0.61   0.65   0.55   0.53
FF3*IV       0.99   0.98   0.97   0.95   0.98    0.98   0.97   0.98   0.96   0.96
Panel B: Comparing Models Using the Second HJ Distance

                   Non-Scaled Returns                Scaled Returns
Model J:     CAPM   LCC    CC     Abel   EZ      CAPM   LCC    CC     Abel   EZ
Model I

Posterior Means of δI+/δJ+
CAPM+IV      0.96   0.92   1.00   1.04   0.99    0.95   0.89   0.94   0.97   0.99
CAPM*IV      0.92   0.89   0.96   1.00   0.95    0.91   0.85   0.90   0.93   0.94
LCC+IV       1.02   0.98   1.06   1.10   1.06    1.03   0.96   1.01   1.05   1.07
LCC*IV       1.01   0.97   1.04   1.09   1.04    1.01   0.94   1.00   1.03   1.05
FF3          0.78   0.75   0.81   0.84   0.81    0.97   0.91   0.96   0.99   1.00
FF3+IV       0.71   0.68   0.74   0.77   0.73    0.93   0.87   0.92   0.95   0.97
FF3*IV       0.59   0.57   0.61   0.64   0.61    0.84   0.78   0.82   0.85   0.86

Posterior Probabilities of δI+ < δJ+
CAPM+IV      0.97   0.83   0.59   0.48   0.61    0.93   0.90   0.83   0.71   0.68
CAPM*IV      0.98   0.91   0.71   0.59   0.77    0.98   0.96   0.90   0.80   0.80
LCC+IV       0.47   0.76   0.37   0.26   0.32    0.48   0.75   0.60   0.50   0.42
LCC*IV       0.54   0.75   0.45   0.34   0.41    0.56   0.79   0.65   0.55   0.49
FF3          1.00   0.99   0.91   0.85   0.95    0.71   0.84   0.75   0.64   0.59
FF3+IV       1.00   0.99   0.95   0.92   0.98    0.82   0.90   0.83   0.73   0.72
FF3*IV       1.00   1.00   0.98   0.97   0.99    0.96   0.98   0.95   0.89   0.90

Posterior Probabilities of δI+ < (1 − q)δJ+ for q = 10%
CAPM+IV      0.13   0.35   0.20   0.15   0.11    0.15   0.44   0.29   0.25   0.16
CAPM*IV      0.33   0.51   0.33   0.26   0.26    0.34   0.61   0.47   0.41   0.32
LCC+IV       0.08   0.09   0.05   0.03   0.06    0.10   0.24   0.15   0.13   0.09
LCC*IV       0.16   0.19   0.11   0.08   0.11    0.14   0.31   0.20   0.17   0.13
FF3          0.85   0.90   0.76   0.68   0.76    0.11   0.39   0.24   0.20   0.11
FF3+IV       0.93   0.96   0.86   0.80   0.88    0.27   0.53   0.39   0.34   0.25
FF3*IV       0.98   0.99   0.95   0.92   0.97    0.74   0.85   0.78   0.70   0.67

Posterior Probabilities of δI+ < (1 − q)δJ+ for q = 20%
CAPM+IV      0.02   0.10   0.04   0.03   0.02    0.03   0.19   0.09   0.08   0.03
CAPM*IV      0.08   0.18   0.10   0.08   0.06    0.07   0.28   0.15   0.13   0.07
LCC+IV       0.01   0.01   0.01   0.01   0.01    0.02   0.10   0.04   0.03   0.02
LCC*IV       0.03   0.04   0.02   0.02   0.02    0.03   0.13   0.05   0.05   0.03
FF3          0.54   0.63   0.49   0.41   0.47    0.02   0.16   0.06   0.06   0.02
FF3+IV       0.74   0.80   0.67   0.60   0.67    0.07   0.24   0.13   0.11   0.06
FF3*IV       0.92   0.95   0.87   0.82   0.88    0.35   0.54   0.42   0.37   0.30
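The three kinds of entries in Table 6 can be illustrated with a short sketch that works on paired posterior draws of the HJ distance for two models. The function name and the toy draws are hypothetical, and the sketch assumes that the draws for models I and J come from the same simulation steps, so that they can be compared draw by draw.

```python
def compare(delta_i, delta_j, q=0.0):
    """Return (posterior mean of delta_I/delta_J, P(delta_I < (1 - q) * delta_J))."""
    n = len(delta_i)
    mean_ratio = sum(a / b for a, b in zip(delta_i, delta_j)) / n
    prob = sum(a < (1 - q) * b for a, b in zip(delta_i, delta_j)) / n
    return mean_ratio, prob

# Toy draws, invented for illustration only.
delta_i = [0.10, 0.22, 0.30, 0.26]
delta_j = [0.20, 0.30, 0.25, 0.30]

mean_ratio, prob_smaller = compare(delta_i, delta_j)      # q = 0: plain P(delta_I < delta_J)
_, prob_10 = compare(delta_i, delta_j, q=0.10)            # smaller by at least 10 percent
_, prob_20 = compare(delta_i, delta_j, q=0.20)            # smaller by at least 20 percent
```

Setting q = 0 gives the second part of each panel, while q = 0.10 and q = 0.20 give the last two parts. The same function applies to Panel B if draws of δ+ are passed in place of draws of δ.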
7 Figures
Figure 1: A Graphic Illustration of HJ Distances
In the illustration, there are two pre-specified stochastic discount factors (SDF), y^I_t and y^J_t. The line labeled M represents the set of SDFs that correctly price a given set of assets. The half-line labeled M+ represents the set of non-negative SDFs that correctly price the given set of assets. For y^I_t, the first HJ distance is the shortest distance between y^I_t and M, and is denoted by δI. The second HJ distance is the shortest distance between y^I_t and M+, and is denoted by δI+. The HJ distances for y^J_t are defined similarly and denoted by δJ and δJ+.
[Figure: diagram showing the line M, the half-line M+, the points y^I_t and y^J_t, and the distances δI, δI+, δJ, and δJ+.]
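For the first HJ distance, the shortest distance to M has a well-known closed form in sample: δ² = g′G⁻¹g, where g is the vector of average pricing errors and G is the sample second-moment matrix of the payoffs. The sketch below computes it with NumPy. The function name, the simulated returns, and the unit prices are illustrative assumptions and not the estimation procedure of this paper. The second distance has no such closed form because of the non-negativity restriction, so it is not sketched here.

```python
import numpy as np

def first_hj_distance(y, returns, prices):
    """y: (T,) SDF realizations; returns: (T, N) payoffs; prices: (N,) payoff prices."""
    T = len(y)
    g = returns.T @ y / T - prices          # sample pricing errors E[y R] - p
    G = returns.T @ returns / T             # sample second-moment matrix E[R R']
    quad = max(float(g @ np.linalg.solve(G, g)), 0.0)  # guard against rounding below zero
    return float(np.sqrt(quad))

# Simulated gross returns, invented for illustration only.
rng = np.random.default_rng(0)
returns = 1.0 + 0.05 * rng.standard_normal((500, 3))
prices = np.ones(3)                          # a gross return has price 1

G = returns.T @ returns / len(returns)
y_exact = returns @ np.linalg.solve(G, prices)   # prices all assets exactly in sample
y_shifted = y_exact + 0.1                        # a mis-specified SDF

d_exact = first_hj_distance(y_exact, returns, prices)
d_shifted = first_hj_distance(y_shifted, returns, prices)
```

Here `y_exact` lies in M by construction, so its distance is zero up to rounding, while shifting it by a constant moves it off M and yields a strictly positive distance.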
Figure 2: Time Series of Conditioning Variables
The conditioning variable, TERM, is plotted in the graph. The variable TERM is the HP-filtered term spread, the difference in yields between 30-year government bonds and 1-month T-bills. The data series are observed at the end of each month from January 1964 to December 2001.
[Figure: time-series plot of TERM. Horizontal axis: Time (January 1964 to January 2000, labeled every two years). Vertical axis: Percent (−2% to 3%).]
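A minimal sketch of the Hodrick-Prescott (HP) filter that the caption of Figure 2 refers to is given below. It solves the standard penalized least-squares problem directly with NumPy. The smoothing parameter of 129600 is one common choice for monthly data and is an assumption here, as is the synthetic input series; the caption also does not say whether TERM is the trend or the cyclical component, so both are returned.

```python
import numpy as np

def hp_filter(series, lamb):
    """Return (trend, cycle) from the Hodrick-Prescott filter with penalty lamb."""
    y = np.asarray(series, dtype=float)
    n = len(y)
    D = np.zeros((n - 2, n))                 # second-difference operator
    for i in range(n - 2):
        D[i, i:i + 3] = [1.0, -2.0, 1.0]
    # The trend minimizes sum (y - tau)^2 + lamb * sum (second differences of tau)^2.
    trend = np.linalg.solve(np.eye(n) + lamb * (D.T @ D), y)
    return trend, y - trend

# Synthetic "term spread" series, invented for illustration only.
t = np.arange(48)
spread = 0.5 + 0.02 * t + 0.3 * np.sin(t / 3.0)
trend, cycle = hp_filter(spread, lamb=129600.0)          # lamb is an assumed value

# A purely linear series has zero second differences, so it is its own trend.
linear_trend, linear_cycle = hp_filter(0.5 + 0.02 * t, lamb=129600.0)
```

The dense linear solve is adequate for a few hundred monthly observations; a banded solver would be the natural choice for much longer series.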
Figure 3: The CAPM and Its Time-Varying Extensions
For the CAPM and its time-varying extensions, this figure presents the estimated posterior distributions of HJ distances. The gray curves are the estimated posterior probability density functions of δ. The black curves are the estimated posterior probability density functions of δ+. In the panels on the left column, asset returns are not scaled by any conditioning variables. In the panels on the right column, asset returns are scaled by the variable TERM.
[Figure: six panels. Left column: Panel A: CAPM, Non-Scaled; Panel B: CAPM+IV, Non-Scaled; Panel C: CAPM*IV, Non-Scaled. Right column: Panel D: CAPM, Scaled; Panel E: CAPM+IV, Scaled; Panel F: CAPM*IV, Scaled. Horizontal axis: HJ Distances (0.0 to 1.0). Vertical axis: Posterior Probability Density (0.0 to 0.4).]
Figure 4: The Small-Sample Posterior Distributions of HJ Distances
For the CAPM and its time-varying extensions, this figure presents the estimated posterior distributions of HJ distances, conditioning on a small number of observations. The gray curves are the estimated posterior probability density functions of δ. The black curves are the estimated posterior probability density functions of δ+. In Panels A, B, and C, asset returns are not scaled by any conditioning variables. In Panels D, E, and F, asset returns are scaled by the variable TERM.
[Figure: six panels. Left column: Panel A: CAPM, Non-Scaled; Panel B: CAPM+IV, Non-Scaled; Panel C: CAPM*IV, Non-Scaled. Right column: Panel D: CAPM, Scaled; Panel E: CAPM+IV, Scaled; Panel F: CAPM*IV, Scaled. Horizontal axis: HJ Distances (0.0 to 1.0). Vertical axis: Posterior Probability Density (0.0 to 0.4).]
Figure 5: The LCC and Its Time-Varying Extensions
For the LCC and its time-varying extensions, this figure presents the estimated posterior distributions of HJ distances. The gray curves are the estimated posterior probability density functions of δ. The black curves are the estimated posterior probability density functions of δ+. In the panels on the left column, asset returns are not scaled by any conditioning variables. In the panels on the right column, asset returns are scaled by the variable TERM.
[Figure: six panels. Left column: Panel A: LCC, Non-Scaled; Panel B: LCC+IV, Non-Scaled; Panel C: LCC*IV, Non-Scaled. Right column: Panel D: LCC, Scaled; Panel E: LCC+IV, Scaled; Panel F: LCC*IV, Scaled. Horizontal axis: HJ Distances (0.0 to 1.0). Vertical axis: Posterior Probability Density (0.0 to 0.4).]
Figure 6: The FF3 and Its Time-Varying Extensions
For the Fama-French model and its time-varying extensions, this figure presents the estimated posterior distributions of HJ distances. The gray curves are the estimated posterior probability density functions of δ. The black curves are the estimated posterior probability density functions of δ+. In the panels on the left column, asset returns are not scaled by any conditioning variables. In the panels on the right column, asset returns are scaled by the variable TERM.
[Figure: six panels. Left column: Panel A: FF3, Non-Scaled; Panel B: FF3+IV, Non-Scaled; Panel C: FF3*IV, Non-Scaled. Right column: Panel D: FF3, Scaled; Panel E: FF3+IV, Scaled; Panel F: FF3*IV, Scaled. Horizontal axis: HJ Distances (0.0 to 1.0). Vertical axis: Posterior Probability Density (0.0 to 0.4).]
Figure 7: The GMM Estimates of the FF3 and Its Time-Varying Extensions
For the Fama-French model and its time-varying extensions, the GMM estimates of the SDFs are plotted over the time from January 1964 to December 2001. In the panels on the left column, the asset returns used for the GMM estimation are not scaled by any conditioning variables. In the panels on the right column, the asset returns are scaled by the variable TERM.
[Figure: six panels. Left column: Panel A: FF3, Non-Scaled; Panel B: FF3+IV, Non-Scaled; Panel C: FF3*IV, Non-Scaled. Right column: Panel D: FF3, Scaled; Panel E: FF3+IV, Scaled; Panel F: FF3*IV, Scaled. Horizontal axis: Time. Vertical axis: The Estimated SDF (−2.0 to 6.0).]
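The vertical axis of Figure 7 extends below zero, which matters because an SDF that takes negative values assigns non-positive prices to some non-negative payoffs. A simple diagnostic, sketched below, is the share of periods in which a linear-factor SDF of the form y_t = b0 + b′f_t is negative. The function names, coefficients, and factor realizations are made up for illustration and are not the GMM estimates plotted in the figure.

```python
def linear_sdf(b0, b, factors):
    """Evaluate y_t = b0 + b'f_t for each period's factor realization f_t."""
    return [b0 + sum(bk * fk for bk, fk in zip(b, f)) for f in factors]

def negative_share(sdf):
    """Fraction of periods in which the SDF is strictly negative."""
    return sum(y < 0 for y in sdf) / len(sdf)

# Hypothetical factor realizations and coefficients, for illustration only.
factors = [(0.02, 0.01), (-0.05, 0.03), (0.30, 0.10), (0.01, -0.02)]
sdf = linear_sdf(1.0, (-4.0, -2.0), factors)
share = negative_share(sdf)   # one of the four periods has a negative SDF
```

A share of zero in sample does not guarantee that a linear SDF is positive for all possible factor realizations; it only indicates that no violation occurred in the observed periods.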
Figure 8: The Consumption-Based Nonlinear Models
For the consumption-based nonlinear models, this figure presents the estimated posterior distributions of HJ distances. The gray curves are the estimated posterior probability density functions of δ. The black curves are the estimated posterior probability density functions of δ+. In the panels on the left column, asset returns are not scaled by any conditioning variables. In the panels on the right column, asset returns are scaled by the variable TERM.
[Figure: six panels. Left column: Panel A: CC, Non-Scaled; Panel B: Abel, Non-Scaled; Panel C: EZ, Non-Scaled. Right column: Panel D: CC, Scaled; Panel E: Abel, Scaled; Panel F: EZ, Scaled. Horizontal axis: HJ Distances (0.0 to 1.0). Vertical axis: Posterior Probability Density (0.0 to 0.4).]